Inbound Data Filters (Server-Side)

Due to popular demand, we’ve been working on adding server-side filtering of data. The simple version is discarding events from things like browser extensions, but the longer-term goal is to give you a lot more power here. These filters apply before rate limits, and they’re part of a larger shift on our end toward letting you remove unwanted data as much as possible.

Right now we’ve got a few baked-in filters (in addition to our existing IP filters); each can be toggled per project, as sketched after the list:

  • Legacy browsers [JavaScript only]
  • Web crawlers
  • Browser extensions [JavaScript only]
  • Localhost errors (e.g. from development environments)
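
If you’d rather script these toggles than click through the UI, they can also be flipped per project over the web API. A minimal sketch; the endpoint shape, the filter ids, and the project:write scope are my assumptions, so verify them against your Sentry version:

```python
# Toggling the built-in inbound filters via the web API (assumed
# endpoint and filter ids; check them against your Sentry version).
import requests

ORG = "my-org"            # hypothetical organization slug
PROJECT = "my-project"    # hypothetical project slug
TOKEN = "<auth token>"    # assumed to need the project:write scope

# "legacy-browsers" also exists, but in some versions it takes a list
# of browser subfilters rather than a plain boolean.
for filter_id in ("browser-extensions", "web-crawlers", "localhost"):
    resp = requests.put(
        f"https://sentry.io/api/0/projects/{ORG}/{PROJECT}/filters/{filter_id}/",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"active": True},  # False turns the filter off
    )
    resp.raise_for_status()
```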

If you’re running open source Sentry you’ll also be able to create your own filters. We’re not willing to commit to a stable API at this point, but it’s unlikely to change much.
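
To give a feel for what that looks like, here’s a minimal sketch modeled on the bundled filters. The Filter base class, its import path, and the test() hook reflect how the built-in filters are currently written and may change, so treat every name here as illustrative:

```python
# A hypothetical custom filter in the style of Sentry's built-in ones.
# The base class and import path may vary by Sentry version.
from sentry.filters.base import Filter


class StagingHostFilter(Filter):
    # id and name are what show up in the project settings UI
    id = 'staging-host'
    name = 'Filter out events from our staging hosts'

    def test(self, data):
        # Return True to drop the event before it is stored.
        server_name = data.get('server_name') or ''
        return server_name.startswith('staging-')
```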

Here’s what it looks like today, within your project settings:

[screenshot: the Inbound Data Filters tab in project settings]

We’d love to get more community feedback, and using this as an avenue for that seems ideal. What would you like to see here?

The initial version of this will be going live today.

This feature sounds great, but I can’t seem to get it to work. Using the hosted version of Sentry on sentry.io, I’ve enabled the filter but I still see JavaScript errors from URLs on localhost.

Any ideas?

We don’t actually filter request URIs that are bound to localhost right now; today it’s purely based on user.ip. We could probably add that, as it doesn’t seem like an issue.

@benvinegar thoughts?
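
For anyone following along, the check being discussed would be a small extension of the existing IP test. Here’s a rough sketch of the idea, with hypothetical field handling; it is not the shipped code:

```python
# A sketch of the request-URL check discussed above: in addition to
# user.ip, inspect the hostname of the request URL attached to the
# event. Hypothetical, not the shipped implementation.
from urllib.parse import urlparse

LOCAL_IPS = frozenset(['127.0.0.1', '::1'])
LOCAL_HOSTS = frozenset(['localhost', '127.0.0.1', '::1'])


def is_local_event(data):
    # Existing behavior: filter on the user's IP address.
    ip = (data.get('user') or {}).get('ip_address')
    if ip in LOCAL_IPS:
        return True
    # Proposed addition: also filter on the request URL's hostname.
    url = (data.get('request') or {}).get('url') or ''
    return urlparse(url).hostname in LOCAL_HOSTS
```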


This has come up enough already that we should probably do it.

Is there any possibility of a production service making a request to 127.0.0.1 (itself or another service on a different port) and triggering an error that would be suppressed via this filter? That would be my only concern.

It could happen more often than you’d think, e.g. from misconfigured proxies.

We’d like to whitelist crawlers. We have the Inbound Filter for bad crawlers turned on, but we find that some crawlers generate a lot of errors on the site that aren’t caught by it. At the same time, we’d like to know if our system is failing for useful crawlers like Google/Bing/etc.

> we find that some crawlers generate a lot of errors on the site that aren’t caught by it.

@ehthayer the filter list is open source and we accept PRs if you’d like to add to it.

> we’d like to know if our system is failing for useful crawlers like Google/Bing/etc.

So basically you’d like to turn individual ones on/off. Yeah, we can consider that for the future, at least for the “big ones” you mention.
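
As a rough illustration of what per-crawler control could look like: keep the blanket user-agent match, but exempt an allowlist of crawlers you still want visibility into. The patterns below are made up for the example; the real pattern list lives in the repo linked above:

```python
import re

# Blanket crawler match, loosely modeled on the server-side filter;
# these patterns are illustrative, not the real list.
CRAWLER_RE = re.compile(
    r'googlebot|bingbot|slurp|duckduckbot|baiduspider|crawler|spider',
    re.IGNORECASE,
)

# The "big ones" we still want to see errors for.
ALLOWED_CRAWLERS_RE = re.compile(r'googlebot|bingbot', re.IGNORECASE)


def should_filter_crawler(user_agent):
    if not user_agent or not CRAWLER_RE.search(user_agent):
        return False  # not a crawler at all
    # Exempt allowlisted crawlers so their errors still come through.
    return not ALLOWED_CRAWLERS_RE.search(user_agent)
```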

A company like ours lives or dies by its search rankings, so it’s important for us to know when search crawlers are failing. We’ve turned off the Sentry web crawler filter and implemented our own log4j filter for now. More visibility in the UI as to which crawlers match would help too.

I think we’re going to put some stats on this page, and probably just link to the repo as I’ve done above.

Any update on filtering request URIs that are bound to localhost?

Did you decide not to do it?

Hi!
We’d love to also be able to filter out events by hostname, not just by IP.
IPs can be dynamic and change over time.
Cheers!
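
For what it’s worth, a server-side version of this could key off the event’s reported server_name rather than the client IP. A hypothetical sketch with made-up patterns:

```python
import fnmatch

# Hostname patterns to drop, e.g. ephemeral CI or staging machines
# whose IPs change too often to filter on. Example values only.
FILTERED_HOST_PATTERNS = ['ci-*', '*.staging.example.com']


def should_filter_by_hostname(data):
    # server_name is reported by most server-side SDKs.
    server_name = data.get('server_name') or ''
    return any(fnmatch.fnmatch(server_name, pattern)
               for pattern in FILTERED_HOST_PATTERNS)
```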

What happened to this feature? I am viewing the settings for my project, and I see tabs for General, Notifications, etc., but I don’t see “Inbound Data Filters” anywhere. Was this feature deprecated? I am interested in excluding web crawlers, or at least having errors from web crawlers in a separate search.