Any way to stop requesting from a source that's logged as disabled?


#1

I’ve had to mask the URLs below, but both of the originals work fine when visited in a browser. The problem is that I see these errors repeated many times in the worker process. Is there a config option to disable a domain, so that the retries are never triggered in the first place?

09:47:30 [WARNING] sentry.http: source.disabled (url=u'https://some-real-domaindomain/' value=u"<class 'requests.exceptions.ConnectionError'>" type='fetch_generic_error')
09:47:30 [WARNING] sentry.http: source.disabled (url=u'https://some-working-url-1' value=u"<class 'requests.exceptions.ConnectionError'>" type='fetch_generic_error')
09:47:30 [WARNING] sentry.http: source.disabled (url=u'https://some-working-url-1' value=u"<class 'requests.exceptions.ConnectionError'>" type='fetch_generic_error')
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/sentry/lang/javascript/errormapping.py", line 126, in rewrite_exception
    if processor.try_process(exc):
  File "/usr/local/lib/python2.7/site-packages/sentry/lang/javascript/errormapping.py", line 77, in try_process
    mapping = self.load_mapping()
  File "/usr/local/lib/python2.7/site-packages/sentry/lang/javascript/errormapping.py", line 59, in load_mapping
    timeout=settings.SENTRY_SOURCE_FETCH_TIMEOUT,
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 501, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/http.py", line 154, in request
    response = requests.Session.request(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/raven/breadcrumbs.py", line 297, in send
    resp = real_send(self, request, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/http.py", line 146, in send
    return super(BlacklistAdapter, self).send(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /facebook/react/master/scripts/error-codes/codes.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fe1a63878d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

Two loosely related questions:

  1. From the installation docs, I see that worker processes can be specified when running the web process. I’m assuming these are the web load-balancer type, and not the workers that move Sentry events from Redis to the database. Am I right?

  2. If I’m right, should I ideally be running one background worker and one cron worker in total? Right now we are running ~10 background workers, 1 cron worker, and 1 web process on each of nearly 10 machines. I’m guessing we are doing this wrong, because we end up with 10 cron workers and ~100 background workers in total. This part is not entirely clear in the on-premise installation docs, so an answer might help others with their setups as well.


#2

You can disable source fetching entirely, but as far as I know there is no way to customize it per domain.

Unfortunately I can’t comment on the processor/worker stuff. cc @matt


#3

> The problem is that I see these errors repeated many times in the worker process. Is there a config option to disable a domain, so that the retries are never triggered in the first place?

There’s not. We only provide a way to block IP addresses, not hostnames.
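
To illustrate what IP-level blocking means, here is a rough sketch. It is not Sentry's actual code: `BLOCKED_NETWORKS` and `is_blocked` are made-up names, and the ranges are placeholders. If I remember correctly the setting that feeds the real blocklist is `SENTRY_DISALLOWED_IPS`, but treat that as an assumption and check it against your version. The check runs against the resolved address, which is why a hostname can't be listed directly.

```python
import ipaddress  # stdlib on Python 3; a backport package is needed on Python 2

# Hypothetical blocklist of CIDR ranges (placeholder values).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in (u"10.0.0.0/8", u"192.168.0.0/16")
]


def is_blocked(ip):
    """Return True if the given IP address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_NETWORKS)
```

With these placeholder ranges, `is_blocked(u"10.1.2.3")` is true and `is_blocked(u"8.8.8.8")` is false.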

> From the installation docs, I see that worker processes can be specified when running the web process. I’m assuming these are the web load-balancer type, and not the workers that move Sentry events from Redis to the database. Am I right?

This is just the number of web processes to spawn. You’re right that it has nothing to do with asynchronous workers.

> If I’m right, should I ideally be running one background worker and one cron worker in total? Right now we are running ~10 background workers, 1 cron worker, and 1 web process on each of nearly 10 machines. I’m guessing we are doing this wrong, because we end up with 10 cron workers and ~100 background workers in total. This part is not entirely clear in the on-premise installation docs, so an answer might help others with their setups as well.

You should run 1 cron worker globally, and as many web/worker processes as you need to handle your load. There’s no magic number here: run whatever keeps you ingesting events and keeps the queue processing without a backlog. Running 10 cron processes won’t really break things, but it’ll generate a lot of redundant work.
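
As a sketch of that layout (the host names, the worker count, and the `plan_processes` helper are all hypothetical; only the `sentry run web`, `sentry run worker`, and `sentry run cron` commands are real), something like this gives every machine web and worker processes but puts cron on exactly one of them:

```python
def plan_processes(hosts, workers_per_host=10, cron_host=None):
    """Map each host to the Sentry commands it should run.

    Every host gets one web process and `workers_per_host` workers;
    only `cron_host` (default: the first host) also runs cron.
    """
    if not hosts:
        raise ValueError("need at least one host")
    if cron_host is None:
        cron_host = hosts[0]
    if cron_host not in hosts:
        raise ValueError("cron_host must be one of the hosts")

    plan = {}
    for host in hosts:
        commands = ["sentry run web"]
        commands += ["sentry run worker"] * workers_per_host
        if host == cron_host:
            commands.append("sentry run cron")
        plan[host] = commands
    return plan
```

How many workers each machine runs is still something to tune against your queue backlog; the only hard constraint in the sketch is the single cron process.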


#4

@benvinegar @matt Thank you for the info.

I do have one more question though: is there a place I can find more information about source fetching? And is there anything I can do to avoid the repetitive fetching? The logs say it’s only a warning, but should I be worried that I’ve misconfigured something? Most of the time it’s trying to fetch the exact same URLs.


#5

To be clear, the error you’re seeing has nothing to do with source fetching; @benvinegar misidentified it. It comes specifically from our error mapper for React.

See https://github.com/getsentry/sentry/blob/master/src/sentry/lang/javascript/errormapping.py for more information.

There’s not really a way to disable this, but I’m more curious why it’s erroring in the first place, since it’s just trying to fetch a publicly accessible URL.

Are you running in an environment without public internet or something?
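
The `[Errno -2] Name or service not known` part of your traceback is a hostname lookup failure, so one quick check is whether the worker host can resolve the name at all. A minimal sketch (`can_resolve` is just an illustrative helper, not part of Sentry):

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the system resolver can look up the hostname."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Raised for lookup failures such as "Name or service not known".
        return False
```

Run `can_resolve("raw.githubusercontent.com")` from the same machine (or container) the worker runs in. If it returns False there but the URL opens fine in your browser, the worker environment likely has different DNS or outbound network settings.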


#6

Actually, your logs are showing two issues. One of them is related to source fetching, and the other, the one with the stack trace, comes from the error mapper.

Neither of these is actually problematic; I think they’re just symptoms of a networking misconfiguration.

You can disable source fetching on a per-project level in the UI, or set SENTRY_SCRAPE_JAVASCRIPT_CONTEXT to False in sentry.conf.py to disable source fetching globally.
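
For the global option, the fragment in sentry.conf.py would look roughly like this (the setting is a boolean that defaults to True, as far as I know; the rest of your config stays as it is):

```python
# sentry.conf.py (fragment)
# Turn off fetching of JavaScript sources and source maps for all projects.
SENTRY_SCRAPE_JAVASCRIPT_CONTEXT = False
```

Note that this only affects source fetching; it does not touch the React error mapper request shown in the stack trace.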