Large backlog of events.process_event and events.save_event

Hiya! We recently started running Sentry on-premise and are seeing a large backlog in the events.process_event and events.save_event queues.

Setup
Sentry 20.9.0 (bb3d590)
Azure machine with 4 CPUs and 16 GB of memory
Traffic ranges from 2 to 150 incoming events per minute
All services run on this machine except Postgres, which is managed separately
5 workers configured like so:

    worker:
      << : *sentry_defaults
      command: run worker
    worker2:
      << : *sentry_defaults
      command: run worker -Q events.process_event
    worker3:
      << : *sentry_defaults
      command: run worker -Q events.process_event
    worker4:
      << : *sentry_defaults
      command: run worker -Q events.process_event
    worker5:
      << : *sentry_defaults
      command: run worker -Q events.process_event
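
Side note: since worker2 through worker5 are identical, an equivalent setup, assuming plain docker-compose, would be to define the queue-specific worker once and scale it (worker_process_event below is just an illustrative service name):

    worker_process_event:
      << : *sentry_defaults
      command: run worker -Q events.process_event

    # then bring up 4 copies with: docker-compose up -d --scale worker_process_event=4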

Adding workers and upgrading the machine (previously 2 CPUs / 8 GB) seems to have helped, but we're still seeing the queues build up a lot. Here is the output of sentry queues list:

    activity.notify 0
    alerts 0
    app_platform 0
    assemble 0
    auth 0
    buffers.process_pending 0
    cleanup 0
    commits 0
    counters-0 0
    data_export 0
    default 0
    digests.delivery 0
    digests.scheduling 0
    email 0
    events.preprocess_event 0
    events.process_event 34032
    events.reprocess_events 0
    events.reprocessing.preprocess_event 0
    events.reprocessing.process_event 0
    events.reprocessing.symbolicate_event 0
    events.save_event 18708
    events.symbolicate_event 0
    files.delete 0
    incident_snapshots 0
    incidents 0
    integrations 0
    merge 0
    options 0
    relay_config 0
    reports.deliver 0
    reports.prepare 0
    search 0
    sleep 0
    stats 0
    subscriptions 0
    triggers-0 809
    unmerge 0
    update 0
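
A quick way to watch how this listing changes over time (a sketch, assuming the stock docker-compose layout where the Sentry CLI is available inside the worker container):

    # refresh the queue sizes every 30 seconds from the host
    watch -n 30 "docker-compose exec -T worker sentry queues list"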

What steps can we take to investigate and diagnose this issue? Can anyone point me to other threads or resources?

I think this thread can help you: How to clear backlog and monitor it

Thanks for the response @BYK! I already added the workers based on that thread (you can see the config in the initial post), and I also added one for save_event. This doesn't really seem to be making a dent in the queue. The Clickhouse max memory flag is also set to 0.3.

Did you mean something else from that thread?

Ooops, sorry for zombie responding :smiley:

This may actually be limiting you, so you may want to try increasing it a bit.
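
For example, a minimal sketch of what raising it could look like in docker-compose.yml, assuming your setup wires the ratio through the MAX_MEMORY_USAGE_RATIO environment variable on the clickhouse service the way recent on-premise versions do:

    clickhouse:
      environment:
        # example value only: raise Clickhouse's share of host memory from 0.3 to 0.5
        MAX_MEMORY_USAGE_RATIO: "0.5"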

Also, what do your worker logs look like? Maybe there are hints we are missing?

Re: Clickhouse, I'll try that! All the worker logs look like this; I don't see any errors or warnings other than these:

worker5_1                      | 2020-10-19T20:22:38.193458842Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:23:12.644595224Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:23:12.644628325Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:23:18.442984386Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:23:18.443033987Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:23:25.576246320Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:23:25.576301321Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:23:26.634927924Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:23:26.634984525Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:24:10.207719860Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:24:10.207768761Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:24:14.737120000Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:24:14.737182201Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:24:19.714759327Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:24:19.714798828Z   InsecureRequestWarning)
worker5_1                      | 2020-10-19T20:24:23.803244720Z /usr/local/lib/python2.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
worker5_1                      | 2020-10-19T20:24:23.803331422Z   InsecureRequestWarning)

Those errors might be the reason for your issues. Are you using Sentry with SSL? Do you have any custom config? If so, can you share it with us?

We are using Sentry with SSL through an nginx ingress (managed by Kubernetes) that sits in front of Sentry. When I purge the queues, events do get processed for a few hours before they get backlogged again. How should I address these errors?
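
For reference, the purge is done with commands along these lines (assuming sentry queues purge run inside the worker container is the intended mechanism):

    docker-compose exec worker sentry queues purge events.process_event
    docker-compose exec worker sentry queues purge events.save_event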

One other thing: setting up a worker to consume events.save_event tasks doesn't seem to work. Here is the config; I don't see any events being picked up in its logs, whereas the workers consuming events.process_event do show processing logs.

    worker6:
      << : *sentry_defaults
      command: run worker -Q events.save_event
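
If it helps narrow things down, one alternative shape we could try, assuming -Q accepts a comma-separated list of queues like the underlying Celery option does, is a single worker consuming both backlogged queues:

    worker6:
      << : *sentry_defaults
      # hypothetical: one worker pulling from both backlogged queues
      command: run worker -Q events.save_event,events.process_event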

I don’t have any pointers or clues right now. Were you able to solve this? If not, sharing full logs may help with finding a solution.