Hiya! We recently started running Sentry on-premise and are experiencing a large backlog of events.process_event and events.save_event tasks.
Setup
Sentry 20.9.0 (build bb3d590)
Azure machine with 4 CPUs and 16 GB of memory
Traffic ranges from 2 to 150 incoming events per minute
All services running on the machine except Postgres, which is managed separately
5 workers configured like so:
worker:
  << : *sentry_defaults
  command: run worker
worker2:
  << : *sentry_defaults
  command: run worker -Q events.process_event
worker3:
  << : *sentry_defaults
  command: run worker -Q events.process_event
worker4:
  << : *sentry_defaults
  command: run worker -Q events.process_event
worker5:
  << : *sentry_defaults
  command: run worker -Q events.process_event
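As an aside, these all run at the default concurrency; a variant I've been considering (assuming run worker accepts a -c/--concurrency option like a plain Celery worker does) is to raise per-container concurrency instead of adding yet more containers:
worker2:
  << : *sentry_defaults
  # sketch only: bump the number of worker processes in this container
  # rather than adding another workerN service (assumes -c/--concurrency is accepted)
  command: run worker -Q events.process_event -c 4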
Adding workers and upgrading the machine (previously 2 CPUs / 8 GB) seems to have helped, but we're still seeing the queue build up a lot. Here is the output from sentry queues list:
Thanks for the response @BYK! I did already add the workers based on that thread (you can see the config in the initial post), and I also added one for save_event. This doesn't really seem to be making a dent in the queue. The ClickHouse max memory flag is also set to 0.3.
We are using Sentry with SSL through an nginx ingress (managed by Kubernetes) that sits in front of this machine. When I purge the queues, events do get processed for a few hours before they back up again. How should I address these errors?
One other thing: setting up a worker to consume events.save_event tasks doesn't seem to work. This is the config, and I don't see any events being processed in its logs, whereas the workers consuming events.process_event do show processing logs.
worker6:
  << : *sentry_defaults
  command: run worker -Q events.save_event
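One workaround I'm considering is pointing a single worker at both queues (assuming -Q accepts a comma-separated queue list, as a stock Celery worker does), something like:
worker6:
  << : *sentry_defaults
  # sketch only: consume events.save_event and events.process_event from one container
  # (assumes -Q takes a comma-separated list of queue names)
  command: run worker -Q events.save_event,events.process_event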