For a few days we’ve had the problem that all events in our on-premise (k8s/helm) Sentry deployment seem to be stuck. There are no errors in the logs as far as I can tell. I can log in to one of the worker pods and see that there are a number of events in the preprocessing queue; I purged it, but that didn’t help. I then restarted all of the services and eventually a few more events showed up, but it seems to be stuck again. Any clue how to debug this?
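For reference, this is roughly how I’ve been poking at the queues. The deployment, service, and queue names below are from my setup, and the exact `sentry queues` subcommands may vary by version, so treat this as a sketch rather than a recipe:

```bash
# Exec into one of the Sentry worker pods (deployment name depends on your chart/release)
kubectl exec -it deploy/sentry-worker -- bash

# Inside the pod: list the Celery queues and their current depth
sentry queues list

# Purge a specific queue, e.g. the preprocessing one
sentry queues purge events.preprocess_event

# Or check the backlog directly on the Redis broker;
# Celery queues are plain Redis lists named after the queue
redis-cli -h sentry-redis-master llen events.preprocess_event
```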
So I was mistaken; I thought it was a different problem. Now I see that it is actually processing events. It looked stuck, but it is working through them, it’s just that it has a huge backlog.
I used to have something like 1 event/minute, but now it’s above 40-ish events a minute and the worker can’t keep up during daytime, so it has a queue of 8000 or so events. Since peak time is over it is slowly processing through them, and at around 23:00 it seems to be on top of it again. So sorry about the noise. I had just updated and thought this was related, but it’s just a regular capacity issue.
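As a rough back-of-envelope (illustrative numbers, I haven’t measured the actual throughput): if the worker can sustain ~50 events/minute and off-peak intake drops to ~10 events/minute, an 8000-event backlog drains in about 8000 / (50 − 10) = 200 minutes, which lines up with it catching up a few hours after peak.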
The server is maxed out on CPU, so I won’t add more workers, but I could maybe give it some more CPUs and then add additional workers. Or I could just start rate limiting events.
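If I go the scaling route, it should mostly be a matter of bumping the worker replica count and CPU allocation in the chart values and letting helm roll it out. The keys below are what I’d expect for the sentry-kubernetes chart; they differ between chart versions, so double-check against your values.yaml:

```bash
# Scale out the Sentry workers and give them more CPU
# (values keys depend on the chart version; verify against its values.yaml)
helm upgrade sentry sentry/sentry \
  --reuse-values \
  --set sentry.worker.replicas=4 \
  --set sentry.worker.resources.requests.cpu=1 \
  --set sentry.worker.resources.limits.cpu=2
```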
Of course the real fix is to fix my apps that are spamming issues at the Sentry server! Thanks @untitaker!
Sure, I understand that, but the question is not about the Helm chart, it’s about resource requirements for Sentry itself. Are there any guidelines for sizing an on-premise Sentry deployment?