Not processing events

For a few days now, all events in our on-premise (k8s/Helm) Sentry deployment have seemed to be stuck. There are no errors in the logs as far as I can tell. I can log in to one of the worker pods and see that there are a number of events in the preprocessing queue. I’ve purged this queue, but it didn’t help. I then restarted all of the services and eventually a few more events showed up, but it seems to be stuck again. Any clue how to debug this?

We’re using chart v4.6.0 and app v20.7.2. Thanks!

I’m using the vanilla Sentry onpremise setup and I’m having the same problem. Only “filtered” events come in.

Hello @odinho, can you post your sentry.conf.py?

@rokroskar I would raise this against the helm chart issue tracker.

So I was mistaken. I thought it was another problem, but now I see that it is actually processing the events. It looked stuck, but it just has a huge backlog :joy:

I used to have about 1 event/minute, but now it’s above 40-ish events a minute and the worker can’t keep up during the daytime, so it has a queue of 8000 or so events. Once peak time is over it slowly works through them, and by around 23:00 it seems to be on top of it again. So sorry about the noise. I had just updated and thought this was related, but it’s just a regular load issue.
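
To put rough numbers on a backlog like that: a queue only shrinks while the workers process events faster than new ones arrive. The sketch below is a back-of-the-envelope estimate, not a measurement. The queue size and peak arrival rate are taken from the post above, while the worker throughput of 50 events/minute and the off-peak arrival rate are assumed values, and `minutes_to_drain` is a hypothetical helper.

```python
import math


def minutes_to_drain(backlog, arrival_per_min, processed_per_min):
    """Estimate the minutes until a queue is empty.

    Returns math.inf when the workers do not outpace incoming events,
    because the backlog then never shrinks.
    """
    net_rate = processed_per_min - arrival_per_min
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate


# 8000 queued events and 40 events/min at peak come from the post above;
# 50 events/min of worker throughput is an assumed value.
peak = minutes_to_drain(8000, arrival_per_min=40, processed_per_min=50)

# Off-peak, assuming arrivals drop to 1 event/min at the same throughput.
off_peak = minutes_to_drain(8000, arrival_per_min=1, processed_per_min=50)

print(f"peak: {peak:.0f} min, off-peak: {off_peak:.0f} min")
```

Under these assumptions the queue drains several times faster off-peak than at peak, which would match it catching up late in the evening. If throughput falls to or below the arrival rate, the estimate is infinite and only more worker capacity or fewer incoming events will help.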

The server is maxed out on CPU, so I won’t add more workers, but I could maybe give it some more CPUs and then add additional workers. Or I could just start rate limiting events.
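
For anyone weighing the rate-limiting option, here is a minimal sketch of what a per-minute cap does to incoming events. It is a generic fixed-window limiter written for illustration, not how Sentry implements its rate limits. The class name `PerMinuteLimiter` and the cap of 20 events/minute are assumptions.

```python
class PerMinuteLimiter:
    """Accept at most `limit` events in each 60-second window."""

    def __init__(self, limit):
        self.limit = limit
        self.window = None  # index of the current one-minute window
        self.count = 0      # events accepted so far in that window

    def allow(self, timestamp):
        """Return True if an event arriving at `timestamp` (seconds) is accepted."""
        window = int(timestamp // 60)
        if window != self.window:
            # A new minute has started: reset the counter.
            self.window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


# 40 events spread evenly over one minute, capped at an assumed 20 per minute.
limiter = PerMinuteLimiter(limit=20)
accepted = sum(limiter.allow(i * 1.5) for i in range(40))
print(f"accepted {accepted} of 40 events")
```

The trade-off is that everything above the cap is dropped rather than queued, so the workers stay on top of the load but some events are never recorded.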

Of course the real fix is to fix my apps that are spamming issues at the Sentry server! :joy: Thanks @untitaker!

I believe the issue is actually that the workers were running out of CPU/memory. Are there any official recommendations for resource provisioning?

We cannot provide recommendations for the helm chart as we don’t maintain it.

Sure, I understand that, but the question is not about the helm chart, it’s about resource requirements for Sentry. Are there some guidelines for running Sentry on-premise?

That really depends on what the Helm chart does. Does it spawn 1 vs n workers? Does it provide special options to increase/reduce memory usage?

The only clear-cut issue I’ve seen with regard to OOM was this: Sentry worker dying

But whether that’s your problem depends on whether the Helm chart already passes this option.

OK, thanks @untitaker, I’ll take this over to the helm chart repo. Thanks for the pointer to the other thread!