I recently upgraded from Sentry 9 to Sentry 20 and am now running 20.12.1. It is a completely fresh install, as I only had a handful of projects in Sentry 9 and starting over seemed easier than migrating.
It is deployed on Kubernetes using the official chart, with Redis as the broker instead of RabbitMQ, as I did not want to needlessly install a whole bunch of extra components.
At first everything was running fine, even though the interface was incredibly slow (and still is), but as time passes Sentry appears to get slower and slower for no apparent reason.
When it was freshly installed, I generated some test events and they were instantly available. Right now, when I generate a test event, it takes well over 2 hours before it shows up.
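(For reference, by "test event" I mean nothing more elaborate than something along the lines of the sketch below, sent from the Python SDK; the DSN is a placeholder, and the timestamp tag just makes the delay easy to read off on the event in the UI.)

```python
# Minimal latency probe (sketch): send one message with a client-side
# timestamp tag so the ingestion delay is visible on the event in the UI.
import datetime
import sentry_sdk

sentry_sdk.init(dsn="https://<key>@sentry.example.com/<project-id>")  # placeholder DSN

sentry_sdk.set_tag("sent_at", datetime.datetime.utcnow().isoformat() + "Z")
sentry_sdk.capture_message("ingestion latency test")
sentry_sdk.flush(timeout=5)  # make sure the event leaves the process before exiting
```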
Furthermore, I’ve noticed that certain Snuba pods in the cluster routinely go into crash loops. They eventually recover, but the cause mostly seems to be that they cannot connect to Kafka for whatever reason; the Kafka pods themselves show no errors at all.
This is a completely fresh install on a beefy cluster, and it’s simply unworkable as it stands. Sentry 9 was perfect; this upgrade introduced a massive number of extra dependencies and a lot of overhead, and it performs terribly.
Just for testing purposes, I’ve also deployed Sentry 20 to a different cluster, where exactly the same issues appear after a while.
Does anyone have similar issues and/or any idea of how to fix this? I cannot routinely wait hours for errors to appear…
Maybe less official than I thought? Though I doubt it matters much HOW it is deployed. There are clearly issues here with components not linking up properly, and I would like to dig into that… Why does it take two hours? What causes that? Is there some massive backlog, and if so, where? Which logs do I look at, how do I find out, etc.
It is absolutely not official, and while I don’t discount the possibility that your issues have nothing to do with that Helm chart, in principle your questions about how well Sentry’s components “link up” do depend on how you deploy Sentry.
So far we don’t know very much about which errors you’re actually encountering. When Snuba says it can’t reach Kafka, doesn’t that mean there is either a networking problem between the two, or that Kafka really is down?
In any case, I would first report this issue against the Helm chart itself and, if you have the time, attempt to repro it using getsentry/onpremise (which is just a docker-compose setup).
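If you want to rule the networking side in or out, a quick reachability check from a pod in the same namespace tells you whether the brokers answer at all. A minimal sketch using confluent-kafka; the `sentry-kafka:9092` address is an assumption and depends on how your chart names the Kafka service:

```python
# Quick reachability check: can we see the Kafka brokers and their topics
# from wherever this runs (e.g. a debug pod in the Sentry namespace)?
from confluent_kafka.admin import AdminClient

BOOTSTRAP = "sentry-kafka:9092"  # assumption: adjust to your chart's Kafka service name

admin = AdminClient({"bootstrap.servers": BOOTSTRAP})
try:
    md = admin.list_topics(timeout=10)  # raises if no broker answers in time
except Exception as exc:
    print(f"Kafka NOT reachable at {BOOTSTRAP}: {exc}")
else:
    print(f"Connected. Brokers: {[b.id for b in md.brokers.values()]}")
    print(f"Topics: {sorted(md.topics)}")
```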
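And for the “is there a massive backlog, and if so where” question: you can read the backlog straight off Kafka by comparing the Snuba consumer group’s committed offsets with the log-end offsets of the ingest topic. A rough sketch; the broker address, the `snuba-consumers` group, and the `events` topic are the usual on-premise defaults, but treat all three as assumptions for your deployment:

```python
# Rough consumer-lag check: compare the Snuba consumer group's committed
# offsets against the log-end offsets of the events topic.
from confluent_kafka import Consumer, TopicPartition

BOOTSTRAP = "sentry-kafka:9092"  # assumption: your chart's Kafka service
GROUP = "snuba-consumers"        # assumption: default Snuba consumer group
TOPIC = "events"                 # assumption: default ingest topic

consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": GROUP,
    "enable.auto.commit": False,  # we only read offsets, never commit
})

# Discover the topic's partitions.
metadata = consumer.list_topics(TOPIC, timeout=10)
partitions = [TopicPartition(TOPIC, p) for p in metadata.topics[TOPIC].partitions]

# Ask the broker what the group has committed so far.
committed = consumer.committed(partitions, timeout=10)

total_lag = 0
for tp in committed:
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    # OFFSET_INVALID (-1001) means nothing has been committed yet.
    lag = high - tp.offset if tp.offset >= 0 else high - low
    total_lag += max(lag, 0)
    print(f"partition {tp.partition}: committed={tp.offset} end={high} lag={lag}")

print(f"total lag for group '{GROUP}' on '{TOPIC}': {total_lag}")
consumer.close()
```

A steadily growing total here would line up with the multi-hour delay you are seeing and point at the consumers rather than the ingest path.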
I’ll take this up with the chart maintainers, then. Kafka seems to be problematic across the board; I’m finding more and more people having issues running it in various contexts, so my guess is that the problem lies somewhere there.
FWIW, somebody on the Sentry Discord was able to get onpremise running with Redpanda instead of Kafka, which appears to be more stable but also requires license keys. You can try it out, though we definitely don’t support that setup.