Our infrastructure includes a Sentry installation that was set up by someone who didn't document anything and then left the company (as happens everywhere, I guess).
We’ve recently experienced some performance issues (workers having a hard time processing events as fast as they come in). Here’s our current situation:
- The boxes where the workers run seem fine: CPU, memory, etc. usage is high, but nothing crazy.
- Same for the Redis cluster: usage is pretty high, but not necessarily worrying.
- Nothing unusual in the Sentry logs.
- The Postgres database is huge (~400G, of which ~320G is the sentry_eventmapping table alone) and often hits 100% CPU utilization.
- This table has a bit more than 1 billion rows (1,100,832,546 exactly).
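For a rough sense of scale, here is a quick back-of-the-envelope calculation using only the numbers above. It is a sketch under assumptions: I'm treating "G" as GiB, and I haven't verified whether the ~320G figure includes indexes, so the per-row number is only an approximate footprint.

```python
# Figures taken from the list above; treating "G" as GiB is an assumption.
GIB = 1024 ** 3
db_size_bytes = 400 * GIB        # approximate total database size
table_size_bytes = 320 * GIB     # approximate size of sentry_eventmapping
row_count = 1_100_832_546        # exact row count reported above


def table_share(table_bytes: int, db_bytes: int) -> float:
    """Fraction of the database taken up by one table."""
    return table_bytes / db_bytes


def avg_bytes_per_row(table_bytes: int, rows: int) -> float:
    """Average on-disk footprint per row, including any overhead in the size figure."""
    return table_bytes / rows


share = table_share(table_size_bytes, db_size_bytes)
per_row = avg_bytes_per_row(table_size_bytes, row_count)

print(f"sentry_eventmapping is about {share:.0%} of the database")
print(f"average footprint is about {per_row:.0f} bytes per row")
```

So this one table accounts for roughly 80% of the database, at a bit over 300 bytes per row on average.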
So I have two questions:
Do you guys agree that the performance issue seems to be coming from the current state of the Postgres DB?
How can I fix it?
Thanks a lot for your help!