We have a Sentry setup in our infrastructure that was set up by someone who didn't document anything and then left the company (as happens everywhere, I guess).
We’ve recently experienced some performance issues (workers having a hard time processing events as fast as they come in). Here’s our current situation:
The boxes where the workers run seem fine: CPU, memory, etc. usage is high, but nothing crazy.
Same for the Redis cluster: pretty high, but not necessarily worrying.
Nothing in particular in the sentry logs.
The Postgres database is huge (~400 GB, ~320 GB just for the sentry_eventmapping table…) and often hits 100% CPU utilization.
This table has a bit more than 1 billion rows (1,100,832,546 exactly).
So I have two questions:
Do you guys agree that the performance issue seems to be coming from the current state of the Postgres DB?
The fact that you have 1 billion rows isn't, in itself, necessarily problematic. It definitely depends on the specs of the machine.
With that said, you can freely TRUNCATE that entire table without losing much functionality. If you upgrade to a newer version of Sentry (I forget in which version I added this), we write significantly less into that table, since it's not always needed.
This table purely facilitates looking up an event ID to its group, which is only used when you do a search with an explicit event ID.
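If you do go the truncate route, it's a one-liner; a sketch below (note it drops every existing event-ID-to-group mapping and takes a brief exclusive lock on the table, so run it when that's acceptable):

-- removes all rows from the mapping table; only searches by an explicit event ID are affected
TRUNCATE TABLE sentry_eventmapping;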
Also, I've been trying to run cleanup, but the command hangs forever. The process on the box is waiting (state S+ in ps) and the DB takes forever to execute queries like:
delete from sentry_eventmapping where id = any(array(select id from sentry_eventmapping where "date_added" < now() - interval '7 days' limit 10000));
I assume the cleanup is hanging because of this: the table is huge, so the queries take forever, and so does the cleanup?
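Is there a way to tell whether that delete is actually making progress or just blocked on a lock? I was thinking of checking pg_stat_activity with something like this (just a sketch, assuming our PostgreSQL is 9.6+ so the wait_event columns exist):

-- show state/wait info for anything touching the mapping table, excluding this query itself
select pid, state, wait_event_type, wait_event, now() - query_start as runtime, query
from pg_stat_activity
where query ilike '%sentry_eventmapping%'
  and pid <> pg_backend_pid();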
Oh yeah, I don't think we ship with the right index on that table. I'm not sure why, but looking at the code now, there's no index. I think we applied the index manually on sentry.io years ago and never brought it into the code.
You can either add an index on the date_added column manually, or just truncate the table for now because, like I said, it's of very limited value.
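If you go the index route, something along these lines should do it (a sketch; the index name is arbitrary, and CONCURRENTLY avoids blocking writes but has to run outside a transaction and will take a while on a table this size):

-- lets the cleanup's date_added < now() - interval '7 days' filter use an index scan
create index concurrently sentry_eventmapping_date_added
    on sentry_eventmapping (date_added);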
I'm still getting some performance issues and I don't get where they're coming from.
No error in the sentry-web / redis / sentry-cron logs.
From time to time I get the following line in the sentry-worker logs:
[WARNING] sentry.tasks.process_buffer: Failed to process pending buffers due to error: Unable to acquire <Lock: 'buffer:process_pending'> due to error: Could not set key: u'l:buffer:process_pending'
but that’s all.
Here are the symptoms:
super-fast-growing events queue
high CPU usage on the DB side
sentry worker processes seem to be sitting there doing nothing, just waiting.