Cleanup stuck for several days on "Removing old NodeStore values"

I have a mid-sized Sentry system that has been receiving events, with a 30-day retention.

I have a cleanup job that runs daily, but it wasn't introduced until the Sentry system had been running for a month or so. Over that period the DB grew to about 350GB, which may have something to do with this.
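For reference, the job is just the standard cleanup command on a cron, something along these lines (the compose path, schedule, and log path here are illustrative, adjust to your setup):

# illustrative crontab entry: run cleanup nightly at 03:00 with 30-day retention
0 3 * * * cd /opt/sentry/onpremise && docker-compose run --rm -T web cleanup --days 30 >> /var/log/sentry-cleanup.log 2>&1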

It keeps getting stuck on:

Removing expired values for LostPasswordHash
Removing expired values for OrganizationMember
Removing expired values for ApiGrant
Removing expired values for ApiToken
Removing expired files associated with ExportedData
Removing old NodeStore values <----

I've had one run going for 4 full days now, and it is still stuck at this point.

I am running a VACUUM FULL; on Postgres to try to clean things up (I've stopped the cleanup job in the meantime). Edit: the VACUUM didn't clear anything :frowning:
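For anyone wanting to reproduce this, I ran it roughly like so (assuming the stock onpremise docker-compose, where the service, superuser, and database are all named postgres; note that VACUUM FULL takes an exclusive lock on the table while it rewrites it):

# rewrite the nodestore table to reclaim space; blocks writes to it until finished
docker-compose exec postgres psql -U postgres -d postgres -c 'VACUUM FULL nodestore_node;'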

Is this expected behaviour? Is it possible that something is getting stuck?

Bumping for visibility.

Can you inspect that process with, say, Wireshark to see whether it is hanging on network activity, or whether there's really just a lot to clean up? Does it use any CPU at all? Does Postgres receive any queries? (You'd probably have to bump the log verbosity in the Postgres container.)
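Something along these lines should turn on statement logging (assuming the stock compose file, where both the Postgres service and superuser are named postgres):

# log every statement the server receives, then reload the config
docker-compose exec postgres psql -U postgres -c "ALTER SYSTEM SET log_statement = 'all';"
docker-compose exec postgres psql -U postgres -c "SELECT pg_reload_conf();"
# then watch whether the cleanup job is actually issuing queries
docker-compose logs -f postgres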

Looks like, after some vacuuming (vacuumdb), it eventually all cleaned up, now that I have the daily cleanup cron running. Problem is hopefully solved. Thank you for your input @untitaker
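For the record, the vacuum step was roughly this (service, user, and database names assume the stock docker-compose defaults):

# vacuum and re-analyze the whole database from inside the postgres container
docker-compose exec postgres vacuumdb -U postgres -d postgres --analyze --verbose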

@DandyDeveloper I have a similar problem: our nodestore_node table has grown to 600GB, and we realised the only way to reclaim the space is to run a separate cleanup for that table and then use pg_repack.
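Roughly what the pg_repack step looks like, assuming the extension and client binary are installed (the stock postgres image doesn't ship pg_repack, so this is only a sketch):

# one-time: enable the extension in the target database
docker-compose exec postgres psql -U postgres -d postgres -c 'CREATE EXTENSION IF NOT EXISTS pg_repack;'
# rebuild nodestore_node with minimal locking to return the dead space to the OS
docker-compose exec postgres pg_repack -U postgres -d postgres --table nodestore_node
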
I ran the cleanup script to clear everything in the node store down to 0 days, and it has been running for hours now. @untitaker we had 90 days' worth of data in that table, and in the past I had run cleanup with a 60-day retention, so it only removed 30 days of data; that completed in an hour or less. However, when I run the script to clear everything, like below:

docker-compose run -T web cleanup --days 0 -m nodestore -l debug

It seems to run for hours, possibly days. I know --days 0 works as expected, since I was able to clear 4GB in our dev environment in around 15-20 minutes, but this is a very long time for the script to run on a small data set. Once I am running this regularly as a nightly cron job, it may well complete faster. Since this is not documented as a requirement in the self-hosted repo, it is easy to miss.