Cleanup stuck for several days on "Removing old NodeStore values"

I have a mid-sized Sentry system that has been receiving events with a 30-day retention.

I have a cleanup job that runs daily, but it wasn’t introduced until the Sentry system had been running for a month or so. Over this period, the DB grew to about 350GB, which may have something to do with this.

It keeps getting stuck on:

Removing expired values for LostPasswordHash
Removing expired values for OrganizationMember
Removing expired values for ApiGrant
Removing expired values for ApiToken
Removing expired files associated with ExportedData
Removing old NodeStore values <----

I’ve let one run for 4 full days, and it is still stuck here.

I am in the process of running a VACUUM FULL on PG to try to clean things up in the meantime (I’ve stopped the cleanup job while it runs). Edit: VACUUM didn’t clear anything :(

Is this expected behaviour? Is it possible that something is getting stuck?

Bumping for visibility.

Can you inspect that process with, say, Wireshark to see whether it is hanging on network activity or whether there’s really just a lot to clean up? Does it use any CPU at all? Does Postgres receive any queries? (You’d probably have to bump log verbosity in the Postgres container.)
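
If bumping log verbosity is inconvenient, one alternative way to see whether Postgres is receiving queries is to look at the built-in pg_stat_activity view. Here is a rough sketch, not something taken from Sentry itself: the helper names (fetch_activity, summarize) and the connection details in the comments are made up for illustration, and it assumes a DB-API driver that uses %s placeholders, such as psycopg2.

```python
from collections import Counter

# Lists the other sessions on one database, oldest query first.
# pg_stat_activity is a standard PostgreSQL system view.
ACTIVITY_SQL = """
SELECT pid, state, wait_event_type,
       now() - query_start AS running_for,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE datname = %s AND pid <> pg_backend_pid()
ORDER BY query_start
"""


def fetch_activity(conn, dbname):
    """Run ACTIVITY_SQL on a DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(ACTIVITY_SQL, (dbname,))
        return cur.fetchall()
    finally:
        cur.close()


def summarize(rows):
    """Count sessions per state; rows are shaped like ACTIVITY_SQL's output."""
    return Counter(row[1] or "unknown" for row in rows)


# Hypothetical usage (database name and credentials are assumptions):
#   import psycopg2
#   conn = psycopg2.connect("dbname=sentry user=postgres host=localhost")
#   rows = fetch_activity(conn, "sentry")
#   print(summarize(rows))
#   for row in rows:
#       print(row)
```

If the cleanup's session shows up as active with a long-running statement, it is probably just working through a large backlog. If nothing from the cleanup process shows up at all, that points more toward a hang somewhere else.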

Looks like eventually, after some vacuuming (vacuumdb), it all cleaned up, now that I have the daily cron running for cleanup. The problem is hopefully solved. Thank you for your input, @untitaker.