Unable to run "sentry cleanup" due to lack of disk space

Hi all,

I’m running an on-premise installation of Sentry. I have Postgres running on a separate server. The Postgres box has a 300GB disk for storing the data.

My Postgres box recently ran out of disk space. When it did, Sentry was unable to connect to the database. I cleared out a few GB of space by getting rid of some old logs and tried to execute “sentry cleanup”. The cleanup process ran for a few hours and then failed, saying that the device was out of space. The space I’d cleared on the DB server had filled up again.

I’ve tried running cleanup several times now, but each time the disk usage on the DB box increases rather than staying constant. I understand that Postgres doesn’t release freed space back to the OS on its own, but I wasn’t expecting usage to keep growing.

Here’s the output of the cleanup operation:

Removing expired values for LostPasswordHash
Removing old NodeStore values
Removing GroupRuleStatus for days=30 project=*
Removing GroupTagValue for days=30 project=*
Removing TagValue for days=30 project=*
Removing GroupEmailThread for days=30 project=*
Removing expired values for EventMapping
Cleaning up unused FileBlob references
File Blobs: 100% |###############| Time: 0:00:00
Removing Event for days=30 project=*

After a few hours, this output is typically followed by an error saying that the device has run out of disk space, and whatever space I’d previously freed up on the DB box is used up again.

I don’t think it’s the DB size that’s increasing. This is the error I usually get:

django.db.utils.OperationalError: could not access status of transaction 0
DETAIL:  Could not write to file "pg_subtrans/5A78" at offset 221184: No space left on device.

I now have only one more GB that I can free up on the DB server, and I don’t think I should let that get used up.

I’m unable to run VACUUM FULL either, since it needs enough free disk space to write out a compacted copy of each table it rewrites.

Now, I really should’ve set up the cleanup as a cron job, but I wasn’t expecting this much data. :disappointed_relieved:

Please advise on how I can get out of this mess I’ve got myself into.

Worst case, I’m thinking about dropping the entire database and creating a new one. I’ve exported the Sentry data using “sentry export”. Once I set up a new DB, I’ll use “sentry import” to get the necessary data back in place. I’d like to avoid this option if possible, since it would mean losing all of the event data ingested so far.

Thanks in advance.

Ouch. I’d recommend looking into pg_repack to reclaim some space. It’ll be a painful process, but you can operate on the smallest indexes and tables first to slowly relieve pressure.
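A rough sketch of what that looks like, assuming you can get the pg_repack binary and its matching extension onto the database server; the repack itself is then driven from the shell one table at a time (e.g. pg_repack -d <dbname> -t <table>), starting with the smallest ones:

-- One-time setup inside the Sentry database: pg_repack needs its extension created there.
CREATE EXTENSION IF NOT EXISTS pg_repack;
-- pg_repack rebuilds a table (and its indexes) into new files and then swaps them in,
-- so it needs roughly as much free space as the table being repacked, which is why
-- starting with the smallest tables helps.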

Alternatively, you can TRUNCATE the sentry_eventmapping table with minimal impact; that alone should free up a lot of space. The table only holds mappings from event IDs to groups, so this is usually an acceptable compromise to reclaim a bunch of space.
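If you go that route, it’s a single statement from psql, something like:

-- Irreversibly removes every row from sentry_eventmapping and immediately
-- returns the table’s disk space to the OS.
TRUNCATE TABLE sentry_eventmapping;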

Hey Matt,

Thank you for your suggestions. I’m running PostgreSQL on an air-gapped server, so installing pg_repack will be an extremely painful process.

I went ahead and truncated the sentry_eventmapping table. That gave me around 50GB of free space.

I ran sentry cleanup after that, and it’s been running for the last 24 hours. Any suggestions on how I can confirm that the process is still running as expected and not stuck somewhere? I still have 47GB of free space on the DB server, so space shouldn’t be an issue anymore.
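I assume I could watch pg_stat_activity on the Postgres box to check that the cleanup’s queries are still active, with something along these lines, but I’m not sure that actually tells me it’s making progress:

-- Show non-idle backends and how long their current statement has been running.
SELECT pid, state, now() - query_start AS duration, left(query, 80) AS query
  FROM pg_stat_activity
 WHERE state <> 'idle'
 ORDER BY query_start;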

Here’s the output of the cleanup process, so far:

Removing expired values for LostPasswordHash
Removing old NodeStore values
Removing GroupRuleStatus for days=30 project=*
Removing GroupTagValue for days=30 project=*
Removing TagValue for days=30 project=*
Removing GroupEmailThread for days=30 project=*
Removing expired values for EventMapping
Cleaning up unused FileBlob references
File Blobs: 100% |###############| Time: 0:00:00
Removing Event for days=30 project=*
Removing Group for days=30 project=*

Here are the sizes of the DB tables, to give a rough idea of the amount of data in them. I’ve only included tables that are more than an MB in size (the query I used is below the table).

               Table               |  Size   | External Size 
-----------------------------------+---------+---------------
 nodestore_node                    | 78 GB   | 71 GB
 sentry_messagefiltervalue         | 61 GB   | 53 GB
 sentry_eventtag                   | 45 GB   | 29 GB
 sentry_eventuser                  | 16 GB   | 14 GB
 sentry_filtervalue                | 15 GB   | 13 GB
 sentry_message                    | 11 GB   | 5647 MB
 sentry_groupedmessage             | 11 GB   | 8711 MB
 sentry_grouptagkey                | 9937 MB | 6899 MB
 sentry_grouprulestatus            | 2246 MB | 1513 MB
 sentry_grouphash                  | 1938 MB | 1501 MB
 sentry_eventmapping               | 379 MB  | 238 MB
 sentry_groupemailthread           | 277 MB  | 197 MB
 sentry_organizationonboardingtask | 48 MB   | 48 MB
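The sizes above come from a query roughly like the one below, where “External Size” is the total relation size minus the size of the main table data, i.e. indexes plus TOAST:

-- Tables over ~1 MB, largest first.
SELECT relname AS "Table",
       pg_size_pretty(pg_total_relation_size(relid)) AS "Size",
       pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size"
  FROM pg_catalog.pg_statio_user_tables
 WHERE pg_total_relation_size(relid) > 1024 * 1024
 ORDER BY pg_total_relation_size(relid) DESC;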

Please let me know your thoughts.

Thanks!

There’s honestly not much more you can do. The cleanup job will eventually finish, but as you know, that’s not going to reclaim space.

At this point, the only thing you can reasonably do is run pg_repack over the smaller tables now and work your way up as you reclaim more space.

Hey Matt,

The cleanup job finally finished after about 48 hours.

Installing pg_repack would have been extremely painful, so I executed VACUUM FULL on the DB after the cleanup job was done. The small amount of downtime was acceptable in this case, and it released around 200GB.
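For anyone who finds this later: it was essentially just a database-wide VACUUM FULL from psql, run while Sentry was stopped, since it takes an exclusive lock on each table while rewriting it and needs enough free disk for a compacted copy of whichever table it’s working on:

-- Rewrites every table into new files and returns the reclaimed space to the OS.
VACUUM FULL;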

Thanks for your help!

Cheers,
Shishir