We installed Sentry in our datacenter. To comply with data protection requirements, we want to remove user data older than 90 days, so we added a daily job that runs sentry cleanup --days 90 on the server.
Unfortunately, sentry cleanup has to be run on the server and is not a sentry-cli command. This is cumbersome for us, and probably for many other users as well, because we need to SSH into the Sentry host and run the script there.
I think it would be great if the settings had an input field that defines how many days to keep user data. A cron job could then run the cleanup daily.
If this is not feasible, I could imagine having a sentry-cli command for that.
I’d like to help and implement either of those options, preferably the user interface approach. Before I look at the source code, I want to know whether this has been discussed in the past and what the general opinion on such a feature is. Maybe one of the maintainers can give me some information on this topic.
While we’re on the topic of improving sentry cleanup, I’d like to suggest adding cleanup of obsolete releases. A release is likely obsolete a few days after it has become inactive in all of the known environments.
We currently do this with a custom script using the API. This is necessary for us because we create up to 25 releases per day, each uploading source maps to Sentry. Disk space can quickly become a problem without cleaning up old cruft.
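For anyone wanting to do something similar, here is a minimal sketch of the selection step only: deciding which releases count as obsolete. It is not our actual script and it makes no API calls. The dict keys "version" and "last_active" and the grace_days parameter are hypothetical names chosen for illustration, not the Sentry API schema, so you would need to map whatever your API responses contain onto this shape.

```python
from datetime import datetime, timedelta, timezone


def obsolete_releases(releases, now, grace_days=7):
    """Return the versions of releases that look safe to delete.

    Each release is a dict with two hypothetical keys:
      "version":     the release identifier (str)
      "last_active": dict mapping environment name -> datetime of last activity
    A release counts as obsolete when its last activity in every known
    environment is older than grace_days.
    """
    cutoff = now - timedelta(days=grace_days)
    return [
        release["version"]
        for release in releases
        # Releases without any environment data are kept, to stay on the safe side.
        if release["last_active"]
        and all(ts < cutoff for ts in release["last_active"].values())
    ]


utc = timezone.utc
now = datetime(2020, 1, 31, tzinfo=utc)
releases = [
    {"version": "1.0.0", "last_active": {"prod": datetime(2020, 1, 1, tzinfo=utc),
                                         "staging": datetime(2020, 1, 2, tzinfo=utc)}},
    {"version": "1.1.0", "last_active": {"prod": datetime(2020, 1, 30, tzinfo=utc),
                                         "staging": datetime(2020, 1, 2, tzinfo=utc)}},
    {"version": "1.2.0", "last_active": {}},
]
to_delete = obsolete_releases(releases, now)
```

With this sample data only 1.0.0 is selected: 1.1.0 is still recent in prod, and 1.2.0 has no environment data at all. Deleting the selected releases and their source maps would be a separate step against the API.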
Cleanup is an extremely complicated system and not something we could easily surface in the UI. In many cases, cleanup takes many hours, if not days, to fully execute.
Yeah, we are also cleaning up old releases, and we use the same number of days as the retention period for them. See this repo for our take on it (basically using the HTTP API with Node in Docker).
Hey @matt! So if I understood you correctly, triggering that as a cron job is not feasible? The UI would only configure the cron job; it would not block while the cleanup is executing.
What do you think of the idea of providing a CLI command in sentry-cli for it? If this is wanted, I’d like to have a look.
sentry-cli is not feasible for this since it only interacts with our API, and we can’t do proper cleanup through the API to the same extent that sentry cleanup does. No API would surface the right data here, or be even close to capable of doing it in an efficient manner.
The only real option I’d see here is a configuration option for “how many days to clean up”, so that you don’t have to pass sentry cleanup --days=xxx; sentry cleanup would then pick up that configuration setting whenever it is called.
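Until something like that exists, a similar effect can be approximated from the outside with a small wrapper on the Sentry host. This is only a sketch under stated assumptions: the environment variable SENTRY_CLEANUP_DAYS and the function names are made up for illustration and are not an existing Sentry setting. The only thing taken from Sentry itself is the sentry cleanup --days invocation mentioned earlier in this thread.

```python
import os
import subprocess

DEFAULT_DAYS = 90  # retention period from the original post


def cleanup_command(environ=None):
    """Build the cleanup command from a hypothetical SENTRY_CLEANUP_DAYS variable."""
    env = os.environ if environ is None else environ
    days = int(env.get("SENTRY_CLEANUP_DAYS", DEFAULT_DAYS))
    if days < 1:
        raise ValueError("SENTRY_CLEANUP_DAYS must be at least 1")
    return ["sentry", "cleanup", "--days", str(days)]


def run_cleanup():
    """Run the cleanup on the Sentry host; intended to be called from a scheduled job."""
    subprocess.run(cleanup_command(), check=True)
```

The scheduled job would then call run_cleanup(), and changing the retention period only means changing the variable rather than editing the job definition.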
Aside from that, there are just too many options for cleanup, especially at Sentry.io scale. We have to run many in parallel, each operating over different bits of data, basically 24/7. We don’t use cron; we literally run cleanup in a while loop constantly for different areas of the system.
So I think there might be some simplifications here, but I don’t think any of them would avoid SSH’ing into a server to run it. I feel like if you’re capable of running sentry run web somewhere, you can just run sentry cleanup alongside it on cron yourself pretty easily. You don’t need to do this manually each time. Most modern schedulers also have some concept of distributed crons, so you can deploy the cron into your cluster if you don’t have access to the server or don’t want to actually SSH in and do it.