We use Sentry to collect events related to various problems in our CRM. One of the tags we record is user-id, a number identifying the user. The events also contain additional user data, such as the name and various unstructured data in the message.
Now, for GDPR compliance, we are looking into being able to delete all the events related to a certain user. This would have to work across issues, as we want to remove every event related to this user regardless of which issue it belongs to.
The events to delete are identified by having the user-id tag with a certain value. What options do we have?
I found that this SQL query, run directly against the Sentry database, fetches all the events I’d like removed, but some caches might need to be updated afterwards. The lack of foreign keys in the database does not help much:
SELECT * FROM sentry_eventtag et JOIN sentry_filtervalue fv ON et.project_id = fv.project_id AND et.value_id = fv.id WHERE fv.key = 'user-id' AND fv.value = '1234'
Your best bet would be either a batch query (which works for now, though the tag schema might change in the future) or a script that iterates over the matching events and deletes them. We’ll definitely be building things to make scrubbing easier as time goes on.
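To make the "script to iterate" option more concrete, here is a minimal sketch in Python. It runs against an in-memory SQLite toy schema, not a real Sentry database. The tables sentry_eventtag and sentry_filtervalue and their project_id, value_id, id, key and value columns come from the query above; the event_id column, the events table, and the helper names find_event_ids, delete_events and build_demo_db are assumptions made for illustration, so check your actual schema before adapting any of it.

```python
import sqlite3


def find_event_ids(conn, tag_key, tag_value):
    """Return the distinct ids of events tagged with tag_key=tag_value."""
    rows = conn.execute(
        "SELECT DISTINCT et.event_id "
        "FROM sentry_eventtag et "
        "JOIN sentry_filtervalue fv "
        "  ON et.project_id = fv.project_id AND et.value_id = fv.id "
        "WHERE fv.key = ? AND fv.value = ?",
        (tag_key, tag_value),
    ).fetchall()
    return [row[0] for row in rows]


def delete_events(conn, event_ids, batch_size=500):
    """Delete events and their tag rows in small batches.

    Each batch runs in its own transaction, so a failure leaves
    earlier batches committed and the current one rolled back.
    Returns the number of rows removed from the events table.
    """
    deleted = 0
    for start in range(0, len(event_ids), batch_size):
        batch = event_ids[start:start + batch_size]
        marks = ",".join("?" * len(batch))
        with conn:  # commit on success, roll back on exception
            conn.execute(
                f"DELETE FROM sentry_eventtag WHERE event_id IN ({marks})",
                batch,
            )
            # 'events' is a hypothetical stand-in for the real event table.
            cur = conn.execute(
                f"DELETE FROM events WHERE id IN ({marks})", batch
            )
            deleted += cur.rowcount
    return deleted


def build_demo_db():
    """Create a toy in-memory database with three events, two users."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT);
        CREATE TABLE sentry_filtervalue (
            id INTEGER PRIMARY KEY, project_id INTEGER, key TEXT, value TEXT);
        CREATE TABLE sentry_eventtag (
            id INTEGER PRIMARY KEY, project_id INTEGER,
            event_id INTEGER, value_id INTEGER);
        INSERT INTO events VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO sentry_filtervalue VALUES
            (10, 1, 'user-id', '1234'), (11, 1, 'user-id', '5678');
        INSERT INTO sentry_eventtag VALUES
            (100, 1, 1, 10), (101, 1, 2, 11), (102, 1, 3, 10);
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    ids = find_event_ids(demo, "user-id", "1234")
    print("removed", delete_events(demo, ids), "events")
```

The sketch collects the ids first and deletes afterwards, so the selection is not affected by rows disappearing mid-run. It deliberately leaves the sentry_filtervalue rows alone: those hold tag values rather than events, and whether they and any related counters need cleaning up depends on the real schema.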
Thanks so much. Can you please elaborate on the two approaches? In particular, let’s say I’m comfortable with writing a tool that will access the database directly. Are there any caches I should invalidate after removing the events? Any DB references I could break?
I would model what you do off of our API endpoints. There’s a way to delete a tag key, and everything should probably go through the tagstore abstraction. I’m not actually familiar with the new version of it (though I don’t think it’s enabled by default), but the schema should be the only cache you really have to worry about. There are at least two tables in the first iteration: TagValue and GroupTagValue (in addition to the Key tables).