What would be the recommended way to ingest Sentry's data into Hive (S3)

Hey, I’m new to Sentry and this forum. I’m using AWS EC2 to host an on-premise Sentry server.
Also, what is the default data store for Sentry? I see we have Postgres and ClickHouse, but I’m not sure how their uses differ.

I’m trying to use this in an analytics use case where we need to get Sentry data (raw and aggregated) into Hive on S3, so that I can run offline analytics and build ML models in the future. I wonder what’s the best way to get Sentry data into S3 and set up a Hive metastore.

Sorry for the naive question. Thanks for any help from the community.

Thanks,
Martin

Sentry mainly uses Postgres for everything. ClickHouse is used more as a quick search index.

I think this may help you achieve what you want: GitHub - kanadaj/sentry-s3-nodestore: A Sentry extension to add S3 as a NodeStore backend.

For our use case, we only need the data in Sentry for one month at most, or perhaps just one week. But we have large analytics use cases for offline data.
In this case, do you suggest that we double-write the raw logs and aggregated issues to Postgres and S3 at the same time, and clean up the data in Postgres/ClickHouse weekly so that we keep only the minimum needed? Or, alternatively, should we ingest from Postgres into S3 once the data lands in the Postgres DB?

It looks like ClickHouse is not a data store, right?

As I mentioned earlier, I think you can try that S3-nodestore extension and that should take a big load off of Postgres. After that you can do more tuning regarding retention.
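
For the offline-analytics side, here is a minimal sketch of how exported events could be laid out under Hive-style date partitions on S3. Everything in it is an assumption for illustration: the `sentry-export` prefix, the `dt=` partition column, and the shape of the event dicts are hypothetical and do not come from Sentry or from the extension above. The upload step (for example with an S3 client) is deliberately left out.

```python
import gzip
import json
from datetime import datetime, timezone


def partition_key(prefix: str, ts: datetime, part: int) -> str:
    """Build a Hive-style object key, e.g. <prefix>/dt=YYYY-MM-DD/part-00000.json.gz.

    `ts` is expected to be timezone-aware; it is converted to UTC so that
    one event always lands in exactly one daily partition.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{prefix}/dt={day}/part-{part:05d}.json.gz"


def encode_events(events: list[dict]) -> bytes:
    """Serialize events as gzip-compressed JSON Lines (one object per line)."""
    lines = "\n".join(json.dumps(e, sort_keys=True) for e in events)
    return gzip.compress((lines + "\n").encode("utf-8"))


# Hypothetical usage: the event dicts below are made up for the example.
ts = datetime(2024, 1, 31, 23, 30, tzinfo=timezone.utc)
key = partition_key("sentry-export", ts, 0)
body = encode_events([{"id": "a"}, {"id": "b"}])
```

With keys shaped like this, a Hive external table partitioned by `dt` could point at the prefix, and new partitions could be registered with `MSCK REPAIR TABLE`.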

Awesome, thanks for the note!
