What would be the recommended way to ingest sentry's data into hive (s3)

martinhu · May 31, 2021, 2:28am

Hey, I’m new to Sentry and this forum. I’m using AWS EC2 to host an on-premise Sentry server.
Also, what is the default data store for Sentry? (I see we have Postgres and ClickHouse, but I’m not sure how their uses differ.)

I’m trying to use this for an analytics use case where we need to get Sentry data (raw and aggregated) into Hive on S3, so that I can run offline analytics and later build ML models. I wonder what the best way is to get Sentry data into S3 and set up a Hive metastore.

Sorry for the naive question. Thanks for any help from the community.

Thanks,
Martin

BYK · June 3, 2021, 8:50am

Sentry mainly uses Postgres for everything. Clickhouse is used more as a quick search index.

I think this may help you achieve what you want: kanadaj/sentry-s3-nodestore on GitHub, a Sentry extension to add S3 as a NodeStore backend.

martinhu · June 9, 2021, 4:37pm

For our use case, we only need the data in Sentry for one month at most, or even just one week. But we have large offline analytics use cases.
In this case, do you suggest that we double-write the raw logs and aggregated issues into Postgres and S3 at the same time, and clean up the data in Postgres/ClickHouse weekly so that we keep only the minimum needed? Or, alternatively, we could ingest from Postgres into S3 once the data lands in the Postgres DB.

It looks like ClickHouse is not a data store, right?
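
To make the second option (copying data out to S3 after it lands) a bit more concrete, here is a minimal sketch of laying exported events out in Hive-style date partitions. Everything in it is an assumption for illustration: the function name `partition_events`, the `sentry/events` prefix, and the record shape (a dict with a Unix `timestamp` in seconds) are hypothetical and not taken from Sentry's schema. It only builds the object keys and newline-delimited JSON bodies; reading the rows from Postgres and uploading the objects to S3 are left out.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_events(events, prefix="sentry/events"):
    """Group event dicts by UTC day and return {object_key: ndjson_body}.

    Assumption: each event is a dict with a Unix "timestamp" in seconds.
    The key layout (prefix/dt=YYYY-MM-DD/...) follows the Hive partition
    naming convention, so a table partitioned by `dt` can point at it.
    """
    by_day = defaultdict(list)
    for event in events:
        day = datetime.fromtimestamp(
            event["timestamp"], tz=timezone.utc
        ).strftime("%Y-%m-%d")
        by_day[day].append(event)

    objects = {}
    for day, rows in sorted(by_day.items()):
        key = f"{prefix}/dt={day}/part-00000.json"
        # One JSON document per line, which Hive JSON SerDes can read.
        body = "\n".join(json.dumps(row, sort_keys=True) for row in rows)
        objects[key] = body + "\n"
    return objects


if __name__ == "__main__":
    # Hypothetical sample records, not real Sentry output.
    sample = [
        {"event_id": "a", "timestamp": 1622505600},
        {"event_id": "b", "timestamp": 1622509200},
        {"event_id": "c", "timestamp": 1622592000},
    ]
    for key, body in partition_events(sample).items():
        print(key, len(body.splitlines()))
```

With a layout like this, a weekly cleanup in Sentry would not affect the offline copy, since each day's data already sits under its own `dt=` prefix in S3.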

BYK · June 9, 2021, 8:25pm

As I mentioned earlier, I think you can try that S3-nodestore extension and that should take a big load off of Postgres. After that you can do more tuning regarding retention.

martinhu · June 9, 2021, 8:54pm

Awesome, thanks for the note!

