Sentry onpremise monitoring/status.sentry.io

Hello

I noticed the checks on https://status.sentry.io/ but I couldn’t find how they work or what they check. Is there a wiki or repo that contains the checks that run for the status page? Is there documentation somewhere on how to ensure Sentry is running correctly? In Docker we can add healthchecks, but I don’t think they cover whether Sentry is actually working.
Which URLs need to be monitored? I found /_health/ but it’s not clear what it checks. Is there a script that can be run to check the status of all the Kafka consumers and Sentry workers? How can I make sure that Snuba is working correctly? The architecture is very complex and I can’t find documentation that covers monitoring all the flows (besides the Datadog integration, which is not open source).

Thank you

We added some health checks to our self-hosted repo recently:

Maybe @jasonious can provide more information about tracking some key endpoints.

Re metrics, there are many metrics backends you can use here: sentry/src/sentry/metrics at master · getsentry/sentry · GitHub

You can also write your own if you really need something custom. You can then set the backend you want to use via these settings:

The Docker healthchecks are good, but I’m also interested in the queue workers, which don’t have Docker healthchecks.
For example, on our dev environment no data was entering Sentry although all the Docker containers were up and running. I noticed that the Kafka consumer group ingest-consumer was Empty, then I checked the logs and saw that the ingest-consumer container had hit an error and was doing nothing (it did not crash, so it was stuck).
This showed me that we need to monitor Kafka consumer groups and maybe their lag (I’m not sure how much is too much). It would be nice to have this information in a monitoring recommendations wiki that Sentry administrators can use.
It’s also not clear to me what the /_health/ Sentry healthcheck checks: is it just that the HTTP interface is up, or does it test the connection to all the dependencies, like the database, Redis, etc.?
Also, how can I monitor the Sentry worker? There is a message in Sentry when no workers are running, but what check is used to get this information?

Thank you

Sounds like an instance of snuba consumer left kafka cluster peacefully without exiting · Issue #1993 · getsentry/snuba · GitHub. Would be great to add your voice there (and also follow the issue for further resolution).
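
Until that issue is resolved, one way to catch a stuck consumer is to watch the group's lag and membership yourself. Below is a minimal Python sketch, not an official Sentry tool, that parses the text printed by Kafka's kafka-consumer-groups CLI with --describe --group. It assumes the usual header row with TOPIC, PARTITION, LAG and CONSUMER-ID columns; the exact layout varies between Kafka versions, so the columns are located by name. The function name is made up for illustration.

```python
from typing import Dict, List, Tuple


def summarize_consumer_group(describe_output: str) -> Tuple[Dict[str, int], int]:
    """Return (total lag per topic, number of partitions with an active consumer).

    `describe_output` is the text printed by something like:
        kafka-consumer-groups --bootstrap-server <host:port> --describe --group <group>
    The column layout is an assumption, so columns are looked up by header name.
    """
    header: List[str] = []
    lag_by_topic: Dict[str, int] = {}
    active_partitions = 0
    for line in describe_output.splitlines():
        fields = line.split()
        if "TOPIC" in fields and "LAG" in fields:
            header = fields  # remember the column names for the rows that follow
            continue
        if not header or len(fields) != len(header):
            continue  # blank lines and informational messages
        row = dict(zip(header, fields))
        if not row.get("PARTITION", "").isdigit():
            continue  # not a partition row
        if row["LAG"].isdigit():  # LAG is "-" when no offset has been committed
            topic = row["TOPIC"]
            lag_by_topic[topic] = lag_by_topic.get(topic, 0) + int(row["LAG"])
        if row.get("CONSUMER-ID", "-") != "-":  # "-" means nobody owns the partition
            active_partitions += 1
    return lag_by_topic, active_partitions
```

A group whose lag keeps growing while the active count stays at 0 matches the situation described above, where the container was up but the consumer had left the group. As for how much lag is too much, that depends on your event volume, so alerting on a sustained upward trend is probably safer than a fixed number.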

The /_health/ endpoint just ensures that Django can accept HTTP requests. That said, if a required connection drops, the app should crash.
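
If you want to poll that endpoint from an external monitor, a rough Python sketch could look like the following. It only tells you that the web process answers on /_health/, not that the database, Redis, or Kafka are reachable. The function name and the base URL you pass in are placeholders, and treating only a 200 response as healthy is my assumption.

```python
import urllib.error
import urllib.request


def is_web_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/_health/ answers with HTTP 200."""
    url = base_url.rstrip("/") + "/_health/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, DNS failures and timeouts.
        return False
```

Because this check is shallow, it is worth pairing with checks on the workers and consumers rather than relying on it alone.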

This is the code that checks the workers: sentry/celery_alive.py at 9b9ccc2da8ecd59265d6e22e868838edf7b49a5d · getsentry/sentry · GitHub – we can probably make this something we can listen on. Otherwise you can just rely on the process being up and running.

There’s also a sentry queues list command for checking queue status: sentry/queues.py at dd723b46afdfdc8a082b2b303e8db8b754ece162 · getsentry/sentry · GitHub
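
To turn that command into an alert, you could capture its output and flag queues that are backing up. The sketch below is in Python and assumes each output line is a queue name followed by an integer size separated by whitespace; I have not verified that format against every Sentry version, so check it against your own output first. The function names and the threshold are hypothetical.

```python
from typing import Dict, List


def parse_queue_sizes(output: str) -> Dict[str, int]:
    """Parse lines of the assumed form '<queue name> <size>' into a dict."""
    sizes: Dict[str, int] = {}
    for line in output.splitlines():
        parts = line.rsplit(None, 1)  # split on the last run of whitespace
        if len(parts) == 2 and parts[1].isdigit():
            sizes[parts[0].strip()] = int(parts[1])
    return sizes


def backlogged_queues(sizes: Dict[str, int], threshold: int) -> List[str]:
    """Return the names of queues holding more than `threshold` messages, sorted."""
    return sorted(name for name, size in sizes.items() if size > threshold)
```

You would feed it the text captured from running sentry queues list inside the web or worker container, for example via subprocess. A queue that stays above your threshold across several polls suggests the workers are not keeping up or are not running.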

As mentioned earlier, @jasonious would probably know better but he’s on vacation right now.