Sentry is totally dead

So we just deployed Sentry on-premise. It’s been working fine since Monday. Wednesday afternoon, it stopped receiving events completely. No errors, events were still being sent, and servers could still connect to the Sentry server, just nothing was being recorded.

Today I started to mess with it. I ran docker-compose down to try and restart it, but I got this error: ERROR: error while removing network: network sentry_onpremise_default id XXX has active endpoints.

So I tried this suggestion from GitHub and ran docker network disconnect -f {network} {endpoint-name} against the two endpoints, sentry_onpremise_snuba-replacer_1 and sentry_onpremise_snuba-consumer_1.

I tried docker-compose up -d, but it didn’t start the snuba containers; it said the images were already there. I tried hitting the Sentry dashboard, but I got a connection reset error. Nothing. Dead.

I then tried docker-compose up -d --force-recreate. It recreated those two images, but Sentry was still dead.

I then noticed that my Sentry test server, which is still working fine, had a different list of containers: it had snuba-outcomes-consumer. I realized that this had only recently been added, so on my main Sentry server I ran install.sh to upgrade. Everything seemed to go fine, so I ran docker-compose up -d. Everything was green (no snuba-outcomes-consumer was listed). I hit Sentry again, and it was still dead. Connection reset.

At this point, I don’t know wtf to do with it. I’m a newbie with Docker, so I’m guessing I screwed something up. My newly deployed Sentry server is dead in the water, I can’t access the dashboard, and I’m not seeing any errors from anything.

Any help here would be truly appreciated.

It is a bit hard to decipher your setup from your description, so your post may benefit from some clarification, such as “I have 2 Sentry deployments, one for prod and one for testing. They are at versions and . I did the following steps and here are the install and error logs.” etc.

From what I can tell, when you tried to run docker-compose down, snuba-replacer and snuba-consumer were still busy processing events. The post you refer to specifically says that command should be run ONLY if the containers are missing, which was not the case for you. Moreover, if you just want to “turn off” a deployment, the command is docker-compose stop, which retains the containers and the network. docker-compose down is akin to uninstalling something: it tries to remove the network and containers, and then even images and volumes if you pass certain flags.
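To make the stop/down difference concrete, here is a rough sketch. It assumes the stock docker-compose CLI run from your on-premise checkout. The function names and the COMPOSE override are made up for illustration and are not part of the on-premise repo.

```shell
# Hypothetical helpers, not part of the on-premise repo.
# COMPOSE defaults to docker-compose; set COMPOSE=echo to print the
# arguments that would be passed instead of running anything.

# "Turn off" the deployment: containers, network and volumes are kept.
sentry_pause() {
    ${COMPOSE:-docker-compose} stop
}

# Start the same, previously stopped containers again.
sentry_resume() {
    ${COMPOSE:-docker-compose} start
}

# Tear down: removes the containers and the default network.
# Volumes and images are removed too only if you add flags such as -v or --rmi.
sentry_teardown() {
    ${COMPOSE:-docker-compose} down
}
```

For a routine restart, sentry_pause followed by sentry_resume should be enough; sentry_teardown is only for when you really want the containers and network gone.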

No errors, events were still being sent, and servers could still connect to the Sentry server, just nothing was being recorded.

This implies a stuck Snuba instance, so restarting the Snuba services probably would have solved the issue: docker-compose restart snuba-consumer etc. You also 100% need the post-process-forwarder, as that’s what makes the processed issues appear on the Sentry side.
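Something along these lines would restart the event pipeline in one go. This is only a sketch: the service names are inferred from the container names mentioned in this thread, so check them against your own docker-compose.yml. The function name and the COMPOSE override are hypothetical.

```shell
# Hypothetical helper. Service names are inferred from the container names
# in this thread (sentry_onpremise_snuba-consumer_1 -> snuba-consumer);
# verify them in your docker-compose.yml before relying on this.
# Set COMPOSE=echo to print the arguments instead of running anything.
restart_event_pipeline() {
    ${COMPOSE:-docker-compose} restart snuba-consumer snuba-replacer post-process-forwarder
}
```

Unlike down, restart leaves the containers and the network in place, so there is nothing to clean up afterwards.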

Everything seemed to go fine, so I ran docker-compose up -d. Everything was green (no snuba-outcomes-consumer was listed), I hit Sentry again, and still dead. Connection reset.

This just means docker-compose was able to start the containers. If you cannot reach Sentry via a browser, most probably it fails to start properly for some reason. docker-compose ps would show you the status and docker-compose logs -f web should give you a lot more information about why it is failing to stay up.
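As a starting point for that kind of digging, a minimal sketch. It assumes the web service is named web, as in the command above. The function names, the default of 100 lines, and the COMPOSE override are made up for illustration.

```shell
# Hypothetical helpers. Set COMPOSE=echo to print the arguments
# instead of running anything.

# List every container of the deployment with its current state.
sentry_status() {
    ${COMPOSE:-docker-compose} ps
}

# Show the last N lines (default 100) of the web service log.
sentry_web_logs() {
    ${COMPOSE:-docker-compose} logs --tail="${1:-100}" web
}
```

Run sentry_status first: a container that keeps exiting or restarting is the one whose logs you want to read, and you can swap web for that service name.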

I’m a newbie with Docker, so I’m guessing I screwed something up.

Being a newbie is fine, we were all there 🙂 What is not great is asking here first, instead of fiddling with the on-premise repo or looking up more details on the docker and docker-compose commands. I hope this post gives you some more ideas and pointers on where to look.

I’d still be happy to guide you if you need any further help.

Thanks,

Sorry I wasn’t clearer. I’m running Debian 9 with Sentry On-Premise, and Apache acting as a reverse proxy for TLS. All I meant is that I have two of these Sentry servers: one for testing and tinkering, and the other tracks bugs from production.

But your info did help me get it running again. I upgraded to the latest version as well. Is this Snuba hang-up a common occurrence I need to look out for?

Not really. That said, if you want uptime, you should be monitoring your services regardless 🙂

Glad the post was helpful.

Wait. You mean you aren’t constantly staring at the issue list? 😃