504 gateway timeout from Sentry when polling sentry

Mike-Hingley · November 7, 2018, 2:35pm

We have a server that polls Sentry every minute or so. It queries the status of a project and activates a status lamp in the event of a Sentry issue. Some of these polls come back with a 504 gateway timeout.

The actual page that is returned in this 504 error is - well, you’ll have to use your imagination, as new users can only put one image - but it has fire, and large SVGs and everything.

The image seems to suggest that someone’s pager is blowing up right now, but the response doesn’t seem to imply that.

I emailed support@sentry.io and had a sort of lacklustre response …

Hey Mike, which endpoints are you testing against? The response you’re showing is just the one you’ve captured, but it’s hard to tell where an issue might be.

I did respond: we’re calling the “List a Project’s Issues” endpoint.

Is anyone else experiencing 504s from that endpoint?

Mike-Hingley · November 7, 2018, 2:36pm

Here’s the 504 error we’re getting from Sentry

RamuRChenchaiah · November 8, 2018, 5:49pm

We are facing a similar issue.
Please refer to: https://github.com/getsentry/sentry/issues/10481

Mike-Hingley · November 8, 2018, 6:03pm

I got this response from Sentry Support by email:

Within our infrastructure, out of total requests, all to event ingestion, all API calls, we see 0.0658% of requests returning a 504 over the past week.

This includes very expensive endpoints such as search queries and file uploads which may just naturally be slower if someone is uploading large files to us.

With that said, the endpoint you were hitting is one of the slower ones for us. It’s not always slow, but at this scale, you’re hitting 99P and max timings.

Over the past day, I can see a very small number of requests hitting a max duration of over 30 seconds for this exact endpoint, which is what would trigger the 504.

In the past day, we’ve had exactly 3 of them. One at 6:09am, another at 6:29am, and another at 7:46am. All times PST.

With volumes of 504s this low only affecting 99P and max timings, this is not enough to send off alarm bells and update the status page. Working on things at scale, it’s hard to move the needle on 99P and max. We have projects we’re working on to help, but due to their nature, almost anything can cause something to do this. Maybe it’s a network hiccup between 1 server and one of the many services it needs to talk to, etc.

At that rate of errors, there is a lot of room for extreme anomalies to happen.

So yes, the endpoint you’re hitting is a relatively expensive query for us. It might sound cheap, as you mention, you only have 1124 issues (I assume, I haven’t looked or confirmed your definition of “record”), this endpoint is effectively a search endpoint. It is generically performing an issue search. And we are not a single tenant application. We have thousands upon thousands of customers and we’re doing tens of thousands of requests per second. So to spit back 3 504 errors out of tens of thousands of requests is pretty good in my opinion.

So, based on their response, I’d say they probably can’t do anything.
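If the occasional 504 cannot be fixed on the server side, one option is to absorb it in the polling client. The following is only a minimal sketch of that idea, not anything Sentry recommends in this thread: it retries a poll a few times with exponential backoff when the response is a gateway error. The organization slug, project slug, token, the 35-second timeout (chosen to sit just above the 30-second limit mentioned in the email) and the helper names `fetch_issues` and `poll_with_retry` are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

# Placeholders (assumptions): substitute your own organization, project and token.
ISSUES_URL = "https://sentry.io/api/0/projects/my-org/my-project/issues/"
AUTH_TOKEN = "replace-me"

# Status codes treated as transient gateway problems worth retrying.
TRANSIENT = {502, 503, 504}


def fetch_issues(url=ISSUES_URL, token=AUTH_TOKEN, timeout=35):
    """Perform one GET and return (status_code, body_bytes).

    HTTP error statuses are returned rather than raised; network-level
    failures (DNS, connection refused, socket timeout) still propagate.
    """
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def poll_with_retry(fetch=fetch_issues, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, backing off after transient errors.

    Returns the last (status_code, body_bytes) seen.
    """
    status, body = None, b""
    for attempt in range(attempts):
        status, body = fetch()
        if status not in TRANSIENT:
            break
        if attempt < attempts - 1:
            # Wait 2s, 4s, 8s, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

With a poll interval of about a minute, three attempts with these delays still finish well before the next scheduled poll unless every attempt runs into the full timeout.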

Mike-Hingley · November 8, 2018, 6:04pm

@RamuRChenchaiah - have you been able to confirm where the 504 error is coming from - is it from Sentry?

angieo · March 22, 2019, 5:27am

So you can’t do anything even when the instance is on your own site? From what I understand so far, the main issue is the number of requests.

I’m getting the 504 from the configured web server, but there is no error in the Sentry logs, since it seems the request never reaches the Sentry server.

Mike-Hingley · March 25, 2019, 5:38pm

Hi @angieo - I can’t do anything because the server returning the 504 error is the Sentry server. We’re not running our own Sentry server; rather, we’re running a monitoring server that checks the Sentry status every minute or so. It polls regularly to see if there is an error condition that should be reported on, and if so, it turns my desk lamp red.

Our server logs capture the output of the request to Sentry; when we save that output to disk and open it, we see the image that I attached in the second update.
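For a lamp driven by polling like this, a single failed poll does not have to change what the lamp shows. As a rough sketch under assumptions (the function name `next_lamp_state`, the "amber" state and the failure threshold are hypothetical and not part of the setup described here), the decision could keep the previous colour through a few failed polls and only signal "monitoring is blind" after several in a row:

```python
def next_lamp_state(previous, status, unresolved_count, failures, max_failures=5):
    """Decide the lamp colour after one poll.

    previous         -- colour currently shown ("green", "red" or "amber")
    status           -- HTTP status code of the latest poll
    unresolved_count -- number of issues returned (only meaningful on 200)
    failures         -- consecutive failed polls before this one

    Returns (new_colour, new_consecutive_failures).
    """
    if status == 200:
        # A good answer resets the failure counter and sets the real state.
        return ("red" if unresolved_count > 0 else "green"), 0

    failures += 1
    if failures >= max_failures:
        # Too many failed polls in a row: show that the status is unknown.
        return "amber", failures

    # Transient failure (for example a lone 504): keep showing the last known state.
    return previous, failures
```

A lone 504 then leaves the lamp as it was, while a sustained outage still becomes visible after `max_failures` polls.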

Apologies for the delay in responding.

angieo · March 25, 2019, 10:51pm

I see, thanks so much for the clarification.

chugunovyar · March 15, 2020, 2:06pm

Has anyone here tried to solve this problem by clustering Sentry, for example by running it on Kubernetes?
