504 Gateway Timeout from Sentry when polling Sentry


#1

We have a server that polls Sentry every minute or so. It queries the status of a project and activates a status lamp when there is a Sentry issue.

The actual page returned with this 504 error is, well, you’ll have to use your imagination, as new users can only post one image, but it has fire, large SVGs and everything.

The image seems to suggest that someone’s pager is blowing up right now, but the response doesn’t seem to imply that.

I emailed support@sentry.io and got a rather lacklustre response:

Hey Mike, which endpoints are you testing against? The response you’re showing is just the one you’ve captured, but it’s hard to tell where an issue might be.

I did respond: we’re calling the “List a Project’s Issues” endpoint.
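For context, a single poll of that endpoint might look roughly like the sketch below. It is not our actual code: the base URL assumes hosted Sentry, and the slugs, token, timeout and lamp state names are all placeholders.

```python
import json
import urllib.error
import urllib.request

SENTRY_API = "https://sentry.io/api/0"  # assumed base URL for hosted Sentry


def issues_url(org_slug: str, project_slug: str) -> str:
    """Build the URL for the List a Project's Issues endpoint."""
    return f"{SENTRY_API}/projects/{org_slug}/{project_slug}/issues/"


def lamp_state(status: int, issue_count: int) -> str:
    """Map one poll result to a lamp state (state names are illustrative)."""
    if status != 200:
        # e.g. a 504: the poll told us nothing, so do not raise an alarm
        return "unknown"
    return "red" if issue_count > 0 else "green"


def poll_once(org_slug: str, project_slug: str, token: str, timeout: float = 35.0):
    """Return (status, issue_count) for one poll; the timeout is an arbitrary choice."""
    request = urllib.request.Request(
        issues_url(org_slug, project_slug),
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # The endpoint is paginated, so this counts the first page only.
            return response.status, len(json.load(response))
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as 504 arrive here as exceptions.
        return err.code, 0
```

Something like `lamp_state(*poll_once("my-org", "my-project", token))` would then drive the lamp, with `"unknown"` meaning the lamp stays as it was.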

Is anyone else experiencing 504s from that endpoint?


#2

Here’s the 504 error we’re getting from Sentry


#3

We are facing a similar issue.
Please refer to: https://github.com/getsentry/sentry/issues/10481


#4

I got this response from Sentry Support by email:

Within our infrastructure, out of total requests, all to event ingestion, all API calls, we see 0.0658% of requests returning a 504 over the past week.

This includes very expensive endpoints such as search queries and file uploads which may just naturally be slower if someone is uploading large files to us.

With that said, the endpoint you were hitting is one of the slower ones for us. It’s not always slow, but at this scale, you’re hitting 99P and max timings.

Over the past day, I can see a very small number of requests hitting a max duration of over 30 seconds for this exact endpoint, which is what would trigger the 504.

In the past day, we’ve had exactly 3 of them. One at 6:09am, another at 6:29am, and another at 7:46am. All times PST.

With volumes of 504s this low only affecting 99P and max timings, this is not enough to send off alarm bells and update the status page. Working on things at scale, it’s hard to move the needle on 99P and max. We have projects we’re working on to help, but due to their nature, almost anything can cause something to do this. Maybe it’s a network hiccup between 1 server and one of the many services it needs to talk to, etc.

At that rate of errors, there is a lot of room for extreme anomalies to happen.

So yes, the endpoint you’re hitting is a relatively expensive query for us. It might sound cheap, as you mention, you only have 1124 issues (I assume, I haven’t looked or confirmed your definition of “record”), this endpoint is effectively a search endpoint. It is generically performing an issue search. And we are not a single tenant application. We have thousands upon thousands of customers and we’re doing tens of thousands of requests per second. So to spit back 3 504 errors out of tens of thousands of requests is pretty good in my opinion.

So, based on their response, I’d say they probably can’t do anything.
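Given that, one client-side option is to treat an occasional 504 as transient and retry before reacting to it. The following is only a rough sketch under my own assumptions: `poll_with_retry`, the set of retryable statuses, the attempt count and the delays are all made up for illustration, and `fetch_status` stands in for whatever function actually calls Sentry and returns the HTTP status code.

```python
import time
from typing import Callable, Optional

TRANSIENT = {502, 503, 504}  # statuses treated as retryable here (an assumption)


def poll_with_retry(
    fetch_status: Callable[[], int],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call fetch_status until it returns a non-transient status.

    Returns that status, or None if every attempt was transient.
    """
    for attempt in range(attempts):
        status = fetch_status()
        if status not in TRANSIENT:
            return status
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

For a status lamp, a `None` result could simply leave the lamp in its previous state until the next scheduled poll, rather than signalling an alarm.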


#5

@RamuRChenchaiah - have you been able to confirm where the 504 error is coming from? Is it from Sentry itself?