On-premise Sentry suddenly hitting a Kafka issue

Sentry was working well and then suddenly started failing with the following Kafka error:
Exception: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}

I am running Sentry on-premise inside Docker containers. How can I fix this Kafka issue? Where should I reset the offset, and how can I make sure it does not come up again?

Thanks

Looks like you had a burst of events, or the system Sentry is running on was not able to consume the messages as fast as they were produced. This answer may help you: https://stackoverflow.com/a/36472296/90297

I’ve isolated the environment and am attempting to send a single crash as a test, and it is throwing the same error.

ingest-consumer_1 | 12:22:27 [INFO] batching-kafka-consumer: Flushing 1 items (from {(u'ingest-events', 0): [52093L, 52093L]}): forced:False size:False time:True
ingest-consumer_1 | 12:22:27 [INFO] batching-kafka-consumer: Worker flush took 20ms
snuba-transactions-consumer_1 | 2020-10-02 12:22:28,961 Completed processing <Batch: 1 message, open for 1.00 seconds>.
7e3b076e5ce8_sentry_onpremise_snuba-outcomes-consumer_1 | 2020-10-02 12:22:28,984 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-consumer_1 | 2020-10-02 12:22:28,986 Completed processing <Batch: 1 message, open for 1.03 seconds>.
(bunch of post-process-forwarder-1 stacktrace)

post-process-forwarder_1 | File "/usr/local/lib/python2.7/site-packages/sentry/eventstream/kafka/backend.py", line 195, in run_post_process_forwarder
post-process-forwarder_1 | raise Exception(error)
post-process-forwarder_1 | Exception: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}
sentry_onpremise_post-process-forwarder_1 exited with code 0

There are no events currently being processed.

The Kafka service is not reporting any errors, only the post-process-forwarder. It seems to be stuck in a crash loop, constantly restarting, and doesn’t seem able to recover.

Should I delete and recreate the specific volume this service is using?

I honestly don’t know what to do here. It seems like this process somehow got out of sync with the Snuba producers and consumers. I don’t think this consumer has a specific volume you can reset.

I’d try resetting the topic/group offsets and see if it helps. You’d need to translate this to our setup: https://gist.github.com/marwei/cd40657c481f94ebe273ecc16601674b
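For reference, a rough sketch of what "translating the gist to our setup" could look like. This assumes the default sentry/onpremise `docker-compose.yml`, where the broker service is named `kafka`; the consumer group name (`snuba-post-processor` here) is an assumption, so check the `--list` output for the real one, and stop the affected consumer before resetting:

```shell
# List the consumer groups the broker knows about, to find the one
# belonging to the crash-looping post-process-forwarder.
docker-compose exec kafka kafka-consumer-groups \
    --bootstrap-server kafka:9092 --list

# Inspect the group's current offsets and lag per topic/partition.
# Group name "snuba-post-processor" is an assumption; substitute yours.
docker-compose exec kafka kafka-consumer-groups \
    --bootstrap-server kafka:9092 \
    --group snuba-post-processor --describe

# Stop the consumer, reset its offset on the "events" topic to the
# latest message (discarding the out-of-range position), then restart.
docker-compose stop post-process-forwarder
docker-compose exec kafka kafka-consumer-groups \
    --bootstrap-server kafka:9092 \
    --group snuba-post-processor --topic events \
    --reset-offsets --to-latest --execute
docker-compose start post-process-forwarder
```

Resetting `--to-latest` skips any messages still on the topic, so a few events may be dropped; `--to-earliest` replays everything instead, at the cost of reprocessing.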

And just like that, somebody actually documented this minutes ago:

Sorry. I forgot to update the original. That someone was me :slight_smile:


Lol, sorry. I was in a hurry. Thanks for this again! :smiley:


This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.