KafkaError OFFSET_OUT_OF_RANGE error

Hi,
I have self-hosted Sentry 21.5.1, and after a few months of smooth operation I am seeing an error very similar to the issue described in https://github.com/getsentry/onpremise/issues/478. It was most likely caused by a sudden spike of incoming events.

I read through all the comments carefully and applied the winning solution, but my “sentry_onpremise_snuba-outcomes-consumer_1” container keeps restarting.

Here is the output from docker-compose ps:

sentry_onpremise_clickhouse_1                                 /entrypoint.sh                   Up             8123/tcp, 9000/tcp, 9009/tcp        
sentry_onpremise_cron_1                                       /etc/sentry/entrypoint.sh  ...   Up             9000/tcp                            
sentry_onpremise_geoipupdate_1                                /usr/bin/geoipupdate -d /s ...   Exit 1                                             
sentry_onpremise_ingest-consumer_1                            /etc/sentry/entrypoint.sh  ...   Up             9000/tcp                            
sentry_onpremise_kafka_1                                      /etc/confluent/docker/run        Up (healthy)   9092/tcp                            
sentry_onpremise_memcached_1                                  docker-entrypoint.sh memcached   Up             11211/tcp                           
sentry_onpremise_nginx_1                                      nginx -g daemon off;             Up             0.0.0.0:9000->80/tcp,:::9000->80/tcp
sentry_onpremise_post-process-forwarder_1                     /etc/sentry/entrypoint.sh  ...   Up             9000/tcp                            
sentry_onpremise_postgres_1                                   docker-entrypoint.sh postgres    Up             5432/tcp                            
sentry_onpremise_redis_1                                      docker-entrypoint.sh redis ...   Up             6379/tcp                            
sentry_onpremise_relay_1                                      /bin/bash /docker-entrypoi ...   Up             3000/tcp                            
sentry_onpremise_sentry-cleanup_1                             /entrypoint.sh 0 0 * * * g ...   Up             9000/tcp                            
sentry_onpremise_smtp_1                                       docker-entrypoint.sh exim  ...   Up             25/tcp                              
sentry_onpremise_snuba-api_1                                  ./docker_entrypoint.sh api       Up             1218/tcp                            
sentry_onpremise_snuba-cleanup_1                              /entrypoint.sh */5 * * * * ...   Up             1218/tcp                            
sentry_onpremise_snuba-consumer_1                             ./docker_entrypoint.sh con ...   Up             1218/tcp                            
sentry_onpremise_snuba-outcomes-consumer_1                    ./docker_entrypoint.sh con ...   Restarting                                         
sentry_onpremise_snuba-replacer_1                             ./docker_entrypoint.sh rep ...   Up             1218/tcp                            
sentry_onpremise_snuba-sessions-consumer_1                    ./docker_entrypoint.sh con ...   Up             1218/tcp                            
sentry_onpremise_snuba-subscription-consumer-events_1         ./docker_entrypoint.sh sub ...   Up             1218/tcp                            
sentry_onpremise_snuba-subscription-consumer-transactions_1   ./docker_entrypoint.sh sub ...   Up             1218/tcp                            
sentry_onpremise_snuba-transactions-cleanup_1                 /entrypoint.sh */5 * * * * ...   Up             1218/tcp                            
sentry_onpremise_snuba-transactions-consumer_1                ./docker_entrypoint.sh con ...   Up             1218/tcp                            
sentry_onpremise_subscription-consumer-events_1               /etc/sentry/entrypoint.sh  ...   Up             9000/tcp                            
sentry_onpremise_subscription-consumer-transactions_1         /etc/sentry/entrypoint.sh  ...   Up             9000/tcp                            
sentry_onpremise_symbolicator-cleanup_1                       /entrypoint.sh 55 23 * * * ...   Up             3021/tcp                            
sentry_onpremise_symbolicator_1                               /bin/bash /docker-entrypoi ...   Up             3021/tcp                            
sentry_onpremise_web_1                                        /etc/sentry/entrypoint.sh  ...   Up             9000/tcp                            
sentry_onpremise_worker_1                                     /etc/sentry/entrypoint.sh  ...   Up             9000/tcp                            
sentry_onpremise_zookeeper_1                                  /etc/confluent/docker/run        Up (healthy)   2181/tcp, 2888/tcp, 3888/tcp        
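A quick way to pick the problem containers out of a listing like this is to filter on the state column. The sketch below assumes the docker-compose (v1) table layout shown above, where a broken container shows `Restarting` or `Exit N`; `unhealthy_services` is a hypothetical helper name, not part of Sentry or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker-compose ps` output on stdin and print the
# container name (first column) of every row whose state is Restarting or Exit N.
# Assumes the docker-compose v1 table layout shown above; header rows never match.
unhealthy_services() {
  awk '/ Restarting| Exit [0-9]+/ { print $1 }'
}
```

Usage: `docker-compose ps | unhealthy_services`. To read the logs, pass the service name (e.g. snuba-outcomes-consumer), not the container name, to `docker-compose logs --tail 50`.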

In the logs for the snuba-outcomes-consumer container (docker-compose logs --tail 50 snuba-outcomes-consumer) I see the following error:

snuba-outcomes-consumer_1                   | 2021-07-05 17:00:56,775 New partitions assigned: {Partition(topic=Topic(name='outcomes'), index=0): 2875351}
snuba-outcomes-consumer_1                   | 2021-07-05 17:00:56,778 Caught OffsetOutOfRange('KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}'), shutting down...
snuba-outcomes-consumer_1                   | Traceback (most recent call last):
snuba-outcomes-consumer_1                   |   File "/usr/local/bin/snuba", line 33, in <module>
snuba-outcomes-consumer_1                   |     sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
snuba-outcomes-consumer_1                   |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
snuba-outcomes-consumer_1                   |     return self.main(*args, **kwargs)
snuba-outcomes-consumer_1                   |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
snuba-outcomes-consumer_1                   |     rv = self.invoke(ctx)
snuba-outcomes-consumer_1                   |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
snuba-outcomes-consumer_1                   |     return _process_result(sub_ctx.command.invoke(sub_ctx))
snuba-outcomes-consumer_1                   |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
snuba-outcomes-consumer_1                   |     return ctx.invoke(self.callback, **ctx.params)
snuba-outcomes-consumer_1                   |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
snuba-outcomes-consumer_1                   |     return callback(*args, **kwargs)
snuba-outcomes-consumer_1                   |   File "/usr/src/snuba/snuba/cli/consumer.py", line 161, in consumer
snuba-outcomes-consumer_1                   |     consumer.run()
snuba-outcomes-consumer_1                   |   File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 116, in run
snuba-outcomes-consumer_1                   |     self._run_once()
snuba-outcomes-consumer_1                   |   File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 146, in _run_once
snuba-outcomes-consumer_1                   |     self.__message = self.__consumer.poll(timeout=1.0)
snuba-outcomes-consumer_1                   |   File "/usr/src/snuba/snuba/utils/streams/backends/kafka/__init__.py", line 396, in poll
snuba-outcomes-consumer_1                   |     raise OffsetOutOfRange(str(error))
snuba-outcomes-consumer_1                   | snuba.utils.streams.backends.abstract.OffsetOutOfRange: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}

BTW: I noticed that the topic in the error is “outcomes”, so I tried to reset the offsets for the ‘outcomes’ topic in the ‘snuba-post-processor’ consumer group, but still no luck.
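For context: OFFSET_OUT_OF_RANGE usually means the offset the consumer group wants to resume from (here 2875351 on partition 0 of `outcomes`, per the "New partitions assigned" line) falls outside the range of offsets the broker currently holds for that partition, for example because retention already deleted those messages. A reset only helps if it targets the group the outcomes consumer actually uses, and Kafka only accepts a reset while that group has no active members, so the consumer has to be stopped first. The following is a minimal sketch under those assumptions: it runs `kafka-consumer-groups` through `docker-compose run --rm kafka`, assumes the broker is reachable as `kafka:9092`, and uses hypothetical helper names (`run`, `kcg`, `reset_topic_offsets`). It does not know the correct group name; look it up with `kcg --list` and `kcg --describe --group <name>` before resetting.

```shell
#!/usr/bin/env bash
# Sketch: reset one consumer group's offsets on one topic.
# Assumptions: run from the onpremise directory; the Kafka service is named
# "kafka" and reachable as kafka:9092; GROUP comes from `kcg --list`.
# DRY_RUN=1 prints the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# kafka-consumer-groups inside a throwaway container of the kafka service.
kcg() {
  run docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 "$@"
}

# Usage: reset_topic_offsets SERVICE GROUP TOPIC [--to-latest|--to-earliest]
reset_topic_offsets() {
  local service="$1" group="$2" topic="$3" strategy="${4:---to-latest}"
  # A reset is only accepted while the group has no active members.
  run docker-compose stop "$service" || return 1
  # Current vs. log-end offset for every partition the group has committed.
  kcg --describe --group "$group" || return 1
  # Preview first, then apply.
  kcg --group "$group" --topic "$topic" --reset-offsets "$strategy" --dry-run || return 1
  kcg --group "$group" --topic "$topic" --reset-offsets "$strategy" --execute || return 1
  run docker-compose start "$service"
}
```

Example: `reset_topic_offsets snuba-outcomes-consumer <group> outcomes`, with `DRY_RUN=1` in front to only print the commands. `--to-latest` skips every outcome that has not been consumed yet, which fits "I don't mind losing unprocessed events"; passing `--to-earliest` as the fourth argument would instead re-read whatever is still retained.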

I would really appreciate it if someone could explain what I am doing wrong and why applying the ‘winning solution’ did not work for me.

Additionally, I would like to learn how to identify and “recreate Kafka-related volumes”, as mentioned in the official Self-hosted Sentry documentation → Troubleshooting → Recovery section (I would like to post a link, but as a new user I can only post 2 links per topic). I don’t mind losing all events that have not been processed yet, so this “nuclear option” might be handy.

Thank you very much in advance

Without seeing the exact state of your setup and seeing the specific commands you are running, it is hard to tell what you are doing wrong (if anything).

You first stop your instance:

docker-compose stop

Then you remove and recreate the Kafka & Zookeeper volumes:

docker volume rm sentry-kafka
docker volume rm sentry-zookeeper
docker volume create --name=sentry-kafka
docker volume create --name=sentry-zookeeper

Then you run:

docker-compose up -d

And see what happens. It may still fail; if it does, report back here and we’ll try to help.

Thank you very much for the instructions, @BYK.

When I tried to remove the Kafka & Zookeeper volumes, I got the following errors:

$ docker volume rm sentry-kafka
Error response from daemon: remove sentry-kafka: volume is in use - [e24e53c7bb720b143f4f914c3a6182322ac3943919b13c29c6c58a414aa3733b]

$ docker volume rm sentry-zookeeper 
Error response from daemon: remove sentry-zookeeper: volume is in use - [225aeac577a6d0da12093adde6110dd983c994217402cc3543c01ca1da2228f8]

There were a few strange volumes in the docker volume list:

$ docker volume ls
DRIVER    VOLUME NAME
local     c42f60ff5f9fb95b07a9e65c2dd560d9b3cb7c13aa8e89f1e2234c5a6dab361d
local     c158d1647287e4e4fa7f208406641a8548b84a82e458d7410b2b14e58f73cbe0
local     d2164cad3b90d729897f0a61ccaafd72ac1115fe876e05fbece80ebc4de91dae
local     sentry-clickhouse
local     sentry-data
local     sentry-kafka
local     sentry-postgres
local     sentry-redis
local     sentry-symbolicator
local     sentry-zookeeper
local     sentry_onpremise_sentry-clickhouse-log
local     sentry_onpremise_sentry-kafka-log
local     sentry_onpremise_sentry-secrets
local     sentry_onpremise_sentry-smtp
local     sentry_onpremise_sentry-smtp-log
local     sentry_onpremise_sentry-zookeeper-log

I found a post which suggested trying

docker-compose down --volumes

so I did, but I got the same error.
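The likely reason `docker-compose down --volumes` did not help: it only removes volumes that Compose itself manages, and those carry the project prefix (`sentry_onpremise_*` in the listing above). As far as I know, the onpremise docker-compose.yml declares the unprefixed `sentry-*` volumes as `external`, so Compose never deletes them. The sketch below picks the Kafka and Zookeeper volumes out of a `docker volume ls -q` listing and labels them accordingly; the prefix default and the "unprefixed means external" rule are assumptions based on the names above, and `kafka_volume_report` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read volume names (one per line) on stdin, keep only the
# Kafka/Zookeeper ones and label each as compose-managed or external.
# Assumption: compose-managed volumes carry the project prefix (default
# "sentry_onpremise_"); everything else is treated as external.
kafka_volume_report() {
  local prefix="${1:-sentry_onpremise_}" name
  while IFS= read -r name; do
    case "$name" in
      *kafka*|*zookeeper*) ;;   # Kafka/Zookeeper volume: report it
      *) continue ;;            # anything else: skip
    esac
    case "$name" in
      "$prefix"*) printf 'compose-managed\t%s\n' "$name" ;;  # removed by `down --volumes`
      *)          printf 'external\t%s\n' "$name" ;;         # must be removed by hand
    esac
  done
}
```

Usage: `docker volume ls -q | kafka_volume_report`. For the listing above it reports sentry-kafka and sentry-zookeeper as external and the two `-log` volumes as compose-managed; judging by the names, the first two are the ones holding Kafka and Zookeeper data.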

So I found another post which suggested stopping Docker, removing those volumes manually, and restarting Docker (service docker stop && rm -rf /var/lib/docker/volumes/TheVolumeIdYouWantToRemove && service docker start). That worked, and I was able to remove and recreate the Kafka & Zookeeper volumes.
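Deleting directories under /var/lib/docker works, but it bypasses Docker's own bookkeeping. The "volume is in use" errors list the IDs of containers that still reference the volume: Docker refuses to remove a volume while any container, running or stopped, references it, and `docker-compose stop` leaves the stopped containers in place. A less invasive route is to find and remove those containers first. A minimal sketch, assuming the containers holding the volume are stopped and disposable; `run`, `containers_using_volume` and `recreate_volume` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# DRY_RUN=1 prints the modifying commands instead of running them
# (the lookup via `docker ps` is read-only and still runs).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# IDs of all containers, running or stopped, that mount the given volume.
containers_using_volume() {
  docker ps -a -q --filter "volume=$1"
}

# Remove whatever still holds the volume, then delete and recreate it empty.
recreate_volume() {
  local vol="$1" ids
  ids="$(containers_using_volume "$vol")" || return 1
  if [ -n "$ids" ]; then
    run docker ps -a --filter "volume=$vol"   # show what is about to be removed
    # `docker rm` without -f refuses running containers, so this aborts
    # instead of killing something that is still up.
    run docker rm $ids || return 1            # unquoted on purpose: one ID per word
  fi
  run docker volume rm "$vol" || return 1
  run docker volume create --name="$vol"
}
```

After `docker-compose stop`, something like `recreate_volume sentry-kafka && recreate_volume sentry-zookeeper` followed by `docker-compose up -d` should reproduce the recovery steps without stopping the Docker daemon.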

The good news is that “sentry_onpremise_snuba-outcomes-consumer_1” container is now UP!!!

Unfortunately, now “sentry_onpremise_snuba-replacer_1” and “sentry_onpremise_snuba-sessions-consumer_1” are stuck in the ‘restarting’ state.

Logs for snuba-replacer_1 show this:

2021-07-06 10:10:06,957 Caught ConsumerError('KafkaError{code=UNKNOWN_TOPIC_OR_PART,val=3,str="Subscribed topic not available: event-replacements: Broker: Unknown topic or partition"}'), shutting down...
snuba-replacer_1                            | Traceback (most recent call last):
snuba-replacer_1                            |   File "/usr/local/bin/snuba", line 33, in <module>
snuba-replacer_1                            |     sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
snuba-replacer_1                            |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
snuba-replacer_1                            |     return self.main(*args, **kwargs)
snuba-replacer_1                            |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
snuba-replacer_1                            |     rv = self.invoke(ctx)
snuba-replacer_1                            |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
snuba-replacer_1                            |     return _process_result(sub_ctx.command.invoke(sub_ctx))
snuba-replacer_1                            |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
snuba-replacer_1                            |     return ctx.invoke(self.callback, **ctx.params)
snuba-replacer_1                            |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
snuba-replacer_1                            |     return callback(*args, **kwargs)
snuba-replacer_1                            |   File "/usr/src/snuba/snuba/cli/replacer.py", line 133, in replacer
snuba-replacer_1                            |     replacer.run()
snuba-replacer_1                            |   File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 116, in run
snuba-replacer_1                            |     self._run_once()
snuba-replacer_1                            |   File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 146, in _run_once
snuba-replacer_1                            |     self.__message = self.__consumer.poll(timeout=1.0)
snuba-replacer_1                            |   File "/usr/src/snuba/snuba/utils/streams/backends/kafka/__init__.py", line 398, in poll
snuba-replacer_1                            |     raise ConsumerError(str(error))
snuba-replacer_1                            | snuba.utils.streams.backends.abstract.ConsumerError: KafkaError{code=UNKNOWN_TOPIC_OR_PART,val=3,str="Subscribed topic not available: event-replacements: Broker: Unknown topic or partition"}

Ah, for this I think you just need to run ./install.sh again, which will create the missing topics.
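To see which topics are actually missing before (or after) rerunning ./install.sh, you can compare the broker's topic list with the names the failing consumers complain about. A minimal sketch; `missing_topics` is a hypothetical helper, and the topic names you pass in come from the error messages (here `event-replacements`), not from an authoritative list:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stdin = existing topic names, one per line;
# arguments = topic names the services expect. Prints the expected names
# that are absent from stdin.
missing_topics() {
  local existing topic
  existing="$(tr -d '\r')"   # drop carriage returns in case a TTY was attached
  for topic in "$@"; do
    # -x: whole-line match, -F: literal string (topic names contain '-' and '.')
    printf '%s\n' "$existing" | grep -qxF -- "$topic" || printf '%s\n' "$topic"
  done
}
```

Usage: `docker-compose run --rm -T kafka kafka-topics --bootstrap-server kafka:9092 --list | missing_topics event-replacements outcomes`. Anything it prints is what rerunning ./install.sh should recreate.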

Sorry for the incomplete instructions btw; we should probably improve the docs around this a bit. We’d be happy to have your help if you are willing to give it a shot.

Hi @BYK,
Running ./install.sh solved the latest error, and all containers are in the UP state now.

You are my hero. Thank you very much!

Regarding enhancing the docs, I am happy to do so if you explain the procedure to me (sorry, I am an ‘open source’ newbie).

Glad it helped you 🙂

Each page on our docs has a link at the bottom for editing it on GitHub.

For that particular page, it points to the page’s source on GitHub (you will be asked to sign in to GitHub).

You can follow it, make your adjustments, and then propose a pull request, which we will review and then merge 🙂

Done: “Update troubleshooting.mdx” by JanoValaska, Pull Request #372 in getsentry/develop on GitHub.