Hi,
I run self-hosted Sentry 21.5.1, and after a few months of smooth operation I am seeing an error very similar to the issue described in https://github.com/getsentry/onpremise/issues/478. It was most likely caused by a sudden spike of incoming events.
I read through all the comments carefully and applied the winning solution, but my “sentry_onpremise_snuba-outcomes-consumer_1” container keeps restarting.
Here is the output from docker-compose ps:
sentry_onpremise_clickhouse_1 /entrypoint.sh Up 8123/tcp, 9000/tcp, 9009/tcp
sentry_onpremise_cron_1 /etc/sentry/entrypoint.sh ... Up 9000/tcp
sentry_onpremise_geoipupdate_1 /usr/bin/geoipupdate -d /s ... Exit 1
sentry_onpremise_ingest-consumer_1 /etc/sentry/entrypoint.sh ... Up 9000/tcp
sentry_onpremise_kafka_1 /etc/confluent/docker/run Up (healthy) 9092/tcp
sentry_onpremise_memcached_1 docker-entrypoint.sh memcached Up 11211/tcp
sentry_onpremise_nginx_1 nginx -g daemon off; Up 0.0.0.0:9000->80/tcp,:::9000->80/tcp
sentry_onpremise_post-process-forwarder_1 /etc/sentry/entrypoint.sh ... Up 9000/tcp
sentry_onpremise_postgres_1 docker-entrypoint.sh postgres Up 5432/tcp
sentry_onpremise_redis_1 docker-entrypoint.sh redis ... Up 6379/tcp
sentry_onpremise_relay_1 /bin/bash /docker-entrypoi ... Up 3000/tcp
sentry_onpremise_sentry-cleanup_1 /entrypoint.sh 0 0 * * * g ... Up 9000/tcp
sentry_onpremise_smtp_1 docker-entrypoint.sh exim ... Up 25/tcp
sentry_onpremise_snuba-api_1 ./docker_entrypoint.sh api Up 1218/tcp
sentry_onpremise_snuba-cleanup_1 /entrypoint.sh */5 * * * * ... Up 1218/tcp
sentry_onpremise_snuba-consumer_1 ./docker_entrypoint.sh con ... Up 1218/tcp
sentry_onpremise_snuba-outcomes-consumer_1 ./docker_entrypoint.sh con ... Restarting
sentry_onpremise_snuba-replacer_1 ./docker_entrypoint.sh rep ... Up 1218/tcp
sentry_onpremise_snuba-sessions-consumer_1 ./docker_entrypoint.sh con ... Up 1218/tcp
sentry_onpremise_snuba-subscription-consumer-events_1 ./docker_entrypoint.sh sub ... Up 1218/tcp
sentry_onpremise_snuba-subscription-consumer-transactions_1 ./docker_entrypoint.sh sub ... Up 1218/tcp
sentry_onpremise_snuba-transactions-cleanup_1 /entrypoint.sh */5 * * * * ... Up 1218/tcp
sentry_onpremise_snuba-transactions-consumer_1 ./docker_entrypoint.sh con ... Up 1218/tcp
sentry_onpremise_subscription-consumer-events_1 /etc/sentry/entrypoint.sh ... Up 9000/tcp
sentry_onpremise_subscription-consumer-transactions_1 /etc/sentry/entrypoint.sh ... Up 9000/tcp
sentry_onpremise_symbolicator-cleanup_1 /entrypoint.sh 55 23 * * * ... Up 3021/tcp
sentry_onpremise_symbolicator_1 /bin/bash /docker-entrypoi ... Up 3021/tcp
sentry_onpremise_web_1 /etc/sentry/entrypoint.sh ... Up 9000/tcp
sentry_onpremise_worker_1 /etc/sentry/entrypoint.sh ... Up 9000/tcp
sentry_onpremise_zookeeper_1 /etc/confluent/docker/run Up (healthy) 2181/tcp, 2888/tcp, 3888/tcp
In the logs for the snuba-outcomes-consumer container (docker-compose logs --tail 50 snuba-outcomes-consumer) I see the following error:
snuba-outcomes-consumer_1 | 2021-07-05 17:00:56,775 New partitions assigned: {Partition(topic=Topic(name='outcomes'), index=0): 2875351}
snuba-outcomes-consumer_1 | 2021-07-05 17:00:56,778 Caught OffsetOutOfRange('KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}'), shutting down...
snuba-outcomes-consumer_1 | Traceback (most recent call last):
snuba-outcomes-consumer_1 | File "/usr/local/bin/snuba", line 33, in <module>
snuba-outcomes-consumer_1 | sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
snuba-outcomes-consumer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
snuba-outcomes-consumer_1 | return self.main(*args, **kwargs)
snuba-outcomes-consumer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
snuba-outcomes-consumer_1 | rv = self.invoke(ctx)
snuba-outcomes-consumer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
snuba-outcomes-consumer_1 | return _process_result(sub_ctx.command.invoke(sub_ctx))
snuba-outcomes-consumer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
snuba-outcomes-consumer_1 | return ctx.invoke(self.callback, **ctx.params)
snuba-outcomes-consumer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
snuba-outcomes-consumer_1 | return callback(*args, **kwargs)
snuba-outcomes-consumer_1 | File "/usr/src/snuba/snuba/cli/consumer.py", line 161, in consumer
snuba-outcomes-consumer_1 | consumer.run()
snuba-outcomes-consumer_1 | File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 116, in run
snuba-outcomes-consumer_1 | self._run_once()
snuba-outcomes-consumer_1 | File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 146, in _run_once
snuba-outcomes-consumer_1 | self.__message = self.__consumer.poll(timeout=1.0)
snuba-outcomes-consumer_1 | File "/usr/src/snuba/snuba/utils/streams/backends/kafka/__init__.py", line 396, in poll
snuba-outcomes-consumer_1 | raise OffsetOutOfRange(str(error))
snuba-outcomes-consumer_1 | snuba.utils.streams.backends.abstract.OffsetOutOfRange: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}
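For context on the error itself: a Kafka broker answers a fetch with OFFSET_OUT_OF_RANGE when the requested offset lies outside what it still stores for that partition. That means either below the log start offset (older segments already deleted by retention) or above the log end offset. The sketch below expresses that check in Python. It is an illustration of Kafka's rule, not Snuba's own code. Only 2875351 comes from the log above; the watermark values in the usage line are hypothetical.

```python
def classify_offset(committed: int, low: int, high: int) -> str:
    """Classify a consumer's starting offset against a partition's watermarks.

    low:  earliest offset the broker still stores (moves forward as
          retention deletes old log segments)
    high: offset the next produced message will receive (log end offset)
    """
    if committed < low:
        # The messages were already deleted, e.g. by size- or time-based retention.
        return "below-range"
    if committed > high:
        # The broker holds fewer messages than the consumer expects,
        # e.g. after the topic's data was wiped or recreated.
        return "above-range"
    # Fetching at exactly `high` is valid: the consumer just waits for new data.
    return "in-range"


# Hypothetical watermarks: retention has already moved past the committed offset.
print(classify_offset(2875351, low=3000000, high=3100000))  # prints "below-range"
```

In both out-of-range cases, the stored offset for the consumer group has to be moved to a position the broker still has, typically the earliest or the latest offset.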
BTW: I noticed that the topic in the error is “outcomes”, so I tried to reset the offsets for the ‘outcomes’ topic in the ‘snuba-post-processor’ consumer group, but still no luck.
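One thing worth checking is that Kafka stores committed offsets per consumer group. Resetting the ‘outcomes’ offsets of one group therefore does not change where a consumer running under a different group starts reading. To see which groups actually hold offsets on ‘outcomes’, a command like docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --all-groups should list every group. This assumes the compose service is named kafka and listens on kafka:9092, as the ps output above suggests, and that the bundled Kafka CLI supports --all-groups. The Python helper below is a hypothetical convenience for filtering that output by topic. It assumes the default column order GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET.

```python
from typing import List, NamedTuple


class GroupOffset(NamedTuple):
    group: str
    topic: str
    partition: int
    current_offset: str  # "-" when the group has never committed an offset
    log_end_offset: str


def groups_for_topic(describe_output: str, topic: str) -> List[GroupOffset]:
    """Return the rows of `kafka-consumer-groups --describe` output that refer to `topic`."""
    rows = []
    for line in describe_output.splitlines():
        fields = line.split()
        # Skip blank lines, header rows and notices such as
        # "Consumer group 'x' has no active members."
        if len(fields) < 5 or fields[0] == "GROUP" or not fields[2].isdigit():
            continue
        if fields[1] == topic:
            rows.append(
                GroupOffset(fields[0], fields[1], int(fields[2]), fields[3], fields[4])
            )
    return rows
```

Calling groups_for_topic(text, "outcomes") on the captured output lists each group and partition with offsets on that topic. The group found there is the one to pass to kafka-consumer-groups --reset-offsets. Try --dry-run before --execute. Kafka only allows the reset while the group has no active members, so the outcomes consumer needs to be stopped first.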
I would really appreciate it if someone could explain what I am doing wrong and why applying the ‘winning solution’ did not work for me.
Additionally, I would like to learn how to identify and “recreate Kafka-related volumes”, as mentioned in the official Self-hosted Sentry documentation → Troubleshooting → Recovery section (I would post a link, but as a new user I can only post 2 links per topic). I don’t mind losing all events that have not been processed yet, so this “nuclear option” might be handy.
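For the volume part, docker volume ls -q prints one volume name per line. The hypothetical helper below picks out names containing ‘kafka’ or ‘zookeeper’ and builds the matching docker volume rm commands without running them, so the list can be reviewed first. The keyword list is an assumption. Check the resulting names against the Recovery section of the documentation before deleting anything. Docker will not remove a volume that is still attached to a container, so the stack has to be stopped (e.g. with docker-compose down) first.

```python
from typing import Iterable, List, Sequence


def kafka_volume_rm_commands(
    volume_names: Iterable[str],
    keywords: Sequence[str] = ("kafka", "zookeeper"),  # assumed naming pattern
) -> List[str]:
    """Return `docker volume rm` commands for volumes whose name contains a keyword.

    Nothing is executed; the caller reviews the commands and runs them by hand.
    """
    commands = []
    for name in volume_names:
        name = name.strip()
        if name and any(keyword in name for keyword in keywords):
            commands.append(f"docker volume rm {name}")
    return commands


# Hypothetical volume names, standing in for the text printed by `docker volume ls -q`.
listing = "sentry-data\nsentry-kafka\nsentry-postgres\nsentry-zookeeper\n"
for command in kafka_volume_rm_commands(listing.splitlines()):
    print(command)
```

Printing the commands instead of executing them keeps the destructive step manual, which matters here because removing the wrong volume (for example the Postgres one) would lose far more than unprocessed events.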
Thank you very much in advance