Events have stopped appearing

liam-careerhub · December 21, 2020, 12:32am

Thanks in advance for your help.

My setup:

Running getsentry/onpremise at 20.12.1, installed using the install.sh script, on Ubuntu
8GB RAM / 2CPU@2.5GHz VM
Using external Postgres server
Using external Redis instance
We have a pretty high volume of events
Migrated from a v9 setup.

Everything was running smoothly on 20.12.1 until events suddenly stopped appearing. There was no update or migration happening at the time.

I’ve tried restarting and recreating containers and rebooting the host VM, but it appears that events are no longer being processed. CPU usage has been pegged at 100% for a few days.

CPU mostly seems to be consumed by Snuba and Celery

Snuba keeps restarting over and over.

docker logs sentry_onpremise_snuba-consumer_1 shows the logs below, which suggest the issue is somewhere in Kafka.

I’ve had a look through topics in this forum and on GitHub and haven’t found anything which has helped.

Summary

'[' c = - ']'
snuba consumer --help
set -- snuba consumer --storage events --auto-offset-reset=latest --max-batch-time-ms 750
set gosu snuba snuba consumer --storage events --auto-offset-reset=latest --max-batch-time-ms 750
exec gosu snuba snuba consumer --storage events --auto-offset-reset=latest --max-batch-time-ms 750
2020-12-21 00:28:38,313 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 8412616}
2020-12-21 00:28:38,315 Caught OffsetOutOfRange('KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}'), shutting down...
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/consumer.py", line 161, in consumer
    consumer.run()
  File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 109, in run
    self._run_once()
  File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 139, in _run_once
    self.__message = self.__consumer.poll(timeout=1.0)
  File "/usr/src/snuba/snuba/utils/streams/backends/kafka.py", line 749, in poll
    return super().poll(timeout)
  File "/usr/src/snuba/snuba/utils/streams/backends/kafka.py", line 400, in poll
    raise OffsetOutOfRange(str(error))
snuba.utils.streams.backends.abstract.OffsetOutOfRange: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}

Alexander · December 21, 2020, 10:44am

Have you tried this: “Sentry no more catch errors”?

liam-careerhub · December 22, 2020, 12:15am

Thanks for helping Alexander,

No, I hadn’t seen that one. I have given it a try today (ran all the commands, none failed) but it doesn’t seem to have fixed the issue. The Snuba containers are still restarting over and over with Kafka out of range errors.

Is there a way to reset the Kafka storage entirely? Will that affect the historical saved events?

caseyduquettesc · December 28, 2020, 9:16pm

Here’s what I use to reset offsets in my Kafka cluster (replace the bootstrap address with your own). Without --execute, the command only reports the offsets it would set:

unset JMX_PORT; bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group snuba-post-processor --all-topics --to-latest [--execute]

To list the Kafka consumer groups:

unset JMX_PORT; bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
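
A minimal sketch of how the two commands above might be wrapped so that a reset is previewed before it is applied. The function names, the KAFKA_GROUPS and BOOTSTRAP variables, and their defaults are assumptions for illustration; the flags are those of kafka-consumer-groups.sh. Kafka only resets offsets for a group with no active members, so the consumers in that group need to be stopped first.

```shell
#!/bin/sh
# Sketch only: helper names and variables are hypothetical.
# KAFKA_GROUPS: path to the consumer-groups tool; BOOTSTRAP: broker address.
KAFKA_GROUPS="${KAFKA_GROUPS:-bin/kafka-consumer-groups.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# List every consumer group known to the broker.
list_groups() {
  unset JMX_PORT
  "$KAFKA_GROUPS" --bootstrap-server "$BOOTSTRAP" --list
}

# Move a group's offsets to the latest position on all of its topics.
# $1 = group name, $2 = --dry-run (preview) or --execute (apply).
reset_offsets() {
  unset JMX_PORT
  "$KAFKA_GROUPS" --bootstrap-server "$BOOTSTRAP" --reset-offsets \
    --group "$1" --all-topics --to-latest "$2"
}
```

For example, `reset_offsets snuba-post-processor --dry-run` prints the offsets that would be set, and the same call with `--execute` applies them. Resetting to latest skips whatever messages are still queued for that group.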

In my experience this only happens when some other error keeps the service from running for long enough that the messages Kafka retains expire, so the committed offset no longer exists on the broker. You can buy yourself more time to recover by extending how long Kafka retains messages, using these broker settings:

offsets.retention.minutes
log.retention.hours
log.retention.bytes
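
These three settings use different units, which is easy to trip over. Below is a small sketch that prints them for a retention window given in days. The retention_props name and the choice of -1 (no size limit) for log.retention.bytes are illustrative assumptions, not recommended values.

```shell
#!/bin/sh
# Sketch only: prints broker properties for a retention window in days.
# $1 = number of days to retain messages and committed offsets.
retention_props() {
  days="$1"
  # Committed consumer offsets are retained for a number of minutes.
  echo "offsets.retention.minutes=$((days * 24 * 60))"
  # Log segments are retained for a number of hours.
  echo "log.retention.hours=$((days * 24))"
  # -1 disables the size-based limit, leaving only the time limit.
  echo "log.retention.bytes=-1"
}
```

`retention_props 7` prints 10080 minutes and 168 hours. If log.retention.bytes is set to a positive value instead, segments are deleted as soon as either the time or the size limit is exceeded, so a small size cap can undo a long time window.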

liam-careerhub · January 3, 2021, 11:27pm

Thanks @caseyduquettesc, that command is pretty similar to the one in the issue @Alexander linked, but has the additional --all-topics flag. Is that the key here?

I’ve just come back from holidays, and it seems that while I was away the Kafka server recovered on its own and events are appearing again. I’m not sure whether the command @Alexander linked earlier did work but just took a few days to take effect. Either way, I’ll keep your help in mind if it happens again, and I’ll look into adjusting those retention settings.

Thanks again for your help both of you, happy new year!

