Sentry stops processing events after upgrade 10.0 => 20.8.0.dev0ba2aa70

hheexx · August 18, 2020, 11:24am

It worked for 12 hours and then stopped again.

ChibangLW · August 19, 2020, 6:36am

I had the same result, but it stopped after 8 hours.

fabriciols · August 19, 2020, 12:24pm

I’m having the same issue… I already removed the Kafka/ZooKeeper volumes, but the problem is still happening.

Eis · August 21, 2020, 1:12pm

I’m encountering a similar problem. My installation is standard (apart from the nginx port).

Sometimes the backlog error pops up in the web interface.

The size of the queue increases if I send new errors, so I guess my server receives the errors and they at least reach the backlog. As far as I can tell, all required containers are up and running.

@e2_robert has the docker stop/start kept it running since your post 6 days ago?

amit1 · August 23, 2020, 11:35pm

Any updates on this issue? The same issue persists for our on-prem instance running off a clone of the repo.

@e2_robert @BYK

e2_robert · August 24, 2020, 6:57am

I ended up with a dirty workaround: a cronjob that restarts kafka, worker & relay every full hour. It kept processing events for 3 days, after which a full restart was required because some snuba containers had stopped working.
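A minimal sketch of what such an hourly restart job could call, assuming the compose service names are kafka, worker and relay as in the post and that the script runs from the directory containing docker-compose.yml. The function names and the dry_run flag are hypothetical, added only for illustration:

```python
import subprocess

# Services named in the post; assumed to match the compose service names.
SERVICES = ["kafka", "worker", "relay"]


def restart_command(services):
    """Build the docker-compose command that restarts the given services."""
    return ["docker-compose", "restart", *services]


def restart(services=SERVICES, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = restart_command(services)
    if not dry_run:
        # check=True raises CalledProcessError if docker-compose fails,
        # so a failing restart shows up in the cron mail/log.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: only shows what would be executed.
print(" ".join(restart()))
```

Calling restart(dry_run=False) from a script scheduled with a crontab entry such as `0 * * * *` would approximate the "every full hour" schedule described above.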

Did anyone successfully downgrade? If so, which versions are compatible (I assume going to Sentry 10 won’t work)?

renchap · August 24, 2020, 4:04pm

I am having issues as well, and I noticed that processes are frequently getting killed because the server is running out of memory:

# dmesg | grep "Out of memory"
[56625.167756] Out of memory: Kill process 9431 (clickhouse-serv) score 202 or sacrifice child
[57970.751846] Out of memory: Kill process 10723 (clickhouse-serv) score 210 or sacrifice child
[58370.176550] Out of memory: Kill process 13908 (clickhouse-serv) score 216 or sacrifice child
[62206.017300] Out of memory: Kill process 24844 (clickhouse-serv) score 348 or sacrifice child
[68831.860803] Out of memory: Kill process 27860 (clickhouse-serv) score 224 or sacrifice child

Maybe this is also happening to you?

Here ClickHouse is killed multiple times, but Kafka is also using quite a lot of memory and was killed as well. I suspect this might corrupt their files and sometimes prevent them from restarting.
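To check whether the same thing is happening on another host, the dmesg output can be tallied per process. This is a rough sketch that assumes the kernel message format shown above ("Out of memory: Kill process PID (name)"); other kernel versions word this line differently, so the pattern may need adjusting. The function name is made up for the example:

```python
import re
from collections import Counter

# Matches the OOM-killer lines in the format quoted above.
OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\)")


def count_oom_kills(dmesg_text):
    """Count OOM kills per process name in dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = OOM_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = """\
[56625.167756] Out of memory: Kill process 9431 (clickhouse-serv) score 202 or sacrifice child
[57970.751846] Out of memory: Kill process 10723 (clickhouse-serv) score 210 or sacrifice child
"""
print(count_oom_kills(sample))
```

Feeding it the real output (for example the text captured from `dmesg`) would show whether ClickHouse, Kafka, or something else is being killed most often.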

e2_robert · August 25, 2020, 10:48am

I don’t have out-of-memory issues.

Eis · August 25, 2020, 4:43pm

No out-of-memory messages here either.
But since the restart I have not encountered a new outage, although there has been only one new entry, 5 minutes ago.

Findmate · August 26, 2020, 8:38pm

I’m having the same issue after upgrading from 10.1.0 to 20.8.0, although it did not happen immediately. I upgraded via ./install.sh and it all seemed to go fine.

After upgrading, it performed almost identically to the previous version, but then on 8/18 I stopped getting events without having changed anything. The upgrade was done on 8/7, so it worked fine for 11 days before this happened.

After starting with docker-compose up to watch the logs, I can see this error repeatedly:

worker_1                   | 20:33:00 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:6ce4d190b4244e4aa956e0690eeda3ff:6')
worker_1                   | 20:33:00 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:8128b458011849a189e4c03e82184e9c:6')

It seems to run this for a very long time, then goes to 0% CPU after a while, and then never processes any new events.

I was very fortunate to have saved a snapshot of the instance prior to upgrading. I’ve reverted to 10.1.0 and it’s working just fine again.

hheexx · August 27, 2020, 9:56am

I don’t have OOM errors either; the VM has 16 GB.

hheexx · August 27, 2020, 11:00am

set -- snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
set gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
exec gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
2020-08-27 10:47:22,943 New partitions assigned: {Partition(topic=Topic(name='ingest-sessions'), index=0): 0}
2020-08-27 10:47:23,109 Partitions revoked: [Partition(topic=Topic(name='ingest-sessions'), index=0)]
'[' c = - ']'
snuba consumer --help
set -- snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
set gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
exec gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
%3|1598525451.691|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 5ms in state CONNECT)
%3|1598525451.695|FAIL|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 3ms in state CONNECT)
%3|1598525452.686|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1598525452.691|FAIL|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-08-27 10:51:03,877 New partitions assigned: {Partition(topic=Topic(name='ingest-sessions'), index=0): 0}

What does 'New partitions assigned' mean? Does the consumer manage to connect in the end or not?

fabriciols · August 27, 2020, 1:57pm

Still having the same issue…
I run git pull && ./install.sh every day to see if it gets fixed, but it still isn’t.

ibm5155 · August 27, 2020, 6:27pm

Same problem here :c

marbon87 · August 28, 2020, 6:01am

Same problem here, but restarting only the worker solves the problem for a few hours.
We didn’t have any problems with 20.7.2. The problems started with 20.8.

After restarting the worker I get tons of these messages in the worker log:

05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:50e084e07290492ca85fe87a269f3a4f:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:e2620e41ca894f47ba547595dc3f3284:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:d64d2ab997a1414385259e0a1762aa3b:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:66d775ad13ef4bb68b63c59b82b2851f:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:ad3b2445331947769e4da8d8d340c146:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:d441cebf14724d169ab84f4b49d8d039:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:9c3a9528afd14db6b1837e2e5ad448e2:3')
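To get a feel for how many events are affected, the worker log can be summarized. This is only a sketch, assuming the log line format quoted above. The function name is hypothetical, and grouping by the text after the last colon of the cache key is just a convenience, since this thread does not say what that suffix means:

```python
import re
from collections import Counter

# Matches the worker lines quoted above; the u'' prefix is optional.
FAILED_RE = re.compile(r"process\.failed\.empty \(cache_key=u?'([^']+)'\)")


def summarize_failures(log_text):
    """Return (total failures, distinct cache keys, count per key suffix)."""
    keys = FAILED_RE.findall(log_text)
    # Group by the part after the last ':' without assuming its meaning.
    by_suffix = Counter(key.rsplit(":", 1)[-1] for key in keys)
    return len(keys), len(set(keys)), by_suffix


sample = """\
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:50e084e07290492ca85fe87a269f3a4f:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:e2620e41ca894f47ba547595dc3f3284:3')
"""
total, distinct, by_suffix = summarize_failures(sample)
print(total, distinct, dict(by_suffix))
```

If the total is much larger than the number of distinct keys, the worker is retrying the same cache keys rather than failing on new events.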

xiaomao361 · August 28, 2020, 10:10am

Same problem.

I now restart the worker every 3 to 4 hours.

I also have a strange problem: some errors (from a few days ago) reappear from time to time, even though this project has been stopped.

I don’t know if this is a Kafka error or a configuration error on my side.

BTW, there were some connection errors in sentry_install_log:

%3|1598057437.870|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.5:9092 failed: Connection refused (after 5ms in state CONNECT)
57438.865|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
ka failed (attempt 0)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}

Findmate · August 28, 2020, 6:17pm

I upgraded Sentry on another project with an identical configuration and upgrade process.

That instance is not having the same issue. The only real difference between the two is that the second one has a much lower volume of events, approximately 1% of the events of the instance that is having the issue, so I suspect it may be related to load/volume of events.

liangrong · August 31, 2020, 2:52am

Try my trick: add web as a dependency of relay in docker-compose.yml:

  relay:
    << : *restart_policy
    image: '$RELAY_IMAGE'
    volumes:
      - type: bind
        read_only: true
        source: ./relay
        target: /work/.relay
    depends_on:
      - kafka
      - redis
      - web

This ensures that the relay service is started only after the web container has started, so relay’s upstream destination is reachable.

fabriciols · September 1, 2020, 10:59am

Did not work for me… thanks for sharing!

kdelbegue · September 1, 2020, 2:04pm

Restarting the worker container every hour makes the following error disappear:

Background workers haven't checked in recently. It seems that you have a backlog of 80 tasks. Either your workers aren't running or you need more capacity.

But that’s all… For more than 15 days I haven’t had any error events coming in…

I keep running git pull && ./install.sh frequently, hoping for an update.
