It worked for 12 hours and stopped again
I had the same result. But it stopped after 8 hours.
I'm having the same issue… I already removed the kafka/zookeeper volumes, but the problem is still happening.
I'm encountering a similar problem. My installation is standard (apart from the nginx port).
Sometimes the backlog error pops up in the web interface.
The size of the queue increases when I send new errors, so I assume my server receives them and they at least reach the backlog. As far as I can tell, all the required containers are up and running.
@e2_robert has the docker stop/start kept it running since your entry 6 days ago?
Any updates on this issue? The same issue persists for our on-prem instance running off the repo clone.
I ended up with a dirty workaround: a cronjob that restarts kafka, worker & relay every full hour. It was processing events for 3 days until a full restart was required, since some snuba containers stopped working.
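For reference, a rough sketch of that kind of cronjob, assuming the self-hosted checkout lives in /opt/sentry and a /etc/cron.d drop-in is used (both are assumptions, adjust paths and service names to your setup):

# /etc/cron.d/sentry-restart (hypothetical file): restart the flaky services at the top of every hour
0 * * * * root cd /opt/sentry && docker-compose restart kafka worker relay >> /var/log/sentry-restart.log 2>&1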
Did anyone successfully downgrade? If so, which versions are compatible (I assume going to Sentry 10 won't work)?
I am having issues as well, and I noticed processes are getting killed because the server is running out of memory frequently:
# dmesg | grep "Out of memory"
[56625.167756] Out of memory: Kill process 9431 (clickhouse-serv) score 202 or sacrifice child
[57970.751846] Out of memory: Kill process 10723 (clickhouse-serv) score 210 or sacrifice child
[58370.176550] Out of memory: Kill process 13908 (clickhouse-serv) score 216 or sacrifice child
[62206.017300] Out of memory: Kill process 24844 (clickhouse-serv) score 348 or sacrifice child
[68831.860803] Out of memory: Kill process 27860 (clickhouse-serv) score 224 or sacrifice child
Maybe this is also happening to you?
Here ClickHouse is killed multiple times, but Kafka also uses quite a lot of memory and was killed as well. I suspect this might corrupt their files and sometimes prevent them from restarting.
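If you want to check whether ClickHouse or Kafka really are the big memory consumers before the OOM killer fires, a one-shot snapshot with the plain docker CLI is enough (nothing Sentry-specific, just an illustration):

# per-container memory usage at this moment
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"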
I don't have out-of-memory issues.
No out-of-memory messages here either.
But after the restart I have not encountered a new failure, though there has only been one new entry, 5 minutes ago.
I'm having the same issue after upgrading from 10.1.0 to 20.8.0, although it did not happen immediately. I upgraded via ./install.sh and everything seemed to go fine.
After upgrading, it performed almost identically to the previous version, but then on 8/18 I stopped getting events without changing anything. The upgrade was done on 8/7, so it worked fine for 11 days before this happened.
After starting with docker-compose up to watch the logs, I can see this error repeatedly:
worker_1 | 20:33:00 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:6ce4d190b4244e4aa956e0690eeda3ff:6')
worker_1 | 20:33:00 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:8128b458011849a189e4c03e82184e9c:6')
It seems to run this for a very long time, then goes to 0% CPU after a while, and then never processes any new events.
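If you only want the worker's output instead of bringing the whole stack to the foreground, following just that service's logs is lighter (standard docker-compose usage, nothing Sentry-specific):

# follow only the worker, starting from the last 100 lines
docker-compose logs -f --tail=100 worker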
I was very fortunate to have saved a snapshot of the instance prior to upgrading. I've reverted to 10.1.0 and it's working just fine again.
I don't have OOM errors either; the VM has 16 GB.
set -- snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
set gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
exec gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
2020-08-27 10:47:22,943 New partitions assigned: {Partition(topic=Topic(name='ingest-sessions'), index=0): 0}
2020-08-27 10:47:23,109 Partitions revoked: [Partition(topic=Topic(name='ingest-sessions'), index=0)]
'[' c = - ']'
snuba consumer --help
set -- snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
set gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
exec gosu snuba snuba consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
%3|1598525451.691|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 5ms in state CONNECT)
%3|1598525451.695|FAIL|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 3ms in state CONNECT)
%3|1598525452.686|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1598525452.691|FAIL|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-08-27 10:51:03,877 New partitions assigned: {Partition(topic=Topic(name='ingest-sessions'), index=0): 0}
What does "New partitions assigned" mean? Does it manage to connect in the end or not?
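"New partitions assigned" means the consumer did join its group and was handed the partition; whether it actually keeps consuming can be checked from the Kafka side via consumer-group lag. A rough example, assuming the stock kafka service name from docker-compose.yml (the group name below is only the usual snuba default, use whatever --list prints):

# list consumer groups, then inspect offsets/lag for one of them
docker-compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list
docker-compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers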
Still with the same issue…
I do a git pull && ./install.sh every day to see if it gets fixed, but still nothing.
Same problem here :c
Same problem here, but restarting only the worker solves the problem for a few hours.
We didn't have any problems with 20.7.2. The problems started with 20.8.
After restarting the worker, I have tons of these messages in the worker log:
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:50e084e07290492ca85fe87a269f3a4f:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:e2620e41ca894f47ba547595dc3f3284:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:d64d2ab997a1414385259e0a1762aa3b:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:66d775ad13ef4bb68b63c59b82b2851f:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:ad3b2445331947769e4da8d8d340c146:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:d441cebf14724d169ab84f4b49d8d039:3')
05:42:42 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:9c3a9528afd14db6b1837e2e5ad448e2:3')
same problem
I now restart the worker every 3~4 hours
I also have a strange problem: some errors (from a few days ago) show up from time to time, even though that project has been stopped.
I don't know whether this is a Kafka error or a configuration error on my side.
BTW, there were some connection errors in sentry_install_log:
%3|1598057437.870|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.5:9092 failed: Connection refused (after 5ms in state CONNECT)
%3|1598057438.865|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
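Those bootstrap failures usually just mean the broker was not accepting connections yet when snuba ran its bootstrap; a quick way to see whether Kafka eventually came up healthy (plain docker-compose, nothing Sentry-specific):

# is the kafka container up, and does its recent log look clean?
docker-compose ps kafka
docker-compose logs --tail=50 kafka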
I had upgraded Sentry on another project with an identical configuration and upgrade process.
That instance is not having the same issue. The only real difference between the two is that the second one has a much lower event volume, roughly 1% of the events of the instance that is having the issue, so I suspect it may be related to load/event volume.
Try my trick: add web as a dependency of relay in docker-compose.yml:
  relay:
    << : *restart_policy
    image: '$RELAY_IMAGE'
    volumes:
      - type: bind
        read_only: true
        source: ./relay
        target: /work/.relay
    depends_on:
      - kafka
      - redis
      - web
This trick ensures the relay service only starts after web is up, so the upstream destination of relay is reachable.
Did not work for me… thanks for sharing!
Restarting the worker container every hour makes the following error disappear:
Background workers haven't checked in recently. It seems that you have a backlog of 80 tasks. Either your workers aren't running or you need more capacity.
But that's all… For more than 15 days I haven't had any error events coming in…
I'm trying git pull && ./install.sh frequently, hoping for an update.