I have Sentry 10.1.0.dev0 installed on a Debian 9 box via the install script, and it stopped receiving events today. This is separate from, but related to, my earlier thread, since I previously had everything running fine. I did upgrade my Sentry server using install.sh; if you need the install log, let me know, though it’s too big to post here.
Ran docker-compose ps:
Name Command State Ports
-----------------------------------------------------------------------------------------------------------------------
sentry_onpremise_clickhouse_1 /entrypoint.sh Up 8123/tcp, 9000/tcp, 9009/tcp
sentry_onpremise_cron_1 /bin/sh -c exec /docker-en ... Up 9000/tcp
sentry_onpremise_kafka_1 /etc/confluent/docker/run Up 9092/tcp
sentry_onpremise_memcached_1 docker-entrypoint.sh memcached Up 11211/tcp
sentry_onpremise_post-process-forwarder_1 /bin/sh -c exec /docker-en ... Up 9000/tcp
sentry_onpremise_postgres_1 docker-entrypoint.sh postgres Up 5432/tcp
sentry_onpremise_redis_1 docker-entrypoint.sh redis ... Up 6379/tcp
sentry_onpremise_sentry-cleanup_1 /entrypoint.sh 0 0 * * * g ... Up 9000/tcp
sentry_onpremise_smtp_1 docker-entrypoint.sh exim ... Up 25/tcp
sentry_onpremise_snuba-api_1 ./docker_entrypoint.sh api Up 1218/tcp
sentry_onpremise_snuba-cleanup_1 /entrypoint.sh */5 * * * * ... Up 1218/tcp
sentry_onpremise_snuba-consumer_1 ./docker_entrypoint.sh con ... Restarting
sentry_onpremise_snuba-outcomes-consumer_1 ./docker_entrypoint.sh con ... Restarting
sentry_onpremise_snuba-replacer_1 ./docker_entrypoint.sh rep ... Up 1218/tcp
sentry_onpremise_symbolicator-cleanup_1 /entrypoint.sh 55 23 * * * ... Up 3021/tcp
sentry_onpremise_symbolicator_1 /bin/bash /docker-entrypoi ... Up 3021/tcp
sentry_onpremise_web_1 /bin/sh -c exec /docker-en ... Up 127.0.0.1:9000->9000/tcp
sentry_onpremise_worker_1 /bin/sh -c exec /docker-en ... Up 9000/tcp
sentry_onpremise_zookeeper_1 /etc/confluent/docker/run Up 2181/tcp, 2888/tcp, 3888/tcp
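Both snuba consumers are stuck in a Restarting loop. A quick way to see why they keep dying is to tail their logs, roughly like this (service names taken from the listing above, so adjust if yours differ):

docker-compose logs --tail=100 snuba-consumer
docker-compose logs --tail=100 snuba-outcomes-consumer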
OK, I’ll try that out. Also, I do get a lot of these messages during install; is this normal?
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-05-11 20:37:19,357 Connection to Kafka failed (attempt 15)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
client.list_topics(timeout=1)
EDIT: I did pull the latest version with git and ran install again; it went through 59 of the above errors and ended with this:
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 11, in <module>
load_entry_point('snuba', 'console_scripts', 'snuba')()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
Cleaning up...
The Kafka container logs show:
kafka_1 | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper/172.19.0.3:2181. Will not attempt to authenticate using SASL (unknown error)
kafka_1 | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /172.19.0.10:37118, server: zookeeper/172.19.0.3:2181
kafka_1 | [main-SendThread(zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server zookeeper/172.19.0.3:2181, unexpected error, closing socket connection and attempting reconnect
kafka_1 | java.io.IOException: Connection reset by peer
kafka_1 | at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
kafka_1 | at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
kafka_1 | at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
kafka_1 | at sun.nio.ch.IOUtil.read(IOUtil.java:192)
kafka_1 | at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
kafka_1 | at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:75)
kafka_1 | at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
kafka_1 | at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
docker-compose logs zookeeper
zookeeper_1 | ===> Launching ...
zookeeper_1 | ===> Launching zookeeper ...
zookeeper_1 | [2020-05-12 15:21:46,729] WARN Either no config or no quorum defined in config, running in standalone mode (org.apache.zookeeper.server.quorum.QuorumPeerMain)
zookeeper_1 | [2020-05-12 15:21:46,842] WARN o.e.j.s.ServletContextHandler@4d95d2a2{/,null,UNAVAILABLE} contextPath ends with /* (org.eclipse.jetty.server.handler.ContextHandler)
zookeeper_1 | [2020-05-12 15:21:46,842] WARN Empty contextPath (org.eclipse.jetty.server.handler.ContextHandler)
zookeeper_1 | [2020-05-12 15:21:46,938] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
zookeeper_1 | java.io.IOException: No snapshot found, but there are log entries. Something is broken!
zookeeper_1 | at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
zookeeper_1 | at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
zookeeper_1 | at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
zookeeper_1 | at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
zookeeper_1 | at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
zookeeper_1 | at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
zookeeper_1 | at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
zookeeper_1 | at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
zookeeper_1 | at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
zookeeper_1 | at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
zookeeper_1 | at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
Literally all I’ve done is git pull and run install.sh. Admittedly, at first I ran the install script before pulling anything down, so maybe that messed something up. At this point it looks like Zookeeper itself is broken.
Yup, this is unfortunately a known issue with Zookeeper. We’ll be adding an automated fix soon, but until then you can run docker volume rm sentry-zookeeper && docker volume create --name sentry-zookeeper to fix it.
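For reference, the full sequence looks roughly like this (stopping the containers first and re-running the installer afterwards are my additions, and removing the Zookeeper log volume as well is a guess rather than a confirmed requirement):

docker-compose down                                       # stop the containers so the volume is not in use
docker volume rm sentry-zookeeper                         # drop the broken Zookeeper data volume
docker volume rm sentry_onpremise_sentry-zookeeper-log    # if present; docker-compose recreates this one on its own
docker volume create --name sentry-zookeeper              # recreate the external volume the compose file expects
./install.sh                                              # bring everything back up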
I just updated the on-premise Sentry instance on our Debian box to Sentry 10.1.0.dev0 by running ./install.sh. I am getting the exact same errors that you posted above.
So I did a git pull to get the latest changes and ran ./install.sh again, which also printed the KafkaException: KafkaError{code=_TRANSPORT [...] multiple times for me.
Unfortunately docker volume rm sentry-zookeeper && docker volume create --name sentry-zookeeper does not do the trick for me.
docker volume ls gives me the following output:
DRIVER VOLUME NAME
local 0a2cb31dab08a438a508b77b51fd144d7ee34226444db254af7296228b69ed61
local 0fbe9c58a7ab7617851c588404a6d71456f5edf977aa52d10d2c86d15910614f
local 3d08e4b57d68dab696e1019bf4e726f093fa49dababe09496e89f4004e35c224
local 4e457d766ecc8d6a78475e08fe658058023d7979defd8da3292382f16ea7a977
local 6a6a487e144fb9c72e325e09d75cb06595692e43fbdf1e73a0bbf8563ab83c4c
local 9f75bf1599442ac7fae2f99e1e2ef0a805bee49c49dcc7a0f212ae3d6cc324a2
local 4478e387ba9b4c32dceca5b600b1d0e5e27f03af6076079fafe19d4a2495307e
local 77209715264f3b3a0cc19030be5d37cdd8c34e1c2c1a0608e658ff4ce807079b
local ad8109ba3de00d22efdaf0b31864e9903b15b61049ff9e82e568ddf0945142e1
local b856f5fca4f2338a9dc10a3f84a086ddbca509df216fdd6197996b051c729020
local d6f19963dddb57d10abaec575802a5e160b08f146a9709bafa276d4e6512a9a7
local dfa20d838e74a27e1bf23163df455312097951e59cc11b2473248f5a86bfa6a5
local dff80ad769afea9ce08e6ac4b406844418107d4973bd0228cb2b833a124fda03
local e8b2ce88ad2613ee77053a4531d17704d0be45aea939bb104951a116ce4da9cb
local nextcloud_db
local nextcloud_nextcloud
local sentry-clickhouse
local sentry-data
local sentry-kafka
local sentry-postgres
local sentry-redis
local sentry-symbolicator
local sentry-zookeeper
local sentry_onpremise_sentry-clickhouse-log
local sentry_onpremise_sentry-kafka-log
local sentry_onpremise_sentry-secrets
local sentry_onpremise_sentry-smtp
local sentry_onpremise_sentry-smtp-log
local sentry_onpremise_sentry-zookeeper-log
After running docker volume rm sentry-zookeeper && docker volume create --name sentry-zookeeper I started all containers using docker-compose up, but I still get the java.io.IOException: No snapshot found, but there are log entries. Something is broken! message for zookeeper_1.
So I removed the Kafka and Zookeeper volumes with docker volume rm and then recreated them with docker volume create --name <volume_name>.
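Concretely, that was something along these lines (I am reconstructing from memory; volume names as in the docker volume ls output above):

docker-compose down
docker volume rm sentry-kafka sentry-zookeeper
docker volume create --name sentry-kafka
docker volume create --name sentry-zookeeper
docker-compose up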
Now Kafka and Zookeeper appear to be working again (both were stuck in a restart loop before). However, relay is giving me issues now:
relay_1 | 2020-05-15T11:20:23Z [relay::cli] ERROR: relay has no credentials, which are required in managed mode. Generate some with "relay credentials generate" first.
I realized there is a relay folder now within the onpremise folder, which contains a config.yml. I’m not quite sure how to generate credentials; could you help me out with that one?
Edit:
I tried to generate the relay credentials by running:
docker exec -it sentry_onpremise_relay_1 /bin/bash
root@b367d6f99467:/work# relay credentials generate
ERROR relay::cli > could not write config file (file /work/.relay/credentials.json)
caused by: Read-only file system (os error 30)
I assume I somehow have to generate a credentials.json within the onpremise/relay/ directory?
Edit 2:
I managed to create the credentials.json by setting the mode in relay/config.yml to proxy, then running relay credentials generate in the container, and then setting the mode back to managed. After starting the container I got
relay_1 | caused by: Permission denied (os error 13)
relay_1 | error: could not open config file (file /work/.relay/credentials.json)
So I chmodded the onpremise/relay/credentials.json to 777.
Now I get the following error: relay_1 | 2020-05-15T11:57:30Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 401 Unauthorized. Where do I have to put the credentials to get this working?
Running ./install.sh should take care of generating the credentials and putting them in the right place for you. Did that not work for some reason? The relevant lines are in install.sh, where the Relay credentials get generated.
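If it helps, the credentials can also be generated by hand by running the Relay image with the on-premise relay/ directory mounted as its config directory. The sketch below is from memory (image tag and exact invocation may differ); the /work/.relay path is the one from your error message:

docker run --rm -it \
  -v "$(pwd)/relay:/work/.relay" \
  getsentry/relay:latest \
  credentials generate

Run it from the on-premise checkout so the generated credentials.json ends up next to relay/config.yml.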
I’d recommend deleting the credentials file, undoing your permission changes (444 should suffice), and running ./install.sh again to see if that helps. Make sure you have the latest version of the on-premise repo before running it, though.
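In other words, from the on-premise checkout, something like this (assuming default paths and that nothing else was changed):

cd onpremise                   # or wherever your checkout lives
git pull                       # make sure you are on the latest on-premise version
rm relay/credentials.json      # drop the hand-made credentials file
./install.sh                   # let the installer regenerate the credentials and restart everything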
I tried running ./install.sh with the latest version of on-premise yesterday, and it somehow did not work correctly.
The odd part is that, for some reason I still haven’t figured out, the install script had not generated the credentials file on that run. I removed my self-generated credentials.json and re-ran ./install.sh yesterday, and this time it generated the credentials.json and everything fired up correctly (as far as I can tell).
At first, at least one project was not able to receive events; I have no idea why, but I re-ran ./install.sh after rebooting the machine and now everything seems to be working again. I’ve just fired a test exception and it is being logged in Sentry.
3/10 projects have received events so far; I am going to test the other projects on Monday if they do not show any log entries by then.