Sentry no longer catches errors

Since yesterday, our Sentry instance hasn't been working: the web app is up, but no events are being captured.
I've run ./install.sh to upgrade to the latest version.
I've also run the cleanup command: /usr/bin/docker-compose --file /home/sentry/onpremise/docker-compose.yml exec worker sentry cleanup --days 30

My server runs its own nginx instance that listens on 443, uses our SSL certificates, and proxies requests to Sentry on port 9000. Maybe that is now redundant since an nginx container exists (but I don't know how to configure that one to listen on host:443 and use our certificates).
I don't think this is the origin of the problem, though.
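A quick way to rule the host reverse proxy in or out is to compare what nginx returns on 443 with what Sentry answers directly on port 9000. A minimal sketch, run on the host itself (the hostname is the one that appears in the logs below):

    # through the host nginx (TLS on 443)
    curl -sk -o /dev/null -w '%{http_code}\n' https://log.tomhealth.fr/
    # directly against the Sentry web port
    curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:9000/

If both commands return the same status code, the proxy layer is most likely fine.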

Here are the logs:

sentry@vps560644:~/onpremise$ docker-compose logs -f | grep error -i
clickhouse_1               | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
clickhouse_1               | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
kafka_1                    | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper/172.24.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
kafka_1                    | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket error occurred: zookeeper/172.24.0.2:2181: Connection refused
kafka_1                    | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper/172.24.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
kafka_1                    | [2020-07-22 14:20:35,165] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-22 from OfflinePartition to OnlinePartition (state.change.logger)
postgres_1                 | ERROR:  relation "south_migrationhistory" does not exist at character 15
kafka_1                    | [2020-07-22 14:20:35,185] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-30 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | [2020-07-22 14:20:35,188] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-8 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | [2020-07-22 14:20:35,189] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-21 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | [2020-07-22 14:20:35,190] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-4 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | [2020-07-22 14:20:35,191] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition outcomes-0 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | [2020-07-22 14:20:35,191] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-27 from OfflinePartition to OnlinePartition (state.change.logger)
relay_1                    | 2020-07-22T14:57:26Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.24.0.10:9092 failed: Connection refused (after 36ms in state CONNECT)
relay_1                    | 2020-07-22T14:57:26Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                    | 2020-07-22T14:57:27Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.24.0.10:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                    | 2020-07-22T14:57:27Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                    | 2020-07-22T14:57:27Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    | 2020-07-22T14:57:27Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    | 2020-07-22T14:57:29Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    | 2020-07-22T14:57:31Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: No route to host (os error 113)
relay_1                    |   caused by: No route to host (os error 113)
relay_1                    |   caused by: No route to host (os error 113)
relay_1                    | 2020-07-22T14:57:33Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-07-22T14:57:36Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-07-22T14:57:41Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
kafka_1                    | [2020-07-22 14:20:35,192] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-7 from OfflinePartition to OnlinePartition (state.change.logger)
...
kafka_1                    | [2020-07-22 14:20:35,220] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-2 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | [2020-07-22 14:20:35,221] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition errors-replacements-0 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | kafka.common.StateChangeFailedException: Failed to elect leader for partition errors-replacements-0 under strategy OfflinePartitionLeaderElectionStrategy(false)
kafka_1                    | [2020-07-22 14:20:35,221] ERROR [Controller id=1002 epoch=20] Controller 1002 epoch 20 failed to change state for partition __consumer_offsets-43 from OfflinePartition to OnlinePartition (state.change.logger)
...
relay_1                    | 2020-07-22T14:57:49Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-07-22T14:57:49Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
nginx_1                    | 2020/07/22 14:57:49 [error] 6#6: *5 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /organizations/sentry/projects/ HTTP/1.0", upstream: "http://172.24.0.21:9000/organizations/sentry/projects/", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/issues/?project=10&query=is%3Aunresolved&statsPeriod=14d"
kafka_1                    | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper/172.24.0.6:2181. Will not attempt to authenticate using SASL (unknown error)
nginx_1                    | 2020/07/22 14:57:50 [error] 6#6: *7 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /favicon.ico HTTP/1.0", upstream: "http://172.24.0.21:9000/favicon.ico", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/projects/"
relay_1                    | 2020-07-22T14:57:50Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
nginx_1                    | 2020/07/22 14:57:51 [error] 6#6: *9 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /organizations/sentry/projects/ HTTP/1.0", upstream: "http://172.24.0.21:9000/organizations/sentry/projects/", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/issues/?project=10&query=is%3Aunresolved&statsPeriod=14d"
nginx_1                    | 2020/07/22 14:57:51 [error] 6#6: *11 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /favicon.ico HTTP/1.0", upstream: "http://172.24.0.21:9000/favicon.ico", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/projects/"
nginx_1                    | 2020/07/22 14:57:52 [error] 6#6: *15 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /organizations/sentry/projects/ HTTP/1.0", upstream: "http://172.24.0.21:9000/organizations/sentry/projects/", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/issues/?project=10&query=is%3Aunresolved&statsPeriod=14d"
relay_1                    | 2020-07-22T14:57:52Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
nginx_1                    | 2020/07/22 14:57:52 [error] 6#6: *17 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /favicon.ico HTTP/1.0", upstream: "http://172.24.0.21:9000/favicon.ico", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/projects/"
nginx_1                    | 2020/07/22 14:57:52 [error] 6#6: *19 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /organizations/sentry/projects/ HTTP/1.0", upstream: "http://172.24.0.21:9000/organizations/sentry/projects/", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/issues/?project=10&query=is%3Aunresolved&statsPeriod=14d"
nginx_1                    | 2020/07/22 14:57:53 [error] 6#6: *21 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /favicon.ico HTTP/1.0", upstream: "http://172.24.0.21:9000/favicon.ico", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/projects/"
nginx_1                    | 2020/07/22 14:57:53 [error] 6#6: *23 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /organizations/sentry/projects/ HTTP/1.0", upstream: "http://172.24.0.21:9000/organizations/sentry/projects/", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/issues/?project=10&query=is%3Aunresolved&statsPeriod=14d"
nginx_1                    | 2020/07/22 14:57:53 [error] 6#6: *25 connect() failed (111: Connection refused) while connecting to upstream, client: 172.24.0.1, server: , request: "GET /favicon.ico HTTP/1.0", upstream: "http://172.24.0.21:9000/favicon.ico", host: "log.tomhealth.fr", referrer: "https://log.tomhealth.fr/organizations/sentry/projects/"
relay_1                    | 2020-07-22T14:57:54Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
post-process-forwarder_1   | %3|1595429876.376|ERROR|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.24.0.10:9092 failed: Connection refused
post-process-forwarder_1   | %3|1595429876.377|ERROR|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
post-process-forwarder_1   | %3|1595429876.378|ERROR|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.24.0.10:9092 failed: Connection refused
post-process-forwarder_1   | %3|1595429876.378|ERROR|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
ingest-consumer_1          | %3|1595429876.427|ERROR|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.24.0.10:9092 failed: Connection refused
ingest-consumer_1          | %3|1595429876.427|ERROR|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
relay_1                    | 2020-07-22T14:57:58Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-07-22T14:58:03Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-07-22T14:58:05Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
kafka_1                    | [2020-07-22 14:58:06,615] ERROR [Controller id=1002 epoch=21] Controller 1002 epoch 21 failed to change state for partition __consumer_offsets-22 from OfflinePartition to OnlinePartition (state.change.logger)
...
kafka_1                    | [2020-07-22 14:58:06,766] ERROR [Controller id=1002 epoch=21] Controller 1002 epoch 21 failed to change state for partition __consumer_offsets-24 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | [2020-07-22 14:58:06,770] ERROR [Controller id=1002 epoch=21] Controller 1002 epoch 21 failed to change state for partition cdc-0 from OfflinePartition to OnlinePartition (state.change.logger)
...
kafka_1                    | [2020-07-22 14:58:06,832] ERROR [Controller id=1002 epoch=21] Controller 1002 epoch 21 failed to change state for partition errors-replacements-0 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1                    | kafka.common.StateChangeFailedException: Failed to elect leader for partition errors-replacements-0 under strategy OfflinePartitionLeaderElectionStrategy(false)
kafka_1                    | [2020-07-22 14:58:06,832] ERROR [Controller id=1002 epoch=21] Controller 1002 epoch 21 failed to change state for partition __consumer_offsets-43 from OfflinePartition to OnlinePartition (state.change.logger)
...
relay_1                    | 2020-07-22T14:58:10Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-07-22T14:58:22Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-07-22T14:58:22Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
...

This looks like migrations didn't run. Try running sentry upgrade or sentry django migrate from one of the Sentry containers.
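For reference, a minimal sketch of running those commands, assuming the default onpremise layout and the web service name from docker-compose.yml:

    cd /home/sentry/onpremise
    docker-compose exec web bash
    # inside the container:
    sentry upgrade
    # or, to run only the Django migrations:
    sentry django migrate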

I tried both commands in a container, but both return the same thing: no migrations to apply.

sentry upgrade
07:13:38 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
07:13:43 [INFO] sentry.plugins.github: apps-not-configured
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, jira_ac, nodestore, sentry, sessions, sites, social_auth
Running migrations:
  No migrations to apply.
Creating missing DSNs
Correcting Group.num_comments counter
root@0b2cfdadfb2b:/# sentry django migrate
07:15:49 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
07:15:52 [INFO] sentry.plugins.github: apps-not-configured
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, jira_ac, nodestore, sentry, sessions, sites, social_auth
Running migrations:
  No migrations to apply.

I've installed a new instance on WSL2 to see how a fresh install behaves.
That instance works fine (it captures events and displays them in the project), but I see the same kind of errors in its relay container:

2020-07-23T16:38:04Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.9:9092 failed: Connection refused (after 88ms in state CONNECT)
2020-07-23T16:38:04Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-07-23T16:38:04Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.9:9092 failed: Connection refused (after 0ms in state CONNECT)
2020-07-23T16:38:04Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-07-23T16:38:06Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-07-23T16:38:07Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-07-23T16:38:08Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-07-23T16:38:09Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-07-23T16:38:11Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-07-23T16:38:15Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-07-23T16:38:20Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)

Version of this (WSL2) release: Sentry 20.8.0.dev01957e9e
vs.
Version of the release installed in production: 20.8.0.dev05f081c2

If I run install.sh on the production server, I get these errors in the logs:

Creating sentry_onpremise_kafka_1      ... done
+ '[' b = - ']'
+ snuba bootstrap --help
+ set -- snuba bootstrap --force
+ set gosu snuba snuba bootstrap --force
+ exec gosu snuba snuba bootstrap --force
2020-07-24 07:07:45,746 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:07:47,751 Connection to Kafka failed (attempt 1)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:07:49,759 Connection to Kafka failed (attempt 2)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:07:51,762 Connection to Kafka failed (attempt 3)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:07:53,766 Connection to Kafka failed (attempt 4)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:07:55,769 Connection to Kafka failed (attempt 5)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:07:57,775 Connection to Kafka failed (attempt 6)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:07:59,779 Connection to Kafka failed (attempt 7)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}2020-07-24 07:08:00,947 Failed to create topic ingest-sessions
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'ingest-sessions' already exists."}
2020-07-24 07:08:00,948 Failed to create topic events
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'events' already exists."}
2020-07-24 07:08:00,948 Failed to create topic event-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'event-replacements' already exists."}
2020-07-24 07:08:00,949 Failed to create topic snuba-commit-log
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'snuba-commit-log' already exists."}
2020-07-24 07:08:00,949 Failed to create topic cdc
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'cdc' already exists."}
2020-07-24 07:08:00,950 Failed to create topic errors-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'errors-replacements' already exists."}
2020-07-24 07:08:00,950 Failed to create topic outcomes
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'outcomes' already exists."}

One more piece of information: if I call captureMessage, I receive an eventId, so I imagine the event is stored somewhere, but it is never picked up by the rest of the Sentry stack.
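Note that the event ID returned by captureMessage is typically generated client-side by the SDK, so getting one back only means the event left the application, not that it made it through ingestion. One way to see whether events arrive but are never processed is to check consumer lag on the events topic; a hedged sketch, assuming the default service and group names used elsewhere in this thread:

    cd /home/sentry/onpremise
    # if Kafka itself is down, this will fail, which is a signal in itself
    docker-compose exec kafka \
      kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 \
      --group snuba-consumers --describe

A growing LAG column with no active members would confirm that events reach Kafka but are never consumed.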

These logs suggest that your Kafka instance is having trouble staying up or being reached. Are you sure you have enough resources on the machine you are running Sentry on?
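A quick way to check on the host is to look at free memory, free disk space, and per-container usage; a minimal sketch:

    free -h                    # available RAM and swap
    df -h                      # free disk space on the Docker host
    docker stats --no-stream   # CPU and memory per container

Kafka and ClickHouse in particular tend to misbehave or crash-loop once the host runs out of memory or disk space.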

I thought I had enough resources.
Without any more clues, I reset the server and installed everything again (50 GB HDD, 8 GB RAM).
I hope that will be enough.

And it happened again: yesterday at around 10:00 it stopped capturing events.
I have absolutely no idea how to fix this.

I'm trying to run /usr/bin/docker-compose --file /home/sentry/onpremise/docker-compose.yml exec worker sentry cleanup --days 30, but I'm not confident in what I'm doing.
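If you are unsure what the cleanup command will touch, you can at least inspect its options first (same compose file and worker service as above):

    /usr/bin/docker-compose --file /home/sentry/onpremise/docker-compose.yml \
      exec worker sentry cleanup --help

With --days 30 it deletes events and related data older than 30 days; that frees disk space but does not by itself restart any stuck consumers.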

I had the same problem last month as well. To make it run again I used:
docker-compose stop
docker-compose rm
docker-compose up -d
Rebuilding everything helped me (and patching to the newest version finally fixed it for me).
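If you go this route, it can help to confirm that everything actually came back up before moving on; a minimal sketch, run from the onpremise directory:

    docker-compose ps                            # every service should be Up, none Restarting
    docker-compose logs --tail=50 relay kafka    # the usual startup retries should settle down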


In some cases, the Post Process Forwarder gets into a bad state and you need to do some manual cleanup.

From Post Process Forwarder - KafkaError "Offset Out of Range" · Issue #478 · getsentry/self-hosted · GitHub

(This assumes you haven't changed the default number of instances in the main docker-compose file and that you are in the onpremise directory; a consolidated sketch of the same steps follows the list.)

  1. Stop Sentry

    docker-compose stop

  2. Start only Zookeeper and Kafka

    docker start sentry_onpremise_kafka_1 sentry_onpremise_zookeeper_1

  3. Hop into the Kafka instance interactively

    docker exec -it sentry_onpremise_kafka_1 /bin/bash

  4. List the consumer groups

    kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --list

  5. Get group info

    kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --group snuba-post-processor --describe

  6. Set the offsets to latest

    kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --group snuba-post-processor --topic events --reset-offsets --to-latest --execute

  7. Exit the Kafka instance

    exit

  8. Stop Zookeeper and Kafka

    docker stop sentry_onpremise_kafka_1 sentry_onpremise_zookeeper_1

  9. Restart the entire stack

    docker-compose up

  10. Check the logs. You should see the standard retry errors as it starts up, but the continuous restarts should be resolved.
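
A consolidated sketch of the same procedure, assuming the default sentry_onpremise_* container names and the onpremise directory (the reset targets the snuba-post-processor group on the events topic, as above):

    docker-compose stop
    docker start sentry_onpremise_kafka_1 sentry_onpremise_zookeeper_1
    # give Kafka a few seconds to accept connections before the reset
    docker exec sentry_onpremise_kafka_1 \
      kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 \
      --group snuba-post-processor --topic events \
      --reset-offsets --to-latest --execute
    docker stop sentry_onpremise_kafka_1 sentry_onpremise_zookeeper_1
    docker-compose up -d    # or plain 'up' to watch startup in the foreground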

In the middle of an upgrade from Sentry 9.0.0 to 21.5.1 (currently in a testing state, so no real load), I ran into this (infamous) Sentry issue.

In our case it’s these two Docker containers restarting:

sentry_onpremise_snuba-subscription-consumer-events_1         ./docker_entrypoint.sh sub ...   Restarting
sentry_onpremise_snuba-subscription-consumer-transactions_1   ./docker_entrypoint.sh sub ...   Restarting

The steps described above do not fix the issue, but I have a feeling I'm looking at the wrong consumers.
Running the steps as described above (and/or the ones here) gives this output, but does not fix the issue:

root@sentry0-dev:/srv/getsentry/onpremise# doco ps
            Name                        Command                    State                      Ports
---------------------------------------------------------------------------------------------------------------
sentry_onpremise_kafka_1       /etc/confluent/docker/run   Up (health: starting)   9092/tcp
sentry_onpremise_zookeeper_1   /etc/confluent/docker/run   Up (health: starting)   2181/tcp, 2888/tcp, 3888/tcp
root@sentry0-dev:/srv/getsentry/onpremise# docker exec -it sentry_onpremise_kafka_1 /bin/bash
root@3c2256caf236:/#
root@3c2256caf236:/# kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --list
transactions_group
snuba-replacers
query-subscription-consumer
snuba-consumers
subscriptions-commit-log-ffd8f6aac5fa11ebb1090242ac140015
snuba-post-processor:sync:23f54096c5f811eb876c0242ac140016
ingest-consumer
subscriptions-commit-log-29027fa6c5fb11ebab100242ac140015
snuba-transactions-subscriptions-consumers
subscriptions-commit-log-d607d3f0c5fa11ebb2dc0242ac140012
snuba-events-subscriptions-consumers
snuba-post-processor
subscriptions-commit-log-d6a75a4cc5fa11eb89d20242ac140015
subscriptions-commit-log-31864046c5fa11eb80830242ac140015
snuba-subscription-consumer-events
subscriptions-commit-log-286de3f0c5fb11ebb1070242ac140012
subscriptions-commit-log-ff3f05eac5fa11ebac400242ac140012
snuba-subscription-consumer-transactions
root@3c2256caf236:/#
root@3c2256caf236:/# kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --group snuba-post-processor -describe

Consumer group 'snuba-post-processor' has no active members.

GROUP                TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
snuba-post-processor events          0          248453          248453          0               -               -               -
root@3c2256caf236:/# kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --group snuba-post-processor --topic events --reset-offsets --to-latest --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-post-processor           events                         0          248453
root@3c2256caf236:/# exit
exit
root@sentry0-dev:/srv/getsentry/onpremise# docker stop sentry_onpremise_kafka_1 sentry_onpremise_zookeeper_1
sentry_onpremise_kafka_1
sentry_onpremise_zookeeper_1

What should I run instead to get these two containers to behave?

UPDATE: see also my answer below. Pick these groups:

  • snuba-transactions-subscriptions-consumers
  • snuba-events-subscriptions-consumers

Answering my own question after getting it fixed.

A list of all available groups & topics might give something more to hold on to:

docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --all-groups --all-topics --describe

The output is too big / too wide to include here.
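If you only care about the two subscription consumer groups, one way (a hedged sketch) is to filter that same output:

    docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 \
      --all-groups --all-topics --describe \
      | grep -E 'snuba-(events|transactions)-subscriptions'   # drops the header rows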

But from the output I picked snuba-transactions-subscriptions-consumers here:

$ docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-transactions-subscriptions-consumers --topic events --reset-offsets --to-latest --dry-run
Creating sentry_onpremise_kafka_run ... done

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-transactions-subscriptions-consumers events                         0          248453

$ docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-transactions-subscriptions-consumers --topic events --reset-offsets --to-latest --execute
Creating sentry_onpremise_kafka_run ... done

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-transactions-subscriptions-consumers events                         0          248453

And for snuba-events-subscriptions-consumers here:

$  docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-events-subscriptions-consumers --topic events --reset-offsets --to-latest --dry-run
Creating sentry_onpremise_kafka_run ... done

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-events-subscriptions-consumers events                         0          248453
$ docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-events-subscriptions-consumers --topic events --reset-offsets --to-latest --execute
Creating sentry_onpremise_kafka_run ... done

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-events-subscriptions-consumers events                         0          248453

Afterwards I saw this in the compose logs:

snuba-subscription-consumer-transactions_1 | 2021-06-05 13:40:35,402 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 248453}

and

snuba-subscription-consumer-events_1 | 2021-06-05 13:48:33,257 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 248453}

Hope this helps anyone else running into comparable issues.

