Sentry did not access new events

Hello

Today my on-premise Sentry experienced a high load (300,000 events instead of the usual 2000 per hour) and after I sorted out the events, I stopped accepting new ones. I can send an event via the sentry-cli, get the event ID in response, but it does not appear either in the sentry or in the nginx\postgres logs. How can I diagnose the problem?
Thank you,
Best regards,
Roman

Very likely that your Kafka instance is clogged up and stuck hence the events are now being pooled in relay. You may want to check this guide: https://develop.sentry.dev/self-hosted/troubleshooting/#kafka

Thank you for fast reply.
Kafka is not stuck.
docker logs kafka says:

docker kafka logs

===> Launching …
===> Launching kafka …
[2021-09-01 15:08:37,138] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2021-09-01 15:08:37,755] WARN The package io.confluent.support.metrics.collectors.FullCollector for collecting the full set of support metrics could not be loaded, so we are reverting to anonymous, basic metric collection. If you are a Confluent customer, please refer to the Confluent Platform documentation, section Proactive Support, on how to activate full metrics collection. (io.confluent.support.metrics.KafkaSupportConfig)
[2021-09-01 15:08:37,755] WARN The support metrics collection feature (“Metrics”) of Proactive Support is disabled. (io.confluent.support.metrics.SupportedServerStartable)
[2021-09-01 15:08:38,654] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2021-09-01 15:08:38,712] INFO [kafka-log-cleaner-thread-0]: Starting (kafka.log.LogCleaner)
[2021-09-01 15:08:39,012] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2021-09-01 15:08:39,046] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(0.0.0.0,9092,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.network.SocketServer)
[2021-09-01 15:08:39,048] INFO [SocketServer brokerId=1001] Started 1 acceptor threads for data-plane (kafka.network.SocketServer)
[2021-09-01 15:08:39,147] INFO Creating /brokers/ids/1001 (is it secure? false) (kafka.zk.KafkaZkClient)
[2021-09-01 15:08:39,168] INFO Stat of the created znode at /brokers/ids/1001 is: 2039,2039,1630508919158,1630508919158,1,0,0,72057595854389251,180,0,2039
(kafka.zk.KafkaZkClient)
[2021-09-01 15:08:39,169] INFO Registered broker 1001 at path /brokers/ids/1001 with addresses: ArrayBuffer(EndPoint(kafka,9092,ListenerName(PLAINTEXT),PLAINTEXT)), czxid (broker epoch): 2039 (kafka.zk.KafkaZkClient)
[2021-09-01 15:08:39,410] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2021-09-01 15:08:39,495] INFO [SocketServer brokerId=1001] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)

After kafka troubleshooting steps it says:

Attempt of Kafka recovery

[root@sentry]# docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list
Starting sentry_onpremise_zookeeper_1 … done
snuba-consumers
ingest-consumer
snuba-transactions-subscriptions-consumers
transactions_group
snuba-events-subscriptions-consumers
snuba-post-processor
subscriptions-commit-log-6cae903e0b3611ec81460242ac130017
snuba-post-processor:sync:7089f3240b3611ecbd800242ac130013
subscriptions-commit-log-6cac252e0b3611eca9f20242ac130007
snuba-replacers
query-subscription-consumer
[root@sentry]# docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers -describe
Starting sentry_onpremise_zookeeper_1 … done

GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
snuba-consumers outcomes 0 7556652 7556653 1 rdkafka-f201f3c3-c960-48bf-abc3-19825cf12a7a /172.19.0.17 rdkafka
snuba-consumers events 0 6927857 6927857 0 rdkafka-f9e967d4-315d-4633-94bd-8f53dff2810b /172.19.0.15 rdkafka
snuba-consumers ingest-sessions 0 1374002 1374002 0 rdkafka-f836a3c6-fba3-4d3b-a68f-b36a11e2414c /172.19.0.9 rdkafka
[root@sentry]# docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --dry-run
Starting sentry_onpremise_zookeeper_1 … done

Error: Assignments can only be reset if the group ‘snuba-consumers’ is inactive, but the current state is Stable.

GROUP TOPIC PARTITION NEW-OFFSET
[root@sentry]# docker-compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --execute
Starting sentry_onpremise_zookeeper_1 … done

Error: Assignments can only be reset if the group ‘snuba-consumers’ is inactive, but the current state is Stable.

GROUP TOPIC PARTITION NEW-OFFSET
[root@sentry]#

Also I’ve found relay logs:

Relay docker logs

2021-09-01T20:32:55Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2021-09-01T20:32:57Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2021-09-01T20:32:57Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2021-09-01T20:32:57Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2021-09-01T20:32:58Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 60s
2021-09-01T20:32:58Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: dns error: no record found for name: web...local.dns.domain. type: AAAA class: IN
2021-09-01T20:33:09Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2021-09-01T20:33:11Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2021-09-01T20:33:13Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2021-09-01T20:33:13Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime

Time is correct.

Web service have not any errors at current moment, only info.
Redis logs:

docker redis logs

77:C 01 Sep 2021 20:33:55.797 * RDB: 0 MB of memory used by copy-on-write
1:M 01 Sep 2021 20:33:55.821 * Background saving terminated with success
1:M 01 Sep 2021 20:35:19.648 * 10000 changes in 60 seconds. Saving…
1:M 01 Sep 2021 20:35:19.649 * Background saving started by pid 78
78:C 01 Sep 2021 20:35:19.899 * DB saved on disk
78:C 01 Sep 2021 20:35:19.900 * RDB: 1 MB of memory used by copy-on-write
1:M 01 Sep 2021 20:35:19.950 * Background saving terminated with success
1:M 01 Sep 2021 20:36:33.646 * 10000 changes in 60 seconds. Saving…
1:M 01 Sep 2021 20:36:33.647 * Background saving started by pid 79
79:C 01 Sep 2021 20:36:33.986 * DB saved on disk
79:C 01 Sep 2021 20:36:33.987 * RDB: 1 MB of memory used by copy-on-write
1:M 01 Sep 2021 20:36:34.048 * Background saving terminated with success
1:M 01 Sep 2021 20:37:58.980 * 10000 changes in 60 seconds. Saving…
1:M 01 Sep 2021 20:37:58.981 * Background saving started by pid 80
80:C 01 Sep 2021 20:37:59.323 * DB saved on disk
80:C 01 Sep 2021 20:37:59.323 * RDB: 1 MB of memory used by copy-on-write
1:M 01 Sep 2021 20:37:59.382 * Background saving terminated with success

Ok, recovered Kafka, some of events was sent to the sentry (37 events), but now It’s stuck again and did not process anything

I have added dependency of “web” to docker-compose to block “relay”. Restarted and now all works correctly

Post for solution:

  1. Kafka instance is clogged up and stuck hence the events are now being pooled in relay . You may want to check this guide: https://develop.sentry.dev/self-hosted/troubleshooting/#kafka
  2. I have added dependency of “web” to docker-compose to block “relay”. Restarted and now all works correctly

Thank you, @BYK

1 Like

Ah, we should do this by default now, thanks for the reminder!

Oh wait, we already have it here: onpremise/docker-compose.yml at b32de84d31dd1cc0b9c9534fae2e95ad86a0cbee · getsentry/onpremise · GitHub

Are you sure you have the latest version of the repo?

Yes, i’m sorry, This was added not so long ago, so I didn’t have time to update

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.