After upgrade to 20.9 Sentry showing "info" level events only

Hello

Similar to others, we have run into a problem where Sentry stops working after upgrading to 20.x, with the following symptoms:

  1. Only info-level events are captured and displayed. No error- or fatal-level events show up.
  2. It is traffic-volume related. With heavy Sentry traffic, the problem appears after around 3 days; with lower traffic, it might take a week.
  3. The only way to bring Sentry back into service is to delete the kafka, zookeeper, and redis volumes and run install.sh again. After that, Sentry works again and displays captured error- and fatal-level events, until it next stops.
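For reference, the reset workaround in step 3 boils down to something like the following. The volume names here are assumptions (the defaults created by the onpremise repo); verify yours with `docker volume ls` first, and note this drops any unprocessed events:

```sh
# Stop all containers, wipe Kafka/Zookeeper/Redis state, and reinstall.
# WARNING: deletes queued/unprocessed events. Volume names are assumed;
# check `docker volume ls` on your host before running.
docker-compose down
docker volume rm sentry-kafka sentry-zookeeper sentry-redis
./install.sh
docker-compose up -d
```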

Checking around the forum, I guess the problem may be related to the relay service failing to send requests to web.

So I changed the relay service's log level to TRACE and got the following logs:

relay logs
2020-08-30T09:49:11Z [relay::setup] INFO: launching relay from config folder /work/.relay
2020-08-30T09:49:11Z [relay::setup] INFO:   relay mode: managed
2020-08-30T09:49:11Z [relay::setup] INFO:   relay id: 02e827b1-ca56-4f33-9f97-4c00fdcacac7
2020-08-30T09:49:11Z [relay::setup] INFO:   public key: FSQ5SkT2oFaroJN0eub5pWZmjitFA_Pzu739pCB5kNc
2020-08-30T09:49:11Z [relay::setup] INFO:   log level: TRACE
2020-08-30T09:49:11Z [relay_server::actors::upstream] INFO: upstream relay started
2020-08-30T09:49:11Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:11Z [relay_server::actors::outcome::processing] INFO: OutcomeProducer started.
2020-08-30T09:49:11Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#192.168.128.10:9092 failed: Connection refused (after 21ms in state CONNECT)
2020-08-30T09:49:11Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-08-30T09:49:11Z [relay_server::actors::events] INFO: starting 8 event processing workers
2020-08-30T09:49:11Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#192.168.128.10:9092 failed: Connection refused (after 1ms in state CONNECT)
2020-08-30T09:49:11Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-08-30T09:49:11Z [relay_server::service] INFO: spawning http server
2020-08-30T09:49:11Z [relay_server::service] INFO:   listening on: http://0.0.0.0:3000/
2020-08-30T09:49:11Z [actix_net::server::server] INFO: Starting 8 workers
2020-08-30T09:49:11Z [relay_server::actors::store] INFO: store forwarder started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T09:49:11Z [actix_net::server::server] INFO: Starting server on 0.0.0.0:3000
2020-08-30T09:49:11Z [relay_server::actors::controller] INFO: relay server starting
2020-08-30T09:49:11Z [relay_server::actors::connector] INFO: metered connector started
2020-08-30T09:49:11Z [relay_server::actors::events] INFO: event manager started
2020-08-30T09:49:11Z [relay_server::actors::project_local] INFO: project local cache started
2020-08-30T09:49:11Z [relay_server::actors::project_upstream] INFO: project upstream cache started
2020-08-30T09:49:11Z [relay_server::actors::project_cache] INFO: project cache started
2020-08-30T09:49:11Z [relay_server::actors::project_keys] INFO: project cache started
2020-08-30T09:49:11Z [relay_server::actors::relays] INFO: key cache started
2020-08-30T09:49:14Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:14Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 0 seconds
2020-08-30T09:49:14Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:17Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:17Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 1 seconds
2020-08-30T09:49:18Z [relay_server::actors::project] DEBUG: project 3 initialized without state
2020-08-30T09:49:18Z [relay_server::actors::project] DEBUG: project 3 state requested
2020-08-30T09:49:18Z [relay_server::actors::events] TRACE: queued event
2020-08-30T09:49:18Z [relay_server::actors::project] DEBUG: project 3 state request amended
2020-08-30T09:49:18Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 1)
2020-08-30T09:49:18Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:18Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:18Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:19Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 2)
2020-08-30T09:49:19Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:19Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:21Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 3)
2020-08-30T09:49:21Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:21Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:21Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:21Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 1 seconds
2020-08-30T09:49:23Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:23Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 4)
2020-08-30T09:49:23Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:23Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:26Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:26Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 2 seconds
2020-08-30T09:49:27Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 5)
2020-08-30T09:49:27Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:27Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:28Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:31Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:31Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 3 seconds
2020-08-30T09:49:32Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 6)
2020-08-30T09:49:32Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:32Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:34Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:37Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:37Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 5 seconds
2020-08-30T09:49:39Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 7)
2020-08-30T09:49:39Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:39Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:42Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:45Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:45Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 7 seconds
2020-08-30T09:49:51Z [relay_server::actors::project_upstream] DEBUG: updating project states for 0/0 projects (attempt 8)
2020-08-30T09:49:52Z [relay_server::actors::project] DEBUG: project 3 state requested
2020-08-30T09:49:52Z [relay_server::actors::events] TRACE: queued event
2020-08-30T09:49:52Z [relay_server::actors::project] DEBUG: project 3 state request amended
2020-08-30T09:49:52Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 1)
2020-08-30T09:49:52Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:52Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:53Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:49:53Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 2)
2020-08-30T09:49:53Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:53Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:55Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 3)
2020-08-30T09:49:55Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:55Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:56Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:49:56Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 11 seconds
2020-08-30T09:49:57Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 4)
2020-08-30T09:49:57Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:49:57Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:49:58Z [relay_server::actors::project] DEBUG: project 3 state request amended
2020-08-30T09:49:58Z [relay_server::actors::events] TRACE: queued event
2020-08-30T09:49:58Z [relay_server::actors::project] DEBUG: project 3 state request amended
2020-08-30T09:50:01Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 5)
2020-08-30T09:50:01Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:50:01Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:50:06Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 6)
2020-08-30T09:50:06Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:50:06Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:50:07Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T09:50:10Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-08-30T09:50:10Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 17 seconds
2020-08-30T09:50:14Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 7)
2020-08-30T09:50:14Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T09:50:14Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-08-30T09:50:25Z [relay_server::actors::project_upstream] DEBUG: updating project states for 0/0 projects (attempt 8)
2020-08-30T09:50:25Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
2020-08-30T09:50:25Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
2020-08-30T09:50:25Z [relay_server::actors::outcome::processing] TRACE: handling outcome
2020-08-30T09:50:25Z [relay_server::actors::outcome::processing] TRACE: Tracking kafka outcome: TrackRawOutcome { timestamp: "2020-08-30T09:50:25.764955Z", org_id: None, project_id: ProjectId(3), key_id: None, outcome: 3, reason: Some("internal"), event_id: Some(EventId(345b4e55-0290-4640-b689-fd8ff78119ef)), remote_addr: Some("10.255.255.154"), source: None }
2020-08-30T09:50:25Z [relay_server::actors::outcome::processing] TRACE: handling outcome
2020-08-30T09:50:25Z [relay_server::actors::outcome::processing] TRACE: Tracking kafka outcome: TrackRawOutcome { timestamp: "2020-08-30T09:50:25.764987Z", org_id: None, project_id: ProjectId(3), key_id: None, outcome: 3, reason: Some("internal"), event_id: Some(EventId(aa0cf1bd-5c07-45c1-9d34-49acdbc07cab)), remote_addr: Some("10.255.255.154"), source: None }
2020-08-30T09:50:27Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)

So my guess is that `authentication encountered error: could not send request to upstream` leads to `error fetching project states: attempted to send request while not yet authenticated`, and the events all queue up until some resource is eventually exhausted.
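As an aside, the retry delays in the log above (0, then 1, 1, 2, 3, 5, 7, 11, 17 seconds) match a truncated exponential backoff with base ~1.5. This is just a sketch reconstructed from the observed delays, not Relay's actual code:

```python
def retry_delays(attempts, base=1.5):
    """Delay (in seconds) before each successive retry: int(base ** n).

    Reproduces the 1, 1, 2, 3, 5, 7, 11, 17 ... schedule seen in the
    relay log after the initial immediate (0 s) retry. The base of 1.5
    is inferred from the log, not taken from Relay's source.
    """
    return [int(base ** n) for n in range(attempts)]

print(retry_delays(8))  # [1, 1, 2, 3, 5, 7, 11, 17]
```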

To verify this, since the relay service has web as its upstream, I added that dependency in docker-compose.yml:

  relay:
    << : *restart_policy
    image: '$RELAY_IMAGE'
    volumes:
      - type: bind
        read_only: true
        source: ./relay
        target: /work/.relay
    depends_on:
      - kafka
      - redis
      - web
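Note that a plain `depends_on` only controls start order, not readiness: relay may still come up before web is actually serving. If your Compose file format supports it (v2.1, or newer Compose Spec versions; the v3.x format dropped the long-form condition), a healthcheck can make relay wait until web answers. A sketch, assuming `curl` is available in the web image and the container serves HTTP on port 9000:

```yaml
  web:
    healthcheck:
      # Endpoint and tooling are assumptions; adjust to your image.
      test: ["CMD-SHELL", "curl -sf http://localhost:9000/_health/ || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 5
  relay:
    depends_on:
      web:
        condition: service_healthy
```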

Then I got the following errors, and it seems the problem is solved; I will wait a week to confirm:

connection was reset
2020-08-30T10:28:47Z [relay::setup] INFO: launching relay from config folder /work/.relay
2020-08-30T10:28:47Z [relay::setup] INFO:   relay mode: managed
2020-08-30T10:28:47Z [relay::setup] INFO:   relay id: 02e827b1-ca56-4f33-9f97-4c00fdcacac7
2020-08-30T10:28:47Z [relay::setup] INFO:   public key: FSQ5SkT2oFaroJN0eub5pWZmjitFA_Pzu739pCB5kNc
2020-08-30T10:28:47Z [relay::setup] INFO:   log level: TRACE
2020-08-30T10:28:47Z [relay_server::actors::upstream] INFO: upstream relay started
2020-08-30T10:28:47Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T10:28:47Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#192.168.176.10:9092 failed: Connection refused (after 5ms in state CONNECT)
2020-08-30T10:28:47Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-08-30T10:28:47Z [relay_server::actors::outcome::processing] INFO: OutcomeProducer started.
2020-08-30T10:28:47Z [relay_server::actors::events] INFO: starting 8 event processing workers
2020-08-30T10:28:47Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#192.168.176.10:9092 failed: Connection refused (after 12ms in state CONNECT)
2020-08-30T10:28:47Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-08-30T10:28:47Z [relay_server::service] INFO: spawning http server
2020-08-30T10:28:47Z [relay_server::service] INFO:   listening on: http://0.0.0.0:3000/
2020-08-30T10:28:47Z [actix_net::server::server] INFO: Starting 8 workers
2020-08-30T10:28:47Z [relay_server::actors::store] INFO: store forwarder started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-08-30T10:28:47Z [actix_net::server::server] INFO: Starting server on 0.0.0.0:3000
2020-08-30T10:28:47Z [relay_server::actors::controller] INFO: relay server starting
2020-08-30T10:28:47Z [relay_server::actors::connector] INFO: metered connector started
2020-08-30T10:28:47Z [relay_server::actors::events] INFO: event manager started
2020-08-30T10:28:47Z [relay_server::actors::project_local] INFO: project local cache started
2020-08-30T10:28:47Z [relay_server::actors::project_upstream] INFO: project upstream cache started
2020-08-30T10:28:47Z [relay_server::actors::project_cache] INFO: project cache started
2020-08-30T10:28:47Z [relay_server::actors::project_keys] INFO: project cache started
2020-08-30T10:28:47Z [relay_server::actors::relays] INFO: key cache started
2020-08-30T10:28:47Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-08-30T10:28:47Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 0 seconds
2020-08-30T10:28:47Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T10:28:47Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-08-30T10:28:47Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 1 seconds
2020-08-30T10:28:48Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T10:28:48Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-08-30T10:28:48Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 1 seconds
2020-08-30T10:28:50Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T10:28:50Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-08-30T10:28:50Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 2 seconds
2020-08-30T10:28:52Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-08-30T10:28:56Z [relay_server::actors::upstream] DEBUG: got register challenge (token = DUiIvgwLcnLEn2YULXvO3z7bryMEZl6NEHROz3hBDmQrqTda0DqwaahwnomNk6V5pphlIAiNSZnwa-yeHHZuaA)
2020-08-30T10:28:56Z [relay_server::actors::upstream] DEBUG: sending register challenge response
2020-08-30T10:28:56Z [relay_server::actors::upstream] INFO: relay successfully registered with upstream
2020-08-30T10:29:01Z [relay_server::actors::project] DEBUG: project 3 initialized without state
2020-08-30T10:29:01Z [relay_server::actors::project] DEBUG: project 3 state requested
2020-08-30T10:29:01Z [relay_server::actors::events] TRACE: queued event
2020-08-30T10:29:01Z [relay_server::actors::project] DEBUG: project 3 state request amended
2020-08-30T10:29:01Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 1)
2020-08-30T10:29:01Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-30T10:29:01Z [relay_server::actors::project] DEBUG: project 3 state updated
2020-08-30T10:29:01Z [relay_server::actors::events] TRACE: processing json event
2020-08-30T10:29:01Z [relay_server::actors::events] TRACE: sending envelope to kafka
2020-08-30T10:29:01Z [relay_server::actors::store] TRACE: Sending event item of envelope to kafka
2020-08-30T10:29:52Z [relay_server::actors::events] TRACE: queued event
2020-08-30T10:29:52Z [relay_server::actors::events] TRACE: processing json event
2020-08-30T10:29:52Z [relay_server::actors::events] TRACE: sending envelope to kafka
2020-08-30T10:29:52Z [relay_server::actors::store] TRACE: Sending event item of envelope to kafka
2020-08-30T10:29:58Z [relay_server::actors::events] TRACE: queued event
2020-08-30T10:29:58Z [relay_server::actors::events] TRACE: processing json event
2020-08-30T10:29:58Z [relay_server::actors::events] TRACE: sending envelope to kafka
2020-08-30T10:29:58Z [relay_server::actors::store] TRACE: Sending event item of envelope to kafka

Please kindly take a look at this issue.

Hi @liangrong, thanks for the detailed report. Relay should be able to handle Sentry Web not being available immediately, as its core dependencies are only the data stores. However, we are investigating potential issues per your report.

By the way, can you ensure your sentry.conf.py file matches the following config for SENTRY_WEB_OPTIONS:

Hi

I made no changes to sentry.conf.py. For my installation, I have put all configuration in config.yml, so sentry.conf.py was generated by install.sh with its defaults.


I ran into the same issue with a fresh 20.9 deployment.

update

After hours of heavy load testing, Sentry stopped displaying the events it received. The stop time is around 2020-09-01T04:19:42.901949+00:00, which is the time of the last update toward postgres.

From the logs, we can see that nginx is still actively dispatching the events it receives, and ingest-consumer is also actively forwarding to kafka.

For kafka, there are no error logs at all, which suggests it is still working.

For worker, there are the following logs at the time Sentry stopped displaying events:

worker log
04:06:14 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
04:08:44 [INFO] sentry.tasks.digests: Skipped digest delivery due to empty digest (target_type=u'IssueOwners' project=3L target_identifier=u'None')
04:11:14 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
04:11:14 [INFO] sentry.tasks.update_user_reports: update_user_reports.records_updated (reports_with_event=0 updated_reports=0 reports_to_update=0)
04:16:14 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1047, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 344, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 721, in _brpop_read
    **options)
  File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 403, in read_response
    (e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
04:19:09 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
%3|1598942736.326|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
%3|1598942736.327|ERROR|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
%3|1598942773.160|ERROR|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
%3|1598942773.367|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down

For snuba-consumer, there is a session timeout error towards kafka:

snuba-consumer log
2020-09-01 04:19:49,317 Completed processing <Batch: 7 messages, open for 1.10 seconds>.
2020-09-01 04:19:50,603 Completed processing <Batch: 9 messages, open for 1.18 seconds>.
2020-09-01 04:19:51,789 Completed processing <Batch: 12 messages, open for 1.15 seconds>.
2020-09-01 04:19:53,247 Completed processing <Batch: 5 messages, open for 1.23 seconds>.
%4|1598942903.321|SESSTMOUT|rdkafka#consumer-2| [thrd:main]: Consumer group session timed out (in join-state started) after 480353 ms without a successful response from the group coordinator (broker 1001, last error was Success): revoking assignment and rejoining group
2020-09-01 06:48:23,322 Partitions revoked: [Partition(topic=Topic(name='events'), index=0)]
2020-09-01 06:48:26,125 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 278957}

For snuba-outcomes-consumer, the logs look exactly the same as snuba-consumer's.

For postgres, it seems to be idle, as its logs are no longer refreshing.

After a docker-compose down and up:

  1. Cached error- and fatal-level events are displayed now, but newly arriving ones are still invisible.
  2. Both cached and newly arriving info-level events are displayed.

Then, after nuking the sentry-redis volume, everything is back to normal for now.

update 2

Now Sentry is not displaying events any more, again.

At this moment, from the logs, I can see that the worker service has:

worker log
7:54:59 [INFO] sentry.tasks.update_user_reports: update_user_reports.records_updated (reports_with_event=0 updated_reports=0 reports_to_update=0)
07:56:30 [INFO] sentry.tasks.digests: Skipped digest delivery due to empty digest (project=21L target_type=u'IssueOwners')
07:59:59 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
08:04:59 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
08:09:59 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
08:09:59 [INFO] sentry.tasks.update_user_reports: update_user_reports.records_updated (reports_with_event=0 updated_reports=0 reports_to_update=0)
08:14:59 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 276, in create_loop
    tick_callback()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1040, in on_poll_start
    cycle_poll_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 322, in on_poll_start
    self._register_BRPOP(channel)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 308, in _register_BRPOP
    channel._brpop_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_start
    self.client.connection.send_command('BRPOP', *keys)
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
ConnectionError: Error 32 while writing to socket. Broken pipe.
08:15:52 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 5 unacknowledged message(s)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 276, in create_loop
    tick_callback()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1040, in on_poll_start
    cycle_poll_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 322, in on_poll_start
    self._register_BRPOP(channel)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 308, in _register_BRPOP
    channel._brpop_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_start
    self.client.connection.send_command('BRPOP', *keys)
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
ConnectionError: Error 32 while writing to socket. Broken pipe.
08:16:02 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 4 unacknowledged message(s)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 276, in create_loop
    tick_callback()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1040, in on_poll_start
    cycle_poll_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 322, in on_poll_start
    self._register_BRPOP(channel)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 308, in _register_BRPOP
    channel._brpop_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_start
    self.client.connection.send_command('BRPOP', *keys)
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
ConnectionError: Error 32 while writing to socket. Broken pipe.
08:16:03 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 14 unacknowledged message(s)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1047, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 344, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 721, in _brpop_read
    **options)
  File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 403, in read_response
    (e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
08:16:05 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 1 unacknowledged message(s)

The last Postgres activity is at 2020-09-01T08:15:35.297000+00:00, after which it becomes idle again.
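The repeated "Error 32 while writing to socket. Broken pipe." lines show the worker's `BRPOP` connection to Redis being severed mid-command, after which Celery re-establishes the broker connection and restores the unacknowledged messages. A minimal sketch of that retry pattern (illustrative only, not Celery's actual code; `flaky_brpop` is a stand-in for the real broker call):

```python
import time

def with_retries(fn, attempts=3, delay=0.0, exceptions=(ConnectionError,)):
    """Call fn(), retrying on the given exceptions, similar in spirit to
    Celery's consumer re-establishing its broker connection."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)

# Simulate a broker call that fails twice with a broken pipe, then recovers.
calls = {"n": 0}
def flaky_brpop():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Error 32 while writing to socket. Broken pipe.")
    return "message"

print(with_retries(flaky_brpop))  # recovers on the third attempt
```

The retry itself is not a fix, of course: the constant reconnects suggest the Redis side is dropping connections (for example under memory pressure), so the root cause is on the server.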

update 3

Now Sentry has stopped displaying anything at all.

All docker containers run fine, except:

  1. The postgres service has become idle, with no further database operations; its log stops at
postgres logs
ERROR:  duplicate key value violates unique constraint "sentry_grouprelease_group_id_044354c8_uniq"
DETAIL:  Key (group_id, release_id, environment)=(277, 533, ) already exists.
STATEMENT:  INSERT INTO "sentry_grouprelease" ("project_id", "group_id", "release_id", "environment", "first_seen", "last_seen") VALUES (2, 277, 533, '', '2020-09-02T07:57:06.219000+00:00'::timestamptz, '2020-09-02T07:57:06.219000+00:00'::timestamptz) RETURNING "sentry_grouprelease"."id"
ERROR:  duplicate key value violates unique constraint "sentry_grouprelease_group_id_044354c8_uniq"
DETAIL:  Key (group_id, release_id, environment)=(1162, 175, ) already exists.
STATEMENT:  INSERT INTO "sentry_grouprelease" ("project_id", "group_id", "release_id", "environment", "first_seen", "last_seen") VALUES (3, 1162, 175, '', '2020-09-02T07:57:08.655000+00:00'::timestamptz, '2020-09-02T07:57:08.655000+00:00'::timestamptz) RETURNING "sentry_grouprelease"."id"
ERROR:  duplicate key value violates unique constraint "sentry_eventuser_project_id_377a63c0_uniq"
DETAIL:  Key (project_id, hash)=(3, 81dcb6ea3e90cd50ec27d76f0236e274) already exists.
STATEMENT:  INSERT INTO "sentry_eventuser" ("project_id", "hash", "ident", "email", "username", "name", "ip_address", "date_added") VALUES (3, '81dcb6ea3e90cd50ec27d76f0236e274', '23a814ea543a529c', NULL, NULL, NULL, NULL, '2020-09-02T07:57:59.890618+00:00'::timestamptz) RETURNING "sentry_eventuser"."id"
ERROR:  duplicate key value violates unique constraint "sentry_eventuser_project_id_377a63c0_uniq"
DETAIL:  Key (project_id, hash)=(2, 90ffcaf7dcd7b644aa0d2a223032ed1d) already exists.
STATEMENT:  INSERT INTO "sentry_eventuser" ("project_id", "hash", "ident", "email", "username", "name", "ip_address", "date_added") VALUES (2, '90ffcaf7dcd7b644aa0d2a223032ed1d', '65b809347c2899d2', NULL, NULL, NULL, NULL, '2020-09-02T07:58:08.736572+00:00'::timestamptz) RETURNING "sentry_eventuser"."id"
ERROR:  duplicate key value violates unique constraint "sentry_grouprelease_group_id_044354c8_uniq"
DETAIL:  Key (group_id, release_id, environment)=(190, 487, ) already exists.
STATEMENT:  INSERT INTO "sentry_grouprelease" ("project_id", "group_id", "release_id", "environment", "first_seen", "last_seen") VALUES (2, 190, 487, '', '2020-09-02T07:58:06.579000+00:00'::timestamptz, '2020-09-02T07:58:06.579000+00:00'::timestamptz) RETURNING "sentry_grouprelease"."id"
  2. At the time of the last Postgres insert operation, the worker logged:
worker logs
07:57:38 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.

07:57:42 [INFO] sentry.tasks.digests: Skipped digest delivery due to empty digest (target_type=u'IssueOwners' project=10L target_identifier=u'None')
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 276, in create_loop
    tick_callback()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1040, in on_poll_start
    cycle_poll_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 322, in on_poll_start
    self._register_BRPOP(channel)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 308, in _register_BRPOP
    channel._brpop_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_start
    self.client.connection.send_command('BRPOP', *keys)
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
ConnectionError: Error 32 while writing to socket. Broken pipe.
07:57:56 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 2 unacknowledged message(s)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 276, in create_loop
    tick_callback()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1040, in on_poll_start
    cycle_poll_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 322, in on_poll_start
    self._register_BRPOP(channel)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 308, in _register_BRPOP
    channel._brpop_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_start
    self.client.connection.send_command('BRPOP', *keys)
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
ConnectionError: Error 32 while writing to socket. Broken pipe.
07:58:12 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 4 unacknowledged message(s)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 276, in create_loop
    tick_callback()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1040, in on_poll_start
    cycle_poll_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 322, in on_poll_start
    self._register_BRPOP(channel)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 308, in _register_BRPOP
    channel._brpop_start()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 714, in _brpop_start
    self.client.connection.send_command('BRPOP', *keys)
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
ConnectionError: Error 32 while writing to socket. Broken pipe.
07:58:35 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 4 unacknowledged message(s)
07:58:38 [ERROR] sentry.errors.events: process.failed.empty (cache_key=u'e:62b36a2df6db445aa2cd1890d5bffa4f:2')
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1047, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 344, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 721, in _brpop_read
    **options)
  File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 403, in read_response
    (e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
07:58:38 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
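A side note on the Postgres duplicate-key errors above: taken on their own, they are likely harmless. They look like the usual get-or-create race, where two writers try to insert the same row concurrently and the loser hits the unique constraint, which the application then swallows. A minimal sqlite3 sketch of that pattern (hypothetical table, not Sentry's actual schema or code):

```python
import sqlite3

# Table with the same kind of unique constraint as sentry_grouprelease.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grouprelease ("
    " group_id INTEGER, release_id INTEGER, environment TEXT,"
    " UNIQUE (group_id, release_id, environment))"
)

def get_or_create(group_id, release_id, environment=""):
    """Insert the row; if another writer already inserted it, the unique
    constraint fires and we simply treat the row as already existing."""
    try:
        conn.execute(
            "INSERT INTO grouprelease VALUES (?, ?, ?)",
            (group_id, release_id, environment),
        )
    except sqlite3.IntegrityError:
        pass  # the duplicate-key error is expected and non-fatal

get_or_create(277, 533)
get_or_create(277, 533)  # second insert loses the "race" and is swallowed
rows = conn.execute("SELECT COUNT(*) FROM grouprelease").fetchone()[0]
print(rows)  # 1
```

So the duplicate-key noise is probably a symptom of retried work (the worker reprocessing restored messages), not the cause of Postgres going idle.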

P.S.

In the report of update 2, I mistook the worker for the web service. The error has been corrected.

update 4

After upgrading to Sentry 20.9.0.dev00d0cb1f, Sentry is now as stable as before, even under heavy load.

Case closed


I encourage you to try out the suggestion here anyway: https://github.com/getsentry/onpremise/issues/629#issuecomment-693576019