Sentry stopped logging after spike?

Hi there,

we have an issue with Sentry 21.5.1 d8a761f. We sometimes still receive alerts about new issues per email but if we want to check the issue on the page it says there’s nothing to be found.

Older issues that were logged 2+ months before can be accessed.

Looking at the usage stats it looks like something is coming in.

Could this spike the issue?

We have since then restarted/upgraded sentry as well.

I have read through the forum but could not find a similar issue.

I’m assuming it has something to do with relay:

2021-07-16T16:56:25Z [relay_server::endpoints::common] ERROR: error handling request: failed to queue envelope
  caused by: Too many envelopes (event_buffer_size reached)

Any idea what could be going on or how to fix this issue?
I have posted the output of docker compose logs below.

Thank you for reading and best regards.

Logs:

cron_1                                      | 07:39:25 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
cron_1                                      | 07:39:51 [INFO] sentry.plugins.github: apps-not-configured
cron_1                                      | * Unknown config option found: 'slack.legacy-app'
clickhouse_1                                | 2021.07.20 08:45:35.637165 [ 101 ] {} <Information> HTTPHandler: Done processing query
clickhouse_1                                | 2021.07.20 08:45:35.646436 [ 86 ] {d0006c71-c07b-4b70-8c72-c6ee107765f1} <Information> executeQuery: Read 1 rows, 4.31 KiB in 0.033 sec., 30 rows/sec., 132.02 KiB/sec.
clickhouse_1                                | 2021.07.20 08:45:35.647226 [ 86 ] {} <Information> HTTPHandler: Done processing query
clickhouse_1                                | 2021.07.20 08:45:37.554022 [ 101 ] {45ac6bd4-011c-4183-9ccf-9e406b909cb3} <Information> executeQuery: Read 1 rows, 92.00 B in 0.009 sec., 105 rows/sec., 9.47 KiB/sec.
clickhouse_1                                | 2021.07.20 08:45:37.554644 [ 101 ] {} <Information> HTTPHandler: Done processing query
ingest-consumer_1                           | 08:43:33 [INFO] batching-kafka-consumer: Worker flush took 24ms
ingest-consumer_1                           | 08:44:00 [INFO] batching-kafka-consumer: Flushing 1 items (from {('ingest-events', 0): [87581, 87581]}): forced:False size:False time:True
ingest-consumer_1                           | 08:44:00 [INFO] batching-kafka-consumer: Worker flush took 32ms
ingest-consumer_1                           | 08:45:34 [INFO] batching-kafka-consumer: Flushing 1 items (from {('ingest-events', 0): [87582, 87582]}): forced:False size:False time:True
ingest-consumer_1                           | 08:45:34 [INFO] batching-kafka-consumer: Worker flush took 43ms
memcached_1                                 | Signal handled: Terminated.
sentry-cleanup_1                            | 00:00:16 [INFO] sentry.deletions.async: object.delete.executed (object_id=2550 transaction_id='846fd37c69d94b0c996a44bcd5349fb2' app_label='sentry' model='Group')
sentry-cleanup_1                            | Cleaning up unused FileBlob references
sentry-cleanup_1                            | * Unknown config option found: 'slack.legacy-app'
File Blobs: 100% |##############################################| Time: 0:00:00
sentry-cleanup_1                            | * Unknown config option found: 'slack.legacy-app'
relay_1                                     |   caused by: Too many envelopes (event_buffer_size reached)
relay_1                                     | 2021-07-20T07:40:50Z [relay_server::actors::project_upstream] ERROR: error fetching project states: could not send request to upstream
relay_1                                     |   caused by: error sending request for url (http://web:9000/api/0/relays/projectconfigs/?version=2): operation timed out
relay_1                                     | 2021-07-20T07:40:56Z [relay_server::actors::project_upstream] ERROR: error fetching project states: could not send request to upstream
relay_1                                     |   caused by: error sending request for url (http://web:9000/api/0/relays/projectconfigs/?version=2): operation timed out
smtp_1                                      | ' /etc/exim4/update-exim4.conf.conf
smtp_1                                      | + update-exim4.conf -v
smtp_1                                      | using non-split configuration scheme from /etc/exim4/exim4.conf.template
smtp_1                                      |   273 LOG: MAIN
smtp_1                                      |   273   exim 4.92 daemon started: pid=273, no queue runs, listening for SMTP on port 25 (IPv6 and IPv4)
nginx_1                                     | 172.20.0.1 - - [20/Jul/2021:08:45:33 +0000] "POST /api/8/store/ HTTP/1.0" 200 41 "-" "sentry.php/2.5.2" "-"
nginx_1                                     | 172.20.0.1 - - [20/Jul/2021:08:45:36 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.11.0" "-"
nginx_1                                     | 172.20.0.1 - - [20/Jul/2021:08:45:38 +0000] "POST /api/8/store/ HTTP/1.0" 200 41 "-" "sentry.php/2.5.2" "-"
postgres_1                                  | DETAIL:  Key (group_id, release_id, environment)=(527, 195, production) already exists.
postgres_1                                  | STATEMENT:  INSERT INTO "sentry_grouprelease" ("project_id", "group_id", "release_id", "environment", "first_seen", "last_seen") VALUES (14, 527, 195, 'production', '2021-07-20T08:38:55.185000+00:00'::timestamptz, '2021-07-20T08:38:55.185000+00:00'::timestamptz) RETURNING "sentry_grouprelease"."id"
postgres_1                                  | ERROR:  duplicate key value violates unique constraint "sentry_environmentprojec_project_id_environment_i_91da82f2_uniq"
postgres_1                                  | DETAIL:  Key (project_id, environment_id)=(3, 2) already exists.
postgres_1                                  | STATEMENT:  INSERT INTO "sentry_environmentproject" ("project_id", "environment_id", "is_hidden") VALUES (3, 2, NULL) RETURNING "sentry_environmentproject"."id"
post-process-forwarder_1                    |     raise Exception(error)
post-process-forwarder_1                    | Exception: KafkaError{code=COORDINATOR_LOAD_IN_PROGRESS,val=14,str="JoinGroup failed: Broker: Coordinator load in progress"}
post-process-forwarder_1                    | 07:40:16 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
post-process-forwarder_1                    | 07:40:30 [INFO] sentry.plugins.github: apps-not-configured
post-process-forwarder_1                    | 07:40:34 [INFO] sentry.eventstream.kafka.backend: Received partition assignment: [TopicPartition{topic=events,partition=0,offset=-1001,error=None}]
redis_1                                     | 1:M 20 Jul 2021 08:45:26.006 * 100 changes in 300 seconds. Saving...
redis_1                                     | 1:M 20 Jul 2021 08:45:26.018 * Background saving started by pid 1601
redis_1                                     | 1601:C 20 Jul 2021 08:45:26.079 * DB saved on disk
redis_1                                     | 1601:C 20 Jul 2021 08:45:26.081 * RDB: 0 MB of memory used by copy-on-write
redis_1                                     | 1:M 20 Jul 2021 08:45:26.119 * Background saving terminated with success
kafka_1                                     | [2021-07-15 07:39:57,742] INFO Stat of the created znode at /brokers/ids/1001 is: 141,141,1626334797717,1626334797717,1,0,0,72060675200319489,180,0,141
kafka_1                                     |  (kafka.zk.KafkaZkClient)
kafka_1                                     | [2021-07-15 07:39:57,743] INFO Registered broker 1001 at path /brokers/ids/1001 with addresses: ArrayBuffer(EndPoint(kafka,9092,ListenerName(PLAINTEXT),PLAINTEXT)), czxid (broker epoch): 141 (kafka.zk.KafkaZkClient)
kafka_1                                     | [2021-07-15 07:39:59,390] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
kafka_1                                     | [2021-07-15 07:39:59,656] INFO [SocketServer brokerId=1001] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
snuba-api_1                                 | 2021-07-19 21:07:55,529 Connection was closed, reconnecting.
snuba-api_1                                 | 2021-07-19 21:07:55,530 Error on socket shutdown: [Errno 107] Transport endpoint is not connected
snuba-api_1                                 | 2021-07-20 06:44:43,259 Error on clickhouse:9000 ping: Unexpected EOF while reading bytes
snuba-api_1                                 | 2021-07-20 06:44:43,260 Connection was closed, reconnecting.
snuba-api_1                                 | 2021-07-20 06:44:43,260 Error on socket shutdown: [Errno 107] Transport endpoint is not connected
snuba-cleanup_1                             | 2021-07-20 08:25:03,027 Dropped 0 partitions on clickhouse:9000
snuba-cleanup_1                             | 2021-07-20 08:30:03,145 Dropped 0 partitions on clickhouse:9000
snuba-cleanup_1                             | 2021-07-20 08:35:03,342 Dropped 0 partitions on clickhouse:9000
snuba-cleanup_1                             | 2021-07-20 08:40:03,782 Dropped 0 partitions on clickhouse:9000
snuba-cleanup_1                             | 2021-07-20 08:45:03,775 Dropped 0 partitions on clickhouse:9000
snuba-consumer_1                            | 2021-07-20 08:41:16,416 Completed processing <Batch: 1 message, open for 1.05 seconds>.
snuba-consumer_1                            | 2021-07-20 08:42:04,289 Completed processing <Batch: 1 message, open for 1.05 seconds>.
snuba-consumer_1                            | 2021-07-20 08:43:34,588 Completed processing <Batch: 1 message, open for 1.04 seconds>.
snuba-consumer_1                            | 2021-07-20 08:44:01,725 Completed processing <Batch: 1 message, open for 1.04 seconds>.
snuba-consumer_1                            | 2021-07-20 08:45:35,650 Completed processing <Batch: 1 message, open for 1.06 seconds>.
snuba-outcomes-consumer_1                   | 2021-07-20 08:45:15,015 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-outcomes-consumer_1                   | 2021-07-20 08:45:21,811 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-outcomes-consumer_1                   | 2021-07-20 08:45:22,911 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-outcomes-consumer_1                   | 2021-07-20 08:45:35,640 Completed processing <Batch: 1 message, open for 1.05 seconds>.
snuba-outcomes-consumer_1                   | 2021-07-20 08:45:37,557 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-replacer_1                            |     self.__message = self.__consumer.poll(timeout=1.0)
snuba-replacer_1                            |   File "/usr/src/snuba/snuba/utils/streams/backends/kafka/__init__.py", line 398, in poll
snuba-replacer_1                            |     raise ConsumerError(str(error))
snuba-replacer_1                            | snuba.utils.streams.backends.abstract.ConsumerError: KafkaError{code=COORDINATOR_LOAD_IN_PROGRESS,val=14,str="JoinGroup failed: Broker: Coordinator load in progress"}
snuba-replacer_1                            | 2021-07-15 07:40:25,158 New partitions assigned: {Partition(topic=Topic(name='event-replacements-legacy'), index=0): 0}
snuba-sessions-consumer_1                   | 2021-07-20 08:45:07,342 Completed processing <Batch: 1 message, open for 1.03 seconds>.
snuba-sessions-consumer_1                   | 2021-07-20 08:45:13,057 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-sessions-consumer_1                   | 2021-07-20 08:45:17,946 Completed processing <Batch: 1 message, open for 1.03 seconds>.
snuba-sessions-consumer_1                   | 2021-07-20 08:45:30,831 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-sessions-consumer_1                   | 2021-07-20 08:45:32,442 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-subscription-consumer-events_1        | 2021-07-20 08:43:37,606 Worker flush took 0ms
snuba-subscription-consumer-events_1        | 2021-07-20 08:44:05,628 Flushing 1 items (from {Partition(topic=Topic(name='events'), index=0): Offsets(lo=87578, hi=87579)})
snuba-subscription-consumer-events_1        | 2021-07-20 08:44:05,630 Worker flush took 0ms
snuba-subscription-consumer-events_1        | 2021-07-20 08:45:38,716 Flushing 1 items (from {Partition(topic=Topic(name='events'), index=0): Offsets(lo=87579, hi=87580)})
snuba-subscription-consumer-events_1        | 2021-07-20 08:45:38,719 Worker flush took 0ms
snuba-subscription-consumer-transactions_1  | 2021-07-20 08:43:38,063 Worker flush took 0ms
snuba-subscription-consumer-transactions_1  | 2021-07-20 08:44:05,087 Flushing 1 items (from {Partition(topic=Topic(name='events'), index=0): Offsets(lo=87578, hi=87579)})
snuba-subscription-consumer-transactions_1  | 2021-07-20 08:44:05,088 Worker flush took 0ms
snuba-subscription-consumer-transactions_1  | 2021-07-20 08:45:39,181 Flushing 1 items (from {Partition(topic=Topic(name='events'), index=0): Offsets(lo=87579, hi=87580)})
snuba-subscription-consumer-transactions_1  | 2021-07-20 08:45:39,183 Worker flush took 0ms
snuba-transactions-consumer_1               | 2021-07-20 08:41:16,383 Completed processing <Batch: 1 message, open for 1.02 seconds>.
snuba-transactions-consumer_1               | 2021-07-20 08:42:04,241 Completed processing <Batch: 1 message, open for 1.00 seconds>.
snuba-transactions-consumer_1               | 2021-07-20 08:43:34,556 Completed processing <Batch: 1 message, open for 1.00 seconds>.
snuba-transactions-consumer_1               | 2021-07-20 08:44:01,690 Completed processing <Batch: 1 message, open for 1.00 seconds>.
snuba-transactions-consumer_1               | 2021-07-20 08:45:35,596 Completed processing <Batch: 1 message, open for 1.01 seconds>.
subscription-consumer-transactions_1        | 07:40:14 [INFO] sentry.snuba.query_subscription_consumer: query-subscription-consumer.on_revoke (offsets='{}' partitions='[TopicPartition{topic=transactions-subscription-results,partition=0,offset=-1001,error=None}]')
subscription-consumer-transactions_1        | 07:40:14 [INFO] sentry.snuba.query_subscription_consumer: query-subscription-consumer.on_assign (offsets='{0: None}' partitions='[TopicPartition{topic=transactions-subscription-results,partition=0,offset=-1001,error=None}]')
subscription-consumer-transactions_1        | 07:40:32 [INFO] sentry.snuba.query_subscription_consumer: query-subscription-consumer.commit_offsets (offsets='{0: None}' partitions='[0]')
subscription-consumer-transactions_1        | 07:40:32 [INFO] sentry.snuba.query_subscription_consumer: query-subscription-consumer.on_revoke (offsets='{}' partitions='[TopicPartition{topic=transactions-subscription-results,partition=0,offset=-1001,error=None}]')
subscription-consumer-transactions_1        | 07:40:32 [INFO] sentry.snuba.query_subscription_consumer: query-subscription-consumer.on_assign (offsets='{0: None}' partitions='[TopicPartition{topic=transactions-subscription-results,partition=0,offset=-1001,error=None}]')
symbolicator-cleanup_1                      | 2021-07-18T23:55:01Z [symbolicator::cache] INFO: Cleaning up cache: cficaches
symbolicator-cleanup_1                      | 2021-07-19T23:55:01Z [symbolicator::cache] INFO: Cleaning up cache: objects
symbolicator-cleanup_1                      | 2021-07-19T23:55:01Z [symbolicator::cache] INFO: Cleaning up cache: object_meta
symbolicator-cleanup_1                      | 2021-07-19T23:55:01Z [symbolicator::cache] INFO: Cleaning up cache: symcaches
symbolicator-cleanup_1                      | 2021-07-19T23:55:01Z [symbolicator::cache] INFO: Cleaning up cache: cficaches
web_1                                       | [uwsgi-http key:  client_addr: 172.20.0.29 client_port: 53913] hr_read(): Connection reset by peer [plugins/http/http.c line 917]
web_1                                       | 08:30:46 [INFO] sentry.integrations.jira.webhooks: missing-assignee-email (issue_key='ROX-36' integration_id=1)
web_1                                       | [uwsgi-http key:  client_addr: 172.20.0.29 client_port: 53916] hr_read(): Connection reset by peer [plugins/http/http.c line 917]
web_1                                       | [uwsgi-http key:  client_addr: 172.20.0.29 client_port: 29344] hr_read(): Connection reset by peer [plugins/http/http.c line 917]
web_1                                       | [uwsgi-http key:  client_addr: 172.20.0.29 client_port: 62116] hr_write(): Connection reset by peer [plugins/http/http.c line 565]
zookeeper_1                                 | ===> Launching ...
zookeeper_1                                 | ===> Launching zookeeper ...
zookeeper_1                                 | [2021-07-15 07:39:28,627] WARN Either no config or no quorum defined in config, running  in standalone mode (org.apache.zookeeper.server.quorum.QuorumPeerMain)
zookeeper_1                                 | [2021-07-15 07:39:31,883] WARN o.e.j.s.ServletContextHandler@4d95d2a2{/,null,UNAVAILABLE} contextPath ends with /* (org.eclipse.jetty.server.handler.ContextHandler)
zookeeper_1                                 | [2021-07-15 07:39:31,959] WARN Empty contextPath (org.eclipse.jetty.server.handler.ContextHandler)
subscription-consumer-events_1              |     raise KafkaException(error)
subscription-consumer-events_1              | cimpl.KafkaException: KafkaError{code=COORDINATOR_LOAD_IN_PROGRESS,val=14,str="JoinGroup failed: Broker: Coordinator load in progress"}
subscription-consumer-events_1              | 07:40:17 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
subscription-consumer-events_1              | 07:40:30 [INFO] sentry.plugins.github: apps-not-configured
subscription-consumer-events_1              | 07:40:32 [INFO] sentry.snuba.query_subscription_consumer: query-subscription-consumer.on_assign (offsets='{0: None}' partitions='[TopicPartition{topic=events-subscription-results,partition=0,offset=-1001,error=None}]')
worker_1                                    | 08:27:46 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
worker_1                                    | 08:32:46 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
worker_1                                    | 08:37:46 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
worker_1                                    | 08:39:57 [INFO] sentry.tasks.update_user_reports: update_user_reports.records_updated (reports_to_update=0 reports_with_event=0 updated_reports=0)
worker_1                                    | 08:42:46 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.

Seems likely.

failed to queue envelope caused by: Too many envelopes (event_buffer_size reached)

That sounds to me like you’re overloading your system, in which case the official recommendation is going to be to migrate to SaaS, as onpremise is not designed for heavy workloads.

Thank you for the input.
This is usually not a common occurrence so I thought there was a fix that would help me start it up again since relay also doesnt seem to connect properly:

2021-07-20T07:40:56Z [relay_server::actors::project_upstream] ERROR: error fetching project states: could not send request to upstream
relay_1                                     |   caused by: error sending request for url (http://web:9000/api/0/relays/projectconfigs/?version=2): operation timed out