Sentry 20.9.0 - Kafka failures

I thought this was fixed in Sentry 20.9.0, but I am still seeing this issue. My workaround has been to restart the app every night, which is annoying. This is a hefty Sentry instance with 16 GB and 8 cores.

Is there any workaround for this, @BYK?

I already have this in config.yml:

    postprocess.use-cache-key: 1.0

Here is the docker-compose configuration for the workers. The events.save_event worker (worker4) is the one that fails:

    worker1:
      << : *sentry_defaults
      command: run worker -Q events.process_event
    worker2:
      << : *sentry_defaults
      command: run worker -Q events.reprocessing.process_event
    worker3:
      << : *sentry_defaults
      command: run worker -Q events.reprocess_events
    worker4:
      << : *sentry_defaults
      command: run worker -Q events.save_event
    worker5:
      << : *sentry_defaults
      command: run worker -Q subscriptions
    worker6:
      << : *sentry_defaults
      command: run worker -Q integrations
    worker:
      << : *sentry_defaults
      command: run worker
    ingest-consumer:
      << : *sentry_defaults
      command: run ingest-consumer --all-consumer-types
    post-process-forwarder:
      << : *sentry_defaults
      # Increase `--commit-batch-size 1` below to deal with high-load environments.
      command: run post-process-forwarder --commit-batch-size 100
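
Since the rdkafka errors point at kafka:9092, one quick check before restarting everything is whether a worker container can open a TCP connection to the broker at all. This is a minimal sketch, assuming Python 3 run inside a container attached to the same docker-compose network. `broker_reachable` is a hypothetical helper, not part of Sentry. A successful connect only shows that something is listening on the port, not that the Kafka broker behind it is healthy.

```python
import socket


def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only proves that something accepts connections on the port; it does
    not speak the Kafka protocol or check broker health.
    """
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False


# Example, from a container on the same docker-compose network
# (host/port taken from the rdkafka log lines):
# print(broker_reachable("kafka", 9092))
```

If this returns False while the kafka container is running, the problem is more likely networking or the broker itself than the worker configuration.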

Logs:

    worker_1                       |     **options)
    worker4_1                      | %3|1613649298.817|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 1 request(s) timed out: disconnect
    worker4_1                      | %3|1613649298.817|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 1 request(s) timed out: disconnect
    worker4_1                      | %3|1613649298.817|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
    worker4_1                      | %3|1613649298.818|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 3 request(s) timed out: disconnect
    worker_1                       |   File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 680, in parse_response
    worker4_1                      | %3|1613649298.818|ERROR|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 3 request(s) timed out: disconnect
    worker4_1                      | %3|1613649298.818|ERROR|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
    worker4_1                      | %3|1613649298.818|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 3 request(s) timed out: disconnect
    worker4_1                      | %3|1613649298.818|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 3 request(s) timed out: disconnect
    worker_1                       |     response = connection.read_response()
    worker4_1                      | %3|1613649298.818|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
    worker4_1                      | %3|1613649298.846|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 3 request(s) timed out: disconnect
    worker4_1                      | %3|1613649298.846|ERROR|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 3 request(s) timed out: disconnect
    worker4_1                      | %3|1613649298.846|ERROR|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: 1/1 brokers are down
    worker_1                       |   File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
    worker4_1                      | %3|1613649298.849|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: 3 request(s) timed out: disconnect
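
The output above interleaves a Python/redis traceback from `worker` with rdkafka messages from `worker4`. A small parser can make it easier to see which service and which producer are failing. This is a minimal sketch, assuming the docker-compose `service | ` prefix and the `%level|seconds.millis|FACILITY|client| message` line shape visible in these logs. `parse_line` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# docker-compose prefixes each line with "<service>   | ";
# the rdkafka lines then look like "%3|1613649298.817|FAIL|rdkafka#producer-2| message".
LINE_RE = re.compile(
    r"^\s*(?P<service>\S+)\s*\|\s*"
    r"%(?P<level>\d+)\|(?P<ts>\d+\.\d+)\|(?P<fac>[A-Z_]+)\|(?P<client>[^|]+)\|\s*(?P<msg>.*)$"
)


class KafkaLogLine(NamedTuple):
    service: str
    level: int        # syslog-style severity, 3 = error
    timestamp: float  # Unix epoch seconds
    facility: str     # e.g. FAIL, ERROR
    client: str       # e.g. rdkafka#producer-2
    message: str


def parse_line(line: str) -> Optional[KafkaLogLine]:
    """Parse one rdkafka log line; return None for anything else (e.g. tracebacks)."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return KafkaLogLine(
        service=m["service"],
        level=int(m["level"]),
        timestamp=float(m["ts"]),
        facility=m["fac"],
        client=m["client"],
        message=m["msg"],
    )


def summarize(lines: Iterable[str]) -> Counter:
    """Count rdkafka log lines per (service, client, facility)."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[(parsed.service, parsed.client, parsed.facility)] += 1
    return counts
```

For example, feeding it the lines from `docker-compose logs worker4` (e.g. via `summarize(sys.stdin)`) would show whether both producers in that worker are losing the broker at the same moments.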

I think that option was introduced in 20.10.1. That said, at this point you are quite far behind, so I recommend upgrading to the latest version and going from there.

A few weeks ago my setup was failing too, but after upgrading to the newest version the site has been working correctly without any issues.
Best regards
