After upgrading from 21.2 to 21.5, events aren’t processed anymore

Hey,

It seems it wasn’t the best idea to upgrade our working Sentry instance, which was set up 2 months ago, on a Friday afternoon. We use the latest master version from the onpremise repo. After install.sh ran successfully, the web frontend starts without problems, but our events aren’t processed anymore. It seems that Kafka somehow doesn’t work anymore.

Here are the relay logs after running docker-compose up -d:

relay_1                                     | 2021-05-03T08:16:52Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/1001: Disconnected (after 292723ms in state UP)
relay_1                                     | 2021-05-03T08:16:52Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                                     | 2021-05-03T08:16:52Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 333ms in state CONNECT)
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 333ms in state CONNECT)
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 467ms in state CONNECT)
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 467ms in state CONNECT)
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:16:53Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:16:58Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Failed to resolve 'kafka:9092': Name or service not known (after 66ms in state CONNECT)
relay_1                                     | 2021-05-03T08:16:58Z [rdkafka::client] ERROR: librdkafka: Global error: Resolve (Local: Host resolution failure): kafka:9092/1001: Failed to resolve 'kafka:9092': Name or service not known (after 66ms in state CONNECT)
relay_1                                     | 2021-05-03T08:16:59Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Failed to resolve 'kafka:9092': Name or service not known (after 240ms in state CONNECT)
relay_1                                     | 2021-05-03T08:16:59Z [rdkafka::client] ERROR: librdkafka: Global error: Resolve (Local: Host resolution failure): kafka:9092/1001: Failed to resolve 'kafka:9092': Name or service not known (after 240ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:00Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:00Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:00Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:00Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/1001: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:04Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 1ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:04Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 1ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:04Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                                     | 2021-05-03T08:17:04Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:04Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:04Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                                     | 2021-05-03T08:17:04Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 403 Forbidden
relay_1                                     |   caused by: no error details
relay_1                                     | 2021-05-03T08:17:04Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 403 Forbidden
relay_1                                     |   caused by: no error details
relay_1                                     | 2021-05-03T08:17:05Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:17:05Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:17:05Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:17:05Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
relay_1                                     | 2021-05-03T08:17:05Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 403 Forbidden
relay_1                                     |   caused by: no error details
relay_1                                     | 2021-05-03T08:17:06Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 403 Forbidden
relay_1                                     |   caused by: no error details
relay_1                                     | 2021-05-03T08:17:09Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 403 Forbidden
relay_1                                     |   caused by: no error details
relay_1                                     | 2021-05-03T08:17:12Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 403 Forbidden
relay_1                                     |   caused by: no error details
relay_1                                     | 2021-05-03T08:17:16Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 7ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:16Z [rdkafka::client] ERROR: librdkafka: Global error: Resolve (Local: Host resolution failure): kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 7ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:16Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 3ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:16Z [rdkafka::client] ERROR: librdkafka: Global error: Resolve (Local: Host resolution failure): kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 3ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:17Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 403 Forbidden
relay_1                                     |   caused by: no error details
relay_1                                     | 2021-05-03T08:17:18Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:18Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:18Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                                     | 2021-05-03T08:17:18Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.21.0.11:9092 failed: Connection refused
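
To get a quick overview of a log like the one above, the error kinds can be tallied with a few lines of Python. This is only a sketch: the category names and the `summarize` function are my own, and the patterns are just the message fragments visible in this log.

```python
import re
from collections import Counter

# Message fragments copied from the relay log above.
# The category names on the left are made up for this sketch.
PATTERNS = {
    "connection_refused": re.compile(r"Connection refused"),
    "dns_failure": re.compile(r"Failed to resolve"),
    "all_brokers_down": re.compile(r"AllBrokersDown"),
    "upstream_403": re.compile(r"403 Forbidden"),
}


def summarize(lines):
    """Count how many log lines fall into each error category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the output of `docker-compose logs --no-color relay` (for example with `summarize(sys.stdin)`) shows whether the failures are mostly refused connections, name-resolution failures, or 403 responses from upstream.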

What I did to try to fix the problem:

It is important to mention that I have secured the Sentry server via SSL, but nginx and the web interface work without problems.

The host also has more than enough power: 32 cores, 128 GB RAM, etc.

Thanks for your help!

Can you share your Kafka and ZK logs please?

Also, have you tried restarting the relay service: docker-compose restart relay

Hey BYK,

thank you for the quick answer.

I restarted the relay several times and ran docker-compose down / up. Nothing helps :frowning:

Zookeeper:

Kafka:

Hope this helps!

Yeah, as I suspected, your Zookeeper instance keeps restarting. Kafka also reports a low amount of available memory:

kafka_1                                     | [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.free=1891MB
kafka_1                                     | [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.max=27305MB
kafka_1                                     | [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.total=1932MB

I’d check the memory usage and OOM-killer logs, as Zookeeper does not state any reason for the shutdown, which makes me think it is being killed by something else and keeps restarting.
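
The three memory figures quoted above can be pulled out of the Kafka log with a short script. This is a minimal sketch, assuming the lines keep the `os.memory.<name>=<n>MB` shape shown here; `parse_memory` is a hypothetical helper name. As far as I know, these values are what the JVM reports about its own heap (free, maximum, and currently allocated), so they may say little about free memory on the host.

```python
import re

# Matches e.g. "os.memory.free=1891MB" as printed in the Kafka log above.
MEM_RE = re.compile(r"os\.memory\.(free|max|total)=(\d+)MB")


def parse_memory(lines):
    """Return a dict like {"free": 1891, "max": 27305, "total": 1932} (values in MB)."""
    values = {}
    for line in lines:
        match = MEM_RE.search(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values
```

Comparing `free` against `total` (rather than against `max`) gives a rough idea of how much of the currently allocated memory is actually in use.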

Thanks for the hint.

But that’s really strange; we have more than enough memory on the host machine.

Top:

%Cpu(s):  0.5 us,  0.2 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128932.7 total,  94672.4 free,  26881.0 used,   7379.4 buff/cache

egrep -i -r 'killed process' /var/log/ returns nothing.
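
The same search can be done in Python, which makes it easy to run against `dmesg` or `journalctl -k` output as well, since on some distributions kernel messages are not written to files under /var/log. This is a sketch, assuming the usual `Killed process <pid> (<name>)` kernel wording; `find_oom_kills` is a hypothetical helper name.

```python
import re

# Case-insensitive, like the egrep -i call above.
OOM_RE = re.compile(r"killed process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(lines):
    """Return (pid, process_name) for every OOM-killer line found."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits
```

An empty result from both the files and the kernel log would make an OOM kill on the host less likely, though it does not rule out other causes for the restarts.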

But I also have to admit that I’ve been trying to get it running again somehow for almost 2 days. I’ve tried a lot, including deleting all volumes except the Postgres volume. I’m on the verge of just setting everything up again.

It might be disk too of course. If Docker is running inside a VM or something, the allocated memory might be low. I really cannot debug a system I don’t operate from here :smiley:

Fair enough. Thanks for your help! It’s running on a dedicated machine.

I’ll take a break and have an experienced admin look at it tomorrow.

Props to your fast support. Sentry is a great piece of software. :heart_eyes:

If I find the solution, I’ll let you know.

My pleasure and happy to try helping more. It’s just hard with this latency and unknowns :slight_smile:

All I can see is ZK restarting itself which seems to be the root of all issues you are having.
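
One way to confirm which containers are restarting is to look at the restart counter Docker keeps per container. This is a rough sketch, assuming the JSON comes from `docker inspect` and contains the `Name`, `RestartCount` and `State.OOMKilled` fields; `restart_report` is a hypothetical helper name.

```python
import json


def restart_report(inspect_json):
    """Return (name, restart_count, oom_killed) for each container in `docker inspect` output."""
    report = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        report.append((
            container.get("Name", "").lstrip("/"),  # Docker prefixes names with "/"
            container.get("RestartCount", 0),
            bool(state.get("OOMKilled", False)),
        ))
    return report
```

The input could come from something like `docker inspect $(docker-compose ps -q zookeeper kafka)`. A restart count that keeps growing for Zookeeper would match what the logs suggest.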

OK. So, I have no clue why, but it’s working now: with a fresh installation of version 21.2 on our system, everything works without problems.

21.5 fails on our system. :man_shrugging:

Very unsatisfactory solution. But thank god, we aren’t blind anymore.