Nothing can connect to Kafka

All the containers that need to connect to Kafka are on an internal network.

I keep getting errors like this:

%3|1607642598.185|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.28.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1607642591.800|FAIL|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.28.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1607642595.208|FAIL|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.28.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1607642593.551|FAIL|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.28.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-12-10T23:23:01Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.28.0.10:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)

It’s clear Kafka is unreachable. Here’s what I tried:

  • I can open a shell in the Relay container and ping Kafka.
  • I can install kafkacat in the Relay container and connect that to Kafka.
  • Disabled IPv6.
  • Restarted/Recreated the container and the whole stack. (only the containers, no volumes)
  • Fully recreated from scratch (removed all volumes, deleted all files, re-cloned git repo, ran install again, rebuilt containers with --no-cache)
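One distinction worth making here: ping only tests ICMP reachability, while the librdkafka error is a refused TCP connection to port 9092. A host can answer pings and still refuse the broker port. Below is a minimal Python sketch of a raw TCP check. It assumes Python 3 is available inside the Relay container. If the stock image doesn't ship it, run the check from another container on the same network. `tcp_probe` is a hypothetical helper, not part of Relay, Sentry or librdkafka.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a plain TCP connect and classify the outcome.

    "open"     -> the TCP handshake completed
    "refused"  -> the peer actively rejected it (what librdkafka logs
                  as "Connection refused")
    "timeout"  -> nothing answered within `timeout` seconds
    "error: …" -> anything else, e.g. the hostname failed to resolve
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # includes socket.gaierror for DNS failures
        return f"error: {exc}"
```

Usage: `print(tcp_probe("kafka", 9092))`. Suppose this prints "refused" while kafkacat succeeds from the same container. That would suggest the two are not reaching the same address, which is what the IP comparison later in the thread is getting at.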

If anyone has any ideas about what I could try, I would love to hear them. I’m really lost here, since not even a full recreation worked.

Are you using the hostname or the IP address when testing these?

Also, what do your Kafka and ZooKeeper logs say?

It’s connecting with the hostname “kafka”, as per the default docker-compose file. I used the same hostname when testing with ping and kafkacat.

I will attach the Kafka and ZooKeeper logs shortly. The Kafka logs seemed fine when I looked at them. I don’t know about ZooKeeper, since I wasn’t aware it had much to do with this (I’ve never used Kafka).

Can you then check whether the IP address that the kafka hostname resolves to in your tests matches the one in the error messages? If not, this looks like a bad DNS cache somewhere.
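This comparison can be done mechanically. Below is a rough Python sketch. It assumes you paste one of the FAIL lines from above and run the check inside the Relay container, so the lookup goes through the same resolver Relay sees. The regex only covers the `Connect to ipv4#ADDR:PORT` form that appears in these logs. The helper names are hypothetical.

```python
import re
import socket

# Matches the "Connect to ipv4#ADDR:PORT" fragment of librdkafka FAIL lines.
_CONNECT_RE = re.compile(r"Connect to ipv4#(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def logged_address(line: str):
    """Return (ip, port) parsed from a librdkafka log line, or None."""
    m = _CONNECT_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def resolved_ipv4(host: str, port: int) -> set:
    """Return every IPv4 address the local resolver currently gives for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def matches_dns(line: str, host: str) -> bool:
    """True if the address in the log line is one that host resolves to now."""
    addr = logged_address(line)
    if addr is None:
        return False
    ip, port = addr
    return ip in resolved_ipv4(host, port)
```

If `matches_dns(line, "kafka")` returns False, that would support the stale-DNS theory. For kafkacat, its `-d broker` debug option should make librdkafka log the addresses it connects to, which could be checked the same way.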

I can ping “kafka” in the Relay container and get the same IP as the logs show.
Sadly, I couldn’t figure out a way to get kafkacat to tell me the IP it’s connecting to. Here’s the kafkacat output anyway: https://pastebin.com/CtDMTRbk

Since three links were too many for one post:

kafka log: https://pastebin.com/zbtK7KBr
zookeeper log: https://pastebin.com/5Fs7MED3

I have now tried recreating everything after removing all the containers and volumes and then running docker system prune -a to get rid of absolutely everything that could be interfering.

This could be an issue specific to my system, but I’ve taken every step I can to make sure this is a clean install.

Docker version: 19.03.12, build 48a66213fe
Operating System: Ubuntu 20.04 LTS
Kernel: Linux 5.4.0-31-generic
Architecture: x86-64

I just checked out the release/20.11.1 branch instead of master and also updated Docker to the newest version. Same issues. Also, someone else has the same issue: New install fails on Ubuntu 20.04 - kafta errors

What does the “issues”-page say in Sentry?

The issues page is obviously empty. I can’t send any events to Sentry, so how would it have any issues?

There is a banner about workers not checking in and another about 1 issue with my config.

Relay is still getting connection refused when trying to connect to Kafka.

Depends on what’s faulty.

Try bringing down sentry and then run:

docker system prune --volumes
docker image prune -a

This gets rid of everything related to Sentry.
Then run ./install.sh again.

How did you install docker? Did you choose docker in that feature-list while installing Ubuntu?

Thread title: Nothing can connect to Kafka
It should be quite obvious: Kafka is unreachable, meaning no events whatsoever can be processed. How would there be anything on my issues page?

I have done that, a lot of times. The only difference is that I removed the volumes manually by name (docker volume ls | grep sentry, then piping those names back into docker volume rm).
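Below is a sketch of the same by-name cleanup driven from Python. It assumes the Docker CLI is on PATH and, as with the grep above, that every relevant volume has "sentry" in its name. It uses `docker volume ls -q`, which prints only volume names, so there is no DRIVER column to strip before passing them to `docker volume rm`. `matching_volume_names` and `remove_volumes` are hypothetical helpers. Review the printed list before deleting anything, since removal is irreversible.

```python
import subprocess


def matching_volume_names(ls_output: str, needle: str = "sentry") -> list:
    """Filter `docker volume ls -q` output (one name per line) by substring."""
    return [name for name in ls_output.split() if needle in name]


def remove_volumes(needle: str = "sentry", dry_run: bool = True) -> list:
    """Return volumes whose name contains needle; remove them unless dry_run."""
    out = subprocess.run(
        ["docker", "volume", "ls", "-q"],
        check=True, capture_output=True, text=True,
    ).stdout
    names = matching_volume_names(out, needle)
    if names and not dry_run:
        # Same effect as: docker volume ls -q | grep <needle> | xargs docker volume rm
        subprocess.run(["docker", "volume", "rm", *names], check=True)
    return names
```

First call `remove_volumes()` to see what would be deleted, then call `remove_volumes(dry_run=False)` with the stack brought down, since `docker volume rm` refuses volumes that are still in use. Passing `"zookeeper"` as the needle narrows the cleanup to ZooKeeper-related volumes only.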

I followed this page: https://docs.docker.com/engine/install/ubuntu/, which has the official install instructions.

Good luck :)

@laundmo - seems like your ZooKeeper instance fails, causing Kafka to fail. The error somewhat seems to be related to this fix: https://github.com/getsentry/onpremise/pull/525/files

I’d try running that line manually after the install, restart the ZK instance and see if it helps. If it doesn’t, I’d try deleting all zookeeper-related volumes. Note that these volumes would have a prefix as they are not “global” docker volumes but scoped to the Sentry on-premise compose project.

I see that you’ve done this but I cannot really understand why ZK wouldn’t stay up.

It seems it’s not ZooKeeper or Kafka; both stay up just fine.

Latest ZooKeeper logs: https://pastebin.com/YaSkCZq8
Latest Kafka logs: https://pastebin.com/C9Tga5UM

At this point I have given up on getting it to work. It seems something is just deeply broken, and I don’t have the time or motivation to spend days on this. I’ll check back for any updates/changes, but other than that I’m done. Disappointing.

I understand the frustration and disappointment, and I’m sorry about the experience. All I can guess is that something is broken at the Docker or network layer, as there’s nothing special we do to connect to Kafka, and this all works in two different CI systems.

Hope we get to the bottom of this.

The weird thing is that so many other services run just fine with no network issues whatsoever. I run multiple databases, 2 instances of RabbitMQ, and at least 20 different other webservices on the same server just fine.

@laundmo I’m not denying there might be things we can do to improve the situation. I’m just saying we don’t know yet, and as far as we can tell, both of our CI flows are able to install Sentry and get a test event through. So I’m guessing at least some part of this is related to your specific setup (maybe Ubuntu 20.04, the Docker version, or something else).

We will be trying these out soon but any indicators or direct help you can provide would also help others who might be experiencing similar issues.
