Kafka Error after Upgrading

This looks like either a network issue or a capacity issue with Kafka.

    cat /etc/kafka/kafka.properties
    offsets.topic.num.partitions=2
    advertised.listeners=PLAINTEXT://kafka:9092
    offsets.topic.replication.factor=1
    zookeeper.connect=zookeeper:2181
    log.dirs=/var/lib/kafka/data
    listeners=PLAINTEXT://0.0.0.0:9092
    log.retention.hours=24
    confluent.support.metrics.enable=false
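
For reference, the settings above are consistent with a single-broker install (replication factor 1). A minimal, illustrative Python sketch for parsing and sanity-checking such a properties file — the specific checks are my own assumptions, not official Kafka validation rules:

```python
# Sketch: parse a java-style Kafka properties file and flag settings that
# commonly cause trouble on a single-node Sentry install.
# The checks below are illustrative assumptions, not official rules.

def parse_properties(text):
    """Parse key=value properties, ignoring blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_single_node(props):
    """Return a list of warnings for a single-broker setup."""
    warnings = []
    if props.get("offsets.topic.replication.factor") != "1":
        warnings.append("replication factor > 1 needs more than one broker")
    if "advertised.listeners" not in props:
        warnings.append("clients outside the container cannot reach the broker")
    return warnings

sample = """\
offsets.topic.replication.factor=1
advertised.listeners=PLAINTEXT://kafka:9092
"""
print(check_single_node(parse_properties(sample)))  # -> []
```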

I do not see any messages in existing topics:

    /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic events --from-beginning --max-messages 100
    ^CProcessed a total of 0 messages

    /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic ingest-transactions --from-beginning --max-messages 100

I tried with a made-up topic, ‘foobarevents’, and got LEADER_NOT_AVAILABLE:
    /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic foobarevents --from-beginning --max-messages 100
    [2020-08-14 20:50:01,326] WARN [Consumer clientId=consumer-console-consumer-46855-1, groupId=console-consumer-46855] Error while fetching metadata with correlation id 2 : {foobarevents=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
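
For context on that LEADER_NOT_AVAILABLE warning: consuming from a topic that does not exist yet triggers auto-creation (when `auto.create.topics.enable` is on for the broker), and the warning is normally transient while a partition leader is elected. Kafka clients retry the metadata fetch with backoff; a minimal Python sketch of that pattern, where `fake_fetch` and its return value are illustrative stand-ins rather than real client API:

```python
import time

def with_retries(fetch_metadata, attempts=5, delay=0.05):
    """Call fetch_metadata, retrying with exponential backoff on failure."""
    last_error = None
    for i in range(attempts):
        try:
            return fetch_metadata()
        except RuntimeError as err:  # stand-in for LEADER_NOT_AVAILABLE
            last_error = err
            time.sleep(delay * (2 ** i))
    raise last_error

# Simulate a leader that becomes available on the third metadata fetch.
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("LEADER_NOT_AVAILABLE")
    return {"topic": "foobarevents", "leader": 1001}

print(with_retries(fake_fetch))  # -> {'topic': 'foobarevents', 'leader': 1001}
```

If the warning persists for minutes rather than seconds, leader election is genuinely failing (e.g. the broker keeps restarting), which is a different problem than a brand-new topic.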

    /usr/bin/kafka-topics --list --zookeeper zookeeper:2181
    __consumer_offsets
    cdc
    cdc–consumer-property
    errors-replacements
    errors-replacements–consumer-property
    event-replacements
    events
    events–consumer-property
    ingest-attachments
    ingest-events
    ingest-sessions
    ingest-transactions
    outcomes
    snuba-commit-log

It seems I've had some success: I deleted the following volumes and Sentry has come back online.

    docker volume rm sentry-data
    docker volume rm sentry-zookeeper
    docker volume rm sentry-kafka
    docker volume rm sentry-symbolicator
    docker volume rm sentry-kafka
    docker volume rm sentry-zookeeper
    docker volume rm sentry-zookeeper-log
    docker volume rm sentry-kafka-log
    ./install.sh 
    docker-compose up -d
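
Worth noting that deleting these volumes wipes all stored events plus the Kafka and ZooKeeper state, so the reset is destructive. A hedged shell sketch that dry-runs the same cleanup — volume names are taken from the commands above, and the actual `docker volume rm` is left commented out:

```shell
#!/bin/sh
# Dry-run of the volume reset above: print what would be destroyed,
# and only delete once the commented line is deliberately enabled.
list_reset_volumes() {
  for v in sentry-data sentry-zookeeper sentry-zookeeper-log \
           sentry-kafka sentry-kafka-log sentry-symbolicator; do
    printf 'would remove: %s\n' "$v"
    # docker volume rm "$v"   # uncomment to actually delete data
  done
}
list_reset_volumes
```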

OK,

I think I understand the issue now: it seems to be a resource problem.

After bringing the Docker instance back up, it falls over under load.

I am going to move to much higher-spec hardware to see whether the issue persists on a quality dedicated server.

I'm looking at a 128 GB dedicated server with Xeon® D-2141I processors and NVMe drives.

@BYK can you confirm whether the app is typically Memory / CPU / IO constrained?

If I had to guess, I'd go with memory and disk capacity as the limiting factors. Maybe @matt can provide more info on this.

@BYK thanks.

Update: brand-new server with 64 GB of RAM, NVMe drives, and a clean install of Ubuntu 20.04 with nothing else running. I see the exact same issue.

    + exec gosu snuba snuba bootstrap --force
    %3|1597958012.567|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.21.0.5:9092 failed: Connection refused (after 1ms in state CONNECT)
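
"Connection refused" from rdkafka means the TCP connect itself failed — nothing was listening on kafka:9092 at that moment (the broker had crashed or was still starting) — rather than a Kafka-protocol error. A plain socket probe reproduces that same check; a minimal sketch, with host and port values taken from the log above:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and timeouts
        return False

# From inside the compose network this would be: port_open("kafka", 9092)
```

Running the probe in a loop while Kafka boots would show whether the broker comes up at all, or comes up and then dies under load.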

Once we set up the Docker container, we sent it a steady stream of test events, and it fell over with this error after an hour or so.

Anything I can check here before I tear this down?

@turbo124 - do you see any logs from Kafka that may reveal any uptime issues?

Oh wait, these may be from Kafka itself. I honestly don't know what this might be. Were you able to figure out a solution?

@BYK unfortunately no, I don't have much experience with Kafka.

I am running into similar issues. Has anyone found a solution yet?