Kafka Error after Upgrading

Upgrading from commit 5d064c to commit 89e80 gives this error:

worker_1 | %3|1575794821.759|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.64.10:9092 failed: Connection refused
worker_1 | %3|1575794821.759|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.64.10:9092 failed: Connection refused
worker_1 | %3|1575794821.759|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.64.10:9092 failed: Connection refused
worker_1 | %3|1575794821.759|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.64.10:9092 failed: Connection refused
worker_1 | %3|1575794827.761|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known
worker_1 | %3|1575794827.761|ERROR|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known
worker_1 | %3|1575794827.762|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known

Logs from Kafka:

kafka_1 | [2019-12-08 08:51:34,121] ERROR [KafkaServer id=3130] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka_1 | org.apache.kafka.common.KafkaException: Found directory /var/lib/kafka/data/data, 'data' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion).
kafka_1 | Kafka's log directories (and children) should only contain Kafka topic data.
kafka_1 | at kafka.log.Log$.exception$1(Log.scala:2265)
kafka_1 | at kafka.log.Log$.parseTopicPartitionName(Log.scala:2272)
kafka_1 | at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:260)
kafka_1 | at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$11$$anonfun$apply$15$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:345)
kafka_1 | at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
kafka_1 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
kafka_1 | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
kafka_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
kafka_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
kafka_1 | at java.lang.Thread.run(Thread.java:748)
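
The exception above says Kafka found a sub-directory in its log directory whose name is not of the form topic-partition. A minimal sketch for spotting such directories before starting the broker follows. The helper name `list_stray_dirs` is hypothetical, and the pattern only approximates the naming rule quoted in the error message.

```shell
# Hypothetical helper: print sub-directories of a Kafka log dir whose names
# do not look like "topic-partition" or "topic-partition.uniqueId-delete".
# The regular expression is an approximation based on the error text above.
list_stray_dirs() {
  logdir="$1"
  for entry in "$logdir"/*/; do
    [ -d "$entry" ] || continue          # glob matched nothing
    name=$(basename "$entry")
    if ! printf '%s\n' "$name" | grep -Eq -- '-[0-9]+(\.[^.]+-delete)?$'; then
      printf '%s\n' "$name"
    fi
  done
}
```

Run against the directory the broker reports (here /var/lib/kafka/data), a leftover `data` directory would be printed, while directories such as `__consumer_offsets-40` would not.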

Update:

I changed

volumes:
- 'sentry-kafka:/var/lib/kafka/data'

to

volumes:
- 'sentry-kafka:/var/lib/kafka'

Everything is working fine now.

Yes, if you look at the diff at https://github.com/getsentry/onpremise/compare/5d064c..89e80#diff-4e5e90c6228fd48698d074241c2ba760R92, it shows the change you had to adjust for.

I’d rather you migrate your sentry-kafka volume or wipe it: docker volume rm sentry-kafka && docker volume create sentry-kafka. Note that you may lose some events if you wipe it, so if you choose to migrate, you can do something like:

docker run --rm -it -v sentry-kafka:/kafka alpine ash -c \
  "mv /kafka/data/* /kafka; rm -rf /kafka/data"
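
A slightly more defensive variant of the same move, as a minimal sketch. It assumes the old layout has a `data` directory at the root of the mounted volume. The function name `flatten_kafka_data` is hypothetical and not part of the onpremise repository. It does nothing if `data` is missing and removes the directory only after every move succeeded.

```shell
# Hypothetical helper: move everything from <root>/data up into <root>.
# <root> is wherever the sentry-kafka volume is mounted (e.g. /kafka above).
flatten_kafka_data() {
  root="$1"
  if [ ! -d "$root/data" ]; then
    echo "nothing to migrate: $root/data not found" >&2
    return 1
  fi
  # Move visible entries (topic-partition directories, checkpoint files).
  for entry in "$root"/data/*; do
    [ -e "$entry" ] || continue          # empty directory: glob matched nothing
    mv "$entry" "$root"/ || return 1     # stop before deleting anything
  done
  # Dotfiles left behind are discarded, as with the one-line command above.
  rm -rf "$root/data"
}
```

It can be tried on a scratch directory first to confirm the resulting layout before touching the real volume.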

After wiping and recreating the volume, we got the following error:

kafka_1 | [2019-12-10 04:05:18,245] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-1 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-1 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2019-12-10 04:05:18,246] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-5 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-5 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2019-12-10 04:05:18,246] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-26 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-26 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2019-12-10 04:05:18,246] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-29 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-29 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2019-12-10 04:05:18,246] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-34 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-34 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2019-12-10 04:05:18,246] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-10 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-10 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2019-12-10 04:05:18,246] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-32 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-32 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2019-12-10 04:05:18,246] ERROR [Controller id=3150 epoch=5] Controller 3150 epoch 5 failed to change state for partition __consumer_offsets-40 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-40 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)

I assume this is expected?

Thanks for reporting back, and no, that’s not expected. I’m not a Kafka expert, but maybe we need to clear the zookeeper volume too to prevent this?

We removed the zookeeper and redis volumes and then rebuilt, but to no avail. A serious side effect was that Sentry hangs if you try to delete any event, so we ended up rebuilding the whole stack (lost all data :frowning:).

@cwang - sorry you had to go through this. This is why we have marked v10 as beta for now. In the meantime, I’ll explore ways to make migration from earlier v10 installations cleaner.

No worries. Glad we can make some contribution. You guys are doing a really awesome job :slight_smile:

I’m having the same issue: ERROR [Controller id=1004 epoch=53] Controller 1004 epoch 53 failed to change state for partition __consumer_offsets-40 from OfflinePartition to OnlinePartition (state.change.logger)

Our Sentry is not showing any issues on the dashboard. :frowning:

We are hitting this too, and deleting volumes doesn’t seem to fix the issue either. Any other ideas?

Which volumes have you tried deleting?

sentry-data
sentry-redis
sentry-zookeeper
sentry-kafka
sentry-clickhouse
sentry-symbolicator

Can you also try removing the following:

  • sentry-zookeeper-log
  • sentry-kafka-log

@BYK unfortunately still no go.

@turbo124 if you can share some logs it may help us understand if there’s something else going on?

Sure, when running ./install.sh I see this. Did you want logs from a specific container as well?

+ exec gosu snuba snuba bootstrap --force
%3|1597226008.239|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.16.6:9092 failed: Connection refused (after 1ms in state CONNECT)
%3|1597226009.236|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.16.6:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppress

cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1597226010.238|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.16.6:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1597226011.238|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#192.168.16.6:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppresse

2020-08-12 09:53:36,456 Failed to create topic snuba-commit-log
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 92, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'snuba-commit-log' already exists."}
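
The `Connection refused` lines at the start of this log suggest the broker was not yet accepting connections when `snuba bootstrap` ran. One way to rule out a pure startup race is to wait for the port and retry. Below is a minimal sketch with a hypothetical `retry` helper. The `nc -z kafka 9092` probe in the comment is an assumption about which tools are available where you run it.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between failures. Returns 0 on the first success, 1 otherwise.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    n=$((n + 1))
    [ "$n" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example (assumes nc is installed and "kafka" resolves on this network):
# retry 30 2 nc -z kafka 9092 && ./install.sh
```

If the port never opens, the problem is in the broker itself rather than in startup ordering, and the kafka container logs are the place to look.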

I was getting the same issues @turbo124

We have a similar problem, maybe the same one.

Try adding KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2' to the kafka config in docker-compose.

@BYK, after removing the kafka and zookeeper volumes and setting KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2', I see these errors:

snuba-replacer_1 | + '[' r = - ']'
snuba-replacer_1 | + snuba replacer --help
snuba-replacer_1 | + set -- snuba replacer --storage events --auto-offset-reset=latest --max-batch-size 3
snuba-replacer_1 | + set gosu snuba snuba replacer --storage events --auto-offset-reset=latest --max-batch-size 3
snuba-replacer_1 | + exec gosu snuba snuba replacer --storage events --auto-offset-reset=latest --max-batch-size 3
snuba-replacer_1 | Traceback (most recent call last):
snuba-replacer_1 | File "/usr/local/bin/snuba", line 33, in <module>
snuba-replacer_1 | sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
snuba-replacer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 722, in __call__
snuba-replacer_1 | return self.main(*args, **kwargs)
snuba-replacer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 697, in main
snuba-replacer_1 | rv = self.invoke(ctx)
snuba-replacer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
snuba-replacer_1 | return _process_result(sub_ctx.command.invoke(sub_ctx))
snuba-replacer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 895, in invoke
snuba-replacer_1 | return ctx.invoke(self.callback, **ctx.params)
snuba-replacer_1 | File "/usr/local/lib/python3.8/site-packages/click/core.py", line 535, in invoke
snuba-replacer_1 | return callback(*args, **kwargs)
snuba-replacer_1 | File "/usr/src/snuba/snuba/cli/replacer.py", line 132, in replacer
snuba-replacer_1 | replacer.run()
snuba-replacer_1 | File "/usr/src/snuba/snuba/utils/streams/processing.py", line 132, in run
snuba-replacer_1 | self._run_once()
snuba-replacer_1 | File "/usr/src/snuba/snuba/utils/streams/processing.py", line 138, in _run_once
snuba-replacer_1 | msg = self.__consumer.poll(timeout=1.0)
snuba-replacer_1 | File "/usr/src/snuba/snuba/utils/streams/kafka.py", line 412, in poll
snuba-replacer_1 | raise ConsumerError(str(error))
snuba-replacer_1 | snuba.utils.streams.consumer.ConsumerError: KafkaError{code=UNKNOWN_TOPIC_OR_PART,val=3,str="Subscribed topic not available: event-replacements: Broker: Unknown topic or partition"}

I’m no longer getting the above issue after re-running ./install.sh, but the workers are still not processing tasks. The only error I see is from snuba-replacer_1 connecting to Kafka.

@BYK @hheexx, any ideas?

sentry_onpremise_snuba-replacer_1

%3|1597293627.097|FAIL|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.24.0.9:9092 failed: Connection refused (after 168ms in state CONNECT)
%3|1597293627.930|FAIL|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.24.0.9:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-08-13 04:40:49,183 New partitions assigned: {Partition(topic=Topic(name='event-replacements'), index=0): 0}

@hheexx I tried adding the Kafka offsets setting, but no luck, unfortunately.