+ '[' b = - ']'
+ snuba bootstrap --help
+ set -- snuba bootstrap --force
+ set gosu snuba snuba bootstrap --force
+ exec gosu snuba snuba bootstrap --force
2020-03-15 14:08:34,544 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-03-15 14:08:36,547 Connection to Kafka failed (attempt 1)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-03-15 14:08:37,630 Failed to create topic events
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 89, in bootstrap
future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'events' already exists."}
2020-03-15 14:08:37,631 Failed to create topic event-replacements
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 89, in bootstrap
future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'event-replacements' already exists."}
2020-03-15 14:08:37,631 Failed to create topic snuba-commit-log
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 89, in bootstrap
future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'snuba-commit-log' already exists."}
2020-03-15 14:08:37,632 Failed to create topic cdc
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 89, in bootstrap
future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'cdc' already exists."}
2020-03-15 14:08:37,632 Failed to create topic ingest-sessions
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 89, in bootstrap
future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'ingest-sessions' already exists."}
2020-03-15 14:08:37,632 Failed to create topic errors-replacements
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 89, in bootstrap
future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'errors-replacements' already exists."}
2020-03-15 14:08:37,632 Failed to create topic outcomes
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 89, in bootstrap
future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'outcomes' already exists."}
2020-03-15 14:08:37,658 Tables for dataset transactions created.
2020-03-15 14:08:37,683 Tables for dataset events created.
2020-03-15 14:08:37,688 Tables for dataset groupedmessage created.
2020-03-15 14:08:37,695 Tables for dataset sessions created.
2020-03-15 14:08:37,713 Tables for dataset events_migration created.
2020-03-15 14:08:37,715 Tables for dataset outcomes_raw created.
2020-03-15 14:08:37,719 Tables for dataset groupassignee created.
2020-03-15 14:08:37,719 Tables for dataset discover created.
2020-03-15 14:08:37,723 Tables for dataset outcomes created.
These seem okay. Those exceptions are printed for information in case something is very off. It looks like the topics were already created, so this is not your first run or install, which is fine. I'll see if we can swallow those specific errors.
Do you have any other issues with your setup?
Thanks.
When the server starts, the following error occurs.
kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-22 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2020-03-19 09:04:43,651] ERROR [Controller id=1005 epoch=41] Controller 1005 epoch 41 failed to change state for partition __consumer_offsets-30 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-30 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
kafka_1 | at kafka.controller.PartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:292)
kafka_1 | at kafka.controller.PartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:210)
kafka_1 | at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:133)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:123)
kafka_1 | at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:109)
kafka_1 | at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:66)
kafka_1 | at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:260)
kafka_1 | at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
kafka_1 | at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1134)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:89)
kafka_1 | at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
kafka_1 | at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:88)
kafka_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
kafka_1 | [2020-03-19 09:04:43,654] ERROR [Controller id=1005 epoch=41] Controller 1005 epoch 41 failed to change state for partition __consumer_offsets-8 from OfflinePartition to OnlinePartition (state.change.logger)
kafka_1 | kafka.common.StateChangeFailedException: Failed to elect leader for partition __consumer_offsets-8 under strategy OfflinePartitionLeaderElectionStrategy
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:366)
kafka_1 | at kafka.controller.PartitionStateMachine$$anonfun$doElectLeaderForPartitions$3.apply(PartitionStateMachine.scala:364)
kafka_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
kafka_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
kafka_1 | at kafka.controller.PartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:364)
Another issue is that my project can’t collect issues. Why?
All I can say is that your Kafka instance has some issues, but I don't know what exactly. It could be a corrupt volume, so if you don't have any data to lose, you may try re-creating the sentry-kafka volume. This is also the reason why you cannot get any events.
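If you go that route, a rough sequence would be something like this (volume names may differ on your install, so check docker volume ls first, and note this throws away whatever is queued in Kafka):
docker-compose down
docker volume rm sentry-kafka sentry-zookeeper
./install.sh
docker-compose up -d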
Okay! I'll try it.
@CharlesBird did you ever get this resolved?
Hi there!
I’m having the same issue with kafka container talking about partitions and leadership :’(
I’ve already tried to remove the volume and restart the kafka container multiple times with no luck :’(
@BYK please, help us :’(
Unfortunately I'm not a Kafka expert; I just bring services together and make sure they work properly in multiple environments such as our local envs, the CI envs, etc. If you are having issues, I encourage you to try finding the root cause yourself, as we know this setup works.
My primary suspects would be disk space, disk integrity, network and available memory to the system. If all those look fine, then I honestly don’t know what might be the issue.
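A quick way to rule most of those out is something along these lines (standard Linux/Docker commands, nothing Sentry-specific):
df -h                     # free disk space on the host
free -m                   # available memory
docker system df          # space used by images, containers and volumes
docker stats --no-stream  # one-off snapshot of per-container CPU/memory usage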
By the way, I changed the kafka and zookeeper versions to latest (from 5.1.2) in compose, and the (kafka) errors are gone.
But… unfortunately, Sentry still ignores new events and can't load old events either.
Can you advise any way to track down where it is losing incoming events?
(Just in case: even calling Sentry.captureException(new Error("Something broke")); from the browser console doesn't create any new events in the Sentry project :'( )
Looks like I finally beat it. Sentry was failing to receive new events because the ingest-consumer command in 10.0.0 doesn't support --all-consumer-types. So, I think, it can be fixed by upgrading to a dev version, or by creating 3 consumers, one for each type: events (!), transactions, attachments.
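In docker-compose terms that means three separate consumer services (instead of the single one running with --all-consumer-types), each running one of these commands; I'm assuming the --consumer-type flag here, so double-check with sentry run ingest-consumer --help on your version:
sentry run ingest-consumer --consumer-type events
sentry run ingest-consumer --consumer-type transactions
sentry run ingest-consumer --consumer-type attachments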
Although, I'm still not sure what to do about the old events that were there before the upgrade from 9.1.2 to 10.0.0 :-/
So, anyway, @BYK, it seems you should bump kafka and zookeeper's versions in docker-compose.yml to something newer (I'm not sure about the minimal working versions, but setting 5.5.0 or latest did the trick for me).
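For reference, this is roughly the change I made in docker-compose.yml (service names as in the stock file; adjust if yours differ):
zookeeper:
  image: confluentinc/cp-zookeeper:5.5.0
  # ...rest of the service unchanged
kafka:
  image: confluentinc/cp-kafka:5.5.0
  # ...rest of the service unchanged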
… and that was a false positive…
Sentry only gets events from itself now, but doesn't catch ones from other projects…
I guess we can switch to 5.1.4 safely, but I don't know about 5.5.0. Do you know what was fixed in the recent versions?
Yeah, right now we are expecting people to use the latest versions, and it seems like there's a preference towards a fixed 10.0.0 version by some people. I'll take the time to tag the on-premise repo for 10.0.0 with fixed image versions so everything works, but it will be older versions of everything (about 4 months old at this point). I'd highly recommend using the latest versions for now.
Unfortunately, no, I didn't check the release history and don't know where exactly it was fixed. I just locally changed :5.1.2 to :latest, and it stopped randomly crashing.
By the way, I also found that I needed to add
KAFKA_REPLICA_FETCH_MAX_BYTES: '1073741824'
KAFKA_MESSAGE_MAX_BYTES: '1073741824'
to the environment block of the kafka container, to prevent "message too long" errors when sentry upgrade migrates old events.
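In context, the kafka service's environment block ends up looking roughly like this (keep whatever KAFKA_* settings are already there; only the two *_MAX_BYTES lines are new):
kafka:
  image: confluentinc/cp-kafka:5.5.0
  environment:
    KAFKA_REPLICA_FETCH_MAX_BYTES: '1073741824'
    KAFKA_MESSAGE_MAX_BYTES: '1073741824'
    # ...existing settings such as KAFKA_ZOOKEEPER_CONNECT stay as they are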
Although, I still can't figure out how to fix the duplicates errors in Postgres (https://github.com/getsentry/sentry/issues/18492) :-/
Submitted a PR for the Kafka upgrade: https://github.com/getsentry/onpremise/pull/465
The MAX_BYTES options I don't know about, though. Responded to the issue over at GitHub.
Hey, it's Will from team.
I'm having the same error output. But the PR got merged? https://github.com/getsentry/onpremise/pull/465
Am I on the right versions? Let me see:
Creating sentry_onpremise_kafka_1 ... done
+ '[' b = - ']'
+ snuba bootstrap --help
+ set -- snuba bootstrap --force
+ set gosu snuba snuba bootstrap --force
+ exec gosu snuba snuba bootstrap --force
2020-05-23 02:28:28,966 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-05-23 02:28:30,968 Connection to Kafka failed (attempt 1)
Traceback (most recent call last):
Yes, I have 5.5.0 for cp-zookeeper and cp-kafka.
This one you are seeing is normal, as it takes a bit longer for Kafka to be ready compared to Snuba. It should automatically clear and move on with the installation.
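If it doesn't clear on its own, you can also re-run the bootstrap step by hand once Kafka is fully up; something like this should work (service name per the stock docker-compose.yml, so adjust if yours differs):
docker-compose run --rm snuba-api bootstrap --force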
It re-attempts a total of 59 times, then finishes on this:
2020-05-23 06:47:47,510 Connection to Kafka failed (attempt 58)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-05-23 06:47:49,517 Connection to Kafka failed (attempt 59)
Traceback (most recent call last):
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 11, in <module>
load_entry_point('snuba', 'console_scripts', 'snuba')()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
Cleaning up...
My install.sh is not finishing with the "You're all done! Run the following command to get Sentry running:" message.
I run docker-compose up anyway and get various error-looking info, and can't load up Sentry on localhost:9000 (502 Bad Gateway). Idk if these would be related to the above kafka issue. I'll search the forum or maybe open a new forum post:
worker_1 | 06:52:40 [ERROR] celery.worker.job: Task sentry.tasks.options.sync_options[fc35b23c-8040-4178-a8a1-4676c0958c09] raised unexpected: OperationalError('could not translate host name "postgres" to address: Name or service not known\n',) (data={u'hostname': 'celery@25e8b82c032d', u'name': 'sentry.tasks.options.sync_options', u'args': '[]', u'internal': False, u'kwargs': '{}', u'id': 'fc35b23c-8040-4178-a8a1-4676c0958c09'})
kafka_1 | [main-SendThread(zookeeper:2181)] ERROR org.apache.zookeeper.client.StaticHostProvider - Unable to resolve address: zookeeper:2181
kafka_1 | java.net.UnknownHostException: zookeeper
kafka_1 | at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
kafka_1 | at java.net.InetAddress.getAllByName(InetAddress.java:1193)
and…
postgres_1 |
postgres_1 | PostgreSQL Database directory appears to contain a database; Skipping initialization
postgres_1 |
postgres_1 | FATAL: database files are incompatible with server
postgres_1 | DETAIL: The data directory was initialized by PostgreSQL version 9.5, which is not compatible with this version 9.6.18.
^ This info repeats over and over; it never seems to settle on anything healthy-looking.
Several months ago I had this working on my MacBook, but maybe that was a different version.
Seems like you have multiple issues:
- For some reason zookeeper is not reachable. This one looks like a DNS issue. docker-compose handles this automatically, so I don't really know what's going on there.
- I guess you were trying to upgrade from a 9.1.2 installation, which uses Postgres 9.5, and the automatic upgrade to Postgres 9.6 failed. You'd need to inspect the install logs to figure out what went wrong there; a couple of quick checks are below.
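The kind of thing I'd check first (standard docker-compose commands; the service names and the Postgres data path assume the stock docker-compose.yml):
docker-compose ps                                                          # which services are up, restarting or exited
docker-compose exec kafka getent hosts zookeeper                           # can the kafka container resolve the zookeeper hostname?
docker-compose run --rm postgres cat /var/lib/postgresql/data/PG_VERSION   # which Postgres version initialized the data directory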