Kafka issues when upgrading from 9.1.2 to 20.8.0

When I run ./install.sh, I get the following errors related to Kafka:

    Creating sentry_onpremise_kafka_1      ... done
    + '[' b = - ']'
    + snuba bootstrap --help
    + set -- snuba bootstrap --force
    + set gosu snuba snuba bootstrap --force
    + exec gosu snuba snuba bootstrap --force
    %3|1598717082.331|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 2ms in state CONNECT)
    2020-08-29 16:04:43,328 Connection to Kafka failed (attempt 0)
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
        client.list_topics(timeout=1)
    cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
    %3|1598717083.332|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 4ms in state CONNECT, 1 identical error(s) suppressed)
    %3|1598717084.333|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
    %3|1598717085.334|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
    2020-08-29 16:04:45,336 Connection to Kafka failed (attempt 1)
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
        client.list_topics(timeout=1)
    cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
    %3|1598717086.341|FAIL|rdkafka#producer-3| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
    %3|1598717087.342|FAIL|rdkafka#producer-3| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
    2020-08-29 16:04:47,344 Connection to Kafka failed (attempt 2)
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
        client.list_topics(timeout=1)
    cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
    %3|1598717088.349|FAIL|rdkafka#producer-4| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
    %3|1598717089.350|FAIL|rdkafka#producer-4| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
    2020-08-29 16:04:49,352 Connection to Kafka failed (attempt 3)
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
        client.list_topics(timeout=1)
    cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
    %3|1598717090.357|FAIL|rdkafka#producer-5| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
    %3|1598717091.358|FAIL|rdkafka#producer-5| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.26.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
    2020-08-29 16:04:51,359 Connection to Kafka failed (attempt 4)
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
        client.list_topics(timeout=1)
    cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
    2020-08-29 16:04:53,703 Failed to create topic cdc
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.8/site-packages/confluent_kafka/admin/__init__.py", line 225, in _make_topics_result
        result = f.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
    cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed while waiting for response from broker: Local: Timed out"}
    2020-08-29 16:04:53,710 Failed to create topic outcomes
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.8/site-packages/confluent_kafka/admin/__init__.py", line 225, in _make_topics_result
        result = f.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
    cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed while waiting for response from broker: Local: Timed out"}
    2020-08-29 16:04:53,711 Failed to create topic ingest-sessions
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.8/site-packages/confluent_kafka/admin/__init__.py", line 225, in _make_topics_result
        result = f.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
    cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed while waiting for response from broker: Local: Timed out"}
    2020-08-29 16:04:53,713 Failed to create topic events
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.8/site-packages/confluent_kafka/admin/__init__.py", line 225, in _make_topics_result
        result = f.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
    cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed while waiting for response from broker: Local: Timed out"}
    2020-08-29 16:04:53,717 Failed to create topic event-replacements
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.8/site-packages/confluent_kafka/admin/__init__.py", line 225, in _make_topics_result
        result = f.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
    cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed while waiting for response from broker: Local: Timed out"}
    2020-08-29 16:04:53,718 Failed to create topic snuba-commit-log
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.8/site-packages/confluent_kafka/admin/__init__.py", line 225, in _make_topics_result
        result = f.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
    cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed while waiting for response from broker: Local: Timed out"}
    2020-08-29 16:04:53,729 Failed to create topic errors-replacements
    Traceback (most recent call last):
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/src/snuba/snuba/cli/bootstrap.py", line 94, in bootstrap
        future.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.8/site-packages/confluent_kafka/admin/__init__.py", line 225, in _make_topics_result
        result = f.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
        raise self._exception
    cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed while waiting for response from broker: Local: Timed out"}
    2020-08-29 16:04:53,764 Creating tables for storage events

However, ./install.sh still completes successfully after these errors.

Everything works for 3-4 hours, and our Sentry instance receives errors as expected.
After that, Kafka exits. I think this is due to a memory issue; my server has 3 GB of RAM.

When I run the following command, I see this:

    # dmesg | grep "Out of memory"
    [30129211.818081] Out of memory: Kill process 17182 (java) score 202 or sacrifice child

Does anyone know how I can limit Kafka's memory usage?
Some people suggested adding the following environment settings to the docker-compose.yml file:

    KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2'
    KAFKA_LOG_RETENTION_HOURS: '24'

This isn't helping. If anyone knows how I can resolve this, please let me know.

Thanks.

The errors are not important as long as the script kept going. They are transient errors that occur while Snuba waits for Kafka to be up and running.
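To illustrate why such errors are harmless, here is a rough sketch of a bounded retry loop of the kind the log suggests (attempts 0 through 4, then moving on). This is not Snuba's actual code; `wait_until_ready` and `make_flaky_check` are hypothetical names, and the check function stands in for a call like `client.list_topics(timeout=1)`.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], None], attempts: int = 5, delay: float = 1.0) -> bool:
    """Call check() until it stops raising. Return True on success, False if attempts run out."""
    for attempt in range(attempts):
        try:
            check()
            return True
        except Exception as exc:
            # Each failure is logged and retried, which is what produces the noisy tracebacks.
            print(f"Connection to Kafka failed (attempt {attempt}): {exc}")
            time.sleep(delay)
    return False


def make_flaky_check(failures: int) -> Callable[[], None]:
    """Build a stand-in check (hypothetical) that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def check() -> None:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("Connection refused")

    return check
```

For example, `wait_until_ready(make_flaky_check(2), delay=0)` logs two failures and then returns True. When every attempt fails, the function returns False and the caller can decide whether to continue, which matches the install script carrying on after the tracebacks.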

It looks like you have already identified the real issue here: there is not enough memory, so the OOM killer is killing Kafka and potentially other services. Are you able to increase the amount of RAM?
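To check whether anything besides Kafka is being killed, something like the following sketch could help. It pulls the PID and process name out of `dmesg` lines in the format shown above; `parse_oom_kills` is a hypothetical helper, and the pattern also accepting the "Killed process" wording is an assumption about other kernel versions.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   Out of memory: Kill process 17182 (java) score 202 or sacrifice child
# "Killed process" is also accepted, on the assumption that some kernels word it that way.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def parse_oom_kills(dmesg_text: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in the given dmesg output."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(dmesg_text)]


sample = "[30129211.818081] Out of memory: Kill process 17182 (java) score 202 or sacrifice child"
print(parse_oom_kills(sample))  # [(17182, 'java')]
```

A `java` entry could be either Kafka or Zookeeper, since both run on the JVM, so the PID still has to be matched to a container, for example with `docker-compose top`.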

For the Kafka options in the docker-compose file, I think you need to at least restart the services for KAFKA_LOG_RETENTION_HOURS to take effect, and do a clean install (or at least remove any Kafka-related volumes) for KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS to take effect.

I increased the server's RAM to 4 GB, but I'm still facing the same problem. Can you tell me what the minimum RAM requirements are for Sentry 20.8.0?

It really depends on the amount of load/traffic you are getting. We have merged a patch that reduces the amount of RAM used by Clickhouse, which may help in your situation. It will be released as 20.9.0 tomorrow, if you can wait until then.

@BYK I created a new instance with 4 GB of RAM and 25 GB of disk space. As you suggested, I installed 20.9.0 on my server. Initially it was working fine, but later on Kafka started to consume a lot of memory. Finally, my server hung and I couldn't even ssh into it.

The load/traffic on my server was minimal, as I was only testing whether this would work.
It was receiving just one or two events over a span of hours.

As you can see in the following image, PID 1742 was consuming around 25% of memory. This increased to 30-35%, and then the instance stopped working.

The following image shows what PID 1742 is.

Please let me know how I can limit Kafka's memory usage.
Thanks!

PS. I'm using the latest on-premise setup from your GitHub repo.

How exactly are you running Sentry? Can you provide more details? And what commands are you running to get this information?

I just cloned the onpremise repo, made some changes to the config files, and ran the install script.

  1. For the first image, I just used
    $ top
  2. For the second image, I used
    $ docker-compose top

If you need more details, please let me know.
Thanks!