Hi Experts,
I am setting up on-prem Sentry in Kubernetes, and I see a huge lag in the "ingest-events" Kafka topic processed by ingest-consumer. To make processing more efficient, I increased the "ingest-events" topic's partition count to 5 and run 5 replicas of sentry-ingest-consumer, one polling each partition.
Below are the arguments passed to the sentry-ingest-consumer container at runtime:
["--config", "/shared-config/", "run", "ingest-consumer", "--all-consumer-types", "--max-batch-size", "1000"]
Even though --max-batch-size is set to 1000, the ingest-consumer never processes 1000 messages per poll, despite there being a huge lag to work through (see partition 0 below).
14:00:12 [WARNING] sentry.utils.geo: Error opening GeoIP database: /geoip/GeoLite2-City.mmdb
14:00:13 [WARNING] sentry.utils.geo: Error opening GeoIP database in Rust: /geoip/GeoLite2-City.mmdb
14:03:34 [INFO] sentry.plugins.github: apps-not-configured
14:03:50 [DEBUG] batching-kafka-consumer: Topic 'ingest-events' is ready
14:03:50 [DEBUG] batching-kafka-consumer: Topic 'ingest-transactions' is ready
14:03:50 [DEBUG] batching-kafka-consumer: Topic 'ingest-attachments' is ready
14:03:50 [DEBUG] batching-kafka-consumer: Starting
14:04:13 [INFO] batching-kafka-consumer: New partitions assigned: [TopicPartition{topic=ingest-events,partition=1,offset=-1001,error=None}]
14:04:15 [INFO] batching-kafka-consumer: Flushing 495 items (from {('ingest-events', 1): [26206, 26700]}): forced:False size:False time:True
14:04:15 [DEBUG] batching-kafka-consumer: Flushing batch via worker
14:07:35 [INFO] batching-kafka-consumer: Worker flush took 200238ms
14:07:35 [DEBUG] batching-kafka-consumer: Committing Kafka offsets
14:07:35 [DEBUG] batching-kafka-consumer: Committed offsets: [TopicPartition{topic=ingest-events,partition=1,offset=26701,error=None}]
14:07:35 [DEBUG] batching-kafka-consumer: Kafka offset commit took 70ms
14:07:35 [DEBUG] batching-kafka-consumer: Resetting in-memory batch
14:07:37 [INFO] batching-kafka-consumer: Flushing 678 items (from {('ingest-events', 1): [26701, 27378]}): forced:False size:False time:True
14:07:37 [DEBUG] batching-kafka-consumer: Flushing batch via worker
14:12:03 [INFO] batching-kafka-consumer: Worker flush took 266006ms
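Reading the log above: each batch flushes on the time trigger (forced:False size:False time:True) well before reaching 1000 items, and the worker flush itself takes 3-4 minutes, so the flush looks like the bottleneck rather than --max-batch-size. A quick back-of-the-envelope check in plain Python, using the numbers from the log:

```python
# Throughput implied by the two flushes in the log above.
# The (items, flush-time-in-ms) pairs are taken directly from the consumer output.
flushes = [
    (495, 200238),
    (678, 266006),
]

for items, flush_ms in flushes:
    rate = items / (flush_ms / 1000)  # events per second
    print(f"{items} items in {flush_ms} ms -> {rate:.2f} events/s")
    # -> 2.47 events/s and 2.55 events/s respectively
```

So each consumer is effectively draining only about 2.5 events per second, regardless of what --max-batch-size allows.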
./kafka-consumer-groups.sh --bootstrap-server xxxx:9092,xxxx:9092,xxxx:9092 --group ingest-consumer --describe
GROUP            TOPIC                PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG      CONSUMER-ID                                   HOST              CLIENT-ID
ingest-consumer  ingest-events        0          16036942        17279611        1242669  rdkafka-034219b6-62e6-4347-847e-65cc73126275  /xxx.xxx.xxx.xxx  rdkafka
ingest-consumer  ingest-transactions  0          -               0               -        rdkafka-034219b6-62e6-4347-847e-65cc73126275  /xxx.xxx.xxx.xxx  rdkafka
ingest-consumer  ingest-attachments   0          -               0               -        rdkafka-034219b6-62e6-4347-847e-65cc73126275  /xxx.xxx.xxx.xxx  rdkafka
ingest-consumer  ingest-events        2          27856           27863           7        rdkafka-4cce6dd9-5ef2-4283-8c73-9b8d1c4b6dad  /xxx.xxx.xxx.xxx  rdkafka
ingest-consumer  ingest-events        3          26336           27908           1572     rdkafka-bc571d01-e18f-4d1b-b1c0-97788c28ee86  /xxx.xxx.xxx.xxx  rdkafka
ingest-consumer  ingest-events        4          27361           28119           758      rdkafka-e9af7b70-31d6-4539-8408-3c98cd043ee9  /xxx.xxx.xxx.xxx  rdkafka
ingest-consumer  ingest-events        1          26701           27901           1200     rdkafka-1d2101c1-7407-4b51-999c-a90bc5cf569a  /xxx.xxx.xxx.xxx  rdkafka
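To put the partition 0 lag in perspective, here is a rough time-to-drain estimate (assuming the ~2.5 events/s per-consumer rate observed in the flush timings above, and ignoring incoming traffic):

```python
# Rough time-to-drain estimate for partition 0, using the lag from the
# describe output and the approximate observed per-consumer throughput.
lag = 1_242_669   # partition 0 lag
rate = 2.5        # events/s per consumer (observed, approximate)

seconds = lag / rate
print(f"~{seconds / 86400:.1f} days to drain at {rate} events/s")
# -> ~5.8 days to drain at 2.5 events/s
```

That is far too slow for a production workload, which is why I suspect the consumer/worker side rather than Kafka itself.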
How can I fine-tune and properly balance the workload across the multiple ingest-consumers and Kafka so that ingest-events are processed efficiently under a production workload?
Please also let me know if there are better Kafka consumer tuning options available.