I recently migrated from v9 to v10. Now the Docker volume sentry-kafka has grown to 107G within just 4 days of operation. Is there a way to restrict it? This might be related to How to disable Kafka logs in on premise sentry
I’ve found this great article about limiting Kafka’s disk usage: http://iv-m.github.io/articles/kafka-limit-disk-space/
It suggests adding:
"log.retention.hours": 24,
"log.retention.bytes": 1073741824,
"log.segment.bytes": 107374182,
"log.retention.check.interval.ms": 20000,
"log.segment.delete.delay.ms": 1000
However, where would I apply this in the Sentry Docker setup? I’ve tried sentry/sentry.conf.py, but its Kafka options only cover the producer side.
What options would make sense in terms of Sentry? Does Kafka only temporarily store incoming events before they are processed, or is it also used as permanent storage?
Even if I’d lose some events, I’m in favor of keeping this volume small.
You should be able to add these values to the kafka container in your docker-compose.yml file as environment variables. Just prefix them with KAFKA_, use _ instead of ., and make them all uppercase. See here: https://docs.confluent.io/current/installation/docker/config-reference.html#confluent-kafka-configuration
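For example, with the values from the article, that mapping gives you something like this (a sketch; the service name kafka matches the onpremise docker-compose.yml, and the values are the ones quoted above):

  kafka:
    environment:
      KAFKA_LOG_RETENTION_HOURS: "24"
      KAFKA_LOG_RETENTION_BYTES: "1073741824"         # 1 GiB per partition
      KAFKA_LOG_SEGMENT_BYTES: "107374182"            # ~100 MiB segments
      KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: "20000"
      KAFKA_LOG_SEGMENT_DELETE_DELAY_MS: "1000"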
As for your last questions, I don’t have answers, but I’m interested in them too.
Hi,
we have the same problem: Sentry 10.1.0.dev098c2f17, on-premise, Docker environment, with Kafka using 10GB+ daily.
I changed the Kafka configuration as described above, and the space is now being cleaned up.
The question still remains:
What options would make sense in terms of Sentry? Does Kafka only temporarily store incoming events before they are processed, or is it also used as permanent storage?
Even if I’d lose some events, I’m in favor of keeping this volume small.
Was this ever answered?
No, unfortunately this was never answered. I was too scared to set any KAFKA_ environment variables. Our Kafka container uses between 40G and 100G, but it seems to run stably.
Hi again, our Kafka has now gone crazy, consuming nearly 100G.
Which environment variables have you configured for your instance? I’ve now used 50G for the retention size, 1G for segments, and somewhat longer intervals:
KAFKA_LOG_RETENTION_HOURS: 24
KAFKA_LOG_RETENTION_BYTES: 53687091200
KAFKA_LOG_SEGMENT_BYTES: 1073741824
KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: 300000
KAFKA_LOG_SEGMENT_DELETE_DELAY_MS: 60000
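To double-check that the broker actually picked these up (a sketch; it assumes the confluentinc/cp-kafka image used by the onpremise repo and the default kafka service name):

# confirm the variables reached the container
docker-compose exec kafka env | grep KAFKA_LOG
# watch the volume shrink once retention kicks in
docker system df -v | grep sentry-kafka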
In the logs I could see the LogCleaner being started:
kafka_1 | [2020-07-27 09:06:10,351] INFO Starting the log cleaner (kafka.log.LogCleaner)
kafka_1 | [2020-07-27 09:06:14,414] INFO [kafka-log-cleaner-thread-0]: Starting (kafka.log.LogCleaner)
After a few minutes, disk usage dropped from 98G to 19G. I hope to see no further issues, e.g. lost messages.
Hi,
I’ve used the parameters suggested in your first post. Our Sentry also uses ~20GB of disk now.
I’m not really sure how to check whether the parameters are OK, e.g. whether any messages get lost. For now it seems to be working fine.
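One way to sanity-check it: as long as the consumer groups keep their lag low, retention should only delete segments that have already been processed. A sketch (it assumes the Kafka CLI tools shipped with the cp-kafka image; snuba-consumers is just an example group name, list the groups first to see what your install actually uses):

# list the consumer groups registered on the broker
docker-compose exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 --list
# inspect one group; a small, stable LAG column means consumers keep up
docker-compose exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group snuba-consumers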