Sentry disk cleanup [kafka]

Hi, I installed Sentry on-premise using the docker-compose script a few months ago, and now my disk is almost full.
I am currently running Sentry 20.10.0.dev0 (bdad080) on an Ubuntu 18.04 server.

I first ran docker-compose --file docker-compose.yml exec worker sentry cleanup --days 30, but this did not free any disk space.

When I SSH into my Sentry server and check what is using space, this is what my Docker volumes directory looks like:

root@sentry:/var/lib/docker/volumes# du -hd1
8.0K    ./sentry_onpremise_sentry-secrets
26G     ./sentry-postgres
168K    ./sentry_onpremise_sentry-zookeeper-log
128K    ./sentry-zookeeper
28K     ./sentry-symbolicator
8.0K    ./70a9678977238f80bf4552ae6c4a80adc48d9e33e70cd1f416aebf427170dc88
8.0K    ./28494c3632ce6423e39f86a2cf2418912ccf4c070247c4a8ee2c3cc5a870cb3d
8.0K    ./sentry_onpremise_sentry-kafka-log
2.1M    ./sentry-redis
46G     ./sentry-kafka
272M    ./sentry-clickhouse
12K     ./096c71be115419766c5fb74f2cfcd6c350a1339bb673f8b29d6c82f318c46def
38M     ./sentry_onpremise_sentry-smtp
12K     ./a30daf1cc27a3adcba4e0b68d3600b9fe35c153a20a6d6be84f75b6a46258630
692K    ./sentry_onpremise_sentry-smtp-log
500K    ./sentry-data
1.3G    ./sentry_onpremise_sentry-clickhouse-log
73G     .

After reading Database cleanup, I figured I could vacuum the Postgres database, so I ran the following commands:

root@sentry:/var/lib/docker/volumes# docker exec -it sentry_onpremise_postgres_1 bash
root@8ab81c942824:/# su postgres
postgres@8ab81c942824:/$ psql
postgres=# VACUUM FULL;
VACUUM
postgres=# 

For me this took about 20 minutes and reduced the sentry-postgres volume from 26 GB to 1.6 GB.
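(If you want to script this instead of opening an interactive psql session, the same vacuum can be run in one line from the host. The container name below is the default compose project name from my install and may differ on yours.)

```shell
# Run VACUUM FULL non-interactively as the postgres user.
# Container name (sentry_onpremise_postgres_1) is an assumption from the
# default onpremise compose project; check yours with `docker ps`.
docker exec -u postgres sentry_onpremise_postgres_1 psql -U postgres -c 'VACUUM FULL;'
```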

The sentry-kafka volume is taking up 46 GB. Is there something similar I can do to clean out this volume?

root@sentry:/var/lib/docker/volumes/sentry-kafka/_data# du -hd1
8.0K    ./__consumer_offsets-28
22M     ./snuba-commit-log-0
20K     ./__consumer_offsets-14
32K     ./__consumer_offsets-6
8.0K    ./__consumer_offsets-15
8.0K    ./__consumer_offsets-35
8.0K    ./__consumer_offsets-11
8.0K    ./__consumer_offsets-34
8.0K    ./__consumer_offsets-18
141M    ./outcomes-0
8.0K    ./__consumer_offsets-38
8.0K    ./__consumer_offsets-8
8.0K    ./__consumer_offsets-2
8.0K    ./__consumer_offsets-47
8.0K    ./__consumer_offsets-32
20K     ./__consumer_offsets-19
8.0K    ./__consumer_offsets-3
44K     ./event-replacements-0
8.0K    ./__consumer_offsets-10
20K     ./__consumer_offsets-16
8.0K    ./ingest-transactions-0
8.0K    ./__consumer_offsets-27
41M     ./__consumer_offsets-33
12K     ./__consumer_offsets-20
8.0K    ./__consumer_offsets-36
32K     ./__consumer_offsets-31
20K     ./__consumer_offsets-30
8.0K    ./__consumer_offsets-17
20K     ./__consumer_offsets-40
48K     ./__consumer_offsets-43
8.0K    ./__consumer_offsets-25
20K     ./__consumer_offsets-13
8.0K    ./__consumer_offsets-23
8.0K    ./__consumer_offsets-5
20K     ./__consumer_offsets-44
8.0K    ./__consumer_offsets-24
8.0K    ./__consumer_offsets-29
8.0K    ./__consumer_offsets-48
8.0K    ./__consumer_offsets-26
8.0K    ./errors-replacements-0
8.0K    ./__consumer_offsets-22
8.0K    ./ingest-sessions-0
23G     ./events-0
8.0K    ./__consumer_offsets-12
32K     ./__consumer_offsets-46
8.0K    ./__consumer_offsets-49
1.3M    ./ingest-attachments-0
8.0K    ./__consumer_offsets-41
8.0K    ./__consumer_offsets-1
8.0K    ./__consumer_offsets-0
82M     ./__consumer_offsets-45
8.0K    ./cdc-0
8.0K    ./__consumer_offsets-37
36K     ./__consumer_offsets-39
8.0K    ./__consumer_offsets-4
23G     ./ingest-events-0
8.0K    ./__consumer_offsets-42
8.0K    ./__consumer_offsets-7
8.0K    ./__consumer_offsets-21
9.1M    ./__consumer_offsets-9
46G     .

The events-0 and ingest-events-0 directories take up 23 GB each, and the majority of that space is in .log files.
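One option that avoids deleting the volumes entirely is to lower retention on just the oversized topics and let Kafka expire the old segments itself. This is only a sketch: the kafka-configs flags depend on which Confluent image your onpremise release ships (older images take --zookeeper, newer ones take --bootstrap-server kafka:9092), so treat the exact invocation as an assumption and check it against your broker version.

```shell
# Set a 24h retention on the two largest topics; Kafka's log cleaner will
# then delete expired segments on its next retention check.
RETENTION_MS=$((24 * 60 * 60 * 1000))   # 86400000 ms = 24 hours

for topic in events ingest-events; do
  docker-compose exec kafka kafka-configs \
    --zookeeper zookeeper:2181 \
    --alter --entity-type topics --entity-name "$topic" \
    --add-config "retention.ms=${RETENTION_MS}"
done
```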

I tried to remove the volumes for sentry-kafka and sentry-zookeeper as suggested in Kafka is failing to change state to Online, but docker-compose then fails to start the kafka and zookeeper containers.
Running bash install.sh fails with an error saying:

%3|1602005098.025|FAIL|rdkafka#producer-5| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#10.0.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1602005099.024|FAIL|rdkafka#producer-5| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#10.0.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-10-06 17:24:59,024 Connection to Kafka failed (attempt 4)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
    client.list_topics(timeout=1)
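For reference, this is roughly the volume-removal sequence I attempted (volume names taken from the du listing above; yours may differ). Note that Kafka and Zookeeper state have to be removed together, since removing only one of them leaves mismatched cluster metadata:

```shell
# Stop everything so the volumes are no longer in use.
docker-compose down

# Remove Kafka and Zookeeper state together.
docker volume rm sentry-kafka sentry-zookeeper \
  sentry_onpremise_sentry-kafka-log sentry_onpremise_sentry-zookeeper-log

# Recreate the topics and bring the stack back up.
./install.sh
docker-compose up -d
```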

Any suggestions on how to clean up this space would be much appreciated.
I have a snapshot of the VM in its pre-vacuum state if you would like me to run anything.

Thanks,
Reshad

For anyone wanting to tune the Kafka settings, the post Restrict Kafka disk usage helped a lot.

I found it easier to edit the .env file rather than docker-compose.yml.

I added the following lines to my .env file:

KAFKA_LOG_RETENTION_HOURS=24
KAFKA_LOG_RETENTION_BYTES=53687091200
KAFKA_LOG_SEGMENT_BYTES=1073741824
KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS=300000
KAFKA_LOG_SEGMENT_DELETE_DELAY_MS=60000
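For reference, those byte values decode as follows (quick shell sanity check; note that Kafka applies log.retention.bytes per partition, not per topic):

```shell
# Decode the .env values above.
echo $((50 * 1024 ** 3))    # KAFKA_LOG_RETENTION_BYTES: 53687091200 = 50 GiB cap (per partition)
echo $((1024 ** 3))         # KAFKA_LOG_SEGMENT_BYTES: 1073741824 = roll a new segment every 1 GiB
echo $((5 * 60 * 1000))     # KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: 300000 = check every 5 minutes
echo $((60 * 1000))         # KAFKA_LOG_SEGMENT_DELETE_DELAY_MS: 60000 = delete 1 minute after eligible
```

These settings only take effect once the kafka container is recreated with the new environment (e.g. docker-compose up -d), since a running container does not pick up .env changes.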

After setting these, do they take effect right away, or do I need to restart? I set them, but my disk usage still seems to be growing.

Can you help me? Thanks!