Migration Error

Hi,

I recently had my sentry disk fill up which cause me to try and run a bash instal.sh again to re-run the cleanup.

However this failed with the following error.

Creating sentry_onpremise_kafka_1      ... 
Creating sentry_onpremise_kafka_1      ... done
Creating sentry_onpremise_clickhouse_1 ... done
+ '[' b = - ']'
+ snuba bootstrap --help
+ set -- snuba bootstrap --no-migrate --force
+ set gosu snuba snuba bootstrap --no-migrate --force
+ exec gosu snuba snuba bootstrap --no-migrate --force
%3|1606035465.928|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.19.0.5:9092 failed: Connection refused (after 5ms in state CONNECT)
%3|1606035466.923|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.19.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-11-22 08:57:46,923 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-11-22 08:57:48,331 Failed to create topic cdc
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'cdc' already exists."}
2020-11-22 08:57:48,333 Failed to create topic events
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'events' already exists."}
2020-11-22 08:57:48,333 Failed to create topic ingest-sessions
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'ingest-sessions' already exists."}
2020-11-22 08:57:48,333 Failed to create topic event-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'event-replacements' already exists."}
2020-11-22 08:57:48,333 Failed to create topic snuba-commit-log
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'snuba-commit-log' already exists."}
2020-11-22 08:57:48,334 Failed to create topic outcomes
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'outcomes' already exists."}
2020-11-22 08:57:48,334 Failed to create topic errors-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'errors-replacements' already exists."}
Starting sentry_onpremise_clickhouse_1 ... 
Starting sentry_onpremise_zookeeper_1  ... 
Starting sentry_onpremise_clickhouse_1 ... done
Starting sentry_onpremise_redis_1      ... 
Starting sentry_onpremise_zookeeper_1  ... done
Starting sentry_onpremise_redis_1      ... done
Starting sentry_onpremise_kafka_1      ... 
Starting sentry_onpremise_kafka_1      ... done
+ '[' m = - ']'
+ snuba migrations --help
+ set -- snuba migrations migrate --force
+ set gosu snuba snuba migrations migrate --force
+ exec gosu snuba snuba migrations migrate --force
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/migrations.py", line 50, in migrate
    runner.run_all(force=force)
  File "/usr/src/snuba/snuba/migrations/runner.py", line 132, in run_all
    pending_migrations = self._get_pending_migrations()
  File "/usr/src/snuba/snuba/migrations/runner.py", line 257, in _get_pending_migrations
    raise MigrationInProgress(migration_key)
snuba.migrations.errors.MigrationInProgress: transactions: 0008_transactions_add_timestamp_index
An error occurred, caught SIGERR on line 255
Cleaning up...

I followed the steps described in this post but got nowhere Sentry no more catch errors

I tried upgrading to the latest version at the sentry-onpremise repository but it gives me the same error.

Any help getting sentry started again would be much appreciated.

Thanks,
Reshad

TLDR: running docker-compose --no-ansi run --rm snuba-api migrations reverse --group transactions --migration-id 0008_transactions_add_timestamp_index fixed my specific migration issue.
More details as to how I came to that conclusion and how to triage and fix similar issues below.

So for anyone who encounters this in the future here is how I fixed it:

I first found the snuba migrations script https://github.com/getsentry/snuba/blob/master/README.md#migrations and run the migrations list command on the container.

# docker-compose --no-ansi run --rm snuba-api migrations list
Starting sentry_onpremise_clickhouse_1 ... 
Starting sentry_onpremise_redis_1      ... 
Starting sentry_onpremise_zookeeper_1  ... 
Starting sentry_onpremise_redis_1      ... done
Starting sentry_onpremise_clickhouse_1 ... done
Starting sentry_onpremise_zookeeper_1  ... done
Starting sentry_onpremise_kafka_1      ... 
Starting sentry_onpremise_kafka_1      ... done
+ '[' m = - ']'
+ snuba migrations --help
+ set -- snuba migrations list
+ set gosu snuba snuba migrations list
+ exec gosu snuba snuba migrations list
system
[X]  0001_migrations

events
[X]  0001_events_initial
[X]  0002_events_onpremise_compatibility
[X]  0003_errors
[X]  0004_errors_onpremise_compatibility
[X]  0005_events_tags_hash_map
[X]  0006_errors_tags_hash_map
[X]  0007_groupedmessages
[X]  0008_groupassignees
[X]  0009_errors_add_http_fields
[X]  0010_groupedmessages_onpremise_compatibility
[X]  0011_rebuild_errors

transactions
[X]  0001_transactions
[X]  0002_transactions_onpremise_fix_orderby_and_partitionby
[X]  0003_transactions_onpremise_fix_columns
[X]  0004_transactions_add_tags_hash_map
[X]  0005_transactions_add_measurements
[X]  0006_transactions_add_http_fields
[X]  0007_transactions_add_discover_cols
[-]  0008_transactions_add_timestamp_index (IN PROGRESS)

discover
[ ]  0001_discover_merge_table
[ ]  0002_discover_add_deleted_tags_hash_map

outcomes
[X]  0001_outcomes
[X]  0002_outcomes_remove_size_and_bytes

sessions
[X]  0001_sessions

This showed me that there was a migration IN PROGRESS even though there was nothing running.

Reading a little more I found an option to reverse a single migration.
I ran this script against the failed snuba migration, in my case this was in the group transactions and had a id of 0008_transactions_add_timestamp_index

# docker-compose --no-ansi run --rm snuba-api migrations reverse --group transactions --migration-id 0008_transactions_add_timestamp_index
Starting sentry_onpremise_redis_1 ... 
Starting sentry_onpremise_clickhouse_1 ... 
Starting sentry_onpremise_zookeeper_1  ... 
Starting sentry_onpremise_redis_1      ... done
Starting sentry_onpremise_clickhouse_1 ... done
Starting sentry_onpremise_zookeeper_1  ... done
Starting sentry_onpremise_kafka_1      ... 
Starting sentry_onpremise_kafka_1      ... done
+ '[' m = - ']'
+ snuba migrations --help
+ set -- snuba migrations reverse --group transactions --migration-id 0008_transactions_add_timestamp_index
+ set gosu snuba snuba migrations reverse --group transactions --migration-id 0008_transactions_add_timestamp_index
+ exec gosu snuba snuba migrations reverse --group transactions --migration-id 0008_transactions_add_timestamp_index
Finished reversing migration transactions: 0008_transactions_add_timestamp_index

# docker-compose --no-ansi run --rm snuba-api migrations list
Starting sentry_onpremise_clickhouse_1 ... 
Starting sentry_onpremise_zookeeper_1  ... 
Starting sentry_onpremise_redis_1      ... 
Starting sentry_onpremise_clickhouse_1 ... done
Starting sentry_onpremise_redis_1      ... done
Starting sentry_onpremise_zookeeper_1  ... done
Starting sentry_onpremise_kafka_1      ... 
Starting sentry_onpremise_kafka_1      ... done
+ '[' m = - ']'
+ snuba migrations --help
+ set -- snuba migrations list
+ set gosu snuba snuba migrations list
+ exec gosu snuba snuba migrations list
system
[X]  0001_migrations

events
[X]  0001_events_initial
[X]  0002_events_onpremise_compatibility
[X]  0003_errors
[X]  0004_errors_onpremise_compatibility
[X]  0005_events_tags_hash_map
[X]  0006_errors_tags_hash_map
[X]  0007_groupedmessages
[X]  0008_groupassignees
[X]  0009_errors_add_http_fields
[X]  0010_groupedmessages_onpremise_compatibility
[X]  0011_rebuild_errors

transactions
[X]  0001_transactions
[X]  0002_transactions_onpremise_fix_orderby_and_partitionby
[X]  0003_transactions_onpremise_fix_columns
[X]  0004_transactions_add_tags_hash_map
[X]  0005_transactions_add_measurements
[X]  0006_transactions_add_http_fields
[X]  0007_transactions_add_discover_cols
[ ]  0008_transactions_add_timestamp_index

discover
[ ]  0001_discover_merge_table
[ ]  0002_discover_add_deleted_tags_hash_map

outcomes
[X]  0001_outcomes
[X]  0002_outcomes_remove_size_and_bytes

sessions
[X]  0001_sessions

After ruining the migrations reverse command I ran migrations list once more and the [-] next to 0008_transactions_add_timestamp_index had changed to [] and also the IN PROGRESS had been removed too.

I then ran the migrations migrate command, the same way install.sh does (this was the command that was originally failing for me when I ran install.sh) and it succeeded.

# docker-compose --no-ansi run --rm snuba-api migrations migrate --force
Starting sentry_onpremise_zookeeper_1 ... 
Starting sentry_onpremise_clickhouse_1 ... 
Starting sentry_onpremise_zookeeper_1  ... done
Starting sentry_onpremise_redis_1      ... 
Starting sentry_onpremise_clickhouse_1 ... done
Starting sentry_onpremise_kafka_1      ... 
Starting sentry_onpremise_redis_1      ... done
Starting sentry_onpremise_kafka_1      ... done
+ '[' m = - ']'
+ snuba migrations --help
+ set -- snuba migrations migrate --force
+ set gosu snuba snuba migrations migrate --force
+ exec gosu snuba snuba migrations migrate --force
Finished running migrations
# docker-compose --no-ansi run --rm snuba-api migrations list
Starting sentry_onpremise_zookeeper_1 ... 
Starting sentry_onpremise_clickhouse_1 ... 
Starting sentry_onpremise_zookeeper_1  ... done
Starting sentry_onpremise_redis_1      ... 
Starting sentry_onpremise_clickhouse_1 ... done
Starting sentry_onpremise_redis_1      ... done
Starting sentry_onpremise_kafka_1      ... 
Starting sentry_onpremise_kafka_1      ... done
+ '[' m = - ']'
+ snuba migrations --help
+ set -- snuba migrations list
+ set gosu snuba snuba migrations list
+ exec gosu snuba snuba migrations list
system
[X]  0001_migrations

events
[X]  0001_events_initial
[X]  0002_events_onpremise_compatibility
[X]  0003_errors
[X]  0004_errors_onpremise_compatibility
[X]  0005_events_tags_hash_map
[X]  0006_errors_tags_hash_map
[X]  0007_groupedmessages
[X]  0008_groupassignees
[X]  0009_errors_add_http_fields
[X]  0010_groupedmessages_onpremise_compatibility
[X]  0011_rebuild_errors

transactions
[X]  0001_transactions
[X]  0002_transactions_onpremise_fix_orderby_and_partitionby
[X]  0003_transactions_onpremise_fix_columns
[X]  0004_transactions_add_tags_hash_map
[X]  0005_transactions_add_measurements
[X]  0006_transactions_add_http_fields
[X]  0007_transactions_add_discover_cols
[X]  0008_transactions_add_timestamp_index

discover
[X]  0001_discover_merge_table
[X]  0002_discover_add_deleted_tags_hash_map

outcomes
[X]  0001_outcomes
[X]  0002_outcomes_remove_size_and_bytes

sessions
[X]  0001_sessions

Running migrations list now shows [X] against all the migrations, denoting they have all completed.

Running bash install.sh and then docker-compose up -d brought sentry back up for me.

1 Like

Yeah, was about to mention this seems very similar to

Would you mind explaining how you ended up in this state in the first place so we can prevent it if possible?

Hi,

This happened after kafka on sentry filed up disk.
I had made these changes to my .env file Sentry disk cleanup [kafka] about a month back and that seemed to fix the problem (the kafka size dropped significantly).

Kafka filled up disk again on Sunday (the kafka volume took 80GB of space).
I bumped the disk space a bit and tried to run my older install.sh (master from about 12th Oct, I don’t have a commit hash) and it failed with something similar to the first error (the one with KafkaError{code=TOPIC_ALREADY_EXISTS).

I tried updating to the latest tag, and then to master but kept getting the same error along with the migration error snuba.migrations.errors.MigrationInProgress: transactions: 0008_transactions_add_timestamp_index.

After going through the docker compose yaml, and then the snuba code I found the migration commands which I posted above.

My best hunch is one of the the first install.sh runs tried to migrate and failed/errored out for some reason.
Once the state is marked in progress, it is not reverted even when you re-try the migration.
Some way to detect and mitigate or warn the user here would be nice.

That said, kafka filling disk is what could be a bigger problem.
I had a 30 GB PG volume and a 80 GB kafka volume before the upgrade
Once I successfully got sentry working the kafka volume was down to 24 GB.
I still don’t know why it fills disk the way it does.