Hi!
I'm struggling with the following situation:
We currently run Sentry 9.0.0 and want to upgrade to 20.9. I managed to upgrade from 9.0.0 to 9.1.2; that upgrade went through quickly and without errors, and 9.1.2 is running without any flaws.
I set up a Sentry 20.9.0 instance in our bare-metal Kubernetes cluster as a fresh installation (I converted the docker-compose.yml with `kompose` and made a lot of modifications to get it up and running). This seems to work perfectly: I can fire a sample issue, and our applications can log exceptions to the new instance.
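For reference, the conversion step itself was roughly the following; the file name and output directory are just examples from my setup, and the generated manifests still needed heavy manual editing before they were usable:

```bash
# Rough sketch of the kompose conversion (file name and output directory are examples).
# The generated manifests required a lot of manual editing before applying them.
kompose convert -f docker-compose.yml -o k8s-manifests/
```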
My steps when I try to upgrade from 9.1.2 to 20.9.0:
- wipe the Postgres database (external Postgres database server) and restore the 9.1.2 database
- wipe all volumes (PVCs on our Kubernetes cluster)
- deploy the Sentry 20.9.0 pods (the equivalents of the containers created by docker-compose)
- once all pods are running, I make sure the Kafka pod is up; if it is:
- run `snuba bootstrap --force` inside the snuba-api pod, followed by `snuba migrations migrate` (not sure if the migrations migrate step is needed; rough kubectl commands for these steps are sketched below, after the migration output)
- the output of `snuba bootstrap --force` tells me that the Kafka topics were created and some migrations have been run
- then I start `sentry upgrade --noinput` inside the sentry-web pod:
```
Faking migration for contenttypes.0001_initial
Faking migration for contenttypes.0002_remove_content_type_name
Faking migration for auth.0001_initial
Faking migration for auth.0002_alter_permission_name_max_length
Faking migration for auth.0003_alter_user_email_max_length
Faking migration for auth.0004_alter_user_username_opts
Faking migration for auth.0005_alter_user_last_login_null
Faking migration for auth.0006_require_contenttypes_0002
Faking migration for sites.0001_initial
Faking migration for admin.0001_initial
Faking migration for sessions.0001_initial
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, jira_ac, nodestore, sentry, sessions, sites, social_auth
Running migrations:
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying sentry.0002_912_to_recent... OK
  Applying sentry.0003_auto_20191022_0122... OK
  Applying sentry.0004_bitfieldtestmodel_blankjsonfieldtestmodel_callabledefaultmodel_jsonfieldtestmodel_jsonfieldwithdefau... OK
  Applying sentry.0005_fix_content_types... OK
  Applying sentry.0006_sentryapp_date_published... OK
  Applying sentry.0007_auto_20191029_0131... OK
  Applying sentry.0008_auto_20191030_0016... OK
  Applying sentry.0009_auto_20191101_1608... OK
  Applying sentry.0010_auto_20191104_1641... OK
  Applying sentry.0011_remove_pagerdutyservice_service_id_from_state... OK
  Applying sentry.0012_remove_pagerdutyservice_service_id... OK
  Applying sentry.0013_auto_20191111_1829... OK
  Applying sentry.0014_delete_sentryappwebhookerror... OK
  Applying sentry.0015_delete_sentryappwebhookerror_db... OK
  Applying sentry.0016_delete_alert_rule_deprecated_fields... OK
  Applying sentry.0017_incident_aggregation... OK
  Applying sentry.0018_discoversavedquery_version... OK
  Applying sentry.0019_auto_20191114_2040... OK
  Applying sentry.0020_auto_20191125_1420... OK
  Applying sentry.0021_auto_20191203_1803... OK
  Applying sentry.0021_auto_20191202_1716... OK
  Applying sentry.0022_merge... OK
  Applying sentry.0023_hide_environment_none_20191126... OK
  Applying sentry.0024_auto_20191230_2052... Events to process: 291181
```
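For completeness, this is roughly how I run those commands inside the pods; the pod names are placeholders for whatever my kompose-generated manifests produced:

```bash
# <snuba-api-pod> / <sentry-web-pod> are placeholders for the actual pod names in my deployment.
kubectl exec -it <snuba-api-pod> -- snuba bootstrap --force
kubectl exec -it <snuba-api-pod> -- snuba migrations migrate
kubectl exec -it <sentry-web-pod> -- sentry upgrade --noinput
```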
During the upgrade I checked the output of the other pods:
clickhouse-server.log (just a small portion I cut out):

```
2020.10.12 06:28:29.472705 [ 49 ] {} SystemLog (system.trace_log): Flushing system log
2020.10.12 06:28:29.473670 [ 49 ] {} DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 764.17 GiB.
2020.10.12 06:28:29.684439 [ 49 ] {} system.trace_log: Renaming temporary part tmp_insert_202010_379_379_0 to 202010_379_379_0.
2020.10.12 06:28:29.812088 [ 87 ] {} default.sentry_local: Renaming temporary part tmp_merge_20191230-90_1006_1021_3 to 20191230-90_1006_1021_3.
2020.10.12 06:28:29.815816 [ 87 ] {} default.sentry_local (MergerMutator): Merged 6 parts: from 20191230-90_1006_1016_2 to 20191230-90_1021_1021_0
2020.10.12 06:28:29.965497 [ 73 ] {} HTTPHandler-factory: HTTP Request for HTTPHandler-factory. Method: POST, Address: 10.128.34.31:35084, User-Agent: none, Content Type: , Transfer Encoding: chunked
2020.10.12 06:28:29.965614 [ 73 ] {} HTTPHandler: Request URI: /?load_balancing=in_order&insert_distributed_sync=1&query=INSERT+INTO+default.sentry_local+FORMAT+JSONEachRow
2020.10.12 06:28:29.969915 [ 73 ] {f8922bc5-db6b-4ddd-a5dd-45ca123c51f4} executeQuery: (from 10.128.34.31:35084) INSERT INTO default.sentry_local FORMAT JSONEachRow
2020.10.12 06:28:29.970097 [ 73 ] {f8922bc5-db6b-4ddd-a5dd-45ca123c51f4} AccessRightsContext (default): Access granted: INSERT(event_id, project_id, group_id, timestamp, deleted, retention_days, platform, message, primary_hash, received, search_message, title, location, user_id, username, email, ip_address, geo_country_code, geo_region, geo_city, sdk_name, sdk_version, type, version, offset, partition, message_timestamp, os_build, os_kernel_version, device_name, device_brand, device_locale, device_uuid, device_model_id, device_arch, device_battery_level, device_orientation, device_simulator, device_online, device_charging, level, logger, server_name, transaction, environment, `sentry:release`, `sentry:dist`, `sentry:user`, site, url, app_device, device, device_family, runtime, runtime_name, browser, browser_name, os, os_name, os_rooted, `tags.key`, `tags.value`, _tags_flattened, `contexts.key`, `contexts.value`, http_method, http_referer, `exception_stacks.type`, `exception_stacks.value`, `exception_stacks.mechanism_type`, `exception_stacks.mechanism_handled`, `exception_frames.abs_path`, `exception_frames.filename`, `exception_frames.package`, `exception_frames.module`, `exception_frames.function`, `exception_frames.in_app`, `exception_frames.colno`, `exception_frames.lineno`, `exception_frames.stack_level`, culprit, sdk_integrations, `modules.name`, `modules.version`) ON default.sentry_local
2020.10.12 06:28:29.979567 [ 73 ] {f8922bc5-db6b-4ddd-a5dd-45ca123c51f4} DiskLocal: Reserving 1.14 MiB on disk `default`, having unreserved 764.17 GiB
```
snuba consumer:

```
2020-10-12 06:33:41,113 Completed processing <Batch: 26 messages, open for 2.64 seconds>.
2020-10-12 06:33:48,709 Completed processing <Batch: 22 messages, open for 7.59 seconds>.
2020-10-12 06:33:51,143 Completed processing <Batch: 55 messages, open for 2.43 seconds>.
2020-10-12 06:33:53,863 Completed processing <Batch: 19 messages, open for 2.72 seconds>.
2020-10-12 06:33:57,664 Completed processing <Batch: 26 messages, open for 3.79 seconds>.
2020-10-12 06:34:00,271 Completed processing <Batch: 26 messages, open for 2.51 seconds>.
2020-10-12 06:34:02,413 Completed processing <Batch: 16 messages, open for 2.14 seconds>.
2020-10-12 06:34:04,756 Completed processing <Batch: 22 messages, open for 2.34 seconds>.
2020-10-12 06:34:08,826 Completed processing <Batch: 21 messages, open for 4.06 seconds>.
2020-10-12 06:34:13,036 Completed processing <Batch: 32 messages, open for 4.21 seconds>.
2020-10-12 06:34:16,709 Completed processing <Batch: 36 messages, open for 3.67 seconds>.
2020-10-12 06:34:20,354 Completed processing <Batch: 35 messages, open for 3.64 seconds>.
2020-10-12 06:34:22,662 Completed processing <Batch: 28 messages, open for 2.31 seconds>.
```
ingest-consumer:

```
06:25:30 [INFO] batching-kafka-consumer: Flushing 1 items (from {(u'ingest-events', 0): [45L, 45L]}): forced:False size:False time:True
06:25:31 [INFO] batching-kafka-consumer: Worker flush took 95ms
06:25:53 [INFO] batching-kafka-consumer: Flushing 1 items (from {(u'ingest-events', 0): [46L, 46L]}): forced:False size:False time:True
06:25:53 [INFO] batching-kafka-consumer: Worker flush took 56ms
06:26:24 [INFO] batching-kafka-consumer: Flushing 1 items (from {(u'ingest-events', 0): [47L, 47L]}): forced:False size:False time:True
06:26:24 [INFO] batching-kafka-consumer: Worker flush took 10ms
06:27:23 [INFO] batching-kafka-consumer: Flushing 1 items (from {(u'ingest-events', 0): [48L, 48L]}): forced:False size:False time:True
06:27:23 [INFO] batching-kafka-consumer: Worker flush took 73ms
```
And this one looks suspicious to me because of the "missing key id" error message.

relay (switched to log level TRACE):
```
2020-10-12T06:26:23Z [relay_server::actors::project] DEBUG: project 99a1f406149c47a5b8fd97c82c95ea1d state request amended
2020-10-12T06:26:23Z [relay_server::actors::project] DEBUG: project state 99a1f406149c47a5b8fd97c82c95ea1d updated
2020-10-12T06:26:23Z [relay_server::actors::project] DEBUG: project 99a1f406149c47a5b8fd97c82c95ea1d state requested
2020-10-12T06:26:23Z [relay_server::actors::project] DEBUG: project state 99a1f406149c47a5b8fd97c82c95ea1d updated
2020-10-12T06:26:23Z [relay_server::actors::events] TRACE: processing json event
2020-10-12T06:26:23Z [relay_server::actors::events] ERROR: project state for key 99a1f406149c47a5b8fd97c82c95ea1d is missing key id
2020-10-12T06:26:23Z [relay_server::actors::events] TRACE: sending envelope to kafka
2020-10-12T06:26:23Z [relay_server::actors::store] TRACE: Sending event item of envelope to kafka
```
When the upgrade is finished (it takes about 4 hours), I see all my imported projects and issues. But when I try to open the details of an issue, I receive this error:
relay log:
```
2020-10-12T13:09:32Z [relay_server::actors::project] DEBUG: project 99a1f406149c47a5b8fd97c82c95ea1d state requested
2020-10-12T13:09:32Z [relay_server::actors::events] TRACE: queued event
2020-10-12T13:09:32Z [relay_server::actors::project] DEBUG: project 99a1f406149c47a5b8fd97c82c95ea1d state request amended
2020-10-12T13:09:32Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 1)
2020-10-12T13:09:32Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-10-12T13:09:32Z [relay_server::actors::project] DEBUG: project 99a1f406149c47a5b8fd97c82c95ea1d state request amended
2020-10-12T13:09:32Z [relay_server::actors::events] TRACE: queued event
2020-10-12T13:09:32Z [relay_server::actors::project] DEBUG: project 99a1f406149c47a5b8fd97c82c95ea1d state request amended
2020-10-12T13:09:32Z [relay_server::actors::project] DEBUG: project state 99a1f406149c47a5b8fd97c82c95ea1d updated
2020-10-12T13:09:32Z [relay_server::actors::events] DEBUG: dropped event: event submission rejected with reason: ProjectId
2020-10-12T13:09:32Z [relay_server::actors::outcome::processing] TRACE: handling outcome
2020-10-12T13:09:32Z [relay_server::actors::outcome::processing] TRACE: Tracking kafka outcome: TrackRawOutcome { timestamp: "2020-10-12T13:09:32.470154Z", org_id: None, project_id: ProjectId(1), key_id: None, outcome: 3, reason: Some("project_id"), event_id: Some(EventId(6adafc12-9df3-47aa-9ea5-8e8be7c35c3f)), remote_addr: Some("10.128.90.0"), source: None }
```
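In case it matters, I still want to verify whether the project's client keys (DSNs) actually survived the migration. Something like this against the Sentry API is what I have in mind; the base URL, org slug, project slug, and auth token are placeholders for my setup:

```bash
# Placeholder values; lists the client keys Sentry knows about for the affected project.
curl -s -H "Authorization: Bearer <auth-token>" \
  "https://sentry.example.com/api/0/projects/<org-slug>/<project-slug>/keys/"
```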
Any help would be appreciated…