No events/issues from 9.1.2 to 21.1.0 (with Helm)

Hello,

I recently had an issue with an on-premise Sentry 9.1.2 (set up through a CloudFormation template): our service died, leaving us with only a PostgreSQL dump.

I have a Kubernetes cluster at hand, so I tried the Helm charts, with a hook to import the DB (between db-check and db-init).

The users, projects and probably all settings have transferred well, but I don’t see any issues/events.

I tried to understand what went wrong from previous issues posted here (mainly #11387), but I’m struggling to find clues or errors that show what is wrong in my setup.

I see no problems in the db-init job:
02:13:55 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
02:13:58 [INFO] sentry.plugins.github: apps-not-configured
Operations to perform:
Apply all migrations: admin, auth, contenttypes, jira_ac, nodestore, sentry, sessions, sites, social_auth
Running migrations:
Applying sentry.0024_auto_20191230_2052…Events to process: 61355

Event migration done. Migrated 61355 of 61355 events.

 OK
  Applying sentry.0025_organizationaccessrequest_requester... OK
  Applying sentry.0026_delete_event... OK
  Applying sentry.0027_exporteddata... OK
  Applying sentry.0028_user_reports... OK
  Applying sentry.0029_discover_query_upgrade... OK
  Applying sentry.0030_auto_20200201_0039... OK
  Applying sentry.0031_delete_alert_rules_and_incidents... OK
* Unknown config option found: 'postprocess.use-cache-key'
User Options: 100% |############################################| Time: 0:00:00
  Applying sentry.0032_delete_alert_email... OK
  Applying sentry.0033_auto_20200210_2137... OK
  Applying sentry.0034_auto_20200210_2311... OK
  Applying sentry.0035_auto_20200127_1711... OK
  Applying sentry.0036_auto_20200213_0106... OK
  Applying sentry.0037_auto_20200213_0140... OK
  Applying sentry.0038_auto_20200213_1904... OK
  Applying sentry.0039_delete_incidentsuspectcommit... OK
  Applying sentry.0040_remove_incidentsuspectcommittable... OK
  Applying sentry.0041_incidenttrigger_date_modified... OK
  Applying sentry.0042_auto_20200214_1607... OK
  Applying sentry.0043_auto_20200218_1903... OK
  Applying sentry.0044_auto_20200219_0018... OK
  Applying sentry.0045_remove_incidentactivity_event_stats_snapshot... OK
  Applying sentry.0046_auto_20200221_1735... OK
  Applying sentry.0047_auto_20200224_2319... OK
  Applying sentry.0048_auto_20200302_1825... OK
  Applying sentry.0049_auto_20200304_0254... OK
  Applying sentry.0050_auto_20200306_2346... OK
Audit Log Entrys: 100% |########################################| Time: 0:00:00
  Applying sentry.0051_fix_auditlog_pickled_data... OK
  Applying sentry.0052_organizationonboardingtask_completion_seen... OK
  Applying sentry.0053_migrate_alert_task_onboarding... OK
  Applying sentry.0054_create_key_transaction... OK
  Applying sentry.0055_query_subscription_status... OK
  Applying sentry.0056_remove_old_functions... OK
  Applying sentry.0057_remove_unused_project_flag... OK
  Applying sentry.0058_project_issue_alerts_targeting... OK
  Applying sentry.0059_add_new_sentry_app_features... OK
  Applying sentry.0060_add_file_eventattachment_index... OK
  Applying sentry.0061_alertrule_partial_index... OK
  Applying sentry.0062_key_transactions_unique_with_owner... OK
  Applying sentry.0063_drop_alertrule_constraint... OK
  Applying sentry.0064_project_has_transactions... OK
  Applying sentry.0065_add_incident_status_method... OK
  Applying sentry.0066_alertrule_manager... OK
Organizations: 100% |###########################################| Time: 0:00:00
  Applying sentry.0067_migrate_rules_alert_targeting... OK
  Applying sentry.0068_project_default_flags... OK
  Applying sentry.0069_remove_tracked_superusers... OK
  Applying sentry.0070_incident_snapshot_support... OK
  Applying sentry.0071_add_default_fields_model_subclass... OK
  Applying sentry.0072_alert_rules_query_changes... OK
  Applying sentry.0073_migrate_alert_query_model... OK
  Applying sentry.0074_add_metric_alert_feature... OK
  Applying sentry.0075_metric_alerts_fix_releases... OK
  Applying sentry.0076_alert_rules_disable_constraints... OK
  Applying sentry.0077_alert_query_col_drop_state... OK
  Applying sentry.0078_incident_field_updates... OK
  Applying sentry.0079_incidents_remove_query_field_state... OK
  Applying sentry.0080_alert_rules_drop_unused_tables_cols... OK
  Applying sentry.0081_add_integraiton_upgrade_audit_log... OK
  Applying sentry.0082_alert_rules_threshold_float... OK
  Applying sentry.0083_add_max_length_webhook_url... OK
  Applying sentry.0084_exported_data_blobs... OK
  Applying sentry.0085_fix_error_rate_snuba_query... OK
  Applying sentry.0086_sentry_app_installation_for_provider... OK
  Applying sentry.0087_fix_time_series_data_type... OK
  Applying sentry.0088_rule_level_resolve_threshold_type... OK
  Applying sentry.0089_rule_level_fields_backfill... OK
Audit Log Entrys: 100% |########################################| Time: 0:00:00
  Applying sentry.0090_fix_auditlog_pickled_data_take_2... OK
  Applying sentry.0091_alertruleactivity... OK
  Applying sentry.0092_remove_trigger_threshold_type_nullable... OK
  Applying sentry.0093_make_identity_user_id_textfield... OK
  Applying sentry.0094_cleanup_unreferenced_event_files... OK
  Applying sentry.0095_ruleactivity... OK
  Applying sentry.0096_sentry_app_component_skip_load_on_open... OK
  Applying sentry.0097_add_sentry_app_id_to_sentry_alertruletriggeraction... OK
  Applying sentry.0098_add-performance-onboarding... OK
Projects: 100% |################################################| Time: 0:00:00
  Applying sentry.0099_fix_project_platforms... OK
  Applying sentry.0100_file_type_on_event_attachment... OK
  Applying sentry.0101_backfill_file_type_on_event_attachment... OK
  Applying sentry.0102_collect_relay_analytics... OK
  Applying sentry.0103_project_has_alert_filters... OK
  Applying sentry.0104_collect_relay_public_key_usage... OK
  Applying sentry.0105_remove_nullability_of_event_attachment_type... OK
  Applying sentry.0106_service_hook_project_id_nullable... OK
  Applying sentry.0107_remove_spaces_from_slugs... OK
  Applying sentry.0108_update_fileblob_action... OK
  Applying sentry.0109_sentry_app_creator... OK
  Applying sentry.0110_sentry_app_creator_backill... OK
  Applying sentry.0111_snuba_query_event_type... OK
  Applying sentry.0112_groupinboxmodel... OK
  Applying sentry.0113_add_repositoryprojectpathconfig... OK
  Applying sentry.0114_add_unhandled_savedsearch... OK
  Applying sentry.0115_add_checksum_to_debug_file... OK
Project Debug Files: 100% |#####################################| Time: 0:00:00
  Applying sentry.0116_backfill_debug_file_checksum... OK
  Applying sentry.0117_dummy-activityupdate... OK
  Applying sentry.0118_backfill_snuba_query_event_types... OK
  Applying sentry.0119_fix_set_none... OK
  Applying sentry.0120_commit_author_charfield... OK
GroupInbox: 100% |#                                            | ETA:  --:--:--
  Applying sentry.0121_obliterate_group_inbox... OK
  Applying sentry.0122_add_release_status... OK
  Applying sentry.0123_groupinbox_addprojandorg... OK
  Applying sentry.0124_add_release_status_model... OK
  Applying sentry.0125_add_platformexternalissue_project_id... OK
  Applying sentry.0126_make_platformexternalissue_group_id_flexfk... OK
  Applying sentry.0127_backfill_platformexternalissue_project_id... OK
  Applying sentry.0128_change_dashboards... OK
  Applying sentry.0129_remove_dashboard_keys... OK
  Applying sentry.0130_remove_old_widget_models... OK
  Applying sentry.0131_drop_widget_tables... OK
  Applying sentry.0132_groupownermodel... OK
  Applying sentry.0133_dashboard_delete_object_status... OK
  Applying sentry.0134_dashboard_drop_object_status_column... OK
  Applying sentry.0135_removinguniquegroupownerconstraint... OK
Organizations: 100% |###########################################| Time: 0:00:00
  Applying sentry.0136_issue_alert_filter_all_orgs... OK
  Applying sentry.0137_dashboard_widget_interval... OK
  Applying sentry.0138_widget_query_remove_interval... OK
  Applying sentry.0139_remove_widgetquery_interval... OK
  Applying sentry.0140_subscription_checker... OK
  Applying sentry.0141_remove_widget_constraints... OK
  Applying sentry.0142_add_dashboard_tombstone... OK
  Applying sentry.0143_add_alerts_integrationfeature... OK
  Applying sentry.0144_add_publish_request_inprogress_status... OK
  Applying sentry.0145_rename_alert_rule_feature... OK
Organizations: 100% |###########################################| Time: 0:00:00
  Applying sentry.0146_backfill_members_alert_write... OK
  Applying sentry.0147_add_groupinbox_date_added_index... OK
  Applying sentry.0148_group_id_bigint... OK
  Applying sites.0002_alter_domain_unique... OK
Creating missing DSNs
Correcting Group.num_comments counter
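
As a quick sanity check on a log like the one above, the event-migration summary line can be pulled out programmatically. This is only a minimal sketch: the function name `migration_summary` is hypothetical, and the pattern is assumed from the single "Migrated 61355 of 61355 events" line shown here, so other Sentry versions may word it differently.

```python
import re

# Pattern assumed from the summary line printed in the log above.
SUMMARY_RE = re.compile(r"Migrated (\d+) of (\d+) events")


def migration_summary(log_text):
    """Return (migrated, total) from a db-init log, or None if no summary line is found."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "Event migration done. Migrated 61355 of 61355 events."
result = migration_summary(sample)
if result is not None:
    migrated, total = result
    print(f"{migrated} of {total} events reported as migrated")
```

A matching pair of numbers only says that the migration reported every event as migrated; it says nothing about whether the events were later consumed downstream.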

I have seen reports of people getting results with “kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic events --from-beginning --max-messages 100”, and indeed I see this in the output:
[2,“insert”,{“group_id”:6255,“event_id”:“ee42f86a9947489082cd4cf76aa6e66f”,“organization_id”:1,“project_id”:8,“message”:“Unable to load the handpiece trajectory SAXBuilder.java build JDOMParseException Error on line 1 of document file:/C:/Users/DWOS/AppData/Local/DWIO/ioClientTmp/Patients/1-0-0-A78/1-0-0-1252/UPPER_ARCH.xml: El contenido no est\u00e1 permitido en el pr\u00f3logo. org.jdom.input.SAXBuilder in build”,“platform”:“java”,“datetime”:“2020-11-25T08:08:05.000000Z”,“data”:
[…]
,[“sentry:release”,“3.1.2.623”],[“sentry:user”,“username:DWIOC-10-001417”]],“extra”:{“Overclock settings”:“CPU Freq: 3.7GHz\nCPU VCore: 1.35V”,“Sentry-Threadname”:“FxProcessChain-5-17”,“TeamViewer ID”:“1381213686”,“Video adapter driver version”:“DriverVersion: 26.21.14.3086”},“sdk”:{“name”:“sentry-java”,“version”:“1.7.16-9b60b”,“integrations”:[“log4j”]},“errors”:[{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.0.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.1.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.2.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.3.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.14.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.15.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.29.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.38.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.39.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.40.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.41.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.42.lineno”,“value”:-1,“reason”:“expected an unsigned 
integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.43.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.44.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.45.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.46.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.47.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.48.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.49.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.50.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.0.stacktrace.frames.51.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.1.stacktrace.frames.0.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.1.stacktrace.frames.1.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.1.stacktrace.frames.2.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.1.stacktrace.frames.3.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.1.stacktrace.frames.14.lineno”,“value”:-1,“reason”:“expected an unsigned integer”},{“type”:“invalid_data”,“name”:“exception.values.1.stacktrace.frames.15.lineno”,“value”:-1,“reason”:“expected an unsigned 
integer”},{“type”:“invalid_data”,“name”:“exception.values.1.stacktrace.frames.29.lineno”,“value”:-1,“reason”:“expected an unsigned integer”}],“key_id”:“9”,“project”:8,“grouping_config”:{“id”:“legacy:2019-03-12”},“hashes”:[“07e9b53e57c1148fccbfde10c5b6fd94”],“location”:“SAXBuilder.java”,“metadata”:{“filename”:“SAXBuilder.java”,“function”:“build”,“type”:“JDOMParseException”,“value”:“Error on line 1 of document file:/C:/Users/DWOS/AppData/Local/DWIO/ioClientTmp/Patients/1-0-0-A78/1-0-0-1252/UPPER_ARCH.xml: El contenido no est\u00e1 permitido en el pr\u00f3logo.”},“title”:“JDOMParseException: Error on line 1 of document file:/C:/Users/DWOS/AppData/Local/DWIO/ioClientTmp/Patients/1-0-0-A78…”,“use_rust_normalize”:true},“primary_hash”:“07e9b53e57c1148fccbfde10c5b6fd94”,“retention_days”:90},{“is_new”:false,“is_regression”:false,“is_new_group_environment”:false,“skip_consume”:true}]
Processed a total of 100 messages

So I have the events somewhere, but I am not sure how to import them and why the migration did not transfer them.

Can someone point me in the right direction (a log or configuration to check)?

Sadly, I have run out of time on this issue and will move forward with the older events/issues lost. I tried to understand the flow in the migration files but could not make progress there.

The only thing that seemed suspicious to me was that the parameter SkipConsume was true in 0024_auto_20191230_2052.py (it also shows up as “skip_consume”:true in the Kafka message above), and I thought that could be why the events stayed in Kafka and did not go further.
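
To see whether that flag is set on every queued message or only on some, the console-consumer output can be tallied. The sketch below is an assumption-laden illustration, not Sentry code: the helper name `skip_consume_counts` is hypothetical, and the message layout (a four-element JSON array whose last element carries `skip_consume`) is inferred only from the sample message above. It expects the raw console output with plain ASCII quotes; the typographic quotes the forum rendered above would not parse as JSON.

```python
import json
from collections import Counter


def skip_consume_counts(lines):
    """Count skip_consume values across kafka-console-consumer output lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("["):
            continue  # ignore non-message lines such as "Processed a total of ..."
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            counts["unparseable"] += 1
            continue
        # Layout assumed from the sample: [version, operation, event, state]
        if len(message) < 4 or not isinstance(message[3], dict):
            counts["other"] += 1
            continue
        counts[str(message[3].get("skip_consume"))] += 1
    return counts


# Shortened stand-in for one message; the real payload is far larger.
sample = [
    '[2,"insert",{"event_id":"ee42f86a9947489082cd4cf76aa6e66f","project_id":8},'
    '{"is_new":false,"skip_consume":true}]',
    "Processed a total of 1 messages",
]
print(skip_consume_counts(sample))
```

In practice the consumer output would be saved to a file and passed in line by line. The tally only describes what is in the topic; what the flag means to downstream consumers is not something this sketch can answer.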

We had a fix that went into 21.2.0 but this seems irrelevant as you don’t seem to have hit that case.

I think you may just need to run docker-compose up -d and wait a while for the system to process all the queued events. Due to the async nature of the whole system, the event migration completing there doesn’t necessarily mean the events are all processed immediately (correct me if I’m wrong @lynnagara).

Yeah, agree with what @BYK says. The events can only be processed if Snuba’s events consumer is running. You can run it while the migration is taking place if you like. Otherwise it will have to catch up when it is started.
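
One way to tell whether a consumer is actually catching up is to look at its lag. Kafka ships a `kafka-consumer-groups.sh` tool whose `--describe --group <name>` output includes a LAG column per partition; the sketch below sums that column per topic. The function name `lag_by_topic`, the group name `example-group`, and the numbers in the sample are all made up for illustration, and the group name used by Snuba's consumer in a given deployment would need to be looked up.

```python
def lag_by_topic(describe_output):
    """Sum the LAG column per topic from `kafka-consumer-groups.sh --describe` style text."""
    header = None
    totals = {}
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "TOPIC" in fields and "LAG" in fields:
            header = fields  # column order differs between Kafka versions
            continue
        if header is None or len(fields) < len(header):
            continue  # skip anything that is not a full data row
        row = dict(zip(header, fields))
        if row["LAG"].isdigit():  # "-" means no committed offset yet
            totals[row["TOPIC"]] = totals.get(row["TOPIC"], 0) + int(row["LAG"])
    return totals


# Hypothetical output with made-up offsets.
sample = """
GROUP          TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
example-group  events  0          40              100             60   -            -     -
"""
print(lag_by_topic(sample))
```

Running the describe command twice a few minutes apart and comparing the totals would show whether the lag is shrinking (the consumer is working through the backlog) or static (nothing is consuming).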

Thanks for the input!

Indeed, I may not have waited long enough, but at one point I let it run overnight and the result was the same.

Now we have moved on and we are OK with the current state: the service is working fine and we just let go of the previous stack traces.
It just eroded somewhat the confidence we put in the service (or maybe rather in our ability to maintain it?).

One point though: it seems that for issues that are coming back, we have the original “First Seen”, so all is not lost.

Depending on your hardware and the number of events to migrate, several hours may not be enough :) That said, I have to agree that this is a bit unusual.

Newer versions of Sentry are unfortunately not as easy to maintain due to the increased number of services (they provide awesome new capabilities, but that doesn’t come for free).

That is not too surprising to me as a significant amount of event data is still stored on Postgres. Good to see that you still retained some historical data.

Is there anything we can help you with at this point?

That is true!
I’ve never seen such a big technological stack:

  • 3 DB systems (Postgres, ClickHouse, Redis)
  • 2 queuing systems (RabbitMQ, Kafka)
  • various other systems (a service discovery, an API engine, a search engine,…)

I don’t know whether it’s daunting, mesmerizing or simply a foretaste of what’s coming in cloud services, but it sure got me scratching my head for a couple of weeks.

Thanks, we are good now.

This roughly shows where each part fits in: https://develop.sentry.dev/architecture/

One reason is the gradual evolution of the architecture, which almost always ends up more complex than it would have been had it been designed from scratch. The second reason is the inevitable discrepancy between the self-hosted and cloud-hosted versions, as the requirements of the two are quite different (one is mostly single-organization and much smaller in scale, while the other needs to scale essentially infinitely on cloud infrastructure with an ever-increasing number of orgs).
