Can’t get email alerts anymore

speech77 · November 25, 2020, 3:42pm

Hi there,

As the subject says, we no longer receive new email alerts since the Sentry host ran out of disk space.
The test email works correctly.

I’m still trying to figure out which container should log the error…

Please help

UPDATE: containers are all up & running

snuba-subscription-consumer-transactions_1 errors

snuba-subscription-consumer-transactions_1  | 2020-11-25 15:34:56,240 Could not construct valid time interval between MessageDetails(offset=1991, timestamp=datetime.datetime(2020, 11, 25, 15, 34, 54, 166000)) and Message(partition=Partition(topic=Topic(name='events'), index=0), offset=1992)!
snuba-subscription-consumer-transactions_1  | Traceback (most recent call last):
snuba-subscription-consumer-transactions_1  |   File "/usr/src/snuba/snuba/subscriptions/consumer.py", line 129, in poll
snuba-subscription-consumer-transactions_1  |     time_interval = Interval(previous_message.timestamp, message.timestamp)
snuba-subscription-consumer-transactions_1  |   File "<string>", line 5, in __init__
snuba-subscription-consumer-transactions_1  |   File "/usr/src/snuba/snuba/utils/types.py", line 67, in __post_init__
snuba-subscription-consumer-transactions_1  |     raise InvalidRangeError(self.lower, self.upper)
snuba-subscription-consumer-transactions_1  | snuba.utils.types.InvalidRangeError: (datetime.datetime(2020, 11, 25, 15, 34, 54, 166000), datetime.datetime(2020, 11, 25, 15, 34, 54, 165000))

snuba-subscription-consumer-events_1 errors

snuba-subscription-consumer-events_1        | 2020-11-25 15:43:05,698 Could not construct valid time interval between MessageDetails(offset=2296, timestamp=datetime.datetime(2020, 11, 25, 15, 43, 3, 303000)) and Message(partition=Partition(topic=Topic(name='events'), index=0), offset=2297)!
snuba-subscription-consumer-events_1        | Traceback (most recent call last):
snuba-subscription-consumer-events_1        |   File "/usr/src/snuba/snuba/subscriptions/consumer.py", line 129, in poll
snuba-subscription-consumer-events_1        |     time_interval = Interval(previous_message.timestamp, message.timestamp)
snuba-subscription-consumer-events_1        |   File "<string>", line 5, in __init__
snuba-subscription-consumer-events_1        |   File "/usr/src/snuba/snuba/utils/types.py", line 67, in __post_init__
snuba-subscription-consumer-events_1        |     raise InvalidRangeError(self.lower, self.upper)
snuba-subscription-consumer-events_1        | snuba.utils.types.InvalidRangeError: (datetime.datetime(2020, 11, 25, 15, 43, 3, 303000), datetime.datetime(2020, 11, 25, 15, 43, 3, 300000))
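For readers following the two tracebacks above: in both cases the second timestamp in the InvalidRangeError is a few milliseconds earlier than the first, i.e. the newer message carries an older timestamp than the previous one. The snippet below is only a minimal sketch of that kind of check, reconstructed from what the traceback shows (an `Interval` built from two timestamps and validated in `__post_init__`). It is not Snuba's actual source, and the helper `describe_gap` is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


class InvalidRangeError(ValueError):
    """Raised when the lower bound is later than the upper bound."""


@dataclass(frozen=True)
class Interval:
    # Assumed shape: the traceback only shows that lower/upper are validated.
    lower: datetime
    upper: datetime

    def __post_init__(self):
        if self.upper < self.lower:
            raise InvalidRangeError(self.lower, self.upper)


def describe_gap(previous_ts, current_ts):
    """Hypothetical helper: report how far the timestamps run backwards."""
    try:
        Interval(previous_ts, current_ts)
    except InvalidRangeError as exc:
        lower, upper = exc.args
        gap_ms = (lower - upper) / timedelta(milliseconds=1)
        return f"out of order by {gap_ms:g} ms"
    return "in order"


# Timestamps copied from the transactions consumer log above.
previous = datetime(2020, 11, 25, 15, 34, 54, 166000)
current = datetime(2020, 11, 25, 15, 34, 54, 165000)
print(describe_gap(previous, current))
```

Under this reading, the error says something about the order of message timestamps in the `events` topic rather than about email delivery itself.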

UPDATE2: the Kafka events and ingest-events topics keep growing. Is that normal?

3.0M	/var/lib/kafka/data/__consumer_offsets-0
4.0K	/var/lib/kafka/data/cdc-0
0		/var/lib/kafka/data/cleaner-offset-checkpoint
4.0K	/var/lib/kafka/data/errors-replacements-0
8.0K	/var/lib/kafka/data/event-replacements-0
436M	/var/lib/kafka/data/events-0
4.0K	/var/lib/kafka/data/events-subscription-results-0
328K	/var/lib/kafka/data/ingest-attachments-0
436M	/var/lib/kafka/data/ingest-events-0
8.0K	/var/lib/kafka/data/ingest-sessions-0
4.0K	/var/lib/kafka/data/ingest-transactions-0
4.0K	/var/lib/kafka/data/log-start-offset-checkpoint
4.0K	/var/lib/kafka/data/meta.properties
2.3M	/var/lib/kafka/data/outcomes-0
4.0K	/var/lib/kafka/data/recovery-point-offset-checkpoint
4.0K	/var/lib/kafka/data/replication-offset-checkpoint
396K	/var/lib/kafka/data/snuba-commit-log-0
4.0K	/var/lib/kafka/data/transactions-subscription-results-0
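To watch whether these directories keep growing over time, a listing like the one above can be approximated with a small script. This is a rough sketch, assuming it runs somewhere the Kafka data directory is readable (for example inside the Kafka container); the function names are made up for illustration. It sums file sizes, so the numbers will differ slightly from `du`, which reports disk blocks.

```python
import os


def directory_sizes(root):
    """Map each immediate subdirectory of root to its total file size in bytes."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # skip checkpoint files and meta.properties
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # a log segment may be deleted while we scan
        sizes[entry.name] = total
    return sizes


def largest(sizes, n=5):
    """Return the n biggest (name, bytes) pairs, largest first."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    data_dir = "/var/lib/kafka/data"  # path taken from the listing above
    if os.path.isdir(data_dir):
        for name, size in largest(directory_sizes(data_dir)):
            print(f"{size / 1024 / 1024:8.1f} MiB  {name}")
```

Running it a few minutes apart and comparing the output shows which topic partitions are still growing.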

chadwhitacre · November 25, 2020, 9:14pm

Sorry to be dense … are you expecting an email telling you that you’ve run out of disk space? Or an email alert about something else, which disk space seems to prevent?

BYK · November 26, 2020, 6:00am

No, I think they are referring to the notification emails. Emails not being sent, together with the growing Kafka topics, points to stuck or broken workers. I’d check the workers’ logs.

speech77 · November 26, 2020, 3:59pm

Hi guys,

thanks a lot for your answers.

@chadwhitacre an issue about that in the ‘internal’ project would be great

@BYK yes! you are right!
When the host ran out of disk space, I freed some space and restarted the cluster.
Notification emails still did not start working again.

I had to fix it by running rm on all volumes except:

sentry-data
sentry-postgres
sentry-clickhouse

Is this the right approach in this scenario?
Could you please suggest some hints or integrity checks we can run to make sure the cluster is healthy?

BYK · November 27, 2020, 11:17am

PRs accepted

Not ideal, as I think you just removed all in-flight events (which may be okay to lose, but we’d rather avoid this). Assuming it was Kafka that was having issues, I think the correct recovery would be to follow these instructions: Post Process Forwarder - KafkaError "Offset Out of Range" · Issue #478 · getsentry/self-hosted · GitHub (official docs on this coming soon).

If you can find your logs from that time (especially for the worker service), we can make more educated guesses.

You can try sending a test event and make sure it shows up. If you already have a project set up for this you can try that or do what we do here in onpremise tests: https://github.com/getsentry/onpremise/blob/19f4561a9e2abe32dc5eb5a03a332b50f2265b4b/test.sh
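As a rough illustration of the "send a test event" check, here is a standard-library sketch that posts a minimal event to a project's store endpoint. Several things here are assumptions rather than anything taken from the linked script: the DSN is a placeholder you would copy from your own project settings, the client name `smoke-test/0.1` is made up, and the use of the `/api/<project_id>/store/` endpoint with an `X-Sentry-Auth` header reflects how Sentry clients of that era submitted events, so check it against your version before relying on it.

```python
import json
import time
import uuid
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def build_store_request(dsn, message):
    """Return (url, headers, body) for a minimal test event."""
    parts = urlsplit(dsn)
    # Assumed DSN shape: {scheme}://{public_key}@{host}[:port]/[prefix/]{project_id}
    prefix, _, project_id = parts.path.rpartition("/")
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    url = f"{parts.scheme}://{host}{prefix}/api/{project_id}/store/"
    headers = {
        "Content-Type": "application/json",
        "X-Sentry-Auth": (
            "Sentry sentry_version=7, "
            "sentry_client=smoke-test/0.1, "  # hypothetical client name
            f"sentry_key={parts.username}"
        ),
    }
    event = {
        "event_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "platform": "python",
        "level": "error",
        "message": message,
    }
    return url, headers, json.dumps(event).encode("utf-8")


def send_test_event(dsn, message="smoke test after recovery"):
    """POST the event and return the HTTP status code."""
    url, headers, body = build_store_request(dsn, message)
    with urlopen(Request(url, data=body, headers=headers), timeout=10) as resp:
        return resp.status
```

Calling `send_test_event("http://<public_key>@<your-host>/<project_id>")` with your real DSN and then checking that the event appears in the project (and that the alert email arrives) exercises the whole pipeline.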

Hope these help!

speech77 · November 27, 2020, 12:13pm

Thanks a lot!

system · December 12, 2020, 12:13pm

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.
