How to clear backlog and monitor it

amit1 · August 11, 2020, 5:22pm

Seeing this message “Background workers haven’t checked in recently. It seems that you have a backlog of 200 tasks. Either your workers aren’t running or you need more capacity.”

How can I check that tasks are getting processed, and monitor them? sentry-cli returns an HTTP 200, but the UI does not process the event.

BYK · August 11, 2020, 6:11pm

Can you share your worker logs?

amit1 · August 11, 2020, 6:42pm

I removed the sentry-kafka and sentry-zookeeper volumes and reran ./install.sh

During the install and docker-compose up I get:
snuba-outcomes-consumer_1 | %3|1597171318.012|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.9:9092 failed: Connection refused (after 59ms in state CONNECT)
snuba-outcomes-consumer_1 | %3|1597171318.012|FAIL|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.9:9092 failed: Connection refused (after 48ms in state CONNECT)

amit1 · August 11, 2020, 7:50pm

Here are my worker logs, but it seems like the problem is with the Kafka connections.

sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
/usr/local/lib/python2.7/site-packages/cryptography/__init__.py:39: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  CryptographyDeprecationWarning,
18:42:15 [INFO] sentry.plugins.github: apps-not-configured
18:42:16 [INFO] sentry.bgtasks: bgtask.spawn (task_name=u'sentry.bgtasks.clean_dsymcache:clean_dsymcache')
18:42:16 [INFO] sentry.bgtasks: bgtask.spawn (task_name=u'sentry.bgtasks.clean_releasefilecache:clean_releasefilecache')

-------------- celery@9e74d72ecd16 v4.1.1 (latentcall)
---- **** -----
--- * *** * -- Linux-3.10.0-957.10.1.el7.x86_64-x86_64-with-debian-10.1 2020-08-11 18:42:20
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: sentry:0x7fc2a527acd0
- ** ---------- .> transport: redis://redis:6379/0
- ** ---------- .> results: disabled://
- *** --- * --- .> concurrency: 2 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> activity.notify exchange=(direct) key=activity.notify
.> alerts exchange=(direct) key=alerts
.> app_platform exchange=(direct) key=app_platform
.> assemble exchange=(direct) key=assemble
.> auth exchange=(direct) key=auth
.> buffers.process_pending exchange=(direct) key=buffers.process_pending
.> cleanup exchange=(direct) key=cleanup
.> commits exchange=(direct) key=commits
.> counters-0 exchange=counters(direct) key=default
.> data_export exchange=(direct) key=data_export
.> default exchange=(direct) key=default
.> digests.delivery exchange=(direct) key=digests.delivery
.> digests.scheduling exchange=(direct) key=digests.scheduling
.> email exchange=(direct) key=email
.> events.preprocess_event exchange=(direct) key=events.preprocess_event
.> events.process_event exchange=(direct) key=events.process_event
.> events.reprocess_events exchange=(direct) key=events.reprocess_events
.> events.reprocessing.preprocess_event exchange=(direct) key=events.reprocessing.preprocess_event
.> events.reprocessing.process_event exchange=(direct) key=events.reprocessing.process_event
.> events.reprocessing.symbolicate_event exchange=(direct) key=events.reprocessing.symbolicate_event
.> events.save_event exchange=(direct) key=events.save_event
.> events.symbolicate_event exchange=(direct) key=events.symbolicate_event
.> files.delete exchange=(direct) key=files.delete
.> incident_snapshots exchange=(direct) key=incident_snapshots
.> incidents exchange=(direct) key=incidents
.> integrations exchange=(direct) key=integrations
.> merge exchange=(direct) key=merge
.> options exchange=(direct) key=options
.> relay_config exchange=(direct) key=relay_config
.> reports.deliver exchange=(direct) key=reports.deliver
.> reports.prepare exchange=(direct) key=reports.prepare
.> search exchange=(direct) key=search
.> sleep exchange=(direct) key=sleep
.> stats exchange=(direct) key=stats
.> subscriptions exchange=(direct) key=subscriptions
.> triggers-0 exchange=triggers(direct) key=default
.> unmerge exchange=(direct) key=unmerge
.> update exchange=(direct) key=update

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1047, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 344, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 721, in _brpop_read
    **options)
  File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 403, in read_response
    (e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
18:47:28 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 7 unacknowledged message(s)

worker: Warm shutdown (MainProcess)
19:43:04 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
/usr/local/lib/python2.7/site-packages/cryptography/__init__.py:39: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
CryptographyDeprecationWarning,

amit1 · August 12, 2020, 12:24am

@BYK, the issue on the worker node still exists. The error I am getting on the worker is below.

I checked networking from the worker to Redis, and that looks good, so I am not sure what is wrong. In the UI, sample event creation works, but sending an event to a DSN externally does not.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 316, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 592, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python2.7/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 1047, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 344, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/redis.py", line 721, in _brpop_read
    **options)
  File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 403, in read_response
    (e.args,))
ConnectionError: Error while reading from socket: ('Connection closed by server.',)
00:06:55 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Restoring 7 unacknowledged message(s)

BYK · August 12, 2020, 9:19am

Maybe your Redis port or credentials are not set correctly?

amit1 · August 13, 2020, 4:16am

Running nmap from the worker to Redis shows a good connection.
I am not sure about any credentials on Redis. What might it be? I retried the install and still have the same issue.

nmap -p 6379 redis
Starting Nmap 7.70 ( https://nmap.org ) at 2020-08-13 04:14 UTC
Nmap scan report for redis (172.22.0.8)
Host is up (0.000085s latency).
rDNS record for 172.22.0.8: sentry_onpremise_redis_1.sentry_onpremise_default

Redis configs

In sentry.conf.py:
SENTRY_OPTIONS["redis.clusters"] = {
    "default": {
        "hosts": {0: {"host": "redis", "password": "", "port": "6379", "db": "0"}}
    }
}
In docker-compose.yml:
  redis:
    << : *restart_policy
    image: 'redis:5.0-alpine'
    volumes:
      - 'sentry-redis:/data'
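A note on the nmap check above: an open port only shows that something accepts TCP connections on 6379. It does not show that Redis answers commands on that connection. A minimal sketch of an application-level check, assuming an unauthenticated Redis reachable as redis:6379 (as in the config above). It sends Redis's inline PING command over a plain socket, so it needs only the Python standard library. The function name redis_ping is hypothetical and not part of Sentry.

```python
import socket


def redis_ping(host="redis", port=6379, timeout=3.0):
    """Return True if the server answers an inline PING with +PONG.

    Assumption: no password is set. A password-protected server is
    expected to reply with an error line (starting with "-"), which
    this function reports as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, timeouts, resets and DNS failures.
        return False
    return reply.startswith(b"+PONG")
```

Running this from inside the worker container (for example in a Python shell) tests the same network path the worker uses. A False result with an open port would point at Redis itself rather than at networking.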

BYK · August 21, 2020, 8:33am

Are you using the on-premise repo without any modifications or do you have a custom setup? If you have some customizations, can you make sure you mount the sentry config volume to worker service too and the worker image and sentry images are at the same version?

amit1 · August 23, 2020, 11:35pm

@BYK, I am using the on-premise repo directly. No custom setup.
How do I check the version of the worker image and the sentry images?

Based on the on-premise docker-compose, there is no volume mounted for the worker service:
  worker:
    << : *sentry_defaults
    command: run worker

Our production setup ingests 100-200 tasks every hour. I stood the application up from scratch using the new repo. It works initially, and then as the traffic increases, the workers stop processing tasks. Now there are about 20K messages waiting to be processed.

I wonder whether the Kafka configuration is not optimized for a production setup. My issue is similar to the topic "Sentry stops processing events after upgrade 10.0 => 20.8.0.dev0ba2aa70".

BYK · August 24, 2020, 8:44am

None of the on-premise setup is optimized for heavy use as you can guess from our use of docker-compose and having everything on a single node.

The run worker command has some options for production optimization and fine tuning:

https://github.com/getsentry/sentry/blob/0e23483769fe39b6f53f95ed0d8e7af976847a1d/src/sentry/runner/commands/run.py#L144-L164

@click.option(
    "--queues",
    "-Q",
    type=QueueSet,
    help=(
        "List of queues to enable for this worker, separated by "
        "comma. By default all configured queues are enabled. "
        "Example: -Q video,image"
    ),
)
@click.option("--exclude-queues", "-X", type=QueueSet)
@click.option(
    "--concurrency",
    "-c",
    default=cpu_count(),
    help=(
        "Number of child processes processing the queue. The "
        "default is the number of CPUs available on your "
        "system."
    ),
)

You may want to leverage those, for example by running multiple workers dedicated to specific queues.

amit1 · August 24, 2020, 9:01pm

@BYK, are you suggesting something like this in the docker-compose?
  worker:
    << : *sentry_defaults
    command: run worker -c -Q ingest-consumer, snuba-consumers,…
Basically, concurrent worker processes for each topic.

BYK · August 25, 2020, 6:12pm

Similar: having multiple separate workers (such as worker-1, worker-2 etc) dedicated to specific queues. I think you can see which queues get the highest load from somewhere and you can have dedicated workers for those queues only.

amit1 · August 25, 2020, 6:50pm

@BYK, I looked up the queues using the command below. Are the workers processing messages in each of these topics?
kafka-topics --list --zookeeper zookeeper:2181
__consumer_offsets
cdc
errors-replacements
event-replacements
events
ingest-attachments
ingest-events
ingest-sessions
ingest-transactions
outcomes
snuba-commit-log

I am not sure how to see which queues have the highest load. Any pointers you can provide will help.

BYK · August 26, 2020, 6:37pm

@amit1 - oh, those are Kafka queues, which are not used by workers. Worker queues are in Redis. I think these are all the queues we have:

https://github.com/getsentry/sentry/blob/fc218f38f5276e0e2298ee711d876a9b2c573028/src/sentry/conf/server.py#L543-L584

CELERY_QUEUES = [
    Queue("activity.notify", routing_key="activity.notify"),
    Queue("alerts", routing_key="alerts"),
    Queue("app_platform", routing_key="app_platform"),
    Queue("auth", routing_key="auth"),
    Queue("assemble", routing_key="assemble"),
    Queue("buffers.process_pending", routing_key="buffers.process_pending"),
    Queue("commits", routing_key="commits"),
    Queue("cleanup", routing_key="cleanup"),
    Queue("data_export", routing_key="data_export"),
    Queue("default", routing_key="default"),
    Queue("digests.delivery", routing_key="digests.delivery"),
    Queue("digests.scheduling", routing_key="digests.scheduling"),
    Queue("email", routing_key="email"),
    Queue("events.preprocess_event", routing_key="events.preprocess_event"),
    Queue(
        "events.reprocessing.preprocess_event", routing_key="events.reprocessing.preprocess_event"
    ),
    Queue("events.symbolicate_event", routing_key="events.symbolicate_event"),
    Queue(

This file has been truncated.
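On the earlier question of how to see which queues carry the highest load: with the Redis transport (redis://redis:6379/0 in the worker log above), Celery keeps each queue's pending messages in a Redis list named after the queue, so the list length approximates the backlog. A minimal sketch, assuming default task priorities (other priorities use separately named lists) and a client object exposing llen, such as redis.Redis from the redis-py package. queue_depths is a hypothetical helper, not part of Sentry, and messages already reserved by a worker are not counted.

```python
def queue_depths(client, queue_names):
    """Return (queue, pending) pairs, busiest queue first.

    `client` is any object with an llen(name) method, e.g. redis.Redis.
    """
    depths = [(name, int(client.llen(name))) for name in queue_names]
    # Sort by backlog size, largest first; ties keep the input order.
    return sorted(depths, key=lambda pair: pair[1], reverse=True)


# Example with redis-py (assumed installed; host, port and db as in the thread):
#   import redis
#   client = redis.Redis(host="redis", port=6379, db=0)
#   for name, pending in queue_depths(client, ["events.save_event", "default"]):
#       print(f"{pending:>8}  {name}")
```

Sampling this every few minutes shows whether a backlog is draining or growing, and which queues might justify a dedicated worker.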

amit1 · September 8, 2020, 4:48pm

@BYK, I was able to fix this issue by adding additional workers, as below.

I have a 4 CPU, 8 GB RAM setup and see very high resource utilization. Is this normal? Is there any way to optimize resource allocation? The Sentry app uses about 7.2 GB of the 8 GB of memory allocated.

  worker1:
    << : *sentry_defaults
    command: run worker -Q events.process_event
  worker2:
    << : *sentry_defaults
    command: run worker -Q events.reprocessing.process_event
  worker3:
    << : *sentry_defaults
    command: run worker -Q events.reprocess_events
  worker4:
    << : *sentry_defaults
    command: run worker -Q events.save_event
  worker5:
    << : *sentry_defaults
    command: run worker -Q subscriptions
  worker:
    << : *sentry_defaults
    command: run worker

BYK · September 8, 2020, 5:30pm

Might be related to ClickHouse. I strongly recommend keeping an eye on getsentry/self-hosted pull request #662, "feat(clickhouse): Reduce max memory usage to 30% of RAM".

amit1 · September 8, 2020, 6:46pm

@BYK, I see the PR has been merged. Do you suggest reinstalling the app to pull down the new images?

BYK · September 8, 2020, 7:05pm

Responded on the PR (no need to double post):

@amitseth7 this should resolve the high memory usage of Clickhouse, yes. To be able to use this fix, you don’t need a re-install with new images.

You can just apply this PR manually and run docker-compose restart clickhouse. Make sure to check the log output to see sentry.xml getting applied in the init phase of ClickHouse and you should be good.

amit1 · September 8, 2020, 7:17pm

My apologies.
I will reapply the PR.

vtlzh · December 7, 2020, 5:05am

I have a question.
Will the last worker figure out on its own that the other queues are handled by their own workers, or must I use the --exclude-queues (-X) option for it?

  worker:
    << : *sentry_defaults
    command: run worker -X events.process_event,events.reprocessing.process_event,events.reprocess_events,events.save_event,subscriptions
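On the question above: per the --queues help text quoted earlier in the thread, a worker started without -Q enables all configured queues, so a catch-all worker would presumably still consume from the dedicated queues unless they are excluded with -X. A minimal sketch that derives both kinds of command from one list so they cannot drift apart. worker_commands and the service names are hypothetical; only the run worker, -Q and -X spellings come from the thread.

```python
def worker_commands(dedicated_queues):
    """Map hypothetical service names to `run worker` command strings.

    One dedicated worker per queue, plus a catch-all worker that
    excludes every dedicated queue via -X.
    """
    commands = {}
    for index, queue in enumerate(dedicated_queues, start=1):
        commands[f"worker{index}"] = f"run worker -Q {queue}"
    catch_all = "run worker"
    if dedicated_queues:
        catch_all += " -X " + ",".join(dedicated_queues)
    commands["worker"] = catch_all
    return commands


for service, command in worker_commands(
    ["events.process_event", "events.save_event", "subscriptions"]
).items():
    print(f"{service}: {command}")
```

Each resulting string would go into the command: field of the matching service in docker-compose.yml.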
