Sentry worker dying

I am currently setting up Sentry in Kubernetes and trying to run the sentry run worker process in a pod with a memory limit of 3 GB and a CPU limit of 1 core (the relevant part of the pod spec is at the end of this post). This is the error message I am getting:

09:17:00 [ERROR] multiprocessing: Process 'Worker-77' pid:90 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-76' pid:89 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-75' pid:88 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-74' pid:87 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-73' pid:86 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-72' pid:85 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-71' pid:84 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-70' pid:83 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-69' pid:82 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-68' pid:81 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-67' pid:80 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-66' pid:79 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-65' pid:78 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-64' pid:77 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-63' pid:76 exited with 'signal 9 (SIGKILL)'
09:17:00 [ERROR] multiprocessing: Process 'Worker-62' pid:75 exited with 'signal 9 (SIGKILL)'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 374, in start
    return self.obj.start()
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 280, in start
    blueprint.start(self)
  File "/usr/local/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 884, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python2.7/site-packages/celery/worker/loops.py", line 48, in asynloop
    raise WorkerLostError('Could not start worker processes')
WorkerLostError: Could not start worker processes
09:17:00 [ERROR] celery.worker: Unrecoverable error: WorkerLostError('Could not start worker processes',)
09:17:06 [INFO] sentry.bgtasks: bgtask.stop (task_name=u'sentry.bgtasks.clean_dsymcache:clean_dsymcache')
09:17:06 [INFO] sentry.bgtasks: bgtask.stop (task_name=u'sentry.bgtasks.clean_releasefilecache:clean_releasefilecache')

Sentry version is 20.7.2.
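
For completeness, the limits on the worker container are set with a standard Kubernetes resources block, roughly like this (values as described above):

resources:
  limits:
    cpu: "1"
    memory: "3G"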

It looks like something external, such as the OOM killer, is terminating your workers with SIGKILL.

Yes, the OOM killer is killing the workers because they use 4 GB of RAM, which seems a tad excessive… but that shouldn't be the default state.

We did not set up the OOM killer, so we can't fix that "broken default state". It is not part of any Docker container; it is the kernel's OOM killer on the host. From a quick Google search, I think you want to pass --oom-kill-disable to docker run somehow.
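
For reference, docker run does expose such a flag; a minimal sketch only (it is normally combined with an explicit memory limit, and the image tag assumes the version mentioned above is published on Docker Hub):

docker run --memory=3g --oom-kill-disable getsentry/sentry:20.7.2 run worker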

Also, please stop replying to multiple old threads about your problem; it is sufficient to open one thread and wait a bit for a response.

The upper bound on memory was deliberately set because I have quite strict resource restrictions, and sadly I can't do anything about the OOM killer. What fascinates me is that, by default, the worker process takes so many resources and simply dies when limited. This process alone already has almost double the resources advised in the onpremise repo…

It seems that your Celery worker attempts to spawn a large number of subprocesses, or at least a number that is way too high for a machine with 3 GB of RAM. sentry run worker infers this process count from the number of CPUs it detects, so perhaps try something like sentry run worker -c 1.
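
In a Kubernetes Deployment, that could look roughly like the fragment below (assuming the image's default entrypoint is used, as in the onpremise docker-compose setup; the container name and image tag are illustrative, the relevant part is the explicit -c 1):

containers:
  - name: worker
    image: getsentry/sentry:20.7.2
    # cap Celery concurrency explicitly instead of relying on CPU detection
    args: ["run", "worker", "-c", "1"]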

We do say the minimum requirement is 2.4 GB, but I think this might have been measured on a single-core machine or something like that. If the above suggestion works, it might be worth revisiting that figure or adding a disclaimer that RAM usage depends on CPU count.


The above suggestion works perfectly fine! No crashes whatsoever over the weekend, and RAM usage is down to ~150 MB. Thanks a lot!