CPU usage peaks at 100%

Hello, all @BYK

We have an on-premise Sentry installation, version 20.10.1 (https://github.com/getsentry/onpremise).
Hardware:
CPU(s): 4
RAM: 16GB
Ubuntu 20.04.1 LTS

config.yml
SENTRY_WEB_OPTIONS = {
    "http": "%s:%s" % (SENTRY_WEB_HOST, SENTRY_WEB_PORT),
    "protocol": "uwsgi",
    # This is needed in order to prevent https://git.io/fj7Lw
    "uwsgi-socket": None,
    "so-keepalive": True,
    # Keep this between 15s-75s as that's what Relay supports
    "http-keepalive": 15,
    "http-chunked-input": True,
    # the number of web workers
    "workers": 3,
    "threads": 4,
    "memory-report": False,
    # Some stuff so uwsgi will cycle workers sensibly
    "max-requests": 100000,
    "max-requests-delta": 500,
    "max-worker-lifetime": 86400,
    # Duplicate options from sentry default just so we don't get
    # bit by sentry changing a default value that we depend on.
    "thunder-lock": True,
    "log-x-forwarded-for": False,
    "buffer-size": 32768,
    "limit-post": 209715200,
    "disable-logging": True,
    "reload-on-rss": 600,
    "ignore-sigpipe": True,
    "ignore-write-errors": True,
    "disable-write-exception": True,
}

We are experiencing CPU peaks of 100%, and it looks like all of the CPU is used by the workers and the ClickHouse server.
Has anyone had this issue? Any ideas how to resolve it?
Screenshots from htop and docker stats are attached:


Hey, from what I can see, your worker container is completely flooded with work.
(A tip: google ‘ctop docker’. I use it a lot to quickly inspect the logs and look inside a container if needed.)
I recently suffered a DDoS, and it was our own fault: one of our projects is an Electron app that is supposed to do updates, and 1600 Windows clients were crashing with an NsisUpdater problem.
That made Sentry go completely crazy. Kafka was crashing all the time, and only after a while did I start looking at the traffic coming in on the web container.
I hope you find the source of your problem.

Hey @digas, and thanks for the quick reply and the help…

I installed ctop, but I see pretty much the same info as from docker stats/docker logs (CPU, RAM, IOPS, network); there are not many additional clues…


Sorry if my response misled you into thinking that I’m from Sentry or that I’m a “guru” on Sentry. ctop was just a tip for a tool. I hope someone with more knowledge than me can help you.
So you don’t have anything coming up in the worker logs?

@digas, to me it looks like this is somehow related to the Sentry config, such as the number of workers, requests, threads, buffer sizes, etc. Maybe @BYK will join our topic and give some clues…
These peaks happen about once every 10-20 minutes, last 10-15 seconds, and mostly occur when a lot of events are being processed…
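
Since the peaks are short and infrequent, one way to catch which container is responsible is to sample `docker stats` on a timer and log only the busy moments. The following is a minimal sketch, not a tested monitoring setup: the function names, the 5-second interval and the 100% threshold are illustrative assumptions, and it assumes the `docker` CLI is available on the host. Note that `docker stats` reports CPU per core, so 100% means one full core.

```python
import subprocess
import time

# Go-template placeholders understood by `docker stats --format`.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"


def parse_stats(output):
    """Parse tab-separated `docker stats` output into (name, cpu_percent, mem_usage) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, cpu, mem = parts
        try:
            cpu_value = float(cpu.strip().rstrip("%"))
        except ValueError:
            continue  # CPU column was not a number, e.g. "--"
        rows.append((name.strip(), cpu_value, mem.strip()))
    return rows


def top_consumers(rows, threshold=100.0):
    """Return the rows at or above `threshold` percent CPU, busiest first."""
    busy = [row for row in rows if row[1] >= threshold]
    return sorted(busy, key=lambda row: row[1], reverse=True)


def sample_once():
    """Take one snapshot of all running containers (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(result.stdout)


def watch(interval=5.0, threshold=100.0):
    """Print a timestamped line for every container above the threshold, forever."""
    while True:
        for name, cpu, mem in top_consumers(sample_once(), threshold):
            print(time.strftime("%H:%M:%S"), name, f"{cpu:.1f}%", mem)
        time.sleep(interval)
```

Running `watch()` in a terminal for an hour or so and lining up the timestamps with the worker logs might show whether the spikes start in the workers or in ClickHouse.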

PS: Before we migrated to 20.10.1 we had a Sentry 9.12 deployment (on a server with 4 GB RAM and 2 CPUs) and there were no issues with CPU overload at all (it was processing the same number of events as now).

Kristaps

@BYK any clues on this?

Kristaps

Hey @BYK, can you assist on this?

Kristaps

@lvdombrkr can you please stop spamming me and the topic?

@BYK yes, sure sorry for this :slight_smile:

I have the same problem as described in this topic. Has anyone solved it?

I’ve tried upgrading to the newest version with ./install.sh, pulling changes from master, but it didn’t help.

Hey @Krzysieqq,

I tried a lot of things: changing the nginx config and the Linux sysctl config, changing the Sentry threads/workers, and limiting or changing the priority of CPU usage for ClickHouse and the workers, but nothing helped.
At the moment, as a workaround, we have just increased the vCPUs to 8 and are processing with this setup.
After the increase we still have those 100% CPU peaks, but they no longer freeze the whole server as they did with 4 vCPUs.

Kristaps

You should already have this if you are using 20.10.1, but it still looks a bit suspicious:

With https://github.com/getsentry/sentry/pull/20781 we reduced ClickHouse’s memory usage quite a lot. In your logs, it still seems to be using 13 GB, which is quite a lot more than 30% of your 16 GB total. Could you be running out of memory frequently, with the swap operations taking a lot of CPU cycles?
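
To test the swap theory, a rough check is to compare two snapshots of the kernel's swap counters taken a few seconds apart, ideally during a peak. The sketch below assumes a Linux host where `/proc/vmstat` exposes `pswpin` and `pswpout`. The helper names and the 10-second interval are illustrative choices, and the 30% share is simply the figure mentioned above (30% of 16 GB is 4.8 GB, against the 13 GB observed).

```python
import time


def parse_counters(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages_per_second(before, after, interval):
    """Pages swapped in and out per second between two counter snapshots."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in / interval, swapped_out / interval


def exceeds_share(used_gb, total_gb, share=0.30):
    """True if `used_gb` is more than `share` of `total_gb`."""
    return used_gb > total_gb * share


def read_vmstat(path="/proc/vmstat"):
    """Read the current kernel counters (Linux only)."""
    with open(path) as handle:
        return parse_counters(handle.read())


def measure(interval=10.0):
    """Return (swap-in, swap-out) pages per second over `interval` seconds."""
    before = read_vmstat()
    time.sleep(interval)
    after = read_vmstat()
    return swap_pages_per_second(before, after, interval)
```

If `measure()` stays near zero during a CPU peak, swapping is probably not the cause; sustained non-zero rates would point toward memory pressure.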

Also, yeah, 4 cores is not much considering how many processes are running on a single host.