config.yml
SENTRY_WEB_OPTIONS = {
    "http": "%s:%s" % (SENTRY_WEB_HOST, SENTRY_WEB_PORT),
    "protocol": "uwsgi",
    # This is needed in order to prevent https://git.io/fj7Lw
    "uwsgi-socket": None,
    "so-keepalive": True,
    # Keep this between 15s-75s as that's what Relay supports
    "http-keepalive": 15,
    "http-chunked-input": True,
    # the number of web workers
    "workers": 3,
    "threads": 4,
    "memory-report": False,
    # Some stuff so uwsgi will cycle workers sensibly
    "max-requests": 100000,
    "max-requests-delta": 500,
    "max-worker-lifetime": 86400,
    # Duplicate options from sentry default just so we don't get
    # bit by sentry changing a default value that we depend on.
    "thunder-lock": True,
    "log-x-forwarded-for": False,
    "buffer-size": 32768,
    "limit-post": 209715200,
    "disable-logging": True,
    "reload-on-rss": 600,
    "ignore-sigpipe": True,
    "ignore-write-errors": True,
    "disable-write-exception": True,
}
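As a side note on the "web workers" and "cycle workers sensibly" comments above: here is a minimal sketch, assuming this dictionary lives in sentry/sentry.conf.py and is passed straight through as uWSGI options, of how the worker count could be derived from the host's CPU count rather than hard-coded. The sizing rule itself is purely illustrative, not an official Sentry recommendation.

```python
# Sketch only: size the uWSGI workers from the host's CPU count instead of
# hard-coding them. Assumes this runs in sentry/sentry.conf.py right after
# SENTRY_WEB_OPTIONS is defined; the numbers are illustrative.
import multiprocessing

cpu_count = multiprocessing.cpu_count()

SENTRY_WEB_OPTIONS.update(
    {
        # Leave headroom for ClickHouse, Kafka, and the Sentry workers that
        # share the same host in a single-node docker-compose setup.
        "workers": max(2, cpu_count // 2),
        "threads": 4,
        # Recycle workers a bit earlier if memory is tight
        # (reload-on-rss is measured in megabytes).
        "reload-on-rss": 400,
    }
)
```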
We are experiencing 100% CPU peaks, and it looks like all the CPU is being used by the workers and the ClickHouse server.
Has anyone had this issue? Any ideas how to resolve it?
Screenshots from htop and docker stats are attached:
Hey, from what I can see you have your worker container completely flooded with work.
(A tip: google for 'ctop docker'; I use it a lot to quickly inspect the logs and, if needed, the container itself. There is also a scripted alternative sketched right after this reply.)
I recently suffered a DDoS, and it was our own fault: one of the projects is an Electron app that does auto-updates, and 1600 Windows clients were crashing with an NsisUpdater problem.
That made Sentry go completely crazy; Kafka was crashing all the time, and only after a while did I start looking at the traffic coming in on the web container.
Hope you find the source of your problem.
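If you prefer a script over a TUI like ctop, here is a rough sketch (my own illustration, not part of ctop or Sentry) that uses the Python Docker SDK to flag which container is pegging the CPU during a peak:

```python
# Rough sketch: poll per-container CPU usage with the Python Docker SDK
# (pip install docker). Not part of Sentry or ctop; just an illustration.
import time
import docker

client = docker.from_env()

def cpu_percent(stats):
    """Compute CPU % from a one-shot `docker stats` sample."""
    cpu = stats["cpu_stats"]
    precpu = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - precpu.get("system_cpu_usage", 0)
    if system_delta <= 0:
        return 0.0
    online_cpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage", [1]))
    return cpu_delta / system_delta * online_cpus * 100.0

while True:
    for container in client.containers.list():
        pct = cpu_percent(container.stats(stream=False))
        if pct > 80:  # flag containers pegging the CPU during a peak
            print(f"{time.strftime('%H:%M:%S')} {container.name}: {pct:.0f}% CPU")
    time.sleep(5)
```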
Sorry if my response misled you into thinking that I'm from Sentry or that I'm a Sentry "guru". The ctop suggestion was just a tool tip. I hope someone with more knowledge than me can help you.
So you don't have anything showing up in the worker logs?
@digas To me it looks like it's somehow related to the Sentry config, e.g. the number of workers, requests, threads, buffer sizes, etc. Maybe @BYK will join our topic and give some clues…
These peaks happen roughly once every 10-20 minutes, last 10-15 seconds, and occur mostly when a lot of events are being processed…
PS: before we migrated to 20.10.1 we had a Sentry 9.1.2 deployment (on a server with 4 GB RAM + 2 CPUs) and there were no CPU overload issues at all (it was processing the same number of events as now).
I tried a lot of things: changing the nginx config, Linux sysctl settings, and the Sentry threads/workers, and limiting/reprioritizing ClickHouse and worker CPU usage, but nothing helped.
At the moment, as a workaround, we have just increased the vCPUs to 8 and are processing with this setup.
After we increased the vCPUs we still get those 100% CPU peaks, but they no longer freeze the whole server as they did with 4 vCPUs.
You should already have this if you are using 20.10.1, but it still looks a bit suspicious:
With https://github.com/getsentry/sentry/pull/20781 we dropped ClickHouse's memory usage quite a lot. In your logs, it still seems to be using 13 GB, which is quite a lot more than 30% of your total 16 GB available. Could you be running out of memory frequently, with the swap operations taking a lot of CPU cycles?
Also, yeah, 4 cores is not much considering how many processes are running on a single host.
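One way to check the swapping theory: the sketch below (assuming psutil is installed on the Docker host; it is not an official Sentry diagnostic) logs CPU, memory, and swap I/O together, so you can see whether the 100% CPU peaks line up with bursts of swapping:

```python
# Sketch: correlate CPU spikes with swap activity on the Docker host.
# Assumes `pip install psutil`; not an official Sentry diagnostic.
import time
import psutil

prev = psutil.swap_memory()
while True:
    cpu = psutil.cpu_percent(interval=5)  # average over the last 5 seconds
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    # Bytes swapped in/out since the previous sample.
    swapped = (swap.sin - prev.sin) + (swap.sout - prev.sout)
    prev = swap
    print(
        f"{time.strftime('%H:%M:%S')} "
        f"cpu={cpu:.0f}% mem={mem.percent:.0f}% swap={swap.percent:.0f}% "
        f"swap_io={swapped / 1024 / 1024:.1f} MiB in last interval"
    )
    # If swap_io jumps every time cpu hits 100%, memory pressure is the
    # likely culprit rather than the web workers themselves.
```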