Hello. We run an in-house service on Sentry 9.1.2 (on-premise). The service handles 20 to 40 TPS on average, rising to around 60 TPS at peak.
Our infrastructure is as follows:
Nginx servers: 3 (8 CPUs each)
Web servers: 3 (8 CPUs each)
Worker servers: 5 (8 CPUs each)
DB server: 1 (20 CPUs)
During operation, error events appear on the dashboard only after a delay.
We checked the queues with the sentry queues list command and found that tasks were piling up in the preprocess_event queue.
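To tell which queue is actually falling behind, it can help to record queue sizes twice, a fixed interval apart, and compare them. The sketch below assumes the sizes have been copied by hand into dicts (for example from two runs of sentry queues list); parsing the command's real output is not shown, and the function name queue_growth and the example queue names and numbers are hypothetical.

```python
"""Estimate how fast each queue grows between two size snapshots (Python 3).

Assumption: queue sizes were recorded manually, e.g. from two runs of
`sentry queues list`; the names and numbers below are hypothetical.
"""
from typing import Dict


def queue_growth(before: Dict[str, int], after: Dict[str, int],
                 interval_s: float) -> Dict[str, float]:
    """Return tasks/second added to each queue between the two snapshots.

    Positive values mean consumers are falling behind on that queue;
    zero or negative values mean they keep up or are draining it.
    A queue missing from one snapshot is treated as size 0 there.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / interval_s
        for name in sorted(set(before) | set(after))
    }


if __name__ == "__main__":
    # Hypothetical snapshots taken 60 seconds apart (not real measurements).
    t0 = {"events.preprocess_event": 1200, "events.process_event": 10,
          "events.save_event": 5}
    t1 = {"events.preprocess_event": 2400, "events.process_event": 12,
          "events.save_event": 5}
    for name, rate in queue_growth(t0, t1, 60.0).items():
        print(f"{name}: {rate:+.1f} tasks/s")
```

A steadily positive rate on one queue, while the others stay near zero, points at the stage whose consumers cannot keep up.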
We raised worker concurrency from the default of 8 to 16 and to 32 so that each worker could process more tasks. However, the only effect was higher DB CPU usage; the backlog did not change significantly.
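Little's law offers one way to read this result. Assuming every worker slot stays busy, throughput is roughly concurrency divided by the mean time per task, so if quadrupling concurrency (8 to 32) leaves throughput unchanged, the average task must be taking about four times as long. That would be consistent with tasks waiting on a shared resource such as the DB, but it is only an inference, not a measurement. The function names and numbers below are hypothetical.

```python
"""Little's law arithmetic for a pool of worker slots (Python 3).

For a stable system, L = lambda * W: busy slots (L, at most the
concurrency) = throughput in tasks/s (lambda) * mean seconds per task (W).
"""
import math


def throughput(concurrency: int, mean_task_s: float) -> float:
    """Tasks/second that `concurrency` fully busy slots can complete."""
    return concurrency / mean_task_s


def implied_task_time(concurrency: int, observed_tps: float) -> float:
    """Mean seconds per task implied by fully busy slots at observed_tps."""
    return concurrency / observed_tps


def required_concurrency(arrival_tps: float, mean_task_s: float) -> int:
    """Smallest number of slots whose capacity meets arrival_tps."""
    return math.ceil(arrival_tps * mean_task_s)


if __name__ == "__main__":
    # Hypothetical: the same 40 tasks/s observed at concurrency 8 and 32.
    for c in (8, 32):
        print(f"concurrency={c}: implied {implied_task_time(c, 40.0):.2f} s/task")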
We also tried adding more physical worker servers, but that did not help either.
We even tried running workers on lower-spec 1-CPU servers, with the same result, so we are currently running 10 workers, each on a 1-CPU server.
While searching, we found that a worker can be run so that it consumes only specific queues, so we set up dedicated workers for the preprocess, process, and save queues.
After that, tasks began accumulating in the save queue instead.
Scaling out the save-queue workers made no difference.
Events reach the Sentry web servers and tasks keep piling up in the queues, so I suspect the workers are not processing them properly. How can I solve this problem?