Long queue processing time

We are trying to understand why events sometimes show up in the system only an hour after they occur at the source.
The first thing I found was that at those moments there is a large backlog in the counters-0 and events.preprocess_event queues. I tried increasing the number of workers from 6 to 36 (6 processes with a concurrency of 6 each), but it did not help: every day I see spikes of up to 20,000 messages in counters-0 and up to several thousand in events.preprocess_event.
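
For context, this is roughly the worker layout I mean; a sketch only, the exact `sentry run worker` flags may differ by version:

```bash
# Rough sketch of the current layout: 6 worker processes, each with a
# concurrency of 6 (36 worker slots in total). Flags follow the usual
# `sentry run worker` / Celery conventions and may vary by Sentry version.
for i in $(seq 1 6); do
  sentry run worker -c 6 &
done
wait
```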

To understand the problem, I enabled the internal metrics and forwarded them to Prometheus via StatsD and statsd-exporter, but they did not help me because I do not understand what each metric measures.
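
The metrics wiring looks roughly like this (an excerpt in the style of `sentry.conf.py`; the host/port are placeholders for wherever statsd-exporter is listening):

```python
# sentry.conf.py (excerpt) -- rough sketch of the StatsD metrics setup.
# SENTRY_METRICS_BACKEND / SENTRY_METRICS_OPTIONS are the standard settings
# for Sentry's internal metrics; host/port below are placeholders.
SENTRY_METRICS_BACKEND = "sentry.metrics.statsd.StatsdMetricsBackend"
SENTRY_METRICS_OPTIONS = {
    "host": "statsd-exporter",  # placeholder hostname
    "port": 9125,               # statsd-exporter's default StatsD UDP port
}
```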

First question: is there a description of the internal metrics anywhere?

Second question: can anyone advise how to figure out where the problem is?
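
So far the only thing I know to watch is the raw queue depth on the broker, roughly like this (a minimal sketch assuming RabbitMQ is the Celery broker; with Redis you would inspect the queue list lengths with `redis-cli llen` instead):

```bash
# Hypothetical diagnostic loop, assuming RabbitMQ as the Celery broker:
# print the depth and consumer count of the two problem queues every 10 s.
while true; do
  rabbitmqctl list_queues name messages consumers \
    | grep -E 'counters-0|events.preprocess_event'
  sleep 10
done
```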

I have had the same problem for a long time. I have already made several changes and it is still not solved: when events start coming in, they pile up and get delayed.

StepanKuksenko, did you manage to solve it?