Is the Kafka key necessary?


When a project reports a large number of errors, all of its data lands on the same partition because project_id is used as the Kafka message key, which causes the ClickHouse consumer to lag.
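For context, here is a minimal sketch of why a keyed producer pins all of a project's traffic to one partition. The hash below is only a stand-in, not Kafka's actual murmur2 partitioner, and the partition count of 8 is made up:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition, the way Kafka's default
    partitioner does (hash of the key modulo the partition count).
    md5 here is only a stand-in for Kafka's murmur2 hash."""
    digest = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return digest % num_partitions

# Every event keyed by project_id 42 maps to the same partition,
# no matter how many events that project produces:
partitions = {partition_for(b"42", 8) for _ in range(1000)}
assert len(partitions) == 1
```

So a burst from one project cannot be spread across partitions, and only the single consumer assigned to that partition can drain it.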
Question:
Can I delete the key in the producer?

@fpacifici thoughts on this?

Hi,

unfortunately, semantic partitioning (by project id) of the main events topic is a functional requirement, so you cannot remove the key there.
This requirement preserves sequential consistency between the order of a project's events and several pieces of functionality downstream of Kafka. Specifically: alerts require this partitioning to produce correct results (which is why you cannot create a metric alert across projects); event mutability (delete/merge/unmerge) requires the same guarantee; and several post-processing actions, such as external integrations, must run only after the events are already stored.
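As a small illustration of why that ordering matters, consider a hypothetical insert followed by a delete of the same event. Because both carry the same project_id key, they land on the same partition and one consumer applies them in produced order; if they could land on different partitions, the delete might be processed before the insert and the deleted event would reappear:

```python
# Hypothetical event stream for one project: an insert followed by a
# delete of the same event. Keyed by project_id, both land on the same
# partition, so a single consumer applies them in produced order.
events = [("insert", "evt1"), ("delete", "evt1")]

state = set()  # events currently stored
for op, event_id in events:
    if op == "insert":
        state.add(event_id)
    elif op == "delete":
        state.discard(event_id)

# In-order processing leaves no resurrected event behind.
assert state == set()
```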

There is a way to speed up the consumer, though, by processing events on multiple cores.
The Snuba consumers take three parameters:

  • processes: the number of processes that handle events concurrently (defaults to 1)
  • input-block-size: the size in bytes of the input buffer dispatched to the worker processes; make sure it is large enough to hold many events (more than 50 MB)
  • output-block-size: the size in bytes of the buffer used to reassemble the messages; it should be larger than the input block size

You can set them wherever the Snuba consumer commands are defined (for self-hosted deployments, in the docker-compose service definitions for the Snuba consumers).

If you set all three, your consumer should run on multiple cores and throughput should increase significantly (at the cost of more system resources).
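As a sketch, assuming a typical self-hosted docker-compose layout (the service name, storage name, and block sizes below are illustrative, not taken from this thread), the three flags could be wired in like this:

```yaml
# Illustrative docker-compose service override; adjust the service
# and storage names to match your deployment. 64 MiB input block
# (comfortably above 50 MB) and a larger 128 MiB output block.
snuba-consumer:
  command: >
    consumer --storage events --auto-offset-reset=latest
    --processes 4
    --input-block-size 67108864
    --output-block-size 134217728
```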

Hope this helps, and I'm happy to answer any further questions.

Filippo

That works for the ClickHouse consumer, but the post-process-forwarder has the same problem, and it has no processes parameter.

How can I improve the post-process-forwarder's processing speed?

Hi,

sorry, there is no immediate solution yet.
The multiprocess consumer relies on Python shared-memory support, which was introduced in Python 3.8. Sentry is still being migrated to 3.8, and we are actively working on porting the multiprocess consumer once that migration lands.

However, this won’t be available before the end of July.

Best
Filippo

Is it possible to increase the number of post-process-forwarder instances?

You should be able to start multiple forwarders, but they consume from the same topic as the Snuba consumer, which is partitioned the same way. If you receive too many events for a single project, they will all be on one partition and thus processed by the same post-process-forwarder instance.

Still, you should be able to start multiple instances simply by scaling out the post-process-forwarder service with docker-compose. This is not Sentry-specific; it is just docker-compose's scale option: docker-compose up --scale post-process-forwarder=NUMBER_OF_REPLICAS
You can also increase the number of messages processed before committing, which reduces commit overhead and can give a performance gain: add --commit-batch-size 500 in the docker-compose file.
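Putting both suggestions together, a sketch (the exact service name and base command depend on your docker-compose file, so treat these as placeholders):

```yaml
# docker-compose.yml: add the commit batch size to the forwarder's command
post-process-forwarder:
  command: >
    run post-process-forwarder --commit-batch-size 500
```

Then scale it out with: docker-compose up -d --scale post-process-forwarder=3. Keep in mind that, because of the partitioning discussed above, extra replicas help overall throughput but not a single hot project.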

Best
Filippo