Hi there,
I have set up my on-premise environment as follows: an EKS cluster, with Redis, Kafka, and PostgreSQL provided as AWS managed services (ElastiCache, MSK, and RDS respectively).
ClickHouse and Symbolicator are deployed as StatefulSets.
While load testing the setup to exercise scale-in and scale-out,
the Snuba component snuba-event-consumer goes into a CrashLoopBackOff.
Logs:
```
❯ kubectl logs snuba-event-consumer-xxxxxxxx -n sentry
2020-12-17 09:46:47,231 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 13713}
2020-12-17 09:46:49,335 Completed processing <Batch: 368 messages, open for 2.09 seconds>.
2020-12-17 09:46:50,635 Caught Exception('Broker: Not enough in-sync replicas'), shutting down...
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/consumer.py", line 161, in consumer
consumer.run()
File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 109, in run
self._run_once()
File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 144, in _run_once
self.__processing_strategy.poll()
File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/transform.py", line 55, in poll
self.__next_step.poll()
File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 122, in poll
self.__close_and_reset_batch()
File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 105, in __close_and_reset_batch
self.__batch.join()
File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 73, in join
self.__step.join(timeout)
File "/usr/src/snuba/snuba/consumer.py", line 238, in join
self.__replacement_batch_writer.join(timeout)
File "/usr/src/snuba/snuba/consumer.py", line 163, in join
self.__producer.flush(*args)
File "/usr/src/snuba/snuba/utils/streams/backends/kafka.py", line 755, in __commit_message_delivery_callback
raise Exception(error.str())
Exception: Broker: Not enough in-sync replicas
```
The snuba-event-consumer deployment file:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: snuba
  name: snuba-event-consumer
  namespace: sentry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: snuba-event-consumer
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: snuba-event-consumer
    spec:
      containers:
        - image: getsentry/snuba:77a6bbfc892c442e3a2230ca20cc6bcc5e2620ce
          imagePullPolicy: Always
          name: snuba-event-consumer
          resources:
            requests:
              cpu: "0.125"
              memory: "350Mi"
            limits:
              cpu: "0.25"
              memory: "700Mi"
          command: ["snuba"]
          args: ["consumer", "--storage", "events", "--auto-offset-reset=latest", "--max-batch-time-ms", "750"]
          envFrom:
            - configMapRef:
                name: snuba-config
```
The snuba-config ConfigMap is as follows (DEFAULT_BROKERS is a single comma-separated string; broker hostnames redacted):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: snuba-config
  namespace: sentry
data:
  SNUBA_SETTINGS: docker
  CLICKHOUSE_HOST: clickhousedb
  DEFAULT_BROKERS: 'xxx:9092,xxxx:9092,xxx:9092'
  REDIS_HOST: redis-service
  UWSGI_MAX_REQUESTS: '10000'
  UWSGI_DISABLE_LOGGING: 'true'
```
The default Kafka options provided in sentry.conf.py are as follows:
```python
DEFAULT_KAFKA_OPTIONS = {
    # librdkafka expects bootstrap.servers as one comma-separated string
    "bootstrap.servers": "xxx:9092,xxxx:9092,xxx:9092",
    "message.max.bytes": 50000000,
    "socket.timeout.ms": 10000,
    "acks": 1,
}
```
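For context on the traceback above: Kafka brokers reject a produce request with NOT_ENOUGH_REPLICAS ("Broker: Not enough in-sync replicas") only when the producer uses acks=all (-1) and the partition's in-sync replica count is below the topic's min.insync.replicas; with acks=0 or acks=1 only the leader needs to be up. A toy model of that broker-side check, purely illustrative (the function name is made up, not part of Snuba, Kafka, or librdkafka):

```python
def produce_allowed(acks, isr_count, min_insync_replicas):
    """Toy model of the broker-side durability check that produces
    'Broker: Not enough in-sync replicas' (NOT_ENOUGH_REPLICAS).
    Hypothetical helper for illustration only."""
    if acks in ("all", -1):
        # acks=all: the write must be replicable to at least
        # min.insync.replicas in-sync replicas, or it is rejected.
        return isr_count >= min_insync_replicas
    # acks=0 / acks=1: only the partition leader is required.
    return isr_count >= 1


# With min.insync.replicas=2, losing two of three replicas from the
# ISR blocks acks=all writes but not acks=1 writes:
print(produce_allowed("all", 1, 2))  # False
print(produce_allowed(1, 1, 2))      # True
```

Under this model, an acks=all producer starts failing as soon as the ISR shrinks below 2, which matches the error in the consumer logs above.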
Are there any recommendations for the following Kafka settings? And are there any Snuba-specific Kafka settings that should be used? I can't find any in the documentation.
- `min.insync.replicas`
- `replication.factor`
- `acks`
At present I have the following Kafka configuration set in MSK:
```
auto.create.topics.enable = true
delete.topic.enable = true
default.replication.factor = 3
min.insync.replicas = 2
```
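In case it helps others hitting the same error: the per-partition ISR state can be inspected with the stock Kafka CLI tools (the bootstrap address is a placeholder for one of the MSK brokers; `--under-min-isr-partitions` requires Kafka 2.3+, which MSK supports):

```shell
# Show leader, replicas, and current ISR for each partition of the events topic
kafka-topics.sh --bootstrap-server xxx:9092 --describe --topic events

# List only the partitions whose ISR has fallen below min.insync.replicas (Kafka 2.3+)
kafka-topics.sh --bootstrap-server xxx:9092 --describe --under-min-isr-partitions
```

If the second command lists the events topic's partitions while the consumer is crash-looping, the brokers (not Snuba) are the bottleneck, since any acks=all writer to those partitions will fail until the ISR recovers.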