Ok, after more investigation, leaving the snuba-api container running for some time gives the following log:
Running Snuba API server with default arguments: --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
+ '[' api = bash ']'
+ '[' a = - ']'
+ '[' api = api ']'
+ '[' 1 -gt 1 ']'
+ _default_args='--socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive'
+ echo 'Running Snuba API server with default arguments: --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive'
+ set -- uwsgi --master --manage-script-name --wsgi-file snuba/web/wsgi.py --die-on-term --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
+ set -- uwsgi --master --manage-script-name --wsgi-file snuba/web/wsgi.py --die-on-term --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
+ snuba uwsgi --help
+ exec gosu snuba uwsgi --master --manage-script-name --wsgi-file snuba/web/wsgi.py --die-on-term --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
*** Starting uWSGI 2.0.17 (64bit) on [Wed Jan 29 11:22:42 2020] ***
compiled with version: 8.3.0 on 29 January 2020 04:12:42
os: Linux-4.15.0-1052-azure #57-Ubuntu SMP Tue Jul 23 19:07:16 UTC 2019
nodename: snuba-api-7cb67b4b5b-qdhh2
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 2
current working directory: /usr/src/snuba
detected binary path: /usr/local/bin/uwsgi
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:1218 fd 3
uwsgi socket 0 bound to UNIX address /tmp/snuba.sock fd 6
Python version: 3.7.6 (default, Jan 3 2020, 23:35:31) [GCC 8.3.0]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x5634ea369fd0
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 145808 bytes (142 KB) for 1 cores
*** Operational MODE: single process ***
initialized 38 metrics
WSGI app 0 (mountpoint='') ready in 2 seconds on interpreter 0x5634ea369fd0 pid: 1 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 1)
spawned uWSGI worker 1 (pid: 13, cores: 1)
metrics collector thread started
spawned uWSGI http 1 (pid: 16)
2020-01-29 11:37:05,172 Error running query: SELECT (arrayJoin(tags.key) AS tags_key), (count() AS count) FROM sentry_local SAMPLE 1 PREWHERE project_id IN (1) WHERE project_id IN (1) AND timestamp >= toDateTime('2020-01-15T11:27:04') AND timestamp < toDateTime('2020-01-29T11:27:04') AND deleted = 0 GROUP BY (tags_key) ORDER BY count DESC LIMIT 0, 1000
Code: 209. None (clickhouse:8123)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 237, in connect
self.receive_hello()
File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 312, in receive_hello
packet_type = read_varint(self.fin)
File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/reader.py", line 30, in read_varint
i = f.read_one()
File "clickhouse_driver/bufferedreader.pyx", line 66, in clickhouse_driver.bufferedreader.BufferedReader.read_one
File "clickhouse_driver/bufferedreader.pyx", line 193, in clickhouse_driver.bufferedreader.BufferedSocketReader.read_into_buffer
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./snuba/web/query.py", line 140, in raw_query
with_totals=request.query.has_totals(),
File "./snuba/clickhouse/native.py", line 171, in execute
sql, with_column_types=True, settings=settings, **kwargs
File "./snuba/clickhouse/native.py", line 66, in execute
raise e
File "./snuba/clickhouse/native.py", line 60, in execute
result = conn.execute(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/client.py", line 196, in execute
self.connection.force_connect()
File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 171, in force_connect
self.connect()
File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 245, in connect
'{} ({})'.format(e.strerror, self.get_description())
clickhouse_driver.errors.SocketTimeoutError: Code: 209. None (clickhouse:8123)
Wed Jan 29 11:37:05 2020 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /query (ip 10.244.0.127) !!!
Wed Jan 29 11:37:05 2020 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 306] during POST /query (10.244.0.127)
OSError: write error
[pid: 13|app: 0|req: 1/1] 10.244.0.127 () {32 vars in 438 bytes} [Wed Jan 29 11:27:04 2020] POST /query => generated 0 bytes in 600374 msecs (HTTP/1.1 500) 2 headers in 0 bytes (0 switches on core 0)
This log makes me think that snuba-api is unresponsive because ClickHouse is unresponsive. I found some error logs related to not being able to listen on IPv6 addresses. I changed the config.xml through a ConfigMap so that those errors are no longer logged, but snuba-api is still unresponsive and keeps logging the message above.
For reference, this is the config.xml I’ve used:
<yandex>
<!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
<listen_host>0.0.0.0</listen_host>
<listen_try>1</listen_try>
<logger>
<console>1</console>
</logger>
</yandex>
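To test the suspicion that ClickHouse is unreachable from the snuba-api pod, a small TCP probe can help. This is a minimal sketch, not part of Snuba: the helper name can_connect is hypothetical, and the host clickhouse and port 8123 are taken from the traceback above. Port 9000 is included only as an assumption worth checking, since it is ClickHouse's default native-protocol port (the one clickhouse_driver speaks), while 8123 is the default HTTP port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration. It only checks that the port accepts
    connections; it does not perform the ClickHouse handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage from inside the snuba-api pod (host and ports are assumptions):
#   for port in (8123, 9000):
#       print(port, can_connect("clickhouse", port))
```

If both ports accept connections, the timeout in receive_hello may point to a protocol mismatch (native client talking to the HTTP port) rather than to ClickHouse being down, but that is only a guess until verified against the actual Snuba settings.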