Sentry on k8s (Helm): timeout error on snuba-api

Hello.

I’m trying to deploy Sentry 10 on K8s using Helm (https://github.com/aarroyoc/sentry-helm). However, I’m stuck on an error. Sentry loads fine, but calls to snuba-api time out and I don’t know what is missing.

This is the only meaningful log I was able to get (sentry-web):

 Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/sentry/api/base.py", line 89, in handle_exception
    response = super(Endpoint, self).handle_exception(exc)
  File "/usr/local/lib/python2.7/site-packages/rest_framework/views.py", line 449, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python2.7/site-packages/sentry/api/base.py", line 196, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/api/endpoints/organization_tags.py", line 24, in get
    use_cache=request.GET.get("use_cache", "0") == "1",
  File "/usr/local/lib/python2.7/site-packages/sentry/utils/services.py", line 104, in <lambda>
    context[key] = (lambda f: lambda *a, **k: getattr(self, f)(*a, **k))(key)
  File "/usr/local/lib/python2.7/site-packages/sentry/tagstore/snuba/backend.py", line 374, in get_tag_keys_for_projects
    **optimize_kwargs
  File "/usr/local/lib/python2.7/site-packages/sentry/tagstore/snuba/backend.py", line 290, in __get_tag_keys_for_projects
    **kwargs
  File "/usr/local/lib/python2.7/site-packages/sentry/utils/snuba.py", line 621, in query
    **kwargs
  File "/usr/local/lib/python2.7/site-packages/sentry/utils/snuba.py", line 526, in raw_query
    return bulk_raw_query([snuba_params], referrer=referrer)[0]
  File "/usr/local/lib/python2.7/site-packages/sentry/utils/snuba.py", line 558, in bulk_raw_query
    query_results = [snuba_query(query_param_list[0])]
  File "/usr/local/lib/python2.7/site-packages/sentry/utils/snuba.py", line 551, in snuba_query
    raise SnubaError(err)
SnubaError: HTTPConnectionPool(host='snuba-api', port=1218): Read timed out. (read timeout=30) 
127.0.0.1 - - [21/Jan/2020:09:07:34 +0000] "GET /api/0/organizations/sentry/tags/?use_cache=1 HTTP/1.1" 500 617 "http://localhost:9000/organizations/sentry/issues/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0"

If I connect to the snuba-api pod and install curl, running curl http://localhost:1218 also times out.

Any hint of what’s missing?

Can you try running snuba-api with the --debug flag to see its output? Right now it seems like snuba-api is not running or responding properly and it can be due to many things.

I’m not able to run snuba-api with the --debug flag. Setting container args to ["api", "--help"] doesn’t start the pod.

@aarroyoc - we had 2 faulty images last week, so maybe you just got unlucky? Can you try updating to the latest Snuba image and trying again?

Just updated; no difference.

Well, unless you can provide some logs I can’t really help more. We know that the Snuba images work, so it has something to do with how you are setting all this up. I really want to help, but you need to give us something to work with.

Hi,
I’m trying to install this chart on GKE v.1.13.11-gke.14 and got an error:

Error: chart requires kubeVersion: 1.x which is incompatible with Kubernetes v1.13.11-gke.14

Could you please share your versions and advise how to fix that error?

Thx.

Ok, after more investigation, leaving the snuba-api container running for some time gives the following log:

Running Snuba API server with default arguments: --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
+ '[' api = bash ']'
+ '[' a = - ']'
+ '[' api = api ']'
+ '[' 1 -gt 1 ']'
+ _default_args='--socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive'
+ echo 'Running Snuba API server with default arguments: --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive'
+ set -- uwsgi --master --manage-script-name --wsgi-file snuba/web/wsgi.py --die-on-term --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
+ set -- uwsgi --master --manage-script-name --wsgi-file snuba/web/wsgi.py --die-on-term --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
+ snuba uwsgi --help
+ exec gosu snuba uwsgi --master --manage-script-name --wsgi-file snuba/web/wsgi.py --die-on-term --socket /tmp/snuba.sock --http 0.0.0.0:1218 --http-keepalive
*** Starting uWSGI 2.0.17 (64bit) on [Wed Jan 29 11:22:42 2020] ***
compiled with version: 8.3.0 on 29 January 2020 04:12:42
os: Linux-4.15.0-1052-azure #57-Ubuntu SMP Tue Jul 23 19:07:16 UTC 2019
nodename: snuba-api-7cb67b4b5b-qdhh2
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 2
current working directory: /usr/src/snuba
detected binary path: /usr/local/bin/uwsgi
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:1218 fd 3
uwsgi socket 0 bound to UNIX address /tmp/snuba.sock fd 6
Python version: 3.7.6 (default, Jan  3 2020, 23:35:31)  [GCC 8.3.0]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x5634ea369fd0
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 145808 bytes (142 KB) for 1 cores
*** Operational MODE: single process ***
initialized 38 metrics
WSGI app 0 (mountpoint='') ready in 2 seconds on interpreter 0x5634ea369fd0 pid: 1 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 1)
spawned uWSGI worker 1 (pid: 13, cores: 1)
metrics collector thread started
spawned uWSGI http 1 (pid: 16)
2020-01-29 11:37:05,172 Error running query: SELECT (arrayJoin(tags.key) AS tags_key), (count() AS count) FROM sentry_local SAMPLE 1 PREWHERE project_id IN (1) WHERE project_id IN (1) AND timestamp >= toDateTime('2020-01-15T11:27:04') AND timestamp < toDateTime('2020-01-29T11:27:04') AND deleted = 0 GROUP BY (tags_key) ORDER BY count DESC LIMIT 0, 1000
Code: 209. None (clickhouse:8123)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 237, in connect
    self.receive_hello()
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 312, in receive_hello
    packet_type = read_varint(self.fin)
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/reader.py", line 30, in read_varint
    i = f.read_one()
  File "clickhouse_driver/bufferedreader.pyx", line 66, in clickhouse_driver.bufferedreader.BufferedReader.read_one
  File "clickhouse_driver/bufferedreader.pyx", line 193, in clickhouse_driver.bufferedreader.BufferedSocketReader.read_into_buffer
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./snuba/web/query.py", line 140, in raw_query
    with_totals=request.query.has_totals(),
  File "./snuba/clickhouse/native.py", line 171, in execute
    sql, with_column_types=True, settings=settings, **kwargs
  File "./snuba/clickhouse/native.py", line 66, in execute
    raise e
  File "./snuba/clickhouse/native.py", line 60, in execute
    result = conn.execute(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/client.py", line 196, in execute
    self.connection.force_connect()
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 171, in force_connect
    self.connect()
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 245, in connect
    '{} ({})'.format(e.strerror, self.get_description())
clickhouse_driver.errors.SocketTimeoutError: Code: 209. None (clickhouse:8123)
Wed Jan 29 11:37:05 2020 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /query (ip 10.244.0.127) !!!
Wed Jan 29 11:37:05 2020 - uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 306] during POST /query (10.244.0.127)
OSError: write error
[pid: 13|app: 0|req: 1/1] 10.244.0.127 () {32 vars in 438 bytes} [Wed Jan 29 11:27:04 2020] POST /query => generated 0 bytes in 600374 msecs (HTTP/1.1 500) 2 headers in 0 bytes (0 switches on core 0)

This log makes me think that snuba-api is unresponsive because ClickHouse is unresponsive. I found some ClickHouse error logs about not being able to listen on IPv6 addresses. I changed the config.xml through a ConfigMap, so those errors are no longer logged, but snuba-api is still unresponsive and keeps logging the message above.

For reference, this is the config.xml I’ve used:

    <yandex>
      <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
      <listen_host>0.0.0.0</listen_host>
      <listen_try>1</listen_try>

      <logger>
          <console>1</console>
      </logger>
    </yandex>
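
To confirm that the file mounted from the ConfigMap really carries these values, a minimal Python sketch like the following can parse it with the standard library only. The function name `listen_hosts` and the `SAMPLE` constant are made up for illustration; the XML is the fragment above.

```python
import xml.etree.ElementTree as ET

# The config.xml fragment from this post, used here as sample input.
SAMPLE = """
<yandex>
  <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
  <listen_host>0.0.0.0</listen_host>
  <listen_try>1</listen_try>
  <logger>
      <console>1</console>
  </logger>
</yandex>
"""


def listen_hosts(config_xml):
    """Return every <listen_host> value found in a ClickHouse config document."""
    root = ET.fromstring(config_xml)
    return [el.text.strip() for el in root.iter("listen_host") if el.text]


def listens_on_ipv4_wildcard(config_xml):
    """True if the config asks ClickHouse to listen on all IPv4 interfaces."""
    return "0.0.0.0" in listen_hosts(config_xml)
```

Inside the ClickHouse pod, the same functions could be fed the contents of the mounted file instead of `SAMPLE`; the path depends on how the ConfigMap is mounted in your chart.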

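It seems more like a network connection issue to me. Are you sure ClickHouse is accepting native-protocol connections on port 8123? Snuba talks to ClickHouse through the native driver (clickhouse_driver in your traceback), and the native TCP port defaults to 9000; 8123 is the default port of the HTTP interface.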
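
One way to check which interface is behind a port, without installing anything in the snuba-api pod (the image already ships Python 3), is to send a plain HTTP GET and look at the answer. ClickHouse's HTTP interface replies to `GET /` with the body `Ok.`, so a port that answers this way is the HTTP port, not the native one. This is only a rough sketch: the function name is made up for illustration, and the `clickhouse` host name in the usage note is taken from the traceback above.

```python
import http.client


def answers_like_clickhouse_http(host, port, timeout=3.0):
    """Return True if host:port answers GET / with 200 and the body 'Ok.'.

    A True result suggests the port is ClickHouse's HTTP interface, which
    the native driver cannot talk to. False means the port is closed,
    timed out, or speaks something other than that HTTP interface.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        body = response.read()
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a non-HTTP reply.
        return False
    finally:
        conn.close()
    return response.status == 200 and body.strip() == b"Ok."
```

From a Python shell in the snuba-api pod, `answers_like_clickhouse_http("clickhouse", 8123)` returning True would mean Snuba is pointing its native client at the HTTP port, and that the port in its ClickHouse settings should be changed to the native one.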

Hello @aarroyoc, were you able to get Sentry 10 working? I am also in the process of doing this and am looking for a way to get started.

@aarroyoc +1, I also want to install Sentry 10 on k8s. Is there any new progress?

Hey, I have Sentry 10 working in k8s now. For this error, make sure ClickHouse is running and that you pass the right ClickHouse service name to all Sentry services.

So you use this Helm repo to install?
My other concern about upgrading to 10 is that it adds many more dependencies than 9. I used to have only external Postgres and Redis on 9, and it performed very well. Will there be any performance issues if all of the Sentry 10 dependencies are deployed in containers?

Sorry for the delay. Yes, I have Sentry 10 running with a Helm chart, and it’s available online. However, my chart is not very customizable, so you may want to copy it and modify it as you wish. Anyway:

Telefónica Helm Chart for Sentry 10