Sentry responds HTTP 200 with event ID but no event on dashboard

Hello,

My on-premise Sentry server installed on an Ubuntu 18.04.5 machine running through apache SSL proxy does receive events and replies to all HTTPS requests with code 200 and an event ID in JSON response body, but the dashboard won’t show any event.

Thanks

We’d need to see the logs for relay and kafka to make any speculations. We’d also need to know about which version of Sentry you are using.

2020-08-27T12:50:57Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:50:58Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:06Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:06Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:06Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:10Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:10Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:10Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:39Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:39Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:39Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:39Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:39Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:39Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:40Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:40Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:40Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:51:40Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:52:22Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:52:22Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:53:28Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:53:28Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:55:02Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:55:02Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:55:27Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:55:27Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:57:30Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T12:57:59Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:00:07Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:04:30Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:06:04Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:08:48Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:26:35Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:26Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:27Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:28Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:29Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:30Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:31Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:32Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:33Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:34Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:35Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
2020-08-27T13:46:36Z [relay_server::utils::kafka] ERROR: failed to produce message to Kafka (delivery callback): Message production error: MessageTimedOut (Local: Message timed out)
===> ENV Variables ...
ALLOW_UNSIGNED=false
COMPONENT=kafka
CONFLUENT_DEB_VERSION=1
CONFLUENT_PLATFORM_LABEL=
CONFLUENT_SUPPORT_METRICS_ENABLE=false
CONFLUENT_VERSION=5.5.0
CUB_CLASSPATH=/etc/confluent/docker/docker-utils.jar
HOME=/root
HOSTNAME=a0ac1f3a2a81
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
KAFKA_LOG4J_LOGGERS=kafka.cluster=WARN,kafka.controller=WARN,kafka.coordinator=WARN,kafka.log=WARN,kafka.server=WARN,kafka.zookeeper=WARN,state.change.logger=WARN
KAFKA_LOG4J_ROOT_LOGLEVEL=WARN
KAFKA_MAX_REQUEST_SIZE=50000000
KAFKA_MESSAGE_MAX_BYTES=50000000
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_TOOLS_LOG4J_LOGLEVEL=WARN
KAFKA_VERSION=
KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
LANG=C.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
PYTHON_PIP_VERSION=8.1.2
PYTHON_VERSION=2.7.9-1
SCALA_VERSION=2.12
SHLVL=1
ZULU_OPENJDK_VERSION=8=8.38.0.13
_=/usr/bin/env
===> User
uid=0(root) gid=0(root) groups=0(root)
===> Configuring ...
===> Running preflight checks ... 
===> Check if /var/lib/kafka/data is writable ...
===> Check if Zookeeper is healthy ...
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name=a0ac1f3a2a81
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.8.0_212
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Azul Systems, Inc.
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/zulu-8-amd64/jre
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/etc/confluent/docker/docker-utils.jar
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.version=5.3.0-1034-azure
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.free=56MB
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.max=864MB
[main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.total=59MB
[main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zookeeper:2181 sessionTimeout=40000 watcher=io.confluent.admin.utils.ZookeeperConnectionWatcher@cc34f4d
[main] INFO org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
[main] INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
[main] INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
[main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper/172.18.0.11:2181. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /172.18.0.25:44370, server: zookeeper/172.18.0.11:2181
[main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server zookeeper/172.18.0.11:2181, sessionid = 0x1000050aa4d0004, negotiated timeout = 40000
[main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x1000050aa4d0004 closed
[main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000050aa4d0004
===> Launching ... 
===> Launching kafka ... 
[2020-08-12 14:10:57,956] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2020-08-12 14:10:58,518] WARN The package io.confluent.support.metrics.collectors.FullCollector for collecting the full set of support metrics could not be loaded, so we are reverting to anonymous, basic metric collection. If you are a Confluent customer, please refer to the Confluent Platform documentation, section Proactive Support, on how to activate full metrics collection. (io.confluent.support.metrics.KafkaSupportConfig)
[2020-08-12 14:10:58,518] WARN The support metrics collection feature ("Metrics") of Proactive Support is disabled. (io.confluent.support.metrics.SupportedServerStartable)
[2020-08-12 14:10:59,584] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2020-08-12 14:10:59,678] INFO [kafka-log-cleaner-thread-0]: Starting (kafka.log.LogCleaner)
[2020-08-12 14:10:59,981] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2020-08-12 14:11:00,017] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(0.0.0.0,9092,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.network.SocketServer)
[2020-08-12 14:11:00,019] INFO [SocketServer brokerId=1001] Started 1 acceptor threads for data-plane (kafka.network.SocketServer)
[2020-08-12 14:11:00,119] INFO Creating /brokers/ids/1001 (is it secure? false) (kafka.zk.KafkaZkClient)
[2020-08-12 14:11:00,139] INFO Stat of the created znode at /brokers/ids/1001 is: 270,270,1597241460129,1597241460129,1,0,0,72057940492484613,180,0,270
 (kafka.zk.KafkaZkClient)
[2020-08-12 14:11:00,140] INFO Registered broker 1001 at path /brokers/ids/1001 with addresses: ArrayBuffer(EndPoint(kafka,9092,ListenerName(PLAINTEXT),PLAINTEXT)), czxid (broker epoch): 270 (kafka.zk.KafkaZkClient)
[2020-08-12 14:11:00,416] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2020-08-12 14:11:00,492] INFO [SocketServer brokerId=1001] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
[2020-08-13 01:50:34,516] WARN Client session timed out, have not heard from server in 12001ms for sessionid 0x1000050aa4d0005 (org.apache.zookeeper.ClientCnxn)

Sentry 20.8.0.dev0a72e031

I guess the issue is from kafka. What’s kafka ?

There’s something else : after running docker-compose restart, I get HTTP 502 errors on developer endpoints but not on dashboard endpoints. That’s new.

The errors on Kafka indicate a zookeeper failure so let’s check its logs too?

Kafka is the main message transport between various Sentry services so if that doesn’t work, it is normal that you won’t see any events being processed.

Don’t know much about this one but if you are getting 5xx errors, it may be due to insufficient memory or disk space.

===> ENV Variables ...
ALLOW_UNSIGNED=false
COMPONENT=zookeeper
CONFLUENT_DEB_VERSION=1
CONFLUENT_PLATFORM_LABEL=
CONFLUENT_SUPPORT_METRICS_ENABLE=false
CONFLUENT_VERSION=5.5.0
CUB_CLASSPATH=/etc/confluent/docker/docker-utils.jar
HOME=/root
HOSTNAME=607596a80f2f
KAFKA_VERSION=
LANG=C.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
PYTHON_PIP_VERSION=8.1.2
PYTHON_VERSION=2.7.9-1
SCALA_VERSION=2.12
SHLVL=1
ZOOKEEPER_CLIENT_PORT=2181
ZOOKEEPER_LOG4J_ROOT_LOGLEVEL=WARN
ZOOKEEPER_TOOLS_LOG4J_LOGLEVEL=WARN
ZULU_OPENJDK_VERSION=8=8.38.0.13
_=/usr/bin/env
===> User
uid=0(root) gid=0(root) groups=0(root)
===> Configuring ...
===> Running preflight checks ... 
===> Check if /var/lib/zookeeper/data is writable ...
===> Check if /var/lib/zookeeper/log is writable ...
===> Launching ... 
===> Launching zookeeper ... 
[2020-08-27 14:41:55,989] WARN Either no config or no quorum defined in config, running  in standalone mode (org.apache.zookeeper.server.quorum.QuorumPeerMain)
[2020-08-27 14:41:56,952] WARN o.e.j.s.ServletContextHandler@4d95d2a2{/,null,UNAVAILABLE} contextPath ends with /* (org.eclipse.jetty.server.handler.ContextHandler)
[2020-08-27 14:41:56,953] WARN Empty contextPath (org.eclipse.jetty.server.handler.ContextHandler)

Very interesting, thanks !

Disk : 30GB total, 6.3GB used
RAM : 4GB total, 1.9GB used
It’s a Microsoft Azure VM.

Thanks !

Zookeeper seems fine unless it is killed
Are you able to try with more RAM? 4GB should be enough for light loads but I don’t know your load.

It’s not :
sentry_onpremise_zookeeper_1 /etc/confluent/docker/run Up 2181/tcp, 2888/tcp, 3888/tcp

What’s the issue then ?

When creating events and monitoring RAM usage with htop, I see no peak.
Do you really think it’s necessary ?
That will increase the bill.

What do you mean by load ?
Currently there are only 7 projects and all their history is empty.

Thanks

I mean, how would I know? :smiley: I’m not a magician, just making some guesses here on a system I never have seen or touched. Ultimately, you need to figure out what’s going on.

At this point you may want to try running docker-compose down --remove-orphans, remove any zookeeper related volumes and then bring things back up and see if it fixes the issues.

Considering the emergency, I could give you access to it.
Because, it’s very important to me that I get it working before week-end.

Will do.

I got this error while running the command :

ERROR: error while removing network: network sentry_onpremise_default id 20360ac805a287bcb663f7f7f804815a6317d11f2daad16a68ed1bf9f734dfcf has active endpoints

Then, docker volume ls returns :

DRIVER              VOLUME NAME
local               1c05db5471610b6d9a0d68f4bf1c1ccbb7aef6dd294833ef9e42ae1109dc933c
local               63949aff390dfd877bfa3766465aab13b3b8325f5f5ec6b99663b2cb9dee03e4
local               sentry-clickhouse
local               sentry-data
local               sentry-kafka
local               sentry-postgres
local               sentry-redis
local               sentry-symbolicator
local               sentry-zookeeper
local               sentry_onpremise_sentry-clickhouse-log
local               sentry_onpremise_sentry-kafka-log
local               sentry_onpremise_sentry-secrets
local               sentry_onpremise_sentry-smtp
local               sentry_onpremise_sentry-smtp-log
local               sentry_onpremise_sentry-zookeeper-log

So I rm-ed both volumes related to zonekeeper.

Now I have a new problem : after entering my TOTP code I’m being redirected back to e-mail/password form…

Edit : that’s a browser issue which happened on another computer, I’ll check it out later.

502s aren’t happening anymore, but I noticed that none of relay, kafka and zookeeper generate logs at the time which Sentry replies to events with HTTP 200.

New logs appeared recently :

snuba-consumer_1 | %3|1597288279.541|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Failed to resolve 'kafka:9092': Temporary failure in name resolution (after 125688ms in state CONNECT, 1 identical error(s) suppressed)

snuba-transactions-consumer_1 | %3|1597291599.204|FAIL|rdkafka#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve 'kafka:9092': Temporary failure in name resolution (after 446154ms in state CONNECT, 1 identical error(s) suppressed)

snuba-outcomes-consumer_1 | %3|1597288470.794|FAIL|rdkafka#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve 'kafka:9092': Temporary failure in name resolution (after 165000ms in state CONNECT, 1 identical error(s) suppressed)

snuba-replacer_1 | %3|1597288605.389|FAIL|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Failed to resolve 'kafka:9092': Temporary failure in name resolution (after 157001ms in state CONNECT, 1 identical error(s) suppressed)

Oh, those look like quite weird networking issues. 40+ seconds of name resolution times are just ridiculous. Could you be having some virtual networking issues with docker-compose?

I have no idea, this is an empty VM in which I installed docker, docker-compose then getsentry/onpremise.

Theorically, there shouldn’t be any issue.

What can I do ?

Thanks

I did a quick google search for “docker-compose dns issues” and found this one: https://github.com/moby/moby/issues/41003

Seems like it is resolved in a recent version so maybe that would help?

My docker version is 19.03.12, which is the latest one released in 2020-06-18, and my docker-compose version is 1.26.2, which is also the latest one released in 2020-07-02.

Maybe you are having issue with you DNS provider, somehow affecting this?

I don’t see any DNS issue, for instance ping sentry.io works perfectly.