I have been running Sentry On-Premise 20.8.0 since yesterday.
For the first hour everything was fine: my events got processed and showed up in my project.
However, that is not the case anymore.
Some logs:
`nginx`:
nginx_1 | 10.10.2.65 - - [10/Sep/2020:07:45:11 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 172.20.0.1 - - [10/Sep/2020:07:45:12 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 10.10.2.65 - - [10/Sep/2020:07:45:13 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 172.20.0.1 - - [10/Sep/2020:07:45:14 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 10.10.2.65 - - [10/Sep/2020:07:45:14 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 172.20.0.1 - - [10/Sep/2020:07:45:14 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 10.10.2.65 - - [10/Sep/2020:07:45:14 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 172.20.0.1 - - [10/Sep/2020:07:45:14 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
nginx_1 | 10.10.2.65 - - [10/Sep/2020:07:45:14 +0000] "POST /api/2/store/ HTTP/1.0" 200 41 "-" "sentry-php/1.10.0"
etc. This seems to indicate that events are being accepted?
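To double-check that, I could grep the `nginx` log for any store request that did not come back with a 200. This is only a rough sketch and assumes the default `nginx` service name from the on-premise `docker-compose.yml`:

```sh
# Rough sketch: look for store requests that nginx did NOT answer with 200.
# Assumes the default "nginx" service name from the on-premise docker-compose.yml.
docker-compose logs --no-color nginx \
  | grep '/store/' \
  | grep -v ' 200 ' \
  | tail -n 20
```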
`ingest-consumer`:
ingest-consumer_1 | 07:45:43 [INFO] batching-kafka-consumer: Worker flush took 171ms
ingest-consumer_1 | 07:45:44 [INFO] batching-kafka-consumer: Flushing 6 items (from {(u'ingest-events', 0): [1377L, 1382L]}): forced:False size:False time:True
ingest-consumer_1 | 07:45:44 [INFO] batching-kafka-consumer: Worker flush took 87ms
ingest-consumer_1 | 07:45:45 [INFO] batching-kafka-consumer: Flushing 14 items (from {(u'ingest-events', 0): [1383L, 1396L]}): forced:False size:False time:True
ingest-consumer_1 | 07:45:46 [INFO] batching-kafka-consumer: Worker flush took 642ms
So it seems to be doing some ingesting?
Now if I run `docker-compose logs -f` I see no other output whatsoever (except for Redis occasionally writing its data to disk), which makes me think one of the other parts is not doing its job?
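A sketch of what I plan to try next: tail only the services that sit between Kafka and the UI, to see which stage goes quiet. The service names below are the ones from the default on-premise `docker-compose.yml` and may differ in a customized setup:

```sh
# Rough sketch: follow only the post-ingest pipeline services.
# Service names come from the default on-premise docker-compose.yml
# (they may be named differently in a customized setup).
docker-compose logs -f --tail=20 \
  relay web worker post-process-forwarder \
  snuba-consumer snuba-replacer
```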
The last part of the `relay` log:
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
relay_1 | 2020-09-10T07:40:42Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 | caused by: failed to fetch project state from upstream
This is from when I restarted the stack. After that, nothing.
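Since relay complains about its upstream, I guess the next step is to verify that the `relay` container can reach the `web` container at all. The sketch below assumes the default `relay/config.yml` from the on-premise repo, where the upstream is `http://web:9000/`, and that `curl` is available inside the relay image:

```sh
# Rough sketch: can relay reach its upstream ("web")?
# Assumes the default relay/config.yml (upstream: http://web:9000/)
# and that curl exists inside the relay image.
docker-compose exec relay curl -s -o /dev/null -w '%{http_code}\n' http://web:9000/

# Is the web container actually up, and does it log anything suspicious?
docker-compose ps web
docker-compose logs --tail=50 web
```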
The last part of the `kafka` log:
kafka_1 | [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zookeeper:2181 sessionTimeout=40000 watcher=io.confluent.admin.utils.ZookeeperConnectionWatcher@cc34f4d
kafka_1 | [main] INFO org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
kafka_1 | [main] INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
kafka_1 | [main] INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
kafka_1 | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper/172.20.0.19:2181. Will not attempt to authenticate using SASL (unknown error)
kafka_1 | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /172.20.0.11:38414, server: zookeeper/172.20.0.19:2181
kafka_1 | [main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server zookeeper/172.20.0.19:2181, sessionid = 0x1000000f2560000, negotiated timeout = 40000
kafka_1 | [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000000f2560000
kafka_1 | [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x1000000f2560000 closed
kafka_1 | ===> Launching ...
kafka_1 | ===> Launching kafka ...
kafka_1 | [2020-09-10 07:40:50,319] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
kafka_1 | [2020-09-10 07:40:52,089] WARN The package io.confluent.support.metrics.collectors.FullCollector for collecting the full set of support metrics could not be loaded, so we are reverting to anonymous, basic metric collection. If you are a Confluent customer, please refer to the Confluent Platform documentation, section Proactive Support, on how to activate full metrics collection. (io.confluent.support.metrics.KafkaSupportConfig)
kafka_1 | [2020-09-10 07:40:52,090] WARN The support metrics collection feature ("Metrics") of Proactive Support is disabled. (io.confluent.support.metrics.SupportedServerStartable)
kafka_1 | [2020-09-10 07:40:55,802] INFO Starting the log cleaner (kafka.log.LogCleaner)
kafka_1 | [2020-09-10 07:40:56,215] INFO [kafka-log-cleaner-thread-0]: Starting (kafka.log.LogCleaner)
kafka_1 | [2020-09-10 07:40:57,514] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
kafka_1 | [2020-09-10 07:40:57,598] INFO [SocketServer brokerId=1001] Created data-plane acceptor and processors for endpoint : EndPoint(0.0.0.0,9092,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.network.SocketServer)
kafka_1 | [2020-09-10 07:40:57,601] INFO [SocketServer brokerId=1001] Started 1 acceptor threads for data-plane (kafka.network.SocketServer)
kafka_1 | [2020-09-10 07:40:57,905] INFO Creating /brokers/ids/1001 (is it secure? false) (kafka.zk.KafkaZkClient)
kafka_1 | [2020-09-10 07:40:57,953] INFO Stat of the created znode at /brokers/ids/1001 is: 288,288,1599723657931,1599723657931,1,0,0,72057598103650305,180,0,288
kafka_1 | (kafka.zk.KafkaZkClient)
kafka_1 | [2020-09-10 07:40:57,954] INFO Registered broker 1001 at path /brokers/ids/1001 with addresses: ArrayBuffer(EndPoint(kafka,9092,ListenerName(PLAINTEXT),PLAINTEXT)), czxid (broker epoch): 288 (kafka.zk.KafkaZkClient)
kafka_1 | [2020-09-10 07:41:00,235] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
kafka_1 | [2020-09-10 07:41:00,509] INFO [SocketServer brokerId=1001] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
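If it helps, I can also check Kafka itself. A rough sketch of what I would run (the `kafka-topics` and `kafka-consumer-groups` tools come with the Confluent image used by the on-premise setup; the consumer group names are just whatever `--list` reports):

```sh
# Rough sketch: do the expected topics exist, and are any consumer groups lagging?
docker-compose exec kafka kafka-topics --bootstrap-server localhost:9092 --list
docker-compose exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 --list
docker-compose exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --all-groups
```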
Maybe this `zookeeper` log is interesting:
zookeeper_1 | ===> Launching ...
zookeeper_1 | ===> Launching zookeeper ...
zookeeper_1 | [2020-09-10 07:39:55,044] WARN Either no config or no quorum defined in config, running in standalone mode (org.apache.zookeeper.server.quorum.QuorumPeerMain)
zookeeper_1 | [2020-09-10 07:39:56,423] WARN o.e.j.s.ServletContextHandler@4d95d2a2{/,null,UNAVAILABLE} contextPath ends with /* (org.eclipse.jetty.server.handler.ContextHandler)
zookeeper_1 | [2020-09-10 07:39:56,423] WARN Empty contextPath (org.eclipse.jetty.server.handler.ContextHandler)
zookeeper_1 | [2020-09-10 07:39:57,217] WARN Exception causing close of session 0x0: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
zookeeper_1 | [2020-09-10 07:39:57,219] WARN Unexpected exception (org.apache.zookeeper.server.WorkerService)
zookeeper_1 | java.lang.NullPointerException
zookeeper_1 | at org.apache.zookeeper.server.ZooKeeperServer.removeCnxn(ZooKeeperServer.java:135)
zookeeper_1 | at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:602)
zookeeper_1 | at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:375)
zookeeper_1 | at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:530)
zookeeper_1 | at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:155)
zookeeper_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
zookeeper_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
zookeeper_1 | at java.lang.Thread.run(Thread.java:748)
I first thought my system did not have enough RAM (4 GB), since memory was almost full the whole time, so I increased it to 8 GB. Memory is no longer full, but the behaviour is the same.
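To rule out a single container hitting its memory limit (Kafka and ClickHouse seem to be the usual suspects), I can also take a snapshot of per-container usage:

```sh
# Rough sketch: per-container memory/CPU snapshot.
docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}'
```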
I can provide logs from the other containers if needed.
Does anyone have any clue what is going on?