relay_1 | 2021-01-07T12:06:47Z [actix::actors::resolver] WARN: Can not create system dns resolver: io error
This log line indicates that the underlying library will not use the system resolver (docker’s one!), but will fall back to a default which will be Google’s DNS which obviously can’t resolve Docker compose internal web.
I read lots of Rust code, even built some Rust to test things out, but could not figure out how the io error from the logs is triggered.
But I found that relay supports multiple http client libraries as it is moving from actix-web-client to reqwest. The DNS problems come from the resolver that actix-web uses so I configured relay with the other library like this in relay/config.yml:
http:
_client: "reqwest"
Docker compose down/up, then relay uses the other http library and (after a couple of seconds) it works!
Changing the http client library actually resolves the DNS AAAA request issue, but the DNS resover issue still appears:
[actix::actors::resolver] WARN: Can not create system dns resolver: io error
And the relay service still does not operate well. The error message changed to:
[relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
This error did not happened before, so this might be a different issue or gets caused by changing the http client library, I do not know.
Full log with logging set to DEBUG. The following lines appear over and over again.
[relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
[relay_server::actors::upstream] DEBUG: Recovering from network outage.
[relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
btw: do I need to add the credentials from relay/credentials.json anywhere?
Today I stumbled across this issue while migrating our on-prem installation onto another system and upgrading from 9.0 to 20.12.1.
I got the exact same error as shown in JanMalte’s post.
The key here is this line:
relay_1 | 2021-01-07T12:06:47Z [actix::actors::resolver] WARN: Can not create system dns resolver: io error
After having spent most of the day debugging this issue, to me it looks like this:
relay version 20.12.1 uses actix version 0.7.9 which in turn uses trust-dns-resolver(1) 0.10.0. This version doesn’t recognize newer resolver options in resolv.conf - in my case it was the trust-ad option that led to an exception InvalidOption(17). That exception is unfortunately masked by actix and only leads to the generic Can not create system dns resolver: io error message. I compiled and ran an example resolver(2) provided by trust-dns to debug this issue to confirm that removing the problematic option makes the error go away.
To workaround this issue, I have added this section to my docker-compose.yml:
relay:
entrypoint:
- sh
- -c
- |
# NOTE: Using temp file since `sed -i` fails with `Device or resource busy`.
tmp_resolv_conf="$$(mktemp)"
sed '/^options /s/\<trust-ad\>//g' /etc/resolv.conf >"$$tmp_resolv_conf"
cp "$$tmp_resolv_conf" /etc/resolv.conf
rm -f "$$tmp_resolv_conf"
exec bash /docker-entrypoint.sh run
This simply removes the trust-ad option from resolv.conf and this way makes the resolver errors go away and event ingestion working again.
A real fix would involve updating trust-dns-resolver to a newer version. I tested the latest version, 0.20.0, using the CLI client(3) provided with this version and confirmed it parses resolv.conf correctly even with trust-ad option enabled.
But 0.20.0 is only provided with actix starting from 0.11.0-beta.1 (the latest at the time of writing). I don’t know if there’s an earlier version of trust-dns-resolver that already has a fix and is shipped with earlier versions of actix…
I can confirm that @thomasw 's solution worked for me. Swapping over to the reqwest http client as @stefanw suggested did not however. I continually got the No valid json body error as JanMalte reported.
I’m not sure what the best upstream fix is, but this works for me for now.
@BYK Adding the workaround upstream would be very welcome so I can get rid of the modification in my docker-compose.yml. I’d think adding it to the docker image’s entrypoint script should be most appropriate.
Also, I dug a little bit deeper and figured that trust-dns-resolver uses another crate, resolv-conf, to parse resolv.conf. This added support for trust-ad in version 0.7.0, which is the latest at the time of writing. In addition it also added support for the no-reload option which I haven’t seen in the wild yet, but might good to know about nevertheless. Version 0.7.0 is used by trust-dns-resolver starting from 0.20.0 (which is also the latest) which in turn is used as already said by actix starting from 0.11.0-beta.1 (also the latest). So, given it might take a while until a version of actix containing a fix is ready for prime time it might indeed be worth adding the workaround.
@thomasw Do you know you can override most docker-compose settings in a separate docker-compose.override.yml file?
This i the was I did for applying the workaround without messing with the upstream docker-compose.yml.
docker-compose.override.yml
version: '3.4'
services:
relay:
sysctls:
net.ipv6.conf.all.disable_ipv6: 1
net.ipv6.conf.default.disable_ipv6: 1
net.ipv6.conf.lo.disable_ipv6: 1
entrypoint:
- sh
- -c
- |
# NOTE: Using temp file since `sed -i` fails with `Device or resource busy`.
tmp_resolv_conf="$$(mktemp)"
sed '/^options /s/\<trust-ad\>//g' /etc/resolv.conf >"$$tmp_resolv_conf"
cp "$$tmp_resolv_conf" /etc/resolv.conf
rm -f "$$tmp_resolv_conf"
exec bash /docker-entrypoint.sh run
Just modify your relay/config.yml file accordingly and let us know how it goes please. If this causes any headaches, I’ll revert back to resolv.conf modification path.
@BYK I did as you suggested, reverted my modifications to docker-compose.yml and modified relay/config.yml so it uses reqwest.
Unfortunately I get largely the same results as @JanMalte reported previously.
Full log of relay server:
2021-01-14T16:40:19Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 2ms in state CONNECT)
2021-01-14T16:40:19Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 2ms in state CONNECT)
2021-01-14T16:40:19Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2021-01-14T16:40:19Z [actix::actors::resolver] WARN: Can not create system dns resolver: io error
2021-01-14T16:40:19Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 57ms in state CONNECT)
2021-01-14T16:40:19Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 57ms in state CONNECT)
2021-01-14T16:40:19Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2021-01-14T16:40:19Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
2021-01-14T16:40:19Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
2021-01-14T16:40:20Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2021-01-14T16:40:20Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2021-01-14T16:40:20Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2021-01-14T16:40:20Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2021-01-14T16:40:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: operation timed out
2021-01-14T16:40:25Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
2021-01-14T16:40:27Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
2021-01-14T16:40:30Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 0ns
2021-01-14T16:40:30Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
2021-01-14T16:40:30Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 1s
2021-01-14T16:40:31Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 1.5s
2021-01-14T16:40:33Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 2.25s
2021-01-14T16:40:35Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 3.375s
2021-01-14T16:40:36Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
2021-01-14T16:40:39Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 5.0625s
2021-01-14T16:40:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
2021-01-14T16:40:44Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 7.59375s
2021-01-14T16:40:50Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 30 identical error(s) suppressed)
2021-01-14T16:40:50Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 30 identical error(s) suppressed)
2021-01-14T16:40:50Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 30 identical error(s) suppressed)
2021-01-14T16:40:50Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.18.0.12:9092 failed: Connection refused (after 0ms in state CONNECT, 30 identical error(s) suppressed)
2021-01-14T16:40:51Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 11.390625s
2021-01-14T16:40:55Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
2021-01-14T16:41:03Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 17.0859375s
2021-01-14T16:41:17Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): operation timed out
2021-01-14T16:41:25Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 25.62890625s
2021-01-14T16:41:47Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: could not send request using reqwest
caused by: error sending request for url (http://web:9000/api/0/relays/register/challenge/): operation timed out
2021-01-14T16:41:55Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 38.443359375s
2021-01-14T16:42:26Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:43:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:44:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:45:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:46:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:47:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:48:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:49:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2021-01-14T16:50:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
As can be seen in the logs it uses reqwest as intended but upstream requests fail with No valid json body resulting in event ingestion to fail…