[dns-operations] Retrying with TCP on timeout?

Fri Apr 20 11:11:48 UTC 2018

On Fri, Apr 20, 2018 at 12:51:03PM +0200, Stephane Bortzmeyer wrote:
> DNS resolvers retry with TCP when they receive a response with the TC
> (truncated) bit set. But what about when the authoritative name servers
> time out?
> 
> Use case: the two authoritative name servers for .pf no longer reply
> over UDP, only TCP. Apparently, no resolver (I tried several, plus
> the RIPE Atlas probes) retries with TCP.

Thanks for this datapoint. On a more general note, resolvers have to
balance how hard they work on broken domains against how well they serve
correctly functioning domains.

Let's say 20 queries come in for broken domains where half the nameservers
time out, a further quarter are lame, and only one server answers, and then
only after the EDNS buffer size has been dropped in three increments.

This may represent the same amount of CPU cycles and memory as doing 500
lookups for domains that have properly functioning nameservers.
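
As a back-of-the-envelope sketch of that ratio: the figures 20 and 500 are
the ones from the example above, while the 4% share of broken-domain
queries is a made-up assumption, not a measurement.

```python
# Illustrative arithmetic only. 20 and 500 come from the example above.
BROKEN_QUERIES = 20
EQUIVALENT_GOOD_LOOKUPS = 500

# Cost of one broken-domain query, in units of one healthy lookup.
cost_ratio = EQUIVALENT_GOOD_LOOKUPS / BROKEN_QUERIES  # 25.0


def broken_work_share(broken_fraction, ratio=cost_ratio):
    """Fraction of total resolver work spent on broken domains, given the
    fraction of incoming queries that are for broken domains."""
    broken_work = broken_fraction * ratio
    good_work = (1.0 - broken_fraction) * 1.0
    return broken_work / (broken_work + good_work)


# Hypothetical traffic mix: 4% of queries are for broken domains.
print(f"{broken_work_share(0.04):.0%}")  # prints 51%
```

Under those assumptions, a small minority of queries would eat about half
of the resolver's effort.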

It is my view that we should reward that second group, the properly
functioning domains, with stellar performance.

Every time we add logic to improve the functioning of broken domain names,
say by falling back to TCP just to see if that helps, we penalise the
well-functioning domains, and further inform the world of DNS hosters that
'anything goes'.
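
To make the extra work concrete, here is a minimal sketch of such a
fallback, using only the Python standard library. It is not how any
particular resolver does it; the function names, the 2 second timeout and
the single-server design are all assumptions for illustration.

```python
import socket
import struct


def build_query(name, qtype=1, qid=0x1234):
    """Encode a minimal DNS query (RD bit set, class IN, no EDNS)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".") if part
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def query_udp(server, msg, timeout):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(msg, (server, 53))
        data, _ = s.recvfrom(4096)
        return data


def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_tcp(server, msg, timeout):
    # DNS over TCP prefixes each message with a 2-byte length.
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack("!H", len(msg)) + msg)
        (length,) = struct.unpack("!H", _recv_exact(s, 2))
        return _recv_exact(s, length)


def resolve(server, msg, timeout=2.0, udp=query_udp, tcp=query_tcp):
    """Return (transport, response). Retries over TCP when the UDP answer
    has the TC bit set or, as the debated extra step, when UDP times out."""
    try:
        resp = udp(server, msg, timeout)
        truncated = len(resp) > 2 and bool(resp[2] & 0x02)  # TC flag
        if not truncated:
            return "udp", resp
    except socket.timeout:
        pass  # the extra work under discussion: try TCP anyway
    return "tcp", tcp(server, msg, timeout)
```

Note what the timeout branch costs: for every server that is simply dead,
the resolver now waits out a second timeout and attempts a TCP connection
before it can give up.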

Now, is this tradeoff real? On a resolver on a serious CPU at home, serving
20 devices, you can keep both communities happy. But on a resolver CPU at a
provider cranking out >50000 RD responses/second, these are real choices.

     Bert