[dns-operations] DNS .com/.net resolution problems in the Asia/Pacific region

Thu Jul 20 02:09:44 UTC 2023

> On 20 Jul 2023, at 01:23, Shumon Huque <shuque at gmail.com> wrote:
> 
> On Wed, Jul 19, 2023 at 3:28 AM Shane Kerr <shane at time-travellers.org> wrote:
> Shumon and all,
> 
> On 18/07/2023 21.41, Shumon Huque wrote:
> > On Tue, Jul 18, 2023 at 3:29 PM Viktor Dukhovni <ietf-dane at dukhovni.org> wrote:
> > 
> > Yes, I agree. A resolver can't really tell that a response with an 
> > expired signature wasn't an attacker trying to replay old data. For 
> > robustness against attacks, it must re-query the other available 
> > servers if they exist.
> 
> I kind of think that a resolver using UDP should just drop a response on 
> the floor if it has an expired signature. Otherwise an attacker can 
> induce behavior change by spoofing replies, which is itself a security 
> problem (in this case, blocking with a response that would arrive later 
> and work, effectively removing a name server from the set of name 
> servers queried for a given lookup).
> 
> The problem is that in the general case the resolver can't really tell if
> this was an attack or a misconfiguration. So, it's best to build in robust
> behavior to deal with the case more generally. Which in my opinion is
> "drop the response on the floor, maybe blacklist the server for a while,
> and retry the next server". If a later valid response does come, then be
> prepared to accept it (if you've still held on to the query).
> 
> This idea mostly applies to UDP without DNS cookies since it is the only 
> transport easily vulnerable to spoofing. With other transports you are 
> much more sure that the answer actually came from the server you are 
> querying, and so you can be confident that the server is giving out 
> bogus answers. (TCP is vulnerable to BGP hijacking and the like, but in 
> that case you would still expect to get bogus answers for subsequent 
> queries to the same server.)
> 
> Well, there are inline attackers as well and DNSSEC is designed to protect
> against those too. If this only applied to UDP, then other protocols that use
> connection oriented transport or session oriented frameworks on top of UDP
> would not bother with cryptographic authentication either. And yet, they all
> do.
> 
> Unfortunately I don't think any resolvers hold onto a UDP query until 
> after the DNSSEC validation. So there is not really much option other 
> than to try again. 🤓
> 
> Yeah, but I don't really see that as a problem. I see concerns have been
> raised in this thread about the NXNSAttack and such dissuading 
> resolver implementers from more retries, but in my view the only thing
> that taught us is that resolver implementers need to go back to first
> principles and sensibly bound the amount of work they are willing to do,
> not eliminate retries.

Lookups take enormous numbers of queries these days.  A support customer
was asking why a lookup wasn’t completing within 3 seconds.  The resolution
process took 48 queries with a cold cache.  It involved several CDNs and required
fetching nameserver addresses in several different TLDs.  There were no retries
in that count.

CNAME chains are expensive but we have a whole industry that has fallen in love
with them.

Yes, we do have query limits but they need to be large to handle this sort of
stuff.
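
To make the trade-off concrete, here is a minimal sketch of a per-lookup
work budget that bounds both queries sent and time spent, while still
allowing a retry against the next server when a response fails
validation.  It is not how BIND or any other resolver implements its
limits.  The names WorkBudget, BudgetExceeded, query_with_retries,
send_query and is_valid are all hypothetical, and the limits shown are
illustrative only.

```python
import time


class BudgetExceeded(Exception):
    """Raised when a lookup has used up its allowed work."""


class WorkBudget:
    """Hypothetical per-lookup budget: caps queries sent and elapsed time."""

    def __init__(self, max_queries, max_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.max_seconds = max_seconds
        self.clock = clock
        self.started = clock()
        self.sent = 0

    def charge(self):
        """Account for one outgoing query, or refuse if the budget is spent."""
        if self.sent >= self.max_queries:
            raise BudgetExceeded("query limit reached")
        if self.clock() - self.started > self.max_seconds:
            raise BudgetExceeded("time limit reached")
        self.sent += 1


def query_with_retries(servers, send_query, is_valid, budget):
    """Try each server in turn; discard responses that fail validation.

    send_query(server) and is_valid(response) are caller-supplied
    stand-ins (assumptions) for the real transport and DNSSEC validator.
    Every attempt, including retries, is charged against the budget.
    """
    for server in servers:
        budget.charge()
        response = send_query(server)
        if response is not None and is_valid(response):
            return response
        # Bad or missing response: drop it and move on to the next server.
    return None
```

A single budget object would be shared across every sub-query of one
client lookup (CNAME targets, nameserver address fetches, retries), so
the query limit has to be comfortably above something like the 48
queries in the example above, or legitimate lookups will fail.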

> To quote RFC 1034 (published in 1987):
> 
> "The recommended priorities for the resolver designer are:
> 
>    1. Bound the amount of work (packets sent, parallel processes
>       started) so that a request can't get into an infinite loop or
>       start off a chain reaction of requests or queries with other
>       implementations EVEN IF SOMEONE HAS INCORRECTLY
>       CONFIGURED SOME DATA."
> 
> Note: the capitalized phrase for emphasis.
> 
> (My addendum: they should also bound the time spent.)
> 
> Transient configuration problems are _pervasive_ in deployed DNS
> infrastructure. If BIND for example did not have the robust retry
> behavior that Mark Andrews documented upthread, we could never
> have used them in our infrastructure. In my experience Unbound also
> had the same robustness, but I'm now a little concerned by the description
> of the Unbound failures reported by Gavin M during this Verisign incident.
> Maybe they have limited retries too aggressively? It would be good to 
> get some NLnetLabs colleagues to chime in with a description of their
> behavior.
> 
> Shumon.
> 
> _______________________________________________
> dns-operations mailing list
> dns-operations at lists.dns-oarc.net
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka at isc.org