[dns-operations] caches only resetting TTL? was Re: Whereto find "DNS resolution path corruption"?

Wed Feb 27 14:45:19 UTC 2008

> I don't agree with Paul's solution of always going for the root resolution
> path when a record expires.  The reason being that it is more expensive.

really, it's not.

> A resolver then needs to do 3 iterative lookups, where 99% of the time
> it can do with 1.

nope.  let me clarify.  for every NS RRset in a recursive server's cache, 
store two new attributes: referring zone, and referring zone ttl.  do not
update these from zone apex information, only from zone ancestor data.  if,
when fetching something from the cache for use in an answer, authority, or
additional data section of some response, you cross a name containing an
NS RRset whose referring zone ttl has ticked down to zero, then stop what
you're doing and go re-issue a query for the name/class/type you're trying
to access from your own cache, against the closest ancestor nameservers
above the NS RRset you're trying to cross on your way to that data.  when
you get resolution (could be a refresh of the delegation, could be RCODE=3)
then restart the processing of your original query.
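
a minimal sketch of the above, in python, not taken from any real resolver.
the names Delegation, Cache, lookup, and the requery callback are all
hypothetical: requery(zone, name, type) is assumed to ask the referring
zone's servers and return either a refreshed Delegation or None for RCODE=3.
ordinary per-record TTL handling is left out to keep the sketch short.

```python
from dataclasses import dataclass, field

@dataclass
class Delegation:
    """NS RRset at a zone cut, plus the two proposed attributes (hypothetical layout)."""
    nameservers: list
    referring_zone: str      # ancestor zone that handed out the referral
    referring_expiry: float  # absolute time at which the referring zone ttl hits zero

@dataclass
class Cache:
    delegations: dict = field(default_factory=dict)  # owner name -> Delegation
    answers: dict = field(default_factory=dict)      # (name, type) -> rdata

def cut_candidates(qname):
    """Yield possible zone cuts from the top down: 'com.', 'example.com.', ..."""
    labels = qname.rstrip(".").split(".")
    for i in range(len(labels) - 1, -1, -1):
        yield ".".join(labels[i:]) + "."

def at_or_below(name, owner):
    return name == owner or name.endswith("." + owner)

def purge(cache, owner):
    """Forget every delegation and answer at or below the given owner name."""
    cache.delegations = {n: d for n, d in cache.delegations.items()
                         if not at_or_below(n, owner)}
    cache.answers = {k: v for k, v in cache.answers.items()
                     if not at_or_below(k[0], owner)}

def lookup(cache, qname, qtype, now, requery, max_restarts=8):
    """Answer from cache, re-asking the referring zone when a crossed cut has expired."""
    for _ in range(max_restarts):
        expired = next((o for o in cut_candidates(qname)
                        if o in cache.delegations
                        and cache.delegations[o].referring_expiry <= now), None)
        if expired is None:
            return cache.answers.get((qname, qtype))
        # stop and re-issue the query against the closest ancestor's servers
        fresh = requery(cache.delegations[expired].referring_zone, qname, qtype)
        if fresh is None:
            purge(cache, expired)               # RCODE=3: the zone is gone
        else:
            cache.delegations[expired] = fresh  # delegation refreshed
        # either way, restart processing of the original query
    raise RuntimeError("too many restarts")

cache = Cache()
cache.delegations["example.com."] = Delegation(["ns1.example.com."], "com.", 300.0)
cache.answers[("www.example.com.", "A")] = "192.0.2.1"
gone = lambda zone, name, rtype: None  # parent now answers RCODE=3
print(lookup(cache, "www.example.com.", "A", 100.0, gone))  # 192.0.2.1
print(lookup(cache, "www.example.com.", "A", 301.0, gone))  # None
```

note that the extra query only happens once per referral ttl per zone cut,
and it goes to the parent's servers, not to the root.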

the original goal of this exercise was and is different from solving the
particular problem described on this thread, but it happens to work for this
case also.  the original goal was, if a registrar puts a 300 second TTL on
some new delegation NS RRset, because it's from a new registrant and
statistically speaking new registrants are spammers, phishers, or other
nogoodniks, and if the registrant is in fact a phisher and the domain is
W1ND0WSUPDATE.COM and they put a TTL of 30 days on their WWW A RR, and if
folks call the registrar and complain and get the zone removed, then i want
my resolver to purge everything it knows about this zone after 300 seconds,
rather than keeping copies of it for 30 days.
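
to put numbers on that case, here is a small python calculation. the
function name worst_case_exposure is made up for illustration, and it
assumes the delegation is removed right after the resolver cached the data.

```python
DAY = 86400  # seconds

def worst_case_exposure(record_ttl, referral_ttl, honor_referral_ttl):
    """Seconds a cached record can outlive the removal of its delegation."""
    if honor_referral_ttl:
        # the cut is re-checked with the parent once the referral ttl runs out
        return min(record_ttl, referral_ttl)
    return record_ttl

print(worst_case_exposure(30 * DAY, 300, honor_referral_ttl=False))  # 2592000
print(worst_case_exposure(30 * DAY, 300, honor_referral_ttl=True))   # 300
```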

olafur gudmundson, WG chair of IETF DNSEXT, has offered me a beer if i write
this up.  since i'm busy writing drafts after the philly meeting deadline, i
guess i can write one more, covering this topic.  but meanwhile, rest assured
that it's not really very much traffic, or complexity, and has an overall
low cost and high benefit.

> The mechanisms implemented in the resolvers are RFC compliant, and
> efficient, as long as the zone administrators are responsible for their
> zones.

that may be, but the RFCs don't specifically mention the area covered by my
proposed change -- so my change is compliant also.

> And as Mark suggests, I don't think we should adapt a resolver's behaviour
> just because people do bad things. Resolvers become way too complex if we
> need to take into account all the bad things people can do to their
> zones. KISS.

and yet, the correct operation of a resolver operator's corner of the internet
can be affected by the incorrect operation of an authority server's corner
of the internet.  this is a case where jon postel's maxim is 180 degrees off
course -- in DNS, one should be conservative in what one accepts.

> The only thing I could imagine that can be of help is when authoritative
> nameservers give out warning messages when they are running primary for a zone
> that is not delegated to them, but then again, that's just as expensive to
> detect, and needs to be switched off for zones that are run like that
> intentionally.

i know of no way to do this reliably.  folks have asked for it in BIND for a
decade or longer... but breadth first downward traversal is a diagnostic only
kind of capability, and when it doesn't reach a particular server name or
server IP address, it's still possible that that server is in the data chain.