[dns-operations] resimprove and Re: DNS Flush Protocol

Sat Mar 28 00:09:29 UTC 2015

Edward Lewis wrote:
> On 3/27/15, 16:00, "Paul Vixie" <paul at redbarn.org> wrote:
>
>> ... have you read
>> <http://datatracker.ietf.org/doc/draft-vixie-dnsext-resimprove/> and do
>> you have comments?
>
> Speaking for myself ... - I read it back in time. I distinctly remember ... that I didn't support its ideas.
>
> I get the idea of using the cutpoint TTL to determine when to refresh apex data. I get that if a zone is hijacked, this would lower recovery time.

not just hijacked. see also "oops".

> ...
> I more cling to the sanctity of a delegation. When a parent delegates to a child, the child is the master over its contents, not the parent. Outside of permitting the delegation (holding an NS set) and indicating its security entry parameters, all else is property of the child.

so, the NS set is ok to mess with, the DS set is ok to mess with, but
the NS TTL is not? i cannot imagine how you differentiate the
parent-vs-child "mastery" in these three cases.

> ...
> A resolver does have free will to do what it needs to find answers to queries. It could choose to over amp queries if it wants up to date information at all times (whether for rainy days or just because). Most do judging from work I've done in the past. (E.g., very, very few queriers honored a multi-day TTL I had for a particular record set.) (Had I known this, it would have made a difference in how I set TTL values.)

this is an unrelated topic, but it does make me want some of our
research brethren to look at the query logs on various busy authority
servers (root, tld, uni, ent, isp) and characterize the treatment of DNS
TTL by isolating same-question same-questioner "flows". such a study
would have to ignore the repeated-query flows from recursives who are
behind firewalls and therefore don't hear answers to their questions.
(that's "many.")
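A minimal sketch of the flow analysis suggested above, assuming a hypothetical log format (the QueryRecord fields below) and a hypothetical 5-second RETRY_WINDOW for spotting queriers that keep re-asking because they never hear the answer. It groups queries by (client, qname, qtype), takes the median gap between repeats, and divides it by the TTL that was served: a ratio near 1 suggests the TTL is being honored, a ratio far below 1 suggests over-querying.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class QueryRecord:
    """One entry from a hypothetical authority-server query log."""
    timestamp: float  # seconds since epoch
    client: str       # querier's source address
    qname: str
    qtype: str
    ttl: int          # TTL of the RRset the server answered with


# Hypothetical threshold: repeats this fast look like retries from a
# querier that never heard the answer (e.g. one behind a firewall).
RETRY_WINDOW = 5.0


def characterize_flows(records):
    """Group records into same-question, same-questioner flows and return,
    per flow, the median re-query interval as a fraction of the TTL."""
    flows = defaultdict(list)
    for r in records:
        flows[(r.client, r.qname.lower(), r.qtype)].append(r)

    result = {}
    for key, recs in flows.items():
        recs.sort(key=lambda r: r.timestamp)
        gaps = [b.timestamp - a.timestamp for a, b in zip(recs, recs[1:])]
        if not gaps:
            continue  # a single query says nothing about TTL handling
        typical_gap = median(gaps)
        if typical_gap < RETRY_WINDOW:
            continue  # looks like a retry loop from a deaf querier; ignore
        ttl = recs[-1].ttl
        if ttl <= 0:
            continue  # zero TTL: nothing to compare against
        result[key] = typical_gap / ttl
    return result
```

Flows that are discarded as retry loops are exactly the "many" firewalled recursives mentioned above; the threshold that separates them from ordinary re-queries is a judgment call, not a known constant.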

> But I think that it is unwise to optimize for "rainy days" when "sunny days" are more frequent.
> ...
> ... unless you can show me statistics that say hijacked zones matter enough, I'm not going to optimize that direction.

let's be clear. i'm suggesting that if a registrant could cause a
near-global cache flush of a zone by merely visiting their registrar
interface and adding or deleting an NS RR (perhaps with a plan of
deleting it or adding it back a day later), this would be a great way to
solve the "oops" problems we see. for example:

> -------- Original Message --------
> Subject: 	[dns-operations] resolver ops: please refresh gov.on.ca
> Date: 	Fri, 12 Dec 2014 21:49:24 -0500
> From: 	Mark E. Jeftovic <markjr at easydns.com>
> Organization: 	easyDNS Technologies Inc.
> To: 	dns-operations at lists.dns-oarc.net <dns-operations at dns-oarc.net>
>
>
>
> All resolver nameserver operators, if you could refresh your caches for gov.on.ca
>
> There has been an incident where the government of ontario nameservers were briefly hijacked
>
> We will post details to follow
>
> in the meantime, if you can refresh your caches, the proper records should be: [...]
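The convention proposed above (a registrant adds or deletes an NS RR, and resolvers that re-ask the parent when the delegation TTL runs out notice the change and purge the zone) can be sketched as a toy resolver cache. This is one reading of the thread, not the draft's specification: the class and method names are hypothetical, fetch_delegation stands in for a real query to the parent's servers, and per-record TTLs and DNSSEC are ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneCacheEntry:
    ns_set: frozenset            # delegation NS RRset as last seen at the parent
    delegation_expires: float    # time at which the parent-side NS TTL runs out
    records: dict = field(default_factory=dict)  # (qname, qtype) -> rdata


class DelegationAwareCache:
    """Hypothetical resolver cache illustrating delegation-TTL-based purge:
    once a zone's delegation TTL has expired, the parent is asked again, and
    if the NS set changed, everything cached for the zone is dropped.
    Per-record TTLs and DNSSEC are deliberately ignored."""

    def __init__(self, fetch_delegation):
        # fetch_delegation(zone) -> (iterable of NS names, TTL in seconds);
        # a stand-in for a real query to the parent zone's servers.
        self.fetch_delegation = fetch_delegation
        self.zones = {}

    def _revalidate(self, zone, now):
        entry = self.zones.get(zone)
        if entry is not None and now < entry.delegation_expires:
            return entry  # delegation still fresh; trust the cache
        ns_names, ttl = self.fetch_delegation(zone)
        ns_set = frozenset(ns_names)
        if entry is None or entry.ns_set != ns_set:
            # First sighting, or the registrant added/removed an NS RR:
            # start the zone over with an empty cache.
            entry = ZoneCacheEntry(ns_set, now + ttl)
            self.zones[zone] = entry
        else:
            # Same NS set: keep cached data, just extend the delegation.
            entry.delegation_expires = now + ttl
        return entry

    def put(self, zone, qname, qtype, rdata, now):
        self._revalidate(zone, now).records[(qname, qtype)] = rdata

    def get(self, zone, qname, qtype, now):
        return self._revalidate(zone, now).records.get((qname, qtype))
```

In this sketch, a resolver following the convention drops its cached data for the zone at most one delegation TTL after the parent starts serving the changed NS set; with a two-hour delegation TTL, that bounds how long a hijacked or "oops" answer can linger in such a cache.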

your question about cost:benefit is interesting. if we assume that
verisign changed its TTL for delegations under COM from two weeks to two
hours, or if the EPP system would allow registrars (but *not*
registrants) to set it and this became a common CYA value (Cover Your
A$$) in case of crime or "oops", then we could expect one new query per
actively-caching full resolver per two-hour period. if 50M of COM's
subdomains are accessed by 10M full resolvers within any given two-hour
period, that's 30M queries per hour added to COM server load which is
~8K queries per second. i think the COM servers (counting all anycast
servers of all COM name server names) receive O(10^6) queries per second
today, most of which are junk (negative responses, or responses that
can't be delivered to the initiator due to initiator-side firewall
problems).
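The unit conversion in the estimate above can be checked directly. The two inputs are the paragraph's own round figures (30M added queries per hour, and O(10^6) qps taken literally as 10^6); nothing here is measured.

```python
# Round figures from the estimate above, not measurements.
ADDED_QUERIES_PER_HOUR = 30e6   # extra COM load from two-hour delegation TTLs
COM_BASELINE_QPS = 1e6          # "O(10^6) queries per second", taken literally


def added_qps(queries_per_hour):
    """Convert an hourly query count to queries per second."""
    return queries_per_hour / 3600


extra = added_qps(ADDED_QUERIES_PER_HOUR)
share = extra / COM_BASELINE_QPS
print(f"added load: {extra:,.0f} qps, {share:.2%} of the assumed baseline")
# prints: added load: 8,333 qps, 0.83% of the assumed baseline
```

Under these round numbers the added load is under one percent of the stated baseline, which is the sense in which it is called noise-level below.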

if we rerun this estimate without O(10^7) participating servers, either
because many of them don't cache more than a handful of COM names
during any given two-hour period (so, DSL and cable modems), or because
most of them will never be upgraded to follow the convention i am
suggesting regarding delegation-TTL-based purge, then the actual number
of queries drops from ~8K/sec (which is de minimis) to something ~0.
likewise if the two-hour working set of COM names is less than
O(5x10^7), then the cost of wide adoption of the proposed convention is
~0.
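The scaling argument in the paragraph above can be put as a crude model. The linear form is an assumption made here, not the author's: only resolvers that adopt the purge convention make the extra queries, and the load shrinks in proportion to how far the two-hour working set falls below the O(5x10^7) names used in the estimate. The ~8K qps baseline is the figure from the cost estimate that precedes this paragraph.

```python
# Baseline from the cost estimate: 30M added queries/hour, i.e. ~8K qps.
BASELINE_ADDED_QPS = 30e6 / 3600
# Working set of COM names assumed by that estimate, O(5x10^7).
BASELINE_WORKING_SET = 5e7


def scaled_added_qps(upgraded_fraction, working_set=BASELINE_WORKING_SET):
    """Crude linear model (an assumption): extra load comes only from
    resolvers that follow the delegation-TTL purge convention, and scales
    linearly with the two-hour working set of cached COM names."""
    if not 0.0 <= upgraded_fraction <= 1.0:
        raise ValueError("upgraded_fraction must be between 0 and 1")
    working_set_factor = working_set / BASELINE_WORKING_SET
    return BASELINE_ADDED_QPS * upgraded_fraction * working_set_factor


for fraction in (1.0, 0.1, 0.01):
    print(f"{fraction:6.0%} of resolvers upgraded: "
          f"~{scaled_added_qps(fraction):,.0f} added qps")
```

Real adoption would not be uniform across busy and idle resolvers, so this only illustrates the direction of the argument (partial adoption and small caches push the cost toward zero), not a forecast.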

in summary, this proposed convention does not optimize for rainy days.
its costs are noise-level, especially compared to the junk queries all
over the DNS today. its benefits will be rare, but important when seen.

-- 
Paul Vixie