[dns-operations] resimprove and Re: DNS Flush Protocol

Edward Lewis edward.lewis at icann.org
Mon Mar 30 14:20:12 UTC 2015


On 3/27/15, 20:09, "Paul Vixie" <paul at redbarn.org> wrote:
>Edward Lewis wrote:
>> On 3/27/15, 16:00, "Paul Vixie" <paul at redbarn.org> wrote:
>>
>>not just hijacked. see also "oops".

My response began with objecting to the notion that we should ignore
measurements of how the Internet was working.

Orthogonally, one can design to optimize for rainy days or sunny days.
When a system is running well, sunny days are more common and arguably
tradeoffs in design should optimize for this.

The proposed recommendation that a cache refresh delegation information
based on the smaller of the parent's TTL and the child's TTL optimizes for
the rainy-day situations of "oops" and "hijacks."  From inside the
plumbing of the protocol, those two use cases are identical; the
difference lies above the protocol - whether the actor is authorized, and
when the authorized agent discovers the incident.  "Oops" may be reported
mostly by owner-operators, "hijacks" by them or by their DNS hosting
providers.
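To make the trade-off concrete, here is a minimal sketch (Python; the
names are mine, for illustration only) of the refresh rule as I read it -
the cache schedules a delegation re-query at the smaller of the two TTLs
rather than honoring the child's alone:

# Sketch of the proposed delegation-refresh rule (hypothetical names).
def delegation_refresh_seconds(parent_ns_ttl: int, child_ns_ttl: int) -> int:
    """Seconds until the cache re-queries delegation data under the proposal."""
    return min(parent_ns_ttl, child_ns_ttl)

# Parent publishes NS with a 2-hour TTL, child with 2 days: today most
# caches would keep the (authoritative) child set for 2 days; under the
# proposal they would go back to the parent after 2 hours.
assert delegation_refresh_seconds(7200, 172800) == 7200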

Measurements of how often "oops" or "hijacks" occur would be nice to have
before committing to optimizing for them.  While a particular incident of
either will garner more attention than a proper change to delegation data
does, that attention isn't representative of the relative incidence.

I conjecture that "oops" and "hijacks" are less common than proper
changes.  (I don't have a measurement, nor can I imagine a reasonable way
to obtain one.)  Because of this, I would not use the parent's TTL (if
lower) as the lower bound on refreshing delegation data.  If I am wrong,
and "oops" and "hijacks" are more common than I believe, then I think the
problem is much larger.

If the protocol is subject to considerable operator error or falsified
updates, IMHO, attention is needed to the design of the protocol, the
tools available, and operating security procedures.  As for the latter,
I'm aware much attention has gone there - harking back to the
"SQL-injection" attacks that resulted in a rash of hijacks not too many
years ago.

As for "oops" - start by making the tools better, and then note that
there are now more organizations specializing in DNS management that
operators can turn to.  Instead of penalizing all operators because the
protocol has rough and sharp edges, take steps to limit the rainy days.

This is assuming that the rainy days are much less common than the sunny
days.

>so, the NS set is ok to mess with, the DS set is ok to mess with, but
>the NS TTL is not? i cannot imagine how you differentiate the
>parent-vs-child "mastery" in these three cases.

I lost you at "mess with" - I'm not sure what you are saying.

>this is an unrelated topic, but it does make me want some of our
>research brethren to look at the query logs on various busy authority
>servers (root, tld, uni, ent, isp) and characterize the treatment of DNS
>TTL by isolating same-question same-questioner "flows". such a study
>would have to ignore the repeated-query flows from recursives who are
>behind firewalls and therefore don't hear answers to their questions.
>(that's "many.")

I've done work in that field already, but not in a way I can disclose
particulars. I mention this only because it might explain why I have
different opinions.
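A study of the kind Paul describes could start from something like the
sketch below (Python; the log tuple format and the crude "deaf querier"
cutoff are assumptions of mine, standing in for the behind-firewall
filtering he mentions):

# Sketch: group an authority server's query log into same-question,
# same-questioner "flows" and compare re-query intervals to the TTL
# we served in the answer.
import statistics
from collections import defaultdict

def flows(log_records):
    """Group (timestamp, src_ip, qname, qtype) tuples into flows."""
    by_flow = defaultdict(list)
    for ts, src, qname, qtype in log_records:
        by_flow[(src, qname.lower(), qtype)].append(ts)
    return by_flow

def characterize(by_flow, answer_ttl):
    """Median re-query interval per flow, plus a crude deafness flag."""
    report = {}
    for key, times in by_flow.items():
        times.sort()
        gaps = [b - a for a, b in zip(times, times[1:])]
        if not gaps:
            continue  # a single query says nothing about TTL handling
        med = statistics.median(gaps)
        # Flows re-asking far below the TTL may be recursives behind
        # firewalls that never hear our answers - Paul's point is that
        # these must be excluded before drawing conclusions about TTLs.
        report[key] = (med, med < 0.1 * answer_ttl)
    return report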

More or less, queriers by and large ignore TTLs.  With at least Unbound
capping at 1 day and BIND (if it still does) at 1 week, longer TTL
settings are useless.  Among the top 10 queriers (in what I measured) are
many non-DNS cache servers too, which show no regard for the typical DNS
message exchange.  I don't mean things that look like broken or cobbled
DNS protocol elements; I mean things that don't even seem to be pretending
to follow the DNS protocol.
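Concretely, the published TTL is only a request; the effective cache
lifetime is clipped by the resolver's own ceiling.  A minimal sketch
using the defaults I referred to (Unbound's cache-max-ttl of one day,
BIND's max-cache-ttl of one week):

# Effective cache lifetime is the published TTL clipped by the
# resolver's ceiling, so TTLs above the cap buy nothing.
UNBOUND_CACHE_MAX_TTL = 86400    # 1 day  (unbound.conf: cache-max-ttl)
BIND_MAX_CACHE_TTL    = 604800   # 1 week (named.conf: max-cache-ttl)

def effective_ttl(published_ttl: int, resolver_cap: int) -> int:
    return min(published_ttl, resolver_cap)

# A 30-day TTL is cut to 1 day by a default Unbound cache:
assert effective_ttl(30 * 86400, UNBOUND_CACHE_MAX_TTL) == 86400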

>let's be clear. i'm suggesting that if a registrant could cause a
>near-global cache flush of a zone by merely visiting their registrar
>interface and adding or deleting an NS RR (perhaps with a plan of
>deleting it or adding it back a day later), this would be a great way to
>solve the "oops" problems we see. for example:

I've never assumed that a registrant would be the (only) one to trigger
the cache flush.  The example provided is from a DNS hosting provider.

It would be very, very bad if a registrant could cause a full-system flush
of data by merely changing data at a registrar.  As I mentioned above,
optimizing for rainy situations penalizes everyone.  There are means to
properly change NS sets.  Lowering TTLs, shifting traffic over time, etc.,
are established practices, if not universally understood.  For those
following the proper steps, there's no burden on others.
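For illustration, a sketch of that flush-free sequence (the timings are
illustrative, not prescriptive; the point is that old data ages out of
caches on its own):

# Sketch of a flush-free NS change: lower TTLs first, then move, then
# restore.  Timings are illustrative only.
def migration_plan(old_ns_ttl: int, low_ttl: int = 300):
    return [
        (0,                    f"lower NS TTL to {low_ttl}s at child and parent"),
        (old_ns_ttl,           "old TTL has expired everywhere; swap the NS set "
                               "(old and new servers both answering)"),
        (old_ns_ttl + low_ttl, "caches now hold only the new set; decommission "
                               "old servers, restore the normal TTL"),
    ]

for t, step in migration_plan(old_ns_ttl=172800):
    print(f"t+{t:>7}s: {step}")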

>> -------- Original Message --------
>> Subject: 	[dns-operations] resolver ops: please refresh gov.on.ca
>> Date: 	Fri, 12 Dec 2014 21:49:24 -0500
>>To: 	dns-operations at lists.dns-oarc.net <dns-operations at dns-oarc.net>

>in summary, this proposed convention does not optimize for rainy days.
>its costs are noise-level, especially compared to the junk queries all
>over the DNS today. its benefits will be rare, but important when seen.

It certainly doesn't optimize for sunny days, and given the stated goal of
recovering from rainy days, I think it does optimize for those.  The costs
include added complexity in the protocol and an altered trade-off analysis
in the design of DNS operational parameters.  (It's not just "more useless
packets.")

Benefits being rare - that speaks for itself.  And "important" for whom,
and how often?