[dns-operations] resolvers considered harmful

Thu Oct 23 20:22:24 UTC 2014

On Thu, Oct 23, 2014 at 03:29:41PM -0400, Mark Allman wrote:
> 
>   - The TLDs are a little weird in that they are trying to control for
>     their load and yet serving someone else's names.  

This characterization of things makes me a little uneasy at the
possible mismatch between your model of how the DNS works and mine.

I think that a delegating name is not "serving someone else's names".
I think a name server offering an in-bailiwick referral is in fact
doing its job, and is managing part of its namespace by telling the
query source about how one kind of distributed management (the
distribution of authority) is happening in the DNS.  Therefore, .com
is not "serving someone else's names" when responding to a query for example.com,
but in fact serving its _own_ name (com, and everything underneath
it), and then telling the query source, "I delegated that part of the
com namespace away."
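
A toy sketch of that view, in Python, may make it more concrete.  It
is not how any real name server is implemented; the DELEGATIONS table,
the NS names in it, and the respond() function are all made up for
illustration.  The point is only that the com server answers from its
own data, and what that data says is "this part is delegated":

```python
# Hypothetical data for a toy "com" server: the zone cuts it knows about.
DELEGATIONS = {
    "example.com.": ["ns1.example.com.", "ns2.example.com."],
}

def respond(qname):
    """Return (kind, data) for a query arriving at the toy com server."""
    name = qname.lower().rstrip(".") + "."
    if name != "com." and not name.endswith(".com."):
        return ("refused", [])   # not part of the com namespace at all
    if name == "com.":
        return ("answer", [])    # the apex itself, answered authoritatively
    labels = name.rstrip(".").split(".")
    # Walk from the full name toward the apex looking for a zone cut.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:]) + "."
        if candidate in DELEGATIONS:
            # Still com's own data: it reports where authority was delegated.
            return ("referral", DELEGATIONS[candidate])
    return ("nxdomain", [])      # a com name with no delegation

print(respond("www.example.com."))
print(respond("nosuch.com."))
```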

I wouldn't care about being picky here, except that a number of
remarks in this thread have suggested to me that you have something
like a client-server model of the DNS in your head -- not dissimilar to
http -- and that's IMO a bad model to have.

This is easier to see in the event that you think about multiple zones
being served by the same authoritative name server with CNAME or DNAME
links between them.

>     they want the flexibility and don't pay the serving price.
>     Meanwhile, the TLDs don't directly care about the flexibility and so
>     they optimize for load shedding.  So, um, yeah ....

So, um, no.  In many resolvers, the NS set from the child is the one
that ends up in the cache, so if the child opts for a short TTL and
the parent doesn't, the parent pays the cost anyway.  For very popular
names like, say, netflix.com in a large ISP like, say, Comcast, the
differences might be quite significant for the TLD -- maybe a few
dozen queries to the .com zone as opposed to maybe hundreds or even
10,000 in the same short period.
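
To put rough numbers on the TTL point, here is a minimal sketch for a
single resolver.  It assumes the name is popular enough that every
expiry of the cached NS set is followed at once by a new fetch from
the parent.  The function name and both TTL values are hypothetical,
picked only to show the shape of the difference:

```python
import math

def parent_queries(window_s, ns_ttl_s):
    """Approximate referral fetches one resolver sends to the parent.

    Assumes the name is queried so often that every expiry of the
    cached NS set is followed immediately by a new fetch.
    """
    return math.ceil(window_s / ns_ttl_s)

DAY = 86400
# Hypothetical TTLs: two days on the cached NS set versus five minutes.
long_ttl = parent_queries(DAY, 172800)
short_ttl = parent_queries(DAY, 300)
print(long_ttl, short_ttl)
```

Multiply by the number of resolvers that behave this way and the
parent's share of the cost becomes visible.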

I don't _know_ which of these is the case, but your analysis actually
doesn't tell us anything about it, either, because the query rates you
get when a user population of 100 behind one resolver gives up that
resolver are going to be different from the rates you get with a user
population of 1,000 or 10,000 or 100,000.  Without knowing which of
these is relevant, it's going to be awfully hard to extrapolate from
your results.
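
A back-of-the-envelope model shows why the population size matters.
This is only a sketch under strong assumptions that are not in the
data: queries for one name arrive as a Poisson process, every user
asks at the same rate, and a cache holds the record for exactly one
TTL.  Under those assumptions a cache seeing arrival rate r sends
r / (1 + r * TTL) queries upstream, since each miss is followed by one
TTL of hits plus an average wait of 1/r for the next query.  The
function names and the example rate and TTL are hypothetical:

```python
def miss_rate(arrival_rate, ttl):
    """Upstream queries per second from one TTL cache (Poisson arrivals)."""
    return arrival_rate / (1.0 + arrival_rate * ttl)

def load_ratio(users, per_user_rate, ttl):
    """Upstream load without the shared resolver, relative to load with it."""
    shared = miss_rate(users * per_user_rate, ttl)    # one cache for everyone
    unshared = users * miss_rate(per_user_rate, ttl)  # one cache per user
    return unshared / shared

# Hypothetical workload: each user asks once per 1000 s, TTL is 300 s.
for users in (100, 1000, 10000, 100000):
    print(users, round(load_ratio(users, 0.001, 300), 1))
```

In this model the increase in upstream load from dropping the shared
resolver grows with the number of users who were behind it, which is
why the population size has to be known before extrapolating.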

This is a methodological problem you have as a result of how your
first dataset is built.  It's therefore something that needs further
study.

Best regards,

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com