[dns-operations] Perils of Transitive Trust, followup.

John Kristoff jtk at ultradns.net
Fri May 5 03:09:49 UTC 2006


On Thu, 04 May 2006 07:02:02 -0400
Emin Gun Sirer <egs at cs.cornell.edu> wrote:

> Our survey of DNS vulnerabilities was recently discussed on this
> mailing list. Having read through the comments, we noticed some
> confusion about some of the findings, and a dangerous myth that glue
> records can fix/patch over problems inherent in the DNS delegation 
> graph.

The start of the discussion here left out a number of details for
those of us not in the know, so over the past few days I have been
reviewing as much of the CoDoNS and DHT-based naming system material
as I could.  Your responses thus far seem to have left some of the
concerns I've seen voiced unanswered, so I'd like to take this
opportunity to ask some follow-on summary questions and to have you
address a few observations of my own.

Note that I don't claim to have done a rigorous study, and there are
gaps in my DNS knowledge, so forgive me for what I hope are only
small lapses of confusion.

CoDoNS has been implemented on a subset of PlanetLab hosts, according
to this page:

  <http://beehive.cs.cornell.edu/~ramasv/codonsstatus.html>

currently nine sites at research-oriented institutions, likely well
connected via networks such as Abilene and Geant, and the number of
queries per _hour_ is lower than what many DNS servers see in a
second.  Similarly, the evaluation as described in the paper used the
12-hour MIT lab data consisting of 281,943 queries, divided among 75
PlanetLab hosts.  This hardly seems to be an exhaustive test
environment, and probably not what many operators on this list would
call a "real workload".  Am I misinterpreting the evaluation summary?
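
To put rough numbers on that, using only the figures above:

  # Rough scale of the evaluation workload described in the paper.
  queries = 281_943        # 12-hour MIT lab trace
  seconds = 12 * 3600
  hosts = 75               # PlanetLab nodes in the experiment

  total_qps = queries / seconds
  print("%.1f queries/sec across the whole testbed" % total_qps)  # ~6.5
  print("%.2f queries/sec per host" % (total_qps / hosts))        # ~0.09

That is orders of magnitude below what a busy production nameserver
handles.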

The system described in "The Design and Implementation of a Next
Generation Name Service for the Internet" requires a central signing
authority, or perhaps "a small number of well-known public keys for
globally trusted authorities", for records; these in turn are used to
sign a set of selected namespace operators and presumably so on down
into the namespace.  These are not trivial requirements based on past
experience.  Even if CoDoNS is better than DNS in all other aspects,
can it be deployed in place of DNS without some sort of trusted
signing infrastructure?
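
To make the question concrete, here is a minimal sketch of the kind
of signing chain the paper implies.  This is my own illustration, not
CoDoNS's actual record or certificate format, with Ed25519 via the
Python 'cryptography' package standing in for whatever scheme they
really use:

  from cryptography.hazmat.primitives.asymmetric.ed25519 import (
      Ed25519PrivateKey, Ed25519PublicKey)
  from cryptography.exceptions import InvalidSignature

  root = Ed25519PrivateKey.generate()       # well-known, trusted key
  operator = Ed25519PrivateKey.generate()   # a namespace operator

  op_pub_bytes = operator.public_key().public_bytes_raw()
  op_cert = root.sign(op_pub_bytes)         # root endorses operator key

  record = b"www.example.com. 3600 IN A 192.0.2.1"
  record_sig = operator.sign(record)

  def verify(root_pub, op_pub_bytes, op_cert, record, record_sig):
      try:
          root_pub.verify(op_cert, op_pub_bytes)  # operator endorsed?
          op_pub = Ed25519PublicKey.from_public_bytes(op_pub_bytes)
          op_pub.verify(record_sig, record)       # record signed?
          return True
      except InvalidSignature:
          return False

  assert verify(root.public_key(), op_pub_bytes, op_cert,
                record, record_sig)

The hard part is not the mechanism but agreeing on who holds the root
key, which is exactly the deployment question.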

I was a little surprised to see you say things like:

  "In the presence of conflicting or inconsistent records, clients
  simply pick the records signed by an operator they trust, similar
  to the way they pick between separate sets of root servers today."

  "Since CoDoNS eliminates physical delegations and designated
  nameservers, it breaks the monopoly of namespace operators and
  creates a level playing field where namespace operators need to
  compete with each other on service."

I'd like to ask you to comment on RFC 2826 ("IAB Technical Comment on
the Unique DNS Root"): does the CoDoNS design support the IAB's
statements there or not?

The current architecture effectively has some built-in policy knobs,
at least at some levels of the hierarchy, that allow administrators
to choose who provides service for their zones and who they delegate
to.  While not perfect, particularly given the transitive trust
relationships you've detailed, this aspect of control would be lost
unless TTLs are kept under 30 seconds, which incurs its own
trade-offs: not only performance, but also responsibility (the
fiduciary responsibility you've invoked in another scenario) and the
ability to trace back and troubleshoot problems.  Who do you go to
when problems arise, potentially any or all of some X number of node
admins?

This may be covered in more detail in the Beehive literature (if so,
please say so; I haven't gotten there yet), but how does the system
deal with misbehaving or poorly performing peers?  I saw a brief
mention of the "Secure Routing for Structured Peer-to-Peer Overlay
Networks" paper, and the call for even more storage and link capacity
just to keep the latency requirements down.  This is one of the areas
that does not seem to be fully explored for real-world environments,
because it seems like the basic solution today is simply to replicate
as much as possible, if not every name you can get your hands on.

CoDoNS (Beehive) seems to do an aggressive amount of caching (largely
because of the above?), perhaps even caching as much of the namespace
as possible for its performance gain.  How does CoDoNS handle
distributed cache exhaustion attacks, such as random queries for
bogus names, resulting in negative caching, or random queries for
names that end up being associated with a wildcard?  Your analysis of
the flash-crowd effect seems to suggest that this could be a problem,
as CoDoNS performed noticeably worse in your analysis during a period
of high load for queries not in the cache at boot or for flash
crowds.
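
As an illustration, here is a toy simulation of my own (not from the
paper) of how a stream of unique bogus names crowds legitimate
entries out of a bounded, LRU-style cache:

  import random
  from collections import OrderedDict

  CACHE_SIZE = 10_000
  cache = OrderedDict()

  def lookup(name):
      if name in cache:
          cache.move_to_end(name)        # refresh recency
          return True                    # hit
      cache[name] = None                 # miss: cache it (even NXDOMAIN)
      if len(cache) > CACHE_SIZE:
          cache.popitem(last=False)      # evict least recently used
      return False

  popular = ["host%d.example.com" % i for i in range(1000)]
  for name in popular:
      lookup(name)                       # warm the cache

  for i in range(50_000):                # attacker's unique bogus names
      lookup("%030x.victim.example" % random.getrandbits(120))

  print(sum(lookup(n) for n in popular),
        "of 1000 popular names survived")

In a system whose performance argument rests on keeping nearly
everything cached, that kind of eviction pressure lands everywhere at
once.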

Coupled with the above, since each CoDoNS server has to perform all
of the certificate caching and verification for every unique record,
isn't this going to overwhelm many systems if we continue to see the
kinds of attacks we see today?
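
One way to put a ceiling on this is to measure raw verification
throughput.  A crude micro-benchmark, again with Ed25519 via the
Python 'cryptography' package standing in for whatever certificate
scheme CoDoNS actually uses:

  import time
  from cryptography.hazmat.primitives.asymmetric.ed25519 import \
      Ed25519PrivateKey

  key = Ed25519PrivateKey.generate()
  pub = key.public_key()
  msg = b"www.example.com. 3600 IN A 192.0.2.1"
  sig = key.sign(msg)

  N = 10_000
  start = time.perf_counter()
  for _ in range(N):
      pub.verify(sig, msg)       # raises InvalidSignature on failure
  elapsed = time.perf_counter() - start
  print("%.0f verifications/sec on this host" % (N / elapsed))

Whatever that number turns out to be on a given host, an attacker who
can generate unique names faster than it consumes the node's CPU
outright.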

The FAQ claims support for dynamic or source-based answer selection;
however, am I correct in understanding that all of these sorts of
records would need TTLs under 30 seconds to function properly?  Yet
it seems that widespread use of low TTLs in CoDoNS would be highly
undesirable and would offset many of its supposed advantages.
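
As I read the FAQ, the rule amounts to something like the sketch
below; the resolve_upstream() helper is a hypothetical stand-in for a
full lookup through the overlay:

  import time

  MIN_CACHEABLE_TTL = 30          # per the CoDoNS FAQ
  cache = {}                      # name -> (record, expiry time)

  def resolve_upstream(name):
      # hypothetical stand-in for a full lookup through the overlay
      return ("192.0.2.1", 30)

  def query(name):
      entry = cache.get(name)
      if entry and entry[1] > time.time():
          return entry[0]                      # served from cache
      record, ttl = resolve_upstream(name)
      if ttl > MIN_CACHEABLE_TTL:              # TTLs <= 30s never cached
          cache[name] = (record, time.time() + ttl)
      return record

Every dynamic record would then pay the full lookup cost on every
single query, which seems to cut against the latency argument.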

From one of your recent posts:

  >CoDoNS seems to be caching records and polling on expiry. Won't this
  >wreak havoc for records with short TTLs?

  CoDoNS does not cache resource records with TTLs equal to or shorter
  than 30 seconds.

This doesn't seem very convincing.  All it takes is additional
queries for multiple unique names with TTLs of 30 seconds to have a
similar effect.  Furthermore:

  >Once a name is cached in CoDoNS, will my server receive queries
  >from all the caching hosts upon expiration?

  Only the home node in CoDoNS will refetch the record upon expiration,
  no matter how widely the name might be replicated within CoDoNS. The 
  structure of the CoDoNS ring enables the home node to quickly
  disseminate updates to all caches when updates are detected.
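
If I understand the description, the division of labor is roughly as
in this sketch of my own; Pastry/Beehive actually select the home
node by prefix matching, which I approximate here with numeric
closeness, and the helper functions are hypothetical:

  import hashlib

  def home_node(name, node_ids):
      """Map a name to the node whose ID is closest to its hash."""
      key = int(hashlib.sha1(name.encode()).hexdigest(), 16)
      return min(node_ids, key=lambda n: abs(n - key))

  def refetch_from_legacy_dns(name):
      return "192.0.2.1"              # stand-in for the real refetch

  def push_update(peer, name, record):
      pass                            # stand-in for the overlay push

  def on_expiry(name, my_id, node_ids, replicas):
      if my_id != home_node(name, node_ids):
          return                      # replicas wait for pushed updates
      record = refetch_from_legacy_dns(name)
      for peer in replicas:
          push_update(peer, name, record)

The refetch itself is cheap, then, but any update still fans out to
every replica, which leads to my next question.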

If an attack launches a number of random queries, or even a sizeable
number of common queries, against the system, won't the replication
load spread throughout the entire system, possibly even making things
worse than if it had just been confined to a subset of paths in the
current DNS tree?

When names are used for malevolent purposes it is hard enough as it
is to mitigate the abuse.  If a name is completely controlled by its
owner, being able to move it quickly from home node to home node may
be nice for the majority of users, but it makes names used with ill
intent much harder for the mitigators to deal with.  Since there
isn't any hierarchy, who gets to override a "bad" name and shut it
down?  Does each node have to deal with it separately?

Finally, since every CoDoNS server provides recursive, caching
service and we still do not have BCP 38 widely deployed where it's
needed, shouldn't we be a little afraid of wide-scale CoDoNS
deployment?  :-)
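
The reflection arithmetic is the usual one (sizes below are
illustrative, not measured):

  query_bytes = 60       # small spoofed UDP query
  response_bytes = 512   # classic max UDP response, larger with EDNS0

  print("amplification: ~%.0fx" % (response_bytes / query_bytes))  # ~9x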

p.s. Yes, I'm the same jtk who sent you corrections to the FAQ the
other day from another email address.

John


