[dns-operations] Perils of Transitive Trust, followup.

Thu May 4 11:02:02 UTC 2006

Hi everyone,

Our survey of DNS vulnerabilities was recently discussed on this
mailing list. Having read through the comments, we noticed confusion
about some of the findings, and a dangerous myth that glue records
can patch over problems inherent in the DNS delegation graph.

The goal of this message is to pop the "security-through-glue" myth,
and clarify some of the other issues that came up during the recent
discussion.

>How can 30% of names be vulnerable if only 17% of nameservers are 
>running old software with known exploits?

The percentages depend on the role the nameservers play in name
delegations. Suppose there are only two nameservers and 100 domain
names on earth. Server A serves ten names, and server B serves ninety
names.  If B is running an old version of BIND with a known exploit,
50% of nameservers have known exploits, yet 90% of names are
vulnerable.
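
The arithmetic in this toy example can be checked with a few lines of
Python. This is only a sketch of the two-server world described above;
the dictionary layout and function name are ours, and it assumes each
name is served by exactly one server, which real DNS does not.

```python
# Toy world from the example above: two servers, 100 names.
names_served = {"A": 10, "B": 90}
vulnerable_servers = {"B"}  # B runs an old BIND with a known exploit

def vulnerable_shares(names_served, vulnerable_servers):
    """Return (fraction of servers vulnerable, fraction of names vulnerable)."""
    total_names = sum(names_served.values())
    server_share = len(vulnerable_servers) / len(names_served)
    name_share = sum(names_served[s] for s in vulnerable_servers) / total_names
    return server_share, name_share

server_share, name_share = vulnerable_shares(names_served, vulnerable_servers)
print(f"{server_share:.0%} of servers, {name_share:.0%} of names")
# prints "50% of servers, 90% of names"
```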

>Won't DNSSEC deployment fix the security problems in DNS?

No. DNSSEC is better than nothing, but it's not a complete fix. In our
SIGCOMM'04 and IMC'05 papers, we identified two intertwined but
separate problems about the shape of the DNS dependency graph. One
problem is that there are many nameservers with known exploits which
lead to scripted attacks against nameservers, while another problem is
that the name delegation graph has bottlenecks which lead to
denial-of-service attacks. Clever attackers can combine these two
types of attacks to extend their reach. DNSSEC addresses the former
problem, but it does nothing to protect against denial-of-service
attacks against DNS servers.

>Don't glue records keep client resolvers from having to explore large
>portions of the name delegation graph, and therefore avoid the
>problems you mention?

The short answer is no: glue fundamentally does not eliminate the
dependencies in delegation graphs, and will not by itself compensate
for vulnerabilities.

Let's introduce an example: Suppose a client C is looking up SITE at a
nameserver NS0, SITE is in a domain served by NS1, NS1 is at IP
address IP1, and the name NS1 is in a domain served by NS2. NS0 may
provide a response that says that NS1 is authoritative for SITE, and
glue that says that NS1 is at IP1. If client C trusts the glue record,
it no longer needs to consult NS2. To a naive observer, it may seem
that glue avoids the problems of transitive trust.
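
To make the dependency in this example concrete, here is a minimal
sketch that computes every nameserver the resolution of SITE can end
up depending on. The served_by table and the dependencies() helper are
hypothetical, built only from the names in the example; NS0 is left
out because it is the server being queried, and we assume the chain
ends at NS2.

```python
# Hypothetical delegation data for the example above.
# served_by maps a name to the nameservers serving its domain.
served_by = {
    "SITE": ["NS1"],  # SITE is in a domain served by NS1
    "NS1": ["NS2"],   # the name NS1 is in a domain served by NS2
    "NS2": [],        # assumption: the chain ends here for illustration
}

def dependencies(name, served_by):
    """Return every nameserver that resolving `name` can depend on."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for ns in served_by.get(current, []):
            if ns not in seen:
                seen.add(ns)
                stack.append(ns)  # the server's own name must resolve too
    return seen

print(sorted(dependencies("SITE", served_by)))  # prints ['NS1', 'NS2']
```

NS2 shows up in the result even though SITE is never served by it.
That edge in the graph exists whether or not a particular lookup
happens to traverse it.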

Incidentally, the client should be wary of trusting glue records
unconditionally, as they are non-authoritative. A well-known cache
poisoning attack works by tricking clients into believing glue records
for all time and for all queries. Glue should be trusted only for the
lookup in question and only for the duration of that lookup. We'll
assume that the clients treat glue properly (even though many do not).
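
As a rough sketch of what treating glue properly could look like, the
function below uses glue to pick the next server to query but never
writes it into the long-lived cache. The function name, the
(ns_name, glue) referral shape, and the address 192.0.2.1 (a
documentation address) are all assumptions for illustration, not the
behavior of any particular resolver.

```python
def resolve_next_hop(referral, authoritative_cache):
    """Pick the address to query next, using glue for this lookup only.

    `referral` is a hypothetical (ns_name, glue) pair as NS0 might
    return it; `authoritative_cache` holds only answers that were
    obtained authoritatively.
    """
    ns_name, glue = referral
    if ns_name in authoritative_cache:
        return authoritative_cache[ns_name]
    # Use the glue, but do not store it: it is non-authoritative and
    # must not influence later, unrelated lookups.
    return glue.get(ns_name)

cache = {}
addr = resolve_next_hop(("NS1", {"NS1": "192.0.2.1"}), cache)
print(addr, cache)  # prints "192.0.2.1 {}"
```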

For glue to serve any function at all, it first has to be present.

>Why might the glue not be present?

For glue to be present, nameservers must be configured to serve it
in the first place. Glue may be missing for various reasons:

* Some nameservers are configured explicitly to not return out of zone
  (aka out of bailiwick) glue. If NS1 is out of zone for NS0 (e.g. NS0 
  serves FOO.COM and NS1 is not in *.FOO.COM), NS0 might be configured
  to never provide any glue. This forces C to follow delegation
  chains.
* Some nameservers, such as newer versions of BIND by default, are
  configured to discard out of zone glue records during zone
  transfers. To provide out of zone glue, such nameservers have to 
  follow delegations themselves on the server side.
* If glue fetching is off and the parent nameserver has no zone
  transfer agreement with the authoritative nameserver (e.g. parent 
  serves *.SE, a name NAME.SE is delegated to NS.HU), then the
  parent nameserver will either provide no glue, forcing clients to
  perform transitive lookups, or chase glue, performing transitive
  lookups itself. Either way, the delegation chains need to be followed.

>Suppose glue _is_ present, doesn't that fix everything?

A client resolver that trusts glue records can, indeed, avoid
independently exploring the portion of the name delegation graph that
lies behind NS1. This spares the _client_ from having to consult
the large dependency chains that are the root cause of the problem.
But now the _server_ may have to follow the dependency chains.

The information in the glue record has to come from somewhere. It is
true that the client does not have to discover it independently,
thanks to the glue. But the glue provider, NS0, had to acquire the
binding somehow, initially as well as periodically as it expires.

There are only three ways in which a nameserver can acquire records
that serve as glue:

* Glue is baked in: The IP address binding for NS1 can be configured
  statically into NS0. This is a bad idea that should not be used in
  practice, and surveys indicate that it is indeed rare.

* Glue is acquired via DNS: In essence, instead of having the client
  explore the dangerous part of the delegation graph, the nameserver
  (NS0) explores the same part of the graph when glue expires. This is
  called "glue chasing" and is the default behavior in BIND 4.X and
  BIND 8, which accounted for 22% of the nameservers during our study.

  Glue chasing nameservers are completely vulnerable to problems of
  transitive trust. In fact, attacks are even easier to launch against
  glue chasing nameservers, as the clever attacker need only launch her
  attack when glue is about to expire. She can time her attack, DoS any
  non-compromisable nameservers that serve NS1's name, force NS0 to
  have to inquire about NS1's binding from a nameserver she has
  compromised, and thus extend her reach to all clients that consult
  NS0 for names served by NS1, without breaking into NS1.

* Glue is acquired via a zone transfer: NS0 might have set up zone
  transfers with NS2, and thus can learn the IP address for NS1 via
  the zone transfer. Zone transfers may be plain or cryptographically
  signed, and the cryptographic keys may be kept online on NS2 or
  offline.

  An attacker who has compromised NS2 can poison NS0 unless the glue
  is transferred via a zone transfer, unless the zone transfer is
  signed, and unless the signing keys are kept offline. That's three
  separate conditions, and the first one is difficult to fulfill as
  the two servers might not have a master-slave relationship. Further,
  if the keys are kept online, for instance, an attacker that has
  broken into NS2 can simply modify the database to change the IP
  address for NS1, sign it and pass it onto NS0, which will serve
  poisoned glue and allow the attacker to hijack SITE, without having
  to break into NS1.
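
The three conditions for the zone transfer case can be written down as
a simple predicate. This is a sketch only: the GlueSource fields and
the function name are ours, and the enumeration treats the three flags
as independent for illustration, even though a signed transfer
presupposes a transfer.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class GlueSource:
    via_zone_transfer: bool  # glue learned from NS2 by zone transfer
    transfer_signed: bool    # the transfer is cryptographically signed
    keys_offline: bool       # signing keys are not stored on NS2

def resists_compromised_ns2(src: GlueSource) -> bool:
    """True only if all three conditions from the text hold together."""
    return src.via_zone_transfer and src.transfer_signed and src.keys_offline

configs = [GlueSource(*flags) for flags in product([False, True], repeat=3)]
safe = [c for c in configs if resists_compromised_ns2(c)]
print(f"{len(safe)} of {len(configs)} configurations resist a compromised NS2")
# prints "1 of 8 configurations resist a compromised NS2"
```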

So, no, glue, even when present, is not a panacea.

>You seem to imply that nameservers in the .EDU domain which play a
>large role in name delegation graphs are dangerous. I know the folks
>who operate the servers at X.EDU and they do a terrific job!

Being at an educational domain ourselves, we also know some of the
same folks and realize first-hand how hard they work under competing
time pressures. The issue is not that educational nameservers are more
vulnerable. It's that such nameservers should not play a large role in
the resolution of unaffiliated names. An educational institution (say,
NYU) has no fiduciary responsibility to people who own DNS names (say,
in Ukraine), yet may well be in a position to control large sections
of the same namespace (NYU appears in the dependency graph of all names
in the Ukrainian namespace). This creates two problems: educational
nameservers become more prominent targets because they play a large
role in DNS dependency graphs, and they pose a legal liability for the
university should a nameserver get compromised.

>Your survey examined BIND version numbers as reported by the
>nameservers. The nameservers might be reporting incorrect version
>numbers.

True, we did not break into the nameservers to verify the presence of
exploits, as that is illegal. The default behavior for BIND is to
truthfully report version numbers if version reporting is enabled.
While it is possible for a production nameserver to pretend to have a
flaw when, in fact, it does not, this requires extra effort and makes
little sense. Perhaps some of the nameservers reporting old version
numbers are honeypots; chances are small that honeypots account for a
significant fraction of the ~27000 vulnerable nameservers our survey
uncovered.

Further, we consider name servers not returning version numbers or
running non-BIND software to be non-vulnerable. In practice, these
name servers could also have easy-to-exploit vulnerabilities, making
our survey results less alarming than reality.

>We, an inside group of DNS system administrators, knew about these
>problems already.

It isn't sufficient for some people to be aware of potential problems
with transitive trust in DNS. The architecture of DNS implies that the
namesystem will not be secure until all administrators are aware of,
and take active steps to avoid, problems stemming from transitive 
trust.

>I found a bug in CoDoNS...

Thanks! Any new system will have its share of bugs (and BIND admins
should not be too unfamiliar with bugs). Let us know and we'll fix it!

>CoDoNS seems to be caching records and polling on expiry. Won't this
>wreak havoc for records with short TTLs?

CoDoNS does not cache resource records with TTLs equal to or shorter 
than 30 seconds. 
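
Expressed as code, the rule is a single comparison. This is an
illustrative sketch, not CoDoNS source; the constant and function
names are made up here.

```python
MIN_CACHEABLE_TTL_SECONDS = 30  # threshold stated above

def should_cache(ttl_seconds):
    """Cache only records whose TTL is strictly longer than 30 seconds."""
    return ttl_seconds > MIN_CACHEABLE_TTL_SECONDS

print(should_cache(30), should_cache(31))  # prints "False True"
```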

>Once a name is cached in CoDoNS, will my server receive queries
>from all the caching hosts upon expiration?

Only the home node in CoDoNS will refetch the record upon expiration,
no matter how widely the name might be replicated within CoDoNS. The 
structure of the CoDoNS ring enables the home node to quickly
disseminate updates to all caches when updates are detected.

>Your CoDoNS system proposes to use a peer-to-peer distributed hash
>table (DHT) to serve DNS. Why would a DHT make sense for serving DNS?

DNS is already a large distributed hash table, albeit with poor
failure resilience, slow lookup performance, and no support for
unplanned record updates. Maintaining a secure namespace requires
substantial manual effort. Administering DNS is not only difficult and
expensive, but manual administration can lead to inconsistencies and
errors. These are not surprising, given that DNS was designed over 25
years ago when we did not know much about building failure-resilient,
high performance, self-organizing distributed systems. We now know how
to do better, and the time is ripe for rethinking the architecture of
the naming system.

--

Looking forward to a more secure name system,
Gun & Rama.