[dns-operations] summary of recent vulnerabilities in DNS security.

Vernon Schryver vjs at rhyolite.com
Thu Oct 24 00:34:55 UTC 2013


> From: Haya Shulman <haya.shulman at gmail.com>

> > I'm puzzled by the explanation of Socket Overloading in
> > https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
> > I understand it to say that Linux on a 3 GHz CPU receiving 25,000
> > packets/second (500 bytes @ 100 Mbit/sec) spends so much time in
> > interrupt code that low level packet buffers overflow.

> Just to clarify, the attacker ran (two to three sync-ed hosts, and the
> burst was split among those hosts).

No number of hosts sending packets can deliver more than 25,000 500
byte packets per second to a single 100 Mbit/sec 802.3 host.  In fact,
802.3 preamble, headers, CRC, and IFG limit the 500 byte packet rate
to below 25K pps.  However, multiple attacking hosts could cause
excessive link-layer contention (nothing to do with host or host
network interface "interrupts" or "buffers"), and so packet losses in
either or both directions for legitimate DNS traffic, and so the
reported effects.
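
For concreteness, the back-of-the-envelope arithmetic in Python (a
sketch that assumes the 500 bytes is a single frame's payload; counting
the Ethernet header and CRC inside the 500 bytes raises the limit only
to about 24K pps):

    # Back-of-the-envelope maximum rate of 500 byte packets on 100BASE-TX,
    # assuming (hypothetically) that the 500 bytes is one frame's payload.
    LINK_BPS = 100 * 10**6          # 802.3 100 Mbit/sec line rate
    PAYLOAD  = 500                  # bytes carried per frame (assumed)
    OVERHEAD = 14 + 4 + 8 + 12      # Ethernet header, CRC, preamble+SFD, IFG
    wire_bits = (PAYLOAD + OVERHEAD) * 8
    print(LINK_BPS // wire_bits)    # -> 23234 packets/second, below 25K pps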


> > Could the packet losses have been due to the system trying to send
> > lots of ICMP Port-Unreachables?

> But, why would ICMP errors cause loss?

Sending ICMP packets requires resources, including wire and hub
occupancy, CPU cycles, "interrupts", kernel lock contention, kernel
buffers, network hardware buffers, and so on and so forth.  Any or all
of that can increase losses among the target DNS requests and responses.
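
If anyone wants to measure that on the target, a minimal sketch
assuming a Linux host (the same counters appear in the output of
netstat -s):

    import time

    def icmp_counters(path="/proc/net/snmp"):
        # The pair of "Icmp:" lines gives field names and then values.
        with open(path) as f:
            rows = [line.split() for line in f if line.startswith("Icmp:")]
        return dict(zip(rows[0][1:], map(int, rows[1][1:])))

    before = icmp_counters()
    time.sleep(10)                  # sample again while the flood runs
    after = icmp_counters()
    print("ICMP dest-unreachables sent:",
          after["OutDestUnreachs"] - before["OutDestUnreachs"])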


> Inbound packets have higher priority over outbound packets.

I either do not understand that assertion or I disagree with it.  I
would also either not understand or disagree with the opposite claim.  At
some points in the paths between the wire and the application (or more
accurately, between the two applications on the two hosts), one could
say that input has higher or lower priority than output, but most of
the time the paths contend, mostly first-come-first-served for resources
including memory bandwidth, DMA engines, attention from the 802.3 state
machine, host/network firmware or hardware queues and locks, kernel
locks, application locks, and application thread scheduling.


> > How was it confirmed that kernel interrupt handling was the cause
> > of the packet losses instead of the application (DNS server) getting
> > swamped and forcing the kernel to drop packets instead of putting

> This a good question. So, this evaluation is based on the following
> observation: when flooding closed ports, or other ports (not the ones on
> which the resolver expects to receive the response) - no loss was incurred,
> but all connections experience an additional latency; alternately, when
> flooding the correct port - the response was lost, and the resolver would
> retransmit the request after a timeout.

Ok, so a ~100 Mbit/sec attack on non-DNSSEC DNS traffic succeeded
on a particular LAN.  Without more information, how can more be
said?  Without more data we should not talk about interrupts, I/O
priority, or even whether the attack would work on any other LAN.


> I used the default buffers in OS and resolver. So, you think that it could
> be that the loss was on the application layer?...

I avoid talking about "layers" above the link layer, because the phrases
are generally at best unclear and confusing.  At worst, the phrases
are smoke screens.  In this case, there is no need to talk about an
"application layer," because we are presumably talking about "two
application programs" that are BIND, NSD, and/or Unbound.  If BIND was
used, then I could (but would try not to) speculate about BIND's
threading and request/response handling and consequent request or
response dropping.

Without data such as packet counts from standard tools such as `netstat`,
my bet is what I said before, that the application fell behind, its socket
buffer overflowed, and the results were as seen.  However, I would not
bet too much, because there are many other places where the DNS requests
or responses could have been lost, including:
  - intentional rate limiting in the DNS server, perhaps even RRL
  - intentional rate limiting in the kernel such as iptables
  - intentional rate limiting in a bridge ("hub") in the path
  - unintentional link layer rate limiting due to contention for
     bridge buffers or wires.  At full speed from the attacking systems,
     unrelated cross traffic through hubs in the path or on the wires
     to the DNS server would cause packet losses, including losses of
     valid answers, and so timeouts, and so the observed effect.
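
One way to separate the first explanation (socket buffer overflow) from
the others is to watch the kernel's UDP drop counters during the flood;
a minimal sketch, again assuming a Linux host with a reasonably recent
kernel (netstat -su prints the same numbers):

    def udp_counters(path="/proc/net/snmp"):
        # The pair of "Udp:" lines gives field names and then values;
        # RcvbufErrors counts datagrams dropped because the receiving
        # socket's buffer was full.
        with open(path) as f:
            rows = [line.split() for line in f if line.startswith("Udp:")]
        return dict(zip(rows[0][1:], map(int, rows[1][1:])))

    c = udp_counters()
    print({k: c.get(k, 0) for k in ("InErrors", "RcvbufErrors", "NoPorts")})

If RcvbufErrors climbs while the responses time out, the resolver's
socket buffer overflowed; if it stays flat, the losses happened
somewhere else on the path.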


>                        One of the main factors of the attack is `burst
> concentration`. 

That suggests (but certainly does not prove) link layer contention
instead of my pet theory, application socket buffer overflow.  (By
contention I mean overloading of or contention for wires or hubs (or
routers?).)


A meta-question should be considered.  How much time and attention
should be given to yet another attack that apparently requires 100
Mbit/sec floods (I don't recall that this paper said how long the
attack flood must continue) and that works only when DNSSEC is not
used?  Many of us with access to the LAN where these tests were
done--or most LANs--could probably do more interesting things than
fuzz DNS caches.  (By "another" I'm referring to the mistaken reports
that RRL+SLIP=1 is bad because of non-DNSSEC cache corruption under
4-hour 100 Mbit/sec floods.)

Instead of looking for yet more obscure ways (e.g. 100 Mbit/sec floods
on LANs) in which non-DNSSEC DNS is insecure, why not enable DNSSEC
and declare victory?


Vernon Schryver    vjs at rhyolite.com
