[dns-operations] rate-limiting state
paul at redbarn.org
Sun Feb 23 00:57:14 UTC 2014
sorry for the delay in getting back to this thread. i know damian raised
some important points.
Damian Menscher wrote:
> On Thu, Feb 6, 2014 at 4:46 PM, Paul Vixie <paul at redbarn.org> wrote:
> Damian Menscher wrote:
> > ...
> > My recommendation (which Vixie and Vernon disagree with) is to
> use RRL
> > with slip=1 -- return TC=1 responses to all queries over the limit.
> my disagreement is explained in detail here:
> Since I haven't explained my objections before, I'll pick apart your
> 1) "RRL must be attenuative in packets per second, not just in bits
> per second". The attacker is using DNS amplification specifically to
> increase bits/second.
no. there are two classes of attackers: those who understand and
innovate, vs. those who just follow well trodden paths. the attackers we
mostly see are of the second variety, but our defenses must take account
of both varieties.
> If they wanted to amplify packets/second they could just spoof syn
> packets to webservers.
and they will, when we force them to, which is my goal in the DNS RRL
work. forcing the attacker to adopt a more complex technique is not a
pure win but i'll take it.
> Returning to a 1:1 ratio should be our goal, and slip=1 achieves that.
my goal is as stated, attenuation of both packets and bits, for the
reasons i've stated. if the attacker is willing to accept 1:1 then they
can forge packets directly to the victim. i want to encourage that, by
being a worse alternative for them than reflecting through my server. i
won't try to talk you out of your chosen goal, so long as you clearly
state it when making recommendations in keeping with it.
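to make the packets-versus-bits distinction concrete, here is a minimal python sketch of how the slip setting changes what a reflector sends toward the victim once a flow is over the limit. it assumes the usual RRL meaning of slip (one of every N rate-limited responses goes out as a TC=1 reply, the rest are dropped, and slip=0 drops everything). the function name and the 64-byte sizes are illustrative assumptions, not measurements.

```python
def slip_attenuation(slip, query_bytes=64, tc_bytes=64):
    """Return (packet_ratio, byte_ratio) of reflected traffic per spoofed query.

    Assumes every query in the flow is already over the rate limit.
    query_bytes and tc_bytes are hypothetical sizes; a TC=1 reply is
    assumed to be about as large as the query that triggered it.
    """
    if slip < 0:
        raise ValueError("slip must be >= 0")
    if slip == 0:
        # slip=0: every rate-limited response is dropped
        return 0.0, 0.0
    packet_ratio = 1.0 / slip  # one TC=1 reply per `slip` queries
    byte_ratio = packet_ratio * tc_bytes / query_bytes
    return packet_ratio, byte_ratio


for slip in (0, 1, 2, 5):
    packets, octets = slip_attenuation(slip)
    print(f"slip={slip}: packets {packets:.2f}:1, bytes {octets:.2f}:1")
```

under these assumptions slip=1 gives 1:1 in both packets and bytes, while slip=2 or higher attenuates both.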
> 2) "A pure TCP fallback strategy would be less reliable due to the
> fragility of TCP/DNS". You go on to argue that the 3-way handshake
> adds latency and server load, which I agree with. But keep in mind
> only the legitimate queries will need to use TCP, so the actual load
> is low.
no. actual query load as witnessed on dns servers i have operated even
15+ years ago was not sustainable via tcp due to state load.
> And these are queries which would otherwise have had to retry over
> UDP after a timeout (and even then only have a 50% success rate), so
> the amortized latency hit isn't particularly significant either.
anyone on the internet can exhaust the tcp listener quota of any dns
server they target, thus ensuring that tcp fallback temporarily fails
for other victims whom they are simultaneously trying to starve via an
RRL flow overrun. that's what i mean by "tcp fragility". any design that
calls for tcp fallback of dns is by definition too fragile to be used in
production. (that's why i criticized nominum's answer to the kaminsky
attacks back in 2008, too.)
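a toy model of the listener-quota problem, as a sketch only: the class, the quota of 3, and the client names are hypothetical, and it ignores idle timeouts and per-source limits that a real server might apply.

```python
class TcpListener:
    """Toy DNS-over-TCP listener with a fixed connection quota."""

    def __init__(self, quota):
        self.quota = quota  # hypothetical maximum of simultaneous connections
        self.open_connections = []

    def accept(self, client):
        """Admit the client if a slot is free; otherwise refuse."""
        if len(self.open_connections) >= self.quota:
            return False
        self.open_connections.append(client)
        return True


listener = TcpListener(quota=3)

# The attacker opens connections and simply holds them open.
for i in range(3):
    listener.accept(f"attacker-{i}")

# A resolver that received TC=1 and retries over TCP now finds no free slot.
print(listener.accept("legitimate-resolver"))
```

in this model the attacker needs only as many idle connections as the quota allows to shut out every client that was told to fall back to tcp.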
> 3) [Addressing the increased poisoning risk], "requires many hours of
> uninterrupted 100 Mbit/sec blasting from the attacker to the victim in
> order to have a chance at success". I don't worry about 100Mbps attacks,
you work at google. in my world, a 100Mbit/sec attack is noticeable. in
your world, it doesn't even raise an eyebrow.
> but in the age of 10Gbps (unamplified) attacks, I think this does
> introduce a non-negligible (and unnecessary!) risk for high-value
> domains. Keep in mind a single poison packet can inject a high TTL to
> cause a long outage, and potentially use that time to steal
> unencrypted data (SMTP, for example). Why take that risk just to
> reduce the amplification factor *below* 1:1?
there may be a marked difference in our perspective. i went along with
bernstein-style source port randomization as a temporary work around to
the kaminsky bug back in 2008, because we had to have something, and it
was something. the real fix, as i said then, is dnssec. other real
fixes, like eastlake-style cookies, or several proposals i'm aware of
which haven't been published as yet, might also come. but in no case did
i sign up for, nor will i accept, an indefinite future where cache
poisoning remains feasible using sustained flows of 100Mbit/sec for five
to fifteen minutes.
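for a sense of scale, a back-of-envelope python sketch of blind forgery against a resolver that randomizes both the 16-bit query id and the source port. every input is an assumption: a full 2^16 port pool, 80-byte forged packets, a flat 100Mbit/sec, and a race window that is always open (which favors the attacker). it is an illustration of the arithmetic, not a reconstruction of any figure quoted in this thread.

```python
import math


def forged_packets_needed(target_probability, txid_space=2**16, port_space=2**16):
    """Forged responses needed to reach the target success probability.

    Each forged packet is modeled as an independent guess at one
    (query id, source port) pair; the space sizes are assumptions.
    """
    p_hit = 1.0 / (txid_space * port_space)
    # Solve 1 - (1 - p_hit)^n >= target_probability for n.
    return math.ceil(math.log1p(-target_probability) / math.log1p(-p_hit))


def seconds_at_rate(packets, bits_per_second=100e6, packet_bytes=80):
    """Time to send that many packets at an assumed rate and packet size."""
    packets_per_second = bits_per_second / (packet_bytes * 8)
    return packets / packets_per_second


needed = forged_packets_needed(0.5)
hours = seconds_at_rate(needed) / 3600
print(f"{needed} forged packets, about {hours:.1f} hours at 100Mbit/sec")
```

shrinking the guess space, for example by setting port_space=1 to model a fixed source port, shows how much of the cost comes from port randomization alone.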
that means the risk which you claim is non-negligible and unnecessary,
i claim is both negligible and a sunk cost. so, i won't spend new mana
on it. especially if to get additional traction on it i would have to
accept a defense strategy for reflection that made me no less attractive
than sending spoofed packets directly to the victims.
i think my article covers pretty well the topic of why reflection is a
separate boon for attackers, over and above amplification.
see also the followup acm queue article at
> > This ensures your legitimate users can get through with a TCP
> > rather than having to attempt multiple retries before learning to
> > retry over TCP. Does slip=1 address your concerns?
> > Of course TCP isn't perfect -- it has higher latency and
> > per-connection costs -- but at least it ensures your legitimate
> > users can't be affected by the RRL.
> it does not. see [ibid].
> Do you have additional arguments I missed?