[dns-operations] DNS ANY record queries - Reflection Attacks

Wed Sep 12 08:21:35 UTC 2012

greetings from tokyo. you guys have got fairly deep into this thread
while i was in the air. catching up to the last 12 hours or so has been
like watching a filmed train derailment in slow motion. ouch.

On 9/11/2012 2:21 PM, Eric Osterweil wrote:
> Hey all, I think it's great that we are rallying (as a community) to find ways to address these DNS-based DDoS attacks, but I'm a little worried about this specific way we are proposing to do it.  That is, I think I either don't understand RRL, or I _do_ understand it, and worry about the correctness of the overall approach.

i think subsequent content on this thread shows that our documentation
is inadequate and that your understanding is incomplete. i've seen
nothing indicating that you've identified any flaws. please correct me
on that point if i've misread you.

> So, can I just make sure I understand the RRL idea?  If, under non-attack circumstances, I get a traffic rate of `r' from a given subnet, but an amplification attack sends me `99*r' (causing a total traffic rate of `100*r'), then I should rate limit?  So, my back of the envelope calculation says that I will reward the attack traffic over the non-attack traffic.  That is, if I limit the response rate back down to `r', then I will drop 99/100 responses to reach that target.  My legitimate client (subnet) has only about a 1/100 chance of getting each query answered here (all other response slots are given to my adversary)...

vernon answered that math question so i'll move on to this:

> I think rate limiting is kind of the wrong direction.  Did I misunderstand some aspect?

since you have definitely misunderstood several aspects, i don't know if
you still feel that DNS RRL is "kind of the wrong direction". if so,
please explain your proposed alternative, which, as vernon has already
said here, should not be "do nothing". if there's a better way to do
_something_, i'd like to hear more about it.

> Also, when you say, ``shockingly effective,'' how can we measure effectiveness, in order to verify the approach?

"shockingly" is meant to indicate my surprise that we didn't have to
fine tune the numbers -- attacks look reliably like attacks even using
round numbers for all the knobs. to your question: i measure
effectiveness by the false positive rate and the false negative rate. if
either is non-negligible then the defense is ineffective. noting that
DNS RRL (not BIND RRL, please!) is running successfully in production,
we also have operator feedback which is universally of the form "it
noticeably helped, and did not noticeably hurt."

a full study of this matter would take several real attack flows and
model several alternative defenses, to show whether looking at and
counting only IP and UDP header details has a higher or lower false
positive or false negative rate. i'm not sure how to measure the impact
on non-attack flows unless there are some in the data set or unless we
imagine them. i'm not uninterested in a full study, and i hope to hear
from someone who wants to perform such.

On 9/11/2012 7:23 PM, Eric Osterweil wrote:
> On Sep 11, 2012, at 11:29 AM, Vernon Schryver wrote:
>
>> We propose dropping only identical responses to a given CIDR block instead of all responses.
> So, I don't understand something... If you see a lot of identical responses from an authority, could that not be because it is an authority for those responses?

this question led to all kinds of misunderstanding down-thread. we _are_
the authority, and the responses we are dropping are the ones we are
contemplating the sending of.

> How do you distinguish a netblock with multiple resolvers, or anycast resolvers?  Perhaps more directly, are you dropping responses from legitimate clients and how do you feel about them being collateral damage?

those would be false positives, which are low, for the
statistics-related reasons vernon has given. if i found them to be high
or thought that they could be high then i would be more concerned.
(iptables based solutions have this problem; DNS RRL does not.)

>>>                    I think rate limiting is kind of the wrong direction.
>>> Did I misunderstand some aspect?
>> What do you think would be the right direction?  Doing nothing is not acceptable.
> Well, if doing something hurts the legitimate clients more than doing nothing, I think you need to be upfront about that.  I think that's worse than doing nothing.

what? this is wrong in two ways. flat out, factually and logically, wrong.

first, we're not hurting legitimate clients. the design of DNS RRL goes
out of its way to protect those.

second, if we were hurting legitimate clients, the damage would be to us
(the authority, since we'd be muting our content), whereas the cost of
doing nothing is borne primarily by the DDoS victims who we answer even
though they are not querying us. whether this is better or worse than
doing nothing depends on who you're trying to protect, and the above
observation ("i think that's worse than doing nothing") is a total
non sequitur.

>> We think that rate limiting is only a work around for the failure
>> of the responsible parties to implement BCP 38 or other effective
>> mechanisms to stop the abuse the transmit on behalf of their users.
>> In the distant future we hope it won't be needed.
> That's kind of passing the buck.  I think if your remediation winds up hurting legitimate clients you need to either quantify how you've made something better, or recognize that it's made things worse.

i don't think there's a buck. our remediation does NOT hurt legitimate
clients, as vernon's statistical analysis repeated elsewhere on-thread
shows pretty clearly. the burden is on you to show otherwise, and i'll
accept either an empirical or analytic demonstration.

>> One way to measure the effectiveness of a defense is to compare the
>> work the bad guy must do with the benefit to the bad guy.  In this
>> case, rate limiting at 10 identical repsonses and using the default
>> {slip 2;} means that in common scenarios, the amplification is less
>> than 1.  The bad guy gets less result from a reflection DoS attack
>> than a direct DoS attack.  Under the circumstances, I think that
>> is effective.
> OK, but you've also almost certainly eliminated the legitimate client's ability to query you for responses.  ...

no. just, no. this result is nowhere shown.
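
for anyone who wants to check the "amplification is less than 1" claim
quoted above, here is a rough python sketch of the arithmetic. the
packet sizes and query rate are made-up round numbers, the function name
is hypothetical, and the model assumes a TC=1 response is about the size
of the query and ignores IP/UDP header overhead. only the limit of 10
and {slip 2;} come from vernon's text.

```python
def amplification(queries_per_sec, query_bytes, response_bytes,
                  rate=10, slip=2, tc_bytes=None):
    """Bytes sent toward the victim divided by bytes the attacker sent us.

    Simplified model: the first `rate` identical responses per second go
    out in full; of the excess, one in `slip` goes out as a small TC=1
    response and the rest are dropped.
    """
    if tc_bytes is None:
        tc_bytes = query_bytes  # assumption: TC=1 reply is roughly query-sized
    full = min(queries_per_sec, rate)
    excess = max(queries_per_sec - rate, 0)
    slipped = excess // slip
    bytes_out = full * response_bytes + slipped * tc_bytes
    bytes_in = queries_per_sec * query_bytes
    return bytes_out / bytes_in


# hypothetical attack: 1000 spoofed queries/sec, 64-byte queries,
# 3000-byte responses
limited = amplification(1000, 64, 3000)
unlimited = 3000 / 64  # what the victim would see with no rate limiting
print(f"with limiting: {limited:.3f}x, without: {unlimited:.1f}x")
```

under these assumptions the ratio drops below 1 once the attack rate is
well above the limit, and heads toward 1/slip as the rate grows.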

On 9/11/2012 9:00 PM, Vernon Schryver wrote:
>> From: Eric Osterweil <eosterweil at verisign.com>
>> So, I don't understand something... If you see a lot of identical
>> responses from an authority, could that not be because it is an authority
>> for those responses?  How do you distinguish a netblock with multiple
>> resolvers, or anycast resolvers? 
> The BIND RRL code is part of the resolver.  It does not "see a lot of
> identical responses from an authority" except when it is the authority.

vernon's use of the word "resolver" here is potentially misleading. DNS
RRL happens in the output stage of a DNS responder, which is usually not
called a resolver. i hope everyone will re-read this as though vernon
had said "is part of the DNS server's response logic."

>> So, every identical response either gets dropped or gets its TC bit set?
> No, every *excessive* identical response is either not sent (dropped)
> or a tiny TC=1 response is sent instead.

moreover, the definition of the word "identical" is not what one would
expect. perhaps we should say "vastly similar" rather than "identical".
one of the things DNS RRL counts is the number of times a negative
answer is generated, per-client-netblock, per-SOA-apex. these responses
are not identical but they all flow from the same SOA. another thing we
count is the number of times a wildcard is used per-client-netblock.
these responses are in no way identical but we treat them as such for
the purpose of rate limiting. these are things i do not think a firewall
can do unless it's so DNS-aware that it knows where the apex is, knows
what names exist, and knows what wildcards exist. (more on that in my
response to colm's thread.)
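
to make the bucketing concrete, here is a minimal python sketch of the
counting described above. it is not the BIND code: the names
(bucket_key, Limiter), the /24 client netblock, and the fixed one-second
window are assumptions made for illustration, and real accounting is
more careful than this. the limit of 10 and slip of 2 are the values
vernon mentioned earlier.

```python
import ipaddress
from collections import defaultdict

RATE = 10    # "identical" responses allowed per second per bucket
SLIP = 2     # every SLIP-th excess response goes out as a tiny TC=1
PREFIX = 24  # assumed IPv4 client netblock size


def netblock(client_ip):
    # Collapse a client address to its enclosing netblock (IPv4 only here).
    return ipaddress.ip_network(f"{client_ip}/{PREFIX}", strict=False)


def bucket_key(client_ip, kind, qname, qtype, apex=None, wildcard=None):
    """Decide which responses count as 'identical' for limiting purposes."""
    block = netblock(client_ip)
    if kind == "negative":
        # All negative answers under one SOA apex share a bucket.
        return (block, "negative", apex)
    if kind == "wildcard":
        # All answers synthesized from one wildcard share a bucket.
        return (block, "wildcard", wildcard)
    return (block, "answer", qname.lower(), qtype)


class Limiter:
    def __init__(self):
        # (key, second) -> responses generated; old slots are never
        # expired in this sketch, a real server would age them out.
        self.counts = defaultdict(int)

    def decide(self, key, now):
        slot = (key, int(now))
        self.counts[slot] += 1
        excess = self.counts[slot] - RATE
        if excess <= 0:
            return "send"
        return "tc" if excess % SLIP == 0 else "drop"


lim = Limiter()
for i in range(14):
    # random-looking nonexistent names, all under the same apex
    key = bucket_key("192.0.2.7", "negative", f"x{i}.example.com", "A",
                     apex="example.com")
    print(i, lim.decide(key, now=0.0))
```

note that the fourteen queries above all ask for different names, yet
they land in one bucket because they share a netblock and an apex. a
packet filter that only sees IP and UDP headers has no way to form that
key.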

On 9/11/2012 10:21 PM, Eric Osterweil wrote:
> On Sep 11, 2012, at 5:00 PM, Vernon Schryver wrote:
>
>>> So, every identical response either gets dropped or gets its TC bit set?
>> No, every *excessive* identical response is either not sent (dropped)
>> or a tiny TC=1 response is sent instead.
> Wait, are we still talking about the resolver?  This seems to indicated a different deployment model than your above comment (why would I send a TC bit to my stub)?

we were never talking about "the resolver". see my clarification to
vernon's terminology, above.

> This is a tradeoff, so it's important (imho) to describe how much good is being done with how much not-good.

our claim for DNS RRL is that the not-good done is negligible due to the
fact that real clients retry. our whole design is based on limiting the
impact on good flows while dropping bad ones.
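
a back-of-envelope way to see why retries matter, as a python sketch.
this is my own simplified model, not vernon's analysis: it assumes the
worst case where a real client shares the victim's netblock and asks the
very question being used in the attack, that each UDP attempt is
independently dropped with probability 1 - 1/slip, and that a TC=1 reply
leads to a TCP retry that succeeds. the function name is hypothetical.

```python
def p_unanswered(attempts, slip=2):
    """Chance that every one of `attempts` UDP tries is silently dropped."""
    p_drop = 1 - 1 / slip
    return p_drop ** attempts


for n in (1, 2, 3, 4):
    print(n, p_unanswered(n))
```

with slip 2 the chance of getting nothing back halves with each retry in
this model, and a client outside the victim's netblock, or asking a
different question, is not in the limited bucket at all.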

it's possible that you've imagined a weakness by which a new kind of
attacker could target the DNS RRL machinery in a way that mutes goodput,
where this muting, and not DDoS, is the goal of the attack. i invite you
to code this up and demonstrate it. my concern in this regard is muting
an authority server during a kaminsky-style attack on some caching
resolver in order to lengthen the poison-attack window. but i was not
able to make it work in the current DNS RRL design. "help wanted."

in closing:

today spamhaus.org's name servers were treated to a many-tens-of-gigabit
attack, using DNS amplification and reflection. those of us who run
spamhaus.org name servers were injured by this. and while we don't know
the identities of the non-BCP38 networks where the attacks originated,
since those networks allow outbound source-IP spoofing... we do know the
addresses of the name servers who answered queries they heard with our
address spoofed as the source-IP. in terms of long term recourse, i
can't ask the networks who did this to me to turn on source-IP
validation. but i can ask the name servers who reflected and amplified
these attacks to turn on DNS rate limiting.

thus you can see that DNS rate limiting's design is rooted in economics
while still governed by technology. we are coming up with solutions that
we can ask involved parties to implement. what's your idea?

paul