[dns-operations] DNS ANY record queries - Reflection Attacks

Wed Sep 12 19:53:14 UTC 2012

Hey Paul,

As I said to Vernon: I am happy to be wrong in my concerns, but I continue to feel that there is no substitute for solid analysis and measurements when infrastructure availability is on the line.

On Sep 12, 2012, at 4:21 AM, paul vixie wrote:

<snip>

>> I think rate limiting is kind of the wrong direction.  Did I misunderstand some aspect?
> 
> since you have definitely misunderstood several aspects, i don't know if
> you still feel that DNS RRL is "kind of the wrong direction". if so,
> please explain your proposed alternative, which, as vernon has already
> said here, should not be "do nothing". if there's a better way to do
> _something_, i'd like to hear more about it.

At the very least, since you are proposing something, you ought to do a proper analysis.  It is not up to me to come up with something new just because a proposal has holes, even if, as you contend, those holes are just a lack of proper evaluation. ;)

> 
>> Also, when you say, ``shockingly effective,'' how can we measure effectiveness, in order to verify the approach?
> 
> "shockingly" is meant to indicate my surprise that we didn't have to
> fine tune the numbers -- attacks look reliably like attacks even using
> round numbers for all the knobs. to your question: i measure
> effectiveness by the false positive rate and the false negative rate. if
> either is non-negligible then the defense is ineffective. noting that
> DNS RRL (not BIND RRL, please!) is running successfully in production,
> we also have operator feedback which is universally of the form "it
> noticeably helped, and did not noticeably hurt."

Wait, so you have FP and FN rates?  That's great!  Can we get a look at those and the details of how you determined what the FP/FNs were?  This is very much part of what I was trying to find.
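For concreteness, here is what computing those two rates from a trace could look like. This is a minimal sketch. It assumes a hypothetical trace in which every outgoing response already carries a ground-truth attack/legitimate label and a record of whether the limiter acted on it. `Decision` and `error_rates` are illustrative names, not part of any RRL implementation. Obtaining trustworthy labels is the hard part, and that labeling methodology is what is being asked about here.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One outgoing response in a labeled trace (hypothetical format)."""
    is_attack: bool  # ground truth: was this response part of an attack flow?
    limited: bool    # did the rate limiter drop it or replace it with TC=1?


def error_rates(decisions):
    """Return (false_positive_rate, false_negative_rate) for a labeled trace.

    False positive: a legitimate response that was limited.
    False negative: an attack response that was sent in full.
    """
    legit = [d for d in decisions if not d.is_attack]
    attack = [d for d in decisions if d.is_attack]
    fp_rate = sum(d.limited for d in legit) / len(legit) if legit else 0.0
    fn_rate = sum(not d.limited for d in attack) / len(attack) if attack else 0.0
    return fp_rate, fn_rate


# Example: 4 legitimate responses (1 limited), 10 attack responses (1 sent).
trace = ([Decision(False, False)] * 3 + [Decision(False, True)]
         + [Decision(True, True)] * 9 + [Decision(True, False)])
print(error_rates(trace))  # (0.25, 0.1)
```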

> 
> a full study of this matter would take several real attack flows and
> model several alternative defenses, to show whether looking at and
> counting only IP and UDP header details has a higher or lower false
> positive or false negative rate. i'm not sure how to measure the impact
> on non-attack flows unless there are some in the data set or unless we
> imagine them. i'm not uninterested in a full study, and i hope to hear
> from someone who wants to perform such.

Well, I think this is really kind of critical.  Saying that you have something that works great, and that we should all just trust you because you find it shockingly effective, asks us to take a lot on faith.  I've already outlined my concerns about overly aggressive filtering techniques, so I don't think repeating them is useful.  I think you owe it to the community to support your own proposal with real analysis and corresponding measurements.

> 
> On 9/11/2012 7:23 PM, Eric Osterweil wrote:
>> On Sep 11, 2012, at 11:29 AM, Vernon Schryver wrote:
>> 
>>> We propose dropping only identical responses to a given CIDR block instead of all responses.
>> So, I don't understand something... If you see a lot of identical responses from an authority, could that not be because it is an authority for those responses?
> 
> this question led to all kinds of misunderstanding down-thread. we _are_
> the authority, and the responses we are dropping are the ones we are
> contemplating the sending of.

Exactly...  The discussion kept drifting to the resolver's perspective, and I was trying to keep us on track with the OP.
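For readers following along, here is a rough sketch of the mechanism Vernon describes. It counts identical responses per client CIDR block. Once a block exceeds the limit, the excess is dropped, or, with `{slip 2;}`, every second excessive response becomes a tiny TC=1 reply. This is a hypothetical simplification, not the actual BIND/DNS RRL code. It assumes fixed one-second windows, IPv4 /24 netblocks, and an opaque `response_key`. `ResponseRateLimiter` and `decide` are illustrative names.

```python
import ipaddress
from collections import defaultdict


class ResponseRateLimiter:
    """Hypothetical sketch: limit identical responses per client netblock (IPv4 only)."""

    def __init__(self, responses_per_second=10, slip=2, prefix=24):
        self.limit = responses_per_second
        self.slip = slip
        self.prefix = prefix
        self.window = None              # start of the current one-second window
        self.counts = defaultdict(int)  # (netblock, response_key) -> responses this window
        self.excess = defaultdict(int)  # (netblock, response_key) -> excessive responses

    def decide(self, now, client_ip, response_key):
        """Return "send", "drop", or "slip" (send a tiny TC=1 reply instead)."""
        second = int(now)
        if second != self.window:
            # Simplification: fixed one-second windows, all state reset each window.
            self.window = second
            self.counts.clear()
            self.excess.clear()
        netblock = ipaddress.ip_network(f"{client_ip}/{self.prefix}", strict=False)
        key = (netblock, response_key)
        self.counts[key] += 1
        if self.counts[key] <= self.limit:
            return "send"
        # Over the limit: every slip-th excessive response gets TC=1, the rest are dropped.
        self.excess[key] += 1
        if self.slip and self.excess[key] % self.slip == 0:
            return "slip"
        return "drop"
```

With the defaults, the eleventh and later identical responses to one /24 within a second alternate between drop and slip. Other names and other netblocks are unaffected. A legitimate client that receives a TC=1 reply is expected to retry over TCP, which this sketch does not model.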

> 
>> How do you distinguish a netblock with multiple resolvers, or anycast resolvers?  Perhaps more directly, are you dropping responses from legitimate clients and how do you feel about them being collateral damage?
> 
> those would be false negatives, which are low, for the
> statistics-related reasons vernon has given. if i found them to be high
> or thought that they could be high then i would be more concerned.
> (iptables based solutions have this problem; DNS RRL does not.)

That is conjecture.  We don't live on the back of an envelope, we live in an operational world: measurements are what matter.

> 
>>>>                   I think rate limiting is kind of the wrong direction.
>>>> Did I misunderstand some aspect?
>>> What do you think would be the right direction?  Doing nothing is not acceptable.
>> Well, if doing something hurts the legitimate clients more than doing nothing, I think you need to be upfront about that.  I think that's worse than doing nothing.
> 
> what? this is wrong in two ways. flat out, factually and logically, wrong.
> 
> first, we're not hurting legitimate clients. the design of DNS RRL goes
> out of its way to protect those.

If someone deploys RRL while under ANY-type reflector attacks, and A-type reflector attacks then cause RRL to shut down their zone, dropping legitimate traffic and causing timeouts and unreachability, I'd say you have caused harm.  An attacker would then be able to stop your name servers from serving any other org's netblock, which would be a dangerous new attack vector.

> 
> second, if we were hurting legitimate clients, the damage would be to us
> (the authority, since we'd be muting our content), whereas the cost of
> doing nothing is borne primarily by the DDoS victims who we answer even
> though they are not querying us. whether this is better or worse than
> doing nothing depends on who you're trying to protect, and the above
> observation ("i think that's worse than doing nothing") is a total
> non sequitur.

I totally agree that the DDoS threat is important.  On the other hand, opening new attack vectors that may not even address the real problem is also dangerous.  Doing something is not the same as doing something helpful.  And to clarify, you are proposing a technique that _other_ people will deploy, so they will bear the pain until they figure out why client resolvers are failing their stubs. 

I really would be happy to be wrong, but so far my concerns are only getting louder as the requests for analysis are being met with incredulity...

> 
>>> We think that rate limiting is only a work around for the failure
>>> of the responsible parties to implement BCP 38 or other effective
> >>> mechanisms to stop the abuse they transmit on behalf of their users.
>>> In the distant future we hope it won't be needed.
>> That's kind of passing the buck.  I think if your remediation winds up hurting legitimate clients you need to either quantify how you've made something better, or recognize that it's made things worse.
> 
> i don't think there's a buck. our remediation does NOT hurt legitimate
> clients, as vernon's statistical analysis repeated elsewhere on-thread
> shows pretty clearly. the burden is on you to show otherwise, and i'll
> accept either an empirical or analytic demonstration.

This analysis misses a lot.  A resolver having to retransmit (retx) 4 times to be likely to get a response is a big change...  How long does a resolver have before the stub times out?  I know that _you_ know the stub runs the show as far as timeliness.
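The question can at least be framed numerically. This is a minimal sketch under strong simplifying assumptions. Each resolver attempt is taken to be answered independently with probability `p_answer`, although real RRL state is shared across a netblock and persists while an attack lasts. The retransmission schedule and stub deadlines below are hypothetical and not taken from any particular resolver or stub library. `answer_probability` is an illustrative name.

```python
def answer_probability(p_answer, retry_timeouts, stub_deadline):
    """Chance that a resolver gets an answer before its stub gives up.

    p_answer: assumed probability that any single attempt is answered,
              treated as independent across attempts (a simplification).
    retry_timeouts: seconds the resolver waits after each attempt.
    stub_deadline: seconds until the stub abandons the query.
    Only attempts sent before the deadline count; round-trip time is ignored.
    Returns (probability, attempts_sent).
    """
    elapsed = 0.0
    attempts = 0
    for timeout in retry_timeouts:
        if elapsed >= stub_deadline:
            break
        attempts += 1
        elapsed += timeout
    return 1.0 - (1.0 - p_answer) ** attempts, attempts


# Hypothetical schedule: 0.8 s initial timeout, doubling on each retry.
schedule = [0.8, 1.6, 3.2, 6.4]
print(answer_probability(0.5, schedule, stub_deadline=5.0))   # (0.875, 3)
print(answer_probability(0.5, schedule, stub_deadline=10.0))  # (0.9375, 4)
```

The point of the sketch is that a fourth retransmission only helps if it is sent inside the stub's deadline. That is why the stub's timeout belongs in any analysis of retries.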

> 
>>> One way to measure the effectiveness of a defense is to compare the
>>> work the bad guy must do with the benefit to the bad guy.  In this
> >>> case, rate limiting at 10 identical responses and using the default
>>> {slip 2;} means that in common scenarios, the amplification is less
>>> than 1.  The bad guy gets less result from a reflection DoS attack
>>> than a direct DoS attack.  Under the circumstances, I think that
>>> is effective.
>> OK, but you've also almost certainly eliminated the legitimate client's ability to query you for responses.  ...
> 
> no. just, no. this result is nowhere shown.

My comment just above talks about this.  The stub has left the show by the time Vernon's 4th retx gets sent.
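Vernon's quoted claim can be checked with simple arithmetic. He says that a limit of 10 identical responses with `{slip 2;}` pushes amplification below 1. The sketch below assumes hypothetical sizes: a 64-byte query, a 3000-byte ANY response, and a TC=1 reply about the size of the query. It also assumes a single attack stream in which every spoofed query maps to one response key and one victim netblock. `reflection_amplification` is an illustrative name.

```python
def reflection_amplification(queries_per_second, query_bytes, response_bytes,
                             tc_bytes, limit=10, slip=2):
    """Bytes delivered to the spoofed victim per byte the attacker sends.

    Assumes one attack stream: every spoofed query maps to the same response
    key and the same victim netblock, measured over one second. Up to `limit`
    full responses are sent; of the excessive ones, every `slip`-th becomes a
    small TC=1 reply and the rest are dropped.
    """
    full = min(queries_per_second, limit)
    excessive = queries_per_second - full
    truncated = excessive // slip if slip else 0
    victim_bytes = full * response_bytes + truncated * tc_bytes
    attacker_bytes = queries_per_second * query_bytes
    return victim_bytes / attacker_bytes


# Hypothetical sizes: 64-byte query, 3000-byte ANY response, 64-byte TC=1 reply.
print(reflection_amplification(10_000, 64, 3000, 64, limit=10**9))  # 46.875 (no limiting)
print(reflection_amplification(10_000, 64, 3000, 64))               # 0.546375
```

As the query rate grows, the ratio tends toward tc_bytes / (slip × query_bytes), here 0.5. This is the sense in which the bad guy "gets less result" than from a direct attack. The sketch models a single stream only. Attacks spread across many response keys or many reflectors are outside what it shows.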

> 
> On 9/11/2012 9:00 PM, Vernon Schryver wrote:
>>> From: Eric Osterweil <eosterweil at verisign.com>
>>> So, I don't understand something... If you see a lot of identical
>>> responses from an authority, could that not be because it is an authority
>>> for those responses?  How do you distinguish a netblock with multiple
>>> resolvers, or anycast resolvers? 
>> The BIND RRL code is part of the resolver.  It does not "see a lot of
>> identical responses from an authority" except when it is the authority.
> 
> vernon's use of the word "resolver" here is potentially misleading. DNS
> RRL happens in the output stage of a DNS responder, which is usually not
> called a resolver. i hope everyone will re-read this as though vernon
> had said "is part of the DNS server's response logic."

It's ok, everything is host... or so I'm told. ;)

> 
>>> So, every identical response either gets dropped or gets its TC bit set?
>> No, every *excessive* identical response is either not sent (dropped)
>> or a tiny TC=1 response is sent instead.
> 
> moreover, the definition of the word "identical" is not what one would
> expect. perhaps we should say "vastly similar" rather than "identical".
> one of the things DNS RRL counts is the number of times a negative
> answer is generated, per-client-netblock, per-SOA-apex. these responses
> are not identical but they all flow from the same SOA. another thing we
> count is the number of times a wildcard is used per-client-netblock.
> these responses are in no way identical but we treat them as such for
> the purpose of rate limiting. these are things i do not think a firewall
> can do unless it's so DNS-aware that it knows where the apex is, knows
> what names exist, and knows what wildcards exist. (more on that in my
> response to colm's thread.)

This has all been very fluffy and nonspecific.  I really think anyone proposing this type of enhancement owes it to their prospective consumers/adopters to do much more specific and detailed measurements and analysis.  I have seen some comments to the effect of, ``what's your idea,'' and, ``you don't get it,'' but providing that is really up to _YOU_, along with your solution.  I've given a number of detailed concerns and questions, and so far they have been ignored.  When your adopters get non-ANY attacks, I think there will be some serious pain felt by them.  I'd be _delighted_ to be wrong (which is why I was asking for numbers and analysis), but so far, the lack of these worries me much more.
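Paul's point that "identical" really means "vastly similar" may be easier to evaluate as a key function that decides which responses share a counter. This is a hypothetical sketch based only on his description. It assumes the server already knows three things for each response: whether it is negative, the apex of the zone it came from, and whether a wildcard synthesized it. The `Response` fields and `rate_limit_key` are illustrative names, not taken from the DNS RRL code, and the /24 grouping is an assumption.

```python
import ipaddress
from typing import NamedTuple, Optional


class Response(NamedTuple):
    """Hypothetical view of one outgoing authoritative response."""
    qname: str
    qtype: str
    is_negative: bool              # NXDOMAIN or NODATA
    zone_apex: str                 # owner name of the zone's SOA
    wildcard_owner: Optional[str]  # e.g. "*.example.net" if synthesized, else None


def rate_limit_key(client_ip, resp, prefix=24):
    """Map a response to the bucket it is counted in ("vastly similar" responses share one)."""
    netblock = ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False)
    if resp.is_negative:
        # Every negative answer from one zone counts together, whatever the qname.
        return (netblock, "negative", resp.zone_apex.lower())
    if resp.wildcard_owner is not None:
        # Every answer synthesized from one wildcard counts together.
        return (netblock, "wildcard", resp.wildcard_owner.lower())
    # Otherwise "identical" means the same name and type.
    return (netblock, "answer", resp.qname.lower(), resp.qtype)
```

Under this grouping, a flood of random-subdomain NXDOMAIN responses under one zone collapses into a single bucket per client netblock. Per Paul, this is a grouping a firewall cannot do unless it is fully DNS-aware.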

> 
> On 9/11/2012 10:21 PM, Eric Osterweil wrote:
>> On Sep 11, 2012, at 5:00 PM, Vernon Schryver wrote:
>> 
>>>> So, every identical response either gets dropped or gets its TC bit set?
>>> No, every *excessive* identical response is either not sent (dropped)
>>> or a tiny TC=1 response is sent instead.
>> Wait, are we still talking about the resolver?  This seems to indicate a different deployment model than your above comment (why would I send a TC bit to my stub)?
> 
> we were never talking about "the resolver". see my clarification to
> vernon's terminology, above.

I think you might be talking to Vernon here, so I'll leave this to him.

> 
>> This is a tradeoff, so it's important (imho) to describe how much good is being done with how much not-good.
> 
> our claim for DNS RRL is that the not-good done is negligible due to the
> fact that real clients retry. our whole design is based on limiting the
> impact on good flows while dropping bad ones.

Analysis notwithstanding...

> 
> it's possible that you've imagined a weakness by which a new kind of
> attacker could target the DNS RRL machinery in a way that mutes goodput,
> where this muting, and not DDoS, is the goal of the attack. i invite you
> to code this up and demonstrate it. my concern in this regard is muting
> an authority server during a kaminsky-style attack on some caching
> resolver in order to lengthen the poison-attack window. but i was not
> able to make it work in the current DNS RRL design. "help wanted."

I fear that the attacks we already see will do this for me...  ANY-type attacks are the flavor du jour, but they are not the only ones out there...

> 
> in closing:
> 
> today spamhaus.org's name servers were treated to a many-tens-of-gigabit
> attack, using DNS amplification and reflection. those of us who run
> spamhaus.org name servers were injured by this. and while we don't know
> the identities of the non-BCP38 networks where the attacks originated,
> since those networks allow outbound source-IP spoofing... we do know the
> addresses of the name servers who answered queries they heard with our
> address spoofed as the source-IP. in terms of long term recourse, i
> can't ask the networks who did this to me to turn on source-IP
> validation. but i can ask the name servers who reflected and amplified
> these attacks to turn on DNS rate limiting.

You have to realize that you are not the only one who sees quite a large number of very large reflector attacks.  There is (as you may know) quite a bit of variability, and real solutions need real measurements and analysis, or they can (and often do) cause a lot of harm themselves.

> 
> thus you can see that DNS rate limiting's design is rooted in economics
> while still governed by technology. we are coming up with solutions that
> we can ask involved parties to implement. what's your idea?

Also thus, you can see that the potential collateral damage done by under-analyzed approaches can outweigh optimistic appraisals...  My idea is to verify our work so that we actually know its merits. :)

Eric