[dns-operations] DNS ANY record queries - Reflection Attacks

Tue Sep 11 19:23:40 UTC 2012

Hey Vernon,

On Sep 11, 2012, at 11:29 AM, Vernon Schryver wrote:

>> From: Eric Osterweil <eosterweil at verisign.com>
> 
>> So, can I just make sure I understand the RRL idea?  If, under
>> non-attack circumstances, I get a traffic rate of `r' from a given
>> subnet, but an amplification attack sends me `99*r' (causing a total
>> traffic rate of `100*r'), then I should rate limit?  So, my back of
>> the envelope calculation says that I will reward the attack traffic
>> over the non-attack traffic.  That is, if I limit the response rate
>> back down to `r', then I will drop 99/100 responses to reach that
>> target.  My legitimate client (subnet) has only about a 1/100 chance
>> of getting each query answered here (all other response slots are given
>> to my adversary)...
> 
> That computation might be correct if DNS clients did not retransmit,
> if the BIND RRL idea involved only discarding responses,
> and if Paul and I proposed dropping 99% of all traffic for a CIDR block.
> We advocate none of that.

Hmm... I may still be missing some nuances; what are the specifics?  Note, I asked if you were normalizing the traffic back down to `r'.  I wasn't presuming to know how RRL works, just that the inherent nature of rate limiting punishes the less talkative.  If the amplification is smaller than the legitimate traffic, then it would indeed punish the attacker (but also the legitimate clients).
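
To make my back-of-the-envelope number concrete, here is a minimal sketch in Python.  It assumes a limiter that drops responses uniformly at random until the outgoing rate is back at the limit; that is the naive model I was asking about, not a description of how BIND RRL actually behaves.  The function name and parameters are mine, purely for illustration.

```python
def answered_fraction(legit_rate: float, attack_rate: float, limit: float) -> float:
    """Fraction of queries that still get a response when a limiter drops
    responses uniformly at random to bring the total rate down to `limit`.

    Assumption: legitimate and attack queries are indistinguishable to the
    limiter, so each query has the same chance of being answered.
    """
    total = legit_rate + attack_rate
    if total <= limit:
        return 1.0  # under the limit, nothing is dropped
    return limit / total


r = 1.0
# Normal traffic r, attack traffic 99*r, limit set back to r.
print(answered_fraction(legit_rate=r, attack_rate=99 * r, limit=r))
```

Under that assumption each legitimate query is answered about 1 time in 100, which is the figure I quoted above.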

> 
> We propose dropping only identical responses to a given CIDR block
> instead of all responses.

So, I don't understand something... If you see a lot of identical responses from an authority, could that not be because it is an authority for those responses?  How do you distinguish a netblock with multiple resolvers, or anycast resolvers?  Perhaps more directly, are you dropping responses from legitimate clients and how do you feel about them being collateral damage?

> 
> The BIND RRL code has a notion of "slip" or responding while rate
> limiting with TC=1.  It has a default slip rate of 2, or responding
> with TC=1 instead of dropping every other identical response.

So, every identical response either gets dropped or gets its TC bit set?

> 
> A DNS client that retransmits N times to a DNS server that answers
> with TC=1 50% of the time will get an answer to 1-(0.5)^N of its
> queries.  For N=4, it will get a TC=1 answer 94% of the time.

Wait, I'm very confused... The above sounds like you respond to 94% of the reflector attack queries (which furthers the attack).  Just adding the TC bit still sends the traffic, but in the event that the legit client gets an answer, they will now beat you up as they fall back to TCP too... So, didn't you just increase your operational overhead, while still answering most of the attack traffic?
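
For reference, the 1-(0.5)^N figure quoted above can be checked with a short Python sketch.  It assumes each transmission is independently answered (with TC=1) with probability 0.5, which is how I read the slip 2 description; the function name is hypothetical.

```python
def prob_at_least_one_answer(n_tries: int, p_answer: float = 0.5) -> float:
    """Probability that at least one of `n_tries` transmissions is answered,
    assuming each is answered independently with probability `p_answer`."""
    return 1.0 - (1.0 - p_answer) ** n_tries


for n in range(1, 5):
    print(n, prob_at_least_one_answer(n))
```

For N=4 this gives 0.9375, which rounds to the 94% quoted above.  Note that the formula is about one client's repeated tries for the same question, so it does not by itself say what fraction of spoofed queries draw a response.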

> 
> 
>>                    I think rate limiting is kind of the wrong direction.
>> Did I misunderstand some aspect?
> 
> What do you think would be the right direction?  Doing nothing is
> not acceptable.

Well, if doing something hurts the legitimate clients more than doing nothing, I think you need to be upfront about that.  I think that's worse than doing nothing.

> 
> We think that rate limiting is only a work around for the failure
> of the responsible parties to implement BCP 38 or other effective
> mechanisms to stop the abuse they transmit on behalf of their users.
> In the distant future we hope it won't be needed.

That's kind of passing the buck.  I think if your remediation winds up hurting legitimate clients you need to either quantify how you've made something better, or recognize that it's made things worse.

> 
> 
>> Also, when you say, ``shockingly effective,'' how can we measure
>> effectiveness, in order to verify the approach?
> 
> One way to measure the effectiveness of a defense is to compare the
> work the bad guy must do with the benefit to the bad guy.  In this
> case, rate limiting at 10 identical responses and using the default
> {slip 2;} means that in common scenarios, the amplification is less
> than 1.  The bad guy gets less result from a reflection DoS attack
> than a direct DoS attack.  Under the circumstances, I think that
> is effective.

OK, but you've also almost certainly eliminated the legitimate client's ability to query you for responses.  So, while I'm glad to see some quantification of effectiveness, I think it misses an important point.  It is trivially easy for an adversary to use this to DDoS a client's ability to query an authority for anything.  I think someone could target your zone and a recursive resolver and ensure that your zone drops responses to legitimate queries from that resolver, no?
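
For what it's worth, the "amplification is less than 1" claim can be sanity-checked with a rough byte-count model in Python.  Everything here is an assumption on my part: the first `limit` identical responses per second go out at full size, every `slip`-th response beyond that goes out as a small TC=1 reply, and the rest are dropped.  The packet sizes in the example are made-up placeholders, not measurements.

```python
def amplification(queries_per_sec: float, limit: float, slip: int,
                  query_bytes: int, full_bytes: int, truncated_bytes: int) -> float:
    """Bytes sent toward the victim divided by bytes the attacker sends,
    under a simplified model of response rate limiting with slip.

    Assumptions (illustrative only): all attack queries elicit identical
    responses, and over-limit responses are either truncated (1 in `slip`)
    or dropped.
    """
    full = min(queries_per_sec, limit)           # answered at full size
    over = max(queries_per_sec - limit, 0.0)     # subject to rate limiting
    slipped = over / slip                        # sent as TC=1 replies
    bytes_out = full * full_bytes + slipped * truncated_bytes
    bytes_in = queries_per_sec * query_bytes
    return bytes_out / bytes_in


# Hypothetical sizes: 64-byte query, 3000-byte ANY response, 64-byte TC=1 reply.
print(amplification(1000, limit=10, slip=2,
                    query_bytes=64, full_bytes=3000, truncated_bytes=64))
```

With these placeholder numbers the ratio comes out below 1, consistent with the claim.  The model says nothing about how many legitimate queries from the same netblock go unanswered, which is the point I am raising.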

Eric