[dns-operations] DNS ANY record queries - Reflection Attacks

Wed Sep 12 19:44:20 UTC 2012

Hey Vernon,

Sorry for the delay in responding; I had a busy night.  As a level set: I am happy to be wrong in my concerns, but I continue to feel that there is no substitute for solid analysis and measurements when infrastructure availability is on the line.

On Sep 11, 2012, at 7:28 PM, Vernon Schryver wrote:

>> From: Eric Osterweil <eosterweil at verisign.com>
> 
>> Fair enough, except I'm pretty sure some of the deployment being
>> talked about (even in this thread) is at the authority (not the
>> resolver)...  
> 
>>> Paul Vixie and I are not advocating DNS rate limiting in firewalls.
>>> We're talking about rate limiting in the hosts at the ends of the
>>> intertubes.
>> 
>> Again, this thread started somewhere else.  Clearly, I agree that
>> people should be able to manage their own user experiences. ;)
> 
> By definition, all DNS servers and clients run on hosts.  When a
> router or bridge does DNS stuff, it is a host and subject to the
> Host Requirements RFCs.  

Super informative, thanks.

> I and I think some others have been talking
> about filtering DNS requests and/or responses in the following hosts
> and/or firewalls close to those aforesaid hosts:
>   - DNS servers, either authority servers or resolvers
>   - putative DNS clients that are targets of DNS reflection DoS attacks.

Indeed, and as I stated, the OP and, by extension, I were talking about authoritative servers.  True, they are hosts, but in this regard, not all hosts behave the same. ;)

> 
> The money-back warranty on the BIND RRL patch only covers its installation
> on authority DNS servers, although I have received positive reports
> from resolver operators.  The current version includes additional
> features suggested by operators of combined authorities/resolvers.

Have those reports mostly concerned ANY queries (as this thread does)?  This, I believe, reflects a very fundamental confirmation bias.  ANY queries are not widely used for much of anything (qmail notwithstanding).  Thus, if you wind up with a solution that drops qmail traffic (which then must incur a large delay before it comes back), it doesn't matter, because there are no user eyeballs watching.  That is, your false positives go unnoticed.  If this approach were slotted for just ANY queries, I think it would be OK, but we could also take the simpler approach of just rate limiting this qtype.  OTOH, if you apply this to _all_ qtypes (including A), and there is a large-scale A reflector attack, I _strongly_ suspect the people who have deployed RRL will have a great many angry clients...

I feel the confirmation of this approach is far too narrow if it only includes ANY query attacks.

> 
> 
> 
>> 1 - If you uniformly drop 50% of a 100x amplification attack, you
>> are still reflecting 50x amplification, right?
> 
> That is wrong for the BIND RRL patch.  With default parameters, the
> BIND RRL drops 50% of responses and substitutes a small TC=1 response
> for the other 50%.  That gives an amplification for the responses it
> sends of <= 1.0 responses sent and an overall amplification <= 0.5.
> (It currently forgets about EDNS in the TC=1 responses giving a default
> amplification of < 0.5.  An influential commentator calls that a bug.)

OK, this is beginning to become clearer... But I have to admit, this still seems worrisome to me.  Suppose you drop 50% of legit traffic (a generous assumption, as it assumes a uniform distribution, which is not established by any of the analysis I have seen), and the other 50% (which you service as TC-bit mini-responses) comes back to you as TCP.  Then you have taken your own processing requirements way up (as your clients will now all hit you over TCP instead of UDP).
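To make the arithmetic in this exchange concrete, here is a minimal sketch of the two cases: a plain uniform drop versus the 50% drop / 50% TC=1 split described above. The packet sizes and the names `uniform_drop_amplification`, `rrl_amplification`, and `tc_every` are assumptions made up for illustration; they are not taken from the patch.

```python
def unlimited_amplification(query_bytes, response_bytes):
    """Bytes reflected per byte of spoofed query with no limiting at all."""
    return response_bytes / query_bytes


def uniform_drop_amplification(query_bytes, response_bytes, drop_rate):
    """Full-size responses are still sent for the queries that are not dropped."""
    return (1.0 - drop_rate) * response_bytes / query_bytes


def rrl_amplification(query_bytes, truncated_bytes, tc_every=2):
    """Over-limit queries: 1 in `tc_every` gets a small TC=1 reply, the rest nothing.

    `tc_every` is a hypothetical parameter name; 2 models the 50/50 split.
    """
    sent_fraction = 1.0 / tc_every
    return sent_fraction * truncated_bytes / query_bytes


# Hypothetical sizes: 64-byte query, 6400-byte full answer (100x),
# and a TC=1 reply no larger than the query itself.
print(unlimited_amplification(64, 6400))          # 100.0
print(uniform_drop_amplification(64, 6400, 0.5))  # 50.0
print(rrl_amplification(64, 64))                  # 0.5
```

This only models bytes reflected at the spoofed victim. It says nothing about the TCP load from legitimate clients that retry after a TC=1 reply, which is the concern raised here.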

As there are generally very few ANY queries for most people, this overhead might look small, but it would not be so for A queries (for most zones).  Also, the end user experience suffers, as resolvers whose queries do get dropped (and who presumably must re-query up to 4x before ~94% of them get through to you) fail to get an answer to their stubs before the stubs time out.  Basically, I think you shed a lot of attack traffic and still penalize the clients.
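For anyone checking the 4x / 94% figures, a minimal sketch of where they come from, assuming each try is dropped independently with probability 0.5 (the same uniform assumption questioned above). The function names are made up for illustration.

```python
def p_some_reply(drop_rate, tries):
    """Probability that at least one of `tries` independent queries is not dropped."""
    return 1.0 - drop_rate ** tries


def expected_tries(drop_rate):
    """Mean number of tries until the first non-dropped query (geometric distribution)."""
    return 1.0 / (1.0 - drop_rate)


for n in range(1, 5):
    print(n, p_some_reply(0.5, n))
# 1 0.5
# 2 0.75
# 3 0.875
# 4 0.9375

print(expected_tries(0.5))  # 2.0
```

Note that "not dropped" here still means a TC=1 reply, so the resolver has a TCP round trip ahead of it before it holds a usable answer.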

Here's my meta point (though I'm sure we all enjoy the statistical back-and-forth): this seemingly simple change seems to have (like everything) very complicated interactions with (and within) DNS.  I think it is very important to measure and address these, since _you_ are proposing a very significant change.  The push back from first principles and the anecdotal hearsay about success do not seem to justify the scope of this proposed change.

> 
> 
>> 2 - If you wait for (say) 4 responses, your stub (the client
>> driving the upstream resolver) has almost certainly timed out, and
>> the DDoS has succeeded, if I'm not mistaken, right?
> 
> I think that is mistaken.  Consider the implications of the default
> values of "attempts" and "timeout" keywords in /etc/resolv.conf and
> of _res.retry and _res.retrans the libc interface to the de facto
> standard stub.

I think I have considered them, but I'm happy to hear details if you think I've missed something?
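So that we are at least arguing over the same numbers, here is a rough sketch of the comparison in question. It assumes the commonly documented resolv.conf defaults of timeout:5 and attempts:2, a single nameserver, and a deliberately simplified model in which the stub waits the full timeout on every attempt. The resolver retry schedule (`first_interval`, `backoff`) is purely hypothetical; real resolvers differ, and that is exactly the part that needs measuring.

```python
def stub_budget_seconds(timeout=5, attempts=2, nameservers=1):
    """Simplified upper bound on how long a stub waits before giving up."""
    return timeout * attempts * nameservers


def resolver_time_of_nth_try(n, first_interval=1.0, backoff=2.0):
    """Seconds elapsed when the resolver sends its n-th query to the authority.

    Assumes exponential backoff between tries; both parameters are hypothetical.
    """
    return sum(first_interval * backoff ** i for i in range(n - 1))


budget = stub_budget_seconds()
for n in range(1, 5):
    sent_at = resolver_time_of_nth_try(n)
    print(n, sent_at, sent_at < budget)
```

Whether the fourth try lands inside the stub's budget depends entirely on the retry schedule plugged in, and the time for a TCP retry after a TC=1 reply is not modeled at all.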

> 
> 
> 
>> Then it should be easy enough for someone to explain the above, no?
>> Having deployed something does not mean that it was effective, and
>> blocking traffic does not tell me how much legit traffic and how much
>> attack traffic was blocked.  I don't see why this is so hard, I just
>> want to understand the assertion.
> 
> Consider where the BIND RRL patch has been installed and then ask
> yourself why *you* have not noticed any collateral response losses.
> See https://lists.dns-oarc.net/pipermail/dns-operations/2012-June/008453.html

Sorry, I don't follow.

Eric