[dns-operations] dnsop-any-notimp violates the DNS standards

Tue Mar 17 00:50:44 UTC 2015

On 3/16/15 4:15 PM, P Vixie wrote:
> 
> 
> On March 17, 2015 7:42:09 AM GMT+09:00, Michael Sinatra <michael at brokendns.net> wrote:
>>
>>
>> On 03/16/15 07:23, bert hubert wrote:
>>
>>> Separately, I fail to see why we actually need to outlaw ANY queries
>>> when we can happily TC=1 them.
>>
>> If the public recursives also support TC=1 on all ANY queries, then
>> this works.  If not, the issue arises where just-below-the-radar
>> attacks are using many public recursives, in which case you're not
>> stopping much.
> 
> Michael, what attacks do you think we can stop by limiting ANY? Paul

The attack that I have had to grapple with is this:

* Someone sets up a bot to query public recursives (Google, OpenDNS,
Level3, etc.) for a particular domain whose ANY response is large.
(This _usually_ means DNSSEC-signed.)

* The query from each <client,domain,qtype> tuple is just barely slow
enough not to trigger rate limiting from the public recursive service.

* The backend of the public recursive service queries my authoritatives
for some of the involved domains.  Suppose the response is just under
the typical default EDNS0 buffer size of 4096 bytes.

* These domains are DNSSEC-signed with NSEC3.  Many tools set the TTL of
NSEC3PARAM to 0 when signing zones with NSEC3.  The NSEC3PARAM RR is
part of the ANY response.

* The public recursive servers use an implementation that clears all
records from the cache when the TTL of one record expires, so the next
time the recursive server gets an ANY query, it must re-query the
authoritative server.
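
The cache behaviour in the last two points can be sketched as a toy
model.  This is only an illustration, assuming a cache that stores the
whole ANY answer under one expiry time equal to the lowest TTL among
its records; the function name, record types, TTLs and query times are
all made up for the example and do not describe any particular
implementation.

```python
def upstream_queries(record_ttls, query_times):
    """Count how many client ANY queries force a re-query of the
    authoritative server, assuming the whole cached answer expires
    when its lowest-TTL record expires (hypothetical model)."""
    expires_at = None
    upstream = 0
    for now in query_times:
        if expires_at is None or now >= expires_at:
            # Cache miss: fetch again and cache under the minimum TTL.
            upstream += 1
            expires_at = now + min(record_ttls.values())
    return upstream


# Illustrative TTLs in seconds; NSEC3PARAM with TTL 0 as described above.
ttls = {"A": 3600, "RRSIG": 3600, "DNSKEY": 3600, "NSEC3PARAM": 0}
queries = [0, 1, 2, 3, 4]  # one client query per second

print(upstream_queries(ttls, queries))  # 5: every query goes upstream

no_param = {k: v for k, v in ttls.items() if k != "NSEC3PARAM"}
print(upstream_queries(no_param, queries))  # 1: later queries hit the cache
```

In this model a single TTL=0 record is enough to turn every client
query into a query against the authoritative server.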

In this situation, if I set TC=1 for all ANY queries on my authoritative
server, but the public recursives don't, then the victim still gets hit
with a pretty big amplification attack, and my authoritative servers get
hammered with TCP queries.  It's annoying for me--not insurmountable,
but annoying, as the thousands of simultaneous TCP connections require
some tuning to manage reasonably.  But for the victim?  Who knows--I
can't see who the victim is in this case.  The more I tune my servers,
the more data likely gets thrown at the victim.

I have seen this in the wild, even where the response is bigger than
4096, so the TC bit should be set all around.  Note also that if my
response is bigger than 4096, I'll send an empty response back with TC=1
(I am using BIND-latest).  I have seen some recursive implementations (e.g.
unbound) that will dutifully send the victim everything right up to the
next RRset that would push it over 4K and set TC=1 for good measure.  So
the victim still gets a ~4000-byte UDP response, even with TC set.
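
The difference between the two truncation behaviours can be sketched
roughly as follows.  This is a simplified model, not the code of either
implementation: it counts only the 12-byte DNS header plus whole
RRsets, ignores the question section and the OPT record, and the RRset
sizes are invented for illustration.

```python
HEADER = 12  # fixed DNS header size in bytes


def fill_until_limit(rrset_sizes, limit=4096):
    """Pack whole RRsets until the next one would exceed the limit,
    then stop and set TC.  Returns (bytes_sent, tc)."""
    total = HEADER
    for size in rrset_sizes:
        if total + size > limit:
            return total, True
        total += size
    return total, False


def empty_on_overflow(rrset_sizes, limit=4096):
    """Send nothing but the header with TC set when the full answer
    would not fit.  Returns (bytes_sent, tc)."""
    total = HEADER + sum(rrset_sizes)
    if total > limit:
        return HEADER, True
    return total, False


sizes = [1200, 1400, 1300, 900]  # hypothetical RRset sizes in bytes

print(fill_until_limit(sizes))   # (3912, True)
print(empty_on_overflow(sizes))  # (12, True)
```

Both replies carry TC=1, but only the second one removes the
amplification.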

So my point is that if we're going to specify TC=1 for ANY queries, it
has to be mandatory, and all implementations have to handle it the same:
Send an empty NOERROR and set TC=1.  If I am the only one setting TC=1,
it won't do any good against the attack described above, even if my
domains are the ones being used in the attack.
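
For concreteness, here is a minimal sketch of what "an empty NOERROR
with TC=1" looks like on the wire, using only the Python standard
library.  The helper names are hypothetical, the sketch assumes a
single uncompressed question, sets AA on the assumption that an
authoritative server is answering, and drops any EDNS0 OPT record for
brevity.

```python
import struct

# Header flag bits from RFC 1035.
QR, AA, TC, RD = 0x8000, 0x0400, 0x0200, 0x0100
OPCODE_MASK = 0x7800
QTYPE_ANY = 255


def make_query(name, qtype=QTYPE_ANY, qid=0x1234):
    """Build a simple one-question query (helper for the example only)."""
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    header = struct.pack("!HHHHHH", qid, RD, 1, 0, 0, 0)
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def question_end(msg):
    """Offset just past the first question (assumes no name compression)."""
    pos = 12
    while msg[pos] != 0:
        pos += msg[pos] + 1
    return pos + 1 + 4  # root label, then QTYPE and QCLASS


def truncated_reply(query):
    """Empty NOERROR reply with TC=1: header plus the echoed question."""
    qid, flags = struct.unpack("!HH", query[:4])
    # RCODE bits are left at 0 (NOERROR); opcode and RD are copied over.
    reply_flags = QR | AA | TC | (flags & OPCODE_MASK) | (flags & RD)
    header = struct.pack("!HHHHHH", qid, reply_flags, 1, 0, 0, 0)
    return header + query[12:question_end(query)]


reply = truncated_reply(make_query("example.com"))
print(len(reply))  # 29 bytes, no answer records
```

A client that honours TC=1 would retry over TCP; a spoofed victim only
ever sees the 29-byte reply.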

The other option is to allow the authoritative servers to control what
gets sent out in response to QTYPE=ANY.  But I see devils in the details,
just as with NOTIMP and other proposals.

michael