[dns-operations] Best Practices

Sun Jun 16 18:57:28 UTC 2013

> From: Florian Weimer <fw at deneb.enyo.de>

> >> ISC-TN-2012-1 is unfortunately not very clear about the actual key
> >> used to determine the bucket to account against.
> >
> > What is the relevance of the shortcomings of
> > http://ss.vix.su/~vixie/isc-tn-2012-1.txt to whether RRL works on
> > authority (or even recursive**) servers without too much tuning?
>
> I used it (perhaps naïvely) as the reference for the default settings
> and their potential impact.  Is there a better source (except the
> source code itself)?

Most people who read anything read software documentation and never
read RFCs, IDs, and so forth.  There are links on
http://www.redbarn.org/dns/ratelimits to relevant text for three
RRL implementations.

I wish more people would read RFCs, IDs, technical notes, etc., but
the documentation for an implementation tends to be more relevant
and useful to users of an implementation.  That would be true of
RRL even if that technical note had even a single word about
configuration file syntax, because there are differences among RRL
implementations.  (Whether those differences matter at all is
controversial.  We all agree they are not fatal.)

As for reading the code, please do!  I'm biased, but I think the BIND9
RRL code is more clear, concise, and unambiguous than any natural
language text.  Writing English about what happens in all of the
cases would be hard work; reading that English would be worse.

> > The R in RRL stands for "response", and so rate limits should ignore
> > the question section as much as possible.
> > For non-empty, non-error, non-wildcard generated, non-referral
> > responses, the key is {class,qname,qtype,client IP block}.
>
> Okay, that makes sense, but contradicts the previous sentence.  I
> don't quite get how you can ignore the question section, but extract
> QNAME and QTYPE.

What do you suggest as concise, clear terms for the owner name and
primary RR type (e.g. neither RRSIG nor CNAME when the qtype is neither)
in the answer section of simple, non-error, non-referral, non-wildcard
responses?  "Qname" and "qtype" are only wrong in their connotations
or when the response is an error, referral, empty (NODATA), or
generated by a wildcard.

> I'm worried what happens if I send garbage <random>.example.com
> queries through a legitimate resolver, and how that would impact
> (legitimate) queries for not-so-random.example.com.

I can't think of an answer that would not seem flippant or condescending
to anyone who understands something about DNS.  It seems obvious
to count all NXDOMAIN responses for that SOA domain the same and
different from non-error responses.  Won't the streams of requests
seen by the recursive and authoritative servers differ only in
IP headers, transaction IDs, and RD=0/1 and the effects of the cache
in the recursive resolver?  Won't the responses seen by the stub
and recursive resolvers differ additionally only in the effects of
RRL?  (Well, there are also the effects of packet losses, ACLs, etc.)
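A minimal sketch of that accounting, not the BIND9 code: the function
names, the tuple layout, and the /24 and /56 client block sizes are
assumptions made for illustration.  It only shows that random NXDOMAIN
names under one zone share a bucket, and that this bucket is separate
from the one for a name that exists.

```python
import ipaddress

def client_block(addr, v4_prefix=24, v6_prefix=56):
    """Collapse a client address into its network block (assumed sizes)."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

def bucket_key(rcode, zone, owner, rrtype, client, rclass="IN"):
    """Hypothetical bucket key for one response."""
    block = client_block(client)
    if rcode == "NXDOMAIN":
        # Every NXDOMAIN for the zone (SOA domain) shares one bucket
        # per client block, whatever name was asked for.
        return ("NXDOMAIN", rclass, zone.lower(), block)
    # Simple positive answers: {class, owner name, RR type, client block}.
    return ("NOERROR", rclass, owner.lower(), rrtype.upper(), block)

# Two garbage names from the same /24 land in the same bucket ...
a = bucket_key("NXDOMAIN", "example.com", "x7f3q.example.com", "A", "192.0.2.10")
b = bucket_key("NXDOMAIN", "example.com", "k29zz.example.com", "A", "192.0.2.77")
# ... while a real name is accounted separately.
c = bucket_key("NOERROR", "example.com", "not-so-random.example.com", "A", "192.0.2.10")
```

In this sketch a and b compare equal and c differs from both, so
exhausting the NXDOMAIN bucket does not touch the bucket for
not-so-random.example.com.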

> > Recursive servers should generally not need RRL, because they
> > shouldn't be open and so needn't worry about reflection DDoS
> > attacks.
>
> On the other hand, accidental DoS of resolver service triggered by
> garbage queries from badly written clients is a problem, isn't it?
> It's not something RRL intends to solve, but I worry that it makes
> matters worse.

Please be more specific about garbage queries from badly written
clients that might trigger RRL and that are not intended to be DoS
attacks.  Are those queries for non-existent domains?

Some DNS clients such as browsers rendering pages with many <IMG> tags
and SMTP servers receiving bursts of spam might repeat single DNS
requests more than reasonable DNS reflection DoS attack mitigation
rates.  However, the mechanisms that ensure that victims of reflection
attacks continue to get some DNS service during attacks work even
better for such stuttering DNS clients.

For example, an SMTP server receiving 1000 spam from 10.2.3.4 might
make 1000 requests for 4.3.2.10.dnsbl.example.com.
With "rate-limit { responses-per-second 15; };" and if the spam are
sent sequentially:
 1. the first 15 responses for the first 15 spam will be sent immediately.
 2. the response to the 16th query will be dropped and the SMTP server
     will time out after perhaps 3 seconds and try again.
 3. the retransmitted 16th query and the 17th-30th will be answered,
     because the response rate for 4.3.2.10.dnsbl.example.com will have
     dropped below 15.
 4. repeat from step #2 for the remaining spam.
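The steps above can be sketched as a toy simulation.  It is a
simplification, not how any RRL implementation is coded: it assumes
queries take no time, that a dropped response costs nothing, that the
credit refills to at most one second's worth, and it ignores "slip".
The function name and parameters are hypothetical.

```python
def simulate_sequential(total=1000, rate=15, timeout=3.0):
    """Return (timeouts, seconds_waited) for `total` back-to-back queries."""
    balance = rate      # responses currently allowed; starts full
    clock = 0.0         # time spent waiting on timeouts
    timeouts = 0
    answered = 0
    while answered < total:
        if balance > 0:
            # Under the limit: the response is sent immediately.
            balance -= 1
            answered += 1
        else:
            # Response dropped: the client waits out its timeout and
            # retries; meanwhile the credit refills, capped at `rate`.
            timeouts += 1
            clock += timeout
            balance = min(rate, balance + int(rate * timeout))
    return timeouts, clock
```

Under those assumptions, 1000 sequential spam cost 66 timeouts, or
about 198 seconds of waiting in total.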

If the 1000 spam are handled within 1 second or if a web browser is
rendering a page with 30 <IMG> tags for images on www.example.com, then
 1. the first 15 DNS responses are sent, half of the remaining responses
     are dropped, and the rest are "slipped" or answered with truncated (TC=1)
     responses.
 2. the DNS client immediately retries the slipped half over TCP.
     It times out the dropped half and, in the worst case, retransmits
     in the same second.
 3. the DNS server will answer none of the retransmissions, because
     the first burst of 1000 drove the count negative and we're
     still in the window.  It will "slip" half and drop the other half.
 4. repeat from #2
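A rough sketch of how one such burst might be classified.  The
function name, the slip value of 2, the window of 15 seconds, and the
choice of which rate-limited responses get slipped are assumptions for
illustration; real implementations differ in the details.

```python
from collections import Counter

def classify_burst(n, rate=15, slip=2, window=15):
    """Classify n identical responses arriving within one second."""
    balance = rate              # credit for this second
    floor = -window * rate      # the debt is capped by the window
    limited = 0                 # rate-limited responses seen so far
    outcome = Counter()
    for _ in range(n):
        balance = max(floor, balance - 1)
        if balance >= 0:
            outcome["sent"] += 1
        else:
            limited += 1
            if slip and limited % slip == 0:
                outcome["slipped"] += 1   # answered with TC=1
            else:
                outcome["dropped"] += 1
    return outcome, balance
```

With these assumptions, classify_burst(30) gives 15 sent, 7 slipped,
and 8 dropped with a balance of -15, while classify_burst(1000) leaves
the balance at the floor of -225, which is why retransmissions in the
same window are still rate limited.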

For 30 concurrently, simplistically rendered <IMG> tags, there
will be 1 timeout.  If your web browser limits itself to fewer than
15 concurrent connections to any single HTTP server, it probably
won't be affected at all.

For 1000 concurrent spam, wouldn't it be wiser to rate limit your spammers?

Vernon Schryver    vjs at rhyolite.com