[dns-operations] omnibus reply (Re: solutions for DDoS mitigation of DNS)
Paul Vixie
paul at redbarn.org
Thu Apr 2 20:03:41 UTC 2020
there has been quite a bit of factual confusion on this thread while i slept;
so much so that i can't really figure out where to chime in most usefully. so
i'll answer three questions which seem most pertinent, choosing the best
example of each question from the thread before me.
---
first: On Thursday, 2 April 2020 12:54:41 UTC Tessa Plum wrote:
> On 2020/4/2 5:39 PM, Ray Bellis wrote:
> > If it's an authoritative server, turn on Response Rate Limiting (RRL) if
> it's BIND, or the equivalent feature if it isn't.
>
> Yes they are authoritative servers.
> Does RRL work based on IP addr? but the requesting IP seems spoofed.
the authoritative reference to all things DNS RRL related is here:
http://www.redbarn.org/dns/ratelimits
which refers to this document:
http://family.redbarn.org/~vixie/isc-tn-2012-1.txt
which answers your question as follows:
> ISC-TN-2012-1-Draft1 DNS Response Rate Limiting April-2012
>
> 3 - Responder Behaviour
>
> 3.1. When generating a response, a server will take the requestor's IP
> address and mask it according to either IPV4-PREFIX-LENGTH or
> IPV6-PREFIX-LENGTH, and then impute a domain name which is either a
> wildcard name (if a wildcard match occurred) or the zone name (if no
> match occurred) or the query name, and a boolean error indicator (was
> the response code REFUSED, FORMERR or SERVFAIL, or was it not?), and use
> this tuple <mask(IP), imputed(NAME), errorstatus> to select a state
> blob, creating this if necessary.
>
> 3.2. If the selected state blob indicates that this response has been
> sent too often to requestors on this network, then consider whether to
> send a truncated response, or a leaked response, or no response. In any
> case increment a counter to indicate that the response has been
> considered.
>
> 3.3. When a state blob's age goes over WINDOW, and its counter has not
> been incremented within WINDOW, then discard the state blob.
>
> 3.4. In the event that the creation of a new state blob would cause the
> table to exceed MAX-TABLE-SIZE, the least recently used state blob
> should be discarded.
>
> 3.5. Noting: Conceptually speaking, a state blob is either filling,
> full, or draining. To be filling means that the rate limit has not been
> exceeded. To be full means that the rate limit has been exceeded. To be
> draining means that the rate limit was once exceeded and the rate has
> not yet returned to zero.
the document is short, and worthy of reading or re-reading.
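
for readers who prefer code, here is a minimal python sketch of sections 3.1
through 3.4 as quoted above. it is an illustration under stated assumptions,
not the accounting used by BIND or any other server: the constant values, the
RateLimiter class, the decide() method, and the SLIP setting are all
hypothetical names chosen for this example, and a simple token bucket stands
in for whatever rate accounting a real implementation uses.

```python
import ipaddress
import time
from collections import OrderedDict

# all values below are assumptions for illustration only
IPV4_PREFIX_LENGTH = 24
IPV6_PREFIX_LENGTH = 56
WINDOW = 15.0              # seconds a state blob may sit idle
RESPONSES_PER_SECOND = 5   # token refill rate and burst size
MAX_TABLE_SIZE = 100_000
SLIP = 2                   # every SLIP-th limited response is truncated
ERROR_RCODES = {"REFUSED", "FORMERR", "SERVFAIL"}


def mask_ip(addr):
    """3.1: mask the requestor's address to its /24 or /56 network."""
    ip = ipaddress.ip_address(addr)
    plen = IPV4_PREFIX_LENGTH if ip.version == 4 else IPV6_PREFIX_LENGTH
    return str(ipaddress.ip_network(f"{addr}/{plen}", strict=False))


class RateLimiter:
    def __init__(self):
        # insertion order doubles as least-recently-used order
        self.table = OrderedDict()

    def decide(self, src, imputed_name, rcode, now=None):
        """return "send", "truncate" or "drop" for one candidate response."""
        now = time.monotonic() if now is None else now
        # 3.1: the tuple <mask(IP), imputed(NAME), errorstatus>
        key = (mask_ip(src), imputed_name.lower(), rcode in ERROR_RCODES)
        blob = self.table.get(key)
        # 3.3: discard a blob left idle for longer than WINDOW
        # (done lazily here, only when the same key is looked up again)
        if blob is not None and now - blob["last"] > WINDOW:
            del self.table[key]
            blob = None
        if blob is None:
            # 3.4: evict the least recently used blob when the table is full
            if len(self.table) >= MAX_TABLE_SIZE:
                self.table.popitem(last=False)
            blob = {"tokens": float(RESPONSES_PER_SECOND), "last": now, "limited": 0}
            self.table[key] = blob
        else:
            # refill tokens for the time elapsed since the last response
            elapsed = now - blob["last"]
            blob["tokens"] = min(float(RESPONSES_PER_SECOND),
                                 blob["tokens"] + elapsed * RESPONSES_PER_SECOND)
            self.table.move_to_end(key)
        blob["last"] = now  # 3.2: record that this response was considered
        if blob["tokens"] >= 1.0:
            blob["tokens"] -= 1.0
            return "send"
        # 3.2: over the limit, so truncate some responses and drop the rest
        blob["limited"] += 1
        return "truncate" if blob["limited"] % SLIP == 0 else "drop"
```

note that two different hosts inside the same /24 share one state blob, which
is the answer to the "but the requesting IP seems spoofed" question: the limit
applies to the masked network and the imputed name, not to a single address.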
---
second, On Thursday, 2 April 2020 10:22:21 UTC Jim Reid wrote:
> > On 2 Apr 2020, at 11:10, Davey Song <songlinjian at gmail.com> wrote:
> > I'm very confused as to why people on the list are suggesting RRL (even
> > BCP38) to the victim of a DoS attack. If I remember correctly, the goal of
> > both RRL and BCP38 is to reduce the chance of participating in the attack
> > as an innocent helper.
> RRL won’t help with the volume of incoming queries. It will however reduce
> the volume of outgoing responses which may well be DoS’ing another innocent
> victim.
this is true as far as it goes, but does not go far enough. some attacks are
against distant victims whose source-ip's are therefore spoofed, and in that
case the source-ip's are in a narrow band and DNS RRL will prevent an
authority server from participating as an amplifier of that attack.
other attacks are against the authority server itself, and so the spoofed-
source IP addresses are somewhat irrelevant; they do not identify a victim,
they are merely randomized in order to hide the identity of the attacker. in
this case the attack is against the authority server's capacity, which can be
seen as three critical resources: inbound network path, outbound network path,
and server CPU.
if the attack is large enough to congest your inbound network path, then your
only fix is to add more servers in other locations (having different inbound
network paths.) you may also want to consider a service like "akamai
cleanfeed" which can, with your cooperation, advertise via global BGP the
address of your attacked servers, and route that traffic through a "scrubbing
center". akamai has competitors in this arena, but they're the one i know
well.
if the attack is not large enough to congest your inbound network path, then
it may be possible to protect your outbound network path or your CPU using
some kind of filtering. DNS RRL is an example of such filtering. by not
answering predicted-to-be-spoofed queries, you save the CPU time used to
assemble responses, and you save the outbound network capacity needed to
transmit such responses.
some have asked, isn't this a trivial obstruction that a correctly functioning
attacker can bypass with creative randomness in their spoofed-source IP
address generator? and the answer is, "yes that's usually true". however, not
all attackers function correctly in this regard, and of those who do, there is
a maximum number of /24 (or /56) flow buckets they can use, which at 20 Gb/s
in IPv4 requires reuse, which will lead DNS RRL to attenuate non-victim flows.
in IPv6 the number of possible buckets is far greater than DNS RRL's state
capacity and so in that case you'll need something like strict-mode uRPF,
which requires a full routing table (no default route) so that you can drop
all packets whose source address isn't covered by an explicit BGP path. this
way lies madness, and if you're up against a correctly functioning attacker,
you'll lose more often than you'll win, and you'll need a "scrubbing service"
that can take over the advertisement of your network's reachability during
times of DDoS, and which does not require you to host your servers elsewhere.
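
the bucket arithmetic behind the paragraph above can be sketched as follows.
the 20 Gb/s figure and the /24 and /56 prefix lengths come from the text; the
100-byte on-the-wire query size and the function name are assumptions made
only for this example, so treat the results as rough orders of magnitude.

```python
def queries_per_bucket_per_second(bits_per_second, packet_bytes, prefix_length):
    """average query rate per flow bucket if sources are spread evenly."""
    buckets = 2 ** prefix_length          # number of distinct masked networks
    queries_per_second = bits_per_second / (packet_bytes * 8)
    return queries_per_second / buckets


ATTACK_BPS = 20e9      # 20 Gb/s, from the text
PACKET_BYTES = 100     # assumed query size on the wire

ipv4 = queries_per_bucket_per_second(ATTACK_BPS, PACKET_BYTES, 24)
ipv6 = queries_per_bucket_per_second(ATTACK_BPS, PACKET_BYTES, 56)

print(f"IPv4 /24 buckets: {2 ** 24}, queries per bucket per second: {ipv4:.2f}")
print(f"IPv6 /56 buckets: {2 ** 56}, queries per bucket per second: {ipv6:.2e}")
```

under these assumptions every IPv4 /24 bucket is hit more than once per
second, so the attacker cannot avoid reuse, while in IPv6 the attacker can use
a fresh /56 for every single query and never touch the same bucket twice.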
---
third and last, On Thursday, 2 April 2020 11:37:36 UTC Klaus Darilion wrote:
> ... It is not that I
> argue against rate limiting, but that admins should be aware when it
> actually helps, and when not. ...
>
> We also used rate limiting with dnsdist, but due to the mentioned
> problems we switched to high performance backends for the zones which
> are under constant attack.
there is never a time when DNS RRL won't help, but it may not be _enough_.
DNS RRL should be the default for all authority servers, subject to tuning,
but never requiring knowledge or action by operators.
if you turn on DNS RRL on an authority server that you didn't think was being
abused or attacked, you will see a drop in your egress traffic.
turn it on and keep it on. use the default recommended settings unless you're
interested in operational research.
once that's been done, solve whatever problems you still have, along the lines
i explained last night:
* subscribe to a "DDoS scrubbing service"
* add more network capacity
* use local anycast to increase the per-logical-server capacity
* add more secondary servers
open source DNS software and OSPF ECMP are adequate here; you do not need a
commercial load balancer or a commercial DNS appliance.
again, DNS RRL has no downside. i hereby call upon all DNS vendors to make it
their default.
--
Paul