[dns-operations] rate-limiting state
colm at stdlib.net
Fri Feb 7 17:50:47 UTC 2014
On Fri, Feb 7, 2014 at 9:35 AM, Paul Vixie <paul at redbarn.org> wrote:
> Colm MacCárthaigh wrote:
> > ... But [RRL] also increases the collateral damage to your legitimate
> > users. If an attacker spoofs a popular resolver, then for just 5-10
> > queries per second they cause a degradation in service to the real
> > legitimate users of that resolver. With the default settings, 12.5% of
> > queries from that resolver may not get any answer at all, even with
> > three attempts, and the lookup time is increased by about 1.3 RTTs on
> > average. With the resolver trying 3 authoritative nameservers, the
> > availability hit diminishes to about 0.2% (which brings us to two
> > nines), but the RTT hit gets worse.
> You're calling it damage for some reason, even though it's not
> stub-visible. RRL does of course rely on retries and TC=1 and other
> known recursive-dns behaviour. That's because RRL is a protocol-specific
> method. A non-DNS protocol would probably call for a different method.
> If we take your 30% average RTT impact to heart, we're moving the needle
> for a stub transaction from 20ms to 25ms. I'm unwilling to call that
> damage, collateral or otherwise.
I got the RTT impact wrong by forgetting that TCP would itself take extra
round-trips. It's closer to 2.5x if I account for that. If you have a 20ms
average RTT as a base-case, you're probably using anycast at a bunch of
datacenters and should deploy more sophisticated techniques. I'm more
worried about RRL as the default for the small implementors who are in a
relatively small number of locations.
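For reference, the 12.5% and 0.2% figures quoted at the top follow from treating each rate-limited attempt as an independent coin flip. This is my own reconstruction, assuming RRL's default slip of 2, under which a fully rate-limited query is silently dropped roughly half the time:

```python
# Sketch under my own assumptions (slip 2: each fully rate-limited
# attempt is silently dropped with probability ~0.5); not code from
# the thread.
p_drop = 0.5

# No answer after three attempts at one authoritative server:
p_fail_one_server = p_drop ** 3        # 0.125 -> the "12.5%" figure

# The resolver also tries 3 authoritative servers (9 attempts total):
p_fail_three_servers = p_drop ** 9     # ~0.00195 -> roughly "0.2%"

print(f"{p_fail_one_server:.1%}  {p_fail_three_servers:.2%}")
```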
> > Now if I have a botnet or client that can generate 1M PPS (this is
> > small, but adjust to any number), I can spoof 66,666 popular
> > resolvers (this is a knowable set) at 5 QPS each to 3 auth servers,
> > and use RRL to degrade service in a more widespread way.
> > Now, let's say you have the capacity to answer these queries (which is
> > realistic for some) which behavior is better for your users? Just
> > answering the responses? Or rate-limiting the responses?
> Rate limiting is always better, given that recursive servers will retry,
> will act on TC=1, and will stop asking once they cache the result.
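The arithmetic behind the quoted 66,666 figure can be sketched as follows; this is my own reconstruction, assuming the attacker's packet budget is divided evenly across victim resolvers:

```python
# My reconstruction of the quoted scenario, not code from the thread.
attacker_pps = 1_000_000   # attacker's spoofed-query budget
qps_per_resolver = 5       # spoofed rate per victim resolver
auth_servers = 3           # authoritative servers hit per resolver

resolvers_targetable = attacker_pps // (qps_per_resolver * auth_servers)
print(resolvers_targetable)  # 66666, matching the figure in the thread
```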
Just to be clear: you're saying it's better, for your legitimate users,
to answer their queries only probabilistically? I agree that they'll retry,
and with 2 retries they'll even get a TC=1 87.5% of the time. But I have to
consider that a degradation of service. And what about the 12.5% who got no
answer at all? Do you consider that a degradation?
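The 87.5% above is just the complement of three consecutive misses, assuming (my assumption, consistent with slip 2) that each rate-limited attempt independently yields a TC=1 with probability 0.5:

```python
# Sketch under my assumption of slip 2: each suppressed attempt is a
# TC=1 with probability 0.5, a silent drop otherwise.
p_tc1 = 0.5
attempts = 3  # the original query plus 2 retries
p_tc1_within = 1 - (1 - p_tc1) ** attempts
print(f"{p_tc1_within:.1%}")  # 87.5%
```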
> > My overall point is that with RRL there is some trade-off between
> > protecting innocent reflection victims and opening yourself to an
> > attack that degrades service to your real users in some way. Were RRL
> > to be widely deployed, attacks could shift to table exhaustion and
> > popular-resolver spoofing and be effective in different ways.
> There is no operable trade-off of the kind you're proposing. RRL makes
> everyone's life better except the attackers, in all cases. The "degrade"
> you're describing is far better than the non-RRL case, and is in any
> case not user-visible.
If I answer all of the queries, that is 100% non-user-visible: all of my
user-facing queries get answers. If I use RRL, my users' queries can be
degraded, and that is user-visible, including to stubs, even with caching.
If you cause a caching resolver to delay or time out lookups, that does
hold up and impact the stubs behind it.
> Are you criticizing RRL for using the known
> behaviour of recursive servers (retrying, respecting TC=1, ceasing to
> ask once an answer is obtained) deliberately to increase resiliency?
No, I think that's smart. Responding with TC=1 all of the time would make
me a little more comfortable with the impact, though like you I would then
question its efficacy against reflection attacks.