[dns-operations] rate-limiting state

Fri Feb 7 20:51:10 UTC 2014

Colm MacCárthaigh wrote:
>
> On Fri, Feb 7, 2014 at 9:35 AM, Paul Vixie <paul at redbarn.org> wrote:
>
>     Colm MacCárthaigh wrote:
>
>     > Now if I have a botnet or client that can generate 1M PPS (this is
>     > small, but adjust to any number), I can try to spoof 66,666 popular
>     > resolvers (this is a knowable set) at 5 QPS each to 3 auth servers, I
>     > can use RRL to degrade service in a more widespread way.
>     >
>     > Now, let's say you have the capacity to answer these queries (which is
>     > realistic for some) which behavior is better for your users? Just
>     > answering the responses? Or rate-limiting the responses?
>
>     Rate limiting is always better, given that recursive servers will retry,
>     will act on TC=1, and will stop asking once they cache the result.
>
>
> Just to be clear; you're saying it's better, for your legitimate
> users, only to answer their queries probabilistically? I agree that
> they'll retry, and with 2 retries they'll even get a TC=1 87.5% of the
> time. But I have to consider that some kind of degradation of service.
> For the 12% who got no answer, do you consider that some kind of
> degradation?
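
[editor's sketch] The figures quoted above (66,666 resolvers, 87.5%, and
roughly 12%) can be reproduced with a few lines of arithmetic. This is only
an illustration under stated assumptions, not anything taken from an RRL
implementation: the function names are hypothetical, each rate-limited
response is assumed to be dropped independently with probability 0.5 (with
TC=1 sent otherwise), and "2 retries" is read as 3 attempts in total.

```python
def resolvers_spoofable(attack_pps, qps_per_resolver, auth_servers):
    """How many resolver addresses an attacker can impersonate.

    Each spoofed resolver costs qps_per_resolver packets per second
    against every one of the auth_servers targeted.
    """
    return attack_pps // (qps_per_resolver * auth_servers)


def p_some_response(attempts, p_drop=0.5):
    """Chance that at least one of `attempts` queries gets a reply.

    ASSUMPTION: every attempt is dropped independently with
    probability p_drop; otherwise a TC=1 response is sent.
    """
    return 1 - p_drop ** attempts


if __name__ == "__main__":
    # 1M PPS spread over 3 auth servers at 5 QPS per spoofed resolver
    print(resolvers_spoofable(1_000_000, 5, 3))  # 66666
    # initial query plus 2 retries
    print(p_some_response(3))                    # 0.875
    print(1 - p_some_response(3))                # 0.125
```

Under these assumptions the "12%" in the quoted text is the 12.5% of
resolvers whose three attempts were all dropped.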

since you've asked for clarity, let me provide it as follows.

for the case of an attack against the name server itself, not a
reflection victim via ddos but a pool of response-starvation victims via
a logic attack on RRL itself, it is theoretically worse for the
response-starvation victims to have RRL deployed. i say "theoretically"
because the impact would be (a) exceedingly brief, (b) exceedingly
narrow, (c) not user-visible, and must be (d) exquisitely and
expensively well targeted. in this one corner case, my statement "always
better" is wrong.

for the case of an attack against a reflection victim via name server
reflected DDoS, the impact on response-starvation victims due to the RRL
logic will be no worse than the impact of not having RRL, and if the
attack is large, it will be better with RRL than without. therefore my
statement "always better" should have been written "never worse" and i
apologize.

> ... If I use RRL, my user queries can be degraded. And that is user
> visible, including to stubs, even with caching. If you cause a caching
> resolver to delay or timeout lookups that does hold up and impact stubs.

if any delay at all is to you "user visible" even though once cached the
same delay won't reoccur within a DNS TTL interval, then so be it. DNS
RRL is a DNS-specific rate limiter which relies on retries, TC=1
behaviour, and caching -- by design, mind you -- for its success. i'm
not merely splitting hairs here -- in my own testing, the only way i
could cause a stub lookup failure, noting that a stub tries multiple
recursive servers and will retry to each, was to send enough attack
traffic toward all of that stub's recursives to also cause failures on
unrelated names. in that latter case, it made no difference whether RRL
was off or on, because the attack was on the recursive name server's
resources, not the RRL logic. so, if you know a way to reliably cause
targeted stub query failures using an attack that only works with RRL
turned on, i'd like to see your demonstration.

vixie
