[dns-operations] More Aggressive prefetch for popular names

Mon Apr 8 21:20:31 UTC 2019

On 6 Apr 2019, at 10:39, Davey宋 <ljsong at biigroup.cn> wrote:

> You are right in the normal case. But name owners may want to change their TTL promptly under some events: setting a larger TTL in case of a DDoS attack, or a smaller TTL if they want their RRs refreshed in an urgent situation. But each change needs to wait for the previous TTL to time out in the current DNS TTL context.

I think if you have a strong business requirement for rapid changes to particular RRSets, you set a low TTL on those RRSets.

The authoritative DNS services of my acquaintance are provisioned orders of magnitude beyond the steady state capacity required to handle queries in order to have an abundance of headroom to deal with junk. Under high load, the non-junk is in the noise floor and increasing the non-junk, even by a factor of (say) ten, is not going to have a substantive impact on available capacity.
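
To make the headroom argument concrete, here is a minimal sketch of the arithmetic. Both figures are hypothetical, picked only so that provisioned capacity sits three orders of magnitude above the legitimate load; they are not measurements from any real service.

```python
# Hypothetical figures, for illustration only (not taken from any real deployment).
PROVISIONED_QPS = 1_000_000   # assumed total capacity of the authoritative service
LEGIT_QPS = 1_000             # assumed steady-state non-junk query rate


def share_of_capacity(load_qps: float, capacity_qps: float) -> float:
    """Fraction of the provisioned capacity consumed by a given load."""
    return load_qps / capacity_qps


before = share_of_capacity(LEGIT_QPS, PROVISIONED_QPS)
after = share_of_capacity(LEGIT_QPS * 10, PROVISIONED_QPS)  # tenfold non-junk

print(f"non-junk share of capacity: {before:.1%} -> {after:.1%}")
```

Under these assumptions a tenfold increase in non-junk traffic still leaves almost all of the provisioned capacity free for absorbing junk.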

> AFAIK, in a real event the name owners need to call the ISP DNS operators to manually refresh their caches. ISPs are willing to cooperate for popular names because they care about the users' experience (people will call in and complain during the TTL). If there is a widely accepted approach to notify them to fetch the changes, I think they are willing to hear it.

It is true in my experience that from time to time someone has a bad day, and getting resolvers to flush their caches for particular names is a way to make the aftermath less hellish. For most people, though, that's a self-limiting problem; you lower your TTLs afterwards, or remove root privileges from the person who did it, or something.

I do wonder about the potential for gaming "popular" in your scheme. If I can persuade a large number of resolvers that a particular domain is popular, then I might be able to increase the query rate from all of those resolvers towards the authoritative servers for that domain significantly.

Imagine a rarely-used zone hosted on a set of nameservers that have limited connectivity (but, demonstrably, sufficient based on years of experience). I use an ad network or a botnet (or a set of open resolvers that I query directly) to send regular queries for names in that domain through 100,000 open resolvers. Many of them will be answered from the cache until the resolvers all decide that this name is popular, at which point they ignore the 604,800-second TTLs and start sending queries every 30 seconds. 100,000 resolvers will now send roughly 3,300 queries per second in aggregate (100,000 / 30) to servers whose normal average query rate might be 30 per hour, and they will continue to do so until they stop believing the name is popular. Now multiply that by 10,000,000 rarely-used domains and wonder what happens to the capacity planning for resolvers.
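
The arithmetic behind that scenario can be sketched as follows. The resolver count, TTL, and refresh interval come from the paragraph above; the function name and the simplification that each resolver refreshes exactly one cached name on a fixed interval are assumptions made for illustration.

```python
# Back-of-the-envelope model: every resolver re-queries one name on a fixed interval.
# This ignores retries, multiple names per zone, and uneven timing (assumptions).

def aggregate_qps(resolvers: int, refresh_interval_s: float) -> float:
    """Queries per second arriving at the authoritative servers in aggregate."""
    return resolvers / refresh_interval_s


RESOLVERS = 100_000
TTL_S = 604_800      # one week, the TTL published by the zone
PREFETCH_S = 30      # refresh interval once the name is deemed "popular"

honouring_ttl = aggregate_qps(RESOLVERS, TTL_S)      # resolvers obey the TTL
prefetching = aggregate_qps(RESOLVERS, PREFETCH_S)   # resolvers ignore the TTL

print(f"honouring the TTL: {honouring_ttl * 3600:.0f} queries/hour")
print(f"prefetching:       {prefetching:.0f} queries/second")
print(f"amplification:     {prefetching / honouring_ttl:.0f}x")
```

With these inputs the aggregate comes to a little over 3,300 queries per second, and the amplification factor is simply the ratio of the published TTL to the prefetch interval.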

I'm not sure I agree that there is a problem here to be solved, but I do think there might well be a problem waiting to be caused.

Joe