[dns-operations] More Aggressive prefetch for popular names
phoffman at proper.com
Mon Apr 8 03:14:02 UTC 2019
On 7 Apr 2019, at 19:03, Davey Song wrote:
>> The "popular sites" you mention have all done this already. They also
>> tend to use services like Akamai, which use short TTLs, dynamic
>> and CDNs which limit the types of damage that you are describing.
> I missed one case of "outage of popular names during the TTL": the
> short DNS TTL of a CDN (5 minutes, for example) will be ignored and
> raised by resolver operators to as much as 2-3 hours due to some
> policy conflicts.
Please describe these "policy conflicts", and how they appear for some
names but not for others.
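A minimal sketch of the override described in the quoted text. It assumes a resolver that clamps each authoritative TTL into a configured [floor, ceiling] range, in the spirit of Unbound's cache-min-ttl and cache-max-ttl options. The function name and the 2-hour floor are hypothetical, chosen to match the 5-minute CDN TTL and the 2-3 hour figure from the message. Prefetch and serve-stale are ignored.

```python
# Hypothetical model of a resolver TTL policy (names and values are
# illustrative, not taken from any particular resolver's code).

def effective_ttl(auth_ttl: int, min_ttl: int = 0, max_ttl: int = 86400) -> int:
    """Seconds the resolver keeps an answer cached after fetching it.

    The authoritative TTL is raised to min_ttl if it is below the floor
    and lowered to max_ttl if it is above the ceiling.
    """
    return max(min_ttl, min(auth_ttl, max_ttl))


if __name__ == "__main__":
    cdn_ttl = 300           # 5-minute TTL published by the CDN
    floor = 2 * 3600        # hypothetical 2-hour minimum TTL at the resolver

    cached_for = effective_ttl(cdn_ttl, min_ttl=floor)
    # If the CDN repoints the name just after the resolver cached it, clients
    # of that resolver can keep receiving the old answer this much longer
    # than the CDN intended.
    extra_stale = cached_for - cdn_ttl

    print(cached_for)   # 7200
    print(extra_stale)  # 6900
```

With the floor left at 0 the function returns the CDN's own 300 seconds, which is the behavior the authoritative operator planned for.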
> It occurred once or twice a month, as observed at one large
> CDN operator I'm familiar with. I'm not sure how Akamai or Cloudflare
> handle this, but it happens every month, and people are suffering.
Please describe who is suffering, and how they suffer. (It feels like
this could be an exaggeration.)
> It is partially due to the different interests of recursive and
> authoritative operators and the loose coordination between them, as
> people have mentioned. But I have also observed that resolver operators
> have the motivation and the tools to set a policy of a minimum TTL or a
> larger TTL. They care more about the rate of cache misses than the rate
> of serving stale data. Normally they are cooperative if they receive a
> call and notice the conflicts for names on a case-by-case basis, but
> there seems to be no automatic approach agreed in advance of an event
> between resolver and authoritative operators.
As the previous messages from others have said, the automatic approach
is the setting of TTLs.
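The trade-off the quoted text attributes to resolver operators (fewer cache misses versus more stale answers) can be put in rough numbers. This is a minimal sketch that assumes a name popular enough that a query arrives right after every expiry, so each TTL period costs the resolver one upstream fetch. It ignores prefetch and serve-stale. Under those assumptions the TTL the resolver honors sets both its upstream query rate and its worst-case staleness. The function names are hypothetical.

```python
# Rough per-resolver arithmetic for one popular name (illustrative only).
# Assumes continuous demand: every expiry is followed at once by a cache miss.
# Prefetch and serve-stale are ignored.

def upstream_fetches_per_hour(ttl: int) -> float:
    """One cache miss, and hence one authoritative query, per TTL period."""
    return 3600 / ttl


def max_stale_seconds(ttl: int) -> int:
    """An answer cached just before a change can be served for a full TTL."""
    return ttl


if __name__ == "__main__":
    for ttl in (300, 3600, 7200):
        print(f"TTL {ttl:>5}s: {upstream_fetches_per_hour(ttl):5.1f} fetches/hour, "
              f"up to {max_stale_seconds(ttl)}s stale")
```

Moving from 300 to 7200 seconds cuts the fetches from 12 to 0.5 per hour per resolver, at the cost of answers that can be up to two hours out of date. The authoritative TTL is where that choice is expressed.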
>> We have to get out of the mindset that it's our job to fix someone
>> else's mistakes.
> Mistakes by both resolver and authoritative servers are observed. I'm
> not writing this to add more straw to the camel's back. I would just
> like to know of any best practice on this issue from this mailing list.
> Or is it nothing but other people's problem?
It is the problem of the authoritative servers when they guess wrong
about their TTLs, and then they learn to guess better in the future.
That's not "other people": it is all of us. Where you are getting
pushback is in trying to fix other people's problems in ways that make
the system more fragile.