[dns-operations] More Aggressive prefetch for popular names
samwu at dnspod.com
Mon Apr 8 17:46:50 UTC 2019
I can share some stories.
There are many ISPs in China, some bigger, like China Telecom / China Unicom, and some smaller, like Great Wall Broadband / Topway. In China, traffic charges between ISPs were extremely high, and sometimes they could be used as a weapon against competitors. So DNS hijacking/tampering/spoofing is a very common method in China. ISPs set up CDN/cache servers in their own networks and tamper with DNS responses, returning larger TTLs and hijacking traffic to their own CDN, to avoid expensive bills (payments to other ISPs).
Sometimes an ISP's recursive DNS may encounter a network outage, or have problems communicating with the authoritative DNS (usually because the authoritative DNS is under attack). The ISP, or the authoritative DNS operator itself, will return a larger TTL to clients, to ensure clients can still reach the website even if the DNS server is down. When the server comes back, they need to refresh the recursive resolver's cache. Cases like this happened every week.
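To illustrate the mechanism, here is a minimal Python sketch of a cache that raises every TTL to a floor. It is only a toy model, not any ISP's actual implementation: the class name `TtlFloorCache`, the one-day floor, and the example name and addresses are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    address: str
    expires_at: float  # absolute time at which the cached answer expires


class TtlFloorCache:
    """Toy resolver cache that raises every TTL to at least min_ttl seconds."""

    def __init__(self, min_ttl: int) -> None:
        self.min_ttl = min_ttl
        self.entries: Dict[str, Entry] = {}

    def store(self, name: str, address: str, ttl: int, now: float) -> int:
        # The TTL published by the authoritative server is overridden
        # whenever it is below the configured floor.
        effective_ttl = max(ttl, self.min_ttl)
        self.entries[name] = Entry(address, now + effective_ttl)
        return effective_ttl

    def lookup(self, name: str, now: float) -> Optional[str]:
        entry = self.entries.get(name)
        if entry is None or now >= entry.expires_at:
            return None  # cache miss: a real resolver would query upstream
        return entry.address

    def flush(self, name: str) -> None:
        # Manual purge of a single name from the cache.
        self.entries.pop(name, None)


def demo() -> List[Optional[str]]:
    cache = TtlFloorCache(min_ttl=86400)  # hypothetical one-day floor
    cache.store("www.example.com", "192.0.2.1", ttl=300, now=0)
    # Suppose the zone owner changes the record shortly afterwards. One hour
    # later the cache still answers with the old address, long after the
    # published 300-second TTL has passed.
    stale = cache.lookup("www.example.com", now=3600)
    cache.flush("www.example.com")
    after_flush = cache.lookup("www.example.com", now=3600)
    return [stale, after_flush]
```

With a floor like this, a record change on the authoritative side is invisible to clients until the extended TTL runs out or somebody flushes the entry by hand.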
In 2010, baidu.com was hijacked: the attacker changed its DNS records, and it took days to flush the recursive resolvers' caches.
Yes, of course, ISPs provide a cache-flushing service because of the high demand. And of course, it is not cheap.
> On Apr 8, 2019, at 10:05 PM, Paul Hoffman <phoffman at proper.com> wrote:
> It sounds like you are saying "some resolver operators extend the TTLs given in DNS responses, and this causes problems". If so, we agree.
> If, however, you are saying "some resolver operators extend the TTLs given in DNS responses, and this causes problems that the rest of the DNS community should solve", we disagree. The customers of that resolver will have problems like the one you listed, and that is fully the fault of the resolver operator.
> On 7 Apr 2019, at 21:26, Davey Song wrote:
>> A local resolver may have a policy/strategy of setting a larger TTL to reduce
>> upstream traffic, in order to increase the cache hit rate and improve response
>> time. Sometimes a local resolver has a policy to serve stale data in case of
>> network failure after the TTL expires. There may be other situations that cause
>> the cache to serve stale data.
> This is a misconfiguration on the part of the resolver operator.
>> If any intentional operation, software bug, or manual misconfiguration
>> on a resolver causes the serve-stale situation, it becomes a
>> problem for names whose records (such as NS, A/AAAA) change during the period
>> of stale data in the cache, but not for names that stay unchanged.
>> The recent event last week involved a name of the CCTV VOD services: people
>> called in complaining that they could not open the video. It was found that in Gang
>> Zhou City, the DNS of a local broadband service provider served stale data
>> for that name for hours. It is not clear which conflict or bug caused the
>> trouble, but the fact is that the cache of that local ISP and the caches of
>> downstream forwarders were affected. It takes time to purge those caches.
> Exactly right. That resolver operator should investigate the bug (most likely in their configuration, possibly in the unnamed resolver software they are using) and prevent it from happening in the future.
>> I did it in the above. It does not sound like an exaggeration, I think. If
>> you are talking with CDN/Cloud people, this is a typical operation issue
>> they need to face.
> It is not "typical": we rarely hear of this problem.
>> No. The DNS at ISPs and telcos did something wrong.
> One which can be fixed.
>> As you are one of the authors of
>> DoH, you must know how name owners want to bypass the DNS in the middle.
> This makes no sense. DoH was developed so that applications can use resolvers securely using HTTP semantics; that's completely unrelated to what you have said above.
> --Paul Hoffman
> dns-operations mailing list
> dns-operations at lists.dns-oarc.net