[dns-operations] OpenDNS, Google, Nominet - New delegation update failure mode

Brian Somers bsomers at opendns.com
Thu Apr 2 20:49:55 UTC 2020


I’ve flushed shopdisney.co.uk/NS globally. It should work now for Umbrella/OpenDNS/Cisco.
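Flushing shopdisney.co.uk/NS discards the cached NS RRset. The resolver's next lookup then has to rediscover the delegation, presumably through a fresh referral from the co.uk servers. The toy Python model below is a cache keyed by (name, type) with an absolute expiry time. It illustrates why the stale set would otherwise have lingered: in the trace quoted below, the cached NS entry still had ttl=87746, roughly 24 hours. `ToyCache` and its methods are hypothetical names chosen for this sketch, not Umbrella/OpenDNS internals.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ToyCache:
    """Hypothetical resolver cache keyed by (owner name, RR type).

    Each entry holds the RRset and the absolute time at which it expires.
    Illustration only; not how any production resolver is built.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[List[str], float]] = {}

    @staticmethod
    def _key(name: str, rrtype: str) -> Tuple[str, str]:
        # DNS names compare case-insensitively; ignore a trailing root dot.
        return name.lower().rstrip("."), rrtype.upper()

    def put(self, name: str, rrtype: str, records: List[str], ttl: int) -> None:
        self._entries[self._key(name, rrtype)] = (list(records), self._clock() + ttl)

    def get(self, name: str, rrtype: str) -> Optional[List[str]]:
        key = self._key(name, rrtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        records, expires = entry
        if self._clock() >= expires:
            del self._entries[key]  # an expired entry behaves like a cache miss
            return None
        return records

    def flush(self, name: str, rrtype: str) -> None:
        # Drop one RRset immediately, whatever TTL it has left.
        self._entries.pop(self._key(name, rrtype), None)


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    cache = ToyCache(clock=lambda: now[0])
    cache.put("shopdisney.co.uk", "NS",
              [f"ns{i}.disneyinternational.net" for i in range(1, 5)], ttl=87746)
    now[0] = 3600.0  # an hour later, the old NS set is still served from cache
    print(cache.get("shopdisney.co.uk", "NS") is not None)  # True
    cache.flush("shopdisney.co.uk", "NS")
    print(cache.get("shopdisney.co.uk", "NS"))  # None
```

In this model, waiting for the TTL and flushing have the same end result. The flush simply gets there immediately instead of after about a day.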

> On Apr 2, 2020, at 1:36 PM, Brian Somers <bsomers at OpenDNS.com> wrote:
> 
> This is what I see with diagnostics turned up:
> 
> $ dig +bufsize=16384 +cd +dnssec shopdisney.co.uk @test-resolver
> ....
> shopdisney.co.uk.       0       IN      TXT     "shopdisney.co.uk categorization: None"
> shopdisney.co.uk.       0       IN      TXT     "cache_get shopdisney.co.uk/A: ttl=0 cache_flags=0x0 (NOTFOUND)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get shopdisney.co.uk/NS: prefixlen=0 ttl=87746 cache_flags=0x1 (HAVEDATA)"
> shopdisney.co.uk.       0       IN      TXT     "RESOLVER: shopdisney.co.uk IN NS ns1.disneyinternational.net"
> shopdisney.co.uk.       0       IN      TXT     "RESOLVER: shopdisney.co.uk IN NS ns2.disneyinternational.net"
> shopdisney.co.uk.       0       IN      TXT     "RESOLVER: shopdisney.co.uk IN NS ns3.disneyinternational.net"
> shopdisney.co.uk.       0       IN      TXT     "RESOLVER: shopdisney.co.uk IN NS ns4.disneyinternational.net"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns3.disneyinternational.net/A: prefixlen=0 ttl=27772 cache_flags=0x1 (HAVEDATA)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns3.disneyinternational.net/AAAA: prefixlen=0 ttl=27772 cache_flags=0x1 (HAVEDATA)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns4.disneyinternational.net/A: prefixlen=0 ttl=27772 cache_flags=0x1 (HAVEDATA)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns4.disneyinternational.net/AAAA: ttl=0 cache_flags=0x0 (NOTFOUND)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns1.disneyinternational.net/A: prefixlen=0 ttl=27772 cache_flags=0x1 (HAVEDATA)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns1.disneyinternational.net/AAAA: prefixlen=0 ttl=27772 cache_flags=0x1 (HAVEDATA)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns2.disneyinternational.net/A: prefixlen=0 ttl=27772 cache_flags=0x2241 (HAVEDATA)"
> shopdisney.co.uk.       0       IN      TXT     "cache_get ns2.disneyinternational.net/AAAA: ttl=0 cache_flags=0x0 (NOTFOUND)"
> shopdisney.co.uk.       0       IN      TXT     "tx shopdisney.co.uk/A clientsubnet=enabled zone=shopdisney.co.uk level=0"
> shopdisney.co.uk.       0       IN      TXT     "Sending EDNS0 with bufsize 1410 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "Sending 45 bytes to [2001:500:94:1::144]:53 using UDP, timeout 350ms"
> shopdisney.co.uk.       0       IN      TXT     "Received EDNS0 with bufsize 4096 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "AUTH server 2001:500:94:1::144 returned REFUSED - abandoned"
> shopdisney.co.uk.       0       IN      TXT     "Sending EDNS0 with bufsize 1410 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "Sending 45 bytes to 208.78.70.144:53 using UDP, timeout 350ms"
> shopdisney.co.uk.       0       IN      TXT     "Received EDNS0 with bufsize 4096 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "AUTH server 208.78.70.144 returned REFUSED - abandoned"
> shopdisney.co.uk.       0       IN      TXT     "Sending EDNS0 with bufsize 1410 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "Sending 45 bytes to 204.13.250.144:53 using UDP, timeout 350ms"
> shopdisney.co.uk.       0       IN      TXT     "Received EDNS0 with bufsize 4096 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "AUTH server 204.13.250.144 returned REFUSED - abandoned"
> shopdisney.co.uk.       0       IN      TXT     "Sending EDNS0 with bufsize 1410 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "Sending 45 bytes to [2001:500:90:1::144]:53 using UDP, timeout 350ms"
> shopdisney.co.uk.       0       IN      TXT     "Received EDNS0 with bufsize 4096 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "AUTH server 2001:500:90:1::144 returned REFUSED - abandoned"
> shopdisney.co.uk.       0       IN      TXT     "Sending EDNS0 with bufsize 1410 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "Sending 45 bytes to 204.13.251.144:53 using UDP, timeout 350ms"
> shopdisney.co.uk.       0       IN      TXT     "Received EDNS0 with bufsize 4096 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "AUTH server 204.13.251.144 returned REFUSED - abandoned"
> shopdisney.co.uk.       0       IN      TXT     "Sending EDNS0 with bufsize 1410 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "Sending 45 bytes to 208.78.71.144:53 using UDP, timeout 350ms"
> shopdisney.co.uk.       0       IN      TXT     "Received EDNS0 with bufsize 4096 and the DO bit"
> shopdisney.co.uk.       0       IN      TXT     "AUTH server 208.78.71.144 returned REFUSED - abandoned"
> shopdisney.co.uk.       0       IN      TXT     "No authoritative answers for shopdisney.co.uk/A"
> ....
> shopdisney.co.uk.       0       IN      TXT     "servfail shopdisney.co.uk/A"
> ....
> 
> HTH
> Brian
> 
>> On Apr 2, 2020, at 12:56 PM, Doug Barton <dougb at dougbarton.email> wrote:
>> 
>> Howdy,
>> 
>> I redelegated shopdisney.co.uk this morning. I can see that all of the Nominet authorities are returning the correct new NS set, however I have a number of reports of resolution failures. There are resolvers from OpenDNS, Google, Virgin, O2, and others that are not finding any name servers at all, and refusing to re-query. This is causing address record resolution failures for users behind those resolvers.
>> 
>> What is odd to me is that earlier this week we cross-pollinated the old and new zone files with both the old and new sets of name servers. I have seen situations in the past where cutting cleanly from one set of name servers to a completely different set has caused problems, so we take this extra step of updating the zones so that, no matter what point in the process we're at, the resolving name servers will always have at least one good set to query. It's always worked for me in the past.
>> 
>> What's even more strange is that we also did shopdisney.it this morning, having done the same preparation, and it's solid as a rock. It's only the CO.UK name that is failing. When I query OpenDNS or Google directly, I get the same result whenever it fails:
>> 
>> dig @8.8.4.4 shopdisney.co.uk ns
>> ; <<>> DiG 9.10.6 <<>> @8.8.4.4 shopdisney.co.uk ns
>> ; (1 server found)
>> ;; global options: +cmd
>> ;; Got answer:
>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1587
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
>> 
>> ;; OPT PSEUDOSECTION:
>> ; EDNS: version: 0, flags:; udp: 512
>> ;; QUESTION SECTION:
>> ;shopdisney.co.uk.		IN	NS
>> 
>> ;; Query time: 501 msec
>> ;; SERVER: 8.8.4.4#53(8.8.4.4)
>> ;; WHEN: Thu Apr 02 12:28:46 PDT 2020
>> ;; MSG SIZE  rcvd: 45
>> 
>> The flags are the same for the OpenDNS servers.
>> 
>> Has anyone seen this happen before? I've seen plenty of cases where resolvers have hung onto the old NS set for too long (following the parent's TTL instead of the child's), which is why I have been adding both sets of name servers to both zones in advance of the redelegation. But I have literally never seen a case where a resolver not only has no NS records, but also will not re-query.
>> 
>> My first thought was that Nominet withdrew the delegation for a short period, and the resolvers have a negative cache entry, but when doing the UAT this morning I happened to catch the exact point at which they changed. In serial number 1308977661 they had the old NS set, and in 1308977662 they had the new one. So that doesn't seem to be the problem.
>> 
>> If anyone from OpenDNS and/or Google can take a look at a resolver that is failing for shopdisney.co.uk and tell me what's in the logs, I would deeply appreciate it. Since I can't figure out what happened, I'm not sure how to mitigate it for the next change.
>> 
>> In the past I've taken the intermediate step of also updating the parent delegation to include both NS sets, which I plan to do for the next set of updates just to be on the safe side, but given this fun new failure mode it's not clear to me that even doing that will insulate us.
>> 
>> Any thoughts/help/advice welcome,
>> 
>> Doug
>> _______________________________________________
>> dns-operations mailing list
>> dns-operations at lists.dns-oarc.net
>> https://lists.dns-oarc.net/mailman/listinfo/dns-operations
> 
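Brian's trace shows the failure sequence step by step. The resolver had a cached NS RRset for shopdisney.co.uk (ns1 through ns4.disneyinternational.net, ttl=87746). It tried the six cached addresses for those servers in turn and got REFUSED from every one. It then logged "No authoritative answers" and returned SERVFAIL. The Python sketch below is a toy model of that loop only. The function name, the `query` callback and the string rcodes are hypothetical. Real resolvers also handle retries, TCP fallback and server selection, and nothing here is taken from OpenDNS or Google code.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical rcode strings; a real resolver works with numeric RCODEs.
NOERROR = "NOERROR"
REFUSED = "REFUSED"
SERVFAIL = "SERVFAIL"

Answer = Tuple[str, List[str]]  # (rcode, records)


def resolve_from_cached_ns(
    qname: str,
    qtype: str,
    server_addrs: Iterable[str],
    query: Callable[[str, str, str], Optional[Answer]],
) -> Answer:
    """Try each authoritative address in order, like the trace quoted above.

    REFUSED (or a timeout, modelled here as None) abandons that server and
    moves on. If no server gives a usable answer, the client gets SERVFAIL.
    """
    for addr in server_addrs:
        reply = query(addr, qname, qtype)
        if reply is None:
            continue  # timeout: try the next address
        rcode, records = reply
        if rcode == REFUSED:
            continue  # "AUTH server ... returned REFUSED - abandoned"
        return rcode, records
    # "No authoritative answers for <qname>/<qtype>"
    return SERVFAIL, []


# The six addresses the resolver tried, in the order they appear in the trace.
ADDRS = [
    "2001:500:94:1::144", "208.78.70.144", "204.13.250.144",
    "2001:500:90:1::144", "204.13.251.144", "208.78.71.144",
]


def all_refuse(addr: str, qname: str, qtype: str) -> Optional[Answer]:
    # Stand-in for servers that answer REFUSED, as every address did in the trace.
    return (REFUSED, [])


if __name__ == "__main__":
    print(resolve_from_cached_ns("shopdisney.co.uk", "A", ADDRS, all_refuse))
    # prints ('SERVFAIL', [])
```

Nothing in this loop goes back to the parent zone for a fresh delegation. In the model, the SERVFAIL therefore persists until the cached NS RRset expires or is flushed. That is consistent with Doug's "refusing to re-query" observation and with Brian's flush clearing the problem, though the sketch does not claim that any real resolver is implemented this way.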




