[dns-operations] OpenDNS, Google, Nominet - New delegation update failure mode

Doug Barton dougb at dougbarton.email
Thu Apr 2 20:19:08 UTC 2020


Thank you for the response. I think it is dependent on the node, since I 
still see it failing sometimes:

dig @8.8.4.4 shopdisney.co.uk ns

; <<>> DiG 9.10.6 <<>> @8.8.4.4 shopdisney.co.uk ns
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 48385
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;shopdisney.co.uk.		IN	NS

;; Query time: 290 msec
;; SERVER: 8.8.4.4#53(8.8.4.4)
;; WHEN: Thu Apr 02 13:18:04 PDT 2020
;; MSG SIZE  rcvd: 45

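You can test the "dependent on the node" observation by repeating the same query many times and tallying the response codes. A public resolver address such as 8.8.4.4 is anycast and may be backed by many independent caches, so a single dig only samples one of them. This is a minimal sketch using only the Python standard library, assuming Python 3 with outbound UDP to port 53. `build_query`, `rcode_of` and `tally` are hypothetical helper names. The query omits EDNS, so it is not byte-for-byte what dig sends.

```python
import random
import socket
import struct
from collections import Counter
from typing import Optional

QTYPE_NS = 2
QCLASS_IN = 1
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = QTYPE_NS, qid: Optional[int] = None) -> bytes:
    """Build a minimal recursive DNS query (RD=1, no EDNS) for name/qtype/IN."""
    if qid is None:
        qid = random.randint(0, 0xFFFF)
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME as length-prefixed labels terminated by the root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def rcode_of(response: bytes) -> str:
    """Return the RCODE (low four bits of the header flags) as a name."""
    _qid, flags = struct.unpack("!HH", response[:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, f"RCODE{code}")


def tally(server: str, name: str, tries: int = 20, timeout: float = 2.0) -> Counter:
    """Send the same NS query `tries` times and count the outcomes."""
    counts: Counter = Counter()
    for _ in range(tries):
        query = build_query(name)
        # A fresh socket per query gives each attempt a new source port.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(query, (server, 53))
                response, _addr = sock.recvfrom(4096)
            except socket.timeout:
                counts["TIMEOUT"] += 1
                continue
        if response[:2] != query[:2]:
            counts["ID_MISMATCH"] += 1  # reply does not match our query ID
        else:
            counts[rcode_of(response)] += 1
    return counts


if __name__ == "__main__":
    print(tally("8.8.4.4", "shopdisney.co.uk"))
```

A run that mixes NOERROR and SERVFAIL for the same name would fit the idea that some caches behind the address hold a bad entry while others do not. It does not by itself show why.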

On 2020-04-02 13:13, Puneet Sood wrote:
> Pasted wrong output above.
> 
> dig @8.8.4.4 shopdisney.co.uk
> 
> ; <<>> DiG 9.11.5-P4-5.1+build2-Debian <<>> @8.8.4.4 shopdisney.co.uk
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15107
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 512
> ;; QUESTION SECTION:
> ;shopdisney.co.uk.              IN      A
> 
> ;; ANSWER SECTION:
> shopdisney.co.uk.       299     IN      A       13.248.150.189
> shopdisney.co.uk.       299     IN      A       76.223.18.1
> 
> ;; Query time: 17 msec
> ;; SERVER: 8.8.4.4#53(8.8.4.4)
> ;; WHEN: Thu Apr 02 16:13:12 EDT 2020
> ;; MSG SIZE  rcvd: 77
> 
> On Thu, Apr 2, 2020 at 4:12 PM Puneet Sood <puneets at google.com> wrote:
>> 
>> Hi Doug,
>> 
>> Google Public DNS resolution is working now.
>> 
>> Google Public DNS is “parent-centric”—meaning that it only uses the
>> name servers that are returned in the referral responses from the
>> parent zone name servers, and does not make NS queries to this child
>> zone. So updating the parent delegation to include both NS sets will
>> help with Google Public DNS resolution.
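Puneet's point can be shown with a deliberately simplified model. A parent-centric resolver always relies on the NS set from the .co.uk referral. A child-centric resolver that already holds the child's apex NS RRset relies on that set instead. The server name `ns1.old-provider.example` and the scenario in which only the old servers answer a given resolver are hypothetical. They are chosen only to show why listing both NS sets in the parent delegation helps parent-centric resolvers. This is not a diagnosis of the shopdisney.co.uk failure.

```python
from typing import FrozenSet, Iterable


def ns_in_use(parent_ns: Iterable[str], child_ns: Iterable[str],
              parent_centric: bool, has_child_ns: bool) -> FrozenSet[str]:
    """NS set a resolver relies on for the child zone (simplified model).

    parent_ns: NS set in the parent's referral (the delegation).
    child_ns:  NS RRset at the child zone apex.
    A parent-centric resolver only ever uses the referral set; a
    child-centric resolver switches to the apex set once it holds it.
    """
    if parent_centric or not has_child_ns:
        return frozenset(parent_ns)
    return frozenset(child_ns)


def can_resolve(ns_set: Iterable[str], answering: Iterable[str]) -> bool:
    """True if at least one server in ns_set gives a usable answer."""
    return bool(set(ns_set) & set(answering))


# Hypothetical names: one old server and one of the new Akamai servers.
OLD = frozenset({"ns1.old-provider.example"})
NEW = frozenset({"a1-127.akam.net"})
CHILD = OLD | NEW  # both zone files list both sets ("cross-pollinated")

if __name__ == "__main__":
    # Hypothetical: only the old servers answer this particular resolver.
    answering = OLD
    for parent in (NEW, OLD | NEW):
        for centric in (True, False):
            ns = ns_in_use(parent, CHILD, parent_centric=centric, has_child_ns=True)
            label = "parent-centric" if centric else "child-centric"
            print(sorted(parent), label, can_resolve(ns, answering))
```

When the parent lists only the new set, the parent-centric case fails and the child-centric case succeeds. When the parent lists both sets, both cases succeed.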
>> 
>> -Puneet
>> 
>> $ dig @8.8.4.4 shopdisney.co.uk ns
>> 
>> ; <<>> DiG 9.11.5-P4-5.1+build2-Debian <<>> @8.8.4.4 shopdisney.co.uk ns
>> ; (1 server found)
>> ;; global options: +cmd
>> ;; Got answer:
>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8397
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
>> 
>> ;; OPT PSEUDOSECTION:
>> ; EDNS: version: 0, flags:; udp: 512
>> ;; QUESTION SECTION:
>> ;shopdisney.co.uk.              IN      NS
>> 
>> ;; ANSWER SECTION:
>> shopdisney.co.uk.       167     IN      NS      a12-66.akam.net.
>> shopdisney.co.uk.       167     IN      NS      a18-64.akam.net.
>> shopdisney.co.uk.       167     IN      NS      a28-65.akam.net.
>> shopdisney.co.uk.       167     IN      NS      a1-127.akam.net.
>> shopdisney.co.uk.       167     IN      NS      a9-66.akam.net.
>> shopdisney.co.uk.       167     IN      NS      a13-67.akam.net.
>> 
>> ;; Query time: 20 msec
>> ;; SERVER: 8.8.4.4#53(8.8.4.4)
>> ;; WHEN: Thu Apr 02 16:08:57 EDT 2020
>> ;; MSG SIZE  rcvd: 178
>> 
>> On Thu, Apr 2, 2020 at 4:01 PM Doug Barton <dougb at dougbarton.email> wrote:
>> >
>> > Howdy,
>> >
>> > I redelegated shopdisney.co.uk this morning. I can see that all of the
>> > Nominet authorities are returning the correct new NS set, however I have
>> > a number of reports of resolution failures. There are resolvers from
>> > OpenDNS, Google, Virgin, O2, and others that are not finding any name
>> > servers at all, and refusing to re-query. This is causing address record
>> > resolution failures for users behind those resolvers.
>> >
>> > What is odd to me is that earlier this week we cross-pollinated the old
>> > and new zone files with both the old and new sets of name servers. I
>> > have seen situations in the past where cutting cleanly from one set of
>> > name servers to a completely different set has caused problems, so we
>> > take this extra step of updating the zones so that, no matter what point
>> > in the process we're at, the resolving name servers will always have at
>> > least one good set to query. It's always worked for me in the past.
>> >
>> > What's even more strange is that we also did shopdisney.it this morning,
>> > having done the same preparation, and it's solid as a rock. It's only
>> > the CO.UK name that is failing. When querying OpenDNS or Google directly
>> > I get the same result when it fails:
>> >
>> > dig @8.8.4.4 shopdisney.co.uk ns
>> > ; <<>> DiG 9.10.6 <<>> @8.8.4.4 shopdisney.co.uk ns
>> > ; (1 server found)
>> > ;; global options: +cmd
>> > ;; Got answer:
>> > ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1587
>> > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
>> >
>> > ;; OPT PSEUDOSECTION:
>> > ; EDNS: version: 0, flags:; udp: 512
>> > ;; QUESTION SECTION:
>> > ;shopdisney.co.uk.              IN      NS
>> >
>> > ;; Query time: 501 msec
>> > ;; SERVER: 8.8.4.4#53(8.8.4.4)
>> > ;; WHEN: Thu Apr 02 12:28:46 PDT 2020
>> > ;; MSG SIZE  rcvd: 45
>> >
>> > The flags are the same for the OpenDNS servers.
>> >
>> > Has anyone seen this happen before? I've seen plenty of cases where
>> > resolvers have hung onto the old NS set for too long (following the
>> > parent TTL instead of the child), which is why I have been adding both
>> > sets of name servers to both zones in advance of the redelegation. But I
>> > have literally never seen a case where a resolver not only has no NS
>> > records, but also will not re-query.
>> >
>> > My first thought was that Nominet withdrew the delegation for a short
>> > period, and the resolvers have a negative cache entry, but when doing
>> > the UAT this morning I happened to catch the exact point at which they
>> > changed. In serial number 1308977661 they had the old NS set, and in
>> > 1308977662 they had the new one. So that doesn't seem to be the problem.
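The reasoning here is that a withdrawn delegation would have appeared as a serial with no NS records for the name between the old and new sets. This is a sketch of that check over a sequence of polled observations. It compares serials with RFC 1982 serial-number arithmetic on 32-bit SOA serials. The observation format and the old server name are hypothetical. Only the two serial numbers come from the message.

```python
from typing import FrozenSet, List, Sequence, Tuple

SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)


def serial_gt(s1: int, s2: int) -> bool:
    """RFC 1982 comparison: is serial s1 greater than serial s2?

    Pairs exactly 2**31 apart are undefined in RFC 1982 and are
    reported as not greater here.
    """
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def check_transition(
    observations: Sequence[Tuple[int, FrozenSet[str]]],
) -> List[str]:
    """Flag problems in (serial, delegation NS set) pairs seen in poll order.

    Reports any serial that went backwards and any serial whose
    delegation had no NS records (i.e. the delegation was withdrawn).
    """
    problems = []
    for prev, cur in zip(observations, observations[1:]):
        if cur[0] != prev[0] and not serial_gt(cur[0], prev[0]):
            problems.append(f"serial went backwards: {prev[0]} -> {cur[0]}")
    for serial, ns_set in observations:
        if not ns_set:
            problems.append(f"no delegation in serial {serial}")
    return problems


if __name__ == "__main__":
    old = frozenset({"ns1.old-provider.example"})  # hypothetical old NS
    new = frozenset({"a1-127.akam.net", "a9-66.akam.net"})
    seen = [(1308977661, old), (1308977662, new)]
    print(check_transition(seen) or "no gap between old and new NS sets")
```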
>> >
>> > If anyone from OpenDNS and/or Google can take a look at a resolver that
>> > is failing for shopdisney.co.uk and tell me what's in the logs I would
>> > deeply appreciate it. Since I can't figure out what happened, I'm not
>> > sure how to mitigate it for the next change.
>> >
>> > In the past I've taken the intermediate step of also updating the parent
>> > delegation to include both NS sets, which I plan to do for the next set
>> > of updates just to be on the safe side, but given this fun new failure
>> > mode it's not clear to me that even doing that will insulate us.
>> >
>> > Any thoughts/help/advice welcome,
>> >
>> > Doug
>> > _______________________________________________
>> > dns-operations mailing list
>> > dns-operations at lists.dns-oarc.net
>> > https://lists.dns-oarc.net/mailman/listinfo/dns-operations

