[dns-operations] OpenDNS, Google, Nominet - New delegation update failure mode

Puneet Sood puneets at google.com
Thu Apr 2 20:13:54 UTC 2020


Pasted wrong output above.

dig @8.8.4.4 shopdisney.co.uk

; <<>> DiG 9.11.5-P4-5.1+build2-Debian <<>> @8.8.4.4 shopdisney.co.uk
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15107
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;shopdisney.co.uk.              IN      A

;; ANSWER SECTION:
shopdisney.co.uk.       299     IN      A       13.248.150.189
shopdisney.co.uk.       299     IN      A       76.223.18.1

;; Query time: 17 msec
;; SERVER: 8.8.4.4#53(8.8.4.4)
;; WHEN: Thu Apr 02 16:13:12 EDT 2020
;; MSG SIZE  rcvd: 77
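
When comparing many resolvers, the header of each dig response can be checked mechanically instead of by eye. The following is a minimal sketch, assuming dig's default text output as shown in this thread. The helper names `summarize_dig` and `looks_resolved` are made up for illustration and are not part of any DNS tool.

```python
import re

# Matches the status field on dig's ">>HEADER<<" line.
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")
# Matches the answer count on dig's "flags:" line.
ANSWER_RE = re.compile(r"ANSWER:\s*(\d+)")


def summarize_dig(output: str) -> dict:
    """Return the response status and answer count from dig's text output."""
    status = STATUS_RE.search(output)
    answers = ANSWER_RE.search(output)
    if status is None or answers is None:
        raise ValueError("no dig response header found")
    return {"status": status.group(1), "answers": int(answers.group(1))}


def looks_resolved(output: str) -> bool:
    """True when the resolver returned NOERROR with at least one answer."""
    summary = summarize_dig(output)
    return summary["status"] == "NOERROR" and summary["answers"] > 0
```

Fed the header lines from the output above, `looks_resolved` returns True. Fed the SERVFAIL header from the original report, it returns False. Because the patterns are searched rather than anchored, quoted output with leading "> " markers is handled as well.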

On Thu, Apr 2, 2020 at 4:12 PM Puneet Sood <puneets at google.com> wrote:
>
> Hi Doug,
>
> Google Public DNS resolution is working now.
>
> Google Public DNS is "parent-centric", meaning that it only uses the
> name servers that are returned in the referral responses from the
> parent zone name servers, and does not make NS queries to the child
> zone. So updating the parent delegation to include both NS sets will
> help with Google Public DNS resolution.
>
> -Puneet
>
> $ dig @8.8.4.4 shopdisney.co.uk ns
>
> ; <<>> DiG 9.11.5-P4-5.1+build2-Debian <<>> @8.8.4.4 shopdisney.co.uk ns
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8397
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 512
> ;; QUESTION SECTION:
> ;shopdisney.co.uk.              IN      NS
>
> ;; ANSWER SECTION:
> shopdisney.co.uk.       167     IN      NS      a12-66.akam.net.
> shopdisney.co.uk.       167     IN      NS      a18-64.akam.net.
> shopdisney.co.uk.       167     IN      NS      a28-65.akam.net.
> shopdisney.co.uk.       167     IN      NS      a1-127.akam.net.
> shopdisney.co.uk.       167     IN      NS      a9-66.akam.net.
> shopdisney.co.uk.       167     IN      NS      a13-67.akam.net.
>
> ;; Query time: 20 msec
> ;; SERVER: 8.8.4.4#53(8.8.4.4)
> ;; WHEN: Thu Apr 02 16:08:57 EDT 2020
> ;; MSG SIZE  rcvd: 178
>
> On Thu, Apr 2, 2020 at 4:01 PM Doug Barton <dougb at dougbarton.email> wrote:
> >
> > Howdy,
> >
> > I redelegated shopdisney.co.uk this morning. I can see that all of the
> > Nominet authorities are returning the correct new NS set, however I have
> > a number of reports of resolution failures. There are resolvers from
> > OpenDNS, Google, Virgin, O2, and others that are not finding any name
> > servers at all, and refusing to re-query. This is causing address record
> > resolution failures for users behind those resolvers.
> >
> > What is odd to me is that earlier this week we cross-pollinated the old
> > and new zone files with both the old and new sets of name servers. I
> > have seen situations in the past where cutting cleanly from one set of
> > name servers to a completely different set has caused problems, so we
> > take this extra step of updating the zones so that no matter what point
> > in the process we're at, the resolving name servers will always have at
> > least one good set to query. It's always worked for me in the past.
> >
> > What's even more strange is that we also did shopdisney.it this morning,
> > having done the same preparation, and it's solid as a rock. It's only
> > the CO.UK name that is failing. When querying OpenDNS or Google directly
> > I get the same result when it fails:
> >
> > dig @8.8.4.4 shopdisney.co.uk ns
> > ; <<>> DiG 9.10.6 <<>> @8.8.4.4 shopdisney.co.uk ns
> > ; (1 server found)
> > ;; global options: +cmd
> > ;; Got answer:
> > ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1587
> > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
> >
> > ;; OPT PSEUDOSECTION:
> > ; EDNS: version: 0, flags:; udp: 512
> > ;; QUESTION SECTION:
> > ;shopdisney.co.uk.              IN      NS
> >
> > ;; Query time: 501 msec
> > ;; SERVER: 8.8.4.4#53(8.8.4.4)
> > ;; WHEN: Thu Apr 02 12:28:46 PDT 2020
> > ;; MSG SIZE  rcvd: 45
> >
> > The flags are the same for the OpenDNS servers.
> >
> > Has anyone seen this happen before? I've seen plenty of cases where
> > resolvers have hung onto the old NS set for too long (following the
> > parent TTL instead of the child), which is why I have been adding both
> > sets of name servers to both zones in advance of the redelegation. But I
> > have literally never seen a case where a resolver not only has no NS
> > records, but also will not re-query.
> >
> > My first thought was that Nominet withdrew the delegation for a short
> > period, and the resolvers have a negative cache entry, but when doing
> > the UAT this morning I happened to catch the exact point at which they
> > changed. In serial number 1308977661 they had the old NS set, and in
> > 1308977662 they had the new one. So that doesn't seem to be the problem.
> >
> > If anyone from OpenDNS and/or Google can take a look at a resolver that
> > is failing for shopdisney.co.uk and tell me what's in the logs I would
> > deeply appreciate it. Since I can't figure out what happened, I'm not
> > sure how to mitigate it for the next change.
> >
> > In the past I've taken the intermediate step of also updating the parent
> > delegation to include both NS sets, which I plan to do for the next set
> > of updates just to be on the safe side, but given this fun new failure
> > mode it's not clear to me that even doing that will insulate us.
> >
> > Any thoughts/help/advice welcome,
> >
> > Doug
> > _______________________________________________
> > dns-operations mailing list
> > dns-operations at lists.dns-oarc.net
> > https://lists.dns-oarc.net/mailman/listinfo/dns-operations
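
The overlap strategy discussed in this thread (listing both NS sets in the child zones, and optionally in the parent delegation) can be sanity-checked per migration stage. The sketch below is a deliberately simplified model: it ignores caching and TTLs, and it treats a child-centric resolver as needing a working server in both the parent referral and the child NS set. The function names and server names are hypothetical. It does not explain the SERVFAIL reported above; it only illustrates why the parent's NS set matters to a parent-centric resolver.

```python
def usable_servers(listed_ns, answering):
    """Return the listed name servers that actually answer for the zone."""
    return set(listed_ns) & set(answering)


def stage_is_safe(parent_ns, child_ns, answering):
    """Check one migration stage against both resolver behaviours.

    parent_ns: NS names in the parent zone's referral (all that a
               parent-centric resolver follows).
    child_ns:  NS names at the child zone apex (what a child-centric
               resolver ends up following).
    answering: servers that currently serve the zone correctly.
    """
    parent_ok = bool(usable_servers(parent_ns, answering))
    child_ok = bool(usable_servers(child_ns, answering))
    return parent_ok and child_ok


# Hypothetical server names, not the real ones for any zone in this thread.
OLD = {"ns1.old.example.", "ns2.old.example."}
NEW = {"ns1.new.example.", "ns2.new.example."}

# Parent and child both list both sets, and both sets serve the zone.
overlap_stage = stage_is_safe(OLD | NEW, OLD | NEW, answering=OLD | NEW)

# Parent already points at NEW only, but only OLD is serving the zone.
broken_stage = stage_is_safe(NEW, OLD | NEW, answering=OLD)

print(overlap_stage, broken_stage)
```

In the second stage the child zone still lists a working server, but a resolver that only follows the parent referral never learns about it, so the stage is flagged as unsafe.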



