[dns-operations] Experiences with a post 2019 Flag Day Resolver

Mark Andrews marka at isc.org
Wed Sep 18 21:53:31 UTC 2019



> On 18 Sep 2019, at 10:13 pm, Hellqvist, Björn <bjorn.hellqvist at teliacompany.com> wrote:
> 
> Hi, 
>  
> For us nothing really interesting happened on our resolvers on the Flag Day, since we did not update to a version that had removed these workarounds. Most of the calls to our help desk were about why their zones were marked with warnings. 
>  
> But we did encounter the problem with DNS cookies before the flag day, about 1.5 years earlier, when we introduced them on our platform. It took a while to figure out what was happening, since we only ran 50% with it turned on and we had a load balancer in front. So it randomly worked and failed.
>  
> We concluded then that the world was not ready for DNS cookies. One operator that we contacted used a 14-year-old Windows installation; they have now changed to have BIND on the fronting delegated DNS servers. 

Old versions of Windows DNS return FORMERR to DNS COOKIE, which triggers fallback to plain DNS in named. It should trigger the same fallback in everything else, as too many old servers just echo back the request with rcode set to FORMERR and qr set to 1 rather than build a valid STD 13 FORMERR response (just the DNS header with rcode set to FORMERR and qr=1). That does require two round trips to get an answer, but it is no worse than the original introduction of EDNS, which also triggered FORMERR.  Unfortunately, the echoed-request FORMERR response is indistinguishable from an EDNS FORMERR response for a bad EDNS option, as they both contain an OPT record.
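
The distinction described above can be illustrated with a small sketch. This is not how named does it; it is a minimal illustration, using only the Python standard library, of the three cases: a header-only STD 13 style FORMERR, a FORMERR that carries an OPT record (echoed request or EDNS-aware FORMERR, which cannot be told apart), and anything else. The function names, the string labels, and the example query are hypothetical and chosen for illustration only.

```python
import struct

RCODE_FORMERR = 1
TYPE_OPT = 41          # EDNS OPT pseudo-RR type
OPTION_COOKIE = 10     # EDNS COOKIE option code (RFC 7873)


def _skip_name(msg, off):
    """Return the offset just past the domain name that starts at off."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:   # compression pointer: 2 bytes, ends the name
            return off + 2
        off += 1 + length


def classify_formerr(msg):
    """Label a DNS message (hypothetical labels, for illustration)."""
    if len(msg) < 12:
        return "too-short"
    _id, flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    if not (flags & 0x8000) or (flags & 0x000F) != RCODE_FORMERR:
        return "not-formerr"
    if len(msg) == 12 and (qd, an, ns, ar) == (0, 0, 0, 0):
        return "header-only"        # valid STD 13 style FORMERR
    off = 12
    try:
        for _ in range(qd):
            off = _skip_name(msg, off) + 4          # QTYPE + QCLASS
        for _ in range(an + ns + ar):
            off = _skip_name(msg, off)
            rtype, _cls, _ttl, rdlen = struct.unpack("!HHIH", msg[off:off + 10])
            if rtype == TYPE_OPT:
                # Echoed request or EDNS FORMERR: both look like this.
                return "has-opt"
            off += 10 + rdlen
    except (IndexError, struct.error):
        return "malformed"
    return "no-opt"


def build_query_with_cookie(qname, client_cookie):
    """Build a minimal A/IN query with an EDNS COOKIE option attached."""
    header = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 1)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    question = labels + struct.pack("!HH", 1, 1)
    option = struct.pack("!HH", OPTION_COOKIE, len(client_cookie)) + client_cookie
    # OPT RR: root name, type 41, class = UDP payload size, TTL = 0, rdata = options
    opt_rr = b"\x00" + struct.pack("!HHIH", TYPE_OPT, 4096, 0, len(option)) + option
    return header + question + opt_rr


def echo_as_formerr(query):
    """Mimic a broken server: echo the request with qr=1 and rcode=FORMERR."""
    ident, flags = struct.unpack("!HH", query[:4])
    flags = ((flags | 0x8000) & 0xFFF0) | RCODE_FORMERR
    return struct.pack("!HH", ident, flags) + query[4:]
```

Under these assumptions, a resolver that sees either "header-only" or "has-opt" in reply to a query carrying COOKIE would retry with plain DNS, since the "has-opt" case gives no way to tell a merely old server from one objecting to a specific option.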

Actual error rates over time, by failure mode, for EDNS queries with unknown EDNS options present are available at <https://ednscomp.isc.org/compliance/ts/au.optfail.html>.  The 2019 DNS Flag Day cleaned out a lot of broken firewalls that block requests with EDNS options present.

The world is *never* ready for change.  You just have to go ahead and have the world catch up.

> We did also turn off DNS cookies to the authoritative servers on our servers to get rid of the problem. 
>  
> BR,
> Bjorn Hellqvist
> Senior System Expert (Internet, DNS & Automation)
> Telia Company
> Solna, Sweden
>  
> From: dns-operations <dns-operations-bounces at dns-oarc.net> On Behalf Of Shumon Huque
> Sent: den 16 september 2019 20:21
> To: DNS Operations List <dns-operations at dns-oarc.net>
> Subject: [dns-operations] Experiences with a post 2019 Flag Day Resolver
>  
> Hi folks,
> 
> I haven't seen too much discussion here about operational experiences with post 2019 DNS Flag Day resolvers, so I thought I'd share ours. Would be interesting to hear from others on this topic.
> 
> We recently upgraded some of our resolvers to BIND 9.14.x. Soon after, we started getting complaints about numerous sites that could not be resolved (the response from the resolver to clients is SERVFAIL). We assumed this was related to the post flag-day removal of workarounds for broken authoritative servers. Hence, we expected these sites to also fail on software deployed by other flag day participants, namely Google Public DNS, Cloudflare, Quad9, OpenDNS, and recent versions of Unbound, PowerDNS, and Knot. But the sites resolved fine on all these platforms (actually I didn't get around to testing Knot yet, since the few minutes I devoted to figuring out how to use build tools unfamiliar to me, like Ninja/Meson, weren't enough - will try to learn them later).
> 
> Some quick debugging revealed that this is because BIND sends outbound queries with DNS cookies by default, and none of the other implementations do. These non-resolving sites don't answer any queries with cookies. We tested sending a variety of other EDNS options to them, and the "non response" behavior is also the same. But all of them do respond to EDNS enabled queries containing no options.
> 
> (BIND of course has been using cookies by default in earlier versions too, but presumably they had the workaround behavior of retrying without them on non-response).
> 
> The proportion of these sites in comparison to the total population of zones that our resolvers talk to, is small, but not trivial. We have attempted to contact the zone owners in question as we discover them, pointing them to the various DNS compliance testing tools/sites. But this was getting to be burdensome enough that we ended up turning off outbound cookies ("send-cookie no;" in the global options).
> 
> Google Public DNS sends the EDNS Client Subnet option to authority servers that we run, and presumably to those broken servers too. We cannot observe the conversation between Google and the broken sites, but since they resolve, we assume that they might at least have a workaround to retry such sites without ECS (or maybe a dynamically maintained ECS blacklist is in use). Perhaps, a Google Public DNS operator can confirm or disconfirm this.
> 
> --
> Shumon Huque
>  
> _______________________________________________
> dns-operations mailing list
> dns-operations at lists.dns-oarc.net
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka at isc.org




