[dns-operations] Akamai now works with ENT (Empty Non-Terminals)?
jreed at akamai.com
Sun Apr 14 13:59:33 UTC 2019
On Sat, 13 Apr 2019, Shumon Huque wrote:
> I do remember hearing from Tale about operational problems that were
> preventing Akamai from deploying their Empty Non-Terminal response fix,
> but I must admit I never got around to learning the details. Thanks for
> the detailed explanation you provided at this link. (I'm on that list,
> but somehow your note evaded my notice.)
The problem was specifically around the interaction between wildcards and
ENTs. Correctly answering ENTs wasn't the hard part; the hard part was
ensuring that we didn't break existing customer wildcard behavior (which,
because of a lack of ENTs, was not compliant with RFC 4592).
> Now that you've deployed the fix, are you able to share how you
> addressed the customer expectation problem that you describe? e.g. Did
> you advise customers not to deploy CNAME records before they are fully
The problem I described in the dnsop post wasn't our customers, it was our
customers' customers. It put us in the position of having to tell a
large SaaS provider that _their_ customers were doing the wrong thing, and
they obviously were not very amenable to hearing that. In fact, in one
case, they did move to a competitor that still had the incorrect wildcard
> Did you deploy special case code in your nameservers for
> these services to match wildcards above or parallel to the ENTs, while
> providing NODATA responses to the ENTs? Or something else?
Yes. We did an exhaustive survey of our zones, tried to infer customer
intent, worked with our customers where we could, and when we couldn't,
were forced to develop a new feature that operates in a similar manner as
the one described. (I'm not sure how detailed I can get.)
> At the time, many other large cloud providers also exhibited incorrect behavior around wildcards and empty non-terminals, but I think Akamai was called out specifically at OARC 27 or 28 with
> this GIF: https://giphy.com/gifs/hero0fwar-caddy-shack-ToMjGpz81S7usvTIM8w
> It was being discussed much earlier actually. I gave this presentation at the Spring 2015 OARC workshop, where I discussed this problem in the context of a qname minimization survey:
Thanks for the link.
> At that time, Akamai and Cloudflare were the main problematic services in a survey of the Alexa top 1000 sites. I notified both of them well before my talk, and Cloudflare had already fixed their ENT
> response behavior by the time of the talk. There were several Chinese CDN providers on the list too - I managed to contact a couple of them, and they informed me they were working on fixes, but I didn't
> follow up and check on their status.
I don't want to get into finger-pointing, but I think the wildcard
matching issue we encountered is subtler and harder to detect than the
examples given in the slides.
Consider the following zone in its entirety:
example.com. IN SOA [omitted for brevity]
example.com. IN NS ns1.example.com.
*.example.com. IN CNAME not-yet-provisioned.example.biz.
www.dev.example.com. IN A 192.168.0.1
When we surveyed other providers at the start of this rollout in 2016, we
found that a query for "dev.example.com" would correctly be answered as an
ENT (NOERROR with no data) by nearly all of them, which is consistent with
your results.
However, for _all but one_ of the providers we surveyed, a query for
"doesnotexist.dev.example.com" would return the CNAME
not-yet-provisioned.example.biz. That is not compliant with RFC 4592 --
the correct response is NXDOMAIN (there is no name, and there is no
wildcard at the closest encloser, which is the ENT dev.example.com).
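To make the closest-encloser rule concrete, here is a minimal sketch of the lookup for the example zone above. It is an illustration of the RFC 4592 logic only, not a description of any provider's implementation. The names RECORDS, existing_names, and lookup are hypothetical, the query type is ignored, and owner names are written lowercase without a trailing dot.

```python
# Illustrative only: models name existence and wildcard selection,
# not record types, CNAME chasing, or DNSSEC.
ZONE_ORIGIN = "example.com"

# Owner names from the example zone, mapped to the record types they hold.
RECORDS = {
    "example.com": ["SOA", "NS"],
    "*.example.com": ["CNAME"],
    "www.dev.example.com": ["A"],
}


def existing_names(records, origin):
    """Return every name that exists in the zone: each owner name plus
    every ancestor between it and the origin (these ancestors are the ENTs
    when they hold no records of their own)."""
    names = set()
    origin_len = len(origin.split("."))
    for owner in records:
        parts = owner.split(".")
        for i in range(len(parts) - origin_len + 1):
            names.add(".".join(parts[i:]))
    return names


def lookup(qname, records=RECORDS, origin=ZONE_ORIGIN):
    """Classify a query name as ANSWER, NODATA (ENT), WILDCARD, or NXDOMAIN."""
    names = existing_names(records, origin)
    if qname in names:
        # The name exists; it is an ENT if it owns no records.
        return "ANSWER" if qname in records else "NODATA (ENT)"

    # Walk up the tree to find the closest encloser: the longest
    # ancestor of qname that exists in the zone.
    parts = qname.split(".")
    for i in range(1, len(parts)):
        ancestor = ".".join(parts[i:])
        if ancestor in names:
            # Only a wildcard directly below the closest encloser may match.
            wildcard = "*." + ancestor
            if wildcard in records:
                return "WILDCARD " + wildcard
            return "NXDOMAIN"
    return "OUT_OF_ZONE"


for q in ("dev.example.com",
          "doesnotexist.dev.example.com",
          "doesnotexist.example.com"):
    print(q, "->", lookup(q))
```

The key point is that the ENT dev.example.com exists, so it becomes the closest encloser for anything beneath it, and *.example.com is never consulted for those names.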
Further complicating the issue, for those providers that offered DNSSEC
signing, when DNSSEC was enabled for such a zone, the wildcard matching
behavior became correct (as it had to be, in order for negative proofs to
work) and an NXDOMAIN was returned for "doesnotexist.dev.example.com".
I'd like to hear more about your original methodology for determining
whether a provider correctly supported ENTs or not. The wildcard behavior
is hard to infer, because knowledge of the zone contents is required. In
our case, we actually provisioned a sample zone on other providers to test
the wildcard behavior, and found that of the ones we tested, only Verisign
handled wildcards correctly, at least when we tested this in 2016.
But my larger point was that explaining the concept of wildcards, closest
enclosers, and empty non-terminals to our customers was a NIGHTMARE.
Customers choose cloud providers specifically so they _don't_ have to be
DNS experts, and it's a non-starter to have a conversation along the lines
of "Well yes, I know your zone works fine on $OTHER_PROVIDER, but you see
there are actually hundreds of invisible records in your zone which are
interfering with your wildcard matching."