[dns-operations] Quad9 DNSSEC Validation?

Mon Mar 1 16:33:44 UTC 2021

I've scanned the response below. I have a number of other things on my
plate at this moment and that response requires a more careful read.

I did, however, want to respond immediately that it does appear to address
my primary concern as the irs.gov operator, which was the lack of
transparency.

Hopefully the .gov NTA can be narrowed since it is and has been the full
intent of the IRS for the past decade that our advertised DNS security
information should be used and enforced.
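
To illustrate what "narrowing" means here: a negative trust anchor normally disables validation for the named domain and everything beneath it, so an NTA on gov covers irs.gov as well. Below is a minimal sketch of that suffix matching. It assumes a resolver matches NTAs by walking up the label hierarchy; the function name and the "broken-agency.gov" zone are hypothetical and not taken from any resolver's actual configuration.

```python
def covered_by_nta(qname: str, ntas: set[str]) -> bool:
    """Return True if qname is at or below any negative trust anchor.

    `ntas` is assumed to hold lowercase domain names without a trailing dot.
    """
    labels = qname.rstrip(".").lower().split(".")
    # Check the name itself and then each ancestor domain in turn.
    return any(".".join(labels[i:]) in ntas for i in range(len(labels)))


broad = {"gov"}                  # one NTA for the whole TLD
narrow = {"broken-agency.gov"}   # hypothetical per-zone NTA

print(covered_by_nta("www.irs.gov", broad))             # validation skipped
print(covered_by_nta("www.irs.gov", narrow))            # validation enforced
print(covered_by_nta("www.broken-agency.gov", narrow))  # validation skipped
```

With the narrow list, only the zone that is actually broken loses validation, and every other signed zone under the TLD is enforced again.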

NIST does a pretty good job of monitoring the DNSSEC status of second level
domains in .gov. The color coding for the last column (DNSSEC) is not
necessarily intuitive. Because it is also tracking compliance, red means
unsigned and insecure delegation. (No signatures and no DS record in the
parent gTLD.) So those zones will validate properly as insecure zones even
though they are out of compliance. It's the yellow zones that have a DNSSEC
issue. That could be the presence of a DS record in the gTLD zone but no
signatures, a mismatch between the DS record and KSK, or some other signing
issue. I didn't go through the list in detail, so I'm not sure how a zone
with signatures but no DS record in the parent would be color coded. A zone
like that would be out of compliance, but would resolve just fine as an
insecure zone.
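
The cases above can be sketched as a small decision function. This is only an illustration of how a validating resolver would treat each combination, assuming three simplified yes/no inputs; it is not NIST's actual color-coding logic, the function and parameter names are hypothetical, and it ignores the "some other signing issue" case (expired signatures, for example).

```python
def classify_delegation(ds_in_parent: bool,
                        zone_signed: bool,
                        ds_matches_ksk: bool = False) -> str:
    """Rough validation outcome for a zone, per the cases described above."""
    if not ds_in_parent:
        # No DS in the parent: the delegation is insecure, so the zone
        # resolves normally whether or not it carries signatures.
        return "insecure"
    if not zone_signed:
        # DS present but no signatures: validation fails.
        return "bogus: DS present but zone is unsigned"
    if not ds_matches_ksk:
        # DS present but it does not match any KSK in the zone.
        return "bogus: DS does not match KSK"
    return "secure"


print(classify_delegation(ds_in_parent=False, zone_signed=False))
print(classify_delegation(ds_in_parent=False, zone_signed=True))
print(classify_delegation(ds_in_parent=True, zone_signed=False))
print(classify_delegation(ds_in_parent=True, zone_signed=True, ds_matches_ksk=True))
```

Only the two "bogus" outcomes break resolution on a validating resolver; both "insecure" cases resolve even though the zone may be out of compliance.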

If you are checking a zone in .gov, the NIST validator should be one place
to look.

https://fedv6-deployment.antd.nist.gov/cgi-bin/generate-gov

Thanks for the response and it does address my main concern. We would also
like to see irs.gov and our other domains validated at Quad9 and everywhere
else, but that's something that has been improving gradually over time.
According to the APNIC Labs data it should be up to roughly a third of US
endpoints now.

I have a longer discussion on NTAs and their utility from a complex
enterprise network perspective I can share later, but I did want to say the
IRS does not add any NTAs for any public zone, including other .gov zones.
That is our policy. If it is a failure at another agency, we work to
contact them and have them fix their problem, sometimes through application
or service contacts. If there is an error in our own signing, it impacts
our entire network. Until the issue is resolved, the site or application
will not be available to our employees. I recognize that's an advantage of
being a closed enterprise network rather than a public DNS provider or even
an ISP.

Thanks,

Scott

On Mon, Mar 1, 2021 at 9:40 AM John Todd <jtodd at quad9.net> wrote:

>
> TL;DR:
> - We agree: Quad9 should be more transparent about its NTA list and
> policy; that will be forthcoming, and we hope others will do the same. It’s
> time to do that.
> - NTAs are terrible, and we wish they didn't have to exist, but... they
> do, at the moment, and not just for Quad9
> - Is anyone interested in being a central NTA manager so this can be less
> arbitrary and fractured?
> - If not, can we develop a best practice on publishing NTAs and NTA
> policies for everyone to follow?
> - Better yet: Can we (recursive DNS operators) agree to just get rid of
> NTAs entirely?
>
> Long form:
> This email is a condensed summary of a conversation Bill and I had based
> on the issues mentioned in this thread, so this text is a mix of both his
> and my comments from here on down, and several thread topics are combined.
>
> Bill includes: “First of all, let me say that my reply near the beginning
> of this thread was admittedly exasperated and I took a tone which was too
> short and too snide, and I apologize for that. This is an issue that we’ve
> been trying to get people to pay attention to for many years, and it’s
> immensely frustrating when we finally get someone to notice… and they lay
> it at our doorstep. But that doesn’t make it any less of an issue.”
>
> So, first things first. The comments about a lack of publishing the NTA
> list are correct and we are falling short on that, and that is something we
> need to remedy. It's been on the "to-do" list, but has not been high enough
> to score for completion in our constantly large list of operational work
> with (relatively) small non-profit resources, but we'll change that. We’ll
> have our NTA list up on our website shortly, after some discussion with the
> team here about the policy for what gets domains put on that list and
> how/when they should be taken off. We've recently undertaken extensive
> review of our privacy policies and transparency statements, and NTAs seem
> to be a reasonable thing to add to the list of review and publication. The
> addition process for NTAs to date has been subjective, and that needs to be
> better documented and published, and the domains listed in a way that can
> be discovered on our website. This needs to be done both as assurance to
> our users as to the exceptions to our validation claims, and also hopefully
> as an additional indication to domain operators who are important enough to
> except but also broken enough to fail validation.
>
> Adding NTAs is driven by direct complaints by end users that they cannot
> reach the resource they are trying to access - this is interrupt-driven.
> Removing NTAs has been driven by time, and testing, and available cycles of
> humans to evaluate and determine that the fault is no longer in place.
> Sometimes NTAs stay past their necessary duration, as there are limited
> resources to focus on non-interrupt items; we apologize for that lag in
> removal for some of these domains, and we think the publication of the list
> will allow others to help us remove repaired domains when they note that
> the underlying issue is no longer apparent.
>
> As we will be undergoing this transparency process, we would hope that
> others providing similar DNS recursive services would do the same.
> Kudos to Cisco for calling that out as an intended NTA publication concept
> in their policy (
> https://learn-umbrella.cisco.com/i/1202769-support-for-dnssec-in-umbrella/0?)
> but we're unable to find this dashboard (sorry if we've just not dug deeply
> enough, or perhaps it's only available to paying customers.) We're not able
> to find even a policy statement for Cloudflare, Google, Comcast, Deutsche
> Telekom, KPN, Reliance Jio or others who are actively enforcing strict
> validation about what NTAs they have in place or when they are
> added/deleted, though there are certainly discussions about some of those
> providers having NTAs in threads similar to this one over time. Perhaps
> some of these providers have public NTA lists, but some quick searching did
> not find anything obvious - does anyone have pointers?
>
> So, let’s all do this.(*) That will help people understand the scope of
> the problem, and we hope that it will get the discussion moving again. We
> would actually like to see some sort of "best practices" policy for NTA
> implementation, or at least NTA declaration, or perhaps our publication of
> our methods might move towards that as an agreeable first attempt at a best
> practice. Ideally, the best possible case would be to have no NTAs at
> all, but it's clear that most resolver operators have NTAs in place in a
> non-zero volume. We hope we can come up with a way to use them as levers to
> improve security with those domains, rather than just create hidden
> exceptions.
>
> Is anyone else here interested in the discussion about a standardized
> method of NTA publication and policy statement publication? The discussions
> about privacy policy went exceptionally well in that regard leading to
> RFC8932, though this topic of NTA transparency is a much smaller slice of
> policy framing. There perhaps may be some other better forum in which to
> move that discussion, though making it an IETF Draft discussion or BCP may
> be somewhat heavy for the need.
>
> On Feb 28, 2021, at 8:38 PM, Paul Vixie <paul at redbarn.org> wrote:
> the technology of negative trust anchors is exactly as wrongheaded as it
> can possibly be. the pressure to not break stuff should be unrelenting, and
> the cost of breaking it should be extreme.
>
> Yep, this is exactly correct. Honestly, we wouldn’t have started all this
> if we’d thought that we were going to be relying on NTAs. We launched with
> DNSSEC strict validation three years ago. We were naively optimistic, and
> got lucky to some degree - there were only a few problem domains (though
> some were still quite large, depth-wise, such as .gov and .mil) and overall
> the process has been good with few complaints that warranted NTAs, though
> sporadic exceptions needed to be made. It's been encouraging to see strict
> validation becoming the standard for most large resolvers, which is
> progress! But we (meaning "large strict DNSSEC resolver operators") are all
> making do with a few NTAs, because although the world isn’t as bad a place as
> many DNSSEC naysayers thought it was, it’s also not as good a place as we
> hoped it’d be, either.
>
> So to your point: Yes, we would very much like to see a world without
> NTAs, where everyone validated DNSSEC in a strict fashion such that
> problems were painful and immediate to domain operators with faults. Let's
> see what we can do to move towards that goal - we really like that idea.
> However, if that isn't the immediate result, can we all agree on a method
> to publish data that makes these exceptions less frequent and shorter in
> duration? We pledge to have more transparency, but it would be
> disappointing if we were the only ones to do so.
>
> also, negative trust anchors aren’t part of the global MIB, and lead to
> different behaviour for different users.
>
> Well, kind of. But only incidentally for different users. Really, behavior
> is different based on which resolver the user is pointed at.
>
> As long as each recursive resolver implements NTAs silently and
> independently, there’s not 100% overlap between them, and users just shop
> resolvers until they find one with the NTA that allows them to still reach
> af.mil or the CDC or mail.mil, or whatever. The user blames the resolver
> that doesn’t have an NTA and praises the one that does have an NTA (or
> which doesn't do DNSSEC at all!) No pressure is exerted on the actual
> offending party, and resolver operators wind up having to juggle the
> subjective risks and benefits of NTAs versus user
> departure/complaints/confusion.
>
> Again to your point: Consistent failures are explainable; inconsistent
> failures are not. "Well, it works on a.b.c.d but not on 9.9.9.9" is a
> difficult problem to solve when the white-hot anger of tens or hundreds of
> thousands of end users is applied to the support structures of a platform
> which can no longer resolve an important address that has just broken
> either DNSSEC or some other authoritative-side issue which can be worked
> around by resolver operators jumping through hoops. Even if the problem is
> explainable ("The domain operator broke their own DNSSEC,") a result that
> leads to end users moving to a non-DNSSEC platform or NTA-excepted platform
> is a less than ideal result, but that's what we face. Other providers have
> NTAs, so we have NTAs.
>
> On Feb 28, 2021, at 9:14 PM, Vladimír Čunát <vladimir.cunat+ietf at nic.cz>
> wrote:
> My (naive?) hope is that large validating services could form some
> agreement to start acting stricter in this respect. Of course it's often
> hard to argue that a breakage is the domain's fault as long as it works
> almost everywhere else, but dnsflagday.net has shown that similar
> arrangements are possible to pull off.
>
> Yes, exactly. This is a prisoner's dilemma problem, and everyone is
> defecting on their own terms - not a good situation.
>
> There have been several hallway discussions at DNS-OARC and other forums,
> back when hallway discussions were a thing (or did it make it into a list
> discussion?) about creating shared NTA lists or at least everyone publicly
> publishing or stating their NTAs in some standardized way that the "greater
> DNS community" could see what might need temporary workarounds. We’d very
> much like to be using a list that was publicly available and was formed and
> managed through public discussion. That would solve two goals: first, it
> would name-and-shame the folks who are so broken that they have to be put
> on the list; second, it would take care of all the resolver-shopping by
> users. If something caused a DNSSEC failure on one, it would DNSSEC fail on
> the others as well. Then there would no longer be competitive pressure to
> add NTAs. It seems unlikely however that there could be a centralized NTA
> list - there were fears voiced of responsibility (aka: lawsuit,) mis-use or
> fault, and security. Though if some neutral party could create it, we would
> closely evaluate using such a list if it was responsive to our specific
> customer requests, and was secure. It would be surprising but welcome to
> see someone step up to this task, though DNS-OARC would be on the short
> list of candidates. As noted above, we would really just prefer a world
> where NTAs were entirely abandoned by enough of the significant operational
> community that it became impossible for a domain operator to continue with
> faults. Are we there yet?
>
> On Feb 28, 2021, at 7:09 PM, Scott Morizot <tmorizot at gmail.com> wrote:
> It is supposed to be temporary and domain name specific. In fact, the
> informational RFC states that technical personnel should ensure it is due
> to a misconfiguration and not the sort of attack DNSSEC is intended to
> prevent and that they should make every reasonable attempt to contact the
> domain owner.
>
> Yep, all those are the case. Quad9 implements NTAs specifically,
> temporarily, after determining that it’s a misconfiguration, and then also
> making a reasonable attempt to contact the domain name owner (SOA email
> addresses or RFC2142 addresses are typically used, but that is another
> thread of woe, so we end up scraping websites and often in languages that
> are not typically used by our support desk - we do make the effort.) We are
> quite often successful in reaching domain operators and informing them that
> their DNSSEC is not functioning as expected, and that typically precludes
> any NTA addition - I think the summary here is that NTAs are quite rare,
> and we do try to help authoritative operators identify their problems. Most
> NTAs can be removed after short application and repair by the domain
> operator.
>
> Zones under .GOV have been a continuous challenge, as have those within
> .MIL. There were wide-ranging faults in those TLDs for some time, creating
> continuous and new support threads. The move towards mandatory DNSSEC for
> those zones was admirable, and we think was the right fundamental decision,
> but the operational reality of a first-mover project caused many lumps in
> the process. There are fewer issues now, and we're encouraged to see so
> much of this domain space signed. Is it time to remove those NTAs? Almost
> certainly, and we agree that today those are too broad a set of exceptions.
> The remaining zones that are failing strict validation under those
> top-level domains will have to be contacted as the faults arise, and
> possibly more specific NTAs re-implemented if they continue to cause a high
> enough complaint ratio. Or maybe we reinstall no NTAs in those TLDs if the
> problems have subsided to a level that allows more specific focus on just a
> few faulty zones, to produce the pain required for repair.
>
> Perversely, the more users one has who are in US government sector areas,
> the more severe the problems when zones within .gov failed previously due
> to DNSSEC errors, and the more rapidly the users shifted away to non-DNSSEC
> resolvers in those problem events. As many of our beta-user base several
> years ago were US-based state, local, and small federal offices, this led
> to Quad9 being more than normally sensitive to faults on zones within those
> TLDs. This is not an excuse, but is some background on why those two
> particular zones were so broadly excepted.
>
> At the IRS, most of our DNS is signed.
>
> We are in fervent agreement that important domains like the IRS.gov should
> be signed, and all domains ultimately, and we've been disappointed that
> there was enough breakage in .GOV that caused continual support challenges.
> Too much time has passed since a full NTA review on our side, and we need
> to focus on just the domains that continue to be faulty and which cause our
> end users the most difficulty. We agree that needs to be a more transparent
> list, and a more transparent policy, and we'll make that happen soon -
> thank you for calling us out on this, and we'll do better, and we hope that
> leads to everyone else moving in that same direction of transparency.
>
> (*) Can we short-circuit this whole issue, perhaps? Have we reached a
> world where strict validation of DNSSEC is now viable, with no NTAs? I
> think it is worth evaluating, because even if that day is not today or this
> year then when would it be? How could we determine the viability of such a
> shift? If NTA elimination was a DNS Flag Day event for strict-validating
> recursive operators, where some significant portion of the largest
> resolvers agreed on that policy, I know that would make everyone here
> exceptionally happy. This whole subjective-decision issue could go away and
> functional comparisons against other large recursive resolver arrays (open
> or closed) would not have any differences in DNSSEC results, at least none
> that would be able to be blamed on "manual exceptions." I think this
> deserves to be broken out into a separate thread of discussion if anyone
> wishes to continue the conversation, as this is not a Quad9-specific
> aspiration.
>