<div dir="ltr">I've scanned the response below. I have a number of other things on my plate at this moment and that response requires a more careful read.<div><br></div><div>I did, however, want to respond immediately that it does appear to address my primary concern as the <a href="http://irs.gov">irs.gov</a> operator, which was the lack of transparency.</div><div><br></div><div>Hopefully the .gov NTA can be narrowed since it is and has been the full intent of the IRS for the past decade that our advertised DNS security information should be used and enforced.<br><br></div><div>NIST does a pretty good job of monitoring the DNSSEC status of second level domains in .gov. The color coding for the last column (DNSSEC) is not necessarily intuitive. Because it is also tracking compliance, red means unsigned and insecure delegation. (No signatures and no DS record in the parent gTLD.) So those zones will validate properly as insecure zones even though they are out of compliance. It's the yellow zones that have a DNSSEC issue. That could be the presence of a DS record in the gTLD zone but no signatures, a mismatch between the DS record and KSK, or some other signing issue. I didn't go through the list in detail, so I'm not sure how a zone with signatures but no DS record in the parent would be color coded. A zone like that would be out of compliance, but would resolve just fine as an insecure zone.</div><div><br></div><div>If you are checking a zone in .gov, the NIST validator should be one place to look.</div><div><br></div><div><a href="https://fedv6-deployment.antd.nist.gov/cgi-bin/generate-gov">https://fedv6-deployment.antd.nist.gov/cgi-bin/generate-gov</a><br></div><div><br></div><div>Thanks for the response and it does address my main concern. We would also like to see <a href="http://irs.gov">irs.gov</a> and our other domains validated at Quad9 and everywhere else, but that's something that has been improving gradually over time. 
According to the APNIC Labs data it should be up to roughly a third of US endpoints now.</div><div><br></div><div>I have a longer discussion on NTAs and their utility from a complex enterprise network perspective I can share later, but I did want to say the IRS does not add any NTAs for any public zone, including other .gov zones. That is our policy. If it is a failure at another agency, we work to contact them and have them fix their problem, sometimes through application or service contacts. If there is an error in our own signing, it impacts our entire network. Until the issue is resolved, the site or application will not be available to our employees. I recognize that's an advantage of being a closed enterprise network rather than a public DNS provider or even an ISP. </div><div><br></div><div>Thanks,</div><div><br></div><div>Scott</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 1, 2021 at 9:40 AM John Todd <<a href="mailto:jtodd@quad9.net">jtodd@quad9.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>


<div>

<div style="font-family:sans-serif"><div style="white-space:normal"><br><p dir="auto">TL;DR:<br>

  - We agree: Quad9 should be more transparent about its NTA list and policy; that will be forthcoming, and we hope others will do the same. It’s time to do that.<br>

  - NTAs are terrible, and we wish they didn't have to exist, but... they do, at the moment, and not just for Quad9<br>

  - Is anyone interested in being a central NTA manager so this can be less arbitrary and fractured?<br>

  - If not, can we develop a best practice on publishing NTAs and NTA policies for everyone to follow?<br>

  - Better yet: Can we (recursive DNS operators) agree to just get rid of NTAs entirely?</p>

<br><p dir="auto">Long form:<br>

This email is a condensed summary of a conversation Bill and I had based on the issues mentioned in this thread, so this text is a mix of both his and my comments from here on down, and several thread topics are combined.</p>

<p dir="auto">Bill adds: “First of all, let me say that my reply near the beginning of this thread was admittedly exasperated and I took a tone which was too short and too snide, and I apologize for that. This is an issue that we’ve been trying to get people to pay attention to for many years, and it’s immensely frustrating when we finally get someone to notice… and they lay it at our doorstep. But that doesn’t make it any less of an issue.”</p>

<p dir="auto">So, first things first. The comments about our not publishing the NTA list are correct and we are falling short on that, and that is something we need to remedy. It's been on the "to-do" list, but has not ranked high enough for completion in our consistently long list of operational work with (relatively) small non-profit resources, but we'll change that. We’ll have our NTA list up on our website shortly, after some discussion with the team here about the policy for what gets domains put on that list and how/when they should be taken off. We've recently undertaken an extensive review of our privacy policies and transparency statements, and NTAs seem to be a reasonable thing to add to the list of review and publication. The addition process for NTAs to date has been subjective, and that needs to be better documented and published, and the domains listed in a way that can be discovered on our website. This needs to be done both as assurance to our users as to the exceptions to our validation claims, and also hopefully as an additional indication to domain operators whose zones are important enough to except but also broken enough to fail validation.</p>

<p dir="auto">Adding NTAs is driven by direct complaints by end users that they cannot reach the resource they are trying to access - this is interrupt-driven. Removing NTAs has been driven by time, and testing, and available cycles of humans to evaluate and determine that the fault is no longer in place. Sometimes NTAs stay past their necessary duration, as there are limited resources to focus on non-interrupt items; we apologize for that lag in removal for some of these domains, and we think the publication of the list will allow others to help us remove repaired domains when they note that the underlying issue is no longer apparent.</p>

<p dir="auto">As we will be undergoing this transparency process, we hope that others providing similar recursive DNS services will do the same. Kudos to Cisco for calling that out as an intended NTA publication concept in their policy (<a href="https://learn-umbrella.cisco.com/i/1202769-support-for-dnssec-in-umbrella/0" style="color:rgb(57,131,196)" target="_blank">https://learn-umbrella.cisco.com/i/1202769-support-for-dnssec-in-umbrella/0</a>?) but we're unable to find this dashboard (sorry if we've just not dug deeply enough, or perhaps it's only available to paying customers.)  We're not able to find even a policy statement for Cloudflare, Google, Comcast, Deutsche Telekom, KPN, Reliance Jio or others who are actively enforcing strict validation about what NTAs they have in place or when they are added/deleted, though there are certainly discussions about some of those providers having NTAs in threads similar to this one over time. Perhaps some of these providers have public NTA lists, but some quick searching did not find anything obvious - does anyone have pointers?</p>

<p dir="auto">So, let’s all do this.(*) That will help people understand the scope of the problem, and we hope that it will get the discussion moving again. We would actually like to see some sort of "best practices" policy for NTA implementation, or at least NTA declaration, or perhaps our publication of our methods might move towards that as an agreeable first attempt at a best practice. Ideally, the best possible case would be to have no NTAs at all, but it's clear that most resolver operators have NTAs in place in a non-zero volume. We hope we can come up with a way to use them as levers to improve security with those domains, rather than just create hidden exceptions.</p>

<p dir="auto">Is anyone else here interested in the discussion about a standardized method of NTA publication and policy statement publication? The discussions about privacy policy went exceptionally well in that regard leading to RFC8932, though this topic of NTA transparency is a much smaller slice of policy framing. There perhaps may be some other better forum in which to move that discussion, though making it an IETF Draft discussion or BCP may be somewhat heavy for the need.</p>

<blockquote style="border-left:2px solid rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto"> On Feb 28, 2021, at 8:38 PM, Paul Vixie <<a href="mailto:paul@redbarn.org" target="_blank">paul@redbarn.org</a>> wrote:<br>

 the technology of negative trust anchors is exactly as wrongheaded as  it can possibly be. the pressure to not break stuff should be unrelenting,  and the cost of breaking it should be extreme.</p>

</blockquote><p dir="auto">Yep, this is exactly correct. Honestly, we wouldn’t have started all this if we’d thought that we were going to be relying on NTAs. We launched with DNSSEC strict validation three years ago. We were naively optimistic, and got lucky to some degree - there were only a few problem domains (though some were still quite large, depth-wise, such as .gov and .mil) and overall the process has been good with few complaints that warranted NTAs, though sporadic exceptions needed to be made. It's been encouraging to see strict validation becoming the standard for most large resolvers, which is progress!  But we (meaning "large strict DNSSEC resolver operators") are all making do with a few NTAs, because although the world isn’t as bad a place as many DNSSEC naysayers thought it was, it’s also not as good a place as we hoped it’d be, either.</p>

<p dir="auto">So to your point: Yes, we would very much like to see a world without NTAs, where everyone validated DNSSEC in a strict fashion such that problems were painful and immediate to domain operators with faults. Let's see what we can do to move towards that goal - we really like that idea. However, if that isn't the immediate result, can we all agree on a method to publish data that makes these exceptions less frequent and shorter in duration?  We pledge to have more transparency, but it would be disappointing if we were the only ones to do so.</p>

<blockquote style="border-left:2px solid rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto">also, negative trust anchors aren’t part of the global MIB, and lead to different<br>

behaviour for different users.</p>

</blockquote><p dir="auto">Well, kind of.  But only incidentally for different users.  Really, behavior is different based on which resolver the user is pointed at.</p>

<p dir="auto">As long as each recursive resolver implements NTAs silently and independently, there’s not 100% overlap between them, and users just shop resolvers until they find one with the NTA that allows them to still reach <a href="http://af.mil" target="_blank">af.mil</a> or the CDC or <a href="http://mail.mil" target="_blank">mail.mil</a>, or whatever. The user blames the resolver that doesn’t have an NTA and praises the one that does have an NTA (or which doesn't do DNSSEC at all!) No pressure is exerted on the actual offending party, and resolver operators wind up having to juggle the subjective risks and benefits of NTAs versus user departure/complaints/confusion.</p>

<p dir="auto">Again to your point: Consistent failures are explainable; inconsistent failures are not.  "Well, it works on a.b.c.d but not on 9.9.9.9" is a difficult problem to solve when the white-hot anger of tens or hundreds of thousands of end users is applied to the support structures of a platform which can no longer resolve an important address that has just broken either DNSSEC or some other authoritative-side issue which can be worked around by resolver operators jumping through hoops. Even if the problem is explainable ("The domain operator broke their own DNSSEC,") a result that leads to end users moving to a non-DNSSEC platform or NTA-excepted platform is a less than ideal result, but that's what we face.  Other providers have NTAs, so we have NTAs.</p>

<br><blockquote style="border-left:2px solid rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto">On Feb 28, 2021, at 9:14 PM, Vladimír Čunát <<a href="mailto:vladimir.cunat%2Bietf@nic.cz" target="_blank">vladimir.cunat+ietf@nic.cz</a>> wrote:<br>

My (naive?) hope is that large validating services could form some agreement to start<br>

acting stricter in this respect.  Of course it's often hard to argue that a breakage is the<br>

domain's fault as long as it works almost everywhere else, but <a href="http://dnsflagday.net" target="_blank">dnsflagday.net</a> has shown that similar arrangements are possible to pull off.</p>

</blockquote><p dir="auto">Yes, exactly.  This is a prisoner's dilemma problem, and everyone is defecting on their own terms - not a good situation.</p>

<p dir="auto">There have been several hallway discussions at DNS-OARC and other forums, back when hallway discussions were a thing (or did it make it into a list discussion?) about creating shared NTA lists or at least everyone publicly publishing or stating their NTAs in some standardized way so that the "greater DNS community" could see what might need temporary workarounds. We’d very much like to be using a list that was publicly available and was formed and managed through public discussion. That would serve two goals: first, it would name-and-shame the folks who are so broken that they have to be put on the list; second, it would take care of all the resolver-shopping by users. If something caused a DNSSEC failure on one, it would DNSSEC fail on the others as well. Then there would no longer be competitive pressure to add NTAs. It seems unlikely however that there could be a centralized NTA list - there were fears voiced of responsibility (aka: lawsuit,) misuse or fault, and security. Though if some neutral party could create it, we would closely evaluate using such a list if it was responsive to our specific customer requests, and was secure. It would be surprising but welcome to see someone step up to this task, though DNS-OARC would be on the short list of candidates. As noted above, we would really just prefer a world where NTAs were entirely abandoned by enough of the significant operational community that it became impossible for a domain operator to continue with faults. Are we there yet?</p>

<blockquote style="border-left:2px solid rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto">On Feb 28, 2021, at 7:09 PM, Scott Morizot <<a href="mailto:tmorizot@gmail.com" target="_blank">tmorizot@gmail.com</a>> wrote:<br>

It is supposed to be temporary and domain name specific. In fact, the informational<br>

RFC states that technical personnel should ensure it is due to a misconfiguration<br>

and not the sort of attack DNSSEC is intended to prevent and that they should make every reasonable attempt to contact the domain owner.</p>

</blockquote><p dir="auto">Yep, all those are the case.  Quad9 implements NTAs specifically, temporarily, after determining that it’s a misconfiguration, and then also making a reasonable attempt to contact the domain name owner (SOA email addresses or RFC2142 addresses are typically used, but that is another thread of woe, so we end up scraping websites, often in languages that are not typically used by our support desk - we do make the effort.)  We are quite often successful in reaching domain operators and informing them that their DNSSEC is not functioning as expected, and that typically precludes any NTA addition - I think the summary here is that NTAs are quite rare, and we do try to help authoritative operators identify their problems. Most NTAs can be removed after a short period in place, once the domain operator has made the repair.</p>

<p dir="auto">Zones under .GOV have been a continuous challenge, as have those within .MIL. There were wide-ranging faults in those TLDs for some time, creating continuous and new support threads. The move towards mandatory DNSSEC for those zones was admirable, and we think was the right fundamental decision, but the operational reality of a first-mover project caused many lumps in the process. There are fewer issues now, and we're encouraged to see so much of this domain space signed. Is it time to remove those NTAs?  Almost certainly, and we agree that today those are too broad a set of exceptions. The remaining zones that are failing strict validation under those top-level domains will have to be contacted as the faults arise, and possibly more specific NTAs re-implemented if they continue to cause a high enough complaint ratio. Or maybe we reinstall no NTAs in those TLDs if the problems have subsided to a level that allows more specific focus on just a few faulty zones, to produce the pain required for repair.</p>

<p dir="auto">Perversely, the more users one has who are in US government sector areas, the more severe the problems when zones within .gov failed previously due to DNSSEC errors, and the more rapidly the users shifted away to non-DNSSEC resolvers in those problem events. As many of our beta-user base several years ago were US-based state, local, and small federal offices, this led to Quad9 being more than normally sensitive to faults on zones within those TLDs. This is not an excuse, but is some background on why those two particular zones were so broadly excepted.</p>

<blockquote style="border-left:2px solid rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto">At the IRS, most of our DNS is signed.</p>

</blockquote><p dir="auto">We are in fervent agreement that important domains like the IRS.gov should be signed, and all domains ultimately, and we've been disappointed that there was enough breakage in .GOV that caused continual support challenges. Too much time has passed since a full NTA review on our side, and we need to focus on just the domains that continue to be faulty and which cause our end users the most difficulty. We agree that needs to be a more transparent list, and a more transparent policy, and we'll make that happen soon - thank you for calling us out on this, and we'll do better, and we hope that leads to everyone else moving in that same direction of transparency.</p>

<br><p dir="auto">(*) Can we short-circuit this whole issue, perhaps? Have we reached a world where strict validation of DNSSEC is now viable, with no NTAs? I think it is worth evaluating, because even if that day is not today or this year then when would it be? How could we determine the viability of such a shift?  If NTA elimination was a DNS Flag Day event for strict-validating recursive operators, where some significant portion of the largest resolvers agreed on that policy, I know that would make everyone here exceptionally happy. This whole subjective-decision issue could go away and functional comparisons against other large recursive resolver arrays (open or closed) would not have any differences in DNSSEC results, at least none that would be able to be blamed on "manual exceptions." I think this deserves to be broken out into a separate thread of discussion if anyone wishes to continue the conversation, as this is not a Quad9-specific aspiration.</p>

<br></div>



</div>

</div>


</blockquote></div>