[dns-operations] Today's Problem: repo.fpki.gov

cjc+dns-oarc at pumpky.net cjc+dns-oarc at pumpky.net
Wed Oct 19 06:35:22 UTC 2022


Thanks for the interesting discussion on the 
qa.ws.iqt.fiscal.treasury.gov problem. Nice to know I'm not the only one 
who doesn't quite understand why we're getting mixed results despite the 
obviously non-compliant behavior.

But I've got a new one today. Different failure mode, but same symptom: 
sometimes it works, sometimes it's SERVFAIL.

We noticed it when we started to get some server-admin-heartburn because 
CRL downloads began failing with DNS errors. The servers for the zone 
fpki.gov are handing out different DNSKEYs (if the server responds at 
all). The DNSviz for this one pretty clearly catches the problem, but 
you may need a screen magnifier:

https://dnsviz.net/d/repo.fpki.gov/Y08v3Q/dnssec/

You can read all of the "Errors" on the left. (BTW, this zone was 
_completely_ broken for a while this evening. The auth servers appeared 
down. Thought they might have been trying to fix this, but looks like 
it's still there.)

I thought we might have caught it midway through in a bad rollover, but 
it's been this way for a while and the SOAs on all of the servers match.
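
For anyone who wants to check this sort of thing without squinting at 
DNSviz, here is a rough sketch of the comparison: compute the key tag 
of each DNSKEY (the RFC 4034 Appendix B checksum) and group the servers 
by the set of tags they hand out. The server names and key material 
below are invented for illustration; they are not the real fpki.gov 
servers or keys, and collecting the DNSKEY RRsets from each server is 
left out.

```python
import struct
from collections import defaultdict


def key_tag(flags, protocol, algorithm, public_key):
    """Key tag per RFC 4034 Appendix B (not valid for algorithm 1)."""
    rdata = struct.pack("!HBB", flags, protocol, algorithm) + public_key
    acc = 0
    for i, byte in enumerate(rdata):
        # Even-offset bytes are the high octet of each 16-bit word.
        acc += (byte << 8) if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def group_servers_by_keyset(dnskeys_by_server):
    """Map each distinct set of key tags to the servers that serve it."""
    groups = defaultdict(list)
    for server, keys in dnskeys_by_server.items():
        tags = frozenset(key_tag(*k) for k in keys)
        groups[tags].append(server)
    return dict(groups)


# Hypothetical data: (flags, protocol, algorithm, public key bytes).
KSK = (257, 3, 8, b"\x0a\x0b\x0c\x0d")
ZSK_OLD = (256, 3, 8, b"\x01\x02\x03\x04")
ZSK_NEW = (256, 3, 8, b"\x05\x06\x07\x08")

observed = {
    "ns1.example.": [KSK, ZSK_OLD],
    "ns2.example.": [KSK, ZSK_NEW],
    "ns3.example.": [KSK, ZSK_OLD],
}

groups = group_servers_by_keyset(observed)
consistent = len(groups) == 1

for tags, servers in groups.items():
    print(sorted(tags), "served by", sorted(servers))
print("consistent:", consistent)
```

More than one group means the servers disagree about the DNSKEY RRset, 
which is what DNSviz is showing for this zone.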

So it's pretty easy to see how something could break. If a resolver gets 
its DNSKEYs from a server whose keys don't match the RRSIGs it already 
has, it can't validate.
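
To make that concrete, here is a minimal sketch of the matching step 
that has to succeed before any signature check can even be attempted: 
a DNSKEY is only a candidate for an RRSIG if the owner name equals the 
RRSIG's signer name and the algorithm and key tag agree (my reading of 
RFC 4035 section 5.3.1). The class names, zone name, and key tags are 
made up, and the actual cryptographic verification is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnskeyInfo:
    owner: str
    algorithm: int
    key_tag: int


@dataclass(frozen=True)
class RrsigInfo:
    signer: str
    algorithm: int
    key_tag: int


def candidate_keys(rrsig, dnskeys):
    """Return the DNSKEYs that could possibly have made this RRSIG."""
    return [
        k for k in dnskeys
        if k.owner.lower() == rrsig.signer.lower()
        and k.algorithm == rrsig.algorithm
        and k.key_tag == rrsig.key_tag
    ]


# Hypothetical values: an RRSIG made with key tag 4118, and the DNSKEY
# sets returned by two different servers for the same zone.
sig = RrsigInfo("zone.example.", 8, 4118)
keys_from_server_a = [
    DnskeyInfo("zone.example.", 8, 6689),
    DnskeyInfo("zone.example.", 8, 2062),
]
keys_from_server_b = [
    DnskeyInfo("zone.example.", 8, 6689),
    DnskeyInfo("zone.example.", 8, 4118),
]

can_try_a = bool(candidate_keys(sig, keys_from_server_a))
can_try_b = bool(candidate_keys(sig, keys_from_server_b))
print("server A keys usable:", can_try_a)
print("server B keys usable:", can_try_b)
```

With the key set from server A there is nothing to validate against, so 
the answer is bogus no matter how good the signature is.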

But here's my question: are DNS resolvers, and specifically BIND, 
forgiving enough to try other authoritative servers for missing DNSKEYs 
in cases just like this? Will they go searching the other authoritative 
servers for a matching DNSKEY?

Or can they come at it from the other way? If the RRSIGs don't line up 
with the available DNSKEYs, does the resolver decline to cache the 
target RRsets and try again, possibly against a different server?
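
In code terms, the behavior I'm asking about would look roughly like 
the sketch below. To be clear, this is the hypothetical retry logic I 
am wondering about, not a description of what BIND or any other 
resolver actually does. The function name, server names, and key tags 
are all invented.

```python
def find_server_with_matching_key(rrsig_key_tag, key_tags_by_server, server_order):
    """Try servers in order until one serves a DNSKEY with the wanted tag.

    key_tags_by_server maps a server name to the set of DNSKEY key tags
    it returned, or to None if it did not respond at all.
    Returns (server_or_None, list_of_servers_tried).
    """
    tried = []
    for server in server_order:
        tried.append(server)
        tags = key_tags_by_server.get(server)
        if tags is None:
            # No response from this server; move on to the next one.
            continue
        if rrsig_key_tag in tags:
            return server, tried
    return None, tried


# Hypothetical observations for one zone.
key_tags_by_server = {
    "ns1.example.": {6689, 2062},
    "ns2.example.": None,  # unresponsive
    "ns3.example.": {6689, 4118},
}
order = ["ns1.example.", "ns2.example.", "ns3.example."]

winner, tried = find_server_with_matching_key(4118, key_tags_by_server, order)
print("matching key found at:", winner)
print("servers tried:", tried)
```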

But even if resolvers do this stuff, I think I still see how this could 
break things. If a recursive resolver is doing "forward-only" through 
another caching resolver, the end resolver will only get whatever the 
forwarder has in its cache. If the middle resolver has incompatible or 
incomplete DNSKEYs and RRSIGs, there isn't a way for the end resolver to 
force the intermediate resolver to go out and get DNSKEYs from the other 
authoritative servers for the zone.
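
Here is a toy model of that forwarding scenario, just to pin down what 
I mean. It assumes the forwarder simply caches whichever DNSKEY set it 
fetched first and serves it until the TTL runs out; a real forwarder 
may well be smarter than that, which is part of what I'm asking. The 
class, the key tags, and the TTL are all invented.

```python
class CachingForwarder:
    """Toy forwarder: caches one DNSKEY tag set until its TTL expires."""

    def __init__(self, upstream_answers, ttl):
        # Answers that successive upstream fetches would return, in order.
        self._upstream = list(upstream_answers)
        self._ttl = ttl
        self._cached = None
        self._expires = 0
        self.upstream_fetches = 0

    def query_dnskey_tags(self, now):
        if self._cached is None or now >= self._expires:
            index = self.upstream_fetches % len(self._upstream)
            self._cached = self._upstream[index]
            self.upstream_fetches += 1
            self._expires = now + self._ttl
        return self._cached


def end_resolver_can_validate(forwarder, rrsig_key_tag, now, retries=3):
    """The end resolver can only re-ask the forwarder, never the auths."""
    for _ in range(retries):
        if rrsig_key_tag in forwarder.query_dnskey_tags(now):
            return True
    return False


# First upstream fetch hits a server with the "wrong" key set, the
# second one hits a server with the key that made the RRSIG (tag 4118).
fwd = CachingForwarder([{6689, 2062}, {6689, 4118}], ttl=3600)

stuck = end_resolver_can_validate(fwd, 4118, now=0)
fetches_while_stuck = fwd.upstream_fetches
after_expiry = end_resolver_can_validate(fwd, 4118, now=3600)

print("validates while bad set is cached:", stuck)
print("upstream fetches during retries:", fetches_while_stuck)
print("validates after TTL expiry:", after_expiry)
```

In this model the end resolver's retries all land on the same cached 
answer, so nothing improves until the forwarder's cache entry expires.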

Does that scenario make sense? I've been dumping caches and trying to 
see what the server is doing when things are working and when they are 
not, but thought I'd just try the people with the deep resolver 
knowledge.

But I /really/ just wish .gov orgs would fix their @*%$ DNSSEC!


