[dns-operations] Today's Problem: repo.fpki.gov
cjc+dns-oarc at pumpky.net
Wed Oct 19 06:35:22 UTC 2022
Thanks for the interesting discussion on the
qa.ws.iqt.fiscal.treasury.gov problem. Nice to know I'm not the only one
who doesn't quite understand why we're getting mixed results despite the
obviously non-compliant behavior.
But I've got a new one today. Different failure mode, but same symptom:
sometimes it works, sometimes it's SERVFAIL.
We noticed it when the server admins started getting heartburn because
CRL downloads were failing on DNS errors. The servers for the
zone fpki.gov are handing out different DNSKEYs (if the server responds
at all). The DNSviz for this one pretty clearly catches the problem, but
you may need a screen magnifier,
https://dnsviz.net/d/repo.fpki.gov/Y08v3Q/dnssec/
You can read all of the "Errors" on the left. (BTW, this zone was
_completely_ broken for a while this evening. The auth servers appeared
down. Thought they might have been trying to fix this, but looks like
it's still there.)
I thought we might have caught it midway through a bad rollover, but
it's been this way for a while and the SOAs on all of the servers match.
So it's pretty easy to see how something could break. If a resolver gets
its DNSKEYs from a server whose keys don't match the RRSIGs it already
holds, it can't validate.
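To make the mismatch concrete, here is a minimal sketch in Python. It
computes the key tag from DNSKEY RDATA the way RFC 4034 Appendix B
describes (for algorithms other than 1) and lists the servers whose
DNSKEY set has no key carrying the tag named in an RRSIG. The server
names and key bytes are made up for illustration, not taken from
fpki.gov. A matching key tag is only a precondition: real validation
also checks the algorithm and verifies the signature.

```python
def key_tag(dnskey_rdata: bytes) -> int:
    """Key tag per RFC 4034 Appendix B (algorithms other than 1).

    dnskey_rdata is the wire-format RDATA: flags (2 bytes),
    protocol (1), algorithm (1), then the public key.
    """
    acc = 0
    for i, byte in enumerate(dnskey_rdata):
        # Even offsets are the high octet of a 16-bit word, odd the low.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def servers_missing_key(rrsig_key_tag, dnskeys_by_server):
    """Return the servers whose DNSKEY set lacks the RRSIG's key tag."""
    return sorted(
        server
        for server, keys in dnskeys_by_server.items()
        if rrsig_key_tag not in {key_tag(k) for k in keys}
    )


# Hypothetical data: two servers handing out different DNSKEYs.
KEY_A = bytes([1, 1, 3, 8, 1, 2])  # flags=257, proto=3, alg=8, toy key
KEY_B = bytes([1, 1, 3, 8, 9, 9])  # same header, different toy key
DNSKEYS_BY_SERVER = {
    "ns1.example": [KEY_A],
    "ns2.example": [KEY_B],
}

if __name__ == "__main__":
    tag = key_tag(KEY_A)  # pretend the RRSIG was made with KEY_A
    print(servers_missing_key(tag, DNSKEYS_BY_SERVER))
```

A resolver that happened to fetch its DNSKEYs from any server in that
list would have nothing to check the signature against.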
But here's my question: are DNS resolvers, and specifically BIND,
forgiving enough to try other authoritative servers for missing DNSKEYs
in cases just like this? Will they go searching other authoritative
servers for a matching DNSKEY?
Or can they come at it from the other way? That is, if the RRSIGs don't
line up with the available DNSKEYs, does the resolver decline to cache
the target RRsets and try again, possibly against a different server?
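The retry behavior being asked about can be written down as a toy
model. This is only a sketch of the hypothetical strategy, not a claim
about what BIND or any other resolver does. Each server is reduced to a
name plus the set of key tags it hands out, and all names are invented.

```python
def find_validating_server(rrsig_key_tag, servers):
    """Walk the servers in order until one offers a matching DNSKEY.

    servers is an ordered list of (name, set_of_key_tags) pairs.
    Returns (name, attempts), or (None, attempts) if no server has
    a DNSKEY with the tag the RRSIG refers to.
    """
    attempts = 0
    for name, key_tags in servers:
        attempts += 1
        if rrsig_key_tag in key_tags:
            return name, attempts
    return None, attempts


if __name__ == "__main__":
    # Hypothetical zone: only the second server has the right key.
    servers = [("ns1.example", {3346}), ("ns2.example", {1291})]
    print(find_validating_server(1291, servers))
```

A resolver that gave up after the first server would return SERVFAIL
here, while one that kept going would succeed on the second try, which
would fit the "sometimes works" pattern.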
But even if resolvers do this stuff, I think I still see how this could
break things. If a recursive resolver is doing "forward-only" through
another caching resolver, the end resolver will only get whatever the
forwarder has in its cache. If the middle resolver has incompatible or
incomplete DNSKEYs and RRSIGs, there isn't a way for the end resolver to
force the intermediate resolver to go out and get DNSKEYs from the other
authoritative servers for the zone.
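The forward-only case can be modeled the same way. In this minimal
sketch the ToyForwarder class and its behavior are assumptions for
illustration: it fetches a DNSKEY set from upstream once, then serves
that cached copy to every later query, and TTL expiry is ignored.

```python
class ToyForwarder:
    """Caching forwarder that keeps the first DNSKEY set it sees."""

    def __init__(self, upstream_answers):
        # Each item is the set of key tags one upstream query would get.
        self._upstream = iter(upstream_answers)
        self._cache = None
        self.upstream_queries = 0

    def dnskey_tags(self):
        if self._cache is None:
            self._cache = next(self._upstream)
            self.upstream_queries += 1
        return self._cache


def end_resolver_can_validate(forwarder, rrsig_key_tag, retries=3):
    """Retry through the forwarder; every retry sees the same cache."""
    return any(
        rrsig_key_tag in forwarder.dnskey_tags() for _ in range(retries)
    )


if __name__ == "__main__":
    # The first upstream answer is the "wrong" key set; the second
    # would have worked, but the forwarder never asks again.
    fwd = ToyForwarder([{3346}, {1291}])
    print(end_resolver_can_validate(fwd, 1291), fwd.upstream_queries)
```

Under these assumptions, retrying from the end resolver changes nothing
until the forwarder's cached DNSKEY set expires.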
Does that scenario make sense? I've been dumping caches and trying to
see what the server is doing when things are working and when they are
not, but thought I'd just ask the people with the deep resolver
knowledge.
But I /really/ just wish .gov orgs would fix their @*%$ DNSSEC!