[dns-operations] DNS .com/.net resolution problems in the Asia/Pacific region
ietf-dane at dukhovni.org
Tue Jul 18 21:53:49 UTC 2023
On Tue, Jul 18, 2023 at 10:25:01PM +0200, Ondřej Surý wrote:
> It’s exactly like the serve-stale. The inception of the protocol
> change is driven by this isolated incident. That’s not a proper
> design, that’s slapping more bandaids on the camel.
I don't even see a "protocol change" here. A bogus (possibly forged)
answer arrived from server A, perhaps server B should be tried.
> There’s already mechanism for not serving a stale RRSIGs. The EXPIRE
> field in the SOA record should be set to a value that’s lower than the
> RRSIG resigning interval (the minimal interval between now and
> shortest RRSIG expiry in the zone).
We're not just talking about expired RRSIGs as the sole use-case. Some
servers have bugs, some primaries fail to implement AXFR atomically,
propagating the RRSet update sans RRSIG update (or the other way
around). Some mishandle ENTs, ...
> Currently, it’s 7 days for .com which almost exactly matches the RRSIG
> expiry-inception difference and that doesn’t give any wiggle room if
> things go wrong.
Expiry in the SOA applies to AXFR, but many deployments are not
AXFR-based. And Verisign apparently did try to isolate the server,
sadly that didn't work out as expected.
> . - 7 days SOA expiry and 14 days signature validity
> .cz - 7 days SOA expiry and 14 days signature validity
> .nl - 28 days SOA expiry and 14 days signature validity
> .org - 14 days SOA expiry and 3 weeks signature validity
Do any of these use AXFR? If all the servers are effectively "primary",
with incremental zone updates driven by some other process, the SOA
expiry is of little relevance. Sure, they should go offline before
signatures start to go stale (as Verisign tried to do, but failed).
The "go offline" logic should therefore be robust, but that's not
the topic at hand I think. The topic is whether "bogus" should
generally be retriable (or even required to be retriable within
reasonable retry limits, and with error caching holddowns to
avoid thundering herd storms, ...).
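
To make "retriable within limits, with a holddown" concrete, here is a rough sketch, not a description of what any resolver actually does. The class name, the `query` and `validate` callables, and the default limits are all hypothetical placeholders. `validate` stands in for full DNSSEC validation and `query` for sending the question to one authoritative server.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BogusRetryPolicy:
    """Try another server when an answer validates as bogus (illustrative only)."""

    max_attempts: int = 3            # hypothetical retry limit per resolution
    holddown_seconds: float = 30.0   # hypothetical error-cache holddown
    clock: Callable[[], float] = time.monotonic
    _failed_until: Dict[str, float] = field(default_factory=dict, init=False)

    def resolve(
        self,
        qname: str,
        servers: List[str],
        query: Callable[[str, str], Optional[str]],
        validate: Callable[[str], bool],
    ) -> Optional[str]:
        key = qname.lower()

        # During the holddown, fail from the error cache without sending
        # any queries, so a broken zone does not trigger a retry storm.
        until = self._failed_until.get(key)
        if until is not None and self.clock() < until:
            return None

        # Bogus (or missing) answer from server A: try server B, and so on,
        # but never more than max_attempts servers.
        for server in servers[: self.max_attempts]:
            answer = query(server, qname)
            if answer is not None and validate(answer):
                self._failed_until.pop(key, None)
                return answer

        # Every attempt was bogus: remember the failure for a while.
        self._failed_until[key] = self.clock() + self.holddown_seconds
        return None
```

With servers A (returning stale signatures) and B (healthy), `resolve()` returns B's answer after one extra query. If every server tried is bogus, the failure is cached and later lookups for the same name cost no upstream queries until the holddown expires.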