[dns-operations] DNS .com/.net resolution problems in the Asia/Pacific region

Tue Jul 18 20:25:01 UTC 2023

It’s exactly like serve-stale. The protocol change is being driven by this one isolated incident. That’s not proper design, that’s slapping more band-aids on the camel.

There’s already a mechanism for not serving stale RRSIGs. The EXPIRE field in the SOA record should be set to a value lower than the RRSIG re-signing interval (the minimal interval between now and the shortest RRSIG expiry in the zone).
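
As a rough sketch of that rule, the check below takes RRSIG expiration timestamps (in the YYYYMMDDHHmmSS presentation format) and compares the shortest remaining lifetime against SOA EXPIRE. The function names and the sample timestamps are made up for illustration, and nothing here queries the DNS.

```python
from datetime import datetime, timezone

def parse_rrsig_time(text):
    """Parse an RRSIG timestamp in YYYYMMDDHHmmSS presentation format (UTC)."""
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def min_remaining_seconds(expirations, now):
    """Shortest time left, in seconds, before any of the given RRSIGs expires."""
    return min(int((parse_rrsig_time(e) - now).total_seconds()) for e in expirations)

def expire_is_safe(soa_expire, expirations, now):
    """True if a secondary last refreshed at `now` gives up on the zone
    before any signature it holds expires."""
    return soa_expire < min_remaining_seconds(expirations, now)

# Hypothetical data: two signatures, the shorter one has exactly 7 days left.
now = datetime(2023, 7, 18, 20, 0, 0, tzinfo=timezone.utc)
sample = ["20230725200000", "20230728120000"]

print(min_remaining_seconds(sample, now))      # shortest remaining lifetime
print(expire_is_safe(7 * 86400, sample, now))  # EXPIRE equal to it: no margin
print(expire_is_safe(3 * 86400, sample, now))  # EXPIRE well below it
```

With these made-up inputs, an EXPIRE equal to the shortest remaining lifetime is reported as unsafe, because the comparison is strict.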

Currently, SOA EXPIRE is 7 days for .com, which almost exactly matches the RRSIG expiry-inception difference, and that leaves no wiggle room if things go wrong.

Out of curiosity (and on the phone) I’ve checked:

. - 7 days SOA expiry and 14 days signature validity
.cz - 7 days SOA expiry and 14 days signature validity
.nl - 28 days SOA expiry and 14 days signature validity
.org - 14 days SOA expiry and 3 weeks signature validity
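
One way to read these figures: signature validity (expiration minus inception) is only an upper bound on how long any signature held by a secondary can remain valid, so an EXPIRE at or above it leaves no margin at all. A small sketch using the numbers quoted in this message, with .com taken as roughly 7 days for both; the `slack_days` helper is purely illustrative:

```python
# (SOA EXPIRE in days, RRSIG validity in days) as quoted in this message.
# The .com pair is an approximation based on "almost exactly matches".
zones = {
    ".": (7, 14),
    ".com": (7, 7),
    ".cz": (7, 14),
    ".nl": (28, 14),
    ".org": (14, 21),
}

def slack_days(expire, validity):
    """Upper bound on the wiggle room: validity minus EXPIRE.

    Zero or negative means a cut-off secondary keeps answering at least
    until every signature it holds has expired.
    """
    return validity - expire

for zone, (expire, validity) in zones.items():
    slack = slack_days(expire, validity)
    verdict = "no room" if slack <= 0 else f"at most {slack} days"
    print(f"{zone:5} expire={expire:2}d validity={validity:2}d -> {verdict}")
```

The real margin is smaller than this bound, since not every signature is freshly made at the time of the last successful refresh.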

Perhaps, we can start by having a solid recommendation for SOA expiry value for DNSSEC signed zones?

If those intervals were shorter, disconnected authoritative servers would start giving SERVFAIL much sooner, which would then be handled properly even in ancient deployments (meaning the whole ecosystem would benefit now and not in some far-off future), and it doesn’t require any change in the protocol.

Ondrej
--
Ondřej Surý <ondrej at sury.org> (He/Him)

> On 18. 7. 2023, at 21:39, Viktor Dukhovni <ietf-dane at dukhovni.org> wrote:
> 
> On Tue, Jul 18, 2023 at 08:54:04PM +0200, Ondřej Surý wrote:
> 
>> With my implementor’s hat on, I think this is the wrong approach. It
>> (again) adds complexity to the resolvers and yet again based
>> (mostly) on an isolated incident. I really don’t want yet another
>> “serve-stale” in the resolvers. I have yet to see any evidence that
>> serve-stale has helped anything since the original incident, but now
>> every resolver has to have it because people want it.
> 
> How is this akin to "serve stale"?  We're talking about retrying
> responses that fail to validate, just as one might/would retry a response
> that is "REFUSED", "SERVFAIL", has TC=1 over UDP, contains garbage, ...
> 
> The "serve stale" situation is quite different, here substantial new
> logic is required, whereas with invalid responses, it is just a matter
> of trying the next server up to some reasonable work limit.
> 
> Retries to reach a better authoritative server are a core element of DNS
> resilience in the face of inevitable partial degradation of service.
> 
> --
>    Viktor.
> _______________________________________________
> dns-operations mailing list
> dns-operations at lists.dns-oarc.net
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations