[dns-operations] DNS .com/.net resolution problems in the Asia/Pacific region

Wed Jul 19 04:42:36 UTC 2023

Thanks Mark for the clarification.

I just hate adding new knobs and exceptions in scramble mode. If the knob is already there, then it’s already there.

There are already too many knobs in DNS, and we all know that.

--
Ondřej Surý <ondrej at sury.org> (He/Him)

> On 19. 7. 2023, at 0:43, Mark Andrews <marka at isc.org> wrote:
> 
> Except BIND does exactly this. It retries, and if all the servers for the zone fail, the <name,type> is flagged as bad for 10 minutes and any validation that depends on that lookup fails with DNS_R_BROKENCHAIN, which results in SERVFAIL rather than a retry. This was how we dealt with the so-called “rollover and die” issue.
> 
>                } else if (result == DNS_R_BROKENCHAIN) {
>                        isc_result_t tresult;
>                        isc_time_t expire;
>                        isc_interval_t i;
> 
>                        isc_interval_set(&i, DNS_RESOLVER_BADCACHETTL(fctx), 0);
>                        tresult = isc_time_nowplusinterval(&expire, &i);
>                        if (negative &&
>                            (fctx->type == dns_rdatatype_dnskey ||
>                             fctx->type == dns_rdatatype_ds) &&
>                            tresult == ISC_R_SUCCESS)
>                        {
>                                dns_resolver_addbadcache(res, fctx->name,
>                                                         fctx->type, &expire);
>                        }
>                        done = true;
>                        goto cleanup_fetchctx;
>                } else {
>                        fctx_try(fctx, true, true);
>                        goto cleanup_fetchctx;
>                }
> 
> The world doesn’t fall over with limited retries. We had zero reports of resolution failures due to this incident. This also allows a validator behind a validator to work reliably by having the validator that talks directly to the authoritative servers filter out the garbage responses. Always sending CD=1 is STUPID.
> 
>> On 19 Jul 2023, at 04:54, Ondřej Surý <ondrej at sury.org> wrote:
>> 
>> With my implementor’s hat on, I think this is the wrong approach. It (again) adds complexity to the resolvers, and yet again it is based (mostly) on an isolated incident. I really don’t want yet another “serve-stale” in the resolvers. I have yet to see any evidence that serve-stale has helped anything since the original incident, but now every resolver has to have it because people want it.
>> 
>> And operationally, it will just paper over the issue, which might then go unnoticed for a longer period of time rather than being fixed right away.
>> 
>> Ondrej
>> --
>> Ondřej Surý <ondrej at sury.org> (He/Him)
>> 
>>> On 18. 7. 2023, at 20:38, Gavin McCullagh <gmccullagh at gmail.com> wrote:
>>> 
>>> I'd like to reach out to NLNet about changing Unbound to do this, so I want to make sure people have a chance to disagree.  Feel free to voice your disagreement (and reasons) here if you do.
>> 
> 
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742              INTERNET: marka at isc.org
>
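
The bad-cache behaviour Mark describes above (flag a <name,type> as bad for 10 minutes once every server for the zone has failed, and answer SERVFAIL from that flag instead of retrying) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not BIND's implementation: the badcache type, badcache_init, badcache_add, badcache_find, the fixed 64-slot table, and the caller-supplied clock are all invented for the example.

```c
/* Hypothetical sketch, not BIND code: every name below is invented. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

#define SKETCH_SLOTS    64
#define SKETCH_BAD_TTL  600 /* seconds: the "10 minutes" from the message */
#define SKETCH_NAME_MAX 255

typedef struct {
	char     name[SKETCH_NAME_MAX + 1];
	uint16_t type;   /* RR type, e.g. 48 = DNSKEY, 43 = DS */
	int64_t  expire; /* absolute expiry time, in seconds */
	bool     used;
} bad_entry;

typedef struct {
	bad_entry slots[SKETCH_SLOTS];
} badcache;

void badcache_init(badcache *bc) {
	assert(bc != NULL);
	memset(bc, 0, sizeof(*bc));
}

/* True if <name,type> is currently flagged; the caller would then
 * answer SERVFAIL instead of starting another fetch. */
bool badcache_find(const badcache *bc, const char *name, uint16_t type,
		   int64_t now) {
	assert(bc != NULL && name != NULL);
	for (size_t i = 0; i < SKETCH_SLOTS; i++) {
		const bad_entry *e = &bc->slots[i];
		if (e->used && e->expire > now && e->type == type &&
		    strcasecmp(e->name, name) == 0) {
			return true;
		}
	}
	return false;
}

/* Flag <name,type> as bad until now + SKETCH_BAD_TTL.
 * Returns false if the name is too long or the table is full. */
bool badcache_add(badcache *bc, const char *name, uint16_t type,
		  int64_t now) {
	bad_entry *target = NULL;

	assert(bc != NULL && name != NULL);
	if (strlen(name) > SKETCH_NAME_MAX) {
		return false;
	}
	for (size_t i = 0; i < SKETCH_SLOTS; i++) {
		bad_entry *e = &bc->slots[i];
		bool live = e->used && e->expire > now;
		if (live && e->type == type &&
		    strcasecmp(e->name, name) == 0) {
			target = e; /* refresh the existing entry */
			break;
		}
		if (!live && target == NULL) {
			target = e; /* first free or expired slot */
		}
	}
	if (target == NULL) {
		return false;
	}
	memcpy(target->name, name, strlen(name) + 1);
	target->type = type;
	target->expire = now + SKETCH_BAD_TTL;
	target->used = true;
	return true;
}
```

A resolver built along these lines would call badcache_find before starting a fetch and badcache_add once all servers for the zone have failed. Note that the quoted BIND excerpt is narrower than this sketch: it only adds an entry on DNS_R_BROKENCHAIN for negative DNSKEY or DS lookups.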