<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 18/07/2023 23.53, Viktor Dukhovni
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:ZLcJ7TrenKjGJiSS@straasha.imrryr.org">
<pre class="moz-quote-pre" wrap="">On Tue, Jul 18, 2023 at 10:25:01PM +0200, Ondřej Surý wrote:
</pre>
<blockquote type="cite" style="color: #999999;">
<pre class="moz-quote-pre" wrap="">It’s exactly like the serve-stale. The inception of the protocol
change is driven by this isolated incident. That’s not a proper
design, that’s slapping more bandaids on the camel.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">I don't even see a "protocol change" here. A bogus (possibly forged)
answer arrived from server A, perhaps server B should be tried.</pre>
</blockquote>
<p>I agree that at least one retry against a different IP address
seems reasonable before returning SERVFAIL, similar to the case
where no reply arrives (in time). I thought popular resolvers
already did something like that. But as mentioned, it's better to
be careful about the overall number of retries (which is really
not trivial to balance).</p>
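<p>To make the idea concrete, here is a minimal sketch in Python of
"try another address, but within a fixed budget". It is only an
illustration and not how any particular resolver does it: the names
(Reply, resolve_with_retry, max_attempts) and the query callback are
hypothetical, and a missing reply (timeout) is treated the same way
as a reply that fails validation.</p>
<pre>
```python
from dataclasses import dataclass

SERVFAIL = "SERVFAIL"


@dataclass
class Reply:
    """Hypothetical reply record; 'validated' stands in for DNSSEC validation."""
    server: str
    validated: bool
    data: str


def resolve_with_retry(servers, query, max_attempts=2):
    """Ask servers in order until one reply validates or the budget is spent.

    servers      -- iterable of IP address strings (assumed already ordered)
    query        -- hypothetical callback: takes an IP, returns a Reply,
                    or None when no reply arrived in time
    max_attempts -- overall cap on attempts, so retries cannot pile up

    Returns a tuple (answer, attempts_used); answer is SERVFAIL on failure.
    """
    attempts = 0
    for ip in servers:
        if attempts == max_attempts:
            break  # budget exhausted, give up instead of trying every address
        attempts += 1
        reply = query(ip)
        # A timeout (None) and a bogus reply are handled alike: move on.
        if reply is not None and reply.validated:
            return reply.data, attempts
    return SERVFAIL, attempts
```
</pre>
<p>With the default budget of two attempts, a bogus answer from the
first address leads to exactly one more try at the second address;
setting max_attempts to 1 gives the behaviour without any retry.
Choosing that cap is the part that is hard to balance.</p>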
<p>As for papering over issues, ideally most problems would not be
solved as a response to the "internet breaking" for ordinary users,
though I'd generally try to avoid adding workarounds. Serious
deployments should have monitoring to detect such problems, or
possibly even use approaches like this (though I'm not so sure):
<a class="moz-txt-link-freetext" href="https://datatracker.ietf.org/doc/draft-ietf-dnsop-dns-error-reporting/">https://datatracker.ietf.org/doc/draft-ietf-dnsop-dns-error-reporting/</a><br>
</p>
<p>--Vladimir | knot-resolver.cz<br>
</p>
</body>
</html>