<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 18/07/2023 23.53, Viktor Dukhovni

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:ZLcJ7TrenKjGJiSS@straasha.imrryr.org">

      <pre class="moz-quote-pre" wrap="">On Tue, Jul 18, 2023 at 10:25:01PM +0200, Ondřej Surý wrote:

</pre>

      <blockquote type="cite" style="color: #999999;">

        <pre class="moz-quote-pre" wrap="">It’s exactly like the serve-stale. The inception of the protocol

change is driven by this isolated incident. That’s not a proper

design, that’s slapping more bandaids on the camel.

</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">I don't even see a "protocol change" here.  A bogus (possibly forged)

answer arrived from server A, perhaps server B should be tried.</pre>

    </blockquote>

    <p>I agree that at least one retry against a different IP seems
      worthwhile before returning SERVFAIL, similar to the case where
      no reply arrives (in time).  I thought popular resolvers already
      do something like that.  But as mentioned, it's better to be
      careful about the overall number of retries (which is really not
      trivial to balance).</p>
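    <p>To illustrate, here is a minimal sketch of that idea, not taken
      from any real resolver: a bogus answer is treated like a timeout,
      the next server IP is tried, and a small attempt budget caps the
      total work.  The names <code>resolve</code>,
      <code>query_one</code> and <code>max_attempts</code> are
      hypothetical, and the addresses are documentation-only examples.</p>
    <pre>
```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"            # answer passed validation
    TIMEOUT = "timeout"  # no reply arrived in time
    BOGUS = "bogus"      # reply failed validation (possibly forged)


def resolve(servers, query_one, max_attempts=2):
    """Try server IPs in order until one answers or the budget is spent.

    query_one(ip) is a hypothetical callback returning (Outcome, answer).
    Returns (answer, attempts_used); answer is "SERVFAIL" on failure.
    """
    attempts = 0
    for ip in servers:
        if attempts == max_attempts:
            break  # budget exhausted, stop retrying
        attempts += 1
        outcome, answer = query_one(ip)
        if outcome is Outcome.OK:
            return answer, attempts
        # TIMEOUT and BOGUS are handled alike: move on to a different IP.
    return "SERVFAIL", attempts


# Simulated upstreams: server A returns a bogus answer, server B a good one.
replies = {
    "192.0.2.1": (Outcome.BOGUS, None),
    "192.0.2.2": (Outcome.OK, "198.51.100.7"),
}
servers = ["192.0.2.1", "192.0.2.2"]

with_retry = resolve(servers, replies.get)
without_retry = resolve(servers, replies.get, max_attempts=1)
print(with_retry, without_retry)
```
    </pre>
    <p>With a budget of two attempts the second server's answer is
      used; with a budget of one the bogus reply leads straight to
      SERVFAIL.  Choosing that budget is the hard part in practice.</p>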

    <p>As for papering over issues: ideally most problems would not be
      solved only in response to the "internet breaking" for ordinary
      users, though I'd generally try to avoid adding workarounds.
      Serious deployments should have monitoring to detect such
      problems, or possibly even use approaches like this one (though
      I'm not so sure about it):

      <a class="moz-txt-link-freetext" href="https://datatracker.ietf.org/doc/draft-ietf-dnsop-dns-error-reporting/">https://datatracker.ietf.org/doc/draft-ietf-dnsop-dns-error-reporting/</a><br>

    </p>

    <p>--Vladimir | knot-resolver.cz<br>

    </p>

  </body>

</html>