<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
On 06/02/2024 17.06, Peter Thomassen wrote:
<div class="moz-forward-container">
<blockquote type="cite"
cite="mid:23c8f958-ef31-42bf-92ed-afc2917ec4e3@desec.io">Then,
how to define a false positive rate? <br>
<br>
Look at all blocked queries, and do a post-hoc investigation? <br>
<br>
How about popularity -- should one factor in that blocking
*.ddns.net is more severe than blocking *.blank.page? I.e., is
it a ratio of blocked/total queries, or blocked/total names? </blockquote>
<p>Yes, primarily post-hoc, I expect. If we could easily
recognize false positives in advance, we'd do that during the
blocking, right? I'd do this statistically: take a sample from
the blocked names. You could weight the names however you like
when choosing among them, e.g. by the popularity you mentioned,
measured as the number of unique IPs querying each name. Then
evaluate the sample by some more reliable method, probably a
human. You could also mix in a sample of non-blocked names as a
control (see [1]).</p>
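<p>A minimal sketch of the sampling step described above, in
Python. Everything named here is an assumption made for
illustration: the function names, the <code>.example</code>
domains, and the idea that the weight is a per-name count of
unique client IPs. Drawing with replacement keeps the arithmetic
simple: the share of draws judged to be false positives then
estimates the popularity-weighted false-positive rate.</p>

```python
import random
from collections import Counter


def draw_review_sample(blocked: dict[str, int], k: int, seed=None) -> Counter:
    """Draw k names with replacement, with probability proportional to weight.

    `blocked` maps each blocked name to its weight (assumed here to be
    the number of unique client IPs that queried it).
    Returns a Counter: name -> number of times it was drawn.
    """
    rng = random.Random(seed)
    names = list(blocked)
    weights = [blocked[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=k))


def weighted_fp_rate(sample: Counter, verdicts: dict[str, bool]) -> float:
    """Share of draws whose name a reviewer judged to be a false positive.

    `verdicts` maps each sampled name to True (false positive) or False.
    """
    total = sum(sample.values())
    false_positive_draws = sum(
        count for name, count in sample.items() if verdicts[name]
    )
    return false_positive_draws / total


if __name__ == "__main__":
    # Hypothetical data: blocked name -> unique client IPs seen querying it.
    blocked = {"a.example": 10, "b.example": 1, "c.example": 5}
    sample = draw_review_sample(blocked, k=50, seed=1)
    # A human reviewer would fill this in; these verdicts are made up.
    verdicts = {"a.example": False, "b.example": True, "c.example": False}
    print(sample)
    print(weighted_fp_rate(sample, verdicts))
```

<p>Each distinct name only needs to be reviewed once, however
often it was drawn; the draw counts carry the weighting. Passing
equal weights gives a per-name rate instead of a popularity-weighted
one.</p>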
<p>I don't think it's difficult to design these measurements so
that you get a reasonable trade-off between complexity (and human
work) and the precision of the false-positive estimate. Actually,
I suspect it's probably *not* worth letting user reports
influence which names end up in the evaluated sample, as it's
probably very hard to get statistically correct-ish numbers out
of that.</p>
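<p>To make the trade-off between human work and precision
concrete, here is a rough sketch that turns "n names reviewed, x
judged false positives" into a confidence interval. It uses the
Wilson score interval, which is my choice for illustration and not
something fixed by the discussion above. It assumes independent
draws, and the function name is hypothetical.</p>

```python
import math


def wilson_interval(false_positives: int, sample_size: int, z: float = 1.96):
    """Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    Returns (lower, upper) bounds on the false-positive rate.
    """
    if sample_size == 0:
        raise ValueError("sample_size must be positive")
    n = sample_size
    p = false_positives / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Same observed rate (5%), different amounts of human review work.
    for reviewed, judged_fp in [(100, 5), (1000, 50)]:
        print(reviewed, wilson_interval(judged_fp, reviewed))
```

<p>With 5 false positives among 100 reviewed names the interval is
roughly 2% to 11%; reviewing ten times as many names at the same
observed rate narrows it considerably, which is the cost of extra
precision in human work.</p>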
<p>[1] <a class="moz-txt-link-freetext"
href="https://en.wikipedia.org/wiki/Scientific_control#Controlled_experiments"
moz-do-not-send="true">https://en.wikipedia.org/wiki/Scientific_control#Controlled_experiments</a></p>
<p>(reposted from a correct e-mail address; Peter will probably
get a duplicate)<br>
</p>
<p>--Vladimir | knot-resolver.cz<br>
</p>
</div>
</body>
</html>