[dns-operations] Filtering policy: false positive rate
James Richards
James.Richards at nominet.uk
Wed Feb 7 08:19:34 UTC 2024
Hello Peter,
We operate a blocking DNS resolver. Our false positives are either escalated by customers to our support team via portal/email/phone, or discovered by our analyst team using monitoring metrics such as source IP diversity, query volume, blocked domain age, recent query trends, recent past performance of a feed/source, etc.
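To illustrate the kind of metric roll-up I mean, here is a minimal sketch in Python (the field names and thresholds are invented for illustration, not our production values):

    from collections import defaultdict

    def fp_review_candidates(block_log, domain_age_days, min_ips=500, min_age_days=365):
        """Rank blocked domains for analyst false-positive review.
        block_log: iterable of (domain, source_ip) pairs from the resolver.
        domain_age_days: mapping of domain -> registration age in days."""
        ips_per_domain = defaultdict(set)
        queries_per_domain = defaultdict(int)
        for domain, source_ip in block_log:
            ips_per_domain[domain].add(source_ip)
            queries_per_domain[domain] += 1
        candidates = []
        for domain, ips in ips_per_domain.items():
            age = domain_age_days.get(domain, 0)
            # An old, widely queried domain is a more likely false positive
            # than a freshly registered one hit by a handful of machines.
            if len(ips) >= min_ips and age >= min_age_days:
                candidates.append((domain, len(ips), queries_per_domain[domain], age))
        # Highest source-IP diversity first.
        return sorted(candidates, key=lambda c: c[1], reverse=True)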
Very broadly, if we were to block a popular established domain that is requested by a high number of different source IPs, it would probably correlate with high customer impact… but that’s not a hard and fast rule. We’ve experienced very popular false positive domains that seem to cause little to no impact to customer business operations, perhaps because they’re being queried by unimportant machines rather than humans in front of browsers. Equally, sometimes a single blocked website interrupting a senior employee’s workflow can elicit a fast escalation from their IT department, who have been put under pressure to fix it quickly. In short, it is sensible to collect blocked domain traffic metrics, but I wouldn’t rely on them exclusively to identify customer impact.
We maintain a spreadsheet of all false positives, and we join each one back to the originating threat intelligence that caused it to enter the block list. Using this we track the performance of each source feed over time. If a particular source feed, or a sub-section of a source feed, begins to misbehave, we can escalate to its author or remove it from the block list altogether.
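As a rough sketch of that join (our real tracking lives in a spreadsheet; the data structures below are illustrative only):

    from collections import Counter

    def feed_fp_rates(false_positives, domain_to_feed, feed_sizes):
        """false_positives: confirmed false positive domains.
        domain_to_feed: blocked domain -> source feed that supplied it.
        feed_sizes: source feed -> total domains it contributed."""
        fp_counts = Counter(domain_to_feed[d] for d in false_positives
                            if d in domain_to_feed)
        return {feed: fp_counts[feed] / size
                for feed, size in feed_sizes.items() if size}

A feed whose rate climbs over successive reviews becomes a candidate for escalation or removal.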
So, regarding your question, we find that evaluating groups of false positives based on their source metadata is significantly easier than playing whack-a-mole with individual domains as they pop up. As such, we put a lot of effort into maintaining as much metadata as possible as the data flows through the system from source to block list to dashboard.
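Concretely, each block-list entry carries its provenance with it, something like the following (again a hypothetical sketch, not our actual schema):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)
    class BlockListEntry:
        domain: str
        source_feed: str       # which upstream feed supplied the indicator
        feed_section: str      # sub-section/category within that feed
        ingested_at: datetime  # when it entered our pipeline
        rule_or_model: str     # the heuristic or model stage that approved it

With that in place, every false positive on a dashboard can be grouped by feed, section, or rule rather than chased one domain at a time.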
I don’t know how others are planning to do their heuristics work, but perhaps the AI/ML model could be encouraged to expose some of its decision-making parameters rather than just giving a black-box yes/no answer? Maybe the different ML stages offer scores, confidence intervals, keyword matches, or other data flags that describe how a blocking outcome was reached. Such data, alongside a list of false positives, might help an engineer determine that ‘ML feature X’ is a common culprit and dial its weight down in the model.
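For example, if the model emitted per-decision feature flags, tallying them across confirmed false positives would surface the recurring culprits (the flag names here are made up for illustration):

    from collections import Counter

    def common_fp_features(fp_decisions):
        """fp_decisions: for each confirmed false positive, the set of
        feature flags the model reported when it chose to block,
        e.g. {'dga_score_high', 'young_domain'} (hypothetical names)."""
        tally = Counter()
        for flags in fp_decisions:
            tally.update(flags)
        return tally.most_common()

A feature that dominates this list is the ‘ML feature X’ to dial down.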
James