[dns-operations] DDoS botnet behaviour

Sun Jun 10 23:36:40 UTC 2012

> From: Jim Reid <jim at rfc1035.com>

> My logs tended to have a few hundred entries at a time for the same  
> (spoofed?) IP address. So as soon as I blackholed the last IP address  
> in the log file, entries for another would be appended. At 4am and  
> there's a caffeine deficit, this looks like a new client has  
> immediately popped up to replace the one that's just been nuked. In  
> fact, the "new" IP address was already there and its queries were lost  
> amongst the noise of the other 100+ addresses that were firing crap at  
> the name server.

That raises two issues.

A problem with the response rate limiting code I've been working on is
logging.  One needs to be able to find out which responses have been rate
limited and why.  To answer that question, my current logging code
simply logs to a new BIND9 category whenever it drops a response (or
would have dropped when in test mode).  The problem is that even on
my small DNS servers that generates too much noise.  My plan is to
change from instantaneous to retrospective logging that says something
equivalent to "10.2.3.4 recently asked 27 times for A records for
example.com and the last 13 responses were dropped."
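
As a rough illustration of the retrospective idea (a Python sketch, not
the BIND9 code; the class name, method names, and the limiter decision
in the loop are all hypothetical), per-key counters can be kept in an
ordinary table and summarised once in a while instead of logging every
drop:

```python
from collections import defaultdict


class RetrospectiveLog:
    """Count responses per (client, qtype, qname) and summarise on demand."""

    def __init__(self):
        # key -> [queries seen, trailing run of consecutive drops]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, client, qtype, qname, dropped):
        entry = self.stats[(client, qtype, qname)]
        entry[0] += 1
        # A response that was actually sent ends the current run of drops.
        entry[1] = entry[1] + 1 if dropped else 0

    def flush(self):
        """Return one summary line per key that ended with drops, then reset."""
        lines = [
            f"{client} recently asked {asked} times for {qtype} records "
            f"for {qname} and the last {dropped} responses were dropped"
            for (client, qtype, qname), (asked, dropped) in self.stats.items()
            if dropped
        ]
        self.stats.clear()
        return lines


log = RetrospectiveLog()
for i in range(27):
    # Hypothetical limiter decision: the first 14 answers go out,
    # the remaining 13 are dropped.
    log.record("10.2.3.4", "A", "example.com", dropped=(i >= 14))
summary = log.flush()
```

One line per offending (client, type, name) tuple per flush interval
replaces one line per dropped response.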

The second issue concerns log noise and the popular enthusiasm for
using Bloom filters for DNS response rate limiting; I've heard more
than one suggestion to that effect.  Bloom filters are a great idea
for some things but I think they are a problem instead of a solution
here.  The problem is suggested
by the word "probabilistic" in "Part of a series on Probabilistic data
structures" on https://en.wikipedia.org/wiki/Bloom_filter

It's like the difference between accounting and statistics.  You don't
(and for privacy reasons must not) care exactly how many nearby
households have incomes above or below twice the median for your
neighborhood.  A statistical statement like 99.9% of your neighbors
earned $31,000 +/-$10,000 is fine.  On the other, accounting hand,
you'd be unhappy if your bank told you that 0.1% of your bank statements
would be fiction, and you'd have to guess which.

Bloom filters have false positives.  If you know enough about your
data, you can make the false positive probability as low as you like,
but you cannot make that probability zero without giving up the reasons
why you chose a Bloom filter.  Never mind the difficulties in knowing
enough about your DNS query stream, and note that it is always a
probability distribution as opposed to a rate.  Computing that
distribution depends on hard to answer questions such as how independent
your hash functions really are on your real data.
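
For a sense of scale, the usual textbook estimate of the false positive
probability is (1 - e^(-k*n/m))^k for m bits, n inserted items, and k
hash functions.  The Python sketch below evaluates it; the function name
and the sizing numbers are made up for illustration, and the formula
assumes exactly the independent, uniform hashes that are hard to
guarantee on real query data:

```python
import math


def bloom_false_positive_rate(bits, items, hashes):
    """Textbook estimate (1 - e^(-k*n/m))^k; assumes independent, uniform hashes."""
    return (1.0 - math.exp(-hashes * items / bits)) ** hashes


# Hypothetical sizing: 10 bits per tracked client and 7 hash functions.
rate = bloom_false_positive_rate(bits=10_000_000, items=1_000_000, hashes=7)
print(f"estimated false positive rate: {rate:.4%}")
```

With those numbers the estimate is a little under 1%.  Adding bits
pushes it down but it stays above zero for any finite filter.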

The connection with logging is that you need to be able to answer
the question "Why did your DNS server drop my requests?"  With any
sort of probabilistic filter including Bloom filters, you won't be
able to say "You sent more than X requests" without turning on
query-logging and slogging through GBytes of log lines.
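
The difference can be sketched in Python.  This assumes a counting
variant of a Bloom filter (plain Bloom filters only answer "seen or
not"), salted SHA-256 as a stand-in for the hash functions, and a
deliberately undersized table; none of it comes from any real rate
limiting implementation.  The filter's per-client number is only an
upper bound, while an ordinary hash table gives the exact count:

```python
import hashlib
from collections import Counter


class CountingBloom:
    """Toy counting Bloom filter: k hashed counters per key, no stored keys."""

    def __init__(self, slots, hashes):
        self.counters = [0] * slots
        self.hashes = hashes

    def _indexes(self, key):
        # Derive k indexes by salting SHA-256; real code would use cheaper hashes.
        for salt in range(self.hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, key):
        for i in self._indexes(key):
            self.counters[i] += 1

    def estimate(self, key):
        # Upper bound only: colliding keys inflate the shared counters.
        return min(self.counters[i] for i in self._indexes(key))


exact = Counter()                          # straightforward hash table
bloom = CountingBloom(slots=16, hashes=2)  # far too small, on purpose
clients = [f"192.0.2.{n}" for n in range(1, 41)]
for n, client in enumerate(clients):
    for _ in range(n % 5 + 1):             # each client sends 1 to 5 queries
        exact[client] += 1
        bloom.add(client)

overcounted = [c for c in clients if bloom.estimate(c) > exact[c]]
print(f"{len(overcounted)} of {len(clients)} clients are over-counted")
```

With 40 clients sharing 16 counters, at least 24 of them must be
over-counted, and nothing in the filter says which ones.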

I think doing the retrospective logging I plan would make any Bloom
filter scheme equivalent to a straightforward hash table.  Log messages
saying "IP addresses that the filter says are the same recently asked
27 times for A records for example.com and the last 13 responses were
dropped" would not satisfy people wanting to know why their customers'
browsers are stalling when trying to get to their web sites.

Vernon Schryver    vjs at rhyolite.com