[dns-operations] dnsflow again (Re: DNS Traffic Archive Protocol )

Mon Dec 6 19:54:34 UTC 2010

> Date: Mon, 6 Dec 2010 14:58:46 +0100
> From: Phil Regnauld <regnauld at nsrc.org>
> 
> > > nice idea, but I'm missing the rationale *why*:
> > 
> > Same, here - storage is cheap.
> 
> 	The rationale was not only about storage space, but also about
> processing time. The author mentioned that he would look more closely
> at nmsg.  Once he's done that and returned to the list, the question
> will be whether the performance benefit outweighs having yet another
> storage format, one targeted to one protocol only, or if more work
> should be done to at least accommodate other message formats, or if the
> effort is misplaced.

i've been thinking for a while now that a new subclass of dnsqr would be
useful, and since this is one of the use cases (compression for the purpose
of decreasing the processing time by analysts), i'd like to explain what
i mean by "dnsflow".

a dnsqr message is a query and a response bundled into a single message.
i don't mean both original DNS messages are included, i mean there's a
single dnsqr message having elements from both the query and the response,
and having the ability to express things like "there was no response."
since many dnsqr messages generated by any given sensor are duplicates
in various ways, we should be able to compress them usefully, where being
useful includes counting the number of times that duplicates occurred and
generating a summary data stream.

at its simplest, dnsflow would mean that if you got these dnsqr's...

client ip	server ip	opcode	q-tuple		rcode	intent
------------------------------------------------------------------------
204.152.187.6	192.5.5.241	query	vix.com/in/a	noerr	referral
204.152.187.6	192.5.5.241	query	vix.com/in/a	noerr	referral
204.152.187.6	192.5.5.241	query	vix.com/in/a	noerr	referral
204.152.187.6	192.5.5.241	query	isc.org/in/a	noerr	referral

204.152.187.13	192.5.5.241	query	vix.com/in/a	noerr	referral
204.152.187.13	192.5.5.241	query	vix.com/in/a	noerr	referral
204.152.187.13	192.5.5.241	query	isc.org/in/a	noerr	referral

204.152.188.10	192.5.5.241	query	vix.com/in/a	noerr	referral
204.152.188.10	192.5.5.241	query	isc.org/in/a	noerr	referral

...then the result after dnsflow filtering would be...

kind		thing				count
-----------------------------------------------------
client ip	204.152.187.6			4
client ip	204.152.187.13			3
client ip	204.152.188.10			2
server ip	192.5.5.241			9
opcode		query				9
query q-tuple	vix.com/in/a			6
query q-tuple	isc.org/in/a			3
query rcode	noerr				9
query intent	referral			9
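
a minimal python sketch of this simplest form, assuming each dnsqr has
already been reduced to the six columns shown above.  the names SAMPLE, ROWS,
FIELDS and simple_dnsflow are made up for illustration and are not part of
nmsg or of any existing dnsqr tooling.

```python
from collections import Counter

# (client ip, q-tuple, repeats) taken from the nine example rows above
SAMPLE = [
    ("204.152.187.6", "vix.com/in/a", 3),
    ("204.152.187.6", "isc.org/in/a", 1),
    ("204.152.187.13", "vix.com/in/a", 2),
    ("204.152.187.13", "isc.org/in/a", 1),
    ("204.152.188.10", "vix.com/in/a", 1),
    ("204.152.188.10", "isc.org/in/a", 1),
]

FIELDS = ("client ip", "server ip", "opcode", "q-tuple", "rcode", "intent")

# expand back into one tuple per observed dnsqr message
ROWS = [
    (cli, "192.5.5.241", "query", qtuple, "noerr", "referral")
    for cli, qtuple, repeats in SAMPLE
    for _ in range(repeats)
]


def simple_dnsflow(rows):
    """count every (kind, thing) pair independently, one field at a time."""
    counts = Counter()
    for row in rows:
        for kind, thing in zip(FIELDS, row):
            counts[(kind, thing)] += 1
    return counts


if __name__ == "__main__":
    for (kind, thing), count in simple_dnsflow(ROWS).items():
        print(f"{kind}\t{thing}\t{count}")
```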

now, that's "at its simplest" and i think it's easy to argue that it's so
simple as to be useless.  without compound buckets you don't know what
you'd need to know.  so we might like to see these additional dnsflows:

kind		thing				count
-----------------------------------------------------
qtuple-by-cli	vix.com/in/a-204.152.187.6	3
qtuple-by-cli	isc.org/in/a-204.152.187.6	1
qtuple-by-cli	vix.com/in/a-204.152.187.13	2
qtuple-by-cli	isc.org/in/a-204.152.187.13	1
qtuple-by-cli	vix.com/in/a-204.152.188.10	1
qtuple-by-cli	isc.org/in/a-204.152.188.10	1
serv-by-cli	192.5.5.241-204.152.187.6	4
serv-by-cli	192.5.5.241-204.152.187.13	3
serv-by-cli	192.5.5.241-204.152.188.10	2
...
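
compound buckets like these could be sketched as follows, assuming each
dnsqr is available as a dict of named fields.  the field names ("client",
"server", "qtuple"), the COMPOUND_KINDS table and compound_dnsflow are all
hypothetical; which compound keys are actually worth keeping is the part
that takes real work.

```python
from collections import Counter

# kind -> the fields joined (in order) to form the bucket's "thing"
COMPOUND_KINDS = {
    "qtuple-by-cli": ("qtuple", "client"),
    "serv-by-cli": ("server", "client"),
}


def compound_dnsflow(records, kinds=COMPOUND_KINDS):
    """count one bucket per compound key for every record."""
    counts = Counter()
    for rec in records:
        for kind, fields in kinds.items():
            thing = "-".join(rec[f] for f in fields)
            counts[(kind, thing)] += 1
    return counts


# the four example messages from client 204.152.187.6
RECORDS = [
    {"client": "204.152.187.6", "server": "192.5.5.241", "qtuple": q}
    for q in ("vix.com/in/a", "vix.com/in/a", "vix.com/in/a", "isc.org/in/a")
]

if __name__ == "__main__":
    for (kind, thing), count in compound_dnsflow(RECORDS).items():
        print(f"{kind}\t{thing}\t{count}")
```

adding a new compound kind is then a one-line change to the table rather
than new counting code.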

apologies for the low quality of these examples, they aren't complete but
they should show the flavour.  what should be obvious from looking at these
is how much more useful compression is when you know the duplicate counts.
also obvious is that you have to know what compound keys are interesting
and that this may take some work.  of maximum interest should be the way
that lots of dnsflows from different servers can be combined and then
recompressed.  in the above examples 192.5.5.241, f-root, is always the
server ip, but if several different f-root anycast instances all generate
this format they can be usefully combined.  (which reminds me that we likely
need server ID in addition to server IP, to keep SOME counters independent.)
and if other rootops also generated these stats then THOSE could be combined.
so if some online gangster was ddos'ing some online casino using the root
name server system as a reflector (such that the victim's ip address was
spoofed as the query source address) and several rootops combined their
stats we would end up with a very small "dnsflow" stream from arbitrarily
large attacks.
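
the combine-and-recompress step is just addition of counters that share a
key.  a rough sketch, assuming each sensor emits a mapping from (kind, thing)
to count: the instance names, the "server id" kind, and the numbers are
invented for illustration, and 192.0.2.1 is a documentation address standing
in for a spoofed victim.

```python
from collections import Counter


def merge_dnsflows(*streams):
    """sum the counts of identical (kind, thing) buckets across streams."""
    merged = Counter()
    for stream in streams:
        merged.update(stream)  # Counter.update adds to existing counts
    return merged


# hypothetical output of two anycast instances of the same server ip
instance_a = Counter({
    ("client ip", "192.0.2.1"): 120000,
    ("server ip", "192.5.5.241"): 120000,
    ("server id", "instance-a"): 120000,
})
instance_b = Counter({
    ("client ip", "192.0.2.1"): 80000,
    ("server ip", "192.5.5.241"): 80000,
    ("server id", "instance-b"): 80000,
})

if __name__ == "__main__":
    for (kind, thing), count in merge_dnsflows(instance_a, instance_b).items():
        print(f"{kind}\t{thing}\t{count}")
```

the per-instance "server id" buckets stay separate after the merge, while
the client and server ip buckets collapse into one line each no matter how
many messages they represent.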

this is what i had in mind when i asked duane wessels to work on DSC back
in the days of the original DNS-OARC grant administered by CAIDA and ISC.
(at that time nmsg didn't exist, so XML was used, and we never built a
vibrant ecosystem around the DSC technology; it just does what it does.)

many details would have to get ironed out during deployment, like balancing
dnsflow bucket size (how high does the count get before you close the bucket,
produce some output, and reset the counter?) against dnsflow bucket count
(how many buckets can you have open before you start LRU'ing the oldest ones?)
and dnsflow bucket age (how long can you keep incrementing a counter before
you decide to just close/emit/reset based on the age of the oldest increment?)
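
to make the three knobs concrete, here is one possible sketch of a bucket
table that closes a bucket on size, on bucket count (LRU), or on age.  the
class name, parameter names and the choice to pass the clock in as an
argument are assumptions made for illustration only; a real collector would
want something cheaper than scanning every bucket on each add.

```python
from collections import OrderedDict


class BucketTable:
    def __init__(self, max_count, max_buckets, max_age):
        self.max_count = max_count      # bucket size limit
        self.max_buckets = max_buckets  # open bucket limit
        self.max_age = max_age          # seconds since first increment
        self.buckets = OrderedDict()    # key -> [count, first_seen]

    def add(self, key, now):
        """count one observation; return the (key, count) pairs emitted."""
        emitted = []
        # age: close buckets whose oldest increment is too old
        stale = [k for k, (_, first) in self.buckets.items()
                 if now - first >= self.max_age]
        for k in stale:
            emitted.append((k, self.buckets.pop(k)[0]))
        if key in self.buckets:
            self.buckets.move_to_end(key)  # mark as most recently used
            self.buckets[key][0] += 1
        else:
            # bucket count: make room by closing the least recently used
            if len(self.buckets) >= self.max_buckets:
                old, (count, _) = self.buckets.popitem(last=False)
                emitted.append((old, count))
            self.buckets[key] = [1, now]
        # bucket size: close/emit/reset once the count is high enough
        if self.buckets[key][0] >= self.max_count:
            emitted.append((key, self.buckets.pop(key)[0]))
        return emitted
```

the right values for the three limits are exactly the sort of detail that
would have to come out of deployment experience.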

this is the approach i've been thinking about, and it's the kind of thing
i am hoping to see the nmsg framework used for, and it's the kind of thing
i instantly re-envisioned when the nmsg dnsqr schema first came into use.
there are three core components to the solution: an nmsg dnsflow schema that
would probably be a subclass of dnsqr; a dnsqr-to-dnsflow filter that can
run on the collectors; and a dnsflow-to-dnsflow filter that just aggregates.

nmsg dnsflow co-conspirators are urgently desired.  sponsors, partner-coders,
partner-operators, partner-visionaries.  if we got nmsg horribly wrong,
please TELL US so that we can work on it in a way that's more relevant.
if on the other hand it seems like this could be the solution to a problem
you're having, CONTACT US so that we can work together on it.  for one thing
this seems directly responsive to the needs of a "dns traffic archive
protocol".