[dns-operations] dnsflow again (Re: DNS Traffic Archive Protocol )

Tue Dec 7 02:29:40 UTC 2010

Paul Vixie wrote:
> a dnsqr message is a query and a response bundled into a single message.
> i don't mean both original DNS messages are included, i mean there's a
> single dnsqr message having elements from both the query and the response,
> and having the ability to express things like "there was no response."
> since many dnsqr messages generated by any given sensor are duplicates
> in various ways, we should be able to compress them usefully, where being
> useful includes counting the number of times that duplicates occurred and
> generating a summary data stream.

this description is slightly ambiguous -- the dnsqr message schema does
include various elements (IPs, port numbers, qname/qtype/qclass,
response rcode, etc.) that are culled from the query and response, but
it _also_ includes the original, unmodified query and response packets,
starting at the IP header (no link layer headers).

i laid out the schema this way so that dnsqr messages could be filtered
in bulk on fields of particular interest in the IP/UDP/DNS headers
without having to invoke full-blown protocol decoders (and also so that
i could reuse dnsqr message objects as hash table entries :).  it occurs
to me that with a few minor changes and additions the dnsqr format could
support 'reductions', i.e. field deletions (and/or populating certain
optional fields with data from the packet payloads before deleting the
packets from the message).  the result would look very similar to
"dnsflow" without requiring another message schema.

> at its simplest, dnsflow would mean that if you got these dnsqr's...
> 
> client ip	server ip	opcode	q-tuple		rcode	intent
> ------------------------------------------------------------------------
> [...]

> ...then the result after dnsflow filtering would be...
> 
> kind		thing				count
> -----------------------------------------------------
> [...]

> now, that's "at its simplest" and i think it's easy to argue that it's so
> simple as to be useless.  without compound buckets you don't know what
> you'd need to know.  so we might like to see these additional dnsflows:
> 
> kind		thing				count
> -----------------------------------------------------
> qtuple-by-cli	isc.org/in/a-204.152.187.6	1
> [...]

you're starting to lose me with your "kind" / "thing" examples but i
think i get the gist of what you're saying.

dnsqr, as constituted in its current version, basically provides a
stream of mostly immutable tuples.  (i say mostly because there's no
good way to modify an NMSG message from the command line without writing
a libnmsg program.)  there are more than just these fields in the real
implementation, but for a simple example:

    (type, query_ip, response_ip, proto, query_port, response_port, id,
        qname, qtype, qclass, rcode, query_packet, response_packet)

instead of developing a new message schema for "dnsflow", and a tool to
map from the lossless "dnsqr" to the lossy "dnsflow" format, i believe
we could simply delete fields from the dnsqr tuple.  e.g., supposing we
don't care about the port numbers, the id, or the raw packets, we invoke
a not-yet-written tool to delete those fields (this tool could also
generate this particular subset of dnsqr from a live network source by
simply performing the deletions prior to generating output):

    # dnstool -r dnsqr_original.nmsg -w dnsflow.nmsg \
        -x id -x query_port -x response_port -x query_packet -x response_packet

(this could obviously be simplified to a --iwantthiscombo command line
flag but i wanted to show the full generality of the approach.)

you would then have a bunch of messages with the smaller tuples:

    (type, query_ip, response_ip, proto, qname, qtype, qclass, rcode)
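
as a rough sketch of what the deletion step amounts to, here it is in
python.  everything here is made up for illustration: the message is a
plain dict rather than a real NMSG object, delete_fields is a
hypothetical name, and the field values are placeholders.

```python
# fields to drop; mirrors the -x flags in the dnstool example above
DROP = ("id", "query_port", "response_port", "query_packet", "response_packet")

def delete_fields(msg, drop=DROP):
    """Return a copy of msg without the named fields."""
    return {k: v for k, v in msg.items() if k not in drop}

# hypothetical dnsqr-like message (a dict stand-in, not libnmsg's API)
msg = {
    "type": "udp",                 # illustrative value only
    "query_ip": "192.0.2.77",
    "response_ip": "198.51.100.53",
    "proto": 17,
    "query_port": 53124,
    "response_port": 53,
    "id": 4660,
    "qname": "example.com.",
    "qtype": 1,
    "qclass": 1,
    "rcode": 0,
    "query_packet": b"",           # raw packets left empty in this sketch
    "response_packet": b"",
}

small = delete_fields(msg)
print(tuple(small))
```

the original message is left untouched, so the same input could be fed
to several different reductions.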

in addition to simple deletions, we could also do reductions.  e.g., one
could select a prefix length and reduce the query_ip and response_ip
fields to new "query_net" and "response_net" fields that are filled with
the network prefixes covering the original IPs.  the *_ip fields would
then be deleted.

    # dnstool -r input.nmsg -w output.nmsg \
        --aggregate-query-ip4 16 --aggregate-query-ip6 40 \
        --aggregate-response-ip4 24 --aggregate-response-ip6 64

(for extra credit, import a BGP table dump and dynamically aggregate on
longest covering prefix, or reduce to *_asn fields instead of *_net
fields.)

now you have a stream of reduced tuples:

    (type, query_net, response_net, proto, qname, qtype, qclass, rcode)
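
the prefix reduction itself is only a few lines with python's standard
ipaddress module.  this is a sketch, not the hypothetical dnstool:
covering_net and reduce_ips are invented names, the message is again a
plain dict, and the default prefix lengths simply copy the flags above.

```python
import ipaddress

def covering_net(ip, v4_len, v6_len):
    """Return the network of the chosen prefix length that covers ip."""
    addr = ipaddress.ip_address(ip)
    plen = v4_len if addr.version == 4 else v6_len
    # strict=False masks off the host bits instead of raising ValueError
    return str(ipaddress.ip_network(f"{addr}/{plen}", strict=False))

def reduce_ips(msg, q4=16, q6=40, r4=24, r6=64):
    """Replace the *_ip fields of a dict-shaped message with *_net fields."""
    out = dict(msg)
    out["query_net"] = covering_net(out.pop("query_ip"), q4, q6)
    out["response_net"] = covering_net(out.pop("response_ip"), r4, r6)
    return out

reduced = reduce_ips({"query_ip": "192.0.2.77",
                      "response_ip": "198.51.100.53",
                      "qname": "example.com."})
print(reduced)
```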

then suppose that you wanted to aggregate these reduced-tuple messages:
messages that are identical except for timestamp are collapsed into a
single message with additional (time_first, time_last, count) fields.
(merged messages can themselves be collapsed together by summing the
counts and setting the timestamp pair to the earliest/latest values.)

first you would sort your collected (reduced) dnsqr logs, so that all
the identical messages are adjacent in the message stream:

    # for i in $(seq 0 23); do dnstool --sort -r "dnsflow-$i.nmsg" \
        -w "dnsflow-sorted-$i.nmsg" && rm "dnsflow-$i.nmsg"; done

then you would merge the identical, adjacent messages together:

    # dnstool --merge -r dnsflow-sorted-*.nmsg -w dnsflow-merged.nmsg \
        && rm dnsflow-sorted-*.nmsg

now you have a deduplicated set of messages with these fields:

    (count, time_first, time_last, type, query_net, response_net, proto,
        qname, qtype, qclass, rcode)
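
to make the sort-then-merge step concrete, here is a small in-memory
sketch in python.  the assumptions: messages are dicts, an unmerged
message carries a single "time" field (my name, not a dnsqr field), and
merge is a hypothetical function, not an existing tool.  because it
accepts already-merged input, its output can be merged again.

```python
from itertools import groupby

KEY = ("type", "query_net", "response_net", "proto",
       "qname", "qtype", "qclass", "rcode")

def key_of(msg):
    return tuple(msg[f] for f in KEY)

def merge(msgs):
    """Collapse messages that agree on every KEY field.

    An input message has either a single "time" or an existing
    (count, time_first, time_last) triple from an earlier merge.
    """
    out = []
    # sorting makes identical messages adjacent; groupby then collapses them
    for key, group in groupby(sorted(msgs, key=key_of), key=key_of):
        group = list(group)
        merged = dict(zip(KEY, key))
        merged["count"] = sum(m.get("count", 1) for m in group)
        merged["time_first"] = min(m.get("time_first", m.get("time")) for m in group)
        merged["time_last"] = max(m.get("time_last", m.get("time")) for m in group)
        out.append(merged)
    return out

def sample(qname, time):
    # illustrative reduced-tuple message; all values are placeholders
    return {"type": "udp", "query_net": "192.0.0.0/16",
            "response_net": "198.51.100.0/24", "proto": 17,
            "qname": qname, "qtype": 1, "qclass": 1, "rcode": 0,
            "time": time}

hourly = merge([sample("a.example.", 10), sample("b.example.", 20),
                sample("a.example.", 30)])
daily = merge(hourly + hourly)   # merged output fed back in
```

a real tool would sort on disk and stream the merge rather than hold
everything in memory, but the bookkeeping is the same.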

> many details would have to get ironed out during deployment, like balancing
> dnsflow bucket size (how high does the count get before you close the bucket,
> produce some output, and reset the counter?) against dnsflow bucket count
> (how many buckets can you have open before you start LRU'ing the oldest ones?)
> and dnsflow bucket age (how long can you keep incrementing a counter before
> you decide to just close/emit/reset based on the age of the oldest increment?)

all eminently solvable problems.  (btw, i think you want to use a FIFO
rather than an LRU in this particular case.)
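
for what it's worth, here is a minimal python sketch of the three limits
with FIFO eviction.  the class name, method names, and default limits
are all invented for illustration, and it assumes timestamps arrive in
non-decreasing order.

```python
from collections import OrderedDict

class FifoBuckets:
    def __init__(self, max_count=1000, max_buckets=10000, max_age=60.0):
        self.max_count = max_count      # bucket size limit
        self.max_buckets = max_buckets  # limit on open buckets
        self.max_age = max_age          # age limit, measured from first increment
        # key -> [count, time_first, time_last]; insertion order is FIFO order
        self.open = OrderedDict()

    def add(self, key, now):
        """Count one observation of key; return the buckets closed as a result."""
        emitted = self._expire(now)
        bucket = self.open.get(key)
        if bucket is None:
            if len(self.open) >= self.max_buckets:
                # FIFO: evict the bucket that was opened first
                emitted.append(self._close(next(iter(self.open))))
            bucket = self.open[key] = [0, now, now]
        bucket[0] += 1
        bucket[2] = now
        if bucket[0] >= self.max_count:
            emitted.append(self._close(key))
        return emitted

    def _expire(self, now):
        emitted = []
        while self.open:
            key = next(iter(self.open))
            if now - self.open[key][1] < self.max_age:
                break  # head is young enough, so everything behind it is too
            emitted.append(self._close(key))
        return emitted

    def _close(self, key):
        count, first, last = self.open.pop(key)
        return (key, count, first, last)
```

in this sketch the oldest bucket is always at the head of the queue, so
both the age check and the eviction only ever look at the head; an LRU
would reorder buckets on every increment and lose that property.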

i think a hypothetical tool like the one i've described would help solve
CZ.NIC's problem upthread (except for the need for extreme bit-packing
efficiency, but since NMSG has transparent zlib support and benefits
from protobuf's varint encoding i tend not to worry about encoding
efficiency too much) as well as a more general class of problems.

(there is also the problem that dnsqr supports TCP packets but does not
yet have TCP stream reassembly support, but that's just a Small Matter
of Programming.)

-- 
Robert Edmonds
edmonds at isc.org