[dns-operations] dnsflow again (Re: DNS Traffic Archive Protocol )

Tue Dec 7 08:44:12 UTC 2010

On 12/07/2010 03:29 AM, Robert Edmonds wrote:
>
> dnsqr, as constituted in its current version, basically provides a
> stream of mostly immutable tuples.  (i say mostly because there's no
> good way to modify an NMSG message from the command line without writing
> a libnmsg program.)  there are more than just these fields in the real
> implementation, but for a simple example:
>
>      (type, query_ip, response_ip, proto, query_port, response_port, id,
>          qname, qtype, qclass, rcode, query_packet, response_packet)
>
> instead of developing a new message schema for "dnsflow", and a tool to
> map from the lossless "dnsqr" to the lossy "dnsflow" format, i believe
> we could simply delete fields from the dnsqr tuple.  e.g., supposing we
> don't care about the port numbers, the id, or the raw packets, we invoke
> a not-yet-written tool to delete those fields (this tool could also
> generate this particular subset of dnsqr from a live network source by
> simply performing the deletions prior to generating output):
>
>      # dnstool -r dnsqr_original.nmsg -w dnsflow.nmsg \
>          -x id -x query_port -x response_port -x query_packet -x response_packet
>
> (this could obviously be simplified to a --iwantthiscombo command line
> flag but i wanted to show the full generality of the approach.)
>
> you would then have a bunch of messages with the smaller tuples:
>
>      (type, query_ip, response_ip, proto, qname, qtype, qclass, rcode)
>

What about other data that are now apparently either absent or present
only in the raw packets, such as the time the server took to respond to
a query, whether DNSSEC was requested, whether recursion was requested,
etc.?
One might also want to store only the answer and authority sections of
the reply, but not the additional section.
Could such things be included easily, or would a new schema or format
make more sense?
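
To make the question concrete, here is a minimal Python sketch of the
field-deletion approach. It is only an illustration: records are modelled
as plain dicts rather than NMSG messages, all values are made up, and the
extra fields response_delay, do_bit and rd_bit are hypothetical names, not
part of dnsqr as quoted above. The point is that such values survive the
deletion of the raw packets only if they were extracted into the tuple
beforehand.

```python
# Illustrative only: a dnsqr-like record as a dict. The last three fields
# are hypothetical additions, not fields of the real dnsqr schema.
record = {
    "type": "query_response",
    "query_ip": "192.0.2.1",
    "response_ip": "198.51.100.53",
    "proto": 17,
    "query_port": 53124,
    "response_port": 53,
    "id": 4711,
    "qname": "example.cz.",
    "qtype": 1,
    "qclass": 1,
    "rcode": 0,
    "query_packet": b"\x00",
    "response_packet": b"\x00",
    "response_delay": 0.012,  # hypothetical: seconds between query and response
    "do_bit": True,           # hypothetical: DNSSEC OK flag seen in the query
    "rd_bit": False,          # hypothetical: recursion desired flag
}

# The fields removed by the "-x" options in the quoted command line.
EXCLUDED = {"id", "query_port", "response_port",
            "query_packet", "response_packet"}


def drop_fields(rec, excluded):
    """Return a copy of rec without the excluded fields."""
    return {name: value for name, value in rec.items() if name not in excluded}


flow = drop_fields(record, EXCLUDED)
```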

> first you would sort your collected dnsqr logs, so that all the
> identical messages are adjacent in the message stream:
>
>      # for i in `seq 0 23`; do dnstool --sort -r dnsflow-$i.nmsg \
>          -w dnsflow-sorted-$i.nmsg && rm dnsflow-$i.nmsg; done
>
> then you would merge the identical, adjacent messages together:
>
>      # dnstool --merge -r dnsflow-sorted-*.nmsg -w dnsflow-merged.nmsg \
>          && rm dnsflow-sorted-*.nmsg
>
> now you have a deduplicated set of messages with these fields:
>
>      (count, time_first, time_last, type, query_net, response_net, proto,
>          qname, qtype, qclass, rcode)
>

I agree that aggregation is the most powerful compression method in this
case, but many applications (such as anomaly detection, traffic peak
analysis, etc.) need the stream of packets sorted by time, not by
content.
Also, unlike on recursive servers, on authoritative servers identical
queries from one client are repeated only once per TTL of the
corresponding RR, so the aggregation period would have to be relatively
long (at least when aggregation is performed on whole tuples as you
described), which makes compression and any later decompression more
demanding.
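
As a rough illustration of the trade-off, the following Python sketch
merges identical tuples only within fixed time windows, so the output
stays ordered by window while still carrying count, time_first and
time_last. The function name merge_by_window, the window length and the
(timestamp, key) input shape are my assumptions for the example, not
something the dnstool proposal specifies.

```python
def merge_by_window(records, window):
    """Merge identical keys that fall into the same time window.

    records: iterable of (timestamp, key) pairs, key being a hashable tuple.
    window:  window length in seconds.
    Returns a list of (window_index, key, stats) ordered by window and then
    by the first time the key was seen in that window.
    """
    merged = {}
    for ts, key in records:
        bucket = int(ts // window)
        stats = merged.get((bucket, key))
        if stats is None:
            merged[(bucket, key)] = {"count": 1,
                                     "time_first": ts,
                                     "time_last": ts}
        else:
            stats["count"] += 1
            stats["time_first"] = min(stats["time_first"], ts)
            stats["time_last"] = max(stats["time_last"], ts)
    rows = [(bucket, key, stats) for (bucket, key), stats in merged.items()]
    rows.sort(key=lambda row: (row[0], row[2]["time_first"]))
    return rows


# Made-up sample: (query_ip, qname, qtype) keys with timestamps in seconds.
A = ("192.0.2.1", "example.cz.", 1)
B = ("192.0.2.7", "nic.cz.", 28)
sample = [(0.5, A), (1.2, B), (3.0, A), (61.0, A)]
result = merge_by_window(sample, window=60)
```

With a short window the time structure is preserved but few queries seen
by an authoritative server will merge; a window on the order of the TTL
merges more of them but blurs the timing.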

Best regards

Beda

-- 
Bedrich Kosata
CZ.NIC Labs <http://labs.nic.cz>