[dns-operations] dnsflow again (Re: DNS Traffic Archive Protocol )
Bedrich Kosata
bedrich.kosata at nic.cz
Tue Dec 7 08:44:12 UTC 2010
On 12/07/2010 03:29 AM, Robert Edmonds wrote:
>
> dnsqr, as constituted in its current version, basically provides a
> stream of mostly immutable tuples. (i say mostly because there's no
> good way to modify an NMSG message from the command line without writing
> a libnmsg program.) there are more than just these fields in the real
> implementation, but for a simple example:
>
> (type, query_ip, response_ip, proto, query_port, response_port, id,
> qname, qtype, qclass, rcode, query_packet, response_packet)
>
> instead of developing a new message schema for "dnsflow", and a tool to
> map from the lossless "dnsqr" to the lossy "dnsflow" format, i believe
> we could simply delete fields from the dnsqr tuple. e.g., supposing we
> don't care about the port numbers, the id, or the raw packets, we invoke
> a not-yet-written tool to delete those fields (this tool could also
> generate this particular subset of dnsqr from a live network source by
> simply performing the deletions prior to generating output):
>
> # dnstool -r dnsqr_original.nmsg -w dnsflow.nmsg \
> -x id -x query_port -x response_port -x query_packet -x response_packet
>
> (this could obviously be simplified to a --iwantthiscombo command line
> flag but i wanted to show the full generality of the approach.)
>
> you would then have a bunch of messages with the smaller tuples:
>
> (type, query_ip, response_ip, proto, qname, qtype, qclass, rcode)
>
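The field-deletion idea quoted above can be sketched in a few lines of Python. This is only an illustration, not dnstool or libnmsg: messages are modelled as plain dicts, the field names are copied from the example tuple, and the record values are made up.

```python
# Hypothetical field names, copied from the example tuple quoted above.
DROP = ("id", "query_port", "response_port", "query_packet", "response_packet")


def drop_fields(messages, drop=DROP):
    """Yield a copy of each message (a dict) without the fields in `drop`."""
    unwanted = set(drop)
    for msg in messages:
        yield {name: value for name, value in msg.items() if name not in unwanted}


# One made-up dnsqr-like record; every value here is a placeholder.
original = {
    "type": "UDP_QUERY_RESPONSE",
    "query_ip": "192.0.2.1",
    "response_ip": "198.51.100.53",
    "proto": 17,
    "query_port": 53124,
    "response_port": 53,
    "id": 4660,
    "qname": "example.com.",
    "qtype": 1,
    "qclass": 1,
    "rcode": 0,
    "query_packet": b"...",
    "response_packet": b"...",
}

reduced = next(drop_fields([original]))
print(list(reduced))
```

The remaining keys are exactly the smaller tuple listed above; anything recoverable only from the dropped raw packets is lost at this point.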
What about other data that are apparently either absent from this tuple
or present only in the raw packets, such as the time the server took to
respond to a query, whether DNSSEC was requested, whether recursion was
requested, etc.?

One might also want to store only the answer and authority sections of
the reply, but not the additional section.

Could such things be included easily, or would a new schema or format
make more sense?
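To make the question concrete, here is a rough Python sketch of pulling such values out of a raw query packet before it is discarded. It assumes the standard DNS wire format (12-byte header, RD flag 0x0100, DO bit 0x8000 in the flags half of the OPT record's TTL field) and assumes that capture timestamps for both packets are available; the function and field names are hypothetical and not part of dnsqr.

```python
import struct


def skip_name(buf, off):
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = buf[off]
        if length == 0:                 # root label terminates the name
            return off + 1
        if length & 0xC0 == 0xC0:       # 2-byte compression pointer ends the name
            return off + 2
        off += 1 + length


def query_flags(packet):
    """Extract the RD flag and the EDNS DO bit from a raw DNS message."""
    _id, flags, qd, an, ns, ar = struct.unpack_from("!6H", packet, 0)
    off = 12
    for _ in range(qd):
        off = skip_name(packet, off) + 4            # skip QTYPE and QCLASS
    do = False
    for _ in range(an + ns + ar):
        off = skip_name(packet, off)
        rtype, _rclass, ttl, rdlen = struct.unpack_from("!HHIH", packet, off)
        off += 10 + rdlen
        if rtype == 41:                             # OPT pseudo-record
            do = bool(ttl & 0x8000)
    return {"rd": bool(flags & 0x0100), "do": do}


def response_time(query_ts, response_ts):
    """Server latency in seconds, from two capture timestamps."""
    return response_ts - query_ts


# Hand-built query for example.com A with RD set and an OPT record with DO set.
packet = (
    struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 1)
    + b"\x07example\x03com\x00" + struct.pack("!HH", 1, 1)
    + b"\x00" + struct.pack("!HHIH", 41, 4096, 0x00008000, 0)
)
print(query_flags(packet))
```

Values like these could be stored as ordinary fields alongside qname and qtype, so they would survive deletion of the raw packets.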
> first you would sort your collected dnsqr logs, so that all the
> identical messages are adjacent in the message stream:
>
> # for i in `seq 0 23`; do dnstool --sort -r dnsflow-$i.nmsg \
> -w dnsflow-sorted-$i.nmsg && rm dnsflow-$i.nmsg; done
>
> then you would merge the identical, adjacent messages together:
>
> # dnstool --merge -r dnsflow-sorted-*.nmsg -w dnsflow-merged.nmsg \
> && rm dnsflow-sorted-*.nmsg
>
> now you have a deduplicated set of messages with these fields:
>
> (count, time_first, time_last, type, query_net, response_net, proto,
> qname, qtype, qclass, rcode)
>
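The sort-then-merge step quoted above amounts to grouping identical tuples and keeping a count plus the first and last timestamps. A minimal Python sketch, assuming each message is a dict carrying a hypothetical "time" field (not the actual dnstool behaviour, which is not yet written):

```python
from itertools import groupby
from operator import itemgetter

# Fields that must match for two messages to be merged (from the quoted tuple).
KEY = ("type", "query_ip", "response_ip", "proto", "qname", "qtype", "qclass", "rcode")


def merge(messages):
    """Sort by content, then collapse identical adjacent messages."""
    keyfn = itemgetter(*KEY)
    for key, group in groupby(sorted(messages, key=keyfn), key=keyfn):
        times = [m["time"] for m in group]
        yield dict(zip(KEY, key), count=len(times),
                   time_first=min(times), time_last=max(times))


def make(qname, time):
    """Build a placeholder message; all values except qname/time are fixed."""
    return {"type": "UDP", "query_ip": "192.0.2.1", "response_ip": "198.51.100.53",
            "proto": 17, "qname": qname, "qtype": 1, "qclass": 1, "rcode": 0,
            "time": time}


merged = list(merge([make("b.example.", 5), make("a.example.", 1), make("a.example.", 9)]))
for record in merged:
    print(record["qname"], record["count"], record["time_first"], record["time_last"])
```

Note that the output is ordered by content, not by time: the individual timestamps between time_first and time_last are gone.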
I agree that aggregation is the most powerful compression method in this
case, but for many applications (such as anomaly detection, traffic peak
analysis, etc.) the stream of packets has to be ordered by time, not by
content.

Also, unlike on recursive servers, on authoritative servers an identical
query from one client is normally repeated only once per TTL of the
corresponding RR (the client, typically a caching resolver, keeps the
answer until it expires). The aggregation period would therefore have to
be relatively long, at least when aggregation is performed on whole
tuples as you described, which makes compression and any later
decompression more demanding.
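The trade-off can be illustrated with a small Python sketch of aggregation per time window, which keeps the output ordered by time at window granularity. The event layout (timestamp, key) and the window length are assumptions chosen for the example only.

```python
from collections import Counter


def windowed_counts(events, window):
    """Count identical keys per time window; output is ordered by window start."""
    counts = Counter()
    for timestamp, key in events:
        bucket = int(timestamp // window) * window   # start of the window
        counts[(bucket, key)] += 1
    return sorted(counts.items())


# Made-up events: (seconds since start, query key).
events = [(0.5, "a"), (1.2, "a"), (61.0, "a"), (62.0, "b")]

for (start, key), count in windowed_counts(events, window=60):
    print(start, key, count)
```

With a 60-second window the two early "a" queries collapse into one record, while the later one does not. If the window is shorter than the TTL, repeats from the same client rarely fall into the same window, so little is saved; a longer window saves more but coarsens the time ordering.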
Best regards
Beda
--
Bedrich Kosata
CZ.NIC Labs <http://labs.nic.cz>