[dns-operations] DNS Traffic Archive Protocol

Thu Dec 2 21:55:39 UTC 2010

you wouldn't want to use dnsqr for this use case since dnsqr is designed
to be losslessly convertible to pcap format.  dnsqr is also somewhat
biased towards passive DNS replication on recursive servers where the
response is quite important.  it's less important to keep responses when
monitoring authoritative nameservers, of course.

nmsg itself, however, might be more useful (dnsqr is simply one nmsg
plugin), as it's basically an encapsulation format for payloads encoded
in google protobuf format:

    http://code.google.com/p/protobuf/

a protobuf representation of your data would be somewhat less efficient
than packing the bits by hand, but i suspect with compression the sizes
might be competitive.  protobuf is still quite efficient, though, see:

    http://code.google.com/apis/protocolbuffers/docs/encoding.html
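
as a rough illustration of why protobuf integers stay compact, here is a
minimal sketch of the base-128 "varint" encoding described at that link.
the function name and the sample values are only illustrative; this is
not code from nmsg or from the protobuf library.

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


small = encode_varint(1)                 # one byte
example = encode_varint(300)             # two bytes
timestamp = encode_varint(1291326939)    # illustrative 2010-era epoch seconds
```

small values cost one byte, and an epoch timestamp of that size costs
five bytes rather than the eight a fixed 64-bit field would take.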

you might also consider an "aggregated" format for older data.  that is,
given a stream of multiple data tuples that differ only by timestamp,
e.g.:

    time: T_1
    data: foo

    time: T_2
    data: foo

    time: T_3
    data: bar

    time: T_4
    data: foo

    time: T_5
    data: bar

you aggregate this into:

    time_first: T_1
    time_last: T_4
    count: 3
    data: foo

    time_first: T_3
    time_last: T_5
    count: 2
    data: bar

nmsg / protobufs make this easy since you can use the exact same schema
for the pre- and post-aggregation data sets.  hand-packed structs are
less flexible in this regard.
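
a minimal sketch of the aggregation step, in plain python rather than
nmsg or protobuf.  it assumes (my assumption for illustration, not
something nmsg dictates) that a raw observation is stored in the
aggregated schema with count 1 and time_first equal to time_last, so the
same function handles raw and already-aggregated records.  the names
`observation` and `aggregate` are hypothetical.

```python
def observation(time, data):
    """Wrap one raw tuple in the aggregated schema (count of 1)."""
    return {"time_first": time, "time_last": time, "count": 1, "data": data}


def aggregate(records):
    """Merge records that share the same data, keeping first-seen order."""
    merged = {}
    for rec in records:
        cur = merged.get(rec["data"])
        if cur is None:
            merged[rec["data"]] = dict(rec)  # copy so the input is not mutated
        else:
            cur["time_first"] = min(cur["time_first"], rec["time_first"])
            cur["time_last"] = max(cur["time_last"], rec["time_last"])
            cur["count"] += rec["count"]
    return list(merged.values())


# the five-tuple stream from the example, with T_1..T_5 written as 1..5
raw = [
    observation(1, "foo"),
    observation(2, "foo"),
    observation(3, "bar"),
    observation(4, "foo"),
    observation(5, "bar"),
]
monthly = aggregate(raw)

# aggregated output can be fed back in alongside newer raw observations
later = aggregate(monthly + [observation(6, "foo")])
```

`monthly` holds the two aggregated records shown above, and `later`
shows that re-aggregating needs no schema conversion.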

i've used this approach for aggregating both ZFA TLD data as well as
passive DNS data.  when aggregated on monthly boundaries i get about a
20X reduction in size for ZFA TLD data and about a 10X reduction in size
for passive DNS data.  i'd be interested if this strategy works for
authoritative nameserver traces as well.

if you're interested in developing custom code based on nmsg, the
mailing list is here:

    https://lists.isc.org/mailman/listinfo/nmsg-dev

Bedrich Kosata wrote:
> Dear Paul,
> 
> I must admit that I checked it only very briefly. My impression was
> that it was too low level for our use - for example TCP traffic is
> stored as raw packets - and that it would also take too much space.
> One of the important features of what we do is that we (optionally)
> throw some of the things out, which I am not sure is possible with
> dnsqr.
> 
> However, after the responses on this mailing list, I will certainly
> give it a try to see how it works for us and probably report back
> what I find.
> 
> Best regards
> 
> Beda
> 
> On 12/02/2010 01:52 PM, Paul Vixie wrote:
> >bedrich, did you look at nmsg's dnsqr schema and tool set before you set
> >out? --paul

-- 
Robert Edmonds
edmonds at isc.org