[dns-operations] dnsflow again (Re: DNS Traffic Archive Protocol )

Tue Dec 7 20:36:18 UTC 2010

On 12/07/2010 08:51 PM, Robert Edmonds wrote:
> Bedrich Kosata wrote:
>> On 12/07/2010 03:29 AM, Robert Edmonds wrote:
>>> dnsqr, as constituted in its current version, basically provides a
>>> stream of mostly immutable tuples.  (i say mostly because there's no
>>> good way to modify an NMSG message from the command line without writing
>>> a libnmsg program.)  there are more than just these fields in the real
>>> implementation, but for a simple example:
>>>
>>>      (type, query_ip, response_ip, proto, query_port, response_port, id,
>>>          qname, qtype, qclass, rcode, query_packet, response_packet)
>>>
>>> instead of developing a new message schema for "dnsflow", and a tool to
>>> map from the lossless "dnsqr" to the lossy "dnsflow" format, i believe
>>> we could simply delete fields from the dnsqr tuple.  e.g., supposing we
>>> don't care about the port numbers, the id, or the raw packets, we invoke
>>> a not-yet-written tool to delete those fields (this tool could also
>>> generate this particular subset of dnsqr from a live network source by
>>> simply performing the deletions prior to generating output):
>>>
>>>      # dnstool -r dnsqr_original.nmsg -w dnsflow.nmsg \
>>>          -x id -x query_port -x response_port -x query_packet -x response_packet
>>>
>>> (this could obviously be simplified to a --iwantthiscombo command line
>>> flag but i wanted to show the full generality of the approach.)
>>>
>>> you would then have a bunch of messages with the smaller tuples:
>>>
>>>      (type, query_ip, response_ip, proto, qname, qtype, qclass, rcode)
>>>
>>
>> What about other data that are now apparently either not present or
>> present only in the raw packets, such as the time it took the server
>> to respond to a query, request of DNSSEC, request of recursion,
>> etc.?
>
> we would add new fields to the ISC/dnsqr message schema as necessary.
> for instance, we could add an optional "delay" field that is populated
> with the difference between the response timestamp and the query
> timestamp when the raw query/response packets and their timestamps are
> removed from the message.
>
> (in fact the nmsg message module API would make it easy to have a
> "delay" field that is either calculated on-demand based on the
> query/response packet timestamps or stored directly in a protobuf field
> if the packets and their timestamps have been removed from the message.)
>
> similarly, we could have new optional "qflags" and "rflags" fields that can
> be populated with the query flags and the response flags.  and these
> don't necessarily have to be copied verbatim from the DNS message
> header, they could be synthesized from the message header flags and the
> extended EDNS flags, which would take care of recording both the DO and
> RD flags.  (similarly those fields can be synthesized on-demand if the
> original packets are present in the message.)
>
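
For illustration, a minimal Python sketch of the quoted proposal as I read
it: drop fields from a dnsqr-like record, store a "delay" once the
timestamps are gone, and synthesize query flags from the header and EDNS
flags. This is not libnmsg code. The dict representation, the
query_time/response_time field names and both function names are
assumptions made only for this example; the RD and DO bit positions are
the standard ones.

```python
from typing import Any, Dict, Iterable, List, Optional

# Fields dropped in the example command line quoted above.
DROP_DEFAULT = ("id", "query_port", "response_port",
                "query_packet", "response_packet")

RD_BIT = 0x0100  # "recursion desired" in the 16-bit DNS header flags
DO_BIT = 0x8000  # "DNSSEC OK" in the 16-bit EDNS flags


def reduce_record(record: Dict[str, Any],
                  drop: Iterable[str] = DROP_DEFAULT) -> Dict[str, Any]:
    """Return a copy of `record` without the `drop` fields.

    If the (assumed) query_time/response_time fields are both present,
    they are replaced by a single "delay" field holding their difference.
    """
    dropped = set(drop)
    out = {k: v for k, v in record.items() if k not in dropped}
    q_time = out.pop("query_time", None)
    r_time = out.pop("response_time", None)
    if q_time is not None and r_time is not None:
        out["delay"] = r_time - q_time
    return out


def synthesize_qflags(header_flags: int,
                      edns_flags: Optional[int] = None) -> List[str]:
    """Combine header and EDNS flags into one list of flag names."""
    flags = []
    if header_flags & RD_BIT:
        flags.append("RD")
    if edns_flags is not None and edns_flags & DO_BIT:
        flags.append("DO")
    return flags
```

A record reduced this way keeps only the fields that were not dropped,
plus "delay" when both timestamps were available.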

Are we still talking about the original dnsqr schema, or is this
something new, similar to what Paul proposed? To me it seems that with
so many adjustments, you have just laid out plans for a completely new
schema.

BTW, I did some preliminary tests and I think that an nmsg schema that
resembles a completely or almost completely parsed DNS response (with
most parts optional) could be a good way to proceed. It would allow
different parts of the data to be omitted, which seems crucial for such
a format, but it could also accommodate all the data, and in all cases
it would allow faster parsing, since there would be no need to deal with
the different network layers, fragmentation, IP versions, etc.

I tried converting data from my experimental format into a
protobuf-based format. The result was almost twice the size, but it
compressed to almost the same size when advanced compression (xz) was
used. Therefore it seems to me that nmsg is really the way forward,
because tools are already available to work with it.
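
A comparison of this kind can be repeated with a few lines of Python.
This is only a sketch of the measurement, not the test I actually ran:
the function names are made up for the example, and the input is
whatever serialized data you want to compare, passed in as bytes.

```python
import lzma
from typing import Dict, Tuple


def xz_size(data: bytes, preset: int = 9) -> int:
    """Size in bytes of `data` after xz compression at `preset`."""
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset))


def size_report(encodings: Dict[str, bytes]) -> Dict[str, Tuple[int, int]]:
    """Map each encoding name to (raw size, xz-compressed size)."""
    return {name: (len(blob), xz_size(blob))
            for name, blob in encodings.items()}
```

Feeding it the same capture serialized in two formats gives the raw and
compressed sizes side by side.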

>> Also, unlike for recursive servers, for authoritative servers,
>> identical queries from one client get repeated only once per TTL of
>> the corresponding RR, so the period for aggregation would have to be
>> relatively large (at least when the aggregation is performed on the
>> whole tuples as you described),
>
> it is not quite the same thing, but i have done something similar with
> the .com ZFA TLD dumps.  basically the NS and A/AAAA RRsets in the .com
> zone are repeated once a day and aggregated together.  i have no
> problems aggregating entire months of NMSG-formatted sorted .com data
> together.  (the only issue is sorting the .com data which in my naïve
> implementation loads the data for an entire zone version into memory.)
> there aren't any network-level fields (query IPs, query ports, etc.) in
> this application but the situation is roughly analogous to a single IP
> asking for every record in the zone once a day :)
>
>> thus making the whole process of compression and eventual later
>> decompression more demanding.
>
> you mention "decompression" but this is a lossy compression scheme.  how
> do you decompress a tuple of elements into the original packet?  are you
> instead talking about synthesizing a DNS message that resembles the
> original message, e.g. for importing into tools that only process raw
> messages?
>

By decompression I just meant conversion from the tuple domain (tuple
order) to the time domain, so that applications that require input in
temporal order could use the output. I did not have any particular tool
in mind.
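
A rough Python sketch of such a conversion, under the assumption (not
discussed above) that each aggregated tuple keeps the list of timestamps
at which it was seen. The function names and the (key, timestamps) layout
are hypothetical.

```python
import heapq
from typing import Hashable, Iterable, Iterator, List, Tuple

Event = Tuple[float, Hashable]


def _events(key: Hashable, times: Iterable[float]) -> Iterator[Event]:
    """Yield (timestamp, key) pairs for one aggregated tuple, in time order."""
    for ts in sorted(times):
        yield ts, key


def to_time_order(
    aggregated: Iterable[Tuple[Hashable, List[float]]]
) -> Iterator[Event]:
    """Merge per-tuple timestamp lists into one time-ordered event stream."""
    streams = [_events(key, times) for key, times in aggregated]
    # Compare on the timestamp only, so keys never need to be orderable.
    return heapq.merge(*streams, key=lambda event: event[0])
```

The merge reads from each tuple's stream lazily, although every
aggregated tuple still has to be opened at once, so the cost grows with
the number of distinct tuples.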

Best regards

Beda