[dns-operations] DNS Traffic Archive Protocol

Bedrich Kosata bedrich.kosata at nic.cz
Thu Dec 2 12:25:09 UTC 2010


Hi Phil,

On 12/02/2010 12:47 PM, Phil Regnauld wrote:
> 	Hi Bedrich,
>
> 	Thanks for the clarification.
>
> 	A few comments:
>
> Bedrich Kosata (bedrich.kosata) writes:
>> - combine DNS queries and responses together to remove redundancy.
>
> 	Does the new format preserve the timestamp for when both query
> 	and response were originally received ?
>

The timestamp of the query is stored, followed by the response time 
(response timestamp minus query timestamp), so both timestamps can be 
recovered from the record.
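
A minimal sketch of this query-time-plus-delta idea, for illustration only. 
The record layout, field widths, microsecond resolution and function names 
are all assumptions and not the actual experimental format:

```python
import struct

# Hypothetical layout (an assumption, not the real format):
#   query timestamp: unsigned 64-bit, microseconds since the epoch
#   response delay:  unsigned 32-bit, microseconds after the query
RECORD = struct.Struct("<QI")

def pack_times(query_us, response_us):
    """Store the query time and the delay until the response."""
    delay = response_us - query_us
    if not 0 <= delay < 2**32:
        raise ValueError("response delay out of range")
    return RECORD.pack(query_us, delay)

def unpack_times(blob):
    """Recover both absolute timestamps from a packed record."""
    query_us, delay = RECORD.unpack(blob)
    return query_us, query_us + delay

# Example: a response arriving 1850 microseconds after its query.
blob = pack_times(1291292709000000, 1291292709001850)
query_us, response_us = unpack_times(blob)
```

Storing the delay instead of a second absolute timestamp keeps the second 
field small while losing no information.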

>> The reason for such optimizations lies within a simple calculation.
>> When only one byte is stored for every packet in a traffic of 10,000
>> queries per second (qps), it amounts to ~860 MB of data per day and
>> ~300 GB per year. Therefore, for CZ.NIC, each wasted byte in the
>> format structure means 300 GB of useless data per year.
>
> 	Considering a real-world scenario, do you have a comparison,
> 	estimated, of what it would require in diskspace to store a whole
> 	year of data for an auth server for .CZ, using a traditional format
> 	like ncap/pcap, nmsg, and the new format you propose ?
>

Unfortunately, I cannot give you figures for nmsg because I have not 
tried it yet.
As for pcap vs. the experimental format, it would be roughly 30 TB/year 
using *gzipped* pcap and <3 TB/year with a *gzipped* version of the 
current format (which will probably change in the coming weeks and 
months).
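
The one-byte-per-packet calculation quoted above can be checked with a few 
lines. This is only a sketch of the arithmetic; the function name is made 
up, and a 365-day year with decimal units (1 MB = 10^6 bytes) is assumed:

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: non-leap year

def overhead_bytes(qps, bytes_per_packet=1):
    """Bytes stored per day and per year for a fixed per-packet overhead."""
    per_day = qps * SECONDS_PER_DAY * bytes_per_packet
    per_year = per_day * DAYS_PER_YEAR
    return per_day, per_year

per_day, per_year = overhead_bytes(10_000)
print(f"{per_day / 1e6:.0f} MB/day, {per_year / 1e9:.0f} GB/year")
# prints "864 MB/day, 315 GB/year"
```

This agrees with the rounded ~860 MB/day and ~300 GB/year figures.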

> 	You mention 8% of the original pcap file sizes below, that's a factor
> 	12, which is definitely interesting, and probably makes a good
> 	argument against just saying "but disk is cheap" - especially the
> 	improved processing speeds.  What about nmsg ?
>

Of course this high "compression" ratio is achieved mainly by omitting 
the content of responses. On the other hand, because the data in the 
responses is generated by our authoritative servers anyway, for many 
purposes, it is not necessary to store it.
In our case, we would probably use two (or more) storage levels with 
different retention periods: for example, complete packets for the last 
month, and packets without reply content for long-term storage.
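
A rough sketch of how such a two-level scheme might look. The record 
fields, the 30-day cutoff (standing in for "the last month") and the 
function names are assumptions made for illustration, not a description 
of anything we have implemented:

```python
from datetime import datetime, timedelta, timezone

# Assumption: "the last month" is taken as 30 days.
FULL_RETENTION = timedelta(days=30)

def strip_response(record):
    """Return a copy of the record without the response payload."""
    slim = dict(record)
    slim.pop("response_payload", None)
    return slim

def archive_form(record, now):
    """Keep recent records complete; drop reply content from older ones."""
    age = now - record["query_time"]
    return record if age <= FULL_RETENTION else strip_response(record)
```

Records inside the retention window are returned unchanged, while older 
ones are copied without their reply content.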

>> 2/ a library for reading the experimental format written in C which
>> is capable of reading data 50x faster than from the pcap file.
>
> 	Nice.
>
>> We would also like to implement an export option that would
>> reconstruct as much as possible of the original pcap files, so that
>> the stored content may be used for example for testing of DNS
>> servers by replaying the stored queries.
>
> 	A filter/import module for wireshark/tshark would be nice to
> 	have as well.
>

This is an interesting point. Conversion back to pcap should accomplish 
the same, but direct import would surely be more convenient and also faster.
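
For the conversion back to pcap, the container itself is simple. The sketch 
below writes the classic pcap file header and per-packet record headers. 
It assumes each stored item has already been rebuilt into a complete IP 
packet (reconstructing the IP/UDP headers is not shown), and the function 
names are hypothetical:

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps
LINKTYPE_RAW = 101       # each packet starts directly at the IP header

def pcap_global_header(snaplen=65535, linktype=LINKTYPE_RAW):
    """24-byte file header: magic, version 2.4, thiszone, sigfigs, snaplen, link type."""
    return struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)

def pcap_record(ts_us, packet):
    """16-byte record header (sec, usec, captured length, original length) plus data."""
    sec, usec = divmod(ts_us, 1_000_000)
    return struct.pack("<IIII", sec, usec, len(packet), len(packet)) + packet

def export_pcap(path, packets):
    """Write (timestamp in microseconds, packet bytes) pairs to a pcap file."""
    with open(path, "wb") as out:
        out.write(pcap_global_header())
        for ts_us, packet in packets:
            out.write(pcap_record(ts_us, packet))
```

A file written this way should open in wireshark/tshark, although anything 
the archive format dropped obviously cannot be restored.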

>> As I wrote at the beginning of this email, we are in an early stage
>> of development, but the results so far are very interesting. We
>> would be happy for any input on this subject and hope to have some
>> working code to show you soon.
>
> 	In general, I like the idea, but I can certainly see a benefit to
> 	having a more general format that could handle other protocols
> 	as well, without having to start from scratch.
>

I can see your point and we will certainly evaluate other possibilities.
However, using an existing protocol would certainly add some overhead 
to the format. Also, keeping the whole thing simple has a lot of appeal. 
Many things that try to be good for everything end up being good for 
nothing. I would rather make it a simple, single-purpose thing that works 
really well.

> 	Look forward to examining the code.
>

It will take me some time to get it to a stage where it is good 
enough for release, so please be patient.

Cheers,

Beda

> 	Cheers,
> 	Phil



