[dns-operations] compressing DNS traffic data
jgreco at ns.sol.net
Wed Dec 8 11:13:17 UTC 2010
> On 8 Dec 2010, at 09:16, Stephane Bortzmeyer wrote:
> > "RR class" unlike what I wrote earlier, I believe you can safely save
> > two bytes here :-)
> I fail to understand this angels-on-pinhead debate when storage costs
> less than a tenth of a cent per gigabyte in mainstream retail outlets.
> The cost of everyone's time discussing what to delete or what/how to
> compress must be orders of magnitude more than a few hundred 1 TB
> disks. Assume we have 100-200 engineers here paid rather more than the
> minimum wage, each of whom spends an hour or two reading or posting on
> this thread about archiving DNS traffic. Then do the arithmetic.
> Why not just agree to store everything in wire format (maybe with
> added timestamps if sub-millisecond precision is available) and be
> done with it?
Presumably because there's more to it than just that.
Storing data in uncompressed wire format is great, except that now you
also have more I/O to do. Compression has its own costs, but, relatively
speaking, cores, memory, and compression algorithms have improved far
faster than disk throughput, to the point where compression and
decompression can be extremely fast. As a result, it's very conceivable
that I/O, rather than CPU, becomes the significant bottleneck. My
experience with storage certainly suggests that this is usually true.
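The trade-off can be modelled crudely. When reading compressed data, each physical byte from disk expands to `ratio` logical bytes, so a job is limited either by disk throughput times the compression ratio or by decompression speed, whichever is lower. This is a minimal Python sketch, assuming disk reads and decompression overlap in a pipeline; the function name and all example rates are hypothetical, not measurements.

```python
import math


def effective_throughput(disk_mb_s: float, ratio: float,
                         decompress_mb_s: float = math.inf) -> float:
    """Logical (uncompressed) MB/sec delivered to an analysis job.

    Hypothetical model: disk reads and decompression are pipelined, so the
    slower stage sets the pace. Every physical MB read expands to `ratio`
    logical MB; decompress_mb_s is the decompressor's output rate summed
    over all cores in use. ratio=1.0 with no decompressor models plain
    wire format.
    """
    return min(disk_mb_s * ratio, decompress_mb_s)


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    print(effective_throughput(200, 1.0))         # raw wire format
    print(effective_throughput(200, 2.0, 1000))   # I/O-bound: compression helps
    print(effective_throughput(200, 2.0, 150))    # CPU-bound: compression hurts
```

With a 200MB/sec disk and a 2:1 ratio, compression doubles the useful read rate only if the decompressor can emit at least 400MB/sec; if it can only manage 150MB/sec, reading raw wire format would actually be faster. That is why the growth in cores matters to the argument.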
Presumably we're looking to archive data in order to make use of it at
some later time; it's hard to know now what form that will take. If,
however, ten sites each have an engineer analyzing the data once a
week, and this storage format survives for a decade, that's 5200 runs.
If each of those runs is on 1TB of data, and it's able to pull in the
data at 200MB/sec, that's 5,000 seconds, a bit under an hour and a half,
for uncompressed data; if you have a format that manages a 2:1
compression ratio (and decompression keeps up with the disk), that
dips to about 40 minutes.
So you've suddenly saved roughly 3,600 hours of engineer time. That assumes a
limited amount of data, a limited number of sites using the data, and
a limited lifetime to the format as well. All of these assumptions
are likely to be conservative.
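The run-time arithmetic can be reproduced in a few lines of Python. This is a sketch: it uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and assumes each run is purely I/O-bound, so a 2:1 ratio halves the read time; the constant names are illustrative only.

```python
# Scenario parameters from the post; decimal units assumed.
SITES = 10
RUNS_PER_SITE_PER_YEAR = 52        # one analysis run a week
YEARS = 10                         # lifetime of the storage format
DATA_BYTES = 10**12                # 1 TB read per run
READ_BYTES_PER_SEC = 200 * 10**6   # 200 MB/sec
COMPRESSION_RATIO = 2.0            # assumes decompression keeps up with the disk

runs = SITES * RUNS_PER_SITE_PER_YEAR * YEARS
secs_uncompressed = DATA_BYTES / READ_BYTES_PER_SEC
secs_compressed = secs_uncompressed / COMPRESSION_RATIO
hours_saved = runs * (secs_uncompressed - secs_compressed) / 3600

print(f"runs: {runs}")
print(f"per run: {secs_uncompressed / 3600:.2f} h raw, "
      f"{secs_compressed / 3600:.2f} h compressed")
print(f"total saved: {hours_saved:,.0f} hours")
```

Each input enters the total linearly, so doubling the number of sites, the data per run, or the lifetime of the format doubles the hours saved.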
What about other costs? SATA disk is cheap, yes, but larger amounts
of storage aren't. What about the site that wishes to store all DNS
traffic to/from its servers for a period of years? We see dumb stuff
happening with data retention laws all around the world, so not only
could it happen, but it very likely will somewhere.
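A back-of-the-envelope estimate is enough to size the multi-year retention case. The sketch below is hypothetical throughout: the query rate, bytes per exchange, and retention period are placeholders, not figures from this thread, and the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap days


def retained_bytes(queries_per_sec: float, bytes_per_exchange: float,
                   years: float, compression_ratio: float = 1.0) -> float:
    """Bytes needed to keep every query/response exchange for `years`.

    bytes_per_exchange should include any per-record overhead such as a
    timestamp; compression_ratio=1.0 means plain wire format.
    """
    total = queries_per_sec * bytes_per_exchange * SECONDS_PER_YEAR * years
    return total / compression_ratio


if __name__ == "__main__":
    # Hypothetical site: 10,000 queries/sec, ~300 bytes per query+response
    # including framing, retained for 3 years.
    raw = retained_bytes(10_000, 300, 3)
    packed = retained_bytes(10_000, 300, 3, compression_ratio=2.0)
    print(f"raw: {raw / 1e12:.0f} TB, at 2:1: {packed / 1e12:.0f} TB")
```

At placeholder rates like these, raw wire format runs to hundreds of terabytes, and every halving from compression halves the number of disks, along with the redundancy and backup capacity behind them.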
Engineering up-front, while it might seem expensive, is often cheaper
in the long run than counseling everyone to just throw more resources
at a poorly-designed solution.
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net