[dns-operations] 'dnstap' (Re: Prevalence of query/response logging?)

Tue Jul 8 02:33:22 UTC 2014

Roland Dobbins wrote:
> I think dnstap is a very good idea; still, it would be helpful to understand why it wasn't implemented in IPFIX, rather than in a custom telemetry format . . .

We did not frame the evaluation in terms of selecting (or building) a
particular "telemetry format".  Instead we focused on some finer grained
functional areas where we knew we would have to build or select
particular components:

    1) The "dnstap" idea entails modifying existing DNS servers, adding
    inline payload logging capabilities to the fast path of the DNS
    server.  Performance is a key consideration, and we would prefer to
    have the capability to, under high load, drop excess logging
    payloads rather than block the server from making progress at its
    real job of returning answers to clients.  So we need some sort of
    asynchronously processed circular queue that can offload as much of
    this work as possible from the DNS server's critical path.

    2) A way of encoding the log payload from the DNS server's
    internal, in-memory representation, to a serialized byte sequence
    that can be transported over something like a socket or to a file.
    (The "encoding".)

    3) A way of actually transporting the serialized log payload to a
    receiver over something like a socket or file.  (The "transport".)

I don't believe IPFIX has much to offer for #1, since this is an overly
specific (yet quite important) implementation detail.  We ended up
writing our own lockless memory-barrier based circular buffer
implementation, based on a technique used in the Linux kernel:

    https://www.kernel.org/doc/Documentation/circular-buffers.txt

and then placing this in a library for re-use in different applications.
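
As a rough illustration of the "drop rather than block" behaviour wanted
in #1, here is a minimal Python sketch.  It is not the lockless C ring
buffer described above: Python's queue.Queue takes a lock internally, so
this only shows the drop policy and the hand-off to a worker thread.
The names DroppingLogQueue, submit, drain and capacity are made up for
this example.

```python
import queue
import threading


class DroppingLogQueue:
    """Bounded queue: the producer never blocks; excess payloads are dropped.

    Illustration only. queue.Queue is lock-based, unlike the lockless
    ring buffer discussed in the text.
    """

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)
        self.dropped = 0  # only updated by the producer thread

    def submit(self, payload):
        """Called on the server's fast path. Returns False if dropped."""
        try:
            self._q.put_nowait(payload)  # raises queue.Full instead of waiting
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, sink, stop):
        """Run in a worker thread: pass queued payloads to sink(payload).

        Exits once `stop` (a threading.Event) is set and the queue is empty.
        """
        while not (stop.is_set() and self._q.empty()):
            try:
                payload = self._q.get(timeout=0.05)
            except queue.Empty:
                continue
            sink(payload)
```

With capacity=2 and no worker running, a third submit() returns False
and increments the dropped counter rather than stalling the caller,
which is the trade-off we want under load.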

If you combine #1 and #3 above and allow them to be implemented in a
single package, one obvious contender is ZeroMQ.  Ultimately, though, I
think ZeroMQ is not a great choice for embedding *directly* in DNS
servers, for a few reasons: e.g., there are several different versions
(the Debian archive offers ZeroMQ major versions 2.x, 3.x, and 4.x) and
the compatibility guarantees are somewhat convoluted.  So we did not
select ZeroMQ for use in the DNS server-side component.  But I didn't
want to preclude the possibility of re-sending dnstap payloads over
binary-clean transports like ZeroMQ that are transparent to payload
content, hence the "transport/encoding" split between #2 and #3.
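
To make the split between #2 and #3 concrete, here is a minimal Python
sketch.  The framing shown (a 32-bit big-endian length followed by the
payload) and the function names are assumptions made for illustration,
not a description of the actual dnstap wire format, and JSON is only a
stand-in for whatever encoding is chosen.  The point is that the
transport functions never look inside the payload bytes.

```python
import io
import json
import struct


def encode(event):
    """Encoding: in-memory event -> opaque bytes (JSON as a stand-in)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode(data):
    """Inverse of encode()."""
    return json.loads(data.decode("utf-8"))


def send_frame(stream, payload):
    """Transport: write one length-prefixed frame; payload is opaque."""
    stream.write(struct.pack("!I", len(payload)))  # hypothetical 32-bit length
    stream.write(payload)


def recv_frame(stream):
    """Transport: read one frame, or return None at a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) != 4:
        raise EOFError("truncated frame header")
    (length,) = struct.unpack("!I", header)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated frame payload")
    return payload
```

Any object with blocking read() and write() methods will do as the
stream: an io.BytesIO, a regular file, or the buffered file object
returned by socket.makefile("rwb") on an AF_UNIX or TCP socket.
Swapping the encoding does not touch the transport, and vice versa.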

It looks like maintaining the #2/#3 "transport/encoding" split with
IPFIX is impossible; it appears IPFIX is tightly coupled to IP
transport protocols: there is an IPFIX-over-UDP, IPFIX-over-TCP,
IPFIX-over-SCTP...  What if you want to send payloads over an AF_UNIX
socket, via an HTTP(S) GET/POST, over a WebSocket connection, or over
some new technology that hasn't been invented yet?  Enforcing a firm
separation between a generic lower-level "transport" and a specific
upper-level "encoding" is something that worked out pretty well for us
in a different context:

    http://www.caida.org/workshops/isc-caida/1210/slides/isc1210_redmonds.html

I say "appears" above because my next complaint is that there are too
many specification documents for IPFIX.  There are several dozen listed
here:

    https://datatracker.ietf.org/wg/ipfix/documents/

This is in contrast to generic serialization systems for structured data
like Protocol Buffers, Thrift, Apache Avro, MessagePack, Cap'n Proto,
BSON, etc. etc.  Most of these can be described in a single fairly
succinct document each; IPFIX appears to encompass a lot more than just
serialization of structured data and consequently has a much larger
specification footprint.  If IPFIX is well-suited for applications other
than representing IP flows, it is awfully hard to tell from the outside
without plowing through a ton of specifications.  This is itself a
downside; we have to convince not just ourselves, but also DNS software
vendors to import this code and DNS software users that they might want
to use it.

For a dnstap file format I was awfully tempted to use the traditional
pcap-savefile(5) format with a new linktype, but pcap has a hard 64K
frame size limit, which would make it impossible to represent dnstap
payloads with maximally sized DNS messages in a single frame, which I
wanted to make a hard requirement for dnstap.  I tried to find the
analogous limit for IPFIX, which appears to also use a 16-bit field to
represent message length.  (Possibly IPFIX can split payloads across
multiple messages, but if it can, this is not readily apparent, and we
would prefer not to have to invoke such a capability anyway.)
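
The arithmetic behind the frame size concern is simple enough to check
in a few lines of Python.  A DNS message carried over TCP can be up to
65535 bytes, because its length prefix is 16 bits, so a container whose
own length field is also 16 bits has no room left for any envelope
around a maximally sized message.  ENVELOPE_OVERHEAD below is a made-up
figure; any value greater than zero gives the same result.

```python
import struct

MAX_DNS_MESSAGE = 0xFFFF  # largest DNS message: 16-bit length prefix over TCP
ENVELOPE_OVERHEAD = 64    # hypothetical per-payload metadata, in bytes


def fits_in_length_field(length, fmt):
    """Return True if `length` can be stored in a struct field of type fmt."""
    try:
        struct.pack(fmt, length)
        return True
    except struct.error:  # raised when the value is out of range for fmt
        return False


frame_len = MAX_DNS_MESSAGE + ENVELOPE_OVERHEAD
```

Here fits_in_length_field(frame_len, "!H") is False (16-bit unsigned),
while fits_in_length_field(frame_len, "!I") is True (32-bit unsigned).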

Also, I found the following blog post rather interesting:

    http://www.ntop.org/nprobe/why-nprobejsonzmq-instead-of-native-sflownetflow-support-in-ntopng/

The fact that not even flow probe vendors are happy with IPFIX is
somewhat telling.  I do not know enough about flow probes to evaluate
most of the author's very specific technical complaints with IPFIX, but
something like JSON or protobufs paired with ZeroMQ is a fairly
reasonable solution for a wide variety of use cases.

So, sorry we didn't pick IPFIX.  It just doesn't look like a good fit
for what we want to make possible, and there are a lot of general
purpose technologies out there that I would consider before IPFIX for a
particular application.

-- 
Robert Edmonds