[dns-operations] 'dnstap' (Re: Prevalence of query/response logging?)

Mon Jul 7 19:42:50 UTC 2014

Hi, Bert:

bert hubert wrote:
> Paul, I've written many many TCP/IP reassemblers and in fact the overhead is
> trivial. Your kernel does it all the time for example. The trick is to have
> a limited window in which you do the reassembly, and not scan over the
> entire file. Neither does a kernel.

Having QA'd IDSes in a past life, I don't disagree that the overhead, in
terms of memory and CPU, ought to be minimal.

However, the implementation complexity of a production grade TCP stream
reassembler is high enough and the environment unforgiving enough that
I'd prefer to hand the task off to a bullet-proof stand-alone library
implementation.  The last time I went looking for such an implementation
I came up empty, but I'd love to be proven wrong.

> Having said all that, it doesn't mean we aren't big fans of logging. But
> people I know are also big fans of logging being separate from their
> production servers, and this implies packets & reassembly. This is why we
> have ample tooling in powerdns-tools to analyze packets. 
> 
> Packets also have the wonderful advantage that they represent what actually
> happened and not what the nameserver THOUGHT that happened.  We've for
> example been able to solve & debug many issues caused by malformed packets. 
> Such malformed packets would probably not retained their unique malformation
> by serialization to dnstap.
> 
> As another example, we've in the past had cases where our own logging showed
> we were serving answers with low latency yet packet traces showed
> signigicant time between query and response packets.  The ultimate issued
> turned out to be queueing before our listening socket.  Once we *got* the
> packet we answered it quickly enough.  But we did not (and could not easily)
> account for when the packet hit the server.
> 
> Our tool 'dnsscope' shows such statistics wonderfully.

I agree with you that in many cases being able to know "what actually
happened" on the network vs what the DNS software thought had happened
is quite handy, and I don't see packet capture as a technology being
displaced for those cases when you want to get at the network-level
artifacts.

(I should note that dnstap will happily serialize malformed DNS
*messages* [e.g., say some DNS record data is encoded incorrectly], but
malformed *packets* are out-of-scope [e.g., say some middlebox corrupts
a fragmented EDNS response and the receiver's kernel discards the
packets instead of passing them to the nameserver process].)

There are a lot of great use cases for DNS packet capture that can show
network-level malfeasance (here I take an expansive view of
"network-level" that includes everything after the initiator send()'s
and the responder recv()'s) that will be awkward or impossible to
replicate with an in-server logging facility like dnstap.  Those use
cases aren't what I'd like to focus on with dnstap.  It's a nice bonus
that the in-server approach obviates the need to condition the input by
extracting DNS payload content from the lower layer frames (reassembling
IP fragments, TCP streams, etc.), but that's not the primary reason I
started working on the dnstap idea, however.

The original, motivating use case for dnstap is passive DNS replication,
and specifically the kind of hardened passive DNS replication that we
implemented at Farsight (well, originally at ISC).  It's worth quoting
from Florian Weimer's original passive DNS paper on the "hardening"
difficulties:

    Most DNS communication is transmitted using UDP. The only protection
    against blindly spoofed answers is a 16 bit message ID embedded in
    the DNS packet header, and the number of the client port (which is
    often 53 in inter-server traffic). What is worse, the answer itself
    contains insufficient data to determine if the sender is actually
    authorized to provide data for the zone in question. In order to
    solve this problem, resolvers have to carefully validate all DNS
    data they receive, otherwise forged data can enter their caches.

    ("Passive DNS Replication" § 3.3, "Verification")

There are two interrelated issues here that Florian left to future
implementers:

    + "[B]lindly spoofed [UDP] answers".  We solved this in the capture
    component of our passive DNS system ("dnsqr") by keeping a table of
    outstanding UDP queries and doing full RFC 5452 (hi Bert!) § 9.1
    style matching of the corresponding responses.

    + "[T]he answer itself contains insufficient data to determine if
    the sender is actually authorized to provide data for the zone in
    question."  This is trickier; basically there is nothing internal to
    the contents of a standalone DNS query/response transaction that
    allows us to evaluate the trustworthiness of the authority and
    additional sections of the response message.  (For instance, if you
    see a query/response for the question name "www.example.com", may
    the authority section specify NS records for "example.com"?)

    The tack we took for this problem is to passively build a giant
    cache of NS and A/AAAA records (bootstrapped from the root zone),
    and work downwards from there based on the responses logged by our
    capture component.  There are obvious scaling problems with this
    approach.

This latter problem is unwieldy enough to do with passive packet
capture, especially when you are aggregating the responses from many
recursive servers (as we are), that it'd be highly desireable to be able
to obviate it somehow.  And there is: if we can modify the recursive DNS
implementation (and this is a big if), we can have the DNS server log
the cache-miss response and annotate it with the "bailiwick" domain for
the transaction.  This is enough information that we can elide the
large, stateful bailiwick reconstruction cache of the passive packet
capture approach.  We have a working patchset for Unbound implementing
this idea and I know that it's possible with BIND.

There are other use cases where it'd nice to be able to avoid resorting
to packet capture.  For instance, virtually all of the "DNS looking
glass" implementations I've seen do some sort of munging of the DNS
message content into text/JSON/HTML/etc.  Ideally it'd be possible to
have the option of passing along the original verbatim DNS response
message content.  (I think the RIPE Atlas DNS probe currently comes
closest to this ideal.  IIRC, there is a way to extract the original DNS
message byte sequence, but I believe it's a base64-encoded payload
inside a JSON document, or something like that.)

Another closely related use case is actually being able to save a trace
of the DNS message(s) sent/received by debugging tools like dig, kdig,
drill, delv, etc.  IMO, it's inconvenient enough setting up a packet
capture tool running alongside the query tool (needs root, needs to
include DNS packet traffic initiated by the query tool but exclude any
other incidental DNS traffic that may be captured, may need to scrub IP
header addresses from your local network if you want to share the
capture, etc. etc.) in order to save a proper "archival quality" copy of
the message data that people rarely do this; what you get instead is
usually a copy-paste of the "dig-style" output generated by these tools
in most cases.  And you end up with more-or-less pointless differences
between the output formats of these tools, like, to pick an example at
random, the trailing metadata that these tools generate, which might
look like

    ;; Query time: 0 msec
    ;; SERVER: 127.0.0.1#53(127.0.0.1)
    ;; WHEN: Mon Jul 07 14:35:57 EDT 2014
    ;; MSG SIZE  rcvd: 239

or

    ;; Received 239 B
    ;; Time 2014-07-07 14:35:54 EDT
    ;; From 127.0.0.1 at 53(UDP) in 0.2 ms

depending on the whims of the vendor who produced the tool you're using.

CZ.NIC's kdig has working support for being able to export the
query/response messages in dnstap format and to generate display output
from the messages saved to a dnstap file, and I hope to be able to
extend the debugging tools from other vendors to be able to similarly
handle dnstap files.

Plain old query logging at scale will probably best be done by packet
capture for the forseeable future, unless you'd like to be able to
export information that doesn't appear on the wire (e.g., whether a
query was served from cache or not), in which case something like dnstap
might be a good fit.  Certainly I'd like to have the DNS resolver on my
home network be able to generate good logs "for free" out of the box
much like your typical HTTP server (apache/nginx/etc.) comes properly
configured to log accesses.

However, what I don't think the future involves is hanging some more
%s's off of a big printf() style format string like:

        ns_client_log(client, NS_LOGCATEGORY_QUERIES, NS_LOGMODULE_QUERY,
                      level, "query: %s %s %s %s%s%s%s%s%s (%s)", namebuf,
                      classname, typename, WANTRECURSION(client) ? "+" : "-",
                      (client->signer != NULL) ? "S": "",
                      (client->opt != NULL) ? "E" : "",
                      ((client->attributes & NS_CLIENTATTR_TCP) != 0) ?
                                 "T" : "",
                      ((extflags & DNS_MESSAGEEXTFLAG_DO) != 0) ? "D" : "",
                      ((flags & DNS_MESSAGEFLAG_CD) != 0) ? "C" : "",
                      onbuf);

(Not to pick on BIND/ISC specifically here, but I had the function
handy.)

> It so happens that we now have the infrastructure to plug in arbitrary
> modules at packet entry & exit, we could perhaps do a dnstap implementation
> there. Will keep you posted.

This is great news; in general I think a lot of people would like to see
more "hook"-ability like this from DNS software.  (Unbound's module
stacks are quite interesting and I originally wanted to implement dnstap
in Unbound as an Unbound module, but I wasn't able to get it to work
out, unfortunately.)

-- 
Robert Edmonds