[dns-operations] Mozilla Firefox and ANY queries

Andrew Sullivan ajs at anvilwalrusden.com
Sat Feb 28 03:01:56 UTC 2015

On Fri, Feb 27, 2015 at 02:09:26AM -0800, Reed Loden wrote:
> They believe this is https://bugzilla.mozilla.org/show_bug.cgi?id=1093983,
> though I'm not 100% sure after reading the comments. Could be separate
> issues conflated together...
> In any case, it would be helpful to the release mgmt team if they had a
> better idea of the problems this is causing and how critical of an issue it
> is, in order to better prioritize it (once a true cause has been found).
> Also, any ideas on timeline as to when this started would help a lot.


I can't speak for recursive operators.  I spent about 10 seconds
looking into this on Dyn's authoritative data today, and there was a
giant spike in ANY queries visible there.  Note that this could be a
coincidence -- I didn't check.  The increase is days and not weeks old.

I've read the bugzilla ticket, though, and it's not clear that the
implementers understand what ANY does.  I don't have a login, but the
ticket says that this list is being monitored so I hope this helps.
ANY is _not_ a way of getting multiple records at once, and there's
good reason to suppose (given the ANY queries I see above) that it's
not actually happening in the order Steve Workman appears to think it is.

According to reporter causeless, what one sees is an A query followed
by an ANY query.  On any properly-working system, the ANY query will
provide exactly and only the response for the A query just resolved.
ANY is not really an RRTYPE, but rather a QTYPE that says, "Gimme
what you have."  That this is resulting in cache abuse suggests that
what's _actually_ happening is that the A or AAAA and the ANY are
being issued in an unpredictable order.
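
To make the ordering point concrete, here is a toy model in Python of
how a cache might answer ANY.  Everything in it (the ToyCache class,
the AUTHORITATIVE table, the example.com records) is hypothetical and
only illustrates the behaviour described above; it is not any real
resolver's code.

```python
# Toy model only: a hypothetical cache in front of a hypothetical
# authoritative data set.  Real resolvers are far more involved.
AUTHORITATIVE = {
    "example.com.": {
        "A": ["192.0.2.1"],
        "AAAA": ["2001:db8::1"],
        "MX": ["10 mail.example.com."],
    }
}


class ToyCache:
    def __init__(self):
        self.cache = {}  # name -> {rrtype: [rdata, ...]}

    def query(self, name, qtype):
        cached = self.cache.get(name, {})
        if qtype == "ANY":
            if cached:
                # Something is cached for the name, so ANY is answered
                # from the cache alone and nothing goes upstream.
                return {t: list(v) for t, v in cached.items()}
            # Cold cache: ask the authoritative side and keep the result.
            answer = {t: list(v) for t, v in AUTHORITATIVE.get(name, {}).items()}
            self.cache.setdefault(name, {}).update(answer)
            return answer
        if qtype not in cached:
            rdata = AUTHORITATIVE.get(name, {}).get(qtype)
            if rdata is None:
                return {}
            self.cache.setdefault(name, {})[qtype] = list(rdata)
            cached = self.cache[name]
        return {qtype: list(cached[qtype])}


# A first, then ANY: the ANY answer holds only the A RRset already cached.
warm = ToyCache()
warm.query("example.com.", "A")
after_a = warm.query("example.com.", "ANY")

# ANY first against a cold cache: everything the authoritative side has.
cold = ToyCache()
any_first = cold.query("example.com.", "ANY")
```

In this model the ANY answer depends entirely on what was asked before
it, which is why the order in which the two queries are issued matters.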

In particular, this bit of Workman's response appears to suggest a
misunderstanding of how DNS works:

    We had to use this approach of requesting A/AAAA then ANY because
    of the particular nuances of hostname resolution APIs: —
    getaddrinfo, the POSIX cross-platform API doesn’t give access to
    TTL by itself. Previously on Windows, and currently on other
    platforms, we provided a fixed expiration time for the record
    because we can’t get the accurate TTL from the record itself.  —
    DnsQuery on Windows does provide access to the TTL, but it doesn’t
    provide an easy way to do an UNSPEC request - this means that not
    only could we not get A and AAAA at the same time, but we would
    miss out on NetBT resolutions. And the ordering of the addresses
    in the response would be different - this can be configured
    differently on different networks and machines and Windows takes
    care of that complexity in getaddrinfo. — Basically, DnsQuery on
    it’s own could have caused a ton of compatibility issues.

I'm aware of the crappy no-TTL stuff in getaddrinfo.  (I suggest that
to fix this for real, adopt the getdns API from
https://getdnsapi.net.)  It is not possible, under POSIX or any other
DNS query method, to get A and AAAA "at the same time": these are two
different queries.  That's required by the DNS protocol because of
caches.  It might well be that the DnsQuery interface (I haven't
investigated it) ought to provide that facility, but according to the
above complaint it doesn't.  ANY won't help you here: if there's
anything in the cache, then ANY will return that and nothing else.
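
As a rough sketch of why A and AAAA end up as two queries, the Python
below builds the wire form of a query by hand.  The build_query helper
and the example.com name are made up for illustration; the header
layout and the QTYPE codes (A = 1, AAAA = 28, ANY = 255) are the
standard ones.  The question section has room for exactly one QTYPE
per question, so asking for both types means sending two messages.

```python
import struct

# Standard QTYPE codes: A and ANY from RFC 1035, AAAA from RFC 3596.
QTYPE = {"A": 1, "AAAA": 28, "ANY": 255}


def build_query(name, qtype, query_id=0):
    """Build a minimal DNS query message (hypothetical helper)."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # One QTYPE and one QCLASS (IN = 1) per question
    question = qname + struct.pack("!HH", QTYPE[qtype], 1)
    return header + question


a_query = build_query("example.com", "A")
aaaa_query = build_query("example.com", "AAAA")

# The two messages differ only in the 16-bit QTYPE field.
a_type = struct.unpack("!H", a_query[-4:-2])[0]
aaaa_type = struct.unpack("!H", aaaa_query[-4:-2])[0]
```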

Also, this seems to suggest that there's an order to DNS query
responses.  This is a mistake.  DNS provides unordered sets of data.
In fact, against many modern resolvers, repeated queries for the same
name, class, and type will return the results in different orders; but
this is actually just to fake naive client implementations into using
random records, since most of them just pick the first one.
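
A small Python sketch of what treating a response as an unordered set
looks like on the client side.  The same_rrset and pick_address
helpers and the addresses are hypothetical; the point is only that the
comparison ignores order and that nothing relies on the first record.

```python
import random


def same_rrset(a, b):
    """Compare two answers as sets, ignoring the order they arrived in."""
    return frozenset(a) == frozenset(b)


def pick_address(rrset, rng=random):
    """Choose a record without assuming the first one is special."""
    return rng.choice(sorted(rrset))


# Two answers for the same name, class, and type, in different orders.
first = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
second = ["192.0.2.3", "192.0.2.1", "192.0.2.2"]

equal_as_sets = same_rrset(first, second)
chosen = pick_address(second)
```

A client written this way behaves the same whether or not the resolver
rotates its answers.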

In general, ANY is useful for troubleshooting but should never be used
for regular operation.  Its output is unpredictable given the effects
of caches.  It can return enormous result sets, and some authoritative
servers have taken to refusing ANY queries because of the frequency
with which such queries show up in amplification attacks.

The approach described in this ticket is broken.  I'm happy to talk
with people about how to solve the problem they're trying to solve if
that would help, but I suspect it's going to be harder than
implementers think.  (This is a special case of the DNS Rule, which is
"It's harder than it ought to be." :-/ )


Andrew Sullivan
ajs at anvilwalrusden.com
