[dns-operations] Google Chrome Pre-Caching

David Dagon dagon at cc.gatech.edu
Thu Sep 4 17:16:13 UTC 2008


On Thu, Sep 04, 2008 at 10:07:11AM -0400, Chris Roosenraad wrote:

> there is an estimated 2% take rate so far.  I doubt it'll climb much
> higher for a while...the early adopters have

Just a guess: it may grow in popularity.  The beta is advertised on
the google home page, and even made the local television news in some
US markets.  This browser is being promoted.

> The only folks I think have to worry are those who run the .co
> TLD...they're going to get hammered by this.

Here are some more notes:

1) More NXDOMAIN Correction Issues/Privacy Issues
---------------------------------------------------

The 'DNS error path correction' vendor community (also called NXDOMAIN
rewriters, DNS error monetizers, typo advertisers, etc., depending on
your point of view) is likely also affected by Chrome's default
behavior.

Chrome's network default is to "show suggestions for navigation
errors" (an option, defaulted on, in the advanced configuration
settings).  Just like the google toolbar default, NXDOMAIN answers are
sent via a dnserror GET parameter to linkhelp.clients.google.com (a
const wchar--unlikely to change soon).  For example:

GET /tbproxy/lh/fixurl?hl=en-US&sd=com&url=http%3A%2F%2Fnytimes.co%2F&sourceid=chrome&error=dnserror HTTP/1.1

Of course, this error correction occurs in the application layer, and
not the resolution control plane.  So Chrome still obeys the http
302s provided by opendns and others who ultimately use http
correction paths to terminate NXDOMAIN resolutions.  Nonetheless,
Chrome does obtain NXDOMAIN reporting from users who have not yet
selected a DNS error path collection service.  (In the vernacular of
the current Internet meme: Chrome is drinking your milk shake.  Chrome
users who don't sign up with a DNS rewriter have just "signed up" with
Google's linkhelp NXDOMAIN correction system.)
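To make the reporting concrete, here is a minimal sketch (plain
shell) of how the fixurl query string is assembled.  The endpoint
path and the hl/sd/sourceid/error parameters are copied from the
observed GET above; the url-encoding below is just enough for this
one example, not a general encoder:

```shell
# Sketch: rebuild the linkhelp query string Chrome sends on NXDOMAIN.
# Parameter names/values are taken from the observed request above;
# the sed-based url-encoding is minimal, for illustration only.
build_fixurl() {
  bad_url="$1"
  encoded=$(printf '%s' "$bad_url" | sed -e 's/:/%3A/g' -e 's|/|%2F|g')
  printf '/tbproxy/lh/fixurl?hl=en-US&sd=com&url=%s&sourceid=chrome&error=dnserror\n' "$encoded"
}

build_fixurl 'http://nytimes.co/'
# -> /tbproxy/lh/fixurl?hl=en-US&sd=com&url=http%3A%2F%2Fnytimes.co%2F&sourceid=chrome&error=dnserror
```

Every typo'd hostname a user fat-fingers ends up, url-encoded, in a
GET to google's collector.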

This should be off-by-default, and opt-in, in my opinion.  I don't
want my honeypots reporting robotic NXDOMAIN traffic to google (they
likely don't want my noise either), and I'd expect real users would
have substantive privacy concerns.  It may even seem unfair to some
domain-correcting services.

Some commentators creatively call this overall trend "Web 2.0
sharecropping" to make a point about consent-based social networking;
however, this is a legitimate privacy issue in DNS resolution, since
the qnames resulting in NXDOMAIN are individually attributable to end
users, who (afaik) have not opted in.  (I.e., one's dyslexia or
inability to spell is now evidenced in a database row somewhere; this
contrasts with the larger statistical study of aggregated user errors
above the recursive.)

As for the reporting of DNS stats to google, mentioned elsewhere in
this thread, Chrome does not upload stats (e.g., the contents of
"about:dns", "about:histograms", etc.) to google without consent, and
this defaults to 'no' on install.  Chrome appears to just use the
Visual Studio breakpad; their build just reports dns resolution
failures, along with typical crash dumps.  The
"--disable-metrics-reporting" argument turns off uploading of stats
(though all stats are still collected locally).


2) More On Reporting/Prefetching
--------------------------------------

This is definitely a browser useful in honeypots and DNS analysis
farms.  Some other useful command line args for DNS testing include:

   Chrome.exe  --dns-log-details \         <-- useful logging
               --dns-prefetch-disable \
               --remote-shell-port ARG \   <-- wow
               --record-mode \             <-- for creating cache-only
               --playback-mode                 sessions

There's also an interesting contradiction in chrome_switches.cc, which
states in comment "Chrome will support prefetching of DNS information.
Until this becomes the default, we'll provide a command line switch."
Yet the opposite logic is enabled, and prefetching is turned on.

Perhaps google's beta is testing the impact of prefetching, or this
evidences some internal debate.  Perhaps I'm reading too much into
the use of negative variable names for positive values, and the
"* -1" logic found in comments.

If prefetching is only being tested, then the ISP recursives and .co
authority operators may wish to weigh in with their stats.  Likewise,
DNS/IDS vendors may wish to engage this issue.  I have not been able
to get Chrome to trigger snort-IDS rules written for Kaminsky-style
attacks (these often look at NXDOMAIN rates, not +rd rates); however,
I imagine DNS security vendors will have concerns about prefetching
large batches of domain names as well.  A browser doing larger volumes
of DNS lookups looks like a bot, or a NAT point.
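As a crude illustration of the kind of rate check involved, here is a
sketch over a hypothetical resolver log; the "client qname rcode"
line format and the 0.5 threshold are assumptions, to be adapted to
whatever your recursive actually logs:

```shell
# Hypothetical resolver log: one "client qname rcode" line per query.
cat > /tmp/queries.log <<'EOF'
10.0.0.1 example.com NOERROR
10.0.0.1 nytimes.co NXDOMAIN
10.0.0.2 typo1.co NXDOMAIN
10.0.0.2 typo2.co NXDOMAIN
10.0.0.2 typo3.co NXDOMAIN
EOF

# Flag clients whose NXDOMAIN fraction exceeds 0.5 -- the sort of
# rate check the Kaminsky-era IDS rules approximate.  A prefetching
# browser on a typo-heavy page can trip this just like a bot.
awk '{ total[$1]++; if ($3 == "NXDOMAIN") nx[$1]++ }
     END { for (c in total)
             if (nx[c] / total[c] > 0.5)
               print c, nx[c] "/" total[c] }' /tmp/queries.log
# -> 10.0.0.2 3/3
```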


3) Good News: Possible MX Ignorance
--------------------------------------

I have not found a way, however, to get Chrome to do large volleys
of async MX lookups.  (The code appears fixed on class IN, type A
lookups only.)
Chrome does not even appear to do DNS prefetching on pages full of
mailto: hrefs.  At least it did not bite on this:

  # harvest addresses from a local mailbox and wrap each in a
  # mailto: anchor, producing one large bait page
  grep mailto ~/mbox | tr ':< >=[]()?"' '\n' | grep '@' | \
     head -n 10000000 | sort | uniq | \
     awk 'BEGIN{print "<html>"} \
          {print "<a href=\"mailto:"$1"\">"$1"</a>"} \
          END{print "</html>"}' > ~/mxbait.html

So Chrome appears to avoid mail-related prefetching.  I have not
located the logic in the observer code yet.


4) Bot or Not: Chrome Complications For DNS Research
------------------------------------------------------

Which DNS lookups are prefetch noise and which are user-driven?

Besides timing analysis, there appears to be no way, using DNS
traffic alone, to differentiate omnibar-generated DNS queries
(caused by humans at keyboards--or crafty scripts) from DNS
prefetching caused by HTTP parsing (driven by web 2.0 link
proliferation).  The sourceid GET parameter sent to the google
linkhelp domain might provide some clue as to the origin of the DNS
query, in cases where NXDOMAIN is found.  Likewise, the ie (input
encoding) GET parameter for linkhelp fetches may also provide
context.

But beyond that, DNS researchers might now have to look at http
traffic for linkhelp--a daunting prospect for high speed links, and a
far more difficult policy hurdle.  Timing analysis is what's left for
DNS-only span analysis.
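For the timing route, a first cut might just bucket queries by second
and flag bursts.  A sketch, over a hypothetical "epoch-second qname"
trace (both the trace format and the burst threshold are assumptions):

```shell
# Hypothetical span trace: one "epoch-second qname" line per query.
cat > /tmp/trace.log <<'EOF'
1220500000 a.example
1220500000 b.example
1220500000 c.example
1220500000 d.example
1220500007 nytimes.com
EOF

# Bucket queries per second: a burst of >=4 lookups in one second
# smells like a prefetch batch, an isolated lookup like a human at
# the keyboard.  The threshold of 4 is arbitrary.
awk '{ count[$1]++ }
     END { for (t in count)
             print t, count[t], (count[t] >= 4 ? "prefetch-burst?" : "user?") }' \
    /tmp/trace.log | sort
# -> 1220500000 4 prefetch-burst?
# -> 1220500007 1 user?
```

Real traffic would need NAT and open-proxy cases handled before a
burst could be pinned on one browser, of course.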

5) Summary Thoughts
---------------------

Overall, this is a neat tool.  I like the data collection aspect
(without central reporting, of course), so that opinions about DNS
query volumes can be tested.

So far, anecdotally, it looks like we see 3x to 4x the DNS query
rate (perhaps more), for an average >10% chance of a modest rendering
speedup.  Plus, we see prefetching on restart--all for a browser
that's designed not to crash/restart excessively in the first
place....  The benefits of prefetching are easily measured, but seem
modest.  I don't yet know how browsers measure improvements in
performance over generations.

But what are the costs?  Recursive operators should take note of the
increased query volume, particularly if they are still struggling with
port randomization overhead.  I would be interested in hearing of any
success stories or failures in the load issue.

-- 
David Dagon              /"\                          "When cryptography
dagon at cc.gatech.edu      \ /  ASCII RIBBON CAMPAIGN    is outlawed, bayl
Ph.D. Student             X     AGAINST HTML MAIL      bhgynjf jvyy unir
Georgia Inst. of Tech.   / \                           cevinpl."


