[dns-operations] Source code to identify the fake DNS packets from China Re: Odd behaviour on one node in I root-server (facebook, youtube & twitter)

Mon Mar 29 19:21:01 UTC 2010

On Sat, Mar 27, 2010 at 10:07:53PM +0800, ?????? wrote:
> These wrong DNS replies are sent by the notorious Great Firewall of
> China. Checked by this program, the phenomenon is GFW's DNS poisoning
> without doubt.

I'd like to add a few comments to this topic.  

Please note that in this discussion, I do not state an opinion about
any governmental policy re: modifying DNS answers.  (Nobody asked my
opinion, and it's not relevant or on topic here).  I think that's an
important caveat: Much of the discussion on this topic considers this
practice to be censorship.  My comments do not require one to have an
opinion on that topic. I've attempted to state strictly neutral,
factual observations.

So, what are the implications of such a DNS policy?  I can suggest a
few:

1) "Network Policy and Topology Leak"

   As noted by others in this list, almost any IP in China will reply
   to a query for a qname matching a substring for {twitter, facebook,
   youtube}.  For example, using prips(1) in the bsd sysutils/prips
   ports, one can:

   for ip in $(prips "$CHINA_CIDR"); do dig +time=2 +tries=1 "@$ip" twitter.com; done

   Nearly 80% of all networks in China will answer, on average.
   Clearly, the host does not even have to be a DNS resolver.  In some
   tests of several large networks over 3/4 of the IPs reply.
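
   For those who would rather script the probe than shell out to
   dig, here is a minimal sketch using only the Python standard
   library.  The function names (build_query, probe) and the 2-second
   timeout are my own assumptions for illustration; the wire format
   follows RFC 1035.  It sends one A query with the RD bit set and
   returns the raw reply, or None if nothing comes back.

```python
import random
import socket
import struct


def build_query(qname, qid=None):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    if qid is None:
        qid = random.randint(0, 0xFFFF)
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Each label is encoded as a length octet followed by the label bytes.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def probe(ip, qname="twitter.com", timeout=2.0):
    """Send one query to ip:53/udp; return the raw reply or None on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(qname), (ip, 53))
            reply, _ = sock.recvfrom(4096)
            return reply
        except OSError:  # includes timeouts and unreachable hosts
            return None
```

   A reply from an address that is not running a resolver at all
   would be consistent with the in-path rewriting described here.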

   What are some implications of this?

   a) Entire CIDRs (large ones: /16s, etc.) in China are open
      recursive, at least for qnames matching those three strings.

      Although open recursive DDoS amplification attacks might favor
      large records in cache, network operators in China might
      nonetheless share the Internet community's concern about the
      increase in the number of open recursive hosts.

      Of course, there are tens of millions of open recursives
      already.  But this behavior makes nearly every host in China
      open recursive for a large set of qnames.

      People far more clever in designing attacks than me might use
      this.  For example, this property might be used to congest low
      bandwidth segments in China, and complicate filtering as a
      remediation (since there are no longer just a few open
      recursives to filter, but entire networks).

   b) Note that about 20% of the IPs don't answer.  Some of these
      might be routers, some might be hosts that operate under a
      different policy system.  A common technique for some networks
      is to blackhole traffic, to frustrate attacker probes.

      But now outsiders can use queries for {facebook, twitter,
      youtube} to trivially enumerate hosts in China's network that
      are _not_ subject to this DNS policy.  I am not aware of a
      similar, trivial technique for mapping networks in any other
      country or network.  The DNS policy has made such a study easy.

   c) A few of the hosts do answer, but provide the 'correct' answer
      (meaning: an answer one might obtain from the zone authority).
      These hosts are open recursives (and have a meaningful fpdns
      signature).  Further, these hosts can also be observed seeking
      glue from authorities for zones with embedded substrings (e.g.,
      A? facebook.please.visit.me.from.china.example.com).

      There are well-known techniques for finding open recursives.
      This DNS policy lets outsiders identify hosts in China that for
      some reason are _not_ subject to the same rewriting policy.

      This leaks information from the host country about which
      networks are subject to the discussed policy, and which ones are
      not.  Thus, one could infer the extent and reach of the
      governmental organization requiring this policy, simply by
      querying a set of IPs in China.

      Discovering these policy relationships by other means would seem
      difficult; at least I'm not aware of where this information can
      be found.  But the DNS rewriting policy makes it trivial to
      discover the cross-entity relationships in networks.

   In short, modifying the resolution plane via the routing plane
   reveals information about how the networks in China are managed.
   Since finding this information about other networks in other
   countries is hard (and requires complex probes, which are often
   blocked or stopped via abuse@ email), this is an unusual property
   of the DNS rewriting policy.
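
   The three cases (a), (b) and (c) can be told apart mechanically.
   The following is only a sketch: it assumes the A records have
   already been extracted from each reply into a list of strings (or
   None when the probe timed out), and the labels "rewritten",
   "silent" and "other" are names I made up.  The nine addresses are
   the ones reported on this list for rewritten answers.

```python
from collections import Counter

# The nine answer IPs reported on this list for rewritten replies.
REWRITE_IPS = frozenset({
    "243.185.187.39", "203.98.7.65", "159.106.121.75",
    "93.46.8.89", "78.16.49.15", "59.24.3.173",
    "46.82.174.68", "37.61.54.158", "8.7.198.45",
})


def classify(answers):
    """Classify one probe result.

    answers: list of A-record strings from the reply, or None on timeout.
    """
    if answers is None:
        return "silent"      # case (b): no reply at all
    if any(ip in REWRITE_IPS for ip in answers):
        return "rewritten"   # case (a): reply carries a known rewrite IP
    return "other"           # case (c) candidate: a reply without a rewrite IP


def tally(results):
    """Count classifications over a mapping of probed IP -> answers."""
    return Counter(classify(answers) for answers in results.values())
```

   Hosts that land in "other" still need a manual look (for example
   with fpdns), since an empty or refused reply also ends up there.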

2) "Mirror Loss"

   I do not know what root operators will do, but some might wish to
   disable anycast roots inside China (even if they already require
   the anycast prefix to be announced with BGP NO-EXPORT, etc.)
   Likewise, some DNSBLs will not locate their mirrors in China.

   I'm of course speculating, but if we assume the tendency is for
   operators to not locate mirrors in China, then the implications
   are:

   a) China's 400+ million users do not see the benefit of a local
      anycast root.  Although nobody asked my opinion, I'd personally
      like the users of China to have the same quality of
      service I enjoy with localized DNS roots.

   b) The resolution plane in China becomes potentially more brittle.
      One motivation for anycasted roots is to resist DDoS (properly
      understood as Distributed _Degradation_ of Service, since
      universal *Denial* of service is not always a realistic attack
      outcome).  Once again, nobody asked my opinion, but I think the
      Internet should be equally robust, anywhere, and should not have
      areas where the resolution plane is more brittle and vulnerable.

   c) Domain BL operators will not locate mirrors in China, since some
      spam domains might have matching substrings.  For operational
      reasons, this means they might also move IP BLs outside of
      China, simply for convenience.  This means:

      -- Unless extensive localized mirrors are created, those outside
         of China can witness the rates of spam arrivals (and
         filtration) in Chinese networks.  Consider:

         www.cc.gatech.edu/~feamster/publications/dnsbl.pdf 

         This analysis does not merely work with botnets; without
         local mirrors, BL behavior is broadcast to other parts of the
         Internet.

      -- Users in China will experience slower resolution (and perhaps
         more timeouts) when checking BL status.

3) "Mirror Utilization Leak"

   The hosts that rewrite DNS packets appear to pick a non-random
   ordering of 9 answer IPs.  (As others have noted, the RRs always
   include one of: 243.185.187.39, 203.98.7.65, 159.106.121.75,
   93.46.8.89, 78.16.49.15, 59.24.3.173, 46.82.174.68, 37.61.54.158,
   and 8.7.198.45, and the TTL is either 300 or 86400).

   I only have a few months of testing data, but the pattern in which
   these IPs appear looks non-random.  (One can further differentiate
   between answers with 300 or 86400 TTL, which appear to come from
   different routers doing the rewriting.  Specifically, there appear
   to be *either* a ratio of 2 to 1 routers providing the 86400 TTL
   answers, *or* an equal ratio, with congestion on the routers
   providing the 300 TTL.  This is a guess based on a basic sampling
   of the TTLs and timeouts, and could be wrong.)

   But since the pattern appears non-random, if one queries a random
   network rapidly (say, once a second), the pattern of IPs will
   repeat, in order---except if another user made a query.  In such a
   case, one of the IPs will be skipped, out of order.  Measuring the
   time between each repeated answer therefore lets one estimate how
   many *other* third party queries were rewritten (minus one's own
   measurement queries, of course).
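
   The counting step can be sketched as follows, under assumptions
   that I want to state loudly: the rotation order of the answers
   (the "cycle" argument) must first be learned from observation and
   is not given here, the function names are hypothetical, and the
   count is only known modulo the cycle length (more than
   len(cycle) - 1 intervening queries wrap around and are
   undercounted).

```python
def skipped_between(cycle, prev, curr):
    """Cycle positions skipped between two consecutive observed answers."""
    n = len(cycle)
    # If nobody else queried, curr is the entry right after prev (0 skipped).
    return (cycle.index(curr) - cycle.index(prev) - 1) % n


def estimate_third_party(cycle, observed):
    """Sum the skipped positions over one's own sequence of observed answers."""
    return sum(
        skipped_between(cycle, a, b)
        for a, b in zip(observed, observed[1:])
    )
```

   For example, with a hypothetical four-entry cycle
   ["A", "B", "C", "D"] and observed answers ["A", "B", "D", "A"],
   estimate_third_party returns 1: the answer "C" went to someone
   else between the second and third measurement.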

   Potentially, this leaks information about how many queries are
   being intercepted by this DNS policy.  (Of course, many of these
   queries could come from outside researchers, such as myself.)

   If I'm correct about this (I could be wrong), one can estimate the
   rate of blocking performed by this DNS policy.  So far, the rate
   appears diurnal, which further suggests this technique measures the
   deterministic, non-random selection of RRs.
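
   One simple way to look for a diurnal shape is to bin the skipped
   counts by hour of day.  This is a sketch only; it assumes samples
   of the form (unix_timestamp, skipped_count), bins in UTC, and the
   name hourly_rate is my own.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_rate(samples):
    """Average skipped count per measurement, grouped by UTC hour of day.

    samples: iterable of (unix_timestamp, skipped_count) pairs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ts, skipped in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        totals[hour] += skipped
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}
```

   A pronounced peak and trough across the 24 bins would be
   consistent with human-driven query load; a flat profile would
   argue against this reading.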

   Assuming I'm correct, this reveals details about the rule's
   application.  This would be unusual.  It is rare for a network
   policy to provide information about how many times the rule has
   been applied.  (Normally, such statistics are known only to the
   firewall operators or those with access to routers or SNMP traps.)

   At least that is what I would expect from a policy that many see as
   sensitive.

4) "Potential Channel Loss."

   Some applications use DNS to tunnel information, e.g., nstx,
   OzymanDNS, etc.  It is unlikely, but possible, that offending
   substrings could be generated naturally.  This may disrupt the
   operation of these tools.

   Perhaps that's a good thing; DNS tunnels are not always welcomed by
   network operators.  But certainly tunnel operators outside of China
   enjoy a slightly more robust channel.  This varies with the size of
   the substring against which qnames are matched, of course.  A short
   substring (e.g., 'cnn') would likely generate false positives for
   numerous tunnel applications.
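
   A rough back-of-envelope for how the substring length matters,
   assuming (and these are assumptions, not measurements) that tunnel
   labels are uniformly random characters from a 32-symbol alphabet
   that contains every character of the blocked substring, and that a
   label is at most 63 octets.  The expected number of positions at
   which a fixed k-character string appears in an n-character label
   is then (n - k + 1) / 32^k.

```python
def expected_matches(label_len, substring_len, alphabet_size=32):
    """Expected occurrences of one fixed substring in a random label.

    Assumes each character is drawn uniformly and independently from an
    alphabet of alphabet_size symbols.
    """
    positions = max(label_len - substring_len + 1, 0)
    return positions / alphabet_size ** substring_len


# Per-label expectation for a 3-character vs. an 8-character substring.
short = expected_matches(63, 3)   # 61 / 32**3
long_ = expected_matches(63, 8)   # 56 / 32**8
print(f"3 chars: {short:.2e}  8 chars: {long_:.2e}")
```

   Under these assumptions a three-character substring is hit about
   once per several hundred full-length labels, while an
   eight-character substring is many orders of magnitude rarer.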

5) "Dictionary Reduction Via CNAME Manipulation."

   If twitter, youtube, and facebook are all blocked by a DNS policy,
   what would stop these zone owners from obtaining CNAMEs (perhaps
   IDN domains) and pointing to the blocked IPs?   (Likewise,
   HTTP redirects could be used).

   Certainly filter evasions via aliases are possible.  If the DNS
   policy were to keep pace with the ability of these zone owners to
   make aliases for their blocked domains, then the DNS policy would
   also have to block the aliases.

   In one scenario, the zone owners of twitter, youtube and facebook
   could select IDN names, or other crafted strings that collide with
   domain assets owned in China.  (Indeed, any person could acquire
   and publicize an alias for a blocked service, using a name that
   collides with another service.)  A well-chosen string would be one
   that results in many false positives for the DNS policy.

   This means the DNS policy would either not keep pace with the
   aliases for the blocked domains, or suffer false positives.  I also
   believe the undecidable dimensions of this problem prevent the use
   of good automation tools to craft useful DNS policy updates.
   (I.e., humans must be used to craft new block strings---ones which
   are not accepted by systems vulnerable to false positives.)

   Taken to an extreme, this effectively gives remote third parties,
   generally not located in China, some small degree of control over
   the qname dictionary that China might use.   

   I stress this is only a theory.  I suspect it will not be tested,
   given the trouble of actually publicizing a colliding alias.
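
   The trade-off can be illustrated with a toy model of the matching
   rule.  Everything here is hypothetical: the function names, the
   case-insensitive comparison, and the example names are mine, and
   the real rule set is not known to me beyond the substring behavior
   reported on this list.

```python
def is_rewritten(qname, blocked):
    """True if any blocked substring occurs anywhere in the qname."""
    name = qname.lower().rstrip(".")
    return any(substring in name for substring in blocked)


def false_positives(blocked, legitimate_names):
    """Legitimate names that the substring rule would also rewrite."""
    return [name for name in legitimate_names if is_rewritten(name, blocked)]


blocked = {"twitter", "facebook", "youtube"}
alias = "shop.example.net"              # hypothetical alias for a blocked site
local = ["shop.example.cn", "news.example.cn"]  # hypothetical local names

print(is_rewritten(alias, blocked))                 # alias evades the rule
print(false_positives(blocked | {"shop"}, local))   # blocking it hits others
```

   The alias slips past the original dictionary, and adding a
   substring broad enough to catch it also rewrites an unrelated
   local name.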

In short, there are many negative (likely unintended) outcomes from
the DNS policy, beyond merely propagating the DNS policy to caches
outside of China.

                            *     *     *

I work as a university researcher, and am quite realistic about the
humble area of the world I occupy.  I can't even influence the color
of chalk used in the lecture halls.

Perhaps some readers of this list believe themselves more influential,
and I believe many might be.  But I do offer this: I suspect
characterizing this issue as a "censorship problem" engages a larger,
non-technical issue.  At least from how I see things, this might
guarantee things don't get fixed quickly.  I could be wrong.

I'd suggest the networking community observe the neutral, factual
implications of this DNS policy.  I believe many of its side-effects
can be avoided.

I invite scrutiny of the ideas I've suggested above.  This list has
already provided an excellent overview of the unwanted, external
side-effect that China's policy had on others.  Have I correctly
characterized the effects on their own network?

-- 
David Dagon              /"\                          "When cryptography
dagon at cc.gatech.edu      \ /  ASCII RIBBON CAMPAIGN    is outlawed, bayl
Postdoc Researcher        X     AGAINST HTML MAIL      bhgynjf jvyy unir
Georgia Inst. of Tech.   / \                           cevinpl."