[dns-operations] Quick anycast primer

Fri Jul 14 07:41:38 UTC 2006

Sorry for dragging this out, but there seems to be a lot of confusion here 
about anycast DNS, unicast DNS, and failure handling.

There's a traditional way for DNS to handle server failures, and an
additional failure handling mechanism introduced by anycast.  The two
coexist quite well, so ideally a zone uses both of them.

Focusing for the moment on authoritative servers rather than caching 
resolvers:

In either an anycast or unicast environment, each zone has a set of
authoritative "servers," each of which is either the address of a unicast
server or the address of an anycast cloud.  A resolver gets the zone's list
of NS records and tries each one (in an implementation-dependent fashion)
until it finds one that works.  If one of those addresses doesn't respond,
it goes on to the next one, often after a timeout.  The more NS records a
zone has, the more server addresses there are to try, and thus the more
that can be non-responsive at once without causing a complete failure.  If
all server addresses become unreachable from the perspective of a given
caching resolver, that caching resolver won't be able to complete a lookup
involving that zone.
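
As a rough sketch of that fallback (not any particular resolver's actual
algorithm), here it is in Python.  The function name, the addresses (taken
from the documentation ranges), and the "responds" callback are all made up
for illustration; a real resolver would send a query and wait for a timeout
rather than call a function.

```python
from typing import Callable, Iterable, Optional


def first_responsive(addresses: Iterable[str],
                     responds: Callable[[str], bool]) -> Optional[str]:
    """Try each service address in turn; return the first that answers.

    Returns None if every address is unresponsive, which corresponds to
    the lookup failing for this resolver.
    """
    for addr in addresses:
        if responds(addr):
            return addr
        # A real resolver would typically sit through a timeout here
        # before moving on to the next address.
    return None


# Hypothetical service addresses for one zone's NS records.
ns_addresses = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]

# Two of the three are unreachable from this resolver's point of view.
down = {"192.0.2.1", "198.51.100.1"}
print(first_responsive(ns_addresses, lambda a: a not in down))

# All three unreachable: the lookup cannot complete.
print(first_responsive(ns_addresses, lambda a: False))
```

Nothing in this loop cares whether an address belongs to a unicast server
or an anycast cloud.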

Anycast doesn't change this.  Anycast does add an additional mechanism for 
working around failures.

In a typical anycast cloud, there are several servers in several locations 
sharing a service address.  When all servers in the cloud are up, and 
routing to all of them is working properly, queries sent to that service 
address are responded to by the topologically closest server.  The 
additional failure handling of anycast comes in two forms:  When a server 
in an anycast cloud goes down properly, it withdraws its routing 
announcement and queries get transparently redirected to the next closest 
server in the cloud.  If a server in an anycast cloud goes down improperly 
and fails to withdraw its route, queries sent to that service address may 
fail, but *only* if the queries are coming from somewhere that considers 
that server the topologically closest server in the cloud.  Queries to 
that service address that get to other servers in the cloud anyway will 
continue to be answered.
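
The two failure cases can be illustrated with a toy model.  This is only a
sketch under simplifying assumptions: "topologically closest" is reduced to
a single made-up distance number per client, and the node names and fields
are hypothetical.  Real route selection is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    serving: bool = True      # the DNS server is answering queries
    announcing: bool = True   # the route for the service address is announced


def route_query(distance: Dict[str, int], nodes: List[Node]) -> Optional[str]:
    """Return the name of the node that answers, or None if the query fails.

    The network delivers the query to the closest node that is still
    announcing the service address, whether or not that node is serving.
    """
    candidates = [n for n in nodes if n.announcing]
    if not candidates:
        return None
    closest = min(candidates, key=lambda n: distance[n.name])
    return closest.name if closest.serving else None


# Two hypothetical clients, each closest to a different node.
west = {"sfo": 1, "ams": 5}
east = {"sfo": 5, "ams": 1}

# Clean failure: sfo stops serving and withdraws its route.
clean = [Node("sfo", serving=False, announcing=False), Node("ams")]
# Unclean failure: sfo stops serving but keeps announcing.
unclean = [Node("sfo", serving=False, announcing=True), Node("ams")]

print(route_query(west, clean))    # ams: redirected to the next closest
print(route_query(west, unclean))  # None: still routed to the dead node
print(route_query(east, unclean))  # ams: unaffected by the failure at sfo
```

Even in the unclean case, only the clients that were closest to the failed
node lose that service address, and they can still fall back to the zone's
other NS records.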

It's still ideal for a zone to have several NS records, whether those 
service addresses point at unicast servers or anycast clouds.  For each 
service address, reliability should be better if it's a service address in 
a well-managed anycast cloud than if it's a single unicast server.  If a
zone's NS addresses were all different addresses for the same anycast
cloud or for the same unicast server, or if a zone had a really small
number of NS records (anycast or unicast), there would be a problem, since
it would take very few failures to make every address unreachable.  There's
no problem with a zone that has several NS records all pointing at different
anycast clouds.  There are several TLD operators (and others, I'm sure) 
who operate multiple, distinct, anycast clouds.
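
One way to sanity check a zone along these lines is to count how many
distinct clouds or servers its NS addresses actually land on.  A minimal
sketch, assuming you already know which cloud or server each address
belongs to (the mapping, names, and addresses below are hypothetical):

```python
from typing import Dict


def independent_targets(ns_to_target: Dict[str, str]) -> int:
    """Count the distinct clouds or servers behind a zone's NS addresses."""
    return len(set(ns_to_target.values()))


# Three addresses, three distinct anycast clouds: fine.
good = {
    "192.0.2.1": "cloud-a",
    "198.51.100.1": "cloud-b",
    "203.0.113.1": "cloud-c",
}

# Three addresses that all lead to the same anycast cloud: a problem.
bad = {
    "192.0.2.1": "cloud-a",
    "192.0.2.2": "cloud-a",
    "192.0.2.3": "cloud-a",
}

print(independent_targets(good))
print(independent_targets(bad))
```

The count that matters is the number of independent targets, not the number
of NS records.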

I'm not going to go into anycasting a caching resolver service here 
specifically, but the issues should be more or less the same.

-Steve