[dns-operations] Quick anycast primer
scg at gibbard.org
Fri Jul 14 07:41:38 UTC 2006
Sorry for dragging this out, but there seems to be a lot of confusion here
about anycast DNS, unicast DNS, and failure handling.
There's a traditional way for DNS to handle server failures, and an
additional failure handling mechanism introduced by anycast. The two
coexist quite well, so it's ideal to include both of them.
Focusing for the moment on authoritative servers rather than caching
resolvers:
In either an anycast or unicast environment, each zone has a set of
authoritative "servers," each of which is either the address of a unicast
server or the address of an anycast cloud. A resolver gets that list of NS
records and tries each one (in an implementation-dependent fashion) until
it finds one that works. If one of those addresses doesn't respond, it
goes on to the next one, often after a timeout. If there are more NS
records for a zone, there are more server addresses that can be tried, and
thus more that can be non-responsive at once without causing a complete
failure. If all server addresses become unreachable from the perspective
of a given caching resolver, that caching resolver won't be able to
complete a lookup involving that zone.
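
To make the NS-level failover concrete, here's a rough sketch of the "try each address until one works" loop. It's a toy model, not how any particular resolver is implemented: the function names, the fake network, and the documentation-range addresses are all made up for illustration, and a real resolver picks its order and timeouts in its own way.

```python
from typing import Callable, Optional, Sequence, Set, Tuple


def resolve_with_failover(
    ns_addresses: Sequence[str],
    send_query: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Try each server address in turn; return (address, answer) for the
    first one that responds, or None if every address is unresponsive."""
    for address in ns_addresses:
        answer = send_query(address)  # None stands in for a timeout
        if answer is not None:
            return address, answer
    return None


def make_fake_network(responsive: Set[str]) -> Callable[[str], Optional[str]]:
    """Hypothetical stand-in for the network: only addresses listed in
    `responsive` answer; everything else times out."""
    def send_query(address: str) -> Optional[str]:
        return f"answer from {address}" if address in responsive else None
    return send_query


# Hypothetical service addresses; each may be a unicast server or an anycast cloud.
NS_ADDRESSES = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]

# Two of three addresses are down: the lookup still succeeds via the third.
print(resolve_with_failover(NS_ADDRESSES, make_fake_network({"203.0.113.1"})))

# All addresses are down: the lookup fails for this resolver.
print(resolve_with_failover(NS_ADDRESSES, make_fake_network(set())))
```

Note that the loop doesn't care whether an address is unicast or anycast, which is the point: this layer of failure handling is the same either way.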
Anycast doesn't change this. Anycast does add an additional mechanism for
working around failures.
In a typical anycast cloud, there are several servers in several locations
sharing a service address. When all servers in the cloud are up, and
routing to all of them is working properly, queries sent to that service
address are responded to by the topologically closest server. The
additional failure handling of anycast comes in two forms: When a server
in an anycast cloud goes down properly, it withdraws its routing
announcement and queries get transparently redirected to the next closest
server in the cloud. If a server in an anycast cloud goes down improperly
and fails to withdraw its route, queries sent to that service address may
fail, but *only* if the queries are coming from somewhere that considers
that server the topologically closest server in the cloud. Queries to
that service address that get to other servers in the cloud anyway will
continue to be answered.
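
The two anycast failure cases can be sketched with a toy model like the one below. This is only an illustration under simplifying assumptions: "topologically closest" is reduced to a fixed preference list per client, the node names are invented, and real routing (BGP path selection and so on) is much messier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    answering: bool = True    # is the DNS service on this node responding?
    announcing: bool = True   # is the node's route still being announced?


def route_query(nodes: Dict[str, Node], closest_first: List[str]) -> Optional[str]:
    """Deliver a query to the closest node that is still announcing the
    service address. Routing knows nothing about DNS, so a node that is
    announcing but not answering swallows the query (returns None)."""
    for name in closest_first:
        node = nodes[name]
        if node.announcing:
            return node.name if node.answering else None
    return None  # no node is announcing the service address at all


def make_cloud() -> Dict[str, Node]:
    """Hypothetical three-node anycast cloud sharing one service address."""
    return {n: Node(n) for n in ("a", "b", "c")}


# Each client's view of the cloud, closest node first (made-up topology).
NEAR_A = ["a", "b", "c"]
NEAR_B = ["b", "c", "a"]

# Node "a" goes down properly: it stops answering AND withdraws its route.
clean = make_cloud()
clean["a"].answering = False
clean["a"].announcing = False

# Node "a" goes down improperly: it stops answering but keeps announcing.
unclean = make_cloud()
unclean["a"].answering = False

print(route_query(clean, NEAR_A))    # redirected to the next closest node
print(route_query(unclean, NEAR_A))  # fails for clients closest to "a"
print(route_query(unclean, NEAR_B))  # clients closest to "b" are unaffected
```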
It's still ideal for a zone to have several NS records, whether those
service addresses point at unicast servers or anycast clouds. For each
service address, reliability should be better if it's a service address in
a well managed anycast cloud than if it's a single unicast server. If the
addresses were all different addresses for the same anycast cloud or for
the same unicast server, or if a zone had a really small number of NS
records (anycast or unicast), there would be a problem. There's no
problem with a zone that has several NS records all pointing at different
anycast clouds. There are several TLD operators (and others, I'm sure)
who operate multiple, distinct, anycast clouds.
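
One way to think about the NS-record side of this is to count independent services rather than NS records. The sketch below assumes you already know which cloud or server each NS name lands on; the hostnames and service labels are hypothetical, and the mapping itself is something you'd have to work out from your own deployment.

```python
from typing import Dict, Set


def independent_services(ns_to_service: Dict[str, str]) -> Set[str]:
    """Collapse NS names onto the distinct clouds/servers behind them."""
    return set(ns_to_service.values())


def survives(ns_to_service: Dict[str, str], failed: Set[str]) -> bool:
    """True if at least one service behind the NS set is not in `failed`."""
    return bool(independent_services(ns_to_service) - failed)


# Three NS records that all point into the same anycast cloud.
same_cloud = {
    "ns1.example.net": "cloud-1",
    "ns2.example.net": "cloud-1",
    "ns3.example.net": "cloud-1",
}

# Three NS records pointing at distinct clouds or servers.
distinct = {
    "ns1.example.net": "cloud-1",
    "ns2.example.net": "cloud-2",
    "ns3.example.net": "unicast-1",
}

print(len(independent_services(same_cloud)), survives(same_cloud, {"cloud-1"}))
print(len(independent_services(distinct)), survives(distinct, {"cloud-1"}))
```

Three NS records on one cloud are really one service as far as NS-level failover goes; three records on three distinct clouds give you both layers of protection.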
I'm not going to go into anycasting a caching resolver service here
specifically, but the issues should be more or less the same.