[dns-operations] Best practices for Linux/UNIX stub resolver failover
Chuck Anderson
cra at WPI.EDU
Tue Apr 22 19:04:27 UTC 2014
Is it really expected that the first DNS server listed in
/etc/resolv.conf should never go down? Operationally speaking, who
can actually rely on listing multiple nameservers in /etc/resolv.conf
and using libc's failover mechanism in any kind of production server?
The failover behavior in libc is atrocious: every new or existing
process has to redo the failover after timing out, and long-running
processes even have to call res_init() to re-read resolv.conf.
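To put a rough number on that cost, here is a back-of-the-envelope model. It assumes a deliberately simplified stub resolver: servers are tried in resolv.conf order, a server that never answers costs one full `timeout`, and no failure state carries over from one process to the next. The 5-second value is glibc's documented default for `options timeout:`, but real glibc retry timing is more involved, so treat the numbers as illustrative. The names `NameServer`, `first_lookup_delay` and `total_delay` are hypothetical.

```python
from dataclasses import dataclass

# Simplified stub-resolver model (hypothetical, for illustration only):
# servers are tried in resolv.conf order, a dead server costs one full
# timeout, and no failure state survives from one process to the next.

DEFAULT_TIMEOUT_S = 5.0  # glibc's documented default for "options timeout:"


@dataclass
class NameServer:
    address: str
    alive: bool


def first_lookup_delay(servers, timeout_s=DEFAULT_TIMEOUT_S):
    """Seconds a fresh process waits before its first answer."""
    delay = 0.0
    for ns in servers:
        if ns.alive:
            return delay  # answer arrives (network RTT ignored)
        delay += timeout_s  # wait out the dead server, then move on
    raise RuntimeError("no nameserver answered")


def total_delay(servers, n_processes, timeout_s=DEFAULT_TIMEOUT_S):
    """Every process rediscovers the dead server on its own."""
    return n_processes * first_lookup_delay(servers, timeout_s)


resolv_conf = [
    NameServer("192.0.2.1", alive=False),  # first listed server is down
    NameServer("192.0.2.2", alive=True),
]

print(first_lookup_delay(resolv_conf))  # 5.0
print(total_delay(resolv_conf, n_processes=200))  # 1000.0
```

Even in this optimistic model the delay is paid per process, so it grows linearly with the number of processes doing lookups.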
It seems that the only sensible way to run a datacenter (or a network
full of Linux workstations for that matter) is to either:
1. Make sure the first nameserver listed in resolv.conf never goes
down by using Anycast DNS or some other failover mechanism like
VRRP or CARP on the DNS server side.
or:
2. Use a local DNS daemon on every server, with forwarders configured
   to the network's nameservers, and point resolv.conf at 127.0.0.1.
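Option 1 works because the address that clients list first never changes; only the machine answering behind it does. A toy sketch of the VRRP idea, assuming a hypothetical pool of DNS backends with configured priorities: the live backend with the highest priority owns the virtual IP. Real VRRP (RFC 5798) and CARP add advertisements, timers and preemption rules that this ignores, and anycast reaches a similar result through routing instead.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Backend:
    name: str
    priority: int  # higher wins, as in VRRP master election
    alive: bool


def vip_owner(backends: Sequence[Backend]) -> Optional[Backend]:
    """Return the backend that should answer on the virtual IP.

    Simplified VRRP-style election: among live backends, the one with the
    highest priority becomes master. Advertisement timers, preemption and
    tie-breaking on IP address are deliberately left out.
    """
    live = [b for b in backends if b.alive]
    return max(live, key=lambda b: b.priority) if live else None


dns_pool = [
    Backend("dns-a", priority=200, alive=True),
    Backend("dns-b", priority=100, alive=True),
]
print(vip_owner(dns_pool).name)  # dns-a

dns_pool[0].alive = False  # primary fails; stub resolvers see no change
print(vip_owner(dns_pool).name)  # dns-b
```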
A while ago I started a thread on the Fedora development list to
propose the latter solution:
Subject: default local DNS caching name server
https://lists.fedoraproject.org/pipermail/devel/2014-April/197755.html
"While DNSSEC support has historically been a driving factor for
implementing this, there is an even more fundamental need due to
the poor performance of the system in case the first listed
nameserver in /etc/resolv.conf fails for some reason. It is
shameful that Linux systems and applications in general still,
after 20+ years, can't perform adequately after a primary DNS
server failure. The stub resolver in glibc which uses
/etc/resolv.conf can decide that the first listed nameserver entry
is down, but this decision has to be made over and over in every
single process on the system that is doing DNS resolution,
resulting in repeated long application hangs/delays. We need an
independent, system-wide DNS cache, and always point resolv.conf
to 127.0.0.1 to solve this fundamental design problem with how
name resolution works on a Linux system. Windows has had a
default system-wide DNS cache for over a decade. It is about time
that Linux catches up."
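The key property of option 2, and of the system-wide cache argued for in that post, is that the "this upstream is down" decision is made once, in one daemon, rather than separately in every process. A minimal sketch of that idea, assuming a hypothetical forwarder with a fixed hold-down period; it is not a description of how any particular daemon schedules retries, and the `query` callable stands in for real network I/O.

```python
import time
from typing import Callable, Dict, List, Optional

# Hypothetical: query(upstream, name) returns an answer, or None on timeout.
QueryFn = Callable[[str, str], Optional[str]]


class LocalForwarder:
    """Toy system-wide forwarder with one shared view of which upstreams are down."""

    def __init__(self, upstreams: List[str], query: QueryFn,
                 holddown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.upstreams = upstreams
        self.query = query
        self.holddown_s = holddown_s  # assumed: how long a failed upstream is skipped
        self.clock = clock
        self.down_until: Dict[str, float] = {}
        self.cache: Dict[str, str] = {}  # TTLs ignored for brevity

    def resolve(self, name: str) -> Optional[str]:
        if name in self.cache:
            return self.cache[name]
        now = self.clock()
        # Skip upstreams still in hold-down; if all are, try them all anyway.
        healthy = [u for u in self.upstreams if self.down_until.get(u, 0.0) <= now]
        for upstream in healthy or self.upstreams:
            answer = self.query(upstream, name)
            if answer is None:
                # Remember the failure for every later caller, not just this one.
                self.down_until[upstream] = now + self.holddown_s
                continue
            self.cache[name] = answer
            return answer
        return None


# Usage: the first upstream is dead; count how often it is actually tried.
attempts: List[str] = []


def fake_query(upstream: str, name: str) -> Optional[str]:
    attempts.append(upstream)
    if upstream == "192.0.2.1":
        return None  # simulated timeout
    return "198.51.100.7"


fwd = LocalForwarder(["192.0.2.1", "192.0.2.2"], fake_query)
for name in ["a.example", "b.example", "c.example"]:
    fwd.resolve(name)
print(attempts.count("192.0.2.1"))  # 1
```

Because every application talks to 127.0.0.1, the dead upstream costs one timeout for the whole machine instead of one per process.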
What do the DNS experts say about best practices for DNS failover in
the stub resolver?