[dns-operations] Best practices for Linux/UNIX stub resolver failover

Chuck Anderson cra at WPI.EDU
Tue Apr 22 19:04:27 UTC 2014


Is it really expected that the first DNS server listed in
/etc/resolv.conf should never go down?  Operationally speaking, who
can actually rely on listing multiple nameservers in /etc/resolv.conf
and using libc's failover mechanism in any kind of production server?
Because the failover behavior in libc is atrocious: each new or
existing process has to redo the failover after timing out, and even
long-running processes have to call res_init() to re-read resolv.conf.
It seems that the only sensible way to run a datacenter (or a network
full of Linux workstations for that matter) is to either:

1. Make sure the first nameserver listed in resolv.conf never goes
   down by using Anycast DNS or some other failover mechanism like
   VRRP or CARP on the DNS server side.

or:

2. Use a local DNS daemon on every server with forwarders configured
   to the network's nameservers, and point resolv.conf permanently at
   127.0.0.1.
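A rough, back-of-the-envelope model makes the cost of each approach concrete. The Python sketch below is a toy model, not a measurement. It assumes the first listed nameserver is unreachable, that every affected process independently pays one full timeout before falling through to the next server (the per-process failover described above), and that the timeout is 5 seconds, the documented default for `options timeout:` in resolv.conf(5). The function names are hypothetical, and the option 1 case is idealized because it ignores anycast/VRRP/CARP convergence time.

```python
# Toy model only, not a measurement. Assumptions (not taken from any resolver's
# source code): the first nameserver in /etc/resolv.conf is unreachable and
# every affected process pays one full timeout before trying the next server.

DEFAULT_TIMEOUT_S = 5.0  # documented default for "options timeout:n" in resolv.conf(5)


def stall_per_process_stub(processes: int, timeout_s: float = DEFAULT_TIMEOUT_S) -> float:
    """Status quo: each process rediscovers the dead server on its own."""
    return processes * timeout_s


def stall_local_daemon(processes: int, timeout_s: float = DEFAULT_TIMEOUT_S) -> float:
    """Option 2: a single daemon on 127.0.0.1 times out once on behalf of everyone."""
    return timeout_s if processes > 0 else 0.0


def stall_always_up_first_server(processes: int) -> float:
    """Option 1, idealized: anycast/VRRP/CARP keep the first address answering,
    so stubs never time out (failover convergence time is ignored)."""
    return 0.0


if __name__ == "__main__":
    for n in (1, 50, 500):
        print(f"{n:4d} processes: stub={stall_per_process_stub(n)}s, "
              f"daemon={stall_local_daemon(n)}s, "
              f"always-up={stall_always_up_first_server(n)}s")
```

Under these assumptions, 500 processes each doing one lookup accumulate 2500 seconds of stall with per-process failover, versus 5 seconds when a shared local daemon absorbs the timeout.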

I started a thread on the Fedora development list a while ago
proposing the latter solution:

Subject: default local DNS caching name server

https://lists.fedoraproject.org/pipermail/devel/2014-April/197755.html

    "While DNSSEC support has historically been a driving factor for
    implementing this, there is an even more fundamental need due to
    the poor performance of the system in case the first listed
    nameserver in /etc/resolv.conf fails for some reason.  It is
    shameful that Linux systems and applications in general still,
    after 20+ years, can't perform adequately after a primary DNS
    server failure.  The stub resolver in glibc which uses
    /etc/resolv.conf can decide that the first listed nameserver entry
    is down, but this decision has to be made over and over in every
    single process on the system that is doing DNS resolution,
    resulting in repeated long application hangs/delays.  We need an
    independent, system-wide DNS cache, and always point resolv.conf
    to 127.0.0.1 to solve this fundamental design problem with how
    name resolution works on a Linux system.  Windows has had a
    default system-wide DNS cache for over a decade.  It is about time
    that Linux catches up."
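The core of the system-wide approach is that one long-lived process makes the "first nameserver is down" decision once, and every client benefits from it. Below is a minimal Python sketch of that idea. It is not how any particular daemon (dnsmasq, Unbound, BIND) actually implements upstream selection: `UpstreamSelector`, its 30-second hold-down period, and the injected `query` callable are all hypothetical. The callable stands in for real network I/O so the logic can be exercised without a DNS server, and the addresses are documentation-range examples.

```python
import time
from typing import Callable, Dict, List, Optional


class UpstreamSelector:
    """Hypothetical sketch: a single system-wide forwarder remembers that an
    upstream nameserver timed out, so the timeout is paid once rather than
    once per client process."""

    def __init__(self, upstreams: List[str], hold_down_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.upstreams = list(upstreams)
        self.hold_down_s = hold_down_s  # arbitrary assumption, not a standard value
        self.clock = clock
        self._down_until: Dict[str, float] = {}  # upstream -> earliest retry time

    def _candidates(self) -> List[str]:
        now = self.clock()
        healthy = [u for u in self.upstreams if self._down_until.get(u, 0.0) <= now]
        # If every upstream is marked down, fall back to trying all of them in order.
        return healthy or list(self.upstreams)

    def resolve(self, name: str,
                query: Callable[[str, str], Optional[str]]) -> Optional[str]:
        """query(upstream, name) returns an answer or raises TimeoutError."""
        for upstream in self._candidates():
            try:
                answer = query(upstream, name)
            except TimeoutError:
                # Record the failure once, for every future caller of this daemon.
                self._down_until[upstream] = self.clock() + self.hold_down_s
                continue
            self._down_until.pop(upstream, None)  # upstream answered: clear any mark
            return answer
        return None  # every upstream timed out


if __name__ == "__main__":
    calls: List[str] = []

    def fake_query(upstream: str, name: str) -> str:
        # Simulated network: the first nameserver is "down".
        calls.append(upstream)
        if upstream == "192.0.2.1":
            raise TimeoutError
        return "198.51.100.7"

    selector = UpstreamSelector(["192.0.2.1", "192.0.2.2"], clock=lambda: 0.0)
    for _ in range(3):  # three lookups, e.g. from three different client processes
        selector.resolve("www.example.com", fake_query)
    print(calls)
```

Only the first lookup tries the dead server; the next two go straight to the second one until the hold-down expires, which is exactly the state that each libc stub has to rediscover on its own.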

What do the DNS experts say about best practices for DNS failover in
the stub resolver?


