[dns-operations] Best practices for Linux/UNIX stub resolver failover
Brett Carr
Brett.Carr at nominet.org.uk
Fri May 2 09:04:41 UTC 2014
On 22 Apr 2014, at 20:04, Chuck Anderson <cra at WPI.EDU> wrote:
> Is it really expected that the first DNS server listed in
> /etc/resolv.conf should never go down? Operationally speaking, who
> can actually rely on listing multiple nameservers in /etc/resolv.conf
> and using libc's failover mechanism in any kind of production server?
> Because the failover behavior in libc is atrocious--each new or
> existing process has to re-do the failover after timing out, and even
> long-running processes have to call res_init() to re-read resolv.conf.
> It seems that the only sensible way to run a datacenter (or a network
> full of Linux workstations for that matter) is to either:
>
> 1. Make sure the first nameserver listed in resolv.conf never goes
> down by using Anycast DNS or some other failover mechanism like
> VRRP or CARP on the DNS server side.
>
> or:
>
> 2. Use a local DNS daemon on every server with forwarders configured
> to the network's nameservers, and fix resolv.conf to 127.0.0.1.
>
I'm not an expert either, but I remember having bad experiences with this on Solaris 8/10 and early versions of Linux many years ago. I was under the impression that modern versions of *nix had this covered, and the quick test I just did on a RHEL6 server makes me think this is no longer an issue.
The server has a resolv.conf configured as follows:
nameserver 192.168.1.244
nameserver 192.168.1.1
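For reference, the resolver library tries nameserver entries in the order they are listed unless "options rotate" is set. A minimal Python sketch of reading that order, assuming the two-line configuration above is held in a string; the helper name list_nameservers is hypothetical and not part of any library:

```python
def list_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they appear in the text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


# The configuration from this test, copied as a string for illustration.
SAMPLE = """\
nameserver 192.168.1.244
nameserver 192.168.1.1
"""

print(list_nameservers(SAMPLE))
```

The first address returned is the one each process will query first.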
If I stop the resolver running on 192.168.1.244 and then generate a query, I see the following in a tcpdump:
09:57:01.348084 IP 192.168.1.173.47658 > 192.168.1.244.domain: 35734+ A? pool.ntp.org. (30)
09:57:01.348160 IP 192.168.1.173.47658 > 192.168.1.244.domain: 61935+ AAAA? pool.ntp.org. (30)
09:57:01.348223 IP 192.168.1.173.42169 > 192.168.1.1.domain: 35734+ A? pool.ntp.org. (30)
09:57:01.348267 IP 192.168.1.173.42169 > 192.168.1.1.domain: 61935+ AAAA? pool.ntp.org. (30)
This is a very quick failover to the second resolver.
Or did I misunderstand the problem?
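To put a number on "very quick", the gap between the first query to 192.168.1.244 and the first query to 192.168.1.1 can be worked out from the tcpdump timestamps above. A rough Python sketch, assuming both timestamps fall on the same day; the function name failover_delay_us is made up for illustration:

```python
from datetime import datetime, timedelta


def failover_delay_us(first_ts, second_ts):
    """Return the gap between two tcpdump HH:MM:SS.ffffff stamps, in microseconds."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(second_ts, fmt) - datetime.strptime(first_ts, fmt)
    # Dividing one timedelta by another with // yields an integer count.
    return delta // timedelta(microseconds=1)


# First A query to 192.168.1.244, then first A query to 192.168.1.1.
print(failover_delay_us("09:57:01.348084", "09:57:01.348223"))  # 139
```

That is 139 microseconds between the first attempt and the retry against the second server in this capture.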
Brett