[dns-operations] Best practices for Linux/UNIX stub resolver failover

Tue Apr 22 20:11:44 UTC 2014

On Tue, Apr 22, 2014 at 12:25:03PM -0700, Todd Lyons wrote:
> On Tue, Apr 22, 2014 at 12:04 PM, Chuck Anderson <cra at wpi.edu> wrote:
> > Is it really expected that the first DNS server listed in
> > /etc/resolv.conf should never go down?  Operationally speaking, who
> 
> No.
> 
> > can actually rely on listing multiple nameservers in /etc/resolv.conf
> > and using libc's failover mechanism in any kind of production server?
> > Because the failover behavior in libc is atrocious--each new or
> > existing process has to re-do the failover after timing out, and even
> > long-running processes have to call res_init() to re-read resolv.conf.
> > It seems that the only sensible way to run a datacenter (or a network
> > full of Linux workstations for that matter) is to either:
> >
> > 1. Make sure the first nameserver listed in resolv.conf never goes
> >    down by using Anycast DNS or some other failover mechanism like
> >    VRRP or CARP on the DNS server side.
> 
> [root@site03 ~]# more /etc/resolv.conf
> search example.net
> nameserver 192.168.1.10
> nameserver 192.168.2.10
> options rotate timeout:2
> 
> > What do the DNS experts say about best practices for DNS failover in
> > the stub resolver?
> 
> I'm curious to see what they think here too.
> 
> ...Todd

I'm not an expert either, but I do like the local resolver option a lot.
We've certainly been bitten by this on our Linux-based (appliance)
monitoring environments when DNS infrastructure has gone down.
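
To put rough numbers on it, here is a small Python sketch that parses
a resolv.conf like the one Todd quoted and estimates the extra delay a
lookup sees while the first nameserver is down. It is a simplified
model, not glibc's actual code: it assumes the stub waits one full
timeout on the dead server before moving to the next one, and that
"rotate" spreads the starting server evenly across the list. The
function names are made up for illustration; the defaults (timeout 5,
attempts 2, at most 3 nameservers) are the ones documented in
resolv.conf(5).

```python
# Simplified model of stub resolver failover delay. Not glibc's real logic.

GLIBC_DEFAULTS = {"timeout": 5, "attempts": 2, "rotate": False}
MAXNS = 3  # resolv.conf(5): at most three nameserver lines are used

SAMPLE = """\
search example.net
nameserver 192.168.1.10
nameserver 192.168.2.10
options rotate timeout:2
"""


def parse_resolv_conf(text):
    """Return (nameservers, options) parsed from resolv.conf-style text."""
    nameservers = []
    options = dict(GLIBC_DEFAULTS)
    for raw in text.splitlines():
        fields = raw.split()
        # Skip blank lines and comment lines starting with '#' or ';'.
        if not fields or fields[0][0] in "#;":
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            if len(nameservers) < MAXNS:
                nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name == "rotate":
                    options["rotate"] = True
                elif name in ("timeout", "attempts") and value.isdigit():
                    options[name] = int(value)
    return nameservers, options


def first_server_down_delay(nameservers, options):
    """Return (worst_case, average) added seconds per lookup.

    Assumes the first listed server is dead and the others answer at once.
    """
    worst = options["timeout"]
    if options["rotate"] and nameservers:
        # Assumption: only 1 in len(nameservers) lookups starts at the
        # dead server when rotate is set.
        average = worst / len(nameservers)
    else:
        # Without rotate every lookup tries the dead server first.
        average = worst
    return worst, average


if __name__ == "__main__":
    ns, opts = parse_resolv_conf(SAMPLE)
    print(first_server_down_delay(ns, opts))  # (2, 1.0)
```

With the defaults instead of "rotate timeout:2", the same model gives
5 seconds on every single lookup, in every process, for as long as the
first server stays down.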

We will probably explore introducing anycast internally.

Ray