[dns-operations] DNS load-balancing/failover using an ASR 9xxx (few questions)
Stephen Johnson (DIS)
Stephen.Johnson at arkansas.gov
Fri Aug 15 17:17:28 UTC 2014
On Thu, 2014-08-14 at 17:48 +0000, Jake Zack wrote:
> Anyone doing this?
> Previously I’d been using Cisco 3945’s and 3845’s running standard
> IOS…thus using Cisco IP SLA + track to do DNS queries of each server
> and add/remove them from the cluster.
> In the ASR 9xxx series with IOS XR, the “ipsla” that it has available
> doesn’t seem to do either TCP connections or UDP DNS queries. It
> seems my only real option is to monitor for ICMP reachability and
> nothing else.
> Anyone have a better solution? I’ve considered throwing a wrapper
> around BIND doing OSPF updates and such…but it seems unideal.
> DNS Administrator – CIRA (.CA TLD)
We are using a couple of small clusters of Linux servers (Scientific
Linux, a whitebox RHEL rebuild) as recursive resolvers. Each cluster
consists of 2 load balancers running a CMAN/Pacemaker cluster. The load
balancing is done with the Linux kernel's IP Virtual Server (IPVS)
feature. The resolver IPs are VIPs managed by the cluster, and the load
balancers are set up to replicate their connection tables to each other
for seamless failover.
Also in the mix, I run keepalived on the load balancers. Keepalived
manages the IPVS configuration in conjunction with health checks for
each of the back-end nodes. If a back-end node stops responding, the
IPVS configuration is altered to remove that node from the cluster.
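A health check of this sort can go beyond ICMP by sending an actual DNS
query over UDP. A minimal sketch of such a probe in Python (stdlib only;
the function names and the probed hostname are my own illustration, not
anything from our setup):

```python
import socket
import struct

def build_query(name, qtype=1, qid=0x1234):
    """Build a minimal DNS query packet (QTYPE 1 = A record, class IN)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, AN/NS/AR counts 0.
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)

def dns_alive(server, name="example.com", timeout=2.0):
    """Return True if `server` answers a UDP DNS query within `timeout`."""
    query = build_query(name)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(query, (server, 53))
        resp, _ = sock.recvfrom(512)
        # Matching transaction ID and the QR bit set mean we got a real answer.
        return resp[:2] == query[:2] and (resp[2] & 0x80) != 0
    except OSError:
        return False
    finally:
        sock.close()
```

A script like this (exiting 0 on success, nonzero on failure) is the
shape keepalived expects from an external check command.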
And note that keepalived also implements a VRRP routing daemon for
failover between a set of routers. (We don't use VRRP in our setup.)
There are 4 back-end servers running just BIND as caching name servers,
with a few of our main authoritative zones as slaves. The load balancers
have all of the back-end servers in their configurations, but we
normally have only 2 back-end nodes servicing each resolver VIP. The
other two are set to weight 0. I can alter the weights in the load
balancers to bring back-end nodes in and out of service and to move
them between resolver VIPs.
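On the command line, that weight manipulation can be done live with
ipvsadm (root required; addresses here are RFC 5737 examples, not ours):

```
# Drain a node: weight 0 stops new connections from being scheduled to it
ipvsadm -e -u 192.0.2.53:53 -r 192.0.2.11:53 -w 0
# Bring a reserve node into service
ipvsadm -e -u 192.0.2.53:53 -r 192.0.2.13:53 -w 1
# Inspect the current virtual service table and weights
ipvsadm -L -n
```

One caveat: if keepalived is managing the IPVS table, manual ipvsadm
edits can be reverted on its next reconfiguration, so the durable way is
to change the weight in keepalived.conf and reload.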
I've clocked a resolver cluster (1 load balancer, 2 back-end nodes, and
named caches flushed) at north of 11,000 queries per second before
queries started to fail.
I've been using a similar setup (minus keepalived) for well over 7
years; the resolver clusters themselves have been running about 3 years
without any major issues.
Stephen L Johnson <stephen.johnson at arkansas.gov>
Unix Systems Administrator / DNS Hostmaster
Department of Information Systems
State of Arkansas