[dns-operations] Bind 9.8.0 intermittent problem with non-recursive responses
Carlos Vicente
cvicente.lists at gmail.com
Thu May 19 22:45:37 UTC 2011
Hi Patrick,
This is interesting. I just realized that the problem is not exclusive
to my anycast servers. I noticed that my authoritative-only servers
were not returning the ADDITIONAL section either, so I restarted BIND,
and they started returning it again.
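For anyone who wants to check their own servers for the same symptom, here is a minimal sketch that reads the section counts from dig's header line and flags replies that carry an answer but no AUTHORITY or ADDITIONAL data. The function names are made up for illustration, and it assumes dig output in the format quoted further down. As far as I know, newer dig versions send EDNS by default and count the OPT pseudo-record under ADDITIONAL, so the zero check may need adjusting there.

```python
import re

# Matches the counters on dig's ";; flags: ...; QUERY: 1, ANSWER: 2, ..." line.
_COUNTS = re.compile(r"(QUERY|ANSWER|AUTHORITY|ADDITIONAL):\s*(\d+)")


def section_counts(dig_output):
    """Return the header section counts from dig output as a dict."""
    for line in dig_output.splitlines():
        if "flags:" in line:
            return {name: int(n) for name, n in _COUNTS.findall(line)}
    return {}  # no header line found


def is_stripped(dig_output):
    """True if the reply has an answer but empty AUTHORITY and ADDITIONAL."""
    counts = section_counts(dig_output)
    return (
        bool(counts)
        and counts.get("ANSWER", 0) > 0
        and counts.get("AUTHORITY", 0) == 0
        and counts.get("ADDITIONAL", 0) == 0
    )
```

Feeding it the captured output of a plain `dig @server name` run in a loop should be enough to notice when a server flips into the bad state.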
So this does look more clearly like some kind of bug in BIND. I'll try
to open a case with ISC.
Thanks for your reply.
cv
On Thu, May 19, 2011 at 11:49 AM, Patrick, Robert (CONTR)
<Robert.Patrick at hq.doe.gov> wrote:
> Carlos,
>
> I've observed the same behavior with BIND 9.8.0 running on a generic IPv4 address assigned to an Ethernet interface, not using loopback with AnyCast. Odds are good this is a software bug in BIND. I've observed the same behavior on two nearly identical platforms, while on two others I've not run into the same issues.
>
> Best I could determine, the problem became apparent after some duration of runtime and/or queries or query volume. On servers that only handle inside "trusted" users I've not seen the problem at all and they're still running 9.8.0 today. On external Internet-facing servers where the problem was triggered almost daily we rolled back to 9.7.x until a fix is released (or 9.8.1, and we'll try again).
>
> FYI, server O/S in my case is CentOS 5.6 32-bit, should be equivalent to Red Hat.
>
> Hopefully an ISC POC will contact you directly. Send configs and they'll probably assist in debugging.
>
> -----Original Message-----
> From: dns-operations-bounces at lists.dns-oarc.net [mailto:dns-operations-bounces at lists.dns-oarc.net] On Behalf Of Carlos Vicente
> Sent: Thursday, May 19, 2011 1:58 PM
> To: bind-users at lists.isc.org; dns-operations at lists.dns-oarc.net
> Subject: [dns-operations] Bind 9.8.0 intermittent problem with non-recursive responses
>
> Dear lists [apologies if you receive two copies of this message],
>
> I am in the process of implementing anycast recursive DNS service for
> our campus using a combination of servers running Bind 9.8.0 and Cisco's
> IP SLA feature. There are three identical Redhat servers connected to
> three different routers with point-to-point /30 links. The servers are
> configured with an anycast address attached to an alias of the loopback
> interface:
>
> [note: these are not the actual IP addresses]
>
> lo:1 Link encap:Local Loopback
> inet addr:192.168.32.32 Mask:255.255.255.255
> UP LOOPBACK RUNNING MTU:16436 Metric:1
>
> These caching servers are also configured as stealth slaves for our
> zones (using Bind's 'also-notify' option in our master). This allows us
> to serve the latest contents of our zones without having to wait for
> TTLs to expire.
>
> In our tests, we've come across a very interesting but annoying problem.
> After several hours of operation, the servers start to respond to CNAME
> queries in an inconsistent manner. For example:
>
> # dig @192.168.32.32 www.uoregon.edu
>
> ; <<>> DiG 9.8.0-RedHat-9.8.0-4.uopel5 <<>> @192.168.32.32 www.uoregon.edu
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14280
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 6, ADDITIONAL: 4
>
> ;; QUESTION SECTION:
> ;www.uoregon.edu. IN A
>
> ;; ANSWER SECTION:
> www.uoregon.edu. 600 IN CNAME uowc-www.uoregon.edu.
> uowc-www.uoregon.edu. 86400 IN A 192.168.142.125
>
> ;; AUTHORITY SECTION:
> uoregon.edu. 86400 IN NS phloem.uoregon.edu.
> uoregon.edu. 86400 IN NS bigdog.lsu.edu.
> uoregon.edu. 86400 IN NS sns-pb.isc.org.
> uoregon.edu. 86400 IN NS arizona.edu.
> uoregon.edu. 86400 IN NS ruminant.uoregon.edu.
> uoregon.edu. 86400 IN NS dns.cs.uoregon.edu.
>
> ;; ADDITIONAL SECTION:
> phloem.uoregon.edu. 86400 IN A 192.168.32.35
> phloem.uoregon.edu. 86400 IN AAAA 2001:468:d01:20::80df:2023
> ruminant.uoregon.edu. 86400 IN A 192.168.60.22
> ruminant.uoregon.edu. 86400 IN AAAA 2001:468:d01:3c::80df:3c16
>
> ;; Query time: 0 msec
> ;; SERVER: 192.168.32.32#53(192.168.32.32)
> ;; WHEN: Wed May 18 12:51:06 2011
> ;; MSG SIZE rcvd: 300
>
>
> # dig @192.168.32.32 www.uoregon.edu
>
> ; <<>> DiG 9.8.0-RedHat-9.8.0-4.uopel5 <<>> @192.168.32.32 www.uoregon.edu
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34776
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;www.uoregon.edu. IN A
>
> ;; ANSWER SECTION:
> www.uoregon.edu. 600 IN CNAME uowc-www.uoregon.edu.
>
>
> As you can see, the second response does not include the AUTHORITY or
> the ADDITIONAL sections. This causes our users' machines to fail
> to resolve the A records because the resolver library does not query a
> second time. This second type of response appears to be the server
> acting as an authoritative-only server, not as a caching recursive server.
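A rough way to tell the two kinds of reply apart automatically is to follow the CNAME chain inside the ANSWER section and see whether it ends in an address record. The sketch below does that on raw dig output (without the "> " quoting used in this message). The function names are hypothetical, and it only looks at the ANSWER section, as a stub resolver that does not retry would.

```python
def answer_records(dig_output):
    """Return (owner, type, rdata) tuples from dig's ANSWER SECTION."""
    records = []
    in_answer = False
    for raw in dig_output.splitlines():
        line = raw.strip()
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            if not line or line.startswith(";"):
                break  # a blank line or the next section header ends the section
            fields = line.split()
            if len(fields) >= 5:
                # owner, TTL, class, type, rdata
                records.append((fields[0].lower(), fields[3].upper(), fields[4]))
    return records


def cname_resolved(dig_output, qname):
    """True if following CNAMEs from qname reaches an A or AAAA record."""
    records = answer_records(dig_output)
    name = qname.lower()
    if not name.endswith("."):
        name += "."
    seen = set()
    while name not in seen:  # guard against CNAME loops
        seen.add(name)
        if any(o == name and t in ("A", "AAAA") for o, t, _ in records):
            return True
        targets = [d.lower() for o, t, d in records if o == name and t == "CNAME"]
        if not targets:
            return False
        name = targets[0]
    return False
```

With the two replies shown above, the first would count as resolved and the second would not, since it stops at the CNAME.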
>
> Here are the most interesting details:
>
> - We have only observed this happening when querying the anycast
> address, not the address associated with the ethernet interface.
> - The behavior is independent of the network. We can replicate it by
> querying the anycast address from the server itself.
> - Our production (non-anycast) servers run the exact same version of
> Bind with the exact same configuration, and we have never observed this
> problem.
> - Bind's debugging output is exactly the same in both cases, so
> it offers no clues about the difference in responses.
> - After restarting Bind, the problem goes away for several hours. It
> reappears only if the server receives query traffic during those
> hours; otherwise it does not happen.
>
> Here's the options section of the config:
>
> options {
> version "9999.9.9";
> recursive-clients 5000;
> directory "/etc/named";
> allow-transfer { none; };
> blackhole { attackers; };
> listen-on-v6 { any; };
> allow-recursion { customers; };
> allow-query { any; };
> dnssec-enable yes;
> dnssec-validation yes;
>
> };
>
>
> Bind is listening on the anycast address (in addition to its NIC IP
> address):
>
> # netstat -lnp |grep 192.168.32.32
> tcp 0 0 192.168.32.32:53 0.0.0.0:* LISTEN 30771/named
> udp 0 0 192.168.32.32:53 0.0.0.0:* 30771/named
>
> These are the details of our Bind daemon (custom-built RPM, based on
> Fedora's source RPM):
>
> # named -V
> BIND 9.8.0-RedHat-9.8.0-4.uopel5 built with
> '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu'
> '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr'
> '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin'
> '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include'
> '--libdir=/usr/lib64' '--libexecdir=/usr/libexec'
> '--sharedstatedir=/usr/com' '--mandir=/usr/share/man'
> '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var'
> '--enable-threads' '--enable-ipv6' '--with-pic' '--disable-static'
> '--disable-openssl-version-check' '--enable-exportlib'
> '--with-export-libdir=/usr/lib64'
> '--with-export-includedir=/usr/include'
> '--includedir=/usr/include/bind9' 'build_alias=x86_64-redhat-linux-gnu'
> 'host_alias=x86_64-redhat-linux-gnu'
> 'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4 -m64 -mtune=generic' 'CPPFLAGS=
> -DDIG_SIGCHASE' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> -mtune=generic' 'FFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> -mtune=generic'
> using OpenSSL version: OpenSSL 0.9.8e-rhel5 01 Jul 2008
> using libxml2 version: 2.6.26
>
> # uname -a
> Linux adns1 2.6.18-238.9.1.el5 #1 SMP Fri Mar 18 12:42:39 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> # cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.6 (Tikanga)
>
>
> I would really appreciate any help with this.
>
> Thanks in advance,