[dns-operations] EDNS issue

Fri Feb 25 22:32:43 UTC 2011

In message <311A73CB2084450AA03DA889D693DD01 at local>, "George Barwood" writes:
> 
> ----- Original Message ----- 
> From: "Mark Andrews" <marka at isc.org>
> To: "George Barwood" <george.barwood at blueyonder.co.uk>
> Cc: <frnkblk at iname.com>; <dns-operations at dns-oarc.net>
> Sent: Friday, February 25, 2011 11:59 AM
> Subject: Re: [dns-operations] EDNS issue
> 
> 
> > 
> > 
> > In message <15A36D24D8F44607A9C047E7D7894AC0 at local>, "George Barwood" writes:
> >> 
> >> ----- Original Message ----- 
> >> From: "Mark Andrews" <marka at isc.org>
> >> To: <frnkblk at iname.com>
> >> Cc: <dns-operations at dns-oarc.net>
> >> Sent: Friday, February 25, 2011 1:26 AM
> >> Subject: Re: [dns-operations] EDNS issue
> >> 
> >> 
> >> > 
> >> > In message <006d01cbd482$d54e5a30$7feb0e90$@iname.com>, "Frank Bulk" writes:
> >> >> Our ISP helpdesk has been receiving a lot of complaints about their
> >> >> inability to check the weather at weather.gov, specifically,
> >> >> forecast.weather.gov.  Some digs showed that queries were failing, and my
> >> >> BIND logs show the same:
> >> > 
> >> > Make sure you can receive fragmented UDP responses.  The servers
> >> > are sending good responses.
> >> 
> >> Can you clarify why end users were complaining?
> >> Doesn't BIND time out quickly enough if fragmented packets get lost?
> >> I was under the impression that the result would be reduced performance
> >> rather than failure.
> > 
> > It takes time to detect that a reply has been blocked and when you
> > have a CNAME chain and you need to go through the process a second
> > time the client sometimes times out.
> > 
> > There is only so much you can do in 2-3 seconds and most of that
> > is limited by the speed of light.
> 
> Right,  it depends quite a bit on how long you wait before starting
> the fallback to 512 bytes and then TCP after truncation, ok.
> 
> But looking quite briefly at what was going on, it seems the DNSKEY response for
> weather.gov and also noaa.gov ( due to the CNAME ) are fragmented ( they
> have 2 RRSIGs, where 1 RRSIG is more common practice ).
> 
> These have a 1 day TTL, so you would expect to see only 1 error once they
> have been fetched and cached with TCP fallback.

radar.weather.gov has a 5 second ttl and edge-ext.lb.noaa.gov has
a 30 second ttl so every thirty seconds named does a double fallback
and that takes too long for some clients.
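
To put rough numbers on that, here is a minimal sketch of the arithmetic.
It assumes steady client demand, so a record is refetched as soon as it
expires, and it ignores any minimum cache TTL a resolver might apply.
The function name is made up for illustration. The TTLs are the 5 and 30
seconds from the dig output and the 1 day DNSKEY TTL mentioned above.

```python
SECONDS_PER_DAY = 86400


def max_refetches_per_day(ttl_seconds: int) -> int:
    """Upper bound on upstream lookups per day for one cached record.

    Assumes the record is requested again as soon as it expires.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return SECONDS_PER_DAY // ttl_seconds


# TTLs taken from the thread: two short-lived records and the 1 day DNSKEY.
for label, ttl in [
    ("radar.weather.gov CNAME", 5),
    ("edge-ext.lb.noaa.gov A", 30),
    ("DNSKEY (1 day)", SECONDS_PER_DAY),
]:
    print(f"{label}: up to {max_refetches_per_day(ttl)} refetches per day")
```

So a record with a 1 day TTL is refetched about once a day, while the
short TTL records can be refetched thousands of times a day.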

We keep telling people that they need to support EDNS up to 4096
bytes over UDP.  Sometimes they only fix part of the problem when there
are multiple things to fix (allowing bigger packets *and* allowing
fragments).
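
As a rough illustration of why both fixes are needed, the sketch below
counts how many IPv4 fragments a UDP DNS response of a given size needs.
It assumes a 1500 byte path MTU (typical Ethernet), a 20 byte IPv4 header
with no options, and an 8 byte UDP header.  The function name is
hypothetical, and real paths may have a smaller MTU.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragments(dns_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP DNS message."""
    if dns_payload < 0:
        raise ValueError("dns_payload must not be negative")
    if mtu < 68:
        raise ValueError("mtu is below the IPv4 minimum of 68")
    available = mtu - IPV4_HEADER    # IP payload space per packet
    per_fragment = available // 8 * 8  # non-final fragments carry multiples of 8
    remaining = UDP_HEADER + dns_payload
    count = 1
    while remaining > available:
        remaining -= per_fragment
        count += 1
    return count


for size in (512, 1472, 1473, 4096):
    print(f"{size} byte DNS message: {ipv4_fragments(size)} packet(s)")
```

Under those assumptions anything over 1472 bytes of DNS payload is
fragmented, so a firewall that accepts large UDP packets but drops
fragments still loses the response.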

While the OARC test is useful, it really isn't the best tool to check
if you have a good EDNS path.  It was designed to discover what the
path supports, not to say if it is good.  It also is not IP version
specific.

What is needed is test queries (IPv4 and IPv6) that will only succeed
if the UDP path is good for 4096 byte EDNS.

	dig @server1 ipv4-testname txt
	dig @server1 ipv6-testname txt

and do that for each of the servers listed in resolv.conf.
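
A minimal sketch of how that loop could be generated is below.  It only
builds the dig command lines and does not run them.  The names
ipv4-testname and ipv6-testname are the placeholders used above, not real
test zones, and the sample resolv.conf text and function names are made
up for illustration.

```python
import shlex


def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses on 'nameserver' lines of resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def dig_commands(servers, test_names=("ipv4-testname", "ipv6-testname")):
    """Build one dig TXT query per (server, test name) pair."""
    return [["dig", f"@{server}", name, "txt"]
            for server in servers
            for name in test_names]


# Hypothetical resolv.conf contents; in practice read /etc/resolv.conf.
SAMPLE = """\
# example only
search example.net
nameserver 192.0.2.53
nameserver 2001:db8::53
"""

for command in dig_commands(nameservers(SAMPLE)):
    print(shlex.join(command))
```

Each printed line can then be run by hand or passed to subprocess.run,
and a failure or timeout on either name points at the UDP path for that
server and IP version.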

Mark

[drugs:~] marka% dig radar.weather.gov cname

; <<>> DiG 9.6.0-APPLE-P2 <<>> radar.weather.gov cname
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58808
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;radar.weather.gov.		IN	CNAME

;; ANSWER SECTION:
radar.weather.gov.	5	IN	CNAME	edge-ext.lb.noaa.gov.

;; Query time: 1029 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Feb 26 09:11:56 2011
;; MSG SIZE  rcvd: 66

[drugs:~] marka% 

[drugs:~] marka% dig edge-ext.lb.noaa.gov

; <<>> DiG 9.6.0-APPLE-P2 <<>> edge-ext.lb.noaa.gov
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42253
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;edge-ext.lb.noaa.gov.		IN	A

;; ANSWER SECTION:
edge-ext.lb.noaa.gov.	30	IN	A	140.90.200.22
edge-ext.lb.noaa.gov.	30	IN	A	140.172.17.22
edge-ext.lb.noaa.gov.	30	IN	A	129.15.96.22

;; Query time: 1299 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Feb 26 09:17:35 2011
;; MSG SIZE  rcvd: 86

[drugs:~] marka%

> But the logs are apparently showing several errors, up to 30 seconds apart,
> and some after the problem was apparently resolved. It sounds as if there
> is more than 1 error every 24 hours happening.
> 
> I guess there is something about how BIND operates that I don't understand,
> or there are other responses being fragmented as well.
> 
> George

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org