[dns-operations] EDNS issue

Fri Feb 25 13:49:07 UTC 2011

----- Original Message ----- 
From: "Mark Andrews" <marka at isc.org>
To: "George Barwood" <george.barwood at blueyonder.co.uk>
Cc: <frnkblk at iname.com>; <dns-operations at dns-oarc.net>
Sent: Friday, February 25, 2011 11:59 AM
Subject: Re: [dns-operations] EDNS issue

> 
> 
> In message <15A36D24D8F44607A9C047E7D7894AC0 at local>, "George Barwood" writes:
>> 
>> ----- Original Message ----- 
>> From: "Mark Andrews" <marka at isc.org>
>> To: <frnkblk at iname.com>
>> Cc: <dns-operations at dns-oarc.net>
>> Sent: Friday, February 25, 2011 1:26 AM
>> Subject: Re: [dns-operations] EDNS issue
>> 
>> 
>> > 
>> > In message <006d01cbd482$d54e5a30$7feb0e90$@iname.com>, "Frank Bulk" writes:
>> >> Our ISP helpdesk has been receiving a lot of complaints about their
>> >> inability to check the weather at weather.gov, specifically,
>> >> forecast.weather.gov.  Some digs showed that queries were failing, and my
>> >> BIND logs show the same:
>> > 
>> > Make sure you can receive fragmented UDP responses.  The servers
>> > are sending good responses.
>> 
>> Can you clarify why end users were complaining?
>> Doesn't BIND time out quickly enough if fragmented packets get lost?
>> I was under the impression that the result would be reduced performance
>> rather than failure.
> 
> It takes time to detect that a reply has been blocked and when you
> have a CNAME chain and you need to go through the process a second
> time the client sometimes times out.
> 
> There is only so much you can do in 2-3 seconds and most of that
> is limited by the speed of light.

Right, it depends quite a bit on how long you wait before falling back to a
512-byte buffer, and then to TCP after truncation.
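Mark's timing point can be made concrete with some rough arithmetic. The sketch below assumes a simple sequence: a large-buffer query whose fragmented reply is silently dropped, a 512-byte retry that comes back truncated, then a retry over TCP. The timeout, round-trip time and client timeout values, and the function names, are hypothetical placeholders, not BIND's actual retry parameters.

```python
# Rough cost of EDNS/TCP fallback when fragmented UDP replies are dropped.
# All timing values are hypothetical placeholders, not BIND's real settings.

def fallback_cost_ms(rtt_ms: float, lost_reply_timeout_ms: float) -> float:
    """Time to get one answer whose large UDP reply never arrives.

    1. Query with a large EDNS buffer; the fragmented reply is lost,
       so the resolver waits lost_reply_timeout_ms before giving up.
    2. Retry advertising 512 bytes; the answer does not fit, so the
       server replies with TC=1 after one round trip.
    3. Retry over TCP: one round trip for the handshake, one for the
       query and its response.
    """
    return lost_reply_timeout_ms + rtt_ms + 2 * rtt_ms


def total_cost_ms(zones_needing_fallback: int, rtt_ms: float,
                  lost_reply_timeout_ms: float) -> float:
    # A CNAME chain into a second signed zone can force the whole
    # sequence again for that zone's DNSKEY lookup.
    return zones_needing_fallback * fallback_cost_ms(rtt_ms, lost_reply_timeout_ms)


if __name__ == "__main__":
    client_timeout_ms = 2000  # hypothetical stub-resolver timeout
    cost = total_cost_ms(zones_needing_fallback=2, rtt_ms=200,
                         lost_reply_timeout_ms=800)
    # 2 * (800 + 200 + 400) = 2800
    print(f"{cost:.0f} ms needed vs {client_timeout_ms} ms client timeout")
```

With these numbers a single fallback (1400 ms) fits within the client's timeout, but repeating it for the second zone in the CNAME chain does not.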

But looking quite briefly at what was going on, it seems the DNSKEY responses for
weather.gov and also noaa.gov (due to the CNAME) are fragmented (they
have 2 RRSIGs, whereas 1 RRSIG is more common practice).
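Whether a response fragments comes down to arithmetic against the path MTU: on a 1500-byte Ethernet path, the IPv4 header (20 bytes) plus the UDP header (8 bytes) leaves 1472 bytes for the DNS message, or 1452 over IPv6 with its 40-byte header. The message and RRSIG sizes below are illustrative assumptions, not measurements of the weather.gov or noaa.gov responses, and the function names are hypothetical.

```python
# Does a DNS message fit in a single unfragmented UDP datagram?
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, no extension headers
UDP_HEADER = 8    # bytes


def max_unfragmented_payload(mtu: int = 1500, ipv6: bool = False) -> int:
    """Largest DNS message that fits in one packet on this path MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


def fragments(dns_message_bytes: int, mtu: int = 1500, ipv6: bool = False) -> bool:
    return dns_message_bytes > max_unfragmented_payload(mtu, ipv6)


if __name__ == "__main__":
    # Hypothetical sizes for illustration only:
    base = 1100   # header, question and DNSKEY RRset
    rrsig = 300   # roughly one RRSIG made with a 2048-bit RSA key
    for n in (1, 2):
        size = base + n * rrsig
        # prints "1 RRSIG(s): 1400 bytes, fragments: False"
        # then   "2 RRSIG(s): 1700 bytes, fragments: True"
        print(f"{n} RRSIG(s): {size} bytes, fragments: {fragments(size)}")
```

Under these assumed sizes, the second RRSIG is what pushes the DNSKEY response past the single-packet limit.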

These have a 1-day TTL, so once they have been fetched with TCP fallback and
cached, you would expect to see only 1 error per day.

But the logs apparently show several errors, up to 30 seconds apart, and
some of them after the problem was seemingly resolved. It sounds as if
more than 1 error is happening every 24 hours.
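The expectation above can be checked against a toy cache model. The model is a hypothetical simplification, not a description of how BIND is implemented: no eviction, no prefetch, and every cache miss triggers one upstream fetch, i.e. one chance of a fallback error. The function name is illustrative. Under it, a client querying every 30 seconds against a 1-day TTL causes a single upstream fetch per day, so errors 30 seconds apart point to something the model leaves out, whether resolver behaviour or other responses fragmenting.

```python
# Toy cache model (hypothetical): no eviction, no prefetch, and each
# cache miss means one upstream fetch, i.e. one possible fallback error.

def upstream_fetches(query_times_s, ttl_s):
    """Count upstream fetches for one RRset, given query times in seconds."""
    fetches = 0
    expires_at = None
    for t in sorted(query_times_s):
        if expires_at is None or t >= expires_at:
            fetches += 1            # cache miss: go upstream (and maybe fail)
            expires_at = t + ttl_s  # answer stays cached until the TTL runs out
    return fetches


if __name__ == "__main__":
    day = 24 * 60 * 60
    queries = range(0, day, 30)  # a client asking every 30 seconds
    for ttl in (86400, 300):     # the 1-day TTL vs a hypothetical 300 s TTL
        # prints "TTL 86400 s: 1 fetch(es) per day"
        # then   "TTL 300 s: 288 fetch(es) per day"
        print(f"TTL {ttl} s: {upstream_fetches(queries, ttl)} fetch(es) per day")
```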

I guess there is something about how BIND operates that I don't understand,
or there are other responses being fragmented as well.

George

> Mark
> -- 
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org