[dns-operations] Delegation checking (was: Re: Some DNSSEC trivia)

Edward Lewis Ed.Lewis at neustar.biz
Thu Jan 10 14:54:46 UTC 2008

At 3:51 +0100 1/10/08, Michael Monnerie wrote:

>I've read all this thread with big interest, and there seems to be the
>side of techies who like to fix broken and bad things, and the lawyers
>who are concerned about contracts.

I know my responses, on-list and off, put me into the lawyers 
category.  But I have been through this work as an engineer and know 
well why it is not worth the cost.  I have also worked in TLD 
registries and understand the tradeoffs made in resource allocation.

Engineers are trained to fix a problem at all costs.  That's the way 
I was brought up.  I had to learn the hard way when it came time to 
get the funds to work on a task "just to fix something" and 
discovered that what I was taught as an engineer was wrong.  A 
problem is to be fixed only if it's worth fixing.

>Has anybody got estimations or real numbers about how many problems
>would be solved, how many domains would not meet requirements, and what
>this would help to save the planet? I read about "bad
>domains", "hackers", "poisoning" etc. but how many are there really?

When I last measured (probably 3 or 4 years ago), in a portion of the 
reverse map, the rate of lame delegations ranged from 30%, at a registry 
that cut the delegation before informing the registrant of the zone 
name (unlike in the forward map, the registrant likely cannot guess the 
name ahead of registration), to under 10% at registries that cut the 
delegation only after the zone was up and running.

After a pass at lameness checking, the 30% number dropped a bit.  It 
wasn't tracked over time, but after it dropped it seemed to rise 
again because new zones were constantly being cut.  (Keep in mind 
that there are many differences between the reverse and forward maps 
in DNS.)

To me that indicated that a check before delegation did have a big 
impact on the rate of lameness.  The impact was bigger than repeated 
testing and nagging.

The reason I was asked (by the "customers" of the registry) to chase 
down lameness was an old resolver that didn't realize a lame 
delegation response was just that, and not a referral.  I.e., the 
resolver didn't recognize that it was being referred back to the root, 
or toward the root.  Instead of stopping, the resolver followed the 
referral.
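A minimal sketch of the distinction the old resolver missed, assuming a simplified, hypothetical response model (the `Response` fields and the `classify` and `is_strictly_below` helpers are illustrations, not any real resolver's code): a referral only makes progress if the NS records in the authority section are owned by a name strictly below the zone the resolver was already referred to.  A non-authoritative reply carrying NS records for the root, a parent, or the zone itself points back up (or nowhere new), so the resolver should stop rather than follow it.

```python
from dataclasses import dataclass
from typing import Optional


def _norm(name: str) -> str:
    """Lower-case a domain name and drop any trailing dot ("" is the root)."""
    return name.lower().rstrip(".")


def is_strictly_below(name: str, ancestor: str) -> bool:
    """True if `name` is a proper subdomain of `ancestor`."""
    n, a = _norm(name), _norm(ancestor)
    if n == a:
        return False
    return a == "" or n.endswith("." + a)


@dataclass
class Response:
    """Hypothetical, simplified view of a DNS response."""
    aa: bool                            # Authoritative Answer flag
    answer_count: int                   # number of records in the answer section
    authority_ns_owner: Optional[str]   # owner of the NS RRset in authority, if any


def classify(zone: str, resp: Response) -> str:
    """Classify a reply from a server the resolver was referred to for `zone`."""
    if resp.aa:
        return "authoritative"
    if resp.answer_count == 0 and resp.authority_ns_owner is not None:
        if is_strictly_below(resp.authority_ns_owner, zone):
            return "downward referral"   # progress toward the answer: follow it
        return "lame referral"           # root, a parent, or the zone itself: stop
    return "other"
```

Under this model, a resolver that stops on "lame referral" and moves on to the zone's next listed server avoids the loop described above.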

What was overlooked at the time was that a non-response did not 
trigger this looping, yet non-responding servers were also a focus of 
the checking.  The problem that launched the work was "true" 
lameness, where a server is not authoritative for a zone it is thought 
to be authoritative for, not broken delegations.
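The difference between "true" lameness and a broken delegation can be written as a per-server check.  This is a sketch under assumed inputs: `rcode` is the reply's response code as a string, or None when nothing came back before a timeout, and `aa` is the Authoritative Answer flag.  Lumping REFUSED, SERVFAIL and non-authoritative answers together as lame is a choice made for this example, not a definition taken from the thread.

```python
from enum import Enum
from typing import Optional


class ServerStatus(Enum):
    OK = "authoritative"
    LAME = "lame"                  # replies, but not authoritatively for the zone
    NO_RESPONSE = "no response"    # nothing came back: a broken delegation


def server_status(rcode: Optional[str], aa: bool) -> ServerStatus:
    """Classify one server's reply to a query for a name in the delegated zone.

    `rcode` is None when the query timed out; otherwise a code such as "NOERROR".
    """
    if rcode is None:
        return ServerStatus.NO_RESPONSE
    if rcode in ("NOERROR", "NXDOMAIN") and aa:
        return ServerStatus.OK
    # Non-authoritative answers, REFUSED, SERVFAIL, etc. all count as lame here.
    return ServerStatus.LAME
```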

By the time I put down my pencil on the problem, the broken resolver 
had largely disappeared from the network and the ill effects of lame 
delegations had faded into the noise below things like spam, botnets 
and other wasted use of transmission energy.  The policy has remained 
on the books but may be removed for other reasons.

>It would be interesting to see some agreement on what checks should
>exactly be done, and then just run these (silently, without any
>shutdowns or informational e-mails or whatever) - looks like there are
>enough people here that are able to do it to a respectful number of

I couldn't even settle on my own set of checks that should be done, 
much less get a consensus.  You can always get agreement on a set of 
checks if you pare back to a few obvious ones, but then the sentiment 
is that you are not aggressive enough and you aren't going to catch 
enough of the broken delegations to make it worth the effort.

E.g., is a zone broken if one of its delegated servers still works? 
If one worked, then any resolver, including the broken ones, 
eventually got there and the looping problem was squelched.  I used 
to label the choices as "save the Internet" and "save the DNS".  The 
former limited reporting to zones that were completely unreachable; 
the latter reported individual server problems.
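The two policies differ only in how per-server results are rolled up into a verdict for the zone.  A minimal sketch, assuming a hypothetical mapping from each delegated server name to whether it answered authoritatively; the function names just reuse the labels above.  A delegation listing no servers at all is reported under both policies.

```python
from typing import Mapping


def save_the_internet(servers: Mapping[str, bool]) -> bool:
    """Report the zone only if no delegated server works (zone unreachable)."""
    return not any(servers.values())


def save_the_dns(servers: Mapping[str, bool]) -> bool:
    """Report the zone if it has no servers or any delegated server is broken."""
    return not servers or not all(servers.values())
```

For a zone where one of two servers answers, the first policy stays quiet and the second reports it, which is exactly where the two camps disagree.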

E.g., in statistics, do you quote percentages by zone, by NS record, 
by name server address, by unreachable zones, by partially reachable 
zones, etc.?
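To illustrate how much the choice of denominator matters, here is a sketch that computes several of these percentages from one set of made-up probe results.  The data layout, the `SAMPLE` data and `lameness_stats` are hypothetical; the zone names use the reserved .example TLD and the addresses come from documentation ranges.

```python
from typing import Dict

# zone -> NS name -> {server address: answered authoritatively?}
Probes = Dict[str, Dict[str, Dict[str, bool]]]

# Made-up results for illustration only.
SAMPLE: Probes = {
    "a.example": {"ns1.a.example": {"192.0.2.1": True},
                  "ns2.a.example": {"192.0.2.2": False}},
    "b.example": {"ns1.b.example": {"192.0.2.3": False, "2001:db8::3": False}},
    "c.example": {"ns1.c.example": {"192.0.2.4": True},
                  "ns2.c.example": {"192.0.2.5": True}},
}


def _ratio(part: int, whole: int) -> float:
    return part / whole if whole else 0.0


def lameness_stats(probes: Probes) -> Dict[str, float]:
    """Fractions of broken items under several denominators.

    An address listed under several NS names or zones is counted once per listing.
    """
    ns_total = ns_bad = addr_total = addr_bad = 0
    unreachable = partial = 0
    for servers in probes.values():
        outcomes = []  # every address outcome seen for this zone
        for addrs in servers.values():
            ns_total += 1
            if not any(addrs.values()):   # NS record with no working address
                ns_bad += 1
            for ok in addrs.values():
                addr_total += 1
                if not ok:
                    addr_bad += 1
                outcomes.append(ok)
        if not any(outcomes):
            unreachable += 1
        elif not all(outcomes):
            partial += 1
    return {
        "lame NS records": _ratio(ns_bad, ns_total),
        "failing server addresses": _ratio(addr_bad, addr_total),
        "unreachable zones": _ratio(unreachable, len(probes)),
        "partially reachable zones": _ratio(partial, len(probes)),
    }
```

For the sample data this gives 40% of NS records lame, 50% of server addresses failing, one zone in three unreachable and another one in three partially reachable: the same measurement can be quoted as quite different "lameness" figures.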

I don't mean to say you can't define the problem, but there are many 
angles to it and probably there's no definition that will reach a 
significant consensus.

Edward Lewis                                                +1-571-434-5468

Think glocally.  Act confused.
