[dns-operations] Delegation health was Re: Worst current practice example

Mark Andrews marka at isc.org
Tue May 4 13:08:19 UTC 2010


In message <a06240801c805b141c546@[193.0.24.128]>, Edward Lewis writes:
> At 21:00 +1000 5/4/10, Mark Andrews wrote:
> 
> >I don't think this one is notable other than it should be something
> >that should be picked up in regular checks of delegations by parent
> >zone administrators and corrected after consultation with the child
> >zone's administrators.  If they are unreachable or fail to correct
> >it within a reasonable period of time the delegation should be
> >pulled.  This is not a new requirement.
> 
> It may be a requirement in the RFC but it is not practical in operations.
> What makes this impractical?
> 
> 1. What is "Unreachable?" - just because X can't reach Y doesn't mean 
> Z can't reach Y.  (This is the Bill Manning reply.)

By unreachable I was actually referring to the administrators.

> 2. Fail to connect - failure on which end?
> 
> 3. Registries already tread lightly on pulling down delegations 
> involved in illegal activity (with illegal being a local 
> determination), treading into unreliable technical checks is not 
> worth it.

Pulling down is the last resort.  Getting the delegation corrected
is the desired result.  I suspect most registrants would be more
than happy to have someone catch and report their errors to them,
as those errors are not always easily visible unless you are an
external party.

> 4. And then there is volume.  If you demand that the test happen on a 
> daily basis and the TLD has 10 million delegations (there are a 
> handful with that now), that's 115 checks per second using a 24 hour 
> clock.  If 99% are good, that means every second you are launching 
> yet another in depth check into a potentially bad delegation, 60 
> times a minute, 3600 times an hour.

That's just a cost of doing business.  Most of it can be automated,
or don't you require valid contact details?
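
As a rough check of the volume figures in the quoted point 4, here is a
minimal Python sketch.  The function name and its parameters are
illustrative assumptions, and the 1% failure rate is the guess from the
quoted text, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def checks_per_second(delegations, bad_fraction=0.0,
                      period_seconds=SECONDS_PER_DAY):
    """Average rate needed to test every delegation once per period.

    Returns (all checks per second, in-depth follow-ups per second).
    bad_fraction is an assumed share of delegations that fail the
    first check and need a closer look.
    """
    base = delegations / period_seconds
    return base, base * bad_fraction

# Zone sizes taken from the quoted text; the 1% is the quoted guess.
for label, count in (("10 million", 10_000_000), ("80 million", 80_000_000)):
    base, follow_up = checks_per_second(count, bad_fraction=0.01)
    print(f"{label}: {base:.1f} checks/s, {follow_up:.2f} in-depth checks/s")
```

For 10 million delegations this gives about 116 checks per second and
a little over one in-depth check per second.  For the 80 million figure
quoted later in this message it gives about 926 checks per second and
roughly 9 in-depth checks per second at 1% bad.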

As for the number of delegations that are problematic: removing
them provides an incentive for people to actually ensure that they
are initially correct and remain correct.  The current situation is
the direct result of failure to check and correct.

I'd love to see weekly reports about the numbers of broken delegations
per infrastructure zone.  Both raw and as a percentage.
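
A report of that shape could be sketched as below.  This is only an
illustration: the function names, the input layout (zone name mapped to
total and broken counts), and the sample numbers are all assumptions,
and no real measurements are implied.

```python
def broken_delegation_report(zone_counts):
    """Build report rows from {zone: (total, broken)} counts.

    Returns a list of (zone, total, broken, percent) sorted by zone.
    """
    rows = []
    for zone, (total, broken) in sorted(zone_counts.items()):
        # Guard against an empty zone to avoid dividing by zero.
        percent = 100.0 * broken / total if total else 0.0
        rows.append((zone, total, broken, percent))
    return rows

def format_report(rows):
    """Render the rows as fixed-width plain text."""
    lines = [f"{'zone':<14}{'total':>12}{'broken':>10}{'percent':>9}"]
    for zone, total, broken, percent in rows:
        lines.append(f"{zone:<14}{total:>12}{broken:>10}{percent:>8.2f}%")
    return "\n".join(lines)

# Made-up numbers purely for illustration.
sample = {"example-a.": (1000, 12), "example-b.": (250000, 500)}
print(format_report(broken_delegation_report(sample)))
```

Producing the broken counts is the expensive part; the reporting itself
is cheap once those counts exist.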

> (To pump this a bit, COM I would guess would have a significant 
> problem with this.  The current floated population is 80 million.  At 
> 1% bad, that would be 8 per second.  With all of the work the COM 
> engineers do now, do you think they could add on such a workload? 
> Granted, the 1% guess is just a number plucked from air.  The way 
> COM is monitored, I bet they have some idea of the real number.)
>
> ...All this to figure out why people can't get to a delegation that 
> seems to otherwise be a delegation no one needs to see.  (The old "if 
> a tree falls in a forest and no one hears it, did it make a sound" 
> question.)  It's not like a browser user complained that they 
> couldn't get to a site.

The problem is that people do hear the tree falling or stumble across it.

Yes, "I can't look up <foo>" is a pretty regular sort of message on
bind-users.  Most of the time it ends up being a delegation problem.
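
To make "delegation problem" concrete, the sketch below classifies a
delegation from data that has already been collected.  It is an
assumption-laden illustration, not a description of any existing tool:
the function names, the three problem labels, and the idea of passing
in the NS sets as plain lists are all hypothetical, and no DNS queries
are made here.

```python
def normalize(name):
    """Lower-case a host name and make sure it ends with a dot."""
    name = name.strip().lower()
    return name if name.endswith(".") else name + "."

def classify_delegation(parent_ns, child_ns, authoritative):
    """Compare the NS set at the parent with what the child reports.

    parent_ns:      NS names listed in the parent zone
    child_ns:       NS names listed at the child zone apex
    authoritative:  servers that actually answered authoritatively
    Returns a list of (label, names); an empty list means no problem
    was found by these three checks.
    """
    parent = {normalize(n) for n in parent_ns}
    child = {normalize(n) for n in child_ns}
    answering = {normalize(n) for n in authoritative}

    problems = []
    lame = sorted(parent - answering)      # delegated to, but not serving
    if lame:
        problems.append(("lame", lame))
    parent_only = sorted(parent - child)   # parent lists it, child does not
    if parent_only:
        problems.append(("parent-only", parent_only))
    child_only = sorted(child - parent)    # child lists it, parent does not
    if child_only:
        problems.append(("child-only", child_only))
    return problems

print(classify_delegation(
    ["NS1.example.net", "ns2.example.net."],
    ["ns1.example.net.", "ns3.example.net."],
    ["ns1.example.net."],
))
```

Gathering the three input lists is where the real work lies, and that
step needs queries against both the parent and the child servers.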

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org


