[dns-operations] DNS zone monitoring

Joe Greco jgreco at ns.sol.net
Mon Jun 14 04:29:22 UTC 2010


> On 2010-06-13, at 22:56, Joe Greco wrote:
> 
> > I was just in a discussion elsewhere that brought up an old topic:
> > 
> > How do people monitor for secondary servers that are having trouble
> > updating a zone from the master?
> 
> We direct an apex/IN/SOA query to all servers for each zone we are 
> checking, and if we see inconsistent serial numbers we sound alarms.

Yes, but that's only useful if your SOA serials are changing.  For many
zones, there's no need for the serials to change.  Besides, I already
indicated we did that.  :-)
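
For reference, the serial comparison under discussion boils down to
something like the following Python sketch.  It assumes the serials
have already been collected (for example by an SOA query against each
server); the function names and server names are made up for
illustration and are not anybody's actual tooling.

```python
from collections import defaultdict


def serials_consistent(serials):
    """True if every server reported the same SOA serial."""
    return len(set(serials.values())) <= 1


def group_by_serial(serials):
    """Map each observed serial to the list of servers reporting it."""
    groups = defaultdict(list)
    for server, serial in serials.items():
        groups[serial].append(server)
    return dict(groups)


# Hypothetical observations: one serial per server for a single zone.
observed = {
    "master.example.net": 2010061301,
    "ns1.example.net": 2010061301,
    "ns2.example.net": 2010061203,
}

if not serials_consistent(observed):
    print("serial mismatch:", group_by_serial(observed))
```

A check like this says nothing when the zone never changes, which is
exactly the gap I'm asking about.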

> > Obviously, we do all the normal sanity checks (SOA's match, etc) but
> > other than monitoring the log file and watching for errors such as
> 
> If SOA serials match then no zone transfers will happen and you have 
> no errors to look for.

False; I'm *specifically* interested in a situation where that doesn't
hold: a secondary which has become unable to reach its master (think:
because of a firewall rule, etc., that was inadvertently and sloppily
applied).  In such a case, the secondary
will wait for possibly as long as an entire $refresh period before
initiating a check, but once that happens and it cannot reach the
master, you only get $expire more seconds.  This means that your zone
lives for something between $expire and ($refresh + $expire) before
going all kablooey, and in the meantime serving either outdated or
current information, depending on whether or not the master's data
has been twiddled.
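
To put numbers on that window, here is a minimal Python sketch.  It
simply encodes the bounds as described above; the function name and
the sample SOA values are made up for illustration.  A server that
counts $expire from its last successful refresh, rather than from the
first failed one, would give up earlier than this.

```python
def expiry_window(refresh, expire):
    """Return (earliest, latest) seconds until the zone expires on a
    secondary, counted from the moment the master becomes unreachable.

    Follows the bounds stated in the text: the next refresh check may
    be due immediately or up to `refresh` seconds away, and the zone
    survives `expire` seconds beyond that check.
    """
    return expire, refresh + expire


# Hypothetical SOA timers: refresh of 1 hour, expire of 7 days.
earliest, latest = expiry_window(refresh=3600, expire=604800)
print(earliest, latest)
```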

In the case the master's data is updated, that failure is detectable,
given reasonable polling and SOA values, but what about the case where
the zone isn't changing?  It seems to be working... seems to be
working... then falls off a cliff when it expires and is suddenly not
working.  That's a train wreck if your nameservers all go in short
order and stop serving a zone.

Relying on monitoring the logfiles for such failures seems like a kludge,
but I'm not seeing the alternative.  In particular, I cannot think of
any way in which the remaining expiration period for a zone is usefully
exposed, say, something along the lines of a TTL.

> If you want to test the transfer machinery, then arrange for the SOA 
> serial number to be increased regularly and look for signs that the 
> updated zones are not being served where they should be. Depending on 
> the signature validity periods and re-signing intervals you choose, 
> simply signing your zones might be enough to provide SOA serial 
> increases sufficient to do useful monitoring.

That strikes me as ugly and dangerous in some ways (though possibly
offset as an improvement in others); it's akin to writing all your
web pages in PHP even though most of them could be served up with a
static page.  You introduce other possible failures into the mix.
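
For what it's worth, the check that suggestion implies would look
roughly like the Python sketch below: bump the master's serial on a
schedule, then flag any secondary that is still behind once a grace
period has passed.  The function names, the grace period, and the
server names are all assumptions for illustration; serials are
compared with the RFC 1982 rule so that wraparound is handled.

```python
def serial_newer(a, b):
    """True if 32-bit serial a is newer than b (RFC 1982 comparison)."""
    return a != b and ((a - b) % 2**32) < 2**31


def stale_secondaries(master_serial, bumped_at, secondary_serials, now, grace):
    """List secondaries still behind the master after the grace period.

    bumped_at and now are timestamps in seconds; grace is how long a
    secondary is allowed to take to pick up the new serial.
    """
    if now - bumped_at < grace:
        return []  # too early to judge
    return sorted(
        server
        for server, serial in secondary_serials.items()
        if serial_newer(master_serial, serial)
    )


# Hypothetical state: the master was bumped to 101 at t=1000.
seen = {"ns1.example.net": 101, "ns2.example.net": 100}
print(stale_secondaries(101, 1000, seen, now=2000, grace=600))
```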

I thought maybe I was missing something, but it seems to me (and I've
spent a little time looking now) that there's no real method for this
and nobody's really doing it.  I guess I should just not be so paranoid
and let this bug me for another ten years...?  Heh.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
