[dns-operations] BIND, Knot and NSD behaviour when serial number goes backwards
anandb at ripe.net
Sun Feb 19 11:27:04 UTC 2017
We run a mixture of BIND, Knot and NSD on our name servers, and
sometimes this offers the opportunity to witness their different
behaviours for corner cases. We've had one today. The serial number of a
zone that we slave went backwards, from 2017021712 to 2017021701.
Here's what BIND 9.10 did:
19-Feb-2017 06:33:04.547 general: zone va/IN/main: serial number
(2017021701) received from master 188.8.131.52#53 < ours (2017021712)
19-Feb-2017 07:25:46.495 general: zone va/IN/main: expired
19-Feb-2017 07:25:46.559 general: zone va/IN/main: Transfer started.
19-Feb-2017 07:25:46.572 general: zone va/IN/main: transferred serial
2017021701: TSIG 'main.ripe.net'
Here's what Knot 2.3 did:
2017-02-18T20:59:21 info: [va.] refresh, outgoing, 184.108.40.206 at 53: zone
2017-02-19T00:59:21 info: [va.] refresh, outgoing, 220.127.116.11 at 53: zone
2017-02-19T04:59:21 info: [va.] refresh, outgoing, 18.104.22.168 at 53: zone
2017-02-19T08:59:21 info: [va.] refresh, outgoing, 22.214.171.124 at 53: zone
And here's what NSD 4.1 did:
[2017-02-19 07:44:52.590] nsd: info: xfrd: zone va. ignoring old
serial from 126.96.36.199
[2017-02-19 07:44:52.590] nsd: info: xfrd: zone va. bad transfer 0
[2017-02-19 07:44:55.660] nsd: error: xfrd: zone va. has expired
When BIND sees a lower serial number, it ignores it, and considers that
a failure to refresh, and retries using the retry timer (which is 1 hour
for this zone). Eventually, it expires the zone, and then pragmatically
ignores the zone content and retransfers it, and recovers.
Knot just thinks there's nothing to do, and happily chugs along.
NSD, like BIND, ignores the lower serial number, and keeps trying to
refresh, but with a somewhat more irregular schedule (I think it
deliberately slows down, to avoid DoSing the master in case the retry
timer was too small). Eventually, it gives up and expires the zone, but
does not attempt to retransfer. It starts to SERVFAIL.
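
For anyone wondering why 2017021701 counts as "older" at all: SOA serials
are compared with the serial number arithmetic of RFC 1982, which works
modulo 2^32 rather than as a plain integer comparison. Below is a minimal
Python sketch of that comparison. The function names are my own and are not
taken from BIND, Knot or NSD; I am only assuming that each server applies
something equivalent before deciding whether to transfer.

```python
# Sketch of RFC 1982 serial number arithmetic for 32-bit SOA serials.
# Names (serial_compare, should_transfer) are hypothetical, for illustration.

SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # 2^32
HALF = 1 << (SERIAL_BITS - 1)   # 2^31


def serial_compare(s1, s2):
    """Compare two serials.

    Returns -1 if s1 is older than s2, 1 if s1 is newer, 0 if equal,
    and None when they are exactly 2^31 apart (undefined in RFC 1982).
    """
    if s1 == s2:
        return 0
    diff = (s2 - s1) % MOD      # forward distance from s1 to s2
    if diff == HALF:
        return None
    return -1 if diff < HALF else 1


def should_transfer(local_serial, remote_serial):
    """A slave only wants a transfer if the master's serial is newer."""
    return serial_compare(local_serial, remote_serial) == -1


if __name__ == "__main__":
    ours, theirs = 2017021712, 2017021701
    print(serial_compare(theirs, ours))   # master's serial vs ours
    print(should_transfer(ours, theirs))  # would a slave transfer?
```

With the two serials from this incident, the master's serial compares as
older than ours, so a slave following this rule has no reason to transfer.
The three servers differ only in what they do after reaching that conclusion.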
As an operator, in order to fix this, I have to force a transfer for
both Knot and NSD, like this:
knotc zone-retransfer va.
nsd-control force_transfer va.
BIND's behaviour here is the most pragmatic, because it recovers
automatically. NSD's behaviour is also fine, in my opinion, because this
really is an error condition that requires some intervention. Knot's
behaviour is probably the worst of the three, because it is blissfully
unaware of the problem.
The plusses and minuses of these behaviours can of course be debated,
and I'm sure there would be many opinions. I personally prefer the NSD
behaviour. BIND's is also okay, but it sort of hides the problem (only
visible if you look at logs). Knot's behaviour is probably the worst.
I'll open an issue and see what its developers think.