[dns-operations] BIND, Knot and NSD behaviour on zone expiry
pk at DENIC.DE
Tue Feb 11 10:24:11 UTC 2014
On Mon, Feb 10, 2014 at 11:52:11PM +0100, Anand Buddhdev wrote:
> The zone's operator had accidentally set its serial in the future, and
> then set it back, not realising that they should have performed a serial
This is the core of the problem. There might be more than one appropriate
response to a protocol violation.
> Regardless of the recovery method, I'm more interested in opinion about
> zone expiry. All the servers were able to query the master for the SOA
> record, as well as transfer from it. However, after seeing an older
> serial for an extended period, both BIND and NSD expired the zone,
> presumably because they couldn't synchronise the zone with the master.
> Knot seems to think that it's okay to serve the zone as long as it can
> query the master, even if the master's serial number is different.
> Is Knot's behaviour acceptable?
I see no reason to single out a particular implementation. In fact, the more
diversity we see, the more interesting parts of the DNS specification we get into.
Zone expiry hasn't been fully specified in RFC 1034/1035 (remember the
SERVFAIL vs REFUSED discussion). RFC 1034, section 4.3.5, says:
To detect changes, secondaries just check the SERIAL field of the SOA
for the zone. In addition to whatever other changes are made, the
SERIAL field in the SOA of the zone is always advanced whenever any
change is made to the zone. The advancing can be a simple increment, or
could be based on the write date and time of the master file, etc. The
purpose is to make it possible to determine which of two copies of a
zone is more recent by comparing serial numbers. Serial number advances
and comparisons use sequence space arithmetic, so there is a theoretic
limit on how fast a zone can be updated, basically that old copies must
die out before the serial number covers half of its 32 bit range. In
practice, the only concern is that the compare operation deals properly
with comparisons around the boundary between the most positive and most
negative 32 bit numbers.
This was later refined with RFC 1982.
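As an illustration of what "sequence space arithmetic" means in practice, here is a minimal sketch of the RFC 1982 comparison for 32 bit serials. The function name `serial_compare` and the sample serials are mine, chosen for illustration; they are not taken from any server implementation.

```python
from typing import Optional

MOD = 1 << 32    # size of the serial number space (2^32)
HALF = 1 << 31   # half of that space (2^31)


def serial_compare(s1: int, s2: int) -> Optional[int]:
    """Compare two SOA serials in RFC 1982 sequence space.

    Returns -1 if s1 is older than s2, 0 if they are equal,
    1 if s1 is newer than s2, and None if the two serials are
    exactly 2^31 apart, where RFC 1982 leaves the result undefined.
    """
    s1 %= MOD
    s2 %= MOD
    if s1 == s2:
        return 0
    # Distance from s1 forward to s2, wrapping at 2^32.
    forward = (s2 - s1) % MOD
    if forward == HALF:
        return None
    return -1 if forward < HALF else 1


if __name__ == "__main__":
    print(serial_compare(2014021001, 2014021101))  # plain increase
    print(serial_compare(4294967295, 0))           # increase across the wrap
    print(serial_compare(0, 2147483648))           # undefined case
```

Note that the comparison has four outcomes, not three: besides "older", "equal" and "newer" there is the undefined case at a distance of exactly 2^31.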
The periodic polling of the secondary servers is controlled by
parameters in the SOA RR for the zone, which set the minimum acceptable
polling intervals. The parameters are called REFRESH, RETRY, and
EXPIRE. Whenever a new zone is loaded in a secondary, the secondary
waits REFRESH seconds before checking with the primary for a new serial.
If this check cannot be completed, new checks are started every RETRY
seconds. The check is a simple query to the primary for the SOA RR of
the zone. If the serial field in the secondary's zone copy is equal to
the serial returned by the primary, then no changes have occurred, and
Note it says "is equal to" and not "is equal to or lower than", even though the
text in the previous paragraph suggests that one half of the 2^32 space is "older"
and also is explicit about the purpose: "... make it possible to determine which
of two copies of a zone is more recent".
the REFRESH interval wait is restarted. If the secondary finds it
impossible to perform a serial check for the EXPIRE interval, it must
assume that its copy of the zone is obsolete an discard it.
It can be argued that it's _not_ impossible to perform the check, just that
the check found no increase (and no equality, either). There's an edge case
for serial + 2^31, though.
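To make the two readings concrete, here is a rough sketch of how a secondary might classify the outcome of a refresh check and decide whether that check counts as "performed" for the purposes of the EXPIRE interval. The names `Check`, `classify` and `check_counts_as_performed`, and the `lenient` switch, are hypothetical labels for the two interpretations discussed here. This is not the code or the configuration of BIND, NSD or Knot.

```python
from enum import Enum
from typing import Optional

MOD = 1 << 32
HALF = 1 << 31


class Check(Enum):
    UNREACHABLE = "no SOA answer from the primary"
    EQUAL = "serials are equal"
    PRIMARY_NEWER = "primary is ahead, a transfer is needed"
    PRIMARY_OLDER = "primary is behind the secondary's copy"
    UNDEFINED = "serials are exactly 2^31 apart"


def classify(local: int, primary: Optional[int]) -> Check:
    """Classify one refresh check; `primary` is None if the query failed."""
    if primary is None:
        return Check.UNREACHABLE
    forward = (primary - local) % MOD
    if forward == 0:
        return Check.EQUAL
    if forward == HALF:
        return Check.UNDEFINED
    return Check.PRIMARY_NEWER if forward < HALF else Check.PRIMARY_OLDER


def check_counts_as_performed(result: Check, lenient: bool) -> bool:
    """Decide whether a check restarts the EXPIRE countdown.

    lenient=True:  any answered SOA query counts as a performed check.
    lenient=False: only a check that leaves the copy in sync counts
                   (for PRIMARY_NEWER this assumes the transfer succeeds).
    """
    if result is Check.UNREACHABLE:
        return False
    if lenient:
        return True
    return result in (Check.EQUAL, Check.PRIMARY_NEWER)


if __name__ == "__main__":
    # Hypothetical serials: the primary was set back below the secondary's copy.
    outcome = classify(local=2014121101, primary=2014021101)
    print(outcome.name, check_counts_as_performed(outcome, lenient=True))
    print(outcome.name, check_counts_as_performed(outcome, lenient=False))
```

Under the lenient reading the zone never expires as long as the primary answers. Under the strict reading it expires once the EXPIRE interval has passed without a check that ends in sync.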
RFC 1035, for completeness, though it does not contribute much, reads:
SERIAL The unsigned 32 bit version number of the original copy
of the zone. Zone transfers preserve this value. This
value wraps and should be compared using sequence space arithmetic.
Secondary servers use the serial number in the SOA record of the zone
to determine when it is necessary to update their local copy of the
zone. Serial numbers are basically just 32 bit unsigned integers
that wrap around from the biggest possible value to zero again. See
[RFC1982] for a more rigorous definition of the serial number.
Occasionally due to editing errors, or other factors, it may be
necessary to cause a serial number to become smaller. Never simply
decrease the serial number. Secondary servers will ignore that
change, and further, will ignore any later increments until the
earlier large value is exceeded.
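The effect described in that passage can be reproduced with a few lines of arithmetic. The sketch below uses made-up serials and a hypothetical helper `primary_is_newer`. It also shows the two-step rollover that is commonly used to get a secondary back in line: first advance the serial by the largest increment RFC 1982 allows (2^31 - 1), then set the intended value. Treat it as an illustration of the arithmetic, not as an operational procedure.

```python
MOD = 1 << 32
HALF = 1 << 31


def primary_is_newer(local: int, primary: int) -> bool:
    """True if `primary` is ahead of `local` in RFC 1982 sequence space."""
    return 0 < (primary - local) % MOD < HALF


# Hypothetical serials: the secondary still holds the mistaken "future" value.
secondary = 2014121101
lowered = 2014021101  # the primary was simply set back

# The lowered serial and the next few increments are all ignored.
ignored = [s for s in range(lowered, lowered + 5)
           if not primary_is_newer(secondary, s)]

# Step 1: jump ahead by the maximum increment so the secondary follows.
intermediate = (secondary + HALF - 1) % MOD
# Step 2: once every secondary has picked that up, set the intended serial.
target = lowered + 1

if __name__ == "__main__":
    print(len(ignored), "lowered serials ignored")
    print("secondary follows step 1:", primary_is_newer(secondary, intermediate))
    print("secondary follows step 2:", primary_is_newer(intermediate, target))
```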
While this is descriptive rather than normative text, it can be argued that this
behaviour was expected and intended. In fact, had "change" been the desired
indicator, the whole 'sequence space arithmetic' would have been useless. Also,
it would have led to a swing state under certain circumstances.
RFC 2136, 3.4.2.2., is the only text that explicitly mentions the "lower"
relation, but doesn't help here.
So, it can be argued that expiring the zone because the secondary's SOA serial is
higher than the one at the respective master is already a step too far. To that
extent, Knot's behaviour is protocol conformant and also in line with behaviour
warned about as early as RFC 1034 and RFC 2181. It doesn't play nice with DNSSEC, though.
> In my opinion, BIND has done the pragmatic thing here and recovered by
I agree with that, too. However, it only worked because this server continued
the SOA checks after the expire (and remember, this proves that the checks
aren't "impossible", so there was no reason to expire in the first place).
NSD, as per your observation, not only discards the contents of the zone but
apparently does not continue the SOA checks. Makes sense in those situations
where the master has gone. Especially when you have a server with a large number
of zones sourced from the same unreachable master, the housekeeping overhead can
Therefore, the summary response is "it depends". If I had a wish, I'd ask that Knot
not simply be adjusted to "what BIND does", because we seem to have a difference
in interpretation of the spec and there is a need to fix that. The all too often
abused 'robustness principle' isn't enough to settle this: it isn't the server's
fault to begin with. Which means this is work for somewhere in the IETF.