[dns-operations] Intermittent failure on slave zone

Phil Regnauld regnauld at nsrc.org
Tue Mar 1 13:43:12 UTC 2022

Hi Kristian,

Comments inline. This may be a better topic for the bind-users list, but
let's see.

Kristian Vilmann (kristian.vilmann) writes:
> The setup:
> Master server (Hidden,internal zones)
>      |
>      |
> Secondary (recursor, cache, Internal zones)
>      |
>      |
> Cache
>      |
>      |
> Internet
> Only the secondary is known by the servers.

    Ok - personally I would have left the "secondary" as a pure resolver,
    and have some forward/stub zones pointing to the hidden master (which
    allows you to substitute BIND with something else for the recursive role).

> Config on secondary:


> Logging is configured:

    That looks reasonable, 'default' should catch everything not
    explicitly defined (unless it's off by default, like queries).

> Most of the time it works but once or twice during the day suddenly a query
> fails for a while. Maybe 15 seconds - maybe a minute. I'm not sure how long
> time it takes before it works again. It could be a query for
> influx.int.myzone.eu - an internal host all the servers use all the time.

    Have you been able to attach to the process with strace when it
    happens?

> We have extensive logging on applications that rely on DNS, so errors are
> visible almost immediately. But even if I'm actively monitoring the errors,
> I cannot reproduce the error with dig on the commandline - which makes
> sense, since queries again are getting the correct response after a very
> short while.

    Ok, so you probably haven't had the time to strace...
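
    Since dig can't catch it, one option is to leave a small probe running
    that resolves the name through the system resolver (the same path the
    applications take, search list included, which dig skips by default)
    and timestamps every failure, so you know when to look at strace or
    the logs. A minimal sketch in Python; the function names, the
    one-second interval and the log format are my own choices, not
    anything BIND provides:

```python
import socket
import time
from datetime import datetime, timezone


def probe(name, resolve=socket.getaddrinfo, now=time.monotonic):
    """Resolve `name` once and return (ok, elapsed_seconds, detail).

    `resolve` defaults to the system resolver; it is a parameter only so
    the function can be exercised without touching the network.
    """
    start = now()
    try:
        answers = resolve(name, None)
        # Each answer is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] holds the address as a string.
        detail = sorted({answer[4][0] for answer in answers})
        ok = True
    except socket.gaierror as exc:
        detail = str(exc)
        ok = False
    return ok, now() - start, detail


def watch(name, interval=1.0, rounds=None, resolve=socket.getaddrinfo,
          sleep=time.sleep, log=print):
    """Probe `name` every `interval` seconds and log each failure.

    Runs forever when `rounds` is None. Returns the number of failures.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        ok, elapsed, detail = probe(name, resolve)
        if not ok:
            failures += 1
            stamp = datetime.now(timezone.utc).isoformat()
            log(f"{stamp} FAIL {name} after {elapsed:.3f}s: {detail}")
        done += 1
        sleep(interval)
    return failures
```

    Calling watch("influx.int.myzone.eu") on one of the affected servers
    would run until interrupted; pass rounds=N for a bounded run. The
    timestamps can then be lined up with the named logs.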

> Often I see subsequent queries for influx.int.myzone.eu.myzone.eu. That
> makes sense, but I cannot figure out why it fails in the first place. I see
> nothing in the logs. It happens also when the secondary server is almost
> idle, so I doubt it has anything to do with load.

    Are you seeing actual queries in the log for "myzone.eu.myzone.eu" ?
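
    If the clients have something like "search myzone.eu" in resolv.conf
    (an assumption on my part, I haven't seen their config), the doubled
    name is what the stub resolver tries after the first lookup fails, so
    it would be a symptom rather than the cause. A rough Python sketch of
    the glibc-style ordering, with a made-up helper name and ndots at its
    default of 1:

```python
def candidates(name, search, ndots=1):
    """Return the names a glibc-style stub resolver would try, in order.

    A trailing dot marks the name as absolute, so no search list applies.
    Otherwise a name with at least `ndots` dots is tried as-is first and
    the search domains afterwards; a name with fewer dots gets the search
    domains first and is tried as-is last.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + searched
    return searched + [absolute]


# Hypothetical client configuration: "search myzone.eu".
print(candidates("influx.int.myzone.eu", ["myzone.eu"]))
```

    With that search list the second candidate is
    influx.int.myzone.eu.myzone.eu., which only shows up once the plain
    name has already failed.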

> As far as I can see, requests to the internal zones are not cached. It makes
> sense since the secondary server has the zone in memory already.

    Correct, the resolver module won't be active for authoritative zones.
    It's not recommended to mix auth and recursive service on the same system
    (although for an internal setup it makes sense, even if I'd put those zones
    behind a stub/forward statement instead).

> Is there an error log I haven't discovered yet? Any pointers are much
> appreciated.

    You could try explicitly logging some of the categories listed at
    https://www.zytrax.com/books/dns/ch7/logging.html, but 'default' should
    catch them, as mentioned.

    I'm trying to remember if there could be an issue with BIND trying to fetch
    the NS records from the parent externally, but I don't see why that would be
    the case -- I'm assuming 'myzone.eu' isn't the real zone name?

