[dns-operations] Full-service resolver - Pending Upstream Query behaviour

Frederico A C Neves fneves at registro.br
Tue Oct 5 16:01:19 UTC 2021

Yesterday's incident with the FB properties demonstrated that some
ISP/Enterprise Full-service resolver farms are better prepared than
others to handle such a situation without impacting resolution
services for all the other domains.

We do have documents that explain BCPs for dealing with negative
answers and for selecting secondaries for your domains, but none that
I know of on how to handle what I'm describing as the Perfect Storm of
Pending Upstream Queries.

I expect that on the software architecture side this is related to
how to join outstanding upstream queries and how to cache a total
timeout of the auth-servers. On the operational side, I know that
some operators opted, as a hack of desperation, to configure auth
zones to deal with the situation.
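
To make the two architectural ideas concrete, here is a minimal sketch
in Python, not taken from any existing resolver. It joins identical
outstanding upstream queries so that only one is in flight per
(name, type), and it caches a total timeout for a short period so that
later clients fail fast instead of piling up. The names
PendingQueryTable, UpstreamTimeout, resolve_upstream and failure_ttl
are all hypothetical, and the 5 second default is an arbitrary
illustrative value, not a recommendation.

```python
import asyncio
import time


class UpstreamTimeout(Exception):
    """Hypothetical error: every auth-server for the name timed out."""


class PendingQueryTable:
    """Join identical in-flight upstream queries and cache total timeouts."""

    def __init__(self, resolve_upstream, failure_ttl=5.0, clock=time.monotonic):
        # resolve_upstream is an assumed coroutine: (qname, qtype) -> answer,
        # raising UpstreamTimeout when no auth-server answered.
        self._resolve_upstream = resolve_upstream
        self._failure_ttl = failure_ttl  # seconds; illustrative value only
        self._clock = clock
        self._inflight = {}      # key -> Future shared by all waiting clients
        self._failed_until = {}  # key -> time until which we fail locally

    async def resolve(self, qname, qtype):
        key = (qname.lower(), qtype)

        # 1. A total timeout is cached: fail fast, send nothing upstream.
        if self._failed_until.get(key, 0.0) > self._clock():
            raise UpstreamTimeout(key)

        # 2. The same query is already outstanding: join it. shield() keeps
        #    one cancelled client from cancelling the shared future.
        pending = self._inflight.get(key)
        if pending is not None:
            return await asyncio.shield(pending)

        # 3. First client for this key: perform the single upstream query.
        future = asyncio.get_running_loop().create_future()
        self._inflight[key] = future
        try:
            answer = await self._resolve_upstream(qname, qtype)
            future.set_result(answer)
            return answer
        except Exception as exc:
            if isinstance(exc, UpstreamTimeout):
                self._failed_until[key] = self._clock() + self._failure_ttl
            future.set_exception(exc)
            future.exception()  # mark as retrieved even if nobody joined
            raise
        finally:
            del self._inflight[key]
            if not future.done():
                future.cancel()  # leader was cancelled; release the waiters
```

With a table like this, a burst of clients asking for the same
unreachable name costs one upstream attempt per failure_ttl window
rather than one per client. A real resolver would also need to bound
the table size and decide what to answer (SERVFAIL, stale data, ...)
while the failure is cached; the sketch leaves those choices open.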

Anyway, I think that even though the incident was not DNS-related,
"we", as the DNS community, could probably do better in future events.

I would like to start a discussion, or to hear from implementers and
operators of Full-service resolvers, on what would be the best
software architecture or best current configuration practice to handle
the traffic pattern that arises when a very popular name enters a
scenario where all the auth-servers are timing out or are network
unreachable.

