[dns-operations] smart failover: Lua record experiments
x2t at foxtrot.emu.st
Tue Nov 1 22:41:06 UTC 2016
> > What happens if Amazon adopts this, and I stick a while true loop in
> > my zone? Does their entire infrastructure go down?
I guess this is always the challenge when taking the (big) step to a procedural
approach. Presumably declarative approaches were discarded as insufficiently
> This is a very good point, and we've spent a few hours pondering this (we =
> me, plus #powerdns IRC channel). Thank you for bringing it up.
The other thing to consider is that these runnable scripts may be expressed in
the zone but they might end up being evaluated in an independent system which
merely makes the results available to the DNS server in some
implementation-specific way (e.g. as a unique memcache key that matches the
qname/class/type or as a structured URL such as localhost://Qname/Class/Type).
This way all of a zone's runnable scripts could be containerized and resource
constrained as needed. With a container-per-zone or container-per-customer
implementation it's also conceivable to have container state so that repeated
external queries don't necessarily re-evaluate the script every time. (It's not
clear to me that the evaluation rate should always match the TTL).
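
To make the split between evaluation and serving concrete, here is a minimal
sketch in Python rather than Lua. Everything in it is an assumption for
illustration: ResultStore is a hypothetical in-process stand-in for memcache,
and max_age is an invented knob showing that result freshness can be chosen
independently of the record's TTL. The evaluator publishes results on its own
schedule; the DNS side only does a keyed lookup.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Key mirrors the qname/class/type triple mentioned above.
Key = Tuple[str, str, str]


class ResultStore:
    """Hypothetical stand-in for memcache: holds the latest script results."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._results: Dict[Key, Tuple[str, float]] = {}

    def publish(self, qname: str, qclass: str, qtype: str, rdata: str) -> None:
        """Called by the external evaluator whenever it re-runs a script."""
        key = (qname.lower(), qclass, qtype)
        self._results[key] = (rdata, self._clock())

    def lookup(self, qname: str, qclass: str, qtype: str,
               max_age: float) -> Optional[str]:
        """Called by the DNS server; never runs a script itself."""
        entry = self._results.get((qname.lower(), qclass, qtype))
        if entry is None:
            return None
        rdata, published_at = entry
        if self._clock() - published_at > max_age:
            return None  # result is stale; caller decides what to serve
        return rdata
```

With this shape a runaway script can only starve its own evaluator, since the
query path is a lookup and nothing more.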
But if an implementation goes this far, it raises the question of whether a
better model is to just offer a standardized Lua container that exposes
well-defined URLs, then limit the DNS evaluation side to querying
One challenge will be drawing the line on which external resources these scripts
can access. Is SNMP access reasonable? How about ICMP? These are common in
Nagios-like systems which are doing similar things.
Defining the restriction on script output is another challenge. As described, the
Class and Type are pre-defined, so a script is constrained to returning a
single type-matching piece of Rdata. For example, an IN/A script cannot return a
CNAME or multiple RRs. But that might be a pretty common requirement if a
last-resort answer intends to offload to a third-party hosting service.
Another way of doing this might be to associate a default value with each script
which is used in the event of a script error or timeout or null return. But even
then the default might want to be a CNAME or multiple RRs.
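
A rough sketch of that default-value idea, again in Python and under stated
assumptions: answer() and the (type, rdata) tuple shape are hypothetical, and a
timeout is modelled simply as the script runner raising an exception. The
script itself is still held to a single Rdata of the pre-defined type, while
the default is a list so that it can be a CNAME or several RRs.

```python
from typing import Callable, List, Optional, Tuple

# One resource record as (type, rdata); a hypothetical shape for illustration.
RR = Tuple[str, str]


def answer(script: Callable[[], Optional[str]], qtype: str,
           default: List[RR]) -> List[RR]:
    """Return the script's single Rdata, or the default RRset on failure."""
    try:
        rdata = script()
    except Exception:
        # Script error, or a timeout surfaced as an exception by the runner.
        return list(default)
    if not rdata:
        # Null or empty return also falls back to the default.
        return list(default)
    # The script may only supply Rdata for the pre-defined type.
    return [(qtype, rdata)]
```

For example, answer(check, "A", [("CNAME", "fallback.example.net.")]) serves the
script's address while check succeeds and the CNAME otherwise, where check and
the fallback name are placeholders.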