[dns-operations] Cloudflare DNS resolver (1.1.1.1): Weird DNSSEC race condition
michael at brokendns.net
Mon Aug 6 23:07:05 UTC 2018
On 08/06/18 03:44, Shane Kerr wrote:
> I wasn't aware of this feature but it seems pretty cool to me.
> It also should result in smaller packets and less CPU load on the
> authoritative side for insecure zones, so really it seems like it should
> have been the behavior from the beginning. (A large TLD operator
> mentioned this to me like 10 years ago...)
If I am an authoritative server operator, and I want my zone to remain
insecure, then I probably should not be signing it. If I am signing it,
and I am returning RRSIGs in response to clients that set the DO bit,
then I am essentially signaling my intent that it be *possible* for a
downstream client to validate the signatures I have intentionally placed
in my zone, *or* that I will soon be introducing a DS record and want
signatures to appear in caches before putting the DS record in place.
The current behavior, unfortunately, violates the *spirit* of the timing
recommendations in RFC 7583, section 3.3.5. (That section ostensibly
discusses DNSKEYs, but the effect is the same if the RRSIGs are
effectively stripped or ignored by a recursive resolver.)
IMO, a better behavior is the following:
- if there is no DS record or other trust anchor configured *and* the
downstream client does NOT set the DO bit in the query to the recursive
resolver, then the recursive resolver does NOT set the DO bit in the
query to the authoritative server.
- if there is a DS record or configured trust anchor, then the recursive
resolver sets the DO bit in the query to the authoritative server and
attempts to validate the received signature. The RRSIG is passed on to
the downstream client based on the presence of the DO bit in the
downstream client's query.
- if there is no DS record or other trust anchor configured, but the
downstream client *does* set the DO bit in the query to the recursive
resolver, then the recursive resolver SHALL set the DO bit in the query
to the authoritative server and return the RRSIGs to the downstream
client (but will not attempt its own validation).
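To make the three cases concrete, here is a minimal Python sketch of the proposed cache-miss policy. The names `plan_query` and `QueryPlan` are hypothetical and not taken from knot-resolver or any other resolver. The sketch assumes the resolver already knows whether a DS record or configured trust anchor covers the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPlan:
    """What the resolver does for one cache-miss query (hypothetical model)."""
    set_do_upstream: bool  # DO bit in the query to the authoritative server
    validate: bool         # resolver performs its own DNSSEC validation
    return_rrsigs: bool    # RRSIGs included in the answer to the client


def plan_query(has_trust_anchor: bool, client_do: bool) -> QueryPlan:
    """Proposed policy for a name that is not yet cached.

    has_trust_anchor: a DS record or configured trust anchor covers the name.
    client_do: the downstream client set the DO bit in its query.
    """
    if has_trust_anchor:
        # Case 2: always fetch signatures and validate;
        # pass RRSIGs downstream only if the client asked for them.
        return QueryPlan(set_do_upstream=True, validate=True,
                         return_rrsigs=client_do)
    if client_do:
        # Case 3: no chain of trust, but the client wants the signatures,
        # so fetch and forward them without validating locally.
        return QueryPlan(set_do_upstream=True, validate=False,
                         return_rrsigs=True)
    # Case 1: no chain of trust and the client did not ask: skip DNSSEC.
    return QueryPlan(set_do_upstream=False, validate=False,
                     return_rrsigs=False)
```

For example, `plan_query(False, True)` asks the authoritative server for signatures and passes them on without local validation. This covers case 3, which is the one that keeps pre-published signatures visible to a validating resolver that uses this resolver as a forwarder.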
The three above cases all assume that the RR being queried by the
downstream client is not already cached by the recursive resolver.
HOWEVER, that still doesn't fix the following scenario:
- If the RR is first queried by a client that doesn't set the DO bit,
and is then subsequently queried by a client that DOES set the DO bit,
then the recursive resolver will return what's in-cache, i.e. the RR
without the RRSIG. This would continue to break the use case described
above, and it would greatly complicate the timing of the introduction of
the DS record for a busy zone that was moving from insecure to secure. The
only way to fix that is to set a flag on the cached entry that
represents whether the DO bit was set when the recursive resolver
queried the authoritative server and cached the result. If the flag
isn't set in the cache and the DO bit was set on a subsequent query by a
downstream client, then the recursive resolver would have to re-fetch
the RR with the DO bit set.
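The cache flag described above can be sketched in Python as follows. This is a minimal illustration, not any real resolver's cache. `DoAwareCache`, `CacheEntry`, and the `fetch(name, rtype, do)` upstream hook are hypothetical. The sketch also ignores TTLs and trust anchors to keep the re-fetch rule visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CacheEntry:
    records: List[str]
    rrsigs: List[str]
    fetched_with_do: bool  # was DO set on the upstream query that filled this?


# Hypothetical upstream hook: (name, rtype, do_bit) -> (records, rrsigs)
Fetcher = Callable[[str, str, bool], Tuple[List[str], List[str]]]


class DoAwareCache:
    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._entries: Dict[Tuple[str, str], CacheEntry] = {}

    def lookup(self, name: str, rtype: str,
               client_do: bool) -> Tuple[List[str], List[str]]:
        key = (name.lower(), rtype)  # owner names compare case-insensitively
        entry = self._entries.get(key)
        # Fetch on a miss, or when a DO client hits an entry that was
        # cached from an upstream query sent without the DO bit.
        if entry is None or (client_do and not entry.fetched_with_do):
            records, rrsigs = self._fetch(name, rtype, client_do)
            entry = CacheEntry(records, rrsigs, fetched_with_do=client_do)
            self._entries[key] = entry
        # Only hand RRSIGs to clients that set the DO bit.
        return entry.records, (entry.rrsigs if client_do else [])
```

With this rule, a non-DO query followed by a DO query costs one extra upstream fetch. After that, both kinds of client are served from the same DO-flagged entry.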
This is starting to get complicated as we attempt to get rid of the
corner cases, and as it currently stands, this optimization makes it
difficult to time the introduction of new trust anchors and use
knot-resolver (and Cloudflare DNS) as a forwarder for a validating
resolver. (As cool as I think the optimization otherwise is...)