[dns-operations] Cloudflare DNS resolver (1.1.1.1): Weird DNSSEC race condition
Michael Sinatra
michael at brokendns.net
Wed Aug 8 21:15:10 UTC 2018
On 08/06/18 16:48, Paul Hoffman wrote:
> On 6 Aug 2018, at 16:07, Michael Sinatra wrote:
>
>> HOWEVER, that still doesn't fix the following scenario:
>
> This thread should probably be moved to DNSOP because of this very
> important bit:
>
>> - If the RR is first queried by a client that doesn't set the DO bit,
>> and is then subsequently queried by a client that DOES set the DO bit,
>> then the recursive resolver will return what's in-cache, i.e. the RR
>> without the RRSIG. This would continue to break the use-case described
>> here:
>>
>> https://gitlab.labs.nic.cz/knot/knot-resolver/issues/153
>>
>> and it would greatly complicate timing of the introduction of the DS
>> record for a busy zone that was moving from insecure to secure. The
>> only way to fix that is to set a flag on the cached entry that
>> represents whether the DO bit was set when the recursive resolver
>> queried the authoritative server and cached the result. If the flag
>> isn't set in the cache and the DO bit was set on a subsequent query by a
>> downstream client, then the recursive resolver would have to re-fetch
>> the RR with the DO bit set.
>
> And this one:
>
>> This is starting to get complicated as we attempt to get rid of the
>> corner cases, and as it currently stands, this optimization makes it
>> difficult to time the introduction of new trust anchors and use
>> knot-resolver (and Cloudflare DNS) as a forwarder for a validating
>> resolver. (As cool as I think the optimization otherwise is...)
>
> Although the "how" is implementation-dependent, documenting which states
> a validating resolver SHOULD/MUST keep track of is definitely an
> operational practice. And, FWIW, I don't consider what you are
> describing as an "edge case".

I'll raise it in DNSOP--thanks for suggesting, Paul.

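For concreteness, here is a rough sketch of the cache flag described in my quoted text above. It is not how knot-resolver or Cloudflare actually implement their caches; the class and function names (DoAwareCache, CacheEntry, the fetch callback) are made up for illustration, and TTL expiry is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CacheEntry:
    rrset: List[str]        # the answer records
    rrsigs: List[str]       # signatures; empty if fetched without DO
    fetched_with_do: bool   # the proposed flag: was DO set on the upstream query?


Key = Tuple[str, str]
# Hypothetical upstream lookup: (name, rtype, do_bit) -> (rrset, rrsigs)
Fetch = Callable[[str, str, bool], Tuple[List[str], List[str]]]


class DoAwareCache:
    """Toy resolver cache that remembers whether each entry was fetched with DO."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[Key, CacheEntry] = {}

    def resolve(self, name: str, rtype: str, client_do: bool) -> Tuple[List[str], List[str]]:
        key = (name.lower(), rtype)
        entry = self._entries.get(key)
        # Re-fetch on a miss, or when the client wants DNSSEC data but the
        # cached copy was obtained without the DO bit (so it has no RRSIGs).
        if entry is None or (client_do and not entry.fetched_with_do):
            rrset, rrsigs = self._fetch(name, rtype, client_do)
            entry = CacheEntry(rrset, rrsigs, client_do)
            self._entries[key] = entry
        # Only hand signatures to clients that asked for them.
        return (entry.rrset, entry.rrsigs if client_do else [])
```

With this, a non-DO query followed by a DO query costs one extra upstream fetch, after which both kinds of client are served from the same cached entry.
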
Are you aware of any other RFC sections which suggest timing of RRSIG
introduction into the zone? I know it's a significant issue for
algorithm rollovers, but for new zones that are about to move from
insecure to secure, are there good recommendations? I cited RFC 7583,
section 3.3.5, but it only explicitly mentions DNSKEY presence. My rule
of thumb has been that DNSKEYs *and* RRSIGs should appear for at least
1x(Longest-TTL-in-zone), including the negative TTL. But I don't know
if that's been codified anywhere in an RFC or operational practice (or
if that's even the correct rule of thumb), so I often do 2x(longest-TTL)
just in case.
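
To make the arithmetic explicit, the rule of thumb above can be written down as a small calculation. This is only a sketch of my own rule, not something taken from an RFC; the function name earliest_ds_time and the TTL values in the example are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def earliest_ds_time(signed_at: datetime,
                     rrset_ttls: Iterable[int],
                     negative_ttl: int,
                     safety_factor: int = 1) -> datetime:
    """Earliest time to publish the DS under the rule of thumb.

    signed_at     -- when DNSKEYs and RRSIGs first appeared in the zone
    rrset_ttls    -- TTLs (seconds) of the RRsets in the zone
    negative_ttl  -- negative-caching TTL (seconds) for the zone
    safety_factor -- 1 for 1x(longest TTL), 2 for the "just in case" variant
    """
    # The longest TTL includes the negative TTL, since cached negative
    # answers also have to age out before validation starts.
    longest = max(list(rrset_ttls) + [negative_ttl])
    return signed_at + timedelta(seconds=safety_factor * longest)


# Example with made-up values: zone signed at midnight UTC, longest TTL one day.
signed = datetime(2018, 8, 8, 0, 0, tzinfo=timezone.utc)
print(earliest_ds_time(signed, [300, 3600, 86400], 10800))                   # 1x
print(earliest_ds_time(signed, [300, 3600, 86400], 10800, safety_factor=2))  # 2x
```

Note that this only covers the zone operator's side of the timing; it does not account for a resolver that cached an unsigned answer because it queried without the DO bit.
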
To Petr's point, I definitely appreciate the attention you're giving
this, and I do agree that I am assuming correct compliance. I think the
balance I am trying to achieve is, how do we compensate for brokenness
while not punishing those who are trying to do the right thing? As it
stands, even if I am doing the right thing with respect to timing, I
still might break resolution of a zone that I manage for anyone who is
forwarding to Cloudflare or knot-resolver for up to 3 hours, just by
adding the DS record.

I suspect the above trade-off is the precise reason for DNS Flag Day.

michael