[dns-operations] Cloudflare DNS resolver (1.1.1.1): Weird DNSSEC race condition

Michael Sinatra michael at brokendns.net
Wed Aug 8 21:15:10 UTC 2018


On 08/06/18 16:48, Paul Hoffman wrote:
> On 6 Aug 2018, at 16:07, Michael Sinatra wrote:
> 
>> HOWEVER, that still doesn't fix the following scenario:
> 
> This thread should probably be moved to DNSOP because of this very
> important bit:
> 
>> - If the RR is first queried by a client that doesn't set the DO bit,
>> and is then subsequently queried by a client that DOES set the DO bit,
>> then the recursive resolver will return what's in-cache, i.e. the RR
>> without the RRSIG.  This would continue to break the use-case described
>> here:
>>
>> https://gitlab.labs.nic.cz/knot/knot-resolver/issues/153
>>
>> and it would greatly complicate timing of the introduction of the DS
>> record for a busy zone that was moving from insecure to secure.  The
>> only way to fix that is to set a flag on the cached entry that
>> represents whether the DO bit was set when the recursive resolver
>> queried the authoritative server and cached the result.  If the flag
>> isn't set in the cache and the DO bit was set on a subsequent query by a
>> downstream client, then the recursive resolver would have to re-fetch
>> the RR with the DO bit set.
> 
> And this one:
> 
>> This is starting to get complicated as we attempt to get rid of the
>> corner cases, and as it currently stands, this optimization makes it
>> difficult to time the introduction of new trust anchors and use
>> knot-resolver (and Cloudflare DNS) as a forwarder for a validating
>> resolver.  (As cool as I think the optimization otherwise is...)
> 
> Although the "how" is implementation-dependent, documenting which states
> a validating resolver SHOULD/MUST keep track of is definitely an
> operational practice. And, FWIW, I don't consider what you are
> describing as an "edge case".

I'll raise it in DNSOP--thanks for suggesting, Paul.
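
For anyone following along, the cache-flag idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how knot-resolver or Cloudflare actually implement their caches: `DoAwareCache`, `CacheEntry`, and the `fetch` callback (which takes a name, a type, and the DO bit, and returns records, signatures, and a TTL) are all hypothetical names made up for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class CacheEntry:
    rrset: list            # cached records
    rrsigs: list           # cached signatures (empty if fetched without DO)
    fetched_with_do: bool  # the flag: was DO set on the upstream query?
    expires_at: float      # clock value at which the entry expires


class DoAwareCache:
    """Toy cache that remembers whether each entry was fetched with DO set."""

    def __init__(self, fetch, clock=time.monotonic):
        # fetch(name, rrtype, do_bit) -> (rrset, rrsigs, ttl) is a
        # hypothetical stand-in for the query to the authoritative server.
        self._fetch = fetch
        self._clock = clock
        self._entries = {}

    def resolve(self, name, rrtype, do_bit):
        key = (name.lower(), rrtype)
        now = self._clock()
        entry = self._entries.get(key)
        expired = entry is None or entry.expires_at <= now
        # A DO=1 client must not be answered from an entry cached via DO=0.
        needs_refetch = entry is not None and do_bit and not entry.fetched_with_do
        if expired or needs_refetch:
            rrset, rrsigs, ttl = self._fetch(name, rrtype, do_bit)
            entry = CacheEntry(rrset, rrsigs, do_bit, now + ttl)
            self._entries[key] = entry
        # Clients that did not set DO never receive signatures.
        return entry.rrset, (entry.rrsigs if do_bit else [])
```

With this shape, a DO=0 query followed by a DO=1 query costs one extra upstream fetch, while a DO=1 query followed by a DO=0 query is served entirely from cache.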

Are you aware of any other RFC sections which suggest timing of RRSIG
introduction into the zone?  I know it's a significant issue for
algorithm rollovers, but for new zones that are about to move from
insecure to secure, are there good recommendations?  I cited RFC 7583,
section 3.3.5, but it only explicitly mentions DNSKEY presence.  My rule
of thumb has been that DNSKEYs *and* RRSIGs should appear for at least
1x(Longest-TTL-in-zone), including the negative TTL.  But I don't know
if that's been codified anywhere in an RFC or operational practice (or
if that's even the correct rule of thumb), so I often do 2x(longest-TTL)
just in case.
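
As a worked example of that rule of thumb (a minimal sketch, not anything codified in an RFC): the function name `ds_publish_delay`, its parameters, and the sample TTL values below are all assumptions chosen for illustration. The caller is assumed to supply the zone's record TTLs and its negative-caching TTL in seconds.

```python
def ds_publish_delay(record_ttls, negative_ttl, safety_factor=1):
    """Seconds to keep DNSKEYs and RRSIGs published before adding the DS.

    record_ttls   -- iterable of TTLs (seconds) for the records in the zone
    negative_ttl  -- the zone's negative-caching TTL (seconds)
    safety_factor -- 1 for the plain rule of thumb, 2 for the cautious variant
    """
    # The negative TTL counts too: a cached "no such data" answer can
    # outlive every positive record in the zone.
    longest = max(list(record_ttls) + [negative_ttl])
    return safety_factor * longest


# Illustrative zone: longest positive TTL is 10800 s, negative TTL 3600 s.
print(ds_publish_delay([300, 3600, 10800], 3600))                   # 10800
print(ds_publish_delay([300, 3600, 10800], 3600, safety_factor=2))  # 21600
```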

To Petr's point, I definitely appreciate the attention you're giving
this, and I do agree that I am assuming correct compliance.  I think the
balance I am trying to achieve is, how do we compensate for brokenness
while not punishing those who are trying to do the right thing?  As it
stands, even if I am doing the right thing with respect to timing, I
still might break resolution of a zone that I manage for anyone who is
forwarding to Cloudflare or knot-resolver for up to 3 hours, just by
adding the DS record.

I suspect the above trade-off is the precise reason for DNS Flag Day.

michael


