[dns-operations] The perils of retroactive DNSSEC validation

Fri Nov 14 17:43:51 UTC 2008

At 14:51 +0100 11/14/08, Florian Weimer wrote:
>DNSSEC, as it is currently deployed, assumes that it's possible to run
>a hierarchy of caches where validation efforts increase down the
>hierarchy (that is, an initiator has more trust anchors to work from
>than a responding cache).  Therefore, caches are expected to store
>data which they cannot validate ("Indeterminate" in the language of
>RFC 4035).

The first question I have is about "hierarchy of caches."  This is
not a concept I am familiar with.  Do you mean an answer that is
gleaned from a cache that received the answer from a different cache
(and so on), as in a forwarding environment, or are you referring to
the data in a domain hierarchy that is stored in a single, "remote"
cache?

Given my confusion, I would challenge the assertion that DNSSEC 
"assumes" "a hierarchy of caches where validation efforts increase."

OTOH, I may understand what you mean by caches holding data that
fails validation.  There was quite some debate over whether this kind
of data should be kept or not.  It is kept to prevent the cache from
repeatedly trying to retrieve it.  (I think.  It's been a while since
the issue came to a conclusion.)

>The question that bugs me is: Why do you think this can actually work?

I can't answer this because "this" is an unresolved pointer in my 
reading of the question. ;)

>For Bogus/BAD data (for the distinction or the lack thereof, see my
>question on the namedroppers list), an intermediate cache will
>eventually make an attempt at fetching the data again.  But not for
>Indeterminate data.  So once bad data ends up in the cache hierarchy,
>and there's no attempt at validation, it simply stays there, resulting
>in a rather effective denial of service.

(Note on mail: when referring to mail on another list, it is helpful 
to include a pointer to the web-based archives.  It makes it easier 
to find - and also helps those not subscribed to the other list. 
Sorry for this annoyance.)

Either you can tie up a cache going after data that won't stick
because it is bad, or you can keep the cache from seeing new data
that will validate because the authority was refreshed.  We (the IETF
WG or some subset) decided that solving the former was more important
than the latter.  It's a dilemma; you can't solve for both.  The
question is: what is the TTL of the bad data?  I forget the
recommendation, but I hope it is the negative cache TTL (in the SOA).
I.e., the equivalent of NCACHE'ing something that is then added.

>It's hard to see that anyone is going to attempt a cache flush
>procedure because our bad experience with updating data before the TTL
>expires.  So it's difficult to see how this is going to be fixed.
>(Signing the root doesn't help if the caching hierarchy is mostly
>non-validating.)

There's no scalable solution to cache flushing.  I mean, it can be
done in a managed service situation (there are products that do it),
but not on the scale of the Internet.

>My conclusion is that validators forwarding to non-validating caches
>just don't work and should be deprecated.

I don't understand this.  First, a forwarder has "up to" one bit of
information regarding the DNSSEC capabilities of the requestor (the
DO bit in the [optional] EDNS0 OPT RR).  There's no way to bar a
validating DNS server from answering queries from non-validating
caching DNS servers, because such clients can't be isolated.

Ok, maybe 3 bits -> AD and CD too.
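
To make those three bits concrete, here is a minimal Python sketch of
what a forwarder could read out of an incoming query.  The function
name and its arguments are made up for illustration; the masks are the
AD and CD positions in the DNS header flags word and the DO position
in the low 16 bits of the OPT RR TTL field.

```python
# Bit masks within the 16-bit DNS header flags word (AD/CD per RFC 4035).
AD_MASK = 0x0020  # Authenticated Data
CD_MASK = 0x0010  # Checking Disabled
# Bit mask within the low 16 bits of the OPT RR TTL field (RFC 3225).
DO_MASK = 0x8000  # DNSSEC OK


def dnssec_hints(header_flags, opt_ttl=None):
    """Return the (DO, AD, CD) bits visible in a query.

    header_flags: 16-bit flags word from the DNS header.
    opt_ttl: 32-bit TTL field of the EDNS0 OPT RR, or None if the
             query carries no OPT RR (hypothetical calling convention).
    """
    do = bool(opt_ttl is not None and (opt_ttl & DO_MASK))
    ad = bool(header_flags & AD_MASK)
    cd = bool(header_flags & CD_MASK)
    return do, ad, cd
```

None of these bits says whether the requestor actually validates; they
only hint at what it would like to receive.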

>This also means that
>Indeterminate answers are relegated to an odd corner case, and once
>that has happened, key rollovers and redelegation with key changes
>become much less daunting: If the involved zone operators make a
>mistake, RRsets may end up in the BAD cache described in RFC 4035,
>which should be a rather temporary situation.  On top of that, lack of
>a response from a name server for a query which should result in a
>response with Secure data could mean that the (unsigned) referral has
>been tampered with, so a cache can be expected to periodically recheck
>the delegation chain.  This will actually improve operator experience
>because mistakes are fixed much quicker.

?

>Unfortunately, while the suggestion above does not require protocol
>changes, it seems to require extensive changes in some implementations
>because iterator, cache and validator are separate
>modules (at least conceptually).  In other words, the validator
>validates retroactively, and the BAD cache does not seem to have been
>properly implemented.

Let's say you get an RRset with a signature valid for November 2008. 
And for simplicity let's say you have a trust anchor validating the 
key in the signer field.  What does "retroactively validate" mean? 
If you perform the validation on November 1, on November 15, or 
November 30, is any of them "retroactive?"

The validity date on the signature is intended to reflect the quality 
of the signature - that is, a function of the useful lifetime of the 
private key.  The expiration may not be the time that the key is no 
longer useful, it might be a premature time selected to provide the 
administrator some flexibility in operations.

The TTL is still significant.  It is the reflection of the 
anticipated changes in the data as stated by the administrator.  If 
the TTL expires before the signature validity, one possible 
interpretation is that the data is more volatile than the private key 
signing the zone.  We cap the TTL by the remaining time until the 
signature expires - because the assumption is that the signature 
validity somehow says the data "expires" then.

(Thinking now, I feel this is a poor assumption but a harmless one. 
At least the cap is just to the signing key's signature, not the 
entire chain. ;)  I do remember those debates.)
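
As a rough illustration of that cap, here is a simplified Python
sketch.  The function name and arguments are hypothetical, and it
only looks at the one signature's expiration, not at the other TTL
bounds an implementation would also apply.

```python
def capped_ttl(original_ttl, sig_expiration, now):
    """Cap a cached RRset's TTL at the time left until its signature expires.

    original_ttl: TTL in seconds as set by the zone administrator.
    sig_expiration, now: times in seconds since the epoch.
    """
    remaining = sig_expiration - now
    # Never return a negative TTL once the signature has already expired.
    return max(0, min(original_ttl, remaining))
```

With a one-day TTL and one hour of signature validity left, the cache
would keep the data for one hour; with a five-minute TTL, the TTL
still wins.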

If you really want to get twisted over explaining "retroactive
validation," how do you handle an RRSet with three signatures, one for
Mon-Wed, another for Fri, and another for Sat-Sun?  What about a
validation on Thurs?  (These are the headaches we solved a decade
ago, when we were trying to find the most flexible solution.)
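
The three-signature case can be sketched in a few lines of Python.
The dates, the helper, and the function name below are invented for
the example (the week of 10 November 2008); the only point is that
the outcome depends on when the validation is performed.

```python
from datetime import datetime, timezone


def utc(day, hour=0, minute=0, second=0):
    """Helper: a UTC timestamp in November 2008 (example dates only)."""
    return datetime(2008, 11, day, hour, minute, second, tzinfo=timezone.utc)


# Hypothetical validity windows (inception, expiration) for one RRset:
# Mon-Wed, Fri, and Sat-Sun of the week of 10 November 2008.
WINDOWS = [
    (utc(10), utc(12, 23, 59, 59)),
    (utc(14), utc(14, 23, 59, 59)),
    (utc(15), utc(16, 23, 59, 59)),
]


def covering_signatures(windows, when):
    """Return the windows whose [inception, expiration] range contains `when`."""
    return [(inc, exp) for inc, exp in windows if inc <= when <= exp]


thursday = covering_signatures(WINDOWS, utc(13, 12))  # no window covers Thursday
friday = covering_signatures(WINDOWS, utc(14, 12))    # only the Friday window
```

The same RRset with the same signatures has no usable signature on
Thursday and one on Friday.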

Another "spanner in the works" is clock skew.  Nowadays NTP is 
prevalent but not so when we designed the protocol.  We allowed 
caches to hold data they felt was invalid because it might have been 
a problem with the local clock being wrong.
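
A small Python sketch of why the local clock matters.  The `skew`
allowance is an assumption made for the example, not something taken
from the specification.

```python
def signature_time_ok(inception, expiration, now, skew=0):
    """Check `now` against a signature validity window.

    All times are seconds since the epoch.  `skew` is a hypothetical
    tolerance (in seconds) for a local clock that may be wrong.
    """
    return inception - skew <= now <= expiration + skew
```

A cache whose clock runs 100 seconds fast would judge a signature that
is still valid to be expired, unless it allows for some skew or simply
holds on to the data.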

Or, an intermediate cache may not have the crypto library for the 
algorithm used by the zone...another local problem...but not 
something related to "retroactive."

And on and on...

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis                                                +1-571-434-5468
NeuStar

Never confuse activity with progress.  Activity pays more.