[dns-operations] The perils of retroactive DNSSEC validation
Ed.Lewis at neustar.biz
Fri Nov 14 17:43:51 UTC 2008
At 14:51 +0100 11/14/08, Florian Weimer wrote:
>DNSSEC, as it is currently deployed, assumes that it's possible to run
>a hierarchy of caches where validation efforts increase down the
>hierarchy (that is, an initiator has more trust anchors to work from
>than a responding cache). Therefore, caches are expected to store
>data which they cannot validate ("Indeterminate" in the language of
The first question I have is about "hierarchy of caches." This is
not a concept I am familiar with. Do you mean an answer that is
gleaned from a cache that received the answer from a different cache
(and so on), as in a forwarding environment, or are you referring to
the data in a domain hierarchy that is stored in a single, "remote"
Given my confusion, I would challenge the assertion that DNSSEC
"assumes" "a hierarchy of caches where validation efforts increase."
OTOH, I may understand what you mean by caches holding data that
fails validation. There was quite some debate over whether this kind
of data was kept or not; it is kept to prevent the cache from
repeatedly trying to retrieve it. (I think. It's been a while since
the issue came to a conclusion.)
>The question that bugs me is: Why do you think this can actually work?
I can't answer this because "this" is an unresolved pointer in my
reading of the question. ;)
>For Bogus/BAD data (for the distinction or the lack thereof, see my
>question on the namedroppers list), an intermediate cache will
>eventually make an attempt at fetching the data again. But not for
>Indeterminate data. So once bad data ends up in the cache hierarchy,
>and there's no attempt at validation, it simply stays there, resulting
>in a rather effective denial of service.
(Note on mail: when referring to mail on another list, it is helpful
to include a pointer to the web-based archives. It makes it easier
to find - and also helps those not subscribed to the other list.
Sorry for this annoyance.)
Either you can tie up a cache going after data that won't stick
because it is bad, or you can keep the cache from seeing new data
that will validate because the authority was refreshed. We (the IETF
WG or some subset) decided that solving the former was more important
than the latter. It's a dilemma; you can't solve for both. The
question is: what is the TTL of the bad data? I forget the
recommendation, but I hope it is the negative cache TTL (in the SOA),
i.e., the equivalent of NCACHE'ing something that is then added.
>It's hard to see that anyone is going to attempt a cache flush
>procedure because of our bad experience with updating data before the TTL
>expires. So it's difficult to see how this is going to be fixed.
>(Signing the root doesn't help if the caching hierarchy is mostly
There's no scalable solution to cache flushing. I mean, it can be
done in a managed service situation (there are products that do it),
but not on the scale of the Internet.
>My conclusion is that validators forwarding to non-validating caches
>just don't work and should be deprecated.
I don't understand this. First, a forwarder has "up to" one bit of
information regarding the DNSSEC capabilities of the requestor (the
DO bit in the [optional] EDNS0 OPT RR). There's no way to bar a DNS
server with validation from answering queries only from
non-validating DNS servers with caches because such clients can't be
Ok, maybe 3 bits -> AD and CD too.
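
To make the "up to three bits" concrete, here is a minimal sketch of what a forwarder can read from a single query. The function name and return shape are mine, purely for illustration; the masks are the standard positions of AD and CD in the DNS header flags word and of DO in the flags half of the OPT RR's TTL field.

```python
AD_MASK = 0x0020  # Authentic Data, DNS header flags word
CD_MASK = 0x0010  # Checking Disabled, DNS header flags word
DO_MASK = 0x8000  # DNSSEC OK, low 16 bits (flags) of the OPT RR TTL field


def dnssec_signals(header_flags, opt_ttl=None):
    """Return the DNSSEC-related bits visible in one query.

    header_flags: 16-bit flags word from the DNS header.
    opt_ttl: 32-bit TTL field of the OPT RR, or None when the query
             carries no EDNS0 OPT RR (DO is then treated as clear).
    """
    edns_flags = 0 if opt_ttl is None else opt_ttl & 0xFFFF
    return {
        "do": bool(edns_flags & DO_MASK),
        "ad": bool(header_flags & AD_MASK),
        "cd": bool(header_flags & CD_MASK),
    }


# A query with RD and CD set in the header and DO set in the OPT RR.
signals = dnssec_signals(0x0110, opt_ttl=0x00008000)
```

None of these bits says whether the requestor validates; they only say what it asked for.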
>This also means that
>Indeterminate answers are relegated to an odd corner case, and once
>that has happened, key rollovers and redelegation with key changes
>become much less daunting: If the involved zone operators make a
>mistake, RRsets may end up in the BAD cache described in RFC 4035,
>which should be a rather temporary situation. On top of that, lack of
>a response from a name server for a query which should result in a
>response with Secure data could mean that the (unsigned) referral has
>been tampered with, so a cache can be expected to periodically recheck
>the delegation chain. This will actually improve operator experience
>because mistakes are fixed much quicker.
>Unfortunately, while the suggestion above does not require protocol
>changes, it seems to require extensive changes in some implementations
>because iterator, cache and validator are separate
>modules (at least conceptually). In other words, the validator
>validates retroactively, and the BAD cache does not seem to have been
Let's say you get an RRset with a signature valid for November 2008.
And for simplicity let's say you have a trust anchor validating the
key in the signer field. What does "retroactively validate" mean?
If you perform the validation on November 1, on November 15, or
November 30, is any of them "retroactive?"
The validity date on the signature is intended to reflect the quality
of the signature - that is, a function of the useful lifetime of the
private key. The expiration may not be the time that the key is no
longer useful, it might be a premature time selected to provide the
administrator some flexibility in operations.
The TTL is still significant. It is the reflection of the
anticipated changes in the data as stated by the administrator. If
the TTL expires before the signature validity, one possible
interpretation is that the data is more volatile than the private key
signing the zone. We cap the TTL by the remaining time until the
signature expires - because the assumption is that the signature
validity somehow says the data "expires" then.
(Thinking now, I feel this is a poor assumption but a harmless one.
At least the cap is just to the signing key's signature, not the
entire chain. ;) I do remember those debates.)
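
The cap described above can be sketched as follows. This is only an illustration of the single bound discussed here, with hypothetical names and all times given as integer seconds since the epoch; it is not taken from any implementation.

```python
def capped_ttl(original_ttl, sig_expiration, now):
    """Seconds an RRset may stay cached under the cap discussed above.

    original_ttl:   TTL stated by the zone administrator, in seconds.
    sig_expiration: expiration time of the covering signature.
    now:            the cache's current time.
    """
    # Time left until the signature expires, never negative.
    remaining = max(0, sig_expiration - now)
    # The cache keeps the data for whichever runs out first.
    return min(original_ttl, remaining)


now = 1_226_684_631                 # an arbitrary reference time
one_hour_left = now + 3600          # signature expires in one hour

volatile = capped_ttl(300, one_hour_left, now)     # TTL shorter than validity
stable = capped_ttl(86_400, one_hour_left, now)    # TTL longer than validity
```

In the first case the TTL governs (the data is more volatile than the key); in the second the signature expiration governs.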
If you want to really get twisted over explaining "retroactive
validation," how do you handle an RRset with three signatures, one for
Mon-Wed, another for Fri, and another for Sat-Sun? What about a
validation on Thurs? (These are the headaches we solved for a decade
ago, when we were trying to find the most flexible solution.)
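
A small sketch of the three-signature case, assuming each signature is reduced to an (inception, expiration) pair and ignoring keys and cryptography entirely. The dates are just one week in November 2008 picked to match the weekdays; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def utc(day, hour=0, minute=0, second=0):
    """Build a UTC timestamp in November 2008."""
    return datetime(2008, 11, day, hour, minute, second, tzinfo=timezone.utc)


def covering_signatures(windows, when):
    """Return the validity windows that contain the time of validation."""
    return [w for w in windows if w[0] <= when <= w[1]]


windows = [
    (utc(10), utc(12, 23, 59, 59)),  # Mon-Wed
    (utc(14), utc(14, 23, 59, 59)),  # Fri
    (utc(15), utc(16, 23, 59, 59)),  # Sat-Sun
]

thursday = covering_signatures(windows, utc(13, 12))  # Thu noon: gap
friday = covering_signatures(windows, utc(14, 12))    # Fri noon: one window
```

On Thursday no window covers the time of validation, even though the same RRset would have checked out on Wednesday and will again on Friday.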
Another "spanner in the works" is clock skew. Nowadays NTP is
prevalent, but that was not so when we designed the protocol. We allowed
caches to hold data they felt was invalid because it might have been
a problem with the local clock being wrong.
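
To illustrate why a wrong local clock matters, here is a sketch in which the same signature window is judged by a correct clock and by one running two hours slow. The numbers and names are made up for the example.

```python
def looks_valid(inception, expiration, local_now):
    """Validity check as seen by a cache that trusts its own clock."""
    return inception <= local_now <= expiration


inception = 1_000_000
expiration = inception + 7 * 86_400       # one-week validity window

true_now = inception + 3600               # one hour after inception
skewed_now = true_now - 2 * 3600          # local clock is two hours slow

with_good_clock = looks_valid(inception, expiration, true_now)
with_slow_clock = looks_valid(inception, expiration, skewed_now)
```

The data is identical in both cases; only the local clock differs, which is why a cache cannot be sure that "invalid" means the data itself is at fault.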
Or, an intermediate cache may not have the crypto library for the
algorithm used by the zone...another local problem...but not
something related to "retroactive."
And on and on...
Edward Lewis +1-571-434-5468
Never confuse activity with progress. Activity pays more.