[dns-operations] Difference between caching B data and I data
fw at deneb.enyo.de
Thu Nov 20 21:42:00 UTC 2008
* Edward Lewis:
> Back to DNSSEC data. Bad or bogus is determined by relying on a pile
> of "good" records, these records aren't in need of an update until
> their individual TTLs expire. So there's no need to go back and ask
> again. Now, if the RR set you are testing fails validation, there's
> little chance the outcome will change anytime soon. The time it is
> held in the bad/bogus state might as well be the same as the NXDOMAIN
> TTL as it's essentially the same state of the data.
I think this is a consistent approach, but not the one suggested by
the BAD cache in RFC 4035.
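For concreteness, here is a toy sketch (my own illustration, not code from any real resolver) of a validator-side BAD cache keyed by name and type. The bad_ttl knob is the point of disagreement: Edward's suggestion amounts to setting it to the negative-caching TTL, while the RFC 4035 BAD cache is usually held for a much shorter, capped interval.

```python
import time

class BadCache:
    """Remembers (name, rrtype) pairs that failed DNSSEC validation."""

    def __init__(self, bad_ttl):
        self.bad_ttl = bad_ttl   # seconds to remember a Bogus verdict
        self.entries = {}        # (name, rrtype) -> expiry timestamp

    def mark_bad(self, name, rrtype, now=None):
        now = time.time() if now is None else now
        self.entries[(name, rrtype)] = now + self.bad_ttl

    def is_bad(self, name, rrtype, now=None):
        now = time.time() if now is None else now
        expiry = self.entries.get((name, rrtype))
        if expiry is None:
            return False
        if now >= expiry:
            # BAD TTL expired: purge the entry and re-query upstream
            del self.entries[(name, rrtype)]
            return False
        return True
```

With a short bad_ttl the validator re-queries frequently and hammers the (possibly still-poisoned) upstream; with a negative-TTL-sized bad_ttl it treats the name as unresolvable for about as long as an NXDOMAIN would persist.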
> I'd add a caveat to this. If a query is issued and a response comes
> that fails validation, my gut is that the recipient ought not give up
> listening until the timeout for an unanswered query expires or another
> response is heard that is accepted.
This doesn't help that much if you are sending the query to the wrong
place. I'm not sure if special-casing this is worth the trouble.
> In re-examining the message from Florian, this is the penultimate
> paragraph, diced up to try to interpret it:
> # My conclusion is that validators forwarding to non-validating caches
> # just don't work and should be deprecated.
> Issue 1 - we can't deprecate this because the distinction can't be [...]
We could say that sending the query to a cache which has a subset of
your trust anchors installed is an unsupported configuration.
> # This also means that indeterminate answers are relegated to an odd
> # corner case, and once that has happened, key rollovers and redelegation
> # with key changes become much less daunting: If the involved zone
> # operators make a mistake, RRsets may end up in the BAD cache described
> # in RFC 4035, which should be a rather temporary situation.
> I really can't figure this out. Indeterminate answers are a product
> of the use of UDP. We can't rule them out, even if they are rare in
> healthy networking environments (healthy as in low packet loss).
Read Insecure instead of Indeterminate, please.
Let me try again, this time with a concrete example. Suppose that
.net is signed, and my validating, security-aware, non-iterative
resolver has a corresponding trust anchor. My resolver sends queries
to security-aware, but non-validating recursive resolvers at my ISP
(non-validating because it lacks the trust anchor, for instance). I
try to resolve www.example.net/IN/A. Assume that the query results in
an upstream query from the ISP's resolver. Someone mounts a
successful cache poisoning attack against it, and the ISP's resolver
ends up caching an A RRset with an RRSIG which doesn't match,
returning it to me. My resolver detects that the data is Bogus/BAD.
It puts that information into its BAD cache and signals a resolution
failure to the application. My application periodically tries to
resolve the domain name (because it still needs to download
something). After some time, the BAD TTL expires. Another query is
sent to my ISP's resolvers. They respond with the same data as before,
because expiration in their cache is controlled by the regular TTL
(because the data is in Insecure state). My resolver detects the
tampering, and returns a resolution failure to the application, again.
And so on, until the data expires from the upstream cache. (With an
LRU-based cache, my continued queries may even keep the poisoned entry
from being evicted before its original TTL runs out.)
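The timeline above can be reduced to a toy simulation (all numbers and names are illustrative, chosen only to show the interaction of the two TTLs):

```python
UPSTREAM_TTL = 3600   # ISP cache keeps the poisoned RRset this long
BAD_TTL = 300         # validating stub remembers the Bogus verdict this long

def resolve_at(t, poisoned_since=0):
    """Return the stub's outcome at time t (seconds since poisoning)."""
    if t < poisoned_since + UPSTREAM_TTL:
        return "SERVFAIL"   # upstream still serves the RRset whose RRSIG fails
    return "NOERROR"        # upstream finally re-fetched clean data

# The application retries once per BAD-TTL window:
timeline = [resolve_at(t) for t in range(0, 2 * UPSTREAM_TTL, BAD_TTL)]
```

Every retry after a BAD-TTL expiry fails validation again, so the stub sees a solid run of SERVFAILs until the upstream cache's regular TTL finally runs out.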
Isn't this how the protocol is supposed to work? Is the continued
resolution failure an acceptable outcome in this case?
I've framed this in terms of an attack, but the same thing happens if
the zone owner publishes bad DNSKEY records or outdated RRSIG records.
There are non-DNSSEC equivalents of these errors; a few caching
recursors implement workarounds (some of which caused problems of
their own). With DNSSEC, such workarounds are impossible to provide
if you haven't got the trust anchors your downstream clients use.