[dns-operations] Difference between caching B data and I data

Tue Nov 18 20:49:41 UTC 2008

Referring to the thread that started with this message:

https://lists.dns-oarc.net/pipermail/dns-operations/2008-November/003362.html

And focusing on the question: "Why is data that is demonstrated to 
be invalid after obtaining a complete collection of relevant RR sets 
(that is, "Bad" or "Bogus" data) treated differently than data for 
which there was a failure in obtaining all the RR sets needed to make a 
validation determination?"

In other words, why is Bad/Bogus data treated differently than 
Indeterminate data in caches?

The reason for my verbosity is that Florian has a point in this message:

http://ops.ietf.org/lists/namedroppers/namedroppers.2008/msg02183.html

(BTW, finding this message triggered a bug report to the archive 
master... yet another long story that needn't be retold here.)

In my opinion (obviously not vetted by the DNSEXT WG, as no one else 
seems to have chimed in), the terms Bad and Bogus are equivalent in 
the document.  The difference in terminology dates back to the 
introduction of "Bogus" into the document late in the team's editing 
process.

So, focus back on bad vs. indeterminate.

I've had experience in a similar area, namely testing delegations.  
A "lame" server is one that is not authoritative for the domain named 
in the question it received and that responds indicating so.  A 
server that is not responsive, has no address record, etc., is not 
lame; it is unresponsive.  The difference is that a lame server is 
telling you clearly that it should not be consulted with the 
question.  A lack of a response could indicate any number of 
network-related errors; the fix might be reconnecting a device, 
adjusting routing, lengthening a timeout, restarting a process, and 
so on.  The troubleshooting of each is quite different, as is the 
mean time to repair.  (And then there is the "mean time to miracle," 
for problems that seem to take care of themselves.)
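To make the lame/unresponsive distinction concrete, here is a minimal Python sketch of how a delegation checker might sort its results. The `Reply` type, the `classify` function, and the use of a cleared AA (Authoritative Answer) bit as the sign of lameness are simplifying assumptions for illustration, not taken from any particular tool; real lameness tests look at more than one flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DelegationCheck(Enum):
    OK = "ok"                      # server answered authoritatively
    LAME = "lame"                  # server answered, but says it is not authoritative
    UNRESPONSIVE = "unresponsive"  # no usable answer at all (no address, timeout, ...)


@dataclass
class Reply:
    # Hypothetical, simplified view of a DNS response: only the AA
    # (Authoritative Answer) flag is modeled in this sketch.
    authoritative: bool


def classify(server_address: Optional[str], reply: Optional[Reply]) -> DelegationCheck:
    """Sort one delegated server into ok / lame / unresponsive."""
    if server_address is None or reply is None:
        # No address record, or nothing came back: a network-level problem
        # whose fix (routing, timeouts, restarting a process) is very different.
        return DelegationCheck.UNRESPONSIVE
    if not reply.authoritative:
        # The server answered and clearly said it should not be asked this question.
        return DelegationCheck.LAME
    return DelegationCheck.OK
```

Keeping "unresponsive" as its own outcome, rather than folding it into "lame", is what lets the operator pick the right kind of troubleshooting.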

Back to DNSSEC data.  A bad or bogus determination relies on a pile 
of "good" records, and those records don't need an update until their 
individual TTLs expire, so there's no need to go back and ask again.  
If the RR set you are testing fails validation against them, there's 
little chance the outcome will change anytime soon.  The time it is 
held in the bad/bogus state might as well be the same as the NXDOMAIN 
(negative caching) TTL, as it's essentially the same state of the data.
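A minimal sketch of that hold-time idea, assuming the negative-caching TTL is computed as RFC 2308 specifies (the lesser of the SOA record's own TTL and its MINIMUM field). `BadCache`, `mark_bogus`, and `is_bogus` are hypothetical names, and keying entries by (owner name, type) is an assumption made for brevity.

```python
import time
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (owner name, RR type); hypothetical cache key


def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    # RFC 2308: a negative answer is cached for the lesser of the SOA
    # record's TTL and its MINIMUM field.
    return min(soa_ttl, soa_minimum)


class BadCache:
    """Hypothetical cache of RRsets that failed validation (Bad/Bogus)."""

    def __init__(self) -> None:
        self._expiry: Dict[Key, float] = {}

    def mark_bogus(self, key: Key, soa_ttl: int, soa_minimum: int,
                   now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        # The evidence (keys, DS, RRSIGs) is itself cached for its own TTLs,
        # so asking again soon would reproduce the same verdict: hold it for
        # as long as an NXDOMAIN from the zone would be held.
        self._expiry[key] = now + negative_ttl(soa_ttl, soa_minimum)

    def is_bogus(self, key: Key, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if now >= expiry:
            # Hold time is over; forget the verdict so it can be re-checked.
            del self._expiry[key]
            return False
        return True
```

The `now` parameter exists only so the hold time can be exercised without waiting on a real clock.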

I'd add a caveat to this.  If a query is issued and a response arrives 
that fails validation, my gut feeling is that the recipient ought not 
to stop listening until either the timeout for an unanswered query 
expires or another response is heard that is accepted.
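The caveat can be sketched as a loop over the responses heard for one query. Everything here is hypothetical: `await_valid_response`, the `(elapsed, response)` tuples, and the `validates` callback standing in for a real DNSSEC validator. Reporting "bogus" when only failing responses arrived before the timeout is an assumption; the caveat does not say what the final verdict should be in that case.

```python
from typing import Callable, Iterable, Optional, Tuple

# Each arrival is (seconds since the query was sent, response). The response
# is opaque here because only the (hypothetical) validator looks inside it.
Arrival = Tuple[float, object]


def await_valid_response(arrivals: Iterable[Arrival],
                         timeout: float,
                         validates: Callable[[object], bool]
                         ) -> Tuple[str, Optional[object]]:
    saw_failure = False
    for elapsed, response in sorted(arrivals, key=lambda a: a[0]):
        if elapsed > timeout:
            break  # the normal timeout for an unanswered query has expired
        if validates(response):
            return "secure", response  # an acceptable response was heard
        # A failing response may be a forged one that simply arrived first;
        # note it, but keep listening instead of giving up.
        saw_failure = True
    # Assumption: only failing responses before the timeout means bogus;
    # hearing nothing at all leaves the outcome indeterminate.
    return ("bogus" if saw_failure else "indeterminate"), None
```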

Indeterminate data is different.  The rationale for not caching it as 
bad for the same TTL lies in why it failed to be validated.  The 
cause of the failure might be transitory and unrelated to any server 
authoritative for any piece of the trust "chain."  For this reason, 
asking again might clear things up.
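One way to express "asking again might clear things up" is a retry loop that re-runs validation only for indeterminate outcomes. In this sketch `attempt` is a hypothetical hook returning "secure", "bogus", or "indeterminate", and the retry count and exponential backoff are arbitrary assumptions, not values from any specification.

```python
import time
from typing import Callable


def validate_with_retries(attempt: Callable[[], str],
                          max_tries: int = 3,
                          base_delay: float = 0.5,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry only indeterminate outcomes; attempt() is a hypothetical hook."""
    outcome = "indeterminate"
    for n in range(max_tries):
        outcome = attempt()
        if outcome != "indeterminate":
            # Secure or bogus are firm verdicts built on a complete set of
            # records; asking again would not change them.
            return outcome
        # Indeterminate may be a lost UDP packet or a server that is briefly
        # down, so back off a little and ask again rather than caching it as bad.
        if n + 1 < max_tries:
            sleep(base_delay * (2 ** n))
    return outcome
```

Passing `sleep` as a parameter keeps the backoff testable. A bogus verdict returns after the first attempt, whereas indeterminate ones are retried up to `max_tries` times.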

Re-examining Florian's message, here is its penultimate paragraph, 
diced up so I can try to interpret it:

# My conclusion is that validators forwarding to non-validating caches
# just don't work and should be deprecated.

Issue 1 - we can't deprecate this because the distinction can't be made.

# This also means that indeterminate answers are relegated to an odd
# corner case, and once that has happened, key rollovers and redelegation
# with key changes become much less daunting: If the involved zone
# operators make a mistake, RRsets may end up in the BAD cache described
# in RFC 4035, which should be a rather temporary situation.

I really can't figure this out.  Indeterminate answers are a product 
of the use of UDP.  We can't rule them out, even if they are rare in 
healthy networking environments (healthy as in low packet loss).

# On top of that, lack of a response from a name server for a query which
# should result in a response with Secure data could mean that the
# (unsigned) referral has been tampered with, so a cache can be expected
# to periodically recheck the delegation chain.

It is possible that the query was dropped on the way out or the 
response was dropped on the way back.  I think leaping to the 
conclusion that there was tampering is wrong.  If anything, an 
inserted response will come sooner than the genuine one.  If 
tampering is happening, it's probably systemic and there would be no 
way "around" it anyway (so just give up).

# This will actually improve operator experience because mistakes are
# fixed much quicker.

Mistakes are not the same as attacks.  DNSSEC isn't there to help 
with mistakes; entering the wrong key is going to be an ongoing 
error, not something retries will fix.  That is, a key with the wrong 
value (or a wrong DS or RRSIG) is going to feed a bad/bogus state, 
which is a hard error until someone changes the data.  But if the 
server with the data is up and down, then we will get indeterminate 
results while it is down, and those will pass with retries.

PS: Someone privately added that Florian's idea was a good one.  
It might be that I am still missing the point.  What I have read is 
an assessment of the perils of treating bad/bogus differently than 
indeterminate.  What I haven't seen is a description of what would be 
an improvement.  I'm not trying to be confrontational; I'm trying to 
understand the proposal.
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis                                                +1-571-434-5468
NeuStar

Never confuse activity with progress.  Activity pays more.