[dns-operations] Someone from Cloudflare here?

Viktor Dukhovni ietf-dane at dukhovni.org
Tue Oct 27 06:34:48 UTC 2020


On Tue, Oct 27, 2020 at 01:47:03AM -0400, John Franklin wrote:

> > I might note that my "kid" 29460111 is also shared by many domains
> > (presently at least 251,300) and has been in use for over two years.
> 
> I'd expect Cloudflare to have more than 206k customers, so clearly not 
> their sole signing key.  200k to 300k zones per signing server sounds 
> plausible, each (as you say below) with its own HSM.

Yeah, that sounds plausible.

> I'm less concerned about the lack of a ZSK rollover per se, and more 
> that the RRSIG covering it wasn't regenerated in a timely manner.  All 
> we can really say is it wasn't caused by a rollover, but if the RRSIG 
> and the rollover are always done at the same time, that would explain 
> the expiring RRSIG.  As you point out, the past history of monthly ZSK 
> rollovers on the prior account, and none in the three months for the 
> current account -- is that a bug in their zone management or is it just 
> a different policy for ECDSA(13) vs RSA(8) ZSKs?

It rather looks like a difference in policy between RSA and ECDSA.  The
200k+ other domains sharing the ECDSA keys don't appear to have regular
key rollovers, but are getting re-signed regularly and continue to
operate normally.

> I'm going to recommend to our team that we add some DNS monitoring to 
> track the DNSKEY RRSIG expiration dates.

That's definitely the most critical one to track, but you might also
want to track the SOA, because the DNSKEY RRset tends to get signed with
just the KSK, while the SOA and the rest of the zone are signed with the
ZSK.  It is not too far-fetched to imagine a failure mode where the KSK
continues signing the DNSKEY RRset, but some or all of the ZSK-based
signatures expire.

> Other RRs in the zone get 2-day  (50 hour) RRSIGs, with an inception
> date roughly one day in the past.  There simply aren't 3.14 days for a
> warning track, making pre-failure monitoring harder and more likely to
> generate critical alerts outside normal business hours.

Yes, the new signatures show ~60 days of signature validity with the KSK:

    agrilinks.org. IN DNSKEY 256 3 13 oJMRESz5E4gYzS/q6XDrvU1qMPYIjCWzJaOau8XNEZeqCYKD5ar0IRd8KqXXFJkqmVfRvMGPmM1x8fGAa2XhSA==
    agrilinks.org. IN DNSKEY 257 3 13 mdsswUyr3DPW132mOi8V9xESWE8jTo0dxCjjnopKl+GqJxpVXckHAeF+KkxLbxILfDLUT0rAK9iUzy1L53eKGQ==
    agrilinks.org. IN RRSIG DNSKEY 13 2 3600 20201126034731 20200927034731 2371 agrilinks.org. yjY9OSOLtMViN8ZYL/J0uaUGzTtJcHoyzP5WhMXIXqqF99YONh4AkmL0D1kOBkWKFnwqseU8vFbME8BmigQRxA==

but only ~50 hours with the ZSK:

    agrilinks.org. IN SOA anirban.ns.cloudflare.com. dns at cloudflare.com. 2035543705 10000 2400 604800 3600
    agrilinks.org. IN RRSIG SOA 13 2 3600 20201028072429 20201026052429 34505 agrilinks.org. DQGPPuKgn+vA3L4LLbT4bTatSimVoPI1UvfEq/76phlPaN2tvDY42DPR9kRTFvfvWr2GzmcX2eMwAM4w5DxCDg==
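
The two validity figures can be checked directly from the RRSIG
timestamps.  A minimal sketch in Python, assuming only the standard
RRSIG presentation format (expiration first, then inception, both as
YYYYMMDDHHMMSS in UTC); the helper names are illustrative and not part
of any tool mentioned in this thread:

```python
from datetime import datetime, timedelta, timezone

def parse_rrsig_time(stamp: str) -> datetime:
    """Parse an RRSIG YYYYMMDDHHMMSS timestamp as a UTC datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def validity_window(expiration: str, inception: str) -> timedelta:
    """Length of the signature validity interval (hypothetical helper)."""
    return parse_rrsig_time(expiration) - parse_rrsig_time(inception)

# Timestamps copied from the DNSKEY and SOA RRSIGs shown above.
dnskey_window = validity_window("20201126034731", "20200927034731")
soa_window = validity_window("20201028072429", "20201026052429")

print(dnskey_window)  # 60 days, 0:00:00
print(soa_window)     # 2 days, 2:00:00  (i.e. 50 hours)
```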

However, the zone is no longer statically signed; the signatures are
generated "on the fly".  Successive queries to the authoritative servers
each return a fresh signature whose validity is a sliding window of
roughly +/- 25 hours around the time of the query:

    agrilinks.org. IN SOA anirban.ns.cloudflare.com. dns at cloudflare.com. 2035543705 10000 2400 604800 3600
    agrilinks.org. IN RRSIG SOA 13 2 3600 20201028072909 20201026052909 34505 agrilinks.org. ehL9PmCXq9EqqVflDNdWJoznIpyj0d0LOGAbJBkaTqr6g1Hxs4e1lh67Nn3wpWLmdgPhSKRQbhYlpCcLJ5jOWA==

    ...

    agrilinks.org. IN SOA anirban.ns.cloudflare.com. dns at cloudflare.com. 2035543705 10000 2400 604800 3600
    agrilinks.org. IN RRSIG SOA 13 2 3600 20201028072916 20201026052916 34505 agrilinks.org. 0viyrLl6iK4czVUj+nHKgKdv7UCUr33KtUY2UwBYIsH/UTEnppLuy5pRYAE6RxM9JcN0VfCjmVI1zmtYnaAjqg==
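
One way to see the sliding window is to compute the midpoint of each
signature's validity interval, which moves forward with each query.
A rough sketch in Python, again assuming only the RRSIG timestamp
format; the function names are made up for illustration:

```python
from datetime import datetime, timezone

def parse_rrsig_time(stamp: str) -> datetime:
    """Parse an RRSIG YYYYMMDDHHMMSS timestamp as a UTC datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def window_midpoint(expiration: str, inception: str) -> datetime:
    """Midpoint of the validity interval (hypothetical helper)."""
    start = parse_rrsig_time(inception)
    end = parse_rrsig_time(expiration)
    return start + (end - start) / 2

# (expiration, inception) pairs from the three SOA RRSIGs shown above.
samples = [
    ("20201028072429", "20201026052429"),
    ("20201028072909", "20201026052909"),
    ("20201028072916", "20201026052916"),
]
midpoints = [window_midpoint(exp, inc) for exp, inc in samples]
for m in midpoints:
    print(m.isoformat())
# 2020-10-27T06:24:29+00:00
# 2020-10-27T06:29:09+00:00
# 2020-10-27T06:29:16+00:00
```

Each midpoint sits 25 hours after inception and 25 hours before
expiration, and falls a few minutes before this message was sent.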

Therefore, if you skip all caches and query the authoritative servers
directly, you can alarm as soon as the remaining validity of a ZSK
signature drops below 24h.  It isn't much of a lead time, but it should
be enough provided the alerts are not lost.  False positives should be
rare.
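
A minimal sketch of such a check in Python, assuming the RRSIG line has
already been fetched from an authoritative server (the query itself is
left out) and is in the presentation format shown above.  The function
names and the 24h default are illustrative assumptions, not an existing
monitoring tool:

```python
from datetime import datetime, timedelta, timezone

def rrsig_expiration(record: str) -> datetime:
    """Extract the expiration time from an RRSIG line in presentation format."""
    fields = record.split()
    i = fields.index("RRSIG")
    # After "RRSIG": type covered, algorithm, labels, original TTL, expiration.
    return datetime.strptime(fields[i + 5], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def should_alarm(record: str, now: datetime,
                 threshold: timedelta = timedelta(hours=24)) -> bool:
    """True when the signature has less than `threshold` of validity left."""
    return rrsig_expiration(record) - now < threshold

SOA_RRSIG = (
    "agrilinks.org. IN RRSIG SOA 13 2 3600 20201028072429 20201026052429 "
    "34505 agrilinks.org. DQGPPuKgn+vA3L4LLbT4bTatSimVoPI1UvfEq/76phlPaN2t"
    "vDY42DPR9kRTFvfvWr2GzmcX2eMwAM4w5DxCDg=="
)

# A freshly generated signature has about 25h left, so no alarm.
fresh = datetime(2020, 10, 27, 6, 24, 29, tzinfo=timezone.utc)
print(should_alarm(SOA_RRSIG, fresh))                       # False

# The same signature seen two hours later has 23h left and triggers the alarm.
print(should_alarm(SOA_RRSIG, fresh + timedelta(hours=2)))  # True
```

The same function can be pointed at the DNSKEY RRSIG with a larger
threshold, since that signature is valid for much longer.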

-- 
    Viktor.


