[dns-operations] Switching DNSSEC uncooperative operator - help, please
James Stevens
James.Stevens at jrcs.co.uk
Sun Mar 24 11:42:46 UTC 2019
Thank you to everybody who replied to this request for help.
As per below, and advice received, I dropped the TTL on the NS & DNSKEY
(on both the old & new signed zones) to 60 seconds.
Then I switched provider by switching the NS in both the zone and the
parent.
This resulted in Cloudflare very quickly switching to the new keys with
no outage, while Google switched very gradually. There appeared to be
little or no outage during this gradual switch, but it was very slow
and it wasn't clear whether there might be outage during later stages of
the gradual process.
After about an hour the two ISC servers (bind & unbound) both switched
with about 45s of outage.
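One rough way to reason about the length of such an outage is the behaviour reported for these two servers in the quoted message below: the resolver keeps following the old NS set until it expires from cache, then returns SERVFAIL until the cached old DNSKEY set expires as well. The Python sketch below encodes only that simplified model. The function name and the use of remaining cache lifetimes as inputs are illustrative assumptions, not something measured in these tests.

```python
def servfail_window(ns_remaining, dnskey_remaining):
    """Seconds of SERVFAIL under the simplified model.

    Assumption: the resolver uses the old NS set until its cached copy
    expires, then fails validation until the cached old DNSKEY set
    expires too. Inputs are the remaining cache lifetimes, in seconds,
    at the moment of the switch.
    """
    return max(0, dnskey_remaining - ns_remaining)


# Original TTLs (NS 3600, DNSKEY 7200), both cached at the same moment.
print(servfail_window(3600, 7200))  # 3600

# Both TTLs lowered to 60s, worst case: NS about to expire, DNSKEY just cached.
print(servfail_window(0, 60))       # 60

# Both TTLs lowered to 60s and cached at the same moment.
print(servfail_window(60, 60))      # 0
```

Under this model, lowering both TTLs to 60 seconds bounds the gap at somewhere between 0 and 60 seconds, depending on when each record set was last cached.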
So, in order to force the switchover, I repeated the test, but this time
also instructed the old provider to switch over to the new signed zone.
This resulted in Cloudflare, Google and ISC all switching to the new
keys within one minute of the switchover, with varying amounts of
outage, from almost none (Cloudflare) to about one minute (Google).
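The test referred to here polls each resolver with names that do not exist, as described in the quoted message below, so that no answer can come from cache. A minimal Python sketch of such a probe is shown here. The resolver addresses are the well-known public ones and are an assumption (the post does not list the addresses used), `example.com` stands in for the test zone, and the helper names are hypothetical. It shells out to `dig`, so it needs `dig` installed to actually poll.

```python
import secrets
import subprocess

# Assumption: well-known public resolver addresses, not taken from the post.
RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1"}


def probe_name(zone):
    """Return a random name under `zone` that almost certainly does not exist."""
    return f"probe-{secrets.token_hex(8)}.{zone.rstrip('.')}."


def dig_command(resolver_ip, name):
    """Build a dig command line for one query, with a short timeout and no retries."""
    return ["dig", f"@{resolver_ip}", name, "A", "+dnssec", "+time=2", "+tries=1"]


def status_of(dig_output):
    """Extract the response code from the header line of dig's output."""
    for line in dig_output.splitlines():
        if "status:" in line:
            return line.split("status:")[1].split(",")[0].strip()
    return "NO-RESPONSE"


def poll_once(zone):
    """Query every resolver once with a fresh name and return its response code."""
    name = probe_name(zone)
    results = {}
    for label, ip in RESOLVERS.items():
        try:
            out = subprocess.run(dig_command(ip, name), capture_output=True,
                                 text=True, timeout=5).stdout
        except (OSError, subprocess.TimeoutExpired):
            out = ""
        results[label] = status_of(out)
    return results


# Example (hypothetical zone): poll_once("example.com")
```

With this kind of probe, NXDOMAIN indicates the resolver validated the negative answer, while SERVFAIL from a validating resolver indicates the outage being measured. Running `poll_once` in a loop with timestamps gives the start and length of the outage per resolver.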
Although adding this extra step meant more outage, we could "own" the
outage - control when it starts and how long it lasts. This makes life
easier for us when communicating the process to customers.
So this is now my preferred switchover method.
This extra step also makes switching back more complex, but not
impossible. Switching back is something we'd like to be able to do, but
so far switching has been successful (with other, less busy zones), so I
do not expect ever to need it.
James
On 04/03/2019 20:34, James Stevens wrote:
> I'm working with a large client who is currently trying to change their
> DNSSEC signing operator. The client is *very* against going unsigned, if
> it can be avoided.
>
> Of course, option-1 would be to follow RFC 6781 - but it looks like
> that's not going to be possible :(
>
> Currently neither operator appears to support adding each other's
> DNSKEYs in the zone. I have tickets open with both, but I'm not holding
> my breath.
>
>
> What I tried (with a test zone) was to put both operators' DS records
> in the parent, wait >24 hrs then switch all NS (parent & zone), all the
> time polling Google, Cloudflare and ISC's test bind & unbound to check
> for outage - using queries that would give NXDOMAIN answers, to avoid
> cached answers.
>
> NS ttl is 3600, DNSKEY ttl is 7200 and parent is dot-COM, so DS ttl is
> 86400
>
> Cloudflare switched to validating with the new keys pretty much
> immediately, with very little outage.
>
> ISC's servers both behaved the same - they kept validating on the old
> keys until the NS expired, then they gave SERVFAIL until the DNSKEY
> expired, then they switched and worked fine on the new keys.
>
> Google was odd - they switched randomly and gradually - 7 hours later
> some of their servers still hadn't switched (i.e. were giving answers
> validated using the old keys) and they were often refreshing the TTL on
> the old DNSKEYs - I can't see how that could be correct behavior after 7
> hours? 24 hrs later they had completely switched to the new keys.
>
>
> If I can just get the old provider to carry the new DNSKEYs, it seems to
> me this would alleviate most of the outage.
>
> Failing that, plan-b is to try to reduce the TTL on the NS & DNSKEY, to
> minimize the outage.
>
> Plan-Z is unsigned for 24 hrs :(
>
>
> So questions
>
> 1) Am I right that Google is behaving oddly?
> 2) Has anybody got any better ideas for switching while avoiding going
> unsigned and avoiding outage?
>
>
>
> James
>
>
>
>
>
> _______________________________________________
> dns-operations mailing list
> dns-operations at lists.dns-oarc.net
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations