[dns-operations] Switching DNSSEC uncooperative operator - help, please

James Stevens James.Stevens at jrcs.co.uk
Sun Mar 24 11:42:46 UTC 2019


Thank you to everybody who replied to this request for help.

As per below, and advice received, I dropped the TTL on the NS & DNSKEY 
(on both the old & new signed zones) to 60 seconds.

Then I switched provider by switching the NS in both the zone and the 
parent.

This resulted in Cloudflare very quickly switching to the new keys with
no outage. Google switched very gradually: there appeared to be
little or no outage during this gradual switch, but it was very slow
and it wasn't clear whether there might be an outage during later
stages of the gradual process.

After about an hour the two ISC servers (bind & unbound) both switched 
with about 45s of outage.
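
For reference, the timing can be reasoned about with a small sketch. It
assumes the resolver behaviour described in my original message quoted
below (validate on the old keys until the cached NS expires, then
SERVFAIL until the cached DNSKEY expires). The function name and the
cache-age parameters are purely illustrative and are not something a
resolver exposes.

```python
def servfail_window(ns_ttl, dnskey_ttl, ns_age=0, dnskey_age=0):
    """Seconds of SERVFAIL between cached-NS expiry and cached-DNSKEY expiry.

    ns_age / dnskey_age are hypothetical inputs: how long ago (in seconds)
    the resolver cached each RRset, measured at the moment of the switch.
    """
    ns_left = max(ns_ttl - ns_age, 0)
    dnskey_left = max(dnskey_ttl - dnskey_age, 0)
    # The outage only exists while the old DNSKEY outlives the old NS.
    return max(dnskey_left - ns_left, 0)


# Original TTLs, both RRsets cached at the same moment.
print(servfail_window(3600, 7200))          # 3600

# Both TTLs at 60s, DNSKEY refreshed just as the cached NS runs out.
print(servfail_window(60, 60, ns_age=60))   # 60
```

Under that assumption the window is bounded by the DNSKEY TTL, so with
60 second TTLs an outage of about 45s is within what the model predicts.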


So, in order to force the switchover, I repeated the test, but this time
also instructed the old provider to switch over to the new signed zone.

This resulted in Cloudflare, Google and ISC all switching to the new
keys within one minute of the switchover, with varying amounts of
outage, from almost none (Cloudflare) to about one minute (Google).
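
The outage figures come from polling each resolver with names that do
not exist, as described in my original message quoted below. A minimal
sketch of such a probe follows, using only the Python standard library.
The helper names and the "probe-" label are made up for illustration,
and the query sets the AD flag on the assumption that the resolver
follows RFC 6840 and reports validation status back in that bit.

```python
import secrets
import socket
import struct


def unique_name(zone):
    """A random name under the zone, so the (NXDOMAIN) answer cannot be
    served from a resolver's cache."""
    return f"probe-{secrets.token_hex(8)}.{zone.rstrip('.')}."


def build_query(qname, qtype=1):
    """Minimal DNS query: RD and AD flags set, one question, class IN."""
    qid = secrets.randbelow(65536)
    header = struct.pack("!HHHHHH", qid, 0x0120, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_flags(response):
    """Return (rcode, ad_bit) from the header of a DNS response."""
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F, bool(flags & 0x0020)


def probe(resolver_ip, zone, timeout=3.0):
    """Send one cache-busting query over UDP.

    rcode 3 is NXDOMAIN (resolver is validating fine), 2 is SERVFAIL.
    """
    query = build_query(unique_name(zone))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (resolver_ip, 53))
        data, _ = sock.recvfrom(4096)
    if data[:2] != query[:2]:
        raise ValueError("response ID does not match query ID")
    return parse_flags(data)
```

Calling something like probe("8.8.8.8", "example.com") once a second
(with the test zone in place of the example.com placeholder) and logging
the rcode with a timestamp would show when SERVFAIL starts and stops for
each resolver. A timeout raises socket.timeout, which would also need to
be counted as outage.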

Although adding this extra step meant more outage, we could "own" the
outage: control when it starts and how long it lasts. This makes life
easier for us when communicating the process to customers.

So this is now my preferred switchover method.

This extra step also makes switching back more complex, but not
impossible. Switching back is something we'd like to be able to do, but
so far switching has been successful (with other, less busy zones), so
I do not expect to ever switch back.



James




On 04/03/2019 20:34, James Stevens wrote:
> I'm working with a large client who is currently trying to change their 
> DNSSEC signing operator. The client is *very* against going unsigned, if 
> it can be avoided.
> 
> Of course, option-1 would be to follow RFC 6781 - but it looks like
> that's not going to be possible :(
> 
> Currently neither operator appears to support adding each other's 
> DNSKEYs in the zone. I have tickets open with both, but I'm not holding 
> my breath.
> 
> 
> What I tried (with a test zone) was to put both operators' DS records in
> the parent, wait >24 hrs then switch all NS (parent & zone), all the 
> time polling Google, Cloudflare and ISC's test bind & unbound to check 
> for outage - using queries that would give NXDOMAIN answers, to avoid 
> cached answers.
> 
> NS TTL is 3600, DNSKEY TTL is 7200 and parent is dot-COM, so DS TTL is
> 86400
> 
> Cloudflare switched to validating with the new keys pretty much 
> immediately, with very little outage.
> 
> ISC's servers both behaved the same - they kept validating on the old 
> keys until the NS expired, then they gave SERVFAIL until the DNSKEY 
> expired, then they switched and worked fine on the new keys.
> 
> Google was odd - they switched randomly and gradually - 7 hours later
> some of their servers still hadn't switched (i.e. were giving answers
> validated using the old keys) and they were often refreshing the TTL on
> the old DNSKEYs - I can't see how that could be correct behavior after 7
> hours?? 24 hrs later they had completely switched to the new keys.
> 
> 
> If I can just get the old provider to carry the new DNSKEYs, it seems to 
> me this would alleviate most of the outage.
> 
> Failing that, plan-b is to try to reduce the TTL on the NS & DNSKEY, to
> minimize the outage.
> 
> Plan-Z is unsigned for 24 hrs :(
> 
> 
> So questions
> 
> 1) Am I right that Google is behaving oddly?
> 2) Anybody got any better ideas for switching while avoiding going 
> unsigned and avoiding outage ?
> 
> 
> 
> James
> 
> 
> 
> 
> 
> _______________________________________________
> dns-operations mailing list
> dns-operations at lists.dns-oarc.net
> https://lists.dns-oarc.net/mailman/listinfo/dns-operations