[dns-operations] Some DNSSEC trivia

Mon Jan 7 21:08:08 UTC 2008

* Florian Weimer wrote:
> Due to better Name Error caching, DNSSEC might actually be a win,

Ack.

> of view, the most pressing issue is who is responsible if DNSSEC
> breaks stuff.  If I turn on DNSSEC and this impacts customers[*]
> because they publish bad data, is this my fault or theirs?  I
> understand that there is no easy answer at this stage.

I - as the guy who signs the zone - was the responsible person.
The most common problems are:
  a) Broken mail transport for qmail based senders.
  b) Nasty discussions with customers' CEOs about why a special written
     permit for a "Test Scenario" is necessary.
  c) Additional training because of modified DNS management tools.
  d) Breakdown of infrastructure due to expired signatures.
  e) Zone delegation removal by the ccTLD-NIC.

Two of about 400 zones were switched to unsigned due to problems of type
a). The (single) customer did not accept any mail transport problem
regardless of the reason. All other problems of type a) were considered
remote configuration problems and were solved by switching the sending MTA.

Problems of type b) are only caused by the early SE requirements. Some
CEOs do not *want* to deal with such technical fiddling. This problem is
considered solved unless a new TLD starts with such fax templates.

Type c) problems can be solved with a piece of chocolate and a cup of coffee.
It's only problematic if admins call during the night hours in order to
ask a "simple" question about how to do something ...

I must admit that I caused the type d) problem twice: once after moving a
zone to the customer's NS, when I forgot to register the re-signing job in
the crontab, and once after introducing my signed root, when I ran into a
script error at a very early stage ... Re-signing the customer's zone was
easy, and fortunately the incident happened during the night hours. It was
reported by the network monitoring, because some hosts were "unreachable"
due to their removal from the DNS. ... The only missing signature in the
root was that of the NS records for ".". It was somewhat fatal, because most
remote access doors were closed once DNS failed on *any* request in the
whole production environment. Of course, this problem was easily detected
and fixed by turning off validation on the recursive servers for an hour.
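
A minimal sketch of how a monitoring job might flag type d) before the
signatures actually expire. It assumes the RRSIG rdata is available in
presentation format with the expiration written as YYYYMMDDHHmmSS in UTC
(the form defined in RFC 4034). The function names, the three-day margin
and the sample record are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone


def rrsig_expiration(rdata: str) -> datetime:
    """Return the signature expiration from RRSIG rdata in presentation format.

    Field order: type-covered, algorithm, labels, original TTL,
    expiration, inception, key tag, signer name, signature.
    """
    fields = rdata.split()
    if len(fields) < 9:
        raise ValueError("not a complete RRSIG rdata string")
    # Assumption: expiration is given as YYYYMMDDHHmmSS (UTC).
    expiration = datetime.strptime(fields[4], "%Y%m%d%H%M%S")
    return expiration.replace(tzinfo=timezone.utc)


def expires_soon(rdata: str, now: datetime,
                 margin: timedelta = timedelta(days=3)) -> bool:
    """True if the signature expires within `margin` of `now` (or already has)."""
    return rrsig_expiration(rdata) - now <= margin
```

Fed with the RRSIG covering the apex NS records (fetched by whatever tool
is at hand), a cron job could call expires_soon() with the current UTC time
and raise an alarm while there is still time to re-sign.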

Type e) problems occur if you are signing an 'fr' zone and using the same
NS as both recursive and authoritative server for this zone. If you turn on
validation and switch to, e.g., a different - signed - root, the FR-NIC
removes the zone delegation from the FR zone, because their ongoing validity
checks fail to return the "correct ICANN NS for .". Bad luck. Split
authoritative and recursive.
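
One way to verify that the split really happened is to check whether the
authoritative server still advertises recursion. The sketch below sends a
plain NS query over UDP and inspects the RA ("recursion available") bit in
the response header. This is not the FR-NIC's check, whose details are not
known here. All function names are hypothetical, and the response ID is
not verified, so treat it as a quick diagnostic only.

```python
import socket
import struct

RA_FLAG = 0x0080  # "recursion available" bit in the 16-bit DNS flags word


def build_query(qname: str, qtype: int = 2, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query with RD set; qtype 2 is NS, class is IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in qname.rstrip(".").split(".") if label]
    name = b"".join(bytes([len(label)]) + label.encode("ascii")
                    for label in labels) + b"\x00"
    return header + name + struct.pack("!HH", qtype, 1)


def offers_recursion(response: bytes) -> bool:
    """True if the RA bit is set in a raw DNS response."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    _query_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & RA_FLAG)


def server_offers_recursion(server_ip: str, qname: str = ".",
                            timeout: float = 3.0) -> bool:
    """Ask the server at server_ip (IPv4, UDP port 53) and report its RA bit."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(qname), (server_ip, 53))
        response, _addr = sock.recvfrom(4096)
    return offers_recursion(response)
```

An authoritative-only server should make server_offers_recursion() return
False; if it returns True, the box is still answering as a recursive
resolver as well.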