[dns-operations] Recent DNS issues

Craig Leres leres at ee.lbl.gov
Mon May 4 17:31:47 UTC 2009

I'm running BIND 9.6.0-P1.

ns1.lbl.gov and ns2.lbl.gov have been intermittently failing to resolve
some .gov names. On Saturday it was reported that dts.ca.gov did not
resolve for a short period of time, and on Sunday the same was reported
for bso.science.doe.gov. But I was never able to look at one of these
failures while it was happening.

This morning, ns1.lbl.gov and ns2.lbl.gov stopped resolving everything.
When I tried to reload the config so I could dump the cache I got:

    May  4 08:49:44 ns1.lbl.gov named[4360]: reloading configuration failed: out of memory

A few minutes ago nsx.lbl.gov lost the ability to resolve hosts in
the .gov TLD. Attempts were returning SERVFAIL. This was caught and
corrected pretty quickly because I now have nagios doing "dig +dnssec
gov." on all of my nameservers.
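
For anyone who wants a similar check, here is a minimal sketch of one,
not the actual check in use here. It assumes dig is on the PATH and that
Python 3.7 or later is available. The names parse_dig_status and check
are hypothetical, and pointing dig at a server with "@server" is an
assumption about how such a check would be wired up. It runs the same
"dig +dnssec gov." query and maps the status in dig's header line to the
usual Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def parse_dig_status(output):
    """Return the status (e.g. NOERROR, SERVFAIL) from dig's header line.

    Returns None when the output has no header line, which is what
    happens when the server never answers.
    """
    for line in output.splitlines():
        if "->>HEADER<<-" in line:
            match = STATUS_RE.search(line)
            if match:
                return match.group(1)
    return None


def check(server, name="gov."):
    """Query one nameserver and return (exit_code, message)."""
    try:
        proc = subprocess.run(
            ["dig", "+dnssec", "@" + server, name],
            capture_output=True, text=True, timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, "UNKNOWN: could not run dig: %s" % exc

    status = parse_dig_status(proc.stdout)
    if status == "NOERROR":
        return OK, "OK: %s resolved %s" % (server, name)
    if status is None:
        return CRITICAL, "CRITICAL: no answer from %s for %s" % (server, name)
    return CRITICAL, "CRITICAL: %s returned %s for %s" % (server, status, name)
```

To use it as a plugin, call check() with the nameserver to test, print
the message, and exit with the returned code. Anything other than
NOERROR, including SERVFAIL, is treated as CRITICAL.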

Management is starting to ask questions (e.g. "why has DNS been so
flakey lately?") and I sure don't have reasonable answers for them.


