[dns-operations] Current status of DNSViz's historical data
Matthew Pounsett
matt at conundrum.com
Fri Jun 28 15:23:28 UTC 2019
Hi everyone. It's about time for an update anyway, so with thanks to
Viktor for bringing this up, and with my OARC contractor hat on...
On Fri, 28 Jun 2019 at 04:07, Viktor Dukhovni <ietf-dane at dukhovni.org>
wrote:
> $subject. It has been degraded for quite some time now. I am no
> longer optimistic about a return to full functionality.
>
TLDR:
The issue with DNSViz's database is as much a question of time as
technology. I'm expecting it to be back up in about a week, give or
take a long weekend, but have been sidelined enough times in the last couple of
months that I'm obviously hesitant to make firm promises.
Tell Me Everything, I Want to Know (I'm gonna make TMEIWTN the new TLDR):
I think I've given this general rundown here before, but here it is again
with a bit more detail, and an update on recent events.
When Verisign was transferring the DNSViz service to OARC, we copied the
database (across country) to one of our large file stores for safe
keeping. Since the servers were going to spend the better part of a week
on trucks, we were justifiably worried about the risk of data corruption or
outright hardware failure in transit. The server arrived with some RAID
errors. We had also discussed prior to shipping that the server's drives
were configured with maximum performance in mind (at the expense of
available storage), but the database was getting dangerously close to
filling up the volume. We're not able to do a forklift upgrade of all of
the drives this year, so we were already considering reconfiguring the RAID
to some middle ground to gain back some space at the expense of some
performance. This seemed preferable to throwing out older data. The site
visit where I received and installed the DNSViz hardware was the same visit
in which I rebuilt OARC's entire physical plant, and unfortunately ran out
of time to do the rebuild of the database server while on-site. That meant
that the server would need to be rebuilt remotely, but we didn't expect
that to be a problem.
Anyone who does systems operations will be familiar with the issues
surrounding remote management of servers produced within the last 15 years
but more than about 2 years ago: the Java-based remote consoles they depend
on are no longer compatible with modern web browsers, which have all
deprecated the security barn door that was the plugin architecture that
Java-based browser apps used. Newer systems have an HTML5 alternative
available, but most of OARC's Dell hardware is from this dark period, and
we (like everyone else) have developed workarounds involving VMs
running old versions of Linux with old versions of Firefox and some
third-party plugins. Unfortunately, the HP hardware that runs DNSViz is
also from the dark period, but resistant to those workarounds. We burned a
fair bit of time trying to make them work, though.
HP seems to only officially support the .NET interface to their older
systems now, which means running Windows somewhere. Perhaps
unsurprisingly, it appears that Windows 10 doesn't play nice with KVM's
bridged networking. We also burned a lot of time trying to make that
work.
In an average operation, all of this would not result in many weeks going
by, but you must also remember that OARC is a small shop with one pair of
hands per function. And in my case, it's only 75% of a pair of hands. So,
instead of having 200 or 400 hours of sysadmins available per week, OARC
has 30. OARC has always done a lot of things with the limited resources it
has; in the time since we received the DNSViz hardware this pair of hands
has been involved in: rebuilding the physical plant, participating in
IETF104, migrating to an entirely new user portal, running a DITL
collection, and organizing and running an OARC workshop in Bangkok. And
that doesn't get into the (literally) fifty other little services that OARC
runs (some public, some only for members, some internal) that need
attention or the less visible systems and network issues that need to be
managed... things like patching this month's remote TCP exploit.
We did briefly try running the database from the fileserver where the
backup is stored. We knew it would be a bit short on memory and expected
some performance issues, but quickly ran into a conflict between the
database and the memory requirements of the OS trying to operate the
filesystem itself, and started to get CRC errors from the filesystem. That
was our only option for alternate hardware to run the database on, and we
will not be bringing it back up there because of the risk of corruption not
only to the DNSViz database but also to this year's DITL collection, which
is on that same filesystem (it's the only one with enough space to hold
them at the moment).
Where things currently stand is that as of yesterday, with the help of
remote hands shuffling things around, I have some external hardware hooked
up to the database server, which gives me access to its console, and a USB
key with an OS installer in place. Today I'm beginning work on actually
rebuilding the RAID and reinstalling the server. Once that's done, we'll
have to wait several days to a week for the data to be copied back to the
server, at which point we'll be able to rebuild indexes, do testing, etc.,
and bring it back into service.
If everything goes perfectly (and we all know how often that happens) that
probably means we could bring it back online next Thursday. That is also
the first day of a US long weekend though, so even if we're ready then we
may wait until the following week.
We understand how important DNSViz has become to many people. I use it
myself on a very regular basis both in my function as OARC's systems
engineer and in my other work, and did so in past lives as well. It's been
an important and useful tool to me since Casey first introduced it. We're
not taking the lack of historical data lightly, but we have to balance our
use of resources carefully. In the final calculation DNSViz mostly works
as-is (only features related to the historical data are missing), and it
has cost a lot of time (which is also money) to get this server running
again, and other things have still had to get done.
If you want to help OARC have more resources to spread around, please
consider becoming a member. We've been investigating other ways to support
OARC's services, but at the moment the lion's share comes from annual
membership dues. OARC also accepts donations at
<https://www.dns-oarc.net/donate>. OARC is a 501(c)(3) not-for-profit in
the US, so you will receive a tax receipt that will be useful at least to
US individuals and corporations.
And that brings to a close this very long explanation for the continued
lack of historical data in the DNSViz interface. If you made it this far,
thanks for reading. Also, if you made it this far, I am quite jealous of
your free time.