[dns-operations] Quantifying performance was Re: requirements for TLD servers
Edward Lewis
Ed.Lewis at neustar.biz
Tue Mar 23 16:33:50 UTC 2010
At 22:21 +0000 3/22/10, Jim Reid wrote:
>For example, the document would discuss the characteristics of a good
>monitoring system without laying down the law about the One True Way
>how monitoring MUST be done.
I doubt that documenting, specifying, deploying, and operating the
OTW of DNS monitoring can be done. There are many problems.
If the goal is to measure the experience of an end user, then to
measure it accurately you would need to place a monitoring device at
the end of a user access network - not in a data center. That
measurement necessarily includes many elements beyond the control of
the DNS operator - the user access network, any intermediate
networks, and so on, all the way up to the DNS provider's equipment.
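As a rough illustration of what such a user-side probe would measure,
here is a minimal sketch (assuming the dnspython library; the probe
name is a placeholder, not anything from this thread). Timing a
lookup through the locally configured recursive resolver folds in the
access network, any intermediate networks, and the resolver's own
behavior:

    import time
    import dns.resolver

    PROBE_NAME = "www.example.com"  # hypothetical name to resolve

    def user_side_latency(qname):
        # Use the locally configured recursive resolver, so the timing
        # includes everything between the user and the final answer.
        resolver = dns.resolver.Resolver()  # reads /etc/resolv.conf
        start = time.monotonic()
        resolver.resolve(qname, "A")
        return time.monotonic() - start

    print("%s: %.1f ms" % (PROBE_NAME, user_side_latency(PROBE_NAME) * 1000))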
Another variable is the DNS client's strategy for finding an answer.
There is no specified way for an iterating client to pick and choose
which servers to use and how to fail over from one server to the
next. If an operator tunes for one iterator's (that is, one
resolver's or cache's) strategy, a different iterator implementation
might suffer.
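To make that variability concrete, here is an illustrative sketch of
one possible selection strategy - a smoothed RTT average with a
timeout penalty, loosely in the spirit of what some iterators do.
None of this is specified behavior; another implementation might
round-robin, or stick with one server until it fails, and an operator
tuned for one approach can look worse under another:

    class ServerSelector:
        def __init__(self, servers, alpha=0.3):
            self.srtt = {s: 0.0 for s in servers}  # 0.0 = untried
            self.alpha = alpha  # weight given to the newest sample

        def pick(self):
            # Prefer the lowest smoothed RTT; untried servers (0.0)
            # win automatically, which probes each server once.
            return min(self.srtt, key=self.srtt.get)

        def report(self, server, rtt=None, timeout_penalty=5.0):
            # Exponentially weighted moving average; a timeout is
            # scored as a large penalty, forcing failover next time.
            sample = timeout_penalty if rtt is None else rtt
            old = self.srtt[server]
            self.srtt[server] = sample if old == 0.0 else (
                (1 - self.alpha) * old + self.alpha * sample)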
I agree that merely requiring a DNS operator to reply within a
certain span of time to any query - measured from the time the query
hits the DNS operator's site to the transmission of the reply - is
"weak", but what else can you measure that is under the operator's
control and is unbiased toward any particular iterator implementation?
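For contrast with the user-side probe above, the operator-side
measurement looks something like this sketch (again assuming
dnspython; the server address and query name are hypothetical). Note
that even a monitor placed near the site still includes some network
path in the round trip, so it only approximates "query in, reply out"
at the server itself:

    import time
    import dns.message
    import dns.query
    import dns.rdatatype

    AUTH_SERVER = "192.0.2.53"  # hypothetical authoritative address
    query = dns.message.make_query("example", dns.rdatatype.SOA)

    start = time.monotonic()
    dns.query.udp(query, AUTH_SERVER, timeout=2.0)
    print("round trip: %.1f ms" % ((time.monotonic() - start) * 1000))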
Even if the DNS operator has a well-balanced deployment, what if a
transit provider goes out of business or rips up peering? Or just
loses a link? The user may notice degradation, but do you hold this
against the DNS operator?
I've been interested in assessing the performance of distributed
systems for more than 25 years now. It's pretty much impossible to do
well, especially when the distributed system is a shared
responsibility. Single-owner networks [and I am not restricting this
to computer networks] can be modeled to varying levels of accuracy to
identify bottlenecks. Even with complete access, it is a lot of work,
and tricky, to build a high-fidelity model that can accurately assess
the impact of changes. When there are multiple entities involved,
ouch.
Counting servers and sites, measuring machine readiness, and so on
are all part of monitoring, but they give little hint of actual
performance.
One of the most interesting pieces of research I have ever seen on
name server performance was conducted by JPRS in 2006-2007. (A slide
set based on that work can be found here:
http://ripe.net/ripe/meetings/ripe-54/presentations/Measurement_Anycast.pdf).
This showed the impact of tuning anycast on a "letter". The one
missing element in the study was that the other servers involved (the
other letters) were not monitored to see whether they picked up the
slack.
I appreciate the effort behind RIPE's DNS monitoring. Placing
monitors across the network is a good tradeoff between placing
monitors at the user points and measuring only what a provider has
under its direct control. That does give a realistic view of
performance.
Maybe you don't want to measure the user experience precisely, but
rather just define some objective way to measure how DNS
constellations perform. Once we agree on an objective measure, the
plan then has to be carried out.
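One hypothetical shape such an objective measure could take - pooling
timed queries from many probes and reporting a per-constellation
percentile, rather than claiming to capture any single user's
experience (the samples below are invented for illustration):

    import math

    def p95_ms(samples_ms):
        # Nearest-rank 95th-percentile response time across all
        # pooled probe samples, in milliseconds.
        ordered = sorted(samples_ms)
        rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
        return ordered[rank]

    # e.g. samples pooled from many vantage points, in milliseconds
    samples = [12.0, 15.0, 14.0, 200.0, 13.0, 16.0, 18.0, 11.0, 17.0, 19.0]
    print("p95 = %.1f ms" % p95_ms(samples))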
BTW, this will be discussed (as a registration issue) at the Bar BOF
Wednesday night as part of the IETF.
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis
NeuStar You can leave a voice message at +1-571-434-5468
As with IPv6, the problem with the deployment of frictionless surfaces is
that they're not getting traction.