[dns-operations] Quantifying performance was Re: requirements for TLD servers
Edward Lewis
Ed.Lewis at neustar.biz
Tue Mar 23 16:33:50 UTC 2010
At 22:21 +0000 3/22/10, Jim Reid wrote:
>For example, the document would discuss the characteristics of a good
>monitoring system without laying down the law about the One True Way
>how monitoring MUST be done.
I doubt that documenting, specifying, deploying, and operating the
OTW of DNS monitoring can be done. There are many problems.
If the goal is to measure the experience of an end user, then to
measure it accurately you would need to place a monitoring device at
the end of a user access network - not in a data center. That
measurement necessarily includes many elements beyond the control of
the DNS operator - the user access network, any intermediate
networks, and so on, all the way up to the DNS provider's equipment.
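As a rough illustration of what such a user-side probe would measure,
here is a minimal sketch (assuming the dnspython library; the probe
name is a placeholder, not anything from this thread). Timing a
lookup through the locally configured recursive resolver folds in the
access network, any intermediate networks, and the resolver's own
behavior:

    import time
    import dns.resolver

    PROBE_NAME = "www.example.com"  # hypothetical name to resolve

    def user_side_latency(qname):
        # Use the locally configured recursive resolver, so the timing
        # includes everything between the user and the final answer.
        resolver = dns.resolver.Resolver()  # reads /etc/resolv.conf
        start = time.monotonic()
        resolver.resolve(qname, "A")
        return time.monotonic() - start

    print("%s: %.1f ms" % (PROBE_NAME, user_side_latency(PROBE_NAME) * 1000))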
Another variable is the DNS client's strategy for finding an answer.
There is no specified way for an iterating client to pick and choose
which servers to use and how to fail over from one server to the
next. If an operator tunes for one iterator's (that is, one
resolver's or cache's) strategy, a different iterator implementation
might suffer.
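To make that variability concrete, here is an illustrative sketch of
one possible selection strategy - a smoothed RTT average with a
timeout penalty, loosely in the spirit of what some iterators do.
None of this is specified behavior; another implementation might
round-robin, or stick with one server until it fails, and an operator
tuned for one approach can look worse under another:

    class ServerSelector:
        def __init__(self, servers, alpha=0.3):
            self.srtt = {s: 0.0 for s in servers}  # 0.0 = untried
            self.alpha = alpha  # weight given to the newest sample

        def pick(self):
            # Prefer the lowest smoothed RTT; untried servers (0.0)
            # win automatically, which probes each server once.
            return min(self.srtt, key=self.srtt.get)

        def report(self, server, rtt=None, timeout_penalty=5.0):
            # Exponentially weighted moving average; a timeout is
            # scored as a large penalty, forcing failover next time.
            sample = timeout_penalty if rtt is None else rtt
            old = self.srtt[server]
            self.srtt[server] = sample if old == 0.0 else (
                (1 - self.alpha) * old + self.alpha * sample)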
I agree that merely requiring a DNS operator to reply within a
certain span of time to any query - measured from the time the query
hits the DNS operator's site to the transmission of the reply - is
"weak", but what else can you measure that is under the operator's
control and is unbiased toward any particular iterator implementation?
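For contrast with the user-side probe above, the operator-side
measurement looks something like this sketch (again assuming
dnspython; the server address and query name are hypothetical). Note
that even a monitor placed near the site still includes some network
path in the round trip, so it only approximates "query in, reply out"
at the server itself:

    import time
    import dns.message
    import dns.query
    import dns.rdatatype

    AUTH_SERVER = "192.0.2.53"  # hypothetical authoritative address
    query = dns.message.make_query("example", dns.rdatatype.SOA)

    start = time.monotonic()
    dns.query.udp(query, AUTH_SERVER, timeout=2.0)
    print("round trip: %.1f ms" % ((time.monotonic() - start) * 1000))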
Even if the DNS operator has a well-balanced deployment, what if a
transit provider goes out of business or rips up peering? Or just
loses a link? The user may notice degradation, but do you hold this
against the DNS operator?
I've been interested in assessing the performance of distributed
systems for more than 25 years now. It's pretty much impossible to do
well, especially when the distributed system is a shared
responsibility. Single-owner networks [and I am not restricting this
to computer networks] can be modeled to varying levels of accuracy to
identify bottlenecks. Even with complete access, it is a lot of work,
and tricky, to build a high-fidelity model that can accurately assess
the impact of changes. When there are multiple entities involved,
ouch.
Counting servers and sites, measuring machine readiness, and so on
are all part of monitoring, but they give little hint of actual
performance.
One of the most interesting pieces of research I have ever seen on
name server performance was conducted by JPRS in 2006-2007. (A slide
set based on that work can be found here:
http://ripe.net/ripe/meetings/ripe-54/presentations/Measurement_Anycast.pdf).
This showed the impact of tuning anycast on a "letter". The one
missing element in the study was that the other servers involved (the
other letters) were not monitored to see whether they picked up the
slack.
I appreciate the effort behind RIPE's DNS monitoring. Placing
monitors across the network is a good tradeoff between placing
monitors at the user points and measuring only what a provider has
under its direct control. That does give a realistic view of
performance.
Maybe you don't want to measure the user experience precisely, but
rather just define some objective way to measure how DNS
constellations perform. Once we agree on an objective measure, the
plan then has to be carried out.
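One hypothetical shape such an objective measure could take - pooling
timed queries from many probes and reporting a per-constellation
percentile, rather than claiming to capture any single user's
experience (the samples below are invented for illustration):

    import math

    def p95_ms(samples_ms):
        # Nearest-rank 95th-percentile response time across all
        # pooled probe samples, in milliseconds.
        ordered = sorted(samples_ms)
        rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
        return ordered[rank]

    # e.g. samples pooled from many vantage points, in milliseconds
    samples = [12.0, 15.0, 14.0, 200.0, 13.0, 16.0, 18.0, 11.0, 17.0, 19.0]
    print("p95 = %.1f ms" % p95_ms(samples))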
BTW, this will be discussed (as a registration issue) at the Bar BOF
Wednesday night as part of the IETF.
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis
NeuStar You can leave a voice message at +1-571-434-5468
As with IPv6, the problem with the deployment of frictionless surfaces is
that they're not getting traction.