[dns-operations] new public DNS service:

Paul Vixie paul at redbarn.org
Mon Dec 4 14:17:52 UTC 2017

abang wrote:
> Am 22.11.2017 um 09:14 schrieb P Vix:
>> Ok! Please show your measurements and metrics. Your assertion is
>> most intriguing.
> This [1] is a measurement of query/answer (qa) latency within a DNS
> resolver. The traffic I sent to this resolver is a sample of a copy
> of real customer traffic with an increasing sample rate.

sampling an aggregate flow is a not-good way to approximate individual
flows unless the individual flow is a large enough fraction of the total
to be representative. are you able to run actual per-stub flows at
varying densities? remember, your intriguing claim was:

>> A DNS resolver depends on caching. You need tens of thousands of
>> subscribers for your RDNS to get a reasonable average latency.

your methodology here does not simulate small subscriber pools. since 
your earlier claim rests on a higher shared-cache hit rate than what 
would be seen from a non-shared cache, methodology is quite important. 
we need to know the average stub's working set of visited DNS content as 
well as its re-visit rate, and also the rdns's cache size and purge policy.
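to make the per-stub question concrete, here is a minimal simulation sketch of one stub querying its own non-shared cache. every parameter below (name count, zipf popularity, ttl, query rate) is an assumption of mine for illustration, not a measurement:

```python
import random

# hypothetical per-stub simulation: one client's queries against its own
# (non-shared) cache. popularity is zipf-like; entries expire after ttl.
# all parameters are illustrative assumptions, not measurements.
def simulate_stub(n_queries=10000, n_names=1000, ttl=300.0, qps=0.1, seed=1):
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(n_names)]  # zipf-ish: 1/rank
    cache = {}            # name -> expiry time
    t, hits = 0.0, 0
    for _ in range(n_queries):
        t += rng.expovariate(qps)            # poisson arrivals
        name = rng.choices(range(n_names), weights)[0]
        if cache.get(name, 0.0) > t:         # still fresh in cache
            hits += 1
        else:
            cache[name] = t + ttl            # miss: fetch and cache until expiry
    return hits / n_queries

print(round(simulate_stub(), 2))
```

a lone stub at low qps shows a visibly lower hit rate than the same popularity curve queried at aggregate rates; that difference is exactly what any shared-versus-private comparison has to isolate.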

> I started with a query rate of 1 query per second (qps) which is the
> average equivalent of 10 clients. After two days I increased the
> query rate to 10qps (100 clients). After another day 100qps and so
> on. The average qa latency at 1qps is ~60ms, while it is ~3ms at
> 30kqps:

average alone can be very misleading. i suggest a more nuanced 
exposition, after first reviewing:

> 10 clients (1qps): ~60ms
> 100 clients (10qps): ~35ms
> 1.000 clients (100qps): ~25ms
> 10.000 clients (1kqps): ~10ms
> 100.000 clients (10kqps): ~6ms
> 300.000 clients (30kqps): ~3ms
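as a toy illustration of why average alone misleads (all numbers below are invented): two latency mixes with identical means but very different typical experience:

```python
# invented latency mixes, in milliseconds
mix_a = [3.0] * 95 + [100.0] * 5    # mostly cache hits, a few slow misses
mix_b = [7.85] * 100                # uniformly mediocre

def avg(xs):
    return sum(xs) / len(xs)

def median(xs):
    return sorted(xs)[len(xs) // 2]

print(avg(mix_a), avg(mix_b))        # identical means: 7.85 and 7.85
print(median(mix_a), median(mix_b))  # medians: 3.0 vs 7.85
```

the first mix is what a mostly-warm cache looks like; the average is dominated by the rare misses, while the typical lookup is fast. a percentile breakdown would show this; a single mean hides it.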

for an ISP shared-cache rdns, the speed-of-light delay from a stub to 
its rdns might be on the order of 3ms. for wide-area anycast rdns, even 
at the extraordinary density achieved by google or cisco/opendns, it is 
not. therefore some of the distinctions in your table are without a 
meaningful difference to the thread we're in today.

the reason i originally suggested math rather than measurements is that 
your assertion rests on a theory, and that theory can be modeled. my 
model says that for content i am revisiting often enough for its DNS 
latency to be meaningful to me as an end user, i'll hit the cache, 
unless the TTLs are artificially low (as in the CDN theory you gave).
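a minimal version of that model, assuming a single name revisited on a fixed interval (both parameters are illustrative, not measured):

```python
import math

# simple model (my assumption, not a measurement): a stub revisits one name
# every `revisit` seconds; the answer is cacheable for `ttl` seconds.
# each miss is followed by floor(ttl/revisit) hits before the entry expires.
def hit_fraction(revisit, ttl):
    hits_per_miss = math.floor(ttl / revisit)
    return hits_per_miss / (hits_per_miss + 1)

# content revisited often relative to its ttl: almost always a hit
print(hit_fraction(revisit=30, ttl=300))   # ~0.91
# artificially low cdn-style ttl: every lookup misses, even for popular content
print(hit_fraction(revisit=30, ttl=20))    # 0.0
```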

your challenge in putting a foundation under your intriguing assertion 
is to show that there is DNS content i don't revisit often enough for it to 
be in a non-shared cache, but that i do revisit often enough for it to 
be present in a shared cache of a certain user population (20000 or 
more). that's a very small keyhole.
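the keyhole can be stated as a pair of inequalities. the intervals below are illustrative assumptions; the 20000 population is the figure from the claim:

```python
# a name is "in the keyhole" when its ttl is too short to survive one
# stub's revisit interval (private-cache miss), yet long enough to survive
# the pooled revisit interval of n_users stubs sharing one cache
# (shared-cache hit). intervals here are assumptions for illustration.
def in_keyhole(revisit_s, ttl_s, n_users):
    private_hit = ttl_s >= revisit_s             # one stub, non-shared cache
    shared_hit = ttl_s >= revisit_s / n_users    # n stubs, one shared cache
    return (not private_hit) and shared_hit

print(in_keyhole(revisit_s=3600, ttl_s=300, n_users=20000))  # True: in the keyhole
print(in_keyhole(revisit_s=60, ttl_s=300, n_users=20000))    # False: private cache already hits
```

the claim needs a meaningful volume of real traffic to land inside that band of revisit intervals, which is the thing the measurements would have to show.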

finally there's the perception problem. most DNS traffic comes from web 
browsers, which are quite parallel in their lookups and resulting HTTP 
fetches. having the long pole in my DNS lookup tent be 60ms in some 
cases rather than 3ms will not be perceptible to me as a human user. so 
even if you can find a mode in a common-enough DNS flow where the average 
DNS lookup time really is 60ms, it wouldn't be compelling.

i await further findings.

P Vixie
