[dns-operations] DNS over TLS: slowly happening

Tue Jun 26 16:18:18 UTC 2018

On 26 Jun 2018, at 6:02, bert hubert wrote:

> Hi Björn,
>
> Good to see someone from the resolver community post on this list!
> Usually most of us are from the (cc)TLD or hosting community.

I’m certain there are more lurkers from the resolver community here 
watching with interest.

> On Tue, Jun 26, 2018 at 11:17:46AM +0000, Hellqvist, Björn wrote:
>> Has anyone done any real research with real-world numbers on the
>> server side when using DNS-over-TLS?
>
> Somewhat: 
> https://ripe76.ripe.net/presentations/92-RIPE76_DNS_Privacy_measurements.pdf
> and 
> https://ripe76.ripe.net/presentations/95-jonglez-dns-tcp-ripe76.pdf
>
> These numbers are not entirely 'real world' though.
>
>> And what happens during an attack, when each client opens a large
>> number of new unique connections?  Or if a vendor introduces a bug
>> that does not reuse the TCP connection and opens a new one each time
>> without closing the unused one?
>
> I personally recommend having a proxy do the dnsdist termination, this
> means that at worst the proxy fails. This has also been measured in the
> presentations above.

Bert: Did you mean “dns” instead of “dnsdist” in the above 
sentence?

>> Also how will this work in an ISP Anycast situation?
>
> DNS TCP is routinely anycast and this appears to work very well.

Agreed; we’ve not seen any complaints about broken sessions (we’ve
been running DNS-over-TLS since our public launch in November 2017),
though of course breakage could be occurring with some regularity
without being noticed. It seems that most DNS-over-TLS client
implementations reconnect aggressively enough to disguise any failures
due to path shifts. Most anycast paths are fairly stable, so this is
not as big a problem as it might seem.

I think a more significant problem arises with local distribution of
sessions across a large array of servers in a single POP, which is
something that the anycast provider needs to solve and which is
typically invisible to the client in the path. ECMP seems to be one way
to do this reliably, and I’m sure quite a bit of testing has been done
in the 80/443 world to validate this model. To date, this has not been
an issue for TCP sessions for us in testing or in production.
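
To illustrate why ECMP can keep a TCP/TLS session pinned to one server
inside a POP, here is a minimal sketch of flow hashing. It is an
illustration only: real routers use vendor-specific hash functions, and
the function name, the separator, and the choice of SHA-256 are all
assumptions made for this example.

```python
import hashlib


def ecmp_pick(src_ip, src_port, dst_ip, dst_port, proto, n_servers):
    """Map a flow 5-tuple to a backend index in [0, n_servers).

    Hypothetical helper: every packet of one TCP session carries the
    same 5-tuple, so it always hashes to the same backend.
    """
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % n_servers


# The same session always lands on the same server; a reconnect from a
# new source port may land on a different one.
first = ecmp_pick("192.0.2.10", 40000, "9.9.9.9", 853, "tcp", 8)
again = ecmp_pick("192.0.2.10", 40000, "9.9.9.9", 853, "tcp", 8)
```

Note that plain modulo hashing remaps many existing flows when the
number of servers changes; that is the case where established sessions
would break, and where something like consistent hashing would help.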

>> Personally I think that such studies should be done before any vendor
>> introduces this functionality.  The study should also account for
>> global DNS providers, ISP DNS providers and maybe enterprise DNS
>> infrastructure.
>
> People from the DNS Privacy Project are doing such measurements. It may
> also be possible to replay existing DNS traffic over DNS over TLS. I
> agree lots of measuring "in the real world" is required.

I’ll speak for Quad9 and say that quite a bit more work is needed on
predicting the impact that various client behaviors will have on
anycast TCP/TLS resolvers, mostly because we don’t yet know what most
clients will look like in final form (either DNS-over-TLS or DoH).
There will be a significant amount of development of new TCP-based DNS
clients in the next few years, and I suspect many will not get things
right on the first try, or will exhibit unfortunate behaviors because
developers think they’re being clever for the client when in fact they
are being detrimental to the server side of the equation.
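
As a rough illustration of the server-friendly behavior, here is a
minimal sketch of a client that sends several queries over one TLS
session instead of opening a new session per query. Port 853 and
9.9.9.9 come from this thread; the TLS name "dns.quad9.net", the
function names, and the timeout are assumptions for the example, and
real clients also need idle timeouts, pipelining, and retry logic.

```python
import socket
import ssl
import struct


def build_query(name, qtype=1, query_id=0):
    """Build a minimal DNS query (class IN, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame(message):
    """Prefix a DNS message with its two-byte length (TCP/TLS framing)."""
    return struct.pack("!H", len(message)) + message


def read_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_many(names, server="9.9.9.9", hostname="dns.quad9.net",
               port=853, timeout=5.0):
    """Send all queries over ONE TLS session; return raw responses."""
    context = ssl.create_default_context()
    responses = []
    with socket.create_connection((server, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            for name in names:
                tls.sendall(frame(build_query(name)))
                (length,) = struct.unpack("!H", read_exact(tls, 2))
                responses.append(read_exact(tls, length))
    return responses
```

Calling query_many(["example.com", "example.net"]) would cost the
server one TCP handshake and one TLS handshake in total, rather than
one of each per query.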

It might be useful to have a baseline test regimen that could be
developed as a best practices guideline for client implementations,
and which would include both the DNS and TCP components of the stack in
the test. DNS-OARC’s “Check My DNS”, extended to be used as a resolver
with some TLS and TCP timer behaviors, might be interesting (not that
the OARC folks are lacking anything to do…)  As a recursive operator,
it’s very difficult for me to tell clients “You’re broken!” without
some third-party site to refer them to that clearly shows the problem.
DNSViz and OARC’s CMDNS have been great examples of how this is useful
for other protocol implementations; such sites let implementers or
remote operators work out for themselves how to fix things.
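
One check such a test regimen might include is whether a client reuses
its connections at all. The following is a minimal sketch, assuming a
hypothetical server-side log with one (client, connection_id) entry per
query received; the function name and both thresholds are made up for
illustration.

```python
from collections import defaultdict


def flag_non_reusers(log, min_queries=10, threshold=1.5):
    """Return clients whose average queries per connection is low.

    log: iterable of (client, connection_id), one entry per query.
    A ratio close to 1 suggests a new connection for every query.
    """
    queries = defaultdict(int)
    connections = defaultdict(set)
    for client, conn_id in log:
        queries[client] += 1
        connections[client].add(conn_id)

    flagged = []
    for client, count in queries.items():
        ratio = count / len(connections[client])
        # Ignore clients with too few queries to judge fairly.
        if count >= min_queries and ratio < threshold:
            flagged.append(client)
    return sorted(flagged)


# One client opens a connection per query, the other reuses a single one.
sample_log = [("192.0.2.1", i) for i in range(20)]
sample_log += [("192.0.2.2", 1)] * 20
flagged_clients = flag_non_reusers(sample_log)
```

Here only 192.0.2.1 is flagged, since its 20 queries arrived on 20
separate connections.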

Our current load volumes on port 853 to 9.9.9.9 & 2620:fe::fe are still
quite small, even in European locations where DNS-over-TLS seems to be
most popular at the moment. There is still not enough data here to
build a comprehensive model that applies to gigabits per second of DNS
traffic, though certainly we’d be interested in working with
researchers who believe they could make extrapolations based on the
existing packet flows. Contact me off-list.

>> Although we should aim for privacy, we should not jump into a
>> solution that operators will actively disable due to resource and
>> cost limits.
>
> I'm afraid that if service providers will not make a move, the
> browsers of their subscribers will, and start preferring the DNS of
> their vendor or preferred partner, like CloudFlare.
>
> You mention disabling things, but DNS over HTTPS is specifically
> designed to be hard to disable.
>
> So the service provider community may not have a lot of choice, unless
> they are fine with third parties taking over their customers' DNS (this
> is a common choice in Africa for example).
>
>> For me this sounds more like a way to promote the Google DNS resolver
>> than thinking through all the other potentially problematic scenarios
>> that can happen when this is introduced.
>
> You may well be right. Whether this is an outcome we like or not is
> open to discussion...

The assumption of DNS being performed by the client’s transport 
operator (or even visible to the transport operator) is not a safe bet 
at this point under any circumstances, not today and certainly not in 
the future, regardless of transport protocol or encapsulation. There are 
many drivers for this disconnection, with some reasons more comforting 
than others, but it is clearly happening.

JT