[dns-operations] anycast ops willing to review an IETF draft?

Wed Mar 27 09:35:40 UTC 2019

Thanks for your feedback, Klaus. It sure helps us to get a better draft.

I'll cover all comments here except for those related to Vries17b
(Verfploeter), since I was not involved in that study. But one of my
draft co-authors was in fact involved, and he will address that in a
subsequent email.

Comments inline:
>> 2.  R1: Use equaly strong IP anycast in every authoritative server to
>>     achieve even load distribution
> 
> I do not understand this recommendation. Do you mean:
> 
> a) The service providing ns1.foobar should be as strong as the service
> providing ns2.foobar
> 
> or
> 
> b) All anycast sites providing ns1.foobar should be equaly strong.
> 
> I second a) bot not b).

Thanks for pointing this out; another reviewer had also pointed this out,
and I've opened an issue about that:
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/5

> "to achieve even load distribution"
> 
> The load distribution is done by the resolvers, hence can not be
> influenced by the authoritative servers.

Resolvers get to choose which of the NSes they will use; however, an
authoritative operator can influence resolvers' choices by engineering
NSes that deliver similar latency values.

For example, if NS1, NS2 and NS3 each deliver a median RTT of around
100ms to resolver Y, resolver Y will distribute its queries roughly
evenly among them.
See Mueller17b.
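
As a rough illustration of that point (a minimal sketch, not the
selection algorithm of any particular resolver and not taken from
Mueller17b), assume a hypothetical resolver that weights each NS by the
inverse of its median RTT. The function name query_shares and the
inverse-RTT weighting are assumptions made for this example only:

```python
def query_shares(median_rtt_ms):
    """Return the fraction of queries sent to each NS.

    Assumption: the resolver weights each NS by 1/RTT. Real resolvers
    use different (and varying) selection strategies.
    """
    weights = {ns: 1.0 / rtt for ns, rtt in median_rtt_ms.items()}
    total = sum(weights.values())
    return {ns: w / total for ns, w in weights.items()}


# Three NSes with the same median RTT towards resolver Y.
even = query_shares({"NS1": 100, "NS2": 100, "NS3": 100})
# NS1 is much closer to resolver Y than the other two.
skewed = query_shares({"NS1": 20, "NS2": 100, "NS3": 100})
print(even)
print(skewed)
```

With equal RTTs each NS gets one third of the queries; when NS1 is much
closer, it attracts most of them under this assumed weighting.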

>>
>>    DNS heavily relies upon replication to support high reliability,
>>    capacity and to reduce latency [Moura16b].  DNS has two complementary
>>    mechanisms to replicate the service.  First, the protocol itself
>>    supports nameserver replication of DNS service for a DNS zone through
>>    the use of multiple nameservers that each operate on different IP
>>    addresses, listed by a zone's NS records.  Second, each of these
>>    network addresses can run from multiple physical locations through
>>    the use of IP anycast[RFC1546][RFC4786][RFC7094], by announcing the
>>    same IP address from each instance and allowing Internet routing
>>    (BGP[RFC4271]) to associate clients with their topologically nearest
>>    anycast instance.  Outside the DNS protocol, replication can be
>>    achieved by deploying load balancers at each physical location.
>>    Nameserver replication is recommended for all zones (multiple NS
>>    records), and IP anycast is used by most large zones such as the DNS
>>    Root, most top-level domains[Moura16b] and large commercial
>>    enterprises, governments and other organizations.
> 
> The term replication is usually used for data replication. In the above
> sentence, if I got it right, you mean service replication. That should
> be stated explicitely, ie "DNS heavily relies on service replication to
> ..."

Disagree. Replication does not necessarily apply only to data. It can,
in the sense that caching can be seen as 'ephemeral data replication'.
But DNS also relies on IP/NS replication, besides caching.

> "Can" is wrong. I would rephase like: "Routing Matters More Than Locations"

"Can" is not wrong. While I agree with you that it would be true in
*most* cases, omitting "can" would imply that it is true in *all* cases,
which is something we did not investigate in the paper.

Just scientific precision.

>>    [Schmidt17a] found that C-Root, a smaller anycast deployment
>>    consisting of only 8 instances (they refer to anycast instance as
>>    anycast site), provided a very similar overall performance than that
>>    of the much larger deployments of K and L, with 33 and 144 instances
>>    respectively.  The median RTT for C, K and L Root was between
>>    30-32ms.
> 
> I think, this is suprising for "DNS guys" but not for network guys. I
> think most anycast DNS networks where started by "DNS guys" without
> deeper knowledge of Internet routing.

Right. This is a draft we submitted to DNSOP, so it's informative to them.

>>    [Schmidt17a] recommendation for DNS operators when engineering
>>    anycast services is consider factors other than just the number of
>>    instances (such as local routing connectivity) when designing for
>>    performance.  They showed that 12 instances can provide reasonable
>>    latency, given they are globally distributed and have good local
>>    interconnectivity.  However, more instances can be useful for other
>>    reasons, such as when handling DDoS attacks [Moura16b].
> 
> So, now you told us that "consider factors other than just the number of
> instances (such as local routing connectivity)", but were are the
> tangible recommendations? How shall the "local routing connectivity"
> look like to be a good "local routing connectivity"? That is missing.
> 
> Some practical hints about network (ie. homogeneus transits, IX
> connectivity, identical as-path ...) would be more useful - success is
> seen by C-root.

Good point. Issue opened on
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/7

> 
>> 4.  R3: Collecting Detailed Anycast Catchment Maps Ahead of Actual
>>     Deployment Can Improve Engineering Designs

As I said, I'll let my co-authors address Verfploeter-related comments.

>> 5.  R4: When under stress, employ two strategies
>>
>>    DDoS attacks are becoming bigger, cheaper, and more frequent
>>    [Moura16b].  The most powerful recorded DDoS attack to DNS servers to
>>    date reached 1.2 Tbps, by using IoT devices [Perlroth16].  Such
>>    attacks call for an answer for the following question: how should a
>>    DNS operator engineer its anycast authoritative DNS server react to
>>    the stress of a DDoS attack?  This question is investigated in study
>>    [Moura16b] in which empirical observations are grounded with the
>>    following theoretical evaluation of options.
>>
>>    An authoritative DNS server deployed using anycast will have many
>>    server instances distributed over many networks and instances.
>>    Ultimately, the relationship between the DNS provider's network and a
>>    client's ISP will determine which anycast instance will answer
>>    queries for a given client.
> 
> If the possible relationships and there routing consequences would be
> described, that would help DNS operators in planning.

What we refer to here is the BGP relationship between the two networks.
BGP determines which site will serve which resolver. I will add a
sentence on that to clarify it.

Issue opened on
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/8
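
To illustrate what I mean by BGP determining the site (a simplified
sketch only: real BGP best-path selection has more steps, and the sites,
AS numbers and preferences below are made up), assume the client's ISP
prefers the route with the highest local preference and then the
shortest AS path among the routes it learns for the anycast prefix:

```python
from dataclasses import dataclass


@dataclass
class Route:
    site: str          # anycast site that originated this announcement
    local_pref: int    # set by the ISP's own policy towards the neighbor
    as_path: tuple     # AS numbers the announcement traversed


def best_route(routes):
    # Simplified decision: highest local_pref wins, then shortest AS path.
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# Hypothetical routes for the same anycast prefix as seen by one ISP.
# AS numbers are from the range reserved for documentation.
routes_at_isp = [
    Route("AMS", local_pref=100, as_path=(64500, 64510)),
    Route("LAX", local_pref=200, as_path=(64501, 64502, 64510)),
]
catchment_site = best_route(routes_at_isp).site
print(catchment_site)
```

Here the ISP ends up at the LAX site even though AMS has the shorter AS
path, because its policy prefers the neighbor that announced the LAX
route.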

>>    [Moura16b] speculates that more careful, explicit, and automated
>>    management of policies may provide stronger defenses to overload, an
>>    area currently under study.  For DNS operators, that means that
>>    besides traditional filtering, two other options are available
>>    (withdraw/prepend/communities or isolate instances), and the best
>>    choice depends on the specifics of the attack.
> Null routing (BGP blackholing) can also be applied to NOT move the
> attack to other sites but avoid collateral damage.

Thanks.
Issue opened on
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/9

>>
>>    Therefore, given the important role of the TTL on user's experience
>>    during a DDoS attack (and in reducing ''friendly fire''), it is
>>    recommended that DNS zone owners set their TTL values carefully,
>>    using reasonable TTL values (at least 1 hour) whenever possible,
>>    given its role in DNS resilience against DDoS attacks.  However, the
>>    choice of the value depends on the specifics of each operator (CDNs
>>    are known for using TTL values in the range of few minutes).  The
>>    drawback of setting larger TTL values is that changes on the
>>    authoritative system infrastructure (e.g.: adding a new authoritative
>>    server or changing IP address) will take at least as long as the TTL
>>    to propagate among clients.
>
> I think it is also useful to avoid dependencies on other zones. IE.
> using in bailiwick name servers reduces dependiencies on other zones and
> the parent zone server glue records avoiding additional lookups.

We did not actually cover that in the research papers; addressing all
the consequences of this would require a different study.

thanks,

/giovane