[dns-operations] anycast ops willing to review a IETF draft?
giovane.moura at sidn.nl
Wed Mar 27 09:35:40 UTC 2019
Thanks for your feedback, Klaus. It certainly helps us get a better draft.
I'll cover all the comments here except those related to Vries17b
(Verfploeter), since I was not involved in that study. One of my
draft co-authors was involved, however, and he will address those in a
separate reply.
>> 2. R1: Use equaly strong IP anycast in every authoritative server to
>> achieve even load distribution
> I do not understand this recommendation. Do you mean:
> a) The service providing ns1.foobar should be as strong as the service
> providing ns2.foobar
> b) All anycast sites providing ns1.foobar should be equally strong.
> I second a) but not b).
Thanks for pointing this out; another reviewer had also noted it, and
I've opened an issue about that:
> "to achieve even load distribution"
> The load distribution is done by the resolvers, hence can not be
> influenced by the authoritative servers.
Resolvers get to choose which of the NSes they will use; however, an
authoritative operator can influence resolvers' choices by engineering
NSes that deliver similar latency.
For example, if NS1, NS2 and NS3 all deliver a median RTT of about
100 ms to resolver Y, resolver Y will distribute its queries evenly
among them.
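To make that intuition concrete, here is a toy sketch (my illustration, not any specific resolver's algorithm; real resolvers use smoothed-RTT bookkeeping) of inverse-RTT weighting: when all NSes offer similar latency the load splits roughly evenly, while a much closer NS attracts a disproportionate share.

```python
# Toy model (illustrative only): a resolver that prefers lower-RTT
# nameservers, modeled as inverse-RTT weighting of its queries.
def query_shares(rtts_ms):
    """Return each nameserver's share of queries under inverse-RTT weighting."""
    weights = [1.0 / rtt for rtt in rtts_ms]
    total = sum(weights)
    return [w / total for w in weights]

# NS1, NS2, NS3 all ~100 ms away: the load splits evenly (1/3 each).
even = query_shares([100, 100, 100])

# One NS at 20 ms, the others at 100 ms: the close NS dominates.
skewed = query_shares([20, 100, 100])
```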
>> DNS heavily relies upon replication to support high reliability,
>> capacity and to reduce latency [Moura16b]. DNS has two complementary
>> mechanisms to replicate the service. First, the protocol itself
>> supports nameserver replication of DNS service for a DNS zone through
>> the use of multiple nameservers that each operate on different IP
>> addresses, listed by a zone's NS records. Second, each of these
>> network addresses can run from multiple physical locations through
>> the use of IP anycast[RFC1546][RFC4786][RFC7094], by announcing the
>> same IP address from each instance and allowing Internet routing
>> (BGP[RFC4271]) to associate clients with their topologically nearest
>> anycast instance. Outside the DNS protocol, replication can be
>> achieved by deploying load balancers at each physical location.
>> Nameserver replication is recommended for all zones (multiple NS
>> records), and IP anycast is used by most large zones such as the DNS
>> Root, most top-level domains[Moura16b] and large commercial
>> enterprises, governments and other organizations.
> The term replication is usually used for data replication. In the above
> sentence, if I got it right, you mean service replication. That should
> be stated explicitly, i.e. "DNS heavily relies on service replication to
Disagree. Replication is not restricted to data. Caching, for example,
can be seen as 'ephemeral data replication'. But DNS relies on IP and
NS replication, besides caching.
> "Can" is wrong. I would rephrase it like: "Routing Matters More Than Locations"
"Can" is not wrong. While I agree with you that this would be true in
*most* cases, omitting "can" would imply that it is true in *all*
cases, which is something we did not investigate in the paper.
Just scientific precision.
>> [Schmidt17a] found that C-Root, a smaller anycast deployment
>> consisting of only 8 instances (they refer to anycast instance as
>> anycast site), provided a very similar overall performance than that
>> of the much larger deployments of K and L, with 33 and 144 instances
>> respectively. The median RTT for C, K and L Root was between
> I think this is surprising for "DNS guys" but not for network guys. I
> think most anycast DNS networks where started by "DNS guys" without
> deeper knowledge of Internet routing.
Right. This is a draft we submitted to DNSOP, so it's informative to them.
>> [Schmidt17a] recommendation for DNS operators when engineering
>> anycast services is consider factors other than just the number of
>> instances (such as local routing connectivity) when designing for
>> performance. They showed that 12 instances can provide reasonable
>> latency, given they are globally distributed and have good local
>> interconnectivity. However, more instances can be useful for other
>> reasons, such as when handling DDoS attacks [Moura16b].
> So, now you told us that "consider factors other than just the number of
> instances (such as local routing connectivity)", but where are the
> tangible recommendations? How shall the "local routing connectivity"
> look like to be a good "local routing connectivity"? That is missing.
> Some practical hints about the network (i.e. homogeneous transits, IX
> connectivity, identical as-path ...) would be more useful - success is
> seen by C-root.
Good point. Issue opened on
>> 4. R3: Collecting Detailed Anycast Catchment Maps Ahead of Actual
>> Deployment Can Improve Engineering Designs
As I said, I'll let my co-authors address the Verfploeter-related comments.
>> 5. R4: When under stress, employ two strategies
>> DDoS attacks are becoming bigger, cheaper, and more frequent
>> [Moura16b]. The most powerful recorded DDoS attack to DNS servers to
>> date reached 1.2 Tbps, by using IoT devices [Perlroth16]. Such
>> attacks call for an answer for the following question: how should a
>> DNS operator engineer its anycast authoritative DNS server react to
>> the stress of a DDoS attack? This question is investigated in study
>> [Moura16b] in which empirical observations are grounded with the
>> following theoretical evaluation of options.
>> An authoritative DNS server deployed using anycast will have many
>> server instances distributed over many networks and instances.
>> Ultimately, the relationship between the DNS provider's network and a
>> client's ISP will determine which anycast instance will answer
>> queries for a given client.
> If the possible relationships and their routing consequences were
> described, that would help DNS operators in planning.
What we refer to here is the BGP relationship between the two networks:
BGP determines which site will serve which resolver. We will add a
sentence on that to clarify it.
Issue opened on
>> [Moura16b] speculates that more careful, explicit, and automated
>> management of policies may provide stronger defenses to overload, an
>> area currently under study. For DNS operators, that means that
>> besides traditional filtering, two other options are available
>> (withdraw/prepend/communities or isolate instances), and the best
>> choice depends on the specifics of the attack.
> Null routing (BGP blackholing) can also be applied to NOT move the
> attack to other sites but avoid collateral damage.
Issue opened on
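For illustration, here is a toy decision rule (my sketch, not from [Moura16b]; the capacities and the rule itself are made up) contrasting the two strategies: withdraw the announcement at an overwhelmed site when the remaining sites have spare capacity to take over its catchment, otherwise keep the site announced so it absorbs its local share of the attack.

```python
# Toy decision rule (illustrative only) for an anycast site under DDoS.
#  - "withdraw": stop announcing the prefix at the overwhelmed site, so
#    BGP shifts its catchment to sites with spare capacity;
#  - "isolate": keep the site announced so it soaks up its local share
#    of the attack instead of pushing it onto other sites.
def choose_strategy(site_load_gbps, site_capacity_gbps, spare_elsewhere_gbps):
    if site_load_gbps <= site_capacity_gbps:
        return "absorb"      # site can handle its current catchment
    if spare_elsewhere_gbps >= site_load_gbps:
        return "withdraw"    # other sites can take the shifted load
    return "isolate"         # shifting would overload others; soak locally

choose_strategy(80, 40, 200)  # overwhelmed, others have headroom
choose_strategy(80, 40, 30)   # shifting the load would hurt other sites
```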
>> Therefore, given the important role of the TTL on user's experience
>> during a DDoS attack (and in reducing ''friendly fire''), it is
>> recommended that DNS zone owners set their TTL values carefully,
>> using reasonable TTL values (at least 1 hour) whenever possible,
>> given its role in DNS resilience against DDoS attacks. However, the
>> choice of the value depends on the specifics of each operator (CDNs
>> are known for using TTL values in the range of few minutes). The
>> drawback of setting larger TTL values is that changes on the
>> authoritative system infrastructure (e.g.: adding a new authoritative
>> server or changing IP address) will take at least as long as the TTL
>> to propagate among clients.
> I think it is also useful to avoid dependencies on other zones, i.e.
> using in-bailiwick name servers reduces dependencies on other zones, and
> the parent zone serves glue records, avoiding additional lookups.
We did not actually cover that in the research papers; it would take a
different study to address all the consequences of this.
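On the TTL point above, a back-of-the-envelope sketch (my numbers, assuming a simple uniform caching model, not a result from the draft): with a TTL of T seconds, a resolver that cached the record at a random moment before an attack can keep answering from cache for T/2 seconds on average, while any infrastructure change takes up to T seconds to reach all caches.

```python
# Back-of-the-envelope TTL arithmetic (illustrative, uniform-cache model).
def expected_cache_survival_s(ttl_s):
    """Average remaining cache lifetime at a random instant: TTL/2."""
    return ttl_s / 2.0

def worst_case_propagation_s(ttl_s):
    """Upper bound on how long stale data may be served after a change."""
    return ttl_s

expected_cache_survival_s(3600)  # 1-hour TTL: ~30 min of attack ridden out
worst_case_propagation_s(300)    # 5-min CDN-style TTL: changes converge fast
```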