[dns-operations] anycast ops willing to review a IETF draft?
Giovane Moura
giovane.moura at sidn.nl
Wed Mar 27 09:35:40 UTC 2019
Thanks for your feedback, Klaus. It sure helps us to get a better draft.
I'll cover all comments here except for those related to Vries17b
(Verfploeter), since I was not involved in that study. But one of my
draft co-authors was in fact involved, and he will address that in a
subsequent email.
Comments inline:
>> 2. R1: Use equally strong IP anycast in every authoritative server to
>> achieve even load distribution
>
> I do not understand this recommendation. Do you mean:
>
> a) The service providing ns1.foobar should be as strong as the service
> providing ns2.foobar
>
> or
>
> b) All anycast sites providing ns1.foobar should be equally strong.
>
> I second a) but not b).
Thanks for pointing this out; another reviewer raised the same point,
and I've opened an issue about it:
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/5
> "to achieve even load distribution"
>
> The load distribution is done by the resolvers, and hence cannot be
> influenced by the authoritative servers.
Resolvers get to choose which of the NSes they will use; however, an
authoritative DNS operator can influence resolvers' choices by
engineering NSes that deliver similar latency values.
For example, if NS1, NS2 and NS3 each deliver a median RTT of about
100 ms to resolver Y, resolver Y will distribute its queries evenly
among them.
See Mueller17b.
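To illustrate the point above, here is a minimal sketch under a loudly hypothetical assumption: server-selection algorithms differ between resolver implementations, and this toy resolver simply picks each NS with probability proportional to 1/RTT. The function name query_shares is made up for illustration; it is not any resolver's actual algorithm.

```python
"""Toy model of how a resolver might spread queries across a zone's NS set.

Hypothetical assumption: the resolver picks each authoritative server with
probability proportional to 1/RTT. Real resolvers use various algorithms.
"""
from typing import Dict


def query_shares(median_rtt_ms: Dict[str, float]) -> Dict[str, float]:
    """Return the expected fraction of queries sent to each NS."""
    # Lower RTT -> higher weight.
    weights = {ns: 1.0 / rtt for ns, rtt in median_rtt_ms.items()}
    total = sum(weights.values())
    return {ns: w / total for ns, w in weights.items()}


if __name__ == "__main__":
    # Equal RTTs, as in the NS1/NS2/NS3 example: an even three-way split.
    print(query_shares({"ns1": 100.0, "ns2": 100.0, "ns3": 100.0}))
    # One NS three times slower: it gets 1/7 of the queries, the others 3/7 each.
    print(query_shares({"ns1": 100.0, "ns2": 100.0, "ns3": 300.0}))
```

Under this model, the slower NS is not abandoned but simply receives a smaller share, which is why engineering NSes with similar latency is what evens out the load.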
>>
>> DNS heavily relies upon replication to support high reliability,
>> capacity and to reduce latency [Moura16b]. DNS has two complementary
>> mechanisms to replicate the service. First, the protocol itself
>> supports nameserver replication of DNS service for a DNS zone through
>> the use of multiple nameservers that each operate on different IP
>> addresses, listed by a zone's NS records. Second, each of these
>> network addresses can run from multiple physical locations through
>> the use of IP anycast[RFC1546][RFC4786][RFC7094], by announcing the
>> same IP address from each instance and allowing Internet routing
>> (BGP[RFC4271]) to associate clients with their topologically nearest
>> anycast instance. Outside the DNS protocol, replication can be
>> achieved by deploying load balancers at each physical location.
>> Nameserver replication is recommended for all zones (multiple NS
>> records), and IP anycast is used by most large zones such as the DNS
>> Root, most top-level domains[Moura16b] and large commercial
>> enterprises, governments and other organizations.
>
> The term replication is usually used for data replication. In the above
> sentence, if I got it right, you mean service replication. That should
> be stated explicitly, i.e., "DNS heavily relies on service replication to
> ..."
Disagree. Replication does not apply only to data. It can apply to data
(caching, for instance, can be seen as 'ephemeral data replication'),
but besides caching, DNS also relies on IP and NS replication.
> "Can" is wrong. I would rephrase it as: "Routing Matters More Than Locations"
"Can" is not wrong. While I agree with you that this would be true in
*most* cases, omitting "can" would imply that it holds in *all* cases,
which is something we did not investigate in the paper.
Just scientific precision.
>> [Schmidt17a] found that C-Root, a smaller anycast deployment
>> consisting of only 8 instances (they refer to anycast instances as
>> anycast sites), provided overall performance very similar to that
>> of the much larger deployments of K and L, with 33 and 144 instances,
>> respectively. The median RTT for C, K and L Root was between
>> 30 and 32 ms.
>
> I think this is surprising for "DNS guys" but not for network guys. I
> think most anycast DNS networks were started by "DNS guys" without
> deeper knowledge of Internet routing.
Right. This is a draft we submitted to DNSOP, so it's informative for them.
>> [Schmidt17a]'s recommendation for DNS operators when engineering
>> anycast services is to consider factors other than just the number of
>> instances (such as local routing connectivity) when designing for
>> performance. They showed that 12 instances can provide reasonable
>> latency, provided they are globally distributed and have good local
>> interconnectivity. However, more instances can be useful for other
>> reasons, such as when handling DDoS attacks [Moura16b].
>
> So, now you have told us to "consider factors other than just the number of
> instances (such as local routing connectivity)", but where are the
> tangible recommendations? What does good "local routing connectivity"
> look like? That is missing.
>
> Some practical hints about the network (i.e., homogeneous transits, IX
> connectivity, identical AS paths ...) would be more useful; C-Root's
> success shows what can be achieved.
Good point. Issue opened on
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/7
>
>> 4. R3: Collecting Detailed Anycast Catchment Maps Ahead of Actual
>> Deployment Can Improve Engineering Designs
As I said, I'll let my co-authors address Verfploeter-related comments.
>> 5. R4: When under stress, employ two strategies
>>
>> DDoS attacks are becoming bigger, cheaper, and more frequent
>> [Moura16b]. The most powerful recorded DDoS attack to DNS servers to
>> date reached 1.2 Tbps, by using IoT devices [Perlroth16]. Such
>> attacks call for an answer to the following question: how should a
>> DNS operator engineer its anycast authoritative DNS server to react to
>> the stress of a DDoS attack? This question is investigated in
>> [Moura16b], in which empirical observations are grounded with the
>> following theoretical evaluation of options.
>>
>> An authoritative DNS server deployed using anycast will have many
>> server instances distributed over many networks and instances.
>> Ultimately, the relationship between the DNS provider's network and a
>> client's ISP will determine which anycast instance will answer
>> queries for a given client.
>
> If the possible relationships and their routing consequences were
> described, that would help DNS operators in planning.
What we refer to here is the BGP relationship between the two networks:
BGP determines which site will serve which resolver. We will add a
sentence on that to clarify it.
Issue opened on
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/8
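As a rough illustration of how the BGP relationship, rather than geography, decides which site serves a resolver, here is a sketch that models only two steps of the BGP decision process: highest LOCAL_PREF, then shortest AS_PATH. The LOCAL_PREF values, site names, and the Route/best_route names are hypothetical; the ASNs are from the documentation range.

```python
"""Simplified sketch of how a client's ISP picks one anycast site.

Only two steps of the BGP decision process are modelled (hypothetical
simplification): highest LOCAL_PREF first, then shortest AS_PATH. Real
routers apply further tie-breakers (origin, MED, eBGP vs iBGP, IGP cost, ...).
"""
from dataclasses import dataclass
from typing import List

# Common convention, assumed here (operators set their own values):
# customer routes are preferred over peer routes, peers over providers.
LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}


@dataclass
class Route:
    site: str           # anycast site that originated the announcement
    relationship: str   # how the ISP learned it: customer, peer or provider
    as_path: List[int]  # AS_PATH as seen by the ISP


def best_route(routes: List[Route]) -> Route:
    """Return the route the ISP would install for the anycast prefix."""
    # Higher LOCAL_PREF wins; among equal LOCAL_PREF, the shorter AS_PATH wins.
    return max(routes, key=lambda r: (LOCAL_PREF[r.relationship], -len(r.as_path)))


if __name__ == "__main__":
    routes = [
        # Short path, but learned from a transit provider.
        Route("AMS", "provider", [64500, 64496]),
        # Longer path, learned over a peering link (e.g. at an IXP).
        Route("FRA", "peer", [64501, 64502, 64496]),
    ]
    # The peer route wins despite its longer AS_PATH: the business
    # relationship decides the catchment. Prints "FRA".
    print(best_route(routes).site)
```

In this model, AS-path prepending only changes the choice among routes with equal LOCAL_PREF, which is one reason prepending does not always move traffic as intended.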
>> [Moura16b] speculates that more careful, explicit, and automated
>> management of policies may provide stronger defenses to overload, an
>> area currently under study. For DNS operators, that means that
>> besides traditional filtering, two other options are available
>> (withdraw/prepend/communities or isolate instances), and the best
>> choice depends on the specifics of the attack.
> Null routing (BGP blackholing) can also be applied, not to move the
> attack to other sites, but to avoid collateral damage.
Thanks.
Issue opened on
https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/9
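The trade-off between these options can be sketched with a toy capacity model. Everything here is hypothetical: the capacities and loads are made-up numbers, excess queries are assumed to be dropped uniformly, link saturation is ignored, and withdrawing a site is assumed to shift its whole catchment to a single fallback site. The function names are illustrative only.

```python
"""Toy capacity model of three responses to a DDoS on one anycast site.

Hypothetical simplifications: fixed per-site query capacity, excess queries
dropped uniformly at random, no link saturation, and a withdrawn site's whole
catchment moves to one fallback site.
"""
from typing import Dict, Tuple

# site -> (legitimate queries/s, attack queries/s) arriving in its catchment
Load = Dict[str, Tuple[float, float]]


def served_fraction(legit: float, attack: float, capacity: float) -> float:
    """Fraction of legitimate queries answered at one site."""
    total = legit + attack
    return 1.0 if total <= capacity else capacity / total


def legit_served(load: Load, capacity: Dict[str, float], site: str,
                 strategy: str, fallback: str) -> Dict[str, float]:
    """Fraction of legitimate queries answered, per original catchment."""
    if strategy == "absorb":
        # Every site keeps its own catchment, overloaded or not.
        return {s: served_fraction(lq, aq, capacity[s]) for s, (lq, aq) in load.items()}
    if strategy == "withdraw":
        # `site` stops announcing the prefix; its catchment, attack included,
        # shifts to `fallback`.
        shifted = dict(load)
        legit_q, attack_q = shifted.pop(site)
        f_legit, f_attack = shifted[fallback]
        shifted[fallback] = (f_legit + legit_q, f_attack + attack_q)
        result = {s: served_fraction(lq, aq, capacity[s]) for s, (lq, aq) in shifted.items()}
        result[site] = result[fallback]  # its clients are now served by `fallback`
        return result
    if strategy == "blackhole":
        # Traffic towards `site` is dropped upstream: its clients lose service,
        # but the attack does not move to the other sites.
        result = {s: served_fraction(lq, aq, capacity[s])
                  for s, (lq, aq) in load.items() if s != site}
        result[site] = 0.0
        return result
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    load = {"AMS": (10.0, 200.0), "FRA": (10.0, 0.0)}  # AMS is under attack
    capacity = {"AMS": 100.0, "FRA": 100.0}
    for strategy in ("absorb", "withdraw", "blackhole"):
        print(strategy, legit_served(load, capacity, "AMS", strategy, "FRA"))
```

With these numbers, absorbing keeps FRA's clients fully served, withdrawing degrades both catchments, and blackholing sacrifices AMS's clients to protect FRA. If FRA had capacity for the combined load, withdrawing would serve everyone, which is why the best choice depends on the specifics of the attack.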
>>
>> Therefore, given the important role of the TTL in users' experience
>> during a DDoS attack (and in reducing ''friendly fire''), it is
>> recommended that DNS zone owners set their TTL values carefully,
>> using reasonable TTL values (at least 1 hour) whenever possible,
>> given its role in DNS resilience against DDoS attacks. However, the
>> choice of the value depends on the specifics of each operator (CDNs
>> are known for using TTL values of a few minutes). The
>> drawback of setting larger TTL values is that changes to the
>> authoritative system infrastructure (e.g., adding a new authoritative
>> server or changing an IP address) will take at least as long as the TTL
>> to propagate among clients.
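A toy model makes the TTL trade-off in the paragraph above concrete. It assumes, hypothetically, that when all authoritative servers become unreachable, the remaining lifetimes of resolvers' cached copies are spread uniformly between 0 and the TTL; real caches behave less neatly (evictions, serving stale data, varying query patterns). The function name still_cached is illustrative only.

```python
"""Toy model of how record TTL affects availability during an outage.

Hypothetical assumption: when the authoritative servers become unreachable,
the remaining lifetimes of cached copies are uniform over [0, TTL].
"""


def still_cached(ttl_s: float, outage_elapsed_s: float) -> float:
    """Fraction of resolvers that can still answer from cache."""
    if outage_elapsed_s >= ttl_s:
        return 0.0
    return 1.0 - outage_elapsed_s / ttl_s


if __name__ == "__main__":
    # 30 minutes into an outage: TTL of 1 hour vs. TTL of 5 minutes.
    for ttl in (3600, 300):
        print(ttl, still_cached(ttl, 1800))
```

Under this model, with a 1-hour TTL about half of the resolvers can still answer 30 minutes into an outage, while with a 5-minute TTL none can. The same TTL is also the delay the draft mentions before a change reaches all clients.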
>
> I think it is also useful to avoid dependencies on other zones, i.e.,
> using in-bailiwick name servers reduces dependencies on other zones, and
> the parent zone serves glue records, avoiding additional lookups.
We did not actually cover that in the research papers; addressing all
the consequences of this would require a different study.
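For readers unfamiliar with the point Klaus raises: whether a name server's name lies at or below the apex of the zone it serves (and therefore needs glue from the parent) can be checked by comparing DNS labels. This is a hypothetical helper for illustration only; it assumes plain ASCII names and uses the reserved .test TLD in the examples.

```python
"""Check whether a name-server name lies at or below a zone's apex.

Such NS names cannot be resolved without the zone itself, so the parent
must serve glue (A/AAAA) records for them. Hypothetical helper; assumes
plain ASCII names, compared case-insensitively as DNS does for ASCII.
"""
from typing import List


def labels(name: str) -> List[str]:
    """Split a domain name into lower-cased labels, ignoring a trailing dot."""
    return [label.lower() for label in name.rstrip(".").split(".") if label]


def is_at_or_below(ns_name: str, zone: str) -> bool:
    """True if ns_name equals zone or is a subdomain of it."""
    ns, apex = labels(ns_name), labels(zone)
    # Compare whole labels from the right, so "ns1.badexample.test" is not
    # mistaken for being under "example.test".
    return len(ns) >= len(apex) and ns[len(ns) - len(apex):] == apex


if __name__ == "__main__":
    for ns in ("ns1.example.test.", "NS2.Example.TEST",
               "ns1.dns-provider.test", "ns1.badexample.test"):
        print(ns, is_at_or_below(ns, "example.test."))
```

Comparing labels rather than string suffixes is the design choice that matters here: a plain endswith() check would wrongly accept "ns1.badexample.test".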
thanks,
/giovane