[dns-operations] anycast ops willing to review an IETF draft?
Klaus Darilion
klaus.mailinglists at pernau.at
Wed Mar 27 10:24:58 UTC 2019
Hi Giovane!
On 27.03.2019 at 10:35, Giovane Moura wrote:
> Thanks for your feedback, Klaus. It sure helps us to get a better draft.
>
> I'll cover all comments here except for those related to Vries17b
> (Verfploeter), since I was not involved in that study. But one of my
> draft co-authors was in fact involved, and he will address that in a
> subsequent email.
>
> Comments inline:
>>> 2. R1: Use equally strong IP anycast in every authoritative server to
>>> achieve even load distribution
>>
>> I do not understand this recommendation. Do you mean:
>>
>> a) The service providing ns1.foobar should be as strong as the service
>> providing ns2.foobar
>>
>> or
>>
>> b) All anycast sites providing ns1.foobar should be equally strong.
>>
>> I second a) but not b).
>
> Thanks for pointing this out; another reviewer had also pointed this out,
> and I've opened an issue about that:
> https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/5
>
>
>> "to achieve even load distribution"
>>
>> The load distribution is done by the resolvers, hence can not be
>> influenced by the authoritative servers.
>
> Resolvers get to choose which of the NSes they will use; however, an
> authoritative operator can influence resolvers' choices by engineering
> NSes that deliver similar latency values.
>
> For example, if NS1, NS2 and NS3 deliver median RTT like 100ms to
> resolver Y, resolver Y will evenly distribute its queries.
> See Mueller17b.
Indeed. I think I meant this because of the "Use equally strong" wording.
How is "strong" defined? I would relate "strong" to the number of servers
and CPU power, and this does not influence resolver behavior. If "strong"
relates to "small RTT", then you can achieve load distribution.
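To illustrate the RTT point, here is a minimal sketch, assuming a
hypothetical resolver that picks each nameserver with probability inversely
proportional to its measured RTT. Real resolvers use different and more
elaborate selection algorithms; the names ns1..ns3 and the RTT values are
made up for the example.

```python
import random
from collections import Counter


def pick_nameserver(rtts_ms, rng):
    """Pick one NS with probability inversely proportional to its RTT.

    This weighting is an assumption for illustration, not the algorithm
    of any particular resolver implementation.
    """
    names = list(rtts_ms)
    weights = [1.0 / rtts_ms[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def query_share(rtts_ms, queries=30000, seed=1):
    """Simulate `queries` lookups and return the fraction sent to each NS."""
    rng = random.Random(seed)
    counts = Counter(pick_nameserver(rtts_ms, rng) for _ in range(queries))
    return {name: counts[name] / queries for name in rtts_ms}


# Three NSes with the same median RTT: shares come out close to 1/3 each.
equal = query_share({"ns1": 100, "ns2": 100, "ns3": 100})

# One NS much closer than the others: it attracts most of the queries,
# regardless of how many servers or how much CPU stands behind each NS.
skewed = query_share({"ns1": 20, "ns2": 100, "ns3": 100})

print(equal)
print(skewed)
```

Under this assumption, the number of servers or the CPU power behind an NS
never enters the calculation; only the RTT seen by the resolver does.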
>>> DNS heavily relies upon replication to support high reliability,
>>> capacity and to reduce latency [Moura16b]. DNS has two complementary
>>> mechanisms to replicate the service. First, the protocol itself
>>> supports nameserver replication of DNS service for a DNS zone through
>>> the use of multiple nameservers that each operate on different IP
>>> addresses, listed by a zone's NS records. Second, each of these
>>> network addresses can run from multiple physical locations through
>>> the use of IP anycast[RFC1546][RFC4786][RFC7094], by announcing the
>>> same IP address from each instance and allowing Internet routing
>>> (BGP[RFC4271]) to associate clients with their topologically nearest
>>> anycast instance. Outside the DNS protocol, replication can be
>>> achieved by deploying load balancers at each physical location.
>>> Nameserver replication is recommended for all zones (multiple NS
>>> records), and IP anycast is used by most large zones such as the DNS
>>> Root, most top-level domains[Moura16b] and large commercial
>>> enterprises, governments and other organizations.
>>
>> The term replication is usually used for data replication. In the above
>> sentence, if I got it right, you mean service replication. That should
>> be stated explicitly, i.e. "DNS heavily relies on service replication to
>> ..."
>
> Disagree. Replication does not necessarily apply only to data. It can,
> as caching can be seen as 'ephemeral data replication'. But DNS relies
> on IP/NS replication, besides caching.
>
>
>
>
>> "Can" is wrong. I would rephrase it like: "Routing Matters More Than Locations"
>
> Can is not wrong. While I agree with you that in *most* cases that'd be
> true, omitting "can" would imply that it is true in *all* cases, which is
> something we did not investigate in the paper.
>
> Just scientific precision.
Just practical experience ;-)
Klaus
>>> [Schmidt17a] found that C-Root, a smaller anycast deployment
>>> consisting of only 8 instances (they refer to anycast instance as
>>> anycast site), provided overall performance very similar to that
>>> of the much larger deployments of K and L, with 33 and 144 instances
>>> respectively. The median RTT for C, K and L Root was between
>>> 30 and 32 ms.
>>
>> I think this is surprising for "DNS guys" but not for network guys. I
>> think most anycast DNS networks were started by "DNS guys" without
>> deeper knowledge of Internet routing.
>
> Right. This is a draft we submitted to DNSOP, so it's informative to them.
>
>
>>> [Schmidt17a] recommendation for DNS operators when engineering
>>> anycast services is to consider factors other than just the number of
>>> instances (such as local routing connectivity) when designing for
>>> performance. They showed that 12 instances can provide reasonable
>>> latency, given they are globally distributed and have good local
>>> interconnectivity. However, more instances can be useful for other
>>> reasons, such as when handling DDoS attacks [Moura16b].
>>
>> So, now you told us to "consider factors other than just the number of
>> instances (such as local routing connectivity)", but where are the
>> tangible recommendations? What should the "local routing connectivity"
>> look like to be good "local routing connectivity"? That is missing.
>>
>> Some practical hints about the network (e.g. homogeneous transits, IX
>> connectivity, identical AS paths ...) would be more useful - success is
>> seen by C-root.
>
> Good point. Issue opened on
> https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/7
>
>>
>>> 4. R3: Collecting Detailed Anycast Catchment Maps Ahead of Actual
>>> Deployment Can Improve Engineering Designs
>
> As I said, I'll let my co-authors address Verfploeter-related comments.
>>> 5. R4: When under stress, employ two strategies
>>>
>>> DDoS attacks are becoming bigger, cheaper, and more frequent
>>> [Moura16b]. The most powerful recorded DDoS attack against DNS servers
>>> to date reached 1.2 Tbps, by using IoT devices [Perlroth16]. Such
>>> attacks call for an answer to the following question: how should a
>>> DNS operator engineer its anycast authoritative DNS server to react to
>>> the stress of a DDoS attack? This question is investigated in study
>>> [Moura16b] in which empirical observations are grounded with the
>>> following theoretical evaluation of options.
>>>
>>> An authoritative DNS server deployed using anycast will have many
>>> server instances distributed over many networks and instances.
>>> Ultimately, the relationship between the DNS provider's network and a
>>> client's ISP will determine which anycast instance will answer
>>> queries for a given client.
>>
>> If the possible relationships and their routing consequences were
>> described, that would help DNS operators in planning.
>
> What we refer to here is the BGP relationship between the two networks.
> BGP determines which site will serve which resolver. I will add a
> sentence on that to clarify it.
>
> Issue open on
> https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/8
>
>
>>> [Moura16b] speculates that more careful, explicit, and automated
>>> management of policies may provide stronger defenses to overload, an
>>> area currently under study. For DNS operators, that means that
>>> besides traditional filtering, two other options are available
>>> (withdraw/prepend/communities or isolate instances), and the best
>>> choice depends on the specifics of the attack.
>> Null routing (BGP blackholing) can also be applied to NOT move the
>> attack to other sites but avoid collateral damage.
>
> Thanks.
> Issue opened on
> https://github.com/gmmoura/draft-moura-dnsop-authoritative-recommendations/issues/9
>
>
>>>
>>> Therefore, given the important role of the TTL in users' experience
>>> during a DDoS attack (and in reducing ''friendly fire''), it is
>>> recommended that DNS zone owners set their TTL values carefully,
>>> using reasonable TTL values (at least 1 hour) whenever possible,
>>> given its role in DNS resilience against DDoS attacks. However, the
>>> choice of the value depends on the specifics of each operator (CDNs
>>> are known for using TTL values in the range of a few minutes). The
>>> drawback of setting larger TTL values is that changes to the
>>> authoritative system infrastructure (e.g.: adding a new authoritative
>>> server or changing IP address) will take at least as long as the TTL
>>> to propagate among clients.
>>
>> I think it is also useful to avoid dependencies on other zones, i.e.
>> using in-bailiwick name servers reduces dependencies on other zones, and
>> the parent zone serves glue records, avoiding additional lookups.
>
> We did not actually cover that in the research papers; addressing all
> the consequences of this would be a different study.
>
>
> thanks,
>
> /giovane
>
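As a rough illustration of the in-bailiwick point quoted above, a minimal
sketch that flags NS names lying outside the zone they serve. The zone and
nameserver names are hypothetical, and the check is a plain label-suffix
comparison, not the full bailiwick rule a resolver applies.

```python
def is_in_bailiwick(ns_name, zone):
    """Return True if ns_name is the zone apex or a name below it.

    Comparison is case-insensitive and ignores a trailing dot. The
    leading "." in the suffix test keeps "notfoobar.example" from
    matching the zone "foobar.example".
    """
    ns = ns_name.rstrip(".").lower()
    z = zone.rstrip(".").lower()
    return ns == z or ns.endswith("." + z)


def external_dependencies(zone, ns_names):
    """List the NS names that need lookups in some other zone."""
    return [name for name in ns_names if not is_in_bailiwick(name, zone)]


# Hypothetical NS set for the zone "foobar.example".
ns_set = [
    "ns1.foobar.example.",
    "NS2.FooBar.Example",
    "ns3.notfoobar.example.",
    "ns.provider.test.",
]
outside = external_dependencies("foobar.example.", ns_set)
print(outside)
```

Names in the returned list are the ones for which the parent cannot simply
hand out glue for the zone itself, so resolving them depends on other zones
being reachable.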