[dns-operations] Questions on DNS Flag day 2020 proposal

Ondřej Surý ondrej at sury.org
Mon Jun 17 10:01:59 UTC 2019

Hi Davey,

> On 17 Jun 2019, at 11:04, Davey Song <songlinjian at gmail.com> wrote:
> Hi Ondřej,
> Thanks for your reply. I have to say that some of the comments I left in previous mails came from other people's complaints, like the saying “tyranny by the few”. I know it is a little exaggerated. I just took this chance to speak out for them.
> Reply inline.
> On Mon, 17 Jun 2019 at 15:58, Ondřej Surý <ondrej at sury.org <mailto:ondrej at sury.org>> wrote:
> The next DNS Flag Day can be reduced to a simple TL;DR: reduce the default EDNS Buffer Size to a level that doesn’t fragment, and then follow the original DNS on how to fall back to TCP.  
> OK. It's fine. I understand the background and agree to make changes to the specification of EDNS Buffer Size. My question is: how do we achieve this? Because in rfc6891#section-6.2.5, it says: 
>    A good compromise may be the use of an EDNS maximum 
>   payload size of 4096 octets as a starting point.
>    A requestor MAY choose to implement a fallback to smaller advertised
>    sizes to work around firewall or other network limitations.  A
>    requestor SHOULD choose to use a fallback mechanism that begins with
>    a large size, such as 4096. 

That’s operational guidance that nobody is following, as far as I know.  Also, you skipped the first part of the paragraph, which says:

>    Due to transaction overhead, it is not recommended to advertise an
>    architectural limit as a maximum UDP payload size.  Even on system
>    stacks capable of reassembling 64 KB datagrams, memory usage at low
>    levels in the system will be a concern.  A good compromise may be the
>    use of an EDNS maximum payload size of 4096 octets as a starting
>    point.

I.e. the good compromise, compared to picking 64k, is choosing 4k and going down from there.

> If we are talking about honoring RFC7766 (Proposed Standard), shouldn't we honor RFC6891 (Internet Standard) in the first place? Or shall we think about updating section 6.2.5 of RFC6891 before we propose a flag day?  People pay more attention to the IETF and honor IETF consensus more than a project of vendors.

I am saying we MUST honour RFC1035 first.
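
For concreteness, here is a minimal Python sketch of the behaviour under discussion: advertise a small EDNS buffer size over UDP, and retry over TCP only when the response has the TC bit set.  The function names, the 1232 default and the fixed query ID are assumptions made for illustration; this is not code from any resolver, and it handles IPv4 servers only.

```python
import socket
import struct

EDNS_UDP_SIZE = 1232   # assumed default; one of the sizes discussed in this thread
TC_BIT = 0x0200        # truncation flag in the DNS header flags field


def build_query(qname, qtype=1, udp_size=EDNS_UDP_SIZE, qid=0x1234):
    """Build a DNS query carrying an EDNS0 OPT record that advertises udp_size."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 1)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
        if label
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    # OPT RR: root owner name, TYPE=41, CLASS=UDP payload size, TTL=0, RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt


def is_truncated(message):
    """Return True if the TC bit is set in a DNS message."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)


def recv_exact(sock, count):
    """Read exactly count bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        data += chunk
    return data


def resolve(server, qname, timeout=2.0):
    """Query over UDP first; retry over TCP only if the answer was truncated."""
    query = build_query(qname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(65535)
    if not is_truncated(response):
        return response
    # DNS over TCP prefixes every message with a two-octet length field.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

Nothing here depends on the buffer size chosen: a smaller advertised size only changes how often the server sets TC, not how the client reacts to it.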

> Again, it seems like you are coming from the position that the next flag day is about switching to TCP. That has never been the case. DNS over TCP has been an integral part of DNS since day one; this is not anything new.
> Yes, it is not new in IETF documents, just as IPv6 is not new. We both know it and wish we could change things overnight. But it is new in operational reality for at least some registries that serve a huge population. So I suggest we take care when making any decision that may impact them. Or at least we should invite them to this discussion. 
> The switch to TCP will follow the standard DNS protocol, i.e. it happens only when the response over UDP has been truncated (as indicated by the TC bit).  There might be a slight increase in TCP connections for responses larger than the default EDNS buffer size, and I personally believe it’s fairly easy to mitigate this if you see it as a problem. The important part is also that if you are sending responses that fragment now, they are already broken in parts of the eyeball networks, especially when IPv6 is involved.
> IP Fragmentation is already broken, insecure and actively harmful to any UDP based protocol including DNS.
> I know it well. And I'm so keen to hear if there is a solution for it.
> I don’t really follow this analogy.  I think you come from wrong assumptions.
> Sorry. I made a wrong analogy. My focus is on the flag day approach... 
> Speaking about “tyranny by the few”...
> I heard it in some feedback. I disagree with that saying as well, but it is the impression some people get when a big decision is made without being well circulated. I think a good example is ICANN's KSK rollover: they put a lot of effort into communication. But for flag day 2019, the communication was not sufficient. It sounded like a direct order. 
> > On 17 Jun 2019, at 08:54, Sam <samwu at dnspod.com <mailto:samwu at dnspod.com>> wrote:
> > 
> > Did you guys ask for any feedback from Chinese operators before pushing the proposal?
> > 
> > I will be the first one to oppose this proposal, even though DNSPod already supports the TCP protocol. Any proposal without full and wide community discussion is a rogue proposal!
> > 
> > And do you guys know how many users and facilities will be affected in China?
> > 
> > NO, you don't.
> Again, DNS over TCP has been part of DNS from day one.  The open-source DNS implementations **had** to implement many workarounds for the other parties that have violated the standards.  The DNS Flag Day initiative is all about restoring the balance and putting the costs where they belong.  If you don’t want to fully follow the existing DNS standard, fine, it’s your choice, but don’t impose the costs of the workarounds on the other parties because you chose to be non-compliant.
> Again, if you guys hope to change existing practice suggested by the IETF (RFC6891 in this case), I think an initiative is OK for starting the communication, but an initiative project alone is not enough because it impacts others' business. I suggest we go to the IETF and get broader consensus on it. Why not?

Writing a BCP document was one of the plans of the initiative.

> Yes, the open-source implementations are partly at fault for making those workarounds in the first place, and part of taking responsibility for the current status is making a coordinated effort to fix the mess we are currently in.
> The big merit of open source is that you can choose whether to join and contribute to your own best interests.

And the biggest weakness comes from the fact that people rely on work being done by “someone else”, or that people usually only contribute stuff that scratches their own itch.  For a complex protocol such as DNS that many people rely on, this just cannot work.  All we ask is for other parties to be more vigilant about the existing standards, to make our life (as open-source maintainers) easier.  Sure, if I were Google or Microsoft and had a huge team of people, I could maintain whatever workarounds I chose, but I don’t.  I have to carefully select where to put the focus to improve the product, so that the code and protocol support does not look like a ball of glue and duct tape.

> No matter what your choice is, your situation does not become worse because of it (there is no penalty designed into it). However, DNS Flag Day "threatens" people to join or lose. I don't think that is in the open-source spirit.

There’s also the underlying economics of following or not following standards, and of where the cost of not following them should fall.  Maintaining code that does, for example, this:

>    When ``named`` first queries a remote server, it will advertise a UDP
>    buffer size of 512, as this has the greatest chance of success on the
>    first try.
>    If the initial response times out, ``named`` will try again with
>    plain DNS, and if that is successful, it will be taken as evidence
>    that the server does not support EDNS. After enough failures using
>    EDNS and successes using plain DNS, ``named`` will default to plain
>    DNS for future communications with that server. (Periodically,
>    ``named`` will send an EDNS query to see if the situation has
>    improved.)
>    However, if the initial query is successful with EDNS advertising a
>    buffer size of 512, then ``named`` will advertise progressively
>    larger buffer sizes on successive queries, until responses begin
>    timing out or ``edns-udp-size`` is reached.
>    The default buffer sizes used by ``named`` are 512, 1232, 1432, and
>    4096, but never exceeding ``edns-udp-size``. (The values 1232 and
>    1432 are chosen to allow for an IPv4/IPv6 encapsulated UDP message to
>    be sent without fragmentation at the minimum MTU sizes for Ethernet
>    and IPv6 networks.)

comes with associated costs (maintaining the code, debugging, …)
and also has an impact on the whole ecosystem (debugging what’s happening
on the wire).  DNS might also be the main factor behind the state
we are in: 37% of fragmented IPv6 just doesn’t work, because nobody
cared, since most DNS packets fit into non-fragmented traffic.
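
To make the numbers in the quoted documentation concrete, here is a small Python sketch of where 1232 and 1432 can come from, and of the step-up sequence it describes.  The header sizes are the fixed IPv4, IPv6 and UDP header lengths; reading “IPv4/IPv6 encapsulated” as IPv6 tunnelled in IPv4 is my interpretation, and `probe_ladder` is a hypothetical helper, not the actual ``named`` code.

```python
IPV4_HEADER = 20     # IPv4 header without options, in octets
IPV6_HEADER = 40     # fixed IPv6 header, in octets
UDP_HEADER = 8       # UDP header, in octets
IPV6_MIN_MTU = 1280  # minimum link MTU required by IPv6
ETHERNET_MTU = 1500  # standard Ethernet MTU


def max_udp_payload(mtu, *headers):
    """Largest UDP payload that fits in one packet of the given MTU."""
    return mtu - sum(headers)


# Plain IPv6 at the minimum MTU: 1280 - 40 - 8 = 1232
ipv6_minimum = max_udp_payload(IPV6_MIN_MTU, IPV6_HEADER, UDP_HEADER)

# IPv6 tunnelled in IPv4 over Ethernet (assumed reading): 1500 - 20 - 40 - 8 = 1432
tunnelled = max_udp_payload(ETHERNET_MTU, IPV4_HEADER, IPV6_HEADER, UDP_HEADER)


def probe_ladder(edns_udp_size, steps=(512, 1232, 1432, 4096)):
    """Buffer sizes to advertise in order, never exceeding edns_udp_size.

    Assumption: a step above the cap is replaced by the cap itself, which is
    one way to read "until ... ``edns-udp-size`` is reached".
    """
    return sorted({min(step, edns_udp_size) for step in steps})


print(ipv6_minimum, tunnelled)  # 1232 1432
print(probe_ladder(4096))       # [512, 1232, 1432, 4096]
print(probe_ladder(1232))       # [512, 1232]
```

With the cap set to 1232 the ladder has only two rungs, which is roughly the simplification a smaller default buffer size buys.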

