[dns-operations] For darpa.mil, EDNS buffer == 1232 is *too small*. :-(
brian.peter.dickson at gmail.com
Tue Apr 21 08:52:24 UTC 2020
On Tue, Apr 21, 2020 at 1:04 AM Petr Špaček <petr.spacek at nic.cz> wrote:
> On 21. 04. 20 9:00, Paul Vixie wrote:
> > On Tuesday, 21 April 2020 06:20:04 UTC Petr Špaček wrote:
> Unfortunately I can't, we never got to the root cause.
> It is the same story again and again:
> Probably one of the ISPs in the chain on the affected link was doing weird stuff
> with big packets. We as Knot Resolver developers were not "their customer"
> but merely "supplier of their customer" so they refused to talk to us, and
> their actual customer lost interest as soon as it started to work reliably
> for them. That's all we have, i.e. nothing.
I'm confused by the use of the definite and indefinite, which are in
..."the root cause" suggests a single instance that was being investigated.
..."again and again" suggests multiple occurrences.
If it was multiples, was it all involving a single network, perhaps?
Or can you clarify if this was only a single instance of this happening?
Can you share any other diagnostic information or observations?
Was there fragmentation, or just packet loss above a certain size?
(Fragmentation would suggest smaller-than-expected MTU or perhaps tunnels,
while packet loss would suggest possible MTU mismatch on a single link.)
Understanding whether this was operator error by an ISP, versus some other
non-error situation with real MTU below 1400, is important.
This all has very real and very serious consequences to the entirety of the
> No, I did mean "would":
> - OpenDNS's experience says that in data centers 1410 works.
> - Our experience says that outside of data centers 1410 does not always
Are there additional instances where 1410 did not work, or are you using
that single instance to support the "does not always work" position?
> Let's be precise here. The proposal on the table is to change _default
> values in configuration_.
> Nobody is proposing to impose "arbitrary maximum response buffer size" and
> weld it onto DNS software. Vendors are simply looking for defaults which
> work for them and their customer/user base.
Actually, it is both.
Just as a reminder: UDP responses will be sent with a not-to-exceed size of
MIN(configured authority max, requestor's max from EDNS0 UDP_BUFSIZE).
If the client has a smaller value than the server, that is the maximum size
of a response that the server will send.
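
A minimal sketch of the rule above, in Python. The function and parameter
names are illustrative only and are not taken from any DNS implementation:

```python
def effective_udp_limit(server_max: int, client_bufsize: int) -> int:
    """Not-to-exceed size of a UDP response, in bytes.

    server_max:     the authority's configured maximum (hypothetical name)
    client_bufsize: the UDP buffer size the requestor advertised via EDNS0
    """
    return min(server_max, client_bufsize)


def must_truncate(response_size: int, server_max: int, client_bufsize: int) -> bool:
    """True if the full response does not fit and would be sent with TC=1."""
    return response_size > effective_udp_limit(server_max, client_bufsize)


# A server willing to send 4096 bytes is still held to the client's 1232.
print(effective_udp_limit(4096, 1232))
print(must_truncate(1400, 4096, 1232))
```

Whichever side advertises the smaller value decides, so a small client-side
default applies to every server that client talks to.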
If that value is too small, it has consequences.
If that client uses that value toward all servers, it has consequences
for all servers' traffic to that client.
If that value comes from a default, and is used by the vast majority of that
package's operators, that affects traffic from all servers to all of those
If that package represents a large portion of the traffic from resolvers,
that's a big deal.
A significant shift in the amount of TCP traffic could occur due to TC=1
responses, with a non-linear relationship between the apparent decrease in
MTU, and the amount of TCP traffic.
Particularly with large RSA key sizes and large signatures, and a large
proportion of DNSSEC traffic, the impact could be severe.
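
To illustrate why the relationship can be non-linear, here is a rough Python
sketch. The response sizes below are invented purely for illustration; they
are not measured data and say nothing about real traffic. The point is only
that when many responses sit just under the current limit, a small decrease
in the limit pushes a disproportionate share of them over it:

```python
def tcp_fallback_fraction(response_sizes, udp_limit):
    """Fraction of responses larger than udp_limit (these would get TC=1)."""
    if not response_sizes:
        return 0.0
    truncated = sum(1 for size in response_sizes if size > udp_limit)
    return truncated / len(response_sizes)


# Hypothetical response sizes in bytes, made up for this example only.
sample_sizes = [300, 450, 600, 900, 1100, 1250, 1300, 1350, 1380, 1500]

# Compare a few candidate limits against the same made-up sample.
for limit in (1410, 1340, 1232):
    print(limit, tcp_fallback_fraction(sample_sizes, limit))
```

Running the same function over a real distribution of response sizes would
show how steep the curve actually is for a given zone or resolver population.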
If a DNS authority operator were to begin providing DNSSEC for their
customer base, DNSSEC deployment could jump from 1%-2% to 40% overnight.
(Hint: at least one major DNS hosting provider has strongly suggested this
is likely to occur quite soon.)
And with a 5% to 10% decrease in actual MTU (offered by clients in EDNS), the
proportion of TCP traffic could triple or worse.
(This adversely affects both clients and servers. Plus, anything that
adversely affects authority servers affects all resolvers, not just those
small MTU resolvers, i.e. can/will cause collateral damage.)
If 1410 is actually reasonable, we should not shy away from using that
The question is whether the oddball networks showing up are statistically
significant, and whether those should dictate the global DNS consensus
value for a minimum of "maximum MTU" for default.
So, pushing back by asking for more details, particularly with regard to
whether the observed situation(s) are ongoing, as well as some sense of the
prevalence or frequency of this, is entirely appropriate.
Please share as much as you know, particularly if this is a solved/resolved
problem that affected one customer only (even if root cause was never