[dns-operations] UDP fragmentation while not needed/wanted DS www.veilingzaalmelase.be
Joao Luis Silva Damas
joao at bondis.org
Fri Mar 26 08:39:20 UTC 2021
> On 25 Mar 2021, at 23:36, Paul Vixie <paul at redbarn.org> wrote:
> On Wed, Mar 24, 2021 at 11:25:48PM +0100, Florian Weimer wrote:
>> Proactive fragmentation irrespective of path MTU is required for
>> stateless IPv6 UDP services. Unlike IPv4, the network does not
>> fragment packets. So a UDP service has to conservatively fragment
>> around 1200 or so bytes (to account for tunnel overhead). ...
> 1280 subtracts several hundred bytes from 1500 to allow for tunnel
Actually it doesn’t. The IETF arrived at this number from the other side (small values for MTU). See the mail at the bottom for just what assumptions went into the parameters (1280 = 1024 + 256, because IPv4 was 512 + 64 and IPv6 needed to reduce the header overhead below that of IPv4. Please… 9600 bps...)
> 1232 subtracts dozens more to allow for IP and UDP headers
> and options.
That it does, unnecessarily. There is, and has been for a while, data out there showing bigger numbers are OK (for instance, Google published how they arrived at the size they use for QUIC packets, based on traffic they observed arriving at their network, a fairly decent piece of data).
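For reference, the arithmetic behind the 1232 figure can be sketched as below. The 40-byte fixed IPv6 header and the 8-byte UDP header are the standard sizes; the function name is only illustrative, and extension headers are assumed absent.

```python
IPV6_HEADER = 40  # fixed IPv6 header in bytes (assumes no extension headers)
UDP_HEADER = 8    # UDP header in bytes


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv6 packet."""
    return mtu - IPV6_HEADER - UDP_HEADER


# Compare the IPv6 minimum MTU with a plain Ethernet MTU.
for mtu in (1280, 1500):
    print(mtu, max_udp_payload(mtu))
```

With the minimum MTU of 1280 this gives 1232, and with an untunneled Ethernet MTU of 1500 it gives 1452.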
> please, nobody, subtract even more. if everyone subtracts a fudge
> factor, forever, we'll eventually run out of payload space, or worse,
> go negative.
> Paul Vixie
From owner-ipng at sunroof.eng.sun.com Thu Nov 13 16:41:01 1997
X-Sender: deering at postoffice.cisco.com
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 13 Nov 1997 16:37:00 -0800
To: IPng Working Group <ipng at sunroof.eng.sun.com>
From: Steve Deering <deering at cisco.com>
Subject: (IPng 4802) increasing the IPv6 minimum MTU
Cc: hinden at ipsilon.com
Sender: owner-ipng at eng.sun.com
In the ipngwg meeting in Munich, I proposed increasing the IPv6 minimum MTU
from 576 bytes to something closer to the Ethernet MTU of 1500 bytes, (i.e.,
1500 minus room for a couple layers of encapsulating headers, so that min-
MTU-size packets that are tunneled across 1500-byte-MTU paths won't be
subject to fragmentation/reassembly on ingress/egress from the tunnels,
in most cases).
After the short discussion in the Munich meeting, I called for a show of
hands, and of those who raised their hands (about half the attendees, if
I recall correctly), the vast majority were in favor of this change --
there were only two or three people opposed. However, we recognized that
a fundamental change of this nature requires thoughtful discussion and
analysis on the mailing list, to allow those who were not at the meeting
and those who were there but who have since had second thoughts, to express
their opinions. A couple of people have already, in private conversation,
raised some concerns that were not identified in the discussion at the
meeting, which I report below. We would like to get this issue settled as
soon as possible, since this is the only thing holding up the publication
of the updated Proposed Standard IPv6 spec (the version we expect to advance
to Draft Standard), so let's see if we can come to a decision before the ID
deadline at the end of next week (hoping there isn't any conflict between
"thoughtful analysis" and "let's decide quickly" :-).
The reason I would like to increase the minimum MTU is that there are some
applications for which Path MTU Discovery just won't work very well, and
which will therefore limit themselves to sending packets no larger than
the minimum MTU. Increasing the minimum MTU would improve the bandwidth
efficiency, i.e., reduce the header overhead (ratio of header bytes to
payload bytes), for those applications. Some examples of such applications are:
(1) Large-fanout, high-volume multicast apps, such as multicast video
("Internet TV"), multicast netnews, and multicast software
distribution. I believe these applications will end up limiting
themselves to packets no larger than the min MTU in order to avoid
the danger of incurring an "implosion" of ICMP Packet-Too-Big
messages in response. Even though we have specified that router
implementations must carefully rate-limit the emission of ICMP
error messages, I am nervous about how well this will work in
practice, especially once there is a lot of high-speed, bulk
multicasting happening. An appropriate choice of rate or
probability of emission of Packet-Too-Big responses to multicasts
really depends on the fan-out of the multicast trees and the MTUs of
all the branches in that tree, which is unknown and unknowable to
the routers. Being sensibly conservative by choosing a very low
rate could, in many cases, significantly increase the delay before
the multicast source learns the right MTU for the tree and, hence,
before receivers on smaller-MTU branches can start receiving the
(2) DNS servers, or other similar apps that have the requirement of
sending a small amount of data (a few packets at most) to a very
large and transient set of clients. Such servers often reside on
links, such as Ethernet, that have an MTU bigger than the links on
which many of their clients may reside, such as dial-up links. If
those servers were to send many reply messages of the size of their
own links (as required by PMTU Discovery), they could incur very
many ICMP packet-too-big messages and consequent retransmissions of
the replies -- in the worst case, multiplying the total bandwidth
consumption (and delivery delay) by 2 or 3 times that of the
alternative approach of just using the min MTU always. Furthermore,
the use of PMTU Discovery could result in such servers filling up
lots of memory with cached PMTU information that will never be
used again (at least, not before it gets garbage-collected).
The number I propose for the new minimum MTU is 1280 bytes (1024 + 256,
as compared to the classic 576 value which is 512 + 64). That would
leave generous room for encapsulating/tunnel headers within the Ethernet
MTU of 1500, e.g., enough for two layers of secure tunneling including
both ESP and AUTH headers.
For medium-to-high speed links, this change would reduce the IPv6 header
overhead for min MTU packets from 7% to 3% (a little less than the IPv4
header overhead for 576-byte IPv4 packets). For low-speed links such as
analog dial-up or low-speed wireless, I assume that header compression will
be employed, which compresses out the IPv6 header completely, so the IPv6
header overhead on such links is effectively zero in any case.
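As a rough check of the percentages above, here is a small sketch that uses the definition given earlier in this message (ratio of header bytes to payload bytes). It assumes a 40-byte IPv6 header and a 20-byte IPv4 header with no options or extension headers; the function name is illustrative only.

```python
IPV4_HEADER = 20  # bytes, assuming no IPv4 options
IPV6_HEADER = 40  # bytes, assuming no extension headers


def header_overhead(header: int, packet: int) -> float:
    """Ratio of header bytes to payload bytes for one packet."""
    return header / (packet - header)


cases = [
    ("IPv6, 576-byte packet", IPV6_HEADER, 576),
    ("IPv6, 1280-byte packet", IPV6_HEADER, 1280),
    ("IPv4, 576-byte packet", IPV4_HEADER, 576),
]
for label, header, packet in cases:
    print(f"{label}: {100 * header_overhead(header, packet):.1f}%")

# Room left for encapsulating headers inside a 1500-byte Ethernet MTU.
print("tunnel headroom:", 1500 - 1280, "bytes")
```

Under these assumptions the three ratios come to roughly 7.5%, 3.2% and 3.6%, and the headroom is 220 bytes.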
Here is a list of *disadvantages* to increasing the IPv6 minimum MTU that
have been raised, either publicly or privately:
(1) This change would require the specification of link-specific
fragmentation and reassembly protocols for those link-layers
that can support 576-byte packets but not 1280-byte packets,
e.g., AppleTalk. I think such a protocol could be very simple,
and I briefly sketch such a protocol in Appendix I of this
message, as an example.
Often, those links that have a small native MTU are also the ones
that have low bandwidth. On low-bandwidth links, it is often
desirable to locally fragment and reassemble IPv6 packets anyway
(even 576-byte ones) in order to avoid having small, interactive
packets (e.g., keystrokes, character echoes, or voice samples)
be delayed excessively behind bigger packets (e.g., file transfers);
the small packets can be interleaved with the fragments of the
big packets. Someone mentioned in the meeting in Munich that the
ISSLL WG was working on a PPP-specific fragmentation and
reassembly protocol for precisely this reason, so maybe the job
of specifying such a protocol is already being taken care of.
(2) Someone raised the concern that, if we make the minimum MTU close
to Ethernet size, implementors might never bother to implement PMTU
Discovery. That would be regrettable, especially if the Internet
evolves to much more widespread use of links with MTUs bigger
than Ethernet's, since IPv6 would then fail to take advantage of
the bandwidth efficiencies possible on larger MTU paths.
(3) Peter Curran pointed out to me that using a larger minimum MTU for
IPv6 may result in much greater reliance on *IPv4* fragmentation and
reassembly during the transition phase while much of the IPv6
traffic is being tunneled over IPv4. This could incur unfortunate
performance penalties for tunneled IPv6 traffic (disastrous
penalties if there is non-negligible loss of IPv4 fragments).
I have included Peter's message, describing his concern in more
detail, in Appendix II of this message.
(4) Someone expressed the opinion that the requirement for link-layer
fragmentation and reassembly of IPv6 over low-cost, low-MTU links
like Firewire, would doom the potential use of IPv6 in cheap
consumer devices in which minimizing code size is important --
implementors of cheap Firewire devices would choose IPv4 instead,
since it would not need a fragmenting "shim" layer. This may well
be true, though I suspect the code required for local frag/reasm
would be negligible compared to the code required for Neighbor
Personally, I am not convinced by the above concerns that increasing the
minimum MTU would be a mistake, but I'd like to hear what the rest of the
WG thinks. Are there other problems that anyone can think of? As I
mentioned earlier, the clear consensus of the Munich attendees was to
increase the minimum MTU, so we need to find out if these newly-identified
problems are enough to swing the consensus in the other direction. Your
feedback is heartily requested.
Here is a sketch of a fragmentation and reassembly protocol (call it FRP)
to be employed between the IP layer and the link layer of a link with native
(or configured) MTU less than 1280 bytes.
Identify a Block Size, B, which is the lesser of (a) the native MTU of the
link or (b) a value related to the bandwidth of the link, chosen to bound
the latency that one block can impose on a subsequent block. For example,
to stay within a latency of 200 ms on a 9600 bps link, choose a block size
of .2 * 9600 = 1920 bits = 240 bytes.
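The block-size rule can be sketched as follows. The function and parameter names are illustrative, the latency is given in whole milliseconds to keep the arithmetic in integers, and 8 bits per byte on the wire is assumed (no start/stop bits).

```python
def block_size(native_mtu: int, link_bps: int, max_latency_ms: int) -> int:
    """Lesser of the native MTU and the bytes sendable within the latency bound."""
    # bits sendable = link_bps * (max_latency_ms / 1000); divide by 8 for bytes.
    latency_bound_bytes = (link_bps * max_latency_ms) // 8000
    return min(native_mtu, latency_bound_bytes)


# 9600 bps serial link, 200 ms bound; the native MTU of 1500 is an assumed value.
print(block_size(native_mtu=1500, link_bps=9600, max_latency_ms=200))
```

For the 9600 bps example the latency bound is the smaller of the two values, so B comes out at 240 bytes.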
IPv6 packets of length <= B are transmitted directly on the link.
IPv6 packets of length > B are fragmented into blocks of size B
(the last block possibly being shorter than B), and those fragments
are transmitted on the link with an FRP header containing the following fields:
[packet ID, block number, end flag]
packet ID is the same for all fragments of the same packet,
and is incremented for each new fragmented packet. The size of
the packet ID field limits how many packets can be in flight or
interleaved on the link at any one time.
block number identifies the blocks within a packet, starting at
block zero. The block number field must be large enough to
identify 1280/B blocks.
end flag is a one-bit flag which is used to mark the last block
of a packet.
For example, on a 9600 bps serial link, one might use a block size of
240 bytes and an 8-bit FRP header of the following format:
4-bit packet ID, which allows interleaving of up to 16 packets.
3-bit block number, to identify blocks numbered 0 through 5.
1-bit end flag.
On a 256 kbps AppleTalk link, one might use the AppleTalk-imposed block
size of ~580 bytes and an 8-bit FRP header of the following format:
5-bit packet ID, which allows for up to 32 fragmented packets in
flight from each source across the AppleTalk internet.
2-bit block number, to identify blocks numbered 0 through 2.
1-bit end flag.
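The two example header layouts could be packed into one octet along these lines. This is only a sketch: the bit order (packet ID in the high bits, end flag in the lowest bit) is an assumption, since the message does not specify one, and the function names are illustrative.

```python
import math
from typing import Tuple


def blocks_needed(packet_len: int, block: int) -> int:
    """Number of blocks of size `block` needed to carry `packet_len` bytes."""
    return math.ceil(packet_len / block)


def pack_header(packet_id: int, block_no: int, end: bool,
                id_bits: int = 4, block_bits: int = 3) -> int:
    """Pack [packet ID, block number, end flag] into one octet (assumed bit order)."""
    if id_bits + block_bits + 1 != 8:
        raise ValueError("field widths must add up to 8 bits")
    if not 0 <= packet_id < (1 << id_bits):
        raise ValueError("packet ID does not fit in its field")
    if not 0 <= block_no < (1 << block_bits):
        raise ValueError("block number does not fit in its field")
    return (packet_id << (block_bits + 1)) | (block_no << 1) | int(end)


def unpack_header(octet: int, id_bits: int = 4,
                  block_bits: int = 3) -> Tuple[int, int, bool]:
    """Inverse of pack_header: returns (packet ID, block number, end flag)."""
    end = bool(octet & 1)
    block_no = (octet >> 1) & ((1 << block_bits) - 1)
    packet_id = (octet >> (block_bits + 1)) & ((1 << id_bits) - 1)
    return packet_id, block_no, end


# Serial link: 1280-byte packets in 240-byte blocks need 6 blocks (0 through 5).
print(blocks_needed(1280, 240), unpack_header(pack_header(15, 5, True)))
# AppleTalk: 580-byte blocks need 3 blocks (0 through 2), with a 5-bit packet ID.
print(blocks_needed(1280, 580), unpack_header(pack_header(31, 2, False, 5, 2), 5, 2))
```

The block counts match the ranges given in the two examples: a 3-bit field covers blocks 0 through 5, and a 2-bit field covers blocks 0 through 2.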
On a multi-access link, like AppleTalk, the receiver uses the link-level
source address as well as the packet ID to identify blocks belonging to
the same packet.
If a receiver fails to receive all of the blocks of a packet by the time
the packet number wraps around, it discards the incompletely-reassembled
packet. Taking this approach, no timers should be needed at the receiver
to detect fragment loss. We expect the transport layer (e.g., TCP) checksum
at the final IPv6 destination to detect mis-assembly that might be caused by
extreme misordering/delay during transit across the link.
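A minimal sketch of the fragmentation and timer-free reassembly described above follows. Fragments are modeled as tuples rather than packed headers, one Reassembler instance is assumed per link-level source, and the way wrap-around is detected (counting how many other packet IDs have started since a packet was first seen) is one possible reading, not something the text prescribes.

```python
from typing import Dict, List, Optional, Tuple

Fragment = Tuple[int, int, bool, bytes]  # (packet ID, block number, end flag, data)


def fragment(packet: bytes, packet_id: int, block: int) -> List[Fragment]:
    """Split a packet into blocks of at most `block` bytes, flagging the last one."""
    chunks = [packet[i:i + block] for i in range(0, len(packet), block)]
    last = len(chunks) - 1
    return [(packet_id, n, n == last, chunk) for n, chunk in enumerate(chunks)]


class Reassembler:
    """Reassembly state for one link-level source, with no timers."""

    def __init__(self, id_space: int) -> None:
        self.id_space = id_space  # e.g. 16 for a 4-bit packet ID
        self.partial: Dict[int, Dict[int, Tuple[bool, bytes]]] = {}
        self.age: Dict[int, int] = {}  # packet IDs started since this one began

    def receive(self, frag: Fragment) -> Optional[bytes]:
        packet_id, block_no, end, data = frag
        if packet_id not in self.partial:
            # A new packet ID has started. Age the incomplete packets and drop
            # any whose ID is about to be reused (assumed wrap-around test).
            for pid in list(self.partial):
                self.age[pid] += 1
                if self.age[pid] >= self.id_space - 1:
                    del self.partial[pid], self.age[pid]
            self.partial[packet_id] = {}
            self.age[packet_id] = 0
        blocks = self.partial[packet_id]
        blocks[block_no] = (end, data)
        last = max(blocks)
        # Complete when the highest block carries the end flag and none is missing.
        if blocks[last][0] and len(blocks) == last + 1:
            del self.partial[packet_id], self.age[packet_id]
            return b"".join(blocks[n][1] for n in range(last + 1))
        return None


packet = bytes(range(256)) * 5  # a 1280-byte packet
receiver = Reassembler(id_space=16)
results = [receiver.receive(f) for f in reversed(fragment(packet, 7, 240))]
print(results[-1] == packet)
```

If every fragment of some intermediate packet is lost, this counting scheme keeps stale state longer than intended, which is the kind of mis-assembly the transport checksum is expected to catch.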
On links on which IPv6 header compression is being used, compression is
performed before fragmentation, and reassembly is done before decompression.
From: Peter Curran <peter at gate.ticl.co.uk>
Subject: Re: IPv6 MTU issue
To: deering at cisco.com (Steve Deering)
Date: Mon, 22 Sep 1997 11:50:34 +0100 (BST)
My problem was that moving the MTU close to 1500 would have an adverse
effect on the transition strategy. The current strategy assumes that the
typical Internet MTU is >576, and that sending an IPv6 packet close to the
minimum MTU will not require any IPv4 fragmentation to support the tunnel
transparently. The PMTU discovery mechanism will 'tune' IPv6 to use a
If the IPv4 MTU is <= 576 then IPv4 fragmentation will be required to
provide a tunnel with a minimum MTU of 576 for IPv6. This clearly places
a significant strain on the tunnelling nodes - as these will normally be
routers then there will be a demand for memory (for reassembly buffers)
as well as CPU (for the frag/reassembly process) that will have an overall
impact on performance.
This is an acceptable risk, as Internet MTU's of <= 576 are not too common.
However, if the minimum MTU of IPv6 is increased to something of the order
of 1200-1500 octets then the likelihood of finding an IPv4 path with an
MTU lower than this value increases (I think significantly) and this will
have a performance impact on these devices.
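The concern can be illustrated with a small check of whether an IPv6 packet tunneled in IPv4 fits the IPv4 path without fragmentation. This is a sketch that assumes plain IPv6-in-IPv4 encapsulation with a 20-byte IPv4 header and no options; the function name is illustrative.

```python
IPV4_HEADER = 20  # bytes added by IPv6-in-IPv4 encapsulation, assuming no options


def tunnel_needs_ipv4_fragmentation(ipv6_packet_len: int, ipv4_path_mtu: int) -> bool:
    """True if the encapsulated packet exceeds the IPv4 path MTU."""
    return ipv6_packet_len + IPV4_HEADER > ipv4_path_mtu


# Minimum-size IPv6 packets under the old (576) and proposed (1280) minimum MTU,
# over IPv4 paths with MTUs of 576 and 1500.
for ipv6_min_mtu in (576, 1280):
    for ipv4_mtu in (576, 1500):
        needed = tunnel_needs_ipv4_fragmentation(ipv6_min_mtu, ipv4_mtu)
        print(ipv6_min_mtu, ipv4_mtu, needed)
```

With a 1280-byte minimum, any IPv4 path MTU below 1300 forces the tunnel endpoints to fragment and reassemble, whereas with 576 only paths below 596 do.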
During the brief discussion of this matter in the IPNG session at Munich
you stated that MTU's less than 1500 were rare. I don't agree with this
completely - it seems to be pretty common practise for smaller 2nd and 3rd
tier ISP's in the UK to use an MTU of 576 for connection to their transit
provider. Their objective, I believe, is to 'normalize' the packet sizes
on relatively low bandwidth circuits (typically <1Mbps) to provide better
performance for interactive sessions compared to bulk-file transfer users.
I think that before we go ahead and make a decision on an increased minimum
MTU for IPv6 then we should discuss the issues a little more.
Incidentally, I am not convinced of the benefits of doing this anyway
(ignoring the issue raised above). With a properly setup stack the PMTU
discovery mechanism seems to be able to select a good MTU for use on the
path - at least that is my experience on our test network and the 6Bone.
I appreciate that you are trying to address the issues of PMTU for
multicasting but I don't see how raising the minimum MTU is going to help much.
PMTU discovery will still be required irrespective of the minimum MTU
adopted, unless we adopt a value that can be used on all link-layer technologies.
I would welcome wider discussion of these issues before pressing ahead
with a change.