Netdev Archive on lore.kernel.org
From: David Ahern <dsahern@gmail.com>
To: Stefano Brivio <sbrivio@redhat.com>,
"David S. Miller" <davem@davemloft.net>
Cc: Florian Westphal <fw@strlen.de>,
Aaron Conole <aconole@redhat.com>,
Numan Siddique <nusiddiq@redhat.com>,
Jakub Kicinski <kuba@kernel.org>,
Pravin B Shelar <pshelar@ovn.org>,
Roopa Prabhu <roopa@cumulusnetworks.com>,
Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
Lourdes Pedrajas <lu@pplo.net>,
netdev@vger.kernel.org
Subject: Re: [PATCH net-next v2 2/6] tunnels: PMTU discovery support for directly bridged IP packets
Date: Tue, 4 Aug 2020 07:54:37 -0600
Message-ID: <437077cc-8c3c-79de-3475-6c717001d8ae@gmail.com>
In-Reply-To: <83e5876f589b0071638630dd93fbe0fa6b1b257c.1596520062.git.sbrivio@redhat.com>
On 8/3/20 11:53 PM, Stefano Brivio wrote:
> It's currently possible to bridge Ethernet tunnels carrying IP
> packets directly to external interfaces without assigning them
> addresses and routes on the bridged network itself: this is the case
> for UDP tunnels bridged with a standard bridge or by Open vSwitch.
>
> PMTU discovery is currently broken with those configurations, because
> the encapsulation effectively decreases the MTU of the link, and
> while we are able to account for this using PMTU discovery on the
> lower layer, we don't have a way to relay ICMP or ICMPv6 messages
> needed by the sender, because we don't have valid routes to it.
>
> On the other hand, as a tunnel endpoint, we can't fragment packets
> as a general approach: this is for instance clearly forbidden for
> VXLAN by RFC 7348, section 4.3:
>
>     VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may
>     fragment encapsulated VXLAN packets due to the larger frame size.
>     The destination VTEP MAY silently discard such VXLAN fragments.
>
> The same paragraph recommends that the MTU over the physical network
> accommodates for encapsulations, but this isn't a practical option for
> complex topologies, especially for typical Open vSwitch use cases.
>
> Further, it states that:
>
>     Other techniques like Path MTU discovery (see [RFC1191] and
>     [RFC1981]) MAY be used to address this requirement as well.
>
> Now, PMTU discovery already works for routed interfaces: we get
> route exceptions created by the encapsulation devices as they receive
> ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
> we already rebuild those messages with the appropriate MTU and route
> them back to the sender.
>
> Add the missing bits for bridged cases:
>
> - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
> to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
> RFC 4443 section 2.4 for ICMPv6. This function is already called by
> UDP tunnels
>
> - a new function generating those ICMP or ICMPv6 replies. We can't
> reuse icmp_send() and icmp6_send() as we don't see the sender as a
> valid destination. This doesn't need to be generic, as we don't
> cover any other type of ICMP errors given that we only provide an
> encapsulation function to the sender
>
> While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
> we might receive GSO buffers here, and the passed headroom already
> includes the inner MAC length, so we don't have to account for it
> a second time (that would imply three MAC headers on the wire, but
> there are just two).
>
> This issue became visible while bridging IPv6 packets with 4500 bytes
> of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
> bytes of encapsulation headroom, we would advertise MTU as 3950, and
> we would reject fragmented IPv6 datagrams of 3958 bytes size on the
> wire. We're exclusively dealing with network MTU here, though, so we
> could get Ethernet frames up to 3964 octets in that case.
>
> v2:
> - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
> - split IPv4/IPv6 functions (David Ahern)
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  drivers/net/bareudp.c     |   5 +-
>  drivers/net/geneve.c      |   5 +-
>  drivers/net/vxlan.c       |   4 +-
>  include/net/dst.h         |  10 --
>  include/net/ip_tunnels.h  |   2 +
>  net/ipv4/ip_tunnel_core.c | 244 ++++++++++++++++++++++++++++++++++++++
>  6 files changed, 254 insertions(+), 16 deletions(-)
>
Much easier to follow
Reviewed-by: David Ahern <dsahern@gmail.com>
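
For readers following the numbers in the quoted commit message, here is a rough sketch of the MTU arithmetic it describes (GENEVE over IPv4, lower-layer PMTU of 4000, 50 bytes of headroom). This is not the kernel code: the function names are hypothetical, and the per-header breakdown of the 50 bytes is an assumption based on the usual header sizes, since the message only gives the total. It contrasts a check that compares the network-layer length against the advertised MTU with one that counts the inner MAC header a second time.

```python
# Header sizes in bytes. The breakdown is an assumption: the commit message
# only states the 50-byte total for GENEVE over IPv4.
IPV4_HDR = 20
UDP_HDR = 8
GENEVE_HDR = 8       # base header without options
INNER_ETH_HDR = 14   # inner MAC header, already part of the headroom

HEADROOM = IPV4_HDR + UDP_HDR + GENEVE_HDR + INNER_ETH_HDR


def advertised_mtu(lower_pmtu, headroom=HEADROOM):
    """Network MTU the tunnel can offer for a given lower-layer PMTU."""
    return lower_pmtu - headroom


def fits(frame_len, lower_pmtu, headroom=HEADROOM, mac_len=INNER_ETH_HDR):
    """Accurate check: only the network-layer length counts against the MTU."""
    return frame_len - mac_len <= advertised_mtu(lower_pmtu, headroom)


def fits_double_counted(frame_len, lower_pmtu, headroom=HEADROOM):
    """Inaccurate check: the inner MAC header is counted a second time,
    because the whole frame is compared against an MTU from which the
    headroom (MAC header included) was already subtracted."""
    return frame_len <= advertised_mtu(lower_pmtu, headroom)


if __name__ == "__main__":
    pmtu = 4000
    print("headroom:", HEADROOM)
    print("advertised MTU:", advertised_mtu(pmtu))
    print("largest accepted frame:", advertised_mtu(pmtu) + INNER_ETH_HDR)
    print("3964-octet frame, accurate check:", fits(3964, pmtu))
    print("3964-octet frame, double-counted check:",
          fits_double_counted(3964, pmtu))
```

Under these assumptions, the accurate check accepts Ethernet frames up to 3964 octets (3950 bytes of network MTU plus the 14-byte inner MAC header), while the double-counting variant rejects any frame longer than 3950 octets.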
Thread overview: 14+ messages
2020-08-04 5:53 [PATCH net-next v2 0/6] Support PMTU discovery with bridged UDP tunnels Stefano Brivio
2020-08-04 5:53 ` [PATCH net-next v2 1/6] ipv4: route: Ignore output interface in FIB lookup for PMTU route Stefano Brivio
2020-08-04 13:54 ` David Ahern
2020-08-04 5:53 ` [PATCH net-next v2 2/6] tunnels: PMTU discovery support for directly bridged IP packets Stefano Brivio
2020-08-04 13:54 ` David Ahern [this message]
2020-08-05 16:54 ` Naresh Kamboju
2020-08-05 17:02 ` Stefano Brivio
2020-08-04 5:53 ` [PATCH net-next v2 3/6] vxlan: Support for PMTU discovery on directly bridged links Stefano Brivio
2020-08-04 5:53 ` [PATCH net-next v2 4/6] geneve: " Stefano Brivio
2020-08-04 5:53 ` [PATCH net-next v2 5/6] selftests: pmtu.sh: Add tests for bridged UDP tunnels Stefano Brivio
2020-08-04 14:00 ` David Ahern
2020-08-04 14:35 ` Stefano Brivio
2020-08-04 5:53 ` [PATCH net-next v2 6/6] selftests: pmtu.sh: Add tests for UDP tunnels handled by Open vSwitch Stefano Brivio
2020-08-04 20:02 ` [PATCH net-next v2 0/6] Support PMTU discovery with bridged UDP tunnels David Miller