Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: David Ahern <dsahern@gmail.com>
To: Stefano Brivio <sbrivio@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Cc: Florian Westphal <fw@strlen.de>,
	Aaron Conole <aconole@redhat.com>,
	Numan Siddique <nusiddiq@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Pravin B Shelar <pshelar@ovn.org>,
	Roopa Prabhu <roopa@cumulusnetworks.com>,
	Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
	Lourdes Pedrajas <lu@pplo.net>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net-next v2 2/6] tunnels: PMTU discovery support for directly bridged IP packets
Date: Tue, 4 Aug 2020 07:54:37 -0600	[thread overview]
Message-ID: <437077cc-8c3c-79de-3475-6c717001d8ae@gmail.com> (raw)
In-Reply-To: <83e5876f589b0071638630dd93fbe0fa6b1b257c.1596520062.git.sbrivio@redhat.com>

On 8/3/20 11:53 PM, Stefano Brivio wrote:
> It's currently possible to bridge Ethernet tunnels carrying IP
> packets directly to external interfaces without assigning them
> addresses and routes on the bridged network itself: this is the case
> for UDP tunnels bridged with a standard bridge or by Open vSwitch.
> 
> PMTU discovery is currently broken with those configurations, because
> the encapsulation effectively decreases the MTU of the link, and
> while we are able to account for this using PMTU discovery on the
> lower layer, we don't have a way to relay ICMP or ICMPv6 messages
> needed by the sender, because we don't have valid routes to it.
> 
> On the other hand, as a tunnel endpoint, we can't fragment packets
> as a general approach: this is for instance clearly forbidden for
> VXLAN by RFC 7348, section 4.3:
> 
>    VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
>    fragment encapsulated VXLAN packets due to the larger frame size.
>    The destination VTEP MAY silently discard such VXLAN fragments.
> 
> The same paragraph recommends that the MTU over the physical network
> accomodates for encapsulations, but this isn't a practical option for
> complex topologies, especially for typical Open vSwitch use cases.
> 
> Further, it states that:
> 
>    Other techniques like Path MTU discovery (see [RFC1191] and
>    [RFC1981]) MAY be used to address this requirement as well.
> 
> Now, PMTU discovery already works for routed interfaces, we get
> route exceptions created by the encapsulation device as they receive
> ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
> we already rebuild those messages with the appropriate MTU and route
> them back to the sender.
> 
> Add the missing bits for bridged cases:
> 
> - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
>   to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
>   RFC 4443 section 2.4 for ICMPv6. This function is already called by
>   UDP tunnels
> 
> - a new function generating those ICMP or ICMPv6 replies. We can't
>   reuse icmp_send() and icmp6_send() as we don't see the sender as a
>   valid destination. This doesn't need to be generic, as we don't
>   cover any other type of ICMP errors given that we only provide an
>   encapsulation function to the sender
> 
> While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
> we might receive GSO buffers here, and the passed headroom already
> includes the inner MAC length, so we don't have to account for it
> a second time (that would imply three MAC headers on the wire, but
> there are just two).
> 
> This issue became visible while bridging IPv6 packets with 4500 bytes
> of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
> bytes of encapsulation headroom, we would advertise MTU as 3950, and
> we would reject fragmented IPv6 datagrams of 3958 bytes size on the
> wire. We're exclusively dealing with network MTU here, though, so we
> could get Ethernet frames up to 3964 octets in that case.
> 
> v2:
> - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
> - split IPv4/IPv6 functions (David Ahern)
> 
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  drivers/net/bareudp.c     |   5 +-
>  drivers/net/geneve.c      |   5 +-
>  drivers/net/vxlan.c       |   4 +-
>  include/net/dst.h         |  10 --
>  include/net/ip_tunnels.h  |   2 +
>  net/ipv4/ip_tunnel_core.c | 244 ++++++++++++++++++++++++++++++++++++++
>  6 files changed, 254 insertions(+), 16 deletions(-)
> 

Much easier to follow

Reviewed-by: David Ahern <dsahern@gmail.com>



  reply	other threads:[~2020-08-04 13:54 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-04  5:53 [PATCH net-next v2 0/6] Support PMTU discovery with bridged UDP tunnels Stefano Brivio
2020-08-04  5:53 ` [PATCH net-next v2 1/6] ipv4: route: Ignore output interface in FIB lookup for PMTU route Stefano Brivio
2020-08-04 13:54   ` David Ahern
2020-08-04  5:53 ` [PATCH net-next v2 2/6] tunnels: PMTU discovery support for directly bridged IP packets Stefano Brivio
2020-08-04 13:54   ` David Ahern [this message]
2020-08-05 16:54   ` Naresh Kamboju
2020-08-05 17:02     ` Stefano Brivio
2020-08-04  5:53 ` [PATCH net-next v2 3/6] vxlan: Support for PMTU discovery on directly bridged links Stefano Brivio
2020-08-04  5:53 ` [PATCH net-next v2 4/6] geneve: " Stefano Brivio
2020-08-04  5:53 ` [PATCH net-next v2 5/6] selftests: pmtu.sh: Add tests for bridged UDP tunnels Stefano Brivio
2020-08-04 14:00   ` David Ahern
2020-08-04 14:35     ` Stefano Brivio
2020-08-04  5:53 ` [PATCH net-next v2 6/6] selftests: pmtu.sh: Add tests for UDP tunnels handled by Open vSwitch Stefano Brivio
2020-08-04 20:02 ` [PATCH net-next v2 0/6] Support PMTU discovery with bridged UDP tunnels David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=437077cc-8c3c-79de-3475-6c717001d8ae@gmail.com \
    --to=dsahern@gmail.com \
    --cc=aconole@redhat.com \
    --cc=davem@davemloft.net \
    --cc=fw@strlen.de \
    --cc=kuba@kernel.org \
    --cc=lu@pplo.net \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=nusiddiq@redhat.com \
    --cc=pshelar@ovn.org \
    --cc=roopa@cumulusnetworks.com \
    --cc=sbrivio@redhat.com \
    --subject='Re: [PATCH net-next v2 2/6] tunnels: PMTU discovery support for directly bridged IP packets' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).