Subject: Re: [PATCH net] net: ipv4: fix path MTU for multi path routes
To: David Ahern
Cc: Willem de Bruijn, Paolo Abeni, Jakub Kicinski, "David S. Miller",
    netdev@vger.kernel.org
References: <20210731011729.4357-1-vfedorenko@novek.ru>
From: Vadim Fedorenko
Message-ID: <71b3384d-6d9c-4841-c610-463879f993b2@novek.ru>
Date: Thu, 5 Aug 2021 21:51:38 +0100
X-Mailing-List: netdev@vger.kernel.org

On 01.08.2021 18:12, David Ahern wrote:
> On 7/30/21 7:17 PM, Vadim Fedorenko wrote:
>> Bug 213729 showed that MTU check could be used against route that
>> will not be used in actual transmit if source ip is not specified.
>> But path MTU update is always done on route with defined source ip.
>> Fix route selection by updating flow info in case when source ip
>> is not explicitly defined in raw and udp sockets.
>
> There is more to it than just setting the source address and doing a
> second lookup.
>

You are right. Updating the source address fixes only some specific
cases. Also, I'm not fond of doing several lookups just because we found
additional next hops. It looks like (for the IPv4 case)
fib_table_lookup() should select the correct next hop based on the hash
and update the source IP and output interface in flowi4. But right now
flowi4 is constant, and such a change looks more like a net-next
improvement. Or do you have another solution?

> Attached is a test script I started last summer which shows the problem
> at hand and is setup to cover various permutations of routing (legacy
> routing, nexthop objects, and vrf), network protocols (v4 and v6) and
> should cover tcp, udp and icmp:
>
> # PMTU handling with multipath routing.
> #
> #          .-- sw1 --.
> #   h1 ----|-- sw2 --|---- h2 -------- h3
> #          |   ...   |       reduced mtu
> #          .-- swN --.
> #
> # h2-h3 segment has reduced mtu.
> # Exceptions created in h1 for h3.
>
> N=8 (8-way multipath) seems to always demonstrate it; N=2 is a 50-50 chance.
>
> Snippet from a run this morning:
>
> # ip netns exec h1 ping -s 1450 10.100.2.254
> PING 10.100.2.254 (10.100.2.254) 1450(1478) bytes of data.
> From 10.2.22.254 icmp_seq=1 Frag needed and DF set (mtu = 1420)
> From 10.2.22.254 icmp_seq=2 Frag needed and DF set (mtu = 1420)
> From 10.2.22.254 icmp_seq=3 Frag needed and DF set (mtu = 1420)
> From 10.2.22.254 icmp_seq=4 Frag needed and DF set (mtu = 1420)
>
> ok, an MTU message makes it back to h1, but that it continues shows the
> exception is not created on the right interface:
>
> # ip -netns h1 ro ls cache
> 10.100.2.254 via 10.1.15.5 dev eth5
>     cache expires 580sec mtu 1420
>
> But the selected path is:
> # ip -netns h1 ro get 10.100.2.254
> 10.100.2.254 via 10.1.12.2 dev eth2 src 10.1.12.254 uid 0
>     cache
>
> Adding in the source address does not fix it but it does change the
> selected path, e.g.,
>
> # ip -netns h1 ro get 10.100.2.254 from 10.1.16.254
> 10.100.2.254 from 10.1.16.254 via 10.1.14.4 dev eth4 uid 0
>     cache
>
> If 10.1.16.254 is the set source address then egress should be eth6, not
> eth4, since that is an address on eth6.
>
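
To make the mismatch above easier to reason about, here is a toy model
in Python. It is only a sketch under stated assumptions, not the kernel
code: flow_hash(), upper_bounds(), select_nexthop() and agreement_rate()
are hypothetical helpers, SHA-256 stands in for whatever hash the kernel
really uses, and the destination range is made up. The one thing it
tries to show is that when the hash covers (src, dst), a lookup with the
source unset and a later lookup with the source filled in can pick
different next hops, so a PMTU exception stored by the second lookup is
not seen by the first.

```python
import hashlib
import ipaddress


def flow_hash(src, dst):
    """Toy 31-bit hash over (src, dst). Assumption: a stand-in, not the kernel hash."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") >> 1


def upper_bounds(weights):
    """Hash-threshold bounds: next hop i owns every hash value <= bounds[i]."""
    total = sum(weights)
    bounds, acc = [], 0
    for w in weights:
        acc += w
        bounds.append((acc << 31) // total - 1)
    return bounds


def select_nexthop(nexthops, bounds, src, dst):
    """Pick the first next hop whose upper bound covers the flow hash."""
    h = flow_hash(src, dst)
    for nh, bound in zip(nexthops, bounds):
        if h <= bound:
            return nh
    return nexthops[-1]


def agreement_rate(n_paths, trials=2000):
    """Fraction of destinations where both lookups land on the same next hop."""
    nexthops = [f"eth{i}" for i in range(1, n_paths + 1)]
    bounds = upper_bounds([1] * n_paths)  # equal-weight multipath
    base = ipaddress.ip_address("10.100.0.0")  # hypothetical destination range
    same = 0
    for i in range(trials):
        dst = str(base + i)
        # Transmit-time lookup: source address not yet chosen.
        tx = select_nexthop(nexthops, bounds, "0.0.0.0", dst)
        # PMTU-update lookup: source address taken from the packet.
        pmtu = select_nexthop(nexthops, bounds, "10.1.12.254", dst)
        same += tx == pmtu
    return same / trials


if __name__ == "__main__":
    for n in (2, 8):
        print(f"N={n}: lookups agree for {agreement_rate(n):.1%} of destinations")
```

In this model the two lookups agree roughly once in N for N equal-weight
paths, which would match the "N=2 is a 50-50 chance" observation and
explain why N=8 fails almost every time. Whether the real hash inputs
differ in exactly this way is an assumption of the sketch.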