LKML Archive on lore.kernel.org
From: Eric Dumazet <edumazet@google.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Marcin Wojtas <mw@semihalf.com>,
	linuxarm@openeuler.org, Yisen Zhuang <yisen.zhuang@huawei.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Fenghua Yu <fenghua.yu@intel.com>, Roman Gushchin <guro@fb.com>,
	Peter Xu <peterx@redhat.com>, "Tang, Feng" <feng.tang@intel.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	mcroce@microsoft.com, Hugh Dickins <hughd@google.com>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	Alexander Lobakin <alobakin@pm.me>,
	Willem de Bruijn <willemb@google.com>, wenxu <wenxu@ucloud.cn>,
	Cong Wang <cong.wang@bytedance.com>,
	Kevin Hao <haokexin@gmail.com>,
	Aleksandr Nogikh <nogikh@google.com>,
	Marco Elver <elver@google.com>, Yonghong Song <yhs@fb.com>,
	kpsingh@kernel.org, Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	chenhao288@hisilicon.com,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	memxor@gmail.com, linux@rempel-privat.de,
	Antoine Tenart <atenart@kernel.org>, Wei Wang <weiwan@google.com>,
	Taehee Yoo <ap420073@gmail.com>, Arnd Bergmann <arnd@arndb.de>,
	Mat Martineau <mathew.j.martineau@linux.intel.com>,
	aahringo@redhat.com, ceggers@arri.de, yangbo.lu@nxp.com,
	Florian Westphal <fw@strlen.de>,
	xiangxia.m.yue@gmail.com, linmiaohe <linmiaohe@huawei.com>
Subject: Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
Date: Wed, 18 Aug 2021 10:57:06 +0200
Message-ID: <CANn89iJDf9uzSdqLEBeTeGB1uAxvmruKfK5HbeZWp+Cdc+qggQ@mail.gmail.com>
In-Reply-To: <1629257542-36145-1-git-send-email-linyunsheng@huawei.com>

On Wed, Aug 18, 2021 at 5:33 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> This patchset adds the socket to netdev page frag recycling
> support based on the busy polling and page pool infrastructure.

I really do not see how this can scale to thousands of sockets.

tcp_mem[] defaults to ~9% of physical memory.

If you now run tests with thousands of sockets, their skbs will consume
gigabytes of memory on typical servers, now backed by order-0 pages
(instead of the current order-3 pages), so IOMMU costs will actually be
much bigger.

Are we planning to use gigabyte-sized page pools for NICs?

Have you tried making TCP frags twice as big instead?
This would require fewer IOMMU mappings.
(Note: this could require some mm help, since PAGE_ALLOC_COSTLY_ORDER
is currently 3, not 4.)

diff --git a/net/core/sock.c b/net/core/sock.c
index a3eea6e0b30a7d43793f567ffa526092c03e3546..6b66b51b61be9f198f6f1c4a3d81b57fa327986a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2560,7 +2560,7 @@ static void sk_leave_memory_pressure(struct sock *sk)
        }
 }

-#define SKB_FRAG_PAGE_ORDER    get_order(32768)
+#define SKB_FRAG_PAGE_ORDER    get_order(65536)
 DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);

 /**



>
> The performance improves from 30Gbit to 41Gbit for a single-thread iperf
> TCP flow, and CPU usage decreases by about 20% for four-thread iperf
> flows at 100Gb line speed in IOMMU strict mode.
>
> The performance improves by about 2.5% for a single-thread iperf TCP
> flow in IOMMU passthrough mode.
>
> Yunsheng Lin (7):
>   page_pool: refactor the page pool to support multi alloc context
>   skbuff: add interface to manipulate frag count for tx recycling
>   net: add NAPI api to register and retrieve the page pool ptr
>   net: pfrag_pool: add pfrag pool support based on page pool
>   sock: support refilling pfrag from pfrag_pool
>   net: hns3: support tx recycling in the hns3 driver
>   sysctl_tcp_use_pfrag_pool
>
>  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 32 +++++----
>  include/linux/netdevice.h                       |  9 +++
>  include/linux/skbuff.h                          | 43 +++++++++++-
>  include/net/netns/ipv4.h                        |  1 +
>  include/net/page_pool.h                         | 15 ++++
>  include/net/pfrag_pool.h                        | 24 +++++++
>  include/net/sock.h                              |  1 +
>  net/core/Makefile                               |  1 +
>  net/core/dev.c                                  | 34 ++++++++-
>  net/core/page_pool.c                            | 86 ++++++++++++-----------
>  net/core/pfrag_pool.c                           | 92 +++++++++++++++++++++++++
>  net/core/sock.c                                 | 12 ++++
>  net/ipv4/sysctl_net_ipv4.c                      |  7 ++
>  net/ipv4/tcp.c                                  | 34 ++++++---
>  14 files changed, 325 insertions(+), 66 deletions(-)
>  create mode 100644 include/net/pfrag_pool.h
>  create mode 100644 net/core/pfrag_pool.c
>
> --
> 2.7.4
>
