LKML Archive on lore.kernel.org
From: Yunsheng Lin <linyunsheng@huawei.com>
To: David Ahern <dsahern@gmail.com>, <davem@davemloft.net>,
	<kuba@kernel.org>
Cc: <alexander.duyck@gmail.com>, <linux@armlinux.org.uk>,
	<mw@semihalf.com>, <linuxarm@openeuler.org>,
	<yisen.zhuang@huawei.com>, <salil.mehta@huawei.com>,
	<thomas.petazzoni@bootlin.com>, <hawk@kernel.org>,
	<ilias.apalodimas@linaro.org>, <ast@kernel.org>,
	<daniel@iogearbox.net>, <john.fastabend@gmail.com>,
	<akpm@linux-foundation.org>, <peterz@infradead.org>,
	<will@kernel.org>, <willy@infradead.org>, <vbabka@suse.cz>,
	<fenghua.yu@intel.com>, <guro@fb.com>, <peterx@redhat.com>,
	<feng.tang@intel.com>, <jgg@ziepe.ca>, <mcroce@microsoft.com>,
	<hughd@google.com>, <jonathan.lemon@gmail.com>, <alobakin@pm.me>,
	<willemb@google.com>, <wenxu@ucloud.cn>,
	<cong.wang@bytedance.com>, <haokexin@gmail.com>,
	<nogikh@google.com>, <elver@google.com>, <yhs@fb.com>,
	<kpsingh@kernel.org>, <andrii@kernel.org>, <kafai@fb.com>,
	<songliubraving@fb.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org>,
	<chenhao288@hisilicon.com>, <edumazet@google.com>,
	<yoshfuji@linux-ipv6.org>, <dsahern@kernel.org>,
	<memxor@gmail.com>, <linux@rempel-privat.de>,
	<atenart@kernel.org>, <weiwan@google.com>, <ap420073@gmail.com>,
	<arnd@arndb.de>, <mathew.j.martineau@linux.intel.com>,
	<aahringo@redhat.com>, <ceggers@arri.de>, <yangbo.lu@nxp.com>,
	<fw@strlen.de>, <xiangxia.m.yue@gmail.com>,
	<linmiaohe@huawei.com>
Subject: Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
Date: Thu, 19 Aug 2021 16:18:19 +0800	[thread overview]
Message-ID: <f0d935b9-45fe-4c51-46f0-1f526167877f@huawei.com> (raw)
In-Reply-To: <83b8bae8-d524-36a1-302e-59198410d9a9@gmail.com>

On 2021/8/19 6:05, David Ahern wrote:
> On 8/17/21 9:32 PM, Yunsheng Lin wrote:
>> This patchset adds the socket to netdev page frag recycling
>> support based on the busy polling and page pool infrastructure.
>>
>> The performance improves from 30Gbit to 41Gbit for a single-thread iperf
>> TCP flow, and CPU usage decreases by about 20% for a four-thread iperf
>> flow at 100Gb line speed in IOMMU strict mode.
>>
>> The performance improves by about 2.5% for a single-thread iperf TCP flow
>> in IOMMU passthrough mode.
>>
> 
> Details about the test setup? cpu model, mtu, any other relevant changes
> / settings.

CPU is arm64 Kunpeng 920, see:
https://www.hisilicon.com/en/products/Kunpeng/Huawei-Kunpeng-920

The MTU is 1500. The relevant changes/settings I can think of: the iperf
client runs on the same NUMA node as the NIC (which has one 100Gbit port),
and the driver has XPS enabled too.

> 
> How does that performance improvement compare with using the Tx ZC API?
> At 1500 MTU I see a CPU drop on the Tx side from 80% to 20% with the ZC
> API and ~10% increase in throughput. Bumping the MTU to 3300 and
> performance with the ZC API is 2x the current model with 1/2 the cpu.

I added a sysctl node to decide whether the pfrag pool is used:
net.ipv4.tcp_use_pfrag_pool
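
(Roughly, the knob gates whether the Tx path pulls frags from the per-socket
pool. The sketch below is for illustration only and is not the RFC code:
pfrag_pool_refill(), sk->sk_pfrag_pool and the sysctl's field placement are
made-up names here.)

	static struct page_frag *tcp_select_pfrag(struct sock *sk)
	{
		/* the new knob added for this comparison */
		if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_use_pfrag_pool) &&
		    sk->sk_pfrag_pool)
			return pfrag_pool_refill(sk->sk_pfrag_pool);

		/* fall back to the regular, non-recycled page frag */
		return sk_page_frag(sk);
	}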

I then used msg_zerocopy to compare the results:
The server runs: "./msg_zerocopy -4 -i eth4 -C 32 -S 192.168.100.2 -r tcp"
The client runs: "./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp -"

ZC does seem to improve CPU usage significantly, but not throughput with
MTU 1500. The result seems to be similar with MTU 3300.
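
(For anyone not familiar with the Tx ZC path being compared: msg_zerocopy -z
exercises the MSG_ZEROCOPY API. A minimal user-space sketch of the opt-in,
with error handling and notification parsing omitted:)

	#include <stddef.h>
	#include <sys/socket.h>

	static void send_one_zc(int fd, const void *buf, size_t len)
	{
		int one = 1;
		char control[128];
		struct msghdr msg = { .msg_control = control,
				      .msg_controllen = sizeof(control) };

		/* opt the socket in to zerocopy transmission */
		setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

		/* user pages are pinned and attached as skb frags, no copy */
		send(fd, buf, len, MSG_ZEROCOPY);

		/* a completion notification arrives on the error queue once
		 * the pages are free to reuse (the tool batches these) */
		recvmsg(fd, &msg, MSG_ERRQUEUE);
	}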

The detailed results are below:

(1) IOMMU strict mode + net.ipv4.tcp_use_pfrag_pool = 0:
root@(none):/# perf stat -e cycles ./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp
tx=115317 (7196 MB) txc=0 zc=n

 Performance counter stats for './msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp':

        4315472244      cycles

       4.199890190 seconds time elapsed

       0.084328000 seconds user
       1.528714000 seconds sys
root@(none):/# perf stat -e cycles ./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp -z
tx=90121 (5623 MB) txc=90121 zc=y

 Performance counter stats for './msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp -z':

        1715892155      cycles

       4.243329050 seconds time elapsed

       0.083275000 seconds user
       0.755355000 seconds sys


(2) IOMMU strict mode + net.ipv4.tcp_use_pfrag_pool = 1:
root@(none):/# perf stat -e cycles ./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp
tx=138932 (8669 MB) txc=0 zc=n

 Performance counter stats for './msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp':

        4034016168      cycles

       4.199877510 seconds time elapsed

       0.058143000 seconds user
       1.644480000 seconds sys
root@(none):/# perf stat -e cycles ./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp -z
tx=93369 (5826 MB) txc=93369 zc=y

 Performance counter stats for './msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp -z':

        1815300491      cycles

       4.243259530 seconds time elapsed

       0.051767000 seconds user
       0.796610000 seconds sys


(3) IOMMU passthrough + net.ipv4.tcp_use_pfrag_pool = 0:
root@(none):/# perf stat -e cycles ./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp
tx=129927 (8107 MB) txc=0 zc=n

 Performance counter stats for './msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp':

        3720131007      cycles

       4.200651840 seconds time elapsed

       0.038604000 seconds user
       1.455521000 seconds sys
root@(none):/# perf stat -e cycles ./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp -z
tx=135285 (8442 MB) txc=135285 zc=y

 Performance counter stats for './msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp -z':

        1721949875      cycles

       4.242596800 seconds time elapsed

       0.024963000 seconds user
       0.779391000 seconds sys

(4) IOMMU passthrough + net.ipv4.tcp_use_pfrag_pool = 1:
root@(none):/# perf stat -e cycles ./msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp
tx=151844 (9475 MB) txc=0 zc=n

 Performance counter stats for './msg_zerocopy -4 -i eth4 -C 0 -S 192.168.100.1 -D 192.168.100.2 tcp':

        3786216097      cycles

       4.200606520 seconds time elapsed

       0.028633000 seconds user
       1.569736000 seconds sys
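
Boiling the runs above down to MB transferred (each run is ~4.2 seconds)
and total cycles:

	                                tcp (copy)             tcp -z (zerocopy)
	(1) strict,      pfrag_pool=0   7196 MB / 4.32G cyc    5623 MB / 1.72G cyc
	(2) strict,      pfrag_pool=1   8669 MB / 4.03G cyc    5826 MB / 1.82G cyc
	(3) passthrough, pfrag_pool=0   8107 MB / 3.72G cyc    8442 MB / 1.72G cyc
	(4) passthrough, pfrag_pool=1   9475 MB / 3.79G cyc    (not run above)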


> 
> Epyc 7502, ConnectX-6, IOMMU off.
> 
> In short, it seems like improving the Tx ZC API is the better path
> forward than per-socket page pools.

The main goal is to optimize the SMMU mapping/unmapping. If the cost of the
memcpy is higher than the SMMU mapping/unmapping plus the page pinning, then
Tx ZC may be the better path; at least it does not seem clear-cut for small
packets.
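
(For context on where the SMMU saving comes from: with the existing page pool
API, a pool created with PP_FLAG_DMA_MAP keeps its pages DMA-mapped while
they are recycled, so the IOMMU map/unmap is paid roughly once per page
instead of once per packet. A minimal sketch with illustrative parameters,
not necessarily what the RFC uses:)

	struct page_pool_params pp_params = {
		.flags     = PP_FLAG_DMA_MAP, /* keep pages mapped across recycles */
		.order     = 0,
		.pool_size = 1024,
		.nid       = dev_to_node(dev),
		.dev       = dev,             /* the device doing the DMA */
		.dma_dir   = DMA_TO_DEVICE,   /* Tx-only recycling */
	};
	struct page_pool *pool = page_pool_create(&pp_params);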


> .
> 

