Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Wei Wang <weiwan@google.com>
To: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	"Network Development" <netdev@vger.kernel.org>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Hannes Frederic Sowa" <hannes@stressinduktion.org>,
	"Felix Fietkau" <nbd@nbd.name>,
	"Björn Töpel" <bjorn.topel@intel.com>
Subject: Re: [RFC PATCH net-next 0/6] implement kthread based napi poll
Date: Fri, 25 Sep 2020 10:15:25 -0700	[thread overview]
Message-ID: <CAEA6p_BBaSQJjTPicgjoDh17BxS9aFMb-o6ddqW3wDvPAMyrGQ@mail.gmail.com> (raw)
In-Reply-To: <CAJ8uoz30afXpbn+RXwN5BNMwrLAcW0Cn8tqP502oCLaKH0+kZg@mail.gmail.com>

On Fri, Sep 25, 2020 at 6:48 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 7:26 PM Wei Wang <weiwan@google.com> wrote:
> >
> > The idea of moving the napi poll process out of softirq context to a
> > kernel thread based context is not new.
> > Paolo Abeni and Hannes Frederic Sowa has proposed patches to move napi
> > poll to kthread back in 2016. And Felix Fietkau has also proposed
> > patches of similar ideas to use workqueue to process napi poll just a
> > few weeks ago.
> >
> > The main reason we'd like to push forward with this idea is that the
> > scheduler has poor visibility into cpu cycles spent in softirq context,
> > and is not able to make optimal scheduling decisions of the user threads.
> > For example, we see in one of the application benchmark where network
> > load is high, the CPUs handling network softirqs has ~80% cpu util. And
> > user threads are still scheduled on those CPUs, despite other more idle
> > cpus available in the system. And we see very high tail latencies. In this
> > case, we have to explicitly pin away user threads from the CPUs handling
> > network softirqs to ensure good performance.
> > With napi poll moved to kthread, scheduler is in charge of scheduling both
> > the kthreads handling network load, and the user threads, and is able to
> > make better decisions. In the previous benchmark, if we do this and we
> > pin the kthreads processing napi poll to specific CPUs, scheduler is
> > able to schedule user threads away from these CPUs automatically.
> >
> > And the reason we prefer 1 kthread per napi, instead of 1 workqueue
> > entity per host, is that kthread is more configurable than workqueue,
> > and we could leverage existing tuning tools for threads, like taskset,
> > chrt, etc to tune scheduling class and cpu set, etc. Another reason is
> > if we eventually want to provide busy poll feature using kernel threads
> > for napi poll, kthread seems to be more suitable than workqueue.
> >
> > In this patch series, I revived Paolo and Hannes's patch in 2016 and
> > left them as the first 2 patches. Then there are changes proposed by
> > Felix, Jakub, Paolo and myself on top of those, with suggestions from
> > Eric Dumazet.
> >
> > In terms of performance, I ran tcp_rr tests with 1000 flows with
> > various request/response sizes, with RFS/RPS disabled, and compared
> > performance between softirq vs kthread. Host has 56 hyper threads and
> > 100Gbps nic.
> >
> >         req/resp   QPS   50%tile    90%tile    99%tile    99.9%tile
> > softirq   1B/1B   2.19M   284us       987us      1.1ms      1.56ms
> > kthread   1B/1B   2.14M   295us       987us      1.0ms      1.17ms
> >
> > softirq 5KB/5KB   1.31M   869us      1.06ms     1.28ms      2.38ms
> > kthread 5KB/5KB   1.32M   878us      1.06ms     1.26ms      1.66ms
> >
> > softirq 1MB/1MB  10.78K   84ms       166ms      234ms       294ms
> > kthread 1MB/1MB  10.83K   82ms       173ms      262ms       320ms
> >
> > I also ran one application benchmark where the user threads have more
> > work to do. We do see good amount of tail latency reductions with the
> > kthread model.
>
> I really like this RFC and would encourage you to submit it as a
> patch. Would love to see it make it into the kernel.
>

Thanks for the feedback! I am preparing an official patchset for this
and will send them out soon.

> I see the same positive effects as you when trying it out with AF_XDP
> sockets. Made some simple experiments where I sent 64-byte packets to
> a single AF_XDP socket. Have not managed to figure out how to do
> percentiles on my load generator, so this is going to be min, avg and
> max only. The application using the AF_XDP socket just performs a mac
> swap on the packet and sends it back to the load generator that then
> measures the round trip latency. The kthread is taskset to the same
> core as ksoftirqd would run on. So in each experiment, they always run
> on the same core id (which is not the same as the application).
>
> Rate 12 Mpps with 0% loss.
>               Latencies (us)         Delay Variation between packets
>           min    avg    max      avg   max
> sofirq  11.0  17.1   78.4      0.116  63.0
> kthread 11.2  17.1   35.0     0.116  20.9
>
> Rate ~58 Mpps (Line rate at 40 Gbit/s) with substantial loss
>               Latencies (us)         Delay Variation between packets
>           min    avg    max      avg   max
> softirq  87.6  194.9  282.6    0.062  25.9
> kthread  86.5  185.2  271.8    0.061  22.5
>
> For the last experiment, I also get 1.5% to 2% higher throughput with
> your kthread approach. Moreover, just from the per-second throughput
> printouts from my application, I can see that the kthread numbers are
> more stable. The softirq numbers can vary quite a lot between each
> second, around +-3%. But for the kthread approach, they are nice and
> stable. Have not examined why.
>

Thanks for sharing the results!

> One thing I noticed though, and I do not know if this is an issue, is
> that the switching between the two modes does not occur at high packet
> rates. I have to lower the packet rate to something that makes the
> core work less than 100% for it to switch between ksoftirqd to kthread
> and vice versa. They just seem too busy to switch at 100% load when
> changing the "threaded" sysfs variable.
>

I think the reason for this is when load is high, napi_poll() probably
always exhausts the predefined napi->weight. So it will keep
re-polling in the current context. The switch could only happen the
next time ___napi_schedule() is called.

> Thank you for working on this feature.
>
>
> /Magnus
>
>
> > Paolo Abeni (2):
> >   net: implement threaded-able napi poll loop support
> >   net: add sysfs attribute to control napi threaded mode
> > Felix Fietkau (1):
> >   net: extract napi poll functionality to __napi_poll()
> > Jakub Kicinski (1):
> >   net: modify kthread handler to use __napi_poll()
> > Paolo Abeni (1):
> >   net: process RPS/RFS work in kthread context
> > Wei Wang (1):
> >   net: improve napi threaded config
> >
> >  include/linux/netdevice.h |   6 ++
> >  net/core/dev.c            | 146 +++++++++++++++++++++++++++++++++++---
> >  net/core/net-sysfs.c      |  99 ++++++++++++++++++++++++++
> >  3 files changed, 242 insertions(+), 9 deletions(-)
> >
> > --
> > 2.28.0.618.gf4bc123cb7-goog
> >

  reply	other threads:[~2020-09-25 17:15 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-14 17:24 Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 1/6] net: implement threaded-able napi poll loop support Wei Wang
2020-09-25 19:45   ` Hannes Frederic Sowa
2020-09-25 23:50     ` Wei Wang
2020-09-26 14:22       ` Hannes Frederic Sowa
2020-09-28  8:45         ` Paolo Abeni
2020-09-28 18:13           ` Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 2/6] net: add sysfs attribute to control napi threaded mode Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 3/6] net: extract napi poll functionality to __napi_poll() Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 4/6] net: modify kthread handler to use __napi_poll() Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 5/6] net: process RPS/RFS work in kthread context Wei Wang
2020-09-18 22:44   ` Wei Wang
2020-09-21  8:11     ` Eric Dumazet
2020-09-14 17:24 ` [RFC PATCH net-next 6/6] net: improve napi threaded config Wei Wang
2020-09-25 13:48 ` [RFC PATCH net-next 0/6] implement kthread based napi poll Magnus Karlsson
2020-09-25 17:15   ` Wei Wang [this message]
2020-09-25 17:30     ` Eric Dumazet
2020-09-25 18:16     ` Stephen Hemminger
2020-09-25 18:23       ` Eric Dumazet
2020-09-25 19:00         ` Stephen Hemminger
2020-09-25 19:06   ` Jakub Kicinski
2020-09-28 14:07     ` Magnus Karlsson
2020-09-28 17:43 ` Eric Dumazet
2020-09-28 18:15   ` Wei Wang
2020-09-29 19:19   ` Jakub Kicinski
2020-09-29 20:16     ` Wei Wang
2020-09-29 21:48       ` Jakub Kicinski
2020-09-30  8:23         ` David Laight
2020-09-30  8:58         ` Paolo Abeni
2020-09-30 15:58           ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEA6p_BBaSQJjTPicgjoDh17BxS9aFMb-o6ddqW3wDvPAMyrGQ@mail.gmail.com \
    --to=weiwan@google.com \
    --cc=bjorn.topel@intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hannes@stressinduktion.org \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@gmail.com \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --subject='Re: [RFC PATCH net-next 0/6] implement kthread based napi poll' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).