LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Li, Aubrey" <aubrey.li@linux.intel.com>
To: Julien Desfossez <jdesfossez@digitalocean.com>,
	Aaron Lu <aaron.lu@linux.alibaba.com>
Cc: "Aubrey Li" <aubrey.intel@gmail.com>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Subhra Mazumdar" <subhra.mazumdar@oracle.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
Date: Wed, 12 Jun 2019 09:52:51 +0800	[thread overview]
Message-ID: <4bd9a132-bbdb-ec45-331f-829df07a7376@linux.intel.com> (raw)
In-Reply-To: <20190606152637.GA5703@sinkpad>

On 2019/6/6 23:26, Julien Desfossez wrote:
> As mentioned above, we have come up with a fix for the long starvation
> of untagged interactive threads competing for the same core with tagged
> threads at the same priority. The idea is to detect the stall and boost
> the stalling threads priority so that it gets a chance next time.
> Boosting is done by a new counter(min_vruntime_boost) for every task
> which we subtract from vruntime before comparison. The new logic looks
> like this:
> 
> If we see that normalized runtimes are equal, we check the min_vruntimes
> of their runqueues and give a chance for the task in the runqueue with
> less min_vruntime. That will help it to progress its vruntime. While
> doing this, we boost the priority of the task in the sibling so that, we
> don’t starve the task in the sibling until the min_vruntime of this
> runqueue catches up.
> 
> If min_vruntimes are also equal, we do as before and consider the task
> ‘a’ of higher priority. Here we boost the task ‘b’ so that it gets to
> run next time.
> 
> The min_vruntime_boost is reset to zero once the task in on cpu. So only
> waiting tasks will have a non-zero value if it is starved while matching
> a task on the other sibling.
> 
> The attached patch has a sched_feature to enable the above feature so
> that you can compare the results with and without this feature.
> 
> What we observe with this patch is that it helps for untagged
> interactive tasks and fairness in general, but this increases the
> overhead of core scheduling when there is contention for the CPU with
> tasks of varying cpu usage. The general trend we see is that if there is
> a cpu intensive thread and multiple relatively idle threads in different
> tags, the cpu intensive tasks continuously yields to be fair to the
> relatively idle threads when it becomes runnable. And if the relatively
> idle threads make up for most of the tasks in a system and are tagged,
> the cpu intensive tasks sees a considerable drop in performance.
> 
> If you have any feedback or creative ideas to help improve, let us
> know !

The data on my side looks good with CORESCHED_STALL_FIX = true.

Environment setup
--------------------------
Skylake 8170 server, 2 numa nodes, 52 cores, 104 CPUs (HT on)
cgroup1 workload, sysbench (CPU mode, non AVX workload)
cgroup2 workload, gemmbench (AVX512 workload)

sysbench throughput result:
.--------------------------------------------------------------------------------------------------------------------------------------.
|NA/AVX	vanilla-SMT	[std% / sem%]	  cpu% |coresched-SMT	[std% / sem%]	  +/-	  cpu% |  no-SMT [std% / sem%]	 +/-	  cpu% |
|--------------------------------------------------------------------------------------------------------------------------------------|
|  1/1	      490.8	[ 0.1%/ 0.0%]	  1.9% |        492.6	[ 0.1%/ 0.0%]	  0.4%	  1.9% |   489.5 [ 0.1%/ 0.0%]  -0.3%	  3.9% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|  2/2	      975.0	[ 0.6%/ 0.1%]	  3.9% |        970.4	[ 0.4%/ 0.0%]	 -0.5%	  3.9% |   975.6 [ 0.2%/ 0.0%]   0.1%	  7.7% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|  4/4	     1856.9	[ 0.2%/ 0.0%]	  7.8% |       1854.5	[ 0.3%/ 0.0%]	 -0.1%	  7.8% |  1849.4 [ 0.8%/ 0.1%]  -0.4%	 14.8% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|  8/8	     3622.8	[ 0.2%/ 0.0%]	 14.6% |       3618.3	[ 0.1%/ 0.0%]	 -0.1%	 14.7% |  3626.6 [ 0.4%/ 0.0%]   0.1%	 30.1% |
'--------------------------------------------------------------------------------------------------------------------------------------'
| 16/16	     6976.7	[ 0.2%/ 0.0%]	 30.1% |       6959.3	[ 0.3%/ 0.0%]	 -0.2%	 30.1% |  6964.4 [ 0.9%/ 0.1%]  -0.2%	 60.1% |
'--------------------------------------------------------------------------------------------------------------------------------------'
| 32/32	    10347.7	[ 3.8%/ 0.4%]	 60.1% |      11525.4	[ 2.8%/ 0.3%]	 11.4%	 59.5% |  9810.5 [ 9.4%/ 0.8%]  -5.2%	 97.7% |
'--------------------------------------------------------------------------------------------------------------------------------------'
| 64/64	    15284.9	[ 9.0%/ 0.9%]	 98.1% |      17022.1	[ 4.5%/ 0.5%]	 11.4%	 98.2% |  9989.7 [19.3%/ 1.1%] -34.6%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|128/128    16211.3	[18.9%/ 1.9%]	100.0% |      16507.9	[ 6.1%/ 0.6%]	  1.8%	 99.8% | 10379.0 [12.6%/ 0.8%] -36.0%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|256/256    16667.1	[ 3.1%/ 0.3%]	100.0% |      16499.1	[ 3.2%/ 0.3%]	 -1.0%	100.0% | 10540.9 [16.2%/ 1.0%] -36.8%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'

sysbench latency result:
(The story we care about latency is that some customers reported their
latency critical job is affected when co-locating a deep learning job
(AVX512 task) onto the same core, because when a core executes AVX512
instructions, the core automatically reduces its frequency. This can
lead to a significant overall performance loss for a non-AVX512 job
on the same core.

And now we have core cookie match mechanism, so if we put AVX512 task
and non AVX512 task into the different cgroups, they are supposed not
to be co-located. That's why we saw the improvements of 32/32 and 64/64
cases.) 

.--------------------------------------------------------------------------------------------------------------------------------------.
|NA/AVX	vanilla-SMT	[std% / sem%]	  cpu% |coresched-SMT	[std% / sem%]	  +/-	  cpu% |  no-SMT [std% / sem%]	 +/-	  cpu% |
|--------------------------------------------------------------------------------------------------------------------------------------|
|  1/1	        2.1	[ 0.6%/ 0.1%]	  1.9% |          2.0	[ 0.2%/ 0.0%]	  3.8%	  1.9% |     2.1 [ 0.7%/ 0.1%]  -0.8%	  3.9% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|  2/2	        2.1	[ 0.7%/ 0.1%]	  3.9% |          2.1	[ 0.3%/ 0.0%]	  0.2%	  3.9% |     2.1 [ 0.6%/ 0.1%]   0.5%	  7.7% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|  4/4	        2.2	[ 0.6%/ 0.1%]	  7.8% |          2.2	[ 0.4%/ 0.0%]	 -0.2%	  7.8% |     2.2 [ 0.2%/ 0.0%]  -0.3%	 14.8% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|  8/8	        2.2	[ 0.4%/ 0.0%]	 14.6% |          2.2	[ 0.0%/ 0.0%]	  0.1%	 14.7% |     2.2 [ 0.0%/ 0.0%]   0.1%	 30.1% |
'--------------------------------------------------------------------------------------------------------------------------------------'
| 16/16	        2.4	[ 1.6%/ 0.2%]	 30.1% |          2.4	[ 1.6%/ 0.2%]	 -0.9%	 30.1% |     2.4 [ 1.9%/ 0.2%]  -0.3%	 60.1% |
'--------------------------------------------------------------------------------------------------------------------------------------'
| 32/32	        4.9	[ 6.2%/ 0.6%]	 60.1% |          3.1	[ 5.0%/ 0.5%]	 36.6%	 59.5% |     6.7 [17.3%/ 3.7%] -34.5%	 97.7% |
'--------------------------------------------------------------------------------------------------------------------------------------'
| 64/64	        9.4	[28.3%/ 2.8%]	 98.1% |          3.5	[25.6%/ 2.6%]	 62.4%	 98.2% |    18.5 [ 9.5%/ 5.0%] -97.9%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|128/128       21.3	[10.1%/ 1.0%]	100.0% |         24.8	[ 8.1%/ 0.8%]	-16.1%	 99.8% |    34.5 [ 4.9%/ 0.7%] -62.0%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'
|256/256       35.5	[ 7.8%/ 0.8%]	100.0% |         37.3	[ 5.4%/ 0.5%]	 -5.1%	100.0% |    40.8 [ 5.9%/ 0.6%] -15.0%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'

Note:
----
64/64:		64 sysbench threads(in one cgroup) and 64 gemmbench threads(in other cgroup) run simultaneously.
Vanilla-SMT:	baseline with HT on
coresched-SMT:	core scheduling enabled
no-SMT:		HT off thru /sys/devices/system/cpu/smt/control
std%:		standard deviation
sem%:		standard error of the mean
±:		improvement/regression against baseline
cpu%:		derived by vmstat.idle and vmstat.iowait

  reply	other threads:[~2019-06-12  1:52 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-29 20:36 Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 01/16] stop_machine: Fix stop_cpus_in_progress ordering Vineeth Remanan Pillai
2019-08-08 10:54   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:19   ` [RFC PATCH v3 01/16] " mark gross
2019-08-26 16:59     ` Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 02/16] sched: Fix kerneldoc comment for ia64_set_curr_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:20   ` [RFC PATCH v3 02/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 03/16] sched: Wrap rq::lock access Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 04/16] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 05/16] sched: Add task_struct pointer to sched_class::set_curr_task Vineeth Remanan Pillai
2019-08-08 10:57   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 06/16] sched/fair: Export newidle_balance() Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] sched/fair: Expose newidle_balance() tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 07/16] sched: Allow put_prev_task() to drop rq->lock Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:51   ` [RFC PATCH v3 07/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 08/16] sched: Rework pick_next_task() slow-path Vineeth Remanan Pillai
2019-08-08 10:59   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 17:01   ` [RFC PATCH v3 08/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 09/16] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2019-08-26 17:14   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 10/16] sched: Core-wide rq->lock Vineeth Remanan Pillai
2019-05-31 11:08   ` Peter Zijlstra
2019-05-31 15:23     ` Vineeth Pillai
2019-05-29 20:36 ` [RFC PATCH v3 11/16] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2019-08-26 20:59   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 12/16] sched: A quick and dirty cgroup tagging interface Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 13/16] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2019-06-07 23:36   ` Pawan Gupta
2019-05-29 20:36 ` [RFC PATCH v3 14/16] sched/fair: Add a few assertions Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 15/16] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 16/16] sched: Debug bits Vineeth Remanan Pillai
2019-05-29 21:02   ` Peter Oskolkov
2019-05-30 14:04 ` [RFC PATCH v3 00/16] Core scheduling v3 Aubrey Li
2019-05-30 14:17   ` Julien Desfossez
2019-05-31  4:55     ` Aubrey Li
2019-05-31  3:01   ` Aaron Lu
2019-05-31  5:12     ` Aubrey Li
2019-05-31  6:09       ` Aaron Lu
2019-05-31  6:53         ` Aubrey Li
2019-05-31  7:44           ` Aaron Lu
2019-05-31  8:26             ` Aubrey Li
2019-05-31 21:08     ` Julien Desfossez
2019-06-06 15:26       ` Julien Desfossez
2019-06-12  1:52         ` Li, Aubrey [this message]
2019-06-12 16:06           ` Julien Desfossez
2019-06-12 16:33         ` Julien Desfossez
2019-06-13  0:03           ` Subhra Mazumdar
2019-06-13  3:22             ` Julien Desfossez
2019-06-17  2:51               ` Aubrey Li
2019-06-19 18:33                 ` Julien Desfossez
2019-07-18 10:07                   ` Aaron Lu
2019-07-18 23:27                     ` Tim Chen
2019-07-19  5:52                       ` Aaron Lu
2019-07-19 11:48                         ` Aubrey Li
2019-07-19 18:33                         ` Tim Chen
2019-07-22 10:26                     ` Aubrey Li
2019-07-22 10:43                       ` Aaron Lu
2019-07-23  2:52                         ` Aubrey Li
2019-07-25 14:30                       ` Aaron Lu
2019-07-25 14:31                         ` [RFC PATCH 1/3] wrapper for cfs_rq->min_vruntime Aaron Lu
2019-07-25 14:32                         ` [PATCH 2/3] core vruntime comparison Aaron Lu
2019-08-06 14:17                           ` Peter Zijlstra
2019-07-25 14:33                         ` [PATCH 3/3] temp hack to make tick based schedule happen Aaron Lu
2019-07-25 21:42                         ` [RFC PATCH v3 00/16] Core scheduling v3 Li, Aubrey
2019-07-26 15:21                         ` Julien Desfossez
2019-07-26 21:29                           ` Tim Chen
2019-07-31  2:42                           ` Li, Aubrey
2019-08-02 15:37                             ` Julien Desfossez
2019-08-05 15:55                               ` Tim Chen
2019-08-06  3:24                                 ` Aaron Lu
2019-08-06  6:56                                   ` Aubrey Li
2019-08-06  7:04                                     ` Aaron Lu
2019-08-06 12:24                                       ` Vineeth Remanan Pillai
2019-08-06 13:49                                         ` Aaron Lu
2019-08-06 16:14                                           ` Vineeth Remanan Pillai
2019-08-06 14:16                                         ` Peter Zijlstra
2019-08-06 15:53                                           ` Vineeth Remanan Pillai
2019-08-06 17:03                                   ` Tim Chen
2019-08-06 17:12                                     ` Peter Zijlstra
2019-08-06 21:19                                       ` Tim Chen
2019-08-08  6:47                                         ` Aaron Lu
2019-08-08 17:27                                           ` Tim Chen
2019-08-08 21:42                                             ` Tim Chen
2019-08-10 14:15                                               ` Aaron Lu
2019-08-12 15:38                                                 ` Vineeth Remanan Pillai
2019-08-13  2:24                                                   ` Aaron Lu
2019-08-08 12:55                                 ` Aaron Lu
2019-08-08 16:39                                   ` Tim Chen
2019-08-10 14:18                                     ` Aaron Lu
2019-08-05 20:09                               ` Phil Auld
2019-08-06 13:54                                 ` Aaron Lu
2019-08-06 14:17                                   ` Phil Auld
2019-08-06 14:41                                     ` Aaron Lu
2019-08-06 14:55                                       ` Phil Auld
2019-08-07  8:58                               ` Dario Faggioli
2019-08-07 17:10                                 ` Tim Chen
2019-08-15 16:09                                   ` Dario Faggioli
2019-08-16  2:33                                     ` Aaron Lu
2019-09-05  1:44                                   ` Julien Desfossez
2019-09-06 22:17                                     ` Tim Chen
2019-09-18 21:27                                     ` Tim Chen
2019-09-06 18:30                                   ` Tim Chen
2019-09-11 14:02                                     ` Aaron Lu
2019-09-11 16:19                                       ` Tim Chen
2019-09-11 16:47                                         ` Vineeth Remanan Pillai
2019-09-12 12:35                                           ` Aaron Lu
2019-09-12 17:29                                             ` Tim Chen
2019-09-13 14:15                                               ` Aaron Lu
2019-09-13 17:13                                                 ` Tim Chen
2019-09-30 11:53                                             ` Vineeth Remanan Pillai
2019-10-02 20:48                                               ` Vineeth Remanan Pillai
2019-10-10 13:54                                                 ` Aaron Lu
2019-10-10 14:29                                                   ` Vineeth Remanan Pillai
2019-10-11  7:33                                                     ` Aaron Lu
2019-10-11 11:32                                                       ` Vineeth Remanan Pillai
2019-10-11 12:01                                                         ` Aaron Lu
2019-10-11 12:10                                                           ` Vineeth Remanan Pillai
2019-10-12  3:55                                                             ` Aaron Lu
2019-10-13 12:44                                                               ` Vineeth Remanan Pillai
2019-10-14  9:57                                                                 ` Aaron Lu
2019-10-21 12:30                                                                   ` Vineeth Remanan Pillai
2019-09-12 12:04                                         ` Aaron Lu
2019-09-12 17:05                                           ` Tim Chen
2019-09-13 13:57                                             ` Aaron Lu
2019-09-12 23:12                                           ` Aubrey Li
2019-09-15 14:14                                             ` Aaron Lu
2019-09-18  1:33                                               ` Aubrey Li
2019-09-18 20:40                                                 ` Tim Chen
2019-09-18 22:16                                                   ` Aubrey Li
2019-09-30 14:36                                                     ` Vineeth Remanan Pillai
2019-10-29 20:40                                                   ` Julien Desfossez
2019-11-01 21:42                                                     ` Tim Chen
2019-10-29  9:11                                               ` Dario Faggioli
2019-10-29  9:15                                                 ` Dario Faggioli
2019-10-29  9:16                                                 ` Dario Faggioli
2019-10-29  9:17                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:19                                                 ` Dario Faggioli
2019-10-29  9:20                                                 ` Dario Faggioli
2019-10-29 20:34                                                   ` Julien Desfossez
2019-11-15 16:30                                                     ` Dario Faggioli
2019-09-25  2:40                                     ` Aubrey Li
2019-09-25 17:24                                       ` Tim Chen
2019-09-25 22:07                                         ` Aubrey Li
2019-09-30 15:22                                     ` Julien Desfossez
2019-08-27 21:14 ` Matthew Garrett
2019-08-27 21:50   ` Peter Zijlstra
2019-08-28 15:30     ` Phil Auld
2019-08-28 16:01       ` Peter Zijlstra
2019-08-28 16:37         ` Tim Chen
2019-08-29 14:30         ` Phil Auld
2019-08-29 14:38           ` Peter Zijlstra
2019-09-10 14:27             ` Julien Desfossez
2019-09-18 21:12               ` Tim Chen
2019-08-28 15:59     ` Tim Chen
2019-08-28 16:16       ` Peter Zijlstra
2019-08-27 23:24   ` Aubrey Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4bd9a132-bbdb-ec45-331f-829df07a7376@linux.intel.com \
    --to=aubrey.li@linux.intel.com \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aubrey.intel@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    --subject='Re: [RFC PATCH v3 00/16] Core scheduling v3' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).