LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Aaron Lu <aaron.lu@linux.alibaba.com>
To: Julien Desfossez <jdesfossez@digitalocean.com>
Cc: "Aubrey Li" <aubrey.intel@gmail.com>,
	"Subhra Mazumdar" <subhra.mazumdar@oracle.com>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
Date: Thu, 18 Jul 2019 18:07:14 +0800	[thread overview]
Message-ID: <20190718100714.GA469@aaronlu> (raw)
In-Reply-To: <20190619183302.GA6775@sinkpad>

On Wed, Jun 19, 2019 at 02:33:02PM -0400, Julien Desfossez wrote:
> On 17-Jun-2019 10:51:27 AM, Aubrey Li wrote:
> > The result looks still unfair, and particularly, the variance is too high,
> 
> I just want to confirm that I am also seeing the same issue with a
> similar setup. I also tried with the priority boost fix we previously
> posted, the results are slightly better, but we are still seeing a very
> high variance.
> 
> On average, the results I get for 10 30-seconds runs are still much
> better than nosmt (both sysbench pinned on the same sibling) for the
> memory benchmark, and pretty similar for the CPU benchmark, but the high
> variance between runs is indeed concerning.

I was thinking to use util_avg signal to decide which task win in
__prio_less() in the cross cpu case. The reason util_avg is chosen
is because it represents how cpu intensive the task is, so the end
result is, less cpu intensive task will preempt more cpu intensive
task.

Here is the test I have done to see how util_avg works
(on a single node, 16 cores, 32 cpus vm):
1 Start tmux and then start 3 windows with each running bash;
2 Place two shells into two different cgroups and both have cpu.tag set;
3 Switch to the 1st tmux window, start
  will-it-scale/page_fault1_processes -t 16 -s 30
  in the first tagged shell;
4 Switch to the 2nd tmux window;
5 Start
  will-it-scale/page_fault1_processes -t 16 -s 30
  in the 2nd tagged shell;
6 Switch to the 3rd tmux window;
7 Do some simple things in the 3rd untagged shell like ls to see if
  untagged task is able to proceed;
8 Wait for the two page_fault workloads to finish.

With v3 here, I can not do step 4 and later steps, i.e. the 16
page_fault1 processes started in step 3 will occupy all 16 cores and
other tasks do not have a chance to run, including tmux, which made
switching tmux window impossible.

With the below patch on top of v3 that makes use of util_avg to decide
which task win, I can do all 8 steps and the final scores of the 2
workloads are: 1796191 and 2199586. The score number are not close,
suggesting some unfairness, but I can finish the test now...

Here is the diff(consider it as a POC):

---
 kernel/sched/core.c  | 35 ++---------------------------------
 kernel/sched/fair.c  | 36 ++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  2 ++
 3 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26fea68f7f54..7557a7bbb481 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -105,25 +105,8 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
 	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
 		return !dl_time_before(a->dl.deadline, b->dl.deadline);
 
-	if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
-		u64 a_vruntime = a->se.vruntime;
-		u64 b_vruntime = b->se.vruntime;
-
-		/*
-		 * Normalize the vruntime if tasks are in different cpus.
-		 */
-		if (task_cpu(a) != task_cpu(b)) {
-			b_vruntime -= task_cfs_rq(b)->min_vruntime;
-			b_vruntime += task_cfs_rq(a)->min_vruntime;
-
-			trace_printk("(%d:%Lu,%Lu,%Lu) <> (%d:%Lu,%Lu,%Lu)\n",
-				     a->pid, a_vruntime, a->se.vruntime, task_cfs_rq(a)->min_vruntime,
-				     b->pid, b_vruntime, b->se.vruntime, task_cfs_rq(b)->min_vruntime);
-
-		}
-
-		return !((s64)(a_vruntime - b_vruntime) <= 0);
-	}
+	if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
+		return cfs_prio_less(a, b);
 
 	return false;
 }
@@ -3663,20 +3646,6 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
 	if (!class_pick)
 		return NULL;
 
-	if (!cookie) {
-		/*
-		 * If class_pick is tagged, return it only if it has
-		 * higher priority than max.
-		 */
-		bool max_is_higher = sched_feat(CORESCHED_STALL_FIX) ?
-				     max && !prio_less(max, class_pick) :
-				     max && prio_less(class_pick, max);
-		if (class_pick->core_cookie && max_is_higher)
-			return idle_sched_class.pick_task(rq);
-
-		return class_pick;
-	}
-
 	/*
 	 * If class_pick is idle or matches cookie, return early.
 	 */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 26d29126d6a5..06fb00689db1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10740,3 +10740,39 @@ __init void init_sched_fair_class(void)
 #endif /* SMP */
 
 }
+
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
+{
+	struct sched_entity *sea = &a->se;
+	struct sched_entity *seb = &b->se;
+	bool samecore = task_cpu(a) == task_cpu(b);
+	struct task_struct *p;
+	s64 delta;
+
+	if (samecore) {
+		/* vruntime is per cfs_rq */
+		while (!is_same_group(sea, seb)) {
+			int sea_depth = sea->depth;
+			int seb_depth = seb->depth;
+
+			if (sea_depth >= seb_depth)
+				sea = parent_entity(sea);
+			if (sea_depth <= seb_depth)
+				seb = parent_entity(seb);
+		}
+
+		delta = (s64)(sea->vruntime - seb->vruntime);
+	}
+
+	/* across cpu: use util_avg to decide which task to run */
+	delta = (s64)(sea->avg.util_avg - seb->avg.util_avg);
+
+	p = delta > 0 ? b : a;
+	trace_printk("picked %s/%d %s: %Lu %Lu %Ld\n", p->comm, p->pid,
+			samecore ? "vruntime" : "util_avg",
+			samecore ? sea->vruntime : sea->avg.util_avg,
+			samecore ? seb->vruntime : seb->avg.util_avg,
+			delta);
+
+	return delta > 0;
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e91c188a452c..02a6d71704f0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2454,3 +2454,5 @@ static inline bool sched_energy_enabled(void)
 static inline bool sched_energy_enabled(void) { return false; }
 
 #endif /* CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
+
+bool cfs_prio_less(struct task_struct *a, struct task_struct *b);
-- 
2.19.1.3.ge56e4f7


  reply	other threads:[~2019-07-18 10:07 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-29 20:36 Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 01/16] stop_machine: Fix stop_cpus_in_progress ordering Vineeth Remanan Pillai
2019-08-08 10:54   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:19   ` [RFC PATCH v3 01/16] " mark gross
2019-08-26 16:59     ` Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 02/16] sched: Fix kerneldoc comment for ia64_set_curr_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:20   ` [RFC PATCH v3 02/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 03/16] sched: Wrap rq::lock access Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 04/16] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 05/16] sched: Add task_struct pointer to sched_class::set_curr_task Vineeth Remanan Pillai
2019-08-08 10:57   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 06/16] sched/fair: Export newidle_balance() Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] sched/fair: Expose newidle_balance() tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 07/16] sched: Allow put_prev_task() to drop rq->lock Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:51   ` [RFC PATCH v3 07/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 08/16] sched: Rework pick_next_task() slow-path Vineeth Remanan Pillai
2019-08-08 10:59   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 17:01   ` [RFC PATCH v3 08/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 09/16] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2019-08-26 17:14   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 10/16] sched: Core-wide rq->lock Vineeth Remanan Pillai
2019-05-31 11:08   ` Peter Zijlstra
2019-05-31 15:23     ` Vineeth Pillai
2019-05-29 20:36 ` [RFC PATCH v3 11/16] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2019-08-26 20:59   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 12/16] sched: A quick and dirty cgroup tagging interface Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 13/16] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2019-06-07 23:36   ` Pawan Gupta
2019-05-29 20:36 ` [RFC PATCH v3 14/16] sched/fair: Add a few assertions Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 15/16] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 16/16] sched: Debug bits Vineeth Remanan Pillai
2019-05-29 21:02   ` Peter Oskolkov
2019-05-30 14:04 ` [RFC PATCH v3 00/16] Core scheduling v3 Aubrey Li
2019-05-30 14:17   ` Julien Desfossez
2019-05-31  4:55     ` Aubrey Li
2019-05-31  3:01   ` Aaron Lu
2019-05-31  5:12     ` Aubrey Li
2019-05-31  6:09       ` Aaron Lu
2019-05-31  6:53         ` Aubrey Li
2019-05-31  7:44           ` Aaron Lu
2019-05-31  8:26             ` Aubrey Li
2019-05-31 21:08     ` Julien Desfossez
2019-06-06 15:26       ` Julien Desfossez
2019-06-12  1:52         ` Li, Aubrey
2019-06-12 16:06           ` Julien Desfossez
2019-06-12 16:33         ` Julien Desfossez
2019-06-13  0:03           ` Subhra Mazumdar
2019-06-13  3:22             ` Julien Desfossez
2019-06-17  2:51               ` Aubrey Li
2019-06-19 18:33                 ` Julien Desfossez
2019-07-18 10:07                   ` Aaron Lu [this message]
2019-07-18 23:27                     ` Tim Chen
2019-07-19  5:52                       ` Aaron Lu
2019-07-19 11:48                         ` Aubrey Li
2019-07-19 18:33                         ` Tim Chen
2019-07-22 10:26                     ` Aubrey Li
2019-07-22 10:43                       ` Aaron Lu
2019-07-23  2:52                         ` Aubrey Li
2019-07-25 14:30                       ` Aaron Lu
2019-07-25 14:31                         ` [RFC PATCH 1/3] wrapper for cfs_rq->min_vruntime Aaron Lu
2019-07-25 14:32                         ` [PATCH 2/3] core vruntime comparison Aaron Lu
2019-08-06 14:17                           ` Peter Zijlstra
2019-07-25 14:33                         ` [PATCH 3/3] temp hack to make tick based schedule happen Aaron Lu
2019-07-25 21:42                         ` [RFC PATCH v3 00/16] Core scheduling v3 Li, Aubrey
2019-07-26 15:21                         ` Julien Desfossez
2019-07-26 21:29                           ` Tim Chen
2019-07-31  2:42                           ` Li, Aubrey
2019-08-02 15:37                             ` Julien Desfossez
2019-08-05 15:55                               ` Tim Chen
2019-08-06  3:24                                 ` Aaron Lu
2019-08-06  6:56                                   ` Aubrey Li
2019-08-06  7:04                                     ` Aaron Lu
2019-08-06 12:24                                       ` Vineeth Remanan Pillai
2019-08-06 13:49                                         ` Aaron Lu
2019-08-06 16:14                                           ` Vineeth Remanan Pillai
2019-08-06 14:16                                         ` Peter Zijlstra
2019-08-06 15:53                                           ` Vineeth Remanan Pillai
2019-08-06 17:03                                   ` Tim Chen
2019-08-06 17:12                                     ` Peter Zijlstra
2019-08-06 21:19                                       ` Tim Chen
2019-08-08  6:47                                         ` Aaron Lu
2019-08-08 17:27                                           ` Tim Chen
2019-08-08 21:42                                             ` Tim Chen
2019-08-10 14:15                                               ` Aaron Lu
2019-08-12 15:38                                                 ` Vineeth Remanan Pillai
2019-08-13  2:24                                                   ` Aaron Lu
2019-08-08 12:55                                 ` Aaron Lu
2019-08-08 16:39                                   ` Tim Chen
2019-08-10 14:18                                     ` Aaron Lu
2019-08-05 20:09                               ` Phil Auld
2019-08-06 13:54                                 ` Aaron Lu
2019-08-06 14:17                                   ` Phil Auld
2019-08-06 14:41                                     ` Aaron Lu
2019-08-06 14:55                                       ` Phil Auld
2019-08-07  8:58                               ` Dario Faggioli
2019-08-07 17:10                                 ` Tim Chen
2019-08-15 16:09                                   ` Dario Faggioli
2019-08-16  2:33                                     ` Aaron Lu
2019-09-05  1:44                                   ` Julien Desfossez
2019-09-06 22:17                                     ` Tim Chen
2019-09-18 21:27                                     ` Tim Chen
2019-09-06 18:30                                   ` Tim Chen
2019-09-11 14:02                                     ` Aaron Lu
2019-09-11 16:19                                       ` Tim Chen
2019-09-11 16:47                                         ` Vineeth Remanan Pillai
2019-09-12 12:35                                           ` Aaron Lu
2019-09-12 17:29                                             ` Tim Chen
2019-09-13 14:15                                               ` Aaron Lu
2019-09-13 17:13                                                 ` Tim Chen
2019-09-30 11:53                                             ` Vineeth Remanan Pillai
2019-10-02 20:48                                               ` Vineeth Remanan Pillai
2019-10-10 13:54                                                 ` Aaron Lu
2019-10-10 14:29                                                   ` Vineeth Remanan Pillai
2019-10-11  7:33                                                     ` Aaron Lu
2019-10-11 11:32                                                       ` Vineeth Remanan Pillai
2019-10-11 12:01                                                         ` Aaron Lu
2019-10-11 12:10                                                           ` Vineeth Remanan Pillai
2019-10-12  3:55                                                             ` Aaron Lu
2019-10-13 12:44                                                               ` Vineeth Remanan Pillai
2019-10-14  9:57                                                                 ` Aaron Lu
2019-10-21 12:30                                                                   ` Vineeth Remanan Pillai
2019-09-12 12:04                                         ` Aaron Lu
2019-09-12 17:05                                           ` Tim Chen
2019-09-13 13:57                                             ` Aaron Lu
2019-09-12 23:12                                           ` Aubrey Li
2019-09-15 14:14                                             ` Aaron Lu
2019-09-18  1:33                                               ` Aubrey Li
2019-09-18 20:40                                                 ` Tim Chen
2019-09-18 22:16                                                   ` Aubrey Li
2019-09-30 14:36                                                     ` Vineeth Remanan Pillai
2019-10-29 20:40                                                   ` Julien Desfossez
2019-11-01 21:42                                                     ` Tim Chen
2019-10-29  9:11                                               ` Dario Faggioli
2019-10-29  9:15                                                 ` Dario Faggioli
2019-10-29  9:16                                                 ` Dario Faggioli
2019-10-29  9:17                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:19                                                 ` Dario Faggioli
2019-10-29  9:20                                                 ` Dario Faggioli
2019-10-29 20:34                                                   ` Julien Desfossez
2019-11-15 16:30                                                     ` Dario Faggioli
2019-09-25  2:40                                     ` Aubrey Li
2019-09-25 17:24                                       ` Tim Chen
2019-09-25 22:07                                         ` Aubrey Li
2019-09-30 15:22                                     ` Julien Desfossez
2019-08-27 21:14 ` Matthew Garrett
2019-08-27 21:50   ` Peter Zijlstra
2019-08-28 15:30     ` Phil Auld
2019-08-28 16:01       ` Peter Zijlstra
2019-08-28 16:37         ` Tim Chen
2019-08-29 14:30         ` Phil Auld
2019-08-29 14:38           ` Peter Zijlstra
2019-09-10 14:27             ` Julien Desfossez
2019-09-18 21:12               ` Tim Chen
2019-08-28 15:59     ` Tim Chen
2019-08-28 16:16       ` Peter Zijlstra
2019-08-27 23:24   ` Aubrey Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190718100714.GA469@aaronlu \
    --to=aaron.lu@linux.alibaba.com \
    --cc=aubrey.intel@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    --subject='Re: [RFC PATCH v3 00/16] Core scheduling v3' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).