LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Tim Chen <tim.c.chen@linux.intel.com>
To: Julien Desfossez <jdesfossez@digitalocean.com>,
	Aaron Lu <aaron.lu@linux.alibaba.com>
Cc: "Aubrey Li" <aubrey.intel@gmail.com>,
	"Subhra Mazumdar" <subhra.mazumdar@oracle.com>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
Date: Fri, 26 Jul 2019 14:29:55 -0700	[thread overview]
Message-ID: <f96350c1-25a9-0564-ff46-6658e96d726c@linux.intel.com> (raw)
In-Reply-To: <20190726152101.GA27884@sinkpad>

On 7/26/19 8:21 AM, Julien Desfossez wrote:
> On 25-Jul-2019 10:30:03 PM, Aaron Lu wrote:
>>
>> I tried a different approach based on vruntime with 3 patches following.
> [...]
> 
> We have experimented with this new patchset and indeed the fairness is
> now much better. Interactive tasks with v3 were complete starving when
> there were cpu-intensive tasks running, now they can run consistently.
> With my initial test of TPC-C running in large VMs with a lot of
> background noise VMs, the results are pretty similar to v3, I will run
> more thorough tests and report the results back here.

Aaron's patch inspired me to experiment with another approach to tackle
fairness.  The root problem with v3 was we didn't account for the forced
idle time when we compare the priority of tasks between two threads.

So what I did here is to account the forced idle time in the top cfs_rq's
min_vruntime when we update the runqueue clock.  When we are comparing
between two cfs runqueues, the task in cpu getting forced idle will now
be credited with the forced idle time. The effect should be similar to
Aaron's patches. Logic is a bit simpler and we don't need to use
one of the sibling's cfs_rq min_vruntime as a time base.

In really limited testing, it seems to have balanced fairness between two
tagged cgroups.

Tim

-------patch 1----------
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Wed, 24 Jul 2019 13:58:18 -0700
Subject: [PATCH 1/2] sched: move sched fair prio comparison to fair.c

Move the priority comparison of two tasks in fair class to fair.c.
There is no functional change.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/core.c  | 21 ++-------------------
 kernel/sched/fair.c  | 21 +++++++++++++++++++++
 kernel/sched/sched.h |  1 +
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8ea87be56a1e..f78b8fdfd47c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -105,25 +105,8 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
 	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
 		return !dl_time_before(a->dl.deadline, b->dl.deadline);
 
-	if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
-		u64 a_vruntime = a->se.vruntime;
-		u64 b_vruntime = b->se.vruntime;
-
-		/*
-		 * Normalize the vruntime if tasks are in different cpus.
-		 */
-		if (task_cpu(a) != task_cpu(b)) {
-			b_vruntime -= task_cfs_rq(b)->min_vruntime;
-			b_vruntime += task_cfs_rq(a)->min_vruntime;
-
-			trace_printk("(%d:%Lu,%Lu,%Lu) <> (%d:%Lu,%Lu,%Lu)\n",
-				     a->pid, a_vruntime, a->se.vruntime, task_cfs_rq(a)->min_vruntime,
-				     b->pid, b_vruntime, b->se.vruntime, task_cfs_rq(b)->min_vruntime);
-
-		}
-
-		return !((s64)(a_vruntime - b_vruntime) <= 0);
-	}
+	if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
+		return prio_less_fair(a, b);
 
 	return false;
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02bff10237d4..e289b6e1545b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -602,6 +602,27 @@ static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
 	return delta;
 }
 
+bool prio_less_fair(struct task_struct *a, struct task_struct *b)
+{
+	u64 a_vruntime = a->se.vruntime;
+	u64 b_vruntime = b->se.vruntime;
+
+	/*
+	 * Normalize the vruntime if tasks are in different cpus.
+	 */
+	if (task_cpu(a) != task_cpu(b)) {
+		b_vruntime -= task_cfs_rq(b)->min_vruntime;
+		b_vruntime += task_cfs_rq(a)->min_vruntime;
+
+		trace_printk("(%d:%Lu,%Lu,%Lu) <> (%d:%Lu,%Lu,%Lu)\n",
+			     a->pid, a_vruntime, a->se.vruntime, task_cfs_rq(a)->min_vruntime,
+			     b->pid, b_vruntime, b->se.vruntime, task_cfs_rq(b)->min_vruntime);
+
+	}
+
+	return !((s64)(a_vruntime - b_vruntime) <= 0);
+}
+
 /*
  * The idea is to set a period in which each task runs once.
  *
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e91c188a452c..bdabe7ce1152 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1015,6 +1015,7 @@ static inline raw_spinlock_t *rq_lockp(struct rq *rq)
 }
 
 extern void queue_core_balance(struct rq *rq);
+extern bool prio_less_fair(struct task_struct *a, struct task_struct *b);
 
 #else /* !CONFIG_SCHED_CORE */
 
-- 
2.20.1


--------------patch 2------------------
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Thu, 25 Jul 2019 13:09:21 -0700
Subject: [PATCH 2/2] sched: Account the forced idle time

We did not account for the forced idle time when comparing two tasks
from different SMT thread in the same core.

Account it in root cfs_rq min_vruntime when we update the rq clock.  This will
allow for fair comparison of which task has higher priority from
two different SMT.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/core.c |  6 ++++++
 kernel/sched/fair.c | 22 ++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f78b8fdfd47c..d8fa74810126 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -393,6 +393,12 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
 	rq->clock_task += delta;
 
+#ifdef CONFIG_SCHED_CORE
+	/* Account the forced idle time by sibling */
+	if (rq->core_forceidle)
+		rq->cfs.min_vruntime += delta;
+#endif
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
 		update_irq_load_avg(rq, irq_delta + steal);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e289b6e1545b..1b2fd1271c51 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -604,20 +604,34 @@ static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
 
 bool prio_less_fair(struct task_struct *a, struct task_struct *b)
 {
-	u64 a_vruntime = a->se.vruntime;
-	u64 b_vruntime = b->se.vruntime;
+	u64 a_vruntime;
+	u64 b_vruntime;
 
 	/*
 	 * Normalize the vruntime if tasks are in different cpus.
 	 */
 	if (task_cpu(a) != task_cpu(b)) {
-		b_vruntime -= task_cfs_rq(b)->min_vruntime;
-		b_vruntime += task_cfs_rq(a)->min_vruntime;
+		struct sched_entity *sea = &a->se;
+		struct sched_entity *seb = &b->se;
+
+		while (sea->parent)
+			sea = sea->parent;
+		while (seb->parent)
+			seb = seb->parent;
+
+		a_vruntime = sea->vruntime;
+		b_vruntime = seb->vruntime;
+
+		b_vruntime -= task_rq(b)->cfs.min_vruntime;
+		b_vruntime += task_rq(a)->cfs.min_vruntime;
 
 		trace_printk("(%d:%Lu,%Lu,%Lu) <> (%d:%Lu,%Lu,%Lu)\n",
 			     a->pid, a_vruntime, a->se.vruntime, task_cfs_rq(a)->min_vruntime,
 			     b->pid, b_vruntime, b->se.vruntime, task_cfs_rq(b)->min_vruntime);
 
+	} else {
+		a_vruntime = a->se.vruntime;
+		b_vruntime = b->se.vruntime;
 	}
 
 	return !((s64)(a_vruntime - b_vruntime) <= 0);
-- 
2.20.1


  reply	other threads:[~2019-07-26 21:29 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-29 20:36 Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 01/16] stop_machine: Fix stop_cpus_in_progress ordering Vineeth Remanan Pillai
2019-08-08 10:54   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:19   ` [RFC PATCH v3 01/16] " mark gross
2019-08-26 16:59     ` Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 02/16] sched: Fix kerneldoc comment for ia64_set_curr_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:20   ` [RFC PATCH v3 02/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 03/16] sched: Wrap rq::lock access Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 04/16] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 05/16] sched: Add task_struct pointer to sched_class::set_curr_task Vineeth Remanan Pillai
2019-08-08 10:57   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 06/16] sched/fair: Export newidle_balance() Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] sched/fair: Expose newidle_balance() tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 07/16] sched: Allow put_prev_task() to drop rq->lock Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:51   ` [RFC PATCH v3 07/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 08/16] sched: Rework pick_next_task() slow-path Vineeth Remanan Pillai
2019-08-08 10:59   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 17:01   ` [RFC PATCH v3 08/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 09/16] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2019-08-26 17:14   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 10/16] sched: Core-wide rq->lock Vineeth Remanan Pillai
2019-05-31 11:08   ` Peter Zijlstra
2019-05-31 15:23     ` Vineeth Pillai
2019-05-29 20:36 ` [RFC PATCH v3 11/16] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2019-08-26 20:59   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 12/16] sched: A quick and dirty cgroup tagging interface Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 13/16] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2019-06-07 23:36   ` Pawan Gupta
2019-05-29 20:36 ` [RFC PATCH v3 14/16] sched/fair: Add a few assertions Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 15/16] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 16/16] sched: Debug bits Vineeth Remanan Pillai
2019-05-29 21:02   ` Peter Oskolkov
2019-05-30 14:04 ` [RFC PATCH v3 00/16] Core scheduling v3 Aubrey Li
2019-05-30 14:17   ` Julien Desfossez
2019-05-31  4:55     ` Aubrey Li
2019-05-31  3:01   ` Aaron Lu
2019-05-31  5:12     ` Aubrey Li
2019-05-31  6:09       ` Aaron Lu
2019-05-31  6:53         ` Aubrey Li
2019-05-31  7:44           ` Aaron Lu
2019-05-31  8:26             ` Aubrey Li
2019-05-31 21:08     ` Julien Desfossez
2019-06-06 15:26       ` Julien Desfossez
2019-06-12  1:52         ` Li, Aubrey
2019-06-12 16:06           ` Julien Desfossez
2019-06-12 16:33         ` Julien Desfossez
2019-06-13  0:03           ` Subhra Mazumdar
2019-06-13  3:22             ` Julien Desfossez
2019-06-17  2:51               ` Aubrey Li
2019-06-19 18:33                 ` Julien Desfossez
2019-07-18 10:07                   ` Aaron Lu
2019-07-18 23:27                     ` Tim Chen
2019-07-19  5:52                       ` Aaron Lu
2019-07-19 11:48                         ` Aubrey Li
2019-07-19 18:33                         ` Tim Chen
2019-07-22 10:26                     ` Aubrey Li
2019-07-22 10:43                       ` Aaron Lu
2019-07-23  2:52                         ` Aubrey Li
2019-07-25 14:30                       ` Aaron Lu
2019-07-25 14:31                         ` [RFC PATCH 1/3] wrapper for cfs_rq->min_vruntime Aaron Lu
2019-07-25 14:32                         ` [PATCH 2/3] core vruntime comparison Aaron Lu
2019-08-06 14:17                           ` Peter Zijlstra
2019-07-25 14:33                         ` [PATCH 3/3] temp hack to make tick based schedule happen Aaron Lu
2019-07-25 21:42                         ` [RFC PATCH v3 00/16] Core scheduling v3 Li, Aubrey
2019-07-26 15:21                         ` Julien Desfossez
2019-07-26 21:29                           ` Tim Chen [this message]
2019-07-31  2:42                           ` Li, Aubrey
2019-08-02 15:37                             ` Julien Desfossez
2019-08-05 15:55                               ` Tim Chen
2019-08-06  3:24                                 ` Aaron Lu
2019-08-06  6:56                                   ` Aubrey Li
2019-08-06  7:04                                     ` Aaron Lu
2019-08-06 12:24                                       ` Vineeth Remanan Pillai
2019-08-06 13:49                                         ` Aaron Lu
2019-08-06 16:14                                           ` Vineeth Remanan Pillai
2019-08-06 14:16                                         ` Peter Zijlstra
2019-08-06 15:53                                           ` Vineeth Remanan Pillai
2019-08-06 17:03                                   ` Tim Chen
2019-08-06 17:12                                     ` Peter Zijlstra
2019-08-06 21:19                                       ` Tim Chen
2019-08-08  6:47                                         ` Aaron Lu
2019-08-08 17:27                                           ` Tim Chen
2019-08-08 21:42                                             ` Tim Chen
2019-08-10 14:15                                               ` Aaron Lu
2019-08-12 15:38                                                 ` Vineeth Remanan Pillai
2019-08-13  2:24                                                   ` Aaron Lu
2019-08-08 12:55                                 ` Aaron Lu
2019-08-08 16:39                                   ` Tim Chen
2019-08-10 14:18                                     ` Aaron Lu
2019-08-05 20:09                               ` Phil Auld
2019-08-06 13:54                                 ` Aaron Lu
2019-08-06 14:17                                   ` Phil Auld
2019-08-06 14:41                                     ` Aaron Lu
2019-08-06 14:55                                       ` Phil Auld
2019-08-07  8:58                               ` Dario Faggioli
2019-08-07 17:10                                 ` Tim Chen
2019-08-15 16:09                                   ` Dario Faggioli
2019-08-16  2:33                                     ` Aaron Lu
2019-09-05  1:44                                   ` Julien Desfossez
2019-09-06 22:17                                     ` Tim Chen
2019-09-18 21:27                                     ` Tim Chen
2019-09-06 18:30                                   ` Tim Chen
2019-09-11 14:02                                     ` Aaron Lu
2019-09-11 16:19                                       ` Tim Chen
2019-09-11 16:47                                         ` Vineeth Remanan Pillai
2019-09-12 12:35                                           ` Aaron Lu
2019-09-12 17:29                                             ` Tim Chen
2019-09-13 14:15                                               ` Aaron Lu
2019-09-13 17:13                                                 ` Tim Chen
2019-09-30 11:53                                             ` Vineeth Remanan Pillai
2019-10-02 20:48                                               ` Vineeth Remanan Pillai
2019-10-10 13:54                                                 ` Aaron Lu
2019-10-10 14:29                                                   ` Vineeth Remanan Pillai
2019-10-11  7:33                                                     ` Aaron Lu
2019-10-11 11:32                                                       ` Vineeth Remanan Pillai
2019-10-11 12:01                                                         ` Aaron Lu
2019-10-11 12:10                                                           ` Vineeth Remanan Pillai
2019-10-12  3:55                                                             ` Aaron Lu
2019-10-13 12:44                                                               ` Vineeth Remanan Pillai
2019-10-14  9:57                                                                 ` Aaron Lu
2019-10-21 12:30                                                                   ` Vineeth Remanan Pillai
2019-09-12 12:04                                         ` Aaron Lu
2019-09-12 17:05                                           ` Tim Chen
2019-09-13 13:57                                             ` Aaron Lu
2019-09-12 23:12                                           ` Aubrey Li
2019-09-15 14:14                                             ` Aaron Lu
2019-09-18  1:33                                               ` Aubrey Li
2019-09-18 20:40                                                 ` Tim Chen
2019-09-18 22:16                                                   ` Aubrey Li
2019-09-30 14:36                                                     ` Vineeth Remanan Pillai
2019-10-29 20:40                                                   ` Julien Desfossez
2019-11-01 21:42                                                     ` Tim Chen
2019-10-29  9:11                                               ` Dario Faggioli
2019-10-29  9:15                                                 ` Dario Faggioli
2019-10-29  9:16                                                 ` Dario Faggioli
2019-10-29  9:17                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:19                                                 ` Dario Faggioli
2019-10-29  9:20                                                 ` Dario Faggioli
2019-10-29 20:34                                                   ` Julien Desfossez
2019-11-15 16:30                                                     ` Dario Faggioli
2019-09-25  2:40                                     ` Aubrey Li
2019-09-25 17:24                                       ` Tim Chen
2019-09-25 22:07                                         ` Aubrey Li
2019-09-30 15:22                                     ` Julien Desfossez
2019-08-27 21:14 ` Matthew Garrett
2019-08-27 21:50   ` Peter Zijlstra
2019-08-28 15:30     ` Phil Auld
2019-08-28 16:01       ` Peter Zijlstra
2019-08-28 16:37         ` Tim Chen
2019-08-29 14:30         ` Phil Auld
2019-08-29 14:38           ` Peter Zijlstra
2019-09-10 14:27             ` Julien Desfossez
2019-09-18 21:12               ` Tim Chen
2019-08-28 15:59     ` Tim Chen
2019-08-28 16:16       ` Peter Zijlstra
2019-08-27 23:24   ` Aubrey Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f96350c1-25a9-0564-ff46-6658e96d726c@linux.intel.com \
    --to=tim.c.chen@linux.intel.com \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aubrey.intel@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    --subject='Re: [RFC PATCH v3 00/16] Core scheduling v3' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).