LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Aubrey Li <aubrey.intel@gmail.com>
To: Aaron Lu <aaron.lu@linux.alibaba.com>
Cc: "Julien Desfossez" <jdesfossez@digitalocean.com>,
	"Subhra Mazumdar" <subhra.mazumdar@oracle.com>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
Date: Mon, 22 Jul 2019 18:26:46 +0800	[thread overview]
Message-ID: <CAERHkrtvLKxrpvfw04urAuougsYOWnNw4-H1vUDFx27Dvy0=Ww@mail.gmail.com> (raw)
In-Reply-To: <20190718100714.GA469@aaronlu>

On Thu, Jul 18, 2019 at 6:07 PM Aaron Lu <aaron.lu@linux.alibaba.com> wrote:
>
> On Wed, Jun 19, 2019 at 02:33:02PM -0400, Julien Desfossez wrote:
> > On 17-Jun-2019 10:51:27 AM, Aubrey Li wrote:
> > > The result looks still unfair, and particularly, the variance is too high,
> >
> > I just want to confirm that I am also seeing the same issue with a
> > similar setup. I also tried with the priority boost fix we previously
> > posted, the results are slightly better, but we are still seeing a very
> > high variance.
> >
> > On average, the results I get for 10 30-seconds runs are still much
> > better than nosmt (both sysbench pinned on the same sibling) for the
> > memory benchmark, and pretty similar for the CPU benchmark, but the high
> > variance between runs is indeed concerning.
>
> I was thinking to use util_avg signal to decide which task win in
> __prio_less() in the cross cpu case. The reason util_avg is chosen
> is because it represents how cpu intensive the task is, so the end
> result is, less cpu intensive task will preempt more cpu intensive
> task.
>
> Here is the test I have done to see how util_avg works
> (on a single node, 16 cores, 32 cpus vm):
> 1 Start tmux and then start 3 windows with each running bash;
> 2 Place two shells into two different cgroups and both have cpu.tag set;
> 3 Switch to the 1st tmux window, start
>   will-it-scale/page_fault1_processes -t 16 -s 30
>   in the first tagged shell;
> 4 Switch to the 2nd tmux window;
> 5 Start
>   will-it-scale/page_fault1_processes -t 16 -s 30
>   in the 2nd tagged shell;
> 6 Switch to the 3rd tmux window;
> 7 Do some simple things in the 3rd untagged shell like ls to see if
>   untagged task is able to proceed;
> 8 Wait for the two page_fault workloads to finish.
>
> With v3 here, I can not do step 4 and later steps, i.e. the 16
> page_fault1 processes started in step 3 will occupy all 16 cores and
> other tasks do not have a chance to run, including tmux, which made
> switching tmux window impossible.
>
> With the below patch on top of v3 that makes use of util_avg to decide
> which task win, I can do all 8 steps and the final scores of the 2
> workloads are: 1796191 and 2199586. The score number are not close,
> suggesting some unfairness, but I can finish the test now...
>
> Here is the diff(consider it as a POC):
>
> ---
>  kernel/sched/core.c  | 35 ++---------------------------------
>  kernel/sched/fair.c  | 36 ++++++++++++++++++++++++++++++++++++
>  kernel/sched/sched.h |  2 ++
>  3 files changed, 40 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 26fea68f7f54..7557a7bbb481 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -105,25 +105,8 @@ static inline bool prio_less(struct task_struct *a, struct task_struct *b)
>         if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
>                 return !dl_time_before(a->dl.deadline, b->dl.deadline);
>
> -       if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
> -               u64 a_vruntime = a->se.vruntime;
> -               u64 b_vruntime = b->se.vruntime;
> -
> -               /*
> -                * Normalize the vruntime if tasks are in different cpus.
> -                */
> -               if (task_cpu(a) != task_cpu(b)) {
> -                       b_vruntime -= task_cfs_rq(b)->min_vruntime;
> -                       b_vruntime += task_cfs_rq(a)->min_vruntime;
> -
> -                       trace_printk("(%d:%Lu,%Lu,%Lu) <> (%d:%Lu,%Lu,%Lu)\n",
> -                                    a->pid, a_vruntime, a->se.vruntime, task_cfs_rq(a)->min_vruntime,
> -                                    b->pid, b_vruntime, b->se.vruntime, task_cfs_rq(b)->min_vruntime);
> -
> -               }
> -
> -               return !((s64)(a_vruntime - b_vruntime) <= 0);
> -       }
> +       if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
> +               return cfs_prio_less(a, b);
>
>         return false;
>  }
> @@ -3663,20 +3646,6 @@ pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *ma
>         if (!class_pick)
>                 return NULL;
>
> -       if (!cookie) {
> -               /*
> -                * If class_pick is tagged, return it only if it has
> -                * higher priority than max.
> -                */
> -               bool max_is_higher = sched_feat(CORESCHED_STALL_FIX) ?
> -                                    max && !prio_less(max, class_pick) :
> -                                    max && prio_less(class_pick, max);
> -               if (class_pick->core_cookie && max_is_higher)
> -                       return idle_sched_class.pick_task(rq);
> -
> -               return class_pick;
> -       }
> -
>         /*
>          * If class_pick is idle or matches cookie, return early.
>          */
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 26d29126d6a5..06fb00689db1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10740,3 +10740,39 @@ __init void init_sched_fair_class(void)
>  #endif /* SMP */
>
>  }
> +
> +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> +{
> +       struct sched_entity *sea = &a->se;
> +       struct sched_entity *seb = &b->se;
> +       bool samecore = task_cpu(a) == task_cpu(b);
> +       struct task_struct *p;
> +       s64 delta;
> +
> +       if (samecore) {
> +               /* vruntime is per cfs_rq */
> +               while (!is_same_group(sea, seb)) {
> +                       int sea_depth = sea->depth;
> +                       int seb_depth = seb->depth;
> +
> +                       if (sea_depth >= seb_depth)
> +                               sea = parent_entity(sea);
> +                       if (sea_depth <= seb_depth)
> +                               seb = parent_entity(seb);
> +               }
> +
> +               delta = (s64)(sea->vruntime - seb->vruntime);
> +       }
> +
> +       /* across cpu: use util_avg to decide which task to run */
> +       delta = (s64)(sea->avg.util_avg - seb->avg.util_avg);

The granularity period of util_avg seems too large to decide task priority
during pick_task(), at least it is in my case, cfs_prio_less() always picked
core max task, so pick_task() eventually picked idle, which causes this change
not very helpful for my case.

 <idle>-0     [057] dN..    83.716973: __schedule: max: sysbench/2578
ffff889050f68600
 <idle>-0     [057] dN..    83.716974: __schedule:
(swapper/5/0;140,0,0) ?< (mysqld/2511;119,1042118143,0)
 <idle>-0     [057] dN..    83.716975: __schedule:
(sysbench/2578;119,96449836,0) ?< (mysqld/2511;119,1042118143,0)
 <idle>-0     [057] dN..    83.716975: cfs_prio_less: picked
sysbench/2578 util_avg: 20 527 -507 <======= here===
 <idle>-0     [057] dN..    83.716976: __schedule: pick_task cookie
pick swapper/5/0 ffff889050f68600

Thanks,
-Aubrey

  parent reply	other threads:[~2019-07-22 10:27 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-29 20:36 Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 01/16] stop_machine: Fix stop_cpus_in_progress ordering Vineeth Remanan Pillai
2019-08-08 10:54   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:19   ` [RFC PATCH v3 01/16] " mark gross
2019-08-26 16:59     ` Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 02/16] sched: Fix kerneldoc comment for ia64_set_curr_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:20   ` [RFC PATCH v3 02/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 03/16] sched: Wrap rq::lock access Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 04/16] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Vineeth Remanan Pillai
2019-08-08 10:55   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 05/16] sched: Add task_struct pointer to sched_class::set_curr_task Vineeth Remanan Pillai
2019-08-08 10:57   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 06/16] sched/fair: Export newidle_balance() Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] sched/fair: Expose newidle_balance() tip-bot for Peter Zijlstra
2019-05-29 20:36 ` [RFC PATCH v3 07/16] sched: Allow put_prev_task() to drop rq->lock Vineeth Remanan Pillai
2019-08-08 10:58   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 16:51   ` [RFC PATCH v3 07/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 08/16] sched: Rework pick_next_task() slow-path Vineeth Remanan Pillai
2019-08-08 10:59   ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-08-26 17:01   ` [RFC PATCH v3 08/16] " mark gross
2019-05-29 20:36 ` [RFC PATCH v3 09/16] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2019-08-26 17:14   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 10/16] sched: Core-wide rq->lock Vineeth Remanan Pillai
2019-05-31 11:08   ` Peter Zijlstra
2019-05-31 15:23     ` Vineeth Pillai
2019-05-29 20:36 ` [RFC PATCH v3 11/16] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2019-08-26 20:59   ` mark gross
2019-05-29 20:36 ` [RFC PATCH v3 12/16] sched: A quick and dirty cgroup tagging interface Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 13/16] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2019-06-07 23:36   ` Pawan Gupta
2019-05-29 20:36 ` [RFC PATCH v3 14/16] sched/fair: Add a few assertions Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 15/16] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2019-05-29 20:36 ` [RFC PATCH v3 16/16] sched: Debug bits Vineeth Remanan Pillai
2019-05-29 21:02   ` Peter Oskolkov
2019-05-30 14:04 ` [RFC PATCH v3 00/16] Core scheduling v3 Aubrey Li
2019-05-30 14:17   ` Julien Desfossez
2019-05-31  4:55     ` Aubrey Li
2019-05-31  3:01   ` Aaron Lu
2019-05-31  5:12     ` Aubrey Li
2019-05-31  6:09       ` Aaron Lu
2019-05-31  6:53         ` Aubrey Li
2019-05-31  7:44           ` Aaron Lu
2019-05-31  8:26             ` Aubrey Li
2019-05-31 21:08     ` Julien Desfossez
2019-06-06 15:26       ` Julien Desfossez
2019-06-12  1:52         ` Li, Aubrey
2019-06-12 16:06           ` Julien Desfossez
2019-06-12 16:33         ` Julien Desfossez
2019-06-13  0:03           ` Subhra Mazumdar
2019-06-13  3:22             ` Julien Desfossez
2019-06-17  2:51               ` Aubrey Li
2019-06-19 18:33                 ` Julien Desfossez
2019-07-18 10:07                   ` Aaron Lu
2019-07-18 23:27                     ` Tim Chen
2019-07-19  5:52                       ` Aaron Lu
2019-07-19 11:48                         ` Aubrey Li
2019-07-19 18:33                         ` Tim Chen
2019-07-22 10:26                     ` Aubrey Li [this message]
2019-07-22 10:43                       ` Aaron Lu
2019-07-23  2:52                         ` Aubrey Li
2019-07-25 14:30                       ` Aaron Lu
2019-07-25 14:31                         ` [RFC PATCH 1/3] wrapper for cfs_rq->min_vruntime Aaron Lu
2019-07-25 14:32                         ` [PATCH 2/3] core vruntime comparison Aaron Lu
2019-08-06 14:17                           ` Peter Zijlstra
2019-07-25 14:33                         ` [PATCH 3/3] temp hack to make tick based schedule happen Aaron Lu
2019-07-25 21:42                         ` [RFC PATCH v3 00/16] Core scheduling v3 Li, Aubrey
2019-07-26 15:21                         ` Julien Desfossez
2019-07-26 21:29                           ` Tim Chen
2019-07-31  2:42                           ` Li, Aubrey
2019-08-02 15:37                             ` Julien Desfossez
2019-08-05 15:55                               ` Tim Chen
2019-08-06  3:24                                 ` Aaron Lu
2019-08-06  6:56                                   ` Aubrey Li
2019-08-06  7:04                                     ` Aaron Lu
2019-08-06 12:24                                       ` Vineeth Remanan Pillai
2019-08-06 13:49                                         ` Aaron Lu
2019-08-06 16:14                                           ` Vineeth Remanan Pillai
2019-08-06 14:16                                         ` Peter Zijlstra
2019-08-06 15:53                                           ` Vineeth Remanan Pillai
2019-08-06 17:03                                   ` Tim Chen
2019-08-06 17:12                                     ` Peter Zijlstra
2019-08-06 21:19                                       ` Tim Chen
2019-08-08  6:47                                         ` Aaron Lu
2019-08-08 17:27                                           ` Tim Chen
2019-08-08 21:42                                             ` Tim Chen
2019-08-10 14:15                                               ` Aaron Lu
2019-08-12 15:38                                                 ` Vineeth Remanan Pillai
2019-08-13  2:24                                                   ` Aaron Lu
2019-08-08 12:55                                 ` Aaron Lu
2019-08-08 16:39                                   ` Tim Chen
2019-08-10 14:18                                     ` Aaron Lu
2019-08-05 20:09                               ` Phil Auld
2019-08-06 13:54                                 ` Aaron Lu
2019-08-06 14:17                                   ` Phil Auld
2019-08-06 14:41                                     ` Aaron Lu
2019-08-06 14:55                                       ` Phil Auld
2019-08-07  8:58                               ` Dario Faggioli
2019-08-07 17:10                                 ` Tim Chen
2019-08-15 16:09                                   ` Dario Faggioli
2019-08-16  2:33                                     ` Aaron Lu
2019-09-05  1:44                                   ` Julien Desfossez
2019-09-06 22:17                                     ` Tim Chen
2019-09-18 21:27                                     ` Tim Chen
2019-09-06 18:30                                   ` Tim Chen
2019-09-11 14:02                                     ` Aaron Lu
2019-09-11 16:19                                       ` Tim Chen
2019-09-11 16:47                                         ` Vineeth Remanan Pillai
2019-09-12 12:35                                           ` Aaron Lu
2019-09-12 17:29                                             ` Tim Chen
2019-09-13 14:15                                               ` Aaron Lu
2019-09-13 17:13                                                 ` Tim Chen
2019-09-30 11:53                                             ` Vineeth Remanan Pillai
2019-10-02 20:48                                               ` Vineeth Remanan Pillai
2019-10-10 13:54                                                 ` Aaron Lu
2019-10-10 14:29                                                   ` Vineeth Remanan Pillai
2019-10-11  7:33                                                     ` Aaron Lu
2019-10-11 11:32                                                       ` Vineeth Remanan Pillai
2019-10-11 12:01                                                         ` Aaron Lu
2019-10-11 12:10                                                           ` Vineeth Remanan Pillai
2019-10-12  3:55                                                             ` Aaron Lu
2019-10-13 12:44                                                               ` Vineeth Remanan Pillai
2019-10-14  9:57                                                                 ` Aaron Lu
2019-10-21 12:30                                                                   ` Vineeth Remanan Pillai
2019-09-12 12:04                                         ` Aaron Lu
2019-09-12 17:05                                           ` Tim Chen
2019-09-13 13:57                                             ` Aaron Lu
2019-09-12 23:12                                           ` Aubrey Li
2019-09-15 14:14                                             ` Aaron Lu
2019-09-18  1:33                                               ` Aubrey Li
2019-09-18 20:40                                                 ` Tim Chen
2019-09-18 22:16                                                   ` Aubrey Li
2019-09-30 14:36                                                     ` Vineeth Remanan Pillai
2019-10-29 20:40                                                   ` Julien Desfossez
2019-11-01 21:42                                                     ` Tim Chen
2019-10-29  9:11                                               ` Dario Faggioli
2019-10-29  9:15                                                 ` Dario Faggioli
2019-10-29  9:16                                                 ` Dario Faggioli
2019-10-29  9:17                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:18                                                 ` Dario Faggioli
2019-10-29  9:19                                                 ` Dario Faggioli
2019-10-29  9:20                                                 ` Dario Faggioli
2019-10-29 20:34                                                   ` Julien Desfossez
2019-11-15 16:30                                                     ` Dario Faggioli
2019-09-25  2:40                                     ` Aubrey Li
2019-09-25 17:24                                       ` Tim Chen
2019-09-25 22:07                                         ` Aubrey Li
2019-09-30 15:22                                     ` Julien Desfossez
2019-08-27 21:14 ` Matthew Garrett
2019-08-27 21:50   ` Peter Zijlstra
2019-08-28 15:30     ` Phil Auld
2019-08-28 16:01       ` Peter Zijlstra
2019-08-28 16:37         ` Tim Chen
2019-08-29 14:30         ` Phil Auld
2019-08-29 14:38           ` Peter Zijlstra
2019-09-10 14:27             ` Julien Desfossez
2019-09-18 21:12               ` Tim Chen
2019-08-28 15:59     ` Tim Chen
2019-08-28 16:16       ` Peter Zijlstra
2019-08-27 23:24   ` Aubrey Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAERHkrtvLKxrpvfw04urAuougsYOWnNw4-H1vUDFx27Dvy0=Ww@mail.gmail.com' \
    --to=aubrey.intel@gmail.com \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    --subject='Re: [RFC PATCH v3 00/16] Core scheduling v3' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).