LKML Archive on lore.kernel.org
* A strange behavior of sched_fair
From: Kei Tokunaga @ 2008-02-27 22:51 UTC
To: mingo; +Cc: Kei Tokunaga, linux-kernel

Hi Ingo,

I am playing around with sched_fair and cgroup, and it seems like
I have hit a possible bug. Could you check whether this is a bug?

Description of behavior:
Start a CPU-bound task (t1), attach it to a cgroup (cgA), and let the
task run for a while (several tens of seconds or a couple of minutes
is adequate). Then start another CPU-bound task (t2) and attach it to
cgA as described in the "Steps to Reproduce" section. You will see
that t1 does not run for a while. (The tasks do not strictly have to
be CPU-bound, but the behavior is easier to see with CPU-bound tasks.)

How reproducible:
Always.

Environment where I saw the behavior:
2.6.25-rc3 with resource management functions enabled, on an ia64 box.

Steps to Reproduce:
# mkdir /dev/cgroup
# mount -t cgroup -ocpuset,cpu cpu /dev/cgroup
# mkdir /dev/cgroup/{a,b}
# echo 0 > /dev/cgroup/a/cpuset.cpus
# echo 0 > /dev/cgroup/b/cpuset.cpus
# echo 1 > /dev/cgroup/a/cpuset.mems
# echo 1 > /dev/cgroup/b/cpuset.mems
# echo $$ > /dev/cgroup/b/tasks
# ./a.out & echo $! > /dev/cgroup/a/tasks    (a.out is just a for-loop program)
[Wait several tens of seconds or a couple of minutes.]
# ./a.out2 & echo $! > /dev/cgroup/a/tasks   (a.out2 is just a for-loop program)
[Running top, you will see that a.out does not run for a while.]

Additional Info:
a.out2 needs to be started from the shell in cgroup b to reproduce
the problem (unless the system is UP). When a.out2 is started in this
manner, its se->vruntime (or whatever value se->vruntime is derived
from) appears to be initialized to a small value compared to that of
a.out, and the fair scheduler then runs only a.out2 until its
se->vruntime catches up with a.out's.

Thanks,
Kei
-- 
Kei Tokunaga
Fujitsu (Red Hat On-site Partner)
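The report describes a.out and a.out2 only as "just a for-loop
program"; the actual source is not included. A minimal program of
that kind might look like the following sketch (a reconstruction, not
the reporter's code):

/* busyloop.c -- trivial CPU-bound task; build with: gcc -o a.out busyloop.c */
int main(void)
{
	/* volatile keeps the compiler from optimizing the loop away */
	volatile unsigned long counter = 0;

	for (;;)
		counter++;

	return 0;
}

Any program that spins on the CPU indefinitely will do; a.out2 can be
the same binary under a different name.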
* Re: A strange behavior of sched_fair
From: Andrew Morton @ 2008-02-29  7:08 UTC
To: Kei Tokunaga; +Cc: mingo, linux-kernel, containers

(cc containers list)

On Wed, 27 Feb 2008 17:51:35 -0500 Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> wrote:

> Hi Ingo,
>
> I am playing around with sched_fair and cgroup, and it seems like
> I have hit a possible bug. Could you check whether this is a bug?
>
> Description of behavior:
> Start a CPU-bound task (t1), attach it to a cgroup (cgA), and let the
> task run for a while (several tens of seconds or a couple of minutes
> is adequate). Then start another CPU-bound task (t2) and attach it to
> cgA as described in the "Steps to Reproduce" section. You will see
> that t1 does not run for a while. (The tasks do not strictly have to
> be CPU-bound, but the behavior is easier to see with CPU-bound tasks.)
>
> How reproducible:
> Always.
>
> Environment where I saw the behavior:
> 2.6.25-rc3 with resource management functions enabled, on an ia64 box.
>
> Steps to Reproduce:
> # mkdir /dev/cgroup
> # mount -t cgroup -ocpuset,cpu cpu /dev/cgroup
> # mkdir /dev/cgroup/{a,b}
> # echo 0 > /dev/cgroup/a/cpuset.cpus
> # echo 0 > /dev/cgroup/b/cpuset.cpus
> # echo 1 > /dev/cgroup/a/cpuset.mems
> # echo 1 > /dev/cgroup/b/cpuset.mems
> # echo $$ > /dev/cgroup/b/tasks
> # ./a.out & echo $! > /dev/cgroup/a/tasks    (a.out is just a for-loop program)
> [Wait several tens of seconds or a couple of minutes.]
> # ./a.out2 & echo $! > /dev/cgroup/a/tasks   (a.out2 is just a for-loop program)
> [Running top, you will see that a.out does not run for a while.]
>
> Additional Info:
> a.out2 needs to be started from the shell in cgroup b to reproduce
> the problem (unless the system is UP). When a.out2 is started in this
> manner, its se->vruntime (or whatever value se->vruntime is derived
> from) appears to be initialized to a small value compared to that of
> a.out, and the fair scheduler then runs only a.out2 until its
> se->vruntime catches up with a.out's.
>
> Thanks,
> Kei
> --
> Kei Tokunaga
> Fujitsu (Red Hat On-site Partner)
* Re: A strange behavior of sched_fair
From: Peter Zijlstra @ 2008-02-29  9:34 UTC
To: Kei Tokunaga; +Cc: mingo, linux-kernel, Dhaval Giani, vatsa

On Wed, 2008-02-27 at 17:51 -0500, Kei Tokunaga wrote:
> Hi Ingo,
>
> I am playing around with sched_fair and cgroup, and it seems like
> I have hit a possible bug. Could you check whether this is a bug?
>
> Description of behavior:
> Start a CPU-bound task (t1), attach it to a cgroup (cgA), and let the
> task run for a while (several tens of seconds or a couple of minutes
> is adequate). Then start another CPU-bound task (t2) and attach it to
> cgA as described in the "Steps to Reproduce" section. You will see
> that t1 does not run for a while. (The tasks do not strictly have to
> be CPU-bound, but the behavior is easier to see with CPU-bound tasks.)
>
> How reproducible:
> Always.
>
> Environment where I saw the behavior:
> 2.6.25-rc3 with resource management functions enabled, on an ia64 box.
>
> Steps to Reproduce:
> # mkdir /dev/cgroup
> # mount -t cgroup -ocpuset,cpu cpu /dev/cgroup
> # mkdir /dev/cgroup/{a,b}
> # echo 0 > /dev/cgroup/a/cpuset.cpus
> # echo 0 > /dev/cgroup/b/cpuset.cpus
> # echo 1 > /dev/cgroup/a/cpuset.mems
> # echo 1 > /dev/cgroup/b/cpuset.mems
> # echo $$ > /dev/cgroup/b/tasks
> # ./a.out & echo $! > /dev/cgroup/a/tasks    (a.out is just a for-loop program)
> [Wait several tens of seconds or a couple of minutes.]
> # ./a.out2 & echo $! > /dev/cgroup/a/tasks   (a.out2 is just a for-loop program)
> [Running top, you will see that a.out does not run for a while.]
>
> Additional Info:
> a.out2 needs to be started from the shell in cgroup b to reproduce
> the problem (unless the system is UP). When a.out2 is started in this
> manner, its se->vruntime (or whatever value se->vruntime is derived
> from) appears to be initialized to a small value compared to that of
> a.out, and the fair scheduler then runs only a.out2 until its
> se->vruntime catches up with a.out's.

Seems the vruntime doesn't get re-set if you move tasks between groups.
sched_move_task() should call place_entity() in the context of the new
cfs_rq.
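To see why a stale vruntime starves the sibling task: CFS always runs
the queued entity with the smallest vruntime, and an entity's vruntime
advances only while it runs. The toy model below (ordinary userspace
C, not kernel code; the entity names and numbers are invented for
illustration) applies that pick-lowest rule to one long-running entity
and one freshly moved entity that kept a small vruntime from its old
queue:

/* vruntime_toy.c -- toy model of CFS's pick-lowest-vruntime rule.
 * Not kernel code; the entities and numbers are made up. */
#include <stdio.h>

struct entity {
	const char *name;
	unsigned long long vruntime;	/* virtual runtime, arbitrary units */
};

int main(void)
{
	/* t1 has run for minutes; t2 was just moved into the group and
	 * kept the small vruntime it accumulated on its old cfs_rq. */
	struct entity t1 = { "a.out",  1000000ULL };
	struct entity t2 = { "a.out2",    1000ULL };
	int tick;

	for (tick = 0; tick < 10; tick++) {
		/* the scheduler always picks the lowest vruntime */
		struct entity *next =
			(t1.vruntime <= t2.vruntime) ? &t1 : &t2;

		printf("tick %2d: run %s\n", tick, next->name);
		next->vruntime += 100000;	/* charge it for its slice */
	}
	return 0;
}

Every tick picks a.out2 until its vruntime climbs past a.out's, which
is exactly the lag in the report; re-basing the moved entity's
vruntime, as suggested above, removes the gap.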
* Re: A strange behavior of sched_fair
From: Peter Zijlstra @ 2008-02-29 10:11 UTC
To: Kei Tokunaga; +Cc: mingo, linux-kernel, Dhaval Giani, vatsa

On Fri, 2008-02-29 at 10:34 +0100, Peter Zijlstra wrote:
> On Wed, 2008-02-27 at 17:51 -0500, Kei Tokunaga wrote:
> > [full bug report snipped; see the first message in this thread]
>
> Seems the vruntime doesn't get re-set if you move tasks between groups.
> sched_move_task() should call place_entity() in the context of the new
> cfs_rq.

This should do I guess - uncompiled, untested

---
Subject: sched: fair-group: fixup tasks on group move

Tasks would retain their old vruntime when moved between groups; this
can cause funny lags. Re-set the vruntime on group move so that it
fits within the new tree.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/sched.h |    4 ++++
 kernel/sched.c        |    3 +++
 kernel/sched_fair.c   |   14 ++++++++++++++
 3 files changed, 21 insertions(+)

Index: linux-2.6-2/include/linux/sched.h
===================================================================
--- linux-2.6-2.orig/include/linux/sched.h
+++ linux-2.6-2/include/linux/sched.h
@@ -899,6 +899,10 @@ struct sched_class {
 			      int running);
 	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
 			      int oldprio, int running);
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	void (*moved_group) (struct task_struct *p);
+#endif
 };
 
 struct load_weight {

Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -7831,6 +7831,9 @@ void sched_move_task(struct task_struct
 
 	set_task_rq(tsk, task_cpu(tsk));
 
+	if (tsk->sched_class->moved_group)
+		tsk->sched_class->moved_group(tsk);
+
 	if (on_rq) {
 		if (unlikely(running))
 			tsk->sched_class->set_curr_task(rq);

Index: linux-2.6-2/kernel/sched_fair.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_fair.c
+++ linux-2.6-2/kernel/sched_fair.c
@@ -1398,6 +1398,16 @@ static void set_curr_task_fair(struct rq
 	set_next_entity(cfs_rq_of(se), se);
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static void moved_group_fair(struct task_struct *p)
+{
+	struct cfs_rq = task_cfs_rq(tsk);
+
+	update_curr(cfs_rq);
+	place_entity(cfs_rq, &p->se, 1);
+}
+#endif
+
 /*
  * All the scheduling class methods:
  */
@@ -1426,6 +1436,10 @@ static const struct sched_class fair_sch
 	.prio_changed		= prio_changed_fair,
 
 	.switched_to		= switched_to_fair,
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	.moved_group		= moved_group_fair,
+#endif
 };
 
 #ifdef CONFIG_SCHED_DEBUG
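As posted, the moved_group_fair() hunk does not compile: the line

	struct cfs_rq = task_cfs_rq(tsk);

lacks a variable name and uses tsk where the function's parameter is
p. The tested patch later in the thread replaces it with:

	struct cfs_rq *cfs_rq = task_cfs_rq(p);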
* Re: A strange behavior of sched_fair
From: Ingo Molnar @ 2008-02-29 19:42 UTC
To: Peter Zijlstra; +Cc: Kei Tokunaga, linux-kernel, Dhaval Giani, vatsa

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > Seems the vruntime doesn't get re-set if you move tasks between
> > groups. sched_move_task() should call place_entity() in the context
> > of the new cfs_rq.
>
> This should do I guess - uncompiled, untested

if someone tests it and if it solves the problem i'll apply it to the
scheduler tree.

	Ingo
* Re: A strange behavior of sched_fair
From: Kei Tokunaga @ 2008-02-29 20:21 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel, Dhaval Giani, vatsa, Kei Tokunaga, Kei Tokunaga

Ingo Molnar wrote (2008/02/29 14:42):
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
>>> Seems the vruntime doesn't get re-set if you move tasks between
>>> groups. sched_move_task() should call place_entity() in the context
>>> of the new cfs_rq.
>> This should do I guess - uncompiled, untested
>
> if someone tests it and if it solves the problem i'll apply it to the
> scheduler tree.

Hi Ingo, Peter,

Thanks for the patch, Peter. I modified it a bit to get it to compile
and confirmed that the issue no longer occurs on my ia64 box. The
revised patch is attached below.

Thanks,
Kei

Subject: sched: fair-group: fixup tasks on group move

Tasks would retain their old vruntime when moved between groups; this
can cause funny lags. Re-set the vruntime on group move so that it
fits within the new tree.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---

 linux-2.6.25-rc3-cgroup-kei/include/linux/sched.h |    4 ++++
 linux-2.6.25-rc3-cgroup-kei/kernel/sched.c        |    3 +++
 linux-2.6.25-rc3-cgroup-kei/kernel/sched_fair.c   |   14 ++++++++++++++
 3 files changed, 21 insertions(+)

diff -puN include/linux/sched.h~sched_fair include/linux/sched.h
--- linux-2.6.25-rc3-cgroup/include/linux/sched.h~sched_fair	2008-02-29 12:26:47.000000000 -0500
+++ linux-2.6.25-rc3-cgroup-kei/include/linux/sched.h	2008-02-29 12:26:47.000000000 -0500
@@ -898,6 +898,10 @@ struct sched_class {
 			      int running);
 	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
 			      int oldprio, int running);
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	void (*moved_group) (struct task_struct *p);
+#endif
 };
 
 struct load_weight {

diff -puN kernel/sched.c~sched_fair kernel/sched.c
--- linux-2.6.25-rc3-cgroup/kernel/sched.c~sched_fair	2008-02-29 12:26:47.000000000 -0500
+++ linux-2.6.25-rc3-cgroup-kei/kernel/sched.c	2008-02-29 12:26:47.000000000 -0500
@@ -7825,6 +7825,9 @@ void sched_move_task(struct task_struct
 
 	set_task_rq(tsk, task_cpu(tsk));
 
+	if (tsk->sched_class->moved_group)
+		tsk->sched_class->moved_group(tsk);
+
 	if (on_rq) {
 		if (unlikely(running))
 			tsk->sched_class->set_curr_task(rq);

diff -puN kernel/sched_fair.c~sched_fair kernel/sched_fair.c
--- linux-2.6.25-rc3-cgroup/kernel/sched_fair.c~sched_fair	2008-02-29 12:26:47.000000000 -0500
+++ linux-2.6.25-rc3-cgroup-kei/kernel/sched_fair.c	2008-02-29 13:49:15.000000000 -0500
@@ -1403,6 +1403,16 @@ static void set_curr_task_fair(struct rq
 	set_next_entity(cfs_rq_of(se), se);
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static void moved_group_fair(struct task_struct *p)
+{
+	struct cfs_rq *cfs_rq = task_cfs_rq(p);
+
+	update_curr(cfs_rq);
+	place_entity(cfs_rq, &p->se, 1);
+}
+#endif
+
 /*
  * All the scheduling class methods:
  */
@@ -1431,6 +1441,10 @@ static const struct sched_class fair_sch
 	.prio_changed		= prio_changed_fair,
 
 	.switched_to		= switched_to_fair,
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	.moved_group		= moved_group_fair,
+#endif
 };
 
 #ifdef CONFIG_SCHED_DEBUG
_
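For context on what the fix does: place_entity() re-bases an entity's
vruntime around the destination queue's min_vruntime, so a task moved
into a group can no longer undercut the entities already queued there.
Below is a simplified, self-contained paraphrase of the
initial-placement path, not the actual 2.6.25 source (which also
handles sleeper credit and scheduler feature flags); start_debit() is
a hypothetical stand-in for the real vslice computation:

/* Sketch only -- stand-in types so the shape of the logic is visible. */
typedef unsigned long long u64;

struct sched_entity { u64 vruntime; };
struct cfs_rq       { u64 min_vruntime; };

/* hypothetical stand-in for the kernel's vslice computation */
static u64 start_debit(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	(void)cfs_rq; (void)se;
	return 100000;	/* one nominal slice, in virtual-time units */
}

/* rough paraphrase of place_entity()'s initial-placement behavior */
static void place_entity_sketch(struct cfs_rq *cfs_rq,
				struct sched_entity *se, int initial)
{
	u64 vruntime = cfs_rq->min_vruntime;

	if (initial)
		vruntime += start_debit(cfs_rq, se);	/* START_DEBIT */

	se->vruntime = vruntime;
}

Calling it with initial=1 on the new cfs_rq, as moved_group_fair()
does, places the migrated task at (or slightly past) the queue's
current minimum instead of leaving it far below it.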
* Re: A strange behavior of sched_fair
From: Ingo Molnar @ 2008-02-29 20:32 UTC
To: Kei Tokunaga
Cc: Peter Zijlstra, linux-kernel, Dhaval Giani, vatsa, Kei Tokunaga

* Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> wrote:

> Hi Ingo, Peter,
>
> Thanks for the patch, Peter. I modified it a bit to get it to compile
> and confirmed that the issue no longer occurs on my ia64 box. The
> revised patch is attached below.

thanks, queued it up. Could you send me your Tested-by (and Acked-by)
lines?

	Ingo
* Re: A strange behavior of sched_fair
From: Kei Tokunaga @ 2008-02-29 22:46 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, linux-kernel, Dhaval Giani, vatsa, Kei Tokunaga, Kei Tokunaga

Ingo Molnar wrote (2008/02/29 15:32):
> * Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> wrote:
>
>> Hi Ingo, Peter,
>>
>> Thanks for the patch, Peter. I modified it a bit to get it to compile
>> and confirmed that the issue no longer occurs on my ia64 box. The
>> revised patch is attached below.
>
> thanks, queued it up. Could you send me your Tested-by (and Acked-by)
> lines?

Tested-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>

Thanks,
Kei
Thread overview: 8+ messages
2008-02-27 22:51 A strange behavior of sched_fair  Kei Tokunaga
2008-02-29  7:08 ` Andrew Morton
2008-02-29  9:34 ` Peter Zijlstra
2008-02-29 10:11   ` Peter Zijlstra
2008-02-29 19:42     ` Ingo Molnar
2008-02-29 20:21       ` Kei Tokunaga
2008-02-29 20:32         ` Ingo Molnar
2008-02-29 22:46           ` Kei Tokunaga