LKML Archive on lore.kernel.org
* A strange behavior of sched_fair
@ 2008-02-27 22:51 Kei Tokunaga
  2008-02-29  7:08 ` Andrew Morton
  2008-02-29  9:34 ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Kei Tokunaga @ 2008-02-27 22:51 UTC (permalink / raw)
  To: mingo; +Cc: Kei Tokunaga, linux-kernel

Hi Ingo,

I am playing around with sched_fair and cgroup, and it seems like
I hit a possible bug.  Could you also check if that is a bug?

Description of behavior:
   Start a cpu-bound task (t1), attach it to a cgroup (cgA), and let the
   task run for a while (several tens of seconds or a couple of minutes
   is adequate).  Then start another cpu-bound task (t2) and attach it to
   cgA in the way described in the "Steps to Reproduce" section.
   You will see that t1 does not get run for a while.
   (The tasks do not have to be cpu-bound, but the behavior is easier to
    see with cpu-bound tasks.)

How reproducible:
   Always.

Environments where I saw the behavior:
   2.6.25-rc3 with resource management functions enabled on ia64 box.

Steps to Reproduce:
   # mkdir /dev/cgroup
   # mount -t cgroup -ocpuset,cpu cpu /dev/cgroup
   # mkdir /dev/cgroup/{a,b}
   # echo 0 > /dev/cgroup/a/cpuset.cpus
   # echo 0 > /dev/cgroup/b/cpuset.cpus
   # echo 1 > /dev/cgroup/a/cpuset.mems
   # echo 1 > /dev/cgroup/b/cpuset.mems
   # echo $$ > /dev/cgroup/b/tasks
   # ./a.out & echo $! > /dev/cgroup/a/tasks (a.out is just a for-loop program)
     [Wait for several tens of seconds or a couple of minutes.]
   # ./a.out2 & echo $! > /dev/cgroup/a/tasks (a.out2 is just a for-loop program)
     [You will see a.out does not get run for a while by running top command.]
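
     (The source of a.out and a.out2 was not posted; any cpu-bound busy
      loop will do.  For example, something like the following, compiled
      twice under the two names:)

/*
 * busyloop.c - trivial cpu-bound program standing in for a.out / a.out2
 * (illustrative only; the original source was not included in the report)
 * build: gcc -o a.out busyloop.c && cp a.out a.out2
 */
int main(void)
{
	for (;;)
		;	/* spin forever so the task stays runnable */
	return 0;
}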

Additional Info:
   a.out2 needs to be started from the shell in cgroup b in order to
   reproduce the problem (unless the system is UP).  When a.out2 is
   started this way, its se->vruntime (or whatever se->vruntime is
   derived from) appears to be initialized to a small value compared to
   that of a.out, and the fair scheduler then runs only a.out2 until its
   se->vruntime catches up with that of a.out.
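
   (For illustration only - a simplified model of the behavior described
    above, not the actual scheduler code: CFS keeps runnable entities
    ordered by vruntime and always runs the one with the smallest value,
    so an entity that enters the queue with a much smaller vruntime runs
    exclusively until it catches up.)

/*
 * Simplified model of the starvation described above.  All names here
 * are made up for illustration and do not correspond to the kernel API.
 */
struct entity {
	unsigned long long vruntime;	/* virtual runtime, in ns */
};

/* CFS-style pick: the runnable entity with the smallest vruntime runs next */
static struct entity *pick_next(struct entity *a, struct entity *b)
{
	return a->vruntime <= b->vruntime ? a : b;
}

/* running charges virtual runtime (weight 1 here, so 1:1 with real time) */
static void account(struct entity *e, unsigned long long delta_ns)
{
	e->vruntime += delta_ns;
}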

Thanks,
Kei
--
Kei Tokunaga
Fujitsu (Red Hat On-site Partner)


* Re: A strange behavior of sched_fair
  2008-02-27 22:51 A strange behavior of sched_fair Kei Tokunaga
@ 2008-02-29  7:08 ` Andrew Morton
  2008-02-29  9:34 ` Peter Zijlstra
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2008-02-29  7:08 UTC (permalink / raw)
  To: Kei Tokunaga; +Cc: mingo, linux-kernel, containers


(cc containers list)

On Wed, 27 Feb 2008 17:51:35 -0500 Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> wrote:

> Hi Ingo,
> 
> I am playing around with sched_fair and cgroup, and it seems like
> I hit a possible bug.  Could you also check if that is a bug?
> 
> Description of behavior:
>    Start a cpu-bound task (t1), attach it to a cgroup (cgA), and let the
>    task run for a while (several tens of seconds or a couple of minutes
>    is adequate).  Then start another cpu-bound task (t2) and attach it to
>    cgA in the way described in the "Steps to Reproduce" section.
>    You will see that t1 does not get run for a while.
>    (The tasks do not have to be cpu-bound, but the behavior is easier to
>     see with cpu-bound tasks.)
> 
> How reproducible:
>    Always.
> 
> Environments where I saw the behavior:
>    2.6.25-rc3 with resource management functions enabled on ia64 box.
> 
> Steps to Reproduce:
>    # mkdir /dev/cgroup
>    # mount -t cgroup -ocpuset,cpu cpu /dev/cgroup
>    # mkdir /dev/cgroup/{a,b}
>    # echo 0 > /dev/cgroup/a/cpuset.cpus
>    # echo 0 > /dev/cgroup/b/cpuset.cpus
>    # echo 1 > /dev/cgroup/a/cpuset.mems
>    # echo 1 > /dev/cgroup/b/cpuset.mems
>    # echo $$ > /dev/cgroup/b/tasks
>    # ./a.out & echo $! > /dev/cgroup/a/tasks (a.out is just a for-loop program)
>      [Wait for several tens of seconds or a couple of minutes.]
>    # ./a.out2 & echo $! > /dev/cgroup/a/tasks (a.out2 is just a for-loop program)
>      [You will see a.out does not get run for a while by running top command.]
> 
> Additional Info:
>    a.out2 needs to be started from the shell in cgroup b in order to
>    reproduce the problem (unless the system is UP).  When a.out2 is
>    started this way, its se->vruntime (or whatever se->vruntime is
>    derived from) appears to be initialized to a small value compared to
>    that of a.out, and the fair scheduler then runs only a.out2 until its
>    se->vruntime catches up with that of a.out.
> 
> Thanks,
> Kei
> --
> Kei Tokunaga
> Fujitsu (Red Hat On-site Partner)



* Re: A strange behavior of sched_fair
  2008-02-27 22:51 A strange behavior of sched_fair Kei Tokunaga
  2008-02-29  7:08 ` Andrew Morton
@ 2008-02-29  9:34 ` Peter Zijlstra
  2008-02-29 10:11   ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2008-02-29  9:34 UTC (permalink / raw)
  To: Kei Tokunaga; +Cc: mingo, linux-kernel, Dhaval Giani, vatsa


On Wed, 2008-02-27 at 17:51 -0500, Kei Tokunaga wrote:
> Hi Ingo,
> 
> I am playing around with sched_fair and cgroup, and it seems like
> I hit a possible bug.  Could you also check if that is a bug?
> 
> Description of behavior:
>    Start a cpu-bound task (t1), attach it to a cgroup (cgA), and let the
>    task run for a while (several tens of seconds or a couple of minutes
>    is adequate).  Then start another cpu-bound task (t2) and attach it to
>    cgA in the way described in the "Steps to Reproduce" section.
>    You will see that t1 does not get run for a while.
>    (The tasks do not have to be cpu-bound, but the behavior is easier to
>     see with cpu-bound tasks.)
> 
> How reproducible:
>    Always.
> 
> Environments where I saw the behavior:
>    2.6.25-rc3 with resource management functions enabled on ia64 box.
> 
> Steps to Reproduce:
>    # mkdir /dev/cgroup
>    # mount -t cgroup -ocpuset,cpu cpu /dev/cgroup
>    # mkdir /dev/cgroup/{a,b}
>    # echo 0 > /dev/cgroup/a/cpuset.cpus
>    # echo 0 > /dev/cgroup/b/cpuset.cpus
>    # echo 1 > /dev/cgroup/a/cpuset.mems
>    # echo 1 > /dev/cgroup/b/cpuset.mems
>    # echo $$ > /dev/cgroup/b/tasks
>    # ./a.out & echo $! > /dev/cgroup/a/tasks (a.out is just a for-loop program)
>      [Wait for several tens of seconds or a couple of minutes.]
>    # ./a.out2 & echo $! > /dev/cgroup/a/tasks (a.out2 is just a for-loop program)
>      [You will see a.out does not get run for a while by running top command.]
> 
> Additional Info:
>    a.out2 needs to be started from the shell in cgroup b in order to
>    reproduce the problem (unless the system is UP).  When a.out2 is
>    started this way, its se->vruntime (or whatever se->vruntime is
>    derived from) appears to be initialized to a small value compared to
>    that of a.out, and the fair scheduler then runs only a.out2 until its
>    se->vruntime catches up with that of a.out.

Seems the vruntime doesn't get re-set if you move tasks between groups.
sched_move_task() should call place_entity() in the context of the new
cfs_rq.
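
Conceptually (a simplified sketch, not the exact implementation -
place_entity() also applies the usual new-entity placement rules on top
of this), the fix amounts to re-basing the moved entity's vruntime on the
destination queue so it is comparable to the entities already queued
there:

static void rebase_vruntime(struct cfs_rq *new_cfs_rq, struct sched_entity *se)
{
	/* make the entity look freshly placed on the new queue */
	se->vruntime = new_cfs_rq->min_vruntime;
}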



* Re: A strange behavior of sched_fair
  2008-02-29  9:34 ` Peter Zijlstra
@ 2008-02-29 10:11   ` Peter Zijlstra
  2008-02-29 19:42     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2008-02-29 10:11 UTC (permalink / raw)
  To: Kei Tokunaga; +Cc: mingo, linux-kernel, Dhaval Giani, vatsa


On Fri, 2008-02-29 at 10:34 +0100, Peter Zijlstra wrote:
> On Wed, 2008-02-27 at 17:51 -0500, Kei Tokunaga wrote:
> > Hi Ingo,
> > 
> > I am playing around with sched_fair and cgroup, and it seems like
> > I hit a possible bug.  Could you also check if that is a bug?
> > 
> > Description of behavior:
> >    Start a cpu-bound task (t1), attach it to a cgroup (cgA), and let the
> >    task run for a while (several tens of seconds or a couple of minutes
> >    is adequate).  Then start another cpu-bound task (t2) and attach it to
> >    cgA in the way described in the "Steps to Reproduce" section.
> >    You will see that t1 does not get run for a while.
> >    (The tasks do not have to be cpu-bound, but the behavior is easier to
> >     see with cpu-bound tasks.)
> > 
> > How reproducible:
> >    Always.
> > 
> > Environments where I saw the behavior:
> >    2.6.25-rc3 with resource management functions enabled on ia64 box.
> > 
> > Steps to Reproduce:
> >    # mkdir /dev/cgroup
> >    # mount -t cgroup -ocpuset,cpu cpu /dev/cgroup
> >    # mkdir /dev/cgroup/{a,b}
> >    # echo 0 > /dev/cgroup/a/cpuset.cpus
> >    # echo 0 > /dev/cgroup/b/cpuset.cpus
> >    # echo 1 > /dev/cgroup/a/cpuset.mems
> >    # echo 1 > /dev/cgroup/b/cpuset.mems
> >    # echo $$ > /dev/cgroup/b/tasks
> >    # ./a.out & echo $! > /dev/cgroup/a/tasks (a.out is just a for-loop program)
> >      [Wait for several tens of seconds or a couple of minutes.]
> >    # ./a.out2 & echo $! > /dev/cgroup/a/tasks (a.out2 is just a for-loop program)
> >      [You will see a.out does not get run for a while by running top command.]
> > 
> > Additional Info:
> >    a.out2 needs to be started from the shell in cgroup b in order to
> >    reproduce the problem (unless the system is UP).  When a.out2 is
> >    started this way, its se->vruntime (or whatever se->vruntime is
> >    derived from) appears to be initialized to a small value compared to
> >    that of a.out, and the fair scheduler then runs only a.out2 until its
> >    se->vruntime catches up with that of a.out.
> 
> Seems the vruntime doesn't get re-set if you move tasks between groups.
> sched_move_task() should call place_entity() in the context of the new
> cfs_rq.

This should do I guess - uncompiled, untested

---
Subject: sched: fair-group: fixup tasks on group move

Tasks would retain their old vruntime when moved between groups; this
can cause funny lags.  Re-set the vruntime on group move so that it fits
within the new tree.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/sched.h |    4 ++++
 kernel/sched.c        |    3 +++
 kernel/sched_fair.c   |   14 ++++++++++++++
 3 files changed, 21 insertions(+)

Index: linux-2.6-2/include/linux/sched.h
===================================================================
--- linux-2.6-2.orig/include/linux/sched.h
+++ linux-2.6-2/include/linux/sched.h
@@ -899,6 +899,10 @@ struct sched_class {
 			     int running);
 	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
 			     int oldprio, int running);
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	void (*moved_group) (struct task_struct *p);
+#endif
 };
 
 struct load_weight {
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -7831,6 +7831,9 @@ void sched_move_task(struct task_struct 
 
 	set_task_rq(tsk, task_cpu(tsk));
 
+	if (tsk->sched_class->moved_group)
+		tsk->sched_class->moved_group(tsk);
+
 	if (on_rq) {
 		if (unlikely(running))
 			tsk->sched_class->set_curr_task(rq);
Index: linux-2.6-2/kernel/sched_fair.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_fair.c
+++ linux-2.6-2/kernel/sched_fair.c
@@ -1398,6 +1398,16 @@ static void set_curr_task_fair(struct rq
 		set_next_entity(cfs_rq_of(se), se);
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static void moved_group_fair(struct task_struct *p)
+{
+	struct cfs_rq = task_cfs_rq(tsk);
+
+	update_curr(cfs_rq);
+	place_entity(cfs_rq, &p->se, 1);
+}
+#endif
+
 /*
  * All the scheduling class methods:
  */
@@ -1426,6 +1436,10 @@ static const struct sched_class fair_sch
 
 	.prio_changed		= prio_changed_fair,
 	.switched_to		= switched_to_fair,
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	.moved_group		= moved_group_fair,
+#endif
 };
 
 #ifdef CONFIG_SCHED_DEBUG




* Re: A strange behavior of sched_fair
  2008-02-29 10:11   ` Peter Zijlstra
@ 2008-02-29 19:42     ` Ingo Molnar
  2008-02-29 20:21       ` Kei Tokunaga
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-02-29 19:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Kei Tokunaga, linux-kernel, Dhaval Giani, vatsa


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > Seems the vruntime doesn't get re-set if you move tasks between 
> > groups. sched_move_task() should call place_entity() in the context 
> > of the new cfs_rq.
> 
> This should do I guess - uncompiled, untested

if someone tests it and if it solves the problem i'll apply it to the 
scheduler tree.

	Ingo


* Re: A strange behavior of sched_fair
  2008-02-29 19:42     ` Ingo Molnar
@ 2008-02-29 20:21       ` Kei Tokunaga
  2008-02-29 20:32         ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Kei Tokunaga @ 2008-02-29 20:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Dhaval Giani, vatsa, Kei Tokunaga, Kei Tokunaga

Ingo Molnar wrote (2008/02/29 14:42):
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
>>> Seems the vruntime doesn't get re-set if you move tasks between 
>>> groups. sched_move_task() should call place_entity() in the context 
>>> of the new cfs_rq.
>> This should do I guess - uncompiled, untested
> 
> if someone tests it and if it solves the problem i'll apply it to the 
> scheduler tree.

Hi Ingo, Peter,

Thanks for the patch, Peter.  I modified it a bit to compile and
confirmed the issue did not occur on my ia64 box.  I am attaching
the revised patch below.

Thanks,
Kei


Subject: sched: fair-group: fixup tasks on group move

Tasks would retain their old vruntime when moved between groups; this
can cause funny lags.  Re-set the vruntime on group move so that it fits
within the new tree.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>


---

  linux-2.6.25-rc3-cgroup-kei/include/linux/sched.h |    4 ++++
  linux-2.6.25-rc3-cgroup-kei/kernel/sched.c        |    3 +++
  linux-2.6.25-rc3-cgroup-kei/kernel/sched_fair.c   |   14 ++++++++++++++
  3 files changed, 21 insertions(+)

diff -puN include/linux/sched.h~sched_fair include/linux/sched.h
--- linux-2.6.25-rc3-cgroup/include/linux/sched.h~sched_fair	2008-02-29 12:26:47.000000000 -0500
+++ linux-2.6.25-rc3-cgroup-kei/include/linux/sched.h	2008-02-29 12:26:47.000000000 -0500
@@ -898,6 +898,10 @@ struct sched_class {
  			     int running);
  	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
  			     int oldprio, int running);
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	void (*moved_group) (struct task_struct *p);
+#endif
  };

  struct load_weight {
diff -puN kernel/sched.c~sched_fair kernel/sched.c
--- linux-2.6.25-rc3-cgroup/kernel/sched.c~sched_fair	2008-02-29 12:26:47.000000000 -0500
+++ linux-2.6.25-rc3-cgroup-kei/kernel/sched.c	2008-02-29 12:26:47.000000000 -0500
@@ -7825,6 +7825,9 @@ void sched_move_task(struct task_struct

  	set_task_rq(tsk, task_cpu(tsk));

+	if (tsk->sched_class->moved_group)
+		tsk->sched_class->moved_group(tsk);
+
  	if (on_rq) {
  		if (unlikely(running))
  			tsk->sched_class->set_curr_task(rq);
diff -puN kernel/sched_fair.c~sched_fair kernel/sched_fair.c
--- linux-2.6.25-rc3-cgroup/kernel/sched_fair.c~sched_fair	2008-02-29 12:26:47.000000000 -0500
+++ linux-2.6.25-rc3-cgroup-kei/kernel/sched_fair.c	2008-02-29 13:49:15.000000000 -0500
@@ -1403,6 +1403,16 @@ static void set_curr_task_fair(struct rq
  		set_next_entity(cfs_rq_of(se), se);
  }

+#ifdef CONFIG_FAIR_GROUP_SCHED
+static void moved_group_fair(struct task_struct *p)
+{
+	struct cfs_rq *cfs_rq = task_cfs_rq(p);
+
+	update_curr(cfs_rq);
+	place_entity(cfs_rq, &p->se, 1);
+}
+#endif
+
  /*
   * All the scheduling class methods:
   */
@@ -1431,6 +1441,10 @@ static const struct sched_class fair_sch

  	.prio_changed		= prio_changed_fair,
  	.switched_to		= switched_to_fair,
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	.moved_group		= moved_group_fair,
+#endif
  };

  #ifdef CONFIG_SCHED_DEBUG

_





* Re: A strange behavior of sched_fair
  2008-02-29 20:21       ` Kei Tokunaga
@ 2008-02-29 20:32         ` Ingo Molnar
  2008-02-29 22:46           ` Kei Tokunaga
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-02-29 20:32 UTC (permalink / raw)
  To: Kei Tokunaga
  Cc: Peter Zijlstra, linux-kernel, Dhaval Giani, vatsa, Kei Tokunaga


* Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> wrote:

> Hi Ingo, Peter,
>
> Thanks for the patch, Peter.  I modified it a bit to compile and 
> confirmed the issue did not occur on my ia64 box.  I am attaching the 
> revised patch below.

thanks, queued it up. Could you send me your Tested-by (and Acked-by) 
lines?

	Ingo


* Re: A strange behavior of sched_fair
  2008-02-29 20:32         ` Ingo Molnar
@ 2008-02-29 22:46           ` Kei Tokunaga
  0 siblings, 0 replies; 8+ messages in thread
From: Kei Tokunaga @ 2008-02-29 22:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, Dhaval Giani, vatsa, Kei Tokunaga,
	Kei Tokunaga

Ingo Molnar wrote (2008/02/29 15:32):
> * Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> wrote:
> 
>> Hi Ingo, Peter,
>>
>> Thanks for the patch, Peter.  I modified it a bit to compile and 
>> confirmed the issue did not occur on my ia64 box.  I am attaching the 
>> revised patch below.
> 
> thanks, queued it up. Could you send me your Tested-by (and Acked-by) 
> lines?

Tested-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>

Thanks,
Kei



Thread overview: 8+ messages
2008-02-27 22:51 A strange behavior of sched_fair Kei Tokunaga
2008-02-29  7:08 ` Andrew Morton
2008-02-29  9:34 ` Peter Zijlstra
2008-02-29 10:11   ` Peter Zijlstra
2008-02-29 19:42     ` Ingo Molnar
2008-02-29 20:21       ` Kei Tokunaga
2008-02-29 20:32         ` Ingo Molnar
2008-02-29 22:46           ` Kei Tokunaga
