LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] sched: fix a bug in sched domain degenerate
@ 2008-11-06  1:45 Li Zefan
  2008-11-06  7:06 ` Ingo Molnar
  2008-11-08  6:55 ` Li Zefan
  0 siblings, 2 replies; 4+ messages in thread
From: Li Zefan @ 2008-11-06  1:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, LKML, suresh.b.siddha

(1) on i386 with SCHED_SMT and SCHED_MC enabled
	# mount -t cgroup -o cpuset xxx /mnt
	# echo 0 > /mnt/cpuset.sched_load_balance
	# mkdir /mnt/0
	# echo 0 > /mnt/0/cpuset.cpus
	# dmesg
	CPU0 attaching sched-domain:
	 domain 0: span 0 level CPU
	  groups: 0

(2) on i386 with SCHED_MC enabled but SCHED_SMT disabled
	# same with (1)
	# dmesg
	CPU0 attaching NULL sched-domain.

The bug is some sched domains may be skipped unintentionally when
doing sched domain degenerating.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/sched.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index dee79ad..b13f45a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6875,15 +6875,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 	struct sched_domain *tmp;
 
 	/* Remove the sched domains which do not contribute to scheduling. */
-	for (tmp = sd; tmp; tmp = tmp->parent) {
+	for (tmp = sd; tmp; ) {
 		struct sched_domain *parent = tmp->parent;
 		if (!parent)
 			break;
+
 		if (sd_parent_degenerate(tmp, parent)) {
 			tmp->parent = parent->parent;
 			if (parent->parent)
 				parent->parent->child = tmp;
-		}
+		} else
+			tmp = tmp->parent;
 	}
 
 	if (sd && sd_degenerate(sd)) {
-- 
1.5.4.rc3


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched: fix a bug in sched domain degenerate
  2008-11-06  1:45 [PATCH] sched: fix a bug in sched domain degenerate Li Zefan
@ 2008-11-06  7:06 ` Ingo Molnar
  2008-11-08  6:55 ` Li Zefan
  1 sibling, 0 replies; 4+ messages in thread
From: Ingo Molnar @ 2008-11-06  7:06 UTC (permalink / raw)
  To: Li Zefan; +Cc: Peter Zijlstra, LKML, suresh.b.siddha, Mike Galbraith


* Li Zefan <lizf@cn.fujitsu.com> wrote:

> (1) on i386 with SCHED_SMT and SCHED_MC enabled
> 	# mount -t cgroup -o cpuset xxx /mnt
> 	# echo 0 > /mnt/cpuset.sched_load_balance
> 	# mkdir /mnt/0
> 	# echo 0 > /mnt/0/cpuset.cpus
> 	# dmesg
> 	CPU0 attaching sched-domain:
> 	 domain 0: span 0 level CPU
> 	  groups: 0
> 
> (2) on i386 with SCHED_MC enabled but SCHED_SMT disabled
> 	# same with (1)
> 	# dmesg
> 	CPU0 attaching NULL sched-domain.
> 
> The bug is some sched domains may be skipped unintentionally when
> doing sched domain degenerating.
> 
> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
> ---
>  kernel/sched.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)

applied to tip/sched/urgent, thanks!

	Ingo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched: fix a bug in sched domain degenerate
  2008-11-06  1:45 [PATCH] sched: fix a bug in sched domain degenerate Li Zefan
  2008-11-06  7:06 ` Ingo Molnar
@ 2008-11-08  6:55 ` Li Zefan
  2008-11-08  8:32   ` Ingo Molnar
  1 sibling, 1 reply; 4+ messages in thread
From: Li Zefan @ 2008-11-08  6:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, LKML, suresh.b.siddha

Hi Ingo,

I just read the modified changelog in the git-log, and it is
wrong (or maybe my fix is wrong?), I should have explained
the bug clearer. :(

I'm writing this mail to confirm if my thought and fix is
right or not.

> commit f29c9b1ccb52904ee442a933cf3dee628f9f4e62
> Author: Li Zefan <lizf@cn.fujitsu.com>
> Date:   Thu Nov 6 09:45:16 2008 +0800
> 
>     sched: fix a bug in sched domain degenerate
>     
>     Impact: re-add incorrectly eliminated sched domain layers
>     

This statement is wrong..

>     (1) on i386 with SCHED_SMT and SCHED_MC enabled
>     	# mount -t cgroup -o cpuset xxx /mnt
>     	# echo 0 > /mnt/cpuset.sched_load_balance
>     	# mkdir /mnt/0
>     	# echo 0 > /mnt/0/cpuset.cpus
>     	# dmesg
>     	CPU0 attaching sched-domain:
>     	 domain 0: span 0 level CPU
>     	  groups: 0
>     

I think this behavior is wrong.

>     (2) on i386 with SCHED_MC enabled but SCHED_SMT disabled
>     	# same with (1)
>     	# dmesg
>     	CPU0 attaching NULL sched-domain.
>     

And this is right. CPU domain has only 1 cpu so it does not contribute
to scheduling, so it can be removed.

>     The bug is that some sched domains may be skipped unintentionally when
>     degenerating (optimizing) sched domains.
>     

The bug is, some sched domains won't be checked in the for loop due
to the bug, so they have no chance to be removed.

In the for loop, we check if the parents domains can be removed:

cur_ptr
 |
 v
SMT--->MC--->CPU--->NULL

(parent MC is checked and can be removed)

=>

       cur_ptr
        |
        v
SMT--->CPU--->NULL

(break out of the for loop, because cur_ptr->parent == NULL)

so CPU domain won't be checked. When we delete MC domain, the pointer
should not move forwards, so the fix is:

cur_ptr
 |
 v
SMT--->CPU--->NULL

>     Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
>     Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 82cc839..4c7e2bc 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -6877,15 +6877,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>  	struct sched_domain *tmp;
>  
>  	/* Remove the sched domains which do not contribute to scheduling. */
> -	for (tmp = sd; tmp; tmp = tmp->parent) {
> +	for (tmp = sd; tmp; ) {
>  		struct sched_domain *parent = tmp->parent;
>  		if (!parent)
>  			break;
> +
>  		if (sd_parent_degenerate(tmp, parent)) {
>  			tmp->parent = parent->parent;
>  			if (parent->parent)
>  				parent->parent->child = tmp;
> -		}
> +		} else
> +			tmp = tmp->parent;
>  	}
>  
>  	if (sd && sd_degenerate(sd)) {



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched: fix a bug in sched domain degenerate
  2008-11-08  6:55 ` Li Zefan
@ 2008-11-08  8:32   ` Ingo Molnar
  0 siblings, 0 replies; 4+ messages in thread
From: Ingo Molnar @ 2008-11-08  8:32 UTC (permalink / raw)
  To: Li Zefan; +Cc: Peter Zijlstra, LKML, suresh.b.siddha


* Li Zefan <lizf@cn.fujitsu.com> wrote:

> Hi Ingo,
> 
> I just read the modified changelog in the git-log, and it is
> wrong (or maybe my fix is wrong?), I should have explained
> the bug clearer. :(
> 
> I'm writing this mail to confirm if my thought and fix is
> right or not.
> 
> > commit f29c9b1ccb52904ee442a933cf3dee628f9f4e62
> > Author: Li Zefan <lizf@cn.fujitsu.com>
> > Date:   Thu Nov 6 09:45:16 2008 +0800
> > 
> >     sched: fix a bug in sched domain degenerate
> >     
> >     Impact: re-add incorrectly eliminated sched domain layers
> >     
> 
> This statement is wrong..

that's OK, because the patch is correct :-)

> >     (1) on i386 with SCHED_SMT and SCHED_MC enabled
> >     	# mount -t cgroup -o cpuset xxx /mnt
> >     	# echo 0 > /mnt/cpuset.sched_load_balance
> >     	# mkdir /mnt/0
> >     	# echo 0 > /mnt/0/cpuset.cpus
> >     	# dmesg
> >     	CPU0 attaching sched-domain:
> >     	 domain 0: span 0 level CPU
> >     	  groups: 0
> >     
> 
> I think this behavior is wrong.
> 
> >     (2) on i386 with SCHED_MC enabled but SCHED_SMT disabled
> >     	# same with (1)
> >     	# dmesg
> >     	CPU0 attaching NULL sched-domain.
> >     
> 
> And this is right. CPU domain has only 1 cpu so it does not contribute
> to scheduling, so it can be removed.
> 
> >     The bug is that some sched domains may be skipped unintentionally when
> >     degenerating (optimizing) sched domains.
> >     
> 
> The bug is, some sched domains won't be checked in the for loop due
> to the bug, so they have no chance to be removed.
> 
> In the for loop, we check if the parents domains can be removed:
> 
> cur_ptr
>  |
>  v
> SMT--->MC--->CPU--->NULL
> 
> (parent MC is checked and can be removed)
> 
> =>
> 
>        cur_ptr
>         |
>         v
> SMT--->CPU--->NULL
> 
> (break out of the for loop, because cur_ptr->parent == NULL)
> 
> so CPU domain won't be checked. When we delete MC domain, the pointer
> should not move forwards, so the fix is:
> 
> cur_ptr
>  |
>  v
> SMT--->CPU--->NULL

ah, ok - i misunderstood the direction of the fix. So it strengthens 
degeneration - which is a valid fix too. And the commit message 
remains there to shame my reading skills forever ;-)

	Ingo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-11-08  8:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-06  1:45 [PATCH] sched: fix a bug in sched domain degenerate Li Zefan
2008-11-06  7:06 ` Ingo Molnar
2008-11-08  6:55 ` Li Zefan
2008-11-08  8:32   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).