LKML Archive on lore.kernel.org
* preempt rcu bug on s390
@ 2008-02-09 11:34 Heiko Carstens
  2008-02-09 14:07 ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Heiko Carstens @ 2008-02-09 11:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
	Martin Schwidefsky, linux-kernel

Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
gets stuck when running with more than one cpu.
When booting with four cpus I get all four cpus caught within cpu_idle
and not advancing anymore. However there is the init process which is
waiting for synchronize_rcu() to complete (lcrash output):

STACK TRACE FOR TASK: 0xf84d968 (swapper)

 STACK:
 0 schedule+842 [0x36c956]
 1 schedule_timeout+172 [0x36d0e4]
 2 wait_for_common+204 [0x36c398]
 3 synchronize_rcu+76 [0x567bc]
 4 netlink_change_ngroups+150 [0x2b4302]
 5 genl_register_mc_group+256 [0x2b6174]
 6 genl_init+188 [0x534e44]
 7 kernel_init+444 [0x518334]
 8 kernel_thread_starter+6 [0x192a6]

If I change the code so that timer ticks won't be disabled everything
runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
thing for the rcu preemptible case.

Kernel version is git head of today.

Any ideas?


* Re: preempt rcu bug on s390
  2008-02-09 11:34 preempt rcu bug on s390 Heiko Carstens
@ 2008-02-09 14:07 ` Paul E. McKenney
  2008-02-09 17:14   ` Heiko Carstens
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2008-02-09 14:07 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
	Martin Schwidefsky, linux-kernel

On Sat, Feb 09, 2008 at 12:34:35PM +0100, Heiko Carstens wrote:
> Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> gets stuck when running with more than one cpu.
> When booting with four cpus I get all four cpus caught within cpu_idle
> and not advancing anymore. However there is the init process which is
> waiting for synchronize_rcu() to complete (lcrash output):
> 
> STACK TRACE FOR TASK: 0xf84d968 (swapper)
> 
>  STACK:
>  0 schedule+842 [0x36c956]
>  1 schedule_timeout+172 [0x36d0e4]
>  2 wait_for_common+204 [0x36c398]
>  3 synchronize_rcu+76 [0x567bc]
>  4 netlink_change_ngroups+150 [0x2b4302]
>  5 genl_register_mc_group+256 [0x2b6174]
>  6 genl_init+188 [0x534e44]
>  7 kernel_init+444 [0x518334]
>  8 kernel_thread_starter+6 [0x192a6]
> 
> If I change the code so that timer ticks won't be disabled everything
> runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> thing for the rcu preemptible case.
> 
> Kernel version is git head of today.
> 
> Any ideas?

Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?

If not, could you please check it out?

						Thanx, Paul


* Re: preempt rcu bug on s390
  2008-02-09 14:07 ` Paul E. McKenney
@ 2008-02-09 17:14   ` Heiko Carstens
  2008-02-09 22:02     ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Heiko Carstens @ 2008-02-09 17:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
	Martin Schwidefsky, linux-kernel

On Sat, Feb 09, 2008 at 06:07:11AM -0800, Paul E. McKenney wrote:
> On Sat, Feb 09, 2008 at 12:34:35PM +0100, Heiko Carstens wrote:
> > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > gets stuck when running with more than one cpu.
> > When booting with four cpus I get all four cpus caught within cpu_idle
> > and not advancing anymore. However there is the init process which is
> > waiting for synchronize_rcu() to complete (lcrash output):
> > 
> > STACK TRACE FOR TASK: 0xf84d968 (swapper)
> > 
> >  STACK:
> >  0 schedule+842 [0x36c956]
> >  1 schedule_timeout+172 [0x36d0e4]
> >  2 wait_for_common+204 [0x36c398]
> >  3 synchronize_rcu+76 [0x567bc]
> >  4 netlink_change_ngroups+150 [0x2b4302]
> >  5 genl_register_mc_group+256 [0x2b6174]
> >  6 genl_init+188 [0x534e44]
> >  7 kernel_init+444 [0x518334]
> >  8 kernel_thread_starter+6 [0x192a6]
> > 
> > If I change the code so that timer ticks won't be disabled everything
> > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > thing for the rcu preemptible case.
> > 
> > Kernel version is git head of today.
> > 
> > Any ideas?
> 
> Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> 
> If not, could you please check it out?

It's not applied, however it doesn't change anything. Also the patch
is tied to the dynticks implementation which is different from
s390's nohz implementation.
I had to add the patch below so it would make at least some sense.
But it doesn't fix the problem.

---
 arch/s390/kernel/time.c |    2 ++
 include/linux/hardirq.h |    2 +-
 kernel/rcupreempt.c     |    2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/rcupreempt.c
===================================================================
--- linux-2.6.orig/kernel/rcupreempt.c
+++ linux-2.6/kernel/rcupreempt.c
@@ -413,7 +413,7 @@ static void __rcu_advance_callbacks(stru
 	}
 }
 
-#ifdef CONFIG_NO_HZ
+#if defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ)
 
 DEFINE_PER_CPU(long, dynticks_progress_counter) = 1;
 static DEFINE_PER_CPU(long, rcu_dyntick_snapshot);
Index: linux-2.6/arch/s390/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/s390/kernel/time.c
+++ linux-2.6/arch/s390/kernel/time.c
@@ -200,6 +200,7 @@ static void stop_hz_timer(void)
 		if (timer >= jiffies_timer_cc)
 			todval = timer;
 	}
+	rcu_enter_nohz();
 	set_clock_comparator(todval);
 }
 
@@ -213,6 +214,7 @@ static void start_hz_timer(void)
 
 	if (!cpu_isset(smp_processor_id(), nohz_cpu_mask))
 		return;
+	rcu_exit_nohz();
 	account_ticks(get_clock());
 	set_clock_comparator(S390_lowcore.jiffy_timer + CPU_DEVIATION);
 	cpu_clear(smp_processor_id(), nohz_cpu_mask);
Index: linux-2.6/include/linux/hardirq.h
===================================================================
--- linux-2.6.orig/include/linux/hardirq.h
+++ linux-2.6/include/linux/hardirq.h
@@ -109,7 +109,7 @@ static inline void account_system_vtime(
 }
 #endif
 
-#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
+#if defined(CONFIG_PREEMPT_RCU) && (defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ))
 extern void rcu_irq_enter(void);
 extern void rcu_irq_exit(void);
 #else


* Re: preempt rcu bug on s390
  2008-02-09 17:14   ` Heiko Carstens
@ 2008-02-09 22:02     ` Paul E. McKenney
  2008-02-10 13:01       ` Heiko Carstens
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2008-02-09 22:02 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
	Martin Schwidefsky, linux-kernel

On Sat, Feb 09, 2008 at 06:14:51PM +0100, Heiko Carstens wrote:
> On Sat, Feb 09, 2008 at 06:07:11AM -0800, Paul E. McKenney wrote:
> > On Sat, Feb 09, 2008 at 12:34:35PM +0100, Heiko Carstens wrote:
> > > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > > gets stuck when running with more than one cpu.
> > > When booting with four cpus I get all four cpus caught within cpu_idle
> > > and not advancing anymore. However there is the init process which is
> > > waiting for synchronize_rcu() to complete (lcrash output):
> > > 
> > > STACK TRACE FOR TASK: 0xf84d968 (swapper)
> > > 
> > >  STACK:
> > >  0 schedule+842 [0x36c956]
> > >  1 schedule_timeout+172 [0x36d0e4]
> > >  2 wait_for_common+204 [0x36c398]
> > >  3 synchronize_rcu+76 [0x567bc]
> > >  4 netlink_change_ngroups+150 [0x2b4302]
> > >  5 genl_register_mc_group+256 [0x2b6174]
> > >  6 genl_init+188 [0x534e44]
> > >  7 kernel_init+444 [0x518334]
> > >  8 kernel_thread_starter+6 [0x192a6]
> > > 
> > > If I change the code so that timer ticks won't be disabled everything
> > > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > > thing for the rcu preemptible case.
> > > 
> > > Kernel version is git head of today.
> > > 
> > > Any ideas?
> > 
> > Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> > 
> > If not, could you please check it out?
> 
> It's not applied, however it doesn't change anything. Also the patch
> is tied to the dynticks implementation which is different from
> s390's nohz implementation.
> I had to add the patch below so it would make at least some sense.
> But it doesn't fix the problem.

OK, I was afraid of that.  ;-)

Does s390 start out in nohz mode?  The reason I ask is that it feels like
an off-by-one error for the dynticks_progress_counter.

							Thanx, Paul

> ---
>  arch/s390/kernel/time.c |    2 ++
>  include/linux/hardirq.h |    2 +-
>  kernel/rcupreempt.c     |    2 +-
>  3 files changed, 4 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/kernel/rcupreempt.c
> ===================================================================
> --- linux-2.6.orig/kernel/rcupreempt.c
> +++ linux-2.6/kernel/rcupreempt.c
> @@ -413,7 +413,7 @@ static void __rcu_advance_callbacks(stru
>  	}
>  }
> 
> -#ifdef CONFIG_NO_HZ
> +#if defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ)
> 
>  DEFINE_PER_CPU(long, dynticks_progress_counter) = 1;
>  static DEFINE_PER_CPU(long, rcu_dyntick_snapshot);
> Index: linux-2.6/arch/s390/kernel/time.c
> ===================================================================
> --- linux-2.6.orig/arch/s390/kernel/time.c
> +++ linux-2.6/arch/s390/kernel/time.c
> @@ -200,6 +200,7 @@ static void stop_hz_timer(void)
>  		if (timer >= jiffies_timer_cc)
>  			todval = timer;
>  	}
> +	rcu_enter_nohz();
>  	set_clock_comparator(todval);
>  }
> 
> @@ -213,6 +214,7 @@ static void start_hz_timer(void)
> 
>  	if (!cpu_isset(smp_processor_id(), nohz_cpu_mask))
>  		return;
> +	rcu_exit_nohz();
>  	account_ticks(get_clock());
>  	set_clock_comparator(S390_lowcore.jiffy_timer + CPU_DEVIATION);
>  	cpu_clear(smp_processor_id(), nohz_cpu_mask);
> Index: linux-2.6/include/linux/hardirq.h
> ===================================================================
> --- linux-2.6.orig/include/linux/hardirq.h
> +++ linux-2.6/include/linux/hardirq.h
> @@ -109,7 +109,7 @@ static inline void account_system_vtime(
>  }
>  #endif
> 
> -#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
> +#if defined(CONFIG_PREEMPT_RCU) && (defined(CONFIG_NO_HZ) || defined(CONFIG_NO_IDLE_HZ))
>  extern void rcu_irq_enter(void);
>  extern void rcu_irq_exit(void);
>  #else


* Re: preempt rcu bug on s390
  2008-02-09 22:02     ` Paul E. McKenney
@ 2008-02-10 13:01       ` Heiko Carstens
  2008-02-10 17:43         ` Paul E. McKenney
  2008-02-11 15:37         ` Steven Rostedt
  0 siblings, 2 replies; 7+ messages in thread
From: Heiko Carstens @ 2008-02-10 13:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
	Martin Schwidefsky, linux-kernel

> > > > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > > > gets stuck when running with more than one cpu.
> > > > When booting with four cpus I get all four cpus caught within cpu_idle
> > > > and not advancing anymore. However there is the init process which is
> > > > waiting for synchronize_rcu() to complete (lcrash output):
> > > > 
> > > > If I change the code so that timer ticks won't be disabled everything
> > > > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > > > thing for the rcu preemptible case.
> > > > 
> > > > Kernel version is git head of today.
> > > > 
> > > > Any ideas?
> > > 
> > > Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> > > 
> > > If not, could you please check it out?
> > 
> > It's not applied, however it doesn't change anything. Also the patch
> > is tied to the dynticks implementation which is different from
> > s390's nohz implementation.
> > I had to add the patch below so it would make at least some sense.
> > But it doesn't fix the problem.
> 
> OK, I was afraid of that.  ;-)
> 
> Does s390 start out in nohz mode?  The reason I ask is that it feels like
> an off-by-one error for the dynticks_progress_counter.

Actually I forgot to add a few ifdefs to make the code do something :)
That just reveals that we have a conflict with the dynticks implementation
and s390's nohz that shows up in what rcu_irq_enter/exit assume.
I didn't patch s390 and common code so it will work, but I think the
patch you mentioned will fix the problem I reported.
So I guess we should either convert s390 to use the generic dynticks
implementation or disable preemptible rcu on s390 until we have converted
our code.

Thanks for helping to debug this!


* Re: preempt rcu bug on s390
  2008-02-10 13:01       ` Heiko Carstens
@ 2008-02-10 17:43         ` Paul E. McKenney
  2008-02-11 15:37         ` Steven Rostedt
  1 sibling, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2008-02-10 17:43 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Gautham R Shenoy, Dipankar Sarma, Steven Rostedt, Ingo Molnar,
	Martin Schwidefsky, linux-kernel

On Sun, Feb 10, 2008 at 02:01:50PM +0100, Heiko Carstens wrote:
> > > > > Using CONFIG_PREEMPT_RCU and CONFIG_NO_IDLE_HZ on s390 my system always
> > > > > gets stuck when running with more than one cpu.
> > > > > When booting with four cpus I get all four cpus caught within cpu_idle
> > > > > and not advancing anymore. However there is the init process which is
> > > > > waiting for synchronize_rcu() to complete (lcrash output):
> > > > > 
> > > > > If I change the code so that timer ticks won't be disabled everything
> > > > > runs fine. So my guess is that rcu_needs_cpu() doesn't do the right
> > > > > thing for the rcu preemptible case.
> > > > > 
> > > > > Kernel version is git head of today.
> > > > > 
> > > > > Any ideas?
> > > > 
> > > > Does this tree have http://lkml.org/lkml/2008/1/29/208 applied?
> > > > 
> > > > If not, could you please check it out?
> > > 
> > > It's not applied, however it doesn't change anything. Also the patch
> > > is tied to the dynticks implementation which is different from
> > > s390's nohz implementation.
> > > I had to add the patch below so it would make at least some sense.
> > > But it doesn't fix the problem.
> > 
> > OK, I was afraid of that.  ;-)
> > 
> > Does s390 start out in nohz mode?  The reason I ask is that it feels like
> > an off-by-one error for the dynticks_progress_counter.
> 
> Actually I forgot to add a few ifdefs to make the code do something :)
> That just reveals that we have a conflict with the dynticks implementation
> and s390's nohz that shows up in what rcu_irq_enter/exit assume.
> I didn't patch s390 and common code so it will work, but I think the
> patch you mentioned will fix the problem I reported.
> So I guess we should either convert s390 to use the generic dynticks
> implementation or disable preemptible rcu on s390 until we have converted
> our code.

Sounds good to me!!!  (Especially converting s390 to generic algorithm.)

I believe that the generic implementation will do what you need, but
I am sure you will let me know of any problems that arise.

> Thanks for helping to debug this!

Thank you for tracking it down!

							Thanx, Paul


* Re: preempt rcu bug on s390
  2008-02-10 13:01       ` Heiko Carstens
  2008-02-10 17:43         ` Paul E. McKenney
@ 2008-02-11 15:37         ` Steven Rostedt
  1 sibling, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2008-02-11 15:37 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Paul E. McKenney, Gautham R Shenoy, Dipankar Sarma, Ingo Molnar,
	Martin Schwidefsky, linux-kernel

Heiko Carstens wrote:

>> Does s390 start out in nohz mode?  The reason I ask is that it feels like
>> an off-by-one error for the dynticks_progress_counter.
> 
> Actually I forgot to add a few ifdefs to make the code do something :)
> That just reveals that we have a conflict with the dynticks implementation
> and s390's nohz that shows up in what rcu_irq_enter/exit assume.
> I didn't patch s390 and common code so it will work, but I think the
> patch you mentioned will fix the problem I reported.
> So I guess we should either convert s390 to use the generic dynticks
> implementation or disable preemptible rcu on s390 until we have converted
> our code.
> 
> Thanks for helping to debug this!

Heiko, thanks for reporting this.

This patch still didn't make it into -rc1, and it really should, because
without it PREEMPT_RCU and NO_HZ together are broken on all boxes.

The patch is in Ingo's sched-devel git tree, as 
9460545f81ea48b07dbb20456a8ede776d8ebc1b (last I checked) and titled:

     rcu: add support for dynamic ticks and preempt rcu


-- Steve
