LKML Archive on lore.kernel.org
* 2.6.25-rc2-mm1 - boot hangs on ia64
@ 2008-02-25 15:56 Lee Schermerhorn
  2008-02-26 11:25 ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Lee Schermerhorn @ 2008-02-25 15:56 UTC (permalink / raw)
  To: linux-ia64, linux-kernel, Andrew Morton, Tony Luck, Ingo Molnar
  Cc: Bob Picco, Eric Whitney

2.6.25-rc2-mm1 is hanging early in boot on my HP ia64 NUMA platform.  I saw
the "Strange hang on ia64 with CONFIG_PRINTK_TIME=y" thread on lkml:

	http://marc.info/?t=120288396800001&r=1&w=4

However, my config does not include PRINTK_TIME=y.  In fact, the hang occurs
with ia64 defconfig as well--right after the "Loading...initrd...done"
message.  2.6.25-rc2 boots OK.

Bisecting the broken-out series appears to indict 'git-sched.patch'.  I
went ahead and added Ingo's patch, discussed in the "strange hang"
thread, even tho' I hadn't enabled printk timestamps.  No effect.

Anyone else seeing this?

Regards,
Lee
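
Bisecting a broken-out patch series, as described above, amounts to a
binary search over patch positions. The following is only a minimal
user-space sketch of that idea, not the procedure actually used in this
thread. boots_with_first_n() and FIRST_BAD_PATCH are hypothetical
stand-ins for "apply the first n patches, build, and boot"; the sketch
assumes the base kernel boots and the full series hangs.

```c
#include <stdbool.h>

/* Hypothetical: 1-based position of the patch that introduces the hang. */
enum { FIRST_BAD_PATCH = 11 };

/*
 * Hypothetical stand-in for "apply the first n patches of the series,
 * build the kernel and try to boot it". Returns true if that kernel boots.
 */
static bool boots_with_first_n(int n)
{
	return n < FIRST_BAD_PATCH;
}

/*
 * Bisect a series of num_patches patches. Assumes that the base kernel
 * (0 patches applied) boots and that the full series hangs. Returns the
 * 1-based position of the first patch whose addition breaks the boot.
 */
static int find_first_bad_patch(int num_patches)
{
	int good = 0;            /* largest prefix known to boot */
	int bad = num_patches;   /* smallest prefix known to hang */

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (boots_with_first_n(mid))
			good = mid;
		else
			bad = mid;
	}
	return bad;
}
```

Each iteration halves the remaining range, so a series of N patches needs
about log2(N) build-and-boot cycles rather than N.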



* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-25 15:56 2.6.25-rc2-mm1 - boot hangs on ia64 Lee Schermerhorn
@ 2008-02-26 11:25 ` KOSAKI Motohiro
  2008-02-26 11:31   ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2008-02-26 11:25 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: kosaki.motohiro, linux-ia64, linux-kernel, Andrew Morton,
	Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney

Hi

My Fujitsu machine can't boot either.
My bisect also points to git-sched.patch as the cause of the regression.

Thanks.

> 2.6.25-rc2-mm1 is hanging early in boot on my HP ia64 NUMA platform.  I saw
> the "Strange hang on ia64 with CONFIG_PRINTK_TIME=y" thread on lkml:
> 
> 	http://marc.info/?t=120288396800001&r=1&w=4
> 
> However, my config does not include PRINTK_TIME=y.  In fact, the hang occurs
> with ia64 defconfig as well--right after the "Loading...initrd...done"
> message.  2.6.25-rc2 boots OK.
> 
> Bisecting the broken-out series appears to indict 'git-sched.patch'.  I
> went ahead and added Ingo's patch, discussed in the "strange hang"
> thread, even tho' I hadn't enabled printk timestamps.  No effect.
> 
> Anyone else seeing this?





* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-26 11:25 ` KOSAKI Motohiro
@ 2008-02-26 11:31   ` Ingo Molnar
  2008-02-27  1:42     ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2008-02-26 11:31 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Lee Schermerhorn, linux-ia64, linux-kernel, Andrew Morton,
	Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney


* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> My Fujitsu machine can't boot either. My bisect also points to 
> git-sched.patch as the cause of the regression.

hm, that's a bit weird - nothing really should have broken it. Could you 
try to do a specific bisection of sched-devel.git:

   http://people.redhat.com/mingo/sched-devel.git/README

it's just a handful of commits so it should be relatively quick to 
figure out. My only guess would be:

  Subject: sched: make early bootup sched_clock() use safer

but i think this has been ruled out before ...

	Ingo


* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-26 11:31   ` Ingo Molnar
@ 2008-02-27  1:42     ` KOSAKI Motohiro
  2008-02-27  7:11       ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2008-02-27  1:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: kosaki.motohiro, Lee Schermerhorn, linux-ia64, linux-kernel,
	Andrew Morton, Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney

Hi Ingo

> > My Fujitsu machine can't boot either. My bisect also points to 
> > git-sched.patch as the cause of the regression.
> 
> hm, that's a bit weird - nothing really should have broken it. Could you 
> try to do a specific bisection of sched-devel.git:
> 
>    http://people.redhat.com/mingo/sched-devel.git/README

How do I find out which sched-devel.git revision corresponds to the
git-sched.patch in 2.6.25-rc2-mm1?
Should I bisect from the HEAD of sched-devel.git?

> it's just a handful of commits so it should be relatively quick to 
> figure out. My only guess would be:
> 
>   Subject: sched: make early bootup sched_clock() use safer
> 
> but i think this has been ruled out before ...

rc2-mm1 + that patch doesn't boot either.
It stops at the same point ;)





* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-27  1:42     ` KOSAKI Motohiro
@ 2008-02-27  7:11       ` Ingo Molnar
  2008-02-28 10:38         ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2008-02-27  7:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Lee Schermerhorn, linux-ia64, linux-kernel, Andrew Morton,
	Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney


* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > hm, that's a bit weird - nothing really should have broken it. Could you 
> > try to do a specific bisection of sched-devel.git:
> > 
> >    http://people.redhat.com/mingo/sched-devel.git/README
> 
> How do I find out which sched-devel.git revision corresponds to the 
> git-sched.patch in 2.6.25-rc2-mm1? Should I bisect from the HEAD of 
> sched-devel.git?

yeah, please. If it's caused by sched-devel.git then you should see the 
hang there too.

	Ingo


* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-27  7:11       ` Ingo Molnar
@ 2008-02-28 10:38         ` KOSAKI Motohiro
  2008-02-28 11:50           ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2008-02-28 10:38 UTC (permalink / raw)
  To: Ingo Molnar, Steven Rostedt
  Cc: kosaki.motohiro, Lee Schermerhorn, linux-ia64, linux-kernel,
	Andrew Morton, Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney,
	akpm

Hi Ingo,
CC'ed Steven Rostedt 

By bisecting, I found that the following patch causes the regression.

2.6.25-rc2-mm1:                            doesn't boot
2.6.25-rc2-mm1 + revert following patch:   works well

But I think this is very strange.
runqueue_is_locked() seems simple and doesn't appear to have a bug. ;)

What do you think about this problem?


-------------------------------------------------------------------
commit 033394c2c097215ac556a446154af24fbf18b064
Author: Steven Rostedt <srostedt@redhat.com>
Date:   Mon Feb 25 21:15:44 2008 +0100

    printk: dont wake up klogd with the rq locked

    It is not wise to place a printk where the runqueue lock is held.


> 
> * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > > hm, that's a bit weird - nothing really should have broken it. Could you 
> > > try to do a specific bisection of sched-devel.git:
> > > 
> > >    http://people.redhat.com/mingo/sched-devel.git/README
> > 
> > > How do I find out which sched-devel.git revision corresponds to the 
> > > git-sched.patch in 2.6.25-rc2-mm1? Should I bisect from the HEAD of 
> > > sched-devel.git?
> 
> yeah, please. If it's caused by sched-devel.git then you should see the 
> hang there too.




* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-28 10:38         ` KOSAKI Motohiro
@ 2008-02-28 11:50           ` Ingo Molnar
  2008-02-28 12:13             ` KOSAKI Motohiro
  2008-02-28 18:13             ` Andrew Morton
  0 siblings, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 2008-02-28 11:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Steven Rostedt, Lee Schermerhorn, linux-ia64, linux-kernel,
	Andrew Morton, Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney


* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Hi Ingo,
> CC'ed Steven Rostedt 
> 
> By bisecting, I found that the following patch causes the regression.
> 
> 2.6.25-rc2-mm1:                            doesn't boot
> 2.6.25-rc2-mm1 + revert following patch:   works well
> 
> But I think this is very strange. runqueue_is_locked() seems simple and 
> doesn't appear to have a bug. ;)
> 
> What do you think about this problem?

thanks for bisecting it down! Could ia64 have trouble accessing the 
percpu data structures of the scheduler?

does the patch below resolve the hang?

	Ingo

------------------------->
Subject: sched: fix wake_up_klogd()
From: Ingo Molnar <mingo@elte.hu>
Date: Thu Feb 28 12:42:45 CET 2008

on some platforms if we printk too early it might not be safe to call
into the scheduler data structures.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -948,7 +948,8 @@ int is_console_locked(void)
 
 void wake_up_klogd(void)
 {
-	if (!oops_in_progress && waitqueue_active(&log_wait))
+	if (!oops_in_progress && waitqueue_active(&log_wait) &&
+						!runqueue_is_locked())
 		wake_up_interruptible(&log_wait);
 }
 
@@ -1000,7 +1001,7 @@ void release_console_sem(void)
 	 * If we try to wake up klogd while printing with the runqueue lock
 	 * held, this will deadlock.
 	 */
-	if (wake_klogd && !runqueue_is_locked())
+	if (wake_klogd)
 		wake_up_klogd();
 }
 EXPORT_SYMBOL(release_console_sem);
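
The effect of the patch above is to move the runqueue check out of
release_console_sem() and into wake_up_klogd(), so the check applies
wherever wake_up_klogd() is called from. The following is a minimal
user-space model of the patched conditions only, not kernel code. The
struct klogd_sim fields are hypothetical stand-ins for oops_in_progress,
waitqueue_active(&log_wait) and runqueue_is_locked(), and the sim_*
functions are illustrative names that return whether a wakeup would be
issued.

```c
#include <stdbool.h>

/* Simulated kernel state; every field is a stand-in, not a real API. */
struct klogd_sim {
	bool oops_in_progress;   /* models the oops_in_progress flag */
	bool waiter_present;     /* models waitqueue_active(&log_wait) */
	bool runqueue_locked;    /* models runqueue_is_locked() */
};

/*
 * Models the patched wake_up_klogd(): wake klogd only if no oops is in
 * progress, someone is waiting on the log, and the runqueue is not
 * reported as locked. Returns true if the wakeup would be issued.
 */
static bool sim_wake_up_klogd(const struct klogd_sim *s)
{
	return !s->oops_in_progress && s->waiter_present &&
	       !s->runqueue_locked;
}

/*
 * Models the tail of the patched release_console_sem(): the runqueue
 * check is no longer made here, it is left to sim_wake_up_klogd().
 */
static bool sim_release_console_sem(bool wake_klogd,
				    const struct klogd_sim *s)
{
	if (wake_klogd)
		return sim_wake_up_klogd(s);
	return false;
}
```

In this model a direct call to sim_wake_up_klogd() with runqueue_locked
set returns false, which is the case the pre-patch code did not cover.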


* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-28 11:50           ` Ingo Molnar
@ 2008-02-28 12:13             ` KOSAKI Motohiro
  2008-02-28 18:13             ` Andrew Morton
  1 sibling, 0 replies; 12+ messages in thread
From: KOSAKI Motohiro @ 2008-02-28 12:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: kosaki.motohiro, Steven Rostedt, Lee Schermerhorn, linux-ia64,
	linux-kernel, Andrew Morton, Tony Luck, Ingo Molnar, Bob Picco,
	Eric Whitney

Hi Ingo,

> thanks for bisecting it down! Could ia64 have trouble accessing the 
> percpu data structures of the scheduler?
> 
> does the patch below resolve the hang?

Thanks!
That patch works well in my test environment.

Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>


BTW: Your work is extremely fast. It's wonderful :)






* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-28 11:50           ` Ingo Molnar
  2008-02-28 12:13             ` KOSAKI Motohiro
@ 2008-02-28 18:13             ` Andrew Morton
  2008-02-28 19:12               ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2008-02-28 18:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: KOSAKI Motohiro, Steven Rostedt, Lee Schermerhorn, linux-ia64,
	linux-kernel, Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney

On Thu, 28 Feb 2008 12:50:41 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> @@ -1000,7 +1001,7 @@ void release_console_sem(void)
>  	 * If we try to wake up klogd while printing with the runqueue lock
>  	 * held, this will deadlock.
>  	 */
> -	if (wake_klogd && !runqueue_is_locked())
> +	if (wake_klogd)
>  		wake_up_klogd();
>  }

I don't think we should have added that hack in the first place.  It solves a
problem which about three developers hit four times in five years but it
has made kernel logging less reliable for everyone.



* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-28 18:13             ` Andrew Morton
@ 2008-02-28 19:12               ` Ingo Molnar
  2008-02-28 19:24                 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2008-02-28 19:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Steven Rostedt, Lee Schermerhorn, linux-ia64,
	linux-kernel, Tony Luck, Ingo Molnar, Bob Picco, Eric Whitney


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 28 Feb 2008 12:50:41 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> 
> > @@ -1000,7 +1001,7 @@ void release_console_sem(void)
> >  	 * If we try to wake up klogd while printing with the runqueue lock
> >  	 * held, this will deadlock.
> >  	 */
> > -	if (wake_klogd && !runqueue_is_locked())
> > +	if (wake_klogd)
> >  		wake_up_klogd();
> >  }
> 
> I don't think we should have added that hack in the first place.  It 
> solves a problem which about three developers hit four times in five 
> years but it has made kernel logging less reliable for everyone.

well, the problem was ia64, not a problem on x86 or other platforms. The 
problem here is ia64 not setting up percpu data structures soon enough. 
It has blown up in the past in other areas, and it will likely blow up 
in the future in other areas as well. It's just not robust to have init 
dependencies like that on data structures as basic as percpu areas.

	Ingo


* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-28 19:12               ` Ingo Molnar
@ 2008-02-28 19:24                 ` Andrew Morton
  2008-02-28 19:32                   ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2008-02-28 19:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: kosaki.motohiro, srostedt, Lee.Schermerhorn, linux-ia64,
	linux-kernel, tony.luck, mingo, bob.picco, eric.whitney

On Thu, 28 Feb 2008 20:12:14 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Thu, 28 Feb 2008 12:50:41 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > @@ -1000,7 +1001,7 @@ void release_console_sem(void)
> > >  	 * If we try to wake up klogd while printing with the runqueue lock
> > >  	 * held, this will deadlock.
> > >  	 */
> > > -	if (wake_klogd && !runqueue_is_locked())
> > > +	if (wake_klogd)
> > >  		wake_up_klogd();
> > >  }
> > 
> > I don't think we should have added that hack in the first place.  It 
> > solves a problem which about three developers hit four times in five 
> > years but it has made kernel logging less reliable for everyone.
> 
> well, the problem was ia64, not a problem on x86 or other platforms.

I am referring to the original change which made klogd wakeups unreliable.


* Re: 2.6.25-rc2-mm1 - boot hangs on ia64
  2008-02-28 19:24                 ` Andrew Morton
@ 2008-02-28 19:32                   ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2008-02-28 19:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, srostedt, Lee.Schermerhorn, linux-ia64,
	linux-kernel, tony.luck, mingo, bob.picco, eric.whitney


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > > I don't think we should have added that hack in the first place.  
> > > It solves a problem which about three developers hit four times in 
> > > five years but it has made kernel logging less reliable for 
> > > everyone.
> > 
> > well, the problem was ia64, not a problem on x86 or other platforms.
> 
> I am referring to the original change which made klogd wakeups 
> unreliable.

oh, indeed - agreed - i missed the fact that is_locked check is sporadic 
and can cause other CPUs to prevent the wakeup of klogd. I've zapped the 
change.

	Ingo
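
The point about the check being sporadic can be illustrated with a small
user-space model. It assumes that runqueue_is_locked() only reports
whether the current CPU's runqueue lock is held by anyone, without
knowing which CPU holds it; the thread does not show the function body,
so this is an assumption. struct sim_rq, its owner_cpu field and the
sim_* functions are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>

/*
 * Simulated runqueue lock. owner_cpu is tracked here only so the sketch
 * can tell the two cases apart; the modelled check never looks at it.
 */
struct sim_rq {
	bool locked;
	int owner_cpu;   /* meaningful only while locked is true */
};

/* Models an owner-agnostic "is this lock held by anyone?" query. */
static bool sim_runqueue_is_locked(const struct sim_rq *rq)
{
	return rq->locked;
}

/* Returns true if the printing CPU would go ahead and wake klogd. */
static bool sim_would_wake_klogd(const struct sim_rq *this_rq)
{
	return !sim_runqueue_is_locked(this_rq);
}

/*
 * Returns true if the wakeup is skipped even though the printing CPU
 * does not hold the lock itself, i.e. another CPU happened to hold this
 * runqueue's lock at the moment of the check.
 */
static bool sim_wakeup_skipped_needlessly(const struct sim_rq *this_rq,
					  int printing_cpu)
{
	return this_rq->locked && this_rq->owner_cpu != printing_cpu;
}
```

With the lock held by CPU 1 while CPU 0 prints, sim_would_wake_klogd()
returns false and sim_wakeup_skipped_needlessly() returns true: the
wakeup is lost although CPU 0 was never at risk of deadlocking on its own
lock.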

