LKML Archive on lore.kernel.org
* Strange hang on ia64 with CONFIG_PRINTK_TIME=y
@ 2008-02-13  6:24 Roland Dreier
  2008-02-13  7:01 ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Roland Dreier @ 2008-02-13  6:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

I'm seeing a strange hang with current git (head 96b5a46e) on an ia64
box -- an Intel SDV with 2 dual core hyperthreaded Itanium 2 CPUs (so
8 logical CPUs to the kernel).  It hangs without printing anything
("Uncompressing Linux... done" from ELILO is the last thing I see) if
I have CONFIG_PRINTK_TIME=y; it works fine with CONFIG_PRINTK_TIME=n.

The really strange thing is that I have bisected this down to 326e96b9
("printk: revert ktime_get() timestamps"), and verified that if I revert
this one patch on top of my current git tree, then the kernel boots
fine with CONFIG_PRINTK_TIME=y.  The strange thing is that I have also
checked that the real v2.6.24 kernel boots fine on this system, and as
far as I can tell, 2.6.24 didn't have the commit that 326e96b9 reverts
(19ef9309), so there is some interaction with another patch that made
19ef9309 necessary on my system.

Any good idea how to debug this, given that the broken kernels don't
give any output at all?

Thanks,
  Roland


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-13  6:24 Strange hang on ia64 with CONFIG_PRINTK_TIME=y Roland Dreier
@ 2008-02-13  7:01 ` David Miller
  2008-02-13 12:57   ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2008-02-13  7:01 UTC (permalink / raw)
  To: rdreier; +Cc: linux-kernel, mingo

From: Roland Dreier <rdreier@cisco.com>
Date: Tue, 12 Feb 2008 22:24:05 -0800

> I'm seeing a strange hang with current git (head 96b5a46e) on an ia64
> box -- an Intel SDV with 2 dual core hyperthreaded Itanium 2 CPUs (so
> 8 logical CPUs to the kernel).  It hangs without printing anything
> ("Uncompressing Linux... done" from ELILO is the last thing I see) if
> I have CONFIG_PRINTK_TIME=y; it works fine with CONFIG_PRINTK_TIME=n.
> 
> The really strange thing is that I have bisected this down to 326e96b9
> ("printk: revert ktime_get() timestamps"), and verified that if I revert
> this one patch on top of my current git tree, then the kernel boots
> fine with CONFIG_PRINTK_TIME=y.  The strange thing is that I have also
> checked that the real v2.6.24 kernel boots fine on this system, and as
> far as I can tell, 2.6.24 didn't have the commit that 326e96b9 reverts
> (19ef9309), so there is some interaction with another patch that made
> 19ef9309 necessary on my system.
> 
> Any good idea how to debug this, given that the broken kernels don't
> give any output at all?

The kernel now dereferences per-cpu variables very early, essentially
in the very first printk() (via printk()'s call to cpu_clock()).

This bit me on sparc64 because of how I do the per-cpu address
formation.  If I booted on a non-zero cpuid things would explode.

You might be hitting something similar.


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-13  7:01 ` David Miller
@ 2008-02-13 12:57   ` Ingo Molnar
  2008-02-13 13:02     ` Ingo Molnar
  2008-02-13 13:03     ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: Ingo Molnar @ 2008-02-13 12:57 UTC (permalink / raw)
  To: David Miller; +Cc: rdreier, linux-kernel


* David Miller <davem@davemloft.net> wrote:

> The kernel now dereferences per-cpu variables very early, essentially 
> in the very first printk() (via printk()'s call to cpu_clock()).
> 
> This bit me on sparc64 because of how I do the per-cpu address 
> formation.  If I booted on a non-zero cpuid things would explode.
> 
> You might be hitting something similar.

hm. But the raw sched_clock() use was wrong. We could either go back to 
jiffies (which is certainly the simplest and was used before 
printk_clock() was introduced which incorrectly relied on sched_clock()) 
- but that loses precision and the same issue will re-visit us once we 
go totally tickless and start to map jiffies to GTOD ...

so .. how about the patch below? Note that we already had an "early 
bootup" special (the rq->idle check), it's now just made explicit via 
the scheduler_running flag.

	Ingo

---------------------->
Subject: sched: make sched_clock() early-bootup capable
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Feb 13 13:49:36 CET 2008

do not call sched_clock() too early. Not only might rq->idle
not be set up - but pure per-cpu data might not be accessible
either.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-x86.q/kernel/sched.c
===================================================================
--- linux-x86.q.orig/kernel/sched.c
+++ linux-x86.q/kernel/sched.c
@@ -666,6 +666,8 @@ const_debug unsigned int sysctl_sched_rt
  */
 const_debug unsigned int sysctl_sched_rt_ratio = 62259;
 
+static __readmostly int scheduler_running;
+
 /*
  * For kernel-internal use: high-speed (but slightly incorrect) per-cpu
  * clock constructed from sched_clock():
@@ -676,14 +678,16 @@ unsigned long long cpu_clock(int cpu)
 	unsigned long flags;
 	struct rq *rq;
 
-	local_irq_save(flags);
-	rq = cpu_rq(cpu);
 	/*
 	 * Only call sched_clock() if the scheduler has already been
 	 * initialized (some code might call cpu_clock() very early):
 	 */
-	if (rq->idle)
-		update_rq_clock(rq);
+	if (unlikely(!scheduler_running))
+		return 0;
+
+	local_irq_save(flags);
+	rq = cpu_rq(cpu);
+	update_rq_clock(rq);
 	now = rq->clock;
 	local_irq_restore(flags);
 
@@ -7255,6 +7259,8 @@ void __init sched_init(void)
 	 * During early bootup we pretend to be a normal task:
 	 */
 	current->sched_class = &fair_sched_class;
+
+	scheduler_running = 1;
 }
 
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-13 12:57   ` Ingo Molnar
@ 2008-02-13 13:02     ` Ingo Molnar
  2008-02-14  0:26       ` Roland Dreier
  2008-02-13 13:03     ` David Miller
  1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2008-02-13 13:02 UTC (permalink / raw)
  To: David Miller; +Cc: rdreier, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> so .. how about the patch below? Note that we already had an "early 
> bootup" special (the rq->idle check), it's now just made explicit via 
> the scheduler_running flag.

the one below even builds. (untested otherwise)

	Ingo

----------->
Subject: sched: make sched_clock() early-bootup capable
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Feb 13 13:49:36 CET 2008

do not call sched_clock() too early. Not only might rq->idle
not be set up - but pure per-cpu data might not be accessible
either.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-x86.q/kernel/sched.c
===================================================================
--- linux-x86.q.orig/kernel/sched.c
+++ linux-x86.q/kernel/sched.c
@@ -666,6 +666,8 @@ const_debug unsigned int sysctl_sched_rt
  */
 const_debug unsigned int sysctl_sched_rt_ratio = 62259;
 
+static __read_mostly int scheduler_running;
+
 /*
  * For kernel-internal use: high-speed (but slightly incorrect) per-cpu
  * clock constructed from sched_clock():
@@ -676,14 +678,16 @@ unsigned long long cpu_clock(int cpu)
 	unsigned long flags;
 	struct rq *rq;
 
-	local_irq_save(flags);
-	rq = cpu_rq(cpu);
 	/*
 	 * Only call sched_clock() if the scheduler has already been
 	 * initialized (some code might call cpu_clock() very early):
 	 */
-	if (rq->idle)
-		update_rq_clock(rq);
+	if (unlikely(!scheduler_running))
+		return 0;
+
+	local_irq_save(flags);
+	rq = cpu_rq(cpu);
+	update_rq_clock(rq);
 	now = rq->clock;
 	local_irq_restore(flags);
 
@@ -7255,6 +7259,8 @@ void __init sched_init(void)
 	 * During early bootup we pretend to be a normal task:
 	 */
 	current->sched_class = &fair_sched_class;
+
+	scheduler_running = 1;
 }
 
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
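
The control flow of the patch above can be modelled in user space.  This
is a minimal sketch, not kernel code: model_cpu_clock(), model_sched_init(),
model_rq_accesses() and fake_rq_clock are hypothetical stand-ins for
cpu_clock(), the tail of sched_init() and rq->clock.  The only point it
illustrates is that the early return happens before any runqueue (and
hence any per-cpu data) is touched.

```c
/* Hypothetical stand-in for the flag the patch adds to kernel/sched.c. */
static int scheduler_running;

/* Hypothetical stand-in for rq->clock; the kernel reaches it via cpu_rq(). */
static unsigned long long fake_rq_clock = 123456789ULL;

/* Counts how often the modelled per-cpu state was dereferenced. */
static int rq_accesses;

/* Models cpu_clock(): bail out before touching per-cpu data if too early. */
unsigned long long model_cpu_clock(void)
{
	if (!scheduler_running)
		return 0;

	rq_accesses++;
	return fake_rq_clock;
}

/* Models the end of sched_init(), where the patch sets the flag. */
void model_sched_init(void)
{
	scheduler_running = 1;
}

/* Reports how many times the modelled runqueue was read. */
int model_rq_accesses(void)
{
	return rq_accesses;
}
```

In this model every call made before model_sched_init() returns 0 and
leaves the access counter untouched, which is the property the patch
relies on.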


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-13 12:57   ` Ingo Molnar
  2008-02-13 13:02     ` Ingo Molnar
@ 2008-02-13 13:03     ` David Miller
  2008-02-13 13:06       ` Ingo Molnar
  2008-02-14  0:59       ` Tony Luck
  1 sibling, 2 replies; 21+ messages in thread
From: David Miller @ 2008-02-13 13:03 UTC (permalink / raw)
  To: mingo; +Cc: rdreier, linux-kernel

From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 13 Feb 2008 13:57:25 +0100

> so .. how about the patch below? Note that we already had an "early 
> bootup" special (the rq->idle check), it's now just made explicit via 
> the scheduler_running flag.

I don't see what the problem is.

It is legal to access per-cpu data as early as you like,
it just evaluates to the static copy in the per-cpu section
of the kernel image until the per-cpu areas are setup.

rq->idle should be zero this early as well, that's
also legal to rely upon.

I see nothing illegal in what cpu_clock() is doing, that's
why I fixed the sparc64 per-cpu problem I ran into since
sparc64 was doing the wrong thing when booted on a non-zero
cpu.
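
The "static copy until the per-cpu areas are set up" behaviour described
above can be sketched as a small user-space model.  Everything here is an
assumption made for illustration: model_per_cpu() and
model_setup_per_cpu_areas() are hypothetical names, and the model uses
pointers where a real architecture would typically add a per-cpu offset.
It does not describe how any particular architecture forms the address.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Static template copy, standing in for the per-cpu section of the image. */
static int template_counter = 7;

/* Per-CPU copies that only become valid after setup. */
static int percpu_copies[MODEL_NR_CPUS];

/* NULL means "areas not set up yet": accesses fall back to the template. */
static int *percpu_ptr[MODEL_NR_CPUS];

/* Models per_cpu(var, cpu): resolve the variable's address for one CPU. */
int *model_per_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return percpu_ptr[cpu] != NULL ? percpu_ptr[cpu] : &template_counter;
}

/* Models per-cpu area setup: copy the template, then switch the pointers. */
void model_setup_per_cpu_areas(void)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		percpu_copies[cpu] = template_counter;
		percpu_ptr[cpu] = &percpu_copies[cpu];
	}
}
```

Before setup, every CPU index resolves to the same static copy, so an
early access is harmless in this model; after setup, each CPU sees its
own copy initialised from the template.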


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-13 13:03     ` David Miller
@ 2008-02-13 13:06       ` Ingo Molnar
  2008-02-14  0:59       ` Tony Luck
  1 sibling, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2008-02-13 13:06 UTC (permalink / raw)
  To: David Miller; +Cc: rdreier, linux-kernel


* David Miller <davem@davemloft.net> wrote:

> I see nothing illegal in what cpu_clock() is doing, that's why I fixed 
> the sparc64 per-cpu problem I ran into since sparc64 was doing the 
> wrong thing when booted on a non-zero cpu.

okay. I'm just somewhat uneasy about potentially losing the ability to 
printk. printk must stay simple and dependable. Maybe PRINTK_TIMESTAMPs 
should be delayed until the point when system_state == SYSTEM_RUNNING.

	Ingo
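
A rough user-space sketch of that suggestion, assuming nothing about how
printk() would actually implement it: model_state, model_set_running()
and model_format_line() are hypothetical names, and the "[secs.usecs] "
prefix layout is only meant to resemble printk timestamps.

```c
#include <stdio.h>

/* Hypothetical two-state stand-in for the kernel's system_state. */
enum model_system_state { MODEL_SYSTEM_BOOTING, MODEL_SYSTEM_RUNNING };

static enum model_system_state model_state = MODEL_SYSTEM_BOOTING;

/* Marks the modelled system as fully up. */
void model_set_running(void)
{
	model_state = MODEL_SYSTEM_RUNNING;
}

/*
 * Formats one log line into buf and returns the snprintf() length.
 * A timestamp prefix is emitted only once the system is running;
 * before that the message is passed through unchanged.
 */
int model_format_line(char *buf, size_t len, unsigned long long ns,
		      const char *msg)
{
	if (model_state != MODEL_SYSTEM_RUNNING)
		return snprintf(buf, len, "%s", msg);

	return snprintf(buf, len, "[%5llu.%06llu] %s",
			ns / 1000000000ULL,
			(ns % 1000000000ULL) / 1000ULL, msg);
}
```

The trade-off is that all boot-time messages lose their timestamps, in
exchange for printk() not depending on any clock infrastructure until
the system is up.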


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-13 13:02     ` Ingo Molnar
@ 2008-02-14  0:26       ` Roland Dreier
  0 siblings, 0 replies; 21+ messages in thread
From: Roland Dreier @ 2008-02-14  0:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David Miller, linux-kernel

 > > so .. how about the patch below? Note that we already had an "early 
 > > bootup" special (the rq->idle check), it's now just made explicit via 
 > > the scheduler_running flag.
 > 
 > the one below even builds. (untested otherwise)

I just tried this... it doesn't work on top of current git (same
behavior: hang before the kernel prints anything).

I also tried reverting 19ef9309 on top of a checkout of the tree state
at 19ef9309, and that hung before printing anything too.

 - R.



* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-13 13:03     ` David Miller
  2008-02-13 13:06       ` Ingo Molnar
@ 2008-02-14  0:59       ` Tony Luck
  2008-02-14  1:04         ` David Miller
  1 sibling, 1 reply; 21+ messages in thread
From: Tony Luck @ 2008-02-14  0:59 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, rdreier, linux-kernel

> It is legal to access per-cpu data as early as you like,
> it just evaluates to the static copy in the per-cpu section
> of the kernel image until the per-cpu areas are setup.

On ia64 per-cpu variables are mapped into the top 64K
of the address space.  Accessing them before the
resources to handle the access to that virtual address
have been set up would cause problems.

-Tony
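
For concreteness, the address arithmetic behind "the top 64K of the
address space" can be worked out as below.  This is only a sketch: the
MODEL_* names and model_is_percpu_addr() are hypothetical, a 64-bit
address space is assumed, and nothing here models what has to be set up
before that range can actually be accessed.

```c
#include <stdint.h>

/* Assumed size of the per-cpu window described above: 64K. */
#define MODEL_PERCPU_PAGE_SIZE UINT64_C(0x10000)

/* Start of the top 64K: 0 - 64K wraps around to 0xffffffffffff0000. */
#define MODEL_PERCPU_ADDR (UINT64_C(0) - MODEL_PERCPU_PAGE_SIZE)

/* Returns 1 if addr falls inside the modelled per-cpu window, else 0. */
int model_is_percpu_addr(uint64_t addr)
{
	return addr >= MODEL_PERCPU_ADDR;
}
```

Because the window sits at a fixed virtual address rather than inside
the kernel image, an access made before that range is backed has no
static copy to fall back on.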


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14  0:59       ` Tony Luck
@ 2008-02-14  1:04         ` David Miller
  2008-02-14  3:33           ` Tony Luck
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2008-02-14  1:04 UTC (permalink / raw)
  To: tony.luck; +Cc: mingo, rdreier, linux-kernel

From: "Tony Luck" <tony.luck@intel.com>
Date: Wed, 13 Feb 2008 16:59:30 -0800

> > It is legal to access per-cpu data as early as you like,
> > it just evaluates to the static copy in the per-cpu section
> > of the kernel image until the per-cpu areas are setup.
> 
> On ia64 per-cpu variables are mapped into the top 64K
> of the address space.  Accessing them before the
> resources to handle the access to that virtual address
> have been set up would cause problems.

That's right.  I thought you guys had something that would handle that
early on, but looking at how the trick works in the vmlinux.lds.S that ia64
uses, that isn't the case.

I guess we really can't do per-cpu accesses early on.

This is extremely fragile, we should find some way to formalize this
to make accidental early per-cpu references trap at build time or
similar.


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14  1:04         ` David Miller
@ 2008-02-14  3:33           ` Tony Luck
  2008-02-14  3:47             ` Roland Dreier
  0 siblings, 1 reply; 21+ messages in thread
From: Tony Luck @ 2008-02-14  3:33 UTC (permalink / raw)
  To: David Miller; +Cc: mingo, rdreier, linux-kernel

> That's right.  I thought you guys had something that would handle that
> early on, but looking at how the trick works in the vmlinux.lds.S that ia64
> uses, that isn't the case.

We try to get things set up pretty early ... but I agree this is
fragile.  Adding code to printk() to not provide a timestamp
before some safe point in boot is a workaround to the
current problem.  But it may come back to haunt us if
other per-cpu data is added that needs to be accessed
early during boot.

There are some changes going on at the moment on how
we allocate the space for the per-cpu area.  It is likely that
for a non-boot cpu we might be able to get everything that
we need for per-cpu access to work done in head.S before
we can get to any C code.  Boot cpu may be harder unless
we statically allocate space for its per-cpu area in
vmlinux.lds.S

I'll take a closer look at what is needed tomorrow.

-Tony


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14  3:33           ` Tony Luck
@ 2008-02-14  3:47             ` Roland Dreier
  2008-02-14 18:24               ` Luck, Tony
  2008-02-14 21:27               ` Tony Luck
  0 siblings, 2 replies; 21+ messages in thread
From: Roland Dreier @ 2008-02-14  3:47 UTC (permalink / raw)
  To: Tony Luck; +Cc: David Miller, mingo, linux-kernel

 > I'll take a closer look at what is needed tomorrow.

Hi Tony,

Just curious -- can you reproduce the same problem with
CONFIG_PRINTK_TIME as I'm seeing?  If not I'm happy to test anything
you want to try.

The strange thing is that Ingo's patch to make cpu_clock() a NOP until
after sched_init() didn't fix things for me...

 - R.



* RE: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14  3:47             ` Roland Dreier
@ 2008-02-14 18:24               ` Luck, Tony
  2008-02-14 21:27               ` Tony Luck
  1 sibling, 0 replies; 21+ messages in thread
From: Luck, Tony @ 2008-02-14 18:24 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Miller, mingo, linux-kernel

> Just curious -- can you reproduce the same problem with
> CONFIG_PRINTK_TIME as I'm seeing?

Yes I can reproduce this (on latest Linus tree).  System
dies with no console output ... looks like the boot cpu
may have taken a machine check (it isn't responding to my
debugger).

-Tony


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14  3:47             ` Roland Dreier
  2008-02-14 18:24               ` Luck, Tony
@ 2008-02-14 21:27               ` Tony Luck
  2008-02-14 22:04                 ` Roland Dreier
  1 sibling, 1 reply; 21+ messages in thread
From: Tony Luck @ 2008-02-14 21:27 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Miller, mingo, linux-kernel

On Wed, Feb 13, 2008 at 7:47 PM, Roland Dreier <rdreier@cisco.com> wrote:
>  The strange thing is that Ingo's patch to make cpu_clock() a NOP until
>  after sched_init() didn't fix things for me...

Very strange.  I threw an output line counter into the printk() code ... if I
disable the timestamps for the first 30 lines, then everything is good (so the
basic timestamping code does still work on ia64). But I would have thought
that Ingo's delay until sched_init() ought to be long enough too. Clearly I
need to figure out exactly what needs to be initialized to prevent the
hang/crash.

-Tony
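
The line-counter experiment described above might look roughly like this
in isolation.  It is a sketch only: model_line_wants_timestamp() and
MODEL_EARLY_LINES are hypothetical names, and the actual debugging hack
in printk() is not shown in this thread.

```c
/* Number of early lines left without a timestamp; 30 is the value tried. */
#define MODEL_EARLY_LINES 30

/* Lines seen so far by the modelled printk(). */
static unsigned int model_lines_printed;

/*
 * Called once per output line.  Returns 0 while the line is among the
 * first MODEL_EARLY_LINES, and 1 for every line after that.
 */
int model_line_wants_timestamp(void)
{
	return model_lines_printed++ >= MODEL_EARLY_LINES;
}
```

A counter like this says nothing about what has to be initialised; it
only helps bracket how far into boot the first safe timestamp is.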


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14 21:27               ` Tony Luck
@ 2008-02-14 22:04                 ` Roland Dreier
  2008-02-14 22:23                   ` Luck, Tony
  0 siblings, 1 reply; 21+ messages in thread
From: Roland Dreier @ 2008-02-14 22:04 UTC (permalink / raw)
  To: Tony Luck; +Cc: David Miller, mingo, linux-kernel

 > >  The strange thing is that Ingo's patch to make cpu_clock() a NOP until
 > >  after sched_init() didn't fix things for me...

 > Very strange.  I threw an output line counter into the printk() code ... if I
 > disable the timestamps for the first 30 lines, then everything is good (so the
 > basic timestamping code does still work on ia64). But I would have thought
 > that Ingo's delay until sched_init() ought to be long enough too. Clearly I
 > need to figure out exactly what needs to be initialized to prevent the
 > hang/crash.

I guess sched_init() is too early... it does seem really strange to
me, but I just double checked with Ingo's patch and it does indeed
hang.  The slow way to make progress is just to go through
start_kernel() line-by-line and enable cpu_clock() at each stage, and
see where it stops hanging.  I'll give that a shot as a background
process (my ia64 box takes quite a while to boot, so each test takes a
long time but requires very little of my attention).


* RE: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14 22:04                 ` Roland Dreier
@ 2008-02-14 22:23                   ` Luck, Tony
  2008-02-15  2:17                     ` Tony Luck
  0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2008-02-14 22:23 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Miller, mingo, linux-kernel

> I guess sched_init() is too early... it does seem really strange to
> me, but I just double checked with Ingo's patch and it does indeed
> hang.  The slow way to make progress is just to go through
> start_kernel() line-by-line and enable cpu_clock() at each stage, and
> see where it stops hanging.  I'll give that a shot as a background
> process (my ia64 box takes quite a while to boot, so each test takes a
> long time but requires very little of my attention).

We *ought* to be safe after cpu_init() ... which is called from setup_arch(),
which is several calls before sched_init().

Thanks for looking at this though ... my ability to test just went
away for a while ... some lab re-organization means all my systems
just got powered off and removed from their rack so the rack can be
moved to a new location.

-Tony


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-14 22:23                   ` Luck, Tony
@ 2008-02-15  2:17                     ` Tony Luck
  2008-02-19 22:40                       ` Tony Luck
  0 siblings, 1 reply; 21+ messages in thread
From: Tony Luck @ 2008-02-15  2:17 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Miller, mingo, linux-kernel

> We *ought* to be safe after cpu_init() ... which is called from setup_arch(),
> which is several calls before sched_init().

Perhaps what is happening is that cpu0 comes online ... safely skips
over the early printk calls.  Calls cpu_init() which sets up the resources
*it* needs (ar.k3 points to per-cpu space), and then executes
sched_init() which marks it safe for all printk's. Then cpu1 comes
up and does a printk before it gets to cpu_init().

Try with Ingo's patch and CONFIG_SMP=n to see if you can come
up on a uni-processor.

-Tony
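
The hypothesis above comes down to a single global flag standing in for
what is really a per-CPU condition.  A minimal user-space sketch of that
mismatch follows; cpu_inited[], model_cpu_init(), model_timestamp_allowed()
and model_timestamp_safe() are hypothetical names, and whether this
sequence actually occurs on ia64 is exactly what is being asked.

```c
#include <assert.h>

#define MODEL_NR_CPUS 2

/* Global flag, as in the patch: set once, by the boot CPU. */
static int scheduler_running;

/* Hypothetical per-CPU readiness, set when that CPU has run cpu_init(). */
static int cpu_inited[MODEL_NR_CPUS];

/* Models cpu_init() completing on one CPU. */
void model_cpu_init(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	cpu_inited[cpu] = 1;
}

/* Models the boot CPU reaching the end of sched_init(). */
void model_sched_init(void)
{
	scheduler_running = 1;
}

/* What the global check permits, regardless of the calling CPU. */
int model_timestamp_allowed(void)
{
	return scheduler_running;
}

/* What is actually safe for the CPU doing the printk(). */
int model_timestamp_safe(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return cpu_inited[cpu];
}
```

After model_cpu_init(0) and model_sched_init(), the global check allows
timestamps while model_timestamp_safe(1) is still 0, which is the window
the hypothesis is concerned with.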


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-15  2:17                     ` Tony Luck
@ 2008-02-19 22:40                       ` Tony Luck
  2008-02-20  8:12                         ` Ingo Molnar
                                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Tony Luck @ 2008-02-19 22:40 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Miller, mingo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 860 bytes --]

> Perhaps what is happening is that cpu0 comes online ... safely skips
> over the early printk calls.  Calls cpu_init() which sets up the resources
> *it* needs (ar.k3 points to per-cpu space), and then executes
> sched_init() which marks it safe for all printk's. Then cpu1 comes
> up and does a printk before it gets to cpu_init().

I just tried Ingo's patch[1] on a 2.6.25-rc2 kernel with printk timestamps
turned on ... and it booted just fine on my tiger4.  The default path
for non-boot cpus is from head.S to start_secondary(), and that
calls cpu_init() pretty quickly.  There shouldn't normally[2] be any
printk() calls on the non-boot cpu before it is safe to do so.

-Tony

[1] Attached
[2] If you set #define SMP_DEBUG in arch/ia64/kernel/smpboot.c
that enables at least one printk() that will cause problems if you have
also configured timestamps.

[-- Attachment #2: ingo.patch --]
[-- Type: text/x-patch; name=ingo.patch, Size: 1271 bytes --]

 kernel/sched.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-x86.q/kernel/sched.c
===================================================================
--- linux-x86.q.orig/kernel/sched.c
+++ linux-x86.q/kernel/sched.c
@@ -666,6 +666,8 @@ const_debug unsigned int sysctl_sched_rt
  */
 const_debug unsigned int sysctl_sched_rt_ratio = 62259;
 
+static __read_mostly int scheduler_running;
+
 /*
  * For kernel-internal use: high-speed (but slightly incorrect) per-cpu
  * clock constructed from sched_clock():
@@ -676,14 +678,16 @@ unsigned long long cpu_clock(int cpu)
 	unsigned long flags;
 	struct rq *rq;
 
-	local_irq_save(flags);
-	rq = cpu_rq(cpu);
 	/*
 	 * Only call sched_clock() if the scheduler has already been
 	 * initialized (some code might call cpu_clock() very early):
 	 */
-	if (rq->idle)
-		update_rq_clock(rq);
+	if (unlikely(!scheduler_running))
+		return 0;
+
+	local_irq_save(flags);
+	rq = cpu_rq(cpu);
+	update_rq_clock(rq);
 	now = rq->clock;
 	local_irq_restore(flags);
 
@@ -7255,6 +7259,8 @@ void __init sched_init(void)
 	 * During early bootup we pretend to be a normal task:
 	 */
 	current->sched_class = &fair_sched_class;
+
+	scheduler_running = 1;
 }
 
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
--


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-19 22:40                       ` Tony Luck
@ 2008-02-20  8:12                         ` Ingo Molnar
  2008-02-20  8:15                           ` David Miller
  2008-02-20 23:35                         ` Roland Dreier
  2008-02-20 23:36                         ` Roland Dreier
  2 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2008-02-20  8:12 UTC (permalink / raw)
  To: Tony Luck; +Cc: Roland Dreier, David Miller, linux-kernel


* Tony Luck <tony.luck@gmail.com> wrote:

> > Perhaps what is happening is that cpu0 comes online ... safely skips 
> > over the early printk calls.  Calls cpu_init() which sets up the 
> > resources *it* needs (ar.k3 points to per-cpu space), and then 
> > executes sched_init() which marks it safe for all printk's. Then 
> > cpu1 comes up and does a printk before it gets to cpu_init().
> 
> I just tried Ingo's patch[1] on a 2.6.25-rc2 kernel with printk 
> timestamps turned on ... and it booted just fine on my tiger4.  The 
> default path for non-boot cpus is from head.S to start_secondary(), 
> and that calls cpu_init() pretty quickly.  There shouldn't normally[2] 
> be any printk() calls on the non-boot cpu before it is safe to do so.

ok. I've queued up the patch again in sched.git, as a .25 fix. Can i add 
your Tested-by and Acked-by tags?

	Ingo


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-20  8:12                         ` Ingo Molnar
@ 2008-02-20  8:15                           ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2008-02-20  8:15 UTC (permalink / raw)
  To: mingo; +Cc: tony.luck, rdreier, linux-kernel

From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 20 Feb 2008 09:12:28 +0100

> 
> * Tony Luck <tony.luck@gmail.com> wrote:
> 
> > > Perhaps what is happening is that cpu0 comes online ... safely skips 
> > > over the early printk calls.  Calls cpu_init() which sets up the 
> > > resources *it* needs (ar.k3 points to per-cpu space), and then 
> > > executes sched_init() which marks it safe for all printk's. Then 
> > > cpu1 comes up and does a printk before it gets to cpu_init().
> > 
> > I just tried Ingo's patch[1] on a 2.6.25-rc2 kernel with printk 
> > timestamps turned on ... and it booted just fine on my tiger4.  The 
> > default path for non-boot cpus is from head.S to start_secondary(), 
> > and that calls cpu_init() pretty quickly.  There shouldn't normally[2] 
> > be any printk() calls on the non-boot cpu before it is safe to do so.
> 
> ok. I've queued up the patch again in sched.git, as a .25 fix. Can i add 
> your Tested-by and Acked-by tags?

Acked-by: David S. Miller <davem@davemloft.net>


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-19 22:40                       ` Tony Luck
  2008-02-20  8:12                         ` Ingo Molnar
@ 2008-02-20 23:35                         ` Roland Dreier
  2008-02-20 23:36                         ` Roland Dreier
  2 siblings, 0 replies; 21+ messages in thread
From: Roland Dreier @ 2008-02-20 23:35 UTC (permalink / raw)
  To: Tony Luck; +Cc: David Miller, mingo, linux-kernel

 > I just tried Ingo's patch[1] on a 2.6.25-rc2 kernel with printk timestamps
 > turned on ... and it booted just fine on my tiger4.  The default path
 > for non-boot cpus is from head.S to start_secondary(), and that
 > calls cpu_init() pretty quickly.  There shouldn't normally[2] be any
 > printk() calls on the non-boot cpu before it is safe to do so.

Hmm, current git + Ingo's patch still hangs on boot on my system
("Hitachi Cold Fusion-3e 2S4U" according to DMI).

 - R.


* Re: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
  2008-02-19 22:40                       ` Tony Luck
  2008-02-20  8:12                         ` Ingo Molnar
  2008-02-20 23:35                         ` Roland Dreier
@ 2008-02-20 23:36                         ` Roland Dreier
  2 siblings, 0 replies; 21+ messages in thread
From: Roland Dreier @ 2008-02-20 23:36 UTC (permalink / raw)
  To: Tony Luck; +Cc: David Miller, mingo, linux-kernel

BTW, sorry I didn't get a chance to try some of the other debugging
you suggested yet... got busy with other stuff.
