LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* overlaping printk
@ 2004-05-19 15:19 Ricky Beam
  2004-05-19 15:26 ` Tigran Aivazian
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ricky Beam @ 2004-05-19 15:19 UTC (permalink / raw)
  To: Linux Kernel Mail List

Can anyone explain why the kernel does this on a serial console:

C<PU0 >C2P:U M 3a:ch Mianech iCnhee cCk heExckc epEtxcieopnt: i0o0n0: 0000000000
0000000000000040
00C4P
U 2C:P UE 3IP:: E cI0P:1 0c10fa108 1fEFa8L AEGSF:L AG00S0:0 00200460
0246    e
ax:     e 0ax00: 0000000000 0eb00x:  ebfx7f: a6f070f0a4 0ec00x:  e0c2x:a7 a02bea
7e aebedxe : efdx7f: a6f700f0a4
000     e
si      e: sif7: faf76f0a0040 e0d0i e:d ci0: 10c01f107e1f e7be p:eb 0p:00 000000
00000 e00sp e:s fp:7 faf77ffab45
fb4*
** **Ba*n kBa 0nk:  00:00 0000000000000000000000000000[00[0000000000000000000000
0000000]00] a at t 00000000000000000000000000000000

***** * BaBankn k1 1: :0 000000000000000000000000000000g0eneral protection fault
: 0000 [#1]
...

That's two MCE's being printed at the exact same time. (CPU2 & 3 are a single
P4 Xeon.)  It makes it real damn hard to debug when the errors are printed
like that.

The GPF was printed correctly:
CPU:    3
EIP:    0060:[<c010c0a7>]    Not tainted
EFLAGS: 00010282   (2.6.6-SMP-rc3+BK@1.1386)
EIP is at intel_machine_check+0x12b/0x364
eax: 0000001f   ebx: 00000004   ecx: 00000407   edx: 00006f52
esi: 00000407   edi: 00000000   ebp: 00000000   esp: f7fa5f04
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=f7fa4000 task=f7fa85c0)
Stack: c0304002 00000001 00000000 00000000 f7fa5fb4 0000000f c0120adc 00000003
       00000001 00000000 00000004 00000001 00000000 f7fa4000 02a7abee f7fa4000
       f7fa4000 c0101f7e 00000000 f7fa5fb4 00000246 c0101fa8 00000046 c03cf108
Call Trace:
 [<c0120adc>] run_timer_softirq+0x110/0x180
 [<c0101f7e>] default_idle+0x0/0x2d
 [<c0101fa8>] default_idle+0x2a/0x2d
 [<c010bf7c>] intel_machine_check+0x0/0x364
 [<c01049b5>] error_code+0x2d/0x38
 [<c0101f7e>] default_idle+0x0/0x2d
 [<c0101fa8>] default_idle+0x2a/0x2d
 [<c010201c>] cpu_idle+0x37/0x40
 [<c0119558>] printk+0x172/0x1a8
 [<c03b2981>] print_cpu_info+0x86/0xd2

Code: 0f 32 81 c3 02 04 00 00 89 44 24 08 89 54 24 04 c7 04 24 f7

I wouldn't trust it as it was on the MCE'ing processor.

Note: The answer to "which kernel" is ALL of them. (even the bastard stepchild
redhat 2.4 kernel will do it.)

It goes on to have a fit unblanking a never blanked VT.  Stupid kernel!

--Ricky



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-19 15:19 overlaping printk Ricky Beam
@ 2004-05-19 15:26 ` Tigran Aivazian
  2004-05-19 15:31   ` Ricky Beam
  2004-05-19 15:39 ` Randy.Dunlap
  2004-05-20  7:21 ` Andrew Morton
  2 siblings, 1 reply; 15+ messages in thread
From: Tigran Aivazian @ 2004-05-19 15:26 UTC (permalink / raw)
  To: Ricky Beam; +Cc: Linux Kernel Mail List

It happens to me often (on SMP) too. Originally I thought that printk is
broken but having looked at the code I see that it has sufficient locking
in place to prevent this from happening (on printk side). However, the
fact that it only happens on a serial console (especially at low baud
rates like 9600)  points in the direction of the serial driver.

Kind regards
Tigran

On Wed, 19 May 2004, Ricky Beam wrote:

> Can anyone explain why the kernel does this on a serial console:
> 
> C<PU0 >C2P:U M 3a:ch Mianech iCnhee cCk heExckc epEtxcieopnt: i0o0n0: 0000000000
> 0000000000000040
> 00C4P
> U 2C:P UE 3IP:: E cI0P:1 0c10fa108 1fEFa8L AEGSF:L AG00S0:0 00200460
> 0246    e
> ax:     e 0ax00: 0000000000 0eb00x:  ebfx7f: a6f070f0a4 0ec00x:  e0c2x:a7 a02bea
> 7e aebedxe : efdx7f: a6f700f0a4
> 000     e
> si      e: sif7: faf76f0a0040 e0d0i e:d ci0: 10c01f107e1f e7be p:eb 0p:00 000000
> 00000 e00sp e:s fp:7 faf77ffab45
> fb4*
> ** **Ba*n kBa 0nk:  00:00 0000000000000000000000000000[00[0000000000000000000000
> 0000000]00] a at t 00000000000000000000000000000000
> 
> ***** * BaBankn k1 1: :0 000000000000000000000000000000g0eneral protection fault
> : 0000 [#1]
> ....
> 
> That's two MCE's being printed at the exact same time. (CPU2 & 3 are a single
> P4 Xeon.)  It makes it real damn hard to debug when the errors are printed
> like that.
> 
> The GPF was printed correctly:
> CPU:    3
> EIP:    0060:[<c010c0a7>]    Not tainted
> EFLAGS: 00010282   (2.6.6-SMP-rc3+BK@1.1386)
> EIP is at intel_machine_check+0x12b/0x364
> eax: 0000001f   ebx: 00000004   ecx: 00000407   edx: 00006f52
> esi: 00000407   edi: 00000000   ebp: 00000000   esp: f7fa5f04
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, threadinfo=f7fa4000 task=f7fa85c0)
> Stack: c0304002 00000001 00000000 00000000 f7fa5fb4 0000000f c0120adc 00000003
>        00000001 00000000 00000004 00000001 00000000 f7fa4000 02a7abee f7fa4000
>        f7fa4000 c0101f7e 00000000 f7fa5fb4 00000246 c0101fa8 00000046 c03cf108
> Call Trace:
>  [<c0120adc>] run_timer_softirq+0x110/0x180
>  [<c0101f7e>] default_idle+0x0/0x2d
>  [<c0101fa8>] default_idle+0x2a/0x2d
>  [<c010bf7c>] intel_machine_check+0x0/0x364
>  [<c01049b5>] error_code+0x2d/0x38
>  [<c0101f7e>] default_idle+0x0/0x2d
>  [<c0101fa8>] default_idle+0x2a/0x2d
>  [<c010201c>] cpu_idle+0x37/0x40
>  [<c0119558>] printk+0x172/0x1a8
>  [<c03b2981>] print_cpu_info+0x86/0xd2
> 
> Code: 0f 32 81 c3 02 04 00 00 89 44 24 08 89 54 24 04 c7 04 24 f7
> 
> I wouldn't trust it as it was on the MCE'ing processor.
> 
> Note: The answer to "which kernel" is ALL of them. (even the bastard stepchild
> redhat 2.4 kernel will do it.)
> 
> It goes on to have a fit unblanking a never blanked VT.  Stupid kernel!
> 
> --Ricky
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-19 15:26 ` Tigran Aivazian
@ 2004-05-19 15:31   ` Ricky Beam
  0 siblings, 0 replies; 15+ messages in thread
From: Ricky Beam @ 2004-05-19 15:31 UTC (permalink / raw)
  To: Tigran Aivazian; +Cc: Linux Kernel Mail List

On Wed, 19 May 2004, Tigran Aivazian wrote:
>It happens to me often (on SMP) too. Originally I thought that printk is
>broken but having looked at the code I see that it has sufficient locking
>in place to prevent this from happening (on printk side). However, the
>fact that it only happens on a serial console (especially at low baud
>rates like 9600)  points in the direction of the serial driver.

(the port's @ 115.2k, btw)

That doesn't make sense.  printk is putting things in it's ring buffer
correctly, however the process that reads out of the buffer is reading
from two different points and unpredictably pushing them out the serial
port.

I'll go look at the code behind kseriod.

--Ricky



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-19 15:19 overlaping printk Ricky Beam
  2004-05-19 15:26 ` Tigran Aivazian
@ 2004-05-19 15:39 ` Randy.Dunlap
  2004-05-20  7:21 ` Andrew Morton
  2 siblings, 0 replies; 15+ messages in thread
From: Randy.Dunlap @ 2004-05-19 15:39 UTC (permalink / raw)
  To: Ricky Beam; +Cc: linux-kernel

On Wed, 19 May 2004 11:19:15 -0400 (EDT) Ricky Beam wrote:

| Can anyone explain why the kernel does this on a serial console:
| 
| C<PU0 >C2P:U M 3a:ch Mianech iCnhee cCk heExckc epEtxcieopnt: i0o0n0: 0000000000
| 0000000000000040
| 00C4P
| U 2C:P UE 3IP:: E cI0P:1 0c10fa108 1fEFa8L AEGSF:L AG00S0:0 00200460
| 0246    e
| ax:     e 0ax00: 0000000000 0eb00x:  ebfx7f: a6f070f0a4 0ec00x:  e0c2x:a7 a02bea
| 7e aebedxe : efdx7f: a6f700f0a4
| 000     e
| si      e: sif7: faf76f0a0040 e0d0i e:d ci0: 10c01f107e1f e7be p:eb 0p:00 000000
| 00000 e00sp e:s fp:7 faf77ffab45
| fb4*
| ** **Ba*n kBa 0nk:  00:00 0000000000000000000000000000[00[0000000000000000000000
| 0000000]00] a at t 00000000000000000000000000000000
| 
| ***** * BaBankn k1 1: :0 000000000000000000000000000000g0eneral protection fault
| : 0000 [#1]
| ...
| 
| That's two MCE's being printed at the exact same time. (CPU2 & 3 are a single
| P4 Xeon.)  It makes it real damn hard to debug when the errors are printed
| like that.
| 
| The GPF was printed correctly:
| CPU:    3
| EIP:    0060:[<c010c0a7>]    Not tainted
| EFLAGS: 00010282   (2.6.6-SMP-rc3+BK@1.1386)
| EIP is at intel_machine_check+0x12b/0x364
| eax: 0000001f   ebx: 00000004   ecx: 00000407   edx: 00006f52
| esi: 00000407   edi: 00000000   ebp: 00000000   esp: f7fa5f04
| ds: 007b   es: 007b   ss: 0068
| Process swapper (pid: 0, threadinfo=f7fa4000 task=f7fa85c0)
| Stack: c0304002 00000001 00000000 00000000 f7fa5fb4 0000000f c0120adc 00000003
|        00000001 00000000 00000004 00000001 00000000 f7fa4000 02a7abee f7fa4000
|        f7fa4000 c0101f7e 00000000 f7fa5fb4 00000246 c0101fa8 00000046 c03cf108
| Call Trace:
|  [<c0120adc>] run_timer_softirq+0x110/0x180
|  [<c0101f7e>] default_idle+0x0/0x2d
|  [<c0101fa8>] default_idle+0x2a/0x2d
|  [<c010bf7c>] intel_machine_check+0x0/0x364
|  [<c01049b5>] error_code+0x2d/0x38
|  [<c0101f7e>] default_idle+0x0/0x2d
|  [<c0101fa8>] default_idle+0x2a/0x2d
|  [<c010201c>] cpu_idle+0x37/0x40
|  [<c0119558>] printk+0x172/0x1a8
|  [<c03b2981>] print_cpu_info+0x86/0xd2
| 
| Code: 0f 32 81 c3 02 04 00 00 89 44 24 08 89 54 24 04 c7 04 24 f7
| 
| I wouldn't trust it as it was on the MCE'ing processor.
| 
| Note: The answer to "which kernel" is ALL of them. (even the bastard stepchild
| redhat 2.4 kernel will do it.)
| 
| It goes on to have a fit unblanking a never blanked VT.  Stupid kernel!


I've seen it on SMP also (like Tigran).  A patch has been posted
for this a few times, by David Howells.  The latest one that I
see is here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103001834526258&w=2

Don't know how valid it is now.

--
~Randy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-19 15:19 overlaping printk Ricky Beam
  2004-05-19 15:26 ` Tigran Aivazian
  2004-05-19 15:39 ` Randy.Dunlap
@ 2004-05-20  7:21 ` Andrew Morton
  2 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2004-05-20  7:21 UTC (permalink / raw)
  To: Ricky Beam; +Cc: linux-kernel

Ricky Beam <jfbeam@bluetronic.net> wrote:
>
> Can anyone explain why the kernel does this on a serial console:
> 
>  C<PU0 >C2P:U M 3a:ch Mianech iCnhee cCk heExckc epEtxcieopnt: i0o0n0: 0000000000
>  0000000000000040

<looks>

<after three years, penny finally drops>

We bust the spinlock for *every* printk call.  No wonder.


diff -puN kernel/printk.c~printk-mangling-fix kernel/printk.c
--- 25/kernel/printk.c~printk-mangling-fix	2004-05-20 00:20:06.388725128 -0700
+++ 25-akpm/kernel/printk.c	2004-05-20 00:20:27.228556992 -0700
@@ -492,8 +492,10 @@ asmlinkage int printk(const char *fmt, .
 	char *p;
 	static char printk_buf[1024];
 	static int log_level_unknown = 1;
+	static int once;
 
-	if (oops_in_progress) {
+	if (oops_in_progress && !once) {
+		once = 1;
 		/* If a crash is occurring, make sure we can't deadlock */
 		spin_lock_init(&logbuf_lock);
 		/* And make sure that we print immediately */

_


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-20 23:11               ` Andrew Morton
@ 2004-05-21  7:00                 ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2004-05-21  7:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: adi, ak, linux-kernel


* Andrew Morton <akpm@osdl.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> >
> > i've attached a new patch that does what Andi suggested too - 
> > timestamping of the oopses. This way we will zap no sooner than 10 
> > seconds after the first oops.
> 
> I think that will do the wrong thing between 23 and 47 days uptime
> because time_after() will return an incorrect answer.
> 
> How's this look?

> +	static unsigned long oops_timestamp;
> +
> +	if (time_after_eq(jiffies, oops_timestamp) &&
> +			!time_after(jiffies, oops_timestamp + 30*HZ))
>  		return;
> +

looks good to me!

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-20 18:57             ` Ingo Molnar
@ 2004-05-20 23:11               ` Andrew Morton
  2004-05-21  7:00                 ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2004-05-20 23:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: adi, ak, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
>
> i've attached a new patch that does what Andi suggested too - 
> timestamping of the oopses. This way we will zap no sooner than 10 
> seconds after the first oops.

I think that will do the wrong thing between 23 and 47 days uptime because
time_after() will return an incorrect answer.

How's this look?




- Bump the timeout to 30 seconds - 9600 baud is slow.

- Handle jiffy wraps: change the logic so that we only skip the lockbust if
  the current time is within 30 seconds of the previous lockbusting attempt.



---

 25-akpm/kernel/printk.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff -puN kernel/printk.c~mangled-printk-oops-output-fix-tweaks kernel/printk.c
--- 25/kernel/printk.c~mangled-printk-oops-output-fix-tweaks	Thu May 20 16:01:53 2004
+++ 25-akpm/kernel/printk.c	Thu May 20 16:10:32 2004
@@ -55,9 +55,6 @@ EXPORT_SYMBOL(console_printk);
 
 int oops_in_progress;
 
-/* zap spinlocks only once: */
-unsigned long oops_timestamp;
-
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -479,10 +476,14 @@ static void emit_log_char(char c)
  * every 10 seconds, to leave time for slow consoles to print a
  * full oops.
  */
-static inline void zap_locks(void)
+static void zap_locks(void)
 {
-	if (!time_after(jiffies, oops_timestamp + 10*HZ))
+	static unsigned long oops_timestamp;
+
+	if (time_after_eq(jiffies, oops_timestamp) &&
+			!time_after(jiffies, oops_timestamp + 30*HZ))
 		return;
+
 	oops_timestamp = jiffies;
 
 	/* If a crash is occurring, make sure we can't deadlock */

_


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-20 16:19           ` Andy Isaacson
@ 2004-05-20 18:57             ` Ingo Molnar
  2004-05-20 23:11               ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-05-20 18:57 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Andi Kleen, linux-kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 429 bytes --]


* Andy Isaacson <adi@hexapodia.org> wrote:

> It looks like this will cause problems if the user experiences an oops
> during boot, runs for a few hours, and then has another oops.  Won't
> the second oops fail to break the lock, and end up just deadlocking?

i've attached a new patch that does what Andi suggested too - 
timestamping of the oopses. This way we will zap no sooner than 10 
seconds after the first oops.

	Ingo

[-- Attachment #2: bust-spinlocks-fix-2.6.6-A1 --]
[-- Type: text/plain, Size: 1514 bytes --]

--- linux/kernel/printk.c.orig	
+++ linux/kernel/printk.c	
@@ -55,6 +55,9 @@ EXPORT_SYMBOL(console_printk);
 
 int oops_in_progress;
 
+/* zap spinlocks only once: */
+unsigned long oops_timestamp;
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -472,6 +475,23 @@ static void emit_log_char(char c)
 }
 
 /*
+ * Zap console related locks when oopsing. Only zap at most once
+ * every 10 seconds, to leave time for slow consoles to print a
+ * full oops.
+ */
+static inline void zap_locks(void)
+{
+	if (!time_after(jiffies, oops_timestamp + 10*HZ))
+		return;
+	oops_timestamp = jiffies;
+
+	/* If a crash is occurring, make sure we can't deadlock */
+	spin_lock_init(&logbuf_lock);
+	/* And make sure that we print immediately */
+	init_MUTEX(&console_sem);
+}
+
+/*
  * This is printk.  It can be called from any context.  We want it to work.
  * 
  * We try to grab the console_sem.  If we succeed, it's easy - we log the output and
@@ -493,12 +513,8 @@ asmlinkage int printk(const char *fmt, .
 	static char printk_buf[1024];
 	static int log_level_unknown = 1;
 
-	if (oops_in_progress) {
-		/* If a crash is occurring, make sure we can't deadlock */
-		spin_lock_init(&logbuf_lock);
-		/* And make sure that we print immediately */
-		init_MUTEX(&console_sem);
-	}
+	if (unlikely(oops_in_progress))
+		zap_locks();
 
 	/* This stops the holder of console_sem just where we want him */
 	spin_lock_irqsave(&logbuf_lock, flags);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-20 15:53         ` Ingo Molnar
@ 2004-05-20 16:19           ` Andy Isaacson
  2004-05-20 18:57             ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Andy Isaacson @ 2004-05-20 16:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, linux-kernel, Andrew Morton

On Thu, May 20, 2004 at 05:53:23PM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > another solution would be to break the lock only once during the
> > kernel's lifetime. The system is messed up anyway if it needs multiple
> > lock breaks to get an oops out to the console. We dont care about
> > followup oopses - the first oops is that matters.
> 
> i.e. something like the attached patch, against BK-curr. (i've also
> attached a cleanup patch that gets rid of the many instances of
> bust_spinlocks() - we now have a generic one in lib/bust_spinlocks.c)
> 
> i consider any secondary lockup after the first oops has been printed a
> feature - sometimes the first oops gets washed away by the many followup
> oopses.

> +unsigned long zap_spinlocks = 1;
[snip]
> -	if (oops_in_progress) {
> +	if (oops_in_progress && test_and_clear_bit(0, &zap_spinlocks)) {

It looks like this will cause problems if the user experiences an oops
during boot, runs for a few hours, and then has another oops.  Won't the
second oops fail to break the lock, and end up just deadlocking?

Yeah, of course you shouldn't blithely ignore the first oops.  But I bet
it happens.

Perhaps zap_spinlocks could be reset after 10 minutes without an oops?

-andy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-20 15:19       ` Ingo Molnar
@ 2004-05-20 15:53         ` Ingo Molnar
  2004-05-20 16:19           ` Andy Isaacson
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-05-20 15:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 787 bytes --]


* Ingo Molnar <mingo@elte.hu> wrote:

> another solution would be to break the lock only once during the
> kernel's lifetime. The system is messed up anyway if it needs multiple
> lock breaks to get an oops out to the console. We dont care about
> followup oopses - the first oops is that matters.

i.e. something like the attached patch, against BK-curr. (i've also
attached a cleanup patch that gets rid of the many instances of
bust_spinlocks() - we now have a generic one in lib/bust_spinlocks.c)

i consider any secondary lockup after the first oops has been printed a
feature - sometimes the first oops gets washed away by the many followup
oopses.

i've tested the patch with parallel SMP oopses - they seem to be
serialized now. (but it's hard to time the oopses right.)

	Ingo

[-- Attachment #2: bust-spinlocks-fix-2.6.6-A0 --]
[-- Type: text/plain, Size: 695 bytes --]

--- linux/kernel/printk.c.orig	
+++ linux/kernel/printk.c	
@@ -55,6 +55,9 @@ EXPORT_SYMBOL(console_printk);
 
 int oops_in_progress;
 
+/* zap spinlocks only once: */
+unsigned long zap_spinlocks = 1;
+
 /*
  * console_sem protects the console_drivers list, and also
  * provides serialisation for access to the entire console
@@ -493,7 +496,7 @@ asmlinkage int printk(const char *fmt, .
 	static char printk_buf[1024];
 	static int log_level_unknown = 1;
 
-	if (oops_in_progress) {
+	if (oops_in_progress && test_and_clear_bit(0, &zap_spinlocks)) {
 		/* If a crash is occurring, make sure we can't deadlock */
 		spin_lock_init(&logbuf_lock);
 		/* And make sure that we print immediately */

[-- Attachment #3: bust-spinlocks-cleanup-2.6.6-A0 --]
[-- Type: text/plain, Size: 4594 bytes --]

--- linux/arch/i386/mm/fault.c.orig	
+++ linux/arch/i386/mm/fault.c	
@@ -31,32 +31,6 @@
 extern void die(const char *,struct pt_regs *,long);
 
 /*
- * Unlock any spinlocks which will prevent us from getting the
- * message out 
- */
-void bust_spinlocks(int yes)
-{
-	int loglevel_save = console_loglevel;
-
-	if (yes) {
-		oops_in_progress = 1;
-		return;
-	}
-#ifdef CONFIG_VT
-	unblank_screen();
-#endif
-	oops_in_progress = 0;
-	/*
-	 * OK, the message is on the console.  Now we call printk()
-	 * without oops_in_progress set so that printk will give klogd
-	 * a poke.  Hold onto your hats...
-	 */
-	console_loglevel = 15;		/* NMI oopser may have shut the console up */
-	printk(" ");
-	console_loglevel = loglevel_save;
-}
-
-/*
  * Return EIP plus the CS segment base.  The segment limit is also
  * adjusted, clamped to the kernel/user address space (whichever is
  * appropriate), and returned in *eip_limit.
--- linux/arch/ia64/kernel/traps.c.orig	
+++ linux/arch/ia64/kernel/traps.c	
@@ -58,34 +58,6 @@ trap_init (void)
 		fpswa_interface = __va(ia64_boot_param->fpswa);
 }
 
-/*
- * Unlock any spinlocks which will prevent us from getting the message out (timerlist_lock
- * is acquired through the console unblank code)
- */
-void
-bust_spinlocks (int yes)
-{
-	int loglevel_save = console_loglevel;
-
-	if (yes) {
-		oops_in_progress = 1;
-		return;
-	}
-
-#ifdef CONFIG_VT
-	unblank_screen();
-#endif
-	oops_in_progress = 0;
-	/*
-	 * OK, the message is on the console.  Now we call printk() without
-	 * oops_in_progress set so that printk will give klogd a poke.  Hold onto
-	 * your hats...
-	 */
-	console_loglevel = 15;		/* NMI oopser may have shut the console up */
-	printk(" ");
-	console_loglevel = loglevel_save;
-}
-
 void
 die (const char *str, struct pt_regs *regs, long err)
 {
--- linux/arch/x86_64/mm/fault.c.orig	
+++ linux/arch/x86_64/mm/fault.c	
@@ -34,27 +34,6 @@
 #include <asm/kdebug.h>
 #include <asm-generic/sections.h>
 
-void bust_spinlocks(int yes)
-{
-	int loglevel_save = console_loglevel;
-	if (yes) {
-		oops_in_progress = 1;
-	} else {
-#ifdef CONFIG_VT
-		unblank_screen();
-#endif
-		oops_in_progress = 0;
-		/*
-		 * OK, the message is on the console.  Now we call printk()
-		 * without oops_in_progress set so that printk will give klogd
-		 * a poke.  Hold onto your hats...
-		 */
-		console_loglevel = 15;		/* NMI oopser may have shut the console up */
-		printk(" ");
-		console_loglevel = loglevel_save;
-	}
-}
-
 /* Sometimes the CPU reports invalid exceptions on prefetch.
    Check that here and ignore.
    Opcode checker based on code by Richard Brunner */
--- linux/arch/s390/mm/fault.c.orig	
+++ linux/arch/s390/mm/fault.c	
@@ -49,32 +49,6 @@ extern int sysctl_userprocess_debug;
 
 extern void die(const char *,struct pt_regs *,long);
 
-extern spinlock_t timerlist_lock;
-
-/*
- * Unlock any spinlocks which will prevent us from getting the
- * message out (timerlist_lock is acquired through the
- * console unblank code)
- */
-void bust_spinlocks(int yes)
-{
-	if (yes) {
-		oops_in_progress = 1;
-	} else {
-		int loglevel_save = console_loglevel;
-		oops_in_progress = 0;
-		console_unblank();
-		/*
-		 * OK, the message is on the console.  Now we call printk()
-		 * without oops_in_progress set so that printk will give klogd
-		 * a poke.  Hold onto your hats...
-		 */
-		console_loglevel = 15;
-		printk(" ");
-		console_loglevel = loglevel_save;
-	}
-}
-
 /*
  * Check which address space is addressed by the access
  * register in S390_lowcore.exc_access_id.
--- linux/lib/bust_spinlocks.c.orig	
+++ linux/lib/bust_spinlocks.c	
@@ -17,23 +17,24 @@
 
 void bust_spinlocks(int yes)
 {
+	int loglevel_save = console_loglevel;
+
 	if (yes) {
 		oops_in_progress = 1;
-	} else {
-		int loglevel_save = console_loglevel;
+		return;
+	}
 #ifdef CONFIG_VT
-		unblank_screen();
+	unblank_screen();
 #endif
-		oops_in_progress = 0;
-		/*
-		 * OK, the message is on the console.  Now we call printk()
-		 * without oops_in_progress set so that printk() will give klogd
-		 * and the blanked console a poke.  Hold onto your hats...
-		 */
-		console_loglevel = 15;		/* NMI oopser may have shut the console up */
-		printk(" ");
-		console_loglevel = loglevel_save;
-	}
+	oops_in_progress = 0;
+	/*
+	 * OK, the message is on the console.  Now we call printk()
+	 * without oops_in_progress set so that printk() will give klogd
+	 * and the blanked console a poke.  Hold onto your hats...
+	 */
+	console_loglevel = 15;	/* NMI oopser may have shut the console up */
+	printk(" ");
+	console_loglevel = loglevel_save;
 }
 
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-20 14:53     ` Andi Kleen
@ 2004-05-20 15:19       ` Ingo Molnar
  2004-05-20 15:53         ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-05-20 15:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


* Andi Kleen <ak@muc.de> wrote:

> One alternative way would be to use locks with timeouts for these two
> locks (e.g. checking the TSC on x86, since the timer interrupt may not
> be running anymore) and only break the lock when the wait time is too
> long.
> 
> Of course serial lines can be quite slow so even that may not help
> always (for unknown reasons far too many people use 9600 baud for
> their serial line)

another solution would be to break the lock only once during the
kernel's lifetime. The system is messed up anyway if it needs multiple
lock breaks to get an oops out to the console. We dont care about
followup oopses - the first oops is that matters.

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
       [not found]   ` <1XWpp-zy-9@gated-at.bofh.it>
@ 2004-05-20 14:53     ` Andi Kleen
  2004-05-20 15:19       ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2004-05-20 14:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> * Ricky Beam <jfbeam@bluetronic.net> wrote:
>
>> It looks like somewhere in the path of release_console_sem() more than
>> one CPU is running the log. [...]
>
> the problem is this code in printk:
>
>         if (oops_in_progress) {
>                 /* If a crash is occurring, make sure we can't deadlock */
>                 spin_lock_init(&logbuf_lock);
>                 /* And make sure that we print immediately */
>                 init_MUTEX(&console_sem);
>         }
>
> so two crashes on two separate CPUs can go on in parallel. The problem
> is not constrained to the serial line - i've seen it on VGA too (albeit
> there it's much more rare).

One alternative way would be to use locks with timeouts for these two
locks (e.g. checking the TSC on x86, since the timer interrupt 
may not be running anymore) and only break the lock when the 
wait time is too long.

Of course serial lines can be quite slow so even that may not help
always (for unknown reasons far too many people use 9600 baud for 
their serial line) 

-Andi



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-19 16:03 ` Ricky Beam
@ 2004-05-20 14:03   ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2004-05-20 14:03 UTC (permalink / raw)
  To: Ricky Beam; +Cc: Linux Kernel Mail List


* Ricky Beam <jfbeam@bluetronic.net> wrote:

> It looks like somewhere in the path of release_console_sem() more than
> one CPU is running the log. [...]

the problem is this code in printk:

        if (oops_in_progress) {
                /* If a crash is occurring, make sure we can't deadlock */
                spin_lock_init(&logbuf_lock);
                /* And make sure that we print immediately */
                init_MUTEX(&console_sem);
        }

so two crashes on two separate CPUs can go on in parallel. The problem
is not constrained to the serial line - i've seen it on VGA too (albeit
there it's much more rare).

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
  2004-05-19 15:55 Tuukka Toivonen
@ 2004-05-19 16:03 ` Ricky Beam
  2004-05-20 14:03   ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Ricky Beam @ 2004-05-19 16:03 UTC (permalink / raw)
  To: Linux Kernel Mail List

On Wed, 19 May 2004, Tuukka Toivonen wrote:
>Do you get CRC errors during transfer (would definitely point to serial
>driver then).
...

The serial driver is not at fault.  It's processing exactly as it is asked
to.  The kernel's console logic is causing the problem. (in fact, there
are specific checks to make sure there aren't two CPUs attempting to
write to the serial registers at the same time.)

It looks like somewhere in the path of release_console_sem() more than
one CPU is running the log.  The "printk_cpu" patch might fix this.  I'll
konw in a day (the box crashs every night... the jack***'s turn the AC
off after 6pm everyday.)

--Ricky



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlaping printk
@ 2004-05-19 15:55 Tuukka Toivonen
  2004-05-19 16:03 ` Ricky Beam
  0 siblings, 1 reply; 15+ messages in thread
From: Tuukka Toivonen @ 2004-05-19 15:55 UTC (permalink / raw)
  To: Ricky Beam; +Cc: linux-kernel

Try transferring large files via minicom/zmodem (lrzsz).
Do you get CRC errors during transfer (would definitely point to serial
driver then).

Similar problems happen to me on my laptop when using 2.2.x with X server
or 2.4.x with or without. 2.2.x without X is fine, haven't tested 2.6.x
yet. (by the way, no problem with serial mouse, just file transfers which
do not proceed since almost all frames have some errors).

I've also ran serial console with 2.6.x on SMP machine with the laptop as
console (but running 2.4.x). Sometimes the output gets slightly garbled,
could be a problem with flow control too. I can do more experiments if
anyone is interested.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2004-05-21 23:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-19 15:19 overlaping printk Ricky Beam
2004-05-19 15:26 ` Tigran Aivazian
2004-05-19 15:31   ` Ricky Beam
2004-05-19 15:39 ` Randy.Dunlap
2004-05-20  7:21 ` Andrew Morton
2004-05-19 15:55 Tuukka Toivonen
2004-05-19 16:03 ` Ricky Beam
2004-05-20 14:03   ` Ingo Molnar
     [not found] <1XBEP-Mc-49@gated-at.bofh.it>
     [not found] ` <1XBXw-13D-3@gated-at.bofh.it>
     [not found]   ` <1XWpp-zy-9@gated-at.bofh.it>
2004-05-20 14:53     ` Andi Kleen
2004-05-20 15:19       ` Ingo Molnar
2004-05-20 15:53         ` Ingo Molnar
2004-05-20 16:19           ` Andy Isaacson
2004-05-20 18:57             ` Ingo Molnar
2004-05-20 23:11               ` Andrew Morton
2004-05-21  7:00                 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).