LKML Archive on lore.kernel.org
* 2.6.20 BUG: soft lockup detected on CPU#0!
@ 2007-02-07 23:02 Lukasz Trabinski
  2007-02-08  4:47 ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Lukasz Trabinski @ 2007-02-07 23:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Bartlomiej Solarz-Niesluchowski

Hello

On 2.6.19 I had about 60 days uptime, on 2.6.20 2 days :(



oceanic:~$ uname -a
Linux oceanic.wsisiz.edu.pl 2.6.20-oceanic #2 SMP Sun Feb 4 21:55:29 CET 2007 x86_64 x86_64 x86_64 GNU/Linux

Feb  7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#0! 
Feb  7 22:46:00 oceanic kernel: 
Feb  7 22:46:00 oceanic kernel: Call Trace: 
Feb  7 22:46:00 oceanic kernel:  <IRQ>  [<ffffffff80250550>] softlockup_tick+0xdb/0xed 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022ea87>] __do_softirq+0x55/0xc4 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023256c>] update_process_times+0x42/0x68 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70 
Feb  7 22:46:00 oceanic kernel:  <EOI>  [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023dddd>] hrtimer_try_to_cancel+0x4a/0x53 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022cc66>] do_exit+0x1c8/0x800 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022d31d>] sys_exit_group+0x0/0xe 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020995c>] tracesys+0xdc/0xe1 
Feb  7 22:46:00 oceanic kernel: 
Feb  7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#1! 
Feb  7 22:46:00 oceanic kernel: 
Feb  7 22:46:00 oceanic kernel: Call Trace: 
Feb  7 22:46:00 oceanic kernel:  <IRQ>  [<ffffffff80250550>] softlockup_tick+0xdb/0xed 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023256c>] update_process_times+0x42/0x68 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff802338c8>] __group_send_sig_info+0x35/0x8b 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022e0f1>] it_real_fn+0x0/0x4f 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff802164da>] flat_send_IPI_mask+0x0/0x3d 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023dc52>] hrtimer_run_queues+0x105/0x164 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff80231d0f>] run_timer_softirq+0x21/0x19f 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022eb49>] tasklet_action+0x53/0x9d 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022ea87>] __do_softirq+0x55/0xc4 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a8fc>] call_softirq+0x1c/0x28 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020c3b1>] do_softirq+0x2c/0x7d 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8021431b>] smp_apic_timer_interrupt+0x56/0x68 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff80208a55>] default_idle+0x0/0x3d 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70 
Feb  7 22:46:00 oceanic kernel:  <EOI>  [<ffffffff80208a7e>] default_idle+0x29/0x3d 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff80208ae4>] cpu_idle+0x52/0x71 
Feb  7 22:46:00 oceanic kernel:  [<ffffffff8057fbbc>] start_secondary+0x465/0x474

-- 
ŁT
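
Each frame in the traces above has the form `[<address>] symbol+0xoffset/0xsize`: the address sits `offset` bytes into a function that is `size` bytes long. As a reading aid, here is a minimal user-space sketch that decodes one such log line. The struct and function names (`trace_frame`, `parse_trace_frame`) are made up for illustration and are not part of any kernel tool.

```c
#include <stdio.h>
#include <string.h>

/* One decoded frame from a "Call Trace:" line (hypothetical helper type). */
struct trace_frame {
    unsigned long long addr;   /* address printed inside [<...>] */
    char symbol[64];           /* function name */
    unsigned int offset;       /* bytes from the start of the function */
    unsigned int size;         /* total size of the function in bytes */
};

/*
 * Parse the "[<addr>] symbol+0xoff/0xsize" part of a log line.
 * Returns 1 on success, 0 if the line holds no frame.
 */
int parse_trace_frame(const char *line, struct trace_frame *out)
{
    /* Skip the syslog prefix and any <IRQ>/<EOI> marker. */
    const char *p = strstr(line, "[<");

    if (p == NULL)
        return 0;
    if (sscanf(p, "[<%llx>] %63[^+]+0x%x/0x%x",
               &out->addr, out->symbol, &out->offset, &out->size) != 4)
        return 0;
    return 1;
}
```

Fed the first frame of the CPU#0 trace, this should yield the symbol `softlockup_tick` with offset 0xdb and size 0xed; header lines such as "Call Trace:" are rejected.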



* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
  2007-02-07 23:02 2.6.20 BUG: soft lockup detected on CPU#0! Lukasz Trabinski
@ 2007-02-08  4:47 ` Andrew Morton
  2007-02-08  8:06   ` Ingo Molnar
  2007-02-08 15:51   ` Lukasz Trabinski
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08  4:47 UTC (permalink / raw)
  To: Lukasz Trabinski
  Cc: linux-kernel, Bartlomiej Solarz-Niesluchowski, Ingo Molnar,
	Thomas Gleixner

On Thu, 8 Feb 2007 00:02:10 +0100 (CET) Lukasz Trabinski <lukasz@wsisiz.edu.pl> wrote:

> Hello
> 
> On 2.6.19 I had about 60 days uptime, on 2.6.20 2 days :(
> 

Did the machine actually fail?  Or did it just print these messages and
keep going?


> 
> oceanic:~$ uname -a
> Linux oceanic.wsisiz.edu.pl 2.6.20-oceanic #2 SMP Sun Feb 4 21:55:29 CET 
> 2007 x86_64 x86_64 x86_64 GNU/Linux
> 
> Feb  7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#0! 
> Feb  7 22:46:00 oceanic kernel: 
> Feb  7 22:46:00 oceanic kernel: Call Trace: 
> Feb  7 22:46:00 oceanic kernel:  <IRQ>  [<ffffffff80250550>] softlockup_tick+0xdb/0xed 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022ea87>] __do_softirq+0x55/0xc4 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023256c>] update_process_times+0x42/0x68 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70 
> Feb  7 22:46:00 oceanic kernel:  <EOI>  [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023dddd>] hrtimer_try_to_cancel+0x4a/0x53 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022cc66>] do_exit+0x1c8/0x800 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022d31d>] sys_exit_group+0x0/0xe 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020995c>] tracesys+0xdc/0xe1 
> Feb  7 22:46:00 oceanic kernel: 
> Feb  7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#1! 
> Feb  7 22:46:00 oceanic kernel: 
> Feb  7 22:46:00 oceanic kernel: Call Trace: 
> Feb  7 22:46:00 oceanic kernel:  <IRQ>  [<ffffffff80250550>] softlockup_tick+0xdb/0xed 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023256c>] update_process_times+0x42/0x68 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff802338c8>] __group_send_sig_info+0x35/0x8b 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022e0f1>] it_real_fn+0x0/0x4f 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff802164da>] flat_send_IPI_mask+0x0/0x3d 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8023dc52>] hrtimer_run_queues+0x105/0x164 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff80231d0f>] run_timer_softirq+0x21/0x19f 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022eb49>] tasklet_action+0x53/0x9d 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8022ea87>] __do_softirq+0x55/0xc4 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a8fc>] call_softirq+0x1c/0x28 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020c3b1>] do_softirq+0x2c/0x7d 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8021431b>] smp_apic_timer_interrupt+0x56/0x68 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff80208a55>] default_idle+0x0/0x3d 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70 
> Feb  7 22:46:00 oceanic kernel:  <EOI>  [<ffffffff80208a7e>] default_idle+0x29/0x3d 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff80208ae4>] cpu_idle+0x52/0x71 
> Feb  7 22:46:00 oceanic kernel:  [<ffffffff8057fbbc>] start_secondary+0x465/0x474
> 

The softlock detector has a long history of false positives and precious
few true positives, in my experience.



* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
  2007-02-08  4:47 ` Andrew Morton
@ 2007-02-08  8:06   ` Ingo Molnar
  2007-02-08  8:15     ` Andrew Morton
  2007-02-08  9:16     ` Eric Dumazet
  2007-02-08 15:51   ` Lukasz Trabinski
  1 sibling, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2007-02-08  8:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lukasz Trabinski, linux-kernel, Bartlomiej Solarz-Niesluchowski,
	Thomas Gleixner


* Andrew Morton <akpm@linux-foundation.org> wrote:

> The softlock detector has a long history of false positives and 
> precious few true positives, in my experience.

hm, not so the latest & lamest in my experience. The commit that made it 
quite robust was 6687a97d4041f996f725902d2990e5de6ef5cbe5, as of March 
2006, and first showed up in 2.6.17. (OTOH, since the merge of lockdep 
the main source of soft lockups in the field has been quite severely 
reduced. Nevertheless it's still good to have it around, occasionally 
there happen other types of soft lockups too, in open-coded loops, etc.)

	Ingo
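
For readers unfamiliar with the detector discussed here, the general idea can be sketched in user space: a high-priority watchdog thread periodically refreshes a per-CPU timestamp, and the timer tick complains when that timestamp has gone stale, meaning the thread has not been scheduled for too long. The sketch below is an illustration only, not the kernel's code; the names and the 10-second threshold are assumptions made for this example.

```c
#include <stdbool.h>

/* Assumed threshold for this sketch; the real value depends on the kernel. */
#define LOCKUP_THRESHOLD_SECS 10

/* Per-CPU state (hypothetical layout). */
struct cpu_watchdog {
    unsigned long touch_timestamp;  /* last time the watchdog thread ran, in seconds */
    bool reported;                  /* report each stall only once */
};

/* Called by the simulated watchdog thread whenever it gets to run. */
void watchdog_touch(struct cpu_watchdog *wd, unsigned long now)
{
    wd->touch_timestamp = now;
    wd->reported = false;
}

/*
 * Called from the simulated timer tick.
 * Returns true when a new stall should be reported.
 */
bool watchdog_tick(struct cpu_watchdog *wd, unsigned long now)
{
    if (wd->reported)
        return false;                       /* already complained about this stall */
    if (now - wd->touch_timestamp <= LOCKUP_THRESHOLD_SECS)
        return false;                       /* thread ran recently enough */
    wd->reported = true;
    return true;
}
```

In this model a false positive is any case where the timestamp goes stale for a reason other than a stuck CPU, which is the kind of failure the two messages above disagree about.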


* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
  2007-02-08  8:06   ` Ingo Molnar
@ 2007-02-08  8:15     ` Andrew Morton
  2007-02-08  9:16     ` Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08  8:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lukasz Trabinski, linux-kernel, Bartlomiej Solarz-Niesluchowski,
	Thomas Gleixner

On Thu, 8 Feb 2007 09:06:44 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > The softlock detector has a long history of false positives and 
> > precious few true positives, in my experience.
> 
> hm, not so the latest & lamest in my experience. The commit that made it 
> quite robust was 6687a97d4041f996f725902d2990e5de6ef5cbe5, as of March 
> 2006, and first showed up in 2.6.17. (OTOH, since the merge of lockdep 
> the main source of soft lockups in the field has been quite severely 
> reduced. Nevertheless it's still good to have it around, occasionally 
> there happen other types of soft lockups too, in open-coded loops, etc.)
> 

So...  what caused Lukasz's lockup?


* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
  2007-02-08  8:06   ` Ingo Molnar
  2007-02-08  8:15     ` Andrew Morton
@ 2007-02-08  9:16     ` Eric Dumazet
  2007-02-08  9:56       ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2007-02-08  9:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Lukasz Trabinski, linux-kernel,
	Bartlomiej Solarz-Niesluchowski, Thomas Gleixner

On Thursday 08 February 2007 09:06, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> > The softlock detector has a long history of false positives and
> > precious few true positives, in my experience.
>
> hm, not so the latest & lamest in my experience. The commit that made it
> quite robust was 6687a97d4041f996f725902d2990e5de6ef5cbe5, as of March
> 2006, and first showed up in 2.6.17. (OTOH, since the merge of lockdep
> the main source of soft lockups in the field has been quite severely
> reduced. Nevertheless it's still good to have it around, occasionally
> there happen other types of soft lockups too, in open-coded loops, etc.)

This reminds me of the current problem in the close_files() code, where we
trigger soft lockups quite regularly.

Is there any chance/interest we can solve the issue Andrew had with this 
patch?

http://lkml.org/lkml/2006/5/2/273

Thank you
Eric




* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
  2007-02-08  9:16     ` Eric Dumazet
@ 2007-02-08  9:56       ` Ingo Molnar
  2007-02-08 10:12         ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2007-02-08  9:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Lukasz Trabinski, linux-kernel,
	Bartlomiej Solarz-Niesluchowski, Thomas Gleixner


* Eric Dumazet <dada1@cosmosbay.com> wrote:

> This reminds me of the current problem in the close_files() code, where we
> trigger soft lockups quite regularly.
> 
> Is there any chance/interest we can solve the issue Andrew had with 
> this patch?
> 
> http://lkml.org/lkml/2006/5/2/273

yes - the -rt patch has included the patch below for more than 2 years. 
(note that this one is even more fine-grained)

	Ingo

Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -431,8 +433,10 @@ static void close_files(struct files_str
 		while (set) {
 			if (set & 1) {
 				struct file * file = xchg(&fdt->fd[i], NULL);
-				if (file)
+				if (file) {
 					filp_close(file, files);
+					cond_resched();
+				}
 			}
 			i++;
 			set >>= 1;
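
The effect of the hunk above can be tried out in user space. The following is a rough analog, assuming a bitmap of open descriptors and a caller-supplied close callback; `close_all`, `close_fn` and `record_fd` are hypothetical names, and `sched_yield()` only stands in for `cond_resched()` (it yields unconditionally, whereas `cond_resched()` reschedules only when needed).

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback standing in for filp_close(). */
typedef void (*close_fn)(int fd, void *ctx);

/* Example callback: remembers the last descriptor it was asked to close. */
void record_fd(int fd, void *ctx)
{
    int *last = ctx;
    *last = fd;
}

/*
 * Walk a bitmap of open descriptors one word at a time, close every set
 * bit, and yield the CPU after each close so that a very large table
 * cannot monopolize the processor. Returns the number closed.
 */
size_t close_all(unsigned long *open_bits, size_t nwords,
                 close_fn do_close, void *ctx)
{
    const size_t bits_per_word = sizeof(unsigned long) * 8;
    size_t closed = 0;

    for (size_t w = 0; w < nwords; w++) {
        unsigned long set = open_bits[w];
        int fd = (int)(w * bits_per_word);

        open_bits[w] = 0;                /* mark the whole word as closed */
        while (set) {
            if (set & 1) {
                do_close(fd, ctx);
                closed++;
                sched_yield();           /* user-space stand-in for cond_resched() */
            }
            fd++;
            set >>= 1;
        }
    }
    return closed;
}
```

Yielding after every close is the finer-grained placement the message describes; a coarser variant would yield once per bitmap word instead.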


* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
  2007-02-08  9:56       ` Ingo Molnar
@ 2007-02-08 10:12         ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08 10:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Lukasz Trabinski, linux-kernel,
	Bartlomiej Solarz-Niesluchowski, Thomas Gleixner

On Thu, 8 Feb 2007 10:56:12 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
> > This reminds me of the current problem in the close_files() code, where we
> > trigger soft lockups quite regularly.
> > 
> > Is there any chance/interest we can solve the issue Andrew had with 
> > this patch?
> > 
> > http://lkml.org/lkml/2006/5/2/273
> 
> yes - the -rt patch has included the patch below for more than 2 years. 
> (note that this one is even more fine-grained)
> 
> 	Ingo
> 
> Index: linux/kernel/exit.c
> ===================================================================
> --- linux.orig/kernel/exit.c
> +++ linux/kernel/exit.c
> @@ -431,8 +433,10 @@ static void close_files(struct files_str
>  		while (set) {
>  			if (set & 1) {
>  				struct file * file = xchg(&fdt->fd[i], NULL);
> -				if (file)
> +				if (file) {
>  					filp_close(file, files);
> +					cond_resched();
> +				}
>  			}
>  			i++;
>  			set >>= 1;

That doesn't hang like the other patch did on 2.6.17-rc3.

Very mysterious.


* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
  2007-02-08  4:47 ` Andrew Morton
  2007-02-08  8:06   ` Ingo Molnar
@ 2007-02-08 15:51   ` Lukasz Trabinski
  1 sibling, 0 replies; 8+ messages in thread
From: Lukasz Trabinski @ 2007-02-08 15:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Bartlomiej Solarz-Niesluchowski, Ingo Molnar,
	Thomas Gleixner

On Wed, 7 Feb 2007, Andrew Morton wrote:

> On Thu, 8 Feb 2007 00:02:10 +0100 (CET) Lukasz Trabinski <lukasz@wsisiz.edu.pl> wrote:
>
>> Hello
>>
>> On 2.6.19 I had about 60 days uptime, on 2.6.20 2 days :(
>>
>
> Did the machine actually fail?  Or did it just print these messages and
> keep going?

There was a message from the tg3 network driver, but I don't remember it. :(
The machine was totally locked up; only a ctrl-sysrq reboot worked...

-- 
ŁT



end of thread, other threads:[~2007-02-08 15:51 UTC | newest]

Thread overview: 8+ messages
2007-02-07 23:02 2.6.20 BUG: soft lockup detected on CPU#0! Lukasz Trabinski
2007-02-08  4:47 ` Andrew Morton
2007-02-08  8:06   ` Ingo Molnar
2007-02-08  8:15     ` Andrew Morton
2007-02-08  9:16     ` Eric Dumazet
2007-02-08  9:56       ` Ingo Molnar
2007-02-08 10:12         ` Andrew Morton
2007-02-08 15:51   ` Lukasz Trabinski
