LKML Archive on lore.kernel.org
* BUG/ spinlock lockup, 2.6.24
From: Denys Fedoryshchenko @ 2008-02-15 15:19 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel
The server crashed (stopped responding over the network). The last line
received over netconsole was:
Feb 15 15:50:17 217.151.X.X [1521315.068984] BUG: spinlock lockup on CPU#1, ksoftirqd/1/7, f0551180
I get random crashes, at least once per week. It is very difficult to catch
the error message, and I only recently set up netconsole. Now I have caught a
crash, but there is no traceback: only the single line quoted above came over
netconsole.
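When collecting such netconsole lines from several machines, it can help to split the report into its fields. The following is only a sketch: it assumes the line layout "BUG: spinlock lockup on CPU#<cpu>, <comm>/<pid>, <lock address>" seen above, and the names `lockup_report` and `parse_lockup` are made up for illustration. The pid is taken after the last '/', because the task name (here `ksoftirqd/1`) can itself contain one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fields of one "BUG: spinlock lockup" line (hypothetical helper type). */
struct lockup_report {
    int cpu;            /* CPU that was spinning on the lock */
    char comm[64];      /* task name; may itself contain '/' */
    int pid;            /* pid of the spinning task */
    unsigned long lock; /* lock address as printed in hex */
};

/* Returns 0 on success, -1 if the line does not look like a lockup report. */
int parse_lockup(const char *line, struct lockup_report *r)
{
    static const char tag[] = "BUG: spinlock lockup on CPU#";
    const char *p;
    char task[64];
    char *slash;

    assert(line != NULL && r != NULL);

    /* Skip syslog/netconsole prefixes by searching for the fixed tag. */
    p = strstr(line, tag);
    if (p == NULL)
        return -1;
    p += sizeof(tag) - 1;

    /* Remaining text is assumed to be "<cpu>, <comm>/<pid>, <lock>". */
    if (sscanf(p, "%d, %63[^,], %lx", &r->cpu, task, &r->lock) != 3)
        return -1;

    /* Split "<comm>/<pid>" at the last slash. */
    slash = strrchr(task, '/');
    if (slash == NULL)
        return -1;
    *slash = '\0';
    r->pid = atoi(slash + 1);
    snprintf(r->comm, sizeof(r->comm), "%s", task);
    return 0;
}
```

Fed the line quoted above, this gives cpu 1, comm `ksoftirqd/1`, pid 7 and lock address f0551180.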
.config file
http://www.nuclearcat.com/files/config_qos
Kernel is 2.6.24 with the epoll patch (from mainline) applied.
cat /proc/version
Linux version 2.6.24-devel (root@visp-1) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #1 SMP Sat Jan 26 17:26:54 EET 2008
visp-1 ~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:      18361      17785      17471      17748   IO-APIC-edge      timer
  1:          2          0          0          0   IO-APIC-edge      i8042
  8:          5          4          3          4   IO-APIC-edge      rtc
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          1          0          1          2   IO-APIC-edge      i8042
 14:         14         17         17         15   IO-APIC-edge      libata
 15:          0          0          0          0   IO-APIC-edge      libata
 17:        269        259        256        259   IO-APIC-fasteoi   ioc0
 18:          5          5          6          7   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb4
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 66:          1          0          0          0   none-<NULL>
212:         27         32         35         32   PCI-MSI-edge      eth1
213:      36818      36995      37307      37029   PCI-MSI-edge      eth0
214:          0          1          1          1   PCI-MSI-edge
NMI:      71107      70983      70962      70962   Non-maskable interrupts
LOC:      53005      53178      53490      53214   Local timer interrupts
RES:        414        434        363        378   Rescheduling interrupts
CAL:         52         46         56         47   function call interrupts
TLB:        398        288        403        264   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0
visp-1 ~ # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 4
cpu MHz : 3192.163
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips : 6390.17
clflush size : 64
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 4
cpu MHz : 3192.163
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips : 6383.72
clflush size : 64
processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 4
cpu MHz : 3192.163
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips : 6383.75
clflush size : 64
processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 4
cpu MHz : 3192.163
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
bogomips : 6383.76
clflush size : 64
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
* Re: BUG/ spinlock lockup, 2.6.24
From: Bart Van Assche @ 2008-02-15 15:24 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: netdev, linux-kernel
2008/2/15 Denys Fedoryshchenko <denys@visp.net.lb>:
> I have random crashes, at least once per week. It is very difficult to catch
> error message, and only recently i setup netconsole. Now i got crash, but
> there is no traceback and only single line came over netconsole, mentioned
> before.
Did you already run memtest? You can run memtest by booting from the
Knoppix CD-ROM or DVD. Most Linux distributions also include memtest on
their bootable CDs/DVDs.
Bart Van Assche.
* Re: BUG/ spinlock lockup, 2.6.24
From: Denys Fedoryshchenko @ 2008-02-15 19:42 UTC (permalink / raw)
To: Bart Van Assche; +Cc: netdev, linux-kernel
This server worked fine under load with FreeBSD, and worked fine before with
other tasks under Linux. I don't think it is the RAM.
Additionally, it is server hardware (Dell PowerEdge) with ECC, MCE and other
layers that would most probably report any hardware issue, I think even
better than memtest.
It is also very difficult to run a test on it, because it is in another
country and I have limited access to it (I don't have a network KVM).
I have similar crashes on completely different hardware doing the same job
(QoS), so I think it is actually some nasty bug in networking.
On Fri, 15 Feb 2008 16:24:56 +0100, Bart Van Assche wrote
> 2008/2/15 Denys Fedoryshchenko <denys@visp.net.lb>:
> > I have random crashes, at least once per week. It is very difficult to catch
> > error message, and only recently i setup netconsole. Now i got crash, but
> > there is no traceback and only single line came over netconsole, mentioned
> > before.
>
> Did you already run memtest ? You can run memtest by booting from the
> Knoppix CD-ROM or DVD. Most Linux distributions also have included
> memtest on their bootable distribution CD's/DVD's.
>
> Bart Van Assche.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
* Re: BUG/ spinlock lockup, 2.6.24
From: Jarek Poplawski @ 2008-02-15 20:21 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Bart Van Assche, netdev, linux-kernel
Denys Fedoryshchenko wrote, On 02/15/2008 08:42 PM:
...
> I have similar crashes on completely different hardware with same job (QOS),
> so i think it is actually some nasty bug in networking.
Maybe you could try some other debugging options? E.g. since lockdep doesn't
help, turn it off and try some of the others instead, like these:
> # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
> # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
> # CONFIG_DEBUG_KOBJECT is not set
> # CONFIG_DEBUG_HIGHMEM is not set
> # CONFIG_DEBUG_VM is not set
> # CONFIG_DEBUG_LIST is not set
> # CONFIG_DEBUG_SG is not set
> # CONFIG_BOOT_PRINTK_DELAY is not set
> # CONFIG_DEBUG_STACKOVERFLOW is not set
> # CONFIG_DEBUG_STACK_USAGE is not set
> # CONFIG_DEBUG_RODATA is not set
Regards,
Jarek P.
* Re: BUG/ spinlock lockup, 2.6.24
From: Jarek Poplawski @ 2008-02-15 21:03 UTC (permalink / raw)
Cc: Denys Fedoryshchenko, Bart Van Assche, netdev, linux-kernel
Jarek Poplawski wrote, On 02/15/2008 09:21 PM:
> Denys Fedoryshchenko wrote, On 02/15/2008 08:42 PM:
> ...
>
>> I have similar crashes on completely different hardware with same job (QOS),
>> so i think it is actually some nasty bug in networking.
>
> Maybe you could try with some other debugging options? E.g. since lockdep
> doesn't help - turn this off. Instead try some others, like these:
...On the other hand this:
> Feb 15 15:50:17 217.151.X.X [1521315.068984] BUG: spinlock lockup on CPU#1,
> ksoftirqd/1/7, f0551180
seems to point at just a spinlock lockup, so the problem is more about getting
the full report. I wonder if this patch to printk could help here:
author Ingo Molnar <mingo at elte.hu>
Fri, 25 Jan 2008 20:07:58 +0000 (21:07 +0100)
printk: make printk more robust by not allowing recursion
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=32a76006683f7b28ae3cc491da37716e002f198e
Jarek P.
* Re: BUG/ spinlock lockup, 2.6.24
From: Jarek Poplawski @ 2008-02-15 22:49 UTC (permalink / raw)
To: Denys Fedoryshchenko; +Cc: Bart Van Assche, netdev, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 654 bytes --]
Jarek Poplawski wrote, On 02/15/2008 10:03 PM:
...
> ...On the other hand this:
>
>> Feb 15 15:50:17 217.151.X.X [1521315.068984] BUG: spinlock lockup on CPU#1,
>> ksoftirqd/1/7, f0551180
>
> seems to point just at spinlock lockup, so it's more about the full report.
> I wonder if this patch to printk could help here:
>
> author Ingo Molnar <mingo at elte.hu>
> Fri, 25 Jan 2008 20:07:58 +0000 (21:07 +0100)
> printk: make printk more robust by not allowing recursion
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=32a76006683f7b28ae3cc491da37716e002f198e
...or maybe a patch like this attached here?
Jarek P.
[-- Attachment #2: spinlock_debug.diff --]
[-- Type: text/x-diff, Size: 733 bytes --]
diff --git a/lib/spinlock_debug.c b/lib/spinlock_debug.c
index 9c4b025..21c8aaa 100644
--- a/lib/spinlock_debug.c
+++ b/lib/spinlock_debug.c
@@ -111,8 +111,7 @@ static void __spin_lock_debug(spinlock_t *lock)
 			__delay(1);
 		}
 		/* lockup suspected: */
-		if (print_once) {
-			print_once = 0;
+		if (print_once == 1) {
 			printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, "
 					"%s/%d, %p\n",
 				raw_smp_processor_id(), current->comm,
@@ -122,7 +121,14 @@ static void __spin_lock_debug(spinlock_t *lock)
 			trigger_all_cpu_backtrace();
 #endif
 		}
+		if (print_once++ > 1000)
+			goto out;
 	}
+	return;
+out:
+	panic("spinlock lockup panic #%llu\n", i);
+	// or:
+	// BUG();
 }
 
 void _raw_spin_lock(spinlock_t *lock)
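Read outside the kernel, the control flow of the patched loop can be modelled roughly as below. This is a user-space sketch, not kernel code: `fake_trylock`, `spin_lock_debug_model`, `budget` and `max_rounds` are hypothetical names, the trylock is a stub that fails a fixed number of times, and a return value stands in for panic(). It shows that the report is still printed only on the first timed-out round, and that the loop gives up on the round after `max_rounds` (1000 in the patch) instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Outcome of one modelled lock attempt. */
enum spin_result { SPIN_ACQUIRED, SPIN_GAVE_UP };

/* Hypothetical stand-in for a trylock: fails until *budget reaches 0. */
bool fake_trylock(long *budget)
{
    if (*budget <= 0)
        return true;
    (*budget)--;
    return false;
}

/*
 * Model of the patched loop. `loops` is the number of trylock attempts per
 * round, `max_rounds` plays the role of the constant 1000, and `*reports`
 * counts how many times the lockup message would have been printed.
 */
enum spin_result spin_lock_debug_model(long *budget, long loops,
                                       int max_rounds, int *reports)
{
    int print_once = 1;

    assert(budget != NULL && reports != NULL && loops > 0);

    for (;;) {
        for (long i = 0; i < loops; i++) {
            if (fake_trylock(budget))
                return SPIN_ACQUIRED;
        }
        /* lockup suspected: report on the first timed-out round only */
        if (print_once == 1) {
            fprintf(stderr, "BUG: spinlock lockup (model)\n");
            (*reports)++;
        }
        /* the patch jumps to panic() here; the model just returns */
        if (print_once++ > max_rounds)
            return SPIN_GAVE_UP;
    }
}
```

With `max_rounds` set to 3 and a lock that never frees, the model reports once and gives up on the fourth timed-out round. If the lock frees after a couple of rounds, it is acquired normally and the report has still been printed exactly once.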