LKML Archive on lore.kernel.org
* [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
@ 2009-05-03 13:36 Michael Guntsche
  2009-05-05 19:27 ` Andrew Morton
  2009-05-20 21:47 ` Lennert Buytenhek
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Guntsche @ 2009-05-03 13:36 UTC (permalink / raw)
  To: linux-kernel

Hello list,


I recently tried 2.6.30-rc4 on a routerboard currently running 2.6.29  
(it is running stable with this kernel).

This board is used as a gateway and under load I see the following BUG  
with 2.6.30-rc4.

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
Oops: Exception in kernel mode, sig: 5 [#1]
MikroTik RouterBOARD 600 series
Modules linked in: nf_nat_rtsp nf_conntrack_rtsp
NIP: c01abc68 LR: c01abc68 CTR: c015559c
REGS: c7aa7b20 TRAP: 0700   Not tainted  (2.6.30-rc4)
MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002424  XER: 20000000
TASK = c7855bc0[588] 'pptpgw' THREAD: c7aa6000
GPR00: c01abc68 c7aa7bd0 c7855bc0 00000085 0000295e ffffffff c0152b68 00000030
GPR08: c03848d4 c0350000 0000295e c0380398 84002422 10029614 100de49c 100e0000
GPR16: 100b45a0 00000040 c02f6260 c02f628c c7846380 c7aa6000 c7957800 00000000
GPR24: 00000002 0000003e c7a12480 c7957a00 c7846000 000005e6 c7956240 c7a8b880
NIP [c01abc68] skb_over_panic+0x48/0x5c
LR [c01abc68] skb_over_panic+0x48/0x5c
Call Trace:
[c7aa7bd0] [c01abc68] skb_over_panic+0x48/0x5c (unreliable)
[c7aa7be0] [c01ad468] skb_put+0x5c/0x60
[c7aa7bf0] [c0187650] gfar_clean_rx_ring+0x204/0x42c
[c7aa7c40] [c0189074] gfar_poll+0x258/0x338
[c7aa7c90] [c01b8b8c] net_rx_action+0x9c/0x190
[c7aa7cc0] [c002ddac] __do_softirq+0x84/0x100
[c7aa7cf0] [c0006474] do_softirq+0x58/0x5c
[c7aa7d00] [c002dc24] irq_exit+0x94/0x98
[c7aa7d10] [c0006518] do_IRQ+0xa0/0xc4
[c7aa7d30] [c0014448] ret_from_except+0x0/0x14
--- Exception: 501 at ppp_asynctty_receive+0x1d8/0x550
     LR = ppp_asynctty_receive+0x4a4/0x550
[c7aa7df0] [c0197734] ppp_asynctty_receive+0x324/0x550 (unreliable)
[c7aa7e40] [c014def4] pty_write+0x74/0x8c
[c7aa7e50] [c0148f1c] n_tty_write+0x2b0/0x430
[c7aa7eb0] [c0145f34] tty_write+0x188/0x268
[c7aa7ef0] [c00903cc] vfs_write+0xb4/0x1a4
[c7aa7f10] [c009096c] sys_write+0x4c/0x90
[c7aa7f40] [c0013db0] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff2d5c4
     LR = 0x10003a48
Instruction dump:
80a30054 2f800000 80e300a8 810300ac 816300a0 814300a4 419e0020 3c60c030
7d695b78 90010008 3863901c 4be7d091 <0fe00000> 48000000 3d20c02d 380968b8
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[c7aa7970] [c0008274] show_stack+0x4c/0x16c (unreliable)
[c7aa79b0] [c0027d1c] panic+0x90/0x170
[c7aa7a00] [c00118ac] die+0x19c/0x1d4
[c7aa7a20] [c0011b80] _exception+0x138/0x15c
[c7aa7b10] [c00143fc] ret_from_except_full+0x0/0x4c
--- Exception: 700 at skb_over_panic+0x48/0x5c
     LR = skb_over_panic+0x48/0x5c
[c7aa7be0] [c01ad468] skb_put+0x5c/0x60
[c7aa7bf0] [c0187650] gfar_clean_rx_ring+0x204/0x42c
[c7aa7c40] [c0189074] gfar_poll+0x258/0x338
[c7aa7c90] [c01b8b8c] net_rx_action+0x9c/0x190
[c7aa7cc0] [c002ddac] __do_softirq+0x84/0x100
[c7aa7cf0] [c0006474] do_softirq+0x58/0x5c
[c7aa7d00] [c002dc24] irq_exit+0x94/0x98
[c7aa7d10] [c0006518] do_IRQ+0xa0/0xc4
[c7aa7d30] [c0014448] ret_from_except+0x0/0x14
--- Exception: 501 at ppp_asynctty_receive+0x1d8/0x550
     LR = ppp_asynctty_receive+0x4a4/0x550
[c7aa7df0] [c0197734] ppp_asynctty_receive+0x324/0x550 (unreliable)
[c7aa7e40] [c014def4] pty_write+0x74/0x8c
[c7aa7e50] [c0148f1c] n_tty_write+0x2b0/0x430
[c7aa7eb0] [c0145f34] tty_write+0x188/0x268
[c7aa7ef0] [c00903cc] vfs_write+0xb4/0x1a4
[c7aa7f10] [c009096c] sys_write+0x4c/0x90
[c7aa7f40] [c0013db0] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff2d5c4
     LR = 0x10003a48

After that the board reboots after 180 seconds. To reproduce this
problem I just have to stream audio or run some benchmarks on
http://speedtest.net.

Doing the same with 2.6.29 does not exhibit this problem.

Please CC me on any replies since I am not subscribed to the list. If  
you need more information please just tell me.

Kind regards,
Michael 


* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-03 13:36 [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar Michael Guntsche
@ 2009-05-05 19:27 ` Andrew Morton
  2009-05-05 20:29   ` Michael Guntsche
  2009-05-20 21:47 ` Lennert Buytenhek
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2009-05-05 19:27 UTC (permalink / raw)
  To: Michael Guntsche; +Cc: linux-kernel, netdev

(cc netdev)

On Sun, 3 May 2009 15:36:27 +0200
Michael Guntsche <mike@it-loops.com> wrote:

> Hello list,
> 
> 
> I recently tried 2.6.30-rc4 on a routerboard currently running 2.6.29  
> (it is running stable with this kernel).
> 
> This board is used as a gateway and under load I see the following BUG  
> with 2.6.30-rc4.
> 
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:126!

skb_over_panic().  There should have been some additional information
printed before the "cut here" text.  That might be useful in fixing
this regression.


> Oops: Exception in kernel mode, sig: 5 [#1]
> MikroTik RouterBOARD 600 series
> Modules linked in: nf_nat_rtsp nf_conntrack_rtsp
> NIP: c01abc68 LR: c01abc68 CTR: c015559c
> REGS: c7aa7b20 TRAP: 0700   Not tainted  (2.6.30-rc4)
> MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002424  XER: 20000000
> TASK = c7855bc0[588] 'pptpgw' THREAD: c7aa6000
> GPR00: c01abc68 c7aa7bd0 c7855bc0 00000085 0000295e ffffffff c0152b68 00000030
> GPR08: c03848d4 c0350000 0000295e c0380398 84002422 10029614 100de49c 100e0000
> GPR16: 100b45a0 00000040 c02f6260 c02f628c c7846380 c7aa6000 c7957800 00000000
> GPR24: 00000002 0000003e c7a12480 c7957a00 c7846000 000005e6 c7956240 c7a8b880
> NIP [c01abc68] skb_over_panic+0x48/0x5c
> LR [c01abc68] skb_over_panic+0x48/0x5c
> Call Trace:
> [c7aa7bd0] [c01abc68] skb_over_panic+0x48/0x5c (unreliable)
> [c7aa7be0] [c01ad468] skb_put+0x5c/0x60
> [c7aa7bf0] [c0187650] gfar_clean_rx_ring+0x204/0x42c
> [c7aa7c40] [c0189074] gfar_poll+0x258/0x338
> [c7aa7c90] [c01b8b8c] net_rx_action+0x9c/0x190
> [c7aa7cc0] [c002ddac] __do_softirq+0x84/0x100
> [c7aa7cf0] [c0006474] do_softirq+0x58/0x5c
> [c7aa7d00] [c002dc24] irq_exit+0x94/0x98
> [c7aa7d10] [c0006518] do_IRQ+0xa0/0xc4
> [c7aa7d30] [c0014448] ret_from_except+0x0/0x14
> --- Exception: 501 at ppp_asynctty_receive+0x1d8/0x550
>      LR = ppp_asynctty_receive+0x4a4/0x550
> [c7aa7df0] [c0197734] ppp_asynctty_receive+0x324/0x550 (unreliable)
> [c7aa7e40] [c014def4] pty_write+0x74/0x8c
> [c7aa7e50] [c0148f1c] n_tty_write+0x2b0/0x430
> [c7aa7eb0] [c0145f34] tty_write+0x188/0x268
> [c7aa7ef0] [c00903cc] vfs_write+0xb4/0x1a4
> [c7aa7f10] [c009096c] sys_write+0x4c/0x90
> [c7aa7f40] [c0013db0] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0xff2d5c4
>      LR = 0x10003a48
> Instruction dump:
> 80a30054 2f800000 80e300a8 810300ac 816300a0 814300a4 419e0020 3c60c030
> 7d695b78 90010008 3863901c 4be7d091 <0fe00000> 48000000 3d20c02d 380968b8
> Kernel panic - not syncing: Fatal exception in interrupt
> Call Trace:
> [c7aa7970] [c0008274] show_stack+0x4c/0x16c (unreliable)
> [c7aa79b0] [c0027d1c] panic+0x90/0x170
> [c7aa7a00] [c00118ac] die+0x19c/0x1d4
> [c7aa7a20] [c0011b80] _exception+0x138/0x15c
> [c7aa7b10] [c00143fc] ret_from_except_full+0x0/0x4c
> --- Exception: 700 at skb_over_panic+0x48/0x5c
>      LR = skb_over_panic+0x48/0x5c
> [c7aa7be0] [c01ad468] skb_put+0x5c/0x60
> [c7aa7bf0] [c0187650] gfar_clean_rx_ring+0x204/0x42c
> [c7aa7c40] [c0189074] gfar_poll+0x258/0x338
> [c7aa7c90] [c01b8b8c] net_rx_action+0x9c/0x190
> [c7aa7cc0] [c002ddac] __do_softirq+0x84/0x100
> [c7aa7cf0] [c0006474] do_softirq+0x58/0x5c
> [c7aa7d00] [c002dc24] irq_exit+0x94/0x98
> [c7aa7d10] [c0006518] do_IRQ+0xa0/0xc4
> [c7aa7d30] [c0014448] ret_from_except+0x0/0x14
> --- Exception: 501 at ppp_asynctty_receive+0x1d8/0x550
>      LR = ppp_asynctty_receive+0x4a4/0x550
> [c7aa7df0] [c0197734] ppp_asynctty_receive+0x324/0x550 (unreliable)
> [c7aa7e40] [c014def4] pty_write+0x74/0x8c
> [c7aa7e50] [c0148f1c] n_tty_write+0x2b0/0x430
> [c7aa7eb0] [c0145f34] tty_write+0x188/0x268
> [c7aa7ef0] [c00903cc] vfs_write+0xb4/0x1a4
> [c7aa7f10] [c009096c] sys_write+0x4c/0x90
> [c7aa7f40] [c0013db0] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0xff2d5c4
>      LR = 0x10003a48
> 
> After that the board reboots after 180 seconds. To reproduce this
> problem I just have to stream audio or run some benchmarks on
> http://speedtest.net.
> 
> Doing the same with 2.6.29 does not exhibit this problem.
> 



* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-05 19:27 ` Andrew Morton
@ 2009-05-05 20:29   ` Michael Guntsche
  2009-05-12 22:34     ` Michael Guntsche
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Guntsche @ 2009-05-05 20:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, netdev


On May 5, 2009, at 21:27, Andrew Morton wrote:

> skb_over_panic().  There should have been some additional information
> printed before the "cut here" text.  That might be useful in fixing
> this regression.
>

Here is a full dump from the console again including the line before  
the "cut here" text.
Wan is the device going to the ADSL-modem.

[  172.251007] skb_over_panic: text:c0187650 len:1514 put:1514 head:c79a8800 data:c79a8880 tail:0xc79a8e6a end:0xc79a8e60 dev:wan
[  172.262613] ------------[ cut here ]------------
[  172.267250] kernel BUG at net/core/skbuff.c:126!
[  172.271884] Oops: Exception in kernel mode, sig: 5 [#1]
[  172.277121] MikroTik RouterBOARD 600 series
[  172.281312] Modules linked in: nf_nat_rtsp nf_conntrack_rtsp
[  172.287006] NIP: c01abc68 LR: c01abc68 CTR: c015559c
[  172.291986] REGS: c782bdd0 TRAP: 0700   Not tainted  (2.6.30-rc4)
[  172.298094] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002024  XER: 20000000
[  172.304776] TASK = c7820940[3] 'ksoftirqd/0' THREAD: c782a000
[  172.310361] GPR00: c01abc68 c782be80 c7820940 00000085 0000295e ffffffff c0152b68 00000030
[  172.318788] GPR08: c03848d4 c0350000 0000295e c0380398 84002022 efffdbdf 007d66c0 00000008
[  172.327215] GPR16: 00000000 00000040 c02f6260 c02f628c c7846380 c782a000 c7954800 00000000
[  172.335641] GPR24: 00000001 0000003f c7951780 c7954e38 c7846000 000005ea c79e4480 c79a8880
[  172.344263] NIP [c01abc68] skb_over_panic+0x48/0x5c
[  172.349156] LR [c01abc68] skb_over_panic+0x48/0x5c
[  172.353956] Call Trace:
[  172.356412] [c782be80] [c01abc68] skb_over_panic+0x48/0x5c (unreliable)
[  172.363060] [c782be90] [c01ad468] skb_put+0x5c/0x60
[  172.367969] [c782bea0] [c0187650] gfar_clean_rx_ring+0x204/0x42c
[  172.374003] [c782bef0] [c0189074] gfar_poll+0x258/0x338
[  172.379264] [c782bf40] [c01b8b8c] net_rx_action+0x9c/0x190
[  172.384787] [c782bf70] [c002ddac] __do_softirq+0x84/0x100
[  172.390218] [c782bfa0] [c0006474] do_softirq+0x58/0x5c
[  172.395382] [c782bfb0] [c002d934] ksoftirqd+0x60/0x114
[  172.400549] [c782bfd0] [c003f050] kthread+0x4c/0x88
[  172.405450] [c782bff0] [c0013b10] kernel_thread+0x4c/0x68
[  172.410866] Instruction dump:
[  172.413843] 80a30054 2f800000 80e300a8 810300ac 816300a0 814300a4 419e0020 3c60c030
[  172.421656] 7d695b78 90010008 3863901c 4be7d091 <0fe00000> 48000000 3d20c02d 380968b8
[  172.429651] Kernel panic - not syncing: Fatal exception in interrupt
[  172.436019] Call Trace:
[  172.438479] [c782bc20] [c0008274] show_stack+0x4c/0x16c (unreliable)
[  172.444866] [c782bc60] [c0027d1c] panic+0x90/0x170
[  172.449689] [c782bcb0] [c00118ac] die+0x19c/0x1d4
[  172.454417] [c782bcd0] [c0011b80] _exception+0x138/0x15c
[  172.459754] [c782bdc0] [c00143fc] ret_from_except_full+0x0/0x4c
[  172.465703] --- Exception: 700 at skb_over_panic+0x48/0x5c
[  172.465713]     LR = skb_over_panic+0x48/0x5c
[  172.475570] [c782be90] [c01ad468] skb_put+0x5c/0x60
[  172.480471] [c782bea0] [c0187650] gfar_clean_rx_ring+0x204/0x42c
[  172.486505] [c782bef0] [c0189074] gfar_poll+0x258/0x338
[  172.491757] [c782bf40] [c01b8b8c] net_rx_action+0x9c/0x190
[  172.497270] [c782bf70] [c002ddac] __do_softirq+0x84/0x100
[  172.502694] [c782bfa0] [c0006474] do_softirq+0x58/0x5c
[  172.507858] [c782bfb0] [c002d934] ksoftirqd+0x60/0x114
[  172.513021] [c782bfd0] [c003f050] kthread+0x4c/0x88
[  172.517920] [c782bff0] [c0013b10] kernel_thread+0x4c/0x68


Kind regards,
Michael


* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-05 20:29   ` Michael Guntsche
@ 2009-05-12 22:34     ` Michael Guntsche
  2009-05-12 22:48       ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Guntsche @ 2009-05-12 22:34 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, netdev


On May 5, 2009, at 10:29 PM, Michael Guntsche wrote:
>
> Here is a full dump from the console again including the line  
> before the "cut here" text.
> Wan is the device going to the ADSL-modem.
>
> [  172.251007] skb_over_panic: text:c0187650 len:1514 put:1514 head:c79a8800 data:c79a8880 tail:0xc79a8e6a end:0xc79a8e60 dev:wan
>
>

Since I still had the panics with rc5, I went ahead and started
looking for the commit that caused the problem. After some testing
(actually a lot of compile cycles) I found out that commit

0fd56bb5be6455d0d42241e65aed057244665e5e gianfar: Add support for skb recycling

was the culprit. The commits before it work, and starting with this
one the skb_over_panics start to occur.

This is as far as I can debug it though. Maybe someone more  
knowledgeable about this stuff could take a look at it or tell me how  
I can figure out what's happening exactly.

For testing purposes I reverted this commit from the current master  
and it has been running the last hour without any problems. Before  
that the panic would occur almost immediately after a download was  
started.


Once again, part of the panic, which is more or less the same every time...

[  113.218513] skb_over_panic: text:c015a74c len:1514 put:1514 head:c7a07800 data:c7a07880 tail:0xc7a07e6a end:0xc7a07e60 dev:wan
[  113.230039] ------------[ cut here ]------------
[  113.234669] kernel BUG at net/core/skbuff.c:124!
[  113.239302] Oops: Exception in kernel mode, sig: 5 [#1]
[  113.244539] MikroTik RouterBOARD 600 series
[  113.248729] Modules linked in:
[  113.251797] NIP: c01a2ad0 LR: c01a2ad0 CTR: c014e37c
[  113.256776] REGS: c0367cb0 TRAP: 0700   Not tainted  (2.6.29-rc3)
[  113.262883] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002024  XER: 20000000
[  113.269562] TASK = c0344588[0] 'swapper' THREAD: c0366000
[  113.274799] GPR00: c01a2ad0 c0367d60 c0344588 00000085 00002625 ffffffff c014b948 00002625
[  113.283223] GPR08: 00000030 c0370000 00002625 c03723a8 84002082 1001b1cc 007d66c0 00000008
[  113.291647] GPR16: c02e70bc c02e70e8 c78423d8 c0345678 00000040 c0366000 c7971800 00000000
[  113.300071] GPR24: 00000001 00000000 c78eb300 c7971ff0 c7842000 000005ea c795c3c0 c7a07880
[  113.308689] NIP [c01a2ad0] skb_over_panic+0x48/0x5c
[  113.313582] LR [c01a2ad0] skb_over_panic+0x48/0x5c
[  113.318382] Call Trace:
[  113.320838] [c0367d60] [c01a2ad0] skb_over_panic+0x48/0x5c (unreliable)
[  113.327485] [c0367d70] [c01a42cc] skb_put+0x5c/0x60
[  113.332390] [c0367d80] [c015a74c] gfar_clean_rx_ring+0x20c/0x438
[  113.338422] [c0367dd0] [c015c160] gfar_poll+0x288/0x370
[  113.343672] [c0367e30] [c01af538] net_rx_action+0x94/0x188
[  113.349191] [c0367e60] [c002d0f0] __do_softirq+0x84/0x124
[  113.354621] [c0367e90] [c00063f8] do_softirq+0x58/0x5c
[  113.359781] [c0367ea0] [c002cf6c] irq_exit+0x94/0x98
[  113.364768] [c0367eb0] [c000649c] do_IRQ+0xa0/0xc4
[  113.369582] [c0367ed0] [c0014220] ret_from_except+0x0/0x14
[  113.375101] --- Exception: 501 at cpu_idle+0xa0/0xec
[  113.375109]     LR = cpu_idle+0xa0/0xec
[  113.383923] [c0367f90] [c0009164] cpu_idle+0x50/0xec (unreliable)
[  113.390054] [c0367fb0] [c027c8bc] __got2_end+0x58/0x68
[  113.395225] [c0367fc0] [c03167e4] start_kernel+0x220/0x2a0
[  113.400731] [c0367ff0] [00003438] 0x3438


Kind regards,
Michael


* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-12 22:34     ` Michael Guntsche
@ 2009-05-12 22:48       ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2009-05-12 22:48 UTC (permalink / raw)
  To: Michael Guntsche; +Cc: linux-kernel, netdev, Andy Fleming

On Wed, 13 May 2009 00:34:52 +0200
Michael Guntsche <mike@it-loops.com> wrote:

> 
> On May 5, 2009, at 10:29 PM, Michael Guntsche wrote:
> >
> > Here is a full dump from the console again including the line  
> > before the "cut here" text.
> > Wan is the device going to the ADSL-modem.
> >
> > [  172.251007] skb_over_panic: text:c0187650 len:1514 put:1514 head:c79a8800 data:c79a8880 tail:0xc79a8e6a end:0xc79a8e60 dev:wan
> >
> >
> 
> Since I still had the panics with rc5, I went ahead and started
> looking for the commit that caused the problem. After some testing
> (actually a lot of compile cycles) I found out that commit
> 
> 0fd56bb5be6455d0d42241e65aed057244665e5e gianfar: Add support for skb recycling
> 
> was the culprit.

Thanks.  Let's CC Andy.

> The commits before it work, and starting with this
> one the skb_over_panics start to occur.
> 
> This is as far as I can debug it though. Maybe someone more  
> knowledgeable about this stuff could take a look at it or tell me how  
> I can figure out what's happening exactly.
> 
> For testing purposes I reverted this commit from the current master  
> and it has been running the last hour without any problems. Before  
> that the panic would occur almost immediately after a download was  
> started.
> 
> 
> Once again, part of the panic, which is more or less the same every time...
> 
> [  113.218513] skb_over_panic: text:c015a74c len:1514 put:1514 head:c7a07800 data:c7a07880 tail:0xc7a07e6a end:0xc7a07e60 dev:wan
> [  113.230039] ------------[ cut here ]------------
> [  113.234669] kernel BUG at net/core/skbuff.c:124!
> [  113.239302] Oops: Exception in kernel mode, sig: 5 [#1]
> [  113.244539] MikroTik RouterBOARD 600 series
> [  113.248729] Modules linked in:
> [  113.251797] NIP: c01a2ad0 LR: c01a2ad0 CTR: c014e37c
> [  113.256776] REGS: c0367cb0 TRAP: 0700   Not tainted  (2.6.29-rc3)
> [  113.262883] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002024  XER: 20000000
> [  113.269562] TASK = c0344588[0] 'swapper' THREAD: c0366000
> [  113.274799] GPR00: c01a2ad0 c0367d60 c0344588 00000085 00002625 ffffffff c014b948 00002625
> [  113.283223] GPR08: 00000030 c0370000 00002625 c03723a8 84002082 1001b1cc 007d66c0 00000008
> [  113.291647] GPR16: c02e70bc c02e70e8 c78423d8 c0345678 00000040 c0366000 c7971800 00000000
> [  113.300071] GPR24: 00000001 00000000 c78eb300 c7971ff0 c7842000 000005ea c795c3c0 c7a07880
> [  113.308689] NIP [c01a2ad0] skb_over_panic+0x48/0x5c
> [  113.313582] LR [c01a2ad0] skb_over_panic+0x48/0x5c
> [  113.318382] Call Trace:
> [  113.320838] [c0367d60] [c01a2ad0] skb_over_panic+0x48/0x5c (unreliable)
> [  113.327485] [c0367d70] [c01a42cc] skb_put+0x5c/0x60
> [  113.332390] [c0367d80] [c015a74c] gfar_clean_rx_ring+0x20c/0x438
> [  113.338422] [c0367dd0] [c015c160] gfar_poll+0x288/0x370
> [  113.343672] [c0367e30] [c01af538] net_rx_action+0x94/0x188
> [  113.349191] [c0367e60] [c002d0f0] __do_softirq+0x84/0x124
> [  113.354621] [c0367e90] [c00063f8] do_softirq+0x58/0x5c
> [  113.359781] [c0367ea0] [c002cf6c] irq_exit+0x94/0x98
> [  113.364768] [c0367eb0] [c000649c] do_IRQ+0xa0/0xc4
> [  113.369582] [c0367ed0] [c0014220] ret_from_except+0x0/0x14
> [  113.375101] --- Exception: 501 at cpu_idle+0xa0/0xec
> [  113.375109]     LR = cpu_idle+0xa0/0xec
> [  113.383923] [c0367f90] [c0009164] cpu_idle+0x50/0xec (unreliable)
> [  113.390054] [c0367fb0] [c027c8bc] __got2_end+0x58/0x68
> [  113.395225] [c0367fc0] [c03167e4] start_kernel+0x220/0x2a0
> [  113.400731] [c0367ff0] [00003438] 0x3438
> 



* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-03 13:36 [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar Michael Guntsche
  2009-05-05 19:27 ` Andrew Morton
@ 2009-05-20 21:47 ` Lennert Buytenhek
  2009-05-21 22:24   ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Lennert Buytenhek @ 2009-05-20 21:47 UTC (permalink / raw)
  To: Michael Guntsche, afleming; +Cc: linux-kernel, Rafael J. Wysocki, davem

On Sun, May 03, 2009 at 03:36:27PM +0200, Michael Guntsche wrote:

> I recently tried 2.6.30-rc4 on a routerboard currently running 2.6.29  
> (it is running stable with this kernel).
> 
> This board is used as a gateway and under load I see the following BUG  
> with 2.6.30-rc4.
> 
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:126!
> Oops: Exception in kernel mode, sig: 5 [#1]
> MikroTik RouterBOARD 600 series
> Modules linked in: nf_nat_rtsp nf_conntrack_rtsp
> NIP: c01abc68 LR: c01abc68 CTR: c015559c
> REGS: c7aa7b20 TRAP: 0700   Not tainted  (2.6.30-rc4)
> MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002424  XER: 20000000
> TASK = c7855bc0[588] 'pptpgw' THREAD: c7aa6000
> GPR00: c01abc68 c7aa7bd0 c7855bc0 00000085 0000295e ffffffff c0152b68 00000030
> GPR08: c03848d4 c0350000 0000295e c0380398 84002422 10029614 100de49c 100e0000
> GPR16: 100b45a0 00000040 c02f6260 c02f628c c7846380 c7aa6000 c7957800 00000000
> GPR24: 00000002 0000003e c7a12480 c7957a00 c7846000 000005e6 c7956240 c7a8b880
> NIP [c01abc68] skb_over_panic+0x48/0x5c
> LR [c01abc68] skb_over_panic+0x48/0x5c
> Call Trace:
> [c7aa7bd0] [c01abc68] skb_over_panic+0x48/0x5c (unreliable)
> [c7aa7be0] [c01ad468] skb_put+0x5c/0x60

gianfar puts skbuffs that are in the rx ring back onto the recycle
list if there was a receive error, but this breaks the following
invariant: that all skbuffs on the recycle list have skb->data =
skb->head + NET_SKB_PAD (NET_SKB_PAD being 32 for you).

In this case, the skb's ->data will be skb->head + RXBUF_ALIGNMENT
(where RXBUF_ALIGNMENT is 64) when it is put onto the recycle list.
And when gfar_new_skb() picks this skb off the recycle list again,
it'll do:

        alignamount = RXBUF_ALIGNMENT -
                (((unsigned long) skb->data) & (RXBUF_ALIGNMENT - 1));

        /* We need the data buffer to be aligned properly.  We will reserve
         * as many bytes as needed to align the data properly
         */
        skb_reserve(skb, alignamount);

So now skb->data will be skb->head + 128, and there won't be enough
space between skb->head and skb->end to hold a full-sized packet.
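
The arithmetic above can be checked against the skb_over_panic line
reported earlier in the thread (head:c79a8800 data:c79a8880
tail:0xc79a8e6a end:0xc79a8e60, len:1514). What follows is only a
userspace sketch, not driver code: the SKETCH_ constants and the helper
names are made up for illustration, and plain 32-bit integers stand in
for the skb pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as stated in the thread for this board; the SKETCH_ names are
 * local stand-ins, not the kernel's own definitions. */
#define SKETCH_NET_SKB_PAD     32u
#define SKETCH_RXBUF_ALIGNMENT 64u

/* Same arithmetic as the gfar_new_skb() snippet quoted above.  An address
 * that is already aligned yields a full RXBUF_ALIGNMENT, not 0. */
static uint32_t gfar_alignamount(uint32_t data)
{
	return SKETCH_RXBUF_ALIGNMENT - (data & (SKETCH_RXBUF_ALIGNMENT - 1u));
}

/* Bytes by which 'len' bytes starting at 'data' run past 'end' (0 if they fit). */
static uint32_t overrun(uint32_t data, uint32_t len, uint32_t end)
{
	uint32_t tail = data + len;

	return tail > end ? tail - end : 0u;
}

/* Replays the first allocation and the recycle against the reported values. */
static void check_against_dump(void)
{
	const uint32_t head = 0xc79a8800u, end = 0xc79a8e60u, len = 1514u;
	uint32_t data = head + SKETCH_NET_SKB_PAD;

	data += gfar_alignamount(data);          /* first use: head + 64 */
	assert(data == head + 64u);
	assert(overrun(data, len, end) == 0u);   /* a 1514-byte frame fits */

	data += gfar_alignamount(data);          /* recycled without a reset */
	assert(data == 0xc79a8880u);             /* head + 128, as in the dump */
	assert(data + len == 0xc79a8e6au);       /* the reported tail */
	assert(overrun(data, len, end) == 10u);  /* tail is 10 bytes past end */
}
```

Calling check_against_dump() from a small main() passes all of the
asserts, which agrees with the reported tail sitting 10 bytes past end.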

Something like the patch below would fix it.

(Or, one could change the RXBUF_ALIGNMENT code to be idempotent (i.e.
do nothing if skb->data is already aligned), that'd fix it too -- but
you'll want to stick a big fat comment a la "this is subtle" somewhere
in that case.)
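
As a rough illustration of the idempotent variant mentioned in the
parenthesis above, the alignment amount can be masked so that an already
aligned address needs no further reserve. This is a userspace sketch
under the same assumptions as before (made-up SKETCH_ constant and
helper names, integers in place of skb pointers); it is not a tested
driver change.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RXBUF_ALIGNMENT 64u

/* "this is subtle": returns 0 for an address that is already aligned, so
 * running it again on a recycled buffer does not move data a second time. */
static uint32_t alignamount_idempotent(uint32_t data)
{
	return (SKETCH_RXBUF_ALIGNMENT - (data & (SKETCH_RXBUF_ALIGNMENT - 1u)))
		& (SKETCH_RXBUF_ALIGNMENT - 1u);
}

/* Replays first use and recycle with the head/end values from the dump. */
static void check_idempotent(void)
{
	const uint32_t head = 0xc79a8800u, end = 0xc79a8e60u, len = 1514u;
	uint32_t data = head + 32u;              /* head + NET_SKB_PAD */

	data += alignamount_idempotent(data);    /* first use: head + 64 */
	assert(data == head + 64u);

	data += alignamount_idempotent(data);    /* recycled: no further shift */
	assert(data == head + 64u);
	assert(data + len <= end);               /* the frame still fits */
}
```

The trade-off is that correctness then depends on the recycled skb's
data pointer still being where the first alignment left it, hence the
suggested comment.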


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index b2c4967..85883c7 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1886,7 +1886,7 @@ int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit)
 			if (unlikely(!newskb))
 				newskb = skb;
 			else if (skb)
-				__skb_queue_head(&priv->rx_recycle, skb);
+				dev_kfree_skb_any(skb);
 		} else {
 			/* Increment the number of packets */
 			dev->stats.rx_packets++;


* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-20 21:47 ` Lennert Buytenhek
@ 2009-05-21 22:24   ` David Miller
  2009-05-22 14:27     ` Michael Guntsche
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2009-05-21 22:24 UTC (permalink / raw)
  To: buytenh; +Cc: mike, afleming, linux-kernel, rjw

From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Wed, 20 May 2009 23:47:34 +0200

> gianfar puts skbuffs that are in the rx ring back onto the recycle
> list if there was a receive error, but this breaks the following
> invariant: that all skbuffs on the recycle list have skb->data =
> skb->head + NET_SKB_PAD (NET_SKB_PAD being 32 for you).
> 
> In this case, the skb's ->data will be skb->head + RXBUF_ALIGNMENT
> (where RXBUF_ALIGNMENT is 64) when it is put onto the recycle list.
> And when gfar_new_skb() picks this skb off the recycle list again,
> it'll do:
> 
>         alignamount = RXBUF_ALIGNMENT -
>                 (((unsigned long) skb->data) & (RXBUF_ALIGNMENT - 1));
> 
>         /* We need the data buffer to be aligned properly.  We will reserve
>          * as many bytes as needed to align the data properly
>          */
>         skb_reserve(skb, alignamount);
> 
> So now skb->data will be skb->head + 128, and there won't be enough
> space between skb->head and skb->end to hold a full-sized packet.
> 
> Something like the patch below would fix it.

Let me know when a final, tested, version of this patch is available
and please make sure it makes it to netdev@vger.kernel.org

Thanks.


* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-21 22:24   ` David Miller
@ 2009-05-22 14:27     ` Michael Guntsche
  2009-05-22 17:18       ` Andy Fleming
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Guntsche @ 2009-05-22 14:27 UTC (permalink / raw)
  To: David Miller, afleming; +Cc: buytenh, linux-kernel, Rafael J. Wysocki, netdev


On May 22, 2009, at 0:24, David Miller wrote:

> From: Lennert Buytenhek <buytenh@wantstofly.org>
> Date: Wed, 20 May 2009 23:47:34 +0200
>
>> gianfar puts skbuffs that are in the rx ring back onto the recycle
>> list if there was a receive error, but this breaks the following
>> invariant: that all skbuffs on the recycle list have skb->data =
>> skb->head + NET_SKB_PAD (NET_SKB_PAD being 32 for you).
>>
>> In this case, the skb's ->data will be skb->head + RXBUF_ALIGNMENT
>> (where RXBUF_ALIGNMENT is 64) when it is put onto the recycle list.
>> And when gfar_new_skb() picks this skb off the recycle list again,
>> it'll do:
>>
>>        alignamount = RXBUF_ALIGNMENT -
>>                (((unsigned long) skb->data) & (RXBUF_ALIGNMENT - 1));
>>
>>        /* We need the data buffer to be aligned properly.  We will reserve
>>         * as many bytes as needed to align the data properly
>>         */
>>        skb_reserve(skb, alignamount);
>>
>> So now skb->data will be skb->head + 128, and there won't be enough
>> space between skb->head and skb->end to hold a full-sized packet.
>>
>> Something like the patch below would fix it.
>
> Let me know when a final, tested, version of this patch is available
> and please make sure it makes it to netdev@vger.kernel.org

I can confirm that Lennert's patch fixes my panic problems here. After  
applying it, my connection has been rock solid and I no longer get any  
kernel panics.
That said, I think the final decision on whether this patch should go in
as it is now has to be made by Andy Fleming, since he wrote the initial
code.

Kind regards,
Michael

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index b2c4967..85883c7 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1886,7 +1886,7 @@ int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit)
 			if (unlikely(!newskb))
 				newskb = skb;
 			else if (skb)
-				__skb_queue_head(&priv->rx_recycle, skb);
+				dev_kfree_skb_any(skb);
 		} else {
 			/* Increment the number of packets */
 			dev->stats.rx_packets++;



* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-22 14:27     ` Michael Guntsche
@ 2009-05-22 17:18       ` Andy Fleming
  2009-05-22 17:59         ` Michael Guntsche
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Fleming @ 2009-05-22 17:18 UTC (permalink / raw)
  To: Michael Guntsche
  Cc: David Miller, buytenh, linux-kernel, Rafael J. Wysocki, netdev


On May 22, 2009, at 9:27 AM, Michael Guntsche wrote:

>>>
>>> Something like the patch below would fix it.
>>
>> Let me know when a final, tested, version of this patch is available
>> and please make sure it makes it to netdev@vger.kernel.org
>
> I can confirm that Lennert's patch fixes my panic problems here.  
> After applying it, my connection has been rock solid and I no longer  
> get any kernel panics.
> That said, I think the final decision on whether this patch should go in
> as it is now has to be made by Andy Fleming, since he wrote the
> initial code.

I'm working (admittedly fairly slowly) on a patch that resets the  
pointers so they don't get moved twice.  That way we can continue to  
benefit from recycling the error skbs.


* Re: [BUG] 2.6.30-rc4: Kernel BUG under network load with gianfar
  2009-05-22 17:18       ` Andy Fleming
@ 2009-05-22 17:59         ` Michael Guntsche
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Guntsche @ 2009-05-22 17:59 UTC (permalink / raw)
  To: Andy Fleming
  Cc: David Miller, buytenh, linux-kernel, Rafael J. Wysocki, netdev


On May 22, 2009, at 19:18, Andy Fleming wrote:

> I'm working (admittedly fairly slowly) on a patch that resets the  
> pointers so they don't get moved twice.  That way we can continue to  
> benefit from recycling the error skbs.

Ok, thanks for the information. Just tell me when you have a patch  
ready for testing, since it's fairly easy for me to reproduce this  
here with my setup. I'll keep Lennert's patch in the meantime.

Kind regards,
Michael

