LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [Bug 2773] New: kernel panic under medium load
@ 2004-05-26 14:35 Martin J. Bligh
2004-05-26 16:51 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Martin J. Bligh @ 2004-05-26 14:35 UTC (permalink / raw)
To: linux-kernel; +Cc: bug-kernel
http://bugme.osdl.org/show_bug.cgi?id=2773
Summary: kernel panic under medium load
Kernel Version: 2.6.6 unpatched
Status: NEW
Severity: normal
Owner: akpm@digeo.com
Submitter: bug-kernel@leroutier.net
Distribution:
Gentoo x86
Hardware Environment:
P4 Celeron
0000:00:00.0 Host bridge: VIA Technologies, Inc. P4M266 Host Bridge
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP]
0000:00:05.0 Ethernet controller: D-Link System Inc RTL8139 Ethernet (rev 10)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 80)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 80)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 80)
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:11.5 Multimedia audio controller: VIA Technologies, Inc.
VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
0000:00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
0000:01:00.0 VGA compatible controller: S3 Inc. VT8375 [ProSavage8 KM266/KL266]
Software Environment:
mldonkey 2.5.21 (the app that triggers the bug at each time)
Problem Description:
Unable to handle kernel paging request at virtual address 00004018
printing eip:
c012c41c
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[<c012c41c>] Not tainted
EFLAGS: 00010206 (2.6.6)
EIP is at __alloc_pages+0x5e/0x2d4
eax: 00004000 ebx: 00000000 ecx: 00000153 edx: c02ed5e4
esi: 00000001 edi: c02ed84c ebp: c6e93d6c esp: c6e93d38
ds: 007b es: 007b ss: 0068
Process mlnet (pid: 16251, threadinfo=c6e92000 task=ddd1d6b0)
Stack: c6e93d6c c015615c d261120c bfffc684 00000001 ddd1d6b0 00000010 00000000
000000d2 00006006 00000000 00006006 00000000 c6e93e54 c012a4a8 d26112a4
00006006 00000000 00000000 0001c26f c6e93d88 00000001 cfb344c8 00000020
Call Trace:
[<c015615c>] inode_update_time+0x9c/0xd0
[<c012a4a8>] generic_file_aio_write_nolock+0x2bf/0xaa6
[<c012e785>] page_cache_readahead+0x1a2/0x1fd
[<c012924a>] file_read_actor+0x0/0xda
[<c01294b3>] __generic_file_aio_read+0x18f/0x1be
[<c012924a>] file_read_actor+0x0/0xda
[<c012ad8b>] generic_file_aio_write+0x6b/0x88
[<c0170457>] ext3_file_write+0x3f/0xb8
[<c01409cd>] do_sync_write+0x89/0xb4
[<c011b0e1>] update_wall_time+0xf/0x3a
[<c0111a05>] scheduler_tick+0x1f/0x506
[<c0117cae>] __do_softirq+0x82/0x84
[<c011b263>] update_process_times+0x46/0x50
[<c011b0e1>] update_wall_time+0xf/0x3a
[<c011b4df>] do_timer+0xdf/0xe4
[<c0140944>] do_sync_write+0x0/0xb4
[<c0140a99>] vfs_write+0xa1/0x10c
[<c0140ba0>] sys_write+0x3f/0x5d
[<c0103cf3>] syscall_call+0x7/0xb
Code: 83 78 18 63 7f 07 8b 42 08 d1 e8 29 c1 8b 02 39 c8 73 0c 8b
Steps to reproduce:
run mlnet (multi-network exe of mldonkey, a P2P app) with a load > 1MB/s (in & out)
preempt is off
would attach .config on request
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Bug 2773] New: kernel panic under medium load
2004-05-26 14:35 [Bug 2773] New: kernel panic under medium load Martin J. Bligh
@ 2004-05-26 16:51 ` Linus Torvalds
2004-05-26 23:30 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2004-05-26 16:51 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton
Hmm..
Interesting. The code is
cmpl $0x63,0x18(%eax)
jg <__alloc_pages+xx>
mov 0x8(%ebx),%eax
shr %eax
sub %eax,%ecx
__alloc_pages+xx:
mov (%edx),%eax
cmp %ecx,%eax
where %eax is 0x00004000.
Now, that would look like a pointer that _should_ be zero, but with a
single-bit error. But that's not so. The pointer in question is "p", which
should have been initialized "current". The code is
if (rt_task(p))
min -= z->pages_low >> 1;
and 0x63 is just (MAX_RT_PRIO-1). So it's just doing a
if ((p)->prio < MAX_RT_PRIO)
min -= z->pages_low >> 1;
and "p" is seriously corrupt.
My first reaction was that this looks like a thread-info corruption,
possibly due to a stack overflow that just overwrote the thread-info (and
thus the "current" pointer). HOWEVER, looking closer, I note that the oops
report gets it right and says
Process mlnet (pid: 16251, threadinfo=c6e92000 task=ddd1d6b0)
and that the stack dump (which contains the stack value of "p") agrees:
the value of "p" on the stack is at 20(%esp) (if your compiler agrees with
mine, and it seems to), and that is indeed dumped as the correct value
"ddd1d6b0". Also %esp is 0xc6e93d38, which is not even _close_ to the end
of stack, and shows that you're not running with the 4kB stack anyway.
So "p" _was_ correct at some point, and the only incorrect value is in
fact the register value in %eax.
Now, to make it more interesting, at least for me, the instruction
_immediately_ preceding the "cmp" that oopsed is in fact:
movl 20(%esp), %ecx
(for my config, %ecx is the register that contains the value of "p", we
seem to have different register allocation, either because of compiler
differences or because of config changes).
What happens for you before? Can you do a "make mm/page_alloc.s" and post
the result (well, just __alloc_pages, not the rest).
(Btw, I hate web interfaces, but feel free to update bugzilla for me)
Linus
---
On Wed, 26 May 2004, Martin J. Bligh wrote:
>
> http://bugme.osdl.org/show_bug.cgi?id=2773
>
> Summary: kernel panic under medium load
> Kernel Version: 2.6.6 unpatched
> Status: NEW
> Severity: normal
> Owner: akpm@digeo.com
> Submitter: bug-kernel@leroutier.net
>
>
> Distribution:
> Gentoo x86
>
> Hardware Environment:
> P4 Celeron
> 0000:00:00.0 Host bridge: VIA Technologies, Inc. P4M266 Host Bridge
> 0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP]
> 0000:00:05.0 Ethernet controller: D-Link System Inc RTL8139 Ethernet (rev 10)
> 0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
> Controller (rev 80)
> 0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
> Controller (rev 80)
> 0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
> Controller (rev 80)
> 0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
> 0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
> 0000:00:11.1 IDE interface: VIA Technologies, Inc.
> VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
> 0000:00:11.5 Multimedia audio controller: VIA Technologies, Inc.
> VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
> 0000:00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
> 0000:01:00.0 VGA compatible controller: S3 Inc. VT8375 [ProSavage8 KM266/KL266]
>
> Software Environment:
> mldonkey 2.5.21 (the app that triggers the bug at each time)
>
> Problem Description:
>
> Unable to handle kernel paging request at virtual address 00004018
> printing eip:
> c012c41c
> *pde = 00000000
> Oops: 0002 [#1]
> CPU: 0
> EIP: 0060:[<c012c41c>] Not tainted
> EFLAGS: 00010206 (2.6.6)
> EIP is at __alloc_pages+0x5e/0x2d4
> eax: 00004000 ebx: 00000000 ecx: 00000153 edx: c02ed5e4
> esi: 00000001 edi: c02ed84c ebp: c6e93d6c esp: c6e93d38
> ds: 007b es: 007b ss: 0068
> Process mlnet (pid: 16251, threadinfo=c6e92000 task=ddd1d6b0)
> Stack: c6e93d6c c015615c d261120c bfffc684 00000001 ddd1d6b0 00000010 00000000
> 000000d2 00006006 00000000 00006006 00000000 c6e93e54 c012a4a8 d26112a4
> 00006006 00000000 00000000 0001c26f c6e93d88 00000001 cfb344c8 00000020
> Call Trace:
> [<c015615c>] inode_update_time+0x9c/0xd0
> [<c012a4a8>] generic_file_aio_write_nolock+0x2bf/0xaa6
> [<c012e785>] page_cache_readahead+0x1a2/0x1fd
> [<c012924a>] file_read_actor+0x0/0xda
> [<c01294b3>] __generic_file_aio_read+0x18f/0x1be
> [<c012924a>] file_read_actor+0x0/0xda
> [<c012ad8b>] generic_file_aio_write+0x6b/0x88
> [<c0170457>] ext3_file_write+0x3f/0xb8
> [<c01409cd>] do_sync_write+0x89/0xb4
> [<c011b0e1>] update_wall_time+0xf/0x3a
> [<c0111a05>] scheduler_tick+0x1f/0x506
> [<c0117cae>] __do_softirq+0x82/0x84
> [<c011b263>] update_process_times+0x46/0x50
> [<c011b0e1>] update_wall_time+0xf/0x3a
> [<c011b4df>] do_timer+0xdf/0xe4
> [<c0140944>] do_sync_write+0x0/0xb4
> [<c0140a99>] vfs_write+0xa1/0x10c
> [<c0140ba0>] sys_write+0x3f/0x5d
> [<c0103cf3>] syscall_call+0x7/0xb
>
> Code: 83 78 18 63 7f 07 8b 42 08 d1 e8 29 c1 8b 02 39 c8 73 0c 8b
>
> Steps to reproduce:
>
> run mlnet (multi-network exe of mldonkey, a P2P app) with a load > 1MB/s (in & out)
>
> preempt is off
>
> would attach .config on request
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Bug 2773] New: kernel panic under medium load
2004-05-26 16:51 ` Linus Torvalds
@ 2004-05-26 23:30 ` Andrew Morton
2004-05-27 0:08 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2004-05-26 23:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mbligh, linux-kernel, Stephane LOEUILLET
Linus Torvalds <torvalds@osdl.org> wrote:
>
> What happens for you before? Can you do a "make mm/page_alloc.s" and post
> the result (well, just __alloc_pages, not the rest).
>
Stephane had added page_alloc.s to
http://bugme.osdl.org/show_bug.cgi?id=2773
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Bug 2773] New: kernel panic under medium load
2004-05-26 23:30 ` Andrew Morton
@ 2004-05-27 0:08 ` Linus Torvalds
0 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2004-05-27 0:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: mbligh, linux-kernel, Stephane LOEUILLET
On Wed, 26 May 2004, Andrew Morton wrote:
>
> Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > What happens for you before? Can you do a "make mm/page_alloc.s" and post
> > the result (well, just __alloc_pages, not the rest).
> >
>
> Stephane had added page_alloc.s to
>
> http://bugme.osdl.org/show_bug.cgi?id=2773
Interesting. The load is right before the instruction:
** movl -36(%ebp), %eax ***
cmpl $99, 24(%eax)
jg .L241
and the only difference here is that it says "-36(%ebp)" instead of
"20(%esp)" because it's compiled with frame pointers.
(Well, there must be something else different on the stack frame too,
according to the earlier dump it really _should_ be 20%(%esp), but the
difference between %esp and %ebp here runs to 52, not 56, which makes me
suspect the thing is compiled with some different options. Anyway, that
bogus 0x00004000 value is not on the stack at either offset in the dump,
so it shouldn't matter).
Oh, I just notice that Stephane actually _says_ that he changed the config
options.
Anyway, the interesting part seems to be that %eax is corrupt, and it
really shouldn't be. It appears to be ok in memory, and was loaded from
there just before, so I'm wondering whether there is soem _serious_
problem with this machine?
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-05-27 0:08 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-26 14:35 [Bug 2773] New: kernel panic under medium load Martin J. Bligh
2004-05-26 16:51 ` Linus Torvalds
2004-05-26 23:30 ` Andrew Morton
2004-05-27 0:08 ` Linus Torvalds
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).