LKML Archive on lore.kernel.org
* oops, 2.4.26 and jfs
@ 2004-05-28 20:15 Chris Stromsoe
2004-05-28 20:31 ` Dave Kleikamp
0 siblings, 1 reply; 7+ messages in thread
From: Chris Stromsoe @ 2004-05-28 20:15 UTC
To: linux-kernel, Marcelo Tosatti, Dave Kleikamp
This morning during a cron run while doing a find across /, I got the
following oops.
There is only the one jfs partition on the machine. It has 607376 inodes in
use out of 17524128 total, and 27350660 bytes out of 35806700. The
partition is on top of a 3-disk md RAID5 that is under constant pressure
(it's a virus quarantine). All processes trying to do anything with the
partition are stuck in D state.
Right before the oops, I logged:
May 26 06:28:10 begonia kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
May 26 06:28:11 begonia last message repeated 3 times
May 26 06:28:11 begonia kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d0/0)
May 27 06:29:13 begonia kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
May 27 06:29:24 begonia last message repeated 3 times
May 28 06:28:44 begonia kernel: __alloc_pages: 1-order allocation failed (gfp=0x1f0/0)
May 28 06:28:45 begonia kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
May 28 06:28:50 begonia last message repeated 3 times
May 28 06:28:50 begonia kernel: __alloc_pages: 1-order allocation failed (gfp=0x1f0/0)
May 28 06:28:50 begonia kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d0/0)
I don't have correct modules or System.map output; I rebuilt the
kernel/modules/system.map yesterday to prepare for a reboot, and
over-wrote the System.map and modules with the new files.
-Chris
cbs@begonia:~ > ksymoops -L -M < oops
ksymoops 2.4.5 on i686 2.4.26. Options used
-V (default)
-k /proc/ksyms (default)
-L (specified)
-o /lib/modules/2.4.26/ (default)
-M (specified)
Warning (expand_objects): object /lib/modules/2.4.26/kernel/drivers/net/eepro100.o for module eepro100 has changed since load
Warning (expand_objects): object /lib/modules/2.4.26/kernel/drivers/net/mii.o for module mii has changed since load
May 28 06:28:50 begonia kernel: BUG at jfs_txnmgr.c:2849 assert(log)
May 28 06:28:50 begonia kernel: kernel BUG at jfs_txnmgr.c:2849!
May 28 06:28:50 begonia kernel: invalid operand: 0000
May 28 06:28:50 begonia kernel: CPU: 1
May 28 06:28:50 begonia kernel: EIP: 0010:[LogSyncRelease+89/204] Not tainted
May 28 06:28:50 begonia kernel: EFLAGS: 00010282
May 28 06:28:50 begonia kernel: eax: 00000028 ebx: c3fa74a0 ecx: ffffffff edx: 00000002
May 28 06:28:50 begonia kernel: esi: 00000000 edi: f88054a8 ebp: ea595dbc esp: ea595da4
May 28 06:28:50 begonia kernel: ds: 0018 es: 0018 ss: 0018
May 28 06:28:50 begonia kernel: Process mimedefang.pl (pid: 23467, stackpage=ea595000)
May 28 06:28:50 begonia kernel: Stack: c028fd4b c028fd3e 00000b21 c028fe80 00000107 000006ff ea595dd8 c01965ae
May 28 06:28:50 begonia kernel: c3fa74a0 c8d128f0 f48827a0 00000000 00000102 ea595e40 c019521e ea595e04
May 28 06:28:50 begonia kernel: f48827a0 00000000 00000102 00000001 ea595e1c f88054a8 f7ad3880 fffffffb
May 28 06:28:50 begonia kernel: Call Trace: [txAbortCommit+82/172] [txCommit+686/700] [__mark_inode_dirty+51/172] [jfs_create+444/548] [__lru_cache_del+116/124]
May 28 06:28:50 begonia kernel: Code: 0f 0b 21 0b 3e fd 28 c0 83 c4 10 f0 ff 4b 48 8b 43 48 85 c0
Using defaults from ksymoops -t elf32-i386 -a i386
>>ebx; c3fa74a0 <_end+3c0cdac/3896e90c>
>>ecx; ffffffff <END_OF_CODE+72f05d4/????>
>>edi; f88054a8 <_end+3846adb4/3896e90c>
>>ebp; ea595dbc <_end+2a1fb6c8/3896e90c>
>>esp; ea595da4 <_end+2a1fb6b0/3896e90c>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 21 0b and %ecx,(%ebx)
Code; 00000004 Before first symbol
4: 3e ds
Code; 00000005 Before first symbol
5: fd std
Code; 00000006 Before first symbol
6: 28 c0 sub %al,%al
Code; 00000008 Before first symbol
8: 83 c4 10 add $0x10,%esp
Code; 0000000b Before first symbol
b: f0 ff 4b 48 lock decl 0x48(%ebx)
Code; 0000000f Before first symbol
f: 8b 43 48 mov 0x48(%ebx),%eax
Code; 00000012 Before first symbol
12: 85 c0 test %eax,%eax
2 warnings issued. Results may not be reliable.
* Re: oops, 2.4.26 and jfs
2004-05-28 20:15 oops, 2.4.26 and jfs Chris Stromsoe
@ 2004-05-28 20:31 ` Dave Kleikamp
2004-05-29 1:16 ` Chris Stromsoe
0 siblings, 1 reply; 7+ messages in thread
From: Dave Kleikamp @ 2004-05-28 20:31 UTC
To: Chris Stromsoe; +Cc: linux-kernel, Marcelo Tosatti
On Fri, 2004-05-28 at 15:15, Chris Stromsoe wrote:
> This morning during a cron run while doing a find across /, I got the
> following oops.
The oops is fixed in 2.4.27-pre3 with the patch:
http://linux.bkbits.net:8080/linux-2.4/cset@1.1359.20.3
jfs still may give you problems if 0-order allocations are failing, but
it's not supposed to trap.
Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center
* Re: oops, 2.4.26 and jfs
2004-05-28 20:31 ` Dave Kleikamp
@ 2004-05-29 1:16 ` Chris Stromsoe
2004-05-29 2:32 ` Chris Stromsoe
2004-05-30 17:38 ` Marcelo Tosatti
0 siblings, 2 replies; 7+ messages in thread
From: Chris Stromsoe @ 2004-05-29 1:16 UTC
To: Dave Kleikamp; +Cc: linux-kernel, Marcelo Tosatti
On Fri, 28 May 2004, Dave Kleikamp wrote:
> On Fri, 2004-05-28 at 15:15, Chris Stromsoe wrote:
> > This morning during a cron run while doing a find across /, I got the
> > following oops.
>
> The oops is fixed in 2.4.27-pre3 with the patch:
> http://linux.bkbits.net:8080/linux-2.4/cset@1.1359.20.3
>
> jfs still may give you problems if 0-order allocations are failing, but
> it's not supposed to trap.
Thanks, patch applied.
Aside from that:
> May 26 06:28:10 begonia kernel: __alloc_pages: 0-order allocation failed
> (gfp=0x1f0/0)
I'm curious about why 0-order allocations would fail. From everything
I've read (google searching for the error message), that indicates an out
of memory condition, which shouldn't be the case.
The box in question has 4Gb of physical ram (512Mb is used as tmpfs) and
9Gb of swap. When the oops happened, no swap was in use. Physical ram
was pretty much filled, but no swap at all. OOM_KILLER is not enabled.
There's nothing especially exotic in the box. It does a lot of network
traffic (eepro100) and a lot of disk traffic (aic7xxx). The morning cron
jobs had just kicked off. Two of them do "find /" -- I believe that the
second one was running when it happened.
-Chris
* Re: oops, 2.4.26 and jfs
2004-05-29 1:16 ` Chris Stromsoe
@ 2004-05-29 2:32 ` Chris Stromsoe
2004-05-30 17:38 ` Marcelo Tosatti
1 sibling, 0 replies; 7+ messages in thread
From: Chris Stromsoe @ 2004-05-29 2:32 UTC
To: Dave Kleikamp; +Cc: linux-kernel, Marcelo Tosatti
On Fri, 28 May 2004, Chris Stromsoe wrote:
> Aside from that:
>
> > May 26 06:28:10 begonia kernel: __alloc_pages: 0-order allocation
> > failed (gfp=0x1f0/0)
>
> I'm curious about why 0-order allocations would fail. From everything
> I've read (google searching for the error message), that indicates an
> out of memory condition, which shouldn't be the case.
>
> The box in question has 4Gb of physical ram (512Mb is used as tmpfs) and
> 9Gb of swap. When the oops happened, no swap was in use. Physical ram
> was pretty much filled, but no swap at all. OOM_KILLER is not enabled.
>
> There's nothing especially exotic in the box. It does a lot of network
> traffic (eepro100) and a lot of disk traffic (aic7xxx). The morning
> cron jobs had just kicked off. Two of them do "find /" -- I believe
> that the second one was running when it happened.
Looking back through my mail logs, I've had (and reported) problems with
0-order allocations failing and random hangs with this same workload on 2
other machines at least as far back as early April 3, 2004, with 2.4.23.
See <http://marc.theaimsgroup.com/?l=linux-kernel&m=107835211117799&w=2>
for an earlier report.
Differences between then and now: I'm using tmpfs instead of rd,
CONFIG_HIGHIO=y is set, and I've upgraded the kernel to 2.4.26.
-Chris
* Re: oops, 2.4.26 and jfs
2004-05-29 1:16 ` Chris Stromsoe
2004-05-29 2:32 ` Chris Stromsoe
@ 2004-05-30 17:38 ` Marcelo Tosatti
2004-05-31 11:19 ` Chris Stromsoe
1 sibling, 1 reply; 7+ messages in thread
From: Marcelo Tosatti @ 2004-05-30 17:38 UTC
To: Chris Stromsoe; +Cc: Dave Kleikamp, linux-kernel
On Fri, May 28, 2004 at 06:16:22PM -0700, Chris Stromsoe wrote:
> On Fri, 28 May 2004, Dave Kleikamp wrote:
>
> > On Fri, 2004-05-28 at 15:15, Chris Stromsoe wrote:
> > > This morning during a cron run while doing a find across /, I got the
> > > following oops.
> >
> > The oops is fixed in 2.4.27-pre3 with the patch:
> > http://linux.bkbits.net:8080/linux-2.4/cset@1.1359.20.3
> >
> > jfs still may give you problems if 0-order allocations are failing, but
> > it's not supposed to trap.
>
> Thanks, patch applied.
>
>
> Aside from that:
>
> > May 26 06:28:10 begonia kernel: __alloc_pages: 0-order allocation failed
> > (gfp=0x1f0/0)
>
> I'm curious about why 0-order allocations would fail. From everything
> I've read (google searching for the error message), that indicates an out
> of memory condition, which shouldn't be the case.
>
> The box in question has 4Gb of physical ram (512Mb is used as tmpfs) and
> 9Gb of swap. When the oops happened, no swap was in use. Physical ram
> was pretty much filled, but no swap at all. OOM_KILLER is not enabled.
Hi Chris,
This seems to be a normal allocation (one that can wait), so it really
looks like the system was out of memory.
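For reference, the gfp value in the log line can be decoded bit by bit. The sketch below is a user-space helper, not kernel code. The flag values are believed to match include/linux/mm.h in late 2.4 trees, but that is an assumption and should be checked against the source actually in use. The names gfp_decode() and gfp_can_wait() are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Assumed 2.4-era flag values; verify against include/linux/mm.h. */
struct gfp_flag {
    unsigned int bit;
    const char *name;
};

static const struct gfp_flag gfp_flags[] = {
    { 0x001u, "__GFP_DMA" },
    { 0x002u, "__GFP_HIGHMEM" },
    { 0x010u, "__GFP_WAIT" },
    { 0x020u, "__GFP_HIGH" },
    { 0x040u, "__GFP_IO" },
    { 0x080u, "__GFP_HIGHIO" },
    { 0x100u, "__GFP_FS" },
};

/* Write a '|'-separated list of the flags set in mask into buf.
 * The output is truncated if buf is too small. Returns buf. */
char *gfp_decode(unsigned int mask, char *buf, size_t len)
{
    size_t i, used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof gfp_flags / sizeof gfp_flags[0]; i++) {
        int n;

        if (!(mask & gfp_flags[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", gfp_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;          /* ran out of room */
        used += (size_t)n;
    }
    return buf;
}

/* Non-zero if the caller is allowed to sleep while memory is reclaimed. */
int gfp_can_wait(unsigned int mask)
{
    return (mask & 0x010u) != 0;
}

/* Non-zero if the request may be satisfied from high memory. */
int gfp_allows_highmem(unsigned int mask)
{
    return (mask & 0x002u) != 0;
}
```

Under those assumed values, both 0x1f0 and 0x1d0 carry __GFP_WAIT, which is what makes them allocations that can wait. Neither carries __GFP_HIGHMEM, so both requests would have to be satisfied from low memory.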
Can you stick a call to show_free_areas() in mm/page_alloc.c after
	printk(KERN_NOTICE "__alloc_pages: %u-order allocation failed (gfp=0x%x/%i)\n",
		order, gfp_mask, !!(current->flags & PF_MEMALLOC));
so we know the state of the memory areas when it happens again.
Also turn on /proc/sys/vm/vm_gfp_debug.
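As a rough picture of where the extra call goes, the following user-space mock mimics the failure path. It is not the real mm/page_alloc.c: printk, KERN_NOTICE and show_free_areas() are replaced by stand-ins so the fragment compiles on its own, and alloc_pages_failed() and free_area_dumps are hypothetical names used only here.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-ins so the sketch builds outside the kernel. */
#define KERN_NOTICE ""
#define printk printf

static int free_area_dumps;     /* counts calls to the stub, for checking */

/* Stub: the real function prints the per-zone free lists. */
static void show_free_areas(void)
{
    free_area_dumps++;
    puts("(per-zone free page counts would be printed here)");
}

/* Mimics the tail of the allocator's failure path with the added call.
 * memalloc stands in for !!(current->flags & PF_MEMALLOC). */
void *alloc_pages_failed(unsigned int order, unsigned int gfp_mask,
                         int memalloc)
{
    printk(KERN_NOTICE "__alloc_pages: %u-order allocation failed (gfp=0x%x/%i)\n",
           order, gfp_mask, memalloc);
    show_free_areas();          /* the added line */
    return NULL;
}
```

In the real kernel the only change would be the single show_free_areas() line placed directly after the existing printk.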
> There's nothing especially exotic in the box. It does a lot of network
> traffic (eepro100) and a lot of disk traffic (aic7xxx). The morning cron
> jobs had just kicked off. Two of them do "find /" -- I believe that the
> second one was running when it happened.
* Re: oops, 2.4.26 and jfs
2004-05-30 17:38 ` Marcelo Tosatti
@ 2004-05-31 11:19 ` Chris Stromsoe
2004-05-31 14:05 ` Marcelo Tosatti
0 siblings, 1 reply; 7+ messages in thread
From: Chris Stromsoe @ 2004-05-31 11:19 UTC
To: Marcelo Tosatti; +Cc: linux-kernel
On Sun, 30 May 2004, Marcelo Tosatti wrote:
> On Fri, May 28, 2004 at 06:16:22PM -0700, Chris Stromsoe wrote:
>
> > Aside from that:
> >
> > > May 26 06:28:10 begonia kernel: __alloc_pages: 0-order allocation
> > > failed (gfp=0x1f0/0)
> >
> > I'm curious about why 0-order allocations would fail. From everything
> > I've read (google searching for the error message), that indicates an
> > out of memory condition, which shouldn't be the case.
> >
> > The box in question has 4Gb of physical ram (512Mb is used as tmpfs)
> > and 9Gb of swap. When the oops happened, no swap was in use.
> > Physical ram was pretty much filled, but no swap at all. OOM_KILLER
> > is not enabled.
>
> Hi Chris,
>
> This seems to be a normal allocation (which can wait), it really looks
> the system was out of memory.
Hrm. No swap was in use at all when it happened; there was >9Gb free.
Since I started watching memory usage (mid-October), I've never seen this
particular machine use more than ~ 200Mb or so of swap. The workload
generates a lot of disk access and most of the "used" memory is cache.
/proc/meminfo right now shows:
cbs:~ > cat /proc/meminfo
        total:       used:      free:  shared:  buffers:     cached:
Mem:  4175282176 4049809408  125472768        0 155308032 3207811072
Swap: 9606946816      12288 9606934528
MemTotal: 4077424 kB
MemFree: 122532 kB
MemShared: 0 kB
Buffers: 151668 kB
Cached: 3132616 kB
SwapCached: 12 kB
Active: 608876 kB
Inactive: 3019980 kB
HighTotal: 3211264 kB
HighFree: 112708 kB
LowTotal: 866160 kB
LowFree: 9824 kB
SwapTotal: 9381784 kB
SwapFree: 9381772 kB
which is pretty typical. Should it be able to OOM when there is swap
free? Or when most of the used memory is cache?
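The LowTotal/LowFree and HighTotal/HighFree lines above allow a quick per-zone comparison. This is a minimal sketch using only the figures from this meminfo snapshot. pct_free() and print_zone_summary() are hypothetical helpers, and the snapshot was taken well after the failure, so it says nothing certain about the state of memory at the time of the oops.

```c
#include <stdio.h>

/* Percentage of a zone that is free, given kB figures from /proc/meminfo. */
double pct_free(unsigned long free_kb, unsigned long total_kb)
{
    if (total_kb == 0)
        return 0.0;
    return 100.0 * (double)free_kb / (double)total_kb;
}

/* Print the low/high split for the snapshot quoted in this message. */
void print_zone_summary(void)
{
    /* LowFree / LowTotal and HighFree / HighTotal, in kB. */
    printf("low:  %.1f%% free\n", pct_free(9824UL, 866160UL));
    printf("high: %.1f%% free\n", pct_free(112708UL, 3211264UL));
}
```

With these numbers low memory comes out at roughly 1% free and high memory at roughly 3.5% free. That may be worth keeping in mind if the failing requests cannot use high memory, though the snapshot alone does not show that this is what happened.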
> Can you stick a call to show_free_areas() in mm/page_alloc.c after
>
> printk(KERN_NOTICE "__alloc_pages: %u-order allocation failed (gfp=0x%x/%i)\n",
> order, gfp_mask, !!(current->flags & PF_MEMALLOC));
>
> so we know the state of the memory areas when it happens again.
>
> Also turn on /proc/sys/vm/vm_gfp_debug.
done. I'll wait until the crash happens again. It could be a few weeks.
What will vm_gfp_debug do -- what should I look for when it happens?
-Chris
* Re: oops, 2.4.26 and jfs
2004-05-31 11:19 ` Chris Stromsoe
@ 2004-05-31 14:05 ` Marcelo Tosatti
0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2004-05-31 14:05 UTC
To: Chris Stromsoe; +Cc: linux-kernel
On Mon, May 31, 2004 at 04:19:39AM -0700, Chris Stromsoe wrote:
> On Sun, 30 May 2004, Marcelo Tosatti wrote:
>
> > On Fri, May 28, 2004 at 06:16:22PM -0700, Chris Stromsoe wrote:
> >
> > > Aside from that:
> > >
> > > > May 26 06:28:10 begonia kernel: __alloc_pages: 0-order allocation
> > > > failed (gfp=0x1f0/0)
> > >
> > > I'm curious about why 0-order allocations would fail. From everything
> > > I've read (google searching for the error message), that indicates an
> > > out of memory condition, which shouldn't be the case.
> > >
> > > The box in question has 4Gb of physical ram (512Mb is used as tmpfs)
> > > and 9Gb of swap. When the oops happened, no swap was in use.
> > > Physical ram was pretty much filled, but no swap at all. OOM_KILLER
> > > is not enabled.
> >
> > Hi Chris,
> >
> > This seems to be a normal allocation (which can wait), it really looks
> > the system was out of memory.
>
> Hrm. No swap was in use at all when it happened; there was >9Gb free.
> Since I started watching memory usage (mid-October), I've never seen this
> particular machine use more than ~ 200Mb or so of swap. The workload
> generates a lot of disk access and most of the "used" memory is cache.
> /proc/meminfo right now shows:
>
> cbs:~ > cat /proc/meminfo
>         total:       used:      free:  shared:  buffers:     cached:
> Mem:  4175282176 4049809408  125472768        0 155308032 3207811072
> Swap: 9606946816      12288 9606934528
> MemTotal: 4077424 kB
> MemFree: 122532 kB
> MemShared: 0 kB
> Buffers: 151668 kB
> Cached: 3132616 kB
> SwapCached: 12 kB
> Active: 608876 kB
> Inactive: 3019980 kB
> HighTotal: 3211264 kB
> HighFree: 112708 kB
> LowTotal: 866160 kB
> LowFree: 9824 kB
> SwapTotal: 9381784 kB
> SwapFree: 9381772 kB
>
> which is pretty typical. Should it be able to OOM when there is swap
> free? Or when most of the used memory is cache?
Should not really be able to OOM with swap free because of this check:
/**
 * out_of_memory - is the system out of memory?
 */
void out_of_memory(void)
{
	/*
	 * oom_lock protects out_of_memory()'s static variables.
	 * It's a global lock; this is not performance-critical.
	 */
	static spinlock_t oom_lock = SPIN_LOCK_UNLOCKED;
	static unsigned long first, last, count, lastkill;
	unsigned long now, since;

	/*
	 * Enough swap space left?  Not OOM.
	 */
	if (nr_swap_pages > 0)
		return;
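The check works in pages, while /proc/meminfo reports kB. This is a rough sketch of the conversion, assuming the 4 kB page size of i386 and using the SwapFree figure quoted above. swap_pages_from_kb() and oom_check_bails_out() are hypothetical names that only mimic the early return in the excerpt; they are not kernel functions.

```c
#define PAGE_KB 4UL     /* i386 page size in kB (assumed for this box) */

/* Convert a SwapFree value in kB into whole pages. */
unsigned long swap_pages_from_kb(unsigned long swap_free_kb)
{
    return swap_free_kb / PAGE_KB;
}

/* Mimics the quoted test: any free swap page means "not OOM". */
int oom_check_bails_out(unsigned long nr_swap_pages)
{
    return nr_swap_pages > 0;
}
```

With SwapFree at 9381772 kB this gives 2345443 free swap pages, so the early return would be taken and out_of_memory() would do nothing further.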
> > Can you stick a call to show_free_areas() in mm/page_alloc.c after
> >
> > printk(KERN_NOTICE "__alloc_pages: %u-order allocation failed (gfp=0x%x/%i)\n",
> > order, gfp_mask, !!(current->flags & PF_MEMALLOC));
> >
> > so we know the state of the memory areas when it happens again.
> >
> > Also turn on /proc/sys/vm/vm_gfp_debug.
>
> done. I'll wait until the crash happens again. It could be a few weeks.
> What will vm_gfp_debug do -- what should I look for when it happens?
It will print the calltrace when a memory allocation fails.
show_free_areas() should print the memory state (function used by alt+sysrq+m)