LKML Archive on lore.kernel.org
* Re: Re: Re: [Oops]  i386 mm/slab.c (cache_flusharray)
@ 2003-11-26 23:25 pinotj
  2003-12-01 23:46 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: pinotj @ 2003-11-26 23:25 UTC
  To: torvalds; +Cc: manfred, akpm, linux-kernel

Here is the result of testing 2.6.0-test10 with the printk patch in slab.c and this new patch for fork.c from Linus:

# --------------------------------------------
# 03/11/25 torvalds@home.osdl.org 1.1487
# Fix error return on concurrent fork() with threaded exit()
# --------------------------------------------

Again, the test was run under heavy load (compiling a kernel):
1st compilation: OK
2nd compilation, straight after: oops:

---
slab: double free detected in cache 'vm_area_struct', objp cd4783f8, objnr 10, slabp cd478000, s_mem cd478100, bufctl ffffffff.
------------[ cut here ]------------
kernel BUG at mm/slab.c:1956!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[free_block+357/784]    Not tainted
EIP:    0060:[<c015ad55>]    Not tainted
EFLAGS: 00010096
EIP is at free_block+0x165/0x310
eax: 00000083   ebx: 00000009   ecx: c0697854   edx: c05714f8
esi: cd478000   edi: cd478018   ebp: ceaddb78   esp: ceaddb44
ds: 007b   es: 007b   ss: 0068
Process login (pid: 222, threadinfo=ceadc000 task=ceb0c960)
Stack: c0504f40 c0502370 cd4783f8 0000000a cd478000 cd478100 ffffffff 0000000a 
       cd4783f8 00000005 cffdff08 c5b29100 00000010 ceaddbb0 c015afda cffed980 
       cffdff08 00000010 c95d7ef8 c5b29234 cffd91dc ceaddbc4 cffee788 00000010 
Call Trace:
 [cache_flusharray+218/688] cache_flusharray+0xda/0x2b0
 [<c015afda>] cache_flusharray+0xda/0x2b0
 [kmem_cache_free+429/912] kmem_cache_free+0x1ad/0x390
 [<c015b7bd>] kmem_cache_free+0x1ad/0x390
 [exit_mmap+505/688] exit_mmap+0x1f9/0x2b0
---
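
The values printed by the debugging printk can be cross-checked by hand.
The sketch below is user-space C, not kernel code. It assumes the object
index is derived as (objp - s_mem) / objsize, which is my understanding of
mm/slab.c of this era; the helper name implied_objsize is hypothetical and
the input values are copied from the oops above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the objp, s_mem and objnr values from the
 * "double free detected" line, return the object stride they imply if
 * objp is the start of an object. Returns 0 when no whole-number stride
 * fits, i.e. the offset is not a multiple of objnr.
 * Assumption: objnr == (objp - s_mem) / objsize.
 */
static uint32_t implied_objsize(uint32_t objp, uint32_t s_mem, uint32_t objnr)
{
	assert(objp >= s_mem);
	assert(objnr > 0);

	uint32_t offset = objp - s_mem;

	if (offset % objnr != 0)
		return 0;
	return offset / objnr;
}
```

With objp cd4783f8, s_mem cd478100 and objnr 10, the offset is 0x2f8 (760
bytes), so implied_objsize(0xcd4783f8, 0xcd478100, 10) gives a stride of 76
bytes. The pointer is at least consistent with being the start of object 10
in that slab.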

Full log: http://cercle-daejeon.homelinux.org/oops-full2.txt
Ksymoops: http://cercle-daejeon.homelinux.org/oops2.txt

Sorry that it doesn't work.
Maybe the best way is to find which patch between test9 and test10 makes this happen, but that would take me a really long time and I have no idea how to choose. I already tested the files in the mm subdirectory, without success.
As I already said, the problem first appeared in cset-20031115_0206.txt.gz.
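
One systematic way to choose is to bisect the ordered list of snapshots
between the last known-good and the first known-bad kernel. The following
is a minimal sketch of that search, assuming the snapshots can be numbered
in order; first_bad_snapshot, is_bad_fn and bad_from_threshold are
hypothetical names. In practice the callback would stand for "build, boot,
compile a kernel twice and check for the oops".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if snapshot `idx` shows the oops. */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Binary search for the first bad snapshot. Snapshot `good` must be
 * known good and snapshot `bad` known bad; the result lies in
 * (good, bad] and is found in about log2(bad - good) tests.
 */
static size_t first_bad_snapshot(size_t good, size_t bad,
				 is_bad_fn is_bad, void *ctx)
{
	assert(good < bad);

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* regression is at or before mid */
		else
			good = mid;	/* regression is after mid */
	}
	return bad;
}

/* Stand-in for a real test run: everything from *ctx onwards is bad. */
static int bad_from_threshold(size_t idx, void *ctx)
{
	return idx >= *(const size_t *)ctx;
}
```

Because this oops does not trigger on every run (the first compilation
passed), a "good" verdict is less trustworthy than a "bad" one, so each
candidate probably needs several test runs before it is marked good.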

Regards,

Jerome Pinot






* Re: Re: Re: [Oops]  i386 mm/slab.c (cache_flusharray)
  2003-11-26 23:25 Re: Re: [Oops] i386 mm/slab.c (cache_flusharray) pinotj
@ 2003-12-01 23:46 ` Linus Torvalds
  0 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2003-12-01 23:46 UTC
  To: pinotj; +Cc: manfred, akpm, linux-kernel



On Thu, 27 Nov 2003 pinotj@club-internet.fr wrote:
>
> Here is the result of test of 2.6.0-test10 with the printk patch in
> slab.c and this new patch for fork.c from Linus :

The fork.c change can really only affect threaded programs using the new
threading, and even then is likely to hit only in very unlikely
circumstances. Certainly not a kernel compile.

I'm wondering if the slab debugging code is just broken somehow. If you
have lots of memory, it should even work for you.

NOTE! For this patch to make sense, you have to enable the page allocator
debugging thing (CONFIG_DEBUG_PAGEALLOC), and you have to live with the
fact that it wastes a _lot_ of memory.

There's another problem with this patch: if the bug is actually in the
slab code itself, this will obviously not find it, since it disables that
code entirely.

		Linus

----
===== mm/slab.c 1.110 vs edited =====
--- 1.110/mm/slab.c	Tue Oct 21 22:10:10 2003
+++ edited/mm/slab.c	Mon Dec  1 15:29:06 2003
@@ -1906,6 +1906,21 @@

 static inline void * __cache_alloc (kmem_cache_t *cachep, int flags)
 {
+#if 1
+	void *ptr = (void*)__get_free_pages(flags, cachep->gfporder);
+	if (ptr) {
+		struct page *page = virt_to_page(ptr);
+		SET_PAGE_CACHE(page, cachep);
+		SET_PAGE_SLAB(page, 0x01020304);
+		if (cachep->ctor) {
+			unsigned long ctor_flags = SLAB_CTOR_CONSTRUCTOR;
+			if (!(flags & __GFP_WAIT))
+				ctor_flags |= SLAB_CTOR_ATOMIC;
+			cachep->ctor(ptr, cachep, ctor_flags);
+		}
+	}
+	return ptr;
+#else
 	unsigned long save_flags;
 	void* objp;
 	struct array_cache *ac;
@@ -1925,6 +1940,7 @@
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, __builtin_return_address(0));
 	return objp;
+#endif
 }

 /*
@@ -2042,6 +2058,15 @@
  */
 static inline void __cache_free (kmem_cache_t *cachep, void* objp)
 {
+#if 1
+	{
+		struct page *page = virt_to_page(objp);
+		int order = cachep->gfporder;
+		if (cachep->dtor)
+			cachep->dtor(objp, cachep, 0);
+		__free_pages(page, order);
+	}
+#else
 	struct array_cache *ac = ac_data(cachep);

 	check_irq_off();
@@ -2056,6 +2081,7 @@
 		cache_flusharray(cachep, ac);
 		ac_entry(ac)[ac->avail++] = objp;
 	}
+#endif
 }

 /**
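
The idea behind the patch, giving every object its own pages so that the
page allocator debugging can catch accesses after free, can be illustrated
outside the kernel. This is a user-space analogy only, not the kernel
mechanism: it assumes a POSIX system with mmap() and MAP_ANONYMOUS, and the
names round_up_pages, page_alloc and page_free are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a byte count up to a whole number of pages. */
static size_t round_up_pages(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	return (size + page - 1) / page * page;
}

/* Allocate an object on its own private page(s); NULL on failure. */
static void *page_alloc(size_t size)
{
	void *p;

	assert(size > 0);
	p = mmap(NULL, round_up_pages(size), PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Unmap the object's page(s). A later access through a stale pointer
 * then faults instead of silently corrupting a neighbouring object.
 * Returns 0 on success, -1 on error (as munmap does).
 */
static int page_free(void *p, size_t size)
{
	return munmap(p, round_up_pages(size));
}
```

The cost is the one Linus warns about: every object, however small, takes
at least one full page, which is why this approach needs a lot of memory.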


* Re: Re: Re: [Oops]  i386 mm/slab.c (cache_flusharray)
@ 2003-12-02 23:45 pinotj
  0 siblings, 0 replies; 4+ messages in thread
From: pinotj @ 2003-12-02 23:45 UTC
  To: torvalds; +Cc: manfred, akpm, linux-kernel, nathans




----Original message----
>Date: Mon, 1 Dec 2003 16:36:33 -0800 (PST)
>From: Linus Torvalds <torvalds@osdl.org>
>To: pinotj@club-internet.fr
>Cc: manfred@colorfullife.com, Andrew Morton <akpm@osdl.org>,
>Subject: Re: Re: [Oops]  i386 mm/slab.c (cache_flusharray)
>
>
>
>On Sat, 29 Nov 2003 pinotj@club-internet.fr wrote:
>>
>> I triggered the slab oops with a very small kernel -test11 (~700KB):
>
>The only thing that looks at _all_ likely to explain the problem is
>
>> CONFIG_XFS_FS=y
>
>since there aren't that many XFS users I know of. It's also now the only
>thing that uses buffer heads in your config, so..
>
>I assume it's not an option to try another filesystem on this setup, but
>it's entirely possible that the 2.6.x buffer-head removal has impacted XFS
>negatively - although I'm a bit surprised at how easily you seem to show
>problems, since XFS actually has active maintenance.
[...]

Yes, I use XFS on my LFS system, but I had some problems with my Slackware install on ext3 too. That usually happened under X, with settings that are not well suited to debugging (and with the nvidia driver loaded), so I didn't file bug reports. I will try to confirm the problem under ext3.

Jerome



* Re: Re: Re: [Oops]  i386 mm/slab.c (cache_flusharray)
@ 2003-11-22  7:47 pinotj
  0 siblings, 0 replies; 4+ messages in thread
From: pinotj @ 2003-11-22  7:47 UTC
  To: torvalds; +Cc: akpm, manfred, linux-kernel

>> Summary: Oops reproducible under heavy load, bug in mm/slab.c
>
>Do you have CONFIG_PREEMPT on, and if so, does it go away if you compile
>without PREEMPT? We have at least one other bug that seems to be dependent
>on CONFIG_PREEMPT.
>
>		Linus

Yes, I have CONFIG_PREEMPT=y in my .config.
I will try without it next time.

Here is the result of what Manfred asked for.
In my logs, I found:
---
Nov 21 05:46:48 gegenux kernel: slab: double free detected in cache 'buffer_head', objp c4c8e3d8, objnr 10, slabp c4c8e000,
Nov 21 05:46:49 gegenux kernel: slab: double free detected in cache 'buffer_head', objp c9a582ac, objnr 5, slabp c9a58000,
Nov 21 07:01:50 gegenux kernel: slab: double free detected in cache 'pte_chain', objp c18a6600, objnr 10, slabp c18a6000,
---

So the objnr can differ, even within the same oops (look at the timestamps). The slabp always ends in 0xXXXXX000.
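
A slabp ending in 000 is what one would expect if the slab descriptor sits
at the start of the page that holds the objects. That is an assumption
here: single-page slabs with the descriptor stored on the slab itself, and
4 KiB pages on i386. The sketch below is user-space C with hypothetical
names (I386_PAGE_SIZE, page_base, slabp_matches_page) and checks that
relation against the three log lines above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size on i386. */
#define I386_PAGE_SIZE 4096u

/* Address of the start of the page containing addr. */
static uint32_t page_base(uint32_t addr)
{
	return addr & ~(I386_PAGE_SIZE - 1u);
}

/* Nonzero if slabp is the start of the page that contains objp. */
static int slabp_matches_page(uint32_t objp, uint32_t slabp)
{
	assert(slabp != 0);
	return page_base(objp) == slabp;
}
```

All three pairs from the log satisfy the check: c4c8e3d8 with c4c8e000,
c9a582ac with c9a58000, and c18a6600 with c18a6000.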

I compiled again with the patch you gave.
First compilation (kernel): no freeze, just an error from `as`, but I got this in the logs:
---
slab error in cache_free_debugcheck(): cache `bio': double free, or memory outside object was overwritten
Call Trace:
 [kmem_cache_free+687/912] kmem_cache_free+0x2af/0x390
 [<c015b8cf>] kmem_cache_free+0x2af/0x390
 [mempool_free+224/544] mempool_free+0xe0/0x220
 [<c01535d0>] mempool_free+0xe0/0x220
 [mempool_free+224/544] mempool_free+0xe0/0x220
 [<c01535d0>] mempool_free+0xe0/0x220
 [kernel_map_pages+40/144] kernel_map_pages+0x28/0x90
 [<c0122bd8>] kernel_map_pages+0x28/0x90
 [bio_destructor+57/96] bio_destructor+0x39/0x60
 [<c01816f9>] bio_destructor+0x39/0x60
 [bio_put+41/64] bio_put+0x29/0x40
 [<c01818e9>] bio_put+0x29/0x40
 [end_bio_bh_io_sync+56/64] end_bio_bh_io_sync+0x38/0x40
 [<c0180e18>] end_bio_bh_io_sync+0x38/0x40
 [bio_endio+77/128] bio_endio+0x4d/0x80
--cut--
 [background_writeout+0/176] background_writeout+0x0/0xb0
 [<c01568a0>] background_writeout+0x0/0xb0
 [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
 [<c01092a9>] kernel_thread_helper+0x5/0xc

c6fd7870: redzone 1: 0x170fc2a5, redzone 2: 0x160fc2a5.
---
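
The two red-zone words in this report, 0x170fc2a5 and 0x160fc2a5, are
almost identical, and it can help to see exactly how they differ. The
sketch below is user-space C with hypothetical helper names. As far as I
recall, 0x170fc2a5 is the RED_ACTIVE marker in mm/slab.c of this era, but
treat that as an assumption; the arithmetic itself only uses the two
values from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bit positions in which a and b differ. */
static unsigned differing_bits(uint32_t a, uint32_t b)
{
	uint32_t x = a ^ b;
	unsigned n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Index of the lowest bit in which a and b differ; a must not equal b. */
static unsigned lowest_differing_bit(uint32_t a, uint32_t b)
{
	uint32_t x = a ^ b;
	unsigned i = 0;

	assert(x != 0);
	while (!(x & 1u)) {
		x >>= 1;
		i++;
	}
	return i;
}
```

For these two values the XOR is 0x01000000, so they differ in exactly one
bit, bit 24. What cleared that bit cannot be told from this log alone.
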
The system looked OK, so I tried a second compilation right afterwards, and this time I got an oops:
---
slab: double free detected in cache 'buffer_head', objp cc3f9798, objnr 26, slabp cc3f9000, s_mem cc3f9180 bufctl f7ffffff.

mm/slab.c:1777: spin_lock(mm/slab.c:cffed844) already locked by mm/slab.c/1994
---cut---
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
kernel BUG at mm/slab.c:1956!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[free_block+363/784]    Not tainted
EIP:    0060:[<c015ad6b>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010092
eax: 0000007f   ebx: 0000000a   ecx: c06973dc   edx: c05712f8
esi: cc3f9000   edi: cc3f9018   ebp: cf821c68   esp: cf821c34
ds: 007b   es: 007b   ss: 0068
Stack: c0504d00 c05058fd cc3f9798 0000001a cc3f9000 cc3f9180 f7ffffff 0000001a
       cc3f9798 0000000b cffdef08 c3bb8180 00000010 cf821ca0 c015afea cffed800
       cffdef08 00000010 cf821ce8 c010cabc 80010c00 cd37e000 cffee730 00000010
Call Trace:
 [<c015afea>] cache_flusharray+0xda/0x2b0
 [<c010cabc>] common_interrupt+0x18/0x20
 [<c015b7cd>] kmem_cache_free+0x1ad/0x39
---cut---

You can find the complete log here:
http://cercle-daejeon.homelinux.org/oops-log.txt

Hope this will help...

Jerome


