LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* kernel BUGs when removing largish files with the SLOB allocator
@ 2006-09-04  6:56 Aubrey
  2006-09-04 10:21 ` David Howells
  0 siblings, 1 reply; 23+ messages in thread
From: Aubrey @ 2006-09-04  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: mpm, dhowells, davidm, gerg

Hi all,

I'm working on the nommu blackfin uclinux(2.6.16) platform and  get a
kernel BUG! at mm/nommu.c:124 when removing largish file with the SLOB
allocator.

root:~> df -k
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mtdblock3            3008      1264      1744  42% /
root:~> cp /bin/busybox /busy
root:~> df -k
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mtdblock3            3008      1532      1476  51% /
root:~> ls -l /bin/busybox /busy
-rwxr-xr-x    1 0        0          423904 /bin/busybox
-rwxr-xr-x    1 0        0          423904 /busy
root:~> md5sum /bin/busybox
7db253a2259ab71bc854c9e5dac544d6  /bin/busybox
root:~> md5sum /busy
7db253a2259ab71bc854c9e5dac544d6  /busy
root:~> rm /busy
kernel BUG at mm/nommu.c:124!
Kernel panic - not syncing: BUG!

Bug comes from  mm/nommu.c:
=======================================
        if (!objp || !((page = virt_to_page(objp))) ||
           (unsigned long)objp >= memory_end)
                return 0;

        if (PageSlab(page))
                return ksize(objp);

        BUG_ON(page->index < 0);
124:    BUG_ON(page->index >= MAX_ORDER);
=======================================

This seems that the SLOB allocator doesn't set the SLAB page flag
while nommu.c seem to be written for SLAB only.

On my side the following patch seems to work around the issue
============================================================
--- nommu.c     2006-06-26 14:49:28.000000000 +0800
+++ nommu.c.new 2006-06-26 14:47:20.000000000 +0800
@@ -18,7 +18,9 @@
 #include <linux/file.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
-#include <linux/slab.h>
+#ifdef CONFIG_SLAB
+# include <linux/slab.h>
+#endif
 #include <linux/vmalloc.h>
 #include <linux/ptrace.h>
 #include <linux/blkdev.h>
@@ -117,7 +119,9 @@
        if (!objp || !((page = virt_to_page(objp))) || (unsigned
long)objp >= memory_end)
                return 0;

+#ifdef CONFIG_SLAB
        if (PageSlab(page))
+#endif
                return ksize(objp);

        BUG_ON(page->index < 0);
============================================================

Is there any solution/patch to fix the issue?

Any suggestions are really appreciated.

Thanks,
-Aubrey

-- 
VGER BF report: U 0.498985

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator 
  2006-09-04  6:56 kernel BUGs when removing largish files with the SLOB allocator Aubrey
@ 2006-09-04 10:21 ` David Howells
  2006-09-05  3:52   ` Aubrey
  2006-09-05  9:35   ` David Howells
  0 siblings, 2 replies; 23+ messages in thread
From: David Howells @ 2006-09-04 10:21 UTC (permalink / raw)
  To: Aubrey; +Cc: linux-kernel, mpm, dhowells, davidm, gerg


Aubrey <aubreylee@gmail.com> wrote:

> Is there any solution/patch to fix the issue?

Make the SLOB allocator mark its pages PG_slab, just like the SLAB allocator
does.  I think this should be okay as the SLOB allocator and the SLAB
allocator seem to be mutually exclusive.

Using PG_slab would also give an instant check to things like SLOB's kfree().

> +#ifdef CONFIG_SLAB
>        if (PageSlab(page))
> +#endif

This is not a valid workaround as the object won't necessarily have been
allocated from a slab (shared ramfs mappings and SYSV SHM for example).  You
may not pass to ksize() objects allocated by means other than SLAB/SLOB.

David

-- 
VGER BF report: H 1.12398e-05

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-04 10:21 ` David Howells
@ 2006-09-05  3:52   ` Aubrey
  2006-09-05  9:35   ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: Aubrey @ 2006-09-05  3:52 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, mpm, davidm, gerg

IMHO the problem is nommu.c is written for slab only. So when slob is
enabled, it need to be considered to make some modification to make
two or more memory allocator algorithms work properly, rather than to
force all others algorithm to be compatible with the current one(slab)
to match the code in the nommu.c, which is not common enough.

Does that make sense?

Regards,
-Aubrey

On 9/4/06, David Howells <dhowells@redhat.com> wrote:
>
> Aubrey <aubreylee@gmail.com> wrote:
>
> > Is there any solution/patch to fix the issue?
>
> Make the SLOB allocator mark its pages PG_slab, just like the SLAB allocator
> does.  I think this should be okay as the SLOB allocator and the SLAB
> allocator seem to be mutually exclusive.
>
> Using PG_slab would also give an instant check to things like SLOB's kfree().
>
> > +#ifdef CONFIG_SLAB
> >        if (PageSlab(page))
> > +#endif
>
> This is not a valid workaround as the object won't necessarily have been
> allocated from a slab (shared ramfs mappings and SYSV SHM for example).  You
> may not pass to ksize() objects allocated by means other than SLAB/SLOB.
>
> David
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-04 10:21 ` David Howells
  2006-09-05  3:52   ` Aubrey
@ 2006-09-05  9:35   ` David Howells
  2006-09-06  2:35     ` Aubrey
  1 sibling, 1 reply; 23+ messages in thread
From: David Howells @ 2006-09-05  9:35 UTC (permalink / raw)
  To: Aubrey; +Cc: David Howells, linux-kernel, mpm, davidm, gerg

Aubrey <aubreylee@gmail.com> wrote:

> IMHO the problem is nommu.c is written for slab only. So when slob is
> enabled, it need to be considered to make some modification to make
> two or more memory allocator algorithms work properly, rather than to
> force all others algorithm to be compatible with the current one(slab)
> to match the code in the nommu.c, which is not common enough.
>
> Does that make sense?

No, not really.

The point is that kobjsize() needs to determine the size of the object it has
been asked to assess.  It knows how to do that directly if the page is
allocated by the main page allocator, but not if the page belongs to the slab
allocator.  The quickest way it can determine this is to look at PG_slab.  In
such a case it defers to the slab allocator for a determination.

What I don't want to happen is that we have to defer immediately to the slob
allocator which then goes and searches various lists to see if it owns the
page.  Remember: unless the page is _marked_ as belonging to the slob
allocator, the slob allocator may _not_ assume any of the metadata in struct
page is valid slob metadata.  It _has_ to determine the validity of the page
by other means _before_ it can use the metadata, and that most likely means a
search.  This is why PG_slab exists: if it is set, you _know_ you can
instantly trust the metadata.

Since slob appears to be an entry-point-by-entry-point replacement for the
slab allocator, the slob allocator can also mark its pages for anything that's
looking to defer to it using PG_slab since the presence of slab and slob are
mutually exclusive.

Also, we already have two major memory allocator algorithms in the kernel at
any one time: (1) the main page allocator and (2) slab or slob.  We don't
really want to start going to three or more.


So, I come back to the main question: Why don't you want to use PG_slab?

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-05  9:35   ` David Howells
@ 2006-09-06  2:35     ` Aubrey
  2006-09-06  3:36       ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Aubrey @ 2006-09-06  2:35 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel, mpm, davidm, gerg

Yeah, I agree with most of your opinion. Using PG_slab is really a
quickest way to determine the size of the object. But I think using a
flag named "PG_slab" on a memory algorithm named "slob" seems not
reasonable. It may confuse the people who start to read the kernel
source code. So I'm writing to ask if there is a better solution to
fix the issue.

-Aubrey

On 9/5/06, David Howells <dhowells@redhat.com> wrote:
> Aubrey <aubreylee@gmail.com> wrote:
>
> > IMHO the problem is nommu.c is written for slab only. So when slob is
> > enabled, it need to be considered to make some modification to make
> > two or more memory allocator algorithms work properly, rather than to
> > force all others algorithm to be compatible with the current one(slab)
> > to match the code in the nommu.c, which is not common enough.
> >
> > Does that make sense?
>
> No, not really.
>
> The point is that kobjsize() needs to determine the size of the object it has
> been asked to assess.  It knows how to do that directly if the page is
> allocated by the main page allocator, but not if the page belongs to the slab
> allocator.  The quickest way it can determine this is to look at PG_slab.  In
> such a case it defers to the slab allocator for a determination.
>
> What I don't want to happen is that we have to defer immediately to the slob
> allocator which then goes and searches various lists to see if it owns the
> page.  Remember: unless the page is _marked_ as belonging to the slob
> allocator, the slob allocator may _not_ assume any of the metadata in struct
> page is valid slob metadata.  It _has_ to determine the validity of the page
> by other means _before_ it can use the metadata, and that most likely means a
> search.  This is why PG_slab exists: if it is set, you _know_ you can
> instantly trust the metadata.
>
> Since slob appears to be an entry-point-by-entry-point replacement for the
> slab allocator, the slob allocator can also mark its pages for anything that's
> looking to defer to it using PG_slab since the presence of slab and slob are
> mutually exclusive.
>
> Also, we already have two major memory allocator algorithms in the kernel at
> any one time: (1) the main page allocator and (2) slab or slob.  We don't
> really want to start going to three or more.
>
>
> So, I come back to the main question: Why don't you want to use PG_slab?
>
> David
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-06  2:35     ` Aubrey
@ 2006-09-06  3:36       ` Nick Piggin
  2006-09-12  8:07         ` Aubrey
  2006-09-12  8:54         ` David Howells
  0 siblings, 2 replies; 23+ messages in thread
From: Nick Piggin @ 2006-09-06  3:36 UTC (permalink / raw)
  To: Aubrey; +Cc: David Howells, linux-kernel, mpm, davidm, gerg

Aubrey wrote:

> Yeah, I agree with most of your opinion. Using PG_slab is really a
> quickest way to determine the size of the object. But I think using a
> flag named "PG_slab" on a memory algorithm named "slob" seems not
> reasonable. It may confuse the people who start to read the kernel
> source code. So I'm writing to ask if there is a better solution to
> fix the issue.


No, confusing would be a "slab replacement" that doesn't provide the same
API as slab and thus requires users to use ifdefs.

I've already suggested exact same thing as David in the exact same situation
about 6 months ago. It is the right way to go.

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-06  3:36       ` Nick Piggin
@ 2006-09-12  8:07         ` Aubrey
  2006-09-12 17:43           ` Matt Mackall
                             ` (2 more replies)
  2006-09-12  8:54         ` David Howells
  1 sibling, 3 replies; 23+ messages in thread
From: Aubrey @ 2006-09-12  8:07 UTC (permalink / raw)
  To: Nick Piggin; +Cc: David Howells, linux-kernel, mpm, davidm, gerg

On 9/6/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Aubrey wrote:
>
> > Yeah, I agree with most of your opinion. Using PG_slab is really a
> > quickest way to determine the size of the object. But I think using a
> > flag named "PG_slab" on a memory algorithm named "slob" seems not
> > reasonable. It may confuse the people who start to read the kernel
> > source code. So I'm writing to ask if there is a better solution to
> > fix the issue.
>
>
> No, confusing would be a "slab replacement" that doesn't provide the same
> API as slab and thus requires users to use ifdefs.
>
> I've already suggested exact same thing as David in the exact same situation
> about 6 months ago. It is the right way to go.

OK. Here is the patch and work properly on my side.
Welcome any suggestions and comments.

Thanks,
-Aubrey
=================================================================
diff --git a/mm/slob.c b/mm/slob.c
index 7b52b20..18ed170 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -109,6 +109,7 @@ static void *slob_alloc(size_t size, gfp

 			slob_free(cur, PAGE_SIZE);
 			spin_lock_irqsave(&slob_lock, flags);
+			SetPageSlab(virt_to_page(cur));
 			cur = slobfree;
 		}
 	}
@@ -162,6 +163,8 @@ void *kmalloc(size_t size, gfp_t gfp)
 	slob_t *m;
 	bigblock_t *bb;
 	unsigned long flags;
+	int i;
+	struct page *page;

 	if (size < PAGE_SIZE - SLOB_UNIT) {
 		m = slob_alloc(size + SLOB_UNIT, gfp, 0);
@@ -173,12 +176,17 @@ void *kmalloc(size_t size, gfp_t gfp)
 		return 0;

 	bb->order = find_order(size);
-	bb->pages = (void *)__get_free_pages(gfp, bb->order);
+	page = alloc_pages(gfp, bb->order);
+	bb->pages = (void *)page_address(page);

 	if (bb->pages) {
 		spin_lock_irqsave(&block_lock, flags);
 		bb->next = bigblocks;
 		bigblocks = bb;
+		for (i = 0; i < (1 << bb->order); i++) {
+			SetPageSlab(page);
+			page++;
+		}
 		spin_unlock_irqrestore(&block_lock, flags);
 		return bb->pages;
 	}
@@ -202,8 +210,16 @@ void kfree(const void *block)
 		spin_lock_irqsave(&block_lock, flags);
 		for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
 			if (bb->pages == block) {
+				struct page *page = virt_to_page(bb->pages);
+				int i;
+
 				*last = bb->next;
 				spin_unlock_irqrestore(&block_lock, flags);
+				for (i = 0; i < (1 << bb->order); i++) {
+					if (!TestClearPageSlab(page))
+						BUG();
+					page++;
+				}
 				free_pages((unsigned long)block, bb->order);
 				slob_free(bb, sizeof(bigblock_t));
 				return;
@@ -332,10 +348,12 @@ static struct timer_list slob_timer = TI

 void kmem_cache_init(void)
 {
+#if 0
 	void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);

 	if (p)
 		free_page((unsigned long)p);
+#endif

 	mod_timer(&slob_timer, jiffies + HZ);
 }

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator 
  2006-09-06  3:36       ` Nick Piggin
  2006-09-12  8:07         ` Aubrey
@ 2006-09-12  8:54         ` David Howells
  2006-09-12 10:53           ` Aubrey
  1 sibling, 1 reply; 23+ messages in thread
From: David Howells @ 2006-09-12  8:54 UTC (permalink / raw)
  To: Aubrey; +Cc: Nick Piggin, David Howells, linux-kernel, mpm, davidm, gerg

Aubrey <aubreylee@gmail.com> wrote:

> OK. Here is the patch and work properly on my side.
> Welcome any suggestions and comments.

It looks reasonable.  Don't forget to sign off the patch.

> void kmem_cache_init(void)
> {
> +#if 0
> 	void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
> 
> 	if (p)
> 		free_page((unsigned long)p);
> +#endif

Any idea what that's about?

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12  8:54         ` David Howells
@ 2006-09-12 10:53           ` Aubrey
  0 siblings, 0 replies; 23+ messages in thread
From: Aubrey @ 2006-09-12 10:53 UTC (permalink / raw)
  To: David Howells; +Cc: Nick Piggin, linux-kernel, mpm, davidm, gerg

On 9/12/06, David Howells <dhowells@redhat.com> wrote:
> Aubrey <aubreylee@gmail.com> wrote:
>
> > OK. Here is the patch and work properly on my side.
> > Welcome any suggestions and comments.
>
> It looks reasonable.  Don't forget to sign off the patch.
>
> > void kmem_cache_init(void)
> > {
> > +#if 0
> >       void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
> >
> >       if (p)
> >               free_page((unsigned long)p);
> > +#endif
>
> Any idea what that's about?
>
IMHO kmem_cache_init() needn't to do anything for the slob allocation.

In addition, the original code allocate one page and then free it.
Here, with the patch, slob_alloc() will set the SLAB flag on the page,
and the page with this flag can't pass the free_pages_check(), if so,
the kernel will run into bad_page() panic.

So, I removed this piece of code for the draft of the patch.
Welcome any better idea.

Regards,
-Aubrey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12  8:07         ` Aubrey
@ 2006-09-12 17:43           ` Matt Mackall
  2006-09-12 19:10           ` David Howells
  2006-09-12 19:25           ` David Howells
  2 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2006-09-12 17:43 UTC (permalink / raw)
  To: Aubrey; +Cc: Nick Piggin, David Howells, linux-kernel, davidm, gerg

On Tue, Sep 12, 2006 at 04:07:03PM +0800, Aubrey wrote:
> On 9/6/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >Aubrey wrote:
> >
> >> Yeah, I agree with most of your opinion. Using PG_slab is really a
> >> quickest way to determine the size of the object. But I think using a
> >> flag named "PG_slab" on a memory algorithm named "slob" seems not
> >> reasonable. It may confuse the people who start to read the kernel
> >> source code. So I'm writing to ask if there is a better solution to
> >> fix the issue.
> >
> >
> >No, confusing would be a "slab replacement" that doesn't provide the same
> >API as slab and thus requires users to use ifdefs.
> >
> >I've already suggested exact same thing as David in the exact same 
> >situation
> >about 6 months ago. It is the right way to go.
> 
> OK. Here is the patch and work properly on my side.
> Welcome any suggestions and comments.
> 
> Thanks,
> -Aubrey
> =================================================================
> diff --git a/mm/slob.c b/mm/slob.c
> index 7b52b20..18ed170 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -109,6 +109,7 @@ static void *slob_alloc(size_t size, gfp
> 
> 			slob_free(cur, PAGE_SIZE);
> 			spin_lock_irqsave(&slob_lock, flags);
> +			SetPageSlab(virt_to_page(cur));
> 			cur = slobfree;
> 		}
> 	}
> @@ -162,6 +163,8 @@ void *kmalloc(size_t size, gfp_t gfp)
> 	slob_t *m;
> 	bigblock_t *bb;
> 	unsigned long flags;
> +	int i;
> +	struct page *page;
> 
> 	if (size < PAGE_SIZE - SLOB_UNIT) {
> 		m = slob_alloc(size + SLOB_UNIT, gfp, 0);
> @@ -173,12 +176,17 @@ void *kmalloc(size_t size, gfp_t gfp)
> 		return 0;
> 
> 	bb->order = find_order(size);
> -	bb->pages = (void *)__get_free_pages(gfp, bb->order);
> +	page = alloc_pages(gfp, bb->order);
> +	bb->pages = (void *)page_address(page);

Useless (void *) casts are out of fashion. You shouldn't copy mine.

> 	if (bb->pages) {
> 		spin_lock_irqsave(&block_lock, flags);
> 		bb->next = bigblocks;
> 		bigblocks = bb;
> +		for (i = 0; i < (1 << bb->order); i++) {
> +			SetPageSlab(page);
> +			page++;
> +		}

for ( ; page < page + (1 << bb->order), page++)
      SetPageSlab(page);

I'm not sure it even makes sense to mark the bigblock pages,
especially for the purposes of kobjsize. But see below.

> 		spin_unlock_irqrestore(&block_lock, flags);
> 		return bb->pages;
> 	}
> @@ -202,8 +210,16 @@ void kfree(const void *block)
> 		spin_lock_irqsave(&block_lock, flags);
> 		for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
> 			if (bb->pages == block) {
> +				struct page *page = virt_to_page(bb->pages);
> +				int i;
> +
> 				*last = bb->next;
> 				spin_unlock_irqrestore(&block_lock, flags);
> +				for (i = 0; i < (1 << bb->order); i++) {
> +					if (!TestClearPageSlab(page))
> +						BUG();
> +					page++;
> +				}

Please drop the BUG. We've already established it's on our lists by
this point.

> void kmem_cache_init(void)
> {
> +#if 0
> 	void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
> 
> 	if (p)
> 		free_page((unsigned long)p);
> +#endif
> 
> 	mod_timer(&slob_timer, jiffies + HZ);
> }

You just broke the bit that shrinks the arena.


I don't like this patch. We've just grown SLOB by about 10% everywhere
to make nommu happy. Is this needed because nommu is doing something
special for MMU-less machines or because it's doing something bogus?

To my eyes, the answer seems to clearly be the latter. There's nothing
nommu-specific about kobjsize at all, so it ought to be in the main
kernel if it's actually kosher. But kobjsize tries to guess the size
of an object of unknown type, which is a very suspicious business. If
you've got a pointer, I would propose you ought to know what type it
is, or ought not mess with it at all.

Looking through all the users of kobjsize, it seems we always know
what the type is (and it's usually a VMA). I instead propose we use
ksize on objects we know to be SLAB/SLOB-allocated and add a new
function (kpagesize?) to size other objects where nommu needs it.

As for whether SLOB should emulate PG_slab on general principles, as
far as I can tell, PG_slab should be considered a private debugging
aid to SLAB. All the other users look rather bogus to my eye.

If anything, SLOB should internally abuse PG_slab to mark big pages
and dispense with its bigblocks lists.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator 
  2006-09-12  8:07         ` Aubrey
  2006-09-12 17:43           ` Matt Mackall
@ 2006-09-12 19:10           ` David Howells
  2006-09-12 20:51             ` Matt Mackall
                               ` (2 more replies)
  2006-09-12 19:25           ` David Howells
  2 siblings, 3 replies; 23+ messages in thread
From: David Howells @ 2006-09-12 19:10 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Aubrey, Nick Piggin, David Howells, linux-kernel, davidm, gerg

Matt Mackall <mpm@selenic.com> wrote:

> > +		for (i = 0; i < (1 << bb->order); i++) {
> > +			SetPageSlab(page);
> > +			page++;
> > +		}
> 
> for ( ; page < page + (1 << bb->order), page++)
>       SetPageSlab(page);

Ugh.  No.  You can't do that.  "page < page + X" will be true until "page + X"
wraps the end of memory.

> > +				for (i = 0; i < (1 << bb->order); i++) {
> > +					if (!TestClearPageSlab(page))
> > +						BUG();
> > +					page++;
> > +				}
> 
> Please drop the BUG. We've already established it's on our lists by
> this point.

I disagree.  Let's catch accidental reuse of pages.  It should, however, be
marked unlikely().

> You just broke the bit that shrinks the arena.

How?  This is only called once when things are being initialised.  There can
be no SLOB objects allocated prior to that point.

> I don't like this patch. We've just grown SLOB by about 10% everywhere
> to make nommu happy. Is this needed because nommu is doing something
> special for MMU-less machines or because it's doing something bogus?

See preceding discussion on LKML.

kobjsize() needs to work out how to calculate the size of an object.  One of
the main classes of object it might be given is slab/slob objects.  Either
these should be marked with PG_slab as per the slab allocator, of kobjsize()
has to go and walk all the slab allocator or slob allocator lists to find out
whether this page belongs to them.  Only then can it know whether or not it
can trust the page metadata as slab/slob metadata or not.

> Looking through all the users of kobjsize, it seems we always know
> what the type is (and it's usually a VMA). I instead propose we use
> ksize on objects we know to be SLAB/SLOB-allocated and add a new
> function (kpagesize?) to size other objects where nommu needs it.

Yes, we might have a VMA, but no, we do not know how big the object we've
_actually_ been given by whoever is.  kmalloc() doesn't tell us that and
get_unmapped_area() doesn't tell us that but we want it to account correctly
for the ammount of memory allocated.

You say "use ksize on objects we know to be SLAB/SLOB-allocated" which is the
whole point of PG_slab.

> As for whether SLOB should emulate PG_slab on general principles, as
> far as I can tell, PG_slab should be considered a private debugging
> aid to SLAB. All the other users look rather bogus to my eye.
>
> If anything, SLOB should internally abuse PG_slab to mark big pages
> and dispense with its bigblocks lists.

It's not abuse.  SLOB is a drop-in replacement for the SLAB allocator,
therefore PG_slab belongs to it instead of SLAB.


Anyway, if you've got a better idea, then patch please.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator 
  2006-09-12  8:07         ` Aubrey
  2006-09-12 17:43           ` Matt Mackall
  2006-09-12 19:10           ` David Howells
@ 2006-09-12 19:25           ` David Howells
  2006-09-12 20:28             ` Matt Mackall
                               ` (2 more replies)
  2 siblings, 3 replies; 23+ messages in thread
From: David Howells @ 2006-09-12 19:25 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Aubrey, Nick Piggin, David Howells, linux-kernel, davidm, gerg

Matt Mackall <mpm@selenic.com> wrote:

> Looking through all the users of kobjsize, it seems we always know
> what the type is (and it's usually a VMA). I instead propose we use
> ksize on objects we know to be SLAB/SLOB-allocated and add a new
> function (kpagesize?) to size other objects where nommu needs it.

It sounds like we'd need an op in the VMA to do the per-type size thing (the
VMA itself not the VMA ops table).

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12 19:25           ` David Howells
@ 2006-09-12 20:28             ` Matt Mackall
  2006-09-12 20:49             ` Matt Mackall
  2006-09-12 21:02             ` David Howells
  2 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2006-09-12 20:28 UTC (permalink / raw)
  To: David Howells; +Cc: Aubrey, Nick Piggin, linux-kernel, davidm, gerg

On Tue, Sep 12, 2006 at 08:25:04PM +0100, David Howells wrote:
> Matt Mackall <mpm@selenic.com> wrote:
> 
> > Looking through all the users of kobjsize, it seems we always know
> > what the type is (and it's usually a VMA). I instead propose we use
> > ksize on objects we know to be SLAB/SLOB-allocated and add a new
> > function (kpagesize?) to size other objects where nommu needs it.
> 
> It sounds like we'd need an op in the VMA to do the per-type size thing (the
> VMA itself not the VMA ops table).

Not sure yet. There's only one user in nommu.c that shouldn't just be
changed to ksize() that I can see, and that's the one in
show_process_blocks(). That could test for VM_MAPPED_COPY and keep its
hands off otherwise. 

I can imagine situations where ->mmap returns pointers to something
that's statically allocated anyway (XIP?), where kobjsize doesn't
really make sense.

Also, looks like the WARN_ON_SLACK code has rotten, result isn't
defined in that function. Change it to base, perhaps?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12 19:25           ` David Howells
  2006-09-12 20:28             ` Matt Mackall
@ 2006-09-12 20:49             ` Matt Mackall
  2006-09-12 21:02             ` David Howells
  2 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2006-09-12 20:49 UTC (permalink / raw)
  To: David Howells; +Cc: Aubrey, Nick Piggin, linux-kernel, davidm, gerg

On Tue, Sep 12, 2006 at 08:25:04PM +0100, David Howells wrote:
> Matt Mackall <mpm@selenic.com> wrote:
> 
> > Looking through all the users of kobjsize, it seems we always know
> > what the type is (and it's usually a VMA). I instead propose we use
> > ksize on objects we know to be SLAB/SLOB-allocated and add a new
> > function (kpagesize?) to size other objects where nommu needs it.
> 
> It sounds like we'd need an op in the VMA to do the per-type size thing (the
> VMA itself not the VMA ops table).

On looking closer, one place where both the current code and my
proposed change are wrong are measurements on the init task, where
many elements of task_struct are statically allocated.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12 19:10           ` David Howells
@ 2006-09-12 20:51             ` Matt Mackall
  2006-09-12 20:56             ` David Howells
  2006-09-13  2:16             ` Nick Piggin
  2 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2006-09-12 20:51 UTC (permalink / raw)
  To: David Howells; +Cc: Aubrey, Nick Piggin, linux-kernel, davidm, gerg

On Tue, Sep 12, 2006 at 08:10:32PM +0100, David Howells wrote:
> > You just broke the bit that shrinks the arena.
> 
> How?  This is only called once when things are being initialised.  There can
> be no SLOB objects allocated prior to that point.

It's on a timer.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator 
  2006-09-12 19:10           ` David Howells
  2006-09-12 20:51             ` Matt Mackall
@ 2006-09-12 20:56             ` David Howells
  2006-09-12 21:04               ` Matt Mackall
  2006-09-12 21:59               ` David Howells
  2006-09-13  2:16             ` Nick Piggin
  2 siblings, 2 replies; 23+ messages in thread
From: David Howells @ 2006-09-12 20:56 UTC (permalink / raw)
  To: Matt Mackall
  Cc: David Howells, Aubrey, Nick Piggin, linux-kernel, davidm, gerg

Matt Mackall <mpm@selenic.com> wrote:

> > > You just broke the bit that shrinks the arena.
> > 
> > How?  This is only called once when things are being initialised.  There can
> > be no SLOB objects allocated prior to that point.
> 
> It's on a timer.

So what then?  The timer is still initialised:

	void kmem_cache_init(void)
	{
	+#if 0
		void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);

		if (p)
			free_page((unsigned long)p);
	+#endif

		mod_timer(&slob_timer, jiffies + HZ);
	}

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator 
  2006-09-12 19:25           ` David Howells
  2006-09-12 20:28             ` Matt Mackall
  2006-09-12 20:49             ` Matt Mackall
@ 2006-09-12 21:02             ` David Howells
  2006-09-12 21:15               ` Matt Mackall
  2 siblings, 1 reply; 23+ messages in thread
From: David Howells @ 2006-09-12 21:02 UTC (permalink / raw)
  To: Matt Mackall
  Cc: David Howells, Aubrey, Nick Piggin, linux-kernel, davidm, gerg

Matt Mackall <mpm@selenic.com> wrote:

> Not sure yet. There's only one user in nommu.c that shouldn't just be
> changed to ksize() that I can see, and that's the one in
> show_process_blocks(). That could test for VM_MAPPED_COPY and keep its
> hands off otherwise. 

Hmmm...  You're right.  However, note binfmt_elf_fdpic().  This calls ksize()
but should really call kobjsize().  It should not assume that the allocation
it's been given is of any particular type.  IIRC ksize() changed purpose at
some point.

> I can imagine situations where ->mmap returns pointers to something
> that's statically allocated anyway (XIP?), where kobjsize doesn't
> really make sense.

ramfs for example.  See get_unmapped_area() hooks, not mmap() hooks.

> Also, looks like the WARN_ON_SLACK code has rotten, result isn't
> defined in that function. Change it to base, perhaps?

Yeah.  It might be worth ditching it entirely too.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12 20:56             ` David Howells
@ 2006-09-12 21:04               ` Matt Mackall
  2006-09-12 21:59               ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2006-09-12 21:04 UTC (permalink / raw)
  To: David Howells; +Cc: Aubrey, Nick Piggin, linux-kernel, davidm, gerg

On Tue, Sep 12, 2006 at 09:56:46PM +0100, David Howells wrote:
> Matt Mackall <mpm@selenic.com> wrote:
> 
> > > > You just broke the bit that shrinks the arena.
> > > 
> > > How?  This is only called once when things are being initialised.  There can
> > > be no SLOB objects allocated prior to that point.
> > 
> > It's on a timer.
> 
> So what then?  The timer is still initialised:
> 
> 	void kmem_cache_init(void)
> 	{
> 	+#if 0
> 		void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
> 
> 		if (p)
> 			free_page((unsigned long)p);
> 	+#endif
> 
> 		mod_timer(&slob_timer, jiffies + HZ);
> 	}

"allocate a page from the slob arena"
"if successful, release it to the page allocator"
"re-arm timer"

The only tricky part is the timer points back to _this very function_.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12 21:02             ` David Howells
@ 2006-09-12 21:15               ` Matt Mackall
  0 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2006-09-12 21:15 UTC (permalink / raw)
  To: David Howells; +Cc: Aubrey, Nick Piggin, linux-kernel, davidm, gerg

On Tue, Sep 12, 2006 at 10:02:44PM +0100, David Howells wrote:
> Matt Mackall <mpm@selenic.com> wrote:
> 
> > Not sure yet. There's only one user in nommu.c that shouldn't just be
> > changed to ksize() that I can see, and that's the one in
> > show_process_blocks(). That could test for VM_MAPPED_COPY and keep its
> > hands off otherwise. 
> 
> Hmmm...  You're right.  However, note binfmt_elf_fdpic().  This calls ksize()
> but should really call kobjsize().  It should not assume that the allocation
> it's been given is of any particular type. 

I presume you mean load_elf_fdpic_binary, which is doing:

        fullsize = ksize((char *) current->mm->start_brk);

That's a little troubling.

> IIRC ksize() changed purpose at some point.

Uh, nope. ksize doesn't even exist in 2.4 and has always done the same
thing in 2.6.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator 
  2006-09-12 20:56             ` David Howells
  2006-09-12 21:04               ` Matt Mackall
@ 2006-09-12 21:59               ` David Howells
  2006-09-13 19:39                 ` Matt Mackall
  1 sibling, 1 reply; 23+ messages in thread
From: David Howells @ 2006-09-12 21:59 UTC (permalink / raw)
  To: Matt Mackall
  Cc: David Howells, Aubrey, Nick Piggin, linux-kernel, davidm, gerg

Matt Mackall <mpm@selenic.com> wrote:

> The only tricky part is the timer points back to _this very function_.

OIC.  Brrr.  That _definitely_ needs commenting - as has been demonstrated.
SLOB is using the theoretically one-shot initiator to do garbage collection.

The:

			if (size == PAGE_SIZE) /* trying to shrink arena? */
				return 0;

In slob_alloc() definitely looks very dodgy, now that I see it.  What happens
if some normal user asks to allocate exactly a page?  Oh... I suppose it never
gets into slob_alloc() to allocate the main piece of storage.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12 19:10           ` David Howells
  2006-09-12 20:51             ` Matt Mackall
  2006-09-12 20:56             ` David Howells
@ 2006-09-13  2:16             ` Nick Piggin
  2006-09-13  7:21               ` Aubrey
  2 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2006-09-13  2:16 UTC (permalink / raw)
  To: David Howells; +Cc: Matt Mackall, Aubrey, linux-kernel, davidm, gerg

David Howells wrote:
> Matt Mackall <mpm@selenic.com> wrote:
> 
> 
>>>+		for (i = 0; i < (1 << bb->order); i++) {
>>>+			SetPageSlab(page);
>>>+			page++;
>>>+		}
>>
>>for ( ; page < page + (1 << bb->order), page++)
>>      SetPageSlab(page);
> 
> 
> Ugh.  No.  You can't do that.  "page < page + X" will be true until "page + X"
> wraps the end of memory.
> 
> 
>>>+				for (i = 0; i < (1 << bb->order); i++) {
>>>+					if (!TestClearPageSlab(page))
>>>+						BUG();
>>>+					page++;
>>>+				}
>>
>>Please drop the BUG. We've already established it's on our lists by
>>this point.
> 
> 
> I disagree.  Let's catch accidental reuse of pages.  It should, however, be
> marked unlikely().

If you do this, the biggest problem with those ops is that they are atomic,
and the latter also requires strong memory barriers. Don't use RMW variants,
and use __ prepended iff you are the only user of the page at this point.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-13  2:16             ` Nick Piggin
@ 2006-09-13  7:21               ` Aubrey
  0 siblings, 0 replies; 23+ messages in thread
From: Aubrey @ 2006-09-13  7:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: David Howells, Matt Mackall, linux-kernel, davidm, gerg

[-- Attachment #1: Type: text/plain, Size: 1981 bytes --]

I changed my patch with some of your suggestions. Anyway, my system
works properly with the patch while the current kernel does not.

From: Aubrey.Li <aubrey@linux-suse.ADI>
Date: Wed, 13 Sep 2006 15:01:23 +0800
Subject: [PATCH] Make the SLOB allocator mark its pages with PG_slab.
Signed-off-by: Aubrey.Li <aubrey.adi@gmail.com>
---
 mm/slob.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
index 7b52b20..05f0d16 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -109,6 +109,7 @@ static void *slob_alloc(size_t size, gfp

 			slob_free(cur, PAGE_SIZE);
 			spin_lock_irqsave(&slob_lock, flags);
+			SetPageSlab(virt_to_page(cur));
 			cur = slobfree;
 		}
 	}
@@ -173,12 +174,17 @@ void *kmalloc(size_t size, gfp_t gfp)
 		return 0;

 	bb->order = find_order(size);
-	bb->pages = (void *)__get_free_pages(gfp, bb->order);
+	page = alloc_pages(gfp, bb->order);
+	bb->pages = page_address(page);

 	if (bb->pages) {
 		spin_lock_irqsave(&block_lock, flags);
 		bb->next = bigblocks;
 		bigblocks = bb;
+		for (i = 0; i < (1 << bb->order); i++) {
+			SetPageSlab(page);
+			page++;
+		}
 		spin_unlock_irqrestore(&block_lock, flags);
 		return bb->pages;
 	}
@@ -202,8 +208,15 @@ void kfree(const void *block)
 		spin_lock_irqsave(&block_lock, flags);
 		for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
 			if (bb->pages == block) {
+				struct page *page = virt_to_page(bb->pages);
+				int i;
+
 				*last = bb->next;
 				spin_unlock_irqrestore(&block_lock, flags);
+				for (i = 0; i < (1 << bb->order); i++) {
+					ClearPageSlab(page);
+					page++;
+				}
 				free_pages((unsigned long)block, bb->order);
 				slob_free(bb, sizeof(bigblock_t));
 				return;
@@ -332,11 +345,6 @@ static struct timer_list slob_timer = TI

 void kmem_cache_init(void)
 {
-	void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
-
-	if (p)
-		free_page((unsigned long)p);
-
 	mod_timer(&slob_timer, jiffies + HZ);
 }

-- 
1.4.0

[-- Attachment #2: 0001-Make-the-SLOB-allocator-mark-its-pages-with-PG_slab.txt --]
[-- Type: text/plain, Size: 1924 bytes --]

From 6e2e615a2f59f5b8d1b8312c5f8cc596b927be9f Mon Sep 17 00:00:00 2001
From: Aubrey.Li <aubrey@linux-suse.ADI>
Date: Wed, 13 Sep 2006 15:01:23 +0800
Subject: [PATCH] Make the SLOB allocator mark its pages with PG_slab.
Signed-off-by: Aubrey.Li <aubrey.adi@gmail.com>
---
 mm/slob.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
index 7b52b20..05f0d16 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -109,6 +109,7 @@ static void *slob_alloc(size_t size, gfp
 
 			slob_free(cur, PAGE_SIZE);
 			spin_lock_irqsave(&slob_lock, flags);
+			SetPageSlab(virt_to_page(cur));
 			cur = slobfree;
 		}
 	}
@@ -173,12 +174,17 @@ void *kmalloc(size_t size, gfp_t gfp)
 		return 0;
 
 	bb->order = find_order(size);
-	bb->pages = (void *)__get_free_pages(gfp, bb->order);
+	page = alloc_pages(gfp, bb->order);
+	bb->pages = page_address(page);
 
 	if (bb->pages) {
 		spin_lock_irqsave(&block_lock, flags);
 		bb->next = bigblocks;
 		bigblocks = bb;
+		for (i = 0; i < (1 << bb->order); i++) {
+			SetPageSlab(page);
+			page++;
+		}
 		spin_unlock_irqrestore(&block_lock, flags);
 		return bb->pages;
 	}
@@ -202,8 +208,15 @@ void kfree(const void *block)
 		spin_lock_irqsave(&block_lock, flags);
 		for (bb = bigblocks; bb; last = &bb->next, bb = bb->next) {
 			if (bb->pages == block) {
+				struct page *page = virt_to_page(bb->pages);
+				int i;
+
 				*last = bb->next;
 				spin_unlock_irqrestore(&block_lock, flags);
+				for (i = 0; i < (1 << bb->order); i++) {
+					ClearPageSlab(page);
+					page++;
+				}
 				free_pages((unsigned long)block, bb->order);
 				slob_free(bb, sizeof(bigblock_t));
 				return;
@@ -332,11 +345,6 @@ static struct timer_list slob_timer = TI
 
 void kmem_cache_init(void)
 {
-	void *p = slob_alloc(PAGE_SIZE, 0, PAGE_SIZE-1);
-
-	if (p)
-		free_page((unsigned long)p);
-
 	mod_timer(&slob_timer, jiffies + HZ);
 }
 
-- 
1.4.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel BUGs when removing largish files with the SLOB allocator
  2006-09-12 21:59               ` David Howells
@ 2006-09-13 19:39                 ` Matt Mackall
  0 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2006-09-13 19:39 UTC (permalink / raw)
  To: David Howells; +Cc: Aubrey, Nick Piggin, linux-kernel, davidm, gerg

Another issue that occurred to me last night is that the size of
objects allocated with SLOB's slab-like API are implicit and not
calculable from the object. kmalloc'ed objects, in contrast, have a
header that contains the object size.

So ksize(kmalloc(...)) works, but not ksize(kmem_cache_alloc(...)). I
don't know if anything in the kernel is using the latter aside from
kobjsize.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2006-09-13 19:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-09-04  6:56 kernel BUGs when removing largish files with the SLOB allocator Aubrey
2006-09-04 10:21 ` David Howells
2006-09-05  3:52   ` Aubrey
2006-09-05  9:35   ` David Howells
2006-09-06  2:35     ` Aubrey
2006-09-06  3:36       ` Nick Piggin
2006-09-12  8:07         ` Aubrey
2006-09-12 17:43           ` Matt Mackall
2006-09-12 19:10           ` David Howells
2006-09-12 20:51             ` Matt Mackall
2006-09-12 20:56             ` David Howells
2006-09-12 21:04               ` Matt Mackall
2006-09-12 21:59               ` David Howells
2006-09-13 19:39                 ` Matt Mackall
2006-09-13  2:16             ` Nick Piggin
2006-09-13  7:21               ` Aubrey
2006-09-12 19:25           ` David Howells
2006-09-12 20:28             ` Matt Mackall
2006-09-12 20:49             ` Matt Mackall
2006-09-12 21:02             ` David Howells
2006-09-12 21:15               ` Matt Mackall
2006-09-12  8:54         ` David Howells
2006-09-12 10:53           ` Aubrey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).