LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] Allocate module.ref array dynamically
@ 2008-11-11 20:01 Takashi Iwai
  2008-11-11 22:06 ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Takashi Iwai @ 2008-11-11 20:01 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andreas Gruenbacher, Jan Blunck, Andrew Morton, linux-kernel

Hi,

we found that the kernel module sizes and memory footprints
grow drastically when NR_CPUS is high.  For example, with
NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug
info).

A part of the reason is the fixed size array in struct module.
The patch below fixes the problem by allocating it dynamically.
With the patch, the size can go down to 20MB.


Any comments/suggestions appreciated.

thanks,

Takashi

==

From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] Allocate module.ref array dynamically

This patch makes the module handling code to allocate the ref
array of each module struct dynamically.  It saves both module
disk space and memory footprints when number of CPUs is high
like 4096.

Reference: Novell bnc#425240
	https://bugzilla.novell.com/show_bug.cgi?id=425240

Signed-off-by: Takashi Iwai <tiwai@suse.de>

---
 include/linux/module.h |    2 +-
 kernel/module.c        |   21 ++++++++++++++++-----
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -559,17 +559,24 @@ static char last_unloaded_module[MODULE_
 
 #ifdef CONFIG_MODULE_UNLOAD
 /* Init the unload section of the module. */
-static void module_unload_init(struct module *mod)
+static int module_unload_init(struct module *mod)
 {
 	unsigned int i;
 
 	INIT_LIST_HEAD(&mod->modules_which_use_me);
-	for (i = 0; i < NR_CPUS; i++)
+
+	mod->ref = kcalloc(nr_cpu_ids, sizeof(*mod->ref), GFP_KERNEL);
+	if (!mod->ref)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_cpu_ids; i++)
 		local_set(&mod->ref[i].count, 0);
 	/* Hold reference count during initialization. */
 	local_set(&mod->ref[raw_smp_processor_id()].count, 1);
 	/* Backwards compatibility macros put refcount during init. */
 	mod->waiter = current;
+
+	return 0;
 }
 
 /* modules using other modules */
@@ -649,6 +656,7 @@ static void module_unload_free(struct mo
 			}
 		}
 	}
+	kfree(mod->ref);
 }
 
 #ifdef CONFIG_MODULE_FORCE_UNLOAD
@@ -707,7 +715,7 @@ unsigned int module_refcount(struct modu
 {
 	unsigned int i, total = 0;
 
-	for (i = 0; i < NR_CPUS; i++)
+	for (i = 0; i < nr_cpu_ids; i++)
 		total += local_read(&mod->ref[i].count);
 	return total;
 }
@@ -896,8 +904,9 @@ static inline int use_module(struct modu
 	return strong_try_module_get(b) == 0;
 }
 
-static inline void module_unload_init(struct module *mod)
+static inline int module_unload_init(struct module *mod)
 {
+	return 0;
 }
 #endif /* CONFIG_MODULE_UNLOAD */
 
@@ -2123,7 +2132,9 @@ static noinline struct module *load_modu
 	mod = (void *)sechdrs[modindex].sh_addr;
 
 	/* Now we've moved module, initialize linked lists, etc. */
-	module_unload_init(mod);
+	err = module_unload_init(mod);
+	if (err)
+		goto free_unload;
 
 	/* add kobject, so we can reference it. */
 	err = mod_sysfs_init(mod);
diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -349,7 +349,7 @@ struct module
 	void (*exit)(void);
 
 	/* Reference counts */
-	struct module_ref ref[NR_CPUS];
+	struct module_ref *ref;
 #endif
 };
 #ifndef MODULE_ARCH_INIT

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-11 20:01 [PATCH] Allocate module.ref array dynamically Takashi Iwai
@ 2008-11-11 22:06 ` Eric Dumazet
  2008-11-11 22:15   ` Eric Dumazet
                     ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Eric Dumazet @ 2008-11-11 22:06 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton,
	linux-kernel, Mike Travis

Takashi Iwai a écrit :
> Hi,
> 
> we found that the kernel module sizes and memory footprints
> grow drastically when NR_CPUS is high.  For example, with
> NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug
> info).
> 
> A part of the reason is the fixed size array in struct module.
> The patch below fixes the problem by allocating it dynamically.
> With the patch, the size can go down to 20MB.
> 
> 
> Any comments/suggestions appreciated.
> 

Many attempts were done on this area on the past.

Your patch has the drawback of using kcalloc(), while previously, module_ref
space was allocated with vmalloc().

After a while, a machine could have a lot of vmalloc() space available, but not enough
physically contiguous space to fullfill a kmalloc(large_area) call.

So a module load could fail, while previous code could load module.

I believe Mike Travis has a better patch for this problem, partly using new percpu allocator.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-11 22:06 ` Eric Dumazet
@ 2008-11-11 22:15   ` Eric Dumazet
  2008-11-11 22:32     ` Takashi Iwai
  2008-11-11 22:26   ` Takashi Iwai
  2008-11-12  1:44   ` Rusty Russell
  2 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2008-11-11 22:15 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton,
	linux-kernel, Mike Travis

Eric Dumazet a écrit :
> Takashi Iwai a écrit :
>> Hi,
>>
>> we found that the kernel module sizes and memory footprints
>> grow drastically when NR_CPUS is high.  For example, with
>> NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug
>> info).
>>
>> A part of the reason is the fixed size array in struct module.
>> The patch below fixes the problem by allocating it dynamically.
>> With the patch, the size can go down to 20MB.
>>
>>
>> Any comments/suggestions appreciated.
>>
> 
> Many attempts were done on this area on the past.

Forgot to include a link to previous attempt : http://lkml.org/lkml/2008/5/15/402

> 
> Your patch has the drawback of using kcalloc(), while previously, 
> module_ref
> space was allocated with vmalloc().
> 
> After a while, a machine could have a lot of vmalloc() space available, 
> but not enough
> physically contiguous space to fullfill a kmalloc(large_area) call.
> 
> So a module load could fail, while previous code could load module.
> 
> I believe Mike Travis has a better patch for this problem, partly using 
> new percpu allocator.
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-11 22:06 ` Eric Dumazet
  2008-11-11 22:15   ` Eric Dumazet
@ 2008-11-11 22:26   ` Takashi Iwai
  2008-11-12  1:44   ` Rusty Russell
  2 siblings, 0 replies; 22+ messages in thread
From: Takashi Iwai @ 2008-11-11 22:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton,
	linux-kernel, Mike Travis

At Tue, 11 Nov 2008 23:06:26 +0100,
Eric Dumazet wrote:
> 
> Takashi Iwai a écrit :
> > Hi,
> > 
> > we found that the kernel module sizes and memory footprints
> > grow drastically when NR_CPUS is high.  For example, with
> > NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug
> > info).
> > 
> > A part of the reason is the fixed size array in struct module.
> > The patch below fixes the problem by allocating it dynamically.
> > With the patch, the size can go down to 20MB.
> > 
> > 
> > Any comments/suggestions appreciated.
> > 
> 
> Many attempts were done on this area on the past.
> 
> Your patch has the drawback of using kcalloc(), while previously, module_ref
> space was allocated with vmalloc().
> 
> After a while, a machine could have a lot of vmalloc() space available, but not enough
> physically contiguous space to fullfill a kmalloc(large_area) call.
> 
> So a module load could fail, while previous code could load module.

Then how about to call simply vmalloc() as a fallback?
The revised patch is below.

> I believe Mike Travis has a better patch for this problem, partly using new percpu allocator.

It's interesting...


thanks,

Takashi

===

From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] Allocate module.ref array dynamically (v2)

This patch makes the module handling code to allocate the ref
array of each module struct dynamically.  It saves both module
disk space and memory footprints when number of CPUs is high
like 4096.

Reference: Novell bnc#425240
        https://bugzilla.novell.com/show_bug.cgi?id=425240

v1->v2:
 - use vmalloc() as a fallback in case kmalloc() fails

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 include/linux/module.h |    2 +-
 kernel/module.c        |   28 +++++++++++++++++++++++-----
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 3bfed01..7b5cbf3 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -348,7 +348,7 @@ struct module
 	void (*exit)(void);
 
 	/* Reference counts */
-	struct module_ref ref[NR_CPUS];
+	struct module_ref *ref;
 #endif
 };
 #ifndef MODULE_ARCH_INIT
diff --git a/kernel/module.c b/kernel/module.c
index 1f4cc00..6fb7ab5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -571,17 +571,28 @@ static char last_unloaded_module[MODULE_NAME_LEN+1];
 
 #ifdef CONFIG_MODULE_UNLOAD
 /* Init the unload section of the module. */
-static void module_unload_init(struct module *mod)
+static int module_unload_init(struct module *mod)
 {
 	unsigned int i;
+	size_t refsize = nr_cpu_ids * sizeof(*mod->ref);
 
 	INIT_LIST_HEAD(&mod->modules_which_use_me);
-	for (i = 0; i < NR_CPUS; i++)
+
+	mod->ref = kzalloc(refsize, GFP_KERNEL);
+	if (!mod->ref) {
+		mod->ref = vmalloc(refsize);
+		if (!mod->ref)
+			return -ENOMEM;
+		memset(mod->ref, 0, refsize);
+	}
+	for (i = 0; i < nr_cpu_ids; i++)
 		local_set(&mod->ref[i].count, 0);
 	/* Hold reference count during initialization. */
 	local_set(&mod->ref[raw_smp_processor_id()].count, 1);
 	/* Backwards compatibility macros put refcount during init. */
 	mod->waiter = current;
+
+	return 0;
 }
 
 /* modules using other modules */
@@ -661,6 +672,10 @@ static void module_unload_free(struct module *mod)
 			}
 		}
 	}
+	if (is_vmalloc_addr(mod->ref))
+		vfree(mod->ref);
+	else
+		kfree(mod->ref);
 }
 
 #ifdef CONFIG_MODULE_FORCE_UNLOAD
@@ -719,7 +734,7 @@ unsigned int module_refcount(struct module *mod)
 {
 	unsigned int i, total = 0;
 
-	for (i = 0; i < NR_CPUS; i++)
+	for (i = 0; i < nr_cpu_ids; i++)
 		total += local_read(&mod->ref[i].count);
 	return total;
 }
@@ -908,8 +923,9 @@ static inline int use_module(struct module *a, struct module *b)
 	return strong_try_module_get(b) == 0;
 }
 
-static inline void module_unload_init(struct module *mod)
+static inline int module_unload_init(struct module *mod)
 {
+	return 0;
 }
 #endif /* CONFIG_MODULE_UNLOAD */
 
@@ -2051,7 +2067,9 @@ static noinline struct module *load_module(void __user *umod,
 	mod = (void *)sechdrs[modindex].sh_addr;
 
 	/* Now we've moved module, initialize linked lists, etc. */
-	module_unload_init(mod);
+	err = module_unload_init(mod);
+	if (err)
+		goto free_unload;
 
 	/* add kobject, so we can reference it. */
 	err = mod_sysfs_init(mod);
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-11 22:15   ` Eric Dumazet
@ 2008-11-11 22:32     ` Takashi Iwai
  0 siblings, 0 replies; 22+ messages in thread
From: Takashi Iwai @ 2008-11-11 22:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Rusty Russell, Andreas Gruenbacher, Jan Blunck, Andrew Morton,
	linux-kernel, Mike Travis

At Tue, 11 Nov 2008 23:15:50 +0100,
Eric Dumazet wrote:
> 
> Eric Dumazet a écrit :
> > Takashi Iwai a écrit :
> >> Hi,
> >>
> >> we found that the kernel module sizes and memory footprints
> >> grow drastically when NR_CPUS is high.  For example, with
> >> NR_CPUS=4096, SUSE kernel packages weigh over 500MB (even w/o debug
> >> info).
> >>
> >> A part of the reason is the fixed size array in struct module.
> >> The patch below fixes the problem by allocating it dynamically.
> >> With the patch, the size can go down to 20MB.
> >>
> >>
> >> Any comments/suggestions appreciated.
> >>
> > 
> > Many attempts were done on this area on the past.
> 
> Forgot to include a link to previous attempt : http://lkml.org/lkml/2008/5/15/402

Thanks, an interesting thread.


Takashi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-11 22:06 ` Eric Dumazet
  2008-11-11 22:15   ` Eric Dumazet
  2008-11-11 22:26   ` Takashi Iwai
@ 2008-11-12  1:44   ` Rusty Russell
  2008-11-12  2:26     ` Mike Travis
  2008-11-12  3:14     ` Christoph Lameter
  2 siblings, 2 replies; 22+ messages in thread
From: Rusty Russell @ 2008-11-12  1:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Takashi Iwai, Andreas Gruenbacher, Jan Blunck, Andrew Morton,
	linux-kernel, Mike Travis, Christoph Lameter

On Wednesday 12 November 2008 08:36:26 Eric Dumazet wrote:
> I believe Mike Travis has a better patch for this problem, partly using new
> percpu allocator.

Actually, I think this in in Christoph's hands now.  He's been wrangling with 
getting us a real per-cpu allocator.

There's something in linux-next, but I'm not sure of the status. Christoph, is 
this anticipated to make the next merge window, or am I best off merging a 
patch like Eric's for the moment?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12  1:44   ` Rusty Russell
@ 2008-11-12  2:26     ` Mike Travis
  2008-11-12  3:17       ` Christoph Lameter
  2008-11-12  3:14     ` Christoph Lameter
  1 sibling, 1 reply; 22+ messages in thread
From: Mike Travis @ 2008-11-12  2:26 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Christoph Lameter

Rusty Russell wrote:
> On Wednesday 12 November 2008 08:36:26 Eric Dumazet wrote:
>> I believe Mike Travis has a better patch for this problem, partly using new
>> percpu allocator.
> 
> Actually, I think this in in Christoph's hands now.  He's been wrangling with 
> getting us a real per-cpu allocator.
> 
> There's something in linux-next, but I'm not sure of the status. Christoph, is 
> this anticipated to make the next merge window, or am I best off merging a 
> patch like Eric's for the moment?
> 
> Thanks,
> Rusty.


I haven't looked closely at Christoph's latest but I believe the x86_64 version
is waiting for the zero-based percpu variables (and hence the combined pda/percpu
base.)  It's on the queue just under 4k cpus.

Cheers,
Mike

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12  1:44   ` Rusty Russell
  2008-11-12  2:26     ` Mike Travis
@ 2008-11-12  3:14     ` Christoph Lameter
  2008-11-12  9:59       ` Rusty Russell
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2008-11-12  3:14 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Wed, 12 Nov 2008, Rusty Russell wrote:

> On Wednesday 12 November 2008 08:36:26 Eric Dumazet wrote:
> > I believe Mike Travis has a better patch for this problem, partly using new
> > percpu allocator.
>
> Actually, I think this in in Christoph's hands now.  He's been wrangling with
> getting us a real per-cpu allocator.

Please use my new email address.... Otherwise I will not see this.

> There's something in linux-next, but I'm not sure of the status. Christoph, is
> this anticipated to make the next merge window, or am I best off merging a
> patch like Eric's for the moment?

Yes the patch in -next is for the next merge window. This should
actually have been in .28.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12  2:26     ` Mike Travis
@ 2008-11-12  3:17       ` Christoph Lameter
  2008-11-12 13:46         ` Mike Travis
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2008-11-12  3:17 UTC (permalink / raw)
  To: Mike Travis
  Cc: Rusty Russell, Eric Dumazet, Takashi Iwai, Andreas Gruenbacher,
	Jan Blunck, Andrew Morton, linux-kernel

On Tue, 11 Nov 2008, Mike Travis wrote:

> I haven't looked closely at Christoph's latest but I believe the x86_64 version
> is waiting for the zero-based percpu variables (and hence the combined pda/percpu
> base.)  It's on the queue just under 4k cpus.

The base cpu_alloc is indepedent. And your laziness on the zero based
stuff only hurts the x86_64 version. Frankly, I think we should stop
merging 4k patches until you have finished the work on zero based because
that is elementary for various other things and simplifies 4k work. And it
has been pending for more than 6 months now.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12  3:14     ` Christoph Lameter
@ 2008-11-12  9:59       ` Rusty Russell
  2008-11-12 20:11         ` Christoph Lameter
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2008-11-12  9:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Wednesday 12 November 2008 13:44:51 Christoph Lameter wrote:
> Please use my new email address.... Otherwise I will not see this.

Oops, updated thanks.

> > There's something in linux-next, but I'm not sure of the status.
> > Christoph, is this anticipated to make the next merge window, or am I
> > best off merging a patch like Eric's for the moment?
>
> Yes the patch in -next is for the next merge window. This should
> actually have been in .28.

Perhaps I'm missing the overarching plan here?

You've introduced a third set of per-cpu primitives, yet the second set still 
has 0 users.

Your new basic interface is:
CPU_ALLOC/CPU_FREE/CPU_PTR/THIS_CPU/__THIS_CPU

I don't think the CAPS adds anything.  I'd like to see standard docbook 
comments.  It's not clear from your documentation whether this allocates for 
all possible or only all online CPUs, and the difference between THIS_CPU and 
__THIS_CPU is not immediately obvious.

How about re-using alloc_percpu/free_percpu/per_cpu_ptr APIs?  Rename THIS_CPU 
to __get_cpu_ptr and implement get_cpu_ptr and put_cpu_ptr wrappers (a-la 
get_cpu_var).

I love this work, but I think it stumbles on the final polish.  If that's just 
a "not done yet", I'd be happy to try to put some patches together.

Thanks!
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12  3:17       ` Christoph Lameter
@ 2008-11-12 13:46         ` Mike Travis
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Travis @ 2008-11-12 13:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rusty Russell, Eric Dumazet, Takashi Iwai, Andreas Gruenbacher,
	Jan Blunck, Andrew Morton, linux-kernel

Christoph Lameter wrote:
> On Tue, 11 Nov 2008, Mike Travis wrote:
> 
>> I haven't looked closely at Christoph's latest but I believe the x86_64 version
>> is waiting for the zero-based percpu variables (and hence the combined pda/percpu
>> base.)  It's on the queue just under 4k cpus.
> 
> The base cpu_alloc is indepedent. And your laziness on the zero based
> stuff only hurts the x86_64 version. Frankly, I think we should stop
> merging 4k patches until you have finished the work on zero based because
> that is elementary for various other things and simplifies 4k work. And it
> has been pending for more than 6 months now.
> 


Trust me, I would have finished it by now, but the most critical objective for me
(and hence the direction from my mgmt) is that SGI is able to ship UV systems when
they are ready, to customers who are buying them, to perform as they are expecting.
Nothing else comes close in priority.  I've already missed two merge windows,
missing a third and I may have to learn hara-kiri. ;-)  [And working up to 7 days
of 10-14+ hours per day, is not what one would normally think of as lazy.]

Thanks,
Mike

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12  9:59       ` Rusty Russell
@ 2008-11-12 20:11         ` Christoph Lameter
  2008-11-12 22:01           ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2008-11-12 20:11 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Wed, 12 Nov 2008, Rusty Russell wrote:

> You've introduced a third set of per-cpu primitives, yet the second set still
> has 0 users.

What second set?

> Your new basic interface is:
> CPU_ALLOC/CPU_FREE/CPU_PTR/THIS_CPU/__THIS_CPU
>
> I don't think the CAPS adds anything.  I'd like to see standard docbook
> comments.  It's not clear from your documentation whether this allocates for
> all possible or only all online CPUs, and the difference between THIS_CPU and
> __THIS_CPU is not immediately obvious.

The allocation is for all allocated percpu areas which is now per possible
cpu. The difference between THIS_CPU and __THIS_CPU is the context check
by smp_processor_id()

> How about re-using alloc_percpu/free_percpu/per_cpu_ptr APIs?  Rename THIS_CPU
> to __get_cpu_ptr and implement get_cpu_ptr and put_cpu_ptr wrappers (a-la
> get_cpu_var).

The caps functions are macros not functions. Macros are
capitalized. alloc_percpu must stay until the last remains have been
evicted. And the API does work differently.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12 20:11         ` Christoph Lameter
@ 2008-11-12 22:01           ` Rusty Russell
  2008-11-12 22:50             ` Christoph Lameter
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2008-11-12 22:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Thursday 13 November 2008 06:41:32 Christoph Lameter wrote:
> On Wed, 12 Nov 2008, Rusty Russell wrote:
> > You've introduced a third set of per-cpu primitives, yet the second set
> > still has 0 users.
>
> What second set?

percpu_alloc().

> > Your new basic interface is:
> > CPU_ALLOC/CPU_FREE/CPU_PTR/THIS_CPU/__THIS_CPU
> >
> > I don't think the CAPS adds anything.  I'd like to see standard docbook
> > comments.  It's not clear from your documentation whether this allocates
> > for all possible or only all online CPUs, and the difference between
> > THIS_CPU and __THIS_CPU is not immediately obvious.
>
> The allocation is for all allocated percpu areas which is now per possible
> cpu. The difference between THIS_CPU and __THIS_CPU is the context check
> by smp_processor_id()

Thanks, I figured that out, but it's not documented.  I'll happily create a 
patch, but see below.  BTW, did you end up using __THIS_CPU anywhere?

> > How about re-using alloc_percpu/free_percpu/per_cpu_ptr APIs?  Rename
> > THIS_CPU to __get_cpu_ptr and implement get_cpu_ptr and put_cpu_ptr
> > wrappers (a-la get_cpu_var).
>
> The caps functions are macros not functions. Macros are
> capitalized.

Traditionally, yes, but we seem to be more ambivalent in the kernel.  
list_for_each() et. al spring to mind: code would be uglier if we made them 
caps.

> alloc_percpu must stay until the last remains have been
> evicted.

OK, you anticipate replacement of all alloc_percpu users?

> And the API does work differently.

I'm afraid the difference has escaped me so far.  Please explain why you can't 
use the current API.  Is this subtle difference going to make the transition 
difficult?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12 22:01           ` Rusty Russell
@ 2008-11-12 22:50             ` Christoph Lameter
  2008-11-13  9:21               ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2008-11-12 22:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Thu, 13 Nov 2008, Rusty Russell wrote:

> > > You've introduced a third set of per-cpu primitives, yet the second set
> > > still has 0 users.
> >
> > What second set?
>
> percpu_alloc().

That is a variant of allocpercpu. We have thinned the allocpercpu API
somewhat already. I can drop more portions or I can drop it completely at
the very end of the patchset.

> > cpu. The difference between THIS_CPU and __THIS_CPU is the context check
> > by smp_processor_id()
>
> Thanks, I figured that out, but it's not documented.  I'll happily create a
> patch, but see below.  BTW, did you end up using __THIS_CPU anywhere?

Yes in later patches. This is a total set of 40 or so patches.

> > The caps functions are macros not functions. Macros are
> > capitalized.
>
> Traditionally, yes, but we seem to be more ambivalent in the kernel.
> list_for_each() et. al spring to mind: code would be uglier if we made them
> caps.

Macros being capitalized have a reason: They may do strange things with
the parameters. In particular CPU_ALLOC/CPU_FREE do a determination of
the size of the type. The macros do not take an argument in the
traditional sense. Frankly alloc_percpu(type) is something that looks not
right to me.

> > alloc_percpu must stay until the last remains have been
> > evicted.
>
> OK, you anticipate replacement of all alloc_percpu users?

Yes. If you look at the full patchsets that were posted at various times
during the last year you see that allocpercpu will be gone.

> > And the API does work differently.
>
> I'm afraid the difference has escaped me so far.  Please explain why you can't
> use the current API.  Is this subtle difference going to make the transition
> difficult?

The old api was based on an attempt to introduce a cpu mask. That mask was
never used. See percpu_alloc_mask. The handling is not consistent with the
nature of the percpu sections for other percpu data because allocation is
only done for online processors. So we have semantic differences. The API
is inconsistent and underwent rot. Having to allocate percpu sections if
processors are onlined and offlined causes the need for more hooks. The
cpu alloc patchset gets rid of about half the hooks in the page allocator
and slab allocator.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-12 22:50             ` Christoph Lameter
@ 2008-11-13  9:21               ` Rusty Russell
  2008-11-13 14:40                 ` Christoph Lameter
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2008-11-13  9:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Thursday 13 November 2008 09:20:46 Christoph Lameter wrote:
> The old api was based on an attempt to introduce a cpu mask. That mask was
> never used. See percpu_alloc_mask. The handling is not consistent with the
> nature of the percpu sections for other percpu data because allocation is
> only done for online processors. So we have semantic differences. The API
> is inconsistent and underwent rot.

Yes, but I was talking about the original percpu API: 
alloc_percpu/free_percpu/per_cpu_ptr.  That's the only bit that counts, as 
it's the only bit that's used.  Yes, the percpu_alloc should die.

Just convert *that API* to your new implementation, and drop all the 
conversion patches.  I said this back in June.

> The cpu alloc patchset gets rid of about half the hooks in the page
> allocator and slab allocator.

Sure, but we could convert those today to alloc_percpu etc.
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-13  9:21               ` Rusty Russell
@ 2008-11-13 14:40                 ` Christoph Lameter
  2008-11-13 23:49                   ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2008-11-13 14:40 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Thu, 13 Nov 2008, Rusty Russell wrote:

> > The cpu alloc patchset gets rid of about half the hooks in the page
> > allocator and slab allocator.
>
> Sure, but we could convert those today to alloc_percpu etc.

The percpu_ptr macros etc are not upercase suggesting that it is a
function but the functions do weird things like passing through the
type. This is confusing.

alloc_percpu() does not allow specifying a GFP mask therefore does not
support zeroing and is not extendable for the future. It does not allow
specifying an alignment therefore tight packing is not possible.

Plus the name alloc_percpu() suggests that it is a function but its a
macro doing things with typing.

CPU_PTR and THIS_CPU can be applied both to percpu pointers and
percpu variables. The new API unifies the handling which is required in
order for the new cpu_ops to work both on statically and dynamically
allocated percpu data.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-13 14:40                 ` Christoph Lameter
@ 2008-11-13 23:49                   ` Rusty Russell
  2008-11-14  0:20                     ` Christoph Lameter
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2008-11-13 23:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Friday 14 November 2008 01:10:32 Christoph Lameter wrote:
> On Thu, 13 Nov 2008, Rusty Russell wrote:
> > > The cpu alloc patchset gets rid of about half the hooks in the page
> > > allocator and slab allocator.
> >
> > Sure, but we could convert those today to alloc_percpu etc.
>
> The percpu_ptr macros etc are not upercase suggesting that it is a
> function but the functions do weird things like passing through the
> type. This is confusing.

No it's not, we have several like that.  And noone's going to somehow get 
confused and misuse it.  Finally, we don't change APIs just to meet some sense 
of neatness.

> alloc_percpu() does not allow specifying a GFP mask therefore does not
> support zeroing and is not extendable for the future.

It is defined to be zeroing.  And it is very unclear that (1) future 
implementations will be able to support GFP_ATOMIC reasonably, and (2) that we 
want people to do that.  If we do, we fix it.

> It does not allow
> specifying an alignment therefore tight packing is not possible.

It absolutely does.  That's why it takes a type!

> Plus the name alloc_percpu() suggests that it is a function but its a
> macro doing things with typing.

You said that before.

> CPU_PTR and THIS_CPU can be applied both to percpu pointers and
> percpu variables. The new API unifies the handling which is required in
> order for the new cpu_ops to work both on statically and dynamically
> allocated percpu data.

And if you called them per_cpu_ptr and __get_cpu_ptr they would have these 
same properties.  But now the names would be consistent.

So let's just reimplement the original API and be done.
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-13 23:49                   ` Rusty Russell
@ 2008-11-14  0:20                     ` Christoph Lameter
  2008-11-16  0:00                       ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2008-11-14  0:20 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Fri, 14 Nov 2008, Rusty Russell wrote:

> > alloc_percpu() does not allow specifying a GFP mask therefore does not
> > support zeroing and is not extendable for the future.
>
> It is defined to be zeroing.  And it is very unclear that (1) future
> implementations will be able to support GFP_ATOMIC reasonably, and (2) that we
> want people to do that.  If we do, we fix it.

But some uses do not need zeroing. They currently have no choice. And
other flags can become necessary if percpu areas gain the ability of
being dynamically extended. Zeroing percpu areas on systems with
lots of cpus can take awhile in particular since these areas may be on
remote nodes.

> > It does not allow
> > specifying an alignment therefore tight packing is not possible.
>
> It absolutely does.  That's why it takes a type!

alloc_percpu is based on __alloc_percpu which currently does not take an
alignment. Guess we could get there by removing the intermediate
functions and making alloc_percpu call cpu_alloc.

However, there are only a few places that use allocpercpu and all of
those are touched by necesssity by the full cpu alloc patchset. At mininum
we need to add the allocation flags and add cpu ops as well as review the
preemption at each call site etcin order to use the correct
call. So why bother with the old API?

> > CPU_PTR and THIS_CPU can be applied both to percpu pointers and
> > percpu variables. The new API unifies the handling which is required in
> > order for the new cpu_ops to work both on statically and dynamically
> > allocated percpu data.
>
> And if you called them per_cpu_ptr and __get_cpu_ptr they would have these
> same properties.  But now the names would be consistent.

Having macros that do type casting on results and do macro operations on
parameters look like functions does not seem to be right. We can make it
consistent at the end by renaming the remaining ones.

Having an uppercase CPU_PTR and such in the source clarifies
that something special is going on. This is consistent with other
uses in the kernel. This is CPP magic after all (see examples in
include/linux/kernel.h or slab.h f.e.)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-14  0:20                     ` Christoph Lameter
@ 2008-11-16  0:00                       ` Rusty Russell
  2008-11-16 21:41                         ` Rusty Russell
  2008-11-20 16:47                         ` Christoph Lameter
  0 siblings, 2 replies; 22+ messages in thread
From: Rusty Russell @ 2008-11-16  0:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Friday 14 November 2008 10:50:41 Christoph Lameter wrote:
> On Fri, 14 Nov 2008, Rusty Russell wrote:
> > It is defined to be zeroing.
>
> But some uses do not need zeroing. They currently have no choice.

OK, of the 32 in-tree users, only 4 don't need zeroing (rough audit; they 
might need zeroing in some subtle way).  Performance arguments are not valid 
in any of these cases though, and it's hard to see that they would be.

> And other flags can become necessary if percpu areas gain the ability of
> being dynamically extended.

And other flags become impossible, eg. GFP_ATOMIC.

> > It absolutely does.  That's why it takes a type!
>
> alloc_percpu is based on __alloc_percpu which currently does not take an
> alignment. Guess we could get there by removing the intermediate
> functions and making alloc_percpu call cpu_alloc.

Yes.  When I originally wrote this, it hooked up to the percpu allocator which 
took an alignment (this was reduced to the old module.c percpu allocator in 
the end).

The strange indirection was from someone's failed experiment at only 
allocating only online cpus.  We should kill that as a separate patch.

> However, there are only a few places that use allocpercpu and all of
> those are touched by necesssity by the full cpu alloc patchset.

No, that's my point.  There are 28 places whihc use alloc_percpu, and they 
should be left.  There are 4 places which use __alloc_percpu, and all of them 
can be converted to alloc_percpu (see patch below).

Then, you change alloc_percpu(type) to hand the size and align to your 
infrastructure (call it __alloc_percpu(size, align) if you want, or stick with 
cpu_alloc()).

> At mininum
> we need to add the allocation flags and add cpu ops as well as review the
> preemption at each call site etcin order to use the correct
> call. So why bother with the old API?

We don't need a flags arg, there's no evidence of any problem, and it's 
useless churn to change it.

There are several seperate things here (this is to make sure my head is 
straight and to clarify for others skimming):

1) Make the dynamic percpu allocator use the static percpu system.
   - Agreed, always the aim, never quite happened.

2) Make zeroing optional
   - Disagree, adds API complexity for corner case with no gain.  Using
     gfp_t for it is even worse, since it implies GFP_ATOMIC is valid or
     sensible.

3) Change API to use CAPS for macros
   - Strongly disagree, Linus doesn't use CAPS for type-taking macros
     (list_entry, container_of, min_t), it's ugly, and status-quo wins.

4) Get rid of unused "online-only" percpu allocators.
   - Agree, and will simplify implementation and macro tangle.

5) Make dynamic percpu var access more efficient.
   - Agree, obviously.  x86 for the moment, the rest can follow (or not).

6) Use percpu allocations more widely.
   - Definitely, I have some other patches which use it too.  And makes even
     more sense once (5) is done.

7) Make percpu area grow dynamically.
   - Yes, but a thorny tangle esp. with IA64.  The cmdline hack is probably
     sufficient meanwhile, and parallels can be drawn with vmalloc.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-16  0:00                       ` Rusty Russell
@ 2008-11-16 21:41                         ` Rusty Russell
  2008-11-20 16:47                         ` Christoph Lameter
  1 sibling, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2008-11-16 21:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Sunday 16 November 2008 10:30:33 Rusty Russell wrote:
> There are 4 places which use __alloc_percpu, and all of
> them can be converted to alloc_percpu (see patch below).

No they can't: that patch was crap.

The three files which use __alloc_percpu will need conversion.

Sorry,
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-16  0:00                       ` Rusty Russell
  2008-11-16 21:41                         ` Rusty Russell
@ 2008-11-20 16:47                         ` Christoph Lameter
  2008-11-20 23:21                           ` Rusty Russell
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Lameter @ 2008-11-20 16:47 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Sun, 16 Nov 2008, Rusty Russell wrote:

> > And other flags can become necessary if percpu areas gain the ability of
> > being dynamically extended.
>
> And other flags become impossible, eg. GFP_ATOMIC.

There are other flags that may become relevant like GFP_THISNODE, reclaim
settings (counter allocation from filesystem context) etc.

> The strange indirection was from someone's failed experiment at only
> allocating only online cpus.  We should kill that as a separate patch.

I fully agree.

> There are several seperate things here (this is to make sure my head is
> straight and to clarify for others skimming):
>
> 1) Make the dynamic percpu allocator use the static percpu system.
>    - Agreed, always the aim, never quite happened.

Ack.

> 2) Make zeroing optional
>    - Disagree, adds API complexity for corner case with no gain.  Using
>      gfp_t for it is even worse, since it implies GFP_ATOMIC is valid or
>      sensible.

It does not imply that. Various allocations limit the type of flags that
can be passed. Also there may be situations in which GFP_ATOMIC will make
sense in the future f.e. if a percpu counter structure has to be allocated
from an interrupt context.

GFP_FS, GFP_IO, GFP_NOFAIL GFP_ZERO GFP_NOMEMALLOC GFP_THISNODE and
GFP_HARDWALL could all be relevant for a dynamically extending percpu
allocator since memory reclaim could be triggered.

> 3) Change API to use CAPS for macros
>    - Strongly disagree, Linus doesn't use CAPS for type-taking macros
>      (list_entry, container_of, min_t), it's ugly, and status-quo wins.

That is the cause for many problems because people assume these can be
handled like a function.

> 4) Get rid of unused "online-only" percpu allocators.
>    - Agree, and will simplify implementation and macro tangle.

Ack.

> 5) Make dynamic percpu var access more efficient.
>    - Agree, obviously.  x86 for the moment, the rest can follow (or not).

Ack.
>
> 6) Use percpu allocations more widely.
>    - Definitely, I have some other patches which use it too.  And makes even
>      more sense once (5) is done.

Ack.

> 7) Make percpu area grow dynamically.
>    - Yes, but a thorny tangle esp. with IA64.  The cmdline hack is probably
>      sufficient meanwhile, and parallels can be drawn with vmalloc.

Allright. Please check with David Miller who wants to allocate thousands
of network interfaces which all need a MIB block.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Allocate module.ref array dynamically
  2008-11-20 16:47                         ` Christoph Lameter
@ 2008-11-20 23:21                           ` Rusty Russell
  0 siblings, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2008-11-20 23:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Takashi Iwai, Andreas Gruenbacher, Jan Blunck,
	Andrew Morton, linux-kernel, Mike Travis

On Friday 21 November 2008 03:17:24 Christoph Lameter wrote:
> > 7) Make percpu area grow dynamically.
> >    - Yes, but a thorny tangle esp. with IA64.  The cmdline hack is
> > probably sufficient meanwhile, and parallels can be drawn with vmalloc.
>
> Allright. Please check with David Miller who wants to allocate thousands
> of network interfaces which all need a MIB block.

Yes, I'm already battling Dave over percpu issues.  I'll add this to the list 
:)

Thanks for the warning,
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2008-11-20 23:21 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-11 20:01 [PATCH] Allocate module.ref array dynamically Takashi Iwai
2008-11-11 22:06 ` Eric Dumazet
2008-11-11 22:15   ` Eric Dumazet
2008-11-11 22:32     ` Takashi Iwai
2008-11-11 22:26   ` Takashi Iwai
2008-11-12  1:44   ` Rusty Russell
2008-11-12  2:26     ` Mike Travis
2008-11-12  3:17       ` Christoph Lameter
2008-11-12 13:46         ` Mike Travis
2008-11-12  3:14     ` Christoph Lameter
2008-11-12  9:59       ` Rusty Russell
2008-11-12 20:11         ` Christoph Lameter
2008-11-12 22:01           ` Rusty Russell
2008-11-12 22:50             ` Christoph Lameter
2008-11-13  9:21               ` Rusty Russell
2008-11-13 14:40                 ` Christoph Lameter
2008-11-13 23:49                   ` Rusty Russell
2008-11-14  0:20                     ` Christoph Lameter
2008-11-16  0:00                       ` Rusty Russell
2008-11-16 21:41                         ` Rusty Russell
2008-11-20 16:47                         ` Christoph Lameter
2008-11-20 23:21                           ` Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).