LKML Archive on lore.kernel.org
* [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Eric Dumazet @ 2007-03-20 17:12 UTC
To: Andrew Morton, Christoph Lameter, Andi Kleen; +Cc: linux kernel
Hi
I noticed on a small x86_64 NUMA setup (2 nodes) that cache_free_alien() is very expensive.
This is because of a cache miss on struct slab.
At the time an object is freed (by a call to kmem_cache_free() for example), the underlying 'struct slab' is no longer cache-hot.
struct slab *slabp = virt_to_slab(objp);
nodeid = slabp->nodeid; // cache miss
So we currently need the slab only to look up the nodeid, to decide whether or not we can use the cachep cpu cache.
Couldn't we use something less expensive, like pfn_to_nid()?
On x86_64, pfn_to_nid() usually needs only one shared cache line for all objects (struct memnode).
Is it possible for virt_to_slab(objp)->nodeid to differ from pfn_to_nid(objp)?
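For reference, the lookup path in question is roughly the following, a sketch of the mm/slab.c helpers of that era (details vary by kernel version):

/* Sketch of the 2.6.2x-era SLAB helpers under discussion; each step
 * below is a dependent load that can miss in the cache. */
static inline struct slab *page_get_slab(struct page *page)
{
	return (struct slab *)page->lru.prev;	/* miss 1: struct page */
}

static inline struct slab *virt_to_slab(const void *addr)
{
	struct page *page = virt_to_page(addr);

	return page_get_slab(page);		/* slabp->nodeid is miss 2 */
}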
Eric
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Christoph Lameter @ 2007-03-20 19:54 UTC
To: Eric Dumazet; +Cc: Andrew Morton, Andi Kleen, linux kernel
On Tue, 20 Mar 2007, Eric Dumazet wrote:
>
> I noticed on a small x86_64 NUMA setup (2 nodes) that cache_free_alien() is very expensive.
> This is because of a cache miss on struct slab.
> At the time an object is freed (by a call to kmem_cache_free() for example), the underlying 'struct slab' is no longer cache-hot.
>
> struct slab *slabp = virt_to_slab(objp);
> nodeid = slabp->nodeid; // cache miss
>
> So we currently need the slab only to look up the nodeid, to decide whether
> or not we can use the cachep cpu cache.
>
> Couldn't we use something less expensive, like pfn_to_nid()?
The nodeid describes the node that the slab was allocated for, which may
not be equal to the node that the page came from. But if GFP_THISNODE is
used then the two should always be the same. The two nodeids being
different was certainly frequent before the GFP_THISNODE work went in. Now
it may only occur in corner cases, perhaps during bootup on some machines
that boot with empty nodes. I vaguely recall a powerpc issue.
> Is it possible for virt_to_slab(objp)->nodeid to differ from pfn_to_nid(objp)?
It is possible that the page allocator falls back to a node other than the
one requested. We would need to check that this never occurs.
If we are sure then we could drop the nodeid field completely.
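For context, the nodeid in question lives in the per-slab descriptor, roughly the following in the SLAB of that era (a sketch; field order from memory):

/* Sketch of the 2.6.2x-era SLAB descriptor; nodeid is the field that
 * cache_free_alien() must fetch from a usually cold cache line. */
struct slab {
	struct list_head list;
	unsigned long colouroff;
	void *s_mem;		/* including colour offset */
	unsigned int inuse;	/* num of objs active in slab */
	kmem_bufctl_t free;
	unsigned short nodeid;	/* node the slab was allocated for */
};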
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Andi Kleen @ 2007-03-20 21:32 UTC
To: Christoph Lameter; +Cc: Eric Dumazet, Andrew Morton, Andi Kleen, linux kernel
> > Is it possible for virt_to_slab(objp)->nodeid to differ from pfn_to_nid(objp)?
>
> It is possible that the page allocator falls back to a node other than the
> one requested. We would need to check that this never occurs.
The only way to ensure that would be to set a strict mempolicy.
But I'm not sure that's a good idea -- after all you don't want
to fail an allocation in this case.
But pfn_to_nid() on the object, as proposed by Eric, should work anyway.
I'm just not sure the tables used for that will be cache-hot more often
than the slab.
-Andi
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Eric Dumazet @ 2007-03-20 22:09 UTC
To: Andi Kleen; +Cc: Christoph Lameter, Andrew Morton, linux kernel
Andi Kleen wrote:
>>> Is it possible for virt_to_slab(objp)->nodeid to differ from pfn_to_nid(objp)?
>> It is possible that the page allocator falls back to a node other than the
>> one requested. We would need to check that this never occurs.
>
> The only way to ensure that would be to set a strict mempolicy.
> But I'm not sure that's a good idea -- after all you don't want
> to fail an allocation in this case.
>
> But pfn_to_nid() on the object, as proposed by Eric, should work anyway.
> I'm just not sure the tables used for that will be cache-hot more often
> than the slab.
pfn_to_nid() on most x86_64 machines accesses one cache line (struct memnode):
Node 0 MemBase 0000000000000000 Limit 0000000280000000
Node 1 MemBase 0000000280000000 Limit 0000000480000000
NUMA: Using 31 for the hash shift.
In this example, we use only 8 bytes of memnode.embedded_map[] to find the nid
of all 16 GB of RAM. In the profiles I have, memnode is always hot (no cache
misses on it).
virt_to_slab(), on the other hand, has to access:
1) struct page -> page_get_slab() (page->lru.prev) (one cache miss)
2) struct slab -> nodeid (another cache miss)
So using pfn_to_nid() would avoid two cache misses.
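For reference, the x86_64 lookup being proposed boils down to a single table access, roughly (a sketch; exact names vary by kernel version):

/* Sketch: the whole pfn -> node translation is one lookup in a
 * read-mostly, cacheline-aligned struct memnode. */
static inline int phys_to_nid(unsigned long addr)
{
	return memnode.map[addr >> memnode.shift];
}

#define pfn_to_nid(pfn) phys_to_nid((unsigned long)(pfn) << PAGE_SHIFT)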
I understand we want to do special things (fallback and similar tricks) at
allocation time, but I believe that we can just trust the real nid of the
memory at free time.
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Christoph Lameter @ 2007-03-21 0:16 UTC
To: Eric Dumazet; +Cc: Andi Kleen, Andrew Morton, linux kernel
On Tue, 20 Mar 2007, Eric Dumazet wrote:
> I understand we want to do special things (fallback and similar tricks) at
> allocation time, but I believe that we can just trust the real nid of the
> memory at free time.
Sorry, no. The node at allocation time determines which node-specific
structure tracks the slab. If we fall back, then the slab is allocated
from one node but entered in the node structure of another. Thus you
cannot free the slab without knowing the node at allocation time.
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Christoph Lameter @ 2007-03-21 0:18 UTC
To: Andi Kleen; +Cc: Eric Dumazet, Andrew Morton, linux kernel
On Tue, 20 Mar 2007, Andi Kleen wrote:
> > > Is it possible for virt_to_slab(objp)->nodeid to differ from pfn_to_nid(objp)?
> >
> > It is possible that the page allocator falls back to a node other than the
> > one requested. We would need to check that this never occurs.
>
> The only way to ensure that would be to set a strict mempolicy.
> But I'm not sure that's a good idea -- after all you don't want
> to fail an allocation in this case.
>
> But pfn_to_nid() on the object, as proposed by Eric, should work anyway.
> I'm just not sure the tables used for that will be cache-hot more often
> than the slab.
We usually use page_to_nid(). Sure, this will determine the node the object
resides on, but that may not be the node on which the slab is tracked,
since there may have been a fallback at alloc time.
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Andi Kleen @ 2007-03-21 2:44 UTC
To: Christoph Lameter; +Cc: Andi Kleen, Eric Dumazet, Andrew Morton, linux kernel
> We usually use page_to_nid(). Sure, this will determine the node the object
> resides on, but that may not be the node on which the slab is tracked,
> since there may have been a fallback at alloc time.
How about your slab rewrite? I assume it would make more sense to fix
such problems in that code instead of the old one, which is going to be
replaced at some point.
-Andi
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Christoph Lameter @ 2007-03-21 3:10 UTC
To: Andi Kleen; +Cc: Eric Dumazet, Andrew Morton, linux kernel
On Wed, 21 Mar 2007, Andi Kleen wrote:
> > We usually use page_to_nid(). Sure, this will determine the node the object
> > resides on, but that may not be the node on which the slab is tracked,
> > since there may have been a fallback at alloc time.
>
> How about your slab rewrite? I assume it would make more sense to fix
> such problems in that code instead of the old one, which is going to be
> replaced at some point.
The slab rewrite first allocates a page and then determines where it
came from, instead of requiring the page allocator to allocate from a
certain node. Plus, SLUB does not keep per-cpu or per-node object queues,
so the problem does not occur in the same way. The per-cpu slab in SLUB
can contain objects from another node, whereas SLAB can only put node-local
objects on its queues.
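In other words, the rewrite's free path needs only the page; a rough sketch based on the SLUB code as later merged (details may differ):

/* Sketch: SLUB derives everything from the object's page, so there is
 * no separately tracked allocation-time nodeid to consult on free. */
void kmem_cache_free(struct kmem_cache *s, void *x)
{
	struct page *page = virt_to_head_page(x);

	slab_free(s, page, x, __builtin_return_address(0));
}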
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Eric Dumazet @ 2007-03-21 6:27 UTC
To: Christoph Lameter; +Cc: Andi Kleen, Andrew Morton, linux kernel
Christoph Lameter wrote:
> On Tue, 20 Mar 2007, Eric Dumazet wrote:
>
>> I understand we want to do special things (fallback and similar tricks) at
>> allocation time, but I believe that we can just trust the real nid of the
>> memory at free time.
>
> Sorry, no. The node at allocation time determines which node-specific
> structure tracks the slab. If we fall back, then the slab is allocated
> from one node but entered in the node structure of another. Thus you
> cannot free the slab without knowing the node at allocation time.
I think you don't understand my point.
When we enter kmem_cache_free(), we are not freeing a slab but an object,
given a pointer to this object.
The fast path is to put the pointer into the cpu array cache. The object
might be given back some cycles later because of a kmem_cache_alloc(): no
need to access the two cache lines (struct page, struct slab).
This fast path could be entered by checking the node of the page, which is
faster, but possibly different from virt_to_slab(obj)->nodeid. Do we
care? Definitely not: the node is guaranteed to be correct.
Then, if we must flush the cpu array cache because it is full, we *may*
access the slabs of the objects we are flushing, and then check
virt_to_slab(obj)->nodeid to do the correct thing.
Fortunately, flushing the array cache is not a frequent event, and the cost
of accessing those cache lines (struct page, struct slab) can be amortized,
because several 'transferred or freed' objects might share them at that time.
Actually, I had to disable NUMA on my platforms because it is just overkill
and slower. Maybe it is OK for very big machines, but not for dual-node
Opterons? Let me know, so that I don't waste your time (and mine).
Thank you
* [PATCH] SLAB : Use num_possible_cpus() in enable_cpucache()
From: Eric Dumazet @ 2007-03-21 6:57 UTC
To: Andrew Morton; +Cc: Christoph Lameter, Andi Kleen, linux kernel
The existing comment in mm/slab.c is *perfect*, so I reproduce it:
/*
* CPU bound tasks (e.g. network routing) can exhibit cpu bound
* allocation behaviour: Most allocs on one cpu, most free operations
* on another cpu. For these cases, an efficient object passing between
* cpus is necessary. This is provided by a shared array. The array
* replaces Bonwick's magazine layer.
* On uniprocessor, it's functionally equivalent (but less efficient)
* to a larger limit. Thus disabled by default.
*/
As most shipped Linux kernels are now compiled with CONFIG_SMP, there is no
way a preprocessor #if can detect whether the machine is UP or SMP. Better
to use num_possible_cpus().
This means that on UP we allocate a 'size=0 shared array', to be more
efficient.
A later patch can avoid the allocation of 'empty shared arrays' entirely, to
save some memory.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
diff --git a/mm/slab.c b/mm/slab.c
index 57f7aa4..a69d0a5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3975,10 +3975,8 @@ static int enable_cpucache(struct kmem_c
 	 * to a larger limit. Thus disabled by default.
 	 */
 	shared = 0;
-#ifdef CONFIG_SMP
-	if (cachep->buffer_size <= PAGE_SIZE)
+	if (cachep->buffer_size <= PAGE_SIZE && num_possible_cpus() > 1)
 		shared = 8;
-#endif
 #if DEBUG
 	/*
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Christoph Lameter @ 2007-03-21 7:03 UTC
To: Eric Dumazet; +Cc: Andi Kleen, Andrew Morton, linux kernel
On Wed, 21 Mar 2007, Eric Dumazet wrote:
> The fast path is to put the pointer into the cpu array cache. The object
> might be given back some cycles later because of a kmem_cache_alloc(): no
> need to access the two cache lines (struct page, struct slab).
If you do that then the slab will no longer return objects from the
desired nodes. The assumption is that cpu array objects are from the local
node.
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Eric Dumazet @ 2007-03-21 7:14 UTC
To: Christoph Lameter; +Cc: Andi Kleen, Andrew Morton, linux kernel
Christoph Lameter wrote:
> On Wed, 21 Mar 2007, Eric Dumazet wrote:
>
>> The fast path is to put the pointer into the cpu array cache. The object
>> might be given back some cycles later because of a kmem_cache_alloc(): no
>> need to access the two cache lines (struct page, struct slab).
>
> If you do that then the slab will no longer return objects from the
> desired nodes. The assumption is that cpu array objects are from the local
> node.
I'm confused.
How could the following be wrong?
static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
{
	int mynode = numa_node_id();
	int objnode = virt_to_nid(objp); // or whatever

	if (mynode == objnode)
		return 0;
	...
}
If numa_node_id() is equal to the node of the page containing the first byte
of the object, then the object is on the local node. Or what?
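One concrete spelling of the "or whatever" above, assuming the generic helpers of that era (an illustration, not code from this thread):

/* Hypothetical helper: resolve an object's node from its page,
 * avoiding the struct slab dereference entirely. */
static inline int virt_to_nid(const void *objp)
{
	return page_to_nid(virt_to_page(objp));
}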
Thank you
* [PATCH] SLAB : Dont allocate empty shared caches
From: Eric Dumazet @ 2007-03-21 7:21 UTC
To: Andrew Morton; +Cc: Christoph Lameter, Andi Kleen, linux kernel
We can avoid allocating empty shared caches and avoid the unnecessary check
of cache->limit. We save some memory, and we avoid bringing unnecessary
cache lines into the CPU cache.
All accesses to l3->shared already check for NULL pointers, so this patch
is safe.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
diff --git a/mm/slab.c b/mm/slab.c
index a69d0a5..abf46ae 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1223,19 +1223,20 @@ static int __cpuinit cpuup_callback(stru
 		 */
 		list_for_each_entry(cachep, &cache_chain, next) {
 			struct array_cache *nc;
-			struct array_cache *shared;
+			struct array_cache *shared = NULL;
 			struct array_cache **alien = NULL;
 
 			nc = alloc_arraycache(node, cachep->limit,
 						cachep->batchcount);
 			if (!nc)
 				goto bad;
-			shared = alloc_arraycache(node,
+			if (cachep->shared) {
+				shared = alloc_arraycache(node,
 					cachep->shared * cachep->batchcount,
 					0xbaadf00d);
-			if (!shared)
-				goto bad;
-
+				if (!shared)
+					goto bad;
+			}
 			if (use_alien_caches) {
 				alien = alloc_alien_cache(node, cachep->limit);
 				if (!alien)
@@ -1317,8 +1318,8 @@ #endif
 			shared = l3->shared;
 			if (shared) {
-				free_block(cachep, l3->shared->entry,
-					   l3->shared->avail, node);
+				free_block(cachep, shared->entry,
+					   shared->avail, node);
 				l3->shared = NULL;
 			}
@@ -3812,12 +3813,15 @@ static int alloc_kmemlist(struct kmem_ca
 			goto fail;
 		}
 
-		new_shared = alloc_arraycache(node,
+		new_shared = NULL;
+		if (cachep->shared) {
+			new_shared = alloc_arraycache(node,
 				cachep->shared*cachep->batchcount,
 				0xbaadf00d);
-		if (!new_shared) {
-			free_alien_cache(new_alien);
-			goto fail;
+			if (!new_shared) {
+				free_alien_cache(new_alien);
+				goto fail;
+			}
 		}
 
 		l3 = cachep->nodelists[node];
* Re: [PATCH] SLAB : Use num_possible_cpus() in enable_cpucache()
From: Pekka Enberg @ 2007-03-21 13:02 UTC
To: Eric Dumazet; +Cc: Andrew Morton, Christoph Lameter, Andi Kleen, linux kernel
On 3/21/07, Eric Dumazet <dada1@cosmosbay.com> wrote:
> As most shipped Linux kernels are now compiled with CONFIG_SMP, there is no
> way a preprocessor #if can detect whether the machine is UP or SMP. Better
> to use num_possible_cpus().
Looks good to me.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
* Re: [PATCH] SLAB : Dont allocate empty shared caches
From: Pekka Enberg @ 2007-03-21 13:13 UTC
To: Eric Dumazet; +Cc: Andrew Morton, Christoph Lameter, Andi Kleen, linux kernel
On 3/21/07, Eric Dumazet <dada1@cosmosbay.com> wrote:
> We can avoid allocating empty shared caches and avoid the unnecessary check
> of cache->limit. We save some memory, and we avoid bringing unnecessary
> cache lines into the CPU cache.
>
> All accesses to l3->shared already check for NULL pointers, so this patch
> is safe.
Looks good to me.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
* Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;
From: Christoph Lameter @ 2007-03-21 14:35 UTC
To: Eric Dumazet; +Cc: Andi Kleen, Andrew Morton, linux kernel
On Wed, 21 Mar 2007, Eric Dumazet wrote:
> If numa_node_id() is equal to the node of the page containing the first byte
> of the object, then the object is on the local node. Or what?
No. The slab (the page you are referring to) may have been allocated for
another node, and is tracked via the node structs of that other node. We
were just falling back to the node that now appears to be local.
* Re: [PATCH] SLAB : Use num_possible_cpus() in enable_cpucache()
From: Christoph Lameter @ 2007-03-21 18:45 UTC
To: Eric Dumazet; +Cc: Andrew Morton, Andi Kleen, linux kernel
On Wed, 21 Mar 2007, Eric Dumazet wrote:
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
* non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Siddha, Suresh B @ 2007-03-22 21:28 UTC
To: Christoph Lameter; +Cc: Andi Kleen, Eric Dumazet, Andrew Morton, linux kernel
Christoph,
While we are on this topic: recently I had reports that
cache_free_alien() is costly on non-NUMA platforms too (similar
to the cache miss issues that Eric was referring to on NUMA),
and the appended patch seems to fix it, for non-NUMA at least.
The appended patch gives a nice 1% perf improvement on a non-NUMA platform
with a database workload.
Please comment or ack for mainline :)
thanks,
suresh
---
Subject: [patch] slab: skip cache_free_alien() on non NUMA
From: Suresh Siddha <suresh.b.siddha@intel.com>
Set use_alien_caches to 0 on non-NUMA platforms, and avoid calling
cache_free_alien() when use_alien_caches is not set. This avoids the
cache miss that happens while dereferencing slabp to get the nodeid.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
diff --git a/mm/slab.c b/mm/slab.c
index 8fdaffa..146082d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1146,7 +1146,7 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 	 * Make sure we are not freeing a object from another node to the array
 	 * cache on this cpu.
 	 */
-	if (likely(slabp->nodeid == node) || unlikely(!use_alien_caches))
+	if (likely(slabp->nodeid == node))
 		return 0;
 
 	l3 = cachep->nodelists[node];
@@ -1394,6 +1394,9 @@ void __init kmem_cache_init(void)
 	int order;
 	int node;
 
+	if (num_online_nodes() == 1)
+		use_alien_caches = 0;
+
 	for (i = 0; i < NUM_INIT_LISTS; i++) {
 		kmem_list3_init(&initkmem_list3[i]);
 		if (i < MAX_NUMNODES)
@@ -3563,7 +3566,7 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp)
 	check_irq_off();
 	objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
 
-	if (cache_free_alien(cachep, objp))
+	if (use_alien_caches && cache_free_alien(cachep, objp))
 		return;
 
 	if (likely(ac->avail < ac->limit)) {
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Christoph Lameter @ 2007-03-22 22:10 UTC
To: Siddha, Suresh B; +Cc: Andi Kleen, Eric Dumazet, Andrew Morton, linux kernel
On Thu, 22 Mar 2007, Siddha, Suresh B wrote:
> @@ -1394,6 +1394,9 @@ void __init kmem_cache_init(void)
> int order;
> int node;
>
> + if (num_online_nodes() == 1)
> + use_alien_caches = 0;
> +
What happens if you bring up a second node?
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Eric Dumazet @ 2007-03-22 22:12 UTC
To: Siddha, Suresh B
Cc: Christoph Lameter, Andi Kleen, Andrew Morton, linux kernel
Siddha, Suresh B wrote:
> Christoph,
>
> While we are on this topic: recently I had reports that
> cache_free_alien() is costly on non-NUMA platforms too (similar
> to the cache miss issues that Eric was referring to on NUMA),
> and the appended patch seems to fix it, for non-NUMA at least.
>
> The appended patch gives a nice 1% perf improvement on a non-NUMA platform
> with a database workload.
>
> Please comment or ack for mainline :)
I have one comment :)
> @@ -1394,6 +1394,9 @@ void __init kmem_cache_init(void)
> int order;
> int node;
>
> + if (num_online_nodes() == 1)
> + use_alien_caches = 0;
> +
Unfortunately this part is wrong.
You should check num_possible_nodes(), or nr_node_ids (the latter is
cheaper; it's a variable instead of a function call).
I wonder if we could add a new SLAB_NUMA_BYPASS flag, so that we could
declare some kmem_caches as non-NUMA-aware (for example, I feel network
skbs don't need the NUMA overhead).
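Purely to illustrate the idea (this flag exists in no tree; the bit value and hook point are made up):

/* Hypothetical flag -- not in any kernel tree; bit value made up. */
#define SLAB_NUMA_BYPASS	0x00400000UL	/* treat all frees as node-local */

static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
{
	/* Opted-out caches skip the virt_to_slab() nodeid lookup and
	 * always use the local cpu array cache. */
	if (cachep->flags & SLAB_NUMA_BYPASS)
		return 0;
	/* ... existing nodeid check would follow here ... */
	return 0;
}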
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Siddha, Suresh B @ 2007-03-22 22:40 UTC
To: Eric Dumazet
Cc: Siddha, Suresh B, Christoph Lameter, Andi Kleen, Andrew Morton,
linux kernel
On Thu, Mar 22, 2007 at 11:12:39PM +0100, Eric Dumazet wrote:
> Siddha, Suresh B wrote:
> >+ if (num_online_nodes() == 1)
> >+ use_alien_caches = 0;
> >+
>
> Unfortunately this part is wrong.
oops.
>
> You should check num_possible_nodes(), or nr_node_ids (the latter is
> cheaper; it's a variable instead of a function call).
But that is based on a compile-time option, isn't it? Perhaps I need to use
some other mechanism to find out that the platform is not NUMA-capable...
thanks,
suresh
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Eric Dumazet @ 2007-03-22 22:56 UTC
To: Siddha, Suresh B
Cc: Christoph Lameter, Andi Kleen, Andrew Morton, linux kernel
Siddha, Suresh B wrote:
> On Thu, Mar 22, 2007 at 11:12:39PM +0100, Eric Dumazet wrote:
>> Siddha, Suresh B wrote:
>>> + if (num_online_nodes() == 1)
>>> + use_alien_caches = 0;
>>> +
>> Unfortunately this part is wrong.
>
> oops.
>
>> You should check num_possible_nodes(), or nr_node_ids (the latter is
>> cheaper; it's a variable instead of a function call).
>
> But that is based on a compile-time option, isn't it? Perhaps I need to use
> some other mechanism to find out that the platform is not NUMA-capable...
nr_node_ids is defined as 1 if you compile a non-NUMA kernel.
If CONFIG_NUMA is on, then nr_node_ids is a variable, filled with the
maximum nodeid of any possible node (+1). If your machine is not CPU-hotplug
capable and you have, say, one node (one dual-core processor for example),
then nr_node_ids will be set to 1
(see the function setup_nr_node_ids() in mm/page_alloc.c).
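That function is roughly (quoting the 2.6.21-era mm/page_alloc.c from memory, so treat it as a sketch):

/*
 * Figure out nr_node_ids: the highest possible node id, plus one.
 */
static void __init setup_nr_node_ids(void)
{
	unsigned int node;
	unsigned int highest = 0;

	for_each_node_mask(node, node_possible_map)
		highest = node;
	nr_node_ids = highest + 1;
}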
So this is OK for your need...
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Christoph Lameter @ 2007-03-23 1:25 UTC
To: Siddha, Suresh B; +Cc: Eric Dumazet, Andi Kleen, Andrew Morton, linux kernel
On Thu, 22 Mar 2007, Siddha, Suresh B wrote:
> > You should check num_possible_nodes(), or nr_node_ids (the latter is
> > cheaper; it's a variable instead of a function call).
>
> But that is based on a compile-time option, isn't it? Perhaps I need to use
> some other mechanism to find out that the platform is not NUMA-capable...
No, it's runtime.
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Andi Kleen @ 2007-03-23 14:12 UTC
To: Siddha, Suresh B
Cc: Eric Dumazet, Christoph Lameter, Andi Kleen, Andrew Morton, linux kernel
> But that is based on a compile-time option, isn't it? Perhaps I need to use
> some other mechanism to find out that the platform is not NUMA-capable...
We can probably make it runtime on x86. That will be needed sooner or
later for correct NUMA hotplug support anyway.
-Andi
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Andi Kleen @ 2007-03-23 14:14 UTC
To: Christoph Lameter
Cc: Siddha, Suresh B, Eric Dumazet, Andi Kleen, Andrew Morton, linux kernel
On Thu, Mar 22, 2007 at 06:25:16PM -0700, Christoph Lameter wrote:
> On Thu, 22 Mar 2007, Siddha, Suresh B wrote:
>
> > > You should check num_possible_nodes(), or nr_node_ids (the latter is
> > > cheaper; it's a variable instead of a function call).
> >
> > But that is based on a compile-time option, isn't it? Perhaps I need to
> > use some other mechanism to find out that the platform is not
> > NUMA-capable...
>
> No, it's runtime.
I don't see any code that would ever change the mask from the compile-time
default. But that is easy to fix.
-Andi
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Siddha, Suresh B @ 2007-04-02 22:55 UTC
To: Andi Kleen
Cc: Siddha, Suresh B, Eric Dumazet, Christoph Lameter, Andrew Morton,
linux kernel
On Fri, Mar 23, 2007 at 03:12:10PM +0100, Andi Kleen wrote:
> > But that is based on a compile-time option, isn't it? Perhaps I need to
> > use some other mechanism to find out that the platform is not NUMA-capable...
>
> We can probably make it runtime on x86. That will be needed sooner or
> later for correct NUMA hotplug support anyway.
How about this patch? Thanks.
---
From: Suresh Siddha <suresh.b.siddha@intel.com>
[patch] x86_64: set node_possible_map at runtime.
Set node_possible_map at runtime. On a non-NUMA system,
num_possible_nodes() will now say '1'.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
diff --git a/arch/x86_64/mm/k8topology.c b/arch/x86_64/mm/k8topology.c
index b5b8dba..d6f4447 100644
--- a/arch/x86_64/mm/k8topology.c
+++ b/arch/x86_64/mm/k8topology.c
@@ -49,11 +49,8 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end)
int found = 0;
u32 reg;
unsigned numnodes;
- nodemask_t nodes_parsed;
unsigned dualcore = 0;
- nodes_clear(nodes_parsed);
-
if (!early_pci_allowed())
return -1;
@@ -102,7 +99,7 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end)
nodeid, (base>>8)&3, (limit>>8) & 3);
return -1;
}
- if (node_isset(nodeid, nodes_parsed)) {
+ if (node_isset(nodeid, node_possible_map)) {
printk(KERN_INFO "Node %d already present. Skipping\n",
nodeid);
continue;
@@ -155,7 +152,7 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end)
prevbase = base;
- node_set(nodeid, nodes_parsed);
+ node_set(nodeid, node_possible_map);
}
if (!found)
diff --git a/arch/x86_64/mm/numa.c b/arch/x86_64/mm/numa.c
index 41b8fb0..5f7d4d8 100644
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -383,6 +383,7 @@ static int __init numa_emulation(unsigned long start_pfn, unsigned long end_pfn)
i,
nodes[i].start, nodes[i].end,
(nodes[i].end - nodes[i].start) >> 20);
+ node_set(i, node_possible_map);
node_set_online(i);
}
memnode_shift = compute_hash_shift(nodes, numa_fake);
@@ -405,6 +406,8 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
{
int i;
+ nodes_clear(node_possible_map);
+
#ifdef CONFIG_NUMA_EMU
if (numa_fake && !numa_emulation(start_pfn, end_pfn))
return;
@@ -432,6 +435,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
memnodemap[0] = 0;
nodes_clear(node_online_map);
node_set_online(0);
+ node_set(0, node_possible_map);
for (i = 0; i < NR_CPUS; i++)
numa_set_node(i, 0);
node_to_cpumask[0] = cpumask_of_cpu(0);
diff --git a/arch/x86_64/mm/srat.c b/arch/x86_64/mm/srat.c
index 2efe215..9f26e2b 100644
--- a/arch/x86_64/mm/srat.c
+++ b/arch/x86_64/mm/srat.c
@@ -25,7 +25,6 @@ int acpi_numa __initdata;
static struct acpi_table_slit *acpi_slit;
-static nodemask_t nodes_parsed __initdata;
static struct bootnode nodes[MAX_NUMNODES] __initdata;
static struct bootnode nodes_add[MAX_NUMNODES];
static int found_add_area __initdata;
@@ -43,7 +42,7 @@ static __init int setup_node(int pxm)
static __init int conflicting_nodes(unsigned long start, unsigned long end)
{
int i;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, node_possible_map) {
struct bootnode *nd = &nodes[i];
if (nd->start == nd->end)
continue;
@@ -321,7 +320,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
}
nd = &nodes[node];
oldnode = *nd;
- if (!node_test_and_set(node, nodes_parsed)) {
+ if (!node_test_and_set(node, node_possible_map)) {
nd->start = start;
nd->end = end;
} else {
@@ -344,7 +343,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
printk(KERN_NOTICE "SRAT: Hotplug region ignored\n");
*nd = oldnode;
if ((nd->start | nd->end) == 0)
- node_clear(node, nodes_parsed);
+ node_clear(node, node_possible_map);
}
}
@@ -356,7 +355,7 @@ static int nodes_cover_memory(void)
unsigned long pxmram, e820ram;
pxmram = 0;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, node_possible_map) {
unsigned long s = nodes[i].start >> PAGE_SHIFT;
unsigned long e = nodes[i].end >> PAGE_SHIFT;
pxmram += e - s;
@@ -380,7 +379,7 @@ static int nodes_cover_memory(void)
static void unparse_node(int node)
{
int i;
- node_clear(node, nodes_parsed);
+ node_clear(node, node_possible_map);
for (i = 0; i < MAX_LOCAL_APIC; i++) {
if (apicid_to_node[i] == node)
apicid_to_node[i] = NUMA_NO_NODE;
@@ -420,18 +419,18 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
}
/* Finally register nodes */
- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, node_possible_map)
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
/* Try again in case setup_node_bootmem missed one due
to missing bootmem */
- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, node_possible_map)
if (!node_online(i))
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
for (i = 0; i < NR_CPUS; i++) {
if (cpu_to_node[i] == NUMA_NO_NODE)
continue;
- if (!node_isset(cpu_to_node[i], nodes_parsed))
+ if (!node_isset(cpu_to_node[i], node_possible_map))
numa_set_node(i, NUMA_NO_NODE);
}
numa_init_array();
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Christoph Lameter @ 2007-04-03 0:23 UTC
To: Siddha, Suresh B; +Cc: Andi Kleen, Eric Dumazet, Andrew Morton, linux kernel
On Mon, 2 Apr 2007, Siddha, Suresh B wrote:
> Set node_possible_map at runtime. On a non-NUMA system,
> num_possible_nodes() will now say '1'.
How does this relate to nr_node_ids?
* Re: non-NUMA cache_free_alien() (was Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;)
From: Siddha, Suresh B @ 2007-04-03 0:31 UTC
To: Christoph Lameter
Cc: Siddha, Suresh B, Andi Kleen, Eric Dumazet, Andrew Morton, linux kernel
On Mon, Apr 02, 2007 at 05:23:20PM -0700, Christoph Lameter wrote:
> On Mon, 2 Apr 2007, Siddha, Suresh B wrote:
>
> > Set node_possible_map at runtime. On a non-NUMA system,
> > num_possible_nodes() will now say '1'.
>
> How does this relate to nr_node_ids?
With this patch, nr_node_ids on non-NUMA will also be '1', and, as before,
nr_node_ids is the same as num_possible_nodes().
thanks,
suresh
* [patch 1/2] x86_64: set node_possible_map at runtime.
From: Siddha, Suresh B @ 2007-04-09 18:01 UTC
To: akpm; +Cc: christoph, Andi Kleen, Eric Dumazet, linux kernel, suresh.b.siddha
Set node_possible_map at runtime on x86_64. On a non-NUMA system,
num_possible_nodes() will now say '1'.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
diff --git a/arch/x86_64/mm/k8topology.c b/arch/x86_64/mm/k8topology.c
index b5b8dba..d6f4447 100644
--- a/arch/x86_64/mm/k8topology.c
+++ b/arch/x86_64/mm/k8topology.c
@@ -49,11 +49,8 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end)
int found = 0;
u32 reg;
unsigned numnodes;
- nodemask_t nodes_parsed;
unsigned dualcore = 0;
- nodes_clear(nodes_parsed);
-
if (!early_pci_allowed())
return -1;
@@ -102,7 +99,7 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end)
nodeid, (base>>8)&3, (limit>>8) & 3);
return -1;
}
- if (node_isset(nodeid, nodes_parsed)) {
+ if (node_isset(nodeid, node_possible_map)) {
printk(KERN_INFO "Node %d already present. Skipping\n",
nodeid);
continue;
@@ -155,7 +152,7 @@ int __init k8_scan_nodes(unsigned long start, unsigned long end)
prevbase = base;
- node_set(nodeid, nodes_parsed);
+ node_set(nodeid, node_possible_map);
}
if (!found)
diff --git a/arch/x86_64/mm/numa.c b/arch/x86_64/mm/numa.c
index 41b8fb0..5f7d4d8 100644
--- a/arch/x86_64/mm/numa.c
+++ b/arch/x86_64/mm/numa.c
@@ -383,6 +383,7 @@ static int __init numa_emulation(unsigned long start_pfn, unsigned long end_pfn)
i,
nodes[i].start, nodes[i].end,
(nodes[i].end - nodes[i].start) >> 20);
+ node_set(i, node_possible_map);
node_set_online(i);
}
memnode_shift = compute_hash_shift(nodes, numa_fake);
@@ -405,6 +406,8 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
{
int i;
+ nodes_clear(node_possible_map);
+
#ifdef CONFIG_NUMA_EMU
if (numa_fake && !numa_emulation(start_pfn, end_pfn))
return;
@@ -432,6 +435,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
memnodemap[0] = 0;
nodes_clear(node_online_map);
node_set_online(0);
+ node_set(0, node_possible_map);
for (i = 0; i < NR_CPUS; i++)
numa_set_node(i, 0);
node_to_cpumask[0] = cpumask_of_cpu(0);
diff --git a/arch/x86_64/mm/srat.c b/arch/x86_64/mm/srat.c
index 2efe215..9f26e2b 100644
--- a/arch/x86_64/mm/srat.c
+++ b/arch/x86_64/mm/srat.c
@@ -25,7 +25,6 @@ int acpi_numa __initdata;
static struct acpi_table_slit *acpi_slit;
-static nodemask_t nodes_parsed __initdata;
static struct bootnode nodes[MAX_NUMNODES] __initdata;
static struct bootnode nodes_add[MAX_NUMNODES];
static int found_add_area __initdata;
@@ -43,7 +42,7 @@ static __init int setup_node(int pxm)
static __init int conflicting_nodes(unsigned long start, unsigned long end)
{
int i;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, node_possible_map) {
struct bootnode *nd = &nodes[i];
if (nd->start == nd->end)
continue;
@@ -321,7 +320,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
}
nd = &nodes[node];
oldnode = *nd;
- if (!node_test_and_set(node, nodes_parsed)) {
+ if (!node_test_and_set(node, node_possible_map)) {
nd->start = start;
nd->end = end;
} else {
@@ -344,7 +343,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
printk(KERN_NOTICE "SRAT: Hotplug region ignored\n");
*nd = oldnode;
if ((nd->start | nd->end) == 0)
- node_clear(node, nodes_parsed);
+ node_clear(node, node_possible_map);
}
}
@@ -356,7 +355,7 @@ static int nodes_cover_memory(void)
unsigned long pxmram, e820ram;
pxmram = 0;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, node_possible_map) {
unsigned long s = nodes[i].start >> PAGE_SHIFT;
unsigned long e = nodes[i].end >> PAGE_SHIFT;
pxmram += e - s;
@@ -380,7 +379,7 @@ static int nodes_cover_memory(void)
static void unparse_node(int node)
{
int i;
- node_clear(node, nodes_parsed);
+ node_clear(node, node_possible_map);
for (i = 0; i < MAX_LOCAL_APIC; i++) {
if (apicid_to_node[i] == node)
apicid_to_node[i] = NUMA_NO_NODE;
@@ -420,18 +419,18 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
}
/* Finally register nodes */
- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, node_possible_map)
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
/* Try again in case setup_node_bootmem missed one due
to missing bootmem */
- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, node_possible_map)
if (!node_online(i))
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
for (i = 0; i < NR_CPUS; i++) {
if (cpu_to_node[i] == NUMA_NO_NODE)
continue;
- if (!node_isset(cpu_to_node[i], nodes_parsed))
+ if (!node_isset(cpu_to_node[i], node_possible_map))
numa_set_node(i, NUMA_NO_NODE);
}
numa_init_array();
* [patch 2/2] slab, x86_64: skip cache_free_alien() on non NUMA
From: Siddha, Suresh B @ 2007-04-09 18:07 UTC
To: akpm; +Cc: christoph, Andi Kleen, Eric Dumazet, linux kernel, suresh.b.siddha
Set use_alien_caches to 0 on non-NUMA platforms, and avoid calling
cache_free_alien() when use_alien_caches is not set. This avoids the
cache miss that happens while dereferencing slabp to get the nodeid.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
diff --git a/mm/slab.c b/mm/slab.c
index 8fdaffa..146082d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1146,7 +1146,7 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 	 * Make sure we are not freeing a object from another node to the array
 	 * cache on this cpu.
 	 */
-	if (likely(slabp->nodeid == node) || unlikely(!use_alien_caches))
+	if (likely(slabp->nodeid == node))
 		return 0;
 
 	l3 = cachep->nodelists[node];
@@ -1394,6 +1394,9 @@ void __init kmem_cache_init(void)
 	int order;
 	int node;
 
+	if (num_possible_nodes() == 1)
+		use_alien_caches = 0;
+
 	for (i = 0; i < NUM_INIT_LISTS; i++) {
 		kmem_list3_init(&initkmem_list3[i]);
 		if (i < MAX_NUMNODES)
@@ -3563,7 +3566,7 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp)
 	check_irq_off();
 	objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
 
-	if (cache_free_alien(cachep, objp))
+	if (use_alien_caches && cache_free_alien(cachep, objp))
 		return;
 
 	if (likely(ac->avail < ac->limit)) {
* Re: [patch 1/2] x86_64: set node_possible_map at runtime.
From: Andrew Morton @ 2007-04-09 20:23 UTC
To: Siddha, Suresh B, David Rientjes
Cc: christoph, Andi Kleen, Eric Dumazet, linux kernel
On Mon, 9 Apr 2007 11:01:45 -0700
"Siddha, Suresh B" <suresh.b.siddha@intel.com> wrote:
> Set node_possible_map at runtime on x86_64. On a non-NUMA system,
> num_possible_nodes() will now say '1'.
>
I had to rework this a bit in the numa-emulation case due to changes in
Andi's tree. The change to numa_emulation() got moved into
setup_node_range().
We have 166 x86 patches pending at present, so the chances of a patch
against mainline applying and working against the development tree are
reduced.