LKML Archive on lore.kernel.org
* [PATCH] improved locking performance in rt_run_flush()
From: Dave Johnson @ 2007-05-12 16:36 UTC
  To: linux-kernel, netdev


While testing adding/deleting large numbers of interfaces, I found
rt_run_flush() was the #1 cpu user in a kernel profile by far.

The below patch changes rt_run_flush() to only take each spinlock
protecting the rt_hash_table once instead of taking a spinlock for
every hash table bucket (and ending up taking the same small set 
of locks over and over).
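
For scale: bucket i is guarded by lock i % RT_HASH_LOCK_SZ (the striding
the new rt_hash_lock_for_each_item() below relies on), which is why a
walk over all buckets keeps hitting the same small set of locks.
Assuming, for illustration only, RT_HASH_LOCK_SZ = 256 together with the
16K-bucket table mentioned below:

	buckets               = rt_hash_mask + 1  = 16384
	locks                 = RT_HASH_LOCK_SZ   =   256  (assumed value)
	old lock/unlock pairs = one per bucket    = 16384  (each lock ~64 times)
	new lock/unlock pairs = one per lock      =   256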

Deleting 256 interfaces on a 4-way SMP system with 16K buckets reduced
overall cpu-time more than 50% and reduced wall-time about 33%.  I
suspect systems with large amounts of memory (and more buckets) will
see an even greater benefit.

Note there is a small change in that rt_free() is called while the
lock is held where before it was called without the lock held.  I
don't think this should be an issue.

Signed-off-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>

===== net/ipv4/route.c 1.162 vs edited =====
--- 1.162/net/ipv4/route.c	2007-02-14 11:09:54 -05:00
+++ edited/net/ipv4/route.c	2007-05-04 17:53:33 -04:00
@@ -237,9 +237,17 @@
 		for (i = 0; i < RT_HASH_LOCK_SZ; i++) \
 			spin_lock_init(&rt_hash_locks[i]); \
 		}
+# define rt_hash_lock_for_each(lockaddr) \
+	for (lockaddr = rt_hash_lock_addr(RT_HASH_LOCK_SZ-1); lockaddr >= rt_hash_locks; lockaddr--)
+# define rt_hash_lock_for_each_item(locknum, slot) \
+	for (slot = locknum-rt_hash_locks; slot <= rt_hash_mask; slot += RT_HASH_LOCK_SZ)
 #else
 # define rt_hash_lock_addr(slot) NULL
 # define rt_hash_lock_init()
+# define rt_hash_lock_for_each(lockaddr) \
+	lockaddr = NULL;
+# define rt_hash_lock_for_each_item(locknum, slot) \
+	for (slot = rt_hash_mask; slot >= 0; slot--)
 #endif
 
 static struct rt_hash_bucket 	*rt_hash_table;
@@ -691,23 +699,26 @@
 static void rt_run_flush(unsigned long dummy)
 {
 	int i;
+	spinlock_t *lockaddr;
 	struct rtable *rth, *next;
 
 	rt_deadline = 0;
 
 	get_random_bytes(&rt_hash_rnd, 4);
 
-	for (i = rt_hash_mask; i >= 0; i--) {
-		spin_lock_bh(rt_hash_lock_addr(i));
-		rth = rt_hash_table[i].chain;
-		if (rth)
-			rt_hash_table[i].chain = NULL;
-		spin_unlock_bh(rt_hash_lock_addr(i));
-
-		for (; rth; rth = next) {
-			next = rth->u.dst.rt_next;
-			rt_free(rth);
+	rt_hash_lock_for_each(lockaddr) {
+		spin_lock_bh(lockaddr);
+		rt_hash_lock_for_each_item(lockaddr, i) {
+			rth = rt_hash_table[i].chain;
+			if (rth) {
+				rt_hash_table[i].chain = NULL;
+				for (; rth; rth = next) {
+					next = rth->u.dst.rt_next;
+					rt_free(rth);
+				}
+			}
 		}
+		spin_unlock_bh(lockaddr);
 	}
 }
 



* Re: [PATCH] improved locking performance in rt_run_flush()
From: David Miller @ 2007-05-14 10:04 UTC
  To: djohnson+linux-kernel; +Cc: linux-kernel, netdev

From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
Date: Sat, 12 May 2007 12:36:47 -0400

> 
> While testing adding/deleting large numbers of interfaces, I found
> rt_run_flush() was the #1 cpu user in a kernel profile by far.
> 
> The below patch changes rt_run_flush() to only take each spinlock
> protecting the rt_hash_table once instead of taking a spinlock for
> every hash table bucket (and ending up taking the same small set 
> of locks over and over).
> 
> Deleting 256 interfaces on a 4-way SMP system with 16K buckets reduced
> overall cpu-time more than 50% and reduced wall-time about 33%.  I
> suspect systems with large amounts of memory (and more buckets) will
> see an even greater benefit.
> 
> Note there is a small change in that rt_free() is called while the
> lock is held where before it was called without the lock held.  I
> don't think this should be an issue.
> 
> Signed-off-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>

Thanks for this patch.

I'm not ignoring it; I'm just trying to brainstorm whether there
is a better way to resolve this inefficiency. :-)


* Re: [PATCH] improved locking performance in rt_run_flush()
From: Herbert Xu @ 2007-05-20  5:11 UTC
  To: David Miller; +Cc: djohnson+linux-kernel, linux-kernel, netdev

David Miller <davem@davemloft.net> wrote:
> From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
>> 
>> The below patch changes rt_run_flush() to only take each spinlock
>> protecting the rt_hash_table once instead of taking a spinlock for
>> every hash table bucket (and ending up taking the same small set 
>> of locks over and over).

...

> I'm not ignoring it; I'm just trying to brainstorm whether there
> is a better way to resolve this inefficiency. :-)

The main problem I see with this is having to walk and free each
chain with the lock held.  We could avoid this if we had a pointer
in struct rtable to chain them up for freeing later.

I just checked and struct rtable is 236 bytes long on 32-bit but
the slab cache pads it to 256 bytes so we've got some free space.
I suspect 64-bit should be similar.
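
For concreteness, a rough sketch of that idea on top of the patch earlier
in the thread could look like the code below; the flush_next member is
the hypothetical new pointer, and everything else is taken from the
patched rt_run_flush():

	/* Sketch only: assumes a new 'flush_next' pointer in struct rtable
	 * and reuses the rt_hash_lock_for_each*() helpers from the patch.
	 */
	static void rt_run_flush(unsigned long dummy)
	{
		int i;
		spinlock_t *lockaddr;
		struct rtable *rth, *next, *flush_list = NULL;

		rt_deadline = 0;
		get_random_bytes(&rt_hash_rnd, 4);

		rt_hash_lock_for_each(lockaddr) {
			spin_lock_bh(lockaddr);
			rt_hash_lock_for_each_item(lockaddr, i) {
				rth = rt_hash_table[i].chain;
				if (rth) {
					rt_hash_table[i].chain = NULL;
					/* Park the chain head only; walk it
					 * after the lock is dropped. */
					rth->flush_next = flush_list;
					flush_list = rth;
				}
			}
			spin_unlock_bh(lockaddr);
		}

		/* Free everything with no hash lock held. */
		while (flush_list) {
			rth = flush_list;
			flush_list = rth->flush_next;
			for (; rth; rth = next) {
				next = rth->u.dst.rt_next;
				rt_free(rth);
			}
		}
	}

That keeps the one-pass-per-lock behaviour of the patch while moving all
of the rt_free() work outside the lock, at the cost of one pointer per
struct rtable.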

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] improved locking performance in rt_run_flush()
From: David Miller @ 2007-05-31 23:26 UTC
  To: herbert; +Cc: djohnson+linux-kernel, linux-kernel, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 20 May 2007 15:11:48 +1000

> David Miller <davem@davemloft.net> wrote:
> > From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
> >> 
> >> The below patch changes rt_run_flush() to only take each spinlock
> >> protecting the rt_hash_table once instead of taking a spinlock for
> >> every hash table bucket (and ending up taking the same small set 
> >> of locks over and over).
> 
> ...
> 
> > I'm not ignoring it; I'm just trying to brainstorm whether there
> > is a better way to resolve this inefficiency. :-)
> 
> The main problem I see with this is having to walk and free each
> chain with the lock held.  We could avoid this if we had a pointer
> in struct rtable to chain them up for freeing later.
> 
> I just checked and struct rtable is 236 bytes long on 32-bit but
> the slab cache pads it to 256 bytes so we've got some free space.
> I suspect 64-bit should be similar.

SLUB, I believe, packs more aggressively and won't pad things out like
that, so adding a member to rtable is much less attractive.

I've been considering various alternative ways to deal with this.

For 2.6.22 and -stable's sake, we could allocate an array of pointers
of size N, where N is the number of rtable hash slots per spinlock.
A big lock wraps around rt_run_flush() to protect these slots, and
then the loop is:

	grab_lock();
	for_each_hash_chain_for_lock(i) {
		rth = rt_hash_table[i].chain;
		if (rth) {
			rt_hash_table[i].chain = NULL;
			flush_chain[i / RT_HASH_LOCK_SZ] = rth;
		}
	}
	drop_lock();

	for (i = 0; i < N; i++) {
		struct rtable *rth = flush_chain[i];
		flush_chain[i] = NULL;
		while (rth) {
			struct rtable *next = rth->u.dst.rt_next;
			rt_free(rth);
			rth = next;
		}
	}

Holding a lock across the entire hash plucking has its own not-so-nice
properties, but it's better than taking the same lock N times in a
row.
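
For concreteness, that outline might be filled in roughly as follows
(a sketch only, untested; the helper name is made up, flush_chain[] is
assumed to hold N = (rt_hash_mask + 1) / RT_HASH_LOCK_SZ entries, and
access to it is assumed to be serialized by the big lock):

	static void rt_flush_one_lock(int lock, struct rtable **flush_chain)
	{
		int i, n = 0;

		spin_lock_bh(rt_hash_lock_addr(lock));
		/* Detach every chain guarded by this lock. */
		for (i = lock; i <= rt_hash_mask; i += RT_HASH_LOCK_SZ) {
			flush_chain[n++] = rt_hash_table[i].chain;
			rt_hash_table[i].chain = NULL;
		}
		spin_unlock_bh(rt_hash_lock_addr(lock));

		/* Free the detached chains with no hash lock held. */
		while (n--) {
			struct rtable *rth = flush_chain[n];

			while (rth) {
				struct rtable *next = rth->u.dst.rt_next;

				rt_free(rth);
				rth = next;
			}
		}
	}

The caller would then walk lock from 0 to RT_HASH_LOCK_SZ - 1, so each
hash spinlock is still taken only once, but rt_free() never runs under
it.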

In the longer term, if I resurrect my dynamically sized rtable hash
patches (which I do intend to do), that code protected a lot of this
stuff with a seqlock and it might be possible to use that seqlock
solely to flush the lists in rt_run_flush().

Any better ideas?

