LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
	Zwane Mwaikambo <zwane@arm.linux.org.uk>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Dave Jones <davej@redhat.com>,
	Chuck Wolber <chuckw@quantumlinux.com>,
	Chris Wedgwood <reviews@ml.cw.f00f.org>,
	Michael Krufky <mkrufky@linuxtv.org>,
	Chuck Ebbert <cebbert@redhat.com>,
	Domenico Andreoli <cavokz@gmail.com>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, bunk@kernel.org,
	Eric Dumazet <dada1@cosmosbay.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [patch 08/27] IPV4 ROUTE: ip_rt_dump() is unecessary slow
Date: Fri, 1 Feb 2008 16:20:15 -0800	[thread overview]
Message-ID: <20080202002015.GI8368@suse.de> (raw)
In-Reply-To: <20080202001927.GA8368@suse.de>

[-- Attachment #1: ipv4-route-ip_rt_dump-is-unecessary-slow.patch --]
[-- Type: text/plain, Size: 4403 bytes --]

2.6.22-stable review patch.  If anyone has any objections, please let us
know.

------------------
From: Eric Dumazet <dada1@cosmosbay.com>

[IPV4] ROUTE: ip_rt_dump() is unecessary slow

[ Upstream commit: d8c9283089287341c85a0a69de32c2287a990e71 ]

I noticed "ip route list cache x.y.z.t" can be *very* slow.

While strace-ing -T it I also noticed that first part of route cache
is fetched quite fast :

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>

while the part at the end of the table is more expensive:

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>

The following patch corrects this performance/latency problem,
removing quadratic behavior.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/ipv4/route.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2885,11 +2885,10 @@ int ip_rt_dump(struct sk_buff *skb,  str
 	int idx, s_idx;
 
 	s_h = cb->args[0];
+	if (s_h < 0)
+		s_h = 0;
 	s_idx = idx = cb->args[1];
-	for (h = 0; h <= rt_hash_mask; h++) {
-		if (h < s_h) continue;
-		if (h > s_h)
-			s_idx = 0;
+	for (h = s_h; h <= rt_hash_mask; h++) {
 		rcu_read_lock_bh();
 		for (rt = rcu_dereference(rt_hash_table[h].chain), idx = 0; rt;
 		     rt = rcu_dereference(rt->u.dst.rt_next), idx++) {
@@ -2906,6 +2905,7 @@ int ip_rt_dump(struct sk_buff *skb,  str
 			dst_release(xchg(&skb->dst, NULL));
 		}
 		rcu_read_unlock_bh();
+		s_idx = 0;
 	}
 
 done:

-- 

  parent reply	other threads:[~2008-02-02  0:26 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20080202001232.472591439@mini.kroah.org>
2008-02-02  0:19 ` [patch 00/27] 2.6.22-stable review Greg KH
2008-02-02  0:19   ` [patch 01/27] X25: Add missing x25_neigh_put Greg KH
2008-02-02  0:19   ` [patch 02/27] SPARC64: Fix two kernel linear mapping setup bugs Greg KH
2008-02-02  0:20   ` [patch 03/27] SPARC64: Fix memory controller register access when non-SMP Greg KH
2008-02-02  0:20   ` [patch 04/27] NET: mcs7830 passes msecs instead of jiffies to usb_control_msg Greg KH
2008-02-02  0:20   ` [patch 05/27] NET: kaweth was forgotten in msec switchover of usb_start_wait_urb Greg KH
2008-02-02  0:20   ` [patch 06/27] NET: Correct two mistaken skb_reset_mac_header() conversions Greg KH
2008-02-02  0:20   ` [patch 07/27] IRDA: irda_create() nuke user triggable printk Greg KH
2008-02-02  0:20   ` Greg KH [this message]
2008-02-02  0:20   ` [patch 09/27] IPV4: ip_gre: set mac_header correctly in receive path Greg KH
2008-02-02  0:20   ` [patch 10/27] IPSEC: Fix potential dst leak in xfrm_lookup Greg KH
2008-02-02  0:20   ` [patch 11/27] IPSEC: Avoid undefined shift operation when testing algorithm ID Greg KH
2008-02-02  0:20   ` [patch 12/27] INET: Fix netdev renaming and inet address labels Greg KH
2008-02-02  0:20   ` [patch 13/27] : Fix sparc64 cpu cross call hangs Greg KH
2008-02-02  0:20   ` [patch 14/27] CONNECTOR: Dont touch queue dev after decrement of ref count Greg KH
2008-02-02  0:20   ` [patch 15/27] ATM: delay irq setup until card is configured Greg KH
2008-02-02  0:20   ` [patch 16/27] ATM: Check IP header validity in mpc_send_packet Greg KH
2008-02-02  0:20   ` [patch 17/27] CASSINI: Fix endianness bug Greg KH
2008-02-02  0:20   ` [patch 18/27] CASSINI: Revert dont touch page_count Greg KH
2008-02-02  0:20   ` [patch 19/27] CASSINI: Set skb->truesize properly on receive packets Greg KH
2008-02-02  0:20   ` [patch 20/27] ACPICA: fix acpi-cpufreq boot crash due to _PSD return-by-reference Greg KH
2008-02-02  0:20   ` [patch 21/27] vfs: coredumping fix (CVE-2007-6206) Greg KH
2008-02-02  0:21   ` [patch 22/27] quicklist: Set tlb->need_flush if pages are remaining in quicklist 0 Greg KH
2008-02-02  0:39     ` Christoph Lameter
2008-02-02  0:50       ` Greg KH
2008-02-02  1:28       ` Justin M. Forbes
2008-02-02  1:30         ` Christoph Lameter
2008-02-02  2:20           ` Justin M. Forbes
2008-02-06 22:45           ` Greg KH
2008-02-06 23:11             ` Oliver Pinter
2008-02-06 23:28               ` Christoph Lameter
2008-02-09 15:14               ` Fwd: " Oliver Pinter
2008-02-02  0:21   ` [patch 23/27] cxgb: fix T2 GSO Greg KH
2008-02-02  0:21   ` [patch 24/27] cxgb: fix stats Greg KH
2008-02-02  0:21   ` [patch 25/27] chelsio: Fix skb->dev setting Greg KH
2008-02-02  0:21   ` [patch 26/27] POWERPC: Fix invalid semicolon after if statement Greg KH
2008-02-02  0:21   ` [patch 27/27] ACPI: apply quirk_ich6_lpc_acpi to more ICH8 and ICH9 Greg KH
2008-02-02 18:55   ` [patch 00/27] 2.6.22-stable review Arkadiusz Miskiewicz
2008-02-04 17:30   ` Oliver Pinter (Pintér Olivér)
2008-02-04 17:59     ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080202002015.GI8368@suse.de \
    --to=gregkh@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=bunk@kernel.org \
    --cc=cavokz@gmail.com \
    --cc=cebbert@redhat.com \
    --cc=chuckw@quantumlinux.com \
    --cc=dada1@cosmosbay.com \
    --cc=davej@redhat.com \
    --cc=davem@davemloft.net \
    --cc=jmforbes@linuxtx.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mkrufky@linuxtv.org \
    --cc=rdunlap@xenotime.net \
    --cc=reviews@ml.cw.f00f.org \
    --cc=stable@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=zwane@arm.linux.org.uk \
    --subject='Re: [patch 08/27] IPV4 ROUTE: ip_rt_dump() is unecessary slow' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).