Netdev Archive on lore.kernel.org
From: Vladimir Oltean <olteanv@gmail.com>
To: DENG Qingfang <dqfext@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	Vladimir Oltean <vladimir.oltean@nxp.com>,
	netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Lunn <andrew@lunn.ch>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Kurt Kanzenbach <kurt@linutronix.de>,
	Woojung Huh <woojung.huh@microchip.com>,
	UNGLinuxDriver@microchip.com, Sean Wang <sean.wang@mediatek.com>,
	Landen Chao <Landen.Chao@mediatek.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	George McCollister <george.mccollister@gmail.com>
Subject: Re: [RFC PATCH net-next 2/4] net: dsa: remove the "dsa_to_port in a loop" antipattern from the core
Date: Wed, 11 Aug 2021 20:32:54 +0300	[thread overview]
Message-ID: <20210811173254.shwupnunaaoadpjb@skbuf> (raw)
In-Reply-To: <20210810170447.1517888-1-dqfext@gmail.com>

On Wed, Aug 11, 2021 at 01:04:47AM +0800, DENG Qingfang wrote:
> On Tue, Aug 10, 2021 at 07:35:33PM +0300, Vladimir Oltean wrote:
> > If I were to guess where Qingfang was hinting at, is that the receive
> > path now needs to iterate over a list, whereas before it simply indexed
> > an array:
> > 
> > static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
> > 						       int device, int port)
> > {
> > 	struct dsa_port *cpu_dp = dev->dsa_ptr;
> > 	struct dsa_switch_tree *dst = cpu_dp->dst;
> > 	struct dsa_port *dp;
> > 
> > 	list_for_each_entry(dp, &dst->ports, list)
> > 		if (dp->ds->index == device && dp->index == port &&
> > 		    dp->type == DSA_PORT_TYPE_USER)
> > 			return dp->slave;
> > 
> > 	return NULL;
> > }
> > 
> > I will try in the following days to make a prototype implementation of
> > converting back the linked list into an array and see if there is any
> > justifiable performance improvement.
> > 
> > [ even if this would make the "multiple CPU ports in LAG" implementation
> >   harder ]
> 
> Yes, you got my point.
> 
> There is the RTL8390M series SoC, which has 52+ ports but a weak CPU
> (MIPS 34Kc at 700 MHz). In that case, the linear lookup time and the
> potential cache misses could make a difference.
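
To illustrate the concern, here is a minimal userspace sketch of the two
lookup strategies (toy code with hypothetical names, not the kernel
structures): a linear list walk, like dsa_master_find_slave() does today,
versus a flat array index.

```c
/*
 * Toy model of the two lookup strategies discussed above.
 * All names here are hypothetical; this is not kernel code.
 */
#include <stddef.h>

#define TOY_NUM_PORTS 52	/* e.g. an RTL8390M-class switch */

struct toy_port {
	int port_index;
	struct toy_port *next;
};

static struct toy_port toy_ports[TOY_NUM_PORTS];
static struct toy_port *toy_head;

void toy_init(void)
{
	int i;

	for (i = TOY_NUM_PORTS - 1; i >= 0; i--) {
		toy_ports[i].port_index = i;
		toy_ports[i].next = toy_head;
		toy_head = &toy_ports[i];
	}
}

/* O(n): walks up to 52 nodes, potentially one cache line per node */
struct toy_port *toy_find_by_list(int port)
{
	struct toy_port *p;

	for (p = toy_head; p; p = p->next)
		if (p->port_index == port)
			return p;
	return NULL;
}

/* O(1): a single, predictable memory access */
struct toy_port *toy_find_by_array(int port)
{
	if (port < 0 || port >= TOY_NUM_PORTS)
		return NULL;
	return &toy_ports[port];
}
```

The interesting case for a 52-port switch is a frame from the last port:
the list variant touches 52 nodes, the array variant one slot.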

Then I am not in a position to make relevant performance tests for that
scenario.

I have been testing with the following setup: an NXP LS1028A switch
(ocelot/felix driver) doing IPv4 forwarding of 64-byte UDP datagrams
sent by a data generator. Two ports at 1 Gbps each, at 100% port load;
IP forwarding takes place between one port and the other.
Generator port A sends from 192.168.100.1 to 192.168.200.1.
Generator port B sends from 192.168.200.1 to 192.168.100.1.

Flow control is enabled on all switch ports, the user ports and the CPU port
(I don't really have a setup that I can test in any meaningful way
without flow control).

The script I run on the board to set things up for IP forwarding is:

ip link set eno2 down && echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
ip link set swp0 address a0:00:00:00:00:02
ip link set swp1 address a0:00:00:00:00:04
for eth in swp0 swp1; do
	ip link set ${eth} up
done
ip addr add 192.168.100.2/24 dev swp0
ip addr add 192.168.200.2/24 dev swp1
echo 1 > /proc/sys/net/ipv4/ip_forward
arp -s 192.168.100.1 00:01:02:03:04:05 dev swp0
arp -s 192.168.200.1 00:01:02:03:04:06 dev swp1
ethtool --config-nfc eno2 flow-type ip4 dst-ip 192.168.200.1 action 0
ethtool --config-nfc eno2 flow-type ip4 dst-ip 192.168.100.1 action 1
ethtool -K eno2 gro on rx-gro-list on
ethtool -K swp0 gro on rx-gro-list on
ethtool -K swp1 gro on rx-gro-list on


The DSA patch I used on top of today's net-next was:

-----------------------------[ cut here ]-----------------------------
From 7733f643dd61431a93da5a8f5118848cdc037562 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Wed, 11 Aug 2021 00:27:07 +0300
Subject: [PATCH] net: dsa: setup a linear port cache for faster receive path

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |  5 +++++
 net/dsa/dsa2.c     | 46 ++++++++++++++++++++++++++++++++++++++++++----
 net/dsa/dsa_priv.h |  9 ++++-----
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 3203b200cc38..2a9ea4f57910 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -153,9 +153,14 @@ struct dsa_switch_tree {
 	struct net_device **lags;
 	unsigned int lags_len;
 
+	struct dsa_port **port_cache;
+
 	/* Track the largest switch index within a tree */
 	unsigned int last_switch;
 
+	/* Track the largest port count in a switch within a tree */
+	unsigned int max_num_ports;
+
 	/* Track the bridges with forwarding offload enabled */
 	unsigned long fwd_offloading_bridges;
 };
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 0b7497dd60c3..3d2b92dbd603 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -941,6 +941,39 @@ static void dsa_tree_teardown_lags(struct dsa_switch_tree *dst)
 	kfree(dst->lags);
 }
 
+static int dsa_tree_setup_port_cache(struct dsa_switch_tree *dst)
+{
+	struct dsa_port *dp;
+
+	list_for_each_entry(dp, &dst->ports, list) {
+		if (dst->last_switch < dp->ds->index)
+			dst->last_switch = dp->ds->index;
+		if (dst->max_num_ports < dp->ds->num_ports)
+			dst->max_num_ports = dp->ds->num_ports;
+	}
+
+	dst->port_cache = kcalloc((dst->last_switch + 1) * dst->max_num_ports,
+				  sizeof(struct dsa_port *), GFP_KERNEL);
+	if (!dst->port_cache)
+		return -ENOMEM;
+
+	list_for_each_entry(dp, &dst->ports, list)
+		dst->port_cache[dp->ds->index * dst->max_num_ports + dp->index] = dp;
+
+	return 0;
+}
+
+static void dsa_tree_teardown_port_cache(struct dsa_switch_tree *dst)
+{
+	int i;
+
+	for (i = 0; i < (dst->last_switch + 1) * dst->max_num_ports; i++)
+		dst->port_cache[i] = NULL;
+
+	kfree(dst->port_cache);
+	dst->port_cache = NULL;
+}
+
 static int dsa_tree_setup(struct dsa_switch_tree *dst)
 {
 	bool complete;
@@ -956,10 +989,14 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
 	if (!complete)
 		return 0;
 
-	err = dsa_tree_setup_cpu_ports(dst);
+	err = dsa_tree_setup_port_cache(dst);
 	if (err)
 		return err;
 
+	err = dsa_tree_setup_cpu_ports(dst);
+	if (err)
+		goto teardown_port_cache;
+
 	err = dsa_tree_setup_switches(dst);
 	if (err)
 		goto teardown_cpu_ports;
@@ -984,6 +1021,8 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
 	dsa_tree_teardown_switches(dst);
 teardown_cpu_ports:
 	dsa_tree_teardown_cpu_ports(dst);
+teardown_port_cache:
+	dsa_tree_teardown_port_cache(dst);
 
 	return err;
 }
@@ -1003,6 +1042,8 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst)
 
 	dsa_tree_teardown_cpu_ports(dst);
 
+	dsa_tree_teardown_port_cache(dst);
+
 	list_for_each_entry_safe(dl, next, &dst->rtable, list) {
 		list_del(&dl->list);
 		kfree(dl);
@@ -1301,9 +1342,6 @@ static int dsa_switch_parse_member_of(struct dsa_switch *ds,
 		return -EEXIST;
 	}
 
-	if (ds->dst->last_switch < ds->index)
-		ds->dst->last_switch = ds->index;
-
 	return 0;
 }
 
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 6310a15afe21..5c27f66fd62a 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -188,12 +188,11 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
 	struct dsa_switch_tree *dst = cpu_dp->dst;
 	struct dsa_port *dp;
 
-	list_for_each_entry(dp, &dst->ports, list)
-		if (dp->ds->index == device && dp->index == port &&
-		    dp->type == DSA_PORT_TYPE_USER)
-			return dp->slave;
+	dp = dst->port_cache[device * dst->max_num_ports + port];
+	if (!dp || dp->type != DSA_PORT_TYPE_USER)
+		return NULL;
 
-	return NULL;
+	return dp->slave;
 }
 
 /* netlink.c */
-----------------------------[ cut here ]-----------------------------
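For reference, the flat (switch, port) -> slot mapping that the
port_cache in the patch relies on can be sketched standalone (toy
userspace code with hypothetical names, not the kernel implementation):

```c
/*
 * Sketch of the flat 2D -> 1D slot mapping used by the port cache.
 * Hypothetical names; userspace only.
 */
#include <stdlib.h>

struct toy_tree {
	unsigned int last_switch;	/* largest switch index in the tree */
	unsigned int max_num_ports;	/* largest num_ports of any switch */
	void **port_cache;
};

/* last_switch is a 0-based index, hence the "+ 1" row count */
int toy_cache_alloc(struct toy_tree *dst)
{
	size_t n = (size_t)(dst->last_switch + 1) * dst->max_num_ports;

	dst->port_cache = calloc(n, sizeof(void *));
	return dst->port_cache ? 0 : -1;
}

/* mirrors dp->ds->index * dst->max_num_ports + dp->index */
size_t toy_cache_slot(const struct toy_tree *dst,
		      unsigned int sw, unsigned int port)
{
	return (size_t)sw * dst->max_num_ports + port;
}
```

The array is deliberately sized by the largest switch, so trees with
switches of different port counts leave some slots NULL; the receive
path then only needs one multiply, one add, and one load.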

The results I got were:

Before the patch:

684 Kpps = 459 Mbps

perf record -e cycles -C 0 sleep 10 && perf report
    10.17%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_pci_remove
     6.13%  ksoftirqd/0      [kernel.kallsyms]  [k] eth_type_trans
     5.48%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_poll
     4.99%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_alloc
     4.56%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_gro_receive
     2.89%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_start_xmit
     2.77%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_flow_dissect
     2.75%  ksoftirqd/0      [kernel.kallsyms]  [k] __siphash_aligned
     2.55%  ksoftirqd/0      [kernel.kallsyms]  [k] __netif_receive_skb_core
     2.48%  ksoftirqd/0      [kernel.kallsyms]  [k] build_skb
     2.47%  ksoftirqd/0      [kernel.kallsyms]  [k] take_page_off_buddy
     2.01%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gro_receive
     1.86%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_slave_xmit
     1.76%  ksoftirqd/0      [kernel.kallsyms]  [k] __dev_queue_xmit
     1.68%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_zcopy_clear
     1.62%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_build_skb
     1.60%  ksoftirqd/0      [kernel.kallsyms]  [k] __build_skb_around
     1.50%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_hard_start_xmit
     1.49%  ksoftirqd/0      [kernel.kallsyms]  [k] sch_direct_xmit
     1.42%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_get_hash
     1.29%  ksoftirqd/0      [kernel.kallsyms]  [k] __local_bh_enable_ip
     1.26%  ksoftirqd/0      [kernel.kallsyms]  [k] udp_gro_receive
     1.23%  ksoftirqd/0      [kernel.kallsyms]  [k] udp4_gro_receive
     1.21%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_segment_list
     1.13%  ksoftirqd/0      [kernel.kallsyms]  [k] netdev_drivername
     1.05%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_shutdown
     1.05%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gso_segment
     1.01%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_switch_rcv
     0.98%  ksoftirqd/0      [kernel.kallsyms]  [k] ocelot_rcv
     0.91%  ksoftirqd/0      [kernel.kallsyms]  [k] napi_gro_receive
     0.87%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_xmit
     0.85%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_free_bulk
     0.84%  ksoftirqd/0      [kernel.kallsyms]  [k] do_csum
     0.84%  ksoftirqd/0      [kernel.kallsyms]  [k] memmove
     0.80%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_8021q_rcv
     0.79%  ksoftirqd/0      [kernel.kallsyms]  [k] __netif_receive_skb_list_core
     0.78%  ksoftirqd/0      [kernel.kallsyms]  [k] netif_skb_features
     0.76%  ksoftirqd/0      [kernel.kallsyms]  [k] netif_receive_skb_list_internal
     0.76%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_release_all
     0.74%  ksoftirqd/0      [kernel.kallsyms]  [k] netdev_pick_tx

perf record -e cache-misses -C 0 sleep 10 && perf report
     7.22%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_zcopy_clear
     6.46%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gro_receive
     6.41%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_pci_remove
     6.20%  ksoftirqd/0      [kernel.kallsyms]  [k] take_page_off_buddy
     6.13%  ksoftirqd/0      [kernel.kallsyms]  [k] build_skb
     5.06%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gso_segment
     4.47%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_gro_receive
     4.28%  ksoftirqd/0      [kernel.kallsyms]  [k] memmove
     3.77%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_poll
     3.73%  ksoftirqd/0      [kernel.kallsyms]  [k] __copy_skb_header
     3.46%  ksoftirqd/0      [kernel.kallsyms]  [k] eth_type_trans
     3.06%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_release_all
     2.76%  ksoftirqd/0      [kernel.kallsyms]  [k] ip_send_check
     2.36%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_get_hash
     2.24%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_alloc
     2.23%  ksoftirqd/0      [kernel.kallsyms]  [k] __netif_receive_skb_core
     1.68%  ksoftirqd/0      [kernel.kallsyms]  [k] netdev_pick_tx
     1.56%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_segment_list
     1.55%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_slave_xmit
     1.54%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_headers_offset_update
     1.51%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_start_xmit
     1.48%  ksoftirqd/0      [kernel.kallsyms]  [k] netdev_core_pick_tx
     1.30%  ksoftirqd/0      [kernel.kallsyms]  [k] __dev_queue_xmit
     1.14%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_8021q_rcv
     1.05%  ksoftirqd/0      [kernel.kallsyms]  [k] __build_skb_around
     1.05%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_pull_rcsum
     1.03%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_8021q_xmit
     0.98%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_flow_dissect
     0.89%  ksoftirqd/0      [kernel.kallsyms]  [k] ocelot_xmit
     0.84%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_build_skb
     0.73%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_free_bulk
     0.63%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_refill_rx_ring
     0.55%  ksoftirqd/0      [kernel.kallsyms]  [k] netif_skb_features
     0.46%  ksoftirqd/0      [kernel.kallsyms]  [k] dma_unmap_page_attrs
     0.46%  ksoftirqd/0      [kernel.kallsyms]  [k] fib_table_lookup
     0.36%  ksoftirqd/0      [kernel.kallsyms]  [k] fib_lookup_good_nhc
     0.33%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_release_data
     0.33%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_8021q_tx_vid
     0.32%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_flip_rx_buff
     0.29%  ksoftirqd/0      [kernel.kallsyms]  [k] napi_consume_skb

After the patch:

650 Kpps = 426 Mbps

perf record -e cycles -C 0 sleep 10 && perf report
     9.34%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_pci_remove
     7.70%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_poll
     5.49%  ksoftirqd/0      [kernel.kallsyms]  [k] eth_type_trans
     4.62%  ksoftirqd/0      [kernel.kallsyms]  [k] take_page_off_buddy
     4.55%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_alloc
     4.36%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_gro_receive
     3.22%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_zcopy_clear
     2.59%  ksoftirqd/0      [kernel.kallsyms]  [k] __siphash_aligned
     2.51%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_start_xmit
     2.37%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_flow_dissect
     2.20%  ksoftirqd/0      [kernel.kallsyms]  [k] __netif_receive_skb_core
     1.84%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_free_bulk
     1.69%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gro_receive
     1.65%  ksoftirqd/0      [kernel.kallsyms]  [k] __dev_queue_xmit
     1.63%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_slave_xmit
     1.60%  ksoftirqd/0      [kernel.kallsyms]  [k] build_skb
     1.45%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_build_skb
     1.41%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_hard_start_xmit
     1.39%  ksoftirqd/0      [kernel.kallsyms]  [k] sch_direct_xmit
     1.39%  ksoftirqd/0      [kernel.kallsyms]  [k] __build_skb_around
     1.28%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_get_hash
     1.16%  ksoftirqd/0      [kernel.kallsyms]  [k] __local_bh_enable_ip
     1.14%  ksoftirqd/0      [kernel.kallsyms]  [k] udp4_gro_receive
     1.12%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_release_all
     1.09%  ksoftirqd/0      [kernel.kallsyms]  [k] ocelot_rcv
     1.08%  ksoftirqd/0      [kernel.kallsyms]  [k] netdev_drivername
     1.08%  ksoftirqd/0      [kernel.kallsyms]  [k] dma_unmap_page_attrs
     1.08%  ksoftirqd/0      [kernel.kallsyms]  [k] udp_gro_receive
     1.05%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_shutdown
     1.04%  ksoftirqd/0      [kernel.kallsyms]  [k] napi_consume_skb
     1.03%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_segment_list
     0.90%  ksoftirqd/0      [kernel.kallsyms]  [k] napi_gro_receive
     0.86%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gso_segment
     0.83%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_switch_rcv
     0.77%  ksoftirqd/0      [kernel.kallsyms]  [k] memmove
     0.75%  ksoftirqd/0      [kernel.kallsyms]  [k] do_csum
     0.73%  ksoftirqd/0      [kernel.kallsyms]  [k] netif_skb_features
     0.71%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_8021q_rcv
     0.69%  ksoftirqd/0      [kernel.kallsyms]  [k] __netif_receive_skb_list_core
     0.67%  ksoftirqd/0      [kernel.kallsyms]  [k] netif_receive_skb_list_internal

perf record -e cache-misses -C 0 sleep 10 && perf report
    12.38%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_zcopy_clear
     9.34%  ksoftirqd/0      [kernel.kallsyms]  [k] take_page_off_buddy
     8.62%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_pci_remove
     5.61%  ksoftirqd/0      [kernel.kallsyms]  [k] memmove
     5.44%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gro_receive
     4.61%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_poll
     4.20%  ksoftirqd/0      [kernel.kallsyms]  [k] inet_gso_segment
     3.58%  ksoftirqd/0      [kernel.kallsyms]  [k] dev_gro_receive
     3.19%  ksoftirqd/0      [kernel.kallsyms]  [k] build_skb
     3.11%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_release_all
     2.80%  ksoftirqd/0      [kernel.kallsyms]  [k] __copy_skb_header
     2.79%  ksoftirqd/0      [kernel.kallsyms]  [k] eth_type_trans
     2.31%  ksoftirqd/0      [kernel.kallsyms]  [k] ip_send_check
     1.95%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_get_hash
     1.64%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_alloc
     1.54%  ksoftirqd/0      [kernel.kallsyms]  [k] __netif_receive_skb_core
     1.52%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_slave_xmit
     1.51%  ksoftirqd/0      [kernel.kallsyms]  [k] kmem_cache_free_bulk
     1.49%  ksoftirqd/0      [kernel.kallsyms]  [k] netdev_pick_tx
     1.42%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_headers_offset_update
     1.34%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_segment_list
     1.20%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_build_skb
     1.19%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_start_xmit
     1.09%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_8021q_xmit
     0.94%  ksoftirqd/0      [kernel.kallsyms]  [k] netdev_core_pick_tx
     0.90%  ksoftirqd/0      [kernel.kallsyms]  [k] __dev_queue_xmit
     0.87%  ksoftirqd/0      [kernel.kallsyms]  [k] ocelot_xmit
     0.85%  ksoftirqd/0      [kernel.kallsyms]  [k] __skb_flow_dissect
     0.68%  ksoftirqd/0      [kernel.kallsyms]  [k] dsa_8021q_rcv
     0.63%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_pull_rcsum
     0.63%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_flip_rx_buff
     0.61%  ksoftirqd/0      [kernel.kallsyms]  [k] napi_consume_skb
     0.50%  ksoftirqd/0      [kernel.kallsyms]  [k] skb_release_data
     0.48%  ksoftirqd/0      [kernel.kallsyms]  [k] dma_unmap_page_attrs
     0.41%  ksoftirqd/0      [kernel.kallsyms]  [k] enetc_refill_rx_ring
     0.37%  ksoftirqd/0      [kernel.kallsyms]  [k] __build_skb_around
     0.33%  ksoftirqd/0      [kernel.kallsyms]  [k] fib_table_lookup
     0.33%  ksoftirqd/0      [kernel.kallsyms]  [k] napi_skb_cache_put
     0.28%  ksoftirqd/0      [kernel.kallsyms]  [k] gro_cells_receive
     0.26%  ksoftirqd/0      [kernel.kallsyms]  [k] bpf_skb_load_helper_16

So the performance seems to be slightly worse with the patch, for
reasons that are not immediately apparent from perf. If I look just at
dsa_switch_rcv, there is a decrease in CPU cycles from 1.01% to 0.83%,
but for some reason that is not reflected in the throughput.

(also, please ignore some oddities in the perf reports, such as
"enetc_pci_remove": this comes from enetc_lock_mdio/enetc_unlock_mdio;
I have no idea why it gets printed like that)

So yeah, I cannot actually change my setup such that the list iteration
becomes more expensive than this (swp0 is the first element in the list
and swp1 is the second).

  reply	other threads:[~2021-08-11 17:33 UTC|newest]

Thread overview: 16+ messages
2021-08-09 19:03 [RFC PATCH net-next 0/4] Remove the "dsa_to_port in a loop" antipattern Vladimir Oltean
2021-08-09 19:03 ` [RFC PATCH net-next 1/4] net: dsa: introduce a dsa_port_is_unused helper Vladimir Oltean
2021-08-10  9:34   ` Florian Fainelli
2021-08-09 19:03 ` [RFC PATCH net-next 2/4] net: dsa: remove the "dsa_to_port in a loop" antipattern from the core Vladimir Oltean
2021-08-10  3:33   ` DENG Qingfang
2021-08-10  9:41     ` Florian Fainelli
2021-08-10 11:35       ` Vladimir Oltean
2021-08-10 16:35         ` Vladimir Oltean
2021-08-10 17:04           ` DENG Qingfang
2021-08-11 17:32             ` Vladimir Oltean [this message]
2021-08-10  9:37   ` Florian Fainelli
2021-08-09 19:03 ` [RFC PATCH net-next 3/4] net: dsa: remove the "dsa_to_port in a loop" antipattern from drivers Vladimir Oltean
2021-08-09 19:03 ` [RFC PATCH net-next 4/4] net: dsa: b53: express b53_for_each_port in terms of dsa_switch_for_each_port Vladimir Oltean
2021-08-10  9:39   ` Florian Fainelli
2021-08-10 13:14     ` Vladimir Oltean
2021-08-09 19:31 ` [RFC PATCH net-next 0/4] Remove the "dsa_to_port in a loop" antipattern Vladimir Oltean
