LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs
@ 2021-09-16 15:44 Kai-Heng Feng
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 1/3] PCI/ASPM: Introduce a new helper to report ASPM capability Kai-Heng Feng
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Kai-Heng Feng @ 2021-09-16 15:44 UTC (permalink / raw)
  To: hkallweit1, nic_swsd, bhelgaas
  Cc: davem, kuba, anthony.wong, netdev, linux-pci, linux-kernel,
	Kai-Heng Feng

The purpose of the series is to get comments and reviews so we can merge
and test the series in downstream kernel.

The latest Realtek vendor driver and its Windows driver implements a
feature called "dynamic ASPM" which can improve performance on it's
ethernet NICs.

Heiner Kallweit pointed out the potential root cause can be that the
buffer is to small for its ASPM exit latency.

So bring the dynamic ASPM to r8169 so we can have both nice performance
and powersaving at the same time.

For the slow/fast alternating traffic pattern, we'll need some real
world test to know if we need to lower the dynamic ASPM interval.

v4:
https://lore.kernel.org/netdev/20210827171452.217123-1-kai.heng.feng@canonical.com/

v3:
https://lore.kernel.org/netdev/20210819054542.608745-1-kai.heng.feng@canonical.com/

v2:
https://lore.kernel.org/netdev/20210812155341.817031-1-kai.heng.feng@canonical.com/

v1:
https://lore.kernel.org/netdev/20210803152823.515849-1-kai.heng.feng@canonical.com/

Kai-Heng Feng (3):
  PCI/ASPM: Introduce a new helper to report ASPM capability
  r8169: Use PCIe ASPM status for NIC ASPM enablement
  r8169: Implement dynamic ASPM mechanism

 drivers/net/ethernet/realtek/r8169_main.c | 69 ++++++++++++++++++++---
 drivers/pci/pcie/aspm.c                   | 11 ++++
 include/linux/pci.h                       |  2 +
 3 files changed, 73 insertions(+), 9 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC] [PATCH net-next v5 1/3] PCI/ASPM: Introduce a new helper to report ASPM capability
  2021-09-16 15:44 [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Kai-Heng Feng
@ 2021-09-16 15:44 ` Kai-Heng Feng
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement Kai-Heng Feng
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Kai-Heng Feng @ 2021-09-16 15:44 UTC (permalink / raw)
  To: hkallweit1, nic_swsd, bhelgaas
  Cc: davem, kuba, anthony.wong, netdev, linux-pci, linux-kernel,
	Kai-Heng Feng, Saheed O. Bolarinwa, Vidya Sagar,
	Krzysztof Wilczyński

Introduce a new helper, pcie_aspm_capable(), to report ASPM capability.

The user will be introduced by next patch.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v5:
 - No change.

v4:
 - Report aspm_capable instead.

v3:
 - This is a new patch

 drivers/pci/pcie/aspm.c | 11 +++++++++++
 include/linux/pci.h     |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 013a47f587cea..788e7496f33b1 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1201,6 +1201,17 @@ bool pcie_aspm_enabled(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pcie_aspm_enabled);
 
+bool pcie_aspm_capable(struct pci_dev *pdev)
+{
+	struct pcie_link_state *link = pcie_aspm_get_link(pdev);
+
+	if (!link)
+		return false;
+
+	return link->aspm_capable;
+}
+EXPORT_SYMBOL_GPL(pcie_aspm_capable);
+
 static ssize_t aspm_attr_show_common(struct device *dev,
 				     struct device_attribute *attr,
 				     char *buf, u8 state)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 34d7d94ddf6dc..d77919d125af1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1631,6 +1631,7 @@ int pci_disable_link_state_locked(struct pci_dev *pdev, int state);
 void pcie_no_aspm(void);
 bool pcie_aspm_support_enabled(void);
 bool pcie_aspm_enabled(struct pci_dev *pdev);
+bool pcie_aspm_capable(struct pci_dev *pdev);
 #else
 static inline int pci_disable_link_state(struct pci_dev *pdev, int state)
 { return 0; }
@@ -1639,6 +1640,7 @@ static inline int pci_disable_link_state_locked(struct pci_dev *pdev, int state)
 static inline void pcie_no_aspm(void) { }
 static inline bool pcie_aspm_support_enabled(void) { return false; }
 static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
+static inline bool pcie_aspm_capable(struct pci_dev *pdev) { return false; }
 #endif
 
 #ifdef CONFIG_PCIEAER
-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement
  2021-09-16 15:44 [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Kai-Heng Feng
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 1/3] PCI/ASPM: Introduce a new helper to report ASPM capability Kai-Heng Feng
@ 2021-09-16 15:44 ` Kai-Heng Feng
  2021-09-16 17:07   ` Bjorn Helgaas
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 3/3] r8169: Implement dynamic ASPM mechanism Kai-Heng Feng
  2021-09-17 22:09 ` [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Bjorn Helgaas
  3 siblings, 1 reply; 14+ messages in thread
From: Kai-Heng Feng @ 2021-09-16 15:44 UTC (permalink / raw)
  To: hkallweit1, nic_swsd, bhelgaas
  Cc: davem, kuba, anthony.wong, netdev, linux-pci, linux-kernel,
	Kai-Heng Feng

Because ASPM control may not be granted by BIOS while ASPM is enabled,
and ASPM can be enabled via sysfs, so use pcie_aspm_enabled() directly
to check current ASPM enable status.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v5:
 - New patch.

 drivers/net/ethernet/realtek/r8169_main.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0199914440abc..6f1a9bec40c05 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -622,7 +622,6 @@ struct rtl8169_private {
 	} wk;
 
 	unsigned supports_gmii:1;
-	unsigned aspm_manageable:1;
 	dma_addr_t counters_phys_addr;
 	struct rtl8169_counters *counters;
 	struct rtl8169_tc_offsets tc_offset;
@@ -2664,8 +2663,13 @@ static void rtl_enable_exit_l1(struct rtl8169_private *tp)
 
 static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
 {
-	/* Don't enable ASPM in the chip if OS can't control ASPM */
-	if (enable && tp->aspm_manageable) {
+	struct pci_dev *pdev = tp->pci_dev;
+
+	/* Don't enable ASPM in the chip if PCIe ASPM isn't enabled */
+	if (!pcie_aspm_enabled(pdev) && enable)
+		return;
+
+	if (enable) {
 		RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
 	} else {
@@ -5272,8 +5276,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Disable ASPM L1 as that cause random device stop working
 	 * problems as well as full system hangs for some PCIe devices users.
 	 */
-	rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
-	tp->aspm_manageable = !rc;
+	pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
 
 	/* enable device (incl. PCI PM wakeup and hotplug setup) */
 	rc = pcim_enable_device(pdev);
-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC] [PATCH net-next v5 3/3] r8169: Implement dynamic ASPM mechanism
  2021-09-16 15:44 [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Kai-Heng Feng
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 1/3] PCI/ASPM: Introduce a new helper to report ASPM capability Kai-Heng Feng
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement Kai-Heng Feng
@ 2021-09-16 15:44 ` Kai-Heng Feng
  2021-09-16 17:12   ` Bjorn Helgaas
  2021-09-17 22:09 ` [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Bjorn Helgaas
  3 siblings, 1 reply; 14+ messages in thread
From: Kai-Heng Feng @ 2021-09-16 15:44 UTC (permalink / raw)
  To: hkallweit1, nic_swsd, bhelgaas
  Cc: davem, kuba, anthony.wong, netdev, linux-pci, linux-kernel,
	Kai-Heng Feng

r8169 NICs on some platforms have abysmal speed when ASPM is enabled.
Same issue can be observed with older vendor drivers.

The issue is however solved by the latest vendor driver. There's a new
mechanism, which disables r8169's internal ASPM when the NIC traffic has
more than 10 packets, and vice versa. The possible reason for this is
likely because the buffer on the chip is too small for its ASPM exit
latency.

Realtek confirmed that all their PCIe LAN NICs, r8106, r8168 and r8125
use dynamic ASPM under Windows. So implement the same mechanism here to
resolve the issue.

Also introduce a lock to prevent race on accessing config registers.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v5:
 - Split out aspm_manageable replacement as another patch.
 - Introduce a lock for lock_config_regs() and unlock_config_regs().

v4:
 - Squash two patches
 - Remove aspm_manageable and use pcie_aspm_capable()
   pcie_aspm_enabled() accordingly

v3:
 - Use msecs_to_jiffies() for delay time
 - Use atomic_t instead of mutex for bh
 - Mention the buffer size and ASPM exit latency in commit message

v2: 
 - Use delayed_work instead of timer_list to avoid interrupt context
 - Use mutex to serialize packet counter read/write
 - Wording change

 drivers/net/ethernet/realtek/r8169_main.c | 58 +++++++++++++++++++++--
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 6f1a9bec40c05..0994967aa971f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -622,6 +622,11 @@ struct rtl8169_private {
 	} wk;
 
 	unsigned supports_gmii:1;
+	unsigned rtl_aspm_enabled:1;
+	struct delayed_work aspm_toggle;
+	atomic_t aspm_packet_count;
+	struct mutex config_lock;
+
 	dma_addr_t counters_phys_addr;
 	struct rtl8169_counters *counters;
 	struct rtl8169_tc_offsets tc_offset;
@@ -670,12 +675,14 @@ static inline struct device *tp_to_dev(struct rtl8169_private *tp)
 
 static void rtl_lock_config_regs(struct rtl8169_private *tp)
 {
+	mutex_lock(&tp->config_lock);
 	RTL_W8(tp, Cfg9346, Cfg9346_Lock);
 }
 
 static void rtl_unlock_config_regs(struct rtl8169_private *tp)
 {
 	RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
+	mutex_unlock(&tp->config_lock);
 }
 
 static void rtl_pci_commit(struct rtl8169_private *tp)
@@ -2669,6 +2676,8 @@ static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
 	if (!pcie_aspm_enabled(pdev) && enable)
 		return;
 
+	tp->rtl_aspm_enabled = enable;
+
 	if (enable) {
 		RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
@@ -4407,6 +4416,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
 
 	dirty_tx = tp->dirty_tx;
 
+	atomic_add(tp->cur_tx - dirty_tx, &tp->aspm_packet_count);
 	while (READ_ONCE(tp->cur_tx) != dirty_tx) {
 		unsigned int entry = dirty_tx % NUM_TX_DESC;
 		u32 status;
@@ -4551,6 +4561,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 		rtl8169_mark_to_asic(desc);
 	}
 
+	atomic_add(count, &tp->aspm_packet_count);
+
 	return count;
 }
 
@@ -4658,8 +4670,39 @@ static int r8169_phy_connect(struct rtl8169_private *tp)
 	return 0;
 }
 
+#define ASPM_PACKET_THRESHOLD 10
+#define ASPM_TOGGLE_INTERVAL 1000
+
+static void rtl8169_aspm_toggle(struct work_struct *work)
+{
+	struct rtl8169_private *tp = container_of(work, struct rtl8169_private,
+						  aspm_toggle.work);
+	int packet_count;
+	bool enable;
+
+	packet_count = atomic_xchg(&tp->aspm_packet_count, 0);
+
+	if (pcie_aspm_enabled(tp->pci_dev)) {
+		enable = packet_count <= ASPM_PACKET_THRESHOLD;
+
+		if (tp->rtl_aspm_enabled != enable) {
+			rtl_unlock_config_regs(tp);
+			rtl_hw_aspm_clkreq_enable(tp, enable);
+			rtl_lock_config_regs(tp);
+		}
+	} else if (tp->rtl_aspm_enabled) {
+		rtl_unlock_config_regs(tp);
+		rtl_hw_aspm_clkreq_enable(tp, false);
+		rtl_lock_config_regs(tp);
+	}
+
+	schedule_delayed_work(&tp->aspm_toggle, msecs_to_jiffies(ASPM_TOGGLE_INTERVAL));
+}
+
 static void rtl8169_down(struct rtl8169_private *tp)
 {
+	cancel_delayed_work_sync(&tp->aspm_toggle);
+
 	/* Clear all task flags */
 	bitmap_zero(tp->wk.flags, RTL_FLAG_MAX);
 
@@ -4686,6 +4729,10 @@ static void rtl8169_up(struct rtl8169_private *tp)
 	rtl_reset_work(tp);
 
 	phy_start(tp->phydev);
+
+	/* pcie_aspm_capable may change after system resume */
+	if (pcie_aspm_support_enabled() && pcie_aspm_capable(tp->pci_dev))
+		schedule_delayed_work(&tp->aspm_toggle, 0);
 }
 
 static int rtl8169_close(struct net_device *dev)
@@ -5273,11 +5320,6 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		return rc;
 
-	/* Disable ASPM L1 as that cause random device stop working
-	 * problems as well as full system hangs for some PCIe devices users.
-	 */
-	pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
-
 	/* enable device (incl. PCI PM wakeup and hotplug setup) */
 	rc = pcim_enable_device(pdev);
 	if (rc < 0) {
@@ -5307,6 +5349,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		return rc;
 	}
 
+	mutex_init(&tp->config_lock);
+
 	tp->mmio_addr = pcim_iomap_table(pdev)[region];
 
 	xid = (RTL_R32(tp, TxConfig) >> 20) & 0xfcf;
@@ -5344,6 +5388,10 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	INIT_WORK(&tp->wk.work, rtl_task);
 
+	INIT_DELAYED_WORK(&tp->aspm_toggle, rtl8169_aspm_toggle);
+
+	atomic_set(&tp->aspm_packet_count, 0);
+
 	rtl_init_mac_address(tp);
 
 	dev->ethtool_ops = &rtl8169_ethtool_ops;
-- 
2.32.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement Kai-Heng Feng
@ 2021-09-16 17:07   ` Bjorn Helgaas
  2021-09-17  4:09     ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2021-09-16 17:07 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: hkallweit1, nic_swsd, bhelgaas, davem, kuba, anthony.wong,
	netdev, linux-pci, linux-kernel

On Thu, Sep 16, 2021 at 11:44:16PM +0800, Kai-Heng Feng wrote:
> Because ASPM control may not be granted by BIOS while ASPM is enabled,
> and ASPM can be enabled via sysfs, so use pcie_aspm_enabled() directly
> to check current ASPM enable status.
> 
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v5:
>  - New patch.
> 
>  drivers/net/ethernet/realtek/r8169_main.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 0199914440abc..6f1a9bec40c05 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -622,7 +622,6 @@ struct rtl8169_private {
>  	} wk;
>  
>  	unsigned supports_gmii:1;
> -	unsigned aspm_manageable:1;
>  	dma_addr_t counters_phys_addr;
>  	struct rtl8169_counters *counters;
>  	struct rtl8169_tc_offsets tc_offset;
> @@ -2664,8 +2663,13 @@ static void rtl_enable_exit_l1(struct rtl8169_private *tp)
>  
>  static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
>  {
> -	/* Don't enable ASPM in the chip if OS can't control ASPM */
> -	if (enable && tp->aspm_manageable) {
> +	struct pci_dev *pdev = tp->pci_dev;
> +
> +	/* Don't enable ASPM in the chip if PCIe ASPM isn't enabled */
> +	if (!pcie_aspm_enabled(pdev) && enable)
> +		return;

What happens when the user enables or disables ASPM via sysfs (see
https://git.kernel.org/linus/72ea91afbfb0)?

The driver is not going to know about that change.

> +	if (enable) {
>  		RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
>  		RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
>  	} else {
> @@ -5272,8 +5276,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	/* Disable ASPM L1 as that cause random device stop working
>  	 * problems as well as full system hangs for some PCIe devices users.
>  	 */
> -	rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
> -	tp->aspm_manageable = !rc;
> +	pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
>  
>  	/* enable device (incl. PCI PM wakeup and hotplug setup) */
>  	rc = pcim_enable_device(pdev);
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 3/3] r8169: Implement dynamic ASPM mechanism
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 3/3] r8169: Implement dynamic ASPM mechanism Kai-Heng Feng
@ 2021-09-16 17:12   ` Bjorn Helgaas
  2021-09-17  4:19     ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2021-09-16 17:12 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: hkallweit1, nic_swsd, bhelgaas, davem, kuba, anthony.wong,
	netdev, linux-pci, linux-kernel

On Thu, Sep 16, 2021 at 11:44:17PM +0800, Kai-Heng Feng wrote:
> r8169 NICs on some platforms have abysmal speed when ASPM is enabled.
> Same issue can be observed with older vendor drivers.
> 
> The issue is however solved by the latest vendor driver. There's a new
> mechanism, which disables r8169's internal ASPM when the NIC traffic has
> more than 10 packets, and vice versa. 

Obviously this is a *rate*, not an absolute number.  I think you mean
something like "10 packets in 1000ms".

> The possible reason for this is
> likely because the buffer on the chip is too small for its ASPM exit
> latency.
> 
> Realtek confirmed that all their PCIe LAN NICs, r8106, r8168 and r8125
> use dynamic ASPM under Windows. So implement the same mechanism here to
> resolve the issue.
> 
> Also introduce a lock to prevent race on accessing config registers.

Can you please include the bugzilla link where you attached lspci
data?  I think it's this:

  https://bugzilla.kernel.org/show_bug.cgi?id=214307

> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement
  2021-09-16 17:07   ` Bjorn Helgaas
@ 2021-09-17  4:09     ` Kai-Heng Feng
  2021-09-17 15:26       ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Kai-Heng Feng @ 2021-09-17  4:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Heiner Kallweit, nic_swsd, Bjorn Helgaas, David Miller,
	Jakub Kicinski, Anthony Wong, Linux Netdev List, Linux PCI, LKML

On Fri, Sep 17, 2021 at 1:07 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Sep 16, 2021 at 11:44:16PM +0800, Kai-Heng Feng wrote:
> > Because ASPM control may not be granted by BIOS while ASPM is enabled,
> > and ASPM can be enabled via sysfs, so use pcie_aspm_enabled() directly
> > to check current ASPM enable status.
> >
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> > v5:
> >  - New patch.
> >
> >  drivers/net/ethernet/realtek/r8169_main.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > index 0199914440abc..6f1a9bec40c05 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -622,7 +622,6 @@ struct rtl8169_private {
> >       } wk;
> >
> >       unsigned supports_gmii:1;
> > -     unsigned aspm_manageable:1;
> >       dma_addr_t counters_phys_addr;
> >       struct rtl8169_counters *counters;
> >       struct rtl8169_tc_offsets tc_offset;
> > @@ -2664,8 +2663,13 @@ static void rtl_enable_exit_l1(struct rtl8169_private *tp)
> >
> >  static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
> >  {
> > -     /* Don't enable ASPM in the chip if OS can't control ASPM */
> > -     if (enable && tp->aspm_manageable) {
> > +     struct pci_dev *pdev = tp->pci_dev;
> > +
> > +     /* Don't enable ASPM in the chip if PCIe ASPM isn't enabled */
> > +     if (!pcie_aspm_enabled(pdev) && enable)
> > +             return;
>
> What happens when the user enables or disables ASPM via sysfs (see
> https://git.kernel.org/linus/72ea91afbfb0)?
>
> The driver is not going to know about that change.

So it's still better to fold this patch into next one? So the periodic
delayed_work can toggle ASPM accordingly.

Kai-Heng

>
> > +     if (enable) {
> >               RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
> >               RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
> >       } else {
> > @@ -5272,8 +5276,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> >       /* Disable ASPM L1 as that cause random device stop working
> >        * problems as well as full system hangs for some PCIe devices users.
> >        */
> > -     rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
> > -     tp->aspm_manageable = !rc;
> > +     pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
> >
> >       /* enable device (incl. PCI PM wakeup and hotplug setup) */
> >       rc = pcim_enable_device(pdev);
> > --
> > 2.32.0
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 3/3] r8169: Implement dynamic ASPM mechanism
  2021-09-16 17:12   ` Bjorn Helgaas
@ 2021-09-17  4:19     ` Kai-Heng Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Kai-Heng Feng @ 2021-09-17  4:19 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Heiner Kallweit, nic_swsd, Bjorn Helgaas, David Miller,
	Jakub Kicinski, Anthony Wong, Linux Netdev List, Linux PCI, LKML

On Fri, Sep 17, 2021 at 1:12 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Sep 16, 2021 at 11:44:17PM +0800, Kai-Heng Feng wrote:
> > r8169 NICs on some platforms have abysmal speed when ASPM is enabled.
> > Same issue can be observed with older vendor drivers.
> >
> > The issue is however solved by the latest vendor driver. There's a new
> > mechanism, which disables r8169's internal ASPM when the NIC traffic has
> > more than 10 packets, and vice versa.
>
> Obviously this is a *rate*, not an absolute number.  I think you mean
> something like "10 packets in 1000ms".

Will amend this in next iteration.

>
> > The possible reason for this is
> > likely because the buffer on the chip is too small for its ASPM exit
> > latency.
> >
> > Realtek confirmed that all their PCIe LAN NICs, r8106, r8168 and r8125
> > use dynamic ASPM under Windows. So implement the same mechanism here to
> > resolve the issue.
> >
> > Also introduce a lock to prevent race on accessing config registers.
>
> Can you please include the bugzilla link where you attached lspci
> data?  I think it's this:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=214307

Yes I forgot to add it. Will include in in next iteration.

Kai-Heng

>
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement
  2021-09-17  4:09     ` Kai-Heng Feng
@ 2021-09-17 15:26       ` Bjorn Helgaas
  2021-10-01  4:32         ` Kai-Heng Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2021-09-17 15:26 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Heiner Kallweit, nic_swsd, Bjorn Helgaas, David Miller,
	Jakub Kicinski, Anthony Wong, Linux Netdev List, Linux PCI, LKML

On Fri, Sep 17, 2021 at 12:09:08PM +0800, Kai-Heng Feng wrote:
> On Fri, Sep 17, 2021 at 1:07 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Sep 16, 2021 at 11:44:16PM +0800, Kai-Heng Feng wrote:
> > > Because ASPM control may not be granted by BIOS while ASPM is enabled,
> > > and ASPM can be enabled via sysfs, so use pcie_aspm_enabled() directly
> > > to check current ASPM enable status.
> > >
> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > ---
> > > v5:
> > >  - New patch.
> > >
> > >  drivers/net/ethernet/realtek/r8169_main.c | 13 ++++++++-----
> > >  1 file changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > > index 0199914440abc..6f1a9bec40c05 100644
> > > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > > @@ -622,7 +622,6 @@ struct rtl8169_private {
> > >       } wk;
> > >
> > >       unsigned supports_gmii:1;
> > > -     unsigned aspm_manageable:1;
> > >       dma_addr_t counters_phys_addr;
> > >       struct rtl8169_counters *counters;
> > >       struct rtl8169_tc_offsets tc_offset;
> > > @@ -2664,8 +2663,13 @@ static void rtl_enable_exit_l1(struct rtl8169_private *tp)
> > >
> > >  static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
> > >  {
> > > -     /* Don't enable ASPM in the chip if OS can't control ASPM */
> > > -     if (enable && tp->aspm_manageable) {
> > > +     struct pci_dev *pdev = tp->pci_dev;
> > > +
> > > +     /* Don't enable ASPM in the chip if PCIe ASPM isn't enabled */
> > > +     if (!pcie_aspm_enabled(pdev) && enable)
> > > +             return;
> >
> > What happens when the user enables or disables ASPM via sysfs (see
> > https://git.kernel.org/linus/72ea91afbfb0)?
> >
> > The driver is not going to know about that change.
> 
> So it's still better to fold this patch into next one? So the periodic
> delayed_work can toggle ASPM accordingly.

No, my point is that the user can enable/disable ASPM via sysfs, and
the driver will not know anything about it.  There's no callback that
tells the driver when this happens.

My question is whether this code works when that happens.  I doubt it
works, because if ASPM is not enabled at this moment, you return
without doing enabling ASPM in the chip below.

If the user subsequently enables ASPM via sysfs, the chip setup below
will not be done.

If there's chip-specific setup to make ASPM work, I think the
chip-specific part needs to be done unconditionally.

> > > +     if (enable) {
> > >               RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
> > >               RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
> > >       } else {
> > > @@ -5272,8 +5276,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> > >       /* Disable ASPM L1 as that cause random device stop working
> > >        * problems as well as full system hangs for some PCIe devices users.
> > >        */
> > > -     rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
> > > -     tp->aspm_manageable = !rc;
> > > +     pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
> > >
> > >       /* enable device (incl. PCI PM wakeup and hotplug setup) */
> > >       rc = pcim_enable_device(pdev);
> > > --
> > > 2.32.0
> > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs
  2021-09-16 15:44 [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Kai-Heng Feng
                   ` (2 preceding siblings ...)
  2021-09-16 15:44 ` [RFC] [PATCH net-next v5 3/3] r8169: Implement dynamic ASPM mechanism Kai-Heng Feng
@ 2021-09-17 22:09 ` Bjorn Helgaas
  2021-10-01  4:17   ` Kai-Heng Feng
  3 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2021-09-17 22:09 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: hkallweit1, nic_swsd, bhelgaas, davem, kuba, anthony.wong,
	netdev, linux-pci, linux-kernel

On Thu, Sep 16, 2021 at 11:44:14PM +0800, Kai-Heng Feng wrote:
> The purpose of the series is to get comments and reviews so we can merge
> and test the series in downstream kernel.
> 
> The latest Realtek vendor driver and its Windows driver implements a
> feature called "dynamic ASPM" which can improve performance on it's
> ethernet NICs.
> 
> Heiner Kallweit pointed out the potential root cause can be that the
> buffer is too small for its ASPM exit latency.

I looked at the lspci data in your bugzilla
(https://bugzilla.kernel.org/show_bug.cgi?id=214307).

L1.2 is enabled, which requires the Latency Tolerance Reporting
capability, which helps determine when the Link will be put in L1.2.
IIUC, these are analogous to the DevCap "Acceptable Latency" values.
Zero latency values indicate the device will be impacted by any delay
(PCIe r5.0, sec 6.18).

Linux does not currently program those values, so the values there
must have been set by the BIOS.  On the working AMD system, they're
set to 1048576ns, while on the broken Intel system, they're set to
3145728ns.

I don't really understand how these values should be computed, and I
think they depend on some electrical characteristics of the Link, so
I'm not sure it's *necessarily* a problem that they are different.
But a 3X difference does seem pretty large.

So I'm curious whether this is related to the problem.  Here are some
things we could try on the broken Intel system:

  - What happens if you disable ASPM L1.2 using
    /sys/devices/pci*/.../link/l1_2_aspm?

  - If that doesn't work, what happens if you also disable PCI-PM L1.2
    using /sys/devices/pci*/.../link/l1_2_pcipm?

  - If either of the above makes things work, then at least we know
    the problem is sensitive to L1.2.

  - Then what happens if you use setpci to set the LTR Latency
    registers to 0, then re-enable ASPM L1.2 and PCI-PM L1.2?  This
    should mean the Realtek device wants the best possible service and
    the Link probably won't spend much time in L1.2.

  - What happens if you set the LTR Latency registers to 0x1001
    (should be the same as on the AMD system)?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs
  2021-09-17 22:09 ` [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Bjorn Helgaas
@ 2021-10-01  4:17   ` Kai-Heng Feng
  2021-10-07 19:10     ` Bjorn Helgaas
  2021-10-08 13:56     ` Bjorn Helgaas
  0 siblings, 2 replies; 14+ messages in thread
From: Kai-Heng Feng @ 2021-10-01  4:17 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Heiner Kallweit, nic_swsd, Bjorn Helgaas, David Miller,
	Jakub Kicinski, Anthony Wong, Linux Netdev List, Linux PCI, LKML

On Sat, Sep 18, 2021 at 6:09 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Sep 16, 2021 at 11:44:14PM +0800, Kai-Heng Feng wrote:
> > The purpose of the series is to get comments and reviews so we can merge
> > and test the series in downstream kernel.
> >
> > The latest Realtek vendor driver and its Windows driver implements a
> > feature called "dynamic ASPM" which can improve performance on it's
> > ethernet NICs.
> >
> > Heiner Kallweit pointed out the potential root cause can be that the
> > buffer is too small for its ASPM exit latency.
>
> I looked at the lspci data in your bugzilla
> (https://bugzilla.kernel.org/show_bug.cgi?id=214307).
>
> L1.2 is enabled, which requires the Latency Tolerance Reporting
> capability, which helps determine when the Link will be put in L1.2.
> IIUC, these are analogous to the DevCap "Acceptable Latency" values.
> Zero latency values indicate the device will be impacted by any delay
> (PCIe r5.0, sec 6.18).
>
> Linux does not currently program those values, so the values there
> must have been set by the BIOS.  On the working AMD system, they're
> set to 1048576ns, while on the broken Intel system, they're set to
> 3145728ns.
>
> I don't really understand how these values should be computed, and I
> think they depend on some electrical characteristics of the Link, so
> I'm not sure it's *necessarily* a problem that they are different.
> But a 3X difference does seem pretty large.
>
> So I'm curious whether this is related to the problem.  Here are some
> things we could try on the broken Intel system:

Original network speed, tested via iperf3:
TX: ~255 Mbps
RX: ~490 Mbps

>
>   - What happens if you disable ASPM L1.2 using
>     /sys/devices/pci*/.../link/l1_2_aspm?

TX: ~670 Mbps
RX: ~670 Mbps

>
>   - If that doesn't work, what happens if you also disable PCI-PM L1.2
>     using /sys/devices/pci*/.../link/l1_2_pcipm?

Same as only disables l1_2_aspm.

>
>   - If either of the above makes things work, then at least we know
>     the problem is sensitive to L1.2.

Right now the downstream kernel disables ASPM L1.2 as workaround.

>
>   - Then what happens if you use setpci to set the LTR Latency
>     registers to 0, then re-enable ASPM L1.2 and PCI-PM L1.2?  This
>     should mean the Realtek device wants the best possible service and
>     the Link probably won't spend much time in L1.2.

# setpci -s 01:00.0 ECAP_LTR+4.w=0x0
# setpci -s 01:00.0 ECAP_LTR+6.w=0x0

Then re-enable ASPM L1.2, the issue persists - the network speed is
still very slow.

>
>   - What happens if you set the LTR Latency registers to 0x1001
>     (should be the same as on the AMD system)?

Same slow speed here.

Kai-Heng

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement
  2021-09-17 15:26       ` Bjorn Helgaas
@ 2021-10-01  4:32         ` Kai-Heng Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Kai-Heng Feng @ 2021-10-01  4:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Heiner Kallweit, nic_swsd, Bjorn Helgaas, David Miller,
	Jakub Kicinski, Anthony Wong, Linux Netdev List, Linux PCI, LKML

On Fri, Sep 17, 2021 at 11:26 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Sep 17, 2021 at 12:09:08PM +0800, Kai-Heng Feng wrote:
> > On Fri, Sep 17, 2021 at 1:07 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Sep 16, 2021 at 11:44:16PM +0800, Kai-Heng Feng wrote:
> > > > Because ASPM control may not be granted by BIOS while ASPM is enabled,
> > > > and ASPM can be enabled via sysfs, so use pcie_aspm_enabled() directly
> > > > to check current ASPM enable status.
> > > >
> > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > > ---
> > > > v5:
> > > >  - New patch.
> > > >
> > > >  drivers/net/ethernet/realtek/r8169_main.c | 13 ++++++++-----
> > > >  1 file changed, 8 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > > > index 0199914440abc..6f1a9bec40c05 100644
> > > > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > > > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > > > @@ -622,7 +622,6 @@ struct rtl8169_private {
> > > >       } wk;
> > > >
> > > >       unsigned supports_gmii:1;
> > > > -     unsigned aspm_manageable:1;
> > > >       dma_addr_t counters_phys_addr;
> > > >       struct rtl8169_counters *counters;
> > > >       struct rtl8169_tc_offsets tc_offset;
> > > > @@ -2664,8 +2663,13 @@ static void rtl_enable_exit_l1(struct rtl8169_private *tp)
> > > >
> > > >  static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
> > > >  {
> > > > -     /* Don't enable ASPM in the chip if OS can't control ASPM */
> > > > -     if (enable && tp->aspm_manageable) {
> > > > +     struct pci_dev *pdev = tp->pci_dev;
> > > > +
> > > > +     /* Don't enable ASPM in the chip if PCIe ASPM isn't enabled */
> > > > +     if (!pcie_aspm_enabled(pdev) && enable)
> > > > +             return;
> > >
> > > What happens when the user enables or disables ASPM via sysfs (see
> > > https://git.kernel.org/linus/72ea91afbfb0)?
> > >
> > > The driver is not going to know about that change.
> >
> > So it's still better to fold this patch into next one? So the periodic
> > delayed_work can toggle ASPM accordingly.
>
> No, my point is that the user can enable/disable ASPM via sysfs, and
> the driver will not know anything about it.  There's no callback that
> tells the driver when this happens.
>
> My question is whether this code works when that happens.  I doubt it
> works, because if ASPM is not enabled at this moment, you return
> without doing enabling ASPM in the chip below.
>
> If the user subsequently enables ASPM via sysfs, the chip setup below
> will not be done.
>
> If there's chip-specific setup to make ASPM work, I think the
> chip-specific part needs to be done unconditionally.

So it's either adding a callback to notify driver about ASPM change,
or doing chip-specific ASPM unconditionally.
Which one do you prefer?

Kai-Heng

>
> > > > +     if (enable) {
> > > >               RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
> > > >               RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
> > > >       } else {
> > > > @@ -5272,8 +5276,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> > > >       /* Disable ASPM L1 as that cause random device stop working
> > > >        * problems as well as full system hangs for some PCIe devices users.
> > > >        */
> > > > -     rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
> > > > -     tp->aspm_manageable = !rc;
> > > > +     pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
> > > >
> > > >       /* enable device (incl. PCI PM wakeup and hotplug setup) */
> > > >       rc = pcim_enable_device(pdev);
> > > > --
> > > > 2.32.0
> > > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs
  2021-10-01  4:17   ` Kai-Heng Feng
@ 2021-10-07 19:10     ` Bjorn Helgaas
  2021-10-08 13:56     ` Bjorn Helgaas
  1 sibling, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2021-10-07 19:10 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Heiner Kallweit, nic_swsd, Bjorn Helgaas, David Miller,
	Jakub Kicinski, Anthony Wong, Linux Netdev List, Linux PCI, LKML

On Fri, Oct 01, 2021 at 12:17:26PM +0800, Kai-Heng Feng wrote:
> On Sat, Sep 18, 2021 at 6:09 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Sep 16, 2021 at 11:44:14PM +0800, Kai-Heng Feng wrote:
> > > The purpose of the series is to get comments and reviews so we can merge
> > > and test the series in downstream kernel.
> > >
> > > The latest Realtek vendor driver and its Windows driver implements a
> > > feature called "dynamic ASPM" which can improve performance on it's
> > > ethernet NICs.
> > >
> > > Heiner Kallweit pointed out the potential root cause can be that the
> > > buffer is too small for its ASPM exit latency.
> >
> > I looked at the lspci data in your bugzilla
> > (https://bugzilla.kernel.org/show_bug.cgi?id=214307).
> >
> > L1.2 is enabled, which requires the Latency Tolerance Reporting
> > capability, which helps determine when the Link will be put in L1.2.
> > IIUC, these are analogous to the DevCap "Acceptable Latency" values.
> > Zero latency values indicate the device will be impacted by any delay
> > (PCIe r5.0, sec 6.18).
> >
> > Linux does not currently program those values, so the values there
> > must have been set by the BIOS.  On the working AMD system, they're
> > set to 1048576ns, while on the broken Intel system, they're set to
> > 3145728ns.
> >
> > I don't really understand how these values should be computed, and I
> > think they depend on some electrical characteristics of the Link, so
> > I'm not sure it's *necessarily* a problem that they are different.
> > But a 3X difference does seem pretty large.
> >
> > So I'm curious whether this is related to the problem.  Here are some
> > things we could try on the broken Intel system:
> 
> Original network speed, tested via iperf3:
> TX: ~255 Mbps
> RX: ~490 Mbps
> 
> >   - What happens if you disable ASPM L1.2 using
> >     /sys/devices/pci*/.../link/l1_2_aspm?
> 
> TX: ~670 Mbps
> RX: ~670 Mbps
> 
> >   - If that doesn't work, what happens if you also disable PCI-PM L1.2
> >     using /sys/devices/pci*/.../link/l1_2_pcipm?
> 
> Same as only disables l1_2_aspm.
> 
> >   - If either of the above makes things work, then at least we know
> >     the problem is sensitive to L1.2.
> 
> Right now the downstream kernel disables ASPM L1.2 as workaround.
> 
> >   - Then what happens if you use setpci to set the LTR Latency
> >     registers to 0, then re-enable ASPM L1.2 and PCI-PM L1.2?  This
> >     should mean the Realtek device wants the best possible service and
> >     the Link probably won't spend much time in L1.2.
> 
> # setpci -s 01:00.0 ECAP_LTR+4.w=0x0
> # setpci -s 01:00.0 ECAP_LTR+6.w=0x0
> 
> Then re-enable ASPM L1.2, the issue persists - the network speed is
> still very slow.
> 
> >   - What happens if you set the LTR Latency registers to 0x1001
> >     (should be the same as on the AMD system)?
> 
> Same slow speed here.

Thanks a lot for indulging my curiosity and testing this.  So I guess
you confirmed that specifically ASPM L1.2 is the issue, which makes
sense given the current downstream kernel workaround.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs
  2021-10-01  4:17   ` Kai-Heng Feng
  2021-10-07 19:10     ` Bjorn Helgaas
@ 2021-10-08 13:56     ` Bjorn Helgaas
  1 sibling, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2021-10-08 13:56 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Heiner Kallweit, nic_swsd, Bjorn Helgaas, David Miller,
	Jakub Kicinski, Anthony Wong, Linux Netdev List, Linux PCI, LKML

On Fri, Oct 01, 2021 at 12:17:26PM +0800, Kai-Heng Feng wrote:
> On Sat, Sep 18, 2021 at 6:09 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Sep 16, 2021 at 11:44:14PM +0800, Kai-Heng Feng wrote:
> > > The purpose of the series is to get comments and reviews so we can merge
> > > and test the series in downstream kernel.
> > >
> > > The latest Realtek vendor driver and its Windows driver implements a
> > > feature called "dynamic ASPM" which can improve performance on it's
> > > ethernet NICs.
> > >
> > > Heiner Kallweit pointed out the potential root cause can be that the
> > > buffer is too small for its ASPM exit latency.
> >
> > I looked at the lspci data in your bugzilla
> > (https://bugzilla.kernel.org/show_bug.cgi?id=214307).
> >
> > L1.2 is enabled, which requires the Latency Tolerance Reporting
> > capability, which helps determine when the Link will be put in L1.2.
> > IIUC, these are analogous to the DevCap "Acceptable Latency" values.
> > Zero latency values indicate the device will be impacted by any delay
> > (PCIe r5.0, sec 6.18).
> >
> > Linux does not currently program those values, so the values there
> > must have been set by the BIOS.  On the working AMD system, they're
> > set to 1048576ns, while on the broken Intel system, they're set to
> > 3145728ns.
> >
> > I don't really understand how these values should be computed, and I
> > think they depend on some electrical characteristics of the Link, so
> > I'm not sure it's *necessarily* a problem that they are different.
> > But a 3X difference does seem pretty large.
> >
> > So I'm curious whether this is related to the problem.  Here are some
> > things we could try on the broken Intel system:
> 
> Original network speed, tested via iperf3:
> TX: ~255 Mbps
> RX: ~490 Mbps
> 
> >   - What happens if you disable ASPM L1.2 using
> >     /sys/devices/pci*/.../link/l1_2_aspm?
> 
> TX: ~670 Mbps
> RX: ~670 Mbps

Do you remember if there were any dropped packets here?  You mentioned
at [1] that you have also seen reports of issues with L0s and L1.1.
If you disable L1.2, L0s and L1.1 *should* still be enabled.

[1] https://lore.kernel.org/r/CAAd53p4v+CmupCu2+3vY5N64WKkxcNvpk1M7+hhNoposx+aYCg@mail.gmail.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-10-08 13:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-16 15:44 [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Kai-Heng Feng
2021-09-16 15:44 ` [RFC] [PATCH net-next v5 1/3] PCI/ASPM: Introduce a new helper to report ASPM capability Kai-Heng Feng
2021-09-16 15:44 ` [RFC] [PATCH net-next v5 2/3] r8169: Use PCIe ASPM status for NIC ASPM enablement Kai-Heng Feng
2021-09-16 17:07   ` Bjorn Helgaas
2021-09-17  4:09     ` Kai-Heng Feng
2021-09-17 15:26       ` Bjorn Helgaas
2021-10-01  4:32         ` Kai-Heng Feng
2021-09-16 15:44 ` [RFC] [PATCH net-next v5 3/3] r8169: Implement dynamic ASPM mechanism Kai-Heng Feng
2021-09-16 17:12   ` Bjorn Helgaas
2021-09-17  4:19     ` Kai-Heng Feng
2021-09-17 22:09 ` [RFC] [PATCH net-next v5 0/3] r8169: Implement dynamic ASPM mechanism for recent 1.0/2.5Gbps Realtek NICs Bjorn Helgaas
2021-10-01  4:17   ` Kai-Heng Feng
2021-10-07 19:10     ` Bjorn Helgaas
2021-10-08 13:56     ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).