Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Michael Chan <michael.chan@broadcom.com>
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, kuba@kernel.org, edwin.peer@broadcom.com,
	gospo@broadcom.com
Subject: [PATCH net 5/5] bnxt_en: Fix possible unintended driver initiated error recovery
Date: Sun,  5 Sep 2021 14:10:59 -0400	[thread overview]
Message-ID: <1630865459-19146-6-git-send-email-michael.chan@broadcom.com> (raw)
In-Reply-To: <1630865459-19146-1-git-send-email-michael.chan@broadcom.com>

[-- Attachment #1: Type: text/plain, Size: 3602 bytes --]

If error recovery is already enabled, bnxt_timer() will periodically
check the heartbeat register and the reset counter.  If we get an
error recovery async. notification from the firmware (e.g. change in
primary/secondary role), we will immediately read and update the
heartbeat register and the reset counter.  If the timer for the next
health check expires soon after this, we may read the heartbeat register
again in quick succession and find that it hasn't changed.  This will
trigger error recovery unintentionally.

The likelihood is small because we also reset fw_health->tmr_counter
which will reset the interval for the next health check.  But the
update is not protected and bnxt_timer() can miss the update and
perform the health check without waiting for the full interval.

Fix it by only reading the heartbeat register and reset counter in
bnxt_async_event_process() if error recovery is trasitioning to the
enabled state.  Also add proper memory barriers so that when enabling
for the first time, bnxt_timer() will see the tmr_counter interval and
perform the health check after the full interval has elapsed.

Fixes: 7e914027f757 ("bnxt_en: Enable health monitoring.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 25 ++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 40a390652d8d..9b86516e59a1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2202,25 +2202,34 @@ static int bnxt_async_event_process(struct bnxt *bp,
 		if (!fw_health)
 			goto async_event_process_exit;
 
-		fw_health->enabled = EVENT_DATA1_RECOVERY_ENABLED(data1);
-		fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1);
-		if (!fw_health->enabled) {
+		if (!EVENT_DATA1_RECOVERY_ENABLED(data1)) {
+			fw_health->enabled = false;
 			netif_info(bp, drv, bp->dev,
 				   "Error recovery info: error recovery[0]\n");
 			break;
 		}
+		fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1);
 		fw_health->tmr_multiplier =
 			DIV_ROUND_UP(fw_health->polling_dsecs * HZ,
 				     bp->current_interval * 10);
 		fw_health->tmr_counter = fw_health->tmr_multiplier;
-		fw_health->last_fw_heartbeat =
-			bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
-		fw_health->last_fw_reset_cnt =
-			bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+		if (!fw_health->enabled) {
+			fw_health->last_fw_heartbeat =
+				bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+			fw_health->last_fw_reset_cnt =
+				bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+		}
 		netif_info(bp, drv, bp->dev,
 			   "Error recovery info: error recovery[1], master[%d], reset count[%u], health status: 0x%x\n",
 			   fw_health->master, fw_health->last_fw_reset_cnt,
 			   bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG));
+		if (!fw_health->enabled) {
+			/* Make sure tmr_counter is set and visible to
+			 * bnxt_health_check() before setting enabled to true.
+			 */
+			smp_wmb();
+			fw_health->enabled = true;
+		}
 		goto async_event_process_exit;
 	}
 	case ASYNC_EVENT_CMPL_EVENT_ID_DEBUG_NOTIFICATION:
@@ -11258,6 +11267,8 @@ static void bnxt_fw_health_check(struct bnxt *bp)
 	if (!fw_health->enabled || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
 		return;
 
+	/* Make sure it is enabled before checking the tmr_counter. */
+	smp_rmb();
 	if (fw_health->tmr_counter) {
 		fw_health->tmr_counter--;
 		return;
-- 
2.18.1


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]

  parent reply	other threads:[~2021-09-05 18:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-05 18:10 [PATCH net 0/5] bnxt_en: Bug fixes Michael Chan
2021-09-05 18:10 ` [PATCH net 1/5] bnxt_en: fix stored FW_PSID version masks Michael Chan
2021-09-05 18:10 ` [PATCH net 2/5] bnxt_en: fix read of stored FW_PSID version on P5 devices Michael Chan
2021-09-05 18:10 ` [PATCH net 3/5] bnxt_en: Fix asic.rev in devlink dev info command Michael Chan
2021-09-05 18:10 ` [PATCH net 4/5] bnxt_en: Fix UDP tunnel logic Michael Chan
2021-09-05 18:10 ` Michael Chan [this message]
2021-09-05 19:50 ` [PATCH net 0/5] bnxt_en: Bug fixes patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1630865459-19146-6-git-send-email-michael.chan@broadcom.com \
    --to=michael.chan@broadcom.com \
    --cc=davem@davemloft.net \
    --cc=edwin.peer@broadcom.com \
    --cc=gospo@broadcom.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --subject='Re: [PATCH net 5/5] bnxt_en: Fix possible unintended driver initiated error recovery' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).