From: Saeed Mahameed <saeed@kernel.org>
To: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, Tariq Toukan <tariqt@nvidia.com>,
	Aya Levin <ayal@nvidia.com>, Moshe Shemesh <moshe@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>
Subject: [net 10/12] net/mlx5: Unload device upon firmware fatal error
Date: Tue, 27 Jul 2021 16:20:48 -0700
Message-ID: <20210727232050.606896-11-saeed@kernel.org>
In-Reply-To: <20210727232050.606896-1-saeed@kernel.org>

From: Aya Levin <ayal@nvidia.com>

When the fw_fatal reporter reports an error, the firmware is not responding.
Unload the device to ensure that the driver closes all its resources,
even if recovery is not performed (the user disabled auto-recovery or the
reporter is in its grace period). On successful recovery the device is
loaded back up.

Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 9ff163c5bcde..9abeb80ffa31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -626,8 +626,16 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
 	}
 	fw_reporter_ctx.err_synd = health->synd;
 	fw_reporter_ctx.miss_counter = health->miss_counter;
-	devlink_health_report(health->fw_fatal_reporter,
-			      "FW fatal error reported", &fw_reporter_ctx);
+	if (devlink_health_report(health->fw_fatal_reporter,
+				  "FW fatal error reported", &fw_reporter_ctx) == -ECANCELED) {
+		/* If recovery wasn't performed, due to grace period,
+		 * unload the driver. This ensures that the driver
+		 * closes all its resources and it is not subjected to
+		 * requests from the kernel.
+		 */
+		mlx5_core_err(dev, "Driver is in error state. Unloading\n");
+		mlx5_unload_one(dev);
+	}
 }
 
 static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
-- 
2.31.1
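
For readers who want to see the control flow in isolation, here is a minimal
user-space sketch of what the hunk above does. It is a toy model, not the
kernel code: struct mock_dev, mock_health_report(), mock_unload_one() and
mock_fw_fatal_err_work() are hypothetical stand-ins for the mlx5 device,
devlink_health_report(), mlx5_unload_one() and the patched work handler, and
the grace-period check is reduced to a single flag for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device; not the real struct mlx5_core_dev. */
struct mock_dev {
	bool loaded;          /* true while the driver holds its resources */
	bool in_grace_period; /* assumption: a recent recovery blocks a new one */
};

/* Hypothetical stand-in for devlink_health_report(). In this model it
 * returns -ECANCELED when recovery is skipped and 0 when recovery ran.
 */
static int mock_health_report(struct mock_dev *dev)
{
	if (dev->in_grace_period)
		return -ECANCELED;

	/* Recovery ran: in this model the device ends up loaded again. */
	dev->loaded = true;
	return 0;
}

/* Hypothetical stand-in for mlx5_unload_one(): release all resources. */
static void mock_unload_one(struct mock_dev *dev)
{
	dev->loaded = false;
}

/* Same shape as the patched error work: report the error, and if the
 * report was cancelled (no recovery attempted), unload the device so it
 * no longer serves requests while the firmware is unresponsive.
 */
static int mock_fw_fatal_err_work(struct mock_dev *dev)
{
	int err = mock_health_report(dev);

	if (err == -ECANCELED) {
		fprintf(stderr, "Driver is in error state. Unloading\n");
		mock_unload_one(dev);
	}
	return err;
}
```

In this model, a device whose reporter is in its grace period ends up
unloaded after the work runs, while a device that goes through recovery
stays loaded.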

