LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH net-next 0/6] Devlink cleanups
@ 2021-11-17 18:26 Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 1/6] devlink: Remove misleading internal_flags from health reporter dump Leon Romanovsky
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-17 18:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

From: Leon Romanovsky <leonro@nvidia.com>

Hi,

This series is non-controversial subset of my RFC [1], where I proposed
a way to allow parallel devlink execution.

Thanks

[1] https://lore.kernel.org/all/cover.1636390483.git.leonro@nvidia.com

Leon Romanovsky (6):
  devlink: Remove misleading internal_flags from health reporter dump
  devlink: Delete useless checks of holding devlink lock
  devlink: Simplify devlink resources unregister call
  devlink: Clean registration of devlink port
  devlink: Reshuffle resource registration logic
  devlink: Inline sb related functions

 .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |   7 +-
 .../freescale/dpaa2/dpaa2-eth-devlink.c       |   7 +-
 drivers/net/ethernet/intel/ice/ice_devlink.c  |  23 +-
 .../marvell/prestera/prestera_devlink.c       |   8 +-
 drivers/net/ethernet/mellanox/mlx4/main.c     |   4 +-
 .../ethernet/mellanox/mlx5/core/en/devlink.c  |   5 +-
 .../ethernet/mellanox/mlx5/core/en/devlink.h  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   7 +-
 .../mellanox/mlx5/core/esw/devlink_port.c     |   9 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  15 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |   4 +-
 drivers/net/ethernet/mscc/ocelot_net.c        |   4 +-
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |   4 +-
 .../ethernet/pensando/ionic/ionic_devlink.c   |   8 +-
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  14 +-
 drivers/net/netdevsim/dev.c                   |  11 +-
 include/net/devlink.h                         |   9 +-
 net/core/devlink.c                            | 220 ++++++------------
 net/dsa/dsa.c                                 |   2 +-
 net/dsa/dsa2.c                                |   9 +-
 20 files changed, 115 insertions(+), 257 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next 1/6] devlink: Remove misleading internal_flags from health reporter dump
  2021-11-17 18:26 [PATCH net-next 0/6] Devlink cleanups Leon Romanovsky
@ 2021-11-17 18:26 ` Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 2/6] devlink: Delete useless checks of holding devlink lock Leon Romanovsky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-17 18:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

From: Leon Romanovsky <leonro@nvidia.com>

DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET command doesn't have .doit callback
and has no use in internal_flags at all. Remove this misleading assignment.

Fixes: e44ef4e4516c ("devlink: Hang reporter's dump method on a dumpit cb")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/core/devlink.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 5ba4f9434acd..1cb2e0ae9173 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -8838,8 +8838,6 @@ static const struct genl_small_ops devlink_nl_ops[] = {
 			    GENL_DONT_VALIDATE_DUMP_STRICT,
 		.dumpit = devlink_nl_cmd_health_reporter_dump_get_dumpit,
 		.flags = GENL_ADMIN_PERM,
-		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT |
-				  DEVLINK_NL_FLAG_NO_LOCK,
 	},
 	{
 		.cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
-- 
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next 2/6] devlink: Delete useless checks of holding devlink lock
  2021-11-17 18:26 [PATCH net-next 0/6] Devlink cleanups Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 1/6] devlink: Remove misleading internal_flags from health reporter dump Leon Romanovsky
@ 2021-11-17 18:26 ` Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 3/6] devlink: Simplify devlink resources unregister call Leon Romanovsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-17 18:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

From: Leon Romanovsky <leonro@nvidia.com>

The snapshot API is fully protected by devlink->lock and these internal
functions are not exported directly to the code outside of the devlink.c.
This makes the checks of holding devlink lock as completely redundant.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/core/devlink.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 1cb2e0ae9173..dcc09c62f3e5 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -5223,8 +5223,6 @@ static int __devlink_snapshot_id_increment(struct devlink *devlink, u32 id)
 	unsigned long count;
 	void *p;
 
-	lockdep_assert_held(&devlink->lock);
-
 	p = xa_load(&devlink->snapshot_ids, id);
 	if (WARN_ON(!p))
 		return -EINVAL;
@@ -5259,8 +5257,6 @@ static void __devlink_snapshot_id_decrement(struct devlink *devlink, u32 id)
 	unsigned long count;
 	void *p;
 
-	lockdep_assert_held(&devlink->lock);
-
 	p = xa_load(&devlink->snapshot_ids, id);
 	if (WARN_ON(!p))
 		return;
@@ -5298,8 +5294,6 @@ static void __devlink_snapshot_id_decrement(struct devlink *devlink, u32 id)
  */
 static int __devlink_snapshot_id_insert(struct devlink *devlink, u32 id)
 {
-	lockdep_assert_held(&devlink->lock);
-
 	if (xa_load(&devlink->snapshot_ids, id))
 		return -EEXIST;
 
@@ -5325,8 +5319,6 @@ static int __devlink_snapshot_id_insert(struct devlink *devlink, u32 id)
  */
 static int __devlink_region_snapshot_id_get(struct devlink *devlink, u32 *id)
 {
-	lockdep_assert_held(&devlink->lock);
-
 	return xa_alloc(&devlink->snapshot_ids, id, xa_mk_value(1),
 			xa_limit_32b, GFP_KERNEL);
 }
@@ -5353,8 +5345,6 @@ __devlink_region_snapshot_create(struct devlink_region *region,
 	struct devlink_snapshot *snapshot;
 	int err;
 
-	lockdep_assert_held(&devlink->lock);
-
 	/* check if region can hold one more snapshot */
 	if (region->cur_snapshots == region->max_snapshots)
 		return -ENOSPC;
@@ -5391,8 +5381,6 @@ static void devlink_region_snapshot_del(struct devlink_region *region,
 {
 	struct devlink *devlink = region->devlink;
 
-	lockdep_assert_held(&devlink->lock);
-
 	devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
 	region->cur_snapshots--;
 	list_del(&snapshot->list);
-- 
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next 3/6] devlink: Simplify devlink resources unregister call
  2021-11-17 18:26 [PATCH net-next 0/6] Devlink cleanups Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 1/6] devlink: Remove misleading internal_flags from health reporter dump Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 2/6] devlink: Delete useless checks of holding devlink lock Leon Romanovsky
@ 2021-11-17 18:26 ` Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 4/6] devlink: Clean registration of devlink port Leon Romanovsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-17 18:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

From: Leon Romanovsky <leonro@nvidia.com>

The devlink_resources_unregister() used second parameter as an
entry point for the recursive removal of devlink resources. None
of external to devlink users needed to use this field, so lat's
remove it.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  8 ++---
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  4 +--
 drivers/net/netdevsim/dev.c                   |  4 +--
 include/net/devlink.h                         |  3 +-
 net/core/devlink.c                            | 34 +++++++++++--------
 net/dsa/dsa.c                                 |  2 +-
 6 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 3fd3812b8f31..0d1f08bbf631 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -160,7 +160,7 @@ static void mlxsw_ports_fini(struct mlxsw_core *mlxsw_core, bool reload)
 
 	devlink_resource_occ_get_unregister(devlink, MLXSW_CORE_RESOURCE_PORTS);
 	if (!reload)
-		devlink_resources_unregister(priv_to_devlink(mlxsw_core), NULL);
+		devlink_resources_unregister(priv_to_devlink(mlxsw_core));
 
 	kfree(mlxsw_core->ports);
 }
@@ -2033,7 +2033,7 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	mlxsw_ports_fini(mlxsw_core, reload);
 err_ports_init:
 	if (!reload)
-		devlink_resources_unregister(devlink, NULL);
+		devlink_resources_unregister(devlink);
 err_register_resources:
 	mlxsw_bus->fini(bus_priv);
 err_bus_init:
@@ -2099,7 +2099,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
 	kfree(mlxsw_core->lag.mapping);
 	mlxsw_ports_fini(mlxsw_core, reload);
 	if (!reload)
-		devlink_resources_unregister(devlink, NULL);
+		devlink_resources_unregister(devlink);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
 	if (!reload)
 		devlink_free(devlink);
@@ -2108,7 +2108,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
 
 reload_fail_deinit:
 	mlxsw_core_params_unregister(mlxsw_core);
-	devlink_resources_unregister(devlink, NULL);
+	devlink_resources_unregister(devlink);
 	devlink_free(devlink);
 }
 EXPORT_SYMBOL(mlxsw_core_bus_device_unregister);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 5925db386b1b..6a5887aacbc0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3336,7 +3336,7 @@ static int mlxsw_sp1_resources_register(struct mlxsw_core *mlxsw_core)
 err_policer_resources_register:
 err_resources_counter_register:
 err_resources_span_register:
-	devlink_resources_unregister(priv_to_devlink(mlxsw_core), NULL);
+	devlink_resources_unregister(priv_to_devlink(mlxsw_core));
 	return err;
 }
 
@@ -3370,7 +3370,7 @@ static int mlxsw_sp2_resources_register(struct mlxsw_core *mlxsw_core)
 err_policer_resources_register:
 err_resources_counter_register:
 err_resources_span_register:
-	devlink_resources_unregister(priv_to_devlink(mlxsw_core), NULL);
+	devlink_resources_unregister(priv_to_devlink(mlxsw_core));
 	return err;
 }
 
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 54345c096a16..08d7b465a0de 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1622,7 +1622,7 @@ int nsim_drv_probe(struct nsim_bus_dev *nsim_bus_dev)
 	devlink_params_unregister(devlink, nsim_devlink_params,
 				  ARRAY_SIZE(nsim_devlink_params));
 err_dl_unregister:
-	devlink_resources_unregister(devlink, NULL);
+	devlink_resources_unregister(devlink);
 err_vfc_free:
 	kfree(nsim_dev->vfconfigs);
 err_devlink_free:
@@ -1668,7 +1668,7 @@ void nsim_drv_remove(struct nsim_bus_dev *nsim_bus_dev)
 	nsim_dev_debugfs_exit(nsim_dev);
 	devlink_params_unregister(devlink, nsim_devlink_params,
 				  ARRAY_SIZE(nsim_devlink_params));
-	devlink_resources_unregister(devlink, NULL);
+	devlink_resources_unregister(devlink);
 	kfree(nsim_dev->vfconfigs);
 	devlink_free(devlink);
 	dev_set_drvdata(&nsim_bus_dev->dev, NULL);
diff --git a/include/net/devlink.h b/include/net/devlink.h
index aab3d007c577..f919545f5781 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1567,8 +1567,7 @@ int devlink_resource_register(struct devlink *devlink,
 			      u64 resource_id,
 			      u64 parent_resource_id,
 			      const struct devlink_resource_size_params *size_params);
-void devlink_resources_unregister(struct devlink *devlink,
-				  struct devlink_resource *resource);
+void devlink_resources_unregister(struct devlink *devlink);
 int devlink_resource_size_get(struct devlink *devlink,
 			      u64 resource_id,
 			      u64 *p_resource_size);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index dcc09c62f3e5..af94015b7424 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -9889,34 +9889,38 @@ int devlink_resource_register(struct devlink *devlink,
 }
 EXPORT_SYMBOL_GPL(devlink_resource_register);
 
+static void devlink_resource_unregister(struct devlink *devlink,
+					struct devlink_resource *resource)
+{
+	struct devlink_resource *tmp, *child_resource;
+
+	list_for_each_entry_safe(child_resource, tmp, &resource->resource_list,
+				 list) {
+		devlink_resource_unregister(devlink, child_resource);
+		list_del(&child_resource->list);
+		kfree(child_resource);
+	}
+}
+
 /**
  *	devlink_resources_unregister - free all resources
  *
  *	@devlink: devlink
- *	@resource: resource
  */
-void devlink_resources_unregister(struct devlink *devlink,
-				  struct devlink_resource *resource)
+void devlink_resources_unregister(struct devlink *devlink)
 {
 	struct devlink_resource *tmp, *child_resource;
-	struct list_head *resource_list;
-
-	if (resource)
-		resource_list = &resource->resource_list;
-	else
-		resource_list = &devlink->resource_list;
 
-	if (!resource)
-		mutex_lock(&devlink->lock);
+	mutex_lock(&devlink->lock);
 
-	list_for_each_entry_safe(child_resource, tmp, resource_list, list) {
-		devlink_resources_unregister(devlink, child_resource);
+	list_for_each_entry_safe(child_resource, tmp, &devlink->resource_list,
+				 list) {
+		devlink_resource_unregister(devlink, child_resource);
 		list_del(&child_resource->list);
 		kfree(child_resource);
 	}
 
-	if (!resource)
-		mutex_unlock(&devlink->lock);
+	mutex_unlock(&devlink->lock);
 }
 EXPORT_SYMBOL_GPL(devlink_resources_unregister);
 
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index ea5169e671ae..d9d0d227092c 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -406,7 +406,7 @@ EXPORT_SYMBOL_GPL(dsa_devlink_resource_register);
 
 void dsa_devlink_resources_unregister(struct dsa_switch *ds)
 {
-	devlink_resources_unregister(ds->devlink, NULL);
+	devlink_resources_unregister(ds->devlink);
 }
 EXPORT_SYMBOL_GPL(dsa_devlink_resources_unregister);
 
-- 
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next 4/6] devlink: Clean registration of devlink port
  2021-11-17 18:26 [PATCH net-next 0/6] Devlink cleanups Leon Romanovsky
                   ` (2 preceding siblings ...)
  2021-11-17 18:26 ` [PATCH net-next 3/6] devlink: Simplify devlink resources unregister call Leon Romanovsky
@ 2021-11-17 18:26 ` Leon Romanovsky
  2021-11-18  4:49   ` Jakub Kicinski
  2021-11-17 18:26 ` [PATCH net-next 5/6] devlink: Reshuffle resource registration logic Leon Romanovsky
  2021-11-17 18:26 ` [PATCH net-next 6/6] devlink: Inline sb related functions Leon Romanovsky
  5 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-17 18:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

From: Leon Romanovsky <leonro@nvidia.com>

devlink_port_register() is in-kernel API and as such can't really fail
as long as driver author didn't make a mistake by providing already existing
port index. Instead of relying on various error prints from the driver,
convert the existence check to be WARN_ON(), so such a mistake will be
caught easier.

As an outcome of this conversion, it was made clear that this function
should be void and devlink->lock was intended to protect addition to
port_list.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |  7 +---
 .../freescale/dpaa2/dpaa2-eth-devlink.c       |  7 +---
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 23 ++-----------
 .../marvell/prestera/prestera_devlink.c       |  8 +----
 drivers/net/ethernet/mellanox/mlx4/main.c     |  4 +--
 .../ethernet/mellanox/mlx5/core/en/devlink.c  |  5 ++-
 .../ethernet/mellanox/mlx5/core/en/devlink.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  7 +---
 .../mellanox/mlx5/core/esw/devlink_port.c     |  9 ++----
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  7 ++--
 drivers/net/ethernet/mscc/ocelot_net.c        |  4 +--
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |  4 +--
 .../ethernet/pensando/ionic/ionic_devlink.c   |  8 +----
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      | 14 +-------
 drivers/net/netdevsim/dev.c                   |  7 ++--
 include/net/devlink.h                         |  6 ++--
 net/core/devlink.c                            | 32 ++++++++-----------
 net/dsa/dsa2.c                                |  9 ++----
 18 files changed, 41 insertions(+), 122 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 5c464ea73576..c4a7122ee1fd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -1209,11 +1209,7 @@ int bnxt_dl_register(struct bnxt *bp)
 	memcpy(attrs.switch_id.id, bp->dsn, sizeof(bp->dsn));
 	attrs.switch_id.id_len = sizeof(bp->dsn);
 	devlink_port_attrs_set(&bp->dl_port, &attrs);
-	rc = devlink_port_register(dl, &bp->dl_port, bp->pf.port_id);
-	if (rc) {
-		netdev_err(bp->dev, "devlink_port_register failed\n");
-		goto err_dl_free;
-	}
+	devlink_port_register(dl, &bp->dl_port, bp->pf.port_id);
 
 	rc = bnxt_dl_params_register(bp);
 	if (rc)
@@ -1225,7 +1221,6 @@ int bnxt_dl_register(struct bnxt *bp)
 
 err_dl_port_unreg:
 	devlink_port_unregister(&bp->dl_port);
-err_dl_free:
 	devlink_free(dl);
 	return rc;
 }
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.c
index 7fefe1574b6a..d19423780e59 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.c
@@ -226,17 +226,12 @@ int dpaa2_eth_dl_port_add(struct dpaa2_eth_priv *priv)
 {
 	struct devlink_port *devlink_port = &priv->devlink_port;
 	struct devlink_port_attrs attrs = {};
-	int err;
 
 	attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
 	devlink_port_attrs_set(devlink_port, &attrs);
 
-	err = devlink_port_register(priv->devlink, devlink_port, 0);
-	if (err)
-		return err;
-
+	devlink_port_register(priv->devlink, devlink_port, 0);
 	devlink_port_type_eth_set(devlink_port, priv->net_dev);
-
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index b9bd9f9472f6..ba4c463cb276 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -498,10 +498,6 @@ int ice_devlink_create_pf_port(struct ice_pf *pf)
 	struct devlink_port *devlink_port;
 	struct devlink *devlink;
 	struct ice_vsi *vsi;
-	struct device *dev;
-	int err;
-
-	dev = ice_pf_to_dev(pf);
 
 	devlink_port = &pf->devlink_port;
 
@@ -514,13 +510,7 @@ int ice_devlink_create_pf_port(struct ice_pf *pf)
 	devlink_port_attrs_set(devlink_port, &attrs);
 	devlink = priv_to_devlink(pf);
 
-	err = devlink_port_register(devlink, devlink_port, vsi->idx);
-	if (err) {
-		dev_err(dev, "Failed to create devlink port for PF %d, error %d\n",
-			pf->hw.pf_id, err);
-		return err;
-	}
-
+	devlink_port_register(devlink, devlink_port, vsi->idx);
 	return 0;
 }
 
@@ -554,12 +544,9 @@ int ice_devlink_create_vf_port(struct ice_vf *vf)
 	struct devlink_port *devlink_port;
 	struct devlink *devlink;
 	struct ice_vsi *vsi;
-	struct device *dev;
 	struct ice_pf *pf;
-	int err;
 
 	pf = vf->pf;
-	dev = ice_pf_to_dev(pf);
 	vsi = ice_get_vf_vsi(vf);
 	devlink_port = &vf->devlink_port;
 
@@ -570,13 +557,7 @@ int ice_devlink_create_vf_port(struct ice_vf *vf)
 	devlink_port_attrs_set(devlink_port, &attrs);
 	devlink = priv_to_devlink(pf);
 
-	err = devlink_port_register(devlink, devlink_port, vsi->idx);
-	if (err) {
-		dev_err(dev, "Failed to create devlink port for VF %d, error %d\n",
-			vf->vf_id, err);
-		return err;
-	}
-
+	devlink_port_register(devlink, devlink_port, vsi->idx);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_devlink.c b/drivers/net/ethernet/marvell/prestera/prestera_devlink.c
index 06279cd6da67..91a4ff2e412b 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_devlink.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_devlink.c
@@ -422,7 +422,6 @@ int prestera_devlink_port_register(struct prestera_port *port)
 	struct prestera_switch *sw = port->sw;
 	struct devlink *dl = priv_to_devlink(sw);
 	struct devlink_port_attrs attrs = {};
-	int err;
 
 	attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
 	attrs.phys.port_number = port->fp_id;
@@ -431,12 +430,7 @@ int prestera_devlink_port_register(struct prestera_port *port)
 
 	devlink_port_attrs_set(&port->dl_port, &attrs);
 
-	err = devlink_port_register(dl, &port->dl_port, port->fp_id);
-	if (err) {
-		dev_err(prestera_dev(sw), "devlink_port_register failed: %d\n", err);
-		return err;
-	}
-
+	devlink_port_register(dl, &port->dl_port, port->fp_id);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index b187c210d4d6..c88a586ecd8d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3033,9 +3033,7 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 	struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
 	int err;
 
-	err = devlink_port_register(devlink, &info->devlink_port, port);
-	if (err)
-		return err;
+	devlink_port_register(devlink, &info->devlink_port, port);
 
 	/* Ethernet and IB drivers will normally set the port type,
 	 * but if they are not built set the type now to prevent
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c
index ae52e7f38306..76326858db22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.c
@@ -14,7 +14,7 @@ mlx5e_devlink_get_port_parent_id(struct mlx5_core_dev *dev, struct netdev_phys_i
 	memcpy(ppid->id, &parent_id, sizeof(parent_id));
 }
 
-int mlx5e_devlink_port_register(struct mlx5e_priv *priv)
+void mlx5e_devlink_port_register(struct mlx5e_priv *priv)
 {
 	struct devlink *devlink = priv_to_devlink(priv->mdev);
 	struct devlink_port_attrs attrs = {};
@@ -40,8 +40,7 @@ int mlx5e_devlink_port_register(struct mlx5e_priv *priv)
 	dl_port = mlx5e_devlink_get_dl_port(priv);
 	memset(dl_port, 0, sizeof(*dl_port));
 	devlink_port_attrs_set(dl_port, &attrs);
-
-	return devlink_port_register(devlink, dl_port, dl_port_index);
+	devlink_port_register(devlink, dl_port, dl_port_index);
 }
 
 void mlx5e_devlink_port_type_eth_set(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.h
index 10b50feb9883..04ada8624367 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/devlink.h
@@ -7,7 +7,7 @@
 #include <net/devlink.h>
 #include "en.h"
 
-int mlx5e_devlink_port_register(struct mlx5e_priv *priv);
+void mlx5e_devlink_port_register(struct mlx5e_priv *priv);
 void mlx5e_devlink_port_unregister(struct mlx5e_priv *priv);
 void mlx5e_devlink_port_type_eth_set(struct mlx5e_priv *priv);
 struct devlink_port *mlx5e_get_devlink_port(struct net_device *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 65571593ec5c..385533dce1bb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5461,11 +5461,7 @@ static int mlx5e_probe(struct auxiliary_device *adev,
 	priv->profile = profile;
 	priv->ppriv = NULL;
 
-	err = mlx5e_devlink_port_register(priv);
-	if (err) {
-		mlx5_core_err(mdev, "mlx5e_devlink_port_register failed, %d\n", err);
-		goto err_destroy_netdev;
-	}
+	mlx5e_devlink_port_register(priv);
 
 	err = profile->init(mdev, netdev);
 	if (err) {
@@ -5497,7 +5493,6 @@ static int mlx5e_probe(struct auxiliary_device *adev,
 	profile->cleanup(priv);
 err_devlink_cleanup:
 	mlx5e_devlink_port_unregister(priv);
-err_destroy_netdev:
 	mlx5e_destroy_netdev(priv);
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index 7f9b96d9537e..d1550f661644 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -87,9 +87,7 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_
 
 	devlink = priv_to_devlink(dev);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, vport_num);
-	err = devlink_port_register(devlink, dl_port, dl_port_index);
-	if (err)
-		goto reg_err;
+	devlink_port_register(devlink, dl_port, dl_port_index);
 
 	err = devlink_rate_leaf_create(dl_port, vport);
 	if (err)
@@ -100,7 +98,6 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_
 
 rate_err:
 	devlink_port_unregister(dl_port);
-reg_err:
 	mlx5_esw_dl_port_free(dl_port);
 	return err;
 }
@@ -156,9 +153,7 @@ int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_p
 	devlink_port_attrs_pci_sf_set(dl_port, controller, pfnum, sfnum, !!controller);
 	devlink = priv_to_devlink(dev);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, vport_num);
-	err = devlink_port_register(devlink, dl_port, dl_port_index);
-	if (err)
-		return err;
+	devlink_port_register(devlink, dl_port, dl_port_index);
 
 	err = devlink_rate_leaf_create(dl_port, vport);
 	if (err)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 0d1f08bbf631..7684a86c1745 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2760,7 +2760,6 @@ static int __mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
 					&mlxsw_core->ports[local_port];
 	struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
 	struct devlink_port_attrs attrs = {};
-	int err;
 
 	attrs.split = split;
 	attrs.lanes = lanes;
@@ -2772,10 +2771,8 @@ static int __mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
 	attrs.switch_id.id_len = switch_id_len;
 	mlxsw_core_port->local_port = local_port;
 	devlink_port_attrs_set(devlink_port, &attrs);
-	err = devlink_port_register(devlink, devlink_port, local_port);
-	if (err)
-		memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
-	return err;
+	devlink_port_register(devlink, devlink_port, local_port);
+	return 0;
 }
 
 static void __mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port)
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index 0fcf359a6975..fc534b7c3a96 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -172,8 +172,8 @@ int ocelot_port_devlink_init(struct ocelot *ocelot, int port,
 	attrs.flavour = flavour;
 
 	devlink_port_attrs_set(dlp, &attrs);
-
-	return devlink_port_register(dl, dlp, port);
+	devlink_port_register(dl, dlp, port);
+	return 0;
 }
 
 void ocelot_port_devlink_teardown(struct ocelot *ocelot, int port)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index bea978df7713..ed53462ac9c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -374,8 +374,8 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
 	devlink_port_attrs_set(&port->dl_port, &attrs);
 
 	devlink = priv_to_devlink(app->pf);
-
-	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
+	devlink_port_register(devlink, &port->dl_port, port->eth_id);
+	return 0;
 }
 
 void nfp_devlink_port_unregister(struct nfp_port *port)
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
index 4297ed9024c0..f8c80856ca02 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.c
@@ -80,16 +80,10 @@ int ionic_devlink_register(struct ionic *ionic)
 {
 	struct devlink *dl = priv_to_devlink(ionic);
 	struct devlink_port_attrs attrs = {};
-	int err;
 
 	attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
 	devlink_port_attrs_set(&ionic->dl_port, &attrs);
-	err = devlink_port_register(dl, &ionic->dl_port, 0);
-	if (err) {
-		dev_err(ionic->dev, "devlink_port_register failed: %d\n", err);
-		return err;
-	}
-
+	devlink_port_register(dl, &ionic->dl_port, 0);
 	devlink_port_type_eth_set(&ionic->dl_port, ionic->lif->netdev);
 	devlink_register(dl);
 	return 0;
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index c092cb61416a..2e3cc08da998 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -2453,24 +2453,12 @@ static int am65_cpsw_nuss_register_devlink(struct am65_cpsw_common *common)
 		memcpy(attrs.switch_id.id, common->switch_id, attrs.switch_id.id_len);
 		devlink_port_attrs_set(dl_port, &attrs);
 
-		ret = devlink_port_register(common->devlink, dl_port, port->port_id);
-		if (ret) {
-			dev_err(dev, "devlink_port reg fail for port %d, ret:%d\n",
-				port->port_id, ret);
-			goto dl_port_unreg;
-		}
+		devlink_port_register(common->devlink, dl_port, port->port_id);
 		devlink_port_type_eth_set(dl_port, port->ndev);
 	}
 	devlink_register(common->devlink);
 	return ret;
 
-dl_port_unreg:
-	for (i = i - 1; i >= 1; i--) {
-		port = am65_common_get_port(common, i);
-		dl_port = &port->devlink_port;
-
-		devlink_port_unregister(dl_port);
-	}
 dl_unreg:
 	devlink_free(common->devlink);
 	return ret;
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 08d7b465a0de..1f0c085d8f76 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1380,10 +1380,8 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev, enum nsim_dev_port_typ
 	memcpy(attrs.switch_id.id, nsim_dev->switch_id.id, nsim_dev->switch_id.id_len);
 	attrs.switch_id.id_len = nsim_dev->switch_id.id_len;
 	devlink_port_attrs_set(devlink_port, &attrs);
-	err = devlink_port_register(priv_to_devlink(nsim_dev), devlink_port,
-				    nsim_dev_port->port_index);
-	if (err)
-		goto err_port_free;
+	devlink_port_register(priv_to_devlink(nsim_dev), devlink_port,
+			      nsim_dev_port->port_index);
 
 	err = nsim_dev_port_debugfs_init(nsim_dev, nsim_dev_port);
 	if (err)
@@ -1413,7 +1411,6 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev, enum nsim_dev_port_typ
 	nsim_dev_port_debugfs_exit(nsim_dev_port);
 err_dl_port_unregister:
 	devlink_port_unregister(devlink_port);
-err_port_free:
 	kfree(nsim_dev_port);
 	return err;
 }
diff --git a/include/net/devlink.h b/include/net/devlink.h
index f919545f5781..bc2bdff043a3 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1511,9 +1511,9 @@ void devlink_set_features(struct devlink *devlink, u64 features);
 void devlink_register(struct devlink *devlink);
 void devlink_unregister(struct devlink *devlink);
 void devlink_free(struct devlink *devlink);
-int devlink_port_register(struct devlink *devlink,
-			  struct devlink_port *devlink_port,
-			  unsigned int port_index);
+void devlink_port_register(struct devlink *devlink,
+			   struct devlink_port *devlink_port,
+			   unsigned int port_index);
 void devlink_port_unregister(struct devlink_port *devlink_port);
 void devlink_port_type_eth_set(struct devlink_port *devlink_port,
 			       struct net_device *netdev);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index af94015b7424..356057ea2e52 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -239,12 +239,6 @@ static struct devlink_port *devlink_port_get_by_index(struct devlink *devlink,
 	return NULL;
 }
 
-static bool devlink_port_index_exists(struct devlink *devlink,
-				      unsigned int port_index)
-{
-	return devlink_port_get_by_index(devlink, port_index);
-}
-
 static struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink,
 							struct nlattr **attrs)
 {
@@ -9203,30 +9197,28 @@ static void devlink_port_type_warn_cancel(struct devlink_port *devlink_port)
  *	Note that the caller should take care of zeroing the devlink_port
  *	structure.
  */
-int devlink_port_register(struct devlink *devlink,
-			  struct devlink_port *devlink_port,
-			  unsigned int port_index)
+void devlink_port_register(struct devlink *devlink,
+			   struct devlink_port *devlink_port,
+			   unsigned int port_index)
 {
-	mutex_lock(&devlink->lock);
-	if (devlink_port_index_exists(devlink, port_index)) {
-		mutex_unlock(&devlink->lock);
-		return -EEXIST;
-	}
-
+	WARN_ON(devlink_port_get_by_index(devlink, port_index));
 	WARN_ON(devlink_port->devlink);
+
 	devlink_port->devlink = devlink;
 	devlink_port->index = port_index;
 	spin_lock_init(&devlink_port->type_lock);
 	INIT_LIST_HEAD(&devlink_port->reporter_list);
 	mutex_init(&devlink_port->reporters_lock);
-	list_add_tail(&devlink_port->list, &devlink->port_list);
 	INIT_LIST_HEAD(&devlink_port->param_list);
 	INIT_LIST_HEAD(&devlink_port->region_list);
-	mutex_unlock(&devlink->lock);
 	INIT_DELAYED_WORK(&devlink_port->type_warn_dw, &devlink_port_type_warn);
-	devlink_port_type_warn_schedule(devlink_port);
+
+	mutex_lock(&devlink->lock);
+	list_add_tail(&devlink_port->list, &devlink->port_list);
+	mutex_unlock(&devlink->lock);
+
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
-	return 0;
+	devlink_port_type_warn_schedule(devlink_port);
 }
 EXPORT_SYMBOL_GPL(devlink_port_register);
 
@@ -9241,9 +9233,11 @@ void devlink_port_unregister(struct devlink_port *devlink_port)
 
 	devlink_port_type_warn_cancel(devlink_port);
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
+
 	mutex_lock(&devlink->lock);
 	list_del(&devlink_port->list);
 	mutex_unlock(&devlink->lock);
+
 	WARN_ON(!list_empty(&devlink_port->reporter_list));
 	WARN_ON(!list_empty(&devlink_port->region_list));
 	mutex_destroy(&devlink_port->reporters_lock);
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 826957b6442b..29f39f761e05 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -504,7 +504,6 @@ static int dsa_port_devlink_setup(struct dsa_port *dp)
 	struct devlink *dl = dp->ds->devlink;
 	const unsigned char *id;
 	unsigned char len;
-	int err;
 
 	id = (const unsigned char *)&dst->index;
 	len = sizeof(dst->index);
@@ -530,12 +529,10 @@ static int dsa_port_devlink_setup(struct dsa_port *dp)
 	}
 
 	devlink_port_attrs_set(dlp, &attrs);
-	err = devlink_port_register(dl, dlp, dp->index);
-
-	if (!err)
-		dp->devlink_port_setup = true;
+	devlink_port_register(dl, dlp, dp->index);
+	dp->devlink_port_setup = true;
 
-	return err;
+	return 0;
 }
 
 static void dsa_port_teardown(struct dsa_port *dp)
-- 
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-17 18:26 [PATCH net-next 0/6] Devlink cleanups Leon Romanovsky
                   ` (3 preceding siblings ...)
  2021-11-17 18:26 ` [PATCH net-next 4/6] devlink: Clean registration of devlink port Leon Romanovsky
@ 2021-11-17 18:26 ` Leon Romanovsky
  2021-11-18  4:49   ` Jakub Kicinski
  2021-11-17 18:26 ` [PATCH net-next 6/6] devlink: Inline sb related functions Leon Romanovsky
  5 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-17 18:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

From: Leon Romanovsky <leonro@nvidia.com>

The devlink->lock doesn't need to be held during whole execution of
function, but only when recursive path walk and list addition are
performed. So reshuffle the resource registration logic to allocate
resource and configure it without lock.

As part of this change, complain more louder if driver authors used
already existed resource_id. It is performed outside of the locks as
drivers were supposed to provide unique IDs and such error can't happen.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/core/devlink.c | 30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 356057ea2e52..1dda313d6d1b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -9833,17 +9833,9 @@ int devlink_resource_register(struct devlink *devlink,
 {
 	struct devlink_resource *resource;
 	struct list_head *resource_list;
-	bool top_hierarchy;
 	int err = 0;
 
-	top_hierarchy = parent_resource_id == DEVLINK_RESOURCE_ID_PARENT_TOP;
-
-	mutex_lock(&devlink->lock);
-	resource = devlink_resource_find(devlink, NULL, resource_id);
-	if (resource) {
-		err = -EINVAL;
-		goto out;
-	}
+	WARN_ON(devlink_resource_find(devlink, NULL, resource_id));
 
 	resource = kzalloc(sizeof(*resource), GFP_KERNEL);
 	if (!resource) {
@@ -9851,7 +9843,17 @@ int devlink_resource_register(struct devlink *devlink,
 		goto out;
 	}
 
-	if (top_hierarchy) {
+	resource->name = resource_name;
+	resource->size = resource_size;
+	resource->size_new = resource_size;
+	resource->id = resource_id;
+	resource->size_valid = true;
+	memcpy(&resource->size_params, size_params,
+	       sizeof(resource->size_params));
+	INIT_LIST_HEAD(&resource->resource_list);
+
+	mutex_lock(&devlink->lock);
+	if (parent_resource_id == DEVLINK_RESOURCE_ID_PARENT_TOP) {
 		resource_list = &devlink->resource_list;
 	} else {
 		struct devlink_resource *parent_resource;
@@ -9868,14 +9870,6 @@ int devlink_resource_register(struct devlink *devlink,
 		}
 	}
 
-	resource->name = resource_name;
-	resource->size = resource_size;
-	resource->size_new = resource_size;
-	resource->id = resource_id;
-	resource->size_valid = true;
-	memcpy(&resource->size_params, size_params,
-	       sizeof(resource->size_params));
-	INIT_LIST_HEAD(&resource->resource_list);
 	list_add_tail(&resource->list, resource_list);
 out:
 	mutex_unlock(&devlink->lock);
-- 
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next 6/6] devlink: Inline sb related functions
  2021-11-17 18:26 [PATCH net-next 0/6] Devlink cleanups Leon Romanovsky
                   ` (4 preceding siblings ...)
  2021-11-17 18:26 ` [PATCH net-next 5/6] devlink: Reshuffle resource registration logic Leon Romanovsky
@ 2021-11-17 18:26 ` Leon Romanovsky
  5 siblings, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-17 18:26 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: Leon Romanovsky, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

From: Leon Romanovsky <leonro@nvidia.com>

Remove useless indirection of sb related functions, which called only
once and do nothing except accessing specific struct field.

As part of this cleanup, properly report an programming erro if already
existing sb index was supplied during SB registration.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/core/devlink.c | 110 ++++++++++++++-------------------------------
 1 file changed, 33 insertions(+), 77 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 1dda313d6d1b..fb5d3ba331f8 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -361,12 +361,6 @@ static struct devlink_sb *devlink_sb_get_by_index(struct devlink *devlink,
 	return NULL;
 }
 
-static bool devlink_sb_index_exists(struct devlink *devlink,
-				    unsigned int sb_index)
-{
-	return devlink_sb_get_by_index(devlink, sb_index);
-}
-
 static struct devlink_sb *devlink_sb_get_from_attrs(struct devlink *devlink,
 						    struct nlattr **attrs)
 {
@@ -382,16 +376,11 @@ static struct devlink_sb *devlink_sb_get_from_attrs(struct devlink *devlink,
 	return ERR_PTR(-EINVAL);
 }
 
-static struct devlink_sb *devlink_sb_get_from_info(struct devlink *devlink,
-						   struct genl_info *info)
-{
-	return devlink_sb_get_from_attrs(devlink, info->attrs);
-}
-
-static int devlink_sb_pool_index_get_from_attrs(struct devlink_sb *devlink_sb,
-						struct nlattr **attrs,
-						u16 *p_pool_index)
+static int devlink_sb_pool_index_get_from_info(struct devlink_sb *devlink_sb,
+					       struct genl_info *info,
+					       u16 *p_pool_index)
 {
+	struct nlattr **attrs = info->attrs;
 	u16 val;
 
 	if (!attrs[DEVLINK_ATTR_SB_POOL_INDEX])
@@ -404,18 +393,11 @@ static int devlink_sb_pool_index_get_from_attrs(struct devlink_sb *devlink_sb,
 	return 0;
 }
 
-static int devlink_sb_pool_index_get_from_info(struct devlink_sb *devlink_sb,
-					       struct genl_info *info,
-					       u16 *p_pool_index)
-{
-	return devlink_sb_pool_index_get_from_attrs(devlink_sb, info->attrs,
-						    p_pool_index);
-}
-
 static int
-devlink_sb_pool_type_get_from_attrs(struct nlattr **attrs,
-				    enum devlink_sb_pool_type *p_pool_type)
+devlink_sb_pool_type_get_from_info(struct genl_info *info,
+				   enum devlink_sb_pool_type *p_pool_type)
 {
+	struct nlattr **attrs = info->attrs;
 	u8 val;
 
 	if (!attrs[DEVLINK_ATTR_SB_POOL_TYPE])
@@ -430,16 +412,10 @@ devlink_sb_pool_type_get_from_attrs(struct nlattr **attrs,
 }
 
 static int
-devlink_sb_pool_type_get_from_info(struct genl_info *info,
-				   enum devlink_sb_pool_type *p_pool_type)
-{
-	return devlink_sb_pool_type_get_from_attrs(info->attrs, p_pool_type);
-}
-
-static int
-devlink_sb_th_type_get_from_attrs(struct nlattr **attrs,
-				  enum devlink_sb_threshold_type *p_th_type)
+devlink_sb_th_type_get_from_info(struct genl_info *info,
+				 enum devlink_sb_threshold_type *p_th_type)
 {
+	struct nlattr **attrs = info->attrs;
 	u8 val;
 
 	if (!attrs[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE])
@@ -454,18 +430,12 @@ devlink_sb_th_type_get_from_attrs(struct nlattr **attrs,
 }
 
 static int
-devlink_sb_th_type_get_from_info(struct genl_info *info,
-				 enum devlink_sb_threshold_type *p_th_type)
-{
-	return devlink_sb_th_type_get_from_attrs(info->attrs, p_th_type);
-}
-
-static int
-devlink_sb_tc_index_get_from_attrs(struct devlink_sb *devlink_sb,
-				   struct nlattr **attrs,
-				   enum devlink_sb_pool_type pool_type,
-				   u16 *p_tc_index)
+devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
+				  struct genl_info *info,
+				  enum devlink_sb_pool_type pool_type,
+				  u16 *p_tc_index)
 {
+	struct nlattr **attrs = info->attrs;
 	u16 val;
 
 	if (!attrs[DEVLINK_ATTR_SB_TC_INDEX])
@@ -482,16 +452,6 @@ devlink_sb_tc_index_get_from_attrs(struct devlink_sb *devlink_sb,
 	return 0;
 }
 
-static int
-devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
-				  struct genl_info *info,
-				  enum devlink_sb_pool_type pool_type,
-				  u16 *p_tc_index)
-{
-	return devlink_sb_tc_index_get_from_attrs(devlink_sb, info->attrs,
-						  pool_type, p_tc_index);
-}
-
 struct devlink_region {
 	struct devlink *devlink;
 	struct devlink_port *port;
@@ -1972,7 +1932,7 @@ static int devlink_nl_cmd_sb_get_doit(struct sk_buff *skb,
 	struct sk_buff *msg;
 	int err;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2090,7 +2050,7 @@ static int devlink_nl_cmd_sb_pool_get_doit(struct sk_buff *skb,
 	u16 pool_index;
 	int err;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2214,7 +2174,7 @@ static int devlink_nl_cmd_sb_pool_set_doit(struct sk_buff *skb,
 	u32 size;
 	int err;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2305,7 +2265,7 @@ static int devlink_nl_cmd_sb_port_pool_get_doit(struct sk_buff *skb,
 	u16 pool_index;
 	int err;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2435,7 +2395,7 @@ static int devlink_nl_cmd_sb_port_pool_set_doit(struct sk_buff *skb,
 	u32 threshold;
 	int err;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2528,7 +2488,7 @@ static int devlink_nl_cmd_sb_tc_pool_bind_get_doit(struct sk_buff *skb,
 	u16 tc_index;
 	int err;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2689,7 +2649,7 @@ static int devlink_nl_cmd_sb_tc_pool_bind_set_doit(struct sk_buff *skb,
 	u32 threshold;
 	int err;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2723,7 +2683,7 @@ static int devlink_nl_cmd_sb_occ_snapshot_doit(struct sk_buff *skb,
 	const struct devlink_ops *ops = devlink->ops;
 	struct devlink_sb *devlink_sb;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -2739,7 +2699,7 @@ static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb,
 	const struct devlink_ops *ops = devlink->ops;
 	struct devlink_sb *devlink_sb;
 
-	devlink_sb = devlink_sb_get_from_info(devlink, info);
+	devlink_sb = devlink_sb_get_from_attrs(devlink, info->attrs);
 	if (IS_ERR(devlink_sb))
 		return PTR_ERR(devlink_sb);
 
@@ -9636,29 +9596,24 @@ int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u16 egress_tc_count)
 {
 	struct devlink_sb *devlink_sb;
-	int err = 0;
 
-	mutex_lock(&devlink->lock);
-	if (devlink_sb_index_exists(devlink, sb_index)) {
-		err = -EEXIST;
-		goto unlock;
-	}
+	WARN_ON(devlink_sb_get_by_index(devlink, sb_index));
 
 	devlink_sb = kzalloc(sizeof(*devlink_sb), GFP_KERNEL);
-	if (!devlink_sb) {
-		err = -ENOMEM;
-		goto unlock;
-	}
+	if (!devlink_sb)
+		return -ENOMEM;
+
 	devlink_sb->index = sb_index;
 	devlink_sb->size = size;
 	devlink_sb->ingress_pools_count = ingress_pools_count;
 	devlink_sb->egress_pools_count = egress_pools_count;
 	devlink_sb->ingress_tc_count = ingress_tc_count;
 	devlink_sb->egress_tc_count = egress_tc_count;
+
+	mutex_lock(&devlink->lock);
 	list_add_tail(&devlink_sb->list, &devlink->sb_list);
-unlock:
 	mutex_unlock(&devlink->lock);
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(devlink_sb_register);
 
@@ -9666,9 +9621,10 @@ void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index)
 {
 	struct devlink_sb *devlink_sb;
 
-	mutex_lock(&devlink->lock);
 	devlink_sb = devlink_sb_get_by_index(devlink, sb_index);
 	WARN_ON(!devlink_sb);
+
+	mutex_lock(&devlink->lock);
 	list_del(&devlink_sb->list);
 	mutex_unlock(&devlink->lock);
 	kfree(devlink_sb);
-- 
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 4/6] devlink: Clean registration of devlink port
  2021-11-17 18:26 ` [PATCH net-next 4/6] devlink: Clean registration of devlink port Leon Romanovsky
@ 2021-11-18  4:49   ` Jakub Kicinski
  2021-11-18  7:32     ` Leon Romanovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-11-18  4:49 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Leon Romanovsky, Alexandre Belloni,
	Andrew Lunn, Aya Levin, Claudiu Manoil, drivers,
	Florian Fainelli, Ido Schimmel, intel-wired-lan, Ioana Ciornei,
	Jesse Brandeburg, Jiri Pirko, linux-kernel, linux-rdma,
	Michael Chan, netdev, oss-drivers, Saeed Mahameed,
	Shannon Nelson, Simon Horman, Taras Chornyi, Tariq Toukan,
	Tony Nguyen, UNGLinuxDriver, Vivien Didelot, Vladimir Oltean

On Wed, 17 Nov 2021 20:26:20 +0200 Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> devlink_port_register() is in-kernel API and as such can't really fail
> as long as driver author didn't make a mistake by providing already existing
> port index. Instead of relying on various error prints from the driver,
> convert the existence check to be WARN_ON(), so such a mistake will be
> caught easier.
> 
> As an outcome of this conversion, it was made clear that this function
> should be void and devlink->lock was intended to protect addition to
> port_list.

Leave this error checking in please.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-17 18:26 ` [PATCH net-next 5/6] devlink: Reshuffle resource registration logic Leon Romanovsky
@ 2021-11-18  4:49   ` Jakub Kicinski
  2021-11-18  7:50     ` Leon Romanovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-11-18  4:49 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Leon Romanovsky, Alexandre Belloni,
	Andrew Lunn, Aya Levin, Claudiu Manoil, drivers,
	Florian Fainelli, Ido Schimmel, intel-wired-lan, Ioana Ciornei,
	Jesse Brandeburg, Jiri Pirko, linux-kernel, linux-rdma,
	Michael Chan, netdev, oss-drivers, Saeed Mahameed,
	Shannon Nelson, Simon Horman, Taras Chornyi, Tariq Toukan,
	Tony Nguyen, UNGLinuxDriver, Vivien Didelot, Vladimir Oltean

On Wed, 17 Nov 2021 20:26:21 +0200 Leon Romanovsky wrote:
> -	top_hierarchy = parent_resource_id == DEVLINK_RESOURCE_ID_PARENT_TOP;
> -
> -	mutex_lock(&devlink->lock);
> -	resource = devlink_resource_find(devlink, NULL, resource_id);
> -	if (resource) {
> -		err = -EINVAL;
> -		goto out;
> -	}
> +	WARN_ON(devlink_resource_find(devlink, NULL, resource_id));

This is not atomic with the add now.

>  	resource = kzalloc(sizeof(*resource), GFP_KERNEL);
>  	if (!resource) {
> @@ -9851,7 +9843,17 @@ int devlink_resource_register(struct devlink *devlink,
>  		goto out;
>  	}
>  
> -	if (top_hierarchy) {
> +	resource->name = resource_name;
> +	resource->size = resource_size;
> +	resource->size_new = resource_size;
> +	resource->id = resource_id;
> +	resource->size_valid = true;
> +	memcpy(&resource->size_params, size_params,
> +	       sizeof(resource->size_params));
> +	INIT_LIST_HEAD(&resource->resource_list);
> +
> +	mutex_lock(&devlink->lock);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 4/6] devlink: Clean registration of devlink port
  2021-11-18  4:49   ` Jakub Kicinski
@ 2021-11-18  7:32     ` Leon Romanovsky
  0 siblings, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-18  7:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Wed, Nov 17, 2021 at 08:49:29PM -0800, Jakub Kicinski wrote:
> On Wed, 17 Nov 2021 20:26:20 +0200 Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > devlink_port_register() is in-kernel API and as such can't really fail
> > as long as driver author didn't make a mistake by providing already existing
> > port index. Instead of relying on various error prints from the driver,
> > convert the existence check to be WARN_ON(), so such a mistake will be
> > caught easier.
> > 
> > As an outcome of this conversion, it was made clear that this function
> > should be void and devlink->lock was intended to protect addition to
> > port_list.
> 
> Leave this error checking in please.

Are you referring to error checks in the drivers or the below section
from devlink_port_register()?

       mutex_lock(&devlink->lock);
       if (devlink_port_index_exists(devlink, port_index)) {
               mutex_unlock(&devlink->lock);
               return -EEXIST;
       }

Because if it is latter, any driver (I didn't find any) that will rely
on this -EEXIST field should have some sort of locking in top level.
Otherwise nothing will prevent from doing port unregister right
before "return --EXEEXIST".

So change to WARN_ON() will be much more effective in finding wrong
drivers, because they manage port_index and not devlink.

And because this function can't fail, the drivers have a plenty of dead
code.

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-18  4:49   ` Jakub Kicinski
@ 2021-11-18  7:50     ` Leon Romanovsky
  2021-11-19  1:48       ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-18  7:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Wed, Nov 17, 2021 at 08:49:56PM -0800, Jakub Kicinski wrote:
> On Wed, 17 Nov 2021 20:26:21 +0200 Leon Romanovsky wrote:
> > -	top_hierarchy = parent_resource_id == DEVLINK_RESOURCE_ID_PARENT_TOP;
> > -
> > -	mutex_lock(&devlink->lock);
> > -	resource = devlink_resource_find(devlink, NULL, resource_id);
> > -	if (resource) {
> > -		err = -EINVAL;
> > -		goto out;
> > -	}
> > +	WARN_ON(devlink_resource_find(devlink, NULL, resource_id));
> 
> This is not atomic with the add now.

And it shouldn't. devlink_resource_find() will return valid resource only
if there driver is completely bogus with races or incorrect allocations of
resource_id.

devlink_*_register(..)
 mutex_lock(&devlink->lock);
 if (devlink_*_find(...)) {
    mutex_unlock(&devlink->lock);
    return ....;
 }
 .....

It is almost always wrong from locking and layering perspective the pattern above,
as it is racy by definition if not protected by top layer.

There are exceptions from the rule above, but devlink is clearly not the
one of such exceptions.

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-18  7:50     ` Leon Romanovsky
@ 2021-11-19  1:48       ` Jakub Kicinski
  2021-11-19 15:38         ` Leon Romanovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-11-19  1:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Thu, 18 Nov 2021 09:50:20 +0200 Leon Romanovsky wrote:
> And it shouldn't. devlink_resource_find() will return valid resource only
> if there driver is completely bogus with races or incorrect allocations of
> resource_id.
> 
> devlink_*_register(..)
>  mutex_lock(&devlink->lock);
>  if (devlink_*_find(...)) {
>     mutex_unlock(&devlink->lock);
>     return ....;
>  }
>  .....
> 
> It is almost always wrong from locking and layering perspective the pattern above,
> as it is racy by definition if not protected by top layer.
> 
> There are exceptions from the rule above, but devlink is clearly not the
> one of such exceptions.

Just drop the unnecessary "cleanup" patches and limit the amount 
of driver code we'll have to revert if your approach fails.

I spent enough time going back and forth with you.

Please.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-19  1:48       ` Jakub Kicinski
@ 2021-11-19 15:38         ` Leon Romanovsky
  2021-11-19 16:10           ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-19 15:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Thu, Nov 18, 2021 at 05:48:13PM -0800, Jakub Kicinski wrote:
> On Thu, 18 Nov 2021 09:50:20 +0200 Leon Romanovsky wrote:
> > And it shouldn't. devlink_resource_find() will return valid resource only
> > if there driver is completely bogus with races or incorrect allocations of
> > resource_id.
> > 
> > devlink_*_register(..)
> >  mutex_lock(&devlink->lock);
> >  if (devlink_*_find(...)) {
> >     mutex_unlock(&devlink->lock);
> >     return ....;
> >  }
> >  .....
> > 
> > It is almost always wrong from locking and layering perspective the pattern above,
> > as it is racy by definition if not protected by top layer.
> > 
> > There are exceptions from the rule above, but devlink is clearly not the
> > one of such exceptions.
> 
> Just drop the unnecessary "cleanup" patches and limit the amount 
> of driver code we'll have to revert if your approach fails.

My approach works, exactly like it works in other subsystems.
https://lore.kernel.org/netdev/cover.1636390483.git.leonro@nvidia.com/

We are waiting to see your proposal extended to support parallel devlink
execution and to be applied to real drivers.
https://lore.kernel.org/netdev/20211030231254.2477599-1-kuba@kernel.org/

Anyway, you are maintainer, you want half work, you will get half work.

> 
> I spent enough time going back and forth with you.
> 
> Please.

Disagreements are hard for everyone, not only for you.

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-19 15:38         ` Leon Romanovsky
@ 2021-11-19 16:10           ` Jakub Kicinski
  2021-11-21  8:45             ` Leon Romanovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-11-19 16:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Fri, 19 Nov 2021 17:38:53 +0200 Leon Romanovsky wrote:
> On Thu, Nov 18, 2021 at 05:48:13PM -0800, Jakub Kicinski wrote:
> > On Thu, 18 Nov 2021 09:50:20 +0200 Leon Romanovsky wrote:  
> > > And it shouldn't. devlink_resource_find() will return valid resource only
> > > if there driver is completely bogus with races or incorrect allocations of
> > > resource_id.
> > > 
> > > devlink_*_register(..)
> > >  mutex_lock(&devlink->lock);
> > >  if (devlink_*_find(...)) {
> > >     mutex_unlock(&devlink->lock);
> > >     return ....;
> > >  }
> > >  .....
> > > 
> > > It is almost always wrong from locking and layering perspective the pattern above,
> > > as it is racy by definition if not protected by top layer.
> > > 
> > > There are exceptions from the rule above, but devlink is clearly not the
> > > one of such exceptions.  
> > 
> > Just drop the unnecessary "cleanup" patches and limit the amount 
> > of driver code we'll have to revert if your approach fails.  
> 
> My approach works, exactly like it works in other subsystems.
> https://lore.kernel.org/netdev/cover.1636390483.git.leonro@nvidia.com/

What "other subsystems"? I'm aware of the RFC version of these patches.

Breaking up the locks to to protect sub-objects only is fine for
protecting internal lists but now you can't guarantee that the object
exists when driver is called.

I'm sure you'll utter your unprovable "in real drivers.." but the fact
is my approach does not suffer from any such issues. Or depends on
drivers registering devlink last.

I can start passing a pointer to a devlink_port to split/unsplit
functions, which is a great improvement to the devlink driver API.

> We are waiting to see your proposal extended to support parallel devlink
> execution and to be applied to real drivers.
> https://lore.kernel.org/netdev/20211030231254.2477599-1-kuba@kernel.org/

The conversion to xarray you have done is a great improvement, I don't
disagree with the way you convert to allow parallel calls either.

I already told you that real drivers can be converted rather easily,
even if it's not really necessary.

But I'm giving you time to make your proposal. If I spend time
polishing my patches I'll be even more eager to put this behind me.

> Anyway, you are maintainer, you want half work, you will get half work.

What do you mean half work? You have a record of breaking things 
in the area and changing directions. How is my request to limit
unnecessary "cleanups" affecting drivers until the work is finished
not perfectly reasonable?!?!

> > I spent enough time going back and forth with you.
> 
> Disagreements are hard for everyone, not only for you.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-19 16:10           ` Jakub Kicinski
@ 2021-11-21  8:45             ` Leon Romanovsky
  2021-11-23  2:27               ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-21  8:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Fri, Nov 19, 2021 at 08:10:17AM -0800, Jakub Kicinski wrote:
> On Fri, 19 Nov 2021 17:38:53 +0200 Leon Romanovsky wrote:
> > On Thu, Nov 18, 2021 at 05:48:13PM -0800, Jakub Kicinski wrote:
> > > On Thu, 18 Nov 2021 09:50:20 +0200 Leon Romanovsky wrote:  
> > > > And it shouldn't. devlink_resource_find() will return valid resource only
> > > > if there driver is completely bogus with races or incorrect allocations of
> > > > resource_id.
> > > > 
> > > > devlink_*_register(..)
> > > >  mutex_lock(&devlink->lock);
> > > >  if (devlink_*_find(...)) {
> > > >     mutex_unlock(&devlink->lock);
> > > >     return ....;
> > > >  }
> > > >  .....
> > > > 
> > > > It is almost always wrong from locking and layering perspective the pattern above,
> > > > as it is racy by definition if not protected by top layer.
> > > > 
> > > > There are exceptions from the rule above, but devlink is clearly not the
> > > > one of such exceptions.  
> > > 
> > > Just drop the unnecessary "cleanup" patches and limit the amount 
> > > of driver code we'll have to revert if your approach fails.  
> > 
> > My approach works, exactly like it works in other subsystems.
> > https://lore.kernel.org/netdev/cover.1636390483.git.leonro@nvidia.com/
> 
> What "other subsystems"? I'm aware of the RFC version of these patches.

Approach to have fine-grained locking scheme, instead of having one big lock.
This was done in MM for mmap_sem, we did it for RDMA too.

> 
> Breaking up the locks to to protect sub-objects only is fine for
> protecting internal lists but now you can't guarantee that the object
> exists when driver is called.

I can only guess about which objects you are talking.

If you are talking about various devlink sub-objects (ports, traps,
e.t.c), they created by the drivers and as such should be managed by them.
Also they are connected to devlink which is guaranteed to exist. At the end,
they called to devlink_XXX->devlink pointer without any existence check.

If you are talking about devlink instance itself, we guarantee that it
exists between devlink_alloc() and devlink_free(). It seems to me pretty
reasonable request from drivers do not access devlink before devlink_alloc()
or after devlink_free(),

> 
> I'm sure you'll utter your unprovable "in real drivers.." but the fact
> is my approach does not suffer from any such issues. Or depends on
> drivers registering devlink last.

Registration of devlink doesn't do anything except opening it to the world.
The lifetime is controlled with alloc and free. My beloved sentence "in
real drivers ..." belongs to use of devlink_put and devlink_locks outside
of devlink.c and nothing more.

> 
> I can start passing a pointer to a devlink_port to split/unsplit
> functions, which is a great improvement to the devlink driver API.

You can do it with my approach too. We incremented reference counter
of devlink instance when devlink_nl_cmd_port_split_doit() was called,
and we can safely take devlink->port_list_lock lock before returning
from pre_doit.

> 
> > We are waiting to see your proposal extended to support parallel devlink
> > execution and to be applied to real drivers.
> > https://lore.kernel.org/netdev/20211030231254.2477599-1-kuba@kernel.org/
> 
> The conversion to xarray you have done is a great improvement, I don't
> disagree with the way you convert to allow parallel calls either.
> 
> I already told you that real drivers can be converted rather easily,
> even if it's not really necessary.
> 
> But I'm giving you time to make your proposal. If I spend time
> polishing my patches I'll be even more eager to put this behind me.

I see exposure of devlink internals to the driver as last resort, so I
stopped to make proposals after your responses:

"I prefer my version."
https://lore.kernel.org/netdev/20211108101646.0a4e5ca4@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/

"If by "fixed first" you mean it needs 5 locks to be added and to remove
any guarantees on sub-object lifetime then no thanks."
https://lore.kernel.org/netdev/20211108104608.378c106e@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/

> 
> > Anyway, you are maintainer, you want half work, you will get half work.
> 
> What do you mean half work? You have a record of breaking things 
> in the area and changing directions. How is my request to limit
> unnecessary "cleanups" affecting drivers until the work is finished
> not perfectly reasonable?!?!

I don't know what made you think so. My end goals (parallel execution
and safe devlink reload) and solutions didn't changes:
 * Devlink instance is safe to access by kernel between devlink_alloc() and devlink_free().
 * Devlink instance is visible for users between devlink_register() and devlink_unregister().
 * Locks should be fine-grained and limited.

By saying, half work, I mean that attempt to limit locks leave many
functions to be such that can't fail and as such should be void and not
"return 0".

And regarding "breaking things", I'm not doing it for fun, but with real
desire to improve kernel for everyone, not only for our driver.

> 
> > > I spent enough time going back and forth with you.
> > 
> > Disagreements are hard for everyone, not only for you.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-21  8:45             ` Leon Romanovsky
@ 2021-11-23  2:27               ` Jakub Kicinski
  2021-11-23  8:33                 ` Leon Romanovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-11-23  2:27 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Sun, 21 Nov 2021 10:45:12 +0200 Leon Romanovsky wrote:
> On Fri, Nov 19, 2021 at 08:10:17AM -0800, Jakub Kicinski wrote:
> > On Fri, 19 Nov 2021 17:38:53 +0200 Leon Romanovsky wrote:  
> > > My approach works, exactly like it works in other subsystems.
> > > https://lore.kernel.org/netdev/cover.1636390483.git.leonro@nvidia.com/  
> > 
> > What "other subsystems"? I'm aware of the RFC version of these patches.  
> 
> Approach to have fine-grained locking scheme, instead of having one big lock.
> This was done in MM for mmap_sem, we did it for RDMA too.

You're breaking things up to avoid lock ordering issues. The user can
still only run a single write command at a time.

> > Breaking up the locks to to protect sub-objects only is fine for
> > protecting internal lists but now you can't guarantee that the object
> > exists when driver is called.  
> 
> I can only guess about which objects you are talking.

It obviously refers to the port splitting I mentioned below.

> If you are talking about various devlink sub-objects (ports, traps,
> e.t.c), they created by the drivers and as such should be managed by them.
> Also they are connected to devlink which is guaranteed to exist. At the end,
> they called to devlink_XXX->devlink pointer without any existence check.
> 
> If you are talking about devlink instance itself, we guarantee that it
> exists between devlink_alloc() and devlink_free(). It seems to me pretty
> reasonable request from drivers do not access devlink before devlink_alloc()
> or after devlink_free(),
> 
> > I'm sure you'll utter your unprovable "in real drivers.." but the fact
> > is my approach does not suffer from any such issues. Or depends on
> > drivers registering devlink last.  
> 
> Registration of devlink doesn't do anything except opening it to the world.
> The lifetime is controlled with alloc and free. My beloved sentence "in
> real drivers ..." belongs to use of devlink_put and devlink_locks outside
> of devlink.c and nothing more.

As soon as there is a inter-dependency between two subsystems "must 
be last" breaks down.

> > I can start passing a pointer to a devlink_port to split/unsplit
> > functions, which is a great improvement to the devlink driver API.  
> 
> You can do it with my approach too. We incremented reference counter
> of devlink instance when devlink_nl_cmd_port_split_doit() was called,
> and we can safely take devlink->port_list_lock lock before returning
> from pre_doit.

Wait, I thought you'd hold devlink->lock around split/unsplit.

Please look at the port splitting case, mlx5 doesn't implement it
but it's an important feature.

Either way, IDK how ref count on devlink helps with lifetime of a
subobject. You must assume the sub-objects can only be created outside
of the time devlink instance is visible or under devlink->lock?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-23  2:27               ` Jakub Kicinski
@ 2021-11-23  8:33                 ` Leon Romanovsky
  2021-11-23 23:33                   ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-23  8:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Mon, Nov 22, 2021 at 06:27:28PM -0800, Jakub Kicinski wrote:
> On Sun, 21 Nov 2021 10:45:12 +0200 Leon Romanovsky wrote:
> > On Fri, Nov 19, 2021 at 08:10:17AM -0800, Jakub Kicinski wrote:
> > > On Fri, 19 Nov 2021 17:38:53 +0200 Leon Romanovsky wrote:  
> > > > My approach works, exactly like it works in other subsystems.
> > > > https://lore.kernel.org/netdev/cover.1636390483.git.leonro@nvidia.com/  
> > > 
> > > What "other subsystems"? I'm aware of the RFC version of these patches.  
> > 
> > Approach to have fine-grained locking scheme, instead of having one big lock.
> > This was done in MM for mmap_sem, we did it for RDMA too.
> 
> You're breaking things up to avoid lock ordering issues. The user can
> still only run a single write command at a time.
> 
> > > Breaking up the locks to to protect sub-objects only is fine for
> > > protecting internal lists but now you can't guarantee that the object
> > > exists when driver is called.  
> > 
> > I can only guess about which objects you are talking.
> 
> It obviously refers to the port splitting I mentioned below.
> 
> > If you are talking about various devlink sub-objects (ports, traps,
> > e.t.c), they created by the drivers and as such should be managed by them.
> > Also they are connected to devlink which is guaranteed to exist. At the end,
> > they called to devlink_XXX->devlink pointer without any existence check.
> > 
> > If you are talking about devlink instance itself, we guarantee that it
> > exists between devlink_alloc() and devlink_free(). It seems to me pretty
> > reasonable request from drivers do not access devlink before devlink_alloc()
> > or after devlink_free(),
> > 
> > > I'm sure you'll utter your unprovable "in real drivers.." but the fact
> > > is my approach does not suffer from any such issues. Or depends on
> > > drivers registering devlink last.  
> > 
> > Registration of devlink doesn't do anything except opening it to the world.
> > The lifetime is controlled with alloc and free. My beloved sentence "in
> > real drivers ..." belongs to use of devlink_put and devlink_locks outside
> > of devlink.c and nothing more.
> 
> As soon as there is a inter-dependency between two subsystems "must 
> be last" breaks down.

"Must be last" is better to be changed "when the device is ready".

-----------------------------------------------------------------
> The real question is whether you now require devlink_register()
> to go last in general? 

No, it is not requirement, but my suggestion. You need to be aware that
after call to devlink_register(), the device will be fully open for devlink
netlink access. So it is strongly advised to put devlink_register to be the
last command in PCI initialization sequence.

https://lore.kernel.org/netdev/YXhVd16heaHCegL1@unreal/
--------------------------------------------------------------------

> 
> > > I can start passing a pointer to a devlink_port to split/unsplit
> > > functions, which is a great improvement to the devlink driver API.  
> > 
> > You can do it with my approach too. We incremented reference counter
> > of devlink instance when devlink_nl_cmd_port_split_doit() was called,
> > and we can safely take devlink->port_list_lock lock before returning
> > from pre_doit.
> 
> Wait, I thought you'd hold devlink->lock around split/unsplit.

I'm holding.

    519 static int devlink_nl_pre_doit(const struct genl_ops *ops,
    520                                struct sk_buff *skb, struct genl_info *info)
    521 {
    ...
    529
    530         mutex_lock(&devlink->lock);


> 
> Please look at the port splitting case, mlx5 doesn't implement it
> but it's an important feature.

I'll, but please don't forget that it was RFC, just to present that
devlink can be changed internally without exposing internals.

> 
> Either way, IDK how ref count on devlink helps with lifetime of a
> subobject. You must assume the sub-objects can only be created outside
> of the time devlink instance is visible or under devlink->lock?

The devlink lifetime is:
stages:        I                   II                   III   
 devlink_alloc -> devlink_register -> devlink_unregister -> devlink_free.

All sub-objects should be created between devlink_alloc and devlink_free.
It will ensure that ->devlink pointer is always valid.

Stage I:
 * There is no need to hold any devlink locks or increase reference counter.
   If driver doesn't do anything crazy during its init, nothing in devlink
   land will run in parallel. 
Stage II:
 * There is a need to hold devlink->lock and/or play with reference counter
   and/or use fine-grained locks. Users can issue "devlink ..." commands.
Stage III:

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-23  8:33                 ` Leon Romanovsky
@ 2021-11-23 23:33                   ` Jakub Kicinski
  2021-11-25  9:02                     ` Leon Romanovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-11-23 23:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Tue, 23 Nov 2021 10:33:13 +0200 Leon Romanovsky wrote:
> > > You can do it with my approach too. We incremented reference counter
> > > of devlink instance when devlink_nl_cmd_port_split_doit() was called,
> > > and we can safely take devlink->port_list_lock lock before returning
> > > from pre_doit.  
> > 
> > Wait, I thought you'd hold devlink->lock around split/unsplit.  
> 
> I'm holding.
> 
>     519 static int devlink_nl_pre_doit(const struct genl_ops *ops,
>     520                                struct sk_buff *skb, struct genl_info *info)
>     521 {
>     ...
>     529
>     530         mutex_lock(&devlink->lock);

Then I'm confused why you said you need to hold a ref count on devlink.
Is it devlink_unregister() that's not taking devlink->lock?

> > Please look at the port splitting case, mlx5 doesn't implement it
> > but it's an important feature.  
> 
> I'll, but please don't forget that it was RFC, just to present that
> devlink can be changed internally without exposing internals.
> 
> > Either way, IDK how ref count on devlink helps with lifetime of a
> > subobject. You must assume the sub-objects can only be created outside
> > of the time devlink instance is visible or under devlink->lock?  
> 
> The devlink lifetime is:
> stages:        I                   II                   III   
>  devlink_alloc -> devlink_register -> devlink_unregister -> devlink_free.
> 
> All sub-objects should be created between devlink_alloc and devlink_free.
> It will ensure that ->devlink pointer is always valid.
> 
> Stage I:
>  * There is no need to hold any devlink locks or increase reference counter.
>    If driver doesn't do anything crazy during its init, nothing in devlink
>    land will run in parallel. 
> Stage II:
>  * There is a need to hold devlink->lock and/or play with reference counter
>    and/or use fine-grained locks. Users can issue "devlink ..." commands.

So sub-objects can (dis)appear only in I/III or under devlink->lock.
Why did you add the per-sub object list locks, then?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 5/6] devlink: Reshuffle resource registration logic
  2021-11-23 23:33                   ` Jakub Kicinski
@ 2021-11-25  9:02                     ` Leon Romanovsky
  0 siblings, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2021-11-25  9:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Alexandre Belloni, Andrew Lunn, Aya Levin,
	Claudiu Manoil, drivers, Florian Fainelli, Ido Schimmel,
	intel-wired-lan, Ioana Ciornei, Jesse Brandeburg, Jiri Pirko,
	linux-kernel, linux-rdma, Michael Chan, netdev, oss-drivers,
	Saeed Mahameed, Shannon Nelson, Simon Horman, Taras Chornyi,
	Tariq Toukan, Tony Nguyen, UNGLinuxDriver, Vivien Didelot,
	Vladimir Oltean

On Tue, Nov 23, 2021 at 03:33:12PM -0800, Jakub Kicinski wrote:
> On Tue, 23 Nov 2021 10:33:13 +0200 Leon Romanovsky wrote:
> > > > You can do it with my approach too. We incremented reference counter
> > > > of devlink instance when devlink_nl_cmd_port_split_doit() was called,
> > > > and we can safely take devlink->port_list_lock lock before returning
> > > > from pre_doit.  
> > > 
> > > Wait, I thought you'd hold devlink->lock around split/unsplit.  
> > 
> > I'm holding.
> > 
> >     519 static int devlink_nl_pre_doit(const struct genl_ops *ops,
> >     520                                struct sk_buff *skb, struct genl_info *info)
> >     521 {
> >     ...
> >     529
> >     530         mutex_lock(&devlink->lock);
> 
> Then I'm confused why you said you need to hold a ref count on devlink.

This was an example to your sentence "I can start passing a pointer
to a devlink_port to split/unsplit functions, which is a great improvement
to the devlink driver API."
https://lore.kernel.org/all/20211119081017.6676843b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/

In my view, it is complete over-engineering and not needed at all. In
current driver model, you can pass devlink_port pointer pretty safely
without worries that "->devlink" disappear.

> Is it devlink_unregister() that's not taking devlink->lock?

Maybe, but my rationale for devlink_get in my example was slightly different.
We need to use it when the ->devlink structure and sub-object are
managed completely independent with different lifetimes and sub-object
can over-live the devlink structure.

All devlink_*_register() calls require valid devlink structure, so as I
wrote above, devlink_get is not needed really needed.

However you used that example so many times that I started to fear that
I'm missing something very basic.

> 
> > > Please look at the port splitting case, mlx5 doesn't implement it
> > > but it's an important feature.  
> > 
> > I'll, but please don't forget that it was RFC, just to present that
> > devlink can be changed internally without exposing internals.
> > 
> > > Either way, IDK how ref count on devlink helps with lifetime of a
> > > subobject. You must assume the sub-objects can only be created outside
> > > of the time devlink instance is visible or under devlink->lock?  
> > 
> > The devlink lifetime is:
> > stages:        I                   II                   III   
> >  devlink_alloc -> devlink_register -> devlink_unregister -> devlink_free.
> > 
> > All sub-objects should be created between devlink_alloc and devlink_free.
> > It will ensure that ->devlink pointer is always valid.
> > 
> > Stage I:
> >  * There is no need to hold any devlink locks or increase reference counter.
> >    If driver doesn't do anything crazy during its init, nothing in devlink
> >    land will run in parallel. 
> > Stage II:
> >  * There is a need to hold devlink->lock and/or play with reference counter
> >    and/or use fine-grained locks. Users can issue "devlink ..." commands.
> 
> So sub-objects can (dis)appear only in I/III or under devlink->lock.
> Why did you add the per-sub object list locks, then?

There are number of reasons and not all of them are technical.

I wanted to do that, my initial plan was to cleanly separate user-visible
API vs. in-kernel API and use one lock or no locks at all.

But at some point of time, I recalculated my path, when I saw that
I'm failing to explain even simple devlink lifetime model, together
with warm feedback from the community and need to have this patch:

[RFC PATCH 14/16] devlink: Require devlink lock during device reload
https://lore.kernel.org/netdev/ad7f5f275bcda1ee058d7bd3020b7d85cd44b9f6.1636390483.git.leonro@nvidia.com/

That patch is super-important in the devlink_reload puzzle, it closes the hack
used in devlink_reload flow to allow to call to same devlink_*_register() calls
without taking devlink->lock, so they can take it. In order to do it, I
used list locks, because only for this that devlink->lock was needed in
these calls.

However, there is a way to avoid list locks. It can be achieved if we start
to manage devlink state machine (at least for reload) internally and add
something like that in devlink_*_register() calls:

 if (devlink->not_in_reload)
    mutex_lock(&devlink->lock);

It doesn't look nice, and invites immediate question: "why don't we
provide two APIs? locked and unlocked? Locked for reload, and unlocked
for all other parts". Unfortunately, this will require major changes in
the drivers and in offline conversation I was told "do whatever you need
in devlink as long as it doesn't require change in the driver, we want
same drver flow for probe and reload.".

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-11-25  9:04 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17 18:26 [PATCH net-next 0/6] Devlink cleanups Leon Romanovsky
2021-11-17 18:26 ` [PATCH net-next 1/6] devlink: Remove misleading internal_flags from health reporter dump Leon Romanovsky
2021-11-17 18:26 ` [PATCH net-next 2/6] devlink: Delete useless checks of holding devlink lock Leon Romanovsky
2021-11-17 18:26 ` [PATCH net-next 3/6] devlink: Simplify devlink resources unregister call Leon Romanovsky
2021-11-17 18:26 ` [PATCH net-next 4/6] devlink: Clean registration of devlink port Leon Romanovsky
2021-11-18  4:49   ` Jakub Kicinski
2021-11-18  7:32     ` Leon Romanovsky
2021-11-17 18:26 ` [PATCH net-next 5/6] devlink: Reshuffle resource registration logic Leon Romanovsky
2021-11-18  4:49   ` Jakub Kicinski
2021-11-18  7:50     ` Leon Romanovsky
2021-11-19  1:48       ` Jakub Kicinski
2021-11-19 15:38         ` Leon Romanovsky
2021-11-19 16:10           ` Jakub Kicinski
2021-11-21  8:45             ` Leon Romanovsky
2021-11-23  2:27               ` Jakub Kicinski
2021-11-23  8:33                 ` Leon Romanovsky
2021-11-23 23:33                   ` Jakub Kicinski
2021-11-25  9:02                     ` Leon Romanovsky
2021-11-17 18:26 ` [PATCH net-next 6/6] devlink: Inline sb related functions Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).