Netdev Archive on lore.kernel.org
* [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
@ 2021-08-10 13:37 Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 1/5] net: hns3: remove always exist devlink pointer check Leon Romanovsky
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Leon Romanovsky @ 2021-08-10 13:37 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: Leon Romanovsky, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
From: Leon Romanovsky <leonro@nvidia.com>
Hi Dave and Jakub,
This series prepares the code for removal of the
devlink_reload_enable/_disable API. To do so, it moves all
devlink_register() calls to be right before devlink_reload_enable().
The best place for such a call is right before returning from
probe().
This is done because devlink_register() opens devlink netlink to the
users and gives them an avenue to issue commands before initialization
is finished.
1. Some drivers were aware of such "functionality" and tried to protect
themselves with extra locks, state machines and devlink_reload_enable().
Let's assume that it worked for them, but I'm personally skeptical about
it.
2. Some drivers copied that pattern, but without locks and state
machines. That protected them from reload flows, but not from any _set_
routines.
3. And all other drivers simply didn't understand the implications of early
devlink_register() and can be seen as "broken".
In this series, we focus on items #1 and #2.
Please share your opinion on whether I should change ALL other drivers
to make sure that devlink_register() is the last command, or leave
them as is.
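To make the intended ordering concrete, here is a minimal userspace sketch of a probe routine that registers last and unwinds on failure. It is not kernel code: struct mock_dev and every mock_* function are hypothetical stand-ins invented for illustration, and the fail_register knob exists only to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver + devlink state; not the kernel's struct devlink. */
struct mock_dev {
	bool params_registered;
	bool hw_ready;
	bool registered;	/* visible to user space */
	bool reload_enabled;
	bool fail_register;	/* test knob: make the register step fail */
};

static int mock_params_register(struct mock_dev *d)
{
	d->params_registered = true;
	return 0;
}

static void mock_params_unregister(struct mock_dev *d)
{
	d->params_registered = false;
}

static int mock_hw_init(struct mock_dev *d)
{
	d->hw_ready = true;
	return 0;
}

static void mock_hw_fini(struct mock_dev *d)
{
	d->hw_ready = false;
}

static int mock_register(struct mock_dev *d)
{
	/* Registering last means everything else is already set up. */
	assert(d->params_registered && d->hw_ready);
	if (d->fail_register)
		return -ENOMEM;
	d->registered = true;
	return 0;
}

int mock_probe(struct mock_dev *d)
{
	int err;

	err = mock_params_register(d);
	if (err)
		return err;

	err = mock_hw_init(d);
	if (err)
		goto err_params_unregister;

	err = mock_register(d);	/* last step that can fail */
	if (err)
		goto err_hw_fini;

	d->reload_enabled = true;
	return 0;

err_hw_fini:
	mock_hw_fini(d);
err_params_unregister:
	mock_params_unregister(d);
	return err;
}
```

Calling mock_probe() on a zero-initialized struct mock_dev leaves it registered with reload enabled; with fail_register set, the unwind path clears every flag again, so user space never sees a half-initialized instance.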
Thanks
Leon Romanovsky (5):
net: hns3: remove always exist devlink pointer check
net/mlx4: Move devlink_register to be the last initialization command
mlxsw: core: Refactor code to publish devlink ops when device is ready
net/mlx5: Accept devlink user input after driver initialization
complete
netdevsim: Delay user access till probe is finished
.../hisilicon/hns3/hns3pf/hclge_devlink.c | 8 +---
.../hisilicon/hns3/hns3vf/hclgevf_devlink.c | 8 +---
drivers/net/ethernet/mellanox/mlx4/main.c | 38 +++++++++++++------
.../net/ethernet/mellanox/mlx5/core/devlink.c | 10 +----
.../net/ethernet/mellanox/mlx5/core/main.c | 13 ++++++-
.../mellanox/mlx5/core/sf/dev/driver.c | 12 +++++-
drivers/net/ethernet/mellanox/mlxsw/core.c | 27 +++++++------
drivers/net/netdevsim/dev.c | 19 +++++-----
8 files changed, 76 insertions(+), 59 deletions(-)
--
2.31.1
* [PATCH net-next 1/5] net: hns3: remove always exist devlink pointer check
From: Leon Romanovsky @ 2021-08-10 13:37 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: Leon Romanovsky, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
From: Leon Romanovsky <leonro@nvidia.com>
The devlink pointer always exists after hclge_devlink_init() succeeds.
Remove that check together with the NULL setting after release, and ensure
that devlink_register() is the last command prior to the call to
devlink_reload_enable().
Fixes: b741269b2759 ("net: hns3: add support for registering devlink for PF")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c | 8 +-------
.../net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c | 8 +-------
2 files changed, 2 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
index 448f29aa4e6b..e4aad695abcc 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
@@ -118,6 +118,7 @@ int hclge_devlink_init(struct hclge_dev *hdev)
priv = devlink_priv(devlink);
priv->hdev = hdev;
+ hdev->devlink = devlink;
ret = devlink_register(devlink);
if (ret) {
@@ -126,8 +127,6 @@ int hclge_devlink_init(struct hclge_dev *hdev)
goto out_reg_fail;
}
- hdev->devlink = devlink;
-
devlink_reload_enable(devlink);
return 0;
@@ -141,14 +140,9 @@ void hclge_devlink_uninit(struct hclge_dev *hdev)
{
struct devlink *devlink = hdev->devlink;
- if (!devlink)
- return;
-
devlink_reload_disable(devlink);
devlink_unregister(devlink);
devlink_free(devlink);
-
- hdev->devlink = NULL;
}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
index 1e6061fb8ed4..f478770299c6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
@@ -120,6 +120,7 @@ int hclgevf_devlink_init(struct hclgevf_dev *hdev)
priv = devlink_priv(devlink);
priv->hdev = hdev;
+ hdev->devlink = devlink;
ret = devlink_register(devlink);
if (ret) {
@@ -128,8 +129,6 @@ int hclgevf_devlink_init(struct hclgevf_dev *hdev)
goto out_reg_fail;
}
- hdev->devlink = devlink;
-
devlink_reload_enable(devlink);
return 0;
@@ -143,14 +142,9 @@ void hclgevf_devlink_uninit(struct hclgevf_dev *hdev)
{
struct devlink *devlink = hdev->devlink;
- if (!devlink)
- return;
-
devlink_reload_disable(devlink);
devlink_unregister(devlink);
devlink_free(devlink);
-
- hdev->devlink = NULL;
}
--
2.31.1
* [PATCH net-next 2/5] net/mlx4: Move devlink_register to be the last initialization command
From: Leon Romanovsky @ 2021-08-10 13:37 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: Leon Romanovsky, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
From: Leon Romanovsky <leonro@nvidia.com>
Refactor the code to make sure that devlink_register() is the last
command during the initialization stage.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 38 ++++++++++++++++-------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 7267c6c6d2e2..7005c32195a3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3996,6 +3996,8 @@ static const struct devlink_ops mlx4_devlink_ops = {
.reload_up = mlx4_devlink_reload_up,
};
+static void _mlx4_remove_one(struct pci_dev *pdev);
+
static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct devlink *devlink;
@@ -4024,28 +4026,29 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
mutex_init(&dev->persist->interface_state_mutex);
mutex_init(&dev->persist->pci_status_mutex);
- ret = devlink_register(devlink);
- if (ret)
- goto err_persist_free;
ret = devlink_params_register(devlink, mlx4_devlink_params,
ARRAY_SIZE(mlx4_devlink_params));
if (ret)
- goto err_devlink_unregister;
+ goto err_persist_free;
mlx4_devlink_set_params_init_values(devlink);
ret = __mlx4_init_one(pdev, id->driver_data, priv);
if (ret)
goto err_params_unregister;
devlink_params_publish(devlink);
- devlink_reload_enable(devlink);
pci_save_state(pdev);
+
+ ret = devlink_register(devlink);
+ if (ret) {
+ _mlx4_remove_one(pdev);
+ return ret;
+ }
+ devlink_reload_enable(devlink);
return 0;
err_params_unregister:
devlink_params_unregister(devlink, mlx4_devlink_params,
ARRAY_SIZE(mlx4_devlink_params));
-err_devlink_unregister:
- devlink_unregister(devlink);
err_persist_free:
kfree(dev->persist);
err_devlink_free:
@@ -4141,7 +4144,7 @@ static void mlx4_unload_one(struct pci_dev *pdev)
priv->removed = 1;
}
-static void mlx4_remove_one(struct pci_dev *pdev)
+static void _mlx4_remove_one(struct pci_dev *pdev)
{
struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
struct mlx4_dev *dev = persist->dev;
@@ -4149,8 +4152,6 @@ static void mlx4_remove_one(struct pci_dev *pdev)
struct devlink *devlink = priv_to_devlink(priv);
int active_vfs = 0;
- devlink_reload_disable(devlink);
-
if (mlx4_is_slave(dev))
persist->interface_state |= MLX4_INTERFACE_STATE_NOWAIT;
@@ -4185,11 +4186,26 @@ static void mlx4_remove_one(struct pci_dev *pdev)
mlx4_pci_disable_device(dev);
devlink_params_unregister(devlink, mlx4_devlink_params,
ARRAY_SIZE(mlx4_devlink_params));
- devlink_unregister(devlink);
kfree(dev->persist);
devlink_free(devlink);
}
+static void mlx4_remove_one(struct pci_dev *pdev)
+{
+ struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
+ struct devlink *devlink;
+ struct mlx4_priv *priv;
+ struct mlx4_dev *dev;
+
+ dev = persist->dev;
+ priv = mlx4_priv(dev);
+ devlink = priv_to_devlink(priv);
+
+ devlink_reload_disable(devlink);
+ devlink_unregister(devlink);
+ _mlx4_remove_one(pdev);
+}
+
static int restore_current_port_types(struct mlx4_dev *dev,
enum mlx4_port_type *types,
enum mlx4_port_type *poss_types)
--
2.31.1
* [PATCH net-next 3/5] mlxsw: core: Refactor code to publish devlink ops when device is ready
From: Leon Romanovsky @ 2021-08-10 13:37 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: Leon Romanovsky, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
From: Leon Romanovsky <leonro@nvidia.com>
Move devlink_register() to be the last command, after the device is
fully initialized.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 27 +++++++++++-----------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index f080fab3de2b..a8a989070aaf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1974,12 +1974,6 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
if (err)
goto err_emad_init;
- if (!reload) {
- err = devlink_register(devlink);
- if (err)
- goto err_devlink_register;
- }
-
if (!reload) {
err = mlxsw_core_params_register(mlxsw_core);
if (err)
@@ -2017,11 +2011,20 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
mlxsw_core->is_initialized = true;
devlink_params_publish(devlink);
- if (!reload)
+ if (!reload) {
+ err = devlink_register(devlink);
+ if (err)
+ goto err_devlink_register;
+
devlink_reload_enable(devlink);
+ }
return 0;
+err_devlink_register:
+ devlink_params_unpublish(devlink);
+ mlxsw_core->is_initialized = false;
+ mlxsw_env_fini(mlxsw_core->env);
err_env_init:
mlxsw_thermal_fini(mlxsw_core->thermal);
err_thermal_init:
@@ -2036,9 +2039,6 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
if (!reload)
mlxsw_core_params_unregister(mlxsw_core);
err_register_params:
- if (!reload)
- devlink_unregister(devlink);
-err_devlink_register:
mlxsw_emad_fini(mlxsw_core);
err_emad_init:
kfree(mlxsw_core->lag.mapping);
@@ -2087,8 +2087,10 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
{
struct devlink *devlink = priv_to_devlink(mlxsw_core);
- if (!reload)
+ if (!reload) {
devlink_reload_disable(devlink);
+ devlink_unregister(devlink);
+ }
if (devlink_is_reload_failed(devlink)) {
if (!reload)
/* Only the parts that were not de-initialized in the
@@ -2109,8 +2111,6 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
mlxsw_core_health_fini(mlxsw_core);
if (!reload)
mlxsw_core_params_unregister(mlxsw_core);
- if (!reload)
- devlink_unregister(devlink);
mlxsw_emad_fini(mlxsw_core);
kfree(mlxsw_core->lag.mapping);
mlxsw_ports_fini(mlxsw_core, reload);
@@ -2124,7 +2124,6 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
reload_fail_deinit:
mlxsw_core_params_unregister(mlxsw_core);
- devlink_unregister(devlink);
devlink_resources_unregister(devlink, NULL);
devlink_free(devlink);
}
--
2.31.1
* [PATCH net-next 4/5] net/mlx5: Accept devlink user input after driver initialization complete
From: Leon Romanovsky @ 2021-08-10 13:37 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: Leon Romanovsky, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
From: Leon Romanovsky <leonro@nvidia.com>
The change of devlink_alloc() to accept a device makes sure that the device
is fully initialized and that devlink_register() does nothing except allow
users to use that devlink instance.
This ensures that no user input is accepted until that point, and it
eliminates the need to worry about internal locking as long as
devlink_register() is called last, since all earlier accesses to the
devlink happen during initialization.
This change fixes the following lockdep warning.
======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #27 Not tainted
------------------------------------------------------
devlink/265 is trying to acquire lock:
ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core]
but task is already holding lock:
ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (devlink_mutex){+.+.}-{3:3}:
__mutex_lock+0x149/0x1310
devlink_register+0xe7/0x280
mlx5_devlink_register+0x118/0x480 [mlx5_core]
mlx5_init_one+0x34b/0x440 [mlx5_core]
probe_one+0x480/0x6e0 [mlx5_core]
pci_device_probe+0x2a0/0x4a0
really_probe+0x1cb/0xba0
__driver_probe_device+0x18f/0x470
driver_probe_device+0x49/0x120
__driver_attach+0x1ce/0x400
bus_for_each_dev+0x11e/0x1a0
bus_add_driver+0x309/0x570
driver_register+0x20f/0x390
0xffffffffa04a0062
do_one_initcall+0xd5/0x400
do_init_module+0x1c8/0x760
load_module+0x7d9d/0xa4b0
__do_sys_finit_module+0x118/0x1a0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
__lock_acquire+0x2999/0x5a40
lock_acquire+0x1a9/0x4a0
__mutex_lock+0x149/0x1310
mlx5_unload_one+0x1e/0xa0 [mlx5_core]
mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
devlink_reload+0x1f2/0x640
devlink_nl_cmd_reload+0x6c3/0x10d0
genl_family_rcv_msg_doit+0x1e9/0x2f0
genl_rcv_msg+0x27f/0x4a0
netlink_rcv_skb+0x11e/0x340
genl_rcv+0x24/0x40
netlink_unicast+0x433/0x700
netlink_sendmsg+0x6fb/0xbe0
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x192/0x240
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(devlink_mutex);
                               lock(&dev->intf_state_mutex);
                               lock(devlink_mutex);
  lock(&dev->intf_state_mutex);
*** DEADLOCK ***
3 locks held by devlink/265:
#0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
#1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0
#2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0
stack backtrace:
CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x45/0x59
check_noncircular+0x268/0x310
? print_circular_bug+0x460/0x460
? __kernel_text_address+0xe/0x30
? alloc_chain_hlocks+0x1e6/0x5a0
__lock_acquire+0x2999/0x5a40
? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
? add_lock_to_list.constprop.0+0x6c/0x530
lock_acquire+0x1a9/0x4a0
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
? lock_release+0x6c0/0x6c0
? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
? lock_is_held_type+0x98/0x110
__mutex_lock+0x149/0x1310
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
? lock_is_held_type+0x98/0x110
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
? find_held_lock+0x2d/0x110
? mutex_lock_io_nested+0x1160/0x1160
? mlx5_lag_is_active+0x72/0x90 [mlx5_core]
? lock_downgrade+0x6d0/0x6d0
? do_raw_spin_lock+0x12e/0x270
? rwlock_bug.part.0+0x90/0x90
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
mlx5_unload_one+0x1e/0xa0 [mlx5_core]
mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
? netlink_broadcast_filtered+0x308/0xac0
? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core]
? __build_skb_around+0x110/0x2b0
? __alloc_skb+0x113/0x2b0
devlink_reload+0x1f2/0x640
? devlink_unregister+0x1e0/0x1e0
? security_capable+0x51/0x90
devlink_nl_cmd_reload+0x6c3/0x10d0
? devlink_nl_cmd_get_doit+0x1e0/0x1e0
? devlink_nl_pre_doit+0x72/0x8d0
genl_family_rcv_msg_doit+0x1e9/0x2f0
? __lock_acquire+0x15e2/0x5a40
? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
? mutex_lock_io_nested+0x1160/0x1160
? security_capable+0x51/0x90
genl_rcv_msg+0x27f/0x4a0
? genl_get_cmd+0x3c0/0x3c0
? lock_acquire+0x1a9/0x4a0
? devlink_nl_cmd_get_doit+0x1e0/0x1e0
? lock_release+0x6c0/0x6c0
netlink_rcv_skb+0x11e/0x340
? genl_get_cmd+0x3c0/0x3c0
? netlink_ack+0x930/0x930
genl_rcv+0x24/0x40
netlink_unicast+0x433/0x700
? netlink_attachskb+0x750/0x750
? __alloc_skb+0x113/0x2b0
netlink_sendmsg+0x6fb/0xbe0
? netlink_unicast+0x700/0x700
? netlink_unicast+0x700/0x700
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x192/0x240
? __x64_sys_getpeername+0xb0/0xb0
? do_sys_openat2+0x10a/0x370
? down_write_nested+0x150/0x150
? do_user_addr_fault+0x215/0xd50
? __x64_sys_openat+0x11f/0x1d0
? __x64_sys_open+0x1a0/0x1a0
__x64_sys_sendto+0xdc/0x1b0
? syscall_enter_from_user_mode+0x1d/0x50
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f50b50b6b3a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a
RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003
RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440
mlx5_core 0000:00:09.0: firmware version: 4.8.9999
mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
mlx5_core 0000:00:09.0 eth1: Link up
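The lock inversion in the report above can be illustrated with a small userspace sketch. It is a simplified pairwise order check, not lockdep's actual algorithm, and enum mock_lock and mock_acquire_while_holding() are hypothetical names. The assumption, taken from the dependency chain in the report, is that the probe path takes devlink_mutex while holding intf_state_mutex, and the reload path takes them in the opposite order.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock identifiers mirroring the two mutexes in the report. */
enum mock_lock { DEVLINK_MUTEX, INTF_STATE_MUTEX, MOCK_LOCK_MAX };

/* seen_order[a][b] is true once "b acquired while holding a" was observed. */
static bool seen_order[MOCK_LOCK_MAX][MOCK_LOCK_MAX];

/*
 * Record that 'next' is acquired while 'held' is held.
 * Returns -EDEADLK if the opposite order was recorded earlier,
 * which is the two-lock (ABBA) case of a circular dependency.
 */
int mock_acquire_while_holding(enum mock_lock held, enum mock_lock next)
{
	if (seen_order[next][held])
		return -EDEADLK;
	seen_order[held][next] = true;
	return 0;
}
```

Recording the probe-time order first and the reload-time order second makes the second call return -EDEADLK. Once devlink_register() is called outside the intf_state_mutex section, the first ordering is never recorded and the inversion disappears.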
Fixes: a6f3b62386a0 ("net/mlx5: Move devlink registration before interfaces load")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 10 ++--------
drivers/net/ethernet/mellanox/mlx5/core/main.c | 13 ++++++++++++-
.../net/ethernet/mellanox/mlx5/core/sf/dev/driver.c | 12 ++++++++++--
3 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index f38553ff538b..9b058f97c8fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -643,14 +643,11 @@ int mlx5_devlink_register(struct devlink *devlink)
{
int err;
- err = devlink_register(devlink);
- if (err)
- return err;
-
err = devlink_params_register(devlink, mlx5_devlink_params,
ARRAY_SIZE(mlx5_devlink_params));
if (err)
- goto params_reg_err;
+ return err;
+
mlx5_devlink_set_params_init_values(devlink);
devlink_params_publish(devlink);
@@ -663,8 +660,6 @@ int mlx5_devlink_register(struct devlink *devlink)
traps_reg_err:
devlink_params_unregister(devlink, mlx5_devlink_params,
ARRAY_SIZE(mlx5_devlink_params));
-params_reg_err:
- devlink_unregister(devlink);
return err;
}
@@ -673,5 +668,4 @@ void mlx5_devlink_unregister(struct devlink *devlink)
mlx5_devlink_traps_unregister(devlink);
devlink_params_unregister(devlink, mlx5_devlink_params,
ARRAY_SIZE(mlx5_devlink_params));
- devlink_unregister(devlink);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index a8efd9f1af4c..9f10049a63f9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1494,10 +1494,20 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
dev_err(&pdev->dev, "mlx5_crdump_enable failed with error code %d\n", err);
pci_save_state(pdev);
+ err = devlink_register(devlink);
+ if (err) {
+ mlx5_core_err(dev,
+ "devlink_register failed with error code %d\n",
+ err);
+ goto devlink_reg_err;
+ }
if (!mlx5_core_is_mp_slave(dev))
devlink_reload_enable(devlink);
return 0;
-
+devlink_reg_err:
+ mlx5_crdump_disable(dev);
+ mlx5_drain_health_wq(dev);
+ mlx5_uninit_one(dev);
err_init_one:
mlx5_pci_close(dev);
pci_init_err:
@@ -1516,6 +1526,7 @@ static void remove_one(struct pci_dev *pdev)
struct devlink *devlink = priv_to_devlink(dev);
devlink_reload_disable(devlink);
+ devlink_unregister(devlink);
mlx5_crdump_disable(dev);
mlx5_drain_health_wq(dev);
mlx5_uninit_one(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index 052f48068dc1..b0f2b9db6d85 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -46,9 +46,17 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
mlx5_core_warn(mdev, "mlx5_init_one err=%d\n", err);
goto init_one_err;
}
+
+ err = devlink_register(devlink);
+ if (err) {
+ mlx5_core_warn(mdev, "devlink_register err=%d\n", err);
+ goto devlink_reg_err;
+ }
devlink_reload_enable(devlink);
return 0;
+devlink_reg_err:
+ mlx5_uninit_one(mdev);
init_one_err:
iounmap(mdev->iseg);
remap_err:
@@ -61,10 +69,10 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
{
struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
- struct devlink *devlink;
+ struct devlink *devlink = priv_to_devlink(sf_dev->mdev);
- devlink = priv_to_devlink(sf_dev->mdev);
devlink_reload_disable(devlink);
+ devlink_unregister(devlink);
mlx5_uninit_one(sf_dev->mdev);
iounmap(sf_dev->mdev->iseg);
mlx5_mdev_uninit(sf_dev->mdev);
--
2.31.1
* [PATCH net-next 5/5] netdevsim: Delay user access till probe is finished
From: Leon Romanovsky @ 2021-08-10 13:37 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: Leon Romanovsky, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
From: Leon Romanovsky <leonro@nvidia.com>
Don't publish the supported user-space-accessible ops until probe is finished.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/net/netdevsim/dev.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 54313bd57797..181258bd72f2 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1470,14 +1470,10 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
if (err)
goto err_devlink_free;
- err = devlink_register(devlink);
- if (err)
- goto err_resources_unregister;
-
err = devlink_params_register(devlink, nsim_devlink_params,
ARRAY_SIZE(nsim_devlink_params));
if (err)
- goto err_dl_unregister;
+ goto err_resources_unregister;
nsim_devlink_set_params_init_values(nsim_dev, devlink);
err = nsim_dev_dummy_region_init(nsim_dev, devlink);
@@ -1515,10 +1511,17 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
goto err_psample_exit;
devlink_params_publish(devlink);
- devlink_reload_enable(devlink);
nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
+ err = devlink_register(devlink);
+ if (err)
+ goto err_port_del_all;
+
+ devlink_reload_enable(devlink);
return 0;
+err_port_del_all:
+ devlink_params_unpublish(devlink);
+ nsim_dev_port_del_all(nsim_dev);
err_psample_exit:
nsim_dev_psample_exit(nsim_dev);
err_bpf_dev_exit:
@@ -1536,8 +1539,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
err_params_unregister:
devlink_params_unregister(devlink, nsim_devlink_params,
ARRAY_SIZE(nsim_devlink_params));
-err_dl_unregister:
- devlink_unregister(devlink);
err_resources_unregister:
devlink_resources_unregister(devlink, NULL);
err_devlink_free:
@@ -1573,6 +1574,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
struct devlink *devlink = priv_to_devlink(nsim_dev);
devlink_reload_disable(devlink);
+ devlink_unregister(devlink);
nsim_dev_reload_destroy(nsim_dev);
@@ -1580,7 +1582,6 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
nsim_dev_debugfs_exit(nsim_dev);
devlink_params_unregister(devlink, nsim_devlink_params,
ARRAY_SIZE(nsim_devlink_params));
- devlink_unregister(devlink);
devlink_resources_unregister(devlink, NULL);
devlink_free(devlink);
}
--
2.31.1
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
From: Jakub Kicinski @ 2021-08-10 23:53 UTC (permalink / raw)
To: Leon Romanovsky
Cc: David S . Miller, Leon Romanovsky, Guangbin Huang, Ido Schimmel,
Jiri Pirko, linux-kernel, Michael Guralnik, netdev,
Saeed Mahameed, Salil Mehta, Tariq Toukan, Yisen Zhuang,
Yufeng Mo
On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:
> This series prepares the code for removal of the
> devlink_reload_enable/_disable API. To do so, it moves all
> devlink_register() calls to be right before devlink_reload_enable().
>
> The best place for such a call is right before returning from
> probe().
>
> This is done because devlink_register() opens devlink netlink to the
> users and gives them an avenue to issue commands before initialization
> is finished.
>
> 1. Some drivers were aware of such "functionality" and tried to protect
> themselves with extra locks, state machines and devlink_reload_enable().
> Let's assume that it worked for them, but I'm personally skeptical about
> it.
>
> 2. Some drivers copied that pattern, but without locks and state
> machines. That protected them from reload flows, but not from any _set_
> routines.
>
> 3. And all other drivers simply didn't understand the implications of early
> devlink_register() and can be seen as "broken".
What are those implications for drivers which don't implement reload?
Depending on which parts of devlink the drivers implement there may well
be nothing to worry about.
Plus devlink instances start out with reload disabled. Could you please
take a step back and explain why these changes are needed.
> In this series, we focus on items #1 and #2.
>
> Please share your opinion on whether I should change ALL other drivers
> to make sure that devlink_register() is the last command, or leave
> them as is.
Can you please share the output of devlink monitor and ip monitor link
before and after? The modified drivers will not register ports before
they register the devlink instance itself.
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
From: Leon Romanovsky @ 2021-08-11 6:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
On Tue, Aug 10, 2021 at 04:53:18PM -0700, Jakub Kicinski wrote:
> On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:
> > This series prepares the code for removal of the
> > devlink_reload_enable/_disable API. To do so, it moves all
> > devlink_register() calls to be right before devlink_reload_enable().
> >
> > The best place for such a call is right before returning from
> > probe().
> >
> > This is done because devlink_register() opens devlink netlink to the
> > users and gives them an avenue to issue commands before initialization
> > is finished.
> >
> > 1. Some drivers were aware of such "functionality" and tried to protect
> > themselves with extra locks, state machines and devlink_reload_enable().
> > Let's assume that it worked for them, but I'm personally skeptical about
> > it.
> >
> > 2. Some drivers copied that pattern, but without locks and state
> > machines. That protected them from reload flows, but not from any _set_
> > routines.
> >
> > 3. And all other drivers simply didn't understand the implications of early
> > devlink_register() and can be seen as "broken".
>
> What are those implications for drivers which don't implement reload?
> Depending on which parts of devlink the drivers implement there may well
> be nothing to worry about.
>
> Plus devlink instances start out with reload disabled. Could you please
> take a step back and explain why these changes are needed.
The problem is that devlink_register() adds the new devlink instance to the
list of visible devlinks (devlink_list). It means that all devlink_*_dumpit()
callbacks will try to access devices during their initialization, before they
are ready.
The more troublesome case is that devlink_list is iterated in
devlink_get_from_attrs(), which is used in devlink_nl_pre_doit(). The
latter function will tell the caller that the new devlink is valid, and
such a caller will be able to proceed to the *_set_doit() functions.
Just as an example:
* user sends netlink message
* devlink_nl_cmd_eswitch_set_doit()
* ops->eswitch_mode_set()
* Are you sure that all drivers are protected here?
Remember that the driver is in the middle of its probe().
Someone can argue that drivers and devlink are protected from anything
harmful by their global (devlink_mutex and devlink->lock) and internal
(device->lock, etc.) locks. However, this is impossible to prove for all
drivers and is prone to errors.
Reload enable/disable gives the false impression that the problem exists in
that flow only, which is not true.
devlink_reload_enable() is duct tape, because reload flows are much easier
to hit.
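To make the window concrete, here is a minimal user-space sketch of the
ordering problem. It is only a model: the struct, the model_* names, the
hw_ready flag and the return codes are hypothetical and are not the kernel's
devlink code. The only thing taken from the description above is the idea
that registration publishes the instance to a global list, and that a "set"
request is dispatched to any instance found in that list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_INSTANCES 8

/* Hypothetical stand-in for a devlink instance, not the kernel struct. */
struct model_devlink {
	const char *name;
	bool hw_ready;		/* set at the very end of the modelled probe() */
	int eswitch_mode;
};

/* Stand-in for the global list: only listed instances are visible. */
static struct model_devlink *model_list[MODEL_MAX_INSTANCES];
static size_t model_count;

/* Models registration: publish the instance to lookups. */
static int model_register(struct model_devlink *dl)
{
	assert(dl != NULL);
	if (model_count == MODEL_MAX_INSTANCES)
		return -1;
	model_list[model_count++] = dl;
	return 0;
}

/* Models the lookup that runs before a "doit" handler is called. */
static struct model_devlink *model_lookup(const char *name)
{
	for (size_t i = 0; i < model_count; i++)
		if (strcmp(model_list[i]->name, name) == 0)
			return model_list[i];
	return NULL;
}

/*
 * Models a "set" request from user space.
 * Returns 0 on success, -1 if the instance is not visible yet, and -2 if
 * the request reached an instance whose probe has not finished. The clean
 * -2 is an assumption of this model; a real driver has no such guard by
 * default and may touch uninitialized state instead.
 */
static int model_eswitch_set(const char *name, int mode)
{
	struct model_devlink *dl = model_lookup(name);

	if (!dl)
		return -1;
	if (!dl->hw_ready)
		return -2;
	dl->eswitch_mode = mode;
	return 0;
}
```

In this model, an instance registered at the start of probe can be hit by
model_eswitch_set() while hw_ready is still false. An instance registered as
the last step is simply not found until it is ready, which is the behaviour
the series aims for.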
>
> > In this series, we focus on items #1 and #2.
> >
> > Please share your opinion if I should change ALL other drivers to make
> > sure that devlink_register() is the last command or leave them in an
> > as-is state.
>
> Can you please share the output of devlink monitor and ip monitor link
> before and after? The modified drivers will not register ports before
> they register the devlink instance itself.
Not really, they will register but won't be accessible from user space.
The only difference is the location of the "[dev,new] ..." notification.
[leonro@vm ~]$ sudo modprobe mlx5_core
[ 105.575790] mlx5_core 0000:00:09.0: firmware version: 4.8.9999
[ 105.576349] mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
[ 105.686217] pps pps0: new PPS source ptp0
[ 105.688144] mlx5_core 0000:00:09.0: E-Switch: Total vports 2, per vport: max uc(32768) max mc(32768)
[ 105.717736] mlx5_core 0000:00:09.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 106.957028] mlx5_core 0000:00:09.0 eth1: Link down
[ 106.960379] mlx5_core 0000:00:09.0 eth1: Link up
[ 106.967916] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
================================================================================================
Before:
[leonro@vm ~]$ sudo devlink monitor
[dev,new] pci/0000:00:09.0
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
values:
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
values:
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
values:
[param,new] pci/0000:00:09.0: name enable_roce type generic
values:
cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
values:
cmode driverinit value 15
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
values:
cmode runtime value dmfs
[param,new] pci/0000:00:09.0: name enable_roce type generic
values:
cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
values:
cmode driverinit value 15
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
values:
cmode runtime value true
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
values:
cmode runtime value true
[trap-group,new] pci/0000:00:09.0: name l2_drops generic true
[trap,new] pci/0000:00:09.0: name ingress_vlan_filter type drop generic true action drop group l2_drops
[trap,new] pci/0000:00:09.0: name dmac_filter type drop generic true action drop group l2_drops
[port,new] pci/0000:00:09.0/131071: type notset flavour physical port 0 splittable false
[port,new] pci/0000:00:09.0/131071: type eth netdev eth1 flavour physical port 0 splittable false
[leonro@vm ~]$ sudo ip monitor
inet eth1 forwarding off rp_filter loose mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off
inet6 eth1 forwarding off mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DOWN group default
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
multicast ff00::/8 dev eth1 table local proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1 inet6 fe80::5054:ff:fe12:3456/64 scope link
valid_lft forever preferred_lft forever
local fe80::5054:ff:fe12:3456 dev eth1 table local proto kernel metric 0 pref medium
===========================================================================================================
After:
[leonro@vm ~]$ sudo devlink monitor
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
values:
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
values:
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
values:
[param,new] pci/0000:00:09.0: name enable_roce type generic
values:
cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
values:
cmode driverinit value 15
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
values:
cmode runtime value dmfs
[param,new] pci/0000:00:09.0: name enable_roce type generic
values:
cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
values:
cmode driverinit value 15
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
values:
cmode runtime value true
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
values:
cmode runtime value true
[trap-group,new] pci/0000:00:09.0: name l2_drops generic true
[trap,new] pci/0000:00:09.0: name ingress_vlan_filter type drop generic true action drop group l2_drops
[trap,new] pci/0000:00:09.0: name dmac_filter type drop generic true action drop group l2_drops
[dev,new] pci/0000:00:09.0
[port,new] pci/0000:00:09.0/131071: type notset flavour physical port 0 splittable false
[port,new] pci/0000:00:09.0/131071: type eth netdev eth1 flavour physical port 0 splittable false
[leonro@vm ~]$ sudo ip monitor
inet eth1 forwarding off rp_filter loose mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off
inet6 eth1 forwarding off mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DOWN group default
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
multicast ff00::/8 dev eth1 table local proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1 inet6 fe80::5054:ff:fe12:3456/64 scope link
valid_lft forever preferred_lft forever
local fe80::5054:ff:fe12:3456 dev eth1 table local proto kernel metric 0 pref medium
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
2021-08-11 6:10 ` Leon Romanovsky
@ 2021-08-11 13:27 ` Jakub Kicinski
2021-08-11 14:01 ` Leon Romanovsky
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2021-08-11 13:27 UTC (permalink / raw)
To: Leon Romanovsky
Cc: David S . Miller, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
On Wed, 11 Aug 2021 09:10:49 +0300 Leon Romanovsky wrote:
> On Tue, Aug 10, 2021 at 04:53:18PM -0700, Jakub Kicinski wrote:
> > On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:
> > > This series prepares code to remove devlink_reload_enable/_disable API
> > > and in order to do, we move all devlink_register() calls to be right
> > > before devlink_reload_enable().
> > >
> > > The best place for such a call should be right before exiting from
> > > the probe().
> > >
> > > This is done because devlink_register() opens devlink netlink to the
> > > users and gives them a venue to issue commands before initialization
> > > is finished.
> > >
> > > 1. Some drivers were aware of such "functionality" and tried to protect
> > > themselves with extra locks, state machines and devlink_reload_enable().
> > > Let's assume that it worked for them, but I'm personally skeptical about
> > > it.
> > >
> > > 2. Some drivers copied that pattern, but without locks and state
> > > machines. That protected them from reload flows, but not from any _set_
> > > routines.
> > >
> > > 3. And all other drivers simply didn't understand the implications of early
> > > devlink_register() and can be seen as "broken".
> >
> > What are those implications for drivers which don't implement reload?
> > Depending on which parts of devlink the drivers implement there may well
> > be nothing to worry about.
> >
> > Plus devlink instances start out with reload disabled. Could you please
> > take a step back and explain why these changes are needed.
>
> The problem is that devlink_register() adds new devlink instance to the
> list of visible devlinks (devlink_list). It means that all devlink_*_dumpit()
> will try to access devices during their initialization, before they are ready.
>
> The more troublesome case is that devlink_list is iterated in the
> devlink_get_from_attrs() and it is used in devlink_nl_pre_doit(). The
> latter function will return to the caller that new devlink is valid and
> such caller will be able to proceed to *_set_doit() functions.
>
> Just as an example:
> * user sends netlink message
> * devlink_nl_cmd_eswitch_set_doit()
> * ops->eswitch_mode_set()
> * Are you sure that all drivers protected here?
> I remind that driver is in the middle of its probe().
>
> Someone can argue that drivers and devlink are protected from anything
> harmful with their global (devlink_mutex and devlink->lock) and internal
> (device->lock, e.t.c.) locks. However it is impossible to prove for all
> drivers and prone to errors.
>
> Reload enable/disable gives false impression that the problem exists in
> that flow only, which is not true.
>
> devlink_reload_enable() is a duct tape because reload flows much easier
> to hit.
Right :/
> > > In this series, we focus on items #1 and #2.
> > >
> > > Please share your opinion if I should change ALL other drivers to make
> > > sure that devlink_register() is the last command or leave them in an
> > > as-is state.
> >
> > Can you please share the output of devlink monitor and ip monitor link
> > before and after? The modified drivers will not register ports before
> > they register the devlink instance itself.
>
> Not really, they will register but won't be accessible from the user space.
> The only difference is the location of "[dev,new] ..." notification.
Is that because of mlx5's use of auxdev, or locking? I don't see
anything that should prevent the port notification from coming out.
I think the notifications need to get straightened out, we can't notify
about sub-objects until the object is registered, since they are
inaccessible.
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
2021-08-11 13:27 ` Jakub Kicinski
@ 2021-08-11 14:01 ` Leon Romanovsky
2021-08-11 14:15 ` Leon Romanovsky
2021-08-11 14:18 ` Jakub Kicinski
0 siblings, 2 replies; 14+ messages in thread
From: Leon Romanovsky @ 2021-08-11 14:01 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
On Wed, Aug 11, 2021 at 06:27:32AM -0700, Jakub Kicinski wrote:
> On Wed, 11 Aug 2021 09:10:49 +0300 Leon Romanovsky wrote:
> > On Tue, Aug 10, 2021 at 04:53:18PM -0700, Jakub Kicinski wrote:
> > > On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:
> > > > This series prepares code to remove devlink_reload_enable/_disable API
> > > > and in order to do, we move all devlink_register() calls to be right
> > > > before devlink_reload_enable().
> > > >
> > > > The best place for such a call should be right before exiting from
> > > > the probe().
> > > >
> > > > This is done because devlink_register() opens devlink netlink to the
> > > > users and gives them a venue to issue commands before initialization
> > > > is finished.
> > > >
> > > > 1. Some drivers were aware of such "functionality" and tried to protect
> > > > themselves with extra locks, state machines and devlink_reload_enable().
> > > > Let's assume that it worked for them, but I'm personally skeptical about
> > > > it.
> > > >
> > > > 2. Some drivers copied that pattern, but without locks and state
> > > > machines. That protected them from reload flows, but not from any _set_
> > > > routines.
> > > >
> > > > 3. And all other drivers simply didn't understand the implications of early
> > > > devlink_register() and can be seen as "broken".
> > >
> > > What are those implications for drivers which don't implement reload?
> > > Depending on which parts of devlink the drivers implement there may well
> > > be nothing to worry about.
> > >
> > > Plus devlink instances start out with reload disabled. Could you please
> > > take a step back and explain why these changes are needed.
> >
> > The problem is that devlink_register() adds new devlink instance to the
> > list of visible devlinks (devlink_list). It means that all devlink_*_dumpit()
> > will try to access devices during their initialization, before they are ready.
> >
> > The more troublesome case is that devlink_list is iterated in the
> > devlink_get_from_attrs() and it is used in devlink_nl_pre_doit(). The
> > latter function will return to the caller that new devlink is valid and
> > such caller will be able to proceed to *_set_doit() functions.
> >
> > Just as an example:
> > * user sends netlink message
> > * devlink_nl_cmd_eswitch_set_doit()
> > * ops->eswitch_mode_set()
> > * Are you sure that all drivers protected here?
> > I remind that driver is in the middle of its probe().
> >
> > Someone can argue that drivers and devlink are protected from anything
> > harmful with their global (devlink_mutex and devlink->lock) and internal
> > (device->lock, e.t.c.) locks. However it is impossible to prove for all
> > drivers and prone to errors.
> >
> > Reload enable/disable gives false impression that the problem exists in
> > that flow only, which is not true.
> >
> > devlink_reload_enable() is a duct tape because reload flows much easier
> > to hit.
>
> Right :/
>
> > > > In this series, we focus on items #1 and #2.
> > > >
> > > > Please share your opinion if I should change ALL other drivers to make
> > > > sure that devlink_register() is the last command or leave them in an
> > > > as-is state.
> > >
> > > Can you please share the output of devlink monitor and ip monitor link
> > > before and after? The modified drivers will not register ports before
> > > they register the devlink instance itself.
> >
> > Not really, they will register but won't be accessible from the user space.
> > The only difference is the location of "[dev,new] ..." notification.
>
> Is that because of mlx5's use of auxdev, or locking? I don't see
> anything that should prevent the port notification from coming out.
And that is OK: the kernel can (and does) send notifications, because we left
the devlink_ops assignment in devlink_alloc(). This ensures that all
flows that worked before will continue to work without too many changes.
>
> I think the notifications need to get straightened out, we can't notify
> about sub-objects until the object is registered, since they are
> inaccessible.
I'm not sure about that. You present a case where the kernel and user
space race against each other, and historically the kernel doesn't protect
against such flows.
For example, you can randomly remove and add kernel modules. At some
point, you will get "missing symbols" errors, just because
one module tries to load while it depends on an already removed one.
We must protect the kernel, and this is what I do. The user shouldn't access
a devlink instance before seeing the "dev name" notification.
Of course, we can move the various iterators to devlink_register(), but it
will make the code much more complex, because we have objects that can be
registered at any time (IMHO, trap is one of them), and I would need to
implement notification logic that separates objects created
before devlink_register() from those created after.
Thanks
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
2021-08-11 14:01 ` Leon Romanovsky
@ 2021-08-11 14:15 ` Leon Romanovsky
2021-08-11 14:18 ` Jakub Kicinski
1 sibling, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2021-08-11 14:15 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
On Wed, Aug 11, 2021 at 05:01:20PM +0300, Leon Romanovsky wrote:
> On Wed, Aug 11, 2021 at 06:27:32AM -0700, Jakub Kicinski wrote:
> > On Wed, 11 Aug 2021 09:10:49 +0300 Leon Romanovsky wrote:
> > > On Tue, Aug 10, 2021 at 04:53:18PM -0700, Jakub Kicinski wrote:
> > > > On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:
> > > > > This series prepares code to remove devlink_reload_enable/_disable API
> > > > > and in order to do, we move all devlink_register() calls to be right
> > > > > before devlink_reload_enable().
> > > > >
> > > > > The best place for such a call should be right before exiting from
> > > > > the probe().
> > > > >
> > > > > This is done because devlink_register() opens devlink netlink to the
> > > > > users and gives them a venue to issue commands before initialization
> > > > > is finished.
> > > > >
> > > > > 1. Some drivers were aware of such "functionality" and tried to protect
> > > > > themselves with extra locks, state machines and devlink_reload_enable().
> > > > > Let's assume that it worked for them, but I'm personally skeptical about
> > > > > it.
> > > > >
> > > > > 2. Some drivers copied that pattern, but without locks and state
> > > > > machines. That protected them from reload flows, but not from any _set_
> > > > > routines.
> > > > >
> > > > > 3. And all other drivers simply didn't understand the implications of early
> > > > > devlink_register() and can be seen as "broken".
> > > >
> > > > What are those implications for drivers which don't implement reload?
> > > > Depending on which parts of devlink the drivers implement there may well
> > > > be nothing to worry about.
> > > >
> > > > Plus devlink instances start out with reload disabled. Could you please
> > > > take a step back and explain why these changes are needed.
> > >
> > > The problem is that devlink_register() adds new devlink instance to the
> > > list of visible devlinks (devlink_list). It means that all devlink_*_dumpit()
> > > will try to access devices during their initialization, before they are ready.
> > >
> > > The more troublesome case is that devlink_list is iterated in the
> > > devlink_get_from_attrs() and it is used in devlink_nl_pre_doit(). The
> > > latter function will return to the caller that new devlink is valid and
> > > such caller will be able to proceed to *_set_doit() functions.
> > >
> > > Just as an example:
> > > * user sends netlink message
> > > * devlink_nl_cmd_eswitch_set_doit()
> > > * ops->eswitch_mode_set()
> > > * Are you sure that all drivers protected here?
> > > I remind that driver is in the middle of its probe().
> > >
> > > Someone can argue that drivers and devlink are protected from anything
> > > harmful with their global (devlink_mutex and devlink->lock) and internal
> > > (device->lock, e.t.c.) locks. However it is impossible to prove for all
> > > drivers and prone to errors.
> > >
> > > Reload enable/disable gives false impression that the problem exists in
> > > that flow only, which is not true.
> > >
> > > devlink_reload_enable() is a duct tape because reload flows much easier
> > > to hit.
> >
> > Right :/
> >
> > > > > In this series, we focus on items #1 and #2.
> > > > >
> > > > > Please share your opinion if I should change ALL other drivers to make
> > > > > sure that devlink_register() is the last command or leave them in an
> > > > > as-is state.
> > > >
> > > > Can you please share the output of devlink monitor and ip monitor link
> > > > before and after? The modified drivers will not register ports before
> > > > they register the devlink instance itself.
> > >
> > > Not really, they will register but won't be accessible from the user space.
> > > The only difference is the location of "[dev,new] ..." notification.
> >
> > Is that because of mlx5's use of auxdev, or locking? I don't see
> > anything that should prevent the port notification from coming out.
>
> And it is ok, kernel can (and does) send notifications, because we left
> devlink_ops assignment to be in devlink_alloc(). It ensures that all
> flows that worked before will continue to work without too much changes.
>
> >
> > I think the notifications need to get straightened out, we can't notify
> > about sub-objects until the object is registered, since they are
> > inaccessible.
>
> I'm not sure about that. You present the case where kernel and user
> space races against each other and historically kernel doesn't protect
> from such flows.
>
> For example, you can randomly remove and add kernel modules. At some
> point of time, you will get "missing symbols errors", just because
> one module tries to load and it depends on already removed one.
>
> We must protect kernel and this is what I do. User shouldn't access
> devlink instance before he sees "dev name" notification.
>
> Of course, we can move various iterators to devlink_register(), but it
> will make code much complex, because we have objects that can be
> registered at any time (IMHO. trap is one of them) and I will need to
> implement notification logic that separate objects that were created
> before devlink_register and after.
Bottom line,
I'm trying to make the code simpler, not the opposite :).
>
> Thanks
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
2021-08-11 14:01 ` Leon Romanovsky
2021-08-11 14:15 ` Leon Romanovsky
@ 2021-08-11 14:18 ` Jakub Kicinski
2021-08-11 14:36 ` Leon Romanovsky
2021-08-12 4:10 ` Leon Romanovsky
1 sibling, 2 replies; 14+ messages in thread
From: Jakub Kicinski @ 2021-08-11 14:18 UTC (permalink / raw)
To: Leon Romanovsky
Cc: David S . Miller, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
On Wed, 11 Aug 2021 17:01:20 +0300 Leon Romanovsky wrote:
> > > Not really, they will register but won't be accessible from the user space.
> > > The only difference is the location of "[dev,new] ..." notification.
> >
> > Is that because of mlx5's use of auxdev, or locking? I don't see
> > anything that should prevent the port notification from coming out.
>
> And it is ok, kernel can (and does) send notifications, because we left
> devlink_ops assignment to be in devlink_alloc(). It ensures that all
> flows that worked before will continue to work without too much changes.
>
> > I think the notifications need to get straightened out, we can't notify
> > about sub-objects until the object is registered, since they are
> > inaccessible.
>
> I'm not sure about that. You present the case where kernel and user
> space races against each other and historically kernel doesn't protect
> from such flows.
>
> For example, you can randomly remove and add kernel modules. At some
> point of time, you will get "missing symbols errors", just because
> one module tries to load and it depends on already removed one.
Sure. But there is a difference between an error because another
actor did something conflicting, asynchronously, and an API which by design
sends notifications which can't be acted upon until a later point in time,
because the kernel sent them too early.
> We must protect kernel and this is what I do. User shouldn't access
> devlink instance before he sees "dev name" notification.
Which is a new rule, and therefore a uAPI change..
> Of course, we can move various iterators to devlink_register(), but it
> will make code much complex, because we have objects that can be
> registered at any time (IMHO. trap is one of them) and I will need to
> implement notification logic that separate objects that were created
> before devlink_register and after.
I appreciate it's a PITA but it is the downside of a solution where
registration of co-dependent objects exposed via devlink is reordered
in the kernel.
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
2021-08-11 14:18 ` Jakub Kicinski
@ 2021-08-11 14:36 ` Leon Romanovsky
2021-08-12 4:10 ` Leon Romanovsky
1 sibling, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2021-08-11 14:36 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
On Wed, Aug 11, 2021 at 07:18:17AM -0700, Jakub Kicinski wrote:
> On Wed, 11 Aug 2021 17:01:20 +0300 Leon Romanovsky wrote:
> > > > Not really, they will register but won't be accessible from the user space.
> > > > The only difference is the location of "[dev,new] ..." notification.
> > >
> > > Is that because of mlx5's use of auxdev, or locking? I don't see
> > > anything that should prevent the port notification from coming out.
> >
> > And it is ok, kernel can (and does) send notifications, because we left
> > devlink_ops assignment to be in devlink_alloc(). It ensures that all
> > flows that worked before will continue to work without too much changes.
> >
> > > I think the notifications need to get straightened out, we can't notify
> > > about sub-objects until the object is registered, since they are
> > > inaccessible.
> >
> > I'm not sure about that. You present the case where kernel and user
> > space races against each other and historically kernel doesn't protect
> > from such flows.
> >
> > For example, you can randomly remove and add kernel modules. At some
> > point of time, you will get "missing symbols errors", just because
> > one module tries to load and it depends on already removed one.
>
> Sure. But there is a difference between an error because another
> actor did something conflicting, asynchronously, and API which by design
> sends notifications which can't be acted upon until later point in time,
> because kernel sent them too early.
>
> > We must protect kernel and this is what I do. User shouldn't access
> > devlink instance before he sees "dev name" notification.
>
> Which is a new rule, and therefore a uAPI change..
>
> > Of course, we can move various iterators to devlink_register(), but it
> > will make code much complex, because we have objects that can be
> > registered at any time (IMHO. trap is one of them) and I will need to
> > implement notification logic that separate objects that were created
> > before devlink_register and after.
>
> I appreciate it's a PITA but it is the downside of a solution where
> registration of co-dependent objects exposed via devlink is reordered
> in the kernel.
No problem, I will rewrite the notification logic as a queue-based mechanism.
Thanks
* Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
2021-08-11 14:18 ` Jakub Kicinski
2021-08-11 14:36 ` Leon Romanovsky
@ 2021-08-12 4:10 ` Leon Romanovsky
1 sibling, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2021-08-12 4:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Guangbin Huang, Ido Schimmel, Jiri Pirko,
linux-kernel, Michael Guralnik, netdev, Saeed Mahameed,
Salil Mehta, Tariq Toukan, Yisen Zhuang, Yufeng Mo
On Wed, Aug 11, 2021 at 07:18:17AM -0700, Jakub Kicinski wrote:
> On Wed, 11 Aug 2021 17:01:20 +0300 Leon Romanovsky wrote:
> > > > Not really, they will register but won't be accessible from the user space.
> > > > The only difference is the location of "[dev,new] ..." notification.
> > >
> > > Is that because of mlx5's use of auxdev, or locking? I don't see
> > > anything that should prevent the port notification from coming out.
> >
> > And it is ok, kernel can (and does) send notifications, because we left
> > devlink_ops assignment to be in devlink_alloc(). It ensures that all
> > flows that worked before will continue to work without too much changes.
> >
> > > I think the notifications need to get straightened out, we can't notify
> > > about sub-objects until the object is registered, since they are
> > > inaccessible.
> >
> > I'm not sure about that. You present the case where kernel and user
> > space races against each other and historically kernel doesn't protect
> > from such flows.
> >
> > For example, you can randomly remove and add kernel modules. At some
> > point of time, you will get "missing symbols errors", just because
> > one module tries to load and it depends on already removed one.
>
> Sure. But there is a difference between an error because another
> actor did something conflicting, asynchronously, and API which by design
> sends notifications which can't be acted upon until later point in time,
> because kernel sent them too early.
>
> > We must protect kernel and this is what I do. User shouldn't access
> > devlink instance before he sees "dev name" notification.
>
> Which is a new rule, and therefore a uAPI change..
>
> > Of course, we can move various iterators to devlink_register(), but it
> > will make code much complex, because we have objects that can be
> > registered at any time (IMHO. trap is one of them) and I will need to
> > implement notification logic that separate objects that were created
> > before devlink_register and after.
>
> I appreciate it's a PITA but it is the downside of a solution where
> registration of co-dependent objects exposed via devlink is reordered
> in the kernel.
I thought about it more and realized that while we can make the registration
monitor notifications behave as before, we can't do it for the unregister
path.
For register, we can buffer all notifications until devlink_register()
comes, use it as a marker, and release everything that was accumulated
up to that point. Everything that comes later will be delivered
immediately.
It will give the "dev name ..." print at the beginning, as you want.
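The buffering idea for the register path can be sketched roughly as below.
This is a user-space model under stated assumptions, not a proposed patch:
struct notif_model, the notif_* functions, DEV_NEW and the fixed-size arrays
are all hypothetical, and a real implementation would need locking and
dynamic allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NOTIF_PENDING_MAX 16
#define NOTIF_SENT_MAX    32

/* Marker message emitted when the instance becomes visible. */
static const char DEV_NEW[] = "[dev,new]";

/* Hypothetical per-instance notification state. */
struct notif_model {
	bool registered;
	const char *pending[NOTIF_PENDING_MAX];	/* held until registration */
	size_t n_pending;
	const char *sent[NOTIF_SENT_MAX];	/* what a monitor saw, in order */
	size_t n_sent;
};

/* Models delivery of one message to user space. */
static void notif_deliver(struct notif_model *m, const char *msg)
{
	assert(m->n_sent < NOTIF_SENT_MAX);
	m->sent[m->n_sent++] = msg;
}

/* Sub-object notification: queued before registration, immediate after. */
static void notif_post(struct notif_model *m, const char *msg)
{
	if (m->registered) {
		notif_deliver(m, msg);
		return;
	}
	assert(m->n_pending < NOTIF_PENDING_MAX);
	m->pending[m->n_pending++] = msg;
}

/* Registration is the marker: announce the device, then flush the queue. */
static void notif_register(struct notif_model *m)
{
	m->registered = true;
	notif_deliver(m, DEV_NEW);
	for (size_t i = 0; i < m->n_pending; i++)
		notif_deliver(m, m->pending[i]);
	m->n_pending = 0;
}
```

With this model, posting "[param,new]" and "[trap,new]" before
notif_register() delivers nothing at first; after registration the monitor
sees the device message, then the two queued messages in their original
order, and any later "[port,new]" goes out immediately.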
For unregister, this trick won't work because we don't know whether any other
devlink unregister API is used after devlink_unregister(). So we can't
delay notifications.
Even if we could, it would be even worse from the user's perspective, because
in such a case devlink_unregister() will close netlink access without
notifying the user, who won't understand why ports don't work (as an
example).
Jakub, you are over-engineering here and solving a non-existent problem.
> Which is a new rule, and therefore a uAPI change..
AFAIR, netlink can be out-of-order, because it is datagram-based like UDP,
but it is just impractical to see that in real life. So no, it is not a new rule.
Thanks