Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "David S . Miller" <davem@davemloft.net>,
	Guangbin Huang <huangguangbin2@huawei.com>,
	Ido Schimmel <idosch@nvidia.com>, Jiri Pirko <jiri@nvidia.com>,
	linux-kernel@vger.kernel.org,
	Michael Guralnik <michaelgur@mellanox.com>,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	Yisen Zhuang <yisen.zhuang@huawei.com>,
	Yufeng Mo <moyufeng@huawei.com>
Subject: Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
Date: Wed, 11 Aug 2021 09:10:49 +0300	[thread overview]
Message-ID: <YRNp6Zmh99N3kJVa@unreal> (raw)
In-Reply-To: <20210810165318.323eae24@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Tue, Aug 10, 2021 at 04:53:18PM -0700, Jakub Kicinski wrote:
> On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:
> > This series prepares code to remove devlink_reload_enable/_disable API
> > and in order to do, we move all devlink_register() calls to be right
> > before devlink_reload_enable().
> > 
> > The best place for such a call should be right before exiting from
> > the probe().
> > 
> > This is done because devlink_register() opens devlink netlink to the
> > users and gives them a venue to issue commands before initialization
> > is finished.
> > 
> > 1. Some drivers were aware of such "functionality" and tried to protect
> > themselves with extra locks, state machines and devlink_reload_enable().
> > Let's assume that it worked for them, but I'm personally skeptical about
> > it.
> > 
> > 2. Some drivers copied that pattern, but without locks and state
> > machines. That protected them from reload flows, but not from any _set_
> > routines.
> > 
> > 3. And all other drivers simply didn't understand the implications of early
> > devlink_register() and can be seen as "broken".
> 
> What are those implications for drivers which don't implement reload?
> Depending on which parts of devlink the drivers implement there may well
> be nothing to worry about.
> 
> Plus devlink instances start out with reload disabled. Could you please
> take a step back and explain why these changes are needed.

The problem is that devlink_register() adds new devlink instance to the
list of visible devlinks (devlink_list). It means that all devlink_*_dumpit()
will try to access devices during their initialization, before they are ready.

The more troublesome case is that devlink_list is iterated in the
devlink_get_from_attrs() and it is used in devlink_nl_pre_doit(). The
latter function will return to the caller that new devlink is valid and
such caller will be able to proceed to *_set_doit() functions.

Just as an example:
 * user sends netlink message
  * devlink_nl_cmd_eswitch_set_doit()
   * ops->eswitch_mode_set()
    * Are you sure that all drivers protected here?
      I remind that driver is in the middle of its probe().

Someone can argue that drivers and devlink are protected from anything
harmful with their global (devlink_mutex and devlink->lock) and internal
(device->lock, e.t.c.) locks. However it is impossible to prove for all
drivers and prone to errors.

Reload enable/disable gives false impression that the problem exists in
that flow only, which is not true.

devlink_reload_enable() is a duct tape because reload flows much easier
to hit.

> 
> > In this series, we focus on items #1 and #2.
> > 
> > Please share your opinion if I should change ALL other drivers to make
> > sure that devlink_register() is the last command or leave them in an
> > as-is state.
> 
> Can you please share the output of devlink monitor and ip monitor link
> before and after?  The modified drivers will not register ports before
> they register the devlink instance itself.

Not really, they will register but won't be accessible from the user space.
The only difference is the location of "[dev,new] ..." notification.

[leonro@vm ~]$ sudo modprobe mlx5_core
[  105.575790] mlx5_core 0000:00:09.0: firmware version: 4.8.9999
[  105.576349] mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
[  105.686217] pps pps0: new PPS source ptp0
[  105.688144] mlx5_core 0000:00:09.0: E-Switch: Total vports 2, per vport: max uc(32768) max mc(32768)
[  105.717736] mlx5_core 0000:00:09.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[  106.957028] mlx5_core 0000:00:09.0 eth1: Link down
[  106.960379] mlx5_core 0000:00:09.0 eth1: Link up
[  106.967916] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
================================================================================================
Before:
[leonro@vm ~]$ sudo devlink monitor
[dev,new] pci/0000:00:09.0
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
  values:
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
  values:
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
  values:
[param,new] pci/0000:00:09.0: name enable_roce type generic
  values:
    cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
  values:
    cmode driverinit value 15
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
  values:
    cmode runtime value dmfs
[param,new] pci/0000:00:09.0: name enable_roce type generic
  values:
    cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
  values:
    cmode driverinit value 15
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
  values:
    cmode runtime value true
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
  values:
    cmode runtime value true
[trap-group,new] pci/0000:00:09.0: name l2_drops generic true
[trap,new] pci/0000:00:09.0: name ingress_vlan_filter type drop generic true action drop group l2_drops
[trap,new] pci/0000:00:09.0: name dmac_filter type drop generic true action drop group l2_drops
[port,new] pci/0000:00:09.0/131071: type notset flavour physical port 0 splittable false
[port,new] pci/0000:00:09.0/131071: type eth netdev eth1 flavour physical port 0 splittable false

[leonro@vm ~]$ sudo ip monitor
inet eth1 forwarding off rp_filter loose mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off 
inet6 eth1 forwarding off mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off 
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default 
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DOWN group default 
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
multicast ff00::/8 dev eth1 table local proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default 
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1    inet6 fe80::5054:ff:fe12:3456/64 scope link 
       valid_lft forever preferred_lft forever
local fe80::5054:ff:fe12:3456 dev eth1 table local proto kernel metric 0 pref medium

===========================================================================================================
After:
[leonro@vm ~]$ sudo devlink monitor
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
  values:
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
  values:
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
  values:
[param,new] pci/0000:00:09.0: name enable_roce type generic
  values:
    cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
  values:
    cmode driverinit value 15
[param,new] pci/0000:00:09.0: name flow_steering_mode type driver-specific
  values:
    cmode runtime value dmfs
[param,new] pci/0000:00:09.0: name enable_roce type generic
  values:
    cmode driverinit value true
[param,new] pci/0000:00:09.0: name fdb_large_groups type driver-specific
  values:
    cmode driverinit value 15
[param,new] pci/0000:00:09.0: name esw_port_metadata type driver-specific
  values:
    cmode runtime value true
[param,new] pci/0000:00:09.0: name enable_remote_dev_reset type generic
  values:
    cmode runtime value true
[trap-group,new] pci/0000:00:09.0: name l2_drops generic true
[trap,new] pci/0000:00:09.0: name ingress_vlan_filter type drop generic true action drop group l2_drops
[trap,new] pci/0000:00:09.0: name dmac_filter type drop generic true action drop group l2_drops
[dev,new] pci/0000:00:09.0
[port,new] pci/0000:00:09.0/131071: type notset flavour physical port 0 splittable false
[port,new] pci/0000:00:09.0/131071: type eth netdev eth1 flavour physical port 0 splittable false

[leonro@vm ~]$ sudo ip monitor
inet eth1 forwarding off rp_filter loose mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off 
inet6 eth1 forwarding off mc_forwarding off proxy_neigh off ignore_routes_with_linkdown off 
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default 
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DOWN group default 
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
multicast ff00::/8 dev eth1 table local proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default 
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
4: eth1    inet6 fe80::5054:ff:fe12:3456/64 scope link 
       valid_lft forever preferred_lft forever
local fe80::5054:ff:fe12:3456 dev eth1 table local proto kernel metric 0 pref medium

  reply	other threads:[~2021-08-11  6:10 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-10 13:37 Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 1/5] net: hns3: remove always exist devlink pointer check Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 2/5] net/mlx4: Move devlink_register to be the last initialization command Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 3/5] mlxsw: core: Refactor code to publish devlink ops when device is ready Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 4/5] net/mlx5: Accept devlink user input after driver initialization complete Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 5/5] netdevsim: Delay user access till probe is finished Leon Romanovsky
2021-08-10 23:53 ` [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable Jakub Kicinski
2021-08-11  6:10   ` Leon Romanovsky [this message]
2021-08-11 13:27     ` Jakub Kicinski
2021-08-11 14:01       ` Leon Romanovsky
2021-08-11 14:15         ` Leon Romanovsky
2021-08-11 14:18         ` Jakub Kicinski
2021-08-11 14:36           ` Leon Romanovsky
2021-08-12  4:10           ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YRNp6Zmh99N3kJVa@unreal \
    --to=leon@kernel.org \
    --cc=davem@davemloft.net \
    --cc=huangguangbin2@huawei.com \
    --cc=idosch@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michaelgur@mellanox.com \
    --cc=moyufeng@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=salil.mehta@huawei.com \
    --cc=tariqt@nvidia.com \
    --cc=yisen.zhuang@huawei.com \
    --subject='Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).