LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH 0/2] multi-threading device shutdown @ 2018-05-03 3:59 Pavel Tatashin 2018-05-03 3:59 ` [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() Pavel Tatashin 2018-05-03 3:59 ` [PATCH 2/2] drivers core: multi-threading device shutdown Pavel Tatashin 0 siblings, 2 replies; 10+ messages in thread From: Pavel Tatashin @ 2018-05-03 3:59 UTC (permalink / raw) To: pasha.tatashin, steven.sistare, daniel.m.jordan, linux-kernel, jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh Do a faster shutdown by calling dev->*->shutdown(dev) in parallel. device_shutdown() calls these functions for every single device but only using one thread. Since, nothing else is running on the machine by the device_shutdown() s called, there is no reason not to utilize all the available CPU resources. Pavel Tatashin (2): ixgbe: release lock for the duration of ixgbe_suspend_close() drivers core: multi-threading device shutdown drivers/base/core.c | 238 ++++++++++++++---- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 +- 2 files changed, 197 insertions(+), 50 deletions(-) -- 2.17.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() 2018-05-03 3:59 [PATCH 0/2] multi-threading device shutdown Pavel Tatashin @ 2018-05-03 3:59 ` Pavel Tatashin 2018-05-03 13:20 ` Alexander Duyck 2018-05-03 3:59 ` [PATCH 2/2] drivers core: multi-threading device shutdown Pavel Tatashin 1 sibling, 1 reply; 10+ messages in thread From: Pavel Tatashin @ 2018-05-03 3:59 UTC (permalink / raw) To: pasha.tatashin, steven.sistare, daniel.m.jordan, linux-kernel, jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh Currently, during device_shutdown() ixgbe holds rtnl_lock for the duration of lengthy ixgbe_close_suspend(). On machines with multiple ixgbe cards this lock prevents scaling if device_shutdown() function is multi-threaded. It is not necessary to hold this lock during ixgbe_close_suspend() as it is not held when ixgbe_close() is called also during shutdown but for kexec case. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index afadba99f7b8..e7875b58854b 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6748,8 +6748,15 @@ static int __ixgbe_shutdown(struct pci_dev *pdev, bool *enable_wake) rtnl_lock(); netif_device_detach(netdev); - if (netif_running(netdev)) + if (netif_running(netdev)) { + /* Suspend takes a long time, device_shutdown may be + * parallelized this function, so drop lock for the + * duration of this call. + */ + rtnl_unlock(); ixgbe_close_suspend(adapter); + rtnl_lock(); + } ixgbe_clear_interrupt_scheme(adapter); rtnl_unlock(); -- 2.17.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() 2018-05-03 3:59 ` [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() Pavel Tatashin @ 2018-05-03 13:20 ` Alexander Duyck 2018-05-03 13:30 ` Pavel Tatashin 0 siblings, 1 reply; 10+ messages in thread From: Alexander Duyck @ 2018-05-03 13:20 UTC (permalink / raw) To: Pavel Tatashin Cc: steven.sistare, daniel.m.jordan, LKML, Jeff Kirsher, intel-wired-lan, Netdev, Greg KH On Wed, May 2, 2018 at 8:59 PM, Pavel Tatashin <pasha.tatashin@oracle.com> wrote: > Currently, during device_shutdown() ixgbe holds rtnl_lock for the duration > of lengthy ixgbe_close_suspend(). On machines with multiple ixgbe cards > this lock prevents scaling if device_shutdown() function is multi-threaded. > > It is not necessary to hold this lock during ixgbe_close_suspend() > as it is not held when ixgbe_close() is called also during shutdown but for > kexec case. > > Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> > --- > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > index afadba99f7b8..e7875b58854b 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > @@ -6748,8 +6748,15 @@ static int __ixgbe_shutdown(struct pci_dev *pdev, bool *enable_wake) > rtnl_lock(); > netif_device_detach(netdev); > > - if (netif_running(netdev)) > + if (netif_running(netdev)) { > + /* Suspend takes a long time, device_shutdown may be > + * parallelized this function, so drop lock for the > + * duration of this call. > + */ > + rtnl_unlock(); > ixgbe_close_suspend(adapter); > + rtnl_lock(); > + } > > ixgbe_clear_interrupt_scheme(adapter); > rtnl_unlock(); I'm not a fan of dropping the mutex while we go through ixgbe_close_suspend. I'm concerned it will result in us having a number of races on shutdown. If anything, I think we would need to find a replacement for the mutex that can operate at the per-netdev level if you are wanting to parallelize this. - Alex ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() 2018-05-03 13:20 ` Alexander Duyck @ 2018-05-03 13:30 ` Pavel Tatashin 2018-05-03 13:46 ` Alexander Duyck 0 siblings, 1 reply; 10+ messages in thread From: Pavel Tatashin @ 2018-05-03 13:30 UTC (permalink / raw) To: alexander.duyck Cc: Steven Sistare, Daniel Jordan, LKML, jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh Hi Alex, > I'm not a fan of dropping the mutex while we go through > ixgbe_close_suspend. I'm concerned it will result in us having a > number of races on shutdown. I would agree, but ixgbe_close_suspend() is already called without this mutex from ixgbe_close(). This path is executed also during machine shutdown but when kexec'ed. So, it is either an existing bug or there are no races. But, because ixgbe_close() is called from the userland, and a little earlier than ixgbe_shutdown() I think this means there are no races. > If anything, I think we would need to find a replacement for the mutex > that can operate at the per-netdev level if you are wanting to > parallelize this. Yes, if lock is necessary, it can be replaced in this place (and added to ixgbe_close()) with something scalable. Thank you, Pavel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() 2018-05-03 13:30 ` Pavel Tatashin @ 2018-05-03 13:46 ` Alexander Duyck 2018-05-03 14:01 ` Steven Sistare 0 siblings, 1 reply; 10+ messages in thread From: Alexander Duyck @ 2018-05-03 13:46 UTC (permalink / raw) To: Pavel Tatashin Cc: Steven Sistare, Daniel Jordan, LKML, Jeff Kirsher, intel-wired-lan, Netdev, Greg KH On Thu, May 3, 2018 at 6:30 AM, Pavel Tatashin <pasha.tatashin@oracle.com> wrote: > Hi Alex, Hi Pavel, >> I'm not a fan of dropping the mutex while we go through >> ixgbe_close_suspend. I'm concerned it will result in us having a >> number of races on shutdown. > > I would agree, but ixgbe_close_suspend() is already called without this > mutex from ixgbe_close(). This path is executed also during machine > shutdown but when kexec'ed. So, it is either an existing bug or there are > no races. But, because ixgbe_close() is called from the userland, and a > little earlier than ixgbe_shutdown() I think this means there are no races. All callers of ixgbe_close are supposed to already be holding the RTNL mutex. That is the whole reason why we need to hold it here as the shutdown path doesn't have the mutex held otherwise. The mutex was added here because shutdown was racing with the open/close calls. > >> If anything, I think we would need to find a replacement for the mutex >> that can operate at the per-netdev level if you are wanting to >> parallelize this. > > Yes, if lock is necessary, it can be replaced in this place (and added to > ixgbe_close()) with something scalable. I wouldn't suggest adding yet another lock for all this. Instead it might make more sense for us to start looking at breaking up the locking since most of the networking subsystem uses the rtnl_lock call it might be time to start looking at doing something like cutting back on the use of this in cases where all the resources needed are specific to one device and instead have a per device lock that could address those kind of cases. - Alex ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() 2018-05-03 13:46 ` Alexander Duyck @ 2018-05-03 14:01 ` Steven Sistare 2018-05-03 14:23 ` Pavel Tatashin 0 siblings, 1 reply; 10+ messages in thread From: Steven Sistare @ 2018-05-03 14:01 UTC (permalink / raw) To: Pavel Tatashin Cc: Alexander Duyck, Daniel Jordan, LKML, Jeff Kirsher, intel-wired-lan, Netdev, Greg KH On 5/3/2018 9:46 AM, Alexander Duyck wrote: > On Thu, May 3, 2018 at 6:30 AM, Pavel Tatashin > <pasha.tatashin@oracle.com> wrote: >> Hi Alex, > > Hi Pavel, > >>> I'm not a fan of dropping the mutex while we go through >>> ixgbe_close_suspend. I'm concerned it will result in us having a >>> number of races on shutdown. >> >> I would agree, but ixgbe_close_suspend() is already called without this >> mutex from ixgbe_close(). This path is executed also during machine >> shutdown but when kexec'ed. So, it is either an existing bug or there are >> no races. But, because ixgbe_close() is called from the userland, and a >> little earlier than ixgbe_shutdown() I think this means there are no races. > > All callers of ixgbe_close are supposed to already be holding the RTNL > mutex. That is the whole reason why we need to hold it here as the > shutdown path doesn't have the mutex held otherwise. The mutex was > added here because shutdown was racing with the open/close calls. > >> >>> If anything, I think we would need to find a replacement for the mutex >>> that can operate at the per-netdev level if you are wanting to >>> parallelize this. >> >> Yes, if lock is necessary, it can be replaced in this place (and added to >> ixgbe_close()) with something scalable. > > I wouldn't suggest adding yet another lock for all this. Instead it > might make more sense for us to start looking at breaking up the > locking since most of the networking subsystem uses the rtnl_lock call > it might be time to start looking at doing something like cutting back > on the use of this in cases where all the resources needed are > specific to one device and instead have a per device lock that could > address those kind of cases. Hi Pavel, you might want to pull lock optimization out of this patch series. The parallelization by itself is valuable, and optimizations for individual devices including locks can come later. - Steve ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() 2018-05-03 14:01 ` Steven Sistare @ 2018-05-03 14:23 ` Pavel Tatashin 0 siblings, 0 replies; 10+ messages in thread From: Pavel Tatashin @ 2018-05-03 14:23 UTC (permalink / raw) To: Steven Sistare Cc: alexander.duyck, Daniel Jordan, LKML, jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh > Hi Pavel, you might want to pull lock optimization out of this patch series. > The parallelization by itself is valuable, and optimizations for individual > devices including locks can come later. Hi Steve, Yes, I will pull this patch out of the series. Thank you for the suggestion. Pavel ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] drivers core: multi-threading device shutdown 2018-05-03 3:59 [PATCH 0/2] multi-threading device shutdown Pavel Tatashin 2018-05-03 3:59 ` [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() Pavel Tatashin @ 2018-05-03 3:59 ` Pavel Tatashin 2018-05-03 5:54 ` Tobin C. Harding 1 sibling, 1 reply; 10+ messages in thread From: Pavel Tatashin @ 2018-05-03 3:59 UTC (permalink / raw) To: pasha.tatashin, steven.sistare, daniel.m.jordan, linux-kernel, jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh When system is rebooted, halted or kexeced device_shutdown() is called. This function shuts down every single device by calling either: dev->bus->shutdown(dev) dev->driver->shutdown(dev) Even on a machine just with a moderate amount of devices, device_shutdown() may take multiple seconds to complete. Because many devices require a specific delays to perform this operation. Here is sample analysis of time it takes to call device_shutdown() on two socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz machine. device_shutdown 2.95s mlx4_shutdown 1.14s megasas_shutdown 0.24s ixgbe_shutdown 0.37s x 4 (four ixgbe devices on my machine). the rest 0.09s In mlx4 we spent the most time, but that is because there is a 1 second sleep: mlx4_shutdown mlx4_unload_one mlx4_free_ownership msleep(1000) With megasas we spend quoter of second, but sometimes longer (up-to 0.5s) in this path: megasas_shutdown megasas_flush_cache megasas_issue_blocked_cmd wait_event_timeout Finally, with ixgbe_shutdown() it takes 0.37 for each device, but that time is spread all over the place, with bigger offenders: ixgbe_shutdown __ixgbe_shutdown ixgbe_close_suspend ixgbe_down ixgbe_init_hw_generic ixgbe_reset_hw_X540 msleep(100); 0.104483472 ixgbe_get_san_mac_addr_generic 0.048414851 ixgbe_get_wwn_prefix_generic 0.048409893 ixgbe_start_hw_X540 ixgbe_start_hw_generic ixgbe_clear_hw_cntrs_generic 0.048581502 ixgbe_setup_fc_generic 0.024225800 All the ixgbe_*generic functions end-up calling: ixgbe_read_eerd_X540() ixgbe_acquire_swfw_sync_X540 usleep_range(5000, 6000); ixgbe_release_swfw_sync_X540 usleep_range(5000, 6000); While these are short sleeps, they end-up calling them over 24 times! 24 * 0.0055s = 0.132s. Adding-up to 0.528s for four devices. While we should keep optimizing the individual device drivers, in some cases this is simply a hardware property that forces a specific delay, and we must wait. So, the solution for this problem is to shutdown devices in parallel. However, we must shutdown children before shutting down parents, so parent device must wait for its children to finish. With this patch, on the same machine devices_shutdown() takes 1.142s, and without mlx4 one second delay only 0.38s Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> --- drivers/base/core.c | 238 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 189 insertions(+), 49 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index b610816eb887..f370369a303b 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -25,6 +25,7 @@ #include <linux/netdevice.h> #include <linux/sched/signal.h> #include <linux/sysfs.h> +#include <linux/kthread.h> #include "base.h" #include "power/power.h" @@ -2102,6 +2103,59 @@ const char *device_get_devnode(struct device *dev, return *tmp = s; } +/** + * device_children_count - device children count + * @parent: parent struct device. + * + * Returns number of children for this device or 0 if nonde. + */ +static int device_children_count(struct device *parent) +{ + struct klist_iter i; + int children = 0; + + if (!parent->p) + return 0; + + klist_iter_init(&parent->p->klist_children, &i); + while (next_device(&i)) + children++; + klist_iter_exit(&i); + + return children; +} + +/** + * device_get_child_by_index - Return child using the provide index. + * @parent: parent struct device. + * @index: Index of the child, where 0 is the first child in the children list, + * and so on. + * + * Returns child or NULL if child with this index is not present. + */ +static struct device * +device_get_child_by_index(struct device *parent, int index) +{ + struct klist_iter i; + struct device *dev = NULL, *d; + int child_index = 0; + + if (!parent->p || index < 0) + return NULL; + + klist_iter_init(&parent->p->klist_children, &i); + while ((d = next_device(&i)) != NULL) { + if (child_index == index) { + dev = d; + break; + } + child_index++; + } + klist_iter_exit(&i); + + return dev; +} + /** * device_for_each_child - device child iterator. * @parent: parent struct device. @@ -2765,71 +2819,157 @@ int device_move(struct device *dev, struct device *new_parent, } EXPORT_SYMBOL_GPL(device_move); +/* + * device_shutdown_one - call ->shutdown() for the device passed as + * argument. + */ +static void device_shutdown_one(struct device *dev) +{ + /* Don't allow any more runtime suspends */ + pm_runtime_get_noresume(dev); + pm_runtime_barrier(dev); + + if (dev->class && dev->class->shutdown_pre) { + if (initcall_debug) + dev_info(dev, "shutdown_pre\n"); + dev->class->shutdown_pre(dev); + } + if (dev->bus && dev->bus->shutdown) { + if (initcall_debug) + dev_info(dev, "shutdown\n"); + dev->bus->shutdown(dev); + } else if (dev->driver && dev->driver->shutdown) { + if (initcall_debug) + dev_info(dev, "shutdown\n"); + dev->driver->shutdown(dev); + } + + /* Release device lock, and decrement the reference counter */ + device_unlock(dev); + put_device(dev); +} + +static DECLARE_COMPLETION(device_root_tasks_complete); +static void device_shutdown_tree(struct device *dev); +static atomic_t device_root_tasks; + +/* + * Passed as an argument to to device_shutdown_task(). + * child_next_index the next available child index. + * tasks_running number of tasks still running. Each tasks decrements it + * when job is finished and the last tasks signals that the + * job is complete. + * complete Used to signal job competition. + * parent Parent device. + */ +struct device_shutdown_task_data { + atomic_t child_next_index; + atomic_t tasks_running; + struct completion complete; + struct device *parent; +}; + +static int device_shutdown_task(void *data) +{ + struct device_shutdown_task_data *tdata = + (struct device_shutdown_task_data *)data; + int child_idx = atomic_inc_return(&tdata->child_next_index) - 1; + struct device *dev = device_get_child_by_index(tdata->parent, + child_idx); + + if (dev) + device_shutdown_tree(dev); + if (atomic_dec_return(&tdata->tasks_running) == 0) + complete(&tdata->complete); + return 0; +} + +/* + * Shutdown device tree with root started in dev. If dev has no children + * simply shutdown only this device. If dev has children recursively shutdown + * children first, and only then the parent. For performance reasons children + * are shutdown in parallel using kernel threads. + */ +static void device_shutdown_tree(struct device *dev) +{ + int children_count = device_children_count(dev); + + if (children_count) { + struct device_shutdown_task_data tdata; + int i; + + init_completion(&tdata.complete); + atomic_set(&tdata.child_next_index, 0); + atomic_set(&tdata.tasks_running, children_count); + tdata.parent = dev; + + for (i = 0; i < children_count; i++) { + kthread_run(device_shutdown_task, + &tdata, "device_shutdown.%s", + dev_name(dev)); + } + wait_for_completion(&tdata.complete); + } + device_shutdown_one(dev); +} + +/* + * On shutdown each root device (the one that does not have a parent) goes + * through this function. + */ +static int +device_shutdown_root_task(void *data) +{ + struct device *dev = (struct device *)data; + + device_shutdown_tree(dev); + if (atomic_dec_return(&device_root_tasks) == 0) + complete(&device_root_tasks_complete); + return 0; +} + /** * device_shutdown - call ->shutdown() on each device to shutdown. */ void device_shutdown(void) { - struct device *dev, *parent; + struct list_head *pos, *next; + int root_devices = 0; + struct device *dev; spin_lock(&devices_kset->list_lock); /* - * Walk the devices list backward, shutting down each in turn. - * Beware that device unplug events may also start pulling - * devices offline, even as the system is shutting down. + * Prepare devices for shutdown: lock, and increment references in every + * devices. Remove child devices from the list, and count number of root + * devices. */ - while (!list_empty(&devices_kset->list)) { - dev = list_entry(devices_kset->list.prev, struct device, - kobj.entry); + list_for_each_safe(pos, next, &devices_kset->list) { + dev = list_entry(pos, struct device, kobj.entry); - /* - * hold reference count of device's parent to - * prevent it from being freed because parent's - * lock is to be held - */ - parent = get_device(dev->parent); get_device(dev); - /* - * Make sure the device is off the kset list, in the - * event that dev->*->shutdown() doesn't remove it. - */ - list_del_init(&dev->kobj.entry); - spin_unlock(&devices_kset->list_lock); - - /* hold lock to avoid race with probe/release */ - if (parent) - device_lock(parent); device_lock(dev); - /* Don't allow any more runtime suspends */ - pm_runtime_get_noresume(dev); - pm_runtime_barrier(dev); - - if (dev->class && dev->class->shutdown_pre) { - if (initcall_debug) - dev_info(dev, "shutdown_pre\n"); - dev->class->shutdown_pre(dev); - } - if (dev->bus && dev->bus->shutdown) { - if (initcall_debug) - dev_info(dev, "shutdown\n"); - dev->bus->shutdown(dev); - } else if (dev->driver && dev->driver->shutdown) { - if (initcall_debug) - dev_info(dev, "shutdown\n"); - dev->driver->shutdown(dev); - } - - device_unlock(dev); - if (parent) - device_unlock(parent); - - put_device(dev); - put_device(parent); - + if (!dev->parent) + root_devices++; + else + list_del_init(&dev->kobj.entry); + } + atomic_set(&device_root_tasks, root_devices); + /* + * Shutdown the root devices in parallel. The children are going to be + * shutdown first. + */ + list_for_each_safe(pos, next, &devices_kset->list) { + dev = list_entry(pos, struct device, kobj.entry); + list_del_init(&dev->kobj.entry); + spin_unlock(&devices_kset->list_lock); + kthread_run(device_shutdown_root_task, + dev, "device_root_shutdown.%s", + dev_name(dev)); spin_lock(&devices_kset->list_lock); } spin_unlock(&devices_kset->list_lock); + wait_for_completion(&device_root_tasks_complete); } /* -- 2.17.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] drivers core: multi-threading device shutdown 2018-05-03 3:59 ` [PATCH 2/2] drivers core: multi-threading device shutdown Pavel Tatashin @ 2018-05-03 5:54 ` Tobin C. Harding 2018-05-03 13:38 ` Pavel Tatashin 0 siblings, 1 reply; 10+ messages in thread From: Tobin C. Harding @ 2018-05-03 5:54 UTC (permalink / raw) To: Pavel Tatashin Cc: steven.sistare, daniel.m.jordan, linux-kernel, jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh This code was a pleasure to read, super clean. On Wed, May 02, 2018 at 11:59:31PM -0400, Pavel Tatashin wrote: > When system is rebooted, halted or kexeced device_shutdown() is > called. > > This function shuts down every single device by calling either: > dev->bus->shutdown(dev) > dev->driver->shutdown(dev) > > Even on a machine just with a moderate amount of devices, device_shutdown() > may take multiple seconds to complete. Because many devices require a > specific delays to perform this operation. > > Here is sample analysis of time it takes to call device_shutdown() on > two socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz machine. > > device_shutdown 2.95s > mlx4_shutdown 1.14s > megasas_shutdown 0.24s > ixgbe_shutdown 0.37s x 4 (four ixgbe devices on my machine). > the rest 0.09s > > In mlx4 we spent the most time, but that is because there is a 1 second > sleep: > mlx4_shutdown > mlx4_unload_one > mlx4_free_ownership > msleep(1000) > > With megasas we spend quoter of second, but sometimes longer (up-to 0.5s) > in this path: > > megasas_shutdown > megasas_flush_cache > megasas_issue_blocked_cmd > wait_event_timeout > > Finally, with ixgbe_shutdown() it takes 0.37 for each device, but that time > is spread all over the place, with bigger offenders: > > ixgbe_shutdown > __ixgbe_shutdown > ixgbe_close_suspend > ixgbe_down > ixgbe_init_hw_generic > ixgbe_reset_hw_X540 > msleep(100); 0.104483472 > ixgbe_get_san_mac_addr_generic 0.048414851 > ixgbe_get_wwn_prefix_generic 0.048409893 > ixgbe_start_hw_X540 > ixgbe_start_hw_generic > ixgbe_clear_hw_cntrs_generic 0.048581502 > ixgbe_setup_fc_generic 0.024225800 > > All the ixgbe_*generic functions end-up calling: > ixgbe_read_eerd_X540() > ixgbe_acquire_swfw_sync_X540 > usleep_range(5000, 6000); > ixgbe_release_swfw_sync_X540 > usleep_range(5000, 6000); > > While these are short sleeps, they end-up calling them over 24 times! > 24 * 0.0055s = 0.132s. Adding-up to 0.528s for four devices. > > While we should keep optimizing the individual device drivers, in some > cases this is simply a hardware property that forces a specific delay, and > we must wait. > > So, the solution for this problem is to shutdown devices in parallel. > However, we must shutdown children before shutting down parents, so parent > device must wait for its children to finish. > > With this patch, on the same machine devices_shutdown() takes 1.142s, and > without mlx4 one second delay only 0.38s > > Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> > --- > drivers/base/core.c | 238 +++++++++++++++++++++++++++++++++++--------- > 1 file changed, 189 insertions(+), 49 deletions(-) > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index b610816eb887..f370369a303b 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -25,6 +25,7 @@ > #include <linux/netdevice.h> > #include <linux/sched/signal.h> > #include <linux/sysfs.h> > +#include <linux/kthread.h> > > #include "base.h" > #include "power/power.h" > @@ -2102,6 +2103,59 @@ const char *device_get_devnode(struct device *dev, > return *tmp = s; > } > > +/** > + * device_children_count - device children count > + * @parent: parent struct device. > + * > + * Returns number of children for this device or 0 if nonde. > + */ > +static int device_children_count(struct device *parent) > +{ > + struct klist_iter i; > + int children = 0; > + > + if (!parent->p) > + return 0; > + > + klist_iter_init(&parent->p->klist_children, &i); > + while (next_device(&i)) > + children++; > + klist_iter_exit(&i); > + > + return children; > +} > + > +/** > + * device_get_child_by_index - Return child using the provide index. > + * @parent: parent struct device. > + * @index: Index of the child, where 0 is the first child in the children list, > + * and so on. > + * > + * Returns child or NULL if child with this index is not present. > + */ > +static struct device * > +device_get_child_by_index(struct device *parent, int index) > +{ > + struct klist_iter i; > + struct device *dev = NULL, *d; > + int child_index = 0; > + > + if (!parent->p || index < 0) > + return NULL; > + > + klist_iter_init(&parent->p->klist_children, &i); > + while ((d = next_device(&i)) != NULL) { perhaps: while ((d = next_device(&i))) { > + if (child_index == index) { > + dev = d; > + break; > + } > + child_index++; > + } > + klist_iter_exit(&i); > + > + return dev; > +} > + > /** > * device_for_each_child - device child iterator. > * @parent: parent struct device. > @@ -2765,71 +2819,157 @@ int device_move(struct device *dev, struct device *new_parent, > } > EXPORT_SYMBOL_GPL(device_move); > > +/* > + * device_shutdown_one - call ->shutdown() for the device passed as > + * argument. > + */ > +static void device_shutdown_one(struct device *dev) > +{ > + /* Don't allow any more runtime suspends */ > + pm_runtime_get_noresume(dev); > + pm_runtime_barrier(dev); > + > + if (dev->class && dev->class->shutdown_pre) { > + if (initcall_debug) > + dev_info(dev, "shutdown_pre\n"); > + dev->class->shutdown_pre(dev); > + } > + if (dev->bus && dev->bus->shutdown) { > + if (initcall_debug) > + dev_info(dev, "shutdown\n"); > + dev->bus->shutdown(dev); > + } else if (dev->driver && dev->driver->shutdown) { > + if (initcall_debug) > + dev_info(dev, "shutdown\n"); > + dev->driver->shutdown(dev); > + } > + > + /* Release device lock, and decrement the reference counter */ > + device_unlock(dev); > + put_device(dev); > +} > + > +static DECLARE_COMPLETION(device_root_tasks_complete); > +static void device_shutdown_tree(struct device *dev); > +static atomic_t device_root_tasks; > + > +/* > + * Passed as an argument to to device_shutdown_task(). > + * child_next_index the next available child index. > + * tasks_running number of tasks still running. Each tasks decrements it > + * when job is finished and the last tasks signals that the > + * job is complete. > + * complete Used to signal job competition. > + * parent Parent device. > + */ > +struct device_shutdown_task_data { > + atomic_t child_next_index; > + atomic_t tasks_running; > + struct completion complete; > + struct device *parent; > +}; > + > +static int device_shutdown_task(void *data) > +{ > + struct device_shutdown_task_data *tdata = > + (struct device_shutdown_task_data *)data; perhaps: struct device_shutdown_task_data *tdata = data; > + int child_idx = atomic_inc_return(&tdata->child_next_index) - 1; > + struct device *dev = device_get_child_by_index(tdata->parent, > + child_idx); perhaps: struct device *dev = device_get_child_by_index(tdata->parent, child_idx); This is over the 80 character limit but only by one character :) > + > + if (dev) > + device_shutdown_tree(dev); > + if (atomic_dec_return(&tdata->tasks_running) == 0) > + complete(&tdata->complete); > + return 0; > +} > + > +/* > + * Shutdown device tree with root started in dev. If dev has no children > + * simply shutdown only this device. If dev has children recursively shutdown > + * children first, and only then the parent. For performance reasons children > + * are shutdown in parallel using kernel threads. > + */ > +static void device_shutdown_tree(struct device *dev) > +{ > + int children_count = device_children_count(dev); > + > + if (children_count) { > + struct device_shutdown_task_data tdata; > + int i; > + > + init_completion(&tdata.complete); > + atomic_set(&tdata.child_next_index, 0); > + atomic_set(&tdata.tasks_running, children_count); > + tdata.parent = dev; > + > + for (i = 0; i < children_count; i++) { > + kthread_run(device_shutdown_task, > + &tdata, "device_shutdown.%s", > + dev_name(dev)); > + } > + wait_for_completion(&tdata.complete); > + } > + device_shutdown_one(dev); > +} > + > +/* > + * On shutdown each root device (the one that does not have a parent) goes > + * through this function. > + */ > +static int > +device_shutdown_root_task(void *data) > +{ > + struct device *dev = (struct device *)data; > + > + device_shutdown_tree(dev); > + if (atomic_dec_return(&device_root_tasks) == 0) > + complete(&device_root_tasks_complete); > + return 0; > +} > + > /** > * device_shutdown - call ->shutdown() on each device to shutdown. > */ > void device_shutdown(void) > { > - struct device *dev, *parent; > + struct list_head *pos, *next; > + int root_devices = 0; > + struct device *dev; > > spin_lock(&devices_kset->list_lock); > /* > - * Walk the devices list backward, shutting down each in turn. > - * Beware that device unplug events may also start pulling > - * devices offline, even as the system is shutting down. > + * Prepare devices for shutdown: lock, and increment references in every > + * devices. Remove child devices from the list, and count number of root * Prepare devices for shutdown: lock, and increment reference in each * device. Remove child devices from the list, and count number of root Hope this helps, Tobin. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] drivers core: multi-threading device shutdown 2018-05-03 5:54 ` Tobin C. Harding @ 2018-05-03 13:38 ` Pavel Tatashin 0 siblings, 0 replies; 10+ messages in thread From: Pavel Tatashin @ 2018-05-03 13:38 UTC (permalink / raw) To: tobin Cc: Steven Sistare, Daniel Jordan, LKML, jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh On Thu, May 3, 2018 at 1:54 AM Tobin C. Harding <tobin@apporbit.com> wrote: > This code was a pleasure to read, super clean. Hi Tobin, Thank you very much for your review, I will address all of your comments in the next revision. BTW, I found a lock ordering issue in my work that that I will need to fix: In device_shutdown() device_lock() must be taken before devices_kset->list_lock. Instead I will use device_trylock(), and if that fails, will drop devices_kset->list_lock and acquiring the device lock outside, and check that the device is still in the list after taking the list lock again. Pavel ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-05-03 14:23 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-03 3:59 [PATCH 0/2] multi-threading device shutdown Pavel Tatashin 2018-05-03 3:59 ` [PATCH 1/2] ixgbe: release lock for the duration of ixgbe_suspend_close() Pavel Tatashin 2018-05-03 13:20 ` Alexander Duyck 2018-05-03 13:30 ` Pavel Tatashin 2018-05-03 13:46 ` Alexander Duyck 2018-05-03 14:01 ` Steven Sistare 2018-05-03 14:23 ` Pavel Tatashin 2018-05-03 3:59 ` [PATCH 2/2] drivers core: multi-threading device shutdown Pavel Tatashin 2018-05-03 5:54 ` Tobin C. Harding 2018-05-03 13:38 ` Pavel Tatashin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).