From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1755223AbbAOSWH (ORCPT ); Thu, 15 Jan 2015 13:22:07 -0500
Received: from iolanthe.rowland.org ([192.131.102.54]:59357 "HELO
    iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with SMTP id S1752269AbbAOSWF (ORCPT );
    Thu, 15 Jan 2015 13:22:05 -0500
Date: Thu, 15 Jan 2015 13:22:03 -0500 (EST)
From: Alan Stern
X-X-Sender: stern@iolanthe.rowland.org
To: Christoph Hellwig, Tejun Heo
cc: Bart Van Assche, James Bottomley, Hannes Reinecke,
    "linux-scsi@vger.kernel.org", Greg Kroah-Hartman,
    Kernel development list
Subject: sysfs methods can race with ->remove
In-Reply-To: <20150115160612.GA31446@infradead.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Tejun:

The context is that we have been talking about
drivers/scsi/scsi_scan.c:scsi_rescan_device(), which is called by the
store_rescan_field() sysfs method in scsi_sysfs.c.

The problem is this: What happens in scsi_rescan_device if the device is
unbound from its driver before the module_put call?  The
dev->driver->owner calculation would dereference a NULL pointer.

On Thu, 15 Jan 2015, Christoph Hellwig wrote:

> On Wed, Jan 14, 2015 at 10:07:00AM -0500, Alan Stern wrote:
> > and the kernfs core ensures that the underlying device won't be
> > deallocated while a sysfs method runs.
>
> It has a reference to keep it from being freed, but so far I can't find
> anything that prevents ->remove from being called while we are in or
> just before a method call.

There are two types of methods to think about: Those registered by the
subsystem and those registered by the driver.

If a method is registered by the driver, then the driver will unregister
it when the ->remove routine runs.
I don't know for certain, but I would expect that the sysfs/kernfs core
will make sure that any existing method calls complete before unregister
returns.  This would prevent races.

If a method is registered by the subsystem, and if the method runs
entirely within the subsystem's code, then ->remove doesn't matter.  The
driver could be unbound while the method is running and it would be okay.

The only time we have a problem is when the method is registered by the
subsystem and the method calls into the driver.  (Note that this is
exactly what happens with scsi_rescan_device.)

> > > But this seems like a more generic problem, and at least a quick glance at
> > > the pci_driver methods seems like others don't have a good
> > > synchronization of ->remove against random driver methods.
> >
> > Can you give one or two examples?
>
> I look at the sriov_configure PCI method, or the various sub-methods
> under pci_driver.err_handler.

The sriov_numvfs_store method does have the same problem, and so does the
reset_store method (by way of pci_reset_function ->
pci_dev_save_and_disable -> pci_reset_notify).

Tejun, is my analysis correct?  How should we fix these races?

Alan Stern
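
For illustration only, here is a minimal userspace sketch in C of the
pattern described above: a subsystem method that takes a module
reference, calls into the driver, and then drops the reference.  Every
name in it (model_module, model_driver, model_device, model_unbind,
model_rescan) is a hypothetical stand-in, not a kernel definition, and
the refcount arithmetic only models try_module_get()/module_put().  The
sketch reads the driver pointer once so that the final put does not
evaluate dev->driver->owner a second time.  It does not show how to keep
->remove from running while the method is inside the driver; that
question remains open.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct model_module {
    int refs;                       /* models the module reference count */
};

struct model_device;

struct model_driver {
    struct model_module *owner;
    void (*rescan)(struct model_device *dev);
};

struct model_device {
    struct model_driver *driver;    /* becomes NULL when the driver is unbound */
};

/* Stand-in for an unbind that lands while the method is running. */
static void model_unbind(struct model_device *dev)
{
    dev->driver = NULL;
}

/*
 * Model of a subsystem method that calls into the driver.  The driver
 * pointer is read once; the final put uses that snapshot instead of
 * evaluating dev->driver->owner again after the callback.
 * Returns 0 if the driver callback path ran, -1 if no driver was bound.
 */
static int model_rescan(struct model_device *dev)
{
    struct model_driver *drv = dev->driver;    /* single read */

    if (!drv)
        return -1;

    drv->owner->refs++;         /* models try_module_get() succeeding */
    if (drv->rescan)
        drv->rescan(dev);       /* the device may be unbound in here */
    drv->owner->refs--;         /* models module_put() on the snapshot */
    return 0;
}
```

Installing model_unbind as the rescan callback simulates the unbind
arriving in the middle of the method.  A variant that ended with
dev->driver->owner->refs-- would dereference NULL at that point, which
is the failure described for scsi_rescan_device.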