From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1755333AbbAOTkg (ORCPT ); Thu, 15 Jan 2015 14:40:36 -0500
Received: from mail-qg0-f46.google.com ([209.85.192.46]:58692 "EHLO
    mail-qg0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1751423AbbAOTkf (ORCPT );
    Thu, 15 Jan 2015 14:40:35 -0500
Date: Thu, 15 Jan 2015 14:40:31 -0500
From: Tejun Heo
To: Alan Stern
Cc: Christoph Hellwig, Bart Van Assche, James Bottomley, Hannes Reinecke,
    "linux-scsi@vger.kernel.org", Greg Kroah-Hartman,
    Kernel development list
Subject: Re: sysfs methods can race with ->remove
Message-ID: <20150115194031.GE28195@htj.dyndns.org>
References: <20150115160612.GA31446@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Alan.

On Thu, Jan 15, 2015 at 01:22:03PM -0500, Alan Stern wrote:
> > It has a reference to keep it from being freed, but so far I can't find
> > anything that prevents ->remove from being called while we are in or
> > just before a method call.
>
> There are two types of methods to think about: Those registered by the
> subsystem and those registered by the driver.
>
> If a method is registered by the driver, then the driver will
> unregister it when the ->remove routine runs.  I don't know for
> certain, but I would expect that the sysfs/kernfs core will make sure
> that any existing method calls complete before unregister returns.
> This would prevent races.

Yes, attribute deletion blocks until the on-going sysfs read/write
operations have finished, and any further read/write accesses fail.

> If a method is registered by the subsystem, and if the method runs
> entirely within the subsystem's code, then ->remove doesn't matter.
> The driver could be unbound while the method is running and it would be
> okay.
>
> The only time we have a problem is when the method is registered by the
> subsystem and the method calls into the driver.  (Note that this is
> exactly what happens with scsi_rescan_device.)
>
> > > > But this seems like a more generic problem, and at least a quick glance at
> > > > the pci_driver methods seems like others don't have a good
> > > > synchronization of ->remove against random driver methods.
> > >
> > > Can you give one or two examples?
> >
> > I look at the sriov_configure PCI method, or the various sub-methods
> > under pci_driver.err_handler.
>
> The sriov_numvfs_store method does have the same problem, and so does
> the reset_store method (by way of pci_reset_function ->
> pci_dev_save_and_disable -> pci_reset_notify).
>
> Tejun, is my analysis correct?  How should we fix these races?

I'm not really following what the actual problem case is.  So SCSI
subsystem store methods are dereferencing dev->driver without
synchronizing against detach events?  If that's the case, the solution
would be to synchronize against attach/detach events?

Sorry if I'm being totally idiotic.  I'm having a bit of a hard time
jumping right in.  :)

Thanks.

-- 
tejun
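
For illustration only, here is a minimal userspace sketch of what "synchronizing against attach/detach events" could look like. It is a model, not kernel code and not a proposed patch: struct model_device, struct model_driver and every model_* function are hypothetical names, and the pthread mutex merely stands in for a per-device lock such as the one taken with device_lock(dev). The point it tries to show is that the subsystem's store method looks at the driver pointer and calls into the driver only while holding the same lock that bind and unbind take, so ->remove cannot run in the middle of the call.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver with one callback. */
struct model_driver {
    int (*rescan)(int arg);
};

/* Hypothetical stand-in for a device; driver is NULL when unbound. */
struct model_device {
    pthread_mutex_t lock;        /* models a per-device lock */
    struct model_driver *driver;
};

/* Example driver callback used only for the demonstration. */
static int demo_rescan(int arg)
{
    return arg + 1;
}

static struct model_driver demo_driver = { .rescan = demo_rescan };

static void model_device_init(struct model_device *dev)
{
    pthread_mutex_init(&dev->lock, NULL);
    dev->driver = NULL;
}

/* Attach: publish the driver pointer under the device lock. */
static void model_bind(struct model_device *dev, struct model_driver *drv)
{
    pthread_mutex_lock(&dev->lock);
    dev->driver = drv;
    pthread_mutex_unlock(&dev->lock);
}

/* Detach (the ->remove side): clear the pointer under the same lock. */
static void model_unbind(struct model_device *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->driver = NULL;
    pthread_mutex_unlock(&dev->lock);
}

/*
 * Subsystem-registered store method that calls into the driver.
 * The check and the call happen under the lock, so unbind has to
 * wait until the call returns.  Returns -1 when no driver is bound
 * (a real kernel method would more likely return -ENODEV).
 */
static int model_rescan_store(struct model_device *dev, int arg)
{
    int ret = -1;

    pthread_mutex_lock(&dev->lock);
    if (dev->driver && dev->driver->rescan)
        ret = dev->driver->rescan(arg);
    pthread_mutex_unlock(&dev->lock);

    return ret;
}
```

In this model a store that races with unbind either completes before the driver goes away or sees a NULL driver and fails cleanly. Whether a single per-device lock is acceptable for the SCSI and PCI paths named above is a separate question that this sketch does not answer.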