LKML Archive on lore.kernel.org
* [PATCH v2] nvme: avoid race in shutdown namespace removal
@ 2021-09-02  9:20 Daniel Wagner
  2021-09-06  8:01 ` Christoph Hellwig
  2021-09-09 14:09 ` Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread

From: Daniel Wagner @ 2021-09-02  9:20 UTC (permalink / raw)
To: linux-nvme; +Cc: linux-kernel, Daniel Wagner, Hannes Reinecke, Keith Busch

When we remove the siblings entry, we update ns->head->list, hence we
can't separate the removal from the test for being empty. They have to
be in the same critical section to avoid a race.

To avoid reintroducing the refcounting imbalance, add a list empty
check to nvme_find_ns_head.

Fixes: 5396fdac56d8 ("nvme: fix refcounting imbalance when all paths are down")
Cc: Hannes Reinecke <hare@suse.de>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
v2:
 - added nvme_find_ns_head fix as suggested by hch
v1:
 - https://lore.kernel.org/linux-nvme/20210830093618.97657-1-dwagner@suse.de/

 drivers/nvme/host/core.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4a3a33f5f11c..ac9a61d1d011 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3524,7 +3524,9 @@ static struct nvme_ns_head *nvme_find_ns_head(struct nvme_subsystem *subsys,
 	lockdep_assert_held(&subsys->lock);
 
 	list_for_each_entry(h, &subsys->nsheads, entry) {
-		if (h->ns_id == nsid && nvme_tryget_ns_head(h))
+		if (h->ns_id != nsid)
+			continue;
+		if (!list_empty(&h->list) && nvme_tryget_ns_head(h))
 			return h;
 	}
 
@@ -3836,6 +3838,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
 	mutex_lock(&ns->ctrl->subsys->lock);
 	list_del_rcu(&ns->siblings);
+	if (list_empty(&ns->head->list)) {
+		list_del_init(&ns->head->entry);
+		last_path = true;
+	}
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
 	/* guarantee not available in head->list */
@@ -3856,13 +3862,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	list_del_init(&ns->list);
 	up_write(&ns->ctrl->namespaces_rwsem);
 
-	/* Synchronize with nvme_init_ns_head() */
-	mutex_lock(&ns->head->subsys->lock);
-	if (list_empty(&ns->head->list)) {
-		list_del_init(&ns->head->entry);
-		last_path = true;
-	}
-	mutex_unlock(&ns->head->subsys->lock);
 	if (last_path)
 		nvme_mpath_shutdown_disk(ns->head);
 	nvme_put_ns(ns);
-- 
2.29.2
* Re: [PATCH v2] nvme: avoid race in shutdown namespace removal
  2021-09-02  9:20 [PATCH v2] nvme: avoid race in shutdown namespace removal Daniel Wagner
@ 2021-09-06  8:01 ` Christoph Hellwig
  2021-09-06  8:40   ` Hannes Reinecke
  1 sibling, 1 reply; 5+ messages in thread

From: Christoph Hellwig @ 2021-09-06  8:01 UTC (permalink / raw)
To: Daniel Wagner, Hannes Reinecke; +Cc: linux-nvme, linux-kernel, Keith Busch

On Thu, Sep 02, 2021 at 11:20:02AM +0200, Daniel Wagner wrote:
> When we remove the siblings entry, we update ns->head->list, hence we
> can't separate the removal from the test for being empty. They have to
> be in the same critical section to avoid a race.
>
> To avoid reintroducing the refcounting imbalance, add a list empty
> check to nvme_find_ns_head.

Hannes, can you look over this and run your tests on it?
* Re: [PATCH v2] nvme: avoid race in shutdown namespace removal
  2021-09-06  8:01 ` Christoph Hellwig
@ 2021-09-06  8:40   ` Hannes Reinecke
  2021-09-09 10:06     ` Hannes Reinecke
  0 siblings, 1 reply; 5+ messages in thread

From: Hannes Reinecke @ 2021-09-06  8:40 UTC (permalink / raw)
To: Christoph Hellwig, Daniel Wagner; +Cc: linux-nvme, linux-kernel, Keith Busch

On 9/6/21 10:01 AM, Christoph Hellwig wrote:
> On Thu, Sep 02, 2021 at 11:20:02AM +0200, Daniel Wagner wrote:
>> When we remove the siblings entry, we update ns->head->list, hence we
>> can't separate the removal from the test for being empty. They have to
>> be in the same critical section to avoid a race.
>>
>> To avoid reintroducing the refcounting imbalance, add a list empty
>> check to nvme_find_ns_head.
>
> Hannes, can you look over this and run your tests on it?
>
I'm at it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
* Re: [PATCH v2] nvme: avoid race in shutdown namespace removal
  2021-09-06  8:40   ` Hannes Reinecke
@ 2021-09-09 10:06     ` Hannes Reinecke
  0 siblings, 0 replies; 5+ messages in thread

From: Hannes Reinecke @ 2021-09-09 10:06 UTC (permalink / raw)
To: Christoph Hellwig, Daniel Wagner; +Cc: linux-nvme, linux-kernel, Keith Busch

On 9/6/21 10:40 AM, Hannes Reinecke wrote:
> On 9/6/21 10:01 AM, Christoph Hellwig wrote:
>> On Thu, Sep 02, 2021 at 11:20:02AM +0200, Daniel Wagner wrote:
>>> When we remove the siblings entry, we update ns->head->list, hence we
>>> can't separate the removal from the test for being empty. They have to
>>> be in the same critical section to avoid a race.
>>>
>>> To avoid reintroducing the refcounting imbalance, add a list empty
>>> check to nvme_find_ns_head.
>>
>> Hannes, can you look over this and run your tests on it?
>>
> I'm at it.
>
Finally. qemu being its usual bitchy self. But managed to test the
patch, and all looks good.

For reference, the testcase is:

- Create a qemu instance with two NVMe namespaces:

  -device pcie-root-port,id=rp80,bus=pcie.0,chassis=2,addr=8.0,\
   multifunction=on,pref64-reserve=32M \
  -device pcie-root-port,id=rp90,bus=pcie.0,chassis=3,addr=9.0,\
   multifunction=on,pref64-reserve=32M \
  -device nvme-subsys,id=nvme-subsys1,nqn=slesnvmesubsys-1 \
  -device nvme-subsys,id=nvme-subsys2,nqn=slesnvmesubsys-2 \
  -device nvme,bus=rp80,id=nvme-rp80,serial=SLESNVME2,\
   subsys=nvme-subsys1 \
  -device nvme-ns,id=nvme-ns-2,bus=nvme-rp80,drive=nvme-2 \
  -device nvme,bus=rp90,id=nvme-rp90,serial=SLESNVME3,\
   subsys=nvme-subsys2 \
  -device nvme-ns,id=nvme-ns-3,bus=nvme-rp90,drive=nvme-3

- Install the system, and create an MD RAID 1 on those namespaces.

- Enter the qemu monitor, and detach one controller:

  device_del nvme-rp90

- Check in the OS that the device has been removed, and that MD has
  registered the device failure:

  # dmesg
  [ 1801.076236] pcieport 0000:00:09.0: pciehp: Slot(0-2): Attention button pressed
  [ 1801.076251] pcieport 0000:00:09.0: pciehp: Slot(0-2): Powering off due to button press
  [ 1806.250017] block nvme2n1: no available path - failing I/O
  [ 1806.250030] md: super_written gets error=-5
  [ 1806.250036] md/raid1:md1: Disk failure on nvme2n1, disabling device.
                 md/raid1:md1: Operation continuing on 1 devices.

- Enter the qemu monitor, and re-attach the controller:

  device_add bus=rp90,id=nvme-rp90,serial=SLESNVME3,subsys=nvme-subsys2

- Check in the OS that the device has been reattached:

  # dmesg
  [ 1845.634613] pcieport 0000:00:09.0: pciehp: Slot(0-2): Attention button pressed
  [ 1845.634626] pcieport 0000:00:09.0: pciehp: Slot(0-2): Powering on due to button press
  [ 1845.634821] pcieport 0000:00:09.0: pciehp: Slot(0-2): Card present
  [ 1845.634826] pcieport 0000:00:09.0: pciehp: Slot(0-2): Link Up
  [ 1845.770969] pci 0000:03:00.0: [1b36:0010] type 00 class 0x010802
  [ 1845.771307] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
  [ 1845.773503] pci 0000:03:00.0: BAR 0: assigned [mem 0xc1200000-0xc1203fff 64bit]
  [ 1845.773646] pcieport 0000:00:09.0: PCI bridge to [bus 03]
  [ 1845.773671] pcieport 0000:00:09.0: bridge window [io 0x7000-0x7fff]
  [ 1845.776926] pcieport 0000:00:09.0: bridge window [mem 0xc1200000-0xc13fffff]
  [ 1845.778816] pcieport 0000:00:09.0: bridge window [mem 0x804000000-0x805ffffff 64bit pref]
  [ 1845.783970] nvme nvme2: pci function 0000:03:00.0
  [ 1845.784227] nvme 0000:03:00.0: enabling device (0000 -> 0002)
  [ 1845.798918] nvme nvme2: 1/0/0 default/read/poll queues

- Reattach the namespace to the MD RAID:

  # mdadm --manage /dev/md1 --re-add /dev/nvme2n1

- Check that everything worked:

  # cat /proc/mdstat
  Personalities : [raid1]
  md1 : active raid1 nvme2n1[1] nvme1n1[0]
        4189184 blocks super 1.2 [2/2] [UU]
        bitmap: 0/1 pages [0KB], 65536KB chunk

  unused devices: <none>

So you can add:

Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
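The final verification step above can also be scripted. A minimal
sketch, hypothetical and self-contained: it parses an inlined sample of
the /proc/mdstat output from the test rather than the live file, and
treats the "[UU]" marker as the sign that both mirror legs are in sync:

```shell
#!/bin/sh
# Sample of the mdstat output seen after the successful re-add; on a
# real system you would read /proc/mdstat instead of this variable.
mdstat='md1 : active raid1 nvme2n1[1] nvme1n1[0]
      4189184 blocks super 1.2 [2/2] [UU]'

# "[UU]" means both members up; "[_U]" or "[U_]" would mean degraded.
if printf '%s\n' "$mdstat" | grep -q '\[UU\]'; then
    echo "md1: all mirrors in sync"
else
    echo "md1: degraded"
fi
```

Such a check is handy at the end of an automated hotplug test loop,
where a degraded array after the re-add would indicate the race had
been hit.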
* Re: [PATCH v2] nvme: avoid race in shutdown namespace removal
  2021-09-02  9:20 [PATCH v2] nvme: avoid race in shutdown namespace removal Daniel Wagner
  2021-09-06  8:01 ` Christoph Hellwig
@ 2021-09-09 14:09 ` Christoph Hellwig
  1 sibling, 0 replies; 5+ messages in thread

From: Christoph Hellwig @ 2021-09-09 14:09 UTC (permalink / raw)
To: Daniel Wagner; +Cc: linux-nvme, linux-kernel, Hannes Reinecke, Keith Busch

Thanks,

applied to nvme-5.15.