LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Matteo Croce <mcroce@linux.microsoft.com>
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>
Cc: linux-kernel@vger.kernel.org,
	"Lennart Poettering" <lennart@poettering.net>,
	"Luca Boccassi" <bluca@debian.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Damien Le Moal" <damien.lemoal@wdc.com>,
	"Tejun Heo" <tj@kernel.org>,
	"Javier González" <javier@javigon.com>,
	"Niklas Cassel" <niklas.cassel@wdc.com>,
	"Johannes Thumshirn" <johannes.thumshirn@wdc.com>,
	"Hannes Reinecke" <hare@suse.de>,
	"Matthew Wilcox" <willy@infradead.org>,
	JeffleXu <jefflexu@linux.alibaba.com>
Subject: [PATCH v5 1/6] block: add disk sequence number
Date: Tue, 13 Jul 2021 01:05:25 +0200	[thread overview]
Message-ID: <20210712230530.29323-2-mcroce@linux.microsoft.com> (raw)
In-Reply-To: <20210712230530.29323-1-mcroce@linux.microsoft.com>

From: Matteo Croce <mcroce@microsoft.com>

Associating uevents with block devices in userspace is difficult and racy:
the uevent netlink socket is lossy, and on slow and overloaded systems
has a very high latency.
Block devices do not have exclusive owners in userspace, any process can
set one up (e.g. loop devices). Moreover, device names can be reused
(e.g. loop0 can be reused again and again). A userspace process setting
up a block device and watching for its events cannot thus reliably tell
whether an event relates to the device it just set up or another earlier
instance with the same name.

Being able to set a UUID on a loop device would solve the race conditions.
But it does not allow to derive orderings from uevents: if you see a
uevent with a UUID that does not match the device you are waiting for,
you cannot tell whether it's because the right uevent has not arrived yet,
or it was already sent and you missed it. So you cannot tell whether you
should wait for it or not.

Associating a unique, monotonically increasing sequential number to the
lifetime of each block device, which can be retrieved with an ioctl
immediately upon setting it up, allows to solve the race conditions with
uevents, and also allows userspace processes to know whether they should
wait for the uevent they need or if it was dropped and thus they should
move on.

Additionally, increment the disk sequence number when the media change,
i.e. on DISK_EVENT_MEDIA_CHANGE event.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
---
 block/disk-events.c   |  3 +++
 block/genhd.c         | 24 ++++++++++++++++++++++++
 include/linux/genhd.h |  2 ++
 3 files changed, 29 insertions(+)

diff --git a/block/disk-events.c b/block/disk-events.c
index a75931ff5da4..04c52f3992ed 100644
--- a/block/disk-events.c
+++ b/block/disk-events.c
@@ -190,6 +190,9 @@ static void disk_check_events(struct disk_events *ev,
 
 	spin_unlock_irq(&ev->lock);
 
+	if (events & DISK_EVENT_MEDIA_CHANGE)
+		inc_diskseq(disk);
+
 	/*
 	 * Tell userland about new events.  Only the events listed in
 	 * @disk->events are reported, and only if DISK_EVENT_FLAG_UEVENT
diff --git a/block/genhd.c b/block/genhd.c
index af4d2ab4a633..0be32dbe97bb 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -29,6 +29,23 @@
 
 static struct kobject *block_depr;
 
+/*
+ * Unique, monotonically increasing sequential number associated with block
+ * devices instances (i.e. incremented each time a device is attached).
+ * Associating uevents with block devices in userspace is difficult and racy:
+ * the uevent netlink socket is lossy, and on slow and overloaded systems has
+ * a very high latency.
+ * Block devices do not have exclusive owners in userspace, any process can set
+ * one up (e.g. loop devices). Moreover, device names can be reused (e.g. loop0
+ * can be reused again and again).
+ * A userspace process setting up a block device and watching for its events
+ * cannot thus reliably tell whether an event relates to the device it just set
+ * up or another earlier instance with the same name.
+ * This sequential number allows userspace processes to solve this problem, and
+ * uniquely associate an uevent to the lifetime to a device.
+ */
+static atomic64_t diskseq;
+
 /* for extended dynamic devt allocation, currently only one major is used */
 #define NR_EXT_DEVT		(1 << MINORBITS)
 static DEFINE_IDA(ext_devt_ida);
@@ -1263,6 +1280,8 @@ struct gendisk *__alloc_disk_node(int minors, int node_id)
 	disk_to_dev(disk)->class = &block_class;
 	disk_to_dev(disk)->type = &disk_type;
 	device_initialize(disk_to_dev(disk));
+	inc_diskseq(disk);
+
 	return disk;
 
 out_destroy_part_tbl:
@@ -1363,3 +1382,8 @@ int bdev_read_only(struct block_device *bdev)
 	return bdev->bd_read_only || get_disk_ro(bdev->bd_disk);
 }
 EXPORT_SYMBOL(bdev_read_only);
+
+void inc_diskseq(struct gendisk *disk)
+{
+	disk->diskseq = atomic64_inc_return(&diskseq);
+}
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 13b34177cc85..140c028845af 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -172,6 +172,7 @@ struct gendisk {
 	int node_id;
 	struct badblocks *bb;
 	struct lockdep_map lockdep_map;
+	u64 diskseq;
 };
 
 /*
@@ -332,6 +333,7 @@ static inline void bd_unlink_disk_holder(struct block_device *bdev,
 #endif /* CONFIG_SYSFS */
 
 dev_t part_devt(struct gendisk *disk, u8 partno);
+void inc_diskseq(struct gendisk *disk);
 dev_t blk_lookup_devt(const char *name, int partno);
 void blk_request_module(dev_t devt);
 #ifdef CONFIG_BLOCK
-- 
2.31.1


  reply	other threads:[~2021-07-12 23:07 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-12 23:05 [PATCH v5 0/5] block: add a sequence number to disks Matteo Croce
2021-07-12 23:05 ` Matteo Croce [this message]
2021-07-12 23:05 ` [PATCH v5 2/6] block: export the diskseq in uevents Matteo Croce
2021-07-12 23:05 ` [PATCH v5 3/6] block: add ioctl to read the disk sequence number Matteo Croce
2021-07-12 23:05 ` [PATCH v5 4/6] block: export diskseq in sysfs Matteo Croce
2021-07-12 23:05 ` [PATCH v5 5/6] block: add a helper to raise a media changed event Matteo Croce
2021-07-12 23:05 ` [PATCH v5 6/6] loop: raise media_change event Matteo Croce
2021-07-13  6:03   ` Christoph Hellwig
2021-07-20 17:27 ` [PATCH v5 0/5] block: add a sequence number to disks Luca Boccassi
2021-07-22 11:41   ` Matteo Croce
2021-07-28 19:01   ` Lennart Poettering
2021-07-28 19:22 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210712230530.29323-2-mcroce@linux.microsoft.com \
    --to=mcroce@linux.microsoft.com \
    --cc=axboe@kernel.dk \
    --cc=bluca@debian.org \
    --cc=damien.lemoal@wdc.com \
    --cc=hare@suse.de \
    --cc=hch@infradead.org \
    --cc=javier@javigon.com \
    --cc=jefflexu@linux.alibaba.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=lennart@poettering.net \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=niklas.cassel@wdc.com \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --subject='Re: [PATCH v5 1/6] block: add disk sequence number' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).