From: Ming Lei <>
To: Matthew Wilcox <>
Cc: linux-mm <>,
	Linux FS Devel <>,
	linux-block <>
Subject: Re: [LSF/MM TOPIC] A high-performance userspace block driver
Date: Mon, 22 Jan 2018 20:18:06 +0800
Message-ID: <>
In-Reply-To: <>

On Thu, Jan 18, 2018 at 5:21 AM, Matthew Wilcox <> wrote:
> On Wed, Jan 17, 2018 at 10:49:24AM +0800, Ming Lei wrote:
>> Userfaultfd might be another choice:
>> 1) map the block LBA space into a range of process vm space
> That would limit the size of a block device to ~200TB (with my laptop's
> CPU).  That's probably OK for most users, but I suspect there are some
> who would chafe at such a restriction (before the 57-bit CPUs arrive).

In theory it shouldn't be an issue, since the LBA space can be partitioned
across more than one process's VM space; no matter how large the block
device is, this approach should still work.
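
As a rough, untested sketch of what each process might do to claim one
slice (the slice size and all names below are illustrative, not from any
real driver):

/*
 * Untested sketch: reserve an anonymous VM range standing in for one
 * slice of the device's LBA space and register it with userfaultfd,
 * so accesses to the range trap out to this process.
 */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        size_t slice = 1UL << 30;       /* 1 GiB slice of the LBA space */

        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        if (uffd < 0) { perror("userfaultfd"); return 1; }

        struct uffdio_api api = { .api = UFFD_API };
        if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }

        void *base = mmap(NULL, slice, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        struct uffdio_register reg = {
                .range = { .start = (unsigned long)base, .len = slice },
                .mode  = UFFDIO_REGISTER_MODE_MISSING,
        };
        if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
                perror("UFFDIO_REGISTER");
                return 1;
        }

        printf("slice mapped at %p, faults routed to fd %d\n", base, uffd);
        return 0;
}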

>> 2) when a READ/WRITE req comes, convert it to a page fault on the
>> mapped range, let userland take control of it, and meanwhile the
>> kernel req context sleeps
> You don't want to sleep the request; you want it to be able to submit
> more I/O.  But we have infrastructure in place to inform the submitter
> when I/Os have completed.

Yes, the current bio completion (bi_end_io) model can be preserved, and
this question (where to sleep) may depend on UFFD's read/POLLIN protocol.
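
For example, the userspace side could sit in the classic UFFD event
loop; an untested sketch follows, where handle_read()/handle_write()
are placeholders for whatever the server actually does with the
request:

/*
 * Untested sketch of step 2) from userspace: block on the uffd until
 * a fault arrives, then dispatch on whether the faulting access was a
 * write (i.e. a WRITE request) or a read.
 */
#include <linux/userfaultfd.h>
#include <poll.h>
#include <unistd.h>

static void handle_read(unsigned long addr)  { (void)addr; /* serve READ */ }
static void handle_write(unsigned long addr) { (void)addr; /* serve WRITE */ }

static void serve_faults(int uffd)
{
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        struct uffd_msg msg;

        for (;;) {
                if (poll(&pfd, 1, -1) < 0)
                        break;
                if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
                        continue;
                if (msg.event != UFFD_EVENT_PAGEFAULT)
                        continue;
                if (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
                        handle_write(msg.arg.pagefault.address);
                else
                        handle_read(msg.arg.pagefault.address);
        }
}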

>> 3) the IO req context on the kernel side is woken up after userspace
>> completes the IO request via userfaultfd
>> 4) the kernel side continues to complete the IO, such as copying pages
>> from the storage range to the req (bio) pages.
>> Seems READ should be fine since it is very similar to the use case
>> of QEMU postcopy live migration; WRITE can be a bit different, and
>> may need some changes to userfaultfd.
> I like this idea, and maybe extending UFFD is the way to solve this
> problem.  Perhaps I should explain a little more what the requirements
> are.  At the point the driver gets the I/O, pages to copy data into (for
> a read) or copy data from (for a write) have already been allocated.
> At all costs, we need to avoid playing VM tricks (because TLB flushes
> are expensive).  So one copy is probably OK, but we'd like to avoid it
> if reasonable.

I agree, and a single page copy would be easier to implement.
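
With UFFDIO_COPY, the data copy and the wakeup of the sleeping kernel
context (step 3) collapse into one ioctl. An untested sketch for the
READ side, where fault_addr and buf are whatever the event loop above
produced:

/*
 * Untested sketch of the single-copy READ completion: once userspace
 * has fetched the data into buf, UFFDIO_COPY installs the page at the
 * fault address and wakes the sleeping kernel context in one ioctl.
 */
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

static int complete_read(int uffd, unsigned long fault_addr,
                         void *buf, unsigned long page_size)
{
        struct uffdio_copy copy = {
                .dst  = fault_addr & ~(page_size - 1),  /* page-aligned */
                .src  = (unsigned long)buf,
                .len  = page_size,
                .mode = 0,      /* 0: wake the faulting context */
        };
        return ioctl(uffd, UFFDIO_COPY, &copy);
}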

> Let's assume that the userspace program looks at the request metadata and
> decides that it needs to send a network request.  Ideally, it would find
> a way to have the data from the response land in the pre-allocated pages
> (for a read) or send the data straight from the pages in the request
> (for a write).  I'm not sure UFFD helps us with that part of the problem.

Ming Lei


Thread overview: 11+ messages
2018-01-16 14:52 Matthew Wilcox
2018-01-16 23:04 ` Viacheslav Dubeyko
2018-01-16 23:23 ` Theodore Ts'o
2018-01-16 23:28   ` [Lsf-pc] " James Bottomley
2018-01-16 23:57     ` Bart Van Assche
2018-01-17  0:41 ` Bart Van Assche
2018-01-17  2:49 ` Ming Lei
2018-01-17 21:21   ` Matthew Wilcox
2018-01-22 12:02     ` Mike Rapoport
2018-01-22 12:18     ` Ming Lei [this message]
2018-01-18  5:27 ` Figo.zhang
