From: Matthew Wilcox
To: Ming Lei
Cc: linux-mm, Linux FS Devel, linux-block
Subject: Re: [LSF/MM TOPIC] A high-performance userspace block driver
Date: Wed, 17 Jan 2018 13:21:44 -0800

On Wed, Jan 17, 2018 at 10:49:24AM +0800, Ming Lei wrote:
> Userfaultfd might be another choice:
> 1) map the block LBA space into a range of process vm space

That would limit the size of a block device to ~200TB (with my laptop's
CPU).  That's probably OK for most users, but I suspect there are some
who would chafe at such a restriction (before the 57-bit CPUs arrive).
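
For reference, the arithmetic behind that kind of limit can be sketched
as below.  This assumes x86-64, where 4-level paging gives 48-bit virtual
addresses (47 bits for userspace under Linux) and 5-level paging gives
57 bits (56 for userspace).  The ~200TB figure above is a rough estimate
and this message does not say which width it is based on; va_bytes() and
to_tib() are names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with `bits` bits of virtual address. */
uint64_t va_bytes(unsigned bits)
{
    assert(bits < 64);              /* a shift by 64 would be undefined */
    return UINT64_C(1) << bits;
}

/* Convert a byte count to whole TiB (1 TiB = 2^40 bytes). */
uint64_t to_tib(uint64_t bytes)
{
    return bytes >> 40;
}
```

Under these assumptions to_tib(va_bytes(47)) is 128 and
to_tib(va_bytes(56)) is 65536 (64 PiB), before subtracting whatever else
the process already has mapped.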

> 2) when a READ/WRITE req comes, convert it to a page fault on the
> mapped range and let userland take control of it; meanwhile the
> kernel req context sleeps

You don't want to sleep the request; you want the submitter to be able
to submit more I/O.  We already have infrastructure in place to inform
the submitter when I/Os have completed.
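
To make the "don't sleep, notify on completion" model concrete, here is
a minimal userspace sketch.  Everything in it (struct request,
struct queue, submit(), complete_one(), count_done()) is hypothetical
and is not the kernel's block layer API; it only illustrates that
submission returns immediately and completion is reported through a
callback.

```c
#include <assert.h>
#include <stddef.h>

struct request;
typedef void (*done_fn)(struct request *rq, int status);

/* Hypothetical request; not the kernel's struct request. */
struct request {
    unsigned long sector;   /* starting LBA */
    int status;             /* filled in on completion */
    done_fn done;           /* called when the I/O finishes */
    void *private;          /* submitter's context */
    struct request *next;
};

struct queue {
    struct request *head, *tail;
    size_t inflight;
};

/* Queue the request and return at once; the submitter never sleeps here. */
void submit(struct queue *q, struct request *rq)
{
    assert(rq->done != NULL);
    rq->next = NULL;
    if (q->tail)
        q->tail->next = rq;
    else
        q->head = rq;
    q->tail = rq;
    q->inflight++;
}

/* Finish the oldest request and notify its submitter.  Returns 0 if idle. */
int complete_one(struct queue *q, int status)
{
    struct request *rq = q->head;

    if (!rq)
        return 0;
    q->head = rq->next;
    if (!q->head)
        q->tail = NULL;
    q->inflight--;
    rq->status = status;
    rq->done(rq, status);
    return 1;
}

/* Example callback: bump a counter owned by the submitter. */
void count_done(struct request *rq, int status)
{
    (void)status;
    ++*(int *)rq->private;
}
```

A submitter can call submit() several times back to back and only learns
about results when complete_one() invokes its callback, so several
requests can be in flight at once.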

> 3) the IO req context on the kernel side is woken up after userspace
> completes the IO request via userfaultfd
> 4) the kernel side continues to complete the IO, such as copying pages
> from the storage range to the req (bio) pages.
> Seems READ should be fine since it is very similar to the use case
> of QEMU postcopy live migration; WRITE can be a bit different and
> may need some changes to userfaultfd.

I like this idea, and maybe extending UFFD is the way to solve this
problem.  Perhaps I should explain a little more what the requirements
are.  At the point the driver gets the I/O, pages to copy data into (for
a read) or copy data from (for a write) have already been allocated.
At all costs, we need to avoid playing VM tricks (because TLB flushes
are expensive).  So one copy is probably OK, but we'd like to avoid it
if reasonable.

Let's assume that the userspace program looks at the request metadata and
decides that it needs to send a network request.  Ideally, it would find
a way to have the data from the response land in the pre-allocated pages
(for a read) or send the data straight from the pages in the request
(for a write).  I'm not sure UFFD helps us with that part of the problem.
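
As an illustration of the read half only: once a userspace program has
addresses for the destination pages, a scatter read with readv(2) can
place the response directly in them without a bounce buffer.  The sketch
below assumes ordinary userspace buffers standing in for the request's
pages and a pipe standing in for the network socket; how the real
pre-allocated pages would be exposed to userspace is exactly the open
question.  read_into_pages(), demo_pipe_roundtrip() and MAX_SEGS are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_SEGS 16

/*
 * Read from fd straight into the caller's pages with one readv(2) call.
 * Returns the number of bytes read (possibly short) or -1 on error.
 */
ssize_t read_into_pages(int fd, void *const pages[], size_t npages,
                        size_t page_size)
{
    struct iovec iov[MAX_SEGS];

    assert(npages > 0 && npages <= MAX_SEGS);
    for (size_t i = 0; i < npages; i++) {
        iov[i].iov_base = pages[i];
        iov[i].iov_len = page_size;
    }
    return readv(fd, iov, (int)npages);
}

/* Usage sketch: fill two "pages" from a pipe.  Returns 0 on success. */
int demo_pipe_roundtrip(void)
{
    enum { PAGE = 4096, NPAGES = 2 };
    static unsigned char src[PAGE * NPAGES], p0[PAGE], p1[PAGE];
    void *pages[NPAGES] = { p0, p1 };
    int fds[2];
    ssize_t n;

    for (size_t i = 0; i < sizeof(src); i++)
        src[i] = (unsigned char)(i % 251);
    if (pipe(fds) != 0)
        return -1;
    n = write(fds[1], src, sizeof(src));
    if (n == (ssize_t)sizeof(src))
        n = read_into_pages(fds[0], pages, NPAGES, PAGE);
    else
        n = -1;
    close(fds[0]);
    close(fds[1]);
    if (n != (ssize_t)sizeof(src))
        return -1;
    /* Each page must hold its own slice of the stream, in order. */
    if (memcmp(p0, src, PAGE) != 0 || memcmp(p1, src + PAGE, PAGE) != 0)
        return -1;
    return 0;
}
```

A real driver would also have to handle short reads and the write
direction (writev(2) from the request's pages); neither is shown here.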

