Linux-Fsdevel Archive on
From: Dave Chinner <>
To: Matthew Wilcox <>
Subject: Re: Support for I/O to a bitbucket
Date: Mon, 7 Sep 2020 10:56:42 +1000
Message-ID: <20200907005642.GN12096@dread.disaster.area>
In-Reply-To: <>

On Tue, Aug 18, 2020 at 06:22:31PM +0100, Matthew Wilcox wrote:
> One of the annoying things in the iomap code is how we handle
> block-misaligned I/Os.  Consider a write to a file on a 4KiB block size
> filesystem (on a 4KiB page size kernel) which starts at byte offset 5000
> and is 4133 bytes long.
> Today, we allocate page 1 and read bytes 4096-8191 of the file into
> it, synchronously.  Then we allocate page 2 and read bytes 8192-12287
> into it, again, synchronously.  Then we copy the user's data into the
> pagecache and mark it dirty.  This is a fairly significant delay for
> the user who normally sees the latency of a memcpy() now has to wait
> for two non-overlapping reads to complete.
> What I'd love to be able to do is allocate pages 1 & 2, copy the user
> data into it and submit one read which targets:
> 0-903: page 1, offset 0, length 904
> 904-5036: bitbucket, length 4133
> 5037-8191: page 2, offset 941, length 3155
> That way, we don't even need to wait for the read to complete.
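
For reference, the arithmetic behind that three-way split can be sketched
in userspace C. This is only an illustration, not kernel code: split_read(),
struct read_seg and BITBUCKET are hypothetical names, and it assumes block
size equals page size (4KiB) as in the example. Page indices are file
offsets divided by the page size.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096u  /* assumed: block size == page size == 4KiB */
#define BITBUCKET  (-1)   /* hypothetical marker: discard these bytes */

/* One piece of the single read: lands in a page or is thrown away. */
struct read_seg {
	size_t read_off;  /* offset of this piece within the read */
	size_t len;       /* length of this piece in bytes */
	long   page;      /* destination page index, or BITBUCKET */
	size_t page_off;  /* offset within the destination page */
};

/*
 * Split the block-aligned read that covers a write of len bytes at file
 * offset pos into head / bitbucket / tail. Returns the number of
 * segments stored in out[] (at most 3).
 */
static size_t split_read(size_t pos, size_t len, struct read_seg out[3])
{
	size_t start = pos - pos % BLOCK_SIZE;  /* round down to a block */
	size_t wend  = pos + len;               /* first byte after the write */
	size_t end   = (wend + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
	size_t n = 0;

	assert(len > 0);

	/* Head: bytes in the first block that precede the user data. */
	if (pos > start)
		out[n++] = (struct read_seg){ 0, pos - start,
					      (long)(start / BLOCK_SIZE), 0 };

	/* Middle: the range the user is overwriting, so discard it. */
	out[n++] = (struct read_seg){ pos - start, len, BITBUCKET, 0 };

	/* Tail: bytes in the last block that follow the user data. */
	if (end > wend)
		out[n++] = (struct read_seg){ wend - start, end - wend,
					      (long)(wend / BLOCK_SIZE),
					      wend % BLOCK_SIZE };
	return n;
}
```

For pos = 5000 and len = 4133 this yields a 904 byte head at offset 0 of
page 1, a 4133 byte bitbucket range, and a 3155 byte tail at offset 941 of
page 2, for a total read of 8192 bytes.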

I'm not sure that offloading the page cache's job of isolating
unaligned IO from the block layer to the block layer is the right
way to do this.

Essentially you are moving the RMW down in the block layer where it
will have to allocate memory to do IO on sector based boundaries so
it doesn't trash the data you've already copied into the pages in
the bio.

Either way, you need a secondary buffer to do this - one for the
read IO to DMA into with sector alignment, the other to contain the
user data that is single-byte aligned.

This seems to me like it could be done entirely at the iomap level
just by linking the async read IO buffer back to the page cache page
and holding the "data to copy in" state in a struct attached to the
async IO buffer's page->private. It adds a little complexity to the
read IO completion (i.e. iomap_read_finish()), but it's no worse
than anything we do with write IO completions...
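
As a rough userspace illustration of that completion step (one possible
reading of the idea, not the actual iomap code): assume the async read
lands in a separate sector-aligned buffer, and a small record describes
which bytes of the page cache page already hold user data. The names
struct pending_write and read_finish_merge() are hypothetical; in the
kernel the record would hang off page->private and the merge would run
from the read completion path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u

/* Hypothetical "data to copy in" state attached to the read buffer. */
struct pending_write {
	size_t off;  /* first byte of the page that holds user data */
	size_t len;  /* number of user bytes already in the page */
};

/*
 * Run when the async read into rbuf has completed. Fill in only the
 * parts of the page cache page that the user did not overwrite, so the
 * freshly written data is never clobbered by stale on-disk contents.
 */
static void read_finish_merge(unsigned char *page,
			      const unsigned char *rbuf,
			      const struct pending_write *pw)
{
	size_t end = pw->off + pw->len;

	assert(end <= PAGE_SZ);

	/* Bytes before the user data come from disk. */
	if (pw->off > 0)
		memcpy(page, rbuf, pw->off);

	/* Bytes after the user data come from disk. */
	if (end < PAGE_SZ)
		memcpy(page + end, rbuf + end, PAGE_SZ - end);
}
```

With the write from the example, page 1 would carry { .off = 904,
.len = 3192 } and page 2 would carry { .off = 0, .len = 941 }, and the
write itself returns as soon as the user data has been copied in.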

And if the two pages are adjacent like the above, it could be done
with a single async read, or even two separate async reads that
get merged into one IO at the block layer via plugging...
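
The contiguity condition behind such a merge can be sketched as below.
This is a toy model under stated assumptions, not the block layer's
implementation: struct read_req and try_back_merge() are hypothetical
names, and sectors are assumed to be 512 bytes, so one 4KiB page is 8
sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy read request: a run of 512-byte sectors on disk. */
struct read_req {
	uint64_t sector;      /* first sector of the request */
	uint32_t nr_sectors;  /* number of sectors to read */
};

/*
 * If b starts exactly where a ends, grow a to cover both and return
 * true. Otherwise leave a untouched and return false.
 */
static bool try_back_merge(struct read_req *a, const struct read_req *b)
{
	assert(a && b);

	if (a->sector + a->nr_sectors != b->sector)
		return false;

	a->nr_sectors += b->nr_sectors;
	return true;
}
```

If the two pages from the example map to adjacent disk blocks, say
sectors 8-15 and 16-23, the two page-sized reads collapse into one
16-sector request.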

> Anyway, I don't have time to take on this work, but I thought I'd throw
> it out in case anyone's looking for a project.  Or if it's a stupid idea,
> someone can point out why.

I think it's pretty straightforward to do it in the iomap layer...


Dave Chinner
