From: Nick Piggin <>
To: Christoph Hellwig <>,
	Mark Fasheh <>,
	Linux Filesystems <>,
	Linux Kernel <>,
	Andrew Morton <>
Subject: Re: [patch 2/3] fs: introduce perform_write aop
Date: Wed, 14 Mar 2007 14:30:24 +0100

On Sat, Mar 10, 2007 at 09:25:41AM +0000, Christoph Hellwig wrote:
> On Fri, Mar 09, 2007 at 03:33:01PM -0800, Mark Fasheh wrote:
> > ->kernel_write() as opposed to genericizing ->perform_write() would be fine
> > with me. Just so long as we get rid of ->prepare_write and ->commit_write in
> > that other kernel code doesn't call them directly. That interface just
> > doesn't work for Ocfs2.
> It doesn't work for any filesystem that needs slightly fancy locking.
> That and the reason that's an interface that doesn't fit into our
> layering is why I want to get rid of it.  Note that fops->kernel_write
> might in fact use ->perform_write with an actor as Nick suggested.
> I'm not quite sure how it'll look like - I'd rather take care of the
> buffered write path first and then handle this issue once the first
> changes have stabilized.

So I've tried a different approach - the 2-op API rather than an actor.

perform_write stays around as a higher performance API, but it isn't
required if the filesystem implements the 2-op API. I've called them
write_begin/write_end for now.

There are a few upshots to doing this rather than the actor approach.
First of all, this is what callers expect: they want to write into the
page directly rather than write an actor.

More importantly, it allows us to implement generic block versions of
the API, which are much more reusable than block_perform_write (which was
basically useless for anything more than ext2).

The API calls for the filesystem to find and lock the page itself, and
pass that up to the caller, as well as handle short-writes properly, so
we can solve this deadlock properly.
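
As a rough illustration of that contract, here is a userspace toy sketch,
not the code from the attached patches. Every name in it (toy_page,
toy_write_begin, toy_write_end, toy_write, TOY_PAGE_SIZE) is hypothetical,
and the src_avail parameter is only an assumed stand-in for a user copy
that faults part-way through:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-in for a pagecache page; not the kernel's struct page. */
struct toy_page {
	bool locked;
	size_t committed;	/* bytes known to hold valid data */
	char data[TOY_PAGE_SIZE];
};

/* write_begin: the filesystem finds and locks the page, then hands it up. */
static struct toy_page *toy_write_begin(struct toy_page *cache, size_t index)
{
	struct toy_page *page = &cache[index];

	assert(!page->locked);
	page->locked = true;
	return page;
}

/*
 * write_end: told how many bytes were requested (len) and how many the
 * caller actually copied. Only the copied bytes are committed, so a
 * short write is reported to the caller instead of being hidden.
 */
static size_t toy_write_end(struct toy_page *page, size_t offset,
			    size_t len, size_t copied)
{
	assert(page->locked && copied <= len);
	if (offset + copied > page->committed)
		page->committed = offset + copied;
	page->locked = false;
	return copied;
}

/* Caller side: write straight into the locked page; the copy may be short. */
static size_t toy_write(struct toy_page *cache, size_t index, size_t offset,
			const char *src, size_t len, size_t src_avail)
{
	struct toy_page *page;
	size_t copied = len < src_avail ? len : src_avail;

	assert(offset + len <= TOY_PAGE_SIZE);
	page = toy_write_begin(cache, index);
	memcpy(page->data + offset, src, copied);
	return toy_write_end(page, offset, len, copied);
}
```

The point of the sketch is only the shape of the calls: the page comes back
locked from the begin step, and the end step learns the real copied length.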

The nice thing about this is that write_begin is basically a single
page case of the first half (before the iovec copy) of perform_write,
and write_end is the single page case of the second half (after the
copy). So any filesystem that implements perform_write should be able
to reuse write_begin/write_end components.
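
To make the split concrete, here is a minimal userspace sketch of a generic
multi-page write loop built only from the two single-page hooks. It is a
toy model under stated assumptions, not the patch itself: the toy_aops
table, toy_file, and toy_generic_perform_write are hypothetical names, and
the signatures are simplified compared to anything a real aop would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8
#define TOY_NR_PAGES  4

struct toy_page { int locked; char data[TOY_PAGE_SIZE]; };
struct toy_file { struct toy_page pages[TOY_NR_PAGES]; size_t size; };

/* Hypothetical ops table: only the two single-page hooks are required. */
struct toy_aops {
	struct toy_page *(*write_begin)(struct toy_file *f, size_t pos,
					size_t len);
	size_t (*write_end)(struct toy_file *f, struct toy_page *page,
			    size_t pos, size_t copied);
};

/* First half (before the copy): find and lock the page covering pos. */
static struct toy_page *toy_write_begin(struct toy_file *f, size_t pos,
					size_t len)
{
	struct toy_page *page;

	assert(pos / TOY_PAGE_SIZE < TOY_NR_PAGES);
	assert(pos % TOY_PAGE_SIZE + len <= TOY_PAGE_SIZE);
	page = &f->pages[pos / TOY_PAGE_SIZE];
	assert(!page->locked);
	page->locked = 1;
	return page;
}

/* Second half (after the copy): update the file size, unlock the page. */
static size_t toy_write_end(struct toy_file *f, struct toy_page *page,
			    size_t pos, size_t copied)
{
	if (pos + copied > f->size)
		f->size = pos + copied;
	page->locked = 0;
	return copied;
}

static const struct toy_aops toy_ops = { toy_write_begin, toy_write_end };

/* The multi-page write is just the single-page pair applied in a loop. */
static size_t toy_generic_perform_write(struct toy_file *f,
					const struct toy_aops *ops,
					size_t pos, const char *buf,
					size_t count)
{
	size_t written = 0;

	while (written < count) {
		size_t offset = pos % TOY_PAGE_SIZE;
		size_t len = TOY_PAGE_SIZE - offset;
		struct toy_page *page;
		size_t done;

		if (len > count - written)
			len = count - written;
		page = ops->write_begin(f, pos, len);
		memcpy(page->data + offset, buf + written, len);
		done = ops->write_end(f, page, pos, len);
		if (done == 0)
			break;	/* no progress: stop rather than spin */
		pos += done;
		written += done;
	}
	return written;
}
```

A filesystem with its own perform_write could still share the two helper
bodies, which is the reuse described above.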

Anyway, I'm attaching the top patches of my stack (underneath are the
initial patches to solve the prepare_write deadlock -- I'll repost the
complete set once I get some more feedback). Sorry, it is a bit
under-commented and not stress tested. However, it does boot and run with
ext2/3/shmem and other simple filesystems.

Mark has been providing some helpful advice about the new 2-op interface,
but it is still pretty much up in the air.

Also note that a _lot_ of crud is there to support prepare_write (e.g.
pagecache_write_begin/pagecache_write_end basically become one-liners
once we get rid of the old API).

I think I should throw this out for comments before investing too
much more time, in case everyone hates it ;)

Comments much appreciated...
