LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [patch 0/3] add perform_write to a_ops
@ 2008-02-04 17:04 Miklos Szeredi
2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm
a_ops->perform_write() was left out from Nick Piggin's new a_ops
patchset, as it was non-essential, and postponed for later inclusion.
This short series reintroduces it, but only adds the fuse
implementation and not simple_perform_write(), which I'm not sure
would be a significant improvement.
This allows larger than 4k buffered writes for fuse, which is one of
the most requested features.
This goes on top of the "fuse: writable mmap" patches.
Thanks,
Miklos
--
^ permalink raw reply [flat|nested] 7+ messages in thread
* [patch 1/3] vfs: introduce perform_write in a_ops
2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
@ 2008-02-04 17:04 ` Miklos Szeredi
2008-02-04 17:04 ` [patch 2/3] fuse: clean up setting i_size in write Miklos Szeredi
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm, Nick Piggin
[-- Attachment #1: perform_write.patch --]
[-- Type: text/plain, Size: 3534 bytes --]
From: Nick Piggin <npiggin@suse.de>
Introduce a new perform_write() address space operation.
This is a single-call, bulk version of write_begin/write_end
operations. It is only used in the buffered write path (write_begin
must still be implemented), and not for in-kernel writes to pagecache.
For some filesystems, using this can provide significant speedups.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h 2008-02-04 15:24:03.000000000 +0100
+++ linux/include/linux/fs.h 2008-02-04 16:24:19.000000000 +0100
@@ -469,6 +469,9 @@ struct address_space_operations {
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
+ ssize_t (*perform_write)(struct file *, struct address_space *mapping,
+ struct iov_iter *i, loff_t pos);
+
/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidatepage) (struct page *, unsigned long);
Index: linux/mm/filemap.c
===================================================================
--- linux.orig/mm/filemap.c 2008-02-04 15:24:03.000000000 +0100
+++ linux/mm/filemap.c 2008-02-04 16:22:55.000000000 +0100
@@ -2312,7 +2312,9 @@ generic_file_buffered_write(struct kiocb
struct iov_iter i;
iov_iter_init(&i, iov, nr_segs, count, written);
- if (a_ops->write_begin)
+ if (a_ops->perform_write)
+ status = a_ops->perform_write(file, mapping, &i, pos);
+ else if (a_ops->write_begin)
status = generic_perform_write(file, &i, pos);
else
status = generic_perform_write_2copy(file, &i, pos);
Index: linux/Documentation/filesystems/vfs.txt
===================================================================
--- linux.orig/Documentation/filesystems/vfs.txt 2008-02-04 12:28:50.000000000 +0100
+++ linux/Documentation/filesystems/vfs.txt 2008-02-04 16:23:44.000000000 +0100
@@ -533,6 +533,9 @@ struct address_space_operations {
int (*write_end)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
+ ssize_t (*perform_write)(struct file *, struct address_space *mapping,
+ struct iov_iter *i, loff_t pos);
+
sector_t (*bmap)(struct address_space *, sector_t);
int (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, int);
@@ -664,6 +667,17 @@ struct address_space_operations {
Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
that were able to be copied into pagecache.
+ perform_write: This is a single-call, bulk version of write_begin/write_end
+ operations. It is only used in the buffered write path (write_begin
+ must still be implemented), and not for in-kernel writes to pagecache.
+ It takes an iov_iter structure, which provides a descriptor for the
+ source data (and has associated iov_iter_xxx helpers to operate on
+ that data). There are also file, mapping, and pos arguments, which
+ specify the destination of the data.
+
+ Returns < 0 on failure if nothing was written out, otherwise returns
+ the number of bytes copied into pagecache.
+
bmap: called by the VFS to map a logical block offset within object to
physical block number. This method is used by the FIBMAP
ioctl and for working with swap-files. To be able to swap to
--
^ permalink raw reply [flat|nested] 7+ messages in thread
* [patch 2/3] fuse: clean up setting i_size in write
2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
@ 2008-02-04 17:04 ` Miklos Szeredi
2008-02-04 17:04 ` [patch 3/3] fuse: implement perform_write Miklos Szeredi
2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
3 siblings, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm
[-- Attachment #1: fuse_write_update_size.patch --]
[-- Type: text/plain, Size: 1851 bytes --]
From: Miklos Szeredi <mszeredi@suse.cz>
Extract common code for setting i_size in write functions into a
common helper.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c 2008-02-04 13:01:39.000000000 +0100
+++ linux/fs/fuse/file.c 2008-02-04 13:02:03.000000000 +0100
@@ -610,13 +610,24 @@ static int fuse_write_begin(struct file
return 0;
}
+static void fuse_write_update_size(struct inode *inode, loff_t pos)
+{
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fc->lock);
+ fi->attr_version = ++fc->attr_version;
+ if (pos > inode->i_size)
+ i_size_write(inode, pos);
+ spin_unlock(&fc->lock);
+}
+
static int fuse_buffered_write(struct file *file, struct inode *inode,
loff_t pos, unsigned count, struct page *page)
{
int err;
size_t nres;
struct fuse_conn *fc = get_fuse_conn(inode);
- struct fuse_inode *fi = get_fuse_inode(inode);
unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
struct fuse_req *req;
@@ -643,12 +654,7 @@ static int fuse_buffered_write(struct fi
err = -EIO;
if (!err) {
pos += nres;
- spin_lock(&fc->lock);
- fi->attr_version = ++fc->attr_version;
- if (pos > inode->i_size)
- i_size_write(inode, pos);
- spin_unlock(&fc->lock);
-
+ fuse_write_update_size(inode, pos);
if (count == PAGE_CACHE_SIZE)
SetPageUptodate(page);
}
@@ -766,12 +772,8 @@ static ssize_t fuse_direct_io(struct fil
}
fuse_put_request(fc, req);
if (res > 0) {
- if (write) {
- spin_lock(&fc->lock);
- if (pos > inode->i_size)
- i_size_write(inode, pos);
- spin_unlock(&fc->lock);
- }
+ if (write)
+ fuse_write_update_size(inode, pos);
*ppos = pos;
}
fuse_invalidate_attr(inode);
--
^ permalink raw reply [flat|nested] 7+ messages in thread
* [patch 3/3] fuse: implement perform_write
2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
2008-02-04 17:04 ` [patch 2/3] fuse: clean up setting i_size in write Miklos Szeredi
@ 2008-02-04 17:04 ` Miklos Szeredi
2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
3 siblings, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm, Nick Piggin
[-- Attachment #1: fuse_perform_write.patch --]
[-- Type: text/plain, Size: 4282 bytes --]
From: Nick Piggin <npiggin@suse.de>
Introduce fuse_perform_write. With fusexmp (a passthrough filesystem), large
(1MB) writes into a backing tmpfs filesystem are sped up by almost 4 times
(256MB/s vs 71MB/s).
[mszeredi@suse.cz]:
- split into smaller functions
- testing
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c 2008-02-04 17:11:18.000000000 +0100
+++ linux/fs/fuse/file.c 2008-02-04 17:11:59.000000000 +0100
@@ -677,6 +677,148 @@ static int fuse_write_end(struct file *f
return res;
}
+static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
+ struct inode *inode, loff_t pos,
+ size_t count)
+{
+ size_t res;
+ unsigned offset;
+ unsigned i;
+
+ for (i = 0; i < req->num_pages; i++)
+ fuse_wait_on_page_writeback(inode, req->pages[i]->index);
+
+ res = fuse_send_write(req, file, inode, pos, count, NULL);
+
+ offset = req->page_offset;
+ count = res;
+ for (i = 0; i < req->num_pages; i++) {
+ struct page *page = req->pages[i];
+
+ if (!req->out.h.error && !offset && count >= PAGE_CACHE_SIZE)
+ SetPageUptodate(page);
+
+ /* Just ignore count underflow on last page */
+ count -= PAGE_CACHE_SIZE - offset;
+ offset = 0;
+
+ unlock_page(page);
+ page_cache_release(page);
+ }
+
+ return res;
+}
+
+static ssize_t fuse_fill_write_pages(struct fuse_req *req,
+ struct address_space *mapping,
+ struct iov_iter *ii, loff_t pos)
+{
+ struct fuse_conn *fc = get_fuse_conn(mapping->host);
+ unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+ size_t count = 0;
+ int err;
+
+ req->page_offset = offset;
+
+ do {
+ size_t tmp;
+ struct page *page;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ size_t bytes = min_t(size_t, PAGE_CACHE_SIZE - offset,
+ iov_iter_count(ii));
+
+ bytes = min_t(size_t, bytes, fc->max_write - count);
+
+ again:
+ err = -EFAULT;
+ if (iov_iter_fault_in_readable(ii, bytes))
+ break;
+
+ err = -ENOMEM;
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ break;
+
+ pagefault_disable();
+ tmp = iov_iter_copy_from_user_atomic(page, ii, offset, bytes);
+ pagefault_enable();
+ flush_dcache_page(page);
+
+ if (!tmp) {
+ unlock_page(page);
+ page_cache_release(page);
+ bytes = min(bytes, iov_iter_single_seg_count(ii));
+ goto again;
+ }
+
+ err = 0;
+ req->pages[req->num_pages] = page;
+ req->num_pages++;
+
+ iov_iter_advance(ii, tmp);
+ count += tmp;
+ pos += tmp;
+ offset += tmp;
+ if (offset == PAGE_CACHE_SIZE)
+ offset = 0;
+
+ } while (iov_iter_count(ii) && count < fc->max_write &&
+ req->num_pages < FUSE_MAX_PAGES_PER_REQ && offset == 0);
+
+ return count > 0 ? count : err;
+}
+
+static ssize_t fuse_perform_write(struct file *file,
+ struct address_space *mapping,
+ struct iov_iter *ii, loff_t pos)
+{
+ struct inode *inode = mapping->host;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ int err = 0;
+ ssize_t res = 0;
+
+ if (is_bad_inode(inode))
+ return -EIO;
+
+ do {
+ struct fuse_req *req;
+ ssize_t count;
+
+ req = fuse_get_req(fc);
+ if (IS_ERR(req)) {
+ err = PTR_ERR(req);
+ break;
+ }
+
+ count = fuse_fill_write_pages(req, mapping, ii, pos);
+ if (count <= 0) {
+ err = count;
+ } else {
+ size_t num_written;
+
+ num_written = fuse_send_write_pages(req, file, inode,
+ pos, count);
+ err = req->out.h.error;
+ if (!err) {
+ res += num_written;
+ pos += num_written;
+
+ /* break out of the loop on short write */
+ if (num_written != count)
+ err = -EIO;
+ }
+ }
+ fuse_put_request(fc, req);
+ } while (!err && iov_iter_count(ii));
+
+ if (res > 0)
+ fuse_write_update_size(inode, pos);
+
+ fuse_invalidate_attr(inode);
+
+ return res > 0 ? res : err;
+}
+
static void fuse_release_user_pages(struct fuse_req *req, int write)
{
unsigned i;
@@ -1247,6 +1389,7 @@ static const struct address_space_operat
.launder_page = fuse_launder_page,
.write_begin = fuse_write_begin,
.write_end = fuse_write_end,
+ .perform_write = fuse_perform_write,
.readpages = fuse_readpages,
.set_page_dirty = __set_page_dirty_nobuffers,
.bmap = fuse_bmap,
--
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 0/3] add perform_write to a_ops
2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
` (2 preceding siblings ...)
2008-02-04 17:04 ` [patch 3/3] fuse: implement perform_write Miklos Szeredi
@ 2008-02-04 19:39 ` Christoph Hellwig
2008-02-04 20:52 ` Miklos Szeredi
3 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2008-02-04 19:39 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-fsdevel, linux-mm
On Mon, Feb 04, 2008 at 06:04:10PM +0100, Miklos Szeredi wrote:
> a_ops->perform_write() was left out from Nick Piggin's new a_ops
> patchset, as it was non-essential, and postponed for later inclusion.
>
> This short series reintroduces it, but only adds the fuse
> implementation and not simple_perform_write(), which I'm not sure
> would be a significant improvement.
>
> This allows larger than 4k buffered writes for fuse, which is one of
> the most requested features.
>
> This goes on top of the "fuse: writable mmap" patches.
Please don't do this, but rather implement your own .aio_write. There's
very little in generic_file_aio_write that wouldn't be handle by
->perform_write and we should rather factor those up or move to higher
layers than adding this ill-defined abstraction.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 0/3] add perform_write to a_ops
2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
@ 2008-02-04 20:52 ` Miklos Szeredi
2008-02-04 20:59 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 20:52 UTC (permalink / raw)
To: hch; +Cc: npiggin, akpm, linux-kernel, linux-fsdevel, linux-mm
> > a_ops->perform_write() was left out from Nick Piggin's new a_ops
> > patchset, as it was non-essential, and postponed for later inclusion.
> >
> > This short series reintroduces it, but only adds the fuse
> > implementation and not simple_perform_write(), which I'm not sure
> > would be a significant improvement.
> >
> > This allows larger than 4k buffered writes for fuse, which is one of
> > the most requested features.
> >
> > This goes on top of the "fuse: writable mmap" patches.
>
> Please don't do this, but rather implement your own .aio_write. There's
> very little in generic_file_aio_write that wouldn't be handle by
> ->perform_write and we should rather factor those up or move to higher
> layers than adding this ill-defined abstraction.
>
Moving up to higher layers might not be possible, due to lock/unlock
of i_mutex being inside generic_file_aio_write().
But with fuse being the only user, it's not a huge issue duplicating
some code.
Nick, were there any other candidates, that would want to use such an
interface in the future?
Thanks,
Miklos
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 0/3] add perform_write to a_ops
2008-02-04 20:52 ` Miklos Szeredi
@ 2008-02-04 20:59 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2008-02-04 20:59 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: hch, npiggin, akpm, linux-kernel, linux-fsdevel, linux-mm
On Mon, Feb 04, 2008 at 09:52:06PM +0100, Miklos Szeredi wrote:
> Moving up to higher layers might not be possible, due to lock/unlock
> of i_mutex being inside generic_file_aio_write().
Well some bits can be moved up. Here's my grand plan which I plan
to implement once I get some time for it (or let someone else do
if they beat me):
- generic_segment_checks goes to fs/read_write.c before caling into
the filesystem
- dito for vfs_check_frozen
- generic_write_checks is a suitable helper already
- dito for remove_suid
- dito for file_update_time
- after that there's not a whole lot left in generic_file_aio_write,
except for direct I/O handling which will probably be very fs-specific
if you have your own buffered I/O code
generic_file_buffered_write is an almost trivial wrapper around what's
->perform_write in Nick's earlier patches and a helper for the syncing
activity.
>
> But with fuse being the only user, it's not a huge issue duplicating
> some code.
>
> Nick, were there any other candidates, that would want to use such an
> interface in the future?
>
> Thanks,
> Miklos
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
---end quoted text---
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-02-04 21:00 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
2008-02-04 17:04 ` [patch 2/3] fuse: clean up setting i_size in write Miklos Szeredi
2008-02-04 17:04 ` [patch 3/3] fuse: implement perform_write Miklos Szeredi
2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
2008-02-04 20:52 ` Miklos Szeredi
2008-02-04 20:59 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).