LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [patch 0/3] add perform_write to a_ops
@ 2008-02-04 17:04 Miklos Szeredi
  2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm

a_ops->perform_write() was left out from Nick Piggin's new a_ops
patchset, as it was non-essential, and postponed for later inclusion.

This short series reintroduces it, but only adds the fuse
implementation and not simple_perform_write(), which I'm not sure
would be a significant improvement.

This allows larger than 4k buffered writes for fuse, which is one of
the most requested features.

This goes on top of the "fuse: writable mmap" patches.

Thanks,
Miklos

--

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [patch 1/3] vfs: introduce perform_write in a_ops
  2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
@ 2008-02-04 17:04 ` Miklos Szeredi
  2008-02-04 17:04 ` [patch 2/3] fuse: clean up setting i_size in write Miklos Szeredi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm, Nick Piggin

[-- Attachment #1: perform_write.patch --]
[-- Type: text/plain, Size: 3534 bytes --]

From: Nick Piggin <npiggin@suse.de>

Introduce a new perform_write() address space operation.

This is a single-call, bulk version of write_begin/write_end
operations.  It is only used in the buffered write path (write_begin
must still be implemented), and not for in-kernel writes to pagecache.

For some filesystems, using this can provide significant speedups.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2008-02-04 15:24:03.000000000 +0100
+++ linux/include/linux/fs.h	2008-02-04 16:24:19.000000000 +0100
@@ -469,6 +469,9 @@ struct address_space_operations {
 				loff_t pos, unsigned len, unsigned copied,
 				struct page *page, void *fsdata);
 
+	ssize_t (*perform_write)(struct file *, struct address_space *mapping,
+				struct iov_iter *i, loff_t pos);
+
 	/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidatepage) (struct page *, unsigned long);
Index: linux/mm/filemap.c
===================================================================
--- linux.orig/mm/filemap.c	2008-02-04 15:24:03.000000000 +0100
+++ linux/mm/filemap.c	2008-02-04 16:22:55.000000000 +0100
@@ -2312,7 +2312,9 @@ generic_file_buffered_write(struct kiocb
 	struct iov_iter i;
 
 	iov_iter_init(&i, iov, nr_segs, count, written);
-	if (a_ops->write_begin)
+	if (a_ops->perform_write)
+		status = a_ops->perform_write(file, mapping, &i, pos);
+	else if (a_ops->write_begin)
 		status = generic_perform_write(file, &i, pos);
 	else
 		status = generic_perform_write_2copy(file, &i, pos);
Index: linux/Documentation/filesystems/vfs.txt
===================================================================
--- linux.orig/Documentation/filesystems/vfs.txt	2008-02-04 12:28:50.000000000 +0100
+++ linux/Documentation/filesystems/vfs.txt	2008-02-04 16:23:44.000000000 +0100
@@ -533,6 +533,9 @@ struct address_space_operations {
 	int (*write_end)(struct file *, struct address_space *mapping,
 				loff_t pos, unsigned len, unsigned copied,
 				struct page *page, void *fsdata);
+	ssize_t (*perform_write)(struct file *, struct address_space *mapping,
+				struct iov_iter *i, loff_t pos);
+
 	sector_t (*bmap)(struct address_space *, sector_t);
 	int (*invalidatepage) (struct page *, unsigned long);
 	int (*releasepage) (struct page *, int);
@@ -664,6 +667,17 @@ struct address_space_operations {
         Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
         that were able to be copied into pagecache.
 
+  perform_write: This is a single-call, bulk version of write_begin/write_end
+        operations. It is only used in the buffered write path (write_begin
+        must still be implemented), and not for in-kernel writes to pagecache.
+        It takes an iov_iter structure, which provides a descriptor for the
+        source data (and has associated iov_iter_xxx helpers to operate on
+        that data). There are also file, mapping, and pos arguments, which
+        specify the destination of the data.
+
+        Returns < 0 on failure if nothing was written out, otherwise returns
+        the number of bytes copied into pagecache.
+
   bmap: called by the VFS to map a logical block offset within object to
   	physical block number. This method is used by the FIBMAP
   	ioctl and for working with swap-files.  To be able to swap to

--

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [patch 2/3] fuse: clean up setting i_size in write
  2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
  2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
@ 2008-02-04 17:04 ` Miklos Szeredi
  2008-02-04 17:04 ` [patch 3/3] fuse: implement perform_write Miklos Szeredi
  2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
  3 siblings, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm

[-- Attachment #1: fuse_write_update_size.patch --]
[-- Type: text/plain, Size: 1851 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Extract common code for setting i_size in write functions into a
common helper.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c	2008-02-04 13:01:39.000000000 +0100
+++ linux/fs/fuse/file.c	2008-02-04 13:02:03.000000000 +0100
@@ -610,13 +610,24 @@ static int fuse_write_begin(struct file 
 	return 0;
 }
 
+static void fuse_write_update_size(struct inode *inode, loff_t pos)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	spin_lock(&fc->lock);
+	fi->attr_version = ++fc->attr_version;
+	if (pos > inode->i_size)
+		i_size_write(inode, pos);
+	spin_unlock(&fc->lock);
+}
+
 static int fuse_buffered_write(struct file *file, struct inode *inode,
 			       loff_t pos, unsigned count, struct page *page)
 {
 	int err;
 	size_t nres;
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	struct fuse_inode *fi = get_fuse_inode(inode);
 	unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
 	struct fuse_req *req;
 
@@ -643,12 +654,7 @@ static int fuse_buffered_write(struct fi
 		err = -EIO;
 	if (!err) {
 		pos += nres;
-		spin_lock(&fc->lock);
-		fi->attr_version = ++fc->attr_version;
-		if (pos > inode->i_size)
-			i_size_write(inode, pos);
-		spin_unlock(&fc->lock);
-
+		fuse_write_update_size(inode, pos);
 		if (count == PAGE_CACHE_SIZE)
 			SetPageUptodate(page);
 	}
@@ -766,12 +772,8 @@ static ssize_t fuse_direct_io(struct fil
 	}
 	fuse_put_request(fc, req);
 	if (res > 0) {
-		if (write) {
-			spin_lock(&fc->lock);
-			if (pos > inode->i_size)
-				i_size_write(inode, pos);
-			spin_unlock(&fc->lock);
-		}
+		if (write)
+			fuse_write_update_size(inode, pos);
 		*ppos = pos;
 	}
 	fuse_invalidate_attr(inode);

--

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [patch 3/3] fuse: implement perform_write
  2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
  2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
  2008-02-04 17:04 ` [patch 2/3] fuse: clean up setting i_size in write Miklos Szeredi
@ 2008-02-04 17:04 ` Miklos Szeredi
  2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
  3 siblings, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 17:04 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-fsdevel, linux-mm, Nick Piggin

[-- Attachment #1: fuse_perform_write.patch --]
[-- Type: text/plain, Size: 4282 bytes --]

From: Nick Piggin <npiggin@suse.de>

Introduce fuse_perform_write. With fusexmp (a passthrough filesystem), large
(1MB) writes into a backing tmpfs filesystem are sped up by almost 4 times
(256MB/s vs 71MB/s).

[mszeredi@suse.cz]:

 - split into smaller functions
 - testing

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/fuse/file.c
===================================================================
--- linux.orig/fs/fuse/file.c	2008-02-04 17:11:18.000000000 +0100
+++ linux/fs/fuse/file.c	2008-02-04 17:11:59.000000000 +0100
@@ -677,6 +677,148 @@ static int fuse_write_end(struct file *f
 	return res;
 }
 
+static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
+				    struct inode *inode, loff_t pos,
+				    size_t count)
+{
+	size_t res;
+	unsigned offset;
+	unsigned i;
+
+	for (i = 0; i < req->num_pages; i++)
+		fuse_wait_on_page_writeback(inode, req->pages[i]->index);
+
+	res = fuse_send_write(req, file, inode, pos, count, NULL);
+
+	offset = req->page_offset;
+	count = res;
+	for (i = 0; i < req->num_pages; i++) {
+		struct page *page = req->pages[i];
+
+		if (!req->out.h.error && !offset && count >= PAGE_CACHE_SIZE)
+			SetPageUptodate(page);
+
+		/* Just ignore count underflow on last page */
+		count -= PAGE_CACHE_SIZE - offset;
+		offset = 0;
+
+		unlock_page(page);
+		page_cache_release(page);
+	}
+
+	return res;
+}
+
+static ssize_t fuse_fill_write_pages(struct fuse_req *req,
+			       struct address_space *mapping,
+			       struct iov_iter *ii, loff_t pos)
+{
+	struct fuse_conn *fc = get_fuse_conn(mapping->host);
+	unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+	size_t count = 0;
+	int err;
+
+	req->page_offset = offset;
+
+	do {
+		size_t tmp;
+		struct page *page;
+		pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+		size_t bytes = min_t(size_t, PAGE_CACHE_SIZE - offset,
+				     iov_iter_count(ii));
+
+		bytes = min_t(size_t, bytes, fc->max_write - count);
+
+ again:
+		err = -EFAULT;
+		if (iov_iter_fault_in_readable(ii, bytes))
+			break;
+
+		err = -ENOMEM;
+		page = __grab_cache_page(mapping, index);
+		if (!page)
+			break;
+
+		pagefault_disable();
+		tmp = iov_iter_copy_from_user_atomic(page, ii, offset, bytes);
+		pagefault_enable();
+		flush_dcache_page(page);
+
+		if (!tmp) {
+			unlock_page(page);
+			page_cache_release(page);
+			bytes = min(bytes, iov_iter_single_seg_count(ii));
+			goto again;
+		}
+
+		err = 0;
+		req->pages[req->num_pages] = page;
+		req->num_pages++;
+
+		iov_iter_advance(ii, tmp);
+		count += tmp;
+		pos += tmp;
+		offset += tmp;
+		if (offset == PAGE_CACHE_SIZE)
+			offset = 0;
+
+	} while (iov_iter_count(ii) && count < fc->max_write &&
+		 req->num_pages < FUSE_MAX_PAGES_PER_REQ && offset == 0);
+
+	return count > 0 ? count : err;
+}
+
+static ssize_t fuse_perform_write(struct file *file,
+				  struct address_space *mapping,
+				  struct iov_iter *ii, loff_t pos)
+{
+	struct inode *inode = mapping->host;
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	int err = 0;
+	ssize_t res = 0;
+
+	if (is_bad_inode(inode))
+		return -EIO;
+
+	do {
+		struct fuse_req *req;
+		ssize_t count;
+
+		req = fuse_get_req(fc);
+		if (IS_ERR(req)) {
+			err = PTR_ERR(req);
+			break;
+		}
+
+		count = fuse_fill_write_pages(req, mapping, ii, pos);
+		if (count <= 0) {
+			err = count;
+		} else {
+			size_t num_written;
+
+			num_written = fuse_send_write_pages(req, file, inode,
+							    pos, count);
+			err = req->out.h.error;
+			if (!err) {
+				res += num_written;
+				pos += num_written;
+
+				/* break out of the loop on short write */
+				if (num_written != count)
+					err = -EIO;
+			}
+		}
+		fuse_put_request(fc, req);
+	} while (!err && iov_iter_count(ii));
+
+	if (res > 0)
+		fuse_write_update_size(inode, pos);
+
+	fuse_invalidate_attr(inode);
+
+	return res > 0 ? res : err;
+}
+
 static void fuse_release_user_pages(struct fuse_req *req, int write)
 {
 	unsigned i;
@@ -1247,6 +1389,7 @@ static const struct address_space_operat
 	.launder_page	= fuse_launder_page,
 	.write_begin	= fuse_write_begin,
 	.write_end	= fuse_write_end,
+	.perform_write	= fuse_perform_write,
 	.readpages	= fuse_readpages,
 	.set_page_dirty	= __set_page_dirty_nobuffers,
 	.bmap		= fuse_bmap,

--

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch 0/3] add perform_write to a_ops
  2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
                   ` (2 preceding siblings ...)
  2008-02-04 17:04 ` [patch 3/3] fuse: implement perform_write Miklos Szeredi
@ 2008-02-04 19:39 ` Christoph Hellwig
  2008-02-04 20:52   ` Miklos Szeredi
  3 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2008-02-04 19:39 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-fsdevel, linux-mm

On Mon, Feb 04, 2008 at 06:04:10PM +0100, Miklos Szeredi wrote:
> a_ops->perform_write() was left out from Nick Piggin's new a_ops
> patchset, as it was non-essential, and postponed for later inclusion.
> 
> This short series reintroduces it, but only adds the fuse
> implementation and not simple_perform_write(), which I'm not sure
> would be a significant improvement.
> 
> This allows larger than 4k buffered writes for fuse, which is one of
> the most requested features.
> 
> This goes on top of the "fuse: writable mmap" patches.

Please don't do this, but rather implement your own .aio_write.  There's
very little in generic_file_aio_write that wouldn't be handle by
->perform_write and we should rather factor those up or move to higher
layers than adding this ill-defined abstraction.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch 0/3] add perform_write to a_ops
  2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
@ 2008-02-04 20:52   ` Miklos Szeredi
  2008-02-04 20:59     ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2008-02-04 20:52 UTC (permalink / raw)
  To: hch; +Cc: npiggin, akpm, linux-kernel, linux-fsdevel, linux-mm

> > a_ops->perform_write() was left out from Nick Piggin's new a_ops
> > patchset, as it was non-essential, and postponed for later inclusion.
> > 
> > This short series reintroduces it, but only adds the fuse
> > implementation and not simple_perform_write(), which I'm not sure
> > would be a significant improvement.
> > 
> > This allows larger than 4k buffered writes for fuse, which is one of
> > the most requested features.
> > 
> > This goes on top of the "fuse: writable mmap" patches.
> 
> Please don't do this, but rather implement your own .aio_write.  There's
> very little in generic_file_aio_write that wouldn't be handle by
> ->perform_write and we should rather factor those up or move to higher
> layers than adding this ill-defined abstraction.
> 

Moving up to higher layers might not be possible, due to lock/unlock
of i_mutex being inside generic_file_aio_write().

But with fuse being the only user, it's not a huge issue duplicating
some code.

Nick, were there any other candidates, that would want to use such an
interface in the future?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch 0/3] add perform_write to a_ops
  2008-02-04 20:52   ` Miklos Szeredi
@ 2008-02-04 20:59     ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2008-02-04 20:59 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: hch, npiggin, akpm, linux-kernel, linux-fsdevel, linux-mm

On Mon, Feb 04, 2008 at 09:52:06PM +0100, Miklos Szeredi wrote:
> Moving up to higher layers might not be possible, due to lock/unlock
> of i_mutex being inside generic_file_aio_write().

Well some bits can be moved up.  Here's my grand plan which I plan
to implement once I get some time for it (or let someone else do
if they beat me):

 - generic_segment_checks goes to fs/read_write.c before caling into
   the filesystem
 - dito for vfs_check_frozen
 - generic_write_checks is a suitable helper already
 - dito for remove_suid
 - dito for file_update_time
 - after that there's not a whole lot left in generic_file_aio_write,
   except for direct I/O handling which will probably be very fs-specific
   if you have your own buffered I/O code

generic_file_buffered_write is an almost trivial wrapper around what's
->perform_write in Nick's earlier patches and a helper for the syncing
activity.



> 
> But with fuse being the only user, it's not a huge issue duplicating
> some code.
> 
> Nick, were there any other candidates, that would want to use such an
> interface in the future?
> 
> Thanks,
> Miklos
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-02-04 21:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-04 17:04 [patch 0/3] add perform_write to a_ops Miklos Szeredi
2008-02-04 17:04 ` [patch 1/3] vfs: introduce perform_write in a_ops Miklos Szeredi
2008-02-04 17:04 ` [patch 2/3] fuse: clean up setting i_size in write Miklos Szeredi
2008-02-04 17:04 ` [patch 3/3] fuse: implement perform_write Miklos Szeredi
2008-02-04 19:39 ` [patch 0/3] add perform_write to a_ops Christoph Hellwig
2008-02-04 20:52   ` Miklos Szeredi
2008-02-04 20:59     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).