LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: [PATCH v7] iomap: make inline data support more flexible
@ 2021-08-01 10:29 Andreas Gruenbacher
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2021-08-01 10:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andreas Gruenbacher, Cc : Gao Xiang, Christoph Hellwig,
	Darrick J . Wong, Huang Jianan, linux-erofs, linux-fsdevel,
	linux-kernel

On Mon, Jul 26, 2021 at 2:33 PM Matthew Wilcox <willy@infradead.org> wrote:
> Only tangentially related ... why do we memcpy the data into the tail
> at write_end() time instead of at writepage() time?  I see there's a
> workaround for that in gfs2's page_mkwrite():
>
>         if (gfs2_is_stuffed(ip)) {
>                 err = gfs2_unstuff_dinode(ip);
>
> (an mmap store cannot change the size of the file, so this would be
> unnecessary)
>
> Something like this ...

We can't just bail out after iomap_write_inline_data in
iomap_writepage_map; the page also needs to be unlocked.  Also, we want
to dirty the inode after copying out the inline data and unlocking the
page to make sure the inode gets written out.

Not sure if this can be further simplified.

Tested on gfs2 on top of:

 [PATCH v9] iomap: Support file tail packing [1]
 [PATCH v2] iomap: Support inline data with block size < page size [2]
 [PATCH] gfs2: iomap inline data handling cleanup [3]

[1] https://lore.kernel.org/linux-fsdevel/20210727025956.80684-1-hsiangkao@linux.alibaba.com/	
[2] https://lore.kernel.org/linux-fsdevel/20210729032344.3975412-1-willy@infradead.org/
[3] https://listman.redhat.com/archives/cluster-devel/2021-July/msg00244.html

Thanks,
Andreas

---
 fs/gfs2/bmap.c         |  3 ---
 fs/gfs2/file.c         |  9 ---------
 fs/iomap/buffered-io.c | 29 +++++++++++++++++++----------
 3 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 84ad0fe787ea..4cea16d6a3fa 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2527,9 +2527,6 @@ static int gfs2_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode,
 {
 	int ret;
 
-	if (WARN_ON_ONCE(gfs2_is_stuffed(GFS2_I(inode))))
-		return -EIO;
-
 	if (offset >= wpc->iomap.offset &&
 	    offset < wpc->iomap.offset + wpc->iomap.length)
 		return 0;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 84ec053d43b4..ce8f5eb66db7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -510,15 +510,6 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 		goto out_trans_fail;
 	}
 
-	/* Unstuff, if required, and allocate backing blocks for page */
-	if (gfs2_is_stuffed(ip)) {
-		err = gfs2_unstuff_dinode(ip);
-		if (err) {
-			ret = block_page_mkwrite_return(err);
-			goto out_trans_end;
-		}
-	}
-
 	lock_page(page);
 	/* If truncated, we must retry the operation, we may have raced
 	 * with the glock demotion code.
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 77d4fe5c1327..a1eb876a9445 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -683,21 +683,23 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	return copied;
 }
 
-static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
-		struct iomap *iomap, loff_t pos, size_t copied)
+static int iomap_write_inline_data(struct inode *inode, struct page *page,
+		struct iomap *iomap)
 {
+	size_t size = i_size_read(inode) - page_offset(page);
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
 	BUG_ON(!iomap_inline_data_valid(iomap));
+	if (WARN_ON_ONCE(size > iomap->length))
+		return -EIO;
 
 	flush_dcache_page(page);
 	addr = kmap_atomic(page);
-	memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
+	memcpy(iomap->inline_data, addr, size);
 	kunmap_atomic(addr);
 
-	mark_inode_dirty(inode);
-	return copied;
+	return 0;
 }
 
 /* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
@@ -709,9 +711,7 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	loff_t old_size = inode->i_size;
 	size_t ret;
 
-	if (srcmap->type == IOMAP_INLINE) {
-		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
-	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
+	if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
@@ -1329,6 +1329,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	u64 file_offset; /* file offset of page */
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
+	bool dirty_inode = false;
 
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
 
@@ -1346,8 +1347,13 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		error = wpc->ops->map_blocks(wpc, inode, file_offset);
 		if (error)
 			break;
-		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
-			continue;
+		if (wpc->iomap.type == IOMAP_INLINE) {
+			error = iomap_write_inline_data(inode, page,
+					&wpc->iomap);
+			if (!error)
+				dirty_inode = true;
+			break;
+		}
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
 		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
@@ -1405,6 +1411,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 */
 	if (!count)
 		end_page_writeback(page);
+
+	if (dirty_inode)
+		mark_inode_dirty(inode);
 done:
 	mapping_set_error(page->mapping, error);
 	return error;
-- 
2.26.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-27 13:35           ` Matthew Wilcox
  2021-07-27 15:04             ` Gao Xiang
@ 2021-07-27 16:53             ` David Sterba
  1 sibling, 0 replies; 31+ messages in thread
From: David Sterba @ 2021-07-27 16:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Andreas Gruenbacher, Gao Xiang,
	Darrick J . Wong, Huang Jianan, linux-erofs, linux-fsdevel,
	linux-kernel, Andreas Gruenbacher

On Tue, Jul 27, 2021 at 02:35:46PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 27, 2021 at 10:20:42AM +0200, David Sterba wrote:
> > On Mon, Jul 26, 2021 at 02:17:02PM +0200, Christoph Hellwig wrote:
> > > > Subject: iomap: Support tail packing
> > > 
> > > I can't say I like this "tail packing" language here when we have the
> > > perfectly fine inline wording.  Same for various comments in the actual
> > > code.
> > 
> > Yes please, don't call it tail-packing when it's an inline extent, we'll
> > use that for btrfs eventually and conflating the two terms has been
> > cofusing users. Except reiserfs, no linux filesystem does tail-packing.
> 
> Hmm ... I see what reiserfs does as packing tails of multiple files into
> one block.  What gfs2 (and ext4) do is inline data.  Erofs packs the
> tail of a single file into the same block as the inode.  If I understand
> what btrfs does correctly, it stores data in the btree.  But (like
> gfs2/ext4), it's only for the entire-file-is-small case, not for
> its-just-ten-bytes-into-the-last-block case.
> 
> So what would you call what erofs is doing if not tail-packing?

That indeed sounds like tail-packing and I was not aware of that, the
docs I found were not clear what exactly was going on with the data
stored inline.

> Wikipedia calls it https://en.wikipedia.org/wiki/Block_suballocation
> which doesn't quite fit.  We need a phrase which means "this isn't
> just for small files but for small tails of large files".

So that's more generic than what we now have as inline files, so in the
interface everybody sets 0 as start of the range while erofs can also
set start of the last partial block.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-27 13:35           ` Matthew Wilcox
@ 2021-07-27 15:04             ` Gao Xiang
  2021-07-27 16:53             ` David Sterba
  1 sibling, 0 replies; 31+ messages in thread
From: Gao Xiang @ 2021-07-27 15:04 UTC (permalink / raw)
  To: Matthew Wilcox, Darrick J . Wong
  Cc: dsterba, Christoph Hellwig, Andreas Gruenbacher, Huang Jianan,
	linux-erofs, linux-fsdevel, linux-kernel, Andreas Gruenbacher

On Tue, Jul 27, 2021 at 02:35:46PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 27, 2021 at 10:20:42AM +0200, David Sterba wrote:
> > On Mon, Jul 26, 2021 at 02:17:02PM +0200, Christoph Hellwig wrote:
> > > > Subject: iomap: Support tail packing
> > > 
> > > I can't say I like this "tail packing" language here when we have the
> > > perfectly fine inline wording.  Same for various comments in the actual
> > > code.
> > 
> > Yes please, don't call it tail-packing when it's an inline extent, we'll
> > use that for btrfs eventually and conflating the two terms has been
> > cofusing users. Except reiserfs, no linux filesystem does tail-packing.
> 
> Hmm ... I see what reiserfs does as packing tails of multiple files into
> one block.  What gfs2 (and ext4) do is inline data.  Erofs packs the
> tail of a single file into the same block as the inode.  If I understand

Plus each erofs block can have multiple inodes (thus multi-tail blocks) 
oo as long as the meta block itself can fit.

No matter what it's called, it's a kind of inline data (I think inline
means that data mixes with metadata according to [1]). I was called it
tail-block inline initially... whatever.

Hopefully, Darrick could update the v8 title if some concern here.

[1] https://www.kernel.org/doc/Documentation/filesystems/fiemap.txt

Thanks,
Gao Xiang

> what btrfs does correctly, it stores data in the btree.  But (like
> gfs2/ext4), it's only for the entire-file-is-small case, not for
> its-just-ten-bytes-into-the-last-block case.
> 
> So what would you call what erofs is doing if not tail-packing?
> Wikipedia calls it https://en.wikipedia.org/wiki/Block_suballocation
> which doesn't quite fit.  We need a phrase which means "this isn't
> just for small files but for small tails of large files".

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-27  8:20         ` David Sterba
@ 2021-07-27 13:35           ` Matthew Wilcox
  2021-07-27 15:04             ` Gao Xiang
  2021-07-27 16:53             ` David Sterba
  0 siblings, 2 replies; 31+ messages in thread
From: Matthew Wilcox @ 2021-07-27 13:35 UTC (permalink / raw)
  To: dsterba, Christoph Hellwig, Andreas Gruenbacher, Gao Xiang,
	Darrick J . Wong, Huang Jianan, linux-erofs, linux-fsdevel,
	linux-kernel, Andreas Gruenbacher

On Tue, Jul 27, 2021 at 10:20:42AM +0200, David Sterba wrote:
> On Mon, Jul 26, 2021 at 02:17:02PM +0200, Christoph Hellwig wrote:
> > > Subject: iomap: Support tail packing
> > 
> > I can't say I like this "tail packing" language here when we have the
> > perfectly fine inline wording.  Same for various comments in the actual
> > code.
> 
> Yes please, don't call it tail-packing when it's an inline extent, we'll
> use that for btrfs eventually and conflating the two terms has been
> cofusing users. Except reiserfs, no linux filesystem does tail-packing.

Hmm ... I see what reiserfs does as packing tails of multiple files into
one block.  What gfs2 (and ext4) do is inline data.  Erofs packs the
tail of a single file into the same block as the inode.  If I understand
what btrfs does correctly, it stores data in the btree.  But (like
gfs2/ext4), it's only for the entire-file-is-small case, not for
its-just-ten-bytes-into-the-last-block case.

So what would you call what erofs is doing if not tail-packing?
Wikipedia calls it https://en.wikipedia.org/wiki/Block_suballocation
which doesn't quite fit.  We need a phrase which means "this isn't
just for small files but for small tails of large files".

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 12:17       ` Christoph Hellwig
  2021-07-26 12:27         ` Andreas Grünbacher
@ 2021-07-27  8:20         ` David Sterba
  2021-07-27 13:35           ` Matthew Wilcox
  1 sibling, 1 reply; 31+ messages in thread
From: David Sterba @ 2021-07-27  8:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Gao Xiang, Darrick J . Wong, Matthew Wilcox,
	Huang Jianan, linux-erofs, linux-fsdevel, linux-kernel,
	Andreas Gruenbacher

On Mon, Jul 26, 2021 at 02:17:02PM +0200, Christoph Hellwig wrote:
> > Subject: iomap: Support tail packing
> 
> I can't say I like this "tail packing" language here when we have the
> perfectly fine inline wording.  Same for various comments in the actual
> code.

Yes please, don't call it tail-packing when it's an inline extent, we'll
use that for btrfs eventually and conflating the two terms has been
cofusing users. Except reiserfs, no linux filesystem does tail-packing.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 21:36       ` Darrick J. Wong
@ 2021-07-26 22:20         ` Andreas Grünbacher
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Grünbacher @ 2021-07-26 22:20 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matthew Wilcox,
	Huang Jianan, linux-erofs, linux-fsdevel, LKML

Am Mo., 26. Juli 2021 um 23:36 Uhr schrieb Darrick J. Wong <djwong@kernel.org>:
> On Mon, Jul 26, 2021 at 09:22:41AM +0200, Andreas Gruenbacher wrote:
> > On Mon, Jul 26, 2021 at 4:36 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > > > Here's a fixed and cleaned up version that passes fstests on gfs2.
> > > >
> > > > I see no reason why the combination of tail packing + writing should
> > > > cause any issues, so in my opinion, the check that disables that
> > > > combination in iomap_write_begin_inline should still be removed.
> > >
> > > Since there is no such fs for tail-packing write, I just do a wild
> > > guess, for example,
> > >  1) the tail-end block was not inlined, so iomap_write_end() dirtied
> > >     the whole page (or buffer) for the page writeback;
> > >  2) then it was truncated into a tail-packing inline block so the last
> > >     extent(page) became INLINE but dirty instead;
> > >  3) during the late page writeback for dirty pages,
> > >     if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
> > >     would be triggered in iomap_writepage_map() for such dirty page.
> > >
> > > As Matthew pointed out before,
> > > https://lore.kernel.org/r/YPrms0fWPwEZGNAL@casper.infradead.org/
> > > currently tail-packing inline won't interact with page writeback, but
> > > I'm afraid a supported tail-packing write fs needs to reconsider the
> > > whole stuff how page, inode writeback works and what the pattern is
> > > with the tail-packing.
> > >
> > > >
> > > > It turns out that returning the number of bytes copied from
> > > > iomap_read_inline_data is a bit irritating: the function is really used
> > > > for filling the page, but that's not always the "progress" we're looking
> > > > for.  In the iomap_readpage case, we actually need to advance by an
> > > > antire page, but in the iomap_file_buffered_write case, we need to
> > > > advance by the length parameter of iomap_write_actor or less.  So I've
> > > > changed that back.
> > > >
> > > > I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> > > > iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> > > > more useful to me.
> > > >
> > > > Thanks,
> > > > Andreas
> > > >
> > > > --
> > > >
> > > > Subject: [PATCH] iomap: Support tail packing
> > > >
> > > > The existing inline data support only works for cases where the entire
> > > > file is stored as inline data.  For larger files, EROFS stores the
> > > > initial blocks separately and then can pack a small tail adjacent to the
> > > > inode.  Generalise inline data to allow for tail packing.  Tails may not
> > > > cross a page boundary in memory.
> > > >
> > > > We currently have no filesystems that support tail packing and writing,
> > > > so that case is currently disabled (see iomap_write_begin_inline).  I'm
> > > > not aware of any reason why this code path shouldn't work, however.
> > > >
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > > Cc: Darrick J. Wong <djwong@kernel.org>
> > > > Cc: Matthew Wilcox <willy@infradead.org>
> > > > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > > > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > ---
> > > >  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> > > >  fs/iomap/direct-io.c   | 11 ++++++-----
> > > >  include/linux/iomap.h  | 22 +++++++++++++++++++++-
> > > >  3 files changed, 50 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index 87ccb3438bec..334bf98fdd4a 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
> > > >       struct readahead_control *rac;
> > > >  };
> > > >
> > > > -static void
> > > > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > > > +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> > > >               struct iomap *iomap)
> > > >  {
> > > > -     size_t size = i_size_read(inode);
> > > > +     size_t size = i_size_read(inode) - iomap->offset;
> > >
> > > I wonder why you use i_size / iomap->offset here,
> >
> > This function is supposed to copy the inline or tail data at
> > iomap->inline_data into the page passed to it. Logically, the inline
> > data starts at iomap->offset and extends until i_size_read(inode).
> > Relative to the page, the inline data starts at offset 0 and extends
> > until i_size_read(inode) - iomap->offset. It's as simple as that.
>
> It's only as simple as that because the inline data read code is overfit
> to the single use case (gfs2) that it supports.  So far in its history,
> iomap has never had to support inline data regions that do not coincide
> or overlap with EOF, nor has it had to support regions that do not start
> at pos==0.  That is why it is appropriate to use the memcpy -> memset ->
> return PAGE_SIZE pattern and short-circuit what we do everywhere else in
> iomap.
>
> For a non-inline readpage call, filesystems are allowed to return
> mappings for blocks beyond EOF.  The call to iomap_adjust_read_range
> sets us up to read data from disk through the EOF block, and for the
> remainder of the page we zero the post-eof blocks within that page.
>
> IOWs, for reads, __gfs2_iomap_get probably ought to set iomap->length to
> gfs2_max_stuffed_size() like it does for writes, and we ought to
> generalize iomap_read_inline_data to stop copying after
> min(iomap->length, i_size_read() - iomap->offset) bytes.  If it then
> discovers that it has indeed reached EOF, then we can zero the rest of
> the page and add that quantity to the number of bytes read.

That sounds like a useful improvement. I'll give it a try.

Thanks,
Andreas

> Right now for gfs2 the two arguments to min are always the same so the
> function omits all the bits that would make the zeroing actually
> conditional on whether we really hit EOF, and pass any copied size other
> than PAGE_SIZE back to iomap_readpage_actor.  Given that we still don't
> have any filesystems that require us to support inline regions entirely
> below EOF I'm fine with omitting the general (and hence untestable)
> solution... for now.
>
> (I now think I understand why someone brought up inline data regions in
> the middle of files last week.)
>
> --D
>
> > > and why you completely ignoring iomap->length field returning by fs.
> >
> > In the iomap_readpage case (iomap_begin with flags == 0),
> > iomap->length will be the amount of data up to the end of the inode.
> > In the iomap_file_buffered_write case (iomap_begin with flags ==
> > IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> > (For extending writes, we need to write beyond the current end of
> > inode.) So iomap->length isn't all that useful for
> > iomap_read_inline_data.
> >
> > > Using i_size here instead of iomap->length seems coupling to me in the
> > > beginning (even currently in practice there is some limitation.)
> >
> > And what is that?
> >
> > Thanks,
> > Andreas
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26  7:22     ` Andreas Gruenbacher
  2021-07-26  7:38       ` Gao Xiang
@ 2021-07-26 21:36       ` Darrick J. Wong
  2021-07-26 22:20         ` Andreas Grünbacher
  1 sibling, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2021-07-26 21:36 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matthew Wilcox, Huang Jianan, linux-erofs,
	linux-fsdevel, LKML, Andreas Gruenbacher

On Mon, Jul 26, 2021 at 09:22:41AM +0200, Andreas Gruenbacher wrote:
> On Mon, Jul 26, 2021 at 4:36 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > > Here's a fixed and cleaned up version that passes fstests on gfs2.
> > >
> > > I see no reason why the combination of tail packing + writing should
> > > cause any issues, so in my opinion, the check that disables that
> > > combination in iomap_write_begin_inline should still be removed.
> >
> > Since there is no such fs for tail-packing write, I just do a wild
> > guess, for example,
> >  1) the tail-end block was not inlined, so iomap_write_end() dirtied
> >     the whole page (or buffer) for the page writeback;
> >  2) then it was truncated into a tail-packing inline block so the last
> >     extent(page) became INLINE but dirty instead;
> >  3) during the late page writeback for dirty pages,
> >     if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
> >     would be triggered in iomap_writepage_map() for such dirty page.
> >
> > As Matthew pointed out before,
> > https://lore.kernel.org/r/YPrms0fWPwEZGNAL@casper.infradead.org/
> > currently tail-packing inline won't interact with page writeback, but
> > I'm afraid a supported tail-packing write fs needs to reconsider the
> > whole stuff how page, inode writeback works and what the pattern is
> > with the tail-packing.
> >
> > >
> > > It turns out that returning the number of bytes copied from
> > > iomap_read_inline_data is a bit irritating: the function is really used
> > > for filling the page, but that's not always the "progress" we're looking
> > > for.  In the iomap_readpage case, we actually need to advance by an
> > > antire page, but in the iomap_file_buffered_write case, we need to
> > > advance by the length parameter of iomap_write_actor or less.  So I've
> > > changed that back.
> > >
> > > I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> > > iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> > > more useful to me.
> > >
> > > Thanks,
> > > Andreas
> > >
> > > --
> > >
> > > Subject: [PATCH] iomap: Support tail packing
> > >
> > > The existing inline data support only works for cases where the entire
> > > file is stored as inline data.  For larger files, EROFS stores the
> > > initial blocks separately and then can pack a small tail adjacent to the
> > > inode.  Generalise inline data to allow for tail packing.  Tails may not
> > > cross a page boundary in memory.
> > >
> > > We currently have no filesystems that support tail packing and writing,
> > > so that case is currently disabled (see iomap_write_begin_inline).  I'm
> > > not aware of any reason why this code path shouldn't work, however.
> > >
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Darrick J. Wong <djwong@kernel.org>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > ---
> > >  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> > >  fs/iomap/direct-io.c   | 11 ++++++-----
> > >  include/linux/iomap.h  | 22 +++++++++++++++++++++-
> > >  3 files changed, 50 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index 87ccb3438bec..334bf98fdd4a 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
> > >       struct readahead_control *rac;
> > >  };
> > >
> > > -static void
> > > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > > +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> > >               struct iomap *iomap)
> > >  {
> > > -     size_t size = i_size_read(inode);
> > > +     size_t size = i_size_read(inode) - iomap->offset;
> >
> > I wonder why you use i_size / iomap->offset here,
> 
> This function is supposed to copy the inline or tail data at
> iomap->inline_data into the page passed to it. Logically, the inline
> data starts at iomap->offset and extends until i_size_read(inode).
> Relative to the page, the inline data starts at offset 0 and extends
> until i_size_read(inode) - iomap->offset. It's as simple as that.

It's only as simple as that because the inline data read code is overfit
to the single use case (gfs2) that it supports.  So far in its history,
iomap has never had to support inline data regions that do not coincide
or overlap with EOF, nor has it had to support regions that do not start
at pos==0.  That is why it is appropriate to use the memcpy -> memset ->
return PAGE_SIZE pattern and short-circuit what we do everywhere else in
iomap.

For a non-inline readpage call, filesystems are allowed to return
mappings for blocks beyond EOF.  The call to iomap_adjust_read_range
sets us up to read data from disk through the EOF block, and for the
remainder of the page we zero the post-eof blocks within that page.

IOWs, for reads, __gfs2_iomap_get probably ought to set iomap->length to
gfs2_max_stuffed_size() like it does for writes, and we ought to
generalize iomap_read_inline_data to stop copying after
min(iomap->length, i_size_read() - iomap->offset) bytes.  If it then
discovers that it has indeed reached EOF, then we can zero the rest of
the page and add that quantity to the number of bytes read.

Right now for gfs2 the two arguments to min are always the same so the
function omits all the bits that would make the zeroing actually
conditional on whether we really hit EOF, and pass any copied size other
than PAGE_SIZE back to iomap_readpage_actor.  Given that we still don't
have any filesystems that require us to support inline regions entirely
below EOF I'm fine with omitting the general (and hence untestable)
solution... for now.

(I now think I understand why someone brought up inline data regions in
the middle of files last week.)

--D

> > and why you completely ignoring iomap->length field returning by fs.
> 
> In the iomap_readpage case (iomap_begin with flags == 0),
> iomap->length will be the amount of data up to the end of the inode.
> In the iomap_file_buffered_write case (iomap_begin with flags ==
> IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> (For extending writes, we need to write beyond the current end of
> inode.) So iomap->length isn't all that useful for
> iomap_read_inline_data.
> 
> > Using i_size here instead of iomap->length seems coupling to me in the
> > beginning (even currently in practice there is some limitation.)
> 
> And what is that?
> 
> Thanks,
> Andreas
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 13:12           ` Gao Xiang
@ 2021-07-26 13:30             ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2021-07-26 13:30 UTC (permalink / raw)
  To: Andreas Gruenbacher, Matthew Wilcox, Christoph Hellwig,
	Darrick J . Wong, Huang Jianan, linux-erofs, linux-fsdevel, LKML,
	Andreas Gruenbacher

On Mon, Jul 26, 2021 at 09:12:44PM +0800, Gao Xiang wrote:
> I tend to leave it as another new story and can be resolved with
> another patch to improve it (or I will stuck in this, I need to do
> my own development stuff instead of spinning with this iomap patch
> since I can see this already work well for gfs2 and erofs), I will
> update the patch Andreas posted with Christoph's comments.

Yes, let's do the minimal work for erofs needs in a first step.
After that I hope I can get the iomap_iter change in and then we
can start the next projects.  That being said moving the copy back
to the inline data to writepage does seem very useful, as does the
"small" blocksize support from Matthew.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 12:27         ` Andreas Grünbacher
  2021-07-26 12:50           ` Gao Xiang
@ 2021-07-26 13:27           ` Christoph Hellwig
  1 sibling, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2021-07-26 13:27 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Christoph Hellwig, Andreas Gruenbacher, Gao Xiang,
	Darrick J . Wong, Matthew Wilcox, Huang Jianan, linux-erofs,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Jul 26, 2021 at 02:27:12PM +0200, Andreas Grünbacher wrote:
> > That is how can size be different from iomap->length?
> 
> Quoting from my previous reply,
> 
> "In the iomap_readpage case (iomap_begin with flags == 0),
> iomap->length will be the amount of data up to the end of the inode.
> In the iomap_file_buffered_write case (iomap_begin with flags ==
> IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> (For extending writes, we need to write beyond the current end of
> inode.) So iomap->length isn't all that useful for
> iomap_read_inline_data."

I think we should fix that now that we have the srcmap concept.
That is or IOMAP_WRITE|IOMAP_ZERO return the inline map as the soure
map, and return the actual block map we plan to write into as the
main iomap.

> 
> > Shouldn't the offset_in_page also go into iomap_inline_data_size_valid,
> > which should probably be called iomap_inline_data_valid then?
> 
> Hmm, not sure what you mean: iomap_inline_data_size_valid does take
> offset_in_page(iomap->inline_data) into account.

Indeed, orry for the braino.

> I thought people were okay with 80 character long lines?

No.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 13:03         ` Andreas Gruenbacher
@ 2021-07-26 13:12           ` Gao Xiang
  2021-07-26 13:30             ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Gao Xiang @ 2021-07-26 13:12 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Matthew Wilcox, Christoph Hellwig, Darrick J . Wong,
	Huang Jianan, linux-erofs, linux-fsdevel, LKML,
	Andreas Gruenbacher

On Mon, Jul 26, 2021 at 03:03:14PM +0200, Andreas Gruenbacher wrote:
> On Mon, Jul 26, 2021 at 2:33 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Mon, Jul 26, 2021 at 01:06:11PM +0200, Andreas Gruenbacher wrote:
> > > @@ -671,11 +683,11 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
> > >       void *addr;
> > >
> > >       WARN_ON_ONCE(!PageUptodate(page));
> > > -     BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > > +     BUG_ON(!iomap_inline_data_size_valid(iomap));
> > >
> > >       flush_dcache_page(page);
> > >       addr = kmap_atomic(page);
> > > -     memcpy(iomap->inline_data + pos, addr + pos, copied);
> > > +     memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
> > >       kunmap_atomic(addr);
> > >
> > >       mark_inode_dirty(inode);
> >
> > Only tangentially related ... why do we memcpy the data into the tail
> > at write_end() time instead of at writepage() time?  I see there's a
> > workaround for that in gfs2's page_mkwrite():
> >
> >         if (gfs2_is_stuffed(ip)) {
> >                 err = gfs2_unstuff_dinode(ip);
> >
> > (an mmap store cannot change the size of the file, so this would be
> > unnecessary)
> 
> Not sure if an additional __set_page_dirty_nobuffers is needed in that
> case, but doing the writeback at writepage time should work just as
> well. It's just that gfs2 did it at write time historically. The
> un-inlining in gfs2_page_mkwrite() could probably also be removed.
> 
> I can give this a try, but I'll unfortunately be AFK for the next
> couple of days.

I tend to leave it as another new story and can be resolved with
another patch to improve it (or I will stuck in this, I need to do
my own development stuff instead of spinning with this iomap patch
since I can see this already work well for gfs2 and erofs), I will
update the patch Andreas posted with Christoph's comments.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 12:50           ` Gao Xiang
@ 2021-07-26 13:10             ` Andreas Gruenbacher
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2021-07-26 13:10 UTC (permalink / raw)
  To: Andreas Grünbacher, Christoph Hellwig, Andreas Gruenbacher,
	Darrick J . Wong, Matthew Wilcox, Huang Jianan, linux-erofs,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Jul 26, 2021 at 2:51 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> Hi Andreas, Christoph,
>
> On Mon, Jul 26, 2021 at 02:27:12PM +0200, Andreas Grünbacher wrote:
> > Am Mo., 26. Juli 2021 um 14:17 Uhr schrieb Christoph Hellwig <hch@lst.de>:
> > >
> > > > Subject: iomap: Support tail packing
> > >
> > > I can't say I like this "tail packing" language here when we have the
> > > perfectly fine inline wording.  Same for various comments in the actual
> > > code.
> > >
> > > > +     /* inline and tail-packed data must start page aligned in the file */
> > > > +     if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
> > > > +             return -EIO;
> > > > +     if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
> > > > +             return -EIO;
> > >
> > > Why can't we use iomap_inline_data_size_valid here?
> >
> > We can now. Gao, can you change that?
>
> Thank you all taking so much time on this! much appreciated.
>
> I'm fine to update that.
>
> >
> > > That is how can size be different from iomap->length?
> >
> > Quoting from my previous reply,
> >
> > "In the iomap_readpage case (iomap_begin with flags == 0),
> > iomap->length will be the amount of data up to the end of the inode.
>
> For tail-packing cases, iomap->length is just the length of tail-packing
> inline extent.
>
> > In the iomap_file_buffered_write case (iomap_begin with flags ==
> > IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> > (For extending writes, we need to write beyond the current end of
> > inode.) So iomap->length isn't all that useful for
> > iomap_read_inline_data."
>
> Ok, now it seems I get your point. For the current gfs2 inline cases:
>   iomap_write_begin
>     iomap_write_begin_inline
>       iomap_read_inline_data
>
> here, gfs2 passes a buffer instead with "iomap->length", maybe it
> could be larger than i_size_read(inode) for gfs2. Is that correct?
>
>         loff_t max_size = gfs2_max_stuffed_size(ip);
>
>         iomap->length = max_size;
>
> If that is what gfs2 currently does, I think it makes sense to
> temporarily use as this, but IMO, iomap->inline_bufsize is not
> iomap->length. These are 2 different concepts.
>
> >
> > > Shouldn't the offset_in_page also go into iomap_inline_data_size_valid,
> > > which should probably be called iomap_inline_data_valid then?
> >
> > Hmm, not sure what you mean: iomap_inline_data_size_valid does take
> > offset_in_page(iomap->inline_data) into account.
> >
> > > >       if (iomap->type == IOMAP_INLINE) {
> > > > +             int ret = iomap_read_inline_data(inode, page, iomap);
> > > > +             return ret ?: PAGE_SIZE;
> >
> > > The ?: expression without the first leg is really confuing.  Especially
> > > if a good old if is much more readable here.
> >
> > I'm sure Gao can change this.
> >
> > >                 int ret = iomap_read_inline_data(inode, page, iomap);
> > >
> > >                 if (ret)
> > >                         return ret;
> > >                 return PAGE_SIZE;
>
> I'm fine to update it if no strong opinion.
>
> > >
> > > > +             copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);
> > >
> > >
> > > > +             copied = copy_to_iter(iomap_inline_data(iomap, pos), length, iter);
> > >
> > > Pleae avoid the overly long lines.
> >
> > I thought people were okay with 80 character long lines?
>
> Christoph mentioned before as below:
> https://lore.kernel.org/linux-fsdevel/YPVe41YqpfGLNsBS@infradead.org/
>
> We also need to take the offset into account for the write side.
> I guess it would be nice to have a local variable for the inline
> address to not duplicate that calculation multiple times.

Fair enough, we could add a local variable:

  void *inline_data = iomap_inline_data(iomap, pos);

and use that in the copy_from_iter and copy_to_iter. Why not.

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 12:32       ` Matthew Wilcox
@ 2021-07-26 13:03         ` Andreas Gruenbacher
  2021-07-26 13:12           ` Gao Xiang
  0 siblings, 1 reply; 31+ messages in thread
From: Andreas Gruenbacher @ 2021-07-26 13:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Gao Xiang, Christoph Hellwig, Darrick J . Wong, Huang Jianan,
	linux-erofs, linux-fsdevel, LKML, Andreas Gruenbacher

On Mon, Jul 26, 2021 at 2:33 PM Matthew Wilcox <willy@infradead.org> wrote:
> On Mon, Jul 26, 2021 at 01:06:11PM +0200, Andreas Gruenbacher wrote:
> > @@ -671,11 +683,11 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
> >       void *addr;
> >
> >       WARN_ON_ONCE(!PageUptodate(page));
> > -     BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > +     BUG_ON(!iomap_inline_data_size_valid(iomap));
> >
> >       flush_dcache_page(page);
> >       addr = kmap_atomic(page);
> > -     memcpy(iomap->inline_data + pos, addr + pos, copied);
> > +     memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
> >       kunmap_atomic(addr);
> >
> >       mark_inode_dirty(inode);
>
> Only tangentially related ... why do we memcpy the data into the tail
> at write_end() time instead of at writepage() time?  I see there's a
> workaround for that in gfs2's page_mkwrite():
>
>         if (gfs2_is_stuffed(ip)) {
>                 err = gfs2_unstuff_dinode(ip);
>
> (an mmap store cannot change the size of the file, so this would be
> unnecessary)

Not sure if an additional __set_page_dirty_nobuffers is needed in that
case, but doing the writeback at writepage time should work just as
well. It's just that gfs2 did it at write time historically. The
un-inlining in gfs2_page_mkwrite() could probably also be removed.

I can give this a try, but I'll unfortunately be AFK for the next
couple of days.

> Something like this ...
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 87ccb3438bec..3aeebe899fc5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -665,9 +665,10 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>         return copied;
>  }
>
> -static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
> -               struct iomap *iomap, loff_t pos, size_t copied)
> +static int iomap_write_inline_data(struct inode *inode, struct page *page,
> +               struct iomap *iomap)
>  {
> +       size_t size = i_size_read(inode) - page_offset(page);

You surely mean inode->i_size - iomap->offset.

>         void *addr;
>
>         WARN_ON_ONCE(!PageUptodate(page));
> @@ -675,11 +676,10 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
>
>         flush_dcache_page(page);
>         addr = kmap_atomic(page);
> -       memcpy(iomap->inline_data + pos, addr + pos, copied);
> +       memcpy(iomap->inline_data, addr, size);
>         kunmap_atomic(addr);
>
> -       mark_inode_dirty(inode);
> -       return copied;
> +       return 0;
>  }
>
>  /* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
> @@ -691,9 +691,7 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>         loff_t old_size = inode->i_size;
>         size_t ret;
>
> -       if (srcmap->type == IOMAP_INLINE) {
> -               ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> -       } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
> +       if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
>                 ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
>                                 page, NULL);
>         } else {
> @@ -1314,6 +1312,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>
>         WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
>
> +       if (wpc->iomap.type == IOMAP_INLINE)
> +               return iomap_write_inline_data(inode, page, iomap);
> +
>         /*
>          * Walk through the page to find areas to write back. If we run off the
>          * end of the current map or find the current map invalid, grab a new
> @@ -1328,8 +1329,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>                 error = wpc->ops->map_blocks(wpc, inode, file_offset);
>                 if (error)
>                         break;
> -               if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
> -                       continue;
>                 if (wpc->iomap.type == IOMAP_HOLE)
>                         continue;
>                 iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
>

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 12:27         ` Andreas Grünbacher
@ 2021-07-26 12:50           ` Gao Xiang
  2021-07-26 13:10             ` Andreas Gruenbacher
  2021-07-26 13:27           ` Christoph Hellwig
  1 sibling, 1 reply; 31+ messages in thread
From: Gao Xiang @ 2021-07-26 12:50 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Christoph Hellwig, Andreas Gruenbacher, Darrick J . Wong,
	Matthew Wilcox, Huang Jianan, linux-erofs,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

Hi Andreas, Christoph,

On Mon, Jul 26, 2021 at 02:27:12PM +0200, Andreas Grünbacher wrote:
> Am Mo., 26. Juli 2021 um 14:17 Uhr schrieb Christoph Hellwig <hch@lst.de>:
> >
> > > Subject: iomap: Support tail packing
> >
> > I can't say I like this "tail packing" language here when we have the
> > perfectly fine inline wording.  Same for various comments in the actual
> > code.
> >
> > > +     /* inline and tail-packed data must start page aligned in the file */
> > > +     if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
> > > +             return -EIO;
> > > +     if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
> > > +             return -EIO;
> >
> > Why can't we use iomap_inline_data_size_valid here?
> 
> We can now. Gao, can you change that?

Thank you all taking so much time on this! much appreciated.

I'm fine to update that.

> 
> > That is how can size be different from iomap->length?
> 
> Quoting from my previous reply,
> 
> "In the iomap_readpage case (iomap_begin with flags == 0),
> iomap->length will be the amount of data up to the end of the inode.

For tail-packing cases, iomap->length is just the length of tail-packing
inline extent.

> In the iomap_file_buffered_write case (iomap_begin with flags ==
> IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> (For extending writes, we need to write beyond the current end of
> inode.) So iomap->length isn't all that useful for
> iomap_read_inline_data."

Ok, now it seems I get your point. For the current gfs2 inline cases:
  iomap_write_begin
    iomap_write_begin_inline
      iomap_read_inline_data

here, gfs2 passes a buffer instead with "iomap->length", maybe it
could be larger than i_size_read(inode) for gfs2. Is that correct?

	loff_t max_size = gfs2_max_stuffed_size(ip);

	iomap->length = max_size;

If that is what gfs2 currently does, I think it makes sense to
temporarily use as this, but IMO, iomap->inline_bufsize is not
iomap->length. These are 2 different concepts.

> 
> > Shouldn't the offset_in_page also go into iomap_inline_data_size_valid,
> > which should probably be called iomap_inline_data_valid then?
> 
> Hmm, not sure what you mean: iomap_inline_data_size_valid does take
> offset_in_page(iomap->inline_data) into account.
> 
> > >       if (iomap->type == IOMAP_INLINE) {
> > > +             int ret = iomap_read_inline_data(inode, page, iomap);
> > > +             return ret ?: PAGE_SIZE;
>
> > The ?: expression without the first leg is really confuing.  Especially
> > if a good old if is much more readable here.
> 
> I'm sure Gao can change this.
> 
> >                 int ret = iomap_read_inline_data(inode, page, iomap);
> >
> >                 if (ret)
> >                         return ret;
> >                 return PAGE_SIZE;

I'm fine to update it if no strong opinion.

> >
> > > +             copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);
> >
> >
> > > +             copied = copy_to_iter(iomap_inline_data(iomap, pos), length, iter);
> >
> > Pleae avoid the overly long lines.
> 
> I thought people were okay with 80 character long lines?

Christoph mentioned before as below:
https://lore.kernel.org/linux-fsdevel/YPVe41YqpfGLNsBS@infradead.org/

We also need to take the offset into account for the write side.
I guess it would be nice to have a local variable for the inline
address to not duplicate that calculation multiple times.

Thanks,
Gao Xiang

> 
> Thanks,
> Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 11:06     ` Andreas Gruenbacher
  2021-07-26 12:17       ` Christoph Hellwig
@ 2021-07-26 12:32       ` Matthew Wilcox
  2021-07-26 13:03         ` Andreas Gruenbacher
  1 sibling, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2021-07-26 12:32 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Gao Xiang, Christoph Hellwig, Darrick J . Wong, Huang Jianan,
	linux-erofs, linux-fsdevel, linux-kernel, Andreas Gruenbacher

On Mon, Jul 26, 2021 at 01:06:11PM +0200, Andreas Gruenbacher wrote:
> @@ -671,11 +683,11 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
>  	void *addr;
>  
>  	WARN_ON_ONCE(!PageUptodate(page));
> -	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!iomap_inline_data_size_valid(iomap));
>  
>  	flush_dcache_page(page);
>  	addr = kmap_atomic(page);
> -	memcpy(iomap->inline_data + pos, addr + pos, copied);
> +	memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
>  	kunmap_atomic(addr);
>  
>  	mark_inode_dirty(inode);

Only tangentially related ... why do we memcpy the data into the tail
at write_end() time instead of at writepage() time?  I see there's a
workaround for that in gfs2's page_mkwrite():

        if (gfs2_is_stuffed(ip)) {
                err = gfs2_unstuff_dinode(ip);

(an mmap store cannot change the size of the file, so this would be
unnecessary)

Something like this ...

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 87ccb3438bec..3aeebe899fc5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -665,9 +665,10 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	return copied;
 }
 
-static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
-		struct iomap *iomap, loff_t pos, size_t copied)
+static int iomap_write_inline_data(struct inode *inode, struct page *page,
+		struct iomap *iomap)
 {
+	size_t size = i_size_read(inode) - page_offset(page);
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
@@ -675,11 +676,10 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
 
 	flush_dcache_page(page);
 	addr = kmap_atomic(page);
-	memcpy(iomap->inline_data + pos, addr + pos, copied);
+	memcpy(iomap->inline_data, addr, size);
 	kunmap_atomic(addr);
 
-	mark_inode_dirty(inode);
-	return copied;
+	return 0;
 }
 
 /* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
@@ -691,9 +691,7 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	loff_t old_size = inode->i_size;
 	size_t ret;
 
-	if (srcmap->type == IOMAP_INLINE) {
-		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
-	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
+	if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
@@ -1314,6 +1312,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
 
+	if (wpc->iomap.type == IOMAP_INLINE)
+		return iomap_write_inline_data(inode, page, iomap);
+
 	/*
 	 * Walk through the page to find areas to write back. If we run off the
 	 * end of the current map or find the current map invalid, grab a new
@@ -1328,8 +1329,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		error = wpc->ops->map_blocks(wpc, inode, file_offset);
 		if (error)
 			break;
-		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
-			continue;
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
 		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 12:17       ` Christoph Hellwig
@ 2021-07-26 12:27         ` Andreas Grünbacher
  2021-07-26 12:50           ` Gao Xiang
  2021-07-26 13:27           ` Christoph Hellwig
  2021-07-27  8:20         ` David Sterba
  1 sibling, 2 replies; 31+ messages in thread
From: Andreas Grünbacher @ 2021-07-26 12:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Gao Xiang, Darrick J . Wong, Matthew Wilcox,
	Huang Jianan, linux-erofs, Linux FS-devel Mailing List,
	Linux Kernel Mailing List

Am Mo., 26. Juli 2021 um 14:17 Uhr schrieb Christoph Hellwig <hch@lst.de>:
>
> > Subject: iomap: Support tail packing
>
> I can't say I like this "tail packing" language here when we have the
> perfectly fine inline wording.  Same for various comments in the actual
> code.
>
> > +     /* inline and tail-packed data must start page aligned in the file */
> > +     if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
> > +             return -EIO;
> > +     if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
> > +             return -EIO;
>
> Why can't we use iomap_inline_data_size_valid here?

We can now. Gao, can you change that?

> That is how can size be different from iomap->length?

Quoting from my previous reply,

"In the iomap_readpage case (iomap_begin with flags == 0),
iomap->length will be the amount of data up to the end of the inode.
In the iomap_file_buffered_write case (iomap_begin with flags ==
IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
(For extending writes, we need to write beyond the current end of
inode.) So iomap->length isn't all that useful for
iomap_read_inline_data."

> Shouldn't the offset_in_page also go into iomap_inline_data_size_valid,
> which should probably be called iomap_inline_data_valid then?

Hmm, not sure what you mean: iomap_inline_data_size_valid does take
offset_in_page(iomap->inline_data) into account.

> >       if (iomap->type == IOMAP_INLINE) {
> > +             int ret = iomap_read_inline_data(inode, page, iomap);
> > +             return ret ?: PAGE_SIZE;
>
> The ?: expression without the first leg is really confuing.  Especially
> if a good old if is much more readable here.

I'm sure Gao can change this.

>                 int ret = iomap_read_inline_data(inode, page, iomap);
>
>                 if (ret)
>                         return ret;
>                 return PAGE_SIZE;
>
> > +             copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);
>
>
> > +             copied = copy_to_iter(iomap_inline_data(iomap, pos), length, iter);
>
> Pleae avoid the overly long lines.

I thought people were okay with 80 character long lines?

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26 11:06     ` Andreas Gruenbacher
@ 2021-07-26 12:17       ` Christoph Hellwig
  2021-07-26 12:27         ` Andreas Grünbacher
  2021-07-27  8:20         ` David Sterba
  2021-07-26 12:32       ` Matthew Wilcox
  1 sibling, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2021-07-26 12:17 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Gao Xiang, Christoph Hellwig, Darrick J . Wong, Matthew Wilcox,
	Huang Jianan, linux-erofs, linux-fsdevel, linux-kernel,
	Andreas Gruenbacher

> Subject: iomap: Support tail packing

I can't say I like this "tail packing" language here when we have the
perfectly fine inline wording.  Same for various comments in the actual
code.

> +	/* inline and tail-packed data must start page aligned in the file */
> +	if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
> +		return -EIO;
> +	if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
> +		return -EIO;

Why can't we use iomap_inline_data_size_valid here? That is how can
size be different from iomap->legth?

Shouldn't the offset_in_page also go into iomap_inline_data_size_valid,
which should probably be called iomap_inline_data_valid then?

>  	if (iomap->type == IOMAP_INLINE) {
> +		int ret = iomap_read_inline_data(inode, page, iomap);
> +		return ret ?: PAGE_SIZE;

The ?: expression without the first leg is really confuing.  Especially
if a good old if is much more readable here.

		int ret = iomap_read_inline_data(inode, page, iomap);

		if (ret)
			return ret;
		return PAGE_SIZE;

> +		copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);


> +		copied = copy_to_iter(iomap_inline_data(iomap, pos), length, iter);

Pleae avoid the overly long lines.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26  4:00   ` Gao Xiang
  2021-07-26  8:08     ` Andreas Grünbacher
@ 2021-07-26 11:06     ` Andreas Gruenbacher
  2021-07-26 12:17       ` Christoph Hellwig
  2021-07-26 12:32       ` Matthew Wilcox
  1 sibling, 2 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2021-07-26 11:06 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Andreas Gruenbacher, Christoph Hellwig, Darrick J . Wong,
	Matthew Wilcox, Huang Jianan, linux-erofs, linux-fsdevel,
	linux-kernel, Andreas Gruenbacher

Here's the promised update.  Passes fstests on gfs2.

Thanks,
Andreas

--

Subject: iomap: Support tail packing

The existing inline data support only works for cases where the entire
file is stored as inline data.  For larger files, EROFS stores the
initial blocks separately and then can pack a small tail adjacent to the
inode.  Generalise inline data to allow for tail packing.  Tails may not
cross a page boundary in memory.

We currently have no filesystems that support tail packing and writing,
so that case is currently disabled (see iomap_write_begin_inline).  I'm
not aware of any reason why this code path shouldn't work, however.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/iomap/buffered-io.c | 38 +++++++++++++++++++++++++-------------
 fs/iomap/direct-io.c   |  9 +++++----
 include/linux/iomap.h  | 20 +++++++++++++++++++-
 3 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 87ccb3438bec..dee6b0952ef8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
 	struct readahead_control *rac;
 };
 
-static void
-iomap_read_inline_data(struct inode *inode, struct page *page,
+static int iomap_read_inline_data(struct inode *inode, struct page *page,
 		struct iomap *iomap)
 {
-	size_t size = i_size_read(inode);
+	size_t size = i_size_read(inode) - iomap->offset;
 	void *addr;
 
 	if (PageUptodate(page))
-		return;
+		return 0;
 
-	BUG_ON(page_has_private(page));
-	BUG_ON(page->index);
-	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	/* inline and tail-packed data must start page aligned in the file */
+	if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
+		return -EIO;
+	if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
+		return -EIO;
+	if (WARN_ON_ONCE(page_has_private(page)))
+		return -EIO;
 
 	addr = kmap_atomic(page);
 	memcpy(addr, iomap->inline_data, size);
 	memset(addr + size, 0, PAGE_SIZE - size);
 	kunmap_atomic(addr);
 	SetPageUptodate(page);
+	return 0;
 }
 
 static inline bool iomap_block_needs_zeroing(struct inode *inode,
@@ -247,9 +251,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	sector_t sector;
 
 	if (iomap->type == IOMAP_INLINE) {
-		WARN_ON_ONCE(pos);
-		iomap_read_inline_data(inode, page, iomap);
-		return PAGE_SIZE;
+		int ret = iomap_read_inline_data(inode, page, iomap);
+		return ret ?: PAGE_SIZE;
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
@@ -589,6 +592,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	return 0;
 }
 
+static int iomap_write_begin_inline(struct inode *inode,
+		struct page *page, struct iomap *srcmap)
+{
+	/* needs more work for the tailpacking case, disable for now */
+	if (WARN_ON_ONCE(srcmap->offset != 0))
+		return -EIO;
+	return iomap_read_inline_data(inode, page, srcmap);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
@@ -618,7 +630,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	}
 
 	if (srcmap->type == IOMAP_INLINE)
-		iomap_read_inline_data(inode, page, srcmap);
+		status = iomap_write_begin_inline(inode, page, srcmap);
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
@@ -671,11 +683,11 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
-	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	BUG_ON(!iomap_inline_data_size_valid(iomap));
 
 	flush_dcache_page(page);
 	addr = kmap_atomic(page);
-	memcpy(iomap->inline_data + pos, addr + pos, copied);
+	memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
 	kunmap_atomic(addr);
 
 	mark_inode_dirty(inode);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9398b8c31323..6fdae86d0f1d 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -380,21 +380,22 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 	struct iov_iter *iter = dio->submit.iter;
 	size_t copied;
 
-	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
+		return -EIO;
 
 	if (dio->flags & IOMAP_DIO_WRITE) {
 		loff_t size = inode->i_size;
 
 		if (pos > size)
-			memset(iomap->inline_data + size, 0, pos - size);
-		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
+			memset(iomap_inline_data(iomap, size), 0, pos - size);
+		copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);
 		if (copied) {
 			if (pos + copied > size)
 				i_size_write(inode, pos + copied);
 			mark_inode_dirty(inode);
 		}
 	} else {
-		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
+		copied = copy_to_iter(iomap_inline_data(iomap, pos), length, iter);
 	}
 	dio->size += copied;
 	return copied;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 479c1da3e221..c6af1ef608c6 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -28,7 +28,7 @@ struct vm_fault;
 #define IOMAP_DELALLOC	1	/* delayed allocation blocks */
 #define IOMAP_MAPPED	2	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
-#define IOMAP_INLINE	4	/* data inline in the inode */
+#define IOMAP_INLINE	4	/* inline or tail-packed data */
 
 /*
  * Flags reported by the file system from iomap_begin:
@@ -97,6 +97,24 @@ iomap_sector(struct iomap *iomap, loff_t pos)
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
 }
 
+/*
+ * Returns the inline data pointer for logical offset @pos.
+ */
+static inline void *iomap_inline_data(struct iomap *iomap, loff_t pos)
+{
+	return iomap->inline_data + pos - iomap->offset;
+}
+
+/*
+ * Check if the mapping's length is within the valid range for inline data.
+ * This is used to guard against accessing data beyond the page inline_data
+ * points at.
+ */
+static inline bool iomap_inline_data_size_valid(struct iomap *iomap)
+{
+	return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
+}
+
 /*
  * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
  * and page_done will be called for each page written to.  This only applies to
-- 
2.26.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26  8:08     ` Andreas Grünbacher
@ 2021-07-26  8:17       ` Gao Xiang
  0 siblings, 0 replies; 31+ messages in thread
From: Gao Xiang @ 2021-07-26  8:17 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Andreas Gruenbacher, Christoph Hellwig, Darrick J . Wong,
	Matthew Wilcox, Huang Jianan, linux-erofs,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Jul 26, 2021 at 10:08:06AM +0200, Andreas Grünbacher wrote:
> Am Mo., 26. Juli 2021 um 06:00 Uhr schrieb Gao Xiang

...

> >
> > iomap_inline_buf() was suggested by Darrick. From my point of view,
> > I think it's better since it's a part of iomap->inline_data due to
> > pos involved.
> 
> I find iomap_inline_buf a bit more confusing because it's not
> immediately obvious that "buf" == "data".
> 
> I'll send an updated version.

Okay, thank you very much!

Thanks,
Gao Xiang

> 
> Thanks,
> Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-23 17:41 Gao Xiang
                   ` (2 preceding siblings ...)
  2021-07-25 22:16 ` Andreas Gruenbacher
@ 2021-07-26  8:08 ` Joseph Qi
  3 siblings, 0 replies; 31+ messages in thread
From: Joseph Qi @ 2021-07-26  8:08 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs, linux-fsdevel
  Cc: LKML, Christoph Hellwig, Darrick J . Wong, Matthew Wilcox,
	Andreas Gruenbacher, Huang Jianan



On 7/24/21 1:41 AM, Gao Xiang wrote:
> Add support for reading inline data content into the page cache from
> nonzero page-aligned file offsets.  This enables the EROFS tailpacking
> mode where the last few bytes of the file are stored right after the
> inode.
> 
> The buffered write path remains untouched since EROFS cannot be used
> for testing. It'd be better to be implemented if upcoming real users
> care and provide a real pattern rather than leave untested dead code
> around.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Looks good to me.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

> ---
> v6: https://lore.kernel.org/r/20210722031729.51628-1-hsiangkao@linux.alibaba.com
> changes since v6:
>  - based on Christoph's reply;
>  - update commit message suggested by Darrick;
>  - disable buffered write path until some real fs users.
> 
>  fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++----------------
>  fs/iomap/direct-io.c   | 10 ++++++----
>  include/linux/iomap.h  | 14 ++++++++++++++
>  3 files changed, 46 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 87ccb3438bec..f351e1f9e3f6 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
>  	struct readahead_control *rac;
>  };
>  
> -static void
> -iomap_read_inline_data(struct inode *inode, struct page *page,
> -		struct iomap *iomap)
> +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> +		struct iomap *iomap, loff_t pos)
>  {
> -	size_t size = i_size_read(inode);
> +	size_t size = iomap->length + iomap->offset - pos;
>  	void *addr;
>  
>  	if (PageUptodate(page))
> -		return;
> +		return PAGE_SIZE;
>  
> -	BUG_ON(page_has_private(page));
> -	BUG_ON(page->index);
> -	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	/* inline data must start page aligned in the file */
> +	if (WARN_ON_ONCE(offset_in_page(pos)))
> +		return -EIO;
> +	if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
> +		return -EIO;
> +	if (WARN_ON_ONCE(page_has_private(page)))
> +		return -EIO;
>  
>  	addr = kmap_atomic(page);
> -	memcpy(addr, iomap->inline_data, size);
> +	memcpy(addr, iomap_inline_buf(iomap, pos), size);
>  	memset(addr + size, 0, PAGE_SIZE - size);
>  	kunmap_atomic(addr);
>  	SetPageUptodate(page);
> +	return PAGE_SIZE;
>  }
>  
>  static inline bool iomap_block_needs_zeroing(struct inode *inode,
> @@ -246,11 +250,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	unsigned poff, plen;
>  	sector_t sector;
>  
> -	if (iomap->type == IOMAP_INLINE) {
> -		WARN_ON_ONCE(pos);
> -		iomap_read_inline_data(inode, page, iomap);
> -		return PAGE_SIZE;
> -	}
> +	if (iomap->type == IOMAP_INLINE)
> +		return iomap_read_inline_data(inode, page, iomap, pos);
>  
>  	/* zero post-eof blocks as the page may be mapped */
>  	iop = iomap_page_create(inode, page);
> @@ -589,6 +590,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  	return 0;
>  }
>  
> +static int iomap_write_begin_inline(struct inode *inode,
> +		struct page *page, struct iomap *srcmap)
> +{
> +	/* needs more work for the tailpacking case, disable for now */
> +	if (WARN_ON_ONCE(srcmap->offset != 0))
> +		return -EIO;
> +	return iomap_read_inline_data(inode, page, srcmap, 0);
> +}
> +
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> @@ -618,14 +628,14 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	}
>  
>  	if (srcmap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, srcmap);
> +		status = iomap_write_begin_inline(inode, page, srcmap);
>  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, flags, page,
>  				srcmap);
>  
> -	if (unlikely(status))
> +	if (unlikely(status < 0))
>  		goto out_unlock;
>  
>  	*pagep = page;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 9398b8c31323..a6aaea2764a5 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -378,23 +378,25 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>  		struct iomap_dio *dio, struct iomap *iomap)
>  {
>  	struct iov_iter *iter = dio->submit.iter;
> +	void *dst = iomap_inline_buf(iomap, pos);
>  	size_t copied;
>  
> -	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
> +		return -EIO;
>  
>  	if (dio->flags & IOMAP_DIO_WRITE) {
>  		loff_t size = inode->i_size;
>  
>  		if (pos > size)
> -			memset(iomap->inline_data + size, 0, pos - size);
> -		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
> +			memset(iomap_inline_buf(iomap, size), 0, pos - size);
> +		copied = copy_from_iter(dst, length, iter);
>  		if (copied) {
>  			if (pos + copied > size)
>  				i_size_write(inode, pos + copied);
>  			mark_inode_dirty(inode);
>  		}
>  	} else {
> -		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
> +		copied = copy_to_iter(dst, length, iter);
>  	}
>  	dio->size += copied;
>  	return copied;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 479c1da3e221..56b118c6d05c 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -97,6 +97,20 @@ iomap_sector(struct iomap *iomap, loff_t pos)
>  	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
>  }
>  
> +static inline void *iomap_inline_buf(const struct iomap *iomap, loff_t pos)
> +{
> +	return iomap->inline_data - iomap->offset + pos;
> +}
> +
> +/*
> + * iomap->inline_data is a potentially kmapped page, ensure it never crosses a
> + * page boundary.
> + */
> +static inline bool iomap_inline_data_size_valid(const struct iomap *iomap)
> +{
> +	return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
> +}
> +
>  /*
>   * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
>   * and page_done will be called for each page written to.  This only applies to
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26  4:00   ` Gao Xiang
@ 2021-07-26  8:08     ` Andreas Grünbacher
  2021-07-26  8:17       ` Gao Xiang
  2021-07-26 11:06     ` Andreas Gruenbacher
  1 sibling, 1 reply; 31+ messages in thread
From: Andreas Grünbacher @ 2021-07-26  8:08 UTC (permalink / raw)
  To: Andreas Gruenbacher, Christoph Hellwig, Darrick J . Wong,
	Matthew Wilcox, Huang Jianan, linux-erofs,
	Linux FS-devel Mailing List, Linux Kernel Mailing List,
	Andreas Gruenbacher

Am Mo., 26. Juli 2021 um 06:00 Uhr schrieb Gao Xiang
<hsiangkao@linux.alibaba.com>:
> On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > Here's a fixed and cleaned up version that passes fstests on gfs2.
>
> (cont.
> https://lore.kernel.org/r/YP4fk75mr%2FmIotDy@B-P7TQMD6M-0146.local)
>
> Would you mind listing what it fixed on gfs2 compared with v7?
> IOWs, I wonder which case failed with v7 on gfs2 so I could recheck
> this.

The use of iomap->length or pos in iomap_read_inline_data is
fundamentally broken, that's what I've fixed.

> > I see no reason why the combination of tail packing + writing should
> > cause any issues, so in my opinion, the check that disables that
> > combination in iomap_write_begin_inline should still be removed.
> >
> > It turns out that returning the number of bytes copied from
> > iomap_read_inline_data is a bit irritating: the function is really used
> > for filling the page, but that's not always the "progress" we're looking
> > for.  In the iomap_readpage case, we actually need to advance by an
> > antire page, but in the iomap_file_buffered_write case, we need to
> > advance by the length parameter of iomap_write_actor or less.  So I've
> > changed that back.
> >
> > I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> > iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> > more useful to me.
> >
> > Thanks,
> > Andreas
> >
> > --
> >
> > Subject: [PATCH] iomap: Support tail packing
> >
> > The existing inline data support only works for cases where the entire
> > file is stored as inline data.  For larger files, EROFS stores the
> > initial blocks separately and then can pack a small tail adjacent to the
> > inode.  Generalise inline data to allow for tail packing.  Tails may not
> > cross a page boundary in memory.
> >
> > We currently have no filesystems that support tail packing and writing,
> > so that case is currently disabled (see iomap_write_begin_inline).  I'm
> > not aware of any reason why this code path shouldn't work, however.
> >
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Darrick J. Wong <djwong@kernel.org>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> >  fs/iomap/direct-io.c   | 11 ++++++-----
> >  include/linux/iomap.h  | 22 +++++++++++++++++++++-
> >  3 files changed, 50 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 87ccb3438bec..334bf98fdd4a 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
> >       struct readahead_control *rac;
> >  };
> >
> > -static void
> > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> >               struct iomap *iomap)
> >  {
> > -     size_t size = i_size_read(inode);
> > +     size_t size = i_size_read(inode) - iomap->offset;
> >       void *addr;
> >
> >       if (PageUptodate(page))
> > -             return;
> > +             return 0;
> >
> > -     BUG_ON(page_has_private(page));
> > -     BUG_ON(page->index);
> > -     BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > +     /* inline and tail-packed data must start page aligned in the file */
> > +     if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
> > +             return -EIO;
> > +     if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
> > +             return -EIO;
> > +     if (WARN_ON_ONCE(page_has_private(page)))
> > +             return -EIO;
> >
> >       addr = kmap_atomic(page);
> >       memcpy(addr, iomap->inline_data, size);
> >       memset(addr + size, 0, PAGE_SIZE - size);
> >       kunmap_atomic(addr);
> >       SetPageUptodate(page);
> > +     return 0;
> >  }
> >
> >  static inline bool iomap_block_needs_zeroing(struct inode *inode,
> > @@ -247,7 +251,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >       sector_t sector;
> >
> >       if (iomap->type == IOMAP_INLINE) {
> > -             WARN_ON_ONCE(pos);
> >               iomap_read_inline_data(inode, page, iomap);
> >               return PAGE_SIZE;
> >       }
> > @@ -589,6 +592,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >       return 0;
> >  }
> >
> > +static int iomap_write_begin_inline(struct inode *inode,
> > +             struct page *page, struct iomap *srcmap)
> > +{
> > +     /* needs more work for the tailpacking case, disable for now */
> > +     if (WARN_ON_ONCE(srcmap->offset != 0))
> > +             return -EIO;
> > +     return iomap_read_inline_data(inode, page, srcmap);
> > +}
> > +
> >  static int
> >  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >               struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> > @@ -618,7 +630,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >       }
> >
> >       if (srcmap->type == IOMAP_INLINE)
> > -             iomap_read_inline_data(inode, page, srcmap);
> > +             status = iomap_write_begin_inline(inode, page, srcmap);
> >       else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> >               status = __block_write_begin_int(page, pos, len, NULL, srcmap);
> >       else
> > @@ -671,11 +683,11 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
> >       void *addr;
> >
> >       WARN_ON_ONCE(!PageUptodate(page));
> > -     BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > +     BUG_ON(!iomap_within_inline_data(iomap, pos + copied - 1));
> >
> >       flush_dcache_page(page);
> >       addr = kmap_atomic(page);
> > -     memcpy(iomap->inline_data + pos, addr + pos, copied);
> > +     memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
> >       kunmap_atomic(addr);
> >
> >       mark_inode_dirty(inode);
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 9398b8c31323..c9424e58f613 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -380,21 +380,22 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
> >       struct iov_iter *iter = dio->submit.iter;
> >       size_t copied;
> >
> > -     BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > +     if (WARN_ON_ONCE(!iomap_within_inline_data(iomap, pos + length - 1)))
> > +             return -EIO;
>
> I also wonder what is wrong with the previous patch:
>
> +       if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
> +               return -EIO;
>
> +/*
> + * iomap->inline_data is a potentially kmapped page, ensure it never crosses a
> + * page boundary.
> + */
> +static inline bool iomap_inline_data_size_valid(const struct iomap *iomap)
> +{
> +       return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
> +}
>
> In principle, the relationship of iomap->offset, pos, length and
> iomap->length is:
>
> "       iomap->offset <= pos < pos + length <= iomap->offset +
> iomap->length   "
>
> pos and pos + length are also impacted by what user requests rather
> than the original extent itself reported by fs.
>
> Here we need to make sure the whole extent in the page, so I think
> it'd be better to check with iomap->length rather than some pos,
> length related stuffs.

Yeah okay, the difference is that I've checked the validity of the
range of the extent actually used and iomap_inline_data_size_valid
checks if the entire mapping is valid. So let's stick with Christoph's
previous version.

> >
> >       if (dio->flags & IOMAP_DIO_WRITE) {
> > -             loff_t size = inode->i_size;
> > +             loff_t size = iomap->offset + iomap->length;
>
> and here, since it's the last extent and due to the current limitation
> in practice,
> iomap->offset + iomap->length == inode->i_size,
>
> yet I wonder why this part uses iomap->length to calculate instead of
> using i_size as in iomap_read_inline_data().
>
> My thought is "here it handles the i_size pointer and append write",
> so I think "loff_t size = inode->i_size" makes more sense here.

Hmm, right.

> >               if (pos > size)
> > -                     memset(iomap->inline_data + size, 0, pos - size);
> > -             copied = copy_from_iter(iomap->inline_data + pos, length, iter);
> > +                     memset(iomap_inline_data(iomap, size), 0, pos - size);
> > +             copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);
>
> iomap_inline_buf() was suggested by Darrick. From my point of view,
> I think it's better since it's a part of iomap->inline_data due to
> pos involved.

I find iomap_inline_buf a bit more confusing because it's not
immediately obvious that "buf" == "data".

I'll send an updated version.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26  7:22     ` Andreas Gruenbacher
@ 2021-07-26  7:38       ` Gao Xiang
  2021-07-26 21:36       ` Darrick J. Wong
  1 sibling, 0 replies; 31+ messages in thread
From: Gao Xiang @ 2021-07-26  7:38 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Darrick J . Wong, Matthew Wilcox,
	Huang Jianan, linux-erofs, linux-fsdevel, LKML,
	Andreas Gruenbacher

On Mon, Jul 26, 2021 at 09:22:41AM +0200, Andreas Gruenbacher wrote:
> On Mon, Jul 26, 2021 at 4:36 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > > Here's a fixed and cleaned up version that passes fstests on gfs2.
> > >
> > > I see no reason why the combination of tail packing + writing should
> > > cause any issues, so in my opinion, the check that disables that
> > > combination in iomap_write_begin_inline should still be removed.
> >
> > Since there is no such fs for tail-packing write, I just do a wild
> > guess, for example,
> >  1) the tail-end block was not inlined, so iomap_write_end() dirtied
> >     the whole page (or buffer) for the page writeback;
> >  2) then it was truncated into a tail-packing inline block so the last
> >     extent(page) became INLINE but dirty instead;
> >  3) during the late page writeback for dirty pages,
> >     if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
> >     would be triggered in iomap_writepage_map() for such dirty page.
> >
> > As Matthew pointed out before,
> > https://lore.kernel.org/r/YPrms0fWPwEZGNAL@casper.infradead.org/
> > currently tail-packing inline won't interact with page writeback, but
> > I'm afraid a supported tail-packing write fs needs to reconsider the
> > whole stuff how page, inode writeback works and what the pattern is
> > with the tail-packing.
> >
> > >
> > > It turns out that returning the number of bytes copied from
> > > iomap_read_inline_data is a bit irritating: the function is really used
> > > for filling the page, but that's not always the "progress" we're looking
> > > for.  In the iomap_readpage case, we actually need to advance by an
> > > antire page, but in the iomap_file_buffered_write case, we need to
> > > advance by the length parameter of iomap_write_actor or less.  So I've
> > > changed that back.
> > >
> > > I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> > > iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> > > more useful to me.
> > >
> > > Thanks,
> > > Andreas
> > >
> > > --
> > >
> > > Subject: [PATCH] iomap: Support tail packing
> > >
> > > The existing inline data support only works for cases where the entire
> > > file is stored as inline data.  For larger files, EROFS stores the
> > > initial blocks separately and then can pack a small tail adjacent to the
> > > inode.  Generalise inline data to allow for tail packing.  Tails may not
> > > cross a page boundary in memory.
> > >
> > > We currently have no filesystems that support tail packing and writing,
> > > so that case is currently disabled (see iomap_write_begin_inline).  I'm
> > > not aware of any reason why this code path shouldn't work, however.
> > >
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Darrick J. Wong <djwong@kernel.org>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > ---
> > >  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> > >  fs/iomap/direct-io.c   | 11 ++++++-----
> > >  include/linux/iomap.h  | 22 +++++++++++++++++++++-
> > >  3 files changed, 50 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index 87ccb3438bec..334bf98fdd4a 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
> > >       struct readahead_control *rac;
> > >  };
> > >
> > > -static void
> > > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > > +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> > >               struct iomap *iomap)
> > >  {
> > > -     size_t size = i_size_read(inode);
> > > +     size_t size = i_size_read(inode) - iomap->offset;
> >
> > I wonder why you use i_size / iomap->offset here,
> 
> This function is supposed to copy the inline or tail data at
> iomap->inline_data into the page passed to it. Logically, the inline
> data starts at iomap->offset and extends until i_size_read(inode).
> Relative to the page, the inline data starts at offset 0 and extends
> until i_size_read(inode) - iomap->offset. It's as simple as that.
> 
> > and why you completely ignoring iomap->length field returning by fs.
> 
> In the iomap_readpage case (iomap_begin with flags == 0),
> iomap->length will be the amount of data up to the end of the inode.
> In the iomap_file_buffered_write case (iomap_begin with flags ==
> IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> (For extending writes, we need to write beyond the current end of
> inode.) So iomap->length isn't all that useful for
> iomap_read_inline_data.
> 
> > Using i_size here instead of iomap->length seems coupling to me in the
> > beginning (even currently in practice there is some limitation.)
> 
> And what is that?

In short, I'm not against your modification. Since from my own point of
view, these are all minor stuffs. No matter my v7 or your patch
attached.

As Darrick said before, "at least not muddy the waters further." What I
need to confirm is to make sure the functionality works both on gfs2 and
erofs, and merge into iomap for-next so I could rebase my development
branch and do my own development then.

Finally, would you minding add your own SOB on this and fix what
Matthew pointed out?

Thanks,
Gao Xiang

> 
> Thanks,
> Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26  2:36   ` Gao Xiang
@ 2021-07-26  7:22     ` Andreas Gruenbacher
  2021-07-26  7:38       ` Gao Xiang
  2021-07-26 21:36       ` Darrick J. Wong
  0 siblings, 2 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2021-07-26  7:22 UTC (permalink / raw)
  To: Andreas Gruenbacher, Christoph Hellwig, Darrick J . Wong,
	Matthew Wilcox, Huang Jianan, linux-erofs, linux-fsdevel, LKML,
	Andreas Gruenbacher

On Mon, Jul 26, 2021 at 4:36 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > Here's a fixed and cleaned up version that passes fstests on gfs2.
> >
> > I see no reason why the combination of tail packing + writing should
> > cause any issues, so in my opinion, the check that disables that
> > combination in iomap_write_begin_inline should still be removed.
>
> Since there is no such fs for tail-packing write, I just do a wild
> guess, for example,
>  1) the tail-end block was not inlined, so iomap_write_end() dirtied
>     the whole page (or buffer) for the page writeback;
>  2) then it was truncated into a tail-packing inline block so the last
>     extent(page) became INLINE but dirty instead;
>  3) during the late page writeback for dirty pages,
>     if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
>     would be triggered in iomap_writepage_map() for such dirty page.
>
> As Matthew pointed out before,
> https://lore.kernel.org/r/YPrms0fWPwEZGNAL@casper.infradead.org/
> currently tail-packing inline won't interact with page writeback, but
> I'm afraid a supported tail-packing write fs needs to reconsider the
> whole stuff how page, inode writeback works and what the pattern is
> with the tail-packing.
>
> >
> > It turns out that returning the number of bytes copied from
> > iomap_read_inline_data is a bit irritating: the function is really used
> > for filling the page, but that's not always the "progress" we're looking
> > for.  In the iomap_readpage case, we actually need to advance by an
> > antire page, but in the iomap_file_buffered_write case, we need to
> > advance by the length parameter of iomap_write_actor or less.  So I've
> > changed that back.
> >
> > I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> > iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> > more useful to me.
> >
> > Thanks,
> > Andreas
> >
> > --
> >
> > Subject: [PATCH] iomap: Support tail packing
> >
> > The existing inline data support only works for cases where the entire
> > file is stored as inline data.  For larger files, EROFS stores the
> > initial blocks separately and then can pack a small tail adjacent to the
> > inode.  Generalise inline data to allow for tail packing.  Tails may not
> > cross a page boundary in memory.
> >
> > We currently have no filesystems that support tail packing and writing,
> > so that case is currently disabled (see iomap_write_begin_inline).  I'm
> > not aware of any reason why this code path shouldn't work, however.
> >
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Darrick J. Wong <djwong@kernel.org>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> >  fs/iomap/direct-io.c   | 11 ++++++-----
> >  include/linux/iomap.h  | 22 +++++++++++++++++++++-
> >  3 files changed, 50 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 87ccb3438bec..334bf98fdd4a 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
> >       struct readahead_control *rac;
> >  };
> >
> > -static void
> > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> >               struct iomap *iomap)
> >  {
> > -     size_t size = i_size_read(inode);
> > +     size_t size = i_size_read(inode) - iomap->offset;
>
> I wonder why you use i_size / iomap->offset here,

This function is supposed to copy the inline or tail data at
iomap->inline_data into the page passed to it. Logically, the inline
data starts at iomap->offset and extends until i_size_read(inode).
Relative to the page, the inline data starts at offset 0 and extends
until i_size_read(inode) - iomap->offset. It's as simple as that.

> and why you completely ignoring iomap->length field returning by fs.

In the iomap_readpage case (iomap_begin with flags == 0),
iomap->length will be the amount of data up to the end of the inode.
In the iomap_file_buffered_write case (iomap_begin with flags ==
IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
(For extending writes, we need to write beyond the current end of
inode.) So iomap->length isn't all that useful for
iomap_read_inline_data.

> Using i_size here instead of iomap->length seems coupling to me in the
> beginning (even currently in practice there is some limitation.)

And what is that?

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-26  3:06   ` Matthew Wilcox
@ 2021-07-26  6:56     ` Andreas Gruenbacher
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2021-07-26  6:56 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Gao Xiang, Christoph Hellwig, Darrick J . Wong, Huang Jianan,
	linux-erofs, linux-fsdevel, LKML, Andreas Gruenbacher

On Mon, Jul 26, 2021 at 5:07 AM Matthew Wilcox <willy@infradead.org> wrote:
> On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > @@ -247,7 +251,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >       sector_t sector;
> >
> >       if (iomap->type == IOMAP_INLINE) {
> > -             WARN_ON_ONCE(pos);
> >               iomap_read_inline_data(inode, page, iomap);
> >               return PAGE_SIZE;
>
> This surely needs to return -EIO if there was an error.

Hmm, right.

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-25 22:16 ` Andreas Gruenbacher
  2021-07-26  2:36   ` Gao Xiang
  2021-07-26  3:06   ` Matthew Wilcox
@ 2021-07-26  4:00   ` Gao Xiang
  2021-07-26  8:08     ` Andreas Grünbacher
  2021-07-26 11:06     ` Andreas Gruenbacher
  2 siblings, 2 replies; 31+ messages in thread
From: Gao Xiang @ 2021-07-26  4:00 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Darrick J . Wong, Matthew Wilcox,
	Huang Jianan, linux-erofs, linux-fsdevel, linux-kernel,
	Andreas Gruenbacher

On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> Here's a fixed and cleaned up version that passes fstests on gfs2.

(cont.
https://lore.kernel.org/r/YP4fk75mr%2FmIotDy@B-P7TQMD6M-0146.local)

Would you mind listing what it fixed on gfs2 compared with v7?
IOWs, I wonder which case failed with v7 on gfs2 so I could recheck
this.

> 
> I see no reason why the combination of tail packing + writing should
> cause any issues, so in my opinion, the check that disables that
> combination in iomap_write_begin_inline should still be removed.
> 
> It turns out that returning the number of bytes copied from
> iomap_read_inline_data is a bit irritating: the function is really used
> for filling the page, but that's not always the "progress" we're looking
> for.  In the iomap_readpage case, we actually need to advance by an
> antire page, but in the iomap_file_buffered_write case, we need to
> advance by the length parameter of iomap_write_actor or less.  So I've
> changed that back.
> 
> I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> more useful to me.
> 
> Thanks,
> Andreas
> 
> --
> 
> Subject: [PATCH] iomap: Support tail packing
> 
> The existing inline data support only works for cases where the entire
> file is stored as inline data.  For larger files, EROFS stores the
> initial blocks separately and then can pack a small tail adjacent to the
> inode.  Generalise inline data to allow for tail packing.  Tails may not
> cross a page boundary in memory.
> 
> We currently have no filesystems that support tail packing and writing,
> so that case is currently disabled (see iomap_write_begin_inline).  I'm
> not aware of any reason why this code path shouldn't work, however.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
>  fs/iomap/direct-io.c   | 11 ++++++-----
>  include/linux/iomap.h  | 22 +++++++++++++++++++++-
>  3 files changed, 50 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 87ccb3438bec..334bf98fdd4a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
>  	struct readahead_control *rac;
>  };
>  
> -static void
> -iomap_read_inline_data(struct inode *inode, struct page *page,
> +static int iomap_read_inline_data(struct inode *inode, struct page *page,
>  		struct iomap *iomap)
>  {
> -	size_t size = i_size_read(inode);
> +	size_t size = i_size_read(inode) - iomap->offset;
>  	void *addr;
>  
>  	if (PageUptodate(page))
> -		return;
> +		return 0;
>  
> -	BUG_ON(page_has_private(page));
> -	BUG_ON(page->index);
> -	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	/* inline and tail-packed data must start page aligned in the file */
> +	if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
> +		return -EIO;
> +	if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
> +		return -EIO;
> +	if (WARN_ON_ONCE(page_has_private(page)))
> +		return -EIO;
>  
>  	addr = kmap_atomic(page);
>  	memcpy(addr, iomap->inline_data, size);
>  	memset(addr + size, 0, PAGE_SIZE - size);
>  	kunmap_atomic(addr);
>  	SetPageUptodate(page);
> +	return 0;
>  }
>  
>  static inline bool iomap_block_needs_zeroing(struct inode *inode,
> @@ -247,7 +251,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	sector_t sector;
>  
>  	if (iomap->type == IOMAP_INLINE) {
> -		WARN_ON_ONCE(pos);
>  		iomap_read_inline_data(inode, page, iomap);
>  		return PAGE_SIZE;
>  	}
> @@ -589,6 +592,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  	return 0;
>  }
>  
> +static int iomap_write_begin_inline(struct inode *inode,
> +		struct page *page, struct iomap *srcmap)
> +{
> +	/* needs more work for the tailpacking case, disable for now */
> +	if (WARN_ON_ONCE(srcmap->offset != 0))
> +		return -EIO;
> +	return iomap_read_inline_data(inode, page, srcmap);
> +}
> +
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> @@ -618,7 +630,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	}
>  
>  	if (srcmap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, srcmap);
> +		status = iomap_write_begin_inline(inode, page, srcmap);
>  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>  	else
> @@ -671,11 +683,11 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
>  	void *addr;
>  
>  	WARN_ON_ONCE(!PageUptodate(page));
> -	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!iomap_within_inline_data(iomap, pos + copied - 1));
>  
>  	flush_dcache_page(page);
>  	addr = kmap_atomic(page);
> -	memcpy(iomap->inline_data + pos, addr + pos, copied);
> +	memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
>  	kunmap_atomic(addr);
>  
>  	mark_inode_dirty(inode);
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 9398b8c31323..c9424e58f613 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -380,21 +380,22 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>  	struct iov_iter *iter = dio->submit.iter;
>  	size_t copied;
>  
> -	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	if (WARN_ON_ONCE(!iomap_within_inline_data(iomap, pos + length - 1)))
> +		return -EIO;

I also wonder what is wrong with the previous patch:

+	if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
+		return -EIO;

+/*
+ * iomap->inline_data is a potentially kmapped page, ensure it never crosses a
+ * page boundary.
+ */
+static inline bool iomap_inline_data_size_valid(const struct iomap *iomap)
+{
+	return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
+}

In principle, the relationship of iomap->offset, pos, length and
iomap->length is:

"	iomap->offset <= pos < pos + length <= iomap->offset +
iomap->length	"

pos and pos + length are also impacted by what user requests rather
than the original extent itself reported by fs.

Here we need to make sure the whole extent in the page, so I think
it'd be better to check with iomap->length rather than some pos,
length related stuffs.

>  
>  	if (dio->flags & IOMAP_DIO_WRITE) {
> -		loff_t size = inode->i_size;
> +		loff_t size = iomap->offset + iomap->length;

and here, since it's the last extent and due to the current limitation
in practice,
iomap->offset + iomap->length == inode->i_size,

yet I wonder why this part uses iomap->length to calculate instead of
using i_size as in iomap_read_inline_data().

My thought is "here it handles the i_size pointer and append write",
so I think "loff_t size = inode->i_size" makes more sense here.

>  
>  		if (pos > size)
> -			memset(iomap->inline_data + size, 0, pos - size);
> -		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
> +			memset(iomap_inline_data(iomap, size), 0, pos - size);
> +		copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);

iomap_inline_buf() was suggested by Darrick. From my point of view,
I think it's better since it's a part of iomap->inline_data due to
pos involved.

Thanks,
Gao Xiang


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-25 22:16 ` Andreas Gruenbacher
  2021-07-26  2:36   ` Gao Xiang
@ 2021-07-26  3:06   ` Matthew Wilcox
  2021-07-26  6:56     ` Andreas Gruenbacher
  2021-07-26  4:00   ` Gao Xiang
  2 siblings, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2021-07-26  3:06 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Gao Xiang, Christoph Hellwig, Darrick J . Wong, Huang Jianan,
	linux-erofs, linux-fsdevel, linux-kernel, Andreas Gruenbacher

On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> @@ -247,7 +251,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	sector_t sector;
>  
>  	if (iomap->type == IOMAP_INLINE) {
> -		WARN_ON_ONCE(pos);
>  		iomap_read_inline_data(inode, page, iomap);
>  		return PAGE_SIZE;

This surely needs to return -EIO if there was an error.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-25 22:16 ` Andreas Gruenbacher
@ 2021-07-26  2:36   ` Gao Xiang
  2021-07-26  7:22     ` Andreas Gruenbacher
  2021-07-26  3:06   ` Matthew Wilcox
  2021-07-26  4:00   ` Gao Xiang
  2 siblings, 1 reply; 31+ messages in thread
From: Gao Xiang @ 2021-07-26  2:36 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Darrick J . Wong, Matthew Wilcox,
	Huang Jianan, linux-erofs, linux-fsdevel, linux-kernel,
	Andreas Gruenbacher

On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> Here's a fixed and cleaned up version that passes fstests on gfs2.
> 
> I see no reason why the combination of tail packing + writing should
> cause any issues, so in my opinion, the check that disables that
> combination in iomap_write_begin_inline should still be removed.

Since there is no such fs for tail-packing write, I just do a wild
guess, for example,
 1) the tail-end block was not inlined, so iomap_write_end() dirtied
    the whole page (or buffer) for the page writeback;
 2) then it was truncated into a tail-packing inline block so the last
    extent(page) became INLINE but dirty instead;
 3) during the late page writeback for dirty pages,
    if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
    would be triggered in iomap_writepage_map() for such dirty page.

As Matthew pointed out before,
https://lore.kernel.org/r/YPrms0fWPwEZGNAL@casper.infradead.org/
currently tail-packing inline won't interact with page writeback, but
I'm afraid a supported tail-packing write fs needs to reconsider the
whole stuff how page, inode writeback works and what the pattern is
with the tail-packing.

> 
> It turns out that returning the number of bytes copied from
> iomap_read_inline_data is a bit irritating: the function is really used
> for filling the page, but that's not always the "progress" we're looking
> for.  In the iomap_readpage case, we actually need to advance by an
> antire page, but in the iomap_file_buffered_write case, we need to
> advance by the length parameter of iomap_write_actor or less.  So I've
> changed that back.
> 
> I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> more useful to me.
> 
> Thanks,
> Andreas
> 
> --
> 
> Subject: [PATCH] iomap: Support tail packing
> 
> The existing inline data support only works for cases where the entire
> file is stored as inline data.  For larger files, EROFS stores the
> initial blocks separately and then can pack a small tail adjacent to the
> inode.  Generalise inline data to allow for tail packing.  Tails may not
> cross a page boundary in memory.
> 
> We currently have no filesystems that support tail packing and writing,
> so that case is currently disabled (see iomap_write_begin_inline).  I'm
> not aware of any reason why this code path shouldn't work, however.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
>  fs/iomap/direct-io.c   | 11 ++++++-----
>  include/linux/iomap.h  | 22 +++++++++++++++++++++-
>  3 files changed, 50 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 87ccb3438bec..334bf98fdd4a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
>  	struct readahead_control *rac;
>  };
>  
> -static void
> -iomap_read_inline_data(struct inode *inode, struct page *page,
> +static int iomap_read_inline_data(struct inode *inode, struct page *page,
>  		struct iomap *iomap)
>  {
> -	size_t size = i_size_read(inode);
> +	size_t size = i_size_read(inode) - iomap->offset;

I wonder why you use i_size / iomap->offset here, and why you completely
ignoring iomap->length field returning by fs.

Using i_size here instead of iomap->length seems coupling to me in the
beginning (even currently in practice there is some limitation.)

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-23 17:41 Gao Xiang
  2021-07-23 19:40 ` Matthew Wilcox
  2021-07-25 21:39 ` Andreas Grünbacher
@ 2021-07-25 22:16 ` Andreas Gruenbacher
  2021-07-26  2:36   ` Gao Xiang
                     ` (2 more replies)
  2021-07-26  8:08 ` Joseph Qi
  3 siblings, 3 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2021-07-25 22:16 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Andreas Gruenbacher, Christoph Hellwig, Darrick J . Wong,
	Matthew Wilcox, Huang Jianan, linux-erofs, linux-fsdevel,
	linux-kernel, Andreas Gruenbacher

Here's a fixed and cleaned up version that passes fstests on gfs2.

I see no reason why the combination of tail packing + writing should
cause any issues, so in my opinion, the check that disables that
combination in iomap_write_begin_inline should still be removed.

It turns out that returning the number of bytes copied from
iomap_read_inline_data is a bit irritating: the function is really used
for filling the page, but that's not always the "progress" we're looking
for.  In the iomap_readpage case, we actually need to advance by an
antire page, but in the iomap_file_buffered_write case, we need to
advance by the length parameter of iomap_write_actor or less.  So I've
changed that back.

I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
iomap_inline_data_size_valid into iomap_within_inline_data, which seems
more useful to me.

Thanks,
Andreas

--

Subject: [PATCH] iomap: Support tail packing

The existing inline data support only works for cases where the entire
file is stored as inline data.  For larger files, EROFS stores the
initial blocks separately and then can pack a small tail adjacent to the
inode.  Generalise inline data to allow for tail packing.  Tails may not
cross a page boundary in memory.

We currently have no filesystems that support tail packing and writing,
so that case is currently disabled (see iomap_write_begin_inline).  I'm
not aware of any reason why this code path shouldn't work, however.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
 fs/iomap/direct-io.c   | 11 ++++++-----
 include/linux/iomap.h  | 22 +++++++++++++++++++++-
 3 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 87ccb3438bec..334bf98fdd4a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
 	struct readahead_control *rac;
 };
 
-static void
-iomap_read_inline_data(struct inode *inode, struct page *page,
+static int iomap_read_inline_data(struct inode *inode, struct page *page,
 		struct iomap *iomap)
 {
-	size_t size = i_size_read(inode);
+	size_t size = i_size_read(inode) - iomap->offset;
 	void *addr;
 
 	if (PageUptodate(page))
-		return;
+		return 0;
 
-	BUG_ON(page_has_private(page));
-	BUG_ON(page->index);
-	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	/* inline and tail-packed data must start page aligned in the file */
+	if (WARN_ON_ONCE(offset_in_page(iomap->offset)))
+		return -EIO;
+	if (WARN_ON_ONCE(size > PAGE_SIZE - offset_in_page(iomap->inline_data)))
+		return -EIO;
+	if (WARN_ON_ONCE(page_has_private(page)))
+		return -EIO;
 
 	addr = kmap_atomic(page);
 	memcpy(addr, iomap->inline_data, size);
 	memset(addr + size, 0, PAGE_SIZE - size);
 	kunmap_atomic(addr);
 	SetPageUptodate(page);
+	return 0;
 }
 
 static inline bool iomap_block_needs_zeroing(struct inode *inode,
@@ -247,7 +251,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	sector_t sector;
 
 	if (iomap->type == IOMAP_INLINE) {
-		WARN_ON_ONCE(pos);
 		iomap_read_inline_data(inode, page, iomap);
 		return PAGE_SIZE;
 	}
@@ -589,6 +592,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	return 0;
 }
 
+static int iomap_write_begin_inline(struct inode *inode,
+		struct page *page, struct iomap *srcmap)
+{
+	/* needs more work for the tailpacking case, disable for now */
+	if (WARN_ON_ONCE(srcmap->offset != 0))
+		return -EIO;
+	return iomap_read_inline_data(inode, page, srcmap);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
@@ -618,7 +630,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	}
 
 	if (srcmap->type == IOMAP_INLINE)
-		iomap_read_inline_data(inode, page, srcmap);
+		status = iomap_write_begin_inline(inode, page, srcmap);
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
@@ -671,11 +683,11 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
-	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	BUG_ON(!iomap_within_inline_data(iomap, pos + copied - 1));
 
 	flush_dcache_page(page);
 	addr = kmap_atomic(page);
-	memcpy(iomap->inline_data + pos, addr + pos, copied);
+	memcpy(iomap_inline_data(iomap, pos), addr + pos, copied);
 	kunmap_atomic(addr);
 
 	mark_inode_dirty(inode);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9398b8c31323..c9424e58f613 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -380,21 +380,22 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 	struct iov_iter *iter = dio->submit.iter;
 	size_t copied;
 
-	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	if (WARN_ON_ONCE(!iomap_within_inline_data(iomap, pos + length - 1)))
+		return -EIO;
 
 	if (dio->flags & IOMAP_DIO_WRITE) {
-		loff_t size = inode->i_size;
+		loff_t size = iomap->offset + iomap->length;
 
 		if (pos > size)
-			memset(iomap->inline_data + size, 0, pos - size);
-		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
+			memset(iomap_inline_data(iomap, size), 0, pos - size);
+		copied = copy_from_iter(iomap_inline_data(iomap, pos), length, iter);
 		if (copied) {
 			if (pos + copied > size)
 				i_size_write(inode, pos + copied);
 			mark_inode_dirty(inode);
 		}
 	} else {
-		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
+		copied = copy_to_iter(iomap_inline_data(iomap, pos), length, iter);
 	}
 	dio->size += copied;
 	return copied;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 479c1da3e221..c1b57d34cb76 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -28,7 +28,7 @@ struct vm_fault;
 #define IOMAP_DELALLOC	1	/* delayed allocation blocks */
 #define IOMAP_MAPPED	2	/* blocks allocated at @addr */
 #define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
-#define IOMAP_INLINE	4	/* data inline in the inode */
+#define IOMAP_INLINE	4	/* inline or tail-packed data */
 
 /*
  * Flags reported by the file system from iomap_begin:
@@ -97,6 +97,26 @@ iomap_sector(struct iomap *iomap, loff_t pos)
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
 }
 
+/*
+ * Returns the inline data pointer for logical offset @pos.
+ */
+static void *iomap_inline_data(struct iomap *iomap, loff_t pos)
+{
+	return iomap->inline_data + pos - iomap->offset;
+}
+
+/*
+ * Check if logical offset @pos is within the valid range for inline data.
+ * This is used to guard against accessing data beyond the page inline_data
+ * points at.
+ */
+static bool iomap_within_inline_data(struct iomap *iomap, loff_t pos)
+{
+	unsigned int size = PAGE_SIZE - offset_in_page(iomap->inline_data);
+
+	return pos - iomap->offset < size;
+}
+
 /*
  * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
  * and page_done will be called for each page written to.  This only applies to
-- 
2.26.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-23 17:41 Gao Xiang
  2021-07-23 19:40 ` Matthew Wilcox
@ 2021-07-25 21:39 ` Andreas Grünbacher
  2021-07-25 22:16 ` Andreas Gruenbacher
  2021-07-26  8:08 ` Joseph Qi
  3 siblings, 0 replies; 31+ messages in thread
From: Andreas Grünbacher @ 2021-07-25 21:39 UTC (permalink / raw)
  To: Gao Xiang
  Cc: linux-erofs, Linux FS-devel Mailing List, LKML,
	Christoph Hellwig, Darrick J . Wong, Matthew Wilcox,
	Huang Jianan

Am Fr., 23. Juli 2021 um 19:41 Uhr schrieb Gao Xiang
<hsiangkao@linux.alibaba.com>:
> Add support for reading inline data content into the page cache from
> nonzero page-aligned file offsets.  This enables the EROFS tailpacking
> mode where the last few bytes of the file are stored right after the
> inode.
>
> The buffered write path remains untouched since EROFS cannot be used
> for testing. It'd be better to be implemented if upcoming real users
> care and provide a real pattern rather than leave untested dead code
> around.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
> v6: https://lore.kernel.org/r/20210722031729.51628-1-hsiangkao@linux.alibaba.com
> changes since v6:
>  - based on Christoph's reply;
>  - update commit message suggested by Darrick;
>  - disable buffered write path until some real fs users.
>
>  fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++----------------
>  fs/iomap/direct-io.c   | 10 ++++++----
>  include/linux/iomap.h  | 14 ++++++++++++++
>  3 files changed, 46 insertions(+), 20 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 87ccb3438bec..f351e1f9e3f6 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
>         struct readahead_control *rac;
>  };
>
> -static void
> -iomap_read_inline_data(struct inode *inode, struct page *page,
> -               struct iomap *iomap)
> +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> +               struct iomap *iomap, loff_t pos)

This is completely broken. This function is about filling the page
cache. All the information needed for that is in struct iomap; the
start position of an iomap operation has nothing to do with it.

>  {
> -       size_t size = i_size_read(inode);
> +       size_t size = iomap->length + iomap->offset - pos;
>         void *addr;
>
>         if (PageUptodate(page))
> -               return;
> +               return PAGE_SIZE;
>
> -       BUG_ON(page_has_private(page));
> -       BUG_ON(page->index);
> -       BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +       /* inline data must start page aligned in the file */
> +       if (WARN_ON_ONCE(offset_in_page(pos)))
> +               return -EIO;
> +       if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
> +               return -EIO;
> +       if (WARN_ON_ONCE(page_has_private(page)))
> +               return -EIO;
>
>         addr = kmap_atomic(page);
> -       memcpy(addr, iomap->inline_data, size);
> +       memcpy(addr, iomap_inline_buf(iomap, pos), size);
>         memset(addr + size, 0, PAGE_SIZE - size);
>         kunmap_atomic(addr);
>         SetPageUptodate(page);
> +       return PAGE_SIZE;
>  }
>
>  static inline bool iomap_block_needs_zeroing(struct inode *inode,
> @@ -246,11 +250,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>         unsigned poff, plen;
>         sector_t sector;
>
> -       if (iomap->type == IOMAP_INLINE) {
> -               WARN_ON_ONCE(pos);
> -               iomap_read_inline_data(inode, page, iomap);
> -               return PAGE_SIZE;
> -       }
> +       if (iomap->type == IOMAP_INLINE)
> +               return iomap_read_inline_data(inode, page, iomap, pos);
>
>         /* zero post-eof blocks as the page may be mapped */
>         iop = iomap_page_create(inode, page);
> @@ -589,6 +590,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>         return 0;
>  }
>
> +static int iomap_write_begin_inline(struct inode *inode,
> +               struct page *page, struct iomap *srcmap)
> +{
> +       /* needs more work for the tailpacking case, disable for now */
> +       if (WARN_ON_ONCE(srcmap->offset != 0))
> +               return -EIO;
> +       return iomap_read_inline_data(inode, page, srcmap, 0);
> +}
> +
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>                 struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> @@ -618,14 +628,14 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>         }
>
>         if (srcmap->type == IOMAP_INLINE)
> -               iomap_read_inline_data(inode, page, srcmap);
> +               status = iomap_write_begin_inline(inode, page, srcmap);
>         else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>                 status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>         else
>                 status = __iomap_write_begin(inode, pos, len, flags, page,
>                                 srcmap);
>
> -       if (unlikely(status))
> +       if (unlikely(status < 0))
>                 goto out_unlock;
>
>         *pagep = page;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 9398b8c31323..a6aaea2764a5 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -378,23 +378,25 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>                 struct iomap_dio *dio, struct iomap *iomap)
>  {
>         struct iov_iter *iter = dio->submit.iter;
> +       void *dst = iomap_inline_buf(iomap, pos);
>         size_t copied;
>
> -       BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +       if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
> +               return -EIO;
>
>         if (dio->flags & IOMAP_DIO_WRITE) {
>                 loff_t size = inode->i_size;
>
>                 if (pos > size)
> -                       memset(iomap->inline_data + size, 0, pos - size);
> -               copied = copy_from_iter(iomap->inline_data + pos, length, iter);
> +                       memset(iomap_inline_buf(iomap, size), 0, pos - size);
> +               copied = copy_from_iter(dst, length, iter);
>                 if (copied) {
>                         if (pos + copied > size)
>                                 i_size_write(inode, pos + copied);
>                         mark_inode_dirty(inode);
>                 }
>         } else {
> -               copied = copy_to_iter(iomap->inline_data + pos, length, iter);
> +               copied = copy_to_iter(dst, length, iter);
>         }
>         dio->size += copied;
>         return copied;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 479c1da3e221..56b118c6d05c 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -97,6 +97,20 @@ iomap_sector(struct iomap *iomap, loff_t pos)
>         return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
>  }
>
> +static inline void *iomap_inline_buf(const struct iomap *iomap, loff_t pos)
> +{
> +       return iomap->inline_data - iomap->offset + pos;
> +}
> +
> +/*
> + * iomap->inline_data is a potentially kmapped page, ensure it never crosses a
> + * page boundary.
> + */
> +static inline bool iomap_inline_data_size_valid(const struct iomap *iomap)
> +{
> +       return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
> +}
> +
>  /*
>   * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
>   * and page_done will be called for each page written to.  This only applies to
> --
> 2.24.4
>

Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-23 19:40 ` Matthew Wilcox
@ 2021-07-24  0:54   ` Gao Xiang
  0 siblings, 0 replies; 31+ messages in thread
From: Gao Xiang @ 2021-07-24  0:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-erofs, linux-fsdevel, LKML, Christoph Hellwig,
	Darrick J . Wong, Andreas Gruenbacher, Huang Jianan

Hi Matthew,

On Fri, Jul 23, 2021 at 08:40:51PM +0100, Matthew Wilcox wrote:
> On Sat, Jul 24, 2021 at 01:41:31AM +0800, Gao Xiang wrote:
> > Add support for reading inline data content into the page cache from
> > nonzero page-aligned file offsets.  This enables the EROFS tailpacking
> > mode where the last few bytes of the file are stored right after the
> > inode.
> > 
> > The buffered write path remains untouched since EROFS cannot be used
> > for testing. It'd be better to be implemented if upcoming real users
> > care and provide a real pattern rather than leave untested dead code
> > around.
> 
> My one complaint with this version is the subject line.  It's a bit vague.
> I went with:
> 
> iomap: Support file tail packing
> 
> I also wrote a changelog entry that reads:
>     The existing inline data support only works for cases where the entire
>     file is stored as inline data.  For larger files, EROFS stores the
>     initial blocks separately and then can pack a small tail adjacent to
>     the inode.  Generalise inline data to allow for tail packing.  Tails
>     may not cross a page boundary in memory.
>

Yeah, we could complete the commit message like this.

Actually EROFS inode base is only 32-byte or 64-byte (so the maximum could
not be exactly small), compared to using another tail block or storing other
(maybe) irrelevant inodes. According to cache locality principle, a strategy
can be selected by mkfs to load tail block with the inode base itself to the
page cache by the tail-packing inline and so reduce I/O and fragmentation.

> ... but I'm not sure that's necessarily better than what you've written
> here.
> 
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Darrick J. Wong <djwong@kernel.org>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Many thanks for the review!

Thanks,
Gao Xiang


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v7] iomap: make inline data support more flexible
  2021-07-23 17:41 Gao Xiang
@ 2021-07-23 19:40 ` Matthew Wilcox
  2021-07-24  0:54   ` Gao Xiang
  2021-07-25 21:39 ` Andreas Grünbacher
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2021-07-23 19:40 UTC (permalink / raw)
  To: Gao Xiang
  Cc: linux-erofs, linux-fsdevel, LKML, Christoph Hellwig,
	Darrick J . Wong, Andreas Gruenbacher, Huang Jianan

On Sat, Jul 24, 2021 at 01:41:31AM +0800, Gao Xiang wrote:
> Add support for reading inline data content into the page cache from
> nonzero page-aligned file offsets.  This enables the EROFS tailpacking
> mode where the last few bytes of the file are stored right after the
> inode.
> 
> The buffered write path remains untouched since EROFS cannot be used
> for testing. It'd be better to be implemented if upcoming real users
> care and provide a real pattern rather than leave untested dead code
> around.

My one complaint with this version is the subject line.  It's a bit vague.
I went with:

iomap: Support file tail packing

I also wrote a changelog entry that reads:
    The existing inline data support only works for cases where the entire
    file is stored as inline data.  For larger files, EROFS stores the
    initial blocks separately and then can pack a small tail adjacent to
    the inode.  Generalise inline data to allow for tail packing.  Tails
    may not cross a page boundary in memory.

... but I'm not sure that's necessarily better than what you've written
here.

> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7] iomap: make inline data support more flexible
@ 2021-07-23 17:41 Gao Xiang
  2021-07-23 19:40 ` Matthew Wilcox
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Gao Xiang @ 2021-07-23 17:41 UTC (permalink / raw)
  To: linux-erofs, linux-fsdevel
  Cc: LKML, Gao Xiang, Christoph Hellwig, Darrick J . Wong,
	Matthew Wilcox, Andreas Gruenbacher, Huang Jianan

Add support for reading inline data content into the page cache from
nonzero page-aligned file offsets.  This enables the EROFS tailpacking
mode where the last few bytes of the file are stored right after the
inode.

The buffered write path remains untouched since EROFS cannot be used
for testing. It'd be better to be implemented if upcoming real users
care and provide a real pattern rather than leave untested dead code
around.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
v6: https://lore.kernel.org/r/20210722031729.51628-1-hsiangkao@linux.alibaba.com
changes since v6:
 - based on Christoph's reply;
 - update commit message suggested by Darrick;
 - disable buffered write path until some real fs users.

 fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++----------------
 fs/iomap/direct-io.c   | 10 ++++++----
 include/linux/iomap.h  | 14 ++++++++++++++
 3 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 87ccb3438bec..f351e1f9e3f6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
 	struct readahead_control *rac;
 };
 
-static void
-iomap_read_inline_data(struct inode *inode, struct page *page,
-		struct iomap *iomap)
+static int iomap_read_inline_data(struct inode *inode, struct page *page,
+		struct iomap *iomap, loff_t pos)
 {
-	size_t size = i_size_read(inode);
+	size_t size = iomap->length + iomap->offset - pos;
 	void *addr;
 
 	if (PageUptodate(page))
-		return;
+		return PAGE_SIZE;
 
-	BUG_ON(page_has_private(page));
-	BUG_ON(page->index);
-	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	/* inline data must start page aligned in the file */
+	if (WARN_ON_ONCE(offset_in_page(pos)))
+		return -EIO;
+	if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
+		return -EIO;
+	if (WARN_ON_ONCE(page_has_private(page)))
+		return -EIO;
 
 	addr = kmap_atomic(page);
-	memcpy(addr, iomap->inline_data, size);
+	memcpy(addr, iomap_inline_buf(iomap, pos), size);
 	memset(addr + size, 0, PAGE_SIZE - size);
 	kunmap_atomic(addr);
 	SetPageUptodate(page);
+	return PAGE_SIZE;
 }
 
 static inline bool iomap_block_needs_zeroing(struct inode *inode,
@@ -246,11 +250,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	unsigned poff, plen;
 	sector_t sector;
 
-	if (iomap->type == IOMAP_INLINE) {
-		WARN_ON_ONCE(pos);
-		iomap_read_inline_data(inode, page, iomap);
-		return PAGE_SIZE;
-	}
+	if (iomap->type == IOMAP_INLINE)
+		return iomap_read_inline_data(inode, page, iomap, pos);
 
 	/* zero post-eof blocks as the page may be mapped */
 	iop = iomap_page_create(inode, page);
@@ -589,6 +590,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	return 0;
 }
 
+static int iomap_write_begin_inline(struct inode *inode,
+		struct page *page, struct iomap *srcmap)
+{
+	/* needs more work for the tailpacking case, disable for now */
+	if (WARN_ON_ONCE(srcmap->offset != 0))
+		return -EIO;
+	return iomap_read_inline_data(inode, page, srcmap, 0);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
@@ -618,14 +628,14 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	}
 
 	if (srcmap->type == IOMAP_INLINE)
-		iomap_read_inline_data(inode, page, srcmap);
+		status = iomap_write_begin_inline(inode, page, srcmap);
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
 		status = __iomap_write_begin(inode, pos, len, flags, page,
 				srcmap);
 
-	if (unlikely(status))
+	if (unlikely(status < 0))
 		goto out_unlock;
 
 	*pagep = page;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9398b8c31323..a6aaea2764a5 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -378,23 +378,25 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 		struct iomap_dio *dio, struct iomap *iomap)
 {
 	struct iov_iter *iter = dio->submit.iter;
+	void *dst = iomap_inline_buf(iomap, pos);
 	size_t copied;
 
-	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
+		return -EIO;
 
 	if (dio->flags & IOMAP_DIO_WRITE) {
 		loff_t size = inode->i_size;
 
 		if (pos > size)
-			memset(iomap->inline_data + size, 0, pos - size);
-		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
+			memset(iomap_inline_buf(iomap, size), 0, pos - size);
+		copied = copy_from_iter(dst, length, iter);
 		if (copied) {
 			if (pos + copied > size)
 				i_size_write(inode, pos + copied);
 			mark_inode_dirty(inode);
 		}
 	} else {
-		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
+		copied = copy_to_iter(dst, length, iter);
 	}
 	dio->size += copied;
 	return copied;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 479c1da3e221..56b118c6d05c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -97,6 +97,20 @@ iomap_sector(struct iomap *iomap, loff_t pos)
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
 }
 
+static inline void *iomap_inline_buf(const struct iomap *iomap, loff_t pos)
+{
+	return iomap->inline_data - iomap->offset + pos;
+}
+
+/*
+ * iomap->inline_data is a potentially kmapped page, ensure it never crosses a
+ * page boundary.
+ */
+static inline bool iomap_inline_data_size_valid(const struct iomap *iomap)
+{
+	return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
+}
+
 /*
  * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
  * and page_done will be called for each page written to.  This only applies to
-- 
2.24.4


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-08-01 10:29 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-01 10:29 [PATCH v7] iomap: make inline data support more flexible Andreas Gruenbacher
  -- strict thread matches above, loose matches on Subject: below --
2021-07-23 17:41 Gao Xiang
2021-07-23 19:40 ` Matthew Wilcox
2021-07-24  0:54   ` Gao Xiang
2021-07-25 21:39 ` Andreas Grünbacher
2021-07-25 22:16 ` Andreas Gruenbacher
2021-07-26  2:36   ` Gao Xiang
2021-07-26  7:22     ` Andreas Gruenbacher
2021-07-26  7:38       ` Gao Xiang
2021-07-26 21:36       ` Darrick J. Wong
2021-07-26 22:20         ` Andreas Grünbacher
2021-07-26  3:06   ` Matthew Wilcox
2021-07-26  6:56     ` Andreas Gruenbacher
2021-07-26  4:00   ` Gao Xiang
2021-07-26  8:08     ` Andreas Grünbacher
2021-07-26  8:17       ` Gao Xiang
2021-07-26 11:06     ` Andreas Gruenbacher
2021-07-26 12:17       ` Christoph Hellwig
2021-07-26 12:27         ` Andreas Grünbacher
2021-07-26 12:50           ` Gao Xiang
2021-07-26 13:10             ` Andreas Gruenbacher
2021-07-26 13:27           ` Christoph Hellwig
2021-07-27  8:20         ` David Sterba
2021-07-27 13:35           ` Matthew Wilcox
2021-07-27 15:04             ` Gao Xiang
2021-07-27 16:53             ` David Sterba
2021-07-26 12:32       ` Matthew Wilcox
2021-07-26 13:03         ` Andreas Gruenbacher
2021-07-26 13:12           ` Gao Xiang
2021-07-26 13:30             ` Christoph Hellwig
2021-07-26  8:08 ` Joseph Qi

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox