Linux-Fsdevel Archive on lore.kernel.org
* [PATCH 0/2] Fix silent write loss in iomap
@ 2020-09-07 20:37 Matthew Wilcox (Oracle)
  2020-09-07 20:37 ` [PATCH 1/2] iomap: Clear page error before beginning a write Matthew Wilcox (Oracle)
  2020-09-07 20:37 ` [PATCH 2/2] iomap: Mark read blocks uptodate in write_begin Matthew Wilcox (Oracle)
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-07 20:37 UTC (permalink / raw)
  To: Christoph Hellwig, Darrick J . Wong
  Cc: Matthew Wilcox (Oracle), linux-xfs, linux-fsdevel

While working on the THP patchset, I decided to inject errors, and
unfortunately I found a hole in our handling of errors with non-THPs.
You can probably reproduce these errors by inserting your own error
injection for readahead pages; mine is a little too tied to the THP
patchset to post.

The basic outline of the problem is:

 - read(ahead) hits an error, page is marked Error, !Uptodate
 - write_begin succeeds at reading in page, but it is not marked
   Uptodate due to PageError being set
 - write path copies data to page, write() call returns success
 - subsequent read() sees a page which is !Uptodate, clears Error,
   calls ->readpage, re-reads data from storage, overwrites data
   from write() with old data.
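
For readers who want to see the sequence above play out, here is a
minimal userspace sketch in C.  It is not kernel code: struct toy_page,
the toy_* helpers and the clear_error_first switch are all hypothetical
names invented for illustration, the page is treated as a single block,
and the switch stands in for the combined effect of the series rather
than for either patch on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy stand-in for a page cache page (hypothetical, not a kernel type). */
struct toy_page {
	bool error;		/* models PageError */
	bool uptodate;		/* models PageUptodate */
	char data[TOY_PAGE_SIZE];
};

/* Models marking the page uptodate: refused while the error flag is set. */
static void toy_set_uptodate(struct toy_page *page)
{
	if (page->error)
		return;
	page->uptodate = true;
}

/* Step 1: read(ahead) hits an error; page is Error, !Uptodate. */
static void toy_failed_readahead(struct toy_page *page)
{
	page->error = true;
	page->uptodate = false;
}

/* Steps 2 and 3: write_begin reads the page in, then data is copied. */
static void toy_write(struct toy_page *page, const char *storage,
		      const char *buf, size_t off, size_t len,
		      bool clear_error_first)
{
	assert(off + len <= TOY_PAGE_SIZE);

	if (!page->uptodate) {
		if (clear_error_first)
			page->error = false;
		memcpy(page->data, storage, TOY_PAGE_SIZE);
		toy_set_uptodate(page);
	}
	memcpy(page->data + off, buf, len);
}

/* Step 4: read() of a !Uptodate page clears Error and re-reads storage. */
static void toy_read(struct toy_page *page, const char *storage, char *out)
{
	if (!page->uptodate) {
		page->error = false;
		memcpy(page->data, storage, TOY_PAGE_SIZE);
		page->uptodate = true;
	}
	memcpy(out, page->data, TOY_PAGE_SIZE);
}

/* Runs the four steps; returns true if the written bytes survive. */
static bool toy_write_survives(bool clear_error_first)
{
	const char storage[TOY_PAGE_SIZE] = "old old old old";
	struct toy_page page = { 0 };
	char out[TOY_PAGE_SIZE];

	toy_failed_readahead(&page);
	toy_write(&page, storage, "NEW", 4, 3, clear_error_first);
	toy_read(&page, storage, out);

	return memcmp(out + 4, "NEW", 3) == 0;
}
```

In this model, toy_write_survives(false) returns false because the
later read refills the page from storage, while toy_write_survives(true)
returns true because the page is already marked uptodate when the read
arrives.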

The solution presented here is to behave compatibly with mm/filemap.c.
I don't _like_ how we handle PageError for read errors.  See that other
mail to linux-fsdevel for details, but this solution fixes an error that
can be hit by people with flaky storage.

I've done this as two patches because there are actually two independent
problems here.  The bug is not fixed without applying both patches, so
I'm happy to combine them into a single patch if that makes life easier.

The problem was introduced with commit 9dc55f1389f9, which made setting
PageUptodate conditional on PageError().

Matthew Wilcox (Oracle) (2):
  iomap: Clear page error before beginning a write
  iomap: Mark read blocks uptodate in write_begin

 fs/iomap/buffered-io.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

-- 
2.28.0



* [PATCH 1/2] iomap: Clear page error before beginning a write
  2020-09-07 20:37 [PATCH 0/2] Fix silent write loss in iomap Matthew Wilcox (Oracle)
@ 2020-09-07 20:37 ` Matthew Wilcox (Oracle)
  2020-09-08 14:59   ` Christoph Hellwig
  2020-09-07 20:37 ` [PATCH 2/2] iomap: Mark read blocks uptodate in write_begin Matthew Wilcox (Oracle)
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-07 20:37 UTC (permalink / raw)
  To: Christoph Hellwig, Darrick J . Wong
  Cc: Matthew Wilcox (Oracle), linux-xfs, linux-fsdevel

If we find a page in write_begin which is !Uptodate, we need
to clear any error on the page before starting to read data
into it.  This matches how filemap_fault(), do_read_cache_page()
and generic_file_buffered_read() handle PageError on !Uptodate pages.
Because PageError was still set when iomap_set_range_uptodate() was
called from __iomap_write_begin(), blocks were not being marked as
uptodate.

This was found with generic/127 and a specially modified kernel which
would fail (some) readahead I/Os.  The test read some bytes in a prior
page which caused readahead to extend into page 0x34.  There was
a subsequent write to page 0x34, followed by a read to page 0x34.
Because the blocks were still marked as !Uptodate, the read caused all
blocks to be re-read, overwriting the write.  With this change, and the
next one, the bytes which were written are marked as being Uptodate, so
even though the page is still marked as !Uptodate, the blocks containing
the written data are not re-read from storage.

Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcfc288dba3f..c95454784df4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -578,6 +578,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 
 	if (PageUptodate(page))
 		return 0;
+	ClearPageError(page);
 
 	do {
 		iomap_adjust_read_range(inode, iop, &block_start,
-- 
2.28.0



* [PATCH 2/2] iomap: Mark read blocks uptodate in write_begin
  2020-09-07 20:37 [PATCH 0/2] Fix silent write loss in iomap Matthew Wilcox (Oracle)
  2020-09-07 20:37 ` [PATCH 1/2] iomap: Clear page error before beginning a write Matthew Wilcox (Oracle)
@ 2020-09-07 20:37 ` Matthew Wilcox (Oracle)
  2020-09-08 15:03   ` Christoph Hellwig
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-09-07 20:37 UTC (permalink / raw)
  To: Christoph Hellwig, Darrick J . Wong
  Cc: Matthew Wilcox (Oracle), linux-xfs, linux-fsdevel

When bringing (portions of) a page uptodate, we were marking blocks that
were zeroed as being uptodate, but not blocks that were read from storage.

Like the previous commit, this problem was found with generic/127 and
a kernel which failed readahead I/Os.  This bug causes writes to be
silently lost when working with flaky storage.
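
As a rough illustration of the control-flow change, the following
userspace C sketch models the per-block loop.  The names (toy_block_kind,
toy_write_begin, toy_marked_blocks, the patched flag) are hypothetical
and exist only for this example; it assumes every synchronous read
succeeds, ignores PageError entirely, and does not model the
range-trimming done by iomap_adjust_read_range().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BLOCKS_PER_PAGE 4

/* How a block is brought uptodate in this toy model (hypothetical). */
enum toy_block_kind {
	TOY_BLOCK_ZEROED,	/* nothing to read: block is zero-filled */
	TOY_BLOCK_READ,		/* block is read synchronously from storage */
};

/*
 * Walks the blocks of one page.  With patched == false, only zeroed
 * blocks are marked uptodate (the old flow).  With patched == true,
 * blocks are marked after either path (the new flow).
 * Returns the number of blocks marked uptodate.
 */
static int toy_write_begin(const enum toy_block_kind kinds[TOY_BLOCKS_PER_PAGE],
			   bool uptodate[TOY_BLOCKS_PER_PAGE], bool patched)
{
	int marked = 0;

	for (int i = 0; i < TOY_BLOCKS_PER_PAGE; i++) {
		bool zeroed = kinds[i] == TOY_BLOCK_ZEROED;

		/* The zeroing or the (assumed successful) read happens here. */
		if (patched || zeroed) {
			uptodate[i] = true;
			marked++;
		}
	}
	return marked;
}

/* Convenience wrapper: two zeroed blocks around two read blocks. */
static int toy_marked_blocks(bool patched)
{
	const enum toy_block_kind kinds[TOY_BLOCKS_PER_PAGE] = {
		TOY_BLOCK_ZEROED, TOY_BLOCK_READ,
		TOY_BLOCK_READ, TOY_BLOCK_ZEROED,
	};
	bool uptodate[TOY_BLOCKS_PER_PAGE] = { false };

	return toy_write_begin(kinds, uptodate, patched);
}
```

With this layout the old flow marks only the two zeroed blocks, and the
new flow marks all four, which is the behaviour the patch is after.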

Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c95454784df4..897ab9a26a74 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -574,7 +574,6 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
 	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
-	int status;
 
 	if (PageUptodate(page))
 		return 0;
@@ -595,14 +594,13 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
 				return -EIO;
 			zero_user_segments(page, poff, from, to, poff + plen);
-			iomap_set_range_uptodate(page, poff, plen);
-			continue;
+		} else {
+			int status = iomap_read_page_sync(block_start, page,
+					poff, plen, srcmap);
+			if (status)
+				return status;
 		}
-
-		status = iomap_read_page_sync(block_start, page, poff, plen,
-				srcmap);
-		if (status)
-			return status;
+		iomap_set_range_uptodate(page, poff, plen);
 	} while ((block_start += plen) < block_end);
 
 	return 0;
-- 
2.28.0



* Re: [PATCH 1/2] iomap: Clear page error before beginning a write
  2020-09-07 20:37 ` [PATCH 1/2] iomap: Clear page error before beginning a write Matthew Wilcox (Oracle)
@ 2020-09-08 14:59   ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2020-09-08 14:59 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Christoph Hellwig, Darrick J . Wong, linux-xfs, linux-fsdevel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/2] iomap: Mark read blocks uptodate in write_begin
  2020-09-07 20:37 ` [PATCH 2/2] iomap: Mark read blocks uptodate in write_begin Matthew Wilcox (Oracle)
@ 2020-09-08 15:03   ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2020-09-08 15:03 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Christoph Hellwig, Darrick J . Wong, linux-xfs, linux-fsdevel

On Mon, Sep 07, 2020 at 09:37:07PM +0100, Matthew Wilcox (Oracle) wrote:
> When bringing (portions of) a page uptodate, we were marking blocks that
> were zeroed as being uptodate, but not blocks that were read from storage.
> 
> Like the previous commit, this problem was found with generic/127 and
> a kernel which failed readahead I/Os.  This bug causes writes to be
> silently lost when working with flaky storage.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
