* [PATCH v3 1/5] ext4: abort the filesystem if failed to async write metadata buffer
2020-06-20 2:54 [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
@ 2020-06-20 2:54 ` zhangyi (F)
2020-08-07 17:49 ` tytso
2020-06-20 2:54 ` [PATCH v3 2/5] ext4: remove ext4_buffer_uptodate() zhangyi (F)
` (4 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: zhangyi (F) @ 2020-06-20 2:54 UTC (permalink / raw)
To: linux-ext4, tytso, jack
Cc: adilger.kernel, zhangxiaoxu5, yi.zhang, linux-fsdevel
There is a risk of filesystem inconsistency if we failed to async write
back metadata buffer in the background. Because of current buffer's end
io procedure is handled by end_buffer_async_write() in the block layer,
and it only clear the buffer's uptodate flag and mark the write_io_error
flag, so ext4 cannot detect such failure immediately. In most cases of
getting metadata buffer (e.g. ext4_read_inode_bitmap()), although the
buffer's data is actually uptodate, it may still read data from disk
because the buffer's uptodate flag has been cleared. Finally, it may
lead to on-disk filesystem inconsistency if reading old data from the
disk successfully and write them out again.
This patch detect bdev mapping->wb_err when getting journal's write
access and mark the filesystem error if bdev's mapping->wb_err was
increased, this could prevent further writing and potential
inconsistency.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 3 +++
fs/ext4/ext4_jbd2.c | 25 +++++++++++++++++++++++++
fs/ext4/super.c | 17 +++++++++++++++++
3 files changed, 45 insertions(+)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b08841f70b69..60374eda1f51 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1573,6 +1573,9 @@ struct ext4_sb_info {
#ifdef CONFIG_EXT4_DEBUG
unsigned long s_simulate_fail;
#endif
+ /* Record the errseq of the backing block device */
+ errseq_t s_bdev_wb_err;
+ spinlock_t s_bdev_wb_lock;
};
static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 0c76cdd44d90..760b9ee49dc0 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -195,6 +195,28 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,
jbd2_journal_abort_handle(handle);
}
+static void ext4_check_bdev_write_error(struct super_block *sb)
+{
+ struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ int err;
+
+ /*
+ * If the block device has write error flag, it may have failed to
+ * async write out metadata buffers in the background. In this case,
+ * we could read old data from disk and write it out again, which
+ * may lead to on-disk filesystem inconsistency.
+ */
+ if (errseq_check(&mapping->wb_err, READ_ONCE(sbi->s_bdev_wb_err))) {
+ spin_lock(&sbi->s_bdev_wb_lock);
+ err = errseq_check_and_advance(&mapping->wb_err, &sbi->s_bdev_wb_err);
+ spin_unlock(&sbi->s_bdev_wb_lock);
+ if (err)
+ ext4_error_err(sb, -err,
+ "Error while async write back metadata");
+ }
+}
+
int __ext4_journal_get_write_access(const char *where, unsigned int line,
handle_t *handle, struct buffer_head *bh)
{
@@ -202,6 +224,9 @@ int __ext4_journal_get_write_access(const char *where, unsigned int line,
might_sleep();
+ if (bh->b_bdev->bd_super)
+ ext4_check_bdev_write_error(bh->b_bdev->bd_super);
+
if (ext4_handle_valid(handle)) {
err = jbd2_journal_get_write_access(handle, bh);
if (err)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c668f6b42374..8d3925c31b8a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4699,6 +4699,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
}
#endif /* CONFIG_QUOTA */
+ /*
+ * Save the original bdev mapping's wb_err value which could be
+ * used to detect the metadata async write error.
+ */
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+ if (!sb_rdonly(sb))
+ errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
+ &sbi->s_bdev_wb_err);
+ sb->s_bdev->bd_super = sb;
EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
ext4_orphan_cleanup(sb, es);
EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS;
@@ -5562,6 +5571,14 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
goto restore_opts;
}
+ /*
+ * Update the original bdev mapping's wb_err value
+ * which could be used to detect the metadata async
+ * write error.
+ */
+ errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
+ &sbi->s_bdev_wb_err);
+
/*
* Mounting a RDONLY partition read-write, so reread
* and store the current valid flag. (It may have
--
2.25.4
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/5] ext4: abort the filesystem if failed to async write metadata buffer
2020-06-20 2:54 ` [PATCH v3 1/5] ext4: abort the filesystem if failed to async write metadata buffer zhangyi (F)
@ 2020-08-07 17:49 ` tytso
0 siblings, 0 replies; 10+ messages in thread
From: tytso @ 2020-08-07 17:49 UTC (permalink / raw)
To: zhangyi (F); +Cc: linux-ext4, jack, adilger.kernel, zhangxiaoxu5, linux-fsdevel
On Sat, Jun 20, 2020 at 10:54:23AM +0800, zhangyi (F) wrote:
> There is a risk of filesystem inconsistency if we failed to async write
> back metadata buffer in the background. Because of current buffer's end
> io procedure is handled by end_buffer_async_write() in the block layer,
> and it only clear the buffer's uptodate flag and mark the write_io_error
> flag, so ext4 cannot detect such failure immediately. In most cases of
> getting metadata buffer (e.g. ext4_read_inode_bitmap()), although the
> buffer's data is actually uptodate, it may still read data from disk
> because the buffer's uptodate flag has been cleared. Finally, it may
> lead to on-disk filesystem inconsistency if reading old data from the
> disk successfully and write them out again.
>
> This patch detect bdev mapping->wb_err when getting journal's write
> access and mark the filesystem error if bdev's mapping->wb_err was
> increased, this could prevent further writing and potential
> inconsistency.
>
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
> Suggested-by: Jan Kara <jack@suse.cz>
Thanks, applied.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 2/5] ext4: remove ext4_buffer_uptodate()
2020-06-20 2:54 [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
2020-06-20 2:54 ` [PATCH v3 1/5] ext4: abort the filesystem if failed to async write metadata buffer zhangyi (F)
@ 2020-06-20 2:54 ` zhangyi (F)
2020-08-07 17:53 ` tytso
2020-06-20 2:54 ` [PATCH v3 3/5] ext4: remove write io error check before read inode block zhangyi (F)
` (3 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: zhangyi (F) @ 2020-06-20 2:54 UTC (permalink / raw)
To: linux-ext4, tytso, jack
Cc: adilger.kernel, zhangxiaoxu5, yi.zhang, linux-fsdevel
After we add async write error check in ext4_journal_get_write_access(),
we can remove the partial fix for filesystem inconsistency problem
caused by reading old data from disk, which in commit <7963e5ac9012>
"ext4: treat buffers with write errors as containing valid data" and
<cf2834a5ed57> "ext4: treat buffers contining write errors as valid in
ext4_sb_bread()".
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
fs/ext4/ext4.h | 13 -------------
fs/ext4/inode.c | 4 ++--
fs/ext4/super.c | 2 +-
3 files changed, 3 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 60374eda1f51..f22940e5de5a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3501,19 +3501,6 @@ extern const struct iomap_ops ext4_iomap_ops;
extern const struct iomap_ops ext4_iomap_overwrite_ops;
extern const struct iomap_ops ext4_iomap_report_ops;
-static inline int ext4_buffer_uptodate(struct buffer_head *bh)
-{
- /*
- * If the buffer has the write error flag, we have failed
- * to write out data in the block. In this case, we don't
- * have to read the block because we may read the old data
- * successfully.
- */
- if (!buffer_uptodate(bh) && buffer_write_io_error(bh))
- set_buffer_uptodate(bh);
- return buffer_uptodate(bh);
-}
-
#endif /* __KERNEL__ */
#define EFSBADCRC EBADMSG /* Bad CRC detected */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 40ec5c7ef0d3..f68afc5c0b2d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -883,7 +883,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
bh = ext4_getblk(handle, inode, block, map_flags);
if (IS_ERR(bh))
return bh;
- if (!bh || ext4_buffer_uptodate(bh))
+ if (!bh || buffer_uptodate(bh))
return bh;
ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &bh);
wait_on_buffer(bh);
@@ -910,7 +910,7 @@ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count,
for (i = 0; i < bh_count; i++)
/* Note that NULL bhs[i] is valid because of holes. */
- if (bhs[i] && !ext4_buffer_uptodate(bhs[i]))
+ if (bhs[i] && !buffer_uptodate(bhs[i]))
ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1,
&bhs[i]);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8d3925c31b8a..513d1e270f6d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -154,7 +154,7 @@ ext4_sb_bread(struct super_block *sb, sector_t block, int op_flags)
if (bh == NULL)
return ERR_PTR(-ENOMEM);
- if (ext4_buffer_uptodate(bh))
+ if (buffer_uptodate(bh))
return bh;
ll_rw_block(REQ_OP_READ, REQ_META | op_flags, 1, &bh);
wait_on_buffer(bh);
--
2.25.4
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 2/5] ext4: remove ext4_buffer_uptodate()
2020-06-20 2:54 ` [PATCH v3 2/5] ext4: remove ext4_buffer_uptodate() zhangyi (F)
@ 2020-08-07 17:53 ` tytso
0 siblings, 0 replies; 10+ messages in thread
From: tytso @ 2020-08-07 17:53 UTC (permalink / raw)
To: zhangyi (F); +Cc: linux-ext4, jack, adilger.kernel, zhangxiaoxu5, linux-fsdevel
On Sat, Jun 20, 2020 at 10:54:24AM +0800, zhangyi (F) wrote:
> After we add async write error check in ext4_journal_get_write_access(),
> we can remove the partial fix for filesystem inconsistency problem
> caused by reading old data from disk, which in commit <7963e5ac9012>
> "ext4: treat buffers with write errors as containing valid data" and
> <cf2834a5ed57> "ext4: treat buffers contining write errors as valid in
> ext4_sb_bread()".
>
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
I think it's better to keep these checks (in commits 2 and 3) because
ext4_error() doens't guarantee that the file system will be aborted.
For better or for worse, there are a large number of file systems
which default to errors=continue.
So if we notice that the buffer is not up to date, due to a write
error, we can avoid some (although not all) damage if we keep this
workaround.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 3/5] ext4: remove write io error check before read inode block
2020-06-20 2:54 [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
2020-06-20 2:54 ` [PATCH v3 1/5] ext4: abort the filesystem if failed to async write metadata buffer zhangyi (F)
2020-06-20 2:54 ` [PATCH v3 2/5] ext4: remove ext4_buffer_uptodate() zhangyi (F)
@ 2020-06-20 2:54 ` zhangyi (F)
2020-06-20 2:54 ` [PATCH v3 4/5] jbd2: abort journal if free a async write error metadata buffer zhangyi (F)
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: zhangyi (F) @ 2020-06-20 2:54 UTC (permalink / raw)
To: linux-ext4, tytso, jack
Cc: adilger.kernel, zhangxiaoxu5, yi.zhang, linux-fsdevel
After we add async write error check in ext4_journal_get_write_access(),
we can remove the partial fix for filesystem inconsistency problem
caused by reading old data from disk, which in commit <9c83a923c67d>
"ext4: don't read inode block if the buffer has a write error".
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
fs/ext4/inode.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f68afc5c0b2d..79b73a86ef6c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4289,15 +4289,6 @@ static int __ext4_get_inode_loc(struct inode *inode,
if (!buffer_uptodate(bh)) {
lock_buffer(bh);
- /*
- * If the buffer has the write error flag, we have failed
- * to write out another inode in the same block. In this
- * case, we don't have to read the block because we may
- * read the old inode data successfully.
- */
- if (buffer_write_io_error(bh) && !buffer_uptodate(bh))
- set_buffer_uptodate(bh);
-
if (buffer_uptodate(bh)) {
/* someone brought it uptodate while we waited */
unlock_buffer(bh);
--
2.25.4
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 4/5] jbd2: abort journal if free a async write error metadata buffer
2020-06-20 2:54 [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
` (2 preceding siblings ...)
2020-06-20 2:54 ` [PATCH v3 3/5] ext4: remove write io error check before read inode block zhangyi (F)
@ 2020-06-20 2:54 ` zhangyi (F)
2020-08-07 17:59 ` tytso
2020-06-20 2:54 ` [PATCH v3 5/5] jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers() zhangyi (F)
2020-07-13 1:40 ` [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
5 siblings, 1 reply; 10+ messages in thread
From: zhangyi (F) @ 2020-06-20 2:54 UTC (permalink / raw)
To: linux-ext4, tytso, jack
Cc: adilger.kernel, zhangxiaoxu5, yi.zhang, linux-fsdevel
If we free a metadata buffer which has been failed to async write out
in the background, the jbd2 checkpoint procedure will not detect this
failure in jbd2_log_do_checkpoint(), so it may lead to filesystem
inconsistency after cleanup journal tail. This patch abort the journal
if free a buffer has write_io_error flag to prevent potential further
inconsistency.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
fs/jbd2/transaction.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e91aad3637a2..a4932e8dcb65 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2117,6 +2117,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal,
{
struct buffer_head *head;
struct buffer_head *bh;
+ bool has_write_io_error = false;
int ret = 0;
J_ASSERT(PageLocked(page));
@@ -2141,11 +2142,26 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal,
jbd2_journal_put_journal_head(jh);
if (buffer_jbd(bh))
goto busy;
+
+ /*
+ * If we free a metadata buffer which has been failed to
+ * write out, the jbd2 checkpoint procedure will not detect
+ * this failure and may lead to filesystem inconsistency
+ * after cleanup journal tail.
+ */
+ if (buffer_write_io_error(bh)) {
+ pr_err("JBD2: Error while async write back metadata bh %llu.",
+ (unsigned long long)bh->b_blocknr);
+ has_write_io_error = true;
+ }
} while ((bh = bh->b_this_page) != head);
ret = try_to_free_buffers(page);
busy:
+ if (has_write_io_error)
+ jbd2_journal_abort(journal, -EIO);
+
return ret;
}
--
2.25.4
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/5] jbd2: abort journal if free a async write error metadata buffer
2020-06-20 2:54 ` [PATCH v3 4/5] jbd2: abort journal if free a async write error metadata buffer zhangyi (F)
@ 2020-08-07 17:59 ` tytso
0 siblings, 0 replies; 10+ messages in thread
From: tytso @ 2020-08-07 17:59 UTC (permalink / raw)
To: zhangyi (F); +Cc: linux-ext4, jack, adilger.kernel, zhangxiaoxu5, linux-fsdevel
On Sat, Jun 20, 2020 at 10:54:26AM +0800, zhangyi (F) wrote:
> If we free a metadata buffer which has been failed to async write out
> in the background, the jbd2 checkpoint procedure will not detect this
> failure in jbd2_log_do_checkpoint(), so it may lead to filesystem
> inconsistency after cleanup journal tail. This patch abort the journal
> if free a buffer has write_io_error flag to prevent potential further
> inconsistency.
>
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Thanks, applied.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 5/5] jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers()
2020-06-20 2:54 [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
` (3 preceding siblings ...)
2020-06-20 2:54 ` [PATCH v3 4/5] jbd2: abort journal if free a async write error metadata buffer zhangyi (F)
@ 2020-06-20 2:54 ` zhangyi (F)
2020-07-13 1:40 ` [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
5 siblings, 0 replies; 10+ messages in thread
From: zhangyi (F) @ 2020-06-20 2:54 UTC (permalink / raw)
To: linux-ext4, tytso, jack
Cc: adilger.kernel, zhangxiaoxu5, yi.zhang, linux-fsdevel
Parameter gfp_mask in jbd2_journal_try_to_free_buffers() is no longer
used after commit <536fc240e7147> ("jbd2: clean up
jbd2_journal_try_to_free_buffers()"), so just remove it.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
fs/ext4/inode.c | 2 +-
fs/ext4/super.c | 4 ++--
fs/jbd2/transaction.c | 7 +------
include/linux/jbd2.h | 2 +-
4 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 79b73a86ef6c..61a56023bec6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3288,7 +3288,7 @@ static int ext4_releasepage(struct page *page, gfp_t wait)
if (PageChecked(page))
return 0;
if (journal)
- return jbd2_journal_try_to_free_buffers(journal, page, wait);
+ return jbd2_journal_try_to_free_buffers(journal, page);
else
return try_to_free_buffers(page);
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 513d1e270f6d..b622995723e4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1294,8 +1294,8 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
if (!page_has_buffers(page))
return 0;
if (journal)
- return jbd2_journal_try_to_free_buffers(journal, page,
- wait & ~__GFP_DIRECT_RECLAIM);
+ return jbd2_journal_try_to_free_buffers(journal, page);
+
return try_to_free_buffers(page);
}
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a4932e8dcb65..7b20c77fefa9 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2078,10 +2078,6 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
* int jbd2_journal_try_to_free_buffers() - try to free page buffers.
* @journal: journal for operation
* @page: to try and free
- * @gfp_mask: we use the mask to detect how hard should we try to release
- * buffers. If __GFP_DIRECT_RECLAIM and __GFP_FS is set, we wait for commit
- * code to release the buffers.
- *
*
* For all the buffers on this page,
* if they are fully written out ordered data, move them onto BUF_CLEAN
@@ -2112,8 +2108,7 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
*
* Return 0 on failure, 1 on success
*/
-int jbd2_journal_try_to_free_buffers(journal_t *journal,
- struct page *page, gfp_t gfp_mask)
+int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
{
struct buffer_head *head;
struct buffer_head *bh;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f613d8529863..6d94b2c4cf75 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1376,7 +1376,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
extern int jbd2_journal_invalidatepage(journal_t *,
struct page *, unsigned int, unsigned int);
-extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
+extern int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page);
extern int jbd2_journal_stop(handle_t *);
extern int jbd2_journal_flush (journal_t *);
extern void jbd2_journal_lock_updates (journal_t *);
--
2.25.4
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error
2020-06-20 2:54 [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error zhangyi (F)
` (4 preceding siblings ...)
2020-06-20 2:54 ` [PATCH v3 5/5] jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers() zhangyi (F)
@ 2020-07-13 1:40 ` zhangyi (F)
5 siblings, 0 replies; 10+ messages in thread
From: zhangyi (F) @ 2020-07-13 1:40 UTC (permalink / raw)
To: linux-ext4, tytso, jack; +Cc: adilger.kernel, zhangxiaoxu5, linux-fsdevel
Hi, Ted and Jan, what do you think about this solution ?
Thanks,
Yi.
On 2020/6/20 10:54, zhangyi (F) wrote:
> Changes since v2:
> - Christoph against the solution of adding callback in the block layer
> that could let ext4 handle write error. So for simplicity, switch to
> check the bdev mapping->wb_err when ext4 getting journal write access
> as Jan suggested now. Maybe we could implement the callback through
> introduce a special inode (e.g. a meta inode) for ext4 in the future.
> - Patch 1: Add mapping->wb_err check and invoke ext4_error_err() in
> ext4_journal_get_write_access() if wb_err is different from the
> original one saved at mount time.
> - Patch 2-3: Remove partial fix <7963e5ac90125> and <9c83a923c67d>.
> - Patch 4: Fix another inconsistency problem since we may bypass the
> journal's checkpoint procedure if we free metadata buffers which
> were failed to async write out.
> - Patch 5: Just a cleanup patch.
>
> The above 5 patches are based on linux-5.8-rc1 and have been tested by
> xfstests, no newly increased failures.
>
> Thanks,
> Yi.
>
> -----------------------
>
> Original background
> ===================
>
> This patch set point to fix the inconsistency problem which has been
> discussed and partial fixed in [1].
>
> Now, the problem is on the unstable storage which has a flaky transport
> (e.g. iSCSI transport may disconnect few seconds and reconnect due to
> the bad network environment), if we failed to async write metadata in
> background, the end write routine in block layer will clear the buffer's
> uptodate flag, but the data in such buffer is actually uptodate. Finally
> we may read "old && inconsistent" metadata from the disk when we get the
> buffer later because not only the uptodate flag was cleared but also we
> do not check the write io error flag, or even worse the buffer has been
> freed due to memory presure.
>
> Fortunately, if the jbd2 do checkpoint after async IO error happens,
> the checkpoint routine will check the write_io_error flag and abort the
> the journal if detect IO error. And in the journal recover case, the
> recover code will invoke sync_blockdev() after recover complete, it will
> also detect IO error and refuse to mount the filesystem.
>
> Current ext4 have already deal with this problem in __ext4_get_inode_loc()
> and commit 7963e5ac90125 ("ext4: treat buffers with write errors as
> containing valid data"), but it's not enough.
>
> [1] https://lore.kernel.org/linux-ext4/20190823030207.GC8130@mit.edu/
>
>
> zhangyi (F) (5):
> ext4: abort the filesystem if failed to async write metadata buffer
> ext4: remove ext4_buffer_uptodate()
> ext4: remove write io error check before read inode block
> jbd2: abort journal if free a async write error metadata buffer
> jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers()
>
> fs/ext4/ext4.h | 16 +++-------------
> fs/ext4/ext4_jbd2.c | 25 +++++++++++++++++++++++++
> fs/ext4/inode.c | 15 +++------------
> fs/ext4/super.c | 23 ++++++++++++++++++++---
> fs/jbd2/transaction.c | 20 ++++++++++++++------
> include/linux/jbd2.h | 2 +-
> 6 files changed, 66 insertions(+), 35 deletions(-)
>
^ permalink raw reply [flat|nested] 10+ messages in thread