Linux-Fsdevel Archive on lore.kernel.org
* [PATCH RESEND] fs: fix race condition oops between destroy_inode and writeback_sb_inodes
From: Shijie Luo @ 2020-09-19 9:39 UTC (permalink / raw)
To: linux-fsdevel; +Cc: viro, lihaotian9, lutianxiong, jack, linfeilong
We hit an oops while testing on Linux 4.18. The call trace is shown below.
[255946.665989] Oops: 0000 [#1] SMP PTI
[255946.674811] Workqueue: writeback wb_workfn (flush-253:6)
[255946.676443] RIP: 0010:locked_inode_to_wb_and_lock_list+0x20/0x120
[255946.683916] RSP: 0018:ffffbb0e44727c00 EFLAGS: 00010286
[255946.685518] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[255946.687699] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9ef282be5398
[255946.689866] RBP: ffff9ef282be5398 R08: ffffbb0e44727cd8 R09: ffff9ef3064f306e
[255946.692037] R10: 0000000000000000 R11: 0000000000000010 R12: ffff9ef282be5420
[255946.694208] R13: ffff9ef3351cc800 R14: 0000000000000000 R15: ffff9ef3352e2058
[255946.696378] FS: 0000000000000000(0000) GS:ffff9ef33ad80000(0000) knlGS:0000000000000000
[255946.698835] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[255946.700604] CR2: 0000000000000000 CR3: 000000000760a005 CR4: 00000000003606e0
[255946.702787] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[255946.704955] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[255946.707123] Call Trace:
[255946.707918] writeback_sb_inodes+0x1fe/0x460
[255946.709244] __writeback_inodes_wb+0x5d/0xb0
[255946.710575] wb_writeback+0x265/0x2f0
[255946.711728] ? wb_workfn+0x3cf/0x4d0
[255946.712850] wb_workfn+0x3cf/0x4d0
[255946.713923] process_one_work+0x195/0x390
[255946.715173] worker_thread+0x30/0x390
[255946.716319] ? process_one_work+0x390/0x390
[255946.717625] kthread+0x10d/0x130
[255946.718789] ? kthread_flush_work_fn+0x10/0x10
[255946.720170] ret_from_fork+0x35/0x40
There is a race condition between destroy_inode and writeback_sb_inodes:

thread-1                                      thread-2
wb_workfn
  writeback_inodes_wb
    __writeback_inodes_wb
      writeback_sb_inodes
        wbc_attach_and_unlock_inode
                                              iget_locked
                                                destroy_inode
                                                  inode_detach_wb
                                                    inode->i_wb = NULL;
        inode_to_wb_and_lock_list
          locked_inode_to_wb_and_lock_list
            wb_get
              oops
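
For reference, a minimal sketch of the crash site (simplified and
reconstructed from memory, not the verbatim 4.18 fs/fs-writeback.c):

    /* Called with inode->i_lock held; returns with wb->list_lock held. */
    static struct bdi_writeback *
    locked_inode_to_wb_and_lock_list(struct inode *inode)
    {
            while (true) {
                    /* NULL if inode_detach_wb() has already run */
                    struct bdi_writeback *wb = inode->i_wb;

                    wb_get(wb);             /* NULL dereference -> oops above */
                    spin_unlock(&inode->i_lock);
                    spin_lock(&wb->list_lock);
                    wb_put(wb);
                    if (likely(wb == inode->i_wb))
                            return wb;      /* i_wb did not change under us */
                    spin_unlock(&wb->list_lock);
                    spin_lock(&inode->i_lock);
            }
    }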
So, destroy the inode only after setting I_FREEING in its i_state and
waiting for the I_SYNC flag to be cleared.
Reported-by: Tianxiong Lu <lutianxiong@huawei.com>
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Haotian Li <lihaotian9@huawei.com>
---
fs/inode.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index 72c4c347afb7..b28a2a9e15d5 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1148,10 +1148,17 @@ struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
 		struct inode *new = alloc_inode(sb);
 
 		if (new) {
+			spin_lock(&new->i_lock);
 			new->i_state = 0;
+			spin_unlock(&new->i_lock);
 			inode = inode_insert5(new, hashval, test, set, data);
-			if (unlikely(inode != new))
+			if (unlikely(inode != new)) {
+				spin_lock(&new->i_lock);
+				new->i_state |= I_FREEING;
+				spin_unlock(&new->i_lock);
+				inode_wait_for_writeback(new);
 				destroy_inode(new);
+			}
 		}
 	}
 	return inode;
@@ -1218,6 +1225,11 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 	 * allocated.
 	 */
 	spin_unlock(&inode_hash_lock);
+
+	spin_lock(&inode->i_lock);
+	inode->i_state |= I_FREEING;
+	spin_unlock(&inode->i_lock);
+	inode_wait_for_writeback(inode);
 	destroy_inode(inode);
 	if (IS_ERR(old))
 		return NULL;
--
2.19.1
* Re: [PATCH RESEND] fs: fix race condition oops between destroy_inode and writeback_sb_inodes
From: Matthew Wilcox @ 2020-09-19 14:56 UTC (permalink / raw)
To: Shijie Luo; +Cc: linux-fsdevel, viro, lihaotian9, lutianxiong, jack, linfeilong
On Sat, Sep 19, 2020 at 05:39:23AM -0400, Shijie Luo wrote:
> There is a race condition between destroy_inode and writeback_sb_inodes:
>
> thread-1                                      thread-2
> wb_workfn
>   writeback_inodes_wb
>     __writeback_inodes_wb
>       writeback_sb_inodes
>         wbc_attach_and_unlock_inode
>                                               iget_locked
>                                                 destroy_inode
>                                                   inode_detach_wb
>                                                     inode->i_wb = NULL;
>         inode_to_wb_and_lock_list
>           locked_inode_to_wb_and_lock_list
>             wb_get
>               oops
>
> So, destroy the inode only after setting I_FREEING in its i_state and
> waiting for the I_SYNC flag to be cleared.
>
> Reported-by: Tianxiong Lu <lutianxiong@huawei.com>
> Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
> Signed-off-by: Haotian Li <lihaotian9@huawei.com>
> ---
> fs/inode.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 72c4c347afb7..b28a2a9e15d5 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1148,10 +1148,17 @@ struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
>  		struct inode *new = alloc_inode(sb);
>  
>  		if (new) {
> +			spin_lock(&new->i_lock);
>  			new->i_state = 0;
> +			spin_unlock(&new->i_lock);
This part is unnecessary. We just allocated 'new' two lines above;
nobody else can see 'new' yet. We make it visible with hlist_add_head_rcu(),
which uses rcu_assign_pointer(), which contains a memory barrier, so it's
impossible for another CPU to see a stale i_state.
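
A minimal sketch of that ordering (simplified; 'head' stands for the hash
bucket, and this is not the verbatim fs/inode.c code):

	new->i_state = 0;	/* plain init store, before publication */
	...
	/*
	 * inode_insert5() publishes 'new' via hlist_add_head_rcu(); the
	 * rcu_assign_pointer() inside it is a release store, so any CPU
	 * that finds 'new' on the hash chain also observes the i_state
	 * store above.  No i_lock is needed for the initialisation.
	 */
	hlist_add_head_rcu(&new->i_hash, head);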
>  			inode = inode_insert5(new, hashval, test, set, data);
> -			if (unlikely(inode != new))
> +			if (unlikely(inode != new)) {
> +				spin_lock(&new->i_lock);
> +				new->i_state |= I_FREEING;
> +				spin_unlock(&new->i_lock);
> +				inode_wait_for_writeback(new);
>  				destroy_inode(new);
This doesn't make sense either. If an inode is returned here which is not
'new', then adding 'new' to the hash failed, and new was never visible
to another CPU.
> @@ -1218,6 +1225,11 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>  	 * allocated.
>  	 */
>  	spin_unlock(&inode_hash_lock);
> +
> +	spin_lock(&inode->i_lock);
> +	inode->i_state |= I_FREEING;
> +	spin_unlock(&inode->i_lock);
> +	inode_wait_for_writeback(inode);
>  	destroy_inode(inode);
Again, this doesn't make sense. This is also a codepath which failed to
make 'inode' visible to any other thread.
I don't understand how this patch could fix anything.
* Re: [PATCH RESEND] fs: fix race condition oops between destroy_inode and writeback_sb_inodes
From: Shijie Luo @ 2020-09-21 8:29 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-fsdevel, viro, lihaotian9, lutianxiong, jack, linfeilong
On 2020/9/19 22:56, Matthew Wilcox wrote:
> This part is unnecessary. We just allocated 'new' two lines above;
> nobody else can see 'new' yet. We make it visible with hlist_add_head_rcu()
> which uses rcu_assign_pointer(), which contains a memory barrier, so it's
> impossible for another CPU to see a stale i_state.
>
>>  			inode = inode_insert5(new, hashval, test, set, data);
>> -			if (unlikely(inode != new))
>> +			if (unlikely(inode != new)) {
>> +				spin_lock(&new->i_lock);
>> +				new->i_state |= I_FREEING;
>> +				spin_unlock(&new->i_lock);
>> +				inode_wait_for_writeback(new);
>>  				destroy_inode(new);
> This doesn't make sense either. If an inode is returned here which is not
> 'new', then adding 'new' to the hash failed, and new was never visible
> to another CPU.
>
>> @@ -1218,6 +1225,11 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>>  	 * allocated.
>>  	 */
>>  	spin_unlock(&inode_hash_lock);
>> +
>> +	spin_lock(&inode->i_lock);
>> +	inode->i_state |= I_FREEING;
>> +	spin_unlock(&inode->i_lock);
>> +	inode_wait_for_writeback(inode);
>>  	destroy_inode(inode);
> Again, this doesn't make sense. This is also a codepath which failed to
> make 'inode' visible to any other thread.
>
> I don't understand how this patch could fix anything.
Thanks for your review. The underlying filesystem is ext4, and
ext4_alloc_inode did not allocate a fresh vfs inode from the slab: in the
vmcore we found that the "new" inode was already in use by another thread.
In other words, an inode that should have been brand new was not.

Maybe this is not a filesystem problem, and fixing it in iget_locked is
not the right approach. I'll try to find the root cause and fix that
instead.
* Re: [PATCH RESEND] fs: fix race condition oops between destroy_inode and writeback_sb_inodes
From: Jan Kara @ 2020-09-21 10:25 UTC (permalink / raw)
To: Shijie Luo; +Cc: linux-fsdevel, viro, lihaotian9, lutianxiong, jack, linfeilong
On Sat 19-09-20 05:39:23, Shijie Luo wrote:
> We hit an oops while testing on Linux 4.18. The call trace is shown below.
>
> [255946.665989] Oops: 0000 [#1] SMP PTI
> [255946.674811] Workqueue: writeback wb_workfn (flush-253:6)
> [255946.676443] RIP: 0010:locked_inode_to_wb_and_lock_list+0x20/0x120
> [255946.683916] RSP: 0018:ffffbb0e44727c00 EFLAGS: 00010286
> [255946.685518] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [255946.687699] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9ef282be5398
> [255946.689866] RBP: ffff9ef282be5398 R08: ffffbb0e44727cd8 R09: ffff9ef3064f306e
> [255946.692037] R10: 0000000000000000 R11: 0000000000000010 R12: ffff9ef282be5420
> [255946.694208] R13: ffff9ef3351cc800 R14: 0000000000000000 R15: ffff9ef3352e2058
> [255946.696378] FS: 0000000000000000(0000) GS:ffff9ef33ad80000(0000) knlGS:0000000000000000
> [255946.698835] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [255946.700604] CR2: 0000000000000000 CR3: 000000000760a005 CR4: 00000000003606e0
> [255946.702787] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [255946.704955] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [255946.707123] Call Trace:
> [255946.707918] writeback_sb_inodes+0x1fe/0x460
> [255946.709244] __writeback_inodes_wb+0x5d/0xb0
> [255946.710575] wb_writeback+0x265/0x2f0
> [255946.711728] ? wb_workfn+0x3cf/0x4d0
> [255946.712850] wb_workfn+0x3cf/0x4d0
> [255946.713923] process_one_work+0x195/0x390
> [255946.715173] worker_thread+0x30/0x390
> [255946.716319] ? process_one_work+0x390/0x390
> [255946.717625] kthread+0x10d/0x130
> [255946.718789] ? kthread_flush_work_fn+0x10/0x10
> [255946.720170] ret_from_fork+0x35/0x40
So 4.18 is rather old and we had several fixes in this area for crashes
similar to the one you show above. The last one was likely:

68f23b89067 ("memcg: fix a crash in wb_workfn when a device disappears")

but there were multiple changes before that to the bdi logic to fix
lifetime issues when devices are hot-removed.
> There is a race condition between destroy_inode and writeback_sb_inodes:
>
> thread-1                                      thread-2
> wb_workfn
>   writeback_inodes_wb
>     __writeback_inodes_wb
>       writeback_sb_inodes
>         wbc_attach_and_unlock_inode
>                                               iget_locked
>                                                 destroy_inode
>                                                   inode_detach_wb
>                                                     inode->i_wb = NULL;
So thread-1 looks sensible, but I don't see how what is in thread-2 can ever
happen. We can call destroy_inode() from iget_locked() only for inodes that
were never added to the inode hash (and so they could never be dirty or even
be handled by the flusher thread). Active inodes must (and AFAIK always do)
pass through fs/inode.c:evict(), which takes care of waiting for the running
flusher thread (through inode_wait_for_writeback()).
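
(A condensed sketch of that path, from memory rather than the verbatim
fs/inode.c source:)

	static void evict(struct inode *inode)
	{
		BUG_ON(!(inode->i_state & I_FREEING));

		if (!list_empty(&inode->i_io_list))
			inode_io_list_del(inode); /* off the writeback lists */
		inode_sb_list_del(inode);

		/*
		 * Wait for the flusher thread to be done with the inode so
		 * that the filesystem does not start destroying it while
		 * writeback is still running on it.
		 */
		inode_wait_for_writeback(inode);
		...
	}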
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH RESEND] fs: fix race condition oops between destroy_inode and writeback_sb_inodes
From: Shijie Luo @ 2020-09-24 14:00 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-fsdevel, viro, lihaotian9, lutianxiong, linfeilong
On 2020/9/21 18:25, Jan Kara wrote:
> So 4.18 is rather old and we had several fixes in this area for crashes
> similar to the one you show above. The last one was likely:
>
> 68f23b89067 ("memcg: fix a crash in wb_workfn when a device disappears")
>
> but there were multiple changes before that to the bdi logic to fix
> lifetime issues when devices are hot-removed.
Thanks for your reply. We checked the recent fixes in this area and
finally found that this patch works for us:

ceff86fddae8 ("ext4: Avoid freeing inodes on dirty list")

Our fsstress process randomly uses the ioctl interface to set the
journal-data flag on inodes, and an ext4 inode with the journal-data flag
set can be marked dirty and added to the writeback lists again. When
locked_inode_to_wb_and_lock_list(), called from __mark_inode_dirty(), has
released inode->i_lock but not yet taken wb->list_lock, the inode can
simultaneously be evicted and removed from the writeback lists, and it is
then possible for the inode to be re-added to a writeback list. As a
result, an inode newly allocated from the slab may still be on a writeback
list, and this can crash later because destroy_inode() has set
inode->i_wb to NULL.
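
A rough sketch of that window (simplified from the tail of
__mark_inode_dirty() in fs/fs-writeback.c, not the exact source):

	/* inode->i_lock is held here */
	wb = locked_inode_to_wb_and_lock_list(inode);
	/*
	 * ^ drops inode->i_lock before taking wb->list_lock; in that
	 * window evict() can run inode_io_list_del() and the inode can
	 * be freed and reallocated.
	 */
	inode->dirtied_when = jiffies;
	inode_io_list_move_locked(inode, wb, &wb->b_dirty);
	/* ^ re-adds the (possibly already freed) inode to a dirty list */
	spin_unlock(&wb->list_lock);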