LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* WARNING in ext4_write_inode
@ 2020-02-26  6:57 syzbot
  2020-03-08  4:35 ` [PATCH] xfs: clear PF_MEMALLOC before exiting xfsaild thread Eric Biggers
  2020-03-08  4:36 ` [PATCH] cifs: clear PF_MEMALLOC before exiting demultiplex thread Eric Biggers
  0 siblings, 2 replies; 18+ messages in thread
From: syzbot @ 2020-02-26  6:57 UTC (permalink / raw)
  To: adilger.kernel, linux-ext4, linux-kernel, syzkaller-bugs, tytso

Hello,

syzbot found the following crash on:

HEAD commit:    d2eee258 Merge tag 'for-5.6-rc2-tag' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1706127ee00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3e57a6b450fb9883
dashboard link: https://syzkaller.appspot.com/bug?extid=1f9dc49e8de2582d90c2
compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 2697 at fs/ext4/inode.c:5097 ext4_write_inode+0x4eb/0x5b0 fs/ext4/inode.c:5097
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 2697 Comm: xfsaild/loop1 Not tainted 5.6.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1fb/0x318 lib/dump_stack.c:118
 panic+0x264/0x7a9 kernel/panic.c:221
 __warn+0x209/0x210 kernel/panic.c:582
 report_bug+0x1b6/0x2f0 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:174 [inline]
 do_error_trap+0xcf/0x1c0 arch/x86/kernel/traps.c:267
 do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:286
 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:ext4_write_inode+0x4eb/0x5b0 fs/ext4/inode.c:5097
Code: 65 48 8b 04 25 28 00 00 00 48 3b 45 d0 0f 85 d2 00 00 00 44 89 f8 48 83 c4 38 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 d5 37 72 ff <0f> 0b eb d2 e8 cc 37 72 ff 41 bf fb ff ff ff eb c5 89 d9 80 e1 07
RSP: 0018:ffffc90001b97700 EFLAGS: 00010293
RAX: ffffffff8204d1ab RBX: 0000000000000800 RCX: ffff888049612580
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000000000000000
RBP: ffffc90001b97760 R08: ffffffff8204cd2e R09: ffffed1015d47004
R10: ffffed1015d47004 R11: 0000000000000000 R12: ffff88809229e588
R13: ffff888049612580 R14: 0000000000200844 R15: 0000000000000000
 write_inode fs/fs-writeback.c:1312 [inline]
 __writeback_single_inode+0x550/0x620 fs/fs-writeback.c:1511
 writeback_single_inode+0x3b9/0x800 fs/fs-writeback.c:1565
 sync_inode fs/fs-writeback.c:2602 [inline]
 sync_inode_metadata+0x7a/0xc0 fs/fs-writeback.c:2622
 ext4_fsync_nojournal fs/ext4/fsync.c:94 [inline]
 ext4_sync_file+0x4c9/0x890 fs/ext4/fsync.c:172
 vfs_fsync_range+0xfd/0x1a0 fs/sync.c:197
 generic_write_sync include/linux/fs.h:2867 [inline]
 ext4_buffered_write_iter+0x520/0x5c0 fs/ext4/file.c:277
 ext4_file_write_iter+0xb06/0x1810 include/linux/fs.h:796
 call_write_iter include/linux/fs.h:1901 [inline]
 new_sync_write fs/read_write.c:483 [inline]
 __vfs_write+0x5a1/0x740 fs/read_write.c:496
 __kernel_write+0x11d/0x350 fs/read_write.c:515
 do_acct_process+0xf9c/0x1390 kernel/acct.c:522
 slow_acct_process kernel/acct.c:581 [inline]
 acct_process+0x478/0x590 kernel/acct.c:607
 do_exit+0x580/0x2000 kernel/exit.c:791
 kthread+0x350/0x350 kernel/kthread.c:257
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-02-26  6:57 WARNING in ext4_write_inode syzbot
@ 2020-03-08  4:35 ` Eric Biggers
  2020-03-08 23:03   ` Dave Chinner
  2020-03-08  4:36 ` [PATCH] cifs: clear PF_MEMALLOC before exiting demultiplex thread Eric Biggers
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2020-03-08  4:35 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-ext4, syzkaller-bugs, linux-kernel

From: Eric Biggers <ebiggers@google.com>

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  For example, if BSD process
accounting is enabled, then it's possible for do_exit() to end up
calling ext4_write_inode().  That triggers the
WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
(appropriately) that inodes aren't written when allocating memory.

This case was reported by syzbot at
https://lkml.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Fix this in xfsaild() by using the helper functions to save and restore
PF_MEMALLOC.

Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/xfs/xfs_trans_ail.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 00cc5b8734be..3bc570c90ad9 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -529,8 +529,9 @@ xfsaild(
 {
 	struct xfs_ail	*ailp = data;
 	long		tout = 0;	/* milliseconds */
+	unsigned int	noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	set_freezable();
 
 	while (1) {
@@ -601,6 +602,7 @@ xfsaild(
 		tout = xfsaild_push(ailp);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] cifs: clear PF_MEMALLOC before exiting demultiplex thread
  2020-02-26  6:57 WARNING in ext4_write_inode syzbot
  2020-03-08  4:35 ` [PATCH] xfs: clear PF_MEMALLOC before exiting xfsaild thread Eric Biggers
@ 2020-03-08  4:36 ` Eric Biggers
  2020-03-08  6:16   ` [PATCH v2] " Eric Biggers
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2020-03-08  4:36 UTC (permalink / raw)
  To: linux-cifs; +Cc: linux-ext4, syzkaller-bugs, linux-kernel

From: Eric Biggers <ebiggers@google.com>

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  For example, if BSD process
accounting is enabled, then it's possible for do_exit() to end up
calling ext4_write_inode().  That triggers the
WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
(appropriately) that inodes aren't written when allocating memory.

This case was reported by syzbot at
https://lkml.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Fix this in cifs_demultiplex_thread() by using the helper functions to
save and restore PF_MEMALLOC.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/cifs/connect.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 4804d1df8c1c..beab1dc2dc01 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1164,8 +1164,9 @@ cifs_demultiplex_thread(void *p)
 	struct task_struct *task_to_wake = NULL;
 	struct mid_q_entry *mids[MAX_COMPOUND];
 	char *bufs[MAX_COMPOUND];
+	unsigned int noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	cifs_dbg(FYI, "Demultiplex PID: %d\n", task_pid_nr(current));
 
 	length = atomic_inc_return(&tcpSesAllocCount);
@@ -1320,6 +1321,7 @@ cifs_demultiplex_thread(void *p)
 		set_current_state(TASK_RUNNING);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	module_put_and_exit(0);
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2] cifs: clear PF_MEMALLOC before exiting demultiplex thread
  2020-03-08  4:36 ` [PATCH] cifs: clear PF_MEMALLOC before exiting demultiplex thread Eric Biggers
@ 2020-03-08  6:16   ` Eric Biggers
  2020-03-08 18:43     ` Steve French
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2020-03-08  6:16 UTC (permalink / raw)
  To: linux-cifs; +Cc: linux-ext4, syzkaller-bugs, linux-kernel

From: Eric Biggers <ebiggers@google.com>

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  For example, if BSD process
accounting is enabled, then it's possible for do_exit() to end up
calling ext4_write_inode().  That triggers the
WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
(appropriately) that inodes aren't written when allocating memory.

This case was reported by syzbot at
https://lkml.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Fix this in cifs_demultiplex_thread() by using the helper functions to
save and restore PF_MEMALLOC.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---

v2: added missing include of <linux/sched/mm.h>
    (I missed that I didn't actually have CONFIG_CIFS set...)

 fs/cifs/connect.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 4804d1df8c1c..97b8eb585cf9 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -21,6 +21,7 @@
 #include <linux/fs.h>
 #include <linux/net.h>
 #include <linux/string.h>
+#include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/list.h>
 #include <linux/wait.h>
@@ -1164,8 +1165,9 @@ cifs_demultiplex_thread(void *p)
 	struct task_struct *task_to_wake = NULL;
 	struct mid_q_entry *mids[MAX_COMPOUND];
 	char *bufs[MAX_COMPOUND];
+	unsigned int noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	cifs_dbg(FYI, "Demultiplex PID: %d\n", task_pid_nr(current));
 
 	length = atomic_inc_return(&tcpSesAllocCount);
@@ -1320,6 +1322,7 @@ cifs_demultiplex_thread(void *p)
 		set_current_state(TASK_RUNNING);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	module_put_and_exit(0);
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] cifs: clear PF_MEMALLOC before exiting demultiplex thread
  2020-03-08  6:16   ` [PATCH v2] " Eric Biggers
@ 2020-03-08 18:43     ` Steve French
  2020-03-09  5:56       ` Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Steve French @ 2020-03-08 18:43 UTC (permalink / raw)
  To: Eric Biggers; +Cc: CIFS, linux-ext4, syzkaller-bugs, LKML

merged into cifs-2.6.git for-next

running buildbot cifs/smb3 automated regression tests now

On Sun, Mar 8, 2020 at 12:17 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> From: Eric Biggers <ebiggers@google.com>
>
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit().  That can confuse things.  For example, if BSD process
> accounting is enabled, then it's possible for do_exit() to end up
> calling ext4_write_inode().  That triggers the
> WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> (appropriately) that inodes aren't written when allocating memory.
>
> This case was reported by syzbot at
> https://lkml.kernel.org/r/0000000000000e7156059f751d7b@google.com.
>
> Fix this in cifs_demultiplex_thread() by using the helper functions to
> save and restore PF_MEMALLOC.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>
> v2: added missing include of <linux/sched/mm.h>
>     (I missed that I didn't actually have CONFIG_CIFS set...)
>
>  fs/cifs/connect.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 4804d1df8c1c..97b8eb585cf9 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -21,6 +21,7 @@
>  #include <linux/fs.h>
>  #include <linux/net.h>
>  #include <linux/string.h>
> +#include <linux/sched/mm.h>
>  #include <linux/sched/signal.h>
>  #include <linux/list.h>
>  #include <linux/wait.h>
> @@ -1164,8 +1165,9 @@ cifs_demultiplex_thread(void *p)
>         struct task_struct *task_to_wake = NULL;
>         struct mid_q_entry *mids[MAX_COMPOUND];
>         char *bufs[MAX_COMPOUND];
> +       unsigned int noreclaim_flag;
>
> -       current->flags |= PF_MEMALLOC;
> +       noreclaim_flag = memalloc_noreclaim_save();
>         cifs_dbg(FYI, "Demultiplex PID: %d\n", task_pid_nr(current));
>
>         length = atomic_inc_return(&tcpSesAllocCount);
> @@ -1320,6 +1322,7 @@ cifs_demultiplex_thread(void *p)
>                 set_current_state(TASK_RUNNING);
>         }
>
> +       memalloc_noreclaim_restore(noreclaim_flag);
>         module_put_and_exit(0);
>  }
>
> --
> 2.25.1
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-08  4:35 ` [PATCH] xfs: clear PF_MEMALLOC before exiting xfsaild thread Eric Biggers
@ 2020-03-08 23:03   ` Dave Chinner
  2020-03-09  1:04     ` Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2020-03-08 23:03 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Sat, Mar 07, 2020 at 08:35:40PM -0800, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit().  That can confuse things.  For example, if BSD process
> accounting is enabled, then it's possible for do_exit() to end up
> calling ext4_write_inode().  That triggers the
> WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> (appropriately) that inodes aren't written when allocating memory.

And just how the hell does and XFS kernel thread end up calling
ext4_write_inode()? That's kinda a key factor in all this, and
it's not explained here.

> This case was reported by syzbot at
> https://lkml.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Which doesn't really explain it, either.

What is the configuration conditions under which this triggers? It
looks like some weird combination of a no-journal ext4 root
filesystem and the audit subsystem being configured with O_SYNC
files?

People trying to decide if this is something that needs to be
backported to stable kernels need to be able to unerstand how this
bug is actually triggered so they can make sane decisions about
it...

/me tracks the PF_MEMALLOC flag back to commit 43ff2122e649 ("xfs:
on-stack delayed write buffer lists") where is was inherited here
from the buffer flush daemon that xfsaild took over from. Which also
never cleared the PF_MEMALLOC flag. That goes back to 2002:

commit d676c94914eb97d72061aff69c99406df4f395e9
Author: Steve Lord <lord@sgi.com>
Date:   Fri Jan 11 23:31:51 2002 +0000

    Merge pagebuf module into XFS

So this issue of calling do_exit() with PF_MEMALLOC set has been
around for 18+ years without anyone noticing it.

I also note that cifs_demultiplex_thread() has the same problem -
can you please do a complete audit of all the users of PF_MEMALLOC
and fix all of them?

> Fix this in xfsaild() by using the helper functions to save and restore
> PF_MEMALLOC.
> 
> Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  fs/xfs/xfs_trans_ail.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 00cc5b8734be..3bc570c90ad9 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -529,8 +529,9 @@ xfsaild(
>  {
>  	struct xfs_ail	*ailp = data;
>  	long		tout = 0;	/* milliseconds */
> +	unsigned int	noreclaim_flag;
>  
> -	current->flags |= PF_MEMALLOC;
> +	noreclaim_flag = memalloc_noreclaim_save();
>  	set_freezable();
>  
>  	while (1) {
> @@ -601,6 +602,7 @@ xfsaild(
>  		tout = xfsaild_push(ailp);
>  	}
>  
> +	memalloc_noreclaim_restore(noreclaim_flag);
>  	return 0;
>  }

The code looks fine - I considered doing this a couple of weeks ago
just for cleaniness reasons - but the commit message needs work to
explain the context of the bug...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-08 23:03   ` Dave Chinner
@ 2020-03-09  1:04     ` Eric Biggers
  2020-03-09  4:34       ` [PATCH v2] " Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2020-03-09  1:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Mon, Mar 09, 2020 at 10:03:07AM +1100, Dave Chinner wrote:
> On Sat, Mar 07, 2020 at 08:35:40PM -0800, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> > during do_exit().  That can confuse things.  For example, if BSD process
> > accounting is enabled, then it's possible for do_exit() to end up
> > calling ext4_write_inode().  That triggers the
> > WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> > (appropriately) that inodes aren't written when allocating memory.
> 
> And just how the hell does and XFS kernel thread end up calling
> ext4_write_inode()? That's kinda a key factor in all this, and
> it's not explained here.
> 
> > This case was reported by syzbot at
> > https://lkml.kernel.org/r/0000000000000e7156059f751d7b@google.com.
> 
> Which doesn't really explain it, either.
> 
> What is the configuration conditions under which this triggers? It
> looks like some weird combination of a no-journal ext4 root
> filesystem and the audit subsystem being configured with O_SYNC
> files?
> 
> People trying to decide if this is something that needs to be
> backported to stable kernels need to be able to unerstand how this
> bug is actually triggered so they can make sane decisions about
> it...

My guess is that syzbot enabled BSD process accounting to a file with FS_SYNC_FL
on an ext4 nojournal fs.  It didn't provide a reproducer itself.  Sure, I'll try
to write a reproducer and include it in the commit message.  I felt it wasn't
quite as important for this one compared to most of the other syzbot bugs, since
BSD process accounting can only be enabled by root, and once we're talking about
do_exit() being able to write an arbitrary file, it's not hard to see why
*something* could get tripped up by PF_MEMALLOC.  There are probably other ways
it could cause problems besides this specific one.  But sure, I'll try.

> 
> I also note that cifs_demultiplex_thread() has the same problem -
> can you please do a complete audit of all the users of PF_MEMALLOC
> and fix all of them?

I already did, that's why at the same time I sent out this patch, I also sent
out one to fix cifs_demultiplex_thread()
(https://lkml.kernel.org/linux-cifs/20200308043645.1034870-1-ebiggers@kernel.org/T/#t).
These were the only two; I didn't find any others that needed to be fixed.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09  1:04     ` Eric Biggers
@ 2020-03-09  4:34       ` Eric Biggers
  2020-03-09 10:57         ` Brian Foster
  2020-03-09 16:24         ` Darrick J. Wong
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Biggers @ 2020-03-09  4:34 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-ext4, syzkaller-bugs, linux-kernel

From: Eric Biggers <ebiggers@google.com>

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  For example, if BSD process
accounting is enabled and the accounting file has FS_SYNC_FL set and is
located on an ext4 filesystem without a journal, then do_exit() ends up
calling ext4_write_inode().  That triggers the
WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
(appropriately) that inodes aren't written when allocating memory.

Fix this in xfsaild() by using the helper functions to save and restore
PF_MEMALLOC.

This can be reproduced as follows in the kvm-xfstests test appliance
modified to add the 'acct' Debian package, and with kvm-xfstests's
recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:

	mkfs.ext2 -F /dev/vdb
	mount /vdb -t ext4
	touch /vdb/file
	chattr +S /vdb/file
	accton /vdb/file
	mkfs.xfs -f /dev/vdc
	mount /vdc
	umount /vdc

It causes:
	WARNING: CPU: 0 PID: 332 at fs/ext4/inode.c:5097 ext4_write_inode+0x140/0x1a0
	CPU: 0 PID: 332 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #5
	[...]
	RIP: 0010:ext4_write_inode+0x140/0x1a0 fs/ext4/inode.c:5097
	[...]
	Call Trace:
	 write_inode fs/fs-writeback.c:1312 [inline]
	 __writeback_single_inode+0x465/0x5f0 fs/fs-writeback.c:1511
	 writeback_single_inode+0xad/0x120 fs/fs-writeback.c:1565
	 sync_inode fs/fs-writeback.c:2602 [inline]
	 sync_inode_metadata+0x3d/0x57 fs/fs-writeback.c:2622
	 ext4_fsync_nojournal fs/ext4/fsync.c:94 [inline]
	 ext4_sync_file+0x243/0x4b0 fs/ext4/fsync.c:172
	 generic_write_sync include/linux/fs.h:2867 [inline]
	 ext4_buffered_write_iter+0xe1/0x130 fs/ext4/file.c:277
	 call_write_iter include/linux/fs.h:1901 [inline]
	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
	 __kernel_write+0x54/0xe0 fs/read_write.c:515
	 do_acct_process+0x122/0x170 kernel/acct.c:522
	 slow_acct_process kernel/acct.c:581 [inline]
	 acct_process+0x1d4/0x27c kernel/acct.c:607
	 do_exit+0x83d/0xbc0 kernel/exit.c:791
	 kthread+0xf1/0x140 kernel/kthread.c:257
	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352

This case was originally reported by syzbot at
https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
---

v2: include more details in the commit message.

 fs/xfs/xfs_trans_ail.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 00cc5b8734be8..3bc570c90ad97 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -529,8 +529,9 @@ xfsaild(
 {
 	struct xfs_ail	*ailp = data;
 	long		tout = 0;	/* milliseconds */
+	unsigned int	noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	set_freezable();
 
 	while (1) {
@@ -601,6 +602,7 @@ xfsaild(
 		tout = xfsaild_push(ailp);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] cifs: clear PF_MEMALLOC before exiting demultiplex thread
  2020-03-08 18:43     ` Steve French
@ 2020-03-09  5:56       ` Eric Biggers
  2020-03-09  5:58         ` [PATCH v3] " Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2020-03-09  5:56 UTC (permalink / raw)
  To: Steve French; +Cc: CIFS, linux-ext4, syzkaller-bugs, LKML

On Sun, Mar 08, 2020 at 01:43:56PM -0500, Steve French wrote:
> merged into cifs-2.6.git for-next
> 
> running buildbot cifs/smb3 automated regression tests now
> 

Thanks.  FYI, I also sent a similar patch for an XFS kernel thread, and the XFS
folks requested more details in the commit message [1].  So I ended up trying to
reproduce this warning on both XFS and CIFS, and I added more details to both
commit messages.  So if it's not too late, please update the CIFS commit to the
v3 I'm sending out.  Thanks!

[1] https://lkml.kernel.org/linux-xfs/20200308230307.GM10776@dread.disaster.area

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3] cifs: clear PF_MEMALLOC before exiting demultiplex thread
  2020-03-09  5:56       ` Eric Biggers
@ 2020-03-09  5:58         ` Eric Biggers
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2020-03-09  5:58 UTC (permalink / raw)
  To: linux-cifs; +Cc: linux-ext4, syzkaller-bugs, linux-kernel

From: Eric Biggers <ebiggers@google.com>

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  For example, if BSD process
accounting is enabled and the accounting file has FS_SYNC_FL set and is
located on an ext4 filesystem without a journal, then do_exit() can end
up calling ext4_write_inode().  That triggers the
WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
(appropriately) that inodes aren't written when allocating memory.

This was originally reported for another kernel thread, xfsaild() [1].
cifs_demultiplex_thread() also exits with PF_MEMALLOC set, so it's
potentially subject to this same class of issue -- though I haven't been
able to reproduce the WARN_ON_ONCE() via CIFS, since unlike xfsaild(),
cifs_demultiplex_thread() is sent SIGKILL before exiting, and that
interrupts the write to the BSD process accounting file.

Either way, leaving PF_MEMALLOC set is potentially problematic.  Let's
clean this up by properly saving and restoring PF_MEMALLOC.

[1] https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com

Signed-off-by: Eric Biggers <ebiggers@google.com>
---

v3: improved commit message
v2: added missing include

 fs/cifs/connect.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 4804d1df8c1cf..97b8eb585cf9d 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -21,6 +21,7 @@
 #include <linux/fs.h>
 #include <linux/net.h>
 #include <linux/string.h>
+#include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/list.h>
 #include <linux/wait.h>
@@ -1164,8 +1165,9 @@ cifs_demultiplex_thread(void *p)
 	struct task_struct *task_to_wake = NULL;
 	struct mid_q_entry *mids[MAX_COMPOUND];
 	char *bufs[MAX_COMPOUND];
+	unsigned int noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	cifs_dbg(FYI, "Demultiplex PID: %d\n", task_pid_nr(current));
 
 	length = atomic_inc_return(&tcpSesAllocCount);
@@ -1320,6 +1322,7 @@ cifs_demultiplex_thread(void *p)
 		set_current_state(TASK_RUNNING);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	module_put_and_exit(0);
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09  4:34       ` [PATCH v2] " Eric Biggers
@ 2020-03-09 10:57         ` Brian Foster
  2020-03-09 16:24         ` Darrick J. Wong
  1 sibling, 0 replies; 18+ messages in thread
From: Brian Foster @ 2020-03-09 10:57 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit().  That can confuse things.  For example, if BSD process
> accounting is enabled and the accounting file has FS_SYNC_FL set and is
> located on an ext4 filesystem without a journal, then do_exit() ends up
> calling ext4_write_inode().  That triggers the
> WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> (appropriately) that inodes aren't written when allocating memory.
> 
> Fix this in xfsaild() by using the helper functions to save and restore
> PF_MEMALLOC.
> 
> This can be reproduced as follows in the kvm-xfstests test appliance
> modified to add the 'acct' Debian package, and with kvm-xfstests's
> recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> 
> 	mkfs.ext2 -F /dev/vdb
> 	mount /vdb -t ext4
> 	touch /vdb/file
> 	chattr +S /vdb/file
> 	accton /vdb/file
> 	mkfs.xfs -f /dev/vdc
> 	mount /vdc
> 	umount /vdc
> 
> It causes:
> 	WARNING: CPU: 0 PID: 332 at fs/ext4/inode.c:5097 ext4_write_inode+0x140/0x1a0
> 	CPU: 0 PID: 332 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #5
> 	[...]
> 	RIP: 0010:ext4_write_inode+0x140/0x1a0 fs/ext4/inode.c:5097
> 	[...]
> 	Call Trace:
> 	 write_inode fs/fs-writeback.c:1312 [inline]
> 	 __writeback_single_inode+0x465/0x5f0 fs/fs-writeback.c:1511
> 	 writeback_single_inode+0xad/0x120 fs/fs-writeback.c:1565
> 	 sync_inode fs/fs-writeback.c:2602 [inline]
> 	 sync_inode_metadata+0x3d/0x57 fs/fs-writeback.c:2622
> 	 ext4_fsync_nojournal fs/ext4/fsync.c:94 [inline]
> 	 ext4_sync_file+0x243/0x4b0 fs/ext4/fsync.c:172
> 	 generic_write_sync include/linux/fs.h:2867 [inline]
> 	 ext4_buffered_write_iter+0xe1/0x130 fs/ext4/file.c:277
> 	 call_write_iter include/linux/fs.h:1901 [inline]
> 	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
> 	 __kernel_write+0x54/0xe0 fs/read_write.c:515
> 	 do_acct_process+0x122/0x170 kernel/acct.c:522
> 	 slow_acct_process kernel/acct.c:581 [inline]
> 	 acct_process+0x1d4/0x27c kernel/acct.c:607
> 	 do_exit+0x83d/0xbc0 kernel/exit.c:791
> 	 kthread+0xf1/0x140 kernel/kthread.c:257
> 	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
> 
> This case was originally reported by syzbot at
> https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.
> 
> Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

> 
> v2: include more details in the commit message.
> 
>  fs/xfs/xfs_trans_ail.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 00cc5b8734be8..3bc570c90ad97 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -529,8 +529,9 @@ xfsaild(
>  {
>  	struct xfs_ail	*ailp = data;
>  	long		tout = 0;	/* milliseconds */
> +	unsigned int	noreclaim_flag;
>  
> -	current->flags |= PF_MEMALLOC;
> +	noreclaim_flag = memalloc_noreclaim_save();
>  	set_freezable();
>  
>  	while (1) {
> @@ -601,6 +602,7 @@ xfsaild(
>  		tout = xfsaild_push(ailp);
>  	}
>  
> +	memalloc_noreclaim_restore(noreclaim_flag);
>  	return 0;
>  }
>  
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09  4:34       ` [PATCH v2] " Eric Biggers
  2020-03-09 10:57         ` Brian Foster
@ 2020-03-09 16:24         ` Darrick J. Wong
  2020-03-09 18:04           ` Eric Biggers
  2020-03-12 22:20           ` [PATCH v2] " Eric Biggers
  1 sibling, 2 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-03-09 16:24 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit().  That can confuse things.  For example, if BSD process
> accounting is enabled and the accounting file has FS_SYNC_FL set and is
> located on an ext4 filesystem without a journal, then do_exit() ends up
> calling ext4_write_inode().  That triggers the
> WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> (appropriately) that inodes aren't written when allocating memory.
> 
> Fix this in xfsaild() by using the helper functions to save and restore
> PF_MEMALLOC.
> 
> This can be reproduced as follows in the kvm-xfstests test appliance
> modified to add the 'acct' Debian package, and with kvm-xfstests's
> recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> 
> 	mkfs.ext2 -F /dev/vdb
> 	mount /vdb -t ext4
> 	touch /vdb/file
> 	chattr +S /vdb/file

Does this trip if the process accounting file is also on an xfs
filesystem?

> 	accton /vdb/file
> 	mkfs.xfs -f /dev/vdc
> 	mount /vdc
> 	umount /vdc

...and if so, can this be turned into an fstests case, please?


> 
> It causes:
> 	WARNING: CPU: 0 PID: 332 at fs/ext4/inode.c:5097 ext4_write_inode+0x140/0x1a0
> 	CPU: 0 PID: 332 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #5
> 	[...]
> 	RIP: 0010:ext4_write_inode+0x140/0x1a0 fs/ext4/inode.c:5097
> 	[...]
> 	Call Trace:
> 	 write_inode fs/fs-writeback.c:1312 [inline]
> 	 __writeback_single_inode+0x465/0x5f0 fs/fs-writeback.c:1511
> 	 writeback_single_inode+0xad/0x120 fs/fs-writeback.c:1565
> 	 sync_inode fs/fs-writeback.c:2602 [inline]
> 	 sync_inode_metadata+0x3d/0x57 fs/fs-writeback.c:2622
> 	 ext4_fsync_nojournal fs/ext4/fsync.c:94 [inline]
> 	 ext4_sync_file+0x243/0x4b0 fs/ext4/fsync.c:172
> 	 generic_write_sync include/linux/fs.h:2867 [inline]
> 	 ext4_buffered_write_iter+0xe1/0x130 fs/ext4/file.c:277
> 	 call_write_iter include/linux/fs.h:1901 [inline]
> 	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
> 	 __kernel_write+0x54/0xe0 fs/read_write.c:515
> 	 do_acct_process+0x122/0x170 kernel/acct.c:522
> 	 slow_acct_process kernel/acct.c:581 [inline]
> 	 acct_process+0x1d4/0x27c kernel/acct.c:607
> 	 do_exit+0x83d/0xbc0 kernel/exit.c:791
> 	 kthread+0xf1/0x140 kernel/kthread.c:257
> 	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
> 
> This case was originally reported by syzbot at
> https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.
> 
> Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
> 
> v2: include more details in the commit message.
> 
>  fs/xfs/xfs_trans_ail.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 00cc5b8734be8..3bc570c90ad97 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -529,8 +529,9 @@ xfsaild(
>  {
>  	struct xfs_ail	*ailp = data;
>  	long		tout = 0;	/* milliseconds */
> +	unsigned int	noreclaim_flag;
>  
> -	current->flags |= PF_MEMALLOC;
> +	noreclaim_flag = memalloc_noreclaim_save();
>  	set_freezable();
>  
>  	while (1) {
> @@ -601,6 +602,7 @@ xfsaild(
>  		tout = xfsaild_push(ailp);
>  	}
>  
> +	memalloc_noreclaim_restore(noreclaim_flag);
>  	return 0;
>  }
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09 16:24         ` Darrick J. Wong
@ 2020-03-09 18:04           ` Eric Biggers
  2020-03-09 18:13             ` Darrick J. Wong
  2020-03-12 22:20           ` [PATCH v2] " Eric Biggers
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2020-03-09 18:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Mon, Mar 09, 2020 at 09:24:39AM -0700, Darrick J. Wong wrote:
> On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> > during do_exit().  That can confuse things.  For example, if BSD process
> > accounting is enabled and the accounting file has FS_SYNC_FL set and is
> > located on an ext4 filesystem without a journal, then do_exit() ends up
> > calling ext4_write_inode().  That triggers the
> > WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> > (appropriately) that inodes aren't written when allocating memory.
> > 
> > Fix this in xfsaild() by using the helper functions to save and restore
> > PF_MEMALLOC.
> > 
> > This can be reproduced as follows in the kvm-xfstests test appliance
> > modified to add the 'acct' Debian package, and with kvm-xfstests's
> > recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> > 
> > 	mkfs.ext2 -F /dev/vdb
> > 	mount /vdb -t ext4
> > 	touch /vdb/file
> > 	chattr +S /vdb/file
> 
> Does this trip if the process accounting file is also on an xfs
> filesystem?
> 
> > 	accton /vdb/file
> > 	mkfs.xfs -f /dev/vdc
> > 	mount /vdc
> > 	umount /vdc
> 
> ...and if so, can this be turned into an fstests case, please?

I wasn't expecting it, but it turns out it does actually trip a similar warning
in iomap_do_writepage():

        mkfs.xfs -f /dev/vdb
        mount /vdb
        touch /vdb/file
        chattr +S /vdb/file
        accton /vdb/file
        mkfs.xfs -f /dev/vdc
        mount /vdc
        umount /vdc

causes...

	WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
	CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
	RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
	[...]
	Call Trace:
	 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
	 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
	 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
	 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
	 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
	 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
	 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
	 generic_write_sync include/linux/fs.h:2867 [inline]
	 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
	 call_write_iter include/linux/fs.h:1901 [inline]
	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
	 __kernel_write+0x54/0xe0 fs/read_write.c:515
	 do_acct_process+0x122/0x170 kernel/acct.c:522
	 slow_acct_process kernel/acct.c:581 [inline]
	 acct_process+0x1d4/0x27c kernel/acct.c:607
	 do_exit+0x83d/0xbc0 kernel/exit.c:791
	 kthread+0xf1/0x140 kernel/kthread.c:257
	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352

So sure, since it's not necessarily a multi-filesystem thing, I can try to turn
it into an xfstest.  There's currently no way to enable BSD process accounting
in xfstests though, so we'll either need to make the test depend on the 'acct'
program or add a helper test program.

Also, do you want me to update the commit message again, to mention the above
case?

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09 18:04           ` Eric Biggers
@ 2020-03-09 18:13             ` Darrick J. Wong
  2020-03-09 18:57               ` [PATCH v3] " Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2020-03-09 18:13 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Mon, Mar 09, 2020 at 11:04:52AM -0700, Eric Biggers wrote:
> On Mon, Mar 09, 2020 at 09:24:39AM -0700, Darrick J. Wong wrote:
> > On Sun, Mar 08, 2020 at 09:34:30PM -0700, Eric Biggers wrote:
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> > > during do_exit().  That can confuse things.  For example, if BSD process
> > > accounting is enabled and the accounting file has FS_SYNC_FL set and is
> > > located on an ext4 filesystem without a journal, then do_exit() ends up
> > > calling ext4_write_inode().  That triggers the
> > > WARN_ON_ONCE(current->flags & PF_MEMALLOC) there, as it assumes
> > > (appropriately) that inodes aren't written when allocating memory.
> > > 
> > > Fix this in xfsaild() by using the helper functions to save and restore
> > > PF_MEMALLOC.
> > > 
> > > This can be reproduced as follows in the kvm-xfstests test appliance
> > > modified to add the 'acct' Debian package, and with kvm-xfstests's
> > > recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> > > 
> > > 	mkfs.ext2 -F /dev/vdb
> > > 	mount /vdb -t ext4
> > > 	touch /vdb/file
> > > 	chattr +S /vdb/file
> > 
> > Does this trip if the process accounting file is also on an xfs
> > filesystem?
> > 
> > > 	accton /vdb/file
> > > 	mkfs.xfs -f /dev/vdc
> > > 	mount /vdc
> > > 	umount /vdc
> > 
> > ...and if so, can this be turned into an fstests case, please?
> 
> I wasn't expecting it, but it turns out it does actually trip a similar warning
> in iomap_do_writepage():
> 
>         mkfs.xfs -f /dev/vdb
>         mount /vdb
>         touch /vdb/file
>         chattr +S /vdb/file
>         accton /vdb/file
>         mkfs.xfs -f /dev/vdc
>         mount /vdc
>         umount /vdc
> 
> causes...
> 
> 	WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
> 	CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
> 	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
> 	RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
> 	[...]
> 	Call Trace:
> 	 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
> 	 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
> 	 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
> 	 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
> 	 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
> 	 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
> 	 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
> 	 generic_write_sync include/linux/fs.h:2867 [inline]
> 	 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
> 	 call_write_iter include/linux/fs.h:1901 [inline]
> 	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
> 	 __kernel_write+0x54/0xe0 fs/read_write.c:515
> 	 do_acct_process+0x122/0x170 kernel/acct.c:522
> 	 slow_acct_process kernel/acct.c:581 [inline]
> 	 acct_process+0x1d4/0x27c kernel/acct.c:607
> 	 do_exit+0x83d/0xbc0 kernel/exit.c:791
> 	 kthread+0xf1/0x140 kernel/kthread.c:257
> 	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
> 
> So sure, since it's not necessarily a multi-filesystem thing, I can try to turn
> it into an xfstest.  There's currently no way to enable BSD process accounting
> in xfstests though, so we'll either need to make the test depend on the 'acct'
> program or add a helper test program.
> 
> Also, do you want me to update the commit message again, to mention the above
> case?

I think it's worth mentioning that this is a general problem that
applies any time the process accounting file has that sync flag set,
since this problem isn't specific to ext4 + xfs.

(and now I wonder how many other places in the kernel suffer from these
kinds of file write surprises...)

--D

> - Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09 18:13             ` Darrick J. Wong
@ 2020-03-09 18:57               ` Eric Biggers
  2020-03-10 15:47                 ` Darrick J. Wong
  2020-03-11  6:34                 ` Christoph Hellwig
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Biggers @ 2020-03-09 18:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-ext4, syzkaller-bugs, linux-kernel

From: Eric Biggers <ebiggers@google.com>

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  In particular, if BSD
process accounting is enabled, then do_exit() writes data to an
accounting file.  If that file has FS_SYNC_FL set, then this write
occurs synchronously and can misbehave if PF_MEMALLOC is set.

For example, if the accounting file is located on an XFS filesystem,
then a WARN_ON_ONCE() in iomap_do_writepage() is triggered and the data
doesn't get written when it should.  Or if the accounting file is
located on an ext4 filesystem without a journal, then a WARN_ON_ONCE()
in ext4_write_inode() is triggered and the inode doesn't get written.

Fix this in xfsaild() by using the helper functions to save and restore
PF_MEMALLOC.

This can be reproduced as follows in the kvm-xfstests test appliance
modified to add the 'acct' Debian package, and with kvm-xfstests's
recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:

        mkfs.xfs -f /dev/vdb
        mount /vdb
        touch /vdb/file
        chattr +S /vdb/file
        accton /vdb/file
        mkfs.xfs -f /dev/vdc
        mount /vdc
        umount /vdc

It causes:
	WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
	CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
	RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
	[...]
	Call Trace:
	 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
	 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
	 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
	 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
	 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
	 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
	 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
	 generic_write_sync include/linux/fs.h:2867 [inline]
	 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
	 call_write_iter include/linux/fs.h:1901 [inline]
	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
	 __kernel_write+0x54/0xe0 fs/read_write.c:515
	 do_acct_process+0x122/0x170 kernel/acct.c:522
	 slow_acct_process kernel/acct.c:581 [inline]
	 acct_process+0x1d4/0x27c kernel/acct.c:607
	 do_exit+0x83d/0xbc0 kernel/exit.c:791
	 kthread+0xf1/0x140 kernel/kthread.c:257
	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352

This bug was originally reported by syzbot at
https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
---

v3: updated commit message again, this time to take into account the bug
    also being reproducible when the accounting file is located on XFS.

v2: include more details in the commit message.

 fs/xfs/xfs_trans_ail.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 00cc5b8734be8..3bc570c90ad97 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -529,8 +529,9 @@ xfsaild(
 {
 	struct xfs_ail	*ailp = data;
 	long		tout = 0;	/* milliseconds */
+	unsigned int	noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	set_freezable();
 
 	while (1) {
@@ -601,6 +602,7 @@ xfsaild(
 		tout = xfsaild_push(ailp);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09 18:57               ` [PATCH v3] " Eric Biggers
@ 2020-03-10 15:47                 ` Darrick J. Wong
  2020-03-11  6:34                 ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-03-10 15:47 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Mon, Mar 09, 2020 at 11:57:14AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit().  That can confuse things.  In particular, if BSD
> process accounting is enabled, then do_exit() writes data to an
> accounting file.  If that file has FS_SYNC_FL set, then this write
> occurs synchronously and can misbehave if PF_MEMALLOC is set.
> 
> For example, if the accounting file is located on an XFS filesystem,
> then a WARN_ON_ONCE() in iomap_do_writepage() is triggered and the data
> doesn't get written when it should.  Or if the accounting file is
> located on an ext4 filesystem without a journal, then a WARN_ON_ONCE()
> in ext4_write_inode() is triggered and the inode doesn't get written.
> 
> Fix this in xfsaild() by using the helper functions to save and restore
> PF_MEMALLOC.
> 
> This can be reproduced as follows in the kvm-xfstests test appliance
> modified to add the 'acct' Debian package, and with kvm-xfstests's
> recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> 
>         mkfs.xfs -f /dev/vdb
>         mount /vdb
>         touch /vdb/file
>         chattr +S /vdb/file
>         accton /vdb/file
>         mkfs.xfs -f /dev/vdc
>         mount /vdc
>         umount /vdc
> 
> It causes:
> 	WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
> 	CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
> 	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
> 	RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
> 	[...]
> 	Call Trace:
> 	 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
> 	 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
> 	 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
> 	 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
> 	 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
> 	 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
> 	 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
> 	 generic_write_sync include/linux/fs.h:2867 [inline]
> 	 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
> 	 call_write_iter include/linux/fs.h:1901 [inline]
> 	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
> 	 __kernel_write+0x54/0xe0 fs/read_write.c:515
> 	 do_acct_process+0x122/0x170 kernel/acct.c:522
> 	 slow_acct_process kernel/acct.c:581 [inline]
> 	 acct_process+0x1d4/0x27c kernel/acct.c:607
> 	 do_exit+0x83d/0xbc0 kernel/exit.c:791
> 	 kthread+0xf1/0x140 kernel/kthread.c:257
> 	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
> 
> This bug was originally reported by syzbot at
> https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.
> 
> Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Seems reasonable to me, will give it a spin...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
> 
> v3: updated commit message again, this time to take into account the bug
>     also being reproducible when the accounting file is located on XFS.
> 
> v2: include more details in the commit message.
> 
>  fs/xfs/xfs_trans_ail.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 00cc5b8734be8..3bc570c90ad97 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -529,8 +529,9 @@ xfsaild(
>  {
>  	struct xfs_ail	*ailp = data;
>  	long		tout = 0;	/* milliseconds */
> +	unsigned int	noreclaim_flag;
>  
> -	current->flags |= PF_MEMALLOC;
> +	noreclaim_flag = memalloc_noreclaim_save();
>  	set_freezable();
>  
>  	while (1) {
> @@ -601,6 +602,7 @@ xfsaild(
>  		tout = xfsaild_push(ailp);
>  	}
>  
> +	memalloc_noreclaim_restore(noreclaim_flag);
>  	return 0;
>  }
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09 18:57               ` [PATCH v3] " Eric Biggers
  2020-03-10 15:47                 ` Darrick J. Wong
@ 2020-03-11  6:34                 ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-03-11  6:34 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] xfs: clear PF_MEMALLOC before exiting xfsaild thread
  2020-03-09 16:24         ` Darrick J. Wong
  2020-03-09 18:04           ` Eric Biggers
@ 2020-03-12 22:20           ` Eric Biggers
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2020-03-12 22:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, linux-ext4, syzkaller-bugs, linux-kernel

On Mon, Mar 09, 2020 at 09:24:39AM -0700, Darrick J. Wong wrote:
> > 
> > 	mkfs.ext2 -F /dev/vdb
> > 	mount /vdb -t ext4
> > 	touch /vdb/file
> > 	chattr +S /vdb/file
> 
> Does this trip if the process accounting file is also on an xfs
> filesystem?
> 
> > 	accton /vdb/file
> > 	mkfs.xfs -f /dev/vdc
> > 	mount /vdc
> > 	umount /vdc
> 
> ...and if so, can this be turned into an fstests case, please?
> 

Test sent out at
https://lkml.kernel.org/fstests/20200312221437.141484-1-ebiggers@kernel.org/

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-03-12 22:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-26  6:57 WARNING in ext4_write_inode syzbot
2020-03-08  4:35 ` [PATCH] xfs: clear PF_MEMALLOC before exiting xfsaild thread Eric Biggers
2020-03-08 23:03   ` Dave Chinner
2020-03-09  1:04     ` Eric Biggers
2020-03-09  4:34       ` [PATCH v2] " Eric Biggers
2020-03-09 10:57         ` Brian Foster
2020-03-09 16:24         ` Darrick J. Wong
2020-03-09 18:04           ` Eric Biggers
2020-03-09 18:13             ` Darrick J. Wong
2020-03-09 18:57               ` [PATCH v3] " Eric Biggers
2020-03-10 15:47                 ` Darrick J. Wong
2020-03-11  6:34                 ` Christoph Hellwig
2020-03-12 22:20           ` [PATCH v2] " Eric Biggers
2020-03-08  4:36 ` [PATCH] cifs: clear PF_MEMALLOC before exiting demultiplex thread Eric Biggers
2020-03-08  6:16   ` [PATCH v2] " Eric Biggers
2020-03-08 18:43     ` Steve French
2020-03-09  5:56       ` Eric Biggers
2020-03-09  5:58         ` [PATCH v3] " Eric Biggers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).