Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	syzbot <syzbot+96cc7aba7e969b1d305c@syzkaller.appspotmail.com>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [RFC PATCH] pipe: make pipe_release() deferrable.
Date: Sat, 22 Aug 2020 13:34:49 +0900	[thread overview]
Message-ID: <7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp> (raw)
In-Reply-To: <dc9b2681-3b84-eb74-8c88-3815beaff7f8@i-love.sakura.ne.jp>

syzbot is reporting hung task at pipe_write() [1], for __pipe_lock() from
pipe_write() by task-A can be blocked forever waiting for
handle_userfault() from copy_page_from_iter() from pipe_write() by task-B
to complete and call __pipe_unlock().

Since the problem is that we can't enforce task-B to immediately complete
handle_userfault() (this is effectively returning to userspace with locks
held), we won't be able to avoid this hung task report unless we convert
all pipe locks to killable (because khungtaskd does not warn stalling
killable waits).

Linus Torvalds commented that we could introduce timeout for
handle_userfault(), and Andrea Arcangeli responded that too short timeout
can cause problems (e.g. breaking qemu's live migration) [2], and we can't
guarantee that khungtaskd's timeout is longer than timeout for
multiple handle_userfault() events.

Since Andrea commented that we will be able to avoid this hung task report
if we convert pipe locks to killable, I tried a feasibility test [3].
While it is not difficult to make some of pipe locks killable, there
are subtle or complicated locations (e.g. pipe_wait() users).

syzbot already reported that even pipe_release() is subjected to this hung
task report [4]. While the cause of [4] is that splice() from pipe to file
hit an infinite busy loop bug after holding pipe lock, it is a sign that
we have to care about __pipe_lock() in pipe_release() even if pipe_read()
or pipe_write() is stalling due to page fault handling.

Therefore, this patch tries to convert __pipe_lock() in pipe_release() to
killable, by deferring to a workqueue context when __pipe_lock_killable()
failed.

(a) Do you think that we can make all pipe locks killable?
(b) Do you think that we can introduce timeout for handling page faults?
(c) Do you think that we can avoid page faults with pipe locks held?

[1] https://syzkaller.appspot.com/bug?id=ab3d277fa3b068651edb7171a1aa4f78e5eacf78
[2] https://lkml.kernel.org/r/CAHk-=wj15SDiHjP2wPiC=Ru-RrUjOuT4AoULj6N_9pVvSXaWiw@mail.gmail.com
[3] https://lkml.kernel.org/r/dc9b2681-3b84-eb74-8c88-3815beaff7f8@i-love.sakura.ne.jp
[4] https://syzkaller.appspot.com/bug?id=2ccac875e85dc852911a0b5b788ada82dc0a081e

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 fs/pipe.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 60dbee457143..a64c7fc1794f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -87,6 +87,11 @@ static inline void __pipe_lock(struct pipe_inode_info *pipe)
 	mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
 }
 
+static inline int __pipe_lock_killable(struct pipe_inode_info *pipe)
+{
+	return mutex_lock_killable_nested(&pipe->mutex, I_MUTEX_PARENT);
+}
+
 static inline void __pipe_unlock(struct pipe_inode_info *pipe)
 {
 	mutex_unlock(&pipe->mutex);
@@ -714,15 +719,12 @@ static void put_pipe_info(struct inode *inode, struct pipe_inode_info *pipe)
 		free_pipe_info(pipe);
 }
 
-static int
-pipe_release(struct inode *inode, struct file *file)
+/* Caller holds pipe->mutex. */
+static void do_pipe_release(struct inode *inode, struct pipe_inode_info *pipe, fmode_t f_mode)
 {
-	struct pipe_inode_info *pipe = file->private_data;
-
-	__pipe_lock(pipe);
-	if (file->f_mode & FMODE_READ)
+	if (f_mode & FMODE_READ)
 		pipe->readers--;
-	if (file->f_mode & FMODE_WRITE)
+	if (f_mode & FMODE_WRITE)
 		pipe->writers--;
 
 	/* Was that the last reader or writer, but not the other side? */
@@ -732,9 +734,48 @@ pipe_release(struct inode *inode, struct file *file)
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
 	}
+}
+
+struct pipe_release_data {
+	struct work_struct work;
+	struct inode *inode;
+	struct pipe_inode_info *pipe;
+	fmode_t f_mode;
+};
+
+static void deferred_pipe_release(struct work_struct *work)
+{
+	struct pipe_release_data *w = container_of(work, struct pipe_release_data, work);
+	struct inode *inode = w->inode;
+	struct pipe_inode_info *pipe = w->pipe;
+
+	__pipe_lock(pipe);
+	do_pipe_release(inode, pipe, w->f_mode);
 	__pipe_unlock(pipe);
 
 	put_pipe_info(inode, pipe);
+	iput(inode); /* pipe_release() called ihold(inode). */
+	kfree(w);
+}
+
+static int pipe_release(struct inode *inode, struct file *file)
+{
+	struct pipe_inode_info *pipe = file->private_data;
+	struct pipe_release_data *w;
+
+	if (likely(__pipe_lock_killable(pipe) == 0)) {
+		do_pipe_release(inode, pipe, file->f_mode);
+		__pipe_unlock(pipe);
+		put_pipe_info(inode, pipe);
+		return 0;
+	}
+	w = kmalloc(sizeof(*w), GFP_KERNEL | __GFP_NOFAIL);
+	ihold(inode); /* deferred_pipe_release() will call iput(inode). */
+	w->inode = inode;
+	w->pipe = pipe;
+	w->f_mode = file->f_mode;
+	INIT_WORK(&w->work, deferred_pipe_release);
+	queue_work(system_wq, &w->work);
 	return 0;
 }
 
-- 
2.18.4



  reply	other threads:[~2020-08-22  4:35 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-01  8:45 INFO: task hung in pipe_read (2) syzbot
2020-08-01 15:29 ` Tetsuo Handa
2020-08-01 17:39   ` Linus Torvalds
2020-08-07  5:31     ` Andrea Arcangeli
2020-08-08  1:01       ` Tetsuo Handa
2020-08-10 19:29         ` Andrea Arcangeli
2020-08-13  7:00           ` Tetsuo Handa
2020-08-13 11:20             ` Tetsuo Handa
2020-08-22  4:34               ` Tetsuo Handa [this message]
2020-08-22 16:30                 ` [RFC PATCH] pipe: make pipe_release() deferrable Linus Torvalds
2020-08-23  0:21                   ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dvyukov@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=syzbot+96cc7aba7e969b1d305c@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --subject='Re: [RFC PATCH] pipe: make pipe_release() deferrable.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).