LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"ocfs2-devel@oss.oracle.com" <ocfs2-devel@oss.oracle.com>,
	Josef Bacik <josef@toxicpanda.com>, Will Deacon <will@kernel.org>
Subject: Re: [RFC][arm64] possible infinite loop in btrfs search_ioctl()
Date: Thu, 21 Oct 2021 20:00:50 +0200	[thread overview]
Message-ID: <CAHc6FU7im8UzxWCzqUFMKOwyg9zoQ8OZ_M+rRC_E20yE5RNu9g@mail.gmail.com> (raw)
In-Reply-To: <YXGexrdprC+NTslm@arm.com>

On Thu, Oct 21, 2021 at 7:09 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Oct 21, 2021 at 04:42:33PM +0200, Andreas Gruenbacher wrote:
> > On Thu, Oct 21, 2021 at 12:06 PM Catalin Marinas
> > <catalin.marinas@arm.com> wrote:
> > > On Thu, Oct 21, 2021 at 02:46:10AM +0200, Andreas Gruenbacher wrote:
> > > > When a page fault would occur, we
> > > > get back an error instead, and then we try to fault in the offending
> > > > pages. If a page is resident and we still get a fault trying to access
> > > > it, trying to fault in the same page again isn't going to help and we
> > > > have a true error.
> > >
> > > You can't be sure the second fault is a true error. The unlocked
> > > fault_in_*() may race with some LRU scheme making the pte not accessible
> > > or a write-back making it clean/read-only. copy_to_user() with
> > > pagefault_disabled() fails again but that's a benign fault. The
> > > filesystem should re-attempt the fault-in (gup would correct the pte),
> > > disable page faults and copy_to_user(), potentially in an infinite loop.
> > > If you bail out on the second/third uaccess following a fault_in_*()
> > > call, you may get some unexpected errors (though very rare). Maybe the
> > > filesystems avoid this problem somehow but I couldn't figure it out.
> >
> > Good point, we can indeed only bail out if both the user copy and the
> > fault-in fail.
> >
> > But probing the entire memory range in fault domain granularity in the
> > page fault-in functions still doesn't actually make sense. Those
> > functions really only need to guarantee that we'll be able to make
> > progress eventually. From that point of view, it should be enough to
> > probe the first byte of the requested memory range, so when one of
> > those functions reports that the next N bytes should be accessible,
> > this really means that the first byte surely isn't permanently
> > inaccessible and that the rest is likely accessible. Functions
> > fault_in_readable and fault_in_writeable already work that way, so
> > this only leaves function fault_in_safe_writeable to worry about.
>
> I agree, that's why generic_perform_write() works. It does a get_user()
> from the first byte in that range and the subsequent copy_from_user()
> will make progress of at least one byte if it was readable. Eventually
> it will hit the byte that faults. The gup-based fault_in_*() are a bit
> more problematic.
>
> Your series introduces fault_in_safe_writeable() and I think for MTE
> doing a _single_ get_user(uaddr) (in addition to the gup checks for
> write) would be sufficient as long as generic_file_read_iter() advances
> by at least one byte (eventually).
>
> This discussion started with the btrfs search_ioctl() where, even if
> some bytes were written in copy_to_sk(), it always restarts from an
> earlier position, reattempting to write the same bytes. Since
> copy_to_sk() doesn't guarantee forward progress even if some bytes are
> writable, Linus' suggestion was for fault_in_writable() to probe the
> whole range. I consider this overkill since btrfs is the only one that
> needs probing every 16 bytes. The other cases like the new
> fault_in_safe_writeable() can be fixed by probing the first byte only
> followed by gup.

Hmm. Direct I/O request sizes are multiples of the underlying device
block size, so we'll also get stuck there if fault-in won't give us a
full block. This is getting pretty ugly. So scratch that idea; let's
stick with probing the whole range.

Thanks,
Andreas

> I think we need to better define the semantics of the fault_in + uaccess
> sequences. For uaccess, we document "a hard requirement that not storing
> anything at all (i.e. returning size) should happen only when nothing
> could be copied" (from linux/uaccess.h). I think we can add a
> requirement for the new size_t-based fault_in_* variants without
> mandating that the whole range is probed, something like: "returning
> leftover < size guarantees that a subsequent user access at uaddr copies
> at least one byte eventually". I said "eventually" but maybe we can come
> up with some clearer wording for a liveness property.
>
> Such requirement would ensure that infinite loops of fault_in_* +
> uaccess make progress as long as they don't reset the probed range. Code
> like btrfs search_ioctl() would need to be adjusted to avoid such range
> reset and guarantee forward progress.
>
> --
> Catalin
>


  reply	other threads:[~2021-10-21 18:01 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-27 16:49 [PATCH v7 00/19] gfs2: Fix mmap + page fault deadlocks Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 01/19] iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value Andreas Gruenbacher
2021-09-09 11:09   ` Christoph Hellwig
2021-08-27 16:49 ` [PATCH v7 02/19] powerpc/kvm: Fix kvm_use_magic_page Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 03/19] gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Andreas Gruenbacher
2021-08-27 19:08   ` Al Viro
2021-09-03 14:56   ` Filipe Manana
2021-09-28 15:02     ` Andreas Gruenbacher
2021-09-28 16:37       ` Matthew Wilcox
2021-09-28 20:41         ` Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 04/19] iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable Andreas Gruenbacher
2021-08-27 18:53   ` Al Viro
2021-08-27 18:57     ` Linus Torvalds
2021-08-27 19:16       ` Al Viro
2021-08-27 20:56   ` Kari Argillander
2021-08-28 17:13     ` Linus Torvalds
2021-08-27 16:49 ` [PATCH v7 05/19] iov_iter: Introduce fault_in_iov_iter_writeable Andreas Gruenbacher
2021-08-27 18:49   ` Al Viro
2021-08-27 19:05     ` Linus Torvalds
2021-08-27 19:23       ` Al Viro
2021-08-27 19:33         ` Linus Torvalds
2021-08-27 19:37           ` Al Viro
2021-08-27 21:48             ` Al Viro
2021-08-27 21:57               ` Al Viro
2021-08-27 23:22                 ` Luck, Tony
2021-08-28  2:20                   ` Luck, Tony
2021-08-28 21:47                   ` Thomas Gleixner
2021-08-28 22:04                     ` Al Viro
2021-08-28 22:11                       ` Al Viro
2021-08-28 22:19                         ` Al Viro
2021-08-28 22:51                           ` Al Viro
2021-08-29 18:44                             ` Thomas Gleixner
2021-08-29 19:46                               ` Al Viro
2021-08-29 19:51                                 ` Thomas Gleixner
2021-08-28 22:20                         ` Tony Luck
2021-08-29  1:40                           ` Matthew Wilcox
2021-08-30 15:41                             ` Luck, Tony
2021-08-28 22:23                       ` Thomas Gleixner
2021-08-28 19:28               ` [RFC][arm64] possible infinite loop in btrfs search_ioctl() Al Viro
2021-08-31 13:54                 ` Catalin Marinas
2021-08-31 15:28                   ` Al Viro
2021-08-31 16:01                     ` Catalin Marinas
2021-10-11 17:37                     ` Catalin Marinas
2021-10-11 19:15                       ` Linus Torvalds
2021-10-11 21:08                         ` Catalin Marinas
2021-10-11 23:59                           ` Linus Torvalds
2021-10-12 17:27                             ` Catalin Marinas
2021-10-12 17:58                               ` Linus Torvalds
2021-10-18 17:13                                 ` Catalin Marinas
2021-10-21  0:46                             ` Andreas Gruenbacher
2021-10-21 10:05                               ` Catalin Marinas
2021-10-21 14:42                                 ` Andreas Gruenbacher
2021-10-21 17:09                                   ` Catalin Marinas
2021-10-21 18:00                                     ` Andreas Gruenbacher [this message]
2021-10-22 18:41                                       ` Catalin Marinas
2021-10-25 19:37                                         ` Andreas Gruenbacher
2021-10-22  2:30                                   ` Linus Torvalds
2021-10-22  9:34                                     ` Catalin Marinas
2021-08-29  0:58               ` [PATCH v7 05/19] iov_iter: Introduce fault_in_iov_iter_writeable Al Viro
2021-08-27 16:49 ` [PATCH v7 06/19] gfs2: Add wrapper for iomap_file_buffered_write Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 07/19] gfs2: Clean up function may_grant Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 08/19] gfs2: Eliminate vestigial HIF_FIRST Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 09/19] gfs2: Remove redundant check from gfs2_glock_dq Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 10/19] gfs2: Introduce flag for glock holder auto-demotion Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 11/19] gfs2: Move the inode glock locking to gfs2_file_buffered_write Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 12/19] gfs2: Eliminate ip->i_gh Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 13/19] gfs2: Fix mmap + page fault deadlocks for buffered I/O Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 14/19] iomap: Fix iomap_dio_rw return value for user copies Andreas Gruenbacher
2021-09-03 18:54   ` Darrick J. Wong
2021-09-09 11:17   ` Christoph Hellwig
2021-08-27 16:49 ` [PATCH v7 15/19] iomap: Support partial direct I/O on user copy failures Andreas Gruenbacher
2021-09-03 18:54   ` Darrick J. Wong
2021-09-09 11:20   ` Christoph Hellwig
2021-09-28 15:05     ` Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 16/19] iomap: Add done_before argument to iomap_dio_rw Andreas Gruenbacher
2021-08-27 18:30   ` Darrick J. Wong
2021-08-27 20:15     ` Andreas Gruenbacher
2021-08-27 21:32       ` Darrick J. Wong
2021-08-27 21:49         ` Andreas Grünbacher
2021-08-27 22:35         ` Linus Torvalds
2021-09-03 18:47           ` Darrick J. Wong
2021-09-03 18:53   ` Darrick J. Wong
2021-09-09 11:30   ` Christoph Hellwig
2021-09-09 17:22     ` Linus Torvalds
2021-09-10  7:36       ` Christoph Hellwig
2021-08-27 16:49 ` [PATCH v7 17/19] gup: Introduce FOLL_NOFAULT flag to disable page faults Andreas Gruenbacher
2021-09-09 11:36   ` Christoph Hellwig
2021-09-09 17:17     ` Linus Torvalds
2021-09-10  7:24       ` Christoph Hellwig
2021-08-27 16:49 ` [PATCH v7 18/19] iov_iter: Introduce nofault " Andreas Gruenbacher
2021-08-27 18:47   ` Al Viro
2021-08-27 19:56     ` Andreas Gruenbacher
2021-08-27 16:49 ` [PATCH v7 19/19] gfs2: Fix mmap + page fault deadlocks for direct I/O Andreas Gruenbacher
2021-08-27 17:16 ` [PATCH v7 00/19] gfs2: Fix mmap + page fault deadlocks Linus Torvalds
2021-09-01 19:52   ` Andreas Gruenbacher
2021-09-03 15:52     ` Linus Torvalds
2021-09-03 18:25       ` Al Viro
2021-09-03 18:47         ` Linus Torvalds
2021-09-03 15:07 ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHc6FU7im8UzxWCzqUFMKOwyg9zoQ8OZ_M+rRC_E20yE5RNu9g@mail.gmail.com \
    --to=agruenba@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=cluster-devel@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --subject='Re: [RFC][arm64] possible infinite loop in btrfs search_ioctl()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).