LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [v4.17-rc5][bisected] be83bbf80682 breaks /proc/vmcore mmap
@ 2018-05-19  3:15 Vasily Gorbik
  2018-05-19  3:15 ` [PATCH] procfs: fix mmap() for /proc/vmcore Vasily Gorbik
  0 siblings, 1 reply; 9+ messages in thread
From: Vasily Gorbik @ 2018-05-19  3:15 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, Al Viro; +Cc: linux-kernel, Heiko Carstens

Greetings,

be83bbf80682 "mmap: introduce sane default mmap limits" introduced a
problem with mapping /proc/vmcore if it is bigger than 2gb. This
breaks s390 kernel zfcpdump. But it should be a general problem.

Please consider the following one-liner fix, if it makes sense.

Vasily Gorbik (1):
  procfs: fix mmap() for /proc/vmcore

 fs/proc/inode.c | 1 +
 1 file changed, 1 insertion(+)

-- 
⢀⡵⣤⡴⣅  2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19  3:15 [v4.17-rc5][bisected] be83bbf80682 breaks /proc/vmcore mmap Vasily Gorbik
@ 2018-05-19  3:15 ` Vasily Gorbik
  2018-05-19  3:20   ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Vasily Gorbik @ 2018-05-19  3:15 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, Al Viro; +Cc: linux-kernel, Heiko Carstens

Procfs does not set max file size (s_maxbytes) and ends up with default
value of
MAX_NON_LFS	 ((1UL<<31) - 1)
Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
introduced "pgoff" limits checks for mmap, considering fs max file size
as upper limit, which made it impossible to mmap /proc/vmcore file
contents above 2Gb (/proc/vmcore appears to be the only procfs file
supporting mmap).

Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.

Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
---
 fs/proc/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2cf3b74391ca..de1b3b1da883 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -502,6 +502,7 @@ int proc_fill_super(struct super_block *s, void *data, int silent)
 	/* User space would break if executables or devices appear on proc */
 	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
 	s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
+	s->s_maxbytes = MAX_LFS_FILESIZE;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
 	s->s_magic = PROC_SUPER_MAGIC;
-- 
⢀⡵⣤⡴⣅  2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19  3:15 ` [PATCH] procfs: fix mmap() for /proc/vmcore Vasily Gorbik
@ 2018-05-19  3:20   ` Linus Torvalds
  2018-05-19  3:33     ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2018-05-19  3:20 UTC (permalink / raw)
  To: gor; +Cc: Andrew Morton, Al Viro, Linux Kernel Mailing List, Heiko Carstens

On Fri, May 18, 2018 at 8:15 PM Vasily Gorbik <gor@linux.ibm.com> wrote:

> Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
> introduced "pgoff" limits checks for mmap, considering fs max file size
> as upper limit, which made it impossible to mmap /proc/vmcore file
> contents above 2Gb (/proc/vmcore appears to be the only procfs file
> supporting mmap).

> Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.

Ugh. /proc is where a lot of problems *have* been.

Admittedly not as much as random drivers, but still. If proc doesn't set
s_maxbytes, then we should not raise it here magically, there might be
various /proc files that get offset accounting wrong.

I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
rather than open up all proc files to issues with 4G+ offsets.

                Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19  3:20   ` Linus Torvalds
@ 2018-05-19  3:33     ` Linus Torvalds
  2018-05-19  3:43       ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2018-05-19  3:33 UTC (permalink / raw)
  To: gor; +Cc: Andrew Morton, Al Viro, Linux Kernel Mailing List, Heiko Carstens

On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
torvalds@linux-foundation.org> wrote:

> I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> rather than open up all proc files to issues with 4G+ offsets.

Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
ask you how that ever worked for that file, but it's not there, the
s_maxbyte checks are only in lseek and in do_splice().

So apparently we protect against llseek + read/write, but we don't protect
against pread64/pwrite64 having offset overflows..

That's crazy. That makes all the s_maxbytes protection much less effective
than it should be. Filesystems that don't get the 64-bit case right will
screw up pread64 and friends.

Al, I'm missing something. Did we always have this gaping hole where we
didn't actually check s_maxbytes against read/write, only
generic_file_llseek? Apparently.

               Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19  3:33     ` Linus Torvalds
@ 2018-05-19  3:43       ` Al Viro
  2018-05-19  4:12         ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2018-05-19  3:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: gor, Andrew Morton, Linux Kernel Mailing List, Heiko Carstens

On Fri, May 18, 2018 at 08:33:37PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
> torvalds@linux-foundation.org> wrote:
> 
> > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> > rather than open up all proc files to issues with 4G+ offsets.
> 
> Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
> ask you how that ever worked for that file, but it's not there, the
> s_maxbyte checks are only in lseek and in do_splice().
> 
> So apparently we protect against llseek + read/write, but we don't protect
> against pread64/pwrite64 having offset overflows..
> 
> That's crazy. That makes all the s_maxbytes protection much less effective
> than it should be. Filesystems that don't get the 64-bit case right will
> screw up pread64 and friends.
> 
> Al, I'm missing something. Did we always have this gaping hole where we
> didn't actually check s_maxbytes against read/write, only
> generic_file_llseek? Apparently.

Not quite.  The things like
        if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
                return 0;
        iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
protect most of the regular files (see mm/filemap.c).  And for devices (which is
where the majority of crap ->read()/->write() is) it's obviously not applicable -
->s_maxbytes of *what*?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19  3:43       ` Al Viro
@ 2018-05-19  4:12         ` Linus Torvalds
  2018-05-19  4:16           ` Linus Torvalds
  2018-05-19 11:39           ` Vasily Gorbik
  0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-05-19  4:12 UTC (permalink / raw)
  To: Al Viro; +Cc: gor, Andrew Morton, Linux Kernel Mailing List, Heiko Carstens

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]

On Fri, May 18, 2018 at 8:43 PM Al Viro <viro@zeniv.linux.org.uk> wrote:

> Not quite.  The things like
>          if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>                  return 0;
>          iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> protect most of the regular files (see mm/filemap.c).  And for devices
(which is
> where the majority of crap ->read()/->write() is) it's obviously not
applicable -
> ->s_maxbytes of *what*?

Yeah that "s_maxbytes of what" is I think the real issue. We should never
have made s_maxbytes be super-block specific: we should have made it be
per-inode, and then have inode_init_always() initialize it using something
like the file_mmap_size_max() logic.

(So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
would only be used as a "fill in inode value for regular files for this
superblock").

Then we could actually protect read/write properly, because many of the
nasty bugs have been in character device drivers.

Oh well. It would still be a good thing to do some day, I suspect, but it's
clearly not the case now, and so s_maxbytes actually has much less coverage
than I was hoping for.

(And thus also the problems with /proc/vmcore - it never saw s_maxbytes
limits before).

Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously
means that my objections against Vasily's patch are mostly invalid. Even if
/proc does use "generic_file_llseek()" a lot and that should limit things
to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up
the offset.

I'd still prefer to limit the damage to just "vmcore".

Something like the below COMPLETELY UNTESTED patch? Vasily?

             Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 658 bytes --]

 fs/proc/vmcore.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af22a60..83278c547127 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
 }
 #endif
 
+/* Mark vmcore as being able and willing to do 64-bit mmaps */
+static int vmcore_open(struct inode *inode, struct file *file)
+{
+	file->f_mode |= FMODE_UNSIGNED_OFFSET;
+	return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
+	.open		= vmcore_open,
 	.read		= read_vmcore,
 	.llseek		= default_llseek,
 	.mmap		= mmap_vmcore,

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19  4:12         ` Linus Torvalds
@ 2018-05-19  4:16           ` Linus Torvalds
  2018-05-19 11:39           ` Vasily Gorbik
  1 sibling, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-05-19  4:16 UTC (permalink / raw)
  To: Al Viro; +Cc: gor, Andrew Morton, Linux Kernel Mailing List, Heiko Carstens

On Fri, May 18, 2018 at 9:12 PM Linus Torvalds <
torvalds@linux-foundation.org> wrote:

> (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
> would only be used as a "fill in inode value for regular files for this
> superblock").

Actually, maybe we could just make rw_verify_area() use the new
file_mmap_size_max() function.

We'd just remove the "mmap" part of the name, and have all IO (and all
mmap) use that limit.

That doesn't sound like a horrible idea to try for 4.18. Hmm?

               Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19  4:12         ` Linus Torvalds
  2018-05-19  4:16           ` Linus Torvalds
@ 2018-05-19 11:39           ` Vasily Gorbik
  2018-05-19 16:45             ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Vasily Gorbik @ 2018-05-19 11:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Andrew Morton, Linux Kernel Mailing List, Heiko Carstens

On Fri, May 18, 2018 at 09:12:56PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2018 at 8:43 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> > Not quite.  The things like
> >          if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> >                  return 0;
> >          iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> > protect most of the regular files (see mm/filemap.c).  And for devices
> (which is
> > where the majority of crap ->read()/->write() is) it's obviously not
> applicable -
> > ->s_maxbytes of *what*?
> 
> Yeah that "s_maxbytes of what" is I think the real issue. We should never
> have made s_maxbytes be super-block specific: we should have made it be
> per-inode, and then have inode_init_always() initialize it using something
> like the file_mmap_size_max() logic.
> 
> (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
> would only be used as a "fill in inode value for regular files for this
> superblock").
> 
> Then we could actually protect read/write properly, because many of the
> nasty bugs have been in character device drivers.
> 
> Oh well. It would still be a good thing to do some day, I suspect, but it's
> clearly not the case now, and so s_maxbytes actually has much less coverage
> than I was hoping for.
> 
> (And thus also the problems with /proc/vmcore - it never saw s_maxbytes
> limits before).
> 
> Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously
> means that my objections against Vasily's patch are mostly invalid. Even if
> /proc does use "generic_file_llseek()" a lot and that should limit things
> to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up
> the offset.
> 
> I'd still prefer to limit the damage to just "vmcore".
> 
> Something like the below COMPLETELY UNTESTED patch? Vasily?

Would work, but file_mmap_size_max first checks
	if (S_ISREG(inode->i_mode))
		return inode->i_sb->s_maxbytes;
before
	if (file->f_mode & FMODE_UNSIGNED_OFFSET)
		return 0;
so, as it is this patch does not fix the issue.

>              Linus
>
>  fs/proc/vmcore.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index a45f0af22a60..83278c547127 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
>  }
>  #endif
>  
> +/* Mark vmcore as being able and willing to do 64-bit mmaps */
> +static int vmcore_open(struct inode *inode, struct file *file)
> +{
> +	file->f_mode |= FMODE_UNSIGNED_OFFSET;
> +	return 0;
> +}
> +
>  static const struct file_operations proc_vmcore_operations = {
> +	.open		= vmcore_open,
>  	.read		= read_vmcore,
>  	.llseek		= default_llseek,
>  	.mmap		= mmap_vmcore,

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] procfs: fix mmap() for /proc/vmcore
  2018-05-19 11:39           ` Vasily Gorbik
@ 2018-05-19 16:45             ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2018-05-19 16:45 UTC (permalink / raw)
  To: gor; +Cc: Al Viro, Andrew Morton, Linux Kernel Mailing List, Heiko Carstens

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

On Sat, May 19, 2018 at 4:39 AM Vasily Gorbik <gor@linux.ibm.com> wrote:

> Would work, but file_mmap_size_max first checks
>              if (S_ISREG(inode->i_mode))
>                      return inode->i_sb->s_maxbytes;
> before
>              if (file->f_mode & FMODE_UNSIGNED_OFFSET)
>                      return 0;
> so, as it is this patch does not fix the issue.

Right you are.

We could easily reverse the order of those two checks, but since clearly
the s_maxbytes case hadn't gotten as much coverage as I thought it had,
let's just simplify file_mmap_size_max() instead, and just make the limit
be MAX_LFS_FILESIZE for regular files too, the way we already do for block
devices (instead of trying to figure out the size of the block device).

That way we won't be introducing changes to s_maxbytes late in the rc
series, and we'll only affect the new logic.

Plus regular files was never what that logic really was introduced to
protect against anyway. It's the random ad-hoc mmap implementations in
drivers that it really aimed to protect.

So for now, I've committed the minimal fixlet that doesn't affect any
existing code (just the new max file size logic), and maybe this can be
revisited for 4.18.

                Linus

[-- Attachment #2: 0001-mmap-relax-file-size-limit-for-regular-files.patch --]
[-- Type: application/x-patch, Size: 2055 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-05-19 16:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-19  3:15 [v4.17-rc5][bisected] be83bbf80682 breaks /proc/vmcore mmap Vasily Gorbik
2018-05-19  3:15 ` [PATCH] procfs: fix mmap() for /proc/vmcore Vasily Gorbik
2018-05-19  3:20   ` Linus Torvalds
2018-05-19  3:33     ` Linus Torvalds
2018-05-19  3:43       ` Al Viro
2018-05-19  4:12         ` Linus Torvalds
2018-05-19  4:16           ` Linus Torvalds
2018-05-19 11:39           ` Vasily Gorbik
2018-05-19 16:45             ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).