Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: syzbot <syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com>,
	akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mszeredi@redhat.com, syzkaller-bugs@googlegroups.com,
	viro@zeniv.linux.org.uk
Subject: Re: kernel BUG at mm/hugetlb.c:LINE!
Date: Mon, 6 Apr 2020 15:05:58 -0700	[thread overview]
Message-ID: <aa7812b8-60ae-8578-40db-e71ad766b4d3@oracle.com> (raw)
In-Reply-To: <000000000000b4684e05a2968ca6@google.com>

On 4/5/20 8:06 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    1a323ea5 x86: get rid of 'errret' argument to __get_user_x..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=132e940be00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
> dashboard link: https://syzkaller.appspot.com/bug?extid=d6ec23007e951dadf3de
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12921933e00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=172e940be00000
> 
> The bug was bisected to:
> 
> commit e950564b97fd0f541b02eb207685d0746f5ecf29
> Author: Miklos Szeredi <mszeredi@redhat.com>
> Date:   Tue Jul 24 13:01:55 2018 +0000
> 
>     vfs: don't evict uninitialized inode
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=115cad33e00000
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=135cad33e00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=155cad33e00000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
> Fixes: e950564b97fd ("vfs: don't evict uninitialized inode")
> 
> overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off.
> ------------[ cut here ]------------
> kernel BUG at mm/hugetlb.c:3416!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 7036 Comm: syz-executor110 Not tainted 5.6.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416
> Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00
> RSP: 0018:ffffc900017779b0 EFLAGS: 00010293
> RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4
> RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007
> RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111
> R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000
> R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000
> FS:  00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000140 CR3: 0000000093554000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  __unmap_hugepage_range_final+0x30/0x70 mm/hugetlb.c:3507
>  unmap_single_vma+0x238/0x300 mm/memory.c:1296
>  unmap_vmas+0x16f/0x2f0 mm/memory.c:1332
>  exit_mmap+0x2aa/0x510 mm/mmap.c:3126
>  __mmput kernel/fork.c:1082 [inline]
>  mmput+0x168/0x4b0 kernel/fork.c:1103
>  exit_mm kernel/exit.c:477 [inline]
>  do_exit+0xa51/0x2dd0 kernel/exit.c:780
>  do_group_exit+0x125/0x340 kernel/exit.c:891
>  __do_sys_exit_group kernel/exit.c:902 [inline]
>  __se_sys_exit_group kernel/exit.c:900 [inline]
>  __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:900
>  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3

This is not new and certainly not caused by commit e950564b97fd.

hugetlbf only operates on huge page aligned and sized files/mappings.
To make sure this happens, the mmap code contians the following to 'round
up' length to huge page size:

	if (!(flags & MAP_ANONYMOUS)) {
		audit_mmap_fd(fd, flags);
		file = fget(fd);
		if (!file)
			return -EBADF;
		if (is_file_hugepages(file))
			len = ALIGN(len, huge_page_size(hstate_file(file)));
		retval = -EINVAL;
		if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
			goto out_fput;
	} else if (flags & MAP_HUGETLB) {
		struct user_struct *user = NULL;
		struct hstate *hs;

		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
		if (!hs)
			return -EINVAL;

		len = ALIGN(len, huge_page_size(hs));

However, in this syzbot test case the 'file' is in an overlayfs filesystem
created as follows:

mkdir("./file0", 000)                   = 0
mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
chdir("./file0")                        = 0
mkdir("./file1", 000)                   = 0
mkdir("./bus", 000)                     = 0
mkdir("./file0", 000)                   = 0
mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0

The routine is_file_hugepages() is just comparing the file ops to huegtlbfs:

	if (file->f_op == &hugetlbfs_file_operations)
		return true;

Since the file is in an overlayfs, file->f_op == ovl_file_operations.
Therefore, length will not be rounded up to huge page size and we create a
mapping with incorrect size which leads to the BUG.

Because of the code in mmap, the hugetlbfs mmap() routine assumes length is
rounded to a huge page size.  I can easily add a check to hugetlbfs mmap
to validate length and return -EINVAL.  However, I think we really want to
do the 'round up' earlier in mmap.  This is because the man page says:

   Huge page (Huge TLB) mappings
       For mappings that employ huge pages, the requirements for the arguments
       of  mmap()  and munmap() differ somewhat from the requirements for map‐
       pings that use the native system page size.

       For mmap(), offset must be a multiple of the underlying huge page size.
       The system automatically aligns length to be a multiple of the underly‐
       ing huge page size.

Since the location for the mapping is chosen BEFORE getting to the hugetlbfs
mmap routine, we can not wait until then to round up the length.  Is there a
defined way to go from a struct file * to the underlying filesystem so we
can continue to do the 'round up' in early mmap code?

One other thing I noticed with overlayfs is that it does not contain a
specific get_unmapped_area file_operations routine.  I would expect it to at
least check for and use the get_unmapped_area of the underlying filesystem?
Can someone comment if this is by design?
In the case of hugetlbfs, get_unmapped_area is even provided by most
architectures.  So, it seems like we would like/need to be calling the correct
routine.
-- 
Mike Kravetz

  reply	other threads:[~2020-04-06 22:06 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-06  3:06 syzbot
2020-04-06 22:05 ` Mike Kravetz [this message]
2020-05-12 15:04   ` Miklos Szeredi
2020-05-12 18:11     ` Mike Kravetz
2020-05-15 22:15       ` Mike Kravetz
2020-05-18 11:12         ` Miklos Szeredi
2020-05-18 23:22           ` Mike Kravetz
2020-05-18 23:41     ` Colin Walters
2020-05-19  0:35       ` Mike Kravetz
2020-05-20 11:20         ` Miklos Szeredi
2020-05-20 17:27           ` Mike Kravetz
2020-05-22 10:05             ` Miklos Szeredi
2020-05-28  0:01               ` Mike Kravetz
2020-05-28  8:37                 ` [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() kbuild test robot
2020-05-28 21:01                   ` Mike Kravetz
2020-06-04  9:16                     ` Miklos Szeredi
2020-06-11  0:13                       ` Mike Kravetz
2020-06-11  0:37                         ` Al Viro
2020-06-11  1:36                           ` Matthew Wilcox
2020-06-11  2:17                             ` Al Viro
2020-06-11  2:31                               ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa7812b8-60ae-8578-40db-e71ad766b4d3@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mszeredi@redhat.com \
    --cc=syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=viro@zeniv.linux.org.uk \
    --subject='Re: kernel BUG at mm/hugetlb.c:LINE'\!'' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).