Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* possible deadlock in proc_pid_syscall (2)
@ 2020-08-28  4:57 syzbot
  2020-08-28 12:01 ` Eric W. Biederman
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2020-08-28  4:57 UTC (permalink / raw)
  To: adobriyan, akpm, avagin, christian, ebiederm, gladkov.alexey,
	keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken

Hello,

syzbot found the following issue on:

HEAD commit:    15bc20c6 Merge tag 'tty-5.9-rc3' of git://git.kernel.org/p..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15349f96900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=978db74cb30aa994
dashboard link: https://syzkaller.appspot.com/bug?extid=db9cdf3dd1f64252c6ef
compiler:       gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.9.0-rc2-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.0/18445 is trying to acquire lock:
ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: lock_trace fs/proc/base.c:408 [inline]
ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646

but task is already holding lock:
ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&p->lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:956 [inline]
       __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
       seq_read+0x61/0x1070 fs/seq_file.c:155
       pde_read fs/proc/inode.c:306 [inline]
       proc_reg_read+0x221/0x300 fs/proc/inode.c:318
       do_loop_readv_writev fs/read_write.c:734 [inline]
       do_loop_readv_writev fs/read_write.c:721 [inline]
       do_iter_read+0x48e/0x6e0 fs/read_write.c:955
       vfs_readv+0xe5/0x150 fs/read_write.c:1073
       kernel_readv fs/splice.c:355 [inline]
       default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412
       do_splice_to+0x137/0x170 fs/splice.c:871
       splice_direct_to_actor+0x307/0x980 fs/splice.c:950
       do_splice_direct+0x1b3/0x280 fs/splice.c:1059
       do_sendfile+0x55f/0xd40 fs/read_write.c:1540
       __do_sys_sendfile64 fs/read_write.c:1601 [inline]
       __se_sys_sendfile64 fs/read_write.c:1587 [inline]
       __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #2 (sb_writers#4){.+.+}-{0:0}:
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write+0x234/0x470 fs/super.c:1672
       sb_start_write include/linux/fs.h:1643 [inline]
       mnt_want_write+0x3a/0xb0 fs/namespace.c:354
       ovl_setattr+0x5c/0x850 fs/overlayfs/inode.c:28
       notify_change+0xb60/0x10a0 fs/attr.c:336
       chown_common+0x4a9/0x550 fs/open.c:674
       do_fchownat+0x126/0x1e0 fs/open.c:704
       __do_sys_lchown fs/open.c:729 [inline]
       __se_sys_lchown fs/open.c:727 [inline]
       __x64_sys_lchown+0x7a/0xc0 fs/open.c:727
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
       down_read+0x96/0x420 kernel/locking/rwsem.c:1492
       inode_lock_shared include/linux/fs.h:789 [inline]
       lookup_slow fs/namei.c:1560 [inline]
       walk_component+0x409/0x6a0 fs/namei.c:1860
       lookup_last fs/namei.c:2309 [inline]
       path_lookupat+0x1ba/0x830 fs/namei.c:2333
       filename_lookup+0x19f/0x560 fs/namei.c:2366
       create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574
       perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323
       perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580
       perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899
       perf_init_event kernel/events/core.c:10951 [inline]
       perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229
       perf_event_alloc kernel/events/core.c:11608 [inline]
       __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&sig->exec_update_mutex){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:2496 [inline]
       check_prevs_add kernel/locking/lockdep.c:2601 [inline]
       validate_chain kernel/locking/lockdep.c:3218 [inline]
       __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426
       lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005
       __mutex_lock_common kernel/locking/mutex.c:956 [inline]
       __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
       lock_trace fs/proc/base.c:408 [inline]
       proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646
       proc_single_show+0x116/0x1e0 fs/proc/base.c:775
       seq_read+0x432/0x1070 fs/seq_file.c:208
       do_loop_readv_writev fs/read_write.c:734 [inline]
       do_loop_readv_writev fs/read_write.c:721 [inline]
       do_iter_read+0x48e/0x6e0 fs/read_write.c:955
       vfs_readv+0xe5/0x150 fs/read_write.c:1073
       do_preadv fs/read_write.c:1165 [inline]
       __do_sys_preadv fs/read_write.c:1215 [inline]
       __se_sys_preadv fs/read_write.c:1210 [inline]
       __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

Chain exists of:
  &sig->exec_update_mutex --> sb_writers#4 --> &p->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&p->lock);
                               lock(sb_writers#4);
                               lock(&p->lock);
  lock(&sig->exec_update_mutex);

 *** DEADLOCK ***

1 lock held by syz-executor.0/18445:
 #0: ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155

stack backtrace:
CPU: 0 PID: 18445 Comm: syz-executor.0 Not tainted 5.9.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
 check_prev_add kernel/locking/lockdep.c:2496 [inline]
 check_prevs_add kernel/locking/lockdep.c:2601 [inline]
 validate_chain kernel/locking/lockdep.c:3218 [inline]
 __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426
 lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005
 __mutex_lock_common kernel/locking/mutex.c:956 [inline]
 __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
 lock_trace fs/proc/base.c:408 [inline]
 proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646
 proc_single_show+0x116/0x1e0 fs/proc/base.c:775
 seq_read+0x432/0x1070 fs/seq_file.c:208
 do_loop_readv_writev fs/read_write.c:734 [inline]
 do_loop_readv_writev fs/read_write.c:721 [inline]
 do_iter_read+0x48e/0x6e0 fs/read_write.c:955
 vfs_readv+0xe5/0x150 fs/read_write.c:1073
 do_preadv fs/read_write.c:1165 [inline]
 __do_sys_preadv fs/read_write.c:1215 [inline]
 __se_sys_preadv fs/read_write.c:1210 [inline]
 __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d5b9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb613f9ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
RAX: ffffffffffffffda RBX: 0000000000025740 RCX: 000000000045d5b9
RDX: 0000000000000333 RSI: 00000000200017c0 RDI: 0000000000000006
RBP: 000000000118cf90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c
R13: 00007ffe2a82bbbf R14: 00007fb613f9f9c0 R15: 000000000118cf4c


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible deadlock in proc_pid_syscall (2)
  2020-08-28  4:57 possible deadlock in proc_pid_syscall (2) syzbot
@ 2020-08-28 12:01 ` Eric W. Biederman
  2020-08-28 12:37   ` peterz
  0 siblings, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2020-08-28 12:01 UTC (permalink / raw)
  To: syzbot
  Cc: adobriyan, akpm, avagin, christian, gladkov.alexey, keescook,
	linux-fsdevel, linux-kernel, syzkaller-bugs, walken,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Miklos Szeredi

syzbot <syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com> writes:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    15bc20c6 Merge tag 'tty-5.9-rc3' of git://git.kernel.org/p..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15349f96900000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=978db74cb30aa994
> dashboard link: https://syzkaller.appspot.com/bug?extid=db9cdf3dd1f64252c6ef
> compiler:       gcc (GCC) 10.1.0-syz 20200507
>
> Unfortunately, I don't have any reproducer for this issue yet.

Ok.

So if I read this set of traces correctly:

- perf_event_open holds exec_update_mutex

- perf_event_open can call kern_path which for overlayfs takes ovl_i_mutex

- chown on overlayfs calls mnt_want_write which takes sb_writes

- sendfile/splice can read from a seq_file (which takes p->lock)
  while holding mnt_want_write  aka sb_writes of the target file. 

- There are proc files that are seq_files that first take p->lock
  then take exec_update_mutex

So this looks real, if painful.

I have added some likely looking overlayfs and perf people to look at
this.

This feels like an issue where perf can just do too much under
exec_update_mutex.  In particular calling kern_path from
create_local_trace_uprobe.  Calling into the vfs at the very least
makes it impossible to know exactly which locks will be taken.

Thoughts?

Eric

> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0-rc2-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor.0/18445 is trying to acquire lock:
> ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: lock_trace fs/proc/base.c:408 [inline]
> ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646
>
> but task is already holding lock:
> ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&p->lock){+.+.}-{3:3}:
>        __mutex_lock_common kernel/locking/mutex.c:956 [inline]
>        __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
>        seq_read+0x61/0x1070 fs/seq_file.c:155
>        pde_read fs/proc/inode.c:306 [inline]
>        proc_reg_read+0x221/0x300 fs/proc/inode.c:318
>        do_loop_readv_writev fs/read_write.c:734 [inline]
>        do_loop_readv_writev fs/read_write.c:721 [inline]
>        do_iter_read+0x48e/0x6e0 fs/read_write.c:955
>        vfs_readv+0xe5/0x150 fs/read_write.c:1073
>        kernel_readv fs/splice.c:355 [inline]
>        default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412
>        do_splice_to+0x137/0x170 fs/splice.c:871
>        splice_direct_to_actor+0x307/0x980 fs/splice.c:950
>        do_splice_direct+0x1b3/0x280 fs/splice.c:1059
>        do_sendfile+0x55f/0xd40 fs/read_write.c:1540
>        __do_sys_sendfile64 fs/read_write.c:1601 [inline]
>        __se_sys_sendfile64 fs/read_write.c:1587 [inline]
>        __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #2 (sb_writers#4){.+.+}-{0:0}:
>        percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
>        __sb_start_write+0x234/0x470 fs/super.c:1672
>        sb_start_write include/linux/fs.h:1643 [inline]
>        mnt_want_write+0x3a/0xb0 fs/namespace.c:354
>        ovl_setattr+0x5c/0x850 fs/overlayfs/inode.c:28
>        notify_change+0xb60/0x10a0 fs/attr.c:336
>        chown_common+0x4a9/0x550 fs/open.c:674
>        do_fchownat+0x126/0x1e0 fs/open.c:704
>        __do_sys_lchown fs/open.c:729 [inline]
>        __se_sys_lchown fs/open.c:727 [inline]
>        __x64_sys_lchown+0x7a/0xc0 fs/open.c:727
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
>        down_read+0x96/0x420 kernel/locking/rwsem.c:1492
>        inode_lock_shared include/linux/fs.h:789 [inline]
>        lookup_slow fs/namei.c:1560 [inline]
>        walk_component+0x409/0x6a0 fs/namei.c:1860
>        lookup_last fs/namei.c:2309 [inline]
>        path_lookupat+0x1ba/0x830 fs/namei.c:2333
>        filename_lookup+0x19f/0x560 fs/namei.c:2366
>        create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574
>        perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323
>        perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580
>        perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899
>        perf_init_event kernel/events/core.c:10951 [inline]
>        perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229
>        perf_event_alloc kernel/events/core.c:11608 [inline]
>        __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (&sig->exec_update_mutex){+.+.}-{3:3}:
>        check_prev_add kernel/locking/lockdep.c:2496 [inline]
>        check_prevs_add kernel/locking/lockdep.c:2601 [inline]
>        validate_chain kernel/locking/lockdep.c:3218 [inline]
>        __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426
>        lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005
>        __mutex_lock_common kernel/locking/mutex.c:956 [inline]
>        __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
>        lock_trace fs/proc/base.c:408 [inline]
>        proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646
>        proc_single_show+0x116/0x1e0 fs/proc/base.c:775
>        seq_read+0x432/0x1070 fs/seq_file.c:208
>        do_loop_readv_writev fs/read_write.c:734 [inline]
>        do_loop_readv_writev fs/read_write.c:721 [inline]
>        do_iter_read+0x48e/0x6e0 fs/read_write.c:955
>        vfs_readv+0xe5/0x150 fs/read_write.c:1073
>        do_preadv fs/read_write.c:1165 [inline]
>        __do_sys_preadv fs/read_write.c:1215 [inline]
>        __se_sys_preadv fs/read_write.c:1210 [inline]
>        __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210
>        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sig->exec_update_mutex --> sb_writers#4 --> &p->lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&p->lock);
>                                lock(sb_writers#4);
>                                lock(&p->lock);
>   lock(&sig->exec_update_mutex);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor.0/18445:
>  #0: ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155
>
> stack backtrace:
> CPU: 0 PID: 18445 Comm: syz-executor.0 Not tainted 5.9.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
>  check_prev_add kernel/locking/lockdep.c:2496 [inline]
>  check_prevs_add kernel/locking/lockdep.c:2601 [inline]
>  validate_chain kernel/locking/lockdep.c:3218 [inline]
>  __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426
>  lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005
>  __mutex_lock_common kernel/locking/mutex.c:956 [inline]
>  __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
>  lock_trace fs/proc/base.c:408 [inline]
>  proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646
>  proc_single_show+0x116/0x1e0 fs/proc/base.c:775
>  seq_read+0x432/0x1070 fs/seq_file.c:208
>  do_loop_readv_writev fs/read_write.c:734 [inline]
>  do_loop_readv_writev fs/read_write.c:721 [inline]
>  do_iter_read+0x48e/0x6e0 fs/read_write.c:955
>  vfs_readv+0xe5/0x150 fs/read_write.c:1073
>  do_preadv fs/read_write.c:1165 [inline]
>  __do_sys_preadv fs/read_write.c:1215 [inline]
>  __se_sys_preadv fs/read_write.c:1210 [inline]
>  __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45d5b9
> Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb613f9ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
> RAX: ffffffffffffffda RBX: 0000000000025740 RCX: 000000000045d5b9
> RDX: 0000000000000333 RSI: 00000000200017c0 RDI: 0000000000000006
> RBP: 000000000118cf90 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c
> R13: 00007ffe2a82bbbf R14: 00007fb613f9f9c0 R15: 000000000118cf4c
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible deadlock in proc_pid_syscall (2)
  2020-08-28 12:01 ` Eric W. Biederman
@ 2020-08-28 12:37   ` peterz
  2020-08-30 12:31     ` Eric W. Biederman
  0 siblings, 1 reply; 7+ messages in thread
From: peterz @ 2020-08-28 12:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey,
	keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi,
	jannh

On Fri, Aug 28, 2020 at 07:01:17AM -0500, Eric W. Biederman wrote:
> This feels like an issue where perf can just do too much under
> exec_update_mutex.  In particular calling kern_path from
> create_local_trace_uprobe.  Calling into the vfs at the very least
> makes it impossible to know exactly which locks will be taken.
> 
> Thoughts?

> > -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
> >        down_read+0x96/0x420 kernel/locking/rwsem.c:1492
> >        inode_lock_shared include/linux/fs.h:789 [inline]
> >        lookup_slow fs/namei.c:1560 [inline]
> >        walk_component+0x409/0x6a0 fs/namei.c:1860
> >        lookup_last fs/namei.c:2309 [inline]
> >        path_lookupat+0x1ba/0x830 fs/namei.c:2333
> >        filename_lookup+0x19f/0x560 fs/namei.c:2366
> >        create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574
> >        perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323
> >        perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580
> >        perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899
> >        perf_init_event kernel/events/core.c:10951 [inline]
> >        perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229
> >        perf_event_alloc kernel/events/core.c:11608 [inline]
> >        __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724
> >        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> >        entry_SYSCALL_64_after_hwframe+0x44/0xa9

Right, so we hold the mutex fairly long there, supposedly to ensure
privs don't change out from under us.

We do the permission checks early -- no point in doing anything else if
we're not allowed, but we then have to hold this mutex until the event
is actually installed according to that comment.

/me goes look at git history:

  6914303824bb5 - changed cred_guard_mutex into exec_update_mutex
  79c9ce57eb2d5 - introduces cred_guard_mutex

So that latter commit explains the race we're guarding against. Without
this we can install the event after execve() which might have changed
privs on us.

I'm open to suggestions on how to do this differently.

Could we check privs twice instead?

Something like the completely untested below..

---
diff --git a/include/linux/freelist.h b/include/linux/freelist.h
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bfe8e3c6e44..14e6c9bbfcda 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11701,21 +11701,9 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (task) {
-		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
-		if (err)
-			goto err_task;
-
-		/*
-		 * Preserve ptrace permission check for backwards compatibility.
-		 *
-		 * We must hold exec_update_mutex across this and any potential
-		 * perf_install_in_context() call for this new event to
-		 * serialize against exec() altering our credentials (and the
-		 * perf_event_exit_task() that could imply).
-		 */
 		err = -EACCES;
 		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
-			goto err_cred;
+			goto err_task;
 	}
 
 	if (flags & PERF_FLAG_PID_CGROUP)
@@ -11844,6 +11832,24 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_context;
 	}
 
+	if (task) {
+		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
+		if (err)
+			goto err_file;
+
+		/*
+		 * Preserve ptrace permission check for backwards compatibility.
+		 *
+		 * We must hold exec_update_mutex across this and any potential
+		 * perf_install_in_context() call for this new event to
+		 * serialize against exec() altering our credentials (and the
+		 * perf_event_exit_task() that could imply).
+		 */
+		err = -EACCES;
+		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
+			goto err_cred;
+	}
+
 	if (move_group) {
 		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
 
@@ -12019,7 +12025,10 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (move_group)
 		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
-/* err_file: */
+err_cred:
+	if (task)
+		mutex_unlock(&task->signal->exec_update_mutex);
+err_file:
 	fput(event_file);
 err_context:
 	perf_unpin_context(ctx);
@@ -12031,9 +12040,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	 */
 	if (!event_file)
 		free_event(event);
-err_cred:
-	if (task)
-		mutex_unlock(&task->signal->exec_update_mutex);
 err_task:
 	if (task)
 		put_task_struct(task);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible deadlock in proc_pid_syscall (2)
  2020-08-28 12:37   ` peterz
@ 2020-08-30 12:31     ` Eric W. Biederman
  2020-08-31  7:43       ` peterz
  2020-09-02  9:57       ` peterz
  0 siblings, 2 replies; 7+ messages in thread
From: Eric W. Biederman @ 2020-08-30 12:31 UTC (permalink / raw)
  To: peterz
  Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey,
	keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi,
	jannh

peterz@infradead.org writes:

> On Fri, Aug 28, 2020 at 07:01:17AM -0500, Eric W. Biederman wrote:
>> This feels like an issue where perf can just do too much under
>> exec_update_mutex.  In particular calling kern_path from
>> create_local_trace_uprobe.  Calling into the vfs at the very least
>> makes it impossible to know exactly which locks will be taken.
>> 
>> Thoughts?
>
>> > -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
>> >        down_read+0x96/0x420 kernel/locking/rwsem.c:1492
>> >        inode_lock_shared include/linux/fs.h:789 [inline]
>> >        lookup_slow fs/namei.c:1560 [inline]
>> >        walk_component+0x409/0x6a0 fs/namei.c:1860
>> >        lookup_last fs/namei.c:2309 [inline]
>> >        path_lookupat+0x1ba/0x830 fs/namei.c:2333
>> >        filename_lookup+0x19f/0x560 fs/namei.c:2366
>> >        create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574
>> >        perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323
>> >        perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580
>> >        perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899
>> >        perf_init_event kernel/events/core.c:10951 [inline]
>> >        perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229
>> >        perf_event_alloc kernel/events/core.c:11608 [inline]
>> >        __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724
>> >        do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>> >        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Right, so we hold the mutex fairly long there, supposedly to ensure
> privs don't change out from under us.
>
> We do the permission checks early -- no point in doing anything else if
> we're not allowed, but we then have to hold this mutex until the event
> is actually installed according to that comment.
>
> /me goes look at git history:
>
>   6914303824bb5 - changed cred_guard_mutex into exec_update_mutex
>   79c9ce57eb2d5 - introduces cred_guard_mutex
>
> So that latter commit explains the race we're guarding against. Without
> this we can install the event after execve() which might have changed
> privs on us.
>
> I'm open to suggestions on how to do this differently.
>
> Could we check privs twice instead?
>
> Something like the completely untested below..

That might work.

I am thinking that for cases where we want to do significant work it
might be better to ask the process to pause at someplace safe (probably
get_signal) and then do all of the work when we know nothing is changing
in the process.

I don't really like the idea of checking and then checking again.  We
might have to do it but it feels like the model is wrong somewhere.

Given that this is tricky to hit in practice, and given that I am
already working the general problem of how to sort out the locking I am
going to work this with the rest of the thorny issues of in exec.  This
feels like a case where the proper solution is that we simply need
something better than a mutex.


I had not realized before this how much setting up tracing in
perf_even_open looks like attaching a debugger in ptrace_attach.


I need to look at this some more but I suspect exec should be
treating a tracer like exec currently treats a ptracer for
purposes of permission checks.  So I think we may have more issues
than simply the possibility of deadlock on exec_update_mutex.

Eric


> ---
> diff --git a/include/linux/freelist.h b/include/linux/freelist.h
> new file mode 100644
> index 000000000000..e69de29bb2d1
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5bfe8e3c6e44..14e6c9bbfcda 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11701,21 +11701,9 @@ SYSCALL_DEFINE5(perf_event_open,
>  	}
>  
>  	if (task) {
> -		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
> -		if (err)
> -			goto err_task;
> -
> -		/*
> -		 * Preserve ptrace permission check for backwards compatibility.
> -		 *
> -		 * We must hold exec_update_mutex across this and any potential
> -		 * perf_install_in_context() call for this new event to
> -		 * serialize against exec() altering our credentials (and the
> -		 * perf_event_exit_task() that could imply).
> -		 */
>  		err = -EACCES;
>  		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> -			goto err_cred;
> +			goto err_task;
>  	}
>  
>  	if (flags & PERF_FLAG_PID_CGROUP)
> @@ -11844,6 +11832,24 @@ SYSCALL_DEFINE5(perf_event_open,
>  		goto err_context;
>  	}
>  
> +	if (task) {
> +		err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
> +		if (err)
> +			goto err_file;
> +
> +		/*
> +		 * Preserve ptrace permission check for backwards compatibility.
> +		 *
> +		 * We must hold exec_update_mutex across this and any potential
> +		 * perf_install_in_context() call for this new event to
> +		 * serialize against exec() altering our credentials (and the
> +		 * perf_event_exit_task() that could imply).
> +		 */
> +		err = -EACCES;
> +		if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> +			goto err_cred;
> +	}
> +
>  	if (move_group) {
>  		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
>  
> @@ -12019,7 +12025,10 @@ SYSCALL_DEFINE5(perf_event_open,
>  	if (move_group)
>  		perf_event_ctx_unlock(group_leader, gctx);
>  	mutex_unlock(&ctx->mutex);
> -/* err_file: */
> +err_cred:
> +	if (task)
> +		mutex_unlock(&task->signal->exec_update_mutex);
> +err_file:
>  	fput(event_file);
>  err_context:
>  	perf_unpin_context(ctx);
> @@ -12031,9 +12040,6 @@ SYSCALL_DEFINE5(perf_event_open,
>  	 */
>  	if (!event_file)
>  		free_event(event);
> -err_cred:
> -	if (task)
> -		mutex_unlock(&task->signal->exec_update_mutex);
>  err_task:
>  	if (task)
>  		put_task_struct(task);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible deadlock in proc_pid_syscall (2)
  2020-08-30 12:31     ` Eric W. Biederman
@ 2020-08-31  7:43       ` peterz
  2020-08-31 13:52         ` Eric W. Biederman
  2020-09-02  9:57       ` peterz
  1 sibling, 1 reply; 7+ messages in thread
From: peterz @ 2020-08-31  7:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey,
	keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi,
	jannh

On Sun, Aug 30, 2020 at 07:31:39AM -0500, Eric W. Biederman wrote:

> I am thinking that for cases where we want to do significant work it
> might be better to ask the process to pause at someplace safe (probably
> get_signal) and then do all of the work when we know nothing is changing
> in the process.
> 
> I don't really like the idea of checking and then checking again.  We
> might have to do it but it feels like the model is wrong somewhere.
> 
> Given that this is tricky to hit in practice, and given that I am
> already working the general problem of how to sort out the locking I am
> going to work this with the rest of the thorny issues of in exec.  This
> feels like a case where the proper solution is that we simply need
> something better than a mutex.

One possible alternative would be something RCU-like, surround the thing
with get_task_cred() / put_cred() and then have commit_creds() wait for
the usage of the old creds to drop to 0 before continuing.

(Also, get_cred_rcu() is disgusting for casting away const)

But this could be complete garbage, I'm not much familiar with any of
thise code.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible deadlock in proc_pid_syscall (2)
  2020-08-31  7:43       ` peterz
@ 2020-08-31 13:52         ` Eric W. Biederman
  0 siblings, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2020-08-31 13:52 UTC (permalink / raw)
  To: peterz
  Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey,
	keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi,
	jannh

peterz@infradead.org writes:

> On Sun, Aug 30, 2020 at 07:31:39AM -0500, Eric W. Biederman wrote:
>
>> I am thinking that for cases where we want to do significant work it
>> might be better to ask the process to pause at someplace safe (probably
>> get_signal) and then do all of the work when we know nothing is changing
>> in the process.
>> 
>> I don't really like the idea of checking and then checking again.  We
>> might have to do it but it feels like the model is wrong somewhere.
>> 
>> Given that this is tricky to hit in practice, and given that I am
>> already working the general problem of how to sort out the locking I am
>> going to work this with the rest of the thorny issues of in exec.  This
>> feels like a case where the proper solution is that we simply need
>> something better than a mutex.
>
> One possible alternative would be something RCU-like, surround the thing
> with get_task_cred() / put_cred() and then have commit_creds() wait for
> the usage of the old creds to drop to 0 before continuing.
>
> (Also, get_cred_rcu() is disgusting for casting away const)
>
> But this could be complete garbage, I'm not much familiar with any of
> thise code.

This looks like an area of code that will take a couple of passes to get
100% right.

Usually changing creds happens atomically, and separately from
everything else so we simply don't care if there a race,  Either the old
creds or the new creds are valid.

With exec the situation is trickier as several things in addition to the
cred are changing at the same time.  So a lock is needed.

Now that it is separated from the cred_guard_mutex, probably the easiest
solution is to make exec_update_mutex a sleeping reader writer lock.
There are fewer cases that matter as such a lock would only block on
exec (the writer).

I don't understand perf well enough to do much without carefully
studying the code.

Eric








^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: possible deadlock in proc_pid_syscall (2)
  2020-08-30 12:31     ` Eric W. Biederman
  2020-08-31  7:43       ` peterz
@ 2020-09-02  9:57       ` peterz
  1 sibling, 0 replies; 7+ messages in thread
From: peterz @ 2020-09-02  9:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey,
	keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi,
	jannh

On Sun, Aug 30, 2020 at 07:31:39AM -0500, Eric W. Biederman wrote:
> peterz@infradead.org writes:

> > Could we check privs twice instead?
> >
> > Something like the completely untested below..
> 
> That might work.
> 
> I am thinking that for cases where we want to do significant work it
> might be better to ask the process to pause at someplace safe (probably
> get_signal) and then do all of the work when we know nothing is changing
> in the process.
> 
> I don't really like the idea of checking and then checking again.  We
> might have to do it but it feels like the model is wrong somewhere.

Another possible aproach might be to grab a copy of the cred pointer and
have the final install check that. It means we need to allow
perf_install_in_context() to fail though. That might be a little more
work.

> I had not realized before this how much setting up tracing in
> perf_even_open looks like attaching a debugger in ptrace_attach.

Same problem; once you've attached a perf event you can observe much of
what the task does.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-02  9:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28  4:57 possible deadlock in proc_pid_syscall (2) syzbot
2020-08-28 12:01 ` Eric W. Biederman
2020-08-28 12:37   ` peterz
2020-08-30 12:31     ` Eric W. Biederman
2020-08-31  7:43       ` peterz
2020-08-31 13:52         ` Eric W. Biederman
2020-09-02  9:57       ` peterz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).