LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
@ 2022-02-08 16:39 Waiman Long
  2022-02-08 18:16 ` Al Viro
  2022-02-09 22:18 ` Eric W. Biederman
  0 siblings, 2 replies; 10+ messages in thread
From: Waiman Long @ 2022-02-08 16:39 UTC (permalink / raw)
  To: Christian Brauner, Eric W. Biederman, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, Al Viro
  Cc: linux-kernel, Waiman Long

I was made aware of the following lockdep splat:

[ 2516.308763] =====================================================
[ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted
[ 2516.309703] -----------------------------------------------------
[ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0
[ 2516.310944]
               and this task is already holding:
[ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80
[ 2516.311804] which would create a new lock dependency:
[ 2516.312066]  (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2}
[ 2516.312446]
               but this new dependency connects a HARDIRQ-irq-safe lock:
[ 2516.312983]  (&sighand->siglock){-.-.}-{2:2}
   :
[ 2516.330700]  Possible interrupt unsafe locking scenario:

[ 2516.331075]        CPU0                    CPU1
[ 2516.331328]        ----                    ----
[ 2516.331580]   lock(&newf->file_lock);
[ 2516.331790]                                local_irq_disable();
[ 2516.332231]                                lock(&sighand->siglock);
[ 2516.332579]                                lock(&newf->file_lock);
[ 2516.332922]   <Interrupt>
[ 2516.333069]     lock(&sighand->siglock);
[ 2516.333291]
                *** DEADLOCK ***
[ 2516.389845]
               stack backtrace:
[ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1
[ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2516.391155] Call trace:
[ 2516.391302]  dump_backtrace+0x0/0x3e0
[ 2516.391518]  show_stack+0x24/0x30
[ 2516.391717]  dump_stack_lvl+0x9c/0xd8
[ 2516.391938]  dump_stack+0x1c/0x38
[ 2516.392247]  print_bad_irq_dependency+0x620/0x710
[ 2516.392525]  check_irq_usage+0x4fc/0x86c
[ 2516.392756]  check_prev_add+0x180/0x1d90
[ 2516.392988]  validate_chain+0x8e0/0xee0
[ 2516.393215]  __lock_acquire+0x97c/0x1e40
[ 2516.393449]  lock_acquire.part.0+0x240/0x570
[ 2516.393814]  lock_acquire+0x90/0xb4
[ 2516.394021]  _raw_spin_lock+0xe8/0x154
[ 2516.394244]  fd_install+0x368/0x4f0
[ 2516.394451]  copy_process+0x1f5c/0x3e80
[ 2516.394678]  kernel_clone+0x134/0x660
[ 2516.394895]  __do_sys_clone3+0x130/0x1f4
[ 2516.395128]  __arm64_sys_clone3+0x5c/0x7c
[ 2516.395478]  invoke_syscall.constprop.0+0x78/0x1f0
[ 2516.395762]  el0_svc_common.constprop.0+0x22c/0x2c4
[ 2516.396050]  do_el0_svc+0xb0/0x10c
[ 2516.396252]  el0_svc+0x24/0x34
[ 2516.396436]  el0t_64_sync_handler+0xa4/0x12c
[ 2516.396688]  el0t_64_sync+0x198/0x19c
[ 2517.491197] NET: Registered PF_ATMPVC protocol family
[ 2517.491524] NET: Registered PF_ATMSVC protocol family
[ 2591.991877] sched: RT throttling activated

One way to solve this problem is to move the fd_install() call out of
the sighand->siglock critical section.

Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
on cleanups"), the pidfd installation was done without holding both
the task_list lock and the sighand->siglock. Obviously, holding these
two locks are not really needed to protect the fd_install() call.
So move the fd_install() call down to after the releases of both locks.

Fixes: 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/fork.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..007af7fb47c7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2323,10 +2323,6 @@ static __latent_entropy struct task_struct *copy_process(
 		goto bad_fork_cancel_cgroup;
 	}
 
-	/* past the last point of failure */
-	if (pidfile)
-		fd_install(pidfd, pidfile);
-
 	init_task_pid_links(p);
 	if (likely(p->pid)) {
 		ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
@@ -2375,6 +2371,9 @@ static __latent_entropy struct task_struct *copy_process(
 	syscall_tracepoint_update(p);
 	write_unlock_irq(&tasklist_lock);
 
+	if (pidfile)
+		fd_install(pidfd, pidfile);
+
 	proc_fork_connector(p);
 	sched_post_fork(p, args);
 	cgroup_post_fork(p, args);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 16:39 [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section Waiman Long
@ 2022-02-08 18:16 ` Al Viro
  2022-02-08 18:51   ` Waiman Long
  2022-02-09 22:18 ` Eric W. Biederman
  1 sibling, 1 reply; 10+ messages in thread
From: Al Viro @ 2022-02-08 18:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christian Brauner, Eric W. Biederman, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, linux-kernel

On Tue, Feb 08, 2022 at 11:39:12AM -0500, Waiman Long wrote:

> One way to solve this problem is to move the fd_install() call out of
> the sighand->siglock critical section.
> 
> Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
> on cleanups"), the pidfd installation was done without holding both
> the task_list lock and the sighand->siglock. Obviously, holding these
> two locks are not really needed to protect the fd_install() call.
> So move the fd_install() call down to after the releases of both locks.

	Umm... That assumes we can delay it that far.  IOW, that nothing
relies upon having pidfd observable in /proc/*/fd as soon as the child
becomes visible there in the first place.

	What warranties are expected from CLONE_PIDFD wrt observation of
child's descriptor table?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 18:16 ` Al Viro
@ 2022-02-08 18:51   ` Waiman Long
  2022-02-08 19:07     ` Al Viro
  0 siblings, 1 reply; 10+ messages in thread
From: Waiman Long @ 2022-02-08 18:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Brauner, Eric W. Biederman, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, linux-kernel

On 2/8/22 13:16, Al Viro wrote:
> On Tue, Feb 08, 2022 at 11:39:12AM -0500, Waiman Long wrote:
>
>> One way to solve this problem is to move the fd_install() call out of
>> the sighand->siglock critical section.
>>
>> Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
>> on cleanups"), the pidfd installation was done without holding both
>> the task_list lock and the sighand->siglock. Obviously, holding these
>> two locks are not really needed to protect the fd_install() call.
>> So move the fd_install() call down to after the releases of both locks.
> 	Umm... That assumes we can delay it that far.  IOW, that nothing
> relies upon having pidfd observable in /proc/*/fd as soon as the child
> becomes visible there in the first place.
>
> 	What warranties are expected from CLONE_PIDFD wrt observation of
> child's descriptor table?
>
I think the fd_install() call can be moved after the release of 
sighand->siglock but before the release the tasklist_lock. Will that be 
good enough?

I am afraid that I am not knowledgeable enough to talk about the 
CLONE_PIDFD expectation. May other people chime in on this?

Cheers,
Longman



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 18:51   ` Waiman Long
@ 2022-02-08 19:07     ` Al Viro
  2022-02-08 21:59       ` Eric W. Biederman
  2022-02-11  8:27       ` Christian Brauner
  0 siblings, 2 replies; 10+ messages in thread
From: Al Viro @ 2022-02-08 19:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christian Brauner, Eric W. Biederman, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, linux-kernel

On Tue, Feb 08, 2022 at 01:51:35PM -0500, Waiman Long wrote:
> On 2/8/22 13:16, Al Viro wrote:
> > On Tue, Feb 08, 2022 at 11:39:12AM -0500, Waiman Long wrote:
> > 
> > > One way to solve this problem is to move the fd_install() call out of
> > > the sighand->siglock critical section.
> > > 
> > > Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
> > > on cleanups"), the pidfd installation was done without holding both
> > > the task_list lock and the sighand->siglock. Obviously, holding these
> > > two locks are not really needed to protect the fd_install() call.
> > > So move the fd_install() call down to after the releases of both locks.
> > 	Umm... That assumes we can delay it that far.  IOW, that nothing
> > relies upon having pidfd observable in /proc/*/fd as soon as the child
> > becomes visible there in the first place.
> > 
> > 	What warranties are expected from CLONE_PIDFD wrt observation of
> > child's descriptor table?
> > 
> I think the fd_install() call can be moved after the release of
> sighand->siglock but before the release the tasklist_lock. Will that be good
> enough?

Looks like it should, but I'd rather hear from the CLONE_PIDFD authors first...
Christian, could you comment on that?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 19:07     ` Al Viro
@ 2022-02-08 21:59       ` Eric W. Biederman
  2022-02-08 22:16         ` Al Viro
  2022-02-09 16:25         ` Waiman Long
  2022-02-11  8:27       ` Christian Brauner
  1 sibling, 2 replies; 10+ messages in thread
From: Eric W. Biederman @ 2022-02-08 21:59 UTC (permalink / raw)
  To: Al Viro
  Cc: Waiman Long, Christian Brauner, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, linux-kernel

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Tue, Feb 08, 2022 at 01:51:35PM -0500, Waiman Long wrote:
>> On 2/8/22 13:16, Al Viro wrote:
>> > On Tue, Feb 08, 2022 at 11:39:12AM -0500, Waiman Long wrote:
>> > 
>> > > One way to solve this problem is to move the fd_install() call out of
>> > > the sighand->siglock critical section.
>> > > 
>> > > Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
>> > > on cleanups"), the pidfd installation was done without holding both
>> > > the task_list lock and the sighand->siglock. Obviously, holding these
>> > > two locks are not really needed to protect the fd_install() call.
>> > > So move the fd_install() call down to after the releases of both locks.
>> > 	Umm... That assumes we can delay it that far.  IOW, that nothing
>> > relies upon having pidfd observable in /proc/*/fd as soon as the child
>> > becomes visible there in the first place.
>> > 
>> > 	What warranties are expected from CLONE_PIDFD wrt observation of
>> > child's descriptor table?
>> > 
>> I think the fd_install() call can be moved after the release of
>> sighand->siglock but before the release the tasklist_lock. Will that be good
>> enough?
>
> Looks like it should, but I'd rather hear from the CLONE_PIDFD authors first...
> Christian, could you comment on that?

The tasklist_lock and the siglock provide no protection against
being looked up in proc.

The proc filesystem looks up process information with things only
protected by the rcu_read_lock().  Which means that the process
will be visible through proc after "attach_pid(p, PIDTYPE_PID".

The fd is being installed in the fdtable of the parent process,
and the siglock and tasklist_lock are held to protect the child.


Further fd_install is exposing the fd to userspace where it can be used
by the process_madvise and the process_mrelease system calls, from
anything that shares the fdtable of the parent thread.  Which means it
needs to be guaranteed that kernel_clone will call wake_up_process
before it is safe to call fd_install.


So it appears to me that moving fd_install earlier fundamentally unsafe,
and the locks are meaningless from an fd_install perspective.

Which means it should be perfectly fine to move the fd_install outside
of the tasklist_lock and the outside siglock.


I don't see how we could support the fd appearing in the fdtable sooner
which seems to make the question moot as to weather userspace in some
odd corner case expects the fd to appear in the fdtable sooner.

So I say move fd_install down with proc_fork_connector and friends.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 21:59       ` Eric W. Biederman
@ 2022-02-08 22:16         ` Al Viro
  2022-02-08 22:25           ` Eric W. Biederman
  2022-02-09 16:25         ` Waiman Long
  1 sibling, 1 reply; 10+ messages in thread
From: Al Viro @ 2022-02-08 22:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Waiman Long, Christian Brauner, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, linux-kernel

On Tue, Feb 08, 2022 at 03:59:06PM -0600, Eric W. Biederman wrote:

> The fd is being installed in the fdtable of the parent process,
> and the siglock and tasklist_lock are held to protect the child.
> 
> 
> Further fd_install is exposing the fd to userspace where it can be used
> by the process_madvise and the process_mrelease system calls, from
> anything that shares the fdtable of the parent thread.  Which means it
> needs to be guaranteed that kernel_clone will call wake_up_process
> before it is safe to call fd_install.

You mean "no calling fd_install() until after we are past the last possible
failure exit, by which point we know that wake_up_process() will eventually
be called", hopefully?  If so (as I assumed all along), anything downstream
of
        if (fatal_signal_pending(current)) {
		retval = -EINTR;
		goto bad_fork_cancel_cgroup;
	}

should be fine...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 22:16         ` Al Viro
@ 2022-02-08 22:25           ` Eric W. Biederman
  0 siblings, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2022-02-08 22:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Waiman Long, Christian Brauner, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, linux-kernel

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Tue, Feb 08, 2022 at 03:59:06PM -0600, Eric W. Biederman wrote:
>
>> The fd is being installed in the fdtable of the parent process,
>> and the siglock and tasklist_lock are held to protect the child.
>> 
>> 
>> Further fd_install is exposing the fd to userspace where it can be used
>> by the process_madvise and the process_mrelease system calls, from
>> anything that shares the fdtable of the parent thread.  Which means it
>> needs to be guaranteed that kernel_clone will call wake_up_process
>> before it is safe to call fd_install.
>
> You mean "no calling fd_install() until after we are past the last possible
> failure exit, by which point we know that wake_up_process() will eventually
> be called", hopefully?  If so (as I assumed all along), anything downstream
> of
>         if (fatal_signal_pending(current)) {
> 		retval = -EINTR;
> 		goto bad_fork_cancel_cgroup;
> 	}
>
> should be fine...

Except for the problems of calling fd_install under siglock, and
tasklist_lock, which protect nothing and cause lockdep splats.

There may also be assumptions on the task actually being fully setup,
if not today then in a future use pidfd.  So I am not particularly
comfortable with fd_install coming before we drop tasklist_lock.

I was pointing out that to resolve the locking issue we fundamentally
can not move the fd_install earlier, to resolve the locking issues.

Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 21:59       ` Eric W. Biederman
  2022-02-08 22:16         ` Al Viro
@ 2022-02-09 16:25         ` Waiman Long
  1 sibling, 0 replies; 10+ messages in thread
From: Waiman Long @ 2022-02-09 16:25 UTC (permalink / raw)
  To: Eric W. Biederman, Al Viro
  Cc: Christian Brauner, Andrew Morton, Jens Axboe, Alexey Gladkov,
	David Hildenbrand, Jann Horn, linux-kernel

On 2/8/22 16:59, Eric W. Biederman wrote:
> Al Viro <viro@zeniv.linux.org.uk> writes:
>
>> On Tue, Feb 08, 2022 at 01:51:35PM -0500, Waiman Long wrote:
>>> On 2/8/22 13:16, Al Viro wrote:
>>>> On Tue, Feb 08, 2022 at 11:39:12AM -0500, Waiman Long wrote:
>>>>
>>>>> One way to solve this problem is to move the fd_install() call out of
>>>>> the sighand->siglock critical section.
>>>>>
>>>>> Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
>>>>> on cleanups"), the pidfd installation was done without holding both
>>>>> the task_list lock and the sighand->siglock. Obviously, holding these
>>>>> two locks are not really needed to protect the fd_install() call.
>>>>> So move the fd_install() call down to after the releases of both locks.
>>>> 	Umm... That assumes we can delay it that far.  IOW, that nothing
>>>> relies upon having pidfd observable in /proc/*/fd as soon as the child
>>>> becomes visible there in the first place.
>>>>
>>>> 	What warranties are expected from CLONE_PIDFD wrt observation of
>>>> child's descriptor table?
>>>>
>>> I think the fd_install() call can be moved after the release of
>>> sighand->siglock but before the release the tasklist_lock. Will that be good
>>> enough?
>> Looks like it should, but I'd rather hear from the CLONE_PIDFD authors first...
>> Christian, could you comment on that?
> The tasklist_lock and the siglock provide no protection against
> being looked up in proc.
>
> The proc filesystem looks up process information with things only
> protected by the rcu_read_lock().  Which means that the process
> will be visible through proc after "attach_pid(p, PIDTYPE_PID".
>
> The fd is being installed in the fdtable of the parent process,
> and the siglock and tasklist_lock are held to protect the child.
>
>
> Further fd_install is exposing the fd to userspace where it can be used
> by the process_madvise and the process_mrelease system calls, from
> anything that shares the fdtable of the parent thread.  Which means it
> needs to be guaranteed that kernel_clone will call wake_up_process
> before it is safe to call fd_install.
>
>
> So it appears to me that moving fd_install earlier fundamentally unsafe,
> and the locks are meaningless from an fd_install perspective.
>
> Which means it should be perfectly fine to move the fd_install outside
> of the tasklist_lock and the outside siglock.
>
>
> I don't see how we could support the fd appearing in the fdtable sooner
> which seems to make the question moot as to weather userspace in some
> odd corner case expects the fd to appear in the fdtable sooner.
>
> So I say move fd_install down with proc_fork_connector and friends.

Right. Keeping fd_install() inside of tasklist_lock may also be 
problematic as a read lock can be taken at interrupt context which may 
cause similar lockdep splat. So I am keep this patch as is.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 16:39 [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section Waiman Long
  2022-02-08 18:16 ` Al Viro
@ 2022-02-09 22:18 ` Eric W. Biederman
  1 sibling, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2022-02-09 22:18 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christian Brauner, Andrew Morton, Jens Axboe, Alexey Gladkov,
	David Hildenbrand, Jann Horn, Al Viro, linux-kernel

Waiman Long <longman@redhat.com> writes:

> I was made aware of the following lockdep splat:
>
> [ 2516.308763] =====================================================
> [ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted
> [ 2516.309703] -----------------------------------------------------
> [ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0
> [ 2516.310944]
>                and this task is already holding:
> [ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80
> [ 2516.311804] which would create a new lock dependency:
> [ 2516.312066]  (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2}
> [ 2516.312446]
>                but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 2516.312983]  (&sighand->siglock){-.-.}-{2:2}
>    :
> [ 2516.330700]  Possible interrupt unsafe locking scenario:
>
> [ 2516.331075]        CPU0                    CPU1
> [ 2516.331328]        ----                    ----
> [ 2516.331580]   lock(&newf->file_lock);
> [ 2516.331790]                                local_irq_disable();
> [ 2516.332231]                                lock(&sighand->siglock);
> [ 2516.332579]                                lock(&newf->file_lock);
> [ 2516.332922]   <Interrupt>
> [ 2516.333069]     lock(&sighand->siglock);
> [ 2516.333291]
>                 *** DEADLOCK ***
> [ 2516.389845]
>                stack backtrace:
> [ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1
> [ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [ 2516.391155] Call trace:
> [ 2516.391302]  dump_backtrace+0x0/0x3e0
> [ 2516.391518]  show_stack+0x24/0x30
> [ 2516.391717]  dump_stack_lvl+0x9c/0xd8
> [ 2516.391938]  dump_stack+0x1c/0x38
> [ 2516.392247]  print_bad_irq_dependency+0x620/0x710
> [ 2516.392525]  check_irq_usage+0x4fc/0x86c
> [ 2516.392756]  check_prev_add+0x180/0x1d90
> [ 2516.392988]  validate_chain+0x8e0/0xee0
> [ 2516.393215]  __lock_acquire+0x97c/0x1e40
> [ 2516.393449]  lock_acquire.part.0+0x240/0x570
> [ 2516.393814]  lock_acquire+0x90/0xb4
> [ 2516.394021]  _raw_spin_lock+0xe8/0x154
> [ 2516.394244]  fd_install+0x368/0x4f0
> [ 2516.394451]  copy_process+0x1f5c/0x3e80
> [ 2516.394678]  kernel_clone+0x134/0x660
> [ 2516.394895]  __do_sys_clone3+0x130/0x1f4
> [ 2516.395128]  __arm64_sys_clone3+0x5c/0x7c
> [ 2516.395478]  invoke_syscall.constprop.0+0x78/0x1f0
> [ 2516.395762]  el0_svc_common.constprop.0+0x22c/0x2c4
> [ 2516.396050]  do_el0_svc+0xb0/0x10c
> [ 2516.396252]  el0_svc+0x24/0x34
> [ 2516.396436]  el0t_64_sync_handler+0xa4/0x12c
> [ 2516.396688]  el0t_64_sync+0x198/0x19c
> [ 2517.491197] NET: Registered PF_ATMPVC protocol family
> [ 2517.491524] NET: Registered PF_ATMSVC protocol family
> [ 2591.991877] sched: RT throttling activated
>
> One way to solve this problem is to move the fd_install() call out of
> the sighand->siglock critical section.
>
> Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
> on cleanups"), the pidfd installation was done without holding both
> the task_list lock and the sighand->siglock. Obviously, holding these
> two locks are not really needed to protect the fd_install() call.
> So move the fd_install() call down to after the releases of both
> locks.
>
> Fixes: 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups")
> Signed-off-by: Waiman Long <longman@redhat.com>

The fd_install can not be moved earlier and it does need to be moved
outside the locks.

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

> ---
>  kernel/fork.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d75a528f7b21..007af7fb47c7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2323,10 +2323,6 @@ static __latent_entropy struct task_struct *copy_process(
>  		goto bad_fork_cancel_cgroup;
>  	}
>  
> -	/* past the last point of failure */
> -	if (pidfile)
> -		fd_install(pidfd, pidfile);
> -
>  	init_task_pid_links(p);
>  	if (likely(p->pid)) {
>  		ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
> @@ -2375,6 +2371,9 @@ static __latent_entropy struct task_struct *copy_process(
>  	syscall_tracepoint_update(p);
>  	write_unlock_irq(&tasklist_lock);
>  
> +	if (pidfile)
> +		fd_install(pidfd, pidfile);
> +
>  	proc_fork_connector(p);
>  	sched_post_fork(p, args);
>  	cgroup_post_fork(p, args);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section
  2022-02-08 19:07     ` Al Viro
  2022-02-08 21:59       ` Eric W. Biederman
@ 2022-02-11  8:27       ` Christian Brauner
  1 sibling, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2022-02-11  8:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Waiman Long, Eric W. Biederman, Andrew Morton, Jens Axboe,
	Alexey Gladkov, David Hildenbrand, Jann Horn, linux-kernel

On Tue, Feb 08, 2022 at 07:07:41PM +0000, Al Viro wrote:
> On Tue, Feb 08, 2022 at 01:51:35PM -0500, Waiman Long wrote:
> > On 2/8/22 13:16, Al Viro wrote:
> > > On Tue, Feb 08, 2022 at 11:39:12AM -0500, Waiman Long wrote:
> > > 
> > > > One way to solve this problem is to move the fd_install() call out of
> > > > the sighand->siglock critical section.
> > > > 
> > > > Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
> > > > on cleanups"), the pidfd installation was done without holding both
> > > > the task_list lock and the sighand->siglock. Obviously, holding these
> > > > two locks are not really needed to protect the fd_install() call.
> > > > So move the fd_install() call down to after the releases of both locks.
> > > 	Umm... That assumes we can delay it that far.  IOW, that nothing
> > > relies upon having pidfd observable in /proc/*/fd as soon as the child
> > > becomes visible there in the first place.
> > > 
> > > 	What warranties are expected from CLONE_PIDFD wrt observation of
> > > child's descriptor table?
> > > 
> > I think the fd_install() call can be moved after the release of
> > sighand->siglock but before the release the tasklist_lock. Will that be good
> > enough?
> 
> Looks like it should, but I'd rather hear from the CLONE_PIDFD authors first...
> Christian, could you comment on that?

Sorry, I'm on vacation right now until the 17th so I'm a bit mia right
now.

Short story is, I think it's fine to move the fd_install() later.

We explicitly did not give such a strong guarantee for /proc/*/fd as
we figured that's unpleasant to do cleanly and really not required, i.e.
there's no obvious use-case for that apart from really weird
corner-cases that I think we can fend off.

(For the most part we expect people that use pidfd to almost not rely on
proc. When they access proc they likely want get metadata about the
process - something which cannot yet be done using pidfds - and that's
usually only relevant later in the lifecycle of a process.)

Originally the code installed the fd before grabbing tasklist lock and
siglock but as you rightly pointed out back then this gets us into
ksys_close() territory which is yucky. So I think this was an oversight
when we fixed that.

I can pick that patch up now. Thanks for all the reviews.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-02-11  8:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08 16:39 [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section Waiman Long
2022-02-08 18:16 ` Al Viro
2022-02-08 18:51   ` Waiman Long
2022-02-08 19:07     ` Al Viro
2022-02-08 21:59       ` Eric W. Biederman
2022-02-08 22:16         ` Al Viro
2022-02-08 22:25           ` Eric W. Biederman
2022-02-09 16:25         ` Waiman Long
2022-02-11  8:27       ` Christian Brauner
2022-02-09 22:18 ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).