LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
  2008-11-07 16:52 [PATCH] account_group_exec_runtime: fix the racy usage of ->signal Oleg Nesterov
@ 2008-11-07 16:21 ` Ingo Molnar
  2008-11-07 16:58   ` Doug Chapman
  2008-11-07 17:40   ` Oleg Nesterov
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2008-11-07 16:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, adobriyan, Doug Chapman, Peter Zijlstra,
	Roland McGrath, linux-kernel


* Oleg Nesterov <oleg@redhat.com> wrote:

> Compile tested.
> 
> Unlike other similar routines, account_group_exec_runtime() could be
> called "implicitly" after exit_notify(). This means we can race with
> the parent doing release_task(), we can't just check ->signal != NULL.
> 
> Take ->siglock to make sure ->signal can't go away.
> 
> This is the minimal fix, with this patch we don't need need get/put cpu,
> and I think we should uninline this function.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

> 
> --- K-28/kernel/sched_stats.h~A_G_E_R_FIX	2008-11-07 17:32:02.000000000 +0100
> +++ K-28/kernel/sched_stats.h	2008-11-07 17:44:39.000000000 +0100
> @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
>  					      unsigned long long ns)
>  {
>  	struct signal_struct *sig;
> +	unsigned long flags;
>  
> -	sig = tsk->signal;
> -	if (unlikely(!sig))
> +	if (unlikely(!lock_task_sighand(tsk, &flags)))
>  		return;

i think this will lock up: the signal lock must not nest inside the rq 
lock, and these accounting functions are called from within the 
scheduler.

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
@ 2008-11-07 16:52 Oleg Nesterov
  2008-11-07 16:21 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2008-11-07 16:52 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: adobriyan, Doug Chapman, Peter Zijlstra, Roland McGrath, linux-kernel

Compile tested.

Unlike other similar routines, account_group_exec_runtime() could be
called "implicitly" after exit_notify(). This means we can race with
the parent doing release_task(), we can't just check ->signal != NULL.

Take ->siglock to make sure ->signal can't go away.

This is the minimal fix, with this patch we don't need need get/put cpu,
and I think we should uninline this function.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

--- K-28/kernel/sched_stats.h~A_G_E_R_FIX	2008-11-07 17:32:02.000000000 +0100
+++ K-28/kernel/sched_stats.h	2008-11-07 17:44:39.000000000 +0100
@@ -351,10 +351,12 @@ static inline void account_group_exec_ru
 					      unsigned long long ns)
 {
 	struct signal_struct *sig;
+	unsigned long flags;
 
-	sig = tsk->signal;
-	if (unlikely(!sig))
+	if (unlikely(!lock_task_sighand(tsk, &flags)))
 		return;
+
+	sig = tsk->signal;
 	if (sig->cputime.totals) {
 		struct task_cputime *times;
 
@@ -362,4 +364,6 @@ static inline void account_group_exec_ru
 		times->sum_exec_runtime += ns;
 		put_cpu_no_resched();
 	}
+
+	unlock_task_sighand(tsk, &flags);
 }


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
  2008-11-07 16:21 ` Ingo Molnar
@ 2008-11-07 16:58   ` Doug Chapman
  2008-11-07 18:42     ` Oleg Nesterov
  2008-11-07 17:40   ` Oleg Nesterov
  1 sibling, 1 reply; 8+ messages in thread
From: Doug Chapman @ 2008-11-07 16:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Andrew Morton, adobriyan, Peter Zijlstra,
	Roland McGrath, linux-kernel


On Fri, 2008-11-07 at 17:21 +0100, Ingo Molnar wrote:
> * Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > Compile tested.
> > 
> > Unlike other similar routines, account_group_exec_runtime() could be
> > called "implicitly" after exit_notify(). This means we can race with
> > the parent doing release_task(), we can't just check ->signal != NULL.
> > 
> > Take ->siglock to make sure ->signal can't go away.
> > 
> > This is the minimal fix, with this patch we don't need need get/put cpu,
> > and I think we should uninline this function.
> > 
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> 
> > 
> > --- K-28/kernel/sched_stats.h~A_G_E_R_FIX	2008-11-07 17:32:02.000000000 +0100
> > +++ K-28/kernel/sched_stats.h	2008-11-07 17:44:39.000000000 +0100
> > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
> >  					      unsigned long long ns)
> >  {
> >  	struct signal_struct *sig;
> > +	unsigned long flags;
> >  
> > -	sig = tsk->signal;
> > -	if (unlikely(!sig))
> > +	if (unlikely(!lock_task_sighand(tsk, &flags)))
> >  		return;
> 
> i think this will lock up: the signal lock must not nest inside the rq 
> lock, and these accounting functions are called from within the 
> scheduler.
> 
> 	Ingo

I can confirm that this does hang on bootup.

- Doug



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
  2008-11-07 16:21 ` Ingo Molnar
  2008-11-07 16:58   ` Doug Chapman
@ 2008-11-07 17:40   ` Oleg Nesterov
  2008-11-08  9:28     ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2008-11-07 17:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, adobriyan, Doug Chapman, Peter Zijlstra,
	Roland McGrath, linux-kernel

On 11/07, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > --- K-28/kernel/sched_stats.h~A_G_E_R_FIX	2008-11-07 17:32:02.000000000 +0100
> > +++ K-28/kernel/sched_stats.h	2008-11-07 17:44:39.000000000 +0100
> > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
> >  					      unsigned long long ns)
> >  {
> >  	struct signal_struct *sig;
> > +	unsigned long flags;
> >
> > -	sig = tsk->signal;
> > -	if (unlikely(!sig))
> > +	if (unlikely(!lock_task_sighand(tsk, &flags)))
> >  		return;
>
> i think this will lock up:

Ah. I worried about this, but convinced myself this is OK...

> the signal lock must not nest inside the rq
> lock, and these accounting functions are called from within the
> scheduler.

Why? we seem to never do task_rq_lock() under ->siglock ?

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
  2008-11-07 16:58   ` Doug Chapman
@ 2008-11-07 18:42     ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2008-11-07 18:42 UTC (permalink / raw)
  To: Doug Chapman
  Cc: Ingo Molnar, Andrew Morton, adobriyan, Peter Zijlstra,
	Roland McGrath, linux-kernel

On 11/07, Doug Chapman wrote:
> 
> On Fri, 2008-11-07 at 17:21 +0100, Ingo Molnar wrote:
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
> > >  					      unsigned long long ns)
> > >  {
> > >  	struct signal_struct *sig;
> > > +	unsigned long flags;
> > >
> > > -	sig = tsk->signal;
> > > -	if (unlikely(!sig))
> > > +	if (unlikely(!lock_task_sighand(tsk, &flags)))
> > >  		return;
> >
> > i think this will lock up: the signal lock must not nest inside the rq 
> > lock, and these accounting functions are called from within the 
> > scheduler.
>
> I can confirm that this does hang on bootup.

Thanks a lot Doug.

If only I could understand what happens. I am running the 2.6.27 kernel
with the patch below just fine.

Ingo, could you please explain?

OK, perhaps we can check ->exit_state... I'll return on Monday.


--- linux-2.6.27/kernel/sched_fair.c~DBG	2008-10-10 00:13:53.000000000 +0200
+++ linux-2.6.27/kernel/sched_fair.c	2008-11-07 19:15:28.000000000 +0100
@@ -484,6 +484,16 @@ __update_curr(struct cfs_rq *cfs_rq, str
 	curr->vruntime += delta_exec_weighted;
 }
 
+static void ttt(struct task_struct *tsk)
+{
+	unsigned long flags;
+
+	if (unlikely(!lock_task_sighand(tsk, &flags)))
+		return;
+
+	unlock_task_sighand(tsk, &flags);
+}
+
 static void update_curr(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
@@ -507,6 +517,7 @@ static void update_curr(struct cfs_rq *c
 		struct task_struct *curtask = task_of(curr);
 
 		cpuacct_charge(curtask, delta_exec);
+		ttt(curtask);
 	}
 }
 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
  2008-11-07 17:40   ` Oleg Nesterov
@ 2008-11-08  9:28     ` Ingo Molnar
  2008-11-10 13:04       ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-11-08  9:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, adobriyan, Doug Chapman, Peter Zijlstra,
	Roland McGrath, linux-kernel


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 11/07, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > --- K-28/kernel/sched_stats.h~A_G_E_R_FIX	2008-11-07 17:32:02.000000000 +0100
> > > +++ K-28/kernel/sched_stats.h	2008-11-07 17:44:39.000000000 +0100
> > > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
> > >  					      unsigned long long ns)
> > >  {
> > >  	struct signal_struct *sig;
> > > +	unsigned long flags;
> > >
> > > -	sig = tsk->signal;
> > > -	if (unlikely(!sig))
> > > +	if (unlikely(!lock_task_sighand(tsk, &flags)))
> > >  		return;
> >
> > i think this will lock up:
> 
> Ah. I worried about this, but convinced myself this is OK...
> 
> > the signal lock must not nest inside the rq
> > lock, and these accounting functions are called from within the
> > scheduler.
> 
> Why? we seem to never do task_rq_lock() under ->siglock ?

signal_wake_up() ?

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
  2008-11-10 13:04       ` Oleg Nesterov
@ 2008-11-10 12:13         ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2008-11-10 12:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Andrew Morton, adobriyan, Doug Chapman,
	Roland McGrath, linux-kernel

On Mon, 2008-11-10 at 14:04 +0100, Oleg Nesterov wrote:
> On 11/08, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > On 11/07, Ingo Molnar wrote:
> > > >
> > > > the signal lock must not nest inside the rq
> > > > lock, and these accounting functions are called from within the
> > > > scheduler.
> > >
> > > Why? we seem to never do task_rq_lock() under ->siglock ?
> >
> > signal_wake_up() ?
> 
> I'd wish very much I could say I have already realized this, but I didn't.
> Thanks Ingo!
> 
> I don't see the good solution for this problem. I'll send the new patch in
> a minute, but it is ugly. Basically it is
> 
> 	--- a/kernel/exit.c
> 	+++ b/kernel/exit.c
> 	@@ -141,6 +141,8 @@ static void __exit_signal(struct task_st
> 		if (sig) {
> 			flush_sigqueue(&sig->shared_pending);
> 			taskstats_tgid_free(sig);
> 	+		smp_mb();
> 	+		spin_unlock_wait(&task_rq(tsk)->lock);
> 			__cleanup_signal(sig);
> 		}
> 	 }
> 
> except this needs a helper in sched.c. You can nack it right now ;)
> Of course we can protect ->signal with rcu, but this is even worse
> imho.
> 
> Anybody sees a bettter fix?
> 
> 
> Perhaps we can change sched.c to do update_curr() only when the
> task is not running (except ->task_tick), iow perhaps we can check
> sleep/wakeup == T before calling update_cur(). But this is not easy
> even if really possible.

and butt ugly to boot..

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal
  2008-11-08  9:28     ` Ingo Molnar
@ 2008-11-10 13:04       ` Oleg Nesterov
  2008-11-10 12:13         ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2008-11-10 13:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, adobriyan, Doug Chapman, Peter Zijlstra,
	Roland McGrath, linux-kernel

On 11/08, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 11/07, Ingo Molnar wrote:
> > >
> > > the signal lock must not nest inside the rq
> > > lock, and these accounting functions are called from within the
> > > scheduler.
> >
> > Why? we seem to never do task_rq_lock() under ->siglock ?
>
> signal_wake_up() ?

I'd wish very much I could say I have already realized this, but I didn't.
Thanks Ingo!

I don't see the good solution for this problem. I'll send the new patch in
a minute, but it is ugly. Basically it is

	--- a/kernel/exit.c
	+++ b/kernel/exit.c
	@@ -141,6 +141,8 @@ static void __exit_signal(struct task_st
		if (sig) {
			flush_sigqueue(&sig->shared_pending);
			taskstats_tgid_free(sig);
	+		smp_mb();
	+		spin_unlock_wait(&task_rq(tsk)->lock);
			__cleanup_signal(sig);
		}
	 }

except this needs a helper in sched.c. You can nack it right now ;)
Of course we can protect ->signal with rcu, but this is even worse
imho.

Anybody sees a bettter fix?


Perhaps we can change sched.c to do update_curr() only when the
task is not running (except ->task_tick), iow perhaps we can check
sleep/wakeup == T before calling update_cur(). But this is not easy
even if really possible.

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-11-10 12:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-07 16:52 [PATCH] account_group_exec_runtime: fix the racy usage of ->signal Oleg Nesterov
2008-11-07 16:21 ` Ingo Molnar
2008-11-07 16:58   ` Doug Chapman
2008-11-07 18:42     ` Oleg Nesterov
2008-11-07 17:40   ` Oleg Nesterov
2008-11-08  9:28     ` Ingo Molnar
2008-11-10 13:04       ` Oleg Nesterov
2008-11-10 12:13         ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).