LKML Archive on
help / color / mirror / Atom feed
From: Oleg Nesterov <>
To: Roland McGrath <>
Cc: Jiri Kosina <>,
	Davide Libenzi <>,
	Andrew Morton <>,
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases
Date: Fri, 29 Feb 2008 15:01:01 +0300	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <20080229023902.D93E12700FD@magilla.localdomain>

(Sorry Roland, re-sending with cc's)

On 02/28, Roland McGrath wrote:
> > BTW, I think we have the same problem when handle_stop_signal() does
> > do_notify_parent_cldstop(p, CLD_STOPPED) above. The multithreaded app
> > in the middle of the group stop can resume without actually seeing
> > SIGCONT, no?
> I don't think I follow the scenario you have in mind there.  When siglock
> is dropped there, noone has been woken up yet.  It's just as if the sending
> of SIGCONT had not begun yet.


> > Note also sig_needs_tasklist(), we take tasklist_lock() exactly because
> > we will probably drop siglock.
> do_notify_parent_cldstop will always need tasklist_lock because of its
> parent link.  With my patch below, the signal-posting path should never
> drop and reacquire the siglock any more.  So we could just make
> do_notify_parent_cldstop take the lock (read_lock nests anyway) and the
> signal-posting path callers don't need to take it.

We have to take tasklist in advance to make sure that the target task
can't go away once we drop its ->siglock.

Otherwise, finish_signal() can do:

	if (p->signal)
		do_notify_parent_cldstop(p, notify);

but the parent can miss the notification. Is it OK?

> > What do you think about the approach at least?
> My first reaction was that it would have the wrong semantics.  The special
> effect of SIGCONT to resume the process happens at the time of signal
> generation, not at signal delivery.  The CLD_CONTINUED notification to the
> parent has to happen when the resumption happens.  If SIGCONT is blocked or
> ignored, then the process is woken up but may never dequeue the signal.
> However, in fact it will always already been in get_signal_to_deliver by
> definition, since that's where stopping happens.  So it will indeed always
> come out and see your check before it resumes.  The only difference then is
> whether the SIGCHLD gets sent immediately at post time or only when the
> resumed child actually gets scheduled.  I don't think that's a problem.

Another difference is that the notification may come from another thread,
not from the signalled one. (this only matters if the task is ptraced).
Hopefully not a problem too?

> Certainly it should just be a flag or two in signal_struct.flags.  The
> extra field is really too ugly.  We need to think carefully about all the
> places that clear/reset ->signal->flags, though.

Yes, exactly.

> Note that doing two
> do_notify_parent_cldstop calls in a two is redundant, since the signals
> don't queue and the parent is not often going to respond so quickly that
> the second one ever actually posts.  I'd be inclined to just make that an
> else if.

Yes, I also thought about that.

> > Hmm... Can't we make a simpler patch for the start? See below.
> > We can notify the parent earlier, before waking the TASK_STOPPED
> > child. This should fix this race, no?
> That is exactly what my first impulse was and what I described as opening
> another can of worms.  (I almost sent the identical patch yesterday.)  Here
> is what I was concerned about.  This does the CLD_CONTINUED SIGCHLD before
> any wakeups, so task_struct.state is still TASK_STOPPED.  The parent can
> see the SIGCHLD, call wait, and see the child still in TASK_STOPPED.  In
> wait_task_stopped, a few things can happen.  It might by then have gotten
> to the signal-sender's retaking of the siglock and wake_up_state; if so, it
> sees p->state no longer TASK_STOPPED and/or p->exit_code no longer nonzero,
> and returns 0.  Then the parent's wait goes back to sleep (or reports
> nothing for WNOHANG), and it never gets a wakeup corresponding to the
> CLD_CONTINUED notification, which it expects in a call using WCONTINUED.
> This is why the notification comes after the wakeup.

Yes, you are right, thanks. But please note that do_wait() is very wrong
here, see (the patch
needs more work, of course).

Btw, I tried to read the comment many times, but still I can't understand
why handle_stop_signal() reports CLD_STOPPED.

Let's look at your patch:

	notify = ((p->signal->flags & SIGNAL_STOP_STOPPED) ?

Q: why can't we do

	if (p->signal->flags & SIGNAL_STOP_STOPPED)
		notify = CLD_CONTINUED;


IOW. Suppose that SIGCONT comes after SIGSTOP. Now, if SIGSTOP was not
dequeued yet, we report nothing. If it was dequeued by some thread which
initiates ->group_stop_count we report CLD_STOPPED. What is the difference?

> Here is
> another version of my patch, that also eliminates the lock-drop for the
> CLD_STOPPED notification when a group stop is still in progress.
> (Though I still haven't yet grokked how that case introduced any bug.)
> What do you think?

My only concern is that it complicates the code :(


> I don't dislike the direction of your flag-setting approach above.  But
> it does introduce more new subtleties that warrant more thought.

Yes sure.

>  /*
> + * This is called by a few places outside this file.
> + */
> +int
> +__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> +{
> +	int notify;
> +	int ret = generate_group_signal(sig, info, p, &notify);
> +	if (unlikely(notify)) {
> +		spin_unlock(&p->sighand->siglock);
> +		do_notify_parent_cldstop(p, notify);
> +		spin_lock(&p->sighand->siglock);
> +	}
> +	return ret;
> +}

Perhaps it is better to export generate_group_signal() instead (not right now
of course). There is only one caller which does __group_send_sig_info(SIGCONT),
do_tty_hangup(). Any function which does unlock+lock is dangerous, the user
can miss the fact it doesn't hold the lock throughout.


  reply	other threads:[~2008-02-29 11:57 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-27 15:22 Jiri Kosina
2008-02-27 21:00 ` Roland McGrath
2008-02-27 22:04   ` Jiri Kosina
2008-02-28 10:26     ` Roland McGrath
2008-02-28 11:42       ` Jiri Kosina
2008-02-28 15:32         ` Oleg Nesterov
2008-02-28 15:32           ` Jiri Kosina
2008-02-28 15:30       ` Oleg Nesterov
2008-02-28 15:40         ` Jiri Kosina
2008-02-28 16:08           ` Oleg Nesterov
2008-02-28 16:13             ` Jiri Kosina
2008-02-28 16:45               ` [PATCH] handle_stop_signal: don't wake up the stopped task until it sees SIGCONT Oleg Nesterov
2008-02-28 16:56                 ` Jiri Kosina
2008-02-29  2:39         ` [PATCH] [RFC] fix missed SIGCONT cases Roland McGrath
2008-02-29 12:01           ` Oleg Nesterov [this message]
2008-02-29 13:20             ` Oleg Nesterov
2008-03-01  1:59             ` Roland McGrath
2008-03-01 16:41               ` Oleg Nesterov
2008-03-03 13:25                 ` Oleg Nesterov
2008-03-06 10:50                   ` Roland McGrath
2008-03-06  9:05                 ` Roland McGrath

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \
    --subject='Re: [PATCH] [RFC] fix missed SIGCONT cases' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).