LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2
@ 2010-12-06 16:56 Tejun Heo
  2010-12-06 16:56 ` [PATCH 01/16] signal: fix SIGCONT notification code Tejun Heo
                   ` (17 more replies)
  0 siblings, 18 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello,

This is the second attempt at cleaning up ptrace and signal behaviors,
especially the interaction between ptrace and group stop.  There are
quite some number of problems in the area and the current behavior is
often racy, indeterministic and sometimes outright buggy.  This
patchset aims to clean up the muddy stuff, and define and implement a
clear interaction between ptrace and job control.

Most changes from the last take[L] are to address the problems pointed
out by Oleg.  Let's hope things are less swiss-cheesy this time.  I
considered adding a separate task state tracking which encompasses
ptrace related transitions but it quickly became too redundant and
uglier than the current approach of augmenting the default state
machine with flags for special cases.  So, at least for now, I think
it's better to continue with the current approach.

* Group stop helpers restructured: task_clear_group_stop[_trapping]()
  added and consume_group_stop() replaced with
  task_participate_group_stop().

* 0002-freezer-fix-a-race-during-freezing-of-TASK_STOPPED-t.patch from
  the last posting was taken by Rafael and dropped from this series.

* 0002-signal-fix-CLD_CONTINUED-notification-target.patch added.

* 0004-signal-don-t-notify-parent-if-not-stopping-after-tra.patch
  replaced with 0004-ptrace-kill-tracehook_notify_jctl.patch to
  streamline the conversion.  Whether something similar can or should
  be factored out after conversion is to be determined but I'm quite
  doubtful.

* 0005-ptrace-add-why-to-ptrace_stop.patch repositioned from 0007 to
  0005.

* 0007-signal-use-GROUP_STOP_PENDING-to-stop-once-for-a-sin.patch:
  Fixed the problems pointed out by Oleg.

* 0010-ptrace-don-t-consume-group-count-from-ptrace_stop.patch
  replaced by
  0008-ptrace-participate-in-group-stop-from-ptrace_stop-if.patch.
  This is to ensure that GROUP_STOP_PENDING consumption and transition
  into TASK_TRACED is atomic so that a task doesn't trap for group
  stop with GROUP_STOP_PENDING set.

* 0010-ptrace-clean-transitions-between-TASK_STOPPED-and-TR.patch.
  Fixed group stop bookkeeping bug on retry.  TASK_STOPPED ->
  TASK_TRACED transition is now hidden from userland with the help of
  GROUP_STOP_TRAPPING flag.

* Notification reliability patches restructured so that the
  determination logics now reside in do_notify_parent_cldstop() and
  are executed while holding both tasklist_lock and siglock.

* New patches 0011 and 0015 added.

This patchset contains the following sixteen patches.

 0001-signal-fix-SIGCONT-notification-code.patch
 0002-signal-fix-CLD_CONTINUED-notification-target.patch
 0003-signal-remove-superflous-try_to_freeze-loop-in-do_si.patch
 0004-ptrace-kill-tracehook_notify_jctl.patch
 0005-ptrace-add-why-to-ptrace_stop.patch
 0006-signal-fix-premature-completion-of-group-stop-when-i.patch
 0007-signal-use-GROUP_STOP_PENDING-to-stop-once-for-a-sin.patch
 0008-ptrace-participate-in-group-stop-from-ptrace_stop-if.patch
 0009-ptrace-make-do_signal_stop-use-ptrace_stop-if-the-ta.patch
 0010-ptrace-clean-transitions-between-TASK_STOPPED-and-TR.patch
 0011-signal-prepare-for-CLD_-notification-changes.patch
 0012-ptrace-make-group-stop-notification-reliable-against.patch
 0013-ptrace-reorganize-__ptrace_unlink-and-ptrace_untrace.patch
 0014-ptrace-make-SIGCONT-notification-reliable-against-pt.patch
 0015-ptrace-make-sure-SIGNAL_NOTIFY_CONT-is-checked-after.patch
 0016-ptrace-remove-the-extra-wake_up_process-from-ptrace_.patch

0001-0002 are fixes.  0003-0005 are preparation patches.

0006 prevents a ptraced task from being stopped multiple times for the
same group stop instance.

0007-0010 update the code such that a ptracee always enters and leaves
TASK_TRACED on its own.  ptracer no longer changes tracee's state
underneath it; instead, it tells the tracee to enter the target state.
A TASK_TRACED task is guaranteed to be stopped inside ptrace_stop()
after executing the arch hooks while TASK_STOPPED task is guaranteed
to be stopped in do_signal_stop().

0011-0015 make CLD_STOPPED/CONTINUED notification reliable with
intervening ptrace.  Whether a ptracee owes an notification to its
parent is tracked and the real parent is notified accordingly on
detach.

0016 kills the unnecessary wake_up_process() which is wrong in so many
ways including the possibility of abruptly waking up a task in an
uninterruptible sleep.  Now that ptrace / job control interaction is
cleaned up, this really should go.

After the patchset, the interaction between ptrace and job control is
defined as,

* Regardless of ptrace, job control signals control the process-wide
  stopped state.  A ptracee receives and handles job control signals
  the same as before but no matter what ptrace does the global state
  isn't affected.

* On ptrace detach, the task is put into the state which matches the
  process-wide stopped state.  If necessary, notification to the real
  parent is reinstated.

Due to the implicit sending of SIGSTOP on PTRACE_ATTACH, ptrace
attach/detach are still not transparent w.r.t. job control but these
changes lay the base for (almost) transparent ptracing.

The branch is on top of 2.6.37-rc4 and available in the following git
branch,

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git ptrace

and contains the following changes.

 fs/exec.c                 |    1 
 include/linux/sched.h     |   14 +
 include/linux/tracehook.h |   27 ---
 kernel/ptrace.c           |  139 ++++++++++++++----
 kernel/signal.c           |  345 +++++++++++++++++++++++++++++++++++-----------
 5 files changed, 391 insertions(+), 135 deletions(-)

Thanks.

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/1068420

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/16] signal: fix SIGCONT notification code
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-06 16:56 ` [PATCH 02/16] signal: fix CLD_CONTINUED notification target Tejun Heo
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

After a task receives SIGCONT, its parent is notified via SIGCHLD with
its siginfo describing what the notified event is.  If SIGCONT is
received while the child process is stopped, the code should be
CLD_CONTINUED.  If SIGCONT is recieved while the child process is in
the process of being stopped, it should be CLD_STOPPED.  Which code to
use is determined in prepare_signal() and recorded in signal->flags
using SIGNAL_CLD_CONTINUED|STOP flags.

get_signal_deliver() should test these flags and then notify
accoringly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
CLD_CONTINUED if the signal is delivered before the task is wait(2)ed
and CLD_STOPPED if the state was fetched already.

Fix it by testing SIGNAL_CLD_CONTINUED.  While at it, uncompress the
?: test into if/else clause for better readability.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4e3cff1..fe004b5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1853,8 +1853,13 @@ relock:
 	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
 	 */
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
-		int why = (signal->flags & SIGNAL_STOP_CONTINUED)
-				? CLD_CONTINUED : CLD_STOPPED;
+		int why;
+
+		if (signal->flags & SIGNAL_CLD_CONTINUED)
+			why = CLD_CONTINUED;
+		else
+			why = CLD_STOPPED;
+
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
 		why = tracehook_notify_jctl(why, CLD_CONTINUED);
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 02/16] signal: fix CLD_CONTINUED notification target
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
  2010-12-06 16:56 ` [PATCH 01/16] signal: fix SIGCONT notification code Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-20 14:58   ` Oleg Nesterov
  2010-12-06 16:56 ` [PATCH 03/16] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

CLD_CONTINUED notification code calls do_notify_parent_cldstop() with
its group_leader; however, do_notify_parent_cldstop() already uses the
group_leader for non-ptraced notifications.  The duplicate
->group_leader dereferencing is unnecessary and leads to incorrectly
notifying the group_leader's ptracer of CLD_CONTINUED from a different
task in the group.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index fe004b5..0411131 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1867,7 +1867,7 @@ relock:
 
 		if (why) {
 			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current->group_leader, why);
+			do_notify_parent_cldstop(current, why);
 			read_unlock(&tasklist_lock);
 		}
 		goto relock;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 03/16] signal: remove superflous try_to_freeze() loop in do_signal_stop()
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
  2010-12-06 16:56 ` [PATCH 01/16] signal: fix SIGCONT notification code Tejun Heo
  2010-12-06 16:56 ` [PATCH 02/16] signal: fix CLD_CONTINUED notification target Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-20 14:59   ` Oleg Nesterov
  2010-12-06 16:56 ` [PATCH 04/16] ptrace: kill tracehook_notify_jctl() Tejun Heo
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

do_signal_stop() is used only by get_signal_to_deliver() and after a
successful signal stop, it always calls try_to_freeze(), so the
try_to_freeze() loop around schedule() in do_signal_stop() is
superflous and confusing.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0411131..7211c9f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1781,9 +1781,7 @@ static int do_signal_stop(int signr)
 	}
 
 	/* Now we don't run again until woken by SIGCONT or SIGKILL */
-	do {
-		schedule();
-	} while (try_to_freeze());
+	schedule();
 
 	tracehook_finish_jctl();
 	current->exit_code = 0;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 04/16] ptrace: kill tracehook_notify_jctl()
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (2 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 03/16] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-20 14:59   ` Oleg Nesterov
  2010-12-06 16:56 ` [PATCH 05/16] ptrace: add @why to ptrace_stop() Tejun Heo
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

tracehook_notify_jctl() aids in determining whether and what to report
to the parent when a task is stopped or continued.  The function also
adds an extra requirement that siglock may be released across it,
which is currently unused and quite difficult to satisfy in
well-defined manner.

As job control and the notifications are about to receive major
overhaul, remove the tracehook and open code it.  If ever necessary,
let's factor it out after the overhaul.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 include/linux/tracehook.h |   27 ---------------------------
 kernel/signal.c           |   37 ++++++++++++++++---------------------
 2 files changed, 16 insertions(+), 48 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3a2e66d..b073f3c 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -469,33 +469,6 @@ static inline int tracehook_get_signal(struct task_struct *task,
 }
 
 /**
- * tracehook_notify_jctl - report about job control stop/continue
- * @notify:		zero, %CLD_STOPPED or %CLD_CONTINUED
- * @why:		%CLD_STOPPED or %CLD_CONTINUED
- *
- * This is called when we might call do_notify_parent_cldstop().
- *
- * @notify is zero if we would not ordinarily send a %SIGCHLD,
- * or is the %CLD_STOPPED or %CLD_CONTINUED .si_code for %SIGCHLD.
- *
- * @why is %CLD_STOPPED when about to stop for job control;
- * we are already in %TASK_STOPPED state, about to call schedule().
- * It might also be that we have just exited (check %PF_EXITING),
- * but need to report that a group-wide stop is complete.
- *
- * @why is %CLD_CONTINUED when waking up after job control stop and
- * ready to make a delayed @notify report.
- *
- * Return the %CLD_* value for %SIGCHLD, or zero to generate no signal.
- *
- * Called with the siglock held.
- */
-static inline int tracehook_notify_jctl(int notify, int why)
-{
-	return notify ?: (current->ptrace & PT_PTRACED) ? why : 0;
-}
-
-/**
  * tracehook_finish_jctl - report about return from job control stop
  *
  * This is called by do_signal_stop() after wakeup.
diff --git a/kernel/signal.c b/kernel/signal.c
index 7211c9f..5c334ec 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1727,7 +1727,7 @@ void ptrace_notify(int exit_code)
 static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
-	int notify;
+	int notify = 0;
 
 	if (!sig->group_stop_count) {
 		struct task_struct *t;
@@ -1759,19 +1759,16 @@ static int do_signal_stop(int signr)
 	 * a group stop in progress and we are the last to stop, report
 	 * to the parent.  When ptraced, every thread reports itself.
 	 */
-	notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0;
-	notify = tracehook_notify_jctl(notify, CLD_STOPPED);
-	/*
-	 * tracehook_notify_jctl() can drop and reacquire siglock, so
-	 * we keep ->group_stop_count != 0 before the call. If SIGCONT
-	 * or SIGKILL comes in between ->group_stop_count == 0.
-	 */
-	if (sig->group_stop_count) {
-		if (!--sig->group_stop_count)
-			sig->flags = SIGNAL_STOP_STOPPED;
-		current->exit_code = sig->group_exit_code;
-		__set_current_state(TASK_STOPPED);
+	if (!--sig->group_stop_count) {
+		sig->flags = SIGNAL_STOP_STOPPED;
+		notify = CLD_STOPPED;
 	}
+	if (task_ptrace(current))
+		notify = CLD_STOPPED;
+
+	current->exit_code = sig->group_exit_code;
+	__set_current_state(TASK_STOPPED);
+
 	spin_unlock_irq(&current->sighand->siglock);
 
 	if (notify) {
@@ -1853,21 +1850,19 @@ relock:
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
 		int why;
 
-		if (signal->flags & SIGNAL_CLD_CONTINUED)
+		if (task_ptrace(current) ||
+		    (signal->flags & SIGNAL_CLD_CONTINUED))
 			why = CLD_CONTINUED;
 		else
 			why = CLD_STOPPED;
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		why = tracehook_notify_jctl(why, CLD_CONTINUED);
 		spin_unlock_irq(&sighand->siglock);
 
-		if (why) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, why);
-			read_unlock(&tasklist_lock);
-		}
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current, why);
+		read_unlock(&tasklist_lock);
 		goto relock;
 	}
 
@@ -2034,7 +2029,7 @@ void exit_signals(struct task_struct *tsk)
 	if (unlikely(tsk->signal->group_stop_count) &&
 			!--tsk->signal->group_stop_count) {
 		tsk->signal->flags = SIGNAL_STOP_STOPPED;
-		group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
+		group_stop = CLD_STOPPED;
 	}
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 05/16] ptrace: add @why to ptrace_stop()
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (3 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 04/16] ptrace: kill tracehook_notify_jctl() Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-06 16:56 ` [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace Tejun Heo
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

To prepare for cleanup of the interaction between group stop and
ptrace, add @why to ptrace_stop().  Existing users are updateda such
that there is no behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/signal.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 5c334ec..2964642 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1617,7 +1617,7 @@ static int sigkill_pending(struct task_struct *tsk)
  * If we actually decide not to stop at all because the tracer
  * is gone, we keep current->exit_code unless clear_code.
  */
-static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
+static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
@@ -1655,7 +1655,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {
-		do_notify_parent_cldstop(current, CLD_TRAPPED);
+		do_notify_parent_cldstop(current, why);
 		/*
 		 * Don't want to allow preemption here, because
 		 * sys_ptrace() needs this task to be inactive.
@@ -1714,7 +1714,7 @@ void ptrace_notify(int exit_code)
 
 	/* Let the debugger run.  */
 	spin_lock_irq(&current->sighand->siglock);
-	ptrace_stop(exit_code, 1, &info);
+	ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
@@ -1795,7 +1795,7 @@ static int ptrace_signal(int signr, siginfo_t *info,
 	ptrace_signal_deliver(regs, cookie);
 
 	/* Let the debugger run.  */
-	ptrace_stop(signr, 0, info);
+	ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	signr = current->exit_code;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (4 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 05/16] ptrace: add @why to ptrace_stop() Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-20 15:00   ` Oleg Nesterov
  2010-12-06 16:56 ` [PATCH 07/16] signal: use GROUP_STOP_PENDING to stop once for a single group stop Tejun Heo
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

task->signal->group_stop_count is used to tracke the progress of group
stop.  It's initialized to the number of tasks which need to stop for
group stop to finish and each stopping or trapping task decrements.
However, each task doesn't keep track of whether it decremented the
counter or not and if woken up before the group stop is complete and
stops again, it can decrement the counter multiple times.

Please consider the following example code.

 static void *worker(void *arg)
 {
	 while (1) ;
	 return NULL;
 }

 int main(void)
 {
	 pthread_t thread;
	 pid_t pid;
	 int i;

	 pid = fork();
	 if (!pid) {
		 for (i = 0; i < 5; i++)
			 pthread_create(&thread, NULL, worker, NULL);
		 while (1) ;
		 return 0;
	 }

	 ptrace(PTRACE_ATTACH, pid, NULL, NULL);
	 while (1) {
		 waitid(P_PID, pid, NULL, WSTOPPED);
		 ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
	 }
	 return 0;
 }

The child creates five threads and the parent continuously traps the
first thread and whenever the child gets a signal, SIGSTOP is
delivered.  If an external process sends SIGSTOP to the child, all
other threads in the process should reliably stop.  However, due to
the above bug, the first thread will often end up consuming
group_stop_count multiple times and SIGSTOP often ends up stopping
none or part of the other four threads.

This patch adds a new field task->group_stop which is protected by
siglock and uses GROUP_STOP_CONSUME flag to track which task is still
to consume group_stop_count to fix this bug.

task_clear_group_stop() and task_participate_group_stop() are added to
help manipulating group stop states.  As ptrace_stop() now also uses
task_participate_group_stop(), it will set SIGNAL_STOP_STOPPED if it
completes a group stop.

There still are many issues regarding the interaction between group
stop and ptrace.  Patches to address them will follow.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 include/linux/sched.h |    6 ++++
 kernel/signal.c       |   64 ++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c79e92..93157a4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1245,6 +1245,7 @@ struct task_struct {
 	int exit_state;
 	int exit_code, exit_signal;
 	int pdeath_signal;  /*  The signal sent when the parent dies  */
+	unsigned int group_stop;	/* GROUP_STOP_*, siglock protected */
 	/* ??? */
 	unsigned int personality;
 	unsigned did_exec:1;
@@ -1756,6 +1757,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
+/*
+ * task->group_stop flags
+ */
+#define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
+
 #ifdef CONFIG_PREEMPT_RCU
 
 #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
diff --git a/kernel/signal.c b/kernel/signal.c
index 2964642..bac959e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -223,6 +223,54 @@ static inline void print_dropped_signal(int sig)
 				current->comm, current->pid, sig);
 }
 
+/**
+ * task_clear_group_stop - clear pending group stop
+ * @task: target task
+ *
+ * Clear group stop states for @task.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void task_clear_group_stop(struct task_struct *task)
+{
+	task->group_stop &= ~GROUP_STOP_CONSUME;
+}
+
+/**
+ * task_participate_group_stop - participate in a group stop
+ * @task: task participating in a group stop
+ *
+ * @task is participating in a group stop.  Group stop states are cleared
+ * and the group stop count is consumed if %GROUP_STOP_CONSUME was set.  If
+ * the consumption completes the group stop, the appropriate %SIGNAL_*
+ * flags are set.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static bool task_participate_group_stop(struct task_struct *task)
+{
+	struct signal_struct *sig = task->signal;
+	bool consume = task->group_stop & GROUP_STOP_CONSUME;
+
+	task_clear_group_stop(task);
+
+	if (!consume)
+		return false;
+
+	task->group_stop &= ~GROUP_STOP_CONSUME;
+
+	if (!WARN_ON_ONCE(sig->group_stop_count == 0))
+		sig->group_stop_count--;
+
+	if (!sig->group_stop_count) {
+		sig->flags = SIGNAL_STOP_STOPPED;
+		return true;
+	}
+	return false;
+}
+
 /*
  * allocate a new signal queue record
  * - this may be called without locks if and only if t == current, otherwise an
@@ -1645,7 +1693,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	 * we must participate in the bookkeeping.
 	 */
 	if (current->signal->group_stop_count > 0)
-		--current->signal->group_stop_count;
+		task_participate_group_stop(current);
 
 	current->last_siginfo = info;
 	current->exit_code = exit_code;
@@ -1730,6 +1778,7 @@ static int do_signal_stop(int signr)
 	int notify = 0;
 
 	if (!sig->group_stop_count) {
+		unsigned int gstop = GROUP_STOP_CONSUME;
 		struct task_struct *t;
 
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1741,6 +1790,7 @@ static int do_signal_stop(int signr)
 		 */
 		sig->group_exit_code = signr;
 
+		current->group_stop = gstop;
 		sig->group_stop_count = 1;
 		for (t = next_thread(current); t != current; t = next_thread(t))
 			/*
@@ -1750,19 +1800,19 @@ static int do_signal_stop(int signr)
 			 */
 			if (!(t->flags & PF_EXITING) &&
 			    !task_is_stopped_or_traced(t)) {
+				t->group_stop = gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
-			}
+			} else
+				task_clear_group_stop(t);
 	}
 	/*
 	 * If there are no other threads in the group, or if there is
 	 * a group stop in progress and we are the last to stop, report
 	 * to the parent.  When ptraced, every thread reports itself.
 	 */
-	if (!--sig->group_stop_count) {
-		sig->flags = SIGNAL_STOP_STOPPED;
+	if (task_participate_group_stop(current))
 		notify = CLD_STOPPED;
-	}
 	if (task_ptrace(current))
 		notify = CLD_STOPPED;
 
@@ -2027,10 +2077,8 @@ void exit_signals(struct task_struct *tsk)
 			recalc_sigpending_and_wake(t);
 
 	if (unlikely(tsk->signal->group_stop_count) &&
-			!--tsk->signal->group_stop_count) {
-		tsk->signal->flags = SIGNAL_STOP_STOPPED;
+	    task_participate_group_stop(tsk))
 		group_stop = CLD_STOPPED;
-	}
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
 
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 07/16] signal: use GROUP_STOP_PENDING to stop once for a single group stop
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (5 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-06 16:56 ` [PATCH 08/16] ptrace: participate in group stop from ptrace_stop() iff the task is trapping for " Tejun Heo
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

Currently task->signal->group_stop_count is used to decide whether to
stop for group stop.  However, if there is a task in the group which
is taking a long time to stop, other tasks which are continued by
ptrace would repeatedly stop for the same group stop until the group
stop is complete.

Conversely, if a ptraced task is in TASK_TRACED state, the debugger
won't get notified of group stops which is inconsistent compared to
the ptraced task in any other state.

This patch introduces GROUP_STOP_PENDING which tracks whether a task
is yet to stop for the group stop in progress.  The flag is set when a
group stop starts and cleared when the task stops the first time for
the group stop, and consulted whenever whether the task should
participate in a group stop needs to be determined.  Note that now
tasks in TASK_TRACED also participate in group stop.

This results in the following behavior changes.

* For a single group stop, a ptracer would see at most one stop
  reported.

* A ptracee in TASK_TRACED now also participates in group stop and the
  tracer would get the notification.  However, as a ptraced task could
  be in TASK_STOPPED state or any ptrace trap could consume group
  stop, the notification may still be missing.  These will be
  addressed with further patches.

* A ptracee may start a group stop while one is still in progress if
  the tracer let it continue with stop signal delivery.  Group stop
  code handles this correctly.

Oleg:

* Spotted that a task might skip signal check even when its
  GROUP_STOP_PENDING is set.  Fixed by updating
  recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
  group_stop_count.

* Pointed out that task->group_stop should be cleared whenever
  task->signal->group_stop_count is cleared.  Fixed accordingly.

* Pointed out the behavior inconsistency between TASK_TRACED and
  RUNNING and the last behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 fs/exec.c             |    1 +
 include/linux/sched.h |    3 +++
 kernel/signal.c       |   36 +++++++++++++++++++++---------------
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index d68c378..8d4142c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1648,6 +1648,7 @@ static int zap_process(struct task_struct *start, int exit_code)
 
 	t = start;
 	do {
+		task_clear_group_stop(t);
 		if (t != current && t->mm) {
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93157a4..c2538dd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1760,8 +1760,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * task->group_stop flags
  */
+#define GROUP_STOP_PENDING	(1 << 16) /* task should stop for group stop */
 #define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
 
+extern void task_clear_group_stop(struct task_struct *task);
+
 #ifdef CONFIG_PREEMPT_RCU
 
 #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
diff --git a/kernel/signal.c b/kernel/signal.c
index bac959e..bff46ed 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 static int recalc_sigpending_tsk(struct task_struct *t)
 {
-	if (t->signal->group_stop_count > 0 ||
+	if ((t->group_stop & GROUP_STOP_PENDING) ||
 	    PENDING(&t->pending, &t->blocked) ||
 	    PENDING(&t->signal->shared_pending, &t->blocked)) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -232,19 +232,19 @@ static inline void print_dropped_signal(int sig)
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-static void task_clear_group_stop(struct task_struct *task)
+void task_clear_group_stop(struct task_struct *task)
 {
-	task->group_stop &= ~GROUP_STOP_CONSUME;
+	task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME);
 }
 
 /**
  * task_participate_group_stop - participate in a group stop
  * @task: task participating in a group stop
  *
- * @task is participating in a group stop.  Group stop states are cleared
- * and the group stop count is consumed if %GROUP_STOP_CONSUME was set.  If
- * the consumption completes the group stop, the appropriate %SIGNAL_*
- * flags are set.
+ * @task has GROUP_STOP_PENDING set and is participating in a group stop.
+ * Group stop states are cleared and the group stop count is consumed if
+ * %GROUP_STOP_CONSUME was set.  If the consumption completes the group
+ * stop, the appropriate %SIGNAL_* flags are set.
  *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
@@ -254,6 +254,8 @@ static bool task_participate_group_stop(struct task_struct *task)
 	struct signal_struct *sig = task->signal;
 	bool consume = task->group_stop & GROUP_STOP_CONSUME;
 
+	WARN_ON_ONCE(!(task->group_stop & GROUP_STOP_PENDING));
+
 	task_clear_group_stop(task);
 
 	if (!consume)
@@ -767,6 +769,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		t = p;
 		do {
 			unsigned int state;
+
+			task_clear_group_stop(t);
+
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
 			/*
 			 * If there is a handler for SIGCONT, we must make
@@ -908,6 +913,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 			signal->group_stop_count = 0;
 			t = p;
 			do {
+				task_clear_group_stop(t);
 				sigaddset(&t->pending.signal, SIGKILL);
 				signal_wake_up(t, 1);
 			} while_each_thread(p, t);
@@ -1141,6 +1147,7 @@ int zap_other_threads(struct task_struct *p)
 	p->signal->group_stop_count = 0;
 
 	while_each_thread(p, t) {
+		task_clear_group_stop(t);
 		count++;
 
 		/* Don't bother with already dead threads */
@@ -1692,7 +1699,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	 * If there is a group stop in progress,
 	 * we must participate in the bookkeeping.
 	 */
-	if (current->signal->group_stop_count > 0)
+	if (current->group_stop & GROUP_STOP_PENDING)
 		task_participate_group_stop(current);
 
 	current->last_siginfo = info;
@@ -1777,8 +1784,8 @@ static int do_signal_stop(int signr)
 	struct signal_struct *sig = current->signal;
 	int notify = 0;
 
-	if (!sig->group_stop_count) {
-		unsigned int gstop = GROUP_STOP_CONSUME;
+	if (!(current->group_stop & GROUP_STOP_PENDING)) {
+		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
 		struct task_struct *t;
 
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1798,8 +1805,7 @@ static int do_signal_stop(int signr)
 			 * stop is always done with the siglock held,
 			 * so this check has no races.
 			 */
-			if (!(t->flags & PF_EXITING) &&
-			    !task_is_stopped_or_traced(t)) {
+			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
 				t->group_stop = gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
@@ -1929,8 +1935,8 @@ relock:
 		if (unlikely(signr != 0))
 			ka = return_ka;
 		else {
-			if (unlikely(signal->group_stop_count > 0) &&
-			    do_signal_stop(0))
+			if (unlikely(current->group_stop &
+				     GROUP_STOP_PENDING) && do_signal_stop(0))
 				goto relock;
 
 			signr = dequeue_signal(current, &current->blocked,
@@ -2076,7 +2082,7 @@ void exit_signals(struct task_struct *tsk)
 		if (!signal_pending(t) && !(t->flags & PF_EXITING))
 			recalc_sigpending_and_wake(t);
 
-	if (unlikely(tsk->signal->group_stop_count) &&
+	if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
 	    task_participate_group_stop(tsk))
 		group_stop = CLD_STOPPED;
 out:
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 08/16] ptrace: participate in group stop from ptrace_stop() iff the task is trapping for group stop
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (6 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 07/16] signal: use GROUP_STOP_PENDING to stop once for a single group stop Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-06 16:56 ` [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

Currently, ptrace_stop() unconditionally participates in group stop
bookkeeping.  This is unnecessary and inaccurate.  Make it only
participate if the task is trapping for group stop - ie. if @why is
CLD_STOPPED.  As ptrace_stop() currently is not used when trapping for
group stop, this equals to disabling group stop participation from
ptrace_stop().

A visible behavior change is increased likelihood of delayed group
stop completion if the thread group contains one or more ptraced
tasks.

This is to preapre for further cleanup of the interaction between
group stop and ptrace.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index bff46ed..faf218b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1696,10 +1696,13 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
-	 * If there is a group stop in progress,
-	 * we must participate in the bookkeeping.
+	 * If @why is CLD_STOPPED, we're trapping to participate in a group
+	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
+	 * while siglock was released for the arch hook, PENDING could be
+	 * clear now.  We act as if SIGCONT is received after TASK_TRACED
+	 * is entered - ignore it.
 	 */
-	if (current->group_stop & GROUP_STOP_PENDING)
+	if (why == CLD_STOPPED && (current->group_stop & GROUP_STOP_PENDING))
 		task_participate_group_stop(current);
 
 	current->last_siginfo = info;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (7 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 08/16] ptrace: participate in group stop from ptrace_stop() iff the task is trapping for " Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-23 12:26   ` Oleg Nesterov
  2010-12-06 16:56 ` [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED Tejun Heo
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

A ptraced task would still stop at do_signal_stop() when it's stopping
for stop signals and do_signal_stop() behaves the same whether the
task is ptraced or not.  However, in addition to stopping,
ptrace_stop() also does ptrace specific stuff like calling
architecture specific callbacks, so this behavior makes the code more
fragile and difficult to understand.

This patch makes do_signal_stop() test whether the task is ptraced and
use ptrace_stop() if so.  This renders tracehook_notify_jctl() rather
pointless as the ptrace notification is now handled by ptrace_stop()
regardless of the return value from the tracehook.  It probably is a
good idea to update it.

This doesn't solve the whole problem as tasks already in stopped state
would stay in the regular stop when ptrace attached.  That part will
be handled by the next patch.

Oleg spotted a minor userland visible change.  In some cases, the
ptracee's state would now be TASK_TRACED where it used to be
TASK_STOPPED, which is visible via fs/proc.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
---
 kernel/signal.c |   43 +++++++++++++++++++++++++------------------
 1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index faf218b..a6bc4cf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1785,7 +1785,6 @@ void ptrace_notify(int exit_code)
 static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
-	int notify = 0;
 
 	if (!(current->group_stop & GROUP_STOP_PENDING)) {
 		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
@@ -1815,29 +1814,37 @@ static int do_signal_stop(int signr)
 			} else
 				task_clear_group_stop(t);
 	}
-	/*
-	 * If there are no other threads in the group, or if there is
-	 * a group stop in progress and we are the last to stop, report
-	 * to the parent.  When ptraced, every thread reports itself.
-	 */
-	if (task_participate_group_stop(current))
-		notify = CLD_STOPPED;
-	if (task_ptrace(current))
-		notify = CLD_STOPPED;
 
 	current->exit_code = sig->group_exit_code;
 	__set_current_state(TASK_STOPPED);
 
-	spin_unlock_irq(&current->sighand->siglock);
+	if (likely(!task_ptrace(current))) {
+		int notify = 0;
 
-	if (notify) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, notify);
-		read_unlock(&tasklist_lock);
-	}
+		/*
+		 * If there are no other threads in the group, or if there
+		 * is a group stop in progress and we are the last to stop,
+		 * report to the parent.
+		 */
+		if (task_participate_group_stop(current))
+			notify = CLD_STOPPED;
 
-	/* Now we don't run again until woken by SIGCONT or SIGKILL */
-	schedule();
+		spin_unlock_irq(&current->sighand->siglock);
+
+		if (notify) {
+			read_lock(&tasklist_lock);
+			do_notify_parent_cldstop(current, notify);
+			read_unlock(&tasklist_lock);
+		}
+
+		/* Now we don't run again until woken by SIGCONT or SIGKILL */
+		schedule();
+
+		spin_lock_irq(&current->sighand->siglock);
+	} else
+		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+
+	spin_unlock_irq(&current->sighand->siglock);
 
 	tracehook_finish_jctl();
 	current->exit_code = 0;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (8 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-20 15:00   ` Oleg Nesterov
  2010-12-06 16:56 ` [PATCH 11/16] signal: prepare for CLD_* notification changes Tejun Heo
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

Currently, if the task is STOPPED on ptrace attach, it's left alone
and the state is silently changed to TRACED on the next ptrace call.
The behavior breaks the assumption that arch_ptrace_stop() is called
before any task is poked by ptrace and is ugly in that a task
manipulates the state of another task directly.

With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
TRACED can be made clean.  The tracer can use the flag to tell the
tracee to retry stop on attach and detach.  On retry, the tracee will
enter the desired state in the correct way.  The lower 16bits of
task->group_stop is used to remember the signal number which caused
the last group stop.  This is used while retrying for ptrace attach as
the original group_exit_code could have been consumed with wait(2) by
then.

As the real parent may wait(2) and consume the group_exit_code
anytime, the group_exit_code needs to be saved separately so that it
can be used when switching from regular sleep to ptrace_stop().  This
is recorded in the lower 16bits of task->group_stop.

If a task is already stopped and there's no intervening SIGCONT, a
ptrace request immediately following a successful PTRACE_ATTACH should
always succeed even if the tracer doesn't wait(2) for attach
completion; however, with this change, the tracee might still be
TASK_RUNNING trying to enter TASK_TRACED which would cause the
following request to fail with -ESRCH.

This intermediate state is hidden from userland by setting
GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
for it to clear.  Completing the transition or any event which clears
the group stop states of the task clears the bit and wakes up the
ptracer if waiting.

Oleg:

* Spotted a race condition where a task may retry group stop without
  proper bookkeeping.  Fixed by redoing bookkeeping on retry.

* Pointed out the userland visible intermediate state.  Fixed with
  GROUP_STOP_TRAPPING.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
---
 include/linux/sched.h |    2 +
 kernel/ptrace.c       |   63 ++++++++++++++++++++++++++++++++++++++++++------
 kernel/signal.c       |   62 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 112 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c2538dd..7045c34 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1760,8 +1760,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * task->group_stop flags
  */
+#define GROUP_STOP_SIGMASK	0xffff    /* signr of the last group stop */
 #define GROUP_STOP_PENDING	(1 << 16) /* task should stop for group stop */
 #define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
+#define GROUP_STOP_TRAPPING	(1 << 18) /* switching from STOPPED to TRACED */
 
 extern void task_clear_group_stop(struct task_struct *task);
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 99bbaa3..5191301 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -49,14 +49,14 @@ static void ptrace_untrace(struct task_struct *child)
 	spin_lock(&child->sighand->siglock);
 	if (task_is_traced(child)) {
 		/*
-		 * If the group stop is completed or in progress,
-		 * this thread was already counted as stopped.
+		 * If group stop is completed or in progress, it should
+		 * participate in the group stop.  Set GROUP_STOP_PENDING
+		 * before kicking it.
 		 */
 		if (child->signal->flags & SIGNAL_STOP_STOPPED ||
 		    child->signal->group_stop_count)
-			__set_task_state(child, TASK_STOPPED);
-		else
-			signal_wake_up(child, 1);
+			child->group_stop |= GROUP_STOP_PENDING;
+		signal_wake_up(child, 1);
 	}
 	spin_unlock(&child->sighand->siglock);
 }
@@ -79,6 +79,12 @@ void __ptrace_unlink(struct task_struct *child)
 		ptrace_untrace(child);
 }
 
+static int ptrace_wait_trap(void *flags)
+{
+	schedule();
+	return 0;
+}
+
 /*
  * Check that we have indeed attached to the thing..
  */
@@ -93,6 +99,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 	 * we are sure that this is our traced child and that can only
 	 * be changed by us so it's not changing right after this.
 	 */
+relock:
 	read_lock(&tasklist_lock);
 	if ((child->ptrace & PT_PTRACED) && child->parent == current) {
 		ret = 0;
@@ -101,10 +108,30 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 		 * does ptrace_unlink() before __exit_signal().
 		 */
 		spin_lock_irq(&child->sighand->siglock);
-		if (task_is_stopped(child))
-			child->state = TASK_TRACED;
-		else if (!task_is_traced(child) && !kill)
+		if (!task_is_traced(child) && !kill) {
+			/*
+			 * If GROUP_STOP_TRAPPING is set, it is known that
+			 * the tracee will enter either TRACED or the bit
+			 * will be cleared in definite amount of (userland)
+			 * time.  Wait while the bit is set.
+			 *
+			 * This hides PTRACE_ATTACH initiated transition
+			 * from STOPPED to TRACED from userland.
+			 */
+			if (child->group_stop & GROUP_STOP_TRAPPING) {
+				const int bit = ilog2(GROUP_STOP_TRAPPING);
+				DEFINE_WAIT_BIT(wait, &child->group_stop, bit);
+
+				spin_unlock_irq(&child->sighand->siglock);
+				read_unlock(&tasklist_lock);
+
+				wait_on_bit(&child->group_stop, bit,
+					    ptrace_wait_trap,
+					    TASK_UNINTERRUPTIBLE);
+				goto relock;
+			}
 			ret = -ESRCH;
+		}
 		spin_unlock_irq(&child->sighand->siglock);
 	}
 	read_unlock(&tasklist_lock);
@@ -204,6 +231,26 @@ int ptrace_attach(struct task_struct *task)
 	__ptrace_link(task, current);
 	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
 
+	spin_lock(&task->sighand->siglock);
+
+	/*
+	 * If the task is already STOPPED, set GROUP_STOP_PENDING and
+	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
+	 * will be cleared if the child completes the transition or any
+	 * event which clears the group stop states happens.  The bit is
+	 * waited by ptrace_check_attach() to hide the transition from
+	 * userland.
+	 *
+	 * The following is safe as both transitions in and out of STOPPED
+	 * are protected by siglock.
+	 */
+	if (task_is_stopped(task)) {
+		task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
+		signal_wake_up(task, 1);
+	}
+
+	spin_unlock(&task->sighand->siglock);
+
 	retval = 0;
 unlock_tasklist:
 	write_unlock_irq(&tasklist_lock);
diff --git a/kernel/signal.c b/kernel/signal.c
index a6bc4cf..6d93a3f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -224,10 +224,29 @@ static inline void print_dropped_signal(int sig)
 }
 
 /**
+ * task_clear_group_stop_trapping - clear group stop trapping bit
+ * @task: target task
+ *
+ * If GROUP_STOP_TRAPPING is set, it's cleared and wake_up_bit() is called
+ * on the bit.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void task_clear_group_stop_trapping(struct task_struct *task)
+{
+	if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
+		task->group_stop &= ~GROUP_STOP_TRAPPING;
+		wake_up_bit(&task->group_stop, ilog2(GROUP_STOP_TRAPPING));
+	}
+}
+
+/**
  * task_clear_group_stop - clear pending group stop
  * @task: target task
  *
- * Clear group stop states for @task.
+ * Clear group stop pending state for @task.  All group stop states except
+ * for the recorded last stop signal are cleared.
  *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
@@ -235,6 +254,7 @@ static inline void print_dropped_signal(int sig)
 void task_clear_group_stop(struct task_struct *task)
 {
 	task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME);
+	task_clear_group_stop_trapping(task);
 }
 
 /**
@@ -1696,6 +1716,14 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
+	 * We're committing to trapping.  Clearing GROUP_STOP_TRAPPING and
+	 * transition to TASK_TRACED should be atomic with respect to
+	 * siglock.  Do it after the arch hook as siglock is released and
+	 * regrabbed across it.
+	 */
+	task_clear_group_stop_trapping(current);
+
+	/*
 	 * If @why is CLD_STOPPED, we're trapping to participate in a group
 	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
 	 * while siglock was released for the arch hook, PENDING could be
@@ -1790,6 +1818,9 @@ static int do_signal_stop(int signr)
 		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
 		struct task_struct *t;
 
+		/* signr will be recorded in task->group_stop for retries */
+		WARN_ON_ONCE(signr & ~GROUP_STOP_SIGMASK);
+
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
 		    unlikely(signal_group_exit(sig)))
 			return 0;
@@ -1799,22 +1830,28 @@ static int do_signal_stop(int signr)
 		 */
 		sig->group_exit_code = signr;
 
-		current->group_stop = gstop;
+		current->group_stop &= ~GROUP_STOP_SIGMASK;
+		current->group_stop |= signr | gstop;
 		sig->group_stop_count = 1;
-		for (t = next_thread(current); t != current; t = next_thread(t))
+		for (t = next_thread(current); t != current;
+		     t = next_thread(t)) {
+			t->group_stop &= ~GROUP_STOP_SIGMASK;
 			/*
 			 * Setting state to TASK_STOPPED for a group
 			 * stop is always done with the siglock held,
 			 * so this check has no races.
 			 */
 			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
-				t->group_stop = gstop;
+				t->group_stop |= signr | gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
-			} else
+			} else {
 				task_clear_group_stop(t);
+				t->group_stop |= signr;
+			}
+		}
 	}
-
+retry:
 	current->exit_code = sig->group_exit_code;
 	__set_current_state(TASK_STOPPED);
 
@@ -1842,7 +1879,18 @@ static int do_signal_stop(int signr)
 
 		spin_lock_irq(&current->sighand->siglock);
 	} else
-		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
+			    CLD_STOPPED, 0, NULL);
+
+	/*
+	 * GROUP_STOP_PENDING could be set if another group stop has
+	 * started since being woken up or ptrace wants us to transit
+	 * between TASK_STOPPED and TRACED.  Retry group stop.
+	 */
+	if (current->group_stop & GROUP_STOP_PENDING) {
+		WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
+		goto retry;
+	}
 
 	spin_unlock_irq(&current->sighand->siglock);
 
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 11/16] signal: prepare for CLD_* notification changes
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (9 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED Tejun Heo
@ 2010-12-06 16:56 ` Tejun Heo
  2010-12-20 16:21   ` Oleg Nesterov
  2010-12-06 16:57 ` [PATCH 12/16] ptrace: make group stop notification reliable against ptrace Tejun Heo
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:56 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

Add docbook comment for do_notify_parent_cldstop(), fix a comment typo
and add boilerplat code to prepare for moving notification
determination logic into do_notify_parent_cldstop().

This patch doesn't cause any visible behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 6d93a3f..7dfbba9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1597,13 +1597,57 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 	return ret;
 }
 
+/**
+ * do_notify_parent_cldstop - notify parent of CLD_CONTINUED, STOPPED or TRAPPED
+ * @tsk: task which has been continued or is about to stop
+ * @why: CLD_{CONTINUED|STOPPED|TRAPPED}
+ *
+ * Notifies the parent that @tsk has been continued or is about to stop.
+ *
+ * The notify target changes depending on whether @tsk is being ptraced or
+ * not.  If @tsk is being ptraced, it's always the ptracer; otherwise, it's
+ * the task group's real parent.
+ *
+ * CONTEXT:
+ * Must be called with tasklist_lock held.  Grabs and releases the siglocks
+ * of @tsk and the notify target.
+ */
 static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
 {
 	struct siginfo info;
 	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
+	struct signal_struct *sig;
+	int notify = 0;
+
+	/*
+	 * Determine whether and what to notify.  This should be done under
+	 * @tsk's siglock.  As tasklist_lock is already held,
+	 * task_ptrace(@tsk) won't change beneath us.
+	 */
+	sighand = tsk->sighand;
+	sig = tsk->signal;
+	spin_lock_irqsave(&sighand->siglock, flags);
 
+	switch (why) {
+	case CLD_CONTINUED:
+	case CLD_STOPPED:
+	case CLD_TRAPPED:
+		notify = why;
+		break;
+	}
+
+	spin_unlock_irqrestore(&sighand->siglock, flags);
+
+	if (!notify)
+		return;
+
+	/*
+	 * Determine notify target and who should be notifying.  If being
+	 * ptraced, @tsk notifies to its current parent; otherwise, the
+	 * group leader reports to its real parent.
+	 */
 	if (task_ptrace(tsk))
 		parent = tsk->parent;
 	else {
@@ -1614,7 +1658,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
 	info.si_signo = SIGCHLD;
 	info.si_errno = 0;
 	/*
-	 * see comment in do_notify_parent() abot the following 3 lines
+	 * see comment in do_notify_parent() about the following 3 lines
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
@@ -1624,13 +1668,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
 	info.si_utime = cputime_to_clock_t(tsk->utime);
 	info.si_stime = cputime_to_clock_t(tsk->stime);
 
- 	info.si_code = why;
- 	switch (why) {
+	info.si_code = notify;
+	switch (notify) {
  	case CLD_CONTINUED:
  		info.si_status = SIGCONT;
  		break;
  	case CLD_STOPPED:
- 		info.si_status = tsk->signal->group_exit_code & 0x7f;
+		info.si_status = sig->group_exit_code & 0x7f;
  		break;
  	case CLD_TRAPPED:
  		info.si_status = tsk->exit_code & 0x7f;
@@ -1640,6 +1684,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
  	}
 
 	sighand = parent->sighand;
+	sig = parent->signal;
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 12/16] ptrace: make group stop notification reliable against ptrace
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (10 preceding siblings ...)
  2010-12-06 16:56 ` [PATCH 11/16] signal: prepare for CLD_* notification changes Tejun Heo
@ 2010-12-06 16:57 ` Tejun Heo
  2010-12-20 17:34   ` Oleg Nesterov
  2010-12-06 16:57 ` [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace() Tejun Heo
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:57 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

Group stop notifications are unreliable if one or more tasks of the
task group are being ptraced.  If a ptraced task ends up finishing a
group stop, the notification is sent to the ptracer and the real
parent never gets notified.

This patch adds a new signal flag SIGNAL_NOTIFY_STOP which is set on
group stop completion and cleared on notification to the real parent
or together with other stopped flags on SIGCONT/KILL.  This guarantees
that the real parent is notified correctly regardless of ptrace.  If a
ptraced task is the last task to stop, the notification is postponed
till ptrace detach or canceled if SIGCONT/KILL is received inbetween.

Oleg spotted race against ptrace attach/detach in the initial
implementation.  This is fixed by moving notification determiniation
into do_notify_parent_cldstop() and performing it while holding both
tasklist_lock and siglock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 include/linux/sched.h |    2 +
 kernel/signal.c       |   65 +++++++++++++++++++++++++++++-------------------
 2 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7045c34..7a26e7d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -653,6 +653,8 @@ struct signal_struct {
 
 #define SIGNAL_UNKILLABLE	0x00000040 /* for init: ignore fatal signals */
 
+#define SIGNAL_NOTIFY_STOP	0x00000100 /* notify parent of group stop */
+
 /* If true, all threads except ->group_exit_task have pending SIGKILL */
 static inline int signal_group_exit(const struct signal_struct *sig)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index 7dfbba9..3196367 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -269,7 +269,7 @@ void task_clear_group_stop(struct task_struct *task)
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-static bool task_participate_group_stop(struct task_struct *task)
+static void task_participate_group_stop(struct task_struct *task)
 {
 	struct signal_struct *sig = task->signal;
 	bool consume = task->group_stop & GROUP_STOP_CONSUME;
@@ -279,18 +279,15 @@ static bool task_participate_group_stop(struct task_struct *task)
 	task_clear_group_stop(task);
 
 	if (!consume)
-		return false;
+		return;
 
 	task->group_stop &= ~GROUP_STOP_CONSUME;
 
 	if (!WARN_ON_ONCE(sig->group_stop_count == 0))
 		sig->group_stop_count--;
 
-	if (!sig->group_stop_count) {
-		sig->flags = SIGNAL_STOP_STOPPED;
-		return true;
-	}
-	return false;
+	if (!sig->group_stop_count)
+		sig->flags = SIGNAL_STOP_STOPPED | SIGNAL_NOTIFY_STOP;
 }
 
 /*
@@ -1603,6 +1600,16 @@ int do_notify_parent(struct task_struct *tsk, int sig)
  * @why: CLD_{CONTINUED|STOPPED|TRAPPED}
  *
  * Notifies the parent that @tsk has been continued or is about to stop.
+ * Depending on @why and other conditions, the notification might be
+ * skipped.
+ *
+ * CLD_STOPPED		: If ptraced, always notify; otherwise, notify
+ *			  once if SIGNAL_NOTIFY_STOP is set.
+ *
+ * CLD_TRAPPED		: Always notify.
+ *
+ * For notify once cases, the respective NOTIFY flag is consumed and
+ * cleared.
  *
  * The notify target changes depending on whether @tsk is being ptraced or
  * not.  If @tsk is being ptraced, it's always the ptracer; otherwise, it's
@@ -1632,9 +1639,26 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
 
 	switch (why) {
 	case CLD_CONTINUED:
+		notify = why;
+		break;
+
 	case CLD_STOPPED:
+		/*
+		 * If ptraced, always notify; otherwise, notify once if
+		 * NOTIFY_STOP is set.
+		 */
+		if (task_ptrace(tsk))
+			notify = CLD_STOPPED;
+		else if (sig->flags & SIGNAL_NOTIFY_STOP) {
+			notify = CLD_STOPPED;
+			sig->flags &= ~SIGNAL_NOTIFY_STOP;
+		}
+		break;
+
 	case CLD_TRAPPED:
-		notify = why;
+		/* TRAPPED is possible only while ptraced and always notified */
+		WARN_ON_ONCE(!task_ptrace(tsk));
+		notify = CLD_TRAPPED;
 		break;
 	}
 
@@ -1901,21 +1925,12 @@ retry:
 	__set_current_state(TASK_STOPPED);
 
 	if (likely(!task_ptrace(current))) {
-		int notify = 0;
-
-		/*
-		 * If there are no other threads in the group, or if there
-		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
-		 */
-		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
-
+		task_participate_group_stop(current);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		if (notify) {
+		if (sig->flags & SIGNAL_NOTIFY_STOP) {
 			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, notify);
+			do_notify_parent_cldstop(current, CLD_STOPPED);
 			read_unlock(&tasklist_lock);
 		}
 
@@ -2160,7 +2175,6 @@ relock:
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	struct task_struct *t;
 
 	if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
@@ -2185,15 +2199,14 @@ void exit_signals(struct task_struct *tsk)
 		if (!signal_pending(t) && !(t->flags & PF_EXITING))
 			recalc_sigpending_and_wake(t);
 
-	if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
+	if (unlikely(tsk->group_stop & GROUP_STOP_PENDING))
+		task_participate_group_stop(tsk);
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
 
-	if (unlikely(group_stop)) {
+	if (unlikely(tsk->signal->flags & SIGNAL_NOTIFY_STOP)) {
 		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, group_stop);
+		do_notify_parent_cldstop(tsk, CLD_STOPPED);
 		read_unlock(&tasklist_lock);
 	}
 }
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace()
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (11 preceding siblings ...)
  2010-12-06 16:57 ` [PATCH 12/16] ptrace: make group stop notification reliable against ptrace Tejun Heo
@ 2010-12-06 16:57 ` Tejun Heo
  2010-12-20 18:15   ` Oleg Nesterov
  2010-12-06 16:57 ` [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace Tejun Heo
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:57 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

* Collapse ptrace_untrace() into __ptrace_unlink().

* Always do the whole unlinking inside siglock.

* Untracing is done before unlinking.

This is to prepare for further changes.  As the whole
unlinking/tracing is done inside both tasklist lock and siglock, the
reordering doesn't cause any visible behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/ptrace.c |   40 ++++++++++++++++++----------------------
 1 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 5191301..6ac12f4 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -38,45 +38,41 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
 }
 
 /*
- * Turn a tracing stop into a normal stop now, since with no tracer there
- * would be no way to wake it up with SIGCONT or SIGKILL.  If there was a
- * signal sent that would resume the child, but didn't because it was in
- * TASK_TRACED, resume it now.
- * Requires that irqs be disabled.
+ * unptrace a task: move it back to its original parent and remove it
+ * from the ptrace list.
+ *
+ * Turn a tracing stop into a normal stop now, since with no tracer
+ * there would be no way to wake it up with SIGCONT or SIGKILL.  If
+ * there was a signal sent that would resume the child, but didn't
+ * because it was in TASK_TRACED, resume it now.  Requires that irqs
+ * be disabled.
+ *
+ * Must be called with the tasklist lock write-held.
  */
-static void ptrace_untrace(struct task_struct *child)
+void __ptrace_unlink(struct task_struct *child)
 {
+	struct signal_struct *sig = child->signal;
+
+	BUG_ON(!child->ptrace);
+
 	spin_lock(&child->sighand->siglock);
+
 	if (task_is_traced(child)) {
 		/*
 		 * If group stop is completed or in progress, it should
 		 * participate in the group stop.  Set GROUP_STOP_PENDING
 		 * before kicking it.
 		 */
-		if (child->signal->flags & SIGNAL_STOP_STOPPED ||
-		    child->signal->group_stop_count)
+		if (sig->flags & SIGNAL_STOP_STOPPED || sig->group_stop_count)
 			child->group_stop |= GROUP_STOP_PENDING;
 		signal_wake_up(child, 1);
 	}
-	spin_unlock(&child->sighand->siglock);
-}
-
-/*
- * unptrace a task: move it back to its original parent and
- * remove it from the ptrace list.
- *
- * Must be called with the tasklist lock write-held.
- */
-void __ptrace_unlink(struct task_struct *child)
-{
-	BUG_ON(!child->ptrace);
 
 	child->ptrace = 0;
 	child->parent = child->real_parent;
 	list_del_init(&child->ptrace_entry);
 
-	if (task_is_traced(child))
-		ptrace_untrace(child);
+	spin_unlock(&child->sighand->siglock);
 }
 
 static int ptrace_wait_trap(void *flags)
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (12 preceding siblings ...)
  2010-12-06 16:57 ` [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace() Tejun Heo
@ 2010-12-06 16:57 ` Tejun Heo
  2010-12-20 19:43   ` Oleg Nesterov
  2010-12-21 17:25   ` Oleg Nesterov
  2010-12-06 16:57 ` [PATCH 15/16] ptrace: make sure SIGNAL_NOTIFY_CONT is checked after ptrace_signal() Tejun Heo
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:57 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

Currently, SIGCONT notifications which are pending on ptrace attach or
occur while ptraced are reported to the tracer and never make it to
the real parent.

This patch adds a new signal flag SIGNAL_NOTIFY_CONT which is set when
a task is woken up by SIGCONT and cleared once the event is notified
to the parent.  SIGNAL_CLD_MASK bits are no longer cleared after
notification.  Combined with clearing SIGNAL_CLD_MASK if
!SIGNAL_NOTIFY_CONT on ptrace attach, these bits are set on ptrace
detach iff the tracee owes a notification to the real parent.
__ptrace_unlink() is updated to check these bits and reschedule
SIGCONT notification if necessary.

As notification delivery should consider both the child's signal flags
and ptraced state, it should be done while holding both siglock and
tasklist_lock.  The delivery logic is moved into
do_notify_parent_cldstop() and performed while holding both
tasklist_lock and siglock.

This change puts the initial SIGNAL_NOTIFY_CONT test out of siglock.
This is safe as the bit set by the waker are guaranteed to be visible
to the woken up task.

Oleg spotted the above race condition caused by ptrace state and
signal state protected by different locks.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 include/linux/sched.h |    1 +
 kernel/ptrace.c       |   36 ++++++++++++++++++++++++++++++++++++
 kernel/signal.c       |   49 ++++++++++++++++++++++++++++---------------------
 3 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7a26e7d..ed7725b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -654,6 +654,7 @@ struct signal_struct {
 #define SIGNAL_UNKILLABLE	0x00000040 /* for init: ignore fatal signals */
 
 #define SIGNAL_NOTIFY_STOP	0x00000100 /* notify parent of group stop */
+#define SIGNAL_NOTIFY_CONT	0x00000200 /* notify parent of continuation */
 
 /* If true, all threads except ->group_exit_task have pending SIGKILL */
 static inline int signal_group_exit(const struct signal_struct *sig)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 6ac12f4..dba6aeb 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -52,6 +52,7 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
 void __ptrace_unlink(struct task_struct *child)
 {
 	struct signal_struct *sig = child->signal;
+	bool woken_up = false;
 
 	BUG_ON(!child->ptrace);
 
@@ -66,6 +67,33 @@ void __ptrace_unlink(struct task_struct *child)
 		if (sig->flags & SIGNAL_STOP_STOPPED || sig->group_stop_count)
 			child->group_stop |= GROUP_STOP_PENDING;
 		signal_wake_up(child, 1);
+		woken_up = true;
+	}
+
+	/*
+	 * SIGNAL_CLD_MASK is cleared only on a stop signal or, if
+	 * notification isn't pending, ptrace attach.  If any bit is
+	 * set,
+	 *
+	 * - SIGCONT notification was pending before attach or there
+	 *   was one or more SIGCONT notifications while tracing.
+	 *
+	 * - And, there hasn't been any stop signal since the last
+	 *   pending SIGCONT notification.
+	 *
+	 * Combined, it means that the tracee owes a SIGCONT
+	 * notification to the real parent.
+	 */
+	if (sig->flags & SIGNAL_CLD_MASK) {
+		sig->flags |= SIGNAL_NOTIFY_CONT;
+		/*
+		 * Force the tracee into signal delivery path so that
+		 * the notification is delievered ASAP.  This wakeup
+		 * is unintrusive as SIGCONT delivery would have
+		 * caused the same effect.
+		 */
+		if (!woken_up)
+			signal_wake_up(child, 0);
 	}
 
 	child->ptrace = 0;
@@ -245,6 +273,14 @@ int ptrace_attach(struct task_struct *task)
 		signal_wake_up(task, 1);
 	}
 
+	/*
+	 * Clear SIGNAL_CLD_MASK if NOTIFY_CONT is not set.  This is
+	 * used to preserve SIGCONT notification across ptrace
+	 * attach/detach.  Read the comment in __ptrace_unlink().
+	 */
+	if (!(task->signal->flags & SIGNAL_NOTIFY_CONT))
+		task->signal->flags &= ~SIGNAL_CLD_MASK;
+
 	spin_unlock(&task->sighand->siglock);
 
 	retval = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index 3196367..7b6f972 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -832,7 +832,8 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 			 * will take ->siglock, notice SIGNAL_CLD_MASK, and
 			 * notify its parent. See get_signal_to_deliver().
 			 */
-			signal->flags = why | SIGNAL_STOP_CONTINUED;
+			why |= SIGNAL_STOP_CONTINUED | SIGNAL_NOTIFY_CONT;
+			signal->flags = why;
 			signal->group_stop_count = 0;
 			signal->group_exit_code = 0;
 		} else {
@@ -1603,6 +1604,8 @@ int do_notify_parent(struct task_struct *tsk, int sig)
  * Depending on @why and other conditions, the notification might be
  * skipped.
  *
+ * CLD_CONTINUED	: Notify once if SIGNAL_NOTIFY_CONT is set.
+ *
  * CLD_STOPPED		: If ptraced, always notify; otherwise, notify
  *			  once if SIGNAL_NOTIFY_STOP is set.
  *
@@ -1639,7 +1642,24 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
 
 	switch (why) {
 	case CLD_CONTINUED:
-		notify = why;
+		/*
+		 * Notify once if NOTIFY_CONT is set regardless of ptrace.
+		 * NOTIFY_CONT will be reinstated on detach if necessary.
+		 */
+		if (!(sig->flags & SIGNAL_NOTIFY_CONT))
+			break;
+
+		/*
+		 * If ptraced, always report CLD_CONTINUED; otherwise,
+		 * prepare_signal(SIGCONT) encodes the CLD_ si_code into
+		 * SIGNAL_CLD_MASK bits.
+		 */
+		if (task_ptrace(tsk) || (sig->flags & SIGNAL_CLD_CONTINUED))
+			notify = CLD_CONTINUED;
+		else
+			notify = CLD_STOPPED;
+
+		sig->flags &= ~SIGNAL_NOTIFY_CONT;
 		break;
 
 	case CLD_STOPPED:
@@ -2015,31 +2035,18 @@ relock:
 	 */
 	try_to_freeze();
 
-	spin_lock_irq(&sighand->siglock);
 	/*
-	 * Every stopped thread goes here after wakeup. Check to see if
-	 * we should notify the parent, prepare_signal(SIGCONT) encodes
-	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
+	 * Every stopped thread should go through this function after
+	 * waking up.  Check to see if we should notify the parent.
 	 */
-	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
-		int why;
-
-		if (task_ptrace(current) ||
-		    (signal->flags & SIGNAL_CLD_CONTINUED))
-			why = CLD_CONTINUED;
-		else
-			why = CLD_STOPPED;
-
-		signal->flags &= ~SIGNAL_CLD_MASK;
-
-		spin_unlock_irq(&sighand->siglock);
-
+	if (unlikely(current->signal->flags & SIGNAL_NOTIFY_CONT)) {
 		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, why);
+		do_notify_parent_cldstop(current, CLD_CONTINUED);
 		read_unlock(&tasklist_lock);
-		goto relock;
 	}
 
+	spin_lock_irq(&sighand->siglock);
+
 	for (;;) {
 		struct k_sigaction *ka;
 		/*
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 15/16] ptrace: make sure SIGNAL_NOTIFY_CONT is checked after ptrace_signal()
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (13 preceding siblings ...)
  2010-12-06 16:57 ` [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace Tejun Heo
@ 2010-12-06 16:57 ` Tejun Heo
  2010-12-06 16:57 ` [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:57 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

ptrace_signal() releases siglock and signal delivery may continue
afterwards.  SIGNAL_NOTIFY_CONT can be set inbetween and should be
checked after returning from the function.

* Restart from the top if ptrace_signal() returns 0.

* Factor out CLD_CONTINUED check code into notify_parent_cont() and
  check before returning from get_signal_to_deliver() too.

With the latter, the former isn't strictly necessary but it's still
better to do it to document what's going on if for nothing else.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |   38 +++++++++++++++++++++++++++-----------
 1 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7b6f972..5eddda6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2019,6 +2019,19 @@ static int ptrace_signal(int signr, siginfo_t *info,
 	return signr;
 }
 
+static inline void notify_parent_cont(void)
+{
+	/*
+	 * Every stopped thread should go through this function after
+	 * waking up.  Check to see if we should notify the parent.
+	 */
+	if (unlikely(current->signal->flags & SIGNAL_NOTIFY_CONT)) {
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current->group_leader, CLD_CONTINUED);
+		read_unlock(&tasklist_lock);
+	}
+}
+
 int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
 			  struct pt_regs *regs, void *cookie)
 {
@@ -2035,15 +2048,7 @@ relock:
 	 */
 	try_to_freeze();
 
-	/*
-	 * Every stopped thread should go through this function after
-	 * waking up.  Check to see if we should notify the parent.
-	 */
-	if (unlikely(current->signal->flags & SIGNAL_NOTIFY_CONT)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, CLD_CONTINUED);
-		read_unlock(&tasklist_lock);
-	}
+	notify_parent_cont();
 
 	spin_lock_irq(&sighand->siglock);
 
@@ -2073,8 +2078,11 @@ relock:
 			if (signr != SIGKILL) {
 				signr = ptrace_signal(signr, info,
 						      regs, cookie);
-				if (!signr)
-					continue;
+				if (!signr) {
+					/* siglock was released, restart */
+					spin_unlock_irq(&sighand->siglock);
+					goto relock;
+				}
 			}
 
 			ka = &sighand->action[signr-1];
@@ -2177,6 +2185,14 @@ relock:
 		/* NOTREACHED */
 	}
 	spin_unlock_irq(&sighand->siglock);
+
+	/*
+	 * If ptrace_signal() returned a non-zero signal, control can reach
+	 * here without other pending signals or going through relocking
+	 * and a CONT notification may be left pending.  Check it.
+	 */
+	notify_parent_cont();
+
 	return signr;
 }
 
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (14 preceding siblings ...)
  2010-12-06 16:57 ` [PATCH 15/16] ptrace: make sure SIGNAL_NOTIFY_CONT is checked after ptrace_signal() Tejun Heo
@ 2010-12-06 16:57 ` Tejun Heo
  2010-12-07  0:10   ` Roland McGrath
  2010-12-21 17:54   ` Oleg Nesterov
  2010-12-14 17:36 ` [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Oleg Nesterov
  2010-12-22 15:20 ` Oleg Nesterov
  17 siblings, 2 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-06 16:57 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil; +Cc: Tejun Heo

This wake_up_process() has a turbulent history.  This is a remnant
from ancient ptrace implementation and patently wrong.  Commit
95a3540d (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx
logic) removed it but the change was reverted later by commit edaba2c5
(ptrace: revert "ptrace_detach: the wrong wakeup breaks the
ERESTARTxxx logic ") citing compatibility breakage and general
brokeness of the whole group stop / ptrace interaction.

Digging through the mailing archives, the compatibility breakage
doesn't seem to be critical in the sense that the behavior isn't well
defined or reliable to begin with and it seems to have been agreed to
remove the wakeup with proper cleanup of the whole thing.

Now that the group stop and its interaction with ptrace are cleaned up
and well defined, it's high time to finally kill this silliness.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/ptrace.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index dba6aeb..575bf8f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -391,8 +391,6 @@ int ptrace_detach(struct task_struct *child, unsigned int data)
 	if (child->ptrace) {
 		child->exit_code = data;
 		dead = __ptrace_detach(current, child);
-		if (!child->exit_state)
-			wake_up_process(child);
 	}
 	write_unlock_irq(&tasklist_lock);
 
-- 
1.7.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2010-12-06 16:57 ` [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
@ 2010-12-07  0:10   ` Roland McGrath
  2010-12-07 13:43     ` Tejun Heo
  2010-12-21 17:54   ` Oleg Nesterov
  1 sibling, 1 reply; 62+ messages in thread
From: Roland McGrath @ 2010-12-07  0:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: oleg, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

The plain wake_up_process was certainly wrong from the beginning.

We were conservative about changing it because of the difficulty of
chasing all the corners where userland debuggers' behavior might be
made to regress when it had been reliable in practice before (even
if not always in theory, such as possible races that didn't bite in
reality).  The userland code has gone to many contortions to cope
with how the kernel behaved in the past, whether or not that
behavior ever made any good sense.

For that sort of reason, none of this stuff should change at all in
a -stable kernel, nor late in a release cycle.

For new kernels, I think changing the behavior in the direction of
something that can actually be described is OK as long as userland
debugger maintainers like Jan agree to the new behavior and that the
behavior really and truly does follow an articulated set of rules
that the kernel and userland sides agree to.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2010-12-07  0:10   ` Roland McGrath
@ 2010-12-07 13:43     ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-07 13:43 UTC (permalink / raw)
  To: Roland McGrath; +Cc: oleg, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello, Roland.

On 12/07/2010 01:10 AM, Roland McGrath wrote:
> The plain wake_up_process was certainly wrong from the beginning.
> 
> We were conservative about changing it because of the difficulty of
> chasing all the corners where userland debuggers' behavior might be
> made to regress when it had been reliable in practice before (even
> if not always in theory, such as possible races that didn't bite in
> reality).  The userland code has gone to many contortions to cope
> with how the kernel behaved in the past, whether or not that
> behavior ever made any good sense.
> 
> For that sort of reason, none of this stuff should change at all in
> a -stable kernel, nor late in a release cycle.

Sure, definitely.  All these changes are at the earliest for the next
merge window.

> For new kernels, I think changing the behavior in the direction of
> something that can actually be described is OK as long as userland
> debugger maintainers like Jan agree to the new behavior and that the
> behavior really and truly does follow an articulated set of rules
> that the kernel and userland sides agree to.

Yeap, that sounds good to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (15 preceding siblings ...)
  2010-12-06 16:57 ` [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
@ 2010-12-14 17:36 ` Oleg Nesterov
  2010-12-14 17:46   ` Tejun Heo
  2010-12-22 15:20 ` Oleg Nesterov
  17 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-14 17:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> This is the second attempt at cleaning up ptrace and signal behaviors,

Tejun, I am really, really sorry for the huge delay.

I will _try_ very much to read this on Friday. If nothing else,
I am very interested to understand these changes in details.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2
  2010-12-14 17:36 ` [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Oleg Nesterov
@ 2010-12-14 17:46   ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-14 17:46 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello,

On 12/14/2010 06:36 PM, Oleg Nesterov wrote:
> On 12/06, Tejun Heo wrote:
>>
>> This is the second attempt at cleaning up ptrace and signal behaviors,
> 
> Tejun, I am really, really sorry for the huge delay.
> 
> I will _try_ very much to read this on Friday. If nothing else,
> I am very interested to understand these changes in details.

No need to be sorry at all.  It's not like this patchset is an urgent
fix.  This part of ptrace has been broken for a very long time and
we've been more or less fine with it.  At least I would be able to
enjoy the sweet delusion that this take is okay for a bit longer.  :-)

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/16] signal: fix CLD_CONTINUED notification target
  2010-12-06 16:56 ` [PATCH 02/16] signal: fix CLD_CONTINUED notification target Tejun Heo
@ 2010-12-20 14:58   ` Oleg Nesterov
  2010-12-21 16:31     ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 14:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> CLD_CONTINUED notification code calls do_notify_parent_cldstop() with
> its group_leader; however, do_notify_parent_cldstop() already uses the
> group_leader for non-ptraced notifications.

Yes,

> The duplicate
> ->group_leader dereferencing is unnecessary and leads to incorrectly
> notifying the group_leader's ptracer of CLD_CONTINUED from a different
> task in the group.  Fix it.

I do not really agree this is wrong, group_leader was used intentionally
for ptrace case.

There is no "correct" thread who should report CLD_CONTINUED, a random
thread wins and notifies its ->real_parent or debugger. If we always
choose ->group_leader, we always knows what happens. With this patch
we can't predict where does this notification go.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1867,7 +1867,7 @@ relock:
>
>  		if (why) {
>  			read_lock(&tasklist_lock);
> -			do_notify_parent_cldstop(current->group_leader, why);
> +			do_notify_parent_cldstop(current, why);

OTOH, I see nothing really wrong with this change, and this all will
be reworked by the next patches anyway.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/16] signal: remove superflous try_to_freeze() loop in do_signal_stop()
  2010-12-06 16:56 ` [PATCH 03/16] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
@ 2010-12-20 14:59   ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 14:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> do_signal_stop() is used only by get_signal_to_deliver() and after a
> successful signal stop, it always calls try_to_freeze(), so the
> try_to_freeze() loop around schedule() in do_signal_stop() is
> superflous and confusing.  Remove it.

I think the patch is obvioulsy fine.

Acked-by: Oleg Nesterov <oleg@redhat.com>



But I am a bit confused, a couple of off-topic questions to Rafael.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1781,9 +1781,7 @@ static int do_signal_stop(int signr)
>  	}
>
>  	/* Now we don't run again until woken by SIGCONT or SIGKILL */
> -	do {
> -		schedule();
> -	} while (try_to_freeze());
> +	schedule();

I am wondering what was the purpose of this do/while loop. Probably
this was just oversight. We always return in TASK_RUNNING state from
schedule, try_to_freeze() should return the task in this state too.

My question is: refrigerator() tries to preserve the caller's state.
Why? I think it is just wrong to call refrigerator() unless the
task is TASK_RUNNING, no? If no, then the games with saving/restoring
->state look obviously racy/wrong.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] ptrace: kill tracehook_notify_jctl()
  2010-12-06 16:56 ` [PATCH 04/16] ptrace: kill tracehook_notify_jctl() Tejun Heo
@ 2010-12-20 14:59   ` Oleg Nesterov
  2010-12-21 17:00     ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 14:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> tracehook_notify_jctl() aids in determining whether and what to report
> to the parent when a task is stopped or continued.  The function also
> adds an extra requirement that siglock may be released across it,
> which is currently unused and quite difficult to satisfy in
> well-defined manner.

OK. I agree, tracehook_notify_jctl() looks very unobvious, especially
because it is not really used currently.

The patch looks correct, except

> @@ -1853,21 +1850,19 @@ relock:
>  	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
>  		int why;
>
> -		if (signal->flags & SIGNAL_CLD_CONTINUED)
> +		if (task_ptrace(current) ||
> +		    (signal->flags & SIGNAL_CLD_CONTINUED))
>  			why = CLD_CONTINUED;
>  		else
>  			why = CLD_STOPPED;

Hmm, I can't understand this.

task_ptrace() should not turn CLD_STOPPED in CLD_CONTINUED?

Looking ahead, it _seems_ that the next patches keep this logic,
could you explain?

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace
  2010-12-06 16:56 ` [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace Tejun Heo
@ 2010-12-20 15:00   ` Oleg Nesterov
  2010-12-21 17:04     ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 15:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> +static bool task_participate_group_stop(struct task_struct *task)
> +{
> +	struct signal_struct *sig = task->signal;
> +	bool consume = task->group_stop & GROUP_STOP_CONSUME;
> +
> +	task_clear_group_stop(task);
> +
> +	if (!consume)
> +		return false;
> +
> +	task->group_stop &= ~GROUP_STOP_CONSUME;

Minor nit, GROUP_STOP_CONSUME was already cleared by task_clear_group_stop().

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-06 16:56 ` [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED Tejun Heo
@ 2010-12-20 15:00   ` Oleg Nesterov
  2010-12-21 17:31     ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 15:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> @@ -93,6 +99,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
>  	 * we are sure that this is our traced child and that can only
>  	 * be changed by us so it's not changing right after this.
>  	 */
> +relock:
>  	read_lock(&tasklist_lock);
>  	if ((child->ptrace & PT_PTRACED) && child->parent == current) {
>  		ret = 0;
> @@ -101,10 +108,30 @@ int ptrace_check_attach(struct task_struct *child, int kill)
>  		 * does ptrace_unlink() before __exit_signal().
>  		 */
>  		spin_lock_irq(&child->sighand->siglock);
> -		if (task_is_stopped(child))
> -			child->state = TASK_TRACED;
> -		else if (!task_is_traced(child) && !kill)
> +		if (!task_is_traced(child) && !kill) {
> +			/*
> +			 * If GROUP_STOP_TRAPPING is set, it is known that
> +			 * the tracee will enter either TRACED or the bit
> +			 * will be cleared in definite amount of (userland)
> +			 * time.  Wait while the bit is set.
> +			 *
> +			 * This hides PTRACE_ATTACH initiated transition
> +			 * from STOPPED to TRACED from userland.
> +			 */
> +			if (child->group_stop & GROUP_STOP_TRAPPING) {
> +				const int bit = ilog2(GROUP_STOP_TRAPPING);
> +				DEFINE_WAIT_BIT(wait, &child->group_stop, bit);

Unused "wait_bit_queue wait"

> +
> +				spin_unlock_irq(&child->sighand->siglock);
> +				read_unlock(&tasklist_lock);
> +
> +				wait_on_bit(&child->group_stop, bit,

Hmm. we could probably use ->wait_chldexit/__wake_up_parent instead,
although I am not sure this would be more clean...

> +					    ptrace_wait_trap,
> +					    TASK_UNINTERRUPTIBLE);

I am nervous ;) To me, TASK_KILLABLE makes more sense, and it is
safer if we have a bug.

> +				goto relock;

We already set ret = 0. "relock" should set -ESRCH.

> +			}
>  			ret = -ESRCH;
> +		}

Probably this deserves a minor cleanup,

	relock:
		ret = -ESRCH;
		read_lock(&tasklist_lock);
		if (task_is_traced() || kill) {
			ret = 0;
		} else {
		...


OK. So, ptrace_attach() asks a stopped tracee to s/STOPPED/TRACED/.
If it is not stopped, it should call ptrace_stop() eventually.

This doesn't work if ptrace_attach() races with clone(CLONE_STOPPED).
ptrace_check_attach() can return the wrong ESRCH after that. Perhaps
it is time to kill the CLONE_STOPPED code in do_fork().

> @@ -204,6 +231,26 @@ int ptrace_attach(struct task_struct *task)
>  	__ptrace_link(task, current);
>  	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
>
> +	spin_lock(&task->sighand->siglock);
> +
> +	/*
> +	 * If the task is already STOPPED, set GROUP_STOP_PENDING and
> +	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
> +	 * will be cleared if the child completes the transition or any
> +	 * event which clears the group stop states happens.  The bit is
> +	 * waited by ptrace_check_attach() to hide the transition from
> +	 * userland.
> +	 *
> +	 * The following is safe as both transitions in and out of STOPPED
> +	 * are protected by siglock.
> +	 */
> +	if (task_is_stopped(task)) {
> +		task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
> +		signal_wake_up(task, 1);
> +	}
> +
> +	spin_unlock(&task->sighand->siglock);

Well. I do not know whether this matters, but "hide the transition from
userland" is not 100% correct. I mean, this change is still visible.

ptrace_check_attach()->wait_on_bit() logic fixes the previous example,
but:

	1. the tracer knows that the tracee is stopped

	2. the tracer does ptrace(ATTACH)

	3. the tracer does do_wait()

In this case do_wait() can see the tracee in TASK_RUNNING state,
this breaks wait_task_stopped(ptrace => true).

Jan?

> @@ -1799,22 +1830,28 @@ static int do_signal_stop(int signr)
>  		 */
>  		sig->group_exit_code = signr;
>
> -		current->group_stop = gstop;
> +		current->group_stop &= ~GROUP_STOP_SIGMASK;
> +		current->group_stop |= signr | gstop;
>  		sig->group_stop_count = 1;
> -		for (t = next_thread(current); t != current; t = next_thread(t))
> +		for (t = next_thread(current); t != current;
> +		     t = next_thread(t)) {
> +			t->group_stop &= ~GROUP_STOP_SIGMASK;
>  			/*
>  			 * Setting state to TASK_STOPPED for a group
>  			 * stop is always done with the siglock held,
>  			 * so this check has no races.
>  			 */
>  			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
> -				t->group_stop = gstop;
> +				t->group_stop |= signr | gstop;
>  				sig->group_stop_count++;
>  				signal_wake_up(t, 0);
> -			} else
> +			} else {
>  				task_clear_group_stop(t);

This looks racy. Suppose that "current" is ptraced, in this case
it can initiate the new group-stop even if SIGNAL_STOP_STOPPED
is set and we have another TASK_STOPPED thead T.

Suppose that another (or same) debugger ataches to this thread T,
wakes it up and sets GROUP_STOP_TRAPPING.

T resumes, calls ptrace_stop() in TASK_STOPPED, and temporary drops
->siglock.

Now, this task_clear_group_stop(T) confuses ptrace_check_attach(T).

I think ptrace_stop() should be called in TASK_RUNNING state.
This also makes sense because we may call arch_ptrace_stop().

> +				t->group_stop |= signr;

Probably this doesn't really matter, but why do we need to
change the GROUP_STOP_SIGMASK part of t->group_stop? If it
is exiting, this is not needed. If it is already stopped, then
it already has the correct (previous) signr.

> +			}
> +		}
>  	}
> -
> +retry:
>  	current->exit_code = sig->group_exit_code;
>  	__set_current_state(TASK_STOPPED);

It is no longer needed to set ->exit_code here. The only reason
was s/STOPPED/TRACED/ change in ptrace_check_attach(). Now that
we rely on ptrace_stop() which sets ->exit_state, this can be
removed.

And,

> @@ -1842,7 +1879,18 @@ static int do_signal_stop(int signr)
>
>  		spin_lock_irq(&current->sighand->siglock);
>  	} else
> -		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
> +		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
> +			    CLD_STOPPED, 0, NULL);

Perhaps it would be more clean to clear ->exit_code here, in the
"else" branch.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 11/16] signal: prepare for CLD_* notification changes
  2010-12-06 16:56 ` [PATCH 11/16] signal: prepare for CLD_* notification changes Tejun Heo
@ 2010-12-20 16:21   ` Oleg Nesterov
  2010-12-20 16:23     ` Oleg Nesterov
  2010-12-21 17:35     ` Tejun Heo
  0 siblings, 2 replies; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 16:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
>  static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
>  {
>  	struct siginfo info;
>  	unsigned long flags;
>  	struct task_struct *parent;
>  	struct sighand_struct *sighand;
> +	struct signal_struct *sig;
> +	int notify = 0;
> +
> +	/*
> +	 * Determine whether and what to notify.  This should be done under
> +	 * @tsk's siglock.

Hmm... it is not clear why.

> +	spin_lock_irqsave(&sighand->siglock, flags);
>
> +	switch (why) {
> +	case CLD_CONTINUED:
> +	case CLD_STOPPED:
> +	case CLD_TRAPPED:
> +		notify = why;
> +		break;
> +	}

OK, with the next patches this code checks sig->flags, probably that
is why we take ->siglock. Still I can't understand this so far.
May be the comment could tell more?

> @@ -1640,6 +1684,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
>   	}
>
>  	sighand = parent->sighand;
> +	sig = parent->signal;

This looks unneeded.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 11/16] signal: prepare for CLD_* notification changes
  2010-12-20 16:21   ` Oleg Nesterov
@ 2010-12-20 16:23     ` Oleg Nesterov
  2010-12-21 17:35     ` Tejun Heo
  1 sibling, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 16:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/20, Oleg Nesterov wrote:
>
> On 12/06, Tejun Heo wrote:
> >
> > +	 * Determine whether and what to notify.  This should be done under
> > +	 * @tsk's siglock.
>
> Hmm... it is not clear why.
>
> > +	spin_lock_irqsave(&sighand->siglock, flags);
> >
> > +	switch (why) {
> > +	case CLD_CONTINUED:
> > +	case CLD_STOPPED:
> > +	case CLD_TRAPPED:
> > +		notify = why;
> > +		break;
> > +	}
>
> OK, with the next patches this code checks sig->flags, probably that
> is why we take ->siglock. Still I can't understand this so far.
> May be the comment could tell more?

Ah, I didn't notice we are going to change sig->flags here, please
ignore.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] ptrace: make group stop notification reliable against ptrace
  2010-12-06 16:57 ` [PATCH 12/16] ptrace: make group stop notification reliable against ptrace Tejun Heo
@ 2010-12-20 17:34   ` Oleg Nesterov
  2010-12-21 17:43     ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 17:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> This patch adds a new signal flag SIGNAL_NOTIFY_STOP which is set on
> group stop completion and cleared on notification to the real parent
> or together with other stopped flags on SIGCONT/KILL.  This guarantees
> that the real parent is notified correctly regardless of ptrace.

OK, I am a bit confused. I do not understand exactly what this
"correctly" actually means.

> If a
> ptraced task is the last task to stop, the notification is postponed
> till ptrace detach or canceled if SIGCONT/KILL is received inbetween.

OK. But what if the last task to stop is not ptraced? In this case
->real_parent gets the notification.

Of course, the current behaviour is not better, it is obviously wrong.
But if we want to fix things, perhaps we should invite the new and
clear rules. Isn't it better to always notify ->real_parent when the
last thread stops in STOPPED/TRACED state? Otherwise the behaviour is
not predictable, it depends on task_ptrace() state of the last thead
which sees SIGNAL_NOTIFY_STOP.

Actually, I think it would be even better to never notify ->real_parent
until debugger detaches all threads, but this is not simple to implement.

> @@ -1901,21 +1925,12 @@ retry:
>  	__set_current_state(TASK_STOPPED);
>
>  	if (likely(!task_ptrace(current))) {
> -		int notify = 0;
> -
> -		/*
> -		 * If there are no other threads in the group, or if there
> -		 * is a group stop in progress and we are the last to stop,
> -		 * report to the parent.
> -		 */
> -		if (task_participate_group_stop(current))
> -			notify = CLD_STOPPED;
> -
> +		task_participate_group_stop(current);
>  		spin_unlock_irq(&current->sighand->siglock);
>
> -		if (notify) {
> +		if (sig->flags & SIGNAL_NOTIFY_STOP) {
>  			read_lock(&tasklist_lock);
> -			do_notify_parent_cldstop(current, notify);
> +			do_notify_parent_cldstop(current, CLD_STOPPED);

Suppose that debugger attaches right after spin_unlock(->siglock).

Nothing really bad can happen afaics, but in this case the debugger
will be notified twice. Hmm. If the debugger does do_wait() immediately
after the first notification, it has all rights to see the stopped
tracee but wait_task_stopped() fails, not good.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace()
  2010-12-06 16:57 ` [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace() Tejun Heo
@ 2010-12-20 18:15   ` Oleg Nesterov
  2010-12-21 17:54     ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 18:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

A bit off-topic note,

On 12/06, Tejun Heo wrote:
>
> -static void ptrace_untrace(struct task_struct *child)
> +void __ptrace_unlink(struct task_struct *child)
>  {
> +	struct signal_struct *sig = child->signal;
> +
> +	BUG_ON(!child->ptrace);
> +
>  	spin_lock(&child->sighand->siglock);
> +
>  	if (task_is_traced(child)) {
>  		/*
>  		 * If group stop is completed or in progress, it should
>  		 * participate in the group stop.  Set GROUP_STOP_PENDING
>  		 * before kicking it.
>  		 */
> -		if (child->signal->flags & SIGNAL_STOP_STOPPED ||
> -		    child->signal->group_stop_count)
> +		if (sig->flags & SIGNAL_STOP_STOPPED || sig->group_stop_count)
>  			child->group_stop |= GROUP_STOP_PENDING;
>  		signal_wake_up(child, 1);

OK. Of course, I do not blame this patch, this mimics the current
behaviour.

But, afaics, this is not exactly right in the long term. Suppose
that SIGNAL_STOP_STOPPED is set but the tracee is running (this can
happen if, say, debugger resumes the tracee and exits). In this case,
I think this thread should be stopped too.

IIRC, I already tried to do this, but the patch (or idea) was nacked
because it means another user-visible change. However, if we want to
really fix things, we should fix this case too. If SIGNAL_STOP_STOPPED
is set, there should be no running threads after detach.

Or. We can change the rules for ptrace_resume(), more on this later.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace
  2010-12-06 16:57 ` [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace Tejun Heo
@ 2010-12-20 19:43   ` Oleg Nesterov
  2010-12-21 17:48     ` Tejun Heo
  2010-12-21 17:25   ` Oleg Nesterov
  1 sibling, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-20 19:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> This patch adds a new signal flag SIGNAL_NOTIFY_CONT which is set when
> a task is woken up by SIGCONT and cleared once the event is notified
> to the parent.  SIGNAL_CLD_MASK bits are no longer cleared after
> notification.  Combined with clearing SIGNAL_CLD_MASK if
> !SIGNAL_NOTIFY_CONT on ptrace attach, these bits are set on ptrace
> detach iff the tracee owes a notification to the real parent.
> __ptrace_unlink() is updated to check these bits and reschedule
> SIGCONT notification if necessary.

It turns out I can no longer read the patches today ;)  Will continue
tomorrow with the fresh head.

I must admit, I have some concerns. But most probably I just need
to actually understand what this patch does.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/16] signal: fix CLD_CONTINUED notification target
  2010-12-20 14:58   ` Oleg Nesterov
@ 2010-12-21 16:31     ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 16:31 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello, Oleg.

On Mon, Dec 20, 2010 at 03:58:15PM +0100, Oleg Nesterov wrote:
> On 12/06, Tejun Heo wrote:
> >
> > CLD_CONTINUED notification code calls do_notify_parent_cldstop() with
> > its group_leader; however, do_notify_parent_cldstop() already uses the
> > group_leader for non-ptraced notifications.
> 
> Yes,
> 
> > The duplicate
> > ->group_leader dereferencing is unnecessary and leads to incorrectly
> > notifying the group_leader's ptracer of CLD_CONTINUED from a different
> > task in the group.  Fix it.
> 
> I do not really agree this is wrong, group_leader was used intentionally
> for ptrace case.
> 
> There is no "correct" thread who should report CLD_CONTINUED, a random
> thread wins and notifies its ->real_parent or debugger. If we always
> choose ->group_leader, we always knows what happens. With this patch
> we can't predict where does this notification go.
> 
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -1867,7 +1867,7 @@ relock:
> >
> >  		if (why) {
> >  			read_lock(&tasklist_lock);
> > -			do_notify_parent_cldstop(current->group_leader, why);
> > +			do_notify_parent_cldstop(current, why);
> 
> OTOH, I see nothing really wrong with this change, and this all will
> be reworked by the next patches anyway.

Ah, okay, I thought it was an unintentional bug.  The reason why the
change is necessary is because of later changes which re-instates the
notification on ptrace detach.  The code is modified such that
notification pending is cleared iff the notification is delivered to
an actual parent.  The double dereference makes it difficult to tell
whether the notification is beling delivered to an actual parent or
not.

If this wasn't a bug, I think the proper thing to do is to move this
later where the change is necessary, after the code learns how to
buffer the notification until ptrace detach and needs to determine
whether it's being eaten by the debugger or not.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/16] ptrace: kill tracehook_notify_jctl()
  2010-12-20 14:59   ` Oleg Nesterov
@ 2010-12-21 17:00     ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:00 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Mon, Dec 20, 2010 at 03:59:56PM +0100, Oleg Nesterov wrote:
> On 12/06, Tejun Heo wrote:
> >
> > tracehook_notify_jctl() aids in determining whether and what to report
> > to the parent when a task is stopped or continued.  The function also
> > adds an extra requirement that siglock may be released across it,
> > which is currently unused and quite difficult to satisfy in
> > well-defined manner.
> 
> OK. I agree, tracehook_notify_jctl() looks very unobvious, especially
> because it is not really used currently.
> 
> The patch looks correct, except
> 
> > @@ -1853,21 +1850,19 @@ relock:
> >  	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
> >  		int why;
> >
> > -		if (signal->flags & SIGNAL_CLD_CONTINUED)
> > +		if (task_ptrace(current) ||
> > +		    (signal->flags & SIGNAL_CLD_CONTINUED))
> >  			why = CLD_CONTINUED;
> >  		else
> >  			why = CLD_STOPPED;
> 
> Hmm, I can't understand this.
> 
> task_ptrace() should not turn CLD_STOPPED in CLD_CONTINUED?
> 
> Looking ahead, it _seems_ that the next patches keep this logic,
> could you explain?

That's the logic from tracehook_notify_jctl() or I think it is
incorrectly.  Yes, the latter.  I got confused the two parameters.  I
thought tracehook_notify_jctl() always returned CLD_CONTINUED when
traced.  The @why is @notified and CLD_CONTINUED is @why.  :-)

I'll drop the above chunk.  Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace
  2010-12-20 15:00   ` Oleg Nesterov
@ 2010-12-21 17:04     ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:04 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Mon, Dec 20, 2010 at 04:00:18PM +0100, Oleg Nesterov wrote:
> On 12/06, Tejun Heo wrote:
> >
> > +static bool task_participate_group_stop(struct task_struct *task)
> > +{
> > +	struct signal_struct *sig = task->signal;
> > +	bool consume = task->group_stop & GROUP_STOP_CONSUME;
> > +
> > +	task_clear_group_stop(task);
> > +
> > +	if (!consume)
> > +		return false;
> > +
> > +	task->group_stop &= ~GROUP_STOP_CONSUME;
> 
> Minor nit, GROUP_STOP_CONSUME was already cleared by task_clear_group_stop().

Thanks.  Dropped.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace
  2010-12-06 16:57 ` [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace Tejun Heo
  2010-12-20 19:43   ` Oleg Nesterov
@ 2010-12-21 17:25   ` Oleg Nesterov
  2010-12-22 10:35     ` Tejun Heo
  1 sibling, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-21 17:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> This patch adds a new signal flag SIGNAL_NOTIFY_CONT which is set when
> a task is woken up by SIGCONT and cleared once the event is notified
> to the parent.  SIGNAL_CLD_MASK bits are no longer cleared after
> notification.  Combined with clearing SIGNAL_CLD_MASK if
> !SIGNAL_NOTIFY_CONT on ptrace attach, these bits are set on ptrace
> detach iff the tracee owes a notification to the real parent.

But we can't know this. The notification and SIGNAL_NOTIFY_CONT are
per-process, while attach/detach is per-thread.

> @@ -66,6 +67,33 @@ void __ptrace_unlink(struct task_struct *child)
>  		if (sig->flags & SIGNAL_STOP_STOPPED || sig->group_stop_count)
>  			child->group_stop |= GROUP_STOP_PENDING;
>  		signal_wake_up(child, 1);
> +		woken_up = true;
> +	}
> +
> +	/*
> +	 * SIGNAL_CLD_MASK is cleared only on a stop signal or, if
> +	 * notification isn't pending, ptrace attach.  If any bit is
> +	 * set,
> +	 *
> +	 * - SIGCONT notification was pending before attach or there
> +	 *   was one or more SIGCONT notifications while tracing.
> +	 *
> +	 * - And, there hasn't been any stop signal since the last
> +	 *   pending SIGCONT notification.
> +	 *
> +	 * Combined, it means that the tracee owes a SIGCONT
> +	 * notification to the real parent.
> +	 */
> +	if (sig->flags & SIGNAL_CLD_MASK) {
> +		sig->flags |= SIGNAL_NOTIFY_CONT;

Two threads, T1 and T2. T1 is ptraced, T2 is not.

SIGSTOP stops them both. T1 sleeps in TASK_TRACED, T2 in TASK_STOPPED.

prepare_signal(SIGCONT) sets SIGNAL_NOTIFY_CONT + SIGNAL_CLD_CONTINUED,
and wakes T2 up.

T2 notifies its ->real_parent, clears SIGNAL_NOTIFY_CONT.

Debugger does ptrace(PTRACE_DETACH, T1), sees SIGNAL_CLD_MASK, and
restores SIGNAL_NOTIFY_CONT.

T1 resends the (bogus) notification to its (and T2's) real_parent.


Even if I missed something,

> @@ -245,6 +273,14 @@ int ptrace_attach(struct task_struct *task)
>  		signal_wake_up(task, 1);
>  	}
>
> +	/*
> +	 * Clear SIGNAL_CLD_MASK if NOTIFY_CONT is not set.  This is
> +	 * used to preserve SIGCONT notification across ptrace
> +	 * attach/detach.  Read the comment in __ptrace_unlink().
> +	 */
> +	if (!(task->signal->flags & SIGNAL_NOTIFY_CONT))
> +		task->signal->flags &= ~SIGNAL_CLD_MASK;

What if there is another ptraced sub-thread in this group who "owes"
the notification ?


> +              * Force the tracee into signal delivery path so that
> +              * the notification is delievered ASAP.  This wakeup
> +              * is unintrusive as SIGCONT delivery would have
> +              * caused the same effect.
> +              */
> +             if (!woken_up)
> +                     signal_wake_up(child, 0);

Well, signal_wake_up() can really force the tracee into signal delivery.
It only sets TIF_SIGPENDING, but this can race with recalc_sigpending().

Oh. This reminds me: http://marc.info/?t=123411921400004

> @@ -1639,7 +1642,24 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
>
>  	switch (why) {
>  	case CLD_CONTINUED:
> -		notify = why;
> +		/*
> +		 * Notify once if NOTIFY_CONT is set regardless of ptrace.
> +		 * NOTIFY_CONT will be reinstated on detach if necessary.
> +		 */
> +		if (!(sig->flags & SIGNAL_NOTIFY_CONT))
> +			break;
> +
> +		/*
> +		 * If ptraced, always report CLD_CONTINUED; otherwise,
> +		 * prepare_signal(SIGCONT) encodes the CLD_ si_code into
> +		 * SIGNAL_CLD_MASK bits.
> +		 */
> +		if (task_ptrace(tsk) || (sig->flags & SIGNAL_CLD_CONTINUED))
> +			notify = CLD_CONTINUED;

See the comment on 4/16

> @@ -2015,31 +2035,18 @@ relock:
>  	 */
>  	try_to_freeze();
>
> -	spin_lock_irq(&sighand->siglock);
>  	/*
> -	 * Every stopped thread goes here after wakeup. Check to see if
> -	 * we should notify the parent, prepare_signal(SIGCONT) encodes
> -	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
> +	 * Every stopped thread should go through this function after
> +	 * waking up.  Check to see if we should notify the parent.
>  	 */
> -	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
> -		int why;
> -
> -		if (task_ptrace(current) ||
> -		    (signal->flags & SIGNAL_CLD_CONTINUED))
> -			why = CLD_CONTINUED;
> -		else
> -			why = CLD_STOPPED;
> -
> -		signal->flags &= ~SIGNAL_CLD_MASK;
> -
> -		spin_unlock_irq(&sighand->siglock);
> -
> +	if (unlikely(current->signal->flags & SIGNAL_NOTIFY_CONT)) {

I am not sure it is OK to check SIGNAL_NOTIFY_CONT without ->siglock.
If we return from do_signal_stop(), everything is fine.

But if we got here because of __ptrace_unlink()->signal_wake_up(1),
we can miss SIGNAL_NOTIFY_CONT.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-20 15:00   ` Oleg Nesterov
@ 2010-12-21 17:31     ` Tejun Heo
  2010-12-21 17:32       ` Tejun Heo
                         ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:31 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Mon, Dec 20, 2010 at 04:00:37PM +0100, Oleg Nesterov wrote:
> > +		if (!task_is_traced(child) && !kill) {
> > +			/*
> > +			 * If GROUP_STOP_TRAPPING is set, it is known that
> > +			 * the tracee will enter either TRACED or the bit
> > +			 * will be cleared in definite amount of (userland)
> > +			 * time.  Wait while the bit is set.
> > +			 *
> > +			 * This hides PTRACE_ATTACH initiated transition
> > +			 * from STOPPED to TRACED from userland.
> > +			 */
> > +			if (child->group_stop & GROUP_STOP_TRAPPING) {
> > +				const int bit = ilog2(GROUP_STOP_TRAPPING);
> > +				DEFINE_WAIT_BIT(wait, &child->group_stop, bit);
> 
> Unused "wait_bit_queue wait"

Oops, left over from a previous waitqueue based implementation.

> > +
> > +				spin_unlock_irq(&child->sighand->siglock);
> > +				read_unlock(&tasklist_lock);
> > +
> > +				wait_on_bit(&child->group_stop, bit,
> 
> Hmm. we could probably use ->wait_chldexit/__wake_up_parent instead,
> although I am not sure this would be more clean...

Hmmmm, I actually think that would be cleaner.  I just didn't know it
was there.  Will convert over to it.

> > +					    ptrace_wait_trap,
> > +					    TASK_UNINTERRUPTIBLE);
> 
> I am nervous ;) To me, TASK_KILLABLE makes more sense, and it is
> safer if we have a bug.

Sure thing.

> > +				goto relock;
> 
> We already set ret = 0. "relock" should set -ESRCH.
> 
> > +			}
> >  			ret = -ESRCH;
> > +		}
> 
> Probably this deserves a minor cleanup,
> 
> 	relock:
> 		ret = -ESRCH;
> 		read_lock(&tasklist_lock);
> 		if (task_is_traced() || kill) {
> 			ret = 0;
> 		} else {
> 		...

Updated.

> OK. So, ptrace_attach() asks a stopped tracee to s/STOPPED/TRACED/.
> If it is not stopped, it should call ptrace_stop() eventually.
> 
> This doesn't work if ptrace_attach() races with clone(CLONE_STOPPED).
> ptrace_check_attach() can return the wrong ESRCH after that. Perhaps
> it is time to kill the CLONE_STOPPED code in do_fork().

Ah, thanks for spotting it.  I missed that.  We should be able to
convert it to call ptrace_stop(), right?

> > @@ -204,6 +231,26 @@ int ptrace_attach(struct task_struct *task)
> >  	__ptrace_link(task, current);
> >  	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
> >
> > +	spin_lock(&task->sighand->siglock);
> > +
> > +	/*
> > +	 * If the task is already STOPPED, set GROUP_STOP_PENDING and
> > +	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
> > +	 * will be cleared if the child completes the transition or any
> > +	 * event which clears the group stop states happens.  The bit is
> > +	 * waited by ptrace_check_attach() to hide the transition from
> > +	 * userland.
> > +	 *
> > +	 * The following is safe as both transitions in and out of STOPPED
> > +	 * are protected by siglock.
> > +	 */
> > +	if (task_is_stopped(task)) {
> > +		task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
> > +		signal_wake_up(task, 1);
> > +	}
> > +
> > +	spin_unlock(&task->sighand->siglock);
> 
> Well. I do not know whether this matters, but "hide the transition from
> userland" is not 100% correct. I mean, this change is still visible.
> 
> ptrace_check_attach()->wait_on_bit() logic fixes the previous example,
> but:
> 
> 	1. the tracer knows that the tracee is stopped
> 
> 	2. the tracer does ptrace(ATTACH)
> 
> 	3. the tracer does do_wait()
> 
> In this case do_wait() can see the tracee in TASK_RUNNING state,
> this breaks wait_task_stopped(ptrace => true).
> 
> Jan?

I see.  I can move the transition wait logic into PTRACE_ATTACH.
Would that be good enough?

This is also related to how to wait for attach completion for a new
more transparent attach.  Would it be better for such a request to
make sure the operation to complete before returning or is it
preferable to keep using wait(2) for that?  We'll probably be able to
share the transition wait logic with it.  I think it would be better
to return after the attach is actually complete but is there any
reason that I'm missing which makes using wait(2) preferrable?

> > @@ -1799,22 +1830,28 @@ static int do_signal_stop(int signr)
> >  		 */
> >  		sig->group_exit_code = signr;
> >
> > -		current->group_stop = gstop;
> > +		current->group_stop &= ~GROUP_STOP_SIGMASK;
> > +		current->group_stop |= signr | gstop;
> >  		sig->group_stop_count = 1;
> > -		for (t = next_thread(current); t != current; t = next_thread(t))
> > +		for (t = next_thread(current); t != current;
> > +		     t = next_thread(t)) {
> > +			t->group_stop &= ~GROUP_STOP_SIGMASK;
> >  			/*
> >  			 * Setting state to TASK_STOPPED for a group
> >  			 * stop is always done with the siglock held,
> >  			 * so this check has no races.
> >  			 */
> >  			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
> > -				t->group_stop = gstop;
> > +				t->group_stop |= signr | gstop;
> >  				sig->group_stop_count++;
> >  				signal_wake_up(t, 0);
> > -			} else
> > +			} else {
> >  				task_clear_group_stop(t);
> 
> This looks racy. Suppose that "current" is ptraced, in this case
> it can initiate the new group-stop even if SIGNAL_STOP_STOPPED
> is set and we have another TASK_STOPPED thead T.
> 
> Suppose that another (or same) debugger ataches to this thread T,
> wakes it up and sets GROUP_STOP_TRAPPING.
> 
> T resumes, calls ptrace_stop() in TASK_STOPPED, and temporary drops
> ->siglock.
> 
> Now, this task_clear_group_stop(T) confuses ptrace_check_attach(T).
> 
> I think ptrace_stop() should be called in TASK_RUNNING state.
> This also makes sense because we may call arch_ptrace_stop().

I'm feeling a bit too dense to process the above right now.  I'll
respond to the above next morning after a strong cup of coffee. :-)

> > +				t->group_stop |= signr;
> 
> Probably this doesn't really matter, but why do we need to
> change the GROUP_STOP_SIGMASK part of t->group_stop? If it
> is exiting, this is not needed. If it is already stopped, then
> it already has the correct (previous) signr.

The GROUP_STOP_SIGMASK part of t->group_stop is supposed to track the
signr of the latest stop attempt.  Hmmm... but yeah, not changing the
value there would result in more consistent behavior.  Updating...

> > +			}
> > +		}
> >  	}
> > -
> > +retry:
> >  	current->exit_code = sig->group_exit_code;
> >  	__set_current_state(TASK_STOPPED);
> 
> It is no longer needed to set ->exit_code here. The only reason
> was s/STOPPED/TRACED/ change in ptrace_check_attach(). Now that
> we rely on ptrace_stop() which sets ->exit_state, this can be
> removed.

I see.  Removed.

> And,
> 
> > @@ -1842,7 +1879,18 @@ static int do_signal_stop(int signr)
> >
> >  		spin_lock_irq(&current->sighand->siglock);
> >  	} else
> > -		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
> > +		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
> > +			    CLD_STOPPED, 0, NULL);
> 
> Perhaps it would be more clean to clear ->exit_code here, in the
> "else" branch.

Hmmm... and dropping current->exit_code clearing from the
do_signal_stop(), right?  I'm a bit confused about the use of
current->exit_code tho.  Why aren't we clearing it from ptrace_stop()?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-21 17:31     ` Tejun Heo
@ 2010-12-21 17:32       ` Tejun Heo
  2010-12-22 10:54       ` Tejun Heo
  2010-12-22 11:39       ` Oleg Nesterov
  2 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:32 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Tue, Dec 21, 2010 at 06:31:55PM +0100, Tejun Heo wrote:
> > > @@ -1842,7 +1879,18 @@ static int do_signal_stop(int signr)
> > >
> > >  		spin_lock_irq(&current->sighand->siglock);
> > >  	} else
> > > -		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
> > > +		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
> > > +			    CLD_STOPPED, 0, NULL);
> > 
> > Perhaps it would be more clean to clear ->exit_code here, in the
> > "else" branch.
> 
> Hmmm... and dropping current->exit_code clearing from the
> do_signal_stop(), right?  I'm a bit confused about the use of
> current->exit_code tho.  Why aren't we clearing it from ptrace_stop()?

Ah, never mind.  It's used as the signr return from ptrace signal
trap.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 11/16] signal: prepare for CLD_* notification changes
  2010-12-20 16:21   ` Oleg Nesterov
  2010-12-20 16:23     ` Oleg Nesterov
@ 2010-12-21 17:35     ` Tejun Heo
  1 sibling, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:35 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Mon, Dec 20, 2010 at 05:21:20PM +0100, Oleg Nesterov wrote:
> > @@ -1640,6 +1684,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
> >   	}
> >
> >  	sighand = parent->sighand;
> > +	sig = parent->signal;
> 
> This looks unneeded.

Yeah, it's unneeded but sighand and sig are used earlier in the
function to point to the respective structures of the child while in
the latter part for the parent.  I was afraid a later change to the
parent part might incorrectly use sig not realizing that it's still
pointing to the child->signal, so I think it would be better to keep
the unnecessary chunk for now.  Or we can introduce separate variables
for child and parent.  Hmmm... yeah, that actually would be better, I
think.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] ptrace: make group stop notification reliable against ptrace
  2010-12-20 17:34   ` Oleg Nesterov
@ 2010-12-21 17:43     ` Tejun Heo
  2010-12-22 11:54       ` Oleg Nesterov
  0 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:43 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Mon, Dec 20, 2010 at 06:34:25PM +0100, Oleg Nesterov wrote:
> On 12/06, Tejun Heo wrote:
> >
> > This patch adds a new signal flag SIGNAL_NOTIFY_STOP which is set on
> > group stop completion and cleared on notification to the real parent
> > or together with other stopped flags on SIGCONT/KILL.  This guarantees
> > that the real parent is notified correctly regardless of ptrace.
> 
> OK, I am a bit confused. I do not understand exactly what this
> "correctly" actually means.

It means that the ptracer wouldn't eat the notification.  The
notification is buffered and delivered when ptrace detaches.

> > If a ptraced task is the last task to stop, the notification is
> > postponed till ptrace detach or canceled if SIGCONT/KILL is
> > received inbetween.
> 
> OK. But what if the last task to stop is not ptraced? In this case
> ->real_parent gets the notification.
> 
> Of course, the current behaviour is not better, it is obviously wrong.
> But if we want to fix things, perhaps we should invite the new and
> clear rules. Isn't it better to always notify ->real_parent when the
> last thread stops in STOPPED/TRACED state? Otherwise the behaviour is
> not predictable, it depends on task_ptrace() state of the last thead
> which sees SIGNAL_NOTIFY_STOP.

I see.  My focus was to make ptrace attach/detach transparent.  IOW,
minimizing the effect of a debugger (or gcore or whatever) attaching
and then leaving.  So, this patch just makes sure that the
notification isn't absorbed by a ptracer.

> Actually, I think it would be even better to never notify ->real_parent
> until debugger detaches all threads, but this is not simple to implement.

But, yes, this is gonna change the behavior in subtle ways anyway so
it would be better to take care of that too.

> > @@ -1901,21 +1925,12 @@ retry:
> >  	__set_current_state(TASK_STOPPED);
> >
> >  	if (likely(!task_ptrace(current))) {
> > -		int notify = 0;
> > -
> > -		/*
> > -		 * If there are no other threads in the group, or if there
> > -		 * is a group stop in progress and we are the last to stop,
> > -		 * report to the parent.
> > -		 */
> > -		if (task_participate_group_stop(current))
> > -			notify = CLD_STOPPED;
> > -
> > +		task_participate_group_stop(current);
> >  		spin_unlock_irq(&current->sighand->siglock);
> >
> > -		if (notify) {
> > +		if (sig->flags & SIGNAL_NOTIFY_STOP) {
> >  			read_lock(&tasklist_lock);
> > -			do_notify_parent_cldstop(current, notify);
> > +			do_notify_parent_cldstop(current, CLD_STOPPED);
> 
> Suppose that debugger attaches right after spin_unlock(->siglock).
> 
> Nothing really bad can happen afaics, but in this case the debugger
> will be notified twice. Hmm. If the debugger does do_wait() immediately
> after the first notification, it has all rights to see the stopped
> tracee but wait_task_stopped() fails, not good.

Hmmm?  ptrace_attach() can't happen while tasklist_lock is held.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace
  2010-12-20 19:43   ` Oleg Nesterov
@ 2010-12-21 17:48     ` Tejun Heo
  2010-12-22 12:16       ` Oleg Nesterov
  0 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:48 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello, Oleg.

On Mon, Dec 20, 2010 at 08:43:54PM +0100, Oleg Nesterov wrote:
> I must admit, I have some concerns.

Hey, join the club.

> But most probably I just need to actually understand what this patch
> does.

The bigger question is about the direction tho.  The patchset
definitely still has enough holes but do you in general agree with
where it's headed?  If so, maybe it would be a good idea to set up an
incremental devel branch somewhere?  We definitely need more review
rounds but given the fragility of the whole thing I think it would be
wiser to keep the spotted problems and fixes in changelog after the
general direction is agreed on.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace()
  2010-12-20 18:15   ` Oleg Nesterov
@ 2010-12-21 17:54     ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-21 17:54 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello,

On Mon, Dec 20, 2010 at 07:15:16PM +0100, Oleg Nesterov wrote:
> OK. Of course, I do not blame this patch, this mimics the current
> behaviour.
> 
> But, afaics, this is not exactly right in the long term. Suppose
> that SIGNAL_STOP_STOPPED is set but the tracee is running (this can
> happen if, say, debugger resumes the tracee and exits). In this case,
> I think this thread should be stopped too.

Yes, that would be the more consistent behavior.

> IIRC, I already tried to do this, but the patch (or idea) was nacked
> because it means another user-visible change. However, if we want to
> really fix things, we should fix this case too. If SIGNAL_STOP_STOPPED
> is set, there should be no running threads after detach.

Aside from the user-visible change part, I don't think the
implemnentation would be difficult.

> Or. We can change the rules for ptrace_resume(), more on this later.

You haven't written this yet, right?  (I reconfigured / migrated my
mail setup during past few days so things are still a bit shaky.)

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2010-12-06 16:57 ` [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
  2010-12-07  0:10   ` Roland McGrath
@ 2010-12-21 17:54   ` Oleg Nesterov
  2010-12-22 10:36     ` Tejun Heo
  1 sibling, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-21 17:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/06, Tejun Heo wrote:
>
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -391,8 +391,6 @@ int ptrace_detach(struct task_struct *child, unsigned int data)
>  	if (child->ptrace) {
>  		child->exit_code = data;
>  		dead = __ptrace_detach(current, child);
> -		if (!child->exit_state)
> -			wake_up_process(child);

I can only repeat that I obviously like this change ;)

Now that Jan and Roland do not object, perhaps it can be sent as 2/X.
This is bugfix, after all (like the first patch).

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace
  2010-12-21 17:25   ` Oleg Nesterov
@ 2010-12-22 10:35     ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-22 10:35 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello, Oleg.

On Tue, Dec 21, 2010 at 06:25:16PM +0100, Oleg Nesterov wrote:
> Two threads, T1 and T2. T1 is ptraced, T2 is not.
> 
> SIGSTOP stops them both. T1 sleeps in TASK_TRACED, T2 in TASK_STOPPED.
> 
> prepare_signal(SIGCONT) sets SIGNAL_NOTIFY_CONT + SIGNAL_CLD_CONTINUED,
> and wakes T2 up.
> 
> T2 notifies its ->real_parent, clears SIGNAL_NOTIFY_CONT.
> 
> Debugger does ptrace(PTRACE_DETACH, T1), sees SIGNAL_CLD_MASK, and
> restores SIGNAL_NOTIFY_CONT.
> 
> T1 resends the (bogus) notification to its (and T2's) real_parent.

You're right.  Any thread which notifies the real parent should clear
the pending status.

> Even if I missed something,
> 
> > @@ -245,6 +273,14 @@ int ptrace_attach(struct task_struct *task)
> >  		signal_wake_up(task, 1);
> >  	}
> >
> > +	/*
> > +	 * Clear SIGNAL_CLD_MASK if NOTIFY_CONT is not set.  This is
> > +	 * used to preserve SIGCONT notification across ptrace
> > +	 * attach/detach.  Read the comment in __ptrace_unlink().
> > +	 */
> > +	if (!(task->signal->flags & SIGNAL_NOTIFY_CONT))
> > +		task->signal->flags &= ~SIGNAL_CLD_MASK;
> 
> What if there is another ptraced sub-thread in this group who "owes"
> the notification ?

Okay, hmmm... yeah, this is problematic.  Man, I suck.

Combined with the previously pointed out issue with indeterministic
behavior regarding stop notification, maybe it's better to reimplement
something which blocks both stop and cont notifications while any
thread in the group is ptraced and reissues them when all detach?  So
that we can get both transparent and more predictable behavior?

> > +              * Force the tracee into signal delivery path so that
> > +              * the notification is delievered ASAP.  This wakeup
> > +              * is unintrusive as SIGCONT delivery would have
> > +              * caused the same effect.
> > +              */
> > +             if (!woken_up)
> > +                     signal_wake_up(child, 0);
> 
> Well, signal_wake_up() can really force the tracee into signal delivery.
> It only sets TIF_SIGPENDING, but this can race with recalc_sigpending().

Indeed.

> Oh. This reminds me: http://marc.info/?t=123411921400004
> 
> > @@ -1639,7 +1642,24 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
> >
> >  	switch (why) {
> >  	case CLD_CONTINUED:
> > -		notify = why;
> > +		/*
> > +		 * Notify once if NOTIFY_CONT is set regardless of ptrace.
> > +		 * NOTIFY_CONT will be reinstated on detach if necessary.
> > +		 */
> > +		if (!(sig->flags & SIGNAL_NOTIFY_CONT))
> > +			break;
> > +
> > +		/*
> > +		 * If ptraced, always report CLD_CONTINUED; otherwise,
> > +		 * prepare_signal(SIGCONT) encodes the CLD_ si_code into
> > +		 * SIGNAL_CLD_MASK bits.
> > +		 */
> > +		if (task_ptrace(tsk) || (sig->flags & SIGNAL_CLD_CONTINUED))
> > +			notify = CLD_CONTINUED;
> 
> See the comment on 4/16

Will update.

> > @@ -2015,31 +2035,18 @@ relock:
> >  	 */
> >  	try_to_freeze();
> >
> > -	spin_lock_irq(&sighand->siglock);
> >  	/*
> > -	 * Every stopped thread goes here after wakeup. Check to see if
> > -	 * we should notify the parent, prepare_signal(SIGCONT) encodes
> > -	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
> > +	 * Every stopped thread should go through this function after
> > +	 * waking up.  Check to see if we should notify the parent.
> >  	 */
> > -	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
> > -		int why;
> > -
> > -		if (task_ptrace(current) ||
> > -		    (signal->flags & SIGNAL_CLD_CONTINUED))
> > -			why = CLD_CONTINUED;
> > -		else
> > -			why = CLD_STOPPED;
> > -
> > -		signal->flags &= ~SIGNAL_CLD_MASK;
> > -
> > -		spin_unlock_irq(&sighand->siglock);
> > -
> > +	if (unlikely(current->signal->flags & SIGNAL_NOTIFY_CONT)) {
> 
> I am not sure it is OK to check SIGNAL_NOTIFY_CONT without ->siglock.
> If we return from do_signal_stop(), everything is fine.
> 
> But if we got here because of __ptrace_unlink()->signal_wake_up(1),
> we can miss SIGNAL_NOTIFY_CONT.

This probably should be resolved together with the above
recalc_sigpending() issue.  I'll think about it.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2010-12-21 17:54   ` Oleg Nesterov
@ 2010-12-22 10:36     ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-22 10:36 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Tue, Dec 21, 2010 at 06:54:39PM +0100, Oleg Nesterov wrote:
> On 12/06, Tejun Heo wrote:
> >
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -391,8 +391,6 @@ int ptrace_detach(struct task_struct *child, unsigned int data)
> >  	if (child->ptrace) {
> >  		child->exit_code = data;
> >  		dead = __ptrace_detach(current, child);
> > -		if (!child->exit_state)
> > -			wake_up_process(child);
> 
> I can only repeat that I obviously like this change ;)
> 
> Now that Jan and Roland do not object, perhaps it can be sent as 2/X.
> This is bugfix, after all (like the first patch).

I'll send out the acked ones as a separate patchset so that they can
be merged.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-21 17:31     ` Tejun Heo
  2010-12-21 17:32       ` Tejun Heo
@ 2010-12-22 10:54       ` Tejun Heo
  2010-12-22 11:39       ` Oleg Nesterov
  2 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-22 10:54 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello,

On Tue, Dec 21, 2010 at 06:31:55PM +0100, Tejun Heo wrote:
> > This looks racy. Suppose that "current" is ptraced, in this case
> > it can initiate the new group-stop even if SIGNAL_STOP_STOPPED
> > is set and we have another TASK_STOPPED thead T.
> > 
> > Suppose that another (or same) debugger ataches to this thread T,
> > wakes it up and sets GROUP_STOP_TRAPPING.
> > 
> > T resumes, calls ptrace_stop() in TASK_STOPPED, and temporary drops
> > ->siglock.
> > 
> > Now, this task_clear_group_stop(T) confuses ptrace_check_attach(T).
> > 
> > I think ptrace_stop() should be called in TASK_RUNNING state.
> > This also makes sense because we may call arch_ptrace_stop().
> 
> I'm feeling a bit too dense to process the above right now.  I'll
> respond to the above next morning after a strong cup of coffee. :-)

Ah, right, the lock drop across arch_ptrace_stop().  Yeah, I agree
calling ptrace_stop() with TASK_RUNNING would solve it.  I'll think
about it a bit more.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-21 17:31     ` Tejun Heo
  2010-12-21 17:32       ` Tejun Heo
  2010-12-22 10:54       ` Tejun Heo
@ 2010-12-22 11:39       ` Oleg Nesterov
  2010-12-22 15:14         ` Tejun Heo
  2 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-22 11:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/21, Tejun Heo wrote:
>
> On Mon, Dec 20, 2010 at 04:00:37PM +0100, Oleg Nesterov wrote:
> > > +
> > > +				wait_on_bit(&child->group_stop, bit,
> >
> > Hmm. we could probably use ->wait_chldexit/__wake_up_parent instead,
> > although I am not sure this would be more clean...
>
> Hmmmm, I actually think that would be cleaner.  I just didn't know it
> was there.  Will convert over to it.

__wake_up_parent() needs tasklist to pin ->parent. But probably in
this particular case we can rely on rcu, or even ->siglock (given
that attach/detach take this lock too).

> > This doesn't work if ptrace_attach() races with clone(CLONE_STOPPED).
> > ptrace_check_attach() can return the wrong ESRCH after that. Perhaps
> > it is time to kill the CLONE_STOPPED code in do_fork().
>
> Ah, thanks for spotting it.  I missed that.  We should be able to
> convert it to call ptrace_stop(), right?

Perhaps... But then we should wakeup the new child. Perhaps we can
just kill that code, CLONE_STOPPED is deprecated and triggers the
warning since bdff746a (Feb 4 2008).

> > ptrace_check_attach()->wait_on_bit() logic fixes the previous example,
> > but:
> >
> > 	1. the tracer knows that the tracee is stopped
> >
> > 	2. the tracer does ptrace(ATTACH)
> >
> > 	3. the tracer does do_wait()
> >
> > In this case do_wait() can see the tracee in TASK_RUNNING state,
> > this breaks wait_task_stopped(ptrace => true).
> >
> > Jan?
>
> I see.  I can move the transition wait logic into PTRACE_ATTACH.
> Would that be good enough?

Yes, I thought about this too. But ptrace's semantics is really strange,
even if we move wait_on_bit() into ptrace_attach() we still have a
user-visible change.

sys_ptrace() only works for the single thread who did PTRACE_ATTACH,
but do_wait() should work for its sub-threads.

	1. the tracer knows that the tracee is stopped

	2. the tracer does ptrace(ATTACH)

	3. the tracer's sub-thread does do_wait()

Note! Personally I think we can ignore this "problem", I do not
think it can break anything except some specialized test-case.

> This is also related to how to wait for attach completion for a new
> more transparent attach.  Would it be better for such a request to
> make sure the operation to complete before returning or is it
> preferable to keep using wait(2) for that?  We'll probably be able to
> share the transition wait logic with it.  I think it would be better
> to return after the attach is actually complete but is there any
> reason that I'm missing which makes using wait(2) preferrable?

Oh, I do not know. This is the main problem with ptrace. You can
always understand what the code does, but you can never know what
was the supposed behaviour ;)

That is why I am asking Jan and Roland who understand the userland
needs.

Personally, I _think_ it makes sense to keep do_wait() working after
ptrace_attach(), if it is called by the thread which did attach.
But perhaps even this is not really important.

> @@ -1799,22 +1830,28 @@ static int do_signal_stop(int signr)
> > >  		 */
> > >  		sig->group_exit_code = signr;
> > >
> > > -		current->group_stop = gstop;
> > > +		current->group_stop &= ~GROUP_STOP_SIGMASK;
> > > +		current->group_stop |= signr | gstop;
> > >  		sig->group_stop_count = 1;
> > > -		for (t = next_thread(current); t != current; t = next_thread(t))
> > > +		for (t = next_thread(current); t != current;
> > > +		     t = next_thread(t)) {
> > > +			t->group_stop &= ~GROUP_STOP_SIGMASK;
> > >  			/*
> > >  			 * Setting state to TASK_STOPPED for a group
> > >  			 * stop is always done with the siglock held,
> > >  			 * so this check has no races.
> > >  			 */
> > >  			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
> > > -				t->group_stop = gstop;
> > > +				t->group_stop |= signr | gstop;
> > >  				sig->group_stop_count++;
> > >  				signal_wake_up(t, 0);
> > > -			} else
> > > +			} else {
> > >  				task_clear_group_stop(t);
> >
> > This looks racy. Suppose that "current" is ptraced, in this case
> > it can initiate the new group-stop even if SIGNAL_STOP_STOPPED
> > is set and we have another TASK_STOPPED thead T.
> >
> > Suppose that another (or same) debugger ataches to this thread T,
> > wakes it up and sets GROUP_STOP_TRAPPING.
> >
> > T resumes, calls ptrace_stop() in TASK_STOPPED, and temporary drops
> > ->siglock.
> >
> > Now, this task_clear_group_stop(T) confuses ptrace_check_attach(T).
> >
> > I think ptrace_stop() should be called in TASK_RUNNING state.
> > This also makes sense because we may call arch_ptrace_stop().
>
> I'm feeling a bit too dense to process the above right now.  I'll
> respond to the above next morning after a strong cup of coffee. :-)

OK ;)

But look. Even if the race doesn't exist. ptrace_stop() can drop
->siglock and call arch_ptrace_stop() which can fault/sleep/whatever.
I think this doesn't really matter, but otoh it would be more clean
to do this in TASK_RUNNING state anyway. At least, in anny case
arch_ptrace_stop() can return in TASK_RUNNING.

> > > @@ -1842,7 +1879,18 @@ static int do_signal_stop(int signr)
> > >
> > >  		spin_lock_irq(&current->sighand->siglock);
> > >  	} else
> > > -		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
> > > +		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
> > > +			    CLD_STOPPED, 0, NULL);
> >
> > Perhaps it would be more clean to clear ->exit_code here, in the
> > "else" branch.
>
> Hmmm... and dropping current->exit_code clearing from the
> do_signal_stop(), right?  I'm a bit confused about the use of
> current->exit_code tho.

Oh, the right answer is: ptrace shouldn't use ->exit_code at all ;)
And its usage is very confusing.

> Why aren't we clearing it from ptrace_stop()?

ptrace_report_syscall() and ptrace_signal() check ->exit_code after
return from ptrace_stop(), otherwise we ignore the "data" argument
of ptrace_resume/ptrace_detach.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] ptrace: make group stop notification reliable against ptrace
  2010-12-21 17:43     ` Tejun Heo
@ 2010-12-22 11:54       ` Oleg Nesterov
  2010-12-22 15:26         ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-22 11:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/21, Tejun Heo wrote:
>
> On Mon, Dec 20, 2010 at 06:34:25PM +0100, Oleg Nesterov wrote:
> > On 12/06, Tejun Heo wrote:
> > >
> > > This patch adds a new signal flag SIGNAL_NOTIFY_STOP which is set on
> > > group stop completion and cleared on notification to the real parent
> > > or together with other stopped flags on SIGCONT/KILL.  This guarantees
> > > that the real parent is notified correctly regardless of ptrace.
> >
> > OK, I am a bit confused. I do not understand exactly what this
> > "correctly" actually means.
>
> It means that the ptracer wouldn't eat the notification.  The
> notification is buffered and delivered when ptrace detaches.

Yes, I understand. I am a bit worried it is not easy to describe the
new behaviour exactly.

> I see.  My focus was to make ptrace attach/detach transparent.  IOW,
> minimizing the effect of a debugger (or gcore or whatever) attaching
> and then leaving.  So, this patch just makes sure that the
> notification isn't absorbed by a ptracer.

Agreed. And the code itself certainly becomes correct/consistent,
contrary to "everything is broken" we currently have.


Tejun, I'll try to summarize my (very foggy) concerns in a separate
email. Don't get me wrong, I think this series rightly addresses the
numerous problems we have. My only question is, can't we go a bit
further and create the new (and simple) rules. Probably not.

> > > @@ -1901,21 +1925,12 @@ retry:
> > >  	__set_current_state(TASK_STOPPED);
> > >
> > >  	if (likely(!task_ptrace(current))) {
> > > -		int notify = 0;
> > > -
> > > -		/*
> > > -		 * If there are no other threads in the group, or if there
> > > -		 * is a group stop in progress and we are the last to stop,
> > > -		 * report to the parent.
> > > -		 */
> > > -		if (task_participate_group_stop(current))
> > > -			notify = CLD_STOPPED;
> > > -
> > > +		task_participate_group_stop(current);
> > >  		spin_unlock_irq(&current->sighand->siglock);
> > >
> > > -		if (notify) {
> > > +		if (sig->flags & SIGNAL_NOTIFY_STOP) {
> > >  			read_lock(&tasklist_lock);
> > > -			do_notify_parent_cldstop(current, notify);
> > > +			do_notify_parent_cldstop(current, CLD_STOPPED);
> >
> > Suppose that debugger attaches right after spin_unlock(->siglock).
> >
> > Nothing really bad can happen afaics, but in this case the debugger
> > will be notified twice. Hmm. If the debugger does do_wait() immediately
> > after the first notification, it has all rights to see the stopped
> > tracee but wait_task_stopped() fails, not good.
>
> Hmmm?  ptrace_attach() can't happen while tasklist_lock is held.

Sure, but is is not held after we drop ->siglock. And ptrace_attach() can
happen in the window before we take it for do_notify_parent_cldstop().

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace
  2010-12-21 17:48     ` Tejun Heo
@ 2010-12-22 12:16       ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-22 12:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/21, Tejun Heo wrote:
>
> but do you in general agree with
> where it's headed?

Absolutely.

Except. I am not sure we understand the desired behaviour for
CLD_CONTINUED. To the point, perhaps the debugger does not
need it at all?


Again, I'll try to summarize my questions later today.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-22 11:39       ` Oleg Nesterov
@ 2010-12-22 15:14         ` Tejun Heo
  2010-12-22 16:00           ` Oleg Nesterov
  0 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-22 15:14 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Wed, Dec 22, 2010 at 12:39:48PM +0100, Oleg Nesterov wrote:
> > > This doesn't work if ptrace_attach() races with clone(CLONE_STOPPED).
> > > ptrace_check_attach() can return the wrong ESRCH after that. Perhaps
> > > it is time to kill the CLONE_STOPPED code in do_fork().
> >
> > Ah, thanks for spotting it.  I missed that.  We should be able to
> > convert it to call ptrace_stop(), right?
> 
> Perhaps... But then we should wakeup the new child. Perhaps we can
> just kill that code, CLONE_STOPPED is deprecated and triggers the
> warning since bdff746a (Feb 4 2008).

I see.  Added a patch to kill CLONE_STOPPED.

> > I see.  I can move the transition wait logic into PTRACE_ATTACH.
> > Would that be good enough?
> 
> Yes, I thought about this too. But ptrace's semantics is really strange,
> even if we move wait_on_bit() into ptrace_attach() we still have a
> user-visible change.
> 
> sys_ptrace() only works for the single thread who did PTRACE_ATTACH,
> but do_wait() should work for its sub-threads.
> 
> 	1. the tracer knows that the tracee is stopped
> 
> 	2. the tracer does ptrace(ATTACH)
> 
> 	3. the tracer's sub-thread does do_wait()
> 
> Note! Personally I think we can ignore this "problem", I do not
> think it can break anything except some specialized test-case.

But if ptrace(ATTACH) doesn't return until the transition is complete
when the task is already stopped, the tracer's sub-thread's do_wait()
will behave exactly the same.  The only difference would be that
ptrace(ATTACH) may now block and/or is failed by a signal delivery.

How would #3 behave differently if STOPPED -> TRACED transition is
guaranteed to be complete by the end of #2?

> > This is also related to how to wait for attach completion for a new
> > more transparent attach.  Would it be better for such a request to
> > make sure the operation to complete before returning or is it
> > preferable to keep using wait(2) for that?  We'll probably be able to
> > share the transition wait logic with it.  I think it would be better
> > to return after the attach is actually complete but is there any
> > reason that I'm missing which makes using wait(2) preferrable?
> 
> Oh, I do not know. This is the main problem with ptrace. You can
> always understand what the code does, but you can never know what
> was the supposed behaviour ;)
> 
> That is why I am asking Jan and Roland who understand the userland
> needs.
> 
> Personally, I _think_ it makes sense to keep do_wait() working after
> ptrace_attach(), if it is called by the thread which did attach.
> But perhaps even this is not really important.

Hmmm... I see.  After this fix / cleanup rounds are complete, I'll
just write up something.  It would be much easier to decide which way
to go with a working implementation and switching between wait(2)
based one and with implicit wait shouldn't be too difficult anyway.

> > @@ -1799,22 +1830,28 @@ static int do_signal_stop(int signr)
> > > >  		 */
> > > >  		sig->group_exit_code = signr;
> > > >
> > > > -		current->group_stop = gstop;
> > > > +		current->group_stop &= ~GROUP_STOP_SIGMASK;
> > > > +		current->group_stop |= signr | gstop;
> > > >  		sig->group_stop_count = 1;
> > > > -		for (t = next_thread(current); t != current; t = next_thread(t))
> > > > +		for (t = next_thread(current); t != current;
> > > > +		     t = next_thread(t)) {
> > > > +			t->group_stop &= ~GROUP_STOP_SIGMASK;
> > > >  			/*
> > > >  			 * Setting state to TASK_STOPPED for a group
> > > >  			 * stop is always done with the siglock held,
> > > >  			 * so this check has no races.
> > > >  			 */
> > > >  			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
> > > > -				t->group_stop = gstop;
> > > > +				t->group_stop |= signr | gstop;
> > > >  				sig->group_stop_count++;
> > > >  				signal_wake_up(t, 0);
> > > > -			} else
> > > > +			} else {
> > > >  				task_clear_group_stop(t);
> > >
> > > This looks racy. Suppose that "current" is ptraced, in this case
> > > it can initiate the new group-stop even if SIGNAL_STOP_STOPPED
> > > is set and we have another TASK_STOPPED thead T.
> > >
> > > Suppose that another (or same) debugger ataches to this thread T,
> > > wakes it up and sets GROUP_STOP_TRAPPING.
> > >
> > > T resumes, calls ptrace_stop() in TASK_STOPPED, and temporary drops
> > > ->siglock.

On resume, T is in TASK_RUNNING and the lock is only dropped in
ptrace_stop() if arch_ptrace_stop_needed() is true.

> > > Now, this task_clear_group_stop(T) confuses ptrace_check_attach(T).
> > >
> > > I think ptrace_stop() should be called in TASK_RUNNING state.
> > > This also makes sense because we may call arch_ptrace_stop().
> >
> > I'm feeling a bit too dense to process the above right now.  I'll
> > respond to the above next morning after a strong cup of coffee. :-)
> 
> OK ;)
> 
> But look. Even if the race doesn't exist. ptrace_stop() can drop
> ->siglock and call arch_ptrace_stop() which can fault/sleep/whatever.

So, yes, the temporary lock dropping can definitely confuse
ptrace_check_attach().

> I think this doesn't really matter, but otoh it would be more clean
> to do this in TASK_RUNNING state anyway. At least, in anny case
> arch_ptrace_stop() can return in TASK_RUNNING.

I agree.  I'll update toward that direction.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2
  2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
                   ` (16 preceding siblings ...)
  2010-12-14 17:36 ` [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Oleg Nesterov
@ 2010-12-22 15:20 ` Oleg Nesterov
  17 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-22 15:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/21, Tejun Heo wrote:
>
> > Or. We can change the rules for ptrace_resume(), more on this later.
>
> You haven't written this yet, right?  (I reconfigured / migrated my
> mail setup during past few days so things are still a bit shaky.)

I am moving this to 0/16 to get more attention from everyone.

First of all, I'd like to clarify that I am not arguing with these
changes. Quite contrary, I think this is the good step in the right
direction imho. In this email, I do not try to comment this series,
I am going to ask the questions.

My concern is: we never tried to discuss the desired behaviour as
it seen by the user-space.



To simplify the discussion, let's assume that debugger != real_parent.
Now, what should we actually do if the tracee starts/completes the
group stop?

To me, the only obvious thing is that each thread should report
CLD_STOPPED to the debugger. Everything else is not clear to me. How
and when we should notify real_parent? What should we do if tracee
is multithreaded and some threads are not traced? (in the latter
case we can't know which thread completes the group stop and sends
the final notification).

Probably we can delay this notification until the debugger detaches
all threads. This makes sense because the debugger can resume the
stopped thread and confuse its real_parent (say, /bin/sh) who has
all rights to assume the child can't run without the subsequent
CLD_CONTINUED. However, this doesn't look very good. This doesn't
allow to write the "really transparent" strace, if the tracee was
stopped by SIGSTOP this should be visible to its real_parent who
probably owns this application and should react (again, sh/fg/bg).

So. I think that probably we need some very simple and predictable
behaviour, even if this implies the user-visible changes. If nothing
else, any fix in this area is visible to user-space. To me, the
best behaviour is

	- each thread notifies the debugger (if it is traced)

	- when the last thread stops, it also notifies its
	  real_parent. IOW, it can send two notifications if
	  it is traced.

	  (This differs from the logic in 12/16)

But, again, this means we are trying to fool the poor real_parent
who does do_wait() and doesn't expect the child can suddenly run
because of PTRACE_CONT/etc which does the unconditional wakeup.

A bit off-topic, but can't resist. I like very much what utrace
does in this case. Since it doesn't use these notifications (in
fact it doesn't use signals/reparenting at all) we do not have
any problems with parent/real_parent mess. And, utrace does _not_
resume the stopped tracee. If the debugger wants to resume a
thread in the SIGNAL_STOP_STOPPED group, it should send SIGCONT
and this is visible to the real_parent. But of course, we can't
change ptrace this way.

However. Any chance we can change ptrace_resume() so that it won't
break SIGNAL_STOP_STOPPED contract? Roughly, instead of unconditional
wake_up_process(child) ptrace_resume() should do

	if (child->signal->flags & SIGNAL_STOP_STOPPED)
		prepare_signal(SIGCONT);
	wake_up_state(child, __TASK_TRACED);

(of course, we should not literally use prepare_signal(), only to
 explain what I mean).

IOW, if we are going to resume the tracee and its thread group
is stopped, we notify the real_parent and wakeup all TASK_STOPPED
(or non-ptraced) sub-threads.

Sure, this is the serious change. But otherwise, imho whatever we
do the end result is not sane.

Thoughts?


As for CLD_CONTINUED, basically the same questions (in particular,
I think that real_parent should be notified unconditionally). Except,
perhaps the debugger doesn't need it at all?

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] ptrace: make group stop notification reliable against ptrace
  2010-12-22 11:54       ` Oleg Nesterov
@ 2010-12-22 15:26         ` Tejun Heo
  2010-12-22 16:02           ` Oleg Nesterov
  0 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-22 15:26 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello,

On Wed, Dec 22, 2010 at 12:54:09PM +0100, Oleg Nesterov wrote:
> On 12/21, Tejun Heo wrote:
> > I see.  My focus was to make ptrace attach/detach transparent.  IOW,
> > minimizing the effect of a debugger (or gcore or whatever) attaching
> > and then leaving.  So, this patch just makes sure that the
> > notification isn't absorbed by a ptracer.
> 
> Agreed. And the code itself certainly becomes correct/consistent,
> contrary to "everything is broken" we currently have.
> 
> Tejun, I'll try to summarize my (very foggy) concerns in a separate
> email. Don't get me wrong, I think this series rightly addresses the
> numerous problems we have. My only question is, can't we go a bit
> further and create the new (and simple) rules. Probably not.

Yeah, definitely, if we're gonna make some userland visible changes,
let's get it right once and for all (at least in terms of the intended
behaviors, that is).  It's sure gonna take more work but I think it'll
be manageable both in terms of the required effort and implementation
complexity.  Well, actually, in terms of the latter, I think we're
likely to improve the situation by making the rules and intentions
clear.

> > > > @@ -1901,21 +1925,12 @@ retry:
> > > >  	__set_current_state(TASK_STOPPED);
> > > >
> > > >  	if (likely(!task_ptrace(current))) {
> > > > -		int notify = 0;
> > > > -
> > > > -		/*
> > > > -		 * If there are no other threads in the group, or if there
> > > > -		 * is a group stop in progress and we are the last to stop,
> > > > -		 * report to the parent.
> > > > -		 */
> > > > -		if (task_participate_group_stop(current))
> > > > -			notify = CLD_STOPPED;
> > > > -
> > > > +		task_participate_group_stop(current);
> > > >  		spin_unlock_irq(&current->sighand->siglock);
> > > >
> > > > -		if (notify) {
> > > > +		if (sig->flags & SIGNAL_NOTIFY_STOP) {
> > > >  			read_lock(&tasklist_lock);
> > > > -			do_notify_parent_cldstop(current, notify);
> > > > +			do_notify_parent_cldstop(current, CLD_STOPPED);
> > >
> > > Suppose that debugger attaches right after spin_unlock(->siglock).
> > >
> > > Nothing really bad can happen afaics, but in this case the debugger
> > > will be notified twice. Hmm. If the debugger does do_wait() immediately
> > > after the first notification, it has all rights to see the stopped
> > > tracee but wait_task_stopped() fails, not good.
> >
> > Hmmm?  ptrace_attach() can't happen while tasklist_lock is held.
> 
> Sure, but is is not held after we drop ->siglock. And ptrace_attach() can
> happen in the window before we take it for do_notify_parent_cldstop().

I thought the code snippet was from inside do_notify_parent_cldstop()
for some reason.  Okay, so the debugger can attach there
and... hmmm... right.  Yeah, the debugger gets the extra notification.
I don't think the previous code fared any better tho.  Anyways, I'll
think about how to fix this.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-22 15:14         ` Tejun Heo
@ 2010-12-22 16:00           ` Oleg Nesterov
  2010-12-22 16:21             ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-22 16:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/22, Tejun Heo wrote:
>
> On Wed, Dec 22, 2010 at 12:39:48PM +0100, Oleg Nesterov wrote:
> >
> > sys_ptrace() only works for the single thread who did PTRACE_ATTACH,
> > but do_wait() should work for its sub-threads.
> >
> > 	1. the tracer knows that the tracee is stopped
> >
> > 	2. the tracer does ptrace(ATTACH)
> >
> > 	3. the tracer's sub-thread does do_wait()
> >
> > Note! Personally I think we can ignore this "problem", I do not
> > think it can break anything except some specialized test-case.
>
> But if ptrace(ATTACH) doesn't return until the transition is complete
> when the task is already stopped, the tracer's sub-thread's do_wait()
> will behave exactly the same.  The only difference would be that
> ptrace(ATTACH) may now block and/or is failed by a signal delivery.
>
> How would #3 behave differently if STOPPED -> TRACED transition is
> guaranteed to be complete by the end of #2?

Ahhh, sorry. I meant, two threads can do 2. and 3. at the same time.

But let me repeat, it is not that I think we should worry. I mentioned
this only because I think it is better to discuss everything we can,
even the really minor things.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/16] ptrace: make group stop notification reliable against ptrace
  2010-12-22 15:26         ` Tejun Heo
@ 2010-12-22 16:02           ` Oleg Nesterov
  0 siblings, 0 replies; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-22 16:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/22, Tejun Heo wrote:
>
> I don't think the previous code fared any better tho.

Heh, the previous code is much worse, sure ;)

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
  2010-12-22 16:00           ` Oleg Nesterov
@ 2010-12-22 16:21             ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-22 16:21 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello,

On Wed, Dec 22, 2010 at 05:00:16PM +0100, Oleg Nesterov wrote:
> On 12/22, Tejun Heo wrote:
> >
> > On Wed, Dec 22, 2010 at 12:39:48PM +0100, Oleg Nesterov wrote:
> > >
> > > sys_ptrace() only works for the single thread who did PTRACE_ATTACH,
> > > but do_wait() should work for its sub-threads.
> > >
> > > 	1. the tracer knows that the tracee is stopped
> > >
> > > 	2. the tracer does ptrace(ATTACH)
> > >
> > > 	3. the tracer's sub-thread does do_wait()
> > >
> > > Note! Personally I think we can ignore this "problem", I do not
> > > think it can break anything except some specialized test-case.
> >
> > But if ptrace(ATTACH) doesn't return until the transition is complete
> > when the task is already stopped, the tracer's sub-thread's do_wait()
> > will behave exactly the same.  The only difference would be that
> > ptrace(ATTACH) may now block and/or is failed by a signal delivery.
> >
> > How would #3 behave differently if STOPPED -> TRACED transition is
> > guaranteed to be complete by the end of #2?
> 
> Ahhh, sorry. I meant, two threads can do 2. and 3. at the same time.

Ah, okay, now I see, so WNOHANG wait(2) from a different thread may
fail if it races against ptrace(ATTACH), although it would have
succeeded regardless of the timing between #2 and #3 before the
change.

> But let me repeat, it is not that I think we should worry. I mentioned
> this only because I think it is better to discuss everything we can,
> even the really minor things.

It's definitely worth documenting.  We can close off the condition by
tweaking wait(2) so that it blocks if the target thread is in
transition, but that might be reaching into the realm of unnecessary
over-engineering.  IMHO, it would be wiser to hold off implementing it
until we know it's actually necessary.  Roland, Jan, do you guys think
this can actually cause a problem?

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2010-12-06 16:56 ` [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
@ 2010-12-23 12:26   ` Oleg Nesterov
  2010-12-23 13:53     ` Tejun Heo
  2011-01-17 22:09     ` Roland McGrath
  0 siblings, 2 replies; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-23 12:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Jan, Roland, this change needs your review.

As it was already discussed, this is the user-visible change, and
I am starting to worry we can underestimate it.

Again, I am not saying this can break something, I simply do not
know. However, I think there is something non-consistent in the
new behaviour, please see below.

On 12/06, Tejun Heo wrote:
>
> This patch makes do_signal_stop() test whether the task is ptraced and
> use ptrace_stop() if so.

In short: suppose that the tracee recieves a signal, reports it
to debugger, and the debugger does ptrace(PTRACE_CONT, pid, SIGSTOP).

Before the patch the tracee sleeps in TASK_STOPPED, after the patch
it becomes TASK_TRACED.

> Oleg spotted a minor userland visible change.  In some cases, the
> ptracee's state would now be TASK_TRACED where it used to be
> TASK_STOPPED, which is visible via fs/proc.

I missed this part of the changelog. "visible via fs/proc" is not
the only change. Another change is how the tracee reacts to SIGCONT
after that. With this patch it can't wake the tracee up.

Consider the simplest example. The tracee is single-threaded and
debugger == parent. Something like

	int main(void)
	{
		int child, status;

		child = fork();
		if (!child) {
			ptrace(PTRACE_TRACEME);

			kill(getpid(), SIGSTOP);

			return 0;
		}

		wait(&status)
		// the tracee reports the signal
		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);
		// it should stop after that
		ptrace(PTRACE_CONT, child, SIGSTOP);

		wait(&status);
		// now it is stopped
		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);

		kill(child, SIGCONT);

		wait(&status);
		assert(WIFSTOPPED() && WSTOPSIG() == SIGCONT);

This won't work with this patch. the last do_wait() will hang forever.
Probably this is fine, I do not know. Please take a look and ack/nack
explicitly.


However. There is something I missed previously, and this small
detail doesn't look good to me: the behaviour of SIGCONT becomes
a bit unpredictable. Suppose it races with do_signal_stop() and
clears GROUP_STOP_PENDING or SIGNAL_STOP_DEQUEUED before, in this
case in can "wakeup" the tracee.

IOW. Let's remove the 2nd wait() in the code above, the parent
does

		wait(&status)
		// the tracee reports the signal
		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);
		// it should stop after that
		ptrace(PTRACE_CONT, child, SIGSTOP);

		kill(child, SIGCONT);

Now we can't know id this SIGCONT works or not. If the tracee
is already parked in ptrace_stop() - it doesn't. If the parent
wins - the tracee doesn't stop.



OTOH. Looking at this patch, I can no longer understand why
ptrace_check_attach() can silently do s/STOPPED/TRACED/. Indeed,
as Tejun pointed out, if ptrace_stop() needs arch_ptrace_stop(),
then ptrace_check_attach() should be arch-friendly as well.

So, the patch looks like the bugfix, but I do not understand this
ia64/sparc magic and thus I do not know how important this fix.
Nobody complained so far, though.

Roland, could you comment this part?

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2010-12-23 12:26   ` Oleg Nesterov
@ 2010-12-23 13:53     ` Tejun Heo
  2010-12-23 16:06       ` Oleg Nesterov
  2011-01-17 22:09     ` Roland McGrath
  1 sibling, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2010-12-23 13:53 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello,

On Thu, Dec 23, 2010 at 01:26:34PM +0100, Oleg Nesterov wrote:
> I missed this part of the changelog. "visible via fs/proc" is not
> the only change. Another change is how the tracee reacts to SIGCONT
> after that. With this patch it can't wake the tracee up.
> 
> Consider the simplest example. The tracee is single-threaded and
> debugger == parent. Something like
> 
> 	int main(void)
> 	{
> 		int child, status;
> 
> 		child = fork();
> 		if (!child) {
> 			ptrace(PTRACE_TRACEME);
> 
> 			kill(getpid(), SIGSTOP);
> 
> 			return 0;
> 		}
> 
> 		wait(&status)
> 		// the tracee reports the signal
> 		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);
> 		// it should stop after that
> 		ptrace(PTRACE_CONT, child, SIGSTOP);
> 
> 		wait(&status);
> 		// now it is stopped
> 		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);
> 
> 		kill(child, SIGCONT);
> 
> 		wait(&status);
> 		assert(WIFSTOPPED() && WSTOPSIG() == SIGCONT);
> 
> This won't work with this patch. the last do_wait() will hang forever.
> Probably this is fine, I do not know. Please take a look and ack/nack
> explicitly.

Yes, before the change, the task would respond to SIGCONT before the
first ptrace request succeeds after attach.  To me, this doesn't seem
to be anything intentional tho.  It seems a lot of ptrace and group
stop interactions is in the grey area with only the current (quirky,
I'm afraid) behavior drawing almost arbitrary lines across different
behaviors.

We can try to preserve all those behaviors but I don't think that will
be very beneficial.  I think the best way to proceed would be
identifying the sensible and depended upon assumptions and then draw
clear lines from there stating what's guaranteed and what's undefined.
We'll have to keep some of quirkiness for sure.

Anyways, pondering and verifying all the possibly visible changes
definitely is necessary, but that said, we fortunately have rather
limited number of ptrace users and their usages don't seem to be too
wild (at least on my cursory investigation), so I think it to be
doable without breaking anything noticeably.  But yeap we definitely
need to be careful.

> However. There is something I missed previously, and this small
> detail doesn't look good to me: the behaviour of SIGCONT becomes
> a bit unpredictable. Suppose it races with do_signal_stop() and
> clears GROUP_STOP_PENDING or SIGNAL_STOP_DEQUEUED before, in this
> case in can "wakeup" the tracee.
> 
> IOW. Let's remove the 2nd wait() in the code above, the parent
> does
> 
> 		wait(&status)
> 		// the tracee reports the signal
> 		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);
> 		// it should stop after that
> 		ptrace(PTRACE_CONT, child, SIGSTOP);
> 
> 		kill(child, SIGCONT);
> 
> Now we can't know id this SIGCONT works or not. If the tracee
> is already parked in ptrace_stop() - it doesn't. If the parent
> wins - the tracee doesn't stop.

We can change ptrace_stop() to use TASK_STOPPED if it's stopping for
group stop to preserve the original behavior but if it doesn't disturb
current users (and I doubt it would), I think it would be far cleaner
to state that the behavior is undefined.  The current behavior - it
works if there hasn't been whichever ptrace operation inbetween - is
quite unexpected anyway, IMHO.

And, for longer term, I think it would be a good idea to separate
group stop and ptrace trap mechanisms, so that ptrace trap works
properly on per-task level and properly transparent from group stop
handling.  The intertwining between the two across different domains
of threads inhfferently involves a lot of grey areas where there is no
good intuitive behavior.

> OTOH. Looking at this patch, I can no longer understand why
> ptrace_check_attach() can silently do s/STOPPED/TRACED/. Indeed,
> as Tejun pointed out, if ptrace_stop() needs arch_ptrace_stop(),
> then ptrace_check_attach() should be arch-friendly as well.
> 
> So, the patch looks like the bugfix, but I do not understand this
> ia64/sparc magic and thus I do not know how important this fix.
> Nobody complained so far, though.

IIUC, it dumps the register window to userland memory.  ia64 has this
stacked windows of registers which gets wound up and unwound according
to function calls and those need to be dumped to userland memory so
that the debugger can PEEK and POKE them.  Not really sure why
skipping it didn't cause any problem until now tho.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2010-12-23 13:53     ` Tejun Heo
@ 2010-12-23 16:06       ` Oleg Nesterov
  2010-12-23 16:33         ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Oleg Nesterov @ 2010-12-23 16:06 UTC (permalink / raw)
  To: Tejun Heo; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On 12/23, Tejun Heo wrote:
>
> > 	int main(void)
> > 	{
> > 		int child, status;
> >
> > 		child = fork();
> > 		if (!child) {
> > 			ptrace(PTRACE_TRACEME);
> >
> > 			kill(getpid(), SIGSTOP);
> >
> > 			return 0;
> > 		}
> >
> > 		wait(&status)
> > 		// the tracee reports the signal
> > 		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);
> > 		// it should stop after that
> > 		ptrace(PTRACE_CONT, child, SIGSTOP);
> >
> > 		wait(&status);
> > 		// now it is stopped
> > 		assert(WIFSTOPPED() && WSTOPSIG() == SIGSTOP);
> >
> > 		kill(child, SIGCONT);
> >
> > 		wait(&status);
> > 		assert(WIFSTOPPED() && WSTOPSIG() == SIGCONT);
> >
> > This won't work with this patch. the last do_wait() will hang forever.
> > Probably this is fine, I do not know. Please take a look and ack/nack
> > explicitly.
>
> Yes, before the change, the task would respond to SIGCONT before the
> first ptrace request succeeds after attach.

Not exactly. But perhaps you meant that even without this change,
any ptrace() request after ptrace(PTRACE_CONT, SIGSTOP) will change
child->state = TASK_TRACED, and kill(SIGCONT) won't work after that.

> To me, this doesn't seem
> to be anything intentional tho.  It seems a lot of ptrace and group
> stop interactions is in the grey area with only the current (quirky,
> I'm afraid) behavior drawing almost arbitrary lines across different
> behaviors.

Agreed.

However. Strangely, I didn't think about this before. With this
change, it is not possible to trace/debug the application so that
it can properly react to SIGCONT. Yes, currently we have a lot
more problems here, including do_wait, so probably this doesn't
matter.

Still I'd like to know what Jan and Roland think. I am paranoid,
yes ;)

> Anyways, pondering and verifying all the possibly visible changes
> definitely is necessary, but that said, we fortunately have rather
> limited number of ptrace users and their usages don't seem to be too
> wild (at least on my cursory investigation), so I think it to be
> doable without breaking anything noticeably.  But yeap we definitely
> need to be careful.

Yes, at least I think it makes sense to document this change in the
changelog. This can simplify the life if we have a bug report blaiming
this patch.

> And, for longer term, I think it would be a good idea to separate
> group stop and ptrace trap mechanisms, so that ptrace trap works
> properly on per-task level and properly transparent from group stop
> handling.  The intertwining between the two across different domains
> of threads inhfferently involves a lot of grey areas where there is no
> good intuitive behavior.

Agreed.

> Not really sure why
> skipping it didn't cause any problem until now tho.

Yes, that was my question.

Oleg.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2010-12-23 16:06       ` Oleg Nesterov
@ 2010-12-23 16:33         ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2010-12-23 16:33 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

On Thu, Dec 23, 2010 at 05:06:48PM +0100, Oleg Nesterov wrote:
> > Yes, before the change, the task would respond to SIGCONT before the
> > first ptrace request succeeds after attach.
> 
> Not exactly. But perhaps you meant that even without this change,
> any ptrace() request after ptrace(PTRACE_CONT, SIGSTOP) will change
> child->state = TASK_TRACED, and kill(SIGCONT) won't work after that.

Yeap, exactly.

> > To me, this doesn't seem to be anything intentional tho.  It seems
> > a lot of ptrace and group stop interactions is in the grey area
> > with only the current (quirky, I'm afraid) behavior drawing almost
> > arbitrary lines across different behaviors.
> 
> Agreed.
> 
> However. Strangely, I didn't think about this before. With this
> change, it is not possible to trace/debug the application so that
> it can properly react to SIGCONT. Yes, currently we have a lot
> more problems here, including do_wait, so probably this doesn't
> matter.
> 
> Still I'd like to know what Jan and Roland think. I am paranoid,
> yes ;)

Definitely.

Hmm... the mention of SIGCONT behavior reminds me of your mention of
notification behavior on the other message.  I might write a more
detailed reply there but, anyways, I don't think we'll be able to find
one good behavior with the current set of operations.  If we hide the
group stop handling and notification from the ptracer, it would allow,
for example, transparent strace, but it will at the same time restrict
what a debugger can do behind the tracee's back in a regressive way
(ie. the current users would notice).

I think it would be better to introduce a few new ptrace operations
and give them more control in clearly defined and hopefully intutive
way.  For example, a ptracer is notified when a task is stopping for
group stop and it should be able to decide and tell the kernel whether
to participate in the group stop or not.  For SIGCONT, likewise, the
kernel can notify all ptracers in the group and can notify the real
parent if all ptracers decide to participate in the continuation.

That way, debuggers can still work behind the real parent's back (they
often want to) while audit type tools can sniff transparently.  I
think the trickier part would be adapting the existing operations so
that they have clearly defined (mostly) compatible behavior while not
hindering too much with addition of new capabilities.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2010-12-23 12:26   ` Oleg Nesterov
  2010-12-23 13:53     ` Tejun Heo
@ 2011-01-17 22:09     ` Roland McGrath
  2011-01-27 13:56       ` Tejun Heo
  1 sibling, 1 reply; 62+ messages in thread
From: Roland McGrath @ 2011-01-17 22:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

> In short: suppose that the tracee recieves a signal, reports it
> to debugger, and the debugger does ptrace(PTRACE_CONT, pid, SIGSTOP).
> 
> Before the patch the tracee sleeps in TASK_STOPPED, after the patch
> it becomes TASK_TRACED.

I am not immediately comfortable with this.  What really matters is
the userland-visible behavior (ptrace, wait, and signals).  The
/proc/pid/status difference of "T (stopped)" vs "T (traced)" is
technically userland-visible, but I don't think that really matters
to userland.

> I missed this part of the changelog. "visible via fs/proc" is not
> the only change. Another change is how the tracee reacts to SIGCONT
> after that. With this patch it can't wake the tracee up.

This seems like a problem to me, though there are caveats to that.
My rationale has always been that there should be some way to get
into normal job-control stop state, both meaning that group-stop
happens normally and that SIGCONT works normally.  

However, it is not entirely clear how useful that is to userland
given the rest of ptrace behavior as it stands.  That is, the real
parent can't see the job-control stop anyway because its wait
reports are preempted by ptrace, so what is it for?

It would be more clear if we had a cleaner interface available in
which tracing did not prevent normal job-control stops from being
reported to the real parent.  I'm not really sure how much we should
try to change this stuff before adding something like that.

> However. There is something I missed previously, and this small
> detail doesn't look good to me: the behaviour of SIGCONT becomes
> a bit unpredictable. Suppose it races with do_signal_stop() and
> clears GROUP_STOP_PENDING or SIGNAL_STOP_DEQUEUED before, in this
> case in can "wakeup" the tracee.

This sounds to me exactly like the normal race between a SIGSTOP and
a SIGCONT when ptrace is not involved.  It's supposed to be that
way, i.e. generating SIGCONT clears all stop signals whether pending
or in the midst of delivery or already fully delivered.  So, if the
meaning of PTRACE_CONT,SIGSTOP is to behave like a normal SIGSTOP
delivery, then that is what's expected.

> OTOH. Looking at this patch, I can no longer understand why
> ptrace_check_attach() can silently do s/STOPPED/TRACED/. 

The purpose of that is so that when ptrace is in use, SIGCONT can't
resume the tracee.  If the tracee was in TASK_STOPPED either because
it was already in job-control stop before PTRACE_ATTACH, or because
PTRACE_CONT,SIGSTOP put it there, then we need it to be in TASK_TRACED
while ptrace does its work in case an external SIGCONT comes along.

> Indeed, as Tejun pointed out, if ptrace_stop() needs
> arch_ptrace_stop(), then ptrace_check_attach() should be
> arch-friendly as well.

Yes, perhaps so.  It's certainly the case that the lack of complaint
from ia64/sparc users doesn't mean it's correct now, just that this
case is subtle, difficult to notice in practice, and subject to other
external conditions (other load causing flushes or not, etc.).

But the way the code is structured, arch_ptrace_stop can only be used
on current, meaning it requires waking it up from TASK_STOPPED long
enough to run the ptrace_stop logic.  IMHO it would be wrong to
introduce a new window visible to userland wherein a process
previously in job-control stop is seen to be running before it gets
into its ptrace stop.  It might not matter if /proc/pid/status can be
read to see "R (running)" for such a window.  But certainly it should
not have CLD_CONTINUED behavior (SIGCHLD and waitid visibility), nor
report a CLD_TRAPPED stop via SIGCHLD or wait after it was already stopped.

[tj:]
> Yes, before the change, the task would respond to SIGCONT before the
> first ptrace request succeeds after attach.  To me, this doesn't seem
> to be anything intentional tho.  It seems a lot of ptrace and group
> stop interactions is in the grey area with only the current (quirky,
> I'm afraid) behavior drawing almost arbitrary lines across different
> behaviors.

That is largely true.  But userland programs like gdb and strace have
gone to lots of contortions to work with the behavior as it is, and we
should not break what works in practice for them.

> We can try to preserve all those behaviors but I don't think that will
> be very beneficial.  I think the best way to proceed would be
> identifying the sensible and depended upon assumptions and then draw
> clear lines from there stating what's guaranteed and what's undefined.
> We'll have to keep some of quirkiness for sure.

I agree with this overall.  But we must consider whatever weirdness
comes up in ptrace.

> Anyways, pondering and verifying all the possibly visible changes
> definitely is necessary, but that said, we fortunately have rather
> limited number of ptrace users and their usages don't seem to be too
> wild (at least on my cursory investigation), so I think it to be
> doable without breaking anything noticeably.  But yeap we definitely
> need to be careful.

There are more ptrace users than you might think, though indeed there
are few of them that are widely-known and free code (strace and gdb).
Most every ptrace user out there has worked with the manifest behavior
seen on existing kernels rather than trying to understand any sound
principles of behavior, because it's been to hard to figure out.

> And, for longer term, I think it would be a good idea to separate
> group stop and ptrace trap mechanisms, so that ptrace trap works
> properly on per-task level and properly transparent from group stop
> handling.  The intertwining between the two across different domains
> of threads inhfferently involves a lot of grey areas where there is no
> good intuitive behavior.

I quite agree.  The best way to fix this is with sane semantics for
normal job-control operations when traced.  But ptrace has never had
that before and it's just not possible both to be sane and to match
the expected behavior of the traditional ptrace semantics.  Hence I
lean towards leaving things as much as possible as they are today when
only the ptrace operations we have today are used.

> IIUC, it dumps the register window to userland memory.  ia64 has this
> stacked windows of registers which gets wound up and unwound according
> to function calls and those need to be dumped to userland memory so
> that the debugger can PEEK and POKE them.  Not really sure why
> skipping it didn't cause any problem until now tho.

It's not necessarily that it didn't cause any problem.  It's more
likely that it's hard to notice, because the problems are so arcane.
It's also entirely possible that the remaining problematic cases just
happen to be vanishingly rare because of other factors--e.g., between
when something entered job-control stop and when ptrace was used, the
system is likely to have had enough other pressure on its magical
register caches that they got flushed anyway.  Moreover, the current
method with arch_ptrace_stop was only added relatively recently--on
the time scale for the actual users of these machines (in 2.6.28 for
sparc, in 2.6.25 for ia64).  Before that, ia64's PTRACE_PEEKDATA et al
used terrible arch-specific kludges to read the unflushed register
data rather than the real user memory (I don't know if sparc had
something similar).

> I think it would be better to introduce a few new ptrace operations
> and give them more control in clearly defined and hopefully intutive
> way.  For example, a ptracer is notified when a task is stopping for
> group stop and it should be able to decide and tell the kernel whether
> to participate in the group stop or not.  For SIGCONT, likewise, the
> kernel can notify all ptracers in the group and can notify the real
> parent if all ptracers decide to participate in the continuation.

I certainly agree that we should have some new operations or modes of
tracing and have them behave sanely.  I don't think the details you've
described here are exactly what would make sense, however.

IMHO there should be no discretion about whether a thread participates
in a group stop.  If a group stop is happening, every thread must be
stopped ASAP and the debugger should not be able to interfere with
that.  (The time for the debugger to interfere is when it gets the
chance to intercept the stop signal before it's delivered.)  If a
thread is stop for tracing when a group stop happens, then the right
way to think about its state is "both stopped for tracing and stopped
for job-control".  When the tracer says it can resume, it goes to
simply "stopped for job-control".  Likewise, if before that a SIGCONT
is generated, then it goes to simply "stopped for tracing" and does
not prevent the normal CLD_CONTINUED notification from happening.
That is, to the non-debugger parts of the system, it's no different
from if the debugger stopped that thread for tracing immediately after
SIGCONT resumed it and before it had a chance to do anything else.
(This is exactly the model we tried to implement in utrace.)

I've read over the rest of the long thread from December and will
reply further to the later iterations of the patches shortly.  I'm
sorry if I've replied to stale issues or overlooked questions still
needing answers.  I hope we can all get caught up to discussing the
same things this week.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2011-01-17 22:09     ` Roland McGrath
@ 2011-01-27 13:56       ` Tejun Heo
  2011-01-28 20:30         ` Roland McGrath
  0 siblings, 1 reply; 62+ messages in thread
From: Tejun Heo @ 2011-01-27 13:56 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Oleg Nesterov, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello, Roland.

On Mon, Jan 17, 2011 at 02:09:23PM -0800, Roland McGrath wrote:
> > In short: suppose that the tracee recieves a signal, reports it
> > to debugger, and the debugger does ptrace(PTRACE_CONT, pid, SIGSTOP).
> > 
> > Before the patch the tracee sleeps in TASK_STOPPED, after the patch
> > it becomes TASK_TRACED.
> 
> I am not immediately comfortable with this.  What really matters is
> the userland-visible behavior (ptrace, wait, and signals).  The
> /proc/pid/status difference of "T (stopped)" vs "T (traced)" is
> technically userland-visible, but I don't think that really matters
> to userland.
> 
> > I missed this part of the changelog. "visible via fs/proc" is not
> > the only change. Another change is how the tracee reacts to SIGCONT
> > after that. With this patch it can't wake the tracee up.
> 
> This seems like a problem to me, though there are caveats to that.
> My rationale has always been that there should be some way to get
> into normal job-control stop state, both meaning that group-stop
> happens normally and that SIGCONT works normally.  
> 
> However, it is not entirely clear how useful that is to userland
> given the rest of ptrace behavior as it stands.  That is, the real
> parent can't see the job-control stop anyway because its wait
> reports are preempted by ptrace, so what is it for?
> 
> It would be more clear if we had a cleaner interface available in
> which tracing did not prevent normal job-control stops from being
> reported to the real parent.  I'm not really sure how much we should
> try to change this stuff before adding something like that.

I think we're getting a bit off course here.  The CONT problem in this
context isn't generic at all.  In principle, CONT doesn't
(immediately) work on a ptraced task.  We basically just had this
window where STOPPED -> TRACED transition was delayed in which CONT
worked.  It's more of a behavioral inconsistency than anything else.

If a task was RUNNING when PTRACE_ATTACH happens, when the tracer
wait(2) completes, the task is TRACED and won't immediately respond to
CONT.  If a task was STOPPED, CONT can still wake up between the
completion of wait(2) and the first ptrace command on the tracee.
It's a subtle inconsistency which I don't believe anyone could have
taken advantag of.

As for the generic discussion of group stop vs. ptrace, yeah, there's
a lot to discuss and as I already wrote before I don't think there
will be a single behavior which would satisfy all the requirements.  I
think it would be best to iron out the existing inconsistencies in a
way which doesn't break the current users and then gradually introduce
a few more ptrace operations which have well defined and more flexible
interaction with group stop.

> But the way the code is structured, arch_ptrace_stop can only be used
> on current, meaning it requires waking it up from TASK_STOPPED long
> enough to run the ptrace_stop logic.  IMHO it would be wrong to
> introduce a new window visible to userland wherein a process
> previously in job-control stop is seen to be running before it gets
> into its ptrace stop.  It might not matter if /proc/pid/status can be
> read to see "R (running)" for such a window.  But certainly it should
> not have CLD_CONTINUED behavior (SIGCHLD and waitid visibility), nor
> report a CLD_TRAPPED stop via SIGCHLD or wait after it was already stopped.

As I wrote in another reply, the visibility is masked from the tracer
and only visible if the user does a pretty convoluted thing.  Unless
there's such current user, I think we can clearly define what's
guaranteed regarding ptrace/wait interaction and leave cases outside
of it as undefined.

> [tj:]
> > Yes, before the change, the task would respond to SIGCONT before the
> > first ptrace request succeeds after attach.  To me, this doesn't seem
> > to be anything intentional tho.  It seems a lot of ptrace and group
> > stop interactions is in the grey area with only the current (quirky,
> > I'm afraid) behavior drawing almost arbitrary lines across different
> > behaviors.
> 
> That is largely true.  But userland programs like gdb and strace have
> gone to lots of contortions to work with the behavior as it is, and we
> should not break what works in practice for them.

Sure, I definitely agree but at the same time that doesn't mean we
shouldn't improve ptrace at all.  It just means it's delicate and we
should proceed carefully, which we shall.

> > We can try to preserve all those behaviors but I don't think that will
> > be very beneficial.  I think the best way to proceed would be
> > identifying the sensible and depended upon assumptions and then draw
> > clear lines from there stating what's guaranteed and what's undefined.
> > We'll have to keep some of quirkiness for sure.
> 
> I agree with this overall.  But we must consider whatever weirdness
> comes up in ptrace.

Yeap.

> I quite agree.  The best way to fix this is with sane semantics for
> normal job-control operations when traced.  But ptrace has never had
> that before and it's just not possible both to be sane and to match
> the expected behavior of the traditional ptrace semantics.  Hence I
> lean towards leaving things as much as possible as they are today when
> only the ptrace operations we have today are used.

Yeap, definitely.  I want to make things just clean enough so that it
doesn't hinder introduction of improvements while not breaking the
existing users, but things like silent STOPPED -> TRACED transition
are outright buggy and have to be fixed one way or the other.  We
can't ignore them.

> I certainly agree that we should have some new operations or modes of
> tracing and have them behave sanely.  I don't think the details you've
> described here are exactly what would make sense, however.
> 
> IMHO there should be no discretion about whether a thread participates
> in a group stop.  If a group stop is happening, every thread must be
> stopped ASAP and the debugger should not be able to interfere with
> that.  (The time for the debugger to interfere is when it gets the
> chance to intercept the stop signal before it's delivered.)  If a
> thread is stop for tracing when a group stop happens, then the right
> way to think about its state is "both stopped for tracing and stopped
> for job-control".  When the tracer says it can resume, it goes to
> simply "stopped for job-control".  Likewise, if before that a SIGCONT
> is generated, then it goes to simply "stopped for tracing" and does
> not prevent the normal CLD_CONTINUED notification from happening.
> That is, to the non-debugger parts of the system, it's no different
> from if the debugger stopped that thread for tracing immediately after
> SIGCONT resumed it and before it had a chance to do anything else.
> (This is exactly the model we tried to implement in utrace.)

There are two extremes.  You can put ptrace under group stop or
completely the other way around.  I think there are valid use cases
for both of them.  I'll try to summarize it later.

> I've read over the rest of the long thread from December and will
> reply further to the later iterations of the patches shortly.  I'm
> sorry if I've replied to stale issues or overlooked questions still
> needing answers.  I hope we can all get caught up to discussing the
> same things this week.

Heh, I was the late one this time.  I'll soon repost the refreshed
STOPPED <-> TRACED patchset.  I think we're mostly in agreement
regarding that part.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2011-01-27 13:56       ` Tejun Heo
@ 2011-01-28 20:30         ` Roland McGrath
  2011-01-31 14:39           ` Tejun Heo
  0 siblings, 1 reply; 62+ messages in thread
From: Roland McGrath @ 2011-01-28 20:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

> I think we're getting a bit off course here.  

I was thinking out loud to share the rationale behind my reactions.

> The CONT problem in this context isn't generic at all.  In principle,
> CONT doesn't (immediately) work on a ptraced task.  We basically just had
> this window where STOPPED -> TRACED transition was delayed in which CONT
> worked.  It's more of a behavioral inconsistency than anything else.

SIGCONT works just fine on a process that is being traced.  It only resumes
on a process that is in job control stop rather than tracing stop, but then
it works normally.  The other effects of SIGCONT (to clear all pending stop
signals at generation time, and to make a SIGCONT signal pending for later
delivery if it's not ignored or default-ignored) always work.  It is a
further wrinkle that any ptrace operation on a thread that is in job
control stop morphs that into tracing stop.  But it's just not true to say
that "in principle" SIGCONT doesn't work.

> If a task was RUNNING when PTRACE_ATTACH happens, when the tracer
> wait(2) completes, the task is TRACED and won't immediately respond to
> CONT.  If a task was STOPPED, CONT can still wake up between the
> completion of wait(2) and the first ptrace command on the tracee.
> It's a subtle inconsistency which I don't believe anyone could have
> taken advantag of.

I think you underestimate the fiddliness of userland ptrace users.

> As for the generic discussion of group stop vs. ptrace, yeah, there's
> a lot to discuss and as I already wrote before I don't think there
> will be a single behavior which would satisfy all the requirements.  I
> think it would be best to iron out the existing inconsistencies in a
> way which doesn't break the current users and then gradually introduce
> a few more ptrace operations which have well defined and more flexible
> interaction with group stop.

I agree with that overall sentiment, as I think I said before.

> As I wrote in another reply, the visibility is masked from the tracer
> and only visible if the user does a pretty convoluted thing.  Unless
> there's such current user, I think we can clearly define what's
> guaranteed regarding ptrace/wait interaction and leave cases outside
> of it as undefined.

I don't really buy the "masked" claim.

> Sure, I definitely agree but at the same time that doesn't mean we
> shouldn't improve ptrace at all.  It just means it's delicate and we
> should proceed carefully, which we shall.

And so we are.  I'm just pushing on the degree of caution.

> Yeap, definitely.  I want to make things just clean enough so that it
> doesn't hinder introduction of improvements while not breaking the
> existing users, but things like silent STOPPED -> TRACED transition
> are outright buggy and have to be fixed one way or the other.  We
> can't ignore them.

I don't agree with "outright buggy".  It's a well-defined situation that
has been well understood for quite some time, by possibility as many as
three people.

> There are two extremes.  You can put ptrace under group stop or
> completely the other way around.  I think there are valid use cases
> for both of them.  I'll try to summarize it later.

I look forward to that elucidation.  I can't really imagine a rationale I'd
accept for anything contrary to what I said about the supremacy of group stops.

> Heh, I was the late one this time.  I'll soon repost the refreshed
> STOPPED <-> TRACED patchset.  I think we're mostly in agreement
> regarding that part.

Mostly.  Mostly. ;-)


Thanks,
Roland

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2011-01-28 20:30         ` Roland McGrath
@ 2011-01-31 14:39           ` Tejun Heo
  0 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2011-01-31 14:39 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Oleg Nesterov, linux-kernel, torvalds, akpm, rjw, jan.kratochvil

Hello, Roland.

On Fri, Jan 28, 2011 at 12:30:18PM -0800, Roland McGrath wrote:
> > I think we're getting a bit off course here.  
> 
> I was thinking out loud to share the rationale behind my reactions.

I see.

> > The CONT problem in this context isn't generic at all.  In principle,
> > CONT doesn't (immediately) work on a ptraced task.  We basically just had
> > this window where STOPPED -> TRACED transition was delayed in which CONT
> > worked.  It's more of a behavioral inconsistency than anything else.
> 
> SIGCONT works just fine on a process that is being traced.  It only resumes
> on a process that is in job control stop rather than tracing stop, but then
> it works normally.

I'm afraid it's not that clear.  I wish it were but it simply isn't.
There's no clear distinction between group stopped and trapped state
as group stopped tasks silently transit to trapped state in racy,
non-obvious and buggy (in that it skips arch hooks) manner.

The ambiguity in completion and notification of group stop also
contributes to non-deterministic behavior.  Before the patchset, a
ptrace task would happily report that it participated in group stop
triggering group stop notification (which also is buggy and wouldn't
reach the parent reliably BTW) even when it's stopping for ptrace
trap.  SIGCONT may or may not work depending _both_ on where the task
has stopped and whether the tracee has received another ptrace call
after that or not.

There's nothing even resembling consistency in that direction and it's
no wonder userland had so many problems and needed to grow hacks to
work around quirky behaviors.

The ironic upper side of all those inconsistencies is that we can flip
the behavior to one side and be relatively safe as the userland must
have been dealing with non-deterministic behavior until now.

So, I really can't agree that the current behavior regarding group
stop is "just fine".  It's utterly incoherent.  It simply is that the
userland learned to cope with it and we can and better make it
consistent within the limits of the current inconsistent behavior.

> The other effects of SIGCONT (to clear all pending stop
> signals at generation time, and to make a SIGCONT signal pending for later
> delivery if it's not ignored or default-ignored) always work.  It is a
> further wrinkle that any ptrace operation on a thread that is in job
> control stop morphs that into tracing stop.  But it's just not true to say
> that "in principle" SIGCONT doesn't work.

I really don't think "it may work depending on where it stops and
whether there's another ptrace call afterwards" can be called working
in any principle.  It is broken.

> > If a task was RUNNING when PTRACE_ATTACH happens, when the tracer
> > wait(2) completes, the task is TRACED and won't immediately respond to
> > CONT.  If a task was STOPPED, CONT can still wake up between the
> > completion of wait(2) and the first ptrace command on the tracee.
> > It's a subtle inconsistency which I don't believe anyone could have
> > taken advantag of.
> 
> I think you underestimate the fiddliness of userland ptrace users.

Give me one realistic scenario where the above would actually affect
userland, or better, an actually used program which breaks due to the
above change.  I might look like coming off too aggressively but to me
it seems like the don't-change-the-current-behavior principle is being
taken too far.  I do agree that we shouldn't be breaking existing
users but at the same time there should be a balance - a trade-off at
the right point.  Almost all ptrace users follow up PTRACE_ATTACH,
wait sequence with another PTRACE request, and the above change would
increase behavioral consistency.

If there actually is an user, we can change the behavior such that
SIGCONT is followed until the second PTRACE request, which should
probably be implemented by making the first ptrace_check_attach()
call, instead of PTRACE_ATTACH, put the tracee into TRACED state.  I
think it should be doable but that behavior is extremely subtle and
thus likely to cause more confusion for its users.  Its description
would go like the following,

  PTRACE_ATTACH stops the tracee and the tracer can wait for it with
  wait(2); however, regardless of wait(2) success, SIGCONT is allowed
  to wake up the tracee until the next ptrace request and if SIGCONT
  is received before the next ptrace request, the ptrace request might
  fail with -ESRCH.  Once the next ptrace request has succeeded,
  further SIGCONT are ignored until the task participates in another
  group stop which can't be determined reliably by the tracer.

With the proposed patchset, it becomes,

  PTRACE_ATTACH stops the tracee and the tracer can wait for it with
  wait(2).  Once attached, whenever wait(2) indicates task stop, it is
  guaranteed that all ptrace requests don't fail with -ESRCH and the
  tracee stays stopped until continued or detached by the tracer or
  killed with SIGKILL.

To me, arguing that the first description is better sounds quite far
fetched.  If we _absolutely_ have to, well, we'll have to, but let's
not mold ourselves into that mess if at all possible.

> > As for the generic discussion of group stop vs. ptrace, yeah, there's
> > a lot to discuss and as I already wrote before I don't think there
> > will be a single behavior which would satisfy all the requirements.  I
> > think it would be best to iron out the existing inconsistencies in a
> > way which doesn't break the current users and then gradually introduce
> > a few more ptrace operations which have well defined and more flexible
> > interaction with group stop.
> 
> I agree with that overall sentiment, as I think I said before.

Awesome. :-)

> > As I wrote in another reply, the visibility is masked from the tracer
> > and only visible if the user does a pretty convoluted thing.  Unless
> > there's such current user, I think we can clearly define what's
> > guaranteed regarding ptrace/wait interaction and leave cases outside
> > of it as undefined.
> 
> I don't really buy the "masked" claim.

Hah?  It's not a claim that I'm trying to sell out of thin air.  It's
actually masked.  There's no way for the tracer to see it (outside the
/proc state, that is).  It's properly and completely masked from the
tracer.

> > Sure, I definitely agree but at the same time that doesn't mean we
> > shouldn't improve ptrace at all.  It just means it's delicate and we
> > should proceed carefully, which we shall.
> 
> And so we are.  I'm just pushing on the degree of caution.

Yeah, your reviews are much appreciated.  Please don't take the heated
arguments as attacks.  :-)

> > Yeap, definitely.  I want to make things just clean enough so that it
> > doesn't hinder introduction of improvements while not breaking the
> > existing users, but things like silent STOPPED -> TRACED transition
> > are outright buggy and have to be fixed one way or the other.  We
> > can't ignore them.
> 
> I don't agree with "outright buggy".  It's a well-defined situation that
> has been well understood for quite some time, by possibility as many as
> three people.

We'll have to agree to disagree on that.  To me, the overall behavior
seems extremely inconsistent and filled with implementation details
leaking out to userland without clearly defined intentions.  As I
wrote in another reply, the only upside seems to be that we probably
have quite a leeway in changing the behavior thanks to the
inconsistencies as userland already has had to deal with them.  In
many cases, we should be able to choose one mode of operation among
the various inconsistent behaviors and stick with it without breaking
userland.

> > There are two extremes.  You can put ptrace under group stop or
> > completely the other way around.  I think there are valid use cases
> > for both of them.  I'll try to summarize it later.
> 
> I look forward to that elucidation.  I can't really imagine a
> rationale I'd accept for anything contrary to what I said about the
> supremacy of group stops.

But, with or without future use cases, currently group stop is already
inferior to ptrace and that is very visible to userland.  There is no
way we can change that without breaking the current users.  It doesn't
even fall inside the range of inconsistent behaviors.

> > Heh, I was the late one this time.  I'll soon repost the refreshed
> > STOPPED <-> TRACED patchset.  I think we're mostly in agreement
> > regarding that part.
> 
> Mostly.  Mostly. ;-)

Thank you.  :-)

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2011-01-31 14:39 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
2010-12-06 16:56 ` [PATCH 01/16] signal: fix SIGCONT notification code Tejun Heo
2010-12-06 16:56 ` [PATCH 02/16] signal: fix CLD_CONTINUED notification target Tejun Heo
2010-12-20 14:58   ` Oleg Nesterov
2010-12-21 16:31     ` Tejun Heo
2010-12-06 16:56 ` [PATCH 03/16] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
2010-12-20 14:59   ` Oleg Nesterov
2010-12-06 16:56 ` [PATCH 04/16] ptrace: kill tracehook_notify_jctl() Tejun Heo
2010-12-20 14:59   ` Oleg Nesterov
2010-12-21 17:00     ` Tejun Heo
2010-12-06 16:56 ` [PATCH 05/16] ptrace: add @why to ptrace_stop() Tejun Heo
2010-12-06 16:56 ` [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace Tejun Heo
2010-12-20 15:00   ` Oleg Nesterov
2010-12-21 17:04     ` Tejun Heo
2010-12-06 16:56 ` [PATCH 07/16] signal: use GROUP_STOP_PENDING to stop once for a single group stop Tejun Heo
2010-12-06 16:56 ` [PATCH 08/16] ptrace: participate in group stop from ptrace_stop() iff the task is trapping for " Tejun Heo
2010-12-06 16:56 ` [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
2010-12-23 12:26   ` Oleg Nesterov
2010-12-23 13:53     ` Tejun Heo
2010-12-23 16:06       ` Oleg Nesterov
2010-12-23 16:33         ` Tejun Heo
2011-01-17 22:09     ` Roland McGrath
2011-01-27 13:56       ` Tejun Heo
2011-01-28 20:30         ` Roland McGrath
2011-01-31 14:39           ` Tejun Heo
2010-12-06 16:56 ` [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED Tejun Heo
2010-12-20 15:00   ` Oleg Nesterov
2010-12-21 17:31     ` Tejun Heo
2010-12-21 17:32       ` Tejun Heo
2010-12-22 10:54       ` Tejun Heo
2010-12-22 11:39       ` Oleg Nesterov
2010-12-22 15:14         ` Tejun Heo
2010-12-22 16:00           ` Oleg Nesterov
2010-12-22 16:21             ` Tejun Heo
2010-12-06 16:56 ` [PATCH 11/16] signal: prepare for CLD_* notification changes Tejun Heo
2010-12-20 16:21   ` Oleg Nesterov
2010-12-20 16:23     ` Oleg Nesterov
2010-12-21 17:35     ` Tejun Heo
2010-12-06 16:57 ` [PATCH 12/16] ptrace: make group stop notification reliable against ptrace Tejun Heo
2010-12-20 17:34   ` Oleg Nesterov
2010-12-21 17:43     ` Tejun Heo
2010-12-22 11:54       ` Oleg Nesterov
2010-12-22 15:26         ` Tejun Heo
2010-12-22 16:02           ` Oleg Nesterov
2010-12-06 16:57 ` [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace() Tejun Heo
2010-12-20 18:15   ` Oleg Nesterov
2010-12-21 17:54     ` Tejun Heo
2010-12-06 16:57 ` [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace Tejun Heo
2010-12-20 19:43   ` Oleg Nesterov
2010-12-21 17:48     ` Tejun Heo
2010-12-22 12:16       ` Oleg Nesterov
2010-12-21 17:25   ` Oleg Nesterov
2010-12-22 10:35     ` Tejun Heo
2010-12-06 16:57 ` [PATCH 15/16] ptrace: make sure SIGNAL_NOTIFY_CONT is checked after ptrace_signal() Tejun Heo
2010-12-06 16:57 ` [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
2010-12-07  0:10   ` Roland McGrath
2010-12-07 13:43     ` Tejun Heo
2010-12-21 17:54   ` Oleg Nesterov
2010-12-22 10:36     ` Tejun Heo
2010-12-14 17:36 ` [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Oleg Nesterov
2010-12-14 17:46   ` Tejun Heo
2010-12-22 15:20 ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).