LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] do_signal_stop: use signal_group_exit()
@ 2008-02-15 18:02 Oleg Nesterov
  2008-02-16  3:37 ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2008-02-15 18:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Roland McGrath, linux-kernel

do_signal_stop() needs signal_group_exit() but checks sig->group_exit_task.
This (optimization) is correct, SIGNAL_STOP_DEQUEUED and SIGNAL_GROUP_EXIT
are mutually exclusive, but looks confusing. Use signal_group_exit(), this
is not fastpath, the code clarity is more important.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 25/kernel/signal.c~6_DO_SIGNAL_STOP	2008-02-15 16:59:17.000000000 +0300
+++ 25/kernel/signal.c	2008-02-15 20:34:32.000000000 +0300
@@ -1718,7 +1718,7 @@ static int do_signal_stop(int signr)
 		struct task_struct *t;
 
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
-		    unlikely(sig->group_exit_task))
+		    unlikely(signal_group_exit(sig)))
 			return 0;
 		/*
 		 * There is no group stop already in progress.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] do_signal_stop: use signal_group_exit()
  2008-02-15 18:02 [PATCH] do_signal_stop: use signal_group_exit() Oleg Nesterov
@ 2008-02-16  3:37 ` Andrew Morton
  2008-02-16 14:02   ` Oleg Nesterov
  2008-02-16 15:09   ` [PATCH] free_pidmap: turn it into free_pidmap(struct upid *) Oleg Nesterov
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2008-02-16  3:37 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: roland, linux-kernel, Pavel Emelyanov

ug.  On about the fourth boot with the current -mm lineup I hit:


: security:  permission sendto in class node not defined in policy
: security:  permission dccp_recv in class netif not defined in policy
: security:  permission dccp_send in class netif not defined in policy
: security:  permission ingress in class netif not defined in policy
: security:  permission egress in class netif not defined in policy
: security:  permission setfcap in class capability not defined in policy
: security:  permission forward_in in class packet not defined in policy
: security:  permission forward_out in class packet not defined in policy
: SELinux: policy loaded with handle_unknown=deny
: type=1403 audit(1203124656.152:3): policy loaded auid=4294967295 ses=4294967295
: BUG: unable to handle kernel paging request at 0000000000200200
: IP: [<ffffffff802444f5>] free_pid+0x35/0x8e
: PGD 2574cb067 PUD 257561067 PMD 0 
: Oops: 0002 [1] SMP 
: last sysfs file: /sys/class/net/eth0/address
: CPU 2 
: Modules linked in: ipv6 dm_mirror dm_multipath dm_mod sbs sbshc dock battery ac parport_pc lp parport snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq shpchp snd_seq_device sg snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer button i2c_i801 snd soundcore ide_cd_mod cdrom serio_raw i2c_core snd_page_alloc pcspkr ehci_hcd ohci_hcd uhci_hcd
: Pid: 3132, comm: ifup-eth Not tainted 2.6.25-rc2-mm1 #5
: RIP: 0010:[<ffffffff802444f5>]  [<ffffffff802444f5>] free_pid+0x35/0x8e
: RSP: 0018:ffff81025754de58  EFLAGS: 00010046
: RAX: 0000000000000000 RBX: ffff81025f268840 RCX: ffff81025f263f08
: RDX: 0000000000200200 RSI: 0000000000000046 RDI: 0000000000000000
: RBP: ffff81025f263ec0 R08: ffff81025f268b18 R09: ffff81025f268b08
: R10: ffff81025f268b08 R11: 0000000000000000 R12: ffff810259853140
: R13: 0000000000000c78 R14: 0000000000000000 R15: 0000000000000000
: FS:  00007f8f9ba7d6f0(0000) GS:ffff81025f16f0c0(0000) knlGS:0000000000000000
: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
: CR2: 0000000000200200 CR3: 00000002598d0000 CR4: 00000000000006e0
: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
: Process ifup-eth (pid: 3132, threadinfo ffff81025754c000, task ffff81025d467620)
: Stack:  ffff81025f268b08 ffff81025f268840 ffff81025994b660 ffffffff80237727
:  ffff81025994b660 ffff81025994b660 0000000000000000 ffffffff80237f81
:  00000000000005d0 ffff810257561018 0000000000000000 00007fffa3aa9514
: Call Trace:
:  [<ffffffff80237727>] ? release_task+0x152/0x2e5
:  [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
:  [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
:  [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
:  [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
:  [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
: 
: 
: Code: 80 53 41 51 e8 83 d4 28 00 31 ff 48 89 c6 eb 2e 48 63 c7 48 c1 e0 05 48 8d 44 28 40 48 8d 48 08 48 8b 40 08 48 8b 51 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 c7 41 08 00 02 20 00 ff c7 3b 7d 
: RIP  [<ffffffff802444f5>] free_pid+0x35/0x8e
:  RSP <ffff81025754de58>
: CR2: 0000000000200200
: ---[ end trace efde415d3f801416 ]---
: BUG: sleeping function called from invalid context at kernel/rwsem.c:21
: in_atomic():0, irqs_disabled():1
: Pid: 3132, comm: ifup-eth Tainted: G      D   2.6.25-rc2-mm1 #5
: 
: Call Trace:
:  [<ffffffff804d0c0a>] down_read+0x15/0x23
:  [<ffffffff802550cc>] acct_collect+0x40/0x180
:  [<ffffffff802385fd>] do_exit+0x1f3/0x6ba
:  [<ffffffff804d1ea1>] sync_regs+0x0/0x67
:  [<ffffffff804d3c02>] do_page_fault+0x755/0x80e
:  [<ffffffff804d1ae9>] error_exit+0x0/0x51
:  [<ffffffff802444f5>] free_pid+0x35/0x8e
:  [<ffffffff802444d3>] free_pid+0x13/0x8e
:  [<ffffffff80237727>] release_task+0x152/0x2e5
:  [<ffffffff80237f81>] do_wait+0x6c7/0xa1c
:  [<ffffffff8022f4cc>] default_wake_function+0x0/0xe
:  [<ffffffff8023e670>] sys_rt_sigaction+0x7a/0x98
:  [<ffffffff80238360>] sys_wait4+0x8a/0xa1
:  [<ffffffff8020be4b>] system_call_after_swapgs+0x7b/0x80
: 
: eth0: no IPv6 routers present
: 

and I don't have a clue which patch caused it and I won't be near this
machine again for over a week.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] do_signal_stop: use signal_group_exit()
  2008-02-16  3:37 ` Andrew Morton
@ 2008-02-16 14:02   ` Oleg Nesterov
  2008-02-17 23:10     ` Oleg Nesterov
  2008-02-16 15:09   ` [PATCH] free_pidmap: turn it into free_pidmap(struct upid *) Oleg Nesterov
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2008-02-16 14:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: roland, linux-kernel, Pavel Emelyanov

On 02/15, Andrew Morton wrote:
>
> ug.  On about the fourth boot with the current -mm lineup I hit:
>
> : BUG: unable to handle kernel paging request at 0000000000200200
                                                   ^^^^^^^^^^^^^^^^
== LIST_POISON2

> : IP: [<ffffffff802444f5>] free_pid+0x35/0x8e

most probably == hlist_del_rcu(pid_chain)

> : PGD 2574cb067 PUD 257561067 PMD 0 
> : Oops: 0002 [1] SMP 
> : last sysfs file: /sys/class/net/eth0/address
> : CPU 2 
> : Modules linked in: ipv6 dm_mirror dm_multipath dm_mod sbs sbshc dock battery ac parport_pc lp parport snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq shpchp snd_seq_device sg snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer button i2c_i801 snd soundcore ide_cd_mod cdrom serio_raw i2c_core snd_page_alloc pcspkr ehci_hcd ohci_hcd uhci_hcd
> : Pid: 3132, comm: ifup-eth Not tainted 2.6.25-rc2-mm1 #5
> : RIP: 0010:[<ffffffff802444f5>]  [<ffffffff802444f5>] free_pid+0x35/0x8e
> : RSP: 0018:ffff81025754de58  EFLAGS: 00010046
> : RAX: 0000000000000000 RBX: ffff81025f268840 RCX: ffff81025f263f08
> : RDX: 0000000000200200 RSI: 0000000000000046 RDI: 0000000000000000
> : RBP: ffff81025f263ec0 R08: ffff81025f268b18 R09: ffff81025f268b08
> : R10: ffff81025f268b08 R11: 0000000000000000 R12: ffff810259853140
> : R13: 0000000000000c78 R14: 0000000000000000 R15: 0000000000000000
> : FS:  00007f8f9ba7d6f0(0000) GS:ffff81025f16f0c0(0000) knlGS:0000000000000000
> : CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> : CR2: 0000000000200200 CR3: 00000002598d0000 CR4: 00000000000006e0
> : DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> : DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> : Process ifup-eth (pid: 3132, threadinfo ffff81025754c000, task ffff81025d467620)
> : Stack:  ffff81025f268b08 ffff81025f268840 ffff81025994b660 ffffffff80237727
> :  ffff81025994b660 ffff81025994b660 0000000000000000 ffffffff80237f81
> :  00000000000005d0 ffff810257561018 0000000000000000 00007fffa3aa9514
> : Call Trace:
> :  [<ffffffff80237727>] ? release_task+0x152/0x2e5
> :  [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> :  [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> :  [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> :  [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> :  [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80

(Can't understand why there is no detach_pid() in this stack trace,
 but it is the only possible caller of free_pid()).

So, detach_pid()->free_pid() hit an already unhashed pid. But this
is not possible?

This means we already did detach_pid(), but in that case the previous
detach_pid() has set task->pids[].pid = NULL, and we should OOPS earlier,
somewhere at "if (!hlist_empty(&pid->tasks[tmp]))".

> and I don't have a clue which patch caused it and I won't be near this
> machine again for over a week.

Definitely not this patch...

I'll try to think more about this, but I doubt very much I'll find the
reason :(

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] free_pidmap: turn it into free_pidmap(struct upid *)
  2008-02-16  3:37 ` Andrew Morton
  2008-02-16 14:02   ` Oleg Nesterov
@ 2008-02-16 15:09   ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2008-02-16 15:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: roland, linux-kernel, Pavel Emelyanov, Eric W. Biederman

The callers of free_pidmap() pass 2 members of "struct upid", we can just pass
"struct upid *" instead. Shaves off 10 bytes from pid.o.

Also, simplify the alloc_pid's "out_free:" error path a little bit. This way it
looks more clear which subset of pid->numbers[] we are freeing.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 25/kernel/pid.c~	2008-02-15 16:59:17.000000000 +0300
+++ 25/kernel/pid.c	2008-02-16 17:51:17.000000000 +0300
@@ -111,10 +111,11 @@ EXPORT_SYMBOL(is_container_init);
 
 static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);
 
-static void free_pidmap(struct pid_namespace *pid_ns, int pid)
+static void free_pidmap(struct upid *upid)
 {
-	struct pidmap *map = pid_ns->pidmap + pid / BITS_PER_PAGE;
-	int offset = pid & BITS_PER_PAGE_MASK;
+	int nr = upid->nr;
+	struct pidmap *map = upid->ns->pidmap + nr / BITS_PER_PAGE;
+	int offset = nr & BITS_PER_PAGE_MASK;
 
 	clear_bit(offset, map->page);
 	atomic_inc(&map->nr_free);
@@ -232,7 +233,7 @@ void free_pid(struct pid *pid)
 	spin_unlock_irqrestore(&pidmap_lock, flags);
 
 	for (i = 0; i <= pid->level; i++)
-		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
+		free_pidmap(pid->numbers + i);
 
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
@@ -278,8 +279,8 @@ out:
 	return pid;
 
 out_free:
-	for (i++; i <= ns->level; i++)
-		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
+	while (++i <= ns->level)
+		free_pidmap(pid->numbers + i);
 
 	kmem_cache_free(ns->pid_cachep, pid);
 	pid = NULL;


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] do_signal_stop: use signal_group_exit()
  2008-02-16 14:02   ` Oleg Nesterov
@ 2008-02-17 23:10     ` Oleg Nesterov
  2008-02-18  4:11       ` Eric W. Biederman
  2008-02-20  2:32       ` [PATCH] do_signal_stop: use signal_group_exit() Eric W. Biederman
  0 siblings, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2008-02-17 23:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: roland, linux-kernel, Pavel Emelyanov, Kamalesh Babulal,
	Eric W. Biederman

On 02/16, Oleg Nesterov wrote:
>
> On 02/15, Andrew Morton wrote:
> >
> > ug.  On about the fourth boot with the current -mm lineup I hit:
> >
> > : BUG: unable to handle kernel paging request at 0000000000200200
>                                                    ^^^^^^^^^^^^^^^^
> == LIST_POISON2
> 
> > : IP: [<ffffffff802444f5>] free_pid+0x35/0x8e
> 
> most probably == hlist_del_rcu(pid_chain)
> 
> > : Call Trace:
> > :  [<ffffffff80237727>] ? release_task+0x152/0x2e5
> > :  [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> > :  [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> > :  [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> > :  [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> > :  [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
> 
> (Can't understand why there is no detach_pid() in this stack trace,
>  but it is the only possible caller of free_pid()).
> 
> So, detach_pid()->free_pid() hit an already unhashed pid. But this
> is not possible?
> 
> This means we already did detach_pid(), but in that case the previous
> detach_pid() has set task->pids[].pid = NULL, and we should OOPS earlier,
> somewhere at "if (!hlist_empty(&pid->tasks[tmp]))".

Yes, this is not possible.

But what possible is: we have the unbalanced put_pid(pid) which frees the
live pid. The next alloc_pid() gets the same memory, and initializes all
pid->tasks[] lists.

Now, if that pid was used as PIDTYPE_PGID/PIDTYPE_SID, the next detach_pid()
from this pgrp/sid sees that the pid is not used (all lists are hlist_empty),
frees this pid again, and the bug manifests itself this way.

	tiocspgrp:

		put_pid(real_tty->pgrp);
		// ------ WINDOW ------
		real_tty->pgrp = get_pid(pgrp);

When bash spawns the command, both parent and child do ioctl(TIOCSPGRP,child),
and it is possible that both do put_pid() on the same parent's pid.

Damn, when you know what the bug is, the test case is trivial:

	$ while true; do perl -e0; done

The kernel crashes.

>From 2.6.25-rc2-mm1.bz2 patch:
>
> -       .ioctl          = tty_ioctl,
> +       .unlocked_ioctl = tty_ioctl,

and this is why this didn't happen before, I guess.

> I'll try to think more about this, but I doubt very much I'll find the
> reason :(

Ohhh... 7 (!!!) hours of hacking + some vodka did the trick.

(Kamalesh, I think you hit the same bug).

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] do_signal_stop: use signal_group_exit()
  2008-02-17 23:10     ` Oleg Nesterov
@ 2008-02-18  4:11       ` Eric W. Biederman
  2008-02-19 23:39         ` Valdis.Kletnieks
  2008-02-20  2:32       ` [PATCH] do_signal_stop: use signal_group_exit() Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2008-02-18  4:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, roland, linux-kernel, Pavel Emelyanov, Kamalesh Babulal

Oleg Nesterov <oleg@tv-sign.ru> writes:

> On 02/16, Oleg Nesterov wrote:
>>
>> On 02/15, Andrew Morton wrote:
>> >
>> > ug.  On about the fourth boot with the current -mm lineup I hit:
>> >
>> > : BUG: unable to handle kernel paging request at 0000000000200200
>>                                                    ^^^^^^^^^^^^^^^^
>> == LIST_POISON2
>> 
>> > : IP: [<ffffffff802444f5>] free_pid+0x35/0x8e
>> 
>> most probably == hlist_del_rcu(pid_chain)
>> 
>> > : Call Trace:
>> > :  [<ffffffff80237727>] ? release_task+0x152/0x2e5
>> > :  [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
>> > :  [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
>> > :  [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
>> > :  [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
>> > :  [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
>> 
>> (Can't understand why there is no detach_pid() in this stack trace,
>>  but it is the only possible caller of free_pid()).

That is just because of tail call optimization.  There is no need
to return to detach_pid so we just jumped to free_pid.

>
>> So, detach_pid()->free_pid() hit an already unhashed pid. But this
>> is not possible?
>> 
>> This means we already did detach_pid(), but in that case the previous
>> detach_pid() has set task->pids[].pid = NULL, and we should OOPS earlier,
>> somewhere at "if (!hlist_empty(&pid->tasks[tmp]))".
>
> Yes, this is not possible.
>
> But what possible is: we have the unbalanced put_pid(pid) which frees the
> live pid. The next alloc_pid() gets the same memory, and initializes all
> pid->tasks[] lists.
>
> Now, if that pid was used as PIDTYPE_PGID/PIDTYPE_SID, the next detach_pid()
> from this pgrp/sid sees that the pid is not used (all lists are hlist_empty),
> frees this pid again, and the bug manifests itself this way.
>
> 	tiocspgrp:
>
> 		put_pid(real_tty->pgrp);
> 		// ------ WINDOW ------
> 		real_tty->pgrp = get_pid(pgrp);
>
> When bash spawns the command, both parent and child do ioctl(TIOCSPGRP,child),
> and it is possible that both do put_pid() on the same parent's pid.
>
> Damn, when you know what the bug is, the test case is trivial:
>
> 	$ while true; do perl -e0; done
>
> The kernel crashes.
>
>>From 2.6.25-rc2-mm1.bz2 patch:
>>
>> -       .ioctl          = tty_ioctl,
>> +       .unlocked_ioctl = tty_ioctl,
>
> and this is why this didn't happen before, I guess.
>
>> I'll try to think more about this, but I doubt very much I'll find the
>> reason :(
>
> Ohhh... 7 (!!!) hours of hacking + some vodka did the trick.
>
> (Kamalesh, I think you hit the same bug).

Thanks.  Looks like we need to grab a lock there.
At a quick skim I think we need the tty lock.


Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] do_signal_stop: use signal_group_exit()
  2008-02-18  4:11       ` Eric W. Biederman
@ 2008-02-19 23:39         ` Valdis.Kletnieks
  2008-02-20 16:14           ` [PATCH] (for -mm only) put_pid: make sure we don't free the live pid Oleg Nesterov
  2008-02-20 16:18           ` tty && pid problems Oleg Nesterov
  0 siblings, 2 replies; 15+ messages in thread
From: Valdis.Kletnieks @ 2008-02-19 23:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Andrew Morton, roland, linux-kernel,
	Pavel Emelyanov, Kamalesh Babulal

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]

On Sun, 17 Feb 2008 21:11:14 MST, Eric W. Biederman said:
> Oleg Nesterov <oleg@tv-sign.ru> writes:
> > On 02/16, Oleg Nesterov wrote:
> >> On 02/15, Andrew Morton wrote:
> >> > : BUG: unable to handle kernel paging request at 0000000000200200

> >> > : Call Trace:
> >> > :  [<ffffffff80237727>] ? release_task+0x152/0x2e5
> >> > :  [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> >> > :  [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> >> > :  [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> >> > :  [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> >> > :  [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80

> Thanks.  Looks like we need to grab a lock there.
> At a quick skim I think we need the tty lock.

*ping* - Any further activity on this one?  I got bit by it as well on
the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
single-user and go multi-user.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] do_signal_stop: use signal_group_exit()
  2008-02-17 23:10     ` Oleg Nesterov
  2008-02-18  4:11       ` Eric W. Biederman
@ 2008-02-20  2:32       ` Eric W. Biederman
  1 sibling, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2008-02-20  2:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, roland, linux-kernel, Pavel Emelyanov,
	Kamalesh Babulal, Valdis.Kletnieks

Oleg Nesterov <oleg@tv-sign.ru> writes:

>
>>From 2.6.25-rc2-mm1.bz2 patch:
>>
>> -       .ioctl          = tty_ioctl,
>> +       .unlocked_ioctl = tty_ioctl,
>
> and this is why this didn't happen before, I guess.

Right.  Does anyone know what kind of audit was made of the tty code
to ensure everything would be fine?

It it was pretty thorough and it was just this little corner case
it makes sense to get the struct tty locking for pids correct.

Otherwise since the tty layer is historically not especially good
with it's locking. We should just revert the change above, and save
attacking this until someone has time to do a thorough review.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] (for -mm only) put_pid: make sure we don't free the live pid
  2008-02-19 23:39         ` Valdis.Kletnieks
@ 2008-02-20 16:14           ` Oleg Nesterov
  2008-02-20 16:18           ` tty && pid problems Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2008-02-20 16:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, roland, linux-kernel, Pavel Emelyanov,
	Kamalesh Babulal, Alan Cox, Valdis.Kletnieks

[PATCH] (for -mm only) put_pid: make sure we don't free the live pid

Add the temporary (for -mm only) debugging code to catch the unbalanced
put_pid()'s. At least those which can free the "live" pid.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- MM/kernel/pid.c~	2008-02-20 18:29:40.000000000 +0300
+++ MM/kernel/pid.c	2008-02-20 18:35:15.000000000 +0300
@@ -208,6 +208,10 @@ void put_pid(struct pid *pid)
 	ns = pid->numbers[pid->level].ns;
 	if ((atomic_read(&pid->count) == 1) ||
 	     atomic_dec_and_test(&pid->count)) {
+		int type = PIDTYPE_MAX;
+		while (--type >= 0)
+			if (WARN_ON(!hlist_empty(&pid->tasks[type])))
+				return;
 		kmem_cache_free(ns->pid_cachep, pid);
 		put_pid_ns(ns);
 	}


^ permalink raw reply	[flat|nested] 15+ messages in thread

* tty && pid problems
  2008-02-19 23:39         ` Valdis.Kletnieks
  2008-02-20 16:14           ` [PATCH] (for -mm only) put_pid: make sure we don't free the live pid Oleg Nesterov
@ 2008-02-20 16:18           ` Oleg Nesterov
  2008-02-20 16:19             ` Alan Cox
  2008-02-20 16:28             ` Oleg Nesterov
  1 sibling, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2008-02-20 16:18 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Eric W. Biederman, Andrew Morton, roland, linux-kernel,
	Pavel Emelyanov, Kamalesh Babulal, Alan Cox

(Change the subject, cc Alan)

On 02/19, Valdis.Kletnieks@vt.edu wrote:
>
> On Sun, 17 Feb 2008 21:11:14 MST, Eric W. Biederman said:
> > Oleg Nesterov <oleg@tv-sign.ru> writes:
> > > On 02/16, Oleg Nesterov wrote:
> > >> On 02/15, Andrew Morton wrote:
> > >> > : BUG: unable to handle kernel paging request at 0000000000200200
> 
> > >> > : Call Trace:
> > >> > :  [<ffffffff80237727>] ? release_task+0x152/0x2e5
> > >> > :  [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> > >> > :  [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> > >> > :  [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> > >> > :  [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> > >> > :  [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
> 
> > Thanks.  Looks like we need to grab a lock there.
> > At a quick skim I think we need the tty lock.
> 
> *ping* - Any further activity on this one?  I got bit by it as well on
> the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
> single-user and go multi-user.

Valdis, any chance you can try the
	"[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
I sent? just to make sure we don't have other problems here.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tty && pid problems
  2008-02-20 16:18           ` tty && pid problems Oleg Nesterov
@ 2008-02-20 16:19             ` Alan Cox
  2008-02-22  1:39               ` Eric W. Biederman
  2008-02-20 16:28             ` Oleg Nesterov
  1 sibling, 1 reply; 15+ messages in thread
From: Alan Cox @ 2008-02-20 16:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Valdis.Kletnieks, Eric W. Biederman, Andrew Morton, roland,
	linux-kernel, Pavel Emelyanov, Kamalesh Babulal

> > *ping* - Any further activity on this one?  I got bit by it as well on
> > the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
> > single-user and go multi-user.
> 
> Valdis, any chance you can try the
> 	"[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
> I sent? just to make sure we don't have other problems here.

There is some other iffy locking of the pid objects ever since they were
changed from pid_t to ref counted structs. Whoever did that didn't add
any locking for it, and the old code knew it was "safe" not to.

I've added locks in my test tree and now I've finally got -mm to build
will do some testing then push more stuff upstream

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tty && pid problems
  2008-02-20 16:18           ` tty && pid problems Oleg Nesterov
  2008-02-20 16:19             ` Alan Cox
@ 2008-02-20 16:28             ` Oleg Nesterov
  2008-02-20 19:00               ` Jiri Slaby
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2008-02-20 16:28 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Eric W. Biederman, Andrew Morton, roland, linux-kernel,
	Pavel Emelyanov, Kamalesh Babulal, Alan Cox

(sorry, the previous message was not finished)

On 02/20, Oleg Nesterov wrote:
>
> (Change the subject, cc Alan)
> 
> On 02/19, Valdis.Kletnieks@vt.edu wrote:
> >
> > On Sun, 17 Feb 2008 21:11:14 MST, Eric W. Biederman said:
> > > Oleg Nesterov <oleg@tv-sign.ru> writes:
> > > > On 02/16, Oleg Nesterov wrote:
> > > >> On 02/15, Andrew Morton wrote:
> > > >> > : BUG: unable to handle kernel paging request at 0000000000200200
> > 
> > > >> > : Call Trace:
> > > >> > :  [<ffffffff80237727>] ? release_task+0x152/0x2e5
> > > >> > :  [<ffffffff80237f81>] ? do_wait+0x6c7/0xa1c
> > > >> > :  [<ffffffff8022f4cc>] ? default_wake_function+0x0/0xe
> > > >> > :  [<ffffffff8023e670>] ? sys_rt_sigaction+0x7a/0x98
> > > >> > :  [<ffffffff80238360>] ? sys_wait4+0x8a/0xa1
> > > >> > :  [<ffffffff8020be4b>] ? system_call_after_swapgs+0x7b/0x80
> > 
> > > Thanks.  Looks like we need to grab a lock there.
> > > At a quick skim I think we need the tty lock.
> > 
> > *ping* - Any further activity on this one?  I got bit by it as well on
> > the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
> > single-user and go multi-user.
> 
> Valdis, any chance you can try the
> 	"[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
> I sent? just to make sure we don't have other problems here.

I think you can revert the tty-bkl-pushdown.patch. Or, as Eric suggested, just
revert this

	@@ -1222,7 +1221,7 @@ static const struct file_operations tty_
		.read           = tty_read,
		.write          = tty_write,
		.poll           = tty_poll,
	-       .ioctl          = tty_ioctl,
	+       .unlocked_ioctl = tty_ioctl,
		.compat_ioctl   = tty_compat_ioctl,
		.open           = tty_open,
		.release        = tty_release,
	@@ -1235,7 +1234,7 @@ static const struct file_operations ptmx
		.read           = tty_read,
		.write          = tty_write,
		.poll           = tty_poll,
	-       .ioctl          = tty_ioctl,
	+       .unlocked_ioctl = tty_ioctl,
		.compat_ioctl   = tty_compat_ioctl,
		.open           = ptmx_open,
		.release        = tty_release,
	@@ -1248,7 +1247,7 @@ static const struct file_operations cons
		.read           = tty_read,
		.write          = redirected_tty_write,
		.poll           = tty_poll,
	-       .ioctl          = tty_ioctl,
	+       .unlocked_ioctl = tty_ioctl,
		.compat_ioctl   = tty_compat_ioctl,
		.open           = tty_open,
		.release        = tty_release,
	@@ -1260,7 +1259,7 @@ static const struct file_operations hung
		.read           = hung_up_tty_read,
		.write          = hung_up_tty_write,
		.poll           = hung_up_tty_poll,
	-       .ioctl          = hung_up_tty_ioctl,
	+       .unlocked_ioctl = hung_up_tty_ioctl,
		.compat_ioctl   = hung_up_tty_compat_ioctl,
		.release        = tty_release,
	 };

chunk. I'd prefer to know what Alan's opinion.


HOWEVER. We have another bug here, and this bug is old.

When tiocspgrp() does put_pid(real_tty->pgrp), it is possible that real_tty
has the last reference, and the pid will be actually freed. This means that
tiocgpgrp() and do_task_stat() are not safe (rcu_read_lock() can't help).
We can read the freed/reused memory, and since pid_nr_ns() is not trivial
the kernel can crash. Unlikely, but possible. We need the lock.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tty && pid problems
  2008-02-20 16:28             ` Oleg Nesterov
@ 2008-02-20 19:00               ` Jiri Slaby
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2008-02-20 19:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Valdis.Kletnieks, Eric W. Biederman, Andrew Morton, roland,
	linux-kernel, Pavel Emelyanov, Kamalesh Babulal, Alan Cox

On 02/20/2008 05:28 PM, Oleg Nesterov wrote:
> I think you can revert the tty-bkl-pushdown.patch. Or, as Eric suggested, just
> revert this
> 
> 	@@ -1222,7 +1221,7 @@ static const struct file_operations tty_
> 		.read           = tty_read,
> 		.write          = tty_write,
> 		.poll           = tty_poll,
> 	-       .ioctl          = tty_ioctl,
> 	+       .unlocked_ioctl = tty_ioctl,
> 		.compat_ioctl   = tty_compat_ioctl,
> 		.open           = tty_open,
> 		.release        = tty_release,
> 	@@ -1235,7 +1234,7 @@ static const struct file_operations ptmx
> 		.read           = tty_read,
> 		.write          = tty_write,
> 		.poll           = tty_poll,
> 	-       .ioctl          = tty_ioctl,
> 	+       .unlocked_ioctl = tty_ioctl,
> 		.compat_ioctl   = tty_compat_ioctl,
> 		.open           = ptmx_open,
> 		.release        = tty_release,
> 	@@ -1248,7 +1247,7 @@ static const struct file_operations cons
> 		.read           = tty_read,
> 		.write          = redirected_tty_write,
> 		.poll           = tty_poll,
> 	-       .ioctl          = tty_ioctl,
> 	+       .unlocked_ioctl = tty_ioctl,
> 		.compat_ioctl   = tty_compat_ioctl,
> 		.open           = tty_open,
> 		.release        = tty_release,
> 	@@ -1260,7 +1259,7 @@ static const struct file_operations hung
> 		.read           = hung_up_tty_read,
> 		.write          = hung_up_tty_write,
> 		.poll           = hung_up_tty_poll,
> 	-       .ioctl          = hung_up_tty_ioctl,
> 	+       .unlocked_ioctl = hung_up_tty_ioctl,
> 		.compat_ioctl   = hung_up_tty_compat_ioctl,
> 		.release        = tty_release,
> 	 };
> 
> chunk.

This would result in unpredictable behaviour. If I left locking apart, ioctl 
prototype != unlocked_ioctl prototype.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tty && pid problems
  2008-02-20 16:19             ` Alan Cox
@ 2008-02-22  1:39               ` Eric W. Biederman
  2008-02-22  9:37                 ` Alan Cox
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2008-02-22  1:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Oleg Nesterov, Valdis.Kletnieks, Andrew Morton, roland,
	linux-kernel, Pavel Emelyanov, Kamalesh Babulal

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

>> > *ping* - Any further activity on this one?  I got bit by it as well on
>> > the very first attempted boot of 25-rc2-mm1, the instant it tried to leave
>> > single-user and go multi-user.
>> 
>> Valdis, any chance you can try the
>> 	"[PATCH] (for -mm only) put_pid: make sure we don't free the live pid"
>> I sent? just to make sure we don't have other problems here.
>
> There is some other iffy locking of the pid objects ever since they were
> changed from pid_t to ref counted structs. Whoever did that didn't add
> any locking for it, and the old code knew it was "safe" not to.
>
> I've added locks in my test tree and now I've finally got -mm to build
> will do some testing then push more stuff upstream

Thanks.  At the tty layer that was probably me.
Most of the instances already appear to be nested in some other kind of
locking, but that doesn't make no additional locking correct or ensure
that it will give a uniform result.

Eric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tty && pid problems
  2008-02-22  1:39               ` Eric W. Biederman
@ 2008-02-22  9:37                 ` Alan Cox
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2008-02-22  9:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Valdis.Kletnieks, Andrew Morton, roland,
	linux-kernel, Pavel Emelyanov, Kamalesh Babulal

> > I've added locks in my test tree and now I've finally got -mm to build
> > will do some testing then push more stuff upstream
> 
> Thanks.  At the tty layer that was probably me.
> Most of the instances already appear to be nested in some other kind of
> locking, but that doesn't make no additional locking correct or ensure
> that it will give a uniform result.

Fortunately your pid struct is ref counted so not too hard to sort out.
Need to look at procfs but at worst tty needs to export a function which
returns a reference bumped pid struct to people who stick their nose in
from outside.

Alan

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-02-22  9:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-15 18:02 [PATCH] do_signal_stop: use signal_group_exit() Oleg Nesterov
2008-02-16  3:37 ` Andrew Morton
2008-02-16 14:02   ` Oleg Nesterov
2008-02-17 23:10     ` Oleg Nesterov
2008-02-18  4:11       ` Eric W. Biederman
2008-02-19 23:39         ` Valdis.Kletnieks
2008-02-20 16:14           ` [PATCH] (for -mm only) put_pid: make sure we don't free the live pid Oleg Nesterov
2008-02-20 16:18           ` tty && pid problems Oleg Nesterov
2008-02-20 16:19             ` Alan Cox
2008-02-22  1:39               ` Eric W. Biederman
2008-02-22  9:37                 ` Alan Cox
2008-02-20 16:28             ` Oleg Nesterov
2008-02-20 19:00               ` Jiri Slaby
2008-02-20  2:32       ` [PATCH] do_signal_stop: use signal_group_exit() Eric W. Biederman
2008-02-16 15:09   ` [PATCH] free_pidmap: turn it into free_pidmap(struct upid *) Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).