LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@tv-sign.ru>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Pavel Emelyanov <xemul@openvz.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU
Date: Thu, 31 Jan 2008 12:31:50 +0300	[thread overview]
Message-ID: <20080131093150.GB82@tv-sign.ru> (raw)
In-Reply-To: <m11w7zgocj.fsf@ebiederm.dsl.xmission.com>

On 01/30, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> writes:
> 
> > On 01/29, Eric W. Biederman wrote:
> >>
> >> Ok. I believe I see what problem you are trying to fix.  That
> >> a pid returned from find_pid might disappear if we are not rcu
> >> protected.
> >
> > Not only.
> >
> > Any find_pid() is unsafe under tasklist, even the find_pid(1).
> > Because free_pid() mangles the pid_hash[hash] while find_pid()
> > scans the same list.
> 
> But we do it the rcu way.
> 
> So it isn't the mangling that is a problem.  Just walking off into
> freed memory.  That is fundamentally the principle that the rcu
> accessors work on.

Yes, this is what I meant.

> >> We currently appear to have two options.
> >> 1) Force all pid hash table access and pid accesses that
> >>    do not get a count to be covered under rcu_read_lock.
> >
> > I agree, we can (and should) convert most of these read_lock(tasklist)'s
> > to rcu_read_lock(). But we have a lot of them.
> 
> We have to touch and recheck all of that code anyway to ensure
> the pid namespace stuff is correct and working well.

Sure.

> >> 2) To modify the locking requirements for free_pid to require
> >>    the tasklist_lock.
> >>
> >>    However this second approach is horribly brittle, as it
> >>    will break if we ever have intermediate entries in the
> >>    hash table protected by pidmap_lock.
> >>
> >> Using the tasklist_lock to still guarantee we see the list, the entire
> >> list, and exactly the list for proper implementation of kill to
> >> process groups and sessions still seems sane.
> >
> > And this means that attach_pid() and detach_pid() need write_lock(tasklist)
> > anyway.
> >
> > So copy_process()->free_pid() is the only case when we modify the pid_hash[]
> > list without tasklist.
> 
> Nope.  alloc_pid does also.

Doesn't matter, inserts are safe. hlist_add_head_rcu() does wmb()
after the full init, and find_pid() has rmb(). We don't really
need rcu_read_lock() to traverse the rcu-protected list as long as
we know the node can't go away.

> The pid_hash table is fundamentally not
> protected by task_lock, but by pidmap_lock.

Until this patch.

> task_list only protects detach (by fluke) and going from struct pid 
> to a task.  It doesn't prevent additions to the hash chains at all.

see above.

> >> - * look up a PID in the hash table. Must be called with the tasklist_lock
> >> - * or rcu_read_lock() held.
> >> + * look up a PID in the hash table. Must be called with the rcu_read_lock()
> > held.
> >
> > Imho, we should first fix all users of read_lock(tasklist)+find_..._pid().
> >
> > So I still think this patch makes sense as a trivial fix for now, until
> > we add the necessary rcu_read_lock()s. However this race is very unlikely,
> > perhaps we can live with it.
> 
> The patch is horrible because it works for all of the wrong reasons,
> and obscures what is really going on.  That can't be good.

Eric, perhaps you misunderstood me. I do agree, the patch is horrible,
I added this awful #ifdef's exactly because this must look as "FIXME!".

Let me explain. Until now, find_pid() under tasklist was legal and documented.
It was actually broken, but happened to work because read_lock() implied
rcu_read_lock(). This patch fixes tasklist+find_pid(), and since PREEMPT_RCU
was merged, imho it could go into 2.6.25.

Now we can change the rules, fix all users, and then revert this hack.

OK. please feel free to do what you think right, but unless you prove
I'm technically wrong I don't understand you. But I won't insist any
longer.

Oleg.


  reply	other threads:[~2008-01-31  9:29 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-29 16:40 Oleg Nesterov
2008-01-29 23:02 ` Andrew Morton
2008-01-30 14:17   ` Peter Zijlstra
2008-01-31 13:32   ` Ingo Molnar
2008-01-29 23:08 ` Andrew Morton
2008-01-30  2:16   ` Eric W. Biederman
2008-01-30  4:56     ` Paul E. McKenney
2008-01-30  3:24 ` Eric W. Biederman
2008-01-30  5:00   ` Paul E. McKenney
2008-01-30  9:20     ` Eric W. Biederman
2008-01-30  9:48       ` Oleg Nesterov
2008-01-30  9:30   ` Oleg Nesterov
2008-01-30 18:28     ` Eric W. Biederman
2008-01-31  9:31       ` Oleg Nesterov [this message]
2009-12-14  2:15 Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080131093150.GB82@tv-sign.ru \
    --to=oleg@tv-sign.ru \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=xemul@openvz.org \
    --subject='Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).