LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Paul Jackson <pj@sgi.com>
To: ebiederm@xmission.com (Eric W. Biederman)
Cc: menage@google.com, akpm@osdl.org, sekharan@us.ibm.com, dev@sw.ru,
	xemul@sw.ru, serue@us.ibm.com, vatsa@in.ibm.com,
	rohitseth@google.com, winget@google.com,
	containers@lists.osdl.org, ckrm-tech@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] containers: Generic container system abstracted from cpusets code
Date: Sat, 30 Dec 2006 21:17:56 -0800	[thread overview]
Message-ID: <20061230211756.e1b1cc41.pj@sgi.com> (raw)
In-Reply-To: <m1ac15o7g7.fsf@ebiederm.dsl.xmission.com>

Eric wrote:
> The whole interface that reads out the processes in your task
> grouping looks scary.  It takes the tasklist_lock and holds
> it for an indefinite duration.

It doesn't look "indefinite" to me.  It reads the 'container'
field of each task struct, and then is done, dropping the lock.

That's got to be one of the lowest cost, most definite duration,
invocations of "do_each_thread(g, p)" in the kernel.

> Although I am curious why this is even needed when
> we have /proc/<pid>/cpuset  which gets us the information
> in another way.

The /proc/<pid>/cpuset interface lets you map one pid to its cpuset.

That's the opposite of mapping a cpuset to the set of all pids that
are attached to it.

I suppose one could get all the tasks in a cpuset by doing whatever it
takes for an opendir/readdir/closedir loop over the pid entries
of /proc, and then each pid in the system doing an open/read/close on
its /proc/<pid>/cpuset, and doing a strcmp on its path with the cpuset
path of interest, to see if they match.

What kind of locking is done in the kernel when a user task does an
opendir/readdir/closedir loop over /proc?

In any case, this would be hecka more expensive than the current
quite -definite- "do_each_thread(g, p)" over the task list, with
three system calls per pid.  And the 'tasks' file in cpusets is an
existing and valuable feature, which we can't just remove without
serious cause.

> I hate attach_task.  Allowing movement of a process from
> one set to another by another process looks like a great way
> to create subtle races.  The very long and exhaustive locking
> comments seem to verify this.

The ability to move tasks between cpusets a valued feature for my
customers.  Sorry you hate it.

I'll try to make my comments shorter and less exhausting next
time </sarcarsm>.

The locking is difficult, because:
 1) yes, as you note, attach_task() isn't easy,
 2) the cpu and memory placement of a whole set of tasks can be
    changed by a single write system call on some cpuset file,
 2) cpusets is on the critical code path for both the memory
    allocator and task scheduler (controlling where one can
    allocate and schedule), but needs to avoid putting any
    sigificant locks on either of these paths.

> currently I am horrified at what
> currently looks like huge piles of unnecessary complexity in the
> cpuset implementation.

Not much I can do to help you with your horror, sorry.

If you could be more specific on ways to trim the code while
maintaining the API's that we use, then that might be useful.

> that cpusets need to get fixed

Let me know when you have patches.

> Why does any of this code need a user mode helper?  I guess
> because of the complicated semantics this doesn't do proper
> reference counting

The cpuset reference counting is just fine, thank-you.

Removing nodes from the bottom of a vfs file system, when one got there
by an unexpected code path, such as task exit, is not easy.  Well, for
someone of my limited vfs talents, quite impossible.  I had no desire
(nor ability) to replicate in the kernel/cpuset.c code whatever voodoo
it takes to get the vfs locking correct for a rmdir(2) system call.

Using a user mode helper lets this be handled using the ordinary
rmdir(2) system call, with no special vfs locking awareness, from
a separate thread.

... Hope you had a Merry Christmas.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

  reply	other threads:[~2006-12-31  5:19 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-22 14:14 [PATCH 0/6] containers: Generic Process Containers (V6) Paul Menage
2006-12-22 14:14 ` [PATCH 1/6] containers: Generic container system abstracted from cpusets code Paul Menage
2006-12-30 13:10   ` Eric W. Biederman
2006-12-31  5:17     ` Paul Jackson [this message]
2007-01-02 22:15     ` Paul Menage
2006-12-22 14:14 ` [PATCH 2/6] containers: Cpusets hooked into containers Paul Menage
2006-12-22 14:14 ` [PATCH 3/6] containers: Add generic multi-subsystem API to containers Paul Menage
2007-01-10 15:56   ` [ckrm-tech] " Balbir Singh
2007-01-11 22:53     ` Paul Menage
2007-01-12  6:29       ` Balbir Singh
2007-01-12  8:10         ` Paul Menage
2007-01-12  8:22           ` Balbir Singh
2007-01-20 17:27           ` Balbir Singh
2006-12-22 14:14 ` [PATCH 4/6] containers: Simple CPU accounting container subsystem Paul Menage
2007-01-10 14:21   ` [ckrm-tech] " Balbir Singh
2007-01-12  0:33     ` Paul Menage
2007-01-12  6:24       ` Balbir Singh
2007-01-12  8:15         ` Paul Menage
2007-01-12  8:26           ` Balbir Singh
2007-01-12 17:32             ` Paul Menage
2007-01-15  9:01       ` [PATCH 0/1] Add mount/umount callbacks to containers (Re: [ckrm-tech] [PATCH 4/6] containers: Simple CPU accounting container subsystem) Balbir Singh
2007-01-15  9:04         ` [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups " Balbir Singh
2007-01-15  9:22           ` Paul Menage
2007-01-15  9:51             ` [ckrm-tech] [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups (Re: " Balbir Singh
2007-01-15 10:01               ` Paul Menage
2007-01-15 10:10                 ` Balbir Singh
2006-12-22 14:14 ` [PATCH 5/6] containers: Resource Groups over generic containers Paul Menage
2006-12-22 14:14 ` [PATCH 6/6] containers: BeanCounters over generic process containers Paul Menage
2006-12-23 19:49   ` Herbert Poetzl
2006-12-24 11:32     ` Paul Menage
2006-12-25 10:16     ` Kirill Korotaev
2006-12-26  0:54       ` Paul Menage
2006-12-25 10:35     ` Pavel Emelianov
2007-01-03 14:43 ` [PATCH 0/6] containers: Generic Process Containers (V6) Serge E. Hallyn
2007-01-05  0:25   ` Paul Menage
2007-01-12 18:42     ` Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061230211756.e1b1cc41.pj@sgi.com \
    --to=pj@sgi.com \
    --cc=akpm@osdl.org \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=containers@lists.osdl.org \
    --cc=dev@sw.ru \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=rohitseth@google.com \
    --cc=sekharan@us.ibm.com \
    --cc=serue@us.ibm.com \
    --cc=vatsa@in.ibm.com \
    --cc=winget@google.com \
    --cc=xemul@sw.ru \
    --subject='Re: [PATCH 1/6] containers: Generic container system abstracted from cpusets code' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).