LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Paul Menage <menage@google.com>
Cc: akpm@osdl.org, pj@sgi.com, sekharan@us.ibm.com, dev@sw.ru,
	xemul@sw.ru, serue@us.ibm.com, vatsa@in.ibm.com,
	rohitseth@google.com, winget@google.com,
	containers@lists.osdl.org, ckrm-tech@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] containers: Generic container system abstracted from cpusets code
Date: Sat, 30 Dec 2006 06:10:32 -0700	[thread overview]
Message-ID: <m1ac15o7g7.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <20061222145216.096568023@menage.corp.google.com> (Paul Menage's message of "Fri, 22 Dec 2006 06:14:43 -0800")

Paul Menage <menage@google.com> writes:

> This patch creates a generic process container system based on (and
> parallel top) the cpusets code.  At a coarse level it was created by
> copying kernel/cpuset.c, doing s/cpuset/container/g, and stripping out any
> code that was cpuset-specific rather than applicable to any process
> container subsystem.

First thank you for bring the conversation here.  Given what
you are implementing I rather object to the term containers as
that is what we have been using to refer to the aggregate whole
and not the individual pieces.

I'm still digesting this but do you think you could make the code
pid namespace safe before moving it all over creation.

i.e.  pid_nr(task_pid(task)) not task->pid.

I hadn't realized we had any users like the one below left.

The whole interface that reads out the processes in your task
grouping looks scary.  It takes the tasklist_lock and holds
it for an indefinite duration.  All it currently needs is
the rcu_read_lock.  Holding the tasklist_lock looks like a good
way to kill performance on a big box.  Even hold the cpu for
an indefinite duration I find a little worrying but no where
near as bad as taking a global lock for an indefinite period
of time.  Although I am curious why this is even needed when
we have /proc/<pid>/cpuset  which gets us the information
in another way.

This interface really belongs in /proc as it is about managing
processes.

The filesystem operations to manage cpusets are a little non-intuitive
but once you see what they are they appear usable.

I hate attach_task.  Allowing movement of a process from
one set to another by another process looks like a great way
to create subtle races.  The very long and exhaustive locking
comments seem to verify this.  For most of the unix API
we have avoided things for precisely this reason.  Leaving that
set of races to the debugging commands in sys_ptrace.

You are putting a pointer into the task_struct for each class
of resource you want to count.  Ouch.  Andi Kleen was sufficiently
paranoid about the space bloat that we were obliged to introduce
struct nsproxy.

The more I look at this the more this appears to be completely
overkill for process resource control, and currently I am horrified at what
currently looks like huge piles of unnecessary complexity in the
cpuset implementation.

I still need to do some research but at the moment my feeling that
this approach is so wrong that cpusets need to get fixed and nothing
should ever look at cloning them.

Process resource control that looks like a good reason to add some
more unshare flags or some separate syscalls whichever is simpler.
At least that has a simple user interface that is easy to audit.

If nothing else the code needs to find a way to be refactored so
it isn't scary too look at.  

Please also next time explain the mechanism you are talking about
using to track processes and don't grandfather it in with oh
this is just a slightly enhanced cpuset.  The insanity of this
interface would have been a lot easier to have been spotted
if it had been described more clearly.

Why does any of this code need a user mode helper?  I guess
because of the complicated semantics this doesn't do proper
reference counting so you can't implicitly free these things
on the exit of the last task that uses them.  That isn't the
unix way and I don't like it.  Way over complicated.

Eric

> +/*
> + * Load into 'pidarray' up to 'npids' of the tasks using container 'cont'.
> + * Return actual number of pids loaded.  No need to task_lock(p)
> + * when reading out p->container, as we don't really care if it changes
> + * on the next cycle, and we are not going to try to dereference it.
> + */
> +static int pid_array_load(pid_t *pidarray, int npids, struct container *cont)
> +{
> +	int n = 0;
> +	struct task_struct *g, *p;
> +
> +	read_lock(&tasklist_lock);
> +
> +	do_each_thread(g, p) {
> +		if (p->container == cont) {
> +			pidarray[n++] = p->pid;
> +			if (unlikely(n == npids))
> +				goto array_full;
> +		}
> +	} while_each_thread(g, p);
> +
> +array_full:
> +	read_unlock(&tasklist_lock);
> +	return n;
> +}

  reply	other threads:[~2006-12-30 13:12 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-22 14:14 [PATCH 0/6] containers: Generic Process Containers (V6) Paul Menage
2006-12-22 14:14 ` [PATCH 1/6] containers: Generic container system abstracted from cpusets code Paul Menage
2006-12-30 13:10   ` Eric W. Biederman [this message]
2006-12-31  5:17     ` Paul Jackson
2007-01-02 22:15     ` Paul Menage
2006-12-22 14:14 ` [PATCH 2/6] containers: Cpusets hooked into containers Paul Menage
2006-12-22 14:14 ` [PATCH 3/6] containers: Add generic multi-subsystem API to containers Paul Menage
2007-01-10 15:56   ` [ckrm-tech] " Balbir Singh
2007-01-11 22:53     ` Paul Menage
2007-01-12  6:29       ` Balbir Singh
2007-01-12  8:10         ` Paul Menage
2007-01-12  8:22           ` Balbir Singh
2007-01-20 17:27           ` Balbir Singh
2006-12-22 14:14 ` [PATCH 4/6] containers: Simple CPU accounting container subsystem Paul Menage
2007-01-10 14:21   ` [ckrm-tech] " Balbir Singh
2007-01-12  0:33     ` Paul Menage
2007-01-12  6:24       ` Balbir Singh
2007-01-12  8:15         ` Paul Menage
2007-01-12  8:26           ` Balbir Singh
2007-01-12 17:32             ` Paul Menage
2007-01-15  9:01       ` [PATCH 0/1] Add mount/umount callbacks to containers (Re: [ckrm-tech] [PATCH 4/6] containers: Simple CPU accounting container subsystem) Balbir Singh
2007-01-15  9:04         ` [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups " Balbir Singh
2007-01-15  9:22           ` Paul Menage
2007-01-15  9:51             ` [ckrm-tech] [PATCH 1/1] Fix a panic while mouting containers on powerpc and some other small cleanups (Re: " Balbir Singh
2007-01-15 10:01               ` Paul Menage
2007-01-15 10:10                 ` Balbir Singh
2006-12-22 14:14 ` [PATCH 5/6] containers: Resource Groups over generic containers Paul Menage
2006-12-22 14:14 ` [PATCH 6/6] containers: BeanCounters over generic process containers Paul Menage
2006-12-23 19:49   ` Herbert Poetzl
2006-12-24 11:32     ` Paul Menage
2006-12-25 10:16     ` Kirill Korotaev
2006-12-26  0:54       ` Paul Menage
2006-12-25 10:35     ` Pavel Emelianov
2007-01-03 14:43 ` [PATCH 0/6] containers: Generic Process Containers (V6) Serge E. Hallyn
2007-01-05  0:25   ` Paul Menage
2007-01-12 18:42     ` Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1ac15o7g7.fsf@ebiederm.dsl.xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@osdl.org \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=containers@lists.osdl.org \
    --cc=dev@sw.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=pj@sgi.com \
    --cc=rohitseth@google.com \
    --cc=sekharan@us.ibm.com \
    --cc=serue@us.ibm.com \
    --cc=vatsa@in.ibm.com \
    --cc=winget@google.com \
    --cc=xemul@sw.ru \
    --subject='Re: [PATCH 1/6] containers: Generic container system abstracted from cpusets code' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).