LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Prepare for the unshare support of the pid namespace
@ 2011-01-31 10:25 Daniel Lezcano
  2011-01-31 10:25 ` [PATCH 1/4] pid: Remove the child_reaper special case in init/main.c Daniel Lezcano
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 10:25 UTC (permalink / raw)
  To: akpm; +Cc: ebiederm, oleg, containers, linux-kernel, clg

This patchset is a cleanup and a preparation to unshare the pid
namespace. These prerequisites prepares the next Eric's patchset
to give a file descriptor to a namespace and join an existing
namespace.

The initial authors of this patchset are Eric Biederman and Oleg
Nesterov.

Changelog:
	01/31/11 - Daniel Lezcano <daniel.lezcano@free.fr>
	* patch 1/4 : wrapped test in a function
	* other patches : refreshed against linux-next



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] pid: Remove the child_reaper special case in init/main.c
  2011-01-31 10:25 Prepare for the unshare support of the pid namespace Daniel Lezcano
@ 2011-01-31 10:25 ` Daniel Lezcano
  2011-01-31 10:25 ` [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace Daniel Lezcano
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 10:25 UTC (permalink / raw)
  To: akpm; +Cc: ebiederm, oleg, containers, linux-kernel, clg

From: Eric W. Biederman <ebiederm@xmission.com>

It turns out that the existing assignment in copy_process of
the child_reaper can handle the initial assignment of child_reaper
we just need to generalize the test in kernel/fork.c

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
 include/linux/pid.h |   11 +++++++++++
 init/main.c         |    9 ---------
 kernel/fork.c       |    2 +-
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 49f1c2f..efceda0 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -141,6 +141,17 @@ static inline struct pid_namespace *ns_of_pid(struct pid *pid)
 }
 
 /*
+ * is_child_reaper returns true if the pid is the init process
+ * of the current namespace. As this one could be checked before
+ * pid_ns->child_reaper is assigned in copy_process, we check
+ * with the pid number.
+ */
+static inline bool is_child_reaper(struct pid *pid)
+{
+	return pid->numbers[pid->level].nr == 1;
+}
+
+/*
  * the helpers to get the pid's id seen from different namespaces
  *
  * pid_nr()    : global id, i.e. the id seen from the init namespace;
diff --git a/init/main.c b/init/main.c
index 33c37c3..793ebfd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -875,15 +875,6 @@ static int __init kernel_init(void * unused)
 	 * init can run on any cpu.
 	 */
 	set_cpus_allowed_ptr(current, cpu_all_mask);
-	/*
-	 * Tell the world that we're going to be the grim
-	 * reaper of innocent orphaned children.
-	 *
-	 * We don't want people to have to make incorrect
-	 * assumptions about where in the task array this
-	 * can be found.
-	 */
-	init_pid_ns.child_reaper = current;
 
 	cad_pid = task_pid(current);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 25e4291..c9f0784 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1289,7 +1289,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		tracehook_finish_clone(p, clone_flags, trace);
 
 		if (thread_group_leader(p)) {
-			if (clone_flags & CLONE_NEWPID)
+			if (is_child_reaper(pid))
 				p->nsproxy->pid_ns->child_reaper = p;
 
 			p->signal->leader_pid = pid;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace
  2011-01-31 10:25 Prepare for the unshare support of the pid namespace Daniel Lezcano
  2011-01-31 10:25 ` [PATCH 1/4] pid: Remove the child_reaper special case in init/main.c Daniel Lezcano
@ 2011-01-31 10:25 ` Daniel Lezcano
  2011-01-31 13:22   ` Oleg Nesterov
  2011-01-31 10:25 ` [PATCH 3/4] procfs: kill the global proc_mnt variable Daniel Lezcano
  2011-01-31 10:25 ` [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate Daniel Lezcano
  3 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 10:25 UTC (permalink / raw)
  To: akpm; +Cc: ebiederm, oleg, containers, linux-kernel, clg

From: Eric W. Biederman <ebiederm@xmission.com>

Reorganize proc_get_sb so it can be called before the struct pid
of the first process is allocated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
 fs/proc/root.c         |   25 +++++++------------------
 kernel/fork.c          |    6 ------
 kernel/pid_namespace.c |    4 ++++
 3 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/fs/proc/root.c b/fs/proc/root.c
index ef9fa8e..e5e2bfa 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -43,17 +43,6 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 	struct pid_namespace *ns;
 	struct proc_inode *ei;
 
-	if (proc_mnt) {
-		/* Seed the root directory with a pid so it doesn't need
-		 * to be special in base.c.  I would do this earlier but
-		 * the only task alive when /proc is mounted the first time
-		 * is the init_task and it doesn't have any pids.
-		 */
-		ei = PROC_I(proc_mnt->mnt_sb->s_root->d_inode);
-		if (!ei->pid)
-			ei->pid = find_get_pid(1);
-	}
-
 	if (flags & MS_KERNMOUNT)
 		ns = (struct pid_namespace *)data;
 	else
@@ -71,16 +60,16 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 			return ERR_PTR(err);
 		}
 
-		ei = PROC_I(sb->s_root->d_inode);
-		if (!ei->pid) {
-			rcu_read_lock();
-			ei->pid = get_pid(find_pid_ns(1, ns));
-			rcu_read_unlock();
-		}
-
 		sb->s_flags |= MS_ACTIVE;
 	}
 
+	ei = PROC_I(sb->s_root->d_inode);
+	if (!ei->pid) {
+		rcu_read_lock();
+		ei->pid = get_pid(find_pid_ns(1, ns));
+		rcu_read_unlock();
+	}
+
 	return dget(sb->s_root);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index c9f0784..e7a5907 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1180,12 +1180,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		pid = alloc_pid(p->nsproxy->pid_ns);
 		if (!pid)
 			goto bad_fork_cleanup_io;
-
-		if (clone_flags & CLONE_NEWPID) {
-			retval = pid_ns_prepare_proc(p->nsproxy->pid_ns);
-			if (retval < 0)
-				goto bad_fork_free_pid;
-		}
 	}
 
 	p->pid = pid_nr(pid);
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a5aff94..b90e4ab 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/acct.h>
 #include <linux/slab.h>
+#include <linux/proc_fs.h>
 
 #define BITS_PER_PAGE		(PAGE_SIZE*8)
 
@@ -96,6 +97,9 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
 	for (i = 1; i < PIDMAP_ENTRIES; i++)
 		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
 
+	if (pid_ns_prepare_proc(ns))
+		goto out_free_map;
+
 	return ns;
 
 out_free_map:
-- 
1.7.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/4] procfs: kill the global proc_mnt variable
  2011-01-31 10:25 Prepare for the unshare support of the pid namespace Daniel Lezcano
  2011-01-31 10:25 ` [PATCH 1/4] pid: Remove the child_reaper special case in init/main.c Daniel Lezcano
  2011-01-31 10:25 ` [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace Daniel Lezcano
@ 2011-01-31 10:25 ` Daniel Lezcano
  2011-01-31 10:25 ` [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate Daniel Lezcano
  3 siblings, 0 replies; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 10:25 UTC (permalink / raw)
  To: akpm; +Cc: ebiederm, oleg, containers, linux-kernel, clg

From: Oleg Nesterov <oleg@redhat.com>

After the previous cleanup in proc_get_sb() the global proc_mnt has
no reasons to exists, kill it.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
 fs/proc/inode.c    |    2 --
 fs/proc/internal.h |    1 -
 fs/proc/root.c     |    7 ++++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 176ce4c..ee0f802 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -42,8 +42,6 @@ static void proc_evict_inode(struct inode *inode)
 		sysctl_head_put(PROC_I(inode)->sysctl);
 }
 
-struct vfsmount *proc_mnt;
-
 static struct kmem_cache * proc_inode_cachep;
 
 static struct inode *proc_alloc_inode(struct super_block *sb)
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 9ad561d..c03e8d3 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -107,7 +107,6 @@ static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde)
 }
 void pde_put(struct proc_dir_entry *pde);
 
-extern struct vfsmount *proc_mnt;
 int proc_fill_super(struct super_block *);
 struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index e5e2bfa..a9000e9 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -90,19 +90,20 @@ static struct file_system_type proc_fs_type = {
 
 void __init proc_root_init(void)
 {
+	struct vfsmount *mnt;
 	int err;
 
 	proc_init_inodecache();
 	err = register_filesystem(&proc_fs_type);
 	if (err)
 		return;
-	proc_mnt = kern_mount_data(&proc_fs_type, &init_pid_ns);
-	if (IS_ERR(proc_mnt)) {
+	mnt = kern_mount_data(&proc_fs_type, &init_pid_ns);
+	if (IS_ERR(mnt)) {
 		unregister_filesystem(&proc_fs_type);
 		return;
 	}
 
-	init_pid_ns.proc_mnt = proc_mnt;
+	init_pid_ns.proc_mnt = mnt;
 	proc_symlink("mounts", NULL, "self/mounts");
 
 	proc_net_init();
-- 
1.7.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate
  2011-01-31 10:25 Prepare for the unshare support of the pid namespace Daniel Lezcano
                   ` (2 preceding siblings ...)
  2011-01-31 10:25 ` [PATCH 3/4] procfs: kill the global proc_mnt variable Daniel Lezcano
@ 2011-01-31 10:25 ` Daniel Lezcano
  2011-01-31 11:26   ` Alexey Dobriyan
  3 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 10:25 UTC (permalink / raw)
  To: akpm; +Cc: ebiederm, oleg, containers, linux-kernel, clg

From: Eric W. Biederman <ebiederm@xmission.com>

The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
aka ns_of_pid(task_pid(tsk)) should have the same number of
cache line misses with the practical difference that
ns_of_pid(task_pid(tsk)) is released later in a processes life.

Furthermore by using task_active_pid_ns it becomes trivial
to write an unshare implementation for the the pid namespace.

So I have used task_active_pid_ns everywhere I can.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
 arch/powerpc/platforms/cell/spufs/sched.c |    2 +-
 arch/um/drivers/mconsole_kern.c           |    2 +-
 fs/proc/root.c                            |    2 +-
 kernel/cgroup.c                           |    3 +--
 kernel/perf_event.c                       |    2 +-
 kernel/pid.c                              |    8 ++++----
 kernel/signal.c                           |    9 ++++-----
 kernel/sysctl_binary.c                    |    2 +-
 8 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index 0b04662..82e26a0 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -1095,7 +1095,7 @@ static int show_spu_loadavg(struct seq_file *s, void *private)
 		LOAD_INT(c), LOAD_FRAC(c),
 		count_active_contexts(),
 		atomic_read(&nr_spu_contexts),
-		current->nsproxy->pid_ns->last_pid);
+		task_active_pid_ns(current)->last_pid);
 	return 0;
 }
 
diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 975613b..edac0da 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -125,7 +125,7 @@ void mconsole_log(struct mc_request *req)
 void mconsole_proc(struct mc_request *req)
 {
 	struct nameidata nd;
-	struct vfsmount *mnt = current->nsproxy->pid_ns->proc_mnt;
+	struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
 	struct file *file;
 	int n, err;
 	char *ptr = req->request.data, *buf;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index a9000e9..9ea237e 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -46,7 +46,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 	if (flags & MS_KERNMOUNT)
 		ns = (struct pid_namespace *)data;
 	else
-		ns = current->nsproxy->pid_ns;
+		ns = task_active_pid_ns(current);
 
 	sb = sget(fs_type, proc_test_super, proc_set_super, ns);
 	if (IS_ERR(sb))
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b24d702..5cb4ae7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2741,8 +2741,7 @@ static struct cgroup_pidlist *cgroup_pidlist_find(struct cgroup *cgrp,
 {
 	struct cgroup_pidlist *l;
 	/* don't need task_nsproxy() if we're looking at ourself */
-	struct pid_namespace *ns = current->nsproxy->pid_ns;
-
+	struct pid_namespace *ns = task_active_pid_ns(current);
 	/*
 	 * We can't drop the pidlist_mutex before taking the l->mutex in case
 	 * the last ref-holder is trying to remove l from the list at the same
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 852ae8c..42bdb40 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -5581,7 +5581,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 
 	event->parent		= parent_event;
 
-	event->ns		= get_pid_ns(current->nsproxy->pid_ns);
+	event->ns		= get_pid_ns(task_active_pid_ns(current));
 	event->id		= atomic64_inc_return(&perf_event_id);
 
 	event->state		= PERF_EVENT_STATE_INACTIVE;
diff --git a/kernel/pid.c b/kernel/pid.c
index 39b65b6..b45189d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -339,7 +339,7 @@ EXPORT_SYMBOL_GPL(find_pid_ns);
 
 struct pid *find_vpid(int nr)
 {
-	return find_pid_ns(nr, current->nsproxy->pid_ns);
+	return find_pid_ns(nr, task_active_pid_ns(current));
 }
 EXPORT_SYMBOL_GPL(find_vpid);
 
@@ -422,7 +422,7 @@ struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
 
 struct task_struct *find_task_by_vpid(pid_t vnr)
 {
-	return find_task_by_pid_ns(vnr, current->nsproxy->pid_ns);
+	return find_task_by_pid_ns(vnr, task_active_pid_ns(current));
 }
 
 struct pid *get_task_pid(struct task_struct *task, enum pid_type type)
@@ -474,7 +474,7 @@ pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)
 
 pid_t pid_vnr(struct pid *pid)
 {
-	return pid_nr_ns(pid, current->nsproxy->pid_ns);
+	return pid_nr_ns(pid, task_active_pid_ns(current));
 }
 EXPORT_SYMBOL_GPL(pid_vnr);
 
@@ -485,7 +485,7 @@ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
 
 	rcu_read_lock();
 	if (!ns)
-		ns = current->nsproxy->pid_ns;
+		ns = task_active_pid_ns(current);
 	if (likely(pid_alive(task))) {
 		if (type != PIDTYPE_PID)
 			task = task->group_leader;
diff --git a/kernel/signal.c b/kernel/signal.c
index 4e3cff1..6f10e78 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1462,16 +1462,15 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 	 * we are under tasklist_lock here so our parent is tied to
 	 * us and cannot exit and release its namespace.
 	 *
-	 * the only it can is to switch its nsproxy with sys_unshare,
-	 * bu uncharing pid namespaces is not allowed, so we'll always
-	 * see relevant namespace
+	 * The only it can is to switch its nsproxy with sys_unshare,
+	 * but we use the pid_namespace for task_pid which never changes.
 	 *
 	 * write_lock() currently calls preempt_disable() which is the
 	 * same as rcu_read_lock(), but according to Oleg, this is not
 	 * correct to rely on this
 	 */
 	rcu_read_lock();
-	info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
+	info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(tsk->parent));
 	info.si_uid = __task_cred(tsk)->uid;
 	rcu_read_unlock();
 
@@ -1542,7 +1541,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
 	 * see comment in do_notify_parent() abot the following 3 lines
 	 */
 	rcu_read_lock();
-	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
+	info.si_pid = task_pid_nr_ns(tsk, task_active_pid_ns(parent));
 	info.si_uid = __task_cred(tsk)->uid;
 	rcu_read_unlock();
 
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index b875bed..88c69d5 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -1349,7 +1349,7 @@ static ssize_t binary_sysctl(const int *name, int nlen,
 		goto out_putname;
 	}
 
-	mnt = current->nsproxy->pid_ns->proc_mnt;
+	mnt = task_active_pid_ns(current)->proc_mnt;
 	result = vfs_path_lookup(mnt->mnt_root, mnt, pathname, 0, &nd);
 	if (result)
 		goto out_putname;
-- 
1.7.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate
  2011-01-31 10:25 ` [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate Daniel Lezcano
@ 2011-01-31 11:26   ` Alexey Dobriyan
  2011-01-31 14:47     ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Dobriyan @ 2011-01-31 11:26 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, ebiederm, oleg, containers, linux-kernel, clg

On Mon, Jan 31, 2011 at 12:25 PM, Daniel Lezcano <daniel.lezcano@free.fr> wrote:
> The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
> aka ns_of_pid(task_pid(tsk)) should have the same number of
> cache line misses with the practical difference that
> ns_of_pid(task_pid(tsk)) is released later in a processes life.
>
> Furthermore by using task_active_pid_ns it becomes trivial
> to write an unshare implementation for the the pid namespace.
>
> So I have used task_active_pid_ns everywhere I can.

Yet current->nsproxy->pid_ns is way clearer.
Because live current always has pid_ns.

This task_active_pid_ns() is misnamed(?) because it does matter only
for dead tasks?

> -               current->nsproxy->pid_ns->last_pid);
> +               task_active_pid_ns(current)->last_pid);

I thought of doing exactly opposite patch :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace
  2011-01-31 10:25 ` [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace Daniel Lezcano
@ 2011-01-31 13:22   ` Oleg Nesterov
  2011-01-31 13:33     ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2011-01-31 13:22 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, ebiederm, containers, linux-kernel, clg

On 01/31, Daniel Lezcano wrote:
>
> @@ -96,6 +97,9 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
>  	for (i = 1; i < PIDMAP_ENTRIES; i++)
>  		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
>
> +	if (pid_ns_prepare_proc(ns))
> +		goto out_free_map;
> +
>  	return ns;

This is not right, afaics. I already sent the similar patches, but
they were ignored ;)

Please see http://marc.info/?l=linux-kernel&m=127697484000334

If pid_ns_prepare_proc() fails we shouldn't blindly return ENOMEM
and, more importantly, we need put_pid_ns(parent_ns).

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace
  2011-01-31 13:22   ` Oleg Nesterov
@ 2011-01-31 13:33     ` Daniel Lezcano
  2011-01-31 14:02       ` Oleg Nesterov
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 13:33 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, ebiederm, containers, linux-kernel, clg

On 01/31/2011 02:22 PM, Oleg Nesterov wrote:
> On 01/31, Daniel Lezcano wrote:
>> @@ -96,6 +97,9 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
>>   	for (i = 1; i<  PIDMAP_ENTRIES; i++)
>>   		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
>>
>> +	if (pid_ns_prepare_proc(ns))
>> +		goto out_free_map;
>> +
>>   	return ns;
> This is not right, afaics. I already sent the similar patches, but
> they were ignored ;)
>
> Please see http://marc.info/?l=linux-kernel&m=127697484000334
>
> If pid_ns_prepare_proc() fails we shouldn't blindly return ENOMEM
> and, more importantly, we need put_pid_ns(parent_ns).

Oh, ok. Right. Thanks for the pointer.

Are you ok if I replace the patch 2/4 with your patch ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace
  2011-01-31 13:33     ` Daniel Lezcano
@ 2011-01-31 14:02       ` Oleg Nesterov
  2011-01-31 14:21         ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2011-01-31 14:02 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: akpm, ebiederm, containers, linux-kernel, clg

On 01/31, Daniel Lezcano wrote:
>
> On 01/31/2011 02:22 PM, Oleg Nesterov wrote:
>> On 01/31, Daniel Lezcano wrote:
>>> @@ -96,6 +97,9 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
>>>   	for (i = 1; i<  PIDMAP_ENTRIES; i++)
>>>   		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
>>>
>>> +	if (pid_ns_prepare_proc(ns))
>>> +		goto out_free_map;
>>> +
>>>   	return ns;
>> This is not right, afaics. I already sent the similar patches, but
>> they were ignored ;)
>>
>> Please see http://marc.info/?l=linux-kernel&m=127697484000334
>>
>> If pid_ns_prepare_proc() fails we shouldn't blindly return ENOMEM
>> and, more importantly, we need put_pid_ns(parent_ns).
>
> Oh, ok. Right. Thanks for the pointer.
>
> Are you ok if I replace the patch 2/4 with your patch ?

My patch depends on 1/4, http://marc.info/?l=linux-kernel&m=127697468632667

Your change looks very similar to 1/4 + 3/4. Just fix the problem
in create_pid_namespace(), no need to replace.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace
  2011-01-31 14:02       ` Oleg Nesterov
@ 2011-01-31 14:21         ` Daniel Lezcano
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 14:21 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, ebiederm, containers, linux-kernel, clg

On 01/31/2011 03:02 PM, Oleg Nesterov wrote:
> On 01/31, Daniel Lezcano wrote:
>> On 01/31/2011 02:22 PM, Oleg Nesterov wrote:
>>> On 01/31, Daniel Lezcano wrote:
>>>> @@ -96,6 +97,9 @@ static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_p
>>>>    	for (i = 1; i<   PIDMAP_ENTRIES; i++)
>>>>    		atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE);
>>>>
>>>> +	if (pid_ns_prepare_proc(ns))
>>>> +		goto out_free_map;
>>>> +
>>>>    	return ns;
>>> This is not right, afaics. I already sent the similar patches, but
>>> they were ignored ;)
>>>
>>> Please see http://marc.info/?l=linux-kernel&m=127697484000334
>>>
>>> If pid_ns_prepare_proc() fails we shouldn't blindly return ENOMEM
>>> and, more importantly, we need put_pid_ns(parent_ns).
>> Oh, ok. Right. Thanks for the pointer.
>>
>> Are you ok if I replace the patch 2/4 with your patch ?
> My patch depends on 1/4, http://marc.info/?l=linux-kernel&m=127697468632667
>
> Your change looks very similar to 1/4 + 3/4. Just fix the problem
> in create_pid_namespace(), no need to replace.

Ok, will do.

Thanks Oleg.

   -- Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate
  2011-01-31 11:26   ` Alexey Dobriyan
@ 2011-01-31 14:47     ` Daniel Lezcano
  2011-01-31 15:36       ` Oleg Nesterov
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2011-01-31 14:47 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: akpm, ebiederm, oleg, containers, linux-kernel, clg

On 01/31/2011 12:26 PM, Alexey Dobriyan wrote:
> On Mon, Jan 31, 2011 at 12:25 PM, Daniel Lezcano<daniel.lezcano@free.fr>  wrote:
>> The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
>> aka ns_of_pid(task_pid(tsk)) should have the same number of
>> cache line misses with the practical difference that
>> ns_of_pid(task_pid(tsk)) is released later in a processes life.
>>
>> Furthermore by using task_active_pid_ns it becomes trivial
>> to write an unshare implementation for the the pid namespace.
>>
>> So I have used task_active_pid_ns everywhere I can.
> Yet current->nsproxy->pid_ns is way clearer.
> Because live current always has pid_ns.
>
> This task_active_pid_ns() is misnamed(?) because it does matter only
> for dead tasks?

Actually this function is later used, for the unshare, to get the pid_ns 
of a specific task, not the current one.

http://kerneltrap.org/mailarchive/linux-kernel/2010/6/20/4585095

Do you suggest task_pid_ns(struct task_struct *tsk) would be a better name ?

>> -               current->nsproxy->pid_ns->last_pid);
>> +               task_active_pid_ns(current)->last_pid);
> I thought of doing exactly opposite patch :-)
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate
  2011-01-31 14:47     ` Daniel Lezcano
@ 2011-01-31 15:36       ` Oleg Nesterov
  2011-02-01 10:07         ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2011-01-31 15:36 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Alexey Dobriyan, akpm, ebiederm, containers, linux-kernel, clg

On 01/31, Daniel Lezcano wrote:
>
> On 01/31/2011 12:26 PM, Alexey Dobriyan wrote:
>>
>> This task_active_pid_ns() is misnamed(?) because it does matter only
>> for dead tasks?
>
> Actually this function is later used, for the unshare, to get the pid_ns
> of a specific task, not the current one.

Well, it is already used to get the pid_ns of !current task.

> http://kerneltrap.org/mailarchive/linux-kernel/2010/6/20/4585095

The actual need for this change is that you are going to complicate
the things so that current->proxy->pid_ns != task_active_pid_ns().
This makes me cry ;)

Please don't forget, this patch is buggy, iirc...

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate
  2011-01-31 15:36       ` Oleg Nesterov
@ 2011-02-01 10:07         ` Daniel Lezcano
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Lezcano @ 2011-02-01 10:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexey Dobriyan, akpm, ebiederm, containers, linux-kernel, clg

On 01/31/2011 04:36 PM, Oleg Nesterov wrote:
> On 01/31, Daniel Lezcano wrote:
>> On 01/31/2011 12:26 PM, Alexey Dobriyan wrote:
>>> This task_active_pid_ns() is misnamed(?) because it does matter only
>>> for dead tasks?
>> Actually this function is later used, for the unshare, to get the pid_ns
>> of a specific task, not the current one.
> Well, it is already used to get the pid_ns of !current task.
>
>> http://kerneltrap.org/mailarchive/linux-kernel/2010/6/20/4585095
> The actual need for this change is that you are going to complicate
> the things so that current->proxy->pid_ns != task_active_pid_ns().
> This makes me cry ;)

Mmh, ok that makes sense.

> Please don't forget, this patch is buggy, iirc...

Ok, I will resend the cleanup patchset without this patch.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-02-01 10:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-31 10:25 Prepare for the unshare support of the pid namespace Daniel Lezcano
2011-01-31 10:25 ` [PATCH 1/4] pid: Remove the child_reaper special case in init/main.c Daniel Lezcano
2011-01-31 10:25 ` [PATCH 2/4] pidns: Call pid_ns_prepare_proc from create_pid_namespace Daniel Lezcano
2011-01-31 13:22   ` Oleg Nesterov
2011-01-31 13:33     ` Daniel Lezcano
2011-01-31 14:02       ` Oleg Nesterov
2011-01-31 14:21         ` Daniel Lezcano
2011-01-31 10:25 ` [PATCH 3/4] procfs: kill the global proc_mnt variable Daniel Lezcano
2011-01-31 10:25 ` [PATCH 4/4] pidns: Use task_active_pid_ns where appropriate Daniel Lezcano
2011-01-31 11:26   ` Alexey Dobriyan
2011-01-31 14:47     ` Daniel Lezcano
2011-01-31 15:36       ` Oleg Nesterov
2011-02-01 10:07         ` Daniel Lezcano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).