LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] User namespaces: set of cleanups (eventually for linux-next?)
@ 2008-10-16 22:46 Serge E. Hallyn
  2008-10-16 23:17 ` James Morris
  2008-10-16 23:54 ` David Howells
  0 siblings, 2 replies; 4+ messages in thread
From: Serge E. Hallyn @ 2008-10-16 22:46 UTC (permalink / raw)
  To: David Howells, James Morris, Stephen Rothwell
  Cc: lkml, Eric W. Biederman, Michael Halcrow

Hi,

Once the creds-next-subsys branch of the security testing tree hits
linux-next, is there any chance of having this patch in there?  It
offers (imho) drastic cleanup over the current userns code.
LTP-approved (but then so were previous buggy versions...)

David, since this consumes your patch, I wasn't sure whether it was
appropriate to put your signed-off-by on here or not.  I decided
doing so was the worse of the potential offenses...

thanks,
-serge

Subject: [PATCH] User namespaces: set of cleanups
From: Serge Hallyn <serue@us.ibm.com>
Date: 1224106725 -0500

The user_ns is moved from nsproxy to user_struct, so that a struct
cred by itself is sufficient to determine access (which it otherwise
would not be).  Corresponding ecryptfs fixes (by David Howells) are
here as well.

Fix refcounting.  The following rules now apply:
        1. The task pins the user struct.
        2. The user struct pins its user namespace.
        3. The user namespace pins the struct user which created it.

User namespaces are cloned during copy_creds().  Unsharing a new user_ns
is no longer possible.  (We could re-add that, but it'll cause code
duplication and doesn't seem useful if PAM doesn't need to clone user
namespaces).

When a user namespace is created, its first user (uid 0) gets empty
keyrings and a clean group_info.

Signed-off-by: Serge Hallyn <serue@us.ibm.com>

---

 fs/ecryptfs/messaging.c        |   13 +++----
 fs/ecryptfs/miscdev.c          |   19 ++++------
 include/linux/cred.h           |    2 +
 include/linux/init_task.h      |    1 -
 include/linux/nsproxy.h        |    1 -
 include/linux/sched.h          |    1 +
 include/linux/user_namespace.h |   13 ++-----
 kernel/cred.c                  |   15 +++++++-
 kernel/fork.c                  |   19 ++++++++--
 kernel/nsproxy.c               |   15 +-------
 kernel/sys.c                   |    4 +-
 kernel/user.c                  |   47 +++++++-----------------
 kernel/user_namespace.c        |   77 +++++++++++++++++-----------------------
 13 files changed, 98 insertions(+), 129 deletions(-)

1d3aca5269bcd4471ca7725111d05ad92f17a3a5
diff --git a/fs/ecryptfs/messaging.c b/fs/ecryptfs/messaging.c
index 92bf606..eecb8b5 100644
--- a/fs/ecryptfs/messaging.c
+++ b/fs/ecryptfs/messaging.c
@@ -376,7 +376,7 @@ int ecryptfs_process_response(struct ecr
 	struct ecryptfs_msg_ctx *msg_ctx;
 	size_t msg_size;
 	struct nsproxy *nsproxy;
-	struct user_namespace *current_user_ns;
+	struct user_namespace *tsk_user_ns;
 	uid_t ctx_euid;
 	int rc;
 
@@ -401,9 +401,9 @@ int ecryptfs_process_response(struct ecr
 		mutex_unlock(&ecryptfs_daemon_hash_mux);
 		goto wake_up;
 	}
-	current_user_ns = nsproxy->user_ns;
+	tsk_user_ns = __task_cred(msg_ctx->task)->user->user_ns;
 	ctx_euid = task_euid(msg_ctx->task);
-	rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, current_user_ns);
+	rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, tsk_user_ns);
 	rcu_read_unlock();
 	mutex_unlock(&ecryptfs_daemon_hash_mux);
 	if (rc) {
@@ -421,11 +421,11 @@ int ecryptfs_process_response(struct ecr
 		       euid, ctx_euid);
 		goto unlock;
 	}
-	if (current_user_ns != user_ns) {
+	if (tsk_user_ns != user_ns) {
 		rc = -EBADMSG;
 		printk(KERN_WARNING "%s: Received message from user_ns "
 		       "[0x%p]; expected message from user_ns [0x%p]\n",
-		       __func__, user_ns, nsproxy->user_ns);
+		       __func__, user_ns, tsk_user_ns);
 		goto unlock;
 	}
 	if (daemon->pid != pid) {
@@ -486,8 +486,7 @@ ecryptfs_send_message_locked(unsigned in
 	uid_t euid = current_euid();
 	int rc;
 
-	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
-					  current->nsproxy->user_ns);
+	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
 	if (rc || !daemon) {
 		rc = -ENOTCONN;
 		printk(KERN_ERR "%s: User [%d] does not have a daemon "
diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c
index 047ac60..efd95a0 100644
--- a/fs/ecryptfs/miscdev.c
+++ b/fs/ecryptfs/miscdev.c
@@ -47,8 +47,7 @@ ecryptfs_miscdev_poll(struct file *file,
 
 	mutex_lock(&ecryptfs_daemon_hash_mux);
 	/* TODO: Just use file->private_data? */
-	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
-					  current->nsproxy->user_ns);
+	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
 	BUG_ON(rc || !daemon);
 	mutex_lock(&daemon->mux);
 	mutex_unlock(&ecryptfs_daemon_hash_mux);
@@ -95,11 +94,9 @@ ecryptfs_miscdev_open(struct inode *inod
 		       "count; rc = [%d]\n", __func__, rc);
 		goto out_unlock_daemon_list;
 	}
-	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
-					  current->nsproxy->user_ns);
+	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
 	if (rc || !daemon) {
-		rc = ecryptfs_spawn_daemon(&daemon, euid,
-					   current->nsproxy->user_ns,
+		rc = ecryptfs_spawn_daemon(&daemon, euid, current_user_ns(),
 					   task_pid(current));
 		if (rc) {
 			printk(KERN_ERR "%s: Error attempting to spawn daemon; "
@@ -153,8 +150,7 @@ ecryptfs_miscdev_release(struct inode *i
 	int rc;
 
 	mutex_lock(&ecryptfs_daemon_hash_mux);
-	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
-					  current->nsproxy->user_ns);
+	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
 	BUG_ON(rc || !daemon);
 	mutex_lock(&daemon->mux);
 	BUG_ON(daemon->pid != task_pid(current));
@@ -254,8 +250,7 @@ ecryptfs_miscdev_read(struct file *file,
 
 	mutex_lock(&ecryptfs_daemon_hash_mux);
 	/* TODO: Just use file->private_data? */
-	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
-					  current->nsproxy->user_ns);
+	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
 	BUG_ON(rc || !daemon);
 	mutex_lock(&daemon->mux);
 	if (daemon->flags & ECRYPTFS_DAEMON_ZOMBIE) {
@@ -295,7 +290,7 @@ check_list:
 		goto check_list;
 	}
 	BUG_ON(euid != daemon->euid);
-	BUG_ON(current->nsproxy->user_ns != daemon->user_ns);
+	BUG_ON(current_user_ns() != daemon->user_ns);
 	BUG_ON(task_pid(current) != daemon->pid);
 	msg_ctx = list_first_entry(&daemon->msg_ctx_out_queue,
 				   struct ecryptfs_msg_ctx, daemon_out_list);
@@ -468,7 +463,7 @@ ecryptfs_miscdev_write(struct file *file
 			goto out_free;
 		}
 		rc = ecryptfs_miscdev_response(&data[i], packet_size,
-					       euid, current->nsproxy->user_ns,
+					       euid, current_user_ns(),
 					       task_pid(current), seq);
 		if (rc)
 			printk(KERN_WARNING "%s: Failed to deliver miscdev "
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 26c1ab1..3282ee4 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -60,6 +60,7 @@ do {							\
 } while (0)
 
 extern struct group_info *groups_alloc(int);
+extern struct group_info init_groups;
 extern void groups_free(struct group_info *);
 extern int set_current_groups(struct group_info *);
 extern int set_groups(struct cred *, struct group_info *);
@@ -315,6 +316,7 @@ static inline void put_cred(const struct
 #define current_fsgid() 	(current_cred_xxx(fsgid))
 #define current_cap()		(current_cred_xxx(cap_effective))
 #define current_user()		(current_cred_xxx(user))
+#define current_user_ns()	(current_cred_xxx(user)->user_ns)
 #define current_security()	(current_cred_xxx(security))
 
 #define current_uid_gid(_uid, _gid)		\
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index edf0285..126c3c4 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -57,7 +57,6 @@ extern struct nsproxy init_nsproxy;
 	.mnt_ns		= NULL,						\
 	INIT_NET_NS(net_ns)                                             \
 	INIT_IPC_NS(ipc_ns)						\
-	.user_ns	= &init_user_ns,				\
 }
 
 #define INIT_SIGHAND(sighand) {						\
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index c8a768e..afad7de 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -27,7 +27,6 @@ struct nsproxy {
 	struct ipc_namespace *ipc_ns;
 	struct mnt_namespace *mnt_ns;
 	struct pid_namespace *pid_ns;
-	struct user_namespace *user_ns;
 	struct net 	     *net_ns;
 };
 extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0812cfd..02693db 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -595,6 +595,7 @@ struct user_struct {
 	/* Hash table maintenance information */
 	struct hlist_node uidhash_node;
 	uid_t uid;
+	struct user_namespace *user_ns;
 
 #ifdef CONFIG_USER_SCHED
 	struct task_group *tg;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b5f41d4..315bcd3 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -12,7 +12,7 @@
 struct user_namespace {
 	struct kref		kref;
 	struct hlist_head	uidhash_table[UIDHASH_SZ];
-	struct user_struct	*root_user;
+	struct user_struct	*creator;
 };
 
 extern struct user_namespace init_user_ns;
@@ -26,8 +26,7 @@ static inline struct user_namespace *get
 	return ns;
 }
 
-extern struct user_namespace *copy_user_ns(int flags,
-					   struct user_namespace *old_ns);
+extern int create_user_ns(struct cred *new);
 extern void free_user_ns(struct kref *kref);
 
 static inline void put_user_ns(struct user_namespace *ns)
@@ -43,13 +42,9 @@ static inline struct user_namespace *get
 	return &init_user_ns;
 }
 
-static inline struct user_namespace *copy_user_ns(int flags,
-						  struct user_namespace *old_ns)
+static inline int create_user_ns(struct cred *new)
 {
-	if (flags & CLONE_NEWUSER)
-		return ERR_PTR(-EINVAL);
-
-	return old_ns;
+	return -EINVAL;
 }
 
 static inline void put_user_ns(struct user_namespace *ns)
diff --git a/kernel/cred.c b/kernel/cred.c
index 13697ca..ff7bc07 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -274,6 +274,7 @@ int copy_creds(struct task_struct *p, un
 	struct thread_group_cred *tgcred;
 #endif
 	struct cred *new;
+	int ret;
 
 	mutex_init(&p->cred_exec_mutex);
 
@@ -293,6 +294,12 @@ int copy_creds(struct task_struct *p, un
 	if (!new)
 		return -ENOMEM;
 
+	if (clone_flags & CLONE_NEWUSER) {
+		ret = create_user_ns(new);
+		if (ret < 0)
+			goto error_put;
+	}
+
 #ifdef CONFIG_KEYS
 	/* new threads get their own thread keyrings if their parent already
 	 * had one */
@@ -309,8 +316,8 @@ int copy_creds(struct task_struct *p, un
 	if (!(clone_flags & CLONE_THREAD)) {
 		tgcred = kmalloc(sizeof(*tgcred), GFP_KERNEL);
 		if (!tgcred) {
-			put_cred(new);
-			return -ENOMEM;
+			ret = -ENOMEM;
+			goto error_put;
 		}
 		atomic_set(&tgcred->usage, 1);
 		spin_lock_init(&tgcred->lock);
@@ -325,6 +332,10 @@ int copy_creds(struct task_struct *p, un
 	atomic_inc(&new->user->processes);
 	p->cred = p->real_cred = get_cred(new);
 	return 0;
+
+error_put:
+	put_cred(new);
+	return ret;
 }
 
 /**
diff --git a/kernel/fork.c b/kernel/fork.c
index e392e5a..c71f3c1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -935,7 +935,7 @@ static struct task_struct *copy_process(
 	if (atomic_read(&p->real_cred->user->processes) >=
 			p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
 		if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
-		    p->real_cred->user != current->nsproxy->user_ns->root_user)
+		    p->real_cred->user != INIT_USER)
 			goto bad_fork_free;
 	}
 
@@ -1312,6 +1312,20 @@ long do_fork(unsigned long clone_flags,
 	long nr;
 
 	/*
+	 * Do some preliminary argument and permissions checking before we
+	 * actually start allocating stuff
+	 */
+	if (clone_flags & CLONE_NEWUSER) {
+		if (clone_flags & CLONE_THREAD)
+			return -EINVAL;
+		/* hopefully this check will go away when userns support is
+		 * complete
+		 */
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+	}
+
+	/*
 	 * We hope to recycle these flags after 2.6.26
 	 */
 	if (unlikely(clone_flags & CLONE_STOPPED)) {
@@ -1556,8 +1570,7 @@ asmlinkage long sys_unshare(unsigned lon
 	err = -EINVAL;
 	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
 				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
-				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|
-				CLONE_NEWNET))
+				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET))
 		goto bad_unshare_out;
 
 	/*
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 1d3ef29..63598dc 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -80,12 +80,6 @@ static struct nsproxy *create_new_namesp
 		goto out_pid;
 	}
 
-	new_nsp->user_ns = copy_user_ns(flags, tsk->nsproxy->user_ns);
-	if (IS_ERR(new_nsp->user_ns)) {
-		err = PTR_ERR(new_nsp->user_ns);
-		goto out_user;
-	}
-
 	new_nsp->net_ns = copy_net_ns(flags, tsk->nsproxy->net_ns);
 	if (IS_ERR(new_nsp->net_ns)) {
 		err = PTR_ERR(new_nsp->net_ns);
@@ -95,9 +89,6 @@ static struct nsproxy *create_new_namesp
 	return new_nsp;
 
 out_net:
-	if (new_nsp->user_ns)
-		put_user_ns(new_nsp->user_ns);
-out_user:
 	if (new_nsp->pid_ns)
 		put_pid_ns(new_nsp->pid_ns);
 out_pid:
@@ -130,7 +121,7 @@ int copy_namespaces(unsigned long flags,
 	get_nsproxy(old_ns);
 
 	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-				CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET)))
+				CLONE_NEWPID | CLONE_NEWNET)))
 		return 0;
 
 	if (!capable(CAP_SYS_ADMIN)) {
@@ -173,8 +164,6 @@ void free_nsproxy(struct nsproxy *ns)
 		put_ipc_ns(ns->ipc_ns);
 	if (ns->pid_ns)
 		put_pid_ns(ns->pid_ns);
-	if (ns->user_ns)
-		put_user_ns(ns->user_ns);
 	put_net(ns->net_ns);
 	kmem_cache_free(nsproxy_cachep, ns);
 }
@@ -189,7 +178,7 @@ int unshare_nsproxy_namespaces(unsigned 
 	int err = 0;
 
 	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWUSER | CLONE_NEWNET)))
+			       CLONE_NEWNET)))
 		return 0;
 
 	if (!capable(CAP_SYS_ADMIN))
diff --git a/kernel/sys.c b/kernel/sys.c
index 6bc3899..26eda31 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -565,13 +565,13 @@ static int set_user(struct cred *new)
 {
 	struct user_struct *new_user;
 
-	new_user = alloc_uid(current->nsproxy->user_ns, new->uid);
+	new_user = alloc_uid(current_user()->user_ns, new->uid);
 	if (!new_user)
 		return -EAGAIN;
 
 	if (atomic_read(&new_user->processes) >=
 				current->signal->rlim[RLIMIT_NPROC].rlim_cur &&
-			new_user != current->nsproxy->user_ns->root_user) {
+			new_user != INIT_USER) {
 		free_uid(new_user);
 		return -EAGAIN;
 	}
diff --git a/kernel/user.c b/kernel/user.c
index d476307..c0ef3a4 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -20,9 +20,9 @@
 
 struct user_namespace init_user_ns = {
 	.kref = {
-		.refcount	= ATOMIC_INIT(2),
+		.refcount	= ATOMIC_INIT(1),
 	},
-	.root_user = &root_user,
+	.creator = &root_user,
 };
 EXPORT_SYMBOL_GPL(init_user_ns);
 
@@ -48,12 +48,14 @@ static struct kmem_cache *uid_cachep;
  */
 static DEFINE_SPINLOCK(uidhash_lock);
 
+/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->creator */
 struct user_struct root_user = {
-	.__count	= ATOMIC_INIT(1),
+	.__count	= ATOMIC_INIT(2),
 	.processes	= ATOMIC_INIT(1),
 	.files		= ATOMIC_INIT(0),
 	.sigpending	= ATOMIC_INIT(0),
 	.locked_shm     = 0,
+	.user_ns	= &init_user_ns,
 #ifdef CONFIG_USER_SCHED
 	.tg		= &init_task_group,
 #endif
@@ -314,12 +316,13 @@ done:
  * IRQ state (as stored in flags) is restored and uidhash_lock released
  * upon function exit.
  */
-static inline void free_user(struct user_struct *up, unsigned long flags)
+static void free_user(struct user_struct *up, unsigned long flags)
 {
 	/* restore back the count */
 	atomic_inc(&up->__count);
 	spin_unlock_irqrestore(&uidhash_lock, flags);
 
+	put_user_ns(up->user_ns);
 	INIT_WORK(&up->work, remove_user_sysfs_dir);
 	schedule_work(&up->work);
 }
@@ -335,13 +338,14 @@ static inline void uids_mutex_unlock(voi
  * IRQ state (as stored in flags) is restored and uidhash_lock released
  * upon function exit.
  */
-static inline void free_user(struct user_struct *up, unsigned long flags)
+static void free_user(struct user_struct *up, unsigned long flags)
 {
 	uid_hash_remove(up);
 	spin_unlock_irqrestore(&uidhash_lock, flags);
 	sched_destroy_user(up);
 	key_put(up->uid_keyring);
 	key_put(up->session_keyring);
+	put_user_ns(up->user_ns);
 	kmem_cache_free(uid_cachep, up);
 }
 
@@ -357,7 +361,7 @@ struct user_struct *find_user(uid_t uid)
 {
 	struct user_struct *ret;
 	unsigned long flags;
-	struct user_namespace *ns = current->nsproxy->user_ns;
+	struct user_namespace *ns = current_user()->user_ns;
 
 	spin_lock_irqsave(&uidhash_lock, flags);
 	ret = uid_hash_find(uid, uidhashentry(ns, uid));
@@ -404,6 +408,8 @@ struct user_struct *alloc_uid(struct use
 		if (sched_create_user(new) < 0)
 			goto out_free_user;
 
+		new->user_ns = get_user_ns(ns);
+
 		if (uids_user_create(new))
 			goto out_destoy_sched;
 
@@ -427,7 +433,6 @@ struct user_struct *alloc_uid(struct use
 			up = new;
 		}
 		spin_unlock_irq(&uidhash_lock);
-
 	}
 
 	uids_mutex_unlock();
@@ -436,6 +441,7 @@ struct user_struct *alloc_uid(struct use
 
 out_destoy_sched:
 	sched_destroy_user(new);
+	put_user_ns(new->user_ns);
 out_free_user:
 	kmem_cache_free(uid_cachep, new);
 out_unlock:
@@ -443,33 +449,6 @@ out_unlock:
 	return NULL;
 }
 
-#ifdef CONFIG_USER_NS
-void release_uids(struct user_namespace *ns)
-{
-	int i;
-	unsigned long flags;
-	struct hlist_head *head;
-	struct hlist_node *nd;
-
-	spin_lock_irqsave(&uidhash_lock, flags);
-	/*
-	 * collapse the chains so that the user_struct-s will
-	 * be still alive, but not in hashes. subsequent free_uid()
-	 * will free them.
-	 */
-	for (i = 0; i < UIDHASH_SZ; i++) {
-		head = ns->uidhash_table + i;
-		while (!hlist_empty(head)) {
-			nd = head->first;
-			hlist_del_init(nd);
-		}
-	}
-	spin_unlock_irqrestore(&uidhash_lock, flags);
-
-	free_uid(ns->root_user);
-}
-#endif
-
 static int __init uid_cache_init(void)
 {
 	int n;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 0d9c51d..53b164b 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -9,70 +9,57 @@
 #include <linux/nsproxy.h>
 #include <linux/slab.h>
 #include <linux/user_namespace.h>
+#include <linux/cred.h>
 
 /*
- * Clone a new ns copying an original user ns, setting refcount to 1
- * @old_ns: namespace to clone
- * Return NULL on error (failure to kmalloc), new ns otherwise
+ * Create a new user namespace, deriving the creator from the user in the
+ * passed credentials, and replacing that user with the new root user for the
+ * new namespace.
+ *
+ * This is called by copy_creds(), which will finish setting the target task's
+ * credentials.
  */
-static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
+int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns;
-	struct user_struct *new_user;
-	struct cred *new;
+	struct user_struct *root_user;
 	int n;
 
 	ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL);
 	if (!ns)
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
 
 	kref_init(&ns->kref);
 
 	for (n = 0; n < UIDHASH_SZ; ++n)
 		INIT_HLIST_HEAD(ns->uidhash_table + n);
 
-	/* Insert new root user.  */
-	ns->root_user = alloc_uid(ns, 0);
-	if (!ns->root_user) {
+	/* Alloc new root user.  */
+	root_user = alloc_uid(ns, 0);
+	if (!root_user) {
 		kfree(ns);
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
 	}
 
-	/* Reset current->user with a new one */
-	new_user = alloc_uid(ns, current_uid());
-	if (!new_user) {
-		free_uid(ns->root_user);
-		kfree(ns);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	/* Install the new user */
-	new = prepare_creds();
-	if (!new) {
-		free_uid(new_user);
-		free_uid(ns->root_user);
-		kfree(ns);
-	}
-	free_uid(new->user);
-	new->user = new_user;
-	commit_creds(new);
-	return ns;
-}
-
-struct user_namespace * copy_user_ns(int flags, struct user_namespace *old_ns)
-{
-	struct user_namespace *new_ns;
-
-	BUG_ON(!old_ns);
-	get_user_ns(old_ns);
-
-	if (!(flags & CLONE_NEWUSER))
-		return old_ns;
+	/* set the new root user in the credentials under preparation */
+	ns->creator = new->user;
+	new->user = root_user;
+	new->uid = new->euid = new->suid = new->fsuid = 0;
+	new->gid = new->egid = new->sgid = new->fsgid = 0;
+	put_group_info(new->group_info);
+	new->group_info = get_group_info(&init_groups);
+	key_put(new->thread_keyring);
+#ifdef CONFIG_KEYS
+	new->thread_keyring = NULL;
+	key_put(new->request_key_auth);
+	new->request_key_auth = NULL;
+#endif
+	/* tgcred will be cleared in our caller bc CLONE_THREAD won't be set */
 
-	new_ns = clone_user_ns(old_ns);
+	/* alloc_uid() incremented the userns refcount.  Just set it to 1 */
+	kref_set(&ns->kref, 1);
 
-	put_user_ns(old_ns);
-	return new_ns;
+	return 0;
 }
 
 void free_user_ns(struct kref *kref)
@@ -80,7 +67,7 @@ void free_user_ns(struct kref *kref)
 	struct user_namespace *ns;
 
 	ns = container_of(kref, struct user_namespace, kref);
-	release_uids(ns);
+	free_uid(ns->creator);
 	kfree(ns);
 }
 EXPORT_SYMBOL(free_user_ns);
-- 
1.1.6

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] User namespaces: set of cleanups (eventually for linux-next?)
  2008-10-16 22:46 [PATCH] User namespaces: set of cleanups (eventually for linux-next?) Serge E. Hallyn
@ 2008-10-16 23:17 ` James Morris
  2008-10-16 23:54 ` David Howells
  1 sibling, 0 replies; 4+ messages in thread
From: James Morris @ 2008-10-16 23:17 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: David Howells, Stephen Rothwell, lkml, Eric W. Biederman,
	Michael Halcrow

On Thu, 16 Oct 2008, Serge E. Hallyn wrote:

> Hi,
> 
> Once the creds-next-subsys branch of the security testing tree hits
> linux-next, is there any chance of having this patch in there?

It should be in linux-next now.

>  It
> offers (imho) drastic cleanup over the current userns code.
> LTP-approved (but then so were previous buggy versions...)
> 
> David, since this consumes your patch, I wasn't sure whether it was
> appropriate to put your signed-off-by on here or not.  I decided
> doing so was the worse of the potential offenses...
> 
> thanks,
> -serge
> 
> Subject: [PATCH] User namespaces: set of cleanups
> From: Serge Hallyn <serue@us.ibm.com>
> Date: 1224106725 -0500
> 
> The user_ns is moved from nsproxy to user_struct, so that a struct
> cred by itself is sufficient to determine access (which it otherwise
> would not be).  Corresponding ecryptfs fixes (by David Howells) are
> here as well.
> 
> Fix refcounting.  The following rules now apply:
>         1. The task pins the user struct.
>         2. The user struct pins its user namespace.
>         3. The user namespace pins the struct user which created it.
> 
> User namespaces are cloned during copy_creds().  Unsharing a new user_ns
> is no longer possible.  (We could re-add that, but it'll cause code
> duplication and doesn't seem useful if PAM doesn't need to clone user
> namespaces).
> 
> When a user namespace is created, its first user (uid 0) gets empty
> keyrings and a clean group_info.
> 
> Signed-off-by: Serge Hallyn <serue@us.ibm.com>
> 
> ---
> 
>  fs/ecryptfs/messaging.c        |   13 +++----
>  fs/ecryptfs/miscdev.c          |   19 ++++------
>  include/linux/cred.h           |    2 +
>  include/linux/init_task.h      |    1 -
>  include/linux/nsproxy.h        |    1 -
>  include/linux/sched.h          |    1 +
>  include/linux/user_namespace.h |   13 ++-----
>  kernel/cred.c                  |   15 +++++++-
>  kernel/fork.c                  |   19 ++++++++--
>  kernel/nsproxy.c               |   15 +-------
>  kernel/sys.c                   |    4 +-
>  kernel/user.c                  |   47 +++++++-----------------
>  kernel/user_namespace.c        |   77 +++++++++++++++++-----------------------
>  13 files changed, 98 insertions(+), 129 deletions(-)
> 
> 1d3aca5269bcd4471ca7725111d05ad92f17a3a5
> diff --git a/fs/ecryptfs/messaging.c b/fs/ecryptfs/messaging.c
> index 92bf606..eecb8b5 100644
> --- a/fs/ecryptfs/messaging.c
> +++ b/fs/ecryptfs/messaging.c
> @@ -376,7 +376,7 @@ int ecryptfs_process_response(struct ecr
>  	struct ecryptfs_msg_ctx *msg_ctx;
>  	size_t msg_size;
>  	struct nsproxy *nsproxy;
> -	struct user_namespace *current_user_ns;
> +	struct user_namespace *tsk_user_ns;
>  	uid_t ctx_euid;
>  	int rc;
>  
> @@ -401,9 +401,9 @@ int ecryptfs_process_response(struct ecr
>  		mutex_unlock(&ecryptfs_daemon_hash_mux);
>  		goto wake_up;
>  	}
> -	current_user_ns = nsproxy->user_ns;
> +	tsk_user_ns = __task_cred(msg_ctx->task)->user->user_ns;
>  	ctx_euid = task_euid(msg_ctx->task);
> -	rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, current_user_ns);
> +	rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, tsk_user_ns);
>  	rcu_read_unlock();
>  	mutex_unlock(&ecryptfs_daemon_hash_mux);
>  	if (rc) {
> @@ -421,11 +421,11 @@ int ecryptfs_process_response(struct ecr
>  		       euid, ctx_euid);
>  		goto unlock;
>  	}
> -	if (current_user_ns != user_ns) {
> +	if (tsk_user_ns != user_ns) {
>  		rc = -EBADMSG;
>  		printk(KERN_WARNING "%s: Received message from user_ns "
>  		       "[0x%p]; expected message from user_ns [0x%p]\n",
> -		       __func__, user_ns, nsproxy->user_ns);
> +		       __func__, user_ns, tsk_user_ns);
>  		goto unlock;
>  	}
>  	if (daemon->pid != pid) {
> @@ -486,8 +486,7 @@ ecryptfs_send_message_locked(unsigned in
>  	uid_t euid = current_euid();
>  	int rc;
>  
> -	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
> -					  current->nsproxy->user_ns);
> +	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
>  	if (rc || !daemon) {
>  		rc = -ENOTCONN;
>  		printk(KERN_ERR "%s: User [%d] does not have a daemon "
> diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c
> index 047ac60..efd95a0 100644
> --- a/fs/ecryptfs/miscdev.c
> +++ b/fs/ecryptfs/miscdev.c
> @@ -47,8 +47,7 @@ ecryptfs_miscdev_poll(struct file *file,
>  
>  	mutex_lock(&ecryptfs_daemon_hash_mux);
>  	/* TODO: Just use file->private_data? */
> -	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
> -					  current->nsproxy->user_ns);
> +	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
>  	BUG_ON(rc || !daemon);
>  	mutex_lock(&daemon->mux);
>  	mutex_unlock(&ecryptfs_daemon_hash_mux);
> @@ -95,11 +94,9 @@ ecryptfs_miscdev_open(struct inode *inod
>  		       "count; rc = [%d]\n", __func__, rc);
>  		goto out_unlock_daemon_list;
>  	}
> -	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
> -					  current->nsproxy->user_ns);
> +	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
>  	if (rc || !daemon) {
> -		rc = ecryptfs_spawn_daemon(&daemon, euid,
> -					   current->nsproxy->user_ns,
> +		rc = ecryptfs_spawn_daemon(&daemon, euid, current_user_ns(),
>  					   task_pid(current));
>  		if (rc) {
>  			printk(KERN_ERR "%s: Error attempting to spawn daemon; "
> @@ -153,8 +150,7 @@ ecryptfs_miscdev_release(struct inode *i
>  	int rc;
>  
>  	mutex_lock(&ecryptfs_daemon_hash_mux);
> -	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
> -					  current->nsproxy->user_ns);
> +	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
>  	BUG_ON(rc || !daemon);
>  	mutex_lock(&daemon->mux);
>  	BUG_ON(daemon->pid != task_pid(current));
> @@ -254,8 +250,7 @@ ecryptfs_miscdev_read(struct file *file,
>  
>  	mutex_lock(&ecryptfs_daemon_hash_mux);
>  	/* TODO: Just use file->private_data? */
> -	rc = ecryptfs_find_daemon_by_euid(&daemon, euid,
> -					  current->nsproxy->user_ns);
> +	rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns());
>  	BUG_ON(rc || !daemon);
>  	mutex_lock(&daemon->mux);
>  	if (daemon->flags & ECRYPTFS_DAEMON_ZOMBIE) {
> @@ -295,7 +290,7 @@ check_list:
>  		goto check_list;
>  	}
>  	BUG_ON(euid != daemon->euid);
> -	BUG_ON(current->nsproxy->user_ns != daemon->user_ns);
> +	BUG_ON(current_user_ns() != daemon->user_ns);
>  	BUG_ON(task_pid(current) != daemon->pid);
>  	msg_ctx = list_first_entry(&daemon->msg_ctx_out_queue,
>  				   struct ecryptfs_msg_ctx, daemon_out_list);
> @@ -468,7 +463,7 @@ ecryptfs_miscdev_write(struct file *file
>  			goto out_free;
>  		}
>  		rc = ecryptfs_miscdev_response(&data[i], packet_size,
> -					       euid, current->nsproxy->user_ns,
> +					       euid, current_user_ns(),
>  					       task_pid(current), seq);
>  		if (rc)
>  			printk(KERN_WARNING "%s: Failed to deliver miscdev "
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 26c1ab1..3282ee4 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -60,6 +60,7 @@ do {							\
>  } while (0)
>  
>  extern struct group_info *groups_alloc(int);
> +extern struct group_info init_groups;
>  extern void groups_free(struct group_info *);
>  extern int set_current_groups(struct group_info *);
>  extern int set_groups(struct cred *, struct group_info *);
> @@ -315,6 +316,7 @@ static inline void put_cred(const struct
>  #define current_fsgid() 	(current_cred_xxx(fsgid))
>  #define current_cap()		(current_cred_xxx(cap_effective))
>  #define current_user()		(current_cred_xxx(user))
> +#define current_user_ns()	(current_cred_xxx(user)->user_ns)
>  #define current_security()	(current_cred_xxx(security))
>  
>  #define current_uid_gid(_uid, _gid)		\
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index edf0285..126c3c4 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -57,7 +57,6 @@ extern struct nsproxy init_nsproxy;
>  	.mnt_ns		= NULL,						\
>  	INIT_NET_NS(net_ns)                                             \
>  	INIT_IPC_NS(ipc_ns)						\
> -	.user_ns	= &init_user_ns,				\
>  }
>  
>  #define INIT_SIGHAND(sighand) {						\
> diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
> index c8a768e..afad7de 100644
> --- a/include/linux/nsproxy.h
> +++ b/include/linux/nsproxy.h
> @@ -27,7 +27,6 @@ struct nsproxy {
>  	struct ipc_namespace *ipc_ns;
>  	struct mnt_namespace *mnt_ns;
>  	struct pid_namespace *pid_ns;
> -	struct user_namespace *user_ns;
>  	struct net 	     *net_ns;
>  };
>  extern struct nsproxy init_nsproxy;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0812cfd..02693db 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -595,6 +595,7 @@ struct user_struct {
>  	/* Hash table maintenance information */
>  	struct hlist_node uidhash_node;
>  	uid_t uid;
> +	struct user_namespace *user_ns;
>  
>  #ifdef CONFIG_USER_SCHED
>  	struct task_group *tg;
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index b5f41d4..315bcd3 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -12,7 +12,7 @@
>  struct user_namespace {
>  	struct kref		kref;
>  	struct hlist_head	uidhash_table[UIDHASH_SZ];
> -	struct user_struct	*root_user;
> +	struct user_struct	*creator;
>  };
>  
>  extern struct user_namespace init_user_ns;
> @@ -26,8 +26,7 @@ static inline struct user_namespace *get
>  	return ns;
>  }
>  
> -extern struct user_namespace *copy_user_ns(int flags,
> -					   struct user_namespace *old_ns);
> +extern int create_user_ns(struct cred *new);
>  extern void free_user_ns(struct kref *kref);
>  
>  static inline void put_user_ns(struct user_namespace *ns)
> @@ -43,13 +42,9 @@ static inline struct user_namespace *get
>  	return &init_user_ns;
>  }
>  
> -static inline struct user_namespace *copy_user_ns(int flags,
> -						  struct user_namespace *old_ns)
> +static inline int create_user_ns(struct cred *new)
>  {
> -	if (flags & CLONE_NEWUSER)
> -		return ERR_PTR(-EINVAL);
> -
> -	return old_ns;
> +	return -EINVAL;
>  }
>  
>  static inline void put_user_ns(struct user_namespace *ns)
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 13697ca..ff7bc07 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -274,6 +274,7 @@ int copy_creds(struct task_struct *p, un
>  	struct thread_group_cred *tgcred;
>  #endif
>  	struct cred *new;
> +	int ret;
>  
>  	mutex_init(&p->cred_exec_mutex);
>  
> @@ -293,6 +294,12 @@ int copy_creds(struct task_struct *p, un
>  	if (!new)
>  		return -ENOMEM;
>  
> +	if (clone_flags & CLONE_NEWUSER) {
> +		ret = create_user_ns(new);
> +		if (ret < 0)
> +			goto error_put;
> +	}
> +
>  #ifdef CONFIG_KEYS
>  	/* new threads get their own thread keyrings if their parent already
>  	 * had one */
> @@ -309,8 +316,8 @@ int copy_creds(struct task_struct *p, un
>  	if (!(clone_flags & CLONE_THREAD)) {
>  		tgcred = kmalloc(sizeof(*tgcred), GFP_KERNEL);
>  		if (!tgcred) {
> -			put_cred(new);
> -			return -ENOMEM;
> +			ret = -ENOMEM;
> +			goto error_put;
>  		}
>  		atomic_set(&tgcred->usage, 1);
>  		spin_lock_init(&tgcred->lock);
> @@ -325,6 +332,10 @@ int copy_creds(struct task_struct *p, un
>  	atomic_inc(&new->user->processes);
>  	p->cred = p->real_cred = get_cred(new);
>  	return 0;
> +
> +error_put:
> +	put_cred(new);
> +	return ret;
>  }
>  
>  /**
> diff --git a/kernel/fork.c b/kernel/fork.c
> index e392e5a..c71f3c1 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -935,7 +935,7 @@ static struct task_struct *copy_process(
>  	if (atomic_read(&p->real_cred->user->processes) >=
>  			p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
>  		if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
> -		    p->real_cred->user != current->nsproxy->user_ns->root_user)
> +		    p->real_cred->user != INIT_USER)
>  			goto bad_fork_free;
>  	}
>  
> @@ -1312,6 +1312,20 @@ long do_fork(unsigned long clone_flags,
>  	long nr;
>  
>  	/*
> +	 * Do some preliminary argument and permissions checking before we
> +	 * actually start allocating stuff
> +	 */
> +	if (clone_flags & CLONE_NEWUSER) {
> +		if (clone_flags & CLONE_THREAD)
> +			return -EINVAL;
> +		/* hopefully this check will go away when userns support is
> +		 * complete
> +		 */
> +		if (!capable(CAP_SYS_ADMIN))
> +			return -EPERM;
> +	}
> +
> +	/*
>  	 * We hope to recycle these flags after 2.6.26
>  	 */
>  	if (unlikely(clone_flags & CLONE_STOPPED)) {
> @@ -1556,8 +1570,7 @@ asmlinkage long sys_unshare(unsigned lon
>  	err = -EINVAL;
>  	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
>  				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
> -				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|
> -				CLONE_NEWNET))
> +				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET))
>  		goto bad_unshare_out;
>  
>  	/*
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 1d3ef29..63598dc 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -80,12 +80,6 @@ static struct nsproxy *create_new_namesp
>  		goto out_pid;
>  	}
>  
> -	new_nsp->user_ns = copy_user_ns(flags, tsk->nsproxy->user_ns);
> -	if (IS_ERR(new_nsp->user_ns)) {
> -		err = PTR_ERR(new_nsp->user_ns);
> -		goto out_user;
> -	}
> -
>  	new_nsp->net_ns = copy_net_ns(flags, tsk->nsproxy->net_ns);
>  	if (IS_ERR(new_nsp->net_ns)) {
>  		err = PTR_ERR(new_nsp->net_ns);
> @@ -95,9 +89,6 @@ static struct nsproxy *create_new_namesp
>  	return new_nsp;
>  
>  out_net:
> -	if (new_nsp->user_ns)
> -		put_user_ns(new_nsp->user_ns);
> -out_user:
>  	if (new_nsp->pid_ns)
>  		put_pid_ns(new_nsp->pid_ns);
>  out_pid:
> @@ -130,7 +121,7 @@ int copy_namespaces(unsigned long flags,
>  	get_nsproxy(old_ns);
>  
>  	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
> -				CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET)))
> +				CLONE_NEWPID | CLONE_NEWNET)))
>  		return 0;
>  
>  	if (!capable(CAP_SYS_ADMIN)) {
> @@ -173,8 +164,6 @@ void free_nsproxy(struct nsproxy *ns)
>  		put_ipc_ns(ns->ipc_ns);
>  	if (ns->pid_ns)
>  		put_pid_ns(ns->pid_ns);
> -	if (ns->user_ns)
> -		put_user_ns(ns->user_ns);
>  	put_net(ns->net_ns);
>  	kmem_cache_free(nsproxy_cachep, ns);
>  }
> @@ -189,7 +178,7 @@ int unshare_nsproxy_namespaces(unsigned 
>  	int err = 0;
>  
>  	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
> -			       CLONE_NEWUSER | CLONE_NEWNET)))
> +			       CLONE_NEWNET)))
>  		return 0;
>  
>  	if (!capable(CAP_SYS_ADMIN))
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 6bc3899..26eda31 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -565,13 +565,13 @@ static int set_user(struct cred *new)
>  {
>  	struct user_struct *new_user;
>  
> -	new_user = alloc_uid(current->nsproxy->user_ns, new->uid);
> +	new_user = alloc_uid(current_user()->user_ns, new->uid);
>  	if (!new_user)
>  		return -EAGAIN;
>  
>  	if (atomic_read(&new_user->processes) >=
>  				current->signal->rlim[RLIMIT_NPROC].rlim_cur &&
> -			new_user != current->nsproxy->user_ns->root_user) {
> +			new_user != INIT_USER) {
>  		free_uid(new_user);
>  		return -EAGAIN;
>  	}
> diff --git a/kernel/user.c b/kernel/user.c
> index d476307..c0ef3a4 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -20,9 +20,9 @@
>  
>  struct user_namespace init_user_ns = {
>  	.kref = {
> -		.refcount	= ATOMIC_INIT(2),
> +		.refcount	= ATOMIC_INIT(1),
>  	},
> -	.root_user = &root_user,
> +	.creator = &root_user,
>  };
>  EXPORT_SYMBOL_GPL(init_user_ns);
>  
> @@ -48,12 +48,14 @@ static struct kmem_cache *uid_cachep;
>   */
>  static DEFINE_SPINLOCK(uidhash_lock);
>  
> +/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->creator */
>  struct user_struct root_user = {
> -	.__count	= ATOMIC_INIT(1),
> +	.__count	= ATOMIC_INIT(2),
>  	.processes	= ATOMIC_INIT(1),
>  	.files		= ATOMIC_INIT(0),
>  	.sigpending	= ATOMIC_INIT(0),
>  	.locked_shm     = 0,
> +	.user_ns	= &init_user_ns,
>  #ifdef CONFIG_USER_SCHED
>  	.tg		= &init_task_group,
>  #endif
> @@ -314,12 +316,13 @@ done:
>   * IRQ state (as stored in flags) is restored and uidhash_lock released
>   * upon function exit.
>   */
> -static inline void free_user(struct user_struct *up, unsigned long flags)
> +static void free_user(struct user_struct *up, unsigned long flags)
>  {
>  	/* restore back the count */
>  	atomic_inc(&up->__count);
>  	spin_unlock_irqrestore(&uidhash_lock, flags);
>  
> +	put_user_ns(up->user_ns);
>  	INIT_WORK(&up->work, remove_user_sysfs_dir);
>  	schedule_work(&up->work);
>  }
> @@ -335,13 +338,14 @@ static inline void uids_mutex_unlock(voi
>   * IRQ state (as stored in flags) is restored and uidhash_lock released
>   * upon function exit.
>   */
> -static inline void free_user(struct user_struct *up, unsigned long flags)
> +static void free_user(struct user_struct *up, unsigned long flags)
>  {
>  	uid_hash_remove(up);
>  	spin_unlock_irqrestore(&uidhash_lock, flags);
>  	sched_destroy_user(up);
>  	key_put(up->uid_keyring);
>  	key_put(up->session_keyring);
> +	put_user_ns(up->user_ns);
>  	kmem_cache_free(uid_cachep, up);
>  }
>  
> @@ -357,7 +361,7 @@ struct user_struct *find_user(uid_t uid)
>  {
>  	struct user_struct *ret;
>  	unsigned long flags;
> -	struct user_namespace *ns = current->nsproxy->user_ns;
> +	struct user_namespace *ns = current_user()->user_ns;
>  
>  	spin_lock_irqsave(&uidhash_lock, flags);
>  	ret = uid_hash_find(uid, uidhashentry(ns, uid));
> @@ -404,6 +408,8 @@ struct user_struct *alloc_uid(struct use
>  		if (sched_create_user(new) < 0)
>  			goto out_free_user;
>  
> +		new->user_ns = get_user_ns(ns);
> +
>  		if (uids_user_create(new))
>  			goto out_destoy_sched;
>  
> @@ -427,7 +433,6 @@ struct user_struct *alloc_uid(struct use
>  			up = new;
>  		}
>  		spin_unlock_irq(&uidhash_lock);
> -
>  	}
>  
>  	uids_mutex_unlock();
> @@ -436,6 +441,7 @@ struct user_struct *alloc_uid(struct use
>  
>  out_destoy_sched:
>  	sched_destroy_user(new);
> +	put_user_ns(new->user_ns);
>  out_free_user:
>  	kmem_cache_free(uid_cachep, new);
>  out_unlock:
> @@ -443,33 +449,6 @@ out_unlock:
>  	return NULL;
>  }
>  
> -#ifdef CONFIG_USER_NS
> -void release_uids(struct user_namespace *ns)
> -{
> -	int i;
> -	unsigned long flags;
> -	struct hlist_head *head;
> -	struct hlist_node *nd;
> -
> -	spin_lock_irqsave(&uidhash_lock, flags);
> -	/*
> -	 * collapse the chains so that the user_struct-s will
> -	 * be still alive, but not in hashes. subsequent free_uid()
> -	 * will free them.
> -	 */
> -	for (i = 0; i < UIDHASH_SZ; i++) {
> -		head = ns->uidhash_table + i;
> -		while (!hlist_empty(head)) {
> -			nd = head->first;
> -			hlist_del_init(nd);
> -		}
> -	}
> -	spin_unlock_irqrestore(&uidhash_lock, flags);
> -
> -	free_uid(ns->root_user);
> -}
> -#endif
> -
>  static int __init uid_cache_init(void)
>  {
>  	int n;
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 0d9c51d..53b164b 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -9,70 +9,57 @@
>  #include <linux/nsproxy.h>
>  #include <linux/slab.h>
>  #include <linux/user_namespace.h>
> +#include <linux/cred.h>
>  
>  /*
> - * Clone a new ns copying an original user ns, setting refcount to 1
> - * @old_ns: namespace to clone
> - * Return NULL on error (failure to kmalloc), new ns otherwise
> + * Create a new user namespace, deriving the creator from the user in the
> + * passed credentials, and replacing that user with the new root user for the
> + * new namespace.
> + *
> + * This is called by copy_creds(), which will finish setting the target task's
> + * credentials.
>   */
> -static struct user_namespace *clone_user_ns(struct user_namespace *old_ns)
> +int create_user_ns(struct cred *new)
>  {
>  	struct user_namespace *ns;
> -	struct user_struct *new_user;
> -	struct cred *new;
> +	struct user_struct *root_user;
>  	int n;
>  
>  	ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL);
>  	if (!ns)
> -		return ERR_PTR(-ENOMEM);
> +		return -ENOMEM;
>  
>  	kref_init(&ns->kref);
>  
>  	for (n = 0; n < UIDHASH_SZ; ++n)
>  		INIT_HLIST_HEAD(ns->uidhash_table + n);
>  
> -	/* Insert new root user.  */
> -	ns->root_user = alloc_uid(ns, 0);
> -	if (!ns->root_user) {
> +	/* Alloc new root user.  */
> +	root_user = alloc_uid(ns, 0);
> +	if (!root_user) {
>  		kfree(ns);
> -		return ERR_PTR(-ENOMEM);
> +		return -ENOMEM;
>  	}
>  
> -	/* Reset current->user with a new one */
> -	new_user = alloc_uid(ns, current_uid());
> -	if (!new_user) {
> -		free_uid(ns->root_user);
> -		kfree(ns);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -
> -	/* Install the new user */
> -	new = prepare_creds();
> -	if (!new) {
> -		free_uid(new_user);
> -		free_uid(ns->root_user);
> -		kfree(ns);
> -	}
> -	free_uid(new->user);
> -	new->user = new_user;
> -	commit_creds(new);
> -	return ns;
> -}
> -
> -struct user_namespace * copy_user_ns(int flags, struct user_namespace *old_ns)
> -{
> -	struct user_namespace *new_ns;
> -
> -	BUG_ON(!old_ns);
> -	get_user_ns(old_ns);
> -
> -	if (!(flags & CLONE_NEWUSER))
> -		return old_ns;
> +	/* set the new root user in the credentials under preparation */
> +	ns->creator = new->user;
> +	new->user = root_user;
> +	new->uid = new->euid = new->suid = new->fsuid = 0;
> +	new->gid = new->egid = new->sgid = new->fsgid = 0;
> +	put_group_info(new->group_info);
> +	new->group_info = get_group_info(&init_groups);
> +	key_put(new->thread_keyring);
> +#ifdef CONFIG_KEYS
> +	new->thread_keyring = NULL;
> +	key_put(new->request_key_auth);
> +	new->request_key_auth = NULL;
> +#endif
> +	/* tgcred will be cleared in our caller bc CLONE_THREAD won't be set */
>  
> -	new_ns = clone_user_ns(old_ns);
> +	/* alloc_uid() incremented the userns refcount.  Just set it to 1 */
> +	kref_set(&ns->kref, 1);
>  
> -	put_user_ns(old_ns);
> -	return new_ns;
> +	return 0;
>  }
>  
>  void free_user_ns(struct kref *kref)
> @@ -80,7 +67,7 @@ void free_user_ns(struct kref *kref)
>  	struct user_namespace *ns;
>  
>  	ns = container_of(kref, struct user_namespace, kref);
> -	release_uids(ns);
> +	free_uid(ns->creator);
>  	kfree(ns);
>  }
>  EXPORT_SYMBOL(free_user_ns);
> -- 
> 1.1.6
> 

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] User namespaces: set of cleanups (eventually for linux-next?)
  2008-10-16 22:46 [PATCH] User namespaces: set of cleanups (eventually for linux-next?) Serge E. Hallyn
  2008-10-16 23:17 ` James Morris
@ 2008-10-16 23:54 ` David Howells
  2008-10-20 18:51   ` Serge E. Hallyn
  1 sibling, 1 reply; 4+ messages in thread
From: David Howells @ 2008-10-16 23:54 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: dhowells, James Morris, Stephen Rothwell, lkml,
	Eric W. Biederman, Michael Halcrow


Serge E. Hallyn <serue@us.ibm.com> wrote:

> David, since this consumes your patch, I wasn't sure whether it was
> appropriate to put your signed-off-by on here or not.  I decided
> doing so was the worse of the potential offenses...

The way I think I'd've done it is to include my patch description and
signed-off-by then list your additional changes (just keep on incrementing the
point numbers) and your signed-off-by.

> -	new_user = alloc_uid(current->nsproxy->user_ns, new->uid);
> +	new_user = alloc_uid(current_user()->user_ns, new->uid);

That should be current_user_ns() rather than current_user()->user_ns.  I made
this change before adding the macro.

> +	key_put(new->thread_keyring);
> +	new->thread_keyring = NULL;

Superfluous.  copy_creds() does this immediately upon return.

Also, in copy_creds(), should the session and process keyrings be discarded if
CLONE_NEWUSER is set?  Actually, I think that should be dealt with by a patch
to deal with namespacing keyrings as the user-default keyrings need to be
namespaced rather than here.

David

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] User namespaces: set of cleanups (eventually for linux-next?)
  2008-10-16 23:54 ` David Howells
@ 2008-10-20 18:51   ` Serge E. Hallyn
  0 siblings, 0 replies; 4+ messages in thread
From: Serge E. Hallyn @ 2008-10-20 18:51 UTC (permalink / raw)
  To: David Howells
  Cc: James Morris, Stephen Rothwell, lkml, Eric W. Biederman, Michael Halcrow

Quoting David Howells (dhowells@redhat.com):
> 
> Serge E. Hallyn <serue@us.ibm.com> wrote:
> 
> > David, since this consumes your patch, I wasn't sure whether it was
> > appropriate to put your signed-off-by on here or not.  I decided
> > doing so was the worse of the potential offenses...
> 
> The way I think I'd've done it is to include my patch description and
> signed-off-by then list your additional changes (just keep on incrementing the
> point numbers) and your signed-off-by.
> 
> > -	new_user = alloc_uid(current->nsproxy->user_ns, new->uid);
> > +	new_user = alloc_uid(current_user()->user_ns, new->uid);
> 
> That should be current_user_ns() rather than current_user()->user_ns.  I made
> this change before adding the macro.

Ok, will change.

> > +	key_put(new->thread_keyring);
> > +	new->thread_keyring = NULL;
> 
> Superfluous.  copy_creds() does this immediately upon return.

Oops, will drop.

> Also, in copy_creds(), should the session and process keyrings be discarded if
> CLONE_NEWUSER is set?  Actually, I think that should be dealt with by a patch
> to deal with namespacing keyrings as the user-default keyrings need to be
> namespaced rather than here.
> 
> David

Oh, yeah, that's going to have to be a separate patch I guess...

thanks,
-serge

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-10-20 18:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-16 22:46 [PATCH] User namespaces: set of cleanups (eventually for linux-next?) Serge E. Hallyn
2008-10-16 23:17 ` James Morris
2008-10-16 23:54 ` David Howells
2008-10-20 18:51   ` Serge E. Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).