LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] capabilites, take 2
@ 2004-05-13 20:08 Andy Lutomirski
  2004-05-14  1:20 ` Chris Wright
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-13 20:08 UTC (permalink / raw)
  To: akpm, chrisw, linux-kernel; +Cc: luto

This implements working capabilities.

Changes from the previous version:

- It's now just one patch (the split doesn't make sense anymore)
- Cleaned up cap_bprm_apply_creds
- Restrictions on setuid are somewhat stricter now
- Rediffed against 2.6.6-mm2 (didn't change anything)
- bprm is initialized correctly in fs/exec.c
- printk to remind users that the patch is enabled
- LSM builds correctly.  Untested.

The whole thing is now a module parameter -- specify
commoncap.newcaps=1 if capabilities is built-in or newcaps=1 in
modprobe or insmod if not.

Known issues:

1. setxuid emulation is wrong (keepcaps doesn't work for setre(s)uid).
   I haven't fixed this because I don't what to change the semantics
   of PR_SET_KEEPCAPS.  I'll probably add PR_SET_REALLY_KEEPCAPS
   to really keep capabilities.

2. When newcaps=0 (the default), linuxrc gets all capabilities.  This
   is different from the old behavior.  Init and programs started from
   linuxrc still match the old behavior.  I couldn't think of any
   remotely clean way around that without breaking linuxrc for
   newcaps=1

3. I haven't tried it, but I imagine that unloading commoncap and then
   reloading it in a different mode may do unexpected things.  So read
   the code before you try that.  I see no reason to fix this one because
   the whole thing should go away eventually.

When newcaps=1, this extends the LSM interface:
bprm->secflags & BINPRM_SEC_NO_ELEVATE means that privileges should not
be elevated.  So stacking modules (e.g. selinux) should set this before
calling to the lower module and test it again before deciding whether
to elevate privileges.  That way, either all privs are elevated or none
are.

I also added a flag (BINPRM_SEC_SECUREEXEC) with the obvious meaning.
Otherwise cap_bprm_secureexec would have been a mess.

--Andy


Implement optional working capability support.  Try to avoid giving Andrew
a heart attack. ;)

 fs/exec.c                  |   16 ++++
 include/linux/binfmts.h    |    8 ++
 include/linux/capability.h |    6 +
 include/linux/init_task.h  |    4 -
 kernel/capability.c        |    2 
 security/commoncap.c       |  155 +++++++++++++++++++++++++++++++++++++++++----
 6 files changed, 172 insertions(+), 19 deletions(-)

--- linux-2.6.6-mm2/fs/exec.c~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/fs/exec.c	2004-05-13 12:15:20.000000000 -0700
@@ -882,8 +882,10 @@
 
 	if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
 		/* Set-uid? */
-		if (mode & S_ISUID)
+		if (mode & S_ISUID) {
 			bprm->e_uid = inode->i_uid;
+			bprm->secflags |= BINPRM_SEC_SETUID;
+		}
 
 		/* Set-gid? */
 		/*
@@ -891,10 +893,19 @@
 		 * is a candidate for mandatory locking, not a setgid
 		 * executable.
 		 */
-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 			bprm->e_gid = inode->i_gid;
+			bprm->secflags |= BINPRM_SEC_SETGID;
+		}
 	}
 
+	/* Pretend we have VFS capabilities */
+	cap_set_full(bprm->cap_inheritable);
+	if((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
+		cap_set_full(bprm->cap_permitted);
+	else
+		cap_clear(bprm->cap_permitted);
+
 	/* fill in binprm security blob */
 	retval = security_bprm_set(bprm);
 	if (retval)
@@ -1089,6 +1100,7 @@
 	bprm.loader = 0;
 	bprm.exec = 0;
 	bprm.security = NULL;
+	bprm.secflags = 0;
 	bprm.mm = mm_alloc();
 	retval = -ENOMEM;
 	if (!bprm.mm)
--- linux-2.6.6-mm2/security/commoncap.c~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/security/commoncap.c	2004-05-13 12:59:32.934690092 -0700
@@ -24,6 +24,11 @@
 #include <linux/xattr.h>
 #include <linux/hugetlb.h>
 
+int newcaps = 0;
+
+module_param(newcaps, int, 444);
+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
+
 int cap_capable (struct task_struct *tsk, int cap)
 {
 	/* Derived from include/linux/sched.h:capable. */
@@ -36,6 +41,11 @@
 int cap_ptrace (struct task_struct *parent, struct task_struct *child)
 {
 	/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
+	/* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
+	if (newcaps &&
+	    !cap_issubset (child->cap_inheritable, current->cap_inheritable))
+		return -EPERM;
+
 	if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
 	    !capable (CAP_SYS_PTRACE))
 		return -EPERM;
@@ -76,6 +86,11 @@
 		return -EPERM;
 	}
 
+	/* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
+	if (newcaps && !cap_issubset (*permitted, *inheritable)) {
+		return -EPERM;
+	}
+
 	return 0;
 }
 
@@ -89,6 +104,8 @@
 
 int cap_bprm_set_security (struct linux_binprm *bprm)
 {
+	if (newcaps) return 0;
+
 	/* Copied from fs/exec.c:prepare_binprm. */
 
 	/* We don't have VFS support for capabilities yet */
@@ -115,10 +132,11 @@
 	return 0;
 }
 
-void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
 {
-	/* Derived from fs/exec.c:compute_creds. */
+	/* This function will hopefully die in 2.7. */
 	kernel_cap_t new_permitted, working;
+	static int fixed_init = 0;
 
 	new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
 	working = cap_intersect (bprm->cap_inheritable,
@@ -151,6 +169,15 @@
 		current->cap_permitted = new_permitted;
 		current->cap_effective =
 		    cap_intersect (new_permitted, bprm->cap_effective);
+	} else if (!fixed_init) {
+		/* This is not strictly correct, as it gives linuxrc more
+		 * permissions than it used to have.  It was the only way I
+		 * could think of to keep the resulting disaster contained,
+		 * though.
+		 */
+		current->cap_effective = CAP_OLD_INIT_EFF_SET;
+		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
+		fixed_init = 1;
 	}
 
 	/* AUD: Audit candidate if current->cap_effective is set */
@@ -158,15 +185,103 @@
 	current->keep_capabilities = 0;
 }
 
+/*
+ * The rules of Linux capabilities (not POSIX!)
+ *
+ * What the masks mean:
+ *  pI = capabilities that this process or its children may have
+ *  pP = capabilities that this process has
+ *  pE = capabilities that this process has and are enabled
+ * (so pE <= pP <= pI)
+ *
+ * The capability evolution rules are:
+ *
+ *  pI' = pI & fI
+ *  pP' = ((fP & cap_bset) | pP) & pI' & Y
+ *  pE' = (setuid ? pP' : (pE & pP'))
+ *
+ *  X = cap_bset
+ *  Y is zero if uid!=0, euid==0, and setuid non-root
+ *
+ * Caveat: if (fP & ~pI'), there is no _theoretical_ problem, but
+ * this could introduce exploits in buggy programs.  Since programs
+ * that aren't capability-aware are insecure _anyway_ if pP!=0, this
+ * is OK.
+ *
+ * To allow pI != ~0 to be secure in the presence of buggy programs,
+ * we require full pI for setuid.
+ *
+ * The moral is that, if file capabilities are introduced, programs
+ * that are granted fP > 0 need to be aware of how to deal with it.
+ * Because the effective set is left alone on non-setuid fP>0,
+ * such a program should drop capabilities that were not in its initial
+ * effective set before running untrusted code.
+ *
+ */
+void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+{
+	kernel_cap_t new_pI, new_pP;
+	kernel_cap_t fI, fP;
+	int is_setuid, is_setgid;
+
+	if(!newcaps) {
+		cap_bprm_apply_creds_compat(bprm, unsafe);
+		return;
+	}
+
+	fI = bprm->cap_inheritable;
+	fP = bprm->cap_permitted;
+	is_setuid = (bprm->secflags & BINPRM_SEC_SETUID);
+	is_setgid = (bprm->secflags & BINPRM_SEC_SETGID);
+
+	new_pI = cap_intersect(current->cap_inheritable, fI);
+
+	/* Check that it's safe to elevate privileges */
+	if (unsafe & ~LSM_UNSAFE_PTRACE_CAP)
+		bprm->secflags |= BINPRM_SEC_NOELEVATE;
+
+	/* FIXME: Is this overly harsh on setgid? */
+	if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
+	    new_pI != CAP_FULL_SET)
+			bprm->secflags |= BINPRM_SEC_NOELEVATE;
+
+	if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
+		is_setuid = is_setgid = 0;
+		cap_clear(fP);
+	}
+
+	new_pP = cap_intersect(fP, cap_bset);
+	new_pP = cap_combine(new_pP, current->cap_permitted);
+	cap_mask(new_pP, new_pI);
+
+	/* setuid-nonroot is special. */
+	if (is_setuid && bprm->e_uid != 0 && current->uid != 0 &&
+	    current->euid == 0)
+		cap_clear(new_pP);
+
+	if(!cap_issubset(new_pP, current->cap_permitted))
+		bprm->secflags |= BINPRM_SEC_SECUREEXEC;
+
+	/* Apply new security state */
+	if (is_setuid) {
+	        current->suid = current->euid = current->fsuid = bprm->e_uid;
+		current->cap_effective = new_pP;
+	}
+	if (is_setgid)
+	        current->sgid = current->egid = current->fsgid = bprm->e_gid;
+
+	current->cap_inheritable = new_pI;
+	current->cap_permitted = new_pP;
+	cap_mask(current->cap_effective, new_pP);
+
+	current->keep_capabilities = 0;
+}
+
 int cap_bprm_secureexec (struct linux_binprm *bprm)
 {
-	/* If/when this module is enhanced to incorporate capability
-	   bits on files, the test below should be extended to also perform a 
-	   test between the old and new capability sets.  For now,
-	   it simply preserves the legacy decision algorithm used by
-	   the old userland. */
 	return (current->euid != current->uid ||
-		current->egid != current->gid);
+		current->egid != current->gid ||
+		(bprm->secflags & BINPRM_SEC_SECUREEXEC));
 }
 
 int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,
@@ -280,9 +395,15 @@
 
 void cap_task_reparent_to_init (struct task_struct *p)
 {
-	p->cap_effective = CAP_INIT_EFF_SET;
-	p->cap_inheritable = CAP_INIT_INH_SET;
-	p->cap_permitted = CAP_FULL_SET;
+	if (newcaps) {
+		cap_set_full(p->cap_inheritable);
+		cap_set_full(p->cap_permitted);
+		cap_set_full(p->cap_effective);
+	} else {
+		p->cap_effective = CAP_OLD_INIT_EFF_SET;
+		p->cap_inheritable = CAP_OLD_INIT_INH_SET;
+		p->cap_permitted = CAP_FULL_SET;
+	}
 	p->keep_capabilities = 0;
 	return;
 }
@@ -367,6 +488,16 @@
 	return -ENOMEM;
 }
 
+static int __init commoncap_init (void)
+{
+	if (newcaps) {
+		printk(KERN_NOTICE "Experimental capability support is on\n");
+		cap_bset = CAP_FULL_SET;
+	}
+
+	return 0;
+}
+
 EXPORT_SYMBOL(cap_capable);
 EXPORT_SYMBOL(cap_ptrace);
 EXPORT_SYMBOL(cap_capget);
@@ -382,5 +513,7 @@
 EXPORT_SYMBOL(cap_syslog);
 EXPORT_SYMBOL(cap_vm_enough_memory);
 
+module_init(commoncap_init);
+
 MODULE_DESCRIPTION("Standard Linux Common Capabilities Security Module");
 MODULE_LICENSE("GPL");
--- linux-2.6.6-mm2/kernel/capability.c~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/kernel/capability.c	2004-05-13 11:42:51.000000000 -0700
@@ -13,7 +13,7 @@
 #include <asm/uaccess.h>
 
 unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */
-kernel_cap_t cap_bset = CAP_INIT_EFF_SET;
+kernel_cap_t cap_bset = CAP_OLD_INIT_EFF_SET;
 int sysctl_mlock_group;
 
 EXPORT_SYMBOL(securebits);
--- linux-2.6.6-mm2/include/linux/capability.h~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/capability.h	2004-05-13 11:42:51.000000000 -0700
@@ -308,8 +308,10 @@
 
 #define CAP_EMPTY_SET       to_cap_t(0)
 #define CAP_FULL_SET        to_cap_t(~0)
-#define CAP_INIT_EFF_SET    to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
-#define CAP_INIT_INH_SET    to_cap_t(0)
+
+/* For old-style capabilities, we use these. */
+#define CAP_OLD_INIT_EFF_SET    to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
+#define CAP_OLD_INIT_INH_SET    to_cap_t(0)
 
 #define CAP_TO_MASK(x) (1 << (x))
 #define cap_raise(c, flag)   (cap_t(c) |=  CAP_TO_MASK(flag))
--- linux-2.6.6-mm2/include/linux/binfmts.h~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/binfmts.h	2004-05-13 11:44:02.000000000 -0700
@@ -20,6 +20,10 @@
 /*
  * This structure is used to hold the arguments that are used when loading binaries.
  */
+#define BINPRM_SEC_SETUID	1
+#define BINPRM_SEC_SETGID	2
+#define BINPRM_SEC_SECUREEXEC	4
+#define BINPRM_SEC_NOELEVATE	8
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
 	struct page *page[MAX_ARG_PAGES];
@@ -28,7 +32,9 @@
 	int sh_bang;
 	struct file * file;
 	int e_uid, e_gid;
-	kernel_cap_t cap_inheritable, cap_permitted, cap_effective;
+	int secflags;
+	kernel_cap_t cap_inheritable, cap_permitted;
+	kernel_cap_t cap_effective; /* old caps -- do NOT use in new code */
 	void *security;
 	int argc, envc;
 	char * filename;	/* Name of binary as seen by procps */
--- linux-2.6.6-mm2/include/linux/init_task.h~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/init_task.h	2004-05-13 11:42:51.000000000 -0700
@@ -92,8 +92,8 @@
 		.function	= it_real_fn				\
 	},								\
 	.group_info	= &init_groups,					\
-	.cap_effective	= CAP_INIT_EFF_SET,				\
-	.cap_inheritable = CAP_INIT_INH_SET,				\
+	.cap_effective	= CAP_FULL_SET,				\
+	.cap_inheritable = CAP_FULL_SET,				\
 	.cap_permitted	= CAP_FULL_SET,					\
 	.keep_capabilities = 0,						\
 	.rlim		= INIT_RLIMITS,					\


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-13 20:08 [PATCH] capabilites, take 2 Andy Lutomirski
@ 2004-05-14  1:20 ` Chris Wright
  2004-05-14  1:35   ` Valdis.Kletnieks
                     ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Chris Wright @ 2004-05-14  1:20 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: akpm, chrisw, linux-kernel

* Andy Lutomirski (luto@myrealbox.com) wrote:
> Implement optional working capability support.  Try to avoid giving Andrew
> a heart attack. ;)

I think it still needs more work.  Default behavoiur is changed, like
Inheritble is full rather than clear, setpcap is enabled, etc.  Also,
why do you change from Posix the way exec() updates capabilities?  Sure,
there is no filesystem bits present, so this changes the calculation,
but I'm not convinced it's as secure this way.  At least with newcaps=0.

I believe we can get something functional with fewer changes, hence
easier to understand the ramifications.  In a nutshell, I'm still not
comfortable with this.

Also, it breaks my tests which try to drop privs and keep caps across
execve() which is really the only issue we're trying to solve ATM.


> --- linux-2.6.6-mm2/fs/exec.c~caps	2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/fs/exec.c	2004-05-13 12:15:20.000000000 -0700
> @@ -882,8 +882,10 @@
>  
>  	if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
>  		/* Set-uid? */
> -		if (mode & S_ISUID)
> +		if (mode & S_ISUID) {
>  			bprm->e_uid = inode->i_uid;
> +			bprm->secflags |= BINPRM_SEC_SETUID;
> +		}
>  
>  		/* Set-gid? */
>  		/*
> @@ -891,10 +893,19 @@
>  		 * is a candidate for mandatory locking, not a setgid
>  		 * executable.
>  		 */
> -		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> +		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>  			bprm->e_gid = inode->i_gid;
> +			bprm->secflags |= BINPRM_SEC_SETGID;
> +		}
>  	}
>  
> +	/* Pretend we have VFS capabilities */
> +	cap_set_full(bprm->cap_inheritable);

This looks sketchy.

> +	if((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)

CodingStyle:  add space after keyword 'if'
	if ((...))

> +		cap_set_full(bprm->cap_permitted);
> +	else
> +		cap_clear(bprm->cap_permitted);
> +
>  	/* fill in binprm security blob */
>  	retval = security_bprm_set(bprm);
>  	if (retval)
> @@ -1089,6 +1100,7 @@
>  	bprm.loader = 0;
>  	bprm.exec = 0;
>  	bprm.security = NULL;
> +	bprm.secflags = 0;
>  	bprm.mm = mm_alloc();
>  	retval = -ENOMEM;
>  	if (!bprm.mm)
> --- linux-2.6.6-mm2/security/commoncap.c~caps	2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/security/commoncap.c	2004-05-13 12:59:32.934690092 -0700
> @@ -24,6 +24,11 @@
>  #include <linux/xattr.h>
>  #include <linux/hugetlb.h>
>  
> +int newcaps = 0;

make this:
  static int newcaps;

> +
> +module_param(newcaps, int, 444);
> +MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
> +
>  int cap_capable (struct task_struct *tsk, int cap)
>  {
>  	/* Derived from include/linux/sched.h:capable. */
> @@ -36,6 +41,11 @@
>  int cap_ptrace (struct task_struct *parent, struct task_struct *child)
>  {
>  	/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
> +	/* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
> +	if (newcaps &&
> +	    !cap_issubset (child->cap_inheritable, current->cap_inheritable))
> +		return -EPERM;

Why no capable() override?  In fact, is this check really necessary?

> +
>  	if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
>  	    !capable (CAP_SYS_PTRACE))
>  		return -EPERM;
> @@ -76,6 +86,11 @@
>  		return -EPERM;
>  	}
>  
> +	/* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
> +	if (newcaps && !cap_issubset (*permitted, *inheritable)) {
> +		return -EPERM;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -89,6 +104,8 @@
>  
>  int cap_bprm_set_security (struct linux_binprm *bprm)
>  {
> +	if (newcaps) return 0;

CodingStyle:
	if (newcaps)
		return 0;

> +
>  	/* Copied from fs/exec.c:prepare_binprm. */
>  
>  	/* We don't have VFS support for capabilities yet */
> @@ -115,10 +132,11 @@
>  	return 0;
>  }
>  
> -void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
> +static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
>  {
> -	/* Derived from fs/exec.c:compute_creds. */
> +	/* This function will hopefully die in 2.7. */
>  	kernel_cap_t new_permitted, working;
> +	static int fixed_init = 0;
>  
>  	new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
>  	working = cap_intersect (bprm->cap_inheritable,
> @@ -151,6 +169,15 @@
>  		current->cap_permitted = new_permitted;
>  		current->cap_effective =
>  		    cap_intersect (new_permitted, bprm->cap_effective);
> +	} else if (!fixed_init) {
> +		/* This is not strictly correct, as it gives linuxrc more
> +		 * permissions than it used to have.  It was the only way I
> +		 * could think of to keep the resulting disaster contained,
> +		 * though.
> +		 */
> +		current->cap_effective = CAP_OLD_INIT_EFF_SET;
> +		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
> +		fixed_init = 1;

Hrm...

>  	}
>  
>  	/* AUD: Audit candidate if current->cap_effective is set */
> @@ -158,15 +185,103 @@
>  	current->keep_capabilities = 0;
>  }
>  
> +/*
> + * The rules of Linux capabilities (not POSIX!)
> + *
> + * What the masks mean:
> + *  pI = capabilities that this process or its children may have
> + *  pP = capabilities that this process has
> + *  pE = capabilities that this process has and are enabled
> + * (so pE <= pP <= pI)
> + *
> + * The capability evolution rules are:
> + *
> + *  pI' = pI & fI
> + *  pP' = ((fP & cap_bset) | pP) & pI' & Y
> + *  pE' = (setuid ? pP' : (pE & pP'))
> + *
> + *  X = cap_bset
> + *  Y is zero if uid!=0, euid==0, and setuid non-root
> + *
> + * Caveat: if (fP & ~pI'), there is no _theoretical_ problem, but
> + * this could introduce exploits in buggy programs.  Since programs
> + * that aren't capability-aware are insecure _anyway_ if pP!=0, this
> + * is OK.
> + *
> + * To allow pI != ~0 to be secure in the presence of buggy programs,
> + * we require full pI for setuid.
> + *
> + * The moral is that, if file capabilities are introduced, programs
> + * that are granted fP > 0 need to be aware of how to deal with it.
> + * Because the effective set is left alone on non-setuid fP>0,
> + * such a program should drop capabilities that were not in its initial
> + * effective set before running untrusted code.
> + *
> + */
> +void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
> +{
> +	kernel_cap_t new_pI, new_pP;
> +	kernel_cap_t fI, fP;
> +	int is_setuid, is_setgid;
> +
> +	if(!newcaps) {
> +		cap_bprm_apply_creds_compat(bprm, unsafe);
> +		return;
> +	}
> +
> +	fI = bprm->cap_inheritable;
> +	fP = bprm->cap_permitted;
> +	is_setuid = (bprm->secflags & BINPRM_SEC_SETUID);
> +	is_setgid = (bprm->secflags & BINPRM_SEC_SETGID);
> +
> +	new_pI = cap_intersect(current->cap_inheritable, fI);
> +
> +	/* Check that it's safe to elevate privileges */
> +	if (unsafe & ~LSM_UNSAFE_PTRACE_CAP)
> +		bprm->secflags |= BINPRM_SEC_NOELEVATE;
> +
> +	/* FIXME: Is this overly harsh on setgid? */
> +	if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
> +	    new_pI != CAP_FULL_SET)
> +			bprm->secflags |= BINPRM_SEC_NOELEVATE;
> +
> +	if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
> +		is_setuid = is_setgid = 0;
> +		cap_clear(fP);
> +	}
> +
> +	new_pP = cap_intersect(fP, cap_bset);
> +	new_pP = cap_combine(new_pP, current->cap_permitted);
> +	cap_mask(new_pP, new_pI);
> +
> +	/* setuid-nonroot is special. */
> +	if (is_setuid && bprm->e_uid != 0 && current->uid != 0 &&
> +	    current->euid == 0)
> +		cap_clear(new_pP);
> +
> +	if(!cap_issubset(new_pP, current->cap_permitted))
> +		bprm->secflags |= BINPRM_SEC_SECUREEXEC;
> +
> +	/* Apply new security state */
> +	if (is_setuid) {
> +	        current->suid = current->euid = current->fsuid = bprm->e_uid;
> +		current->cap_effective = new_pP;
> +	}
> +	if (is_setgid)
> +	        current->sgid = current->egid = current->fsgid = bprm->e_gid;
> +
> +	current->cap_inheritable = new_pI;
> +	current->cap_permitted = new_pP;
> +	cap_mask(current->cap_effective, new_pP);
> +
> +	current->keep_capabilities = 0;
> +}
> +
>  int cap_bprm_secureexec (struct linux_binprm *bprm)
>  {
> -	/* If/when this module is enhanced to incorporate capability
> -	   bits on files, the test below should be extended to also perform a 
> -	   test between the old and new capability sets.  For now,
> -	   it simply preserves the legacy decision algorithm used by
> -	   the old userland. */
>  	return (current->euid != current->uid ||
> -		current->egid != current->gid);
> +		current->egid != current->gid ||
> +		(bprm->secflags & BINPRM_SEC_SECUREEXEC));
>  }
>  
>  int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,
> @@ -280,9 +395,15 @@
>  
>  void cap_task_reparent_to_init (struct task_struct *p)
>  {
> -	p->cap_effective = CAP_INIT_EFF_SET;
> -	p->cap_inheritable = CAP_INIT_INH_SET;
> -	p->cap_permitted = CAP_FULL_SET;
> +	if (newcaps) {
> +		cap_set_full(p->cap_inheritable);
> +		cap_set_full(p->cap_permitted);
> +		cap_set_full(p->cap_effective);
> +	} else {
> +		p->cap_effective = CAP_OLD_INIT_EFF_SET;
> +		p->cap_inheritable = CAP_OLD_INIT_INH_SET;
> +		p->cap_permitted = CAP_FULL_SET;
> +	}
>  	p->keep_capabilities = 0;
>  	return;
>  }
> @@ -367,6 +488,16 @@
>  	return -ENOMEM;
>  }
>  
> +static int __init commoncap_init (void)
> +{
> +	if (newcaps) {
> +		printk(KERN_NOTICE "Experimental capability support is on\n");
> +		cap_bset = CAP_FULL_SET;
> +	}
> +
> +	return 0;
> +}
> +
>  EXPORT_SYMBOL(cap_capable);
>  EXPORT_SYMBOL(cap_ptrace);
>  EXPORT_SYMBOL(cap_capget);
> @@ -382,5 +513,7 @@
>  EXPORT_SYMBOL(cap_syslog);
>  EXPORT_SYMBOL(cap_vm_enough_memory);
>  
> +module_init(commoncap_init);
> +
>  MODULE_DESCRIPTION("Standard Linux Common Capabilities Security Module");
>  MODULE_LICENSE("GPL");
> --- linux-2.6.6-mm2/kernel/capability.c~caps	2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/kernel/capability.c	2004-05-13 11:42:51.000000000 -0700
> @@ -13,7 +13,7 @@
>  #include <asm/uaccess.h>
>  
>  unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */
> -kernel_cap_t cap_bset = CAP_INIT_EFF_SET;
> +kernel_cap_t cap_bset = CAP_OLD_INIT_EFF_SET;
>  int sysctl_mlock_group;
>  
>  EXPORT_SYMBOL(securebits);
> --- linux-2.6.6-mm2/include/linux/capability.h~caps	2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/include/linux/capability.h	2004-05-13 11:42:51.000000000 -0700
> @@ -308,8 +308,10 @@
>  
>  #define CAP_EMPTY_SET       to_cap_t(0)
>  #define CAP_FULL_SET        to_cap_t(~0)
> -#define CAP_INIT_EFF_SET    to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
> -#define CAP_INIT_INH_SET    to_cap_t(0)
> +
> +/* For old-style capabilities, we use these. */
> +#define CAP_OLD_INIT_EFF_SET    to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
> +#define CAP_OLD_INIT_INH_SET    to_cap_t(0)
>  
>  #define CAP_TO_MASK(x) (1 << (x))
>  #define cap_raise(c, flag)   (cap_t(c) |=  CAP_TO_MASK(flag))
> --- linux-2.6.6-mm2/include/linux/binfmts.h~caps	2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/include/linux/binfmts.h	2004-05-13 11:44:02.000000000 -0700
> @@ -20,6 +20,10 @@
>  /*
>   * This structure is used to hold the arguments that are used when loading binaries.
>   */
> +#define BINPRM_SEC_SETUID	1
> +#define BINPRM_SEC_SETGID	2
> +#define BINPRM_SEC_SECUREEXEC	4
> +#define BINPRM_SEC_NOELEVATE	8
>  struct linux_binprm{
>  	char buf[BINPRM_BUF_SIZE];
>  	struct page *page[MAX_ARG_PAGES];
> @@ -28,7 +32,9 @@
>  	int sh_bang;
>  	struct file * file;
>  	int e_uid, e_gid;
> -	kernel_cap_t cap_inheritable, cap_permitted, cap_effective;
> +	int secflags;
> +	kernel_cap_t cap_inheritable, cap_permitted;
> +	kernel_cap_t cap_effective; /* old caps -- do NOT use in new code */
>  	void *security;
>  	int argc, envc;
>  	char * filename;	/* Name of binary as seen by procps */
> --- linux-2.6.6-mm2/include/linux/init_task.h~caps	2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/include/linux/init_task.h	2004-05-13 11:42:51.000000000 -0700
> @@ -92,8 +92,8 @@
>  		.function	= it_real_fn				\
>  	},								\
>  	.group_info	= &init_groups,					\
> -	.cap_effective	= CAP_INIT_EFF_SET,				\
> -	.cap_inheritable = CAP_INIT_INH_SET,				\
> +	.cap_effective	= CAP_FULL_SET,				\
> +	.cap_inheritable = CAP_FULL_SET,				\

This was made unconditional.  And how are you convinced it's safe?

>  	.cap_permitted	= CAP_FULL_SET,					\
>  	.keep_capabilities = 0,						\
>  	.rlim		= INIT_RLIMITS,					\


thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2 
  2004-05-14  1:20 ` Chris Wright
@ 2004-05-14  1:35   ` Valdis.Kletnieks
  2004-05-14  4:51     ` Chris Wright
  2004-05-14  5:33     ` Olaf Dietsche
  2004-05-14  2:27   ` Andy Lutomirski
  2004-05-14  2:45   ` [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2) Andy Lutomirski
  2 siblings, 2 replies; 36+ messages in thread
From: Valdis.Kletnieks @ 2004-05-14  1:35 UTC (permalink / raw)
  To: Chris Wright; +Cc: Andy Lutomirski, akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1157 bytes --]

On Thu, 13 May 2004 18:20:10 PDT, Chris Wright said:

> I think it still needs more work.  Default behavoiur is changed, like
> Inheritble is full rather than clear, setpcap is enabled, etc.  Also,
> why do you change from Posix the way exec() updates capabilities?  Sure,
> there is no filesystem bits present, so this changes the calculation,
> but I'm not convinced it's as secure this way.  At least with newcaps=0.

The last time the "capabilities" thread reared its head a while ago, Andy made
a posting that pretty conclusively showed that the Posix way was totally b0rken
if you ever intended to support filesystem bits.  So if you wanted to ever have
a snowball's chance of supporting something like:

chcap cap_net_raw+ep /bin/ping

so you could get rid of the set-UID bit on 'ping', you had to toss the Posix
propogation rules out the window.  So we need to do either:

1) Toss the Posix out the window
2) Toss all the filesystems capabilities support out the window.

(I'm assuming that a suggestion that we make the choice a Kconfig option will
be met with the sound of many kernel hackers either retching in disgust or
screaming in horror ;)


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  1:20 ` Chris Wright
  2004-05-14  1:35   ` Valdis.Kletnieks
@ 2004-05-14  2:27   ` Andy Lutomirski
  2004-05-14  4:48     ` Chris Wright
  2004-05-14  6:39     ` Olaf Dietsche
  2004-05-14  2:45   ` [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2) Andy Lutomirski
  2 siblings, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-14  2:27 UTC (permalink / raw)
  To: Chris Wright; +Cc: akpm, linux-kernel

Chris Wright wrote:

> * Andy Lutomirski (luto@myrealbox.com) wrote:
> 
>>Implement optional working capability support.  Try to avoid giving Andrew
>>a heart attack. ;)
> 
> 
> I think it still needs more work.  Default behavoiur is changed, like
> Inheritble is full rather than clear, setpcap is enabled, etc.  ...

In cap_bprm_apply_creds_compat:

+	} else if (!fixed_init) {
+		/* This is not strictly correct, as it gives linuxrc more
+		 * permissions than it used to have.  It was the only way I
+		 * could think of to keep the resulting disaster contained,
+		 * though.
+		 */
+		current->cap_effective = CAP_OLD_INIT_EFF_SET;
+		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
+		fixed_init = 1;

So that it gets changed back.  Otherwise linuxrc ran without permissions
and my drives never got mounted.  Yah, it's ugly -- I'm open to
suggestions to avoid this.

 > ... Also,
> why do you change from Posix the way exec() updates capabilities?  Sure,
> there is no filesystem bits present, so this changes the calculation,
> but I'm not convinced it's as secure this way.  At least with newcaps=0.

I'm not convinced that Posix's version makes any sense.  Also, there are
apparently a number of drafts around which disagree on what the right
rules are.  (My copy, for example, matches the old rules exactly, but
the old rules caused the sendmail problem.)  And, under Posix, what does
the inheritable mask mean, anyway?

Also, I don't find the posix rules to be useful (why is there an
inheritable mask if all it does is to cause caps to be dropped on
exec, when the user could just manually drop them?).

> 
> I believe we can get something functional with fewer changes, hence
> easier to understand the ramifications.  In a nutshell, I'm still not
> comfortable with this.

I'll play with it, but I think this is the shortest patch I've come up
with.  I'll admit that touching this stuff scares me too, but I'd rather
redo it that try and patch it over again.

> 
> Also, it breaks my tests which try to drop privs and keep caps across
> execve() which is really the only issue we're trying to solve ATM.

Can you send me a sample of what breaks?  I do:

[root@luto tmp]# cap -c = ls /home/andy
ls: /home/andy: Permission denied
[root@luto tmp]# echo test >foo
[root@luto tmp]# chmod 700 foo
[root@luto tmp]# su andy -c 'cat foo'
cat: foo: Permission denied
[root@luto tmp]# cap -c '= cap_dac_read_search=eip' -u andy cat foo
test
[root@luto tmp]# cap -c '= cap_dac_read_search=eip' -u andy bash
[andy@luto tmp]$ whoami
andy
[andy@luto tmp]$ dumpcap
         Real        Eff
User    500         500
Group   500         500

Caps: = cap_dac_read_search+eip
[andy@luto tmp]$ cat foo
test

Which looks exactly right to me.
(cap and dumpcap live at www.stanford.edu/~luto/cap/)

>>--- linux-2.6.6-mm2/fs/exec.c~caps	2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/fs/exec.c	2004-05-13 12:15:20.000000000 -0700
>>@@ -882,8 +882,10 @@
>> 
>> 	if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
>> 		/* Set-uid? */
>>-		if (mode & S_ISUID)
>>+		if (mode & S_ISUID) {
>> 			bprm->e_uid = inode->i_uid;
>>+			bprm->secflags |= BINPRM_SEC_SETUID;
>>+		}
>> 
>> 		/* Set-gid? */
>> 		/*
>>@@ -891,10 +893,19 @@
>> 		 * is a candidate for mandatory locking, not a setgid
>> 		 * executable.
>> 		 */
>>-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
>>+		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>> 			bprm->e_gid = inode->i_gid;
>>+			bprm->secflags |= BINPRM_SEC_SETGID;
>>+		}
>> 	}
>> 
>>+	/* Pretend we have VFS capabilities */
>>+	cap_set_full(bprm->cap_inheritable);
> 
> 
> This looks sketchy.

My concept of 'inheritable' is that caps that are _not_ inheritable
may never be gained by this task or its children.  So a process
should normally have all caps inheritable.

> 
> 
>>+	if((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
> 
> 
> CodingStyle:  add space after keyword 'if'

Fixed.

> 	if ((...))
> 
> 
>>+		cap_set_full(bprm->cap_permitted);
>>+	else
>>+		cap_clear(bprm->cap_permitted);
>>+
>> 	/* fill in binprm security blob */
>> 	retval = security_bprm_set(bprm);
>> 	if (retval)
>>@@ -1089,6 +1100,7 @@
>> 	bprm.loader = 0;
>> 	bprm.exec = 0;
>> 	bprm.security = NULL;
>>+	bprm.secflags = 0;
>> 	bprm.mm = mm_alloc();
>> 	retval = -ENOMEM;
>> 	if (!bprm.mm)
>>--- linux-2.6.6-mm2/security/commoncap.c~caps	2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/security/commoncap.c	2004-05-13 12:59:32.934690092 -0700
>>@@ -24,6 +24,11 @@
>> #include <linux/xattr.h>
>> #include <linux/hugetlb.h>
>> 
>>+int newcaps = 0;
> 
> 
> make this:
>   static int newcaps;

Fixed.

> 
> 
>>+
>>+module_param(newcaps, int, 444);
>>+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
>>+
>> int cap_capable (struct task_struct *tsk, int cap)
>> {
>> 	/* Derived from include/linux/sched.h:capable. */
>>@@ -36,6 +41,11 @@
>> int cap_ptrace (struct task_struct *parent, struct task_struct *child)
>> {
>> 	/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
>>+	/* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
>>+	if (newcaps &&
>>+	    !cap_issubset (child->cap_inheritable, current->cap_inheritable))
>>+		return -EPERM;
> 
> 
> Why no capable() override?  In fact, is this check really necessary?

If task A has less inheritable caps than B, then A is somehow less trusted
and has no business tracing B.

A concrete example: a system runs with very restricted inheritable caps
on all processes except for a magic daemon.  The magic daemon holds on
to CAP_SYS_ADMIN to umount everything at shutdown.  If the rest of the
system gets rooted, it still shouldn't be possible to trace the daemon.
(Yes, this is currently not workable -- I plan to add a sysctl that sets
what inheritable caps a task must have for setuid to work.  The blanket
requirement that _all_ must be present is to avoid bugs in which a
setuid program assumes it will be fully privileged.)

> 
> 
>>+
>> 	if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
>> 	    !capable (CAP_SYS_PTRACE))
>> 		return -EPERM;
>>@@ -76,6 +86,11 @@
>> 		return -EPERM;
>> 	}
>> 
>>+	/* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
>>+	if (newcaps && !cap_issubset (*permitted, *inheritable)) {
>>+		return -EPERM;
>>+	}
>>+
>> 	return 0;
>> }
>> 
>>@@ -89,6 +104,8 @@
>> 
>> int cap_bprm_set_security (struct linux_binprm *bprm)
>> {
>>+	if (newcaps) return 0;
> 
> 
> CodingStyle:
> 	if (newcaps)
> 		return 0;

Fixed.

> 
> 
>>+
>> 	/* Copied from fs/exec.c:prepare_binprm. */
>> 
>> 	/* We don't have VFS support for capabilities yet */
>>@@ -115,10 +132,11 @@
>> 	return 0;
>> }
>> 
>>-void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
>>+static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
>> {
>>-	/* Derived from fs/exec.c:compute_creds. */
>>+	/* This function will hopefully die in 2.7. */
>> 	kernel_cap_t new_permitted, working;
>>+	static int fixed_init = 0;
>> 
>> 	new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
>> 	working = cap_intersect (bprm->cap_inheritable,
>>@@ -151,6 +169,15 @@
>> 		current->cap_permitted = new_permitted;
>> 		current->cap_effective =
>> 		    cap_intersect (new_permitted, bprm->cap_effective);
>>+	} else if (!fixed_init) {
>>+		/* This is not strictly correct, as it gives linuxrc more
>>+		 * permissions than it used to have.  It was the only way I
>>+		 * could think of to keep the resulting disaster contained,
>>+		 * though.
>>+		 */
>>+		current->cap_effective = CAP_OLD_INIT_EFF_SET;
>>+		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
>>+		fixed_init = 1;
> 
> 
> Hrm...

Yup.  It sucks.  I didn't want to touch the system startup code, though,
and this is the closest LSM comes.  It's too bad there's no LSM hook to do
this right (although I suppose I could add one, but that would break
capabilities as a module).

Speaking of which, this is a genuine problem if commoncap is a module
and newcaps=0 -- this code will probably never run.  Does it matter?


[lots of patch snipped]

>>--- linux-2.6.6-mm2/include/linux/init_task.h~caps	2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/include/linux/init_task.h	2004-05-13 11:42:51.000000000 -0700
>>@@ -92,8 +92,8 @@
>> 		.function	= it_real_fn				\
>> 	},								\
>> 	.group_info	= &init_groups,					\
>>-	.cap_effective	= CAP_INIT_EFF_SET,				\
>>-	.cap_inheritable = CAP_INIT_INH_SET,				\
>>+	.cap_effective	= CAP_FULL_SET,				\
>>+	.cap_inheritable = CAP_FULL_SET,				\
> 
> 
> This was made unconditional.  And how are you convinced it's safe?

Same as above.  But even if it were really unconditional (instead
of just sort-of unconditional), I still think it's safe, because
(in newcaps=0 mode) only root can get CAP_SETPCAP.  Root
could always just insert a module to enable it.  So granting it
costs nothing.

On the other hand, safety is good, and I can't see any way in the
old system for anything to get CAP_SETPCAP.  I'll send in a new
patch that just disables CAP_SETPCAP when newcaps=0.


> 
> 
>> 	.cap_permitted	= CAP_FULL_SET,					\
>> 	.keep_capabilities = 0,						\
>> 	.rlim		= INIT_RLIMITS,					\
> 
> 
> 
> thanks,
> -chris


Thanks,
Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2)
  2004-05-14  1:20 ` Chris Wright
  2004-05-14  1:35   ` Valdis.Kletnieks
  2004-05-14  2:27   ` Andy Lutomirski
@ 2004-05-14  2:45   ` Andy Lutomirski
  2004-05-14  5:04     ` Chris Wright
  2004-05-14 11:10     ` Olaf Dietsche
  2 siblings, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-14  2:45 UTC (permalink / raw)
  To: Chris Wright; +Cc: akpm, linux-kernel

Changes:

- Fixed a couple CodingStyle things (thanks, Chris)
- newcaps=0 explicity bans CAP_SETPCAP.

The latter is to prevent any possibility of capset abuse
in compatibility mode.

I still haven't tried to get CAP_SETPCAP with commoncap as a module,
but it seems match the old behavior (cat /proc/1/status) when built-in
(modulo linuxrc).

I suppose that if people really are scared of CAP_SETPCAP, then this
is a good precaution, because linuxrc's children will have it with
my patch.

I'm still open to suggestions for the real right fix.  I guess I
could make it a real boot parameter instead of a module parameter,
but that's ugly.

As for Posix caps, is there any good reason to follow Posix?  I
don't know of any OS that has Posix caps except Linux, and they're
broken.  The spec was dropped, anyway.

--Andy


 fs/exec.c                  |   16 +++-
 include/linux/binfmts.h    |    8 +-
 include/linux/capability.h |    6 +
 include/linux/init_task.h  |    4 -
 kernel/capability.c        |    2 
 security/commoncap.c       |  159 +++++++++++++++++++++++++++++++++++++++++----
 6 files changed, 176 insertions(+), 19 deletions(-)
--- linux-2.6.6-mm2/fs/exec.c~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/fs/exec.c	2004-05-13 18:46:36.000000000 -0700
@@ -882,8 +882,10 @@
 
 	if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
 		/* Set-uid? */
-		if (mode & S_ISUID)
+		if (mode & S_ISUID) {
 			bprm->e_uid = inode->i_uid;
+			bprm->secflags |= BINPRM_SEC_SETUID;
+		}
 
 		/* Set-gid? */
 		/*
@@ -891,10 +893,19 @@
 		 * is a candidate for mandatory locking, not a setgid
 		 * executable.
 		 */
-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 			bprm->e_gid = inode->i_gid;
+			bprm->secflags |= BINPRM_SEC_SETGID;
+		}
 	}
 
+	/* Pretend we have VFS capabilities */
+	cap_set_full(bprm->cap_inheritable);
+	if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
+		cap_set_full(bprm->cap_permitted);
+	else
+		cap_clear(bprm->cap_permitted);
+
 	/* fill in binprm security blob */
 	retval = security_bprm_set(bprm);
 	if (retval)
@@ -1089,6 +1100,7 @@
 	bprm.loader = 0;
 	bprm.exec = 0;
 	bprm.security = NULL;
+	bprm.secflags = 0;
 	bprm.mm = mm_alloc();
 	retval = -ENOMEM;
 	if (!bprm.mm)
--- linux-2.6.6-mm2/security/commoncap.c~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/security/commoncap.c	2004-05-13 19:28:27.000000000 -0700
@@ -24,9 +24,17 @@
 #include <linux/xattr.h>
 #include <linux/hugetlb.h>
 
+static int newcaps = 0;
+
+module_param(newcaps, int, 444);
+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
+
 int cap_capable (struct task_struct *tsk, int cap)
 {
 	/* Derived from include/linux/sched.h:capable. */
+	if (unlikely(!newcaps && cap == CAP_SETPCAP))
+		return -EPERM;
+
 	if (cap_raised (tsk->cap_effective, cap))
 		return 0;
 	else
@@ -36,6 +44,11 @@
 int cap_ptrace (struct task_struct *parent, struct task_struct *child)
 {
 	/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
+	/* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
+	if (newcaps &&
+	    !cap_issubset (child->cap_inheritable, current->cap_inheritable))
+		return -EPERM;
+
 	if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
 	    !capable (CAP_SYS_PTRACE))
 		return -EPERM;
@@ -76,6 +89,11 @@
 		return -EPERM;
 	}
 
+	/* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
+	if (newcaps && !cap_issubset (*permitted, *inheritable)) {
+		return -EPERM;
+	}
+
 	return 0;
 }
 
@@ -89,6 +107,9 @@
 
 int cap_bprm_set_security (struct linux_binprm *bprm)
 {
+	if (newcaps)
+		return 0;
+
 	/* Copied from fs/exec.c:prepare_binprm. */
 
 	/* We don't have VFS support for capabilities yet */
@@ -115,10 +136,11 @@
 	return 0;
 }
 
-void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
 {
-	/* Derived from fs/exec.c:compute_creds. */
+	/* This function will hopefully die in 2.7. */
 	kernel_cap_t new_permitted, working;
+	static int fixed_init = 0;
 
 	new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
 	working = cap_intersect (bprm->cap_inheritable,
@@ -151,6 +173,15 @@
 		current->cap_permitted = new_permitted;
 		current->cap_effective =
 		    cap_intersect (new_permitted, bprm->cap_effective);
+	} else if (!fixed_init) {
+		/* This is not strictly correct, as it gives linuxrc more
+		 * permissions than it used to have.  It was the only way I
+		 * could think of to keep the resulting disaster contained,
+		 * though.
+		 */
+		current->cap_effective = CAP_OLD_INIT_EFF_SET;
+		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
+		fixed_init = 1;
 	}
 
 	/* AUD: Audit candidate if current->cap_effective is set */
@@ -158,15 +189,103 @@
 	current->keep_capabilities = 0;
 }
 
+/*
+ * The rules of Linux capabilities (not POSIX!)
+ *
+ * What the masks mean:
+ *  pI = capabilities that this process or its children may have
+ *  pP = capabilities that this process has
+ *  pE = capabilities that this process has and are enabled
+ * (so pE <= pP <= pI)
+ *
+ * The capability evolution rules are:
+ *
+ *  pI' = pI & fI
+ *  pP' = ((fP & cap_bset) | pP) & pI' & Y
+ *  pE' = (setuid ? pP' : (pE & pP'))
+ *
+ *  X = cap_bset
+ *  Y is zero if uid!=0, euid==0, and setuid non-root
+ *
+ * Caveat: if (fP & ~pI'), there is no _theoretical_ problem, but
+ * this could introduce exploits in buggy programs.  Since programs
+ * that aren't capability-aware are insecure _anyway_ if pP!=0, this
+ * is OK.
+ *
+ * To allow pI != ~0 to be secure in the presence of buggy programs,
+ * we require full pI for setuid.
+ *
+ * The moral is that, if file capabilities are introduced, programs
+ * that are granted fP > 0 need to be aware of how to deal with it.
+ * Because the effective set is left alone on non-setuid fP>0,
+ * such a program should drop capabilities that were not in its initial
+ * effective set before running untrusted code.
+ *
+ */
+void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+{
+	kernel_cap_t new_pI, new_pP;
+	kernel_cap_t fI, fP;
+	int is_setuid, is_setgid;
+
+	if(!newcaps) {
+		cap_bprm_apply_creds_compat(bprm, unsafe);
+		return;
+	}
+
+	fI = bprm->cap_inheritable;
+	fP = bprm->cap_permitted;
+	is_setuid = (bprm->secflags & BINPRM_SEC_SETUID);
+	is_setgid = (bprm->secflags & BINPRM_SEC_SETGID);
+
+	new_pI = cap_intersect(current->cap_inheritable, fI);
+
+	/* Check that it's safe to elevate privileges */
+	if (unsafe & ~LSM_UNSAFE_PTRACE_CAP)
+		bprm->secflags |= BINPRM_SEC_NOELEVATE;
+
+	/* FIXME: Is this overly harsh on setgid? */
+	if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
+	    new_pI != CAP_FULL_SET)
+			bprm->secflags |= BINPRM_SEC_NOELEVATE;
+
+	if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
+		is_setuid = is_setgid = 0;
+		cap_clear(fP);
+	}
+
+	new_pP = cap_intersect(fP, cap_bset);
+	new_pP = cap_combine(new_pP, current->cap_permitted);
+	cap_mask(new_pP, new_pI);
+
+	/* setuid-nonroot is special. */
+	if (is_setuid && bprm->e_uid != 0 && current->uid != 0 &&
+	    current->euid == 0)
+		cap_clear(new_pP);
+
+	if (!cap_issubset(new_pP, current->cap_permitted))
+		bprm->secflags |= BINPRM_SEC_SECUREEXEC;
+
+	/* Apply new security state */
+	if (is_setuid) {
+	        current->suid = current->euid = current->fsuid = bprm->e_uid;
+		current->cap_effective = new_pP;
+	}
+	if (is_setgid)
+	        current->sgid = current->egid = current->fsgid = bprm->e_gid;
+
+	current->cap_inheritable = new_pI;
+	current->cap_permitted = new_pP;
+	cap_mask(current->cap_effective, new_pP);
+
+	current->keep_capabilities = 0;
+}
+
 int cap_bprm_secureexec (struct linux_binprm *bprm)
 {
-	/* If/when this module is enhanced to incorporate capability
-	   bits on files, the test below should be extended to also perform a 
-	   test between the old and new capability sets.  For now,
-	   it simply preserves the legacy decision algorithm used by
-	   the old userland. */
 	return (current->euid != current->uid ||
-		current->egid != current->gid);
+		current->egid != current->gid ||
+		(bprm->secflags & BINPRM_SEC_SECUREEXEC));
 }
 
 int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,
@@ -280,9 +399,15 @@
 
 void cap_task_reparent_to_init (struct task_struct *p)
 {
-	p->cap_effective = CAP_INIT_EFF_SET;
-	p->cap_inheritable = CAP_INIT_INH_SET;
-	p->cap_permitted = CAP_FULL_SET;
+	if (newcaps) {
+		cap_set_full(p->cap_inheritable);
+		cap_set_full(p->cap_permitted);
+		cap_set_full(p->cap_effective);
+	} else {
+		p->cap_effective = CAP_OLD_INIT_EFF_SET;
+		p->cap_inheritable = CAP_OLD_INIT_INH_SET;
+		p->cap_permitted = CAP_FULL_SET;
+	}
 	p->keep_capabilities = 0;
 	return;
 }
@@ -367,6 +492,16 @@
 	return -ENOMEM;
 }
 
+static int __init commoncap_init (void)
+{
+	if (newcaps) {
+		printk(KERN_NOTICE "Experimental capability support is on\n");
+		cap_bset = CAP_FULL_SET;
+	}
+
+	return 0;
+}
+
 EXPORT_SYMBOL(cap_capable);
 EXPORT_SYMBOL(cap_ptrace);
 EXPORT_SYMBOL(cap_capget);
@@ -382,5 +517,7 @@
 EXPORT_SYMBOL(cap_syslog);
 EXPORT_SYMBOL(cap_vm_enough_memory);
 
+module_init(commoncap_init);
+
 MODULE_DESCRIPTION("Standard Linux Common Capabilities Security Module");
 MODULE_LICENSE("GPL");
--- linux-2.6.6-mm2/kernel/capability.c~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/kernel/capability.c	2004-05-13 11:42:51.000000000 -0700
@@ -13,7 +13,7 @@
 #include <asm/uaccess.h>
 
 unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */
-kernel_cap_t cap_bset = CAP_INIT_EFF_SET;
+kernel_cap_t cap_bset = CAP_OLD_INIT_EFF_SET;
 int sysctl_mlock_group;
 
 EXPORT_SYMBOL(securebits);
--- linux-2.6.6-mm2/include/linux/capability.h~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/capability.h	2004-05-13 11:42:51.000000000 -0700
@@ -308,8 +308,10 @@
 
 #define CAP_EMPTY_SET       to_cap_t(0)
 #define CAP_FULL_SET        to_cap_t(~0)
-#define CAP_INIT_EFF_SET    to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
-#define CAP_INIT_INH_SET    to_cap_t(0)
+
+/* For old-style capabilities, we use these. */
+#define CAP_OLD_INIT_EFF_SET    to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
+#define CAP_OLD_INIT_INH_SET    to_cap_t(0)
 
 #define CAP_TO_MASK(x) (1 << (x))
 #define cap_raise(c, flag)   (cap_t(c) |=  CAP_TO_MASK(flag))
--- linux-2.6.6-mm2/include/linux/binfmts.h~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/binfmts.h	2004-05-13 11:44:02.000000000 -0700
@@ -20,6 +20,10 @@
 /*
  * This structure is used to hold the arguments that are used when loading binaries.
  */
+#define BINPRM_SEC_SETUID	1
+#define BINPRM_SEC_SETGID	2
+#define BINPRM_SEC_SECUREEXEC	4
+#define BINPRM_SEC_NOELEVATE	8
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
 	struct page *page[MAX_ARG_PAGES];
@@ -28,7 +32,9 @@
 	int sh_bang;
 	struct file * file;
 	int e_uid, e_gid;
-	kernel_cap_t cap_inheritable, cap_permitted, cap_effective;
+	int secflags;
+	kernel_cap_t cap_inheritable, cap_permitted;
+	kernel_cap_t cap_effective; /* old caps -- do NOT use in new code */
 	void *security;
 	int argc, envc;
 	char * filename;	/* Name of binary as seen by procps */
--- linux-2.6.6-mm2/include/linux/init_task.h~caps	2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/init_task.h	2004-05-13 11:42:51.000000000 -0700
@@ -92,8 +92,8 @@
 		.function	= it_real_fn				\
 	},								\
 	.group_info	= &init_groups,					\
-	.cap_effective	= CAP_INIT_EFF_SET,				\
-	.cap_inheritable = CAP_INIT_INH_SET,				\
+	.cap_effective	= CAP_FULL_SET,				\
+	.cap_inheritable = CAP_FULL_SET,				\
 	.cap_permitted	= CAP_FULL_SET,					\
 	.keep_capabilities = 0,						\
 	.rlim		= INIT_RLIMITS,					\


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  2:27   ` Andy Lutomirski
@ 2004-05-14  4:48     ` Chris Wright
  2004-05-14  5:58       ` Andy Lutomirski
  2004-05-14  6:39     ` Olaf Dietsche
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Wright @ 2004-05-14  4:48 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Chris Wright, akpm, linux-kernel

* Andy Lutomirski (luto@myrealbox.com) wrote:
> Chris Wright wrote:
> 
> > * Andy Lutomirski (luto@myrealbox.com) wrote:
> > 
> >>Implement optional working capability support.  Try to avoid giving Andrew
> >>a heart attack. ;)
> > 
> > I think it still needs more work.  Default behavoiur is changed, like
> > Inheritble is full rather than clear, setpcap is enabled, etc.  ...
> 
> In cap_bprm_apply_creds_compat:
> 
> +	} else if (!fixed_init) {
> +		/* This is not strictly correct, as it gives linuxrc more
> +		 * permissions than it used to have.  It was the only way I
> +		 * could think of to keep the resulting disaster contained,
> +		 * though.
> +		 */
> +		current->cap_effective = CAP_OLD_INIT_EFF_SET;
> +		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
> +		fixed_init = 1;
> 
> So that it gets changed back.  Otherwise linuxrc ran without permissions
> and my drives never got mounted.  Yah, it's ugly -- I'm open to
> suggestions to avoid this.

I tested as a module, and this doesn't run AFAICS

>  > ... Also,
> > why do you change from Posix the way exec() updates capabilities?  Sure,
> > there is no filesystem bits present, so this changes the calculation,
> > but I'm not convinced it's as secure this way.  At least with newcaps=0.
> 
> I'm not convinced that Posix's version makes any sense.  Also, there are
> apparently a number of drafts around which disagree on what the right
> rules are.  (My copy, for example, matches the old rules exactly, but
> the old rules caused the sendmail problem.)  And, under Posix, what does
> the inheritable mask mean, anyway?
> 
> Also, I don't find the posix rules to be useful (why is there an
> inheritable mask if all it does is to cause caps to be dropped on
> exec, when the user could just manually drop them?).

Not sure if it's defensible, but it allows passing on an inheritable
capability through an intermediate process that simply can't inherit
that capability.  This is not unlike requiring an unprivileged process
to ask a privileged process for it to do something on it's behalf.
Certainly it's implicit that you trust the privileged process.

> > I believe we can get something functional with fewer changes, hence
> > easier to understand the ramifications.  In a nutshell, I'm still not
> > comfortable with this.
> 
> I'll play with it, but I think this is the shortest patch I've come up
> with.  I'll admit that touching this stuff scares me too, but I'd rather
> redo it that try and patch it over again.

Yeah, the subtleties are unnerving.

> > Also, it breaks my tests which try to drop privs and keep caps across
> > execve() which is really the only issue we're trying to solve ATM.
> 
> Can you send me a sample of what breaks?  I do:

Yes.  It's something like this:

keepcaps
dropuid
drop caps
execve()

The caps that are left are like this.  (effective == inheritable) which
are a subset of permitted.

> [root@luto tmp]# cap -c = ls /home/andy
> ls: /home/andy: Permission denied
> [root@luto tmp]# echo test >foo
> [root@luto tmp]# chmod 700 foo
> [root@luto tmp]# su andy -c 'cat foo'
> cat: foo: Permission denied
> [root@luto tmp]# cap -c '= cap_dac_read_search=eip' -u andy cat foo
> test
> [root@luto tmp]# cap -c '= cap_dac_read_search=eip' -u andy bash
> [andy@luto tmp]$ whoami
> andy
> [andy@luto tmp]$ dumpcap
>          Real        Eff
> User    500         500
> Group   500         500
> 
> Caps: = cap_dac_read_search+eip
> [andy@luto tmp]$ cat foo
> test
> 
> Which looks exactly right to me.
> (cap and dumpcap live at www.stanford.edu/~luto/cap/)
> 
> >>--- linux-2.6.6-mm2/fs/exec.c~caps	2004-05-13 11:42:26.000000000 -0700
> >>+++ linux-2.6.6-mm2/fs/exec.c	2004-05-13 12:15:20.000000000 -0700
> >>@@ -882,8 +882,10 @@
> >> 
> >> 	if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
> >> 		/* Set-uid? */
> >>-		if (mode & S_ISUID)
> >>+		if (mode & S_ISUID) {
> >> 			bprm->e_uid = inode->i_uid;
> >>+			bprm->secflags |= BINPRM_SEC_SETUID;
> >>+		}
> >> 
> >> 		/* Set-gid? */
> >> 		/*
> >>@@ -891,10 +893,19 @@
> >> 		 * is a candidate for mandatory locking, not a setgid
> >> 		 * executable.
> >> 		 */
> >>-		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> >>+		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> >> 			bprm->e_gid = inode->i_gid;
> >>+			bprm->secflags |= BINPRM_SEC_SETGID;
> >>+		}
> >> 	}
> >> 
> >>+	/* Pretend we have VFS capabilities */
> >>+	cap_set_full(bprm->cap_inheritable);
> > 
> > 
> > This looks sketchy.
> 
> My concept of 'inheritable' is that caps that are _not_ inheritable
> may never be gained by this task or its children.  So a process
> should normally have all caps inheritable.

This is the diff with Posix, which allows a process to inherit a
capability that it can never exercise.  However, it could pass the
capablity on to someone else who could inherit it.

<snip>
> >>@@ -36,6 +41,11 @@
> >> int cap_ptrace (struct task_struct *parent, struct task_struct *child)
> >> {
> >> 	/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
> >>+	/* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
> >>+	if (newcaps &&
> >>+	    !cap_issubset (child->cap_inheritable, current->cap_inheritable))
> >>+		return -EPERM;
> > 
> > 
> > Why no capable() override?  In fact, is this check really necessary?
> 
> If task A has less inheritable caps than B, then A is somehow less trusted
> and has no business tracing B.

But it's not less.  It's just not a subset.  Task B could have only one
inheritable cap, while A could have all but the one cap that B has.  In
fact, that could be CAP_SYS_PTRACE.  So the threat is task A tracing B,
and using it to pass on capabilities that it wasn't allowed to pass on.
Which is what the permitted test was for before, and what CAP_SYS_PTRACE
was used to override.

> A concrete example: a system runs with very restricted inheritable caps
> on all processes except for a magic daemon.  The magic daemon holds on
> to CAP_SYS_ADMIN to umount everything at shutdown.  If the rest of the
> system gets rooted, it still shouldn't be possible to trace the daemon.
> (Yes, this is currently not workable -- I plan to add a sysctl that sets
> what inheritable caps a task must have for setuid to work.  The blanket
> requirement that _all_ must be present is to avoid bugs in which a
> setuid program assumes it will be fully privileged.)

I suppose this eliminates the usefulness of CAP_SYS_PTRACE.

> >> 	if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
> >> 	    !capable (CAP_SYS_PTRACE))
> >> 		return -EPERM;
> >>@@ -76,6 +86,11 @@
> >> 		return -EPERM;
> >> 	}
> >> 
> >>+	/* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
> >>+	if (newcaps && !cap_issubset (*permitted, *inheritable)) {
> >>+		return -EPERM;
> >>+	}

This is what breaks my code.  I specifcally test permitting the parent to
grab back some caps, but never give them away.

> >> 	return 0;
> >> }
> >> 
> >>@@ -115,10 +132,11 @@
> >> 	return 0;
> >> }
> >> 
> >>-void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
> >>+static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
> >> {
> >>-	/* Derived from fs/exec.c:compute_creds. */
> >>+	/* This function will hopefully die in 2.7. */
> >> 	kernel_cap_t new_permitted, working;
> >>+	static int fixed_init = 0;
> >> 
> >> 	new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
> >> 	working = cap_intersect (bprm->cap_inheritable,
> >>@@ -151,6 +169,15 @@
> >> 		current->cap_permitted = new_permitted;
> >> 		current->cap_effective =
> >> 		    cap_intersect (new_permitted, bprm->cap_effective);
> >>+	} else if (!fixed_init) {
> >>+		/* This is not strictly correct, as it gives linuxrc more
> >>+		 * permissions than it used to have.  It was the only way I
> >>+		 * could think of to keep the resulting disaster contained,
> >>+		 * though.
> >>+		 */
> >>+		current->cap_effective = CAP_OLD_INIT_EFF_SET;
> >>+		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
> >>+		fixed_init = 1;
> > 
> > 
> > Hrm...
> 
> Yup.  It sucks.  I didn't want to touch the system startup code, though,
> and this is the closest LSM comes.  It's too bad there's no LSM hook to do
> this right (although I suppose I could add one, but that would break
> capabilities as a module).

As a module it loosk like.

$ grep ^Cap /proc/1/status
CapInh: 00000000ffffffff
CapPrm: 00000000ffffffff
CapEff: 00000000ffffffff

> Speaking of which, this is a genuine problem if commoncap is a module
> and newcaps=0 -- this code will probably never run.  Does it matter?

Yup, that's the case I tested.  It breaks something that's working
normally now...

> [lots of patch snipped]
> 
> >>--- linux-2.6.6-mm2/include/linux/init_task.h~caps	2004-05-13 11:42:26.000000000 -0700
> >>+++ linux-2.6.6-mm2/include/linux/init_task.h	2004-05-13 11:42:51.000000000 -0700
> >>@@ -92,8 +92,8 @@
> >> 		.function	= it_real_fn				\
> >> 	},								\
> >> 	.group_info	= &init_groups,					\
> >>-	.cap_effective	= CAP_INIT_EFF_SET,				\
> >>-	.cap_inheritable = CAP_INIT_INH_SET,				\
> >>+	.cap_effective	= CAP_FULL_SET,				\
> >>+	.cap_inheritable = CAP_FULL_SET,				\
> > 
> > 
> > This was made unconditional.  And how are you convinced it's safe?
> 
> Same as above.  But even if it were really unconditional (instead
> of just sort-of unconditional), I still think it's safe, because
> (in newcaps=0 mode) only root can get CAP_SETPCAP.  Root
> could always just insert a module to enable it.  So granting it
> costs nothing.
> 
> On the other hand, safety is good, and I can't see any way in the
> old system for anything to get CAP_SETPCAP.  I'll send in a new
> patch that just disables CAP_SETPCAP when newcaps=0.

OK.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  1:35   ` Valdis.Kletnieks
@ 2004-05-14  4:51     ` Chris Wright
  2004-05-14  5:33     ` Olaf Dietsche
  1 sibling, 0 replies; 36+ messages in thread
From: Chris Wright @ 2004-05-14  4:51 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Chris Wright, Andy Lutomirski, akpm, linux-kernel

* Valdis.Kletnieks@vt.edu (Valdis.Kletnieks@vt.edu) wrote:
> On Thu, 13 May 2004 18:20:10 PDT, Chris Wright said:
> 
> > I think it still needs more work.  Default behavoiur is changed, like
> > Inheritble is full rather than clear, setpcap is enabled, etc.  Also,
> > why do you change from Posix the way exec() updates capabilities?  Sure,
> > there is no filesystem bits present, so this changes the calculation,
> > but I'm not convinced it's as secure this way.  At least with newcaps=0.
> 
> The last time the "capabilities" thread reared its head a while ago, Andy made
> a posting that pretty conclusively showed that the Posix way was totally b0rken
> if you ever intended to support filesystem bits.  So if you wanted to ever have
> a snowball's chance of supporting something like:
> 
> chcap cap_net_raw+ep /bin/ping

It's still not clear that's what we want.  For now, just being able to have
the sucap equiv to sudo would be nice.  And it's very uncomfortable to
change mainline in subtle ways that could break security during stable
series.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2)
  2004-05-14  2:45   ` [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2) Andy Lutomirski
@ 2004-05-14  5:04     ` Chris Wright
  2004-05-14  5:32       ` Valdis.Kletnieks
  2004-05-14 11:10     ` Olaf Dietsche
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Wright @ 2004-05-14  5:04 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Chris Wright, akpm, linux-kernel

* Andy Lutomirski (luto@myrealbox.com) wrote:
> As for Posix caps, is there any good reason to follow Posix?  I
> don't know of any OS that has Posix caps except Linux, and they're
> broken.  The spec was dropped, anyway.

Well, I wonder what IRIX does?  They support capabilities and had a
reasonable sized hand in the draft.  No point in making it impossible
to port apps.  It's not that I'm a strong fan of Posix caps, but
compatibility (even with a partially complete draft) with defacto
standards is not entirely unreasonable.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2) 
  2004-05-14  5:04     ` Chris Wright
@ 2004-05-14  5:32       ` Valdis.Kletnieks
  2004-05-14  5:40         ` Chris Wright
  0 siblings, 1 reply; 36+ messages in thread
From: Valdis.Kletnieks @ 2004-05-14  5:32 UTC (permalink / raw)
  To: Chris Wright; +Cc: Andy Lutomirski, akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2471 bytes --]

On Thu, 13 May 2004 22:04:15 PDT, Chris Wright said:

> Well, I wonder what IRIX does?  They support capabilities and had a
> reasonable sized hand in the draft.  No point in making it impossible
> to port apps.  It's not that I'm a strong fan of Posix caps, but
> compatibility (even with a partially complete draft) with defacto
> standards is not entirely unreasonable.

The IRIX 6.5 manpage says:


    A process has three, possibly empty, sets of capabilities.  The permitted
     capability set is the maximum set of capabilities for the process.  The
     effective capability set contains those capabilities that are currently
     active for the process.  The inherited capability set contains those
     capabilities that the process may pass to the next process image across
     exec(2).

     Only capabilities in a process' effective capability set allow the
     process to perform restricted operations.  A process may use capability
     management functions to add or remove capabilities from its effective
     capability set.  However the capabilities that a process can make
     effective are limited to those that exist in its permitted capability
     set.

     Only capabilities in the process' inherited capability set can be passed
     across exec(2).

     Capabilities are also associated with files.  There may or may not be a
     capability set associated with a specific file.  If a file has no
     capability set, execution of this file through an exec(2) will leave the
     process' capability set unchanged.  If a file has a capability set,
     execution of file will affect the process' capability set in the
     following way: a file's inherited capability set further constrains the
     process inherited capabilities that are passed from one process image to
     another. The file's permitted capability set contains the capabilities
     that are unconditionally permitted to a process upon execution of that
     file.  The file's effective capabilities are the capabilities that become
     immediately active for the process upon execution of the file.

     More precisely described, the process capability assignment algorithm is:

              I-proc-new = I-proc-old & I-file
              P-proc-new = P-file | (I-proc-new & P-proc-old)
              E-proc-new = P-proc-new & E-file

     File capabilities are supported only on XFS filesystems.

Note that Irix has *3* capability vectors attached to a file....

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  1:35   ` Valdis.Kletnieks
  2004-05-14  4:51     ` Chris Wright
@ 2004-05-14  5:33     ` Olaf Dietsche
  2004-05-14  6:04       ` Valdis.Kletnieks
  1 sibling, 1 reply; 36+ messages in thread
From: Olaf Dietsche @ 2004-05-14  5:33 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Chris Wright, Andy Lutomirski, akpm, linux-kernel

Valdis.Kletnieks@vt.edu writes:

> On Thu, 13 May 2004 18:20:10 PDT, Chris Wright said:
>
>> I think it still needs more work.  Default behavoiur is changed, like
>> Inheritble is full rather than clear, setpcap is enabled, etc.  Also,
>> why do you change from Posix the way exec() updates capabilities?  Sure,
>> there is no filesystem bits present, so this changes the calculation,
>> but I'm not convinced it's as secure this way.  At least with newcaps=0.
>
> The last time the "capabilities" thread reared its head a while ago, Andy made
> a posting that pretty conclusively showed that the Posix way was totally b0rken
> if you ever intended to support filesystem bits.  So if you wanted to ever have
> a snowball's chance of supporting something like:
>
> chcap cap_net_raw+ep /bin/ping

Seems like you're not aware of:
<http://www.olafdietsche.de/linux/capability/>

This supports filesystem capabilities with the current (POSIX?)
implementation. So, whatever Andy has shown, it has at least one
counter evidence q.e.d.

> 2) Toss all the filesystems capabilities support out the window.

I agree to disagree ;-)

Regards, Olaf.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2)
  2004-05-14  5:32       ` Valdis.Kletnieks
@ 2004-05-14  5:40         ` Chris Wright
  2004-05-14  6:25           ` Valdis.Kletnieks
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Wright @ 2004-05-14  5:40 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Chris Wright, Andy Lutomirski, akpm, linux-kernel

* Valdis.Kletnieks@vt.edu (Valdis.Kletnieks@vt.edu) wrote:
> On Thu, 13 May 2004 22:04:15 PDT, Chris Wright said:
> 
> > Well, I wonder what IRIX does?  They support capabilities and had a
> > reasonable sized hand in the draft.  No point in making it impossible
> > to port apps.  It's not that I'm a strong fan of Posix caps, but
> > compatibility (even with a partially complete draft) with defacto
> > standards is not entirely unreasonable.
> 
> The IRIX 6.5 manpage says:

Thanks.

OK, this is precisely POSIX as I expected.  No surprise given the folks
involved.

<snip manpage>

> Note that Irix has *3* capability vectors attached to a file....

This is what POSIX specifies (with as much authority as a withdrawn spec
has ;-)

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  4:48     ` Chris Wright
@ 2004-05-14  5:58       ` Andy Lutomirski
  2004-05-14 17:45         ` Chris Wright
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-14  5:58 UTC (permalink / raw)
  To: Chris Wright; +Cc: akpm, linux-kernel, Valdis.Kletnieks

[Pardon my gross butchery and reordering.]

Compatibility (i.e. newcaps=0):

Chris Wright wrote:
 > * Andy Lutomirski (luto@myrealbox.com) wrote:
 >
 >>Chris Wright wrote:
 >>
 >>
 >>>* Andy Lutomirski (luto@myrealbox.com) wrote:

 >>>I think it still needs more work.  Default behavoiur is changed, like
 >>>Inheritble is full rather than clear, setpcap is enabled, etc.  ...
 >>
 >>In cap_bprm_apply_creds_compat:
 >>
 >>+	} else if (!fixed_init) {
 >>+		/* This is not strictly correct, as it gives linuxrc more
 >>+		 * permissions than it used to have.  It was the only way I
 >>+		 * could think of to keep the resulting disaster contained,
 >>+		 * though.
 >>+		 */
 >>+		current->cap_effective = CAP_OLD_INIT_EFF_SET;
 >>+		current->cap_inheritable = CAP_OLD_INIT_INH_SET;
 >>+		fixed_init = 1;
 >>
 >>So that it gets changed back.  Otherwise linuxrc ran without permissions
 >>and my drives never got mounted.  Yah, it's ugly -- I'm open to
 >>suggestions to avoid this.
 >
 >
 > I tested as a module, and this doesn't run AFAICS

OK -- I give in.  I'll redo it as a kernel (non-module) boot parameter.
And touch some more files that have no business being capability-aware
while I'm at it :(


The inheritable mask:

 >>>Also, it breaks my tests which try to drop privs and keep caps across
 >>>execve() which is really the only issue we're trying to solve ATM.
 >>
 >>Can you send me a sample of what breaks?  I do:
 >
 >
 > Yes.  It's something like this:
 >
 > keepcaps
 > dropuid
 > drop caps
 > execve()
 >
 > The caps that are left are like this.  (effective == inheritable) which
 > are a subset of permitted.
 >

Is (eff == inh) what happens or what should happen?  If the former, then
my patch is broken.  If the latter, either I'm confused, or see below.

 >>>why do you change from Posix the way exec() updates capabilities?  Sure,
 >>>there is no filesystem bits present, so this changes the calculation,
 >>>but I'm not convinced it's as secure this way.  At least with newcaps=0.
 >>
 >>I'm not convinced that Posix's version makes any sense.  Also, there are
 >>apparently a number of drafts around which disagree on what the right
 >>rules are.  (My copy, for example, matches the old rules exactly, but
 >>the old rules caused the sendmail problem.)  And, under Posix, what does
 >>the inheritable mask mean, anyway?
 >>
 >>Also, I don't find the posix rules to be useful (why is there an
 >>inheritable mask if all it does is to cause caps to be dropped on
 >>exec, when the user could just manually drop them?).
 >
 >
 > Not sure if it's defensible, but it allows passing on an inheritable
 > capability through an intermediate process that simply can't inherit
 > that capability.  This is not unlike requiring an unprivileged process
 > to ask a privileged process for it to do something on it's behalf.
 > Certainly it's implicit that you trust the privileged process.
 >

[and:]
 >>>>+	/* Pretend we have VFS capabilities */
 >>>>+	cap_set_full(bprm->cap_inheritable);
 >>>
 >>>
 >>>This looks sketchy.
 >>
 >>My concept of 'inheritable' is that caps that are _not_ inheritable
 >>may never be gained by this task or its children.  So a process
 >>should normally have all caps inheritable.
 >
 >
 > This is the diff with Posix, which allows a process to inherit a
 > capability that it can never exercise.  However, it could pass the
 > capablity on to someone else who could inherit it.
 >
 > <snip>
 >

So here's where I think we disagree:

Posix (as interpreted by Linux 2.4/2.6) says:
pP' = (fP & cap_bset) | (fI & pI)

So (assuming that set_security did the "obvious" (but dangerous) thing):

pP := "task may enable and use these capabilities"
pE := "task may use these capabilities _now_"
pI := "task may gain these on exec from fI"

fP := "this program gets these caps (module cap_bset)"
fI := "this program gets these caps if pI says so"

Which screams "overcomplicated."  I imagine that the use is for a
trusted daemon to run an untrusted helper (with pI>0) which then
calls back into trusted land and gets its caps back.  This is
possibly convenient, but it seems to break (where break = scare me)
when there are more than one such system on a given box.  Then one's
untrusted program (with fI>0) can call the other's trusted fI>0
helper.  I suppose the point is that a random user's program (with fI==0)
can't try to exploit anything, but, for this to be at all secure, both
fI>0 programs need to be secure against attack from the other (unrelated)
system, should it be compromised.  Which means it might as well have set
fP>0 and been done with it (I don't believe in security by inconvenience
of exploit).

I see no particular invariants here, except for pE <= pP.

IRIX (thanks Valdis) says:

pI' = pI & fI
pP' = fP | (pI' & fP)

Which I interpret as

pP := "task may enable and use these capabilities"
pE := "task may use these capabilities _now_"
pI := ~"task _loses_ these on exec"

fP := "this program gets these caps"
fI := "this program may keep these caps"

This seems to want pP <= pI as an invariant.

This is what I always thought Linux capabilities meant to be.  They
don't make my brain hurt.

But I also think they're overengineered.  Instead of:

drop_caps_from_inheritable()
exec()

a program could do:

drop_caps_from_permitted()
exec()

And I can't imagine what use fI != ~0 has, since it's trivially
accomplished by a wrapper.  It is also trivially bypassed by
loading the program manually (with ld.so).

So, in my patch, I decided steal the inheritable mask to mean this:

pI := "this process may gain these caps"
fI := "this is an upper bound on pI"

In other words, if a process is extra-untrusted (e.g. it's a daemon
that never needs a certain capability and has no business trying
to gain it), it can drop it from pI.  Then it cannot try to abuse
programs with pP>0.  The setuid override is just added paranoia.
Another benefit is that it allows a securelevel-like scheme, where
even root isn't quite trusted.

I suppose it might be inappropriate to steal this field like this,
given that IRIX already has a (somewhat reasonable) use for it. I
have no problem implementing IRIX-style capabilities and (if there
is enough interest) adding a _fourth_ process field pM (process
maximum capabilities) that does what my pI does.

As for the fE mask, I just don't see the point, although I _really_
don't like the way it's described in the IRIX manpage.

IRIX has pE = pP & fE.  This breaks Posix non-capabilities
compatibility for a program that's uid==0, euid!=0.  It should
have pE==0 and pP>0.  But it calls exec() and gets pE>0.  This
is bad.

Assuming there's something else there to fix that case,
then I still don't see the point.  If a program is capability-
aware, it can set its pE however it likes.  If not, then it probably
expects pE==pP.  I guess there could be a trusted but dumb program
that runs a trusted, cap-aware helper that needs capabilities.
Then the admin sets fE==0 on the dump program and everything works.
Seems a bit contrived, though.

On the other hand, I'm not wedded to my model of pE.  It'll be harder
to get IRIX's right for uid!=euid.

CAP_SYS_PTRACE:

 >>>>@@ -36,6 +41,11 @@
 >>>>int cap_ptrace (struct task_struct *parent, struct task_struct *child)
 >>>>{
 >>>>	/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
 >>>>+	/* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
 >>>>+	if (newcaps &&
 >>>>+	    !cap_issubset (child->cap_inheritable, current->cap_inheritable))
 >>>>+		return -EPERM;
 >>>
 >>>
 >>>Why no capable() override?  In fact, is this check really necessary?
 >>
 >>If task A has less inheritable caps than B, then A is somehow less trusted
 >>and has no business tracing B.
 >
 >
 > But it's not less.  It's just not a subset.  Task B could have only one
 > inheritable cap, while A could have all but the one cap that B has.  In
 > fact, that could be CAP_SYS_PTRACE.  So the threat is task A tracing B,
 > and using it to pass on capabilities that it wasn't allowed to pass on.
 > Which is what the permitted test was for before, and what CAP_SYS_PTRACE
 > was used to override.
 >
 >
 >>A concrete example: a system runs with very restricted inheritable caps
 >>on all processes except for a magic daemon.  The magic daemon holds on
 >>to CAP_SYS_ADMIN to umount everything at shutdown.  If the rest of the
 >>system gets rooted, it still shouldn't be possible to trace the daemon.
 >>(Yes, this is currently not workable -- I plan to add a sysctl that sets
 >>what inheritable caps a task must have for setuid to work.  The blanket
 >>requirement that _all_ must be present is to avoid bugs in which a
 >>setuid program assumes it will be fully privileged.)
 >
 >
 > I suppose this eliminates the usefulness of CAP_SYS_PTRACE.
 >
 >

It lets one uid/gid trace another.  If CAP_SYS_PTRACE allowed a process
to arbitrarity steal another's capabilities, then the process with
CAP_SYS_PTRACE might as well have been given those capabilities.  That is,
this should IMHO be disallowed

drop_all_but_CAP_SYS_PTRACE()
exec(slightly trusted debugger process)
ptrace(1) <--- but it was supposed to be "slightly trusted"!



So:

Should I redo this to keep IRIX's meaning of fI, should I keep mine,
or should I do something else.  Chris -- in your examples, you seem
to have some idea of what should be happening.  What do you think?

--Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2 
  2004-05-14  5:33     ` Olaf Dietsche
@ 2004-05-14  6:04       ` Valdis.Kletnieks
  2004-05-14  7:09         ` Olaf Dietsche
  0 siblings, 1 reply; 36+ messages in thread
From: Valdis.Kletnieks @ 2004-05-14  6:04 UTC (permalink / raw)
  To: Olaf Dietsche; +Cc: Chris Wright, Andy Lutomirski, akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 535 bytes --]

On Fri, 14 May 2004 07:33:32 +0200, Olaf Dietsche said:

> Seems like you're not aware of:
> <http://www.olafdietsche.de/linux/capability/>
> 
> This supports filesystem capabilities with the current (POSIX?)
> implementation. So, whatever Andy has shown, it has at least one
> counter evidence q.e.d.

Yes.. I was aware of that.. and I just visited it.. and the VERY TOP it says:

"Filesystem capabilities for linux

This implementation is likely *not* POSIX compatible."

Now who should I believe, you or the author of the page? :)


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2) 
  2004-05-14  5:40         ` Chris Wright
@ 2004-05-14  6:25           ` Valdis.Kletnieks
  0 siblings, 0 replies; 36+ messages in thread
From: Valdis.Kletnieks @ 2004-05-14  6:25 UTC (permalink / raw)
  To: Chris Wright; +Cc: Andy Lutomirski, akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

On Thu, 13 May 2004 22:40:42 PDT, Chris Wright said:

> OK, this is precisely POSIX as I expected.  No surprise given the folks
> involved.

Hmm... Chris? Andy?  *Exactly* what version of the draft are you both looking at?

I ask because Andy Lutomirski's draft had *different* production rules:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106687456724871&w=2

And I think everybody managed to miss this gotcha (I know I did):

Albert Calahan reported that apparently a lot of people worked off the N-1 version
of the draft, and the equation that's giving us the trouble got changed:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106687419224640&w=2

I wonder if Andy and I were convinced that the Posix draft 16 that people
worked from was broken because it was, but the Posix draft 17 (that looks
like the SGI stuff) was more correct but didn't get seen by everybody?


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  2:27   ` Andy Lutomirski
  2004-05-14  4:48     ` Chris Wright
@ 2004-05-14  6:39     ` Olaf Dietsche
  1 sibling, 0 replies; 36+ messages in thread
From: Olaf Dietsche @ 2004-05-14  6:39 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Chris Wright, akpm, linux-kernel

Andy Lutomirski <luto@myrealbox.com> writes:

> I'm not convinced that Posix's version makes any sense.  Also, there are
> apparently a number of drafts around which disagree on what the right
> rules are.  (My copy, for example, matches the old rules exactly, but
> the old rules caused the sendmail problem.)

Don't confuse POSIX _semantics_ with implementation _bugs_.

>  And, under Posix, what does
> the inheritable mask mean, anyway?
>
> Also, I don't find the posix rules to be useful (why is there an
> inheritable mask if all it does is to cause caps to be dropped on
> exec, when the user could just manually drop them?).

You can use the inheritable set, if you want to give capabilities to a
process when it's started by an already priviledged parent (e.g. a
root process), but not when it's started by a regular user.

See <http://www.olafdietsche.de/linux/capability/> for an example.

Regards, Olaf.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  6:04       ` Valdis.Kletnieks
@ 2004-05-14  7:09         ` Olaf Dietsche
  0 siblings, 0 replies; 36+ messages in thread
From: Olaf Dietsche @ 2004-05-14  7:09 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Chris Wright, Andy Lutomirski, akpm, linux-kernel

Valdis.Kletnieks@vt.edu writes:

> On Fri, 14 May 2004 07:33:32 +0200, Olaf Dietsche said:
>
>> Seems like you're not aware of:
>> <http://www.olafdietsche.de/linux/capability/>
>> 
>> This supports filesystem capabilities with the current (POSIX?)
>> implementation.

This refers to the linux kernel main line.

> Yes.. I was aware of that.. and I just visited it.. and the VERY TOP it says:
>
> "Filesystem capabilities for linux
>
> This implementation is likely *not* POSIX compatible."

This refers to the tools I provide. I should emphasize this on the
page, thank you. My patch doesn't change the rules, how the capability
bits are mingled.

> Now who should I believe, you or the author of the page? :)

Alright, I deserve this for being imprecise. :)

Regards, Olaf.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2)
  2004-05-14  2:45   ` [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2) Andy Lutomirski
  2004-05-14  5:04     ` Chris Wright
@ 2004-05-14 11:10     ` Olaf Dietsche
  2004-05-14 14:15       ` Andy Lutomirski
  1 sibling, 1 reply; 36+ messages in thread
From: Olaf Dietsche @ 2004-05-14 11:10 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Chris Wright, akpm, linux-kernel

Andy Lutomirski <luto@myrealbox.com> writes:

> +	/* Pretend we have VFS capabilities */
> +	cap_set_full(bprm->cap_inheritable);
> +	if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
> +		cap_set_full(bprm->cap_permitted);
> +	else
> +		cap_clear(bprm->cap_permitted);

I'd move this to security/commoncap.c:

diff -urN a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c	Fri May 14 10:07:28 2004
+++ b/fs/exec.c	Fri May 14 12:07:18 2004
@@ -912,13 +912,6 @@
 		}
 	}
 
-	/* Pretend we have VFS capabilities */
-	cap_set_full(bprm->cap_inheritable);
-	if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
-		cap_set_full(bprm->cap_permitted);
-	else
-		cap_clear(bprm->cap_permitted);
-
 	/* fill in binprm security blob */
 	retval = security_bprm_set(bprm);
 	if (retval)
diff -urN a/security/commoncap.c b/security/commoncap.c
--- a/security/commoncap.c	Fri May 14 10:07:28 2004
+++ b/security/commoncap.c	Fri May 14 12:08:30 2004
@@ -107,8 +107,16 @@
 
 int cap_bprm_set_security (struct linux_binprm *bprm)
 {
-	if (newcaps)
+	if (newcaps) {
+		/* Pretend we have VFS capabilities */
+		cap_set_full(bprm->cap_inheritable);
+		if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
+			cap_set_full(bprm->cap_permitted);
+		else
+			cap_clear(bprm->cap_permitted);
+
 		return 0;
+	}
 
 	/* Copied from fs/exec.c:prepare_binprm. */
 

> +	/* FIXME: Is this overly harsh on setgid? */
> +	if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
> +	    new_pI != CAP_FULL_SET)
> +			bprm->secflags |= BINPRM_SEC_NOELEVATE;
> +
> +	if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
> +		is_setuid = is_setgid = 0;
> +		cap_clear(fP);
> +	}

This prevents sendmail from being setuid mail and
cap_net_bind_service=ep.

Regards, Olaf.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2)
  2004-05-14 11:10     ` Olaf Dietsche
@ 2004-05-14 14:15       ` Andy Lutomirski
  2004-05-15 15:50         ` Olaf Dietsche
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-14 14:15 UTC (permalink / raw)
  To: Olaf Dietsche; +Cc: Chris Wright, akpm, linux-kernel

Olaf Dietsche wrote:

> Andy Lutomirski <luto@myrealbox.com> writes:
> 
> 
>>+	/* Pretend we have VFS capabilities */
>>+	cap_set_full(bprm->cap_inheritable);
>>+	if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
>>+		cap_set_full(bprm->cap_permitted);
>>+	else
>>+		cap_clear(bprm->cap_permitted);
> 
> 
> I'd move this to security/commoncap.c:

[snip]

I put it in fs/exec.c because I had to add it to binprm anyway
(having commoncap use ->security would break SELinux), and, as
long as it's a permanent member of the structure, it made sense
for it to always be filled in.

>>+	/* FIXME: Is this overly harsh on setgid? */
>>+	if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
>>+	    new_pI != CAP_FULL_SET)
>>+			bprm->secflags |= BINPRM_SEC_NOELEVATE;
>>+
>>+	if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
>>+		is_setuid = is_setgid = 0;
>>+		cap_clear(fP);
>>+	}
> 
> 
> This prevents sendmail from being setuid mail and
> cap_net_bind_service=ep.

AAAAH!  That's right -- in my implementation <some cap>=ep on a file
makes no sense.  It's got to be inheritable to be permitted.

On the other hand, that rule could be safely by checking the old pI
as opposed to the new one.  The idea is to prevent a process without
full pI from execing (and thus confusing) old setuid apps.


BTW, in your capabilities implementation, why do you do:

# chcap cap_net_bind_service=ei /usr/sbin/named
# inhcaps cap_net_bind_service=i bind:bind /usr/sbin/named

It seems to me that this wants to be:

# inhcaps cap_net_bind_service=eip bind:bind /usr/sbin/named
(not having looked at your user tool)
or
# cap -c cap_net_bind_service=eip -u bind -g bind /usr/sbin/named
(using my tool)

With my patch, that just works (no fs caps necessary).  Why should the
admin have to tag a program so it is allowed to inherit caps?  (If
named used libexec, then its libexec helpers would have to be similarly
tagged, and, if it used bash, then bash would need the inheritable tag.)

If the answer's because that's how Linux cap evolution works, then
I'd say that Linux cap evolution is broken.

In any case, I'll probably redo my patch to restore the IRIX version of pI.

--Andy



> 
> Regards, Olaf.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14  5:58       ` Andy Lutomirski
@ 2004-05-14 17:45         ` Chris Wright
  0 siblings, 0 replies; 36+ messages in thread
From: Chris Wright @ 2004-05-14 17:45 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Chris Wright, akpm, linux-kernel, Valdis.Kletnieks

* Andy Lutomirski (luto@myrealbox.com) wrote:
>  > I tested as a module, and this doesn't run AFAICS
> 
> OK -- I give in.  I'll redo it as a kernel (non-module) boot parameter.
> And touch some more files that have no business being capability-aware
> while I'm at it :(

I'm not yet convinced this is necessary to get the behaviour we'd like,
which is simply execve() doesn't wipe caps categorically.

> The inheritable mask:
> > > Can you send me a sample of what breaks?
> > Yes.  It's something like this:
> >
> > keepcaps
> > dropuid
> > drop caps
> > execve()
> >
> > The caps that are left are like this.  (effective == inheritable) which
> > are a subset of permitted.
> 
> Is (eff == inh) what happens or what should happen?  If the former, then
> my patch is broken.  If the latter, either I'm confused, or see below.

eff == inh is what capset is doing, with intention.

> > > My concept of 'inheritable' is that caps that are _not_ inheritable
> > > may never be gained by this task or its children.  So a process
> > > should normally have all caps inheritable.
> >
> > This is the diff with Posix, which allows a process to inherit a
> > capability that it can never exercise.  However, it could pass the
> > capablity on to someone else who could inherit it.
> 
> So here's where I think we disagree:
> 
> Posix (as interpreted by Linux 2.4/2.6) says:
> pP' = (fP & cap_bset) | (fI & pI)
> 
> So (assuming that set_security did the "obvious" (but dangerous) thing):
> 
> pP := "task may enable and use these capabilities"
> pE := "task may use these capabilities _now_"
> pI := "task may gain these on exec from fI"
> 
> fP := "this program gets these caps (module cap_bset)"
> fI := "this program gets these caps if pI says so"
> 
> Which screams "overcomplicated."  I imagine that the use is for a
> trusted daemon to run an untrusted helper (with pI>0) which then
> calls back into trusted land and gets its caps back.  This is
> possibly convenient, but it seems to break (where break = scare me)
> when there are more than one such system on a given box.  Then one's
> untrusted program (with fI>0) can call the other's trusted fI>0
> helper.  I suppose the point is that a random user's program (with fI==0)
> can't try to exploit anything, but, for this to be at all secure, both
> fI>0 programs need to be secure against attack from the other (unrelated)
> system, should it be compromised.  Which means it might as well have set
> fP>0 and been done with it (I don't believe in security by inconvenience
> of exploit).

I agree it's overcomplicated.  Much of the complication is added in the
fs bits where you now have 6 fields to care about per exec.  And you've
got to make sure the fs bits acutally make sense in all cases and
provide useful security.  This is precisely why they haven't been
implemented yet.  They don't mix nicely with the world people are used
to with setuid, etc, and they add many extra knobs which means it's
difficult to admin.

> I see no particular invariants here, except for pE <= pP.
> 
> IRIX (thanks Valdis) says:
> 
> pI' = pI & fI
> pP' = fP | (pI' & fP)
> 
> Which I interpret as
> 
> pP := "task may enable and use these capabilities"
> pE := "task may use these capabilities _now_"
> pI := ~"task _loses_ these on exec"
> 
> fP := "this program gets these caps"
> fI := "this program may keep these caps"
> 
> This seems to want pP <= pI as an invariant.
> 
> This is what I always thought Linux capabilities meant to be.  They
> don't make my brain hurt.
> 
> But I also think they're overengineered.  Instead of:
> 
> drop_caps_from_inheritable()
> exec()
> 
> a program could do:
> 
> drop_caps_from_permitted()
> exec()

Yes.  Although I suppose the first case allows for simple damage control
when needing some privs locally, and calling into libraries which may
generate a fork/exec.

> And I can't imagine what use fI != ~0 has, since it's trivially
> accomplished by a wrapper.  It is also trivially bypassed by
> loading the program manually (with ld.so).

Not so sure (now that we're dreaming of fs caps), ld.so could have fI == 0.
IOW, similar to ld.so being worthless to gain setuid bit.

> So, in my patch, I decided steal the inheritable mask to mean this:
> 
> pI := "this process may gain these caps"
> fI := "this is an upper bound on pI"
> 
> In other words, if a process is extra-untrusted (e.g. it's a daemon
> that never needs a certain capability and has no business trying
> to gain it), it can drop it from pI.  Then it cannot try to abuse
> programs with pP>0.  The setuid override is just added paranoia.
> Another benefit is that it allows a securelevel-like scheme, where
> even root isn't quite trusted.
> 
> I suppose it might be inappropriate to steal this field like this,
> given that IRIX already has a (somewhat reasonable) use for it. I
> have no problem implementing IRIX-style capabilities and (if there
> is enough interest) adding a _fourth_ process field pM (process
> maximum capabilities) that does what my pI does.
> 
> As for the fE mask, I just don't see the point, although I _really_
> don't like the way it's described in the IRIX manpage.
> 
> IRIX has pE = pP & fE.  This breaks Posix non-capabilities
> compatibility for a program that's uid==0, euid!=0.  It should
> have pE==0 and pP>0.  But it calls exec() and gets pE>0.  This
> is bad.
> 
> Assuming there's something else there to fix that case,
> then I still don't see the point.  If a program is capability-
> aware, it can set its pE however it likes.  If not, then it probably
> expects pE==pP.  I guess there could be a trusted but dumb program
> that runs a trusted, cap-aware helper that needs capabilities.
> Then the admin sets fE==0 on the dump program and everything works.
> Seems a bit contrived, though.
> 
> On the other hand, I'm not wedded to my model of pE.  It'll be harder
> to get IRIX's right for uid!=euid.
> 
> CAP_SYS_PTRACE:
> 
> >>>>@@ -36,6 +41,11 @@
> >>>>int cap_ptrace (struct task_struct *parent, struct task_struct *child)
> >>>>{
> >>>>	/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
> >>>>+	/* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
> >>>>+	if (newcaps &&
> >>>>+	    !cap_issubset (child->cap_inheritable, current->cap_inheritable))
> >>>>+		return -EPERM;
> >>>
> >>>
> >>>Why no capable() override?  In fact, is this check really necessary?
> >>
> >>If task A has less inheritable caps than B, then A is somehow less trusted
> >>and has no business tracing B.
> >
> >
> > But it's not less.  It's just not a subset.  Task B could have only one
> > inheritable cap, while A could have all but the one cap that B has.  In
> > fact, that could be CAP_SYS_PTRACE.  So the threat is task A tracing B,
> > and using it to pass on capabilities that it wasn't allowed to pass on.
> > Which is what the permitted test was for before, and what CAP_SYS_PTRACE
> > was used to override.
> >
> >
> >>A concrete example: a system runs with very restricted inheritable caps
> >>on all processes except for a magic daemon.  The magic daemon holds on
> >>to CAP_SYS_ADMIN to umount everything at shutdown.  If the rest of the
> >>system gets rooted, it still shouldn't be possible to trace the daemon.
> >>(Yes, this is currently not workable -- I plan to add a sysctl that sets
> >>what inheritable caps a task must have for setuid to work.  The blanket
> >>requirement that _all_ must be present is to avoid bugs in which a
> >>setuid program assumes it will be fully privileged.)
> >
> > I suppose this eliminates the usefulness of CAP_SYS_PTRACE.
> 
> It lets one uid/gid trace another.  If CAP_SYS_PTRACE allowed a process
> to arbitrarity steal another's capabilities, then the process with
> CAP_SYS_PTRACE might as well have been given those capabilities.  That is,
> this should IMHO be disallowed

Yeah, this is just another example of capabilities being easily used
to leverage greater caps.  Others are:  CAP_SETUID, CAP_DAC_OVERRIDE,
CAP_SYS_MODULE, CAP_SYS_RAWIO...

> drop_all_but_CAP_SYS_PTRACE()
> exec(slightly trusted debugger process)
> ptrace(1) <--- but it was supposed to be "slightly trusted"!
> 
> So:
> 
> Should I redo this to keep IRIX's meaning of fI, should I keep mine,
> or should I do something else.  Chris -- in your examples, you seem
> to have some idea of what should be happening.  What do you think?

I'd like to see smallest possible change to allow exeve() to be more
fine grained rather than 0 or ~0 caps depending on setuid-root.  I think
we can work with existing meaning of Inh, Eff, Perm and find a way to
smarten up the transition rules slightly.  My patch is trivial, although
there's probably some cases where it's still unsafe.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2)
  2004-05-14 14:15       ` Andy Lutomirski
@ 2004-05-15 15:50         ` Olaf Dietsche
  0 siblings, 0 replies; 36+ messages in thread
From: Olaf Dietsche @ 2004-05-15 15:50 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Chris Wright, akpm, linux-kernel

Andy Lutomirski <luto@myrealbox.com> writes:

> BTW, in your capabilities implementation, why do you do:

There's no such thing like "my" capabilities implementation. I use the
capabilities implementation of linux main line as it is. My patch
provides the possibility to connect capabilities to files, nothing
else.

I never tried to fix or modify, how capabilities work.

> If the answer's because that's how Linux cap evolution works, then
> I'd say that Linux cap evolution is broken.

Well, it works for me. :-)

Regards, Olaf.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 17:32       ` Albert Cahalan
@ 2004-05-14 21:11         ` Chris Wright
  2004-05-14 19:32           ` Albert Cahalan
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Wright @ 2004-05-14 21:11 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Stephen Smalley, linux-kernel mailing list, luto, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks

* Albert Cahalan (albert@users.sourceforge.net) wrote:
> On Fri, 2004-05-14 at 14:06, Stephen Smalley wrote:
> > You missed the point.  Capability scheme maps far too
> > many operations to a single capability; CAP_SYS_ADMIN
> > in Linux is a good example.
> 
> What I said: lovely, but not exactly groundbreaking.
> 
> Suppose we break up CAP_SYS_ADMIN into 41 other bits.
> There you go. It's silly to argue that a system with
> more bits is some kind of major advance over one with
> just a few bits. There is a quality-of-implementation
> issue here, not some fundamental difference.

Needing more bits isn't the only problem.

> > TE model
> > defers organization of operations into equivalence
> > classes to the policy writer.
> 
> I don't see anything special here either. With a
> plain capability-bit system, you could allow for
> user-defined aliases that map to multiple bits.
> In some random /etc config file:
> 
> define ADMIN := FOO | BAR | BAZ

This doesn't effect the inflexibility of a single definition for domain
transistion that's inherent in the capabilty system.

> Lack of granularity is an implementation detail.
> (Blame the SGI folks that wouldn't listen to me.)
> Lack of granularity is not a design flaw.

It's a design flaw.  More bits won't help.  There's an important missing
piece...credentials for both subject and object.  Both of which can be
dynamic, and differ per subject's view of an object.

> What I'm looking for:
> 
> 1. configure the kernel by ...
> 2. ensure that /bin/foo runs early in boot
> 3. put "blah blah blah" in /etc/foo.conf
> 
> That is, is there a small set of simple config files
> and binaries that I could just slap onto an existing
> system to ensure that a particular app is granted an
> extra capability bit?
> 
> If yes, then the ugly old-Oracle hack is not needed.

Nearly.  There's the minor issue that execve() clears that bit more
agressively than desired for non-root processes.  Otherwise pam can do
it with pam_cap.  Which is all we're trying to fix here.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 21:11         ` Chris Wright
@ 2004-05-14 19:32           ` Albert Cahalan
  0 siblings, 0 replies; 36+ messages in thread
From: Albert Cahalan @ 2004-05-14 19:32 UTC (permalink / raw)
  To: Chris Wright
  Cc: Albert Cahalan, Stephen Smalley, linux-kernel mailing list, luto,
	olaf+list.linux-kernel, Valdis.Kletnieks

On Fri, 2004-05-14 at 17:11, Chris Wright wrote:
> * Albert Cahalan (albert@users.sourceforge.net) wrote:
> > On Fri, 2004-05-14 at 14:06, Stephen Smalley wrote:
> > > You missed the point.  Capability scheme maps far too
> > > many operations to a single capability; CAP_SYS_ADMIN
> > > in Linux is a good example.
> > 
> > What I said: lovely, but not exactly groundbreaking.
> > 
> > Suppose we break up CAP_SYS_ADMIN into 41 other bits.
> > There you go. It's silly to argue that a system with
> > more bits is some kind of major advance over one with
> > just a few bits. There is a quality-of-implementation
> > issue here, not some fundamental difference.
> 
> Needing more bits isn't the only problem.

That's what much of the document went on about. The
rest of the document was mostly generic MAC concepts.

> > > TE model
> > > defers organization of operations into equivalence
> > > classes to the policy writer.
> > 
> > I don't see anything special here either. With a
> > plain capability-bit system, you could allow for
> > user-defined aliases that map to multiple bits.
> > In some random /etc config file:
> > 
> > define ADMIN := FOO | BAR | BAZ
> 
> This doesn't effect the inflexibility of a single definition for domain
> transistion that's inherent in the capabilty system.

Sure. I already noted this.

> > Lack of granularity is an implementation detail.
> > (Blame the SGI folks that wouldn't listen to me.)
> > Lack of granularity is not a design flaw.
> 
> It's a design flaw.  More bits won't help.  There's an important missing
> piece...credentials for both subject and object.  Both of which can be
> dynamic, and differ per subject's view of an object.

There is no meaningful object.

subject: process 12345
object: ??????
operation: lock memory

For a few capability bits, there is a meaningful object
and you could use SELinux in place of the capability bits.
For most of the capability bits, this is not the case.

> > What I'm looking for:
> > 
> > 1. configure the kernel by ...
> > 2. ensure that /bin/foo runs early in boot
> > 3. put "blah blah blah" in /etc/foo.conf
> > 
> > That is, is there a small set of simple config files
> > and binaries that I could just slap onto an existing
> > system to ensure that a particular app is granted an
> > extra capability bit?
> > 
> > If yes, then the ugly old-Oracle hack is not needed.
> 
> Nearly.  There's the minor issue that execve() clears that bit more
> agressively than desired for non-root processes.  Otherwise pam can do
> it with pam_cap.  Which is all we're trying to fix here.

Stephen Smalley suggested that SELinux could take the
place of our capability bits. So I'm asking how you do
that, using the most minimal SELinux config.

If he really has a way, then there is no need to change
the execve() behavior in the Linux 2.6.x kernels.

Perhaps we just need an LSM hook to re-add the capability
bit after execve() drops it. That's a tiny change that
doesn't affect any existing system.




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 16:18       ` Andy Lutomirski
  2004-05-14 16:37         ` Stephen Smalley
@ 2004-05-14 18:07         ` Chris Wright
  1 sibling, 0 replies; 36+ messages in thread
From: Chris Wright @ 2004-05-14 18:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stephen Smalley, Andy Lutomirski, Albert Cahalan,
	linux-kernel mailing list, Chris Wright, olaf+list.linux-kernel,
	Valdis.Kletnieks

* Andy Lutomirski (luto@stanford.edu) wrote:
> Stephen Smalley wrote:
> > On Fri, 2004-05-14 at 11:57, Andy Lutomirski wrote:
> > > Thanks -- turning brain back on, SELinux is obviously better than any
> > > fine-grained capability scheme I can imagine.
> > > 
> > > So unless anyone convinces me you're wrong, I'll stick with just
> > > fixing up capabilities to work without making them finer-grained.
> > 
> > Great, thanks.  Fixing capabilities to work is definitely useful and
> > desirable.  Significantly expanding them in any manner is a poor use of
> > limited resources, IMHO; I'd much rather see people work on applying
> > SELinux to the problem and solving it more effectively for the future.
> 
> Does this mean I should trash my 'maximum' mask?
> 
> (I like 'cap -c = sftp-server' so it can't try to run setuid/fP apps.)
> OTOH, since SELinux accomplishes this better, it may not be worth the
> effort.

Let's just get back to the simplest task.  Allow execve() to do smth.
reasonable with capabilities.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 15:19   ` Albert Cahalan
@ 2004-05-14 18:06     ` Stephen Smalley
  2004-05-14 17:32       ` Albert Cahalan
  0 siblings, 1 reply; 36+ messages in thread
From: Stephen Smalley @ 2004-05-14 18:06 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: linux-kernel mailing list, luto, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks

On Fri, 2004-05-14 at 11:19, Albert Cahalan wrote:
> I just read that. It's a very unfair marketing document.
> Among other things, it suggests that a capability system
> is stuck with about 40 bits while their own version of
> capabilities (a duck is a duck...) has 80 bits. Lovely,
> but not exactly groundbreaking.

You missed the point.  Capability scheme maps far too many operations to
a single capability; CAP_SYS_ADMIN in Linux is a good example.  TE model
defers organization of operations into equivalence classes to the policy
writer.

>  There is the bit about
> a 3-argument security call, but a careful reading will
> reveal that one argument is unused (NULL?) when dealing
> with abilities like "can set the clock".

But very useful when dealing with CAP_DAC_OVERRIDE and friends.

> About the only thing of interest is that capability
> transitions can be arbitrary. You're not limited to
> an obscure set of equations that nobody can agree on.

Because there is no one-size-fits-all equation for all transitions.

> The cost: complicated site-specific config files and
> the inability to support capability-aware apps that
> set+clear their own bits.

Complexity pushed to userspace, where it can be analyzed appropriately
via tools and tailored for the transition in question.  Central
management of the capabilities based on equivalence classes (types), as
opposed to having to manage a distributed nightmare of capability bits
on your filesystems.  Applications that can transition to other domains,
but only if so authorized by the policy.

> Eh, why? That's mostly a search-and-replace on the name,
> since capable() makes a perfectly fine LSM hook.

It doesn't offer sufficient granularity, either for operation (which you
_could_ address by introducing new capabilities, but the hooks are more
easily extensible) or for object.

> So what about the old-Oracle problem? You have a
> server that needs the ability to hog and lock memory.
> Is there an almost-empty SELinux policy that would
> provide this while leaving the rest of the system
> acting as UNIX-like systems have always acted?
> If so, we have a winner.

See "relaxed policy" or "targeted policy" in recent discussions on the
Fedora mailing lists; coming soon to a rawhide near you.  Not exactly
the same thing (it is a policy where most processes run essentially
unconfined except for a targeted set that you define like apache, bind,
etc), but you could certainly tweak it to this end (i.e. you put oracle
into a domain that gives it the one capability you desire).

> One still does need to provide apps with a way to
> answer "can I do FOO, BAR, and BAZ?" and "am I
> running with elevated privileges?". Some way to
> dispose of unneeded privileges would be good too.
> Hopefully extra libraries wouldn't be needed.

SELinux provides a policy API already, and a userspace library for
accessing it (and caching decisions from it).  It also provides (via the
AT_SECURE auxv entry) notification that a domain transition has
occurred, and this is already used by glibc secure mode.  Privileges are
dropped via domain transitions.

-- 
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 15:21 ` Stephen Smalley
  2004-05-14 15:19   ` Albert Cahalan
@ 2004-05-14 18:00   ` Chris Wright
  1 sibling, 0 replies; 36+ messages in thread
From: Chris Wright @ 2004-05-14 18:00 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Albert Cahalan, linux-kernel mailing list, luto, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks

* Stephen Smalley (sds@epoch.ncsc.mil) wrote:
> On Fri, 2004-05-14 at 08:03, Albert Cahalan wrote:
> > This would be an excellent time to reconsider how capabilities
> > are assigned to bits. You're breaking things anyway; you might
> > as well do all the breaking at once. I want local-use bits so
> > that the print queue management access isn't by magic UID/GID.
> > We haven't escaped UID-as-priv if server apps and setuid apps
> > are still making UID-based access control decisions.
> 
> Capabilities are a broken model for privilege management; try RBAC/TE
> instead.  http://www.securecomputing.com/pdf/secureos.pdf has notes
> about the history and comparison of capabilities vs. TE.
> 
> Instead of adding new capability bits, replace capable() calls with LSM
> hook calls that offer you finer granularity both in operation and in
> object-based decisions, and then use a security module like SELinux to
> map that to actual permission checks.  SELinux provides a framework for
> defining security classes and permissions, including both definitions
> used by the kernel and definitions used by userspace policy enforcers
> (ala security-enhanced X).

exactly!

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 14:53 ` Andy Lutomirski
@ 2004-05-14 17:58   ` Chris Wright
  0 siblings, 0 replies; 36+ messages in thread
From: Chris Wright @ 2004-05-14 17:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Albert Cahalan, linux-kernel mailing list, chrisw,
	olaf+list.linux-kernel, Valdis.Kletnieks

* Andy Lutomirski (luto@myrealbox.com) wrote:
> > This would be an excellent time to reconsider how capabilities
> > are assigned to bits. You're breaking things anyway; you might
> > as well do all the breaking at once. I want local-use bits so
> > that the print queue management access isn't by magic UID/GID.
> > We haven't escaped UID-as-priv if server apps and setuid apps
> > are still making UID-based access control decisions.
> 
> How many bits?  Or should it even be a bitmask?
> 
> I'm thinking either 64 or 128 for kernel-defined caps and either
> a seperate 128 bits or more or just a list for local-defined.

Starts to look like the list of LSM callbacks.  Making it bigger doesn't
help the simple issue, keep one lousy bit across execve().  All this
redesign seems wrong to do in 2.6.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 12:03 [PATCH] capabilites, take 2 Albert Cahalan
  2004-05-14 14:53 ` Andy Lutomirski
  2004-05-14 15:21 ` Stephen Smalley
@ 2004-05-14 17:48 ` Chris Wright
  2 siblings, 0 replies; 36+ messages in thread
From: Chris Wright @ 2004-05-14 17:48 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: linux-kernel mailing list, luto, chrisw, olaf+list.linux-kernel,
	Valdis.Kletnieks

* Albert Cahalan (albert@users.sourceforge.net) wrote:
> This would be an excellent time to reconsider how capabilities
> are assigned to bits. You're breaking things anyway; you might
> as well do all the breaking at once. I want local-use bits so
> that the print queue management access isn't by magic UID/GID.
> We haven't escaped UID-as-priv if server apps and setuid apps
> are still making UID-based access control decisions.

This is too volatile in stable series.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 18:06     ` Stephen Smalley
@ 2004-05-14 17:32       ` Albert Cahalan
  2004-05-14 21:11         ` Chris Wright
  0 siblings, 1 reply; 36+ messages in thread
From: Albert Cahalan @ 2004-05-14 17:32 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Albert Cahalan, linux-kernel mailing list, luto, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks

On Fri, 2004-05-14 at 14:06, Stephen Smalley wrote:
> On Fri, 2004-05-14 at 11:19, Albert Cahalan wrote:
> > I just read that. It's a very unfair marketing document.
> > Among other things, it suggests that a capability system
> > is stuck with about 40 bits while their own version of
> > capabilities (a duck is a duck...) has 80 bits. Lovely,
> > but not exactly groundbreaking.
> 
> You missed the point.  Capability scheme maps far too
> many operations to a single capability; CAP_SYS_ADMIN
> in Linux is a good example.

What I said: lovely, but not exactly groundbreaking.

Suppose we break up CAP_SYS_ADMIN into 41 other bits.
There you go. It's silly to argue that a system with
more bits is some kind of major advance over one with
just a few bits. There is a quality-of-implementation
issue here, not some fundamental difference.

> TE model
> defers organization of operations into equivalence
> classes to the policy writer.

I don't see anything special here either. With a
plain capability-bit system, you could allow for
user-defined aliases that map to multiple bits.
In some random /etc config file:

define ADMIN := FOO | BAR | BAZ

> >  There is the bit about
> > a 3-argument security call, but a careful reading will
> > reveal that one argument is unused (NULL?) when dealing
> > with abilities like "can set the clock".
> 
> But very useful when dealing with CAP_DAC_OVERRIDE and friends.

I suppose so, but that isn't the interesting case.
We already have SELinux for that kind of thing.

> > About the only thing of interest is that capability
> > transitions can be arbitrary. You're not limited to
> > an obscure set of equations that nobody can agree on.
> 
> Because there is no one-size-fits-all equation for all transitions.
> 
> > The cost: complicated site-specific config files and
> > the inability to support capability-aware apps that
> > set+clear their own bits.
> 
> Complexity pushed to userspace, where it can be analyzed appropriately
> via tools and tailored for the transition in question.  Central
> management of the capabilities based on equivalence classes (types), as
> opposed to having to manage a distributed nightmare of capability bits
> on your filesystems.  Applications that can transition to other domains,
> but only if so authorized by the policy.

Complexity is pushed to admins, most of which are less
clueful than one might hope.

> > Eh, why? That's mostly a search-and-replace on the name,
> > since capable() makes a perfectly fine LSM hook.
> 
> It doesn't offer sufficient granularity, either for operation (which you
> _could_ address by introducing new capabilities, but the hooks are more
> easily extensible) or for object.

SELinux does, I hope, already deal with anything that
would involve an "object". So it is of little concern
what CAP_DAC_OVERRIDE does. What about setting the clock,
hogging memory, using the FIFO scheduler, and so on?

Lack of granularity is an implementation detail.
(Blame the SGI folks that wouldn't listen to me.)
Lack of granularity is not a design flaw.

> > So what about the old-Oracle problem? You have a
> > server that needs the ability to hog and lock memory.
> > Is there an almost-empty SELinux policy that would
> > provide this while leaving the rest of the system
> > acting as UNIX-like systems have always acted?
> > If so, we have a winner.
> 
> See "relaxed policy" or "targeted policy" in recent discussions on the
> Fedora mailing lists; coming soon to a rawhide near you.  Not exactly
> the same thing (it is a policy where most processes run essentially
> unconfined except for a targeted set that you define like apache, bind,
> etc), but you could certainly tweak it to this end (i.e. you put oracle
> into a domain that gives it the one capability you desire).

What I'm looking for:

1. configure the kernel by ...
2. ensure that /bin/foo runs early in boot
3. put "blah blah blah" in /etc/foo.conf

That is, is there a small set of simple config files
and binaries that I could just slap onto an existing
system to ensure that a particular app is granted an
extra capability bit?

If yes, then the ugly old-Oracle hack is not needed.

> > One still does need to provide apps with a way to
> > answer "can I do FOO, BAR, and BAZ?" and "am I
> > running with elevated privileges?". Some way to
> > dispose of unneeded privileges would be good too.
> > Hopefully extra libraries wouldn't be needed.
> 
> SELinux provides a policy API already, and a userspace library for
> accessing it (and caching decisions from it).  It also provides (via the
> AT_SECURE auxv entry) notification that a domain transition has
> occurred, and this is already used by glibc secure mode.  Privileges are
> dropped via domain transitions.

I take that as a "no" then, at least from the viewpoint
of a normal app author.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 16:18       ` Andy Lutomirski
@ 2004-05-14 16:37         ` Stephen Smalley
  2004-05-14 18:07         ` Chris Wright
  1 sibling, 0 replies; 36+ messages in thread
From: Stephen Smalley @ 2004-05-14 16:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Albert Cahalan, linux-kernel mailing list,
	Chris Wright, olaf+list.linux-kernel, Valdis.Kletnieks

On Fri, 2004-05-14 at 12:18, Andy Lutomirski wrote:
> Does this mean I should trash my 'maximum' mask?
> 
> (I like 'cap -c = sftp-server' so it can't try to run setuid/fP apps.)
> OTOH, since SELinux accomplishes this better, it may not be worth the
> effort.

Not my call to make, but I'm not sure it is worthwhile.  Even filesystem
capabilities are questionable IMHO, as you can already authorize
capabilities based on program and call chain (typically just the
immediate caller's domain, but potentially encoding the entire call
chain in the domain transition rules to track any untrusted components
in the call chain) today using SELinux TE.  Today, the program still has
to be setuid as well, as we currently check both ordinary Linux
capability and SELinux permission for each operation to be conservative,
but eventually we would hope to drop the use of the capability module as
a secondary module and let SELinux manage the privileges entirely (once
the TE policy configuration is fully locked down and userland is fully
aware of SELinux).

-- 
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 16:01     ` Stephen Smalley
@ 2004-05-14 16:18       ` Andy Lutomirski
  2004-05-14 16:37         ` Stephen Smalley
  2004-05-14 18:07         ` Chris Wright
  0 siblings, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-14 16:18 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Andy Lutomirski, Albert Cahalan, linux-kernel mailing list,
	Chris Wright, olaf+list.linux-kernel, Valdis.Kletnieks

Stephen Smalley wrote:

> On Fri, 2004-05-14 at 11:57, Andy Lutomirski wrote:
> 
>>Thanks -- turning brain back on, SELinux is obviously better than any
>>fine-grained capability scheme I can imagine.
>>
>>So unless anyone convinces me you're wrong, I'll stick with just
>>fixing up capabilities to work without making them finer-grained.
> 
> 
> Great, thanks.  Fixing capabilities to work is definitely useful and
> desirable.  Significantly expanding them in any manner is a poor use of
> limited resources, IMHO; I'd much rather see people work on applying
> SELinux to the problem and solving it more effectively for the future.
> 

Does this mean I should trash my 'maximum' mask?

(I like 'cap -c = sftp-server' so it can't try to run setuid/fP apps.)
OTOH, since SELinux accomplishes this better, it may not be worth the
effort.

--Andy


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 15:57   ` Andy Lutomirski
@ 2004-05-14 16:01     ` Stephen Smalley
  2004-05-14 16:18       ` Andy Lutomirski
  0 siblings, 1 reply; 36+ messages in thread
From: Stephen Smalley @ 2004-05-14 16:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Albert Cahalan, linux-kernel mailing list, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks

On Fri, 2004-05-14 at 11:57, Andy Lutomirski wrote:
> Thanks -- turning brain back on, SELinux is obviously better than any
> fine-grained capability scheme I can imagine.
> 
> So unless anyone convinces me you're wrong, I'll stick with just
> fixing up capabilities to work without making them finer-grained.

Great, thanks.  Fixing capabilities to work is definitely useful and
desirable.  Significantly expanding them in any manner is a poor use of
limited resources, IMHO; I'd much rather see people work on applying
SELinux to the problem and solving it more effectively for the future.

-- 
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
       [not found] ` <fa.mu5rj3d.24gtbp@ifi.uio.no>
@ 2004-05-14 15:57   ` Andy Lutomirski
  2004-05-14 16:01     ` Stephen Smalley
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-14 15:57 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Albert Cahalan, linux-kernel mailing list, luto, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks



Stephen Smalley wrote:

> On Fri, 2004-05-14 at 08:03, Albert Cahalan wrote:
> 
>>This would be an excellent time to reconsider how capabilities
>>are assigned to bits. You're breaking things anyway; you might
>>as well do all the breaking at once. I want local-use bits so
>>that the print queue management access isn't by magic UID/GID.
>>We haven't escaped UID-as-priv if server apps and setuid apps
>>are still making UID-based access control decisions.
> 
> 
> Capabilities are a broken model for privilege management; try RBAC/TE
> instead.  http://www.securecomputing.com/pdf/secureos.pdf has notes
> about the history and comparison of capabilities vs. TE.
> 
> Instead of adding new capability bits, replace capable() calls with LSM
> hook calls that offer you finer granularity both in operation and in
> object-based decisions, and then use a security module like SELinux to
> map that to actual permission checks.  SELinux provides a framework for
> defining security classes and permissions, including both definitions
> used by the kernel and definitions used by userspace policy enforcers
> (ala security-enhanced X).
> 

Thanks -- turning brain back on, SELinux is obviously better than any
fine-grained capability scheme I can imagine.

So unless anyone convinces me you're wrong, I'll stick with just
fixing up capabilities to work without making them finer-grained.

--Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 12:03 [PATCH] capabilites, take 2 Albert Cahalan
  2004-05-14 14:53 ` Andy Lutomirski
@ 2004-05-14 15:21 ` Stephen Smalley
  2004-05-14 15:19   ` Albert Cahalan
  2004-05-14 18:00   ` Chris Wright
  2004-05-14 17:48 ` Chris Wright
  2 siblings, 2 replies; 36+ messages in thread
From: Stephen Smalley @ 2004-05-14 15:21 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: linux-kernel mailing list, luto, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks

On Fri, 2004-05-14 at 08:03, Albert Cahalan wrote:
> This would be an excellent time to reconsider how capabilities
> are assigned to bits. You're breaking things anyway; you might
> as well do all the breaking at once. I want local-use bits so
> that the print queue management access isn't by magic UID/GID.
> We haven't escaped UID-as-priv if server apps and setuid apps
> are still making UID-based access control decisions.

Capabilities are a broken model for privilege management; try RBAC/TE
instead.  http://www.securecomputing.com/pdf/secureos.pdf has notes
about the history and comparison of capabilities vs. TE.

Instead of adding new capability bits, replace capable() calls with LSM
hook calls that offer you finer granularity both in operation and in
object-based decisions, and then use a security module like SELinux to
map that to actual permission checks.  SELinux provides a framework for
defining security classes and permissions, including both definitions
used by the kernel and definitions used by userspace policy enforcers
(ala security-enhanced X).

-- 
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 15:21 ` Stephen Smalley
@ 2004-05-14 15:19   ` Albert Cahalan
  2004-05-14 18:06     ` Stephen Smalley
  2004-05-14 18:00   ` Chris Wright
  1 sibling, 1 reply; 36+ messages in thread
From: Albert Cahalan @ 2004-05-14 15:19 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Albert Cahalan, linux-kernel mailing list, luto, Chris Wright,
	olaf+list.linux-kernel, Valdis.Kletnieks

On Fri, 2004-05-14 at 11:21, Stephen Smalley wrote:
> On Fri, 2004-05-14 at 08:03, Albert Cahalan wrote:
> > This would be an excellent time to reconsider how capabilities
> > are assigned to bits. You're breaking things anyway; you might
> > as well do all the breaking at once. I want local-use bits so
> > that the print queue management access isn't by magic UID/GID.
> > We haven't escaped UID-as-priv if server apps and setuid apps
> > are still making UID-based access control decisions.
> 
> Capabilities are a broken model for privilege management; try RBAC/TE
> instead.  http://www.securecomputing.com/pdf/secureos.pdf has notes
> about the history and comparison of capabilities vs. TE.

I just read that. It's a very unfair marketing document.
Among other things, it suggests that a capability system
is stuck with about 40 bits while their own version of
capabilities (a duck is a duck...) has 80 bits. Lovely,
but not exactly groundbreaking. There is the bit about
a 3-argument security call, but a careful reading will
reveal that one argument is unused (NULL?) when dealing
with abilities like "can set the clock".

About the only thing of interest is that capability
transitions can be arbitrary. You're not limited to
an obscure set of equations that nobody can agree on.
The cost: complicated site-specific config files and
the inability to support capability-aware apps that
set+clear their own bits.

There is some value in unifying the handling of
capability bits with the handling of actor-to-actee
security controls. This simplifies the documentation,
and thus reduces the likelyhood of admin mistakes.

> Instead of adding new capability bits, replace capable()
> calls with LSM hook calls that offer you finer granularity
> both in operation and in object-based decisions, and then
> use a security module like SELinux to map that to actual
> permission checks.

Eh, why? That's mostly a search-and-replace on the name,
since capable() makes a perfectly fine LSM hook.

> SELinux provides a framework for
> defining security classes and permissions, including both definitions
> used by the kernel and definitions used by userspace policy enforcers
> (ala security-enhanced X).

Nice.

So what about the old-Oracle problem? You have a
server that needs the ability to hog and lock memory.
Is there an almost-empty SELinux policy that would
provide this while leaving the rest of the system
acting as UNIX-like systems have always acted?
If so, we have a winner.

One still does need to provide apps with a way to
answer "can I do FOO, BAR, and BAZ?" and "am I
running with elevated privileges?". Some way to
dispose of unneeded privileges would be good too.
Hopefully extra libraries wouldn't be needed.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
  2004-05-14 12:03 [PATCH] capabilites, take 2 Albert Cahalan
@ 2004-05-14 14:53 ` Andy Lutomirski
  2004-05-14 17:58   ` Chris Wright
  2004-05-14 15:21 ` Stephen Smalley
  2004-05-14 17:48 ` Chris Wright
  2 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2004-05-14 14:53 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: linux-kernel mailing list, chrisw, olaf+list.linux-kernel,
	Valdis.Kletnieks

Albert Cahalan wrote:

> Andy Lutomirski writes:
> 
> 
>>Posix (as interpreted by Linux 2.4/2.6) says:
>>pP' = (fP & cap_bset) | (fI & pI)
>>
>>So (assuming that set_security did the "obvious" (but dangerous) thing):
>>
>>pP := "task may enable and use these capabilities"
>>pE := "task may use these capabilities _now_"
>>pI := "task may gain these on exec from fI"
>>
>>fP := "this program gets these caps (module cap_bset)"
>>fI := "this program gets these caps if pI says so"
>>
>>Which screams "overcomplicated."
> 
> 
> People will screw this up.
> 
> That you should even have to explain the names as you do is an
> indication that the names are poor.
> 
> Having names that match IRIX names but act differently is
> a sure path to disaster via confused admins and developers.
> 
> 
>>I see no particular invariants here, except for pE <= pP.
>>
>>IRIX (thanks Valdis) says:
>>
>>pI' = pI & fI
>>pP' = fP | (pI' & fP)
>>
>>Which I interpret as
>>
>>pP := "task may enable and use these capabilities"
>>pE := "task may use these capabilities _now_"
>>pI := ~"task _loses_ these on exec"
>>
>>fP := "this program gets these caps"
>>fI := "this program may keep these caps"
>>
>>This seems to want pP <= pI as an invariant.
>>
>>This is what I always thought Linux capabilities meant to be.  They
>>don't make my brain hurt.
>>
>>But I also think they're overengineered.  Instead of:
>>
>>drop_caps_from_inheritable()
>>exec()
>>
>>a program could do:
>>
>>drop_caps_from_permitted()
>>exec()
>>
>>And I can't imagine what use fI != ~0 has, since it's trivially
>>accomplished by a wrapper.  It is also trivially bypassed by
>>loading the program manually (with ld.so).
> 
> 
> Good point. Even if an exec-only executable would block this,
> nearly all admins will fail to mark it as such.
> 
> 
>>So, in my patch, I decided steal the inheritable mask to mean this:
>>
>>pI := "this process may gain these caps"
>>fI := "this is an upper bound on pI"
>>
>>In other words, if a process is extra-untrusted (e.g. it's a daemon
>>that never needs a certain capability and has no business trying
>>to gain it), it can drop it from pI.  Then it cannot try to abuse
>>programs with pP>0.  The setuid override is just added paranoia.
>>Another benefit is that it allows a securelevel-like scheme, where
>>even root isn't quite trusted.
>>
>>I suppose it might be inappropriate to steal this field like this,
>>given that IRIX already has a (somewhat reasonable) use for it. I
>>have no problem implementing IRIX-style capabilities and (if there
>>is enough interest) adding a _fourth_ process field pM (process
>>maximum capabilities) that does what my pI does.
> 
> 
> A few mostly-unrelated thoughts:
> 
> Rather than adding a compile-time option or boot option, simply
> change the syscall numbers and /proc/*/status names. This will
> cause any existing capability-aware tools to act as if being
> run on a pre-capability Linux kernel. This seems to be safer
> than allowing these tools to assume that nothing has changed.

That sounds nasty to get right.  What about just simulating this
for the benefit of old tools:

pP = (what it really is)
pE = (what it really is -- but it will mostly be pP)
pI = 0 (or full -- anything else is confusing)

Then capset for on other pids (which currently needs CAP_SETPCAP)
goes away completely, since, on stock Linux, that code is
unreachable anyway.

I think this emulates the current linux caps quite nicely.  Unless
we add lots more caps, in which case we'd have to make a guess at
what pP is.

Now this code becomes complicated (code-wise, not just conceptually :)

> 
> Allow me to mark an executable with a set of capabilities that
> must be all set or all unset. Default to ~0ull for setuid apps,
> and to 0ull for all other apps.
> Like this:
>    if ((pFOO & fBAR) != fBAR)    pFOO &= ~fBAR;

Once again bypassable by ld.so.  What about
if ((pM' & fBAR) != fBAR) ignore fP and setuid
(BINPRM_SEC_NO_ELEVATE)?

I'll call it fR (for required).  The default would be:
setuid-root: fR = full
setuid-nonroot and setgid: fR = CAP_SETUID

I check pM' not pP' because in the normal case these capabilities
were in fP anyway (except for setgid/setuid-nonroot, in which case
the admin probably doesn't want the low-pM task changing its uid).
If they weren't, then IMHO it's the job of the task with elevated
caps to be careful of what it runs rather than the filesystem.

So we get:

fM: maximum caps for this app and children (default full)
fP: app gains these (subject to fM and pM)
fR: as above

pM: maximum caps for this task and children (full for init)
pP: the usual
pE: the usual

> 
> Before writing the kernel code, how about writing documentation
> for admins, software developers, and Linux vendors? Until clear
> and readable documentation exists, this is all just a hazard.
> You might even make a mistake in the kernel code if it isn't
> easy for a non-kernel hacker to review how things will interact
> with their software. Just listing all the cases that need to
> be reviewed would be an improvement.

Good call.  I'll do that.

> 
> This would be an excellent time to reconsider how capabilities
> are assigned to bits. You're breaking things anyway; you might
> as well do all the breaking at once. I want local-use bits so
> that the print queue management access isn't by magic UID/GID.
> We haven't escaped UID-as-priv if server apps and setuid apps
> are still making UID-based access control decisions.
> 

How many bits?  Or should it even be a bitmask?

I'm thinking either 64 or 128 for kernel-defined caps and either
a seperate 128 bits or more or just a list for local-defined.

Then local caps could live in /etc (i don't like it) or /proc (not so
nice either) for collision-avoidance.

Or even numbered bits for kernel use and _names_ for user use.
You'd do sys_cap_register('printquota') and then that becomes
a legal name of a capability.  Then you need sys_cap_unregister()
to atomically remove it from all tasks.

I'll think about how my maximum mask fits in to all this.


Thanks,
Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] capabilites, take 2
@ 2004-05-14 12:03 Albert Cahalan
  2004-05-14 14:53 ` Andy Lutomirski
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Albert Cahalan @ 2004-05-14 12:03 UTC (permalink / raw)
  To: linux-kernel mailing list
  Cc: luto, chrisw, olaf+list.linux-kernel, Valdis.Kletnieks

Andy Lutomirski writes:

> Posix (as interpreted by Linux 2.4/2.6) says:
> pP' = (fP & cap_bset) | (fI & pI)
>
> So (assuming that set_security did the "obvious" (but dangerous) thing):
>
> pP := "task may enable and use these capabilities"
> pE := "task may use these capabilities _now_"
> pI := "task may gain these on exec from fI"
>
> fP := "this program gets these caps (module cap_bset)"
> fI := "this program gets these caps if pI says so"
>
> Which screams "overcomplicated."

People will screw this up.

That you should even have to explain the names as you do is an
indication that the names are poor.

Having names that match IRIX names but act differently is
a sure path to disaster via confused admins and developers.

> I see no particular invariants here, except for pE <= pP.
>
> IRIX (thanks Valdis) says:
>
> pI' = pI & fI
> pP' = fP | (pI' & fP)
>
> Which I interpret as
>
> pP := "task may enable and use these capabilities"
> pE := "task may use these capabilities _now_"
> pI := ~"task _loses_ these on exec"
>
> fP := "this program gets these caps"
> fI := "this program may keep these caps"
>
> This seems to want pP <= pI as an invariant.
>
> This is what I always thought Linux capabilities meant to be.  They
> don't make my brain hurt.
>
> But I also think they're overengineered.  Instead of:
>
> drop_caps_from_inheritable()
> exec()
>
> a program could do:
>
> drop_caps_from_permitted()
> exec()
>
> And I can't imagine what use fI != ~0 has, since it's trivially
> accomplished by a wrapper.  It is also trivially bypassed by
> loading the program manually (with ld.so).

Good point. Even if an exec-only executable would block this,
nearly all admins will fail to mark it as such.

> So, in my patch, I decided steal the inheritable mask to mean this:
>
> pI := "this process may gain these caps"
> fI := "this is an upper bound on pI"
>
> In other words, if a process is extra-untrusted (e.g. it's a daemon
> that never needs a certain capability and has no business trying
> to gain it), it can drop it from pI.  Then it cannot try to abuse
> programs with pP>0.  The setuid override is just added paranoia.
> Another benefit is that it allows a securelevel-like scheme, where
> even root isn't quite trusted.
>
> I suppose it might be inappropriate to steal this field like this,
> given that IRIX already has a (somewhat reasonable) use for it. I
> have no problem implementing IRIX-style capabilities and (if there
> is enough interest) adding a _fourth_ process field pM (process
> maximum capabilities) that does what my pI does.

A few mostly-unrelated thoughts:

Rather than adding a compile-time option or boot option, simply
change the syscall numbers and /proc/*/status names. This will
cause any existing capability-aware tools to act as if being
run on a pre-capability Linux kernel. This seems to be safer
than allowing these tools to assume that nothing has changed.

Allow me to mark an executable with a set of capabilities that
must be all set or all unset. Default to ~0ull for setuid apps,
and to 0ull for all other apps.
Like this:
   if ((pFOO & fBAR) != fBAR)    pFOO &= ~fBAR;

Before writing the kernel code, how about writing documentation
for admins, software developers, and Linux vendors? Until clear
and readable documentation exists, this is all just a hazard.
You might even make a mistake in the kernel code if it isn't
easy for a non-kernel hacker to review how things will interact
with their software. Just listing all the cases that need to
be reviewed would be an improvement.

This would be an excellent time to reconsider how capabilities
are assigned to bits. You're breaking things anyway; you might
as well do all the breaking at once. I want local-use bits so
that the print queue management access isn't by magic UID/GID.
We haven't escaped UID-as-priv if server apps and setuid apps
are still making UID-based access control decisions.




^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2004-05-15 15:50 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-13 20:08 [PATCH] capabilites, take 2 Andy Lutomirski
2004-05-14  1:20 ` Chris Wright
2004-05-14  1:35   ` Valdis.Kletnieks
2004-05-14  4:51     ` Chris Wright
2004-05-14  5:33     ` Olaf Dietsche
2004-05-14  6:04       ` Valdis.Kletnieks
2004-05-14  7:09         ` Olaf Dietsche
2004-05-14  2:27   ` Andy Lutomirski
2004-05-14  4:48     ` Chris Wright
2004-05-14  5:58       ` Andy Lutomirski
2004-05-14 17:45         ` Chris Wright
2004-05-14  6:39     ` Olaf Dietsche
2004-05-14  2:45   ` [PATCH] capabilities, take 3 (Re: [PATCH] capabilites, take 2) Andy Lutomirski
2004-05-14  5:04     ` Chris Wright
2004-05-14  5:32       ` Valdis.Kletnieks
2004-05-14  5:40         ` Chris Wright
2004-05-14  6:25           ` Valdis.Kletnieks
2004-05-14 11:10     ` Olaf Dietsche
2004-05-14 14:15       ` Andy Lutomirski
2004-05-15 15:50         ` Olaf Dietsche
2004-05-14 12:03 [PATCH] capabilites, take 2 Albert Cahalan
2004-05-14 14:53 ` Andy Lutomirski
2004-05-14 17:58   ` Chris Wright
2004-05-14 15:21 ` Stephen Smalley
2004-05-14 15:19   ` Albert Cahalan
2004-05-14 18:06     ` Stephen Smalley
2004-05-14 17:32       ` Albert Cahalan
2004-05-14 21:11         ` Chris Wright
2004-05-14 19:32           ` Albert Cahalan
2004-05-14 18:00   ` Chris Wright
2004-05-14 17:48 ` Chris Wright
     [not found] <fa.dt4cg55.jnqvr5@ifi.uio.no>
     [not found] ` <fa.mu5rj3d.24gtbp@ifi.uio.no>
2004-05-14 15:57   ` Andy Lutomirski
2004-05-14 16:01     ` Stephen Smalley
2004-05-14 16:18       ` Andy Lutomirski
2004-05-14 16:37         ` Stephen Smalley
2004-05-14 18:07         ` Chris Wright

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).