Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v3 1/2] proc: Allow pid_revalidate() during LOOKUP_RCU
@ 2020-12-19  0:06 Stephen Brennan
  2020-12-19  0:06 ` [PATCH v3 2/2] proc: ensure security hook is called after exec Stephen Brennan
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Brennan @ 2020-12-19  0:06 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Stephen Brennan, James Morris, Serge E. Hallyn,
	linux-security-module, Paul Moore, Stephen Smalley, Eric Paris,
	selinux, Casey Schaufler, Eric Biederman, Alexander Viro,
	linux-fsdevel, linux-kernel, Matthew Wilcox

The pid_revalidate() function requires dropping from RCU into REF lookup
mode. When many threads are resolving paths within /proc in parallel,
this can result in heavy spinlock contention on d_locrkef as each thread
tries to grab a reference to the /proc dentry (and drop it shortly
thereafter).

Allow the pid_revalidate() function to execute under LOOKUP_RCU. When
updates must be made to the inode, drop out of RCU and into REF mode.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---

When running running ~100 parallel instances of "TZ=/etc/localtime ps -fe
>/dev/null" on a 100CPU machine, the %sys utilization reaches 90%, and perf
shows the following code path as being responsible for heavy contention on
the d_lockref spinlock:

      walk_component()
        lookup_fast()
          unlazy_child()
            lockref_get_not_dead(&nd->path.dentry->d_lockref)

By applying this patch, %sys utilization falls to around 60% under the same
workload. Although this particular workload is a bit contrived, we have seen
some monitoring scripts which produced similarly high %sys time due to this
contention.

Changes from v3:
- Rather than call pid_update_inode() with flags, create
  proc_inode_needs_update() to determine whether the call can be skipped.
- Restore the call to the security hook (see next patch).
Changes from v2:
- Remove get_pid_task_rcu_user() and get_proc_task_rcu(), since they were
  unnecessary.
- Remove the call to security_task_to_inode().

 fs/proc/base.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b3422cda2a91..4b246e9bd5df 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1968,6 +1968,20 @@ void pid_update_inode(struct task_struct *task, struct inode *inode)
 	security_task_to_inode(task, inode);
 }
 
+/* See if we can avoid the above call. Assumes RCU lock held */
+static bool pid_inode_needs_update(struct task_struct *task, struct inode *inode)
+{
+	kuid_t uid;
+	kgid_t gid;
+
+	if (inode->i_mode & (S_ISUID | S_ISGID))
+		return true;
+	task_dump_owner(task, inode->i_mode, &uid, &gid);
+	if (!uid_eq(uid, inode->i_uid) || !gid_eq(gid, inode->i_gid))
+		return true;
+	return false;
+}
+
 /*
  * Rewrite the inode's ownerships here because the owning task may have
  * performed a setuid(), etc.
@@ -1977,19 +1991,20 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
+	int rv = 0;
 
-	if (flags & LOOKUP_RCU)
-		return -ECHILD;
-
-	inode = d_inode(dentry);
-	task = get_proc_task(inode);
-
+	rcu_read_lock();
+	inode = d_inode_rcu(dentry);
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);
 	if (task) {
-		pid_update_inode(task, inode);
-		put_task_struct(task);
-		return 1;
+		rv = 1;
+		if ((flags & LOOKUP_RCU) && pid_inode_needs_update(task, inode))
+			rv = -ECHILD;
+		else if (!(flags & LOOKUP_RCU))
+			pid_update_inode(task, inode);
 	}
-	return 0;
+	rcu_read_unlock();
+	return rv;
 }
 
 static inline bool proc_inode_is_dead(struct inode *inode)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 2/2] proc: ensure security hook is called after exec
  2020-12-19  0:06 [PATCH v3 1/2] proc: Allow pid_revalidate() during LOOKUP_RCU Stephen Brennan
@ 2020-12-19  0:06 ` Stephen Brennan
  2021-01-04 14:16   ` Stephen Smalley
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Brennan @ 2020-12-19  0:06 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Stephen Brennan, James Morris, Serge E. Hallyn,
	linux-security-module, Paul Moore, Stephen Smalley, Eric Paris,
	selinux, Casey Schaufler, Eric Biederman, Alexander Viro,
	linux-fsdevel, linux-kernel, Matthew Wilcox

Smack needs its security_task_to_inode() hook to be called when a task
execs a new executable. Store the self_exec_id of the task and call the
hook via pid_update_inode() whenever the exec_id changes.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---

As discussed on the v2 of the patch, this should allow Smack to receive a
security_task_to_inode() call only when the uid/gid changes, or when the task
execs a new binary. I have verified that this doesn't change the performance of
the patch set, and that we do fall out of RCU walk on tasks which have recently
exec'd.

 fs/proc/base.c     | 4 +++-
 fs/proc/internal.h | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4b246e9bd5df..ad59e92e8433 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1917,6 +1917,7 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
 	}
 
 	task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
+	ei->exec_id = task->self_exec_id;
 	security_task_to_inode(task, inode);
 
 out:
@@ -1965,6 +1966,7 @@ void pid_update_inode(struct task_struct *task, struct inode *inode)
 	task_dump_owner(task, inode->i_mode, &inode->i_uid, &inode->i_gid);
 
 	inode->i_mode &= ~(S_ISUID | S_ISGID);
+	PROC_I(inode)->exec_id = task->self_exec_id;
 	security_task_to_inode(task, inode);
 }
 
@@ -1979,7 +1981,7 @@ static bool pid_inode_needs_update(struct task_struct *task, struct inode *inode
 	task_dump_owner(task, inode->i_mode, &uid, &gid);
 	if (!uid_eq(uid, inode->i_uid) || !gid_eq(gid, inode->i_gid))
 		return true;
-	return false;
+	return task->self_exec_id != PROC_I(inode)->exec_id;
 }
 
 /*
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index f60b379dcdc7..1df9b039dfc3 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -92,7 +92,10 @@ union proc_op {
 
 struct proc_inode {
 	struct pid *pid;
-	unsigned int fd;
+	union {
+		unsigned int fd;
+		u32 exec_id;
+	};
 	union proc_op op;
 	struct proc_dir_entry *pde;
 	struct ctl_table_header *sysctl;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] proc: ensure security hook is called after exec
  2020-12-19  0:06 ` [PATCH v3 2/2] proc: ensure security hook is called after exec Stephen Brennan
@ 2021-01-04 14:16   ` Stephen Smalley
  2021-01-04 14:22     ` Stephen Smalley
  2021-01-04 19:51     ` Stephen Brennan
  0 siblings, 2 replies; 5+ messages in thread
From: Stephen Smalley @ 2021-01-04 14:16 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Alexey Dobriyan, James Morris, Serge E. Hallyn, LSM List,
	Paul Moore, Eric Paris, SElinux list, Casey Schaufler,
	Eric Biederman, Alexander Viro, Linux FS Devel, linux-kernel,
	Matthew Wilcox

On Fri, Dec 18, 2020 at 7:06 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Smack needs its security_task_to_inode() hook to be called when a task
> execs a new executable. Store the self_exec_id of the task and call the
> hook via pid_update_inode() whenever the exec_id changes.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

Sorry to be late in responding, but the proc inode security structure
needs to be updated not only upon a context-changing exec but also
upon a setcon(3) aka write to /proc/self/attr/current just like the
uid/gid needs to be updated not only upon a setuid exec but also upon
a setuid(2).  I'm also unclear as to why you can't call
security_task_to_inode during RCU lookup; it doesn't block/sleep
AFAICT.
All it does is take a spinlock and update a few fields.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] proc: ensure security hook is called after exec
  2021-01-04 14:16   ` Stephen Smalley
@ 2021-01-04 14:22     ` Stephen Smalley
  2021-01-04 19:51     ` Stephen Brennan
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen Smalley @ 2021-01-04 14:22 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Alexey Dobriyan, James Morris, Serge E. Hallyn, LSM List,
	Paul Moore, Eric Paris, SElinux list, Casey Schaufler,
	Eric Biederman, Alexander Viro, Linux FS Devel, linux-kernel,
	Matthew Wilcox

On Mon, Jan 4, 2021 at 9:16 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 7:06 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
> >
> > Smack needs its security_task_to_inode() hook to be called when a task
> > execs a new executable. Store the self_exec_id of the task and call the
> > hook via pid_update_inode() whenever the exec_id changes.
> >
> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>
> Sorry to be late in responding, but the proc inode security structure
> needs to be updated not only upon a context-changing exec but also
> upon a setcon(3) aka write to /proc/self/attr/current just like the
> uid/gid needs to be updated not only upon a setuid exec but also upon
> a setuid(2).  I'm also unclear as to why you can't call
> security_task_to_inode during RCU lookup; it doesn't block/sleep
> AFAICT.
> All it does is take a spinlock and update a few fields.

You could also optimize this by comparing the sid similar to how the
uid/gid are compared and only updating it within the hook if it has
not yet been initialized or has changed since it was originally set.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] proc: ensure security hook is called after exec
  2021-01-04 14:16   ` Stephen Smalley
  2021-01-04 14:22     ` Stephen Smalley
@ 2021-01-04 19:51     ` Stephen Brennan
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen Brennan @ 2021-01-04 19:51 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Alexey Dobriyan, James Morris, Serge E. Hallyn, LSM List,
	Paul Moore, Eric Paris, SElinux list, Casey Schaufler,
	Eric Biederman, Alexander Viro, Linux FS Devel, linux-kernel,
	Matthew Wilcox

Stephen Smalley <stephen.smalley.work@gmail.com> writes:

> On Fri, Dec 18, 2020 at 7:06 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> Smack needs its security_task_to_inode() hook to be called when a task
>> execs a new executable. Store the self_exec_id of the task and call the
>> hook via pid_update_inode() whenever the exec_id changes.
>>
>> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>
> Sorry to be late in responding, but the proc inode security structure
> needs to be updated not only upon a context-changing exec but also
> upon a setcon(3) aka write to /proc/self/attr/current just like the
> uid/gid needs to be updated not only upon a setuid exec but also upon
> a setuid(2).  I'm also unclear as to why you can't call
> security_task_to_inode during RCU lookup; it doesn't block/sleep
> AFAICT.
> All it does is take a spinlock and update a few fields.

The reason I assumed that we need to drop out of RCU mode to update the
inode and call the security hooks was simply because that is how the
code worked originally. I wanted to be conservative in my changes, by
only leaving RCU mode "when necessary", but this assumed that it was
necessary to leave RCU mode at all!

None of the data in a proc inode (at least, i_mode, i_uid, i_gid) seems
to be "RCU protected" in the sense that they could not be modified
during an RCU read critical section. If this were the case, then there
would have to be some sort of copying and a synchronize_rcu() used
somewhere.  So it seems that running pid_update_inode() (which does not
sleep and simply takes some spinlocks) should be safe during RCU mode.

My assumption had originally been that the security_pid_to_inode() calls
could be liable to sleep. But during this review we've seen that both
the selinux and smack security_pid_to_inode() implementations are also
"RCU safe" in that they do not sleep.

So rather than trying to guess when this security hook would like to be
called, it seems that it would be safe to take the easiest option: just
execute pid_revalidate() in RCU mode always, for instance with the
example changes below. Is there anything obviously wrong with this
approach that I'm missing?

diff --git a/fs/proc/base.c b/fs/proc/base.c
index ebea9501afb8..105581e51032 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1830,19 +1830,18 @@ static int pid_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct inode *inode;
 	struct task_struct *task;
+	int rv = 0;
 
-	if (flags & LOOKUP_RCU)
-		return -ECHILD;
-
-	inode = d_inode(dentry);
-	task = get_proc_task(inode);
+	rcu_read_lock();
+	inode = d_inode_rcu(dentry);
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);
 
 	if (task) {
 		pid_update_inode(task, inode);
-		put_task_struct(task);
-		return 1;
+		rv = 1;
 	}
-	return 0;
+	rcu_read_unlock();
+	return rv;
 }
 
 static inline bool proc_inode_is_dead(struct inode *inode)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-04 19:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-19  0:06 [PATCH v3 1/2] proc: Allow pid_revalidate() during LOOKUP_RCU Stephen Brennan
2020-12-19  0:06 ` [PATCH v3 2/2] proc: ensure security hook is called after exec Stephen Brennan
2021-01-04 14:16   ` Stephen Smalley
2021-01-04 14:22     ` Stephen Smalley
2021-01-04 19:51     ` Stephen Brennan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).