From: Eugene Lubarsky <elubarsky.linux@gmail.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, adobriyan@gmail.com,
	avagin@gmail.com, dsahern@gmail.com
Subject: Re: [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes
Date: Tue, 25 Aug 2020 19:59:09 +1000
Message-ID: <20200825195909.1d1dcd72@eug-lubuntu>
In-Reply-To: <20200810154132.GA4171851@kroah.com>

On Mon, 10 Aug 2020 17:41:32 +0200
Greg KH <gregkh@linuxfoundation.org> wrote:

> On Tue, Aug 11, 2020 at 01:27:00AM +1000, Eugene Lubarsky wrote:
> > On Mon, 10 Aug 2020 17:04:53 +0200
> > Greg KH <gregkh@linuxfoundation.org> wrote:  
> And have you benchmarked any of this?  Try working with the common
> tools that want this information and see if it actually is noticeable
> (hint, I have been doing that with the readfile work and it's
> surprising what the results are in places...)

Apologies for the delay. Here are some benchmarks with atop.

Patch to atop at: https://github.com/eug48/atop/commits/proc-all
Patch to add /proc/all/schedstat & cpuset below.
atop is not collecting threads or cmdline, as /proc/all/ doesn't support them.
Test setup: 10,000 processes, kernel 5.8, nested KVM, 2 cores of an i7-6700HQ @ 2.60GHz

# USE_PROC_ALL=0 ./atop -w test 1 &
# pidstat -p $(pidof atop) 1

01:33:05   %usr %system  %guest   %wait    %CPU   CPU  Command
01:33:06  33.66   33.66    0.00    0.99   67.33     1  atop
01:33:07  33.00   32.00    0.00    2.00   65.00     0  atop
01:33:08  34.00   31.00    0.00    1.00   65.00     0  atop
...
Average:  33.15   32.79    0.00    1.09   65.94     -  atop


# USE_PROC_ALL=1 ./atop -w test 1 &
# pidstat -p $(pidof atop) 1

01:33:33   %usr %system  %guest   %wait    %CPU   CPU  Command
01:33:34  28.00   14.00    0.00    1.00   42.00     1  atop
01:33:35  28.00   14.00    0.00    0.00   42.00     1  atop
01:33:36  26.00   13.00    0.00    0.00   39.00     1  atop
...
Average:  27.08   12.86    0.00    0.35   39.94     -  atop

So average CPU usage goes down from ~66% to ~40%, with most of the
saving in system time (~33% to ~13%).

Data collection times in milliseconds are:

# xsv cat columns proc.csv procall.csv \
> | xsv stats \
> | xsv select field,min,max,mean,stddev \
> | xsv table
field           min  max  mean     stddev
/proc time      558  625  586.59   18.29
/proc/all time  231  262  243.56   8.02
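
For reference, the ratios implied by the averages above can be worked
out with a few lines of C. This is only an arithmetic sketch: the
helper names are made up for illustration, and the inputs are the
averages quoted in this message (collection time in ms, CPU in %).
It comes out at roughly a 2.4x faster collection.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of the old cost to the new cost, e.g. 2.0 means "twice as fast". */
static double speedup(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}

/* Share of the old cost that was saved, in percent. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}

/* Print the ratios for the averages reported in this message. */
static void print_summary(void)
{
	printf("collection time: %.2fx faster, %.1f%% less\n",
	       speedup(586.59, 243.56), reduction_pct(586.59, 243.56));
	printf("total CPU:       %.1f%% less\n", reduction_pct(65.94, 39.94));
	printf("system CPU:      %.1f%% less\n", reduction_pct(32.79, 12.86));
}
```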

There is still plenty of room for optimisation, e.g. the modified atop
uses fgets, which reads 1KB at a time, and seq_file seems to return
only one 4KB page per read. task_diag should be much faster still.
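
As an illustration of the fgets point, a reader could call read(2)
directly with a larger buffer and split records itself. The sketch
below is not what the modified atop does; count_lines_bulk and its
parameters are hypothetical names. Note that if seq_file really
returns at most one page per call, a buffer larger than 4KB would not
reduce the number of syscalls on /proc/all/ files by itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Count newline-terminated records in a file using read(2) with a
 * caller-chosen buffer size instead of stdio's default buffering.
 * Returns the number of '\n' bytes seen, or -1 on error.
 * If syscalls is non-NULL it receives the number of read() calls made.
 */
static long count_lines_bulk(const char *path, size_t bufsize, long *syscalls)
{
	assert(path != NULL && bufsize > 0);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	char *buf = malloc(bufsize);
	if (!buf) {
		close(fd);
		return -1;
	}

	long lines = 0, calls = 0;
	ssize_t n;
	while ((n = read(fd, buf, bufsize)) > 0) {
		calls++;
		const char *p = buf, *end = buf + n;
		/* Scan the chunk for record terminators. */
		while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
			lines++;
			p++;
		}
	}
	/* The last read(), which returned 0 (EOF) or -1, is a call too. */
	calls++;

	free(buf);
	close(fd);
	if (syscalls)
		*syscalls = calls;
	return n < 0 ? -1 : lines;
}
```

Calling it with a small and a large bufsize on the same file shows how
the read() count scales with the buffer size on a regular file.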

I'd imagine this sort of thing would be useful for daemons monitoring
large numbers of processes. I don't run such systems myself; my initial
motivation was frustration with the Kubernetes kubelet using ~2-4% CPU
even with only a couple of containers. Basic profiling suggests syscalls
have a lot to do with it: it reads loads of tiny cgroup files and
enumerates many directories every 10 seconds. /proc has similar issues
and seemed easier to start with.

Anyway, I've read that io_uring could also help here in the near future,
which would be really cool especially if there was a way to enumerate
directories and read many files regex-style in a single operation,
e.g. /proc/[0-9].*/(stat|statm|io)
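
To make the example pattern concrete, here is a user-space sketch that
checks paths against it with POSIX regcomp/regexec. It only shows
which paths such a pattern would select; it is not an io_uring
interface, and matches_proc_pattern is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if path matches the example pattern from the message,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int matches_proc_pattern(const char *path)
{
	static const char pattern[] = "^/proc/[0-9].*/(stat|statm|io)$";
	regex_t re;

	assert(path != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	int rc = regexec(&re, path, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

As written, ".*" also crosses slashes, so a path such as
/proc/12/task/12/stat matches too; "[0-9]+" would restrict the match
to the per-process files.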

> > Currently I'm trying to re-use the existing code in fs/proc that
> > controls which PIDs are visible, but may well be missing
> > something..  
> 
> Try it out and see if it works correctly.  And pid namespaces are not
> the only thing these days from what I recall :)
> 
I've tried `unshare --fork --pid --mount-proc cat /proc/all/stat`
which seems to behave correctly. ptrace flags are handled by the
existing code.


Best Wishes,
Eugene


From 2ffc2e388f7ce4e3f182c2442823e5f13bae03dd Mon Sep 17 00:00:00 2001
From: Eugene Lubarsky <elubarsky.linux@gmail.com>
Date: Tue, 25 Aug 2020 12:36:41 +1000
Subject: [RFC PATCH] fs/proc: /proc/all: add schedstat and cpuset

Signed-off-by: Eugene Lubarsky <elubarsky.linux@gmail.com>
---
 fs/proc/base.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0bba4b3a985e..44d73f1ade4a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3944,6 +3944,36 @@ static int proc_all_io(struct seq_file *m, void *v)
 }
 #endif
 
+#ifdef CONFIG_PROC_PID_CPUSET
+static int proc_all_cpuset(struct seq_file *m, void *v)
+{
+	struct all_iter *iter = (struct all_iter *) v;
+	struct pid_namespace *ns = iter->ns;
+	struct task_struct *task = iter->tgid_iter.task;
+	struct pid *pid = task->thread_pid;
+
+	seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
+	seq_puts(m, " ");
+
+	return proc_cpuset_show(m, ns, pid, task);
+}
+#endif
+
+#ifdef CONFIG_SCHED_INFO
+static int proc_all_schedstat(struct seq_file *m, void *v)
+{
+	struct all_iter *iter = (struct all_iter *) v;
+	struct pid_namespace *ns = iter->ns;
+	struct task_struct *task = iter->tgid_iter.task;
+	struct pid *pid = task->thread_pid;
+
+	seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
+	seq_puts(m, " ");
+
+	return proc_pid_schedstat(m, ns, pid, task);
+}
+#endif
+
 static int proc_all_statx(struct seq_file *m, void *v)
 {
 	struct all_iter *iter = (struct all_iter *) v;
@@ -3990,6 +4020,12 @@ PROC_ALL_OPS(status);
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	PROC_ALL_OPS(io);
 #endif
+#ifdef CONFIG_SCHED_INFO
+	PROC_ALL_OPS(schedstat);
+#endif
+#ifdef CONFIG_PROC_PID_CPUSET
+	PROC_ALL_OPS(cpuset);
+#endif
 
 #define PROC_ALL_CREATE(NAME) \
 	do { \
@@ -4011,4 +4047,10 @@ void __init proc_all_init(void)
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	PROC_ALL_CREATE(io);
 #endif
+#ifdef CONFIG_SCHED_INFO
+	PROC_ALL_CREATE(schedstat);
+#endif
+#ifdef CONFIG_PROC_PID_CPUSET
+	PROC_ALL_CREATE(cpuset);
+#endif
 }
-- 
2.25.1
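
On the consumer side, a line of /proc/all/schedstat could be parsed
roughly as below. This is a sketch under an assumption read off the
patch above: each line is the PID, a space, and then the usual three
/proc/<pid>/schedstat fields (run time in ns, runqueue wait time in
ns, number of timeslices). The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* One record of the assumed "<pid> <run_ns> <wait_ns> <timeslices>" line. */
struct schedstat_rec {
	int pid;
	unsigned long long run_ns;   /* time spent running on a CPU */
	unsigned long long wait_ns;  /* time spent waiting on a runqueue */
	unsigned long timeslices;    /* number of timeslices run */
};

/* Parse one line; returns 0 on success, -1 if a field is missing. */
static int parse_schedstat_line(const char *line, struct schedstat_rec *out)
{
	assert(line != NULL && out != NULL);

	int n = sscanf(line, "%d %llu %llu %lu",
		       &out->pid, &out->run_ns, &out->wait_ns,
		       &out->timeslices);
	return n == 4 ? 0 : -1;
}
```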

