Linux-Fsdevel Archive on lore.kernel.org
From: Andrei Vagin <avagin@gmail.com>
To: Eugene Lubarsky <elubarsky.linux@gmail.com>
Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, adobriyan@gmail.com,
	dsahern@gmail.com
Subject: Re: [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes
Date: Wed, 12 Aug 2020 00:51:35 -0700	[thread overview]
Message-ID: <20200812075135.GA191218@gmail.com> (raw)
In-Reply-To: <20200810145852.9330-1-elubarsky.linux@gmail.com>

On Tue, Aug 11, 2020 at 12:58:47AM +1000, Eugene Lubarsky wrote:
> This is an idea for substantially reducing the number of syscalls needed
> by monitoring tools whilst mostly re-using the existing API.
> 
> The proposed files in this proof-of-concept patch set are:
> 
> * /proc/all/stat
>       A stat line for each process in the existing format.
> 
> * /proc/all/statm
>       statm lines but starting with a PID column.
> 
> * /proc/all/status
>       status info for all processes in the existing format.
> 
> * /proc/all/io
>       The existing /proc/pid/io data but formatted as a single line for
>       each process, similarly to stat/statm, with a PID column added.
> 
> * /proc/all/statx
>       Gathers info from stat, statm and io; the purpose is actually
>       not so much to reduce syscalls but to help userspace be more
>       efficient by not having to store data in e.g. hashtables in order
>       to gather it from separate /proc/all/ files.
> 
>       The format proposed here starts with the unchanged stat line
>       and begins the other info with a few characters, repeating for
>       each process:
> 
>       ...
>       25 (cat) R 1 1 0 0 -1 4194304 185 0 16 0 2 0 0 0 20 ...
>       m 662 188 167 5 0 112 0
>       io 4292 0 12 0 0 0 0
>       ...
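A userspace consumer could parse this proposed stream roughly as follows (a hypothetical Python sketch, not part of the patch set; it assumes each record is an unchanged stat line followed by "m" and "io" lines, and the dict keys are illustrative only):

```python
def parse_all_statx(text):
    """Parse the proposed /proc/all/statx stream: an unchanged
    stat line per process, followed by an 'm' (statm) line and
    an 'io' line.  Field meanings follow proc(5); the dict keys
    here are illustrative, not part of the proposal."""
    procs = []
    cur = None
    for line in text.splitlines():
        if line.startswith("m "):
            cur["statm"] = [int(x) for x in line.split()[1:]]
        elif line.startswith("io "):
            cur["io"] = [int(x) for x in line.split()[1:]]
        else:
            # a stat line always starts with the PID
            cur = {"pid": int(line.split()[0]), "stat": line}
            procs.append(cur)
    return procs
```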
> 
> 
> There has been a proposal with some overlapping goals: /proc/task-diag
> (https://github.com/avagin/linux-task-diag), but I'm not sure about
> its current status.

I rebased the task_diag patches on top of v5.8:
https://github.com/avagin/linux-task-diag/tree/v5.8-task-diag

/proc/pid files have three major limitations:
* They require at least three syscalls per process per file:
  open(), read(), close().
* They use a variety of formats, mostly text-based.
  The kernel spends time encoding binary data into text, and then
  tools like top and ps spend time decoding it back into binary form.
* They are sometimes slow due to extra attributes.
  For example, /proc/PID/smaps contains a lot of useful information
  about memory mappings and the memory consumption of each of them.
  But even if we don't need the memory consumption fields, the kernel
  still spends time collecting that information.
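The first limitation can be sketched from userspace (a minimal Python sketch for illustration; the actual benchmark tool below is a C program):

```python
import os

def read_all_status(proc_root="/proc"):
    """Read <proc_root>/<pid>/status for every process.

    Each process costs at least three syscalls: open(), one or
    more read()s, and close() -- plus the getdents() calls behind
    os.listdir() to enumerate the PID directories."""
    out = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                out[int(entry)] = f.read()
        except (FileNotFoundError, ProcessLookupError):
            pass  # the process exited between listdir() and open()
    return out
```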

More details and numbers are in this article:
https://avagin.github.io/how-fast-is-procfs

This new interface avoids only one of these limitations, while
task_diag avoids all of them.

I compared how fast each of these interfaces is:

The test environment:
CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
RAM: 16GB
kernel: v5.8 with task_diag and /proc/all patches.
100K processes:
$ ps ax | wc -l
10228

$ time cat /proc/all/status > /dev/null

real	0m0.577s
user	0m0.017s
sys	0m0.559s

task_proc_all is used to read /proc/pid/status for all tasks:
https://github.com/avagin/linux-task-diag/blob/master/tools/testing/selftests/task_diag/task_proc_all.c

$ time ./task_proc_all status
tasks: 100230

real	0m0.924s
user	0m0.054s
sys	0m0.858s


/proc/all/status is about 40% faster than /proc/*/status.

Now let's take a look at the perf output:

$ time perf record -g cat /proc/all/status > /dev/null
$ perf report
-   98.08%     1.38%  cat      [kernel.vmlinux]  [k] entry_SYSCALL_64
   - 96.70% entry_SYSCALL_64
      - do_syscall_64
         - 94.97% ksys_read
            - 94.80% vfs_read
               - 94.58% proc_reg_read
                  - seq_read
                     - 87.95% proc_pid_status
                        + 13.10% seq_put_decimal_ull_width
                        - 11.69% task_mem
                           + 9.48% seq_put_decimal_ull_width
                        + 10.63% seq_printf
                        - 10.35% cpuset_task_status_allowed
                           + seq_printf
                        - 9.84% render_sigset_t
                             1.61% seq_putc
                           + 1.61% seq_puts
                        + 4.99% proc_task_name
                        + 4.11% seq_puts
                        - 3.76% render_cap_t
                             2.38% seq_put_hex_ll
                           + 1.25% seq_puts
                          2.64% __task_pid_nr_ns
                        + 1.54% get_task_mm
                        + 1.34% __lock_task_sighand
                        + 0.70% from_kuid_munged
                          0.61% get_task_cred
                          0.56% seq_putc
                          0.52% hugetlb_report_usage
                          0.52% from_kgid_munged
                     + 4.30% proc_all_next
                     + 0.82% _copy_to_user 

We can see that the kernel spends more than 50% of the time encoding
binary data into a text format.
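A rough userspace illustration of that cost, assuming nothing about the kernel internals: emitting counters as decimal text means a per-digit conversion and variable-length output that the consumer must parse back, while a binary interface such as task_diag can copy fixed-size records:

```python
import struct

def encode_text(vals):
    # what a text procfs file effectively does: per-digit decimal
    # conversion, variable-length output the consumer must re-parse
    return " ".join(str(v) for v in vals).encode()

def encode_binary(vals):
    # what a binary interface can do: one fixed-size copy per record
    return struct.pack("<%dQ" % len(vals), *vals)

vals = [4292, 0, 12, 0, 0, 0, 0]
text = encode_text(vals)
blob = encode_binary(vals)

# the consumer's decode step differs in the same way:
decoded_text = [int(x) for x in text.split()]
decoded_blob = list(struct.unpack("<%dQ" % len(vals), blob))
```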

Now let's see how fast task_diag is:

$ time ./task_diag_all all -c -q

real	0m0.087s
user	0m0.001s
sys	0m0.082s
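Putting the three wall-clock measurements above side by side (numbers taken from the runs shown earlier):

```python
# "real" times measured above, in seconds
times = {
    "/proc/pid/status (task_proc_all)": 0.924,
    "/proc/all/status": 0.577,
    "task_diag": 0.087,
}
base = times["/proc/pid/status (task_proc_all)"]
# speedup of each interface relative to per-PID procfs reads
speedups = {name: round(base / t, 1) for name, t in times.items()}
```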

Maybe we need to resurrect the task_diag series instead of inventing
another, less effective interface...

Thanks,
Andrei

> 
> 
> 
> Best Wishes,
> 
> Eugene
> 
> 
> Eugene Lubarsky (5):
>   fs/proc: Introduce /proc/all/stat
>   fs/proc: Introduce /proc/all/statm
>   fs/proc: Introduce /proc/all/status
>   fs/proc: Introduce /proc/all/io
>   fs/proc: Introduce /proc/all/statx
> 
>  fs/proc/base.c     | 215 +++++++++++++++++++++++++++++++++++++++++++--
>  fs/proc/internal.h |   1 +
>  fs/proc/root.c     |   1 +
>  3 files changed, 210 insertions(+), 7 deletions(-)
> 
> -- 
> 2.25.1
> 

Thread overview: 16+ messages
2020-08-10 14:58 Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 1/5] fs/proc: Introduce /proc/all/stat Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 2/5] fs/proc: Introduce /proc/all/statm Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 3/5] fs/proc: Introduce /proc/all/status Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 4/5] fs/proc: Introduce /proc/all/io Eugene Lubarsky
2020-08-10 14:58 ` [RFC PATCH 5/5] fs/proc: Introduce /proc/all/statx Eugene Lubarsky
2020-08-10 15:04 ` [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes Greg KH
2020-08-10 15:27   ` Eugene Lubarsky
2020-08-10 15:41     ` Greg KH
2020-08-25  9:59       ` Eugene Lubarsky
2020-08-12  7:51 ` Andrei Vagin [this message]
2020-08-13  4:47   ` David Ahern
2020-08-13  8:03     ` Andrei Vagin
2020-08-13 15:01   ` Eugene Lubarsky
2020-08-20 17:41     ` Andrei Vagin
2020-08-25 10:00       ` Eugene Lubarsky
