Linux-Fsdevel Archive on
help / color / mirror / Atom feed
From: Emanuele Giuseppe Esposito <>
Cc: Christian Borntraeger <>,
	Paolo Bonzini <>,
	Jim Mattson <>,
	Alexander Viro <>,
	Emanuele Giuseppe Esposito <>,
	David Rientjes <>,
	Jonathan Adams <>,,,,,,,,,,
	Emanuele Giuseppe Esposito <>
Subject: [PATCH v3 2/7] documentation for stats_fs
Date: Tue, 26 May 2020 13:03:12 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Html docs for a complete documentation of the stats_fs API,
filesystem and usage.

Signed-off-by: Emanuele Giuseppe Esposito <>
 Documentation/filesystems/index.rst    |   1 +
 Documentation/filesystems/stats_fs.rst | 222 +++++++++++++++++++++++++
 2 files changed, 223 insertions(+)
 create mode 100644 Documentation/filesystems/stats_fs.rst

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index e7b46dac7079..9a46fd851c6e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,6 +89,7 @@ Documentation for filesystem implementations.
+   stats_fs
diff --git a/Documentation/filesystems/stats_fs.rst b/Documentation/filesystems/stats_fs.rst
new file mode 100644
index 000000000000..292c689ffb98
--- /dev/null
+++ b/Documentation/filesystems/stats_fs.rst
@@ -0,0 +1,222 @@
+Stats_fs is a synthetic ram-based virtual filesystem that takes care of
+gathering and displaying statistics for the Linux kernel subsystems.
+The motivation for stats_fs comes from the fact that there is no common
+way for Linux kernel subsystems to expose statistics to userspace shared
+throughout the Linux kernel; subsystems have to take care of gathering and
+displaying statistics by themselves, for example in the form of files in
+Allowing each subsystem of the kernel to do so has two disadvantages.
+First, it will introduce redundant code. Second, debugfs is anyway not the
+right place for statistics (for example it is affected by lockdown).
+Stats_fs offers a generic and stable API, allowing any kind of
+directory/file organization and supporting multiple kind of aggregations
+(not only sum, but also average, max, min and count_zero) and data types
+(boolean, all unsigned/signed and custom types). The implementation takes
+care of gathering and displaying information at run time; users only need
+to specify the values to be included in each source. Optionally, users can
+also provide a display function for each value, that will take care of
+displaying the provided value in a custom format.
+Its main function is to display each statistics as a file in the desired
+folder hierarchy defined through the API. Stats_fs files can be read, and
+possibly cleared if their file mode allows it.
+Stats_fs is typically mounted with a command like::
+    mount -t stats_fs stats_fs /sys/kernel/stats_fs
+(Or an equivalent /etc/fstab line).
+Stats_fs has two main components: the public API defined by
+include/linux/stats_fs.h, and the virtual file system in
+The API has two main elements, values and sources. Kernel
+subsystems will create a source, add child
+sources/values/aggregates and register it to the root source (that on the
+virtual fs would be /sys/kernel/stats).
+The stats_fs API is defined in ``<linux/stats_fs.h>``.
+    Sources
+        Sources are created via ``stats_fs_source_create()``, and each
+        source becomes a directory in the file system. Sources form a
+        parent-child relationship; root sources are added to the file
+        system via ``stats_fs_source_register()``. Therefore each Linux
+        subsystem will add its own entry to the root, filesystem similar
+        to what it is done in debugfs. Every other source is added to or
+        removed from a parent through the
+        ``stats_fs_source_add_subordinate()`` and
+        ``stats_fs_source_remove_subordinate()`` APIs. Once a source is
+        created and added to the tree (via add_subordinate), it will be
+        used to compute aggregate values in the parent source. A source
+        can optionally be hidden from the filesystem but still considered
+        in the aggregation operations if the corresponding flag is set
+        during initialization.
+    Values
+        Values represent quantites that are gathered by the stats_fs user.
+        Examples of values include the number of vm exits of a given kind,
+        the amount of memory used by some data structure, the length of
+        the longest hash table chain, or anything like that. Values are
+        defined with the stats_fs_source_add_values function. Each value
+        is defined by a ``struct stats_fs_value``; the same
+        ``stats_fs_value`` can be added to many different sources. A value
+        can be considered "simple" if it fetches data from a user-provided
+        location, or "aggregate" if it groups all values in the
+        subordinate sources that include the same ``stats_fs_value``.
+        Values by default are considered to be cumulative, meaning the
+        value they represent never decreases, but can also be defined as
+        floating if they exibith a different behavior. The main difference
+        between these two is reflected into the file permission, since a
+        floating value file does not allow the user to clear it. Each
+        value has a ``stats_fs_type`` pointer in order to allow the user
+        to provide custom get and clear functions. The library, however,
+        also exports default ``stats_fs_type`` structs for the standard
+        types (all unsigned and signed types plus boolean). A value can
+        also provide a show function that takes care of displaying the
+        value in a custom string format. This can be especially useful
+        when displaying enums.
+Because stats_fs is a different mountpoint than debugfs, it is not affected
+by security lockdown.
+Using Stats_fs
+Define a value::
+        struct statistics{
+                uint64_t exit;
+                ...
+        };
+        struct kvm {
+                ...
+                struct statistics stat;
+        };
+        struct stats_fs_value kvm_stats[] = {
+                { "exit_vm", offsetof(struct kvm, stat.exit), &stats_fs_type_u64,
+                  STATS_FS_SUM },
+                { NULL }
+        };
+The same ``struct stats_fs_value`` is used for both simple and aggregate
+values, though the type and offset are only used for simple values.
+Aggregates merge all values that use the same ``struct stats_fs_value``.
+Create the parent source::
+        struct stats_fs_source parent_source = stats_fs_source_create(0, "parent");
+Register it (files and folders
+will only be visible after this function is called)::
+        stats_fs_source_register(parent_source);
+Create and add a child::
+        struct stats_fs_source child_source = stats_fs_source_create(STATS_FS_HIDDEN, "child");
+        stats_fs_source_add_subordinate(parent_source, child_source);
+The STATS_FS_HIDDEN attribute won't affect the aggregation, it will only
+block the creation of the files.
+Add values to parent and child (also here order doesn't matter)::
+        struct kvm *base_ptr = kmalloc(..., sizeof(struct kvm));
+        ...
+        stats_fs_source_add_values(child_source, kvm_stats, base_ptr, 0);
+        stats_fs_source_add_values(parent_source, kvm_stats, NULL, STATS_FS_HIDDEN);
+``child_source`` will be a simple value, since it has a non-NULL base
+pointer, while ``parent_source`` will be an aggregate. During the adding
+phase, also values can optionally be marked as hidden, so that the folder
+and other values can be still shown.
+Of course the same ``struct stats_fs_value`` array can be also passed with a
+different base pointer, to represent the same value but in another instance
+of the kvm struct.
+Fetch a value from the child source, returning the value
+pointed by ``(uint64_t *) base_ptr + kvm_stats[0].offset``::
+        uint64_t ret_child, ret_parent;
+        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+Fetch an aggregate value, searching all subsources of ``parent_source`` for
+the specified ``struct stats_fs_value``::
+        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+        assert(ret_child == ret_parent); // check expected result
+To make it more interesting, add another child::
+        struct stats_fs_source child_source2 = stats_fs_source_create(0, "child2");
+        stats_fs_source_add_subordinate(parent_source, child_source2);
+        // now  the structure is parent -> child1
+        //                              -> child2
+        struct kvm *other_base_ptr = kmalloc(..., sizeof(struct kvm));
+        ...
+        stats_fs_source_add_values(child_source2, kvm_stats, other_base_ptr, 0);
+Note that other_base_ptr points to another instance of kvm, so the struct
+stats_fs_value is the same but the address at which they point is not.
+Now get the aggregate value::
+        uint64_t ret_child, ret_child2, ret_parent;
+        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+        stats_fs_source_get_value(child_source2, &kvm_stats[0], &ret_child2);
+        assert((ret_child + ret_child2) == ret_parent);
+        stats_fs_source_remove_subordinate(parent_source, child_source);
+        stats_fs_source_revoke(child_source);
+        stats_fs_source_put(child_source);
+        stats_fs_source_remove_subordinate(parent_source, child_source2);
+        stats_fs_source_revoke(child_source2);
+        stats_fs_source_put(child_source2);
+        stats_fs_source_put(parent_source);
+        kfree(other_base_ptr);
+        kfree(base_ptr);
+Calling stats_fs_source_revoke is very important, because it will ensure
+that stats_fs will not access the data that were passed to
+stats_fs_source_add_value for this source.
+Because open files increase the reference count for a stats_fs_source, the
+source can end up living longer than the data that provides the values for
+the source.  Calling stats_fs_source_revoke just before the backing data
+is freed avoids accesses to freed data structures. The sources will return
+This is not needed for the parent_source, since it just contains
+aggregates that would be 0 anyways if no matching child value exist.
+API Documentation
+.. kernel-doc:: include/linux/stats_fs.h
+   :export: fs/stats_fs/*.c
\ No newline at end of file

  parent reply	other threads:[~2020-05-26 11:04 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-26 11:03 [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values Emanuele Giuseppe Esposito
2020-05-26 11:03 ` Emanuele Giuseppe Esposito [this message]
2020-06-04  0:23   ` [PATCH v3 2/7] documentation for stats_fs Randy Dunlap
2020-06-04 15:34     ` Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 3/7] kunit: tests for stats_fs API Emanuele Giuseppe Esposito
2020-05-27 10:05   ` Alan Maguire
2020-05-27 13:26     ` Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 4/7] stats_fs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 5/7] kvm_main: replace debugfs with stats_fs Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 6/7] [not for merge] kvm: example of stats_fs_value show function Emanuele Giuseppe Esposito
2020-05-26 11:03 ` [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API Emanuele Giuseppe Esposito
2020-05-26 14:16   ` Andrew Lunn
2020-05-26 15:45     ` Emanuele Giuseppe Esposito
2020-05-26 22:31 ` [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics Jakub Kicinski
2020-05-27 13:14   ` Emanuele Giuseppe Esposito
2020-05-27 13:33     ` Andrew Lunn
2020-05-27 15:00       ` Paolo Bonzini
2020-05-27 20:23     ` Jakub Kicinski
2020-05-27 21:07       ` Paolo Bonzini
2020-05-27 21:27         ` Jakub Kicinski
2020-05-27 21:44           ` Paolo Bonzini
2020-05-27 22:21         ` David Ahern
2020-05-28  5:22           ` Paolo Bonzini
2020-05-27 21:17 ` Andrew Lunn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).