LKML Archive on lore.kernel.org help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com> To: linux-kernel@vger.kernel.org Cc: Nitesh Lal <nilal@redhat.com>, Nicolas Saenz Julienne <nsaenzju@redhat.com>, Frederic Weisbecker <frederic@kernel.org>, Christoph Lameter <cl@linux.com>, Juri Lelli <juri.lelli@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Alex Belits <abelits@belits.com>, Peter Xu <peterx@redhat.com>, Marcelo Tosatti <mtosatti@redhat.com> Subject: [patch V3 2/8] add prctl task isolation prctl docs and samples Date: Tue, 24 Aug 2021 12:24:25 -0300 [thread overview] Message-ID: <20210824152646.706875395@fuller.cnet> (raw) In-Reply-To: 20210824152423.300346181@fuller.cnet Add documentation and userspace sample code for prctl task isolation interface. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++ samples/Kconfig | 7 samples/Makefile | 1 samples/task_isolation/Makefile | 9 + samples/task_isolation/task_isol.c | 83 +++++++++ samples/task_isolation/task_isol.h | 9 + samples/task_isolation/task_isol_userloop.c | 56 ++++++ 7 files changed, 376 insertions(+) Index: linux-2.6/Documentation/userspace-api/task_isolation.rst =================================================================== --- /dev/null +++ linux-2.6/Documentation/userspace-api/task_isolation.rst @@ -0,0 +1,281 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============================== +Task isolation prctl interface +=============================== + +Certain types of applications benefit from running uninterrupted by +background OS activities. Realtime systems and high-bandwidth networking +applications with user-space drivers can fall into the category. + +To create an OS noise free environment for the application, this +interface allows userspace to inform the kernel the start and +end of the latency sensitive application section (with configurable +system behaviour for that section). + +Note: the prctl interface is independent of nohz_full=. + +The prctl options are: + + + - PR_ISOL_FEAT: Retrieve supported features. + - PR_ISOL_GET: Retrieve task isolation parameters. + - PR_ISOL_SET: Set task isolation parameters. + - PR_ISOL_CTRL_GET: Retrieve task isolation state. + - PR_ISOL_CTRL_SET: Set task isolation state. + - PR_ISOL_GET_INT: Retrieve internal parameters. + - PR_ISOL_SET_INT: Retrieve internal parameters. + + + +Inheritance of the isolation parameters and state, across +fork(2) and clone(2), can be changed via +PR_ISOL_GET_INT/PR_ISOL_SET_INT. + +At a high-level, task isolation is divided in two steps: + +1. Configuration. +2. Activation. + +Section "Userspace support" describes how to use +task isolation. + +In terms of the interface, the sequence of steps to activate +task isolation are: + +1. Retrieve supported task isolation features (PR_ISOL_FEAT). +2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET). +3. Activate or deactivate task isolation features + (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET). +4. Optionally configure inheritance (PR_ISOL_GET_INT/PR_ISOL_SET_INT). + +This interface is based on ideas and code from the +task isolation patchset from Alex Belits: +https://lwn.net/Articles/816298/ + +-------------------- +Feature description +-------------------- + + - ``ISOL_F_QUIESCE`` + + This feature allows quiescing select kernel activities on + return from system calls. + +--------------------- +Interface description +--------------------- + +**PR_ISOL_FEAT**: + + Returns the supported features and feature + capabilities, as a bitmask:: + + prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5); + + The 'feat' argument specifies whether to return + supported features (if zero), or feature capabilities + (if not zero). Possible non-zero values for 'feat' are: + + - ``ISOL_F_QUIESCE``: + + Returns a bitmask containing which kernel + activities are supported for quiescing. + + Features and its capabilities are defined at include/uapi/linux/task_isolation.h. + +**PR_ISOL_GET**: + + Retrieve task isolation feature configuration. + The general format is:: + + prctl(PR_ISOL_GET, feat, arg3, arg4, arg5); + + The 'feat' argument specifies whether to return + configured features (if zero), or individual feature + configuration (if not zero). Possible non-zero + values for 'feat' are: + + - ``ISOL_F_QUIESCE``: + + Returns a bitmask containing which kernel + activities are enabled for quiescing. + + +**PR_ISOL_SET**: + + Configures task isolation features. The general format is:: + + prctl(PR_ISOL_SET, feat, arg3, arg4, arg5); + + The 'feat' argument specifies which feature to configure. + Possible values for feat are: + + - ``ISOL_F_QUIESCE``: + + The 'arg3' argument is a bitmask specifying which + kernel activities to quiesce. Possible bit sets are: + + - ``ISOL_F_QUIESCE_VMSTATS`` + + VM statistics are maintained in per-CPU counters to + improve performance. When a CPU modifies a VM statistic, + this modification is kept in the per-CPU counter. + Certain activities require a global count, which + involves requesting each CPU to flush its local counters + to the global VM counters. + + This flush is implemented via a workqueue item, which + might schedule a workqueue on isolated CPUs. + + To avoid this interruption, task isolation can be + configured to, upon return from system calls, synchronize + the per-CPU counters to global counters, thus avoiding + the interruption. + + To ensure the application returns to userspace + with no modified per-CPU counters, its necessary to + use mlockall() in addition to this isolcpus flag. + +**PR_ISOL_CTRL_GET**: + + Retrieve task isolation control. + + prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0); + + Returns which isolation features are active. + +**PR_ISOL_CTRL_SET**: + + Activates/deactivates task isolation control. + + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0); + + The 'mask' argument specifies which features + to activate (bit set) or deactivate (bit clear). + + For ISOL_F_QUIESCE, quiescing of background activities + happens on return to userspace from the + prctl(PR_ISOL_CTRL_SET) call, and on return from + subsequent system calls. + + Quiescing can be adjusted (while active) by + prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...). + +**PR_ISOL_GET_INT**: + + Retrieves task isolation internal parameters. + + The general format is:: + + prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5); + + The 'cmd' argument specifies which parameter to configure. + Possible values for cmd are: + + - ``INHERIT_CFG``: + + Retrieve inheritance configuration. + + The 'arg3' argument is a pointer to a struct + inherit_control:: + + struct task_isol_inherit_control { + __u8 inherit_mask; + __u8 pad[7]; + }; + + See PR_ISOL_SET_INT description below for meaning + of structure fields. + +**PR_ISOL_SET_INT**: + + Sets task isolation internal parameters. + + The general format is:: + + prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5); + + The 'cmd' argument specifies which parameter to configure. + Possible values for cmd are: + + - ``INHERIT_CFG``: + + Set inheritance configuration when a new task + is created via fork and clone. + + The 'arg3' argument is a pointer to a struct + inherit_control:: + + struct task_isol_inherit_control { + __u8 inherit_mask; + __u8 pad[7]; + }; + + inherit_mask is a bitmask that specifies which part + of task isolation to be inherited: + + - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration. + This is the stated written via prctl(PR_ISOL_SET, ...). + + - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation + (requires ISOL_INHERIT_CONF to be set). The new task + should behave, right after fork/clone, in the same manner + as the parent task _after_ it executed: + + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0); + + with a valid mask. + +================== +Userspace support +================== + +Task isolation is divided in two main steps: configuration and activation. + +Each step can be performed by an external tool or the latency sensitive +application itself. util-linux contains the "chisol" tool for this +purpose. + +This results in three combinations: + +1. Both configuration and activation performed by the +latency sensitive application. +Allows fine grained control of what task isolation +features are enabled and when (see samples section below). + +2. Only activation can be performed by the latency sensitive app +(and configuration performed by chisol). +This allows the admin/user to control task isolation parameters, +and applications have to be modified only once. + +3. Configuration and activation performed by an external tool. +This allows unmodified applications to take advantage of +task isolation. Activation is performed by the "-a" option +of chisol. + +======== +Examples +======== + +The ``samples/task_isolation/`` directory contains sample +code. + +This is a snippet of code to activate task isolation if +it has been previously configured (by chisol for example):: + + #include <sys/prctl.h> + + #ifdef PR_ISOL_GET + ret = prctl(PR_ISOL_GET, 0, 0, 0, 0); + if (ret != -1) { + unsigned long mask = ret; + + ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_CTRL_SET"); + return ret; + } + } + #endif + Index: linux-2.6/samples/task_isolation/task_isol.c =================================================================== --- /dev/null +++ linux-2.6/samples/task_isolation/task_isol.c @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <sys/mman.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> +#include <unistd.h> +#include <sys/prctl.h> +#include <linux/prctl.h> +#include <errno.h> +#include "task_isol.h" + +#ifdef PR_ISOL_FEAT +int task_isol_setup(void) +{ + int ret; + int errnosv; + + /* Retrieve supported task isolation features */ + ret = prctl(PR_ISOL_FEAT, 0, 0, 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_FEAT"); + return ret; + } + printf("supported features bitmask: 0x%x\n", ret); + + /* Retrieve supported ISOL_F_QUIESCE bits */ + ret = prctl(PR_ISOL_FEAT, ISOL_F_QUIESCE, 0, 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_FEAT (ISOL_F_QUIESCE)"); + return ret; + } + printf("supported ISOL_F_QUIESCE bits: 0x%x\n", ret); + + /* Do not configure task isolation attributes if already set */ + ret = prctl(PR_ISOL_GET, 0, 0, 0, 0); + errnosv = errno; + if (ret != -1) { + printf("Task isolation parameters already configured!\n"); + return ret; + } + if (ret == -1 && errnosv != ENODATA) { + perror("prctl PR_ISOL_GET"); + return ret; + } + ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, + 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_SET"); + return ret; + } + return ISOL_F_QUIESCE; +} + +int task_isol_ctrl_set(unsigned long mask) +{ + int ret; + + ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)"); + return -1; + } + + return 0; +} + +#else + +int task_isol_setup(void) +{ + return 0; +} + +int task_isol_ctrl_set(unsigned long mask) +{ + return 0; +} +#endif + + Index: linux-2.6/samples/task_isolation/task_isol.h =================================================================== --- /dev/null +++ linux-2.6/samples/task_isolation/task_isol.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __TASK_ISOL_H +#define __TASK_ISOL_H + +int task_isol_setup(void); + +int task_isol_ctrl_set(unsigned long mask); + +#endif /* __TASK_ISOL_H */ Index: linux-2.6/samples/task_isolation/task_isol_userloop.c =================================================================== --- /dev/null +++ linux-2.6/samples/task_isolation/task_isol_userloop.c @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <sys/mman.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> +#include <unistd.h> +#include <sys/prctl.h> +#include <linux/prctl.h> +#include "task_isol.h" + +int main(void) +{ + int ret; + void *buf = malloc(4096); + unsigned long mask; + + memset(buf, 1, 4096); + ret = mlock(buf, 4096); + if (ret) { + perror("mlock"); + return EXIT_FAILURE; + } + + ret = task_isol_setup(); + if (ret) + return EXIT_FAILURE; + mask = ret; + + ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)"); + return EXIT_FAILURE; + } + +#define NR_LOOPS 999999999 +#define NR_PRINT 100000000 + /* busy loop */ + while (ret < NR_LOOPS) { + memset(buf, 0, 4096); + ret = ret+1; + if (!(ret % NR_PRINT)) + printf("loops=%d of %d\n", ret, NR_LOOPS); + } + + ret = prctl(PR_ISOL_CTRL_SET, 0, 0, 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_CTRL_SET (0)"); + exit(0); + } + + return EXIT_SUCCESS; +} + Index: linux-2.6/samples/Kconfig =================================================================== --- linux-2.6.orig/samples/Kconfig +++ linux-2.6/samples/Kconfig @@ -223,4 +223,11 @@ config SAMPLE_WATCH_QUEUE Build example userspace program to use the new mount_notify(), sb_notify() syscalls and the KEYCTL_WATCH_KEY keyctl() function. +config SAMPLE_TASK_ISOLATION + bool "task isolation sample" + depends on CC_CAN_LINK && HEADERS_INSTALL + help + Build example userspace program to use prctl task isolation + interface. + endif # SAMPLES Index: linux-2.6/samples/Makefile =================================================================== --- linux-2.6.orig/samples/Makefile +++ linux-2.6/samples/Makefile @@ -30,3 +30,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/ subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/ +subdir-$(CONFIG_SAMPLE_TASK_ISOLATION) += task_isolation Index: linux-2.6/samples/task_isolation/Makefile =================================================================== --- /dev/null +++ linux-2.6/samples/task_isolation/Makefile @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: GPL-2.0 +# +userprogs-always-y += task_isol_userloop +task_isol_userloop-objs := task_isol.o task_isol_userloop.o + +userccflags += -I usr/include + + +#$(CC) $^ -o $@
next prev parent reply other threads:[~2021-08-24 15:43 UTC|newest] Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-08-24 15:24 [patch V3 0/8] extensible prctl task isolation interface and vmstat sync Marcelo Tosatti 2021-08-24 15:24 ` [patch V3 1/8] add basic task isolation prctl interface Marcelo Tosatti 2021-08-24 15:24 ` Marcelo Tosatti [this message] 2021-08-26 9:59 ` [patch V3 2/8] add prctl task isolation prctl docs and samples Frederic Weisbecker 2021-08-26 12:11 ` Marcelo Tosatti 2021-08-26 19:15 ` Christoph Lameter 2021-08-26 20:37 ` Marcelo Tosatti 2021-08-27 13:08 ` Frederic Weisbecker 2021-08-27 14:44 ` Marcelo Tosatti 2021-08-30 11:38 ` Frederic Weisbecker 2021-09-01 13:11 ` Nitesh Lal 2021-09-01 17:34 ` Marcelo Tosatti 2021-09-01 17:49 ` Nitesh Lal 2021-08-24 15:24 ` [patch V3 3/8] task isolation: sync vmstats on return to userspace Marcelo Tosatti 2021-09-10 13:49 ` nsaenzju 2021-08-24 15:24 ` [patch V3 4/8] procfs: add per-pid task isolation state Marcelo Tosatti 2021-08-24 15:24 ` [patch V3 5/8] task isolation: sync vmstats conditional on changes Marcelo Tosatti 2021-08-25 9:46 ` Christoph Lameter 2021-08-24 15:24 ` [patch V3 6/8] KVM: x86: call isolation prepare from VM-entry code path Marcelo Tosatti 2021-08-24 15:24 ` [patch V3 7/8] mm: vmstat: move need_update Marcelo Tosatti 2021-08-24 15:24 ` [patch V3 8/8] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti 2021-08-25 9:30 ` Christoph Lameter 2021-09-01 13:05 ` Nitesh Lal 2021-09-01 17:32 ` Marcelo Tosatti 2021-09-01 18:33 ` Marcelo Tosatti 2021-09-03 17:38 ` Nitesh Lal 2021-08-25 10:02 ` [patch V3 0/8] extensible prctl task isolation interface and vmstat sync Marcelo Tosatti
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20210824152646.706875395@fuller.cnet \ --to=mtosatti@redhat.com \ --cc=abelits@belits.com \ --cc=cl@linux.com \ --cc=frederic@kernel.org \ --cc=juri.lelli@redhat.com \ --cc=linux-kernel@vger.kernel.org \ --cc=nilal@redhat.com \ --cc=nsaenzju@redhat.com \ --cc=peterx@redhat.com \ --cc=peterz@infradead.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).