From: Waiman Long <>
To: "Michal Koutný" <>, "Waiman Long" <>
Cc: Tejun Heo <>, Zefan Li <>,
	Johannes Weiner <>,
	Jonathan Corbet <>, Shuah Khan <>,,,,,
	Andrew Morton <>,
	Roman Gushchin <>, Phil Auld <>,
	Peter Zijlstra <>,
	Juri Lelli <>,
	Frederic Weisbecker <>,
	Marcelo Tosatti <>
Subject: Re: [PATCH v7 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
Date: Wed, 13 Oct 2021 18:11:37 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

On 10/13/21 5:45 PM, Waiman Long wrote:
>> In conclusion, it'd be good to have validity conditions separate from
>> transition conditions (since hotplug transition can't be rejected) and
>> perhaps treat administrative changes from an ancestor equally as a
>> hotplug.
> I am trying to make the result of changing "cpuset.cpus" as close to 
> hotplug as possible but there are cases where the "cpuset.cpus" change 
> is prohibited but hotplug can still happen to remove the cpu.
> Hope this will help to clarify the current design.
BTW, the attached file is the current draft of the cpuset.cpus.partition
documentation.


[-- Attachment #2: cpuset.cpus.partition.txt --]
[-- Type: text/plain, Size: 4889 bytes --]

	A read-write single value file which exists on non-root
	cpuset-enabled cgroups.  This flag is owned by the parent cgroup
	and is not delegatable.

	It accepts only the following input values when written to.

	  ==========	=====================================
	  "member"	Non-root member of a partition
	  "root"	Partition root
	  "isolated"	Partition root without load balancing
	  ==========	=====================================

	When set to be a partition root, the current cgroup is the
	root of a new partition or scheduling domain that comprises
	itself and all its descendants except those that are separate
	partition roots themselves and their descendants.  The root
	cgroup is always a partition root.

	When set to "isolated", the CPUs in that partition root will
	be in an isolated state without any load balancing from the
	scheduler.  Tasks in such a partition must be explicitly bound
	to each individual CPU.

	"cpuset.cpus" must always be set up before enabling a
	partition.  Unlike a "member", whose "cpuset.cpus.effective" can
	contain CPUs not in "cpuset.cpus", a valid partition root never
	has such CPUs.  In other words, "cpuset.cpus.effective"
	is always a subset of "cpuset.cpus" for a valid partition root.
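
	The subset rule above can be checked from user space by comparing
	the two files.  The following is a minimal sketch, assuming both
	files contain plain comma-separated CPU numbers and ranges such as
	"0-3,8"; the helper names are hypothetical and are not part of
	any kernel interface.

```python
def parse_cpulist(text: str) -> set[int]:
    """Parse a CPU list such as "0-3,8" into a set of CPU numbers."""
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file yields an empty set
        first, _, last = chunk.partition("-")
        # A single number like "8" has no "-", so last is empty.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def effective_within_cpus(cpus: str, effective: str) -> bool:
    """True if every effective CPU is also listed in cpuset.cpus."""
    return parse_cpulist(effective) <= parse_cpulist(cpus)
```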

	When a parent partition root cannot exclusively grant any of
	the CPUs specified in "cpuset.cpus", "cpuset.cpus.effective"
	becomes empty. If there are tasks in the partition root, the
	partition root becomes invalid and "cpuset.cpus.effective"
	is reset to that of the nearest non-empty ancestor.

	Note that a task cannot be moved to a cgroup with empty
	"cpuset.cpus.effective".

	There are additional constraints on where a partition root can
	be enabled ("root" or "isolated").  It can only be enabled in
	a cgroup if all the following conditions are met.

	1) "cpuset.cpus" is non-empty and exclusive, i.e. its CPUs are
	   not shared by any of its siblings.
	2) The parent cgroup is a valid partition root.
	3) "cpuset.cpus" is a subset of the parent's "cpuset.cpus".
	4) There are no child cgroups with cpuset enabled.  This avoids
	   simultaneous cpu migrations of multiple cgroups, which can
	   be problematic.
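
	The four enabling conditions above can be modelled as a simple
	check.  This is only an illustrative sketch: CpusetModel and
	enable_blockers() are hypothetical names, and the model is an
	in-memory approximation, not the kernel's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CpusetModel:
    """Hypothetical in-memory view of one cgroup; not a kernel structure."""
    cpus: set[int]
    parent_cpus: set[int]
    parent_is_valid_root: bool
    sibling_cpus: list[set[int]] = field(default_factory=list)
    has_cpuset_children: bool = False


def enable_blockers(cs: CpusetModel) -> list[int]:
    """Return the numbers of the listed conditions that are not met."""
    blockers = []
    exclusive = all(cs.cpus.isdisjoint(s) for s in cs.sibling_cpus)
    if not cs.cpus or not exclusive:
        blockers.append(1)  # empty or shared with a sibling
    if not cs.parent_is_valid_root:
        blockers.append(2)  # parent is not a valid partition root
    if not cs.cpus <= cs.parent_cpus:
        blockers.append(3)  # not a subset of the parent's cpuset.cpus
    if cs.has_cpuset_children:
        blockers.append(4)  # cpuset-enabled child cgroups exist
    return blockers
```

	An empty result means that, in this model, writing "root" or
	"isolated" would be expected to succeed.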

	On read, the "cpuset.cpus.partition" file can show the following
	values.

	  =========================	=====================================
	  "member"			Non-root member of a partition
	  "root"			Partition root
	  "isolated"			Partition root without load balancing
	  "root invalid (<reason>)"	Invalid partition root
	  =========================	=====================================

	In the case of an invalid partition root, a descriptive string
	explaining why the partition is invalid is included within
	parentheses.
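
	A user space agent will typically want to split the value read
	from the file into a state and an optional reason.  The sketch
	below assumes only the four forms listed in the table above;
	parse_partition_state() is a hypothetical helper name.

```python
from typing import Optional, Tuple


def parse_partition_state(text: str) -> Tuple[str, Optional[str]]:
    """Split a cpuset.cpus.partition value into (state, reason)."""
    value = text.strip()
    if value in ("member", "root", "isolated"):
        return value, None
    prefix = "root invalid"
    if value.startswith(prefix):
        reason = value[len(prefix):].strip()
        # Drop the surrounding parentheses when they are present.
        if reason.startswith("(") and reason.endswith(")"):
            reason = reason[1:-1]
        return prefix, reason or None
    raise ValueError(f"unexpected partition state: {value!r}")
```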

	Once a cgroup becomes a partition root, changes to "cpuset.cpus"
	are generally allowed as long as the cpu list is exclusive and
	is a superset of the children's cpu lists.

	The constraints of a valid partition root are as follows:

	1) "cpuset.cpus" is non-empty and exclusive.
	2) The parent cgroup is a valid partition root.
	3) "cpuset.cpus.effective" is a subset of "cpuset.cpus".
	4) "cpuset.cpus.effective" is non-empty when there are tasks
	   in the partition.
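
	Unlike the enabling conditions, these validity constraints are
	evaluated against the current state, including
	"cpuset.cpus.effective" and the presence of tasks.  A minimal
	sketch of that predicate follows; the function and parameter
	names are hypothetical.

```python
def partition_root_is_valid(cpus: set[int], effective: set[int],
                            exclusive: bool, parent_is_valid_root: bool,
                            has_tasks: bool) -> bool:
    """Evaluate the four validity constraints on a modelled cgroup."""
    return bool(
        cpus and exclusive                    # 1) non-empty and exclusive
        and parent_is_valid_root              # 2) parent is a valid root
        and effective <= cpus                 # 3) effective within cpus
        and (effective or not has_tasks)      # 4) tasks need some CPU
    )
```

	Note that an empty "cpuset.cpus.effective" alone does not make
	the partition invalid in this model; it only does so when tasks
	are present.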

	Changes to "cpuset.cpus" or cpu hotplug may cause the state
	of a valid partition root to become invalid when one or more
	constraints of a valid partition root are violated.  Therefore,
	user space agents that manage partition roots should avoid
	unnecessary changes to "cpuset.cpus" and always check the state
	of "cpuset.cpus.partition" after making changes to make sure
	that the partitions are functioning properly as expected.
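
	The "change, then check" pattern recommended above might look
	like the following sketch.  The cgroup directory path is supplied
	by the caller, and the helper names are hypothetical.  On a real
	cgroup the write itself may also fail with OSError if the kernel
	rejects the new value.

```python
from pathlib import Path


def update_cpus_and_check(cgroup_dir: str, new_cpus: str) -> str:
    """Write cpuset.cpus, then return the resulting partition state."""
    cg = Path(cgroup_dir)
    (cg / "cpuset.cpus").write_text(new_cpus)
    # Re-read the state, since the change may have invalidated the partition.
    return (cg / "cpuset.cpus.partition").read_text().strip()


def partition_ok(state: str) -> bool:
    """True only for a valid partition root ("root" or "isolated")."""
    return state in ("root", "isolated")
```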

	Changing a partition root to "member" is always allowed.
	If there are child partition roots underneath it, however,
	they will be forced back to "member" as well and lose their
	partitions.  So care must be taken to check for this condition
	before disabling a partition root.
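
	One way to check for this condition is to walk the cgroup
	directory tree and look for descendants whose state is not
	"member".  A minimal sketch, assuming the cgroup hierarchy is
	visible as ordinary directories; child_partition_roots() is a
	hypothetical helper name.

```python
from pathlib import Path


def child_partition_roots(cgroup_dir: str) -> list[Path]:
    """List descendant cgroups whose partition state is not "member"."""
    base = Path(cgroup_dir)
    found = []
    for state_file in sorted(base.rglob("cpuset.cpus.partition")):
        if state_file.parent == base:
            continue  # skip the cgroup that is about to be changed
        if state_file.read_text().strip() != "member":
            found.append(state_file.parent)
    return found
```

	A non-empty result means that writing "member" to the cgroup
	would also take those partitions away.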

	Setting a cgroup to a valid partition root will take the CPUs
	away from the effective CPUs of the parent partition.

	A valid parent partition may distribute out all its CPUs to
	its child partitions as long as it is not the root cgroup as
	we need some house-keeping CPUs in the root cgroup.
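
	The effect on the parent can be illustrated with simple set
	arithmetic.  This sketch is a model only, with hypothetical
	names; it assumes the child's CPUs are removed from the parent's
	effective CPUs and that only the root cgroup must keep at least
	one CPU.

```python
def parent_effective_after_grant(parent_effective: set[int],
                                 child_cpus: set[int],
                                 parent_is_root_cgroup: bool) -> set[int]:
    """Model the parent's effective CPUs after a child becomes a root."""
    remaining = parent_effective - child_cpus
    if parent_is_root_cgroup and not remaining:
        # The root cgroup needs some house-keeping CPUs left over.
        raise ValueError("root cgroup cannot give away all of its CPUs")
    return remaining
```

	For example, a non-root parent with CPUs 0-3 that grants all four
	to a child partition ends up with an empty effective set, while
	the same request against the root cgroup is refused in this model.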

	An invalid partition is not a real partition even though some
	internal states may still be kept.

	An invalid partition root can become a real partition root
	again once none of the constraints of a valid partition root
	are violated.

	Poll and inotify events are triggered whenever the state of
	"cpuset.cpus.partition" changes.  That includes changes caused by
	writes to "cpuset.cpus.partition", cpu hotplug and other changes
	that make the partition invalid.  This allows user space
	agents to monitor unexpected changes to "cpuset.cpus.partition"
	without the need to do continuous polling.
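
	A monitoring agent could block on the file with poll() instead
	of re-reading it in a loop.  The sketch below assumes that a
	state change is reported as POLLPRI, as is usual for other
	cgroup v2 event files; that detail is an assumption and is not
	stated in this draft.  wait_for_partition_change() is a
	hypothetical helper name.

```python
import os
import select
from typing import Optional


def wait_for_partition_change(path: str,
                              timeout_ms: Optional[int] = None) -> Optional[str]:
    """Block until the file signals a change; return the new state.

    Returns None if the timeout expires first.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, 4096)  # consume the current state before waiting
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        if not poller.poll(timeout_ms):
            return None  # timed out without a notification
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 4096).decode().strip()
    finally:
        os.close(fd)
```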
