LKML Archive on lore.kernel.org
* [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
@ 2019-04-09 20:40 Joel Savitz
  2019-04-10 16:11 ` Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Joel Savitz @ 2019-04-09 20:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Savitz, Phil Auld, Waiman Long, Tejun Heo, Li Zefan, cgroups

If a process is limited by taskset (i.e. cpuset) to only be allowed to
run on cpu N, and then cpu N is offlined via hotplug, the process will
be assigned the current value of its cpuset cgroup's effective_cpus field
in a call to do_set_cpus_allowed() in cpuset_cpus_allowed_fallback().
This argument's value does not make sense for this case, because
task_cs(tsk)->effective_cpus is modified by cpuset_hotplug_workfn()
to reflect the new value of cpu_active_mask after cpu N is removed from
the mask. While this may make sense for the cgroup affinity mask, it
does not make sense on a per-task basis, as a task that was previously
limited to only be run on cpu N will be limited to every cpu _except_ for
cpu N after it is offlined/onlined via hotplug.

Pre-patch behavior:

	$ grep Cpus /proc/$$/status
	Cpus_allowed:	ff
	Cpus_allowed_list:	0-7

	$ taskset -p 4 $$
	pid 19202's current affinity mask: f
	pid 19202's new affinity mask: 4

	$ grep Cpus /proc/self/status
	Cpus_allowed:	04
	Cpus_allowed_list:	2

	# echo off > /sys/devices/system/cpu/cpu2/online
	$ grep Cpus /proc/$$/status
	Cpus_allowed:	0b
	Cpus_allowed_list:	0-1,3

	# echo on > /sys/devices/system/cpu/cpu2/online
	$ grep Cpus /proc/$$/status
	Cpus_allowed:	0b
	Cpus_allowed_list:	0-1,3

On a patched system, the final grep produces the following
output instead:

	$ grep Cpus /proc/$$/status
	Cpus_allowed:	ff
	Cpus_allowed_list:	0-7

This patch changes the above behavior by instead resetting the mask to
task_cs(tsk)->cpus_allowed by default, and to cpu_possible_mask in legacy
mode.

This fallback mechanism is only triggered if _every_ other valid avenue
has been traveled, and it is the last resort before calling BUG().

Signed-off-by: Joel Savitz <jsavitz@redhat.com>
---
 kernel/cgroup/cpuset.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4834c4214e9c..6c9deb2cc687 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3255,10 +3255,23 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
 
+/**
+ * cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.
+ * @tsk: pointer to task_struct with which the scheduler is struggling
+ *
+ * Description: In the case that the scheduler cannot find an allowed cpu in
+ * tsk->cpus_allowed, we fall back to task_cs(tsk)->cpus_allowed. In legacy
+ * mode however, this value is the same as task_cs(tsk)->effective_cpus,
+ * which will not contain a sane cpumask during cases such as cpu hotplugging.
+ * This is the absolute last resort for the scheduler and it is only used if
+ * _every_ other avenue has been traveled.
+ **/
+
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
 	rcu_read_lock();
-	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
+	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
+		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
 	rcu_read_unlock();
 
 	/*
-- 
2.18.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  2019-04-09 20:40 [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback() Joel Savitz
@ 2019-04-10 16:11 ` Waiman Long
  2019-04-10 17:44 ` Phil Auld
  2019-05-21 14:34 ` Michal Koutný
  2 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2019-04-10 16:11 UTC (permalink / raw)
  To: Joel Savitz, linux-kernel; +Cc: Phil Auld, Tejun Heo, Li Zefan, cgroups

On 04/09/2019 04:40 PM, Joel Savitz wrote:
> If a process is limited by taskset (i.e. cpuset) to only be allowed to
> run on cpu N, and then cpu N is offlined via hotplug, the process will
> be assigned the current value of its cpuset cgroup's effective_cpus field
> in a call to do_set_cpus_allowed() in cpuset_cpus_allowed_fallback().
> [...]

Acked-by: Waiman Long <longman@redhat.com>



* Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  2019-04-09 20:40 [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback() Joel Savitz
  2019-04-10 16:11 ` Waiman Long
@ 2019-04-10 17:44 ` Phil Auld
  2019-05-21 14:34 ` Michal Koutný
  2 siblings, 0 replies; 8+ messages in thread
From: Phil Auld @ 2019-04-10 17:44 UTC (permalink / raw)
  To: Joel Savitz; +Cc: linux-kernel, Waiman Long, Tejun Heo, Li Zefan, cgroups

On Tue, Apr 09, 2019 at 04:40:03PM -0400 Joel Savitz wrote:
> If a process is limited by taskset (i.e. cpuset) to only be allowed to
> run on cpu N, and then cpu N is offlined via hotplug, the process will
> be assigned the current value of its cpuset cgroup's effective_cpus field
> in a call to do_set_cpus_allowed() in cpuset_cpus_allowed_fallback().
> [...]

Fwiw,

Acked-by: Phil Auld <pauld@redhat.com>




* Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  2019-04-09 20:40 [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback() Joel Savitz
  2019-04-10 16:11 ` Waiman Long
  2019-04-10 17:44 ` Phil Auld
@ 2019-05-21 14:34 ` Michal Koutný
  2019-05-24 15:33   ` Joel Savitz
  2 siblings, 1 reply; 8+ messages in thread
From: Michal Koutný @ 2019-05-21 14:34 UTC (permalink / raw)
  To: Joel Savitz
  Cc: linux-kernel, Li Zefan, Tejun Heo, Waiman Long, Phil Auld, cgroups

On Tue, Apr 09, 2019 at 04:40:03PM -0400, Joel Savitz <jsavitz@redhat.com> wrote:
> 	$ grep Cpus /proc/$$/status
> 	Cpus_allowed:	ff
> 	Cpus_allowed_list:	0-7

(a)
 
> 	$ taskset -p 4 $$
> 	pid 19202's current affinity mask: f
> 	pid 19202's new affinity mask: 4
> 
> 	$ grep Cpus /proc/self/status
> 	Cpus_allowed:	04
> 	Cpus_allowed_list:	2
> 
> 	# echo off > /sys/devices/system/cpu/cpu2/online
> 	$ grep Cpus /proc/$$/status
> 	Cpus_allowed:	0b
> 	Cpus_allowed_list:	0-1,3
I'm confused where this value comes from, I must be missing something.

Joel, is the task in question put into a cpuset with 0xf CPUs only (at
point (a))? Or are the CPUs 4-7 offlined as well?

Thanks,
Michal


* Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  2019-05-21 14:34 ` Michal Koutný
@ 2019-05-24 15:33   ` Joel Savitz
  2019-05-28 12:10     ` Michal Koutný
  0 siblings, 1 reply; 8+ messages in thread
From: Joel Savitz @ 2019-05-24 15:33 UTC (permalink / raw)
  To: Michal Koutný
  Cc: linux-kernel, Li Zefan, Tejun Heo, Waiman Long, Phil Auld, cgroups

On Tue, May 21, 2019 at 10:35 AM Michal Koutný <mkoutny@suse.com> wrote:
> >       $ grep Cpus /proc/$$/status
> >       Cpus_allowed:   ff
> >       Cpus_allowed_list:      0-7
>
> (a)
>
> >       $ taskset -p 4 $$
> >       pid 19202's current affinity mask: f

> I'm confused where this value comes from, I must be missing something.
>
> Joel, is the task in question put into a cpuset with 0xf CPUs only (at
> point (a))? Or are the CPUs 4-7 offlined as well?

Good point.

It is a bit ambiguous, but I performed no action on the task's cpuset
nor did I offline any cpus at point (a).

After a bit of research, I am fairly certain that the observed
discrepancy is due to differing mechanisms used to acquire the cpuset
mask value.

The first mechanism, via `grep Cpus /proc/$$/status`, has its value
populated from the expression (task->cpus_allowed) in
fs/proc/array.c:task_cpus_allowed(), whereas the taskset utility
(https://github.com/karelzak/util-linux/blob/master/schedutils/taskset.c)
uses sched_getaffinity(2) to determine the "current affinity mask"
value from the expression (task->cpus_allowed & cpu_active_mask) in
kernel/sched/core.c:sched_getaffinity().

I do not know if there is an explicit reason for this discrepancy or
whether the two mechanisms were simply built independently, perhaps
for different purposes.

I think the /proc/$$/status value is intended to simply reflect the
user-specified policy stating which cpus the task is allowed to run on
without consideration for hardware state, whereas the taskset value is
representative of the cpus that the task can actually be run on given
the restriction policy specified by the user via the cpuset mechanism.

By the way, I posted a v2 of this patch that correctly handles cgroup
v2 behavior.


* Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  2019-05-24 15:33   ` Joel Savitz
@ 2019-05-28 12:10     ` Michal Koutný
  2019-06-13 12:02       ` Michal Koutný
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Koutný @ 2019-05-28 12:10 UTC (permalink / raw)
  To: Joel Savitz
  Cc: Li Zefan, Tejun Heo, Waiman Long, Phil Auld, cgroups, linux-kernel

Thanks for digging through this.

On Fri, May 24, 2019 at 11:33:55AM -0400, Joel Savitz <jsavitz@redhat.com> wrote:
> It is a bit ambiguous, but I performed no action on the task's cpuset
> nor did I offline any cpus at point (a).
So did you do any operation that left you with
    cpu_active_mask & 0xf0 == 0
?

(If so, I think the demo code should be made without it to avoid the
confusion.)

Regardless, the demo code should also specify in what cpuset it happens
(for the v2 case).

> I think the /proc/$$/status value is intended to simply reflect the
> user-specified policy stating which cpus the task is allowed to run on
> without consideration for hardware state, whereas the taskset value is
> representative of the cpus that the task can actually be run on given
> the restriction policy specified by the user via the cpuset mechanism.
Yes, it seems to me to be somewhat analogous to effective_cpus vs
cpus_allowed in the v2 cpuset.


> By the way, I posted a v2 of this patch that correctly handles cgroup
> v2 behavior.
I see the original version made the state = cpuset in select_fallback_rq
mostly redundant. The split on v2 (hierarchy) in v2 (patch) makes some
sense. Although, on v1 we will lose the "no longer affine to..." message
(which is what happens in your demo IIUC).
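For context, the cascade referred to here lives in
kernel/sched/core.c:select_fallback_rq(); roughly paraphrased (a sketch of
the control flow, not verbatim kernel source):

```c
/* Paraphrased sketch of the select_fallback_rq() state machine. */
for (;;) {
	/* Try every allowed CPU that is still active. */
	for_each_cpu(dest_cpu, &p->cpus_allowed)
		if (cpu_active(dest_cpu))
			goto out;

	switch (state) {
	case cpuset:
		/* Widen the mask via the cpuset hierarchy; the patch
		 * under discussion changes which mask this picks. */
		cpuset_cpus_allowed_fallback(p);
		state = possible;
		break;
	case possible:
		/* Last resort: allow every possible CPU. */
		do_set_cpus_allowed(p, cpu_possible_mask);
		state = fail;
		break;
	case fail:
		BUG();
	}
}
```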

Michal


* Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  2019-05-28 12:10     ` Michal Koutný
@ 2019-06-13 12:02       ` Michal Koutný
  2019-06-13 14:52         ` Joel Savitz
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Koutný @ 2019-06-13 12:02 UTC (permalink / raw)
  To: Joel Savitz
  Cc: Li Zefan, Tejun Heo, Waiman Long, Phil Auld, cgroups, linux-kernel

On Tue, May 28, 2019 at 02:10:37PM +0200, Michal Koutný  <MKoutny@suse.com> wrote:
> Although, on v1 we will lose the "no longer affine to..." message
> (which is what happens in your demo IIUC).
FWIW, I was wrong, off by one 'state' transition. So the patch doesn't
cause change in messaging (not tested though).


* Re: [PATCH v2] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
  2019-06-13 12:02       ` Michal Koutný
@ 2019-06-13 14:52         ` Joel Savitz
  0 siblings, 0 replies; 8+ messages in thread
From: Joel Savitz @ 2019-06-13 14:52 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Li Zefan, Tejun Heo, Waiman Long, Phil Auld, cgroups, linux-kernel

I just did a quick test on a patched kernel to check on that "no
longer affine to..." message:

# nproc
64

# taskset -p 4 $$
pid 2261's current affinity mask: ffffffffffffffff
pid 2261's new affinity mask: 4
# echo off > /sys/devices/system/cpu/cpu2/online
# taskset -p $$
pid 2261's current affinity mask: fffffffffffffffb
# echo on > /sys/devices/system/cpu/cpu2/online
# taskset -p $$
pid 2261's current affinity mask: ffffffffffffffff

# dmesg | tail -5
[  143.996375] process 2261 (bash) no longer affine to cpu2
[  143.996657] IRQ 114: no longer affine to CPU2
[  144.007472] IRQ 227: no longer affine to CPU2
[  144.013460] smpboot: CPU 2 is now offline
[  162.685519] smpboot: Booting Node 0 Processor 2 APIC 0x4

dmesg output is observably the same on patched and unpatched kernels
in this case.

The only difference in output is that on an unpatched kernel, the last
`taskset -p $$` outputs:

pid 2274's current affinity mask: fffffffffffffffb

which is the behavior that this patch aims to modify.

This case, which I believe is generalizable, demonstrates that we
retain the "no longer affine to..." output on a kernel with this
patch.

Best,
Joel Savitz



On Thu, Jun 13, 2019 at 8:02 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Tue, May 28, 2019 at 02:10:37PM +0200, Michal Koutný  <MKoutny@suse.com> wrote:
> > Although, on v1 we will lose the "no longer affine to..." message
> > (which is what happens in your demo IIUC).
> FWIW, I was wrong, off by one 'state' transition. So the patch doesn't
> cause change in messaging (not tested though).

