LKML Archive on lore.kernel.org
* [PATCH] sched/fair: make CFS bandwidth slice per cpu group
@ 2018-04-30 19:29 Cong Wang
  2018-04-30 19:42 ` Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Cong Wang @ 2018-04-30 19:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Cong Wang, Paul Turner, Peter Zijlstra, Mike Galbraith,
	Thomas Gleixner, Ingo Molnar

The value of sched_cfs_bandwidth_slice_us plays an important role in
distributing CPU time from the global pool to each local CPU: a smaller
slice distributes CPU time more fairly among parallel tasks running on
different CPUs.

Currently, sched_cfs_bandwidth_slice_us is a global setting which
affects all cgroups. Different groups may want different values based
on their own workloads; one size doesn't fit all. The global pool that
is refilled periodically is per cgroup too, so each group should be able
to distribute its own quota to its local CPUs at its own frequency.

This patch introduces cpu.cfs_slice_us, which allows each cgroup to
specify its own slice length without any global impact. The global
sysctl sched_cfs_bandwidth_slice_us now becomes the default value of
each cpu.cfs_slice_us. Note that updating the sysctl does not
automatically update existing cgroups that use the default value; each
cgroup has to be updated individually to make a global change.

Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 Documentation/scheduler/sched-bwc.txt | 14 +++++++----
 kernel/sched/core.c                   | 45 +++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c                   | 17 ++++++-------
 kernel/sched/sched.h                  |  6 +++++
 4 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt
index f6b1873f68ab..b2d6ff02e5b3 100644
--- a/Documentation/scheduler/sched-bwc.txt
+++ b/Documentation/scheduler/sched-bwc.txt
@@ -48,19 +48,25 @@ and return the group to an unconstrained state once more.
 Any updates to a group's bandwidth specification will result in it becoming
 unthrottled if it is in a constrained state.
 
-System wide settings
+Slice
 --------------------
 For efficiency run-time is transferred between the global pool and CPU local
 "silos" in a batch fashion.  This greatly reduces global accounting pressure
 on large systems.  The amount transferred each time such an update is required
 is described as the "slice".
 
-This is tunable via procfs:
-	/proc/sys/kernel/sched_cfs_bandwidth_slice_us (default=5ms)
-
 Larger slice values will reduce transfer overheads, while smaller values allow
 for more fine-grained consumption.
 
+The per-group file cpu.cfs_slice_us controls the slice length within each CPU
+group; different groups can set different values to suit their workloads.
+Its default value is tunable via procfs:
+
+	/proc/sys/kernel/sched_cfs_bandwidth_slice_us (default=5ms)
+
+Note that updating this file does not automatically update existing groups
+using the default slice.
+
 Statistics
 ----------
 A group's bandwidth statistics are exported via 3 fields in cpu.stat.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e10aaeebfcc..cafdfd18be36 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6670,6 +6670,34 @@ long tg_get_cfs_period(struct task_group *tg)
 	return cfs_period_us;
 }
 
+int tg_set_cfs_slice(struct task_group *tg, long cfs_slice_us)
+{
+	u64 quota, slice;
+
+	quota = tg->cfs_bandwidth.quota;
+	if (quota == RUNTIME_INF)
+		return -EINVAL;
+	if (cfs_slice_us <= 0)
+		return -ERANGE;
+
+	slice = (u64)cfs_slice_us * NSEC_PER_USEC;
+	if (slice > quota)
+		return -EINVAL;
+
+	tg->cfs_bandwidth.slice = slice;
+	return 0;
+}
+
+long tg_get_cfs_slice(struct task_group *tg)
+{
+	u64 slice_us;
+
+	slice_us = tg->cfs_bandwidth.slice;
+	do_div(slice_us, NSEC_PER_USEC);
+
+	return slice_us;
+}
+
 static s64 cpu_cfs_quota_read_s64(struct cgroup_subsys_state *css,
 				  struct cftype *cft)
 {
@@ -6694,6 +6722,18 @@ static int cpu_cfs_period_write_u64(struct cgroup_subsys_state *css,
 	return tg_set_cfs_period(css_tg(css), cfs_period_us);
 }
 
+static s64 cpu_cfs_slice_read_s64(struct cgroup_subsys_state *css,
+				  struct cftype *cft)
+{
+	return tg_get_cfs_slice(css_tg(css));
+}
+
+static int cpu_cfs_slice_write_s64(struct cgroup_subsys_state *css,
+				   struct cftype *cftype, s64 cfs_quota_us)
+{
+	return tg_set_cfs_slice(css_tg(css), cfs_quota_us);
+}
+
 struct cfs_schedulable_data {
 	struct task_group *tg;
 	u64 period, quota;
@@ -6837,6 +6877,11 @@ static struct cftype cpu_legacy_files[] = {
 		.write_u64 = cpu_cfs_period_write_u64,
 	},
 	{
+		.name = "cfs_slice_us",
+		.read_s64 = cpu_cfs_slice_read_s64,
+		.write_s64 = cpu_cfs_slice_write_s64,
+	},
+	{
 		.name = "stat",
 		.seq_show = cpu_cfs_stat_show,
 	},
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 54dc31e7ab9b..44b21e70a9b1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4539,11 +4539,6 @@ static inline u64 default_cfs_period(void)
 	return 100000000ULL;
 }
 
-static inline u64 sched_cfs_bandwidth_slice(void)
-{
-	return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
-}
-
 /*
  * Replenish runtime according to assigned quota and update expiration time.
  * We use sched_clock_cpu directly instead of rq->clock to avoid adding
@@ -4577,6 +4572,11 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
 	return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
 }
 
+static inline u64 cfs_bandwidth_slice(struct cfs_bandwidth *cfs_b)
+{
+	return cfs_b->slice;
+}
+
 /* returns 0 on failure to allocate runtime */
 static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
@@ -4585,7 +4585,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	u64 amount = 0, min_amount, expires;
 
 	/* note: this is a positive sum as runtime_remaining <= 0 */
-	min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
+	min_amount = cfs_bandwidth_slice(cfs_b) - cfs_rq->runtime_remaining;
 
 	raw_spin_lock(&cfs_b->lock);
 	if (cfs_b->quota == RUNTIME_INF)
@@ -5004,7 +5004,7 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 		cfs_b->runtime += slack_runtime;
 
 		/* we are under rq->lock, defer unthrottling using a timer */
-		if (cfs_b->runtime > sched_cfs_bandwidth_slice() &&
+		if (cfs_b->runtime > cfs_bandwidth_slice(cfs_b) &&
 		    !list_empty(&cfs_b->throttled_cfs_rq))
 			start_cfs_slack_bandwidth(cfs_b);
 	}
@@ -5031,7 +5031,7 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
  */
 static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 {
-	u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
+	u64 runtime = 0, slice = cfs_bandwidth_slice(cfs_b);
 	u64 expires;
 
 	/* confirm we're still not at a refresh boundary */
@@ -5157,6 +5157,7 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 	cfs_b->runtime = 0;
 	cfs_b->quota = RUNTIME_INF;
 	cfs_b->period = ns_to_ktime(default_cfs_period());
+	cfs_b->slice = sched_default_cfs_bandwidth_slice();
 
 	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
 	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 15750c222ca2..204377652edb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -331,6 +331,7 @@ struct cfs_bandwidth {
 	raw_spinlock_t		lock;
 	ktime_t			period;
 	u64			quota;
+	u64			slice;
 	u64			runtime;
 	s64			hierarchical_quota;
 	u64			runtime_expires;
@@ -432,6 +433,11 @@ extern void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 			struct sched_entity *parent);
 extern void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
 
+static inline u64 sched_default_cfs_bandwidth_slice(void)
+{
+	return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
+}
+
 extern void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b);
 extern void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
 extern void unthrottle_cfs_rq(struct cfs_rq *cfs_rq);
-- 
2.13.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group
  2018-04-30 19:29 [PATCH] sched/fair: make CFS bandwidth slice per cpu group Cong Wang
@ 2018-04-30 19:42 ` Peter Zijlstra
  2018-04-30 20:37   ` Cong Wang
  2018-05-01  0:29 ` kbuild test robot
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2018-04-30 19:42 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, Paul Turner, Mike Galbraith, Thomas Gleixner, Ingo Molnar

On Mon, Apr 30, 2018 at 12:29:25PM -0700, Cong Wang wrote:
> Currently, sched_cfs_bandwidth_slice_us is a global setting which
> affects all cgroups. Different groups may want different values based
> on their own workloads; one size doesn't fit all. The global pool that
> is refilled periodically is per cgroup too, so each group should be able
> to distribute its own quota to its local CPUs at its own frequency.

Why.. what happens? This doesn't really tell us anything.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group
  2018-04-30 19:42 ` Peter Zijlstra
@ 2018-04-30 20:37   ` Cong Wang
  2018-05-01  7:11     ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2018-04-30 20:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Paul Turner, Mike Galbraith, Thomas Gleixner, Ingo Molnar

On Mon, Apr 30, 2018 at 12:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Apr 30, 2018 at 12:29:25PM -0700, Cong Wang wrote:
>> Currently, sched_cfs_bandwidth_slice_us is a global setting which
>> affects all cgroups. Different groups may want different values based
>> on their own workloads; one size doesn't fit all. The global pool that
>> is refilled periodically is per cgroup too, so each group should be able
>> to distribute its own quota to its local CPUs at its own frequency.
>
> Why.. what happens? This doesn't really tell us anything.

We saw tasks in a container get throttled many times even though
they did not appear to over-consume the CPUs. I reduced
sched_cfs_bandwidth_slice_us from the default 5ms to 1ms, which
solved the problem: no tasks were throttled after the change.
This is why I want to change it.

And I don't think 1ms will be good for all containers, so to
minimize the impact I would like to keep the slice change within
each container. This is why I propose this patch rather than just
`sysctl -w`. Do you think otherwise?

BTW, people reported a similar (if not same) issue here before:
https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group
  2018-04-30 19:29 [PATCH] sched/fair: make CFS bandwidth slice per cpu group Cong Wang
  2018-04-30 19:42 ` Peter Zijlstra
@ 2018-05-01  0:29 ` kbuild test robot
  2018-05-01  0:29 ` [RFC PATCH] sched/fair: tg_set_cfs_slice() can be static kbuild test robot
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2018-05-01  0:29 UTC (permalink / raw)
  To: Cong Wang
  Cc: kbuild-all, linux-kernel, Cong Wang, Paul Turner, Peter Zijlstra,
	Mike Galbraith, Thomas Gleixner, Ingo Molnar

Hi Cong,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.17-rc3 next-20180430]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Cong-Wang/sched-fair-make-CFS-bandwidth-slice-per-cpu-group/20180501-064013
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   kernel/sched/core.c:509:9: sparse: incompatible types in comparison expression (different address spaces)
   kernel/sched/core.c:1614:17: sparse: incompatible types in comparison expression (different address spaces)
   kernel/sched/core.c:1802:27: sparse: incompatible types in comparison expression (different address spaces)
   kernel/sched/core.c:6541:11: sparse: symbol 'max_cfs_quota_period' was not declared. Should it be static?
   kernel/sched/core.c:6542:11: sparse: symbol 'min_cfs_quota_period' was not declared. Should it be static?
   kernel/sched/core.c:6622:5: sparse: symbol 'tg_set_cfs_quota' was not declared. Should it be static?
   kernel/sched/core.c:6635:6: sparse: symbol 'tg_get_cfs_quota' was not declared. Should it be static?
   kernel/sched/core.c:6648:5: sparse: symbol 'tg_set_cfs_period' was not declared. Should it be static?
   kernel/sched/core.c:6658:6: sparse: symbol 'tg_get_cfs_period' was not declared. Should it be static?
>> kernel/sched/core.c:6668:5: sparse: symbol 'tg_set_cfs_slice' was not declared. Should it be static?
>> kernel/sched/core.c:6686:6: sparse: symbol 'tg_get_cfs_slice' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH] sched/fair: tg_set_cfs_slice() can be static
  2018-04-30 19:29 [PATCH] sched/fair: make CFS bandwidth slice per cpu group Cong Wang
  2018-04-30 19:42 ` Peter Zijlstra
  2018-05-01  0:29 ` kbuild test robot
@ 2018-05-01  0:29 ` kbuild test robot
  2018-05-01  0:31 ` [PATCH] sched/fair: make CFS bandwidth slice per cpu group kbuild test robot
  2018-05-01  0:57 ` kbuild test robot
  4 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2018-05-01  0:29 UTC (permalink / raw)
  To: Cong Wang
  Cc: kbuild-all, linux-kernel, Cong Wang, Paul Turner, Peter Zijlstra,
	Mike Galbraith, Thomas Gleixner, Ingo Molnar


Fixes: 601aedfafd22 ("sched/fair: make CFS bandwidth slice per cpu group")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index abc48d9..daa1843 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6665,7 +6665,7 @@ long tg_get_cfs_period(struct task_group *tg)
 	return cfs_period_us;
 }
 
-int tg_set_cfs_slice(struct task_group *tg, long cfs_slice_us)
+static int tg_set_cfs_slice(struct task_group *tg, long cfs_slice_us)
 {
 	u64 quota, slice;
 
@@ -6683,7 +6683,7 @@ int tg_set_cfs_slice(struct task_group *tg, long cfs_slice_us)
 	return 0;
 }
 
-long tg_get_cfs_slice(struct task_group *tg)
+static long tg_get_cfs_slice(struct task_group *tg)
 {
 	u64 slice_us;
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group
  2018-04-30 19:29 [PATCH] sched/fair: make CFS bandwidth slice per cpu group Cong Wang
                   ` (2 preceding siblings ...)
  2018-05-01  0:29 ` [RFC PATCH] sched/fair: tg_set_cfs_slice() can be static kbuild test robot
@ 2018-05-01  0:31 ` kbuild test robot
  2018-05-01  0:57 ` kbuild test robot
  4 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2018-05-01  0:31 UTC (permalink / raw)
  To: Cong Wang
  Cc: kbuild-all, linux-kernel, Cong Wang, Paul Turner, Peter Zijlstra,
	Mike Galbraith, Thomas Gleixner, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 1442 bytes --]

Hi Cong,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.17-rc3 next-20180430]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Cong-Wang/sched-fair-make-CFS-bandwidth-slice-per-cpu-group/20180501-064013
config: i386-randconfig-a0-201817 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from kernel/sched/idle.c:8:0:
   kernel/sched/sched.h: In function 'sched_default_cfs_bandwidth_slice':
>> kernel/sched/sched.h:438:14: error: 'sysctl_sched_cfs_bandwidth_slice' undeclared (first use in this function)
     return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
                 ^
   kernel/sched/sched.h:438:14: note: each undeclared identifier is reported only once for each function it appears in

vim +/sysctl_sched_cfs_bandwidth_slice +438 kernel/sched/sched.h

   435	
   436	static inline u64 sched_default_cfs_bandwidth_slice(void)
   437	{
 > 438		return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
   439	}
   440	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32292 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group
  2018-04-30 19:29 [PATCH] sched/fair: make CFS bandwidth slice per cpu group Cong Wang
                   ` (3 preceding siblings ...)
  2018-05-01  0:31 ` [PATCH] sched/fair: make CFS bandwidth slice per cpu group kbuild test robot
@ 2018-05-01  0:57 ` kbuild test robot
  4 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2018-05-01  0:57 UTC (permalink / raw)
  To: Cong Wang
  Cc: kbuild-all, linux-kernel, Cong Wang, Paul Turner, Peter Zijlstra,
	Mike Galbraith, Thomas Gleixner, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]

Hi Cong,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.17-rc3 next-20180430]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Cong-Wang/sched-fair-make-CFS-bandwidth-slice-per-cpu-group/20180501-064013
config: i386-randconfig-x003-201817 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from kernel/sched/debug.c:12:0:
   kernel/sched/sched.h: In function 'sched_default_cfs_bandwidth_slice':
>> kernel/sched/sched.h:438:14: error: 'sysctl_sched_cfs_bandwidth_slice' undeclared (first use in this function); did you mean 'sched_default_cfs_bandwidth_slice'?
     return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                 sched_default_cfs_bandwidth_slice
   kernel/sched/sched.h:438:14: note: each undeclared identifier is reported only once for each function it appears in

vim +438 kernel/sched/sched.h

   435	
   436	static inline u64 sched_default_cfs_bandwidth_slice(void)
   437	{
 > 438		return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
   439	}
   440	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26030 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group
  2018-04-30 20:37   ` Cong Wang
@ 2018-05-01  7:11     ` Peter Zijlstra
  2018-05-01 18:06       ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2018-05-01  7:11 UTC (permalink / raw)
  To: Cong Wang; +Cc: LKML, Paul Turner, Mike Galbraith, Thomas Gleixner, Ingo Molnar

On Mon, Apr 30, 2018 at 01:37:16PM -0700, Cong Wang wrote:
> On Mon, Apr 30, 2018 at 12:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Apr 30, 2018 at 12:29:25PM -0700, Cong Wang wrote:
> >> Currently, sched_cfs_bandwidth_slice_us is a global setting which
> >> affects all cgroups. Different groups may want different values based
> >> on their own workloads; one size doesn't fit all. The global pool that
> >> is refilled periodically is per cgroup too, so each group should be able
> >> to distribute its own quota to its local CPUs at its own frequency.
> >
> > Why.. what happens? This doesn't really tell us anything.
> 
> We saw tasks in a container get throttled many times even though
> they did not appear to over-consume the CPUs. I reduced
> sched_cfs_bandwidth_slice_us from the default 5ms to 1ms, which
> solved the problem: no tasks were throttled after the change.
> This is why I want to change it.

The 1ms slice distributes time better at the cost of higher overhead,
right?

> And I don't think 1ms will be good for all containers, so to
> minimize the impact I would like to keep the slice change within
> each container. This is why I propose this patch rather than just
> `sysctl -w`. Do you think otherwise?

Well, I think I don't quite remember everything and a Changelog that
tells me why you want stuff in a little more detail and helps me
remember some things is a lot more useful than me having to go dig
through the code myself (which I'll invariably postpone because I'm a
busy sort of person).

> BTW, people reported a similar (if not same) issue here before:
> https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1

That's not a report, that's a random person on the interweb posting
random crap. A report lands in my inbox.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group
  2018-05-01  7:11     ` Peter Zijlstra
@ 2018-05-01 18:06       ` Cong Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2018-05-01 18:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Paul Turner, Mike Galbraith, Thomas Gleixner, Ingo Molnar

On Tue, May 1, 2018 at 12:11 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Apr 30, 2018 at 01:37:16PM -0700, Cong Wang wrote:
>> On Mon, Apr 30, 2018 at 12:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, Apr 30, 2018 at 12:29:25PM -0700, Cong Wang wrote:
>> >> Currently, sched_cfs_bandwidth_slice_us is a global setting which
>> >> affects all cgroups. Different groups may want different values based
>> >> on their own workloads; one size doesn't fit all. The global pool that
>> >> is refilled periodically is per cgroup too, so each group should be able
>> >> to distribute its own quota to its local CPUs at its own frequency.
>> >
>> > Why.. what happens? This doesn't really tell us anything.
>>
>> We saw tasks in a container get throttled many times even though
>> they did not appear to over-consume the CPUs. I reduced
>> sched_cfs_bandwidth_slice_us from the default 5ms to 1ms, which
>> solved the problem: no tasks were throttled after the change.
>> This is why I want to change it.
>
> The 1ms slice distributes time better at the cost of higher overhead,
> right?


Right, slightly higher. According to this paper [1]:

"While decreasing this value increases the frequency at which CPUs
request for quota from the global pool, we did not notice any measurable
impact on performance."

1. https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf


>
>> And I don't think 1ms will be good for all containers, so to
>> minimize the impact I would like to keep the slice change within
>> each container. This is why I propose this patch rather than just
>> `sysctl -w`. Do you think otherwise?
>
> Well, I think I don't quite remember everything and a Changelog that
> tells me why you want stuff in a little more detail and helps me
> remember some things is a lot more useful than me having to go dig
> through the code myself (which I'll invariably postpone because I'm a
> busy sort of person).

Sure, just tell me how I should improve my changelog. I didn't
realize I needed to state why I want to change sched_cfs_bandwidth_slice_us,
since it is already a tunable. I will add that in v2.

>
>> BTW, people reported a similar (if not same) issue here before:
>> https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
>
> That's not a report, that's a random person on the interweb posting
> random crap. A report lands in my inbox.

Well, I found it via LKML:
https://marc.info/?l=linux-kernel&m=151270583632566&w=2

(I was too lazy to search for the original LKML link.)

I will update this patch to improve the changelog and address kbuild
warnings, if you don't object.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-05-01 18:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-30 19:29 [PATCH] sched/fair: make CFS bandwidth slice per cpu group Cong Wang
2018-04-30 19:42 ` Peter Zijlstra
2018-04-30 20:37   ` Cong Wang
2018-05-01  7:11     ` Peter Zijlstra
2018-05-01 18:06       ` Cong Wang
2018-05-01  0:29 ` kbuild test robot
2018-05-01  0:29 ` [RFC PATCH] sched/fair: tg_set_cfs_slice() can be static kbuild test robot
2018-05-01  0:31 ` [PATCH] sched/fair: make CFS bandwidth slice per cpu group kbuild test robot
2018-05-01  0:57 ` kbuild test robot
