LKML Archive on lore.kernel.org
* [PATCH 0/2 v2] Add statistics and document for cfs_b burst
@ 2021-08-30  3:22 Huaixin Chang
  2021-08-30  3:22 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Huaixin Chang @ 2021-08-30  3:22 UTC
  To: linux-kernel
  Cc: peterz, anderson, baruah, bsegall, changhuaixin,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong,
	daniel.m.jordan

Changelog:
v2:
- Use burst_time in nanoseconds for cgroup1 interface, and burst_usec
  in microseconds for cgroup2 interface.
- Minor document adjustment.

v1 Link:
https://lore.kernel.org/lkml/20210816070849.3153-1-changhuaixin@linux.alibaba.com/

Huaixin Chang (2):
  sched/fair: Add cfs bandwidth burst statistics
  sched/fair: Add document for burstable CFS bandwidth

 Documentation/admin-guide/cgroup-v2.rst |  8 ++++
 Documentation/scheduler/sched-bwc.rst   | 85 +++++++++++++++++++++++++++++----
 kernel/sched/core.c                     | 13 +++--
 kernel/sched/fair.c                     |  9 ++++
 kernel/sched/sched.h                    |  3 ++
 5 files changed, 105 insertions(+), 13 deletions(-)

-- 
2.14.4.44.g2045bb6



* [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-08-30  3:22 [PATCH 0/2 v2] Add statistics and document for cfs_b burst Huaixin Chang
@ 2021-08-30  3:22 ` Huaixin Chang
  2021-09-03 18:47   ` Benjamin Segall
                     ` (2 more replies)
  2021-08-30  3:22 ` [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 16+ messages in thread
From: Huaixin Chang @ 2021-08-30  3:22 UTC
  To: linux-kernel
  Cc: peterz, anderson, baruah, bsegall, changhuaixin,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong,
	daniel.m.jordan

Two new statistics are introduced to show the internals of the burst feature
and explain why burst helps or not.

nr_bursts:  number of periods in which a bandwidth burst occurs
burst_time: cumulative wall-time (in nanoseconds) that any CPUs have
	    used above quota in respective periods

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
---
 kernel/sched/core.c  | 13 ++++++++++---
 kernel/sched/fair.c  |  9 +++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 20ffcc044134..d00b92712253 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10068,6 +10068,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
 
+	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
+	seq_printf(sf, "burst_time %llu\n", cfs_b->burst_time);
+
 	return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -10164,16 +10167,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
 	{
 		struct task_group *tg = css_tg(css);
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
-		u64 throttled_usec;
+		u64 throttled_usec, burst_usec;
 
 		throttled_usec = cfs_b->throttled_time;
 		do_div(throttled_usec, NSEC_PER_USEC);
+		burst_usec = cfs_b->burst_time;
+		do_div(burst_usec, NSEC_PER_USEC);
 
 		seq_printf(sf, "nr_periods %d\n"
 			   "nr_throttled %d\n"
-			   "throttled_usec %llu\n",
+			   "throttled_usec %llu\n"
+			   "nr_bursts %d\n"
+			   "burst_usec %llu\n",
 			   cfs_b->nr_periods, cfs_b->nr_throttled,
-			   throttled_usec);
+			   throttled_usec, cfs_b->nr_burst, burst_usec);
 	}
 #endif
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44c452072a1b..464371f364f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4655,11 +4655,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	s64 runtime;
+
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
 	cfs_b->runtime += cfs_b->quota;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	if (runtime > 0) {
+		cfs_b->burst_time += runtime;
+		cfs_b->nr_burst++;
+	}
+
 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
+	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14a41a243f7b..80e4322727b4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -367,6 +367,7 @@ struct cfs_bandwidth {
 	u64			quota;
 	u64			runtime;
 	u64			burst;
+	u64			runtime_snap;
 	s64			hierarchical_quota;
 
 	u8			idle;
@@ -379,7 +380,9 @@ struct cfs_bandwidth {
 	/* Statistics: */
 	int			nr_periods;
 	int			nr_throttled;
+	int			nr_burst;
 	u64			throttled_time;
+	u64			burst_time;
 #endif
 };
 
-- 
2.14.4.44.g2045bb6



* [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth
  2021-08-30  3:22 [PATCH 0/2 v2] Add statistics and document for cfs_b burst Huaixin Chang
  2021-08-30  3:22 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
@ 2021-08-30  3:22 ` Huaixin Chang
  2021-09-09 11:18   ` [tip: sched/core] " tip-bot2 for Huaixin Chang
  2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
  2021-08-30 16:49 ` [PATCH 0/2 v2] Add statistics and document for cfs_b burst Tejun Heo
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Huaixin Chang @ 2021-08-30  3:22 UTC
  To: linux-kernel
  Cc: peterz, anderson, baruah, bsegall, changhuaixin,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong,
	daniel.m.jordan

Basic description of usage and effect for CFS Bandwidth Control Burst.

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  8 ++++
 Documentation/scheduler/sched-bwc.rst   | 85 +++++++++++++++++++++++++++++----
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 5c7377b5bd3e..f9d749d2daf2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1016,6 +1016,8 @@ All time durations are in microseconds.
 	- nr_periods
 	- nr_throttled
 	- throttled_usec
+	- nr_bursts
+	- burst_usec
 
   cpu.weight
 	A read-write single value file which exists on non-root
@@ -1047,6 +1049,12 @@ All time durations are in microseconds.
 	$PERIOD duration.  "max" for $MAX indicates no limit.  If only
 	one number is written, $MAX is updated.
 
+  cpu.max.burst
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	The burst in the range [0, $MAX].
+
   cpu.pressure
 	A read-write nested-keyed file.
 
diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst
index 1fc73555f5c4..05cfefdeebb2 100644
--- a/Documentation/scheduler/sched-bwc.rst
+++ b/Documentation/scheduler/sched-bwc.rst
@@ -22,39 +22,89 @@ cfs_quota units at each period boundary. As threads consume this bandwidth it
 is transferred to cpu-local "silos" on a demand basis. The amount transferred
 within each of these updates is tunable and described as the "slice".
 
+Burst feature
+-------------
+This feature borrows time now against our future underrun, at the cost of
+increased interference against the other system users. All nicely bounded.
+
+Traditional (UP-EDF) bandwidth control is something like:
+
+  (U = \Sum u_i) <= 1
+
+This guarantees both that every deadline is met and that the system is
+stable. After all, if U were > 1, then for every second of walltime,
+we'd have to run more than a second of program time and would obviously
+miss our deadline; the next deadline is further out still, so there is
+never time to catch up: the failure is unbounded.
+
+The burst feature observes that a workload doesn't always execute the full
+quota; this enables one to describe u_i as a statistical distribution.
+
+For example, have u_i = {x,e}_i, where x is the p(95) and x+e is the p(100)
+(the traditional WCET). This effectively allows u to be smaller,
+increasing the efficiency (we can pack more tasks in the system), but at
+the cost of missing deadlines when all the odds line up. However, it
+does maintain stability, since every overrun must be paired with an
+underrun as long as our x is above the average.
+
+That is, suppose we have 2 tasks that both specify a p(95) value; then we
+have a p(95)*p(95) = 90.25% chance both tasks are within their quota and
+everything is good. At the same time we have a p(5)*p(5) = 0.25% chance
+both tasks will exceed their quota at the same time (guaranteed deadline
+fail). Somewhere in between there's a threshold where one exceeds and
+the other doesn't underrun enough to compensate; this depends on the
+specific CDFs.
+
+At the same time, we can say that the worst-case deadline miss will be
+\Sum e_i; that is, there is a bounded tardiness (under the assumption
+that x+e is indeed WCET).
+
+The interference when using burst is evaluated by the probability of
+missing the deadline and the average WCET. Test results showed that when
+there are many cgroups or the CPU is underutilized, the interference is
+limited. More details are shown in:
+https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
+
 Management
 ----------
-Quota and period are managed within the cpu subsystem via cgroupfs.
+Quota, period and burst are managed within the cpu subsystem via cgroupfs.
 
 .. note::
    The cgroupfs files described in this section are only applicable
    to cgroup v1. For cgroup v2, see
    :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`.
 
-- cpu.cfs_quota_us: the total available run-time within a period (in
-  microseconds)
+- cpu.cfs_quota_us: run-time replenished within a period (in microseconds)
 - cpu.cfs_period_us: the length of a period (in microseconds)
 - cpu.stat: exports throttling statistics [explained further below]
+- cpu.cfs_burst_us: the maximum accumulated run-time (in microseconds)
 
 The default values are::
 
 	cpu.cfs_period_us=100ms
-	cpu.cfs_quota=-1
+	cpu.cfs_quota_us=-1
+	cpu.cfs_burst_us=0
 
 A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
 bandwidth restriction in place, such a group is described as an unconstrained
 bandwidth group. This represents the traditional work-conserving behavior for
 CFS.
 
-Writing any (valid) positive value(s) will enact the specified bandwidth limit.
-The minimum quota allowed for the quota or period is 1ms. There is also an
-upper bound on the period length of 1s. Additional restrictions exist when
-bandwidth limits are used in a hierarchical fashion, these are explained in
-more detail below.
+Writing any (valid) positive value(s) no smaller than cpu.cfs_burst_us will
+enact the specified bandwidth limit. The minimum value allowed for the quota or
+period is 1ms. There is also an upper bound on the period length of 1s.
+Additional restrictions exist when bandwidth limits are used in a hierarchical
+fashion; these are explained in more detail below.
 
 Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
 and return the group to an unconstrained state once more.
 
+A value of 0 for cpu.cfs_burst_us indicates that the group cannot accumulate
+any unused bandwidth, which leaves the traditional bandwidth control behavior
+of CFS unchanged. Writing any (valid) positive value no larger than
+cpu.cfs_quota_us into cpu.cfs_burst_us will enact the cap on unused bandwidth
+accumulation.
+
 Any updates to a group's bandwidth specification will result in it becoming
 unthrottled if it is in a constrained state.
 
@@ -74,7 +124,7 @@ for more fine-grained consumption.
 
 Statistics
 ----------
-A group's bandwidth statistics are exported via 3 fields in cpu.stat.
+A group's bandwidth statistics are exported via 5 fields in cpu.stat.
 
 cpu.stat:
 
@@ -82,6 +132,9 @@ cpu.stat:
 - nr_throttled: Number of times the group has been throttled/limited.
 - throttled_time: The total time duration (in nanoseconds) for which entities
   of the group have been throttled.
+- nr_bursts: Number of periods in which a burst occurs.
+- burst_time: Cumulative wall-time (in nanoseconds) that any CPUs have used
+  above quota in respective periods.
 
 This interface is read-only.
 
@@ -179,3 +232,15 @@ Examples
 
    By using a small period here we are ensuring a consistent latency
    response at the expense of burst capacity.
+
+4. Limit a group to 40% of 1 CPU, and allow it to accumulate up to an
+   additional 20% of 1 CPU, which can be spent once it has been accumulated.
+
+   With a 50ms period, a 20ms quota is equivalent to 40% of 1 CPU,
+   and a 10ms burst is equivalent to 20% of 1 CPU.
+
+	# echo 20000 > cpu.cfs_quota_us /* quota = 20ms */
+	# echo 50000 > cpu.cfs_period_us /* period = 50ms */
+	# echo 10000 > cpu.cfs_burst_us /* burst = 10ms */
+
+   A larger burst setting (no larger than the quota) allows greater burst capacity.
-- 
2.14.4.44.g2045bb6



* Re: [PATCH 0/2 v2] Add statistics and document for cfs_b burst
  2021-08-30  3:22 [PATCH 0/2 v2] Add statistics and document for cfs_b burst Huaixin Chang
  2021-08-30  3:22 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
  2021-08-30  3:22 ` [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
@ 2021-08-30 16:49 ` Tejun Heo
  2021-09-01 15:37 ` Daniel Jordan
  2021-09-03 13:12 ` changhuaixin
  4 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2021-08-30 16:49 UTC
  To: Huaixin Chang
  Cc: linux-kernel, peterz, anderson, baruah, bsegall,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong,
	daniel.m.jordan

On Mon, Aug 30, 2021 at 11:22:13AM +0800, Huaixin Chang wrote:
> Changelog:
> v2:
> - Use burst_time in nanoseconds for cgroup1 interface, and burst_usec
>   in microseconds for cgroup2 interface.
> - Minor document adjustment.
> 
> v1 Link:
> https://lore.kernel.org/lkml/20210816070849.3153-1-changhuaixin@linux.alibaba.com/

From cgroup interface pov, looks good to me.

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH 0/2 v2] Add statistics and document for cfs_b burst
  2021-08-30  3:22 [PATCH 0/2 v2] Add statistics and document for cfs_b burst Huaixin Chang
                   ` (2 preceding siblings ...)
  2021-08-30 16:49 ` [PATCH 0/2 v2] Add statistics and document for cfs_b burst Tejun Heo
@ 2021-09-01 15:37 ` Daniel Jordan
  2021-09-03 13:12 ` changhuaixin
  4 siblings, 0 replies; 16+ messages in thread
From: Daniel Jordan @ 2021-09-01 15:37 UTC
  To: Huaixin Chang
  Cc: linux-kernel, peterz, anderson, baruah, bsegall,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

On Mon, Aug 30, 2021 at 11:22:13AM +0800, Huaixin Chang wrote:
> Changelog:
> v2:
> - Use burst_time in nanoseconds for cgroup1 interface, and burst_usec
>   in microseconds for cgroup2 interface.
> - Minor document adjustment.
> 
> v1 Link:
> https://lore.kernel.org/lkml/20210816070849.3153-1-changhuaixin@linux.alibaba.com/
> 
> Huaixin Chang (2):
>   sched/fair: Add cfs bandwidth burst statistics
>   sched/fair: Add document for burstable CFS bandwidth

Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>


* Re: [PATCH 0/2 v2] Add statistics and document for cfs_b burst
  2021-08-30  3:22 [PATCH 0/2 v2] Add statistics and document for cfs_b burst Huaixin Chang
                   ` (3 preceding siblings ...)
  2021-09-01 15:37 ` Daniel Jordan
@ 2021-09-03 13:12 ` changhuaixin
  4 siblings, 0 replies; 16+ messages in thread
From: changhuaixin @ 2021-09-03 13:12 UTC
  To: open list
  Cc: changhuaixin, Peter Zijlstra, anderson, baruah, Benjamin Segall,
	Dietmar Eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong,
	daniel.m.jordan

Hi,

I wonder how these two patches look this time, from cfs_b's point of view.
The statistics are simpler than before, and documentation has been added too.

> On Aug 30, 2021, at 11:22 AM, Huaixin Chang <changhuaixin@linux.alibaba.com> wrote:
> 
> Changelog:
> v2:
> - Use burst_time in nanoseconds for cgroup1 interface, and burst_usec
>  in microseconds for cgroup2 interface.
> - Minor document adjustment.
> 
> v1 Link:
> https://lore.kernel.org/lkml/20210816070849.3153-1-changhuaixin@linux.alibaba.com/
> 
> Huaixin Chang (2):
>  sched/fair: Add cfs bandwidth burst statistics
>  sched/fair: Add document for burstable CFS bandwidth
> 
> Documentation/admin-guide/cgroup-v2.rst |  8 ++++
> Documentation/scheduler/sched-bwc.rst   | 85 +++++++++++++++++++++++++++++----
> kernel/sched/core.c                     | 13 +++--
> kernel/sched/fair.c                     |  9 ++++
> kernel/sched/sched.h                    |  3 ++
> 5 files changed, 105 insertions(+), 13 deletions(-)
> 
> -- 
> 2.14.4.44.g2045bb6



* Re: [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-08-30  3:22 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
@ 2021-09-03 18:47   ` Benjamin Segall
  2021-09-09 11:18   ` [tip: sched/core] " tip-bot2 for Huaixin Chang
  2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
  2 siblings, 0 replies; 16+ messages in thread
From: Benjamin Segall @ 2021-09-03 18:47 UTC
  To: Huaixin Chang
  Cc: linux-kernel, peterz, anderson, baruah, dietmar.eggemann, dtcccc,
	juri.lelli, khlebnikov, luca.abeni, mgorman, mingo, odin, odin,
	pauld, pjt, rostedt, shanpeic, tj, tommaso.cucinotta,
	vincent.guittot, xiyou.wangcong, daniel.m.jordan

Huaixin Chang <changhuaixin@linux.alibaba.com> writes:

> Two new statistics are introduced to show the internals of the burst feature
> and explain why burst helps or not.
>
> nr_bursts:  number of periods in which a bandwidth burst occurs
> burst_time: cumulative wall-time (in nanoseconds) that any CPUs have
> 	    used above quota in respective periods
>
> Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
> Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>

Reviewed-by: Ben Segall <bsegall@google.com>

I know there's some worry about the overhead of a constantly increasing
amount of statistics, but as far as the implementation of this goes, it
looks good to me.

> ---
>  kernel/sched/core.c  | 13 ++++++++++---
>  kernel/sched/fair.c  |  9 +++++++++
>  kernel/sched/sched.h |  3 +++
>  3 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 20ffcc044134..d00b92712253 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -10068,6 +10068,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
>  		seq_printf(sf, "wait_sum %llu\n", ws);
>  	}
>  
> +	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
> +	seq_printf(sf, "burst_time %llu\n", cfs_b->burst_time);
> +
>  	return 0;
>  }
>  #endif /* CONFIG_CFS_BANDWIDTH */
> @@ -10164,16 +10167,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
>  	{
>  		struct task_group *tg = css_tg(css);
>  		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
> -		u64 throttled_usec;
> +		u64 throttled_usec, burst_usec;
>  
>  		throttled_usec = cfs_b->throttled_time;
>  		do_div(throttled_usec, NSEC_PER_USEC);
> +		burst_usec = cfs_b->burst_time;
> +		do_div(burst_usec, NSEC_PER_USEC);
>  
>  		seq_printf(sf, "nr_periods %d\n"
>  			   "nr_throttled %d\n"
> -			   "throttled_usec %llu\n",
> +			   "throttled_usec %llu\n"
> +			   "nr_bursts %d\n"
> +			   "burst_usec %llu\n",
>  			   cfs_b->nr_periods, cfs_b->nr_throttled,
> -			   throttled_usec);
> +			   throttled_usec, cfs_b->nr_burst, burst_usec);
>  	}
>  #endif
>  	return 0;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 44c452072a1b..464371f364f1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4655,11 +4655,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
>   */
>  void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  {
> +	s64 runtime;
> +
>  	if (unlikely(cfs_b->quota == RUNTIME_INF))
>  		return;
>  
>  	cfs_b->runtime += cfs_b->quota;
> +	runtime = cfs_b->runtime_snap - cfs_b->runtime;
> +	if (runtime > 0) {
> +		cfs_b->burst_time += runtime;
> +		cfs_b->nr_burst++;
> +	}
> +
>  	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
> +	cfs_b->runtime_snap = cfs_b->runtime;
>  }
>  
>  static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 14a41a243f7b..80e4322727b4 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -367,6 +367,7 @@ struct cfs_bandwidth {
>  	u64			quota;
>  	u64			runtime;
>  	u64			burst;
> +	u64			runtime_snap;
>  	s64			hierarchical_quota;
>  
>  	u8			idle;
> @@ -379,7 +380,9 @@ struct cfs_bandwidth {
>  	/* Statistics: */
>  	int			nr_periods;
>  	int			nr_throttled;
> +	int			nr_burst;
>  	u64			throttled_time;
> +	u64			burst_time;
>  #endif
>  };


* [tip: sched/core] sched/fair: Add cfs bandwidth burst statistics
  2021-08-30  3:22 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
  2021-09-03 18:47   ` Benjamin Segall
@ 2021-09-09 11:18   ` tip-bot2 for Huaixin Chang
  2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
  2 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Huaixin Chang @ 2021-09-09 11:18 UTC
  To: linux-tip-commits
  Cc: Shanpei Chen, Tianchen Ding, Huaixin Chang,
	Peter Zijlstra (Intel),
	Daniel Jordan, Tejun Heo, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9525616d056e29f7900796cb0c19b38ad274b0eb
Gitweb:        https://git.kernel.org/tip/9525616d056e29f7900796cb0c19b38ad274b0eb
Author:        Huaixin Chang <changhuaixin@linux.alibaba.com>
AuthorDate:    Mon, 30 Aug 2021 11:22:14 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 09 Sep 2021 11:27:32 +02:00

sched/fair: Add cfs bandwidth burst statistics

Two new statistics are introduced to show the internals of the burst feature
and explain why burst helps or not.

nr_bursts:  number of periods in which a bandwidth burst occurs
burst_time: cumulative wall-time (in nanoseconds) that any CPUs have
	    used above quota in respective periods

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210830032215.16302-2-changhuaixin@linux.alibaba.com
---
 kernel/sched/core.c  | 13 ++++++++++---
 kernel/sched/fair.c  |  9 +++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 953ff36..2877138 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10244,6 +10244,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
 
+	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
+	seq_printf(sf, "burst_time %llu\n", cfs_b->burst_time);
+
 	return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -10359,16 +10362,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
 	{
 		struct task_group *tg = css_tg(css);
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
-		u64 throttled_usec;
+		u64 throttled_usec, burst_usec;
 
 		throttled_usec = cfs_b->throttled_time;
 		do_div(throttled_usec, NSEC_PER_USEC);
+		burst_usec = cfs_b->burst_time;
+		do_div(burst_usec, NSEC_PER_USEC);
 
 		seq_printf(sf, "nr_periods %d\n"
 			   "nr_throttled %d\n"
-			   "throttled_usec %llu\n",
+			   "throttled_usec %llu\n"
+			   "nr_bursts %d\n"
+			   "burst_usec %llu\n",
 			   cfs_b->nr_periods, cfs_b->nr_throttled,
-			   throttled_usec);
+			   throttled_usec, cfs_b->nr_burst, burst_usec);
 	}
 #endif
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b27ed8b..3594884 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4686,11 +4686,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	s64 runtime;
+
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
 	cfs_b->runtime += cfs_b->quota;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	if (runtime > 0) {
+		cfs_b->burst_time += runtime;
+		cfs_b->nr_burst++;
+	}
+
 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
+	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6b2d8b7..094ea86 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -369,6 +369,7 @@ struct cfs_bandwidth {
 	u64			quota;
 	u64			runtime;
 	u64			burst;
+	u64			runtime_snap;
 	s64			hierarchical_quota;
 
 	u8			idle;
@@ -381,7 +382,9 @@ struct cfs_bandwidth {
 	/* Statistics: */
 	int			nr_periods;
 	int			nr_throttled;
+	int			nr_burst;
 	u64			throttled_time;
+	u64			burst_time;
 #endif
 };
 


* [tip: sched/core] sched/fair: Add document for burstable CFS bandwidth
  2021-08-30  3:22 ` [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
@ 2021-09-09 11:18   ` tip-bot2 for Huaixin Chang
  2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot2 for Huaixin Chang @ 2021-09-09 11:18 UTC
  To: linux-tip-commits
  Cc: Shanpei Chen, Tianchen Ding, Huaixin Chang,
	Peter Zijlstra (Intel),
	Daniel Jordan, Tejun Heo, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9de47777ee77c2b9bedf43ba92900b199cf1457e
Gitweb:        https://git.kernel.org/tip/9de47777ee77c2b9bedf43ba92900b199cf1457e
Author:        Huaixin Chang <changhuaixin@linux.alibaba.com>
AuthorDate:    Mon, 30 Aug 2021 11:22:15 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 09 Sep 2021 11:27:32 +02:00

sched/fair: Add document for burstable CFS bandwidth

Basic description of usage and effect for CFS Bandwidth Control Burst.

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210830032215.16302-3-changhuaixin@linux.alibaba.com
---
 Documentation/admin-guide/cgroup-v2.rst |  8 ++-
 Documentation/scheduler/sched-bwc.rst   | 84 +++++++++++++++++++++---
 2 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index b1e81aa..4cee0f3 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1000,6 +1000,8 @@ All time durations are in microseconds.
 	- nr_periods
 	- nr_throttled
 	- throttled_usec
+	- nr_bursts
+	- burst_usec
 
   cpu.weight
 	A read-write single value file which exists on non-root
@@ -1031,6 +1033,12 @@ All time durations are in microseconds.
 	$PERIOD duration.  "max" for $MAX indicates no limit.  If only
 	one number is written, $MAX is updated.
 
+  cpu.max.burst
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	The burst in the range [0, $MAX].
+
   cpu.pressure
 	A read-write nested-keyed file.
 
diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst
index 845eee6..023b366 100644
--- a/Documentation/scheduler/sched-bwc.rst
+++ b/Documentation/scheduler/sched-bwc.rst
@@ -22,9 +22,52 @@ cfs_quota units at each period boundary. As threads consume this bandwidth it
 is transferred to cpu-local "silos" on a demand basis. The amount transferred
 within each of these updates is tunable and described as the "slice".
 
+Burst feature
+-------------
+This feature borrows time now against our future underrun, at the cost of
+increased interference against the other system users. All nicely bounded.
+
+Traditional (UP-EDF) bandwidth control is something like:
+
+  (U = \Sum u_i) <= 1
+
+This guarantees both that every deadline is met and that the system is
+stable. After all, if U were > 1, then for every second of walltime,
+we'd have to run more than a second of program time and would obviously
+miss our deadline; the next deadline is further out still, so there is
+never time to catch up: the failure is unbounded.
+
+The burst feature observes that a workload doesn't always execute the full
+quota; this enables one to describe u_i as a statistical distribution.
+
+For example, have u_i = {x,e}_i, where x is the p(95) and x+e is the p(100)
+(the traditional WCET). This effectively allows u to be smaller,
+increasing the efficiency (we can pack more tasks in the system), but at
+the cost of missing deadlines when all the odds line up. However, it
+does maintain stability, since every overrun must be paired with an
+underrun as long as our x is above the average.
+
+That is, suppose we have 2 tasks that both specify a p(95) value; then we
+have a p(95)*p(95) = 90.25% chance both tasks are within their quota and
+everything is good. At the same time we have a p(5)*p(5) = 0.25% chance
+both tasks will exceed their quota at the same time (guaranteed deadline
+fail). Somewhere in between there's a threshold where one exceeds and
+the other doesn't underrun enough to compensate; this depends on the
+specific CDFs.
+
+At the same time, we can say that the worst-case deadline miss will be
+\Sum e_i; that is, there is a bounded tardiness (under the assumption
+that x+e is indeed WCET).
+
+The interference when using burst is evaluated by the probability of
+missing the deadline and the average WCET. Test results showed that when
+there are many cgroups or the CPU is underutilized, the interference is
+limited. More details are shown in:
+https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
+
 Management
 ----------
-Quota and period are managed within the cpu subsystem via cgroupfs.
+Quota, period and burst are managed within the cpu subsystem via cgroupfs.
 
 .. note::
    The cgroupfs files described in this section are only applicable
@@ -32,29 +75,37 @@ Quota and period are managed within the cpu subsystem via cgroupfs.
    :ref:`Documentation/admin-guide/cgroupv2.rst <cgroup-v2-cpu>`.
 
 - cpu.cfs_quota_us: the total available run-time within a period (in
-  microseconds)
+- cpu.cfs_quota_us: run-time replenished within a period (in microseconds)
 - cpu.cfs_period_us: the length of a period (in microseconds)
 - cpu.stat: exports throttling statistics [explained further below]
+- cpu.cfs_burst_us: the maximum accumulated run-time (in microseconds)
 
 The default values are::
 
 	cpu.cfs_period_us=100ms
-	cpu.cfs_quota=-1
+	cpu.cfs_quota_us=-1
+	cpu.cfs_burst_us=0
 
 A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
 bandwidth restriction in place, such a group is described as an unconstrained
 bandwidth group. This represents the traditional work-conserving behavior for
 CFS.
 
-Writing any (valid) positive value(s) will enact the specified bandwidth limit.
-The minimum quota allowed for the quota or period is 1ms. There is also an
-upper bound on the period length of 1s. Additional restrictions exist when
-bandwidth limits are used in a hierarchical fashion, these are explained in
-more detail below.
+Writing any (valid) positive value(s) no smaller than cpu.cfs_burst_us will
+enact the specified bandwidth limit. The minimum value allowed for the quota or
+period is 1ms. There is also an upper bound on the period length of 1s.
+Additional restrictions exist when bandwidth limits are used in a hierarchical
+fashion, these are explained in more detail below.
 
 Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
 and return the group to an unconstrained state once more.
 
+A value of 0 for cpu.cfs_burst_us indicates that the group cannot accumulate
+any unused bandwidth; this leaves the traditional bandwidth control behavior
+for CFS unchanged. Writing any (valid) positive value(s) no larger than
+cpu.cfs_quota_us into cpu.cfs_burst_us will enact the cap on unused bandwidth
+accumulation.
+
 Any updates to a group's bandwidth specification will result in it becoming
 unthrottled if it is in a constrained state.
 
@@ -74,7 +125,7 @@ for more fine-grained consumption.
 
 Statistics
 ----------
-A group's bandwidth statistics are exported via 3 fields in cpu.stat.
+A group's bandwidth statistics are exported via 5 fields in cpu.stat.
 
 cpu.stat:
 
@@ -82,6 +133,9 @@ cpu.stat:
 - nr_throttled: Number of times the group has been throttled/limited.
 - throttled_time: The total time duration (in nanoseconds) for which entities
   of the group have been throttled.
+- nr_bursts: Number of periods in which a burst occurred.
+- burst_time: Cumulative wall-time (in nanoseconds) that any CPUs have used
+  above quota in respective periods.
 
 This interface is read-only.
 
@@ -179,3 +233,15 @@ Examples
 
    By using a small period here we are ensuring a consistent latency
    response at the expense of burst capacity.
+
+4. Limit a group to 40% of 1 CPU, and allow it to accumulate up to an
+   additional 20% of 1 CPU, in case accumulation has been done.
+
+   With a 50ms period, a 20ms quota will be equivalent to 40% of 1 CPU,
+   and a 10ms burst will be equivalent to 20% of 1 CPU::
+
+	# echo 20000 > cpu.cfs_quota_us /* quota = 20ms */
+	# echo 50000 > cpu.cfs_period_us /* period = 50ms */
+	# echo 10000 > cpu.cfs_burst_us /* burst = 10ms */
+
+   A larger buffer setting (no larger than quota) allows greater burst capacity.


* [tip: sched/core] sched/fair: Add document for burstable CFS bandwidth
  2021-08-30  3:22 ` [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
  2021-09-09 11:18   ` [tip: sched/core] " tip-bot2 for Huaixin Chang
@ 2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot2 for Huaixin Chang @ 2021-10-05 14:12 UTC
  To: linux-tip-commits
  Cc: Shanpei Chen, Tianchen Ding, Huaixin Chang,
	Peter Zijlstra (Intel),
	Daniel Jordan, Tejun Heo, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     d73df887b6b8174dfbb7f5f878fbd1e0e2eb3f08
Gitweb:        https://git.kernel.org/tip/d73df887b6b8174dfbb7f5f878fbd1e0e2eb3f08
Author:        Huaixin Chang <changhuaixin@linux.alibaba.com>
AuthorDate:    Mon, 30 Aug 2021 11:22:15 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:51:41 +02:00

sched/fair: Add document for burstable CFS bandwidth

Basic description of usage and effect for CFS Bandwidth Control Burst.

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210830032215.16302-3-changhuaixin@linux.alibaba.com
---
 Documentation/admin-guide/cgroup-v2.rst |  8 ++-
 Documentation/scheduler/sched-bwc.rst   | 84 +++++++++++++++++++++---
 2 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index babbe04..d5b0e8a 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1016,6 +1016,8 @@ All time durations are in microseconds.
 	- nr_periods
 	- nr_throttled
 	- throttled_usec
+	- nr_bursts
+	- burst_usec
 
   cpu.weight
 	A read-write single value file which exists on non-root
@@ -1047,6 +1049,12 @@ All time durations are in microseconds.
 	$PERIOD duration.  "max" for $MAX indicates no limit.  If only
 	one number is written, $MAX is updated.
 
+  cpu.max.burst
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	The burst is in the range [0, $MAX].
+
   cpu.pressure
 	A read-write nested-keyed file.
 
diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst
index 1fc7355..173c141 100644
--- a/Documentation/scheduler/sched-bwc.rst
+++ b/Documentation/scheduler/sched-bwc.rst
@@ -22,9 +22,52 @@ cfs_quota units at each period boundary. As threads consume this bandwidth it
 is transferred to cpu-local "silos" on a demand basis. The amount transferred
 within each of these updates is tunable and described as the "slice".
 
+Burst feature
+-------------
+This feature borrows time now against our future underrun, at the cost of
+increased interference against the other system users. All nicely bounded.
+
+Traditional (UP-EDF) bandwidth control is something like:
+
+  (U = \Sum u_i) <= 1
+
+This guarantees both that every deadline is met and that the system is
+stable. After all, if U were > 1, then for every second of walltime,
+we'd have to run more than a second of program time and would obviously
+miss our deadline; the next deadline would be further out still, so there
+is never time to catch up: unbounded failure.
+
+The burst feature observes that a workload doesn't always execute the full
+quota; this enables one to describe u_i as a statistical distribution.
+
+For example, let u_i = {x,e}_i, where x is the p(95) and x+e the p(100)
+(the traditional WCET). This effectively allows u to be smaller,
+increasing the efficiency (we can pack more tasks in the system), but at
+the cost of missing deadlines when all the odds line up. However, it
+does maintain stability, since every overrun must be paired with an
+underrun as long as our x is above the average.
+
+That is, suppose we have 2 tasks, both specify a p(95) value, then we
+have a p(95)*p(95) = 90.25% chance both tasks are within their quota and
+everything is good. At the same time we have a p(5)*p(5) = 0.25% chance
+both tasks will exceed their quota at the same time (guaranteed deadline
+fail). Somewhere in between there's a threshold where one exceeds and
+the other doesn't underrun enough to compensate; this depends on the
+specific CDFs.
+
+At the same time, we can say that the worst case deadline miss will be
+\Sum e_i; that is, there is a bounded tardiness (under the assumption
+that x+e is indeed WCET).
+
+The interference when using burst is evaluated by the probability of
+missing the deadline and the average WCET. Test results showed that when
+there are many cgroups or the CPU is underutilized, the interference is
+limited. More details are shown in:
+https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
+
 Management
 ----------
-Quota and period are managed within the cpu subsystem via cgroupfs.
+Quota, period and burst are managed within the cpu subsystem via cgroupfs.
 
 .. note::
    The cgroupfs files described in this section are only applicable
@@ -32,29 +75,37 @@ Quota and period are managed within the cpu subsystem via cgroupfs.
    :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`.
 
 - cpu.cfs_quota_us: the total available run-time within a period (in
-  microseconds)
+- cpu.cfs_quota_us: run-time replenished within a period (in microseconds)
 - cpu.cfs_period_us: the length of a period (in microseconds)
 - cpu.stat: exports throttling statistics [explained further below]
+- cpu.cfs_burst_us: the maximum accumulated run-time (in microseconds)
 
 The default values are::
 
 	cpu.cfs_period_us=100ms
-	cpu.cfs_quota=-1
+	cpu.cfs_quota_us=-1
+	cpu.cfs_burst_us=0
 
 A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
 bandwidth restriction in place, such a group is described as an unconstrained
 bandwidth group. This represents the traditional work-conserving behavior for
 CFS.
 
-Writing any (valid) positive value(s) will enact the specified bandwidth limit.
-The minimum quota allowed for the quota or period is 1ms. There is also an
-upper bound on the period length of 1s. Additional restrictions exist when
-bandwidth limits are used in a hierarchical fashion, these are explained in
-more detail below.
+Writing any (valid) positive value(s) no smaller than cpu.cfs_burst_us will
+enact the specified bandwidth limit. The minimum value allowed for the quota or
+period is 1ms. There is also an upper bound on the period length of 1s.
+Additional restrictions exist when bandwidth limits are used in a hierarchical
+fashion, these are explained in more detail below.
 
 Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
 and return the group to an unconstrained state once more.
 
+A value of 0 for cpu.cfs_burst_us indicates that the group cannot accumulate
+any unused bandwidth; this leaves the traditional bandwidth control behavior
+for CFS unchanged. Writing any (valid) positive value(s) no larger than
+cpu.cfs_quota_us into cpu.cfs_burst_us will enact the cap on unused bandwidth
+accumulation.
+
 Any updates to a group's bandwidth specification will result in it becoming
 unthrottled if it is in a constrained state.
 
@@ -74,7 +125,7 @@ for more fine-grained consumption.
 
 Statistics
 ----------
-A group's bandwidth statistics are exported via 3 fields in cpu.stat.
+A group's bandwidth statistics are exported via 5 fields in cpu.stat.
 
 cpu.stat:
 
@@ -82,6 +133,9 @@ cpu.stat:
 - nr_throttled: Number of times the group has been throttled/limited.
 - throttled_time: The total time duration (in nanoseconds) for which entities
   of the group have been throttled.
+- nr_bursts: Number of periods in which a burst occurred.
+- burst_time: Cumulative wall-time (in nanoseconds) that any CPUs have used
+  above quota in respective periods.
 
 This interface is read-only.
 
@@ -179,3 +233,15 @@ Examples
 
    By using a small period here we are ensuring a consistent latency
    response at the expense of burst capacity.
+
+4. Limit a group to 40% of 1 CPU, and allow it to accumulate up to an
+   additional 20% of 1 CPU, in case accumulation has been done.
+
+   With a 50ms period, a 20ms quota will be equivalent to 40% of 1 CPU,
+   and a 10ms burst will be equivalent to 20% of 1 CPU::
+
+	# echo 20000 > cpu.cfs_quota_us /* quota = 20ms */
+	# echo 50000 > cpu.cfs_period_us /* period = 50ms */
+	# echo 10000 > cpu.cfs_burst_us /* burst = 10ms */
+
+   A larger buffer setting (no larger than quota) allows greater burst capacity.


* [tip: sched/core] sched/fair: Add cfs bandwidth burst statistics
  2021-08-30  3:22 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
  2021-09-03 18:47   ` Benjamin Segall
  2021-09-09 11:18   ` [tip: sched/core] " tip-bot2 for Huaixin Chang
@ 2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
  2 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Huaixin Chang @ 2021-10-05 14:12 UTC
  To: linux-tip-commits
  Cc: Shanpei Chen, Tianchen Ding, Huaixin Chang,
	Peter Zijlstra (Intel),
	Daniel Jordan, Tejun Heo, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     bcb1704a1ed2de580a46f28922e223a65f16e0f5
Gitweb:        https://git.kernel.org/tip/bcb1704a1ed2de580a46f28922e223a65f16e0f5
Author:        Huaixin Chang <changhuaixin@linux.alibaba.com>
AuthorDate:    Mon, 30 Aug 2021 11:22:14 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Oct 2021 15:51:40 +02:00

sched/fair: Add cfs bandwidth burst statistics

Two new statistics are introduced to show the internals of the burst feature
and to explain why burst helps or not.

nr_bursts:  number of periods in which bandwidth burst occurs
burst_time: cumulative wall-time (in nanoseconds) that any CPUs have
	    used above quota in respective periods

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210830032215.16302-2-changhuaixin@linux.alibaba.com
---
 kernel/sched/core.c  | 13 ++++++++++---
 kernel/sched/fair.c  |  9 +++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f963c81..ccb604a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10406,6 +10406,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
 
+	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
+	seq_printf(sf, "burst_time %llu\n", cfs_b->burst_time);
+
 	return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -10521,16 +10524,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
 	{
 		struct task_group *tg = css_tg(css);
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
-		u64 throttled_usec;
+		u64 throttled_usec, burst_usec;
 
 		throttled_usec = cfs_b->throttled_time;
 		do_div(throttled_usec, NSEC_PER_USEC);
+		burst_usec = cfs_b->burst_time;
+		do_div(burst_usec, NSEC_PER_USEC);
 
 		seq_printf(sf, "nr_periods %d\n"
 			   "nr_throttled %d\n"
-			   "throttled_usec %llu\n",
+			   "throttled_usec %llu\n"
+			   "nr_bursts %d\n"
+			   "burst_usec %llu\n",
 			   cfs_b->nr_periods, cfs_b->nr_throttled,
-			   throttled_usec);
+			   throttled_usec, cfs_b->nr_burst, burst_usec);
 	}
 #endif
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5457c80..fd41abe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4715,11 +4715,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	s64 runtime;
+
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
 	cfs_b->runtime += cfs_b->quota;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	if (runtime > 0) {
+		cfs_b->burst_time += runtime;
+		cfs_b->nr_burst++;
+	}
+
 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
+	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 15a8895..8712fc4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -369,6 +369,7 @@ struct cfs_bandwidth {
 	u64			quota;
 	u64			runtime;
 	u64			burst;
+	u64			runtime_snap;
 	s64			hierarchical_quota;
 
 	u8			idle;
@@ -381,7 +382,9 @@ struct cfs_bandwidth {
 	/* Statistics: */
 	int			nr_periods;
 	int			nr_throttled;
+	int			nr_burst;
 	u64			throttled_time;
+	u64			burst_time;
 #endif
 };
 


* Re: [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-08-23 18:32   ` Daniel Jordan
@ 2021-08-23 19:03     ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2021-08-23 19:03 UTC
  To: Daniel Jordan
  Cc: Huaixin Chang, linux-kernel, peterz, anderson, baruah, bsegall,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

On Mon, Aug 23, 2021 at 02:32:10PM -0400, Daniel Jordan wrote:
> burst_time is nsec, not usec.
> 
> Just "burst_time" seems the way to go for consistency with
> throttled_time in the same file, though "burst_usec" nicely has units
> and would have the same name between cgroup1 and 2.

The cgroup2 interface rule is that time units are in usecs. The right thing
to do is keeping the _usec postfix (to avoid confusion due to the unit
change from cgroup1) and converting it to usecs for printing.

Thanks.

-- 
tejun


* Re: [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-08-16  7:08 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
@ 2021-08-23 18:32   ` Daniel Jordan
  2021-08-23 19:03     ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Jordan @ 2021-08-23 18:32 UTC
  To: Huaixin Chang
  Cc: linux-kernel, peterz, anderson, baruah, bsegall,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

On Mon, Aug 16, 2021 at 03:08:48PM +0800, Huaixin Chang wrote:
> Two new statistics are introduced to show the internals of the burst feature
> and to explain why burst helps or not.
> 
> nr_bursts:  number of periods in which bandwidth burst occurs
> burst_usec: cumulative wall-time that any CPUs have
> 	    used above quota in respective periods
> 
> Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
> Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> ---
>  kernel/sched/core.c  | 13 ++++++++++---
>  kernel/sched/fair.c  |  9 +++++++++
>  kernel/sched/sched.h |  3 +++
>  3 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d9ff40f4661..9a286c8a1354 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -10088,6 +10088,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
>  		seq_printf(sf, "wait_sum %llu\n", ws);
>  	}
>  
> +	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
> +	seq_printf(sf, "burst_usec %llu\n", cfs_b->burst_time);

burst_time is nsec, not usec.

Just "burst_time" seems the way to go for consistency with
throttled_time in the same file, though "burst_usec" nicely has units
and would have the same name between cgroup1 and 2.


* [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-08-16  7:08 [PATCH 0/2] Add statistics and document for cfs bandwidth burst Huaixin Chang
@ 2021-08-16  7:08 ` Huaixin Chang
  2021-08-23 18:32   ` Daniel Jordan
  0 siblings, 1 reply; 16+ messages in thread
From: Huaixin Chang @ 2021-08-16  7:08 UTC
  To: linux-kernel
  Cc: peterz, anderson, baruah, bsegall, changhuaixin,
	dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

Two new statistics are introduced to show the internals of the burst feature
and to explain why burst helps or not.

nr_bursts:  number of periods in which bandwidth burst occurs
burst_usec: cumulative wall-time that any CPUs have
	    used above quota in respective periods

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c  | 13 ++++++++++---
 kernel/sched/fair.c  |  9 +++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d9ff40f4661..9a286c8a1354 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10088,6 +10088,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
 
+	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
+	seq_printf(sf, "burst_usec %llu\n", cfs_b->burst_time);
+
 	return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -10184,16 +10187,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
 	{
 		struct task_group *tg = css_tg(css);
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
-		u64 throttled_usec;
+		u64 throttled_usec, burst_usec;
 
 		throttled_usec = cfs_b->throttled_time;
 		do_div(throttled_usec, NSEC_PER_USEC);
+		burst_usec = cfs_b->burst_time;
+		do_div(burst_usec, NSEC_PER_USEC);
 
 		seq_printf(sf, "nr_periods %d\n"
 			   "nr_throttled %d\n"
-			   "throttled_usec %llu\n",
+			   "throttled_usec %llu\n"
+			   "nr_bursts %d\n"
+			   "burst_usec %llu\n",
 			   cfs_b->nr_periods, cfs_b->nr_throttled,
-			   throttled_usec);
+			   throttled_usec, cfs_b->nr_burst, burst_usec);
 	}
 #endif
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44c452072a1b..464371f364f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4655,11 +4655,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	s64 runtime;
+
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
 	cfs_b->runtime += cfs_b->quota;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	if (runtime > 0) {
+		cfs_b->burst_time += runtime;
+		cfs_b->nr_burst++;
+	}
+
 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
+	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14a41a243f7b..80e4322727b4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -367,6 +367,7 @@ struct cfs_bandwidth {
 	u64			quota;
 	u64			runtime;
 	u64			burst;
+	u64			runtime_snap;
 	s64			hierarchical_quota;
 
 	u8			idle;
@@ -379,7 +380,9 @@ struct cfs_bandwidth {
 	/* Statistics: */
 	int			nr_periods;
 	int			nr_throttled;
+	int			nr_burst;
 	u64			throttled_time;
+	u64			burst_time;
 #endif
 };
 
-- 
2.14.4.44.g2045bb6



* Re: [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-07-30  7:09 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
@ 2021-08-12 12:18   ` changhuaixin
  0 siblings, 0 replies; 16+ messages in thread
From: changhuaixin @ 2021-08-12 12:18 UTC
  To: Huaixin Chang
  Cc: Peter Zijlstra, anderson, baruah, Benjamin Segall,
	Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list,
	luca.abeni, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal,
	pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo,
	tommaso.cucinotta, Vincent Guittot, xiyou.wangcong

Ping.

The statistics code has been simplified further compared with the version discussed before. Would you mind having a look at it?

> On Jul 30, 2021, at 3:09 PM, Huaixin Chang <changhuaixin@linux.alibaba.com> wrote:
> 
> Two new statistics are introduced to show the internals of the burst feature
> and to explain why burst helps or not.
> 
> nr_bursts:  number of periods in which bandwidth burst occurs
> burst_usec: cumulative wall-time that any CPUs have
> 	    used above quota in respective periods
> 
> Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
> Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
> ---
> kernel/sched/core.c  | 13 ++++++++++---
> kernel/sched/fair.c  |  9 +++++++++
> kernel/sched/sched.h |  3 +++
> 3 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d9ff40f4661..9a286c8a1354 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -10088,6 +10088,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
> 		seq_printf(sf, "wait_sum %llu\n", ws);
> 	}
> 
> +	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
> +	seq_printf(sf, "burst_usec %llu\n", cfs_b->burst_time);
> +
> 	return 0;
> }
> #endif /* CONFIG_CFS_BANDWIDTH */
> @@ -10184,16 +10187,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
> 	{
> 		struct task_group *tg = css_tg(css);
> 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
> -		u64 throttled_usec;
> +		u64 throttled_usec, burst_usec;
> 
> 		throttled_usec = cfs_b->throttled_time;
> 		do_div(throttled_usec, NSEC_PER_USEC);
> +		burst_usec = cfs_b->burst_time;
> +		do_div(burst_usec, NSEC_PER_USEC);
> 
> 		seq_printf(sf, "nr_periods %d\n"
> 			   "nr_throttled %d\n"
> -			   "throttled_usec %llu\n",
> +			   "throttled_usec %llu\n"
> +			   "nr_bursts %d\n"
> +			   "burst_usec %llu\n",
> 			   cfs_b->nr_periods, cfs_b->nr_throttled,
> -			   throttled_usec);
> +			   throttled_usec, cfs_b->nr_burst, burst_usec);
> 	}
> #endif
> 	return 0;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 44c452072a1b..464371f364f1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4655,11 +4655,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
>  */
> void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
> {
> +	s64 runtime;
> +
> 	if (unlikely(cfs_b->quota == RUNTIME_INF))
> 		return;
> 
> 	cfs_b->runtime += cfs_b->quota;
> +	runtime = cfs_b->runtime_snap - cfs_b->runtime;
> +	if (runtime > 0) {
> +		cfs_b->burst_time += runtime;
> +		cfs_b->nr_burst++;
> +	}
> +
> 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
> +	cfs_b->runtime_snap = cfs_b->runtime;
> }
> 
> static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 14a41a243f7b..80e4322727b4 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -367,6 +367,7 @@ struct cfs_bandwidth {
> 	u64			quota;
> 	u64			runtime;
> 	u64			burst;
> +	u64			runtime_snap;
> 	s64			hierarchical_quota;
> 
> 	u8			idle;
> @@ -379,7 +380,9 @@ struct cfs_bandwidth {
> 	/* Statistics: */
> 	int			nr_periods;
> 	int			nr_throttled;
> +	int			nr_burst;
> 	u64			throttled_time;
> +	u64			burst_time;
> #endif
> };
> 
> -- 
> 2.14.4.44.g2045bb6
> 
> 



* [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-07-30  7:09 [PATCH 0/2] Add statistics and ducument for cfs bandwidth burst Huaixin Chang
@ 2021-07-30  7:09 ` Huaixin Chang
  2021-08-12 12:18   ` changhuaixin
  0 siblings, 1 reply; 16+ messages in thread
From: Huaixin Chang @ 2021-07-30  7:09 UTC
  To: peterz
  Cc: anderson, baruah, bsegall, changhuaixin, dietmar.eggemann,
	dtcccc, juri.lelli, khlebnikov, linux-kernel, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

Two new statistics are introduced to show the internals of the burst feature
and to explain why burst helps or not.

nr_bursts:  number of periods in which bandwidth burst occurs
burst_usec: cumulative wall-time that any CPUs have
	    used above quota in respective periods

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
---
 kernel/sched/core.c  | 13 ++++++++++---
 kernel/sched/fair.c  |  9 +++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d9ff40f4661..9a286c8a1354 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10088,6 +10088,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
 
+	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
+	seq_printf(sf, "burst_usec %llu\n", cfs_b->burst_time);
+
 	return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -10184,16 +10187,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
 	{
 		struct task_group *tg = css_tg(css);
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
-		u64 throttled_usec;
+		u64 throttled_usec, burst_usec;
 
 		throttled_usec = cfs_b->throttled_time;
 		do_div(throttled_usec, NSEC_PER_USEC);
+		burst_usec = cfs_b->burst_time;
+		do_div(burst_usec, NSEC_PER_USEC);
 
 		seq_printf(sf, "nr_periods %d\n"
 			   "nr_throttled %d\n"
-			   "throttled_usec %llu\n",
+			   "throttled_usec %llu\n"
+			   "nr_bursts %d\n"
+			   "burst_usec %llu\n",
 			   cfs_b->nr_periods, cfs_b->nr_throttled,
-			   throttled_usec);
+			   throttled_usec, cfs_b->nr_burst, burst_usec);
 	}
 #endif
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44c452072a1b..464371f364f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4655,11 +4655,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	s64 runtime;
+
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
 	cfs_b->runtime += cfs_b->quota;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	if (runtime > 0) {
+		cfs_b->burst_time += runtime;
+		cfs_b->nr_burst++;
+	}
+
 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
+	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14a41a243f7b..80e4322727b4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -367,6 +367,7 @@ struct cfs_bandwidth {
 	u64			quota;
 	u64			runtime;
 	u64			burst;
+	u64			runtime_snap;
 	s64			hierarchical_quota;
 
 	u8			idle;
@@ -379,7 +380,9 @@ struct cfs_bandwidth {
 	/* Statistics: */
 	int			nr_periods;
 	int			nr_throttled;
+	int			nr_burst;
 	u64			throttled_time;
+	u64			burst_time;
 #endif
 };
 
-- 
2.14.4.44.g2045bb6

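The burst accounting that `__refill_cfs_bandwidth_runtime()` gains in this patch can be modelled in userspace. The sketch below is a hypothetical model, not kernel code. `struct bw_model`, `bw_init()`, `bw_refill()`, `bw_consume()` and `bw_print_stat()` are invented names, and only the field names follow `struct cfs_bandwidth`. It omits the `RUNTIME_INF` case and assumes the pool only drains between refills. Under those assumptions, after `quota` is added, `runtime_snap - runtime` equals the runtime consumed in the last period minus `quota`. A positive value is therefore the amount of carried-over burst that was actually used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the bandwidth pool; field names
 * mirror struct cfs_bandwidth. Times are in nanoseconds.
 * RUNTIME_INF (no limit) is not modelled.
 */
struct bw_model {
	uint64_t quota;        /* runtime granted each period */
	uint64_t burst;        /* extra runtime allowed to accumulate */
	uint64_t runtime;      /* runtime currently left in the pool */
	uint64_t runtime_snap; /* pool level right after the last refill */
	int nr_burst;          /* periods that consumed more than quota */
	uint64_t burst_time;   /* total runtime consumed beyond quota */
};

/* Period refill, following the logic added by the patch. */
static void bw_refill(struct bw_model *b)
{
	uint64_t cap = b->quota + b->burst;

	b->runtime += b->quota;
	/*
	 * Here runtime_snap - runtime == (consumed this period) - quota.
	 * The kernel stores the difference in an s64 and tests > 0;
	 * an explicit comparison avoids unsigned wrap-around instead.
	 */
	if (b->runtime_snap > b->runtime) {
		b->burst_time += b->runtime_snap - b->runtime;
		b->nr_burst++;
	}
	b->runtime = b->runtime < cap ? b->runtime : cap;
	b->runtime_snap = b->runtime;
}

/* Hypothetical consumer: draw up to 'want' ns from the pool. */
static uint64_t bw_consume(struct bw_model *b, uint64_t want)
{
	uint64_t got = want < b->runtime ? want : b->runtime;

	b->runtime -= got;
	return got;
}

/* Start the first period with a full quota and no burst history. */
static struct bw_model bw_init(uint64_t quota, uint64_t burst)
{
	struct bw_model b = { .quota = quota, .burst = burst };

	assert(quota > 0);
	bw_refill(&b);
	return b;
}

/* Print the counters in cgroup v2 cpu.stat style (microseconds). */
static void bw_print_stat(const struct bw_model *b)
{
	printf("nr_bursts %d\n", b->nr_burst);
	printf("burst_usec %llu\n",
	       (unsigned long long)(b->burst_time / 1000));
}
```

For example, with a 100 ms quota and a 50 ms burst, a period that uses only 40 ms lets the pool grow to 150 ms. A following period that consumes 130 ms is then counted as one burst of 30 ms. A period that consumes exactly the quota is not counted at all.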

Thread overview: 16+ messages
2021-08-30  3:22 [PATCH 0/2 v2] Add statistics and document for cfs_b burst Huaixin Chang
2021-08-30  3:22 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
2021-09-03 18:47   ` Benjamin Segall
2021-09-09 11:18   ` [tip: sched/core] " tip-bot2 for Huaixin Chang
2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
2021-08-30  3:22 ` [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
2021-09-09 11:18   ` [tip: sched/core] " tip-bot2 for Huaixin Chang
2021-10-05 14:12   ` tip-bot2 for Huaixin Chang
2021-08-30 16:49 ` [PATCH 0/2 v2] Add statistics and document for cfs_b burst Tejun Heo
2021-09-01 15:37 ` Daniel Jordan
2021-09-03 13:12 ` changhuaixin
  -- strict thread matches above, loose matches on Subject: below --
2021-08-16  7:08 [PATCH 0/2] Add statistics and document for cfs bandwidth burst Huaixin Chang
2021-08-16  7:08 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
2021-08-23 18:32   ` Daniel Jordan
2021-08-23 19:03     ` Tejun Heo
2021-07-30  7:09 [PATCH 0/2] Add statistics and ducument for cfs bandwidth burst Huaixin Chang
2021-07-30  7:09 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
2021-08-12 12:18   ` changhuaixin
