LKML Archive on lore.kernel.org
* [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics
@ 2021-07-06 13:18 Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 1/6] PM: Introduce Active Stats framework Lukasz Luba
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 13:18 UTC (permalink / raw)
  To: linux-kernel, daniel.lezcano
  Cc: linux-pm, amitk, rui.zhang, lukasz.luba, dietmar.eggemann,
	Chris.Redpath, Beata.Michalska, viresh.kumar, rjw, amit.kachhap

Hi all,

This patch set introduces a new mechanism: Active Stats framework (ASF), which
gathers and maintains statistics of CPU performance - time residency at each
performance state.

The ASF tracks all frequency transitions as well as CPU idle entry/exit
events for all CPUs. Based on these events, it accounts the active
(non-idle) residency time for each CPU at each frequency. This information
can be used by other subsystems (such as a thermal governor) to improve
their estimates of CPU usage over a given period.

Does it fix something in mainline?
Yes. The Intelligent Power Allocation (IPA) thermal governor estimates the
power that the CPUs used in the past. IPA samples the CPU utilization and
frequency and relies only on the information available at the time of
sampling, which introduces estimation errors. Using ASF solves this issue
and enables IPA to make better estimates.

Why couldn't it be done using existing frameworks?
The CPUFreq and CPUIdle statistics are not combined, so it is not possible
to derive how long exactly a CPU was running at a given frequency. This new
framework combines that information and provides it in a handy way. IMHO it
has to be implemented as a new framework, next to CPUFreq and CPUIdle, for
the sake of a clean design, rather than as hooks from the thermal governor
into the frequency change and idle code paths.

Patch 4/6 introduces a new API for cooling devices, which allows them to
stop tracking the frequency and idle statistics.

The patch set also contains patches 5/6 and 6/6, which add a new power model
based on ASF to cpufreq cooling (used by the thermal governor IPA).
It is added under an #ifdef, since Active Stats might not be compiled in.
The ASF is a compile-time option, but that might be changed so that IPA
selects it, which would allow removing some redundant code from
cpufreq_cooling.c.

Comments and suggestions are very welcome.

Changelog:
v2:
- added an interface for cooling devices to support the custom setup used
  by IPA, which requires Active Stats to be allocated and running; when IPA
  is not working, Active Stats is deactivated
- added a mechanism to stop tracking CPU frequency and idle changes in
  Active Stats when there are no clients for this information
- removed the spinlock in the idle path (and hotplug) and redesigned how the
  Active Stats Monitor (ASM) tracks performance changes; the ASM can no
  longer update local ASM stats in a periodic check; only a CPU
  entering/exiting idle can do that
v1:
- basic implementation which can be found at [1]


Regards,
Lukasz Luba

[1] https://lore.kernel.org/linux-pm/20210622075925.16189-1-lukasz.luba@arm.com/

Lukasz Luba (6):
  PM: Introduce Active Stats framework
  cpuidle: Add Active Stats calls tracking idle entry/exit
  cpufreq: Add Active Stats calls tracking frequency changes
  thermal: Add interface to cooling devices to handle governor change
  thermal/core/power allocator: Prepare power actors and calm down when
    not used
  thermal: cpufreq_cooling: Improve power estimation based on Active
    Stats framework

 Documentation/power/active_stats.rst  | 128 ++++
 Documentation/power/index.rst         |   1 +
 MAINTAINERS                           |   8 +
 drivers/cpufreq/cpufreq.c             |   5 +
 drivers/cpuidle/cpuidle.c             |   5 +
 drivers/thermal/cpufreq_cooling.c     | 132 ++++
 drivers/thermal/gov_power_allocator.c |  71 ++
 include/linux/active_stats.h          | 131 ++++
 include/linux/thermal.h               |   1 +
 kernel/power/Kconfig                  |   9 +
 kernel/power/Makefile                 |   1 +
 kernel/power/active_stats.c           | 921 ++++++++++++++++++++++++++
 12 files changed, 1413 insertions(+)
 create mode 100644 Documentation/power/active_stats.rst
 create mode 100644 include/linux/active_stats.h
 create mode 100644 kernel/power/active_stats.c

-- 
2.17.1



* [RFC PATCH v2 1/6] PM: Introduce Active Stats framework
  2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
@ 2021-07-06 13:18 ` Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 2/6] cpuidle: Add Active Stats calls tracking idle entry/exit Lukasz Luba
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 13:18 UTC (permalink / raw)
  To: linux-kernel, daniel.lezcano
  Cc: linux-pm, amitk, rui.zhang, lukasz.luba, dietmar.eggemann,
	Chris.Redpath, Beata.Michalska, viresh.kumar, rjw, amit.kachhap

Introduce a new centralized mechanism for gathering and maintaining CPU
performance statistics: the Active Stats Framework (ASF). It tracks CPU
activity at all performance levels (frequencies), taking idle entry/exit
into account. This combined information can be used by other kernel
subsystems, such as a thermal governor, which tries to estimate CPU
performance. The information about frequency changes comes from the CPUFreq
framework, while idle entry/exit comes from the CPUIdle framework. The ASF
is triggered in both scenarios, so it can combine the two locally and
account for them in proper data structures. The ASF is also attached to CPU
hotplug, to track it properly.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 Documentation/power/active_stats.rst | 128 ++++
 Documentation/power/index.rst        |   1 +
 MAINTAINERS                          |   8 +
 include/linux/active_stats.h         | 131 ++++
 kernel/power/Kconfig                 |   9 +
 kernel/power/Makefile                |   1 +
 kernel/power/active_stats.c          | 921 +++++++++++++++++++++++++++
 7 files changed, 1199 insertions(+)
 create mode 100644 Documentation/power/active_stats.rst
 create mode 100644 include/linux/active_stats.h
 create mode 100644 kernel/power/active_stats.c

diff --git a/Documentation/power/active_stats.rst b/Documentation/power/active_stats.rst
new file mode 100644
index 000000000000..c97adbf679b7
--- /dev/null
+++ b/Documentation/power/active_stats.rst
@@ -0,0 +1,128 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Active Stats framework
+=======================
+
+1. Overview
+-----------
+
+The Active Stats framework provides useful statistical information on CPU
+performance state time residency to other frameworks or governors. The
+information contains the real time spent by the CPU running at each
+performance state (frequency), excluding idle time. This knowledge might be
+used by other kernel subsystems to improve their mechanisms, e.g. power
+estimation. The statistics do not distinguish between idle states and do not
+track each idle state's residency. It is done this way because on most modern
+platforms the kernel has no control over the deeper CPU idle states.
+
+The Active Stats consists of two components:
+a) the Active Stats Tracking (AST) mechanism
+b) Active Stats Monitor (ASM)
+
+The diagram below presents the design::
+
+       +---------------+  +---------------+
+       |    CPUFreq    |  |    CPUIdle    |
+       +---------------+  +---------------+
+                 |                 |
+                 |                 |
+                 +-------+         |
+                         |         |
+                         |         |
+       +--------------------------------------------------------+
+       | Active Stats    |         |                            |
+       | Framework       v         v                            |
+       |                 +---------------------+                |
+       |                 |    Active Stats     |                |
+       |                 |     Tracking        |                |
+       |                 +---------------------+                |
+       |                           |                            |
+       |                           |                            |
+       |                 +---------------------+                |
+       |                 |    Active Stats     |                |
+       |                 |     Monitor         |                |
+       |                 +---------------------+                |
+       |                   ^         ^                          |
+       |                   |         |                          |
+       +--------------------------------------------------------+
+                           |         |
+                           |         |
+                +----------+         |
+                |                    |
+        +------------------+   +--------------+
+        | Thermal Governor |   |    Other     |
+        +------------------+   +--------------+
+                ^                    ^
+                |                    |
+        +------------------+   +--------------+
+        | Thermal          |   |      ?       |
+        +------------------+   +--------------+
+
+The AST mechanism gathers and maintains statistics from the CPUFreq and
+CPUIdle frameworks. It is triggered in the CPU frequency switch paths. It
+accounts for the time spent at each frequency for each CPU. It supports
+per-CPU DVFS as well as shared frequency domains. The idle time accounting is
+triggered every time a CPU enters or exits an idle state.
+
+The ASM is a component used by other kernel subsystems (such as a thermal
+governor) that want this statistical information. The client subsystem uses
+its private ASM structure to keep track of the system statistics and the
+calculated local difference. Based on the ASM data it is possible to
+calculate the CPU running time at each frequency since the last check. The
+time difference between checks of the ASM determines the period.
+
+
+2. Core APIs
+------------
+
+2.1 Config options
+^^^^^^^^^^^^^^^^^^
+
+CONFIG_ACTIVE_STATS must be enabled to use the Active Stats framework.
+
+
+2.2 Registration of the Active Stats Monitor
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Subsystems that want to use the ASM to track CPU performance should register
+with the Active Stats framework by calling the following API::
+
+  struct active_stats_monitor *active_stats_cpu_setup_monitor(int cpu);
+
+The allocated and configured ASM structure will be returned on success;
+otherwise a proper ERR_PTR() will be provided. See
+kernel/power/active_stats.c for further documentation of this function.
+
+
+2.3 Accessing the Active Stats Monitor
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To get the latest statistics about CPU performance, call the following API::
+
+  int active_stats_cpu_update_monitor(struct active_stats_monitor *ast_mon);
+
+In this call the Active Stats framework calculates and populates the latest
+statistics for the given CPU. The statistics are then provided in
+'ast_mon->local', which is an array of 'struct active_stats_state'.
+That structure contains an array of time residencies for each performance
+state (in nanoseconds). Looping over the array 'ast_mon->local->residency'
+(whose length is provided in 'ast_mon->states_count') will provide detailed
+information about the time residency at each performance state (frequency)
+during the period present in 'ast_mon->local_period'. The local period value
+might be bigger than the sum of the residency array when the CPU was idle
+or offline. More details about the internals of the structures can be found
+in include/linux/active_stats.h.
+The first argument is the ASM structure allocated during setup (see 2.2).
+
+
+2.4 Clean up of the Active Stats Monitor
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To clean up the ASM when it is no longer needed, call the following API::
+
+  void active_stats_cpu_free_monitor(struct active_stats_monitor *ast_mon);
+
+The associated structures will be freed.
diff --git a/Documentation/power/index.rst b/Documentation/power/index.rst
index a0f5244fb427..757b8df44651 100644
--- a/Documentation/power/index.rst
+++ b/Documentation/power/index.rst
@@ -7,6 +7,7 @@ Power Management
 .. toctree::
     :maxdepth: 1
 
+    active_stats
     apm-acpi
     basic-pm-debugging
     charger-manager
diff --git a/MAINTAINERS b/MAINTAINERS
index efeaebe1bcae..6b831d4834d5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -446,6 +446,14 @@ F:	Documentation/virt/acrn/
 F:	drivers/virt/acrn/
 F:	include/uapi/linux/acrn.h
 
+ACTIVE STATS
+M:	Lukasz Luba <lukasz.luba@arm.com>
+L:	linux-pm@vger.kernel.org
+S:	Maintained
+F:	Documentation/power/active_stats.rst
+F:	include/linux/active_stats.h
+F:	kernel/power/active_stats.c
+
 AD1889 ALSA SOUND DRIVER
 L:	linux-parisc@vger.kernel.org
 S:	Maintained
diff --git a/include/linux/active_stats.h b/include/linux/active_stats.h
new file mode 100644
index 000000000000..e7b81f5d774c
--- /dev/null
+++ b/include/linux/active_stats.h
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_ACTIVE_STATS_H
+#define _LINUX_ACTIVE_STATS_H
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/seqlock.h>
+#include <linux/spinlock.h>
+
+/**
+ * struct active_stats_state - State statistics associated with a performance level
+ * @last_event_ts:	Timestamp of the last event in nanoseconds
+ * @last_freq_idx:	Last used frequency index
+ * @residency:		Array which holds total time (in nanoseconds) that
+ *			each frequency has been used when CPU was
+ *			running
+ */
+struct active_stats_state {
+	u64 last_event_ts;
+	int last_freq_idx;
+	u64 *residency;
+};
+
+/**
+ * struct active_stats_snapshot - Active Stats snapshot structure
+ * @curr:	Snapshot of statistics from Active Stats main structure
+ *		which accounts this CPU performance states residency
+ * @prev:	Old snapshot of the Active Stats main structure, against
+ *		which the new snapshot is compared
+ * @result:	Statistics of running time for each performance state which
+ *		are calculated for this CPU for a specific period based on
+ *		@prev and @curr data.
+ */
+struct active_stats_snapshot {
+	struct active_stats_state *curr;
+	struct active_stats_state *prev;
+	struct active_stats_state *result;
+};
+
+/**
+ * struct active_stats - Active Stats main structure
+ * @activated:	Set when the tracking mechanism is used
+ * @num_clients:	Number of clients using tracking mechanism
+ * @in_idle:	Set when the CPU is in idle
+ * @offline:	Set when CPU was hotplug out and is offline
+ * @states_count:	Number of state entries in the statistics
+ * @states_size:	Size of the state stats entries in bytes
+ * @freq:	Frequency table
+ * @snapshot:	Snapshot of statistics which accounts for the frequency
+ *		residencies combined with idle periods
+ * @shared_ast:	Pointer to the Active Stats structure shared within the
+ *		frequency domain
+ * @activation_lock:	Protects activation and client accounting
+ * @lock:	Protects concurrent cpufreq changes in the slow path
+ * @seqcount:	Used for making a consistent state on the reader side
+ *		of these statistics
+ */
+struct active_stats {
+	bool activated;
+	int num_clients;
+	bool in_idle;
+	bool offline;
+	unsigned int states_count;
+	unsigned int states_size;
+	unsigned int *freq;
+	struct active_stats_snapshot snapshot;
+	struct active_stats *shared_ast;
+	struct mutex activation_lock;
+	/* Protect concurrent cpufreq changes in the slow path */
+	spinlock_t lock;
+	/* Seqcount to create consistent state in the read side */
+	seqcount_t seqcount;
+};
+
+/**
+ * struct active_stats_monitor - Active Stats Monitor structure
+ * @cpu:	CPU id for which these statistics are tracked
+ * @local_period:	Local period for which the statistics are provided
+ * @states_count:	Number of state entries in the statistics
+ * @states_size:	Size of the state stats entries in bytes
+ * @ast:	Active Stats structure for the associated CPU, which is used
+ *		for taking the snapshot
+ * @snapshot:	Snapshot of statistics which accounts for this private monitor
+ *		period the frequencies residency combined with idle
+ * @tmp_view:	Snapshot of statistics which is used for calculating local
+ *		monitor statistics for private period the frequencies
+ *		residency combined with idle
+ * @lock:	Lock which is used to serialize access to Active Stats
+ *		Monitor structure which might be used concurrently.
+ */
+struct active_stats_monitor {
+	int cpu;
+	u64 local_period;
+	unsigned int states_count;
+	unsigned int states_size;
+	struct active_stats *ast;
+	struct active_stats_snapshot snapshot;
+	struct active_stats_snapshot tmp_view;
+	struct mutex lock;
+};
+
+#if defined(CONFIG_ACTIVE_STATS)
+void active_stats_cpu_idle_enter(ktime_t time_start);
+void active_stats_cpu_idle_exit(ktime_t time_start);
+void active_stats_cpu_freq_fast_change(int cpu, unsigned int freq);
+void active_stats_cpu_freq_change(int cpu, unsigned int freq);
+struct active_stats_monitor *active_stats_cpu_setup_monitor(int cpu);
+void active_stats_cpu_free_monitor(struct active_stats_monitor *ast_mon);
+int active_stats_cpu_update_monitor(struct active_stats_monitor *ast_mon);
+#else
+static inline
+void active_stats_cpu_freq_fast_change(int cpu, unsigned int freq) {}
+static inline
+void active_stats_cpu_freq_change(int cpu, unsigned int freq) {}
+static inline
+void active_stats_cpu_idle_enter(ktime_t time_start) {}
+static inline
+void active_stats_cpu_idle_exit(ktime_t time_start) {}
+static inline
+struct active_stats_monitor *active_stats_cpu_setup_monitor(int cpu)
+{
+	return ERR_PTR(-EINVAL);
+}
+static inline
+void active_stats_cpu_free_monitor(struct active_stats_monitor *ast_mon)
+{
+}
+static inline
+int active_stats_cpu_update_monitor(struct active_stats_monitor *ast_mon)
+{
+	return -EINVAL;
+}
+#endif
+
+#endif /* _LINUX_ACTIVE_STATS_H */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 6bfe3ead10ad..c1701a75fe4f 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -331,3 +331,12 @@ config ENERGY_MODEL
 	  The exact usage of the energy model is subsystem-dependent.
 
 	  If in doubt, say N.
+
+config ACTIVE_STATS
+	bool "Enable Active Stats"
+	depends on CPU_FREQ && CPU_IDLE && PM_OPP
+	help
+	  Enable support for common code that tracks and processes CPU
+	  performance state time residency. It is useful for power and
+	  energy usage estimation and can be used by frameworks such as
+	  thermal.
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 5899260a8bef..4b89aabf2259 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_PM_WAKELOCKS)	+= wakelock.o
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
 
 obj-$(CONFIG_ENERGY_MODEL)	+= energy_model.o
+obj-$(CONFIG_ACTIVE_STATS)	+= active_stats.o
diff --git a/kernel/power/active_stats.c b/kernel/power/active_stats.c
new file mode 100644
index 000000000000..96c4f9238620
--- /dev/null
+++ b/kernel/power/active_stats.c
@@ -0,0 +1,921 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Active Stats - CPU performance statistics tracking mechanism, which
+ * provides handy, combined information about how long a CPU was running
+ * at each frequency, excluding idle periods. This is more detailed than
+ * just the time accounted by CPUFreq for each frequency that was set.
+ *
+ * Copyright (C) 2021, ARM Ltd.
+ * Written by: Lukasz Luba, ARM Ltd.
+ */
+
+#include <linux/active_stats.h>
+#include <linux/cpu.h>
+#include <linux/cpufreq.h>
+#include <linux/cpumask.h>
+#include <linux/debugfs.h>
+#include <linux/device.h>
+#include <linux/init.h>
+#include <linux/ktime.h>
+#include <linux/minmax.h>
+#include <linux/percpu.h>
+#include <linux/pm_opp.h>
+#include <linux/sched/clock.h>
+#include <linux/slab.h>
+
+static cpumask_var_t cpus_to_visit;
+static void processing_done_fn(struct work_struct *work);
+static DECLARE_WORK(processing_done_work, processing_done_fn);
+
+static DEFINE_PER_CPU(struct active_stats *, ast_local);
+
+static struct active_stats_state *alloc_state_stats(int count);
+static void update_local_stats(struct active_stats *ast, ktime_t event_ts);
+static void get_stats_snapshot(struct active_stats *ast);
+
+#ifdef CONFIG_DEBUG_FS
+static struct dentry *rootdir;
+
+static int active_stats_debug_residency_show(struct seq_file *s, void *unused)
+{
+	struct active_stats *ast = s->private;
+	u64 ts, residency;
+	int i;
+
+	ts = local_clock();
+
+	/*
+	 * Print statistics for each performance state and related residency
+	 * time [ns].
+	 */
+	for (i = 0; i < ast->states_count; i++) {
+		residency = ast->snapshot.result->residency[i];
+		if (i == ast->snapshot.result->last_freq_idx && !ast->in_idle && !ast->offline)
+			residency += ts - ast->snapshot.result->last_event_ts;
+
+		seq_printf(s, "%u:\t%llu\n", ast->freq[i], residency);
+	}
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(active_stats_debug_residency);
+
+static void active_stats_debug_init(int cpu)
+{
+	struct device *dev;
+	struct dentry *d;
+
+	if (!rootdir)
+		rootdir = debugfs_create_dir("active_stats", NULL);
+
+	dev = get_cpu_device(cpu);
+	if (!dev)
+		return;
+
+	d = debugfs_create_dir(dev_name(dev), rootdir);
+	debugfs_create_file("time_in_state", 0444, d,
+			    per_cpu(ast_local, cpu),
+			    &active_stats_debug_residency_fops);
+}
+
+static void active_stats_debug_remove(int cpu)
+{
+	struct dentry *debug_dir;
+	struct device *dev;
+
+	dev = get_cpu_device(cpu);
+	if (!dev || !rootdir)
+		return;
+
+	debug_dir = debugfs_lookup(dev_name(dev), rootdir);
+	debugfs_remove_recursive(debug_dir);
+}
+#else /* CONFIG_DEBUG_FS */
+static void active_stats_debug_init(int cpu) {}
+static void active_stats_debug_remove(int cpu) {}
+#endif
+
+static int get_freq_index(struct active_stats *ast, unsigned int freq)
+{
+	int i;
+
+	for (i = 0; i < ast->states_count; i++)
+		if (ast->freq[i] == freq)
+			return i;
+	return -EINVAL;
+}
+
+static void free_state_stats(struct active_stats_state *stats)
+{
+	if (!stats)
+		return;
+
+	kfree(stats->residency);
+	kfree(stats);
+}
+
+static void active_stats_deactivate(struct active_stats *ast)
+{
+	mutex_lock(&ast->activation_lock);
+
+	ast->num_clients--;
+	if (!ast->num_clients)
+		WRITE_ONCE(ast->activated, false);
+
+	/* Do similar accounting for the shared structure, but go no deeper */
+	if (ast->shared_ast)
+		active_stats_deactivate(ast->shared_ast);
+
+	mutex_unlock(&ast->activation_lock);
+}
+
+static void active_stats_reinit_snapshots(struct active_stats *ast, int cpu)
+{
+	int size = ast->states_size;
+	unsigned long freq;
+	int curr_freq_idx;
+	u64 curr_ts;
+
+	freq = cpufreq_quick_get(cpu);
+	curr_freq_idx = get_freq_index(ast, freq);
+	if (curr_freq_idx < 0)
+		curr_freq_idx = 0;
+
+	curr_ts = local_clock();
+
+	/* Only idle stats have the 'curr' and 'prev' */
+	if (ast->shared_ast) {
+		ast->snapshot.curr->last_event_ts = curr_ts;
+		ast->snapshot.curr->last_freq_idx = curr_freq_idx;
+
+		ast->snapshot.prev->last_freq_idx = curr_freq_idx;
+		ast->snapshot.prev->last_event_ts = curr_ts;
+
+		memcpy(ast->snapshot.prev->residency,
+		       ast->snapshot.curr->residency, size);
+	}
+
+	ast->snapshot.result->last_event_ts = curr_ts;
+	ast->snapshot.result->last_freq_idx = curr_freq_idx;
+}
+
+static void active_stats_activate(struct active_stats *ast, int cpu)
+{
+	mutex_lock(&ast->activation_lock);
+
+	ast->num_clients++;
+	if (!READ_ONCE(ast->activated)) {
+		/* For idle tracking stats take snapshot of freq stats */
+		if (ast->shared_ast) {
+			get_stats_snapshot(ast);
+			ast->in_idle = idle_cpu(cpu);
+		}
+
+		active_stats_reinit_snapshots(ast, cpu);
+		WRITE_ONCE(ast->activated, true);
+	}
+
+	mutex_unlock(&ast->activation_lock);
+}
+
+/**
+ * active_stats_cpu_setup_monitor - setup Active Stats Monitor structures
+ * @cpu:	CPU id for which the monitor is set up
+ *
+ * Set up Active Stats Monitor statistics for the given @cpu. It allocates
+ * the structures needed for tracking the CPU performance level residency.
+ * Returns a valid pointer to struct active_stats_monitor or a corresponding
+ * ERR_PTR().
+ */
+struct active_stats_monitor *active_stats_cpu_setup_monitor(int cpu)
+{
+	struct active_stats_monitor *ast_mon;
+	struct active_stats *ast;
+
+	ast = per_cpu(ast_local, cpu);
+	if (!ast)
+		return ERR_PTR(-EINVAL);
+
+	ast_mon = kzalloc(sizeof(struct active_stats_monitor), GFP_KERNEL);
+	if (!ast_mon)
+		return ERR_PTR(-ENOMEM);
+
+	ast_mon->snapshot.result = alloc_state_stats(ast->states_count);
+	if (!ast_mon->snapshot.result)
+		goto free_ast_mon;
+
+	ast_mon->snapshot.curr = alloc_state_stats(ast->states_count);
+	if (!ast_mon->snapshot.curr)
+		goto free_local;
+
+	ast_mon->snapshot.prev = alloc_state_stats(ast->states_count);
+	if (!ast_mon->snapshot.prev)
+		goto free_snapshot_curr;
+
+	ast_mon->tmp_view.result = alloc_state_stats(ast->states_count);
+	if (!ast_mon->tmp_view.result)
+		goto free_snapshot_prev;
+
+	ast_mon->tmp_view.curr = alloc_state_stats(ast->states_count);
+	if (!ast_mon->tmp_view.curr)
+		goto free_tmp_view_result;
+
+	ast_mon->tmp_view.prev = alloc_state_stats(ast->states_count);
+	if (!ast_mon->tmp_view.prev)
+		goto free_tmp_view_snapshot_curr;
+
+	ast_mon->ast = ast;
+	ast_mon->local_period = 0;
+	ast_mon->states_count = ast->states_count;
+	ast_mon->states_size = ast->states_count * sizeof(u64);
+	ast_mon->cpu = cpu;
+
+	active_stats_activate(ast->shared_ast, cpu);
+	active_stats_activate(ast, cpu);
+
+	mutex_init(&ast_mon->lock);
+
+	return ast_mon;
+
+free_tmp_view_snapshot_curr:
+	free_state_stats(ast_mon->tmp_view.curr);
+free_tmp_view_result:
+	free_state_stats(ast_mon->tmp_view.result);
+free_snapshot_prev:
+	free_state_stats(ast_mon->snapshot.prev);
+free_snapshot_curr:
+	free_state_stats(ast_mon->snapshot.curr);
+free_local:
+	free_state_stats(ast_mon->snapshot.result);
+free_ast_mon:
+	kfree(ast_mon);
+
+	return ERR_PTR(-ENOMEM);
+}
+EXPORT_SYMBOL_GPL(active_stats_cpu_setup_monitor);
+
+/**
+ * active_stats_cpu_free_monitor - free Active Stats Monitor structures
+ * @ast_mon:	Active Stats Monitor pointer
+ *
+ * Free the Active Stats Monitor data structures. No return value.
+ */
+void active_stats_cpu_free_monitor(struct active_stats_monitor *ast_mon)
+{
+	if (IS_ERR_OR_NULL(ast_mon))
+		return;
+
+	active_stats_deactivate(ast_mon->ast);
+
+	free_state_stats(ast_mon->tmp_view.prev);
+	free_state_stats(ast_mon->tmp_view.curr);
+	free_state_stats(ast_mon->tmp_view.result);
+	free_state_stats(ast_mon->snapshot.prev);
+	free_state_stats(ast_mon->snapshot.curr);
+	free_state_stats(ast_mon->snapshot.result);
+	kfree(ast_mon);
+}
+EXPORT_SYMBOL_GPL(active_stats_cpu_free_monitor);
+
+static int update_monitor_stats(struct active_stats_monitor *ast_mon)
+{
+	struct active_stats_state *snapshot_local, *origin_local;
+	struct active_stats_state *origin_freq, *snapshot_freq;
+	struct active_stats_state *origin_idle, *snapshot_idle;
+	unsigned long seq_freq, seq_idle;
+	struct active_stats *ast;
+	int size, cpu_in_idle;
+
+	ast = ast_mon->ast;
+	size = ast->states_size;
+
+	/*
+	 * Take a consistent snapshot of the statistics updated from another CPU
+	 * which might be changing the frequency for the whole domain.
+	 */
+	origin_freq = ast->shared_ast->snapshot.result;
+	snapshot_freq = ast_mon->tmp_view.curr;
+	do {
+		seq_freq = read_seqcount_begin(&ast->shared_ast->seqcount);
+
+		/*
+		 * Take a consistent snapshot of the statistics updated from another CPU
+		 * which might be toggling idle.
+		 */
+		origin_idle = ast->snapshot.prev;
+		snapshot_idle = ast_mon->tmp_view.prev;
+		origin_local = ast->snapshot.result;
+		snapshot_local = ast_mon->tmp_view.result;
+		do {
+			seq_idle = read_seqcount_begin(&ast->seqcount);
+
+			memcpy(snapshot_idle->residency, origin_idle->residency, size);
+			snapshot_idle->last_event_ts = origin_idle->last_event_ts;
+			snapshot_idle->last_freq_idx = origin_idle->last_freq_idx;
+
+			memcpy(snapshot_local->residency, origin_local->residency, size);
+			snapshot_local->last_event_ts = origin_local->last_event_ts;
+			snapshot_local->last_freq_idx = origin_local->last_freq_idx;
+
+			cpu_in_idle = ast->in_idle || ast->offline;
+		} while (read_seqcount_retry(&ast->seqcount, seq_idle));
+
+		/* Now copy the frequency stats; that path is used less often */
+		memcpy(snapshot_freq->residency, origin_freq->residency, size);
+		snapshot_freq->last_event_ts = origin_freq->last_event_ts;
+		snapshot_freq->last_freq_idx = origin_freq->last_freq_idx;
+
+	} while (read_seqcount_retry(&ast->shared_ast->seqcount, seq_freq));
+
+	return cpu_in_idle;
+}
+
+/**
+ * active_stats_cpu_update_monitor - update Active Stats Monitor structures
+ * @ast_mon:	Active Stats Monitor pointer
+ *
+ * Update the Active Stats Monitor statistics for the given @ast_mon. It
+ * calculates the residency time for all supported performance levels while
+ * the CPU was running. Returns 0 on success or -EINVAL on error.
+ */
+int active_stats_cpu_update_monitor(struct active_stats_monitor *ast_mon)
+{
+	struct active_stats_state *origin, *snapshot, *old, *local, *result;
+	int last_local_freq_idx, last_new_freq_idx;
+	int size, i, cpu_in_idle;
+	struct active_stats *ast;
+	s64 total_residency = 0;
+	u64 local_last_event_ts;
+	u64 last_event_ts = 0;
+	s64 period = 0;
+	s64 diff, acc;
+	u64 curr_ts;
+
+	if (IS_ERR_OR_NULL(ast_mon))
+		return -EINVAL;
+
+	ast = ast_mon->ast;
+
+	size = ast_mon->states_size;
+	origin = ast_mon->ast->snapshot.result;
+	local = ast_mon->snapshot.result;
+
+	mutex_lock(&ast_mon->lock);
+
+	curr_ts = local_clock();
+
+	/* Use older buffer for upcoming newest data */
+	swap(ast_mon->snapshot.curr, ast_mon->snapshot.prev);
+
+	snapshot = ast_mon->snapshot.curr;
+	old = ast_mon->snapshot.prev;
+	result = ast_mon->tmp_view.result;
+
+	cpu_in_idle = update_monitor_stats(ast_mon);
+
+	if (!cpu_in_idle) {
+		/* Take difference since this freq is set */
+		last_event_ts = max(last_event_ts, ast_mon->tmp_view.curr->last_event_ts);
+		/* Or since the idle path last accounted it, if that was later */
+		last_event_ts = max(last_event_ts, result->last_event_ts);
+		diff = curr_ts - last_event_ts;
+
+		local_last_event_ts = ast_mon->tmp_view.result->last_event_ts;
+		period = curr_ts - local_last_event_ts;
+
+		last_new_freq_idx = ast_mon->tmp_view.curr->last_freq_idx;
+		last_local_freq_idx = ast_mon->tmp_view.result->last_freq_idx;
+
+		diff = max(0LL, diff);
+		if (diff > 0) {
+			result->residency[last_new_freq_idx] += diff;
+			total_residency += diff;
+		}
+		/* Calculate the difference for freq snapshot with idle snapshot */
+		for (i = 0; i < ast_mon->states_count; i++) {
+
+			acc = ast_mon->tmp_view.curr->residency[i];
+			acc -= ast_mon->tmp_view.prev->residency[i];
+
+			if (last_local_freq_idx != i) {
+				result->residency[i] += acc;
+				total_residency += acc;
+			}
+		}
+
+		/* Don't account the same running period twice */
+		result->residency[last_local_freq_idx] += period - total_residency;
+	}
+
+	memcpy(snapshot->residency, result->residency, size);
+
+	/* Calculate the difference of the running time since last check */
+	for (i = 0; i < ast_mon->states_count; i++) {
+		diff = snapshot->residency[i] - old->residency[i];
+		/* Guard against per-CPU clock differences: clamp negatives to 0 */
+		local->residency[i] = diff > 0 ? diff : 0;
+	}
+
+	snapshot->last_event_ts = curr_ts;
+	local->last_event_ts = curr_ts;
+	ast_mon->local_period = snapshot->last_event_ts - old->last_event_ts;
+
+	mutex_unlock(&ast_mon->lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(active_stats_cpu_update_monitor);
+
+static void get_stats_snapshot(struct active_stats *ast)
+{
+	struct active_stats_state *origin, *snapshot;
+	int size = ast->states_size;
+	unsigned long seq;
+
+	origin = ast->shared_ast->snapshot.result;
+	snapshot = ast->snapshot.curr;
+
+	/*
+	 * Take a consistent snapshot of the statistics updated from another CPU
+	 * which might be changing the frequency for the whole domain.
+	 */
+	do {
+		seq = read_seqcount_begin(&ast->shared_ast->seqcount);
+
+		memcpy(snapshot->residency, origin->residency, size);
+
+		snapshot->last_event_ts = origin->last_event_ts;
+		snapshot->last_freq_idx = origin->last_freq_idx;
+
+	} while (read_seqcount_retry(&ast->shared_ast->seqcount, seq));
+}
+
+static void
+update_local_stats(struct active_stats *ast, ktime_t event_ts)
+{
+	s64 total_residency = 0;
+	s64 diff, acc, period;
+	ktime_t prev_ts;
+	u64 prev;
+	int i;
+
+	get_stats_snapshot(ast);
+
+	prev = max(ast->snapshot.result->last_event_ts, ast->snapshot.curr->last_event_ts);
+	prev_ts = ns_to_ktime(prev);
+	diff = ktime_sub(event_ts, prev_ts);
+
+	prev_ts = ns_to_ktime(ast->snapshot.result->last_event_ts);
+	period = ktime_sub(event_ts, prev_ts);
+
+	i = ast->snapshot.curr->last_freq_idx;
+
+	diff = max(0LL, diff);
+	if (diff > 0) {
+		ast->snapshot.result->residency[i] += diff;
+		total_residency += diff;
+	}
+
+	for (i = 0; i < ast->states_count; i++) {
+		if (ast->snapshot.result->last_freq_idx == i)
+			continue;
+
+		acc = ast->snapshot.curr->residency[i];
+		acc -= ast->snapshot.prev->residency[i];
+		ast->snapshot.result->residency[i] += acc;
+		total_residency += acc;
+	}
+
+	/* Don't account the same running period twice */
+	i = ast->snapshot.result->last_freq_idx;
+	ast->snapshot.result->residency[i] += period - total_residency;
+
+	ast->snapshot.result->last_freq_idx = ast->snapshot.curr->last_freq_idx;
+
+	ast->snapshot.prev->last_freq_idx = ast->snapshot.curr->last_freq_idx;
+	ast->snapshot.prev->last_event_ts = ast->snapshot.curr->last_event_ts;
+
+	swap(ast->snapshot.curr, ast->snapshot.prev);
+
+	ast->snapshot.result->last_event_ts = event_ts;
+}
+
+static inline
+void _active_stats_cpu_idle_enter(struct active_stats *ast, ktime_t enter_ts)
+{
+	write_seqcount_begin(&ast->seqcount);
+
+	update_local_stats(ast, enter_ts);
+	ast->in_idle = true;
+
+	write_seqcount_end(&ast->seqcount);
+}
+
+static inline
+void _active_stats_cpu_idle_exit(struct active_stats *ast, ktime_t time_end)
+{
+	int size = ast->states_size;
+
+	write_seqcount_begin(&ast->seqcount);
+
+	get_stats_snapshot(ast);
+
+	ast->snapshot.result->last_freq_idx = ast->snapshot.curr->last_freq_idx;
+
+	memcpy(ast->snapshot.prev->residency, ast->snapshot.curr->residency, size);
+	ast->snapshot.prev->last_freq_idx = ast->snapshot.curr->last_freq_idx;
+	ast->snapshot.prev->last_event_ts = ast->snapshot.curr->last_event_ts;
+
+	swap(ast->snapshot.curr, ast->snapshot.prev);
+
+	ast->snapshot.result->last_event_ts = time_end;
+	ast->in_idle = false;
+
+	write_seqcount_end(&ast->seqcount);
+}
+
+/**
+ * active_stats_cpu_idle_enter - update Active Stats idle tracking data
+ * @enter_ts:	the idle entry time stamp
+ *
+ * Update the maintained statistics for entering idle for a given CPU.
+ * No return value.
+ */
+void active_stats_cpu_idle_enter(ktime_t enter_ts)
+{
+	struct active_stats *ast;
+
+	ast = per_cpu(ast_local, raw_smp_processor_id());
+	if (!ast || !READ_ONCE(ast->activated))
+		return;
+
+	_active_stats_cpu_idle_enter(ast, enter_ts);
+}
+EXPORT_SYMBOL_GPL(active_stats_cpu_idle_enter);
+
+/**
+ * active_stats_cpu_idle_exit - update Active Stats idle tracking data
+ * @time_end:	the idle exit time stamp
+ *
+ * Update the maintained statistics for exiting idle for a given CPU.
+ * No return value.
+ */
+void active_stats_cpu_idle_exit(ktime_t time_end)
+{
+	struct active_stats *ast;
+
+	ast = per_cpu(ast_local, raw_smp_processor_id());
+	if (!ast || !READ_ONCE(ast->activated))
+		return;
+
+	_active_stats_cpu_idle_exit(ast, time_end);
+}
+EXPORT_SYMBOL_GPL(active_stats_cpu_idle_exit);
+
+static void _active_stats_cpu_freq_change(struct active_stats *shared_ast,
+					  unsigned int freq, u64 ts)
+{
+	int prev_idx, next_idx;
+	s64 time_diff;
+
+	next_idx = get_freq_index(shared_ast, freq);
+	if (next_idx < 0)
+		return;
+
+	write_seqcount_begin(&shared_ast->seqcount);
+
+	/* The value in prev_idx is always a valid array index */
+	prev_idx = shared_ast->snapshot.result->last_freq_idx;
+
+	time_diff = ts - shared_ast->snapshot.result->last_event_ts;
+
+	/* Avoid jitter from different CPUs' local clocks */
+	if (time_diff > 0)
+		shared_ast->snapshot.result->residency[prev_idx] += time_diff;
+
+	shared_ast->snapshot.result->last_freq_idx = next_idx;
+	shared_ast->snapshot.result->last_event_ts = ts;
+
+	write_seqcount_end(&shared_ast->seqcount);
+}
+
+/**
+ * active_stats_cpu_freq_fast_change - update Active Stats tracking data
+ * @cpu:	the id of a CPU belonging to the updated frequency domain
+ * @freq:	new frequency value
+ *
+ * Update the maintained statistics for frequency change for a given CPU's
+ * frequency domain. This function must be used only in the fast switch code
+ * path. No return value.
+ */
+void active_stats_cpu_freq_fast_change(int cpu, unsigned int freq)
+{
+	struct active_stats *ast;
+	u64 ts;
+
+	ast = per_cpu(ast_local, cpu);
+	if (!ast || !READ_ONCE(ast->activated))
+		return;
+
+	ts = local_clock();
+	_active_stats_cpu_freq_change(ast->shared_ast, freq, ts);
+}
+EXPORT_SYMBOL_GPL(active_stats_cpu_freq_fast_change);
+
+/**
+ * active_stats_cpu_freq_change - update Active Stats tracking data
+ * @cpu:	the id of a CPU belonging to the updated frequency domain
+ * @freq:	new frequency value
+ *
+ * Update the maintained statistics for frequency change for a given CPU's
+ * frequency domain. This function must not be used in the fast switch code
+ * path. No return value.
+ */
+void active_stats_cpu_freq_change(int cpu, unsigned int freq)
+{
+	struct active_stats *ast, *shared_ast;
+	unsigned long flags;
+	u64 ts;
+
+	ast = per_cpu(ast_local, cpu);
+	if (!ast || !READ_ONCE(ast->activated))
+		return;
+
+	shared_ast = ast->shared_ast;
+	ts = local_clock();
+
+	spin_lock_irqsave(&shared_ast->lock, flags);
+	_active_stats_cpu_freq_change(shared_ast, freq, ts);
+	spin_unlock_irqrestore(&shared_ast->lock, flags);
+}
+EXPORT_SYMBOL_GPL(active_stats_cpu_freq_change);
+
+static struct active_stats_state *alloc_state_stats(int count)
+{
+	struct active_stats_state *stats;
+
+	stats = kzalloc(sizeof(*stats), GFP_KERNEL);
+	if (!stats)
+		return NULL;
+
+	stats->residency = kcalloc(count, sizeof(u64), GFP_KERNEL);
+	if (!stats->residency)
+		goto free_stats;
+
+	return stats;
+
+free_stats:
+	kfree(stats);
+
+	return NULL;
+}
+
+static struct active_stats *
+active_stats_setup(int cpu, int nr_opp, struct active_stats *shared_ast)
+{
+	struct active_stats *ast;
+	struct device *cpu_dev;
+	struct dev_pm_opp *opp;
+	unsigned long rate;
+	int i;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev) {
+		pr_err("%s: too early to get CPU%d device!\n",
+		       __func__, cpu);
+		return NULL;
+	}
+
+	ast = kzalloc(sizeof(*ast), GFP_KERNEL);
+	if (!ast)
+		return NULL;
+
+	ast->states_count = nr_opp;
+	ast->states_size = ast->states_count * sizeof(u64);
+	ast->in_idle = true;
+
+	ast->snapshot.result = alloc_state_stats(nr_opp);
+	if (!ast->snapshot.result)
+		goto free_ast;
+
+	if (!shared_ast) {
+		ast->freq = kcalloc(nr_opp, sizeof(unsigned long), GFP_KERNEL);
+		if (!ast->freq)
+			goto free_ast_local;
+
+		for (i = 0, rate = 0; i < nr_opp; i++, rate++) {
+			opp = dev_pm_opp_find_freq_ceil(cpu_dev, &rate);
+			if (IS_ERR(opp)) {
+				dev_warn(cpu_dev, "reading an OPP failed\n");
+				kfree(ast->freq);
+				goto free_ast_local;
+			}
+			dev_pm_opp_put(opp);
+
+			ast->freq[i] = rate / 1000;
+		}
+
+		/* Frequency isn't known at this point */
+		ast->snapshot.result->last_freq_idx = nr_opp - 1;
+	} else {
+		ast->freq = shared_ast->freq;
+
+		ast->snapshot.curr = alloc_state_stats(nr_opp);
+		if (!ast->snapshot.curr)
+			goto free_ast_local;
+
+		ast->snapshot.prev = alloc_state_stats(nr_opp);
+		if (!ast->snapshot.prev)
+			goto free_ast_snapshot;
+
+		ast->snapshot.curr->last_freq_idx = nr_opp - 1;
+		ast->snapshot.prev->last_freq_idx = nr_opp - 1;
+
+		ast->snapshot.result->last_freq_idx = nr_opp - 1;
+	}
+
+	mutex_init(&ast->activation_lock);
+	spin_lock_init(&ast->lock);
+	seqcount_init(&ast->seqcount);
+
+	return ast;
+
+free_ast_snapshot:
+	free_state_stats(ast->snapshot.curr);
+free_ast_local:
+	free_state_stats(ast->snapshot.result);
+free_ast:
+	kfree(ast);
+
+	return NULL;
+}
+
+static void active_stats_cleanup(struct active_stats *ast)
+{
+	free_state_stats(ast->snapshot.prev);
+	free_state_stats(ast->snapshot.curr);
+	free_state_stats(ast->snapshot.result);
+	kfree(ast);
+}
+
+static void active_stats_init(struct cpufreq_policy *policy)
+{
+	struct active_stats *ast, *shared_ast;
+	cpumask_var_t setup_cpus;
+	struct device *cpu_dev;
+	int nr_opp, cpu;
+
+	cpu = policy->cpu;
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev) {
+		pr_err("%s: too early to get CPU%d device!\n",
+		       __func__, cpu);
+		return;
+	}
+
+	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
+	if (nr_opp <= 0) {
+		dev_warn(cpu_dev, "OPP table is not ready\n");
+		return;
+	}
+
+	if (!zalloc_cpumask_var(&setup_cpus, GFP_KERNEL)) {
+		dev_warn(cpu_dev, "cpumask alloc failed\n");
+		return;
+	}
+
+	shared_ast = active_stats_setup(cpu, nr_opp, NULL);
+	if (!shared_ast) {
+		free_cpumask_var(setup_cpus);
+		dev_warn(cpu_dev, "failed to setup shared_ast properly\n");
+		return;
+	}
+
+	for_each_cpu(cpu, policy->related_cpus) {
+		ast = active_stats_setup(cpu, nr_opp, shared_ast);
+		if (!ast) {
+			dev_warn(cpu_dev, "failed to setup stats properly\n");
+			goto cpus_cleanup;
+		}
+
+		ast->shared_ast = shared_ast;
+		per_cpu(ast_local, cpu) = ast;
+
+		active_stats_debug_init(cpu);
+
+		cpumask_set_cpu(cpu, setup_cpus);
+	}
+
+	free_cpumask_var(setup_cpus);
+
+	dev_info(cpu_dev, "Active Stats created\n");
+	return;
+
+cpus_cleanup:
+	for_each_cpu(cpu, setup_cpus) {
+		active_stats_debug_remove(cpu);
+
+		ast = per_cpu(ast_local, cpu);
+		per_cpu(ast_local, cpu) = NULL;
+
+		active_stats_cleanup(ast);
+	}
+
+	free_cpumask_var(setup_cpus);
+	kfree(shared_ast->freq);
+
+	active_stats_cleanup(shared_ast);
+}
+
+static int
+active_stats_init_callback(struct notifier_block *nb,
+			   unsigned long val, void *data)
+{
+	struct cpufreq_policy *policy = data;
+
+	if (val != CPUFREQ_CREATE_POLICY)
+		return 0;
+
+	cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->related_cpus);
+
+	active_stats_init(policy);
+
+	if (cpumask_empty(cpus_to_visit))
+		schedule_work(&processing_done_work);
+
+	return 0;
+}
+
+static struct notifier_block active_stats_init_notifier = {
+	.notifier_call = active_stats_init_callback,
+};
+
+static void processing_done_fn(struct work_struct *work)
+{
+	cpufreq_unregister_notifier(&active_stats_init_notifier,
+				    CPUFREQ_POLICY_NOTIFIER);
+	free_cpumask_var(cpus_to_visit);
+}
+
+static int cpuhp_active_stats_cpu_offline(unsigned int cpu)
+{
+	struct active_stats *ast;
+
+	ast = per_cpu(ast_local, cpu);
+	if (!ast)
+		return 0;
+
+	ast->offline = true;
+
+	if (!READ_ONCE(ast->activated))
+		return 0;
+
+	_active_stats_cpu_idle_enter(ast, local_clock());
+
+	return 0;
+}
+
+static int cpuhp_active_stats_cpu_online(unsigned int cpu)
+{
+	struct active_stats *ast;
+
+	ast = per_cpu(ast_local, cpu);
+	if (!ast)
+		return 0;
+
+	ast->offline = false;
+
+	if (!READ_ONCE(ast->activated))
+		return 0;
+
+	_active_stats_cpu_idle_exit(ast, local_clock());
+
+	return 0;
+}
+
+static int __init active_stats_register_notifier(void)
+{
+	int ret;
+
+	if (!alloc_cpumask_var(&cpus_to_visit, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_copy(cpus_to_visit, cpu_possible_mask);
+
+	ret = cpufreq_register_notifier(&active_stats_init_notifier,
+					CPUFREQ_POLICY_NOTIFIER);
+
+	if (ret) {
+		free_cpumask_var(cpus_to_visit);
+		return ret;
+	}
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				"active_stats_cpu:online",
+				cpuhp_active_stats_cpu_online,
+				cpuhp_active_stats_cpu_offline);
+
+	return ret;
+}
+fs_initcall(active_stats_register_notifier);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH v2 2/6] cpuidle: Add Active Stats calls tracking idle entry/exit
  2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 1/6] PM: Introduce Active Stats framework Lukasz Luba
@ 2021-07-06 13:18 ` Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 3/6] cpufreq: Add Active Stats calls tracking frequency changes Lukasz Luba
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 13:18 UTC (permalink / raw)
  To: linux-kernel, daniel.lezcano
  Cc: linux-pm, amitk, rui.zhang, lukasz.luba, dietmar.eggemann,
	Chris.Redpath, Beata.Michalska, viresh.kumar, rjw, amit.kachhap

The Active Stats framework tracks and accounts the activity of the CPU
at each performance level: the real residency, while the CPU was not
idle, at a given performance level. This patch adds the calls needed to
provide the CPU idle entry/exit events to the Active Stats framework.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpuidle/cpuidle.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef2ea1b12cd8..24a33c6c4a62 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -8,6 +8,7 @@
  * This code is licenced under the GPL.
  */
 
+#include <linux/active_stats.h>
 #include <linux/clockchips.h>
 #include <linux/kernel.h>
 #include <linux/mutex.h>
@@ -231,6 +232,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	trace_cpu_idle(index, dev->cpu);
 	time_start = ns_to_ktime(local_clock());
 
+	active_stats_cpu_idle_enter(time_start);
+
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_enter();
@@ -243,6 +246,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	time_end = ns_to_ktime(local_clock());
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
+	active_stats_cpu_idle_exit(time_end);
+
 	/* The cpu is no longer idle or about to enter idle. */
 	sched_idle_set_state(NULL);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH v2 3/6] cpufreq: Add Active Stats calls tracking frequency changes
  2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 1/6] PM: Introduce Active Stats framework Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 2/6] cpuidle: Add Active Stats calls tracking idle entry/exit Lukasz Luba
@ 2021-07-06 13:18 ` Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 4/6] thermal: Add interface to cooling devices to handle governor change Lukasz Luba
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 13:18 UTC (permalink / raw)
  To: linux-kernel, daniel.lezcano
  Cc: linux-pm, amitk, rui.zhang, lukasz.luba, dietmar.eggemann,
	Chris.Redpath, Beata.Michalska, viresh.kumar, rjw, amit.kachhap

The Active Stats framework tracks and accounts the activity of the CPU
at each performance level: the real residency, while the CPU was not
idle, at a given performance level. This patch adds the calls needed to
provide the CPU frequency transition events to the Active Stats
framework.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/cpufreq.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 802abc925b2a..d79cb9310572 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -14,6 +14,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/active_stats.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/cpu_cooling.h>
@@ -387,6 +388,8 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
 
 		cpufreq_stats_record_transition(policy, freqs->new);
 		policy->cur = freqs->new;
+
+		active_stats_cpu_freq_change(policy->cpu, freqs->new);
 	}
 }
 
@@ -2085,6 +2088,8 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
 			    policy->cpuinfo.max_freq);
 	cpufreq_stats_record_transition(policy, freq);
 
+	active_stats_cpu_freq_fast_change(policy->cpu, freq);
+
 	if (trace_cpu_frequency_enabled()) {
 		for_each_cpu(cpu, policy->cpus)
 			trace_cpu_frequency(freq, cpu);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH v2 4/6] thermal: Add interface to cooling devices to handle governor change
  2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
                   ` (2 preceding siblings ...)
  2021-07-06 13:18 ` [RFC PATCH v2 3/6] cpufreq: Add Active Stats calls tracking frequency changes Lukasz Luba
@ 2021-07-06 13:18 ` Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 5/6] thermal/core/power allocator: Prepare power actors and calm down when not used Lukasz Luba
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 13:18 UTC (permalink / raw)
  To: linux-kernel, daniel.lezcano
  Cc: linux-pm, amitk, rui.zhang, lukasz.luba, dietmar.eggemann,
	Chris.Redpath, Beata.Michalska, viresh.kumar, rjw, amit.kachhap

Add an interface that lets cooling devices handle a governor change, so
they can set up the internal structures they need, or clean them up.
This interface is going to be used by the Intelligent Power Allocation
(IPA) thermal governor, which requires setting up a monitoring
mechanism in the cpufreq cooling devices.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/thermal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index d296f3b88fb9..79c20daaf287 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -87,6 +87,7 @@ struct thermal_cooling_device_ops {
 	int (*get_requested_power)(struct thermal_cooling_device *, u32 *);
 	int (*state2power)(struct thermal_cooling_device *, unsigned long, u32 *);
 	int (*power2state)(struct thermal_cooling_device *, u32, unsigned long *);
+	int (*change_governor)(struct thermal_cooling_device *cdev, bool set);
 };
 
 struct thermal_cooling_device {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH v2 5/6] thermal/core/power allocator: Prepare power actors and calm down when not used
  2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
                   ` (3 preceding siblings ...)
  2021-07-06 13:18 ` [RFC PATCH v2 4/6] thermal: Add interface to cooling devices to handle governor change Lukasz Luba
@ 2021-07-06 13:18 ` Lukasz Luba
  2021-07-06 13:18 ` [RFC PATCH v2 6/6] thermal: cpufreq_cooling: Improve power estimation based on Active Stats framework Lukasz Luba
  2021-07-06 15:28 ` [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Rafael J. Wysocki
  6 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 13:18 UTC (permalink / raw)
  To: linux-kernel, daniel.lezcano
  Cc: linux-pm, amitk, rui.zhang, lukasz.luba, dietmar.eggemann,
	Chris.Redpath, Beata.Michalska, viresh.kumar, rjw, amit.kachhap

The cooling devices in a thermal zone can support an interface that
prepares them to work with a thermal governor. They should be set up
properly before the first throttling happens, which means the internal
power tracking mechanism should be ready. When IPA is not used or the
thermal zone is disabled, the power tracking can be stopped. Thus, add
the code which handles proper cooling device setup for operation with
IPA.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/gov_power_allocator.c | 71 +++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 13e375751d22..678fb544c8af 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -53,6 +53,8 @@ static inline s64 div_frac(s64 x, s64 y)
  * struct power_allocator_params - parameters for the power allocator governor
  * @allocated_tzp:	whether we have allocated tzp for this thermal zone and
  *			it needs to be freed on unbind
+ * @actors_ready:	set to 1 when the power actors are set up properly, or
+ *			to -EINVAL when there were errors during preparation
  * @err_integral:	accumulated error in the PID controller.
  * @prev_err:	error in the previous iteration of the PID controller.
  *		Used to calculate the derivative term.
@@ -68,6 +70,7 @@ static inline s64 div_frac(s64 x, s64 y)
  */
 struct power_allocator_params {
 	bool allocated_tzp;
+	int actors_ready;
 	s64 err_integral;
 	s32 prev_err;
 	int trip_switch_on;
@@ -693,9 +696,20 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
 static void power_allocator_unbind(struct thermal_zone_device *tz)
 {
 	struct power_allocator_params *params = tz->governor_data;
+	struct thermal_instance *instance;
 
 	dev_dbg(&tz->device, "Unbinding from thermal zone %d\n", tz->id);
 
+	/* Calm down the cooling devices and stop the monitoring mechanisms */
+	list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+		struct thermal_cooling_device *cdev = instance->cdev;
+
+		if (!cdev_is_power_actor(cdev))
+			continue;
+		if (cdev->ops->change_governor)
+			cdev->ops->change_governor(cdev, false);
+	}
+
 	if (params->allocated_tzp) {
 		kfree(tz->tzp);
 		tz->tzp = NULL;
@@ -705,6 +719,51 @@ static void power_allocator_unbind(struct thermal_zone_device *tz)
 	tz->governor_data = NULL;
 }
 
+static int prepare_power_actors(struct thermal_zone_device *tz)
+{
+	struct power_allocator_params *params = tz->governor_data;
+	struct thermal_instance *instance;
+	int ret;
+
+	if (params->actors_ready > 0)
+		return 0;
+
+	if (params->actors_ready < 0)
+		return -EINVAL;
+
+	mutex_lock(&tz->lock);
+
+	list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+		struct thermal_cooling_device *cdev = instance->cdev;
+
+		if (!cdev_is_power_actor(cdev))
+			continue;
+		if (cdev->ops->change_governor) {
+			ret = cdev->ops->change_governor(cdev, true);
+			if (ret)
+				goto clean_up;
+		}
+	}
+
+	mutex_unlock(&tz->lock);
+	params->actors_ready = 1;
+	return 0;
+
+clean_up:
+	list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+		struct thermal_cooling_device *cdev = instance->cdev;
+
+		if (!cdev_is_power_actor(cdev))
+			continue;
+		if (cdev->ops->change_governor)
+			cdev->ops->change_governor(cdev, false);
+	}
+
+	mutex_unlock(&tz->lock);
+	params->actors_ready = -EINVAL;
+	return -EINVAL;
+}
+
 static int power_allocator_throttle(struct thermal_zone_device *tz, int trip)
 {
 	int ret;
@@ -719,6 +778,18 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip)
 	if (trip != params->trip_max_desired_temperature)
 		return 0;
 
+	/*
+	 * If we are called for the first time (e.g. after enabling the
+	 * thermal zone), set up the power actors properly.
+	 */
+	ret = prepare_power_actors(tz);
+	if (ret) {
+		dev_warn_once(&tz->device,
+			      "Failed to setup IPA power actors: %d\n",
+			      ret);
+		return ret;
+	}
+
 	ret = tz->ops->get_trip_temp(tz, params->trip_switch_on,
 				     &switch_on_temp);
 	if (!ret && (tz->temperature < switch_on_temp)) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH v2 6/6] thermal: cpufreq_cooling: Improve power estimation based on Active Stats framework
  2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
                   ` (4 preceding siblings ...)
  2021-07-06 13:18 ` [RFC PATCH v2 5/6] thermal/core/power allocator: Prepare power actors and calm down when not used Lukasz Luba
@ 2021-07-06 13:18 ` Lukasz Luba
  2021-07-06 15:28 ` [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Rafael J. Wysocki
  6 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 13:18 UTC (permalink / raw)
  To: linux-kernel, daniel.lezcano
  Cc: linux-pm, amitk, rui.zhang, lukasz.luba, dietmar.eggemann,
	Chris.Redpath, Beata.Michalska, viresh.kumar, rjw, amit.kachhap

The cpufreq_cooling device has dedicated APIs for the Intelligent Power
Allocation (IPA) thermal governor. IPA needs the power used by the CPU
devices in the past period. Based on this, IPA estimates the power
budget, allocates a new budget and splits it across the cooling devices
for the next period (keeping the system in the thermal envelope). When
the estimated input power has a large error, the whole mechanism does
not work properly. The current power estimation assumes a constant
frequency during the whole IPA period (e.g. 100ms). This can cause a
large error in the power estimation, especially when the schedutil
governor is used and the frequency is often adjusted to the current
need. It is visible in periodic workloads, when the frequency
oscillates between two OPPs and IPA happens to sample the lower one.

This patch introduces a new mechanism which solves the frequency
sampling problem. It uses the Active Stats framework to track and
account for the CPU power used over a given IPA period.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/cpufreq_cooling.c | 132 ++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index eeb4e4b76c0b..90dca20e458a 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -10,6 +10,7 @@
  *		Viresh Kumar <viresh.kumar@linaro.org>
  *
  */
+#include <linux/active_stats.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/cpu_cooling.h>
@@ -61,6 +62,7 @@ struct time_in_idle {
  * @policy: cpufreq policy.
  * @idle_time: idle time stats
  * @qos_req: PM QoS contraint to apply
+ * @ast_mon: Active Stats Monitor array of pointers
  *
  * This structure is required for keeping information of each registered
  * cpufreq_cooling_device.
@@ -75,6 +77,9 @@ struct cpufreq_cooling_device {
 	struct time_in_idle *idle_time;
 #endif
 	struct freq_qos_request qos_req;
+#ifdef CONFIG_ACTIVE_STATS
+	struct active_stats_monitor **ast_mon;
+#endif
 };
 
 #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
@@ -124,6 +129,107 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 	return cpufreq_cdev->em->table[i].frequency;
 }
 
+#ifdef CONFIG_ACTIVE_STATS
+static u32 account_cpu_power(struct active_stats_monitor *ast_mon,
+			     struct em_perf_domain *em)
+{
+	u64 single_power, residency, total_time;
+	struct active_stats_state *result;
+	u32 power = 0;
+	int i;
+
+	mutex_lock(&ast_mon->lock);
+	result = ast_mon->snapshot.result;
+	total_time = ast_mon->local_period;
+
+	for (i = 0; i < ast_mon->states_count; i++) {
+		residency = result->residency[i];
+		single_power = em->table[i].power * residency;
+		single_power = div64_u64(single_power, total_time);
+		power += (u32)single_power;
+	}
+
+	mutex_unlock(&ast_mon->lock);
+
+	return power;
+}
+
+static u32 get_power_est(struct cpufreq_cooling_device *cdev)
+{
+	int num_cpus, ret, i;
+	u32 total_power = 0;
+
+	num_cpus = cpumask_weight(cdev->policy->related_cpus);
+
+	for (i = 0; i < num_cpus; i++) {
+		ret = active_stats_cpu_update_monitor(cdev->ast_mon[i]);
+		if (ret)
+			return 0;
+
+		total_power += account_cpu_power(cdev->ast_mon[i], cdev->em);
+	}
+
+	return total_power;
+}
+
+static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
+				       u32 *power)
+{
+	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	struct cpufreq_policy *policy = cpufreq_cdev->policy;
+
+	*power = get_power_est(cpufreq_cdev);
+
+	trace_thermal_power_cpu_get_power(policy->related_cpus, 0, 0, 0,
+					  *power);
+
+	return 0;
+}
+
+static void clean_cpu_monitoring(struct cpufreq_cooling_device *cdev)
+{
+	int num_cpus, i;
+
+	if (!cdev->ast_mon)
+		return;
+
+	num_cpus = cpumask_weight(cdev->policy->related_cpus);
+
+	for (i = 0; i < num_cpus; i++)
+		active_stats_cpu_free_monitor(cdev->ast_mon[i]);
+
+	kfree(cdev->ast_mon);
+	cdev->ast_mon = NULL;
+}
+
+static int setup_cpu_monitoring(struct cpufreq_cooling_device *cdev)
+{
+	int cpu, cpus, i = 0;
+
+	if (cdev->ast_mon)
+		return 0;
+
+	cpus = cpumask_weight(cdev->policy->related_cpus);
+
+	cdev->ast_mon = kcalloc(cpus, sizeof(struct active_stats_monitor *),
+				GFP_KERNEL);
+	if (!cdev->ast_mon)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, cdev->policy->related_cpus) {
+		cdev->ast_mon[i] = active_stats_cpu_setup_monitor(cpu);
+		if (IS_ERR_OR_NULL(cdev->ast_mon[i])) {
+			cdev->ast_mon[i] = NULL;
+			goto cleanup;
+		}
+		i++;
+	}
+
+	return 0;
+
+cleanup:
+	clean_cpu_monitoring(cdev);
+	return -EINVAL;
+}
+#else
+
 /**
  * get_load() - get load for a cpu
  * @cpufreq_cdev: struct cpufreq_cooling_device for the cpu
@@ -184,6 +290,15 @@ static u32 get_dynamic_power(struct cpufreq_cooling_device *cpufreq_cdev,
 	return (raw_cpu_power * cpufreq_cdev->last_load) / 100;
 }
 
+static void clean_cpu_monitoring(struct cpufreq_cooling_device *cpufreq_cdev)
+{
+}
+
+static int setup_cpu_monitoring(struct cpufreq_cooling_device *cpufreq_cdev)
+{
+	return 0;
+}
+
 /**
  * cpufreq_get_requested_power() - get the current power
  * @cdev:	&thermal_cooling_device pointer
@@ -252,6 +367,7 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
 
 	return 0;
 }
+#endif
 
 /**
  * cpufreq_state2power() - convert a cpu cdev state to power consumed
@@ -323,6 +439,20 @@ static int cpufreq_power2state(struct thermal_cooling_device *cdev,
 	return 0;
 }
 
+static int cpufreq_change_governor(struct thermal_cooling_device *cdev,
+				   bool governor_up)
+{
+	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	int ret = 0;
+
+	if (governor_up)
+		ret = setup_cpu_monitoring(cpufreq_cdev);
+	else
+		clean_cpu_monitoring(cpufreq_cdev);
+
+	return ret;
+}
+
 static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
 			      struct em_perf_domain *em) {
 	struct cpufreq_policy *policy;
@@ -566,6 +696,7 @@ __cpufreq_cooling_register(struct device_node *np,
 		cooling_ops->get_requested_power = cpufreq_get_requested_power;
 		cooling_ops->state2power = cpufreq_state2power;
 		cooling_ops->power2state = cpufreq_power2state;
+		cooling_ops->change_governor = cpufreq_change_governor;
 	} else
 #endif
 	if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED) {
@@ -690,6 +821,7 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 
 	thermal_cooling_device_unregister(cdev);
 	freq_qos_remove_request(&cpufreq_cdev->qos_req);
+	clean_cpu_monitoring(cpufreq_cdev);
 	free_idle_time(cpufreq_cdev);
 	kfree(cpufreq_cdev);
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics
  2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
                   ` (5 preceding siblings ...)
  2021-07-06 13:18 ` [RFC PATCH v2 6/6] thermal: cpufreq_cooling: Improve power estimation based on Active Stats framework Lukasz Luba
@ 2021-07-06 15:28 ` Rafael J. Wysocki
  2021-07-06 15:56   ` Lukasz Luba
  6 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2021-07-06 15:28 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Linux Kernel Mailing List, Daniel Lezcano, Linux PM,
	Amit Kucheria, Zhang, Rui, Dietmar Eggemann, Chris Redpath,
	Beata.Michalska, Viresh Kumar, Rafael J. Wysocki, Amit Kachhap

On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi all,
>
> This patch set introduces a new mechanism: Active Stats framework (ASF), which
> gathers and maintains statistics of CPU performance - time residency at each
> performance state.
>
> The ASF tracks all the frequency transitions as well as CPU
> idle entry/exit events for all CPUs. Based on that it accounts the active
> (non-idle) residency time for each CPU at each frequency. This information can
> be used by some other subsystems (like thermal governor) to enhance their
> estimations about CPU usage at a given period.

This seems to mean that what is needed is something like the cpufreq
stats but only collected during the time when CPUs are not in idle
states.

> Does it fix something in mainline?
> Yes, there is the thermal governor Intelligent Power Allocation (IPA), which
> estimates the power used by the CPUs in the past. IPA samples the CPU
> utilization and frequency and relies on the info available at the time of
> sampling, which introduces estimation errors.
> The use of ASF solves this issue and enables IPA to make better estimates.

Obviously the IPA is not used on all platforms where cpufreq and
cpuidle are used.  What platforms are going to benefit from this
change?

> Why couldn't it be done using the existing frameworks?
> The CPUFreq and CPUIdle statistics are not combined, so it is not possible
> to derive the information on exactly how long the CPU was running at a given
> frequency.

But it doesn't mean that the statistics could not be combined.

For instance, the frequency of the CPU cannot change via cpufreq when
active_stats_cpu_idle_enter() is running, so instead of using an
entirely new framework for collecting statistics it might update the
existing cpufreq stats to register that event.

And analogously for the wakeup.

> This new framework combines that information and provides
> it in a handy way.

I'm not convinced about the last piece.

> IMHO it has to be implemented as a new framework, next to
> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal
> governor into the frequency change and idle code paths.

As far as the design is concerned, I'm not sure if I agree with it.

From my perspective it's all a 1000-line patch that I have to read and
understand to figure out what the design is.

> The patch 4/6 introduces a new API for cooling devices, which allows them to
> stop tracking the freq and idle statistics.
>
> The patch set also contains patches 5/6 and 6/6, which add the new power model
> based on ASF into the cpufreq cooling (used by the thermal governor IPA).
> It is added as an ifdef option, since Active Stats might not be compiled in.
> The ASF is a compile-time option, but that might be changed and IPA could
> select it, which would allow removing some redundant code from
> cpufreq_cooling.c.
>
> Comments and suggestions are very welcome.

I'm totally not convinced that it is necessary to put the extra 1000
lines of code into the kernel to address the problem at hand.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics
  2021-07-06 15:28 ` [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Rafael J. Wysocki
@ 2021-07-06 15:56   ` Lukasz Luba
  2021-07-06 16:34     ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 15:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Daniel Lezcano, Linux PM,
	Amit Kucheria, Zhang, Rui, Dietmar Eggemann, Chris Redpath,
	Beata.Michalska, Viresh Kumar, Rafael J. Wysocki, Amit Kachhap



On 7/6/21 4:28 PM, Rafael J. Wysocki wrote:
> On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> Hi all,
>>
>> This patch set introduces a new mechanism: Active Stats framework (ASF), which
>> gathers and maintains statistics of CPU performance - time residency at each
>> performance state.
>>
>> The ASF tracks all the frequency transitions as well as CPU
>> idle entry/exit events for all CPUs. Based on that it accounts the active
>> (non-idle) residency time for each CPU at each frequency. This information can
>> be used by some other subsystems (like thermal governor) to enhance their
>> estimations about CPU usage at a given period.
> 
> This seems to mean that what is needed is something like the cpufreq
> stats but only collected during the time when CPUs are not in idle
> states.

Yes

> 
>> Does it fix something in mainline?
>> Yes, there is the thermal governor Intelligent Power Allocation (IPA), which
>> estimates the power used by the CPUs in the past. IPA samples the CPU
>> utilization and frequency and relies on the info available at the time of
>> sampling, which introduces estimation errors.
>> The use of ASF solves this issue and enables IPA to make better estimates.
> 
> Obviously the IPA is not used on all platforms where cpufreq and
> cpuidle are used.  What platforms are going to benefit from this
> change?

Arm platforms which still use the kernel thermal framework to control
temperature, such as Chromebooks or mid- and low-end phones.

> 
>> Why couldn't it be done using the existing frameworks?
>> The CPUFreq and CPUIdle statistics are not combined, so it is not possible
>> to derive the information on exactly how long the CPU was running at a given
>> frequency.
> 
> But it doesn't mean that the statistics could not be combined.
> 
> For instance, the frequency of the CPU cannot change via cpufreq when
> active_stats_cpu_idle_enter() is running, so instead of using an
> entirely new framework for collecting statistics it might update the
> existing cpufreq stats to register that event.

True, but keep in mind that cpufreq most likely operates on a group of
CPUs (policy::related_cpus), while cpuidle works in a per-CPU fashion.
I would say that cpuidle should check during enter/exit what the
currently set frequency for the cluster is and account the CPU's active
period.

> 
> And analogously for the wakeup.
> 
>> This new framework combines that information and provides
>> it in a handy way.
> 
> I'm not convinced about the last piece.

The handy structure is called the Active Stats Monitor. It samples
the stats gathered after idle has been processed. That private
structure maintains statistics covering a given period
(current snapshot - previous snapshot).

> 
>> IMHO it has to be implemented as a new framework, next to
>> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal
>> governor into the frequency change and idle code paths.
> 
> As far as the design is concerned, I'm not sure if I agree with it.
> 
>  From my perspective it's all a 1000-line patch that I have to read and
> understand to figure out what the design is.

I can help you understand it with some design docs if you want.

> 
>> The patch 4/6 introduces a new API for cooling devices, which allows them to
>> stop tracking the freq and idle statistics.
>>
>> The patch set also contains patches 5/6 and 6/6, which add the new power model
>> based on ASF into the cpufreq cooling (used by the thermal governor IPA).
>> It is added as an ifdef option, since Active Stats might not be compiled in.
>> The ASF is a compile-time option, but that might be changed and IPA could
>> select it, which would allow removing some redundant code from
>> cpufreq_cooling.c.
>>
>> Comments and suggestions are very welcome.
> 
> I'm totally not convinced that it is necessary to put the extra 1000
> lines of code into the kernel to address the problem at hand.
> 

I understand your concerns. If you have an idea other than this framework,
I'm happy to hear it. Maybe better stats in cpuidle, which would be
aware of the cpufreq?




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics
  2021-07-06 15:56   ` Lukasz Luba
@ 2021-07-06 16:34     ` Rafael J. Wysocki
  2021-07-06 16:51       ` Lukasz Luba
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2021-07-06 16:34 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Daniel Lezcano,
	Linux PM, Amit Kucheria, Zhang, Rui, Dietmar Eggemann,
	Chris Redpath, Beata.Michalska, Viresh Kumar, Rafael J. Wysocki,
	Amit Kachhap

On Tue, Jul 6, 2021 at 5:56 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> On 7/6/21 4:28 PM, Rafael J. Wysocki wrote:
> > On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >> Hi all,
> >>
> >> This patch set introduces a new mechanism: Active Stats framework (ASF), which
> >> gathers and maintains statistics of CPU performance - time residency at each
> >> performance state.
> >>
> >> The ASF tracks all the frequency transitions as well as CPU
> >> idle entry/exit events for all CPUs. Based on that it accounts the active
> >> (non-idle) residency time for each CPU at each frequency. This information can
> >> be used by some other subsystems (like thermal governor) to enhance their
> >> estimations about CPU usage at a given period.
> >
> > This seems to mean that what is needed is something like the cpufreq
> > stats but only collected during the time when CPUs are not in idle
> > states.
>
> Yes

So this is a clear problem statement: the cpufreq statistics cover the
time when CPUs are in idle states, so they are not suitable for
certain purposes, like thermal control.

The most straightforward approach to address it seems to be to modify
the collection of cpufreq statistics so that they don't include the
time spent by CPUs in idle states, or to make it possible to
distinguish the time spent in idle states from the "active" time.

> >> Does it fix something in mainline?
> >> Yes, there is the thermal governor Intelligent Power Allocation (IPA), which
> >> estimates the power used by the CPUs in the past. IPA samples the CPU
> >> utilization and frequency and relies on the info available at the time of
> >> sampling, which introduces estimation errors.
> >> The use of ASF solves this issue and enables IPA to make better estimates.
> >
> > Obviously the IPA is not used on all platforms where cpufreq and
> > cpuidle are used.  What platforms are going to benefit from this
> > change?
>
> Arm platforms which still use kernel thermal to control temperature,
> such as Chromebooks or mid-, low-end phones.

Which means that this feature is not going to be universally useful.

However, if the time spent by CPUs in idle states were accounted for
in the cpufreq statistics, that would be universally useful.

> >
> >> Why couldn't it be done using the existing frameworks?
> >> The CPUFreq and CPUIdle statistics are not combined, so it is not possible
> >> to derive the information on exactly how long the CPU was running at a given
> >> frequency.
> >
> > But it doesn't mean that the statistics could not be combined.
> >
> > For instance, the frequency of the CPU cannot change via cpufreq when
> > active_stats_cpu_idle_enter() is running, so instead of using an
> > entirely new framework for collecting statistics it might update the
> > existing cpufreq stats to register that event.
>
> True, but keep in mind that the cpufreq most likely works for a few
> CPUs (policy::related_cpus), while cpuidle in a per-cpu fashion.
> I would say that cpuidle should check during enter/exit what is
> the currently set frequency for cluster and account its active
> period.

Yes, that's not particularly difficult to achieve in principle: on
idle entry and exit, update the cpufreq statistics of the policy
including the current CPU.

> >
> > And analogously for the wakeup.
> >
> >> This new framework combines that information and provides
> >> it in a handy way.
> >
> > I'm not convinced about the last piece.
>
> The handy structure is called Active Stats Monitor. It samples
> the stats gathered after processing idle. That private
> structure maintains statistics which are for a given period
> (current snapshot - previous snapshot).

So collecting the statistics should be fast and simple and processing
them need not be.

Ideally, they should be processed only when somebody asks the data.

I'm not sure if that is the case in the current patchset.

> >
> >> IMHO it has to be implemented as a new framework, next to
> >> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal
> >> governor into the frequency change and idle code paths.
> >
> > As far as the design is concerned, I'm not sure if I agree with it.
> >
> >  From my perspective it's all a 1000-line patch that I have to read and
> > understand to figure out what the design is.
>
> I can help you with understanding it with some design docs if you want.

That may help, but let's avoid doing extra work just yet.

> >
> >> The patch 4/6 introduces a new API for cooling devices, which allows them to
> >> stop tracking the freq and idle statistics.
> >>
> >> The patch set also contains patches 5/6 and 6/6, which add the new power model
> >> based on ASF into the cpufreq cooling (used by the thermal governor IPA).
> >> It is added as an ifdef option, since Active Stats might not be compiled in.
> >> The ASF is a compile-time option, but that might be changed and IPA could
> >> select it, which would allow removing some redundant code from
> >> cpufreq_cooling.c.
> >>
> >> Comments and suggestions are very welcome.
> >
> > I'm totally not convinced that it is necessary to put the extra 1000
> > lines of code into the kernel to address the problem at hand.
> >
>
> I understand your concerns. If you have another idea than this framework
> I'm happy to hear it. Maybe better stats in cpuidle, which would be
> aware of the cpufreq?

One idea that I have is outlined above and I'm not seeing a reason to
put cpufreq statistics into cpuidle.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics
  2021-07-06 16:34     ` Rafael J. Wysocki
@ 2021-07-06 16:51       ` Lukasz Luba
  2021-07-14 18:30         ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Lukasz Luba @ 2021-07-06 16:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Daniel Lezcano, Linux PM,
	Amit Kucheria, Zhang, Rui, Dietmar Eggemann, Chris Redpath,
	Beata.Michalska, Viresh Kumar, Rafael J. Wysocki, Amit Kachhap



On 7/6/21 5:34 PM, Rafael J. Wysocki wrote:
> On Tue, Jul 6, 2021 at 5:56 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> On 7/6/21 4:28 PM, Rafael J. Wysocki wrote:
>>> On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This patch set introduces a new mechanism: Active Stats framework (ASF), which
>>>> gathers and maintains statistics of CPU performance - time residency at each
>>>> performance state.
>>>>
>>>> The ASF tracks all the frequency transitions as well as CPU
>>>> idle entry/exit events for all CPUs. Based on that it accounts the active
>>>> (non-idle) residency time for each CPU at each frequency. This information can
>>>> be used by some other subsystems (like thermal governor) to enhance their
>>>> estimations about CPU usage at a given period.
>>>
>>> This seems to mean that what is needed is something like the cpufreq
>>> stats but only collected during the time when CPUs are not in idle
>>> states.
>>
>> Yes
> 
> So this is a clear problem statement: the cpufreq statistics cover the
> time when CPUs are in idle states, so they are not suitable for
> certain purposes, like thermal control.

Agree, that's a better-described problem statement.

> 
> The most straightforward approach to address it seems to be to modify
> the collection of cpufreq statistics so that they don't include the
> time spent by CPUs in idle states, or to make it possible to
> distinguish the time spent in idle states from the "active" time.
> 
>>>> Does it fix something in mainline?
>>>> Yes, there is the thermal governor Intelligent Power Allocation (IPA), which
>>>> estimates the power used by the CPUs in the past. IPA samples the CPU
>>>> utilization and frequency and relies on the info available at the time of
>>>> sampling, which introduces estimation errors.
>>>> The use of ASF solves this issue and enables IPA to make better estimates.
>>>
>>> Obviously the IPA is not used on all platforms where cpufreq and
>>> cpuidle are used.  What platforms are going to benefit from this
>>> change?
>>
>> Arm platforms which still use kernel thermal to control temperature,
>> such as Chromebooks or mid-, low-end phones.
> 
> Which means that this feature is not going to be universally useful.
> 
> However, if the time spent by CPUs in idle states were accounted for
> in the cpufreq statistics, that would be universally useful.

True

> 
>>>
>>>> Why couldn't it be done using the existing frameworks?
>>>> The CPUFreq and CPUIdle statistics are not combined, so it is not possible
>>>> to derive the information on exactly how long the CPU was running at a given
>>>> frequency.
>>>
>>> But it doesn't mean that the statistics could not be combined.
>>>
>>> For instance, the frequency of the CPU cannot change via cpufreq when
>>> active_stats_cpu_idle_enter() is running, so instead of using an
>>> entirely new framework for collecting statistics it might update the
>>> existing cpufreq stats to register that event.
>>
>> True, but keep in mind that the cpufreq most likely works for a few
>> CPUs (policy::related_cpus), while cpuidle in a per-cpu fashion.
>> I would say that cpuidle should check during enter/exit what is
>> the currently set frequency for cluster and account its active
>> period.
> 
> Yes, that's not particularly difficult to achieve in principle: on
> idle entry and exit, update the cpufreq statistics of the policy
> including the current CPU.

Sounds good.

> 
>>>
>>> And analogously for the wakeup.
>>>
>>>> This new framework combines that information and provides
>>>> it in a handy way.
>>>
>>> I'm not convinced about the last piece.
>>
>> The handy structure is called Active Stats Monitor. It samples
>> the stats gathered after processing idle. That private
>> structure maintains statistics which are for a given period
>> (current snapshot - previous snapshot).
> 
> So collecting the statistics should be fast and simple and processing
> them need not be.
> 
> Ideally, they should be processed only when somebody asks the data.

Correct.

> 
> I'm not sure if that is the case in the current patchset.

That is the case in the current implementation, where the Active Stats
Monitor (ASM) is run in the context of the thermal workqueue. It is
scheduled every 100ms to run the IPA throttling, which does the
statistics gathering and calculation in the ASM. The time accounted for
this major calculation is thus moved into the 'client' (e.g. IPA) context.

> 
>>>
>>>> IMHO it has to be implemented as a new framework, next to
>>>> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal
>>>> governor into the frequency change and idle code paths.
>>>
>>> As far as the design is concerned, I'm not sure if I agree with it.
>>>
>>>   From my perspective it's all a 1000-line patch that I have to read and
>>> understand to figure out what the design is.
>>
>> I can help you with understanding it with some design docs if you want.
> 
> That may help, but let's avoid doing extra work just yet.

Understood

> 
>>>
>>>> The patch 4/6 introduces a new API for cooling devices, which allows them to
>>>> stop tracking the freq and idle statistics.
>>>>
>>>> The patch set also contains patches 5/6 and 6/6, which add the new power model
>>>> based on ASF into the cpufreq cooling (used by the thermal governor IPA).
>>>> It is added as an ifdef option, since Active Stats might not be compiled in.
>>>> The ASF is a compile-time option, but that might be changed and IPA could
>>>> select it, which would allow removing some redundant code from
>>>> cpufreq_cooling.c.
>>>>
>>>> Comments and suggestions are very welcome.
>>>
>>> I'm totally not convinced that it is necessary to put the extra 1000
>>> lines of code into the kernel to address the problem at hand.
>>>
>>
>> I understand your concerns. If you have another idea than this framework
>> I'm happy to hear it. Maybe better stats in cpuidle, which would be
>> aware of the cpufreq?
> 
> One idea that I have is outlined above and I'm not seeing a reason to
> put cpufreq statistics into cpuidle.
> 

I'm happy to prepare such an RFC if you like. I would just need a bit more
information. It sounds like your proposed solution might be smaller in code
size, since the client's statistics accounting might be moved into
cpufreq_cooling.c (it now lives in the ASM). Also, there won't be a
new framework to maintain (which is a big plus).


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics
  2021-07-06 16:51       ` Lukasz Luba
@ 2021-07-14 18:30         ` Rafael J. Wysocki
  2021-07-19  8:52           ` Lukasz Luba
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2021-07-14 18:30 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Daniel Lezcano,
	Linux PM, Amit Kucheria, Zhang, Rui, Dietmar Eggemann,
	Chris Redpath, Beata.Michalska, Viresh Kumar, Rafael J. Wysocki,
	Amit Kachhap

Sorry for the delay.

On Tue, Jul 6, 2021 at 6:51 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 7/6/21 5:34 PM, Rafael J. Wysocki wrote:
> > On Tue, Jul 6, 2021 at 5:56 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >> On 7/6/21 4:28 PM, Rafael J. Wysocki wrote:
> >>> On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> This patch set introduces a new mechanism: Active Stats framework (ASF), which
> >>>> gathers and maintains statistics of CPU performance - time residency at each
> >>>> performance state.
> >>>>
> >>>> The ASF tracks all the frequency transitions as well as CPU
> >>>> idle entry/exit events for all CPUs. Based on that it accounts the active
> >>>> (non-idle) residency time for each CPU at each frequency. This information can
> >>>> be used by some other subsystems (like thermal governor) to enhance their
> >>>> estimations about CPU usage at a given period.
> >>>
> >>> This seems to mean that what is needed is something like the cpufreq
> >>> stats but only collected during the time when CPUs are not in idle
> >>> states.
> >>
> >> Yes
> >
> > So this is a clear problem statement: the cpufreq statistics cover the
> > time when CPUs are in idle states, so they are not suitable for
> > certain purposes, like thermal control.
>
> Agree, it's better described problem statement.
>
> >
> > The most straightforward approach to address it seems to be to modify
> > the collection of cpufreq statistics so that they don't include the
> > time spent by CPUs in idle states, or to make it possible to
> > distinguish the time spent in idle states from the "active" time.
> >
> >>>> Does it fix something in mainline?
> >>>> Yes, there is the thermal governor Intelligent Power Allocation (IPA), which
> >>>> estimates the power used by the CPUs in the past. IPA samples the CPU
> >>>> utilization and frequency and relies on the info available at the time of
> >>>> sampling, which introduces estimation errors.
> >>>> The use of ASF solves this issue and enables IPA to make better estimates.
> >>>
> >>> Obviously the IPA is not used on all platforms where cpufreq and
> >>> cpuidle are used.  What platforms are going to benefit from this
> >>> change?
> >>
> >> Arm platforms which still use kernel thermal to control temperature,
> >> such as Chromebooks or mid-, low-end phones.
> >
> > Which means that this feature is not going to be universally useful.
> >
> > However, if the time spent by CPUs in idle states were accounted for
> > in the cpufreq statistics, that would be universally useful.
>
> True
>
> >
> >>>
> >>>> Why couldn't it be done using the existing frameworks?
> >>>> The CPUFreq and CPUIdle statistics are not combined, so it is not possible
> >>>> to derive the information on exactly how long the CPU was running at a given
> >>>> frequency.
> >>>
> >>> But it doesn't mean that the statistics could not be combined.
> >>>
> >>> For instance, the frequency of the CPU cannot change via cpufreq when
> >>> active_stats_cpu_idle_enter() is running, so instead of using an
> >>> entirely new framework for collecting statistics it might update the
> >>> existing cpufreq stats to register that event.
> >>
> >> True, but keep in mind that the cpufreq most likely works for a few
> >> CPUs (policy::related_cpus), while cpuidle in a per-cpu fashion.
> >> I would say that cpuidle should check during enter/exit what is
> >> the currently set frequency for cluster and account its active
> >> period.
> >
> > Yes, that's not particularly difficult to achieve in principle: on
> > idle entry and exit, update the cpufreq statistics of the policy
> > including the current CPU.
>
> Sounds good.
>
> >
> >>>
> >>> And analogously for the wakeup.
> >>>
> >>>> This new framework combines that information and provides
> >>>> it in a handy way.
> >>>
> >>> I'm not convinced about the last piece.
> >>
> >> The handy structure is called Active Stats Monitor. It samples
> >> the stats gathered after processing idle. That private
> >> structure maintains statistics which are for a given period
> >> (current snapshot - previous snapshot).
> >
> > So collecting the statistics should be fast and simple and processing
> > them need not be.
> >
> > Ideally, they should be processed only when somebody asks the data.
>
> Correct.
>
> >
> > I'm not sure if that is the case in the current patchset.
>
> Which is the case in current implementation, where the Active Stats
> Monitor (ASM) is run in the context of thermal workqueue. It is
> scheduled every 100ms to run IPA throttling, which does the
> statistics gathering and calculation in the ASM. Time accounted for this
> major calculation is moved to the 'client' (like IPA) context.
>
> >
> >>>
> >>>> IMHO it has to be implemented as a new framework, next to
> >>>> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal
> >>>> governor into the frequency change and idle code paths.
> >>>
> >>> As far as the design is concerned, I'm not sure if I agree with it.
> >>>
> >>>   From my perspective it's all a 1000-line patch that I have to read and
> >>> understand to figure out what the design is.
> >>
> >> I can help you with understanding it with some design docs if you want.
> >
> > That may help, but let's avoid doing extra work just yet.
>
> Understood
>
> >
> >>>
> >>>> The patch 4/6 introduces a new API for cooling devices, which allows them to
> >>>> stop tracking the freq and idle statistics.
> >>>>
> >>>> The patch set also contains patches 5/6 and 6/6, which add the new power model
> >>>> based on ASF into the cpufreq cooling (used by the thermal governor IPA).
> >>>> It is added as an ifdef option, since Active Stats might not be compiled in.
> >>>> The ASF is a compile-time option, but that might be changed and IPA could
> >>>> select it, which would allow removing some redundant code from
> >>>> cpufreq_cooling.c.
> >>>>
> >>>> Comments and suggestions are very welcome.
> >>>
> >>> I'm totally not convinced that it is necessary to put the extra 1000
> >>> lines of code into the kernel to address the problem at hand.
> >>>
> >>
> >> I understand your concerns. If you have another idea than this framework
> >> I'm happy to hear it. Maybe better stats in cpuidle, which would be
> >> aware of the cpufreq?
> >
> > One idea that I have is outlined above and I'm not seeing a reason to
> > put cpufreq statistics into cpuidle.
> >
>
> I'm happy to prepare such RFC if you like.

Well, it should be quite clear that I would prefer this to the
original approach, if viable at all. :-)

> I would just need a bit more information.

OK

> It sounds your proposed solution might be smaller in code
> size, since the client's statistics accounting might be moved to the
> cpufreq_cooling.c (which now live in the ASM). Also, there won't be a
> new framework to maintain (which is a big plus).

Right.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics
  2021-07-14 18:30         ` Rafael J. Wysocki
@ 2021-07-19  8:52           ` Lukasz Luba
  0 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2021-07-19  8:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Daniel Lezcano, Linux PM,
	Amit Kucheria, Zhang, Rui, Dietmar Eggemann, Chris Redpath,
	Beata.Michalska, Viresh Kumar, Rafael J. Wysocki, Amit Kachhap



On 7/14/21 7:30 PM, Rafael J. Wysocki wrote:
> Sorry for the delay.

Thank you for coming back with comments.
No worries, I was on holiday last week :)

[snip]

>>>>
>>>> I understand your concerns. If you have another idea than this framework
>>>> I'm happy to hear it. Maybe better stats in cpuidle, which would be
>>>> are of the cpufreq?
>>>
>>> One idea that I have is outlined above and I'm not seeing a reason to
>>> put cpufreq statistics into cpuidle.
>>>
>>
>> I'm happy to prepare such RFC if you like.
> 
> Well, it should be quite clear that I would prefer this to the
> original approach, if viable at all. :-)

Sure, let me check if this approach is viable. I'll come back to you
in the next few days...

> 
>> I would just need a bit more information.
> 
> OK

For now I have only high-level questions:
1. Should the stats data structures and the related management code
live in cpuidle, with cpufreq just calling some API to notify about
frequency transitions? (no split of data or code between these two frameworks)
2. Could the registration/allocation of these structures inside
cpuidle be done using cpufreq_register_notifier() and its
notification mechanism?
3. Can CPU hotplug notification (which is needed for these stats) be
used inside cpuidle (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, ...))?


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-07-19  9:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-06 13:18 [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Lukasz Luba
2021-07-06 13:18 ` [RFC PATCH v2 1/6] PM: Introduce Active Stats framework Lukasz Luba
2021-07-06 13:18 ` [RFC PATCH v2 2/6] cpuidle: Add Active Stats calls tracking idle entry/exit Lukasz Luba
2021-07-06 13:18 ` [RFC PATCH v2 3/6] cpufreq: Add Active Stats calls tracking frequency changes Lukasz Luba
2021-07-06 13:18 ` [RFC PATCH v2 4/6] thermal: Add interface to cooling devices to handle governor change Lukasz Luba
2021-07-06 13:18 ` [RFC PATCH v2 5/6] thermal/core/power allocator: Prepare power actors and calm down when not used Lukasz Luba
2021-07-06 13:18 ` [RFC PATCH v2 6/6] thermal: cpufreq_cooling: Improve power estimation based on Active Stats framework Lukasz Luba
2021-07-06 15:28 ` [RFC PATCH v2 0/6] Introduce Active Stats framework with CPU performance statistics Rafael J. Wysocki
2021-07-06 15:56   ` Lukasz Luba
2021-07-06 16:34     ` Rafael J. Wysocki
2021-07-06 16:51       ` Lukasz Luba
2021-07-14 18:30         ` Rafael J. Wysocki
2021-07-19  8:52           ` Lukasz Luba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).