LKML Archive on lore.kernel.org
* [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1
@ 2021-03-11  7:07 Jin Yao
  2021-03-11  7:07 ` [PATCH v2 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h Jin Yao
                   ` (26 more replies)
  0 siblings, 27 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

AlderLake uses a hybrid architecture with Golden Cove cores
(core CPUs) and Gracemont cores (atom CPUs). Each CPU type has a
dedicated event list. Some events are available only on the core
CPUs, some only on the atom CPUs, and some on both.

Kernel exports new pmus "cpu_core" and "cpu_atom" through sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

In this example, the core CPUs are 0-15 and the atom CPUs are 16-23.
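The cpus files above hold simple CPU range strings. As a hedged illustration (not perf's actual parser; perf builds cpu maps via libperf, which also handles comma-separated lists), a minimal C sketch of decoding one range:

```c
#include <assert.h>
#include <stdio.h>

/* Parse a sysfs cpus range like "0-15", or a single cpu like "7".
 * Returns the number of cpus covered, or -1 on parse failure. */
static int parse_cpu_range(const char *s, int *first, int *last)
{
	if (sscanf(s, "%d-%d", first, last) == 2)
		return *last - *first + 1;
	if (sscanf(s, "%d", first) == 1) {
		*last = *first;
		return 1;
	}
	return -1;
}
```

So "0-15" covers 16 core CPUs and "16-23" covers 8 atom CPUs, matching the example above.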

To enable a core-only or an atom-only event, use:

        cpu_core/<event name>/
or
        cpu_atom/<event name>/

For example, to count the 'cycles' event on the core CPUs:

  # perf stat -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

      12,853,951,349      cpu_core/cycles/

         1.002581249 seconds time elapsed

If an event is available on both the atom CPUs and the core CPUs,
two events are created automatically.

  # perf stat -e cycles -a -- sleep 1

   Performance counter stats for 'system wide':

      12,856,467,438      cpu_core/cycles/
       6,404,634,785      cpu_atom/cycles/

         1.002453013 seconds time elapsed

Grouping is supported if the events are from the same pmu;
otherwise a warning is displayed.

  # perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -a -- sleep 1

   Performance counter stats for 'system wide':

      12,863,866,968      cpu_core/cycles/
         554,795,017      cpu_core/instructions/

         1.002616117 seconds time elapsed

  # perf stat -e '{cpu_core/cycles/,cpu_atom/instructions/}' -a -- sleep 1
  WARNING: Group has events from different hybrid PMUs

   Performance counter stats for 'system wide':

      12,863,344,830      cpu_core/cycles/
     <not supported>      cpu_atom/instructions/

       1.002568830 seconds time elapsed

Note that, since the whole patchset for AlderLake hybrid support is
very large (40+ patches), it's split into several patch series for
simplicity.

Series 1 supports only the basic functionality. Support for
perf-c2c/perf-mem/topdown/metrics/topology header etc. will be
added in follow-up series.

v2:
---
- Drop kernel patches (Kan posted the series "Add Alder Lake support for perf (kernel)" separately).
- Drop the patches for perf-c2c/perf-mem/topdown/metrics/topology header supports,
  which will be added in series 2 or series 3.
- Simplify the arguments of __perf_pmu__new_alias() by passing
  the 'struct pmu_event' pointer.
- Check sysfs validity before access.
- Use pmu style event name, such as "cpu_core/cycles/".
- Move command output two chars to the right.
- Move pmu hybrid functions to the newly created pmu-hybrid.c/pmu-hybrid.h.
  This is needed to pass the perf-test python case.

Jin Yao (27):
  tools headers uapi: Update tools's copy of linux/perf_event.h
  perf jevents: Support unit value "cpu_core" and "cpu_atom"
  perf pmu: Simplify arguments of __perf_pmu__new_alias
  perf pmu: Save pmu name
  perf pmu: Save detected hybrid pmus to a global pmu list
  perf pmu: Add hybrid helper functions
  perf evlist: Hybrid event uses its own cpus
  perf stat: Uniquify hybrid event name
  perf parse-events: Create two hybrid hardware events
  perf parse-events: Create two hybrid cache events
  perf parse-events: Support hardware events inside PMU
  perf parse-events: Support hybrid raw events
  perf evlist: Create two hybrid 'cycles' events by default
  perf stat: Add default hybrid events
  perf stat: Filter out unmatched aggregation for hybrid event
  perf evlist: Warn as events from different hybrid PMUs in a group
  perf evsel: Adjust hybrid event and global event mixed group
  perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  perf Documentation: Document intel-hybrid support

 tools/include/uapi/linux/perf_event.h      |  26 ++
 tools/perf/Documentation/intel-hybrid.txt  | 228 +++++++++++++++++
 tools/perf/Documentation/perf-record.txt   |   1 +
 tools/perf/Documentation/perf-stat.txt     |   2 +
 tools/perf/builtin-record.c                |  13 +-
 tools/perf/builtin-script.c                |  24 ++
 tools/perf/builtin-stat.c                  |  69 ++++-
 tools/perf/pmu-events/jevents.c            |   2 +
 tools/perf/tests/attr.c                    |   5 +
 tools/perf/tests/evsel-roundtrip-name.c    |  19 +-
 tools/perf/tests/parse-events.c            | 171 +++++++++++++
 tools/perf/tests/parse-metric.c            |  11 +-
 tools/perf/tests/perf-time-to-tsc.c        |  16 ++
 tools/perf/tests/shell/stat+shadow_stat.sh |   3 +
 tools/perf/tests/switch-tracking.c         |  10 +-
 tools/perf/tests/topology.c                |  10 +-
 tools/perf/util/Build                      |   1 +
 tools/perf/util/evlist.c                   | 148 ++++++++++-
 tools/perf/util/evlist.h                   |   6 +
 tools/perf/util/evsel.c                    | 111 ++++++++-
 tools/perf/util/evsel.h                    |  10 +-
 tools/perf/util/parse-events.c             | 277 ++++++++++++++++++++-
 tools/perf/util/parse-events.h             |   6 +-
 tools/perf/util/parse-events.y             |  21 +-
 tools/perf/util/pmu-hybrid.c               |  75 ++++++
 tools/perf/util/pmu-hybrid.h               |  29 +++
 tools/perf/util/pmu.c                      |  53 ++--
 tools/perf/util/pmu.h                      |   5 +
 tools/perf/util/python-ext-sources         |   2 +
 tools/perf/util/stat-display.c             |  32 ++-
 tools/perf/util/stat.h                     |   1 +
 31 files changed, 1311 insertions(+), 76 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-hybrid.txt
 create mode 100644 tools/perf/util/pmu-hybrid.c
 create mode 100644 tools/perf/util/pmu-hybrid.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 02/27] perf jevents: Support unit value "cpu_core" and "cpu_atom" Jin Yao
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

To get the changes in:

Liang Kan's patch
("perf: Introduce PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU")

Kan's patch is in review at the moment, but the following perf tool
patches need this interface for hybrid support.

This patch can be removed after Kan's patch is upstreamed.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/include/uapi/linux/perf_event.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index ad15e40d7f5d..c0a511eea498 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -33,6 +33,8 @@ enum perf_type_id {
 	PERF_TYPE_HW_CACHE			= 3,
 	PERF_TYPE_RAW				= 4,
 	PERF_TYPE_BREAKPOINT			= 5,
+	PERF_TYPE_HARDWARE_PMU			= 6,
+	PERF_TYPE_HW_CACHE_PMU			= 7,
 
 	PERF_TYPE_MAX,				/* non-ABI */
 };
@@ -94,6 +96,30 @@ enum perf_hw_cache_op_result_id {
 	PERF_COUNT_HW_CACHE_RESULT_MAX,		/* non-ABI */
 };
 
+/*
+ * attr.config layout for type PERF_TYPE_HARDWARE* and PERF_TYPE_HW_CACHE*
+ * PERF_TYPE_HARDWARE:		0xAA
+ *				AA: hardware event ID
+ * PERF_TYPE_HW_CACHE:		0xCCBBAA
+ *				AA: hardware cache ID
+ *				BB: hardware cache op ID
+ *				CC: hardware cache op result ID
+ * PERF_TYPE_HARDWARE_PMU:	0xDD000000AA
+ *				AA: hardware event ID
+ *				DD: PMU type ID
+ * PERF_TYPE_HW_CACHE_PMU:	0xDD00CCBBAA
+ *				AA: hardware cache ID
+ *				BB: hardware cache op ID
+ *				CC: hardware cache op result ID
+ *				DD: PMU type ID
+ */
+#define PERF_HW_CACHE_ID_SHIFT			0
+#define PERF_HW_CACHE_OP_ID_SHIFT		8
+#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT	16
+#define PERF_HW_CACHE_EVENT_MASK		0xffffff
+
+#define PERF_PMU_TYPE_SHIFT			32
+
 /*
  * Special "software" events provided by the kernel, even if the hardware
  * does not support performance events. These events measure various
-- 
2.17.1
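As a sketch of the attr.config layout documented by this patch: the helper names below are illustrative (not part of the patch), but the shift values mirror the new defines, and the pmu type ID (the 0xDD byte) would come from a pmu's sysfs 'type' file.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values mirroring the patch's new defines. */
#define PERF_HW_CACHE_ID_SHIFT			0
#define PERF_HW_CACHE_OP_ID_SHIFT		8
#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT	16
#define PERF_HW_CACHE_EVENT_MASK		0xffffff
#define PERF_PMU_TYPE_SHIFT			32

/* PERF_TYPE_HARDWARE_PMU: 0xDD000000AA
 * (AA: hardware event ID, DD: PMU type ID) */
static uint64_t hardware_pmu_config(uint32_t pmu_type, uint8_t hw_event_id)
{
	return ((uint64_t)pmu_type << PERF_PMU_TYPE_SHIFT) | hw_event_id;
}

/* PERF_TYPE_HW_CACHE_PMU: 0xDD00CCBBAA
 * (AA: cache ID, BB: op ID, CC: op result ID, DD: PMU type ID) */
static uint64_t hw_cache_pmu_config(uint32_t pmu_type, uint8_t cache_id,
				    uint8_t op_id, uint8_t result_id)
{
	return ((uint64_t)pmu_type << PERF_PMU_TYPE_SHIFT) |
	       ((uint64_t)result_id << PERF_HW_CACHE_OP_RESULT_ID_SHIFT) |
	       ((uint64_t)op_id << PERF_HW_CACHE_OP_ID_SHIFT) |
	       cache_id;
}
```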



* [PATCH v2 02/27] perf jevents: Support unit value "cpu_core" and "cpu_atom"
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
  2021-03-11  7:07 ` [PATCH v2 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 03/27] perf pmu: Simplify arguments of __perf_pmu__new_alias Jin Yao
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Some Intel platforms, such as Alderlake, are hybrid platforms that
consist of atom CPUs and core CPUs. Each CPU type has a dedicated
event list. Some events are available on the core CPUs, some on the
atom CPUs.

The kernel exports new cpu pmus: cpu_core and cpu_atom. A new field
"Unit" is added to the event in the JSON file to indicate which pmu
the event is available on.

For example, one event in cache.json,

    {
        "BriefDescription": "Counts the number of load ops retired that",
        "CollectPEBSRecord": "2",
        "Counter": "0,1,2,3",
        "EventCode": "0xd2",
        "EventName": "MEM_LOAD_UOPS_RETIRED_MISC.MMIO",
        "PEBScounters": "0,1,2,3",
        "SampleAfterValue": "1000003",
        "UMask": "0x80",
        "Unit": "cpu_atom"
    },

The unit "cpu_atom" indicates this event is only available on "cpu_atom".

In generated pmu-events.c, we can see:

{
        .name = "mem_load_uops_retired_misc.mmio",
        .event = "period=1000003,umask=0x80,event=0xd2",
        .desc = "Counts the number of load ops retired that. Unit: cpu_atom ",
        .topic = "cache",
        .pmu = "cpu_atom",
},

Without this patch, the "uncore_" prefix would be added before
"cpu_atom", such as:
        .pmu = "uncore_cpu_atom"

That would be a wrong pmu.
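The map entries added by this patch are what prevent that. A standalone sketch of the lookup with the fallback prefixing, using illustrative struct and helper names (jevents' real table carries many more unit entries):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct unit_map { const char *unit; const char *pmu; };

/* Illustrative subset of jevents' unit-to-pmu map, including the
 * two entries added by this patch. */
static const struct unit_map map[] = {
	{ "cpu_core", "cpu_core" },
	{ "cpu_atom", "cpu_atom" },
	{ NULL, NULL },
};

/* Return the pmu name for a JSON "Unit" value: use the mapped name
 * when the unit is known, otherwise fall back to prefixing
 * "uncore_", which is what would wrongly happen to cpu_atom without
 * the new map entries. */
static const char *unit_to_pmu(const char *unit, char *buf, size_t len)
{
	for (int i = 0; map[i].unit; i++) {
		if (!strcmp(unit, map[i].unit))
			return map[i].pmu;
	}
	snprintf(buf, len, "uncore_%s", unit);
	return buf;
}
```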

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/pmu-events/jevents.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index e1f3f5c8c550..b1a15f57c9ad 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -285,6 +285,8 @@ static struct map {
 	{ "imx8_ddr", "imx8_ddr" },
 	{ "L3PMC", "amd_l3" },
 	{ "DFPMC", "amd_df" },
+	{ "cpu_core", "cpu_core" },
+	{ "cpu_atom", "cpu_atom" },
 	{}
 };
 
-- 
2.17.1



* [PATCH v2 03/27] perf pmu: Simplify arguments of __perf_pmu__new_alias
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
  2021-03-11  7:07 ` [PATCH v2 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h Jin Yao
  2021-03-11  7:07 ` [PATCH v2 02/27] perf jevents: Support unit value "cpu_core" and "cpu_atom" Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 04/27] perf pmu: Save pmu name Jin Yao
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Simplify the arguments of __perf_pmu__new_alias() by passing
the whole 'struct pmu_event' pointer.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/pmu.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 44ef28302fc7..54e586bf19a5 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -306,18 +306,25 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias *newalias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-				 char *desc, char *val,
-				 char *long_desc, char *topic,
-				 char *unit, char *perpkg,
-				 char *metric_expr,
-				 char *metric_name,
-				 char *deprecated)
+				 char *desc, char *val, struct pmu_event *pe)
 {
 	struct parse_events_term *term;
 	struct perf_pmu_alias *alias;
 	int ret;
 	int num;
 	char newval[256];
+	char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
+	     *metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+
+	if (pe) {
+		long_desc = (char *)pe->long_desc;
+		topic = (char *)pe->topic;
+		unit = (char *)pe->unit;
+		perpkg = (char *)pe->perpkg;
+		metric_expr = (char *)pe->metric_expr;
+		metric_name = (char *)pe->metric_name;
+		deprecated = (char *)pe->deprecated;
+	}
 
 	alias = malloc(sizeof(*alias));
 	if (!alias)
@@ -406,8 +413,7 @@ static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FI
 	/* Remove trailing newline from sysfs file */
 	strim(buf);
 
-	return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL, NULL,
-				     NULL, NULL, NULL, NULL);
+	return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL);
 }
 
 static inline bool pmu_alias_info_file(char *name)
@@ -793,11 +799,7 @@ void pmu_add_cpu_aliases_map(struct list_head *head, struct perf_pmu *pmu,
 		/* need type casts to override 'const' */
 		__perf_pmu__new_alias(head, NULL, (char *)pe->name,
 				(char *)pe->desc, (char *)pe->event,
-				(char *)pe->long_desc, (char *)pe->topic,
-				(char *)pe->unit, (char *)pe->perpkg,
-				(char *)pe->metric_expr,
-				(char *)pe->metric_name,
-				(char *)pe->deprecated);
+				pe);
 	}
 }
 
@@ -864,13 +866,7 @@ static int pmu_add_sys_aliases_iter_fn(struct pmu_event *pe, void *data)
 				      (char *)pe->name,
 				      (char *)pe->desc,
 				      (char *)pe->event,
-				      (char *)pe->long_desc,
-				      (char *)pe->topic,
-				      (char *)pe->unit,
-				      (char *)pe->perpkg,
-				      (char *)pe->metric_expr,
-				      (char *)pe->metric_name,
-				      (char *)pe->deprecated);
+				      pe);
 	}
 
 	return 0;
-- 
2.17.1



* [PATCH v2 04/27] perf pmu: Save pmu name
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (2 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 03/27] perf pmu: Simplify arguments of __perf_pmu__new_alias Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-15 23:03   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 05/27] perf pmu: Save detected hybrid pmus to a global pmu list Jin Yao
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

On a hybrid platform, an event is available on only one pmu
(e.g. on cpu_core or on cpu_atom).

This patch saves the pmu name to the 'pmu' field of struct
perf_pmu_alias, so that we can later know which pmu the event is
available on.
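One plausible sketch of the resulting comparison rule, with illustrative names (not the patch's exact code): two aliases refer to the same event only when the names match and, when both carry a pmu name, the pmu names match too, so 'cycles' on cpu_core and 'cycles' on cpu_atom stay separate aliases instead of being merged.

```c
#include <assert.h>
#include <strings.h>

struct alias {
	const char *name;
	const char *pmu;	/* NULL for a non-hybrid alias */
};

/* Same event only if the names match and, when both aliases carry
 * a pmu name, the pmu names match as well. */
static int alias_same_event(const struct alias *a, const struct alias *b)
{
	if (strcasecmp(a->name, b->name))
		return 0;
	if (a->pmu && b->pmu)
		return !strcasecmp(a->pmu, b->pmu);
	return 1;
}
```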

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/pmu.c | 10 +++++++++-
 tools/perf/util/pmu.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 54e586bf19a5..45d8db1af8d2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
 	zfree(&newalias->str);
 	zfree(&newalias->metric_expr);
 	zfree(&newalias->metric_name);
+	zfree(&newalias->pmu);
 	parse_events_terms__purge(&newalias->terms);
 	free(newalias);
 }
@@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias *newalias,
 
 	list_for_each_entry(a, alist, list) {
 		if (!strcasecmp(newalias->name, a->name)) {
+			if (newalias->pmu && a->pmu &&
+			    !strcasecmp(newalias->pmu, a->pmu)) {
+				continue;
+			}
 			perf_pmu_update_alias(a, newalias);
 			perf_pmu_free_alias(newalias);
 			return true;
@@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 	int num;
 	char newval[256];
 	char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
-	     *metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+	     *metric_expr = NULL, *metric_name = NULL, *deprecated = NULL,
+	     *pmu = NULL;
 
 	if (pe) {
 		long_desc = (char *)pe->long_desc;
@@ -324,6 +330,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 		metric_expr = (char *)pe->metric_expr;
 		metric_name = (char *)pe->metric_name;
 		deprecated = (char *)pe->deprecated;
+		pmu = (char *)pe->pmu;
 	}
 
 	alias = malloc(sizeof(*alias));
@@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 	}
 	alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
 	alias->str = strdup(newval);
+	alias->pmu = pmu ? strdup(pmu) : NULL;
 
 	if (deprecated)
 		alias->deprecated = true;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8164388478c6..0e724d5b84c6 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -72,6 +72,7 @@ struct perf_pmu_alias {
 	bool deprecated;
 	char *metric_expr;
 	char *metric_name;
+	char *pmu;
 };
 
 struct perf_pmu *perf_pmu__find(const char *name);
-- 
2.17.1



* [PATCH v2 05/27] perf pmu: Save detected hybrid pmus to a global pmu list
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (3 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 04/27] perf pmu: Save pmu name Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 06/27] perf pmu: Add hybrid helper functions Jin Yao
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

We identify the cpu_core pmu and cpu_atom pmu by explicitly
checking the following files:

For cpu_core, check:
"/sys/bus/event_source/devices/cpu_core/cpus"

For cpu_atom, check:
"/sys/bus/event_source/devices/cpu_atom/cpus"

If the 'cpus' file exists, the pmu exists.

But to avoid hardcoding "cpu_core" and "cpu_atom" and to keep the
code generic, any pmu whose path
"/sys/bus/event_source/devices/cpu_xxx/cpus" exists is treated as a
hybrid pmu. All detected hybrid pmus are linked to a global list
'perf_pmu__hybrid_pmus', and afterwards we just iterate the list
with perf_pmu__for_each_hybrid_pmu to get all hybrid pmus.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/Build        |  1 +
 tools/perf/util/pmu-hybrid.c | 35 +++++++++++++++++++++++++++++++++++
 tools/perf/util/pmu-hybrid.h | 18 ++++++++++++++++++
 tools/perf/util/pmu.c        |  9 ++++++++-
 tools/perf/util/pmu.h        |  4 ++++
 5 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/pmu-hybrid.c
 create mode 100644 tools/perf/util/pmu-hybrid.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e3e12f9d4733..37a8a63c7195 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -69,6 +69,7 @@ perf-y += parse-events-bison.o
 perf-y += pmu.o
 perf-y += pmu-flex.o
 perf-y += pmu-bison.o
+perf-y += pmu-hybrid.o
 perf-y += trace-event-read.o
 perf-y += trace-event-info.o
 perf-y += trace-event-scripting.o
diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
new file mode 100644
index 000000000000..7316bf46e54b
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/list.h>
+#include <linux/compiler.h>
+#include <linux/string.h>
+#include <linux/zalloc.h>
+#include <sys/types.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdarg.h>
+#include <locale.h>
+#include <api/fs/fs.h>
+#include "fncache.h"
+#include "pmu-hybrid.h"
+
+LIST_HEAD(perf_pmu__hybrid_pmus);
+
+bool perf_pmu__hybrid_mounted(const char *name)
+{
+	char path[PATH_MAX];
+	const char *sysfs;
+
+	if (strncmp(name, "cpu_", 4))
+		return false;
+
+	sysfs = sysfs__mountpoint();
+	if (!sysfs)
+		return false;
+
+	snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name);
+	return file_available(path);
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
new file mode 100644
index 000000000000..35bed3714438
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PMU_HYBRID_H
+#define __PMU_HYBRID_H
+
+#include <linux/perf_event.h>
+#include <linux/compiler.h>
+#include <linux/list.h>
+#include <stdbool.h>
+#include "pmu.h"
+
+extern struct list_head perf_pmu__hybrid_pmus;
+
+#define perf_pmu__for_each_hybrid_pmu(pmu)	\
+	list_for_each_entry(pmu, &perf_pmu__hybrid_pmus, hybrid_list)
+
+bool perf_pmu__hybrid_mounted(const char *name);
+
+#endif /* __PMU_HYBRID_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 45d8db1af8d2..08280a3e45a8 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -25,6 +25,7 @@
 #include "string2.h"
 #include "strbuf.h"
 #include "fncache.h"
+#include "pmu-hybrid.h"
 
 struct perf_pmu perf_pmu__fake;
 
@@ -613,7 +614,6 @@ static struct perf_cpu_map *__pmu_cpumask(const char *path)
  */
 #define SYS_TEMPLATE_ID	"./bus/event_source/devices/%s/identifier"
 #define CPUS_TEMPLATE_UNCORE	"%s/bus/event_source/devices/%s/cpumask"
-#define CPUS_TEMPLATE_CPU	"%s/bus/event_source/devices/%s/cpus"
 
 static struct perf_cpu_map *pmu_cpumask(const char *name)
 {
@@ -645,6 +645,9 @@ static bool pmu_is_uncore(const char *name)
 	char path[PATH_MAX];
 	const char *sysfs;
 
+	if (perf_pmu__hybrid_mounted(name))
+		return false;
+
 	sysfs = sysfs__mountpoint();
 	snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name);
 	return file_available(path);
@@ -946,6 +949,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	pmu->is_uncore = pmu_is_uncore(name);
 	if (pmu->is_uncore)
 		pmu->id = pmu_id(name);
+	pmu->is_hybrid = perf_pmu__hybrid_mounted(name);
 	pmu->max_precise = pmu_max_precise(name);
 	pmu_add_cpu_aliases(&aliases, pmu);
 	pmu_add_sys_aliases(&aliases, pmu);
@@ -957,6 +961,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	list_splice(&aliases, &pmu->aliases);
 	list_add_tail(&pmu->list, &pmus);
 
+	if (pmu->is_hybrid)
+		list_add_tail(&pmu->hybrid_list, &perf_pmu__hybrid_pmus);
+
 	pmu->default_config = perf_pmu__get_default_config(pmu);
 
 	return pmu;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 0e724d5b84c6..3b9b4def6032 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -5,6 +5,7 @@
 #include <linux/bitmap.h>
 #include <linux/compiler.h>
 #include <linux/perf_event.h>
+#include <linux/list.h>
 #include <stdbool.h>
 #include "parse-events.h"
 #include "pmu-events/pmu-events.h"
@@ -19,6 +20,7 @@ enum {
 
 #define PERF_PMU_FORMAT_BITS 64
 #define EVENT_SOURCE_DEVICE_PATH "/bus/event_source/devices/"
+#define CPUS_TEMPLATE_CPU	"%s/bus/event_source/devices/%s/cpus"
 
 struct perf_event_attr;
 
@@ -34,6 +36,7 @@ struct perf_pmu {
 	__u32 type;
 	bool selectable;
 	bool is_uncore;
+	bool is_hybrid;
 	bool auxtrace;
 	int max_precise;
 	struct perf_event_attr *default_config;
@@ -42,6 +45,7 @@ struct perf_pmu {
 	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
 	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
 	struct list_head list;    /* ELEM */
+	struct list_head hybrid_list;
 };
 
 extern struct perf_pmu perf_pmu__fake;
-- 
2.17.1



* [PATCH v2 06/27] perf pmu: Add hybrid helper functions
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (4 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 05/27] perf pmu: Save detected hybrid pmus to a global pmu list Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus Jin Yao
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu
can be used to identify a hybrid platform and to return the found
hybrid cpu pmu. All detected hybrid pmus have been saved in the
'perf_pmu__hybrid_pmus' list, so we just need to search this list.

perf_pmu__hybrid_type_to_pmu converts a user-specified string to a
hybrid pmu name. This is used to support the '--cputype' option in
later patches.

perf_pmu__hybrid_exist checks the existence of hybrid pmus.
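A minimal sketch of the type-to-pmu-name conversion, with an illustrative function name; the real helper additionally verifies, via the hybrid pmu list or sysfs, that the resulting pmu exists before returning the name.

```c
#define _GNU_SOURCE		/* for asprintf */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Map the user-facing '--cputype' string ("core" or "atom") to the
 * corresponding pmu name ("cpu_core" / "cpu_atom"). Caller frees
 * the returned string. */
static char *hybrid_type_to_pmu(const char *type)
{
	char *pmu_name = NULL;

	if (asprintf(&pmu_name, "cpu_%s", type) < 0)
		return NULL;
	return pmu_name;
}
```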

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/pmu-hybrid.c | 40 ++++++++++++++++++++++++++++++++++++
 tools/perf/util/pmu-hybrid.h | 11 ++++++++++
 2 files changed, 51 insertions(+)

diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
index 7316bf46e54b..86ba84d9469c 100644
--- a/tools/perf/util/pmu-hybrid.c
+++ b/tools/perf/util/pmu-hybrid.c
@@ -33,3 +33,43 @@ bool perf_pmu__hybrid_mounted(const char *name)
 	snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name);
 	return file_available(path);
 }
+
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name)
+{
+	struct perf_pmu *pmu;
+
+	if (!name)
+		return NULL;
+
+	perf_pmu__for_each_hybrid_pmu(pmu) {
+		if (!strcmp(name, pmu->name))
+			return pmu;
+	}
+
+	return NULL;
+}
+
+bool perf_pmu__is_hybrid(const char *name)
+{
+	return perf_pmu__find_hybrid_pmu(name) != NULL;
+}
+
+char *perf_pmu__hybrid_type_to_pmu(const char *type)
+{
+	char *pmu_name = NULL;
+
+	if (asprintf(&pmu_name, "cpu_%s", type) < 0)
+		return NULL;
+
+	if (perf_pmu__is_hybrid(pmu_name))
+		return pmu_name;
+
+	/*
+	 * pmu may be not scanned, check the sysfs.
+	 */
+	if (perf_pmu__hybrid_mounted(pmu_name))
+		return pmu_name;
+
+	free(pmu_name);
+	return NULL;
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
index 35bed3714438..7fb2246e939a 100644
--- a/tools/perf/util/pmu-hybrid.h
+++ b/tools/perf/util/pmu-hybrid.h
@@ -15,4 +15,15 @@ extern struct list_head perf_pmu__hybrid_pmus;
 
 bool perf_pmu__hybrid_mounted(const char *name);
 
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name);
+
+bool perf_pmu__is_hybrid(const char *name);
+
+char *perf_pmu__hybrid_type_to_pmu(const char *type);
+
+static inline bool perf_pmu__hybrid_exist(void)
+{
+	return !list_empty(&perf_pmu__hybrid_pmus);
+}
+
 #endif /* __PMU_HYBRID_H */
-- 
2.17.1



* [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (5 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 06/27] perf pmu: Add hybrid helper functions Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-12 19:15   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 08/27] perf stat: Uniquify hybrid event name Jin Yao
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

On a hybrid platform, atom events can only be enabled on atom CPUs
and core events can only be enabled on core CPUs. So a hybrid event
can only be enabled on its own CPUs.

The problem in current perf is that the cpus for an evsel (read
from PMU sysfs) have been merged into evsel_list->core.all_cpus,
which might be all CPUs.

So we need a way to let a hybrid event use only its own CPUs.

The idea is to create a new evlist__invalidate_all_cpus to
invalidate evsel_list->core.all_cpus, so that evlist__for_each_cpu
returns cpu -1 for a hybrid evsel. If cpu is -1, the hybrid evsel
will use its own cpus.

The patch contains the following code piece:

if (cpu == -1 && !evlist->thread_mode)
        evsel__enable_cpus(pos);

It lets the event be enabled on the event's own cpus.
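The rule can be sketched as follows, with illustrative types and names; it mirrors the "nr_cpus = cpus ? cpus->nr : 1" pattern used by the patch's evsel__enable_cpus: cpu == -1 fans out to every cpu in the evsel's own map, while any other cpu touches exactly one.

```c
#include <assert.h>
#include <stddef.h>

struct cpu_map {
	int nr;
	const int *cpus;
};

/* When the merged all_cpus walk hands a hybrid evsel cpu == -1, the
 * enable path falls back to the evsel's own cpu map; otherwise only
 * the single iterated cpu index is touched. Returns how many
 * per-cpu enable calls would be made. */
static int enable_calls(int iter_cpu, const struct cpu_map *own_cpus)
{
	if (iter_cpu == -1)
		return own_cpus ? own_cpus->nr : 1;
	return 1;
}
```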

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-stat.c          | 37 ++++++++++++++-
 tools/perf/util/evlist.c           | 72 ++++++++++++++++++++++++++++--
 tools/perf/util/evlist.h           |  4 ++
 tools/perf/util/evsel.h            |  8 ++++
 tools/perf/util/python-ext-sources |  2 +
 5 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2e2e4a8345ea..68ecf68699a9 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -393,6 +393,18 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu)
 	return 0;
 }
 
+static int read_counter_cpus(struct evsel *counter, struct timespec *rs)
+{
+	int cpu, nr_cpus, err = 0;
+	struct perf_cpu_map *cpus = evsel__cpus(counter);
+
+	nr_cpus = cpus ? cpus->nr : 1;
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		err = read_counter_cpu(counter, rs, cpu);
+
+	return err;
+}
+
 static int read_affinity_counters(struct timespec *rs)
 {
 	struct evsel *counter;
@@ -414,8 +426,14 @@ static int read_affinity_counters(struct timespec *rs)
 			if (evsel__cpu_iter_skip(counter, cpu))
 				continue;
 			if (!counter->err) {
-				counter->err = read_counter_cpu(counter, rs,
-								counter->cpu_iter - 1);
+				if (cpu == -1 && !evsel_list->thread_mode) {
+					counter->err = read_counter_cpus(counter, rs);
+				} else if (evsel_list->thread_mode) {
+					counter->err = read_counter_cpu(counter, rs, 0);
+				} else {
+					counter->err = read_counter_cpu(counter, rs,
+									counter->cpu_iter - 1);
+				}
 			}
 		}
 	}
@@ -781,6 +799,21 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (group)
 		evlist__set_leader(evsel_list);
 
+	/*
+	 * On hybrid platform, the cpus for evsel (via PMU sysfs) have been
+	 * merged to evsel_list->core.all_cpus. We use evlist__invalidate_all_cpus
+	 * to invalidate the evsel_list->core.all_cpus then evlist__for_each_cpu
+	 * returns cpu -1 for hybrid evsel. If cpu is -1, hybrid evsel will
+	 * use it's own cpus.
+	 */
+	if (evlist__has_hybrid_events(evsel_list)) {
+		evlist__invalidate_all_cpus(evsel_list);
+		if (!target__has_cpu(&target) ||
+		    target__has_per_thread(&target)) {
+			evsel_list->thread_mode = true;
+		}
+	}
+
 	if (affinity__setup(&affinity) < 0)
 		return -1;
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 882cd1f721d9..3ee12fcd0c9f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -381,7 +381,8 @@ bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu)
 bool evsel__cpu_iter_skip(struct evsel *ev, int cpu)
 {
 	if (!evsel__cpu_iter_skip_no_inc(ev, cpu)) {
-		ev->cpu_iter++;
+		if (cpu != -1)
+			ev->cpu_iter++;
 		return false;
 	}
 	return true;
@@ -410,6 +411,16 @@ static int evlist__is_enabled(struct evlist *evlist)
 	return false;
 }
 
+static void evsel__disable_cpus(struct evsel *evsel)
+{
+	int cpu, nr_cpus;
+	struct perf_cpu_map *cpus = evsel__cpus(evsel);
+
+	nr_cpus = cpus ? cpus->nr : 1;
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		evsel__disable_cpu(evsel, cpu);
+}
+
 static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 {
 	struct evsel *pos;
@@ -436,7 +447,12 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 					has_imm = true;
 				if (pos->immediate != imm)
 					continue;
-				evsel__disable_cpu(pos, pos->cpu_iter - 1);
+				if (cpu == -1 && !evlist->thread_mode)
+					evsel__disable_cpus(pos);
+				else if (evlist->thread_mode)
+					evsel__disable_cpu(pos, 0);
+				else
+					evsel__disable_cpu(pos, pos->cpu_iter - 1);
 			}
 		}
 		if (!has_imm)
@@ -472,6 +488,15 @@ void evlist__disable_evsel(struct evlist *evlist, char *evsel_name)
 	__evlist__disable(evlist, evsel_name);
 }
 
+static void evsel__enable_cpus(struct evsel *evsel)
+{
+	int cpu, nr_cpus;
+	struct perf_cpu_map *cpus = evsel__cpus(evsel);
+
+	nr_cpus = cpus ? cpus->nr : 1;
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		evsel__enable_cpu(evsel, cpu);
+}
 static void __evlist__enable(struct evlist *evlist, char *evsel_name)
 {
 	struct evsel *pos;
@@ -491,7 +516,12 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
 				continue;
 			if (!evsel__is_group_leader(pos) || !pos->core.fd)
 				continue;
-			evsel__enable_cpu(pos, pos->cpu_iter - 1);
+			if (cpu == -1 && !evlist->thread_mode)
+				evsel__enable_cpus(pos);
+			else if (evlist->thread_mode)
+				evsel__enable_cpu(pos, 0);
+			else
+				evsel__enable_cpu(pos, pos->cpu_iter - 1);
 		}
 	}
 	affinity__cleanup(&affinity);
@@ -1274,6 +1304,16 @@ void evlist__set_selected(struct evlist *evlist, struct evsel *evsel)
 	evlist->selected = evsel;
 }
 
+static void evsel__close_cpus(struct evsel *evsel)
+{
+	int cpu, nr_cpus;
+	struct perf_cpu_map *cpus = evsel__cpus(evsel);
+
+	nr_cpus = cpus ? cpus->nr : 1;
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		perf_evsel__close_cpu(&evsel->core, cpu);
+}
+
 void evlist__close(struct evlist *evlist)
 {
 	struct evsel *evsel;
@@ -1298,7 +1338,13 @@ void evlist__close(struct evlist *evlist)
 		evlist__for_each_entry_reverse(evlist, evsel) {
 			if (evsel__cpu_iter_skip(evsel, cpu))
 			    continue;
-			perf_evsel__close_cpu(&evsel->core, evsel->cpu_iter - 1);
+
+			if (cpu == -1 && !evlist->thread_mode)
+				evsel__close_cpus(evsel);
+			else if (evlist->thread_mode)
+				perf_evsel__close_cpu(&evsel->core, 0);
+			else
+				perf_evsel__close_cpu(&evsel->core, evsel->cpu_iter - 1);
 		}
 	}
 	affinity__cleanup(&affinity);
@@ -2130,3 +2176,21 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
 	}
 	return NULL;
 }
+
+bool evlist__has_hybrid_events(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__is_hybrid_event(evsel))
+			return true;
+	}
+
+	return false;
+}
+
+void evlist__invalidate_all_cpus(struct evlist *evlist)
+{
+	perf_cpu_map__put(evlist->core.all_cpus);
+	evlist->core.all_cpus = perf_cpu_map__empty_new(1);
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b695ffaae519..0da683511d98 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -52,6 +52,7 @@ struct evlist {
 	struct perf_evlist core;
 	int		 nr_groups;
 	bool		 enabled;
+	bool		 thread_mode;
 	int		 id_pos;
 	int		 is_pos;
 	u64		 combined_sample_type;
@@ -365,4 +366,7 @@ int evlist__ctlfd_ack(struct evlist *evlist);
 #define EVLIST_DISABLED_MSG "Events disabled\n"
 
 struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
+void evlist__invalidate_all_cpus(struct evlist *evlist);
+
+bool evlist__has_hybrid_events(struct evlist *evlist);
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 6026487353dd..69aadc52c1bd 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -7,9 +7,11 @@
 #include <sys/types.h>
 #include <linux/perf_event.h>
 #include <linux/types.h>
+#include <string.h>
 #include <internal/evsel.h>
 #include <perf/evsel.h>
 #include "symbol_conf.h"
+#include "pmu-hybrid.h"
 #include <internal/cpumap.h>
 
 struct bpf_object;
@@ -435,4 +437,10 @@ struct perf_env *evsel__env(struct evsel *evsel);
 int evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
 
 void evsel__zero_per_pkg(struct evsel *evsel);
+
+static inline bool evsel__is_hybrid_event(struct evsel *evsel)
+{
+	return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
+}
+
 #endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
index 845dd46e3c61..d7c976671e3a 100644
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -37,3 +37,5 @@ util/units.c
 util/affinity.c
 util/rwsem.c
 util/hashmap.c
+util/pmu-hybrid.c
+util/fncache.c
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 08/27] perf stat: Uniquify hybrid event name
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (6 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-15 23:04   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events Jin Yao
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

It would be useful to tell the user which pmu an event belongs to.
perf-stat already supports the '--no-merge' option, which prints the
pmu name after the event name, such as:

"cycles [cpu_core]"

Now this option is enabled by default on hybrid platforms, but the
format is changed to:

"cpu_core/cycles/"

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-stat.c      |  3 +++
 tools/perf/util/stat-display.c | 12 ++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 68ecf68699a9..6c0a21323814 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2390,6 +2390,9 @@ int cmd_stat(int argc, const char **argv)
 
 	evlist__check_cpu_maps(evsel_list);
 
+	if (perf_pmu__hybrid_exist())
+		stat_config.no_merge = true;
+
 	/*
 	 * Initialize thread_map with comm names,
 	 * so we could print it out on output.
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 7f09cdaf5b60..ed37d8e7ea1a 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -526,6 +526,7 @@ static void uniquify_event_name(struct evsel *counter)
 {
 	char *new_name;
 	char *config;
+	int ret;
 
 	if (counter->uniquified_name ||
 	    !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
@@ -540,8 +541,15 @@ static void uniquify_event_name(struct evsel *counter)
 			counter->name = new_name;
 		}
 	} else {
-		if (asprintf(&new_name,
-			     "%s [%s]", counter->name, counter->pmu_name) > 0) {
+		if (perf_pmu__hybrid_exist()) {
+			ret = asprintf(&new_name, "%s/%s/",
+				       counter->pmu_name, counter->name);
+		} else {
+			ret = asprintf(&new_name, "%s [%s]",
+				       counter->name, counter->pmu_name);
+		}
+
+		if (ret > 0) {
 			free(counter->name);
 			counter->name = new_name;
 		}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (7 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 08/27] perf stat: Uniquify hybrid event name Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-12 19:15   ` Jiri Olsa
  2021-03-15 23:05   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 10/27] perf parse-events: Create two hybrid cache events Jin Yao
                   ` (17 subsequent siblings)
  26 siblings, 2 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Hardware events have pre-defined configs, and the kernel needs to know
which pmu an event comes from (e.g. the cpu_core pmu or the cpu_atom
pmu). But the perf type 'PERF_TYPE_HARDWARE' can't carry pmu
information.

So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:

0xDD000000AA
AA: original hardware event ID
DD: PMU type ID

PMU type ID is retrieved from sysfs. For example,

  cat /sys/devices/cpu_atom/type
  10

  cat /sys/devices/cpu_core/type
  4

When enabling a hybrid hardware event without a specified pmu, such as
'perf stat -e cycles -a', two events are created automatically: one
for the atom pmu, the other for the core pmu.

  root@ssp-pwrt-002:~# ./perf stat -e cycles -vv -a -- sleep 1
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0xa00000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  cycles: 0: 810754998 1002563650 1002563650
  cycles: 1: 810749852 1002559947 1002559947
  cycles: 2: 808096005 1002555036 1002555036
  cycles: 3: 808090246 1002543496 1002543496
  cycles: 4: 800933425 1002536659 1002536659
  cycles: 5: 800928573 1002528386 1002528386
  cycles: 6: 800924347 1002520527 1002520527
  cycles: 7: 800922009 1002513176 1002513176
  cycles: 8: 800919624 1002507326 1002507326
  cycles: 9: 800917204 1002500663 1002500663
  cycles: 10: 802096579 1002494280 1002494280
  cycles: 11: 802093770 1002486404 1002486404
  cycles: 12: 803284338 1002479491 1002479491
  cycles: 13: 803277609 1002469777 1002469777
  cycles: 14: 800875902 1002458861 1002458861
  cycles: 15: 800873241 1002451350 1002451350
  cycles: 0: 800837379 1002444645 1002444645
  cycles: 1: 800833400 1002438505 1002438505
  cycles: 2: 800829291 1002433698 1002433698
  cycles: 3: 800824390 1002427584 1002427584
  cycles: 4: 800819360 1002422099 1002422099
  cycles: 5: 800814787 1002415845 1002415845
  cycles: 6: 800810125 1002410301 1002410301
  cycles: 7: 800791893 1002386845 1002386845
  cycles: 12855737722 16040169029 16040169029
  cycles: 6406560625 8019379522 8019379522

   Performance counter stats for 'system wide':

      12,855,737,722      cpu_core/cycles/
       6,406,560,625      cpu_atom/cycles/

         1.002774658 seconds time elapsed

type 6 is PERF_TYPE_HARDWARE_PMU.
0x4 in 0x400000000 indicates the cpu_core pmu.
0xa in 0xa00000000 indicates the cpu_atom pmu.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/parse-events.c | 73 ++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 42c84adeb2fb..c6c76fc810a3 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -446,6 +446,24 @@ static int config_attr(struct perf_event_attr *attr,
 		       struct parse_events_error *err,
 		       config_term_func_t config_term);
 
+static void config_hybrid_attr(struct perf_event_attr *attr,
+			       int type, int pmu_type)
+{
+	/*
+	 * attr.config layout:
+	 * PERF_TYPE_HARDWARE_PMU:     0xDD000000AA
+	 *                             AA: hardware event ID
+	 *                             DD: PMU type ID
+	 * PERF_TYPE_HW_CACHE_PMU:     0xDD00CCBBAA
+	 *                             AA: hardware cache ID
+	 *                             BB: hardware cache op ID
+	 *                             CC: hardware cache op result ID
+	 *                             DD: PMU type ID
+	 */
+	attr->type = type;
+	attr->config = attr->config | ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT);
+}
+
 int parse_events_add_cache(struct list_head *list, int *idx,
 			   char *type, char *op_result1, char *op_result2,
 			   struct parse_events_error *err,
@@ -1409,6 +1427,47 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
 					    err, head_config);
 }
 
+static int create_hybrid_hw_event(struct parse_events_state *parse_state,
+				  struct list_head *list,
+				  struct perf_event_attr *attr,
+				  struct perf_pmu *pmu)
+{
+	struct evsel *evsel;
+	__u32 type = attr->type;
+	__u64 config = attr->config;
+
+	config_hybrid_attr(attr, PERF_TYPE_HARDWARE_PMU, pmu->type);
+	evsel = __add_event(list, &parse_state->idx, attr, true, NULL,
+			    pmu, NULL, false, NULL);
+	if (evsel)
+		evsel->pmu_name = strdup(pmu->name);
+	else
+		return -ENOMEM;
+
+	attr->type = type;
+	attr->config = config;
+	return 0;
+}
+
+static int add_hybrid_numeric(struct parse_events_state *parse_state,
+			      struct list_head *list,
+			      struct perf_event_attr *attr,
+			      bool *hybrid)
+{
+	struct perf_pmu *pmu;
+	int ret;
+
+	*hybrid = false;
+	perf_pmu__for_each_hybrid_pmu(pmu) {
+		*hybrid = true;
+		ret = create_hybrid_hw_event(parse_state, list, attr, pmu);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 int parse_events_add_numeric(struct parse_events_state *parse_state,
 			     struct list_head *list,
 			     u32 type, u64 config,
@@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
 {
 	struct perf_event_attr attr;
 	LIST_HEAD(config_terms);
+	bool hybrid;
+	int ret;
 
 	memset(&attr, 0, sizeof(attr));
 	attr.type = type;
@@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
 			return -ENOMEM;
 	}
 
+	/*
+	 * Skip software events (such as the dummy event).
+	 */
+	if (type != PERF_TYPE_SOFTWARE) {
+		if (!perf_pmu__hybrid_exist())
+			perf_pmu__scan(NULL);
+
+		ret = add_hybrid_numeric(parse_state, list, &attr, &hybrid);
+		if (hybrid)
+			return ret;
+	}
+
 	return add_event(list, &parse_state->idx, &attr,
 			 get_config_name(head_config), &config_terms);
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 10/27] perf parse-events: Create two hybrid cache events
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (8 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-15 23:05   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU Jin Yao
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao, hardware

Cache events have pre-defined configs, and the kernel needs to know
which pmu a cache event comes from (e.g. the cpu_core pmu or the
cpu_atom pmu). But the perf type 'PERF_TYPE_HW_CACHE' can't carry
pmu information.

So the kernel introduces a new type 'PERF_TYPE_HW_CACHE_PMU'.

The new attr.config layout for PERF_TYPE_HW_CACHE_PMU is

0xDD00CCBBAA
AA: hardware cache ID
BB: hardware cache op ID
CC: hardware cache op result ID
DD: PMU type ID

As with hardware events, the PMU type ID is retrieved from sysfs.

When enabling a hybrid cache event without a specified pmu, such as
'perf stat -e L1-dcache-loads -a', two events are created
automatically: one for the atom pmu, the other for the core pmu.

  root@ssp-pwrt-002:~# ./perf stat -e L1-dcache-loads -vv -a -- sleep 1
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             7
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  ------------------------------------------------------------
  perf_event_attr:
    type                             7
    size                             120
    config                           0xa00000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  L1-dcache-loads: 0: 13103284 1002535421 1002535421
  L1-dcache-loads: 1: 12995797 1002532807 1002532807
  L1-dcache-loads: 2: 13428186 1002528572 1002528572
  L1-dcache-loads: 3: 12913469 1002517437 1002517437
  L1-dcache-loads: 4: 12857843 1002507079 1002507079
  L1-dcache-loads: 5: 12812079 1002498279 1002498279
  L1-dcache-loads: 6: 12829938 1002490010 1002490010
  L1-dcache-loads: 7: 12807085 1002481860 1002481860
  L1-dcache-loads: 8: 12907189 1002473181 1002473181
  L1-dcache-loads: 9: 12823095 1002465895 1002465895
  L1-dcache-loads: 10: 12892770 1002459322 1002459322
  L1-dcache-loads: 11: 12789718 1002451607 1002451607
  L1-dcache-loads: 12: 12838931 1002442632 1002442632
  L1-dcache-loads: 13: 12803756 1002434133 1002434133
  L1-dcache-loads: 14: 12840574 1002426060 1002426060
  L1-dcache-loads: 15: 12799075 1002415964 1002415964
  L1-dcache-loads: 0: 39394457 1002406287 1002406287
  L1-dcache-loads: 1: 39372632 1002400502 1002400502
  L1-dcache-loads: 2: 39405247 1002394865 1002394865
  L1-dcache-loads: 3: 39400547 1002389099 1002389099
  L1-dcache-loads: 4: 39410752 1002383106 1002383106
  L1-dcache-loads: 5: 39402983 1002375365 1002375365
  L1-dcache-loads: 6: 39388775 1002369374 1002369374
  L1-dcache-loads: 7: 39408527 1002363344 1002363344
  L1-dcache-loads: 206442789 16039660259 16039660259
  L1-dcache-loads: 315183920 8019081942 8019081942

   Performance counter stats for 'system wide':

         206,442,789      cpu_core/L1-dcache-loads/
         315,183,920      cpu_atom/L1-dcache-loads/

         1.002751663 seconds time elapsed

type 7 is PERF_TYPE_HW_CACHE_PMU.
0x4 in 0x400000000 indicates the cpu_core pmu.
0xa in 0xa00000000 indicates the cpu_atom pmu.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/parse-events.c | 54 +++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index c6c76fc810a3..09e42245f71a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -464,6 +464,48 @@ static void config_hybrid_attr(struct perf_event_attr *attr,
 	attr->config = attr->config | ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT);
 }
 
+static int create_hybrid_cache_event(struct list_head *list, int *idx,
+				     struct perf_event_attr *attr, char *name,
+				     struct list_head *config_terms,
+				     struct perf_pmu *pmu)
+{
+	struct evsel *evsel;
+	__u32 type = attr->type;
+	__u64 config = attr->config;
+
+	config_hybrid_attr(attr, PERF_TYPE_HW_CACHE_PMU, pmu->type);
+	evsel = __add_event(list, idx, attr, true, name,
+			    pmu, config_terms, false, NULL);
+	if (evsel)
+		evsel->pmu_name = strdup(pmu->name);
+	else
+		return -ENOMEM;
+
+	attr->type = type;
+	attr->config = config;
+	return 0;
+}
+
+static int add_hybrid_cache(struct list_head *list, int *idx,
+			    struct perf_event_attr *attr, char *name,
+			    struct list_head *config_terms,
+			    bool *hybrid)
+{
+	struct perf_pmu *pmu;
+	int ret;
+
+	*hybrid = false;
+	perf_pmu__for_each_hybrid_pmu(pmu) {
+		*hybrid = true;
+		ret = create_hybrid_cache_event(list, idx, attr, name,
+						config_terms, pmu);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 int parse_events_add_cache(struct list_head *list, int *idx,
 			   char *type, char *op_result1, char *op_result2,
 			   struct parse_events_error *err,
@@ -474,7 +516,8 @@ int parse_events_add_cache(struct list_head *list, int *idx,
 	char name[MAX_NAME_LEN], *config_name;
 	int cache_type = -1, cache_op = -1, cache_result = -1;
 	char *op_result[2] = { op_result1, op_result2 };
-	int i, n;
+	int i, n, ret;
+	bool hybrid;
 
 	/*
 	 * No fallback - if we cannot get a clear cache type
@@ -534,6 +577,15 @@ int parse_events_add_cache(struct list_head *list, int *idx,
 		if (get_config_terms(head_config, &config_terms))
 			return -ENOMEM;
 	}
+
+	if (!perf_pmu__hybrid_exist())
+		perf_pmu__scan(NULL);
+
+	ret = add_hybrid_cache(list, idx, &attr, config_name ? : name,
+			       &config_terms, &hybrid);
+	if (hybrid)
+		return ret;
+
 	return add_event(list, idx, &attr, config_name ? : name, &config_terms);
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (9 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 10/27] perf parse-events: Create two hybrid cache events Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-12 19:15   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 12/27] perf parse-events: Support hybrid raw events Jin Yao
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

On a hybrid platform, some hardware events are only available on a
specific pmu. For example, 'L1-dcache-load-misses' is only available
on the 'cpu_core' pmu. And even for an event that is available on
both pmus, the user may want to enable it on only one of them. So
the following syntax is now supported:

cpu_core/<hardware event>/
cpu_core/<hardware cache event>/
cpu_core/<pmu event>/

cpu_atom/<hardware event>/
cpu_atom/<hardware cache event>/
cpu_atom/<pmu event>/

This limits the event to being enabled only on the specified pmu.

The patch uses this idea. For example, with "cpu_core/LLC-loads/",
term->config in parse_events_add_pmu() is "LLC-loads".

We create a new "parse_events_state" with the pmu_name and use
parse_events__scanner to scan term->config (the string "LLC-loads"
in this example). parse_events_add_cache() is called during parsing,
and parse_state->pmu_name identifies the pmu on which the event is
enabled.

Let's see examples:

  root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/,cpu_core/LLC-loads/ -vv -- ./triad_loop
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 3
  ------------------------------------------------------------
  perf_event_attr:
    type                             7
    size                             120
    config                           0x400000002
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 4
  cycles: 0: 449252097 297999924 297999924
  LLC-loads: 0: 1857 297999924 297999924
  cycles: 449252097 297999924 297999924
  LLC-loads: 1857 297999924 297999924

   Performance counter stats for './triad_loop':

         449,252,097      cpu_core/cycles/
               1,857      cpu_core/LLC-loads/

         0.298898415 seconds time elapsed

  root@ssp-pwrt-002:~# ./perf stat -e cpu_atom/cycles/,cpu_atom/LLC-loads/ -vv -- taskset -c 16 ./triad_loop
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0xa00000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 3
  ------------------------------------------------------------
  perf_event_attr:
    type                             7
    size                             120
    config                           0xa00000002
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 4
  cycles: 0: 602020010 343657939 342553275
  LLC-loads: 0: 3537 343657939 342553275
  cycles: 603961400 343657939 342553275
  LLC-loads: 3548 343657939 342553275

   Performance counter stats for 'taskset -c 16 ./triad_loop':

         603,961,400      cpu_atom/cycles/                                              (99.68%)
               3,548      cpu_atom/LLC-loads/                                           (99.68%)

         0.344904585 seconds time elapsed

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/parse-events.c | 100 +++++++++++++++++++++++++++++++--
 tools/perf/util/parse-events.h |   6 +-
 tools/perf/util/parse-events.y |  21 ++-----
 3 files changed, 105 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 09e42245f71a..30435adc7a7b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -489,7 +489,8 @@ static int create_hybrid_cache_event(struct list_head *list, int *idx,
 static int add_hybrid_cache(struct list_head *list, int *idx,
 			    struct perf_event_attr *attr, char *name,
 			    struct list_head *config_terms,
-			    bool *hybrid)
+			    bool *hybrid,
+			    struct parse_events_state *parse_state)
 {
 	struct perf_pmu *pmu;
 	int ret;
@@ -497,6 +498,11 @@ static int add_hybrid_cache(struct list_head *list, int *idx,
 	*hybrid = false;
 	perf_pmu__for_each_hybrid_pmu(pmu) {
 		*hybrid = true;
+		if (parse_state->pmu_name &&
+		     strcmp(parse_state->pmu_name, pmu->name)) {
+			continue;
+		}
+
 		ret = create_hybrid_cache_event(list, idx, attr, name,
 						config_terms, pmu);
 		if (ret)
@@ -509,7 +515,8 @@ static int add_hybrid_cache(struct list_head *list, int *idx,
 int parse_events_add_cache(struct list_head *list, int *idx,
 			   char *type, char *op_result1, char *op_result2,
 			   struct parse_events_error *err,
-			   struct list_head *head_config)
+			   struct list_head *head_config,
+			   struct parse_events_state *parse_state)
 {
 	struct perf_event_attr attr;
 	LIST_HEAD(config_terms);
@@ -582,7 +589,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
 		perf_pmu__scan(NULL);
 
 	ret = add_hybrid_cache(list, idx, &attr, config_name ? : name,
-			       &config_terms, &hybrid);
+			       &config_terms, &hybrid, parse_state);
 	if (hybrid)
 		return ret;
 
@@ -1512,6 +1519,11 @@ static int add_hybrid_numeric(struct parse_events_state *parse_state,
 	*hybrid = false;
 	perf_pmu__for_each_hybrid_pmu(pmu) {
 		*hybrid = true;
+		if (parse_state->pmu_name &&
+		    strcmp(parse_state->pmu_name, pmu->name)) {
+			continue;
+		}
+
 		ret = create_hybrid_hw_event(parse_state, list, attr, pmu);
 		if (ret)
 			return ret;
@@ -1578,6 +1590,10 @@ static bool config_term_percore(struct list_head *config_terms)
 	return false;
 }
 
+static int parse_events_with_hybrid_pmu(struct parse_events_state *parse_state,
+			      const char *str, char *name, bool *found,
+			      struct list_head *list);
+
 int parse_events_add_pmu(struct parse_events_state *parse_state,
 			 struct list_head *list, char *name,
 			 struct list_head *head_config,
@@ -1589,7 +1605,7 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
 	struct perf_pmu *pmu;
 	struct evsel *evsel;
 	struct parse_events_error *err = parse_state->error;
-	bool use_uncore_alias;
+	bool use_uncore_alias, found;
 	LIST_HEAD(config_terms);
 
 	if (verbose > 1) {
@@ -1605,6 +1621,22 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
 		fprintf(stderr, "' that may result in non-fatal errors\n");
 	}
 
+	if (head_config && perf_pmu__is_hybrid(name)) {
+		struct parse_events_term *term;
+		int ret;
+
+		list_for_each_entry(term, head_config, list) {
+			if (!term->config)
+				continue;
+			ret = parse_events_with_hybrid_pmu(parse_state,
+							   term->config,
+							   name, &found,
+							   list);
+			if (found)
+				return ret;
+		}
+	}
+
 	pmu = parse_state->fake_pmu ?: perf_pmu__find(name);
 	if (!pmu) {
 		char *err_str;
@@ -1713,12 +1745,19 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 	struct perf_pmu *pmu = NULL;
 	int ok = 0;
 
+	if (parse_state->pmu_name) {
+		list = alloc_list();
+		if (!list)
+			return -1;
+		*listp = list;
+		return 0;
+	}
+
 	*listp = NULL;
 	/* Add it for all PMUs that support the alias */
-	list = malloc(sizeof(struct list_head));
+	list = alloc_list();
 	if (!list)
 		return -1;
-	INIT_LIST_HEAD(list);
 	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
 		struct perf_pmu_alias *alias;
 
@@ -2284,6 +2323,44 @@ int parse_events_terms(struct list_head *terms, const char *str)
 	return ret;
 }
 
+static int list_num(struct list_head *list)
+{
+	struct list_head *pos;
+	int n = 0;
+
+	list_for_each(pos, list)
+		n++;
+
+	return n;
+}
+
+static int parse_events_with_hybrid_pmu(struct parse_events_state *parse_state,
+					const char *str, char *pmu_name,
+					bool *found, struct list_head *list)
+{
+	struct parse_events_state ps = {
+		.list		= LIST_HEAD_INIT(ps.list),
+		.stoken		= PE_START_EVENTS,
+		.pmu_name	= pmu_name,
+		.idx		= parse_state->idx,
+	};
+	int ret;
+
+	*found = false;
+	ret = parse_events__scanner(str, &ps);
+	perf_pmu__parse_cleanup();
+
+	if (!ret) {
+		if (!list_empty(&ps.list)) {
+			*found = true;
+			list_splice(&ps.list, list);
+			parse_state->idx = list_num(list);
+		}
+	}
+
+	return ret;
+}
+
 int __parse_events(struct evlist *evlist, const char *str,
 		   struct parse_events_error *err, struct perf_pmu *fake_pmu)
 {
@@ -3307,3 +3384,14 @@ char *parse_events_formats_error_string(char *additional_terms)
 fail:
 	return NULL;
 }
+
+struct list_head *alloc_list(void)
+{
+	struct list_head *list = malloc(sizeof(*list));
+
+	if (!list)
+		return NULL;
+
+	INIT_LIST_HEAD(list);
+	return list;
+}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e80c9b74f2f2..39c7121a4659 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -138,6 +138,7 @@ struct parse_events_state {
 	struct list_head	  *terms;
 	int			   stoken;
 	struct perf_pmu		  *fake_pmu;
+	char			  *pmu_name;
 };
 
 void parse_events__handle_error(struct parse_events_error *err, int idx,
@@ -188,7 +189,8 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
 int parse_events_add_cache(struct list_head *list, int *idx,
 			   char *type, char *op_result1, char *op_result2,
 			   struct parse_events_error *error,
-			   struct list_head *head_config);
+			   struct list_head *head_config,
+			   struct parse_events_state *parse_state);
 int parse_events_add_breakpoint(struct list_head *list, int *idx,
 				u64 addr, char *type, u64 len);
 int parse_events_add_pmu(struct parse_events_state *parse_state,
@@ -242,6 +244,8 @@ char *parse_events_formats_error_string(char *additional_terms);
 void parse_events_print_error(struct parse_events_error *err,
 			      const char *event);
 
+struct list_head *alloc_list(void);
+
 #ifdef HAVE_LIBELF_SUPPORT
 /*
  * If the probe point starts with '%',
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index d57ac86ce7ca..e0e68c3da9e4 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -26,18 +26,6 @@ do { \
 		YYABORT; \
 } while (0)
 
-static struct list_head* alloc_list(void)
-{
-	struct list_head *list;
-
-	list = malloc(sizeof(*list));
-	if (!list)
-		return NULL;
-
-	INIT_LIST_HEAD(list);
-	return list;
-}
-
 static void free_list_evsel(struct list_head* list_evsel)
 {
 	struct evsel *evsel, *tmp;
@@ -454,7 +442,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_e
 
 	list = alloc_list();
 	ABORT_ON(!list);
-	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6);
+	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6,
+				parse_state);
 	parse_events_terms__delete($6);
 	free($1);
 	free($3);
@@ -475,7 +464,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT opt_event_config
 
 	list = alloc_list();
 	ABORT_ON(!list);
-	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4);
+	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4,
+				parse_state);
 	parse_events_terms__delete($4);
 	free($1);
 	free($3);
@@ -495,7 +485,8 @@ PE_NAME_CACHE_TYPE opt_event_config
 
 	list = alloc_list();
 	ABORT_ON(!list);
-	err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2);
+	err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2,
+				parse_state);
 	parse_events_terms__delete($2);
 	free($1);
 	if (err) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 12/27] perf parse-events: Support hybrid raw events
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (10 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 13/27] perf evlist: Create two hybrid 'cycles' events by default Jin Yao
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

On a hybrid platform, the same raw event may be available on both the
cpu_core pmu and the cpu_atom pmu, so two raw events can be created
for one event encoding.

  root@ssp-pwrt-002:~# ./perf stat -e r3c -a -vv -- sleep 1
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             4
    size                             120
    config                           0x3c
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  ------------------------------------------------------------
  perf_event_attr:
    type                             10
    size                             120
    config                           0x3c
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  r3c: 0: 807321251 1002093589 1002093589
  r3c: 1: 807321699 1002088203 1002088203
  r3c: 2: 802163010 1002086701 1002086701
  r3c: 3: 802162967 1002080660 1002080660
  r3c: 4: 801769096 1002077047 1002077047
  r3c: 5: 801766174 1002071197 1002071197
  r3c: 6: 804147338 1002065696 1002065696
  r3c: 7: 804141152 1002055345 1002055345
  r3c: 8: 801743651 1002043364 1002043364
  r3c: 9: 801742285 1002036921 1002036921
  r3c: 10: 804083297 1002032502 1002032502
  r3c: 11: 804084735 1002027992 1002027992
  r3c: 12: 804504507 1002026371 1002026371
  r3c: 13: 804504679 1002022466 1002022466
  r3c: 14: 811424953 1002021767 1002021767
  r3c: 15: 811423320 1002021594 1002021594
  r3c: 0: 810883154 1002021654 1002021654
  r3c: 1: 810881069 1002017334 1002017334
  r3c: 2: 810878689 1002014010 1002014010
  r3c: 3: 810876654 1002011516 1002011516
  r3c: 4: 800488244 1002007858 1002007858
  r3c: 5: 800486260 1002003635 1002003635
  r3c: 6: 800483374 1002000384 1002000384
  r3c: 7: 800481011 1001997122 1001997122
  r3c: 12874304114 16032851415 16032851415
  r3c: 6445458455 8016073513 8016073513

   Performance counter stats for 'system wide':

      12,874,304,114      cpu_core/r3c/
       6,445,458,455      cpu_atom/r3c/

         1.002310991 seconds time elapsed

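The duplication mirrors the add_hybrid_raw() loop in the diff below: for one
raw encoding, one attribute is produced per hybrid PMU, differing only in its
type field. A minimal standalone sketch, using simplified stand-in structs
rather than the real perf_event_attr/perf_pmu (on this machine cpu_core is
type 4 and cpu_atom is type 10, as the -vv output above shows):

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the real perf structs:
 * only the fields this sketch needs. */
struct pmu { const char *name; unsigned int type; };
struct attr { unsigned int type; unsigned long long config; };

/* For one raw encoding (e.g. r3c -> config 0x3c), emit one attr per
 * hybrid PMU, differing only in attr.type. */
static int expand_raw_event(unsigned long long config,
			    const struct pmu *pmus, int npmus,
			    struct attr *out)
{
	int i;

	for (i = 0; i < npmus; i++) {
		out[i].type = pmus[i].type;
		out[i].config = config;
	}
	return npmus;
}
```

With the two AlderLake PMUs, expand_raw_event(0x3c, ...) yields the two
attributes seen in the -vv dump (type 4/config 0x3c and type 10/config 0x3c).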
Specifying a raw event inside a pmu is also supported. The syntax is similar:

cpu_core/<raw event>/
cpu_atom/<raw event>/

  root@ssp-pwrt-002:~# ./perf stat -e cpu_core/r3c/ -vv  -- ./triad_loop
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             4
    size                             120
    config                           0x3c
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 12340  cpu -1  group_fd -1  flags 0x8 = 3
  cpu_core/r3c/: 0: 449000613 293915211 293915211
  cpu_core/r3c/: 449000613 293915211 293915211

   Performance counter stats for './triad_loop':

         449,000,613      cpu_core/r3c/

         0.294859229 seconds time elapsed

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/parse-events.c | 56 +++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 30435adc7a7b..9b2a33103a57 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1532,6 +1532,55 @@ static int add_hybrid_numeric(struct parse_events_state *parse_state,
 	return 0;
 }
 
+static int create_hybrid_raw_event(struct parse_events_state *parse_state,
+				   struct list_head *list,
+				   struct perf_event_attr *attr,
+				   struct list_head *head_config,
+				   struct list_head *config_terms,
+				   struct perf_pmu *pmu)
+{
+	struct evsel *evsel;
+
+	attr->type = pmu->type;
+	evsel = __add_event(list, &parse_state->idx, attr, true,
+			    get_config_name(head_config),
+			    pmu, config_terms, false, NULL);
+	if (evsel)
+		evsel->pmu_name = strdup(pmu->name);
+	else
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int add_hybrid_raw(struct parse_events_state *parse_state,
+			  struct list_head *list,
+			  struct perf_event_attr *attr,
+			  struct list_head *head_config,
+			  struct list_head *config_terms,
+			  bool *hybrid)
+{
+	struct perf_pmu *pmu;
+	int ret;
+
+	*hybrid = false;
+	perf_pmu__for_each_hybrid_pmu(pmu) {
+		*hybrid = true;
+		if (parse_state->pmu_name &&
+		    strcmp(parse_state->pmu_name, pmu->name)) {
+			continue;
+		}
+
+		ret = create_hybrid_raw_event(parse_state, list, attr,
+					      head_config, config_terms,
+					      pmu);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 int parse_events_add_numeric(struct parse_events_state *parse_state,
 			     struct list_head *list,
 			     u32 type, u64 config,
@@ -1558,7 +1607,12 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
 	/*
 	 * Skip the software dummy event.
 	 */
-	if (type != PERF_TYPE_SOFTWARE) {
+	if (type == PERF_TYPE_RAW) {
+		ret = add_hybrid_raw(parse_state, list, &attr, head_config,
+				     &config_terms, &hybrid);
+		if (hybrid)
+			return ret;
+	} else if (type != PERF_TYPE_SOFTWARE) {
 		if (!perf_pmu__hybrid_exist())
 			perf_pmu__scan(NULL);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 13/27] perf evlist: Create two hybrid 'cycles' events by default
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (11 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 12/27] perf parse-events: Support hybrid raw events Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 14/27] perf stat: Add default hybrid events Jin Yao
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

When the evlist is empty, for example when no '-e' option is specified
in perf record, one default 'cycles' event is added to the evlist.

On a hybrid platform, however, two default 'cycles' events need to be
created: one for the core cpus and one for the atom cpus.

This patch calls evsel__new_cycles() twice to create the two 'cycles'
events.

  root@ssp-pwrt-002:~# ./perf record -vv -- sleep 1
  ...
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0x400000000
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|ID|PERIOD
    read_format                      ID
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid 22300  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid 22300  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid 22300  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid 22300  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid 22300  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid 22300  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid 22300  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid 22300  cpu 7  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid 22300  cpu 8  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid 22300  cpu 9  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid 22300  cpu 10  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid 22300  cpu 11  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid 22300  cpu 12  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid 22300  cpu 13  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid 22300  cpu 14  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid 22300  cpu 15  group_fd -1  flags 0x8 = 21
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0xa00000000
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|ID|PERIOD
    read_format                      ID
    disabled                         1
    inherit                          1
    freq                             1
    enable_on_exec                   1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 22300  cpu 16  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid 22300  cpu 17  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid 22300  cpu 18  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid 22300  cpu 19  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid 22300  cpu 20  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid 22300  cpu 21  group_fd -1  flags 0x8 = 27
  sys_perf_event_open: pid 22300  cpu 22  group_fd -1  flags 0x8 = 28
  sys_perf_event_open: pid 22300  cpu 23  group_fd -1  flags 0x8 = 29
  ...

We can see that the core 'cycles' event (config 0x400000000) is enabled
on cpu0-cpu15 and the atom 'cycles' event (config 0xa00000000) is
enabled on cpu16-cpu23.

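The two config values can be decoded directly: the hybrid layout carries the
PMU type in the upper bits of attr.config and the generic hardware event id in
the lower bits. A minimal check, treating the shift value of 32 as an
assumption (the PERF_PMU_TYPE_SHIFT used in the diff below is not spelled out
here, but 32 is what the observed values imply):

```c
#include <assert.h>

/* Assumption: PERF_PMU_TYPE_SHIFT is 32, which matches the -vv output:
 * 0x400000000 = type 4 (cpu_core) << 32 | cycles (0),
 * 0xa00000000 = type 10 (cpu_atom) << 32 | cycles (0). */
#define PMU_TYPE_SHIFT	32
#define HW_CPU_CYCLES	0ULL	/* stands in for PERF_COUNT_HW_CPU_CYCLES */

static unsigned long long hybrid_hw_config(unsigned int pmu_type,
					   unsigned long long hw_event)
{
	return hw_event | ((unsigned long long)pmu_type << PMU_TYPE_SHIFT);
}
```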
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-record.c | 10 ++++++----
 tools/perf/util/evlist.c    | 32 +++++++++++++++++++++++++++++++-
 tools/perf/util/evsel.c     |  6 +++---
 tools/perf/util/evsel.h     |  2 +-
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 35465d1db6dd..363ea1047148 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2786,10 +2786,12 @@ int cmd_record(int argc, const char **argv)
 	if (record.opts.overwrite)
 		record.opts.tail_synthesize = true;
 
-	if (rec->evlist->core.nr_entries == 0 &&
-	    __evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) {
-		pr_err("Not enough memory for event selector list\n");
-		goto out;
+	if (rec->evlist->core.nr_entries == 0) {
+		perf_pmu__scan(NULL);
+		if (__evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) {
+			pr_err("Not enough memory for event selector list\n");
+			goto out;
+		}
 	}
 
 	if (rec->opts.target.tid && !rec->opts.no_inherit_set)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3ee12fcd0c9f..f139151b9433 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -244,10 +244,40 @@ void evlist__set_leader(struct evlist *evlist)
 	}
 }
 
+static int __evlist__add_hybrid_default(struct evlist *evlist, bool precise)
+{
+	struct evsel *evsel;
+	struct perf_pmu *pmu;
+	__u64 config;
+	struct perf_cpu_map *cpus;
+
+	perf_pmu__for_each_hybrid_pmu(pmu) {
+		config = PERF_COUNT_HW_CPU_CYCLES |
+			 ((__u64)pmu->type << PERF_PMU_TYPE_SHIFT);
+		evsel = evsel__new_cycles(precise, PERF_TYPE_HARDWARE_PMU,
+					  config);
+		if (!evsel)
+			return -ENOMEM;
+
+		cpus = perf_cpu_map__get(pmu->cpus);
+		evsel->core.cpus = cpus;
+		evsel->core.own_cpus = perf_cpu_map__get(cpus);
+		evsel->pmu_name = strdup(pmu->name);
+		evlist__add(evlist, evsel);
+	}
+
+	return 0;
+}
+
 int __evlist__add_default(struct evlist *evlist, bool precise)
 {
-	struct evsel *evsel = evsel__new_cycles(precise);
+	struct evsel *evsel;
+
+	if (perf_pmu__hybrid_exist())
+		return __evlist__add_hybrid_default(evlist, precise);
 
+	evsel = evsel__new_cycles(precise, PERF_TYPE_HARDWARE,
+				  PERF_COUNT_HW_CPU_CYCLES);
 	if (evsel == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 7ecbc8e2fbfa..e0b6227d263f 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -295,11 +295,11 @@ static bool perf_event_can_profile_kernel(void)
 	return perf_event_paranoid_check(1);
 }
 
-struct evsel *evsel__new_cycles(bool precise)
+struct evsel *evsel__new_cycles(bool precise, __u32 type, __u64 config)
 {
 	struct perf_event_attr attr = {
-		.type	= PERF_TYPE_HARDWARE,
-		.config	= PERF_COUNT_HW_CPU_CYCLES,
+		.type	= type,
+		.config	= config,
 		.exclude_kernel	= !perf_event_can_profile_kernel(),
 	};
 	struct evsel *evsel;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 69aadc52c1bd..8e9079505e96 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -204,7 +204,7 @@ static inline struct evsel *evsel__newtp(const char *sys, const char *name)
 	return evsel__newtp_idx(sys, name, 0);
 }
 
-struct evsel *evsel__new_cycles(bool precise);
+struct evsel *evsel__new_cycles(bool precise, __u32 type, __u64 config);
 
 struct tep_event *event_format__new(const char *sys, const char *name);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 14/27] perf stat: Add default hybrid events
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (12 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 13/27] perf evlist: Create two hybrid 'cycles' events by default Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 15/27] perf stat: Filter out unmatched aggregation for hybrid event Jin Yao
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Previously, if '-e' is not specified in perf stat, some software and
hardware events are added to the evlist by default.

  root@otcpl-adl-s-2:~# ./perf stat  -- ./triad_loop

   Performance counter stats for './triad_loop':

              109.43 msec task-clock                #    0.993 CPUs utilized
                   1      context-switches          #    0.009 K/sec
                   0      cpu-migrations            #    0.000 K/sec
                 105      page-faults               #    0.960 K/sec
         401,161,982      cycles                    #    3.666 GHz
       1,601,216,357      instructions              #    3.99  insn per cycle
         200,217,751      branches                  # 1829.686 M/sec
              14,555      branch-misses             #    0.01% of all branches

         0.110176860 seconds time elapsed

Among the events, cycles, instructions, branches and branch-misses
are hardware events.

On a hybrid platform, two events are created for each hardware event:

core cycles,
atom cycles,
core instructions,
atom instructions,
core branches,
atom branches,
core branch-misses,
atom branch-misses

These events are added to the evlist in this order on a hybrid platform
if '-e' is not set.

Since parse_events() already supports creating two hardware events for
one event on a hybrid platform, we can simply use parse_events(evlist,
"cycles,instructions,branches,branch-misses") to create the default
events and add them to the evlist.

After:

  root@ssp-pwrt-002:~# ./perf stat  -- ./triad_loop

   Performance counter stats for './triad_loop':

              290.77 msec task-clock                #    0.996 CPUs utilized
                  25      context-switches          #    0.086 K/sec
                  13      cpu-migrations            #    0.045 K/sec
                 107      page-faults               #    0.368 K/sec
         449,620,957      cpu_core/cycles/          # 1546.334 M/sec
       <not counted>      cpu_atom/cycles/                                              (0.00%)
       1,601,499,820      cpu_core/instructions/    # 5507.870 M/sec
       <not counted>      cpu_atom/instructions/                                        (0.00%)
         200,272,310      cpu_core/branches/        #  688.776 M/sec
       <not counted>      cpu_atom/branches/                                            (0.00%)
              15,255      cpu_core/branch-misses/   #    0.052 M/sec
       <not counted>      cpu_atom/branch-misses/                                       (0.00%)

         0.291897676 seconds time elapsed

We can see that two events are created for each hardware event: the
first is the core event, the second the atom event.

Note that the shadow stats now look a bit different; only 'M/sec' is
shown.

perf_stat__update_shadow_stats() and perf_stat__print_shadow_stats()
will need to be improved in the future if we want the original shadow
stats back.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-stat.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6c0a21323814..7a732508b2b4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1162,6 +1162,13 @@ static int parse_stat_cgroups(const struct option *opt,
 	return parse_cgroups(opt, str, unset);
 }
 
+static int add_default_hybrid_events(struct evlist *evlist)
+{
+	struct parse_events_error err;
+
+	return parse_events(evlist, "cycles,instructions,branches,branch-misses", &err);
+}
+
 static struct option stat_options[] = {
 	OPT_BOOLEAN('T', "transaction", &transaction_run,
 		    "hardware transaction statistics"),
@@ -1637,6 +1644,12 @@ static int add_default_attributes(void)
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS	},
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES		},
 
+};
+	struct perf_event_attr default_sw_attrs[] = {
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES	},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS		},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS		},
 };
 
 /*
@@ -1874,6 +1887,15 @@ static int add_default_attributes(void)
 	}
 
 	if (!evsel_list->core.nr_entries) {
+		perf_pmu__scan(NULL);
+		if (perf_pmu__hybrid_exist()) {
+			if (evlist__add_default_attrs(evsel_list,
+						      default_sw_attrs) < 0) {
+				return -1;
+			}
+			return add_default_hybrid_events(evsel_list);
+		}
+
 		if (target__has_cpu(&target))
 			default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 15/27] perf stat: Filter out unmatched aggregation for hybrid event
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (13 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 14/27] perf stat: Add default hybrid events Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group Jin Yao
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

perf-stat supports several aggregation modes, such as --per-core and
--per-socket. A hybrid event, however, may be available on only a
subset of cpus. So for --per-core we need to filter out the unavailable
cores, for --per-socket the unavailable sockets, and so on.

Before:

  root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0           2      1,604,426,524      cpu_core/cycles/
  S0-D0-C4           2      1,604,408,224      cpu_core/cycles/
  S0-D0-C8           2      1,605,995,644      cpu_core/cycles/
  S0-D0-C12          2      1,628,056,554      cpu_core/cycles/
  S0-D0-C16          2      1,611,488,734      cpu_core/cycles/
  S0-D0-C20          2      1,616,314,761      cpu_core/cycles/
  S0-D0-C24          2      1,603,558,295      cpu_core/cycles/
  S0-D0-C28          2      1,603,541,128      cpu_core/cycles/
  S0-D0-C32          0      <not counted>      cpu_core/cycles/
  S0-D0-C33          0      <not counted>      cpu_core/cycles/
  S0-D0-C34          0      <not counted>      cpu_core/cycles/
  S0-D0-C35          0      <not counted>      cpu_core/cycles/
  S0-D0-C36          0      <not counted>      cpu_core/cycles/
  S0-D0-C37          0      <not counted>      cpu_core/cycles/
  S0-D0-C38          0      <not counted>      cpu_core/cycles/
  S0-D0-C39          0      <not counted>      cpu_core/cycles/

After:

  root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0           2      1,621,781,943      cpu_core/cycles/
  S0-D0-C4           2      1,621,755,088      cpu_core/cycles/
  S0-D0-C8           2      1,604,276,920      cpu_core/cycles/
  S0-D0-C12          2      1,603,446,963      cpu_core/cycles/
  S0-D0-C16          2      1,604,231,725      cpu_core/cycles/
  S0-D0-C20          2      1,603,435,286      cpu_core/cycles/
  S0-D0-C24          2      1,603,387,250      cpu_core/cycles/
  S0-D0-C28          2      1,604,173,183      cpu_core/cycles/
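
The fix, in essence: before printing an aggregation line, check whether any of
the counter's own cpus maps to that aggregation id, and skip the line
otherwise (this is what aggr_id_hybrid_matched() in the diff below does). A
simplified sketch, using plain integer core ids in place of the real
struct aggr_cpu_id and aggr_get_id() callback:

```c
#include <assert.h>

/* Simplified: cpus[] stands in for the counter's evsel__cpus() map and
 * cpu_to_core[] for config->aggr_get_id().  Return 1 if any of the
 * counter's cpus belongs to the given core, 0 otherwise. */
static int counter_has_core(const int *cpus, int ncpus,
			    const int *cpu_to_core, int core)
{
	int i;

	for (i = 0; i < ncpus; i++) {
		if (cpu_to_core[cpus[i]] == core)
			return 1;
	}
	return 0;
}
```

A core id with no match (like S0-D0-C32 above for cpu_core/cycles/) is simply
not printed.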

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/stat-display.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index ed37d8e7ea1a..2db7c36a03ad 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -634,6 +634,20 @@ static void aggr_cb(struct perf_stat_config *config,
 	}
 }
 
+static bool aggr_id_hybrid_matched(struct perf_stat_config *config,
+				   struct evsel *counter, struct aggr_cpu_id id)
+{
+	struct aggr_cpu_id s;
+
+	for (int i = 0; i < evsel__nr_cpus(counter); i++) {
+		s = config->aggr_get_id(config, evsel__cpus(counter), i);
+		if (cpu_map__compare_aggr_cpu_id(s, id))
+			return true;
+	}
+
+	return false;
+}
+
 static void print_counter_aggrdata(struct perf_stat_config *config,
 				   struct evsel *counter, int s,
 				   char *prefix, bool metric_only,
@@ -647,6 +661,12 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
 	double uval;
 
 	ad.id = id = config->aggr_map->map[s];
+
+	if (perf_pmu__hybrid_exist() &&
+	    !aggr_id_hybrid_matched(config, counter, id)) {
+		return;
+	}
+
 	ad.val = ad.ena = ad.run = 0;
 	ad.nr = 0;
 	if (!collect_data(config, counter, aggr_cb, &ad))
-- 
2.17.1



* [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (14 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 15/27] perf stat: Filter out unmatched aggregation for hybrid event Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-15 23:03   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group Jin Yao
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

If a group contains events from different hybrid PMUs, show a warning.

This is to remind the user not to put core events and atom events into
the same group.

  root@ssp-pwrt-002:~# ./perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -- sleep 1
  WARNING: Group has events from different hybrid PMUs

   Performance counter stats for 'sleep 1':

       <not counted>      cpu_core/cycles/
     <not supported>      cpu_atom/cycles/

         1.002585908 seconds time elapsed

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-record.c |  3 +++
 tools/perf/builtin-stat.c   |  7 ++++++
 tools/perf/util/evlist.c    | 44 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h    |  2 ++
 4 files changed, 56 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 363ea1047148..188a1198cd4b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -929,6 +929,9 @@ static int record__open(struct record *rec)
 			        pos = evlist__reset_weak_group(evlist, pos, true);
 				goto try_again;
 			}
+
+			if (errno == EINVAL && perf_pmu__hybrid_exist())
+				evlist__warn_hybrid_group(evlist);
 			rc = -errno;
 			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
 			ui__error("%s\n", msg);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7a732508b2b4..6f780a039db0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
 	struct evsel *evsel, *pos, *leader;
 	char buf[1024];
 
+	if (evlist__hybrid_exist(evlist))
+		return;
+
 	evlist__for_each_entry(evlist, evsel) {
 		leader = evsel->leader;
 
@@ -726,6 +729,10 @@ enum counter_recovery {
 static enum counter_recovery stat_handle_error(struct evsel *counter)
 {
 	char msg[BUFSIZ];
+
+	if (perf_pmu__hybrid_exist() && errno == EINVAL)
+		evlist__warn_hybrid_group(evsel_list);
+
 	/*
 	 * PPC returns ENXIO for HW counters until 2.6.37
 	 * (behavior changed with commit b0a873e).
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f139151b9433..5ec891418cdd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -2224,3 +2224,47 @@ void evlist__invalidate_all_cpus(struct evlist *evlist)
 	perf_cpu_map__put(evlist->core.all_cpus);
 	evlist->core.all_cpus = perf_cpu_map__empty_new(1);
 }
+
+static bool group_hybrid_conflict(struct evsel *leader)
+{
+	struct evsel *pos, *prev = NULL;
+
+	for_each_group_evsel(pos, leader) {
+		if (!pos->pmu_name || !perf_pmu__is_hybrid(pos->pmu_name))
+			continue;
+
+		if (prev && strcmp(prev->pmu_name, pos->pmu_name))
+			return true;
+
+		prev = pos;
+	}
+
+	return false;
+}
+
+void evlist__warn_hybrid_group(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__is_group_leader(evsel) &&
+		    evsel->core.nr_members > 1 &&
+		    group_hybrid_conflict(evsel)) {
+			WARN_ONCE(1, "WARNING: Group has events from "
+				     "different hybrid PMUs\n");
+			return;
+		}
+	}
+}
+
+bool evlist__hybrid_exist(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__is_hybrid_event(evsel))
+			return true;
+	}
+
+	return false;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 0da683511d98..33dec3bb5739 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -369,4 +369,6 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
 void evlist__invalidate_all_cpus(struct evlist *evlist);
 
 bool evlist__has_hybrid_events(struct evlist *evlist);
+void evlist__warn_hybrid_group(struct evlist *evlist);
+bool evlist__hybrid_exist(struct evlist *evlist);
 #endif /* __PERF_EVLIST_H */
-- 
2.17.1



* [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (15 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-15 23:04   ` Jiri Olsa
  2021-03-11  7:07 ` [PATCH v2 18/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU Jin Yao
                   ` (9 subsequent siblings)
  26 siblings, 1 reply; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

A group mixing a hybrid event with a global event is allowed. For example,
the group leader is 'cpu-clock' and the group member is 'cpu_atom/cycles/'.

e.g.
perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a

The challenge is that their available cpus are not fully matched.
For example, 'cpu-clock' is available on CPU0-CPU23, but 'cpu_atom/cycles/'
is only available on CPU16-CPU23.

When getting the group fd for a group member, we must be very careful
because the cpu for 'cpu-clock' is not equal to the cpu for 'cpu_atom/cycles/'.
Actually the cpu here is the index into evsel->core.cpus, not the real CPU ID.
e.g. cpu0 for 'cpu-clock' is CPU0, but cpu0 for 'cpu_atom/cycles/' is CPU16.

Another challenge is the group read. The events in a group may not be
available on all cpus. For example, the leader is a software event and
it's available on CPU0-CPU1, but the group member is a hybrid event that
is only available on CPU1. For CPU0 we have only one event, but for CPU1
we have two. So we need to change the read size according to the real
number of events on that cpu.

Let's look at some examples.

  root@ssp-pwrt-002:~# ./perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a -vvv -- sleep 1
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             1
    size                             120
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0xa00000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 16  group_fd 20  flags 0x8 = 28
  sys_perf_event_open: pid -1  cpu 17  group_fd 21  flags 0x8 = 29
  sys_perf_event_open: pid -1  cpu 18  group_fd 22  flags 0x8 = 30
  sys_perf_event_open: pid -1  cpu 19  group_fd 23  flags 0x8 = 31
  sys_perf_event_open: pid -1  cpu 20  group_fd 24  flags 0x8 = 32
  sys_perf_event_open: pid -1  cpu 21  group_fd 25  flags 0x8 = 33
  sys_perf_event_open: pid -1  cpu 22  group_fd 26  flags 0x8 = 34
  sys_perf_event_open: pid -1  cpu 23  group_fd 27  flags 0x8 = 35
  cpu-clock: 0: 1002495141 1002496165 1002496165
  cpu-clock: 1: 1002494114 1002494741 1002494741
  cpu-clock: 2: 1002489755 1002491096 1002491096
  cpu-clock: 3: 1002486160 1002487660 1002487660
  cpu-clock: 4: 1002482342 1002483461 1002483461
  cpu-clock: 5: 1002480405 1002480756 1002480756
  cpu-clock: 6: 1002478135 1002478698 1002478698
  cpu-clock: 7: 1002476940 1002477251 1002477251
  cpu-clock: 8: 1002474281 1002475616 1002475616
  cpu-clock: 9: 1002471033 1002471983 1002471983
  cpu-clock: 10: 1002468437 1002469553 1002469553
  cpu-clock: 11: 1002467503 1002467892 1002467892
  cpu-clock: 12: 1002465167 1002466190 1002466190
  cpu-clock: 13: 1002463794 1002464109 1002464109
  cpu-clock: 14: 1002460746 1002461897 1002461897
  cpu-clock: 15: 1002460006 1002460371 1002460371
  cpu-clock: 16: 1002457619 1002458056 1002458056
  cpu-clock: 17: 1002451809 1002452414 1002452414
  cpu-clock: 18: 1002446385 1002446927 1002446927
  cpu-clock: 19: 1002442633 1002443203 1002443203
  cpu-clock: 20: 1002438330 1002438939 1002438939
  cpu-clock: 21: 1002432839 1002433483 1002433483
  cpu-clock: 22: 1002428951 1002429416 1002429416
  cpu-clock: 23: 1002423932 1002424604 1002424604
  cycles: 0: 800847113 1002458056 1002458056
  cycles: 1: 800843983 1002452414 1002452414
  cycles: 2: 800840233 1002446927 1002446927
  cycles: 3: 800837260 1002443203 1002443203
  cycles: 4: 800832030 1002438939 1002438939
  cycles: 5: 800829207 1002433483 1002433483
  cycles: 6: 800825621 1002429416 1002429416
  cycles: 7: 800822445 1002424604 1002424604
  cpu-clock: 24059136457 24059154481 24059154481
  cycles: 6406677892 8019527042 8019527042

   Performance counter stats for 'system wide':

           24,059.14 msec cpu-clock                 #   23.994 CPUs utilized
       6,406,677,892      cpu_atom/cycles/          #  266.289 M/sec

         1.002699058 seconds time elapsed

For cpu_atom/cycles/, cpu16-cpu23 are set with a valid group fd (cpu-clock's
fd on that cpu). For the counting results, cpu-clock aggregates over 24 cpus
and cpu_atom/cycles/ aggregates over 8 cpus. That's expected.

But if the event order is reversed, e.g. '{cpu_atom/cycles/,cpu-clock}',
there is more work to do.

  root@ssp-pwrt-002:~# ./perf stat -e '{cpu_atom/cycles/,cpu-clock}' -a -vvv -- sleep 1
  Control descriptor is not initialized
  ------------------------------------------------------------
  perf_event_attr:
    type                             6
    size                             120
    config                           0xa00000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 11
  ------------------------------------------------------------
  perf_event_attr:
    type                             1
    size                             120
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 27
  sys_perf_event_open: pid -1  cpu 16  group_fd 3  flags 0x8 = 28
  sys_perf_event_open: pid -1  cpu 17  group_fd 4  flags 0x8 = 29
  sys_perf_event_open: pid -1  cpu 18  group_fd 5  flags 0x8 = 30
  sys_perf_event_open: pid -1  cpu 19  group_fd 7  flags 0x8 = 31
  sys_perf_event_open: pid -1  cpu 20  group_fd 8  flags 0x8 = 32
  sys_perf_event_open: pid -1  cpu 21  group_fd 9  flags 0x8 = 33
  sys_perf_event_open: pid -1  cpu 22  group_fd 10  flags 0x8 = 34
  sys_perf_event_open: pid -1  cpu 23  group_fd 11  flags 0x8 = 35
  cycles: 0: 810965983 1002124999 1002124999
  cycles: 1: 810962706 1002118442 1002118442
  cycles: 2: 810959729 1002114853 1002114853
  cycles: 3: 810958079 1002111730 1002111730
  cycles: 4: 800570097 1002108582 1002108582
  cycles: 5: 800569278 1002106441 1002106441
  cycles: 6: 800568167 1002104339 1002104339
  cycles: 7: 800566760 1002102953 1002102953
  WARNING: for cpu-clock, some CPU counts not read
  cpu-clock: 0: 0 0 0
  cpu-clock: 1: 0 0 0
  cpu-clock: 2: 0 0 0
  cpu-clock: 3: 0 0 0
  cpu-clock: 4: 0 0 0
  cpu-clock: 5: 0 0 0
  cpu-clock: 6: 0 0 0
  cpu-clock: 7: 0 0 0
  cpu-clock: 8: 0 0 0
  cpu-clock: 9: 0 0 0
  cpu-clock: 10: 0 0 0
  cpu-clock: 11: 0 0 0
  cpu-clock: 12: 0 0 0
  cpu-clock: 13: 0 0 0
  cpu-clock: 14: 0 0 0
  cpu-clock: 15: 0 0 0
  cpu-clock: 16: 1002125111 1002124999 1002124999
  cpu-clock: 17: 1002118626 1002118442 1002118442
  cpu-clock: 18: 1002115058 1002114853 1002114853
  cpu-clock: 19: 1002111740 1002111730 1002111730
  cpu-clock: 20: 1002109031 1002108582 1002108582
  cpu-clock: 21: 1002105927 1002106441 1002106441
  cpu-clock: 22: 1002104010 1002104339 1002104339
  cpu-clock: 23: 1002102730 1002102953 1002102953
  cycles: 6446120799 8016892339 8016892339
  cpu-clock: 8016892233 8016892339 8016892339

   Performance counter stats for 'system wide':

       6,446,120,799      cpu_atom/cycles/          #  804.067 M/sec
            8,016.89 msec cpu-clock                 #    7.999 CPUs utilized

         1.002212870 seconds time elapsed

For cpu-clock, cpu16-cpu23 are set with a valid group fd (cpu_atom/cycles/'s
fd on that cpu). For the counting results, cpu_atom/cycles/ has an 8-cpu
aggregation, which is correct. But cpu-clock also ends up with an 8-cpu
aggregation (cpu16-cpu23, not all cpus), so the code should be improved.
For now a warning is displayed: "WARNING: for cpu-clock, some CPU counts
not read".

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/evsel.c | 105 ++++++++++++++++++++++++++++++++++++++--
 tools/perf/util/stat.h  |   1 +
 2 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index e0b6227d263f..862fdc145f05 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1464,15 +1464,26 @@ static void evsel__set_count(struct evsel *counter, int cpu, int thread, u64 val
 	perf_counts__set_loaded(counter->counts, cpu, thread, true);
 }
 
-static int evsel__process_group_data(struct evsel *leader, int cpu, int thread, u64 *data)
+static int evsel_cpuid_match(struct evsel *evsel1, struct evsel *evsel2,
+			     int cpu)
+{
+	int cpuid;
+
+	cpuid = perf_cpu_map__cpu(evsel1->core.cpus, cpu);
+	return perf_cpu_map__idx(evsel2->core.cpus, cpuid);
+}
+
+static int evsel__process_group_data(struct evsel *leader, int cpu, int thread,
+				     u64 *data, int nr_members)
 {
 	u64 read_format = leader->core.attr.read_format;
 	struct sample_read_value *v;
 	u64 nr, ena = 0, run = 0, i;
+	int idx;
 
 	nr = *data++;
 
-	if (nr != (u64) leader->core.nr_members)
+	if (nr != (u64) nr_members)
 		return -EINVAL;
 
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -1492,24 +1503,85 @@ static int evsel__process_group_data(struct evsel *leader, int cpu, int thread,
 		if (!counter)
 			return -EINVAL;
 
-		evsel__set_count(counter, cpu, thread, v[i].value, ena, run);
+		if (evsel__is_hybrid_event(counter) ||
+		    evsel__is_hybrid_event(leader)) {
+			idx = evsel_cpuid_match(leader, counter, cpu);
+			if (idx == -1)
+				return -EINVAL;
+		} else
+			idx = cpu;
+
+		evsel__set_count(counter, idx, thread, v[i].value, ena, run);
 	}
 
 	return 0;
 }
 
+static int hybrid_read_size(struct evsel *leader, int cpu, int *nr_members)
+{
+	struct evsel *pos;
+	int nr = 1, back, new_size = 0, idx;
+
+	for_each_group_member(pos, leader) {
+		idx = evsel_cpuid_match(leader, pos, cpu);
+		if (idx != -1)
+			nr++;
+	}
+
+	if (nr != leader->core.nr_members) {
+		back = leader->core.nr_members;
+		leader->core.nr_members = nr;
+		new_size = perf_evsel__read_size(&leader->core);
+		leader->core.nr_members = back;
+	}
+
+	*nr_members = nr;
+	return new_size;
+}
+
 static int evsel__read_group(struct evsel *leader, int cpu, int thread)
 {
 	struct perf_stat_evsel *ps = leader->stats;
 	u64 read_format = leader->core.attr.read_format;
 	int size = perf_evsel__read_size(&leader->core);
+	int new_size, nr_members;
 	u64 *data = ps->group_data;
 
 	if (!(read_format & PERF_FORMAT_ID))
 		return -EINVAL;
 
-	if (!evsel__is_group_leader(leader))
+	if (!evsel__is_group_leader(leader)) {
+		if (evsel__is_hybrid_event(leader->leader) &&
+		    !evsel__is_hybrid_event(leader)) {
+			/*
+			 * The group leader is hybrid event and it's
+			 * only available on part of cpus. But the group
+			 * member are available on all cpus. TODO:
+			 * read the counts on the rest of cpus for group
+			 * member.
+			 */
+			WARN_ONCE(1, "WARNING: for %s, some CPU counts "
+				     "not read\n", leader->name);
+			return 0;
+		}
 		return -EINVAL;
+	}
+
+	/*
+	 * For example the leader is a software event and it's available on
+	 * cpu0-cpu1, but the group member is a hybrid event and it's only
+	 * available on cpu1. For cpu0, we have only one event, but for cpu1
+	 * we have two events. So we need to change the read size according to
+	 * the real number of events on a given cpu.
+	 */
+	new_size = hybrid_read_size(leader, cpu, &nr_members);
+	if (new_size)
+		size = new_size;
+
+	if (ps->group_data && ps->group_data_size < size) {
+		zfree(&ps->group_data);
+		data = NULL;
+	}
 
 	if (!data) {
 		data = zalloc(size);
@@ -1517,6 +1589,7 @@ static int evsel__read_group(struct evsel *leader, int cpu, int thread)
 			return -ENOMEM;
 
 		ps->group_data = data;
+		ps->group_data_size = size;
 	}
 
 	if (FD(leader, cpu, thread) < 0)
@@ -1525,7 +1598,7 @@ static int evsel__read_group(struct evsel *leader, int cpu, int thread)
 	if (readn(FD(leader, cpu, thread), data, size) <= 0)
 		return -errno;
 
-	return evsel__process_group_data(leader, cpu, thread, data);
+	return evsel__process_group_data(leader, cpu, thread, data, nr_members);
 }
 
 int evsel__read_counter(struct evsel *evsel, int cpu, int thread)
@@ -1572,6 +1645,28 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread)
 	 */
 	BUG_ON(!leader->core.fd);
 
+	/*
+	 * If leader is not hybrid event, it's available on
+	 * all cpus (e.g. software event). But hybrid evsel
+	 * member is only available on part of cpus. So need
+	 * to get the leader's fd from correct cpu.
+	 */
+	if (evsel__is_hybrid_event(evsel) &&
+	    !evsel__is_hybrid_event(leader)) {
+		cpu = evsel_cpuid_match(evsel, leader, cpu);
+		BUG_ON(cpu == -1);
+	}
+
+	/*
+	 * Leader is hybrid event but member is global event.
+	 */
+	if (!evsel__is_hybrid_event(evsel) &&
+	    evsel__is_hybrid_event(leader)) {
+		cpu = evsel_cpuid_match(evsel, leader, cpu);
+		if (cpu == -1)
+			return -1;
+	}
+
 	fd = FD(leader, cpu, thread);
 	BUG_ON(fd == -1);
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index d85c292148bb..4aec97d32e69 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -46,6 +46,7 @@ struct perf_stat_evsel {
 	struct stats		 res_stats[3];
 	enum perf_stat_evsel_id	 id;
 	u64			*group_data;
+	int			 group_data_size;
 };
 
 enum aggr_mode {
-- 
2.17.1



* [PATCH v2 18/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (16 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test Jin Yao
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

For a hybrid system, the perf subsystem doesn't know which PMU the
events belong to. So the PMU-aware types PERF_TYPE_HARDWARE_PMU and
PERF_TYPE_HW_CACHE_PMU are introduced.

Now define the new output[] entries for these two types.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-script.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 5915f19cee55..d0e889e636d5 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -275,6 +275,30 @@ static struct {
 		.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
 	},
 
+	[PERF_TYPE_HARDWARE_PMU] = {
+		.user_set = false,
+
+		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+			      PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+			      PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+			      PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+			      PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+		.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+	},
+
+	[PERF_TYPE_HW_CACHE_PMU] = {
+		.user_set = false,
+
+		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+			      PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+			      PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+			      PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+			      PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+		.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+	},
+
 	[OUTPUT_TYPE_SYNTH] = {
 		.user_set = false,
 
-- 
2.17.1



* [PATCH v2 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (17 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 18/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test Jin Yao
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Add basic hybrid test cases for 'Parse event definition strings' test.

root@otcpl-adl-s-2:~# ./perf test 6
 6: Parse event definition strings                                  : Ok

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/parse-events.c | 171 ++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index a7f6661e6112..aec929867020 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1512,6 +1512,123 @@ static int test__all_tracepoints(struct evlist *evlist)
 	return test__checkevent_tracepoint_multi(evlist);
 }
 
+static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
+{
+	struct evsel *evsel = evlist__first(evlist);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000000 == evsel->core.attr.config);
+	return 0;
+}
+
+static int test__hybrid_hw_event(struct evlist *evlist)
+{
+	struct evsel *evsel1 = evlist__first(evlist);
+	struct evsel *evsel2 = evlist__last(evlist);
+
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel1->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000000 == evsel1->core.attr.config);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel2->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0xa00000000 == evsel2->core.attr.config);
+	return 0;
+}
+
+static int test__hybrid_hw_group_event(struct evlist *evlist)
+{
+	struct evsel *evsel, *leader;
+
+	evsel = leader = evlist__first(evlist);
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000000 == evsel->core.attr.config);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+	evsel = evsel__next(evsel);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000001 == evsel->core.attr.config);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+	return 0;
+}
+
+static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
+{
+	struct evsel *evsel, *leader;
+
+	evsel = leader = evlist__first(evlist);
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+	evsel = evsel__next(evsel);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000000 == evsel->core.attr.config);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+	return 0;
+}
+
+static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
+{
+	struct evsel *evsel, *leader;
+
+	evsel = leader = evlist__first(evlist);
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000000 == evsel->core.attr.config);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+	evsel = evsel__next(evsel);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+	return 0;
+}
+
+static int test__hybrid_group_modifier1(struct evlist *evlist)
+{
+	struct evsel *evsel, *leader;
+
+	evsel = leader = evlist__first(evlist);
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000000 == evsel->core.attr.config);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
+
+	evsel = evsel__next(evsel);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x400000001 == evsel->core.attr.config);
+	TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+	TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
+	return 0;
+}
+
+static int test__hybrid_raw1(struct evlist *evlist)
+{
+	struct evsel *evsel = evlist__first(evlist);
+
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+
+	/* The type of the second event is a random value */
+	evsel = evsel__next(evsel);
+	TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+	return 0;
+}
+
+static int test__hybrid_raw2(struct evlist *evlist)
+{
+	struct evsel *evsel = evlist__first(evlist);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+	return 0;
+}
+
 struct evlist_test {
 	const char *name;
 	__u32 type;
@@ -1868,6 +1985,54 @@ static struct terms_test test__terms[] = {
 	},
 };
 
+static struct evlist_test test__hybrid_events[] = {
+	{
+		.name  = "cpu_core/cycles/",
+		.check = test__hybrid_hw_event_with_pmu,
+		.id    = 0,
+	},
+	{
+		.name  = "cycles",
+		.check = test__hybrid_hw_event,
+		.id    = 1,
+	},
+	{
+		.name  = "{cpu_core/cycles/,cpu_core/instructions/}",
+		.check = test__hybrid_hw_group_event,
+		.id    = 2,
+	},
+	{
+		.name  = "{cpu-clock,cpu_core/cycles/}",
+		.check = test__hybrid_sw_hw_group_event,
+		.id    = 3,
+	},
+	{
+		.name  = "{cpu_core/cycles/,cpu-clock}",
+		.check = test__hybrid_hw_sw_group_event,
+		.id    = 4,
+	},
+	{
+		.name  = "{cpu_core/cycles:k/,cpu_core/instructions:u/}",
+		.check = test__hybrid_group_modifier1,
+		.id    = 5,
+	},
+	{
+		.name  = "r1a",
+		.check = test__hybrid_raw1,
+		.id    = 6,
+	},
+	{
+		.name  = "cpu_core/r1a/",
+		.check = test__hybrid_raw2,
+		.id    = 7,
+	},
+	{
+		.name  = "cpu_core/config=10,config1,config2=3,period=1000/u",
+		.check = test__checkevent_pmu,
+		.id    = 0,
+	},
+};
+
 static int test_event(struct evlist_test *e)
 {
 	struct parse_events_error err;
@@ -2035,6 +2200,12 @@ do {							\
 		ret2 = ret1;				\
 } while (0)
 
+	perf_pmu__scan(NULL);
+	if (perf_pmu__hybrid_exist()) {
+		TEST_EVENTS(test__hybrid_events);
+		return ret2;
+	}
+
 	TEST_EVENTS(test__events);
 
 	if (test_pmu())
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (18 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid Jin Yao
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

On a hybrid platform, two hybrid events are created for each hardware event.

For example,

evsel->idx	evsel__name(evsel)
0		cycles
1		cycles
2		instructions
3		instructions
...

So when comparing evsel names on hybrid, evsel->idx
needs to be divided by 2.
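
The index mapping above can be sketched in a few lines. This is an illustrative Python model of the test's lookup (not perf code); `name_for_idx` and `distance` are names chosen here to mirror the C helper's parameters:

```python
# Illustrative model of the hybrid name lookup in the test:
# with two hybrid evsels created per hardware event,
# names[evsel->idx / 2] recovers the expected event name.

def name_for_idx(names, idx, distance):
    """Mirror of names[evsel->idx / distance] in the C test."""
    return names[idx // distance]

hw_names = ["cycles", "instructions"]

# Hybrid platform: two evsels are created per hardware event.
for idx, expected in enumerate(["cycles", "cycles",
                                "instructions", "instructions"]):
    assert name_for_idx(hw_names, idx, 2) == expected

# Non-hybrid platform: distance is 1, indices map one-to-one.
assert name_for_idx(hw_names, 1, 1) == "instructions"
```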

root@otcpl-adl-s-2:~# ./perf test 14
14: Roundtrip evsel->name                                           : Ok

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/evsel-roundtrip-name.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/evsel-roundtrip-name.c b/tools/perf/tests/evsel-roundtrip-name.c
index f7f3e5b4c180..2b938a15901e 100644
--- a/tools/perf/tests/evsel-roundtrip-name.c
+++ b/tools/perf/tests/evsel-roundtrip-name.c
@@ -62,7 +62,8 @@ static int perf_evsel__roundtrip_cache_name_test(void)
 	return ret;
 }
 
-static int __perf_evsel__name_array_test(const char *names[], int nr_names)
+static int __perf_evsel__name_array_test(const char *names[], int nr_names,
+					 int distance)
 {
 	int i, err;
 	struct evsel *evsel;
@@ -82,9 +83,9 @@ static int __perf_evsel__name_array_test(const char *names[], int nr_names)
 
 	err = 0;
 	evlist__for_each_entry(evlist, evsel) {
-		if (strcmp(evsel__name(evsel), names[evsel->idx])) {
+		if (strcmp(evsel__name(evsel), names[evsel->idx / distance])) {
 			--err;
-			pr_debug("%s != %s\n", evsel__name(evsel), names[evsel->idx]);
+			pr_debug("%s != %s\n", evsel__name(evsel), names[evsel->idx / distance]);
 		}
 	}
 
@@ -93,18 +94,22 @@ static int __perf_evsel__name_array_test(const char *names[], int nr_names)
 	return err;
 }
 
-#define perf_evsel__name_array_test(names) \
-	__perf_evsel__name_array_test(names, ARRAY_SIZE(names))
+#define perf_evsel__name_array_test(names, distance) \
+	__perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance)
 
 int test__perf_evsel__roundtrip_name_test(struct test *test __maybe_unused, int subtest __maybe_unused)
 {
 	int err = 0, ret = 0;
 
-	err = perf_evsel__name_array_test(evsel__hw_names);
+	perf_pmu__scan(NULL);
+	if (perf_pmu__hybrid_exist())
+		return perf_evsel__name_array_test(evsel__hw_names, 2);
+
+	err = perf_evsel__name_array_test(evsel__hw_names, 1);
 	if (err)
 		ret = err;
 
-	err = __perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1);
+	err = __perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1, 1);
 	if (err)
 		ret = err;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (19 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 22/27] perf tests: Support 'Track with sched_switch' " Jin Yao
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

For hybrid, the kernel introduces a new perf type, PERF_TYPE_HARDWARE_PMU
(6), which is assigned to hybrid hardware events.

root@otcpl-adl-s-2:~# ./perf test 17 -vvv
  ...
  compare
    matching [event:base-stat]
      to [event-6-17179869184-4]
      [cpu] * 0
      [flags] 0|8 8
      [type] 0 6
    ->FAIL
    match: [event:base-stat] matches []
    event:base-stat does not match, but is optional
  matched
  compare
    matching [event-6-17179869184-4]
      to [event:base-stat]
      [cpu] 0 *
      [flags] 8 0|8
      [type] 6 0
    ->FAIL
    match: [event-6-17179869184-4] matches []
expected type=6, got 0
expected config=17179869184, got 0
FAILED './tests/attr/test-stat-C0' - match failure

The type matching fails because the expected type is 0 but the
type of the hybrid hardware event is 6. Temporarily skip this
test case; updating it for hybrid is a TODO for the future.
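
For reference, the expected config in the log above decodes cleanly under the new layout. A hedged sketch of that arithmetic (plain Python, not perf code; the helper name is invented here):

```python
# Decode a PERF_TYPE_HARDWARE_PMU config (layout 0xDD000000AA from
# this series): the high bits carry the PMU type id, the low 32 bits
# carry the original hardware event id.

def decode_hw_pmu_config(config):
    return config >> 32, config & 0xFFFFFFFF

# 17179869184 from the failing test log is 0x400000000:
# PMU type 4 (cpu_core) and hardware event id 0 (cycles).
pmu_type, event_id = decode_hw_pmu_config(17179869184)
assert (pmu_type, event_id) == (4, 0)
```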

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/attr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index dd39ce9b0277..fc7f74159764 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -34,6 +34,7 @@
 #include "event.h"
 #include "util.h"
 #include "tests.h"
+#include "pmu-hybrid.h"
 
 #define ENV "PERF_TEST_ATTR"
 
@@ -184,6 +185,10 @@ int test__attr(struct test *test __maybe_unused, int subtest __maybe_unused)
 	char path_dir[PATH_MAX];
 	char *exec_path;
 
+	perf_pmu__scan(NULL);
+	if (perf_pmu__hybrid_exist())
+		return 0;
+
 	/* First try development tree tests. */
 	if (!lstat("./tests", &st))
 		return run_dir("./tests", "./perf");
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 22/27] perf tests: Support 'Track with sched_switch' test for hybrid
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (20 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 23/27] perf tests: Support 'Parse and process metrics' " Jin Yao
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

On a hybrid platform, "cycles:u" creates two "cycles" events, so
the number of events in the evlist is not what the next test steps
expect. Just use the single event "cpu_core/cycles:u/" for hybrid.
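
The selection logic the patch adds can be modeled in a few lines; this Python sketch is illustrative only (the helper names are invented here, not perf API):

```python
# Illustrative model of the patch's fallback: on hybrid, a plain
# hardware event expands to one evsel per hybrid pmu, while a
# PMU-qualified event string always yields exactly one evsel.

def cycles_event(hybrid):
    return "cpu_core/cycles:u/" if hybrid else "cycles:u"

def evsel_count(event, hybrid, nr_hybrid_pmus=2):
    if hybrid and "/" not in event:
        return nr_hybrid_pmus
    return 1

# Non-hybrid: the plain event is fine.
assert evsel_count(cycles_event(False), hybrid=False) == 1
# Hybrid: the PMU-qualified event keeps the evlist at one entry.
assert evsel_count(cycles_event(True), hybrid=True) == 1
# Hybrid with the plain event would have created two entries.
assert evsel_count("cycles:u", hybrid=True) == 2
```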

root@otcpl-adl-s-2:~# ./perf test 35
35: Track with sched_switch                                         : Ok

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/switch-tracking.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c
index 3ebaa758df77..13a11ce51a1a 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -340,6 +340,11 @@ int test__switch_tracking(struct test *test __maybe_unused, int subtest __maybe_
 	struct evsel *switch_evsel, *tracking_evsel;
 	const char *comm;
 	int err = -1;
+	bool hybrid = false;
+
+	perf_pmu__scan(NULL);
+	if (perf_pmu__hybrid_exist())
+		hybrid = true;
 
 	threads = thread_map__new(-1, getpid(), UINT_MAX);
 	if (!threads) {
@@ -371,7 +376,10 @@ int test__switch_tracking(struct test *test __maybe_unused, int subtest __maybe_
 	cpu_clocks_evsel = evlist__last(evlist);
 
 	/* Second event */
-	err = parse_events(evlist, "cycles:u", NULL);
+	if (!hybrid)
+		err = parse_events(evlist, "cycles:u", NULL);
+	else
+		err = parse_events(evlist, "cpu_core/cycles:u/", NULL);
 	if (err) {
 		pr_debug("Failed to parse event cycles:u\n");
 		goto out_err;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 23/27] perf tests: Support 'Parse and process metrics' test for hybrid
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (21 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 22/27] perf tests: Support 'Track with sched_switch' " Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 24/27] perf tests: Support 'Session topology' " Jin Yao
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Some of the events used by this test are not supported on hybrid,
so only run the applicable cases there.

root@otcpl-adl-s-2:~# ./perf test 67
67: Parse and process metrics                                       : Ok

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/parse-metric.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 55bf52e588be..149b18f1f96a 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -370,12 +370,17 @@ static int test_metric_group(void)
 
 int test__parse_metric(struct test *test __maybe_unused, int subtest __maybe_unused)
 {
+	perf_pmu__scan(NULL);
+
 	TEST_ASSERT_VAL("IPC failed", test_ipc() == 0);
 	TEST_ASSERT_VAL("frontend failed", test_frontend() == 0);
-	TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 0);
 	TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
 	TEST_ASSERT_VAL("recursion fail failed", test_recursion_fail() == 0);
-	TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
 	TEST_ASSERT_VAL("Memory bandwidth", test_memory_bandwidth() == 0);
-	return 0;
+
+	if (!perf_pmu__hybrid_exist()) {
+		TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 0);
+		TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
+	}
+	return 0;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 24/27] perf tests: Support 'Session topology' test for hybrid
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (22 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 23/27] perf tests: Support 'Parse and process metrics' " Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 25/27] perf tests: Support 'Convert perf time to TSC' " Jin Yao
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Force creation of the single event "cpu_core/cycles/" by default,
otherwise the 'if (evlist->core.nr_entries == 1)' check in
evlist__valid_sample_type would fail.

root@otcpl-adl-s-2:~# ./perf test 41
41: Session topology                                                : Ok

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/topology.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index 74748ed75b2c..0f6e73baab2d 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -40,7 +40,15 @@ static int session_write_header(char *path)
 	session = perf_session__new(&data, false, NULL);
 	TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
 
-	session->evlist = evlist__new_default();
+	perf_pmu__scan(NULL);
+	if (!perf_pmu__hybrid_exist()) {
+		session->evlist = evlist__new_default();
+	} else {
+		struct parse_events_error err;
+
+		session->evlist = evlist__new();
+		parse_events(session->evlist, "cpu_core/cycles/", &err);
+	}
 	TEST_ASSERT_VAL("can't get evlist", session->evlist);
 
 	perf_header__set_feat(&session->header, HEADER_CPU_TOPOLOGY);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 25/27] perf tests: Support 'Convert perf time to TSC' test for hybrid
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (23 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 24/27] perf tests: Support 'Session topology' " Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' " Jin Yao
  2021-03-11  7:07 ` [PATCH v2 27/27] perf Documentation: Document intel-hybrid support Jin Yao
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

On a hybrid platform, "cycles:u" creates two "cycles" events,
so the second evsel in the evlist also needs initialization.

With this patch,

root@otcpl-adl-s-2:~# ./perf test 71
71: Convert perf time to TSC                                        : Ok

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/perf-time-to-tsc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c
index 680c3cffb128..b472205ec8e3 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -66,6 +66,11 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
 	u64 test_tsc, comm1_tsc, comm2_tsc;
 	u64 test_time, comm1_time = 0, comm2_time = 0;
 	struct mmap *md;
+	bool hybrid = false;
+
+	perf_pmu__scan(NULL);
+	if (perf_pmu__hybrid_exist())
+		hybrid = true;
 
 	threads = thread_map__new(-1, getpid(), UINT_MAX);
 	CHECK_NOT_NULL__(threads);
@@ -88,6 +93,17 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
 	evsel->core.attr.disabled = 1;
 	evsel->core.attr.enable_on_exec = 0;
 
+	/*
+	 * For hybrid "cycles:u", it creates two events.
+	 * Init the second evsel here.
+	 */
+	if (hybrid) {
+		evsel = evsel__next(evsel);
+		evsel->core.attr.comm = 1;
+		evsel->core.attr.disabled = 1;
+		evsel->core.attr.enable_on_exec = 0;
+	}
+
 	CHECK__(evlist__open(evlist));
 
 	CHECK__(evlist__mmap(evlist, UINT_MAX));
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (24 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 25/27] perf tests: Support 'Convert perf time to TSC' " Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  2021-03-11  7:07 ` [PATCH v2 27/27] perf Documentation: Document intel-hybrid support Jin Yao
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Currently we don't support shadow stat for hybrid.

  root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1

   Performance counter stats for 'system wide':

      12,883,109,591      cpu_core/cycles/
       6,405,163,221      cpu_atom/cycles/
         555,553,778      cpu_core/instructions/
         841,158,734      cpu_atom/instructions/

         1.002644773 seconds time elapsed

No 'insn per cycle' shadow stat is reported yet. It will be supported
later; for now, just skip the 'perf stat metrics (shadow stat) test'.
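
The missing shadow stat is simple arithmetic over the counts shown above; this Python sketch is illustrative only and is not the eventual perf implementation:

```python
# Per-pmu 'insn per cycle' from the perf-stat output above. A future
# hybrid shadow-stat would need to pair instructions with cycles from
# the SAME pmu rather than mixing cpu_core and cpu_atom counts.

def insn_per_cycle(instructions, cycles):
    return instructions / cycles

core_ipc = insn_per_cycle(555_553_778, 12_883_109_591)
atom_ipc = insn_per_cycle(841_158_734, 6_405_163_221)

assert round(core_ipc, 3) == 0.043
assert round(atom_ipc, 3) == 0.131
```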

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/shell/stat+shadow_stat.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh b/tools/perf/tests/shell/stat+shadow_stat.sh
index ebebd3596cf9..e6e35fc6c882 100755
--- a/tools/perf/tests/shell/stat+shadow_stat.sh
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -7,6 +7,9 @@ set -e
 # skip if system-wide mode is forbidden
 perf stat -a true > /dev/null 2>&1 || exit 2
 
+# skip if on hybrid platform
+perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2
+
 test_global_aggr()
 {
 	perf stat -a --no-big-num -e cycles,instructions sleep 1  2>&1 | \
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 27/27] perf Documentation: Document intel-hybrid support
  2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
                   ` (25 preceding siblings ...)
  2021-03-11  7:07 ` [PATCH v2 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' " Jin Yao
@ 2021-03-11  7:07 ` Jin Yao
  26 siblings, 0 replies; 59+ messages in thread
From: Jin Yao @ 2021-03-11  7:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Add documentation and examples to help users understand
Intel hybrid perf support.
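
The config layouts described in the documentation below (0xDD000000AA and 0xDD00CCBBAA) can be sanity-checked with a few lines; this Python sketch is illustrative only and its helper names are invented here:

```python
# Encode configs for the two new hybrid event types documented here.

def hw_pmu_config(pmu_type, hw_event_id):
    # PERF_TYPE_HARDWARE_PMU layout 0xDD000000AA:
    # DD = PMU type id, AA = original hardware event id.
    return (pmu_type << 32) | hw_event_id

def hw_cache_pmu_config(pmu_type, cache_id, op_id, result_id):
    # PERF_TYPE_HW_CACHE_PMU layout 0xDD00CCBBAA:
    # DD = PMU type, CC = op result id, BB = op id, AA = cache id.
    return (pmu_type << 32) | (result_id << 16) | (op_id << 8) | cache_id

# 'cycles' (hw event id 0) on cpu_core (type 4) and cpu_atom (type 10):
assert hw_pmu_config(4, 0) == 0x400000000
assert hw_pmu_config(10, 0) == 0xa00000000
```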

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/Documentation/intel-hybrid.txt | 228 ++++++++++++++++++++++
 tools/perf/Documentation/perf-record.txt  |   1 +
 tools/perf/Documentation/perf-stat.txt    |   2 +
 3 files changed, 231 insertions(+)
 create mode 100644 tools/perf/Documentation/intel-hybrid.txt

diff --git a/tools/perf/Documentation/intel-hybrid.txt b/tools/perf/Documentation/intel-hybrid.txt
new file mode 100644
index 000000000000..ff641d9ac81b
--- /dev/null
+++ b/tools/perf/Documentation/intel-hybrid.txt
@@ -0,0 +1,228 @@
+Intel hybrid support
+--------------------
+Support for Intel hybrid events within perf tools.
+
+Some Intel platforms, such as AlderLake, are hybrid platforms consisting
+of atom cpus and core cpus. Each cpu type has a dedicated event list. Some
+events are available only on the core cpus, some only on the atom cpus,
+and some on both.
+
+The kernel exports two new cpu pmus via sysfs:
+/sys/devices/cpu_core
+/sys/devices/cpu_atom
+
+The 'cpus' files are created under the directories. For example,
+
+cat /sys/devices/cpu_core/cpus
+0-15
+
+cat /sys/devices/cpu_atom/cpus
+16-23
+
+This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
+
+Quickstart
+
+List hybrid event
+-----------------
+
+As before, use perf-list to list the symbolic event.
+
+perf list
+
+inst_retired.any
+	[Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
+inst_retired.any
+	[Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]
+
+The 'Unit: xxx' is added to the brief description to indicate which pmu
+the event belongs to. The same event name can be supported on
+different pmus.
+
+Enable hybrid event with a specific pmu
+---------------------------------------
+
+To enable a core-only or atom-only event, the following syntax is supported:
+
+	cpu_core/<event name>/
+or
+	cpu_atom/<event name>/
+
+For example, count the 'cycles' event on core cpus.
+
+	perf stat -e cpu_core/cycles/
+
+Create two events for one hardware event automatically
+------------------------------------------------------
+
+When creating an event that is available on both atom and core,
+two events are created automatically: one for atom, the other for
+core. Most hardware events and cache events are available on both
+cpu_core and cpu_atom.
+
+Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
+hybrid platform, the kernel needs to know where the event comes from
+(from atom or from core). The original perf event type PERF_TYPE_HARDWARE
+can't carry pmu information, so a new type PERF_TYPE_HARDWARE_PMU is
+introduced.
+
+The new attr.config layout for PERF_TYPE_HARDWARE_PMU:
+
+0xDD000000AA
+AA: original hardware event ID
+DD: PMU type ID
+
+Cache events are similar. A new type PERF_TYPE_HW_CACHE_PMU is introduced.
+
+The new attr.config layout for PERF_TYPE_HW_CACHE_PMU:
+
+0xDD00CCBBAA
+AA: original hardware cache ID
+BB: original hardware cache op ID
+CC: original hardware cache op result ID
+DD: PMU type ID
+
+The PMU type ID is retrieved from sysfs:
+
+cat /sys/devices/cpu_atom/type
+10
+
+cat /sys/devices/cpu_core/type
+4
+
+When enabling a hardware event without a specified pmu, such as
+'perf stat -e cycles -a' (system-wide in this example), two events
+are created automatically.
+
+    ------------------------------------------------------------
+    perf_event_attr:
+      type                             6
+      size                             120
+      config                           0x400000000
+      sample_type                      IDENTIFIER
+      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+      disabled                         1
+      inherit                          1
+      exclude_guest                    1
+    ------------------------------------------------------------
+
+and
+
+    ------------------------------------------------------------
+    perf_event_attr:
+      type                             6
+      size                             120
+      config                           0xa00000000
+      sample_type                      IDENTIFIER
+      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+      disabled                         1
+      inherit                          1
+      exclude_guest                    1
+    ------------------------------------------------------------
+
+Type 6 is PERF_TYPE_HARDWARE_PMU.
+0x4 in 0x400000000 indicates the cpu_core pmu.
+0xa in 0xa00000000 indicates the cpu_atom pmu (the atom pmu type id is
+assigned dynamically).
+
+The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
+and creates 'cycles' (0xa00000000) on cpu16-cpu23 (atom cpus).
+
+The perf-stat result displays two events:
+
+ Performance counter stats for 'system wide':
+
+    12,869,720,529      cpu_core/cycles/
+     6,405,459,328      cpu_atom/cycles/
+
+The first 'cycles' is the core event, the second is the atom event.
+
+Thread mode example:
+--------------------
+
+perf-stat reports the scaled counts for hybrid events, with a percentage
+displayed. The percentage is the event's running time / enabling time.
+
+For example, 'triad_loop' runs on cpu16 (an atom cpu); the scaled
+value for core cycles is 262,458,394 and the percentage is 0.30%.
+
+perf stat -e cycles -- taskset -c 16 ./triad_loop
+
+As before, two events are created.
+
+------------------------------------------------------------
+perf_event_attr:
+  type                             6
+  size                             120
+  config                           0x400000000
+  sample_type                      IDENTIFIER
+  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled                         1
+  inherit                          1
+  enable_on_exec                   1
+  exclude_guest                    1
+------------------------------------------------------------
+
+and
+
+------------------------------------------------------------
+perf_event_attr:
+  type                             6
+  size                             120
+  config                           0xa00000000
+  sample_type                      IDENTIFIER
+  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled                         1
+  inherit                          1
+  enable_on_exec                   1
+  exclude_guest                    1
+------------------------------------------------------------
+
+cycles: 0: 784136 339210144 1013444
+cycles: 0: 601988862 339210144 338196700
+cycles: 262458394 339210144 1013444
+cycles: 603792788 339210144 338196700
+
+ Performance counter stats for 'taskset -c 16 ./triad_loop':
+
+       262,458,394      cpu_core/cycles/                                              (0.30%)
+       603,792,788      cpu_atom/cycles/                                              (99.70%)
+
+       0.340467913 seconds time elapsed
+
+perf-record:
+------------
+
+If no '-e' is specified in perf record, on a hybrid platform it
+creates two default 'cycles' events and adds them to the event list:
+one for core, the other for atom.
+
+perf-stat:
+----------
+
+If no '-e' is specified in perf stat, on a hybrid platform, besides
+the software events, the following events are created and added to
+the event list in order:
+
+core 'cycles',
+atom 'cycles',
+core 'instructions',
+atom 'instructions',
+core 'branches',
+atom 'branches',
+core 'branch-misses',
+atom 'branch-misses'
+
+Of course, both perf-stat and perf-record support enabling a hybrid
+event with a specific pmu.
+
+e.g.
+perf stat -e cpu_core/cycles/
+perf stat -e cpu_atom/cycles/
+perf stat -e cpu_core/r1a/
+perf stat -e cpu_atom/L1-icache-loads/
+perf stat -e cpu_core/cycles/,cpu_atom/instructions/
+perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'
+
+But '{cpu_core/cycles/,cpu_atom/instructions/}' will return
+"<not supported>" for 'instructions', because the pmus in the
+group do not match (cpu_core vs. cpu_atom).
\ No newline at end of file
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index f3161c9673e9..d71bac847936 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -695,6 +695,7 @@ measurements:
  wait -n ${perf_pid}
  exit $?
 
+include::intel-hybrid.txt[]
 
 SEE ALSO
 --------
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 08a1714494f8..d0def5c1715a 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -527,6 +527,8 @@ The fields are in this order:
 
 Additional metrics may be printed with all earlier fields being empty.
 
+include::intel-hybrid.txt[]
+
 SEE ALSO
 --------
 linkperf:perf-top[1], linkperf:perf-list[1]
-- 
2.17.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-11  7:07 ` [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU Jin Yao
@ 2021-03-12 19:15   ` Jiri Olsa
  2021-03-15  2:28     ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-12 19:15 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:26PM +0800, Jin Yao wrote:
> On hybrid platform, some hardware events are only available
> on a specific pmu. For example, 'L1-dcache-load-misses' is only
> available on 'cpu_core' pmu. And even for the event which can be
> available on both pmus, the user also may want to just enable
> one event. So now following syntax is supported:
> 
> cpu_core/<hardware event>/
> cpu_core/<hardware cache event>/
> cpu_core/<pmu event>/
> 
> cpu_atom/<hardware event>/
> cpu_atom/<hardware cache event>/
> cpu_atom/<pmu event>/
> 
> It limits the event to be enabled only on a specified pmu.
> 
> The patch uses this idea, for example, if we use "cpu_core/LLC-loads/",
> in parse_events_add_pmu(), term->config is "LLC-loads".

hum, I don't understand how this does not work even now,
I assume both cpu_core and cpu_atom have sysfs device directory
with events/ directory right?

and whatever is defined in events we allow in parsing syntax..

why can't we treat them like 2 separated pmus?

thanks,
jirka

> 
> We create a new "parse_events_state" with the pmu_name and use
> parse_events__scanner to scan the term->config (the string "LLC-loads"
> in this example). The parse_events_add_cache() will be called during
> parsing. The parse_state->pmu_name is used to identify the pmu
> where the event is enabled.
> 
> Let's see examples:
> 
>   root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/,cpu_core/LLC-loads/ -vv -- ./triad_loop
>   Control descriptor is not initialized
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             6
>     size                             120
>     config                           0x400000000
>     sample_type                      IDENTIFIER
>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>     disabled                         1
>     inherit                          1
>     enable_on_exec                   1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 3
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             7
>     size                             120
>     config                           0x400000002
>     sample_type                      IDENTIFIER
>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>     disabled                         1
>     inherit                          1
>     enable_on_exec                   1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 4
>   cycles: 0: 449252097 297999924 297999924
>   LLC-loads: 0: 1857 297999924 297999924
>   cycles: 449252097 297999924 297999924
>   LLC-loads: 1857 297999924 297999924
> 
>    Performance counter stats for './triad_loop':
> 
>          449,252,097      cpu_core/cycles/
>                1,857      cpu_core/LLC-loads/
> 
>          0.298898415 seconds time elapsed
> 
>   root@ssp-pwrt-002:~# ./perf stat -e cpu_atom/cycles/,cpu_atom/LLC-loads/ -vv -- taskset -c 16 ./triad_loop
>   Control descriptor is not initialized
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             6
>     size                             120
>     config                           0xa00000000
>     sample_type                      IDENTIFIER
>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>     disabled                         1
>     inherit                          1
>     enable_on_exec                   1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 3
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             7
>     size                             120
>     config                           0xa00000002
>     sample_type                      IDENTIFIER
>     read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>     disabled                         1
>     inherit                          1
>     enable_on_exec                   1
>     exclude_guest                    1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 4
>   cycles: 0: 602020010 343657939 342553275
>   LLC-loads: 0: 3537 343657939 342553275
>   cycles: 603961400 343657939 342553275
>   LLC-loads: 3548 343657939 342553275
> 
>    Performance counter stats for 'taskset -c 16 ./triad_loop':
> 
>          603,961,400      cpu_atom/cycles/                                              (99.68%)
>                3,548      cpu_atom/LLC-loads/                                           (99.68%)
> 
>          0.344904585 seconds time elapsed
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/util/parse-events.c | 100 +++++++++++++++++++++++++++++++--
>  tools/perf/util/parse-events.h |   6 +-
>  tools/perf/util/parse-events.y |  21 ++-----
>  3 files changed, 105 insertions(+), 22 deletions(-)
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 09e42245f71a..30435adc7a7b 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -489,7 +489,8 @@ static int create_hybrid_cache_event(struct list_head *list, int *idx,
>  static int add_hybrid_cache(struct list_head *list, int *idx,
>  			    struct perf_event_attr *attr, char *name,
>  			    struct list_head *config_terms,
> -			    bool *hybrid)
> +			    bool *hybrid,
> +			    struct parse_events_state *parse_state)
>  {
>  	struct perf_pmu *pmu;
>  	int ret;
> @@ -497,6 +498,11 @@ static int add_hybrid_cache(struct list_head *list, int *idx,
>  	*hybrid = false;
>  	perf_pmu__for_each_hybrid_pmu(pmu) {
>  		*hybrid = true;
> +		 if (parse_state->pmu_name &&
> +		     strcmp(parse_state->pmu_name, pmu->name)) {
> +			continue;
> +		}
> +
>  		ret = create_hybrid_cache_event(list, idx, attr, name,
>  						config_terms, pmu);
>  		if (ret)
> @@ -509,7 +515,8 @@ static int add_hybrid_cache(struct list_head *list, int *idx,
>  int parse_events_add_cache(struct list_head *list, int *idx,
>  			   char *type, char *op_result1, char *op_result2,
>  			   struct parse_events_error *err,
> -			   struct list_head *head_config)
> +			   struct list_head *head_config,
> +			   struct parse_events_state *parse_state)
>  {
>  	struct perf_event_attr attr;
>  	LIST_HEAD(config_terms);
> @@ -582,7 +589,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
>  		perf_pmu__scan(NULL);
>  
>  	ret = add_hybrid_cache(list, idx, &attr, config_name ? : name,
> -			       &config_terms, &hybrid);
> +			       &config_terms, &hybrid, parse_state);
>  	if (hybrid)
>  		return ret;
>  
> @@ -1512,6 +1519,11 @@ static int add_hybrid_numeric(struct parse_events_state *parse_state,
>  	*hybrid = false;
>  	perf_pmu__for_each_hybrid_pmu(pmu) {
>  		*hybrid = true;
> +		if (parse_state->pmu_name &&
> +		    strcmp(parse_state->pmu_name, pmu->name)) {
> +			continue;
> +		}
> +
>  		ret = create_hybrid_hw_event(parse_state, list, attr, pmu);
>  		if (ret)
>  			return ret;
> @@ -1578,6 +1590,10 @@ static bool config_term_percore(struct list_head *config_terms)
>  	return false;
>  }
>  
> +static int parse_events_with_hybrid_pmu(struct parse_events_state *parse_state,
> +			      const char *str, char *name, bool *found,
> +			      struct list_head *list);
> +
>  int parse_events_add_pmu(struct parse_events_state *parse_state,
>  			 struct list_head *list, char *name,
>  			 struct list_head *head_config,
> @@ -1589,7 +1605,7 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
>  	struct perf_pmu *pmu;
>  	struct evsel *evsel;
>  	struct parse_events_error *err = parse_state->error;
> -	bool use_uncore_alias;
> +	bool use_uncore_alias, found;
>  	LIST_HEAD(config_terms);
>  
>  	if (verbose > 1) {
> @@ -1605,6 +1621,22 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
>  		fprintf(stderr, "' that may result in non-fatal errors\n");
>  	}
>  
> +	if (head_config && perf_pmu__is_hybrid(name)) {
> +		struct parse_events_term *term;
> +		int ret;
> +
> +		list_for_each_entry(term, head_config, list) {
> +			if (!term->config)
> +				continue;
> +			ret = parse_events_with_hybrid_pmu(parse_state,
> +							   term->config,
> +							   name, &found,
> +							   list);
> +			if (found)
> +				return ret;
> +		}
> +	}
> +
>  	pmu = parse_state->fake_pmu ?: perf_pmu__find(name);
>  	if (!pmu) {
>  		char *err_str;
> @@ -1713,12 +1745,19 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
>  	struct perf_pmu *pmu = NULL;
>  	int ok = 0;
>  
> +        if (parse_state->pmu_name) {
> +                list = alloc_list();
> +                if (!list)
> +                        return -1;
> +                *listp = list;
> +                return 0;
> +        }
> +
>  	*listp = NULL;
>  	/* Add it for all PMUs that support the alias */
> -	list = malloc(sizeof(struct list_head));
> +	list = alloc_list();
>  	if (!list)
>  		return -1;
> -	INIT_LIST_HEAD(list);
>  	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
>  		struct perf_pmu_alias *alias;
>  
> @@ -2284,6 +2323,44 @@ int parse_events_terms(struct list_head *terms, const char *str)
>  	return ret;
>  }
>  
> +static int list_num(struct list_head *list)
> +{
> +	struct list_head *pos;
> +	int n = 0;
> +
> +	list_for_each(pos, list)
> +		n++;
> +
> +	return n;
> +}
> +
> +static int parse_events_with_hybrid_pmu(struct parse_events_state *parse_state,
> +					const char *str, char *pmu_name,
> +					bool *found, struct list_head *list)
> +{
> +	struct parse_events_state ps = {
> +		.list		= LIST_HEAD_INIT(ps.list),
> +		.stoken		= PE_START_EVENTS,
> +		.pmu_name	= pmu_name,
> +		.idx		= parse_state->idx,
> +	};
> +	int ret;
> +
> +	*found = false;
> +	ret = parse_events__scanner(str, &ps);
> +	perf_pmu__parse_cleanup();
> +
> +	if (!ret) {
> +		if (!list_empty(&ps.list)) {
> +			*found = true;
> +			list_splice(&ps.list, list);
> +			parse_state->idx = list_num(list);
> +		}
> +	}
> +
> +	return ret;
> +}
> +
>  int __parse_events(struct evlist *evlist, const char *str,
>  		   struct parse_events_error *err, struct perf_pmu *fake_pmu)
>  {
> @@ -3307,3 +3384,14 @@ char *parse_events_formats_error_string(char *additional_terms)
>  fail:
>  	return NULL;
>  }
> +
> +struct list_head *alloc_list(void)
> +{
> +	struct list_head *list = malloc(sizeof(*list));
> +
> +	if (!list)
> +		return NULL;
> +
> +	INIT_LIST_HEAD(list);
> +	return list;
> +}
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index e80c9b74f2f2..39c7121a4659 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -138,6 +138,7 @@ struct parse_events_state {
>  	struct list_head	  *terms;
>  	int			   stoken;
>  	struct perf_pmu		  *fake_pmu;
> +	char			  *pmu_name;
>  };
>  
>  void parse_events__handle_error(struct parse_events_error *err, int idx,
> @@ -188,7 +189,8 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
>  int parse_events_add_cache(struct list_head *list, int *idx,
>  			   char *type, char *op_result1, char *op_result2,
>  			   struct parse_events_error *error,
> -			   struct list_head *head_config);
> +			   struct list_head *head_config,
> +			   struct parse_events_state *parse_state);
>  int parse_events_add_breakpoint(struct list_head *list, int *idx,
>  				u64 addr, char *type, u64 len);
>  int parse_events_add_pmu(struct parse_events_state *parse_state,
> @@ -242,6 +244,8 @@ char *parse_events_formats_error_string(char *additional_terms);
>  void parse_events_print_error(struct parse_events_error *err,
>  			      const char *event);
>  
> +struct list_head *alloc_list(void);
> +
>  #ifdef HAVE_LIBELF_SUPPORT
>  /*
>   * If the probe point starts with '%',
> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> index d57ac86ce7ca..e0e68c3da9e4 100644
> --- a/tools/perf/util/parse-events.y
> +++ b/tools/perf/util/parse-events.y
> @@ -26,18 +26,6 @@ do { \
>  		YYABORT; \
>  } while (0)
>  
> -static struct list_head* alloc_list(void)
> -{
> -	struct list_head *list;
> -
> -	list = malloc(sizeof(*list));
> -	if (!list)
> -		return NULL;
> -
> -	INIT_LIST_HEAD(list);
> -	return list;
> -}
> -
>  static void free_list_evsel(struct list_head* list_evsel)
>  {
>  	struct evsel *evsel, *tmp;
> @@ -454,7 +442,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_e
>  
>  	list = alloc_list();
>  	ABORT_ON(!list);
> -	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6);
> +	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6,
> +				parse_state);
>  	parse_events_terms__delete($6);
>  	free($1);
>  	free($3);
> @@ -475,7 +464,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT opt_event_config
>  
>  	list = alloc_list();
>  	ABORT_ON(!list);
> -	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4);
> +	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4,
> +				parse_state);
>  	parse_events_terms__delete($4);
>  	free($1);
>  	free($3);
> @@ -495,7 +485,8 @@ PE_NAME_CACHE_TYPE opt_event_config
>  
>  	list = alloc_list();
>  	ABORT_ON(!list);
> -	err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2);
> +	err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2,
> +				parse_state);
>  	parse_events_terms__delete($2);
>  	free($1);
>  	if (err) {
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus
  2021-03-11  7:07 ` [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus Jin Yao
@ 2021-03-12 19:15   ` Jiri Olsa
  2021-03-15  1:25     ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-12 19:15 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:22PM +0800, Jin Yao wrote:
> On hybrid platform, atom events can be only enabled on atom CPUs. Core
> events can be only enabled on core CPUs. So for a hybrid event, it can
> be only enabled on its own CPUs.
> 
> But the problem for current perf is, the cpus for evsel (via PMU sysfs)
> have been merged to evsel_list->core.all_cpus. It might be all CPUs.
> 
> So we need to figure out one way to let the hybrid event only use its
> own CPUs.
> 
> The idea is to create a new evlist__invalidate_all_cpus to invalidate
> the evsel_list->core.all_cpus then evlist__for_each_cpu returns cpu -1
> for hybrid evsel. If cpu is -1, hybrid evsel will use its own cpus.

that's wild.. I don't understand when you say we don't have
cpus for evsel, because they have been merged.. each evsel
has evsel->core.own_cpus coming from pmu->cpus, right?

why can't you just filter out cpus that are in there?

jirka

> 
> We will see following code piece in patch.
> 
> if (cpu == -1 && !evlist->thread_mode)
>         evsel__enable_cpus(pos);
> 
> It lets the event be enabled on event's own cpus.
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/builtin-stat.c          | 37 ++++++++++++++-
>  tools/perf/util/evlist.c           | 72 ++++++++++++++++++++++++++++--
>  tools/perf/util/evlist.h           |  4 ++
>  tools/perf/util/evsel.h            |  8 ++++
>  tools/perf/util/python-ext-sources |  2 +
>  5 files changed, 117 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 2e2e4a8345ea..68ecf68699a9 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -393,6 +393,18 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu)
>  	return 0;
>  }
>  
> +static int read_counter_cpus(struct evsel *counter, struct timespec *rs)
> +{
> +	int cpu, nr_cpus, err = 0;
> +	struct perf_cpu_map *cpus = evsel__cpus(counter);
> +
> +	nr_cpus = cpus ? cpus->nr : 1;
> +	for (cpu = 0; cpu < nr_cpus; cpu++)
> +		err = read_counter_cpu(counter, rs, cpu);
> +
> +	return err;
> +}
> +
>  static int read_affinity_counters(struct timespec *rs)
>  {
>  	struct evsel *counter;
> @@ -414,8 +426,14 @@ static int read_affinity_counters(struct timespec *rs)
>  			if (evsel__cpu_iter_skip(counter, cpu))
>  				continue;
>  			if (!counter->err) {
> -				counter->err = read_counter_cpu(counter, rs,
> -								counter->cpu_iter - 1);
> +				if (cpu == -1 && !evsel_list->thread_mode) {
> +					counter->err = read_counter_cpus(counter, rs);
> +				} else if (evsel_list->thread_mode) {
> +					counter->err = read_counter_cpu(counter, rs, 0);
> +				} else {
> +					counter->err = read_counter_cpu(counter, rs,
> +									counter->cpu_iter - 1);
> +				}
>  			}
>  		}
>  	}
> @@ -781,6 +799,21 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	if (group)
>  		evlist__set_leader(evsel_list);
>  
> +	/*
> +	 * On hybrid platform, the cpus for evsel (via PMU sysfs) have been
> +	 * merged to evsel_list->core.all_cpus. We use evlist__invalidate_all_cpus
> +	 * to invalidate the evsel_list->core.all_cpus then evlist__for_each_cpu
> +	 * returns cpu -1 for hybrid evsel. If cpu is -1, hybrid evsel will
> +	 * use it's own cpus.
> +	 */
> +	if (evlist__has_hybrid_events(evsel_list)) {
> +		evlist__invalidate_all_cpus(evsel_list);
> +		if (!target__has_cpu(&target) ||
> +		    target__has_per_thread(&target)) {
> +			evsel_list->thread_mode = true;
> +		}
> +	}
> +
>  	if (affinity__setup(&affinity) < 0)
>  		return -1;
>  
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 882cd1f721d9..3ee12fcd0c9f 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -381,7 +381,8 @@ bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu)
>  bool evsel__cpu_iter_skip(struct evsel *ev, int cpu)
>  {
>  	if (!evsel__cpu_iter_skip_no_inc(ev, cpu)) {
> -		ev->cpu_iter++;
> +		if (cpu != -1)
> +			ev->cpu_iter++;
>  		return false;
>  	}
>  	return true;
> @@ -410,6 +411,16 @@ static int evlist__is_enabled(struct evlist *evlist)
>  	return false;
>  }
>  
> +static void evsel__disable_cpus(struct evsel *evsel)
> +{
> +	int cpu, nr_cpus;
> +	struct perf_cpu_map *cpus = evsel__cpus(evsel);
> +
> +	nr_cpus = cpus ? cpus->nr : 1;
> +	for (cpu = 0; cpu < nr_cpus; cpu++)
> +		evsel__disable_cpu(evsel, cpu);
> +}
> +
>  static void __evlist__disable(struct evlist *evlist, char *evsel_name)
>  {
>  	struct evsel *pos;
> @@ -436,7 +447,12 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
>  					has_imm = true;
>  				if (pos->immediate != imm)
>  					continue;
> -				evsel__disable_cpu(pos, pos->cpu_iter - 1);
> +				if (cpu == -1 && !evlist->thread_mode)
> +					evsel__disable_cpus(pos);
> +				else if (evlist->thread_mode)
> +					evsel__disable_cpu(pos, 0);
> +				else
> +					evsel__disable_cpu(pos, pos->cpu_iter - 1);
>  			}
>  		}
>  		if (!has_imm)
> @@ -472,6 +488,15 @@ void evlist__disable_evsel(struct evlist *evlist, char *evsel_name)
>  	__evlist__disable(evlist, evsel_name);
>  }
>  
> +static void evsel__enable_cpus(struct evsel *evsel)
> +{
> +	int cpu, nr_cpus;
> +	struct perf_cpu_map *cpus = evsel__cpus(evsel);
> +
> +	nr_cpus = cpus ? cpus->nr : 1;
> +	for (cpu = 0; cpu < nr_cpus; cpu++)
> +		evsel__enable_cpu(evsel, cpu);
> +}
>  static void __evlist__enable(struct evlist *evlist, char *evsel_name)
>  {
>  	struct evsel *pos;
> @@ -491,7 +516,12 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
>  				continue;
>  			if (!evsel__is_group_leader(pos) || !pos->core.fd)
>  				continue;
> -			evsel__enable_cpu(pos, pos->cpu_iter - 1);
> +                        if (cpu == -1 && !evlist->thread_mode)
> +                                evsel__enable_cpus(pos);
> +                        else if (evlist->thread_mode)
> +                                evsel__enable_cpu(pos, 0);
> +                        else
> +                                evsel__enable_cpu(pos, pos->cpu_iter - 1);
>  		}
>  	}
>  	affinity__cleanup(&affinity);
> @@ -1274,6 +1304,16 @@ void evlist__set_selected(struct evlist *evlist, struct evsel *evsel)
>  	evlist->selected = evsel;
>  }
>  
> +static void evsel__close_cpus(struct evsel *evsel)
> +{
> +	int cpu, nr_cpus;
> +	struct perf_cpu_map *cpus = evsel__cpus(evsel);
> +
> +	nr_cpus = cpus ? cpus->nr : 1;
> +	for (cpu = 0; cpu < nr_cpus; cpu++)
> +		perf_evsel__close_cpu(&evsel->core, cpu);
> +}
> +
>  void evlist__close(struct evlist *evlist)
>  {
>  	struct evsel *evsel;
> @@ -1298,7 +1338,13 @@ void evlist__close(struct evlist *evlist)
>  		evlist__for_each_entry_reverse(evlist, evsel) {
>  			if (evsel__cpu_iter_skip(evsel, cpu))
>  			    continue;
> -			perf_evsel__close_cpu(&evsel->core, evsel->cpu_iter - 1);
> +
> +			if (cpu == -1 && !evlist->thread_mode)
> +				evsel__close_cpus(evsel);
> +			else if (evlist->thread_mode)
> +				perf_evsel__close_cpu(&evsel->core, 0);
> +			else
> +				perf_evsel__close_cpu(&evsel->core, evsel->cpu_iter - 1);
>  		}
>  	}
>  	affinity__cleanup(&affinity);
> @@ -2130,3 +2176,21 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
>  	}
>  	return NULL;
>  }
> +
> +bool evlist__has_hybrid_events(struct evlist *evlist)
> +{
> +	struct evsel *evsel;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		if (evsel__is_hybrid_event(evsel))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +void evlist__invalidate_all_cpus(struct evlist *evlist)
> +{
> +	perf_cpu_map__put(evlist->core.all_cpus);
> +	evlist->core.all_cpus = perf_cpu_map__empty_new(1);
> +}
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index b695ffaae519..0da683511d98 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -52,6 +52,7 @@ struct evlist {
>  	struct perf_evlist core;
>  	int		 nr_groups;
>  	bool		 enabled;
> +	bool		 thread_mode;
>  	int		 id_pos;
>  	int		 is_pos;
>  	u64		 combined_sample_type;
> @@ -365,4 +366,7 @@ int evlist__ctlfd_ack(struct evlist *evlist);
>  #define EVLIST_DISABLED_MSG "Events disabled\n"
>  
>  struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
> +void evlist__invalidate_all_cpus(struct evlist *evlist);
> +
> +bool evlist__has_hybrid_events(struct evlist *evlist);
>  #endif /* __PERF_EVLIST_H */
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 6026487353dd..69aadc52c1bd 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -7,9 +7,11 @@
>  #include <sys/types.h>
>  #include <linux/perf_event.h>
>  #include <linux/types.h>
> +#include <string.h>
>  #include <internal/evsel.h>
>  #include <perf/evsel.h>
>  #include "symbol_conf.h"
> +#include "pmu-hybrid.h"
>  #include <internal/cpumap.h>
>  
>  struct bpf_object;
> @@ -435,4 +437,10 @@ struct perf_env *evsel__env(struct evsel *evsel);
>  int evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
>  
>  void evsel__zero_per_pkg(struct evsel *evsel);
> +
> +static inline bool evsel__is_hybrid_event(struct evsel *evsel)
> +{
> +	return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
> +}
> +
>  #endif /* __PERF_EVSEL_H */
> diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
> index 845dd46e3c61..d7c976671e3a 100644
> --- a/tools/perf/util/python-ext-sources
> +++ b/tools/perf/util/python-ext-sources
> @@ -37,3 +37,5 @@ util/units.c
>  util/affinity.c
>  util/rwsem.c
>  util/hashmap.c
> +util/pmu-hybrid.c
> +util/fncache.c
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
  2021-03-11  7:07 ` [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events Jin Yao
@ 2021-03-12 19:15   ` Jiri Olsa
  2021-03-15  2:04     ` Jin, Yao
  2021-03-15 23:05   ` Jiri Olsa
  1 sibling, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-12 19:15 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:

SNIP

>   cycles: 4: 800933425 1002536659 1002536659
>   cycles: 5: 800928573 1002528386 1002528386
>   cycles: 6: 800924347 1002520527 1002520527
>   cycles: 7: 800922009 1002513176 1002513176
>   cycles: 8: 800919624 1002507326 1002507326
>   cycles: 9: 800917204 1002500663 1002500663
>   cycles: 10: 802096579 1002494280 1002494280
>   cycles: 11: 802093770 1002486404 1002486404
>   cycles: 12: 803284338 1002479491 1002479491
>   cycles: 13: 803277609 1002469777 1002469777
>   cycles: 14: 800875902 1002458861 1002458861
>   cycles: 15: 800873241 1002451350 1002451350
>   cycles: 0: 800837379 1002444645 1002444645
>   cycles: 1: 800833400 1002438505 1002438505
>   cycles: 2: 800829291 1002433698 1002433698
>   cycles: 3: 800824390 1002427584 1002427584
>   cycles: 4: 800819360 1002422099 1002422099
>   cycles: 5: 800814787 1002415845 1002415845
>   cycles: 6: 800810125 1002410301 1002410301
>   cycles: 7: 800791893 1002386845 1002386845
>   cycles: 12855737722 16040169029 16040169029
>   cycles: 6406560625 8019379522 8019379522
> 
>    Performance counter stats for 'system wide':
> 
>       12,855,737,722      cpu_core/cycles/
>        6,406,560,625      cpu_atom/cycles/

so we do that no_merge stuff for uncore pmus, why can't we do
that in here? that seems like a generic way

jirka


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus
  2021-03-12 19:15   ` Jiri Olsa
@ 2021-03-15  1:25     ` Jin, Yao
  0 siblings, 0 replies; 59+ messages in thread
From: Jin, Yao @ 2021-03-15  1:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:22PM +0800, Jin Yao wrote:
>> On hybrid platform, atom events can be only enabled on atom CPUs. Core
>> events can be only enabled on core CPUs. So for a hybrid event, it can
>> be only enabled on its own CPUs.
>>
>> But the problem for current perf is, the cpus for evsel (via PMU sysfs)
>> have been merged to evsel_list->core.all_cpus. It might be all CPUs.
>>
>> So we need to figure out one way to let the hybrid event only use its
>> own CPUs.
>>
>> The idea is to create a new evlist__invalidate_all_cpus to invalidate
>> the evsel_list->core.all_cpus then evlist__for_each_cpu returns cpu -1
>> for hybrid evsel. If cpu is -1, hybrid evsel will use its own cpus.
> 
> that's wild.. I don't understand when you say we don't have
> cpus for evsel, because they have been merged.. each evsel
> has evsel->core.own_cpus coming from pmu->cpus, right?
> 
> why can't you just filter out cpus that are in there?
> 
> jirka
> 

Yes, you're right. This patch is too broad and actually isn't necessary.

The current framework already handles the cpus for an evsel correctly, even for a
hybrid evsel, so this patch can be dropped.

Thanks
Jin Yao


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
  2021-03-12 19:15   ` Jiri Olsa
@ 2021-03-15  2:04     ` Jin, Yao
  2021-03-15 17:34       ` Jiri Olsa
  0 siblings, 1 reply; 59+ messages in thread
From: Jin, Yao @ 2021-03-15  2:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:
> 
> SNIP
> 
>>    cycles: 4: 800933425 1002536659 1002536659
>>    cycles: 5: 800928573 1002528386 1002528386
>>    cycles: 6: 800924347 1002520527 1002520527
>>    cycles: 7: 800922009 1002513176 1002513176
>>    cycles: 8: 800919624 1002507326 1002507326
>>    cycles: 9: 800917204 1002500663 1002500663
>>    cycles: 10: 802096579 1002494280 1002494280
>>    cycles: 11: 802093770 1002486404 1002486404
>>    cycles: 12: 803284338 1002479491 1002479491
>>    cycles: 13: 803277609 1002469777 1002469777
>>    cycles: 14: 800875902 1002458861 1002458861
>>    cycles: 15: 800873241 1002451350 1002451350
>>    cycles: 0: 800837379 1002444645 1002444645
>>    cycles: 1: 800833400 1002438505 1002438505
>>    cycles: 2: 800829291 1002433698 1002433698
>>    cycles: 3: 800824390 1002427584 1002427584
>>    cycles: 4: 800819360 1002422099 1002422099
>>    cycles: 5: 800814787 1002415845 1002415845
>>    cycles: 6: 800810125 1002410301 1002410301
>>    cycles: 7: 800791893 1002386845 1002386845
>>    cycles: 12855737722 16040169029 16040169029
>>    cycles: 6406560625 8019379522 8019379522
>>
>>     Performance counter stats for 'system wide':
>>
>>        12,855,737,722      cpu_core/cycles/
>>         6,406,560,625      cpu_atom/cycles/
> 
> so we do that no_merge stuff for uncore pmus, why can't we do
> that in here? that seems like a generic way
> 
> jirka
> 

We have set the "stat_config.no_merge = true;" in "[PATCH v2 08/27] perf stat: Uniquify hybrid event 
name".

Hybrid hardware events have different configs. The config layout is 0xDD000000AA
(0x400000000 for core vs. 0xa00000000 for atom in this example).

We use perf_pmu__for_each_hybrid_pmu() to iterate over all the hybrid PMUs, generate
the configs and create the evsels for each hybrid PMU. The logic and the code are
simple and easy to understand.

Uncore is more complicated. It has the uncore alias concept, which covers different
PMUs with the same prefix, such as "uncore_cbox" for "uncore_cbox_0" to
"uncore_cbox_9". But the uncore alias concept doesn't apply to the hybrid PMUs (we
just have "cpu_core" and "cpu_atom" here). And I'd rather not mix the core code with
the uncore code; that would make it harder to understand.

Perhaps I misunderstand, correct me if I'm wrong.

Thanks
Jin Yao


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-12 19:15   ` Jiri Olsa
@ 2021-03-15  2:28     ` Jin, Yao
  2021-03-15 17:37       ` Jiri Olsa
  0 siblings, 1 reply; 59+ messages in thread
From: Jin, Yao @ 2021-03-15  2:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:26PM +0800, Jin Yao wrote:
>> On hybrid platform, some hardware events are only available
>> on a specific pmu. For example, 'L1-dcache-load-misses' is only
>> available on 'cpu_core' pmu. And even for the event which can be
>> available on both pmus, the user also may want to just enable
>> one event. So now following syntax is supported:
>>
>> cpu_core/<hardware event>/
>> cpu_core/<hardware cache event>/
>> cpu_core/<pmu event>/
>>
>> cpu_atom/<hardware event>/
>> cpu_atom/<hardware cache event>/
>> cpu_atom/<pmu event>/
>>
>> It limits the event to be enabled only on a specified pmu.
>>
>> The patch uses this idea, for example, if we use "cpu_core/LLC-loads/",
>> in parse_events_add_pmu(), term->config is "LLC-loads".
> 
> hum, I don't understand how this doest not work even now,
> I assume both cpu_core and cpu_atom have sysfs device directory
> with events/ directory right?
> 

Yes, we have cpu_core and cpu_atom directories with events.

root@ssp-pwrt-002:/sys/devices/cpu_atom/events# ls
branch-instructions  bus-cycles    cache-references  instructions  mem-stores  topdown-bad-spec  topdown-fe-bound
branch-misses        cache-misses  cpu-cycles        mem-loads     ref-cycles  topdown-be-bound  topdown-retiring

root@ssp-pwrt-002:/sys/devices/cpu_core/events# ls
branch-instructions  cache-misses      instructions   mem-stores  topdown-bad-spec       topdown-fe-bound   topdown-mem-bound
branch-misses        cache-references  mem-loads      ref-cycles  topdown-be-bound       topdown-fetch-lat  topdown-retiring
bus-cycles           cpu-cycles        mem-loads-aux  slots       topdown-br-mispredict  topdown-heavy-ops

> and whatever is defined in events we allow in parsing syntax..
> 
> why can't we treat them like 2 separated pmus?
>

But without this patch, it reports an error:

root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/ -a -vv -- sleep 1
event syntax error: 'cpu_core/cycles/'
                               \___ unknown term 'cycles' for pmu 'cpu_core'

valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore

Initial error:
event syntax error: 'cpu_core/cycles/'
                               \___ unknown term 'cycles' for pmu 'cpu_core'

valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
Run 'perf list' for a list of valid events

The 'cycles' is treated as an unknown term, so it errors out.

So we have to create another parser to scan the term.

Thanks
Jin Yao

> thanks,
> jirka
> 
>>
>> We create a new "parse_events_state" with the pmu_name and use
>> parse_events__scanner to scan the term->config (the string "LLC-loads"
>> in this example). The parse_events_add_cache() will be called during
>> parsing. The parse_state->pmu_name is used to identify the pmu
>> where the event is enabled.
>>
>> Let's see examples:
>>
>>    root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/,cpu_core/LLC-loads/ -vv -- ./triad_loop
>>    Control descriptor is not initialized
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             6
>>      size                             120
>>      config                           0x400000000
>>      sample_type                      IDENTIFIER
>>      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>      disabled                         1
>>      inherit                          1
>>      enable_on_exec                   1
>>      exclude_guest                    1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 3
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             7
>>      size                             120
>>      config                           0x400000002
>>      sample_type                      IDENTIFIER
>>      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>      disabled                         1
>>      inherit                          1
>>      enable_on_exec                   1
>>      exclude_guest                    1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 4
>>    cycles: 0: 449252097 297999924 297999924
>>    LLC-loads: 0: 1857 297999924 297999924
>>    cycles: 449252097 297999924 297999924
>>    LLC-loads: 1857 297999924 297999924
>>
>>     Performance counter stats for './triad_loop':
>>
>>           449,252,097      cpu_core/cycles/
>>                 1,857      cpu_core/LLC-loads/
>>
>>           0.298898415 seconds time elapsed
>>
>>    root@ssp-pwrt-002:~# ./perf stat -e cpu_atom/cycles/,cpu_atom/LLC-loads/ -vv -- taskset -c 16 ./triad_loop
>>    Control descriptor is not initialized
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             6
>>      size                             120
>>      config                           0xa00000000
>>      sample_type                      IDENTIFIER
>>      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>      disabled                         1
>>      inherit                          1
>>      enable_on_exec                   1
>>      exclude_guest                    1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 3
>>    ------------------------------------------------------------
>>    perf_event_attr:
>>      type                             7
>>      size                             120
>>      config                           0xa00000002
>>      sample_type                      IDENTIFIER
>>      read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>      disabled                         1
>>      inherit                          1
>>      enable_on_exec                   1
>>      exclude_guest                    1
>>    ------------------------------------------------------------
>>    sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 4
>>    cycles: 0: 602020010 343657939 342553275
>>    LLC-loads: 0: 3537 343657939 342553275
>>    cycles: 603961400 343657939 342553275
>>    LLC-loads: 3548 343657939 342553275
>>
>>     Performance counter stats for 'taskset -c 16 ./triad_loop':
>>
>>           603,961,400      cpu_atom/cycles/                                              (99.68%)
>>                 3,548      cpu_atom/LLC-loads/                                           (99.68%)
>>
>>           0.344904585 seconds time elapsed
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   tools/perf/util/parse-events.c | 100 +++++++++++++++++++++++++++++++--
>>   tools/perf/util/parse-events.h |   6 +-
>>   tools/perf/util/parse-events.y |  21 ++-----
>>   3 files changed, 105 insertions(+), 22 deletions(-)
>>
>> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
>> index 09e42245f71a..30435adc7a7b 100644
>> --- a/tools/perf/util/parse-events.c
>> +++ b/tools/perf/util/parse-events.c
>> @@ -489,7 +489,8 @@ static int create_hybrid_cache_event(struct list_head *list, int *idx,
>>   static int add_hybrid_cache(struct list_head *list, int *idx,
>>   			    struct perf_event_attr *attr, char *name,
>>   			    struct list_head *config_terms,
>> -			    bool *hybrid)
>> +			    bool *hybrid,
>> +			    struct parse_events_state *parse_state)
>>   {
>>   	struct perf_pmu *pmu;
>>   	int ret;
>> @@ -497,6 +498,11 @@ static int add_hybrid_cache(struct list_head *list, int *idx,
>>   	*hybrid = false;
>>   	perf_pmu__for_each_hybrid_pmu(pmu) {
>>   		*hybrid = true;
>> +		 if (parse_state->pmu_name &&
>> +		     strcmp(parse_state->pmu_name, pmu->name)) {
>> +			continue;
>> +		}
>> +
>>   		ret = create_hybrid_cache_event(list, idx, attr, name,
>>   						config_terms, pmu);
>>   		if (ret)
>> @@ -509,7 +515,8 @@ static int add_hybrid_cache(struct list_head *list, int *idx,
>>   int parse_events_add_cache(struct list_head *list, int *idx,
>>   			   char *type, char *op_result1, char *op_result2,
>>   			   struct parse_events_error *err,
>> -			   struct list_head *head_config)
>> +			   struct list_head *head_config,
>> +			   struct parse_events_state *parse_state)
>>   {
>>   	struct perf_event_attr attr;
>>   	LIST_HEAD(config_terms);
>> @@ -582,7 +589,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
>>   		perf_pmu__scan(NULL);
>>   
>>   	ret = add_hybrid_cache(list, idx, &attr, config_name ? : name,
>> -			       &config_terms, &hybrid);
>> +			       &config_terms, &hybrid, parse_state);
>>   	if (hybrid)
>>   		return ret;
>>   
>> @@ -1512,6 +1519,11 @@ static int add_hybrid_numeric(struct parse_events_state *parse_state,
>>   	*hybrid = false;
>>   	perf_pmu__for_each_hybrid_pmu(pmu) {
>>   		*hybrid = true;
>> +		if (parse_state->pmu_name &&
>> +		    strcmp(parse_state->pmu_name, pmu->name)) {
>> +			continue;
>> +		}
>> +
>>   		ret = create_hybrid_hw_event(parse_state, list, attr, pmu);
>>   		if (ret)
>>   			return ret;
>> @@ -1578,6 +1590,10 @@ static bool config_term_percore(struct list_head *config_terms)
>>   	return false;
>>   }
>>   
>> +static int parse_events_with_hybrid_pmu(struct parse_events_state *parse_state,
>> +			      const char *str, char *name, bool *found,
>> +			      struct list_head *list);
>> +
>>   int parse_events_add_pmu(struct parse_events_state *parse_state,
>>   			 struct list_head *list, char *name,
>>   			 struct list_head *head_config,
>> @@ -1589,7 +1605,7 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
>>   	struct perf_pmu *pmu;
>>   	struct evsel *evsel;
>>   	struct parse_events_error *err = parse_state->error;
>> -	bool use_uncore_alias;
>> +	bool use_uncore_alias, found;
>>   	LIST_HEAD(config_terms);
>>   
>>   	if (verbose > 1) {
>> @@ -1605,6 +1621,22 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,
>>   		fprintf(stderr, "' that may result in non-fatal errors\n");
>>   	}
>>   
>> +	if (head_config && perf_pmu__is_hybrid(name)) {
>> +		struct parse_events_term *term;
>> +		int ret;
>> +
>> +		list_for_each_entry(term, head_config, list) {
>> +			if (!term->config)
>> +				continue;
>> +			ret = parse_events_with_hybrid_pmu(parse_state,
>> +							   term->config,
>> +							   name, &found,
>> +							   list);
>> +			if (found)
>> +				return ret;
>> +		}
>> +	}
>> +
>>   	pmu = parse_state->fake_pmu ?: perf_pmu__find(name);
>>   	if (!pmu) {
>>   		char *err_str;
>> @@ -1713,12 +1745,19 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
>>   	struct perf_pmu *pmu = NULL;
>>   	int ok = 0;
>>   
>> +        if (parse_state->pmu_name) {
>> +                list = alloc_list();
>> +                if (!list)
>> +                        return -1;
>> +                *listp = list;
>> +                return 0;
>> +        }
>> +
>>   	*listp = NULL;
>>   	/* Add it for all PMUs that support the alias */
>> -	list = malloc(sizeof(struct list_head));
>> +	list = alloc_list();
>>   	if (!list)
>>   		return -1;
>> -	INIT_LIST_HEAD(list);
>>   	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
>>   		struct perf_pmu_alias *alias;
>>   
>> @@ -2284,6 +2323,44 @@ int parse_events_terms(struct list_head *terms, const char *str)
>>   	return ret;
>>   }
>>   
>> +static int list_num(struct list_head *list)
>> +{
>> +	struct list_head *pos;
>> +	int n = 0;
>> +
>> +	list_for_each(pos, list)
>> +		n++;
>> +
>> +	return n;
>> +}
>> +
>> +static int parse_events_with_hybrid_pmu(struct parse_events_state *parse_state,
>> +					const char *str, char *pmu_name,
>> +					bool *found, struct list_head *list)
>> +{
>> +	struct parse_events_state ps = {
>> +		.list		= LIST_HEAD_INIT(ps.list),
>> +		.stoken		= PE_START_EVENTS,
>> +		.pmu_name	= pmu_name,
>> +		.idx		= parse_state->idx,
>> +	};
>> +	int ret;
>> +
>> +	*found = false;
>> +	ret = parse_events__scanner(str, &ps);
>> +	perf_pmu__parse_cleanup();
>> +
>> +	if (!ret) {
>> +		if (!list_empty(&ps.list)) {
>> +			*found = true;
>> +			list_splice(&ps.list, list);
>> +			parse_state->idx = list_num(list);
>> +		}
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   int __parse_events(struct evlist *evlist, const char *str,
>>   		   struct parse_events_error *err, struct perf_pmu *fake_pmu)
>>   {
>> @@ -3307,3 +3384,14 @@ char *parse_events_formats_error_string(char *additional_terms)
>>   fail:
>>   	return NULL;
>>   }
>> +
>> +struct list_head *alloc_list(void)
>> +{
>> +	struct list_head *list = malloc(sizeof(*list));
>> +
>> +	if (!list)
>> +		return NULL;
>> +
>> +	INIT_LIST_HEAD(list);
>> +	return list;
>> +}
>> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
>> index e80c9b74f2f2..39c7121a4659 100644
>> --- a/tools/perf/util/parse-events.h
>> +++ b/tools/perf/util/parse-events.h
>> @@ -138,6 +138,7 @@ struct parse_events_state {
>>   	struct list_head	  *terms;
>>   	int			   stoken;
>>   	struct perf_pmu		  *fake_pmu;
>> +	char			  *pmu_name;
>>   };
>>   
>>   void parse_events__handle_error(struct parse_events_error *err, int idx,
>> @@ -188,7 +189,8 @@ int parse_events_add_tool(struct parse_events_state *parse_state,
>>   int parse_events_add_cache(struct list_head *list, int *idx,
>>   			   char *type, char *op_result1, char *op_result2,
>>   			   struct parse_events_error *error,
>> -			   struct list_head *head_config);
>> +			   struct list_head *head_config,
>> +			   struct parse_events_state *parse_state);
>>   int parse_events_add_breakpoint(struct list_head *list, int *idx,
>>   				u64 addr, char *type, u64 len);
>>   int parse_events_add_pmu(struct parse_events_state *parse_state,
>> @@ -242,6 +244,8 @@ char *parse_events_formats_error_string(char *additional_terms);
>>   void parse_events_print_error(struct parse_events_error *err,
>>   			      const char *event);
>>   
>> +struct list_head *alloc_list(void);
>> +
>>   #ifdef HAVE_LIBELF_SUPPORT
>>   /*
>>    * If the probe point starts with '%',
>> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
>> index d57ac86ce7ca..e0e68c3da9e4 100644
>> --- a/tools/perf/util/parse-events.y
>> +++ b/tools/perf/util/parse-events.y
>> @@ -26,18 +26,6 @@ do { \
>>   		YYABORT; \
>>   } while (0)
>>   
>> -static struct list_head* alloc_list(void)
>> -{
>> -	struct list_head *list;
>> -
>> -	list = malloc(sizeof(*list));
>> -	if (!list)
>> -		return NULL;
>> -
>> -	INIT_LIST_HEAD(list);
>> -	return list;
>> -}
>> -
>>   static void free_list_evsel(struct list_head* list_evsel)
>>   {
>>   	struct evsel *evsel, *tmp;
>> @@ -454,7 +442,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_e
>>   
>>   	list = alloc_list();
>>   	ABORT_ON(!list);
>> -	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6);
>> +	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, $5, error, $6,
>> +				parse_state);
>>   	parse_events_terms__delete($6);
>>   	free($1);
>>   	free($3);
>> @@ -475,7 +464,8 @@ PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT opt_event_config
>>   
>>   	list = alloc_list();
>>   	ABORT_ON(!list);
>> -	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4);
>> +	err = parse_events_add_cache(list, &parse_state->idx, $1, $3, NULL, error, $4,
>> +				parse_state);
>>   	parse_events_terms__delete($4);
>>   	free($1);
>>   	free($3);
>> @@ -495,7 +485,8 @@ PE_NAME_CACHE_TYPE opt_event_config
>>   
>>   	list = alloc_list();
>>   	ABORT_ON(!list);
>> -	err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2);
>> +	err = parse_events_add_cache(list, &parse_state->idx, $1, NULL, NULL, error, $2,
>> +				parse_state);
>>   	parse_events_terms__delete($2);
>>   	free($1);
>>   	if (err) {
>> -- 
>> 2.17.1
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
  2021-03-15  2:04     ` Jin, Yao
@ 2021-03-15 17:34       ` Jiri Olsa
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 17:34 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Mon, Mar 15, 2021 at 10:04:56AM +0800, Jin, Yao wrote:
> Hi Jiri,
> 
> On 3/13/2021 3:15 AM, Jiri Olsa wrote:
> > On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:
> > 
> > SNIP
> > 
> > >    cycles: 4: 800933425 1002536659 1002536659
> > >    cycles: 5: 800928573 1002528386 1002528386
> > >    cycles: 6: 800924347 1002520527 1002520527
> > >    cycles: 7: 800922009 1002513176 1002513176
> > >    cycles: 8: 800919624 1002507326 1002507326
> > >    cycles: 9: 800917204 1002500663 1002500663
> > >    cycles: 10: 802096579 1002494280 1002494280
> > >    cycles: 11: 802093770 1002486404 1002486404
> > >    cycles: 12: 803284338 1002479491 1002479491
> > >    cycles: 13: 803277609 1002469777 1002469777
> > >    cycles: 14: 800875902 1002458861 1002458861
> > >    cycles: 15: 800873241 1002451350 1002451350
> > >    cycles: 0: 800837379 1002444645 1002444645
> > >    cycles: 1: 800833400 1002438505 1002438505
> > >    cycles: 2: 800829291 1002433698 1002433698
> > >    cycles: 3: 800824390 1002427584 1002427584
> > >    cycles: 4: 800819360 1002422099 1002422099
> > >    cycles: 5: 800814787 1002415845 1002415845
> > >    cycles: 6: 800810125 1002410301 1002410301
> > >    cycles: 7: 800791893 1002386845 1002386845
> > >    cycles: 12855737722 16040169029 16040169029
> > >    cycles: 6406560625 8019379522 8019379522
> > > 
> > >     Performance counter stats for 'system wide':
> > > 
> > >        12,855,737,722      cpu_core/cycles/
> > >         6,406,560,625      cpu_atom/cycles/
> > 
> > so we do that no_merge stuff for uncore pmus, why can't we do
> > that in here? that'd seem like a generic way
> > 
> > jirka
> > 
> 
> We have set the "stat_config.no_merge = true;" in "[PATCH v2 08/27] perf
> stat: Uniquify hybrid event name".
> 
> Hybrid hardware events have different configs. The config is
> 0xDD000000AA (0x400000000 for core vs. 0xa00000000 for atom in this example).
> 
> We use perf_pmu__for_each_hybrid_pmu() to iterate all hybrid PMUs, generate
> the configs and create the evsels for each hybrid PMU. This logic and the
> code are not complex and easy to understand.
> 
> Uncore looks more complicated. It has the uncore alias concept, which covers
> different PMUs with the same prefix, such as "uncore_cbox" for "uncore_cbox_0"
> to "uncore_cbox_9". But the uncore alias concept doesn't apply to hybrid PMUs
> (we just have "cpu_core" and "cpu_atom" here). And I also don't want to mix
> the core stuff with the uncore stuff; that would be hard to understand.
> 
> Perhaps I misunderstand, correct me if I'm wrong.

not sure, I thought the merging stuff was more generic,
because the change looks too specific to me, I'll try
to check on it more deeply

jirka



* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-15  2:28     ` Jin, Yao
@ 2021-03-15 17:37       ` Jiri Olsa
  2021-03-16  1:49         ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 17:37 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Mon, Mar 15, 2021 at 10:28:12AM +0800, Jin, Yao wrote:
> Hi Jiri,
> 
> On 3/13/2021 3:15 AM, Jiri Olsa wrote:
> > On Thu, Mar 11, 2021 at 03:07:26PM +0800, Jin Yao wrote:
> > > On a hybrid platform, some hardware events are only available
> > > on a specific pmu. For example, 'L1-dcache-load-misses' is only
> > > available on the 'cpu_core' pmu. And even for an event which is
> > > available on both pmus, the user may want to enable it on just
> > > one. So the following syntax is now supported:
> > > 
> > > cpu_core/<hardware event>/
> > > cpu_core/<hardware cache event>/
> > > cpu_core/<pmu event>/
> > > 
> > > cpu_atom/<hardware event>/
> > > cpu_atom/<hardware cache event>/
> > > cpu_atom/<pmu event>/
> > > 
> > > It limits the event to be enabled only on a specified pmu.
> > > 
> > > The patch uses this idea, for example, if we use "cpu_core/LLC-loads/",
> > > in parse_events_add_pmu(), term->config is "LLC-loads".
> > 
> > hum, I don't understand how this does not work even now,
> > I assume both cpu_core and cpu_atom have sysfs device directory
> > with events/ directory right?
> > 
> 
> Yes, we have cpu_core and cpu_atom directories with events.
> 
> root@ssp-pwrt-002:/sys/devices/cpu_atom/events# ls
> branch-instructions  bus-cycles    cache-references  instructions
> mem-stores  topdown-bad-spec topdown-fe-bound
> branch-misses        cache-misses  cpu-cycles        mem-loads
> ref-cycles  topdown-be-bound topdown-retiring
> 
> root@ssp-pwrt-002:/sys/devices/cpu_core/events# ls
> branch-instructions  cache-misses      instructions   mem-stores
> topdown-bad-spec topdown-fe-bound   topdown-mem-bound
> branch-misses        cache-references  mem-loads      ref-cycles
> topdown-be-bound topdown-fetch-lat  topdown-retiring
> bus-cycles           cpu-cycles        mem-loads-aux  slots
> topdown-br-mispredict topdown-heavy-ops
> 
> > and whatever is defined in events we allow in parsing syntax..
> > 
> > why can't we treat them like 2 separate pmus?
> > 
> 
> But without this patch, it reports an error:
> 
> root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/ -a -vv -- sleep 1
> event syntax error: 'cpu_core/cycles/'
>                               \___ unknown term 'cycles' for pmu 'cpu_core'

yep, because there's special care for 'cycles' unfortunately,
but you should be able to run 'cpu_core/cpu-cycles/' right?

> 
> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
> 
> Initial error:
> event syntax error: 'cpu_core/cycles/'
>                               \___ unknown term 'cycles' for pmu 'cpu_core'
> 
> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
> Run 'perf list' for a list of valid events
> 
> The 'cycles' is treated as an unknown term, then it errors out.

yep, because it's not in events.. we could add a special rule to
treat cycles as cpu-cycles inside the pmu definition ;-)

jirka



* Re: [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group
  2021-03-11  7:07 ` [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group Jin Yao
@ 2021-03-15 23:03   ` Jiri Olsa
  2021-03-16  5:25     ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 23:03 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:31PM +0800, Jin Yao wrote:

SNIP

>  				goto try_again;
>  			}
> +
> +			if (errno == EINVAL && perf_pmu__hybrid_exist())
> +				evlist__warn_hybrid_group(evlist);
>  			rc = -errno;
>  			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
>  			ui__error("%s\n", msg);
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 7a732508b2b4..6f780a039db0 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
>  	struct evsel *evsel, *pos, *leader;
>  	char buf[1024];
>  
> +	if (evlist__hybrid_exist(evlist))
> +		return;

this should be in separate patch and explained

> +
>  	evlist__for_each_entry(evlist, evsel) {
>  		leader = evsel->leader;
>  
> @@ -726,6 +729,10 @@ enum counter_recovery {
>  static enum counter_recovery stat_handle_error(struct evsel *counter)
>  {
>  	char msg[BUFSIZ];
> +
> +	if (perf_pmu__hybrid_exist() && errno == EINVAL)
> +		evlist__warn_hybrid_group(evsel_list);
> +
>  	/*
>  	 * PPC returns ENXIO for HW counters until 2.6.37
>  	 * (behavior changed with commit b0a873e).
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index f139151b9433..5ec891418cdd 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -2224,3 +2224,47 @@ void evlist__invalidate_all_cpus(struct evlist *evlist)
>  	perf_cpu_map__put(evlist->core.all_cpus);
>  	evlist->core.all_cpus = perf_cpu_map__empty_new(1);
>  }
> +
> +static bool group_hybrid_conflict(struct evsel *leader)
> +{
> +	struct evsel *pos, *prev = NULL;
> +
> +	for_each_group_evsel(pos, leader) {
> +		if (!pos->pmu_name || !perf_pmu__is_hybrid(pos->pmu_name))
> +			continue;
> +
> +		if (prev && strcmp(prev->pmu_name, pos->pmu_name))
> +			return true;
> +
> +		prev = pos;
> +	}
> +
> +	return false;
> +}
> +
> +void evlist__warn_hybrid_group(struct evlist *evlist)
> +{
> +	struct evsel *evsel;
> +
> +	evlist__for_each_entry(evlist, evsel) {
> +		if (evsel__is_group_leader(evsel) &&
> +		    evsel->core.nr_members > 1 &&

hm, could we just iterate all the members and make sure the first found
hybrid event's pmu matches the pmu of the rest of the hybrid events in the list?
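The check suggested here can be sketched as follows. This is a hypothetical standalone illustration, not code from the series: the helper name and the plain string array standing in for the evsel list are assumptions (a NULL entry stands in for a non-hybrid event).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Walk the group members' pmu names and report a conflict when two
 * hybrid events name different PMUs; non-hybrid members (NULL here)
 * are skipped, matching the intent of the suggestion above. */
static int group_has_hybrid_conflict(const char **pmu_names, int nr)
{
	const char *first = NULL;
	int i;

	for (i = 0; i < nr; i++) {
		if (!pmu_names[i])
			continue;	/* not a hybrid event */
		if (!first)
			first = pmu_names[i];	/* first hybrid event found */
		else if (strcmp(first, pmu_names[i]))
			return 1;	/* different hybrid PMU in group */
	}
	return 0;
}
```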

> +		    group_hybrid_conflict(evsel)) {
> +			WARN_ONCE(1, "WARNING: Group has events from "
> +				     "different hybrid PMUs\n");
> +			return;
> +		}
> +	}
> +}
> +
> +bool evlist__hybrid_exist(struct evlist *evlist)

evlist__has_hybrid seems better


jirka



* Re: [PATCH v2 04/27] perf pmu: Save pmu name
  2021-03-11  7:07 ` [PATCH v2 04/27] perf pmu: Save pmu name Jin Yao
@ 2021-03-15 23:03   ` Jiri Olsa
  2021-03-16  0:59     ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 23:03 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:19PM +0800, Jin Yao wrote:
> On a hybrid platform, an event is available on one pmu
> (for example, on cpu_core or on cpu_atom).
> 
> This patch saves the pmu name to the pmu field of struct perf_pmu_alias.
> Then we can know which pmu the event is available on.
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/util/pmu.c | 10 +++++++++-
>  tools/perf/util/pmu.h |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 54e586bf19a5..45d8db1af8d2 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
>  	zfree(&newalias->str);
>  	zfree(&newalias->metric_expr);
>  	zfree(&newalias->metric_name);
> +	zfree(&newalias->pmu);
>  	parse_events_terms__purge(&newalias->terms);
>  	free(newalias);
>  }
> @@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias *newalias,
>  
>  	list_for_each_entry(a, alist, list) {
>  		if (!strcasecmp(newalias->name, a->name)) {
> +			if (newalias->pmu && a->pmu &&
> +			    !strcasecmp(newalias->pmu, a->pmu)) {
> +				continue;
> +			}
>  			perf_pmu_update_alias(a, newalias);
>  			perf_pmu_free_alias(newalias);
>  			return true;
> @@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
>  	int num;
>  	char newval[256];
>  	char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
> -	     *metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
> +	     *metric_expr = NULL, *metric_name = NULL, *deprecated = NULL,
> +	     *pmu = NULL;
>  
>  	if (pe) {
>  		long_desc = (char *)pe->long_desc;
> @@ -324,6 +330,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
>  		metric_expr = (char *)pe->metric_expr;
>  		metric_name = (char *)pe->metric_name;
>  		deprecated = (char *)pe->deprecated;
> +		pmu = (char *)pe->pmu;
>  	}
>  
>  	alias = malloc(sizeof(*alias));
> @@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
>  	}
>  	alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
>  	alias->str = strdup(newval);
> +	alias->pmu = pmu ? strdup(pmu) : NULL;
>  
>  	if (deprecated)
>  		alias->deprecated = true;
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 8164388478c6..0e724d5b84c6 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -72,6 +72,7 @@ struct perf_pmu_alias {
>  	bool deprecated;
>  	char *metric_expr;
>  	char *metric_name;
> +	char *pmu;

please use pmu_name

thanks,
jirka



* Re: [PATCH v2 08/27] perf stat: Uniquify hybrid event name
  2021-03-11  7:07 ` [PATCH v2 08/27] perf stat: Uniquify hybrid event name Jin Yao
@ 2021-03-15 23:04   ` Jiri Olsa
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 23:04 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:23PM +0800, Jin Yao wrote:
> It would be useful to tell the user which pmu the event belongs to.
> perf-stat supports the '--no-merge' option, which can print the pmu
> name after the event name, such as:
> 
> "cycles [cpu_core]"
> 
> Now this option is enabled by default on hybrid platforms, but the
> format is changed to:
> 
> "cpu_core/cycles/"
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/builtin-stat.c      |  3 +++
>  tools/perf/util/stat-display.c | 12 ++++++++++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 68ecf68699a9..6c0a21323814 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -2390,6 +2390,9 @@ int cmd_stat(int argc, const char **argv)
>  
>  	evlist__check_cpu_maps(evsel_list);
>  
> +	if (perf_pmu__hybrid_exist())
> +		stat_config.no_merge = true;
> +
>  	/*
>  	 * Initialize thread_map with comm names,
>  	 * so we could print it out on output.
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 7f09cdaf5b60..ed37d8e7ea1a 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -526,6 +526,7 @@ static void uniquify_event_name(struct evsel *counter)
>  {
>  	char *new_name;
>  	char *config;
> +	int ret;
>  
>  	if (counter->uniquified_name ||
>  	    !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
> @@ -540,8 +541,15 @@ static void uniquify_event_name(struct evsel *counter)
>  			counter->name = new_name;
>  		}
>  	} else {
> -		if (asprintf(&new_name,
> -			     "%s [%s]", counter->name, counter->pmu_name) > 0) {
> +		if (perf_pmu__hybrid_exist()) {

I'm still not sure about the whole thing, but should you
check in here just for counter->pmu_name instead of for hybrid
globally?

jirka

> +			ret = asprintf(&new_name, "%s/%s/",
> +				       counter->pmu_name, counter->name);
> +		} else {
> +			ret = asprintf(&new_name, "%s [%s]",
> +				       counter->name, counter->pmu_name);
> +		}
> +
> +		if (ret) {
>  			free(counter->name);
>  			counter->name = new_name;
>  		}
> -- 
> 2.17.1
> 
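The two name formats being compared in this hunk can be sketched as a standalone helper. This is a hypothetical illustration (the function name is an assumption, and it uses portable snprintf() in place of the asprintf() calls the patch actually uses):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the uniquified event name: "pmu/event/" on hybrid platforms,
 * "event [pmu]" otherwise, mirroring the uniquify_event_name() hunk. */
static char *uniquify_name(const char *name, const char *pmu_name, int hybrid)
{
	size_t len = strlen(name) + strlen(pmu_name) + 8;
	char *new_name = malloc(len);

	if (!new_name)
		return NULL;
	if (hybrid)
		snprintf(new_name, len, "%s/%s/", pmu_name, name);
	else
		snprintf(new_name, len, "%s [%s]", name, pmu_name);
	return new_name;
}
```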



* Re: [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group
  2021-03-11  7:07 ` [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group Jin Yao
@ 2021-03-15 23:04   ` Jiri Olsa
  2021-03-16  4:39     ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 23:04 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:32PM +0800, Jin Yao wrote:
> A group mixing a hybrid event and a global event is allowed. For example,
> group leader is 'cpu-clock' and the group member is 'cpu_atom/cycles/'.
> 
> e.g.
> perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a
> 
> The challenge is their available cpus are not fully matched.
> For example, 'cpu-clock' is available on CPU0-CPU23, but 'cpu_atom/cycles/'
> is only available on CPU16-CPU23.
> 
> When getting the group id for group member, we must be very careful
> because the cpu for 'cpu-clock' is not equal to the cpu for 'cpu_atom/cycles/'.
> Actually the cpu here is the index of evsel->core.cpus, not the real CPU ID.
> e.g. cpu0 for 'cpu-clock' is CPU0, but cpu0 for 'cpu_atom/cycles/' is CPU16.
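The index-versus-CPU-ID distinction described above can be sketched roughly as follows. This is a hypothetical illustration, not the patch's implementation (the real series works on struct perf_cpu_map; plain int arrays stand in for it here, and the helper name is assumed):

```c
#include <assert.h>

/* Translate an index into the leader's cpu map to the matching index
 * in a group member's cpu map, going through the real CPU ID.
 * Returns -1 when the member is not available on that CPU. */
static int cpu_map_idx_match(const int *leader_cpus, int leader_nr,
			     const int *member_cpus, int member_nr, int idx)
{
	int real_cpu, i;

	if (idx < 0 || idx >= leader_nr)
		return -1;

	real_cpu = leader_cpus[idx];	/* e.g. idx 16 of 'cpu-clock' is CPU16 */
	for (i = 0; i < member_nr; i++) {
		if (member_cpus[i] == real_cpu)
			return i;	/* e.g. CPU16 is idx 0 of cpu_atom */
	}
	return -1;
}
```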
> 
> Another challenge is for group read. The events in group may be not
> available on all cpus. For example the leader is a software event and
> it's available on CPU0-CPU1, but the group member is a hybrid event and
> it's only available on CPU1. For CPU0, we have only one event, but for CPU1
> we have two events. So we need to change the read size according to
> the real number of events on that cpu.
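The variable read size mentioned above follows from the PERF_FORMAT_GROUP layout: the kernel returns { nr, time_enabled?, time_running?, { value, id? } x nr }, so the size depends on how many events are actually scheduled on a given CPU. A rough sketch of the computation (hypothetical helper, mirroring what a read-size calculation over nr events looks like):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the buffer size for a PERF_FORMAT_GROUP read with the given
 * number of members and the optional read_format fields. With 2 events
 * on CPU1 but only 1 on CPU0, the size differs per CPU. */
static size_t group_read_size(int nr_members, int has_id,
			      int has_enabled, int has_running)
{
	size_t entry = sizeof(uint64_t);	/* value */
	size_t size = sizeof(uint64_t);		/* nr */

	if (has_id)
		entry += sizeof(uint64_t);	/* id */
	if (has_enabled)
		size += sizeof(uint64_t);	/* time_enabled */
	if (has_running)
		size += sizeof(uint64_t);	/* time_running */

	return size + entry * nr_members;
}
```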

ugh, this is really bad.. do we really want to support it? ;-)
I guess we need that for metrics..

SNIP

> 
>    Performance counter stats for 'system wide':
> 
>            24,059.14 msec cpu-clock                 #   23.994 CPUs utilized
>        6,406,677,892      cpu_atom/cycles/          #  266.289 M/sec
> 
>          1.002699058 seconds time elapsed
> 
> For cpu_atom/cycles/, cpu16-cpu23 are set with valid group fd (cpu-clock's fd
> on that cpu). For counting results, cpu-clock has 24 cpus aggregation and
> cpu_atom/cycles/ has 8 cpus aggregation. That's expected.
> 
> But if the event order is changed, e.g. '{cpu_atom/cycles/,cpu-clock}',
> there is more work left to do.
> 
>   root@ssp-pwrt-002:~# ./perf stat -e '{cpu_atom/cycles/,cpu-clock}' -a -vvv -- sleep 1

what if you add the other hybrid pmu event? or just cycles?


SNIP
  
> +static int hybrid_read_size(struct evsel *leader, int cpu, int *nr_members)
> +{
> +	struct evsel *pos;
> +	int nr = 1, back, new_size = 0, idx;
> +
> +	for_each_group_member(pos, leader) {
> +		idx = evsel_cpuid_match(leader, pos, cpu);
> +		if (idx != -1)
> +			nr++;
> +	}
> +
> +	if (nr != leader->core.nr_members) {
> +		back = leader->core.nr_members;
> +		leader->core.nr_members = nr;
> +		new_size = perf_evsel__read_size(&leader->core);
> +		leader->core.nr_members = back;
> +	}
> +
> +	*nr_members = nr;
> +	return new_size;
> +}
> +
>  static int evsel__read_group(struct evsel *leader, int cpu, int thread)
>  {
>  	struct perf_stat_evsel *ps = leader->stats;
>  	u64 read_format = leader->core.attr.read_format;
>  	int size = perf_evsel__read_size(&leader->core);
> +	int new_size, nr_members;
>  	u64 *data = ps->group_data;
>  
>  	if (!(read_format & PERF_FORMAT_ID))
>  		return -EINVAL;

I wonder if we could find some reasonable generic way to process
this; perhaps we should make some early check that this evlist has
a hybrid event and then move the implementation into some separate
hybrid-XXX object, so we don't confuse the code

jirka



* Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
  2021-03-11  7:07 ` [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events Jin Yao
  2021-03-12 19:15   ` Jiri Olsa
@ 2021-03-15 23:05   ` Jiri Olsa
  2021-03-16  3:13     ` Jin, Yao
  1 sibling, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 23:05 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:
> Hardware events have pre-defined configs. The kernel
> needs to know where the event comes from (e.g. from cpu_core pmu
> or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE'
> can't carry pmu information.
> 
> So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
> The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:
> 
> 0xDD000000AA
> AA: original hardware event ID
> DD: PMU type ID
> 
> PMU type ID is retrieved from sysfs. For example,
> 
>   cat /sys/devices/cpu_atom/type
>   10
> 
>   cat /sys/devices/cpu_core/type
>   4
> 
> When enabling a hybrid hardware event without a specified pmu, such as
> 'perf stat -e cycles -a', two events are created automatically. One
> is for atom, the other is for core.

ok I think I understand the need for this (and the following) patch.
the perf_hw_id counters could be global, so when you specify only an
event like:

   -e cycles

you want all the cycles, which on a hybrid system means cycles from
more than one pmu
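The config layout quoted from the patch can be sketched as a small helper. This is a hypothetical illustration of the 0xDD000000AA encoding described above (the helper name is an assumption), with the type values taken from the sysfs examples in the commit message:

```c
#include <assert.h>
#include <stdint.h>

/* Compose a PERF_TYPE_HARDWARE_PMU-style config as described in the
 * patch: the low bits carry the original hardware event ID (AA) and
 * bits 32+ carry the PMU type ID read from sysfs (DD). */
static uint64_t hybrid_config(uint64_t pmu_type, uint64_t hw_event_id)
{
	return (pmu_type << 32) | hw_event_id;
}
```

For example, with cpu_core type 4 this yields the 0x400000000 config for cycles seen in the -vv output earlier in the thread, and with cpu_atom type 10 it yields 0xa00000000.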

SNIP

> @@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
>  {
>  	struct perf_event_attr attr;
>  	LIST_HEAD(config_terms);
> +	bool hybrid;
> +	int ret;
>  
>  	memset(&attr, 0, sizeof(attr));
>  	attr.type = type;
> @@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
>  			return -ENOMEM;
>  	}
>  
> +	/*
> +	 * Skip the software dummy event.
> +	 */
> +	if (type != PERF_TYPE_SOFTWARE) {
> +		if (!perf_pmu__hybrid_exist())
> +			perf_pmu__scan(NULL);

this could be checked in the following add_hybrid_numeric call

> +
> +		ret = add_hybrid_numeric(parse_state, list, &attr, &hybrid);
> +		if (hybrid)
> +			return ret;
> +	}

could we add this to separate object.. hybrid.c or maybe parse-events-hybrid.c,

there's already global __add_event wrapper - parse_events__add_event 


jirka

> +
>  	return add_event(list, &parse_state->idx, &attr,
>  			 get_config_name(head_config), &config_terms);
>  }
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 10/27] perf parse-events: Create two hybrid cache events
  2021-03-11  7:07 ` [PATCH v2 10/27] perf parse-events: Create two hybrid cache events Jin Yao
@ 2021-03-15 23:05   ` Jiri Olsa
  2021-03-16  3:51     ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-15 23:05 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Thu, Mar 11, 2021 at 03:07:25PM +0800, Jin Yao wrote:

SNIP

> +						config_terms, pmu);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
>  int parse_events_add_cache(struct list_head *list, int *idx,
>  			   char *type, char *op_result1, char *op_result2,
>  			   struct parse_events_error *err,
> @@ -474,7 +516,8 @@ int parse_events_add_cache(struct list_head *list, int *idx,
>  	char name[MAX_NAME_LEN], *config_name;
>  	int cache_type = -1, cache_op = -1, cache_result = -1;
>  	char *op_result[2] = { op_result1, op_result2 };
> -	int i, n;
> +	int i, n, ret;
> +	bool hybrid;
>  
>  	/*
>  	 * No fallback - if we cannot get a clear cache type
> @@ -534,6 +577,15 @@ int parse_events_add_cache(struct list_head *list, int *idx,
>  		if (get_config_terms(head_config, &config_terms))
>  			return -ENOMEM;
>  	}
> +
> +	if (!perf_pmu__hybrid_exist())
> +		perf_pmu__scan(NULL);

actually, how about a construct like:

	perf_pmu_is_hybrid()
		return hybrid_add_event_cache(...)

	return add_event(...)


with:
	perf_pmu_is_hybrid()
	{
		static bool initialized;

		if (!initialized) {
			initialized = true;
			perf_pmu__scan(NULL)
		}

		return ...
	}

jirka

> +
> +	ret = add_hybrid_cache(list, idx, &attr, config_name ? : name,
> +			       &config_terms, &hybrid);
> +	if (hybrid)
> +		return ret;
> +
>  	return add_event(list, idx, &attr, config_name ? : name, &config_terms);
>  }
>  
> -- 
> 2.17.1
> 



* Re: [PATCH v2 04/27] perf pmu: Save pmu name
  2021-03-15 23:03   ` Jiri Olsa
@ 2021-03-16  0:59     ` Jin, Yao
  0 siblings, 0 replies; 59+ messages in thread
From: Jin, Yao @ 2021-03-16  0:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/16/2021 7:03 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:19PM +0800, Jin Yao wrote:
>> On hybrid platform, one event is available on one pmu
>> (such as, available on cpu_core or on cpu_atom).
>>
>> This patch saves the pmu name to the pmu field of struct perf_pmu_alias.
>> Then next we can know the pmu which the event can be available on.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   tools/perf/util/pmu.c | 10 +++++++++-
>>   tools/perf/util/pmu.h |  1 +
>>   2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 54e586bf19a5..45d8db1af8d2 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
>>   	zfree(&newalias->str);
>>   	zfree(&newalias->metric_expr);
>>   	zfree(&newalias->metric_name);
>> +	zfree(&newalias->pmu);
>>   	parse_events_terms__purge(&newalias->terms);
>>   	free(newalias);
>>   }
>> @@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias *newalias,
>>   
>>   	list_for_each_entry(a, alist, list) {
>>   		if (!strcasecmp(newalias->name, a->name)) {
>> +			if (newalias->pmu && a->pmu &&
>> +			    !strcasecmp(newalias->pmu, a->pmu)) {
>> +				continue;
>> +			}
>>   			perf_pmu_update_alias(a, newalias);
>>   			perf_pmu_free_alias(newalias);
>>   			return true;
>> @@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
>>   	int num;
>>   	char newval[256];
>>   	char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
>> -	     *metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
>> +	     *metric_expr = NULL, *metric_name = NULL, *deprecated = NULL,
>> +	     *pmu = NULL;
>>   
>>   	if (pe) {
>>   		long_desc = (char *)pe->long_desc;
>> @@ -324,6 +330,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
>>   		metric_expr = (char *)pe->metric_expr;
>>   		metric_name = (char *)pe->metric_name;
>>   		deprecated = (char *)pe->deprecated;
>> +		pmu = (char *)pe->pmu;
>>   	}
>>   
>>   	alias = malloc(sizeof(*alias));
>> @@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
>>   	}
>>   	alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
>>   	alias->str = strdup(newval);
>> +	alias->pmu = pmu ? strdup(pmu) : NULL;
>>   
>>   	if (deprecated)
>>   		alias->deprecated = true;
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index 8164388478c6..0e724d5b84c6 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -72,6 +72,7 @@ struct perf_pmu_alias {
>>   	bool deprecated;
>>   	char *metric_expr;
>>   	char *metric_name;
>> +	char *pmu;
> 
> please use pmu_name
> 
> thanks,
> jirka
> 

OK, I will use pmu_name in next version.

Thanks
Jin Yao



* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-15 17:37       ` Jiri Olsa
@ 2021-03-16  1:49         ` Jin, Yao
  2021-03-16 14:04           ` Jiri Olsa
  0 siblings, 1 reply; 59+ messages in thread
From: Jin, Yao @ 2021-03-16  1:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/16/2021 1:37 AM, Jiri Olsa wrote:
> On Mon, Mar 15, 2021 at 10:28:12AM +0800, Jin, Yao wrote:
>> Hi Jiri,
>>
>> On 3/13/2021 3:15 AM, Jiri Olsa wrote:
>>> On Thu, Mar 11, 2021 at 03:07:26PM +0800, Jin Yao wrote:
>>>> On hybrid platform, some hardware events are only available
>>>> on a specific pmu. For example, 'L1-dcache-load-misses' is only
>>>> available on 'cpu_core' pmu. And even for the event which can be
>>>> available on both pmus, the user also may want to just enable
>>>> one event. So now following syntax is supported:
>>>>
>>>> cpu_core/<hardware event>/
>>>> cpu_core/<hardware cache event>/
>>>> cpu_core/<pmu event>/
>>>>
>>>> cpu_atom/<hardware event>/
>>>> cpu_atom/<hardware cache event>/
>>>> cpu_atom/<pmu event>/
>>>>
>>>> It limits the event to be enabled only on a specified pmu.
>>>>
>>>> The patch uses this idea, for example, if we use "cpu_core/LLC-loads/",
>>>> in parse_events_add_pmu(), term->config is "LLC-loads".
>>>
>>> hum, I don't understand how this doest not work even now,
>>> I assume both cpu_core and cpu_atom have sysfs device directory
>>> with events/ directory right?
>>>
>>
>> Yes, we have cpu_core and cpu_atom directories with events.
>>
>> root@ssp-pwrt-002:/sys/devices/cpu_atom/events# ls
>> branch-instructions  bus-cycles    cache-references  instructions
>> mem-stores  topdown-bad-spec topdown-fe-bound
>> branch-misses        cache-misses  cpu-cycles        mem-loads
>> ref-cycles  topdown-be-bound topdown-retiring
>>
>> root@ssp-pwrt-002:/sys/devices/cpu_core/events# ls
>> branch-instructions  cache-misses      instructions   mem-stores
>> topdown-bad-spec topdown-fe-bound   topdown-mem-bound
>> branch-misses        cache-references  mem-loads      ref-cycles
>> topdown-be-bound topdown-fetch-lat  topdown-retiring
>> bus-cycles           cpu-cycles        mem-loads-aux  slots
>> topdown-br-mispredict topdown-heavy-ops
>>
>>> and whatever is defined in events we allow in parsing syntax..
>>>
>>> why can't we treat them like 2 separated pmus?
>>>
>>
>> But if without this patch, it reports the error,
>>
>> root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/ -a -vv -- sleep 1
>> event syntax error: 'cpu_core/cycles/'
>>                                \___ unknown term 'cycles' for pmu 'cpu_core'
> 
> yep, because there's special care for 'cycles' unfortunately,
> but you should be able to run 'cpu_core/cpu-cycles/' right?
> 

Yes, cpu_core/cpu-cycles/ is OK.

# ./perf stat -e cpu_core/cpu-cycles/ -a -- sleep 1

  Performance counter stats for 'system wide':

     12,831,980,326      cpu_core/cpu-cycles/

        1.003132639 seconds time elapsed

>>
>> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
>>
>> Initial error:
>> event syntax error: 'cpu_core/cycles/'
>>                                \___ unknown term 'cycles' for pmu 'cpu_core'
>>
>> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
>> Run 'perf list' for a list of valid events
>>
>> The 'cycles' is treated as a unknown term, then it errors out.
> 
> yep, because it's not in events.. we could add special rule to
> treat cycles as cpu-cycles inside pmu definition ;-)
> 
> jirka
> 

But it's not only cycles; branches fails too.

# ./perf stat -e cpu_core/branches/ -a -- sleep 1
event syntax error: 'cpu_core/branches/'
                               \___ unknown term 'branches' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore

Initial error:
event syntax error: 'cpu_core/branches/'
                               \___ unknown term 'branches' for pmu 'cpu_core'

Of course, branch-instructions runs OK.

# ./perf stat -e cpu_core/branch-instructions/ -a -- sleep 1

  Performance counter stats for 'system wide':

        136,655,302      cpu_core/branch-instructions/

        1.003171561 seconds time elapsed

So we need special rules for both cycles and branches.

Worse, we also need to process the hardware cache events.

# ./perf stat -e cpu_core/LLC-loads/
event syntax error: 'cpu_core/LLC-loads/'
                               \___ unknown term 'LLC-loads' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore

Initial error:
event syntax error: 'cpu_core/LLC-loads/'
                               \___ unknown term 'LLC-loads' for pmu 'cpu_core'

If we add special rules to establish all the event mappings, that looks like too much. :(

Thanks
Jin Yao


* Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
  2021-03-15 23:05   ` Jiri Olsa
@ 2021-03-16  3:13     ` Jin, Yao
  0 siblings, 0 replies; 59+ messages in thread
From: Jin, Yao @ 2021-03-16  3:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/16/2021 7:05 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:
>> For hardware events, they have pre-defined configs. The kernel
>> needs to know where the event comes from (e.g. from cpu_core pmu
>> or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE'
>> can't carry pmu information.
>>
>> So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
>> The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:
>>
>> 0xDD000000AA
>> AA: original hardware event ID
>> DD: PMU type ID
>>
>> PMU type ID is retrieved from sysfs. For example,
>>
>>    cat /sys/devices/cpu_atom/type
>>    10
>>
>>    cat /sys/devices/cpu_core/type
>>    4
>>
>> When enabling a hybrid hardware event without specified pmu, such as,
>> 'perf stat -e cycles -a', two events are created automatically. One
>> is for atom, the other is for core.
> 
> ok I think I understand the need for this (and the following) patch
> the perf_hw_id counters could be global, so when you specify only
> event like:
> 
>     -e cycles
> 
> you want all the cycles, which on hybrid system means cycles from
> more than one pmus
> 

Yes, on a hybrid system it means the cycles from two pmus. One 'cycles' event is from the cpu_core 
pmu, the other is from the cpu_atom pmu.

> SNIP
> 
>> @@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
>>   {
>>   	struct perf_event_attr attr;
>>   	LIST_HEAD(config_terms);
>> +	bool hybrid;
>> +	int ret;
>>   
>>   	memset(&attr, 0, sizeof(attr));
>>   	attr.type = type;
>> @@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state *parse_state,
>>   			return -ENOMEM;
>>   	}
>>   
>> +	/*
>> +	 * Skip the software dummy event.
>> +	 */
>> +	if (type != PERF_TYPE_SOFTWARE) {
>> +		if (!perf_pmu__hybrid_exist())
>> +			perf_pmu__scan(NULL);
> 
> this could be checked in the following add_hybrid_numeric call
> 

Yes, that should be OK. I will move the check in the next version.

>> +
>> +		ret = add_hybrid_numeric(parse_state, list, &attr, &hybrid);
>> +		if (hybrid)
>> +			return ret;
>> +	}
> 
> could we add this to separate object.. hybrid.c or maybe parse-events-hybrid.c,
> 
> there's already global __add_event wrapper - parse_events__add_event
> 
> 
> jirka
>

Use a new parse-events-hybrid.c, hmm, well that's OK.

Thanks
Jin Yao

>> +
>>   	return add_event(list, &parse_state->idx, &attr,
>>   			 get_config_name(head_config), &config_terms);
>>   }
>> -- 
>> 2.17.1
>>
> 


* Re: [PATCH v2 10/27] perf parse-events: Create two hybrid cache events
  2021-03-15 23:05   ` Jiri Olsa
@ 2021-03-16  3:51     ` Jin, Yao
  0 siblings, 0 replies; 59+ messages in thread
From: Jin, Yao @ 2021-03-16  3:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/16/2021 7:05 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:25PM +0800, Jin Yao wrote:
> 
> SNIP
> 
>> +						config_terms, pmu);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   int parse_events_add_cache(struct list_head *list, int *idx,
>>   			   char *type, char *op_result1, char *op_result2,
>>   			   struct parse_events_error *err,
>> @@ -474,7 +516,8 @@ int parse_events_add_cache(struct list_head *list, int *idx,
>>   	char name[MAX_NAME_LEN], *config_name;
>>   	int cache_type = -1, cache_op = -1, cache_result = -1;
>>   	char *op_result[2] = { op_result1, op_result2 };
>> -	int i, n;
>> +	int i, n, ret;
>> +	bool hybrid;
>>   
>>   	/*
>>   	 * No fallback - if we cannot get a clear cache type
>> @@ -534,6 +577,15 @@ int parse_events_add_cache(struct list_head *list, int *idx,
>>   		if (get_config_terms(head_config, &config_terms))
>>   			return -ENOMEM;
>>   	}
>> +
>> +	if (!perf_pmu__hybrid_exist())
>> +		perf_pmu__scan(NULL);
> 
> actualy how about construct like:
> 
> 	perf_pmu_is_hybrid()
> 		return hybrid_add_event_cache(...)
> 
> 	return add_event(...)
> 
> 
> with:
> 	perf_pmu_is_hybrid()
> 	{
> 		static bool initialized;
> 
> 		if (!initialized) {
> 			initialized = true;
> 			perf_pmu__scan(NULL)
> 		}
> 
> 		return ...
> 	}
> 
> jirka
> 

Thanks, that's a good solution. I will do that in v3.

Thanks
Jin Yao

>> +
>> +	ret = add_hybrid_cache(list, idx, &attr, config_name ? : name,
>> +			       &config_terms, &hybrid);
>> +	if (hybrid)
>> +		return ret;
>> +
>>   	return add_event(list, idx, &attr, config_name ? : name, &config_terms);
>>   }
>>   
>> -- 
>> 2.17.1
>>
> 


* Re: [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group
  2021-03-15 23:04   ` Jiri Olsa
@ 2021-03-16  4:39     ` Jin, Yao
  0 siblings, 0 replies; 59+ messages in thread
From: Jin, Yao @ 2021-03-16  4:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/16/2021 7:04 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:32PM +0800, Jin Yao wrote:
>> A group mixed with hybrid event and global event is allowed. For example,
>> group leader is 'cpu-clock' and the group member is 'cpu_atom/cycles/'.
>>
>> e.g.
>> perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a
>>
>> The challenge is their available cpus are not fully matched.
>> For example, 'cpu-clock' is available on CPU0-CPU23, but 'cpu_atom/cycles/'
>> is available on CPU16-CPU23.
>>
>> When getting the group id for group member, we must be very careful
>> because the cpu for 'cpu-clock' is not equal to the cpu for 'cpu_atom/cycles/'.
>> Actually the cpu here is the index of evsel->core.cpus, not the real CPU ID.
>> e.g. cpu0 for 'cpu-clock' is CPU0, but cpu0 for 'cpu_atom/cycles/' is CPU16.
>>
>> Another challenge is for group read. The events in group may be not
>> available on all cpus. For example the leader is a software event and
>> it's available on CPU0-CPU1, but the group member is a hybrid event and
>> it's only available on CPU1. For CPU0, we have only one event, but for CPU1
>> we have two events. So we need to change the read size according to
>> the real number of events on that cpu.
> 
> ugh, this is really bad.. do we really want to support it? ;-)
> I guess we need that for metrics..
> 

Yes, it's a bit of a pain, but the use case makes sense. Some metrics need an event group which 
consists of a global event plus a hybrid event.

For example, CPU_Utilization = 'cpu_clk_unhalted.ref_tsc' / 'msr/tsc/'.

'msr/tsc/' is a global event. It's valid on all CPUs.

But 'cpu_clk_unhalted.ref' is a hybrid event:
'cpu_core/cpu_clk_unhalted.ref/' is valid on core CPUs, and
'cpu_atom/cpu_clk_unhalted.ref/' is valid on atom CPUs.

So we have to support this usage. :)

> SNIP
> 
>>
>>     Performance counter stats for 'system wide':
>>
>>             24,059.14 msec cpu-clock                 #   23.994 CPUs utilized
>>         6,406,677,892      cpu_atom/cycles/          #  266.289 M/sec
>>
>>           1.002699058 seconds time elapsed
>>
>> For cpu_atom/cycles/, cpu16-cpu23 are set with valid group fd (cpu-clock's fd
>> on that cpu). For counting results, cpu-clock has 24 cpus aggregation and
>> cpu_atom/cycles/ has 8 cpus aggregation. That's expected.
>>
>> But if the event order is changed, e.g. '{cpu_atom/cycles/,cpu-clock}',
>> there is more work to do.
>>
>>    root@ssp-pwrt-002:~# ./perf stat -e '{cpu_atom/cycles/,cpu-clock}' -a -vvv -- sleep 1
> 
> what if you add the other hybrid pmu event? or just cycles?
>

Do you mean the config for cpu_atom/cycles/? Let's see the log.

root@ssp-pwrt-002:~# perf stat -e '{cpu_atom/cycles/,cpu-clock}' -a -vvv -- sleep 1
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
   type                             6
   size                             120
   config                           0xa00000000
   sample_type                      IDENTIFIER
   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
   disabled                         1
   inherit                          1
   exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 3
sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 7
sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 8
sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 9
sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 10
sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 11
------------------------------------------------------------
perf_event_attr:
   type                             1
   size                             120
   sample_type                      IDENTIFIER
   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
   inherit                          1
   exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 12
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 13
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 14
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 15
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 16
sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 17
sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 18
sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 19
sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 20
sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 21
sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 22
sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 23
sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 24
sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 25
sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 26
sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 27
sys_perf_event_open: pid -1  cpu 16  group_fd 3  flags 0x8 = 28
sys_perf_event_open: pid -1  cpu 17  group_fd 4  flags 0x8 = 29
sys_perf_event_open: pid -1  cpu 18  group_fd 5  flags 0x8 = 30
sys_perf_event_open: pid -1  cpu 19  group_fd 7  flags 0x8 = 31
sys_perf_event_open: pid -1  cpu 20  group_fd 8  flags 0x8 = 32
sys_perf_event_open: pid -1  cpu 21  group_fd 9  flags 0x8 = 33
sys_perf_event_open: pid -1  cpu 22  group_fd 10  flags 0x8 = 34
sys_perf_event_open: pid -1  cpu 23  group_fd 11  flags 0x8 = 35
cycles: 0: 800791792 1002389889 1002389889
cycles: 1: 800788198 1002383611 1002383611
cycles: 2: 800783491 1002377507 1002377507
cycles: 3: 800777752 1002371035 1002371035
cycles: 4: 800771559 1002363669 1002363669
cycles: 5: 800766391 1002356944 1002356944
cycles: 6: 800761593 1002350144 1002350144
cycles: 7: 800756258 1002343203 1002343203
WARNING: for cpu-clock, some CPU counts not read
cpu-clock: 0: 0 0 0
cpu-clock: 1: 0 0 0
cpu-clock: 2: 0 0 0
cpu-clock: 3: 0 0 0
cpu-clock: 4: 0 0 0
cpu-clock: 5: 0 0 0
cpu-clock: 6: 0 0 0
cpu-clock: 7: 0 0 0
cpu-clock: 8: 0 0 0
cpu-clock: 9: 0 0 0
cpu-clock: 10: 0 0 0
cpu-clock: 11: 0 0 0
cpu-clock: 12: 0 0 0
cpu-clock: 13: 0 0 0
cpu-clock: 14: 0 0 0
cpu-clock: 15: 0 0 0
cpu-clock: 16: 1002390566 1002389889 1002389889
cpu-clock: 17: 1002383263 1002383611 1002383611
cpu-clock: 18: 1002377257 1002377507 1002377507
cpu-clock: 19: 1002370895 1002371035 1002371035
cpu-clock: 20: 1002363611 1002363669 1002363669
cpu-clock: 21: 1002356623 1002356944 1002356944
cpu-clock: 22: 1002349562 1002350144 1002350144
cpu-clock: 23: 1002343089 1002343203 1002343203
cycles: 6406197034 8018936002 8018936002
cpu-clock: 8018934866 8018936002 8018936002

  Performance counter stats for 'system wide':

      6,406,197,034      cpu_atom/cycles/          #  798.884 M/sec
           8,018.93 msec cpu-clock                 #    7.999 CPUs utilized

        1.002475994 seconds time elapsed

'0xa00000000' is the config for cpu_atom/cycles/ and we can see this event is only enabled on CPU16-23.

'cpu-clock' is a global event and it's enabled on CPU0-23 (but it only has a valid group fd on CPU16-23).

> 
> SNIP
>    
>> +static int hybrid_read_size(struct evsel *leader, int cpu, int *nr_members)
>> +{
>> +	struct evsel *pos;
>> +	int nr = 1, back, new_size = 0, idx;
>> +
>> +	for_each_group_member(pos, leader) {
>> +		idx = evsel_cpuid_match(leader, pos, cpu);
>> +		if (idx != -1)
>> +			nr++;
>> +	}
>> +
>> +	if (nr != leader->core.nr_members) {
>> +		back = leader->core.nr_members;
>> +		leader->core.nr_members = nr;
>> +		new_size = perf_evsel__read_size(&leader->core);
>> +		leader->core.nr_members = back;
>> +	}
>> +
>> +	*nr_members = nr;
>> +	return new_size;
>> +}
>> +
>>   static int evsel__read_group(struct evsel *leader, int cpu, int thread)
>>   {
>>   	struct perf_stat_evsel *ps = leader->stats;
>>   	u64 read_format = leader->core.attr.read_format;
>>   	int size = perf_evsel__read_size(&leader->core);
>> +	int new_size, nr_members;
>>   	u64 *data = ps->group_data;
>>   
>>   	if (!(read_format & PERF_FORMAT_ID))
>>   		return -EINVAL;
> 
> I wonder if we do not find some reasonable generic way to process
> this, porhaps we should make some early check that this evlist has
> hybrid event and the move the implementation in some separated
> hybrid-XXX object, so we don't confuse the code
> 
> jirka
> 

Agreed. The code looks a bit tricky and hard to understand for the non-hybrid platform.

Maybe it's a good idea to move this code to hybrid-xxx files.

Thanks
Jin Yao


* Re: [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group
  2021-03-15 23:03   ` Jiri Olsa
@ 2021-03-16  5:25     ` Jin, Yao
  2021-03-18 11:52       ` Jiri Olsa
  0 siblings, 1 reply; 59+ messages in thread
From: Jin, Yao @ 2021-03-16  5:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/16/2021 7:03 AM, Jiri Olsa wrote:
> On Thu, Mar 11, 2021 at 03:07:31PM +0800, Jin Yao wrote:
> 
> SNIP
> 
>>   				goto try_again;
>>   			}
>> +
>> +			if (errno == EINVAL && perf_pmu__hybrid_exist())
>> +				evlist__warn_hybrid_group(evlist);
>>   			rc = -errno;
>>   			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
>>   			ui__error("%s\n", msg);
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 7a732508b2b4..6f780a039db0 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
>>   	struct evsel *evsel, *pos, *leader;
>>   	char buf[1024];
>>   
>> +	if (evlist__hybrid_exist(evlist))
>> +		return;
> 
> this should be in separate patch and explained
> 

Now I have another idea. If a group consists of both atom events and core events, do we still 
follow the current solution of disabling the group?

I mean removing following code:

if (evlist__hybrid_exist(evlist))
	return;

evlist__check_cpu_maps then continues running and disables the group, but it also reports a 
warning that says "WARNING: Group has events from different hybrid PMUs".

Do you like this way?

>> +
>>   	evlist__for_each_entry(evlist, evsel) {
>>   		leader = evsel->leader;
>>   
>> @@ -726,6 +729,10 @@ enum counter_recovery {
>>   static enum counter_recovery stat_handle_error(struct evsel *counter)
>>   {
>>   	char msg[BUFSIZ];
>> +
>> +	if (perf_pmu__hybrid_exist() && errno == EINVAL)
>> +		evlist__warn_hybrid_group(evsel_list);
>> +
>>   	/*
>>   	 * PPC returns ENXIO for HW counters until 2.6.37
>>   	 * (behavior changed with commit b0a873e).
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index f139151b9433..5ec891418cdd 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -2224,3 +2224,47 @@ void evlist__invalidate_all_cpus(struct evlist *evlist)
>>   	perf_cpu_map__put(evlist->core.all_cpus);
>>   	evlist->core.all_cpus = perf_cpu_map__empty_new(1);
>>   }
>> +
>> +static bool group_hybrid_conflict(struct evsel *leader)
>> +{
>> +	struct evsel *pos, *prev = NULL;
>> +
>> +	for_each_group_evsel(pos, leader) {
>> +		if (!pos->pmu_name || !perf_pmu__is_hybrid(pos->pmu_name))
>> +			continue;
>> +
>> +		if (prev && strcmp(prev->pmu_name, pos->pmu_name))
>> +			return true;
>> +
>> +		prev = pos;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +void evlist__warn_hybrid_group(struct evlist *evlist)
>> +{
>> +	struct evsel *evsel;
>> +
>> +	evlist__for_each_entry(evlist, evsel) {
>> +		if (evsel__is_group_leader(evsel) &&
>> +		    evsel->core.nr_members > 1 &&
> 
> hm, could we just iterate all the members and make sure the first found
> hybrid event's pmu matches the pmu of the rest hybrid events in the list?
> 

'{cpu_core/event1/,cpu_core/event2/}','{cpu_atom/event3/,cpu_atom/event4/}'

Two or more groups need to be supported. We'd get the first hybrid event's pmu (cpu_core in this 
example), but it doesn't match cpu_atom/event3/ and cpu_atom/event4/. Yet this case should 
actually be supported, right?

>> +		    group_hybrid_conflict(evsel)) {
>> +			WARN_ONCE(1, "WARNING: Group has events from "
>> +				     "different hybrid PMUs\n");
>> +			return;
>> +		}
>> +	}
>> +}
>> +
>> +bool evlist__hybrid_exist(struct evlist *evlist)
> 
> evlist__has_hybrid seems better
> 

Yes, agree.

Thanks
Jin Yao

> 
> jirka
> 


* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-16  1:49         ` Jin, Yao
@ 2021-03-16 14:04           ` Jiri Olsa
  2021-03-17  2:12             ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-16 14:04 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:

SNIP

> 
>  Performance counter stats for 'system wide':
> 
>        136,655,302      cpu_core/branch-instructions/
> 
>        1.003171561 seconds time elapsed
> 
> So we need special rules for both cycles and branches.
> 
> The worse thing is, we also need to process the hardware cache events.
> 
> # ./perf stat -e cpu_core/LLC-loads/
> event syntax error: 'cpu_core/LLC-loads/'
>                               \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> 
> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
> 
> Initial error:
> event syntax error: 'cpu_core/LLC-loads/'
>                               \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> 
> If we use special rules for establishing all event mapping, that looks too much. :(

hmmm but wait, currently we do not support events like this:

  'cpu/cycles/'
  'cpu/branches/'

the pmu style accepts only 'events' or 'format' terms within //

we made hw events like 'cycles','instructions','branches' special
to be used without the pmu

so why do we need to support cpu_core/cycles/ ?

jirka



* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-16 14:04           ` Jiri Olsa
@ 2021-03-17  2:12             ` Jin, Yao
  2021-03-17 10:06               ` Jiri Olsa
  0 siblings, 1 reply; 59+ messages in thread
From: Jin, Yao @ 2021-03-17  2:12 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 3/16/2021 10:04 PM, Jiri Olsa wrote:
> On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:
> 
> SNIP
> 
>>
>>   Performance counter stats for 'system wide':
>>
>>         136,655,302      cpu_core/branch-instructions/
>>
>>         1.003171561 seconds time elapsed
>>
>> So we need special rules for both cycles and branches.
>>
>> The worse thing is, we also need to process the hardware cache events.
>>
>> # ./perf stat -e cpu_core/LLC-loads/
>> event syntax error: 'cpu_core/LLC-loads/'
>>                                \___ unknown term 'LLC-loads' for pmu 'cpu_core'
>>
>> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
>>
>> Initial error:
>> event syntax error: 'cpu_core/LLC-loads/'
>>                                \___ unknown term 'LLC-loads' for pmu 'cpu_core'
>>
>> If we use special rules for establishing all event mapping, that looks too much. :(
> 
> hmmm but wait, currently we do not support events like this:
> 
>    'cpu/cycles/'
>    'cpu/branches/'
> 
> the pmu style accepts only 'events' or 'format' terms within //
> 
> we made hw events like 'cycles','instructions','branches' special
> to be used without the pmu
> 
> so why do we need to support cpu_core/cycles/ ?
> 
> jirka
> 

Actually we have to support pmu style events for the hybrid platform.

Users may want to enable events from specified pmus, with flexible grouping.

For example,

perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e '{cpu_atom/cycles/,cpu_atom/instructions/}'

This usage is common and reasonable. So I think we may need to support pmu style events.

Thanks
Jin Yao


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-17  2:12             ` Jin, Yao
@ 2021-03-17 10:06               ` Jiri Olsa
  2021-03-17 12:17                 ` Jin, Yao
  0 siblings, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-17 10:06 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Wed, Mar 17, 2021 at 10:12:03AM +0800, Jin, Yao wrote:
> 
> 
> On 3/16/2021 10:04 PM, Jiri Olsa wrote:
> > On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:
> > 
> > SNIP
> > 
> > > 
> > >   Performance counter stats for 'system wide':
> > > 
> > >         136,655,302      cpu_core/branch-instructions/
> > > 
> > >         1.003171561 seconds time elapsed
> > > 
> > > So we need special rules for both cycles and branches.
> > > 
> > > The worst thing is, we also need to process the hardware cache events.
> > > 
> > > # ./perf stat -e cpu_core/LLC-loads/
> > > event syntax error: 'cpu_core/LLC-loads/'
> > >                                \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > 
> > > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
> > > 
> > > Initial error:
> > > event syntax error: 'cpu_core/LLC-loads/'
> > >                                \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > 
> > > If we use special rules to establish all the event mappings, that looks like too much. :(
> > 
> > hmmm but wait, currently we do not support events like this:
> > 
> >    'cpu/cycles/'
> >    'cpu/branches/'
> > 
> > the pmu style accepts only 'events' or 'format' terms within //
> > 
> > we made hw events like 'cycles','instructions','branches' special
> > to be used without the pmu
> > 
> > so why do we need to support cpu_core/cycles/ ?
> > 
> > jirka
> > 
> 
> Actually we have to support pmu style events for the hybrid platform.
> 
> Users may want to enable events from specified pmus, with flexible grouping.
> 
> For example,
> 
> perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e '{cpu_atom/cycles/,cpu_atom/instructions/}'
> 
> This usage is common and reasonable. So I think we may need to support pmu style events.

sure, but we don't support 'cpu/cycles/', we only support 'cpu/cpu-cycles/'
why do you insist on supporting cpu_core/cycles/ ?

jirka


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-17 10:06               ` Jiri Olsa
@ 2021-03-17 12:17                 ` Jin, Yao
  2021-03-17 13:42                   ` Arnaldo Carvalho de Melo
  2021-03-18 11:51                   ` Jiri Olsa
  0 siblings, 2 replies; 59+ messages in thread
From: Jin, Yao @ 2021-03-17 12:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Hi Jiri,

On 3/17/2021 6:06 PM, Jiri Olsa wrote:
> On Wed, Mar 17, 2021 at 10:12:03AM +0800, Jin, Yao wrote:
>>
>>
>> On 3/16/2021 10:04 PM, Jiri Olsa wrote:
>>> On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:
>>>
>>> SNIP
>>>
>>>>
>>>>    Performance counter stats for 'system wide':
>>>>
>>>>          136,655,302      cpu_core/branch-instructions/
>>>>
>>>>          1.003171561 seconds time elapsed
>>>>
>>>> So we need special rules for both cycles and branches.
>>>>
>>>> The worst thing is, we also need to process the hardware cache events.
>>>>
>>>> # ./perf stat -e cpu_core/LLC-loads/
>>>> event syntax error: 'cpu_core/LLC-loads/'
>>>>                                 \___ unknown term 'LLC-loads' for pmu 'cpu_core'
>>>>
>>>> valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
>>>>
>>>> Initial error:
>>>> event syntax error: 'cpu_core/LLC-loads/'
>>>>                                 \___ unknown term 'LLC-loads' for pmu 'cpu_core'
>>>>
>>>> If we use special rules to establish all the event mappings, that looks like too much. :(
>>>
>>> hmmm but wait, currently we do not support events like this:
>>>
>>>     'cpu/cycles/'
>>>     'cpu/branches/'
>>>
>>> the pmu style accepts only 'events' or 'format' terms within //
>>>
>>> we made hw events like 'cycles','instructions','branches' special
>>> to be used without the pmu
>>>
>>> so why do we need to support cpu_core/cycles/ ?
>>>
>>> jirka
>>>
>>
>> Actually we have to support pmu style events for the hybrid platform.
>>
>> Users may want to enable events from specified pmus, with flexible grouping.
>>
>> For example,
>>
>> perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e '{cpu_atom/cycles/,cpu_atom/instructions/}'
>>
>> This usage is common and reasonable. So I think we may need to support pmu style events.
> 
> sure, but we don't support 'cpu/cycles/', we only support 'cpu/cpu-cycles/'
> why do you insist on supporting cpu_core/cycles/ ?
> 
> jirka
> 

I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But what would we do for 
cache event?

'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not supported currently.

For a hybrid platform, a user may only want to enable LLC-loads on core CPUs or on atom CPUs. That's
reasonable. But if we don't support the pmu style event, how can we satisfy this requirement?

If we can support the pmu style event, we can also use the same way for cpu_core/cycles/. At least 
it's not a bad thing, right? :)

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-17 12:17                 ` Jin, Yao
@ 2021-03-17 13:42                   ` Arnaldo Carvalho de Melo
  2021-03-18 12:16                     ` Jiri Olsa
  2021-03-19  2:48                     ` Andi Kleen
  2021-03-18 11:51                   ` Jiri Olsa
  1 sibling, 2 replies; 59+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-03-17 13:42 UTC (permalink / raw)
  To: Jin, Yao
  Cc: Jiri Olsa, jolsa, peterz, mingo, alexander.shishkin,
	Linux-kernel, ak, kan.liang, yao.jin

On Wed, Mar 17, 2021 at 08:17:52PM +0800, Jin, Yao wrote:
> Hi Jiri,
> 
> On 3/17/2021 6:06 PM, Jiri Olsa wrote:
> > On Wed, Mar 17, 2021 at 10:12:03AM +0800, Jin, Yao wrote:
> > > 
> > > 
> > > On 3/16/2021 10:04 PM, Jiri Olsa wrote:
> > > > On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:
> > > > 
> > > > SNIP
> > > > 
> > > > > 
> > > > >    Performance counter stats for 'system wide':
> > > > > 
> > > > >          136,655,302      cpu_core/branch-instructions/
> > > > > 
> > > > >          1.003171561 seconds time elapsed
> > > > > 
> > > > > So we need special rules for both cycles and branches.
> > > > > 
> > > > > The worst thing is, we also need to process the hardware cache events.
> > > > > 
> > > > > # ./perf stat -e cpu_core/LLC-loads/
> > > > > event syntax error: 'cpu_core/LLC-loads/'
> > > > >                                 \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > > > 
> > > > > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
> > > > > 
> > > > > Initial error:
> > > > > event syntax error: 'cpu_core/LLC-loads/'
> > > > >                                 \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > > > 
> > > > > If we use special rules to establish all the event mappings, that looks like too much. :(
> > > > 
> > > > hmmm but wait, currently we do not support events like this:
> > > > 
> > > >     'cpu/cycles/'
> > > >     'cpu/branches/'
> > > > 
> > > > the pmu style accepts only 'events' or 'format' terms within //
> > > > 
> > > > we made hw events like 'cycles','instructions','branches' special
> > > > to be used without the pmu
> > > > 
> > > > so why do we need to support cpu_core/cycles/ ?

> > > Actually we have to support pmu style events for the hybrid platform.

> > > Users may want to enable events from specified pmus, with flexible grouping.

> > > For example,

> > > perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e '{cpu_atom/cycles/,cpu_atom/instructions/}'

> > > This usage is common and reasonable. So I think we may need to support pmu style events.

> > sure, but we don't support 'cpu/cycles/', we only support 'cpu/cpu-cycles/'
> > why do you insist on supporting cpu_core/cycles/ ?

> 
> I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But
> what would we do for cache event?
> 
> 'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not supported currently.
> 
> For a hybrid platform, a user may only want to enable LLC-loads on core CPUs
> or on atom CPUs. That's reasonable. But if we don't support the pmu style
> event, how can we satisfy this requirement?
> 
> If we can support the pmu style event, we can also use the same way for
> cpu_core/cycles/. At least it's not a bad thing, right? :)

While we're discussing, do we really want to use the "core" and "atom"
terms here? I thought cpu/cycles/ would be ok for the main (Big) CPU and
that we should come up with some short name for the "little" CPUs.

Won't we have the same situation with ARM where we want to know the
number of cycles spent on a BIG core and also on a little one?

Perhaps 'cycles' should mean all cycles, and then we use 'big/cycles/' and
'little/cycles/'?

- Arnaldo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-17 12:17                 ` Jin, Yao
  2021-03-17 13:42                   ` Arnaldo Carvalho de Melo
@ 2021-03-18 11:51                   ` Jiri Olsa
  1 sibling, 0 replies; 59+ messages in thread
From: Jiri Olsa @ 2021-03-18 11:51 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Wed, Mar 17, 2021 at 08:17:52PM +0800, Jin, Yao wrote:

SNIP

> > > 
> > > For example,
> > > 
> > > perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e '{cpu_atom/cycles/,cpu_atom/instructions/}'
> > > 
> > > This usage is common and reasonable. So I think we may need to support pmu style events.
> > 
> > sure, but we don't support 'cpu/cycles/', we only support 'cpu/cpu-cycles/'
> > why do you insist on supporting cpu_core/cycles/ ?
> > 
> > jirka
> > 
> 
> I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But
> what would we do for cache event?
> 
> 'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not supported currently.

ugh, I keep forgetting those ;-)

> 
> For a hybrid platform, a user may only want to enable LLC-loads on core CPUs
> or on atom CPUs. That's reasonable. But if we don't support the pmu style
> event, how can we satisfy this requirement?
> 
> If we can support the pmu style event, we can also use the same way for
> cpu_core/cycles/. At least it's not a bad thing, right? :)

right, it's probably best to use the pmu/LLC-.../ for this,
I'll check the patch again

jirka


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group
  2021-03-16  5:25     ` Jin, Yao
@ 2021-03-18 11:52       ` Jiri Olsa
  0 siblings, 0 replies; 59+ messages in thread
From: Jiri Olsa @ 2021-03-18 11:52 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Mar 16, 2021 at 01:25:29PM +0800, Jin, Yao wrote:
> Hi Jiri,
> 
> On 3/16/2021 7:03 AM, Jiri Olsa wrote:
> > On Thu, Mar 11, 2021 at 03:07:31PM +0800, Jin Yao wrote:
> > 
> > SNIP
> > 
> > >   				goto try_again;
> > >   			}
> > > +
> > > +			if (errno == EINVAL && perf_pmu__hybrid_exist())
> > > +				evlist__warn_hybrid_group(evlist);
> > >   			rc = -errno;
> > >   			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > >   			ui__error("%s\n", msg);
> > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > > index 7a732508b2b4..6f780a039db0 100644
> > > --- a/tools/perf/builtin-stat.c
> > > +++ b/tools/perf/builtin-stat.c
> > > @@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
> > >   	struct evsel *evsel, *pos, *leader;
> > >   	char buf[1024];
> > > +	if (evlist__hybrid_exist(evlist))
> > > +		return;
> > 
> > this should be in separate patch and explained
> > 
> 
> Now I have another idea. If a group consists of atom events and core events,
> do we still follow the current solution of disabling the group?
> 
> I mean removing following code:
> 
> if (evlist__hybrid_exist(evlist))
> 	return;
> 
> evlist__check_cpu_maps then continues running and disables the group, but
> also reports a warning that says "WARNING: Group has events from
> different hybrid PMUs".
> 
> Do you like this way?

I'm not sure I follow completely.. it would be best to discuss this over the code

jirka


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-17 13:42                   ` Arnaldo Carvalho de Melo
@ 2021-03-18 12:16                     ` Jiri Olsa
  2021-03-18 13:21                       ` Arnaldo Carvalho de Melo
  2021-03-19  2:48                     ` Andi Kleen
  1 sibling, 1 reply; 59+ messages in thread
From: Jiri Olsa @ 2021-03-18 12:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jin, Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel,
	ak, kan.liang, yao.jin

On Wed, Mar 17, 2021 at 10:42:45AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 17, 2021 at 08:17:52PM +0800, Jin, Yao wrote:
> > Hi Jiri,
> > 
> > On 3/17/2021 6:06 PM, Jiri Olsa wrote:
> > > On Wed, Mar 17, 2021 at 10:12:03AM +0800, Jin, Yao wrote:
> > > > 
> > > > 
> > > > On 3/16/2021 10:04 PM, Jiri Olsa wrote:
> > > > > On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:
> > > > > 
> > > > > SNIP
> > > > > 
> > > > > > 
> > > > > >    Performance counter stats for 'system wide':
> > > > > > 
> > > > > >          136,655,302      cpu_core/branch-instructions/
> > > > > > 
> > > > > >          1.003171561 seconds time elapsed
> > > > > > 
> > > > > > So we need special rules for both cycles and branches.
> > > > > > 
> > > > > > The worst thing is, we also need to process the hardware cache events.
> > > > > > 
> > > > > > # ./perf stat -e cpu_core/LLC-loads/
> > > > > > event syntax error: 'cpu_core/LLC-loads/'
> > > > > >                                 \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > > > > 
> > > > > > valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
> > > > > > 
> > > > > > Initial error:
> > > > > > event syntax error: 'cpu_core/LLC-loads/'
> > > > > >                                 \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > > > > 
> > > > > > If we use special rules to establish all the event mappings, that looks like too much. :(
> > > > > 
> > > > > hmmm but wait, currently we do not support events like this:
> > > > > 
> > > > >     'cpu/cycles/'
> > > > >     'cpu/branches/'
> > > > > 
> > > > > the pmu style accepts only 'events' or 'format' terms within //
> > > > > 
> > > > > we made hw events like 'cycles','instructions','branches' special
> > > > > to be used without the pmu
> > > > > 
> > > > > so why do we need to support cpu_core/cycles/ ?
> 
> > > > Actually we have to support pmu style events for the hybrid platform.
> 
> > > > Users may want to enable events from specified pmus, with flexible grouping.
> 
> > > > For example,
> 
> > > > perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e '{cpu_atom/cycles/,cpu_atom/instructions/}'
> 
> > > > This usage is common and reasonable. So I think we may need to support pmu style events.
> 
> > > sure, but we don't support 'cpu/cycles/', we only support 'cpu/cpu-cycles/'
> > > why do you insist on supporting cpu_core/cycles/ ?
> 
> > 
> > I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But
> > what would we do for cache event?
> > 
> > 'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not supported currently.
> > 
> > For a hybrid platform, a user may only want to enable LLC-loads on core CPUs
> > or on atom CPUs. That's reasonable. But if we don't support the pmu style
> > event, how can we satisfy this requirement?
> > 
> > If we can support the pmu style event, we can also use the same way for
> > cpu_core/cycles/. At least it's not a bad thing, right? :)
> 
> While we're discussing, do we really want to use the "core" and "atom"
> terms here? I thought cpu/cycles/ would be ok for the main (Big) CPU and
> > that we should come up with some short name for the "little" CPUs.
> 
> Won't we have the same situation with ARM where we want to know the
> number of cycles spent on a BIG core and also on a little one?
> 
> Perhaps 'cycles' should mean all cycles, and then we use 'big/cycles/' and
> 'little/cycles/'?

do arm servers already export multiple pmus like this?
I did not notice

it'd definitely be great to have some unified way for this,
so far we have the hybrid pmu detection and support in
hw events like cycles/instructions.. which should be easy
to follow on arm

there's also support to have these events on specific pmu
pmu/cycles/ , which I still need to check on

jirka


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-18 12:16                     ` Jiri Olsa
@ 2021-03-18 13:21                       ` Arnaldo Carvalho de Melo
  2021-03-18 18:16                         ` Liang, Kan
  0 siblings, 1 reply; 59+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-03-18 13:21 UTC (permalink / raw)
  To: Jiri Olsa, Mark Rutland
  Cc: Jin, Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel,
	ak, kan.liang, yao.jin

On Thu, Mar 18, 2021 at 01:16:37PM +0100, Jiri Olsa wrote:
> On Wed, Mar 17, 2021 at 10:42:45AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Mar 17, 2021 at 08:17:52PM +0800, Jin, Yao wrote:
> > > I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But
> > > what would we do for cache event?

> > > 'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not supported currently.

> > > For a hybrid platform, a user may only want to enable LLC-loads on core CPUs
> > > or on atom CPUs. That's reasonable. But if we don't support the pmu style
> > > event, how can we satisfy this requirement?

> > > If we can support the pmu style event, we can also use the same way for
> > > cpu_core/cycles/. At least it's not a bad thing, right? :)

> > While we're discussing, do we really want to use the "core" and "atom"
> > terms here? I thought cpu/cycles/ would be ok for the main (Big) CPU and
> > that we should come up with some short name for the "little" CPUs.

> > Won't we have the same situation with ARM where we want to know the
> > number of cycles spent on a BIG core and also on a little one?

> > Perhaps 'cycles' should mean all cycles, and then we use 'big/cycles/' and
> > 'little/cycles/'?

> do arm servers already export multiple pmus like this?
> I did not notice

I haven't checked, but AFAIK this BIG/Little kind of arch started there,
Mark?

- Arnaldo
 
> it'd definitely be great to have some unified way for this,
> so far we have the hybrid pmu detection and support in
> hw events like cycles/instructions.. which should be easy
> to follow on arm
> 
> there's also support to have these events on specific pmu
> pmu/cycles/ , which I still need to check on

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-18 13:21                       ` Arnaldo Carvalho de Melo
@ 2021-03-18 18:16                         ` Liang, Kan
  0 siblings, 0 replies; 59+ messages in thread
From: Liang, Kan @ 2021-03-18 18:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland
  Cc: Jin, Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel,
	ak, kan.liang, yao.jin



On 3/18/2021 9:21 AM, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 18, 2021 at 01:16:37PM +0100, Jiri Olsa wrote:
>> On Wed, Mar 17, 2021 at 10:42:45AM -0300, Arnaldo Carvalho de Melo wrote:
>>> On Wed, Mar 17, 2021 at 08:17:52PM +0800, Jin, Yao wrote:
>>>> I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But
>>>> what would we do for cache event?
> 
>>>> 'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not supported currently.
> 
>>>> For a hybrid platform, a user may only want to enable LLC-loads on core CPUs
>>>> or on atom CPUs. That's reasonable. But if we don't support the pmu style
>>>> event, how can we satisfy this requirement?
> 
>>>> If we can support the pmu style event, we can also use the same way for
>>>> cpu_core/cycles/. At least it's not a bad thing, right? :)
> 
>>> While we're discussing, do we really want to use the "core" and "atom"
>>> terms here? I thought cpu/cycles/ would be ok for the main (Big) CPU and
>>> that we should come up with some short name for the "little" CPUs.
> 
>>> Won't we have the same situation with ARM where we want to know the
>>> number of cycles spent on a BIG core and also on a little one?
> 
>>> Perhaps 'cycles' should mean all cycles, and then we use 'big/cycles/' and
>>> 'little/cycles/'?
> 
>> do arm servers already export multiple pmus like this?
>> I did not notice
> 
> I haven't checked, but AFAIK this BIG/Little kind of arch started there,
> Mark?


Here is the cover letter of the ARM big.little patch set. ARM also 
exports multiple PMUs, e.g., armv7_cortex_a15 and armv7_cortex_a7.
https://lore.kernel.org/lkml/1431533549-27715-1-git-send-email-mark.rutland@arm.com/

We follow a similar way to handle the Intel hybrid PMUs. The naming rule 
is also similar, "cpu_" + CPU type.

We don't use the old name "cpu" for the main CPU type, because we want 
to make sure all software gets updated for the hybrid architecture. 
Otherwise, old scripts using "cpu//" would still run on a hybrid 
architecture without users noticing that the monitored scope has 
implicitly changed, and the results may not be what they want.

Thanks,
Kan

> 
> - Arnaldo
>   
>> it'd definitely be great to have some unified way for this,
>> so far we have the hybrid pmu detection and support in
>> hw events like cycles/instructions.. which should be easy
>> to follow on arm
>>
>> there's also support to have these events on specific pmu
>> pmu/cycles/ , which I still need to check on

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
  2021-03-17 13:42                   ` Arnaldo Carvalho de Melo
  2021-03-18 12:16                     ` Jiri Olsa
@ 2021-03-19  2:48                     ` Andi Kleen
  1 sibling, 0 replies; 59+ messages in thread
From: Andi Kleen @ 2021-03-19  2:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jin, Yao, Jiri Olsa, jolsa, peterz, mingo, alexander.shishkin,
	Linux-kernel, kan.liang, yao.jin

> While we're discussing, do we really want to use the "core" and "atom"
> terms here? I thought cpu/cycles/ would be ok for the main (Big) CPU and

Yes absolutely.

> that we should come up with some short name for the "little" CPUs.

There actually isn't a main CPU. 

There's nothing "better" about the big cores vs the Atoms 
anyway. They're all important CPUs.

And the system might have no "big" CPUs, but we won't know
until we've finished onlining all CPUs. 

Or on Lakefield there are four Atoms and only a single big core.
So with a non-hybrid-aware profiler tool you would miss most of the
system if we used cpu// for the big core.

Also I think renaming is a good idea because it forces the software
or configuration to handle hybrid. Otherwise you just get subtle breakage
all the time with some CPUs not getting profiled.

It's a similar strategy to the one we use in the source code when semantics
change.

ARM did this right.

-Andi


^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2021-03-19  2:49 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11  7:07 [PATCH v2 00/27] perf tool: AlderLake hybrid support series 1 Jin Yao
2021-03-11  7:07 ` [PATCH v2 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h Jin Yao
2021-03-11  7:07 ` [PATCH v2 02/27] perf jevents: Support unit value "cpu_core" and "cpu_atom" Jin Yao
2021-03-11  7:07 ` [PATCH v2 03/27] perf pmu: Simplify arguments of __perf_pmu__new_alias Jin Yao
2021-03-11  7:07 ` [PATCH v2 04/27] perf pmu: Save pmu name Jin Yao
2021-03-15 23:03   ` Jiri Olsa
2021-03-16  0:59     ` Jin, Yao
2021-03-11  7:07 ` [PATCH v2 05/27] perf pmu: Save detected hybrid pmus to a global pmu list Jin Yao
2021-03-11  7:07 ` [PATCH v2 06/27] perf pmu: Add hybrid helper functions Jin Yao
2021-03-11  7:07 ` [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus Jin Yao
2021-03-12 19:15   ` Jiri Olsa
2021-03-15  1:25     ` Jin, Yao
2021-03-11  7:07 ` [PATCH v2 08/27] perf stat: Uniquify hybrid event name Jin Yao
2021-03-15 23:04   ` Jiri Olsa
2021-03-11  7:07 ` [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events Jin Yao
2021-03-12 19:15   ` Jiri Olsa
2021-03-15  2:04     ` Jin, Yao
2021-03-15 17:34       ` Jiri Olsa
2021-03-15 23:05   ` Jiri Olsa
2021-03-16  3:13     ` Jin, Yao
2021-03-11  7:07 ` [PATCH v2 10/27] perf parse-events: Create two hybrid cache events Jin Yao
2021-03-15 23:05   ` Jiri Olsa
2021-03-16  3:51     ` Jin, Yao
2021-03-11  7:07 ` [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU Jin Yao
2021-03-12 19:15   ` Jiri Olsa
2021-03-15  2:28     ` Jin, Yao
2021-03-15 17:37       ` Jiri Olsa
2021-03-16  1:49         ` Jin, Yao
2021-03-16 14:04           ` Jiri Olsa
2021-03-17  2:12             ` Jin, Yao
2021-03-17 10:06               ` Jiri Olsa
2021-03-17 12:17                 ` Jin, Yao
2021-03-17 13:42                   ` Arnaldo Carvalho de Melo
2021-03-18 12:16                     ` Jiri Olsa
2021-03-18 13:21                       ` Arnaldo Carvalho de Melo
2021-03-18 18:16                         ` Liang, Kan
2021-03-19  2:48                     ` Andi Kleen
2021-03-18 11:51                   ` Jiri Olsa
2021-03-11  7:07 ` [PATCH v2 12/27] perf parse-events: Support hybrid raw events Jin Yao
2021-03-11  7:07 ` [PATCH v2 13/27] perf evlist: Create two hybrid 'cycles' events by default Jin Yao
2021-03-11  7:07 ` [PATCH v2 14/27] perf stat: Add default hybrid events Jin Yao
2021-03-11  7:07 ` [PATCH v2 15/27] perf stat: Filter out unmatched aggregation for hybrid event Jin Yao
2021-03-11  7:07 ` [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group Jin Yao
2021-03-15 23:03   ` Jiri Olsa
2021-03-16  5:25     ` Jin, Yao
2021-03-18 11:52       ` Jiri Olsa
2021-03-11  7:07 ` [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group Jin Yao
2021-03-15 23:04   ` Jiri Olsa
2021-03-16  4:39     ` Jin, Yao
2021-03-11  7:07 ` [PATCH v2 18/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU Jin Yao
2021-03-11  7:07 ` [PATCH v2 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test Jin Yao
2021-03-11  7:07 ` [PATCH v2 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test Jin Yao
2021-03-11  7:07 ` [PATCH v2 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid Jin Yao
2021-03-11  7:07 ` [PATCH v2 22/27] perf tests: Support 'Track with sched_switch' " Jin Yao
2021-03-11  7:07 ` [PATCH v2 23/27] perf tests: Support 'Parse and process metrics' " Jin Yao
2021-03-11  7:07 ` [PATCH v2 24/27] perf tests: Support 'Session topology' " Jin Yao
2021-03-11  7:07 ` [PATCH v2 25/27] perf tests: Support 'Convert perf time to TSC' " Jin Yao
2021-03-11  7:07 ` [PATCH v2 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' " Jin Yao
2021-03-11  7:07 ` [PATCH v2 27/27] perf Documentation: Document intel-hybrid support Jin Yao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).