LKML Archive on lore.kernel.org
* [PATCH 0/9] TopDown metrics support for Icelake
@ 2019-05-21 21:40 kan.liang
  2019-05-21 21:40 ` [PATCH 1/9] perf/core: Support a REMOVE transaction kan.liang
                   ` (8 more replies)
  0 siblings, 9 replies; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Icelake has support for measuring the level 1 TopDown metrics
directly in hardware. This is implemented by an additional METRICS
register, and a new Fixed Counter 3 that measures pipeline SLOTS.

The four TopDown metrics are exposed as separate perf events, which map
onto the internal METRICS register: topdown-retiring, topdown-bad-spec,
topdown-fe-bound and topdown-be-bound.
Those events do not exist in hardware, but can be allocated by the
scheduler, using a special 0xff event code which is reserved for
software. The value of a TopDown metric event is calculated by
multiplying the METRICS (percentage) register with the SLOTS fixed
counter.
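As a rough userspace illustration of that calculation (not part of the
patches; the four 8-bit fields follow the SDM's PERF_METRICS layout, and
metric_count() is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: PERF_METRICS packs four 8-bit fields, one per
 * level-1 TopDown category, each a fraction of pipeline SLOTS scaled
 * so that the four fields sum to 0xff.
 */
enum { TD_RETIRING, TD_BAD_SPEC, TD_FE_BOUND, TD_BE_BOUND };

/* Slots attributed to one metric: SLOTS * (8-bit fraction / 0xff). */
static uint64_t metric_count(uint64_t perf_metrics, int idx, uint64_t slots)
{
	uint64_t frac = (perf_metrics >> (idx * 8)) & 0xff;

	return slots * frac / 0xff;
}
```

For example, a retiring field of 0x40 (64/255, roughly 25%) over 1000
slots attributes about 250 slots to retiring.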

New in Icelake
- No generic counters are required, so TopDown can always be collected
  in addition to other events.
- TopDown can be measured per thread/process, instead of only per core.

Limitations
- To get accurate results and avoid reading the METRICS register
  multiple times, the TopDown metric events and the SLOTS event have to
  be in the same group.
- The METRICS and SLOTS registers have to be cleared by SW after each
  read. This prevents loss of precision, a known side effect of the
  METRICS register.
- Sampling reads of the SLOTS and TopDown metric events are not
  supported.

Please refer to SDM Vol3, 18.3.9.3 Performance Metrics for the details of
TopDown metrics.

Andi Kleen (7):
  perf/core: Support a REMOVE transaction
  perf/x86/intel: Basic support for metrics counters
  perf/x86/intel: Support overflows on SLOTS
  perf/x86/intel: Set correct weight for TopDown metrics events
  perf/x86/intel: Export new TopDown metrics events for Icelake
  perf, tools, stat: Support new per thread TopDown metrics
  perf, tools: Add documentation for topdown metrics

Kan Liang (2):
  perf/x86/intel: Support hardware TopDown metrics
  perf/x86/intel: Disable sampling read slots and topdown

 arch/x86/events/core.c                 |  63 ++++++--
 arch/x86/events/intel/core.c           | 284 +++++++++++++++++++++++++++++++--
 arch/x86/events/perf_event.h           |  31 ++++
 arch/x86/include/asm/msr-index.h       |   3 +
 arch/x86/include/asm/perf_event.h      |  30 ++++
 include/linux/perf_event.h             |   7 +
 kernel/events/core.c                   |   5 +
 tools/perf/Documentation/perf-stat.txt |   9 +-
 tools/perf/Documentation/topdown.txt   | 223 ++++++++++++++++++++++++++
 tools/perf/builtin-stat.c              |  24 +++
 tools/perf/util/stat-shadow.c          |  89 +++++++++++
 tools/perf/util/stat.c                 |   4 +
 tools/perf/util/stat.h                 |   8 +
 13 files changed, 754 insertions(+), 26 deletions(-)
 create mode 100644 tools/perf/Documentation/topdown.txt

-- 
2.14.5


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/9] perf/core: Support a REMOVE transaction
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-21 21:40 ` [PATCH 2/9] perf/x86/intel: Basic support for metrics counters kan.liang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

The TopDown events can be collected per thread/process on Icelake. To
use TopDown through RDPMC in applications, the metrics and slots MSR
values have to be saved/restored during context switching.
It is useful to have a remove transaction when the counter is
unscheduled, so that the values can be saved correctly.

Add a remove transaction to the perf core.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c     | 3 +--
 include/linux/perf_event.h | 1 +
 kernel/events/core.c       | 5 +++++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f0e4804515d8..e075de494dfd 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1856,8 +1856,7 @@ static inline void x86_pmu_read(struct perf_event *event)
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  *
- * We only support PERF_PMU_TXN_ADD transactions. Save the
- * transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
+ * Save the transaction flags and ignore non-PERF_PMU_TXN_ADD
  * transactions.
  */
 static void x86_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5beb5cde3d56..973b7f8ce8e9 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -233,6 +233,7 @@ struct perf_event;
  */
 #define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
 #define PERF_PMU_TXN_READ 0x2		/* txn to read event group from PMU */
+#define PERF_PMU_TXN_REMOVE 0x4		/* txn to remove event on PMU */
 
 /**
  * pmu::capabilities flags
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 118ad1aef6af..f204166f6bc8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2032,6 +2032,7 @@ group_sched_out(struct perf_event *group_event,
 		struct perf_cpu_context *cpuctx,
 		struct perf_event_context *ctx)
 {
+	struct pmu *pmu = ctx->pmu;
 	struct perf_event *event;
 
 	if (group_event->state != PERF_EVENT_STATE_ACTIVE)
@@ -2039,6 +2040,8 @@ group_sched_out(struct perf_event *group_event,
 
 	perf_pmu_disable(ctx->pmu);
 
+	pmu->start_txn(pmu, PERF_PMU_TXN_REMOVE);
+
 	event_sched_out(group_event, cpuctx, ctx);
 
 	/*
@@ -2051,6 +2054,8 @@ group_sched_out(struct perf_event *group_event,
 
 	if (group_event->attr.exclusive)
 		cpuctx->exclusive = 0;
+
+	pmu->commit_txn(pmu);
 }
 
 #define DETACH_GROUP	0x01UL
-- 
2.14.5



* [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
  2019-05-21 21:40 ` [PATCH 1/9] perf/core: Support a REMOVE transaction kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-28 12:05   ` Peter Zijlstra
                     ` (2 more replies)
  2019-05-21 21:40 ` [PATCH 3/9] perf/x86/intel: Support overflows on SLOTS kan.liang
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Metrics counters (hardware counters containing multiple metrics)
are modeled as separate registers for each TopDown metric event,
with an extra reg being used for coordinating access to the
underlying register in the scheduler.

This patch adds the basic infrastructure to separate the scheduler
register indexes from the actual hardware register indexes. In
most cases the MSR address is already used correctly, but for
code using indexes we need a separate reg_idx field in the event
to indicate the correct underlying register.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c            | 18 ++++++++++++++++--
 arch/x86/events/intel/core.c      | 29 ++++++++++++++++++++---------
 arch/x86/events/perf_event.h      | 15 +++++++++++++++
 arch/x86/include/asm/msr-index.h  |  1 +
 arch/x86/include/asm/perf_event.h | 30 ++++++++++++++++++++++++++++++
 include/linux/perf_event.h        |  1 +
 6 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e075de494dfd..e9075d57853d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1035,16 +1035,30 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 	struct hw_perf_event *hwc = &event->hw;
 
 	hwc->idx = cpuc->assign[i];
+	hwc->reg_idx = hwc->idx;
 	hwc->last_cpu = smp_processor_id();
 	hwc->last_tag = ++cpuc->tags[i];
 
+	/*
+	 * Metrics counters use different indexes in the scheduler
+	 * versus the hardware.
+	 *
+	 * Map metrics to fixed counter 3 (which is the base count),
+	 * but the update event callback reads the extra metric register
+	 * and converts to the right metric.
+	 */
+	if (is_metric_idx(hwc->idx))
+		hwc->reg_idx = INTEL_PMC_IDX_FIXED_SLOTS;
+
 	if (hwc->idx == INTEL_PMC_IDX_FIXED_BTS) {
 		hwc->config_base = 0;
 		hwc->event_base	= 0;
 	} else if (hwc->idx >= INTEL_PMC_IDX_FIXED) {
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
-		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - INTEL_PMC_IDX_FIXED);
-		hwc->event_base_rdpmc = (hwc->idx - INTEL_PMC_IDX_FIXED) | 1<<30;
+		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 +
+			(hwc->reg_idx - INTEL_PMC_IDX_FIXED);
+		hwc->event_base_rdpmc = (hwc->reg_idx - INTEL_PMC_IDX_FIXED)
+			| 1<<30;
 	} else {
 		hwc->config_base = x86_pmu_config_addr(hwc->idx);
 		hwc->event_base  = x86_pmu_event_addr(hwc->idx);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 85afe7e98c7d..75ed91a36413 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2129,7 +2129,7 @@ static inline void intel_pmu_ack_status(u64 ack)
 
 static void intel_pmu_disable_fixed(struct hw_perf_event *hwc)
 {
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
+	int idx = hwc->reg_idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask;
 
 	mask = 0xfULL << (idx * 4);
@@ -2155,9 +2155,19 @@ static void intel_pmu_disable_event(struct perf_event *event)
 		return;
 	}
 
-	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx);
-	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
-	cpuc->intel_cp_status &= ~(1ull << hwc->idx);
+	__clear_bit(hwc->idx, cpuc->enabled_events);
+
+	/*
+	 * If any other event sharing the slots counter is still
+	 * enabled, skip the disabling.
+	 */
+	if (is_any_slots_idx(hwc->idx) &&
+	    (*(u64 *)&cpuc->enabled_events & INTEL_PMC_MSK_ANY_SLOTS))
+		return;
+
+	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->reg_idx);
+	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->reg_idx);
+	cpuc->intel_cp_status &= ~(1ull << hwc->reg_idx);
 
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
 		intel_pmu_disable_fixed(hwc);
@@ -2193,7 +2203,7 @@ static void intel_pmu_read_event(struct perf_event *event)
 static void intel_pmu_enable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
+	int idx = hwc->reg_idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask, bits = 0;
 
 	/*
@@ -2242,18 +2252,19 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	}
 
 	if (event->attr.exclude_host)
-		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->idx);
+		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->reg_idx);
 	if (event->attr.exclude_guest)
-		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
+		cpuc->intel_ctrl_host_mask |= (1ull << hwc->reg_idx);
 
 	if (unlikely(event_is_checkpointed(event)))
-		cpuc->intel_cp_status |= (1ull << hwc->idx);
+		cpuc->intel_cp_status |= (1ull << hwc->reg_idx);
 
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_enable(event);
 
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
-		intel_pmu_enable_fixed(event);
+		if (!__test_and_set_bit(hwc->idx, cpuc->enabled_events))
+			intel_pmu_enable_fixed(event);
 		return;
 	}
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7ae2912f16de..dd6c86a758f7 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -203,6 +203,7 @@ struct cpu_hw_events {
 	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	unsigned long		running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	int			enabled;
+	unsigned long		enabled_events[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 
 	int			n_events; /* the # of events in the below arrays */
 	int			n_added;  /* the # last events in the below arrays;
@@ -366,6 +367,20 @@ struct cpu_hw_events {
 #define FIXED_EVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
+/*
+ * Special metric counters do not actually exist, but get remapped
+ * to a combination of FxCtr3 + MSR_PERF_METRICS
+ *
+ * This allocates them to a dummy offset for the scheduler.
+ * This does not allow sharing of multiple users of the same
+ * metric without multiplexing, even though the hardware supports that
+ * in principle.
+ */
+
+#define METRIC_EVENT_CONSTRAINT(c, n)					\
+	EVENT_CONSTRAINT(c, (1ULL << (INTEL_PMC_IDX_FIXED_METRIC_BASE+n)), \
+			 FIXED_EVENT_FLAGS)
+
 /*
  * Constraint on the Event code + UMask
  */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1378518cf63f..4310477d6808 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -777,6 +777,7 @@
 #define MSR_CORE_PERF_FIXED_CTR0	0x00000309
 #define MSR_CORE_PERF_FIXED_CTR1	0x0000030a
 #define MSR_CORE_PERF_FIXED_CTR2	0x0000030b
+#define MSR_CORE_PERF_FIXED_CTR3	0x0000030c
 #define MSR_CORE_PERF_FIXED_CTR_CTRL	0x0000038d
 #define MSR_CORE_PERF_GLOBAL_STATUS	0x0000038e
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 1392d5e6e8d6..7be4f9d5ea6f 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -167,6 +167,10 @@ struct x86_pmu_capability {
 #define INTEL_PMC_IDX_FIXED_REF_CYCLES	(INTEL_PMC_IDX_FIXED + 2)
 #define INTEL_PMC_MSK_FIXED_REF_CYCLES	(1ULL << INTEL_PMC_IDX_FIXED_REF_CYCLES)
 
+#define MSR_ARCH_PERFMON_FIXED_CTR3	0x30c
+#define INTEL_PMC_IDX_FIXED_SLOTS	(INTEL_PMC_IDX_FIXED + 3)
+#define INTEL_PMC_MSK_FIXED_SLOTS	(1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *
@@ -176,6 +180,32 @@ struct x86_pmu_capability {
  */
 #define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 16)
 
+/*
+ * We model PERF_METRICS as more magic fixed-mode PMCs, one for each metric
+ * and another for the whole slots counter
+ *
+ * Internally they all map to Fixed Ctr 3 (SLOTS), and allocate PERF_METRICS
+ * as an extra_reg. PERF_METRICS has no configuration of its own, but we fill in
+ * the configuration of FxCtr3 to enforce that all the shared users of SLOTS
+ * have the same configuration.
+ */
+#define INTEL_PMC_IDX_FIXED_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 17)
+#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 0)
+#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 1)
+#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 2)
+#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 3)
+#define INTEL_PMC_MSK_ANY_SLOTS			((0xfull << INTEL_PMC_IDX_FIXED_METRIC_BASE) | \
+						 INTEL_PMC_MSK_FIXED_SLOTS)
+static inline bool is_metric_idx(int idx)
+{
+	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE && idx <= INTEL_PMC_IDX_TD_BE_BOUND;
+}
+
+static inline bool is_any_slots_idx(int idx)
+{
+	return is_metric_idx(idx) || idx == INTEL_PMC_IDX_FIXED_SLOTS;
+}
+
 #define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
 #define GLOBAL_STATUS_UNC_OVF				BIT_ULL(61)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 973b7f8ce8e9..b980b9e95d2a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -127,6 +127,7 @@ struct hw_perf_event {
 			unsigned long	event_base;
 			int		event_base_rdpmc;
 			int		idx;
+			int		reg_idx;
 			int		last_cpu;
 			int		flags;
 
-- 
2.14.5



* [PATCH 3/9] perf/x86/intel: Support overflows on SLOTS
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
  2019-05-21 21:40 ` [PATCH 1/9] perf/core: Support a REMOVE transaction kan.liang
  2019-05-21 21:40 ` [PATCH 2/9] perf/x86/intel: Basic support for metrics counters kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-28 12:20   ` Peter Zijlstra
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

The internal counters used for the metrics can overflow. If this happens,
an overflow is triggered on the SLOTS fixed counter. Add special code
that resets all the slave metric counters in this case.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 75ed91a36413..a66dc761f09d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2279,12 +2279,35 @@ static void intel_pmu_add_event(struct perf_event *event)
 		intel_pmu_lbr_add(event);
 }
 
+/* When SLOTS overflows, update all the active topdown-* events */
+static void intel_pmu_update_metrics(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int idx;
+	u64 slots_events;
+
+	slots_events = *(u64 *)cpuc->enabled_events & INTEL_PMC_MSK_ANY_SLOTS;
+
+	for_each_set_bit(idx, (unsigned long *)&slots_events, 64) {
+		struct perf_event *ev = cpuc->events[idx];
+
+		if (ev == event)
+			continue;
+		x86_perf_event_update(ev);
+	}
+}
+
 /*
  * Save and restart an expired event. Called by NMI contexts,
  * so it has to be careful about preempting normal event ops:
  */
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (unlikely(hwc->reg_idx == INTEL_PMC_IDX_FIXED_SLOTS))
+		intel_pmu_update_metrics(event);
+
 	x86_perf_event_update(event);
 	/*
 	 * For a checkpointed counter always reset back to 0.  This
-- 
2.14.5



* [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
                   ` (2 preceding siblings ...)
  2019-05-21 21:40 ` [PATCH 3/9] perf/x86/intel: Support overflows on SLOTS kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-28 12:43   ` Peter Zijlstra
                     ` (5 more replies)
  2019-05-21 21:40 ` [PATCH 5/9] perf/x86/intel: Set correct weight for TopDown metrics events kan.liang
                   ` (4 subsequent siblings)
  8 siblings, 6 replies; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Intro
=====

Icelake has support for measuring the four top level TopDown metrics
directly in hardware. This is implemented by an additional "metrics"
register, and a new Fixed Counter 3 that measures pipeline "slots".

Events
======

We export four metric events as separate perf events, which map to
the internal "metrics" counter register. Those events do not exist in
hardware, but can be allocated by the scheduler.

For the event mapping we use a special 0xff event code, which is
reserved for software.

When setting up such events, they point to the slots counter, and a
special callback, metric_update_event(), reads the additional metrics
MSR to generate the metric values. The metric is then reported by
multiplying the metric (a percentage) with slots.

This multiplication makes it easy to keep a running count, for example
when the slots counter overflows, and makes all the standard tools, such
as perf stat, work. They can take deltas of the values without needing
to know about percentages. It also simplifies accumulating the counts
of child events, which would otherwise need to know how to average
percentage values.
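As a sketch, the fixed-point form of this multiplication (mirroring the
arithmetic used later in this series by icl_metric_update_event();
slots_in_metric() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the division-free 16.16 fixed-point scaling:
 * slots-in-metric ~= (metric / 0xff) * slots, where
 * (metric * 0xffff) >> 8 approximates metric/0xff in fixed point.
 */
static uint64_t slots_in_metric(uint64_t metric_pct, uint64_t slots)
{
	return ((((metric_pct & 0xff) * 0xffff) >> 8) * slots) >> 16;
}
```

The >> 8 / >> 16 pair avoids a division at the cost of a tiny
underestimate: a full 0xff fraction of 65536 slots comes out as 65279,
not 65536.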

Groups
======

To avoid reading the METRICS register multiple times, the metrics and
slots value are cached. This only works when TopDown metrics events are
in the same group.

Resetting1
==========

The 8bit metrics ratio values lose precision when the measurement period
gets longer.

To avoid this we always reset the metric value when reading, as we
already accumulate the count in the perf count value.

For a long period read, low precision is acceptable.
For a short period read, the register will be reset often enough that it
is not a problem.

This implies that reading more than one sub-metric always requires a
group, so that the caching above still gives the correct value.

We also need to support this in the NMI handler, so that it's possible
to collect all TopDown metrics as part of leader sampling. To avoid
races with the normal transactions, a special nmi_metric cache is used
that is only valid during the NMI.
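A toy model of this read-once caching (purely illustrative: the real
code caches in cpuc->txn_metric/nmi_metric, and the "register" here is
faked):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t cached_metric;	/* 0 means "not read yet" */
static int hw_reads;		/* counts simulated register reads */

/* Stand-in for the destructive PERF_METRICS read-and-clear. */
static uint64_t fake_rdpmc_metrics(void)
{
	hw_reads++;
	return 0x20406080;	/* arbitrary packed value for the demo */
}

/*
 * First caller in the group reads (and would clear) the hardware;
 * the sibling metric events reuse the cached value instead.
 */
static uint64_t read_metrics_cached(void)
{
	if (!cached_metric)
		cached_metric = fake_rdpmc_metrics();
	return cached_metric;
}
```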

Resetting2
==========

PERF_METRICS may report a wrong value if its delta is less than 1/255
of SLOTS (Fixed Counter 3).

To avoid this, the PERF_METRICS and SLOTS registers have to be reset
simultaneously. The slots value has to be cached as well.

In counting mode, -max_period would be the initial value of SLOTS. Such
a huge initial value will definitely trigger the issue mentioned above,
so force the initial value to 0 for topdown and slots event counting.

RDPMC
=====
TopDown metrics can be collected per thread/process. To use TopDown
through RDPMC in applications on Icelake, the metrics and slots values
have to be saved/restored during context switching.

Add a specific set_period() to specially handle the slots and metrics
events, because:
 - The initial value must be 0.
 - The value only needs to be restored on context switch. In other
   cases, the counters have been cleared after read.
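This bookkeeping reduces to the delta rule
delta = new_SLOTS * new_METRICS - last_SLOTS * last_METRICS, where the
last_* values are non-zero only after a context-switch restore. A
minimal userspace sketch (td_scale() and td_delta() are hypothetical
names, not the kernel implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Slots attributed to the 8-bit metric field at position 'idx'. */
static uint64_t td_scale(uint64_t metric, int idx, uint64_t slots)
{
	return slots * ((metric >> (idx * 8)) & 0xff) / 0xff;
}

/*
 * Count delta for one metric event. After a normal read both registers
 * are cleared, so last_slots/last_metric are 0; only a context-switch
 * restore leaves non-zero "last" values to subtract.
 */
static uint64_t td_delta(uint64_t last_slots, uint64_t last_metric,
			 uint64_t new_slots, uint64_t new_metric, int idx)
{
	uint64_t last = td_scale(last_metric, idx, last_slots);

	return td_scale(new_metric, idx, new_slots) - last;
}
```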

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c           |  40 +++++---
 arch/x86/events/intel/core.c     | 192 +++++++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h     |  14 +++
 arch/x86/include/asm/msr-index.h |   2 +
 include/linux/perf_event.h       |   5 +
 5 files changed, 242 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e9075d57853d..07ecfe75f0e6 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -91,16 +91,20 @@ u64 x86_perf_event_update(struct perf_event *event)
 					new_raw_count) != prev_raw_count)
 		goto again;
 
-	/*
-	 * Now we have the new raw value and have updated the prev
-	 * timestamp already. We can now calculate the elapsed delta
-	 * (event-)time and add that to the generic event.
-	 *
-	 * Careful, not all hw sign-extends above the physical width
-	 * of the count.
-	 */
-	delta = (new_raw_count << shift) - (prev_raw_count << shift);
-	delta >>= shift;
+	if (unlikely(hwc->flags & PERF_X86_EVENT_UPDATE))
+		delta = x86_pmu.metric_update_event(event, new_raw_count);
+	else {
+		/*
+		 * Now we have the new raw value and have updated the prev
+		 * timestamp already. We can now calculate the elapsed delta
+		 * (event-)time and add that to the generic event.
+		 *
+		 * Careful, not all hw sign-extends above the physical width
+		 * of the count.
+		 */
+		delta = (new_raw_count << shift) - (prev_raw_count << shift);
+		delta >>= shift;
+	}
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -1003,6 +1007,10 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 
 	max_count = x86_pmu.num_counters + x86_pmu.num_counters_fixed;
 
+	/* There are 4 TopDown metrics */
+	if (x86_pmu.has_metric)
+		max_count += 4;
+
 	/* current number of events already accepted */
 	n = cpuc->n_events;
 
@@ -1186,6 +1194,9 @@ int x86_perf_event_set_period(struct perf_event *event)
 	if (idx == INTEL_PMC_IDX_FIXED_BTS)
 		return 0;
 
+	if (x86_pmu.set_period && x86_pmu.set_period(event))
+		goto update_userpage;
+
 	/*
 	 * If we are way outside a reasonable range then just skip forward:
 	 */
@@ -1234,6 +1245,7 @@ int x86_perf_event_set_period(struct perf_event *event)
 			(u64)(-left) & x86_pmu.cntval_mask);
 	}
 
+update_userpage:
 	perf_event_update_userpage(event);
 
 	return ret;
@@ -1901,6 +1913,8 @@ static void x86_pmu_cancel_txn(struct pmu *pmu)
 
 	txn_flags = cpuc->txn_flags;
 	cpuc->txn_flags = 0;
+	cpuc->txn_metric = 0;
+	cpuc->txn_slots = 0;
 	if (txn_flags & ~PERF_PMU_TXN_ADD)
 		return;
 
@@ -1928,6 +1942,8 @@ static int x86_pmu_commit_txn(struct pmu *pmu)
 
 	WARN_ON_ONCE(!cpuc->txn_flags);	/* no txn in flight */
 
+	cpuc->txn_metric = 0;
+	cpuc->txn_slots = 0;
 	if (cpuc->txn_flags & ~PERF_PMU_TXN_ADD) {
 		cpuc->txn_flags = 0;
 		return 0;
@@ -2141,7 +2157,9 @@ static int x86_pmu_event_idx(struct perf_event *event)
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
-	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
+	if (is_metric_idx(idx))
+		idx = 1 << 29;
+	else if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
 		idx -= INTEL_PMC_IDX_FIXED;
 		idx |= 1 << 30;
 	}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a66dc761f09d..2eec172765f4 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -245,6 +245,11 @@ static struct event_constraint intel_icl_event_constraints[] = {
 	FIXED_EVENT_CONSTRAINT(0x003c, 1),	/* CPU_CLK_UNHALTED.CORE */
 	FIXED_EVENT_CONSTRAINT(0x0300, 2),	/* CPU_CLK_UNHALTED.REF */
 	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
+	METRIC_EVENT_CONSTRAINT(0x01ff, 0),	/* Retiring metric */
+	METRIC_EVENT_CONSTRAINT(0x02ff, 1),	/* Bad speculation metric */
+	METRIC_EVENT_CONSTRAINT(0x03ff, 2),	/* FE bound metric */
+	METRIC_EVENT_CONSTRAINT(0x04ff, 3),	/* BE bound metric */
+
 	INTEL_EVENT_CONSTRAINT_RANGE(0x03, 0x0a, 0xf),
 	INTEL_EVENT_CONSTRAINT_RANGE(0x1f, 0x28, 0xf),
 	INTEL_EVENT_CONSTRAINT(0x32, 0xf),	/* SW_PREFETCH_ACCESS.* */
@@ -265,6 +270,14 @@ static struct extra_reg intel_icl_extra_regs[] __read_mostly = {
 	INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0x3fffff9fffull, RSP_1),
 	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
 	INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff17, FE),
+	/*
+	 * PERF_METRICS does exist, but it has no configuration of its
+	 * own. We share the original Fixed Ctr 3 between the different
+	 * metrics events, so use the extra reg to enforce the same
+	 * configuration on the original register, but do not actually
+	 * write to it.
+	 */
+	INTEL_EVENT_EXTRA_REG(0xff, 0, -1L, PERF_METRICS),
 	EVENT_EXTRA_END
 };
 
@@ -2480,6 +2493,8 @@ static int intel_pmu_handle_irq_v4(struct pt_regs *regs)
 	int pmu_enabled = cpuc->enabled;
 	int loops = 0;
 
+	cpuc->nmi_metric = 0;
+	cpuc->nmi_slots = 0;
 	/* PMU has been disabled because of counter freezing */
 	cpuc->enabled = 0;
 	if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
@@ -2553,6 +2568,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	int pmu_enabled;
 
 	cpuc = this_cpu_ptr(&cpu_hw_events);
+	cpuc->nmi_metric = 0;
+	cpuc->nmi_slots = 0;
 
 	/*
 	 * Save the PMU state.
@@ -3287,6 +3304,13 @@ static int core_pmu_hw_config(struct perf_event *event)
 	return intel_pmu_bts_config(event);
 }
 
+#define EVENT_CODE(e)	(e->attr.config & INTEL_ARCH_EVENT_MASK)
+#define is_slots_event(e)	(EVENT_CODE(e) == 0x0400)
+#define is_perf_metrics_event(e)				\
+		(((EVENT_CODE(e) & 0xff) == 0xff) &&		\
+		 (EVENT_CODE(e) >= 0x01ff) &&			\
+		 (EVENT_CODE(e) <= 0x04ff))
+
 static int intel_pmu_hw_config(struct perf_event *event)
 {
 	int ret = x86_pmu_hw_config(event);
@@ -3332,6 +3356,31 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	/* Fixed Counter 3 with its metric sub events */
+	if (x86_pmu.has_metric &&
+	    (is_slots_event(event) || is_perf_metrics_event(event))) {
+		if (event->attr.config1 != 0)
+			return -EINVAL;
+		if (event->attr.config & ARCH_PERFMON_EVENTSEL_ANY)
+			return -EINVAL;
+		/*
+		 * Put configuration (minus event) into config1 so that
+		 * the scheduler enforces through an extra_reg that
+		 * all instances of the metrics events have the same
+		 * configuration.
+		 */
+		event->attr.config1 = event->hw.config & X86_ALL_EVENT_FLAGS;
+		if (is_perf_metrics_event(event)) {
+			if (!x86_pmu.intel_cap.perf_metrics)
+				return -EINVAL;
+			if (is_sampling_event(event))
+				return -EINVAL;
+		}
+		if (!is_sampling_event(event))
+			event->hw.flags |= PERF_X86_EVENT_UPDATE;
+		return 0;
+	}
+
 	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
 		return 0;
 
@@ -4220,6 +4269,141 @@ static __init void intel_ht_bug(void)
 	x86_pmu.stop_scheduling = intel_stop_scheduling;
 }
 
+/*
+ * Update metric event with the PERF_METRICS register and return the delta
+ *
+ * Metric events are defined as SLOTS * metric. The original
+ * metric can be reconstructed by taking SUM(all-metrics)/metric
+ * (or SLOTS/metric)
+ *
+ * There are some limits for reading and resetting the PERF_METRICS register.
+ * - The PERF_METRICS and SLOTS registers have to be reset simultaneously.
+ * - To get the high precision, the measurement period has to be short.
+ *   The PERF_METRICS and SLOTS will be reset for each reading.
+ *   The only exception is context switch. The PERF_METRICS and SLOTS have to
+ *   be saved/restored during context switch for RDPMC users.
+ *
+ * The PERF_METRICS doesn't have the absolute value. To calculate the delta
+ * of metric events,
+ *   delta = new_SLOTS * new_METRICS - last_SLOTS * last_METRICS;
+ * Only need to save the last SLOTS and METRICS for context switch. For other
+ * cases, the registers have been reset. The last_* values are 0.
+ *
+ * To avoid reading the METRICS register multiple times, the metrics and slots
+ * value are cached. This only works when TopDown metrics events are in the
+ * same group.
+ */
+static u64 icl_metric_update_event(struct perf_event *event, u64 val)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	u64 newval, metric, slots_val = 0, new, last;
+	bool nmi = in_nmi();
+	int txn_flags = nmi ? 0 : cpuc->txn_flags;
+
+	/*
+	 * Use cached value for transaction.
+	 */
+	newval = 0;
+	if (txn_flags) {
+		newval = cpuc->txn_metric;
+		slots_val = cpuc->txn_slots;
+	} else if (nmi) {
+		newval = cpuc->nmi_metric;
+		slots_val = cpuc->nmi_slots;
+	}
+
+	if (!newval) {
+		slots_val = val;
+
+		rdpmcl((1<<29), newval);
+		if (txn_flags) {
+			cpuc->txn_metric = newval;
+			cpuc->txn_slots = slots_val;
+		} else if (nmi) {
+			cpuc->nmi_metric = newval;
+			cpuc->nmi_slots = slots_val;
+		}
+
+		if (!(txn_flags & PERF_PMU_TXN_REMOVE)) {
+			/* Reset the metric value when reading.
+			 * The SLOTS register must be reset when PERF_METRICS is,
+			 * otherwise PERF_METRICS may have wrong output.
+			 */
+			wrmsrl(MSR_PERF_METRICS, 0);
+			wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+			hwc->saved_metric = 0;
+			hwc->saved_slots = 0;
+		} else {
+			/* saved metric and slots for context switch */
+			hwc->saved_metric = newval;
+			hwc->saved_slots = val;
+
+		}
+		/* cache the last metric and slots */
+		cpuc->last_metric = hwc->last_metric;
+		cpuc->last_slots = hwc->last_slots;
+		hwc->last_metric = 0;
+		hwc->last_slots = 0;
+	}
+
+	/* The METRICS and SLOTS have been reset when reading */
+	if (!(txn_flags & PERF_PMU_TXN_REMOVE))
+		local64_set(&hwc->prev_count, 0);
+
+	if (is_slots_event(event))
+		return (slots_val - cpuc->last_slots);
+
+	/*
+	 * The metric is reported as an 8-bit integer fraction, with the
+	 * four fields summing up to 0xff. As the counter is less than
+	 * 64 bits wide, we can use the unused bits to get the needed
+	 * precision. Use 16-bit fixed-point arithmetic for
+	 * slots-in-metric = (MetricPct / 0xff) * val
+	 * This works fine for up to 48-bit counters, but will
+	 * lose precision above that.
+	 */
+
+	metric = (cpuc->last_metric >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
+	last = (((metric * 0xffff) >> 8) * cpuc->last_slots) >> 16;
+
+	metric = (newval >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
+	new = (((metric * 0xffff) >> 8) * slots_val) >> 16;
+
+	return (new - last);
+}
+
+/*
+ * Update counters for metrics and slots events
+ */
+static int icl_set_period(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 left = local64_read(&hwc->period_left);
+
+	if (!(hwc->flags & PERF_X86_EVENT_UPDATE))
+		return 0;
+
+	/* The initial value must be 0 */
+	if ((left == x86_pmu.max_period) && !hwc->saved_slots) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		wrmsrl(MSR_PERF_METRICS, 0);
+	}
+
+	/* restore metric and slots for context switch */
+	if (hwc->saved_slots) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, hwc->saved_slots);
+		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
+		hwc->last_slots = hwc->saved_slots;
+		hwc->last_metric = hwc->saved_metric;
+		hwc->saved_slots = 0;
+		hwc->saved_metric = 0;
+	}
+
+	return 1;
+}
+
 EVENT_ATTR_STR(mem-loads,	mem_ld_hsw,	"event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores,	mem_st_hsw,	"event=0xd0,umask=0x82")
 
@@ -5039,6 +5223,9 @@ __init int intel_pmu_init(void)
 		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xca, .umask=0x02);
 		x86_pmu.lbr_pt_coexist = true;
 		intel_pmu_pebs_data_source_skl(false);
+		x86_pmu.has_metric = x86_pmu.intel_cap.perf_metrics;
+		x86_pmu.metric_update_event = icl_metric_update_event;
+		x86_pmu.set_period = icl_set_period;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
@@ -5149,6 +5336,11 @@ __init int intel_pmu_init(void)
 	if (x86_pmu.counter_freezing)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
+	if (x86_pmu.has_metric) {
+		x86_pmu.intel_ctrl |= 1ULL << 48;
+		pr_cont("TopDown, ");
+	}
+
 	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index dd6c86a758f7..909c3b30782e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -40,6 +40,7 @@ enum extra_reg_type {
 	EXTRA_REG_LBR   = 2,	/* lbr_select */
 	EXTRA_REG_LDLAT = 3,	/* ld_lat_threshold */
 	EXTRA_REG_FE    = 4,    /* fe_* */
+	EXTRA_REG_PERF_METRICS = 5, /* perf metrics */
 
 	EXTRA_REG_MAX		/* number of entries needed */
 };
@@ -76,6 +77,7 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode)
 #define PERF_X86_EVENT_EXCL_ACCT	0x0100 /* accounted EXCL event */
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0200 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0400 /* use large PEBS */
+#define PERF_X86_EVENT_UPDATE		0x0800 /* call update_event after read */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -255,6 +257,13 @@ struct cpu_hw_events {
 	u64				intel_ctrl_host_mask;
 	struct perf_guest_switch_msr	guest_switch_msrs[X86_PMC_IDX_MAX];
 
+	unsigned long			txn_metric;
+	unsigned long			txn_slots;
+	unsigned long			nmi_metric;
+	unsigned long			nmi_slots;
+	unsigned long			last_metric;
+	unsigned long			last_slots;
+
 	/*
 	 * Intel checkpoint mask
 	 */
@@ -543,6 +552,7 @@ union perf_capabilities {
 		 */
 		u64	full_width_write:1;
 		u64     pebs_baseline:1;
+		u64	perf_metrics:1;
 	};
 	u64	capabilities;
 };
@@ -606,6 +616,7 @@ struct x86_pmu {
 	int		(*addr_offset)(int index, bool eventsel);
 	int		(*rdpmc_index)(int index);
 	u64		(*event_map)(int);
+	u64		(*metric_update_event)(struct perf_event *event, u64 val);
 	int		max_events;
 	int		num_counters;
 	int		num_counters_fixed;
@@ -636,6 +647,7 @@ struct x86_pmu {
 	struct x86_pmu_quirk *quirks;
 	int		perfctr_second_write;
 	u64		(*limit_period)(struct perf_event *event, u64 l);
+	int		(*set_period)(struct perf_event *event);
 
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
@@ -717,6 +729,8 @@ struct x86_pmu {
 	struct extra_reg *extra_regs;
 	unsigned int flags;
 
+	bool has_metric;
+
 	/*
 	 * Intel host/guest support (KVM)
 	 */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4310477d6808..bcdd6fadf225 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -102,6 +102,8 @@
 #define MSR_TURBO_RATIO_LIMIT1		0x000001ae
 #define MSR_TURBO_RATIO_LIMIT2		0x000001af
 
+#define MSR_PERF_METRICS		0x00000329
+
 #define MSR_LBR_SELECT			0x000001c8
 #define MSR_LBR_TOS			0x000001c9
 #define MSR_LBR_NHM_FROM		0x00000680
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b980b9e95d2a..0d7081434d1d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -133,6 +133,11 @@ struct hw_perf_event {
 
 			struct hw_perf_event_extra extra_reg;
 			struct hw_perf_event_extra branch_reg;
+
+			u64		saved_metric;
+			u64		saved_slots;
+			u64		last_slots;
+			u64		last_metric;
 		};
 		struct { /* software */
 			struct hrtimer	hrtimer;
-- 
2.14.5


^ permalink raw reply	[flat|nested] 44+ messages in thread
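
[Editorial note: the fixed-point scaling in icl_metric_update_event() above can be
sketched as standalone code. This is a hedged illustration, not kernel code:
metric_slots() is a hypothetical helper name, and the constants mirror the
slots-in-metric computation in the patch.]

```c
#include <stdint.h>

/*
 * Hypothetical standalone sketch of the fixed-point scaling used by
 * icl_metric_update_event() in the patch above. Each PERF_METRICS
 * field is an 8-bit fraction of 0xff (~100%). Scaling by 0xffff/0x100
 * first keeps 16 bits of precision before multiplying by SLOTS, which
 * is exact enough for counters up to about 48 bits.
 */
static inline uint64_t metric_slots(uint64_t metrics, int idx, uint64_t slots)
{
	uint64_t frac = (metrics >> (idx * 8)) & 0xff;

	return (((frac * 0xffff) >> 8) * slots) >> 16;
}

/*
 * The delta for one metric event between two reads is then:
 *   delta = metric_slots(new_metrics, i, new_slots)
 *         - metric_slots(last_metrics, i, last_slots);
 * with last_* being 0 whenever the registers were reset on read.
 */
```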

* [PATCH 5/9] perf/x86/intel: Set correct weight for TopDown metrics events
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
                   ` (3 preceding siblings ...)
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-28 13:50   ` Peter Zijlstra
  2019-05-21 21:40 ` [PATCH 6/9] perf/x86/intel: Export new TopDown metrics events for Icelake kan.liang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

The topdown metric and slots events are mapped to a fixed counter,
but should have the normal weight for the scheduler.
So special-case this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2eec172765f4..6de9249acb28 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5281,6 +5281,15 @@ __init int intel_pmu_init(void)
 		 * counter, so do not extend mask to generic counters
 		 */
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
+			/*
+			 * Don't limit the event mask for TopDown
+			 * metrics and slots events.
+			 */
+			if (x86_pmu.num_counters_fixed >= 3 &&
+			    c->idxmsk64 & INTEL_PMC_MSK_ANY_SLOTS) {
+				c->weight = hweight64(c->idxmsk64);
+				continue;
+			}
 			if (c->cmask == FIXED_EVENT_FLAGS
 			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) {
 				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
-- 
2.14.5


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 6/9] perf/x86/intel: Export new TopDown metrics events for Icelake
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
                   ` (4 preceding siblings ...)
  2019-05-21 21:40 ` [PATCH 5/9] perf/x86/intel: Set correct weight for TopDown metrics events kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-21 21:40 ` [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown kan.liang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Export new TopDown metric events for perf that map to the sub-metrics
in the METRICS register, and a slots event for the new fixed counter.
This makes the new fixed counters on Icelake visible to the perf
user tools.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6de9249acb28..79e9d05e047d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -320,6 +320,12 @@ EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
 EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
 	"4", "2");
 
+EVENT_ATTR_STR(slots,			slots,		"event=0x00,umask=0x4");
+EVENT_ATTR_STR(topdown-retiring,	td_retiring,	"event=0xff,umask=0x1");
+EVENT_ATTR_STR(topdown-bad-spec,	td_bad_spec,	"event=0xff,umask=0x2");
+EVENT_ATTR_STR(topdown-fe-bound,	td_fe_bound,	"event=0xff,umask=0x3");
+EVENT_ATTR_STR(topdown-be-bound,	td_be_bound,	"event=0xff,umask=0x4");
+
 static struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(td_slots_issued),
 	EVENT_PTR(td_slots_retired),
@@ -4462,6 +4468,11 @@ EVENT_ATTR_STR(el-capacity-write, el_capacity_write, "event=0x54,umask=0x2");
 static struct attribute *icl_events_attrs[] = {
 	EVENT_PTR(mem_ld_hsw),
 	EVENT_PTR(mem_st_hsw),
+	EVENT_PTR(slots),
+	EVENT_PTR(td_retiring),
+	EVENT_PTR(td_bad_spec),
+	EVENT_PTR(td_fe_bound),
+	EVENT_PTR(td_be_bound),
 	NULL,
 };
 
-- 
2.14.5


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
                   ` (5 preceding siblings ...)
  2019-05-21 21:40 ` [PATCH 6/9] perf/x86/intel: Export new TopDown metrics events for Icelake kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-28 13:52   ` Peter Zijlstra
  2019-05-21 21:40 ` [PATCH 8/9] perf, tools, stat: Support new per thread TopDown metrics kan.liang
  2019-05-21 21:40 ` [PATCH 9/9] perf, tools: Add documentation for topdown metrics kan.liang
  8 siblings, 1 reply; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

To get a correct PERF_METRICS value, fixed counter 3 (SLOTS) must start
from 0. This causes problems when doing a sampling read of slots and
topdown events. For example,
        perf record -e '{slots, topdown-retiring}:S'
The slots event would never overflow if it always starts from 0.

Add model-specific validate_group() support for Icelake to reject such
groups and error out.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c       |  2 ++
 arch/x86/events/intel/core.c | 20 ++++++++++++++++++++
 arch/x86/events/perf_event.h |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 07ecfe75f0e6..a7eb842f8651 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2065,6 +2065,8 @@ static int validate_group(struct perf_event *event)
 	fake_cpuc->n_events = 0;
 	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
 
+	if (x86_pmu.validate_group)
+		ret = x86_pmu.validate_group(fake_cpuc, n);
 out:
 	free_fake_cpuc(fake_cpuc);
 	return ret;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 79e9d05e047d..2bb90d652a35 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4410,6 +4410,25 @@ static int icl_set_period(struct perf_event *event)
 	return 1;
 }
 
+static int icl_validate_group(struct cpu_hw_events *cpuc, int n)
+{
+	bool has_sampling_slots = false, has_metrics = false;
+	struct perf_event *e;
+	int i;
+
+	for (i = 0; i < n; i++) {
+		e = cpuc->event_list[i];
+		if (is_slots_event(e) && is_sampling_event(e))
+			has_sampling_slots = true;
+
+		if (is_perf_metrics_event(e))
+			has_metrics = true;
+	}
+	if (unlikely(has_sampling_slots && has_metrics))
+		return -EINVAL;
+	return 0;
+}
+
 EVENT_ATTR_STR(mem-loads,	mem_ld_hsw,	"event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores,	mem_st_hsw,	"event=0xd0,umask=0x82")
 
@@ -5237,6 +5256,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.has_metric = x86_pmu.intel_cap.perf_metrics;
 		x86_pmu.metric_update_event = icl_metric_update_event;
 		x86_pmu.set_period = icl_set_period;
+		x86_pmu.validate_group = icl_validate_group;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 909c3b30782e..66729f868f7c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -649,6 +649,8 @@ struct x86_pmu {
 	u64		(*limit_period)(struct perf_event *event, u64 l);
 	int		(*set_period)(struct perf_event *event);
 
+	int		(*validate_group)(struct cpu_hw_events *cpuc, int n);
+
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
 			counter_freezing	:1;
-- 
2.14.5


^ permalink raw reply	[flat|nested] 44+ messages in thread
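
[Editorial note: the constraint enforced by icl_validate_group() above can be
demonstrated with a standalone sketch. The struct and fields below are
simplified stand-ins for the kernel's perf_event helpers, not real API.]

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the rule enforced by icl_validate_group() in the patch
 * above: a group may not contain both a sampling SLOTS event and a
 * metric event, because sampling requires SLOTS to start from a
 * non-zero value while PERF_METRICS requires it to start from 0.
 */
struct fake_event {
	bool is_slots;
	bool is_sampling;
	bool is_metric;
};

static int validate_group(const struct fake_event *ev, size_t n)
{
	bool sampling_slots = false, has_metrics = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ev[i].is_slots && ev[i].is_sampling)
			sampling_slots = true;
		if (ev[i].is_metric)
			has_metrics = true;
	}
	/* the kernel returns -EINVAL here */
	return (sampling_slots && has_metrics) ? -1 : 0;
}
```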

* [PATCH 8/9] perf, tools, stat: Support new per thread TopDown metrics
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
                   ` (6 preceding siblings ...)
  2019-05-21 21:40 ` [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown kan.liang
@ 2019-05-21 21:40 ` kan.liang
  2019-05-21 21:40 ` [PATCH 9/9] perf, tools: Add documentation for topdown metrics kan.liang
  8 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Icelake has support for reporting per-thread TopDown metrics.
These are reported differently from the previous TopDown support:
each metric is standalone, but scaled to pipeline "slots".
We don't need to do anything special for HyperThreading anymore.
Teach perf stat --topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restriction of only being able to report information per core is
gone.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  9 +++-
 tools/perf/builtin-stat.c              | 24 +++++++++
 tools/perf/util/stat-shadow.c          | 89 ++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.c                 |  4 ++
 tools/perf/util/stat.h                 |  8 +++
 5 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 39c05f89104e..d572e5ea17c4 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -265,8 +265,13 @@ if the workload is actually bound by the CPU and not by something else.
 For best results it is usually a good idea to use it with interval
 mode like -I 1000, as the bottleneck of workloads can change often.
 
-The top down metrics are collected per core instead of per
-CPU thread. Per core mode is automatically enabled
+This enables --metric-only, unless overridden with --no-metric-only.
+
+The following restrictions only apply to older Intel CPUs and Atom;
+on newer CPUs (Icelake and later) TopDown can be collected for any thread:
+
+The top down metrics are collected per core instead of per CPU thread.
+Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
 perf.perf_event_paranoid=-1.
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a3c060878faa..614bf24504b0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -123,6 +123,14 @@ static const char * topdown_attrs[] = {
 	NULL,
 };
 
+static const char *topdown_metric_attrs[] = {
+	"topdown-retiring",
+	"topdown-bad-spec",
+	"topdown-fe-bound",
+	"topdown-be-bound",
+	NULL,
+};
+
 static const char *smi_cost_attrs = {
 	"{"
 	"msr/aperf/,"
@@ -1230,6 +1238,21 @@ static int add_default_attributes(void)
 		char *str = NULL;
 		bool warn = false;
 
+		if (topdown_filter_events(topdown_metric_attrs, &str, 1) < 0) {
+			pr_err("Out of memory\n");
+			return -1;
+		}
+		if (topdown_metric_attrs[0] && str) {
+			if (!stat_config.interval) {
+				fprintf(stat_config.output,
+					"Topdown accuracy may decrease when measuring long periods.\n"
+					"Please print the result regularly, e.g. -I1000\n");
+			}
+			goto setup_metrics;
+		}
+
+		str = NULL;
+
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
 			pr_err("top down event configuration requires --per-core mode\n");
@@ -1251,6 +1274,7 @@ static int add_default_attributes(void)
 		if (topdown_attrs[0] && str) {
 			if (warn)
 				arch_topdown_group_warn();
+setup_metrics:
 			err = parse_events(evsel_list, str, &errinfo);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 83d8094be4fe..6b6ffd64a4c4 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -238,6 +238,18 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
 	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
 		update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES,
 				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING))
+		update_runtime_stat(st, STAT_TOPDOWN_RETIRING,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC))
+		update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND,
+				    ctx, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
 				    ctx, cpu, count);
@@ -682,6 +694,47 @@ static double td_be_bound(int ctx, int cpu, struct runtime_stat *st)
 	return sanitize_val(1.0 - sum);
 }
 
+/*
+ * Kernel reports metrics multiplied with slots. To get back
+ * the ratios we need to recreate the sum.
+ */
+
+static double td_metric_ratio(int ctx, int cpu,
+			      enum stat_type type,
+			      struct runtime_stat *stat)
+{
+	double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu);
+	double d = runtime_stat_avg(stat, type, ctx, cpu);
+
+	if (sum)
+		return d / sum;
+	return 0;
+}
+
+/*
+ * ... but only if most of the values are actually available.
+ * We allow two missing.
+ */
+
+static bool full_td(int ctx, int cpu,
+		    struct runtime_stat *stat)
+{
+	int c = 0;
+
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu) > 0)
+		c++;
+	return c >= 2;
+}
+
 static void print_smi_cost(struct perf_stat_config *config,
 			   int cpu, struct perf_evsel *evsel,
 			   struct perf_stat_output_ctx *out,
@@ -970,6 +1023,42 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 					be_bound * 100.);
 		else
 			print_metric(config, ctxp, NULL, NULL, name, 0);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) &&
+			full_td(ctx, cpu, st)) {
+		double retiring = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_RETIRING, st);
+
+		if (retiring > 0.7)
+			color = PERF_COLOR_GREEN;
+		print_metric(config, ctxp, color, "%8.1f%%", "retiring",
+				retiring * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double fe_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_FE_BOUND, st);
+
+		if (fe_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
+				fe_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double be_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BE_BOUND, st);
+
+		if (be_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "backend bound",
+				be_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) &&
+			full_td(ctx, cpu, st)) {
+		double bad_spec = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BAD_SPEC, st);
+
+		if (bad_spec > 0.1)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "bad speculation",
+				bad_spec * 100.);
 	} else if (evsel->metric_expr) {
 		generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name,
 				evsel->metric_name, avg, cpu, out, st);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 2856cc9d5a31..89beb5b766f9 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -87,6 +87,10 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
 	ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
 	ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
 	ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
+	ID(TOPDOWN_RETIRING, topdown-retiring),
+	ID(TOPDOWN_BAD_SPEC, topdown-bad-spec),
+	ID(TOPDOWN_FE_BOUND, topdown-fe-bound),
+	ID(TOPDOWN_BE_BOUND, topdown-be-bound),
 	ID(SMI_NUM, msr/smi/),
 	ID(APERF, msr/aperf/),
 };
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 2f9c9159a364..827c560ad19f 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -29,6 +29,10 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
 	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
 	PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
+	PERF_STAT_EVSEL_ID__TOPDOWN_RETIRING,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BAD_SPEC,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FE_BOUND,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BE_BOUND,
 	PERF_STAT_EVSEL_ID__SMI_NUM,
 	PERF_STAT_EVSEL_ID__APERF,
 	PERF_STAT_EVSEL_ID__MAX,
@@ -81,6 +85,10 @@ enum stat_type {
 	STAT_TOPDOWN_SLOTS_RETIRED,
 	STAT_TOPDOWN_FETCH_BUBBLES,
 	STAT_TOPDOWN_RECOVERY_BUBBLES,
+	STAT_TOPDOWN_RETIRING,
+	STAT_TOPDOWN_BAD_SPEC,
+	STAT_TOPDOWN_FE_BOUND,
+	STAT_TOPDOWN_BE_BOUND,
 	STAT_SMI_NUM,
 	STAT_APERF,
 	STAT_MAX
-- 
2.14.5


^ permalink raw reply	[flat|nested] 44+ messages in thread
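
[Editorial note: the ratio reconstruction done by td_metric_ratio() above is
simple enough to sketch standalone. Plain doubles stand in for the
runtime_stat lookups used in the real code; td_ratio() is a hypothetical
name.]

```c
/*
 * Sketch of td_metric_ratio() from the patch above: the kernel reports
 * each TopDown metric already scaled to pipeline slots, so perf stat
 * recreates the per-category ratio by dividing one category's count by
 * the sum of all four.
 */
static double td_ratio(double val, double retiring, double bad_spec,
		       double fe_bound, double be_bound)
{
	double sum = retiring + bad_spec + fe_bound + be_bound;

	return sum ? val / sum : 0.0;
}
```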

* [PATCH 9/9] perf, tools: Add documentation for topdown metrics
  2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
                   ` (7 preceding siblings ...)
  2019-05-21 21:40 ` [PATCH 8/9] perf, tools, stat: Support new per thread TopDown metrics kan.liang
@ 2019-05-21 21:40 ` kan.liang
  8 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2019-05-21 21:40 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Add documentation on how to use the topdown metrics from ring 3.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/topdown.txt | 223 +++++++++++++++++++++++++++++++++++
 1 file changed, 223 insertions(+)
 create mode 100644 tools/perf/Documentation/topdown.txt

diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
new file mode 100644
index 000000000000..e82a74fa9243
--- /dev/null
+++ b/tools/perf/Documentation/topdown.txt
@@ -0,0 +1,223 @@
+Using TopDown metrics in user space
+-----------------------------------
+
+Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
+methodology to break down CPU pipeline execution into 4 bottlenecks:
+frontend bound, backend bound, bad speculation, retiring.
+
+For more details on TopDown see [1][5].
+
+Traditionally this was implemented by events in generic counters
+and specific formulas to compute the bottlenecks.
+
+perf stat --topdown implements this.
+
+% perf stat -a --topdown -I1000
+#           time             counts unit events
+     1.000373951      8,460,978,609      topdown-retiring          #     22.9% retiring
+     1.000373951      3,445,383,303      topdown-bad-spec          #      9.3% bad speculation
+     1.000373951     15,886,483,355      topdown-fe-bound          #     43.0% frontend bound
+     1.000373951      9,163,488,720      topdown-be-bound          #     24.8% backend bound
+     2.000782154      8,477,925,431      topdown-retiring          #     22.9% retiring
+     2.000782154      3,459,151,256      topdown-bad-spec          #      9.3% bad speculation
+     2.000782154     15,947,224,725      topdown-fe-bound          #     43.0% frontend bound
+     2.000782154      9,145,551,695      topdown-be-bound          #     24.7% backend bound
+     3.001155967      8,487,323,125      topdown-retiring          #     22.9% retiring
+     3.001155967      3,451,808,066      topdown-bad-spec          #      9.3% bad speculation
+     3.001155967     15,959,068,902      topdown-fe-bound          #     43.0% frontend bound
+     3.001155967      9,172,479,784      topdown-be-bound          #     24.7% backend bound
+...
+
+Full Top Down includes more levels that can break down the
+bottlenecks further. This is not directly implemented in perf,
+but is available in other tools that can run on top of perf,
+such as toplev[2] or vtune[3].
+
+New Topdown features in Icelake
+===============================
+
+With Icelake CPUs the TopDown metrics are directly available as
+fixed counters and do not require generic counters. This allows
+TopDown to always be collected in addition to other events.
+
+This also enables measuring TopDown per thread/process instead
+of only per core.
+
+Using TopDown through RDPMC in applications on Icelake
+======================================================
+
+For more fine-grained measurements it can be useful to
+access the new counters directly from user space. This is more
+complicated, but drastically lowers overhead.
+
+On Icelake, there is a new fixed counter 3: SLOTS, which reports
+"pipeline SLOTS" (cycles multiplied by core issue width) and a
+metric register that reports slots ratios for the different bottleneck
+categories.
+
+The metrics counter is CPU model specific and will not be available
+on older CPUs.
+
+Example code
+============
+
+Library functions implementing the functionality described below
+are also available in libjevents [4].
+
+The application opens a perf_event file descriptor
+and sets up fixed counter 3 (SLOTS) to start and
+allow user programs to read the performance counters.
+
+Fixed counter 3 is mapped to a pseudo event event=0x00, umask=0x4,
+so the perf_event_attr structure should be initialized with
+{ .config = 0x0400, .type = PERF_TYPE_RAW }
+
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/* Provide own perf_event_open stub because glibc doesn't */
+__attribute__((weak))
+int perf_event_open(struct perf_event_attr *attr, pid_t pid,
+		    int cpu, int group_fd, unsigned long flags)
+{
+	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+/* open slots counter file descriptor for current task */
+struct perf_event_attr slots = {
+	.type = PERF_TYPE_RAW,
+	.size = sizeof(struct perf_event_attr),
+	.config = 0x400,
+	.exclude_kernel = 1,
+};
+
+int fd = perf_event_open(&slots, 0, -1, -1, 0);
+if (fd < 0)
+	... error ...
+
+The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
+to read slots and the topdown metrics at different points of the program:
+
+#include <stdint.h>
+#include <x86intrin.h>
+
+#define RDPMC_FIXED	(1 << 30)	/* return fixed counters */
+#define RDPMC_METRIC	(1 << 29)	/* return metric counters */
+
+#define FIXED_COUNTER_SLOTS		3
+#define METRIC_COUNTER_TOPDOWN_L1	0
+
+static inline uint64_t read_slots(void)
+{
+	return _rdpmc(RDPMC_FIXED | FIXED_COUNTER_SLOTS);
+}
+
+static inline uint64_t read_metrics(void)
+{
+	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1);
+}
+
+Then the program can be instrumented to read these metrics at different
+points.
+
+It's not a good idea to do this with too short code regions,
+as the parallelism and overlap in the CPU program execution will
+cause too much measurement inaccuracy. For example, instrumenting
+individual basic blocks is definitely too fine-grained.
+
+Decoding metrics values
+=======================
+
+The value reported by read_metrics() contains four 8-bit fields,
+each representing a scaled ratio for one Level 1 bottleneck category.
+All four fields add up to 0xff (= 100%).
+
+The binary ratios in the metric value can be converted to float ratios:
+
+#define GET_METRIC(m, i) (((m) >> (i*8)) & 0xff)
+
+#define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
+#define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
+#define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
+#define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)
+
+and then converted to percent for printing.
+
+The ratios in the metric accumulate for the time when the counter
+is enabled. For measuring programs it is often useful to measure
+specific sections. For this it is necessary to compute deltas of the
+metrics.
+
+This can be done by scaling the metrics with the slots counter
+read at the same time.
+
+Then it's possible to take deltas of these slots counts
+measured at different points, and determine the metrics
+for that time period.
+
+	slots_a = read_slots();
+	metric_a = read_metrics();
+
+	... larger code region ...
+
+	slots_b = read_slots()
+	metric_b = read_metrics()
+
+	# compute scaled metrics for measurement a
+	retiring_slots_a = GET_METRIC(metric_a, 0) * slots_a
+	bad_spec_slots_a = GET_METRIC(metric_a, 1) * slots_a
+	fe_bound_slots_a = GET_METRIC(metric_a, 2) * slots_a
+	be_bound_slots_a = GET_METRIC(metric_a, 3) * slots_a
+
+	# compute delta scaled metrics between b and a
+	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a
+	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a
+	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
+	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
+
+Later the individual ratios for the measurement period can be recreated
+from these counts.
+
+	slots_delta = slots_b - slots_a
+	retiring_ratio = (float)retiring_slots / slots_delta
+	bad_spec_ratio = (float)bad_spec_slots / slots_delta
+	fe_bound_ratio = (float)fe_bound_slots / slots_delta
+	be_bound_ratio = (float)be_bound_slots / slots_delta
+
+	printf("Retiring %.2f%% Bad Speculation %.2f%% FE Bound %.2f%% BE Bound %.2f%%\n",
+		retiring_ratio * 100.,
+		bad_spec_ratio * 100.,
+		fe_bound_ratio * 100.,
+		be_bound_ratio * 100.);
+
+Resetting metrics counters
+==========================
+
+Since the individual metrics are only 8 bits wide, they lose precision
+for short regions over time because the number of cycles covered by
+each fraction bit grows. So the counters need to be reset regularly.
+
+When using the kernel perf API the kernel resets on every read.
+So as long as the reading is at reasonable intervals (every few
+seconds) the precision is good.
+
+When using perf stat it is recommended to always use the -I option,
+with an interval no longer than a few seconds:
+
+	perf stat -I 1000 --topdown ...
+
+For user programs using RDPMC directly the counter can
+be reset explicitly using ioctl:
+
+	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);
+
+This "opens" a new measurement period.
+
+A program using RDPMC for TopDown should schedule such a reset
+regularly, e.g. every few seconds.
+
+[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
+[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
+[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
+[5] https://sites.google.com/site/analysismethods/yasin-pubs
-- 
2.14.5


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-21 21:40 ` [PATCH 2/9] perf/x86/intel: Basic support for metrics counters kan.liang
@ 2019-05-28 12:05   ` Peter Zijlstra
  2019-05-28 18:20     ` Liang, Kan
  2019-05-28 12:15   ` Peter Zijlstra
  2019-05-29  8:14   ` Peter Zijlstra
  2 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 12:05 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:48PM -0700, kan.liang@linux.intel.com wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Metrics counters (hardware counters containing multiple metrics)
> are modeled as separate registers for each TopDown metric events,
> with an extra reg being used for coordinating access to the
> underlying register in the scheduler.
> 
> This patch adds the basic infrastructure to separate the scheduler
> register indexes from the actual hardware register indexes. In
> most cases the MSR address is already used correctly, but for
> code using indexes we need a separate reg_idx field in the event
> to indicate the correct underlying register.

That doesn't parse. What exactly is the difference between reg_idx and
idx? AFAICT there is a fixed relation like:

  reg_idx = is_metric_idx(idx) ? INTEL_PMC_IDX_FIXED_SLOTS : idx;

Do we really need a variable for that?

Also, why do we need that whole enabled_events[] array. Do we really not
have that information elsewhere?

I shouldn't have to reverse engineer patches :/


* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-21 21:40 ` [PATCH 2/9] perf/x86/intel: Basic support for metrics counters kan.liang
  2019-05-28 12:05   ` Peter Zijlstra
@ 2019-05-28 12:15   ` Peter Zijlstra
  2019-05-28 18:21     ` Liang, Kan
  2019-05-29  8:14   ` Peter Zijlstra
  2 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 12:15 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:48PM -0700, kan.liang@linux.intel.com wrote:
> +/*
> + * We model PERF_METRICS as more magic fixed-mode PMCs, one for each metric
> + * and another for the whole slots counter
> + *
> + * Internally they all map to Fixed Ctr 3 (SLOTS), and allocate PERF_METRICS
> + * as an extra_reg. PERF_METRICS has no own configuration, but we fill in
> + * the configuration of FxCtr3 to enforce that all the shared users of SLOTS
> + * have the same configuration.
> + */
> +#define INTEL_PMC_IDX_FIXED_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 17)
> +#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 0)
> +#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 1)
> +#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 2)
> +#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 3)
> +#define INTEL_PMC_MSK_ANY_SLOTS			((0xfull << INTEL_PMC_IDX_FIXED_METRIC_BASE) | \
> +						 INTEL_PMC_MSK_FIXED_SLOTS)
> +static inline bool is_metric_idx(int idx)
> +{
> +	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE && idx <= INTEL_PMC_IDX_TD_BE_BOUND;
> +}

Something like:

	return (idx >> INTEL_PMC_IDX_FIXED_METRIC_BASE) & 0xf;

might be faster code... (if it wasn't for 64bit literals being a pain,
it could be a simple test instruction).


* Re: [PATCH 3/9] perf/x86/intel: Support overflows on SLOTS
  2019-05-21 21:40 ` [PATCH 3/9] perf/x86/intel: Support overflows on SLOTS kan.liang
@ 2019-05-28 12:20   ` Peter Zijlstra
  2019-05-28 18:22     ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 12:20 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:49PM -0700, kan.liang@linux.intel.com wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> The internal counters used for the metrics can overflow. If this happens
> an overflow is triggered on the SLOTS fixed counter. Add special code
> that resets all the slave metric counters in this case.

The SDM also talked about a OVF_PERF_METRICS overflow bit. Which seems
to suggest something else.

> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  arch/x86/events/intel/core.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 75ed91a36413..a66dc761f09d 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2279,12 +2279,35 @@ static void intel_pmu_add_event(struct perf_event *event)
>  		intel_pmu_lbr_add(event);
>  }
>  
> +/* When SLOTS overflowed update all the active topdown-* events */
> +static void intel_pmu_update_metrics(struct perf_event *event)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	int idx;
> +	u64 slots_events;
> +
> +	slots_events = *(u64 *)cpuc->enabled_events & INTEL_PMC_MSK_ANY_SLOTS;
> +
> +	for_each_set_bit(idx, (unsigned long *)&slots_events, 64) {

	for (idx = INTEL_PMC_IDX_TD_RETIRING;
	     idx <= INTEL_PMC_IDX_TD_BE_BOUND; idx++)

perhaps?

> +		struct perf_event *ev = cpuc->events[idx];
> +
> +		if (ev == event)
> +			continue;
> +		x86_perf_event_update(event);

		if (ev != event)
			x86_perf_event_update(event)
> +	}
> +}
> +
>  /*
>   * Save and restart an expired event. Called by NMI contexts,
>   * so it has to be careful about preempting normal event ops:
>   */
>  int intel_pmu_save_and_restart(struct perf_event *event)
>  {
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (unlikely(hwc->reg_idx == INTEL_PMC_IDX_FIXED_SLOTS))
> +		intel_pmu_update_metrics(event);
> +
>  	x86_perf_event_update(event);
>  	/*
>  	 * For a checkpointed counter always reset back to 0.  This
> -- 
> 2.14.5
> 


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
@ 2019-05-28 12:43   ` Peter Zijlstra
  2019-05-28 18:23     ` Liang, Kan
  2019-05-28 12:53   ` Peter Zijlstra
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 12:43 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> The 8bit metrics ratio values lose precision when the measurement period
> gets longer.
> 
> To avoid this we always reset the metric value when reading, as we
> already accumulate the count in the perf count value.
> 
> For a long period read, low precision is acceptable.
> For a short period read, the register will be reset often enough that it
> is not a problem.

> The PERF_METRICS may report wrong value if its delta was less than 1/255
> of SLOTS (Fixed counter 3).
> 
> To avoid this, the PERF_METRICS and SLOTS registers have to be reset
> simultaneously. The slots value has to be cached as well.

That doesn't sound like it is NMI-safe.



> RDPMC
> =========
> The TopDown can be collected per thread/process. To use TopDown
> through RDPMC in applications on Icelake, the metrics and slots values
> have to be saved/restored during context switching.
> 
> Add specific set_period() to specially handle the slots and metrics
> event. Because,
>  - The initial value must be 0.
>  - Only need to restore the value in context switch. For other cases,
>    the counters have been cleared after read.

So the above claims to explain RDPMC, but doesn't mention that magic
value below at all. In fact, I don't see how the above relates to RDPMC
at all.

> @@ -2141,7 +2157,9 @@ static int x86_pmu_event_idx(struct perf_event *event)
>  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>  		return 0;
>  
> -	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
> +	if (is_metric_idx(idx))
> +		idx = 1 << 29;

I can't find this in the SDM RDPMC description. What does it return?

> +	else if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
>  		idx -= INTEL_PMC_IDX_FIXED;
>  		idx |= 1 << 30;
>  	}


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
  2019-05-28 12:43   ` Peter Zijlstra
@ 2019-05-28 12:53   ` Peter Zijlstra
  2019-05-28 12:56   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 12:53 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> +		x86_pmu.has_metric = x86_pmu.intel_cap.perf_metrics;

It makes sense to duplicate that state because?

> @@ -717,6 +729,8 @@ struct x86_pmu {
>  	struct extra_reg *extra_regs;
>  	unsigned int flags;
>  
> +	bool has_metric;
> +
>  	/*
>  	 * Intel host/guest support (KVM)
>  	 */

You forgot how I feel about _Bool in composite types _again_ ?



* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
  2019-05-28 12:43   ` Peter Zijlstra
  2019-05-28 12:53   ` Peter Zijlstra
@ 2019-05-28 12:56   ` Peter Zijlstra
  2019-05-28 13:32     ` Peter Zijlstra
  2019-05-28 13:30   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 12:56 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index e9075d57853d..07ecfe75f0e6 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -91,16 +91,20 @@ u64 x86_perf_event_update(struct perf_event *event)
>  					new_raw_count) != prev_raw_count)
>  		goto again;
>  
> -	/*
> -	 * Now we have the new raw value and have updated the prev
> -	 * timestamp already. We can now calculate the elapsed delta
> -	 * (event-)time and add that to the generic event.
> -	 *
> -	 * Careful, not all hw sign-extends above the physical width
> -	 * of the count.
> -	 */
> -	delta = (new_raw_count << shift) - (prev_raw_count << shift);
> -	delta >>= shift;
> +	if (unlikely(hwc->flags & PERF_X86_EVENT_UPDATE))
> +		delta = x86_pmu.metric_update_event(event, new_raw_count);
> +	else {
> +		/*
> +		 * Now we have the new raw value and have updated the prev
> +		 * timestamp already. We can now calculate the elapsed delta
> +		 * (event-)time and add that to the generic event.
> +		 *
> +		 * Careful, not all hw sign-extends above the physical width
> +		 * of the count.
> +		 */
> +		delta = (new_raw_count << shift) - (prev_raw_count << shift);
> +		delta >>= shift;
> +	}
>  
>  	local64_add(delta, &event->count);
>  	local64_sub(delta, &hwc->period_left);

> @@ -1186,6 +1194,9 @@ int x86_perf_event_set_period(struct perf_event *event)
>  	if (idx == INTEL_PMC_IDX_FIXED_BTS)
>  		return 0;
>  
> +	if (x86_pmu.set_period && x86_pmu.set_period(event))
> +		goto update_userpage;
> +
>  	/*
>  	 * If we are way outside a reasonable range then just skip forward:
>  	 */
> @@ -1234,6 +1245,7 @@ int x86_perf_event_set_period(struct perf_event *event)
>  			(u64)(-left) & x86_pmu.cntval_mask);
>  	}
>  
> +update_userpage:
>  	perf_event_update_userpage(event);
>  
>  	return ret;


*groan*.... that's pretty terrible.


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
                     ` (2 preceding siblings ...)
  2019-05-28 12:56   ` Peter Zijlstra
@ 2019-05-28 13:30   ` Peter Zijlstra
  2019-05-28 18:24     ` Liang, Kan
  2019-05-28 13:43   ` Peter Zijlstra
  2019-05-28 13:48   ` Peter Zijlstra
  5 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 13:30 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> +static u64 icl_metric_update_event(struct perf_event *event, u64 val)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	struct hw_perf_event *hwc = &event->hw;
> +	u64 newval, metric, slots_val = 0, new, last;
> +	bool nmi = in_nmi();
> +	int txn_flags = nmi ? 0 : cpuc->txn_flags;
> +
> +	/*
> +	 * Use cached value for transaction.
> +	 */
> +	newval = 0;
> +	if (txn_flags) {
> +		newval = cpuc->txn_metric;
> +		slots_val = cpuc->txn_slots;
> +	} else if (nmi) {
> +		newval = cpuc->nmi_metric;
> +		slots_val = cpuc->nmi_slots;
> +	}
> +
> +	if (!newval) {
> +		slots_val = val;
> +
> +		rdpmcl((1<<29), newval);
> +		if (txn_flags) {
> +			cpuc->txn_metric = newval;
> +			cpuc->txn_slots = slots_val;
> +		} else if (nmi) {
> +			cpuc->nmi_metric = newval;
> +			cpuc->nmi_slots = slots_val;
> +		}
> +
> +		if (!(txn_flags & PERF_PMU_TXN_REMOVE)) {
> +			/* Reset the metric value when reading
> +			 * The SLOTS register must be reset when PERF_METRICS reset,
> +			 * otherwise PERF_METRICS may has wrong output.
> +			 */

broken comment style.. (and grammer)

> +			wrmsrl(MSR_PERF_METRICS, 0);
> +			wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);

I don't get this, overflow happens when we flip sign, so why is
programming 0 a sane thing to do?

> +			hwc->saved_metric = 0;
> +			hwc->saved_slots = 0;
> +		} else {
> +			/* saved metric and slots for context switch */
> +			hwc->saved_metric = newval;
> +			hwc->saved_slots = val;
> +
> +		}
> +		/* cache the last metric and slots */
> +		cpuc->last_metric = hwc->last_metric;
> +		cpuc->last_slots = hwc->last_slots;
> +		hwc->last_metric = 0;
> +		hwc->last_slots = 0;
> +	}
> +
> +	/* The METRICS and SLOTS have been reset when reading */
> +	if (!(txn_flags & PERF_PMU_TXN_REMOVE))
> +		local64_set(&hwc->prev_count, 0);
> +
> +	if (is_slots_event(event))
> +		return (slots_val - cpuc->last_slots);
> +
> +	/*
> +	 * The metric is reported as an 8bit integer percentage
> +	 * suming up to 0xff. As the counter is less than 64bits
> +	 * we can use the not used bits to get the needed precision.
> +	 * Use 16bit fixed point arithmetic for
> +	 * slots-in-metric = (MetricPct / 0xff) * val
> +	 * This works fine for upto 48bit counters, but will
> +	 * lose precision above that.
> +	 */
> +
> +	metric = (cpuc->last_metric >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
> +	last = (((metric * 0xffff) >> 8) * cpuc->last_slots) >> 16;

How is that cpuc->last_* crap not broken for NMIs ?

> +
> +	metric = (newval >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
> +	new = (((metric * 0xffff) >> 8) * slots_val) >> 16;
> +
> +	return (new - last);
> +}


This is disgusting.. and unreadable.

mul_u64_u32_shr() is actually really fast, use it.
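For illustration, the patch's explicit 16-bit fixed-point computation and
the suggested helper compute the same value for realistic inputs; the
mul_u64_u32_shr() below is a plain-C stand-in for the kernel helper (the
real one lives in linux/math64.h), shown only to demonstrate the
equivalence:

```c
#include <stdint.h>

/* Plain-C stand-in for the kernel's mul_u64_u32_shr() helper,
 * for illustration only. */
static uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}

/* Patch version: slots-in-metric via explicit 16-bit fixed point,
 * slots-in-metric ~= (metric / 0x100) * slots. */
static uint64_t metric_slots_fixed(uint64_t metric, uint64_t slots)
{
	return (((metric * 0xffff) >> 8) * slots) >> 16;
}

/* Same scaling expressed through the helper. */
static uint64_t metric_slots_helper(uint64_t metric, uint64_t slots)
{
	return mul_u64_u32_shr(slots, (uint32_t)((metric * 0xffff) >> 8), 16);
}
```

The two only diverge once the intermediate product overflows 64 bits, i.e.
above roughly 48-bit slots counts, which matches the precision caveat in the
patch's own comment.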


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 12:56   ` Peter Zijlstra
@ 2019-05-28 13:32     ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 13:32 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 28, 2019 at 02:56:15PM +0200, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> 
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index e9075d57853d..07ecfe75f0e6 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -91,16 +91,20 @@ u64 x86_perf_event_update(struct perf_event *event)
> >  					new_raw_count) != prev_raw_count)
> >  		goto again;

AFAICT, you don't actually need that cmpxchg loop for the metric crud.

> >  
> > -	/*
> > -	 * Now we have the new raw value and have updated the prev
> > -	 * timestamp already. We can now calculate the elapsed delta
> > -	 * (event-)time and add that to the generic event.
> > -	 *
> > -	 * Careful, not all hw sign-extends above the physical width
> > -	 * of the count.
> > -	 */
> > -	delta = (new_raw_count << shift) - (prev_raw_count << shift);
> > -	delta >>= shift;
> > +	if (unlikely(hwc->flags & PERF_X86_EVENT_UPDATE))
> > +		delta = x86_pmu.metric_update_event(event, new_raw_count);
> > +	else {
> > +		/*
> > +		 * Now we have the new raw value and have updated the prev
> > +		 * timestamp already. We can now calculate the elapsed delta
> > +		 * (event-)time and add that to the generic event.
> > +		 *
> > +		 * Careful, not all hw sign-extends above the physical width
> > +		 * of the count.
> > +		 */
> > +		delta = (new_raw_count << shift) - (prev_raw_count << shift);
> > +		delta >>= shift;
> > +	}
> >  
> >  	local64_add(delta, &event->count);
> >  	local64_sub(delta, &hwc->period_left);
> 
> > @@ -1186,6 +1194,9 @@ int x86_perf_event_set_period(struct perf_event *event)
> >  	if (idx == INTEL_PMC_IDX_FIXED_BTS)
> >  		return 0;
> >  
> > +	if (x86_pmu.set_period && x86_pmu.set_period(event))
> > +		goto update_userpage;
> > +
> >  	/*
> >  	 * If we are way outside a reasonable range then just skip forward:
> >  	 */
> > @@ -1234,6 +1245,7 @@ int x86_perf_event_set_period(struct perf_event *event)
> >  			(u64)(-left) & x86_pmu.cntval_mask);
> >  	}
> >  
> > +update_userpage:
> >  	perf_event_update_userpage(event);
> >  
> >  	return ret;
> 
> 
> *groan*.... that's pretty terrible.


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
                     ` (3 preceding siblings ...)
  2019-05-28 13:30   ` Peter Zijlstra
@ 2019-05-28 13:43   ` Peter Zijlstra
  2019-05-28 18:24     ` Liang, Kan
  2019-05-28 13:48   ` Peter Zijlstra
  5 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 13:43 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> @@ -3287,6 +3304,13 @@ static int core_pmu_hw_config(struct perf_event *event)
>  	return intel_pmu_bts_config(event);
>  }
>  
> +#define EVENT_CODE(e)	(e->attr.config & INTEL_ARCH_EVENT_MASK)
> +#define is_slots_event(e)	(EVENT_CODE(e) == 0x0400)
> +#define is_perf_metrics_event(e)				\
> +		(((EVENT_CODE(e) & 0xff) == 0xff) &&		\
> +		 (EVENT_CODE(e) >= 0x01ff) &&			\
> +		 (EVENT_CODE(e) <= 0x04ff))
> +

That is horrific.. (e & INTEL_ARCH_EVENT_MASK) & 0xff is just daft,
that should be: (e & ARCH_PERFMON_EVENTSEL_EVENT).

Also, we already have fake events for event=0, see FIXED2, why are we
now using event=0xff ?


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
                     ` (4 preceding siblings ...)
  2019-05-28 13:43   ` Peter Zijlstra
@ 2019-05-28 13:48   ` Peter Zijlstra
  2019-05-28 18:24     ` Liang, Kan
  5 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 13:48 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index b980b9e95d2a..0d7081434d1d 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -133,6 +133,11 @@ struct hw_perf_event {
>  
>  			struct hw_perf_event_extra extra_reg;
>  			struct hw_perf_event_extra branch_reg;
> +
> +			u64		saved_metric;
> +			u64		saved_slots;
> +			u64		last_slots;
> +			u64		last_metric;

This is really sad, and I'm thinking much of that really isn't needed
anyway, due to how you're not using some of the other fields.

>  		};
>  		struct { /* software */
>  			struct hrtimer	hrtimer;
> -- 
> 2.14.5
> 


* Re: [PATCH 5/9] perf/x86/intel: Set correct weight for TopDown metrics events
  2019-05-21 21:40 ` [PATCH 5/9] perf/x86/intel: Set correct weight for TopDown metrics events kan.liang
@ 2019-05-28 13:50   ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 13:50 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:51PM -0700, kan.liang@linux.intel.com wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> The topdown metrics and slots events are mapped to a fixed counter,
> but should have the normal weight for the scheduler.

You forgot the 'why/because' part of that sentence.

> So special case this.



* Re: [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown
  2019-05-21 21:40 ` [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown kan.liang
@ 2019-05-28 13:52   ` Peter Zijlstra
  2019-05-28 18:25     ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-28 13:52 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 21, 2019 at 02:40:53PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> To get correct PERF_METRICS value, the fixed counter 3 must start from
> 0. It would bring problems when sampling read slots and topdown events.
> For example,
>         perf record -e '{slots, topdown-retiring}:S'
> The slots would not overflow if it starts from 0.
> 
> Add specific validate_group() support to reject the case and error out
> for Icelake.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  arch/x86/events/core.c       |  2 ++
>  arch/x86/events/intel/core.c | 20 ++++++++++++++++++++
>  arch/x86/events/perf_event.h |  2 ++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 07ecfe75f0e6..a7eb842f8651 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2065,6 +2065,8 @@ static int validate_group(struct perf_event *event)
>  	fake_cpuc->n_events = 0;
>  	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
>  
> +	if (x86_pmu.validate_group)
> +		ret = x86_pmu.validate_group(fake_cpuc, n);
>  out:
>  	free_fake_cpuc(fake_cpuc);
>  	return ret;
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 79e9d05e047d..2bb90d652a35 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4410,6 +4410,25 @@ static int icl_set_period(struct perf_event *event)
>  	return 1;
>  }
>  
> +static int icl_validate_group(struct cpu_hw_events *cpuc, int n)
> +{
> +	bool has_sampling_slots = false, has_metrics = false;
> +	struct perf_event *e;
> +	int i;
> +
> +	for (i = 0; i < n; i++) {
> +		e = cpuc->event_list[i];
> +		if (is_slots_event(e) && is_sampling_event(e))
> +			has_sampling_slots = true;
> +
> +		if (is_perf_metrics_event(e))
> +			has_metrics = true;
> +	}
> +	if (unlikely(has_sampling_slots && has_metrics))
> +		return -EINVAL;
> +	return 0;
> +}

Why this special hack, why not disallow sampling on SLOTS on creation?


* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-28 12:05   ` Peter Zijlstra
@ 2019-05-28 18:20     ` Liang, Kan
  0 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 8:05 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:48PM -0700, kan.liang@linux.intel.com wrote:
>> From: Andi Kleen <ak@linux.intel.com>
>>
>> Metrics counters (hardware counters containing multiple metrics)
>> are modeled as separate registers for each TopDown metric events,
>> with an extra reg being used for coordinating access to the
>> underlying register in the scheduler.
>>
>> This patch adds the basic infrastructure to separate the scheduler
>> register indexes from the actual hardware register indexes. In
>> most cases the MSR address is already used correctly, but for
>> code using indexes we need a separate reg_idx field in the event
>> to indicate the correct underlying register.
> 
> That doesn't parse. What exactly is the difference between reg_idx and
> idx? AFAICT there is a fixed relation like:
> 
>    reg_idx = is_metric_idx(idx) ? INTEL_PMC_IDX_FIXED_SLOTS : idx;
> 
> Do we really need a variable for that?

It may save the calculation. But, right, a variable is not necessary.

> 
> Also, why do we need that whole enabled_events[] array. Do we really not
> have that information elsewhere?

No. We don't have a case where several events share a counter at the same 
time. We don't need to check if other events are enabled when we try to 
disable a counter. So we don't save such information.
But we have to do it for metrics events.

Thanks,
Kan
> 
> I shouldn't have to reverse engineer patches :/
> 


* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-28 12:15   ` Peter Zijlstra
@ 2019-05-28 18:21     ` Liang, Kan
  2019-05-29  7:28       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 8:15 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:48PM -0700, kan.liang@linux.intel.com wrote:
>> +/*
>> + * We model PERF_METRICS as more magic fixed-mode PMCs, one for each metric
>> + * and another for the whole slots counter
>> + *
>> + * Internally they all map to Fixed Ctr 3 (SLOTS), and allocate PERF_METRICS
>> + * as an extra_reg. PERF_METRICS has no own configuration, but we fill in
>> + * the configuration of FxCtr3 to enforce that all the shared users of SLOTS
>> + * have the same configuration.
>> + */
>> +#define INTEL_PMC_IDX_FIXED_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 17)
>> +#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 0)
>> +#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 1)
>> +#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 2)
>> +#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 3)
>> +#define INTEL_PMC_MSK_ANY_SLOTS			((0xfull << INTEL_PMC_IDX_FIXED_METRIC_BASE) | \
>> +						 INTEL_PMC_MSK_FIXED_SLOTS)
>> +static inline bool is_metric_idx(int idx)
>> +{
>> +	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE && idx <= INTEL_PMC_IDX_TD_BE_BOUND;
>> +}
> 
> Something like:
> 
> 	return (idx >> INTEL_PMC_IDX_FIXED_METRIC_BASE) & 0xf;
> 
> might be faster code... (if it wasn't for 64bit literals being a pain,
> it could be a simple test instruction).
> 

is_metric_idx() is not a mask. It's to check if the idx is between 49 and 52.

Thanks,
Kan


* Re: [PATCH 3/9] perf/x86/intel: Support overflows on SLOTS
  2019-05-28 12:20   ` Peter Zijlstra
@ 2019-05-28 18:22     ` Liang, Kan
  0 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 8:20 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:49PM -0700, kan.liang@linux.intel.com wrote:
>> From: Andi Kleen <ak@linux.intel.com>
>>
>> The internal counters used for the metrics can overflow. If this happens
>> an overflow is triggered on the SLOTS fixed counter. Add special code
>> that resets all the slave metric counters in this case.
> 
> The SDM also talked about a OVF_PERF_METRICS overflow bit. Which seems
> to suggest something else.

Yes, I think we should handle the bit as well and only update the metrics events.

It looks like bit 48 is occupied by INTEL_PMC_IDX_FIXED_BTS.
I will also change
INTEL_PMC_IDX_FIXED_BTS to INTEL_PMC_IDX_FIXED + 15.
INTEL_PMC_IDX_FIXED_METRIC_BASE to INTEL_PMC_IDX_FIXED + 16

Thanks,
Kan

> 
>> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>   arch/x86/events/intel/core.c | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 75ed91a36413..a66dc761f09d 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2279,12 +2279,35 @@ static void intel_pmu_add_event(struct perf_event *event)
>>   		intel_pmu_lbr_add(event);
>>   }
>>   
>> +/* When SLOTS overflowed update all the active topdown-* events */
>> +static void intel_pmu_update_metrics(struct perf_event *event)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +	int idx;
>> +	u64 slots_events;
>> +
>> +	slots_events = *(u64 *)cpuc->enabled_events & INTEL_PMC_MSK_ANY_SLOTS;
>> +
>> +	for_each_set_bit(idx, (unsigned long *)&slots_events, 64) {
> 
> 	for (idx = INTEL_PMC_IDX_TD_RETIRING;
> 	     idx <= INTEL_PMC_IDX_TD_BE_BOUND; idx++)
> 
> perhaps?
> 
>> +		struct perf_event *ev = cpuc->events[idx];
>> +
>> +		if (ev == event)
>> +			continue;
>> +		x86_perf_event_update(event);
> 
> 		if (ev != event)
> 			x86_perf_event_update(event)
>> +	}
>> +}
>> +
>>   /*
>>    * Save and restart an expired event. Called by NMI contexts,
>>    * so it has to be careful about preempting normal event ops:
>>    */
>>   int intel_pmu_save_and_restart(struct perf_event *event)
>>   {
>> +	struct hw_perf_event *hwc = &event->hw;
>> +
>> +	if (unlikely(hwc->reg_idx == INTEL_PMC_IDX_FIXED_SLOTS))
>> +		intel_pmu_update_metrics(event);
>> +
>>   	x86_perf_event_update(event);
>>   	/*
>>   	 * For a checkpointed counter always reset back to 0.  This
>> -- 
>> 2.14.5
>>


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 12:43   ` Peter Zijlstra
@ 2019-05-28 18:23     ` Liang, Kan
  2019-05-29  7:30       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 8:43 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
>> The 8bit metrics ratio values lose precision when the measurement period
>> gets longer.
>>
>> To avoid this we always reset the metric value when reading, as we
>> already accumulate the count in the perf count value.
>>
>> For a long period read, low precision is acceptable.
>> For a short period read, the register will be reset often enough that it
>> is not a problem.
> 
>> The PERF_METRICS may report wrong value if its delta was less than 1/255
>> of SLOTS (Fixed counter 3).
>>
>> To avoid this, the PERF_METRICS and SLOTS registers have to be reset
>> simultaneously. The slots value has to be cached as well.
> 
> That doesn't sound like it is NMI-safe.
>  >
> 
>> RDPMC
>> =========
>> The TopDown can be collected per thread/process. To use TopDown
>> through RDPMC in applications on Icelake, the metrics and slots values
>> have to be saved/restored during context switching.
>>
>> Add specific set_period() to specially handle the slots and metrics
>> event. Because,
>>   - The initial value must be 0.
>>   - Only need to restore the value in context switch. For other cases,
>>     the counters have been cleared after read.
> 
> So the above claims to explain RDPMC, but doesn't mention that magic
> value below at all. In fact, I don't see how the above relates to RDPMC
> at all.

Current perf only supports per-core TopDown RDPMC. On Icelake, it can be 
extended to per-thread TopDown RDPMC.
The changelog tries to explain the extra work for per-thread TopDown 
RDPMC, e.g. saving/restoring the slots and metrics values on context switch.


> 
>> @@ -2141,7 +2157,9 @@ static int x86_pmu_event_idx(struct perf_event *event)
>>   	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>   		return 0;
>>   
>> -	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
>> +	if (is_metric_idx(idx))
>> +		idx = 1 << 29;
> 
> I can't find this in the SDM RDPMC description. What does it return?

It will return the value of PERF_METRICS. I will add it in the changelog.

Thanks,
Kan
> 
>> +	else if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
>>   		idx -= INTEL_PMC_IDX_FIXED;
>>   		idx |= 1 << 30;
>>   	}
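
For illustration, the RDPMC index encoding being discussed could be modeled in
userspace roughly as below. This is a sketch, not the kernel's
x86_pmu_event_idx(): the constants mirror the patch (fixed counters start at
index 32, metric pseudo-counters at 32 + 17), bit 30 selects a fixed counter,
and, per this patch, every metric pseudo-counter maps to index 1 << 29, which
RDPMC translates into a read of PERF_METRICS.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the patch under review (assumptions, not ABI). */
#define INTEL_PMC_IDX_FIXED             32
#define INTEL_PMC_IDX_FIXED_METRIC_BASE (INTEL_PMC_IDX_FIXED + 17)

static int is_metric_idx(int idx)
{
	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE &&
	       idx <  INTEL_PMC_IDX_FIXED_METRIC_BASE + 4;
}

/* Map an internal event index to the RDPMC counter selector. */
static uint32_t rdpmc_index(int idx)
{
	if (is_metric_idx(idx))
		return 1u << 29;                  /* PERF_METRICS */
	if (idx >= INTEL_PMC_IDX_FIXED)           /* fixed counter n */
		return (1u << 30) | (uint32_t)(idx - INTEL_PMC_IDX_FIXED);
	return (uint32_t)idx;                     /* general-purpose counter */
}
```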


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 13:30   ` Peter Zijlstra
@ 2019-05-28 18:24     ` Liang, Kan
  2019-05-29  7:34       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 9:30 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
>> +static u64 icl_metric_update_event(struct perf_event *event, u64 val)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	u64 newval, metric, slots_val = 0, new, last;
>> +	bool nmi = in_nmi();
>> +	int txn_flags = nmi ? 0 : cpuc->txn_flags;
>> +
>> +	/*
>> +	 * Use cached value for transaction.
>> +	 */
>> +	newval = 0;
>> +	if (txn_flags) {
>> +		newval = cpuc->txn_metric;
>> +		slots_val = cpuc->txn_slots;
>> +	} else if (nmi) {
>> +		newval = cpuc->nmi_metric;
>> +		slots_val = cpuc->nmi_slots;
>> +	}
>> +
>> +	if (!newval) {
>> +		slots_val = val;
>> +
>> +		rdpmcl((1<<29), newval);
>> +		if (txn_flags) {
>> +			cpuc->txn_metric = newval;
>> +			cpuc->txn_slots = slots_val;
>> +		} else if (nmi) {
>> +			cpuc->nmi_metric = newval;
>> +			cpuc->nmi_slots = slots_val;
>> +		}
>> +
>> +		if (!(txn_flags & PERF_PMU_TXN_REMOVE)) {
>> +			/* Reset the metric value when reading
>> +			 * The SLOTS register must be reset when PERF_METRICS reset,
>> +			 * otherwise PERF_METRICS may has wrong output.
>> +			 */
> 
> broken comment style.. (and grammer)

Missed a full stop.
Should be "Reset the metric value for each read."
> 
>> +			wrmsrl(MSR_PERF_METRICS, 0);
>> +			wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
> 
> I don't get this, overflow happens on when we flip sign, so why is
> programming 0 a sane thing to do?

Resetting the counters (programming 0) doesn't trigger an overflow.
We have to reset both registers for each read to avoid the known 
PERF_METRICS issue.


> 
>> +			hwc->saved_metric = 0;
>> +			hwc->saved_slots = 0;
>> +		} else {
>> +			/* saved metric and slots for context switch */
>> +			hwc->saved_metric = newval;
>> +			hwc->saved_slots = val;
>> +
>> +		}
>> +		/* cache the last metric and slots */
>> +		cpuc->last_metric = hwc->last_metric;
>> +		cpuc->last_slots = hwc->last_slots;
>> +		hwc->last_metric = 0;
>> +		hwc->last_slots = 0;
>> +	}
>> +
>> +	/* The METRICS and SLOTS have been reset when reading */
>> +	if (!(txn_flags & PERF_PMU_TXN_REMOVE))
>> +		local64_set(&hwc->prev_count, 0);
>> +
>> +	if (is_slots_event(event))
>> +		return (slots_val - cpuc->last_slots);
>> +
>> +	/*
>> +	 * The metric is reported as an 8bit integer percentage
>> +	 * suming up to 0xff. As the counter is less than 64bits
>> +	 * we can use the not used bits to get the needed precision.
>> +	 * Use 16bit fixed point arithmetic for
>> +	 * slots-in-metric = (MetricPct / 0xff) * val
>> +	 * This works fine for upto 48bit counters, but will
>> +	 * lose precision above that.
>> +	 */
>> +
>> +	metric = (cpuc->last_metric >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
>> +	last = (((metric * 0xffff) >> 8) * cpuc->last_slots) >> 16;
> 
> How is that cpuc->last_* crap not broken for NMIs ?

There should be no NMI for slots or metric events at the moment, because 
the MSR_PERF_METRICS and MSR_CORE_PERF_FIXED_CTR3 are reset on the first read.
Other NMIs will not touch this code path.

Thanks,
Kan

> 
>> +
>> +	metric = (newval >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
>> +	new = (((metric * 0xffff) >> 8) * slots_val) >> 16;
>> +
>> +	return (new - last);
>> +}
> 
> 
> This is diguisting.. and unreadable.
> 
> mul_u64_u32_shr() is actually really fast, use it.
> 
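
For reference, the fixed-point arithmetic Peter objects to can be written
around mul_u64_u32_shr() as below. This is a userspace sketch: the
mul_u64_u32_shr() here is a stand-in modeling the kernel helper's
semantics ((a * mul) >> shift with a wide intermediate), and the Q16
scaling mirrors the patch's (((metric * 0xffff) >> 8) * slots) >> 16.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's mul_u64_u32_shr(a, mul, shift):
 * (a * mul) >> shift, computed with a 128-bit intermediate. */
static uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}

/* PERF_METRICS packs one 8-bit field per metric, each a fraction of
 * 0xff.  slots-in-metric = (field / 0xff) * slots, approximated with a
 * Q16 fixed-point fraction exactly as in the patch. */
static uint64_t metric_slots(uint64_t metrics, int metric_idx, uint64_t slots)
{
	uint32_t field = (uint32_t)(metrics >> (metric_idx * 8)) & 0xff;
	uint32_t frac  = (field * 0xffff) >> 8;   /* Q16 fraction of 1 */

	return mul_u64_u32_shr(slots, frac, 16);
}
```

As the patch comment notes, the approximation loses precision for counts
above roughly 48 bits; e.g. a field of 0xff over 1000 slots yields 996
rather than 1000.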


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 13:43   ` Peter Zijlstra
@ 2019-05-28 18:24     ` Liang, Kan
  2019-05-29  7:54       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 9:43 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
>> @@ -3287,6 +3304,13 @@ static int core_pmu_hw_config(struct perf_event *event)
>>   	return intel_pmu_bts_config(event);
>>   }
>>   
>> +#define EVENT_CODE(e)	(e->attr.config & INTEL_ARCH_EVENT_MASK)
>> +#define is_slots_event(e)	(EVENT_CODE(e) == 0x0400)
>> +#define is_perf_metrics_event(e)				\
>> +		(((EVENT_CODE(e) & 0xff) == 0xff) &&		\
>> +		 (EVENT_CODE(e) >= 0x01ff) &&			\
>> +		 (EVENT_CODE(e) <= 0x04ff))
>> +
> 
> That is horrific.. (e & INTEL_ARCH_EVENT_MASK) & 0xff is just daft,
> that should be: (e & ARCH_PERFMON_EVENTSEL_EVENT).
> 
> Also, we already have fake events for event=0, see FIXED2, why are we
> now using event=0xff ?

I think event=0 is for genuine fixed events. Metrics events are fake events.
I didn't find FIXED2 in the code. Could you please share more hints?

Thanks,
Kan


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 13:48   ` Peter Zijlstra
@ 2019-05-28 18:24     ` Liang, Kan
  2019-05-29  7:57       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 9:48 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index b980b9e95d2a..0d7081434d1d 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -133,6 +133,11 @@ struct hw_perf_event {
>>   
>>   			struct hw_perf_event_extra extra_reg;
>>   			struct hw_perf_event_extra branch_reg;
>> +
>> +			u64		saved_metric;
>> +			u64		saved_slots;
>> +			u64		last_slots;
>> +			u64		last_metric;
> 
> This is really sad, and I'm thinking much of that really isn't needed
> anyway, due to how you're not using some of the other fields.

If we don't cache the value, we have to update all metrics events when 
reading any metrics event. I think that could bring high overhead.

Thanks,
Kan
> 
>>   		};
>>   		struct { /* software */
>>   			struct hrtimer	hrtimer;
>> -- 
>> 2.14.5
>>


* Re: [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown
  2019-05-28 13:52   ` Peter Zijlstra
@ 2019-05-28 18:25     ` Liang, Kan
  2019-05-29  7:58       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-28 18:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/28/2019 9:52 AM, Peter Zijlstra wrote:
> On Tue, May 21, 2019 at 02:40:53PM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> To get correct PERF_METRICS value, the fixed counter 3 must start from
>> 0. It would bring problems when sampling read slots and topdown events.
>> For example,
>>          perf record -e '{slots, topdown-retiring}:S'
>> The slots would not overflow if it starts from 0.
>>
>> Add specific validate_group() support to reject the case and error out
>> for Icelake.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>   arch/x86/events/core.c       |  2 ++
>>   arch/x86/events/intel/core.c | 20 ++++++++++++++++++++
>>   arch/x86/events/perf_event.h |  2 ++
>>   3 files changed, 24 insertions(+)
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 07ecfe75f0e6..a7eb842f8651 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -2065,6 +2065,8 @@ static int validate_group(struct perf_event *event)
>>   	fake_cpuc->n_events = 0;
>>   	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
>>   
>> +	if (x86_pmu.validate_group)
>> +		ret = x86_pmu.validate_group(fake_cpuc, n);
>>   out:
>>   	free_fake_cpuc(fake_cpuc);
>>   	return ret;
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 79e9d05e047d..2bb90d652a35 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -4410,6 +4410,25 @@ static int icl_set_period(struct perf_event *event)
>>   	return 1;
>>   }
>>   
>> +static int icl_validate_group(struct cpu_hw_events *cpuc, int n)
>> +{
>> +	bool has_sampling_slots = false, has_metrics = false;
>> +	struct perf_event *e;
>> +	int i;
>> +
>> +	for (i = 0; i < n; i++) {
>> +		e = cpuc->event_list[i];
>> +		if (is_slots_event(e) && is_sampling_event(e))
>> +			has_sampling_slots = true;
>> +
>> +		if (is_perf_metrics_event(e))
>> +			has_metrics = true;
>> +	}
>> +	if (unlikely(has_sampling_slots && has_metrics))
>> +		return -EINVAL;
>> +	return 0;
>> +}
> 
> Why this special hack, why not disallow sampling on SLOTS on creation?

You mean unconditionally disabling SLOTS sampling?

The SLOTS event doesn't have to be grouped with TopDown metrics events.
I think users may want to sample only the slots event. We should allow 
this usage.

Thanks,
Kan




* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-28 18:21     ` Liang, Kan
@ 2019-05-29  7:28       ` Peter Zijlstra
  2019-05-29 14:40         ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29  7:28 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 28, 2019 at 02:21:49PM -0400, Liang, Kan wrote:
> 
> 
> On 5/28/2019 8:15 AM, Peter Zijlstra wrote:
> > On Tue, May 21, 2019 at 02:40:48PM -0700, kan.liang@linux.intel.com wrote:
> > > +/*
> > > + * We model PERF_METRICS as more magic fixed-mode PMCs, one for each metric
> > > + * and another for the whole slots counter
> > > + *
> > > + * Internally they all map to Fixed Ctr 3 (SLOTS), and allocate PERF_METRICS
> > > + * as an extra_reg. PERF_METRICS has no own configuration, but we fill in
> > > + * the configuration of FxCtr3 to enforce that all the shared users of SLOTS
> > > + * have the same configuration.
> > > + */
> > > +#define INTEL_PMC_IDX_FIXED_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 17)
> > > +#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 0)
> > > +#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 1)
> > > +#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 2)
> > > +#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 3)
> > > +#define INTEL_PMC_MSK_ANY_SLOTS			((0xfull << INTEL_PMC_IDX_FIXED_METRIC_BASE) | \
> > > +						 INTEL_PMC_MSK_FIXED_SLOTS)
> > > +static inline bool is_metric_idx(int idx)
> > > +{
> > > +	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE && idx <= INTEL_PMC_IDX_TD_BE_BOUND;
> > > +}
> > 
> > Something like:
> > 
> > 	return (idx >> INTEL_PMC_IDX_FIXED_METRIC_BASE) & 0xf;
> > 
> > might be faster code... (if it wasn't for 64bit literals being a pain,
> > it could be a simple test instruction).
> > 
> 
is_metric_idx() is not a mask. It checks whether the idx is between 49 and 52.

blergh, indeed. Check that it compiles into a single branch though, if
it gets that wrong it needs hand holding.

One way would be to make these 48-51 and put BTS as 52, and then you do
have a mask.

Another way would be to write it as:

	(unsigned)(idx - INTEL_PMC_IDX_FIXED_METRIC_BASE) < 4;
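
As a sanity check, the unsigned-subtraction form is equivalent to the
two-sided comparison and typically compiles to a single compare-and-branch:
the subtraction wraps out-of-range (including negative) values around to
huge unsigned numbers, so one unsigned compare covers both bounds. A
minimal sketch, with the base value taken from the patch above:

```c
#include <assert.h>

/* Values from the patch: fixed counters start at 32, metric
 * pseudo-counters occupy indices 49..52. */
#define INTEL_PMC_IDX_FIXED             32
#define INTEL_PMC_IDX_FIXED_METRIC_BASE (INTEL_PMC_IDX_FIXED + 17)

/* Two-sided range check, as in the patch... */
static int is_metric_idx_cmp(int idx)
{
	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE &&
	       idx <= INTEL_PMC_IDX_FIXED_METRIC_BASE + 3;
}

/* ...and the single-comparison form suggested above. */
static int is_metric_idx_sub(int idx)
{
	return (unsigned int)(idx - INTEL_PMC_IDX_FIXED_METRIC_BASE) < 4;
}
```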




* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 18:23     ` Liang, Kan
@ 2019-05-29  7:30       ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29  7:30 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 28, 2019 at 02:23:09PM -0400, Liang, Kan wrote:
> > > RDPMC
> > > =========
> > > The TopDown can be collected per thread/process. To use TopDown
> > > through RDPMC in applications on Icelake, the metrics and slots values
> > > have to be saved/restored during context switching.
> > > 
> > > Add specific set_period() to specially handle the slots and metrics
> > > event. Because,
> > >   - The initial value must be 0.
> > >   - Only need to restore the value in context switch. For other cases,
> > >     the counters have been cleared after read.
> > 
> > So the above claims to explain RDPMC, but doesn't mention that magic
> > value below at all. In fact, I don't see how the above relates to RDPMC
> > at all.
> 
> Current perf only supports per-core TopDown RDPMC. On Icelake, it can be
> extended to per-thread TopDown RDPMC.
> The changelog tries to explain the extra work for per-thread TopDown RDPMC,
> e.g. saving/restoring the slots and metrics values on context switch.

Right, this has what relation to RDPMC ?

> > > @@ -2141,7 +2157,9 @@ static int x86_pmu_event_idx(struct perf_event *event)
> > >   	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> > >   		return 0;
> > > -	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
> > > +	if (is_metric_idx(idx))
> > > +		idx = 1 << 29;
> > 
> > I can't find this in the SDM RDPMC description. What does it return?
> 
> It will return the value of PERF_METRICS. I will add it in the changelog.

A comment would be even better.


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 18:24     ` Liang, Kan
@ 2019-05-29  7:34       ` Peter Zijlstra
  2019-05-29 14:41         ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29  7:34 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 28, 2019 at 02:24:14PM -0400, Liang, Kan wrote:

> > > +		if (!(txn_flags & PERF_PMU_TXN_REMOVE)) {
> > > +			/* Reset the metric value when reading
> > > +			 * The SLOTS register must be reset when PERF_METRICS reset,
> > > +			 * otherwise PERF_METRICS may has wrong output.
> > > +			 */
> > 
> > broken comment style.. (and grammer)
> 
> Missed a full stop.
> Should be "Reset the metric value for each read."

s/may has wrong/may have wrong/

> > > +			wrmsrl(MSR_PERF_METRICS, 0);
> > > +			wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
> > 
> > I don't get this, overflow happens on when we flip sign, so why is
> > programming 0 a sane thing to do?
> 
> Resetting the counters (programming 0) doesn't trigger an overflow.

Right, so why then do you allow creating this thing as
is_sampling_event() ?

> We have to reset both registers for each read to avoid the known
> PERF_METRICS issue.

'the known PERF_METRICS issue' is unknown to me and any other reader.

> > > +	metric = (cpuc->last_metric >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
> > > +	last = (((metric * 0xffff) >> 8) * cpuc->last_slots) >> 16;
> > 
> > How is that cpuc->last_* crap not broken for NMIs ?
> 
> There should be no NMI for slots or metric events at the moment, because the
> MSR_PERF_METRICS and MSR_CORE_PERF_FIXED_CTR3 are reset on the first read.
> Other NMIs will not touch this code path.

What happens if someone does: read(perf_fd) and then has the NMI hit?


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 18:24     ` Liang, Kan
@ 2019-05-29  7:54       ` Peter Zijlstra
  2019-05-29 14:42         ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29  7:54 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 28, 2019 at 02:24:38PM -0400, Liang, Kan wrote:
> 
> 
> On 5/28/2019 9:43 AM, Peter Zijlstra wrote:
> > On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> > > @@ -3287,6 +3304,13 @@ static int core_pmu_hw_config(struct perf_event *event)
> > >   	return intel_pmu_bts_config(event);
> > >   }
> > > +#define EVENT_CODE(e)	(e->attr.config & INTEL_ARCH_EVENT_MASK)
> > > +#define is_slots_event(e)	(EVENT_CODE(e) == 0x0400)
> > > +#define is_perf_metrics_event(e)				\
> > > +		(((EVENT_CODE(e) & 0xff) == 0xff) &&		\
> > > +		 (EVENT_CODE(e) >= 0x01ff) &&			\
> > > +		 (EVENT_CODE(e) <= 0x04ff))
> > > +
> > 
> > That is horrific.. (e & INTEL_ARCH_EVENT_MASK) & 0xff is just daft,
> > that should be: (e & ARCH_PERFMON_EVENTSEL_EVENT).
> > 
> > Also, we already have fake events for event=0, see FIXED2, why are we
> > now using event=0xff ?
> 
> I think event=0 is for genuine fixed events. Metrics events are fake events.
> I didn't find FIXED2 in the code. Could you please share more hints?

cd09c0c40a97 ("perf events: Enable raw event support for Intel unhalted_reference_cycles event")

We used the fake event=0x00, umask=0x03 for CPU_CLK_UNHALTED.REF_TSC,
because that was not available as a generic event, *until now* it seems.
I see ICL actually has it as a generic event, which means we need to fix
up the constraint mask for that differently.

But note that for all previous uarchs this event did not in fact exist.

It appears the TOPDOWN.SLOTS thing, which is available in FIXED3 as
event=0x00, umask=0x04, is indeed a generic event too.


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-28 18:24     ` Liang, Kan
@ 2019-05-29  7:57       ` Peter Zijlstra
  2019-05-29 14:42         ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29  7:57 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 28, 2019 at 02:24:56PM -0400, Liang, Kan wrote:
> 
> 
> On 5/28/2019 9:48 AM, Peter Zijlstra wrote:
> > On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
> > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > > index b980b9e95d2a..0d7081434d1d 100644
> > > --- a/include/linux/perf_event.h
> > > +++ b/include/linux/perf_event.h
> > > @@ -133,6 +133,11 @@ struct hw_perf_event {
> > >   			struct hw_perf_event_extra extra_reg;
> > >   			struct hw_perf_event_extra branch_reg;
> > > +
> > > +			u64		saved_metric;
> > > +			u64		saved_slots;
> > > +			u64		last_slots;
> > > +			u64		last_metric;
> > 
> > This is really sad, and I'm thinking much of that really isn't needed
> > anyway, due to how you're not using some of the other fields.
> 
> If we don't cache the value, we have to update all metrics events when
> reading any metrics event. I think that could bring high overhead.

Since you don't support sampling, or even 'normal' functionality on this
FIXED3/SLOTS thing, you'll not use prev_count, sample_period,
last_period, period_left, interrupts_seq, interrupts, freq_time_stamp
and freq_count_stamp.

So why do you then need to grow the data structure with 4 more nonsense
fields?

Also, I'm not sure what you need those last things for, you reset the
value to 0 every time you read them, there is no delta to track.


* Re: [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown
  2019-05-28 18:25     ` Liang, Kan
@ 2019-05-29  7:58       ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29  7:58 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Tue, May 28, 2019 at 02:25:34PM -0400, Liang, Kan wrote:

> > > +static int icl_validate_group(struct cpu_hw_events *cpuc, int n)
> > > +{
> > > +	bool has_sampling_slots = false, has_metrics = false;
> > > +	struct perf_event *e;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < n; i++) {
> > > +		e = cpuc->event_list[i];
> > > +		if (is_slots_event(e) && is_sampling_event(e))
> > > +			has_sampling_slots = true;
> > > +
> > > +		if (is_perf_metrics_event(e))
> > > +			has_metrics = true;
> > > +	}
> > > +	if (unlikely(has_sampling_slots && has_metrics))
> > > +		return -EINVAL;
> > > +	return 0;
> > > +}
> > 
> > Why this special hack, why not disallow sampling on SLOTS on creation?
> 
> You mean unconditionally disabling SLOTS sampling?
> 
> The SLOTS event doesn't have to be grouped with TopDown metrics events.
> I think users may want to sample only the slots event. We should allow this
> usage.

Given the trainwreck these patches are, none of that is clear.


* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-21 21:40 ` [PATCH 2/9] perf/x86/intel: Basic support for metrics counters kan.liang
  2019-05-28 12:05   ` Peter Zijlstra
  2019-05-28 12:15   ` Peter Zijlstra
@ 2019-05-29  8:14   ` Peter Zijlstra
  2 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29  8:14 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak


On Tue, May 28, 2019 at 02:20:53PM -0400, Liang, Kan wrote:
> On 5/28/2019 8:05 AM, Peter Zijlstra wrote:
> > On Tue, May 21, 2019 at 02:40:48PM -0700, kan.liang@linux.intel.com wrote:

> @@ -2155,9 +2155,19 @@ static void intel_pmu_disable_event(struct perf_event *event)
>  		return;
>  	}
>  
> -	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx);
> -	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
> -	cpuc->intel_cp_status &= ~(1ull << hwc->idx);
> +	__clear_bit(hwc->idx, cpuc->enabled_events);
> +
> +	/*
> +	 * When any other slots sharing event is still enabled,
> +	 * cancel the disabling.
> +	 */
> +	if (is_any_slots_idx(hwc->idx) &&
> +	    (*(u64 *)&cpuc->enabled_events & INTEL_PMC_MSK_ANY_SLOTS))
> +		return;
> +
> +	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->reg_idx);
> +	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->reg_idx);
> +	cpuc->intel_cp_status &= ~(1ull << hwc->reg_idx);
>  
>  	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
>  		intel_pmu_disable_fixed(hwc);

> @@ -2242,18 +2252,19 @@ static void intel_pmu_enable_event(struct perf_event *event)
>  	}
>  
>  	if (event->attr.exclude_host)
> -		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->idx);
> +		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->reg_idx);
>  	if (event->attr.exclude_guest)
> -		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
> +		cpuc->intel_ctrl_host_mask |= (1ull << hwc->reg_idx);
>  
>  	if (unlikely(event_is_checkpointed(event)))
> -		cpuc->intel_cp_status |= (1ull << hwc->idx);
> +		cpuc->intel_cp_status |= (1ull << hwc->reg_idx);
>  
>  	if (unlikely(event->attr.precise_ip))
>  		intel_pmu_pebs_enable(event);
>  
>  	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
> -		intel_pmu_enable_fixed(event);
> +		if (!__test_and_set_bit(hwc->idx, cpuc->enabled_events))
> +			intel_pmu_enable_fixed(event);
>  		return;
>  	}
>  
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 7ae2912f16de..dd6c86a758f7 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -203,6 +203,7 @@ struct cpu_hw_events {
>  	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>  	unsigned long		running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>  	int			enabled;
> +	unsigned long		enabled_events[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>  
>  	int			n_events; /* the # of events in the below arrays */
>  	int			n_added;  /* the # last events in the below arrays;

> > Also, why do we need that whole enabled_events[] array. Do we really not
> > have that information elsewhere?
> 
> No. We don't have a case that several events share a counter at the same
> time. We don't need to check if other events are enabled when we try to
> disable a counter. So we don't save such information.
> But we have to do it for metrics events.

So you have x86_pmu.disable() clear the bit, and x86_pmu.enable() set
the bit, and then, if you look at arch/x86/events/core.c that doesn't
look redundant?

That is, explain to me how exactly this new enabled_events[] is different
from active_mask[].


* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-29  7:28       ` Peter Zijlstra
@ 2019-05-29 14:40         ` Liang, Kan
  2019-05-29 16:46           ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-29 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/29/2019 3:28 AM, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 02:21:49PM -0400, Liang, Kan wrote:
>>
>>
>> On 5/28/2019 8:15 AM, Peter Zijlstra wrote:
>>> On Tue, May 21, 2019 at 02:40:48PM -0700, kan.liang@linux.intel.com wrote:
>>>> +/*
>>>> + * We model PERF_METRICS as more magic fixed-mode PMCs, one for each metric
>>>> + * and another for the whole slots counter
>>>> + *
>>>> + * Internally they all map to Fixed Ctr 3 (SLOTS), and allocate PERF_METRICS
>>>> + * as an extra_reg. PERF_METRICS has no own configuration, but we fill in
>>>> + * the configuration of FxCtr3 to enforce that all the shared users of SLOTS
>>>> + * have the same configuration.
>>>> + */
>>>> +#define INTEL_PMC_IDX_FIXED_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 17)
>>>> +#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 0)
>>>> +#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 1)
>>>> +#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 2)
>>>> +#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 3)
>>>> +#define INTEL_PMC_MSK_ANY_SLOTS			((0xfull << INTEL_PMC_IDX_FIXED_METRIC_BASE) | \
>>>> +						 INTEL_PMC_MSK_FIXED_SLOTS)
>>>> +static inline bool is_metric_idx(int idx)
>>>> +{
>>>> +	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE && idx <= INTEL_PMC_IDX_TD_BE_BOUND;
>>>> +}
>>>
>>> Something like:
>>>
>>> 	return (idx >> INTEL_PMC_IDX_FIXED_METRIC_BASE) & 0xf;
>>>
>>> might be faster code... (if it wasn't for 64bit literals being a pain,
>>> it could be a simple test instruction).
>>>
>>
>> is_metric_idx() is not a mask. It checks whether the idx is between 49 and 52.
> 
> blergh, indeed. Check that it compiles into a single branch though, if
> it gets that wrong it needs hand holding.
> 
> One way would be to make these 48-51 and put BTS as 52, and then you do

Can we put BTS at 47?
Now we only support L1 TopDown metrics, but there will probably be L2 
TopDown metrics. I think we need some space to extend.


Thanks,
Kan

> have a mask.
> 
> Another way would be to write it as:
> 
> 	(unsigned)(idx - INTEL_PMC_IDX_FIXED_METRIC_BASE) < 4;
> 
> 


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-29  7:34       ` Peter Zijlstra
@ 2019-05-29 14:41         ` Liang, Kan
  0 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2019-05-29 14:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/29/2019 3:34 AM, Peter Zijlstra wrote:
>>>> +			wrmsrl(MSR_PERF_METRICS, 0);
>>>> +			wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
>>> I don't get this, overflow happens on when we flip sign, so why is
>>> programming 0 a sane thing to do?
>> Reset the counters (programming 0) don't trigger overflow.
> Right, so why then do you allow creating this thing as
> is_sampling_event() ?

Perf metrics events don't support sampling.
A SLOTS/FIXED3 event grouped with perf metrics events doesn't support 
sampling either.

Only a SLOTS/FIXED3 event without perf metrics events supports 
sampling, but that case doesn't reach this code path. It is handled by 
the normal routine, like other sampling events.

So no sampling event can reach this code path.

> 
>> We have to reset both registers for each read to avoid the known
>> PERF_METRICS issue.
> 'the known PERF_METRICS issue' is unknown to me and any other reader.
> 
>>>> +	metric = (cpuc->last_metric >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
>>>> +	last = (((metric * 0xffff) >> 8) * cpuc->last_slots) >> 16;
>>> How is that cpuc->last_* crap not broken for NMIs ?
>> There should be no NMI for slots or metric events at the moment, because the
>> MSR_PERF_METRICS and MSR_CORE_PERF_FIXED_CTR3 are reset in first read.
>> Other NMIs will not touch the codes here.
> What happens if someone does: read(perf_fd) and then has the NMI hit?

Right, it looks like there is a corner case we cannot handle. The NMI 
can hit right after the rdpmcl, but before the reset.

My current thought is to disable the METRICS and SLOTS counters for the 
first read. I will think about it.

Thanks,
Kan



* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-29  7:54       ` Peter Zijlstra
@ 2019-05-29 14:42         ` Liang, Kan
  2019-05-29 16:58           ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Liang, Kan @ 2019-05-29 14:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/29/2019 3:54 AM, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 02:24:38PM -0400, Liang, Kan wrote:
>>
>>
>> On 5/28/2019 9:43 AM, Peter Zijlstra wrote:
>>> On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
>>>> @@ -3287,6 +3304,13 @@ static int core_pmu_hw_config(struct perf_event *event)
>>>>    	return intel_pmu_bts_config(event);
>>>>    }
>>>> +#define EVENT_CODE(e)	(e->attr.config & INTEL_ARCH_EVENT_MASK)
>>>> +#define is_slots_event(e)	(EVENT_CODE(e) == 0x0400)
>>>> +#define is_perf_metrics_event(e)				\
>>>> +		(((EVENT_CODE(e) & 0xff) == 0xff) &&		\
>>>> +		 (EVENT_CODE(e) >= 0x01ff) &&			\
>>>> +		 (EVENT_CODE(e) <= 0x04ff))
>>>> +
>>>
>>> That is horrific.. (e & INTEL_ARCH_EVENT_MASK) & 0xff is just daft,
>>> that should be: (e & ARCH_PERFMON_EVENTSEL_EVENT).
>>>
>>> Also, we already have fake events for event=0, see FIXED2, why are we
>>> now using event=0xff ?
>>
>> I think event=0 is for genuine fixed events. Metrics events are fake events.
>> I didn't find FIXED2 in the code. Could you please share more hints?
> 
> cd09c0c40a97 ("perf events: Enable raw event support for Intel unhalted_reference_cycles event")
> 
> We used the fake event=0x00, umask=0x03 for CPU_CLK_UNHALTED.REF_TSC,
> because that was not available as a generic event, *until now* it seems.
> I see ICL actually has it as a generic event, which means we need to fix
> up the constraint mask for that differently.
>

There is no change for REF_TSC on ICL.

> But note that for all previous uarchs this event did not in fact exist.
> 
> It appears the TOPDOWN.SLOTS thing, which is available in FIXED3 as
> event=0x00, umask=0x04, is indeed a generic event too.

SLOTS does have a generic event, TOPDOWN.SLOTS_P, event=0xa4, umask=0x1.

I think we need a fix as below for ICL, so the SLOTS event can be 
extended to the generic counters.
-	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
+	FIXED_EVENT_CONSTRAINT(0x01a4, 3),	/* TOPDOWN.SLOTS */

I will send a separate patch for it.
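[Editor's note: the event-code tests quoted earlier in this thread (is_slots_event / is_perf_metrics_event) can be sketched in user-space C, using Peter's suggested ARCH_PERFMON_EVENTSEL_EVENT mask instead of the redundant double masking. The constant values mirror the x86 raw-config encoding, event | (umask << 8); this is an illustration, not the patched kernel code.]

```c
#include <stdint.h>

/* Illustrative values; the real definitions live in
 * arch/x86/include/asm/perf_event.h. */
#define ARCH_PERFMON_EVENTSEL_EVENT	0x000000ffULL
#define ARCH_PERFMON_EVENTSEL_UMASK	0x0000ff00ULL
#define INTEL_ARCH_EVENT_MASK \
	(ARCH_PERFMON_EVENTSEL_UMASK | ARCH_PERFMON_EVENTSEL_EVENT)

/* SLOTS: event=0x00, umask=0x04 -> config 0x0400 */
static int is_slots_event(uint64_t config)
{
	return (config & INTEL_ARCH_EVENT_MASK) == 0x0400;
}

/*
 * The four metric events use the software-reserved event code 0xff
 * with umask 0x01..0x04. Peter's point: mask with
 * ARCH_PERFMON_EVENTSEL_EVENT rather than an open-coded "& 0xff".
 */
static int is_perf_metrics_event(uint64_t config)
{
	uint64_t code = config & INTEL_ARCH_EVENT_MASK;

	return ((code & ARCH_PERFMON_EVENTSEL_EVENT) == 0xff) &&
	       (code >= 0x01ff) && (code <= 0x04ff);
}
```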

Thanks,
Kan


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-29  7:57       ` Peter Zijlstra
@ 2019-05-29 14:42         ` Liang, Kan
  0 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2019-05-29 14:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/29/2019 3:57 AM, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 02:24:56PM -0400, Liang, Kan wrote:
>>
>>
>> On 5/28/2019 9:48 AM, Peter Zijlstra wrote:
>>> On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@linux.intel.com wrote:
>>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>>> index b980b9e95d2a..0d7081434d1d 100644
>>>> --- a/include/linux/perf_event.h
>>>> +++ b/include/linux/perf_event.h
>>>> @@ -133,6 +133,11 @@ struct hw_perf_event {
>>>>    			struct hw_perf_event_extra extra_reg;
>>>>    			struct hw_perf_event_extra branch_reg;
>>>> +
>>>> +			u64		saved_metric;
>>>> +			u64		saved_slots;
>>>> +			u64		last_slots;
>>>> +			u64		last_metric;
>>>
>>> This is really sad, and I'm thinking much of that really isn't needed
>>> anyway, due to how you're not using some of the other fields.
>>
>> If we don't cache the value, we have to update all metrics events when
>> reading any metrics event. I think that could bring high overhead.
> 
> Since you don't support sampling, or even 'normal' functionality on this
> FIXED3/SLOTS thing, you'll not use prev_count, sample_period,
> last_period, period_left, interrupts_seq, interrupts, freq_time_stamp
> and freq_count_stamp.
> 
> So why do you then need to grow the data structure with 4 more nonsense
> fields?
> 
> Also, I'm not sure what you need those last things for, you reset the
> value to 0 every time you read them, there is no delta to track.
> 

The 'last things' are only for per-thread TopDown RDPMC.
We have to save/restore the SLOTS and METRICS values on context switch.
They are used to calculate the delta between the thread's sched-in and 
the current read/sched-out.
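[Editor's note: the save/restore scheme described above can be illustrated with a small user-space sketch. The struct and function names here are hypothetical, chosen only to show the delta accounting: a per-thread total accumulates (current counter value minus the value at sched-in), which is what the saved_* fields in the quoted patch are for.]

```c
#include <stdint.h>

/* Hypothetical per-thread state, illustration only. */
struct topdown_state {
	uint64_t saved_slots;	/* counter value when the thread scheduled in */
	uint64_t total_slots;	/* accumulated per-thread count */
};

/* On sched-in, remember the current hardware counter value. */
static void sched_in(struct topdown_state *s, uint64_t hw_slots)
{
	s->saved_slots = hw_slots;
}

/* On read or sched-out, fold the delta since sched-in into the total. */
static void read_or_sched_out(struct topdown_state *s, uint64_t hw_slots)
{
	s->total_slots += hw_slots - s->saved_slots;
	s->saved_slots = hw_slots;
}
```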


I will split the patch into several patches, and try to make things clear.

Thanks,
Kan




* Re: [PATCH 2/9] perf/x86/intel: Basic support for metrics counters
  2019-05-29 14:40         ` Liang, Kan
@ 2019-05-29 16:46           ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29 16:46 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Wed, May 29, 2019 at 10:40:52AM -0400, Liang, Kan wrote:
> On 5/29/2019 3:28 AM, Peter Zijlstra wrote:
> > One way would be to make these 48-51 and put BTS as 52, and then you do
> 
> Can we put BTS 47?

Sure, it doesn't matter where you put it. It just needs to be a free bit
position, the rest doesn't matter.


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-29 14:42         ` Liang, Kan
@ 2019-05-29 16:58           ` Peter Zijlstra
  2019-06-04 20:39             ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2019-05-29 16:58 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Wed, May 29, 2019 at 10:42:10AM -0400, Liang, Kan wrote:
> On 5/29/2019 3:54 AM, Peter Zijlstra wrote:

> > cd09c0c40a97 ("perf events: Enable raw event support for Intel unhalted_reference_cycles event")
> > 
> > We used the fake event=0x00, umask=0x03 for CPU_CLK_UNHALTED.REF_TSC,
> > because that was not available as a generic event, *until now* it seems.
> > I see ICL actually has it as a generic event, which means we need to fix
> > up the constraint mask for that differently.
> > 
> 
> There is no change for REF_TSC on ICL.

Well, if I look at the SDM for May'19 (latest afaict), Volume 3, Chapter
19.3 'Performance Monitoring Events for Future Intel (C) Core(tm)
Processors' the table lists:

 Event Num.	Umask Value	Event Mask Mnemonic

 00H		03H		CPU_CLK_UNHALTED.REF_TSC

as a generic event, without constraints, unlike any of the preceding
uarchs, where that event was not available except through FIXED2.

That is most certainly a change.

> > But note that for all previous uarchs this event did not in fact exist.
> > 
> > It appears the TOPDOWN.SLOTS thing, which is available in FIXED3 as
> > event=0x00, umask=0x04, is indeed a generic event too.
> 
> The SLOTS do have a generic event, TOPDOWN.SLOTS_P, event=0xA4, umask=0x1.
> 
> I think we need a fix as below for ICL, so the SLOT event can be extended to
> generic event.
> -	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
> +	FIXED_EVENT_CONSTRAINT(0x01a4, 3),	/* TOPDOWN.SLOTS */

Then WTH is that 00H, 04H event listed in the table? Note the distinct
lack of 'Fixed Counter' or any other constraints in the 'Comments'
column.


* Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics
  2019-05-29 16:58           ` Peter Zijlstra
@ 2019-06-04 20:39             ` Liang, Kan
  0 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2019-06-04 20:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 5/29/2019 12:58 PM, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 10:42:10AM -0400, Liang, Kan wrote:
>> On 5/29/2019 3:54 AM, Peter Zijlstra wrote:
> 
>>> cd09c0c40a97 ("perf events: Enable raw event support for Intel unhalted_reference_cycles event")
>>>
>>> We used the fake event=0x00, umask=0x03 for CPU_CLK_UNHALTED.REF_TSC,
>>> because that was not available as a generic event, *until now* it seems.
>>> I see ICL actually has it as a generic event, which means we need to fix
>>> up the constraint mask for that differently.
>>>
>>
>> There is no change for REF_TSC on ICL.
> 
> Well, if I look at the SDM for May'19 (latest afaict), Volume 3, Chapter
> 19.3 'Performance Monitoring Events for Future Intel (C) Core(tm)
> Processors' the table lists:
> 
>   Event Num.	Umask Value	Event Mask Mnemonic
> 
>   00H		03H		CPU_CLK_UNHALTED.REF_TSC
> 
> as a generic event, without constraints, unlike any of the preceding
> uarchs, where that event was not available except through FIXED2.
> 
> That is most certainly a change.

I checked with our internal team. They confirmed that there is no change 
for REF_TSC on ICL.
They will fix the comment in the next SDM update.
Thanks for bringing this up.

> 
>>> But note that for all previous uarchs this event did not in fact exist.
>>>
>>> It appears the TOPDOWN.SLOTS thing, which is available in FIXED3 as
>>> event=0x00, umask=0x04, is indeed a generic event too.
>>
>> The SLOTS do have a generic event, TOPDOWN.SLOTS_P, event=0xA4, umask=0x1.
>>
>> I think we need a fix as below for ICL, so the SLOT event can be extended to
>> generic event.
>> -	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
>> +	FIXED_EVENT_CONSTRAINT(0x01a4, 3),	/* TOPDOWN.SLOTS */
> 
> Then WTH is that 00H, 04H event listed in the table? Note the distinct
> lack of 'Fixed Counter' or any other constraints in the 'Comments'
> column.
>

TOPDOWN.SLOTS (0x0400) is only available on FIXED3. It's not a generic 
event. The equivalent event for the GP counters is TOPDOWN.SLOTS_P 
(0x01a4), but that is not an architectural event.
So I think the best way is to constrain TOPDOWN.SLOTS (0x0400) to 
FIXED3 only. The patch below does so.


 From 22e3ed25340e4f46685a059cf2184747a3e02a47 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@linux.intel.com>
Date: Tue, 4 Jun 2019 10:36:08 -0700
Subject: [PATCH] perf/x86/intel: Set correct mask for TOPDOWN.SLOTS

TOPDOWN.SLOTS(0x0400) is not a generic event. It is only available on
fixed counter3.

Don't extend its mask to generic counters.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
  arch/x86/events/intel/core.c      | 6 ++++--
  arch/x86/include/asm/perf_event.h | 5 +++++
  2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4377bf6a6f82..f30d02830921 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5066,12 +5066,14 @@ __init int intel_pmu_init(void)

  	if (x86_pmu.event_constraints) {
  		/*
-		 * event on fixed counter2 (REF_CYCLES) only works on this
+		 * event on fixed counter2 (REF_CYCLES) and
+		 * fixed counter3 (TOPDOWN.SLOTS) only work on this
  		 * counter, so do not extend mask to generic counters
  		 */
  		for_each_event_constraint(c, x86_pmu.event_constraints) {
  			if (c->cmask == FIXED_EVENT_FLAGS
-			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) {
+			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES
+			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_SLOTS) {
  				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
  			}
  			c->idxmsk64 &=
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 1392d5e6e8d6..457d35a75ad3 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -167,6 +167,11 @@ struct x86_pmu_capability {
  #define INTEL_PMC_IDX_FIXED_REF_CYCLES	(INTEL_PMC_IDX_FIXED + 2)
  #define INTEL_PMC_MSK_FIXED_REF_CYCLES	(1ULL << INTEL_PMC_IDX_FIXED_REF_CYCLES)

+/* TOPDOWN.SLOTS: */
+#define MSR_ARCH_PERFMON_FIXED_CTR3	0x30c
+#define INTEL_PMC_IDX_FIXED_SLOTS	(INTEL_PMC_IDX_FIXED + 3)
+#define INTEL_PMC_MSK_FIXED_SLOTS	(1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+
  /*
   * We model BTS tracing as another fixed-mode PMC.
   *
-- 
2.14.5

Thanks,
Kan





end of thread, other threads:[~2019-06-04 20:39 UTC | newest]

Thread overview: 44+ messages
-- links below jump to the message on this page --
2019-05-21 21:40 [PATCH 0/9] TopDown metrics support for Icelake kan.liang
2019-05-21 21:40 ` [PATCH 1/9] perf/core: Support a REMOVE transaction kan.liang
2019-05-21 21:40 ` [PATCH 2/9] perf/x86/intel: Basic support for metrics counters kan.liang
2019-05-28 12:05   ` Peter Zijlstra
2019-05-28 18:20     ` Liang, Kan
2019-05-28 12:15   ` Peter Zijlstra
2019-05-28 18:21     ` Liang, Kan
2019-05-29  7:28       ` Peter Zijlstra
2019-05-29 14:40         ` Liang, Kan
2019-05-29 16:46           ` Peter Zijlstra
2019-05-29  8:14   ` Peter Zijlstra
2019-05-21 21:40 ` [PATCH 3/9] perf/x86/intel: Support overflows on SLOTS kan.liang
2019-05-28 12:20   ` Peter Zijlstra
2019-05-28 18:22     ` Liang, Kan
2019-05-21 21:40 ` [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics kan.liang
2019-05-28 12:43   ` Peter Zijlstra
2019-05-28 18:23     ` Liang, Kan
2019-05-29  7:30       ` Peter Zijlstra
2019-05-28 12:53   ` Peter Zijlstra
2019-05-28 12:56   ` Peter Zijlstra
2019-05-28 13:32     ` Peter Zijlstra
2019-05-28 13:30   ` Peter Zijlstra
2019-05-28 18:24     ` Liang, Kan
2019-05-29  7:34       ` Peter Zijlstra
2019-05-29 14:41         ` Liang, Kan
2019-05-28 13:43   ` Peter Zijlstra
2019-05-28 18:24     ` Liang, Kan
2019-05-29  7:54       ` Peter Zijlstra
2019-05-29 14:42         ` Liang, Kan
2019-05-29 16:58           ` Peter Zijlstra
2019-06-04 20:39             ` Liang, Kan
2019-05-28 13:48   ` Peter Zijlstra
2019-05-28 18:24     ` Liang, Kan
2019-05-29  7:57       ` Peter Zijlstra
2019-05-29 14:42         ` Liang, Kan
2019-05-21 21:40 ` [PATCH 5/9] perf/x86/intel: Set correct weight for TopDown metrics events kan.liang
2019-05-28 13:50   ` Peter Zijlstra
2019-05-21 21:40 ` [PATCH 6/9] perf/x86/intel: Export new TopDown metrics events for Icelake kan.liang
2019-05-21 21:40 ` [PATCH 7/9] perf/x86/intel: Disable sampling read slots and topdown kan.liang
2019-05-28 13:52   ` Peter Zijlstra
2019-05-28 18:25     ` Liang, Kan
2019-05-29  7:58       ` Peter Zijlstra
2019-05-21 21:40 ` [PATCH 8/9] perf, tools, stat: Support new per thread TopDown metrics kan.liang
2019-05-21 21:40 ` [PATCH 9/9] perf, tools: Add documentation for topdown metrics kan.liang
