LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V
@ 2018-04-18  2:12 Alan Kao
  2018-04-18  2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao
  2018-04-18  2:12 ` [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide Alan Kao
  0 siblings, 2 replies; 5+ messages in thread
From: Alan Kao @ 2018-04-18  2:12 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv,
	linux-doc, linux-kernel
  Cc: Alan Kao, Greentime Hu, Nick Hu

This implements the baseline PMU for RISC-V platforms.

To ease future PMU portings, a guide is also written, containing
perf concepts, arch porting practices and some hints.

Changes in v4:
 - Fix several compilation errors.  Sorry for that.
 - Raise a warning in the write_counter body.

Changes in v3:
 - Fix typos in the document.
 - Change the initialization routine from statically assigning PMU to
   device-tree-based methods, and set default to the PMU proposed in
   this patch.

Changes in v2:
 - Fix the bug reported by Alex, which was caused by not sufficient
   initialization.  Check https://lkml.org/lkml/2018/3/31/251 for the
   discussion.

Alan Kao (2):
  perf: riscv: preliminary RISC-V support
  perf: riscv: Add Document for Future Porting Guide

 Documentation/riscv/pmu.txt         | 249 ++++++++++++++
 arch/riscv/Kconfig                  |  13 +
 arch/riscv/include/asm/perf_event.h |  79 ++++-
 arch/riscv/kernel/Makefile          |   1 +
 arch/riscv/kernel/perf_event.c      | 482 ++++++++++++++++++++++++++++
 5 files changed, 820 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/riscv/pmu.txt
 create mode 100644 arch/riscv/kernel/perf_event.c

-- 
2.17.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/2] perf: riscv: preliminary RISC-V support
  2018-04-18  2:12 [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V Alan Kao
@ 2018-04-18  2:12 ` Alan Kao
  2018-04-19 19:46   ` Atish Patra
  2018-04-18  2:12 ` [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide Alan Kao
  1 sibling, 1 reply; 5+ messages in thread
From: Alan Kao @ 2018-04-18  2:12 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv,
	linux-doc, linux-kernel
  Cc: Alan Kao, Nick Hu, Greentime Hu

This patch provide a basic PMU, riscv_base_pmu, which supports two
general hardware event, instructions and cycles.  Furthermore, this
PMU serves as a reference implementation to ease the portings in
the future.

riscv_base_pmu should be able to run on any RISC-V machine that
conforms to the Priv-Spec.  Note that the latest qemu model hasn't
fully support a proper behavior of Priv-Spec 1.10 yet, but work
around should be easy with very small fixes.  Please check
https://github.com/riscv/riscv-qemu/pull/115 for future updates.

Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Signed-off-by: Alan Kao <alankao@andestech.com>
---
 arch/riscv/Kconfig                  |  13 +
 arch/riscv/include/asm/perf_event.h |  79 ++++-
 arch/riscv/kernel/Makefile          |   1 +
 arch/riscv/kernel/perf_event.c      | 482 ++++++++++++++++++++++++++++
 4 files changed, 571 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/kernel/perf_event.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c22ebe08e902..90d9c8e50377 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -203,6 +203,19 @@ config RISCV_ISA_C
 config RISCV_ISA_A
 	def_bool y
 
+menu "supported PMU type"
+	depends on PERF_EVENTS
+
+config RISCV_BASE_PMU
+	bool "Base Performance Monitoring Unit"
+	def_bool y
+	help
+	  A base PMU that serves as a reference implementation and has limited
+	  feature of perf.  It can run on any RISC-V machines so serves as the
+	  fallback, but this option can also be disable to reduce kernel size.
+
+endmenu
+
 endmenu
 
 menu "Kernel type"
diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
index e13d2ff29e83..0e638a0c3feb 100644
--- a/arch/riscv/include/asm/perf_event.h
+++ b/arch/riscv/include/asm/perf_event.h
@@ -1,13 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Copyright (C) 2018 SiFive
+ * Copyright (C) 2018 Andes Technology Corporation
  *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
  */
 
 #ifndef _ASM_RISCV_PERF_EVENT_H
 #define _ASM_RISCV_PERF_EVENT_H
 
+#include <linux/perf_event.h>
+#include <linux/ptrace.h>
+
+#define RISCV_BASE_COUNTERS	2
+
+/*
+ * The RISCV_MAX_COUNTERS parameter should be specified.
+ */
+
+#ifdef CONFIG_RISCV_BASE_PMU
+#define RISCV_MAX_COUNTERS	2
+#endif
+
+#ifndef RISCV_MAX_COUNTERS
+#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
+#endif
+
+/*
+ * These are the indexes of bits in counteren register *minus* 1,
+ * except for cycle.  It would be coherent if it can directly mapped
+ * to counteren bit definition, but there is a *time* register at
+ * counteren[1].  Per-cpu structure is scarce resource here.
+ *
+ * According to the spec, an implementation can support counter up to
+ * mhpmcounter31, but many high-end processors has at most 6 general
+ * PMCs, we give the definition to MHPMCOUNTER8 here.
+ */
+#define RISCV_PMU_CYCLE		0
+#define RISCV_PMU_INSTRET	1
+#define RISCV_PMU_MHPMCOUNTER3	2
+#define RISCV_PMU_MHPMCOUNTER4	3
+#define RISCV_PMU_MHPMCOUNTER5	4
+#define RISCV_PMU_MHPMCOUNTER6	5
+#define RISCV_PMU_MHPMCOUNTER7	6
+#define RISCV_PMU_MHPMCOUNTER8	7
+
+#define RISCV_OP_UNSUPP		(-EOPNOTSUPP)
+
+struct cpu_hw_events {
+	/* # currently enabled events*/
+	int			n_events;
+	/* currently enabled events */
+	struct perf_event	*events[RISCV_MAX_COUNTERS];
+	/* vendor-defined PMU data */
+	void			*platform;
+};
+
+struct riscv_pmu {
+	struct pmu	*pmu;
+
+	/* generic hw/cache events table */
+	const int	*hw_events;
+	const int	(*cache_events)[PERF_COUNT_HW_CACHE_MAX]
+				       [PERF_COUNT_HW_CACHE_OP_MAX]
+				       [PERF_COUNT_HW_CACHE_RESULT_MAX];
+	/* method used to map hw/cache events */
+	int		(*map_hw_event)(u64 config);
+	int		(*map_cache_event)(u64 config);
+
+	/* max generic hw events in map */
+	int		max_events;
+	/* number total counters, 2(base) + x(general) */
+	int		num_counters;
+	/* the width of the counter */
+	int		counter_width;
+
+	/* vendor-defined PMU features */
+	void		*platform;
+
+	irqreturn_t	(*handle_irq)(int irq_num, void *dev);
+	int		irq;
+};
+
 #endif /* _ASM_RISCV_PERF_EVENT_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index ffa439d4a364..f50d19816757 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
 obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)	+= ftrace.o
+obj-$(CONFIG_PERF_EVENTS)      += perf_event.o
 
 clean:
diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
new file mode 100644
index 000000000000..ba3192afc470
--- /dev/null
+++ b/arch/riscv/kernel/perf_event.c
@@ -0,0 +1,482 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
+ * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
+ * Copyright (C) 2009 Jaswinder Singh Rajput
+ * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
+ * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
+ * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@intel.com>
+ * Copyright (C) 2009 Google, Inc., Stephane Eranian
+ * Copyright 2014 Tilera Corporation. All Rights Reserved.
+ * Copyright (C) 2018 Andes Technology Corporation
+ *
+ * Perf_events support for RISC-V platforms.
+ *
+ * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
+ * functionality for perf event to fully work, this file provides
+ * the very basic framework only.
+ *
+ * For platform portings, please check Documentations/riscv/pmu.txt.
+ *
+ * The Copyright line includes x86 and tile ones.
+ */
+
+#include <linux/kprobes.h>
+#include <linux/kernel.h>
+#include <linux/kdebug.h>
+#include <linux/mutex.h>
+#include <linux/bitmap.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+#include <linux/perf_event.h>
+#include <linux/atomic.h>
+#include <linux/of.h>
+#include <asm/perf_event.h>
+
+static const struct riscv_pmu *riscv_pmu __read_mostly;
+static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
+
+/*
+ * Hardware & cache maps and their methods
+ */
+
+static const int riscv_hw_event_map[] = {
+	[PERF_COUNT_HW_CPU_CYCLES]		= RISCV_PMU_CYCLE,
+	[PERF_COUNT_HW_INSTRUCTIONS]		= RISCV_PMU_INSTRET,
+	[PERF_COUNT_HW_CACHE_REFERENCES]	= RISCV_OP_UNSUPP,
+	[PERF_COUNT_HW_CACHE_MISSES]		= RISCV_OP_UNSUPP,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= RISCV_OP_UNSUPP,
+	[PERF_COUNT_HW_BRANCH_MISSES]		= RISCV_OP_UNSUPP,
+	[PERF_COUNT_HW_BUS_CYCLES]		= RISCV_OP_UNSUPP,
+};
+
+#define C(x) PERF_COUNT_HW_CACHE_##x
+static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
+[PERF_COUNT_HW_CACHE_OP_MAX]
+[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+	[C(L1D)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+	},
+	[C(L1I)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+	},
+	[C(LL)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+	},
+	[C(DTLB)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)] =  RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] =  RISCV_OP_UNSUPP,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+	},
+	[C(ITLB)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+	},
+	[C(BPU)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+		},
+	},
+};
+
+static int riscv_map_hw_event(u64 config)
+{
+	if (config >= riscv_pmu->max_events)
+		return -EINVAL;
+
+	return riscv_pmu->hw_events[config];
+}
+
+int riscv_map_cache_decode(u64 config, unsigned int *type,
+			   unsigned int *op, unsigned int *result)
+{
+	return -ENOENT;
+}
+
+static int riscv_map_cache_event(u64 config)
+{
+	unsigned int type, op, result;
+	int err = -ENOENT;
+		int code;
+
+	err = riscv_map_cache_decode(config, &type, &op, &result);
+	if (!riscv_pmu->cache_events || err)
+		return err;
+
+	if (type >= PERF_COUNT_HW_CACHE_MAX ||
+	    op >= PERF_COUNT_HW_CACHE_OP_MAX ||
+	    result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
+		return -EINVAL;
+
+	code = (*riscv_pmu->cache_events)[type][op][result];
+	if (code == RISCV_OP_UNSUPP)
+		return -EINVAL;
+
+	return code;
+}
+
+/*
+ * Low-level functions: reading/writing counters
+ */
+
+static inline u64 read_counter(int idx)
+{
+	u64 val = 0;
+
+	switch (idx) {
+	case RISCV_PMU_CYCLE:
+		val = csr_read(cycle);
+		break;
+	case RISCV_PMU_INSTRET:
+		val = csr_read(instret);
+		break;
+	default:
+		WARN_ON_ONCE(idx < 0 ||	idx > RISCV_MAX_COUNTERS);
+		return -EINVAL;
+	}
+
+	return val;
+}
+
+static inline void write_counter(int idx, u64 value)
+{
+	/* currently not supported */
+	WARN_ON_ONCE(1);
+}
+
+/*
+ * pmu->read: read and update the counter
+ *
+ * Other architectures' implementation often have a xxx_perf_event_update
+ * routine, which can return counter values when called in the IRQ, but
+ * return void when being called by the pmu->read method.
+ */
+static void riscv_pmu_read(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 prev_raw_count, new_raw_count;
+	u64 oldval;
+	int idx = hwc->idx;
+	u64 delta;
+
+	do {
+		prev_raw_count = local64_read(&hwc->prev_count);
+		new_raw_count = read_counter(idx);
+
+		oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+					 new_raw_count);
+	} while (oldval != prev_raw_count);
+
+	/*
+	 * delta is the value to update the counter we maintain in the kernel.
+	 */
+	delta = (new_raw_count - prev_raw_count) &
+		((1ULL << riscv_pmu->counter_width) - 1);
+	local64_add(delta, &event->count);
+	/*
+	 * Something like local64_sub(delta, &hwc->period_left) here is
+	 * needed if there is an interrupt for perf.
+	 */
+}
+
+/*
+ * State transition functions:
+ *
+ * stop()/start() & add()/del()
+ */
+
+/*
+ * pmu->stop: stop the counter
+ */
+static void riscv_pmu_stop(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+	hwc->state |= PERF_HES_STOPPED;
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		riscv_pmu->pmu->read(event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+/*
+ * pmu->start: start the event.
+ */
+static void riscv_pmu_start(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+		return;
+
+	if (flags & PERF_EF_RELOAD) {
+		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+
+		/*
+		 * Set the counter to the period to the next interrupt here,
+		 * if you have any.
+		 */
+	}
+
+	hwc->state = 0;
+	perf_event_update_userpage(event);
+
+	/*
+	 * Since we cannot write to counters, this serves as an initialization
+	 * to the delta-mechanism in pmu->read(); otherwise, the delta would be
+	 * wrong when pmu->read is called for the first time.
+	 */
+	local64_set(&hwc->prev_count, read_counter(hwc->idx));
+}
+
+/*
+ * pmu->add: add the event to PMU.
+ */
+static int riscv_pmu_add(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (cpuc->n_events == riscv_pmu->num_counters)
+		return -ENOSPC;
+
+	/*
+	 * We don't have general conunters, so no binding-event-to-counter
+	 * process here.
+	 *
+	 * Indexing using hwc->config generally not works, since config may
+	 * contain extra information, but here the only info we have in
+	 * hwc->config is the event index.
+	 */
+	hwc->idx = hwc->config;
+	cpuc->events[hwc->idx] = event;
+	cpuc->n_events++;
+
+	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+	if (flags & PERF_EF_START)
+		riscv_pmu->pmu->start(event, PERF_EF_RELOAD);
+
+	return 0;
+}
+
+/*
+ * pmu->del: delete the event from PMU.
+ */
+static void riscv_pmu_del(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+
+	cpuc->events[hwc->idx] = NULL;
+	cpuc->n_events--;
+	riscv_pmu->pmu->stop(event, PERF_EF_UPDATE);
+	perf_event_update_userpage(event);
+}
+
+/*
+ * Interrupt: a skeletion for reference.
+ */
+
+static DEFINE_MUTEX(pmc_reserve_mutex);
+
+irqreturn_t riscv_base_pmu_handle_irq(int irq_num, void *dev)
+{
+	return IRQ_NONE;
+}
+
+static int reserve_pmc_hardware(void)
+{
+	int err = 0;
+
+	mutex_lock(&pmc_reserve_mutex);
+	if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) {
+		err = request_irq(riscv_pmu->irq, riscv_pmu->handle_irq,
+				  IRQF_PERCPU, "riscv-base-perf", NULL);
+	}
+	mutex_unlock(&pmc_reserve_mutex);
+
+	return err;
+}
+
+void release_pmc_hardware(void)
+{
+	mutex_lock(&pmc_reserve_mutex);
+	if (riscv_pmu->irq >=0) {
+		free_irq(riscv_pmu->irq, NULL);
+	}
+	mutex_unlock(&pmc_reserve_mutex);
+}
+
+/*
+ * Event Initialization/Finalization
+ */
+
+static atomic_t riscv_active_events = ATOMIC_INIT(0);
+
+static void riscv_event_destroy(struct perf_event *event)
+{
+	if (atomic_dec_return(&riscv_active_events) == 0)
+		release_pmc_hardware();
+}
+
+static int riscv_event_init(struct perf_event *event)
+{
+	struct perf_event_attr *attr = &event->attr;
+	struct hw_perf_event *hwc = &event->hw;
+	int err;
+	int code;
+
+	if (atomic_inc_return(&riscv_active_events) == 1) {
+		err = reserve_pmc_hardware();
+
+		if (err) {
+			pr_warn("PMC hardware not available\n");
+			atomic_dec(&riscv_active_events);
+			return -EBUSY;
+		}
+	}
+
+	switch (event->attr.type) {
+	case PERF_TYPE_HARDWARE:
+		code = riscv_pmu->map_hw_event(attr->config);
+		break;
+	case PERF_TYPE_HW_CACHE:
+		code = riscv_pmu->map_cache_event(attr->config);
+		break;
+	case PERF_TYPE_RAW:
+		return -EOPNOTSUPP;
+	default:
+		return -ENOENT;
+	}
+
+	event->destroy = riscv_event_destroy;
+	if (code < 0) {
+		event->destroy(event);
+		return code;
+	}
+
+	/*
+	 * idx is set to -1 because the index of a general event should not be
+	 * decided until binding to some counter in pmu->add().
+	 *
+	 * But since we don't have such support, later in pmu->add(), we just
+	 * use hwc->config as the index instead.
+	 */
+	hwc->config = code;
+	hwc->idx = -1;
+
+	return 0;
+}
+
+/*
+ * Initialization
+ */
+
+static struct pmu min_pmu = {
+	.name		= "riscv-base",
+	.event_init	= riscv_event_init,
+	.add		= riscv_pmu_add,
+	.del		= riscv_pmu_del,
+	.start		= riscv_pmu_start,
+	.stop		= riscv_pmu_stop,
+	.read		= riscv_pmu_read,
+};
+
+static const struct riscv_pmu riscv_base_pmu = {
+	.pmu = &min_pmu,
+	.max_events = ARRAY_SIZE(riscv_hw_event_map),
+	.map_hw_event = riscv_map_hw_event,
+	.hw_events = riscv_hw_event_map,
+	.map_cache_event = riscv_map_cache_event,
+	.cache_events = &riscv_cache_event_map,
+	.counter_width = 63,
+	.num_counters = RISCV_BASE_COUNTERS + 0,
+	.handle_irq = &riscv_base_pmu_handle_irq,
+
+	/* This means this PMU has no IRQ. */
+	.irq = -1,
+};
+
+static const struct of_device_id riscv_pmu_of_ids[] = {
+	{.compatible = "riscv,base-pmu",	.data = &riscv_base_pmu},
+	{ /* sentinel value */ }
+};
+
+int __init init_hw_perf_events(void)
+{
+	struct device_node *node = of_find_node_by_type(NULL, "pmu");
+	const struct of_device_id *of_id;
+	
+	if (node && (of_id = of_match_node(riscv_pmu_of_ids, node)))
+		riscv_pmu = of_id->data;
+	else
+		riscv_pmu = &riscv_base_pmu;
+
+	perf_pmu_register(riscv_pmu->pmu, "cpu", PERF_TYPE_RAW);
+	return 0;
+}
+arch_initcall(init_hw_perf_events);
-- 
2.17.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide
  2018-04-18  2:12 [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V Alan Kao
  2018-04-18  2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao
@ 2018-04-18  2:12 ` Alan Kao
  1 sibling, 0 replies; 5+ messages in thread
From: Alan Kao @ 2018-04-18  2:12 UTC (permalink / raw)
  To: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv,
	linux-doc, linux-kernel
  Cc: Alan Kao, Nick Hu, Greentime Hu

Reviewed-by: Alex Solomatnikov <sols@sifive.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Signed-off-by: Alan Kao <alankao@andestech.com>
---
 Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++
 1 file changed, 249 insertions(+)
 create mode 100644 Documentation/riscv/pmu.txt

diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
new file mode 100644
index 000000000000..b29f03a6d82f
--- /dev/null
+++ b/Documentation/riscv/pmu.txt
@@ -0,0 +1,249 @@
+Supporting PMUs on RISC-V platforms
+==========================================
+Alan Kao <alankao@andestech.com>, Mar 2018
+
+Introduction
+------------
+
+As of this writing, perf_event-related features mentioned in The RISC-V ISA
+Privileged Version 1.10 are as follows:
+(please check the manual for more details)
+
+* [m|s]counteren
+* mcycle[h], cycle[h]
+* minstret[h], instret[h]
+* mhpeventx, mhpcounterx[h]
+
+With such function set only, porting perf would require a lot of work, due to
+the lack of the following general architectural performance monitoring features:
+
+* Enabling/Disabling counters
+  Counters are just free-running all the time in our case.
+* Interrupt caused by counter overflow
+  No such feature in the spec.
+* Interrupt indicator
+  It is not possible to have many interrupt ports for all counters, so an
+  interrupt indicator is required for software to tell which counter has
+  just overflowed.
+* Writing to counters
+  There will be an SBI to support this since the kernel cannot modify the
+  counters [1].  Alternatively, some vendor considers to implement
+  hardware-extension for M-S-U model machines to write counters directly.
+
+This document aims to provide developers a quick guide on supporting their
+PMUs in the kernel.  The following sections briefly explain perf' mechanism
+and todos.
+
+You may check previous discussions here [1][2].  Also, it might be helpful
+to check the appendix for related kernel structures.
+
+
+1. Initialization
+-----------------
+
+*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
+various methods according to perf's internal convention and PMU-specific
+parameters.  One should declare such instance to represent the PMU.  By default,
+*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very
+basic support to a baseline QEMU model.
+
+Then he/she can either assign the instance's pointer to *riscv_pmu* so that
+the minimal and already-implemented logic can be leveraged, or invent his/her
+own *riscv_init_platform_pmu* implementation.
+
+In other words, existing sources of *riscv_base_pmu* merely provide a
+reference implementation.  Developers can flexibly decide how many parts they
+can leverage, and in the most extreme case, they can customize every function
+according to their needs.
+
+
+2. Event Initialization
+-----------------------
+
+When a user launches a perf command to monitor some events, it is first
+interpreted by the userspace perf tool into multiple *perf_event_open*
+system calls, and then each of them calls to the body of *event_init*
+member function that was assigned in the previous step.  In *riscv_base_pmu*'s
+case, it is *riscv_event_init*.
+
+The main purpose of this function is to translate the event provided by user
+into bitmap, so that HW-related control registers or counters can directly be
+manipulated.  The translation is based on the mappings and methods provided in
+*riscv_pmu*.
+
+Note that some features can be done in this stage as well:
+
+(1) interrupt setting, which is stated in the next section;
+(2) privilege level setting (user space only, kernel space only, both);
+(3) destructor setting.  Normally it is sufficient to apply *riscv_destroy_event*;
+(4) tweaks for non-sampling events, which will be utilized by functions such as
+*perf_adjust_period*, usually something like the follows:
+
+if (!is_sampling_event(event)) {
+        hwc->sample_period = x86_pmu.max_period;
+        hwc->last_period = hwc->sample_period;
+        local64_set(&hwc->period_left, hwc->sample_period);
+}
+
+In the case of *riscv_base_pmu*, only (3) is provided for now.
+
+
+3. Interrupt
+------------
+
+3.1. Interrupt Initialization
+
+This often occurs at the beginning of the *event_init* method. In common
+practice, this should be a code segment like
+
+int x86_reserve_hardware(void)
+{
+        int err = 0;
+
+        if (!atomic_inc_not_zero(&pmc_refcount)) {
+                mutex_lock(&pmc_reserve_mutex);
+                if (atomic_read(&pmc_refcount) == 0) {
+                        if (!reserve_pmc_hardware())
+                                err = -EBUSY;
+                        else
+                                reserve_ds_buffers();
+                }
+                if (!err)
+                        atomic_inc(&pmc_refcount);
+                mutex_unlock(&pmc_reserve_mutex);
+        }
+
+        return err;
+}
+
+And the magic is in *reserve_pmc_hardware*, which usually does atomic
+operations to make implemented IRQ accessible from some global function pointer.
+*release_pmc_hardware* serves the opposite purpose, and it is used in event
+destructors mentioned in previous section.
+
+(Note: From the implementations in all the architectures, the *reserve/release*
+pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading.
+It does NOT deal with the binding between an event and a physical counter,
+which will be introduced in the next section.)
+
+3.2. IRQ Structure
+
+Basically, a IRQ runs the following pseudo code:
+
+for each hardware counter that triggered this overflow
+
+    get the event of this counter
+
+    // following two steps are defined as *read()*,
+    // check the section Reading/Writing Counters for details.
+    count the delta value since previous interrupt
+    update the event->count (# event occurs) by adding delta, and
+               event->hw.period_left by subtracting delta
+
+    if the event overflows
+        sample data
+        set the counter appropriately for the next overflow
+
+        if the event overflows again
+            too frequently, throttle this event
+        fi
+    fi
+
+end for
+
+However as of this writing, none of the RISC-V implementations have designed an
+interrupt for perf, so the details are to be completed in the future.
+
+4. Reading/Writing Counters
+---------------------------
+
+They seem symmetric but perf treats them quite differently.  For reading, there
+is a *read* interface in *struct pmu*, but it serves more than just reading.
+According to the context, the *read* function not only reads the content of the
+counter (event->count), but also updates the left period to the next interrupt
+(event->hw.period_left).
+
+But the core of perf does not need direct write to counters.  Writing counters
+is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one
+has to set the counter to a good value for the next interrupt; 2) inside the IRQ
+it should set the counter to the same resonable value.
+
+Reading is not a problem in RISC-V but writing would need some effort, since
+counters are not allowed to be written by S-mode.
+
+
+5. add()/del()/start()/stop()
+-----------------------------
+
+Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
+starts/stop the counter of some event in the PMU.  All of them take the same
+arguments: *struct perf_event *event* and *int flag*.
+
+Consider perf as a state machine, then you will find that these functions serve
+as the state transition process between those states.
+Three states (event->hw.state) are defined:
+
+* PERF_HES_STOPPED:	the counter is stopped
+* PERF_HES_UPTODATE:	the event->count is up-to-date
+* PERF_HES_ARCH:	arch-dependent usage ... we don't need this for now
+
+A normal flow of these state transitions are as follows:
+
+* A user launches a perf event, resulting in calling to *event_init*.
+* When being context-switched in, *add* is called by the perf core, with a flag
+  PERF_EF_START, which means that the event should be started after it is added.
+  At this stage, a general event is bound to a physical counter, if any.
+  The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now
+  stopped, and the (software) event count does not need updating.
+** *start* is then called, and the counter is enabled.
+   With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check
+   previous section for detail).
+   Nothing is written if the flag does not contain PERF_EF_RELOAD.
+   The state now is reset to none, because it is neither stopped nor updated
+   (the counting already started)
+* When being context-switched out, *del* is called.  It then checks out all the
+  events in the PMU and calls *stop* to update their counts.
+** *stop* is called by *del*
+   and the perf core with flag PERF_EF_UPDATE, and it often shares the same
+   subroutine as *read* with the same logic.
+   The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
+
+** Life cycle of these two pairs: *add* and *del* are called repeatedly as
+  tasks switch in-and-out; *start* and *stop* is also called when the perf core
+  needs a quick stop-and-start, for instance, when the interrupt period is being
+  adjusted.
+
+Current implementation is sufficient for now and can be easily extended to
+features in the future.
+
+A. Related Structures
+---------------------
+
+* struct pmu: include/linux/perf_event.h
+* struct riscv_pmu: arch/riscv/include/asm/perf_event.h
+
+  Both structures are designed to be read-only.
+
+  *struct pmu* defines some function pointer interfaces, and most of them take
+*struct perf_event* as a main argument, dealing with perf events according to
+perf's internal state machine (check kernel/events/core.c for details).
+
+  *struct riscv_pmu* defines PMU-specific parameters.  The naming follows the
+convention of all other architectures.
+
+* struct perf_event: include/linux/perf_event.h
+* struct hw_perf_event
+
+  The generic structure that represents perf events, and the hardware-related
+details.
+
+* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h
+
+  The structure that holds the status of events, has two fixed members:
+the number of events and the array of the events.
+
+References
+----------
+
+[1] https://github.com/riscv/riscv-linux/pull/124
+[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA
-- 
2.17.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/2] perf: riscv: preliminary RISC-V support
  2018-04-18  2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao
@ 2018-04-19 19:46   ` Atish Patra
  2018-04-19 23:24     ` Alan Kao
  0 siblings, 1 reply; 5+ messages in thread
From: Atish Patra @ 2018-04-19 19:46 UTC (permalink / raw)
  To: Alan Kao, Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv,
	linux-doc, linux-kernel
  Cc: Nick Hu, Greentime Hu

On 4/17/18 7:13 PM, Alan Kao wrote:
> This patch provide a basic PMU, riscv_base_pmu, which supports two
> general hardware event, instructions and cycles.  Furthermore, this
> PMU serves as a reference implementation to ease the portings in
> the future.
> 
> riscv_base_pmu should be able to run on any RISC-V machine that
> conforms to the Priv-Spec.  Note that the latest qemu model hasn't
> fully support a proper behavior of Priv-Spec 1.10 yet, but work
> around should be easy with very small fixes.  Please check
> https://github.com/riscv/riscv-qemu/pull/115 for future updates.
> 
> Cc: Nick Hu <nickhu@andestech.com>
> Cc: Greentime Hu <greentime@andestech.com>
> Signed-off-by: Alan Kao <alankao@andestech.com>
> ---
>   arch/riscv/Kconfig                  |  13 +
>   arch/riscv/include/asm/perf_event.h |  79 ++++-
>   arch/riscv/kernel/Makefile          |   1 +
>   arch/riscv/kernel/perf_event.c      | 482 ++++++++++++++++++++++++++++
>   4 files changed, 571 insertions(+), 4 deletions(-)
>   create mode 100644 arch/riscv/kernel/perf_event.c
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index c22ebe08e902..90d9c8e50377 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -203,6 +203,19 @@ config RISCV_ISA_C
>   config RISCV_ISA_A
>   	def_bool y
>   
> +menu "supported PMU type"
> +	depends on PERF_EVENTS
> +
> +config RISCV_BASE_PMU
> +	bool "Base Performance Monitoring Unit"
> +	def_bool y
> +	help
> +	  A base PMU that serves as a reference implementation and has limited
> +	  feature of perf.  It can run on any RISC-V machines so serves as the
> +	  fallback, but this option can also be disable to reduce kernel size.
> +
> +endmenu
> +
>   endmenu
>   
>   menu "Kernel type"
> diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
> index e13d2ff29e83..0e638a0c3feb 100644
> --- a/arch/riscv/include/asm/perf_event.h
> +++ b/arch/riscv/include/asm/perf_event.h
> @@ -1,13 +1,84 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
>   /*
>    * Copyright (C) 2018 SiFive
> + * Copyright (C) 2018 Andes Technology Corporation
>    *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public Licence
> - * as published by the Free Software Foundation; either version
> - * 2 of the Licence, or (at your option) any later version.
>    */
>   
>   #ifndef _ASM_RISCV_PERF_EVENT_H
>   #define _ASM_RISCV_PERF_EVENT_H
>   
> +#include <linux/perf_event.h>
> +#include <linux/ptrace.h>
> +
> +#define RISCV_BASE_COUNTERS	2
> +
> +/*
> + * The RISCV_MAX_COUNTERS parameter should be specified.
> + */
> +
> +#ifdef CONFIG_RISCV_BASE_PMU
> +#define RISCV_MAX_COUNTERS	2
> +#endif
> +
> +#ifndef RISCV_MAX_COUNTERS
> +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
> +#endif
> +
> +/*
> + * These are the indexes of bits in counteren register *minus* 1,
> + * except for cycle.  It would be coherent if it can directly mapped
> + * to counteren bit definition, but there is a *time* register at
> + * counteren[1].  Per-cpu structure is scarce resource here.
> + *
> + * According to the spec, an implementation can support counter up to
> + * mhpmcounter31, but many high-end processors has at most 6 general
> + * PMCs, we give the definition to MHPMCOUNTER8 here.
> + */
> +#define RISCV_PMU_CYCLE		0
> +#define RISCV_PMU_INSTRET	1
> +#define RISCV_PMU_MHPMCOUNTER3	2
> +#define RISCV_PMU_MHPMCOUNTER4	3
> +#define RISCV_PMU_MHPMCOUNTER5	4
> +#define RISCV_PMU_MHPMCOUNTER6	5
> +#define RISCV_PMU_MHPMCOUNTER7	6
> +#define RISCV_PMU_MHPMCOUNTER8	7
> +
> +#define RISCV_OP_UNSUPP		(-EOPNOTSUPP)
> +
> +struct cpu_hw_events {
> +	/* # currently enabled events*/
> +	int			n_events;
> +	/* currently enabled events */
> +	struct perf_event	*events[RISCV_MAX_COUNTERS];
> +	/* vendor-defined PMU data */
> +	void			*platform;
> +};
> +
> +struct riscv_pmu {
> +	struct pmu	*pmu;
> +
> +	/* generic hw/cache events table */
> +	const int	*hw_events;
> +	const int	(*cache_events)[PERF_COUNT_HW_CACHE_MAX]
> +				       [PERF_COUNT_HW_CACHE_OP_MAX]
> +				       [PERF_COUNT_HW_CACHE_RESULT_MAX];
> +	/* method used to map hw/cache events */
> +	int		(*map_hw_event)(u64 config);
> +	int		(*map_cache_event)(u64 config);
> +
> +	/* max generic hw events in map */
> +	int		max_events;
> +	/* number total counters, 2(base) + x(general) */
> +	int		num_counters;
> +	/* the width of the counter */
> +	int		counter_width;
> +
> +	/* vendor-defined PMU features */
> +	void		*platform;
> +
> +	irqreturn_t	(*handle_irq)(int irq_num, void *dev);
> +	int		irq;
> +};
> +
>   #endif /* _ASM_RISCV_PERF_EVENT_H */
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index ffa439d4a364..f50d19816757 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>   obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o
>   obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>   obj-$(CONFIG_FUNCTION_GRAPH_TRACER)	+= ftrace.o
> +obj-$(CONFIG_PERF_EVENTS)      += perf_event.o
>   
>   clean:
> diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
> new file mode 100644
> index 000000000000..ba3192afc470
> --- /dev/null
> +++ b/arch/riscv/kernel/perf_event.c
> @@ -0,0 +1,482 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
> + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
> + * Copyright (C) 2009 Jaswinder Singh Rajput
> + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
> + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
> + * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@intel.com>
> + * Copyright (C) 2009 Google, Inc., Stephane Eranian
> + * Copyright 2014 Tilera Corporation. All Rights Reserved.
> + * Copyright (C) 2018 Andes Technology Corporation
> + *
> + * Perf_events support for RISC-V platforms.
> + *
> + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
> + * functionality for perf event to fully work, this file provides
> + * the very basic framework only.
> + *
> + * For platform portings, please check Documentations/riscv/pmu.txt.
> + *
> + * The Copyright line includes x86 and tile ones.
> + */
> +
> +#include <linux/kprobes.h>
> +#include <linux/kernel.h>
> +#include <linux/kdebug.h>
> +#include <linux/mutex.h>
> +#include <linux/bitmap.h>
> +#include <linux/irq.h>
> +#include <linux/interrupt.h>
> +#include <linux/perf_event.h>
> +#include <linux/atomic.h>
> +#include <linux/of.h>
> +#include <asm/perf_event.h>
> +
> +static const struct riscv_pmu *riscv_pmu __read_mostly;
> +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
> +
> +/*
> + * Hardware & cache maps and their methods
> + */
> +
> +static const int riscv_hw_event_map[] = {
> +	[PERF_COUNT_HW_CPU_CYCLES]		= RISCV_PMU_CYCLE,
> +	[PERF_COUNT_HW_INSTRUCTIONS]		= RISCV_PMU_INSTRET,
> +	[PERF_COUNT_HW_CACHE_REFERENCES]	= RISCV_OP_UNSUPP,
> +	[PERF_COUNT_HW_CACHE_MISSES]		= RISCV_OP_UNSUPP,
> +	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= RISCV_OP_UNSUPP,
> +	[PERF_COUNT_HW_BRANCH_MISSES]		= RISCV_OP_UNSUPP,
> +	[PERF_COUNT_HW_BUS_CYCLES]		= RISCV_OP_UNSUPP,
> +};
> +
> +#define C(x) PERF_COUNT_HW_CACHE_##x
> +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
> +[PERF_COUNT_HW_CACHE_OP_MAX]
> +[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> +	[C(L1D)] = {
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +	},
> +	[C(L1I)] = {
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +	},
> +	[C(LL)] = {
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +	},
> +	[C(DTLB)] = {
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)] =  RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] =  RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +	},
> +	[C(ITLB)] = {
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +	},
> +	[C(BPU)] = {
> +		[C(OP_READ)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_WRITE)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +		[C(OP_PREFETCH)] = {
> +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> +		},
> +	},
> +};
> +
> +static int riscv_map_hw_event(u64 config)
> +{
> +	if (config >= riscv_pmu->max_events)
> +		return -EINVAL;
> +
> +	return riscv_pmu->hw_events[config];
> +}
> +
> +int riscv_map_cache_decode(u64 config, unsigned int *type,
> +			   unsigned int *op, unsigned int *result)
> +{
> +	return -ENOENT;
> +}
> +
> +static int riscv_map_cache_event(u64 config)
> +{
> +	unsigned int type, op, result;
> +	int err = -ENOENT;
> +		int code;
> +
> +	err = riscv_map_cache_decode(config, &type, &op, &result);
> +	if (!riscv_pmu->cache_events || err)
> +		return err;
> +
> +	if (type >= PERF_COUNT_HW_CACHE_MAX ||
> +	    op >= PERF_COUNT_HW_CACHE_OP_MAX ||
> +	    result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
> +		return -EINVAL;
> +
> +	code = (*riscv_pmu->cache_events)[type][op][result];
> +	if (code == RISCV_OP_UNSUPP)
> +		return -EINVAL;
> +
> +	return code;
> +}
> +
> +/*
> + * Low-level functions: reading/writing counters
> + */
> +
> +static inline u64 read_counter(int idx)
> +{
> +	u64 val = 0;
> +
> +	switch (idx) {
> +	case RISCV_PMU_CYCLE:
> +		val = csr_read(cycle);
> +		break;
> +	case RISCV_PMU_INSTRET:
> +		val = csr_read(instret);
> +		break;
> +	default:
> +		WARN_ON_ONCE(idx < 0 ||	idx > RISCV_MAX_COUNTERS);
> +		return -EINVAL;
> +	}
> +
> +	return val;
> +}
> +
> +static inline void write_counter(int idx, u64 value)
> +{
> +	/* currently not supported */
> +	WARN_ON_ONCE(1);
> +}
> +
> +/*
> + * pmu->read: read and update the counter
> + *
> + * Other architectures' implementation often have a xxx_perf_event_update
> + * routine, which can return counter values when called in the IRQ, but
> + * return void when being called by the pmu->read method.
> + */
> +static void riscv_pmu_read(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	u64 prev_raw_count, new_raw_count;
> +	u64 oldval;
> +	int idx = hwc->idx;
> +	u64 delta;
> +
> +	do {
> +		prev_raw_count = local64_read(&hwc->prev_count);
> +		new_raw_count = read_counter(idx);
> +
> +		oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> +					 new_raw_count);
> +	} while (oldval != prev_raw_count);
> +
> +	/*
> +	 * delta is the value to update the counter we maintain in the kernel.
> +	 */
> +	delta = (new_raw_count - prev_raw_count) &
> +		((1ULL << riscv_pmu->counter_width) - 1);
> +	local64_add(delta, &event->count);
> +	/*
> +	 * Something like local64_sub(delta, &hwc->period_left) here is
> +	 * needed if there is an interrupt for perf.
> +	 */
> +}
> +
> +/*
> + * State transition functions:
> + *
> + * stop()/start() & add()/del()
> + */
> +
> +/*
> + * pmu->stop: stop the counter
> + */
> +static void riscv_pmu_stop(struct perf_event *event, int flags)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
> +	hwc->state |= PERF_HES_STOPPED;
> +
> +	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> +		riscv_pmu->pmu->read(event);
> +		hwc->state |= PERF_HES_UPTODATE;
> +	}
> +}
> +
> +/*
> + * pmu->start: start the event.
> + */
> +static void riscv_pmu_start(struct perf_event *event, int flags)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> +		return;
> +
> +	if (flags & PERF_EF_RELOAD) {
> +		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
> +
> +		/*
> +		 * Set the counter to the period to the next interrupt here,
> +		 * if you have any.
> +		 */
> +	}
> +
> +	hwc->state = 0;
> +	perf_event_update_userpage(event);
> +
> +	/*
> +	 * Since we cannot write to counters, this serves as an initialization
> +	 * to the delta-mechanism in pmu->read(); otherwise, the delta would be
> +	 * wrong when pmu->read is called for the first time.
> +	 */
> +	local64_set(&hwc->prev_count, read_counter(hwc->idx));
> +}
> +
> +/*
> + * pmu->add: add the event to PMU.
> + */
> +static int riscv_pmu_add(struct perf_event *event, int flags)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (cpuc->n_events == riscv_pmu->num_counters)
> +		return -ENOSPC;
> +
> +	/*
> +	 * We don't have general conunters, so no binding-event-to-counter
> +	 * process here.
> +	 *
> +	 * Indexing using hwc->config generally not works, since config may
> +	 * contain extra information, but here the only info we have in
> +	 * hwc->config is the event index.
> +	 */
> +	hwc->idx = hwc->config;
> +	cpuc->events[hwc->idx] = event;
> +	cpuc->n_events++;
> +
> +	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> +
> +	if (flags & PERF_EF_START)
> +		riscv_pmu->pmu->start(event, PERF_EF_RELOAD);
> +
> +	return 0;
> +}
> +
> +/*
> + * pmu->del: delete the event from PMU.
> + */
> +static void riscv_pmu_del(struct perf_event *event, int flags)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	cpuc->events[hwc->idx] = NULL;
> +	cpuc->n_events--;
> +	riscv_pmu->pmu->stop(event, PERF_EF_UPDATE);
> +	perf_event_update_userpage(event);
> +}
> +
> +/*
> + * Interrupt: a skeletion for reference.
> + */
> +
> +static DEFINE_MUTEX(pmc_reserve_mutex);
> +
> +irqreturn_t riscv_base_pmu_handle_irq(int irq_num, void *dev)
> +{
> +	return IRQ_NONE;
> +}
> +
> +static int reserve_pmc_hardware(void)
> +{
> +	int err = 0;
> +
> +	mutex_lock(&pmc_reserve_mutex);
> +	if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) {
> +		err = request_irq(riscv_pmu->irq, riscv_pmu->handle_irq,
> +				  IRQF_PERCPU, "riscv-base-perf", NULL);
> +	}
> +	mutex_unlock(&pmc_reserve_mutex);
> +
> +	return err;
> +}
> +
> +void release_pmc_hardware(void)
> +{
> +	mutex_lock(&pmc_reserve_mutex);
> +	if (riscv_pmu->irq >=0) {
> +		free_irq(riscv_pmu->irq, NULL);
> +	}
> +	mutex_unlock(&pmc_reserve_mutex);
> +}
> +
> +/*
> + * Event Initialization/Finalization
> + */
> +
> +static atomic_t riscv_active_events = ATOMIC_INIT(0);
> +
> +static void riscv_event_destroy(struct perf_event *event)
> +{
> +	if (atomic_dec_return(&riscv_active_events) == 0)
> +		release_pmc_hardware();
> +}
> +
> +static int riscv_event_init(struct perf_event *event)
> +{
> +	struct perf_event_attr *attr = &event->attr;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int err;
> +	int code;
> +
> +	if (atomic_inc_return(&riscv_active_events) == 1) {
> +		err = reserve_pmc_hardware();
> +
> +		if (err) {
> +			pr_warn("PMC hardware not available\n");
> +			atomic_dec(&riscv_active_events);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	switch (event->attr.type) {
> +	case PERF_TYPE_HARDWARE:
> +		code = riscv_pmu->map_hw_event(attr->config);
> +		break;
> +	case PERF_TYPE_HW_CACHE:
> +		code = riscv_pmu->map_cache_event(attr->config);
> +		break;
> +	case PERF_TYPE_RAW:
> +		return -EOPNOTSUPP;
> +	default:
> +		return -ENOENT;
> +	}
> +
> +	event->destroy = riscv_event_destroy;
> +	if (code < 0) {
> +		event->destroy(event);
> +		return code;
> +	}
> +
> +	/*
> +	 * idx is set to -1 because the index of a general event should not be
> +	 * decided until binding to some counter in pmu->add().
> +	 *
> +	 * But since we don't have such support, later in pmu->add(), we just
> +	 * use hwc->config as the index instead.
> +	 */
> +	hwc->config = code;
> +	hwc->idx = -1;
> +
> +	return 0;
> +}
> +
> +/*
> + * Initialization
> + */
> +
> +static struct pmu min_pmu = {
> +	.name		= "riscv-base",
> +	.event_init	= riscv_event_init,
> +	.add		= riscv_pmu_add,
> +	.del		= riscv_pmu_del,
> +	.start		= riscv_pmu_start,
> +	.stop		= riscv_pmu_stop,
> +	.read		= riscv_pmu_read,
> +};
> +
> +static const struct riscv_pmu riscv_base_pmu = {
> +	.pmu = &min_pmu,
> +	.max_events = ARRAY_SIZE(riscv_hw_event_map),
> +	.map_hw_event = riscv_map_hw_event,
> +	.hw_events = riscv_hw_event_map,
> +	.map_cache_event = riscv_map_cache_event,
> +	.cache_events = &riscv_cache_event_map,
> +	.counter_width = 63,
> +	.num_counters = RISCV_BASE_COUNTERS + 0,
> +	.handle_irq = &riscv_base_pmu_handle_irq,
> +
> +	/* This means this PMU has no IRQ. */
> +	.irq = -1,
> +};
> +
> +static const struct of_device_id riscv_pmu_of_ids[] = {
> +	{.compatible = "riscv,base-pmu",	.data = &riscv_base_pmu},
> +	{ /* sentinel value */ }
> +};
> +
> +int __init init_hw_perf_events(void)
> +{
> +	struct device_node *node = of_find_node_by_type(NULL, "pmu");
> +	const struct of_device_id *of_id;
> +	
> +	if (node && (of_id = of_match_node(riscv_pmu_of_ids, node)))
> +		riscv_pmu = of_id->data;
> +	else
> +		riscv_pmu = &riscv_base_pmu;
> +
> +	perf_pmu_register(riscv_pmu->pmu, "cpu", PERF_TYPE_RAW);
> +	return 0;
> +}
> +arch_initcall(init_hw_perf_events);
> 
Some check patch errors.

ERROR: spaces required around that '>=' (ctx:WxV)
#517: FILE: arch/riscv/kernel/perf_event.c:356:
+	if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) {
  	                   ^

ERROR: spaces required around that '>=' (ctx:WxV)
#529: FILE: arch/riscv/kernel/perf_event.c:368:
+	if (riscv_pmu->irq >=0) {
  	                   ^

WARNING: braces {} are not necessary for single statement blocks
#529: FILE: arch/riscv/kernel/perf_event.c:368:
+	if (riscv_pmu->irq >=0) {
+		free_irq(riscv_pmu->irq, NULL);
+	}

WARNING: DT compatible string "riscv,base-pmu" appears un-documented -- 
check ./Documentation/devicetree/bindings/
#626: FILE: arch/riscv/kernel/perf_event.c:465:
+	{.compatible = "riscv,base-pmu",	.data = &riscv_base_pmu},

ERROR: trailing whitespace
#634: FILE: arch/riscv/kernel/perf_event.c:473:
+^I$

ERROR: do not use assignment in if condition
#635: FILE: arch/riscv/kernel/perf_event.c:474:
+	if (node && (of_id = of_match_node(riscv_pmu_of_ids, node)))

total: 4 errors, 3 warnings, 595 lines checked


Regards,
Atish

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/2] perf: riscv: preliminary RISC-V support
  2018-04-19 19:46   ` Atish Patra
@ 2018-04-19 23:24     ` Alan Kao
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Kao @ 2018-04-19 23:24 UTC (permalink / raw)
  To: Atish Patra
  Cc: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv,
	linux-doc, linux-kernel, Nick Hu, Greentime Hu

On Thu, Apr 19, 2018 at 12:46:24PM -0700, Atish Patra wrote:
> On 4/17/18 7:13 PM, Alan Kao wrote:
> >This patch provide a basic PMU, riscv_base_pmu, which supports two
> >general hardware event, instructions and cycles.  Furthermore, this
> >PMU serves as a reference implementation to ease the portings in
> >the future.
> >
> >riscv_base_pmu should be able to run on any RISC-V machine that
> >conforms to the Priv-Spec.  Note that the latest qemu model hasn't
> >fully support a proper behavior of Priv-Spec 1.10 yet, but work
> >around should be easy with very small fixes.  Please check
> >https://github.com/riscv/riscv-qemu/pull/115 for future updates.
> >
> >Cc: Nick Hu <nickhu@andestech.com>
> >Cc: Greentime Hu <greentime@andestech.com>
> >Signed-off-by: Alan Kao <alankao@andestech.com>
> >---
> >  arch/riscv/Kconfig                  |  13 +
> >  arch/riscv/include/asm/perf_event.h |  79 ++++-
> >  arch/riscv/kernel/Makefile          |   1 +
> >  arch/riscv/kernel/perf_event.c      | 482 ++++++++++++++++++++++++++++
> >  4 files changed, 571 insertions(+), 4 deletions(-)
> >  create mode 100644 arch/riscv/kernel/perf_event.c
> >
> >diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >index c22ebe08e902..90d9c8e50377 100644
> >--- a/arch/riscv/Kconfig
> >+++ b/arch/riscv/Kconfig
> Some check patch errors.
> 
> ERROR: spaces required around that '>=' (ctx:WxV)
> #517: FILE: arch/riscv/kernel/perf_event.c:356:
> +	if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) {
>  	                   ^
> 
> ERROR: spaces required around that '>=' (ctx:WxV)
> #529: FILE: arch/riscv/kernel/perf_event.c:368:
> +	if (riscv_pmu->irq >=0) {
>  	                   ^
> 
> WARNING: braces {} are not necessary for single statement blocks
> #529: FILE: arch/riscv/kernel/perf_event.c:368:
> +	if (riscv_pmu->irq >=0) {
> +		free_irq(riscv_pmu->irq, NULL);
> +	}
> 
> WARNING: DT compatible string "riscv,base-pmu" appears un-documented --
> check ./Documentation/devicetree/bindings/
> #626: FILE: arch/riscv/kernel/perf_event.c:465:
> +	{.compatible = "riscv,base-pmu",	.data = &riscv_base_pmu},
> 
> ERROR: trailing whitespace
> #634: FILE: arch/riscv/kernel/perf_event.c:473:
> +^I$
> 
> ERROR: do not use assignment in if condition
> #635: FILE: arch/riscv/kernel/perf_event.c:474:
> +	if (node && (of_id = of_match_node(riscv_pmu_of_ids, node)))
> 
> total: 4 errors, 3 warnings, 595 lines checked
> 
> 
> Regards,
> Atish

Thanks for pointing this out.  I happened to develop this patchset on a machine
without the post-commit settings.  A new version is ready.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-04-19 23:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-18  2:12 [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V Alan Kao
2018-04-18  2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao
2018-04-19 19:46   ` Atish Patra
2018-04-19 23:24     ` Alan Kao
2018-04-18  2:12 ` [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide Alan Kao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).