From: Alan Kao <alankao@andestech.com>
To: Palmer Dabbelt <palmer@sifive.com>, Albert Ou <albert@sifive.com>,
"Peter Zijlstra" <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
"Arnaldo Carvalho de Melo" <acme@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@redhat.com>,
"Namhyung Kim" <namhyung@kernel.org>,
Alex Solomatnikov <sols@sifive.com>,
Atish Patra <atish.patra@wdc.com>,
Jonathan Corbet <corbet@lwn.net>,
<linux-riscv@lists.infradead.org>, <linux-doc@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Cc: Alan Kao <alankao@andestech.com>, Nick Hu <nickhu@andestech.com>,
Greentime Hu <greentime@andestech.com>
Subject: [PATCH v5 2/2] perf: riscv: Add Document for Future Porting Guide
Date: Fri, 20 Apr 2018 07:27:50 +0800
Message-ID: <1524180470-8622-3-git-send-email-alankao@andestech.com>
In-Reply-To: <1524180470-8622-1-git-send-email-alankao@andestech.com>
Reviewed-by: Alex Solomatnikov <sols@sifive.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Signed-off-by: Alan Kao <alankao@andestech.com>
---
Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++
1 file changed, 249 insertions(+)
create mode 100644 Documentation/riscv/pmu.txt
diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
new file mode 100644
index 000000000000..b29f03a6d82f
--- /dev/null
+++ b/Documentation/riscv/pmu.txt
@@ -0,0 +1,249 @@
+Supporting PMUs on RISC-V platforms
+==========================================
+Alan Kao <alankao@andestech.com>, Mar 2018
+
+Introduction
+------------
+
+As of this writing, the perf_event-related features mentioned in The RISC-V
+ISA Privileged Version 1.10 are as follows (please check the manual for more
+details):
+
+* [m|s]counteren
+* mcycle[h], cycle[h]
+* minstret[h], instret[h]
+* mhpeventx, mhpcounterx[h]
+
+With only this feature set, porting perf would require a lot of work, due to
+the lack of the following general architectural performance monitoring features:
+
+* Enabling/Disabling counters
+ Counters are just free-running all the time in our case.
+* Interrupt caused by counter overflow
+ No such feature in the spec.
+* Interrupt indicator
+ It is not possible to have many interrupt ports for all counters, so an
+ interrupt indicator is required for software to tell which counter has
+ just overflowed.
+* Writing to counters
+ There will be an SBI call to support this, since the kernel cannot modify the
+ counters [1]. Alternatively, some vendors are considering implementing a
+ hardware extension for M-S-U model machines to write counters directly.
+
+This document aims to provide developers with a quick guide to supporting their
+PMUs in the kernel. The following sections briefly explain perf's mechanisms
+and the remaining to-do items.
+
+You may check previous discussions here [1][2]. Also, it might be helpful
+to check the appendix for related kernel structures.
+
+
+1. Initialization
+-----------------
+
+*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
+various methods that follow perf's internal conventions, together with
+PMU-specific parameters. Developers should declare an instance of this
+structure to represent their PMU. By default, *riscv_pmu* points to a constant
+structure, *riscv_base_pmu*, which provides very basic support for a baseline
+QEMU model.
+
+A developer can then either assign the address of the new instance to
+*riscv_pmu*, so that the minimal, already-implemented logic can be reused, or
+write a custom *riscv_init_platform_pmu* implementation.
+
+In other words, the existing sources of *riscv_base_pmu* merely provide a
+reference implementation. Developers are free to decide how much of it to
+reuse; in the most extreme case, they can customize every function according
+to their needs.
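
The pointer-swapping pattern described above can be sketched in plain C. This
is a simplified user-space illustration, not the kernel code: struct demo_pmu,
demo_base_pmu, demo_vendor_pmu, demo_active_pmu and demo_register_pmu are
hypothetical stand-ins, and the fields shown are assumptions rather than the
real members of struct riscv_pmu.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct riscv_pmu. */
struct demo_pmu {
	const char *name;
	int num_counters;               /* PMU-specific parameter (assumed) */
	int (*map_hw_event)(int event); /* method in perf's style (assumed) */
};

/* Baseline mapping: only events 0 and 1 are known. */
int demo_base_map_hw_event(int event)
{
	return (event == 0 || event == 1) ? event : -1;
}

/* Vendor mapping: additionally supports a hypothetical event 2. */
int demo_vendor_map_hw_event(int event)
{
	return (event >= 0 && event <= 2) ? event : -1;
}

const struct demo_pmu demo_base_pmu = {
	.name = "demo-base",
	.num_counters = 2,
	.map_hw_event = demo_base_map_hw_event,
};

const struct demo_pmu demo_vendor_pmu = {
	.name = "demo-vendor",
	.num_counters = 4,
	.map_hw_event = demo_vendor_map_hw_event,
};

/* Global pointer; it refers to the baseline instance by default. */
const struct demo_pmu *demo_active_pmu = &demo_base_pmu;

/* Swap in a platform instance while the common logic stays untouched. */
void demo_register_pmu(const struct demo_pmu *pmu)
{
	assert(pmu != NULL && pmu->map_hw_event != NULL);
	demo_active_pmu = pmu;
}
```

Code that only ever goes through demo_active_pmu does not need to change when
a platform registers its own instance, which is the reuse the text refers to.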
+
+
+2. Event Initialization
+-----------------------
+
+When a user launches a perf command to monitor some events, the userspace perf
+tool first translates it into multiple *perf_event_open* system calls. Each of
+these calls then invokes the *event_init* member function that was assigned in
+the previous step. In *riscv_base_pmu*'s case, it is *riscv_event_init*.
+
+The main purpose of this function is to translate the event provided by the
+user into a bitmap, so that HW-related control registers or counters can be
+manipulated directly. The translation is based on the mappings and methods
+provided in *riscv_pmu*.
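
As a rough illustration of such a translation, the sketch below maps a generic
event ID to a hardware event code through a table and then ORs in filter bits.
Everything here is an assumption made for the example: the DEMO_* names, the
event codes and the bit positions of the privilege filters are invented, and
the real *riscv_event_init* may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Generic event IDs as the user would request them (illustrative only). */
enum demo_generic_event {
	DEMO_HW_CPU_CYCLES,
	DEMO_HW_INSTRUCTIONS,
	DEMO_HW_CACHE_MISSES,
	DEMO_HW_MAX
};

#define DEMO_OP_UNSUPP      (-1)        /* event has no hardware mapping */
#define DEMO_RAW_CYCLE      0x1         /* hypothetical hardware codes */
#define DEMO_RAW_INSTRET    0x2
#define DEMO_EXCLUDE_USER   (1u << 30)  /* hypothetical filter bits */
#define DEMO_EXCLUDE_KERNEL (1u << 31)

/* Mapping table from generic event ID to hardware event code. */
static const int demo_hw_event_map[DEMO_HW_MAX] = {
	[DEMO_HW_CPU_CYCLES]   = DEMO_RAW_CYCLE,
	[DEMO_HW_INSTRUCTIONS] = DEMO_RAW_INSTRET,
	[DEMO_HW_CACHE_MISSES] = DEMO_OP_UNSUPP,
};

/* Returns 0 and fills *config on success, -1 if the event is unsupported. */
int demo_event_init(int generic_id, int exclude_user, int exclude_kernel,
		    unsigned int *config)
{
	int code;

	assert(config != NULL);
	if (generic_id < 0 || generic_id >= DEMO_HW_MAX)
		return -1;

	code = demo_hw_event_map[generic_id];
	if (code == DEMO_OP_UNSUPP)
		return -1;

	/* Build the bitmap: event code plus optional privilege filters. */
	*config = (unsigned int)code;
	if (exclude_user)
		*config |= DEMO_EXCLUDE_USER;
	if (exclude_kernel)
		*config |= DEMO_EXCLUDE_KERNEL;
	return 0;
}
```

Keeping the mapping in a table means a new platform only has to supply its own
table, while the surrounding translation logic can stay shared.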
+
+Note that several other tasks can be handled at this stage as well:
+
+(1) interrupt setting, which is described in the next section;
+(2) privilege level setting (user space only, kernel space only, both);
+(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*;
+(4) tweaks for non-sampling events, which will be utilized by functions such as
+*perf_adjust_period*, usually something like the following:
+
+if (!is_sampling_event(event)) {
+	hwc->sample_period = x86_pmu.max_period;
+	hwc->last_period = hwc->sample_period;
+	local64_set(&hwc->period_left, hwc->sample_period);
+}
+
+In the case of *riscv_base_pmu*, only (3) is provided for now.
+
+
+3. Interrupt
+------------
+
+3.1. Interrupt Initialization
+
+This often occurs at the beginning of the *event_init* method. In common
+practice, this is a code segment like the following:
+
+int x86_reserve_hardware(void)
+{
+	int err = 0;
+
+	if (!atomic_inc_not_zero(&pmc_refcount)) {
+		mutex_lock(&pmc_reserve_mutex);
+		if (atomic_read(&pmc_refcount) == 0) {
+			if (!reserve_pmc_hardware())
+				err = -EBUSY;
+			else
+				reserve_ds_buffers();
+		}
+		if (!err)
+			atomic_inc(&pmc_refcount);
+		mutex_unlock(&pmc_reserve_mutex);
+	}
+
+	return err;
+}
+
+The key step is *reserve_pmc_hardware*, which usually uses atomic operations
+to make the implemented IRQ handler accessible through a global function
+pointer. *release_pmc_hardware* serves the opposite purpose, and it is used in
+the event destructors mentioned in the previous section.
+
+(Note: In the implementations of all the architectures, the *reserve/release*
+pair always deals with IRQ settings, so the *pmc_hardware* part of the name is
+somewhat misleading. It does NOT deal with the binding between an event and a
+physical counter, which will be introduced in the next section.)
+
+3.2. IRQ Structure
+
+Basically, an IRQ handler runs the following pseudocode:
+
+for each hardware counter that triggered this overflow
+
+    get the event of this counter
+
+    // following two steps are defined as *read()*,
+    // check the section Reading/Writing Counters for details.
+    count the delta value since previous interrupt
+    update the event->count (# event occurs) by adding delta, and
+               event->hw.period_left by subtracting delta
+
+    if the event overflows
+        sample data
+        set the counter appropriately for the next overflow
+
+        if the event overflows again too frequently
+            throttle this event
+        fi
+    fi
+
+end for
+
+However, as of this writing, no RISC-V implementation has designed an
+interrupt for perf, so the details are to be completed in the future.
+
+4. Reading/Writing Counters
+---------------------------
+
+Reading and writing seem symmetric, but perf treats them quite differently. For
+reading, there is a *read* interface in *struct pmu*, but it serves more than
+just reading. Depending on the context, the *read* function not only reads the
+content of the counter (event->count), but also updates the period left until
+the next interrupt (event->hw.period_left).
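
A minimal sketch of this double duty follows, assuming a free-running counter
that may be narrower than 64 bits and may wrap between two reads. The names
struct demo_event and demo_read are hypothetical, and a real implementation
would additionally have to guard against concurrent updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event bookkeeping, loosely modelled on the text above. */
struct demo_event {
	uint64_t prev_count;   /* raw counter value seen at the last read */
	uint64_t count;        /* accumulated number of events */
	int64_t  period_left;  /* events left until the next interrupt */
};

/*
 * Fold the events counted since the last read into the software state.
 * `mask` has one bit set for every implemented counter bit, so the
 * subtraction stays correct when the counter has wrapped once.
 */
uint64_t demo_read(struct demo_event *ev, uint64_t raw_now, uint64_t mask)
{
	uint64_t delta;

	assert(ev != NULL);
	delta = (raw_now - ev->prev_count) & mask;

	ev->prev_count = raw_now & mask;
	ev->count += delta;                 /* the "reading" part */
	ev->period_left -= (int64_t)delta;  /* the "more than reading" part */
	return delta;
}
```

Because the delta is taken modulo the counter width, a single wrap between two
reads is handled; more than one wrap would go unnoticed without an overflow
interrupt.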
+
+The core of perf, however, does not need to write to counters directly. Writing
+counters is hidden behind two abstractions: 1) *pmu->start*, which literally
+starts counting, so it has to set the counter to a suitable value for the next
+interrupt; and 2) the IRQ handler, which should set the counter to the same
+reasonable value.
+
+Reading is not a problem in RISC-V, but writing would need some effort, since
+S-mode is not allowed to write to the counters.
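
What counts as a suitable value is not spelled out here. One common
convention, assumed for this sketch rather than taken from the text, is to
program the counter with the negated remaining period, so that it wraps after
exactly that many events. demo_counter_mask and demo_next_counter_value are
hypothetical helpers; how the value actually reaches the hardware (SBI call or
vendor extension) is outside the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with one bit set for every bit of a `width`-bit counter. */
uint64_t demo_counter_mask(unsigned int width)
{
	assert(width >= 1 && width <= 64);
	return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/*
 * Value to program so that the counter overflows after `left` more events.
 * `left` must be non-zero and must fit in the counter.
 */
uint64_t demo_next_counter_value(uint64_t left, unsigned int width)
{
	uint64_t mask = demo_counter_mask(width);

	assert(left >= 1 && left <= mask);
	return (UINT64_C(0) - left) & mask;
}
```

For a 32-bit counter and 100 events left, this yields 0xFFFFFF9C, which is 100
increments away from wrapping to zero.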
+
+
+5. add()/del()/start()/stop()
+-----------------------------
+
+Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
+starts/stops the counter of some event in the PMU. All of them take the same
+arguments: *struct perf_event *event* and *int flags*.
+
+If you consider perf as a state machine, these functions serve as the
+transitions between its states.
+Three states (event->hw.state) are defined:
+
+* PERF_HES_STOPPED: the counter is stopped
+* PERF_HES_UPTODATE: the event->count is up-to-date
+* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now
+
+A normal flow of these state transitions is as follows:
+
+* A user launches a perf event, resulting in a call to *event_init*.
+* When being context-switched in, *add* is called by the perf core, with a flag
+ PERF_EF_START, which means that the event should be started after it is added.
+ At this stage, a general event is bound to a physical counter, if any.
+ The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now
+ stopped, and the (software) event count does not need updating.
+** *start* is then called, and the counter is enabled.
+ With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check
+ the previous section for details).
+ Nothing is written if the flag does not contain PERF_EF_RELOAD.
+ The state is now reset to none, because it is neither stopped nor up-to-date
+ (the counting has already started).
+* When being context-switched out, *del* is called. It then goes through all the
+ events in the PMU and calls *stop* to update their counts.
+** *stop* is called by *del*
+ and the perf core with flag PERF_EF_UPDATE, and it often shares the same
+ subroutine as *read* with the same logic.
+ The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
+
+** Life cycle of these two pairs: *add* and *del* are called repeatedly as
+ tasks switch in and out; *start* and *stop* are also called when the perf core
+ needs a quick stop-and-start, for instance, when the interrupt period is being
+ adjusted.
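
The flow above can be modelled as a small user-space state machine. This is
only a sketch under stated assumptions: the DEMO_* constants are local
stand-ins named after the kernel's PERF_HES_* and PERF_EF_* flags, struct
demo_hw_event is hypothetical, and the actual hardware accesses are replaced
by counters that record how often they would have happened.

```c
#include <assert.h>

/* Local stand-ins for the state bits and the flags (values are for the sketch). */
enum { DEMO_HES_STOPPED = 0x01, DEMO_HES_UPTODATE = 0x02 };
enum { DEMO_EF_START = 0x01, DEMO_EF_RELOAD = 0x02, DEMO_EF_UPDATE = 0x04 };

struct demo_hw_event {
	int state;    /* combination of DEMO_HES_* bits */
	int idx;      /* bound physical counter, -1 if none */
	int reloads;  /* times the counter would have been programmed */
	int updates;  /* times the software count would have been refreshed */
};

void demo_start(struct demo_hw_event *hwc, int flags)
{
	if (!(hwc->state & DEMO_HES_STOPPED))
		return;                  /* already counting */
	if (flags & DEMO_EF_RELOAD)
		hwc->reloads++;          /* write the next-overflow value */
	hwc->state = 0;                  /* neither stopped nor up-to-date */
}

void demo_stop(struct demo_hw_event *hwc, int flags)
{
	if (!(hwc->state & DEMO_HES_STOPPED))
		hwc->state |= DEMO_HES_STOPPED;    /* disable the counter */
	if ((flags & DEMO_EF_UPDATE) && !(hwc->state & DEMO_HES_UPTODATE)) {
		hwc->updates++;                    /* same work as read() */
		hwc->state |= DEMO_HES_UPTODATE;
	}
}

void demo_add(struct demo_hw_event *hwc, int idx, int flags)
{
	assert(idx >= 0);
	hwc->idx = idx;                  /* bind event to a physical counter */
	hwc->state = DEMO_HES_STOPPED | DEMO_HES_UPTODATE;
	if (flags & DEMO_EF_START)
		demo_start(hwc, DEMO_EF_RELOAD);
}

void demo_del(struct demo_hw_event *hwc)
{
	demo_stop(hwc, DEMO_EF_UPDATE);
	hwc->idx = -1;                   /* release the physical counter */
}
```

A context switch in followed by a switch out then corresponds to demo_add()
with DEMO_EF_START followed by demo_del(), and a quick stop-and-start maps to
demo_stop() followed by demo_start() on an event that stays bound.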
+
+The current implementation is sufficient for now and can easily be extended to
+support additional features in the future.
+
+A. Related Structures
+---------------------
+
+* struct pmu: include/linux/perf_event.h
+* struct riscv_pmu: arch/riscv/include/asm/perf_event.h
+
+ Both structures are designed to be read-only.
+
+ *struct pmu* defines some function pointer interfaces, and most of them take
+*struct perf_event* as a main argument, dealing with perf events according to
+perf's internal state machine (check kernel/events/core.c for details).
+
+ *struct riscv_pmu* defines PMU-specific parameters. The naming follows the
+convention of all other architectures.
+
+* struct perf_event: include/linux/perf_event.h
+* struct hw_perf_event
+
+ The generic structure that represents perf events, and the hardware-related
+details.
+
+* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h
+
+ The structure that holds the status of events. It has two fixed members:
+the number of events and the array of the events.
+
+References
+----------
+
+[1] https://github.com/riscv/riscv-linux/pull/124
+[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA
--
2.17.0