LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v4 0/4] Add perf interface to expose nvdimm
@ 2021-08-02 7:39 Kajol Jain
2021-08-02 7:39 ` [PATCH v4 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Kajol Jain @ 2021-08-02 7:39 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: maddy, santosh, aneesh.kumar, vaibhav, atrajeev, tglx, kjain, rnsastry
Patchset adds performance stats reporting support for nvdimm.
Added interface includes support for pmu register/unregister
functions. A structure is added called nvdimm_pmu to be used for
adding arch/platform specific data such as supported events, cpumask
pmu event functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.
Added implementation to expose IBM pseries platform nmem*
device performance stats using this interface.
Result from power9 pseries lpar with 2 nvdimm device:
command:# perf list nmem
nmem0/cchrhcnt/ [Kernel PMU event]
nmem0/cchwhcnt/ [Kernel PMU event]
nmem0/critrscu/ [Kernel PMU event]
nmem0/ctlresct/ [Kernel PMU event]
nmem0/ctlrestm/ [Kernel PMU event]
nmem0/fastwcnt/ [Kernel PMU event]
nmem0/hostlcnt/ [Kernel PMU event]
nmem0/hostldur/ [Kernel PMU event]
nmem0/hostscnt/ [Kernel PMU event]
nmem0/hostsdur/ [Kernel PMU event]
nmem0/medrcnt/ [Kernel PMU event]
nmem0/medrdur/ [Kernel PMU event]
nmem0/medwcnt/ [Kernel PMU event]
nmem0/medwdur/ [Kernel PMU event]
nmem0/memlife/ [Kernel PMU event]
nmem0/noopstat/ [Kernel PMU event]
nmem0/ponsecs/ [Kernel PMU event]
nmem1/cchrhcnt/ [Kernel PMU event]
nmem1/cchwhcnt/ [Kernel PMU event]
nmem1/critrscu/ [Kernel PMU event]
...
nmem1/noopstat/ [Kernel PMU event]
nmem1/ponsecs/ [Kernel PMU event]
Patch1:
Introduces the nvdimm_pmu structure
Patch2:
Adds common interface to add arch/platform specific data
includes supported events, pmu event functions. It also
adds code for cpu hotplug support.
Patch3:
Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
cpumask andevent functions and then registers the pmu by adding
callbacks to register_nvdimm_pmu.
Patch4:
Sysfs documentation patch
Changelog
---
v3 -> v4
- Rebase code on top of current papr_scm code without any logical
changes.
- Added Acked-by tag from Peter Zijlstra and Reviewed by tag
from Madhavan Srinivasan.
- Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605
v2 -> v3
- Added Tested-by tag.
- Fix nvdimm mailing list in the ABI Documentation.
- Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25
v1 -> v2
- Fix hotplug code by adding pmu migration call
incase current designated cpu got offline. As
pointed by Peter Zijlstra.
- Removed the retun -1 part from cpu hotplug offline
function.
- Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500
Kajol Jain (4):
drivers/nvdimm: Add nvdimm pmu structure
drivers/nvdimm: Add perf interface to expose nvdimm performance stats
powerpc/papr_scm: Add perf interface support
powerpc/papr_scm: Document papr_scm sysfs event format entries
Documentation/ABI/testing/sysfs-bus-papr-pmem | 31 ++
arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 365 ++++++++++++++++++
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/nd_perf.c | 230 +++++++++++
include/linux/nd.h | 46 +++
6 files changed, 678 insertions(+)
create mode 100644 drivers/nvdimm/nd_perf.c
--
2.26.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 1/4] drivers/nvdimm: Add nvdimm pmu structure
2021-08-02 7:39 [PATCH v4 0/4] Add perf interface to expose nvdimm Kajol Jain
@ 2021-08-02 7:39 ` Kajol Jain
2021-08-02 7:39 ` [PATCH v4 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats Kajol Jain
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Kajol Jain @ 2021-08-02 7:39 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: maddy, santosh, aneesh.kumar, vaibhav, atrajeev, tglx, kjain, rnsastry
A structure is added, called nvdimm_pmu, for performance
stats reporting support of nvdimm devices. It can be used to add
nvdimm pmu data such as supported events and pmu event functions
like event_init/add/read/del with cpu hotplug support.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..712499cf7335 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,8 @@
#include <linux/ndctl.h>
#include <linux/device.h>
#include <linux/badblocks.h>
+#include <linux/platform_device.h>
+#include <linux/perf_event.h>
enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -23,6 +25,47 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
};
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR 0
+#define NVDIMM_PMU_EVENT_ATTR 1
+#define NVDIMM_PMU_CPUMASK_ATTR 2
+#define NVDIMM_PMU_NULL_ATTR 3
+
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ *
+ * @name: name of the nvdimm pmu device.
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @functions(event_init/add/del/read): platform specific pmu functions.
+ * @attr_groups: data structure for events, formats and cpumask
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+ const char *name;
+ struct pmu pmu;
+ struct device *dev;
+ int (*event_init)(struct perf_event *event);
+ int (*add)(struct perf_event *event, int flags);
+ void (*del)(struct perf_event *event, int flags);
+ void (*read)(struct perf_event *event);
+ /*
+ * Attribute groups for the nvdimm pmu. Index 0 used for
+ * format attribute, index 1 used for event attribute,
+ * index 2 used for cpusmask attribute and index 3 kept as NULL.
+ */
+ const struct attribute_group *attr_groups[4];
+ int cpu;
+ struct hlist_node node;
+ enum cpuhp_state cpuhp_state;
+
+ /* cpumask provided by arch/platform specific code */
+ struct cpumask arch_cpumask;
+};
+
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
--
2.26.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v4 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
2021-08-02 7:39 [PATCH v4 0/4] Add perf interface to expose nvdimm Kajol Jain
2021-08-02 7:39 ` [PATCH v4 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
@ 2021-08-02 7:39 ` Kajol Jain
2021-08-02 7:39 ` [PATCH v4 3/4] powerpc/papr_scm: Add perf interface support Kajol Jain
2021-08-02 7:39 ` [PATCH v4 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
3 siblings, 0 replies; 5+ messages in thread
From: Kajol Jain @ 2021-08-02 7:39 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: maddy, santosh, aneesh.kumar, vaibhav, atrajeev, tglx, kjain, rnsastry
A common interface is added to get performance stats reporting
support for nvdimm devices. Added interface includes support for
pmu register/unregister functions, cpu hotplug and pmu event
functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/nd_perf.c | 230 +++++++++++++++++++++++++++++++++++++++
include/linux/nd.h | 3 +
3 files changed, 234 insertions(+)
create mode 100644 drivers/nvdimm/nd_perf.c
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
libnvdimm-y := core.o
libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
libnvdimm-y += dimm.o
libnvdimm-y += region_devs.o
libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index 000000000000..4c49d1bc2a3c
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include <linux/nd.h>
+
+static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+ struct nvdimm_pmu *nd_pmu;
+
+ nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+ return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+ struct nvdimm_pmu *nd_pmu;
+ u32 target;
+ int nodeid;
+ const struct cpumask *cpumask;
+
+ nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+ /* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
+ cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
+
+ /*
+ * If given cpu is not same as current designated cpu for
+ * counter access, just return.
+ */
+ if (cpu != nd_pmu->cpu)
+ return 0;
+
+ /* Check for any active cpu in nd_pmu->arch_cpumask */
+ target = cpumask_any(&nd_pmu->arch_cpumask);
+
+ /*
+ * Incase we don't have any active cpu in nd_pmu->arch_cpumask,
+ * check in given cpu's numa node list.
+ */
+ if (target >= nr_cpu_ids) {
+ nodeid = cpu_to_node(cpu);
+ cpumask = cpumask_of_node(nodeid);
+ target = cpumask_any_but(cpumask, cpu);
+ }
+ nd_pmu->cpu = target;
+
+ /* Migrate nvdimm pmu events to the new target cpu if valid */
+ if (target >= 0 && target < nr_cpu_ids)
+ perf_pmu_migrate_context(&nd_pmu->pmu, cpu, target);
+
+ return 0;
+}
+
+static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+ struct nvdimm_pmu *nd_pmu;
+
+ nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+ if (nd_pmu->cpu >= nr_cpu_ids)
+ nd_pmu->cpu = cpu;
+
+ return 0;
+}
+
+static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
+{
+ struct perf_pmu_events_attr *attr;
+ struct attribute **attrs;
+ struct attribute_group *nvdimm_pmu_cpumask_group;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+ if (!attr)
+ return -ENOMEM;
+
+ attrs = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
+ if (!attrs) {
+ kfree(attr);
+ return -ENOMEM;
+ }
+
+ /* Allocate memory for cpumask attribute group */
+ nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), GFP_KERNEL);
+ if (!nvdimm_pmu_cpumask_group) {
+ kfree(attr);
+ kfree(attrs);
+ return -ENOMEM;
+ }
+
+ sysfs_attr_init(&attr->attr.attr);
+ attr->attr.attr.name = "cpumask";
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = nvdimm_pmu_cpumask_show;
+ attrs[0] = &attr->attr.attr;
+ attrs[1] = NULL;
+
+ nvdimm_pmu_cpumask_group->attrs = attrs;
+ nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
+ return 0;
+}
+
+static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
+{
+ int nodeid, rc;
+ const struct cpumask *cpumask;
+
+ /*
+ * Incase cpu hotplug is not handled by arch specific code
+ * they can still provide required cpumask which can be used
+ * to get designatd cpu for counter access.
+ * Check for any active cpu in nd_pmu->arch_cpumask.
+ */
+ if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
+ nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
+ } else {
+ /* pick active cpu from the cpumask of device numa node. */
+ nodeid = dev_to_node(nd_pmu->dev);
+ cpumask = cpumask_of_node(nodeid);
+ nd_pmu->cpu = cpumask_any(cpumask);
+ }
+
+ rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
+ nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
+
+ if (rc < 0)
+ return rc;
+
+ nd_pmu->cpuhp_state = rc;
+
+ /* Register the pmu instance for cpu hotplug */
+ rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ if (rc) {
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+ return rc;
+ }
+
+ /* Create cpumask attribute group */
+ rc = create_cpumask_attr_group(nd_pmu);
+ if (rc) {
+ cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+ return rc;
+ }
+
+ return 0;
+}
+
+void nvdimm_pmu_free_hotplug_memory(struct nvdimm_pmu *nd_pmu)
+{
+ cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+
+ if (nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
+ kfree(nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]->attrs);
+ kfree(nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]);
+}
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
+{
+ int rc;
+
+ if (!nd_pmu || !pdev)
+ return -EINVAL;
+
+ /* event functions like add/del/read/event_init should not be NULL */
+ if (WARN_ON_ONCE(!(nd_pmu->event_init && nd_pmu->add && nd_pmu->del && nd_pmu->read)))
+ return -EINVAL;
+
+ nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+ nd_pmu->pmu.name = nd_pmu->name;
+ nd_pmu->pmu.event_init = nd_pmu->event_init;
+ nd_pmu->pmu.add = nd_pmu->add;
+ nd_pmu->pmu.del = nd_pmu->del;
+ nd_pmu->pmu.read = nd_pmu->read;
+
+ nd_pmu->pmu.attr_groups = nd_pmu->attr_groups;
+ nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
+ PERF_PMU_CAP_NO_EXCLUDE;
+
+ /*
+ * Add platform_device->dev pointer to nvdimm_pmu to access
+ * device data in events functions.
+ */
+ nd_pmu->dev = &pdev->dev;
+
+ /*
+ * Incase cpumask attribute is set it means cpu
+ * hotplug is handled by the arch specific code and
+ * we can skip calling hotplug_init.
+ */
+ if (!nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]) {
+ /* init cpuhotplug */
+ rc = nvdimm_pmu_cpu_hotplug_init(nd_pmu);
+ if (rc) {
+ pr_info("cpu hotplug feature failed for device: %s\n", nd_pmu->name);
+ return rc;
+ }
+ }
+
+ rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->name, -1);
+ if (rc) {
+ nvdimm_pmu_free_hotplug_memory(nd_pmu);
+ return rc;
+ }
+
+ pr_info("%s NVDIMM performance monitor support registered\n",
+ nd_pmu->name);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
+
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
+{
+ /* handle freeing of memory nd_pmu in arch specific code */
+ perf_pmu_unregister(&nd_pmu->pmu);
+ nvdimm_pmu_free_hotplug_memory(nd_pmu);
+}
+EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index 712499cf7335..7d8b4f7d277d 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -66,6 +66,9 @@ struct nvdimm_pmu {
struct cpumask arch_cpumask;
};
+int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
+
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
--
2.26.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v4 3/4] powerpc/papr_scm: Add perf interface support
2021-08-02 7:39 [PATCH v4 0/4] Add perf interface to expose nvdimm Kajol Jain
2021-08-02 7:39 ` [PATCH v4 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
2021-08-02 7:39 ` [PATCH v4 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats Kajol Jain
@ 2021-08-02 7:39 ` Kajol Jain
2021-08-02 7:39 ` [PATCH v4 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
3 siblings, 0 replies; 5+ messages in thread
From: Kajol Jain @ 2021-08-02 7:39 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: maddy, santosh, aneesh.kumar, vaibhav, atrajeev, tglx, kjain, rnsastry
Performance monitoring support for papr-scm nvdimm devices
via perf interface is added which includes addition of pmu
functions like add/del/read/event_init for nvdimm_pmu struture.
A new parameter 'priv' in added to the pdev_archdata structure to save
nvdimm_pmu device pointer, to handle the unregistering of pmu device.
papr_scm_pmu_register function populates the nvdimm_pmu structure
with events, cpumask, attribute groups along with event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device.
Event handling functions internally uses hcall to get events and
counter data.
Result in power9 machine with 2 nvdimm device:
Ex: List all event by perf list
command:# perf list nmem
nmem0/cchrhcnt/ [Kernel PMU event]
nmem0/cchwhcnt/ [Kernel PMU event]
nmem0/critrscu/ [Kernel PMU event]
nmem0/ctlresct/ [Kernel PMU event]
nmem0/ctlrestm/ [Kernel PMU event]
nmem0/fastwcnt/ [Kernel PMU event]
nmem0/hostlcnt/ [Kernel PMU event]
nmem0/hostldur/ [Kernel PMU event]
nmem0/hostscnt/ [Kernel PMU event]
nmem0/hostsdur/ [Kernel PMU event]
nmem0/medrcnt/ [Kernel PMU event]
nmem0/medrdur/ [Kernel PMU event]
nmem0/medwcnt/ [Kernel PMU event]
nmem0/medwdur/ [Kernel PMU event]
nmem0/memlife/ [Kernel PMU event]
nmem0/noopstat/ [Kernel PMU event]
nmem0/ponsecs/ [Kernel PMU event]
nmem1/cchrhcnt/ [Kernel PMU event]
nmem1/cchwhcnt/ [Kernel PMU event]
nmem1/critrscu/ [Kernel PMU event]
...
nmem1/noopstat/ [Kernel PMU event]
nmem1/ponsecs/ [Kernel PMU event]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 365 ++++++++++++++++++++++
2 files changed, 370 insertions(+)
diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
struct pdev_archdata {
u64 dma_mask;
+ /*
+ * Pointer to nvdimm_pmu structure, to handle the unregistering
+ * of pmu device
+ */
+ void *priv;
};
#endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..26900101e638 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -19,6 +19,8 @@
#include <asm/papr_pdsm.h>
#include <asm/mce.h>
#include <asm/unaligned.h>
+#include <linux/perf_event.h>
+#include <linux/ctype.h>
#define BIND_ANY_ADDR (~0ul)
@@ -68,6 +70,8 @@
#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
#define PAPR_SCM_PERF_STATS_VERSION 0x1
+#define to_nvdimm_pmu(_pmu) container_of(_pmu, struct nvdimm_pmu, pmu)
+
/* Struct holding a single performance metric */
struct papr_scm_perf_stat {
u8 stat_id[8];
@@ -120,6 +124,12 @@ struct papr_scm_priv {
/* length of the stat buffer as expected by phyp */
size_t stat_buffer_len;
+
+ /* array to have event_code and stat_id mappings */
+ char **nvdimm_events_map;
+
+ /* count of supported events */
+ u32 total_events;
};
static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -340,6 +350,354 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
return 0;
}
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+ .name = "format",
+ .attrs = nvdimm_pmu_format_attr,
+};
+
+static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+ struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+ int rc, size;
+
+ /* Allocate request buffer enough to hold single performance stat */
+ size = sizeof(struct papr_scm_perf_stats) +
+ sizeof(struct papr_scm_perf_stat);
+
+ if (!p || !p->nvdimm_events_map)
+ return -EINVAL;
+
+ stats = kzalloc(size, GFP_KERNEL);
+ if (!stats)
+ return -ENOMEM;
+
+ stat = &stats->scm_statistic[0];
+ memcpy(&stat->stat_id,
+ p->nvdimm_events_map[event->attr.config - 1],
+ sizeof(stat->stat_id));
+ stat->stat_val = 0;
+
+ rc = drc_pmem_query_stats(p, stats, 1);
+ if (rc < 0) {
+ kfree(stats);
+ return rc;
+ }
+
+ *count = be64_to_cpu(stat->stat_val);
+ kfree(stats);
+ return 0;
+}
+
+static int papr_scm_pmu_event_init(struct perf_event *event)
+{
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+ struct papr_scm_priv *p;
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ /* test the event attr type for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* it does not support event sampling mode */
+ if (is_sampling_event(event))
+ return -EOPNOTSUPP;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
+ if (!p)
+ return -EINVAL;
+
+ /* Invalid eventcode */
+ if (event->attr.config == 0 || event->attr.config > p->total_events)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags)
+{
+ u64 count;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ if (flags & PERF_EF_START) {
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
+ if (rc)
+ return rc;
+
+ local64_set(&event->hw.prev_count, count);
+ }
+
+ return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event)
+{
+ u64 prev, now;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return;
+
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
+ if (rc)
+ return;
+
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags)
+{
+ papr_scm_pmu_read(event);
+}
+
+static ssize_t device_show_string(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct perf_pmu_events_attr *d;
+
+ d = container_of(attr, struct perf_pmu_events_attr, attr);
+
+ return sysfs_emit(buf, "%s\n", (char *)d->event_str);
+}
+
+static char *strtolower(char *updated_name)
+{
+ int i = 0;
+
+ while (updated_name[i]) {
+ if (isupper(updated_name[i]))
+ updated_name[i] = tolower(updated_name[i]);
+ i++;
+ }
+ updated_name[i] = '\0';
+ return strim(updated_name);
+}
+
+/* device_str_attr_create : Populate event "name" and string "str" in attribute */
+static struct attribute *device_str_attr_create_(char *name, char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ if (!attr)
+ return NULL;
+
+ sysfs_attr_init(&attr->attr.attr);
+ attr->event_str = str;
+ attr->attr.attr.name = strtolower(name);
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = device_show_string;
+
+ return &attr->attr.attr;
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats, *single_stats;
+ int index, size, rc, count;
+ u32 available_events;
+ struct attribute **events;
+ char *eventcode, *eventname, *statid;
+ struct attribute_group *nvdimm_pmu_events_group;
+
+ if (!p->stat_buffer_len)
+ return -ENOENT;
+
+ available_events = (p->stat_buffer_len - sizeof(struct papr_scm_perf_stats))
+ / sizeof(struct papr_scm_perf_stat);
+
+ /* Allocate memory for events attribute group */
+ nvdimm_pmu_events_group = kzalloc(sizeof(*nvdimm_pmu_events_group), GFP_KERNEL);
+ if (!nvdimm_pmu_events_group)
+ return -ENOMEM;
+
+ /* Allocate the buffer for phyp where stats are written */
+ stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+ if (!stats) {
+ rc = -ENOMEM;
+ goto out_nvdimm_pmu_events_group;
+ }
+
+ /* Allocate memory to nvdimm_event_map */
+ p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL);
+ if (!p->nvdimm_events_map) {
+ rc = -ENOMEM;
+ goto out_stats;
+ }
+
+ /* Called to get list of events supported */
+ rc = drc_pmem_query_stats(p, stats, 0);
+ if (rc)
+ goto out_nvdimm_events_map;
+
+ /* Allocate buffer to hold single performance stat */
+ size = sizeof(struct papr_scm_perf_stats) + sizeof(struct papr_scm_perf_stat);
+
+ single_stats = kzalloc(size, GFP_KERNEL);
+ if (!single_stats) {
+ rc = -ENOMEM;
+ goto out_nvdimm_events_map;
+ }
+
+ events = kzalloc(available_events * sizeof(struct attribute *), GFP_KERNEL);
+ if (!events) {
+ rc = -ENOMEM;
+ goto out_single_stats;
+ }
+
+ for (index = 0, stat = stats->scm_statistic, count = 0;
+ index < available_events; index++, ++stat) {
+
+ single_stats->scm_statistic[0] = *stat;
+ rc = drc_pmem_query_stats(p, single_stats, 1);
+
+ if (rc < 0) {
+ pr_info("Event not supported %s for device %s\n",
+ stat->stat_id, nvdimm_name(p->nvdimm));
+ } else {
+ eventcode = kasprintf(GFP_KERNEL, "event=0x%x", count + 1);
+ eventname = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+ statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+
+ if (!eventname || !statid || !eventcode)
+ goto out;
+
+ strcpy(eventname, stat->stat_id);
+ events[count] = device_str_attr_create_(eventname,
+ eventcode);
+ if (!events[count])
+ goto out;
+
+ strcpy(statid, stat->stat_id);
+ p->nvdimm_events_map[count] = statid;
+ count++;
+ continue;
+out:
+ kfree(eventcode);
+ kfree(eventname);
+ kfree(statid);
+ }
+ }
+
+ if (!count)
+ goto out_events;
+
+ events[count] = NULL;
+ p->nvdimm_events_map[count] = NULL;
+ p->total_events = count;
+
+ nvdimm_pmu_events_group->name = "events";
+ nvdimm_pmu_events_group->attrs = events;
+
+ /* Fill attribute groups for the nvdimm pmu device */
+ nd_pmu->attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
+ nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR] = nvdimm_pmu_events_group;
+ nd_pmu->attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
+
+ kfree(single_stats);
+ kfree(stats);
+ return 0;
+
+out_events:
+ kfree(events);
+out_single_stats:
+ kfree(single_stats);
+out_nvdimm_events_map:
+ kfree(p->nvdimm_events_map);
+out_stats:
+ kfree(stats);
+out_nvdimm_pmu_events_group:
+ kfree(nvdimm_pmu_events_group);
+ return rc;
+}
+
+/* Function to free the attr_groups which are dynamically allocated */
+static void papr_scm_pmu_mem_free(struct nvdimm_pmu *nd_pmu)
+{
+ if (nd_pmu) {
+ if (nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR])
+ kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]->attrs);
+ kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]);
+ }
+}
+
+static void papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+ struct nvdimm_pmu *nd_pmu;
+ int rc, nodeid;
+
+ nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
+ if (!nd_pmu) {
+ rc = -ENOMEM;
+ goto pmu_err_print;
+ }
+
+ rc = papr_scm_pmu_check_events(p, nd_pmu);
+ if (rc)
+ goto pmu_check_events_err;
+
+ nd_pmu->name = nvdimm_name(p->nvdimm);
+ nd_pmu->event_init = papr_scm_pmu_event_init;
+ nd_pmu->read = papr_scm_pmu_read;
+ nd_pmu->add = papr_scm_pmu_add;
+ nd_pmu->del = papr_scm_pmu_del;
+
+ /*updating the cpumask variable */
+ nodeid = dev_to_node(&p->pdev->dev);
+ nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
+
+ /* cpumask should not be NULL */
+ WARN_ON_ONCE(cpumask_empty(&nd_pmu->arch_cpumask));
+
+ rc = register_nvdimm_pmu(nd_pmu, p->pdev);
+ if (rc)
+ goto pmu_register_err;
+
+ /*
+ * Set archdata.priv value to nvdimm_pmu structure, to handle the
+ * unregistering of pmu device.
+ */
+ p->pdev->archdata.priv = nd_pmu;
+ return;
+
+pmu_register_err:
+ papr_scm_pmu_mem_free(nd_pmu);
+ kfree(p->nvdimm_events_map);
+pmu_check_events_err:
+ kfree(nd_pmu);
+pmu_err_print:
+ dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+}
+
+static void papr_scm_pmu_uninit(struct nvdimm_pmu *nd_pmu)
+{
+ unregister_nvdimm_pmu(nd_pmu);
+ papr_scm_pmu_mem_free(nd_pmu);
+ kfree(nd_pmu);
+}
+
/*
* Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
* health information.
@@ -1236,6 +1594,7 @@ static int papr_scm_probe(struct platform_device *pdev)
goto err2;
platform_set_drvdata(pdev, p);
+ papr_scm_pmu_register(p);
return 0;
@@ -1254,6 +1613,12 @@ static int papr_scm_remove(struct platform_device *pdev)
nvdimm_bus_unregister(p->bus);
drc_pmem_unbind(p);
+
+ if (pdev->archdata.priv)
+ papr_scm_pmu_uninit(pdev->archdata.priv);
+
+ pdev->archdata.priv = NULL;
+ kfree(p->nvdimm_events_map);
kfree(p->bus_desc.provider_name);
kfree(p);
--
2.26.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v4 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries
2021-08-02 7:39 [PATCH v4 0/4] Add perf interface to expose nvdimm Kajol Jain
` (2 preceding siblings ...)
2021-08-02 7:39 ` [PATCH v4 3/4] powerpc/papr_scm: Add perf interface support Kajol Jain
@ 2021-08-02 7:39 ` Kajol Jain
3 siblings, 0 replies; 5+ messages in thread
From: Kajol Jain @ 2021-08-02 7:39 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: maddy, santosh, aneesh.kumar, vaibhav, atrajeev, tglx, kjain, rnsastry
Details is added for the event, cpumask and format attributes
in the ABI documentation.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
Documentation/ABI/testing/sysfs-bus-papr-pmem | 31 +++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 95254cec92bf..4d86252448f8 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -61,3 +61,34 @@ Description:
* "CchRHCnt" : Cache Read Hit Count
* "CchWHCnt" : Cache Write Hit Count
* "FastWCnt" : Fast Write Count
+
+What: /sys/devices/nmemX/format
+Date: June 2021
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev,
+Description: (RO) Attribute group to describe the magic bits
+ that go into perf_event_attr.config for a particular pmu.
+ (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+ Each attribute under this group defines a bit range of the
+ perf_event_attr.config. Supported attribute is listed
+ below::
+
+ event = "config:0-4" - event ID
+
+ For example::
+ noopstat = "event=0x1"
+
+What: /sys/devices/nmemX/events
+Date: June 2021
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev,
+Description: (RO) Attribute group to describe performance monitoring
+ events specific to papr-scm. Each attribute in this group describes
+ a single performance monitoring event supported by this nvdimm pmu.
+ The name of the file is the name of the event.
+ (See ABI/testing/sysfs-bus-event_source-devices-events).
+
+What: /sys/devices/nmemX/cpumask
+Date: June 2021
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev,
+Description: (RO) This sysfs file exposes the cpumask which is designated to make
+ HCALLs to retrieve nvdimm pmu event counter data.
--
2.26.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-08-02 7:41 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-02 7:39 [PATCH v4 0/4] Add perf interface to expose nvdimm Kajol Jain
2021-08-02 7:39 ` [PATCH v4 1/4] drivers/nvdimm: Add nvdimm pmu structure Kajol Jain
2021-08-02 7:39 ` [PATCH v4 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats Kajol Jain
2021-08-02 7:39 ` [PATCH v4 3/4] powerpc/papr_scm: Add perf interface support Kajol Jain
2021-08-02 7:39 ` [PATCH v4 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries Kajol Jain
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).