LKML Archive on lore.kernel.org
* [PATCH 1/5] perf cs-etm: Print size using consistent format
@ 2021-09-16 15:46 German Gomez
  2021-09-16 15:46 ` [PATCH 2/5] perf arm-spe: " German Gomez
                   ` (5 more replies)
  0 siblings, 6 replies; 38+ messages in thread
From: German Gomez @ 2021-09-16 15:46 UTC
  To: linux-kernel, linux-perf-users
  Cc: Andrew Kilroy, German Gomez, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

From: Andrew Kilroy <andrew.kilroy@arm.com>

Since the size is already printed earlier in hex, print the same data
using the same format, in hex.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/util/cs-etm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f323adb1af85..4f672f7d008c 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -537,7 +537,7 @@ static void cs_etm__dump_event(struct cs_etm_queue *etmq,
 
 	fprintf(stdout, "\n");
 	color_fprintf(stdout, color,
-		     ". ... CoreSight %s Trace data: size %zu bytes\n",
+		     ". ... CoreSight %s Trace data: size %#zx bytes\n",
 		     cs_etm_decoder__get_name(etmq->decoder), buffer->size);
 
 	do {
-- 
2.17.1
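
For readers comparing the two format strings, here is a minimal sketch of what the change does to the printed size. The helper names are hypothetical and not part of perf; only the "%zu" and "%#zx" conversions come from the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helpers, for illustration only: format a size the way the
 * dump header printed it before the patch (decimal) and after it (hex).
 */
static int format_size_decimal(char *out, size_t outlen, size_t size)
{
	return snprintf(out, outlen, "size %zu bytes", size);
}

static int format_size_hex(char *out, size_t outlen, size_t size)
{
	/* The '#' flag makes printf add a "0x" prefix to non-zero values. */
	return snprintf(out, outlen, "size %#zx bytes", size);
}
```

With a size of 4096, the first helper produces "size 4096 bytes" and the second "size 0x1000 bytes". A size of zero prints as plain "0" in both cases, because the '#' flag only prefixes non-zero values.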



* [PATCH 2/5] perf arm-spe: Print size using consistent format
  2021-09-16 15:46 [PATCH 1/5] perf cs-etm: Print size using consistent format German Gomez
@ 2021-09-16 15:46 ` German Gomez
  2021-09-23 13:35   ` Leo Yan
  2021-09-16 15:46 ` [PATCH 3/5] perf arm-spe: Add snapshot mode support German Gomez
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-09-16 15:46 UTC
  To: linux-kernel, linux-perf-users
  Cc: Andrew Kilroy, German Gomez, John Garry, Will Deacon,
	Mathieu Poirier, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

From: Andrew Kilroy <andrew.kilroy@arm.com>

Since the size is already printed earlier in hex, print the same data
using the same format, in hex.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 58b7069c5a5f..2196291976d9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -100,7 +100,7 @@ static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
 	const char *color = PERF_COLOR_BLUE;
 
 	color_fprintf(stdout, color,
-		      ". ... ARM SPE data: size %zu bytes\n",
+		      ". ... ARM SPE data: size %#zx bytes\n",
 		      len);
 
 	while (len) {
-- 
2.17.1



* [PATCH 3/5] perf arm-spe: Add snapshot mode support
  2021-09-16 15:46 [PATCH 1/5] perf cs-etm: Print size using consistent format German Gomez
  2021-09-16 15:46 ` [PATCH 2/5] perf arm-spe: " German Gomez
@ 2021-09-16 15:46 ` German Gomez
  2021-10-20 12:48   ` Leo Yan
  2021-09-16 15:46 ` [PATCH 4/5] perf arm-spe: Implement find_snapshot callback German Gomez
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-09-16 15:46 UTC
  To: linux-kernel, linux-perf-users
  Cc: German Gomez, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mike Leach, linux-arm-kernel, coresight

This patch enables support for snapshot mode of arm_spe events,
including the implementation of the necessary callbacks (excluding
find_snapshot, which is to be included in a follow-up commit).

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/arch/arm64/util/arm-spe.c | 130 +++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index a4420d4df503..f8b03d164b42 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -84,6 +84,55 @@ static int arm_spe_info_fill(struct auxtrace_record *itr,
 	return 0;
 }
 
+static void
+arm_spe_snapshot_resolve_auxtrace_defaults(struct record_opts *opts,
+					   bool privileged)
+{
+	/*
+	 * The default snapshot size is the auxtrace mmap size. If neither auxtrace mmap size nor
+	 * snapshot size is specified, then the default is 4MiB for privileged users, 128KiB for
+	 * unprivileged users.
+	 *
+	 * The default auxtrace mmap size is 4MiB/page_size for privileged users, 128KiB for
+	 * unprivileged users. If an unprivileged user does not specify mmap pages, the mmap pages
+	 * will be reduced from the default 512KiB/page_size to 256KiB/page_size, otherwise the
+	 * user is likely to get an error as they exceed their mlock limit.
+	 */
+
+	/*
+	 * No sizes were given to '-S' or '-m,', so go with the defaults
+	 */
+	if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	} else if (!opts->auxtrace_mmap_pages && !privileged && opts->mmap_pages == UINT_MAX) {
+		opts->mmap_pages = KiB(256) / page_size;
+	}
+
+	/*
+	 * '-m,xyz' was specified but no snapshot size, so make the snapshot size as big as the
+	 * auxtrace mmap area.
+	 */
+	if (!opts->auxtrace_snapshot_size)
+		opts->auxtrace_snapshot_size = opts->auxtrace_mmap_pages * (size_t)page_size;
+
+	/*
+	 * '-Sxyz' was specified but no auxtrace mmap area, so make the auxtrace mmap area big
+	 * enough to fit the requested snapshot size.
+	 */
+	if (!opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_snapshot_size;
+
+		sz = round_up(sz, page_size) / page_size;
+		opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+	}
+}
+
 static int arm_spe_recording_options(struct auxtrace_record *itr,
 				     struct evlist *evlist,
 				     struct record_opts *opts)
@@ -115,6 +164,36 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 	if (!opts->full_auxtrace)
 		return 0;
 
+	/*
+	 * We are in snapshot mode.
+	 */
+	if (opts->auxtrace_snapshot_mode) {
+		/*
+		 * Command arguments '-Sxyz' and/or '-m,xyz' are missing, so fill those in with
+		 * default values.
+		 */
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages)
+			arm_spe_snapshot_resolve_auxtrace_defaults(opts, privileged);
+
+		/*
+		 * Snapshot size can't be bigger than the auxtrace area.
+		 */
+		if (opts->auxtrace_snapshot_size > opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+
+		/*
+		 * Something went wrong somewhere - this shouldn't happen.
+		 */
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+	}
+
 	/* We are in full trace mode but '-m,xyz' wasn't specified */
 	if (!opts->auxtrace_mmap_pages) {
 		if (privileged) {
@@ -138,6 +217,9 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 		}
 	}
 
+	if (opts->auxtrace_snapshot_mode)
+		pr_debug2("%sx snapshot size: %zu\n", ARM_SPE_PMU_NAME,
+			  opts->auxtrace_snapshot_size);
 
 	/*
 	 * To obtain the auxtrace buffer file descriptor, the auxtrace event
@@ -172,6 +254,51 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 	return 0;
 }
 
+static int arm_spe_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
+					 struct record_opts *opts,
+					 const char *str)
+{
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+static int arm_spe_snapshot_start(struct auxtrace_record *itr)
+{
+	struct arm_spe_recording *ptr =
+			container_of(itr, struct arm_spe_recording, itr);
+	struct evsel *evsel;
+
+	evlist__for_each_entry(ptr->evlist, evsel) {
+		if (evsel->core.attr.type == ptr->arm_spe_pmu->type)
+			return evsel__disable(evsel);
+	}
+	return -EINVAL;
+}
+
+static int arm_spe_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct arm_spe_recording *ptr =
+			container_of(itr, struct arm_spe_recording, itr);
+	struct evsel *evsel;
+
+	evlist__for_each_entry(ptr->evlist, evsel) {
+		if (evsel->core.attr.type == ptr->arm_spe_pmu->type)
+			return evsel__enable(evsel);
+	}
+	return -EINVAL;
+}
+
 static u64 arm_spe_reference(struct auxtrace_record *itr __maybe_unused)
 {
 	struct timespec ts;
@@ -207,6 +334,9 @@ struct auxtrace_record *arm_spe_recording_init(int *err,
 
 	sper->arm_spe_pmu = arm_spe_pmu;
 	sper->itr.pmu = arm_spe_pmu;
+	sper->itr.snapshot_start = arm_spe_snapshot_start;
+	sper->itr.snapshot_finish = arm_spe_snapshot_finish;
+	sper->itr.parse_snapshot_options = arm_spe_parse_snapshot_options;
 	sper->itr.recording_options = arm_spe_recording_options;
 	sper->itr.info_priv_size = arm_spe_info_priv_size;
 	sper->itr.info_fill = arm_spe_info_fill;
-- 
2.17.1
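
The default resolution described in the comments above can be sketched outside of perf as follows. This is only a model of the logic under stated assumptions: struct snapshot_opts, next_pow2() and resolve_snapshot_defaults() are hypothetical stand-ins for the record_opts fields and kernel helpers the patch uses, and the page size is passed in explicitly.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define KiB(x) ((size_t)(x) * 1024)
#define MiB(x) ((size_t)(x) * 1024 * 1024)

/* Hypothetical stand-in for the record_opts fields the patch touches. */
struct snapshot_opts {
	size_t snapshot_size;		/* '-S<size>', 0 when not given */
	size_t auxtrace_mmap_pages;	/* '-m,<pages>', 0 when not given */
	unsigned int mmap_pages;	/* '-m<pages>', UINT_MAX when not given */
};

/* Round n up to the next power of two (assumes n >= 1). */
static size_t next_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static void resolve_snapshot_defaults(struct snapshot_opts *o, bool privileged,
				      size_t page_size)
{
	if (!o->snapshot_size && !o->auxtrace_mmap_pages) {
		/* Neither '-S' nor '-m,' carried a size: pick the default area. */
		o->auxtrace_mmap_pages = (privileged ? MiB(4) : KiB(128)) / page_size;
		if (!privileged && o->mmap_pages == UINT_MAX)
			o->mmap_pages = (unsigned int)(KiB(256) / page_size);
	} else if (!o->auxtrace_mmap_pages && !privileged &&
		   o->mmap_pages == UINT_MAX) {
		o->mmap_pages = (unsigned int)(KiB(256) / page_size);
	}

	/* Only '-m,' was given: the snapshot covers the whole AUX area. */
	if (!o->snapshot_size)
		o->snapshot_size = o->auxtrace_mmap_pages * page_size;

	/* Only '-S' was given: size the AUX area to fit the snapshot. */
	if (!o->auxtrace_mmap_pages) {
		size_t pages = (o->snapshot_size + page_size - 1) / page_size;

		o->auxtrace_mmap_pages = next_pow2(pages);
	}
}
```

With 4KiB pages and no sizes given, a privileged user ends up with 1024 AUX pages and a 4MiB snapshot, while an unprivileged user gets 32 AUX pages, a 128KiB snapshot and 64 data mmap pages. A snapshot size of 100000 bytes on its own needs 25 pages, which is rounded up to 32.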



* [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-09-16 15:46 [PATCH 1/5] perf cs-etm: Print size using consistent format German Gomez
  2021-09-16 15:46 ` [PATCH 2/5] perf arm-spe: " German Gomez
  2021-09-16 15:46 ` [PATCH 3/5] perf arm-spe: Add snapshot mode support German Gomez
@ 2021-09-16 15:46 ` German Gomez
  2021-09-23 13:50   ` Leo Yan
  2021-10-17 12:05   ` Leo Yan
  2021-09-16 15:46 ` [PATCH 5/5] perf arm-spe: Snapshot mode test German Gomez
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 38+ messages in thread
From: German Gomez @ 2021-09-16 15:46 UTC
  To: linux-kernel, linux-perf-users
  Cc: German Gomez, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mike Leach, linux-arm-kernel, coresight

The head pointer of the AUX buffer managed by the arm_spe_pmu.c driver
is not monotonically increasing, therefore the find_snapshot callback is
needed in order to find the trace data within the AUX buffer and avoid
wasting space in the perf.data file.

The pointer is assumed to have wrapped if the buffer contains non-zero
data at the end. If it has wrapped, the entire contents of the AUX
buffer are stored in the perf.data file. Otherwise only the data up to
the head pointer is stored.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/arch/arm64/util/arm-spe.c | 145 +++++++++++++++++++++++++++
 1 file changed, 145 insertions(+)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index f8b03d164b42..56785034fc84 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -23,6 +23,7 @@
 #include "../../../util/auxtrace.h"
 #include "../../../util/record.h"
 #include "../../../util/arm-spe.h"
+#include <tools/libc_compat.h> // reallocarray
 
 #define KiB(x) ((x) * 1024)
 #define MiB(x) ((x) * 1024 * 1024)
@@ -31,6 +32,8 @@ struct arm_spe_recording {
 	struct auxtrace_record		itr;
 	struct perf_pmu			*arm_spe_pmu;
 	struct evlist		*evlist;
+	int			wrapped_cnt;
+	bool			*wrapped;
 };
 
 static void arm_spe_set_timestamp(struct auxtrace_record *itr,
@@ -299,6 +302,146 @@ static int arm_spe_snapshot_finish(struct auxtrace_record *itr)
 	return -EINVAL;
 }
 
+static int arm_spe_alloc_wrapped_array(struct arm_spe_recording *ptr, int idx)
+{
+	bool *wrapped;
+	int cnt = ptr->wrapped_cnt, new_cnt, i;
+
+	/*
+	 * No need to allocate, so return early.
+	 */
+	if (idx < cnt)
+		return 0;
+
+	/*
+	 * Make ptr->wrapped big enough to hold index idx.
+	 */
+	new_cnt = idx + 1;
+
+	/*
+	 * Freed in arm_spe_recording_free().
+	 */
+	wrapped = reallocarray(ptr->wrapped, new_cnt, sizeof(bool));
+	if (!wrapped)
+		return -ENOMEM;
+
+	/*
+	 * Initialize the newly allocated values.
+	 */
+	for (i = cnt; i < new_cnt; i++)
+		wrapped[i] = false;
+
+	ptr->wrapped_cnt = new_cnt;
+	ptr->wrapped = wrapped;
+
+	return 0;
+}
+
+static bool arm_spe_buffer_has_wrapped(unsigned char *buffer,
+				      size_t buffer_size, u64 head)
+{
+	u64 i, watermark;
+	u64 *buf = (u64 *)buffer;
+	size_t buf_size = buffer_size;
+
+	/*
+	 * Defensively handle the case where head might be continually increasing - if its value is
+	 * equal to or greater than the size of the ring buffer, then we can safely determine it has
+	 * wrapped around. Otherwise, continue to detect if head might have wrapped.
+	 */
+	if (head >= buffer_size)
+		return true;
+
+	/*
+	 * We want to look at the very last 512 bytes (chosen arbitrarily) of the ring buffer.
+	 */
+	watermark = buf_size - 512;
+
+	/*
+	 * The value of head is somewhere within the size of the ring buffer. Either there hasn't
+	 * been enough data to fill the ring buffer yet, or the trace ran for so long that head has
+	 * numerically wrapped around.  To find out which, we check whether there is data at the
+	 * very end of the ring buffer.  We can reliably do this because mmap'ed pages are zeroed
+	 * out and there is a fresh mapping with every new session.
+	 */
+
+	/*
+	 * head is less than 512 bytes from the end of the ring buffer.
+	 */
+	if (head > watermark)
+		watermark = head;
+
+	/*
+	 * Speed things up by using 64 bit transactions (see "u64 *buf" above)
+	 */
+	watermark /= sizeof(u64);
+	buf_size /= sizeof(u64);
+
+	/*
+	 * If we find trace data at the end of the ring buffer, head has been there and has
+	 * numerically wrapped around at least once.
+	 */
+	for (i = watermark; i < buf_size; i++)
+		if (buf[i])
+			return true;
+
+	return false;
+}
+
+static int arm_spe_find_snapshot(struct auxtrace_record *itr, int idx,
+				  struct auxtrace_mmap *mm, unsigned char *data,
+				  u64 *head, u64 *old)
+{
+	int err;
+	bool wrapped;
+	struct arm_spe_recording *ptr =
+			container_of(itr, struct arm_spe_recording, itr);
+
+	/*
+	 * Allocate memory to keep track of wrapping if this is the first
+	 * time we deal with this *mm.
+	 */
+	if (idx >= ptr->wrapped_cnt) {
+		err = arm_spe_alloc_wrapped_array(ptr, idx);
+		if (err)
+			return err;
+	}
+
+	/*
+	 * Check to see if *head has wrapped around.  If it hasn't, only the
+	 * amount of data between *head and *old is snapshot'ed to avoid
+	 * bloating the perf.data file with zeros.  But as soon as *head has
+	 * wrapped around, the entire size of the AUX ring buffer is taken.
+	 */
+	wrapped = ptr->wrapped[idx];
+	if (!wrapped && arm_spe_buffer_has_wrapped(data, mm->len, *head)) {
+		wrapped = true;
+		ptr->wrapped[idx] = true;
+	}
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu size %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head, mm->len);
+
+	/*
+	 * No wrap has occurred, we can just use *head and *old.
+	 */
+	if (!wrapped)
+		return 0;
+
+	/*
+	 * *head has wrapped around - adjust *head and *old to pick up the
+	 * entire content of the AUX buffer.
+	 */
+	if (*head >= mm->len) {
+		*old = *head - mm->len;
+	} else {
+		*head += mm->len;
+		*old = *head - mm->len;
+	}
+
+	return 0;
+}
+
 static u64 arm_spe_reference(struct auxtrace_record *itr __maybe_unused)
 {
 	struct timespec ts;
@@ -313,6 +456,7 @@ static void arm_spe_recording_free(struct auxtrace_record *itr)
 	struct arm_spe_recording *sper =
 			container_of(itr, struct arm_spe_recording, itr);
 
+	free(sper->wrapped);
 	free(sper);
 }
 
@@ -336,6 +480,7 @@ struct auxtrace_record *arm_spe_recording_init(int *err,
 	sper->itr.pmu = arm_spe_pmu;
 	sper->itr.snapshot_start = arm_spe_snapshot_start;
 	sper->itr.snapshot_finish = arm_spe_snapshot_finish;
+	sper->itr.find_snapshot = arm_spe_find_snapshot;
 	sper->itr.parse_snapshot_options = arm_spe_parse_snapshot_options;
 	sper->itr.recording_options = arm_spe_recording_options;
 	sper->itr.info_priv_size = arm_spe_info_priv_size;
-- 
2.17.1
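
The wrap heuristic and the head/old adjustment from this patch can be tried in isolation with the sketch below. It is a simplified model, not the patch itself: the function names are hypothetical, the scan is done byte by byte rather than with u64 loads, and buffers smaller than the 512-byte window are handled by scanning from offset 0, which is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAIL_WINDOW 512	/* same arbitrary window size as in the patch */

/* Return true if non-zero data near the end of buf suggests head wrapped. */
static bool buffer_has_wrapped(const unsigned char *buf, size_t len,
			       uint64_t head)
{
	size_t start, i;

	/* A head at or beyond the buffer size has certainly wrapped. */
	if (head >= len)
		return true;

	/* Scan the last TAIL_WINDOW bytes, but never below head itself. */
	start = len > TAIL_WINDOW ? len - TAIL_WINDOW : 0;
	if (head > start)
		start = (size_t)head;

	for (i = start; i < len; i++)
		if (buf[i])
			return true;

	return false;
}

/* After a wrap, move *head and *old so they span exactly one buffer. */
static void select_whole_buffer(uint64_t *head, uint64_t *old, size_t len)
{
	if (*head < len)
		*head += len;
	*old = *head - len;
}
```

For a zeroed 4096-byte buffer with data only at offset 100 and head at 128, no wrap is reported. Once a non-zero byte appears at offset 4095, the same call reports a wrap, and select_whole_buffer() turns head 128 into head 4224 with old 128, so that head - old equals the buffer size.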



* [PATCH 5/5] perf arm-spe: Snapshot mode test
  2021-09-16 15:46 [PATCH 1/5] perf cs-etm: Print size using consistent format German Gomez
                   ` (2 preceding siblings ...)
  2021-09-16 15:46 ` [PATCH 4/5] perf arm-spe: Implement find_snapshot callback German Gomez
@ 2021-09-16 15:46 ` German Gomez
  2021-10-20 13:13   ` Leo Yan
  2021-09-23 13:35 ` [PATCH 1/5] perf cs-etm: Print size using consistent format Leo Yan
  2021-09-23 16:24 ` Mathieu Poirier
  5 siblings, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-09-16 15:46 UTC
  To: linux-kernel, linux-perf-users
  Cc: German Gomez, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mike Leach, linux-arm-kernel, coresight

Shell script test_arm_spe.sh has been added to test the recording of SPE
tracing events in snapshot mode.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/tests/shell/test_arm_spe.sh | 91 ++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100755 tools/perf/tests/shell/test_arm_spe.sh

diff --git a/tools/perf/tests/shell/test_arm_spe.sh b/tools/perf/tests/shell/test_arm_spe.sh
new file mode 100755
index 000000000000..9ed817e76f95
--- /dev/null
+++ b/tools/perf/tests/shell/test_arm_spe.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+# Check Arm SPE trace data recording and synthesized samples
+
+# Uses 'perf record' to record trace data of Arm SPE events, then
+# verifies that SPE event samples are generated, using the
+# 'perf script' and 'perf report' commands.
+
+# SPDX-License-Identifier: GPL-2.0
+# German Gomez <german.gomez@arm.com>, 2021
+
+skip_if_no_arm_spe_event() {
+	perf list | egrep -q 'arm_spe_[0-9]+//' && return 0
+
+	# arm_spe event doesn't exist
+	return 2
+}
+
+skip_if_no_arm_spe_event || exit 2
+
+perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+glb_err=0
+
+cleanup_files()
+{
+	rm -f ${perfdata}
+	trap - exit term int
+	kill -2 $$ # Forward sigint to parent
+	exit $glb_err
+}
+
+trap cleanup_files exit term int
+
+arm_spe_report() {
+	if [ $2 != 0 ]; then
+		echo "$1: FAIL"
+		glb_err=$2
+	else
+		echo "$1: PASS"
+	fi
+}
+
+perf_script_samples() {
+	echo "Looking at perf.data file for dumping samples:"
+
+	# from arm-spe.c/arm_spe_synth_events()
+	events="(l1d-miss|l1d-access|llc-miss|llc-access|tlb-miss|tlb-access|branch-miss|remote-access|memory)"
+
+	# Below is an example of the samples dumping:
+	#	dd  3048 [002]          1    l1d-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
+	#	dd  3048 [002]          1    tlb-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
+	#	dd  3048 [002]          1        memory:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
+	perf script -F,-time -i ${perfdata} 2>&1 | \
+		egrep " +$1 +[0-9]+ .* +${events}:(.*:)? +" > /dev/null 2>&1
+}
+
+perf_report_samples() {
+	echo "Looking at perf.data file for reporting samples:"
+
+	# Below is an example of the samples reporting:
+	#   73.04%    73.04%  dd    libc-2.27.so      [.] _dl_addr
+	#    7.71%     7.71%  dd    libc-2.27.so      [.] getenv
+	#    2.59%     2.59%  dd    ld-2.27.so        [.] strcmp
+	perf report --stdio -i ${perfdata} 2>&1 | \
+		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
+}
+
+arm_spe_snapshot_test() {
+	echo "Recording trace with snapshot mode $perfdata"
+	perf record -o ${perfdata} -e arm_spe// -S \
+		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
+	PERFPID=$!
+
+	# Wait for perf program
+	sleep 1
+
+	# Send signal to snapshot trace data
+	kill -USR2 $PERFPID
+
+	# Stop perf program
+	kill $PERFPID
+	wait $PERFPID
+
+	perf_script_samples dd &&
+	perf_report_samples dd
+
+	err=$?
+	arm_spe_report "SPE snapshot testing" $err
+}
+
+arm_spe_snapshot_test
+exit $glb_err
\ No newline at end of file
-- 
2.17.1
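
To see which 'perf script' lines the egrep pattern in perf_script_samples() accepts, the same extended regular expression can be exercised from C with POSIX regcomp(). This is a sketch for experimentation only: line_has_spe_sample() is a hypothetical helper, and the event names passed to it are whatever the caller chooses to test.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the pattern used by the test script for a given
 * command name and event alternation, and check one output line against it.
 */
static bool line_has_spe_sample(const char *line, const char *comm,
				const char *events)
{
	char pattern[512];
	regex_t re;
	bool match;

	snprintf(pattern, sizeof(pattern), " +%s +[0-9]+ .* +%s:(.*:)? +",
		 comm, events);

	/* REG_EXTENDED gives egrep-style syntax; REG_NOSUB skips submatches. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return false;

	match = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

Note that the pattern needs at least one space before the command name, and that an event is only matched if its exact name appears in the alternation. A line showing "l1d-access:" is therefore not matched by an alternation that only lists "ld1-access".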



* Re: [PATCH 1/5] perf cs-etm: Print size using consistent format
  2021-09-16 15:46 [PATCH 1/5] perf cs-etm: Print size using consistent format German Gomez
                   ` (3 preceding siblings ...)
  2021-09-16 15:46 ` [PATCH 5/5] perf arm-spe: Snapshot mode test German Gomez
@ 2021-09-23 13:35 ` Leo Yan
  2021-09-23 16:24 ` Mathieu Poirier
  5 siblings, 0 replies; 38+ messages in thread
From: Leo Yan @ 2021-09-23 13:35 UTC
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, Andrew Kilroy, John Garry,
	Will Deacon, Mathieu Poirier, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Thu, Sep 16, 2021 at 04:46:31PM +0100, German Gomez wrote:
> From: Andrew Kilroy <andrew.kilroy@arm.com>
> 
> Since the size is already printed earlier in hex, print the same data
> using the same format, in hex.
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>

Reviewed-by: Leo Yan <leo.yan@linaro.org>

> ---
>  tools/perf/util/cs-etm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index f323adb1af85..4f672f7d008c 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -537,7 +537,7 @@ static void cs_etm__dump_event(struct cs_etm_queue *etmq,
>  
>  	fprintf(stdout, "\n");
>  	color_fprintf(stdout, color,
> -		     ". ... CoreSight %s Trace data: size %zu bytes\n",
> +		     ". ... CoreSight %s Trace data: size %#zx bytes\n",
>  		     cs_etm_decoder__get_name(etmq->decoder), buffer->size);
>  
>  	do {
> -- 
> 2.17.1
> 


* Re: [PATCH 2/5] perf arm-spe: Print size using consistent format
  2021-09-16 15:46 ` [PATCH 2/5] perf arm-spe: " German Gomez
@ 2021-09-23 13:35   ` Leo Yan
  0 siblings, 0 replies; 38+ messages in thread
From: Leo Yan @ 2021-09-23 13:35 UTC
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, Andrew Kilroy, John Garry,
	Will Deacon, Mathieu Poirier, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Thu, Sep 16, 2021 at 04:46:32PM +0100, German Gomez wrote:
> From: Andrew Kilroy <andrew.kilroy@arm.com>
> 
> Since the size is already printed earlier in hex, print the same data
> using the same format, in hex.
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>

Reviewed-by: Leo Yan <leo.yan@linaro.org>

> ---
>  tools/perf/util/arm-spe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 58b7069c5a5f..2196291976d9 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -100,7 +100,7 @@ static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
>  	const char *color = PERF_COLOR_BLUE;
>  
>  	color_fprintf(stdout, color,
> -		      ". ... ARM SPE data: size %zu bytes\n",
> +		      ". ... ARM SPE data: size %#zx bytes\n",
>  		      len);
>  
>  	while (len) {
> -- 
> 2.17.1
> 


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-09-16 15:46 ` [PATCH 4/5] perf arm-spe: Implement find_snapshot callback German Gomez
@ 2021-09-23 13:50   ` Leo Yan
  2021-09-23 14:40     ` Leo Yan
  2021-10-17 12:05   ` Leo Yan
  1 sibling, 1 reply; 38+ messages in thread
From: Leo Yan @ 2021-09-23 13:50 UTC
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi German,

On Thu, Sep 16, 2021 at 04:46:34PM +0100, German Gomez wrote:
> The head pointer of the AUX buffer managed by the arm_spe_pmu.c driver
> is not monotonically increasing, therefore the find_snapshot callback is
> needed in order to find the trace data within the AUX buffer and avoid
> wasting space in the perf.data file.
> 
> The pointer is assumed to have wrapped if the buffer contains non-zero
> data at the end. If it has wrapped, the entire contents of the AUX
> buffer are stored in the perf.data file. Otherwise only the data up to
> the head pointer is stored.
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  tools/perf/arch/arm64/util/arm-spe.c | 145 +++++++++++++++++++++++++++
>  1 file changed, 145 insertions(+)
> 
> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> index f8b03d164b42..56785034fc84 100644
> --- a/tools/perf/arch/arm64/util/arm-spe.c
> +++ b/tools/perf/arch/arm64/util/arm-spe.c
> @@ -23,6 +23,7 @@
>  #include "../../../util/auxtrace.h"
>  #include "../../../util/record.h"
>  #include "../../../util/arm-spe.h"
> +#include <tools/libc_compat.h> // reallocarray
>  
>  #define KiB(x) ((x) * 1024)
>  #define MiB(x) ((x) * 1024 * 1024)
> @@ -31,6 +32,8 @@ struct arm_spe_recording {
>  	struct auxtrace_record		itr;
>  	struct perf_pmu			*arm_spe_pmu;
>  	struct evlist		*evlist;
> +	int			wrapped_cnt;
> +	bool			*wrapped;
>  };
>  
>  static void arm_spe_set_timestamp(struct auxtrace_record *itr,
> @@ -299,6 +302,146 @@ static int arm_spe_snapshot_finish(struct auxtrace_record *itr)
>  	return -EINVAL;
>  }
>  
> +static int arm_spe_alloc_wrapped_array(struct arm_spe_recording *ptr, int idx)
> +{
> +	bool *wrapped;
> +	int cnt = ptr->wrapped_cnt, new_cnt, i;
> +
> +	/*
> +	 * No need to allocate, so return early.
> +	 */
> +	if (idx < cnt)
> +		return 0;
> +
> +	/*
> +	 * Make ptr->wrapped as big as idx.
> +	 */
> +	new_cnt = idx + 1;
> +
> +	/*
> +	 * Free'ed in arm_spe_recording_free().
> +	 */
> +	wrapped = reallocarray(ptr->wrapped, new_cnt, sizeof(bool));
> +	if (!wrapped)
> +		return -ENOMEM;
> +
> +	/*
> +	 * init new allocated values.
> +	 */
> +	for (i = cnt; i < new_cnt; i++)
> +		wrapped[i] = false;
> +
> +	ptr->wrapped_cnt = new_cnt;
> +	ptr->wrapped = wrapped;
> +
> +	return 0;
> +}
> +
> +static bool arm_spe_buffer_has_wrapped(unsigned char *buffer,
> +				      size_t buffer_size, u64 head)
> +{
> +	u64 i, watermark;
> +	u64 *buf = (u64 *)buffer;
> +	size_t buf_size = buffer_size;
> +
> +	/*
> +	 * Defensively handle the case where head might be continually increasing - if its value is
> +	 * equal or greater than the size of the ring buffer, then we can safely determine it has
> +	 * wrapped around. Otherwise, continue to detect if head might have wrapped.
> +	 */
> +	if (head >= buffer_size)
> +		return true;
> +
> +	/*
> +	 * We want to look the very last 512 byte (chosen arbitrarily) in the ring buffer.
> +	 */
> +	watermark = buf_size - 512;
> +
> +	/*
> +	 * The value of head is somewhere within the size of the ring buffer. This can be that there
> +	 * hasn't been enough data to fill the ring buffer yet or the trace time was so long that
> +	 * head has numerically wrapped around.  To find we need to check if we have data at the
> +	 * very end of the ring buffer.  We can reliably do this because mmap'ed pages are zeroed
> +	 * out and there is a fresh mapping with every new session.
> +	 */
> +
> +	/*
> +	 * head is less than 512 byte from the end of the ring buffer.
> +	 */
> +	if (head > watermark)
> +		watermark = head;
> +
> +	/*
> +	 * Speed things up by using 64 bit transactions (see "u64 *buf" above)
> +	 */
> +	watermark /= sizeof(u64);
> +	buf_size /= sizeof(u64);
> +
> +	/*
> +	 * If we find trace data at the end of the ring buffer, head has been there and has
> +	 * numerically wrapped around at least once.
> +	 */
> +	for (i = watermark; i < buf_size; i++)
> +		if (buf[i])
> +			return true;
> +
> +	return false;
> +}
> +
> +static int arm_spe_find_snapshot(struct auxtrace_record *itr, int idx,
> +				  struct auxtrace_mmap *mm, unsigned char *data,
> +				  u64 *head, u64 *old)
> +{
> +	int err;
> +	bool wrapped;
> +	struct arm_spe_recording *ptr =
> +			container_of(itr, struct arm_spe_recording, itr);
> +
> +	/*
> +	 * Allocate memory to keep track of wrapping if this is the first
> +	 * time we deal with this *mm.
> +	 */
> +	if (idx >= ptr->wrapped_cnt) {
> +		err = arm_spe_alloc_wrapped_array(ptr, idx);
> +		if (err)
> +			return err;
> +	}
> +
> +	/*
> +	 * Check to see if *head has wrapped around.  If it hasn't only the
> +	 * amount of data between *head and *old is snapshot'ed to avoid
> +	 * bloating the perf.data file with zeros.  But as soon as *head has
> +	 * wrapped around the entire size of the AUX ring buffer it taken.
> +	 */
> +	wrapped = ptr->wrapped[idx];
> +	if (!wrapped && arm_spe_buffer_has_wrapped(data, mm->len, *head)) {
> +		wrapped = true;
> +		ptr->wrapped[idx] = true;
> +	}
> +
> +	pr_debug3("%s: mmap index %d old head %zu new head %zu size %zu\n",
> +		  __func__, idx, (size_t)*old, (size_t)*head, mm->len);
> +
> +	/*
> +	 * No wrap has occurred, we can just use *head and *old.
> +	 */
> +	if (!wrapped)
> +		return 0;
> +
> +	/*
> +	 * *head has wrapped around - adjust *head and *old to pickup the
> +	 * entire content of the AUX buffer.
> +	 */
> +	if (*head >= mm->len) {
> +		*old = *head - mm->len;
> +	} else {
> +		*head += mm->len;
> +		*old = *head - mm->len;
> +	}
> +
> +	return 0;
> +}
> +
>  static u64 arm_spe_reference(struct auxtrace_record *itr __maybe_unused)
>  {
>  	struct timespec ts;
> @@ -313,6 +456,7 @@ static void arm_spe_recording_free(struct auxtrace_record *itr)
>  	struct arm_spe_recording *sper =
>  			container_of(itr, struct arm_spe_recording, itr);
>  
> +	free(sper->wrapped);
>  	free(sper);
>  }
>  
> @@ -336,6 +480,7 @@ struct auxtrace_record *arm_spe_recording_init(int *err,
>  	sper->itr.pmu = arm_spe_pmu;
>  	sper->itr.snapshot_start = arm_spe_snapshot_start;
>  	sper->itr.snapshot_finish = arm_spe_snapshot_finish;
> +	sper->itr.find_snapshot = arm_spe_find_snapshot;

If I understand correctly, this patch copies the snapshot handling code
from cs-etm.  About 2 months ago, we removed the Arm cs-etm specific
snapshot callback function and now directly use perf's function
__auxtrace_mmap__read() to handle the 'head' and 'tail' pointers.  Please
see the commit for details:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f01c200d4405c4562e45e8bb4de44a5ce37b217

Before I review patches 03 and 04 for snapshot enabling in more detail,
could you confirm whether Arm SPE can handle snapshots the same way as
cs-etm?  From my understanding, this is a better way to handle the AUX
buffer's 'head' and 'tail'.

Thanks,
Leo

>  	sper->itr.parse_snapshot_options = arm_spe_parse_snapshot_options;
>  	sper->itr.recording_options = arm_spe_recording_options;
>  	sper->itr.info_priv_size = arm_spe_info_priv_size;
> -- 
> 2.17.1
> 


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-09-23 13:50   ` Leo Yan
@ 2021-09-23 14:40     ` Leo Yan
  2021-09-30 12:26       ` German Gomez
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Yan @ 2021-09-23 14:40 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Thu, Sep 23, 2021 at 09:50:16PM +0800, Leo Yan wrote:

[...]

> > @@ -336,6 +480,7 @@ struct auxtrace_record *arm_spe_recording_init(int *err,
> >  	sper->itr.pmu = arm_spe_pmu;
> >  	sper->itr.snapshot_start = arm_spe_snapshot_start;
> >  	sper->itr.snapshot_finish = arm_spe_snapshot_finish;
> > +	sper->itr.find_snapshot = arm_spe_find_snapshot;
> 
> If I understand correctly, this patch copies the code from cs-etm for
> snapshot handling.  About 2 months ago, we removed the Arm cs-etm's
> specific snapshot callback function and directly use perf's function
> __auxtrace_mmap__read() to handle 'head' and 'tail' pointers.  Please
> see the commit for details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f01c200d4405c4562e45e8bb4de44a5ce37b217
> 
> Before I review more details for snapshot enabling in patches 03 and
> 04, could you confirm if Arm SPE can use the same way with cs-etm for
> snapshot handling?  From my understanding, this is a better way to
> handle AUX buffer's 'head' and 'tail'.

In other words, if we can apply only patch 03 and still pass the test
in patch 05, then it would be a very neat implementation.

I will try to verify these patches and will report back with the results.

Thanks,
Leo


* Re: [PATCH 1/5] perf cs-etm: Print size using consistent format
  2021-09-16 15:46 [PATCH 1/5] perf cs-etm: Print size using consistent format German Gomez
                   ` (4 preceding siblings ...)
  2021-09-23 13:35 ` [PATCH 1/5] perf cs-etm: Print size using consistent format Leo Yan
@ 2021-09-23 16:24 ` Mathieu Poirier
  2021-09-30 12:09   ` German Gomez
  5 siblings, 1 reply; 38+ messages in thread
From: Mathieu Poirier @ 2021-09-23 16:24 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, Andrew Kilroy, John Garry,
	Will Deacon, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi German,

On Thu, Sep 16, 2021 at 04:46:31PM +0100, German Gomez wrote:
> From: Andrew Kilroy <andrew.kilroy@arm.com>
> 
> Since the size is already printed earlier in hex, print the same data
> using the same format, in hex.
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index f323adb1af85..4f672f7d008c 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -537,7 +537,7 @@ static void cs_etm__dump_event(struct cs_etm_queue *etmq,
>  
>  	fprintf(stdout, "\n");
>  	color_fprintf(stdout, color,
> -		     ". ... CoreSight %s Trace data: size %zu bytes\n",
> +		     ". ... CoreSight %s Trace data: size %#zx bytes\n",

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

A few things to improve for your next interactions with the Linux community:

1) Using a cover letter, even for small changes, is always a good idea.
2) RB tags should be picked up publicly rather than done internally and added to
a patchset.
3) Keep patches semantically grouped.  Here patches 04 and 05 have nothing to do
with 01, 02 and 03.

Moreover, Arnaldo queues changes to the perf tools but I don't see him CC'ed on
this patchset.  As such he will not see your work.  Ask James about how to
proceed when submitting patches to the perf tools.

Thanks,
Mathieu

>  		     cs_etm_decoder__get_name(etmq->decoder), buffer->size);
>  
>  	do {
> -- 
> 2.17.1
> 


* Re: [PATCH 1/5] perf cs-etm: Print size using consistent format
  2021-09-23 16:24 ` Mathieu Poirier
@ 2021-09-30 12:09   ` German Gomez
  2021-09-30 16:30     ` Mathieu Poirier
  0 siblings, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-09-30 12:09 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: linux-kernel, linux-perf-users, Andrew Kilroy, John Garry,
	Will Deacon, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi Mathieu,

Thanks for your feedback. I will keep these points in mind for future 
submissions.

On 23/09/2021 17:24, Mathieu Poirier wrote:
> Hi German,
>
> On Thu, Sep 16, 2021 at 04:46:31PM +0100, German Gomez wrote:
>> [...]
> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
>
> A couple of things to improve for your next interactions with the Linux community:
>
> 1) Using a cover letter, even for small changes, is always a good idea.
> 2) RB tags should be picked up publicly rather than done internally and added to
> a patchset.
> 3) Keep patches semantically grouped.  Here patches 04 and 05 have nothing to do
> with 01, 02 and 03.
Did you perhaps mean separating 01 and 02 from the rest? I grouped 03 to
05 because they were related to snapshot mode.

Thanks,
German

>
> Moreover Arnaldo queues changes to the perf tools but I don't see him CC'ed to
> this patchset.  As such he will not see your work.  Ask James about how to
> proceed when submitting patches to the perf tools.
>
> Thanks,
> Mathieu
>
>>   		     cs_etm_decoder__get_name(etmq->decoder), buffer->size);
>>   
>>   	do {
>> -- 
>> 2.17.1
>>


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-09-23 14:40     ` Leo Yan
@ 2021-09-30 12:26       ` German Gomez
  2021-10-04 12:27         ` Leo Yan
  0 siblings, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-09-30 12:26 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi Leo,

On 23/09/2021 15:40, Leo Yan wrote:
> On Thu, Sep 23, 2021 at 09:50:16PM +0800, Leo Yan wrote:
>
> [...]
>
>>> @@ -336,6 +480,7 @@ struct auxtrace_record *arm_spe_recording_init(int *err,
>>>   	sper->itr.pmu = arm_spe_pmu;
>>>   	sper->itr.snapshot_start = arm_spe_snapshot_start;
>>>   	sper->itr.snapshot_finish = arm_spe_snapshot_finish;
>>> +	sper->itr.find_snapshot = arm_spe_find_snapshot;
>> If I understand correctly, this patch copies the code from cs-etm for
>> snapshot handling.  About 2 months ago, we removed the Arm cs-etm's
>> specific snapshot callback function and directly use perf's function
>> __auxtrace_mmap__read() to handle 'head' and 'tail' pointers.  Please
>> see the commit for details:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f01c200d4405c4562e45e8bb4de44a5ce37b217
>>
>> Before I review more details for snapshot enabling in patches 03 and
>> 04, could you confirm if Arm SPE can use the same way with cs-etm for
>> snapshot handling?  From my understanding, this is a better way to
>> handle AUX buffer's 'head' and 'tail'.
> In other words, if we can only apply patch 03 and can pass the testing
> in patch 05, then it would be a very neat implementation.
>
> I will try to verify these patches and will get back result.
>
> Thanks,
> Leo
The patch is indeed based on that commit. The reason behind it is that the
values for *head are being wrapped in the driver side (see the macro
PERF_IDX2OFF which is used at various points in
/drivers/perf/arm_spe_pmu.c).

If this callback is not to be added, I believe the driver needs to be
updated first so that the head pointer monotonically increases like in
cs-etm. Do you think this makes sense for SPE?

(note that the patch will skip the wrap-around detection if this is the
case, in order to handle both cases in the userspace perf tool).
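
As a rough illustration of the difference (a user-space sketch, not the
driver or perf code; wrap_idx() and widen_to_full_buffer() are
hypothetical names), the first helper mimics a head that is reduced
modulo the buffer size, which is what PERF_IDX2OFF is assumed to do
here, and the second mirrors the head/old adjustment at the end of the
find_snapshot callback in this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Reduce an index modulo the buffer length (assumed model of a wrapped head). */
static uint64_t wrap_idx(uint64_t idx, uint64_t buf_len)
{
	assert(buf_len != 0);
	return idx % buf_len;
}

/*
 * Once a wrap-around has been detected, make [*old, *head) span exactly
 * one full buffer so the whole AUX area gets copied.  This follows the
 * if/else at the end of the callback in the patch.
 */
static void widen_to_full_buffer(uint64_t *head, uint64_t *old, uint64_t buf_len)
{
	if (*head < buf_len)
		*head += buf_len;
	*old = *head - buf_len;
}
```

With a 4096-byte buffer, wrap_idx(100, 4096) and wrap_idx(4196, 4096)
both give 100, so the number of wraps is lost once the head is wrapped
and userspace has to infer it some other way.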

Thanks,
German



* Re: [PATCH 1/5] perf cs-etm: Print size using consistent format
  2021-09-30 12:09   ` German Gomez
@ 2021-09-30 16:30     ` Mathieu Poirier
  0 siblings, 0 replies; 38+ messages in thread
From: Mathieu Poirier @ 2021-09-30 16:30 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, Andrew Kilroy, John Garry,
	Will Deacon, Leo Yan, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Thu, Sep 30, 2021 at 01:09:16PM +0100, German Gomez wrote:
> Hi Mathieu,
> 
> Thanks for your feedback. I will keep these points in mind for future
> submissions.
> 
> On 23/09/2021 17:24, Mathieu Poirier wrote:
> > Hi German,
> > 
> > On Thu, Sep 16, 2021 at 04:46:31PM +0100, German Gomez wrote:
> > > [...]
> > Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> > 
> > A couple of things to improve for your next interactions with the Linux community:
> > 
> > 1) Using a cover letter, even for small changes, is always a good idea.
> > 2) RB tags should be picked up publicly rather than done internally and added to
> > a patchset.
> > 3) Keep patches semantically grouped.  Here patches 04 and 05 have nothing to do
> > with 01, 02 and 03.
> Did you perhaps mean separating 01 and 02 from the rest? I grouped 03 to 05
> because they were related to snapshot mode.

Yes - you are correct.  It should have been 01 and 02 in one set and the rest in
another set.

> 
> Thanks,
> German
> 
> > 
> > Moreover Arnaldo queues changes to the perf tools but I don't see him CC'ed to
> > this patchset.  As such he will not see your work.  Ask James about how to
> > proceed when submitting patches to the perf tools.
> > 
> > Thanks,
> > Mathieu
> > 
> > >   		     cs_etm_decoder__get_name(etmq->decoder), buffer->size);
> > >   	do {
> > > -- 
> > > 2.17.1
> > > 


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-09-30 12:26       ` German Gomez
@ 2021-10-04 12:27         ` Leo Yan
  2021-10-06  9:35           ` German Gomez
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Yan @ 2021-10-04 12:27 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi German,

On Thu, Sep 30, 2021 at 01:26:15PM +0100, German Gomez wrote:

[...]

> The patch is indeed based on that commit. The reason behind it is that the
> values for *head are being wrapped in the driver side (see the macro
> PERF_IDX2OFF which is used at various points in
> /drivers/perf/arm_spe_pmu.c).

Yes, I noticed that the Arm SPE driver doesn't use a monotonically
increasing AUX head.

> If this callback is not to be added, I believe the driver needs to be
> updated first so that the head pointer monotonically increases like in
> cs-etm. Do you think this makes sense for SPE?

Please note that there are two cases to handle in snapshot mode:
- The wrap-around case: function __auxtrace_mmap__read() already
  handles this case, see [1];
- The overrun case: e.g. the kernel receives multiple signals and takes
  snapshots, saving Arm SPE trace data into the AUX buffer multiple
  times, but the userspace tool cannot keep up with saving the AUX
  data into the perf.data file.  The AUX head might then wrap around
  multiple times; for this case, I think a monotonically increasing
  AUX head is the right solution to handle the overrun issue.

So, simply put, I think a monotonically increasing head pointer is
the right thing to do in the Arm SPE driver.
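
The overrun case can be sketched as follows.  This is only a user-space
illustration under the assumption that both 'old' and 'head' are
free-running 64-bit byte counters; struct snap_range and
snapshot_range() are hypothetical names, not perf or driver code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct snap_range {
	uint64_t start;   /* first byte position still valid in the buffer */
	uint64_t size;    /* number of bytes to copy out */
	bool overrun;     /* true if older data was overwritten */
};

/*
 * With a monotonically increasing head, the distance head - old says how
 * much was written since the last read.  If it exceeds the buffer length,
 * the writer lapped the reader and only the newest buf_len bytes survive.
 */
static struct snap_range snapshot_range(uint64_t old, uint64_t head,
					uint64_t buf_len)
{
	struct snap_range r = { .start = old, .size = head - old,
				.overrun = false };

	assert(buf_len != 0 && head >= old);

	if (r.size > buf_len) {
		r.overrun = true;
		r.size = buf_len;
		r.start = head - buf_len;
	}
	return r;
}
```

With a wrapped head the subtraction above is not meaningful, because
every multiple of the buffer length maps to the same value.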

> (note that the patch will skip the wrap-around detection if this is the
> case, in order to handle both cases in the userspace perf tool).

I mostly agree, but I read this several times and still don't
understand what "both cases" refers to in the last sentence.

Please let me know if anything is not clear.

Thanks,
Leo

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/auxtrace.c#n1804


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-04 12:27         ` Leo Yan
@ 2021-10-06  9:35           ` German Gomez
  2021-10-06  9:51             ` Leo Yan
  0 siblings, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-10-06  9:35 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi Leo,

Many thanks for your comments.

On 04/10/2021 13:27, Leo Yan wrote:

> Hi German,
>
> On Thu, Sep 30, 2021 at 01:26:15PM +0100, German Gomez wrote:
>
> [...]
>
> So simply say, I think the head pointer monotonically increasing is
> the right thing to do in Arm SPE driver.

I will talk to James about how we can proceed on this.

>> (note that the patch will skip the wrap-around detection if this is the
>> case, in order to handle both cases in the userspace perf tool).
> Almost agree, I read multiple times but have no idea what's the
> "both cases" in the last sentence.

Apologies, the latter part was not clear. What I meant to say was that
the original patch for cs-etm seemed to handle both cases, where the
AUX head might be monotonically or non-monotonically increasing, so we
applied the same for the Arm SPE patch.

>
> Please let me know if anything is not clear.
>
> Thanks,
> Leo
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/auxtrace.c#n1804


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-06  9:35           ` German Gomez
@ 2021-10-06  9:51             ` Leo Yan
  2021-10-11 15:55               ` German Gomez
  0 siblings, 1 reply; 38+ messages in thread
From: Leo Yan @ 2021-10-06  9:51 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Wed, Oct 06, 2021 at 10:35:20AM +0100, German Gomez wrote:

[...]

> > So simply say, I think the head pointer monotonically increasing is
> > the right thing to do in Arm SPE driver.
> 
> I will talk to James about how we can proceed on this.

Thanks!

> >> (note that the patch will skip the wrap-around detection if this is the
> >> case, in order to handle both cases in the userspace perf tool).
> > Almost agree, I read multiple times but have no idea what's the
> > "both cases" in the last sentence.
> 
> Apologies for the later part was not clear. What I meant to say was that
> in the original patch for cs-etm, it seemed to handle both cases where
> AUX head might be monotonically and non-monotonically increasing, so we
> applied the same for the Arm SPE patch.

No worries, thanks for the explanation.

Leo


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-06  9:51             ` Leo Yan
@ 2021-10-11 15:55               ` German Gomez
  2021-10-12  8:19                 ` Will Deacon
  2021-10-13  0:39                 ` Leo Yan
  0 siblings, 2 replies; 38+ messages in thread
From: German Gomez @ 2021-10-11 15:55 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi Leo,

On 06/10/2021 10:51, Leo Yan wrote:
> On Wed, Oct 06, 2021 at 10:35:20AM +0100, German Gomez wrote:
>
> [...]
>
>>> So simply say, I think the head pointer monotonically increasing is
>>> the right thing to do in Arm SPE driver.
>> I will talk to James about how we can proceed on this.
> Thanks!

I took this offline with James and, though it looks possible to patch
the SPE driver to have a monotonically increasing head pointer in order
to simplify the handling in the perf tool, it could be a breaking change
for users of the perf_event_open syscall that currently rely on the way
it works now.

An alternative way we considered to simplify the patch is to change the
logic inside the find_snapshot callback so that it records the entire
contents of the aux buffer every time.

What do you think?

Many thanks,
German


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-11 15:55               ` German Gomez
@ 2021-10-12  8:19                 ` Will Deacon
  2021-10-12  8:47                   ` James Clark
  2021-10-13  0:39                 ` Leo Yan
  1 sibling, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-10-12  8:19 UTC (permalink / raw)
  To: German Gomez
  Cc: Leo Yan, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

On Mon, Oct 11, 2021 at 04:55:37PM +0100, German Gomez wrote:
> Hi Leo,
> 
> On 06/10/2021 10:51, Leo Yan wrote:
> > On Wed, Oct 06, 2021 at 10:35:20AM +0100, German Gomez wrote:
> >
> > [...]
> >
> >>> So simply say, I think the head pointer monotonically increasing is
> >>> the right thing to do in Arm SPE driver.
> >> I will talk to James about how we can proceed on this.
> > Thanks!
> 
> I took this offline with James and, though it looks possible to patch
> the SPE driver to have a monotonically increasing head pointer in order
> to simplify the handling in the perf tool, it could be a breaking change
> for users of the perf_event_open syscall that currently rely on the way
> it works now.
> 
> An alternative way we considered to simplify the patch is to change the
> logic inside the find_snapshot callback so that it records the entire
> contents of the aux buffer every time.
> 
> What do you think?

What does intel-pt do?

Will


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-12  8:19                 ` Will Deacon
@ 2021-10-12  8:47                   ` James Clark
  0 siblings, 0 replies; 38+ messages in thread
From: James Clark @ 2021-10-12  8:47 UTC (permalink / raw)
  To: Will Deacon, German Gomez
  Cc: Leo Yan, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight



On 12/10/2021 09:19, Will Deacon wrote:
> On Mon, Oct 11, 2021 at 04:55:37PM +0100, German Gomez wrote:
>> Hi Leo,
>>
>> On 06/10/2021 10:51, Leo Yan wrote:
>>> On Wed, Oct 06, 2021 at 10:35:20AM +0100, German Gomez wrote:
>>>
>>> [...]
>>>
>>>>> So simply say, I think the head pointer monotonically increasing is
>>>>> the right thing to do in Arm SPE driver.
>>>> I will talk to James about how we can proceed on this.
>>> Thanks!
>>
>> I took this offline with James and, though it looks possible to patch
>> the SPE driver to have a monotonically increasing head pointer in order
>> to simplify the handling in the perf tool, it could be a breaking change
>> for users of the perf_event_open syscall that currently rely on the way
>> it works now.
>>
>> An alternative way we considered to simplify the patch is to change the
>> logic inside the find_snapshot callback so that it records the entire
>> contents of the aux buffer every time.
>>
>> What do you think?
> 
> What does intel-pt do?

Intel-pt has a wrapped head, which is why it has the intel_pt_find_snapshot()
function in perf to try to not save any zeros from the buffer that haven't
been written yet. (With a wrapped head pointer it's impossible to tell).

Coresight has a monotonically increasing head pointer so it is possible to
tell. Recently, Leo removed the Coresight version of find_snapshot() for this
reason.

It would be nice to do the same for SPE because that function has a heuristic
and is also slow, but I imagine that not returning wrapped head pointers could
break anything that expects them.
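
A much simplified sketch of that kind of heuristic is below.  It is not
the code of intel_pt_find_snapshot(); tail_has_data() is a hypothetical
helper, and it assumes the AUX buffer starts out zero-filled, so any
non-zero byte at or after the wrapped head offset can only be there if
the buffer has wrapped at least once:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Guess whether the buffer has wrapped by looking for data in the region
 * [head_off, len), which a writer that has never wrapped cannot have
 * touched yet.  Returns true if any byte there is non-zero.
 */
static bool tail_has_data(const uint8_t *buf, size_t len, size_t head_off)
{
	assert(buf != NULL && head_off <= len);

	for (size_t i = head_off; i < len; i++) {
		if (buf[i] != 0)
			return true;
	}
	return false;
}
```

In the worst case the scan touches everything after the head, and trace
data that happens to be all zeros would defeat it, which is the kind of
cost and uncertainty a monotonically increasing head avoids.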

James
 
> 
> Will
> 


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-11 15:55               ` German Gomez
  2021-10-12  8:19                 ` Will Deacon
@ 2021-10-13  0:39                 ` Leo Yan
  2021-10-13  7:51                   ` Will Deacon
  1 sibling, 1 reply; 38+ messages in thread
From: Leo Yan @ 2021-10-13  0:39 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi German,

On Mon, Oct 11, 2021 at 04:55:37PM +0100, German Gomez wrote:
> Hi Leo,
> 
> On 06/10/2021 10:51, Leo Yan wrote:
> > On Wed, Oct 06, 2021 at 10:35:20AM +0100, German Gomez wrote:
> >
> > [...]
> >
> >>> So simply say, I think the head pointer monotonically increasing is
> >>> the right thing to do in Arm SPE driver.
> >> I will talk to James about how we can proceed on this.
> > Thanks!
> 
> I took this offline with James and, though it looks possible to patch
> the SPE driver to have a monotonically increasing head pointer in order
> to simplify the handling in the perf tool, it could be a breaking change
> for users of the perf_event_open syscall that currently rely on the way
> it works now.

Here I cannot see the connection between the AUX head pointer and
breaking callers of perf_event_open().

Could you elaborate on why a monotonically increasing head pointer
would break users of perf_event_open()?

> An alternative way we considered to simplify the patch is to change the
> logic inside the find_snapshot callback so that it records the entire
> contents of the aux buffer every time.
> 
> What do you think?

We cannot do it this way.  If we send the USR2 signal at very short
intervals, the hardware trace data may not fill the whole AUX buffer.
You could use the commands below for testing and should observe that
they produce a small chunk of trace data:
  
  perf record -e arm_spe_0// -S -a -- dd if=/dev/zero of=/dev/null &
  PERFPID=$!
  sleep 1
  kill -USR2 $PERFPID
  sleep .1
  kill -USR2 $PERFPID

Thanks,
Leo


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-13  0:39                 ` Leo Yan
@ 2021-10-13  7:51                   ` Will Deacon
  2021-10-15 12:33                     ` German Gomez
  0 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-10-13  7:51 UTC (permalink / raw)
  To: Leo Yan
  Cc: German Gomez, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

On Wed, Oct 13, 2021 at 08:39:16AM +0800, Leo Yan wrote:
> On Mon, Oct 11, 2021 at 04:55:37PM +0100, German Gomez wrote:
> > On 06/10/2021 10:51, Leo Yan wrote:
> > > On Wed, Oct 06, 2021 at 10:35:20AM +0100, German Gomez wrote:
> > >
> > > [...]
> > >
> > >>> So simply say, I think the head pointer monotonically increasing is
> > >>> the right thing to do in Arm SPE driver.
> > >> I will talk to James about how we can proceed on this.
> > > Thanks!
> > 
> > I took this offline with James and, though it looks possible to patch
> > the SPE driver to have a monotonically increasing head pointer in order
> > to simplify the handling in the perf tool, it could be a breaking change
> > for users of the perf_event_open syscall that currently rely on the way
> > it works now.
> 
> Here I cannot create the connection between AUX head pointer and the
> breakage of calling perf_event_open().
> 
> Could you elaborate what's the reason the monotonical increasing head
> pointer will lead to the breakage for perf_event_open()?

It's a user-visible change in behaviour, isn't it? Therefore we risk
breaking applications that rely on the current behaviour if we change it
unconditionally.

Given that the driver has always worked like this and it doesn't sound like
it's the end of the world to deal with it in userspace (after all, it's
aligned with intel-pt), then I don't think we should change it.

Will


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-13  7:51                   ` Will Deacon
@ 2021-10-15 12:33                     ` German Gomez
  2021-10-15 14:16                       ` Leo Yan
  2021-10-17  6:13                       ` Leo Yan
  0 siblings, 2 replies; 38+ messages in thread
From: German Gomez @ 2021-10-15 12:33 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi Leo,

Would you be ok with the current patch the way it is? In case it's of
any help, I'm sharing the testing steps that James and I went through
when testing this internally, if you want to add to them.

- Test that only a portion of the buffer is saved until there is a wraparound

$ ./perf record -vvv -e arm_spe/period=148576/u -S -- taskset --cpu-list 0 stress --cpu 1 & while true; do sleep 0.2; killall -s USR2 perf; done

- Test snapshot mode in CPU mode

$ sudo ./perf record -vvv -C 0 -e arm_spe/period=148576/u -S -- taskset --cpu-list 0 stress --cpu 1 &

- Test that auxtrace buffers correspond to an aux record
- Test snapshot default sizes in sudo and user modes
- Test small snapshot size

$ ./perf record -vvv -e arm_spe/period=148576/u -S1000 -m16,16 -- taskset --cpu-list 0 stress --cpu 1 &

If there are any concerns with the patches, please let me know and I
will try to address them.

Thanks,
German

On 13/10/2021 08:51, Will Deacon wrote:
> On Wed, Oct 13, 2021 at 08:39:16AM +0800, Leo Yan wrote:
>> On Mon, Oct 11, 2021 at 04:55:37PM +0100, German Gomez wrote:
>>> On 06/10/2021 10:51, Leo Yan wrote:
>>>> On Wed, Oct 06, 2021 at 10:35:20AM +0100, German Gomez wrote:
>>>>
>>>> [...]
>>>>
>>>>>> So simply say, I think the head pointer monotonically increasing is
>>>>>> the right thing to do in Arm SPE driver.
>>>>> I will talk to James about how we can proceed on this.
>>>> Thanks!
>>> I took this offline with James and, though it looks possible to patch
>>> the SPE driver to have a monotonically increasing head pointer in order
>>> to simplify the handling in the perf tool, it could be a breaking change
>>> for users of the perf_event_open syscall that currently rely on the way
>>> it works now.
>> Here I cannot create the connection between AUX head pointer and the
>> breakage of calling perf_event_open().
>>
>> Could you elaborate what's the reason the monotonical increasing head
>> pointer will lead to the breakage for perf_event_open()?
> It's a user-visible change in behaviour, isn't it? Therefore we risk
> breaking applications that rely on the current behaviour if we change it
> unconditionally.
>
> Given that the driver has always worked like this and it doesn't sound like
> it's the end of the world to deal with it in userspace (after all, it's
> aligned with intel-pt), then I don't think we should change it.
>
> Will


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-15 12:33                     ` German Gomez
@ 2021-10-15 14:16                       ` Leo Yan
  2021-10-15 14:41                         ` German Gomez
  2021-10-17  6:13                       ` Leo Yan
  1 sibling, 1 reply; 38+ messages in thread
From: Leo Yan @ 2021-10-15 14:16 UTC (permalink / raw)
  To: German Gomez
  Cc: Will Deacon, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi German,

On Fri, Oct 15, 2021 at 01:33:39PM +0100, German Gomez wrote:
> Hi Leo,
> 
> Would you be ok with the current patch the way it is?

Sorry for being slow to catch up on the discussion.

As you and Will have mentioned in other emails that changing to a
monotonically increasing head would lead to breakage, I read the code
and realized how difficult it is to use a monotonically increasing head
in the Arm SPE driver.  So let's go with the approach in this patch set.

> In case it's of
> any help, I'm sharing the testing steps that James and I went through
> when testing this internally, if you want to add to it
> 
> - Test that only a portion of the buffer is saved until there is a wraparound
> 
> $ ./perf record -vvv -e arm_spe/period=148576/u -S -- taskset --cpu-list 0 stress --cpu 1 & while true; do sleep 0.2; killall -s USR2 perf; done
> 
> - Test snapshot mode in CPU mode
> 
> $ sudo ./perf record -vvv -C 0 -e arm_spe/period=148576/u -S -- taskset --cpu-list 0 stress --cpu 1 &
> 
> - Test that auxtrace buffers correspond to an aux record
> - Test snapshot default sizes in sudo and user modes
> - Test small snapshot size
> 
> $ ./perf record -vvv -e arm_spe/period=148576/u -S1000 -m16,16 -- taskset --cpu-list 0 stress --cpu 1 &
> 
> If there are any concerns with the patches, please let me know and I
> will try to address them.

Thanks for sharing the test cases.  Could you give me a bit more time
to test on my side?  I might give some comments if I think it's
necessary.

Thanks,
Leo


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-15 14:16                       ` Leo Yan
@ 2021-10-15 14:41                         ` German Gomez
  0 siblings, 0 replies; 38+ messages in thread
From: German Gomez @ 2021-10-15 14:41 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark


On 15/10/2021 15:16, Leo Yan wrote:
> Hi German,
>
> On Fri, Oct 15, 2021 at 01:33:39PM +0100, German Gomez wrote:
>
> [...]
>
> Thanks for sharing the testing cases.  Could give me a bit more time for
> the test at my side?  And please expect I might give some comments if
> I think it's necessary.
>
> Thanks,
> Leo


Absolutely. Please take the time you need.

Many thanks,
German




* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-15 12:33                     ` German Gomez
  2021-10-15 14:16                       ` Leo Yan
@ 2021-10-17  6:13                       ` Leo Yan
  2021-10-19  9:23                         ` German Gomez
  2021-11-02 11:02                         ` German Gomez
  1 sibling, 2 replies; 38+ messages in thread
From: Leo Yan @ 2021-10-17  6:13 UTC (permalink / raw)
  To: German Gomez
  Cc: Will Deacon, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi German, Will,

On Fri, Oct 15, 2021 at 01:33:39PM +0100, German Gomez wrote:

[...]

> $ ./perf record -vvv -e arm_spe/period=148576/u -S1000 -m16,16 -- taskset --cpu-list 0 stress --cpu 1 &

When testing Arm SPE snapshot mode with the command below (it's quite
similar to the command above but not exactly the same):

# ./perf --debug verbose=3 record -e arm_spe/period=148576/u -C 0 -S1000 -m16,16 \
    -- taskset --cpu-list 0 stress --cpu 1 &
# kill -USR2 [pid_num]

... then I waited for a long time without stopping the perf program, and
I observed that the output file contains many redundant
PERF_RECORD_AUX events.  E.g. with the shared perf data file [1], you
can use the commands below to see tons of PERF_RECORD_AUX events, even
though I sent only one USR2 signal for taking a snapshot:

  # perf report -D -i perf.data --stdio | grep -E 'RECORD_AUX' | wc -l
  2245787

  # perf report -D -i perf.data --stdio | grep -E 'SPE'
  . ... ARM SPE data: size 0x3e8 bytes
  Binary file (standard input) matches

I looked into the Arm SPE driver and found that it doesn't really support
free-run mode for the AUX ring buffer when the driver runs in snapshot
mode: the paired functions perf_aux_output_end() and
perf_aux_output_begin() are invoked every time the interrupt is
handled.  The detailed flow is:

  arm_spe_pmu_irq_handler()
    `> arm_spe_pmu_buf_get_fault_act()
         `> arm_spe_perf_aux_output_end()
              `> set SPE registers
              `> perf_aux_output_end()
    `> arm_spe_perf_aux_output_begin()
         `> perf_aux_output_begin()
         `> set SPE registers

It seems to me that a possible solution is to add an extra parameter 'int
in_interrupt' to the functions arm_spe_perf_aux_output_end() and
arm_spe_perf_aux_output_begin().  If this parameter is passed as 1 in
the interrupt handler, these two functions should skip invoking
perf_aux_output_end() and perf_aux_output_begin(), which avoids the
redundant PERF_RECORD_AUX events.

  arm_spe_pmu_irq_handler()
    `> arm_spe_pmu_buf_get_fault_act()
         `> arm_spe_perf_aux_output_end(..., in_interrupt=1)
              `> set SPE registers
    `> arm_spe_perf_aux_output_begin(..., in_interrupt=1)
         `> set SPE registers
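
To illustrate the effect of the proposed flow, here is a small user-space
model.  It is not kernel code: struct model_spe, its counter and every
model_* function are hypothetical names made up for this sketch.  The only
thing taken from the flow above is that each end/begin pair run from the
interrupt handler costs one PERF_RECORD_AUX record unless in_interrupt
is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of this is the driver's real API. */
struct model_spe {
	uint64_t aux_records;	/* PERF_RECORD_AUX events emitted so far */
	bool started;		/* whether an AUX transaction is open */
};

/* Models arm_spe_perf_aux_output_begin(..., in_interrupt). */
static void model_output_begin(struct model_spe *spe, int in_interrupt)
{
	if (!in_interrupt)
		spe->started = true;	/* stands in for perf_aux_output_begin() */
	/* "set SPE registers" would happen here in both cases */
}

/* Models arm_spe_perf_aux_output_end(..., in_interrupt). */
static void model_output_end(struct model_spe *spe, int in_interrupt)
{
	/* "set SPE registers" would happen here in both cases */
	if (!in_interrupt) {
		assert(spe->started);
		spe->started = false;
		spe->aux_records++;	/* stands in for perf_aux_output_end() */
	}
}

/* Models one buffer-management interrupt in snapshot mode. */
static void model_irq(struct model_spe *spe, int in_interrupt)
{
	model_output_end(spe, in_interrupt);
	model_output_begin(spe, in_interrupt);
}

/* Run nr_irqs interrupts and return how many AUX records they produced. */
static uint64_t model_run(unsigned int nr_irqs, int in_interrupt)
{
	struct model_spe spe = { 0 };
	unsigned int i;

	model_output_begin(&spe, 0);	/* PMU event start */
	for (i = 0; i < nr_irqs; i++)
		model_irq(&spe, in_interrupt);
	return spe.aux_records;
}
```

In this model, model_run(n, 0) yields n records (the current behaviour)
and model_run(n, 1) yields none, which is the saving the extra parameter
is meant to provide.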

P.S. I think Intel-PT supports free-run mode for snapshot mode, so it
should not generate interrupts in this mode.  Thus Intel-PT avoids
this issue; please see the code [2].

Thanks,
Leo

[1] https://people.linaro.org/~leo.yan/spe/snapshot_test/perf.data
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/pt.c#n753

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-09-16 15:46 ` [PATCH 4/5] perf arm-spe: Implement find_snapshot callback German Gomez
  2021-09-23 13:50   ` Leo Yan
@ 2021-10-17 12:05   ` Leo Yan
  2021-10-17 12:36     ` Leo Yan
  2021-10-19 17:34     ` German Gomez
  1 sibling, 2 replies; 38+ messages in thread
From: Leo Yan @ 2021-10-17 12:05 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Thu, Sep 16, 2021 at 04:46:34PM +0100, German Gomez wrote:

[...]

> +static int arm_spe_find_snapshot(struct auxtrace_record *itr, int idx,
> +				  struct auxtrace_mmap *mm, unsigned char *data,
> +				  u64 *head, u64 *old)
> +{
> +	int err;
> +	bool wrapped;
> +	struct arm_spe_recording *ptr =
> +			container_of(itr, struct arm_spe_recording, itr);
> +
> +	/*
> +	 * Allocate memory to keep track of wrapping if this is the first
> +	 * time we deal with this *mm.
> +	 */
> +	if (idx >= ptr->wrapped_cnt) {
> +		err = arm_spe_alloc_wrapped_array(ptr, idx);
> +		if (err)
> +			return err;
> +	}
> +
> +	/*
> +	 * Check to see if *head has wrapped around.  If it hasn't only the
> +	 * amount of data between *head and *old is snapshot'ed to avoid
> +	 * bloating the perf.data file with zeros.  But as soon as *head has
> +	 * wrapped around the entire size of the AUX ring buffer it taken.
> +	 */
> +	wrapped = ptr->wrapped[idx];
> +	if (!wrapped && arm_spe_buffer_has_wrapped(data, mm->len, *head)) {
> +		wrapped = true;
> +		ptr->wrapped[idx] = true;
> +	}
> +
> +	pr_debug3("%s: mmap index %d old head %zu new head %zu size %zu\n",
> +		  __func__, idx, (size_t)*old, (size_t)*head, mm->len);
> +
> +	/*
> +	 * No wrap has occurred, we can just use *head and *old.
> +	 */
> +	if (!wrapped)
> +		return 0;
> +
> +	/*
> +	 * *head has wrapped around - adjust *head and *old to pickup the
> +	 * entire content of the AUX buffer.
> +	 */
> +	if (*head >= mm->len) {
> +		*old = *head - mm->len;
> +	} else {
> +		*head += mm->len;
> +		*old = *head - mm->len;
> +	}
> +
> +	return 0;
> +}

If I run a test case (the test is pasted at the end of this reply), I
get quite different amounts of AUX trace data depending on the wait
period before sending the first USR2 signal.

  # sh test_arm_spe_snapshot.sh 2
  Couldn't synthesize bpf events.
  stress: info: [5768] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 2.833 MB perf.data ]

  # sh test_arm_spe_snapshot.sh 10
  Couldn't synthesize bpf events.
  stress: info: [5776] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 24.356 MB perf.data ]

The first command passes the argument '2', so the test waits for 2
seconds before sending the USR2 signals, and the perf data file is
2.833 MB (so the Arm SPE trace data is about 2MB) for three
snapshots.  In the second command, the argument '10' means it waits
for 10 seconds before sending the USR2 signals; every time it records
the trace data from the full AUX buffer (8MB), so at the end it gets
24MB of AUX trace data.

The issue happens in the second command: waiting for 10 seconds lets
Arm SPE fill the *full* AUX ring buffer, so the function
arm_spe_buffer_has_wrapped() always returns true in this case.
Afterwards, arm_spe_find_snapshot() doesn't respect the passed old
head (from '*old') and assumes the trace data size is 'mm->len'.
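
As a worked example of the behaviour described above, the sketch below
mirrors the head/old adjustment from the quoted arm_spe_find_snapshot()
in user space.  snapshot_size() is a hypothetical helper written for this
illustration, and it assumes head >= old in the non-wrapped case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the head/old handling quoted above.  Returns the number of
 * bytes that would be copied for this snapshot (new head - old head).
 * Hypothetical helper; assumes head >= old when no wrap was recorded.
 */
static uint64_t snapshot_size(uint64_t head, uint64_t old, uint64_t len,
			      bool wrapped)
{
	assert(len > 0);

	if (wrapped) {
		if (head >= len) {
			old = head - len;
		} else {
			head += len;
			old = head - len;
		}
	}
	return head - old;
}
```

With an 8MB buffer (len = 0x800000), head = 0x3000 and old = 0x1000, the
result is 0x2000 bytes while 'wrapped' is false, but the full 0x800000
bytes once 'wrapped' is true, regardless of the value passed in '*old'.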

To allow arm_spe_buffer_has_wrapped() to work properly, I think we
need to clear the top 8 bytes of the AUX buffer in the Arm SPE driver
when starting the PMU event.  (Please note that this change relies on an
assumption mentioned in another email, which suggests removing the
redundant PERF_RECORD_AUX events so that arm_spe_perf_aux_output_begin()
is invoked only once, when the PMU event starts; we can then use the
top 8 bytes of the AUX buffer to indicate whether the trace has wrapped
around.)


diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..eb35f85d0efb 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -493,6 +493,16 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
        if (limit)
                limit |= BIT(SYS_PMBLIMITR_EL1_E_SHIFT);

+       /*
+        * Cleanup the top 8 bytes for snapshot mode; these 8 bytes are
+        * used to indicate if trace data is wrap around if they are not
+        * zero.
+        */
+       if (buf->snapshot) {
+               void *tail = buf->base + (buf->nr_pages << PAGE_SHIFT) - 8;
+               memset(tail, 0x0, 8);
+       }
+
        limit += (u64)buf->base;
        base = (u64)buf->base + PERF_IDX2OFF(handle->head, buf);
        write_sysreg_s(base, SYS_PMBPTR_EL1);
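
The idea behind the change above can be sketched in user space as
follows.  clear_tail(), tail_written() and TAIL_BYTES are hypothetical
names for this illustration (they are not the functions from the patch
set), and the check relies on the proposal's own premise that trace data
landing in the last 8 bytes is not all zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TAIL_BYTES 8	/* the "top 8 bytes" from the proposal */

/* What the driver-side change does when the PMU event starts. */
static void clear_tail(unsigned char *buf, size_t len)
{
	assert(len >= TAIL_BYTES);
	memset(buf + len - TAIL_BYTES, 0, TAIL_BYTES);
}

/* Tool-side check: a non-zero tail means trace data reached the end. */
static bool tail_written(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(len >= TAIL_BYTES);
	for (i = len - TAIL_BYTES; i < len; i++)
		if (buf[i])
			return true;
	return false;
}
```

A buffer left over from an earlier run reports a written tail until
clear_tail() is called; after that, tail_written() stays false until new
data reaches the end of the buffer.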

Thanks,
Leo

---8<---

#!/bin/sh

./perf record -e arm_spe/period=148576/u -C 0 -S -m8M,8M -- taskset --cpu-list 0 stress --cpu 1 &

PERFPID=$!

echo "sleep $1 seconds" > /sys/kernel/debug/tracing/trace_marker

# Wait for perf program
sleep  $1

# Send signal to snapshot trace data
kill -USR2 $PERFPID
sleep .03
kill -USR2 $PERFPID
sleep .03
kill -USR2 $PERFPID

echo "Stop snapshot" > /sys/kernel/debug/tracing/trace_marker

kill $PERFPID
wait $PERFPID

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-17 12:05   ` Leo Yan
@ 2021-10-17 12:36     ` Leo Yan
  2021-10-19 17:34     ` German Gomez
  1 sibling, 0 replies; 38+ messages in thread
From: Leo Yan @ 2021-10-17 12:36 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Sun, Oct 17, 2021 at 08:05:46PM +0800, Leo Yan wrote:

[...]


> To allow arm_spe_buffer_has_wrapped() to work properly, I think we
> need to clean up the top 8 bytes of the AUX buffer in Arm SPE driver
> when start the PMU event (please note, this change has an assumption
> that is mentioned in another email that suggests to remove redundant
> PERF_RECORD_AUX events so the function arm_spe_perf_aux_output_begin()
> is invoked only once when start PMU event, so we can use the top 8
> bytes in AUX buffer to indicate trace is wrap around or not).
> 
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index d44bcc29d99c..eb35f85d0efb 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -493,6 +493,16 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
>         if (limit)
>                 limit |= BIT(SYS_PMBLIMITR_EL1_E_SHIFT);
> 
> +       /*
> +        * Cleanup the top 8 bytes for snapshot mode; these 8 bytes are
> +        * used to indicate if trace data is wrap around if they are not
> +        * zero.
> +        */
> +       if (buf->snapshot) {
> +               void *tail = buf->base + (buf->nr_pages << PAGE_SHIFT) - 8;
> +               memset(tail, 0x0, 8);

The code below needs to be added here to flush the data cache:

                 flush_dcache_range((unsigned long)tail, (unsigned long)tail+8);

Sorry for spamming.

Leo

> +       }
> +
>         limit += (u64)buf->base;
>         base = (u64)buf->base + PERF_IDX2OFF(handle->head, buf);
>         write_sysreg_s(base, SYS_PMBPTR_EL1);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-17  6:13                       ` Leo Yan
@ 2021-10-19  9:23                         ` German Gomez
  2021-10-19 13:12                           ` Leo Yan
  2021-11-02 11:02                         ` German Gomez
  1 sibling, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-10-19  9:23 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi Leo,

Yeah, I agree the redundant AUX events are adding unnecessary bloat to
the perf.data file... We actually came across this when doing one of the
test cases. Sorry for not reporting it!

Could we patch the driver in a separate patch set? Or do you think this
is critical for the purposes of this one?

Thanks,
German

On 17/10/2021 07:13, Leo Yan wrote:
> Hi German, Will,
>
> On Fri, Oct 15, 2021 at 01:33:39PM +0100, German Gomez wrote:
>
> [...]
>
>> $ ./perf record -vvv -e arm_spe/period=148576/u -S1000 -m16,16 -- taskset --cpu-list 0 stress --cpu 1 &
> When testing Arm SPE snapshot mode with the command (it's quite
> similar with up command but not exactly same):
>
> # ./perf --debug verbose=3 record -e arm_spe/period=148576/u -C 0 -S1000 -m16,16 \
>     -- taskset --cpu-list 0 stress --cpu 1 &
> # kill -USR2 [pid_num]
>
> ... then I wait for long time and didn't stop the perf program, then
> I observed the output file contains many redundant events
> PERF_RECORD_AUX.  E.g. in the shared perf data file [1], you could use
> below commands to see tons of the events PERF_RECORD_AUX which I only
> send only one USR2 signal for taking snapshot:
>
>   # perf report -D -i perf.data --stdio | grep -E 'RECORD_AUX' | wc -l
>   2245787
>
>   # perf report -D -i perf.data --stdio | grep -E 'SPE'
>   . ... ARM SPE data: size 0x3e8 bytes
>   Binary file (standard input) matches
>
> I looked into the Arm SPE driver and found it doesn't really support
> free run mode for AUX ring buffer when the driver runs in snapshot
> mode, the pair functions perf_aux_output_end() and
> perf_aux_output_begin() are invoked when every time handle the
> interrupt.  The detailed flow is:
>
>   arm_spe_pmu_irq_handler()
>     `> arm_spe_pmu_buf_get_fault_act()
>          `> arm_spe_perf_aux_output_end()
>               `> set SPE registers
>               `> perf_aux_output_end()
>     `> arm_spe_perf_aux_output_begin()
>          `> perf_aux_output_begin()
>          `> set SPE registers
>
> Seems to me, a possible solution is to add an extra parameter 'int
> in_interrupt' for functions arm_spe_perf_aux_output_end() and
> arm_spe_perf_aux_output_begin(), if this parameter is passed as 1 in
> the interrupt handling, these two functions should skip invoking
> perf_aux_output_end() and perf_aux_output_begin() so can avoid the
> redundant perf event PERF_RECORD_AUX.
>
>   arm_spe_pmu_irq_handler()
>     `> arm_spe_pmu_buf_get_fault_act()
>          `> arm_spe_perf_aux_output_end(..., in_interrupt=1)
>               `> set SPE registers
>     `> arm_spe_perf_aux_output_begin(..., in_interrupt=1)
>          `> set SPE registers
>
> P.s. I think Intel-PT has supported free run mode for snapshot mode,
> so it should not generate interrupt in this mode.  Thus Intel-PT can
> avoid this issue, please see the code [2].
>
> Thanks,
> Leo
>
> [1] https://people.linaro.org/~leo.yan/spe/snapshot_test/perf.data
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/pt.c#n753

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-19  9:23                         ` German Gomez
@ 2021-10-19 13:12                           ` Leo Yan
  0 siblings, 0 replies; 38+ messages in thread
From: Leo Yan @ 2021-10-19 13:12 UTC (permalink / raw)
  To: German Gomez
  Cc: Will Deacon, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi German,

On Tue, Oct 19, 2021 at 10:23:01AM +0100, German Gomez wrote:
> Hi Leo,
> 
> Yeah I agree the redundant AUX events are adding unnecessary bloat to
> the perf.data file... We actually came across this when doing one of the
> test cases. Sorry for not reporting it!

No worries.

> Could we patch the driver in a separate patch set? Or do you think this
> is critical for the purposes of this one?

Yeah, we can treat the redundant AUX events issue as low priority.

Please take a look at the issue mentioned in another email about
recording trace data with the wrong size.  I think the wrong snapshot
trace size should be fixed in the Arm SPE driver, and the fix needs to
be verified with the perf patches.  After that I am fine with merging
the perf patches (and you could upstream the kernel driver patches
separately).  What do you think?

Thanks,
Leo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-17 12:05   ` Leo Yan
  2021-10-17 12:36     ` Leo Yan
@ 2021-10-19 17:34     ` German Gomez
  2021-10-20 13:25       ` Leo Yan
  1 sibling, 1 reply; 38+ messages in thread
From: German Gomez @ 2021-10-19 17:34 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi Leo,

On 17/10/2021 13:05, Leo Yan wrote:
> On Thu, Sep 16, 2021 at 04:46:34PM +0100, German Gomez wrote:
>
> [...]
>
> If run a test case (the test is pasted at the end of the reply), I
> can get quite different AUX trace data with passing different wait
> period before sending the first USR2 signal.
>
>   # sh test_arm_spe_snapshot.sh 2
>   Couldn't synthesize bpf events.
>   stress: info: [5768] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
>   [ perf record: Woken up 3 times to write data ]
>   [ perf record: Captured and wrote 2.833 MB perf.data ]
>
>   # sh test_arm_spe_snapshot.sh 10
>   Couldn't synthesize bpf events.
>   stress: info: [5776] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
>   [ perf record: Woken up 3 times to write data ]
>   [ perf record: Captured and wrote 24.356 MB perf.data ]
>
> The first command passes argument '2' so the test will wait for 2
> seconds before send USR2 signal for snapshot, and the perf data file is
> 2.833 MB (so this means the Arm SPE trace data is about 2MB) for three
> snapshots.  In the second command, the argument '10' means it will wait
> for 10 seconds before sending the USR2 signals, and every time it records
> the trace data from the full AUX buffer (8MB), at the end it gets 24MB
> AUX trace data.
>
> The issue happens in the second command, waiting for 10 seconds leads
> to the *full* AUX ring buffer is filled by Arm SPE, so the function
> arm_spe_buffer_has_wrapped() always return back true for this case.
> Afterwards, arm_spe_find_snapshot() doesn't respect the passed old
> header (from '*old') and assumes the trace data size is 'mm->len'.

Returning the entire contents of the buffer once the first wrap-around
was detected was the intention of the patch, so I don't currently see it
as wrong. What were the values you were expecting to see in the test?

If the handling of snapshot mode by the perf tool can be improved after
upstreaming the changes to the driver, we could submit a followup patch
after that has been fixed.

>
> To allow arm_spe_buffer_has_wrapped() to work properly, I think we
> need to clean up the top 8 bytes of the AUX buffer in Arm SPE driver
> when start the PMU event (please note, this change has an assumption
> that is mentioned in another email that suggests to remove redundant
> PERF_RECORD_AUX events so the function arm_spe_perf_aux_output_begin()
> is invoked only once when start PMU event, so we can use the top 8
> bytes in AUX buffer to indicate trace is wrap around or not).
>
>
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index d44bcc29d99c..eb35f85d0efb 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -493,6 +493,16 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
>         if (limit)
>                 limit |= BIT(SYS_PMBLIMITR_EL1_E_SHIFT);
>
> +       /*
> +        * Cleanup the top 8 bytes for snapshot mode; these 8 bytes are
> +        * used to indicate if trace data is wrap around if they are not
> +        * zero.
> +        */
> +       if (buf->snapshot) {
> +               void *tail = buf->base + (buf->nr_pages << PAGE_SHIFT) - 8;
> +               memset(tail, 0x0, 8);
> +       }
> +
>         limit += (u64)buf->base;
>         base = (u64)buf->base + PERF_IDX2OFF(handle->head, buf);
>         write_sysreg_s(base, SYS_PMBPTR_EL1);
>
> Thanks,
> Leo

I will try these and the other driver changes and discuss them with the
team internally, thanks!

>
> ---8<---
>
> #!/bin/sh
>
> ./perf record -e arm_spe/period=148576/u -C 0 -S -m8M,8M -- taskset --cpu-list 0 stress --cpu 1 &
>
> PERFPID=$!
>
> echo "sleep $1 seconds" > /sys/kernel/debug/tracing/trace_marker
>
> # Wait for perf program
> sleep  $1
>
> # Send signal to snapshot trace data
> kill -USR2 $PERFPID
> sleep .03
> kill -USR2 $PERFPID
> sleep .03
> kill -USR2 $PERFPID
>
> echo "Stop snapshot" > /sys/kernel/debug/tracing/trace_marker
>
> kill $PERFPID
> wait $PERFPID

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/5] perf arm-spe: Add snapshot mode support
  2021-09-16 15:46 ` [PATCH 3/5] perf arm-spe: Add snapshot mode support German Gomez
@ 2021-10-20 12:48   ` Leo Yan
  0 siblings, 0 replies; 38+ messages in thread
From: Leo Yan @ 2021-10-20 12:48 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Thu, Sep 16, 2021 at 04:46:33PM +0100, German Gomez wrote:
> This patch enabled support for snapshot mode of arm_spe events,
> including the implementation of the necessary callbacks (excluding
> find_snapshot, which is to be included in a followup commit).
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/5] perf arm-spe: Snapshot mode test
  2021-09-16 15:46 ` [PATCH 5/5] perf arm-spe: Snapshot mode test German Gomez
@ 2021-10-20 13:13   ` Leo Yan
  2021-10-20 15:06     ` German Gomez
  2021-11-02 14:07     ` James Clark
  0 siblings, 2 replies; 38+ messages in thread
From: Leo Yan @ 2021-10-20 13:13 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

On Thu, Sep 16, 2021 at 04:46:35PM +0100, German Gomez wrote:
> Shell script test_arm_spe.sh has been added to test the recording of SPE
> tracing events in snapshot mode.
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  tools/perf/tests/shell/test_arm_spe.sh | 91 ++++++++++++++++++++++++++
>  1 file changed, 91 insertions(+)
>  create mode 100755 tools/perf/tests/shell/test_arm_spe.sh
> 
> diff --git a/tools/perf/tests/shell/test_arm_spe.sh b/tools/perf/tests/shell/test_arm_spe.sh
> new file mode 100755
> index 000000000000..9ed817e76f95
> --- /dev/null
> +++ b/tools/perf/tests/shell/test_arm_spe.sh
> @@ -0,0 +1,91 @@
> +#!/bin/sh
> +# Check Arm SPE trace data recording and synthesized samples
> +
> +# Uses the 'perf record' to record trace data of Arm SPE events;
> +# then verify if any SPE event samples are generated by SPE with
> +# 'perf script' and 'perf report' commands.
> +
> +# SPDX-License-Identifier: GPL-2.0
> +# German Gomez <german.gomez@arm.com>, 2021
> +
> +skip_if_no_arm_spe_event() {
> +	perf list | egrep -q 'arm_spe_[0-9]+//' && return 0
> +
> +	# arm_spe event doesn't exist
> +	return 2
> +}
> +
> +skip_if_no_arm_spe_event || exit 2
> +
> +perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
> +glb_err=0
> +
> +cleanup_files()
> +{
> +	rm -f ${perfdata}
> +	trap - exit term int
> +	kill -2 $$ # Forward sigint to parent

I understand you copied this code from the Arm cs-etm test, but I found
that the statement 'kill -2 $$' causes a failure on my side with the
command:

root@ubuntu:/home/leoy/linux/tools/perf# ./perf test 85 -v
85: Check Arm SPE trace data recording and synthesized samples      :
--- start ---
test child forked, pid 29053
Recording trace with snapshot mode /tmp/__perf_test.perf.data.uughb
Looking at perf.data file for dumping samples:
Looking at perf.data file for reporting samples:
SPE snapshot testing: PASS
test child finished with -1
---- end ----
Check Arm SPE trace data recording and synthesized samples: FAILED!

I changed it to use the code below, and it looks like it works for me:

        if [[ "$1" == "int" ]]; then
                kill -SIGINT $$
        fi
        if [[ "$1" == "term" ]]; then
                kill -SIGTERM $$
        fi

Thanks,
Leo

> +	exit $glb_err
> +}
> +
> +trap cleanup_files exit term int
> +
> +arm_spe_report() {
> +	if [ $2 != 0 ]; then
> +		echo "$1: FAIL"
> +		glb_err=$2
> +	else
> +		echo "$1: PASS"
> +	fi
> +}
> +
> +perf_script_samples() {
> +	echo "Looking at perf.data file for dumping samples:"
> +
> +	# from arm-spe.c/arm_spe_synth_events()
> +	events="(ld1-miss|ld1-access|llc-miss|lld-access|tlb-miss|tlb-access|branch-miss|remote-access|memory)"
> +
> +	# Below is an example of the samples dumping:
> +	#	dd  3048 [002]          1    l1d-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
> +	#	dd  3048 [002]          1    tlb-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
> +	#	dd  3048 [002]          1        memory:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
> +	perf script -F,-time -i ${perfdata} 2>&1 | \
> +		egrep " +$1 +[0-9]+ .* +${events}:(.*:)? +" > /dev/null 2>&1
> +}
> +
> +perf_report_samples() {
> +	echo "Looking at perf.data file for reporting samples:"
> +
> +	# Below is an example of the samples reporting:
> +	#   73.04%    73.04%  dd    libc-2.27.so      [.] _dl_addr
> +	#    7.71%     7.71%  dd    libc-2.27.so      [.] getenv
> +	#    2.59%     2.59%  dd    ld-2.27.so        [.] strcmp
> +	perf report --stdio -i ${perfdata} 2>&1 | \
> +		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
> +}
> +
> +arm_spe_snapshot_test() {
> +	echo "Recording trace with snapshot mode $perfdata"
> +	perf record -o ${perfdata} -e arm_spe// -S \
> +		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
> +	PERFPID=$!
> +
> +	# Wait for perf program
> +	sleep 1
> +
> +	# Send signal to snapshot trace data
> +	kill -USR2 $PERFPID
> +
> +	# Stop perf program
> +	kill $PERFPID
> +	wait $PERFPID
> +
> +	perf_script_samples dd &&
> +	perf_report_samples dd
> +
> +	err=$?
> +	arm_spe_report "SPE snapshot testing" $err
> +}
> +
> +arm_spe_snapshot_test
> +exit $glb_err
> \ No newline at end of file
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-19 17:34     ` German Gomez
@ 2021-10-20 13:25       ` Leo Yan
  0 siblings, 0 replies; 38+ messages in thread
From: Leo Yan @ 2021-10-20 13:25 UTC (permalink / raw)
  To: German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

On Tue, Oct 19, 2021 at 06:34:24PM +0100, German Gomez wrote:
> Hi Leo,
> 
> On 17/10/2021 13:05, Leo Yan wrote:
> > On Thu, Sep 16, 2021 at 04:46:34PM +0100, German Gomez wrote:
> >
> > [...]
> >
> > If run a test case (the test is pasted at the end of the reply), I
> > can get quite different AUX trace data with passing different wait
> > period before sending the first USR2 signal.
> >
> >   # sh test_arm_spe_snapshot.sh 2
> >   Couldn't synthesize bpf events.
> >   stress: info: [5768] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
> >   [ perf record: Woken up 3 times to write data ]
> >   [ perf record: Captured and wrote 2.833 MB perf.data ]
> >
> >   # sh test_arm_spe_snapshot.sh 10
> >   Couldn't synthesize bpf events.
> >   stress: info: [5776] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
> >   [ perf record: Woken up 3 times to write data ]
> >   [ perf record: Captured and wrote 24.356 MB perf.data ]
> >
> > The first command passes argument '2' so the test will wait for 2
> > seconds before send USR2 signal for snapshot, and the perf data file is
> > 2.833 MB (so this means the Arm SPE trace data is about 2MB) for three
> > snapshots.  In the second command, the argument '10' means it will wait
> > for 10 seconds before sending the USR2 signals, and every time it records
> > the trace data from the full AUX buffer (8MB), at the end it gets 24MB
> > AUX trace data.
> >
> > The issue happens in the second command, waiting for 10 seconds leads
> > to the *full* AUX ring buffer is filled by Arm SPE, so the function
> > arm_spe_buffer_has_wrapped() always return back true for this case.
> > Afterwards, arm_spe_find_snapshot() doesn't respect the passed old
> > header (from '*old') and assumes the trace data size is 'mm->len'.
> 
> Returning the entire contents of the buffer once the first wrap-around
> was detected was the intention of the patch, so I don't currently see it
> as wrong. What were the values you were expecting to see in the test?

I expect the second command to take three snapshots: the first one
should record AUX trace data of the full buffer size (8MB) after waiting
for 10 seconds, and the later two should capture only a small amount of
AUX trace data, since the interval (0.03s) is short and Arm SPE has not
filled the full AUX buffer.

> If the handling of snapshot mode by the perf tool can be improved after
> upstreaming the changes to the driver, we could submit a followup patch
> after that has been fixed.

Okay, I understand now that the main concern is about the kernel driver
changes; this patch for the perf tool is fine with me:

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>

[...]

> I will try these and the other driver changes and discuss them with the
> team internally, thanks!

Thanks a lot!

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/5] perf arm-spe: Snapshot mode test
  2021-10-20 13:13   ` Leo Yan
@ 2021-10-20 15:06     ` German Gomez
  2021-11-02 14:07     ` James Clark
  1 sibling, 0 replies; 38+ messages in thread
From: German Gomez @ 2021-10-20 15:06 UTC (permalink / raw)
  To: Leo Yan
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi Leo,

I'm unable to reproduce. I've tried on top of the most recent perf/core
branch but I still get exit code 0 consistently:

    $ git log
    commit be8ecc57f180415e8a7c1cc5620c5236be2a7e56 (grafted, origin/perf/core)
    Author: Tony Garnock-Jones <tonyg@leastfixedpoint.com>
    Date:   Thu Sep 16 14:09:39 2021 +0200

    $ ./perf test 88 -v
    Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
    88: Check Arm SPE trace data recording and synthesized samples      :
    --- start ---
    test child forked, pid 18700
    Recording trace with snapshot mode /tmp/__perf_test.perf.data.xgsUt
    Looking at perf.data file for dumping samples:
    Looking at perf.data file for reporting samples:
    SPE snapshot testing: PASS
    test child finished with 0
    ---- end ----
    Check Arm SPE trace data recording and synthesized samples: Ok

On 20/10/2021 14:13, Leo Yan wrote:
> On Thu, Sep 16, 2021 at 04:46:35PM +0100, German Gomez wrote:
>> Shell script test_arm_spe.sh has been added to test the recording of SPE
>> tracing events in snapshot mode.
>>
>> Reviewed-by: James Clark <james.clark@arm.com>
>> Signed-off-by: German Gomez <german.gomez@arm.com>
>> ---
>>  tools/perf/tests/shell/test_arm_spe.sh | 91 ++++++++++++++++++++++++++
>>  1 file changed, 91 insertions(+)
>>  create mode 100755 tools/perf/tests/shell/test_arm_spe.sh
>>
>> diff --git a/tools/perf/tests/shell/test_arm_spe.sh b/tools/perf/tests/shell/test_arm_spe.sh
>> new file mode 100755
>> index 000000000000..9ed817e76f95
>> --- /dev/null
>> +++ b/tools/perf/tests/shell/test_arm_spe.sh
>> @@ -0,0 +1,91 @@
>> +#!/bin/sh
>> +# Check Arm SPE trace data recording and synthesized samples
>> +
>> +# Uses the 'perf record' to record trace data of Arm SPE events;
>> +# then verify if any SPE event samples are generated by SPE with
>> +# 'perf script' and 'perf report' commands.
>> +
>> +# SPDX-License-Identifier: GPL-2.0
>> +# German Gomez <german.gomez@arm.com>, 2021
>> +
>> +skip_if_no_arm_spe_event() {
>> +	perf list | egrep -q 'arm_spe_[0-9]+//' && return 0
>> +
>> +	# arm_spe event doesn't exist
>> +	return 2
>> +}
>> +
>> +skip_if_no_arm_spe_event || exit 2
>> +
>> +perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
>> +glb_err=0
>> +
>> +cleanup_files()
>> +{
>> +	rm -f ${perfdata}
>> +	trap - exit term int
>> +	kill -2 $$ # Forward sigint to parent
> I understand you copy this code from Arm cs-etm testing, but I found
> the sentence 'kill -2 $$' will cause a failure at my side with the
> command:
>
> root@ubuntu:/home/leoy/linux/tools/perf# ./perf test 85 -v
> 85: Check Arm SPE trace data recording and synthesized samples      :
> --- start ---
> test child forked, pid 29053
> Recording trace with snapshot mode /tmp/__perf_test.perf.data.uughb
> Looking at perf.data file for dumping samples:
> Looking at perf.data file for reporting samples:
> SPE snapshot testing: PASS
> test child finished with -1
> ---- end ----
> Check Arm SPE trace data recording and synthesized samples: FAILED!
>
> I changed to use below code and looks it works for me:
>
>         if [[ "$1" == "int" ]]; then
>                 kill -SIGINT $$
>         fi
>         if [[ "$1" == "term" ]]; then
>                 kill -SIGTERM $$
>         fi
>
> Thanks,
> Leo
>
>> +	exit $glb_err
>> +}
>> +
>> +trap cleanup_files exit term int
>> +
>> +arm_spe_report() {
>> +	if [ $2 != 0 ]; then
>> +		echo "$1: FAIL"
>> +		glb_err=$2
>> +	else
>> +		echo "$1: PASS"
>> +	fi
>> +}
>> +
>> +perf_script_samples() {
>> +	echo "Looking at perf.data file for dumping samples:"
>> +
>> +	# from arm-spe.c/arm_spe_synth_events()
>> +	events="(l1d-miss|l1d-access|llc-miss|llc-access|tlb-miss|tlb-access|branch-miss|remote-access|memory)"
>> +
>> +	# Below is an example of the samples dumping:
>> +	#	dd  3048 [002]          1    l1d-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>> +	#	dd  3048 [002]          1    tlb-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>> +	#	dd  3048 [002]          1        memory:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>> +	perf script -F,-time -i ${perfdata} 2>&1 | \
>> +		egrep " +$1 +[0-9]+ .* +${events}:(.*:)? +" > /dev/null 2>&1
>> +}
>> +
>> +perf_report_samples() {
>> +	echo "Looking at perf.data file for reporting samples:"
>> +
>> +	# Below is an example of the samples reporting:
>> +	#   73.04%    73.04%  dd    libc-2.27.so      [.] _dl_addr
>> +	#    7.71%     7.71%  dd    libc-2.27.so      [.] getenv
>> +	#    2.59%     2.59%  dd    ld-2.27.so        [.] strcmp
>> +	perf report --stdio -i ${perfdata} 2>&1 | \
>> +		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
>> +}
>> +
>> +arm_spe_snapshot_test() {
>> +	echo "Recording trace with snapshot mode $perfdata"
>> +	perf record -o ${perfdata} -e arm_spe// -S \
>> +		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
>> +	PERFPID=$!
>> +
>> +	# Wait for perf program
>> +	sleep 1
>> +
>> +	# Send signal to snapshot trace data
>> +	kill -USR2 $PERFPID
>> +
>> +	# Stop perf program
>> +	kill $PERFPID
>> +	wait $PERFPID
>> +
>> +	perf_script_samples dd &&
>> +	perf_report_samples dd
>> +
>> +	err=$?
>> +	arm_spe_report "SPE snapshot testing" $err
>> +}
>> +
>> +arm_spe_snapshot_test
>> +exit $glb_err
>> \ No newline at end of file
>> -- 
>> 2.17.1
>>


* Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
  2021-10-17  6:13                       ` Leo Yan
  2021-10-19  9:23                         ` German Gomez
@ 2021-11-02 11:02                         ` German Gomez
  1 sibling, 0 replies; 38+ messages in thread
From: German Gomez @ 2021-11-02 11:02 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, linux-kernel, linux-perf-users, John Garry,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight,
	James Clark

Hi Leo,

On 17/10/2021 07:13, Leo Yan wrote:
> [...]
>
> I looked into the Arm SPE driver and found that it doesn't really
> support free-run mode for the AUX ring buffer when the driver runs in
> snapshot mode: the paired functions perf_aux_output_end() and
> perf_aux_output_begin() are invoked every time the interrupt is
> handled.  The detailed flow is:
>
>   arm_spe_pmu_irq_handler()
>     `> arm_spe_pmu_buf_get_fault_act()
>          `> arm_spe_perf_aux_output_end()
>               `> set SPE registers
>               `> perf_aux_output_end()
>     `> arm_spe_perf_aux_output_begin()
>          `> perf_aux_output_begin()
>          `> set SPE registers
>
> It seems to me that a possible solution is to add an extra parameter
> 'int in_interrupt' to the functions arm_spe_perf_aux_output_end() and
> arm_spe_perf_aux_output_begin(). If this parameter is passed as 1 in
> the interrupt handler, these two functions skip invoking
> perf_aux_output_end() and perf_aux_output_begin(), which avoids the
> redundant PERF_RECORD_AUX perf event.
>
>   arm_spe_pmu_irq_handler()
>     `> arm_spe_pmu_buf_get_fault_act()
>          `> arm_spe_perf_aux_output_end(..., in_interrupt=1)
>               `> set SPE registers
>     `> arm_spe_perf_aux_output_begin(..., in_interrupt=1)
>          `> set SPE registers

I brought the issue of the redundant AUX events to the team, and we know
of at least one tool in Arm relying on these events in snapshot mode. So
we think that changing this behavior of the driver might not be easy to
do right now.

>
> P.s. I think Intel-PT has supported free run mode for snapshot mode,
> so it should not generate interrupt in this mode.  Thus Intel-PT can
> avoid this issue, please see the code [2].
>
> Thanks,
> Leo
>
> [1] https://people.linaro.org/~leo.yan/spe/snapshot_test/perf.data
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/pt.c#n753

Thanks,
German


* Re: [PATCH 5/5] perf arm-spe: Snapshot mode test
  2021-10-20 13:13   ` Leo Yan
  2021-10-20 15:06     ` German Gomez
@ 2021-11-02 14:07     ` James Clark
  2021-11-02 15:37       ` James Clark
  1 sibling, 1 reply; 38+ messages in thread
From: James Clark @ 2021-11-02 14:07 UTC (permalink / raw)
  To: Leo Yan, German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight



On 20/10/2021 14:13, Leo Yan wrote:
> On Thu, Sep 16, 2021 at 04:46:35PM +0100, German Gomez wrote:
>> Shell script test_arm_spe.sh has been added to test the recording of SPE
>> tracing events in snapshot mode.
>>
>> Reviewed-by: James Clark <james.clark@arm.com>
>> Signed-off-by: German Gomez <german.gomez@arm.com>
>> ---
>>  tools/perf/tests/shell/test_arm_spe.sh | 91 ++++++++++++++++++++++++++
>>  1 file changed, 91 insertions(+)
>>  create mode 100755 tools/perf/tests/shell/test_arm_spe.sh
>>
>> diff --git a/tools/perf/tests/shell/test_arm_spe.sh b/tools/perf/tests/shell/test_arm_spe.sh
>> new file mode 100755
>> index 000000000000..9ed817e76f95
>> --- /dev/null
>> +++ b/tools/perf/tests/shell/test_arm_spe.sh
>> @@ -0,0 +1,91 @@
>> +#!/bin/sh
>> +# Check Arm SPE trace data recording and synthesized samples
>> +
>> +# Uses the 'perf record' to record trace data of Arm SPE events;
>> +# then verify if any SPE event samples are generated by SPE with
>> +# 'perf script' and 'perf report' commands.
>> +
>> +# SPDX-License-Identifier: GPL-2.0
>> +# German Gomez <german.gomez@arm.com>, 2021
>> +
>> +skip_if_no_arm_spe_event() {
>> +	perf list | egrep -q 'arm_spe_[0-9]+//' && return 0
>> +
>> +	# arm_spe event doesn't exist
>> +	return 2
>> +}
>> +
>> +skip_if_no_arm_spe_event || exit 2
>> +
>> +perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
>> +glb_err=0
>> +
>> +cleanup_files()
>> +{
>> +	rm -f ${perfdata}
>> +	trap - exit term int
>> +	kill -2 $$ # Forward sigint to parent
> 
> I understand you copy this code from Arm cs-etm testing, but I found
> the sentence 'kill -2 $$' will cause a failure at my side with the
> command:
> 
> root@ubuntu:/home/leoy/linux/tools/perf# ./perf test 85 -v
> 85: Check Arm SPE trace data recording and synthesized samples      :
> --- start ---
> test child forked, pid 29053
> Recording trace with snapshot mode /tmp/__perf_test.perf.data.uughb
> Looking at perf.data file for dumping samples:
> Looking at perf.data file for reporting samples:
> SPE snapshot testing: PASS
> test child finished with -1
> ---- end ----
> Check Arm SPE trace data recording and synthesized samples: FAILED!
> 
> I changed to use below code and looks it works for me:
> 
>         if [[ "$1" == "int" ]]; then
>                 kill -SIGINT $$
>         fi
>         if [[ "$1" == "term" ]]; then
>                 kill -SIGTERM $$
>         fi
> 
> Thanks,
> Leo

This is quite interesting. It looks like the issue is caused by the update from dash 0.5.8
on Ubuntu 18 to dash 0.5.10 on Ubuntu 20. Specifically the commit that causes the issue is:

   commit 9e5cd41d9605e4caaac3aacdc0482f6ee220a298
   Author: Herbert Xu <herbert@gondor.apana.org.au>
   Date:   Mon May 7 00:40:34 2018 +0800

    jobs - Do not block when waiting on SIGCHLD
    
    Because of the nature of SIGCHLD, the process may have already been
    waited on and therefore we must be prepared for the case that wait
    may block.  So ensure that it doesn't by using WNOHANG.
    
    Furthermore, multiple jobs may have exited when gotsigchld is set.
    Therefore we need to wait until there are no zombies left.
    
    Lastly, waitforjob needs to be called with interrupts off and
    the original patch broke that.
    
    Fixes: 03876c0743a5 ("eval: Reap zombies after built-in...")
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


This also means that the Coresight shell test no longer works, because I added
the same trap to it so that it could be run in a loop. I'm going to compare the behaviour
with bash to see which shell is doing the right thing and what the correct fix is,
or whether a bug needs to be reported.

Thanks
James

> 
>> +	exit $glb_err
>> +}
>> +
>> +trap cleanup_files exit term int
>> +
>> +arm_spe_report() {
>> +	if [ $2 != 0 ]; then
>> +		echo "$1: FAIL"
>> +		glb_err=$2
>> +	else
>> +		echo "$1: PASS"
>> +	fi
>> +}
>> +
>> +perf_script_samples() {
>> +	echo "Looking at perf.data file for dumping samples:"
>> +
>> +	# from arm-spe.c/arm_spe_synth_events()
>> +	events="(l1d-miss|l1d-access|llc-miss|llc-access|tlb-miss|tlb-access|branch-miss|remote-access|memory)"
>> +
>> +	# Below is an example of the samples dumping:
>> +	#	dd  3048 [002]          1    l1d-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>> +	#	dd  3048 [002]          1    tlb-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>> +	#	dd  3048 [002]          1        memory:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>> +	perf script -F,-time -i ${perfdata} 2>&1 | \
>> +		egrep " +$1 +[0-9]+ .* +${events}:(.*:)? +" > /dev/null 2>&1
>> +}
>> +
>> +perf_report_samples() {
>> +	echo "Looking at perf.data file for reporting samples:"
>> +
>> +	# Below is an example of the samples reporting:
>> +	#   73.04%    73.04%  dd    libc-2.27.so      [.] _dl_addr
>> +	#    7.71%     7.71%  dd    libc-2.27.so      [.] getenv
>> +	#    2.59%     2.59%  dd    ld-2.27.so        [.] strcmp
>> +	perf report --stdio -i ${perfdata} 2>&1 | \
>> +		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
>> +}
>> +
>> +arm_spe_snapshot_test() {
>> +	echo "Recording trace with snapshot mode $perfdata"
>> +	perf record -o ${perfdata} -e arm_spe// -S \
>> +		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
>> +	PERFPID=$!
>> +
>> +	# Wait for perf program
>> +	sleep 1
>> +
>> +	# Send signal to snapshot trace data
>> +	kill -USR2 $PERFPID
>> +
>> +	# Stop perf program
>> +	kill $PERFPID
>> +	wait $PERFPID
>> +
>> +	perf_script_samples dd &&
>> +	perf_report_samples dd
>> +
>> +	err=$?
>> +	arm_spe_report "SPE snapshot testing" $err
>> +}
>> +
>> +arm_spe_snapshot_test
>> +exit $glb_err
>> \ No newline at end of file
>> -- 
>> 2.17.1
>>


* Re: [PATCH 5/5] perf arm-spe: Snapshot mode test
  2021-11-02 14:07     ` James Clark
@ 2021-11-02 15:37       ` James Clark
  2021-11-09 13:26         ` German Gomez
  0 siblings, 1 reply; 38+ messages in thread
From: James Clark @ 2021-11-02 15:37 UTC (permalink / raw)
  To: Leo Yan, German Gomez
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight



On 02/11/2021 14:07, James Clark wrote:
> 
> 
> On 20/10/2021 14:13, Leo Yan wrote:
>> On Thu, Sep 16, 2021 at 04:46:35PM +0100, German Gomez wrote:
>>> Shell script test_arm_spe.sh has been added to test the recording of SPE
>>> tracing events in snapshot mode.
>>>
>>> Reviewed-by: James Clark <james.clark@arm.com>
>>> Signed-off-by: German Gomez <german.gomez@arm.com>
>>> ---
>>>  tools/perf/tests/shell/test_arm_spe.sh | 91 ++++++++++++++++++++++++++
>>>  1 file changed, 91 insertions(+)
>>>  create mode 100755 tools/perf/tests/shell/test_arm_spe.sh
>>>
>>> diff --git a/tools/perf/tests/shell/test_arm_spe.sh b/tools/perf/tests/shell/test_arm_spe.sh
>>> new file mode 100755
>>> index 000000000000..9ed817e76f95
>>> --- /dev/null
>>> +++ b/tools/perf/tests/shell/test_arm_spe.sh
>>> @@ -0,0 +1,91 @@
>>> +#!/bin/sh
>>> +# Check Arm SPE trace data recording and synthesized samples
>>> +
>>> +# Uses the 'perf record' to record trace data of Arm SPE events;
>>> +# then verify if any SPE event samples are generated by SPE with
>>> +# 'perf script' and 'perf report' commands.
>>> +
>>> +# SPDX-License-Identifier: GPL-2.0
>>> +# German Gomez <german.gomez@arm.com>, 2021
>>> +
>>> +skip_if_no_arm_spe_event() {
>>> +	perf list | egrep -q 'arm_spe_[0-9]+//' && return 0
>>> +
>>> +	# arm_spe event doesn't exist
>>> +	return 2
>>> +}
>>> +
>>> +skip_if_no_arm_spe_event || exit 2
>>> +
>>> +perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
>>> +glb_err=0
>>> +
>>> +cleanup_files()
>>> +{
>>> +	rm -f ${perfdata}
>>> +	trap - exit term int
>>> +	kill -2 $$ # Forward sigint to parent
>>
>> I understand you copy this code from Arm cs-etm testing, but I found
>> the sentence 'kill -2 $$' will cause a failure at my side with the
>> command:
>>
>> root@ubuntu:/home/leoy/linux/tools/perf# ./perf test 85 -v
>> 85: Check Arm SPE trace data recording and synthesized samples      :
>> --- start ---
>> test child forked, pid 29053
>> Recording trace with snapshot mode /tmp/__perf_test.perf.data.uughb
>> Looking at perf.data file for dumping samples:
>> Looking at perf.data file for reporting samples:
>> SPE snapshot testing: PASS
>> test child finished with -1
>> ---- end ----
>> Check Arm SPE trace data recording and synthesized samples: FAILED!
>>
>> I changed to use below code and looks it works for me:
>>
>>         if [[ "$1" == "int" ]]; then
>>                 kill -SIGINT $$
>>         fi
>>         if [[ "$1" == "term" ]]; then
>>                 kill -SIGTERM $$
>>         fi
>>
>> Thanks,
>> Leo
> 
> This is quite interesting. It looks like the issue is caused by the update from dash 0.5.8
> on Ubuntu 18 to dash 0.5.10 on Ubuntu 20. Specifically the commit that causes the issue is:
> 
>    commit 9e5cd41d9605e4caaac3aacdc0482f6ee220a298
>    Author: Herbert Xu <herbert@gondor.apana.org.au>
>    Date:   Mon May 7 00:40:34 2018 +0800
> 
>     jobs - Do not block when waiting on SIGCHLD
>     
>     Because of the nature of SIGCHLD, the process may have already been
>     waited on and therefore we must be prepared for the case that wait
>     may block.  So ensure that it doesn't by using WNOHANG.
>     
>     Furthermore, multiple jobs may have exited when gotsigchld is set.
>     Therefore we need to wait until there are no zombies left.
>     
>     Lastly, waitforjob needs to be called with interrupts off and
>     the original patch broke that.
>     
>     Fixes: 03876c0743a5 ("eval: Reap zombies after built-in...")
>     Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> 
> This also means that the Coresight shell test no longer works, because I added
> the same trap to it so that it could be run in a loop. I'm going to compare the behaviour
> with bash to see which shell is doing the right thing and what the correct fix is,
> or whether a bug needs to be reported.
> 
> Thanks
> James
> 

Ok, it seems like I was relying on buggy dash behaviour for my original change. Even with this:

        if [[ "$1" == "int" ]]; then
                kill -SIGINT $$
        fi
        if [[ "$1" == "term" ]]; then
                kill -SIGTERM $$
        fi

it still doesn't allow you to break out of running it in a while loop. This is only because of
the exit code, rather than any kind of signal propagation. Actually it's possible to stop it
with Ctrl-\ rather than Ctrl-C, and that doesn't require any extra handling in the script.

For that reason I'm happy to go with the suggestion Leo made when I first added this, which was
to not have any extra kill at all.

Another fix could be this, but I'm not too keen on it because I don't think any other tests behave
like this:

        [ "$1" = "int" ] || exit 1
        [ "$1" = "term" ] || exit 1

>>
>>> +	exit $glb_err
>>> +}
>>> +
>>> +trap cleanup_files exit term int
>>> +
>>> +arm_spe_report() {
>>> +	if [ $2 != 0 ]; then
>>> +		echo "$1: FAIL"
>>> +		glb_err=$2
>>> +	else
>>> +		echo "$1: PASS"
>>> +	fi
>>> +}
>>> +
>>> +perf_script_samples() {
>>> +	echo "Looking at perf.data file for dumping samples:"
>>> +
>>> +	# from arm-spe.c/arm_spe_synth_events()
>>> +	events="(l1d-miss|l1d-access|llc-miss|llc-access|tlb-miss|tlb-access|branch-miss|remote-access|memory)"
>>> +
>>> +	# Below is an example of the samples dumping:
>>> +	#	dd  3048 [002]          1    l1d-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>>> +	#	dd  3048 [002]          1    tlb-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>>> +	#	dd  3048 [002]          1        memory:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>>> +	perf script -F,-time -i ${perfdata} 2>&1 | \
>>> +		egrep " +$1 +[0-9]+ .* +${events}:(.*:)? +" > /dev/null 2>&1
>>> +}
>>> +
>>> +perf_report_samples() {
>>> +	echo "Looking at perf.data file for reporting samples:"
>>> +
>>> +	# Below is an example of the samples reporting:
>>> +	#   73.04%    73.04%  dd    libc-2.27.so      [.] _dl_addr
>>> +	#    7.71%     7.71%  dd    libc-2.27.so      [.] getenv
>>> +	#    2.59%     2.59%  dd    ld-2.27.so        [.] strcmp
>>> +	perf report --stdio -i ${perfdata} 2>&1 | \
>>> +		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
>>> +}
>>> +
>>> +arm_spe_snapshot_test() {
>>> +	echo "Recording trace with snapshot mode $perfdata"
>>> +	perf record -o ${perfdata} -e arm_spe// -S \
>>> +		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
>>> +	PERFPID=$!
>>> +
>>> +	# Wait for perf program
>>> +	sleep 1
>>> +
>>> +	# Send signal to snapshot trace data
>>> +	kill -USR2 $PERFPID
>>> +
>>> +	# Stop perf program
>>> +	kill $PERFPID
>>> +	wait $PERFPID
>>> +
>>> +	perf_script_samples dd &&
>>> +	perf_report_samples dd
>>> +
>>> +	err=$?
>>> +	arm_spe_report "SPE snapshot testing" $err
>>> +}
>>> +
>>> +arm_spe_snapshot_test
>>> +exit $glb_err
>>> \ No newline at end of file
>>> -- 
>>> 2.17.1
>>>


* Re: [PATCH 5/5] perf arm-spe: Snapshot mode test
  2021-11-02 15:37       ` James Clark
@ 2021-11-09 13:26         ` German Gomez
  0 siblings, 0 replies; 38+ messages in thread
From: German Gomez @ 2021-11-09 13:26 UTC (permalink / raw)
  To: James Clark, Leo Yan
  Cc: linux-kernel, linux-perf-users, John Garry, Will Deacon,
	Mathieu Poirier, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Mike Leach, linux-arm-kernel, coresight

Hi James, Leo,

Thank you for testing the patch.

On 02/11/2021 15:37, James Clark wrote:
> [...]
> Ok, it seems like I was relying on buggy dash behaviour for my original change. Even with this:
>
>         if [[ "$1" == "int" ]]; then
>                 kill -SIGINT $$
>         fi
>         if [[ "$1" == "term" ]]; then
>                 kill -SIGTERM $$
>         fi
>
> it still doesn't allow you to break out of running it in a while loop. This is only because of
> the exit code, rather than any kind of signal propagation. Actually it's possible to stop it
> with Ctrl-\ rather than Ctrl-C, and that doesn't require any extra handling in the script.
>
> For that reason I'm happy to go with the suggestion Leo made when I first added this, which was
> to not have any extra kill at all.

Thanks for debugging the issue. I will consider this fix in the
re-submission.

Thanks,
German

>
> Another fix could be this, but I'm not too keen on it because I don't think any other tests behave
> like this:
>
>         [ "$1" = "int" ] || exit 1
>         [ "$1" = "term" ] || exit 1
>
>>>> +	exit $glb_err
>>>> +}
>>>> +
>>>> +trap cleanup_files exit term int
>>>> +
>>>> +arm_spe_report() {
>>>> +	if [ $2 != 0 ]; then
>>>> +		echo "$1: FAIL"
>>>> +		glb_err=$2
>>>> +	else
>>>> +		echo "$1: PASS"
>>>> +	fi
>>>> +}
>>>> +
>>>> +perf_script_samples() {
>>>> +	echo "Looking at perf.data file for dumping samples:"
>>>> +
>>>> +	# from arm-spe.c/arm_spe_synth_events()
>>>> +	events="(l1d-miss|l1d-access|llc-miss|llc-access|tlb-miss|tlb-access|branch-miss|remote-access|memory)"
>>>> +
>>>> +	# Below is an example of the samples dumping:
>>>> +	#	dd  3048 [002]          1    l1d-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>>>> +	#	dd  3048 [002]          1    tlb-access:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>>>> +	#	dd  3048 [002]          1        memory:      ffffaa64999c __GI___libc_write+0x3c (/lib/aarch64-linux-gnu/libc-2.27.so)
>>>> +	perf script -F,-time -i ${perfdata} 2>&1 | \
>>>> +		egrep " +$1 +[0-9]+ .* +${events}:(.*:)? +" > /dev/null 2>&1
>>>> +}
>>>> +
>>>> +perf_report_samples() {
>>>> +	echo "Looking at perf.data file for reporting samples:"
>>>> +
>>>> +	# Below is an example of the samples reporting:
>>>> +	#   73.04%    73.04%  dd    libc-2.27.so      [.] _dl_addr
>>>> +	#    7.71%     7.71%  dd    libc-2.27.so      [.] getenv
>>>> +	#    2.59%     2.59%  dd    ld-2.27.so        [.] strcmp
>>>> +	perf report --stdio -i ${perfdata} 2>&1 | \
>>>> +		egrep " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
>>>> +}
>>>> +
>>>> +arm_spe_snapshot_test() {
>>>> +	echo "Recording trace with snapshot mode $perfdata"
>>>> +	perf record -o ${perfdata} -e arm_spe// -S \
>>>> +		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
>>>> +	PERFPID=$!
>>>> +
>>>> +	# Wait for perf program
>>>> +	sleep 1
>>>> +
>>>> +	# Send signal to snapshot trace data
>>>> +	kill -USR2 $PERFPID
>>>> +
>>>> +	# Stop perf program
>>>> +	kill $PERFPID
>>>> +	wait $PERFPID
>>>> +
>>>> +	perf_script_samples dd &&
>>>> +	perf_report_samples dd
>>>> +
>>>> +	err=$?
>>>> +	arm_spe_report "SPE snapshot testing" $err
>>>> +}
>>>> +
>>>> +arm_spe_snapshot_test
>>>> +exit $glb_err
>>>> \ No newline at end of file
>>>> -- 
>>>> 2.17.1
>>>>


Thread overview: 38+ messages
2021-09-16 15:46 [PATCH 1/5] perf cs-etm: Print size using consistent format German Gomez
2021-09-16 15:46 ` [PATCH 2/5] perf arm-spe: " German Gomez
2021-09-23 13:35   ` Leo Yan
2021-09-16 15:46 ` [PATCH 3/5] perf arm-spe: Add snapshot mode support German Gomez
2021-10-20 12:48   ` Leo Yan
2021-09-16 15:46 ` [PATCH 4/5] perf arm-spe: Implement find_snapshot callback German Gomez
2021-09-23 13:50   ` Leo Yan
2021-09-23 14:40     ` Leo Yan
2021-09-30 12:26       ` German Gomez
2021-10-04 12:27         ` Leo Yan
2021-10-06  9:35           ` German Gomez
2021-10-06  9:51             ` Leo Yan
2021-10-11 15:55               ` German Gomez
2021-10-12  8:19                 ` Will Deacon
2021-10-12  8:47                   ` James Clark
2021-10-13  0:39                 ` Leo Yan
2021-10-13  7:51                   ` Will Deacon
2021-10-15 12:33                     ` German Gomez
2021-10-15 14:16                       ` Leo Yan
2021-10-15 14:41                         ` German Gomez
2021-10-17  6:13                       ` Leo Yan
2021-10-19  9:23                         ` German Gomez
2021-10-19 13:12                           ` Leo Yan
2021-11-02 11:02                         ` German Gomez
2021-10-17 12:05   ` Leo Yan
2021-10-17 12:36     ` Leo Yan
2021-10-19 17:34     ` German Gomez
2021-10-20 13:25       ` Leo Yan
2021-09-16 15:46 ` [PATCH 5/5] perf arm-spe: Snapshot mode test German Gomez
2021-10-20 13:13   ` Leo Yan
2021-10-20 15:06     ` German Gomez
2021-11-02 14:07     ` James Clark
2021-11-02 15:37       ` James Clark
2021-11-09 13:26         ` German Gomez
2021-09-23 13:35 ` [PATCH 1/5] perf cs-etm: Print size using consistent format Leo Yan
2021-09-23 16:24 ` Mathieu Poirier
2021-09-30 12:09   ` German Gomez
2021-09-30 16:30     ` Mathieu Poirier
