* [PATCH 00/12] Stitch LBR call stack (Perf Tools)
@ 2020-02-28 16:29 kan.liang
From: kan.liang @ 2020-02-28 16:29 UTC
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The kernel patches have been merged into linux-next.
  commit bbfd5e4fab63 ("perf/core: Add new branch sample type for HW
index of raw branch records")
  commit db278b90c326 ("perf/x86/intel: Output LBR TOS information
correctly")

Starting from Haswell, Linux perf can use the existing Last Branch
Record (LBR) facility to record call stacks. However, the depth of the
reconstructed LBR call stack is limited to the number of LBR registers.
E.g. on Skylake, the depth of the reconstructed LBR call stack is <= 32.
That's because the HW overwrites the oldest LBR registers when the LBR
stack is full.

However, the overwritten LBRs may still be retrievable from the previous
sample, which was taken before the HW overwrote those LBR registers.
Perf tools can stitch those overwritten LBRs onto the current call stack
to get a more complete call stack.
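
As a rough illustration of the idea only (this is not the code in patches
6 & 7), the sketch below appends the older entries of the previous sample
to the current one when the oldest current entry is found in the previous
sample in the same physical LBR register. The types, the names
(lbr_entry, lbr_sample, phys_idx, stitch_lbr) and the single-entry match
rule are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the perf tool's structures. */
struct lbr_entry {
	uint64_t from;
	uint64_t to;
};

struct lbr_sample {
	uint64_t hw_idx;		/* physical index of the newest entry (TOS) */
	size_t nr;			/* number of valid entries, <= max_lbr */
	const struct lbr_entry *e;	/* entries, newest first */
};

/* Physical register that holds entry i of a sample (i = 0 is the TOS). */
static unsigned int phys_idx(const struct lbr_sample *s, size_t i,
			     unsigned int max_lbr)
{
	return (unsigned int)((s->hw_idx + max_lbr - i) % max_lbr);
}

/*
 * Copy the current stack to out[], then append the entries of the previous
 * sample that are older than the current sample's oldest entry, provided
 * that entry is found in the previous sample in the same physical register.
 * Returns the number of entries written to out[].
 */
static size_t stitch_lbr(const struct lbr_sample *prev,
			 const struct lbr_sample *cur,
			 unsigned int max_lbr,
			 struct lbr_entry *out, size_t out_cap)
{
	size_t n = 0, i, s, oldest;

	assert(max_lbr > 0 && prev->nr <= max_lbr && cur->nr <= max_lbr);

	for (i = 0; i < cur->nr && n < out_cap; i++)
		out[n++] = cur->e[i];
	if (!cur->nr)
		return n;

	oldest = cur->nr - 1;
	for (s = 0; s < prev->nr; s++) {
		if (prev->e[s].from != cur->e[oldest].from ||
		    prev->e[s].to != cur->e[oldest].to ||
		    phys_idx(prev, s, max_lbr) != phys_idx(cur, oldest, max_lbr))
			continue;
		/* Same branch in the same register: older entries follow. */
		for (i = s + 1; i < prev->nr && n < out_cap; i++)
			out[n++] = prev->e[i];
		break;
	}
	return n;
}
```

With 4 LBR registers, a previous stack D, C, B, A (TOS at index 3) and a
current stack F, E, C, B (TOS at index 0), the sketch finds B in register 1
in both samples and appends A, giving F, E, C, B, A.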

To determine whether LBRs can be stitched, the physical index of the LBR
registers is required. A new branch sample type is introduced to dump
the physical index of the most recent LBR, aka Top-of-Stack (TOS)
information, for perf tools.
Patches 1 & 2 extend struct branch_stack to support the new branch sample
type, PERF_SAMPLE_BRANCH_HW_INDEX.

Since the output format of PERF_SAMPLE_BRANCH_STACK changes when the new
branch sample type is set, an older version of the perf tool may parse
the perf.data incorrectly. Furthermore, there is no warning when this
happens, because the current perf header code never checks for unknown
input bits in attr. Patch 3 adds a check for the event attr. (It can be
merged independently.)
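
A minimal sketch of what such a check could look like, assuming the tool
simply rejects any bit at or above the highest shift it knows about. The
helper name has_unknown_bits and the local macros are made up for the
example; only the shift values 16 and 17 are taken from the perf_event.h
hunk in patch 1. This is not the code of patch 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions quoted from the perf_event.h hunk in patch 1. */
#define BRANCH_TYPE_SAVE_SHIFT	16
#define BRANCH_HW_INDEX_SHIFT	17

/*
 * max_shift is the first bit position the tool does NOT know about,
 * i.e. the role PERF_SAMPLE_BRANCH_MAX_SHIFT plays in the uapi header.
 * Returns true if value has any bit set at or above max_shift.
 */
static bool has_unknown_bits(uint64_t value, unsigned int max_shift)
{
	uint64_t known;

	assert(max_shift <= 64);
	known = max_shift == 64 ? ~0ULL : (1ULL << max_shift) - 1;

	return (value & ~known) != 0;
}
```

A tool built before this series has a max shift of 17, so a
branch_sample_type with bit 17 (HW_INDEX) set would be flagged; a tool
built with the series has a max shift of 18 and accepts it.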

Besides the physical index, the maximum number of LBRs is required as
well. Patches 4 & 5 retrieve the capabilities information from sysfs
and save it in the perf header.

Patches 6 & 7 implement the LBR stitching approach.

Users can use the options introduced in patches 8-11 to enable the LBR
stitching approach for perf report, script, top and c2c.

Patch 12 adds a fast path for the duplicate entries check. It benefits
all call stack parsing, not just the stitched LBR call stack. It can be
merged independently.


The stitching approach is based on LBR call stack technology. The known
limitations of LBR call stack technology still apply to the approach,
e.g. exception handling such as setjmp/longjmp leaves calls and returns
unmatched.
This approach is not foolproof. There can be cases where it creates
incorrect call stacks from incorrect matches. There is no attempt
to validate any matches in another way, so it is not enabled by default.
However, in many common cases with call stack overflows it can recreate
better call stacks than the default LBR call stack output. So if there
are problems with LBR overflows, this is a possible workaround.

Regression:
Users may collect LBR call stacks on a machine with a new perf tool and
a new kernel (which support LBR TOS), but then parse the perf.data with
an old perf tool (which does not support LBR TOS). The old tool doesn't
check attr.branch_sample_type, so users will probably get incorrect
information without any warning.

Performance impact:
The processing time may increase with the LBR stitching approach
enabled. The impact depends on the increased depth of call stacks.

For example, take a simple test case, tchain_edit, whose call stacks are
43 deep:
perf record --call-graph lbr -- ./tchain_edit
perf report --stitch-lbr

Without --stitch-lbr, perf report only displays call stacks up to a
depth of 32.
With --stitch-lbr, perf report can display the full depth of 43.
The depth of the call stacks increases by about 34% (11/32).

Correspondingly, the processing time of perf report increases by 39%:
Without --stitch-lbr:                           11.0 sec
With --stitch-lbr:                              15.3 sec
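
The two percentages can be re-derived from the figures quoted here. A
small sketch, where pct_increase is a made-up helper and not part of
perf: going from a depth of 32 to 43 is 11/32 = 34.375%, and going from
11.0 sec to 15.3 sec is 4.3/11.0, roughly 39.1%.

```c
#include <assert.h>

/* Relative increase from base to value, in percent. */
static double pct_increase(double base, double value)
{
	assert(base > 0.0);
	return (value - base) / base * 100.0;
}
```

pct_increase(32.0, 43.0) gives the depth increase and
pct_increase(11.0, 15.3) gives the processing time increase.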

The source code of tchain_edit.c is similar to the following.
noinline void f43(void)
{
        int i;
        for (i = 0; i < 10000;) {

                if(i%2)
                        i++;
                else
                        i++;
        }
}

noinline void f42(void)
{
        int i;
        for (i = 0; i < 100; i++) {
                f43();
                f43();
                f43();
        }
}

noinline void f41(void)
{
        int i;
        for (i = 0; i < 100; i++) {
                f42();
                f42();
                f42();
        }
}
noinline void f40(void)
{
        f41();
}

... ...

noinline void f32(void)
{
        f33();
}

noinline void f31(void)
{
        int i;

        for (i = 0; i < 10000; i++) {
                if(i%2)
                        i++;
                else
                        i++;
        }

        f32();
}

noinline void f30(void)
{
        f31();
}

... ...

noinline void f1(void)
{
        f2();
}

int main()
{
        f1();
}

Kan Liang (12):
  perf tools: Add hw_idx in struct branch_stack
  perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
  perf header: Add check for event attr
  perf pmu: Add support for PMU capabilities
  perf header: Support CPU PMU capabilities
  perf machine: Refine the function for LBR call stack reconstruction
  perf tools: Stitch LBR call stack
  perf report: Add option to enable the LBR stitching approach
  perf script: Add option to enable the LBR stitching approach
  perf top: Add option to enable the LBR stitching approach
  perf c2c: Add option to enable the LBR stitching approach
  perf hist: Add fast path for duplicate entries check approach

 tools/include/uapi/linux/perf_event.h         |   8 +-
 tools/perf/Documentation/perf-c2c.txt         |  11 +
 tools/perf/Documentation/perf-report.txt      |  11 +
 tools/perf/Documentation/perf-script.txt      |  11 +
 tools/perf/Documentation/perf-top.txt         |   9 +
 .../Documentation/perf.data-file-format.txt   |  16 +
 tools/perf/builtin-c2c.c                      |   6 +
 tools/perf/builtin-record.c                   |   3 +
 tools/perf/builtin-report.c                   |   6 +
 tools/perf/builtin-script.c                   |  76 ++--
 tools/perf/builtin-stat.c                     |   1 +
 tools/perf/builtin-top.c                      |  11 +
 tools/perf/tests/sample-parsing.c             |   7 +-
 tools/perf/util/branch.h                      |  27 +-
 tools/perf/util/callchain.h                   |  12 +-
 tools/perf/util/cs-etm.c                      |   1 +
 tools/perf/util/env.h                         |   3 +
 tools/perf/util/event.h                       |   1 +
 tools/perf/util/evsel.c                       |  20 +-
 tools/perf/util/evsel.h                       |   6 +
 tools/perf/util/header.c                      | 147 ++++++
 tools/perf/util/header.h                      |   1 +
 tools/perf/util/hist.c                        |  26 +-
 tools/perf/util/intel-pt.c                    |   2 +
 tools/perf/util/machine.c                     | 424 +++++++++++++++---
 tools/perf/util/perf_event_attr_fprintf.c     |   1 +
 tools/perf/util/pmu.c                         |  87 ++++
 tools/perf/util/pmu.h                         |  12 +
 .../scripting-engines/trace-event-python.c    |  30 +-
 tools/perf/util/session.c                     |   8 +-
 tools/perf/util/sort.c                        |   2 +-
 tools/perf/util/sort.h                        |   2 +
 tools/perf/util/synthetic-events.c            |   6 +-
 tools/perf/util/thread.c                      |   2 +
 tools/perf/util/thread.h                      |  34 ++
 tools/perf/util/top.h                         |   1 +
 36 files changed, 900 insertions(+), 131 deletions(-)

-- 
2.17.1



* [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
@ 2020-02-28 16:30 ` kan.liang
From: kan.liang @ 2020-02-28 16:30 UTC
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The low level index of raw branch records for the most recent branch can
be recorded in a sample with the PERF_SAMPLE_BRANCH_HW_INDEX
branch_sample_type. Extend struct branch_stack to support it.

However, if PERF_SAMPLE_BRANCH_HW_INDEX is not set, only nr and
entries[] are output by the kernel. The entries[] pointer would then be
wrong, since the output format differs from the new struct branch_stack.
Add a variable no_hw_idx in struct perf_sample to indicate when the
hw_idx is not output.
Add get_branch_entry() to return the corresponding pointer to entries[0].
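
The offset logic can be shown on a raw u64 buffer. The sketch below is a
standalone illustration, not the helper added to branch.h: struct entry
and first_entry() are hypothetical stand-ins, and the buffers reuse the
values from the sample-parsing test (nr = 1, optional hw_idx = -1ULL,
then from/to/flags = 211/212/213).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's struct branch_entry: from, to, flags. */
struct entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/*
 * raw points at the branch stack part of a sample:
 *   u64 nr; [u64 hw_idx;] { u64 from, to, flags } lbr[nr];
 * Skip nr, and skip hw_idx too when it was recorded.
 */
static const struct entry *first_entry(const uint64_t *raw, bool no_hw_idx)
{
	assert(raw);
	return (const struct entry *)(raw + (no_hw_idx ? 1 : 2));
}
```

Both first_entry((const uint64_t[]){1, UINT64_MAX, 211, 212, 213}, false)
and first_entry((const uint64_t[]){1, 211, 212, 213}, true) point at the
same from/to/flags triple.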

To keep the dummy branch sample consistent with the new branch sample,
add hw_idx to struct dummy_branch_stack for cs-etm and intel-pt.

Apply the new struct branch_stack to synthetic events as well.

Extend the sample-parsing test case to support the new struct
branch_stack.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/include/uapi/linux/perf_event.h         |  8 ++-
 tools/perf/builtin-script.c                   | 70 ++++++++++---------
 tools/perf/tests/sample-parsing.c             |  7 +-
 tools/perf/util/branch.h                      | 22 ++++++
 tools/perf/util/cs-etm.c                      |  1 +
 tools/perf/util/event.h                       |  1 +
 tools/perf/util/evsel.c                       |  5 ++
 tools/perf/util/evsel.h                       |  5 ++
 tools/perf/util/hist.c                        |  3 +-
 tools/perf/util/intel-pt.c                    |  2 +
 tools/perf/util/machine.c                     | 35 +++++-----
 .../scripting-engines/trace-event-python.c    | 30 ++++----
 tools/perf/util/session.c                     |  8 ++-
 tools/perf/util/synthetic-events.c            |  6 +-
 14 files changed, 131 insertions(+), 72 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 377d794d3105..397cfd65b3fe 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -181,6 +181,8 @@ enum perf_branch_sample_type_shift {
 
 	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
 
+	PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT	= 17, /* save low level index of raw branch records */
+
 	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
 };
 
@@ -208,6 +210,8 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
 		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
 
+	PERF_SAMPLE_BRANCH_HW_INDEX	= 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT,
+
 	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
@@ -853,7 +857,9 @@ enum perf_event_type {
 	 *	  char                  data[size];}&& PERF_SAMPLE_RAW
 	 *
 	 *	{ u64                   nr;
-	 *        { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK
+	 *	  { u64	hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
+	 *        { u64 from, to, flags } lbr[nr];
+	 *      } && PERF_SAMPLE_BRANCH_STACK
 	 *
 	 * 	{ u64			abi; # enum perf_sample_regs_abi
 	 * 	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index e2406b291c1c..acf3107bbda2 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -735,6 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 					struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -743,8 +744,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 		return 0;
 
 	for (i = 0; i < br->nr; i++) {
-		from = br->entries[i].from;
-		to   = br->entries[i].to;
+		from = entries[i].from;
+		to   = entries[i].to;
 
 		if (PRINT_FIELD(DSO)) {
 			memset(&alf, 0, sizeof(alf));
@@ -768,10 +769,10 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 		}
 
 		printed += fprintf(fp, "/%c/%c/%c/%d ",
-			mispred_str( br->entries + i),
-			br->entries[i].flags.in_tx? 'X' : '-',
-			br->entries[i].flags.abort? 'A' : '-',
-			br->entries[i].flags.cycles);
+			mispred_str(entries + i),
+			entries[i].flags.in_tx ? 'X' : '-',
+			entries[i].flags.abort ? 'A' : '-',
+			entries[i].flags.cycles);
 	}
 
 	return printed;
@@ -782,6 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 					   struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -793,8 +795,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 
 		memset(&alf, 0, sizeof(alf));
 		memset(&alt, 0, sizeof(alt));
-		from = br->entries[i].from;
-		to   = br->entries[i].to;
+		from = entries[i].from;
+		to   = entries[i].to;
 
 		thread__find_symbol_fb(thread, sample->cpumode, from, &alf);
 		thread__find_symbol_fb(thread, sample->cpumode, to, &alt);
@@ -813,10 +815,10 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 			printed += fprintf(fp, ")");
 		}
 		printed += fprintf(fp, "/%c/%c/%c/%d ",
-			mispred_str( br->entries + i),
-			br->entries[i].flags.in_tx? 'X' : '-',
-			br->entries[i].flags.abort? 'A' : '-',
-			br->entries[i].flags.cycles);
+			mispred_str(entries + i),
+			entries[i].flags.in_tx ? 'X' : '-',
+			entries[i].flags.abort ? 'A' : '-',
+			entries[i].flags.cycles);
 	}
 
 	return printed;
@@ -827,6 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 					   struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -838,8 +841,8 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 
 		memset(&alf, 0, sizeof(alf));
 		memset(&alt, 0, sizeof(alt));
-		from = br->entries[i].from;
-		to   = br->entries[i].to;
+		from = entries[i].from;
+		to   = entries[i].to;
 
 		if (thread__find_map_fb(thread, sample->cpumode, from, &alf) &&
 		    !alf.map->dso->adjust_symbols)
@@ -862,10 +865,10 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 			printed += fprintf(fp, ")");
 		}
 		printed += fprintf(fp, "/%c/%c/%c/%d ",
-			mispred_str(br->entries + i),
-			br->entries[i].flags.in_tx ? 'X' : '-',
-			br->entries[i].flags.abort ? 'A' : '-',
-			br->entries[i].flags.cycles);
+			mispred_str(entries + i),
+			entries[i].flags.in_tx ? 'X' : '-',
+			entries[i].flags.abort ? 'A' : '-',
+			entries[i].flags.cycles);
 	}
 
 	return printed;
@@ -1053,6 +1056,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 					    struct machine *machine, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	u64 start, end;
 	int i, insn, len, nr, ilen, printed = 0;
 	struct perf_insn x;
@@ -1073,31 +1077,31 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	printed += fprintf(fp, "%c", '\n');
 
 	/* Handle first from jump, of which we don't know the entry. */
-	len = grab_bb(buffer, br->entries[nr-1].from,
-			br->entries[nr-1].from,
+	len = grab_bb(buffer, entries[nr-1].from,
+			entries[nr-1].from,
 			machine, thread, &x.is64bit, &x.cpumode, false);
 	if (len > 0) {
-		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
+		printed += ip__fprintf_sym(entries[nr - 1].from, thread,
 					   x.cpumode, x.cpu, &lastsym, attr, fp);
-		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
+		printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1],
 					    &x, buffer, len, 0, fp, &total_cycles);
 		if (PRINT_FIELD(SRCCODE))
-			printed += print_srccode(thread, x.cpumode, br->entries[nr - 1].from);
+			printed += print_srccode(thread, x.cpumode, entries[nr - 1].from);
 	}
 
 	/* Print all blocks */
 	for (i = nr - 2; i >= 0; i--) {
-		if (br->entries[i].from || br->entries[i].to)
+		if (entries[i].from || entries[i].to)
 			pr_debug("%d: %" PRIx64 "-%" PRIx64 "\n", i,
-				 br->entries[i].from,
-				 br->entries[i].to);
-		start = br->entries[i + 1].to;
-		end   = br->entries[i].from;
+				 entries[i].from,
+				 entries[i].to);
+		start = entries[i + 1].to;
+		end   = entries[i].from;
 
 		len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
 		/* Patch up missing kernel transfers due to ring filters */
 		if (len == -ENXIO && i > 0) {
-			end = br->entries[--i].from;
+			end = entries[--i].from;
 			pr_debug("\tpatching up to %" PRIx64 "-%" PRIx64 "\n", start, end);
 			len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
 		}
@@ -1110,7 +1114,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 
 			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
 			if (ip == end) {
-				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, ++insn, fp,
+				printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp,
 							    &total_cycles);
 				if (PRINT_FIELD(SRCCODE))
 					printed += print_srccode(thread, x.cpumode, ip);
@@ -1134,9 +1138,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	 * Hit the branch? In this case we are already done, and the target
 	 * has not been executed yet.
 	 */
-	if (br->entries[0].from == sample->ip)
+	if (entries[0].from == sample->ip)
 		goto out;
-	if (br->entries[0].flags.abort)
+	if (entries[0].flags.abort)
 		goto out;
 
 	/*
@@ -1147,7 +1151,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	 * between final branch and sample. When this happens just
 	 * continue walking after the last TO until we hit a branch.
 	 */
-	start = br->entries[0].to;
+	start = entries[0].to;
 	end = sample->ip;
 	if (end < start) {
 		/* Missing jump. Scan 128 bytes for the next branch */
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 2762e1155238..14239e472187 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -99,6 +99,7 @@ static bool samples_same(const struct perf_sample *s1,
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		COMP(branch_stack->nr);
+		COMP(branch_stack->hw_idx);
 		for (i = 0; i < s1->branch_stack->nr; i++)
 			MCOMP(branch_stack->entries[i]);
 	}
@@ -186,7 +187,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		u64 data[64];
 	} branch_stack = {
 		/* 1 branch_entry */
-		.data = {1, 211, 212, 213},
+		.data = {1, -1ULL, 211, 212, 213},
 	};
 	u64 regs[64];
 	const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
@@ -208,6 +209,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		.transaction	= 112,
 		.raw_data	= (void *)raw_data,
 		.callchain	= &callchain.callchain,
+		.no_hw_idx      = false,
 		.branch_stack	= &branch_stack.branch_stack,
 		.user_regs	= {
 			.abi	= PERF_SAMPLE_REGS_ABI_64,
@@ -244,6 +246,9 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 	if (sample_type & PERF_SAMPLE_REGS_INTR)
 		evsel.core.attr.sample_regs_intr = sample_regs;
 
+	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+		evsel.core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+
 	for (i = 0; i < sizeof(regs); i++)
 		*(i + (u8 *)regs) = i & 0xfe;
 
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 88e00d268f6f..7fc9fa0dc361 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -12,6 +12,7 @@
 #include <linux/stddef.h>
 #include <linux/perf_event.h>
 #include <linux/types.h>
+#include "event.h"
 
 struct branch_flags {
 	u64 mispred:1;
@@ -39,9 +40,30 @@ struct branch_entry {
 
 struct branch_stack {
 	u64			nr;
+	u64			hw_idx;
 	struct branch_entry	entries[0];
 };
 
+/*
+ * The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied.
+ * Otherwise, the output format of a sample with branch stack is
+ * struct branch_stack {
+ *	u64			nr;
+ *	struct branch_entry	entries[0];
+ * }
+ * Check whether the hw_idx is available,
+ * and return the corresponding pointer of entries[0].
+ */
+static inline struct branch_entry *get_branch_entry(struct perf_sample *sample)
+{
+	u64 *entry = (u64 *)sample->branch_stack;
+
+	entry++;
+	if (sample->no_hw_idx)
+		return (struct branch_entry *)entry;
+	return (struct branch_entry *)(++entry);
+}
+
 struct branch_type_stat {
 	bool	branch_to;
 	u64	counts[PERF_BR_MAX];
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5471045ebf5c..e697fe1c67b3 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1202,6 +1202,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	if (etm->synth_opts.last_branch) {
 		dummy_bs = (struct dummy_branch_stack){
 			.nr = 1,
+			.hw_idx = -1ULL,
 			.entries = {
 				.from = sample.ip,
 				.to = sample.addr,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 85223159737c..3cda40a2fafc 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -139,6 +139,7 @@ struct perf_sample {
 	u16 insn_len;
 	u8  cpumode;
 	u16 misc;
+	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
 	char insn[MAX_INSN];
 	void *raw_data;
 	struct ip_callchain *callchain;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c8dc4450884c..05883a45de5b 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2169,7 +2169,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 		if (data->branch_stack->nr > max_branch_nr)
 			return -EFAULT;
+
 		sz = data->branch_stack->nr * sizeof(struct branch_entry);
+		if (perf_evsel__has_branch_hw_idx(evsel))
+			sz += sizeof(u64);
+		else
+			data->no_hw_idx = true;
 		OVERFLOW_CHECK(array, sz, max_size);
 		array = (void *)array + sz;
 	}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index dc14f4a823cd..99a0cb60c556 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -389,6 +389,11 @@ static inline bool perf_evsel__has_branch_callstack(const struct evsel *evsel)
 	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
 }
 
+static inline bool perf_evsel__has_branch_hw_idx(const struct evsel *evsel)
+{
+	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX;
+}
+
 static inline bool evsel__has_callchain(const struct evsel *evsel)
 {
 	return (evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN) != 0;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ca5a8f4d007e..808ca27bd5cf 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2584,9 +2584,10 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 			  u64 *total_cycles)
 {
 	struct branch_info *bi;
+	struct branch_entry *entries = get_branch_entry(sample);
 
 	/* If we have branch cycles always annotate them. */
-	if (bs && bs->nr && bs->entries[0].flags.cycles) {
+	if (bs && bs->nr && entries[0].flags.cycles) {
 		int i;
 
 		bi = sample__resolve_bstack(sample, al);
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 33cf8928cf05..23c8289c2472 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1295,6 +1295,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	struct perf_sample sample = { .ip = 0, };
 	struct dummy_branch_stack {
 		u64			nr;
+		u64			hw_idx;
 		struct branch_entry	entries;
 	} dummy_bs;
 
@@ -1316,6 +1317,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) {
 		dummy_bs = (struct dummy_branch_stack){
 			.nr = 1,
+			.hw_idx = -1ULL,
 			.entries = {
 				.from = sample.ip,
 				.to = sample.addr,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c8c5410315e8..62522b76a924 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2083,15 +2083,16 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 {
 	unsigned int i;
 	const struct branch_stack *bs = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
 
 	if (!bi)
 		return NULL;
 
 	for (i = 0; i < bs->nr; i++) {
-		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
-		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
-		bi[i].flags = bs->entries[i].flags;
+		ip__resolve_ams(al->thread, &bi[i].to, entries[i].to);
+		ip__resolve_ams(al->thread, &bi[i].from, entries[i].from);
+		bi[i].flags = entries[i].flags;
 	}
 	return bi;
 }
@@ -2187,6 +2188,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	/* LBR only affects the user callchain */
 	if (i != chain_nr) {
 		struct branch_stack *lbr_stack = sample->branch_stack;
+		struct branch_entry *entries = get_branch_entry(sample);
 		int lbr_nr = lbr_stack->nr, j, k;
 		bool branch;
 		struct branch_flags *flags;
@@ -2212,31 +2214,29 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					ip = chain->ips[j];
 				else if (j > i + 1) {
 					k = j - i - 2;
-					ip = lbr_stack->entries[k].from;
+					ip = entries[k].from;
 					branch = true;
-					flags = &lbr_stack->entries[k].flags;
+					flags = &entries[k].flags;
 				} else {
-					ip = lbr_stack->entries[0].to;
+					ip = entries[0].to;
 					branch = true;
-					flags = &lbr_stack->entries[0].flags;
-					branch_from =
-						lbr_stack->entries[0].from;
+					flags = &entries[0].flags;
+					branch_from = entries[0].from;
 				}
 			} else {
 				if (j < lbr_nr) {
 					k = lbr_nr - j - 1;
-					ip = lbr_stack->entries[k].from;
+					ip = entries[k].from;
 					branch = true;
-					flags = &lbr_stack->entries[k].flags;
+					flags = &entries[k].flags;
 				}
 				else if (j > lbr_nr)
 					ip = chain->ips[i + 1 - (j - lbr_nr)];
 				else {
-					ip = lbr_stack->entries[0].to;
+					ip = entries[0].to;
 					branch = true;
-					flags = &lbr_stack->entries[0].flags;
-					branch_from =
-						lbr_stack->entries[0].from;
+					flags = &entries[0].flags;
+					branch_from = entries[0].from;
 				}
 			}
 
@@ -2283,6 +2283,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 					    int max_stack)
 {
 	struct branch_stack *branch = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = 0;
 	u8 cpumode = PERF_RECORD_MISC_USER;
@@ -2330,7 +2331,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
 		for (i = 0; i < nr; i++) {
 			if (callchain_param.order == ORDER_CALLEE) {
-				be[i] = branch->entries[i];
+				be[i] = entries[i];
 
 				if (chain == NULL)
 					continue;
@@ -2349,7 +2350,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 				    be[i].from >= chain->ips[first_call] - 8)
 					first_call++;
 			} else
-				be[i] = branch->entries[branch->nr - i - 1];
+				be[i] = entries[branch->nr - i - 1];
 		}
 
 		memset(iter, 0, sizeof(struct iterations) * nr);
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 80ca5d0ab7fe..02b6c87c5abe 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -464,6 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 					struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	PyObject *pylist;
 	u64 i;
 
@@ -484,28 +485,28 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 			Py_FatalError("couldn't create Python dictionary");
 
 		pydict_set_item_string_decref(pyelem, "from",
-		    PyLong_FromUnsignedLongLong(br->entries[i].from));
+		    PyLong_FromUnsignedLongLong(entries[i].from));
 		pydict_set_item_string_decref(pyelem, "to",
-		    PyLong_FromUnsignedLongLong(br->entries[i].to));
+		    PyLong_FromUnsignedLongLong(entries[i].to));
 		pydict_set_item_string_decref(pyelem, "mispred",
-		    PyBool_FromLong(br->entries[i].flags.mispred));
+		    PyBool_FromLong(entries[i].flags.mispred));
 		pydict_set_item_string_decref(pyelem, "predicted",
-		    PyBool_FromLong(br->entries[i].flags.predicted));
+		    PyBool_FromLong(entries[i].flags.predicted));
 		pydict_set_item_string_decref(pyelem, "in_tx",
-		    PyBool_FromLong(br->entries[i].flags.in_tx));
+		    PyBool_FromLong(entries[i].flags.in_tx));
 		pydict_set_item_string_decref(pyelem, "abort",
-		    PyBool_FromLong(br->entries[i].flags.abort));
+		    PyBool_FromLong(entries[i].flags.abort));
 		pydict_set_item_string_decref(pyelem, "cycles",
-		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
+		    PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
 
 		thread__find_map_fb(thread, sample->cpumode,
-				    br->entries[i].from, &al);
+				    entries[i].from, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "from_dsoname",
 					      _PyUnicode_FromString(dsoname));
 
 		thread__find_map_fb(thread, sample->cpumode,
-				    br->entries[i].to, &al);
+				    entries[i].to, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "to_dsoname",
 					      _PyUnicode_FromString(dsoname));
@@ -561,6 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 					   struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	PyObject *pylist;
 	u64 i;
 	char bf[512];
@@ -581,22 +583,22 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 			Py_FatalError("couldn't create Python dictionary");
 
 		thread__find_symbol_fb(thread, sample->cpumode,
-				       br->entries[i].from, &al);
+				       entries[i].from, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "from",
 					      _PyUnicode_FromString(bf));
 
 		thread__find_symbol_fb(thread, sample->cpumode,
-				       br->entries[i].to, &al);
+				       entries[i].to, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "to",
 					      _PyUnicode_FromString(bf));
 
-		get_br_mspred(&br->entries[i].flags, bf, sizeof(bf));
+		get_br_mspred(&entries[i].flags, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "pred",
 					      _PyUnicode_FromString(bf));
 
-		if (br->entries[i].flags.in_tx) {
+		if (entries[i].flags.in_tx) {
 			pydict_set_item_string_decref(pyelem, "in_tx",
 					      _PyUnicode_FromString("X"));
 		} else {
@@ -604,7 +606,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 					      _PyUnicode_FromString("-"));
 		}
 
-		if (br->entries[i].flags.abort) {
+		if (entries[i].flags.abort) {
 			pydict_set_item_string_decref(pyelem, "abort",
 					      _PyUnicode_FromString("A"));
 		} else {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index d0d7d25b23e3..dab985e3f136 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1007,6 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 {
 	struct ip_callchain *callchain = sample->callchain;
 	struct branch_stack *lbr_stack = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
 	u64 kernel_callchain_nr = callchain->nr;
 	unsigned int i;
 
@@ -1043,10 +1044,10 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 			       i, callchain->ips[i]);
 
 		printf("..... %2d: %016" PRIx64 "\n",
-		       (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
+		       (int)(kernel_callchain_nr), entries[0].to);
 		for (i = 0; i < lbr_stack->nr; i++)
 			printf("..... %2d: %016" PRIx64 "\n",
-			       (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
+			       (int)(i + kernel_callchain_nr + 1), entries[i].from);
 	}
 }
 
@@ -1068,6 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
 
 static void branch_stack__printf(struct perf_sample *sample, bool callstack)
 {
+	struct branch_entry *entries = get_branch_entry(sample);
 	uint64_t i;
 
 	printf("%s: nr:%" PRIu64 "\n",
@@ -1075,7 +1077,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
 		sample->branch_stack->nr);
 
 	for (i = 0; i < sample->branch_stack->nr; i++) {
-		struct branch_entry *e = &sample->branch_stack->entries[i];
+		struct branch_entry *e = &entries[i];
 
 		if (!callstack) {
 			printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index c423298fe62d..dd3e6f43fb86 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1183,7 +1183,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		sz += sizeof(u64);
+		/* nr, hw_idx */
+		sz += 2 * sizeof(u64);
 		result += sz;
 	}
 
@@ -1344,7 +1345,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		sz += sizeof(u64);
+		/* nr, hw_idx */
+		sz += 2 * sizeof(u64);
 		memcpy(array, sample->branch_stack, sz);
 		array = (void *)array + sz;
 	}
-- 
2.17.1



* [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
  2020-02-28 16:30 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-03-05 20:25   ` Arnaldo Carvalho de Melo
  2020-03-19 14:10   ` [tip: perf/core] perf evsel: " tip-bot2 for Kan Liang
  2020-02-28 16:30 ` [PATCH 03/12] perf header: Add check for event attr kan.liang
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

A new branch sample type, PERF_SAMPLE_BRANCH_HW_INDEX, has been
introduced in the latest kernel.

Enable HW_INDEX by default in LBR call stack mode.
If the kernel doesn't support the sample type, switch it off.

Add HW_INDEX to attr_fprintf as well, so that users can check whether
the branch sample type is set via the debug information or the header.
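
The "try, then switch off and retry" behaviour can be sketched roughly as
below. This is only an illustration, not the perf code: struct fake_attr,
try_open(), open_with_fallback() and the SKETCH_* bit values are
hypothetical stand-ins for perf_event_attr, the perf_event_open() call and
the uapi branch sample bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uapi branch sample bits. */
#define SKETCH_BRANCH_USER       (1ULL << 0)
#define SKETCH_BRANCH_CALL_STACK (1ULL << 1)
#define SKETCH_BRANCH_HW_INDEX   (1ULL << 2)

struct fake_attr {
	uint64_t branch_sample_type;
};

/* Hypothetical kernel model: rejects any bit outside 'supported'. */
static int try_open(const struct fake_attr *attr, uint64_t supported)
{
	return (attr->branch_sample_type & ~supported) ? -1 : 0;
}

/* Remembered across opens, in the spirit of perf_missing_features. */
static bool missing_branch_hw_idx;

static int open_with_fallback(struct fake_attr *attr, uint64_t supported)
{
	assert(attr);

	for (;;) {
		/* Once known to be missing, never request the bit again. */
		if (missing_branch_hw_idx)
			attr->branch_sample_type &= ~SKETCH_BRANCH_HW_INDEX;

		if (try_open(attr, supported) == 0)
			return 0;

		/* First failure with HW_INDEX set: switch it off and retry. */
		if (!missing_branch_hw_idx &&
		    (attr->branch_sample_type & SKETCH_BRANCH_HW_INDEX)) {
			missing_branch_hw_idx = true;
			continue;
		}
		return -1;
	}
}
```

With a model kernel that only knows USER and CALL_STACK, the first open
fails, the HW_INDEX bit is cleared, and the second open succeeds.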

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/evsel.c                   | 15 ++++++++++++---
 tools/perf/util/evsel.h                   |  1 +
 tools/perf/util/perf_event_attr_fprintf.c |  1 +
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 05883a45de5b..816d930d774e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -712,7 +712,8 @@ static void __perf_evsel__config_callchain(struct evsel *evsel,
 				attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
 							PERF_SAMPLE_BRANCH_CALL_STACK |
 							PERF_SAMPLE_BRANCH_NO_CYCLES |
-							PERF_SAMPLE_BRANCH_NO_FLAGS;
+							PERF_SAMPLE_BRANCH_NO_FLAGS |
+							PERF_SAMPLE_BRANCH_HW_INDEX;
 			}
 		} else
 			 pr_warning("Cannot use LBR callstack with branch stack. "
@@ -763,7 +764,8 @@ perf_evsel__reset_callgraph(struct evsel *evsel,
 	if (param->record_mode == CALLCHAIN_LBR) {
 		perf_evsel__reset_sample_bit(evsel, BRANCH_STACK);
 		attr->branch_sample_type &= ~(PERF_SAMPLE_BRANCH_USER |
-					      PERF_SAMPLE_BRANCH_CALL_STACK);
+					      PERF_SAMPLE_BRANCH_CALL_STACK |
+					      PERF_SAMPLE_BRANCH_HW_INDEX);
 	}
 	if (param->record_mode == CALLCHAIN_DWARF) {
 		perf_evsel__reset_sample_bit(evsel, REGS_USER);
@@ -1673,6 +1675,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 		evsel->core.attr.ksymbol = 0;
 	if (perf_missing_features.bpf)
 		evsel->core.attr.bpf_event = 0;
+	if (perf_missing_features.branch_hw_idx)
+		evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_HW_INDEX;
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->core.attr.sample_id_all = 0;
@@ -1784,7 +1788,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
+	if (!perf_missing_features.branch_hw_idx &&
+	    (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
+		perf_missing_features.branch_hw_idx = true;
+		pr_debug2("switching off branch HW index support\n");
+		goto fallback_missing_features;
+	} else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
 		perf_missing_features.aux_output = true;
 		pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
 		goto out_close;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 99a0cb60c556..33804740e2ca 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -119,6 +119,7 @@ struct perf_missing_features {
 	bool ksymbol;
 	bool bpf;
 	bool aux_output;
+	bool branch_hw_idx;
 };
 
 extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 651203126c71..355d3458d4e6 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -50,6 +50,7 @@ static void __p_branch_sample_type(char *buf, size_t size, u64 value)
 		bit_name(ABORT_TX), bit_name(IN_TX), bit_name(NO_TX),
 		bit_name(COND), bit_name(CALL_STACK), bit_name(IND_JUMP),
 		bit_name(CALL), bit_name(NO_FLAGS), bit_name(NO_CYCLES),
+		bit_name(HW_INDEX),
 		{ .name = NULL, }
 	};
 #undef bit_name
-- 
2.17.1



* [PATCH 03/12] perf header: Add check for event attr
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
  2020-02-28 16:30 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack kan.liang
  2020-02-28 16:30 ` [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-03-19 14:10   ` [tip: perf/core] perf header: Add check for unexpected use of reserved membrs in " tip-bot2 for Kan Liang
  2020-02-28 16:30 ` [PATCH 04/12] perf pmu: Add support for PMU capabilities kan.liang
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The perf.data may be generated by a newer version of the perf tool,
which supports new input bits in attr, e.g. a new bit for
branch_sample_type.
If that perf.data is later parsed by an older version of the perf tool,
the old tool may parse it incorrectly, and no warning message is
shown.

The current perf header code never checks for unknown input bits in attr.

When reading the event desc from the header, check the stored event attr.
The reserved bits, sample type, read format and branch sample type
are checked.
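
The unknown-bit test relies on the *_MAX enumerator being the first bit
the tool does not know about, so everything at or above it can be masked
out with ~(MAX - 1). A minimal sketch, assuming a hypothetical
enum sketch_sample in the uapi style and a hypothetical helper
has_unknown_bits():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical enum in the uapi style: each flag is one bit and the
 * final *_MAX entry is the first bit this tool does not know about.
 */
enum sketch_sample {
	SKETCH_SAMPLE_IP   = 1U << 0,
	SKETCH_SAMPLE_TID  = 1U << 1,
	SKETCH_SAMPLE_TIME = 1U << 2,
	SKETCH_SAMPLE_MAX  = 1U << 3,
};

/* True if 'value' has any bit set at or above 'max'. */
static bool has_unknown_bits(uint64_t value, uint64_t max)
{
	/* The mask trick only works when max is a single bit. */
	assert(max && !(max & (max - 1)));

	return (value & ~(max - 1)) != 0;
}
```

A value built only from known flags passes, while a value carrying the
MAX bit itself, or any higher bit, is reported as unknown.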

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/header.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 4246e7447e54..acbd046bf95c 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1590,6 +1590,40 @@ static void free_event_desc(struct evsel *events)
 	free(events);
 }
 
+static bool perf_attr_check(struct perf_event_attr *attr)
+{
+	if (attr->__reserved_1 || attr->__reserved_2 || attr->__reserved_3) {
+		pr_warning("Reserved bits are set unexpectedly. "
+			   "Please update perf tool.\n");
+		return false;
+	}
+
+	if (attr->sample_type & ~(PERF_SAMPLE_MAX-1)) {
+		pr_warning("Unknown sample type (0x%llx) is detected. "
+			   "Please update perf tool.\n",
+			   attr->sample_type);
+		return false;
+	}
+
+	if (attr->read_format & ~(PERF_FORMAT_MAX-1)) {
+		pr_warning("Unknown read format (0x%llx) is detected. "
+			   "Please update perf tool.\n",
+			   attr->read_format);
+		return false;
+	}
+
+	if ((attr->sample_type & PERF_SAMPLE_BRANCH_STACK) &&
+	    (attr->branch_sample_type & ~(PERF_SAMPLE_BRANCH_MAX-1))) {
+		pr_warning("Unknown branch sample type (0x%llx) is detected. "
+			   "Please update perf tool.\n",
+			   attr->branch_sample_type);
+
+		return false;
+	}
+
+	return true;
+}
+
 static struct evsel *read_event_desc(struct feat_fd *ff)
 {
 	struct evsel *evsel, *events = NULL;
@@ -1634,6 +1668,9 @@ static struct evsel *read_event_desc(struct feat_fd *ff)
 
 		memcpy(&evsel->core.attr, buf, msz);
 
+		if (!perf_attr_check(&evsel->core.attr))
+			goto error;
+
 		if (do_read_u32(ff, &nr))
 			goto error;
 
-- 
2.17.1



* [PATCH 04/12] perf pmu: Add support for PMU capabilities
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (2 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 03/12] perf header: Add check for event attr kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 05/12] perf header: Support CPU " kan.liang
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The PMU capabilities information, which is located at
/sys/bus/event_source/devices/<dev>/caps, is required by the perf tool.
For example, the max LBR information is required to stitch the LBR call
stack.

Add perf_pmu__caps_parse() to parse the PMU capabilities information.
The information is stored in a list.

Add perf_pmu__scan_caps() to scan the capabilities one by one.

The following patch will store the capabilities information in perf
header.
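
For readers unfamiliar with the sysfs layout, the following sketch shows
one way to walk such a caps directory, where each file name is a
capability and its first line is the value. It is not the code from this
patch: strip_newline() and dump_caps() are hypothetical helpers, and the
sketch prints the pairs instead of storing them in a list.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Strip one trailing newline, if present. */
static void strip_newline(char *s)
{
	size_t len = strlen(s);

	if (len && s[len - 1] == '\n')
		s[len - 1] = '\0';
}

/*
 * Print every "name=value" pair found under 'dir' and return how many
 * were read; returns 0 if the directory does not exist.
 */
static int dump_caps(const char *dir, FILE *out)
{
	struct dirent *ent;
	DIR *d = opendir(dir);
	int nr = 0;

	assert(out);
	if (!d)
		return 0;

	while ((ent = readdir(d)) != NULL) {
		char path[4096], value[128];
		FILE *f;

		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;

		snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;

		/* Only the first line of each attribute file is used. */
		if (fgets(value, sizeof(value), f)) {
			strip_newline(value);
			fprintf(out, "%s=%s\n", ent->d_name, value);
			nr++;
		}
		fclose(f);
	}
	closedir(d);

	return nr;
}
```

Calling dump_caps("/sys/bus/event_source/devices/cpu/caps", stdout) would
list whatever capabilities the running kernel exposes, if any. The sketch
removes the newline only when one is present, whereas the patch drops the
last character of the value unconditionally.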

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/pmu.c | 87 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/pmu.h | 12 ++++++
 2 files changed, 99 insertions(+)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 8b99fd312aae..cec551bcf519 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -844,6 +844,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
 
 	INIT_LIST_HEAD(&pmu->format);
 	INIT_LIST_HEAD(&pmu->aliases);
+	INIT_LIST_HEAD(&pmu->caps);
 	list_splice(&format, &pmu->format);
 	list_splice(&aliases, &pmu->aliases);
 	list_add_tail(&pmu->list, &pmus);
@@ -1565,3 +1566,89 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
 	va_end(args);
 	return ret;
 }
+
+static int perf_pmu__new_caps(struct list_head *list, char *name, char *value)
+{
+	struct perf_pmu_caps *caps;
+
+	caps = zalloc(sizeof(*caps));
+	if (!caps)
+		return -ENOMEM;
+
+	caps->name = strdup(name);
+	caps->value = strndup(value, strlen(value) - 1);
+	list_add_tail(&caps->list, list);
+	return 0;
+}
+
+/*
+ * Reading/parsing the given pmu capabilities, which should be located at:
+ * /sys/bus/event_source/devices/<dev>/caps as sysfs group attributes.
+ * Return the number of capabilities
+ */
+int perf_pmu__caps_parse(struct perf_pmu *pmu)
+{
+	struct stat st;
+	char caps_path[PATH_MAX];
+	const char *sysfs = sysfs__mountpoint();
+	DIR *caps_dir;
+	struct dirent *evt_ent;
+	int nr_caps = 0;
+
+	if (!sysfs)
+		return -1;
+
+	snprintf(caps_path, PATH_MAX,
+		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name);
+
+	if (stat(caps_path, &st) < 0)
+		return 0;	/* no error if caps does not exist */
+
+	caps_dir = opendir(caps_path);
+	if (!caps_dir)
+		return -EINVAL;
+
+	while ((evt_ent = readdir(caps_dir)) != NULL) {
+		char *name = evt_ent->d_name;
+		char path[PATH_MAX];
+		char value[128];
+		FILE *file;
+
+		if (!strcmp(name, ".") || !strcmp(name, ".."))
+			continue;
+
+		snprintf(path, PATH_MAX, "%s/%s", caps_path, name);
+
+		file = fopen(path, "r");
+		if (!file)
+			break;
+
+		if (!fgets(value, sizeof(value), file) ||
+		    (perf_pmu__new_caps(&pmu->caps, name, value) < 0)) {
+			fclose(file);
+			break;
+		}
+
+		nr_caps++;
+		fclose(file);
+	}
+
+	closedir(caps_dir);
+
+	return nr_caps;
+}
+
+struct perf_pmu_caps *perf_pmu__scan_caps(struct perf_pmu *pmu,
+					  struct perf_pmu_caps *caps)
+{
+	if (!pmu)
+		return NULL;
+
+	if (!caps)
+		caps = list_prepare_entry(caps, &pmu->caps, list);
+
+	list_for_each_entry_continue(caps, &pmu->caps, list)
+		return caps;
+
+	return NULL;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 6737e3d5d568..a228e27ae462 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -21,6 +21,12 @@ enum {
 
 struct perf_event_attr;
 
+struct perf_pmu_caps {
+	char *name;
+	char *value;
+	struct list_head list;
+};
+
 struct perf_pmu {
 	char *name;
 	__u32 type;
@@ -32,6 +38,7 @@ struct perf_pmu {
 	struct perf_cpu_map *cpus;
 	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
 	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
+	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
 	struct list_head list;    /* ELEM */
 };
 
@@ -102,4 +109,9 @@ struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu);
 
 int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
 
+int perf_pmu__caps_parse(struct perf_pmu *pmu);
+
+struct perf_pmu_caps *perf_pmu__scan_caps(struct perf_pmu *pmu,
+					  struct perf_pmu_caps *caps);
+
 #endif /* __PMU_H */
-- 
2.17.1



* [PATCH 05/12] perf header: Support CPU PMU capabilities
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (3 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 04/12] perf pmu: Add support for PMU capabilities kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 06/12] perf machine: Refine the function for LBR call stack reconstruction kan.liang
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

To stitch the LBR call stack, the max LBR information is required, so the
CPU PMU capabilities information has to be stored in the perf header.

Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities.
Retrieve all CPU PMU capabilities, not just the max LBR information.

Add the variable max_branches to facilitate future use.

The CPU PMU capabilities information is only useful for LBR call stack
mode. Clear the feature for perf stat and the other perf record modes.
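
On the reading side, process_cpu_pmu_caps() in this patch keeps the
capabilities in env.cpu_pmu_caps as back-to-back "name=value" strings,
each with its own terminating NUL. A minimal sketch of looking up one
value in such a packed buffer, assuming a hypothetical helper
caps_lookup() that is not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up 'name' in a buffer holding 'nr' back-to-back "name=value"
 * strings, each terminated by its own NUL.
 * Returns a pointer to the value, or NULL if the name is not found.
 */
static const char *caps_lookup(const char *buf, int nr, const char *name)
{
	size_t nlen = strlen(name);

	assert(buf || nr == 0);

	while (nr-- > 0) {
		/* Match the full name, followed directly by '='. */
		if (!strncmp(buf, name, nlen) && buf[nlen] == '=')
			return buf + nlen + 1;

		/* Skip to the start of the next packed string. */
		buf += strlen(buf) + 1;
	}

	return NULL;
}
```

The same "advance by strlen() + 1" walk is what print_cpu_pmu_caps() uses
to print the list.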

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 .../Documentation/perf.data-file-format.txt   |  16 +++
 tools/perf/builtin-record.c                   |   3 +
 tools/perf/builtin-stat.c                     |   1 +
 tools/perf/util/env.h                         |   3 +
 tools/perf/util/header.c                      | 110 ++++++++++++++++++
 tools/perf/util/header.h                      |   1 +
 6 files changed, 134 insertions(+)

diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
index b0152e1095c5..b6472e463284 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -373,6 +373,22 @@ struct {
 Indicates that trace contains records of PERF_RECORD_COMPRESSED type
 that have perf_events records in compressed form.
 
+	HEADER_CPU_PMU_CAPS = 28,
+
+	A list of cpu PMU capabilities. The format of data is as below.
+
+struct {
+	u32 nr_cpu_pmu_caps;
+	{
+		char	name[];
+		char	value[];
+	} [nr_cpu_pmu_caps]
+};
+
+
+Example:
+ cpu pmu capabilities: branches=32, max_precise=3, pmu_name=icelake
+
 	other bits are reserved and should ignored for now
 	HEADER_FEAT_BITS	= 256,
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..428f7f5b8e48 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1120,6 +1120,9 @@ static void record__init_features(struct record *rec)
 	if (!record__comp_enabled(rec))
 		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
 
+	if (!callchain_param.enabled || (callchain_param.record_mode != CALLCHAIN_LBR))
+		perf_header__clear_feat(&session->header, HEADER_CPU_PMU_CAPS);
+
 	perf_header__clear_feat(&session->header, HEADER_STAT);
 }
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a098c2ebf4ea..6d6979b8317a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1591,6 +1591,7 @@ static void init_features(struct perf_session *session)
 	perf_header__clear_feat(&session->header, HEADER_TRACING_DATA);
 	perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
 	perf_header__clear_feat(&session->header, HEADER_AUXTRACE);
+	perf_header__clear_feat(&session->header, HEADER_CPU_PMU_CAPS);
 }
 
 static int __cmd_record(int argc, const char **argv)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 11d05ae3606a..d286d478b4d8 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -48,6 +48,7 @@ struct perf_env {
 	char			*cpuid;
 	unsigned long long	total_mem;
 	unsigned int		msr_pmu_type;
+	unsigned int		max_branches;
 
 	int			nr_cmdline;
 	int			nr_sibling_cores;
@@ -57,12 +58,14 @@ struct perf_env {
 	int			nr_memory_nodes;
 	int			nr_pmu_mappings;
 	int			nr_groups;
+	int			nr_cpu_pmu_caps;
 	char			*cmdline;
 	const char		**cmdline_argv;
 	char			*sibling_cores;
 	char			*sibling_dies;
 	char			*sibling_threads;
 	char			*pmu_mappings;
+	char			*cpu_pmu_caps;
 	struct cpu_topology_map	*cpu;
 	struct cpu_cache_level	*caches;
 	int			 caches_cnt;
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index acbd046bf95c..ce29321a4e1d 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1395,6 +1395,39 @@ static int write_compressed(struct feat_fd *ff __maybe_unused,
 	return do_write(ff, &(ff->ph->env.comp_mmap_len), sizeof(ff->ph->env.comp_mmap_len));
 }
 
+static int write_cpu_pmu_caps(struct feat_fd *ff,
+			      struct evlist *evlist __maybe_unused)
+{
+	struct perf_pmu_caps *caps = NULL;
+	struct perf_pmu *cpu_pmu;
+	int nr_caps;
+	int ret;
+
+	cpu_pmu = perf_pmu__find("cpu");
+	if (!cpu_pmu)
+		return -ENOENT;
+
+	nr_caps = perf_pmu__caps_parse(cpu_pmu);
+	if (nr_caps < 0)
+		return nr_caps;
+
+	ret = do_write(ff, &nr_caps, sizeof(nr_caps));
+	if (ret < 0)
+		return ret;
+
+	while ((caps = perf_pmu__scan_caps(cpu_pmu, caps))) {
+		ret = do_write_string(ff, caps->name);
+		if (ret < 0)
+			return ret;
+
+		ret = do_write_string(ff, caps->value);
+		if (ret < 0)
+			return ret;
+	}
+
+	return ret;
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
 	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1809,6 +1842,28 @@ static void print_compressed(struct feat_fd *ff, FILE *fp)
 		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
 }
 
+static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
+{
+	const char *delimiter = "# cpu pmu capabilities: ";
+	char *str;
+	u32 nr_caps;
+
+	nr_caps = ff->ph->env.nr_cpu_pmu_caps;
+	if (!nr_caps) {
+		fprintf(fp, "# cpu pmu capabilities: not available\n");
+		return;
+	}
+
+	str = ff->ph->env.cpu_pmu_caps;
+	while (nr_caps--) {
+		fprintf(fp, "%s%s", delimiter, str);
+		delimiter = ", ";
+		str += strlen(str) + 1;
+	}
+
+	fprintf(fp, "\n");
+}
+
 static void print_pmu_mappings(struct feat_fd *ff, FILE *fp)
 {
 	const char *delimiter = "# pmu mappings: ";
@@ -2846,6 +2901,60 @@ static int process_compressed(struct feat_fd *ff,
 	return 0;
 }
 
+static int process_cpu_pmu_caps(struct feat_fd *ff,
+				void *data __maybe_unused)
+{
+	char *name, *value;
+	struct strbuf sb;
+	u32 nr_caps;
+
+	if (do_read_u32(ff, &nr_caps))
+		return -1;
+
+	if (!nr_caps) {
+		pr_debug("cpu pmu capabilities not available\n");
+		return 0;
+	}
+
+	ff->ph->env.nr_cpu_pmu_caps = nr_caps;
+
+	if (strbuf_init(&sb, 128) < 0)
+		return -1;
+
+	while (nr_caps--) {
+		name = do_read_string(ff);
+		if (!name)
+			goto error;
+
+		value = do_read_string(ff);
+		if (!value)
+			goto free_name;
+
+		if (strbuf_addf(&sb, "%s=%s", name, value) < 0)
+			goto free_value;
+
+		/* include a NULL character at the end */
+		if (strbuf_add(&sb, "", 1) < 0)
+			goto free_value;
+
+		if (!strcmp(name, "branches"))
+			ff->ph->env.max_branches = atoi(value);
+
+		free(value);
+		free(name);
+	}
+	ff->ph->env.cpu_pmu_caps = strbuf_detach(&sb, NULL);
+	return 0;
+
+free_value:
+	free(value);
+free_name:
+	free(name);
+error:
+	strbuf_release(&sb);
+	return -1;
+}
+
 #define FEAT_OPR(n, func, __full_only) \
 	[HEADER_##n] = {					\
 		.name	    = __stringify(n),			\
@@ -2903,6 +3012,7 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPR(BPF_PROG_INFO, bpf_prog_info,  false),
 	FEAT_OPR(BPF_BTF,       bpf_btf,        false),
 	FEAT_OPR(COMPRESSED,	compressed,	false),
+	FEAT_OPR(CPU_PMU_CAPS,	cpu_pmu_caps,	false),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 840f95cee349..650bd1c7a99b 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -43,6 +43,7 @@ enum {
 	HEADER_BPF_PROG_INFO,
 	HEADER_BPF_BTF,
 	HEADER_COMPRESSED,
+	HEADER_CPU_PMU_CAPS,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
-- 
2.17.1



* [PATCH 06/12] perf machine: Refine the function for LBR call stack reconstruction
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (4 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 05/12] perf header: Support CPU " kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 07/12] perf tools: Stitch LBR call stack kan.liang
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

LBR only collects the user call stack. To reconstruct a call stack, both
the kernel call stack and the user call stack are required. The function
resolve_lbr_callchain_sample() mixes the kernel call stack and the user
call stack. Now, with the help of the HW index, the perf tool can
reconstruct a more complete call stack by adding some user call stack
entries from the previous sample. However, the current implementation is
hard to extend to support that.

Split out two new functions to resolve the user call stack and the
kernel call stack respectively.

No functional changes.
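
The ordering that the two helpers have to preserve can be sketched as
below. This is a simplified illustration, not the perf code: struct
sketch_branch and merge_chain() are hypothetical, the PERF_CONTEXT_*
markers in the kernel chain are ignored, and the sketch skips the
entries[0].to frame when there are no LBR entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_branch {
	uint64_t from;
	uint64_t to;
};

/*
 * Fill 'out' with the kernel IPs and the LBR-derived user IPs.
 * Callee order: kernel frames, lbr[0].to, then lbr[i].from for i = 0..n-1.
 * Caller order is the exact reverse.
 * 'out' must have room for nr_kernel + nr_lbr + 1 entries.
 * Returns the number of entries written.
 */
static size_t merge_chain(const uint64_t *kernel, size_t nr_kernel,
			  const struct sketch_branch *lbr, size_t nr_lbr,
			  bool callee, uint64_t *out)
{
	size_t n = 0, i;

	assert(out);

	if (callee) {
		for (i = 0; i < nr_kernel; i++)
			out[n++] = kernel[i];
		if (nr_lbr)
			out[n++] = lbr[0].to;
		for (i = 0; i < nr_lbr; i++)
			out[n++] = lbr[i].from;
	} else {
		for (i = nr_lbr; i-- > 0; )
			out[n++] = lbr[i].from;
		if (nr_lbr)
			out[n++] = lbr[0].to;
		for (i = nr_kernel; i-- > 0; )
			out[n++] = kernel[i];
	}

	return n;
}
```

Because each half is emitted by its own loop, a later change can append
extra user entries after the LBR half without touching the kernel half.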

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/machine.c | 186 ++++++++++++++++++++++++--------------
 1 file changed, 120 insertions(+), 66 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 62522b76a924..5a88230600f0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2161,6 +2161,97 @@ static int remove_loops(struct branch_entry *l, int nr,
 	return nr;
 }
 
+
+static int lbr_callchain_add_kernel_ip(struct thread *thread,
+				       struct callchain_cursor *cursor,
+				       struct perf_sample *sample,
+				       struct symbol **parent,
+				       struct addr_location *root_al,
+				       bool callee, int end)
+{
+	struct ip_callchain *chain = sample->callchain;
+	u8 cpumode = PERF_RECORD_MISC_USER;
+	int err, i;
+
+	if (callee) {
+		for (i = 0; i < end + 1; i++) {
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, chain->ips[i],
+					       false, NULL, NULL, 0);
+			if (err)
+				return err;
+		}
+	} else {
+		for (i = end; i >= 0; i--) {
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, chain->ips[i],
+					       false, NULL, NULL, 0);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
+static int lbr_callchain_add_lbr_ip(struct thread *thread,
+				    struct callchain_cursor *cursor,
+				    struct perf_sample *sample,
+				    struct symbol **parent,
+				    struct addr_location *root_al,
+				    bool callee)
+{
+	struct branch_stack *lbr_stack = sample->branch_stack;
+	struct branch_entry *entries = get_branch_entry(sample);
+	u8 cpumode = PERF_RECORD_MISC_USER;
+	int lbr_nr = lbr_stack->nr;
+	struct branch_flags *flags;
+	u64 ip, branch_from = 0;
+	int err, i;
+
+	if (callee) {
+		ip = entries[0].to;
+		flags = &entries[0].flags;
+		branch_from = entries[0].from;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       true, flags, NULL, branch_from);
+		if (err)
+			return err;
+
+		for (i = 0; i < lbr_nr; i++) {
+			ip = entries[i].from;
+			flags = &entries[i].flags;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       true, flags, NULL, branch_from);
+			if (err)
+				return err;
+		}
+	} else {
+		for (i = lbr_nr - 1; i >= 0; i--) {
+			ip = entries[i].from;
+			flags = &entries[i].flags;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       true, flags, NULL, branch_from);
+			if (err)
+				return err;
+		}
+
+		ip = entries[0].to;
+		flags = &entries[0].flags;
+		branch_from = entries[0].from;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       true, flags, NULL, branch_from);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2176,81 +2267,44 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					int max_stack)
 {
 	struct ip_callchain *chain = sample->callchain;
-	int chain_nr = min(max_stack, (int)chain->nr), i;
-	u8 cpumode = PERF_RECORD_MISC_USER;
-	u64 ip, branch_from = 0;
+	int chain_nr = min(max_stack, (int)chain->nr);
+	int i, err;
 
 	for (i = 0; i < chain_nr; i++) {
 		if (chain->ips[i] == PERF_CONTEXT_USER)
 			break;
 	}
 
-	/* LBR only affects the user callchain */
-	if (i != chain_nr) {
-		struct branch_stack *lbr_stack = sample->branch_stack;
-		struct branch_entry *entries = get_branch_entry(sample);
-		int lbr_nr = lbr_stack->nr, j, k;
-		bool branch;
-		struct branch_flags *flags;
-		/*
-		 * LBR callstack can only get user call chain.
-		 * The mix_chain_nr is kernel call chain
-		 * number plus LBR user call chain number.
-		 * i is kernel call chain number,
-		 * 1 is PERF_CONTEXT_USER,
-		 * lbr_nr + 1 is the user call chain number.
-		 * For details, please refer to the comments
-		 * in callchain__printf
-		 */
-		int mix_chain_nr = i + 1 + lbr_nr + 1;
-
-		for (j = 0; j < mix_chain_nr; j++) {
-			int err;
-			branch = false;
-			flags = NULL;
-
-			if (callchain_param.order == ORDER_CALLEE) {
-				if (j < i + 1)
-					ip = chain->ips[j];
-				else if (j > i + 1) {
-					k = j - i - 2;
-					ip = entries[k].from;
-					branch = true;
-					flags = &entries[k].flags;
-				} else {
-					ip = entries[0].to;
-					branch = true;
-					flags = &entries[0].flags;
-					branch_from = entries[0].from;
-				}
-			} else {
-				if (j < lbr_nr) {
-					k = lbr_nr - j - 1;
-					ip = entries[k].from;
-					branch = true;
-					flags = &entries[k].flags;
-				}
-				else if (j > lbr_nr)
-					ip = chain->ips[i + 1 - (j - lbr_nr)];
-				else {
-					ip = entries[0].to;
-					branch = true;
-					flags = &entries[0].flags;
-					branch_from = entries[0].from;
-				}
-			}
+	/*
+	 * LBR only affects the user callchain.
+	 * Fall back if there is no user callchain.
+	 */
+	if (i == chain_nr)
+		return 0;
 
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				return (err < 0) ? err : 0;
-		}
-		return 1;
+	if (callchain_param.order == ORDER_CALLEE) {
+		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
+						  parent, root_al, true, i);
+		if (err)
+			goto error;
+		err = lbr_callchain_add_lbr_ip(thread, cursor, sample,
+					       parent, root_al, true);
+		if (err)
+			goto error;
+	} else {
+		err = lbr_callchain_add_lbr_ip(thread, cursor, sample,
+					       parent, root_al, false);
+		if (err)
+			goto error;
+		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
+						  parent, root_al, false, i);
+		if (err)
+			goto error;
 	}
 
-	return 0;
+	return 1;
+error:
+	return (err < 0) ? err : 0;
 }
 
 static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
-- 
2.17.1



* [PATCH 07/12] perf tools: Stitch LBR call stack
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (5 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 06/12] perf machine: Refine the function for LBR call stack reconstruction kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 08/12] perf report: Add option to enable the LBR stitching approach kan.liang
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers.

  For example, on Skylake, the depth of the reconstructed LBR call stack
  is always <= 32.

  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles'
  # Event count (approx.): 6487119731
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................
  # ................................

    99.97%    99.97%  tchain_edit      tchain_edit        [.] f43
            |
             --99.64%--f11
                       f12
                       f13
                       f14
                       f15
                       f16
                       f17
                       f18
                       f19
                       f20
                       f21
                       f22
                       f23
                       f24
                       f25
                       f26
                       f27
                       f28
                       f29
                       f30
                       f31
                       f32
                       f33
                       f34
                       f35
                       f36
                       f37
                       f38
                       f39
                       f40
                       f41
                       f42
                       f43

For a call stack which is deeper than the LBR limit, HW overwrites the
LBR registers that hold the oldest branches. Only partial call stacks
can be reconstructed.

However, the overwritten LBRs may still be retrievable from the previous
sample, which was taken before HW overwrote those LBR registers.
Perf tools can stitch those overwritten LBRs onto the current call stack
to get a more complete call stack.

To determine if LBRs can be stitched, perf tools need to compare the
current sample with the previous sample.
- They should have identical LBR records (same from, to and flags
  values, and the same physical index of LBR registers).
- The search starts from the base-of-stack of the current sample.
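
The comparison can be illustrated with a small standalone model. This is
only a sketch under simplifying assumptions, not the code added by this
patch: struct entry, struct stack and count_identical() are hypothetical
names, entries are stored newest first, and the register index arithmetic
uses plain modulo rather than the patch's exact expressions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one LBR record. */
struct entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags_value;	/* all flag bits viewed as one integer */
};

/* Hypothetical stand-in for the branch stack of one sample. */
struct stack {
	uint64_t hw_idx;		/* physical index of the newest LBR (TOS) */
	uint64_t nr;			/* number of valid entries */
	const struct entry *entries;	/* entries[0] is the newest branch */
};

/*
 * Count the records shared by both samples, starting at the base-of-stack
 * (oldest entry) of 'cur' and walking toward newer entries.
 * A result of 0 means nothing can be stitched.
 */
static unsigned int count_identical(const struct stack *cur,
				    const struct stack *prev,
				    uint64_t max_lbr)
{
	uint64_t cur_base, distance;
	unsigned int matches = 0;
	int64_t i, j;

	assert(max_lbr > 0);
	if (!cur->nr || !prev->nr || cur->nr > max_lbr || prev->nr > max_lbr)
		return 0;
	if (cur->hw_idx >= max_lbr || prev->hw_idx >= max_lbr)
		return 0;

	/* Physical register holding the oldest entry of the current sample. */
	cur_base = (cur->hw_idx + max_lbr - (cur->nr - 1)) % max_lbr;

	/* Position of that register inside the previous sample's array. */
	distance = (prev->hw_idx + max_lbr - cur_base) % max_lbr;
	if (distance >= prev->nr)
		return 0;	/* previous stack is too short to overlap */

	for (i = (int64_t)distance, j = (int64_t)cur->nr - 1;
	     i >= 0 && j >= 0; i--, j--) {
		if (prev->entries[i].from != cur->entries[j].from ||
		    prev->entries[i].to != cur->entries[j].to ||
		    prev->entries[i].flags_value != cur->entries[j].flags_value)
			break;
		matches++;
	}
	return matches;
}
```

With four registers, a previous sample holding calls c4, c3, c2, c1
(newest first, hw_idx 3) and a current sample holding c6, c5, c4, c3
(hw_idx 1) share two records, so the two older records c2 and c1 of the
previous sample are the candidates for stitching.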

In struct lbr_stitch, add 'prev_sample' to save the previous sample.
Add 'prev_lbr_cursor' to save all LBR cursor nodes from the previous
sample.
Once perf decides to stitch the previous LBRs, the corresponding LBR
cursor nodes are copied to 'lists', which tracks the LBR cursor nodes
that are going to be stitched.
When the stitching is over, the nodes are not freed immediately.
They are moved to 'free_lists' so that the next stitching can reuse the
space.
Both 'lists' and 'free_lists' are freed when all samples are processed.
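
The reuse scheme described above can be sketched as a tiny node pool.
This is a hedged illustration only: struct node, struct pool and the
pool_*() helpers are hypothetical and use a singly linked list, whereas
the patch itself uses struct list_head.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int payload;		/* stands in for the saved cursor node */
};

struct pool {
	struct node *active;	/* nodes queued for stitching ('lists') */
	struct node *free;	/* retired nodes kept for reuse ('free_lists') */
};

/* Prefer a retired node; fall back to the allocator only when none exist. */
static struct node *pool_get(struct pool *p)
{
	struct node *n = p->free;

	if (n) {
		p->free = n->next;
		n->next = NULL;
		return n;
	}
	return calloc(1, sizeof(*n));
}

/* Queue a node for the current stitching pass. */
static void pool_push_active(struct pool *p, struct node *n)
{
	assert(n != NULL);
	n->next = p->active;
	p->active = n;
}

/* Stitching is over: keep the memory, just move it to the free list. */
static void pool_retire_all(struct pool *p)
{
	while (p->active) {
		struct node *n = p->active;

		p->active = n->next;
		n->next = p->free;
		p->free = n;
	}
}

/* All samples are processed: release everything. */
static void pool_destroy(struct pool *p)
{
	pool_retire_all(p);
	while (p->free) {
		struct node *n = p->free;

		p->free = n->next;
		free(n);
	}
}
```

Keeping retired nodes around trades a little memory for fewer allocator
calls when many consecutive samples need stitching.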

The 'lbr_stitch_enable' flag indicates whether the LBR stitching
approach is enabled. The approach is disabled by default; the following
patch introduces a new option to enable it.
The reasons for keeping it off by default are:
- The stitching approach is based on LBR call stack technology. The
  known limitations of LBR call stack technology still apply to the
  approach, e.g. exception handling such as setjmp/longjmp will have
  calls/returns that do not match.
- This approach is not foolproof. There can be cases where it creates
  incorrect call stacks from incorrect matches. There is no attempt
  to validate any matches in another way.
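
As an illustration of the first limitation, the following minimal C
program leaves two calls without matching returns. It is a sketch written
for this explanation (run(), middle() and leaf() are hypothetical names);
it only shows the control flow and does not involve perf itself.

```c
#include <setjmp.h>

static jmp_buf env;
static int depth_reached;

/* Jumps straight back to run(); neither leaf() nor middle() returns. */
static void leaf(int depth)
{
	depth_reached = depth;
	longjmp(env, 1);
}

static void middle(int depth)
{
	leaf(depth + 1);
}

/* Returns the call depth reached before control jumped back here. */
static int run(void)
{
	if (setjmp(env) == 0) {
		middle(1);	/* two calls are made ... */
		return 0;	/* ... but this line is never reached */
	}
	/* Control arrives here via longjmp(), with no return executed. */
	return depth_reached;
}
```

Because the calls into middle() and leaf() are never paired with returns,
a call stack derived from call/return tracking may still list them after
control is back in run().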

However, in many common cases with call stack overflows, it can recreate
better call stacks than the default LBR call stack output. So if there
are problems with LBR overflows, this is a possible workaround.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/branch.h    |   5 +-
 tools/perf/util/callchain.h |  12 +-
 tools/perf/util/machine.c   | 235 +++++++++++++++++++++++++++++++++++-
 tools/perf/util/thread.c    |   2 +
 tools/perf/util/thread.h    |  34 ++++++
 5 files changed, 283 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 7fc9fa0dc361..395e1521f6ca 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -35,7 +35,10 @@ struct branch_info {
 struct branch_entry {
 	u64			from;
 	u64			to;
-	struct branch_flags	flags;
+	union {
+		struct branch_flags	flags;
+		u64			flags_value;
+	};
 };
 
 struct branch_stack {
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 706bb7bbe1e1..e599a23c0fdb 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -148,7 +148,17 @@ struct callchain_cursor_node {
 	u64				branch_from;
 	int				nr_loop_iter;
 	u64				iter_cycles;
-	struct callchain_cursor_node	*next;
+	union {
+		struct callchain_cursor_node	*next;
+
+		/* Indicate valid cursor node for LBR stitch */
+		bool				valid;
+	};
+};
+
+struct stitch_list {
+	struct list_head		node;
+	struct callchain_cursor_node	cursor;
 };
 
 struct callchain_cursor {
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 5a88230600f0..6515d0737a76 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2194,6 +2194,31 @@ static int lbr_callchain_add_kernel_ip(struct thread *thread,
 	return 0;
 }
 
+static void save_lbr_cursor_node(struct thread *thread,
+				 struct callchain_cursor *cursor,
+				 int idx)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+
+	if (!lbr_stitch)
+		return;
+
+	if (cursor->pos == cursor->nr) {
+		lbr_stitch->prev_lbr_cursor[idx].valid = false;
+		return;
+	}
+
+	if (!cursor->curr)
+		cursor->curr = cursor->first;
+	else
+		cursor->curr = cursor->curr->next;
+	memcpy(&lbr_stitch->prev_lbr_cursor[idx], cursor->curr,
+	       sizeof(struct callchain_cursor_node));
+
+	lbr_stitch->prev_lbr_cursor[idx].valid = true;
+	cursor->pos++;
+}
+
 static int lbr_callchain_add_lbr_ip(struct thread *thread,
 				    struct callchain_cursor *cursor,
 				    struct perf_sample *sample,
@@ -2209,6 +2234,21 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	u64 ip, branch_from = 0;
 	int err, i;
 
+	/*
+	 * The curr and pos are not used in writing session. They are cleared
+	 * in callchain_cursor_commit() when the writing session is closed.
+	 * Using curr and pos to track the current cursor node.
+	 */
+	if (thread->lbr_stitch) {
+		cursor->curr = NULL;
+		cursor->pos = cursor->nr;
+		if (cursor->nr) {
+			cursor->curr = cursor->first;
+			for (i = 0; i < (int)(cursor->nr - 1); i++)
+				cursor->curr = cursor->curr->next;
+		}
+	}
+
 	if (callee) {
 		ip = entries[0].to;
 		flags = &entries[0].flags;
@@ -2219,6 +2259,20 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 		if (err)
 			return err;
 
+		/*
+		 * The number of cursor node increases.
+		 * Move the current cursor node.
+		 * But does not need to save current cursor node for entry 0.
+		 * It's impossible to stitch the whole LBRs of previous sample.
+		 */
+		if (thread->lbr_stitch && (cursor->pos != cursor->nr)) {
+			if (!cursor->curr)
+				cursor->curr = cursor->first;
+			else
+				cursor->curr = cursor->curr->next;
+			cursor->pos++;
+		}
+
 		for (i = 0; i < lbr_nr; i++) {
 			ip = entries[i].from;
 			flags = &entries[i].flags;
@@ -2227,6 +2281,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 					       true, flags, NULL, branch_from);
 			if (err)
 				return err;
+			save_lbr_cursor_node(thread, cursor, i);
 		}
 	} else {
 		for (i = lbr_nr - 1; i >= 0; i--) {
@@ -2237,6 +2292,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 					       true, flags, NULL, branch_from);
 			if (err)
 				return err;
+			save_lbr_cursor_node(thread, cursor, i);
 		}
 
 		ip = entries[0].to;
@@ -2252,6 +2308,148 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	return 0;
 }
 
+static int lbr_callchain_add_stitched_lbr_ip(struct thread *thread,
+					     struct callchain_cursor *cursor)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct stitch_list *stitch_node;
+	int err;
+
+	struct callchain_cursor_node *cnode;
+
+	list_for_each_entry(stitch_node, &lbr_stitch->lists, node) {
+		cnode = &stitch_node->cursor;
+
+		err = callchain_cursor_append(cursor, cnode->ip,
+					      &cnode->ms,
+					      cnode->branch,
+					      &cnode->branch_flags,
+					      cnode->nr_loop_iter,
+					      cnode->iter_cycles,
+					      cnode->branch_from,
+					      cnode->srcline);
+		if (err)
+			return err;
+
+	}
+	return 0;
+}
+
+static struct stitch_list *get_stitch_node(struct thread *thread)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct stitch_list *stitch_node;
+
+	if (!list_empty(&lbr_stitch->free_lists)) {
+		stitch_node = list_first_entry(&lbr_stitch->free_lists,
+					       struct stitch_list, node);
+		list_del(&stitch_node->node);
+
+		return stitch_node;
+	}
+
+	return malloc(sizeof(struct stitch_list));
+}
+
+static bool has_stitched_lbr(struct thread *thread,
+			     struct perf_sample *cur,
+			     struct perf_sample *prev,
+			     unsigned int max_lbr,
+			     bool callee)
+{
+	struct branch_stack *cur_stack = cur->branch_stack;
+	struct branch_entry *cur_entries = get_branch_entry(cur);
+	struct branch_stack *prev_stack = prev->branch_stack;
+	struct branch_entry *prev_entries = get_branch_entry(prev);
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	int i, j, nr_identical_branches = 0;
+	struct stitch_list *stitch_node;
+	u64 cur_base, distance;
+
+	if (!cur_stack || !prev_stack)
+		return false;
+
+	/* Find the physical index of the base-of-stack for current sample. */
+	cur_base = max_lbr - cur_stack->nr + cur_stack->hw_idx + 1;
+
+	distance = (prev_stack->hw_idx > cur_base) ? (prev_stack->hw_idx - cur_base) :
+					    (max_lbr + prev_stack->hw_idx - cur_base);
+	/* Previous sample has shorter stack. Nothing can be stitched. */
+	if (distance + 1 > prev_stack->nr)
+		return false;
+
+	/*
+	 * Check if there are identical LBRs between two samples.
+	 * Identical LBRs must have same from, to and flags values. Also,
+	 * they have to be saved in the same LBR registers (same physical
+	 * index).
+	 *
+	 * Starts from the base-of-stack of current sample.
+	 */
+	for (i = distance, j = cur_stack->nr - 1; (i >= 0) && (j >= 0); i--, j--) {
+		if ((prev_entries[i].from != cur_entries[j].from) ||
+		    (prev_entries[i].to != cur_entries[j].to) ||
+		    (prev_entries[i].flags_value != cur_entries[j].flags_value))
+			break;
+
+		nr_identical_branches++;
+	}
+
+	if (!nr_identical_branches)
+		return false;
+
+	/*
+	 * Save the LBRs between the base-of-stack of previous sample
+	 * and the base-of-stack of current sample into lbr_stitch->lists.
+	 * These LBRs will be stitched later.
+	 */
+	for (i = prev_stack->nr - 1; i > (int)distance; i--) {
+
+		if (!lbr_stitch->prev_lbr_cursor[i].valid)
+			continue;
+
+		stitch_node = get_stitch_node(thread);
+		if (!stitch_node)
+			return false;
+
+		memcpy(&stitch_node->cursor, &lbr_stitch->prev_lbr_cursor[i],
+		       sizeof(struct callchain_cursor_node));
+
+		if (callee)
+			list_add(&stitch_node->node, &lbr_stitch->lists);
+		else
+			list_add_tail(&stitch_node->node, &lbr_stitch->lists);
+	}
+
+	return true;
+}
+
+static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
+{
+	if (thread->lbr_stitch)
+		return true;
+
+	thread->lbr_stitch = calloc(1, sizeof(struct lbr_stitch));
+	if (!thread->lbr_stitch)
+		goto err;
+
+	thread->lbr_stitch->prev_lbr_cursor = calloc(max_lbr + 1, sizeof(struct callchain_cursor_node));
+	if (!thread->lbr_stitch->prev_lbr_cursor)
+		goto free_lbr_stitch;
+
+	INIT_LIST_HEAD(&thread->lbr_stitch->lists);
+	INIT_LIST_HEAD(&thread->lbr_stitch->free_lists);
+
+	return true;
+
+free_lbr_stitch:
+	free(thread->lbr_stitch);
+	thread->lbr_stitch = NULL;
+err:
+	pr_warning("Failed to allocate space for stitched LBRs. Disable LBR stitch\n");
+	thread->lbr_stitch_enable = false;
+	return false;
+}
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2264,10 +2462,14 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					struct perf_sample *sample,
 					struct symbol **parent,
 					struct addr_location *root_al,
-					int max_stack)
+					int max_stack,
+					unsigned int max_lbr)
 {
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr);
+	bool callee = (callchain_param.order == ORDER_CALLEE);
+	struct lbr_stitch *lbr_stitch;
+	bool stitched_lbr = false;
 	int i, err;
 
 	for (i = 0; i < chain_nr; i++) {
@@ -2282,7 +2484,21 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	if (i == chain_nr)
 		return 0;
 
-	if (callchain_param.order == ORDER_CALLEE) {
+	if (thread->lbr_stitch_enable && !sample->no_hw_idx &&
+	    (max_lbr > 0) && alloc_lbr_stitch(thread, max_lbr)) {
+		lbr_stitch = thread->lbr_stitch;
+
+		stitched_lbr = has_stitched_lbr(thread, sample,
+						&lbr_stitch->prev_sample,
+						max_lbr, callee);
+		if (!stitched_lbr) {
+			list_replace_init(&lbr_stitch->lists,
+					  &lbr_stitch->free_lists);
+		}
+		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
+	}
+
+	if (callee) {
 		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
 						  parent, root_al, true, i);
 		if (err)
@@ -2291,7 +2507,17 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					       parent, root_al, true);
 		if (err)
 			goto error;
+		if (stitched_lbr) {
+			err = lbr_callchain_add_stitched_lbr_ip(thread, cursor);
+			if (err)
+				goto error;
+		}
 	} else {
+		if (stitched_lbr) {
+			err = lbr_callchain_add_stitched_lbr_ip(thread, cursor);
+			if (err)
+				goto error;
+		}
 		err = lbr_callchain_add_lbr_ip(thread, cursor, sample,
 					       parent, root_al, false);
 		if (err)
@@ -2349,8 +2575,11 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		chain_nr = chain->nr;
 
 	if (perf_evsel__has_branch_callstack(evsel)) {
+		struct perf_env *env = perf_evsel__env(evsel);
+
 		err = resolve_lbr_callchain_sample(thread, cursor, sample, parent,
-						   root_al, max_stack);
+						   root_al, max_stack,
+						   !env ? 0 : env->max_branches);
 		if (err)
 			return (err < 0) ? err : 0;
 	}
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 28b719388028..8d0da260c84c 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -47,6 +47,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->tid = tid;
 		thread->ppid = -1;
 		thread->cpu = -1;
+		thread->lbr_stitch_enable = false;
 		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
 		init_rwsem(&thread->namespaces_lock);
@@ -110,6 +111,7 @@ void thread__delete(struct thread *thread)
 
 	exit_rwsem(&thread->namespaces_lock);
 	exit_rwsem(&thread->comm_lock);
+	thread__free_stitch_list(thread);
 	free(thread);
 }
 
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 20b96b5d1f15..58f0d23f471e 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -13,6 +13,8 @@
 #include <strlist.h>
 #include <intlist.h>
 #include "rwsem.h"
+#include "event.h"
+#include "callchain.h"
 
 struct addr_location;
 struct map;
@@ -20,6 +22,13 @@ struct perf_record_namespaces;
 struct thread_stack;
 struct unwind_libunwind_ops;
 
+struct lbr_stitch {
+	struct list_head		lists;
+	struct list_head		free_lists;
+	struct perf_sample		prev_sample;
+	struct callchain_cursor_node	*prev_lbr_cursor;
+};
+
 struct thread {
 	union {
 		struct rb_node	 rb_node;
@@ -46,6 +55,10 @@ struct thread {
 	struct srccode_state	srccode_state;
 	bool			filter;
 	int			filter_entry_depth;
+
+	/* LBR call stack stitch */
+	bool			lbr_stitch_enable;
+	struct lbr_stitch	*lbr_stitch;
 };
 
 struct machine;
@@ -142,4 +155,25 @@ static inline bool thread__is_filtered(struct thread *thread)
 	return false;
 }
 
+static inline void thread__free_stitch_list(struct thread *thread)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct stitch_list *pos, *tmp;
+
+	if (!lbr_stitch)
+		return;
+
+	list_for_each_entry_safe(pos, tmp, &lbr_stitch->lists, node) {
+		list_del_init(&pos->node);
+		free(pos);
+	}
+
+	list_for_each_entry_safe(pos, tmp, &lbr_stitch->free_lists, node) {
+		list_del_init(&pos->node);
+		free(pos);
+	}
+	free(lbr_stitch->prev_lbr_cursor);
+	free(thread->lbr_stitch);
+}
+
 #endif	/* __PERF_THREAD_H */
-- 
2.17.1



* [PATCH 08/12] perf report: Add option to enable the LBR stitching approach
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (6 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 07/12] perf tools: Stitch LBR call stack kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 09/12] perf script: " kan.liang
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number
of samples with stitched LBRs is huge.

Add an option to enable the approach.

  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles'
  # Event count (approx.): 6492797701
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................
  # .................................
  #
    99.99%    99.99%  tchain_edit      tchain_edit        [.] f43
            |
            ---main
               f1
               f2
               f3
               f4
               f5
               f6
               f7
               f8
               f9
               f10
               f11
               f12
               f13
               f14
               f15
               f16
               f17
               f18
               f19
               f20
               f21
               f22
               f23
               f24
               f25
               f26
               f27
               f28
               f29
               f30
               f31
               |
                --99.65%--f32
                          f33
                          f34
                          f35
                          f36
                          f37
                          f38
                          f39
                          f40
                          f41
                          f42
                          f43

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt | 11 +++++++++++
 tools/perf/builtin-report.c              |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index db61f16ffa56..5e4155d2511c 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -482,6 +482,17 @@ include::itrace.txt[]
 	This option extends the perf report to show reference callgraphs,
 	which collected by reference event, in no callgraph event.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls/returns do not match.
+
 --socket-filter::
 	Only report the samples on the processor socket that match with this filter
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 9483b3f0cae3..286f50d3fc65 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -84,6 +84,7 @@ struct report {
 	bool			header_only;
 	bool			nonany_branch_mode;
 	bool			group_set;
+	bool			stitch_lbr;
 	int			max_stack;
 	struct perf_read_values	show_threads_values;
 	struct annotation_options annotation_opts;
@@ -267,6 +268,9 @@ static int process_sample_event(struct perf_tool *tool,
 		return -1;
 	}
 
+	if (rep->stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	if (symbol_conf.hide_unresolved && al.sym == NULL)
 		goto out_put;
 
@@ -1241,6 +1245,8 @@ int cmd_report(int argc, const char **argv)
 			"Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
 		    "Show callgraph from reference event"),
+	OPT_BOOLEAN(0, "stitch-lbr", &report.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPT_INTEGER(0, "socket-filter", &report.socket_filter,
 		    "only show processor socket that match with this filter"),
 	OPT_BOOLEAN(0, "raw-trace", &symbol_conf.raw_trace,
-- 
2.17.1



* [PATCH 09/12] perf script: Add option to enable the LBR stitching approach
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (7 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 08/12] perf report: Add option to enable the LBR stitching approach kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 10/12] perf top: " kan.liang
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number
of samples with stitched LBRs is huge.

Add an option to enable the approach.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-script.txt | 11 +++++++++++
 tools/perf/builtin-script.c              |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 2599b057e47b..472f20f1e479 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -426,6 +426,17 @@ include::itrace.txt[]
 --show-on-off-events::
 	Show the --switch-on/off events too.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls/returns do not match.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index acf3107bbda2..c229e0199003 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1687,6 +1687,7 @@ struct perf_script {
 	bool			show_bpf_events;
 	bool			allocated;
 	bool			per_event_dump;
+	bool			stitch_lbr;
 	struct evswitch		evswitch;
 	struct perf_cpu_map	*cpus;
 	struct perf_thread_map *threads;
@@ -1913,6 +1914,9 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(IP)) {
 		struct callchain_cursor *cursor = NULL;
 
+		if (script->stitch_lbr)
+			al->thread->lbr_stitch_enable = true;
+
 		if (symbol_conf.use_callchain && sample->callchain &&
 		    thread__resolve_callchain(al->thread, &callchain_cursor, evsel,
 					      sample, NULL, NULL, scripting_max_stack) == 0)
@@ -3602,6 +3606,8 @@ int cmd_script(int argc, const char **argv)
 		   "file", "file saving guest os /proc/kallsyms"),
 	OPT_STRING(0, "guestmodules", &symbol_conf.default_guest_modules,
 		   "file", "file saving guest os /proc/modules"),
+	OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPTS_EVSWITCH(&script.evswitch),
 	OPT_END()
 	};
-- 
2.17.1



* [PATCH 10/12] perf top: Add option to enable the LBR stitching approach
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (8 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 09/12] perf script: " kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 11/12] perf c2c: " kan.liang
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number
of samples with stitched LBRs is huge.

Add an option to enable the approach.
The option must be used with --call-graph lbr.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-top.txt |  9 +++++++++
 tools/perf/builtin-top.c              | 11 +++++++++++
 tools/perf/util/top.h                 |  1 +
 3 files changed, 21 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 324b6b53c86b..0648d96981fe 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -310,6 +310,15 @@ Default is to monitor all CPUS.
 	go straight to the histogram browser, just like 'perf top' with no events
 	explicitely specified does.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The option must be used with --call-graph lbr recording.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls/returns do not match.
 
 INTERACTIVE PROMPTING KEYS
 --------------------------
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 8affcab75604..fd63e066f235 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -33,6 +33,7 @@
 #include "util/map.h"
 #include "util/mmap.h"
 #include "util/session.h"
+#include "util/thread.h"
 #include "util/symbol.h"
 #include "util/synthetic-events.h"
 #include "util/top.h"
@@ -766,6 +767,9 @@ static void perf_event__process_sample(struct perf_tool *tool,
 	if (machine__resolve(machine, &al, sample) < 0)
 		return;
 
+	if (top->stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	if (!machine->kptr_restrict_warned &&
 	    symbol_conf.kptr_restrict &&
 	    al.cpumode == PERF_RECORD_MISC_KERNEL) {
@@ -1543,6 +1547,8 @@ int cmd_top(int argc, const char **argv)
 			"number of thread to run event synthesize"),
 	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "stitch-lbr", &top.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPTS_EVSWITCH(&top.evswitch),
 	OPT_END()
 	};
@@ -1612,6 +1618,11 @@ int cmd_top(int argc, const char **argv)
 		}
 	}
 
+	if (top.stitch_lbr && !(callchain_param.record_mode == CALLCHAIN_LBR)) {
+		pr_err("Error: --stitch-lbr must be used with --call-graph lbr\n");
+		goto out_delete_evlist;
+	}
+
 	if (opts->branch_stack && callchain_param.enabled)
 		symbol_conf.show_branchflag_count = true;
 
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index f117d4f4821e..45dc84ddff37 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -36,6 +36,7 @@ struct perf_top {
 	bool		   use_tui, use_stdio;
 	bool		   vmlinux_warned;
 	bool		   dump_symtab;
+	bool		   stitch_lbr;
 	struct hist_entry  *sym_filter_entry;
 	struct evsel 	   *sym_evsel;
 	struct perf_session *session;
-- 
2.17.1



* [PATCH 11/12] perf c2c: Add option to enable the LBR stitching approach
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (9 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 10/12] perf top: " kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-02-28 16:30 ` [PATCH 12/12] perf hist: Add fast path for duplicate entries check approach kan.liang
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number
of samples with stitched LBRs is huge.

Add an option to enable the approach.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-c2c.txt | 11 +++++++++++
 tools/perf/builtin-c2c.c              |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index e6150f21267d..2133eb320cb0 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -111,6 +111,17 @@ REPORT OPTIONS
 --display::
 	Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf c2c record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls/returns do not match.
+
 C2C RECORD
 ----------
 The perf c2c record command setup options related to HITM cacheline analysis
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 246ac0b4d54f..c798763f62db 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -95,6 +95,7 @@ struct perf_c2c {
 	bool			 use_stdio;
 	bool			 stats_only;
 	bool			 symbol_full;
+	bool			 stitch_lbr;
 
 	/* HITM shared clines stats */
 	struct c2c_stats	hitm_stats;
@@ -273,6 +274,9 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
 		return -1;
 	}
 
+	if (c2c.stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	ret = sample__resolve_callchain(sample, &callchain_cursor, NULL,
 					evsel, &al, sysctl_perf_event_max_stack);
 	if (ret)
@@ -2752,6 +2756,8 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
 		   "coalesce fields: pid,tid,iaddr,dso"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
+	OPT_BOOLEAN(0, "stitch-lbr", &c2c.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPT_PARENT(c2c_options),
 	OPT_END()
 	};
-- 
2.17.1



* [PATCH 12/12] perf hist: Add fast path for duplicate entries check approach
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (10 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 11/12] perf c2c: " kan.liang
@ 2020-02-28 16:30 ` kan.liang
  2020-03-04 13:33 ` [PATCH 00/12] Stitch LBR call stack (Perf Tools) Arnaldo Carvalho de Melo
  2020-03-06  9:39 ` Jiri Olsa
  13 siblings, 0 replies; 31+ messages in thread
From: kan.liang @ 2020-02-28 16:30 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Perf checks for duplicate entries in a callchain before adding an entry.
However, the check is very slow, especially with deep call stacks.
About 50% of the elapsed time of perf report is spent on the check when
the call stack depth is always 32.

hist_entry__cmp() is used to compare the new entry with the old
entries. It goes through all the available sorts in the sort_list and
calls the specific cmp of each sort, which is very slow.
In most cases, there are no duplicate entries in a callchain, because
the symbols are usually different. It's much faster to do a quick check
on the symbols first, and only do the full cmp when the symbols are
exactly the same.
The quick check only compares symbols, not dsos. Export
_sort__sym_cmp() for that purpose.
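
The fast path can be sketched outside of perf as follows. This is only an
illustration under simplifying assumptions: struct entry,
entry_fast_sym_diff(), entry_full_cmp() and chain_has_duplicate() are
hypothetical names, symbols are compared by name rather than through
_sort__sym_cmp(), and the full comparison merely stands in for
hist_entry__cmp() walking the sort list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a callchain entry; not perf's struct hist_entry. */
struct entry {
	const char *sym;	/* resolved symbol name, NULL if unresolved */
	unsigned long ip;	/* instruction pointer */
	const char *dso;	/* object the address belongs to */
};

/* Cheap pre-check: true when the two entries certainly differ. */
static bool entry_fast_sym_diff(const struct entry *l, const struct entry *r)
{
	if (!l->sym && !r->sym)
		return l->ip != r->ip;	/* no symbols: fall back to the address */
	if (!l->sym || !r->sym)
		return true;		/* only one side is resolved */
	return strcmp(l->sym, r->sym) != 0;
}

/* Stand-in for the expensive comparison that visits every sort key. */
static int entry_full_cmp(const struct entry *l, const struct entry *r)
{
	int ret = strcmp(l->dso, r->dso);

	if (ret)
		return ret;
	if (l->sym && r->sym)
		return strcmp(l->sym, r->sym);
	if (!l->sym && !r->sym)
		return (l->ip > r->ip) - (l->ip < r->ip);
	return l->sym ? 1 : -1;
}

/* Returns true if new_entry duplicates one of the n cached entries. */
static bool chain_has_duplicate(const struct entry *cache, size_t n,
				const struct entry *new_entry)
{
	for (size_t i = 0; i < n; i++) {
		/* Most entries differ by symbol, so skip the full compare. */
		if (entry_fast_sym_diff(&cache[i], new_entry))
			continue;
		if (entry_full_cmp(&cache[i], new_entry) == 0)
			return true;
	}
	return false;
}
```

The full comparison still decides every match; the pre-check only removes
candidates that cannot be equal, so the result is the same as without it.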

 $perf record --call-graph lbr ./tchain_edit_64

 Without the patch
 $time perf report --stdio
 real    0m21.142s
 user    0m21.110s
 sys     0m0.033s

 With the patch
 $time perf report --stdio
 real    0m10.977s
 user    0m10.948s
 sys     0m0.027s

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 23 +++++++++++++++++++++++
 tools/perf/util/sort.c |  2 +-
 tools/perf/util/sort.h |  2 ++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 808ca27bd5cf..1894423f5c07 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1057,6 +1057,20 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
 	return fill_callchain_info(al, node, iter->hide_unresolved);
 }
 
+static bool
+hist_entry__fast__sym_diff(struct hist_entry *left,
+			   struct hist_entry *right)
+{
+	struct symbol *sym_l = left->ms.sym;
+	struct symbol *sym_r = right->ms.sym;
+
+	if (!sym_l && !sym_r)
+		return left->ip != right->ip;
+
+	return !!_sort__sym_cmp(sym_l, sym_r);
+}
+
+
 static int
 iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			       struct addr_location *al)
@@ -1083,6 +1097,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	};
 	int i;
 	struct callchain_cursor cursor;
+	bool fast = hists__has(he_tmp.hists, sym);
 
 	callchain_cursor_snapshot(&cursor, &callchain_cursor);
 
@@ -1093,6 +1108,14 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	 * It's possible that it has cycles or recursive calls.
 	 */
 	for (i = 0; i < iter->curr; i++) {
+		/*
+		 * For most cases, there are no duplicate entries in callchain.
+		 * The symbols are usually different. Do a quick check for
+		 * symbols first.
+		 */
+		if (fast && hist_entry__fast__sym_diff(he_cache[i], &he_tmp))
+			continue;
+
 		if (hist_entry__cmp(he_cache[i], &he_tmp) == 0) {
 			/* to avoid calling callback function */
 			iter->he = NULL;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index ab0cfd790ad0..33e0fa1bc203 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -234,7 +234,7 @@ static int64_t _sort__addr_cmp(u64 left_ip, u64 right_ip)
 	return (int64_t)(right_ip - left_ip);
 }
 
-static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
+int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 {
 	if (!sym_l || !sym_r)
 		return cmp_null(sym_l, sym_r);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 6c862d62d052..c3c3c68cbfdd 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -309,5 +309,7 @@ int64_t
 sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t
 sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right);
+int64_t
+_sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r);
 char *hist_entry__srcline(struct hist_entry *he);
 #endif	/* __PERF_SORT_H */
-- 
2.17.1



* Re: [PATCH 00/12] Stitch LBR call stack (Perf Tools)
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (11 preceding siblings ...)
  2020-02-28 16:30 ` [PATCH 12/12] perf hist: Add fast path for duplicate entries check approach kan.liang
@ 2020-03-04 13:33 ` Arnaldo Carvalho de Melo
  2020-03-06  9:39 ` Jiri Olsa
  13 siblings, 0 replies; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-04 13:33 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

Em Fri, Feb 28, 2020 at 08:29:59AM -0800, kan.liang@linux.intel.com escreveu:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The kernel patches have been merged into linux-next.
>   commit bbfd5e4fab63 ("perf/core: Add new branch sample type for HW
> index of raw branch records")
>   commit db278b90c326 ("perf/x86/intel: Output LBR TOS information
> correctly")

I saw it landed in tip/perf/core, going thru this patchset now.

Thanks,

- Arnaldo
 
> Starting from Haswell, Linux perf can utilize the existing Last Branch
> Record (LBR) facility to record call stacks. However, the depth of the
> reconstructed LBR call stack is limited to the number of LBR registers.
> E.g. on Skylake, the depth of the reconstructed LBR call stack is <= 32.
> That's because HW overwrites the oldest LBR registers when the LBR
> stack is full.
> 
> However, the overwritten LBRs may still be retrieved from previous
> sample. At that moment, HW hasn't overwritten the LBR registers yet.
> Perf tools can stitch those overwritten LBRs on current call stacks to
> get a more complete call stack.
> 
> To determine if LBRs can be stitched, the physical index of LBR
> registers is required. A new branch sample type is introduced to dump
> the physical index of the most recent LBR aka Top-of-Stack (TOS)
> information for perf tools.
> Patch 1 & 2 extend struct branch_stack to support the new branch sample
> type, PERF_SAMPLE_BRANCH_HW_INDEX.
> 
> Since the output format of PERF_SAMPLE_BRANCH_STACK will be changed
> when the new branch sample type is set, an older version of perf tool
> may parse the perf.data incorrectly. Furthermore, there is no warning
> if this case happens. Because current perf header never check for
> unknown input bits in attr. Patch 3 adds check for event attr. (Can be
> merged independently.)
> 
> Besides the physical index, the maximum number of LBRs is required as
> well. Patch 4 & 5 retrieve the capabilities information from sysfs
> and save them in perf header.
> 
> Patch 6 & 7 implements the LBR stitching approach.
> 
> Users can use the options introduced in patch 8-11 to enable the LBR
> stitching approach for perf report, script, top and c2c.
> 
> Patch 12 adds a fast path for duplicate entries check. It benefits all
> call stack parsing, not just for stitch LBR call stack. It can be
> merged independently.
> 
> 
> The stitching approach is based on LBR call stack technology. The known
> limitations of LBR call stack technology still apply to the approach,
> e.g. exception handling such as setjmp/longjmp will have calls/returns
> that do not match.
> This approach is not foolproof. There can be cases where it creates
> incorrect call stacks from incorrect matches. There is no attempt
> to validate any matches in another way, so it is not enabled by default.
> However, in many common cases with call stack overflows, it can recreate
> better call stacks than the default LBR call stack output. So if there
> are problems with LBR overflows, this is a possible workaround.
> 
> Regression:
> Users may collect LBR call stack on a machine with new perf tool and
> new kernel (support LBR TOS). However, they may parse the perf.data with
> old perf tool (not support LBR TOS). The old tool doesn't check
> attr.branch_sample_type. Users probably get incorrect information
> without any warning.
> 
> Performance impact:
> The processing time may increase with the LBR stitching approach
> enabled. The impact depends on the increased depth of call stacks.
> 
> For a simple test case tchain_edit with 43 depth of call stacks.
> perf record --call-graph lbr -- ./tchain_edit
> perf report --stitch-lbr
> 
> Without --stitch-lbr, perf report only displays call stacks up to depth 32.
> With --stitch-lbr, perf report can display the full depth of 43.
> The depth of call stacks increases by 34.3%.
> 
> Correspondingly, the processing time of perf report increases by 39%:
> Without --stitch-lbr:                           11.0 sec
> With --stitch-lbr:                              15.3 sec
> 
> The source code of tchain_edit.c is something similar as below.
> noinline void f43(void)
> {
>         int i;
>         for (i = 0; i < 10000;) {
> 
>                 if(i%2)
>                         i++;
>                 else
>                         i++;
>         }
> }
> 
> noinline void f42(void)
> {
>         int i;
>         for (i = 0; i < 100; i++) {
>                 f43();
>                 f43();
>                 f43();
>         }
> }
> 
> noinline void f41(void)
> {
>         int i;
>         for (i = 0; i < 100; i++) {
>                 f42();
>                 f42();
>                 f42();
>         }
> }
> noinline void f40(void)
> {
>         f41();
> }
> 
> ... ...
> 
> noinline void f32(void)
> {
>         f33();
> }
> 
> noinline void f31(void)
> {
>         int i;
> 
>         for (i = 0; i < 10000; i++) {
>                 if(i%2)
>                         i++;
>                 else
>                         i++;
>         }
> 
>         f32();
> }
> 
> noinline void f30(void)
> {
>         f31();
> }
> 
> ... ...
> 
> noinline void f1(void)
> {
>         f2();
> }
> 
> int main()
> {
>         f1();
> }
> 
> Kan Liang (12):
>   perf tools: Add hw_idx in struct branch_stack
>   perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
>   perf header: Add check for event attr
>   perf pmu: Add support for PMU capabilities
>   perf header: Support CPU PMU capabilities
>   perf machine: Refine the function for LBR call stack reconstruction
>   perf tools: Stitch LBR call stack
>   perf report: Add option to enable the LBR stitching approach
>   perf script: Add option to enable the LBR stitching approach
>   perf top: Add option to enable the LBR stitching approach
>   perf c2c: Add option to enable the LBR stitching approach
>   perf hist: Add fast path for duplicate entries check approach
> 
>  tools/include/uapi/linux/perf_event.h         |   8 +-
>  tools/perf/Documentation/perf-c2c.txt         |  11 +
>  tools/perf/Documentation/perf-report.txt      |  11 +
>  tools/perf/Documentation/perf-script.txt      |  11 +
>  tools/perf/Documentation/perf-top.txt         |   9 +
>  .../Documentation/perf.data-file-format.txt   |  16 +
>  tools/perf/builtin-c2c.c                      |   6 +
>  tools/perf/builtin-record.c                   |   3 +
>  tools/perf/builtin-report.c                   |   6 +
>  tools/perf/builtin-script.c                   |  76 ++--
>  tools/perf/builtin-stat.c                     |   1 +
>  tools/perf/builtin-top.c                      |  11 +
>  tools/perf/tests/sample-parsing.c             |   7 +-
>  tools/perf/util/branch.h                      |  27 +-
>  tools/perf/util/callchain.h                   |  12 +-
>  tools/perf/util/cs-etm.c                      |   1 +
>  tools/perf/util/env.h                         |   3 +
>  tools/perf/util/event.h                       |   1 +
>  tools/perf/util/evsel.c                       |  20 +-
>  tools/perf/util/evsel.h                       |   6 +
>  tools/perf/util/header.c                      | 147 ++++++
>  tools/perf/util/header.h                      |   1 +
>  tools/perf/util/hist.c                        |  26 +-
>  tools/perf/util/intel-pt.c                    |   2 +
>  tools/perf/util/machine.c                     | 424 +++++++++++++++---
>  tools/perf/util/perf_event_attr_fprintf.c     |   1 +
>  tools/perf/util/pmu.c                         |  87 ++++
>  tools/perf/util/pmu.h                         |  12 +
>  .../scripting-engines/trace-event-python.c    |  30 +-
>  tools/perf/util/session.c                     |   8 +-
>  tools/perf/util/sort.c                        |   2 +-
>  tools/perf/util/sort.h                        |   2 +
>  tools/perf/util/synthetic-events.c            |   6 +-
>  tools/perf/util/thread.c                      |   2 +
>  tools/perf/util/thread.h                      |  34 ++
>  tools/perf/util/top.h                         |   1 +
>  36 files changed, 900 insertions(+), 131 deletions(-)
> 
> -- 
> 2.17.1
> 

-- 

- Arnaldo


* Re: [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack
  2020-02-28 16:30 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack kan.liang
@ 2020-03-04 13:49   ` Arnaldo Carvalho de Melo
  2020-03-04 15:45     ` Arnaldo Carvalho de Melo
  2020-03-19 14:10     ` [tip: perf/core] tools headers UAPI: Update tools's copy of linux/perf_event.h tip-bot2 for Arnaldo Carvalho de Melo
  2020-03-10  0:42   ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack Arnaldo Carvalho de Melo
  2020-03-19 14:10   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2 siblings, 2 replies; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-04 13:49 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

Em Fri, Feb 28, 2020 at 08:30:00AM -0800, kan.liang@linux.intel.com escreveu:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The low level index of raw branch records for the most recent branch can
> be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
> branch_sample_type. Extend struct branch_stack to support it.
> 
> However, if PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
> entries[] will be output by the kernel. The pointer to entries[] could be
> wrong, since the output format differs from the new struct branch_stack.
> Add a variable no_hw_idx in struct perf_sample to indicate whether the
> hw_idx is output.
> Add get_branch_entry() to return the corresponding pointer to entries[0].
 
This should be broken up into at least two patches: one that syncs
tools/include/uapi/linux/perf_event.h with the kernel, and another that
does what this changeset log message states. I'll do it this time to
expedite processing of this patchset; please do it that way next time.

- Arnaldo
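
For readers following the thread, the entry lookup described in the quoted
changelog can be sketched in isolation. This is a simplified illustration,
not the code from the patch: first_branch_entry() is a hypothetical name,
the sample payload is modelled as a flat array of 64-bit words, and the
flags bitfield is reduced to a plain 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for perf's struct branch_entry (flags flattened). */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/*
 * Branch stack payload as a flat array of 64-bit words:
 *   with PERF_SAMPLE_BRANCH_HW_INDEX:    nr, hw_idx, entries[0..nr-1]
 *   without PERF_SAMPLE_BRANCH_HW_INDEX: nr, entries[0..nr-1]
 * Return a pointer to the first entry in either layout.
 */
static const struct branch_entry *
first_branch_entry(const uint64_t *payload, bool no_hw_idx)
{
	const uint64_t *p = payload + 1;	/* skip nr */

	if (!no_hw_idx)
		p++;				/* skip hw_idx */
	return (const struct branch_entry *)p;
}
```

Callers that index entries[] directly through the new struct layout would
read one word too far when hw_idx is absent, which is why the lookup has to
consult no_hw_idx first.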

> To make dummy branch sample consistent as new branch sample, add hw_idx
> in struct dummy_branch_stack for cs-etm and intel-pt.
> 
> Apply the new struct branch_stack for synthetic events as well.
> 
> Extend test case sample-parsing to support new struct branch_stack.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/include/uapi/linux/perf_event.h         |  8 ++-
>  tools/perf/builtin-script.c                   | 70 ++++++++++---------
>  tools/perf/tests/sample-parsing.c             |  7 +-
>  tools/perf/util/branch.h                      | 22 ++++++
>  tools/perf/util/cs-etm.c                      |  1 +
>  tools/perf/util/event.h                       |  1 +
>  tools/perf/util/evsel.c                       |  5 ++
>  tools/perf/util/evsel.h                       |  5 ++
>  tools/perf/util/hist.c                        |  3 +-
>  tools/perf/util/intel-pt.c                    |  2 +
>  tools/perf/util/machine.c                     | 35 +++++-----
>  .../scripting-engines/trace-event-python.c    | 30 ++++----
>  tools/perf/util/session.c                     |  8 ++-
>  tools/perf/util/synthetic-events.c            |  6 +-
>  14 files changed, 131 insertions(+), 72 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index 377d794d3105..397cfd65b3fe 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -181,6 +181,8 @@ enum perf_branch_sample_type_shift {
>  
>  	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
>  
> +	PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT	= 17, /* save low level index of raw branch records */
> +
>  	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>  };
>  
> @@ -208,6 +210,8 @@ enum perf_branch_sample_type {
>  	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
>  		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>  
> +	PERF_SAMPLE_BRANCH_HW_INDEX	= 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT,
> +
>  	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>  };
>  
> @@ -853,7 +857,9 @@ enum perf_event_type {
>  	 *	  char                  data[size];}&& PERF_SAMPLE_RAW
>  	 *
>  	 *	{ u64                   nr;
> -	 *        { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK
> +	 *	  { u64	hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> +	 *        { u64 from, to, flags } lbr[nr];
> +	 *      } && PERF_SAMPLE_BRANCH_STACK
>  	 *
>  	 * 	{ u64			abi; # enum perf_sample_regs_abi
>  	 * 	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index e2406b291c1c..acf3107bbda2 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -735,6 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>  					struct perf_event_attr *attr, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct addr_location alf, alt;
>  	u64 i, from, to;
>  	int printed = 0;
> @@ -743,8 +744,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>  		return 0;
>  
>  	for (i = 0; i < br->nr; i++) {
> -		from = br->entries[i].from;
> -		to   = br->entries[i].to;
> +		from = entries[i].from;
> +		to   = entries[i].to;
>  
>  		if (PRINT_FIELD(DSO)) {
>  			memset(&alf, 0, sizeof(alf));
> @@ -768,10 +769,10 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>  		}
>  
>  		printed += fprintf(fp, "/%c/%c/%c/%d ",
> -			mispred_str( br->entries + i),
> -			br->entries[i].flags.in_tx? 'X' : '-',
> -			br->entries[i].flags.abort? 'A' : '-',
> -			br->entries[i].flags.cycles);
> +			mispred_str(entries + i),
> +			entries[i].flags.in_tx ? 'X' : '-',
> +			entries[i].flags.abort ? 'A' : '-',
> +			entries[i].flags.cycles);
>  	}
>  
>  	return printed;
> @@ -782,6 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>  					   struct perf_event_attr *attr, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct addr_location alf, alt;
>  	u64 i, from, to;
>  	int printed = 0;
> @@ -793,8 +795,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>  
>  		memset(&alf, 0, sizeof(alf));
>  		memset(&alt, 0, sizeof(alt));
> -		from = br->entries[i].from;
> -		to   = br->entries[i].to;
> +		from = entries[i].from;
> +		to   = entries[i].to;
>  
>  		thread__find_symbol_fb(thread, sample->cpumode, from, &alf);
>  		thread__find_symbol_fb(thread, sample->cpumode, to, &alt);
> @@ -813,10 +815,10 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>  			printed += fprintf(fp, ")");
>  		}
>  		printed += fprintf(fp, "/%c/%c/%c/%d ",
> -			mispred_str( br->entries + i),
> -			br->entries[i].flags.in_tx? 'X' : '-',
> -			br->entries[i].flags.abort? 'A' : '-',
> -			br->entries[i].flags.cycles);
> +			mispred_str(entries + i),
> +			entries[i].flags.in_tx ? 'X' : '-',
> +			entries[i].flags.abort ? 'A' : '-',
> +			entries[i].flags.cycles);
>  	}
>  
>  	return printed;
> @@ -827,6 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>  					   struct perf_event_attr *attr, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct addr_location alf, alt;
>  	u64 i, from, to;
>  	int printed = 0;
> @@ -838,8 +841,8 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>  
>  		memset(&alf, 0, sizeof(alf));
>  		memset(&alt, 0, sizeof(alt));
> -		from = br->entries[i].from;
> -		to   = br->entries[i].to;
> +		from = entries[i].from;
> +		to   = entries[i].to;
>  
>  		if (thread__find_map_fb(thread, sample->cpumode, from, &alf) &&
>  		    !alf.map->dso->adjust_symbols)
> @@ -862,10 +865,10 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>  			printed += fprintf(fp, ")");
>  		}
>  		printed += fprintf(fp, "/%c/%c/%c/%d ",
> -			mispred_str(br->entries + i),
> -			br->entries[i].flags.in_tx ? 'X' : '-',
> -			br->entries[i].flags.abort ? 'A' : '-',
> -			br->entries[i].flags.cycles);
> +			mispred_str(entries + i),
> +			entries[i].flags.in_tx ? 'X' : '-',
> +			entries[i].flags.abort ? 'A' : '-',
> +			entries[i].flags.cycles);
>  	}
>  
>  	return printed;
> @@ -1053,6 +1056,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  					    struct machine *machine, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	u64 start, end;
>  	int i, insn, len, nr, ilen, printed = 0;
>  	struct perf_insn x;
> @@ -1073,31 +1077,31 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  	printed += fprintf(fp, "%c", '\n');
>  
>  	/* Handle first from jump, of which we don't know the entry. */
> -	len = grab_bb(buffer, br->entries[nr-1].from,
> -			br->entries[nr-1].from,
> +	len = grab_bb(buffer, entries[nr-1].from,
> +			entries[nr-1].from,
>  			machine, thread, &x.is64bit, &x.cpumode, false);
>  	if (len > 0) {
> -		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
> +		printed += ip__fprintf_sym(entries[nr - 1].from, thread,
>  					   x.cpumode, x.cpu, &lastsym, attr, fp);
> -		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
> +		printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1],
>  					    &x, buffer, len, 0, fp, &total_cycles);
>  		if (PRINT_FIELD(SRCCODE))
> -			printed += print_srccode(thread, x.cpumode, br->entries[nr - 1].from);
> +			printed += print_srccode(thread, x.cpumode, entries[nr - 1].from);
>  	}
>  
>  	/* Print all blocks */
>  	for (i = nr - 2; i >= 0; i--) {
> -		if (br->entries[i].from || br->entries[i].to)
> +		if (entries[i].from || entries[i].to)
>  			pr_debug("%d: %" PRIx64 "-%" PRIx64 "\n", i,
> -				 br->entries[i].from,
> -				 br->entries[i].to);
> -		start = br->entries[i + 1].to;
> -		end   = br->entries[i].from;
> +				 entries[i].from,
> +				 entries[i].to);
> +		start = entries[i + 1].to;
> +		end   = entries[i].from;
>  
>  		len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
>  		/* Patch up missing kernel transfers due to ring filters */
>  		if (len == -ENXIO && i > 0) {
> -			end = br->entries[--i].from;
> +			end = entries[--i].from;
>  			pr_debug("\tpatching up to %" PRIx64 "-%" PRIx64 "\n", start, end);
>  			len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
>  		}
> @@ -1110,7 +1114,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  
>  			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
>  			if (ip == end) {
> -				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, ++insn, fp,
> +				printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp,
>  							    &total_cycles);
>  				if (PRINT_FIELD(SRCCODE))
>  					printed += print_srccode(thread, x.cpumode, ip);
> @@ -1134,9 +1138,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  	 * Hit the branch? In this case we are already done, and the target
>  	 * has not been executed yet.
>  	 */
> -	if (br->entries[0].from == sample->ip)
> +	if (entries[0].from == sample->ip)
>  		goto out;
> -	if (br->entries[0].flags.abort)
> +	if (entries[0].flags.abort)
>  		goto out;
>  
>  	/*
> @@ -1147,7 +1151,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  	 * between final branch and sample. When this happens just
>  	 * continue walking after the last TO until we hit a branch.
>  	 */
> -	start = br->entries[0].to;
> +	start = entries[0].to;
>  	end = sample->ip;
>  	if (end < start) {
>  		/* Missing jump. Scan 128 bytes for the next branch */
> diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
> index 2762e1155238..14239e472187 100644
> --- a/tools/perf/tests/sample-parsing.c
> +++ b/tools/perf/tests/sample-parsing.c
> @@ -99,6 +99,7 @@ static bool samples_same(const struct perf_sample *s1,
>  
>  	if (type & PERF_SAMPLE_BRANCH_STACK) {
>  		COMP(branch_stack->nr);
> +		COMP(branch_stack->hw_idx);
>  		for (i = 0; i < s1->branch_stack->nr; i++)
>  			MCOMP(branch_stack->entries[i]);
>  	}
> @@ -186,7 +187,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>  		u64 data[64];
>  	} branch_stack = {
>  		/* 1 branch_entry */
> -		.data = {1, 211, 212, 213},
> +		.data = {1, -1ULL, 211, 212, 213},
>  	};
>  	u64 regs[64];
>  	const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
> @@ -208,6 +209,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>  		.transaction	= 112,
>  		.raw_data	= (void *)raw_data,
>  		.callchain	= &callchain.callchain,
> +		.no_hw_idx      = false,
>  		.branch_stack	= &branch_stack.branch_stack,
>  		.user_regs	= {
>  			.abi	= PERF_SAMPLE_REGS_ABI_64,
> @@ -244,6 +246,9 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>  	if (sample_type & PERF_SAMPLE_REGS_INTR)
>  		evsel.core.attr.sample_regs_intr = sample_regs;
>  
> +	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
> +		evsel.core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
> +
>  	for (i = 0; i < sizeof(regs); i++)
>  		*(i + (u8 *)regs) = i & 0xfe;
>  
> diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
> index 88e00d268f6f..7fc9fa0dc361 100644
> --- a/tools/perf/util/branch.h
> +++ b/tools/perf/util/branch.h
> @@ -12,6 +12,7 @@
>  #include <linux/stddef.h>
>  #include <linux/perf_event.h>
>  #include <linux/types.h>
> +#include "event.h"
>  
>  struct branch_flags {
>  	u64 mispred:1;
> @@ -39,9 +40,30 @@ struct branch_entry {
>  
>  struct branch_stack {
>  	u64			nr;
> +	u64			hw_idx;
>  	struct branch_entry	entries[0];
>  };
>  
> +/*
> + * The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied.
> + * Otherwise, the output format of a sample with branch stack is
> + * struct branch_stack {
> + *	u64			nr;
> + *	struct branch_entry	entries[0];
> + * }
> + * Check whether the hw_idx is available,
> + * and return the corresponding pointer of entries[0].
> + */
> +inline struct branch_entry *get_branch_entry(struct perf_sample *sample)
> +{
> +	u64 *entry = (u64 *)sample->branch_stack;
> +
> +	entry++;
> +	if (sample->no_hw_idx)
> +		return (struct branch_entry *)entry;
> +	return (struct branch_entry *)(++entry);
> +}
> +
>  struct branch_type_stat {
>  	bool	branch_to;
>  	u64	counts[PERF_BR_MAX];
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 5471045ebf5c..e697fe1c67b3 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1202,6 +1202,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>  	if (etm->synth_opts.last_branch) {
>  		dummy_bs = (struct dummy_branch_stack){
>  			.nr = 1,
> +			.hw_idx = -1ULL,
>  			.entries = {
>  				.from = sample.ip,
>  				.to = sample.addr,
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 85223159737c..3cda40a2fafc 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -139,6 +139,7 @@ struct perf_sample {
>  	u16 insn_len;
>  	u8  cpumode;
>  	u16 misc;
> +	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
>  	char insn[MAX_INSN];
>  	void *raw_data;
>  	struct ip_callchain *callchain;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index c8dc4450884c..05883a45de5b 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2169,7 +2169,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  
>  		if (data->branch_stack->nr > max_branch_nr)
>  			return -EFAULT;
> +
>  		sz = data->branch_stack->nr * sizeof(struct branch_entry);
> +		if (perf_evsel__has_branch_hw_idx(evsel))
> +			sz += sizeof(u64);
> +		else
> +			data->no_hw_idx = true;
>  		OVERFLOW_CHECK(array, sz, max_size);
>  		array = (void *)array + sz;
>  	}
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index dc14f4a823cd..99a0cb60c556 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -389,6 +389,11 @@ static inline bool perf_evsel__has_branch_callstack(const struct evsel *evsel)
>  	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
>  }
>  
> +static inline bool perf_evsel__has_branch_hw_idx(const struct evsel *evsel)
> +{
> +	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX;
> +}
> +
>  static inline bool evsel__has_callchain(const struct evsel *evsel)
>  {
>  	return (evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN) != 0;
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index ca5a8f4d007e..808ca27bd5cf 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -2584,9 +2584,10 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
>  			  u64 *total_cycles)
>  {
>  	struct branch_info *bi;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  
>  	/* If we have branch cycles always annotate them. */
> -	if (bs && bs->nr && bs->entries[0].flags.cycles) {
> +	if (bs && bs->nr && entries[0].flags.cycles) {
>  		int i;
>  
>  		bi = sample__resolve_bstack(sample, al);
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 33cf8928cf05..23c8289c2472 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -1295,6 +1295,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
>  	struct perf_sample sample = { .ip = 0, };
>  	struct dummy_branch_stack {
>  		u64			nr;
> +		u64			hw_idx;
>  		struct branch_entry	entries;
>  	} dummy_bs;
>  
> @@ -1316,6 +1317,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
>  	if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) {
>  		dummy_bs = (struct dummy_branch_stack){
>  			.nr = 1,
> +			.hw_idx = -1ULL,
>  			.entries = {
>  				.from = sample.ip,
>  				.to = sample.addr,
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index c8c5410315e8..62522b76a924 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2083,15 +2083,16 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
>  {
>  	unsigned int i;
>  	const struct branch_stack *bs = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
>  
>  	if (!bi)
>  		return NULL;
>  
>  	for (i = 0; i < bs->nr; i++) {
> -		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
> -		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
> -		bi[i].flags = bs->entries[i].flags;
> +		ip__resolve_ams(al->thread, &bi[i].to, entries[i].to);
> +		ip__resolve_ams(al->thread, &bi[i].from, entries[i].from);
> +		bi[i].flags = entries[i].flags;
>  	}
>  	return bi;
>  }
> @@ -2187,6 +2188,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>  	/* LBR only affects the user callchain */
>  	if (i != chain_nr) {
>  		struct branch_stack *lbr_stack = sample->branch_stack;
> +		struct branch_entry *entries = get_branch_entry(sample);
>  		int lbr_nr = lbr_stack->nr, j, k;
>  		bool branch;
>  		struct branch_flags *flags;
> @@ -2212,31 +2214,29 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>  					ip = chain->ips[j];
>  				else if (j > i + 1) {
>  					k = j - i - 2;
> -					ip = lbr_stack->entries[k].from;
> +					ip = entries[k].from;
>  					branch = true;
> -					flags = &lbr_stack->entries[k].flags;
> +					flags = &entries[k].flags;
>  				} else {
> -					ip = lbr_stack->entries[0].to;
> +					ip = entries[0].to;
>  					branch = true;
> -					flags = &lbr_stack->entries[0].flags;
> -					branch_from =
> -						lbr_stack->entries[0].from;
> +					flags = &entries[0].flags;
> +					branch_from = entries[0].from;
>  				}
>  			} else {
>  				if (j < lbr_nr) {
>  					k = lbr_nr - j - 1;
> -					ip = lbr_stack->entries[k].from;
> +					ip = entries[k].from;
>  					branch = true;
> -					flags = &lbr_stack->entries[k].flags;
> +					flags = &entries[k].flags;
>  				}
>  				else if (j > lbr_nr)
>  					ip = chain->ips[i + 1 - (j - lbr_nr)];
>  				else {
> -					ip = lbr_stack->entries[0].to;
> +					ip = entries[0].to;
>  					branch = true;
> -					flags = &lbr_stack->entries[0].flags;
> -					branch_from =
> -						lbr_stack->entries[0].from;
> +					flags = &entries[0].flags;
> +					branch_from = entries[0].from;
>  				}
>  			}
>  
> @@ -2283,6 +2283,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  					    int max_stack)
>  {
>  	struct branch_stack *branch = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct ip_callchain *chain = sample->callchain;
>  	int chain_nr = 0;
>  	u8 cpumode = PERF_RECORD_MISC_USER;
> @@ -2330,7 +2331,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  
>  		for (i = 0; i < nr; i++) {
>  			if (callchain_param.order == ORDER_CALLEE) {
> -				be[i] = branch->entries[i];
> +				be[i] = entries[i];
>  
>  				if (chain == NULL)
>  					continue;
> @@ -2349,7 +2350,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  				    be[i].from >= chain->ips[first_call] - 8)
>  					first_call++;
>  			} else
> -				be[i] = branch->entries[branch->nr - i - 1];
> +				be[i] = entries[branch->nr - i - 1];
>  		}
>  
>  		memset(iter, 0, sizeof(struct iterations) * nr);
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 80ca5d0ab7fe..02b6c87c5abe 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -464,6 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>  					struct thread *thread)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	PyObject *pylist;
>  	u64 i;
>  
> @@ -484,28 +485,28 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>  			Py_FatalError("couldn't create Python dictionary");
>  
>  		pydict_set_item_string_decref(pyelem, "from",
> -		    PyLong_FromUnsignedLongLong(br->entries[i].from));
> +		    PyLong_FromUnsignedLongLong(entries[i].from));
>  		pydict_set_item_string_decref(pyelem, "to",
> -		    PyLong_FromUnsignedLongLong(br->entries[i].to));
> +		    PyLong_FromUnsignedLongLong(entries[i].to));
>  		pydict_set_item_string_decref(pyelem, "mispred",
> -		    PyBool_FromLong(br->entries[i].flags.mispred));
> +		    PyBool_FromLong(entries[i].flags.mispred));
>  		pydict_set_item_string_decref(pyelem, "predicted",
> -		    PyBool_FromLong(br->entries[i].flags.predicted));
> +		    PyBool_FromLong(entries[i].flags.predicted));
>  		pydict_set_item_string_decref(pyelem, "in_tx",
> -		    PyBool_FromLong(br->entries[i].flags.in_tx));
> +		    PyBool_FromLong(entries[i].flags.in_tx));
>  		pydict_set_item_string_decref(pyelem, "abort",
> -		    PyBool_FromLong(br->entries[i].flags.abort));
> +		    PyBool_FromLong(entries[i].flags.abort));
>  		pydict_set_item_string_decref(pyelem, "cycles",
> -		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
> +		    PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
>  
>  		thread__find_map_fb(thread, sample->cpumode,
> -				    br->entries[i].from, &al);
> +				    entries[i].from, &al);
>  		dsoname = get_dsoname(al.map);
>  		pydict_set_item_string_decref(pyelem, "from_dsoname",
>  					      _PyUnicode_FromString(dsoname));
>  
>  		thread__find_map_fb(thread, sample->cpumode,
> -				    br->entries[i].to, &al);
> +				    entries[i].to, &al);
>  		dsoname = get_dsoname(al.map);
>  		pydict_set_item_string_decref(pyelem, "to_dsoname",
>  					      _PyUnicode_FromString(dsoname));
> @@ -561,6 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>  					   struct thread *thread)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	PyObject *pylist;
>  	u64 i;
>  	char bf[512];
> @@ -581,22 +583,22 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>  			Py_FatalError("couldn't create Python dictionary");
>  
>  		thread__find_symbol_fb(thread, sample->cpumode,
> -				       br->entries[i].from, &al);
> +				       entries[i].from, &al);
>  		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "from",
>  					      _PyUnicode_FromString(bf));
>  
>  		thread__find_symbol_fb(thread, sample->cpumode,
> -				       br->entries[i].to, &al);
> +				       entries[i].to, &al);
>  		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "to",
>  					      _PyUnicode_FromString(bf));
>  
> -		get_br_mspred(&br->entries[i].flags, bf, sizeof(bf));
> +		get_br_mspred(&entries[i].flags, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "pred",
>  					      _PyUnicode_FromString(bf));
>  
> -		if (br->entries[i].flags.in_tx) {
> +		if (entries[i].flags.in_tx) {
>  			pydict_set_item_string_decref(pyelem, "in_tx",
>  					      _PyUnicode_FromString("X"));
>  		} else {
> @@ -604,7 +606,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>  					      _PyUnicode_FromString("-"));
>  		}
>  
> -		if (br->entries[i].flags.abort) {
> +		if (entries[i].flags.abort) {
>  			pydict_set_item_string_decref(pyelem, "abort",
>  					      _PyUnicode_FromString("A"));
>  		} else {
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index d0d7d25b23e3..dab985e3f136 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1007,6 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
>  {
>  	struct ip_callchain *callchain = sample->callchain;
>  	struct branch_stack *lbr_stack = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	u64 kernel_callchain_nr = callchain->nr;
>  	unsigned int i;
>  
> @@ -1043,10 +1044,10 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
>  			       i, callchain->ips[i]);
>  
>  		printf("..... %2d: %016" PRIx64 "\n",
> -		       (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
> +		       (int)(kernel_callchain_nr), entries[0].to);
>  		for (i = 0; i < lbr_stack->nr; i++)
>  			printf("..... %2d: %016" PRIx64 "\n",
> -			       (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
> +			       (int)(i + kernel_callchain_nr + 1), entries[i].from);
>  	}
>  }
>  
> @@ -1068,6 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
>  
>  static void branch_stack__printf(struct perf_sample *sample, bool callstack)
>  {
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	uint64_t i;
>  
>  	printf("%s: nr:%" PRIu64 "\n",
> @@ -1075,7 +1077,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
>  		sample->branch_stack->nr);
>  
>  	for (i = 0; i < sample->branch_stack->nr; i++) {
> -		struct branch_entry *e = &sample->branch_stack->entries[i];
> +		struct branch_entry *e = &entries[i];
>  
>  		if (!callstack) {
>  			printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index c423298fe62d..dd3e6f43fb86 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1183,7 +1183,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
>  
>  	if (type & PERF_SAMPLE_BRANCH_STACK) {
>  		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
> -		sz += sizeof(u64);
> +		/* nr, hw_idx */
> +		sz += 2 * sizeof(u64);
>  		result += sz;
>  	}
>  
> @@ -1344,7 +1345,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>  
>  	if (type & PERF_SAMPLE_BRANCH_STACK) {
>  		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
> -		sz += sizeof(u64);
> +		/* nr, hw_idx */
> +		sz += 2 * sizeof(u64);
>  		memcpy(array, sample->branch_stack, sz);
>  		array = (void *)array + sz;
>  	}
> -- 
> 2.17.1
> 

-- 

- Arnaldo


* Re: [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack
  2020-03-04 13:49   ` Arnaldo Carvalho de Melo
@ 2020-03-04 15:45     ` Arnaldo Carvalho de Melo
  2020-03-04 16:07       ` Liang, Kan
  2020-03-19 14:10     ` [tip: perf/core] tools headers UAPI: Update tools's copy of linux/perf_event.h tip-bot2 for Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-04 15:45 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

Em Wed, Mar 04, 2020 at 10:49:02AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Feb 28, 2020 at 08:30:00AM -0800, kan.liang@linux.intel.com escreveu:
> > From: Kan Liang <kan.liang@linux.intel.com>
> > The low level index of raw branch records for the most recent branch can
> > be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
> > branch_sample_type. Extend struct branch_stack to support it.

> > However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
> > entries[] will be output by kernel. The pointer of entries[] could be
> > wrong, since the output format is different with new struct branch_stack.
> > Add a variable no_hw_idx in struct perf_sample to indicate whether the
> > hw_idx is output.
> > Add get_branch_entry() to return corresponding pointer of entries[0].
  
> This should be broken up in at least two patches, one that syncs
> tools/include/uapi/linux/perf_event.h with the kernel, and another to do
> what this changeset log message states, I'll do it this time to expedite
> processing of this patchset, please do it that way next time.

So, after doing that split I'm also suggesting/tentatively applying this
patch on top of it, to keep the naming convention we have in tools/perf
and to add the missing 'static' to that inline. Please holler if you
disagree; I'll put the end result in a branch for further
visualization/comments.
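
For readers following along, here is a minimal, self-contained sketch of what
the renamed helper is meant to do: skip 'nr', and skip 'hw_idx' too when the
kernel emitted it, so that callers always get a pointer to entries[0]. The
struct definitions below are simplified stand-ins written for this sketch
(the flags field in particular is assumed to be a plain u64 here), and the
body is an illustration based on the changelog, not a copy of the applied
patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Simplified stand-in: flags is a plain u64 in this sketch (assumption). */
struct branch_entry {
	u64 from;
	u64 to;
	u64 flags;
};

/* Layout when PERF_SAMPLE_BRANCH_HW_INDEX is set: nr, hw_idx, entries[]. */
struct branch_stack {
	u64 nr;
	u64 hw_idx;
	struct branch_entry entries[];
};

/* Only the two fields this sketch needs. */
struct perf_sample {
	struct branch_stack *branch_stack;
	bool no_hw_idx;	/* true when the kernel did not output hw_idx */
};

/*
 * Check whether the hw_idx is available,
 * and return the corresponding pointer of entries[0].
 */
static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample)
{
	u64 *entry = (u64 *)sample->branch_stack;

	entry++;			/* skip nr */
	if (!sample->no_hw_idx)
		entry++;		/* skip hw_idx only if it was output */

	return (struct branch_entry *)entry;
}
```

The 'static' matters because a plain 'inline' definition in a header gives
no guarantee that an out-of-line copy exists, which can end in link errors
once the header is included from several objects.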

At some point these functions obtaining stuff from a 'struct perf_sample'
will move to tools/lib/perf/ (aka libperf), so better to start doing proper
namespacing, etc, right Jiri?

- Arnaldo

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index acf3107bbda2..656b347f6dd8 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -735,7 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 					struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -783,7 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 					   struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -829,7 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 					   struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -1056,7 +1056,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 					    struct machine *machine, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	u64 start, end;
 	int i, insn, len, nr, ilen, printed = 0;
 	struct perf_insn x;
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 014c3cd4cb32..154a05cd03af 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -54,7 +54,7 @@ struct branch_stack {
  * Check whether the hw_idx is available,
  * and return the corresponding pointer of entries[0].
  */
-inline struct branch_entry *get_branch_entry(struct perf_sample *sample)
+static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample)
 {
 	u64 *entry = (u64 *)sample->branch_stack;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 808ca27bd5cf..e74a5acf66d9 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2584,7 +2584,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 			  u64 *total_cycles)
 {
 	struct branch_info *bi;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 
 	/* If we have branch cycles always annotate them. */
 	if (bs && bs->nr && entries[0].flags.cycles) {
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b0623f99fb9c..fd14f1489802 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2081,7 +2081,7 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 {
 	unsigned int i;
 	const struct branch_stack *bs = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
 
 	if (!bi)
@@ -2186,7 +2186,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	/* LBR only affects the user callchain */
 	if (i != chain_nr) {
 		struct branch_stack *lbr_stack = sample->branch_stack;
-		struct branch_entry *entries = get_branch_entry(sample);
+		struct branch_entry *entries = perf_sample__branch_entries(sample);
 		int lbr_nr = lbr_stack->nr, j, k;
 		bool branch;
 		struct branch_flags *flags;
@@ -2281,7 +2281,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 					    int max_stack)
 {
 	struct branch_stack *branch = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = 0;
 	u8 cpumode = PERF_RECORD_MISC_USER;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 02b6c87c5abe..8c1b27cd8b99 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -464,7 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 					struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	PyObject *pylist;
 	u64 i;
 
@@ -562,7 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 					   struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	PyObject *pylist;
 	u64 i;
 	char bf[512];
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dab985e3f136..055b00abd56d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1007,7 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 {
 	struct ip_callchain *callchain = sample->callchain;
 	struct branch_stack *lbr_stack = sample->branch_stack;
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	u64 kernel_callchain_nr = callchain->nr;
 	unsigned int i;
 
@@ -1069,7 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
 
 static void branch_stack__printf(struct perf_sample *sample, bool callstack)
 {
-	struct branch_entry *entries = get_branch_entry(sample);
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	uint64_t i;
 
 	printf("%s: nr:%" PRIu64 "\n",


* Re: [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack
  2020-03-04 15:45     ` Arnaldo Carvalho de Melo
@ 2020-03-04 16:07       ` Liang, Kan
  0 siblings, 0 replies; 31+ messages in thread
From: Liang, Kan @ 2020-03-04 16:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 3/4/2020 10:45 AM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 04, 2020 at 10:49:02AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Fri, Feb 28, 2020 at 08:30:00AM -0800, kan.liang@linux.intel.com escreveu:
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>> The low level index of raw branch records for the most recent branch can
>>> be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
>>> branch_sample_type. Extend struct branch_stack to support it.
> 
>>> However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
>>> entries[] will be output by kernel. The pointer of entries[] could be
>>> wrong, since the output format is different with new struct branch_stack.
>>> Add a variable no_hw_idx in struct perf_sample to indicate whether the
>>> hw_idx is output.
>>> Add get_branch_entry() to return corresponding pointer of entries[0].
>    
>> This should be broken up in at least two patches, one that syncs
>> tools/include/uapi/linux/perf_event.h with the kernel, and another to do
>> what this changeset log message states, I'll do it this time to expedite
>> processing of this patchset, please do it that way next time.
>

Thanks. I will keep it in mind.

> So, after doing that split I'm also suggesting/tentatively applying this
> patch on top of it, to keep the naming convention we have in tools/perf,
> and also the 'static' to that inline, 

Thanks. The changes look good to me.

Thanks,
Kan

> please holler if you disagree,
> I'll put the end result in a branch for further visualization/comments.
> 
> At some point these functions obtaining stuff from a 'struct perf_sample' to
> tools/lib/perf/ (aka libperf), so better go doing proper namespacing,
> etc, right Jiri?
> 
> - Arnaldo
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index acf3107bbda2..656b347f6dd8 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -735,7 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>   					struct perf_event_attr *attr, FILE *fp)
>   {
>   	struct branch_stack *br = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	struct addr_location alf, alt;
>   	u64 i, from, to;
>   	int printed = 0;
> @@ -783,7 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>   					   struct perf_event_attr *attr, FILE *fp)
>   {
>   	struct branch_stack *br = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	struct addr_location alf, alt;
>   	u64 i, from, to;
>   	int printed = 0;
> @@ -829,7 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>   					   struct perf_event_attr *attr, FILE *fp)
>   {
>   	struct branch_stack *br = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	struct addr_location alf, alt;
>   	u64 i, from, to;
>   	int printed = 0;
> @@ -1056,7 +1056,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>   					    struct machine *machine, FILE *fp)
>   {
>   	struct branch_stack *br = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	u64 start, end;
>   	int i, insn, len, nr, ilen, printed = 0;
>   	struct perf_insn x;
> diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
> index 014c3cd4cb32..154a05cd03af 100644
> --- a/tools/perf/util/branch.h
> +++ b/tools/perf/util/branch.h
> @@ -54,7 +54,7 @@ struct branch_stack {
>    * Check whether the hw_idx is available,
>    * and return the corresponding pointer of entries[0].
>    */
> -inline struct branch_entry *get_branch_entry(struct perf_sample *sample)
> +static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample)
>   {
>   	u64 *entry = (u64 *)sample->branch_stack;
>   
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 808ca27bd5cf..e74a5acf66d9 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -2584,7 +2584,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
>   			  u64 *total_cycles)
>   {
>   	struct branch_info *bi;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   
>   	/* If we have branch cycles always annotate them. */
>   	if (bs && bs->nr && entries[0].flags.cycles) {
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index b0623f99fb9c..fd14f1489802 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2081,7 +2081,7 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
>   {
>   	unsigned int i;
>   	const struct branch_stack *bs = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
>   
>   	if (!bi)
> @@ -2186,7 +2186,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>   	/* LBR only affects the user callchain */
>   	if (i != chain_nr) {
>   		struct branch_stack *lbr_stack = sample->branch_stack;
> -		struct branch_entry *entries = get_branch_entry(sample);
> +		struct branch_entry *entries = perf_sample__branch_entries(sample);
>   		int lbr_nr = lbr_stack->nr, j, k;
>   		bool branch;
>   		struct branch_flags *flags;
> @@ -2281,7 +2281,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>   					    int max_stack)
>   {
>   	struct branch_stack *branch = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	struct ip_callchain *chain = sample->callchain;
>   	int chain_nr = 0;
>   	u8 cpumode = PERF_RECORD_MISC_USER;
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 02b6c87c5abe..8c1b27cd8b99 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -464,7 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>   					struct thread *thread)
>   {
>   	struct branch_stack *br = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	PyObject *pylist;
>   	u64 i;
>   
> @@ -562,7 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>   					   struct thread *thread)
>   {
>   	struct branch_stack *br = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	PyObject *pylist;
>   	u64 i;
>   	char bf[512];
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index dab985e3f136..055b00abd56d 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1007,7 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
>   {
>   	struct ip_callchain *callchain = sample->callchain;
>   	struct branch_stack *lbr_stack = sample->branch_stack;
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	u64 kernel_callchain_nr = callchain->nr;
>   	unsigned int i;
>   
> @@ -1069,7 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
>   
>   static void branch_stack__printf(struct perf_sample *sample, bool callstack)
>   {
> -	struct branch_entry *entries = get_branch_entry(sample);
> +	struct branch_entry *entries = perf_sample__branch_entries(sample);
>   	uint64_t i;
>   
>   	printf("%s: nr:%" PRIu64 "\n",
> 


* Re: [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
  2020-02-28 16:30 ` [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX kan.liang
@ 2020-03-05 20:25   ` Arnaldo Carvalho de Melo
  2020-03-05 21:02     ` Liang, Kan
  2020-03-19 14:10   ` [tip: perf/core] perf evsel: " tip-bot2 for Kan Liang
  1 sibling, 1 reply; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-05 20:25 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

Em Fri, Feb 28, 2020 at 08:30:01AM -0800, kan.liang@linux.intel.com escreveu:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced
> in latest kernel.
> 
> Enable HW_INDEX by default in LBR call stack mode.
> If kernel doesn't support the sample type, switching it off.
> 
> Add HW_INDEX in attr_fprintf as well. User can check whether the branch
> sample type is set via debug information or header.

So this works with a kernel where PERF_SAMPLE_BRANCH_HW_INDEX is
present. From the committer notes I was putting together while
testing/applying this cset:

First collect some samples with LBR callchains, system wide, for a few
seconds:

  # perf record --call-graph lbr -a sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
  #

Now let's use 'perf evlist -v' to look at the branch_sample_type:

  # perf evlist -v
  cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
  #

So the machine has the kernel feature, and it was correctly added to
perf_event_attr.branch_sample_type, for the default 'cycles' event.

Cool, and look at that 'attr.precise_ip: 3' part, the kernel is OK with
having that together with attr.branch_sample_type with HW_INDEX set.

The problem happens when I test this on an older kernel, one that
doesn't know about HW_INDEX: we get it disabled, but precise_ip is set
to zero during its detection, even though at the end we get it back to
3, as expected. That got me a bit confused; I'll investigate this a bit
more to try and avoid these extra probes for the max precise level,
which fail on older kernels because branch_sample_type has HW_INDEX
set :-\
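
To make the failure mode concrete, here is a small simulation of it. This
is a sketch, not perf code: mock_old_kernel_open(), probe_precise_ip(),
struct mock_attr and BR_HW_INDEX are all hypothetical names invented for
the illustration, and the mock "kernel" simply rejects any attr that has
the HW_INDEX bit set, returning -EOPNOTSUPP to mirror the "error -95" lines
in the log (95 is EOPNOTSUPP on Linux/x86). Since the rejection has nothing
to do with precise_ip, walking precise_ip downwards cannot help, and every
probe is wasted.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the PERF_SAMPLE_BRANCH_HW_INDEX bit. */
#define BR_HW_INDEX	(1u << 0)

/* Hypothetical, reduced version of the attr: just the two fields at play. */
struct mock_attr {
	unsigned int precise_ip;
	uint64_t branch_sample_type;
};

/* Mock of an old kernel: HW_INDEX is unknown, whatever precise_ip is. */
static int mock_old_kernel_open(const struct mock_attr *attr)
{
	if (attr->branch_sample_type & BR_HW_INDEX)
		return -EOPNOTSUPP;
	return 3;	/* pretend this is a valid fd */
}

/*
 * Start at max_precise and decrease precise_ip by one after each failed
 * open. Returns the number of failed attempts; attr->precise_ip is left
 * at the last value tried.
 */
static int probe_precise_ip(struct mock_attr *attr, unsigned int max_precise)
{
	int failed = 0;

	attr->precise_ip = max_precise;
	for (;;) {
		if (mock_old_kernel_open(attr) >= 0)
			return failed;
		failed++;
		if (attr->precise_ip == 0)
			return failed;
		attr->precise_ip--;
	}
}
```

With BR_HW_INDEX set, the probe fails at 3, 2, 1 and 0 and leaves
precise_ip at zero; once the bit is cleared, the very first attempt with
precise_ip 3 succeeds. Dropping the unsupported bit before probing the
precise level would avoid the four useless attempts in this model.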


# perf record -vv --call-graph lbr -a sleep 5
<SNIP>
------------------------------------------------------------
perf_event_attr:
  size                             120
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  precise_ip                       3
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -95
decreasing precise_ip by one (2)
------------------------------------------------------------
perf_event_attr:
  size                             120
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  precise_ip                       2
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -95
decreasing precise_ip by one (1)
------------------------------------------------------------
perf_event_attr:
  size                             120
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  precise_ip                       1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -95
decreasing precise_ip by one (0)
------------------------------------------------------------
perf_event_attr:
  size                             120
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -22
switching off branch HW index support
------------------------------------------------------------
perf_event_attr:
  size                             120
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  precise_ip                       3
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES
------------------------------------------------------------


 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/perf/util/evsel.c                   | 15 ++++++++++++---
>  tools/perf/util/evsel.h                   |  1 +
>  tools/perf/util/perf_event_attr_fprintf.c |  1 +
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 05883a45de5b..816d930d774e 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -712,7 +712,8 @@ static void __perf_evsel__config_callchain(struct evsel *evsel,
>  				attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
>  							PERF_SAMPLE_BRANCH_CALL_STACK |
>  							PERF_SAMPLE_BRANCH_NO_CYCLES |
> -							PERF_SAMPLE_BRANCH_NO_FLAGS;
> +							PERF_SAMPLE_BRANCH_NO_FLAGS |
> +							PERF_SAMPLE_BRANCH_HW_INDEX;
>  			}
>  		} else
>  			 pr_warning("Cannot use LBR callstack with branch stack. "
> @@ -763,7 +764,8 @@ perf_evsel__reset_callgraph(struct evsel *evsel,
>  	if (param->record_mode == CALLCHAIN_LBR) {
>  		perf_evsel__reset_sample_bit(evsel, BRANCH_STACK);
>  		attr->branch_sample_type &= ~(PERF_SAMPLE_BRANCH_USER |
> -					      PERF_SAMPLE_BRANCH_CALL_STACK);
> +					      PERF_SAMPLE_BRANCH_CALL_STACK |
> +					      PERF_SAMPLE_BRANCH_HW_INDEX);
>  	}
>  	if (param->record_mode == CALLCHAIN_DWARF) {
>  		perf_evsel__reset_sample_bit(evsel, REGS_USER);
> @@ -1673,6 +1675,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
>  		evsel->core.attr.ksymbol = 0;
>  	if (perf_missing_features.bpf)
>  		evsel->core.attr.bpf_event = 0;
> +	if (perf_missing_features.branch_hw_idx)
> +		evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_HW_INDEX;
>  retry_sample_id:
>  	if (perf_missing_features.sample_id_all)
>  		evsel->core.attr.sample_id_all = 0;
> @@ -1784,7 +1788,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
>  	 * Must probe features in the order they were added to the
>  	 * perf_event_attr interface.
>  	 */
> -	if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
> +	if (!perf_missing_features.branch_hw_idx &&
> +	    (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
> +		perf_missing_features.branch_hw_idx = true;
> +		pr_debug2("switching off branch HW index support\n");
> +		goto fallback_missing_features;
> +	} else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
>  		perf_missing_features.aux_output = true;
>  		pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
>  		goto out_close;
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 99a0cb60c556..33804740e2ca 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -119,6 +119,7 @@ struct perf_missing_features {
>  	bool ksymbol;
>  	bool bpf;
>  	bool aux_output;
> +	bool branch_hw_idx;
>  };
>  
>  extern struct perf_missing_features perf_missing_features;
> diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
> index 651203126c71..355d3458d4e6 100644
> --- a/tools/perf/util/perf_event_attr_fprintf.c
> +++ b/tools/perf/util/perf_event_attr_fprintf.c
> @@ -50,6 +50,7 @@ static void __p_branch_sample_type(char *buf, size_t size, u64 value)
>  		bit_name(ABORT_TX), bit_name(IN_TX), bit_name(NO_TX),
>  		bit_name(COND), bit_name(CALL_STACK), bit_name(IND_JUMP),
>  		bit_name(CALL), bit_name(NO_FLAGS), bit_name(NO_CYCLES),
> +		bit_name(HW_INDEX),
>  		{ .name = NULL, }
>  	};
>  #undef bit_name
> -- 
> 2.17.1
> 

-- 

- Arnaldo


* Re: [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
  2020-03-05 20:25   ` Arnaldo Carvalho de Melo
@ 2020-03-05 21:02     ` Liang, Kan
  2020-03-05 23:17       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 31+ messages in thread
From: Liang, Kan @ 2020-03-05 21:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 3/5/2020 3:25 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Feb 28, 2020 at 08:30:01AM -0800, kan.liang@linux.intel.com escreveu:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced
>> in latest kernel.
>>
>> Enable HW_INDEX by default in LBR call stack mode.
>> If kernel doesn't support the sample type, switching it off.
>>
>> Add HW_INDEX in attr_fprintf as well. User can check whether the branch
>> sample type is set via debug information or header.
> 
> So while this works with a kernel where PERF_SAMPLE_BRANCH_HW_INDEX is
> present and we get, from the committer notes I was putting together
> while testing/applying this cset:
> 
> First collect some samples with LBR callchains, system wide, for a few
> seconds:
> 
>    # perf record --call-graph lbr -a sleep 5
>    [ perf record: Woken up 1 times to write data ]
>    [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
>    #
> 
> Now let's use 'perf evlist -v' to look at the branch_sample_type:
> 
>    # perf evlist -v
>    cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
>    #
> 
> So the machine has the kernel feature, and it was correctly added to
> perf_event_attr.branch_sample_type, for the default 'cycles' event.
> 
> Cool, and look at that 'attr.precise_ip: 3' part, the kernel is OK with
> having that together with attr.branch_sample_type with HW_INDEX set.
> 
> The problem happens when I go test this in an older kernel, where the
> kernel doesn't know about HW_INDEX: we get it disabled, but precise_ip
> is walked down to zero during its detection, even though at the end we
> get it back to 3, as expected. That got me a bit confused. I'll
> investigate this a bit more to try to avoid these extra probes for the
> max precise level, which fail in older kernels because
> branch_sample_type has HW_INDEX set :-\

This looks like expected behavior in the current perf tool for an event
configured with the maximum precise level.

The related commits are:
commit 4e8a5c155137 ("perf evsel: Fix max perf_event_attr.precise_ip
detection")
commit cd136189370c ("perf evsel: Do not rely on errno values for
precise_ip fallback")

Before handling any standard fallback (not just HW_INDEX), the perf tool
tries every precise_ip value first.
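
As a rough illustration of that ordering, here is a minimal sketch, not
the actual evsel.c code. Every sketch_* name is hypothetical, the
SKETCH_BRANCH_HW_INDEX value is only a stand-in for the real flag, and
the "old kernel" is a mock that simply rejects any attr carrying that
bit. The open path first walks precise_ip down to zero, restores it,
and only then drops HW_INDEX and retries:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for PERF_SAMPLE_BRANCH_HW_INDEX (assumption). */
#define SKETCH_BRANCH_HW_INDEX (1ULL << 17)

/* Hypothetical, heavily reduced version of perf_event_attr. */
struct sketch_attr {
	unsigned int precise_ip;
	uint64_t branch_sample_type;
};

/* Mock open callback: returns 0 on success or a negative errno. */
typedef int (*sketch_open_fn)(const struct sketch_attr *attr);

struct sketch_result {
	int err;	/* 0 on success, negative errno otherwise */
	int attempts;	/* number of open attempts made */
};

/* Step 1: try every precise_ip value, from the requested one down to 0. */
int sketch_try_precise(struct sketch_attr *attr, sketch_open_fn open_fn,
		       int *attempts)
{
	unsigned int requested = attr->precise_ip;
	int err;

	for (;;) {
		(*attempts)++;
		err = open_fn(attr);
		if (!err || attr->precise_ip == 0)
			break;
		attr->precise_ip--;
	}
	if (err)
		attr->precise_ip = requested;	/* restore before the feature fallbacks */
	return err;
}

/* Step 2: only after step 1 fails, switch off HW_INDEX and start over. */
struct sketch_result sketch_open(struct sketch_attr *attr, sketch_open_fn open_fn)
{
	struct sketch_result res = { 0, 0 };
	bool missing_hw_idx = false;

retry:
	if (missing_hw_idx)
		attr->branch_sample_type &= ~SKETCH_BRANCH_HW_INDEX;

	res.err = sketch_try_precise(attr, open_fn, &res.attempts);
	if (res.err && !missing_hw_idx &&
	    (attr->branch_sample_type & SKETCH_BRANCH_HW_INDEX)) {
		missing_hw_idx = true;
		goto retry;
	}
	return res;
}

/* Mock of a kernel that does not know about HW_INDEX. */
int sketch_old_kernel(const struct sketch_attr *attr)
{
	if (attr->branch_sample_type & SKETCH_BRANCH_HW_INDEX)
		return -EINVAL;
	return attr->precise_ip <= 3 ? 0 : -EOPNOTSUPP;
}
```

Calling sketch_open() with precise_ip = 3 and the HW_INDEX bit set
against sketch_old_kernel() makes five open attempts under this mock:
four failing probes (precise_ip 3, 2, 1, 0) and one successful open
with precise_ip back at 3 and HW_INDEX cleared, which is the same shape
as the five perf_event_attr dumps in the -vv output.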

Thanks,
Kan



* Re: [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
  2020-03-05 21:02     ` Liang, Kan
@ 2020-03-05 23:17       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-05 23:17 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Arnaldo Carvalho de Melo, jolsa, peterz, mingo, linux-kernel,
	namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak

Em Thu, Mar 05, 2020 at 04:02:10PM -0500, Liang, Kan escreveu:
> 
> 
> On 3/5/2020 3:25 PM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Feb 28, 2020 at 08:30:01AM -0800, kan.liang@linux.intel.com escreveu:
> > > From: Kan Liang <kan.liang@linux.intel.com>
> > > 
> > > A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced
> > > in latest kernel.
> > > 
> > > Enable HW_INDEX by default in LBR call stack mode.
> > > If kernel doesn't support the sample type, switching it off.
> > > 
> > > Add HW_INDEX in attr_fprintf as well. User can check whether the branch
> > > sample type is set via debug information or header.
> > 
> > So while this works with a kernel where PERF_SAMPLE_BRANCH_HW_INDEX is
> > present and we get, from the committer notes I was putting together
> > while testing/applying this cset:
> > 
> > First collect some samples with LBR callchains, system wide, for a few
> > seconds:
> > 
> >    # perf record --call-graph lbr -a sleep 5
> >    [ perf record: Woken up 1 times to write data ]
> >    [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
> >    #
> > 
> > Now let's use 'perf evlist -v' to look at the branch_sample_type:
> > 
> >    # perf evlist -v
> >    cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
> >    #
> > 
> > So the machine has the kernel feature, and it was correctly added to
> > perf_event_attr.branch_sample_type, for the default 'cycles' event.
> > 
> > Cool, and look at that 'attr.precise_ip: 3' part, the kernel is OK with
> > having that together with attr.branch_sample_type with HW_INDEX set.
> > 
> > The problem happens when I go test this in an older kernel, where the
> > kernel doesn't know about HW_INDEX: we get it disabled, but precise_ip
> > is walked down to zero during its detection, even though at the end we
> > get it back to 3, as expected. That got me a bit confused. I'll
> > investigate this a bit more to try to avoid these extra probes for the
> > max precise level, which fail in older kernels because
> > branch_sample_type has HW_INDEX set :-\
> 
> This looks like expected behavior in the current perf tool for an event
> configured with the maximum precise level.
> 
> The related commits are:
> commit 4e8a5c155137 ("perf evsel: Fix max perf_event_attr.precise_ip
> detection")
> commit cd136189370c ("perf evsel: Do not rely on errno values for
> precise_ip fallback")
> 
> Before handling any standard fallback (not just HW_INDEX), the perf tool
> tries every precise_ip value first.

Right, and since it uses HW_INDEX and the kernel doesn't support that,
the precise_ip max value detection will fail. It will go back to
whatever it was and try again later; then the fallback will remove
HW_INDEX and it will work.

What I said is that if we don't set branch_sample_type when detecting
the max precise_ip, then that detection will work and we will not
needlessly go on testing with precise_ip = 3, 2, 1, 0.

My question was more about whether HW_INDEX (or any other
branch_sample_type) has to be tested in conjunction with precise_ip,
which I think isn't the case from what I saw in my tests.
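
To make the idea concrete, here is a minimal sketch of probing the max
precise_ip on a copy of the attr with branch_sample_type left out. It is
not a patch against evsel.c: all probe_* names are hypothetical, the
PROBE_BRANCH_HW_INDEX value is only a stand-in for the real flag, the
"old kernel" is a mock, and it assumes precise_ip and branch_sample_type
really can be validated independently, which is what my tests suggest
but is not confirmed here:

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-in for PERF_SAMPLE_BRANCH_HW_INDEX (assumption). */
#define PROBE_BRANCH_HW_INDEX (1ULL << 17)

/* Hypothetical, heavily reduced version of perf_event_attr. */
struct probe_attr {
	unsigned int precise_ip;
	uint64_t branch_sample_type;
};

/* Mock open callback: returns 0 on success or a negative errno. */
typedef int (*probe_open_fn)(const struct probe_attr *attr);

/*
 * Detect the max precise_ip on a copy that carries no branch_sample_type,
 * so an unsupported branch flag cannot make the detection fail.
 * Returns the number of open attempts used.
 */
int probe_max_precise_ip(struct probe_attr *attr, probe_open_fn open_fn)
{
	struct probe_attr probe = *attr;
	int attempts = 0;

	probe.branch_sample_type = 0;
	for (;;) {
		attempts++;
		if (open_fn(&probe) == 0 || probe.precise_ip == 0)
			break;
		probe.precise_ip--;
	}
	attr->precise_ip = probe.precise_ip;
	return attempts;
}

/* Open with the detected precise_ip; drop HW_INDEX once if that fails. */
int probe_open(struct probe_attr *attr, probe_open_fn open_fn, int *attempts)
{
	int err;

	*attempts = probe_max_precise_ip(attr, open_fn);

	(*attempts)++;
	err = open_fn(attr);
	if (err && (attr->branch_sample_type & PROBE_BRANCH_HW_INDEX)) {
		attr->branch_sample_type &= ~PROBE_BRANCH_HW_INDEX;
		(*attempts)++;
		err = open_fn(attr);
	}
	return err;
}

/* Mock of a kernel that does not know about HW_INDEX. */
int probe_old_kernel(const struct probe_attr *attr)
{
	if (attr->branch_sample_type & PROBE_BRANCH_HW_INDEX)
		return -EINVAL;
	return attr->precise_ip <= 3 ? 0 : -EOPNOTSUPP;
}
```

With precise_ip = 3 and HW_INDEX set, probe_open() against
probe_old_kernel() needs three attempts under this mock: one successful
precise_ip probe, one failed open with HW_INDEX, and one successful open
without it. The walk through precise_ip = 2, 1, 0 no longer happens.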

- Arnaldo
 
> Thanks,
> Kan
> 
> > 
> > 
> > # perf record -vv --call-graph lbr -a sleep 5
> > <SNIP>
> > ------------------------------------------------------------
> > perf_event_attr:
> >    size                             120
> >    { sample_period, sample_freq }   4000
> >    sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
> >    read_format                      ID
> >    disabled                         1
> >    inherit                          1
> >    mmap                             1
> >    comm                             1
> >    freq                             1
> >    task                             1
> >    precise_ip                       3
> >    sample_id_all                    1
> >    exclude_guest                    1
> >    mmap2                            1
> >    comm_exec                        1
> >    ksymbol                          1
> >    bpf_event                        1
> >    branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -95
> > decreasing precise_ip by one (2)
> > ------------------------------------------------------------
> > perf_event_attr:
> >    size                             120
> >    { sample_period, sample_freq }   4000
> >    sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
> >    read_format                      ID
> >    disabled                         1
> >    inherit                          1
> >    mmap                             1
> >    comm                             1
> >    freq                             1
> >    task                             1
> >    precise_ip                       2
> >    sample_id_all                    1
> >    exclude_guest                    1
> >    mmap2                            1
> >    comm_exec                        1
> >    ksymbol                          1
> >    bpf_event                        1
> >    branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -95
> > decreasing precise_ip by one (1)
> > ------------------------------------------------------------
> > perf_event_attr:
> >    size                             120
> >    { sample_period, sample_freq }   4000
> >    sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
> >    read_format                      ID
> >    disabled                         1
> >    inherit                          1
> >    mmap                             1
> >    comm                             1
> >    freq                             1
> >    task                             1
> >    precise_ip                       1
> >    sample_id_all                    1
> >    exclude_guest                    1
> >    mmap2                            1
> >    comm_exec                        1
> >    ksymbol                          1
> >    bpf_event                        1
> >    branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -95
> > decreasing precise_ip by one (0)
> > ------------------------------------------------------------
> > perf_event_attr:
> >    size                             120
> >    { sample_period, sample_freq }   4000
> >    sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
> >    read_format                      ID
> >    disabled                         1
> >    inherit                          1
> >    mmap                             1
> >    comm                             1
> >    freq                             1
> >    task                             1
> >    sample_id_all                    1
> >    exclude_guest                    1
> >    mmap2                            1
> >    comm_exec                        1
> >    ksymbol                          1
> >    bpf_event                        1
> >    branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off branch HW index support
> > ------------------------------------------------------------
> > perf_event_attr:
> >    size                             120
> >    { sample_period, sample_freq }   4000
> >    sample_type                      IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK
> >    read_format                      ID
> >    disabled                         1
> >    inherit                          1
> >    mmap                             1
> >    comm                             1
> >    freq                             1
> >    task                             1
> >    precise_ip                       3
> >    sample_id_all                    1
> >    exclude_guest                    1
> >    mmap2                            1
> >    comm_exec                        1
> >    ksymbol                          1
> >    bpf_event                        1
> >    branch_sample_type               USER|CALL_STACK|NO_FLAGS|NO_CYCLES
> > ------------------------------------------------------------
> > 
> > 
> > > Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> > > ---
> > >   tools/perf/util/evsel.c                   | 15 ++++++++++++---
> > >   tools/perf/util/evsel.h                   |  1 +
> > >   tools/perf/util/perf_event_attr_fprintf.c |  1 +
> > >   3 files changed, 14 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > index 05883a45de5b..816d930d774e 100644
> > > --- a/tools/perf/util/evsel.c
> > > +++ b/tools/perf/util/evsel.c
> > > @@ -712,7 +712,8 @@ static void __perf_evsel__config_callchain(struct evsel *evsel,
> > >   				attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
> > >   							PERF_SAMPLE_BRANCH_CALL_STACK |
> > >   							PERF_SAMPLE_BRANCH_NO_CYCLES |
> > > -							PERF_SAMPLE_BRANCH_NO_FLAGS;
> > > +							PERF_SAMPLE_BRANCH_NO_FLAGS |
> > > +							PERF_SAMPLE_BRANCH_HW_INDEX;
> > >   			}
> > >   		} else
> > >   			 pr_warning("Cannot use LBR callstack with branch stack. "
> > > @@ -763,7 +764,8 @@ perf_evsel__reset_callgraph(struct evsel *evsel,
> > >   	if (param->record_mode == CALLCHAIN_LBR) {
> > >   		perf_evsel__reset_sample_bit(evsel, BRANCH_STACK);
> > >   		attr->branch_sample_type &= ~(PERF_SAMPLE_BRANCH_USER |
> > > -					      PERF_SAMPLE_BRANCH_CALL_STACK);
> > > +					      PERF_SAMPLE_BRANCH_CALL_STACK |
> > > +					      PERF_SAMPLE_BRANCH_HW_INDEX);
> > >   	}
> > >   	if (param->record_mode == CALLCHAIN_DWARF) {
> > >   		perf_evsel__reset_sample_bit(evsel, REGS_USER);
> > > @@ -1673,6 +1675,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > >   		evsel->core.attr.ksymbol = 0;
> > >   	if (perf_missing_features.bpf)
> > >   		evsel->core.attr.bpf_event = 0;
> > > +	if (perf_missing_features.branch_hw_idx)
> > > +		evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_HW_INDEX;
> > >   retry_sample_id:
> > >   	if (perf_missing_features.sample_id_all)
> > >   		evsel->core.attr.sample_id_all = 0;
> > > @@ -1784,7 +1788,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > >   	 * Must probe features in the order they were added to the
> > >   	 * perf_event_attr interface.
> > >   	 */
> > > -	if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
> > > +	if (!perf_missing_features.branch_hw_idx &&
> > > +	    (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
> > > +		perf_missing_features.branch_hw_idx = true;
> > > +		pr_debug2("switching off branch HW index support\n");
> > > +		goto fallback_missing_features;
> > > +	} else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
> > >   		perf_missing_features.aux_output = true;
> > >   		pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
> > >   		goto out_close;
> > > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> > > index 99a0cb60c556..33804740e2ca 100644
> > > --- a/tools/perf/util/evsel.h
> > > +++ b/tools/perf/util/evsel.h
> > > @@ -119,6 +119,7 @@ struct perf_missing_features {
> > >   	bool ksymbol;
> > >   	bool bpf;
> > >   	bool aux_output;
> > > +	bool branch_hw_idx;
> > >   };
> > >   extern struct perf_missing_features perf_missing_features;
> > > diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
> > > index 651203126c71..355d3458d4e6 100644
> > > --- a/tools/perf/util/perf_event_attr_fprintf.c
> > > +++ b/tools/perf/util/perf_event_attr_fprintf.c
> > > @@ -50,6 +50,7 @@ static void __p_branch_sample_type(char *buf, size_t size, u64 value)
> > >   		bit_name(ABORT_TX), bit_name(IN_TX), bit_name(NO_TX),
> > >   		bit_name(COND), bit_name(CALL_STACK), bit_name(IND_JUMP),
> > >   		bit_name(CALL), bit_name(NO_FLAGS), bit_name(NO_CYCLES),
> > > +		bit_name(HW_INDEX),
> > >   		{ .name = NULL, }
> > >   	};
> > >   #undef bit_name
> > > -- 
> > > 2.17.1
> > > 
> > 
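
The fallback in the evsel__open_cpu() hunk above follows the usual
perf_missing_features pattern: try the open, and if it fails while the
HW index bit is set, remember that the kernel lacks the feature, clear
the bit, and retry. A minimal sketch of that control flow follows. It
assumes a hypothetical sketch_open() in place of the real
perf_event_open() call, and a made-up SKETCH_BRANCH_OTHER bit standing
in for the rest of branch_sample_type; only the 1U << 17 value comes
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the patch: 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT, with the shift at 17. */
#define SKETCH_BRANCH_HW_INDEX	(1U << 17)
/* Hypothetical stand-in for any other branch_sample_type bit. */
#define SKETCH_BRANCH_OTHER	(1U << 0)
/* Hypothetical error marker returned when no fallback is left. */
#define SKETCH_OPEN_FAILED	UINT64_MAX

/* Models the global perf_missing_features: once set, it stays set. */
static struct {
	bool branch_hw_idx;
} sketch_missing;

/* Hypothetical stand-in for perf_event_open(): an old kernel rejects the new bit. */
static int sketch_open(uint64_t branch_sample_type, bool kernel_knows_hw_idx)
{
	if ((branch_sample_type & SKETCH_BRANCH_HW_INDEX) && !kernel_knows_hw_idx)
		return -1;
	return 0;
}

/* Returns the branch_sample_type that was finally accepted. */
static uint64_t sketch_negotiate(uint64_t requested, bool kernel_knows_hw_idx)
{
	uint64_t type = requested;

	for (;;) {
		/* Same role as the fallback_missing_features label. */
		if (sketch_missing.branch_hw_idx)
			type &= ~(uint64_t)SKETCH_BRANCH_HW_INDEX;

		if (sketch_open(type, kernel_knows_hw_idx) == 0)
			return type;

		/* First failure with the bit set: switch it off and retry. */
		if (!sketch_missing.branch_hw_idx && (type & SKETCH_BRANCH_HW_INDEX)) {
			sketch_missing.branch_hw_idx = true;
			continue;
		}
		return SKETCH_OPEN_FAILED;
	}
}
```

Because the flag is global, every later event opened by the same
process skips the bit without probing again.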

-- 

- Arnaldo


* Re: [PATCH 00/12] Stitch LBR call stack (Perf Tools)
  2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (12 preceding siblings ...)
  2020-03-04 13:33 ` [PATCH 00/12] Stitch LBR call stack (Perf Tools) Arnaldo Carvalho de Melo
@ 2020-03-06  9:39 ` Jiri Olsa
  2020-03-06 19:13   ` Liang, Kan
  13 siblings, 1 reply; 31+ messages in thread
From: Jiri Olsa @ 2020-03-06  9:39 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Fri, Feb 28, 2020 at 08:29:59AM -0800, kan.liang@linux.intel.com wrote:

SNIP

> Kan Liang (12):
>   perf tools: Add hw_idx in struct branch_stack
>   perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
>   perf header: Add check for event attr
>   perf pmu: Add support for PMU capabilities

Hi,
I'm getting a compile error:

	util/pmu.c: In function ‘perf_pmu__caps_parse’:
	util/pmu.c:1620:32: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
	 1620 |   snprintf(path, PATH_MAX, "%s/%s", caps_path, name);
	      |                                ^~
	In file included from /usr/include/stdio.h:867,
			 from util/pmu.c:12:
	/usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096
	   67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
	      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	   68 |        __bos (__s), __fmt, __va_arg_pack ());
	      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	cc1: all warnings being treated as errors

	[jolsa@krava perf]$ gcc --version
	gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)

jirka



* Re: [PATCH 00/12] Stitch LBR call stack (Perf Tools)
  2020-03-06  9:39 ` Jiri Olsa
@ 2020-03-06 19:13   ` Liang, Kan
  2020-03-06 20:06     ` Arnaldo Carvalho de Melo
  2020-03-09 13:27     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 31+ messages in thread
From: Liang, Kan @ 2020-03-06 19:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 3/6/2020 4:39 AM, Jiri Olsa wrote:
> On Fri, Feb 28, 2020 at 08:29:59AM -0800, kan.liang@linux.intel.com wrote:
> 
> SNIP
> 
>> Kan Liang (12):
>>    perf tools: Add hw_idx in struct branch_stack
>>    perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
>>    perf header: Add check for event attr
>>    perf pmu: Add support for PMU capabilities
> 
> hi,
> I'm getting compile error:
> 
> 	util/pmu.c: In function ‘perf_pmu__caps_parse’:
> 	util/pmu.c:1620:32: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
> 	 1620 |   snprintf(path, PATH_MAX, "%s/%s", caps_path, name);
> 	      |                                ^~
> 	In file included from /usr/include/stdio.h:867,
> 			 from util/pmu.c:12:
> 	/usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096
> 	   67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> 	      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	   68 |        __bos (__s), __fmt, __va_arg_pack ());
> 	      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	cc1: all warnings being treated as errors
> 
> 	[jolsa@krava perf]$ gcc --version
> 	gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)

My GCC version is too old. I will send V2 later to fix the error.
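
For reference, the warning fires because caps_path can already be up to
PATH_MAX - 1 bytes, so appending "/" and a name of up to 255 bytes may
not fit in a PATH_MAX buffer. One common way to handle this class of
warning is to check the snprintf() return value and treat truncation as
an error. The sketch below shows that approach; sketch_join_path() is a
hypothetical helper, and this is not necessarily what V2 will do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "dir/name" into buf.
 * Returns 0 on success, -1 if the result did not fit or snprintf() failed.
 */
static int sketch_join_path(char *buf, size_t size, const char *dir, const char *name)
{
	/* snprintf() returns the length it wanted to write, excluding the NUL. */
	int n = snprintf(buf, size, "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= size)
		return -1;	/* truncated: do not use buf as a path */
	return 0;
}
```

As far as I know, GCC's default -Wformat-truncation level only
complains when the return value is unused, so the check both documents
the intent and avoids opening a truncated path.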

Thanks,
Kan

> 
> jirka
> 


* Re: [PATCH 00/12] Stitch LBR call stack (Perf Tools)
  2020-03-06 19:13   ` Liang, Kan
@ 2020-03-06 20:06     ` Arnaldo Carvalho de Melo
  2020-03-09 13:27     ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-06 20:06 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Jiri Olsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

Em Fri, Mar 06, 2020 at 02:13:15PM -0500, Liang, Kan escreveu:
> 
> 
> On 3/6/2020 4:39 AM, Jiri Olsa wrote:
> > On Fri, Feb 28, 2020 at 08:29:59AM -0800, kan.liang@linux.intel.com wrote:
> > 
> > SNIP
> > 
> > > Kan Liang (12):
> > >    perf tools: Add hw_idx in struct branch_stack
> > >    perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
> > >    perf header: Add check for event attr
> > >    perf pmu: Add support for PMU capabilities
> > 
> > hi,
> > I'm getting compile error:
> > 
> > 	util/pmu.c: In function ‘perf_pmu__caps_parse’:
> > 	util/pmu.c:1620:32: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
> > 	 1620 |   snprintf(path, PATH_MAX, "%s/%s", caps_path, name);
> > 	      |                                ^~
> > 	In file included from /usr/include/stdio.h:867,
> > 			 from util/pmu.c:12:
> > 	/usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096
> > 	   67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> > 	      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 	   68 |        __bos (__s), __fmt, __va_arg_pack ());
> > 	      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 	cc1: all warnings being treated as errors
> > 
> > 	[jolsa@krava perf]$ gcc --version
> > 	gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
> 
> My GCC version is too old. I will send V2 later to fix the error.

So, Jiri asked me to push my perf/core, which I did; please refresh from
there.

Right now it is at:

[acme@quaco perf]$ git log --oneline -10
c45c91f6161c (HEAD -> perf/core, seventh/perf/core, acme/perf/core, acme.korg/perf/core) perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX
1fa65c5092da perf tools: Add hw_idx in struct branch_stack
6339998d22ec tools headers UAPI: Update tools's copy of linux/perf_event.h
401d61cbd4d4 tools lib traceevent: Remove extra '\n' in print_event_time()
76ce02651dab libperf: Add counting example
dabce16bd292 perf annotate: Get rid of annotation->nr_jumps
357a5d24c471 perf llvm: Add debug hint message about missing kernel-devel package
1af62ce61cd8 perf stat: Show percore counts in per CPU output
7982a8985150 tools lib api fs: Move cgroupsfs_find_mountpoint()
d46eec8e975a Merge remote-tracking branch 'acme/perf/urgent' into perf/core
[acme@quaco perf]$

Repository:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git

Branch:

perf/core

Thanks,

- Arnaldo


* Re: [PATCH 00/12] Stitch LBR call stack (Perf Tools)
  2020-03-06 19:13   ` Liang, Kan
  2020-03-06 20:06     ` Arnaldo Carvalho de Melo
@ 2020-03-09 13:27     ` Arnaldo Carvalho de Melo
  2020-03-09 13:42       ` Liang, Kan
  1 sibling, 1 reply; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-09 13:27 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Jiri Olsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

Em Fri, Mar 06, 2020 at 02:13:15PM -0500, Liang, Kan escreveu:
> 
> 
> On 3/6/2020 4:39 AM, Jiri Olsa wrote:
> > On Fri, Feb 28, 2020 at 08:29:59AM -0800, kan.liang@linux.intel.com wrote:
> > 
> > SNIP
> > 
> > > Kan Liang (12):
> > >    perf tools: Add hw_idx in struct branch_stack
> > >    perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
> > >    perf header: Add check for event attr
> > >    perf pmu: Add support for PMU capabilities
> > 
> > hi,
> > I'm getting compile error:
> > 
> > 	util/pmu.c: In function ‘perf_pmu__caps_parse’:
> > 	util/pmu.c:1620:32: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
> > 	 1620 |   snprintf(path, PATH_MAX, "%s/%s", caps_path, name);
> > 	      |                                ^~
> > 	In file included from /usr/include/stdio.h:867,
> > 			 from util/pmu.c:12:
> > 	/usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096
> > 	   67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> > 	      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 	   68 |        __bos (__s), __fmt, __va_arg_pack ());
> > 	      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 	cc1: all warnings being treated as errors
> > 
> > 	[jolsa@krava perf]$ gcc --version
> > 	gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
> 
> My GCC version is too old. I will send V2 later to fix the error.

So I stopped at the patch just before the one introducing this problem,
i.e. now I have:

[acme@seventh perf]$ git log --oneline -10
5100c2b77049 (HEAD -> perf/core, five/perf/core, acme/perf/core) perf header: Add check for unexpected use of reserved membrs in event attr
1d2fc2bd7c1c perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX
1fa65c5092da perf tools: Add hw_idx in struct branch_stack
6339998d22ec tools headers UAPI: Update tools's copy of linux/perf_event.h
401d61cbd4d4 tools lib traceevent: Remove extra '\n' in print_event_time()
76ce02651dab libperf: Add counting example
dabce16bd292 perf annotate: Get rid of annotation->nr_jumps
357a5d24c471 perf llvm: Add debug hint message about missing kernel-devel package
1af62ce61cd8 perf stat: Show percore counts in per CPU output
7982a8985150 tools lib api fs: Move cgroupsfs_find_mountpoint()
[acme@seventh perf]$

Please continue from there, I'll process some other patchsets,

Thanks,

- Arnaldo


* Re: [PATCH 00/12] Stitch LBR call stack (Perf Tools)
  2020-03-09 13:27     ` Arnaldo Carvalho de Melo
@ 2020-03-09 13:42       ` Liang, Kan
  0 siblings, 0 replies; 31+ messages in thread
From: Liang, Kan @ 2020-03-09 13:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 3/9/2020 9:27 AM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 06, 2020 at 02:13:15PM -0500, Liang, Kan escreveu:
>>
>>
>> On 3/6/2020 4:39 AM, Jiri Olsa wrote:
>>> On Fri, Feb 28, 2020 at 08:29:59AM -0800, kan.liang@linux.intel.com wrote:
>>>
>>> SNIP
>>>
>>>> Kan Liang (12):
>>>>     perf tools: Add hw_idx in struct branch_stack
>>>>     perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX
>>>>     perf header: Add check for event attr
>>>>     perf pmu: Add support for PMU capabilities
>>>
>>> hi,
>>> I'm getting compile error:
>>>
>>> 	util/pmu.c: In function ‘perf_pmu__caps_parse’:
>>> 	util/pmu.c:1620:32: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
>>> 	 1620 |   snprintf(path, PATH_MAX, "%s/%s", caps_path, name);
>>> 	      |                                ^~
>>> 	In file included from /usr/include/stdio.h:867,
>>> 			 from util/pmu.c:12:
>>> 	/usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096
>>> 	   67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
>>> 	      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> 	   68 |        __bos (__s), __fmt, __va_arg_pack ());
>>> 	      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> 	cc1: all warnings being treated as errors
>>>
>>> 	[jolsa@krava perf]$ gcc --version
>>> 	gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
>>
>> My GCC version is too old. I will send V2 later to fix the error.
> 
> So I stopped at the patch just before the one introducing this problem,
> i.e. now I have:
> 
> [acme@seventh perf]$ git log --oneline -10
> 5100c2b77049 (HEAD -> perf/core, five/perf/core, acme/perf/core) perf header: Add check for unexpected use of reserved membrs in event attr
> 1d2fc2bd7c1c perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX
> 1fa65c5092da perf tools: Add hw_idx in struct branch_stack
> 6339998d22ec tools headers UAPI: Update tools's copy of linux/perf_event.h
> 401d61cbd4d4 tools lib traceevent: Remove extra '\n' in print_event_time()
> 76ce02651dab libperf: Add counting example
> dabce16bd292 perf annotate: Get rid of annotation->nr_jumps
> 357a5d24c471 perf llvm: Add debug hint message about missing kernel-devel package
> 1af62ce61cd8 perf stat: Show percore counts in per CPU output
> 7982a8985150 tools lib api fs: Move cgroupsfs_find_mountpoint()
> [acme@seventh perf]$
> 
> Please continue from there, I'll process some other patchsets,
> 

Sure. I will re-base my patchset on top of it.

Thanks,
Kan


* Re: [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack
  2020-02-28 16:30 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack kan.liang
  2020-03-04 13:49   ` Arnaldo Carvalho de Melo
@ 2020-03-10  0:42   ` Arnaldo Carvalho de Melo
  2020-03-10 12:53     ` Liang, Kan
  2020-03-19 14:10   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2 siblings, 1 reply; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-10  0:42 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

Em Fri, Feb 28, 2020 at 08:30:00AM -0800, kan.liang@linux.intel.com escreveu:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The low level index of raw branch records for the most recent branch can
> be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
> branch_sample_type. Extend struct branch_stack to support it.
> 
> However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
> entries[] will be output by kernel. The pointer of entries[] could be
> wrong, since the output format is different with new struct branch_stack.
> Add a variable no_hw_idx in struct perf_sample to indicate whether the
> hw_idx is output.
> Add get_branch_entry() to return corresponding pointer of entries[0].
> 
> To make dummy branch sample consistent as new branch sample, add hw_idx
> in struct dummy_branch_stack for cs-etm and intel-pt.
> 
> Apply the new struct branch_stack for synthetic events as well.
> 
> Extend test case sample-parsing to support new struct branch_stack.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/include/uapi/linux/perf_event.h         |  8 ++-
>  tools/perf/builtin-script.c                   | 70 ++++++++++---------
>  tools/perf/tests/sample-parsing.c             |  7 +-
>  tools/perf/util/branch.h                      | 22 ++++++
>  tools/perf/util/cs-etm.c                      |  1 +
>  tools/perf/util/event.h                       |  1 +
>  tools/perf/util/evsel.c                       |  5 ++
>  tools/perf/util/evsel.h                       |  5 ++
>  tools/perf/util/hist.c                        |  3 +-
>  tools/perf/util/intel-pt.c                    |  2 +
>  tools/perf/util/machine.c                     | 35 +++++-----
>  .../scripting-engines/trace-event-python.c    | 30 ++++----
>  tools/perf/util/session.c                     |  8 ++-
>  tools/perf/util/synthetic-events.c            |  6 +-
>  14 files changed, 131 insertions(+), 72 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index 377d794d3105..397cfd65b3fe 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -181,6 +181,8 @@ enum perf_branch_sample_type_shift {
>  
>  	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
>  
> +	PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT	= 17, /* save low level index of raw branch records */
> +
>  	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>  };
>  
> @@ -208,6 +210,8 @@ enum perf_branch_sample_type {
>  	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
>  		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>  
> +	PERF_SAMPLE_BRANCH_HW_INDEX	= 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT,
> +
>  	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>  };
>  
> @@ -853,7 +857,9 @@ enum perf_event_type {
>  	 *	  char                  data[size];}&& PERF_SAMPLE_RAW
>  	 *
>  	 *	{ u64                   nr;
> -	 *        { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK
> +	 *	  { u64	hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> +	 *        { u64 from, to, flags } lbr[nr];
> +	 *      } && PERF_SAMPLE_BRANCH_STACK
>  	 *
>  	 * 	{ u64			abi; # enum perf_sample_regs_abi
>  	 * 	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index e2406b291c1c..acf3107bbda2 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -735,6 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>  					struct perf_event_attr *attr, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct addr_location alf, alt;
>  	u64 i, from, to;
>  	int printed = 0;
> @@ -743,8 +744,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>  		return 0;
>  
>  	for (i = 0; i < br->nr; i++) {
> -		from = br->entries[i].from;
> -		to   = br->entries[i].to;
> +		from = entries[i].from;
> +		to   = entries[i].to;
>  
>  		if (PRINT_FIELD(DSO)) {
>  			memset(&alf, 0, sizeof(alf));
> @@ -768,10 +769,10 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>  		}
>  
>  		printed += fprintf(fp, "/%c/%c/%c/%d ",
> -			mispred_str( br->entries + i),
> -			br->entries[i].flags.in_tx? 'X' : '-',
> -			br->entries[i].flags.abort? 'A' : '-',
> -			br->entries[i].flags.cycles);
> +			mispred_str(entries + i),
> +			entries[i].flags.in_tx ? 'X' : '-',
> +			entries[i].flags.abort ? 'A' : '-',
> +			entries[i].flags.cycles);
>  	}
>  
>  	return printed;
> @@ -782,6 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>  					   struct perf_event_attr *attr, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct addr_location alf, alt;
>  	u64 i, from, to;
>  	int printed = 0;
> @@ -793,8 +795,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>  
>  		memset(&alf, 0, sizeof(alf));
>  		memset(&alt, 0, sizeof(alt));
> -		from = br->entries[i].from;
> -		to   = br->entries[i].to;
> +		from = entries[i].from;
> +		to   = entries[i].to;
>  
>  		thread__find_symbol_fb(thread, sample->cpumode, from, &alf);
>  		thread__find_symbol_fb(thread, sample->cpumode, to, &alt);
> @@ -813,10 +815,10 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>  			printed += fprintf(fp, ")");
>  		}
>  		printed += fprintf(fp, "/%c/%c/%c/%d ",
> -			mispred_str( br->entries + i),
> -			br->entries[i].flags.in_tx? 'X' : '-',
> -			br->entries[i].flags.abort? 'A' : '-',
> -			br->entries[i].flags.cycles);
> +			mispred_str(entries + i),
> +			entries[i].flags.in_tx ? 'X' : '-',
> +			entries[i].flags.abort ? 'A' : '-',
> +			entries[i].flags.cycles);
>  	}
>  
>  	return printed;
> @@ -827,6 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>  					   struct perf_event_attr *attr, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct addr_location alf, alt;
>  	u64 i, from, to;
>  	int printed = 0;
> @@ -838,8 +841,8 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>  
>  		memset(&alf, 0, sizeof(alf));
>  		memset(&alt, 0, sizeof(alt));
> -		from = br->entries[i].from;
> -		to   = br->entries[i].to;
> +		from = entries[i].from;
> +		to   = entries[i].to;
>  
>  		if (thread__find_map_fb(thread, sample->cpumode, from, &alf) &&
>  		    !alf.map->dso->adjust_symbols)
> @@ -862,10 +865,10 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>  			printed += fprintf(fp, ")");
>  		}
>  		printed += fprintf(fp, "/%c/%c/%c/%d ",
> -			mispred_str(br->entries + i),
> -			br->entries[i].flags.in_tx ? 'X' : '-',
> -			br->entries[i].flags.abort ? 'A' : '-',
> -			br->entries[i].flags.cycles);
> +			mispred_str(entries + i),
> +			entries[i].flags.in_tx ? 'X' : '-',
> +			entries[i].flags.abort ? 'A' : '-',
> +			entries[i].flags.cycles);
>  	}
>  
>  	return printed;
> @@ -1053,6 +1056,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  					    struct machine *machine, FILE *fp)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	u64 start, end;
>  	int i, insn, len, nr, ilen, printed = 0;
>  	struct perf_insn x;
> @@ -1073,31 +1077,31 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  	printed += fprintf(fp, "%c", '\n');
>  
>  	/* Handle first from jump, of which we don't know the entry. */
> -	len = grab_bb(buffer, br->entries[nr-1].from,
> -			br->entries[nr-1].from,
> +	len = grab_bb(buffer, entries[nr-1].from,
> +			entries[nr-1].from,
>  			machine, thread, &x.is64bit, &x.cpumode, false);
>  	if (len > 0) {
> -		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
> +		printed += ip__fprintf_sym(entries[nr - 1].from, thread,
>  					   x.cpumode, x.cpu, &lastsym, attr, fp);
> -		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
> +		printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1],
>  					    &x, buffer, len, 0, fp, &total_cycles);
>  		if (PRINT_FIELD(SRCCODE))
> -			printed += print_srccode(thread, x.cpumode, br->entries[nr - 1].from);
> +			printed += print_srccode(thread, x.cpumode, entries[nr - 1].from);
>  	}
>  
>  	/* Print all blocks */
>  	for (i = nr - 2; i >= 0; i--) {
> -		if (br->entries[i].from || br->entries[i].to)
> +		if (entries[i].from || entries[i].to)
>  			pr_debug("%d: %" PRIx64 "-%" PRIx64 "\n", i,
> -				 br->entries[i].from,
> -				 br->entries[i].to);
> -		start = br->entries[i + 1].to;
> -		end   = br->entries[i].from;
> +				 entries[i].from,
> +				 entries[i].to);
> +		start = entries[i + 1].to;
> +		end   = entries[i].from;
>  
>  		len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
>  		/* Patch up missing kernel transfers due to ring filters */
>  		if (len == -ENXIO && i > 0) {
> -			end = br->entries[--i].from;
> +			end = entries[--i].from;
>  			pr_debug("\tpatching up to %" PRIx64 "-%" PRIx64 "\n", start, end);
>  			len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
>  		}
> @@ -1110,7 +1114,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  
>  			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
>  			if (ip == end) {
> -				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, ++insn, fp,
> +				printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp,
>  							    &total_cycles);
>  				if (PRINT_FIELD(SRCCODE))
>  					printed += print_srccode(thread, x.cpumode, ip);
> @@ -1134,9 +1138,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  	 * Hit the branch? In this case we are already done, and the target
>  	 * has not been executed yet.
>  	 */
> -	if (br->entries[0].from == sample->ip)
> +	if (entries[0].from == sample->ip)
>  		goto out;
> -	if (br->entries[0].flags.abort)
> +	if (entries[0].flags.abort)
>  		goto out;
>  
>  	/*
> @@ -1147,7 +1151,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>  	 * between final branch and sample. When this happens just
>  	 * continue walking after the last TO until we hit a branch.
>  	 */
> -	start = br->entries[0].to;
> +	start = entries[0].to;
>  	end = sample->ip;
>  	if (end < start) {
>  		/* Missing jump. Scan 128 bytes for the next branch */
> diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
> index 2762e1155238..14239e472187 100644
> --- a/tools/perf/tests/sample-parsing.c
> +++ b/tools/perf/tests/sample-parsing.c
> @@ -99,6 +99,7 @@ static bool samples_same(const struct perf_sample *s1,
>  
>  	if (type & PERF_SAMPLE_BRANCH_STACK) {
>  		COMP(branch_stack->nr);
> +		COMP(branch_stack->hw_idx);
>  		for (i = 0; i < s1->branch_stack->nr; i++)
>  			MCOMP(branch_stack->entries[i]);
>  	}
> @@ -186,7 +187,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>  		u64 data[64];
>  	} branch_stack = {
>  		/* 1 branch_entry */
> -		.data = {1, 211, 212, 213},
> +		.data = {1, -1ULL, 211, 212, 213},
>  	};
>  	u64 regs[64];
>  	const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
> @@ -208,6 +209,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>  		.transaction	= 112,
>  		.raw_data	= (void *)raw_data,
>  		.callchain	= &callchain.callchain,
> +		.no_hw_idx      = false,
>  		.branch_stack	= &branch_stack.branch_stack,
>  		.user_regs	= {
>  			.abi	= PERF_SAMPLE_REGS_ABI_64,
> @@ -244,6 +246,9 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>  	if (sample_type & PERF_SAMPLE_REGS_INTR)
>  		evsel.core.attr.sample_regs_intr = sample_regs;
>  
> +	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
> +		evsel.core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
> +
>  	for (i = 0; i < sizeof(regs); i++)
>  		*(i + (u8 *)regs) = i & 0xfe;
>  
> diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
> index 88e00d268f6f..7fc9fa0dc361 100644
> --- a/tools/perf/util/branch.h
> +++ b/tools/perf/util/branch.h
> @@ -12,6 +12,7 @@
>  #include <linux/stddef.h>
>  #include <linux/perf_event.h>
>  #include <linux/types.h>
> +#include "event.h"
>  
>  struct branch_flags {
>  	u64 mispred:1;
> @@ -39,9 +40,30 @@ struct branch_entry {
>  
>  struct branch_stack {
>  	u64			nr;
> +	u64			hw_idx;
>  	struct branch_entry	entries[0];
>  };
>  
> +/*
> + * The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied.
> + * Otherwise, the output format of a sample with branch stack is
> + * struct branch_stack {
> + *	u64			nr;
> + *	struct branch_entry	entries[0];
> + * }
> + * Check whether the hw_idx is available,
> + * and return the corresponding pointer of entries[0].
> + */
> +inline struct branch_entry *get_branch_entry(struct perf_sample *sample)
> +{
> +	u64 *entry = (u64 *)sample->branch_stack;
> +
> +	entry++;
> +	if (sample->no_hw_idx)
> +		return (struct branch_entry *)entry;
> +	return (struct branch_entry *)(++entry);
> +}
> +
>  struct branch_type_stat {
>  	bool	branch_to;
>  	u64	counts[PERF_BR_MAX];
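
The pointer arithmetic in get_branch_entry() above comes down to
skipping one u64 (nr) or two (nr and hw_idx) before the first entry. A
minimal sketch of that layout, assuming a simplified three-word
sketch_entry in place of struct branch_entry and a has_hw_idx flag that
is the inverse of sample->no_hw_idx; the sample payloads reuse the
values from the sample-parsing test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct branch_entry: three u64 words per record. */
struct sketch_entry {
	uint64_t from, to, flags;
};

/* Payloads as the kernel would emit them, with and without hw_idx. */
static const uint64_t sketch_with_idx[]    = { 1, UINT64_MAX, 211, 212, 213 };
static const uint64_t sketch_without_idx[] = { 1, 211, 212, 213 };

/*
 * Copy out entry i from a raw branch-stack payload.
 * Word 0 is nr; word 1 is hw_idx only when has_hw_idx is true.
 */
static struct sketch_entry sketch_entry_at(const uint64_t *payload,
					   bool has_hw_idx, uint64_t i)
{
	const uint64_t *first = payload + (has_hw_idx ? 2 : 1);
	struct sketch_entry e;

	memcpy(&e, first + 3 * i, sizeof(e));
	return e;
}

/* Bytes following the nr word, as in the sz computation in the evsel.c hunk. */
static size_t sketch_bytes_after_nr(uint64_t nr, bool has_hw_idx)
{
	size_t sz = nr * sizeof(struct sketch_entry);

	if (has_hw_idx)
		sz += sizeof(uint64_t);
	return sz;
}
```

Reading sketch_with_idx with has_hw_idx set to false would return the
hw_idx word as entries[0].from, which is the kind of misparse the
no_hw_idx flag is there to prevent.
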
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 5471045ebf5c..e697fe1c67b3 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1202,6 +1202,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>  	if (etm->synth_opts.last_branch) {
>  		dummy_bs = (struct dummy_branch_stack){
>  			.nr = 1,
> +			.hw_idx = -1ULL,
>  			.entries = {
>  				.from = sample.ip,
>  				.to = sample.addr,

This one breaks the build when cross building to arm64:

  CC       /tmp/build/perf/util/cs-etm.o
  CC       /tmp/build/perf/util/parse-branch-options.o
  CC       /tmp/build/perf/util/dump-insn.o
util/cs-etm.c: In function 'cs_etm__synth_branch_sample':
util/cs-etm.c:1205:5: error: 'struct dummy_branch_stack' has no member named 'hw_idx'
    .hw_idx = -1ULL,
     ^~~~~~
util/cs-etm.c:1203:14: error: missing braces around initializer [-Werror=missing-braces]
   dummy_bs = (struct dummy_branch_stack){
              ^
util/cs-etm.c:1205:14:
    .hw_idx = -1ULL,
              {    }
util/cs-etm.c:1206:15: error: initialized field overwritten [-Werror=override-init]
    .entries = {
               ^
util/cs-etm.c:1206:15: note: (near initialization for '(anonymous).entries')
util/cs-etm.c:1203:14: error: missing braces around initializer [-Werror=missing-braces]
   dummy_bs = (struct dummy_branch_stack){
              ^
util/cs-etm.c:1205:14:
    .hw_idx = -1ULL,
              {    }
cc1: all warnings being treated as errors
mv: cannot stat '/tmp/build/perf/util/.cs-etm.o.tmp': No such file or directory

As that is 'struct dummy_branch_stack', not 'struct branch_stack', where you
added that hw_idx, please check the logic. I'm adding the following quick fix
and restarting the builds overnight; please check if this is the right thing to
do.

- Arnaldo

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index e697fe1c67b3..b3b3fe3ea345 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1172,6 +1172,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	union perf_event *event = tidq->event_buf;
 	struct dummy_branch_stack {
 		u64			nr;
+		u64			hw_idx;
 		struct branch_entry	entries;
 	} dummy_bs;
 	u64 ip;
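
To spell out why the local struct needs the new member: the designated
initializer names .hw_idx, so the type must declare it, and the dummy
is only useful if its single entry sits at the same offset as
entries[0] in the patched struct branch_stack. A minimal sketch of
that, assuming simplified sketch_* types in place of the perf ones and
a hypothetical sketch_make_dummy() helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct branch_entry. */
struct sketch_entry {
	uint64_t from, to, flags;
};

/* Mirrors the patched struct branch_stack: nr, hw_idx, then the entries. */
struct sketch_branch_stack {
	uint64_t		nr;
	uint64_t		hw_idx;
	struct sketch_entry	entries[];
};

/* One-entry dummy, with hw_idx added as in the quick fix. */
struct sketch_dummy_branch_stack {
	uint64_t		nr;
	uint64_t		hw_idx;
	struct sketch_entry	entries;
};

/* True when the dummy's entry lines up with entries[0] of the real struct. */
static bool sketch_layouts_match(void)
{
	return offsetof(struct sketch_dummy_branch_stack, entries) ==
	       offsetof(struct sketch_branch_stack, entries);
}

/* Hypothetical helper: build the dummy the way the cs-etm hunk does. */
static struct sketch_dummy_branch_stack sketch_make_dummy(uint64_t ip, uint64_t addr)
{
	return (struct sketch_dummy_branch_stack){
		.nr = 1,
		.hw_idx = UINT64_MAX,	/* same value as -1ULL in the patch */
		.entries = {
			.from = ip,
			.to = addr,
		},
	};
}
```

Without the hw_idx member the initializer does not compile, which is
the first error in the arm64 log above.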

> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 85223159737c..3cda40a2fafc 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -139,6 +139,7 @@ struct perf_sample {
>  	u16 insn_len;
>  	u8  cpumode;
>  	u16 misc;
> +	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
>  	char insn[MAX_INSN];
>  	void *raw_data;
>  	struct ip_callchain *callchain;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index c8dc4450884c..05883a45de5b 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2169,7 +2169,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  
>  		if (data->branch_stack->nr > max_branch_nr)
>  			return -EFAULT;
> +
>  		sz = data->branch_stack->nr * sizeof(struct branch_entry);
> +		if (perf_evsel__has_branch_hw_idx(evsel))
> +			sz += sizeof(u64);
> +		else
> +			data->no_hw_idx = true;
>  		OVERFLOW_CHECK(array, sz, max_size);
>  		array = (void *)array + sz;
>  	}
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index dc14f4a823cd..99a0cb60c556 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -389,6 +389,11 @@ static inline bool perf_evsel__has_branch_callstack(const struct evsel *evsel)
>  	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
>  }
>  
> +static inline bool perf_evsel__has_branch_hw_idx(const struct evsel *evsel)
> +{
> +	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX;
> +}
> +
>  static inline bool evsel__has_callchain(const struct evsel *evsel)
>  {
>  	return (evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN) != 0;
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index ca5a8f4d007e..808ca27bd5cf 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -2584,9 +2584,10 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
>  			  u64 *total_cycles)
>  {
>  	struct branch_info *bi;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  
>  	/* If we have branch cycles always annotate them. */
> -	if (bs && bs->nr && bs->entries[0].flags.cycles) {
> +	if (bs && bs->nr && entries[0].flags.cycles) {
>  		int i;
>  
>  		bi = sample__resolve_bstack(sample, al);
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 33cf8928cf05..23c8289c2472 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -1295,6 +1295,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
>  	struct perf_sample sample = { .ip = 0, };
>  	struct dummy_branch_stack {
>  		u64			nr;
> +		u64			hw_idx;
>  		struct branch_entry	entries;
>  	} dummy_bs;
>  
> @@ -1316,6 +1317,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
>  	if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) {
>  		dummy_bs = (struct dummy_branch_stack){
>  			.nr = 1,
> +			.hw_idx = -1ULL,
>  			.entries = {
>  				.from = sample.ip,
>  				.to = sample.addr,
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index c8c5410315e8..62522b76a924 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2083,15 +2083,16 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
>  {
>  	unsigned int i;
>  	const struct branch_stack *bs = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
>  
>  	if (!bi)
>  		return NULL;
>  
>  	for (i = 0; i < bs->nr; i++) {
> -		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
> -		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
> -		bi[i].flags = bs->entries[i].flags;
> +		ip__resolve_ams(al->thread, &bi[i].to, entries[i].to);
> +		ip__resolve_ams(al->thread, &bi[i].from, entries[i].from);
> +		bi[i].flags = entries[i].flags;
>  	}
>  	return bi;
>  }
> @@ -2187,6 +2188,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>  	/* LBR only affects the user callchain */
>  	if (i != chain_nr) {
>  		struct branch_stack *lbr_stack = sample->branch_stack;
> +		struct branch_entry *entries = get_branch_entry(sample);
>  		int lbr_nr = lbr_stack->nr, j, k;
>  		bool branch;
>  		struct branch_flags *flags;
> @@ -2212,31 +2214,29 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>  					ip = chain->ips[j];
>  				else if (j > i + 1) {
>  					k = j - i - 2;
> -					ip = lbr_stack->entries[k].from;
> +					ip = entries[k].from;
>  					branch = true;
> -					flags = &lbr_stack->entries[k].flags;
> +					flags = &entries[k].flags;
>  				} else {
> -					ip = lbr_stack->entries[0].to;
> +					ip = entries[0].to;
>  					branch = true;
> -					flags = &lbr_stack->entries[0].flags;
> -					branch_from =
> -						lbr_stack->entries[0].from;
> +					flags = &entries[0].flags;
> +					branch_from = entries[0].from;
>  				}
>  			} else {
>  				if (j < lbr_nr) {
>  					k = lbr_nr - j - 1;
> -					ip = lbr_stack->entries[k].from;
> +					ip = entries[k].from;
>  					branch = true;
> -					flags = &lbr_stack->entries[k].flags;
> +					flags = &entries[k].flags;
>  				}
>  				else if (j > lbr_nr)
>  					ip = chain->ips[i + 1 - (j - lbr_nr)];
>  				else {
> -					ip = lbr_stack->entries[0].to;
> +					ip = entries[0].to;
>  					branch = true;
> -					flags = &lbr_stack->entries[0].flags;
> -					branch_from =
> -						lbr_stack->entries[0].from;
> +					flags = &entries[0].flags;
> +					branch_from = entries[0].from;
>  				}
>  			}
>  
> @@ -2283,6 +2283,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  					    int max_stack)
>  {
>  	struct branch_stack *branch = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	struct ip_callchain *chain = sample->callchain;
>  	int chain_nr = 0;
>  	u8 cpumode = PERF_RECORD_MISC_USER;
> @@ -2330,7 +2331,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  
>  		for (i = 0; i < nr; i++) {
>  			if (callchain_param.order == ORDER_CALLEE) {
> -				be[i] = branch->entries[i];
> +				be[i] = entries[i];
>  
>  				if (chain == NULL)
>  					continue;
> @@ -2349,7 +2350,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  				    be[i].from >= chain->ips[first_call] - 8)
>  					first_call++;
>  			} else
> -				be[i] = branch->entries[branch->nr - i - 1];
> +				be[i] = entries[branch->nr - i - 1];
>  		}
>  
>  		memset(iter, 0, sizeof(struct iterations) * nr);
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 80ca5d0ab7fe..02b6c87c5abe 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -464,6 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>  					struct thread *thread)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	PyObject *pylist;
>  	u64 i;
>  
> @@ -484,28 +485,28 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>  			Py_FatalError("couldn't create Python dictionary");
>  
>  		pydict_set_item_string_decref(pyelem, "from",
> -		    PyLong_FromUnsignedLongLong(br->entries[i].from));
> +		    PyLong_FromUnsignedLongLong(entries[i].from));
>  		pydict_set_item_string_decref(pyelem, "to",
> -		    PyLong_FromUnsignedLongLong(br->entries[i].to));
> +		    PyLong_FromUnsignedLongLong(entries[i].to));
>  		pydict_set_item_string_decref(pyelem, "mispred",
> -		    PyBool_FromLong(br->entries[i].flags.mispred));
> +		    PyBool_FromLong(entries[i].flags.mispred));
>  		pydict_set_item_string_decref(pyelem, "predicted",
> -		    PyBool_FromLong(br->entries[i].flags.predicted));
> +		    PyBool_FromLong(entries[i].flags.predicted));
>  		pydict_set_item_string_decref(pyelem, "in_tx",
> -		    PyBool_FromLong(br->entries[i].flags.in_tx));
> +		    PyBool_FromLong(entries[i].flags.in_tx));
>  		pydict_set_item_string_decref(pyelem, "abort",
> -		    PyBool_FromLong(br->entries[i].flags.abort));
> +		    PyBool_FromLong(entries[i].flags.abort));
>  		pydict_set_item_string_decref(pyelem, "cycles",
> -		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
> +		    PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
>  
>  		thread__find_map_fb(thread, sample->cpumode,
> -				    br->entries[i].from, &al);
> +				    entries[i].from, &al);
>  		dsoname = get_dsoname(al.map);
>  		pydict_set_item_string_decref(pyelem, "from_dsoname",
>  					      _PyUnicode_FromString(dsoname));
>  
>  		thread__find_map_fb(thread, sample->cpumode,
> -				    br->entries[i].to, &al);
> +				    entries[i].to, &al);
>  		dsoname = get_dsoname(al.map);
>  		pydict_set_item_string_decref(pyelem, "to_dsoname",
>  					      _PyUnicode_FromString(dsoname));
> @@ -561,6 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>  					   struct thread *thread)
>  {
>  	struct branch_stack *br = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	PyObject *pylist;
>  	u64 i;
>  	char bf[512];
> @@ -581,22 +583,22 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>  			Py_FatalError("couldn't create Python dictionary");
>  
>  		thread__find_symbol_fb(thread, sample->cpumode,
> -				       br->entries[i].from, &al);
> +				       entries[i].from, &al);
>  		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "from",
>  					      _PyUnicode_FromString(bf));
>  
>  		thread__find_symbol_fb(thread, sample->cpumode,
> -				       br->entries[i].to, &al);
> +				       entries[i].to, &al);
>  		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "to",
>  					      _PyUnicode_FromString(bf));
>  
> -		get_br_mspred(&br->entries[i].flags, bf, sizeof(bf));
> +		get_br_mspred(&entries[i].flags, bf, sizeof(bf));
>  		pydict_set_item_string_decref(pyelem, "pred",
>  					      _PyUnicode_FromString(bf));
>  
> -		if (br->entries[i].flags.in_tx) {
> +		if (entries[i].flags.in_tx) {
>  			pydict_set_item_string_decref(pyelem, "in_tx",
>  					      _PyUnicode_FromString("X"));
>  		} else {
> @@ -604,7 +606,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>  					      _PyUnicode_FromString("-"));
>  		}
>  
> -		if (br->entries[i].flags.abort) {
> +		if (entries[i].flags.abort) {
>  			pydict_set_item_string_decref(pyelem, "abort",
>  					      _PyUnicode_FromString("A"));
>  		} else {
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index d0d7d25b23e3..dab985e3f136 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1007,6 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
>  {
>  	struct ip_callchain *callchain = sample->callchain;
>  	struct branch_stack *lbr_stack = sample->branch_stack;
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	u64 kernel_callchain_nr = callchain->nr;
>  	unsigned int i;
>  
> @@ -1043,10 +1044,10 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
>  			       i, callchain->ips[i]);
>  
>  		printf("..... %2d: %016" PRIx64 "\n",
> -		       (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
> +		       (int)(kernel_callchain_nr), entries[0].to);
>  		for (i = 0; i < lbr_stack->nr; i++)
>  			printf("..... %2d: %016" PRIx64 "\n",
> -			       (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
> +			       (int)(i + kernel_callchain_nr + 1), entries[i].from);
>  	}
>  }
>  
> @@ -1068,6 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
>  
>  static void branch_stack__printf(struct perf_sample *sample, bool callstack)
>  {
> +	struct branch_entry *entries = get_branch_entry(sample);
>  	uint64_t i;
>  
>  	printf("%s: nr:%" PRIu64 "\n",
> @@ -1075,7 +1077,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
>  		sample->branch_stack->nr);
>  
>  	for (i = 0; i < sample->branch_stack->nr; i++) {
> -		struct branch_entry *e = &sample->branch_stack->entries[i];
> +		struct branch_entry *e = &entries[i];
>  
>  		if (!callstack) {
>  			printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index c423298fe62d..dd3e6f43fb86 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1183,7 +1183,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
>  
>  	if (type & PERF_SAMPLE_BRANCH_STACK) {
>  		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
> -		sz += sizeof(u64);
> +		/* nr, hw_idx */
> +		sz += 2 * sizeof(u64);
>  		result += sz;
>  	}
>  
> @@ -1344,7 +1345,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>  
>  	if (type & PERF_SAMPLE_BRANCH_STACK) {
>  		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
> -		sz += sizeof(u64);
> +		/* nr, hw_idx */
> +		sz += 2 * sizeof(u64);
>  		memcpy(array, sample->branch_stack, sz);
>  		array = (void *)array + sz;
>  	}
> -- 
> 2.17.1
> 
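
The size arithmetic in the evsel.c and synthetic-events.c hunks can be sketched as follows. This assumes the parser has already stepped past the leading nr word before computing sz, and that a branch entry is three u64 words; branch_payload_size() and branch_synth_size() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Simplified stand-in for perf's struct branch_entry (assumption). */
struct branch_entry {
	u64 from;
	u64 to;
	u64 flags;
};

/*
 * Parse side: bytes that follow the nr word.
 * hw_idx adds one u64 only when PERF_SAMPLE_BRANCH_HW_INDEX was set.
 */
static size_t branch_payload_size(u64 nr, int has_hw_idx)
{
	size_t sz;

	/* Guard the multiplication, in the spirit of the max_branch_nr check. */
	assert(nr <= SIZE_MAX / sizeof(struct branch_entry));

	sz = nr * sizeof(struct branch_entry);
	if (has_hw_idx)
		sz += sizeof(u64);
	return sz;
}

/* Synthesize side: the patch always copies nr, hw_idx and the entries. */
static size_t branch_synth_size(u64 nr)
{
	return nr * sizeof(struct branch_entry) + 2 * sizeof(u64);
}
```

With one entry, the parse side covers 24 bytes without hw_idx and 32 bytes with it, while the synthesize side always writes 40 bytes.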

-- 

- Arnaldo


* Re: [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack
  2020-03-10  0:42   ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack Arnaldo Carvalho de Melo
@ 2020-03-10 12:53     ` Liang, Kan
  0 siblings, 0 replies; 31+ messages in thread
From: Liang, Kan @ 2020-03-10 12:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 3/9/2020 8:42 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Feb 28, 2020 at 08:30:00AM -0800, kan.liang@linux.intel.com escreveu:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The low level index of raw branch records for the most recent branch can
>> be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
>> branch_sample_type. Extend struct branch_stack to support it.
>>
>> However, if PERF_SAMPLE_BRANCH_HW_INDEX is not set, the kernel outputs
>> only nr and entries[]. A pointer to entries[] taken through the new
>> struct branch_stack would then be wrong, because the output format
>> differs from the new struct layout.
>> Add a variable no_hw_idx in struct perf_sample to indicate whether
>> hw_idx is output.
>> Add get_branch_entry() to return the correct pointer to entries[0].
>>
>> To keep the dummy branch sample consistent with the new branch sample
>> format, add hw_idx to struct dummy_branch_stack for cs-etm and intel-pt.
>>
>> Apply the new struct branch_stack to synthetic events as well.
>>
>> Extend the sample-parsing test case to support the new struct branch_stack.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>   tools/include/uapi/linux/perf_event.h         |  8 ++-
>>   tools/perf/builtin-script.c                   | 70 ++++++++++---------
>>   tools/perf/tests/sample-parsing.c             |  7 +-
>>   tools/perf/util/branch.h                      | 22 ++++++
>>   tools/perf/util/cs-etm.c                      |  1 +
>>   tools/perf/util/event.h                       |  1 +
>>   tools/perf/util/evsel.c                       |  5 ++
>>   tools/perf/util/evsel.h                       |  5 ++
>>   tools/perf/util/hist.c                        |  3 +-
>>   tools/perf/util/intel-pt.c                    |  2 +
>>   tools/perf/util/machine.c                     | 35 +++++-----
>>   .../scripting-engines/trace-event-python.c    | 30 ++++----
>>   tools/perf/util/session.c                     |  8 ++-
>>   tools/perf/util/synthetic-events.c            |  6 +-
>>   14 files changed, 131 insertions(+), 72 deletions(-)
>>
>> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
>> index 377d794d3105..397cfd65b3fe 100644
>> --- a/tools/include/uapi/linux/perf_event.h
>> +++ b/tools/include/uapi/linux/perf_event.h
>> @@ -181,6 +181,8 @@ enum perf_branch_sample_type_shift {
>>   
>>   	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
>>   
>> +	PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT	= 17, /* save low level index of raw branch records */
>> +
>>   	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>>   };
>>   
>> @@ -208,6 +210,8 @@ enum perf_branch_sample_type {
>>   	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
>>   		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>>   
>> +	PERF_SAMPLE_BRANCH_HW_INDEX	= 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT,
>> +
>>   	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>>   };
>>   
>> @@ -853,7 +857,9 @@ enum perf_event_type {
>>   	 *	  char                  data[size];}&& PERF_SAMPLE_RAW
>>   	 *
>>   	 *	{ u64                   nr;
>> -	 *        { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK
>> +	 *	  { u64	hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
>> +	 *        { u64 from, to, flags } lbr[nr];
>> +	 *      } && PERF_SAMPLE_BRANCH_STACK
>>   	 *
>>   	 * 	{ u64			abi; # enum perf_sample_regs_abi
>>   	 * 	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index e2406b291c1c..acf3107bbda2 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -735,6 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>>   					struct perf_event_attr *attr, FILE *fp)
>>   {
>>   	struct branch_stack *br = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	struct addr_location alf, alt;
>>   	u64 i, from, to;
>>   	int printed = 0;
>> @@ -743,8 +744,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>>   		return 0;
>>   
>>   	for (i = 0; i < br->nr; i++) {
>> -		from = br->entries[i].from;
>> -		to   = br->entries[i].to;
>> +		from = entries[i].from;
>> +		to   = entries[i].to;
>>   
>>   		if (PRINT_FIELD(DSO)) {
>>   			memset(&alf, 0, sizeof(alf));
>> @@ -768,10 +769,10 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
>>   		}
>>   
>>   		printed += fprintf(fp, "/%c/%c/%c/%d ",
>> -			mispred_str( br->entries + i),
>> -			br->entries[i].flags.in_tx? 'X' : '-',
>> -			br->entries[i].flags.abort? 'A' : '-',
>> -			br->entries[i].flags.cycles);
>> +			mispred_str(entries + i),
>> +			entries[i].flags.in_tx ? 'X' : '-',
>> +			entries[i].flags.abort ? 'A' : '-',
>> +			entries[i].flags.cycles);
>>   	}
>>   
>>   	return printed;
>> @@ -782,6 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>>   					   struct perf_event_attr *attr, FILE *fp)
>>   {
>>   	struct branch_stack *br = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	struct addr_location alf, alt;
>>   	u64 i, from, to;
>>   	int printed = 0;
>> @@ -793,8 +795,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>>   
>>   		memset(&alf, 0, sizeof(alf));
>>   		memset(&alt, 0, sizeof(alt));
>> -		from = br->entries[i].from;
>> -		to   = br->entries[i].to;
>> +		from = entries[i].from;
>> +		to   = entries[i].to;
>>   
>>   		thread__find_symbol_fb(thread, sample->cpumode, from, &alf);
>>   		thread__find_symbol_fb(thread, sample->cpumode, to, &alt);
>> @@ -813,10 +815,10 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
>>   			printed += fprintf(fp, ")");
>>   		}
>>   		printed += fprintf(fp, "/%c/%c/%c/%d ",
>> -			mispred_str( br->entries + i),
>> -			br->entries[i].flags.in_tx? 'X' : '-',
>> -			br->entries[i].flags.abort? 'A' : '-',
>> -			br->entries[i].flags.cycles);
>> +			mispred_str(entries + i),
>> +			entries[i].flags.in_tx ? 'X' : '-',
>> +			entries[i].flags.abort ? 'A' : '-',
>> +			entries[i].flags.cycles);
>>   	}
>>   
>>   	return printed;
>> @@ -827,6 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>>   					   struct perf_event_attr *attr, FILE *fp)
>>   {
>>   	struct branch_stack *br = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	struct addr_location alf, alt;
>>   	u64 i, from, to;
>>   	int printed = 0;
>> @@ -838,8 +841,8 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>>   
>>   		memset(&alf, 0, sizeof(alf));
>>   		memset(&alt, 0, sizeof(alt));
>> -		from = br->entries[i].from;
>> -		to   = br->entries[i].to;
>> +		from = entries[i].from;
>> +		to   = entries[i].to;
>>   
>>   		if (thread__find_map_fb(thread, sample->cpumode, from, &alf) &&
>>   		    !alf.map->dso->adjust_symbols)
>> @@ -862,10 +865,10 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
>>   			printed += fprintf(fp, ")");
>>   		}
>>   		printed += fprintf(fp, "/%c/%c/%c/%d ",
>> -			mispred_str(br->entries + i),
>> -			br->entries[i].flags.in_tx ? 'X' : '-',
>> -			br->entries[i].flags.abort ? 'A' : '-',
>> -			br->entries[i].flags.cycles);
>> +			mispred_str(entries + i),
>> +			entries[i].flags.in_tx ? 'X' : '-',
>> +			entries[i].flags.abort ? 'A' : '-',
>> +			entries[i].flags.cycles);
>>   	}
>>   
>>   	return printed;
>> @@ -1053,6 +1056,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>>   					    struct machine *machine, FILE *fp)
>>   {
>>   	struct branch_stack *br = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	u64 start, end;
>>   	int i, insn, len, nr, ilen, printed = 0;
>>   	struct perf_insn x;
>> @@ -1073,31 +1077,31 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>>   	printed += fprintf(fp, "%c", '\n');
>>   
>>   	/* Handle first from jump, of which we don't know the entry. */
>> -	len = grab_bb(buffer, br->entries[nr-1].from,
>> -			br->entries[nr-1].from,
>> +	len = grab_bb(buffer, entries[nr-1].from,
>> +			entries[nr-1].from,
>>   			machine, thread, &x.is64bit, &x.cpumode, false);
>>   	if (len > 0) {
>> -		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
>> +		printed += ip__fprintf_sym(entries[nr - 1].from, thread,
>>   					   x.cpumode, x.cpu, &lastsym, attr, fp);
>> -		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
>> +		printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1],
>>   					    &x, buffer, len, 0, fp, &total_cycles);
>>   		if (PRINT_FIELD(SRCCODE))
>> -			printed += print_srccode(thread, x.cpumode, br->entries[nr - 1].from);
>> +			printed += print_srccode(thread, x.cpumode, entries[nr - 1].from);
>>   	}
>>   
>>   	/* Print all blocks */
>>   	for (i = nr - 2; i >= 0; i--) {
>> -		if (br->entries[i].from || br->entries[i].to)
>> +		if (entries[i].from || entries[i].to)
>>   			pr_debug("%d: %" PRIx64 "-%" PRIx64 "\n", i,
>> -				 br->entries[i].from,
>> -				 br->entries[i].to);
>> -		start = br->entries[i + 1].to;
>> -		end   = br->entries[i].from;
>> +				 entries[i].from,
>> +				 entries[i].to);
>> +		start = entries[i + 1].to;
>> +		end   = entries[i].from;
>>   
>>   		len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
>>   		/* Patch up missing kernel transfers due to ring filters */
>>   		if (len == -ENXIO && i > 0) {
>> -			end = br->entries[--i].from;
>> +			end = entries[--i].from;
>>   			pr_debug("\tpatching up to %" PRIx64 "-%" PRIx64 "\n", start, end);
>>   			len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
>>   		}
>> @@ -1110,7 +1114,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>>   
>>   			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
>>   			if (ip == end) {
>> -				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, ++insn, fp,
>> +				printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp,
>>   							    &total_cycles);
>>   				if (PRINT_FIELD(SRCCODE))
>>   					printed += print_srccode(thread, x.cpumode, ip);
>> @@ -1134,9 +1138,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>>   	 * Hit the branch? In this case we are already done, and the target
>>   	 * has not been executed yet.
>>   	 */
>> -	if (br->entries[0].from == sample->ip)
>> +	if (entries[0].from == sample->ip)
>>   		goto out;
>> -	if (br->entries[0].flags.abort)
>> +	if (entries[0].flags.abort)
>>   		goto out;
>>   
>>   	/*
>> @@ -1147,7 +1151,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
>>   	 * between final branch and sample. When this happens just
>>   	 * continue walking after the last TO until we hit a branch.
>>   	 */
>> -	start = br->entries[0].to;
>> +	start = entries[0].to;
>>   	end = sample->ip;
>>   	if (end < start) {
>>   		/* Missing jump. Scan 128 bytes for the next branch */
>> diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
>> index 2762e1155238..14239e472187 100644
>> --- a/tools/perf/tests/sample-parsing.c
>> +++ b/tools/perf/tests/sample-parsing.c
>> @@ -99,6 +99,7 @@ static bool samples_same(const struct perf_sample *s1,
>>   
>>   	if (type & PERF_SAMPLE_BRANCH_STACK) {
>>   		COMP(branch_stack->nr);
>> +		COMP(branch_stack->hw_idx);
>>   		for (i = 0; i < s1->branch_stack->nr; i++)
>>   			MCOMP(branch_stack->entries[i]);
>>   	}
>> @@ -186,7 +187,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>>   		u64 data[64];
>>   	} branch_stack = {
>>   		/* 1 branch_entry */
>> -		.data = {1, 211, 212, 213},
>> +		.data = {1, -1ULL, 211, 212, 213},
>>   	};
>>   	u64 regs[64];
>>   	const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
>> @@ -208,6 +209,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>>   		.transaction	= 112,
>>   		.raw_data	= (void *)raw_data,
>>   		.callchain	= &callchain.callchain,
>> +		.no_hw_idx      = false,
>>   		.branch_stack	= &branch_stack.branch_stack,
>>   		.user_regs	= {
>>   			.abi	= PERF_SAMPLE_REGS_ABI_64,
>> @@ -244,6 +246,9 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>>   	if (sample_type & PERF_SAMPLE_REGS_INTR)
>>   		evsel.core.attr.sample_regs_intr = sample_regs;
>>   
>> +	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
>> +		evsel.core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
>> +
>>   	for (i = 0; i < sizeof(regs); i++)
>>   		*(i + (u8 *)regs) = i & 0xfe;
>>   
>> diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
>> index 88e00d268f6f..7fc9fa0dc361 100644
>> --- a/tools/perf/util/branch.h
>> +++ b/tools/perf/util/branch.h
>> @@ -12,6 +12,7 @@
>>   #include <linux/stddef.h>
>>   #include <linux/perf_event.h>
>>   #include <linux/types.h>
>> +#include "event.h"
>>   
>>   struct branch_flags {
>>   	u64 mispred:1;
>> @@ -39,9 +40,30 @@ struct branch_entry {
>>   
>>   struct branch_stack {
>>   	u64			nr;
>> +	u64			hw_idx;
>>   	struct branch_entry	entries[0];
>>   };
>>   
>> +/*
>> + * The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied.
>> + * Otherwise, the output format of a sample with branch stack is
>> + * struct branch_stack {
>> + *	u64			nr;
>> + *	struct branch_entry	entries[0];
>> + * }
>> + * Check whether the hw_idx is available,
>> + * and return the corresponding pointer of entries[0].
>> + */
>> +static inline struct branch_entry *get_branch_entry(struct perf_sample *sample)
>> +{
>> +	u64 *entry = (u64 *)sample->branch_stack;
>> +
>> +	entry++;
>> +	if (sample->no_hw_idx)
>> +		return (struct branch_entry *)entry;
>> +	return (struct branch_entry *)(++entry);
>> +}
>> +
>>   struct branch_type_stat {
>>   	bool	branch_to;
>>   	u64	counts[PERF_BR_MAX];
>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> index 5471045ebf5c..e697fe1c67b3 100644
>> --- a/tools/perf/util/cs-etm.c
>> +++ b/tools/perf/util/cs-etm.c
>> @@ -1202,6 +1202,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>>   	if (etm->synth_opts.last_branch) {
>>   		dummy_bs = (struct dummy_branch_stack){
>>   			.nr = 1,
>> +			.hw_idx = -1ULL,
>>   			.entries = {
>>   				.from = sample.ip,
>>   				.to = sample.addr,
> 
> This one breaks the build when cross building to arm64:
> 
>    CC       /tmp/build/perf/util/cs-etm.o
>    CC       /tmp/build/perf/util/parse-branch-options.o
>    CC       /tmp/build/perf/util/dump-insn.o
> util/cs-etm.c: In function 'cs_etm__synth_branch_sample':
> util/cs-etm.c:1205:5: error: 'struct dummy_branch_stack' has no member named 'hw_idx'
>      .hw_idx = -1ULL,
>       ^~~~~~
> util/cs-etm.c:1203:14: error: missing braces around initializer [-Werror=missing-braces]
>     dummy_bs = (struct dummy_branch_stack){
>                ^
> util/cs-etm.c:1205:14:
>      .hw_idx = -1ULL,
>                {    }
> util/cs-etm.c:1206:15: error: initialized field overwritten [-Werror=override-init]
>      .entries = {
>                 ^
> util/cs-etm.c:1206:15: note: (near initialization for '(anonymous).entries')
> util/cs-etm.c:1203:14: error: missing braces around initializer [-Werror=missing-braces]
>     dummy_bs = (struct dummy_branch_stack){
>                ^
> util/cs-etm.c:1205:14:
>      .hw_idx = -1ULL,
>                {    }
> cc1: all warnings being treated as errors
> mv: cannot stat '/tmp/build/perf/util/.cs-etm.o.tmp': No such file or directory
> 
> As that is 'struct dummy_branch_stack', not 'struct branch_stack', where you
> added that hw_idx, please check the logic. I'm adding the following quick fix
> and restarting the builds overnight; please check if this is the right thing
> to do.

Yes, it's correct. Thanks for the quick fix.

Thanks,
Kan
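
For readers following the thread, a minimal sketch of the pointer adjustment that get_branch_entry() performs in the quoted patch: step over nr, then over hw_idx only if it was recorded. It works on a raw u64 buffer rather than on struct perf_sample, and first_entry() is a hypothetical name; the sample values mirror the ones used in the sample-parsing test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Simplified stand-in for perf's struct branch_entry (assumption). */
struct branch_entry {
	u64 from;
	u64 to;
	u64 flags;
};

/*
 * raw points at the nr word of a branch stack record.
 * Old format:  nr, entries...
 * New format:  nr, hw_idx, entries...
 */
static const struct branch_entry *first_entry(const u64 *raw, bool no_hw_idx)
{
	assert(raw != NULL);

	raw++;			/* skip nr */
	if (!no_hw_idx)
		raw++;		/* skip hw_idx when it was recorded */
	return (const struct branch_entry *)raw;
}
```

Reading entries through the struct member instead would be off by one u64 for records written without PERF_SAMPLE_BRANCH_HW_INDEX, which is the case the no_hw_idx flag is meant to cover.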

> 
> - Arnaldo
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index e697fe1c67b3..b3b3fe3ea345 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1172,6 +1172,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>   	union perf_event *event = tidq->event_buf;
>   	struct dummy_branch_stack {
>   		u64			nr;
> +		u64			hw_idx;
>   		struct branch_entry	entries;
>   	} dummy_bs;
>   	u64 ip;
> 
>> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
>> index 85223159737c..3cda40a2fafc 100644
>> --- a/tools/perf/util/event.h
>> +++ b/tools/perf/util/event.h
>> @@ -139,6 +139,7 @@ struct perf_sample {
>>   	u16 insn_len;
>>   	u8  cpumode;
>>   	u16 misc;
>> +	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
>>   	char insn[MAX_INSN];
>>   	void *raw_data;
>>   	struct ip_callchain *callchain;
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index c8dc4450884c..05883a45de5b 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -2169,7 +2169,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>>   
>>   		if (data->branch_stack->nr > max_branch_nr)
>>   			return -EFAULT;
>> +
>>   		sz = data->branch_stack->nr * sizeof(struct branch_entry);
>> +		if (perf_evsel__has_branch_hw_idx(evsel))
>> +			sz += sizeof(u64);
>> +		else
>> +			data->no_hw_idx = true;
>>   		OVERFLOW_CHECK(array, sz, max_size);
>>   		array = (void *)array + sz;
>>   	}
>> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
>> index dc14f4a823cd..99a0cb60c556 100644
>> --- a/tools/perf/util/evsel.h
>> +++ b/tools/perf/util/evsel.h
>> @@ -389,6 +389,11 @@ static inline bool perf_evsel__has_branch_callstack(const struct evsel *evsel)
>>   	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
>>   }
>>   
>> +static inline bool perf_evsel__has_branch_hw_idx(const struct evsel *evsel)
>> +{
>> +	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX;
>> +}
>> +
>>   static inline bool evsel__has_callchain(const struct evsel *evsel)
>>   {
>>   	return (evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN) != 0;
>> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
>> index ca5a8f4d007e..808ca27bd5cf 100644
>> --- a/tools/perf/util/hist.c
>> +++ b/tools/perf/util/hist.c
>> @@ -2584,9 +2584,10 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
>>   			  u64 *total_cycles)
>>   {
>>   	struct branch_info *bi;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   
>>   	/* If we have branch cycles always annotate them. */
>> -	if (bs && bs->nr && bs->entries[0].flags.cycles) {
>> +	if (bs && bs->nr && entries[0].flags.cycles) {
>>   		int i;
>>   
>>   		bi = sample__resolve_bstack(sample, al);
>> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
>> index 33cf8928cf05..23c8289c2472 100644
>> --- a/tools/perf/util/intel-pt.c
>> +++ b/tools/perf/util/intel-pt.c
>> @@ -1295,6 +1295,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
>>   	struct perf_sample sample = { .ip = 0, };
>>   	struct dummy_branch_stack {
>>   		u64			nr;
>> +		u64			hw_idx;
>>   		struct branch_entry	entries;
>>   	} dummy_bs;
>>   
>> @@ -1316,6 +1317,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
>>   	if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) {
>>   		dummy_bs = (struct dummy_branch_stack){
>>   			.nr = 1,
>> +			.hw_idx = -1ULL,
>>   			.entries = {
>>   				.from = sample.ip,
>>   				.to = sample.addr,
>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>> index c8c5410315e8..62522b76a924 100644
>> --- a/tools/perf/util/machine.c
>> +++ b/tools/perf/util/machine.c
>> @@ -2083,15 +2083,16 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
>>   {
>>   	unsigned int i;
>>   	const struct branch_stack *bs = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
>>   
>>   	if (!bi)
>>   		return NULL;
>>   
>>   	for (i = 0; i < bs->nr; i++) {
>> -		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
>> -		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
>> -		bi[i].flags = bs->entries[i].flags;
>> +		ip__resolve_ams(al->thread, &bi[i].to, entries[i].to);
>> +		ip__resolve_ams(al->thread, &bi[i].from, entries[i].from);
>> +		bi[i].flags = entries[i].flags;
>>   	}
>>   	return bi;
>>   }
>> @@ -2187,6 +2188,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>>   	/* LBR only affects the user callchain */
>>   	if (i != chain_nr) {
>>   		struct branch_stack *lbr_stack = sample->branch_stack;
>> +		struct branch_entry *entries = get_branch_entry(sample);
>>   		int lbr_nr = lbr_stack->nr, j, k;
>>   		bool branch;
>>   		struct branch_flags *flags;
>> @@ -2212,31 +2214,29 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>>   					ip = chain->ips[j];
>>   				else if (j > i + 1) {
>>   					k = j - i - 2;
>> -					ip = lbr_stack->entries[k].from;
>> +					ip = entries[k].from;
>>   					branch = true;
>> -					flags = &lbr_stack->entries[k].flags;
>> +					flags = &entries[k].flags;
>>   				} else {
>> -					ip = lbr_stack->entries[0].to;
>> +					ip = entries[0].to;
>>   					branch = true;
>> -					flags = &lbr_stack->entries[0].flags;
>> -					branch_from =
>> -						lbr_stack->entries[0].from;
>> +					flags = &entries[0].flags;
>> +					branch_from = entries[0].from;
>>   				}
>>   			} else {
>>   				if (j < lbr_nr) {
>>   					k = lbr_nr - j - 1;
>> -					ip = lbr_stack->entries[k].from;
>> +					ip = entries[k].from;
>>   					branch = true;
>> -					flags = &lbr_stack->entries[k].flags;
>> +					flags = &entries[k].flags;
>>   				}
>>   				else if (j > lbr_nr)
>>   					ip = chain->ips[i + 1 - (j - lbr_nr)];
>>   				else {
>> -					ip = lbr_stack->entries[0].to;
>> +					ip = entries[0].to;
>>   					branch = true;
>> -					flags = &lbr_stack->entries[0].flags;
>> -					branch_from =
>> -						lbr_stack->entries[0].from;
>> +					flags = &entries[0].flags;
>> +					branch_from = entries[0].from;
>>   				}
>>   			}
>>   
>> @@ -2283,6 +2283,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>>   					    int max_stack)
>>   {
>>   	struct branch_stack *branch = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	struct ip_callchain *chain = sample->callchain;
>>   	int chain_nr = 0;
>>   	u8 cpumode = PERF_RECORD_MISC_USER;
>> @@ -2330,7 +2331,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>>   
>>   		for (i = 0; i < nr; i++) {
>>   			if (callchain_param.order == ORDER_CALLEE) {
>> -				be[i] = branch->entries[i];
>> +				be[i] = entries[i];
>>   
>>   				if (chain == NULL)
>>   					continue;
>> @@ -2349,7 +2350,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>>   				    be[i].from >= chain->ips[first_call] - 8)
>>   					first_call++;
>>   			} else
>> -				be[i] = branch->entries[branch->nr - i - 1];
>> +				be[i] = entries[branch->nr - i - 1];
>>   		}
>>   
>>   		memset(iter, 0, sizeof(struct iterations) * nr);
>> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
>> index 80ca5d0ab7fe..02b6c87c5abe 100644
>> --- a/tools/perf/util/scripting-engines/trace-event-python.c
>> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
>> @@ -464,6 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>>   					struct thread *thread)
>>   {
>>   	struct branch_stack *br = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	PyObject *pylist;
>>   	u64 i;
>>   
>> @@ -484,28 +485,28 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
>>   			Py_FatalError("couldn't create Python dictionary");
>>   
>>   		pydict_set_item_string_decref(pyelem, "from",
>> -		    PyLong_FromUnsignedLongLong(br->entries[i].from));
>> +		    PyLong_FromUnsignedLongLong(entries[i].from));
>>   		pydict_set_item_string_decref(pyelem, "to",
>> -		    PyLong_FromUnsignedLongLong(br->entries[i].to));
>> +		    PyLong_FromUnsignedLongLong(entries[i].to));
>>   		pydict_set_item_string_decref(pyelem, "mispred",
>> -		    PyBool_FromLong(br->entries[i].flags.mispred));
>> +		    PyBool_FromLong(entries[i].flags.mispred));
>>   		pydict_set_item_string_decref(pyelem, "predicted",
>> -		    PyBool_FromLong(br->entries[i].flags.predicted));
>> +		    PyBool_FromLong(entries[i].flags.predicted));
>>   		pydict_set_item_string_decref(pyelem, "in_tx",
>> -		    PyBool_FromLong(br->entries[i].flags.in_tx));
>> +		    PyBool_FromLong(entries[i].flags.in_tx));
>>   		pydict_set_item_string_decref(pyelem, "abort",
>> -		    PyBool_FromLong(br->entries[i].flags.abort));
>> +		    PyBool_FromLong(entries[i].flags.abort));
>>   		pydict_set_item_string_decref(pyelem, "cycles",
>> -		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
>> +		    PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
>>   
>>   		thread__find_map_fb(thread, sample->cpumode,
>> -				    br->entries[i].from, &al);
>> +				    entries[i].from, &al);
>>   		dsoname = get_dsoname(al.map);
>>   		pydict_set_item_string_decref(pyelem, "from_dsoname",
>>   					      _PyUnicode_FromString(dsoname));
>>   
>>   		thread__find_map_fb(thread, sample->cpumode,
>> -				    br->entries[i].to, &al);
>> +				    entries[i].to, &al);
>>   		dsoname = get_dsoname(al.map);
>>   		pydict_set_item_string_decref(pyelem, "to_dsoname",
>>   					      _PyUnicode_FromString(dsoname));
>> @@ -561,6 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>>   					   struct thread *thread)
>>   {
>>   	struct branch_stack *br = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	PyObject *pylist;
>>   	u64 i;
>>   	char bf[512];
>> @@ -581,22 +583,22 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>>   			Py_FatalError("couldn't create Python dictionary");
>>   
>>   		thread__find_symbol_fb(thread, sample->cpumode,
>> -				       br->entries[i].from, &al);
>> +				       entries[i].from, &al);
>>   		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>>   		pydict_set_item_string_decref(pyelem, "from",
>>   					      _PyUnicode_FromString(bf));
>>   
>>   		thread__find_symbol_fb(thread, sample->cpumode,
>> -				       br->entries[i].to, &al);
>> +				       entries[i].to, &al);
>>   		get_symoff(al.sym, &al, true, bf, sizeof(bf));
>>   		pydict_set_item_string_decref(pyelem, "to",
>>   					      _PyUnicode_FromString(bf));
>>   
>> -		get_br_mspred(&br->entries[i].flags, bf, sizeof(bf));
>> +		get_br_mspred(&entries[i].flags, bf, sizeof(bf));
>>   		pydict_set_item_string_decref(pyelem, "pred",
>>   					      _PyUnicode_FromString(bf));
>>   
>> -		if (br->entries[i].flags.in_tx) {
>> +		if (entries[i].flags.in_tx) {
>>   			pydict_set_item_string_decref(pyelem, "in_tx",
>>   					      _PyUnicode_FromString("X"));
>>   		} else {
>> @@ -604,7 +606,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
>>   					      _PyUnicode_FromString("-"));
>>   		}
>>   
>> -		if (br->entries[i].flags.abort) {
>> +		if (entries[i].flags.abort) {
>>   			pydict_set_item_string_decref(pyelem, "abort",
>>   					      _PyUnicode_FromString("A"));
>>   		} else {
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index d0d7d25b23e3..dab985e3f136 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -1007,6 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
>>   {
>>   	struct ip_callchain *callchain = sample->callchain;
>>   	struct branch_stack *lbr_stack = sample->branch_stack;
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	u64 kernel_callchain_nr = callchain->nr;
>>   	unsigned int i;
>>   
>> @@ -1043,10 +1044,10 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
>>   			       i, callchain->ips[i]);
>>   
>>   		printf("..... %2d: %016" PRIx64 "\n",
>> -		       (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
>> +		       (int)(kernel_callchain_nr), entries[0].to);
>>   		for (i = 0; i < lbr_stack->nr; i++)
>>   			printf("..... %2d: %016" PRIx64 "\n",
>> -			       (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
>> +			       (int)(i + kernel_callchain_nr + 1), entries[i].from);
>>   	}
>>   }
>>   
>> @@ -1068,6 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
>>   
>>   static void branch_stack__printf(struct perf_sample *sample, bool callstack)
>>   {
>> +	struct branch_entry *entries = get_branch_entry(sample);
>>   	uint64_t i;
>>   
>>   	printf("%s: nr:%" PRIu64 "\n",
>> @@ -1075,7 +1077,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
>>   		sample->branch_stack->nr);
>>   
>>   	for (i = 0; i < sample->branch_stack->nr; i++) {
>> -		struct branch_entry *e = &sample->branch_stack->entries[i];
>> +		struct branch_entry *e = &entries[i];
>>   
>>   		if (!callstack) {
>>   			printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
>> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
>> index c423298fe62d..dd3e6f43fb86 100644
>> --- a/tools/perf/util/synthetic-events.c
>> +++ b/tools/perf/util/synthetic-events.c
>> @@ -1183,7 +1183,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
>>   
>>   	if (type & PERF_SAMPLE_BRANCH_STACK) {
>>   		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
>> -		sz += sizeof(u64);
>> +		/* nr, hw_idx */
>> +		sz += 2 * sizeof(u64);
>>   		result += sz;
>>   	}
>>   
>> @@ -1344,7 +1345,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>>   
>>   	if (type & PERF_SAMPLE_BRANCH_STACK) {
>>   		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
>> -		sz += sizeof(u64);
>> +		/* nr, hw_idx */
>> +		sz += 2 * sizeof(u64);
>>   		memcpy(array, sample->branch_stack, sz);
>>   		array = (void *)array + sz;
>>   	}
>> -- 
>> 2.17.1
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip: perf/core] perf header: Add check for unexpected use of reserved members in event attr
  2020-02-28 16:30 ` [PATCH 03/12] perf header: Add check for event attr kan.liang
@ 2020-03-19 14:10   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-03-19 14:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Adrian Hunter, Alexey Budankov, Andi Kleen, Jiri Olsa,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     277ce1efa7b504873cd32a4106654836c2f80e1b
Gitweb:        https://git.kernel.org/tip/277ce1efa7b504873cd32a4106654836c2f80e1b
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 28 Feb 2020 08:30:02 -08:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Mon, 09 Mar 2020 21:43:24 -03:00

perf header: Add check for unexpected use of reserved members in event attr

The perf.data file may be generated by a newer version of the perf tool,
which supports new input bits in attr, e.g. a new bit for
branch_sample_type.

The perf.data file may later be parsed by an older version of the perf
tool. The old perf tool may parse the perf.data file incorrectly, and no
warning message is printed in this case.

The current perf header code never checks for unknown input bits in attr.

When reading the event desc from the header, check the stored event attr.
The reserved bits, sample type, read format and branch sample type are
checked.
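
The checks rely on each *_MAX value being a single bit one position
above the highest flag this build knows about, so ~(MAX - 1) selects
exactly the unknown bits. A minimal sketch of that mask test follows;
the DEMO_* flags and has_unknown_bits() are hypothetical names used only
for illustration, not perf identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag set: three known bits, MAX is the first unknown bit. */
enum demo_flags {
	DEMO_FLAG_A   = 1U << 0,
	DEMO_FLAG_B   = 1U << 1,
	DEMO_FLAG_C   = 1U << 2,
	DEMO_FLAG_MAX = 1U << 3,	/* non-ABI: one past the last known bit */
};

/*
 * max must be a power of two. Then (max - 1) is a mask of all known
 * bits and ~(max - 1) is a mask of every bit this build does not know.
 */
static bool has_unknown_bits(uint64_t value, uint64_t max)
{
	assert(max != 0 && (max & (max - 1)) == 0);
	return (value & ~(max - 1)) != 0;
}
```

A reader built against an older header has a smaller MAX, so a bit that
is valid for the writer shows up as unknown for the reader, which is the
situation the warning is meant to catch.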

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lkml.kernel.org/r/20200228163011.19358-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/header.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 4246e74..acbd046 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1590,6 +1590,40 @@ static void free_event_desc(struct evsel *events)
 	free(events);
 }
 
+static bool perf_attr_check(struct perf_event_attr *attr)
+{
+	if (attr->__reserved_1 || attr->__reserved_2 || attr->__reserved_3) {
+		pr_warning("Reserved bits are set unexpectedly. "
+			   "Please update perf tool.\n");
+		return false;
+	}
+
+	if (attr->sample_type & ~(PERF_SAMPLE_MAX-1)) {
+		pr_warning("Unknown sample type (0x%llx) is detected. "
+			   "Please update perf tool.\n",
+			   attr->sample_type);
+		return false;
+	}
+
+	if (attr->read_format & ~(PERF_FORMAT_MAX-1)) {
+		pr_warning("Unknown read format (0x%llx) is detected. "
+			   "Please update perf tool.\n",
+			   attr->read_format);
+		return false;
+	}
+
+	if ((attr->sample_type & PERF_SAMPLE_BRANCH_STACK) &&
+	    (attr->branch_sample_type & ~(PERF_SAMPLE_BRANCH_MAX-1))) {
+		pr_warning("Unknown branch sample type (0x%llx) is detected. "
+			   "Please update perf tool.\n",
+			   attr->branch_sample_type);
+
+		return false;
+	}
+
+	return true;
+}
+
 static struct evsel *read_event_desc(struct feat_fd *ff)
 {
 	struct evsel *evsel, *events = NULL;
@@ -1634,6 +1668,9 @@ static struct evsel *read_event_desc(struct feat_fd *ff)
 
 		memcpy(&evsel->core.attr, buf, msz);
 
+		if (!perf_attr_check(&evsel->core.attr))
+			goto error;
+
 		if (do_read_u32(ff, &nr))
 			goto error;
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip: perf/core] perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX
  2020-02-28 16:30 ` [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX kan.liang
  2020-03-05 20:25   ` Arnaldo Carvalho de Melo
@ 2020-03-19 14:10   ` tip-bot2 for Kan Liang
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-03-19 14:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexey Budankov, Andi Kleen, Jiri Olsa, Mathieu Poirier,
	Michael Ellerman, Namhyung Kim, Pavel Gerasimov, Peter Zijlstra,
	Ravi Bangoria, Stephane Eranian, Vitaly Slobodskoy, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     d3f85437ad6a55113882d730beaa75759452da8f
Gitweb:        https://git.kernel.org/tip/d3f85437ad6a55113882d730beaa75759452da8f
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 28 Feb 2020 08:30:01 -08:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Mon, 09 Mar 2020 21:43:24 -03:00

perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX

A new branch sample type, PERF_SAMPLE_BRANCH_HW_INDEX, has been
introduced in the latest kernel.

Enable HW_INDEX by default in LBR call stack mode.

If the kernel doesn't support the sample type, switch it off.

Add HW_INDEX to attr_fprintf as well. Users can check whether the branch
sample type is set via the debug information or the header.

Committer testing:

First collect some samples with LBR callchains, system wide, for a few
seconds:

  # perf record --call-graph lbr -a sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
  #

Now let's use 'perf evlist -v' to look at the branch_sample_type:

  # perf evlist -v
  cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
  #

So the machine has the kernel feature, and it was correctly added to
perf_event_attr.branch_sample_type, for the default 'cycles' event.

If we do it on another machine, where the kernel lacks the HW_INDEX
feature, we get:

  # perf record --call-graph lbr -a sleep 2s
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ]
  # perf evlist -v
  cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
  #

No HW_INDEX in attr.branch_sample_type.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evsel.c                   | 15 ++++++++++++---
 tools/perf/util/evsel.h                   |  1 +
 tools/perf/util/perf_event_attr_fprintf.c |  1 +
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 05883a4..816d930 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -712,7 +712,8 @@ static void __perf_evsel__config_callchain(struct evsel *evsel,
 				attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
 							PERF_SAMPLE_BRANCH_CALL_STACK |
 							PERF_SAMPLE_BRANCH_NO_CYCLES |
-							PERF_SAMPLE_BRANCH_NO_FLAGS;
+							PERF_SAMPLE_BRANCH_NO_FLAGS |
+							PERF_SAMPLE_BRANCH_HW_INDEX;
 			}
 		} else
 			 pr_warning("Cannot use LBR callstack with branch stack. "
@@ -763,7 +764,8 @@ perf_evsel__reset_callgraph(struct evsel *evsel,
 	if (param->record_mode == CALLCHAIN_LBR) {
 		perf_evsel__reset_sample_bit(evsel, BRANCH_STACK);
 		attr->branch_sample_type &= ~(PERF_SAMPLE_BRANCH_USER |
-					      PERF_SAMPLE_BRANCH_CALL_STACK);
+					      PERF_SAMPLE_BRANCH_CALL_STACK |
+					      PERF_SAMPLE_BRANCH_HW_INDEX);
 	}
 	if (param->record_mode == CALLCHAIN_DWARF) {
 		perf_evsel__reset_sample_bit(evsel, REGS_USER);
@@ -1673,6 +1675,8 @@ fallback_missing_features:
 		evsel->core.attr.ksymbol = 0;
 	if (perf_missing_features.bpf)
 		evsel->core.attr.bpf_event = 0;
+	if (perf_missing_features.branch_hw_idx)
+		evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_HW_INDEX;
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->core.attr.sample_id_all = 0;
@@ -1784,7 +1788,12 @@ try_fallback:
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
+	if (!perf_missing_features.branch_hw_idx &&
+	    (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
+		perf_missing_features.branch_hw_idx = true;
+		pr_debug2("switching off branch HW index support\n");
+		goto fallback_missing_features;
+	} else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
 		perf_missing_features.aux_output = true;
 		pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
 		goto out_close;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 99a0cb6..3380474 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -119,6 +119,7 @@ struct perf_missing_features {
 	bool ksymbol;
 	bool bpf;
 	bool aux_output;
+	bool branch_hw_idx;
 };
 
 extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 6512031..355d345 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -50,6 +50,7 @@ static void __p_branch_sample_type(char *buf, size_t size, u64 value)
 		bit_name(ABORT_TX), bit_name(IN_TX), bit_name(NO_TX),
 		bit_name(COND), bit_name(CALL_STACK), bit_name(IND_JUMP),
 		bit_name(CALL), bit_name(NO_FLAGS), bit_name(NO_CYCLES),
+		bit_name(HW_INDEX),
 		{ .name = NULL, }
 	};
 #undef bit_name

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip: perf/core] tools headers UAPI: Update tools's copy of linux/perf_event.h
  2020-03-04 13:49   ` Arnaldo Carvalho de Melo
  2020-03-04 15:45     ` Arnaldo Carvalho de Melo
@ 2020-03-19 14:10     ` tip-bot2 for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot2 for Arnaldo Carvalho de Melo @ 2020-03-19 14:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Adrian Hunter, Alexey Budankov, Andi Kleen, Jiri Olsa,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra (Intel),
	Ravi Bangoria, Stephane Eranian, Vitaly Slobodskoy,
	Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     6339998d22ecae5d6435dd87b4904ff6e16bfe56
Gitweb:        https://git.kernel.org/tip/6339998d22ecae5d6435dd87b4904ff6e16bfe56
Author:        Arnaldo Carvalho de Melo <acme@redhat.com>
AuthorDate:    Wed, 04 Mar 2020 10:50:58 -03:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 05 Mar 2020 11:01:11 -03:00

tools headers UAPI: Update tools's copy of linux/perf_event.h

To get the changes in:

  bbfd5e4fab63 ("perf/core: Add new branch sample type for HW index of raw branch records")

This silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

This update is a prerequisite to adding support for the HW index of raw
branch records.

Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200304134902.GB12612@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/include/uapi/linux/perf_event.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 377d794..397cfd6 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -181,6 +181,8 @@ enum perf_branch_sample_type_shift {
 
 	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
 
+	PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT	= 17, /* save low level index of raw branch records */
+
 	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
 };
 
@@ -208,6 +210,8 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
 		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
 
+	PERF_SAMPLE_BRANCH_HW_INDEX	= 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT,
+
 	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
@@ -853,7 +857,9 @@ enum perf_event_type {
 	 *	  char                  data[size];}&& PERF_SAMPLE_RAW
 	 *
 	 *	{ u64                   nr;
-	 *        { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK
+	 *	  { u64	hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
+	 *        { u64 from, to, flags } lbr[nr];
+	 *      } && PERF_SAMPLE_BRANCH_STACK
 	 *
 	 * 	{ u64			abi; # enum perf_sample_regs_abi
 	 * 	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip: perf/core] perf tools: Add hw_idx in struct branch_stack
  2020-02-28 16:30 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack kan.liang
  2020-03-04 13:49   ` Arnaldo Carvalho de Melo
  2020-03-10  0:42   ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack Arnaldo Carvalho de Melo
@ 2020-03-19 14:10   ` tip-bot2 for Kan Liang
  2 siblings, 0 replies; 31+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-03-19 14:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Adrian Hunter, Alexey Budankov, Andi Kleen, Jiri Olsa,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     42bbabed09ce6208026648a71a45b4394c74585a
Gitweb:        https://git.kernel.org/tip/42bbabed09ce6208026648a71a45b4394c74585a
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 28 Feb 2020 08:30:00 -08:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Mon, 09 Mar 2020 21:42:53 -03:00

perf tools: Add hw_idx in struct branch_stack

The low level index of raw branch records for the most recent branch can
be recorded in a sample with the PERF_SAMPLE_BRANCH_HW_INDEX
branch_sample_type. Extend struct branch_stack to support it.

However, if PERF_SAMPLE_BRANCH_HW_INDEX is not set, only nr and
entries[] are output by the kernel. A pointer to entries[] taken through
the new struct branch_stack would then be wrong, since the output format
differs from the new struct layout. Add a variable no_hw_idx in struct
perf_sample to indicate whether hw_idx is output. Add get_branch_entry()
to return the corresponding pointer to entries[0].

To keep the dummy branch sample consistent with the new branch sample,
add hw_idx to struct dummy_branch_stack for cs-etm and intel-pt.

Apply the new struct branch_stack to synthetic events as well.

Extend the sample-parsing test case to support the new struct
branch_stack.
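
For synthetic events the practical effect is on the size of the branch
stack payload: nr entries plus one u64 for nr and one for hw_idx. A
small sketch of that arithmetic, with the hw_idx word made optional for
comparison (demo_lbr_entry and demo_branch_stack_size() are hypothetical
names, not perf functions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_lbr_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Bytes occupied by { nr, [hw_idx,] entries[nr] }. */
static size_t demo_branch_stack_size(uint64_t nr, bool has_hw_idx)
{
	size_t sz;

	/* Guard against overflow of the size computation. */
	assert(nr <= (SIZE_MAX - 2 * sizeof(uint64_t)) / sizeof(struct demo_lbr_entry));

	sz = (size_t)nr * sizeof(struct demo_lbr_entry);
	sz += sizeof(uint64_t);			/* nr */
	if (has_hw_idx)
		sz += sizeof(uint64_t);		/* hw_idx */

	return sz;
}
```

With two entries this gives 64 bytes when hw_idx is included and 56
bytes when it is not.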

Committer notes:

Renamed get_branch_entries() to perf_sample__branch_entries() to have
proper namespacing and pave the way for this to be moved to libperf,
eventually.

Add 'static' to that inline as it is in a header.

Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
on arm64.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-script.c                            | 70 ++++-----
 tools/perf/tests/sample-parsing.c                      |  7 +-
 tools/perf/util/branch.h                               | 22 +++-
 tools/perf/util/cs-etm.c                               |  2 +-
 tools/perf/util/event.h                                |  1 +-
 tools/perf/util/evsel.c                                |  5 +-
 tools/perf/util/evsel.h                                |  5 +-
 tools/perf/util/hist.c                                 |  3 +-
 tools/perf/util/intel-pt.c                             |  2 +-
 tools/perf/util/machine.c                              | 35 ++---
 tools/perf/util/scripting-engines/trace-event-python.c | 30 ++--
 tools/perf/util/session.c                              |  8 +-
 tools/perf/util/synthetic-events.c                     |  6 +-
 13 files changed, 125 insertions(+), 71 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index e2406b2..656b347 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -735,6 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 					struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -743,8 +744,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 		return 0;
 
 	for (i = 0; i < br->nr; i++) {
-		from = br->entries[i].from;
-		to   = br->entries[i].to;
+		from = entries[i].from;
+		to   = entries[i].to;
 
 		if (PRINT_FIELD(DSO)) {
 			memset(&alf, 0, sizeof(alf));
@@ -768,10 +769,10 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample,
 		}
 
 		printed += fprintf(fp, "/%c/%c/%c/%d ",
-			mispred_str( br->entries + i),
-			br->entries[i].flags.in_tx? 'X' : '-',
-			br->entries[i].flags.abort? 'A' : '-',
-			br->entries[i].flags.cycles);
+			mispred_str(entries + i),
+			entries[i].flags.in_tx ? 'X' : '-',
+			entries[i].flags.abort ? 'A' : '-',
+			entries[i].flags.cycles);
 	}
 
 	return printed;
@@ -782,6 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 					   struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -793,8 +795,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 
 		memset(&alf, 0, sizeof(alf));
 		memset(&alt, 0, sizeof(alt));
-		from = br->entries[i].from;
-		to   = br->entries[i].to;
+		from = entries[i].from;
+		to   = entries[i].to;
 
 		thread__find_symbol_fb(thread, sample->cpumode, from, &alf);
 		thread__find_symbol_fb(thread, sample->cpumode, to, &alt);
@@ -813,10 +815,10 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample,
 			printed += fprintf(fp, ")");
 		}
 		printed += fprintf(fp, "/%c/%c/%c/%d ",
-			mispred_str( br->entries + i),
-			br->entries[i].flags.in_tx? 'X' : '-',
-			br->entries[i].flags.abort? 'A' : '-',
-			br->entries[i].flags.cycles);
+			mispred_str(entries + i),
+			entries[i].flags.in_tx ? 'X' : '-',
+			entries[i].flags.abort ? 'A' : '-',
+			entries[i].flags.cycles);
 	}
 
 	return printed;
@@ -827,6 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 					   struct perf_event_attr *attr, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct addr_location alf, alt;
 	u64 i, from, to;
 	int printed = 0;
@@ -838,8 +841,8 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 
 		memset(&alf, 0, sizeof(alf));
 		memset(&alt, 0, sizeof(alt));
-		from = br->entries[i].from;
-		to   = br->entries[i].to;
+		from = entries[i].from;
+		to   = entries[i].to;
 
 		if (thread__find_map_fb(thread, sample->cpumode, from, &alf) &&
 		    !alf.map->dso->adjust_symbols)
@@ -862,10 +865,10 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 			printed += fprintf(fp, ")");
 		}
 		printed += fprintf(fp, "/%c/%c/%c/%d ",
-			mispred_str(br->entries + i),
-			br->entries[i].flags.in_tx ? 'X' : '-',
-			br->entries[i].flags.abort ? 'A' : '-',
-			br->entries[i].flags.cycles);
+			mispred_str(entries + i),
+			entries[i].flags.in_tx ? 'X' : '-',
+			entries[i].flags.abort ? 'A' : '-',
+			entries[i].flags.cycles);
 	}
 
 	return printed;
@@ -1053,6 +1056,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 					    struct machine *machine, FILE *fp)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	u64 start, end;
 	int i, insn, len, nr, ilen, printed = 0;
 	struct perf_insn x;
@@ -1073,31 +1077,31 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	printed += fprintf(fp, "%c", '\n');
 
 	/* Handle first from jump, of which we don't know the entry. */
-	len = grab_bb(buffer, br->entries[nr-1].from,
-			br->entries[nr-1].from,
+	len = grab_bb(buffer, entries[nr-1].from,
+			entries[nr-1].from,
 			machine, thread, &x.is64bit, &x.cpumode, false);
 	if (len > 0) {
-		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
+		printed += ip__fprintf_sym(entries[nr - 1].from, thread,
 					   x.cpumode, x.cpu, &lastsym, attr, fp);
-		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
+		printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1],
 					    &x, buffer, len, 0, fp, &total_cycles);
 		if (PRINT_FIELD(SRCCODE))
-			printed += print_srccode(thread, x.cpumode, br->entries[nr - 1].from);
+			printed += print_srccode(thread, x.cpumode, entries[nr - 1].from);
 	}
 
 	/* Print all blocks */
 	for (i = nr - 2; i >= 0; i--) {
-		if (br->entries[i].from || br->entries[i].to)
+		if (entries[i].from || entries[i].to)
 			pr_debug("%d: %" PRIx64 "-%" PRIx64 "\n", i,
-				 br->entries[i].from,
-				 br->entries[i].to);
-		start = br->entries[i + 1].to;
-		end   = br->entries[i].from;
+				 entries[i].from,
+				 entries[i].to);
+		start = entries[i + 1].to;
+		end   = entries[i].from;
 
 		len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
 		/* Patch up missing kernel transfers due to ring filters */
 		if (len == -ENXIO && i > 0) {
-			end = br->entries[--i].from;
+			end = entries[--i].from;
 			pr_debug("\tpatching up to %" PRIx64 "-%" PRIx64 "\n", start, end);
 			len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false);
 		}
@@ -1110,7 +1114,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 
 			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
 			if (ip == end) {
-				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, ++insn, fp,
+				printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp,
 							    &total_cycles);
 				if (PRINT_FIELD(SRCCODE))
 					printed += print_srccode(thread, x.cpumode, ip);
@@ -1134,9 +1138,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	 * Hit the branch? In this case we are already done, and the target
 	 * has not been executed yet.
 	 */
-	if (br->entries[0].from == sample->ip)
+	if (entries[0].from == sample->ip)
 		goto out;
-	if (br->entries[0].flags.abort)
+	if (entries[0].flags.abort)
 		goto out;
 
 	/*
@@ -1147,7 +1151,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	 * between final branch and sample. When this happens just
 	 * continue walking after the last TO until we hit a branch.
 	 */
-	start = br->entries[0].to;
+	start = entries[0].to;
 	end = sample->ip;
 	if (end < start) {
 		/* Missing jump. Scan 128 bytes for the next branch */
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 2762e11..14239e4 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -99,6 +99,7 @@ static bool samples_same(const struct perf_sample *s1,
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		COMP(branch_stack->nr);
+		COMP(branch_stack->hw_idx);
 		for (i = 0; i < s1->branch_stack->nr; i++)
 			MCOMP(branch_stack->entries[i]);
 	}
@@ -186,7 +187,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		u64 data[64];
 	} branch_stack = {
 		/* 1 branch_entry */
-		.data = {1, 211, 212, 213},
+		.data = {1, -1ULL, 211, 212, 213},
 	};
 	u64 regs[64];
 	const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
@@ -208,6 +209,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		.transaction	= 112,
 		.raw_data	= (void *)raw_data,
 		.callchain	= &callchain.callchain,
+		.no_hw_idx      = false,
 		.branch_stack	= &branch_stack.branch_stack,
 		.user_regs	= {
 			.abi	= PERF_SAMPLE_REGS_ABI_64,
@@ -244,6 +246,9 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 	if (sample_type & PERF_SAMPLE_REGS_INTR)
 		evsel.core.attr.sample_regs_intr = sample_regs;
 
+	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+		evsel.core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+
 	for (i = 0; i < sizeof(regs); i++)
 		*(i + (u8 *)regs) = i & 0xfe;
 
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 88e00d2..154a05c 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -12,6 +12,7 @@
 #include <linux/stddef.h>
 #include <linux/perf_event.h>
 #include <linux/types.h>
+#include "event.h"
 
 struct branch_flags {
 	u64 mispred:1;
@@ -39,9 +40,30 @@ struct branch_entry {
 
 struct branch_stack {
 	u64			nr;
+	u64			hw_idx;
 	struct branch_entry	entries[0];
 };
 
+/*
+ * hw_idx is only present when PERF_SAMPLE_BRANCH_HW_INDEX is set.
+ * Otherwise, the output format of a sample with a branch stack is
+ * struct branch_stack {
+ *	u64			nr;
+ *	struct branch_entry	entries[0];
+ * }
+ * Check whether hw_idx is available,
+ * and return the corresponding pointer to entries[0].
+ */
+static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample)
+{
+	u64 *entry = (u64 *)sample->branch_stack;
+
+	entry++;
+	if (sample->no_hw_idx)
+		return (struct branch_entry *)entry;
+	return (struct branch_entry *)(++entry);
+}
+
 struct branch_type_stat {
 	bool	branch_to;
 	u64	counts[PERF_BR_MAX];
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5471045..b3b3fe3 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1172,6 +1172,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	union perf_event *event = tidq->event_buf;
 	struct dummy_branch_stack {
 		u64			nr;
+		u64			hw_idx;
 		struct branch_entry	entries;
 	} dummy_bs;
 	u64 ip;
@@ -1202,6 +1203,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	if (etm->synth_opts.last_branch) {
 		dummy_bs = (struct dummy_branch_stack){
 			.nr = 1,
+			.hw_idx = -1ULL,
 			.entries = {
 				.from = sample.ip,
 				.to = sample.addr,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 8522315..3cda40a 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -139,6 +139,7 @@ struct perf_sample {
 	u16 insn_len;
 	u8  cpumode;
 	u16 misc;
+	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
 	char insn[MAX_INSN];
 	void *raw_data;
 	struct ip_callchain *callchain;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c8dc445..05883a4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2169,7 +2169,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 		if (data->branch_stack->nr > max_branch_nr)
 			return -EFAULT;
+
 		sz = data->branch_stack->nr * sizeof(struct branch_entry);
+		if (perf_evsel__has_branch_hw_idx(evsel))
+			sz += sizeof(u64);
+		else
+			data->no_hw_idx = true;
 		OVERFLOW_CHECK(array, sz, max_size);
 		array = (void *)array + sz;
 	}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index dc14f4a..99a0cb6 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -389,6 +389,11 @@ static inline bool perf_evsel__has_branch_callstack(const struct evsel *evsel)
 	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
 }
 
+static inline bool perf_evsel__has_branch_hw_idx(const struct evsel *evsel)
+{
+	return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX;
+}
+
 static inline bool evsel__has_callchain(const struct evsel *evsel)
 {
 	return (evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN) != 0;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ca5a8f4..e74a5ac 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2584,9 +2584,10 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 			  u64 *total_cycles)
 {
 	struct branch_info *bi;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 
 	/* If we have branch cycles always annotate them. */
-	if (bs && bs->nr && bs->entries[0].flags.cycles) {
+	if (bs && bs->nr && entries[0].flags.cycles) {
 		int i;
 
 		bi = sample__resolve_bstack(sample, al);
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 33cf892..23c8289 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1295,6 +1295,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	struct perf_sample sample = { .ip = 0, };
 	struct dummy_branch_stack {
 		u64			nr;
+		u64			hw_idx;
 		struct branch_entry	entries;
 	} dummy_bs;
 
@@ -1316,6 +1317,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) {
 		dummy_bs = (struct dummy_branch_stack){
 			.nr = 1,
+			.hw_idx = -1ULL,
 			.entries = {
 				.from = sample.ip,
 				.to = sample.addr,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index fb5c2cd..fd14f14 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2081,15 +2081,16 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 {
 	unsigned int i;
 	const struct branch_stack *bs = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
 
 	if (!bi)
 		return NULL;
 
 	for (i = 0; i < bs->nr; i++) {
-		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
-		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
-		bi[i].flags = bs->entries[i].flags;
+		ip__resolve_ams(al->thread, &bi[i].to, entries[i].to);
+		ip__resolve_ams(al->thread, &bi[i].from, entries[i].from);
+		bi[i].flags = entries[i].flags;
 	}
 	return bi;
 }
@@ -2185,6 +2186,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	/* LBR only affects the user callchain */
 	if (i != chain_nr) {
 		struct branch_stack *lbr_stack = sample->branch_stack;
+		struct branch_entry *entries = perf_sample__branch_entries(sample);
 		int lbr_nr = lbr_stack->nr, j, k;
 		bool branch;
 		struct branch_flags *flags;
@@ -2210,31 +2212,29 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					ip = chain->ips[j];
 				else if (j > i + 1) {
 					k = j - i - 2;
-					ip = lbr_stack->entries[k].from;
+					ip = entries[k].from;
 					branch = true;
-					flags = &lbr_stack->entries[k].flags;
+					flags = &entries[k].flags;
 				} else {
-					ip = lbr_stack->entries[0].to;
+					ip = entries[0].to;
 					branch = true;
-					flags = &lbr_stack->entries[0].flags;
-					branch_from =
-						lbr_stack->entries[0].from;
+					flags = &entries[0].flags;
+					branch_from = entries[0].from;
 				}
 			} else {
 				if (j < lbr_nr) {
 					k = lbr_nr - j - 1;
-					ip = lbr_stack->entries[k].from;
+					ip = entries[k].from;
 					branch = true;
-					flags = &lbr_stack->entries[k].flags;
+					flags = &entries[k].flags;
 				}
 				else if (j > lbr_nr)
 					ip = chain->ips[i + 1 - (j - lbr_nr)];
 				else {
-					ip = lbr_stack->entries[0].to;
+					ip = entries[0].to;
 					branch = true;
-					flags = &lbr_stack->entries[0].flags;
-					branch_from =
-						lbr_stack->entries[0].from;
+					flags = &entries[0].flags;
+					branch_from = entries[0].from;
 				}
 			}
 
@@ -2281,6 +2281,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 					    int max_stack)
 {
 	struct branch_stack *branch = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = 0;
 	u8 cpumode = PERF_RECORD_MISC_USER;
@@ -2328,7 +2329,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
 		for (i = 0; i < nr; i++) {
 			if (callchain_param.order == ORDER_CALLEE) {
-				be[i] = branch->entries[i];
+				be[i] = entries[i];
 
 				if (chain == NULL)
 					continue;
@@ -2347,7 +2348,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 				    be[i].from >= chain->ips[first_call] - 8)
 					first_call++;
 			} else
-				be[i] = branch->entries[branch->nr - i - 1];
+				be[i] = entries[branch->nr - i - 1];
 		}
 
 		memset(iter, 0, sizeof(struct iterations) * nr);
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 80ca5d0..8c1b27c 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -464,6 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 					struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	PyObject *pylist;
 	u64 i;
 
@@ -484,28 +485,28 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 			Py_FatalError("couldn't create Python dictionary");
 
 		pydict_set_item_string_decref(pyelem, "from",
-		    PyLong_FromUnsignedLongLong(br->entries[i].from));
+		    PyLong_FromUnsignedLongLong(entries[i].from));
 		pydict_set_item_string_decref(pyelem, "to",
-		    PyLong_FromUnsignedLongLong(br->entries[i].to));
+		    PyLong_FromUnsignedLongLong(entries[i].to));
 		pydict_set_item_string_decref(pyelem, "mispred",
-		    PyBool_FromLong(br->entries[i].flags.mispred));
+		    PyBool_FromLong(entries[i].flags.mispred));
 		pydict_set_item_string_decref(pyelem, "predicted",
-		    PyBool_FromLong(br->entries[i].flags.predicted));
+		    PyBool_FromLong(entries[i].flags.predicted));
 		pydict_set_item_string_decref(pyelem, "in_tx",
-		    PyBool_FromLong(br->entries[i].flags.in_tx));
+		    PyBool_FromLong(entries[i].flags.in_tx));
 		pydict_set_item_string_decref(pyelem, "abort",
-		    PyBool_FromLong(br->entries[i].flags.abort));
+		    PyBool_FromLong(entries[i].flags.abort));
 		pydict_set_item_string_decref(pyelem, "cycles",
-		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
+		    PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
 
 		thread__find_map_fb(thread, sample->cpumode,
-				    br->entries[i].from, &al);
+				    entries[i].from, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "from_dsoname",
 					      _PyUnicode_FromString(dsoname));
 
 		thread__find_map_fb(thread, sample->cpumode,
-				    br->entries[i].to, &al);
+				    entries[i].to, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "to_dsoname",
 					      _PyUnicode_FromString(dsoname));
@@ -561,6 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 					   struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	PyObject *pylist;
 	u64 i;
 	char bf[512];
@@ -581,22 +583,22 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 			Py_FatalError("couldn't create Python dictionary");
 
 		thread__find_symbol_fb(thread, sample->cpumode,
-				       br->entries[i].from, &al);
+				       entries[i].from, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "from",
 					      _PyUnicode_FromString(bf));
 
 		thread__find_symbol_fb(thread, sample->cpumode,
-				       br->entries[i].to, &al);
+				       entries[i].to, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "to",
 					      _PyUnicode_FromString(bf));
 
-		get_br_mspred(&br->entries[i].flags, bf, sizeof(bf));
+		get_br_mspred(&entries[i].flags, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "pred",
 					      _PyUnicode_FromString(bf));
 
-		if (br->entries[i].flags.in_tx) {
+		if (entries[i].flags.in_tx) {
 			pydict_set_item_string_decref(pyelem, "in_tx",
 					      _PyUnicode_FromString("X"));
 		} else {
@@ -604,7 +606,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 					      _PyUnicode_FromString("-"));
 		}
 
-		if (br->entries[i].flags.abort) {
+		if (entries[i].flags.abort) {
 			pydict_set_item_string_decref(pyelem, "abort",
 					      _PyUnicode_FromString("A"));
 		} else {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index d0d7d25..055b00a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1007,6 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 {
 	struct ip_callchain *callchain = sample->callchain;
 	struct branch_stack *lbr_stack = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	u64 kernel_callchain_nr = callchain->nr;
 	unsigned int i;
 
@@ -1043,10 +1044,10 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 			       i, callchain->ips[i]);
 
 		printf("..... %2d: %016" PRIx64 "\n",
-		       (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
+		       (int)(kernel_callchain_nr), entries[0].to);
 		for (i = 0; i < lbr_stack->nr; i++)
 			printf("..... %2d: %016" PRIx64 "\n",
-			       (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
+			       (int)(i + kernel_callchain_nr + 1), entries[i].from);
 	}
 }
 
@@ -1068,6 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
 
 static void branch_stack__printf(struct perf_sample *sample, bool callstack)
 {
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	uint64_t i;
 
 	printf("%s: nr:%" PRIu64 "\n",
@@ -1075,7 +1077,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
 		sample->branch_stack->nr);
 
 	for (i = 0; i < sample->branch_stack->nr; i++) {
-		struct branch_entry *e = &sample->branch_stack->entries[i];
+		struct branch_entry *e = &entries[i];
 
 		if (!callstack) {
 			printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index c423298..dd3e6f4 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1183,7 +1183,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		sz += sizeof(u64);
+		/* nr, hw_idx */
+		sz += 2 * sizeof(u64);
 		result += sz;
 	}
 
@@ -1344,7 +1345,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		sz += sizeof(u64);
+		/* nr, hw_idx */
+		sz += 2 * sizeof(u64);
 		memcpy(array, sample->branch_stack, sz);
 		array = (void *)array + sz;
 	}

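For readers following the layout change, here is a minimal standalone sketch
of the two on-disk formats that perf_sample__branch_entries() has to tell
apart. It is not part of the patch. The demo_* names, the has_hw_idx
parameter (the patch stores the inverse, sample->no_hw_idx) and the
flattening of struct branch_flags into one u64 are assumptions made only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct branch_entry; flags collapsed to a u64. */
struct demo_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/*
 * Return the first entry of a raw branch-stack buffer.
 * buf[0] is always nr; buf[1] is hw_idx only when has_hw_idx is true,
 * otherwise the entries start right after nr.
 */
static inline struct demo_branch_entry *
demo_branch_entries(uint64_t *buf, bool has_hw_idx)
{
	assert(buf != NULL);
	return (struct demo_branch_entry *)(buf + (has_hw_idx ? 2 : 1));
}

/* Bytes the branch stack occupies in the sample payload. */
static inline size_t demo_branch_stack_size(uint64_t nr, bool has_hw_idx)
{
	size_t sz = sizeof(uint64_t);		/* nr */

	if (has_hw_idx)
		sz += sizeof(uint64_t);		/* hw_idx */
	return sz + nr * sizeof(struct demo_branch_entry);
}
```

With the test vector from tests/sample-parsing.c, {1, -1ULL, 211, 212, 213},
the first entry's "from" is 211 when hw_idx is present. With the old layout,
{1, 211, 212, 213}, the same value is found one u64 earlier. This is why
every br->entries[i] access in the patch goes through the helper instead.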

end of thread, other threads:[~2020-03-19 14:11 UTC | newest]

Thread overview: 31+ messages
2020-02-28 16:29 [PATCH 00/12] Stitch LBR call stack (Perf Tools) kan.liang
2020-02-28 16:30 ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack kan.liang
2020-03-04 13:49   ` Arnaldo Carvalho de Melo
2020-03-04 15:45     ` Arnaldo Carvalho de Melo
2020-03-04 16:07       ` Liang, Kan
2020-03-19 14:10     ` [tip: perf/core] tools headers UAPI: Update tools's copy of linux/perf_event.h tip-bot2 for Arnaldo Carvalho de Melo
2020-03-10  0:42   ` [PATCH 01/12] perf tools: Add hw_idx in struct branch_stack Arnaldo Carvalho de Melo
2020-03-10 12:53     ` Liang, Kan
2020-03-19 14:10   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-02-28 16:30 ` [PATCH 02/12] perf tools: Support PERF_SAMPLE_BRANCH_HW_INDEX kan.liang
2020-03-05 20:25   ` Arnaldo Carvalho de Melo
2020-03-05 21:02     ` Liang, Kan
2020-03-05 23:17       ` Arnaldo Carvalho de Melo
2020-03-19 14:10   ` [tip: perf/core] perf evsel: " tip-bot2 for Kan Liang
2020-02-28 16:30 ` [PATCH 03/12] perf header: Add check for event attr kan.liang
2020-03-19 14:10   ` [tip: perf/core] perf header: Add check for unexpected use of reserved membrs in " tip-bot2 for Kan Liang
2020-02-28 16:30 ` [PATCH 04/12] perf pmu: Add support for PMU capabilities kan.liang
2020-02-28 16:30 ` [PATCH 05/12] perf header: Support CPU " kan.liang
2020-02-28 16:30 ` [PATCH 06/12] perf machine: Refine the function for LBR call stack reconstruction kan.liang
2020-02-28 16:30 ` [PATCH 07/12] perf tools: Stitch LBR call stack kan.liang
2020-02-28 16:30 ` [PATCH 08/12] perf report: Add option to enable the LBR stitching approach kan.liang
2020-02-28 16:30 ` [PATCH 09/12] perf script: " kan.liang
2020-02-28 16:30 ` [PATCH 10/12] perf top: " kan.liang
2020-02-28 16:30 ` [PATCH 11/12] perf c2c: " kan.liang
2020-02-28 16:30 ` [PATCH 12/12] perf hist: Add fast path for duplicate entries check approach kan.liang
2020-03-04 13:33 ` [PATCH 00/12] Stitch LBR call stack (Perf Tools) Arnaldo Carvalho de Melo
2020-03-06  9:39 ` Jiri Olsa
2020-03-06 19:13   ` Liang, Kan
2020-03-06 20:06     ` Arnaldo Carvalho de Melo
2020-03-09 13:27     ` Arnaldo Carvalho de Melo
2020-03-09 13:42       ` Liang, Kan
