LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
@ 2018-04-06 12:49 Alexey Budankov
  2018-04-06 15:31 ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Budankov @ 2018-04-06 12:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


Store user space frame-pointer value (BP register) into Perf trace 
on a sample for x86_64 process so the value becomes available when 
unwinding call stacks for functions gaining event samples.

Test executable for the example below was compiled with frame pointer 
support enabled:

g++ -o futex-fp -fpermissive --no-omit-frame-pointer futex.c

and profiled using:

tools/perf/perf record --user-regs=IP,SP,BP \
	-g --call-graph=dwarf,1024 -e cycles -- ./futex-fp

Output of

tools/perf/perf report -i perf.data --stdio 

demonstrates the effect of the patch change so before saving BP 
value on a sample we have several frames missing above main 
function frame:

# Samples: 138K of event 'cpu-cycles'
# Event count (approx.): 92713835335
#
# Children      Self  Command   Shared Object     Symbol                                        
# ........  ........  ........  ................  ..........................
#
    96.15%     0.72%  futex-fp  futex-fp          [.] main
            |          
            |--95.43%--main
            |          |          
            |          |--71.56%--syscall
            |          |          |          
            |          |          |--57.28%--entry_SYSCALL_64_after_hwframe
            |          |          |          |          
            |          |          |           --56.95%--do_syscall_64
            |          |          |                     |          
            |          |          |                      --55.77%--sys_futex

and after saving BP value on a sample we have expected 

	_start
	__libc_start_main 

frames unwound:

# Samples: 128K of event 'cpu-cycles'
# Event count (approx.): 85349981034
#
# Children      Self  Command   Shared Object     Symbol                                        
# ........  ........  ........  ................  ..................
#
    95.83%     0.00%  futex-fp  futex-fp          [.] _start
            |
==>         ---_start
==>            __libc_start_main
               main
               |          
               |--71.28%--syscall
               |          |          
               |          |--55.67%--entry_SYSCALL_64
               |          |          |          
               |          |           --55.40%--do_syscall_64
               |          |                     |          
               |          |                      --54.21%--sys_futex

perf test

 1: vmlinux symtab matches kallsyms                       : Ok
 2: Detect openat syscall event                           : Ok
 3: Detect openat syscall event on all cpus               : Ok
 4: Read samples using the mmap interface                 : Ok
 5: Test data source output                               : Ok
 6: Parse event definition strings                        : Ok
 7: Simple expression parser                              : Ok
 8: PERF_RECORD_* events & perf_sample fields             : Ok
 9: Parse perf pmu format                                 : Ok
10: DSO data read                                         : Ok
11: DSO data cache                                        : Ok
12: DSO data reopen                                       : Ok
13: Roundtrip evsel->name                                 : Ok
14: Parse sched tracepoints fields                        : Ok
15: syscalls:sys_enter_openat event fields                : Ok
16: Setup struct perf_event_attr                          : Skip
17: Match and link multiple hists                         : Ok
18: 'import perf' in python                               : FAILED!
19: Breakpoint overflow signal handler                    : Ok
20: Breakpoint overflow sampling                          : Ok
21: Breakpoint accounting                                 : Ok
22: Number of exit events of a simple workload            : Ok
23: Software clock events period values                   : Ok
24: Object code reading                                   : Ok
25: Sample parsing                                        : Ok
26: Use a dummy software event to keep tracking           : Ok
27: Parse with no sample_id_all bit set                   : Ok
28: Filter hist entries                                   : Ok
29: Lookup mmap thread                                    : Ok
30: Share thread mg                                       : Ok
31: Sort output of hist entries                           : Ok
32: Cumulate child hist entries                           : Ok
33: Track with sched_switch                               : Ok
34: Filter fds with revents mask in a fdarray             : Ok
35: Add fd to a fdarray, making it autogrow               : Ok
36: kmod_path__parse                                      : Ok
37: Thread map                                            : Ok
38: LLVM search and compile                               :
38.1: Basic BPF llvm compile                              : Skip
38.2: kbuild searching                                    : Skip
38.3: Compile source for BPF prologue generation          : Skip
38.4: Compile source for BPF relocation                   : Skip
39: Session topology                                      : Ok
40: BPF filter                                            :
40.1: Basic BPF filtering                                 : Skip
40.2: BPF pinning                                         : Skip
40.3: BPF prologue generation                             : Skip
40.4: BPF relocation checker                              : Skip
41: Synthesize thread map                                 : Ok
42: Remove thread map                                     : Ok
43: Synthesize cpu map                                    : Ok
44: Synthesize stat config                                : Ok
45: Synthesize stat                                       : Ok
46: Synthesize stat round                                 : Ok
47: Synthesize attr update                                : Ok
48: Event times                                           : Ok
49: Read backward ring buffer                             : Ok
50: Print cpu map                                         : Ok
51: Probe SDT events                                      : Ok
52: is_printable_array                                    : Ok
53: Print bitmap                                          : Ok
54: perf hooks                                            : Ok
55: builtin clang support                                 : Skip (not compiled in)
56: unit_number__scnprintf                                : Ok
57: mem2node                                              : Ok
58: x86 rdpmc                                             : Ok
59: Convert perf time to TSC                              : Ok
60: DWARF unwind                                          : Ok
61: x86 instruction decoder - new instructions            : Ok
62: Use vfs_getname probe to get syscall args filenames   : Ok
63: Add vfs_getname probe to get syscall args filenames   : Ok
64: Check open filename arg using perf trace + vfs_getname: Ok
65: probe libc's inet_pton & backtrace it with ping       : Ok

make: Entering directory '/root/abudanko/kernel/tip/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
- /root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP: cd . && make FEATURE_DUMP_COPY=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  feature-dump
cd . && make FEATURE_DUMP_COPY=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump
            make_clean_all_O: cd . && make clean all FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.EB4YTtRbxL DESTDIR=/tmp/tmp.IPhLgudoxW
                 make_tags_O: cd . && make tags FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.z8RttESyW2 DESTDIR=/tmp/tmp.6Gv8JZOYx3
           make_no_libperl_O: cd . && make NO_LIBPERL=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.0pRqgxEWLe DESTDIR=/tmp/tmp.ioqNtqIWn3
                 make_help_O: cd . && make help FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.wNsijVwTQH DESTDIR=/tmp/tmp.jz1T2nM2IK
              make_no_gtk2_O: cd . && make NO_GTK2=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.nfZdzepUbZ DESTDIR=/tmp/tmp.DQZlnzHRSX
           make_no_libnuma_O: cd . && make NO_LIBNUMA=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.87R85hXnHy DESTDIR=/tmp/tmp.evxqlvqF9Q
     make_util_pmu_bison_o_O: cd . && make util/pmu-bison.o FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.uZ2uxgsHfH DESTDIR=/tmp/tmp.gK94AuyCSw
            make_no_libelf_O: cd . && make NO_LIBELF=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.SbQvlckgpr DESTDIR=/tmp/tmp.BdicQuNJ4I
         make_no_libpython_O: cd . && make NO_LIBPYTHON=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.Hs8Fgw58sM DESTDIR=/tmp/tmp.O9xJ46ivch
          make_no_libaudit_O: cd . && make NO_LIBAUDIT=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.4J4C37PEcX DESTDIR=/tmp/tmp.17KyQdjlJr
          make_no_demangle_O: cd . && make NO_DEMANGLE=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.9T7d972Iq4 DESTDIR=/tmp/tmp.OmhdemgoD0
      make_with_babeltrace_O: cd . && make LIBBABELTRACE=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.YAAQ8BJQ47 DESTDIR=/tmp/tmp.ab6vWEz2BH
         make_no_libbionic_O: cd . && make NO_LIBBIONIC=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.PVn97i2GaD DESTDIR=/tmp/tmp.FtoNcYLuit
              make_no_newt_O: cd . && make NO_NEWT=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.vXXDkjPncA DESTDIR=/tmp/tmp.6C1TxNjEsK
               make_perf_o_O: cd . && make perf.o FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.41vLM9THKV DESTDIR=/tmp/tmp.iRNmuH6HhX
 make_install_prefix_slash_O: cd . && make install prefix=/tmp/krava/ FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.s7QM9fH857 DESTDIR=/tmp/tmp.5ksvTXOKFS
              make_install_O: cd . && make install FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.AtP0IWbb1q DESTDIR=/tmp/tmp.ALFitlnvys
       make_install_prefix_O: cd . && make install prefix=/tmp/krava FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.d7Fj07hY9t DESTDIR=/tmp/tmp.lJgh1rAlC8
                  make_doc_O: cd . && make doc FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.d8zNjEXSK9 DESTDIR=/tmp/tmp.h5deIYBlpA
              make_minimal_O: cd . && make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.kMK5kEhFxt DESTDIR=/tmp/tmp.gk2k2wXNhl
           make_util_map_o_O: cd . && make util/map.o FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.2OFaqPWKaX DESTDIR=/tmp/tmp.UISM6T2OtL
          make_no_auxtrace_O: cd . && make NO_AUXTRACE=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.8SKo9n8vWp DESTDIR=/tmp/tmp.6fwjE762L1
       make_with_clangllvm_O: cd . && make LIBCLANGLLVM=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.22nXtyLNUw DESTDIR=/tmp/tmp.iMc2ET9PEQ
         make_no_libunwind_O: cd . && make NO_LIBUNWIND=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.9D2ef8yWWg DESTDIR=/tmp/tmp.BigQsiYWgB
make_no_libdw_dwarf_unwind_O: cd . && make NO_LIBDW_DWARF_UNWIND=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.QvQVtsyo0U DESTDIR=/tmp/tmp.35obmHcXNO
                make_no_ui_O: cd . && make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.QkDnnb6nS0 DESTDIR=/tmp/tmp.UxLWSGZcAb
             make_no_slang_O: cd . && make NO_SLANG=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.whyhxkgyt5 DESTDIR=/tmp/tmp.Sk5P3IyCfK
            make_no_libbpf_O: cd . && make NO_LIBBPF=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.kZDK82cROR DESTDIR=/tmp/tmp.AuAmKVkSJM
                 make_pure_O: cd . && make FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.CnnvrI746k DESTDIR=/tmp/tmp.iagYJmBzuk
           make_no_scripts_O: cd . && make NO_LIBPYTHON=1 NO_LIBPERL=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.GdYxbYcpa6 DESTDIR=/tmp/tmp.l2Kdlb4nw6
          make_install_bin_O: cd . && make install-bin FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.K53lCwDwuc DESTDIR=/tmp/tmp.P054EgzUVW
                make_debug_O: cd . && make DEBUG=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.URb8OKvyPH DESTDIR=/tmp/tmp.lEmNtybd4Y
         make_no_backtrace_O: cd . && make NO_BACKTRACE=1 FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP -j8 O=/tmp/tmp.AYhs9jZJta DESTDIR=/tmp/tmp.3utydz1YeX
- /root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC: cd . && make FEATURE_DUMP_COPY=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC  LDFLAGS='-static' feature-dump
cd . && make FEATURE_DUMP_COPY=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump
               make_static_O: cd . && make LDFLAGS=-static FEATURES_DUMP=/root/abudanko/kernel/tip/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC -j8 O=/tmp/tmp.LUNSZNLI7T DESTDIR=/tmp/tmp.KEOWH9Dlum
OK
make: Leaving directory '/root/abudanko/kernel/tip/tools/perf'

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
  MAINTAINERS file lacks references to appropriate folks for reviewing 
  changes at arch/x86/kernel/perf_regs.c so probably it makes sense to 
  update the file as well in this respect. 
---
 arch/x86/kernel/perf_regs.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index e47b2dbbdef3..9284048cf5b0 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
 	 */
 	regs_user_copy->bx = -1;
 	regs_user_copy->bp = -1;
+	if (user_64bit_mode(user_regs)) {
+		/*
+		 * Store user space frame-pointer value on sample
+		 * to facilitate stack unwinding for cases when
+		 * user space x86_64 executable code has such
+		 * support enabled at compile time;
+		 */
+		regs_user_copy->bp = user_regs->bp;
+	}
 	regs_user_copy->r12 = -1;
 	regs_user_copy->r13 = -1;
 	regs_user_copy->r14 = -1;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
  2018-04-06 12:49 [PATCH v1]: perf/x86: store user space frame-pointer value on a sample Alexey Budankov
@ 2018-04-06 15:31 ` Andi Kleen
  2018-04-06 19:06   ` Alexey Budankov
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2018-04-06 15:31 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
> index e47b2dbbdef3..9284048cf5b0 100644
> --- a/arch/x86/kernel/perf_regs.c
> +++ b/arch/x86/kernel/perf_regs.c
> @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>  	 */
>  	regs_user_copy->bx = -1;
>  	regs_user_copy->bp = -1;
> +	if (user_64bit_mode(user_regs)) {

Why is it 64bit only? Should work on 32bit too.

-Andi

> +		/*
> +		 * Store user space frame-pointer value on sample
> +		 * to facilitate stack unwinding for cases when
> +		 * user space x86_64 executable code has such
> +		 * support enabled at compile time;
> +		 */
> +		regs_user_copy->bp = user_regs->bp;
> +	}
>  	regs_user_copy->r12 = -1;
>  	regs_user_copy->r13 = -1;
>  	regs_user_copy->r14 = -1;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
  2018-04-06 15:31 ` Andi Kleen
@ 2018-04-06 19:06   ` Alexey Budankov
  2018-04-06 19:53     ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Budankov @ 2018-04-06 19:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

On 06.04.2018 18:31, Andi Kleen wrote:
>> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
>> index e47b2dbbdef3..9284048cf5b0 100644
>> --- a/arch/x86/kernel/perf_regs.c
>> +++ b/arch/x86/kernel/perf_regs.c
>> @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>>  	 */
>>  	regs_user_copy->bx = -1;
>>  	regs_user_copy->bp = -1;
>> +	if (user_64bit_mode(user_regs)) {
> 
> Why is it 64bit only? Should work on 32bit too.

bp register is a part of i386 syscall ABI 
(http://man7.org/linux/man-pages/man2/syscall.2.html) 
so not sure if it will make any sense for 32bit processes. 

-Alexey

> 
> -Andi
> 
>> +		/*
>> +		 * Store user space frame-pointer value on sample
>> +		 * to facilitate stack unwinding for cases when
>> +		 * user space x86_64 executable code has such
>> +		 * support enabled at compile time;
>> +		 */
>> +		regs_user_copy->bp = user_regs->bp;
>> +	}
>>  	regs_user_copy->r12 = -1;
>>  	regs_user_copy->r13 = -1;
>>  	regs_user_copy->r14 = -1;
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
  2018-04-06 19:06   ` Alexey Budankov
@ 2018-04-06 19:53     ` Andi Kleen
  2018-04-07  6:18       ` Alexey Budankov
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2018-04-06 19:53 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

On Fri, Apr 06, 2018 at 10:06:26PM +0300, Alexey Budankov wrote:
> On 06.04.2018 18:31, Andi Kleen wrote:
> >> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
> >> index e47b2dbbdef3..9284048cf5b0 100644
> >> --- a/arch/x86/kernel/perf_regs.c
> >> +++ b/arch/x86/kernel/perf_regs.c
> >> @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
> >>  	 */
> >>  	regs_user_copy->bx = -1;
> >>  	regs_user_copy->bp = -1;
> >> +	if (user_64bit_mode(user_regs)) {
> > 
> > Why is it 64bit only? Should work on 32bit too.
> 
> bp register is a part of i386 syscall ABI 
> (http://man7.org/linux/man-pages/man2/syscall.2.html) 
> so not sure if it will make any sense for 32bit processes. 

Both 32bit and 64bit use the same frame pointer, if they
use frame pointer.

-Andi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
  2018-04-06 19:53     ` Andi Kleen
@ 2018-04-07  6:18       ` Alexey Budankov
  2018-04-09  5:23         ` Alexey Budankov
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Budankov @ 2018-04-07  6:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

On 06.04.2018 22:53, Andi Kleen wrote:
> On Fri, Apr 06, 2018 at 10:06:26PM +0300, Alexey Budankov wrote:
>> On 06.04.2018 18:31, Andi Kleen wrote:
>>>> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
>>>> index e47b2dbbdef3..9284048cf5b0 100644
>>>> --- a/arch/x86/kernel/perf_regs.c
>>>> +++ b/arch/x86/kernel/perf_regs.c
>>>> @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>>>>  	 */
>>>>  	regs_user_copy->bx = -1;
>>>>  	regs_user_copy->bp = -1;
>>>> +	if (user_64bit_mode(user_regs)) {
>>>
>>> Why is it 64bit only? Should work on 32bit too.
>>
>> bp register is a part of i386 syscall ABI 
>> (http://man7.org/linux/man-pages/man2/syscall.2.html) 
>> so not sure if it will make any sense for 32bit processes. 
> 
> Both 32bit and 64bit use the same frame pointer, if they
> use frame pointer.

Well let me check the same scenario for 32bit binary.
If the issue exists for it too and is fixed by the exposing bp
then it is obviously worth this improvement.

-Alexey

> 
> -Andi
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
  2018-04-07  6:18       ` Alexey Budankov
@ 2018-04-09  5:23         ` Alexey Budankov
  2018-04-10 14:41           ` Alexey Budankov
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Budankov @ 2018-04-09  5:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

On 07.04.2018 9:18, Alexey Budankov wrote:
> On 06.04.2018 22:53, Andi Kleen wrote:
>> On Fri, Apr 06, 2018 at 10:06:26PM +0300, Alexey Budankov wrote:
>>> On 06.04.2018 18:31, Andi Kleen wrote:
>>>>> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
>>>>> index e47b2dbbdef3..9284048cf5b0 100644
>>>>> --- a/arch/x86/kernel/perf_regs.c
>>>>> +++ b/arch/x86/kernel/perf_regs.c
>>>>> @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>>>>>  	 */
>>>>>  	regs_user_copy->bx = -1;
>>>>>  	regs_user_copy->bp = -1;
>>>>> +	if (user_64bit_mode(user_regs)) {
>>>>
>>>> Why is it 64bit only? Should work on 32bit too.
>>>
>>> bp register is a part of i386 syscall ABI 
>>> (http://man7.org/linux/man-pages/man2/syscall.2.html) 
>>> so not sure if it will make any sense for 32bit processes. 
>>
>> Both 32bit and 64bit use the same frame pointer, if they
>> use frame pointer.
> 
> Well let me check the same scenario for 32bit binary.

Here is what I have when profiling 32bit process on the patched 64bit 
kernel w/o 32bit frame-pointer exposure:

vmlinux ! try_to_wake_up - [unknown source file]
vmlinux ! wake_up_q + 0x3e - [unknown source file]
vmlinux ! futex_wake + 0x141 - [unknown source file]
vmlinux ! do_futex + 0x49b - [unknown source file]
vmlinux ! compat_SyS_futex + 0x123 - [unknown source file]
vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file]
vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file]
==> [vdso] ! __kernel_vsyscall + 0x8 - [unknown source file]
==> libc-2.26.so ! syscall + 0x26 - [unknown source file]
==> futex32-fp ! main + 0xba - [unknown source file]
==> libc-2.26.so ! __libc_start_main + 0xf2 - [unknown source file]

so stack is unwound till the top. However if I enable 32bit exposure 
then the stack looks like this:

vmlinux ! try_to_wake_up - [unknown source file]
vmlinux ! wake_up_q + 0x3e - [unknown source file]
vmlinux ! futex_wake + 0x141 - [unknown source file]
vmlinux ! do_futex + 0x49b - [unknown source file]
vmlinux ! compat_SyS_futex + 0x123 - [unknown source file]
vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file]
vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file]
==> [vdso] ! [vdso] + 0x1058 - [unknown source file]
==> vmlinux ! [Skipped stack frame(s)] + 0x1 - [unknown source file]

and x86_64 perf report --stdio shows this:

...
unwind: target platform=x86 is not supported
...
# Samples: 140K of event 'cycles'
# Event count (approx.): 93688193797
#
# Children      Self  Command     Shared Object     Symbol                                        
# ........  ........  ..........  ................  .........................
#
    86.00%    14.40%  futex32-fp  [kernel.vmlinux]  [k] entry_SYSENTER_compat
            |
            ---entry_SYSENTER_compat
               |          
                --71.60%--do_fast_syscall_32
                          |          
                          |--54.62%--compat_sys_futex
                          |          |          
                          |           --53.67%--do_futex

I am not sure it is worth exposing frame pointer for 32bit too.

-Alexey

> If the issue exists for it too and is fixed by the exposing bp
> then it is obviously worth this improvement.
> 
> -Alexey
> 
>>
>> -Andi
>>
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1]: perf/x86: store user space frame-pointer value on a sample
  2018-04-09  5:23         ` Alexey Budankov
@ 2018-04-10 14:41           ` Alexey Budankov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexey Budankov @ 2018-04-10 14:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-kernel

On 09.04.2018 8:23, Alexey Budankov wrote:
> On 07.04.2018 9:18, Alexey Budankov wrote:
>> On 06.04.2018 22:53, Andi Kleen wrote:
>>> On Fri, Apr 06, 2018 at 10:06:26PM +0300, Alexey Budankov wrote:
>>>> On 06.04.2018 18:31, Andi Kleen wrote:
>>>>>> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
>>>>>> index e47b2dbbdef3..9284048cf5b0 100644
>>>>>> --- a/arch/x86/kernel/perf_regs.c
>>>>>> +++ b/arch/x86/kernel/perf_regs.c
>>>>>> @@ -157,6 +157,15 @@ void perf_get_regs_user(struct perf_regs *regs_user,
>>>>>>  	 */
>>>>>>  	regs_user_copy->bx = -1;
>>>>>>  	regs_user_copy->bp = -1;
>>>>>> +	if (user_64bit_mode(user_regs)) {
>>>>>
>>>>> Why is it 64bit only? Should work on 32bit too.
>>>>
>>>> bp register is a part of i386 syscall ABI 
>>>> (http://man7.org/linux/man-pages/man2/syscall.2.html) 
>>>> so not sure if it will make any sense for 32bit processes. 
>>>
>>> Both 32bit and 64bit use the same frame pointer, if they
>>> use frame pointer.
>>
>> Well let me check the same scenario for 32bit binary.
> 
> Here is what I have when profiling 32bit process on the patched 64bit 
> kernel w/o 32bit frame-pointer exposure:
> 
> vmlinux ! try_to_wake_up - [unknown source file]
> vmlinux ! wake_up_q + 0x3e - [unknown source file]
> vmlinux ! futex_wake + 0x141 - [unknown source file]
> vmlinux ! do_futex + 0x49b - [unknown source file]
> vmlinux ! compat_SyS_futex + 0x123 - [unknown source file]
> vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file]
> vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file]
> ==> [vdso] ! __kernel_vsyscall + 0x8 - [unknown source file]
> ==> libc-2.26.so ! syscall + 0x26 - [unknown source file]
> ==> futex32-fp ! main + 0xba - [unknown source file]
> ==> libc-2.26.so ! __libc_start_main + 0xf2 - [unknown source file]
> 
> so stack is unwound till the top. However if I enable 32bit exposure 
> then the stack looks like this:
> 
> vmlinux ! try_to_wake_up - [unknown source file]
> vmlinux ! wake_up_q + 0x3e - [unknown source file]
> vmlinux ! futex_wake + 0x141 - [unknown source file]
> vmlinux ! do_futex + 0x49b - [unknown source file]
> vmlinux ! compat_SyS_futex + 0x123 - [unknown source file]
> vmlinux ! do_fast_syscall_32 + 0xb9 - [unknown source file]
> vmlinux ! entry_SYSENTER_compat + 0x7e - [unknown source file]
> ==> [vdso] ! [vdso] + 0x1058 - [unknown source file]
> ==> vmlinux ! [Skipped stack frame(s)] + 0x1 - [unknown source file]

Investigated more on that unwind failure case above and it turns out that
in case of system wide monitoring there may be several modules named equally 
but of different architecture, e.g. vdso like in that case above, so unwinding 
code needs to be smart enough to distinguish between the modules to choose 
proper one when walking stack on a sample. Well, lifting the restriction 
on the frame-pointer architecture looks reasonable.

In order to enable unwinding code for that mixed mode case above 
it is required to expose module architecture to unwinding code.

Thanks,
Alexey

> 
> and x86_64 perf report --stdio shows this:
> 
> ...
> unwind: target platform=x86 is not supported
> ...
> # Samples: 140K of event 'cycles'
> # Event count (approx.): 93688193797
> #
> # Children      Self  Command     Shared Object     Symbol                                        
> # ........  ........  ..........  ................  .........................
> #
>     86.00%    14.40%  futex32-fp  [kernel.vmlinux]  [k] entry_SYSENTER_compat
>             |
>             ---entry_SYSENTER_compat
>                |          
>                 --71.60%--do_fast_syscall_32
>                           |          
>                           |--54.62%--compat_sys_futex
>                           |          |          
>                           |           --53.67%--do_futex
> 
> I am not sure it is worth exposing frame pointer for 32bit too.
> 
> -Alexey
> 
>> If the issue exists for it too and is fixed by the exposing bp
>> then it is obviously worth this improvement.
>>
>> -Alexey
>>
>>>
>>> -Andi
>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-10 14:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-06 12:49 [PATCH v1]: perf/x86: store user space frame-pointer value on a sample Alexey Budankov
2018-04-06 15:31 ` Andi Kleen
2018-04-06 19:06   ` Alexey Budankov
2018-04-06 19:53     ` Andi Kleen
2018-04-07  6:18       ` Alexey Budankov
2018-04-09  5:23         ` Alexey Budankov
2018-04-10 14:41           ` Alexey Budankov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).