* Re: [fs] 936e92b615: unixbench.score 32.3% improvement
[not found] <20200708072346.GL3874@shao2-debian>
@ 2020-07-13 8:49 ` Shaokun Zhang
0 siblings, 0 replies; only message in thread
From: Shaokun Zhang @ 2020-07-13 8:49 UTC (permalink / raw)
To: kernel test robot
Cc: linux-fsdevel, linux-kernel, Will Deacon, Mark Rutland,
Peter Zijlstra, Alexander Viro, Boqun Feng, Yuqi Jin, lkp
Hi maintainers,
We debugged this issue on the Huawei Kunpeng 920, an ARM64 platform, and have also run further
tests on x86.
Since Rong has now reported the improvement on x86 as well, it seems worthwhile to apply this change.
Any comments on it?
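For reference, below is a rough userspace sketch of the false-sharing effect that the patch
avoids; it is not the kernel change itself (the patch only reorders fields in struct file so
that @f_count no longer shares a cache line with @f_mode). All names here (struct shared_file,
mode, count, the thread functions) are made up for illustration. Compile with cc -O2 -pthread;
with the alignas removed, both fields share one 64-byte line and the read-mostly loop is
typically measurably slower on common hardware.

/*
 * Illustrative userspace sketch only, not the kernel diff.  One thread
 * hammers an atomic refcount (like get_file()/fput() on f_count) while
 * another repeatedly reads a read-mostly field (like f_mode in
 * __fget_files()).  When both fields share a 64-byte cache line, every
 * refcount update invalidates the reader's line; aligning the refcount
 * to its own cache line avoids that.
 */
#include <stdalign.h>
#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

#define CACHELINE 64

struct shared_file {
	unsigned int mode;			/* read-mostly, like f_mode   */
	alignas(CACHELINE) atomic_long count;	/* hot refcount, like f_count */
};

static struct shared_file sf = { .mode = 0x1 };
static _Atomic int stop;

static void *ref_thread(void *arg)
{
	(void)arg;
	/* Constant read-modify-write traffic on the refcount's cache line. */
	while (!atomic_load_explicit(&stop, memory_order_relaxed))
		atomic_fetch_add_explicit(&sf.count, 1, memory_order_relaxed);
	return NULL;
}

static void *mode_thread(void *arg)
{
	unsigned long sink = 0, i;

	(void)arg;
	/* Read-mostly access pattern; volatile keeps the loads in the loop. */
	for (i = 0; i < 100000000UL; i++)
		sink += *(volatile unsigned int *)&sf.mode;
	atomic_store(&stop, 1);
	return (void *)sink;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, ref_thread, NULL);
	pthread_create(&b, NULL, mode_thread, NULL);
	pthread_join(b, NULL);
	pthread_join(a, NULL);
	printf("count = %ld\n", atomic_load(&sf.count));
	return 0;
}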
Thanks,
Shaokun
On 2020/7/8 15:23, kernel test robot wrote:
> Greetings,
>
> FYI, we noticed a 32.3% improvement of unixbench.score due to commit:
>
>
> commit: 936e92b615e212d08eb74951324bef25ba564c34 ("[PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode")
> url: https://github.com/0day-ci/linux/commits/Shaokun-Zhang/fs-Move-f_count-to-different-cacheline-with-f_mode/20200624-163511
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 5e857ce6eae7ca21b2055cca4885545e29228fe2
>
> in testcase: unixbench
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> runtime: 300s
> nr_task: 30%
> test: syscall
> cpufreq_governor: performance
> ucode: 0x5002f01
>
> test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
> test-url: https://github.com/kdlucas/byte-unixbench
>
>
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
> gcc-9/performance/x86_64-rhel-7.6/30%/debian-x86_64-20191114.cgz/300s/lkp-csl-2ap3/syscall/unixbench/0x5002f01
>
> commit:
> 5e857ce6ea ("Merge branch 'hch' (maccess patches from Christoph Hellwig)")
> 936e92b615 ("fs: Move @f_count to different cacheline with @f_mode")
>
> 5e857ce6eae7ca21 936e92b615e212d08eb74951324
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 2297 ± 2% +32.3% 3038 unixbench.score
> 171.74 +34.8% 231.55 unixbench.time.user_time
> 1.366e+09 +32.6% 1.812e+09 unixbench.workload
> 26472 ± 6% +1270.0% 362665 ±158% cpuidle.C1.usage
> 0.25 ± 2% +0.1 0.33 mpstat.cpu.all.usr%
> 8.32 ± 43% +129.7% 19.12 ± 63% sched_debug.cpu.clock.stddev
> 8.32 ± 43% +129.7% 19.12 ± 63% sched_debug.cpu.clock_task.stddev
> 2100 ± 2% -15.6% 1772 ± 9% sched_debug.cpu.nr_switches.min
> 373.34 ± 3% +12.4% 419.48 ± 6% sched_debug.cpu.ttwu_local.stddev
> 2740 ± 12% -72.3% 757.75 ±105% numa-vmstat.node0.nr_inactive_anon
> 3139 ± 8% -69.9% 946.25 ± 97% numa-vmstat.node0.nr_shmem
> 2740 ± 12% -72.3% 757.75 ±105% numa-vmstat.node0.nr_zone_inactive_anon
> 373.75 ± 51% +443.3% 2030 ± 26% numa-vmstat.node2.nr_inactive_anon
> 496.00 ± 19% +366.1% 2311 ± 29% numa-vmstat.node2.nr_shmem
> 373.75 ± 51% +443.3% 2030 ± 26% numa-vmstat.node2.nr_zone_inactive_anon
> 13728 ± 13% +148.1% 34056 ± 46% numa-vmstat.node3.nr_active_anon
> 78558 +11.3% 87431 ± 6% numa-vmstat.node3.nr_file_pages
> 9939 ± 8% +19.7% 11902 ± 13% numa-vmstat.node3.nr_shmem
> 13728 ± 13% +148.1% 34056 ± 46% numa-vmstat.node3.nr_zone_active_anon
> 11103 ± 13% -71.2% 3201 ± 99% numa-meminfo.node0.Inactive
> 10962 ± 12% -72.3% 3032 ±105% numa-meminfo.node0.Inactive(anon)
> 8551 ± 31% -29.4% 6034 ± 18% numa-meminfo.node0.Mapped
> 12560 ± 8% -69.9% 3786 ± 97% numa-meminfo.node0.Shmem
> 1596 ± 51% +415.6% 8230 ± 24% numa-meminfo.node2.Inactive
> 1496 ± 51% +442.8% 8122 ± 26% numa-meminfo.node2.Inactive(anon)
> 1984 ± 19% +366.1% 9248 ± 29% numa-meminfo.node2.Shmem
> 54929 ± 13% +148.0% 136212 ± 46% numa-meminfo.node3.Active
> 54929 ± 13% +148.0% 136206 ± 46% numa-meminfo.node3.Active(anon)
> 314216 +11.3% 349697 ± 6% numa-meminfo.node3.FilePages
> 747907 ± 2% +15.2% 861672 ± 9% numa-meminfo.node3.MemUsed
> 39744 ± 8% +19.7% 47580 ± 13% numa-meminfo.node3.Shmem
> 13.94 ± 6% -13.9 0.00 perf-profile.calltrace.cycles-pp.dnotify_flush.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +0.7 0.66 ± 8% perf-profile.calltrace.cycles-pp.__x64_sys_umask.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 31.64 ± 8% +3.4 35.08 ± 5% perf-profile.calltrace.cycles-pp.__fget_files.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.82 ± 8% +5.6 12.41 ± 12% perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 23.54 ± 58% +12.7 36.27 ± 5% perf-profile.calltrace.cycles-pp.ksys_dup.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 23.54 ± 58% +12.7 36.29 ± 5% perf-profile.calltrace.cycles-pp.__x64_sys_dup.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 13.98 ± 6% -14.0 0.00 perf-profile.children.cycles-pp.dnotify_flush
> 39.81 ± 6% -10.8 28.96 ± 9% perf-profile.children.cycles-pp.filp_close
> 40.13 ± 6% -10.7 29.44 ± 9% perf-profile.children.cycles-pp.__x64_sys_close
> 0.15 ± 10% -0.0 0.13 ± 8% perf-profile.children.cycles-pp.scheduler_tick
> 0.05 ± 8% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.__x64_sys_getuid
> 0.10 ± 7% +0.0 0.12 ± 8% perf-profile.children.cycles-pp.__prepare_exit_to_usermode
> 0.44 ± 7% +0.1 0.56 ± 6% perf-profile.children.cycles-pp.syscall_return_via_sysret
> 31.78 ± 8% +3.4 35.22 ± 5% perf-profile.children.cycles-pp.__fget_files
> 32.52 ± 8% +3.7 36.27 ± 5% perf-profile.children.cycles-pp.ksys_dup
> 32.54 ± 8% +3.8 36.30 ± 5% perf-profile.children.cycles-pp.__x64_sys_dup
> 6.86 ± 7% +5.6 12.45 ± 12% perf-profile.children.cycles-pp.fput_many
> 13.91 ± 6% -13.9 0.00 perf-profile.self.cycles-pp.dnotify_flush
> 18.05 ± 5% -1.6 16.41 ± 7% perf-profile.self.cycles-pp.filp_close
> 0.06 ± 6% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.__prepare_exit_to_usermode
> 0.09 ± 9% +0.0 0.11 ± 7% perf-profile.self.cycles-pp.do_syscall_64
> 0.16 ± 9% +0.0 0.20 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.30 ± 8% +0.1 0.36 ± 7% perf-profile.self.cycles-pp.entry_SYSCALL_64
> 0.44 ± 7% +0.1 0.56 ± 6% perf-profile.self.cycles-pp.syscall_return_via_sysret
> 31.61 ± 8% +3.4 35.00 ± 5% perf-profile.self.cycles-pp.__fget_files
> 6.81 ± 7% +5.6 12.38 ± 12% perf-profile.self.cycles-pp.fput_many
> 36623 ± 3% +11.5% 40822 ± 7% softirqs.CPU100.SCHED
> 16499 ± 40% +27.8% 21088 ± 35% softirqs.CPU122.RCU
> 16758 ± 41% +30.0% 21781 ± 35% softirqs.CPU126.RCU
> 178.25 ± 11% +7718.2% 13936 ±168% softirqs.CPU13.NET_RX
> 40883 ± 4% -6.9% 38055 ± 2% softirqs.CPU132.SCHED
> 16029 ± 41% +35.9% 21789 ± 33% softirqs.CPU144.RCU
> 16220 ± 43% +32.4% 21484 ± 35% softirqs.CPU145.RCU
> 16393 ± 39% +29.9% 21301 ± 32% softirqs.CPU146.RCU
> 16217 ± 39% +29.8% 21055 ± 35% softirqs.CPU147.RCU
> 37011 ± 12% +12.4% 41589 ± 5% softirqs.CPU149.SCHED
> 16127 ± 41% +34.5% 21685 ± 34% softirqs.CPU150.RCU
> 16131 ± 41% +32.3% 21333 ± 35% softirqs.CPU151.RCU
> 16558 ± 37% +28.2% 21230 ± 34% softirqs.CPU152.RCU
> 15863 ± 40% +34.1% 21266 ± 32% softirqs.CPU153.RCU
> 16044 ± 41% +32.7% 21286 ± 34% softirqs.CPU154.RCU
> 16057 ± 40% +34.9% 21658 ± 33% softirqs.CPU155.RCU
> 16352 ± 39% +31.0% 21423 ± 33% softirqs.CPU156.RCU
> 16006 ± 39% +33.4% 21348 ± 32% softirqs.CPU158.RCU
> 16300 ± 41% +32.0% 21521 ± 34% softirqs.CPU161.RCU
> 37546 ± 4% +13.5% 42605 ± 3% softirqs.CPU161.SCHED
> 16411 ± 41% +33.4% 21894 ± 33% softirqs.CPU162.RCU
> 16329 ± 41% +32.9% 21704 ± 35% softirqs.CPU163.RCU
> 16517 ± 39% +29.8% 21441 ± 34% softirqs.CPU164.RCU
> 16227 ± 41% +32.3% 21471 ± 34% softirqs.CPU165.RCU
> 16347 ± 40% +31.4% 21481 ± 35% softirqs.CPU166.RCU
> 16360 ± 43% +32.2% 21631 ± 35% softirqs.CPU167.RCU
> 36986 +11.3% 41148 ± 6% softirqs.CPU167.SCHED
> 16218 ± 44% +34.7% 21843 ± 33% softirqs.CPU189.RCU
> 16501 ± 39% +32.0% 21783 ± 33% softirqs.CPU52.RCU
> 17101 ± 41% +29.4% 22121 ± 35% softirqs.CPU68.RCU
> 1.087e+09 +20.9% 1.314e+09 perf-stat.i.branch-instructions
> 19778787 +22.1% 24144895 ± 16% perf-stat.i.branch-misses
> 22.88 -17.7% 18.84 ± 2% perf-stat.i.cpi
> 1.635e+09 +23.6% 2.021e+09 perf-stat.i.dTLB-loads
> 20648 ± 2% +218.4% 65736 ±110% perf-stat.i.dTLB-store-misses
> 1.023e+09 +24.8% 1.276e+09 perf-stat.i.dTLB-stores
> 78.10 +1.4 79.54 perf-stat.i.iTLB-load-miss-rate%
> 16169669 +8.2% 17493234 perf-stat.i.iTLB-load-misses
> 5.364e+09 +21.3% 6.507e+09 perf-stat.i.instructions
> 369.33 +11.8% 413.03 ± 5% perf-stat.i.instructions-per-iTLB-miss
> 0.41 ± 2% +83.3% 0.76 ± 16% perf-stat.i.metric.K/sec
> 19.79 +23.2% 24.39 perf-stat.i.metric.M/sec
> 4460149 ± 2% -45.1% 2447884 ± 14% perf-stat.i.node-load-misses
> 241219 ± 2% -58.8% 99443 ± 47% perf-stat.i.node-loads
> 1679821 ± 2% -4.4% 1605611 ± 3% perf-stat.i.node-store-misses
> 25.91 -17.6% 21.36 perf-stat.overall.cpi
> 82.51 +1.7 84.17 perf-stat.overall.iTLB-load-miss-rate%
> 331.21 +12.2% 371.62 perf-stat.overall.instructions-per-iTLB-miss
> 0.04 +21.3% 0.05 perf-stat.overall.ipc
> 1566 -8.4% 1435 perf-stat.overall.path-length
> 1.089e+09 +21.0% 1.318e+09 perf-stat.ps.branch-instructions
> 19801099 +21.7% 24102537 ± 15% perf-stat.ps.branch-misses
> 1.641e+09 +23.6% 2.028e+09 perf-stat.ps.dTLB-loads
> 20512 ± 2% +212.7% 64142 ±109% perf-stat.ps.dTLB-store-misses
> 1.027e+09 +24.8% 1.282e+09 perf-stat.ps.dTLB-stores
> 16239916 +8.2% 17567773 perf-stat.ps.iTLB-load-misses
> 5.378e+09 +21.4% 6.527e+09 perf-stat.ps.instructions
> 4485062 ± 2% -45.2% 2458026 ± 14% perf-stat.ps.node-load-misses
> 242388 ± 2% -59.0% 99493 ± 47% perf-stat.ps.node-loads
> 1689890 ± 2% -4.5% 1614182 ± 3% perf-stat.ps.node-store-misses
> 2.139e+12 +21.5% 2.6e+12 perf-stat.total.instructions
> 288.00 ± 13% +8910.9% 25951 ±168% interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
> 2042 ± 57% +190.2% 5927 ± 26% interrupts.CPU1.NMI:Non-maskable_interrupts
> 2042 ± 57% +190.2% 5927 ± 26% interrupts.CPU1.PMI:Performance_monitoring_interrupts
> 3.75 ± 34% +2373.3% 92.75 ±130% interrupts.CPU100.TLB:TLB_shootdowns
> 3510 ± 88% -85.1% 522.00 ±124% interrupts.CPU107.NMI:Non-maskable_interrupts
> 3510 ± 88% -85.1% 522.00 ±124% interrupts.CPU107.PMI:Performance_monitoring_interrupts
> 3813 ± 74% -73.3% 1018 ±150% interrupts.CPU110.NMI:Non-maskable_interrupts
> 3813 ± 74% -73.3% 1018 ±150% interrupts.CPU110.PMI:Performance_monitoring_interrupts
> 4536 ± 51% -97.1% 131.50 ± 8% interrupts.CPU111.NMI:Non-maskable_interrupts
> 4536 ± 51% -97.1% 131.50 ± 8% interrupts.CPU111.PMI:Performance_monitoring_interrupts
> 4476 ± 47% -97.5% 113.00 ± 19% interrupts.CPU112.NMI:Non-maskable_interrupts
> 4476 ± 47% -97.5% 113.00 ± 19% interrupts.CPU112.PMI:Performance_monitoring_interrupts
> 3522 ± 36% +92.7% 6787 ± 16% interrupts.CPU120.NMI:Non-maskable_interrupts
> 3522 ± 36% +92.7% 6787 ± 16% interrupts.CPU120.PMI:Performance_monitoring_interrupts
> 2888 ± 66% +117.5% 6283 ± 21% interrupts.CPU123.NMI:Non-maskable_interrupts
> 2888 ± 66% +117.5% 6283 ± 21% interrupts.CPU123.PMI:Performance_monitoring_interrupts
> 3109 ± 61% +132.5% 7230 ± 7% interrupts.CPU124.NMI:Non-maskable_interrupts
> 3109 ± 61% +132.5% 7230 ± 7% interrupts.CPU124.PMI:Performance_monitoring_interrupts
> 1067 ± 19% -21.6% 836.50 interrupts.CPU125.CAL:Function_call_interrupts
> 288.00 ± 13% +8910.9% 25951 ±168% interrupts.CPU13.34:PCI-MSI.524292-edge.eth0-TxRx-3
> 244.25 ± 96% -95.3% 11.50 ± 95% interrupts.CPU13.TLB:TLB_shootdowns
> 2056 ±117% +206.3% 6298 ± 20% interrupts.CPU130.NMI:Non-maskable_interrupts
> 2056 ±117% +206.3% 6298 ± 20% interrupts.CPU130.PMI:Performance_monitoring_interrupts
> 831.50 +21.4% 1009 ± 13% interrupts.CPU133.CAL:Function_call_interrupts
> 8.00 ± 29% +634.4% 58.75 ±119% interrupts.CPU133.RES:Rescheduling_interrupts
> 1629 ±159% +265.3% 5952 ± 29% interrupts.CPU139.NMI:Non-maskable_interrupts
> 1629 ±159% +265.3% 5952 ± 29% interrupts.CPU139.PMI:Performance_monitoring_interrupts
> 1660 ±159% +161.0% 4332 ± 61% interrupts.CPU141.NMI:Non-maskable_interrupts
> 1660 ±159% +161.0% 4332 ± 61% interrupts.CPU141.PMI:Performance_monitoring_interrupts
> 882.75 ±147% +542.5% 5671 ± 38% interrupts.CPU143.NMI:Non-maskable_interrupts
> 882.75 ±147% +542.5% 5671 ± 38% interrupts.CPU143.PMI:Performance_monitoring_interrupts
> 2600 ± 29% +68.8% 4389 ± 47% interrupts.CPU144.NMI:Non-maskable_interrupts
> 2600 ± 29% +68.8% 4389 ± 47% interrupts.CPU144.PMI:Performance_monitoring_interrupts
> 1494 ± 20% +91.3% 2859 ± 29% interrupts.CPU147.NMI:Non-maskable_interrupts
> 1494 ± 20% +91.3% 2859 ± 29% interrupts.CPU147.PMI:Performance_monitoring_interrupts
> 3657 ± 54% -96.3% 133.75 ± 8% interrupts.CPU15.NMI:Non-maskable_interrupts
> 3657 ± 54% -96.3% 133.75 ± 8% interrupts.CPU15.PMI:Performance_monitoring_interrupts
> 5165 ± 40% -97.8% 115.00 ± 26% interrupts.CPU16.NMI:Non-maskable_interrupts
> 5165 ± 40% -97.8% 115.00 ± 26% interrupts.CPU16.PMI:Performance_monitoring_interrupts
> 34.00 ±125% -84.6% 5.25 ± 49% interrupts.CPU186.RES:Rescheduling_interrupts
> 1033 ± 24% -19.0% 836.75 interrupts.CPU190.CAL:Function_call_interrupts
> 68.00 ± 28% +55.5% 105.75 ± 9% interrupts.CPU26.RES:Rescheduling_interrupts
> 882.25 ± 4% +6.3% 937.75 ± 7% interrupts.CPU32.CAL:Function_call_interrupts
> 139.25 ± 96% -74.0% 36.25 ± 72% interrupts.CPU32.TLB:TLB_shootdowns
> 848.25 ±130% +368.9% 3977 ± 56% interrupts.CPU35.NMI:Non-maskable_interrupts
> 848.25 ±130% +368.9% 3977 ± 56% interrupts.CPU35.PMI:Performance_monitoring_interrupts
> 958.25 ± 11% -10.6% 856.75 interrupts.CPU36.CAL:Function_call_interrupts
> 1903 ± 72% +127.9% 4337 ± 23% interrupts.CPU41.NMI:Non-maskable_interrupts
> 1903 ± 72% +127.9% 4337 ± 23% interrupts.CPU41.PMI:Performance_monitoring_interrupts
> 1320 ±158% +245.4% 4560 ± 32% interrupts.CPU47.NMI:Non-maskable_interrupts
> 1320 ±158% +245.4% 4560 ± 32% interrupts.CPU47.PMI:Performance_monitoring_interrupts
> 837.50 +5.2% 881.25 ± 4% interrupts.CPU61.CAL:Function_call_interrupts
> 1074 ± 28% -22.1% 836.50 interrupts.CPU69.CAL:Function_call_interrupts
> 1042 ± 12% -18.7% 847.50 ± 2% interrupts.CPU86.CAL:Function_call_interrupts
>
>
>
> unixbench.score
>
> 3200 +--------------------------------------------------------------------+
> | O O O |
> 3000 |-+ O O O O O O O O O |
> | O O O O |
> | O |
> 2800 |-+ |
> | |
> 2600 |-+ |
> | |
> 2400 |-+ |
> | +.+.. .+.+..+. +..+. .+. .+. .+..+.+.+..+.+.+. .+.|
> |.+.. + .+ +.+..+. + + +. + +. |
> 2200 |-+ + + + |
> | |
> 2000 +--------------------------------------------------------------------+
>
>
> unixbench.workload
>
> 1.9e+09 +-----------------------------------------------------------------+
> | O O O O |
> 1.8e+09 |-+ O O O O O O O O |
> | O O O O O |
> 1.7e+09 |-+ |
> | |
> 1.6e+09 |-+ |
> | |
> 1.5e+09 |-+ |
> | |
> 1.4e+09 |-+ +.+ .+..+.+ +.+. .+.. .+. .+..+. .+. .+.. .|
> |.+. .. : + + .+.+.. + + + +.+ + + +.+ |
> 1.3e+09 |-+ + : + + + |
> | + |
> 1.2e+09 +-----------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen
>