From: Jiri Olsa <jolsa@redhat.com>
To: Riccardo Mancini <rickyman7@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Subject: Re: [RFC PATCH v3 00/15] perf: add workqueue library and use it in synthetic-events
Date: Sun, 29 Aug 2021 23:59:41 +0200
Message-ID: <YSwDTWsihFxn6f1E@krava>
In-Reply-To: <cover.1629454773.git.rickyman7@gmail.com>

On Fri, Aug 20, 2021 at 12:53:46PM +0200, Riccardo Mancini wrote:
> Changes in v3:
>  - improved separation of threadpool and threadpool_entry method
>  - replaced shared workqueue with per-thread workqueue. This should
>    improve the performance on big machines (Jiri noticed in his
>    experiments a significant performance degradation after 15 threads
>    with the shared queue).
>  - improved error reporting in both threadpool and workqueue
>  - added lazy spinup of threads in workqueue [9/15]
>  - added global workqueue [10/15]
>  - setup global workqueue in perf record, top and synthesize bench
>    [12-14/15] and used it in synthetic events
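
The per-thread workqueue idea from the changelog can be illustrated with a small pthreads sketch. It is not the code from this series: every name here (wq_create, wq_queue, wq_destroy, wq_demo) is hypothetical, work is handed out round-robin, and there is no lazy spinup or flush. What it tries to show is the contention argument: each worker has its own mutex and FIFO, so a producer only takes the lock of the one worker it targets instead of a single lock shared by every thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical names, not the series' API. Each worker owns a FIFO and a lock. */
typedef void (*work_fn)(void *arg);
struct work { work_fn fn; void *arg; struct work *next; };
struct worker {
	pthread_t tid;
	pthread_mutex_t lock;	/* guards head, tail and stop */
	pthread_cond_t cond;
	struct work *head, *tail;
	int stop;
};
struct wq { int nr; atomic_uint next; struct worker *workers; };

static void *worker_fn(void *p)
{
	struct worker *w = p;
	for (;;) {
		pthread_mutex_lock(&w->lock);
		while (!w->head && !w->stop)
			pthread_cond_wait(&w->cond, &w->lock);
		struct work *wk = w->head;
		if (!wk) {			/* stop requested and queue drained */
			pthread_mutex_unlock(&w->lock);
			return NULL;
		}
		if (!(w->head = wk->next))
			w->tail = NULL;
		pthread_mutex_unlock(&w->lock);
		wk->fn(wk->arg);		/* run the item outside the lock */
		free(wk);
	}
}

struct wq *wq_create(int nr)
{
	struct wq *wq = calloc(1, sizeof(*wq));
	if (nr < 1 || !wq || !(wq->workers = calloc(nr, sizeof(*wq->workers)))) {
		free(wq);
		return NULL;
	}
	wq->nr = nr;
	atomic_init(&wq->next, 0);
	for (int i = 0; i < nr; i++) {	/* pthread errors ignored in this sketch */
		struct worker *w = &wq->workers[i];
		pthread_mutex_init(&w->lock, NULL);
		pthread_cond_init(&w->cond, NULL);
		pthread_create(&w->tid, NULL, worker_fn, w);
	}
	return wq;
}

int wq_queue(struct wq *wq, work_fn fn, void *arg)
{
	/* Round-robin pick: only the chosen worker's lock is taken. */
	struct worker *w = &wq->workers[atomic_fetch_add(&wq->next, 1) % (unsigned)wq->nr];
	struct work *wk = malloc(sizeof(*wk));
	if (!wk)
		return -1;
	*wk = (struct work){ .fn = fn, .arg = arg, .next = NULL };
	pthread_mutex_lock(&w->lock);
	*(w->tail ? &w->tail->next : &w->head) = wk;	/* append at the tail */
	w->tail = wk;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return 0;
}

void wq_destroy(struct wq *wq)
{
	for (int i = 0; i < wq->nr; i++) {
		struct worker *w = &wq->workers[i];
		pthread_mutex_lock(&w->lock);
		w->stop = 1;
		pthread_cond_signal(&w->cond);
		pthread_mutex_unlock(&w->lock);
		pthread_join(w->tid, NULL);	/* returns once its queue is drained */
	}
	free(wq->workers);
	free(wq);
}

static atomic_int done;
static void count_one(void *arg) { (void)arg; atomic_fetch_add(&done, 1); }

int wq_demo(int nr, int n)	/* returns how many queued items actually ran */
{
	struct wq *wq = wq_create(nr);
	if (!wq)
		return -1;
	atomic_store(&done, 0);
	for (int i = 0; i < n; i++)
		wq_queue(wq, count_one, NULL);
	wq_destroy(wq);
	return atomic_load(&done);
}
```

Build with -pthread. Calling wq_demo(4, 1000) queues 1000 counting items on 4 workers and should return 1000, since wq_destroy() lets each worker drain its own queue before it is joined. Round-robin dispatch keeps the sketch short, but with uneven work items one worker can fall behind while others sit idle; a real per-thread design has to handle that trade-off somehow.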


hi,
I ran the test again and the slowdown is still there,
stats are below

I'm doing the review and noticed a few strange things,
but so far nothing that would explain the slowdown

e.g. for 40 threads I can see only 35 threads spawned,
I need to check on that more

also I'll try to run some tests with parallel_for > 1 to cut
down some of the workqueue code. Have you run any tests on that?
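
Assuming "parallel_for > 1" refers to giving each work item more than one loop iteration (a chunk size above 1), the effect can be sketched with plain pthreads. The parallel_for() below is a hypothetical stand-in, not the helper from the series: threads claim `chunk` consecutive indices with one atomic fetch-add, so a larger chunk means fewer atomic operations and fewer scheduling decisions per index, at the cost of coarser load balancing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define PFOR_MAX_THREADS 64

/* Hypothetical signature: the loop body is called once per index. */
typedef void (*for_body)(int i, void *arg);

struct pfor {
	atomic_int cursor;	/* first index not yet claimed */
	int to, chunk;
	for_body body;
	void *arg;
};

static void *pfor_worker(void *p)
{
	struct pfor *pf = p;

	for (;;) {
		/* One atomic fetch-add claims `chunk` consecutive indices. */
		int start = atomic_fetch_add(&pf->cursor, pf->chunk);
		int end;

		if (start >= pf->to)
			return NULL;
		end = pf->to - start < pf->chunk ? pf->to : start + pf->chunk;
		for (int i = start; i < end; i++)
			pf->body(i, pf->arg);
	}
}

/* Run body(i) for i in [from, to) on nthreads threads, the caller included. */
int parallel_for(int from, int to, int chunk, int nthreads, for_body body, void *arg)
{
	pthread_t tids[PFOR_MAX_THREADS];
	struct pfor pf = { .to = to, .chunk = chunk, .body = body, .arg = arg };
	int created = 0;

	if (chunk < 1 || nthreads < 1 || nthreads > PFOR_MAX_THREADS)
		return -1;
	atomic_init(&pf.cursor, from);
	while (created < nthreads - 1 &&
	       pthread_create(&tids[created], NULL, pfor_worker, &pf) == 0)
		created++;
	pfor_worker(&pf);	/* a failed pthread_create only costs parallelism */
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return 0;
}

static void square(int i, void *arg) { ((long *)arg)[i] = (long)i * i; }

/* Fill out[i] = i * i for i < n with parallel_for() and return the sum. */
long sum_squares(int n, int chunk, int nthreads)
{
	static long out[1024];
	long sum = 0;

	if (n < 0 || n > 1024 || parallel_for(0, n, chunk, nthreads, square, out))
		return -1;
	for (int i = 0; i < n; i++)
		sum += out[i];
	return sum;
}
```

For example, sum_squares(1000, 16, 4) fills a 1000-entry array with i * i using 4 threads (the caller plus 3 created ones) and 16 indices per claim; the result should equal the sequential sum, 332833500, for any valid chunk size. Build with -pthread.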

jirka


---
new:                                                                                    old:
[root@dell-r440-01 perf]# ./perf bench internals synthesize -t                               [root@dell-r440-01 perf]# ./perf bench internals synthesize -t
# Running 'internals/synthesize' benchmark:                                                  # Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by                              Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:                                                                synthesizing events on CPU 0:
  Number of synthesis threads: 1                                                               Number of synthesis threads: 1
    Average synthesis took: 13970.400 usec (+- 339.216 usec)                                     Average synthesis took: 13563.700 usec (+- 348.354 usec)
    Average num. events: 2349.000 (+- 0.000)                                                     Average num. events: 2317.000 (+- 0.000)
    Average time per event 5.947 usec                                                            Average time per event 5.854 usec
  Number of synthesis threads: 2                                                               Number of synthesis threads: 2
    Average synthesis took: 15651.800 usec (+- 1612.798 usec)                                    Average synthesis took: 8433.600 usec (+- 83.725 usec)
    Average num. events: 2353.000 (+- 0.000)                                                     Average num. events: 2321.600 (+- 0.306)
    Average time per event 6.652 usec                                                            Average time per event 3.633 usec
  Number of synthesis threads: 3                                                               Number of synthesis threads: 3
    Average synthesis took: 12114.100 usec (+- 1208.208 usec)                                    Average synthesis took: 6716.200 usec (+- 16.889 usec)
    Average num. events: 2355.000 (+- 0.000)                                                     Average num. events: 2325.000 (+- 0.000)
    Average time per event 5.144 usec                                                            Average time per event 2.889 usec
  Number of synthesis threads: 4                                                               Number of synthesis threads: 4
    Average synthesis took: 9812.500 usec (+- 951.284 usec)                                      Average synthesis took: 5981.400 usec (+- 11.102 usec)
    Average num. events: 2357.000 (+- 0.000)                                                     Average num. events: 2323.000 (+- 0.000)
    Average time per event 4.163 usec                                                            Average time per event 2.575 usec
  Number of synthesis threads: 5                                                               Number of synthesis threads: 5
    Average synthesis took: 7338.300 usec (+- 661.620 usec)                                      Average synthesis took: 5538.800 usec (+- 12.990 usec)
    Average num. events: 2359.000 (+- 0.000)                                                     Average num. events: 2329.000 (+- 0.000)
    Average time per event 3.111 usec                                                            Average time per event 2.378 usec
  Number of synthesis threads: 6                                                               Number of synthesis threads: 6
    Average synthesis took: 7256.800 usec (+- 680.312 usec)                                      Average synthesis took: 5255.700 usec (+- 7.454 usec)
    Average num. events: 2361.000 (+- 0.000)                                                     Average num. events: 2331.000 (+- 0.000)
    Average time per event 3.074 usec                                                            Average time per event 2.255 usec
  Number of synthesis threads: 7                                                               Number of synthesis threads: 7
    Average synthesis took: 6119.600 usec (+- 479.409 usec)                                      Average synthesis took: 4836.200 usec (+- 8.132 usec)
    Average num. events: 2363.000 (+- 0.000)                                                     Average num. events: 2323.000 (+- 0.000)
    Average time per event 2.590 usec                                                            Average time per event 2.082 usec
  Number of synthesis threads: 8                                                               Number of synthesis threads: 8
    Average synthesis took: 5899.600 usec (+- 506.285 usec)                                      Average synthesis took: 4643.000 usec (+- 4.913 usec)
    Average num. events: 2365.000 (+- 0.000)                                                     Average num. events: 2335.000 (+- 0.000)
    Average time per event 2.495 usec                                                            Average time per event 1.988 usec
  Number of synthesis threads: 9                                                               Number of synthesis threads: 9
    Average synthesis took: 5459.100 usec (+- 431.725 usec)                                      Average synthesis took: 4526.600 usec (+- 5.207 usec)
    Average num. events: 2367.000 (+- 0.000)                                                     Average num. events: 2337.000 (+- 0.000)
    Average time per event 2.306 usec                                                            Average time per event 1.937 usec
  Number of synthesis threads: 10                                                              Number of synthesis threads: 10
    Average synthesis took: 4977.100 usec (+- 251.378 usec)                                      Average synthesis took: 4128.700 usec (+- 5.911 usec)
    Average num. events: 2369.000 (+- 0.000)                                                     Average num. events: 2327.800 (+- 0.533)
    Average time per event 2.101 usec                                                            Average time per event 1.774 usec
  Number of synthesis threads: 11                                                              Number of synthesis threads: 11
    Average synthesis took: 5428.700 usec (+- 513.409 usec)                                      Average synthesis took: 3890.800 usec (+- 15.051 usec)
    Average num. events: 2371.000 (+- 0.000)                                                     Average num. events: 2323.000 (+- 0.000)
    Average time per event 2.290 usec                                                            Average time per event 1.675 usec
  Number of synthesis threads: 12                                                              Number of synthesis threads: 12
    Average synthesis took: 5517.800 usec (+- 508.171 usec)                                      Average synthesis took: 3367.800 usec (+- 14.261 usec)
    Average num. events: 2373.000 (+- 0.000)                                                     Average num. events: 2343.000 (+- 0.000)
    Average time per event 2.325 usec                                                            Average time per event 1.437 usec
  Number of synthesis threads: 13                                                              Number of synthesis threads: 13
    Average synthesis took: 5279.500 usec (+- 432.819 usec)                                      Average synthesis took: 3974.300 usec (+- 12.437 usec)
    Average num. events: 2375.000 (+- 0.000)                                                     Average num. events: 2328.200 (+- 1.405)
    Average time per event 2.223 usec                                                            Average time per event 1.707 usec
  Number of synthesis threads: 14                                                              Number of synthesis threads: 14
    Average synthesis took: 4993.100 usec (+- 392.485 usec)                                      Average synthesis took: 4157.100 usec (+- 163.268 usec)
    Average num. events: 2377.000 (+- 0.000)                                                     Average num. events: 2319.800 (+- 0.533)
    Average time per event 2.101 usec                                                            Average time per event 1.792 usec
  Number of synthesis threads: 15                                                              Number of synthesis threads: 15
    Average synthesis took: 5584.700 usec (+- 379.862 usec)                                      Average synthesis took: 4065.700 usec (+- 25.656 usec)
    Average num. events: 2379.000 (+- 0.000)                                                     Average num. events: 2322.800 (+- 0.467)
    Average time per event 2.347 usec                                                            Average time per event 1.750 usec
  Number of synthesis threads: 16                                                              Number of synthesis threads: 16
    Average synthesis took: 5009.800 usec (+- 381.018 usec)                                      Average synthesis took: 4580.600 usec (+- 129.218 usec)
    Average num. events: 2381.000 (+- 0.000)                                                     Average num. events: 2324.800 (+- 0.200)
    Average time per event 2.104 usec                                                            Average time per event 1.970 usec
  Number of synthesis threads: 17                                                              Number of synthesis threads: 17
    Average synthesis took: 5543.300 usec (+- 376.064 usec)                                      Average synthesis took: 4089.700 usec (+- 54.096 usec)
    Average num. events: 2383.000 (+- 0.000)                                                     Average num. events: 2320.200 (+- 0.611)
    Average time per event 2.326 usec                                                            Average time per event 1.763 usec
  Number of synthesis threads: 18                                                              Number of synthesis threads: 18
    Average synthesis took: 5191.800 usec (+- 342.317 usec)                                      Average synthesis took: 4219.000 usec (+- 61.395 usec)
    Average num. events: 2385.000 (+- 0.000)                                                     Average num. events: 2323.000 (+- 0.516)
    Average time per event 2.177 usec                                                            Average time per event 1.816 usec
  Number of synthesis threads: 19                                                              Number of synthesis threads: 19
    Average synthesis took: 4647.000 usec (+- 273.303 usec)                                      Average synthesis took: 3998.800 usec (+- 49.221 usec)
    Average num. events: 2387.000 (+- 0.000)                                                     Average num. events: 2325.200 (+- 0.200)
    Average time per event 1.947 usec                                                            Average time per event 1.720 usec
  Number of synthesis threads: 20                                                              Number of synthesis threads: 20
    Average synthesis took: 4710.600 usec (+- 179.874 usec)                                      Average synthesis took: 3930.300 usec (+- 67.725 usec)
    Average num. events: 2389.000 (+- 0.000)                                                     Average num. events: 2319.000 (+- 0.000)
    Average time per event 1.972 usec                                                            Average time per event 1.695 usec
  Number of synthesis threads: 21                                                              Number of synthesis threads: 21
    Average synthesis took: 4959.100 usec (+- 318.519 usec)                                      Average synthesis took: 3696.400 usec (+- 30.953 usec)
    Average num. events: 2390.800 (+- 0.200)                                                     Average num. events: 2319.800 (+- 0.533)
    Average time per event 2.074 usec                                                            Average time per event 1.593 usec
  Number of synthesis threads: 22                                                              Number of synthesis threads: 22
    Average synthesis took: 4422.300 usec (+- 236.998 usec)                                      Average synthesis took: 3394.000 usec (+- 63.254 usec)
    Average num. events: 2392.800 (+- 0.200)                                                     Average num. events: 2319.000 (+- 0.000)
    Average time per event 1.848 usec                                                            Average time per event 1.464 usec
  Number of synthesis threads: 23                                                              Number of synthesis threads: 23
    Average synthesis took: 4640.800 usec (+- 245.604 usec)                                      Average synthesis took: 4091.100 usec (+- 134.320 usec)
    Average num. events: 2394.400 (+- 0.600)                                                     Average num. events: 2323.400 (+- 0.267)
    Average time per event 1.938 usec                                                            Average time per event 1.761 usec
  Number of synthesis threads: 24                                                              Number of synthesis threads: 24
    Average synthesis took: 4554.900 usec (+- 201.121 usec)                                      Average synthesis took: 3346.600 usec (+- 78.846 usec)
    Average num. events: 2395.800 (+- 0.854)                                                     Average num. events: 2321.000 (+- 0.667)
    Average time per event 1.901 usec                                                            Average time per event 1.442 usec
  Number of synthesis threads: 25                                                              Number of synthesis threads: 25
    Average synthesis took: 4668.300 usec (+- 248.254 usec)                                      Average synthesis took: 3794.300 usec (+- 191.158 usec)
    Average num. events: 2398.000 (+- 0.803)                                                     Average num. events: 2317.900 (+- 6.248)
    Average time per event 1.947 usec                                                            Average time per event 1.637 usec
  Number of synthesis threads: 26                                                              Number of synthesis threads: 26
    Average synthesis took: 4683.300 usec (+- 226.836 usec)                                      Average synthesis took: 3285.700 usec (+- 18.785 usec)
    Average num. events: 2399.000 (+- 1.265)                                                     Average num. events: 2317.100 (+- 6.198)
    Average time per event 1.952 usec                                                            Average time per event 1.418 usec
  Number of synthesis threads: 27                                                              Number of synthesis threads: 27
    Average synthesis took: 4590.300 usec (+- 158.000 usec)                                      Average synthesis took: 3604.600 usec (+- 35.487 usec)
    Average num. events: 2400.200 (+- 1.497)                                                     Average num. events: 2319.800 (+- 0.533)
    Average time per event 1.912 usec                                                            Average time per event 1.554 usec
  Number of synthesis threads: 28                                                              Number of synthesis threads: 28
    Average synthesis took: 4683.500 usec (+- 233.543 usec)                                      Average synthesis took: 3594.700 usec (+- 21.267 usec)
    Average num. events: 2402.400 (+- 1.688)                                                     Average num. events: 2319.200 (+- 0.200)
    Average time per event 1.950 usec                                                            Average time per event 1.550 usec
  Number of synthesis threads: 29                                                              Number of synthesis threads: 29
    Average synthesis took: 4830.700 usec (+- 235.730 usec)                                      Average synthesis took: 3531.700 usec (+- 15.935 usec)
    Average num. events: 2405.000 (+- 2.530)                                                     Average num. events: 2322.200 (+- 0.800)
    Average time per event 2.009 usec                                                            Average time per event 1.521 usec
  Number of synthesis threads: 30                                                              Number of synthesis threads: 30
    Average synthesis took: 4684.500 usec (+- 210.137 usec)                                      Average synthesis took: 3505.700 usec (+- 58.332 usec)
    Average num. events: 2407.600 (+- 2.495)                                                     Average num. events: 2315.100 (+- 5.900)
    Average time per event 1.946 usec                                                            Average time per event 1.514 usec
  Number of synthesis threads: 31                                                              Number of synthesis threads: 31
    Average synthesis took: 4823.300 usec (+- 213.480 usec)                                      Average synthesis took: 3431.100 usec (+- 42.022 usec)
    Average num. events: 2407.400 (+- 2.647)                                                     Average num. events: 2319.000 (+- 0.000)
    Average time per event 2.004 usec                                                            Average time per event 1.480 usec
  Number of synthesis threads: 32                                                              Number of synthesis threads: 32
    Average synthesis took: 4400.800 usec (+- 224.134 usec)                                      Average synthesis took: 3684.900 usec (+- 253.077 usec)
    Average num. events: 2407.400 (+- 2.544)                                                     Average num. events: 2319.200 (+- 0.200)
    Average time per event 1.828 usec                                                            Average time per event 1.589 usec
  Number of synthesis threads: 33                                                              Number of synthesis threads: 33
    Average synthesis took: 4452.600 usec (+- 231.034 usec)                                      Average synthesis took: 3233.000 usec (+- 24.035 usec)
    Average num. events: 2409.300 (+- 3.190)                                                     Average num. events: 2316.500 (+- 6.069)
    Average time per event 1.848 usec                                                            Average time per event 1.396 usec
  Number of synthesis threads: 34                                                              Number of synthesis threads: 34
    Average synthesis took: 4770.900 usec (+- 182.325 usec)                                      Average synthesis took: 3016.300 usec (+- 13.343 usec)
    Average num. events: 2411.200 (+- 3.032)                                                     Average num. events: 2322.800 (+- 0.200)
    Average time per event 1.979 usec                                                            Average time per event 1.299 usec
  Number of synthesis threads: 35                                                              Number of synthesis threads: 35
    Average synthesis took: 4442.800 usec (+- 248.017 usec)                                      Average synthesis took: 3246.700 usec (+- 71.765 usec)
    Average num. events: 2412.000 (+- 3.296)                                                     Average num. events: 2321.800 (+- 0.611)
    Average time per event 1.842 usec                                                            Average time per event 1.398 usec
  Number of synthesis threads: 36                                                              Number of synthesis threads: 36
    Average synthesis took: 5005.200 usec (+- 235.823 usec)                                      Average synthesis took: 3329.000 usec (+- 122.028 usec)
    Average num. events: 2410.400 (+- 2.750)                                                     Average num. events: 2310.800 (+- 8.133)
    Average time per event 2.077 usec                                                            Average time per event 1.441 usec
  Number of synthesis threads: 37                                                              Number of synthesis threads: 37
    Average synthesis took: 4654.000 usec (+- 208.838 usec)                                      Average synthesis took: 3011.600 usec (+- 46.026 usec)
    Average num. events: 2409.400 (+- 2.473)                                                     Average num. events: 2322.200 (+- 0.533)
    Average time per event 1.932 usec                                                            Average time per event 1.297 usec
  Number of synthesis threads: 38                                                              Number of synthesis threads: 38
    Average synthesis took: 4763.700 usec (+- 197.409 usec)                                      Average synthesis took: 3163.500 usec (+- 36.589 usec)
    Average num. events: 2406.200 (+- 2.462)                                                     Average num. events: 2319.000 (+- 0.000)
    Average time per event 1.980 usec                                                            Average time per event 1.364 usec
  Number of synthesis threads: 39                                                              Number of synthesis threads: 39
    Average synthesis took: 4333.100 usec (+- 194.456 usec)                                      Average synthesis took: 3170.900 usec (+- 30.538 usec)
    Average num. events: 2408.600 (+- 3.124)                                                     Average num. events: 2319.000 (+- 0.000)
    Average time per event 1.799 usec                                                            Average time per event 1.367 usec
  Number of synthesis threads: 40                                                              Number of synthesis threads: 40
    Average synthesis took: 4520.200 usec (+- 188.901 usec)                                      Average synthesis took: 3111.900 usec (+- 24.287 usec)
    Average num. events: 2409.600 (+- 3.184)                                                     Average num. events: 2307.600 (+- 7.600)
    Average time per event 1.876 usec                                                            Average time per event 1.349 usec
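
To put a number on the slowdown, here is a small C helper that works only from rows copied out of the output above (1, 2, 16 and 40 threads); the struct and function names are made up for this illustration. It computes the new/old ratio of the average synthesis time and checks that the reported "time per event" matches synthesis time divided by the average event count for those rows.

```c
#include <assert.h>
#include <stdio.h>

/* A few rows copied from the output above: times in usec, events averaged. */
struct row {
	int threads;
	double new_usec, old_usec;
	double new_events, old_events;
};

static const struct row rows[] = {
	{  1, 13970.4, 13563.7, 2349.0, 2317.0 },
	{  2, 15651.8,  8433.6, 2353.0, 2321.6 },
	{ 16,  5009.8,  4580.6, 2381.0, 2324.8 },
	{ 40,  4520.2,  3111.9, 2409.6, 2307.6 },
};
#define NR_ROWS (sizeof(rows) / sizeof(rows[0]))

/* new/old ratio of the average synthesis time; > 1 means new is slower. */
double slowdown(int threads)
{
	for (size_t i = 0; i < NR_ROWS; i++)
		if (rows[i].threads == threads)
			return rows[i].new_usec / rows[i].old_usec;
	return -1.0;	/* row not copied into this table */
}

/* "Average time per event" appears to be synthesis time / event count. */
double per_event(double usec, double events)
{
	return usec / events;
}

void print_summary(void)
{
	for (size_t i = 0; i < NR_ROWS; i++)
		printf("%2d threads: new/old = %.3f, per event new %.3f old %.3f usec\n",
		       rows[i].threads, slowdown(rows[i].threads),
		       per_event(rows[i].new_usec, rows[i].new_events),
		       per_event(rows[i].old_usec, rows[i].old_events));
}
```

For these rows the new code takes about 1.03x the old time with one thread, about 1.86x with 2 threads, about 1.09x with 16 and about 1.45x with 40.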

