LKML Archive on lore.kernel.org
From: Daniel Lezcano <firstname.lastname@example.org>
To: Morten Rasmussen <email@example.com>
Cc: Vincent Guittot <firstname.lastname@example.org>,
Peter Zijlstra <email@example.com>,
Ingo Molnar <firstname.lastname@example.org>,
Juri Lelli <email@example.com>,
Dietmar Eggemann <firstname.lastname@example.org>,
Steven Rostedt <email@example.com>,
Ben Segall <firstname.lastname@example.org>,
Qais Yousef <email@example.com>,
Valentin Schneider <firstname.lastname@example.org>
Subject: Re: [PATCH V2] sched: fair: Use the earliest break even
Date: Tue, 17 Mar 2020 18:07:43 +0100 [thread overview]
Message-ID: <email@example.com> (raw)
On 17/03/2020 15:30, Morten Rasmussen wrote:
> On Tue, Mar 17, 2020 at 02:48:51PM +0100, Daniel Lezcano wrote:
>> Hi Morten,
>> On 17/03/2020 08:56, Morten Rasmussen wrote:
>>> Hi Daniel,
>>> First, I think letting the scheduler know about desired minimum idle
>>> times is an interesting optimization if the overhead can be kept at a
>>> minimum. I do have a few comments about the patch though.
>>> On Thu, Mar 12, 2020 at 11:04:19AM +0100, Daniel Lezcano wrote:
>>>> On 12/03/2020 09:36, Vincent Guittot wrote:
>>>>> Hi Daniel,
>>>>> On Wed, 11 Mar 2020 at 21:28, Daniel Lezcano <firstname.lastname@example.org> wrote:
>>>>>> In the idle CPU selection process occurring in the slow path via the
>>>>>> find_idlest_group_cpu() function, we pick up in priority an idle CPU
>>>>>> with the shallowest idle state otherwise we fall back to the least
>>>>>> loaded CPU.
>>>>> The idea makes sense but this path is only used by fork and exec so
>>>>> I'm not sure about the real impact
>>>> I agree the fork / exec path is called much less often than the wake
>>>> path but it makes more sense for the decision.
>>> Looking at the flow in find_idlest_cpu(), AFAICT,
>>> find_idlest_group_cpu() is not actually making the final choice of CPU,
>>> so going through a lot of trouble there looking at idle states is
>>> pointless. Is there something I don't see?
>>> We follow sd->child until the groups are single CPUs, which means that
>>> find_idlest_group() actually makes the final choice, as the final group
>>> passed to find_idlest_group_cpu() is a single-CPU group. The flow has been
>>> like that for years, even before the initial idle-state awareness was added.
>>> I agree with Vincent, if this should really make a difference it should
>>> include wake-ups of existing tasks too, although I'm aware it would be a
>>> more invasive change. As said from the beginning, the idea is fine, but
>>> the current implementation should not make any measurable difference?
>> I see the wake-up path as so sensitive that I'm not comfortable making any
>> changes to it. That is the reason why the patch only changes the slow path.
> Right. I'm not against being cautious at all. It would be interesting to
> evaluate how bad it really is. The extra time-stamping business costs the
> same, so it really comes down to how much we dare to use the information in
> the fast path and change the CPU selection policy. And of course, how
> much can be gained by the change.
If the change provided by this patch is acceptable, I can give the fast
path a try.
>>>>>> In order to be more energy efficient without impacting performance,
>>>>>> let's use another criterion: the break even deadline.
>>>>>> At idle time, when we store the idle state the CPU is entering, we
>>>>>> compute the next deadline where the CPU could be woken up without
>>>>>> spending more energy to sleep.
>>> I don't follow the argument that sleeping longer should improve energy
>>> efficiency.
>> Maybe it is not explained correctly.
>> The patch is about selecting a CPU with the smallest break even deadline
>> value. In a group of idle CPUs in the same idle state, we will pick the
>> one with the smallest break even deadline, which is the one with the
>> highest probability it already reached its target residency.
>> It is best effort.
> Indeed. I get what the patch does, I just don't see how the patch
> improves energy efficiency.
If the CPU is woken up before it reaches the break even point, entering the
idle state costs more energy than it saved.
Am I misunderstanding your point?
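The point above can be illustrated with a small sketch. This is not kernel code: the struct fields, the simple power model and the function names are all assumptions made for illustration. The break even deadline is the idle entry time plus the minimum residency for which the energy saved while resident offsets the energy spent on the enter/exit transition.

```c
/*
 * Illustrative sketch only, not kernel code: all struct fields, the
 * simple power model, and the function names below are assumptions.
 * The break even deadline is the idle entry time plus the minimum
 * residency for which the energy saved in the idle state offsets the
 * energy spent on the enter/exit transition.
 */
#include <assert.h>
#include <stdint.h>

struct idle_state {
	uint64_t entry_latency_ns;  /* time spent entering the state */
	uint64_t exit_latency_ns;   /* time spent waking back up */
	uint64_t power_active_uw;   /* power drawn during transitions */
	uint64_t power_idle_uw;     /* power drawn while resident */
	uint64_t power_run_uw;      /* power drawn when not idling */
};

/* Residency after which sleeping saved more energy than it cost. */
static uint64_t break_even_residency_ns(const struct idle_state *s)
{
	uint64_t transition_ns = s->entry_latency_ns + s->exit_latency_ns;
	uint64_t transition_energy = transition_ns * s->power_active_uw;
	uint64_t saved_per_ns = s->power_run_uw - s->power_idle_uw;

	return transition_energy / saved_per_ns;
}

/* Absolute time after which waking the CPU no longer wastes energy. */
static uint64_t break_even_deadline_ns(uint64_t idle_start_ns,
				       const struct idle_state *s)
{
	return idle_start_ns + break_even_residency_ns(s);
}
```

A wake-up before break_even_deadline_ns() means the transition energy was not amortized, which is exactly the waste the patch tries to avoid.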
>>> The patch doesn't affect the number of idle state
>>> enter/exit cycles, so you spend the same amount of energy on those
>>> transitions. The main change is that idle time get spread out, so CPUs
>>> are less likely to be in the process of entering an idle state when they
>>> are asked to wake back up again.
>>> Isn't it fair to say that we expect the total number of wake-ups remains
>>> unchanged? Total busy and idle times across all CPUs should remain the
>>> same too? Unless chosen idle-state is changed, which I don't think we
>>> expect either, there should be no net effect on energy? The main benefit
>>> is reduced wake-up latency I think.
>>> Regarding chosen idle state, I'm wondering how this patch affects the
>>> cpuidle governor's idle state selection. Could the spreading of wake-ups
>>> trick governor to pick a shallower idle-state for some idle CPUs because
>>> we actively spread wake-ups rather than consolidating them? Just a
>>> thought.
>> Maybe I missed the point, why are we spreading the tasks?
> Picking the CPU with the smallest break-even time-stamp means you pick
> the CPU that has been idle longest in the shallowest idle-state. If you
> periodically one-shot spawn tasks at intervals long enough that
> the shallowest state is the same for several CPUs, you would end up
> picking the least recently used CPU each time effectively spreading the
> wake-ups across all the CPUs in the same state.
Ok, thanks for the clarification.
> Thinking more about it, it might not be a real problem: if one of the
> CPUs suddenly chooses a shallower idle-state, it would become the target
> of all new tasks from that point onwards.
>> We are taking the decision on the same sched domain, no?
> I'm not sure I get the relation to the sched_domain?
Never mind. I misunderstood your comment.
>>>>>> At selection time, we still prefer the CPU in the shallowest idle
>>>>>> state, but in addition we choose the one with the minimal break even
>>>>>> deadline instead of relying
>>>>>> on the idle_timestamp. When the CPU is idle, the timestamp has less
>>>>>> meaning because the CPU could have woken up and slept again several times
>>>>>> without exiting the idle loop. In this case the break even deadline is
>>>>>> more relevant as it increases the probability of choosing a CPU which
>>>>>> reached its break even.
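The selection policy quoted above can be sketched roughly as follows. This is illustrative only, not the actual find_idlest_group_cpu() code; the struct and function names are assumptions. Among idle CPUs, the shallowest idle state wins, and ties are broken by the earliest break even deadline.

```c
/*
 * Illustrative sketch of the selection policy, not the actual
 * find_idlest_group_cpu() code; the struct and function names are
 * assumptions. Among idle CPUs, prefer the shallowest idle state
 * (smallest exit latency) and break ties with the earliest break
 * even deadline, i.e. the CPU most likely past its residency.
 */
#include <assert.h>
#include <limits.h>
#include <stdint.h>

struct cpu_idle_info {
	int cpu;
	int is_idle;
	unsigned int exit_latency;     /* proxy for idle-state depth */
	uint64_t break_even_deadline;  /* absolute time, ns */
};

static int pick_idlest_cpu(const struct cpu_idle_info *cpus, int nr)
{
	unsigned int min_latency = UINT_MAX;
	uint64_t min_deadline = UINT64_MAX;
	int best = -1;

	for (int i = 0; i < nr; i++) {
		if (!cpus[i].is_idle)
			continue;
		/* A shallower state always wins; within the same state,
		 * the earliest break even deadline wins. */
		if (cpus[i].exit_latency < min_latency ||
		    (cpus[i].exit_latency == min_latency &&
		     cpus[i].break_even_deadline < min_deadline)) {
			min_latency = cpus[i].exit_latency;
			min_deadline = cpus[i].break_even_deadline;
			best = cpus[i].cpu;
		}
	}
	return best;
}
```

This also makes Morten's spreading observation concrete: among several CPUs in the same state, the least recently used one (earliest deadline) keeps being picked.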
>>> I guess you could improve the idle time stamping without adding the
>>> break-even time, they don't have to go together?
>> Yes, we can add the idle start time when entering idle in the
>> cpuidle_enter function, which is different from the idle_timestamp, which
>> marks when the idle task was scheduled. I sent an RFC for that.
>> However, each time we would like to inspect the deadline, we will have
>> to compute it, so IMO it makes more sense to pre-compute it when
>> entering idle in addition to the idle start.
>>  https://lkml.org/lkml/2020/3/16/902
> Yes, I saw that patch too. Seems to make sense :-)
>>>>>> Tested on:
>>>>>> - a synquacer 24 cores, 6 sched domains
>>>>>> - a hikey960 HMP 8 cores, 2 sched domains, with the EAS and energy probe
>>>>>> sched/perf and messaging do not show a performance regression. schbench,
>>>>>> adrestia and forkbench were each run 50 times.
>>>>>> The tools are described at https://lwn.net/Articles/724935/
>>>>>> | Synquacer | With break even | Without break even |
>>>>>> | schbench *99.0th | 14844.8 | 15017.6 |
>>>>>> | adrestia / periodic | 57.95 | 57 |
>>>>>> | adrestia / single | 49.3 | 55.4 |
>>>>> Have you got some figures or cpuidle statistics for the synquacer?
>>>> No, and we just noticed the synquacer has a bug in the firmware and
>>>> does not actually go to the idle states.
>>> I would also like some statistics to help understand what actually
>>> happens.
>>> I did some measurements on TX2, which only has one idle-state. I don't
>>> see the same trends as you do. adrestia single seems to be most affected
>>> by the patch, but _increases_ with the break_even patch rather than
>>> decrease. I don't trust adrestia too much though as the time resolution
>>> is low on TX2.
>>> TX2 tip break_even
>>> adrestia / single 5.21 5.51
>>> adrestia / periodic 5.75 5.67
>>> schbench 99.0th 45465.6 45376.0
>>> hackbench 27.9851 27.9775
>>> adrestia: Avg of 100 runs: adrestia -l 25000
>>> schbench: Avg of 10 runs: schbench -m16 -t64
>>> hackbench: Avg of 10 runs: hackbench -g 20 -T 256 -l 100000
>> Thanks for testing. Is that a Jetson TX2 from Nvidia? If that is the
>> case, IIRC, it has some kind of switcher for the CPUs in the firmware, I
>> don't know how that can interact with the testing.
> Sorry, I should have been clearer. It is a ThunderX2. 2x 32-core (128
> threads) = 256 HW threads.
Indeed, that is different :)
I have a synquacer but we found a bug in its firmware; hopefully it can
be fixed.
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Thread overview: 19+ messages
2020-03-11 20:26 Daniel Lezcano
2020-03-12 8:36 ` Vincent Guittot
2020-03-12 10:04 ` Daniel Lezcano
2020-03-12 12:27 ` Vincent Guittot
2020-03-13 12:15 ` Daniel Lezcano
2020-03-13 13:15 ` Vincent Guittot
2020-03-13 13:17 ` Daniel Lezcano
2020-03-13 13:21 ` Vincent Guittot
2020-03-17 7:56 ` Morten Rasmussen
2020-03-17 13:48 ` Daniel Lezcano
2020-03-17 14:30 ` Morten Rasmussen
2020-03-17 17:07 ` Daniel Lezcano [this message]
2020-03-18 8:24 ` Morten Rasmussen
2020-03-18 10:17 ` Daniel Lezcano
2020-03-18 14:38 ` Morten Rasmussen
2020-03-18 7:51 ` Vincent Guittot
2020-03-18 8:33 ` Morten Rasmussen
2020-03-17 10:56 ` Valentin Schneider
2020-03-17 13:59 ` Daniel Lezcano