LKML Archive on lore.kernel.org
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Pratik Rajesh Sampat <psampat@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, rafael.j.wysocki@intel.com,
peterz@infradead.org, dsmythies@telus.net,
daniel.lezcano@linaro.org, ego@linux.vnet.ibm.com,
svaidy@linux.ibm.com, pratik.sampat@in.ibm.com,
pratik.r.sampat@gmail.com
Subject: Re: [RFC 0/1] Weighted approach to gather and use history in TEO governor
Date: Tue, 25 Feb 2020 10:43:06 +0530
Message-ID: <20200225051306.GG12846@in.ibm.com>
In-Reply-To: <20200222070002.12897-1-psampat@linux.ibm.com>
Hello Pratik,
On Sat, Feb 22, 2020 at 12:30:01PM +0530, Pratik Rajesh Sampat wrote:
> Currently the TEO governor apart from the TEO timer and hit/miss/early
> hit buckets; also gathers history of 8 intervals and if there are
> significant idle durations less than the current, then it decides if a
> shallower state must be chosen.
>
> The current sliding history window does do a fair job at prediction,
> however, the hard-coded window can be a limiting factor for an accurate
> prediction and having the window size increase can also linearly affect
> both space and time complexity of the prediction.
>
> To complement the current moving window history, an approach is devised
> where each idle state separately maintains a weight for itself and its
> counterpart idle states to form a probability distribution.
>
> When a decision needs to be made, the TEO governor selects an idle state
> based on its timer and other hits/early hits metric. After which, the
> probability distribution of that selected idle state is looked at which
> gives insight into how probable that state is to occur if picked.
>
> The probability distribution is nothing but a n*n matrix, where
> n = drv->state_count.
> Each entry in the array signifies a weight for that row.
> The weights can vary from the range [0-10000].
>
> For example:
> state_mat[1][2] = 3000 means that previously when state 1 was selected,
> the probability that state 2 will occur is 30%.
Could you clarify what this means? Do you mean that when state 1 is
selected, the probability that the CPU will be in state 1 for the
duration corresponding to state 2's residency is 30%?
Furthermore, does this mean that idle state selection has O(n)
complexity, where n is the number of idle states, since we want to
select a state where we are more likely to reside?
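To make the O(n) point concrete, here is a minimal sketch of how I
understand a weighted pick over one row of the matrix would work. It is
not taken from the patch: weighted_pick(), its parameters, and the
caller-supplied random value r are hypothetical names used only for
illustration, and the real code may well be structured differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: pick an index from row[0..n-1] with probability
 * proportional to its weight. `r` is a random number supplied by the
 * caller (passing it in keeps the sketch deterministic).
 * Cost is O(n): one pass to sum the weights, one pass to locate the slot.
 */
static int weighted_pick(const unsigned int *row, size_t n, unsigned int r)
{
	unsigned int total = 0, acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += row[i];
	assert(total > 0);	/* at least one state must carry weight */

	r %= total;		/* map r into [0, total) */
	for (i = 0; i < n; i++) {
		acc += row[i];
		if (r < acc)
			return (int)i;
	}
	return (int)(n - 1);	/* not reached when total > 0 */
}
```

With a row of {1000, 3000, 6000}, values of r in [0, 1000) pick index 0,
[1000, 4000) pick index 1, and [4000, 10000) pick index 2, so both loops
scale with the number of idle states.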
> The trailing zeros correspond to having more resolution while increasing
> or reducing the weights for correction.
>
> Currently, for selection of an idle state based on probabilities, a
> weighted random number generator is used to choose one of the idle
> states. Naturally, the states with higher weights are more likely to be
> chosen.
>
> On wakeup, the weights are updated. The state with which it should have
> woken up with (could be the hit / miss / early hit state) is increased
> in weight by the "LEARNING_RATE" % and the rest of the states for that
> index are reduced by the same factor.
So we update the weight in only one cell?
To use the example above, if we selected state 1 and resided in it
for a duration corresponding to state 2's residency, will we update
only state_mat[1][2]?
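For reference, this is how I read the update described in the cover
letter, as a rough sketch rather than the patch's actual code. The
assumptions are mine: update_row() is a hypothetical name, LEARNING_RATE
is given a made-up value of 5, and the percentage is applied to each
cell's current weight. If that reading is right, every cell in the
selected state's row changes, not just one.

```c
#include <assert.h>
#include <stddef.h>

#define WEIGHT_MAX	10000	/* weights lie in [0, 10000], per the cover letter */
#define LEARNING_RATE	5	/* percent; hypothetical value, not from the patch */

/*
 * Illustrative only: adjust one row of the weight matrix after a wakeup.
 * `observed` is the state the CPU should have been in. Its weight grows
 * by LEARNING_RATE percent of its current value; every other weight in
 * the row shrinks by LEARNING_RATE percent of its own current value.
 */
static void update_row(unsigned int *row, size_t n, size_t observed)
{
	size_t i;

	assert(observed < n);
	for (i = 0; i < n; i++) {
		unsigned int delta = row[i] * LEARNING_RATE / 100;

		if (i == observed) {
			row[i] += delta;
			if (row[i] > WEIGHT_MAX)
				row[i] = WEIGHT_MAX;	/* clamp to the top of the range */
		} else {
			row[i] -= delta;	/* delta <= row[i], so no underflow */
		}
	}
}
```

Under these assumptions a row of {2000, 3000, 5000} with observed = 1
becomes {1900, 3150, 4750}. Note that with a purely multiplicative step
a weight that reaches 0 stays at 0.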
>
> The advantage of this approach is that unlimited history of idle states
> can be maintained in constant overhead, which can help in more accurate
> prediction for choosing idle states.
>
> The advantage of unlimited history can become a possible disadvantage as
> the lifetime history for that thread may make the weights stale and
> influence the choosing of idle states which may not be relevant
> anymore.
Can the effect of this staleness be observed? For instance, if we
have a particular idle entry/exit pattern for a very long duration,
say a few tens of minutes, and then the idle entry/exit pattern changes,
how bad will the weighted approach be compared to the current TEO
governor?
> Aging the weights could be a solution for that, although this RFC does
> not cover the implementation for that.
>
> Having a finer view of the history in addition to weighted randomized
> salt seems to show some promise in terms of saving power without
> compromising performance.
>
> Benchmarks:
> Note: Wt. TEO governor represents the governor after the proposed change
>
> Schbench
> ========
> Benchmarks wakeup latencies
> Scale of measurement:
> 1. 99th percentile latency - usec
> 2. Power - Watts
>
> Command: $ schbench -c 30000 -s 30000 -m 6 -r 30 -t <Threads>
> Varying parameter: -t
>
> Machine: IBM POWER 9
>
> +--------+-------------+-----------------+-----------+-----------------+
> | Threads| TEO latency | Wt. TEO latency | TEO power | Wt. TEO power |
> +--------+-------------+-----------------+-----------+-----------------+
> | 2 | 979 | 949 ( +3.06%) | 38 | 36 ( +5.26%) |
> | 4 | 997 | 1042 ( -4.51%) | 51 | 39 ( +23.52%) |
> | 8 | 1158 | 1050 ( +9.32%) | 89 | 63 ( +29.21%) |
> | 16 | 1138 | 1135 ( +0.26%) | 105 | 117 ( -11.42%) |
> +--------+-------------+-----------------+-----------+-----------------+
>
> Sleeping Ebizzy
> ===============
> Program to generate workloads resembling web server workloads.
> The benchmark is customized to allow for a sleep interval -i
> Scale of measurement:
> 1. Number of records/s
> 2. systime (s)
>
> Parameters:
> 1. -m => Always use mmap instead of malloc
> 2. -M => Never use mmap
> 3. -S <seconds> => Number of seconds to run
> 4. -i <interval> => Sleep interval
>
> Machine: IBM POWER 9
>
> +-------------------+-------------+-------------------+-----------+---------------+
> | Parameters | TEO records | Wt. TEO records | TEO power | Wt. TEO power |
> +-------------------+-------------+-------------------+-----------+---------------+
> | -S 60 -i 10000 | 1115000 | 1198081 ( +7.45%) | 149 | 150 ( -0.66%) |
> | -m -S 60 -i 10000 | 15879 | 15513 ( -2.30%) | 23 | 22 ( +4.34%) |
> | -M -S 60 -i 10000 | 72887 | 77546 ( +6.39%) | 104 | 103 ( +0.96%) |
> +-------------------+-------------+-------------------+-----------+---------------+
>
> Hackbench
> =========
> Creates a specified number of pairs of schedulable entities
> which communicate via either sockets or pipes and time how long it
> takes for each pair to send data back and forth.
> Scale of measurement:
> 1. Time (s)
> 2. Power (watts)
>
> Command: Sockets: $ hackbench -l <Messages>
> Pipes : $ hackbench --pipe -l <Messages>
> Varying parameter: -l
>
> Machine: IBM POWER 9
>
> +----------+------------+-------------------+----------+-------------------+
> | Messages | TEO socket | Wt. TEO socket | TEO pipe | Wt. TEO pipe |
> +----------+------------+-------------------+----------+-------------------+
> | 100 | 0.042 | 0.043 ( -2.32%) | 0.031 | 0.032 ( +3.12%) |
> | 1000 | 0.258 | 0.272 ( +5.14%) | 0.301 | 0.312 ( -3.65%) |
> | 10000 | 2.397 | 2.441 ( +1.80%) | 5.642 | 5.092 ( +9.74%) |
> | 100000 | 23.691 | 23.730 ( -0.16%) | 57.762 | 57.857 ( -0.16%) |
> | 1000000 | 234.103 | 233.841 ( +0.11%) | 559.807 | 592.304 ( -5.80%) |
> +----------+------------+-------------------+----------+-------------------+
>
> Power :Socket: Consistent between 135-140 watts for both TEO and Wt. TEO
> Pipe: Consistent between 125-130 watts for both TEO and Wt. TEO
>
>
Could you also provide power measurements for the duration when the
system is completely idle, for each variant of the TEO governor?
Is it the case that the benefits we are seeing above are only due
to Wt. TEO being more conservative than the TEO governor by always
choosing a shallower state?
>
> Pratik Rajesh Sampat (1):
> Weighted approach to gather and use history in TEO governor
>
> drivers/cpuidle/governors/teo.c | 95 +++++++++++++++++++++++++++++++--
> 1 file changed, 90 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>
--
Thanks and Regards
gautham.