LKML Archive on lore.kernel.org
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Pratik Rajesh Sampat <psampat@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, rafael.j.wysocki@intel.com,
	peterz@infradead.org, dsmythies@telus.net,
	daniel.lezcano@linaro.org, ego@linux.vnet.ibm.com,
	svaidy@linux.ibm.com, pratik.sampat@in.ibm.com,
	pratik.r.sampat@gmail.com
Subject: Re: [RFC 0/1] Weighted approach to gather and use history in TEO governor
Date: Tue, 25 Feb 2020 10:43:06 +0530
Message-ID: <20200225051306.GG12846@in.ibm.com>
In-Reply-To: <20200222070002.12897-1-psampat@linux.ibm.com>

Hello Pratik,

On Sat, Feb 22, 2020 at 12:30:01PM +0530, Pratik Rajesh Sampat wrote:
> Currently the TEO governor apart from the TEO timer and hit/miss/early
> hit buckets; also gathers history of 8 intervals and if there are
> significant idle durations less than the current, then it decides if a
> shallower state must be chosen.
> 
> The current sliding history window does do a fair job at prediction,
> however, the hard-coded window can be a limiting factor for an accurate
> prediction and having the window size increase can also linearly affect
> both space and time complexity of the prediction.
> 
> To complement the current moving window history, an approach is devised
> where each idle state separately maintains a weight for itself and its
> counterpart idle states to form a probability distribution.
> 
> When a decision needs to be made, the TEO governor selects an idle state
> based on its timer and other hits/early hits metric. After which, the
> probability distribution of that selected idle state is looked at which
> gives insight into how probable that state is to occur if picked.
> 
> The probability distribution is nothing but a n*n matrix, where
> n = drv->state_count.
> Each entry in the array signifies a weight for that row.
> The weights can vary from the range [0-10000].
> 
> For example:
> state_mat[1][2] = 3000 means that previously when state 1 was selected,
> the probability that state 2 will occur is 30%.

Could you clarify what this means? Do you mean that when state 1 is
selected, the probability that the CPU will be in state 1 for the
duration corresponding to state 2's residency is 30%?

Furthermore, does this mean that idle state selection has O(n)
complexity, where n is the number of idle states, since we want to
select a state where we are more likely to reside?
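
To make the O(n) concern concrete, here is a minimal sketch in C of a
weighted pick over one row of such a matrix. It is not taken from the
patch: NSTATES, WEIGHT_TOTAL, pick_state() and the example weights are
hypothetical, and it assumes that each row sums to 10000 and that the
caller supplies a random draw r in [0, WEIGHT_TOTAL).

```c
#define NSTATES 4          /* hypothetical stand-in for drv->state_count */
#define WEIGHT_TOTAL 10000 /* assumed row sum, from the [0-10000] range */

/*
 * Pick a column from one row of the weight matrix.
 * "r" is a random draw in [0, WEIGHT_TOTAL). Walking the cumulative
 * weights is the O(n) step: in the worst case every state is visited.
 */
static int pick_state(const unsigned int row[NSTATES], unsigned int r)
{
	unsigned int cumulative = 0;
	int i;

	for (i = 0; i < NSTATES; i++) {
		cumulative += row[i];
		if (r < cumulative)
			return i;
	}

	/* Only reached if the row sums to no more than r. */
	return NSTATES - 1;
}
```

With a row such as { 1000, 5000, 3000, 1000 } for state 1, a draw of
6500 lands in the share owned by state 2, which is the "30%" case from
the example above.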

> The trailing zeros correspond to having more resolution while increasing
> or reducing the weights for correction.
> 
> Currently, for selection of an idle state based on probabilities, a
> weighted random number generator is used to choose one of the idle
> states. Naturally, the states with higher weights are more likely to be
> chosen.
> 
> On wakeup, the weights are updated. The state with which it should have
> woken up with (could be the hit / miss / early hit state) is increased
> in weight by the "LEARNING_RATE" % and the rest of the states for that
> index are reduced by the same factor.

So we update the weight in just one cell?

To use the example above, if we selected state 1 and resided in it
for a duration corresponding to state 2's residency, will we only
update state_mat[1][2]?
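
For comparison, one possible reading of the quoted description is that
the whole row changes on every wakeup, not a single cell. The sketch
below is only that reading, written under assumptions: update_row(),
NSTATES and the LEARNING_RATE value of 5 are hypothetical, and the rule
(each other cell loses LEARNING_RATE percent of its own weight, and the
observed cell gains the total) is a guess, not the code from the patch.

```c
#define NSTATES 4       /* hypothetical stand-in for drv->state_count */
#define LEARNING_RATE 5 /* percent; hypothetical value */

/*
 * Reinforce the column "observed" in one row of the weight matrix.
 * Every other cell gives up LEARNING_RATE percent of its weight and
 * the observed cell collects the total, so the row sum is unchanged.
 */
static void update_row(unsigned int row[NSTATES], int observed)
{
	unsigned int gained = 0;
	int i;

	for (i = 0; i < NSTATES; i++) {
		unsigned int cut;

		if (i == observed)
			continue;

		cut = row[i] * LEARNING_RATE / 100;
		row[i] -= cut;
		gained += cut;
	}

	row[observed] += gained;
}
```

Under this reading, update_row(state_mat[1], 2) touches all n cells of
row 1, so the update is O(n) as well.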

> 
> The advantage of this approach is that unlimited history of idle states
> can be maintained in constant overhead, which can help in more accurate
> prediction for choosing idle states.
> 
> The advantage of unlimited history can become a possible disadvantage as
> the lifetime history for that thread may make the weights stale and
> influence the choosing of idle states which may not be relevant
> anymore.

Can the effect of this staleness be observed? For instance, if we
have a particular idle entry/exit pattern for a very long duration,
say a few tens of minutes, and then the idle entry/exit pattern
changes, how bad will the weighted approach be compared to the
current TEO governor?
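
A toy model can at least frame the question. The sketch below counts
how many consecutive wakeups matching a new pattern are needed before
its weight overtakes the old one. Everything in it is an assumption
made for illustration: reinforce(), wakeups_until_dominant(), the
LEARNING_RATE value of 5 and the update rule itself (other cells lose
LEARNING_RATE percent of their weight, the observed cell gains the
total) are hypothetical and not taken from the patch.

```c
#define NSTATES 4          /* hypothetical stand-in for drv->state_count */
#define LEARNING_RATE 5    /* percent; hypothetical value */
#define MAX_WAKEUPS 100000 /* safety cap so the loop always terminates */

/* Assumed update rule: shift LEARNING_RATE% of every other cell to "observed". */
static void reinforce(unsigned int row[NSTATES], int observed)
{
	unsigned int gained = 0;
	int i;

	for (i = 0; i < NSTATES; i++) {
		unsigned int cut;

		if (i == observed)
			continue;

		cut = row[i] * LEARNING_RATE / 100;
		row[i] -= cut;
		gained += cut;
	}

	row[observed] += gained;
}

/*
 * Count the wakeups that must match "new_state" before its weight
 * exceeds the weight accumulated by "old_state".
 */
static int wakeups_until_dominant(unsigned int row[NSTATES],
				  int old_state, int new_state)
{
	int n = 0;

	while (row[new_state] <= row[old_state] && n < MAX_WAKEUPS) {
		reinforce(row, new_state);
		n++;
	}

	return n;
}
```

Starting from a row where one state holds all of the weight gives the
worst case for this model. Under this particular rule the old weight
decays geometrically, so the answer depends mostly on LEARNING_RATE;
the rule used by the patch may behave differently.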

> Aging the weights could be a solution for that, although this RFC does
> not cover the implementation for that.
> 
> Having a finer view of the history in addition to weighted randomized
> salt seems to show some promise in terms of saving power without
> compromising performance.
> 
> Benchmarks:
> Note: Wt. TEO governor represents the governor after the proposed change
> 
> Schbench
> ========
> Benchmarks wakeup latencies
> Scale of measurement:
> 1. 99th percentile latency - usec
> 2. Power - Watts
> 
> Command: $ schbench -c 30000 -s 30000 -m 6 -r 30 -t <Threads>
> Varying parameter: -t
> 
> Machine: IBM POWER 9
> 
> +--------+-------------+-----------------+-----------+-----------------+
> | Threads| TEO latency | Wt. TEO latency | TEO power | Wt. TEO power   |
> +--------+-------------+-----------------+-----------+-----------------+
> | 2      | 979         | 949  ( +3.06%)  | 38        | 36  ( +5.26%)   |
> | 4      | 997         | 1042 ( -4.51%)  | 51        | 39  ( +23.52%)  |
> | 8      | 1158        | 1050 ( +9.32%)  | 89        | 63  ( +29.21%)  |
> | 16     | 1138        | 1135 ( +0.26%)  | 105       | 117 ( -11.42%)  |
> +--------+-------------+-----------------+-----------+-----------------+
> 
> Sleeping Ebizzy
> ===============
> Program to generate workloads resembling web server workloads.
> The benchmark is customized to allow for a sleep interval -i
> Scale of measurement:
> 1. Number of records/s
> 2. systime (s)
> 
> Parameters:
> 1. -m => Always use mmap instead of malloc
> 2. -M => Never use mmap
> 3. -S <seconds> => Number of seconds to run
> 4. -i <interval> => Sleep interval
> 
> Machine: IBM POWER 9
> 
> +-------------------+-------------+-------------------+-----------+---------------+
> | Parameters        | TEO records | Wt. TEO records   | TEO power | Wt. TEO power |
> +-------------------+-------------+-------------------+-----------+---------------+
> | -S 60 -i 10000    | 1115000     | 1198081 ( +7.45%) | 149       | 150 ( -0.66%) |
> | -m -S 60 -i 10000 | 15879       | 15513   ( -2.30%) | 23        | 22  ( +4.34%) |
> | -M -S 60 -i 10000 | 72887       | 77546   ( +6.39%) | 104       | 103 ( +0.96%) |
> +-------------------+-------------+-------------------+-----------+---------------+
> 
> Hackbench
> =========
> Creates a specified number of pairs of schedulable entities
> which communicate via either sockets or pipes and time how long  it
> takes for each pair to send data back and forth.
> Scale of measurement:
> 1. Time (s)
> 2. Power (watts)
> 
> Command: Sockets: $ hackbench -l <Messages>
>          Pipes  : $ hackbench --pipe -l <Messages>
> Varying parameter: -l
> 
> Machine: IBM POWER 9
> 
> +----------+------------+-------------------+----------+-------------------+
> | Messages | TEO socket | Wt. TEO socket    | TEO pipe | Wt. TEO pipe      |
> +----------+------------+-------------------+----------+-------------------+
> | 100      | 0.042      | 0.043   ( -2.32%) | 0.031    | 0.032   ( +3.12%) |
> | 1000     | 0.258      | 0.272   ( +5.14%) | 0.301    | 0.312   ( -3.65%) |
> | 10000    | 2.397      | 2.441   ( +1.80%) | 5.642    | 5.092   ( +9.74%) |
> | 100000   | 23.691     | 23.730  ( -0.16%) | 57.762   | 57.857  ( -0.16%) |
> | 1000000  | 234.103    | 233.841 ( +0.11%) | 559.807  | 592.304 ( -5.80%) |
> +----------+------------+-------------------+----------+-------------------+
> 
> Power :Socket: Consistent between 135-140 watts for both TEO and Wt. TEO
>        Pipe: Consistent between 125-130 watts for both TEO and Wt. TEO
> 
>

Could you also provide power measurements for when the system is
completely idle, for each variant of the TEO governor? Is it the case
that the benefits we are seeing above are only due to Wt. TEO being
more conservative than the TEO governor by always choosing a
shallower state?

> 
> Pratik Rajesh Sampat (1):
>   Weighted approach to gather and use history in TEO governor
> 
>  drivers/cpuidle/governors/teo.c | 95 +++++++++++++++++++++++++++++++--
>  1 file changed, 90 insertions(+), 5 deletions(-)
> 
> -- 
> 2.17.1
> 

--
Thanks and Regards
gautham.
