* [RFC 0/1] Alternate history mechanism for the TEO governor
From: Pratik Rajesh Sampat @ 2020-05-11 14:10 UTC
To: linux-kernel, linux-pm, rafael.j.wysocki, peterz, dsmythies, daniel.lezcano, ego, svaidy, psampat, pratik.sampat, pratik.r.sampat

First RFC posting: https://lkml.org/lkml/2020/2/22/27

Changelog
1. Updated benchmarks

Currently, in addition to the timer-based prediction and the hit/miss/early-hit buckets, the TEO governor keeps a history of the last 8 idle intervals, mainly to account for non-timer wakeups. If a significant number of those idle durations are shorter than the current prediction, it decides whether a shallower state must be chosen.

The current sliding history window does a fair job of predicting non-timer events. To improve on the history mechanism, one might simply hypothesize that widening the interval window lets more history drive better decisions. However, this may not be true. The current algorithm can be limited by its linearity, and the average of the interval durations may not be a suitable metric when the data is skewed. This is especially true as the number of intervals taken into account increases.

Benchmark: Schbench
Provides latency statistics for scheduler wakeups
Machine - IBM Power 9
Metrics of measurement:
1. Performance - latency 99th percentile - usec
2. Power - watts

Performance and power statistics are normalized.

Latency - Normalized, 8 intervals is the baseline
+---------+-------------+--------------+--------------+--------------+
| Threads | 8 Intervals | 16 Intervals | 32 Intervals | 64 Intervals |
+---------+-------------+--------------+--------------+--------------+
| 2       | 100         | 102.81       | 106.93       | 100.86       |
+---------+-------------+--------------+--------------+--------------+
| 4       | 100         | 104.67       | 96.58        | 96.58        |
+---------+-------------+--------------+--------------+--------------+
| 8       | 100         | 116.26       | 100          | 121.44       |
+---------+-------------+--------------+--------------+--------------+
| 16      | 100         | 85.94        | 81.46        | 80.86        |
+---------+-------------+--------------+--------------+--------------+

+---------+---------------+
| Threads | 128 Intervals |
+---------+---------------+
| 2       | 83.09         |
+---------+---------------+
| 4       | 74.75         |
+---------+---------------+
| 8       | 87.15         |
+---------+---------------+
| 16      | 76.98         |
+---------+---------------+

The latency numbers are a mixed bag.
Some window sizes perform similarly, some slightly better and some considerably worse.
Standout: 128 intervals consistently performs well in terms of latency.

Power - Normalized, 8 intervals is the baseline
+---------+-------------+--------------+--------------+--------------+
| Threads | 8 Intervals | 16 Intervals | 32 Intervals | 64 Intervals |
+---------+-------------+--------------+--------------+--------------+
| 2       | 100         | 100          | 85.29        | 97.05        |
+---------+-------------+--------------+--------------+--------------+
| 4       | 100         | 115.9        | 111.36       | 145.45       |
+---------+-------------+--------------+--------------+--------------+
| 8       | 100         | 109.63       | 114.45       | 102.4        |
+---------+-------------+--------------+--------------+--------------+
| 16      | 100         | 104.2        | 103.36       | 102.52       |
+---------+-------------+--------------+--------------+--------------+

+---------+---------------+
| Threads | 128 Intervals |
+---------+---------------+
| 2       | 123.52        |
+---------+---------------+
| 4       | 159.09        |
+---------+---------------+
| 8       | 114.45        |
+---------+---------------+
| 16      | 106.72        |
+---------+---------------+

Power regresses considerably in almost all cases, suggesting that as the history grows, the selection becomes skewed towards shallower states.

Weighted TEO Governor
---------------------
To complement the current moving-window history, an approach is devised in which each idle state maintains weights for itself and for every other idle state, forming a probability distribution.

When a decision needs to be made, the TEO governor first selects an idle state based on the timer and the hits/early-hits metrics. The probability distribution of that selected state is then consulted, which indicates how likely that state is to turn out correct if it is picked.

The probability distribution is simply an n*n matrix, where n = drv->state_count. Each entry is a weight within its row, and the weights range over [0, 10000].
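A minimal user-space sketch of the matrix layout described above, assuming a fixed MAX_STATES bound in place of the kernel's CPUIDLE_STATE_MAX. The names MAX_STATES, WEIGHT_SCALE, init_weights() and weight_to_percent() are hypothetical; the 60%/40% starting split mirrors the one used by the [RFC 1/1] patch. Rows are indexed by the state the governor selected, columns by the state whose target residency the idle interval turned out to satisfy.

```c
#include <assert.h>

#define MAX_STATES   10     /* stand-in for CPUIDLE_STATE_MAX (assumed value) */
#define WEIGHT_SCALE 10000  /* 10000 == 100%, i.e. weights in 0.01% units */

/*
 * state_mat[row][col]: when idle state "row" is selected, the weight given
 * to the idle interval lasting long enough to satisfy the target residency
 * of state "col".
 */
static int state_mat[MAX_STATES][MAX_STATES];

/*
 * Starting distribution: each state is biased 60% towards itself and the
 * remaining 40% is split evenly (integer division) across the other states.
 * With a single state the else branch never runs, so there is no division
 * by zero.
 */
static void init_weights(int state_count)
{
	assert(state_count >= 1 && state_count <= MAX_STATES);

	for (int i = 0; i < state_count; i++) {
		for (int j = 0; j < state_count; j++) {
			if (i == j)
				state_mat[i][j] = 6000;
			else
				state_mat[i][j] = 4000 / (state_count - 1);
		}
	}
}

/* A weight expressed in whole percent, e.g. 3000 -> 30. */
static int weight_to_percent(int w)
{
	return w * 100 / WEIGHT_SCALE;
}
```

Because 4000 is split with integer division, a row does not always sum to exactly 10000: with four states each off-diagonal entry is 1333, so the row totals 9999.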
For example, state_mat[2][1] = 3000 means that when state 2 is selected for entering idle, the probability that the interval will last long enough to satisfy state 1's target residency is 30%. The trailing zeros (a scale of 10000 rather than 100) provide more resolution when the weights are increased or reduced during correction.

Currently, to select an idle state based on these probabilities, a weighted random number generator picks one of the idle states; naturally, states with higher weights are more likely to be chosen.

On wakeup, the weights are updated. The weight of the state the CPU should have woken up in (the hit, miss or early-hit state) is increased by LEARNING_RATE %, and the weights of the other states in that row are reduced to compensate.

A possible disadvantage of this approach is that old history may leave the weights stale and influence the choice of idle states in ways that are no longer relevant. Aging the weights could address this: over time, the highest weight in a row could be decayed and the difference spread equally across the other states. This RFC does not implement aging, however, as that approach showed too much run-to-run variance.

A finer-grained view of the history, combined with weighted random selection, shows some promise in saving power without compromising performance.

Why not just pick the state with the highest probability?
- If multiple states with close probabilities compete for dominance, each should be given a fair chance. If a prediction turns out wrong, the algorithm will self-correct, but by then the pattern may have changed.
- If this contention pattern occurs often, a highest-probability-only predictor would always be playing catch-up.

Benchmarks:
Metrics of measurement:
1. Performance (latency / throughput)
2. Power (watts)
3. Accuracy %
   a. Correct prediction - The predicted idle state aligns with the actual sleep duration
   b. Undershoot prediction (US) - The predicted idle state is deeper than the actual sleep duration warrants
   c. Overshoot prediction (OS) - The predicted idle state is shallower than the actual sleep duration allows

Performance and power numbers are normalized. However, accuracy numbers are deliberately kept as-is to show how well the vanilla governor performed in the first place.

Schbench
--------
Benchmarks scheduler wakeup latencies

1. latency 99th percentile - usec
2. Power - watts
Machine - IBM Power 9

Latency and Power - Normalized
+---------+--------------+-----------------+---------------+
| Threads | TEO Baseline | Wt. TEO Latency | Wt. TEO Power |
+---------+--------------+-----------------+---------------+
| 2       | 100          | 101.3           | 85.29         |
+---------+--------------+-----------------+---------------+
| 4       | 100          | 105.06          | 113.63        |
+---------+--------------+-----------------+---------------+
| 8       | 100          | 92.32           | 90.36         |
+---------+--------------+-----------------+---------------+
| 16      | 100          | 99.1            | 92.43         |
+---------+--------------+-----------------+---------------+

Accuracy

Vanilla TEO Governor - Prediction distribution %
+---------+------+------+------+-------+-------+-------+---------+
| Threads | US 1 | US 2 | US 3 | US 4  | US 5  | US 6  | Correct |
+---------+------+------+------+-------+-------+-------+---------+
| 2       | 6.12 | 1.08 | 1.76 | 20.41 | 9.2   | 28.74 | 22.51   |
+---------+------+------+------+-------+-------+-------+---------+
| 4       | 8.54 | 1.56 | 1.25 | 20.24 | 10.75 | 25.17 | 22.67   |
+---------+------+------+------+-------+-------+-------+---------+
| 8       | 5.88 | 2.67 | 1.09 | 13.72 | 17.08 | 32.04 | 22.95   |
+---------+------+------+------+-------+-------+-------+---------+
| 16      | 6.29 | 2.43 | 0.86 | 13.21 | 15.33 | 26.52 | 29.34   |
+---------+------+------+------+-------+-------+-------+---------+

+---------+------+------+------+
| Threads | OS 1 | OS 2 | OS 3 |
+---------+------+------+------+
| 2       | 1.77 | 1.27 | 7.14 |
+---------+------+------+------+
| 4       | 1.8  | 1.31 | 6.71 |
+---------+------+------+------+
| 8       | 0.65 | 0.72 | 3.2  |
+---------+------+------+------+
| 16      | 0.63 | 1.71 | 3.68 |
+---------+------+------+------+

Weighted TEO Governor - Prediction distribution %
+---------+------+------+------+-------+-------+-------+---------+
| Threads | US 1 | US 2 | US 3 | US 4  | US 5  | US 6  | Correct |
+---------+------+------+------+-------+-------+-------+---------+
| 2       | 7.26 | 2.07 | 0.02 | 15.85 | 13.29 | 36.26 | 22.13   |
+---------+------+------+------+-------+-------+-------+---------+
| 4       | 4.33 | 1.45 | 0.15 | 14.17 | 14.68 | 40.36 | 21.01   |
+---------+------+------+------+-------+-------+-------+---------+
| 8       | 4.73 | 2.46 | 0.12 | 12.48 | 14.68 | 32.38 | 28.9    |
+---------+------+------+------+-------+-------+-------+---------+
| 16      | 7.68 | 1.25 | 0.98 | 12.15 | 11.19 | 24.91 | 35.92   |
+---------+------+------+------+-------+-------+-------+---------+

+---------+------+------+------+
| Threads | OS 1 | OS 2 | OS 3 |
+---------+------+------+------+
| 2       | 0.39 | 0.42 | 2.31 |
+---------+------+------+------+
| 4       | 0.45 | 0.51 | 2.89 |
+---------+------+------+------+
| 8       | 0.53 | 0.66 | 3.06 |
+---------+------+------+------+
| 16      | 0.97 | 1.9  | 3.05 |
+---------+------+------+------+

Sleeping Ebizzy
---------------
A program that generates workloads resembling web server workloads. The benchmark is customized to allow for a sleep interval (-i).

1. Number of records
2. Power - watts
Machine - IBM Power 9

Parameters:
1. -m -> Always use mmap instead of malloc
2. -M -> Never use mmap
3. -S <seconds> -> Number of seconds to run
4. -i <interval> -> Sleep interval

Number of records and power - Normalized
+-------------------+---------------+------------------+-----------------+
| Parameters        | TEO baseline  | Wt. TEO records  | Wt. TEO power   |
+-------------------+---------------+------------------+-----------------+
| -S 60 -i 10000    | 100           | 106.56           | 93.95           |
+-------------------+---------------+------------------+-----------------+
| -m -S 60 -i 10000 | 100           | 100.62           | 82.14           |
+-------------------+---------------+------------------+-----------------+
| -M -S 60 -i 10000 | 100           | 104.97           | 95.19           |
+-------------------+---------------+------------------+-----------------+

Accuracy

Vanilla TEO Governor - Prediction distribution %
+-------------------+-------+------+------+-------+------+-------+
| Parameters        | US 1  | US 2 | US 3 | US 4  | US 5 | US 6  |
+-------------------+-------+------+------+-------+------+-------+
| -S 60 -i 10000    | 45.46 | 0.52 | 1.5  | 15.34 | 2.44 | 8.61  |
+-------------------+-------+------+------+-------+------+-------+
| -m -S 60 -i 10000 | 4.22  | 2.08 | 0.71 | 90.01 | 0    | 0.01  |
+-------------------+-------+------+------+-------+------+-------+
| -M -S 60 -i 10000 | 15.78 | 1.42 | 2.4  | 22.39 | 1.68 | 11.25 |
+-------------------+-------+------+------+-------+------+-------+

+-------------------+---------+------+------+------+------+
| Parameters        | Correct | OS 1 | OS 2 | OS 3 | OS 4 |
+-------------------+---------+------+------+------+------+
| -S 60 -i 10000    | 17.03   | 1.73 | 1.1  | 6.27 | 0    |
+-------------------+---------+------+------+------+------+
| -m -S 60 -i 10000 | 2.44    | 0.18 | 0.13 | 0.22 | 0    |
+-------------------+---------+------+------+------+------+
| -M -S 60 -i 10000 | 31.65   | 3.45 | 1.8  | 8.18 | 0    |
+-------------------+---------+------+------+------+------+

Weighted TEO Governor - Prediction distribution %
+-------------------+-------+------+------+-------+------+-------+
| Parameters        | US 1  | US 2 | US 3 | US 4  | US 5 | US 6  |
+-------------------+-------+------+------+-------+------+-------+
| -S 60 -i 10000    | 8.25  | 0.87 | 0.98 | 19.23 | 4.05 | 26.35 |
+-------------------+-------+------+------+-------+------+-------+
| -m -S 60 -i 10000 | 7.69  | 4.35 | 0.93 | 82.74 | 0.01 | 0.01  |
+-------------------+-------+------+------+-------+------+-------+
| -M -S 60 -i 10000 | 3.73  | 3.29 | 0.73 | 13.33 | 7.38 | 18.61 |
+-------------------+-------+------+------+-------+------+-------+

+-------------------+---------+------+------+------+------+
| Parameters        | Correct | OS 1 | OS 2 | OS 3 | OS 4 |
+-------------------+---------+------+------+------+------+
| -S 60 -i 10000    | 32.86   | 3.27 | 2.05 | 2.09 | 0    |
+-------------------+---------+------+------+------+------+
| -m -S 60 -i 10000 | 3.4     | 0.29 | 0.28 | 0.3  | 0    |
+-------------------+---------+------+------+------+------+
| -M -S 60 -i 10000 | 48.19   | 1.8  | 0.93 | 1.97 | 0.04 |
+-------------------+---------+------+------+------+------+

Pgbench
-------
pgbench is a simple program for running benchmark tests on PostgreSQL. It runs the same sequence of SQL commands over and over, possibly in multiple concurrent database sessions, and then calculates the average transaction rate (transactions per second).

Metrics of measurement:
1. Number of transactions
2. Power (watts)
Machine - IBM Power 9

Number of transactions and power - Normalized
+---------+---------------+----------------------+-----------------+
| Clients | TEO Baseline  | Wt. TEO transactions | Wt. TEO power   |
+---------+---------------+----------------------+-----------------+
| 4       | 100           | 105.93               | 85.18           |
+---------+---------------+----------------------+-----------------+
| 8       | 100           | 98.11                | 100             |
+---------+---------------+----------------------+-----------------+
| 16      | 100           | 98.73                | 104.16          |
+---------+---------------+----------------------+-----------------+
| 32      | 100           | 101.05               | 95              |
+---------+---------------+----------------------+-----------------+

Accuracy

Vanilla TEO Governor - Prediction distribution %
+---------+-------+-------+-------+-------+------+------+---------+
| Clients | US 1  | US 2  | US 3  | US 4  | US 5 | US 6 | Correct |
+---------+-------+-------+-------+-------+------+------+---------+
| 4       | 59.97 | 2.56  | 0.46  | 1.45  | 0.13 | 0.87 | 32.09   |
+---------+-------+-------+-------+-------+------+------+---------+
| 8       | 82.76 | 7.31  | 0.12  | 1.23  | 0.08 | 1.22 | 7.2     |
+---------+-------+-------+-------+-------+------+------+---------+
| 16      | 2     | 13.46 | 73.27 | 10.7  | 0.04 | 0.12 | 0.38    |
+---------+-------+-------+-------+-------+------+------+---------+
| 32      | 1.33  | 0.76  | 44.9  | 40.8  | 0.07 | 12   | 0.12    |
+---------+-------+-------+-------+-------+------+------+---------+

+---------+------+------+------+
| Clients | OS 1 | OS 2 | OS 3 |
+---------+------+------+------+
| 4       | 2.44 | 0.01 | 0.02 |
+---------+------+------+------+
| 8       | 0.04 | 0.02 | 0.02 |
+---------+------+------+------+
| 16      | 0.02 | 0    | 0.01 |
+---------+------+------+------+
| 32      | 0.02 | 0    | 0    |
+---------+------+------+------+

Weighted TEO Governor - Prediction distribution %
+---------+-------+-------+-------+-------+------+------+---------+
| Clients | US 1  | US 2  | US 3  | US 4  | US 5 | US 6 | Correct |
+---------+-------+-------+-------+-------+------+------+---------+
| 4       | 42.79 | 1.79  | 0.01  | 0.72  | 0.1  | 0.78 | 46.99   |
+---------+-------+-------+-------+-------+------+------+---------+
| 8       | 76.52 | 8.02  | 0.12  | 0.42  | 0.06 | 0.5  | 14.35   |
+---------+-------+-------+-------+-------+------+------+---------+
| 16      | 0.72  | 51.95 | 42.02 | 4.65  | 0.05 | 0.31 | 0.3     |
+---------+-------+-------+-------+-------+------+------+---------+
| 32      | 1.06  | 1.59  | 42.89 | 53.11 | 0.05 | 0.42 | 0.87    |
+---------+-------+-------+-------+-------+------+------+---------+

+---------+------+------+------+
| Clients | OS 1 | OS 2 | OS 3 |
+---------+------+------+------+
| 4       | 6.81 | 0    | 0.01 |
+---------+------+------+------+
| 8       | 0    | 0    | 0.01 |
+---------+------+------+------+
| 16      | 0    | 0    | 0    |
+---------+------+------+------+
| 32      | 0    | 0    | 0.01 |
+---------+------+------+------+

Hackbench
---------
Creates a specified number of pairs of schedulable entities that communicate via either sockets or pipes, and times how long it takes for each pair to send data back and forth.

Machine - IBM Power 9
Metrics of measurement:
1. Time (s)
2. Power (watts)

Time - Normalized
+---------+----------+----------------------+-------------------+
| Loops   | TEO Time | Wt. TEO Time Sockets | Wt. TEO Time Pipe |
+---------+----------+----------------------+-------------------+
| 100     | 100      | 95.23                | 87.09             |
+---------+----------+----------------------+-------------------+
| 1000    | 100      | 105.81               | 98.67             |
+---------+----------+----------------------+-------------------+
| 10000   | 100      | 99.33                | 92.73             |
+---------+----------+----------------------+-------------------+
| 100000  | 100      | 98.88                | 101.99            |
+---------+----------+----------------------+-------------------+
| 1000000 | 100      | 100.04               | 100.2             |
+---------+----------+----------------------+-------------------+

Power:
Sockets: consistently between 135-140 watts for both TEO and Wt. TEO
Pipe: consistently between 125-130 watts for both TEO and Wt. TEO

Pratik Rajesh Sampat (1):
  Weighted approach to gather and use history in TEO governor

 drivers/cpuidle/governors/teo.c | 96 +++++++++++++++++++++++++++++++--
 1 file changed, 91 insertions(+), 5 deletions(-)

--
2.17.1
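The Correct / US / OS categories used in the accuracy tables above can be expressed as a small classification routine. This is only a sketch under two assumptions the posting does not state: the "ideal" state is taken to be the deepest state whose target residency fits within the measured idle duration, and the number after US/OS is read as the distance, in states, between the predicted and the ideal state. ideal_state() and prediction_error() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Deepest state whose target residency fits in the measured idle duration
 * (state 0 if none does). residency_ns[] is assumed to be sorted in
 * increasing order, shallowest state first.
 */
static int ideal_state(const uint64_t *residency_ns, int state_count,
		       uint64_t measured_ns)
{
	int idx = 0;

	for (int i = 1; i < state_count; i++) {
		if (residency_ns[i] > measured_ns)
			break;
		idx = i;
	}
	return idx;
}

/*
 * Signed prediction error, following the cover letter's definitions:
 *   > 0: undershoot (US), predicted state deeper than the sleep warranted
 *   < 0: overshoot (OS), predicted state shallower than the sleep allowed
 *   = 0: correct prediction
 * The magnitude is the number of states between prediction and ideal.
 */
static int prediction_error(const uint64_t *residency_ns, int state_count,
			    int predicted, uint64_t measured_ns)
{
	assert(predicted >= 0 && predicted < state_count);
	return predicted - ideal_state(residency_ns, state_count, measured_ns);
}
```

For example, with residencies {0, 1000, 20000, 100000} ns, predicting state 3 for a 5000 ns sleep gives +2, which this reading would count under "US 2".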
* [RFC 1/1] Weighted approach to gather and use history in TEO governor
From: Pratik Rajesh Sampat @ 2020-05-11 14:10 UTC
To: linux-kernel, linux-pm, rafael.j.wysocki, peterz, dsmythies, daniel.lezcano, ego, svaidy, psampat, pratik.sampat, pratik.r.sampat

Complementing the current self-correcting window algorithm, an alternate approach is devised to leverage history based on the probability of occurrence of states.

Each CPU maintains a matrix wherein each idle state maintains a probability distribution over itself and the other states.

The probability distribution is simply an n*n matrix, where n = drv->state_count. Each entry is a weight within its row, and the weights range over [0, 10000]. For example, state_mat[2][1] = 3000 means that when state 2 is selected for entering idle, the probability that the interval will last long enough to satisfy state 1's target residency is 30%. The trailing zeros provide more resolution when the weights are increased or reduced during correction.

Initially, the weights are distributed so that each state has a higher probability of choosing itself, as there is no reason to believe otherwise yet. The initial bias towards itself is 60%, and the remaining 40% is distributed equally among the other states.

Selection of an idle state:
When the TEO governor chooses an idle state, the probability distribution for that state is looked at. A weighted random number generator, using the weights as a bias, chooses the next idle state.
The algorithm therefore chooses either that state or a shallower one for its next prediction.

Correction of the probability distribution:
On wakeup, the weights are updated. The weight of the state the CPU should have woken up in (the hit, miss or early-hit state) is increased by LEARNING_RATE %, and the weights of the other states in that row are reduced to compensate.

LEARNING_RATE is experimentally chosen to be 10%.

Signed-off-by: Pratik Rajesh Sampat <psampat@linux.ibm.com>
---
 drivers/cpuidle/governors/teo.c | 96 +++++++++++++++++++++++++++++++--
 1 file changed, 91 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index de7e706efd46..84058d797b43 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -50,6 +50,7 @@
 #include <linux/kernel.h>
 #include <linux/sched/clock.h>
 #include <linux/tick.h>
+#include <linux/random.h>
 
 /*
  * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
@@ -64,6 +65,12 @@
  */
 #define INTERVALS 8
 
+/*
+ * Percentage of the amount of weight to be shifted in the idle state weight
+ * distribution for correction
+ */
+#define LEARNING_RATE 10
+
 /**
  * struct teo_idle_state - Idle state data used by the TEO cpuidle governor.
  * @early_hits: "Early" CPU wakeups "matching" this state.
@@ -98,6 +105,8 @@ struct teo_idle_state {
  * @states: Idle states data corresponding to this CPU.
  * @interval_idx: Index of the most recent saved idle interval.
  * @intervals: Saved idle duration values.
+ * @state_mat: Each idle state maintains a weights corresponding to that
+ *	state, storing the probability distribution of occurrence for that state
  */
 struct teo_cpu {
 	u64 time_span_ns;
@@ -105,6 +114,7 @@ struct teo_cpu {
 	struct teo_idle_state states[CPUIDLE_STATE_MAX];
 	int interval_idx;
 	u64 intervals[INTERVALS];
+	int state_mat[CPUIDLE_STATE_MAX][CPUIDLE_STATE_MAX];
 };
 
 static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -117,7 +127,8 @@ static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
 static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
-	int i, idx_hit = -1, idx_timer = -1;
+	int i, idx_hit = -1, idx_timer = -1, idx = -1;
+	int last_idx = dev->last_state_idx;
 	u64 measured_ns;
 
 	if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
@@ -183,16 +194,50 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 
 		if (idx_timer > idx_hit) {
 			misses += PULSE;
-			if (idx_hit >= 0)
+			idx = idx_timer;
+			if (idx_hit >= 0) {
 				cpu_data->states[idx_hit].early_hits += PULSE;
+				idx = idx_hit;
+			}
 		} else {
 			hits += PULSE;
+			idx = last_idx;
 		}
 
 		cpu_data->states[idx_timer].misses = misses;
 		cpu_data->states[idx_timer].hits = hits;
 	}
 
+	/*
+	 * Rearrange the weight distribution of the state, increase the weight
+	 * by the LEARNING RATE % for the idle state that was supposed to be
+	 * chosen and reduce by the same amount for rest of the states
+	 *
+	 * If the weights are greater than (100 - LEARNING_RATE) % or lesser
+	 * than LEARNING_RATE %, do not increase or decrease the confidence
+	 * respectively
+	 */
+	for (i = 0; i < drv->state_count; i++) {
+		unsigned int delta;
+
+		if (idx == -1)
+			break;
+		if (i == idx) {
+			delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) / 100;
+			if (cpu_data->state_mat[last_idx][i] + delta >=
+			    (100 - LEARNING_RATE) * 100)
+				continue;
+			cpu_data->state_mat[last_idx][i] += delta;
+			continue;
+		}
+		delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) /
+			((drv->state_count - 1) * 100);
+		if (cpu_data->state_mat[last_idx][i] - delta <=
+		    LEARNING_RATE * 100)
+			continue;
+		cpu_data->state_mat[last_idx][i] -= delta;
+	}
+
 	/*
 	 * Save idle duration values corresponding to non-timer wakeups for
 	 * pattern detection.
@@ -244,7 +289,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
 	u64 duration_ns;
 	unsigned int hits, misses, early_hits;
-	int max_early_idx, prev_max_early_idx, constraint_idx, idx, i;
+	int max_early_idx, prev_max_early_idx, constraint_idx, idx, i, og_idx;
 	ktime_t delta_tick;
 
 	if (dev->last_state_idx >= 0) {
@@ -374,10 +419,14 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	if (constraint_idx < idx)
 		idx = constraint_idx;
 
+	og_idx = idx;
+
 	if (idx < 0) {
 		idx = 0; /* No states enabled. Must use 0. */
 	} else if (idx > 0) {
-		unsigned int count = 0;
+		unsigned int weights_list[CPUIDLE_STATE_MAX];
+		unsigned int i, j = 0, rnd_wt, rnd_num = 0;
+		unsigned int count = 0, sum_weights = 0;
 		u64 sum = 0;
 
 		/*
@@ -412,6 +461,28 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 					idx, avg_ns);
 			}
 		}
+		/*
+		 * In case, the recent history yields a shallower state, then
+		 * the probability distribution is looked at.
+		 * The weighted random number generator uses the weights as a
+		 * bias to choose the next idle state
+		 */
+		if (og_idx != idx) {
+			for (i = 0; i <= idx; i++) {
+				if (dev->states_usage[i].disable)
+					continue;
+				sum_weights += cpu_data->state_mat[idx][i];
+				weights_list[j++] = sum_weights;
+			}
+			get_random_bytes(&rnd_num, sizeof(rnd_num));
+			rnd_num = rnd_num % 100;
+			rnd_wt = (rnd_num * sum_weights) / 100;
+			for (i = 0; i < j; i++) {
+				if (rnd_wt < weights_list[i])
+					break;
+			}
+			idx = i;
+		}
 	}
 
 	/*
@@ -468,13 +539,28 @@ static int teo_enable_device(struct cpuidle_driver *drv,
 			     struct cpuidle_device *dev)
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
-	int i;
+	int i, j;
 
 	memset(cpu_data, 0, sizeof(*cpu_data));
 
 	for (i = 0; i < INTERVALS; i++)
 		cpu_data->intervals[i] = U64_MAX;
 
+	/*
+	 * Populate initial weights for each state
+	 * The stop state is initially more biased for itself.
+	 *
+	 * Currently the initial distribution of probabilities are 60%-40%.
+	 * The trailing 0s are for increased resolution.
+	 */
+	for (i = 0; i < drv->state_count; i++) {
+		for (j = 0; j < drv->state_count; j++) {
+			if (i == j)
+				cpu_data->state_mat[i][j] = 6000;
+			else
+				cpu_data->state_mat[i][j] = 4000 / (drv->state_count - 1);
+		}
+	}
 	return 0;
 }
 
--
2.17.1
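The selection step can be sketched in isolation as a cumulative-weight draw. This is not the patch's code: the random value is passed in by the caller (the patch calls get_random_bytes()), the draw is taken directly over the sum of the weights rather than over 100 buckets, and the return value is the real state index. In the patch, when a state is disabled, `idx = i` appears to index the compacted weights_list[] rather than the state table, so a disabled state could be returned; the sketch avoids that by walking the original indices. weighted_pick() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Choose a state in [0, max_idx] with probability proportional to row[i],
 * skipping disabled states. Weights are assumed to be non-negative. rnd is
 * a uniformly distributed random value supplied by the caller.
 * Returns the chosen state index, or -1 if no enabled state has weight.
 */
static int weighted_pick(const int *row, const bool *disabled, int max_idx,
			 unsigned int rnd)
{
	unsigned int sum = 0, target;

	assert(max_idx >= 0);
	for (int i = 0; i <= max_idx; i++)
		if (!disabled[i])
			sum += (unsigned int)row[i];
	if (sum == 0)
		return -1;

	/* Position in [0, sum); the modulo adds a small bias, fine for a sketch. */
	target = rnd % sum;
	for (int i = 0; i <= max_idx; i++) {
		if (disabled[i])
			continue;
		if (target < (unsigned int)row[i])
			return i;	/* real state index, not a compacted one */
		target -= (unsigned int)row[i];
	}
	return -1;	/* not reached, since target < sum */
}
```

With weights {1000, 3000, 6000}, draws 0-999 pick state 0, 1000-3999 state 1 and 4000-9999 state 2; if state 1 is disabled, its share simply disappears and state 2 is returned for the draws that would have landed there.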
* Re: [RFC 1/1] Weighted approach to gather and use history in TEO governor
From: Peter Zijlstra @ 2020-05-12 17:37 UTC
To: Pratik Rajesh Sampat
Cc: linux-kernel, linux-pm, rafael.j.wysocki, dsmythies, daniel.lezcano, ego, svaidy, pratik.sampat, pratik.r.sampat

Just a quick note..

On Mon, May 11, 2020 at 07:40:55PM +0530, Pratik Rajesh Sampat wrote:

> +	/*
> +	 * Rearrange the weight distribution of the state, increase the weight
> +	 * by the LEARNING RATE % for the idle state that was supposed to be
> +	 * chosen and reduce by the same amount for rest of the states
> +	 *
> +	 * If the weights are greater than (100 - LEARNING_RATE) % or lesser
> +	 * than LEARNING_RATE %, do not increase or decrease the confidence
> +	 * respectively
> +	 */
> +	for (i = 0; i < drv->state_count; i++) {
> +		unsigned int delta;
> +
> +		if (idx == -1)
> +			break;
> +		if (i == idx) {
> +			delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) / 100;

100 is a crap number to divide by as a computer. We bio-puddings happend
to have 10 digits, so 100 makes sense to us, but it does not to our
binary friends.
* Re: [RFC 1/1] Weighted approach to gather and use history in TEO governor
From: Pratik Sampat @ 2020-05-13 5:31 UTC
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, rafael.j.wysocki, dsmythies, daniel.lezcano, ego, svaidy, pratik.sampat, pratik.r.sampat

Thanks for your comment.


On 12/05/20 11:07 pm, Peter Zijlstra wrote:
> Just a quick note..
>
> On Mon, May 11, 2020 at 07:40:55PM +0530, Pratik Rajesh Sampat wrote:
>
>> +	/*
>> +	 * Rearrange the weight distribution of the state, increase the weight
>> +	 * by the LEARNING RATE % for the idle state that was supposed to be
>> +	 * chosen and reduce by the same amount for rest of the states
>> +	 *
>> +	 * If the weights are greater than (100 - LEARNING_RATE) % or lesser
>> +	 * than LEARNING_RATE %, do not increase or decrease the confidence
>> +	 * respectively
>> +	 */
>> +	for (i = 0; i < drv->state_count; i++) {
>> +		unsigned int delta;
>> +
>> +		if (idx == -1)
>> +			break;
>> +		if (i == idx) {
>> +			delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) / 100;
> 100 is a crap number to divide by as a computer. We bio-puddings happend
> to have 10 digits, so 100 makes sense to us, but it does not to our
> binary friends.
>
>
Absolutely! I just wrote the code exactly the way I did the Math on paper,
definitely need to figure out an optimal way of doing things.

~Pratik
* Re: [RFC 1/1] Weighted approach to gather and use history in TEO governor
From: Rafael J. Wysocki @ 2020-05-13 14:49 UTC
To: Pratik Sampat
Cc: Peter Zijlstra, Linux Kernel Mailing List, Linux PM, Rafael Wysocki, Doug Smythies, Daniel Lezcano, Gautham R. Shenoy, Vaidyanathan Srinivasan, pratik.sampat, pratik.r.sampat

On Wed, May 13, 2020 at 7:31 AM Pratik Sampat <psampat@linux.ibm.com> wrote:
>
> Thanks for your comment.
>
>
> On 12/05/20 11:07 pm, Peter Zijlstra wrote:
> > Just a quick note..
> >
> > On Mon, May 11, 2020 at 07:40:55PM +0530, Pratik Rajesh Sampat wrote:
> >
> >> +	/*
> >> +	 * Rearrange the weight distribution of the state, increase the weight
> >> +	 * by the LEARNING RATE % for the idle state that was supposed to be
> >> +	 * chosen and reduce by the same amount for rest of the states
> >> +	 *
> >> +	 * If the weights are greater than (100 - LEARNING_RATE) % or lesser
> >> +	 * than LEARNING_RATE %, do not increase or decrease the confidence
> >> +	 * respectively
> >> +	 */
> >> +	for (i = 0; i < drv->state_count; i++) {
> >> +		unsigned int delta;
> >> +
> >> +		if (idx == -1)
> >> +			break;
> >> +		if (i == idx) {
> >> +			delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) / 100;
> > 100 is a crap number to divide by as a computer. We bio-puddings happend
> > to have 10 digits, so 100 makes sense to us, but it does not to our
> > binary friends.
> >
> >
> Absolutely! I just wrote the code exactly the way I did the Math on paper,
> definitely need to figure out an optimal way of doing things.

There is no particular reason to use percent in computations at all.
You may as well use 1/1024 parts instead (and then use shifts instead
of divisions).
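One possible reading of this suggestion, sketched in user-space C rather than as a drop-in replacement for the patch: weights are kept in 1/1024 units so that a row sums to WEIGHT_ONE, and the learning rate becomes a shift. LR_SHIFT = 3 (12.5%) is an assumed stand-in for the patch's 10%, which is not a power-of-two fraction. The update rule also differs from the patch: instead of scaling the hit entry by its own weight and clamping, it moves a fraction of every other entry to the hit state. WEIGHT_SHIFT, WEIGHT_ONE, LR_SHIFT and update_row() are hypothetical names.

```c
#include <assert.h>

#define WEIGHT_SHIFT	10			/* weights in 1/1024 units */
#define WEIGHT_ONE	(1 << WEIGHT_SHIFT)	/* 1024 == 100% */
#define LR_SHIFT	3			/* learning rate 1/8 = 12.5% (assumed) */

/*
 * Correct one row after a wakeup: every state other than "hit" gives up
 * 1/8 of its weight (a right shift), and the total is added to "hit".
 * No divisions are needed, and the row sum is preserved exactly because
 * whatever is taken from one entry is added to another. Weights are
 * assumed to be non-negative; an entry below 1 << LR_SHIFT no longer
 * shrinks, which acts as a small built-in floor.
 */
static void update_row(int *row, int state_count, int hit)
{
	int moved = 0;

	assert(hit >= 0 && hit < state_count);
	for (int i = 0; i < state_count; i++) {
		int d;

		if (i == hit)
			continue;
		d = row[i] >> LR_SHIFT;
		row[i] -= d;
		moved += d;
	}
	row[hit] += moved;
}
```

Starting from an even row {256, 256, 256, 256} and a hit on state 2, each other entry drops to 224 and state 2 rises to 352, so the row still sums to 1024.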
* Re: [RFC 1/1] Weighted approach to gather and use history in TEO governor
From: Pratik Sampat @ 2020-05-14 15:35 UTC
To: Rafael J. Wysocki
Cc: Peter Zijlstra, Linux Kernel Mailing List, Linux PM, Rafael Wysocki, Doug Smythies, Daniel Lezcano, Gautham R. Shenoy, Vaidyanathan Srinivasan, pratik.sampat, pratik.r.sampat

On 13/05/20 8:19 pm, Rafael J. Wysocki wrote:
> On Wed, May 13, 2020 at 7:31 AM Pratik Sampat <psampat@linux.ibm.com> wrote:
>> Thanks for your comment.
>>
>>
>> On 12/05/20 11:07 pm, Peter Zijlstra wrote:
>>> Just a quick note..
>>>
>>> On Mon, May 11, 2020 at 07:40:55PM +0530, Pratik Rajesh Sampat wrote:
>>>
>>>> +	/*
>>>> +	 * Rearrange the weight distribution of the state, increase the weight
>>>> +	 * by the LEARNING RATE % for the idle state that was supposed to be
>>>> +	 * chosen and reduce by the same amount for rest of the states
>>>> +	 *
>>>> +	 * If the weights are greater than (100 - LEARNING_RATE) % or lesser
>>>> +	 * than LEARNING_RATE %, do not increase or decrease the confidence
>>>> +	 * respectively
>>>> +	 */
>>>> +	for (i = 0; i < drv->state_count; i++) {
>>>> +		unsigned int delta;
>>>> +
>>>> +		if (idx == -1)
>>>> +			break;
>>>> +		if (i == idx) {
>>>> +			delta = (LEARNING_RATE * cpu_data->state_mat[last_idx][i]) / 100;
>>> 100 is a crap number to divide by as a computer. We bio-puddings happened
>>> to have 10 digits, so 100 makes sense to us, but it does not to our
>>> binary friends.
>>>
>>>
>> Absolutely! I just wrote the code exactly the way I did the math on paper,
>> definitely need to figure out an optimal way of doing things.
> There is no particular reason to use percent in computations at all.
> You may as well use 1/1024 parts instead (and then use shifts instead
> of divisions).

Yes, you're right. Looking at it now, the whole percentage system and the
divisions do seem unnecessary; we can achieve the same with bitwise operations.

Thanks!
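To make Rafael's suggestion concrete, here is a minimal sketch of the weight update with weights stored in 1/1024 units (1024 meaning 100%), so the division by 100 becomes a right shift by 10. It follows the rule in the quoted patch comment: raise the weight of the state that should have been chosen, lower the others, and do not raise a weight already at or above (1024 - LEARNING_RATE) or lower one already at or below LEARNING_RATE. `WEIGHT_SHIFT`, the 51/1024 learning rate, `scaled_delta()` and `update_weights()` are hypothetical and not taken from the RFC; `row` stands in for one row of the patch's `state_mat`.

```c
#include <assert.h>

/* Hypothetical fixed-point scale: weights are in 1/1024 units. */
#define WEIGHT_SHIFT   10
#define WEIGHT_ONE     (1U << WEIGHT_SHIFT)   /* 1024 == 100% */
#define LEARNING_RATE  51U                    /* assumed: ~5% of 1024 */

/* delta = LEARNING_RATE * w / 1024, using a shift instead of "/ 100". */
static inline unsigned int scaled_delta(unsigned int w)
{
	return (LEARNING_RATE * w) >> WEIGHT_SHIFT;
}

/*
 * Raise the weight of the state that should have been chosen (idx) and
 * lower every other weight in the row, each by LEARNING_RATE/1024 of its
 * own current value. Saturated weights are left unchanged. idx == -1
 * means "no update", as in the quoted loop.
 */
static void update_weights(unsigned int *row, int nr_states, int idx)
{
	int i;

	assert(idx >= -1 && idx < nr_states);
	if (idx == -1)
		return;

	for (i = 0; i < nr_states; i++) {
		unsigned int delta = scaled_delta(row[i]);

		if (i == idx) {
			if (row[i] < WEIGHT_ONE - LEARNING_RATE)
				row[i] += delta;
		} else {
			if (row[i] > LEARNING_RATE)
				row[i] -= delta;
		}
	}
}
```

For example, starting from a uniform row of four weights of 256 and rewarding state 1 gives 268 for state 1 and 244 for the others. Because each weight moves by a fraction of its own value, this sketch does not keep the row summing to 1024; whether the actual patch renormalizes is not visible in the quoted excerpt.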
* RE: [RFC 0/1] Alternate history mechanism for the TEO governor
From: Doug Smythies @ 2020-05-17 18:11 UTC
To: 'Pratik Rajesh Sampat'
Cc: linux-kernel, linux-pm, rafael.j.wysocki, peterz, daniel.lezcano, ego, svaidy, pratik.sampat, pratik.r.sampat

On 2020.05.11 Pratik Rajesh Sampat wrote:
>
> First RFC posting: https://lkml.org/lkml/2020/2/22/27

Summary:

On that thread I wrote:

  > I have done a couple of other tests with this patch set,
  > but nothing to report yet, as the differences have been
  > minor so far.

I tried your tests, or as close as I could find, and still
do not notice much difference.

For detail, but likely little added value, read on:

Kernel: 5.7-rc4:
"teo": unmodified kernel.
"wtteo": with this patch added.
"menu": the menu idle governor, for comparison.
CPU frequency scaling driver: intel-cpufreq
CPU frequency scaling governor: schedutil
CPU idle driver: intel_idle

...

> Benchmarks:
> Schbench
> --------
> Benchmarks scheduler wakeup latencies
>
> 1. latency 99th percentile - usec

I found a Phoronix schbench test.
It defaults to the 99.9th percentile.

schbench (usec, 99.9th Latency Percentile, less is better)(8 workers)

threads       teo     wtteo              menu
      2     14197     14194   99.98%    14467  101.90%
      4     46733     46490   99.48%    46554   99.62%
      6     57306     58291  101.72%    57754  100.78%
      8     81408     80768   99.21%    81715  100.38%
     16    157286    156570   99.54%   156621   99.58%
     32    314573    310784   98.80%   315802  100.39%

Powers and other idle statistics were similar. [1]

> 2. Power - watts
> Machine - IBM Power 9
>
> Latency and Power - Normalized
> +---------+--------------+-----------------+---------------+
> | Threads | TEO Baseline | Wt. TEO Latency | Wt. TEO Power |
> +---------+--------------+-----------------+---------------+
> | 2       | 100          | 101.3           | 85.29         |
> +---------+--------------+-----------------+---------------+
> | 4       | 100          | 105.06          | 113.63        |
> +---------+--------------+-----------------+---------------+
> | 8       | 100          | 92.32           | 90.36         |
> +---------+--------------+-----------------+---------------+
> | 16      | 100          | 99.1            | 92.43         |
> +---------+--------------+-----------------+---------------+
>
> Accuracy
>
> Vanilla TEO Governor - Prediction distribution %
> +---------+------+------+------+-------+-------+-------+---------+
> | Threads | US 1 | US 2 | US 3 | US 4  | US 5  | US 6  | Correct |
> +---------+------+------+------+-------+-------+-------+---------+
> | 2       | 6.12 | 1.08 | 1.76 | 20.41 | 9.2   | 28.74 | 22.51   |
> +---------+------+------+------+-------+-------+-------+---------+
> | 4       | 8.54 | 1.56 | 1.25 | 20.24 | 10.75 | 25.17 | 22.67   |
> +---------+------+------+------+-------+-------+-------+---------+
> | 8       | 5.88 | 2.67 | 1.09 | 13.72 | 17.08 | 32.04 | 22.95   |
> +---------+------+------+------+-------+-------+-------+---------+
> | 16      | 6.29 | 2.43 | 0.86 | 13.21 | 15.33 | 26.52 | 29.34   |
> +---------+------+------+------+-------+-------+-------+---------+
> +---------+------+------+------+
> | Threads | OS 1 | OS 2 | OS 3 |
> +---------+------+------+------+
> | 2       | 1.77 | 1.27 | 7.14 |
> +---------+------+------+------+
> | 4       | 1.8  | 1.31 | 6.71 |
> +---------+------+------+------+
> | 8       | 0.65 | 0.72 | 3.2  |
> +---------+------+------+------+
> | 16      | 0.63 | 1.71 | 3.68 |
> +---------+------+------+------+
>
> Weighted TEO Governor - Prediction distribution %
> +---------+------+------+------+-------+-------+-------+---------+
> | Threads | US 1 | US 2 | US 3 | US 4  | US 5  | US 6  | Correct |
> +---------+------+------+------+-------+-------+-------+---------+
> | 2       | 7.26 | 2.07 | 0.02 | 15.85 | 13.29 | 36.26 | 22.13   |
> +---------+------+------+------+-------+-------+-------+---------+
> | 4       | 4.33 | 1.45 | 0.15 | 14.17 | 14.68 | 40.36 | 21.01   |
> +---------+------+------+------+-------+-------+-------+---------+
> | 8       | 4.73 | 2.46 | 0.12 | 12.48 | 14.68 | 32.38 | 28.9    |
> +---------+------+------+------+-------+-------+-------+---------+
> | 16      | 7.68 | 1.25 | 0.98 | 12.15 | 11.19 | 24.91 | 35.92   |
> +---------+------+------+------+-------+-------+-------+---------+
> +---------+------+------+------+
> | Threads | OS 1 | OS 2 | OS 3 |
> +---------+------+------+------+
> | 2       | 0.39 | 0.42 | 2.31 |
> +---------+------+------+------+
> | 4       | 0.45 | 0.51 | 2.89 |
> +---------+------+------+------+
> | 8       | 0.53 | 0.66 | 3.06 |
> +---------+------+------+------+
> | 16      | 0.97 | 1.9  | 3.05 |
> +---------+------+------+------+
>
> Sleeping Ebizzy
> ---------------
> Program to generate workloads resembling web server workloads.
> The benchmark is customized to allow for a sleep interval -i

I found a Phoronix ebizzy, but without the customization,
which I suspect is important to demonstrate your potential
improvement.

Could you send me yours to try?

ebizzy (records per second, more is better)

   teo      wtteo            menu
132344     132228   99.91%   130926   98.93%

Powers and other idle statistics were similar. [2]

> 1. Number of records
> 2. Power - watts
> Machine - IBM Power 9
>
> Parameters:
> 1. -m -> Always use mmap instead of malloc
> 2. -M -> Never use mmap
> 3. -S <seconds> -> Number of seconds to run
> 4. -i <interval> -> Sleep interval

What are the units of this interval?
They must be microseconds, as that is the only thing that makes sense.

I have tried to simulate the resulting workload myself, but didn't
get results like yours (I may have done a poor job of it).
My test does not produce performance data, as it just has to finish
its work before the next chunk of work is due.
The test is:

   forever
      do 100 times
         very short sleep
      enddo
      sleep for 10 milliseconds
   endforever

The overheads result in enough activity.
Powers and other idle statistics were similar. [3]

>
> Number of records and power normalized
> +-------------------+---------------+------------------+-----------------+
> | Parameters        | TEO baseline  | Wt TEO records   | Wt. TEO Power   |
> +-------------------+---------------+------------------+-----------------+
> | -S 60 -i 10000    | 100           | 106.56           | 93.95           |
> +-------------------+---------------+------------------+-----------------+
> | -m -S 60 -i 10000 | 100           | 100.62           | 82.14           |
> +-------------------+---------------+------------------+-----------------+
> | -M -S 60 -i 10000 | 100           | 104.97           | 95.19           |
> +-------------------+---------------+------------------+-----------------+
>
> Accuracy
>
> Vanilla TEO Governor - Prediction distribution %
> +-------------------+-------+------+------+-------+------+-------+
> | Parameters        | US 1  | US 2 | US 3 | US 4  | US 5 | US 6  |
> +-------------------+-------+------+------+-------+------+-------+
> | -S 60 -i 10000    | 45.46 | 0.52 | 1.5  | 15.34 | 2.44 | 8.61  |
> +-------------------+-------+------+------+-------+------+-------+
> | -m -S 60 -i 10000 | 4.22  | 2.08 | 0.71 | 90.01 | 0    | 0.01  |
> +-------------------+-------+------+------+-------+------+-------+
> | -M -S 60 -i 10000 | 15.78 | 1.42 | 2.4  | 22.39 | 1.68 | 11.25 |
> +-------------------+-------+------+------+-------+------+-------+
> +-------------------+---------+------+------+------+------+
> | Parameters        | Correct | OS 1 | OS 2 | OS 3 | OS 4 |
> +-------------------+---------+------+------+------+------+
> | -S 60 -i 10000    | 17.03   | 1.73 | 1.1  | 6.27 | 0    |
> +-------------------+---------+------+------+------+------+
> | -m -S 60 -i 10000 | 2.44    | 0.18 | 0.13 | 0.22 | 0    |
> +-------------------+---------+------+------+------+------+
> | -M -S 60 -i 10000 | 31.65   | 3.45 | 1.8  | 8.18 | 0    |
> +-------------------+---------+------+------+------+------+
>
> Weighted TEO Governor - Prediction distribution %
> +-------------------+-------+------+------+-------+------+-------+
> | Parameters        | US 1  | US 2 | US 3 | US 4  | US 5 | US 6  |
> +-------------------+-------+------+------+-------+------+-------+
> | -S 60 -i 10000    | 8.25  | 0.87 | 0.98 | 19.23 | 4.05 | 26.35 |
> +-------------------+-------+------+------+-------+------+-------+
> | -m -S 60 -i 10000 | 7.69  | 4.35 | 0.93 | 82.74 | 0.01 | 0.01  |
> +-------------------+-------+------+------+-------+------+-------+
> | -M -S 60 -i 10000 | 3.73  | 3.29 | 0.73 | 13.33 | 7.38 | 18.61 |
> +-------------------+-------+------+------+-------+------+-------+
> +-------------------+---------+------+------+------+------+
> | Parameters        | Correct | OS 1 | OS 2 | OS 3 | OS 4 |
> +-------------------+---------+------+------+------+------+
> | -S 60 -i 10000    | 32.86   | 3.27 | 2.05 | 2.09 | 0    |
> +-------------------+---------+------+------+------+------+
> | -m -S 60 -i 10000 | 3.4     | 0.29 | 0.28 | 0.3  | 0    |
> +-------------------+---------+------+------+------+------+
> | -M -S 60 -i 10000 | 48.19   | 1.8  | 0.93 | 1.97 | 0.04 |
> +-------------------+---------+------+------+------+------+

For accuracy numbers, it would help to know the sample size
and the importance.

For this 60 second test, I wonder if the number of times
each idle state was entered and exited was large enough to
draw any conclusion. I often find in tests that some states are
only used a few times per minute, so I don't really care about their accuracy.

Anyway, for my attempts at this test, I had to extend to a 5 minute sample
time to get adequate numbers in all idle states for the accuracy statistics
(which showed no difference, by the way, for those not looking at the graphs).

For my test, all three governors (teo, wtteo, and menu) were
using idle state 0 about 7 to 8 thousand times per 5 minutes,
and 100% of the time the assessment was that the state was too shallow.
However, I don't really care, because it is only 0.003% of the time,
and if idle state 0 is disabled (see teo-0disable in [3]; it is enabled
again at minute 35), the power doesn't change.
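The synthetic workload Doug describes (100 very short sleeps, then a 10 millisecond sleep, repeated forever) can be sketched in C as follows. The thread does not give the length of a "very short sleep", so it is a parameter here, and the loop runs a bounded number of cycles instead of forever; `sleep_ns()` and `run_workload()` are hypothetical names, not Doug's actual test program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Sleep for ns nanoseconds; a signal may cut it short (not retried). */
static void sleep_ns(long ns)
{
	struct timespec ts;

	assert(ns >= 0);
	ts.tv_sec = ns / 1000000000L;
	ts.tv_nsec = ns % 1000000000L;
	nanosleep(&ts, NULL);
}

/*
 * One cycle = `burst` very short sleeps followed by one long sleep.
 * Returns the total number of short sleeps performed.
 */
static long run_workload(int cycles, int burst, long short_ns, long long_ns)
{
	long short_sleeps = 0;
	int c, i;

	for (c = 0; c < cycles; c++) {
		for (i = 0; i < burst; i++) {
			sleep_ns(short_ns);
			short_sleeps++;
		}
		sleep_ns(long_ns);
	}
	return short_sleeps;
}
```

For example, `run_workload(1000, 100, 1000, 10000000L)` runs 1000 cycles of 100 sleeps of 1 usec each followed by a 10 ms sleep; a larger cycle count approximates the endless loop. Actual sleeps may last longer than requested because of timer slack and wakeup latency, which is part of what exercises the idle governor.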
All that being said, your power/accuracy results do seem correlated.

>
> Pgbench
> -------
> pgbench is a simple program for running benchmark tests on PostgreSQL.
> It runs the same sequence of SQL commands over and over, possibly in
> multiple concurrent database sessions, and then calculates the average
> transaction rate (transactions per second).

I did not try this test or anything similar.

...

>
> Hackbench
> ---------
> Creates a specified number of pairs of schedulable entities
> which communicate via either sockets or pipes and time how long it
> takes for each pair to send data back and forth.
>

I found a Phoronix version, but it doesn't like
your low loop counts, so I stayed with the default 50,000.

I suspect your low loop count results in a workload somewhat like
your special ebizzy test. Anyway, maybe I should try your version
and low loop counts.

I did many tests, and got inconsistent results.

You use terms like "sockets" and "pipes", but
the Phoronix test uses "count" and "thread" or "process".

I only used "process" for the simple reason that there was very,
very little use of idle at all with "thread", so there was no value
in any test.

hackbench test 1: all - process (seconds, less is better)

test  count      teo     wtteo             menu
   1      1      8.7      8.99  103.33%    9.071  104.26%
   2      2   16.509     16.96  102.73%   17.159  103.94%
   3      4   33.451    34.081  101.88%   34.101  101.94%
   4      8   69.037    71.647  103.78%   69.914  101.27%
   5     16   161.64   165.569  102.43%  165.015  102.09%

Powers and other idle statistics were similar. [4]

hackbench test 2: count 1 - process (seconds, less is better)

              teo     wtteo             menu
average     8.906     8.703   97.72%    9.032  101.41%
max         9.263     8.856             9.228
min         8.761     8.599             8.876
Std. Dev.   0.83%     0.46%             0.80%
runs          256       256               200

Powers and other idle statistics were similar. [5]
However, idle state 3 is worthy of a look.

hackbench test 3: count 2 - process (seconds, less is better)

              teo     wtteo             menu
average    16.702     16.65   99.69%   16.796  100.56%
max        16.853    16.966            17.058
min        16.542    16.487            16.659
Std. Dev.   0.41%     0.59%             0.56%
runs          100       100               100

Powers and other idle statistics were similar. [6]
However, idle state 3 is worthy of a look.

> Machine - IBM Power 9
>
> Scale of measurement:
> 1. Time (s)
> 2. Power (watts)
> Time is normalized
>
> +---------+----------+----------------------+-------------------+
> | Loops   | TEO Time | Wt. TEO Time Sockets | Wt. TEO Time Pipe |
> +---------+----------+----------------------+-------------------+
> | 100     | 100      | 95.23                | 87.09             |
> +---------+----------+----------------------+-------------------+
> | 1000    | 100      | 105.81               | 98.67             |
> +---------+----------+----------------------+-------------------+
> | 10000   | 100      | 99.33                | 92.73             |
> +---------+----------+----------------------+-------------------+
> | 100000  | 100      | 98.88                | 101.99            |
> +---------+----------+----------------------+-------------------+
> | 1000000 | 100      | 100.04               | 100.2             |
> +---------+----------+----------------------+-------------------+
>
> Power: Socket: Consistent between 135-140 watts for both TEO and Wt. TEO
>        Pipe: Consistent between 125-130 watts for both TEO and Wt. TEO
>
> Pratik Rajesh Sampat (1):
>   Weighted approach to gather and use history in TEO governor
>
>  drivers/cpuidle/governors/teo.c | 96 +++++++++++++++++++++++++++++++--
>  1 file changed, 91 insertions(+), 5 deletions(-)
>
> --
> 2.17.1

I also tried Giovanni's and Mel's mmtests (which use idle states 0 and 1 a lot),
but couldn't extract the performance report. [7]

Old sweep test, which doesn't produce performance data. [8]
Old system idle test. [9]

[1] http://www.smythies.com/~doug/linux/idle/wtteo/schbench/
[2] http://www.smythies.com/~doug/linux/idle/wtteo/ebizzy/
[3] http://www.smythies.com/~doug/linux/idle/wtteo/pn01/
[4] http://www.smythies.com/~doug/linux/idle/wtteo/hackbench/
[5] http://www.smythies.com/~doug/linux/idle/wtteo/hackbench2/
[6] http://www.smythies.com/~doug/linux/idle/wtteo/hackbench3/
[7] http://www.smythies.com/~doug/linux/idle/wtteo/mmtests-udp/
[8] http://www.smythies.com/~doug/linux/idle/wtteo/sweep/
[9] http://www.smythies.com/~doug/linux/idle/wtteo/idle/

... Doug
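The percentages next to the wtteo and menu columns in Doug's tables are each result relative to the unmodified teo baseline (for example, 14194 / 14197 gives 99.98%, and 8.99 / 8.7 gives 103.33%), and his remark about idle state 0 contrasts how often a state is entered with how much of the time it accounts for. Both are plain ratios; the helper names `normalize()` and `share_pct()` below are hypothetical.

```c
#include <assert.h>

/* Express value relative to a baseline of 100, as in "wtteo 99.98%". */
static double normalize(double value, double baseline)
{
	assert(baseline != 0.0);
	return 100.0 * value / baseline;
}

/* Percentage of a total (entry count or residency time) taken by one part. */
static double share_pct(double part, double total)
{
	assert(total > 0.0);
	return 100.0 * part / total;
}
```

Applying `share_pct()` once to entry counts and once to residency times shows how a state can be entered thousands of times, always be judged too shallow, and still matter little if its share of total time is tiny.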
* Re: [RFC 0/1] Alternate history mechanism for the TEO governor
From: Pratik Sampat @ 2020-05-21 11:09 UTC
To: Doug Smythies
Cc: linux-kernel, linux-pm, rafael.j.wysocki, peterz, daniel.lezcano, ego, svaidy, pratik.sampat, pratik.r.sampat

Hello Doug,

Thanks a lot for running these benchmarks on an Intel box.

On 17/05/20 11:41 pm, Doug Smythies wrote:
> On 2020.05.11 Pratik Rajesh Sampat wrote:
>> First RFC posting: https://lkml.org/lkml/2020/2/22/27
> Summary:
>
> On that thread I wrote:
>
>   > I have done a couple of other tests with this patch set,
>   > but nothing to report yet, as the differences have been
>   > minor so far.
>
> I tried your tests, or as close as I could find, and still
> do not notice much difference.

That is quite unfortunate. At least it doesn't seem to regress.

Nevertheless, as Rafael suggested, aging is crucial, and this patch doesn't
age the weights. I do have a version with aging, but it had a lot of
run-to-run variance, so I refrained from posting it.
I'm tweaking the logic for aging as well as the distribution of weights;
hopefully that helps.

> For detail, but likely little added value, read on:
>
> ...
>
>> Sleeping Ebizzy
>> ---------------
>> Program to generate workloads resembling web server workloads.
>> The benchmark is customized to allow for a sleep interval -i
> I found a Phoronix ebizzy, but without the customization,
> which I suspect is important to demonstrate your potential
> improvement.
>
> Could you send me yours to try?

Sure thing, sleeping ebizzy is hosted here:
https://github.com/pratiksampat/sleeping-ebizzy

> ...
>
>> Parameters:
>> 1. -m -> Always use mmap instead of malloc
>> 2. -M -> Never use mmap
>> 3. -S <seconds> -> Number of seconds to run
>> 4. -i <interval> -> Sleep interval
> What are the units of this interval?
> They must be microseconds, as that is the only thing that makes sense.

Yes, it is in microseconds.

> I have tried to simulate the resulting workload myself, but didn't
> get results like yours (I may have done a poor job of it).
> My test does not produce performance data, as it just has to finish
> its work before the next chunk of work is due.
> The test is:
>
>    forever
>       do 100 times
>          very short sleep
>       enddo
>       sleep for 10 milliseconds
>    endforever

Yes, logically this is very similar to what the benchmark emulates.

> ...
>
> For accuracy numbers, it would help to know the sample size
> and the importance.
>
> For this 60 second test, I wonder if the number of times
> each idle state was entered and exited was large enough to
> draw any conclusion. I often find in tests that some states are
> only used a few times per minute, so I don't really care about their accuracy.

The sample size does go up to the low tens of thousands, but I don't really
know the physical importance of such a number. So I get what you're saying;
maybe I need to benchmark for a longer duration, as your experience shows.

> Anyway, for my attempts at this test, I had to extend to a 5 minute sample
> time to get adequate numbers in all idle states for the accuracy statistics
> (which showed no difference, by the way, for those not looking at the graphs).
>
> For my test, all three governors (teo, wtteo, and menu) were
> using idle state 0 about 7 to 8 thousand times per 5 minutes,
> and 100% of the time the assessment was that the state was too shallow.
> However, I don't really care, because it is only 0.003% of the time,
> and if idle state 0 is disabled (see teo-0disable in [3]; it is enabled
> again at minute 35), the power doesn't change.
>
> All that being said, your power/accuracy results do seem correlated.

I believe this is a good affirmation to have. I would be worried if we
predicted more correctly and somehow ended up doing worse, or vice versa.

> ...

Thanks again for these comprehensive results.
~ Pratik
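Pratik notes that the RFC does not age its weights. One possible aging scheme, offered purely as an illustration and not as the unposted version he mentions, is to periodically pull every weight a fixed fraction of the way back toward a uniform distribution, so that stale history fades out geometrically. The sketch reuses 1/1024 fixed-point weights; `AGE_SHIFT` and `age_weights()` are hypothetical.

```c
#include <assert.h>

#define WEIGHT_SHIFT 10
#define WEIGHT_ONE   (1 << WEIGHT_SHIFT)  /* 1024 == 100% */
#define AGE_SHIFT    3                    /* assumed: close 1/8 of the gap per step */

/*
 * Move each weight in a row 1/(1 << AGE_SHIFT) of the way toward the
 * uniform value WEIGHT_ONE / nr_states. Weights are signed because the
 * gap can be negative; dividing by a power-of-two constant truncates
 * toward zero, so weights approach the uniform value without crossing it.
 */
static void age_weights(int *row, int nr_states)
{
	int prior;
	int i;

	assert(nr_states > 0);
	prior = WEIGHT_ONE / nr_states;
	for (i = 0; i < nr_states; i++)
		row[i] += (prior - row[i]) / (1 << AGE_SHIFT);
}
```

A plain `>>` is avoided for the gap because right-shifting a negative signed value is implementation-defined in C; a compiler will still turn the constant division into shifts. How often to call such a function (per idle entry, per tick, or after N wakeups) is an open design choice.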
* RE: [RFC 0/1] Alternate history mechanism for the TEO governor
From: Doug Smythies @ 2020-05-25 18:32 UTC
To: 'Pratik Sampat'
Cc: linux-kernel, linux-pm, rafael.j.wysocki, peterz, daniel.lezcano, ego, svaidy, pratik.sampat, pratik.r.sampat

On 2020.05.21 04:09 Pratik Sampat wrote:
> On 17/05/20 11:41 pm, Doug Smythies wrote:
> > On 2020.05.11 Pratik Rajesh Sampat wrote:
> >> First RFC posting: https://lkml.org/lkml/2020/2/22/27
> > Summary:
> >
> > On that thread I wrote:
> >
> >   > I have done a couple of other tests with this patch set,
> >   > but nothing to report yet, as the differences have been
> >   > minor so far.
> >
> > I tried your tests, or as close as I could find, and still
> > do not notice much difference.
>
> That is quite unfortunate. At least it doesn't seem to regress.

Yes, while I have not been able to demonstrate improvement,
I have not found any regression.

>
> Nevertheless, as Rafael suggested, aging is crucial, and this patch doesn't
> age the weights. I do have a version with aging, but it had a lot of
> run-to-run variance, so I refrained from posting it.
> I'm tweaking the logic for aging as well as the distribution of weights;
> hopefully that helps.

O.K. I am putting this testing aside for now.
I like the set of tests, as they really show the differences
between the menu and teo governors well.

> >>
> >> Sleeping Ebizzy
> >> ---------------
> >> Program to generate workloads resembling web server workloads.
> >> The benchmark is customized to allow for a sleep interval -i
> > I found a Phoronix ebizzy, but without the customization,
> > which I suspect is important to demonstrate your potential
> > improvement.
> >
> > Could you send me yours to try?
>
> Sure thing, sleeping ebizzy is hosted here:
> https://github.com/pratiksampat/sleeping-ebizzy
>
> >
> > ebizzy (records per second, more is better)
> >
> >    teo      wtteo            menu
> > 132344     132228   99.91%   130926   98.93%

O.K., yours is way different from what I was using.
Anyway, results are still not very different between teo and wtteo.
Some tests are showing a little difference in the above/below statistics [1].

[1] http://www.smythies.com/~doug/linux/idle/wtteo/ebizzy-interval/2_below.png

By the way, and likely not relevant, your sleeping-ebizzy test seems
extremely sensitive to the interval and the number of threads.
It is not clear to me what settings I should use to try to re-create
your results. [2] is an interesting graph of records per second versus
intervals versus threads.

[2] http://www.smythies.com/~doug/linux/idle/wtteo/doug08/sleeping-ebizzy-records-intervals-threads.png
Thread overview: 9+ messages
2020-05-11 14:10 [RFC 0/1] Alternate history mechanism for the TEO governor Pratik Rajesh Sampat
2020-05-11 14:10 ` [RFC 1/1] Weighted approach to gather and use history in " Pratik Rajesh Sampat
2020-05-12 17:37   ` Peter Zijlstra
2020-05-13  5:31     ` Pratik Sampat
2020-05-13 14:49       ` Rafael J. Wysocki
2020-05-14 15:35         ` Pratik Sampat
2020-05-17 18:11 ` [RFC 0/1] Alternate history mechanism for the " Doug Smythies
2020-05-21 11:09   ` Pratik Sampat
2020-05-25 18:32     ` Doug Smythies