LKML Archive on lore.kernel.org
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Peter Williams @ 2004-05-29 5:27 UTC
To: Con Kolivas; +Cc: Ingo Molnar, Linux Kernel Mailing List
Con Kolivas wrote:
> On Fri, 28 May 2004 19:24, Peter Williams wrote:
> > Ingo Molnar wrote:
> > > just try it - run a task that runs 95% of the time and sleeps 5%
> > > of the time, and run a (same prio) task that runs 100% of the
> > > time. With the current scheduler the slightly-sleeping task gets
> > > 45% of the CPU, the looping one gets 55% of the CPU. With your
> > > patch the slightly-sleeping process can easily monopolize 90% of
> > > the CPU!
>
> > This does, of course, not take into account the interactive bonus.
> > If the task doing the shorter CPU bursts manages to earn a larger
> > interactivity bonus than the other then it will get more CPU but
> > isn't that the intention of the interactivity bonus?
>
> No. Ideally the interactivity bonus should decide what goes first
> every time to decrease the latency of interactive tasks, but the cpu
> percentage should remain close to the same for equal "nice" tasks.
There are at least two possible ways of viewing "nice": one of these is
that it is an indicator of the task's entitlement to CPU resources (which
is more or less the view you describe) and another that it is an
indicator of the task's priority with respect to access to CPU resources.
If you wish the system to take the first of these views then the
appropriate solution to the scheduling problem is to use an entitlement
based scheduler such as EBS (see
<http://sourceforge.net/projects/ebs-linux/>) which is also much simpler
than the current O(1) scheduler and has the advantage that it gives
pretty good interactive responsiveness without treating interactive
tasks specially (although some modification in this regard may be
desirable if very high loads are going to be encountered).
If you want the second of these then this proposed modification is a
simple way of getting it (with the added proviso that starvation be
avoided).
Of course, there can be other scheduling aims, such as maximising
throughput, where different scheduler paradigms need to be used. As a
matter of interest, these tend to have rather poor interactive response.
If the system is an interactive system then all of these models (or at
least two of them) need to be modified to "break the rules" as far as
interactive tasks are concerned and give them higher priority in order
not to try human patience.
> Interactive tasks need low scheduling latency and short bursts of high
> cpu usage; not more cpu usage overall. When the cpu percentage differs
> significantly from this the logic has failed.
The only way this will happen is if the interactive bonus mechanism
misidentifies a CPU bound task as an interactive task and gives it a
large bonus. This seems to be the case as tasks with a 95% CPU demand
rate are being given a bonus of 9 (out of 10 possible) points.
Peter
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Con Kolivas @ 2004-05-29 11:17 UTC
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
On Sat, 29 May 2004 15:27, Peter Williams wrote:
> Con Kolivas wrote:
> > On Fri, 28 May 2004 19:24, Peter Williams wrote:
> > > Ingo Molnar wrote:
> > > > just try it - run a task that runs 95% of the time and sleeps 5%
> > > > of the time, and run a (same prio) task that runs 100% of the
> > > > time. With the current scheduler the slightly-sleeping task gets
> > > > 45% of the CPU, the looping one gets 55% of the CPU. With your
> > > > patch the slightly-sleeping process can easily monopolize 90% of
> > > > the CPU!
> > >
> > > This does, of course, not take into account the interactive bonus.
> > > If the task doing the shorter CPU bursts manages to earn a larger
> > > interactivity bonus than the other then it will get more CPU but
> > > isn't that the intention of the interactivity bonus?
> >
> > No. Ideally the interactivity bonus should decide what goes first
> > every time to decrease the latency of interactive tasks, but the cpu
> > percentage should remain close to the same for equal "nice" tasks.
>
> There are at least two possible ways of viewing "nice": one of these is
> that it is an indicator of the task's entitlement to CPU resources (which
> is more or less the view you describe) and another that it is an
> indicator of the task's priority with respect to access to CPU resources.
>
> If you wish the system to take the first of these views then the
> appropriate solution to the scheduling problem is to use an entitlement
> based scheduler such as EBS (see
> <http://sourceforge.net/projects/ebs-linux/>) which is also much simpler
> than the current O(1) scheduler and has the advantage that it gives
> pretty good interactive responsiveness without treating interactive
> tasks specially (although some modification in this regard may be
> desirable if very high loads are going to be encountered).
>
> If you want the second of these then this proposed modification is a
> simple way of getting it (with the added proviso that starvation be
> avoided).
>
> Of course, there can be other scheduling aims, such as maximising
> throughput, where different scheduler paradigms need to be used. As a
> matter of interest, these tend to have rather poor interactive response.
>
> If the system is an interactive system then all of these models (or at
> least two of them) need to be modified to "break the rules" as far as
> interactive tasks are concerned and give them higher priority in order
> not to try human patience.
>
> > Interactive tasks need low scheduling latency and short bursts of high
> > cpu usage; not more cpu usage overall. When the cpu percentage differs
> > significantly from this the logic has failed.
>
> The only way this will happen is if the interactive bonus mechanism
> misidentifies a CPU bound task as an interactive task and gives it a
> large bonus. This seems to be the case as tasks with a 95% CPU demand
> rate are being given a bonus of 9 (out of 10 possible) points.
This is all a matter of semantics and I have no argument with it.
I think your aims of simplifying the scheduler are admirable but I hope you
don't suffer the quagmire that is manipulating the interactivity stuff.
Changing one value and saying it has no apparent effect is almost certainly
wrong; surely it was put there for a reason - or rather I put it there for a
reason.
Con
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Peter Williams @ 2004-05-30 0:19 UTC
To: Con Kolivas; +Cc: Ingo Molnar, Linux Kernel Mailing List
Con Kolivas wrote:
> On Sat, 29 May 2004 15:27, Peter Williams wrote:
>
>>Con Kolivas wrote:
>> > On Fri, 28 May 2004 19:24, Peter Williams wrote:
>> > > Ingo Molnar wrote:
>> > > > just try it - run a task that runs 95% of the time and sleeps 5%
>> > > > of the time, and run a (same prio) task that runs 100% of the
>> > > > time. With the current scheduler the slightly-sleeping task gets
>> > > > 45% of the CPU, the looping one gets 55% of the CPU. With your
>> > > > patch the slightly-sleeping process can easily monopolize 90% of
>> > > > the CPU!
>> > >
>> > > This does, of course, not take into account the interactive bonus.
>> > > If the task doing the shorter CPU bursts manages to earn a larger
>> > > interactivity bonus than the other then it will get more CPU but
>> > > isn't that the intention of the interactivity bonus?
>> >
>> > No. Ideally the interactivity bonus should decide what goes first
>> > every time to decrease the latency of interactive tasks, but the cpu
>> > percentage should remain close to the same for equal "nice" tasks.
>>
>>There are at least two possible ways of viewing "nice": one of these is
>>that it is an indicator of the task's entitlement to CPU resources (which
>>is more or less the view you describe) and another that it is an
>>indicator of the task's priority with respect to access to CPU resources.
>>
>>If you wish the system to take the first of these views then the
>>appropriate solution to the scheduling problem is to use an entitlement
>>based scheduler such as EBS (see
>><http://sourceforge.net/projects/ebs-linux/>) which is also much simpler
>>than the current O(1) scheduler and has the advantage that it gives
>>pretty good interactive responsiveness without treating interactive
>>tasks specially (although some modification in this regard may be
>>desirable if very high loads are going to be encountered).
>>
>>If you want the second of these then this proposed modification is a
>>simple way of getting it (with the added proviso that starvation be
>>avoided).
>>
>>Of course, there can be other scheduling aims, such as maximising
>>throughput, where different scheduler paradigms need to be used. As a
>>matter of interest, these tend to have rather poor interactive response.
>>
>>If the system is an interactive system then all of these models (or at
>>least two of them) need to be modified to "break the rules" as far as
>>interactive tasks are concerned and give them higher priority in order
>>not to try human patience.
>>
>> > Interactive tasks need low scheduling latency and short bursts of high
>> > cpu usage; not more cpu usage overall. When the cpu percentage differs
>> > significantly from this the logic has failed.
>>
>>The only way this will happen is if the interactive bonus mechanism
>>misidentifies a CPU bound task as an interactive task and gives it a
>>large bonus. This seems to be the case as tasks with a 95% CPU demand
>>rate are being given a bonus of 9 (out of 10 possible) points.
>
>
> This is all a matter of semantics and I have no argument with it.
>
> I think your aims of simplifying the scheduler are admirable but I hope you
> don't suffer the quagmire that is manipulating the interactivity stuff.
As you surmise, this patch is just a starting point and there are some
parts of it that may need to be fine-tuned.
For instance, the current time slice used is set at the average that the
current mechanism would have dispensed. Making this smaller would
lessen the severity of the anomaly under discussion but making it too
small would increase the context switch rate. There is evidence from
our kernbench results that we have room to decrease this value and still
keep the context switch rate below that of the current scheduler (at
least, for normal to moderately heavy loads). If possible I'd like to
get some statistics on the sleep/wake cycles of tasks on a typical
system to help make a judgment about what is the best value here.
Another area that needs more consideration is the determination of the
promotion interval. At the moment, there's no promotion if there are
fewer than 2 runnable tasks on a CPU; otherwise the interval is a
constant multiplied by the number of runnable tasks.
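Roughly, as a sketch only (the base interval name and its value here are
invented for illustration, not taken from the patch):

/* Sketch of the promotion interval calculation described above. */
#include <stdio.h>

#define BASE_PROM_INTERVAL_MS 100  /* base interval (assumed value) */

static long promotion_interval_ms(unsigned int nr_running)
{
    if (nr_running < 2)
        return -1;  /* fewer than 2 runnable tasks: no promotion */
    return (long)BASE_PROM_INTERVAL_MS * nr_running;
}

int main(void)
{
    for (unsigned int n = 1; n <= 4; n++)
        printf("nr_running=%u -> interval %ld ms\n",
               n, promotion_interval_ms(n));
    return 0;
}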
Another area of investigation is (yet another) bonus intended to
increase system throughput by minimizing (or at least attempting to
minimize) the time tasks spend on the run queues. The principal
difficulty here is making sure that this doesn't adversely affect
interactive responsiveness, as it's an unfortunate fact of life that
what's good for interactive response isn't necessarily (and usually
isn't) good for maximizing throughput and vice versa.
Then, the interactive bonus mechanism might be examined but this is of
low priority as the current one seems to do a reasonable job.
Lastly, with the simplification of the scheduler I believe that it would
be possible to make both the interactive response and throughput bonuses
optional. An example of why this MIGHT BE desirable is that the
interactive response bonus adversely affects throughput, so turning it
off on servers where there are no interactive users may be worthwhile.
> Changing one value and saying it has no apparent effect is almost certainly
> wrong; surely it was put there for a reason - or rather I put it there for a
> reason.
Out of interest, what was the reason? What problem were you addressing?
Peter
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Con Kolivas @ 2004-05-30 12:56 UTC
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
On Sun, 30 May 2004 10:19, Peter Williams wrote:
> Out of interest, what was the reason? What problem were you addressing?
The interactive credit?
There was a problem with difficulty elevating back to interactive state if an
interactive task (e.g. XFree86) had used too long a burst of cpu, which was
addressed by making the bonus pseudo-exponentially curved for rapid recovery
and slow decay - in fact this is probably the most important part of
addressing the interactive tasks and had the best effect. The problem was
that giving this to all tasks meant that cpu bound tasks that had, as a
property of their behaviour, long waits on say pipes or i/o would also get
this rapid recovery to interactive state, and as soon as they became fully
bound to cpu again they would cause noticeable stalls. The standard example is
the increasing number of jobs in a make, where each job waits longer for i/o
as the job numbers increase. However there were much worse examples at even
normal to low loads, such as mpeg or divx encoding where the encoder would
buffer, say, 250 frames while sleeping and then do them in a burst (this
being the maximum spacing between key [I] frames, or intervals). The
interactive credit prevented those tasks that would have long but only
infrequent sleeps from getting the curved bonus/penalty.
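As a toy illustration of the shape of that curve (the constants and
update rules here are invented, not the actual interactivity code):

/* Toy model: fast pseudo-exponential recovery towards the maximum on
 * sleep, slow decay while cpu bound.  All constants invented. */
#include <stdio.h>

#define MAX_BONUS 10.0

static double bonus_after_sleep(double bonus)
{
    /* close half the remaining gap to the maximum on each sleep */
    return bonus + (MAX_BONUS - bonus) / 2.0;
}

static double bonus_after_run(double bonus)
{
    /* shed only a small fixed amount per slice while running */
    return bonus > 0.5 ? bonus - 0.5 : 0.0;
}

int main(void)
{
    double b = 0.0;

    for (int i = 0; i < 4; i++)
        printf("after sleep %d: %.2f\n", i + 1, b = bonus_after_sleep(b));
    for (int i = 0; i < 4; i++)
        printf("after run   %d: %.2f\n", i + 1, b = bonus_after_run(b));
    return 0;
}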
Hmm... if this is black magic I guess I'm breaking the magician's cardinal
rules and revealing my tricks ;-)
Con
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Con Kolivas @ 2004-05-30 23:13 UTC
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
Quoting Peter Williams <pwil3058@bigpond.net.au>:
> Con Kolivas wrote:
> > The interactive credit?
>
> No. I was asking about the mechanism in schedule() that, based on the
> value of "activated", allows some tasks to count their time on the run
> queue as sleep time.
No, this is different again. When there is significant load and an interactive
task (like X) is using bursts of cpu it will not get a chance to sleep. It will
either run out of timeslice or be preempted and thus will be constantly on the
runqueue. In that time, if only sleep time is counted, it is progressively seen
as a cpu bound task - in fact this was a major problem before, where X would
basically die under load because you'd move a window and it would be constantly
waiting for more cpu time rather than ever sleeping.
> I also think that there is a category of task that may be automatically
> detected and that needs special attention: tasks that need very regular
> access to small bursts of CPU. I suspect that the tasks doing the mpeg
> and divx encoding/decoding that you refer to above fall into this
> category. The key to identifying these tasks would be that the variance
> (or standard deviation) of their sleeps would be close to zero, and as
> they probably do much the same amount of work in each CPU burst the same
> would be true of the variance of the length of the CPU bursts.
I've already tried that experiment and I personally failed to make the pattern
of sleep/burst correspond with interactivity. There is a huge variation in the
duration of sleep/run for all different things, and it changes with load. Of
course your modelling may be better than mine.
> Dr Peter Williams pwil3058@bigpond.net.au
Oooh, I've got one of those too but I normally don't use it on lkml.
So just for this once...
Dr Con Kolivas
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Peter Williams @ 2004-05-31 0:04 UTC
To: Con Kolivas; +Cc: Ingo Molnar, Linux Kernel Mailing List
Con Kolivas wrote:
> On Sun, 30 May 2004 10:19, Peter Williams wrote:
>
>>Out of interest, what was the reason? What problem were you addressing?
>
>
> The interactive credit?
No. I was asking about the mechanism in schedule() that, based on the
value of "activated", allows some tasks to count their time on the run
queue as sleep time.
>
> There was a problem with difficulty elevating back to interactive state if an
> interactive task (e.g. XFree86) had used too long a burst of cpu, which was
> addressed by making the bonus pseudo-exponentially curved for rapid recovery
> and slow decay - in fact this is probably the most important part of
> addressing the interactive tasks and had the best effect.
But this probably answers my question anyway.
> The problem was
> that giving this to all tasks meant that cpu bound tasks that had, as a
> property of their behaviour, long waits on say pipes or i/o would also get
> this rapid recovery to interactive state, and as soon as they became fully
> bound to cpu again they would cause noticeable stalls. The standard example is
> the increasing number of jobs in a make, where each job waits longer for i/o
> as the job numbers increase. However there were much worse examples at even
> normal to low loads, such as mpeg or divx encoding where the encoder would
> buffer, say, 250 frames while sleeping and then do them in a burst (this
> being the maximum spacing between key [I] frames, or intervals). The
> interactive credit prevented those tasks that would have long but only
> infrequent sleeps from getting the curved bonus/penalty.
In the near future, I'll be proposing another patch that goes on top of
the single priority array (SPA) patch that has the express purpose of
reducing the time that all tasks spend on run queues. I think that one
of the side effects of this mechanism is that it will alleviate the
problem you've described above.
I also think that there is a category of task that may be automatically
detected and that needs special attention: tasks that need very regular
access to small bursts of CPU. I suspect that the tasks doing the mpeg
and divx encoding/decoding that you refer to above fall into this
category. The key to identifying these tasks would be that the variance
(or standard deviation) of their sleeps would be close to zero, and as
they probably do much the same amount of work in each CPU burst the same
would be true of the variance of the length of the CPU bursts.
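One cheap way to estimate that on the fly would be an exponentially
weighted mean and variance of the sleep lengths, something like the
following sketch (the smoothing factor and threshold are guesses):

/* Sketch: spot very regular sleepers via a running mean/variance. */
#include <stdio.h>

struct burst_stats {
    double mean;  /* EWMA of sleep length (ns) */
    double var;   /* EWMA of squared deviation */
};

static void stats_update(struct burst_stats *s, double sleep_ns)
{
    const double alpha = 0.125;  /* smoothing factor (assumed) */
    double diff = sleep_ns - s->mean;

    s->mean += alpha * diff;
    s->var = (1.0 - alpha) * (s->var + alpha * diff * diff);
}

static int looks_regular(const struct burst_stats *s)
{
    /* near-zero variance relative to the mean: a regular sleeper */
    return s->var < 0.01 * s->mean * s->mean;
}

int main(void)
{
    struct burst_stats s = { 0.0, 0.0 };

    for (int i = 0; i < 50; i++)
        stats_update(&s, 10e6);  /* perfectly regular 10ms sleeps */
    printf("regular sleeper: %s\n", looks_regular(&s) ? "yes" : "no");
    return 0;
}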
Currently, the scheduler relies on the interactive bonus to make sure
that these tasks get CPU when they need it but I suspect that to make
this happen the interactive bonus has to be larger than it might
otherwise need to be. So if these tasks can be identified and treated
specially (e.g. reserve the MAX_RT_PRIO slot for them) the interactive
bonus could be reduced and this would improve overall system throughput.
>
> Hmm... if this is black magic I guess I'm breaking the magician's cardinal
> rules and revealing my tricks ;-)
>
Peter
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Peter Williams @ 2004-06-04 7:40 UTC
To: Andi Kleen; +Cc: Con Kolivas, linux-kernel, kernel.linux
Andi Kleen wrote:
> Con Kolivas <kernel@kolivas.org> writes:
>
>>I think your aims of simplifying the scheduler are admirable but I hope you
>>don't suffer the quagmire that is manipulating the interactivity stuff.
>>Changing one value and saying it has no apparent effect is almost certainly
>>wrong; surely it was put there for a reason - or rather I put it there for a
>>reason.
>
>
> But that doesn't mean that the reason cannot be reevaluated later.
> If Peter can come up with a simpler scheduler and nobody can break it
> significantly it would be great, and I hope such simplifications could be
> merged after testing. Certainly the current one does far too much black magic.
As a result of your encouragement I have implemented a simplified
version of the interactive bonus scheme and an extension whose aim is to
improve general system throughput on top of my SPA patch (against the
2.6.6 kernel), an updated version of which is available at:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa-v2>
Interactive Bonus (SPA_IA_BONUS) patch is available at:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa_ia_bonus>
This patch approaches the problem from a control systems perspective
and the following statistics are calculated for each task (using
extremely simple and efficient Kalman filters):
A_c, the running average number of nanoseconds of CPU time that it
spends on the CPU during a scheduling cycle (SC), which is defined to
be the period from one activation (or wake up) to the next;
A_s, the running average number of nanoseconds that it spends sleeping
during the SC; and
A_d, the running average number of nanoseconds that it spends on a run
queue waiting for CPU access.
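A minimal sketch of how such running averages might be maintained (a
simple fixed-point exponential decay; the 1/8 weighting is an
assumption for illustration, not a value taken from the patch):

/* Sketch only: per-task averages A_c, A_s, A_d kept as exponentially
 * decaying running means; the shift value is assumed. */
#include <stdio.h>

#define AVG_SHIFT 3  /* each new sample carries 1/8 weight (assumed) */

struct task_stats {
    unsigned long long a_c;  /* avg CPU time per scheduling cycle (ns) */
    unsigned long long a_s;  /* avg sleep time per cycle (ns) */
    unsigned long long a_d;  /* avg runqueue delay per cycle (ns) */
};

static unsigned long long decay_avg(unsigned long long avg,
                                    unsigned long long sample)
{
    /* avg += (sample - avg) / 2^AVG_SHIFT, in pure integer form */
    return avg - (avg >> AVG_SHIFT) + (sample >> AVG_SHIFT);
}

int main(void)
{
    struct task_stats ts = { 0, 0, 0 };

    /* one scheduling cycle: 1ms on CPU, 19ms asleep, 0.5ms queued */
    ts.a_c = decay_avg(ts.a_c, 1000000ULL);
    ts.a_s = decay_avg(ts.a_s, 19000000ULL);
    ts.a_d = decay_avg(ts.a_d, 500000ULL);
    printf("A_c=%llu A_s=%llu A_d=%llu\n", ts.a_c, ts.a_s, ts.a_d);
    return 0;
}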
These statistics are then used to make the following tests about the task:
1. is it showing interactive behaviour, and/or
2. is it showing CPU bound behaviour.
The first test consists of testing that A_s / (A_s + A_c) is greater
than a threshold (currently 90%) (NB this is equivalent to A_c / (A_c +
A_s) being less than 10%) and also that the average sleep time isn't
greater than a threshold (currently set at 15 minutes). This test is
applied at the end of each scheduling cycle, when the task is woken and
put back on the run queue. If the task passes this test its interactive
bonus is increased asymptotically towards the maximum bonus (or, at
least, the maximum multiplied by its average A_s / (A_s + A_c)).
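As a rough worked sketch of this first test (the 90% threshold, the 15
minute cutoff and the asymptotic target are from the description above;
the recovery rate and the names are assumptions):

/* Sketch of the interactive test; thresholds follow the text above,
 * the rest is invented for illustration. */
#include <stdio.h>

#define MAX_IA_BONUS 10.0
#define IA_THRESHOLD 0.90
#define MAX_AVG_SLEEP_NS 9.0e11  /* 15 minutes */

struct task_stats { double a_c, a_s, ia_bonus; };

static void apply_interactive_test(struct task_stats *t)
{
    double sleepiness = t->a_s / (t->a_s + t->a_c);

    if (sleepiness > IA_THRESHOLD && t->a_s < MAX_AVG_SLEEP_NS) {
        /* asymptotically approach max bonus * sleepiness */
        double target = MAX_IA_BONUS * sleepiness;
        t->ia_bonus += (target - t->ia_bonus) / 4.0;  /* rate assumed */
    }
}

int main(void)
{
    /* an xmms-like task: 1ms of CPU and 99ms of sleep per cycle */
    struct task_stats t = { 1e6, 99e6, 0.0 };

    for (int i = 0; i < 8; i++)
        apply_interactive_test(&t);
    printf("bonus after 8 cycles: %.2f\n", t.ia_bonus);
    return 0;
}

Note that Ingo's 95%-busy looper fails this test outright: its A_s /
(A_s + A_c) is only about 5%, nowhere near the 90% threshold.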
The second test is applied at the end of a cycle if the above test
fails and also at the end of each time slice (if the task stays active
for more than one time slice). This test consists of testing whether
the ratio A_c / (A_s + A_c + A_d) is greater than a threshold
(currently 50%) or A_s is zero or very large (currently greater than 2
hours); if the task passes this test it is considered to be CPU bound
or a chronic idler and NOT interactive, and its interactive bonus is
decreased asymptotically towards zero. The reason that A_c / (A_s +
A_c + A_d) is used here, instead of the equivalent A_c / (A_c + A_s)
form from the first test, is so that interactive tasks are not
mistakenly classified as CPU bound tasks when the system is very busy
and their wait time becomes squeezed and turns into run queue delay
time. Similarly, the reason that A_c / (A_c + A_s + A_d) isn't used in
the first test is that it could lead to CPU bound tasks being wrongly
classed as interactive tasks when the system is very busy.
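And a matching sketch of the second test (the 50% threshold and the 2
hour cutoff are from the text; the decay rate is assumed):

/* Sketch of the CPU bound / chronic idler test. */
#include <stdio.h>

#define CPU_BOUND_THRESHOLD 0.50
#define MAX_SANE_SLEEP_NS 7.2e12  /* 2 hours */

struct task_stats { double a_c, a_s, a_d, ia_bonus; };

static void apply_cpu_bound_test(struct task_stats *t)
{
    double cpu_share = t->a_c / (t->a_s + t->a_c + t->a_d);

    if (cpu_share > CPU_BOUND_THRESHOLD ||
        t->a_s == 0.0 || t->a_s > MAX_SANE_SLEEP_NS) {
        /* CPU bound or chronic idler: decay the bonus towards zero */
        t->ia_bonus -= t->ia_bonus / 4.0;  /* rate assumed */
    }
}

int main(void)
{
    /* a looping task: 100ms of CPU, no sleep, 10ms queued per cycle */
    struct task_stats t = { 100e6, 0.0, 10e6, 9.0 };

    for (int i = 0; i < 8; i++)
        apply_cpu_bound_test(&t);
    printf("bonus after 8 slices: %.2f\n", t.ia_bonus);
    return 0;
}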
In the description of the first test, I mention that a task's
interactive bonus asymptotically approaches the product of the maximum
bonus and its average A_s / (A_s + A_c). This has the important side
effect that a fairly busy task such as the X server doesn't get as big
a bonus as low CPU usage stream servers such as xmms, which work quite
happily under extremely heavy loads including heavy X server activity.
Although this may sound complex and expensive it's actually quite cheap
as estimating the averages is very simple and the more complex bits
(involving 64 bit division to calculate the ratios) occur relatively
infrequently.
Throughput Bonus (SPA_TPT_BONUS) patch is available at:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa_tpt_bonus>
This bonus is also based on the statistics described above and awards
bonus points to tasks that are spending a disproportionate amount of
time (compared to the CPU time they are using) on run queues. The
maximum amount of this bonus is half that of the maximum interactive
bonus, so tasks receiving it should not interfere with interactive tasks.
The bonus awarded is proportional to the ratio A_d / (A_d + A_c * N),
where N is the number of tasks running on the CPU in question. This
latter correction is necessary to stop all tasks getting the maximum
bonus when the system is busy. Unlike the interactive bonus this bonus
is not persistent and is recalculated every time the task is to be
requeued. When the system is not heavily loaded this bonus has the
tendency to reduce the overall amount of queuing on the system and
improve throughput. When the system is heavily loaded it tends to
spread the pain evenly among the non interactive tasks (subject, of
course, to niceness).
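For concreteness, a sketch of that calculation (the ratio and the
half-of-maximum cap are from the text; the linear scaling is an
assumption):

/* Sketch of the throughput bonus, recalculated at every requeue. */
#include <stdio.h>

#define MAX_IA_BONUS 10.0
#define MAX_TPT_BONUS (MAX_IA_BONUS / 2.0)

static double throughput_bonus(double a_d, double a_c,
                               unsigned int nr_running)
{
    /* the factor N stops every task maxing out on a busy CPU */
    return MAX_TPT_BONUS * (a_d / (a_d + a_c * nr_running));
}

int main(void)
{
    /* the same task on a lightly and a heavily loaded CPU */
    printf("N=2: %.2f\n", throughput_bonus(50e6, 10e6, 2));
    printf("N=8: %.2f\n", throughput_bonus(50e6, 10e6, 8));
    return 0;
}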
Hopefully my explanations above don't fall into the "black magic"
category and the mechanics of the scheduler are easy to understand? :-)
If not, and you have any questions, don't hesitate to ask.
Peter
PS Patches for 2.6.7-rc2 should be available next week.
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Con Kolivas @ 2004-05-29 12:38 UTC
To: Andi Kleen; +Cc: pwil3058, linux-kernel
On Sat, 29 May 2004 22:24, Andi Kleen wrote:
> Con Kolivas <kernel@kolivas.org> writes:
> > I think your aims of simplifying the scheduler are admirable but I hope
> > you don't suffer the quagmire that is manipulating the interactivity
> > stuff. Changing one value and saying it has no apparent effect is almost
> > certainly wrong; surely it was put there for a reason - or rather I put
> > it there for a reason.
>
> But that doesn't mean that the reason cannot be reevaluated later.
> If Peter can come up with a simpler scheduler and nobody can break it
> significantly it would be great, and I hope such simplifications could be
> merged after testing. Certainly the current one does far too much black
> magic.
Once again I agree. What I'm saying is that altering the magic incantation,
even by just one knob, will lead to unexpected results, so a complete overhaul
is probably the correct way to tackle the unnecessary complexity. I see at
least 4 such overhauls happening on-list already, by people with much more
coding skill than I'll ever have, and am happy that interactivity is one of
the things high on people's minds for development.
Con
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Andi Kleen @ 2004-05-29 12:24 UTC
To: Con Kolivas; +Cc: pwil3058, linux-kernel
Con Kolivas <kernel@kolivas.org> writes:
>
> I think your aims of simplifying the scheduler are admirable but I hope you
> don't suffer the quagmire that is manipulating the interactivity stuff.
> Changing one value and saying it has no apparent effect is almost certainly
> wrong; surely it was put there for a reason - or rather I put it there for a
> reason.
But that doesn't mean that the reason cannot be reevaluated later.
If Peter can come up with a simpler scheduler and nobody can break it
significantly it would be great, and I hope such simplifications could be
merged after testing. Certainly the current one does far too much black magic.
-Andi
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Peter Williams @ 2004-05-29 1:39 UTC
To: Ingo Molnar; +Cc: Linux Kernel Mailing List
> Ingo Molnar wrote:
>> Peter Williams <peterw@xxxxxxxxxx> wrote:
>> >just try it - run a task that runs 95% of the time and sleeps 5% of the
>> >time, and run a (same prio) task that runs 100% of the time. With the
>> >current scheduler the slightly-sleeping task gets 45% of the CPU, the
>> >looping one gets 55% of the CPU. With your patch the slightly-sleeping
>> >process can easily monopolize 90% of the CPU!
>>
>> If these two tasks have the same nice value they should round robin
>> with each other in the same priority slot and this means that the one
>> doing the smaller bites of CPU each time will in fact get less CPU
>> than the other, i.e. the outcome will be the opposite of what you
>> claim.
>
> just try what I described with and without your patch and look at the
> 'top' output. You can do a simple loop plus short 10-20msec sleeps
> (via nanosleep) to simulate a 95% busy task.
OK. I've tried this experiment and an effect similar to what you
described (but not as severe) was observed. However, as I opined, this
was due to the interactive bonus mechanism being too generous to the
sleeper - sometimes giving it as many as 9 bonus points out of a
possible maximum of 10. The severity of the effect was variable and
proportional to the size of the bonus awarded at any given time. So the
problem is due to the bonus point scheme and not the absence of the
active and expired arrays.
Further investigation of this problem has revealed that the reason that
the interactive bonus scheme is so generous to this type of task is due
to the code in schedule() that allows tasks with "activated" greater than 1
to count time spent on the run queue as sleep time. If this code is
removed the effect disappears.
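For readers who don't have the source to hand, the kind of logic being
discussed looks roughly like the following -- this is an illustrative
paraphrase, NOT the actual 2.6.6 schedule() code:

/* Illustrative paraphrase only.  A task whose "activated" value marks
 * it as woken from a real sleep gets its runqueue wait credited as
 * sleep time, inflating its apparent interactivity. */
#include <stdio.h>

struct task { int activated; long long timestamp, sleep_avg; };

static void credit_wait_as_sleep(struct task *p, long long now)
{
    long long waited = now - p->timestamp;

    if (p->activated > 1)        /* see the discussion above */
        p->sleep_avg += waited;  /* runqueue wait counted as sleep */
    p->activated = 0;
}

int main(void)
{
    struct task p = { 2, 1000000LL, 0LL };

    credit_wait_as_sleep(&p, 5000000LL);
    printf("sleep credited: %lld ns\n", p.sleep_avg);
    return 0;
}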
Interestingly enough, removing that code has no effect (that I could
discern) on the interactive response even under heavy loads. This had
been observed previously and consideration was given to removing this
code as part of this patch but it was decided to make no significant
changes to the interactive bonus scheme for this patch. I.e.
modifying/improving the interactive bonus mechanism was considered to be
another issue.
Peter
PS Sorry about the change in e-mail address but my ISP changed my IP
address and this has caused me to lose my SSH connection with my
aurema.com account and as it's now Saturday here in Sydney I'm unlikely
to get it back until Monday.
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Con Kolivas @ 2004-05-28 9:57 UTC
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
On Fri, 28 May 2004 19:24, Peter Williams wrote:
> Ingo Molnar wrote:
> > just try it - run a task that runs 95% of the time and sleeps 5% of the
> > time, and run a (same prio) task that runs 100% of the time. With the
> > current scheduler the slightly-sleeping task gets 45% of the CPU, the
> > looping one gets 55% of the CPU. With your patch the slightly-sleeping
> > process can easily monopolize 90% of the CPU!
> This does, of course, not take into account the interactive bonus. If
> the task doing the shorter CPU bursts manages to earn a larger
> interactivity bonus than the other then it will get more CPU but isn't
> that the intention of the interactivity bonus?
No. Ideally the interactivity bonus should decide what goes first every time
to decrease the latency of interactive tasks, but the cpu percentage should
remain close to the same for equal "nice" tasks. Interactive tasks need low
scheduling latency and short bursts of high cpu usage; not more cpu usage
overall. When the cpu percentage differs significantly from this the logic
has failed.
Con
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Ingo Molnar @ 2004-05-28 9:29 UTC
To: Peter Williams; +Cc: Linux Kernel Mailing List
* Peter Williams <peterw@aurema.com> wrote:
> >just try it - run a task that runs 95% of the time and sleeps 5% of the
> >time, and run a (same prio) task that runs 100% of the time. With the
> >current scheduler the slightly-sleeping task gets 45% of the CPU, the
> >looping one gets 55% of the CPU. With your patch the slightly-sleeping
> >process can easily monopolize 90% of the CPU!
>
> If these two tasks have the same nice value they should round robin
> with each other in the same priority slot and this means that the one
> doing the smaller bites of CPU each time will in fact get less CPU
> than the other, i.e. the outcome will be the opposite of what you
> claim.
just try what I described with and without your patch and look at the
'top' output. You can do a simple loop plus short 10-20msec sleeps (via
nanosleep) to simulate a 95% busy task.
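Something along these lines ought to do it (an untested sketch; the
190ms/10ms split is just one way to get roughly 95% busy):

/* Sketch: busy-spin ~190ms then nanosleep ~10ms, i.e. ~95% busy. */
#include <time.h>

int main(void)
{
    struct timespec nap = { 0, 10 * 1000 * 1000 };  /* 10ms sleep */

    for (;;) {
        struct timespec start, now;

        /* burn CPU for roughly 190ms of wall time */
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while ((now.tv_sec - start.tv_sec) * 1000000000LL +
                 (now.tv_nsec - start.tv_nsec) < 190 * 1000 * 1000);

        nanosleep(&nap, NULL);  /* ~5% of each cycle asleep */
    }
    return 0;
}

Run it next to a plain while(1) looper and compare the two in 'top'.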
Ingo
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Peter Williams @ 2004-05-28 9:24 UTC
To: Ingo Molnar; +Cc: Linux Kernel Mailing List
Ingo Molnar wrote:
> * Peter Williams <peterw@aurema.com> wrote:
>
>
>>-- at the end of each time slice (or when waking up) each task is
>>given a complete new time slice and, if class SCHED_NORMAL, is put in
>>a priority slot given by (static_prio + MAX_BONUS - interactive_bonus)
>
>
> this is the Achilles' heel of approaches that try to get rid of the
> active/expired array and/or try to get rid of timeslice tracking. A
> CPU-bound task which schedules away for small amounts of time will get a
> disproportionately larger share of the CPU than a CPU-bound task that
> doesn't schedule at all.
>
> just try it - run a task that runs 95% of the time and sleeps 5% of the
> time, and run a (same prio) task that runs 100% of the time. With the
> current scheduler the slightly-sleeping task gets 45% of the CPU, the
> looping one gets 55% of the CPU. With your patch the slightly-sleeping
> process can easily monopolize 90% of the CPU!
If these two tasks have the same nice value they should round robin
with each other in the same priority slot and this means that the one
doing the smaller bites of CPU each time will in fact get less CPU than
the other, i.e. the outcome will be the opposite of what you claim.
This does, of course, not take into account the interactive bonus. If
the task doing the shorter CPU bursts manages to earn a larger
interactivity bonus than the other then it will get more CPU but isn't
that the intention of the interactivity bonus?
Peter
--
Dr Peter Williams, Chief Scientist peterw@aurema.com
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Ingo Molnar @ 2004-05-28 9:05 UTC
To: Peter Williams; +Cc: Linux Kernel Mailing List
* Peter Williams <peterw@aurema.com> wrote:
> -- at the end of each time slice (or when waking up) each task is
> given a complete new time slice and, if class SCHED_NORMAL, is put in
> a priority slot given by (static_prio + MAX_BONUS - interactive_bonus)
this is the Achilles' heel of approaches that try to get rid of the
active/expired array and/or try to get rid of timeslice tracking. A
CPU-bound task which schedules away for small amounts of time will get a
disproportionately larger share of the CPU than a CPU-bound task that
doesn't schedule at all.
just try it - run a task that runs 95% of the time and sleeps 5% of the
time, and run a (same prio) task that runs 100% of the time. With the
current scheduler the slightly-sleeping task gets 45% of the CPU, the
looping one gets 55% of the CPU. With your patch the slightly-sleeping
process can easily monopolize 90% of the CPU!
Ingo
* [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
From: Peter Williams @ 2004-05-28 4:52 UTC
To: Linux Kernel Mailing List
This is a modification of the O(1) scheduler to replace the active and
expired priority arrays with a single priority array:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa>
This patch simplifies the scheduler code by:
- removing the need to manage two priority arrays per CPU
- removing the need to use variable time slices to implement "nice"
which means no more sharing of time slices between parents and children
at fork and exit.
Key features are:
-- uses the existing "interactive task" bonus scheme so there is no
change to interactive responsiveness
-- the number of priority slots has been increased by MAX_BONUS + 1 so
that there is no need to truncate the allocated priority after
allocation of the bonus and to allow the "idle" tasks to be placed on
the queues
-- at the end of each time slice (or when waking up) each task is given
a complete new time slice and, if class SCHED_NORMAL, is put in a
priority slot given by (static_prio + MAX_BONUS - interactive_bonus)
(see the sketch after this list)
-- starvation is prevented by promoting all runnable tasks by one
priority slot at intervals determined by a base promotion interval
multiplied by the number of runnable tasks on the CPU in question
-- when there are fewer than 2 runnable processes on the CPU the
promotion is not performed
-- in order for the anti starvation promotion mechanism to be O(1) the
"prio" field has been removed from the task structure
-- to keep the current task's priority available for the various uses
that previously read current->prio, a record of its priority slot is
now kept in the runqueue struct
-- having the "idle" tasks on the queues allows the code for selecting
the "next" task in schedule() to be simplified
Affected files are:
init_task.h -- cope with removal of "prio" and "array" from task struct
sched.h -- remove "prio" and "array" from task struct
exit.c -- remove call to sched_exit()
sched.c -- the bulk of the change
Performance:
- no discernible change in interactive responsiveness
- slight improvements in results from tests such as kernbench e.g.
Average Half Load -j 3 Run:
Elapsed Time 573.582 (vs 576.196)
User Time 1576.36 (vs 1582.19)
System Time 100.866 (vs 102.266)
Percent CPU 291.8 (vs 291.8)
Context Switches 9970.2 (vs 10653.6)
Sleeps 23996 (vs 23736.2)
Average Optimal -j 16 Load Run:
Elapsed Time 439.88 (vs 443.21)
User Time 1605.09 (vs 1613.75)
System Time 104.618 (vs 106.99)
Percent CPU 388 (vs 387.6)
Context Switches 29816.4 (vs 35444.6)
Sleeps 25589.8 (vs 27509.4)
Average Maximal -j Load Run:
Elapsed Time 460.252 (vs 460.72)
User Time 1628.57 (vs 1633.04)
System Time 147.666 (vs 147.222)
Percent CPU 385.4 (vs 386.2)
Context Switches 78475.6 (vs 78242.4)
Sleeps 126011 (vs 113026)
In summary, this patch provides code simplification without cost.
--
Dr Peter Williams, Chief Scientist peterw@aurema.com
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com