LKML Archive on lore.kernel.org
* Re: [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second
2007-03-01 22:52 ` [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second Simon Arlott
@ 2002-01-01 3:05 ` Pavel Machek
2007-03-05 22:35 ` Simon Arlott
2007-03-06 22:20 ` [PATCH (updated)] " Chuck Ebbert
2007-03-01 23:10 ` Bill Irwin
1 sibling, 2 replies; 17+ messages in thread
From: Pavel Machek @ 2002-01-01 3:05 UTC (permalink / raw)
To: Simon Arlott; +Cc: Linux Kernel Mailing List, akpm, arjan
Hi!
> Whenever jiffies is started at a multiple of 5*HZ or
> wraps, calc_load is run exactly on the second which is
> when tasks using round_jiffies will be scheduled to run.
> This has a bad effect on the load average, making it
> tend towards 1.00 if a task happens to run every time
> the load is being calculated.
>
> This changes calc_load so that it updates load half a
> second after any tasks scheduled using round_jiffies.
Hmm, otoh this makes calc_load more expensive, power-wise, because it
needs to wake the cpu once more?
Timetraveling, sorry.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* round_jiffies and load average
@ 2007-02-24 15:19 Simon Arlott
2007-03-01 9:11 ` Simon Arlott
0 siblings, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-02-24 15:19 UTC (permalink / raw)
To: Linux Kernel Mailing List
I've modified the driver of a USB device (cxacru) to schedule the next status poll every 1s using round_jiffies_relative, instead of just waiting 1s after the last poll completed. Each poll takes 11ms on average to complete, and while it is waiting for a response the task is considered running.
The load average is calculated every 5 seconds, on the same whole-second boundaries that round_jiffies schedules tasks for, so I have a problem with the load average always tending towards 1.00:
14:26:19 up 6 days, 14:45, 5 users, load average: 1.10, 1.04, 1.02
14:27:19 up 6 days, 14:46, 5 users, load average: 1.03, 1.03, 1.01
14:28:19 up 6 days, 14:47, 5 users, load average: 0.95, 1.01, 1.00
14:29:19 up 6 days, 14:48, 5 users, load average: 0.90, 0.97, 0.99
14:30:19 up 6 days, 14:49, 5 users, load average: 0.93, 0.96, 0.98
14:31:19 up 6 days, 14:50, 5 users, load average: 0.90, 0.94, 0.97
14:32:20 up 6 days, 14:51, 5 users, load average: 0.96, 0.95, 0.97
14:33:20 up 6 days, 14:52, 5 users, load average: 0.98, 0.95, 0.97
14:34:20 up 6 days, 14:53, 5 users, load average: 1.09, 0.99, 0.98
14:35:20 up 6 days, 14:54, 5 users, load average: 1.03, 0.99, 0.98
14:36:20 up 6 days, 14:55, 5 users, load average: 1.01, 0.99, 0.98
14:37:20 up 6 days, 14:56, 5 users, load average: 1.00, 0.99, 0.98
14:38:21 up 6 days, 14:57, 5 users, load average: 1.22, 1.05, 1.00
14:39:21 up 6 days, 14:58, 5 users, load average: 1.08, 1.04, 1.00
14:40:21 up 6 days, 14:59, 5 users, load average: 1.03, 1.03, 1.00
14:41:21 up 6 days, 15:00, 5 users, load average: 0.95, 1.01, 0.99
14:42:21 up 6 days, 15:01, 5 users, load average: 0.90, 0.97, 0.98
14:43:21 up 6 days, 15:02, 5 users, load average: 0.99, 0.99, 0.99
14:44:22 up 6 days, 15:03, 5 users, load average: 0.86, 0.94, 0.97
14:45:22 up 6 days, 15:04, 5 users, load average: 0.97, 0.95, 0.97
14:46:22 up 6 days, 15:05, 5 users, load average: 1.08, 0.99, 0.98
14:47:22 up 6 days, 15:06, 5 users, load average: 1.03, 0.99, 0.98
14:48:22 up 6 days, 15:07, 5 users, load average: 1.32, 1.07, 1.01
Perhaps the load average count could be run just before or in the middle of a second instead?
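To make the aliasing concrete, here is a minimal user-space sketch (illustrative only - it assumes the kernel's one-minute decay factor and a task busy for the first 11ms of every second, as above):

/* Illustrative sketch, not kernel code: feed 10 minutes of 5s samples
 * of a task that is busy for the first 11ms of every second into the
 * kernel's one-minute decay formula. */
#include <math.h>
#include <stdio.h>

/* busy for the first 11ms of each second */
static int busy(long offset_ms)
{
        return offset_ms < 11;
}

int main(void)
{
        double e = 1.0 / exp(5.0 / 60.0); /* one-minute decay per 5s sample */
        double on_second = 0.0, mid_second = 0.0;
        int i;

        for (i = 0; i < 120; i++) {       /* ten minutes of samples */
                on_second = on_second * e + busy(0) * (1.0 - e);
                mid_second = mid_second * e + busy(500) * (1.0 - e);
        }
        printf("sampled on the second: %.2f\n", on_second);  /* -> 1.00 */
        printf("sampled mid-second:    %.2f\n", mid_second); /* -> 0.00 */
        return 0;
}

Sampling on the second always catches the task, so the average climbs to 1.00; sampling mid-second never sees it.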
--
Simon Arlott
* Re: round_jiffies and load average
2007-02-24 15:19 round_jiffies and load average Simon Arlott
@ 2007-03-01 9:11 ` Simon Arlott
2007-03-01 18:52 ` [PATCH] timer: Add an initial 0.5s delay to calc_load Simon Arlott
0 siblings, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-03-01 9:11 UTC (permalink / raw)
To: Linux Kernel Mailing List
On 24/02/07 15:19, Simon Arlott wrote:
> I've modified the driver of a USB device (cxacru) to schedule the next status
> poll every 1s using round_jiffies_relative, instead of just waiting 1s after
> the last poll completed. Each poll takes 11ms on average to complete, and
> while it is waiting for a response the task is considered running.
> The load average is calculated every 5 seconds, on the same whole-second
> boundaries that round_jiffies schedules tasks for, so I have a problem with
> the load average always tending towards 1.00.
> Perhaps the load average count could be run just before or in the middle of
> a second instead?
I've added some printks to show exactly what happens (see below).
Since calc_load() is called on every timer tick and throttles itself, would it be OK to move it to run half a second out of step with rounded jiffies? Its count value is initialised to LOAD_FREQ, so it would be easy to add HZ/2.
Otherwise every process using round_jiffies with a period of 1s (1.00), 2s (0.50), 3s (0.33), 4s (0.25) or 5s (0.00 or 1.00) could skew the load average by a constant amount; beyond 10s the effect is unlikely to be noticed.
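A quick sketch of where those fractions come from (illustrative only; it assumes the task and calc_load start in phase at t=0):

/* Illustrative sketch: what fraction of calc_load samples (every 5s,
 * on the second) coincide with a task scheduled on whole seconds with
 * period p? Assumes both start in phase at t=0. */
#include <stdio.h>

int main(void)
{
        int p, t;

        for (p = 1; p <= 5; p++) {
                int hits = 0, samples = 0;

                for (t = 0; t < 60; t += 5) {   /* sample times, seconds */
                        samples++;
                        if (t % p == 0)         /* the task also runs at t */
                                hits++;
                }
                /* -> 1.00, 0.50, 0.33, 0.25, 1.00
                 * (0.00 for p=5 if the phases never meet) */
                printf("period %ds: %.2f\n", p, (double)hits / samples);
        }
        return 0;
}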
After a while of this (~3min):
[ 192.147049] cxacru_poll_status(..), jiffies=4294832000
[ 192.151319] cxacru_poll_status(..), jiffies=4294832004, next=996
[ 192.463459] calc_load(27), count=-20, jiffies=4294832316
[ 192.463472] calc_load(27), count=4980, jiffies=4294832316
Jiffies wraps:
[ 327.121469] cxacru_poll_status(..), jiffies=4294967000
[ 327.139428] cxacru_poll_status(..), jiffies=4294967018, next=1000
[ 327.437739] calc_load(27), count=-20, jiffies=20
[ 327.437750] calc_load(27), count=4980, jiffies=20
The next run occurs during cxacru_poll_status:
[ 332.416288] cxacru_poll_status(..), jiffies=5000
[ 332.417312] calc_load(1), count=-1, jiffies=5001
[ 332.417322] calc_load(1), count=4999, jiffies=5001
[ 332.423382] cxacru_poll_status(..), jiffies=5007, next=993
This happens without fail until I reboot:
[ 672.350970] cxacru_poll_status(..), jiffies=345000
[ 672.351913] calc_load(1), count=-1, jiffies=345001
[ 672.351926] calc_load(1), count=4999, jiffies=345001
[ 672.367737] cxacru_poll_status(..), jiffies=345016, next=984
It's interesting that calc_load() mostly runs 19ms later (count=-20) than usual until jiffies wraps; after that it runs on time or 1ms late (count=-1). Full log available on request (75K); HZ=1000 with NO_HZ.
(diff below of cxacru.c based on http://lkml.org/lkml/2007/2/23/328)
---
diff --git a/drivers/usb/atm/cxacru.c b/drivers/usb/atm/cxacru.c
index c8b69bf..4717fa4 100644
--- a/drivers/usb/atm/cxacru.c
+++ b/drivers/usb/atm/cxacru.c
@@ -535,6 +535,7 @@ static void cxacru_poll_status(struct work_struct *work)
 	struct atm_dev *atm_dev = usbatm->atm_dev;
 	int ret;
 
+	printk(KERN_DEBUG "cxacru_poll_status(..), jiffies=%lu\n", jiffies);
 	ret = cxacru_cm_get_array(instance, CM_REQUEST_CARD_INFO_GET, buf, CXINF_MAX);
 	if (ret < 0) {
 		atm_warn(usbatm, "poll status: error %d\n", ret);
@@ -599,6 +600,8 @@ static void cxacru_poll_status(struct work_struct *work)
 reschedule:
 	schedule_delayed_work(&instance->poll_work,
 		round_jiffies_relative(msecs_to_jiffies(POLL_INTERVAL*1000)));
+	printk(KERN_DEBUG "cxacru_poll_status(..), jiffies=%lu, next=%lu\n", jiffies,
+		round_jiffies_relative(msecs_to_jiffies(POLL_INTERVAL*1000)));
 }
 
 static int cxacru_fw(struct usb_device *usb_dev, enum cxacru_fw_request fw,
diff --git a/kernel/timer.c b/kernel/timer.c
index cb1b86a..9c2b816 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1230,12 +1230,14 @@ static inline void calc_load(unsigned long ticks)
 	count -= ticks;
 	if (unlikely(count < 0)) {
 		active_tasks = count_active_tasks();
+		printk(KERN_DEBUG "calc_load(%lu), count=%d, jiffies=%lu\n", ticks, count, jiffies);
 		do {
 			CALC_LOAD(avenrun[0], EXP_1, active_tasks);
 			CALC_LOAD(avenrun[1], EXP_5, active_tasks);
 			CALC_LOAD(avenrun[2], EXP_15, active_tasks);
 			count += LOAD_FREQ;
 		} while (count < 0);
+		printk(KERN_DEBUG "calc_load(%lu), count=%d, jiffies=%lu\n", ticks, count, jiffies);
 	}
 }
--
Simon Arlott
* [PATCH] timer: Add an initial 0.5s delay to calc_load
2007-03-01 9:11 ` Simon Arlott
@ 2007-03-01 18:52 ` Simon Arlott
2007-03-01 22:52 ` [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second Simon Arlott
0 siblings, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-03-01 18:52 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: akpm, arjan
This adds an initial 0.5s delay to calc_load so that it avoids updating the load at the same time as tasks scheduled using round_jiffies; otherwise the load average is badly affected by tasks that run every time calc_load does (currently every 5s).
I'm assuming this change doesn't defeat the intention of round_jiffies - to avoid tasks waking the CPU at different times - because calc_load is already run very often, via a call from update_times(ticks) on every timer interrupt.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
Without this change or an appropriate equivalent, my change to cxacru
causes the load to stay around 1.00 even when mostly idle since it now
runs every second using round_jiffies:
> [ 332.416288] cxacru_poll_status(..), jiffies=5000 [start]
> [ 332.417312] calc_load(1), count=-1, jiffies=5001 [start]
> [ 332.417322] calc_load(1), count=4999, jiffies=5001 [finish]
> [ 332.423382] cxacru_poll_status(..), jiffies=5007, next=993 [finish]
kernel/timer.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index cb1b86a..4bb21b5 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1225,7 +1225,7 @@ EXPORT_SYMBOL(avenrun);
 static inline void calc_load(unsigned long ticks)
 {
 	unsigned long active_tasks; /* fixed-point */
-	static int count = LOAD_FREQ;
+	static int count = LOAD_FREQ + HZ/2;
 
 	count -= ticks;
 	if (unlikely(count < 0)) {
--
1.5.0.1
--
Simon Arlott
* Re: [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second
2007-03-01 18:52 ` [PATCH] timer: Add an initial 0.5s delay to calc_load Simon Arlott
@ 2007-03-01 22:52 ` Simon Arlott
2002-01-01 3:05 ` Pavel Machek
2007-03-01 23:10 ` Bill Irwin
0 siblings, 2 replies; 17+ messages in thread
From: Simon Arlott @ 2007-03-01 22:52 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: akpm, arjan
Whenever jiffies is started at a multiple of 5*HZ or wraps, calc_load is
run exactly on the second which is when tasks using round_jiffies will
be scheduled to run. This has a bad effect on the load average, making
it tend towards 1.00 if a task happens to run every time the load is
being calculated.
This changes calc_load so that it updates load half a second after any
tasks scheduled using round_jiffies.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
Without this change or an appropriate equivalent, my change to cxacru
causes the load to stay around 1.00 even when mostly idle since it now
runs every second using round_jiffies:
> [ 332.416288] cxacru_poll_status(..), jiffies=5000 [start]
> [ 332.417312] calc_load(1), count=-1, jiffies=5001 [start]
> [ 332.417322] calc_load(1), count=4999, jiffies=5001 [finish]
> [ 332.423382] cxacru_poll_status(..), jiffies=5007, next=993 [finish]
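For illustration, a minimal user-space model of what the new rounding line computes (my simplification of 2.6.20's round_jiffies_relative - the real one also applies a per-cpu skew and refuses to round a timeout away entirely):

#include <stdio.h>

#define HZ 1000

/* Simplified model of round_jiffies_relative(): round the absolute
 * expiry to a whole second, rounding down only within the first
 * quarter-second. */
static unsigned long model_round_jiffies_relative(unsigned long j,
                                                  unsigned long now)
{
        unsigned long expires = now + j;
        unsigned long rem = expires % HZ;

        expires = (rem < HZ / 4) ? expires - rem : expires - rem + HZ;
        return expires - now;
}

int main(void)
{
        /* values from the log quoted above: calc_load ran at jiffies=5001
         * leaving count=4999, i.e. the next update due at 10000 - exactly
         * on the second, together with the round_jiffies tasks */
        unsigned long jiffies = 5001, count = 4999;

        count = model_round_jiffies_relative(count + HZ / 2, jiffies) - HZ / 2;
        printf("next update at jiffies=%lu\n", jiffies + count); /* 10500 */
        return 0;
}

The next update lands at 10500 - half a second after the tasks that fire at 10000.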
kernel/timer.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index cb1b86a..ee61f6c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1225,17 +1225,19 @@ EXPORT_SYMBOL(avenrun);
 static inline void calc_load(unsigned long ticks)
 {
 	unsigned long active_tasks; /* fixed-point */
-	static int count = LOAD_FREQ;
+	static int count = LOAD_FREQ + HZ/2;
 
 	count -= ticks;
-	if (unlikely(count < 0)) {
+	if (unlikely(count <= 0)) {
 		active_tasks = count_active_tasks();
 		do {
 			CALC_LOAD(avenrun[0], EXP_1, active_tasks);
 			CALC_LOAD(avenrun[1], EXP_5, active_tasks);
 			CALC_LOAD(avenrun[2], EXP_15, active_tasks);
 			count += LOAD_FREQ;
-		} while (count < 0);
+		} while (count <= 0);
+
+		count = round_jiffies_relative(count + HZ/2) - HZ/2;
 	}
 }
--
1.5.0.1
--
Simon Arlott
* Re: [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second
2007-03-01 22:52 ` [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second Simon Arlott
2002-01-01 3:05 ` Pavel Machek
@ 2007-03-01 23:10 ` Bill Irwin
2007-03-02 10:15 ` [PATCH (update 2)] " Simon Arlott
1 sibling, 1 reply; 17+ messages in thread
From: Bill Irwin @ 2007-03-01 23:10 UTC (permalink / raw)
To: Simon Arlott; +Cc: Linux Kernel Mailing List, akpm, arjan
On Thu, Mar 01, 2007 at 10:52:01PM +0000, Simon Arlott wrote:
> Whenever jiffies is started at a multiple of 5*HZ or wraps, calc_load is
> run exactly on the second which is when tasks using round_jiffies will
> be scheduled to run. This has a bad effect on the load average, making
> it tend towards 1.00 if a task happens to run every time the load is
> being calculated.
> This changes calc_load so that it updates load half a second after any
> tasks scheduled using round_jiffies.
> Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
Well, it doesn't reintroduce the for_each_task() (not that it's present
in similar form) loop in count_active_tasks(), so it doesn't bother me.
You seem to have merely changed some offsets, which resolves the
round_jiffies() clash. It's easy to envision similar degenerate cases,
though I'm not sure we care enough to drop in a PRNG to handle them.
== wli
* Re: [PATCH (update 2)] timer: Run calc_load halfway through each round_jiffies second
2007-03-01 23:10 ` Bill Irwin
@ 2007-03-02 10:15 ` Simon Arlott
2007-03-02 15:15 ` [PATCH (update 3)] " Simon Arlott
0 siblings, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-03-02 10:15 UTC (permalink / raw)
To: akpm; +Cc: Bill Irwin, Linux Kernel Mailing List, arjan
Whenever jiffies is started at a multiple of 5*HZ or wraps, calc_load is
run exactly on the second which is when tasks using round_jiffies will
be scheduled to run. This has a bad effect on the load average, making
it tend towards 1.00 if a task happens to run every time the load is
being calculated.
This changes calc_load so that it updates load half a second after any
tasks scheduled using round_jiffies.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
This version starts count off at 0, since LOAD_FREQ + HZ/2 is no better
than LOAD_FREQ, and avoids rounding it to a negative value.
kernel/timer.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index cb1b86a..11ccfed 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1225,17 +1225,19 @@ EXPORT_SYMBOL(avenrun);
 static inline void calc_load(unsigned long ticks)
 {
 	unsigned long active_tasks; /* fixed-point */
-	static int count = LOAD_FREQ;
+	static int count = 0;
 
 	count -= ticks;
-	if (unlikely(count < 0)) {
+	if (unlikely(count <= 0)) {
 		active_tasks = count_active_tasks();
 		do {
 			CALC_LOAD(avenrun[0], EXP_1, active_tasks);
 			CALC_LOAD(avenrun[1], EXP_5, active_tasks);
 			CALC_LOAD(avenrun[2], EXP_15, active_tasks);
 			count += LOAD_FREQ;
-		} while (count < 0);
+		} while (count < HZ/2);
+
+		count = round_jiffies_relative(count + HZ/2) - HZ/2;
 	}
 }
--
1.5.0.1
--
Simon Arlott
* Re: [PATCH (update 3)] timer: Run calc_load halfway through each round_jiffies second
2007-03-02 10:15 ` [PATCH (update 2)] " Simon Arlott
@ 2007-03-02 15:15 ` Simon Arlott
2007-03-02 16:35 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-03-02 15:15 UTC (permalink / raw)
To: akpm; +Cc: Bill Irwin, Linux Kernel Mailing List, arjan
Whenever jiffies is started at a multiple of 5*HZ or wraps, calc_load is
run exactly on the second which is when tasks using round_jiffies will
be scheduled to run. This has a bad effect on the load average, making
it tend towards 1.00 if a task happens to run every time the load is
being calculated.
This changes calc_load so that it updates load half a second after any
tasks scheduled using round_jiffies.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
This version starts count off at 0, since LOAD_FREQ + HZ/2 is no better than LOAD_FREQ. The previous version was actually wrong: rounding count to a negative value will only happen while getting calc_load running at the right time relative to jiffies; it won't go negative when running normally - even with NO_HZ.
kernel/timer.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index 6663a87..4abead0 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1226,17 +1226,19 @@ EXPORT_SYMBOL(avenrun);
 static inline void calc_load(unsigned long ticks)
 {
 	unsigned long active_tasks; /* fixed-point */
-	static int count = LOAD_FREQ;
+	static int count = 0;
 
 	count -= ticks;
-	if (unlikely(count < 0)) {
+	if (unlikely(count <= 0)) {
 		active_tasks = count_active_tasks();
 		do {
 			CALC_LOAD(avenrun[0], EXP_1, active_tasks);
 			CALC_LOAD(avenrun[1], EXP_5, active_tasks);
 			CALC_LOAD(avenrun[2], EXP_15, active_tasks);
 			count += LOAD_FREQ;
-		} while (count < 0);
+		} while (count <= 0);
+
+		count = round_jiffies_relative(count + HZ/2) - HZ/2;
 	}
 }
--
1.5.0.1
--
Simon Arlott
* Re: [PATCH (update 3)] timer: Run calc_load halfway through each round_jiffies second
2007-03-02 15:15 ` [PATCH (update 3)] " Simon Arlott
@ 2007-03-02 16:35 ` Eric Dumazet
2007-03-02 17:32 ` Simon Arlott
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2007-03-02 16:35 UTC (permalink / raw)
To: Simon Arlott; +Cc: akpm, Bill Irwin, Linux Kernel Mailing List, arjan
On Friday 02 March 2007 16:15, Simon Arlott wrote:
> Whenever jiffies is started at a multiple of 5*HZ or wraps, calc_load is
> run exactly on the second which is when tasks using round_jiffies will
> be scheduled to run. This has a bad effect on the load average, making
> it tend towards 1.00 if a task happens to run every time the load is
> being calculated.
>
> This changes calc_load so that it updates load half a second after any
> tasks scheduled using round_jiffies.
>
Simon
I believe this patch is too complex/hazardous and may break exp decay
computation.
(Even if nobody cares about avenrun[] these days :), do you?)
You could just change LOAD_FREQ from (5*HZ) to (5*HZ+1)
#define LOAD_FREQ (5*HZ+1)
Mathematical proof (well... sort of)
$ cat prog.c
#define FSHIFT	11	/* nr of bits of precision */
#define FIXED_1	((double)(1 << FSHIFT))	/* 1.0 as fixed-point */

#include <math.h>
#include <stdio.h>

int main()
{
	printf("Old values :\n");
	printf("#define EXP_1 %g\n", FIXED_1/exp(5.0/60.0));
	printf("#define EXP_5 %g\n", FIXED_1/exp(5.0/(5*60.0)));
	printf("#define EXP_15 %g\n", FIXED_1/exp(5.0/(15*60.0)));

	printf("New values :\n");
	printf("%g\n", FIXED_1/exp(5.01/60.0));
	printf("%g\n", FIXED_1/exp(5.01/(5*60.0)));
	printf("%g\n", FIXED_1/exp(5.01/(15*60.0)));
	return 0;
}
# gcc -o prog prog.c -lm
# ./prog
Old values :
#define EXP_1 1884.25
#define EXP_5 2014.15
#define EXP_15 2036.65
New values :
1883.94
2014.08
2036.63
You can see that 5.01 seconds instead of 5.00 gives almost the same EXP_xx values.
So (5*HZ + 1) is safe (because HZ >= 100).
Eric
* Re: [PATCH (update 3)] timer: Run calc_load halfway through each round_jiffies second
2007-03-02 16:35 ` Eric Dumazet
@ 2007-03-02 17:32 ` Simon Arlott
2007-03-02 18:03 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-03-02 17:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: akpm, Bill Irwin, Linux Kernel Mailing List, arjan
On 02/03/07 16:35, Eric Dumazet wrote:
> On Friday 02 March 2007 16:15, Simon Arlott wrote:
>> Whenever jiffies is started at a multiple of 5*HZ or wraps, calc_load is
>> run exactly on the second which is when tasks using round_jiffies will
>> be scheduled to run. This has a bad effect on the load average, making
>> it tend towards 1.00 if a task happens to run every time the load is
>> being calculated.
>>
>> This changes calc_load so that it updates load half a second after any
>> tasks scheduled using round_jiffies.
>
> I believe this patch is too complex/hazardous and may break exp decay
> computation.
Only for a single calculation whenever it has to adjust, which should only happen every 49.7 days (on 32-bit archs) - or 5 minutes after booting; I always wondered why that happened, and now I see jiffies is initialised so that it always wraps early. While it is in sync with jiffies it will not affect the computation - count is just set to the same value every time, even with NO_HZ, because jiffies will be correct when calc_load is called.
> (Even if nobody care about avenrun[] those days :), do you ? )
>
> You could just change LOAD_FREQ from (5*HZ) to (5*HZ+1)
> You can see that 5.01 instead of 5.00 second gives the same EXP_xx values.
>
> So (5*HZ + 1) is safe. (because HZ >= 100)
On HZ=1000, this would cause the load average to be pushed towards +1.00 for up to 2 minutes every ~83 minutes with no obvious cause (if a task takes ~10-20ms to run, up to 20 samples are needed at HZ=1000 before calc_load drifts past it again).
On HZ=100 it would happen every ~8 minutes for up to 10 seconds and never be noticed.
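The arithmetic behind those figures, as an illustrative sketch (my assumption: a task busy for ~20ms starting at each whole second):

/* Illustrative arithmetic only: with LOAD_FREQ = 5*HZ+1, each sample
 * slips one tick past the whole-second grid, so a full lap of the
 * second takes HZ samples of 5s each, and a ~20ms task is caught for
 * busy_ticks consecutive samples. */
#include <stdio.h>

int main(void)
{
        const int hzs[] = { 1000, 100 };
        int i;

        for (i = 0; i < 2; i++) {
                int hz = hzs[i];
                int busy_ticks = 20 * hz / 1000; /* ~20ms of work, in ticks */

                printf("HZ=%d: lap every %.0f min, caught for %d samples = %ds\n",
                       hz, hz * 5.0 / 60.0, busy_ticks, busy_ticks * 5);
        }
        return 0;
}
/* prints: HZ=1000: lap every 83 min, caught for 20 samples = 100s
 *         HZ=100: lap every 8 min, caught for 2 samples = 10s */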
Using 5*HZ+2 would move this to ~167 and ~17 minutes, which would mitigate the effect further still while barely changing the EXP values:
1884.25 -> 1883.62
2014.15 -> 2014.02
2036.65 -> 2036.61
Will anyone notice if the load is adjusted slightly less frequently?
If this is considered preferable to adjusting calc_load to avoid almost all round_jiffies-scheduled tasks (some of which may take longer than ~15ms to run), then I have no problem with it - I just needed something to stop my driver changes doing odd things to the load average for other people. I'll continue to run with this version; is it possible to add a Kconfig option for it somewhere?
--
Simon Arlott
* Re: [PATCH (update 3)] timer: Run calc_load halfway through each round_jiffies second
2007-03-02 17:32 ` Simon Arlott
@ 2007-03-02 18:03 ` Eric Dumazet
2007-03-02 20:14 ` Simon Arlott
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2007-03-02 18:03 UTC (permalink / raw)
To: Simon Arlott; +Cc: akpm, Bill Irwin, Linux Kernel Mailing List, arjan
On Friday 02 March 2007 18:32, Simon Arlott wrote:
> On 02/03/07 16:35, Eric Dumazet wrote:
> > You could just change LOAD_FREQ from (5*HZ) to (5*HZ+1)
> > You can see that 5.01 instead of 5.00 second gives the same EXP_xx
> > values.
> >
> > So (5*HZ + 1) is safe. (because HZ >= 100)
>
> On HZ=1000, this would cause the load average to be pushed towards +1.00
> for up to 2 minutes every ~83 minutes with no obvious cause. (If a task
> takes ~10-20ms to run, so 20 runs are needed at HZ=1000 before it passes
> it again).
Nope, you don't quite understand how load (avenrun[]) is computed.
Every 5 seconds, three values are adjusted based on their previous value and the current active count. Let's focus on the first value (mean load average over one minute):
exp = 1.0 / exp(5.0/60.0);
avenrun[0] = (avenrun[0] * exp) + (active * (1.0 - exp));
If the previous value is 0.0 and the current active count is 1, then the next value of avenrun[0] will be 0.0799556 - not exactly 1.0 as you think!
Then in the next intervals (if the active count is 0), it will decrease 'slowly':
0.0735627
0.0676809
0.0622695
0.0572907
On average, your load factor is close to reality.
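The same sequence can be reproduced with a few lines of C (a floating-point version of the kernel's CALC_LOAD step; same numbers as above):

/* Floating-point version of the kernel's CALC_LOAD step, reproducing
 * the numbers above: one interval with 1 active task from load 0.0,
 * then four idle intervals. */
#include <math.h>
#include <stdio.h>

int main(void)
{
        double e = 1.0 / exp(5.0 / 60.0); /* one-minute decay per 5s */
        double load = 0.0;
        int i;

        load = load * e + 1 * (1.0 - e);  /* one active interval */
        printf("%g\n", load);             /* -> 0.0799556 */

        for (i = 0; i < 4; i++) {         /* four idle intervals */
                load = load * e + 0 * (1.0 - e);
                printf("%g\n", load);     /* -> 0.0735627 ... 0.0572907 */
        }
        return 0;
}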
Just try my suggestion; it should work. I even proved it in my previous mail :)
Eric
* Re: [PATCH (update 3)] timer: Run calc_load halfway through each round_jiffies second
2007-03-02 18:03 ` Eric Dumazet
@ 2007-03-02 20:14 ` Simon Arlott
2007-03-02 22:32 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-03-02 20:14 UTC (permalink / raw)
To: Eric Dumazet; +Cc: akpm, Bill Irwin, Linux Kernel Mailing List, arjan
On 02/03/07 18:03, Eric Dumazet wrote:
> On Friday 02 March 2007 18:32, Simon Arlott wrote:
>> On 02/03/07 16:35, Eric Dumazet wrote:
>
>>> You could just change LOAD_FREQ from (5*HZ) to (5*HZ+1)
>>> You can see that 5.01 instead of 5.00 second gives the same EXP_xx
>>> values.
>>>
>>> So (5*HZ + 1) is safe. (because HZ >= 100)
>> On HZ=1000, this would cause the load average to be pushed towards +1.00
>> for up to 2 minutes every ~83 minutes with no obvious cause (if a task
>> takes ~10-20ms to run, up to 20 samples are needed at HZ=1000 before
>> calc_load drifts past it again).
>
> Nope, you don't quite understand how load (avenrun[]) is computed.
> Not exactly 1.0 as you think!
> Then in the next intervals (if active count is 0), it will decrease 'slowly':
> 0.0735627
> 0.0676809
> 0.0622695
> 0.0572907
>
> On average, your load factor is close to reality.
I knew that; but the task runs for more than 1 tick, and the sampling point only moves on by 1 tick per calc_load run.
> Just try my suggestion, it should work. I even proved it in my previous
> mail :)
With HZ=1000, the active count will be 1 up to 20 times in a row before it
becomes out of sync with when the task is run again. This is ample time for
the load value itself to get closer to 1:
$ uptime; (yes>/dev/null &); sleep 100; uptime
20:00:29 up 4:35, 7 users, load average: 0.33, 0.51, 0.78
20:02:09 up 4:37, 7 users, load average: 0.97, 0.67, 0.81
(not very useful results since the load isn't at 0.00 very often)
On 02/03/07 16:35, Eric Dumazet wrote:
> I believe this patch is too complex/hazardous and may break exp decay
> computation.
I still don't know why you think it may change the computation of the load (aside from at boot or when jiffies wraps), and it's not really complex at all. It is possible that someone will change LOAD_FREQ to something other than a multiple of HZ, and this won't work because it'll get rounded up to a whole second. That and the negligible extra processing time of doing round_jiffies every 5 seconds are the only problems I can see.
I accidentally left LOAD_FREQ at 5 instead of 5*HZ with a printk in there; it still worked fine apart from the load average going up and down every tick.
--
Simon Arlott
* Re: [PATCH (update 3)] timer: Run calc_load halfway through each round_jiffies second
2007-03-02 20:14 ` Simon Arlott
@ 2007-03-02 22:32 ` Eric Dumazet
2007-03-02 23:54 ` Simon Arlott
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2007-03-02 22:32 UTC (permalink / raw)
To: Simon Arlott; +Cc: akpm, Bill Irwin, Linux Kernel Mailing List, arjan
Simon Arlott wrote:
> On 02/03/07 18:03, Eric Dumazet wrote:
>> On Friday 02 March 2007 18:32, Simon Arlott wrote:
>>> On 02/03/07 16:35, Eric Dumazet wrote:
>>
>>>> You could just change LOAD_FREQ from (5*HZ) to (5*HZ+1)
>>>> You can see that 5.01 instead of 5.00 second gives the same EXP_xx
>>>> values.
>>>>
>>>> So (5*HZ + 1) is safe. (because HZ >= 100)
>>> On HZ=1000, this would cause the load average to be pushed towards +1.00
>>> for up to 2 minutes every ~83 minutes with no obvious cause (if a task
>>> takes ~10-20ms to run, up to 20 samples are needed at HZ=1000 before
>>> calc_load drifts past it again).
>>
>> Nope, you don't quite understand how load (avenrun[]) is computed.
>> Not exactly 1.0 as you think!
>> Then in the next intervals (if active count is 0), it will decrease
>> 'slowly':
>> 0.0735627
>> 0.0676809
>> 0.0622695
>> 0.0572907
>>
>> On average, your load factor is close to reality.
>
> I knew that; but the task runs for more than 1 tick, and the sampling
> point only moves on by 1 tick per calc_load run.
>
>> Just try my suggestion, it should work. I even proved it in my
>> previous mail :)
>
> With HZ=1000, the active count will be 1 up to 20 times in a row before
> it becomes out of sync with when the task is run again. This is ample
> time for the load value itself to get closer to 1:
> $ uptime; (yes>/dev/null &); sleep 100; uptime
> 20:00:29 up 4:35, 7 users, load average: 0.33, 0.51, 0.78
> 20:02:09 up 4:37, 7 users, load average: 0.97, 0.67, 0.81
> (not very useful results since the load isn't at 0.00 very often)
>
>
> On 02/03/07 16:35, Eric Dumazet wrote:
>> I believe this patch is too complex/hazardous and may break exp decay
>> computation.
>
> I still don't know why you think it may change the computation of the
> load (aside from at boot or when jiffies wraps), and it's not really
> complex at all. It is possible that someone will change LOAD_FREQ to
> something other than a multiple of HZ, and this won't work because it'll
> get rounded up to a whole second. That and the negligible extra
> processing time of doing round_jiffies every 5 seconds are the only
> problems I can see.
You apparently have no idea of the mathematical formula used.
This formula has meaning *only* if EXP_1, EXP_5 and EXP_15 are computed directly from the exact LOAD_FREQ value. If you change it 'randomly' without changing the EXP_... values, you basically compute a wrong value... So what? Do you want to impress your boss with a given value?
Please don't mess with it. Just ignore the avenrun values and let them die.
You can change it to suit your needs, but it won't suit every need.
Imagine for example that your task is woken for a 1µs period every tick.
Basically your CPU load should be HZ/1000000 (machine mostly idle),
but the computed 'load' will be 1.0.
This whole avenrun[] thing is plain stupid anyway. The load should be
something between 0 and 1 (per cpu), to get a precise idea of cpu_power
used/unused. Nobody mentioned avenrun[] values on lkml in the last decade.
* Re: [PATCH (update 3)] timer: Run calc_load halfway through each round_jiffies second
2007-03-02 22:32 ` Eric Dumazet
@ 2007-03-02 23:54 ` Simon Arlott
0 siblings, 0 replies; 17+ messages in thread
From: Simon Arlott @ 2007-03-02 23:54 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Linux Kernel Mailing List
(I've removed the other CC:s for now to avoid annoying them - assuming
removing them from the CC: list doesn't do that).
On 02/03/07 22:32, Eric Dumazet wrote:
> Simon Arlott a écrit :
>> On 02/03/07 16:35, Eric Dumazet wrote:
>>> I believe this patch is too complex/hazardous and may break exp decay
>>> computation.
>>
>> I still don't know why you think it may change the computation of load
>> (aside from at boot or jiffies wrapping), and it's not really complex
>> at all. It is possible that someone will change the value of LOAD_FREQ
>> to something other than a multiple of HZ and this won't work because
>> it'll get rounded up to a whole second. That and the negligible extra
>> processing time of doing round_jiffies every 5 seconds is the only
>> problem I can see.
>
> You apparently have no idea of the mathematical formula used.
You're right; I've not looked at exactly how it's done. I do know that calling it early/late (by +/- 750ms/250ms) *once* on boot and every 7 weeks is going to have a tiny effect that will go away quickly.
> This formula has meaning *only* if EXP_1, EXP_5 and EXP_15 are computed
> directly from the exact LOAD_FREQ value. If you change it 'randomly'
> without changing the EXP_... values you basically compute a wrong value...
My comment asking why you say it's complex was about the patch, not the computation. After the initial sync the patch does not change how often the calc_load computation runs or how it works; it will still run exactly the same way, every 5*HZ ticks, as before (try adding a printk to report when the count value has been changed by the call to round_jiffies).
> So what? Do you want to impress your boss with a given value?
?
> Please don't mess with it. Just ignore the avenrun values and let them die.
> You can change it to suit your needs, but it won't suit every need.
I'm not sure you realise the problem the patch fixes: since 2.6.20, tasks can use round_jiffies to run "in around X seconds' time" at a moment when jiffies % HZ == 0, to avoid waking the CPU several times for multiple timers firing at random. This affects calc_load really badly because it also tends to run at that moment. Moving calc_load to always run halfway through a second solves the problem.
> Imagine for example that your task is woken for a 1µs period every tick.
> Basically your CPU load should be HZ/1000000 (machine mostly idle),
> but the computed 'load' will be 1.0.
> This whole avenrun[] thing is plain stupid anyway. The load should be
> something between 0 and 1 (per cpu), to get a precise idea of cpu_power
> used/unused. Nobody mentioned avenrun[] values on lkml in the last decade.
If you're sure no one here cares about the values then I'll submit a patch to remove them from /proc/loadavg (and wait for it to get rejected). Until the whole question of how the load average should be calculated and reported is solved, the best that can be done is not to make things worse, as round_jiffies has done.
*I* know the values are nowhere near perfect, but I made a change to a driver so that it uses round_jiffies to schedule status checking, because that's appropriate - and it makes the load average sit at 1.00 even when idle. Someone is going to complain about that and possibly take the time to find out why, and if they bisect it, my patch will show up as the cause of the problem. (Assuming someone else uses this device; I sent an email to the accessrunner list once -rc2-mm1 was released asking for people to test it.)
--
Simon Arlott
* Re: [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second
2002-01-01 3:05 ` Pavel Machek
@ 2007-03-05 22:35 ` Simon Arlott
2007-03-06 18:42 ` [PATCH (update 4)] " Simon Arlott
2007-03-06 22:20 ` [PATCH (updated)] " Chuck Ebbert
1 sibling, 1 reply; 17+ messages in thread
From: Simon Arlott @ 2007-03-05 22:35 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linux Kernel Mailing List, akpm, arjan
On 01/01/02 03:05, Pavel Machek wrote:
>> Whenever jiffies is started at a multiple of 5*HZ or
>> wraps, calc_load is run exactly on the second which is
>> when tasks using round_jiffies will be scheduled to run.
>> This has a bad effect on the load average, making it
>> tend towards 1.00 if a task happens to run every time
>> the load is being calculated.
>>
>> This changes calc_load so that it updates load half a
>> second after any tasks scheduled using round_jiffies.
>
> Hmm, otoh this makes calc_load more expensive, power-wise, because it
> needs to wake the cpu once more?
This was something I was concerned about. It's hard to avoid: calc_load shouldn't run at the same time as the scheduled tasks, which leaves running it just before them as the only other option - and to do that you need some idea of how long it's going to take to run.
Mar 1 22:36:17 redrum [ 639.147319] calc_load(1) jiffies=311500 count=0 [start]
Mar 1 22:36:17 redrum [ 639.147331] calc_load... count=5000 [run]
Mar 1 22:36:17 redrum [ 639.147336] calc_load... count=5000 [finish]
While I really doubt the accuracy of timing this with printk (the figures vary a lot, only going as low as 4 + 4µs), it does show that 1ms is plenty of time and that calc_load *could* be scheduled 1 tick before the second even at HZ=1000.
Of course then you miss out on ever catching tasks scheduled with round_jiffies, since they'd need to run for a full second to affect it.
Also, calc_load with NO_HZ only appears to run when the system needs to wake up for something else (calc_load itself already handles this appropriately), so this isn't as much of a problem as it might seem. Someone whose CPU only needs to be woken every 5s (or 4.5s for a round_jiffies task) will find their load higher than it should be - but the current version does that already. (Mine doesn't stay below 0.5 with NO_HZ because of this.)
> Timetraveling, sorry.
Since Eric Dumazet seems to have disappeared for now, I really need other people to comment on this too. Although you shouldn't literally time travel ;) (Date: Tue, 1 Jan 2002 03:05:07 +0000).
--
Simon Arlott
* Re: [PATCH (update 4)] timer: Run calc_load halfway through each round_jiffies second
2007-03-05 22:35 ` Simon Arlott
@ 2007-03-06 18:42 ` Simon Arlott
0 siblings, 0 replies; 17+ messages in thread
From: Simon Arlott @ 2007-03-06 18:42 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Pavel Machek, akpm, arjan
On 05/03/07 22:35, Simon Arlott wrote:
> On 01/01/02 03:05, Pavel Machek wrote:
>>> This changes calc_load so that it updates load half a second after
>>> any tasks scheduled using round_jiffies.
>>
>> Hmm, otoh this makes calc_load more expensive, power-wise, because it
>> needs to wake the cpu once more?
I suddenly realised that since calc_load was already being called at
second + 1 tick, moving it back 1 tick should mean it gets called and
allowed to finish before the other tasks:
[ 1392.745730] cxacru_poll_status jiffies=1064000 [start]
[ 1392.765628] cxacru_poll_status jiffies=1064020 [finish]
[ 1393.745284] calc_load(1) jiffies=1065000 count=5000 count=5000
[ 1393.745320] cxacru_poll_status jiffies=1065000 [start]
[ 1393.754412] cxacru_poll_status jiffies=1065009 [finish]
How will this work in the presence of multiple CPUs? Will the scheduled
task be run on another CPU while calc_load is run? I don't have anything
I can test it on.
However since 5*HZ is unlikely to be a factor of 2^32, as soon as jiffies
wraps a second time it will no longer be in sync:
HZ=100 0 (time after jiffies wrap that calc_load will be run)
HZ=100 4
HZ=100 8
HZ=100 12
HZ=100 16
...
It takes 3.2 *years* (increasing 40ms each time) at HZ=100 before it's
back in sync. This would be ok for my task which is unlikely to ever take
40ms or more to complete, but could still hit other tasks.
HZ=1000 0
HZ=1000 704
HZ=1000 408
HZ=1000 112
HZ=1000 816
...
It takes 16.9 years (jumping around, only <=40ms 4 times) at HZ=1000
before it's back in sync.
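Those offsets can be reproduced with a quick sketch of the arithmetic (illustrative only - it just steps the 5*HZ sample grid across each 2^32-tick wrap):

/* Illustrative sketch: after the k-th jiffies wrap, calc_load (period
 * 5*HZ ticks since boot) first runs at (-k * 2^32) mod (5*HZ); taking
 * that mod HZ gives its offset into the second. */
#include <stdio.h>

int main(void)
{
        const unsigned long long wrap = 1ULL << 32;
        const int hzs[] = { 100, 1000 };
        int i, k;

        for (i = 0; i < 2; i++) {
                unsigned long long period = 5ULL * hzs[i];

                printf("HZ=%d:", hzs[i]);
                for (k = 0; k < 5; k++) {
                        unsigned long long off =
                                (period - (k * wrap) % period) % period;

                        printf(" %llu", off % hzs[i]);
                }
                printf(" ...\n"); /* -> 0 4 8 12 16 and 0 704 408 112 816 */
        }
        return 0;
}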
Since calc_load now runs on the second already, and that is preferable to running it at a different point in the second from other tasks, count can be rounded more simply than before:
kernel/timer.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index 6663a87..4abead0 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1226,17 +1226,19 @@ EXPORT_SYMBOL(avenrun);
 static inline void calc_load(unsigned long ticks)
 {
 	unsigned long active_tasks; /* fixed-point */
 	static int count = LOAD_FREQ;
 
 	count -= ticks;
-	if (unlikely(count < 0)) {
+	if (unlikely(count <= 0)) {
 		active_tasks = count_active_tasks();
 		do {
 			CALC_LOAD(avenrun[0], EXP_1, active_tasks);
 			CALC_LOAD(avenrun[1], EXP_5, active_tasks);
 			CALC_LOAD(avenrun[2], EXP_15, active_tasks);
 			count += LOAD_FREQ;
-		} while (count < 0);
+		} while (count <= 0);
+
+		count = round_jiffies_relative(count);
 	}
 }
--
Simon Arlott
* Re: [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second
2002-01-01 3:05 ` Pavel Machek
2007-03-05 22:35 ` Simon Arlott
@ 2007-03-06 22:20 ` Chuck Ebbert
1 sibling, 0 replies; 17+ messages in thread
From: Chuck Ebbert @ 2007-03-06 22:20 UTC (permalink / raw)
To: Pavel Machek; +Cc: Simon Arlott, Linux Kernel Mailing List, akpm, arjan
Pavel Machek wrote:
>
> Timetraveling, sorry.
> Pavel
Sure confuses Thunderbird. It shows one unread message but
"N" does not do anything...
Thread overview: 17+ messages
2007-02-24 15:19 round_jiffies and load average Simon Arlott
2007-03-01 9:11 ` Simon Arlott
2007-03-01 18:52 ` [PATCH] timer: Add an initial 0.5s delay to calc_load Simon Arlott
2007-03-01 22:52 ` [PATCH (updated)] timer: Run calc_load halfway through each round_jiffies second Simon Arlott
2002-01-01 3:05 ` Pavel Machek
2007-03-05 22:35 ` Simon Arlott
2007-03-06 18:42 ` [PATCH (update 4)] " Simon Arlott
2007-03-06 22:20 ` [PATCH (updated)] " Chuck Ebbert
2007-03-01 23:10 ` Bill Irwin
2007-03-02 10:15 ` [PATCH (update 2)] " Simon Arlott
2007-03-02 15:15 ` [PATCH (update 3)] " Simon Arlott
2007-03-02 16:35 ` Eric Dumazet
2007-03-02 17:32 ` Simon Arlott
2007-03-02 18:03 ` Eric Dumazet
2007-03-02 20:14 ` Simon Arlott
2007-03-02 22:32 ` Eric Dumazet
2007-03-02 23:54 ` Simon Arlott