LKML Archive on lore.kernel.org
* CPU load
@ 2002-07-10 14:50 David Chow
  2002-07-10 16:54 ` William Lee Irwin III
  0 siblings, 1 reply; 25+ messages in thread
From: David Chow @ 2002-07-10 14:50 UTC (permalink / raw)
  To: linux-kernel

Dear all,

Are there any calls in kernel space with which I can determine the
current system load or CPU load?

regards,
David





* Re: CPU load
  2002-07-10 14:50 CPU load David Chow
@ 2002-07-10 16:54 ` William Lee Irwin III
  2002-07-10 17:49   ` Robert Love
  0 siblings, 1 reply; 25+ messages in thread
From: William Lee Irwin III @ 2002-07-10 16:54 UTC (permalink / raw)
  To: David Chow; +Cc: linux-kernel

On Wed, Jul 10, 2002 at 10:50:15PM +0800, David Chow wrote:
> Are there any calls in kernel space with which I can determine the
> current system load or CPU load?

Examine the avenrun array declared in kernel/timer.c in a manner similar
to how loadavg_read_proc() in fs/proc/proc_misc.c does.
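
A minimal userspace sketch of the fixed-point decoding that
loadavg_read_proc() applies to each avenrun entry. It assumes FSHIFT is
11, as in kernels of that era; format_load() is a hypothetical helper
name, and in-kernel code would read avenrun[] directly under whatever
locking the kernel version requires:

```c
#include <stdio.h>

/* Fixed-point parameters as defined in the kernel's <linux/sched.h>
 * at the time of this thread; treat them as assumptions elsewhere. */
#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */

#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Hypothetical helper: format one avenrun-style value as "I.FF",
 * adding 0.005 first so the result is rounded to two decimals. */
static int format_load(unsigned long raw, char *buf, size_t len)
{
    unsigned long v = raw + FIXED_1 / 200;

    return snprintf(buf, len, "%lu.%02lu", LOAD_INT(v), LOAD_FRAC(v));
}
```

Calling format_load() once for each of the three avenrun values gives
the 1, 5 and 15 minute figures in the same form as /proc/loadavg.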


Cheers,
Bill


* Re: CPU load
  2002-07-10 16:54 ` William Lee Irwin III
@ 2002-07-10 17:49   ` Robert Love
  2002-07-26 17:38     ` David Chow
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Love @ 2002-07-10 17:49 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: David Chow, linux-kernel

On Wed, 2002-07-10 at 09:54, William Lee Irwin III wrote:

> Examine the avenrun array declared in kernel/timer.c in a manner similar
> to how loadavg_read_proc() in fs/proc/proc_misc.c does.

David, I wanted to add that we formalized the locking rules on
avenrun[3] a couple kernel revisions ago.

In 2.4, I believe it is implicitly assumed you will do a cli() before
accessing the data (if you want all 3 values to be in sync you need the
read to be safe).

In 2.5, grab a read_lock on xtime_lock and go at it.

	Robert Love



* Re: CPU load
  2002-07-10 17:49   ` Robert Love
@ 2002-07-26 17:38     ` David Chow
  0 siblings, 0 replies; 25+ messages in thread
From: David Chow @ 2002-07-26 17:38 UTC (permalink / raw)
  To: Robert Love; +Cc: William Lee Irwin III, linux-kernel

Robert Love wrote:

>On Wed, 2002-07-10 at 09:54, William Lee Irwin III wrote:
>
>>Examine the avenrun array declared in kernel/timer.c in a manner similar
>>to how loadavg_read_proc() in fs/proc/proc_misc.c does.
>
>David, I wanted to add that we formalized the locking rules on
>avenrun[3] a couple kernel revisions ago.
>
>In 2.4, I believe it is implicitly assumed you will do a cli() before
>accessing the data (if you want all 3 values to be in sync you need the
>read to be safe).
>
>In 2.5, grab a read_lock on xtime_lock and go at it.
>
>	Robert Love

Thanks for your information. I think having a generic interface for
determining the CPU load of the system can help developers choose a
task scheduling policy that makes more efficient use of the system's
processing power. For example, I would not want to do an intensive
processing job when the CPU load is high, but I can leave this work
until the CPU load is not high (for non-urgent tasks).
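
One way to sketch such a policy from userspace, assuming the 1-minute
load average in /proc/loadavg is an acceptable proxy for "CPU load".
The function names and the per-CPU threshold are hypothetical, chosen
only for illustration:

```c
#include <stdio.h>

/* Read the 1-minute load average from /proc/loadavg (Linux-specific).
 * Returns 0 on success, -1 on failure. */
static int read_load1(double *load1)
{
    FILE *f = fopen("/proc/loadavg", "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%lf", load1) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical policy: defer non-urgent work while the load average
 * per CPU exceeds `threshold` (for example 0.7). */
static int should_defer(double load1, int ncpus, double threshold)
{
    if (ncpus < 1)
        ncpus = 1;
    return (load1 / ncpus) > threshold;
}
```

A non-urgent job would call read_load1(), pass the result to
should_defer(), and sleep for a while before checking again whenever
the answer is nonzero.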

regards,
David



* Re: CPU load
  2007-02-26 10:42                 ` malc
@ 2007-02-26 16:38                   ` Randy Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: Randy Dunlap @ 2007-02-26 16:38 UTC (permalink / raw)
  To: malc; +Cc: Pavel Machek, linux-kernel

On Mon, 26 Feb 2007 13:42:50 +0300 (MSK) malc wrote:

> On Mon, 26 Feb 2007, Pavel Machek wrote:
> 
> > Hi!
> >
> >> [..snip..]
> >>
> >>>> The current situation ought to be documented. Better yet some flag
> >>>> can
> >>>
> >>> It probably _is_ documented, somewhere :-). If you find nice place
> >>> where to document it (top manpage?) go ahead with the patch.
> >>
> >>
> >> How about this:
> >
> > Looks okay to me. (You should probably add your name to it, and I do
> > not like html-like markup... plus please don't add extra spaces
> > between words)...
> 
> Thanks. html-like markup was added to clearly mark the boundaries of
> the message and the text. Extra-spaces courtesy emacs' C-0 M-q.
> 
> >
> > You probably want to send it to akpm?
> 
> Any pointers on how to do that and perhaps preferred submission
> format?
> 
> [..snip..]

Well, he wrote it up and posted it at
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


* Re: CPU load
  2007-02-26  9:28               ` Pavel Machek
@ 2007-02-26 10:42                 ` malc
  2007-02-26 16:38                   ` Randy Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: malc @ 2007-02-26 10:42 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

On Mon, 26 Feb 2007, Pavel Machek wrote:

> Hi!
>
>> [..snip..]
>>
>>>> The current situation ought to be documented. Better yet some flag
>>>> can
>>>
>>> It probably _is_ documented, somewhere :-). If you find nice place
>>> where to document it (top manpage?) go ahead with the patch.
>>
>>
>> How about this:
>
> Looks okay to me. (You should probably add your name to it, and I do
> not like html-like markup... plus please don't add extra spaces
> between words)...

Thanks. html-like markup was added to clearly mark the boundaries of
the message and the text. Extra-spaces courtesy emacs' C-0 M-q.

>
> You probably want to send it to akpm?

Any pointers on how to do that and perhaps preferred submission
format?

[..snip..]

-- 
vale


* Re: CPU load
  2007-02-25 10:35             ` malc
@ 2007-02-26  9:28               ` Pavel Machek
  2007-02-26 10:42                 ` malc
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2007-02-26  9:28 UTC (permalink / raw)
  To: malc; +Cc: Con Kolivas, linux-kernel

Hi!

> [..snip..]
> 
> >>The current situation ought to be documented. Better yet some flag
> >>can
> >
> >It probably _is_ documented, somewhere :-). If you find nice place
> >where to document it (top manpage?) go ahead with the patch.
> 
> 
> How about this:

Looks okay to me. (You should probably add your name to it, and I do
not like html-like markup... plus please don't add extra spaces
between words)...

You probably want to send it to akpm?
								Pavel

> <Documentation/load.txt>
> CPU load
> --------
> 
> Linux exports various bits     of information via  `/proc/stat'    and
> `/proc/uptime' that userland tools,  such as top(1), use  to calculate
> the average time system spent in a particular state, for example:
> 
> <transcript>
> $ iostat
> Linux 2.6.18.3-exp (linmac)     02/20/2007
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           10.01    0.00    2.92    5.44    0.00   81.63
> 
> ...
> </transcript>
> 
> Here   the system  thinks that  over   the default sampling period the
> system spent 10.01% of the time doing work in user space, 2.92% in the
> kernel, and was overall 81.63% of the time idle.
> 
> In most cases the `/proc/stat'  information reflects the reality quite
> closely, however  due to the   nature of how/when  the kernel collects
> this data sometimes it can not be trusted at all.
> 
> So  how is this information  collected?   Whenever timer interrupt  is
> signalled the  kernel looks  what kind  of task   was running at  this
> moment  and   increments the counter  that  corresponds  to this tasks
> kind/state.  The  problem with  this is  that  the  system  could have
> switched between  various states   multiple times between    two timer
> interrupts yet the counter is incremented only for the last state.
> 
> 
> Example
> -------
> 
> If we imagine the system with one task that periodically burns cycles
> in the following manner:
> 
>  time line between two timer interrupts
> |--------------------------------------|
>  ^                                    ^
>  |_ something begins working          |
>                                       |_ something goes to sleep
>                                      (only to be awaken quite soon)
> 
> In the above  situation the system will be  0% loaded according to the
> `/proc/stat' (since  the timer interrupt will   always happen when the
> system is  executing  the idle  handler),  but in reality  the load is
> closer to 99%.
> 
> One can imagine many more situations where this behavior of the kernel
> will lead to quite erratic information inside `/proc/stat'.
> 
> 
> /* gcc -o hog smallhog.c */
> #include <time.h>
> #include <limits.h>
> #include <signal.h>
> #include <sys/time.h>
> #define HIST 10
> 
> static volatile sig_atomic_t stop;
> 
> static void sighandler (int signr)
> {
>      (void) signr;
>      stop = 1;
> }
> static unsigned long hog (unsigned long niters)
> {
>      stop = 0;
>      while (!stop && --niters);
>      return niters;
> }
> int main (void)
> {
>      int i;
>      struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
>                              .it_value = { .tv_sec = 0, .tv_usec = 1 } };
>      sigset_t set;
>      unsigned long v[HIST];
>      double tmp = 0.0;
>      unsigned long n;
>      signal (SIGALRM, &sighandler);
>      setitimer (ITIMER_REAL, &it, NULL);
> 
>      hog (ULONG_MAX);
>      for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
>      for (i = 0; i < HIST; ++i) tmp += v[i];
>      tmp /= HIST;
>      n = tmp - (tmp / 3.0);
> 
>      sigemptyset (&set);
>      sigaddset (&set, SIGALRM);
> 
>      for (;;) {
>          hog (n);
>          sigwait (&set, &i);
>      }
>      return 0;
> }
> 
> 
> References
> ----------
> 
> http://lkml.org/lkml/2007/2/12/6
> Documentation/filesystems/proc.txt (1.8)
> </Documentation/load.txt>
> 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: CPU load
  2007-02-14 20:45           ` Pavel Machek
@ 2007-02-25 10:35             ` malc
  2007-02-26  9:28               ` Pavel Machek
  0 siblings, 1 reply; 25+ messages in thread
From: malc @ 2007-02-25 10:35 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Con Kolivas, linux-kernel

On Wed, 14 Feb 2007, Pavel Machek wrote:

> Hi!

[..snip..]

>> The current situation ought to be documented. Better yet some flag
>> can
>
> It probably _is_ documented, somewhere :-). If you find nice place
> where to document it (top manpage?) go ahead with the patch.


How about this:

<Documentation/load.txt>
CPU load
--------

Linux exports various bits of information via `/proc/stat' and
`/proc/uptime' that userland tools, such as top(1), use to calculate
the average time the system spent in a particular state, for example:

<transcript>
$ iostat
Linux 2.6.18.3-exp (linmac)     02/20/2007

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           10.01    0.00    2.92    5.44    0.00   81.63

...
</transcript>

Here   the system  thinks that  over   the default sampling period the
system spent 10.01% of the time doing work in user space, 2.92% in the
kernel, and was overall 81.63% of the time idle.
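
As a rough illustration of what such tools do, the sketch below parses
the aggregate `cpu' line of /proc/stat and derives the idle percentage
between two snapshots. The struct and function names are hypothetical,
and the eight-field layout (user, nice, system, idle, iowait, irq,
softirq, steal) is assumed from 2.6-era kernels; missing trailing
fields are left at zero.

```c
#include <stdio.h>
#include <string.h>

/* Counters from the aggregate "cpu" line, in USER_HZ ticks. */
struct cpu_sample {
    unsigned long long user, nice, system, idle;
    unsigned long long iowait, irq, softirq, steal;
};

/* Parse a line such as "cpu  100 0 30 800 50 5 5 0".
 * Returns 0 on success, -1 if this is not the aggregate line or
 * fewer than the first four fields are present. */
static int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    struct cpu_sample t = {0};
    int n;

    if (strncmp(line, "cpu ", 4) != 0)   /* rejects cpu0, cpu1, ... */
        return -1;
    n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &t.user, &t.nice, &t.system, &t.idle,
               &t.iowait, &t.irq, &t.softirq, &t.steal);
    if (n < 4)
        return -1;
    *s = t;
    return 0;
}

static unsigned long long total_ticks(const struct cpu_sample *s)
{
    return s->user + s->nice + s->system + s->idle +
           s->iowait + s->irq + s->softirq + s->steal;
}

/* Share of the interval between snapshots a and b accounted as idle. */
static double idle_percent(const struct cpu_sample *a,
                           const struct cpu_sample *b)
{
    unsigned long long dt = total_ticks(b) - total_ticks(a);

    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(b->idle - a->idle) / (double)dt;
}
```

Every figure computed this way inherits whatever error the kernel's
counters already contain.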

In most cases the `/proc/stat' information reflects reality quite
closely. However, because of how and when the kernel collects this
data, sometimes it cannot be trusted at all.

So how is this information collected? Whenever the timer interrupt is
signalled, the kernel looks at what kind of task was running at that
moment and increments the counter that corresponds to this task's
kind/state. The problem with this is that the system could have
switched between various states multiple times between two timer
interrupts, yet the counter is incremented only for the last state.


Example
-------

If we imagine the system with one task that periodically burns cycles
in the following manner:

  time line between two timer interrupts
|--------------------------------------|
  ^                                    ^
  |_ something begins working          |
                                       |_ something goes to sleep
                                      (only to be awakened quite soon)

In the above situation the system will be 0% loaded according to
`/proc/stat' (since the timer interrupt will always happen when the
system is executing the idle handler), but in reality the load is
closer to 99%.

One can imagine many more situations where this behavior of the kernel
will lead to quite erratic information inside `/proc/stat'.
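
The effect can be reproduced without a kernel. The sketch below is a
toy model, not the kernel's accounting code: time is counted in
microseconds, the task runs one burst per period, and the "kernel"
samples the CPU state only at tick instants. All names (hog_model,
running_at, simulate) are hypothetical.

```c
/* A periodic task: one burst of busy_us every period_us, with the
 * first burst starting phase_us after t = 0. */
struct hog_model {
    unsigned long period_us;
    unsigned long phase_us;
    unsigned long busy_us;
};

struct load_result {
    double sampled_pct;   /* what tick-based sampling reports */
    double actual_pct;    /* true share of time spent running */
};

/* Nonzero if the task is on the CPU at time t_us. */
static int running_at(const struct hog_model *m, unsigned long t_us)
{
    if (t_us < m->phase_us)
        return 0;
    return (t_us - m->phase_us) % m->period_us < m->busy_us;
}

/* Walk through nticks tick periods one microsecond at a time. */
static struct load_result simulate(const struct hog_model *m,
                                   unsigned long tick_us,
                                   unsigned long nticks)
{
    struct load_result r = { 0.0, 0.0 };
    unsigned long total_us = tick_us * nticks;
    unsigned long t, busy = 0, hits = 0;

    if (total_us == 0 || m->period_us == 0)
        return r;
    for (t = 0; t < total_us; ++t) {
        int run = running_at(m, t);

        busy += run;
        if (t % tick_us == 0)   /* timer interrupt: sample the state */
            hits += run;
    }
    r.sampled_pct = 100.0 * hits / nticks;
    r.actual_pct = 100.0 * busy / total_us;
    return r;
}
```

With a 1000 us tick and a task described by { 1000, 1, 666 }, every
burst starts just after a tick and ends before the next one, so the
sampled load is 0% while the task actually runs about two thirds of the
time. Moving the phase to 0 makes the same task look 100% busy.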


/* gcc -o hog smallhog.c */
#include <time.h>
#include <limits.h>
#include <signal.h>
#include <sys/time.h>
#define HIST 10

static volatile sig_atomic_t stop;

static void sighandler (int signr)
{
      (void) signr;
      stop = 1;
}
static unsigned long hog (unsigned long niters)
{
      stop = 0;
      while (!stop && --niters);
      return niters;
}
int main (void)
{
      int i;
      struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
                              .it_value = { .tv_sec = 0, .tv_usec = 1 } };
      sigset_t set;
      unsigned long v[HIST];
      double tmp = 0.0;
      unsigned long n;
      signal (SIGALRM, &sighandler);
      setitimer (ITIMER_REAL, &it, NULL);

      hog (ULONG_MAX);
      for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
      for (i = 0; i < HIST; ++i) tmp += v[i];
      tmp /= HIST;
      n = tmp - (tmp / 3.0);

      sigemptyset (&set);
      sigaddset (&set, SIGALRM);

      for (;;) {
          hog (n);
          sigwait (&set, &i);
      }
      return 0;
}


References
----------

http://lkml.org/lkml/2007/2/12/6
Documentation/filesystems/proc.txt (1.8)
</Documentation/load.txt>

-- 
vale


* Re: CPU load
  2007-02-14  7:28         ` malc
  2007-02-14  8:09           ` Con Kolivas
@ 2007-02-14 20:45           ` Pavel Machek
  2007-02-25 10:35             ` malc
  1 sibling, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2007-02-14 20:45 UTC (permalink / raw)
  To: malc; +Cc: Con Kolivas, linux-kernel

Hi!
> 
> >>>I have (had?) code that 'exploits' this. I believe I could eat 90% of cpu
> >>>without being noticed.
> >>
> >>Slightly changed version of hog(around 3 lines in total changed) does that
> >>easily on 2.6.18.3 on PPC.
> >>
> >>http://www.boblycat.org/~malc/apc/load-hog-ppc.png
> >
> >I guess it's worth mentioning this is _only_ about displaying the cpu 
> >usage to
> >userspace, as the cpu scheduler knows the accounting of each task in
> >different ways. This behaviour can not be used to exploit the cpu scheduler
> >into a starvation situation. Using the discrete per process accounting to
> >accumulate the displayed values to userspace would fix this problem, but
> >would be expensive.
> 
> Guess you are right, but, once again, the problem is not so much about
> fooling the system to do something or other, but confusing the user:
> 
> a. Everything is fine - the load is 0%, the fact that the system is
>    overheating and/or that some processes do not do as much as they
>    could is probably due to the bad hardware.
> 
> b. The weird load pattern must be the result of bugs in my code.
>    (And then a whole lot of time/effort is poured into fixing the
>     problem which is simply not there)
> 
> The current situation ought to be documented. Better yet some flag
> can

It probably _is_ documented, somewhere :-). If you find nice place
where to document it (top manpage?) go ahead with the patch.

> be introduced somewhere in the system so that it exports real values to
> /proc, not the estimations that are inaccurate in some cases (like hog)

Patch would be welcome, but I do not think it will be easy.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: CPU load
  2007-02-14  7:28         ` malc
@ 2007-02-14  8:09           ` Con Kolivas
  2007-02-14 20:45           ` Pavel Machek
  1 sibling, 0 replies; 25+ messages in thread
From: Con Kolivas @ 2007-02-14  8:09 UTC (permalink / raw)
  To: malc; +Cc: Pavel Machek, linux-kernel

On Wednesday 14 February 2007 18:28, malc wrote:
> On Wed, 14 Feb 2007, Con Kolivas wrote:
> > On Wednesday 14 February 2007 09:01, malc wrote:
> >> On Mon, 12 Feb 2007, Pavel Machek wrote:
> >>> Hi!
>
> [..snip..]
>
> >>> I have (had?) code that 'exploits' this. I believe I could eat 90% of
> >>> cpu without being noticed.
> >>
> >> Slightly changed version of hog(around 3 lines in total changed) does
> >> that easily on 2.6.18.3 on PPC.
> >>
> >> http://www.boblycat.org/~malc/apc/load-hog-ppc.png
> >
> > I guess it's worth mentioning this is _only_ about displaying the cpu
> > usage to userspace, as the cpu scheduler knows the accounting of each
> > task in different ways. This behaviour can not be used to exploit the cpu
> > scheduler into a starvation situation. Using the discrete per process
> > accounting to accumulate the displayed values to userspace would fix this
> > problem, but would be expensive.
>
> Guess you are right, but, once again, the problem is not so much about
> fooling the system to do something or other, but confusing the user:

Yes and I certainly am not arguing against that.

>
> a. Everything is fine - the load is 0%, the fact that the system is
>     overheating and/or that some processes do not do as much as they
>     could is probably due to the bad hardware.
>
> b. The weird load pattern must be the result of bugs in my code.
>     (And then a whole lot of time/effort is poured into fixing the
>      problem which is simply not there)
>
> The current situation ought to be documented. Better yet some flag can
> be introduced somewhere in the system so that it exports real values to
> /proc, not the estimations that are inaccurate in some cases (like hog)

I wouldn't argue against any of those either. schedstats with userspace tools 
to understand the data will give better information I believe.

-- 
-ck


* Re: CPU load
  2007-02-13 22:08       ` Con Kolivas
@ 2007-02-14  7:28         ` malc
  2007-02-14  8:09           ` Con Kolivas
  2007-02-14 20:45           ` Pavel Machek
  0 siblings, 2 replies; 25+ messages in thread
From: malc @ 2007-02-14  7:28 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Pavel Machek, linux-kernel

On Wed, 14 Feb 2007, Con Kolivas wrote:

> On Wednesday 14 February 2007 09:01, malc wrote:
>> On Mon, 12 Feb 2007, Pavel Machek wrote:
>>> Hi!

[..snip..]

>>> I have (had?) code that 'exploits' this. I believe I could eat 90% of cpu
>>> without being noticed.
>>
>> Slightly changed version of hog(around 3 lines in total changed) does that
>> easily on 2.6.18.3 on PPC.
>>
>> http://www.boblycat.org/~malc/apc/load-hog-ppc.png
>
> I guess it's worth mentioning this is _only_ about displaying the cpu usage to
> userspace, as the cpu scheduler knows the accounting of each task in
> different ways. This behaviour can not be used to exploit the cpu scheduler
> into a starvation situation. Using the discrete per process accounting to
> accumulate the displayed values to userspace would fix this problem, but
> would be expensive.

Guess you are right, but, once again, the problem is not so much about
fooling the system to do something or other, but confusing the user:

a. Everything is fine - the load is 0%, the fact that the system is
    overheating and/or that some processes do not do as much as they
    could is probably due to the bad hardware.

b. The weird load pattern must be the result of bugs in my code.
    (And then a whole lot of time/effort is poured into fixing the
     problem which is simply not there)

The current situation ought to be documented. Better yet some flag can
be introduced somewhere in the system so that it exports real values to
/proc, not the estimations that are inaccurate in some cases (like hog)

-- 
vale


* Re: CPU load
  2007-02-13 22:01     ` malc
@ 2007-02-13 22:08       ` Con Kolivas
  2007-02-14  7:28         ` malc
  0 siblings, 1 reply; 25+ messages in thread
From: Con Kolivas @ 2007-02-13 22:08 UTC (permalink / raw)
  To: malc; +Cc: Pavel Machek, linux-kernel

On Wednesday 14 February 2007 09:01, malc wrote:
> On Mon, 12 Feb 2007, Pavel Machek wrote:
> > Hi!
> >
> >> The kernel looks at what is using cpu _only_ during the timer
> >> interrupt. Which means if your HZ is 1000 it looks at what is running
> >> at precisely the moment those 1000 timer ticks occur. It is
> >> theoretically possible using this measurement system to use >99% cpu
> >> and record 0 usage if you time your cpu usage properly. It gets even
> >> more inaccurate at lower HZ values for the same reason.
> >
> > I have (had?) code that 'exploits' this. I believe I could eat 90% of cpu
> > without being noticed.
>
> Slightly changed version of hog(around 3 lines in total changed) does that
> easily on 2.6.18.3 on PPC.
>
> http://www.boblycat.org/~malc/apc/load-hog-ppc.png

I guess it's worth mentioning this is _only_ about displaying the cpu usage to 
userspace, as the cpu scheduler knows the accounting of each task in 
different ways. This behaviour can not be used to exploit the cpu scheduler 
into a starvation situation. Using the discrete per process accounting to 
accumulate the displayed values to userspace would fix this problem, but 
would be expensive.

-- 
-ck


* Re: CPU load
  2007-02-12 14:32   ` Pavel Machek
@ 2007-02-13 22:01     ` malc
  2007-02-13 22:08       ` Con Kolivas
  0 siblings, 1 reply; 25+ messages in thread
From: malc @ 2007-02-13 22:01 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Con Kolivas, linux-kernel

On Mon, 12 Feb 2007, Pavel Machek wrote:

> Hi!
>
>> The kernel looks at what is using cpu _only_ during the timer
>> interrupt. Which means if your HZ is 1000 it looks at what is running
>> at precisely the moment those 1000 timer ticks occur. It is
>> theoretically possible using this measurement system to use >99% cpu
>> and record 0 usage if you time your cpu usage properly. It gets even
>> more inaccurate at lower HZ values for the same reason.
>
> I have (had?) code that 'exploits' this. I believe I could eat 90% of cpu
> without being noticed.

Slightly changed version of hog(around 3 lines in total changed) does that
easily on 2.6.18.3 on PPC.

http://www.boblycat.org/~malc/apc/load-hog-ppc.png

-- 
vale


* Re: CPU load
  2007-02-12 16:57 Andrew Burgess
@ 2007-02-12 18:15 ` malc
  0 siblings, 0 replies; 25+ messages in thread
From: malc @ 2007-02-12 18:15 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: linux-kernel

On Mon, 12 Feb 2007, Andrew Burgess wrote:

> On 12/02/07, Vassili Karpov <av1474@comtv.ru> wrote:
>>
>> How does the kernel calculate the value it places in `/proc/stat' at
>> 4th position (i.e. "idle: twiddling thumbs")?
>>
> ..
>>
>> Later small kernel module was developed that tried to time how much
>> time is spent in the idle handler inside the kernel and exported this
>> information to the user-space. The results were consistent with our
>> expectations and the output of the test utility.
> ..
>> http://www.boblycat.org/~malc/apc
>
> Vassili
>
> Could you rewrite this code as a kernel patch for
> discussion/inclusion in mainline? I and maybe others would
> appreciate having idle statistics be more accurate.

I really don't know how to approach that, what i do in itc.c is ugly
to say the least (it's less ugly on PPC, but still).

There's stuff there that is very dangerous, i.e. entering the idle handler
on SMP and simultaneously rmmoding the module (which surprisingly
never actually caused any bad things on kernels i had (starting with
2.6.17.3), but panicked on Debian's 2.6.8). Safety nets were added but i
don't know whether they are sufficient. All in all what i have is a
gross hack, but it works for my purposes.

Another thing that keeps bothering me (again discovered with this
Debian kernel) is the fact that PREEMPT preempts idle handler, this
just doesn't add up in my head.

So to summarize: i don't know how to properly do that (so that it
works on all/most architectures, is less of a hack, has no negative
impact on performance, etc)

But i guess what innocent `smallhog.c' posted earlier demonstrated -
is that something probably ought to be done about it, or at least
the current situation documented.

-- 
vale


* Re: CPU load
  2007-02-12  5:44 ` Con Kolivas
                     ` (2 preceding siblings ...)
  2007-02-12 14:32   ` Pavel Machek
@ 2007-02-12 18:05   ` malc
  3 siblings, 0 replies; 25+ messages in thread
From: malc @ 2007-02-12 18:05 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

On Mon, 12 Feb 2007, Con Kolivas wrote:

> On 12/02/07, Vassili Karpov <av1474@comtv.ru> wrote:
>> Hello,

[..snip..]

>
> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

And indeed it appears to be possible to do just that. Example:

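/* gcc -o hog smallhog.c */
#include <time.h>
#include <limits.h>
#include <signal.h>
#include <sys/time.h>

#define HIST 10

/* Set by the SIGALRM handler; volatile so the busy loop rereads it. */
static volatile sig_atomic_t stop;

static void sighandler (int signr)
{
     (void) signr;
     stop = 1;
}

/* Spin until the timer fires or niters runs out; return what is left. */
static unsigned long hog (unsigned long niters)
{
     stop = 0;
     while (!stop && --niters);
     return niters;
}

int main (void)
{
     int i;
     /* Request a 1 us periodic timer; on the kernels discussed here it
        cannot be delivered more often than the timer tick. */
     struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
                             .it_value = { .tv_sec = 0, .tv_usec = 1 } };
     sigset_t set;
     unsigned long v[HIST];
     double tmp = 0.0;
     unsigned long n;

     signal (SIGALRM, &sighandler);
     setitimer (ITIMER_REAL, &it, NULL);

     /* Calibrate: average the iterations that fit between two signals. */
     for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
     for (i = 0; i < HIST; ++i) tmp += v[i];
     tmp /= HIST;
     /* Burn roughly two thirds of that interval per period. */
     n = tmp - (tmp / 3.0);

     sigemptyset (&set);
     sigaddset (&set, SIGALRM);

     /* Work right after a signal, then sleep until the next SIGALRM. */
     for (;;) {
         hog (n);
         sigwait (&set, &i);
     }
     return 0;
}
/* end smallhog.c */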

Might need some adjustment for a particular system but ran just fine here
on:
2.4.30   + Athlon tbird (1Ghz)
2.6.19.2 + Athlon X2 3800+ (2Ghz)

Showing next to zero load in top(1) and a whole lot more in APC.

http://www.boblycat.org/~malc/apc/load-tbird-hog.png
http://www.boblycat.org/~malc/apc/load-x2-hog.png

Not quite 99% but nevertheless scary.

-- 
vale


* Re: CPU load
@ 2007-02-12 16:57 Andrew Burgess
  2007-02-12 18:15 ` malc
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Burgess @ 2007-02-12 16:57 UTC (permalink / raw)
  To: av1474, linux-kernel

On 12/02/07, Vassili Karpov <av1474@comtv.ru> wrote:
>
> How does the kernel calculate the value it places in `/proc/stat' at
> 4th position (i.e. "idle: twiddling thumbs")?
>
..
>
> Later small kernel module was developed that tried to time how much
> time is spent in the idle handler inside the kernel and exported this
> information to the user-space. The results were consistent with our
> expectations and the output of the test utility.
..
> http://www.boblycat.org/~malc/apc

Vassili

Could you rewrite this code as a kernel patch for
discussion/inclusion in mainline? I and maybe others would
appreciate having idle statistics be more accurate.

Thanks for your work
Andrew



* Re: CPU load
  2007-02-12  5:44 ` Con Kolivas
  2007-02-12  5:54   ` malc
  2007-02-12  5:55   ` Stephen Rothwell
@ 2007-02-12 14:32   ` Pavel Machek
  2007-02-13 22:01     ` malc
  2007-02-12 18:05   ` malc
  3 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2007-02-12 14:32 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Vassili Karpov, linux-kernel

Hi!

> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

I have (had?) code that 'exploits' this. I believe I could eat 90% of cpu
without being noticed.
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: CPU load
  2007-02-12  7:10       ` malc
@ 2007-02-12  7:29         ` Con Kolivas
  0 siblings, 0 replies; 25+ messages in thread
From: Con Kolivas @ 2007-02-12  7:29 UTC (permalink / raw)
  To: malc; +Cc: linux-kernel

On Monday 12 February 2007 18:10, malc wrote:
> On Mon, 12 Feb 2007, Con Kolivas wrote:
> > Lots of confusion comes from this, and often people think their pc
> > suddenly uses a lot less cpu when they change from 1000HZ to 100HZ and
> > use this as an argument/reason for changing to 100HZ when in fact the
> > massive _reported_ difference is simply worse accounting. Of course there
> > is more overhead going from 100 to 1000 but it doesn't suddenly make your
> > apps use 10 times more cpu.
>
> Yep. This, i believe, is what made the mplayer developers incorrectly
> conclude that utilizing RTC suddenly made the code run slower, after all
> /proc/stat now claims that CPU load is higher, while in reality it stayed
> the same - it's the accuracy that has improved (somewhat)
>
> But back to the original question, does it look at what's running on timer
> interrupt only or any IRQ? (something which is more in line with my own
> observations)

During the timer interrupt only. However if you create any form of timer, they 
will of course have some periodicity relationship with the timer interrupt.

-- 
-ck


* Re: CPU load
  2007-02-12  6:12     ` Con Kolivas
@ 2007-02-12  7:10       ` malc
  2007-02-12  7:29         ` Con Kolivas
  0 siblings, 1 reply; 25+ messages in thread
From: malc @ 2007-02-12  7:10 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

On Mon, 12 Feb 2007, Con Kolivas wrote:

> On Monday 12 February 2007 16:54, malc wrote:
>> On Mon, 12 Feb 2007, Con Kolivas wrote:
>>> On 12/02/07, Vassili Karpov <av1474@comtv.ru> wrote:
>>
>> [..snip..]
>>
>>> The kernel looks at what is using cpu _only_ during the timer
>>> interrupt. Which means if your HZ is 1000 it looks at what is running
>>> at precisely the moment those 1000 timer ticks occur. It is
>>> theoretically possible using this measurement system to use >99% cpu
>>> and record 0 usage if you time your cpu usage properly. It gets even
>>> more inaccurate at lower HZ values for the same reason.
>>
>> Thank you very much. This somewhat contradicts what i saw (and outlined
>> in usenet article), namely the mplayer+/dev/rtc case. Unless of course
>> /dev/rtc interrupt is considered to be the same as the interrupt from
>> PIT (on X86 that is)
>>
>> P.S. Perhaps it is worth documenting this in the documentation? It caused
>>       me, and perhaps quite a few other people, a great deal of pain and
>>       frustration.
>
> Lots of confusion comes from this, and often people think their pc suddenly
> uses a lot less cpu when they change from 1000HZ to 100HZ and use this as an
> argument/reason for changing to 100HZ when in fact the massive _reported_
> difference is simply worse accounting. Of course there is more overhead going
> from 100 to 1000 but it doesn't suddenly make your apps use 10 times more
> cpu.

Yep. This, I believe, is what made the mplayer developers incorrectly
conclude that utilizing RTC suddenly made the code run slower; after all,
/proc/stat now claims that CPU load is higher, while in reality it stayed
the same - it's the accuracy that has improved (somewhat).

But back to the original question: does it look at what's running on the
timer interrupt only, or on any IRQ? (The latter would be more in line
with my own observations.)

-- 
vale


* Re: CPU load
  2007-02-12  5:54   ` malc
@ 2007-02-12  6:12     ` Con Kolivas
  2007-02-12  7:10       ` malc
  0 siblings, 1 reply; 25+ messages in thread
From: Con Kolivas @ 2007-02-12  6:12 UTC (permalink / raw)
  To: malc; +Cc: linux-kernel

On Monday 12 February 2007 16:54, malc wrote:
> On Mon, 12 Feb 2007, Con Kolivas wrote:
> > On 12/02/07, Vassili Karpov <av1474@comtv.ru> wrote:
>
> [..snip..]
>
> > The kernel looks at what is using cpu _only_ during the timer
> > interrupt. Which means if your HZ is 1000 it looks at what is running
> > at precisely the moment those 1000 timer ticks occur. It is
> > theoretically possible using this measurement system to use >99% cpu
> > and record 0 usage if you time your cpu usage properly. It gets even
> > more inaccurate at lower HZ values for the same reason.
>
> Thank you very much. This somewhat contradicts what I saw (and outlined
> in the Usenet article), namely the mplayer+/dev/rtc case. Unless, of
> course, the /dev/rtc interrupt is considered to be the same as the
> interrupt from the PIT (on x86, that is).
>
> P.S. Perhaps it is worth documenting this in the documentation? It caused
>       me, and perhaps quite a few other people, a great deal of pain and
>       frustration.

Lots of confusion comes from this, and often people think their pc suddenly 
uses a lot less cpu when they change from 1000HZ to 100HZ and use this as an 
argument/reason for changing to 100HZ when in fact the massive _reported_ 
difference is simply worse accounting. Of course there is more overhead going 
from 100 to 1000 but it doesn't suddenly make your apps use 10 times more 
cpu.

-- 
-ck


* Re: CPU load
  2007-02-12  5:55   ` Stephen Rothwell
@ 2007-02-12  6:08     ` Con Kolivas
  0 siblings, 0 replies; 25+ messages in thread
From: Con Kolivas @ 2007-02-12  6:08 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: Vassili Karpov, linux-kernel

On Monday 12 February 2007 16:55, Stephen Rothwell wrote:
> On Mon, 12 Feb 2007 16:44:22 +1100 "Con Kolivas" <kernel@kolivas.org> wrote:
> > The kernel looks at what is using cpu _only_ during the timer
> > interrupt. Which means if your HZ is 1000 it looks at what is running
> > at precisely the moment those 1000 timer ticks occur. It is
> > theoretically possible using this measurement system to use >99% cpu
> > and record 0 usage if you time your cpu usage properly. It gets even
> > more inaccurate at lower HZ values for the same reason.
>
> > That is not true on all architectures, some do more accurate accounting by
> recording the times at user/kernel/interrupt transitions ...

Indeed. It's certainly the way the more common, more boring PC architectures
do it, though.

-- 
-ck


* Re: CPU load
  2007-02-12  5:44 ` Con Kolivas
  2007-02-12  5:54   ` malc
@ 2007-02-12  5:55   ` Stephen Rothwell
  2007-02-12  6:08     ` Con Kolivas
  2007-02-12 14:32   ` Pavel Machek
  2007-02-12 18:05   ` malc
  3 siblings, 1 reply; 25+ messages in thread
From: Stephen Rothwell @ 2007-02-12  5:55 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Vassili Karpov, linux-kernel


On Mon, 12 Feb 2007 16:44:22 +1100 "Con Kolivas" <kernel@kolivas.org> wrote:
>
> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

That is not true on all architectures; some do more accurate accounting by
recording the times at user/kernel/interrupt transitions ...
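
To make the contrast with tick sampling concrete, here is a minimal
sketch (Python, not kernel code) of accounting by recording time at
every state transition. The function name `account_transitions`, the
state labels and the timestamps are all made up for illustration; real
architectures that do this use their own mechanisms.

```python
def account_transitions(events, end_us):
    """Charge exact time to each state between consecutive transitions.

    events: list of (timestamp_us, new_state), sorted by timestamp.
    end_us: timestamp at which the measurement window closes.
    Returns a dict mapping state -> total microseconds spent in it.
    """
    totals = {}
    # Pair each transition with the next one; the last one runs to end_us.
    following = events[1:] + [(end_us, None)]
    for (start, state), (stop, _) in zip(events, following):
        totals[state] = totals.get(state, 0) + (stop - start)
    return totals


# Hypothetical 1000 us window: the task is off the CPU at both ends.
EXAMPLE = [(0, "idle"), (5, "user"), (900, "kernel"), (995, "idle")]
RESULT = account_transitions(EXAMPLE, 1000)
```

With these made-up events the result is 895 us user, 95 us kernel and
10 us idle, regardless of where a timer tick happens to fall.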

--
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/


* Re: CPU load
  2007-02-12  5:44 ` Con Kolivas
@ 2007-02-12  5:54   ` malc
  2007-02-12  6:12     ` Con Kolivas
  2007-02-12  5:55   ` Stephen Rothwell
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: malc @ 2007-02-12  5:54 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

On Mon, 12 Feb 2007, Con Kolivas wrote:

> On 12/02/07, Vassili Karpov <av1474@comtv.ru> wrote:

[..snip..]

> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

Thank you very much. This somewhat contradicts what I saw (and outlined
in the Usenet article), namely the mplayer+/dev/rtc case. Unless, of
course, the /dev/rtc interrupt is considered to be the same as the
interrupt from the PIT (on x86, that is).

P.S. Perhaps it is worth documenting this in the documentation? It caused
      me, and perhaps quite a few other people, a great deal of pain and
      frustration.

-- 
vale


* Re: CPU load
  2007-02-12  5:33 Vassili Karpov
@ 2007-02-12  5:44 ` Con Kolivas
  2007-02-12  5:54   ` malc
                     ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Con Kolivas @ 2007-02-12  5:44 UTC (permalink / raw)
  To: Vassili Karpov; +Cc: linux-kernel

On 12/02/07, Vassili Karpov <av1474@comtv.ru> wrote:
> Hello,
>
> How does the kernel calculate the value it places in `/proc/stat' at the
> 4th position (i.e. "idle: twiddling thumbs")?
>
> For background information as to why this question arose in the first
> place read on.
>
> While writing code dealing with video acquisition/processing at work, I
> noticed that top(1) (and every other tool that uses `/proc/stat' or
> `/proc/uptime') shows some very strange results.
>
> Top claimed that the system running one version of the code[A] was
> idling more often than when running code[B], which does the same thing
> but more cleverly. After some head scratching, one of my colleagues
> suggested a simple test that was implemented in a few minutes.
>
> The test consisted of a counter that was incremented in an endless loop;
> after a certain period of time had elapsed, it printed the value of the
> counter.  Running this test (with priority set to the lowest possible
> level) alongside code[A] and code[B] confirmed that code[B] is indeed
> faster than code[A], in the sense that the test made more forward
> progress while code[B] was running.
>
> Hard-coding some things (i.e. the value of the counter after counting
> for the duration of one period on a completely idle system), we extended
> the test to show the percentage of CPU that was utilized. This never
> matched the value that top presented us with.
>
> Later a small kernel module was developed that tried to time how much
> time is spent in the idle handler inside the kernel and exported this
> information to user-space. The results were consistent with our
> expectations and the output of the test utility.
>
> Two more points.
>
> a. In the past (again in a video processing context) I have witnessed
>    `/proc/stat' claiming that CPU utilization is 0% for, say, 20
>    seconds followed by 5 seconds of 30% load, and then the cycle
>    repeated. According to the methods outlined above the load is
>    always at 30%.
>
> b. In my personal experience the difference between `/proc/stat' and
>    "reality" can easily reach 40% (I think I saw even more than that).
>
> The module and the graphical application that uses it, along with a
> short README and a link to a Usenet article dealing with the same
> subject, are available at:
> http://www.boblycat.org/~malc/apc

The kernel looks at what is using cpu _only_ during the timer
interrupt. Which means if your HZ is 1000 it looks at what is running
at precisely the moment those 1000 timer ticks occur. It is
theoretically possible using this measurement system to use >99% cpu
and record 0 usage if you time your cpu usage properly. It gets even
more inaccurate at lower HZ values for the same reason.
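
As an illustration of the sampling effect described above, here is a
minimal simulation sketch in Python. It is not kernel code: the name
`tick_accounting` and its parameters are invented for this example, and
it assumes an idealized timer that fires exactly every 1/HZ seconds and
charges each whole tick to whatever is on the CPU at that instant.

```python
def tick_accounting(hz, period_us, busy_start_us, busy_len_us,
                    duration_us=1_000_000):
    """Return (actual_fraction, sampled_fraction) for a periodic task.

    The task is on the CPU during
    [busy_start_us, busy_start_us + busy_len_us) within every period of
    period_us microseconds. duration_us is assumed to be a multiple of
    period_us, so the actual fraction is simply busy_len_us / period_us.
    Sampling looks at the CPU only at tick instants.
    """
    tick_us = 1_000_000 // hz

    def on_cpu(t_us):
        phase = t_us % period_us
        return busy_start_us <= phase < busy_start_us + busy_len_us

    ticks = range(0, duration_us, tick_us)
    sampled = sum(on_cpu(t) for t in ticks) / len(ticks)
    actual = busy_len_us / period_us
    return actual, sampled


# A task that always sleeps across the tick is never seen ...
HIDDEN = tick_accounting(hz=1000, period_us=1000,
                         busy_start_us=5, busy_len_us=990)
# ... while one that runs only across the tick is charged for everything.
UNLUCKY = tick_accounting(hz=1000, period_us=1000,
                          busy_start_us=0, busy_len_us=10)
```

In this idealized model `HIDDEN` is 99% busy but sampled as 0%, and
`UNLUCKY` is 1% busy but sampled as 100%.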

--
-ck


* CPU load
@ 2007-02-12  5:33 Vassili Karpov
  2007-02-12  5:44 ` Con Kolivas
  0 siblings, 1 reply; 25+ messages in thread
From: Vassili Karpov @ 2007-02-12  5:33 UTC (permalink / raw)
  To: linux-kernel

Hello,

How does the kernel calculate the value it places in `/proc/stat' at the
4th position (i.e. "idle: twiddling thumbs")?

For background information as to why this question arose in the first
place read on.

While writing code dealing with video acquisition/processing at work, I
noticed that top(1) (and every other tool that uses `/proc/stat' or
`/proc/uptime') shows some very strange results.

Top claimed that the system running one version of the code[A] was
idling more often than when running code[B], which does the same thing
but more cleverly. After some head scratching, one of my colleagues
suggested a simple test that was implemented in a few minutes.

The test consisted of a counter that was incremented in an endless loop;
after a certain period of time had elapsed, it printed the value of the
counter.  Running this test (with priority set to the lowest possible
level) alongside code[A] and code[B] confirmed that code[B] is indeed
faster than code[A], in the sense that the test made more forward
progress while code[B] was running.

Hard-coding some things (i.e. the value of the counter after counting
for the duration of one period on a completely idle system), we extended
the test to show the percentage of CPU that was utilized. This never
matched the value that top presented us with.
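
The idea of the counter test can be sketched as follows. This is an
illustrative Python version, not the test utility referred to in this
message; the names `count_for` and `utilization_percent` and the sample
numbers are invented, and it assumes the counting loop runs at the
lowest priority so that it only gets CPU time nobody else wants.

```python
import time


def count_for(period_s):
    """Spin for period_s seconds and return how many iterations fit."""
    deadline = time.monotonic() + period_s
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def utilization_percent(count, idle_baseline):
    """Estimate CPU used by others from progress lost by the counter.

    idle_baseline is the count reached in one period on an otherwise
    idle system (the hard-coded value mentioned above).
    """
    if idle_baseline <= 0:
        raise ValueError("idle_baseline must be positive")
    lost = idle_baseline - min(count, idle_baseline)
    return 100.0 * lost / idle_baseline


# Made-up numbers: the counter reached only 700 of a possible 1000.
ESTIMATE = utilization_percent(700, 1000)
```

With those made-up numbers the estimate is 30%: the counter lost 30% of
its forward progress to whatever else was running.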

Later a small kernel module was developed that tried to time how much
time is spent in the idle handler inside the kernel and exported this
information to user-space. The results were consistent with our
expectations and the output of the test utility.

Two more points.

a. In the past (again in a video processing context) I have witnessed
   `/proc/stat' claiming that CPU utilization is 0% for, say, 20
   seconds followed by 5 seconds of 30% load, and then the cycle
   repeated. According to the methods outlined above the load is
   always at 30%.

b. In my personal experience the difference between `/proc/stat' and
   "reality" can easily reach 40% (I think I saw even more than that).

The module and the graphical application that uses it, along with a
short README and a link to a Usenet article dealing with the same
subject, are available at:
http://www.boblycat.org/~malc/apc

Thanks




end of thread, other threads:[~2007-02-26 16:42 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-07-10 14:50 CPU load David Chow
2002-07-10 16:54 ` William Lee Irwin III
2002-07-10 17:49   ` Robert Love
2002-07-26 17:38     ` David Chow
2007-02-12  5:33 Vassili Karpov
2007-02-12  5:44 ` Con Kolivas
2007-02-12  5:54   ` malc
2007-02-12  6:12     ` Con Kolivas
2007-02-12  7:10       ` malc
2007-02-12  7:29         ` Con Kolivas
2007-02-12  5:55   ` Stephen Rothwell
2007-02-12  6:08     ` Con Kolivas
2007-02-12 14:32   ` Pavel Machek
2007-02-13 22:01     ` malc
2007-02-13 22:08       ` Con Kolivas
2007-02-14  7:28         ` malc
2007-02-14  8:09           ` Con Kolivas
2007-02-14 20:45           ` Pavel Machek
2007-02-25 10:35             ` malc
2007-02-26  9:28               ` Pavel Machek
2007-02-26 10:42                 ` malc
2007-02-26 16:38                   ` Randy Dunlap
2007-02-12 18:05   ` malc
2007-02-12 16:57 Andrew Burgess
2007-02-12 18:15 ` malc
