Subject: Odd heuristic in load average calculation when many processes start in a small window
From: SZALAY Attila
Date: 2015-01-29 15:05 UTC
To: linux-kernel

I found strange spikes in the load average on one of my machines.

The machine is doing nothing right now. The normal load average is nearly
zero, and user and system CPU usage stay below 5 per cent. But once in a
while the load average goes above one, sometimes even much higher (5-20).

The problem is that I want to alert based on the load of the machine, and
this many false positives causes some trouble.

I checked the overall status of the machine with the
dstat -tcdngslyip
command and found no running processes and no blocked processes either.

But the number of new processes was high at every occurrence.

So I created a small script to mimic this behaviour, and I can reproduce
the problem in a lab environment. I tested it on Ubuntu trusty with
kernel versions 3.13.0 and 3.18.0. The production system is Ubuntu
precise with kernel version 3.2.0.

On the test system I could not produce a really big load average, but
that is a virtual machine with 4 cores, while the production system is
bare metal with 16 cores.

So, my questions are:
- Can I do something to mitigate this problem? (The burst of processes is
  started by munin and I cannot eliminate it from the system; one idea is
  sketched right after this list.)

- Can this be treated as a bug in the load average calculation, or is it
  a known issue / design decision?
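
Assuming the process starts can be throttled at all (munin itself I
cannot change), one mitigation would be to stagger the forks so that any
single load-average sample catches only a small fraction of them. A
staggered variant of my test script (shown further below) illustrates
the idea:

#!/bin/sh

# Same 500 forks per ~10-second cycle as the test script below, but
# spread over ten 1-second batches, so a sample that lands in a batch
# can only see the tasks of that batch, not the whole burst.
while true
do
  for batch in `seq 1 10`
  do
    for i in `seq 1 50`
    do
      /bin/echo -en "" &
    done
    sleep 1
  done
done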

Of course I searched the web for answers but found nothing related to
this issue. In every case I found, there were processes in D state, or at
least high iowait, but that is not the case here.

Thank you for your help.

A simplified output sample from the test machine is the following:
----system---- ----total-cpu-usage---- ---load-avg--- ---procs---
     time     |usr sys idl wai hiq siq| 1m   5m  15m |run blk new
29-01 14:52:46|  0   0 100   0   0   0|   0 0.12 0.30|  0   0   0
29-01 14:52:47|  0   0 100   0   0   0|   0 0.12 0.30|  0   0   0
29-01 14:52:48|  0   0 100   0   0   0|   0 0.12 0.30|  0   0   0
29-01 14:52:49|  0   0 100   0   0   0|   0 0.12 0.30|  0   0   0
29-01 14:52:50|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:51|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:52|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:53|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:54|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:55|  6  12  81   0   0   0|   0 0.11 0.30|  0   0 504
29-01 14:52:56|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:57|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:58|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:52:59|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:53:00|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:53:01|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:53:02|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:53:03|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:53:04|  0   0 100   0   0   0|   0 0.11 0.30|  0   0   0
29-01 14:53:05|  6  12  83   0   0   0|0.32 0.17 0.32|  0   0 503
29-01 14:53:06|  0   0 100   0   0   0|0.32 0.17 0.32|  0   0   0
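
If I read the kernel code correctly, the load averages are updated from a
sample of the runnable (plus uninterruptible) tasks taken roughly every 5
seconds (LOAD_FREQ is 5*HZ + 1 ticks), and each sample is folded in with
a fixed-point exponential decay (calc_load() and the constants in
include/linux/sched.h). The sketch below is my own reconstruction of that
update, not kernel code, and the assumption that 4 of the ~500 short-lived
tasks from the burst were still runnable at the sampling tick is mine; I
chose it because it reproduces the 0.32 jump seen at 14:53:05 above:

#!/bin/sh
# My reconstruction of the kernel's fixed-point load-average update
# (calc_load()); constants as in include/linux/sched.h.
FSHIFT=11                   # fixed-point shift
FIXED_1=$((1 << FSHIFT))    # 1.0 in fixed point
EXP_1=1884                  # 1-minute decay factor, ~2048/e^(5/60)

avg=0                       # 1m average before the burst (~0.00)
n=4                         # assumption: tasks still runnable when
                            # a sample lands inside the burst
active=$((n << FSHIFT))

avg=$(( (avg * EXP_1 + active * (FIXED_1 - EXP_1)) >> FSHIFT ))

# Print the way /proc/loadavg does (LOAD_INT/LOAD_FRAC macros).
printf '1m after one sample: %d.%02d\n' \
    $((avg >> FSHIFT)) $(( ((avg & (FIXED_1 - 1)) * 100) >> FSHIFT ))
# prints: 1m after one sample: 0.32

So every runnable task caught at the tick adds about
(FIXED_1 - EXP_1) / FIXED_1 = 164/2048 ~ 0.08 to the 1-minute average,
and the 5-20 spikes on the 16-core production machine would correspond
to a sample catching roughly 60-250 of the freshly forked tasks.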

And the test script is the following:
#!/bin/sh

# Fork 500 short-lived processes in one burst, then idle for 10
# seconds; this mimics the periodic fork burst munin produces.
while true
do
  for i in `seq 1 500`
  do
    /bin/echo -en "" &
  done
  sleep 10
done
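
For what it's worth, this timing also seems to explain why only some
bursts register in the dstat output above: each burst lasts well under a
second while the run queue is sampled only about every 5 seconds, so most
cycles fall entirely between two samples. Presumably the 1-minute average
jumps exactly in the cycles where a sample happens to land inside the
burst.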


