LKML Archive on
help / color / mirror / Atom feed
From: "Asbjørn Sannes" <>
To: Ray Lee <>
Cc: Linux Kernel Mailing List <>
Subject: Re: Unpredictable performance
Date: Mon, 28 Jan 2008 10:02:05 +0100	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Ray Lee wrote:
> On Jan 25, 2008 12:49 PM, Asbjørn Sannes <> wrote:
>> Ray Lee wrote:
>>> On Jan 25, 2008 3:32 AM, Asbjorn Sannes <> wrote:
>>>> Hi,
>>>> I am experiencing unpredictable results with the following test
>>>> without other processes running (exception is udev, I believe):
>>>> cd /usr/src/test
>>>> tar -jxf ../linux-
>>>> cp ../working-config linux-
>>>> cd linux-
>>>> make oldconfig
>>>> time make -j3 > /dev/null # This is what I note down as a "test" result
>>>> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
>>>> and then reboot
>>>> The kernel is booted with the parameter mem=81920000
>>>> For the results vary from (real time) 33m30.551s to 45m32.703s
>>>> (30 runs)
>>>> For with nop i/o scheduler from 29m8.827s to 55m36.744s (24 runs)
>>>> For also varied a lot.. but, lost results :(
>>>> For only vary from 34m32.054s to 38m1.928s (10 runs)
>>>> Any idea of what can cause this? I have tried to make the runs as equal
>>>> as possible, rebooting between each run.. i/o scheduler is cfq as default.
>>>> sys and user time only varies a couple of seconds.. and the order of
>>>> when it is "fast" and when it is "slow" is completly random, but it
>>>> seems that the results are mostly concentrated around the mean.
>> .. I may have jumped the gun a "little" early saying that it is mostly
>> concentrated around the mean, grepping from memory is not always .. hm,
>> accurate :P
> For you (or anyone!) to have any faith in your conclusions at all, you
> need to generate the mean and the standard deviation of each of your
> runs.
>>> First off, not all tests are good tests. In particular, small timing
>>> differences can get magnified horrendously by heading into swap.
>> So, what you are saying is that it is expected to vary this much under
>> memory pressure? That I can not do anything with this on real hardware?
> No, I'm saying exactly what I wrote.
> What you're testing is basically a bunch of processes competing for
> the CPU scheduler. Who wins that competition is essentially random.
> Whoever wins then places pressure on the IO subsystem. If you then go
> into swap, you're then placing even *more* random pressure on the IO
> system. The reason is that  the order of the requests you're asking it
> to do vary *wildly* between each of your 'tests', and disk drives have
> a horrible time seeking between tracks. That's what I'm saying by
> minute differences in the kernel's behavior can get magnified
> thousands of times if you start hitting swap, or running a test that
> won't all fit into cache.
Yes, I get this, just to make it clear, the test is supposed to throw
things out from the page cache, because this is in part what I want to
test. I was under the impression that (not anymore though) a kernel
compile would be pretty deterministic in how it would pressure the page
cache and how much I/O would result from that.. so, I'm going to try and
change the test.
> So whatever you're trying to measure, you need to be aware that you're
> basically throwing a random number generator into the mix.
>>> That said, do you have the means and standard deviations of those
>>> runs? That's a good way to tell whether the tests are converging or
>>> not, and whether your results are telling you anything.
>> I have all the numbers, I was just hoping that there was a way to
>> benchmark a small change without a lot of runs. It seems to me to quite
>> randomly distributed .. from the runs:
> Sure, you just keep running the tests until your standard deviation
> converges to a significant enough range, where significant is whatever
> you like it to be (+- one minute, say, or 10 seconds, or whatever).
> But beware, if your test is essentially random, then it may never
> converge. That in itself is interesting, too.
>> It seems to me to quite
>> randomly distributed .. from the runs:
>> 43m10.022s, 34m31.104s, 43m47.221s, 41m17.840s, 34m15.454s,
>> 37m54.327s, 35m6.193s, 38m16.909s, 37m45.411s, 40m13.169s
>> 38m17.414s, 34m37.561s, 43m18.181s, 35m46.233s, 34m44.414s,
>> 39m55.257s, 35m28.477s, 33m30.551s, 41m36.394s, 43m6.359s,
>> 42m42.396s, 37m44.293s, 41m6.615s, 35m43.084s, 39m25.846s,
>> 34m23.753s, 36m0.556s, 41m38.095s, 45m32.703s, 36m18.325s,
>> 42m4.840s, 43m53.759s, 35m51.138s, 40m19.001s
>> Say I made a histogram of this (tilt your head :P) with 1 minute intervals:
>> 33 *
>> 34 *****
>> 35 *****
>> 36 **
>> 37 ***
>> 38 **
>> 39 **
>> 40 **
>> 41 ****
>> 42 **
>> 43 *****
>> 44
>> 45 *
Mean is 2328s and standard deviation is 210s.
> Just eyeballing that, I can tell you your standard deviation is large,
> and you would therefore need to run more tests.
> However, let me just underscore that unless you're planning on setting
> up a compile server that you *want* to push into swap all the time,
> then this is a pretty silly test for getting good numbers, and instead
> should try testing something closer to what you're actually concerned
> about.
> As an example -- there's a reason kernel developers have stopped
> trying to get good numbers from dbench: it's horribly sensitive to
> timing issues.
>> I don't really know what to make of that.. Going to see what happens
>> with less memory and make -j1, perhaps it will be more stable.
>>> Also as you're on a uniprocessor system, make -j2 is probably going to
>>> be faster than make -j3. Perhaps immaterial to whatever you're trying
>>> to test, but there you go.
>> Yes, I was hoping to have a more deterministic test to get a higher
>> confidence in fewer runs when testing changes. Especially under memory
>> pressure. And I truly was not expecting this much fluctuations, which is
>> why I tested several kernel versions to see if this influenced it and
>> mailed lkml. The computer is actually a dual core amd processor, but I
>> compiled the kernel with no smp to see if that helped on the dispersion.
> Well, if you really feel something is weird between the different
> kernel versions, then you should try bisecting various kernels between
> 2.6.20 and 2.6.22 and see if it leads you to anything conclusive. You
> only took 10 data points on 2.6.20, though, so I'd retest it as much
> as you have the other versions just to make sure the results are
> stable.
Just going to put this here, did more tests on (56runs) and
here is the outcome of that:
33: **
34: *********  
35: ************
36: *****************
37: **********
38: *****
39: *

Mean is 2175s (~36m) and standard deviation 77s.

> The kernel really *does* change behavior between each version,
> sometimes intentional, sometimes not. But good tests are one of the
> few starting points for discovering that.
> In particular, though, you need to be aware that Ingo Molnar's
> "Completely Fair Scheduler" went into 2.6.23. What that means is that
> the processes get scheduled on the CPU more fairly. It also means that
> the disk drive access in your test is going to be much more random and
> seeky than before, as the processes are now interleaving their
> requests together in a much more fine grained way. This is why I keep
> saying (sorry!) that I don't think you're really testing whatever it
> is that you actually care about.
What I want to test is how compressed caching (compressing the page
cache) improves performance under memory pressure. Now, if I just
compared it to the vanilla kernel and I only had one version of
compressed caching to test with, then that would be fine. But I want to
test some small changes to heuristics that may only change the outcome
by 2-3%. A good starting point for me would be to have a vanilla kernel
and a test that gives me consistent  enough results, before starting to
compare it.

I'm going to try a kernel compile with no parallel processes, the
randomness should become less I suppose :)

Asbjorn Sannes

      parent reply	other threads:[~2008-01-28  9:01 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-25 11:32 Asbjorn Sannes
2008-01-25 14:00 ` Nick Piggin
2008-01-25 14:31   ` Asbjørn Sannes
2008-01-25 15:03     ` Asbjørn Sannes
2008-01-26  0:38       ` Nick Piggin
2008-01-28  9:12         ` Asbjørn Sannes
2008-01-25 17:16 ` Ray Lee
2008-01-25 20:49   ` Asbjørn Sannes
     [not found]     ` <>
2008-01-28  9:02       ` Asbjørn Sannes [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \
    --subject='Re: Unpredictable performance' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).