LKML Archive on
help / color / mirror / Atom feed
From: Willy Tarreau <>
To: Nick Piggin <>
Cc: Ray Lee <>,
	Peter Zijlstra <>,
	Ingo Molnar <>,
	"LKML," <>
Subject: Re: Poor PostgreSQL scaling on Linux 2.6.25-rc5 (vs 2.6.22)
Date: Mon, 17 Mar 2008 09:26:38 +0100	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Mon, Mar 17, 2008 at 06:19:38PM +1100, Nick Piggin wrote:
> On Monday 17 March 2008 16:21, Willy Tarreau wrote:
> > On Sun, Mar 16, 2008 at 10:16:36PM -0700, Ray Lee wrote:
> > > On Sun, Mar 16, 2008 at 5:44 PM, Nick Piggin <> 
> > > > Can this be changed by default, please?
> > >
> > > Not without benchmarks of interactivity, please. There are far, far
> > > more linux desktops than there are servers. People expect to have to
> > > tune servers (I do, for the servers I maintain). People don't expect
> > > to have to tune a desktop to make it run well.
> >
> > Oh and even on servers, when your anti-virus proxy reaches a load of 800,
> > you're happy no to have too large a time-slice so that you regularly get
> > a chance of being allowed to type in commands over SSH.
> Your ssh session should be allowed to run anyway. I don't see the difference.
> If the runqueue length is 100 and the time-slice is (say) 10ms, then if your
> ssh only needs average of 5ms of CPU time per second, then it should be run
> next when it becomes runnable. If it wants 20ms of CPU time per second, then
> it has to wait for 2 seconds anyway to be run next, regardless of whether
> the timeslice was 10ms or 20ms.

It's not about what *ssh* uses but about what *others* use. Except by
renicing SSH or marking it real-time, it has no way to say "give the
CPU to me right now, I have something very short to do". So it will
have to wait for the 100 other tasks to eat their 10ms, waiting 1 second
to consume 5ms of CPU (and I was speaking about 800 and not 100).

It is one of the situations where I prefer to shorten timeslices when
load increases because it will not slow down the service too much, but
will still provide better interactivity, which is also benefical to the
service itself since there is no reason for the cycles usage to be the
same for all processes. So by having a finer granularity, small CPU eaters
will finish sooner.

> > Large time-slices are needed only in HPC environments IMHO, where only
> > one task runs.
> That's silly. By definition if there is only one task running, you don't
> care what the timeslice is.

I mean there's only one important task. There is always a bit of pollution
around it, and interrupting the tasks less often slightly reduces the
context-switch overhead.

> We actually did conduct some benchmarks, and a 10ms timeslice can start
> hurting even things like kbuild quite a bit.
> But anyway, I don't care what the time slice is so much (although it should
> be higher -- if the scheduler can't get good interactive behaviour with a
> 20-30ms timeslice in most cases then it's no good IMO). I care mostly that
> the timeslice does not decrease when load increases.

On the opposite, I think it's a fundamental requirement if you need to
maintain a reasonable interactivity, and a fair progress between all
tasks. I think it's obvious to understand that the only way to maintain
a constant latency with a growing number of tasks is to reduce the time
each task may spend on the CPU. Contrary to other domains such as network,
you don't know how much time a task will spend on the CPU if you grant an
access to it, and there is no way to know because only the work that this
task will perform will determine if it should run shorter or longer. Fair
scheduling in other areas such as network is "easier" because you know the
size of your packets so you know how much time they will take on the wire.

Here with tasks, the best you can do is estimating based on history. But
it will be very rare when you'll be able to correctly guess and guarantee
that the latency is correct.

Maybe the timeslices should shrink only past a certain load though (I don't
know how it's done today).


  reply	other threads:[~2008-03-17  8:29 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-11  6:49 Nick Piggin
2008-03-11  7:58 ` Ingo Molnar
2008-03-11  8:28   ` Nick Piggin
2008-03-11  9:59     ` Ingo Molnar
2008-03-11 21:07 ` Nicholas Miell
2008-03-11 21:34   ` Cyrus Massoumi
2008-03-11 23:12     ` Nicholas Miell
2008-03-11 23:42       ` Nick Piggin
2008-03-11 23:47       ` Stephen Hemminger
2008-03-12  9:00       ` Zhang, Yanmin
     [not found] ` <>
     [not found]   ` <>
2008-03-12  1:21     ` Nick Piggin
2008-03-12  7:58       ` Peter Zijlstra
2008-03-17  0:44         ` Nick Piggin
2008-03-17  5:16           ` Ray Lee
2008-03-17  5:21             ` Willy Tarreau
2008-03-17  7:19               ` Nick Piggin
2008-03-17  8:26                 ` Willy Tarreau [this message]
2008-03-17  8:54                   ` Nick Piggin
2008-03-17  9:28                     ` Peter Zijlstra
2008-03-17  9:56                       ` Nick Piggin
2008-03-17 10:16                       ` Ingo Molnar
2008-03-17  5:34             ` Nick Piggin
2008-03-14 15:42 Xose Vazquez Perez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \
    --subject='Re: Poor PostgreSQL scaling on Linux 2.6.25-rc5 (vs 2.6.22)' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).