LKML Archive on lore.kernel.org
* Re: CFS review
@ 2007-08-11 10:44 Al Boldi
  2007-08-12  4:17 ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-11 10:44 UTC (permalink / raw)
  To: linux-kernel

Roman Zippel wrote:
> On Fri, 10 Aug 2007, Ingo Molnar wrote:
> > achieve that. It probably wont make a real difference, but it's really
> > easy for you to send and it's still very useful when one tries to
> > eliminate possibilities and when one wants to concentrate on the
> > remaining possibilities alone.
>
> The thing I'm afraid of with CFS is its possible unpredictability, which
> would make it hard to reproduce problems, and we may end up with users with
> unexplainable weird problems. That's the main reason I'm trying so hard to
> push for a design discussion.
>
> Just to give an idea here are two more examples of irregular behaviour,
> which are hopefully easier to reproduce.
>
> 1. Two simple busy loops, one of them is reniced to 15, according to my
> calculations the reniced task should get about 3.4% (1/(1.25^15+1)), but I
> get this:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4433 roman     20   0  1532  300  244 R 99.2  0.2   5:05.51 l
>  4434 roman     35  15  1532   72   16 R  0.7  0.1   0:10.62 l
>
> OTOH up to nice level 12 I get what I expect.
>
> 2. If I start 20 busy loops, initially I see in top that every task gets
> 5% and time increments equally (as it should):
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4492 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4491 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4490 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4489 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4488 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4487 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4486 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4485 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4484 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4483 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4482 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4481 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4480 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4479 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4478 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4477 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4476 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4475 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4474 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
>  4473 roman     20   0  1532  296  244 R  5.0  0.2   0:02.86 l
>
> But if I renice all of them to -15, the time every task gets is rather
> random:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4492 roman      5 -15  1532   68   16 R  1.0  0.1   0:07.95 l
>  4491 roman      5 -15  1532   68   16 R  4.3  0.1   0:07.62 l
>  4490 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.50 l
>  4489 roman      5 -15  1532   68   16 R  7.6  0.1   0:07.80 l
>  4488 roman      5 -15  1532   68   16 R  9.6  0.1   0:08.31 l
>  4487 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.59 l
>  4486 roman      5 -15  1532   68   16 R  6.6  0.1   0:07.08 l
>  4485 roman      5 -15  1532   68   16 R 10.0  0.1   0:07.31 l
>  4484 roman      5 -15  1532   68   16 R  8.0  0.1   0:07.30 l
>  4483 roman      5 -15  1532   68   16 R  7.0  0.1   0:07.34 l
>  4482 roman      5 -15  1532   68   16 R  1.0  0.1   0:05.84 l
>  4481 roman      5 -15  1532   68   16 R  1.0  0.1   0:07.16 l
>  4480 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.00 l
>  4479 roman      5 -15  1532   68   16 R  1.0  0.1   0:06.66 l
>  4478 roman      5 -15  1532   68   16 R  8.6  0.1   0:06.96 l
>  4477 roman      5 -15  1532   68   16 R  8.6  0.1   0:07.63 l
>  4476 roman      5 -15  1532   68   16 R  9.6  0.1   0:07.38 l
>  4475 roman      5 -15  1532   68   16 R  1.3  0.1   0:07.09 l
>  4474 roman      5 -15  1532   68   16 R  2.3  0.1   0:07.97 l
>  4473 roman      5 -15  1532  296  244 R  1.0  0.2   0:07.73 l

That's because granularity increases when decreasing nice, and results in 
larger timeslices, which affects smoothness negatively.  chew.c easily shows 
this problem with 2 background cpu-hogs at the same nice-level.
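
(For readers without the tool: chew.c essentially busy-loops on the clock and
reports every gap where it was scheduled out, together with how long it ran
before that gap.  Below is a minimal sketch of that idea, not the actual tool;
the 2 ms gap threshold and the load formula are assumptions made purely for
illustration.)

/* chew-style probe: spin on the clock, report gaps where we were scheduled out */
#include <stdio.h>
#include <time.h>

static long long now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	const long long threshold = 2000000;	/* gaps > 2 ms count as "scheduled out" (assumed) */
	long long last = now_ns(), ran_start = last;

	for (;;) {
		long long t = now_ns();
		long long gap = t - last;

		if (gap > threshold) {
			long long ran = last - ran_start;	/* time we ran before the gap */
			/* crude share of this period; the real tool's load figure may be smoothed */
			double load = 100.0 * ran / (ran + gap);

			printf("out for %4lld ms, ran for %4lld ms, load %3.0f%%\n",
			       gap / 1000000, ran / 1000000, load);
			ran_start = t;
		}
		last = t;
	}
}

(Build with something like "cc -O2 chew-sketch.c -o chew-sketch"; very old
glibc may also need -lrt for clock_gettime.)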

pid 908, prio   0, out for    8 ms, ran for    4 ms, load  37%
pid 908, prio   0, out for    8 ms, ran for    4 ms, load  37%
pid 908, prio   0, out for    8 ms, ran for    2 ms, load  26%
pid 908, prio   0, out for    8 ms, ran for    4 ms, load  38%
pid 908, prio   0, out for    2 ms, ran for    1 ms, load  47%

pid 908, prio  -5, out for   23 ms, ran for    3 ms, load  14%
pid 908, prio  -5, out for   17 ms, ran for    9 ms, load  35%
pid 908, prio  -5, out for   18 ms, ran for    6 ms, load  27%
pid 908, prio  -5, out for   20 ms, ran for   10 ms, load  34%
pid 908, prio  -5, out for    9 ms, ran for    3 ms, load  30%

pid 908, prio -10, out for   69 ms, ran for    8 ms, load  11%
pid 908, prio -10, out for   35 ms, ran for   19 ms, load  36%
pid 908, prio -10, out for   58 ms, ran for   20 ms, load  26%
pid 908, prio -10, out for   34 ms, ran for   17 ms, load  34%
pid 908, prio -10, out for   58 ms, ran for   23 ms, load  28%

pid 908, prio -15, out for  164 ms, ran for   20 ms, load  11%
pid 908, prio -15, out for   21 ms, ran for   11 ms, load  36%
pid 908, prio -15, out for   21 ms, ran for   12 ms, load  37%
pid 908, prio -15, out for  115 ms, ran for   14 ms, load  11%
pid 908, prio -15, out for   27 ms, ran for   22 ms, load  45%
pid 908, prio -15, out for  125 ms, ran for   33 ms, load  21%
pid 908, prio -15, out for   54 ms, ran for   16 ms, load  22%
pid 908, prio -15, out for   34 ms, ran for   33 ms, load  49%
pid 908, prio -15, out for   94 ms, ran for   15 ms, load  14%
pid 908, prio -15, out for   29 ms, ran for   21 ms, load  42%
pid 908, prio -15, out for  108 ms, ran for   20 ms, load  15%
pid 908, prio -15, out for   44 ms, ran for   20 ms, load  31%
pid 908, prio -15, out for   34 ms, ran for  110 ms, load  76%
pid 908, prio -15, out for  132 ms, ran for   21 ms, load  14%
pid 908, prio -15, out for   42 ms, ran for   39 ms, load  48%
pid 908, prio -15, out for   57 ms, ran for  124 ms, load  68%
pid 908, prio -15, out for   44 ms, ran for   17 ms, load  28%

It looks like the larger the granularity, the more unpredictable it gets, 
which probably means that this unpredictability exists even at smaller 
granularity but is only exposed with larger ones.


Thanks!

--
Al



* Re: CFS review
  2007-08-11 10:44 CFS review Al Boldi
@ 2007-08-12  4:17 ` Ingo Molnar
  2007-08-12 15:27   ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-12  4:17 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Roman Zippel, Linus Torvalds,
	Andrew Morton, linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> That's because granularity increases when decreasing nice, and results 
> in larger timeslices, which affects smoothness negatively.  chew.c 
> easily shows this problem with 2 background cpu-hogs at the same 
> nice-level.
> 
> pid 908, prio   0, out for    8 ms, ran for    4 ms, load  37%
> pid 908, prio   0, out for    8 ms, ran for    4 ms, load  37%
> pid 908, prio   0, out for    8 ms, ran for    2 ms, load  26%
> pid 908, prio   0, out for    8 ms, ran for    4 ms, load  38%
> pid 908, prio   0, out for    2 ms, ran for    1 ms, load  47%
> 
> pid 908, prio  -5, out for   23 ms, ran for    3 ms, load  14%
> pid 908, prio  -5, out for   17 ms, ran for    9 ms, load  35%

yeah. Incidentally, i refined this last week and those nice-level 
granularity changes went into the upstream scheduler code a few days 
ago:

  commit 7cff8cf61cac15fa29a1ca802826d2bcbca66152
  Author: Ingo Molnar <mingo@elte.hu>
  Date:   Thu Aug 9 11:16:52 2007 +0200

      sched: refine negative nice level granularity

      refine the granularity of negative nice level tasks: let them
      reschedule more often to offset the effect of them consuming
      their wait_runtime proportionately slower. (This makes nice-0
      task scheduling smoother in the presence of negatively
      reniced tasks.)

      Signed-off-by: Ingo Molnar <mingo@elte.hu>

so could you please re-check chew jitter behavior with the latest 
kernel? (i've attached the standalone patch below, it will apply cleanly 
to rc2 too.)

when testing this, you might also want to try chew-max:

   http://redhat.com/~mingo/cfs-scheduler/tools/chew-max.c

i added a few trivial enhancements to chew.c: it tracks the maximum 
latency, latency fluctuations (noise of scheduling) and allows it to be 
run for a fixed amount of time.

NOTE: if you run chew from any indirect terminal (xterm, ssh, etc.) it's 
best to capture/report chew numbers like this:

  ./chew-max 60 > chew.log

otherwise the indirect scheduling activities of the chew printout will 
disturb the numbers.

> It looks like the larger the granularity, the more unpredictable it 
> gets, which probably means that this unpredictability exists even at 
> smaller granularity but is only exposed with larger ones.

this should only affect non-default nice levels. Note that 99.9%+ of all 
userspace Linux CPU time is spent on default nice level 0, and that is 
what controls the design. So the approach was always to first get nice-0 
right, and then to adjust the non-default nice level behavior too, 
carefully, without hurting nice-0 - to refine all the workloads where 
people (have to) use positive or negative nice levels. In any case, 
please keep re-testing this so that we can adjust it.

	Ingo

--------------------->
commit 7cff8cf61cac15fa29a1ca802826d2bcbca66152
Author: Ingo Molnar <mingo@elte.hu>
Date:   Thu Aug 9 11:16:52 2007 +0200

    sched: refine negative nice level granularity
    
    refine the granularity of negative nice level tasks: let them
    reschedule more often to offset the effect of them consuming
    their wait_runtime proportionately slower. (This makes nice-0
    task scheduling smoother in the presence of negatively
    reniced tasks.)
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 7a632c5..e91db32 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -222,21 +222,25 @@ niced_granularity(struct sched_entity *curr, unsigned long granularity)
 {
 	u64 tmp;
 
+	if (likely(curr->load.weight == NICE_0_LOAD))
+		return granularity;
 	/*
-	 * Negative nice levels get the same granularity as nice-0:
+	 * Positive nice levels get the same granularity as nice-0:
 	 */
-	if (likely(curr->load.weight >= NICE_0_LOAD))
-		return granularity;
+	if (likely(curr->load.weight < NICE_0_LOAD)) {
+		tmp = curr->load.weight * (u64)granularity;
+		return (long) (tmp >> NICE_0_SHIFT);
+	}
 	/*
-	 * Positive nice level tasks get linearly finer
+	 * Negative nice level tasks get linearly finer
 	 * granularity:
 	 */
-	tmp = curr->load.weight * (u64)granularity;
+	tmp = curr->load.inv_weight * (u64)granularity;
 
 	/*
 	 * It will always fit into 'long':
 	 */
-	return (long) (tmp >> NICE_0_SHIFT);
+	return (long) (tmp >> WMULT_SHIFT);
 }
 
 static inline void


* Re: CFS review
  2007-08-12  4:17 ` Ingo Molnar
@ 2007-08-12 15:27   ` Al Boldi
  2007-08-12 15:52     ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-12 15:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Roman Zippel, Linus Torvalds,
	Andrew Morton, linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > That's because granularity increases when decreasing nice, and results
> > in larger timeslices, which affects smoothness negatively.  chew.c
> > easily shows this problem with 2 background cpu-hogs at the same
> > nice-level.
> >
> > pid 908, prio   0, out for    8 ms, ran for    4 ms, load  37%
> > pid 908, prio   0, out for    8 ms, ran for    4 ms, load  37%
> > pid 908, prio   0, out for    8 ms, ran for    2 ms, load  26%
> > pid 908, prio   0, out for    8 ms, ran for    4 ms, load  38%
> > pid 908, prio   0, out for    2 ms, ran for    1 ms, load  47%
> >
> > pid 908, prio  -5, out for   23 ms, ran for    3 ms, load  14%
> > pid 908, prio  -5, out for   17 ms, ran for    9 ms, load  35%
>
> yeah. Incidentally, i refined this last week and those nice-level
> granularity changes went into the upstream scheduler code a few days
> ago:
>
>   commit 7cff8cf61cac15fa29a1ca802826d2bcbca66152
>   Author: Ingo Molnar <mingo@elte.hu>
>   Date:   Thu Aug 9 11:16:52 2007 +0200
>
>       sched: refine negative nice level granularity
>
>       refine the granularity of negative nice level tasks: let them
>       reschedule more often to offset the effect of them consuming
>       their wait_runtime proportionately slower. (This makes nice-0
>       task scheduling smoother in the presence of negatively
>       reniced tasks.)
>
>       Signed-off-by: Ingo Molnar <mingo@elte.hu>
>
> so could you please re-check chew jitter behavior with the latest
> kernel? (i've attached the standalone patch below, it will apply cleanly
> to rc2 too.)

That fixes it, but by reducing granularity ctx is up 4-fold.

Mind you, it does have an enormous effect on responsiveness, as negative nice 
with small granularity can't hijack the system any more.

> when testing this, you might also want to try chew-max:
>
>    http://redhat.com/~mingo/cfs-scheduler/tools/chew-max.c
>
> i added a few trivial enhancements to chew.c: it tracks the maximum
> latency, latency fluctuations (noise of scheduling) and allows it to be
> run for a fixed amount of time.

Looks great.  Thanks!

> NOTE: if you run chew from any indirect terminal (xterm, ssh, etc.) it's
> best to capture/report chew numbers like this:
>
>   ./chew-max 60 > chew.log
>
> otherwise the indirect scheduling activities of the chew printout will
> disturb the numbers.

Correct; that's why I always boot into /bin/sh to get a clean run.  But 
redirecting output to a file is also a good idea, provided that this file 
lives on something like tmpfs, otherwise you'll get jitter from flushing it 
out to disk.

> > It looks like the larger the granularity, the more unpredictable it
> > gets, which probably means that this unpredictability exists even at
> > smaller granularity but is only exposed with larger ones.
>
> this should only affect non-default nice levels. Note that 99.9%+ of all
> userspace Linux CPU time is spent on default nice level 0, and that is
> what controls the design. So the approach was always to first get nice-0
> right, and then to adjust the non-default nice level behavior too,
> carefully, without hurting nice-0 - to refine all the workloads where
> people (have to) use positive or negative nice levels. In any case,
> please keep re-testing this so that we can adjust it.

The thing is, this unpredictability seems to exist even at nice level 0, but 
the smaller granularity covers it all up.  It occasionally exhibits itself 
as hick-ups during transient heavy workload flux.  But it's not easily 
reproducible.


Thanks!

--
Al



* Re: CFS review
  2007-08-12 15:27   ` Al Boldi
@ 2007-08-12 15:52     ` Ingo Molnar
  2007-08-12 19:43       ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-12 15:52 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Roman Zippel, Linus Torvalds,
	Andrew Morton, linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> > so could you please re-check chew jitter behavior with the latest 
> > kernel? (i've attached the standalone patch below, it will apply 
> > cleanly to rc2 too.)
> 
> That fixes it, but by reducing granularity ctx is up 4-fold.

ok, great! (the context-switch rate is obviously up.)

> Mind you, it does have an enormous effect on responsiveness, as 
> negative nice with small granularity can't hijack the system any more.

ok. i'm glad you like the result :-) This makes reniced X (or any 
reniced app) more usable.

> The thing is, this unpredictability seems to exist even at nice level 
> 0, but the smaller granularity covers it all up.  It occasionally 
> exhibits itself as hick-ups during transient heavy workload flux.  But 
> it's not easily reproducible.

In general, "hickups" can be due to many, many reasons. If a task got 
indeed delayed by scheduling jitter that is provable, even if the 
behavior is hard to reproduce, by enabling CONFIG_SCHED_DEBUG=y and 
CONFIG_SCHEDSTATS=y in your kernel. First clear all the stats:

  for N in /proc/*/task/*/sched; do echo 0 > $N; done

then wait for the 'hickup' to happen, and once it happens capture the 
system state (after the hickup) via this script:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

and tell me which specific task exhibited that 'hickup' and send me the 
debug output. Also, could you try the patch below as well? Thanks,

	Ingo

-------------------------------->
Subject: sched: fix sleeper bonus
From: Ingo Molnar <mingo@elte.hu>

Peter Zijlstra noticed that the sleeper bonus deduction code
was not properly rate-limited: a task that scheduled more
frequently would get a disproportionately large deduction.
So limit the deduction to delta_exec.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -75,7 +75,7 @@ enum {
 
 unsigned int sysctl_sched_features __read_mostly =
 		SCHED_FEAT_FAIR_SLEEPERS	*1 |
-		SCHED_FEAT_SLEEPER_AVG		*1 |
+		SCHED_FEAT_SLEEPER_AVG		*0 |
 		SCHED_FEAT_SLEEPER_LOAD_AVG	*1 |
 		SCHED_FEAT_PRECISE_CPU_LOAD	*1 |
 		SCHED_FEAT_START_DEBIT		*1 |
@@ -304,11 +304,9 @@ __update_curr(struct cfs_rq *cfs_rq, str
 	delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);
 
 	if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) {
-		delta = calc_delta_mine(cfs_rq->sleeper_bonus,
-					curr->load.weight, lw);
-		if (unlikely(delta > cfs_rq->sleeper_bonus))
-			delta = cfs_rq->sleeper_bonus;
-
+		delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec);
+		delta = calc_delta_mine(delta, curr->load.weight, lw);
+		delta = min((u64)delta, cfs_rq->sleeper_bonus);
 		cfs_rq->sleeper_bonus -= delta;
 		delta_mine -= delta;
 	}
@@ -521,6 +519,8 @@ static void __enqueue_sleeper(struct cfs
 	 * Track the amount of bonus we've given to sleepers:
 	 */
 	cfs_rq->sleeper_bonus += delta_fair;
+	if (unlikely(cfs_rq->sleeper_bonus > sysctl_sched_runtime_limit))
+		cfs_rq->sleeper_bonus = sysctl_sched_runtime_limit;
 
 	schedstat_add(cfs_rq, wait_runtime, se->wait_runtime);
 }
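
(To see why the ordering matters, here is a toy recomputation of the deduction
with made-up numbers; scale() is only a stand-in for calc_delta_mine(), and
none of the values below come from this thread.)

#include <stdio.h>

/* toy stand-in for calc_delta_mine(): scale a delta by a weight ratio */
static unsigned long long scale(unsigned long long delta, unsigned long weight)
{
	return delta * weight / 1024;
}

int main(void)
{
	unsigned long long sleeper_bonus = 20000000;	/* 20 ms of pending bonus (illustrative) */
	unsigned long long delta_exec    =  1000000;	/*  1 ms actually executed this update   */
	unsigned long      weight        =     1024;	/* a nice-0 task                         */

	/* old: deduction derived from the whole bonus, capped only by the bonus itself */
	unsigned long long old_deduction = scale(sleeper_bonus, weight);
	if (old_deduction > sleeper_bonus)
		old_deduction = sleeper_bonus;

	/* new: deduction capped by what the task actually ran (delta_exec) first */
	unsigned long long d = sleeper_bonus < delta_exec ? sleeper_bonus : delta_exec;
	unsigned long long new_deduction = scale(d, weight);
	if (new_deduction > sleeper_bonus)
		new_deduction = sleeper_bonus;

	printf("old deduction: %llu ns\n", old_deduction);	/* 20 ms charged against a 1 ms slice */
	printf("new deduction: %llu ns\n", new_deduction);	/*  1 ms, i.e. rate-limited           */
	return 0;
}

With the fix, a task that runs in many short slices can no longer have far
more deducted in a single update than it actually executed.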


* Re: CFS review
  2007-08-12 15:52     ` Ingo Molnar
@ 2007-08-12 19:43       ` Al Boldi
  2007-08-21 10:58         ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-12 19:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Roman Zippel, Linus Torvalds,
	Andrew Morton, linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > The thing is, this unpredictability seems to exist even at nice level
> > 0, but the smaller granularity covers it all up.  It occasionally
> > exhibits itself as hick-ups during transient heavy workload flux.  But
> > it's not easily reproducible.
>
> In general, "hickups" can be due to many, many reasons. If a task got
> indeed delayed by scheduling jitter that is provable, even if the
> behavior is hard to reproduce, by enabling CONFIG_SCHED_DEBUG=y and
> CONFIG_SCHEDSTATS=y in your kernel. First clear all the stats:
>
>   for N in /proc/*/task/*/sched; do echo 0 > $N; done
>
> then wait for the 'hickup' to happen, and once it happens capture the
> system state (after the hickup) via this script:
>
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
>
> and tell me which specific task exhibited that 'hickup' and send me the
> debug output.

Ok.

> Also, could you try the patch below as well? Thanks,

Looks ok, but I'm not sure which workload this is supposed to improve.

There is one workload that still isn't performing well; it's a web-server 
workload that spawns 1K+ client procs.  It can be emulated by using this:

  for i in `seq 1 3333`; do ping 10.1 -A > /dev/null & done

The problem is that consecutive runs don't give consistent results and 
sometimes stalls.  You may want to try that.


Thanks!

--
Al



* Re: CFS review
  2007-08-12 19:43       ` Al Boldi
@ 2007-08-21 10:58         ` Ingo Molnar
  2007-08-21 22:27           ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-21 10:58 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> There is one workload that still isn't performing well; it's a 
> web-server workload that spawns 1K+ client procs.  It can be emulated 
> by using this:
> 
>   for i in `seq 1 3333`; do ping 10.1 -A > /dev/null & done

on bash i did this as:

  for ((i=0; i<3333; i++)); do ping 10.1 -A > /dev/null & done

and this quickly creates a monster-runqueue with tons of ping tasks 
pending. (i replaced 10.1 with the IP of another box on the same LAN as 
the testbox) Is this what should happen?

> The problem is that consecutive runs don't give consistent results and 
> sometimes stalls.  You may want to try that.

well, there's a natural saturation point after a few hundred tasks 
(depending on your CPU's speed), at which point there's no idle time 
left. From that point on things get slower progressively (and the 
ability of the shell to start new ping tasks is impacted as well), but 
that's expected on an overloaded system, isnt it?

	Ingo


* Re: CFS review
  2007-08-21 10:58         ` Ingo Molnar
@ 2007-08-21 22:27           ` Al Boldi
  2007-08-24 13:45             ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-21 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > There is one workload that still isn't performing well; it's a
> > web-server workload that spawns 1K+ client procs.  It can be emulated
> > by using this:
> >
> >   for i in `seq 1 3333`; do ping 10.1 -A > /dev/null & done
>
> on bash i did this as:
>
>   for ((i=0; i<3333; i++)); do ping 10.1 -A > /dev/null & done
>
> and this quickly creates a monster-runqueue with tons of ping tasks
> pending. (i replaced 10.1 with the IP of another box on the same LAN as
> the testbox) Is this what should happen?

Yes, sometimes they start pending and sometimes they run immediately.

> > The problem is that consecutive runs don't give consistent results and
> > sometimes stalls.  You may want to try that.
>
> well, there's a natural saturation point after a few hundred tasks
> (depending on your CPU's speed), at which point there's no idle time
> left. From that point on things get slower progressively (and the
> ability of the shell to start new ping tasks is impacted as well), but
> that's expected on an overloaded system, isnt it?

Of course, things should get slower with higher load, but it should be 
consistent without stalls.

To see this problem, make sure you boot into /bin/sh with the normal VGA 
console (ie. not fb-console).  Then try each loop a few times to show 
different behaviour; loops like:

# for ((i=0; i<3333; i++)); do ping 10.1 -A > /dev/null & done

# for ((i=0; i<3333; i++)); do nice -99 ping 10.1 -A > /dev/null & done

# { for ((i=0; i<3333; i++)); do
        ping 10.1 -A > /dev/null &
    done } > /dev/null 2>&1

Especially the last one sometimes causes a complete console lock-up, while 
the other two sometimes stall then surge periodically.

BTW, I am also wondering how one might test threading behaviour wrt startup 
and sync-on-exit with the parent thread.  This may not show any problems with 
a small number of threads, but how does it scale with 1K+?


Thanks!

--
Al



* Re: CFS review
  2007-08-21 22:27           ` Al Boldi
@ 2007-08-24 13:45             ` Ingo Molnar
  2007-08-25 22:27               ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-24 13:45 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> > > The problem is that consecutive runs don't give consistent results 
> > > and sometimes stalls.  You may want to try that.
> >
> > well, there's a natural saturation point after a few hundred tasks 
> > (depending on your CPU's speed), at which point there's no idle time 
> > left. From that point on things get slower progressively (and the 
> > ability of the shell to start new ping tasks is impacted as well), 
> > but that's expected on an overloaded system, isnt it?
> 
> Of course, things should get slower with higher load, but it should be 
> consistent without stalls.
> 
> To see this problem, make sure you boot into /bin/sh with the normal 
> VGA console (ie. not fb-console).  Then try each loop a few times to 
> show different behaviour; loops like:
> 
> # for ((i=0; i<3333; i++)); do ping 10.1 -A > /dev/null & done
> 
> # for ((i=0; i<3333; i++)); do nice -99 ping 10.1 -A > /dev/null & done
> 
> # { for ((i=0; i<3333; i++)); do
>         ping 10.1 -A > /dev/null &
>     done } > /dev/null 2>&1
> 
> Especially the last one sometimes causes a complete console lock-up, 
> while the other two sometimes stall then surge periodically.

ok. I think i might finally have found the bug causing this. Could you 
try the fix below, does your webserver thread-startup test work any 
better?

	Ingo

--------------------------->
Subject: sched: fix startup penalty calculation
From: Ingo Molnar <mingo@elte.hu>

fix task startup penalty miscalculation: sysctl_sched_granularity is
unsigned int and wait_runtime is long so we first have to convert it
to long before turning it negative ...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -1048,7 +1048,7 @@ static void task_new_fair(struct rq *rq,
 	 * -granularity/2, so initialize the task with that:
 	 */
 	if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
-		p->se.wait_runtime = -(sysctl_sched_granularity / 2);
+		p->se.wait_runtime = -((long)sysctl_sched_granularity / 2);
 
 	__enqueue_entity(cfs_rq, se);
 }
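
(The pitfall the changelog describes is easy to reproduce in userspace.  A
minimal demonstration, assuming an LP64 target with 64-bit long and a made-up
granularity value:)

#include <stdio.h>

int main(void)
{
	unsigned int gran = 2000000;		/* hypothetical granularity value, in ns */

	long wrong = -(gran / 2);		/* negation happens in unsigned arithmetic... */
	long right = -((long)gran / 2);		/* ...so convert to long before negating     */

	printf("wrong: %ld\n", wrong);		/* 4293967296 on LP64, not -1000000 */
	printf("right: %ld\n", right);		/* -1000000                         */
	return 0;
}

Without the cast, the intended small negative startup debit turns into a huge
positive wait_runtime on 64-bit kernels, presumably giving freshly forked
tasks a large head start instead of the intended small penalty.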


* Re: CFS review
  2007-08-24 13:45             ` Ingo Molnar
@ 2007-08-25 22:27               ` Al Boldi
  2007-08-25 23:15                 ` Ingo Molnar
  2007-08-29  3:37                 ` Bill Davidsen
  0 siblings, 2 replies; 123+ messages in thread
From: Al Boldi @ 2007-08-25 22:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > > The problem is that consecutive runs don't give consistent results
> > > > and sometimes stalls.  You may want to try that.
> > >
> > > well, there's a natural saturation point after a few hundred tasks
> > > (depending on your CPU's speed), at which point there's no idle time
> > > left. From that point on things get slower progressively (and the
> > > ability of the shell to start new ping tasks is impacted as well),
> > > but that's expected on an overloaded system, isnt it?
> >
> > Of course, things should get slower with higher load, but it should be
> > consistent without stalls.
> >
> > To see this problem, make sure you boot into /bin/sh with the normal
> > VGA console (ie. not fb-console).  Then try each loop a few times to
> > show different behaviour; loops like:
> >
> > # for ((i=0; i<3333; i++)); do ping 10.1 -A > /dev/null & done
> >
> > # for ((i=0; i<3333; i++)); do nice -99 ping 10.1 -A > /dev/null & done
> >
> > # { for ((i=0; i<3333; i++)); do
> >         ping 10.1 -A > /dev/null &
> >     done } > /dev/null 2>&1
> >
> > Especially the last one sometimes causes a complete console lock-up,
> > while the other two sometimes stall then surge periodically.
>
> ok. I think i might finally have found the bug causing this. Could you
> try the fix below, does your webserver thread-startup test work any
> better?

It seems to help somewhat, but the problem is still visible.  Even v20.3 on 
2.6.22.5 didn't help.

It does look related to ia-boosting, so I turned off __update_curr like Roman 
mentioned, which had an enormous smoothing effect, but then nice levels 
completely break down and lockup the system.

There is another way to show the problem visually under X (vesa-driver): 
start 3 gears simultaneously; after laying them out side-by-side they need 
some settling time before smoothing out.  Without __update_curr it's 
absolutely smooth from the start.


Thanks!

--
Al



* Re: CFS review
  2007-08-25 22:27               ` Al Boldi
@ 2007-08-25 23:15                 ` Ingo Molnar
  2007-08-26 16:27                   ` Al Boldi
  2007-08-29  3:42                   ` Bill Davidsen
  2007-08-29  3:37                 ` Bill Davidsen
  1 sibling, 2 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-25 23:15 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> > ok. I think i might finally have found the bug causing this. Could 
> > you try the fix below, does your webserver thread-startup test work 
> > any better?
> 
> It seems to help somewhat, but the problem is still visible.  Even 
> v20.3 on 2.6.22.5 didn't help.
> 
> It does look related to ia-boosting, so I turned off __update_curr 
> like Roman mentioned, which had an enormous smoothing effect, but then 
> nice levels completely break down and lockup the system.

you can turn sleeper-fairness off via:

   echo 28 > /proc/sys/kernel/sched_features

another thing to try would be:

   echo 12 > /proc/sys/kernel/sched_features

(that's the new-task penalty turned off.)
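
(For reference, those numbers are bitmasks over sysctl_sched_features.  The
bit values are not quoted anywhere in this thread; assuming the features take
bits 1, 2, 4, ... in the order they appear in the sysctl default shown in the
patches here, the two suggestions decode roughly as follows:)

/* assumed bit assignments, in declaration order (not quoted in this thread) */
enum {
	SCHED_FEAT_FAIR_SLEEPERS	= 1,	/* sleeper fairness         */
	SCHED_FEAT_SLEEPER_AVG		= 2,
	SCHED_FEAT_SLEEPER_LOAD_AVG	= 4,
	SCHED_FEAT_PRECISE_CPU_LOAD	= 8,
	SCHED_FEAT_START_DEBIT		= 16,	/* new-task startup penalty */
	SCHED_FEAT_SKIP_INITIAL		= 32,
};

/*
 * echo 28  ->  4 + 8 + 16: FAIR_SLEEPERS (and SLEEPER_AVG) cleared,
 *              i.e. sleeper-fairness off.
 * echo 12  ->  4 + 8:      START_DEBIT cleared as well,
 *              i.e. the new-task penalty also off.
 */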

Another thing to try would be to edit this:

        if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
                p->se.wait_runtime = -(sched_granularity(cfs_rq) / 2);

to:

        if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
                p->se.wait_runtime = -(sched_granularity(cfs_rq));

and could you also check 20.4 on 2.6.22.5 perhaps, or very latest -git? 
(Peter has experienced smaller spikes with that.)

	Ingo


* Re: CFS review
  2007-08-25 23:15                 ` Ingo Molnar
@ 2007-08-26 16:27                   ` Al Boldi
  2007-08-26 16:39                     ` Ingo Molnar
  2007-08-29  3:42                   ` Bill Davidsen
  1 sibling, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-26 16:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > ok. I think i might finally have found the bug causing this. Could
> > > you try the fix below, does your webserver thread-startup test work
> > > any better?
> >
> > It seems to help somewhat, but the problem is still visible.  Even
> > v20.3 on 2.6.22.5 didn't help.
> >
> > It does look related to ia-boosting, so I turned off __update_curr
> > like Roman mentioned, which had an enormous smoothing effect, but then
> > nice levels completely break down and lockup the system.
>
> you can turn sleeper-fairness off via:
>
>    echo 28 > /proc/sys/kernel/sched_features
>
> another thing to try would be:
>
>    echo 12 > /proc/sys/kernel/sched_features
>
> (that's the new-task penalty turned off.)
>
> Another thing to try would be to edit this:
>
>         if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
>                 p->se.wait_runtime = -(sched_granularity(cfs_rq) / 2);
>
> to:
>
>         if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
>                 p->se.wait_runtime = -(sched_granularity(cfs_rq);
>
> and could you also check 20.4 on 2.6.22.5 perhaps, or very latest -git?
> (Peter has experienced smaller spikes with that.)

Ok, I tried all your suggestions, but nothing works as smooth as removing 
__update_curr.

Does the problem show on your machine with the 3x gears under X-vesa test?


Thanks!

--
Al



* Re: CFS review
  2007-08-26 16:27                   ` Al Boldi
@ 2007-08-26 16:39                     ` Ingo Molnar
  2007-08-27  4:06                       ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-26 16:39 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> > and could you also check 20.4 on 2.6.22.5 perhaps, or very latest 
> > -git? (Peter has experienced smaller spikes with that.)
> 
> Ok, I tried all your suggestions, but nothing works as smooth as 
> removing __update_curr.

could you send the exact patch that shows what you did? And could you 
also please describe it exactly which aspect of the workload you call 
'smooth'. Could it be made quantitative somehow?

	Ingo


* Re: CFS review
  2007-08-26 16:39                     ` Ingo Molnar
@ 2007-08-27  4:06                       ` Al Boldi
  2007-08-27 10:53                         ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-27  4:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > and could you also check 20.4 on 2.6.22.5 perhaps, or very latest
> > > -git? (Peter has experienced smaller spikes with that.)
> >
> > Ok, I tried all your suggestions, but nothing works as smooth as
> > removing __update_curr.
>
> could you send the exact patch that shows what you did?

On 2.6.22.5-v20.3 (not v20.4):

340-    curr->delta_exec += delta_exec;
341-
342-    if (unlikely(curr->delta_exec > sysctl_sched_stat_granularity)) {
343://          __update_curr(cfs_rq, curr);
344-            curr->delta_exec = 0;
345-    }
346-    curr->exec_start = rq_of(cfs_rq)->clock;

> And could you
> also please describe it exactly which aspect of the workload you call
> 'smooth'. Could it be made quantitative somehow?

The 3x gears test shows the startup problem in a really noticeable way.  With 
v20.4 they start up surging and stalling periodically for about 10sec, then 
they are smooth.  With v20.3 + the above patch they start up completely smooth.


Thanks!

--
Al


* Re: CFS review
  2007-08-27  4:06                       ` Al Boldi
@ 2007-08-27 10:53                         ` Ingo Molnar
  2007-08-27 14:46                           ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-27 10:53 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> > could you send the exact patch that shows what you did?
> 
> On 2.6.22.5-v20.3 (not v20.4):
> 
> 340-    curr->delta_exec += delta_exec;
> 341-
> 342-    if (unlikely(curr->delta_exec > sysctl_sched_stat_granularity)) {
> 343://          __update_curr(cfs_rq, curr);
> 344-            curr->delta_exec = 0;
> 345-    }
> 346-    curr->exec_start = rq_of(cfs_rq)->clock;

ouch - this produces a really broken scheduler - with this we dont do 
any run-time accounting (!).

Could you try the patch below instead, does this make 3x glxgears smooth 
again? (if yes, could you send me your Signed-off-by line as well.)

	Ingo

------------------------>
Subject: sched: make the scheduler converge to the ideal latency
From: Ingo Molnar <mingo@elte.hu>

de-HZ-ification of the granularity defaults unearthed a pre-existing
property of CFS: while it correctly converges to the granularity goal,
it does not prevent run-time fluctuations in the range of [-gran ...
+gran].

With the increase of the granularity due to the removal of HZ
dependencies, this becomes visible in chew-max output (with 5 tasks
running):

 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
 out:  27 . 27. 32 | flu:  0 .  0 | ran:   17 .   13 | per:   44 .   40
 out:  27 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   36 .   40
 out:  29 . 27. 32 | flu:  2 .  0 | ran:   17 .   13 | per:   46 .   40
 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
 out:  29 . 27. 32 | flu:  0 .  0 | ran:   18 .   13 | per:   47 .   40
 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40

average slice is the ideal 13 msecs and the period is picture-perfect 40
msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no
mechanism in CFS to keep that from happening: it's a perfectly valid
solution that CFS finds.

the solution is to add a granularity/preemption rule that knows about
the "target latency", which makes tasks that run longer than the ideal
latency run a bit less. The simplest approach is to simply decrease the
preemption granularity when a task overruns its ideal latency. For this
we have to track how much the task executed since its last preemption.

( this adds a new field to task_struct, but we can eliminate that
  overhead in 2.6.24 by putting all the scheduler timestamps into an
  anonymous union. )

with this change in place, chew-max output is fluctuation-less all
around:

 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40

this patch has no impact on any fastpath or on any globally observable
scheduling property. (unless you have sharp enough eyes to see
millisecond-level ruckles in glxgears smoothness :-)

Also, with this mechanism in place the formula for adaptive granularity
can be simplified down to the obvious "granularity = latency/nr_running"
calculation.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/sched.h |    1 +
 kernel/sched_fair.c   |   43 ++++++++++++++-----------------------------
 2 files changed, 15 insertions(+), 29 deletions(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -904,6 +904,7 @@ struct sched_entity {
 
 	u64			exec_start;
 	u64			sum_exec_runtime;
+	u64			prev_sum_exec_runtime;
 	u64			wait_start_fair;
 	u64			sleep_start_fair;
 
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -225,30 +225,6 @@ static struct sched_entity *__pick_next_
  * Calculate the preemption granularity needed to schedule every
  * runnable task once per sysctl_sched_latency amount of time.
  * (down to a sensible low limit on granularity)
- *
- * For example, if there are 2 tasks running and latency is 10 msecs,
- * we switch tasks every 5 msecs. If we have 3 tasks running, we have
- * to switch tasks every 3.33 msecs to get a 10 msecs observed latency
- * for each task. We do finer and finer scheduling up to until we
- * reach the minimum granularity value.
- *
- * To achieve this we use the following dynamic-granularity rule:
- *
- *    gran = lat/nr - lat/nr/nr
- *
- * This comes out of the following equations:
- *
- *    kA1 + gran = kB1
- *    kB2 + gran = kA2
- *    kA2 = kA1
- *    kB2 = kB1 - d + d/nr
- *    lat = d * nr
- *
- * Where 'k' is key, 'A' is task A (waiting), 'B' is task B (running),
- * '1' is start of time, '2' is end of time, 'd' is delay between
- * 1 and 2 (during which task B was running), 'nr' is number of tasks
- * running, 'lat' is the the period of each task. ('lat' is the
- * sched_latency that we aim for.)
  */
 static long
 sched_granularity(struct cfs_rq *cfs_rq)
@@ -257,7 +233,7 @@ sched_granularity(struct cfs_rq *cfs_rq)
 	unsigned int nr = cfs_rq->nr_running;
 
 	if (nr > 1) {
-		gran = gran/nr - gran/nr/nr;
+		gran = gran/nr;
 		gran = max(gran, sysctl_sched_min_granularity);
 	}
 
@@ -668,7 +644,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 /*
  * Preempt the current task with a newly woken task if needed:
  */
-static void
+static int
 __check_preempt_curr_fair(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			  struct sched_entity *curr, unsigned long granularity)
 {
@@ -679,8 +655,11 @@ __check_preempt_curr_fair(struct cfs_rq 
 	 * preempt the current task unless the best task has
 	 * a larger than sched_granularity fairness advantage:
 	 */
-	if (__delta > niced_granularity(curr, granularity))
+	if (__delta > niced_granularity(curr, granularity)) {
 		resched_task(rq_of(cfs_rq)->curr);
+		return 1;
+	}
+	return 0;
 }
 
 static inline void
@@ -725,6 +704,7 @@ static void put_prev_entity(struct cfs_r
 
 static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
+	unsigned long gran, delta_exec;
 	struct sched_entity *next;
 
 	/*
@@ -741,8 +721,13 @@ static void entity_tick(struct cfs_rq *c
 	if (next == curr)
 		return;
 
-	__check_preempt_curr_fair(cfs_rq, next, curr,
-			sched_granularity(cfs_rq));
+	gran = sched_granularity(cfs_rq);
+	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+	if (delta_exec > gran)
+		gran = 0;
+
+	if (__check_preempt_curr_fair(cfs_rq, next, curr, gran))
+		curr->prev_sum_exec_runtime = curr->sum_exec_runtime;
 }
 
 /**************************************************


* Re: CFS review
  2007-08-27 10:53                         ` Ingo Molnar
@ 2007-08-27 14:46                           ` Al Boldi
  2007-08-27 20:41                             ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-27 14:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > could you send the exact patch that shows what you did?
> >
> > On 2.6.22.5-v20.3 (not v20.4):
> >
> > 340-    curr->delta_exec += delta_exec;
> > 341-
> > 342-    if (unlikely(curr->delta_exec > sysctl_sched_stat_granularity))
> > { 343://          __update_curr(cfs_rq, curr);
> > 344-            curr->delta_exec = 0;
> > 345-    }
> > 346-    curr->exec_start = rq_of(cfs_rq)->clock;
>
> ouch - this produces a really broken scheduler - with this we dont do
> any run-time accounting (!).

Of course it's broken, and it's not meant as a fix, but this change allows 
you to see the amount of overhead as well as any miscalculations 
__update_curr incurs.

In terms of overhead, __update_curr incurs ~3x slowdown, and in terms of 
run-time accounting it exhibits a ~10sec task-startup miscalculation.

> Could you try the patch below instead, does this make 3x glxgears smooth
> again? (if yes, could you send me your Signed-off-by line as well.)

The task-startup stalling is still there for ~10sec.

Can you see the problem on your machine?


Thanks!

--
Al



* Re: CFS review
  2007-08-27 14:46                           ` Al Boldi
@ 2007-08-27 20:41                             ` Ingo Molnar
  2007-08-28  4:37                               ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-27 20:41 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel


* Al Boldi <a1426z@gawab.com> wrote:

> > Could you try the patch below instead, does this make 3x glxgears 
> > smooth again? (if yes, could you send me your Signed-off-by line as 
> > well.)
> 
> The task-startup stalling is still there for ~10sec.
> 
> Can you see the problem on your machine?

nope (i have no framebuffer setup) - but i can see some chew-max 
latencies that occur when new tasks are started up. I _think_ it's 
probably the same problem as yours.

could you try the patch below (which is the combo patch of my current 
queue), ontop of head 50c46637aa? This makes chew-max behave better 
during task mass-startup here.

	Ingo

----------------->
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -904,6 +904,7 @@ struct sched_entity {
 
 	u64			exec_start;
 	u64			sum_exec_runtime;
+	u64			prev_sum_exec_runtime;
 	u64			wait_start_fair;
 	u64			sleep_start_fair;
 
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1587,6 +1587,7 @@ static void __sched_fork(struct task_str
 	p->se.wait_start_fair		= 0;
 	p->se.exec_start		= 0;
 	p->se.sum_exec_runtime		= 0;
+	p->se.prev_sum_exec_runtime	= 0;
 	p->se.delta_exec		= 0;
 	p->se.delta_fair_run		= 0;
 	p->se.delta_fair_sleep		= 0;
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -82,12 +82,12 @@ enum {
 };
 
 unsigned int sysctl_sched_features __read_mostly =
-		SCHED_FEAT_FAIR_SLEEPERS	*1 |
+		SCHED_FEAT_FAIR_SLEEPERS	*0 |
 		SCHED_FEAT_SLEEPER_AVG		*0 |
 		SCHED_FEAT_SLEEPER_LOAD_AVG	*1 |
 		SCHED_FEAT_PRECISE_CPU_LOAD	*1 |
-		SCHED_FEAT_START_DEBIT		*1 |
-		SCHED_FEAT_SKIP_INITIAL		*0;
+		SCHED_FEAT_START_DEBIT		*0 |
+		SCHED_FEAT_SKIP_INITIAL		*1;
 
 extern struct sched_class fair_sched_class;
 
@@ -225,39 +225,15 @@ static struct sched_entity *__pick_next_
  * Calculate the preemption granularity needed to schedule every
  * runnable task once per sysctl_sched_latency amount of time.
  * (down to a sensible low limit on granularity)
- *
- * For example, if there are 2 tasks running and latency is 10 msecs,
- * we switch tasks every 5 msecs. If we have 3 tasks running, we have
- * to switch tasks every 3.33 msecs to get a 10 msecs observed latency
- * for each task. We do finer and finer scheduling up to until we
- * reach the minimum granularity value.
- *
- * To achieve this we use the following dynamic-granularity rule:
- *
- *    gran = lat/nr - lat/nr/nr
- *
- * This comes out of the following equations:
- *
- *    kA1 + gran = kB1
- *    kB2 + gran = kA2
- *    kA2 = kA1
- *    kB2 = kB1 - d + d/nr
- *    lat = d * nr
- *
- * Where 'k' is key, 'A' is task A (waiting), 'B' is task B (running),
- * '1' is start of time, '2' is end of time, 'd' is delay between
- * 1 and 2 (during which task B was running), 'nr' is number of tasks
- * running, 'lat' is the the period of each task. ('lat' is the
- * sched_latency that we aim for.)
  */
-static long
+static unsigned long
 sched_granularity(struct cfs_rq *cfs_rq)
 {
 	unsigned int gran = sysctl_sched_latency;
 	unsigned int nr = cfs_rq->nr_running;
 
 	if (nr > 1) {
-		gran = gran/nr - gran/nr/nr;
+		gran = gran/nr;
 		gran = max(gran, sysctl_sched_min_granularity);
 	}
 
@@ -489,6 +465,9 @@ update_stats_wait_end(struct cfs_rq *cfs
 {
 	unsigned long delta_fair;
 
+	if (unlikely(!se->wait_start_fair))
+		return;
+
 	delta_fair = (unsigned long)min((u64)(2*sysctl_sched_runtime_limit),
 			(u64)(cfs_rq->fair_clock - se->wait_start_fair));
 
@@ -668,7 +647,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 /*
  * Preempt the current task with a newly woken task if needed:
  */
-static void
+static int
 __check_preempt_curr_fair(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			  struct sched_entity *curr, unsigned long granularity)
 {
@@ -679,8 +658,11 @@ __check_preempt_curr_fair(struct cfs_rq 
 	 * preempt the current task unless the best task has
 	 * a larger than sched_granularity fairness advantage:
 	 */
-	if (__delta > niced_granularity(curr, granularity))
+	if (__delta > niced_granularity(curr, granularity)) {
 		resched_task(rq_of(cfs_rq)->curr);
+		return 1;
+	}
+	return 0;
 }
 
 static inline void
@@ -725,6 +707,7 @@ static void put_prev_entity(struct cfs_r
 
 static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
+	unsigned long gran, delta_exec;
 	struct sched_entity *next;
 
 	/*
@@ -741,8 +724,13 @@ static void entity_tick(struct cfs_rq *c
 	if (next == curr)
 		return;
 
-	__check_preempt_curr_fair(cfs_rq, next, curr,
-			sched_granularity(cfs_rq));
+	gran = sched_granularity(cfs_rq);
+	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+	if (delta_exec > gran)
+		gran = 0;
+
+	if (__check_preempt_curr_fair(cfs_rq, next, curr, gran))
+		curr->prev_sum_exec_runtime = curr->sum_exec_runtime;
 }
 
 /**************************************************
@@ -1080,29 +1068,27 @@ static void task_new_fair(struct rq *rq,
 
 	sched_info_queued(p);
 
+	update_curr(cfs_rq);
 	update_stats_enqueue(cfs_rq, se);
-	/*
-	 * Child runs first: we let it run before the parent
-	 * until it reschedules once. We set up the key so that
-	 * it will preempt the parent:
-	 */
-	p->se.fair_key = current->se.fair_key -
-		niced_granularity(&rq->curr->se, sched_granularity(cfs_rq)) - 1;
+
 	/*
 	 * The first wait is dominated by the child-runs-first logic,
 	 * so do not credit it with that waiting time yet:
 	 */
 	if (sysctl_sched_features & SCHED_FEAT_SKIP_INITIAL)
-		p->se.wait_start_fair = 0;
+		se->wait_start_fair = 0;
 
 	/*
 	 * The statistical average of wait_runtime is about
 	 * -granularity/2, so initialize the task with that:
 	 */
-	if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
-		p->se.wait_runtime = -(sched_granularity(cfs_rq) / 2);
+	if (sysctl_sched_features & SCHED_FEAT_START_DEBIT) {
+		se->wait_runtime = -(sched_granularity(cfs_rq)/2);
+		schedstat_add(cfs_rq, wait_runtime, se->wait_runtime);
+	}
 
 	__enqueue_entity(cfs_rq, se);
+	resched_task(current);
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED


* Re: CFS review
  2007-08-27 20:41                             ` Ingo Molnar
@ 2007-08-28  4:37                               ` Al Boldi
  2007-08-28  5:05                                 ` Linus Torvalds
                                                   ` (2 more replies)
  0 siblings, 3 replies; 123+ messages in thread
From: Al Boldi @ 2007-08-28  4:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > Could you try the patch below instead, does this make 3x glxgears
> > > smooth again? (if yes, could you send me your Signed-off-by line as
> > > well.)
> >
> > The task-startup stalling is still there for ~10sec.
> >
> > Can you see the problem on your machine?
>
> nope (i have no framebuffer setup)

No need for framebuffer.  All you need is X using the X.org vesa-driver.  
Then start gears like this:

  # gears & gears & gears &

Then lay them out side by side to see the periodic stallings for ~10sec.

> - but i can see some chew-max
> latencies that occur when new tasks are started up. I _think_ it's
> probably the same problem as yours.

chew-max is great, but it's so sensitive that it exposes every scheduling 
glitch, and the startup glitch gets lost among the many other glitches it 
reports.  For example, it fluctuates all over the place using this:

  # for ((i=0;i<9;i++)); do chew-max 60 > /dev/shm/chew$i.log & done

Also, chew-max locks up when disabling __update_curr, which means that the 
workload of chew-max is different from either the ping-startup loop or the 
gears.  You really should try the gears test by all means, as the problem is 
really pronounced there.

> could you try the patch below (which is the combo patch of my current
> queue), ontop of head 50c46637aa? This makes chew-max behave better
> during task mass-startup here.

Still no improvement.


Thanks!

--
Al



* Re: CFS review
  2007-08-28  4:37                               ` Al Boldi
@ 2007-08-28  5:05                                 ` Linus Torvalds
  2007-08-28  5:23                                   ` Al Boldi
  2007-08-28 20:46                                   ` Valdis.Kletnieks
  2007-08-28  7:43                                 ` Xavier Bestel
  2007-08-29  4:18                                 ` Ingo Molnar
  2 siblings, 2 replies; 123+ messages in thread
From: Linus Torvalds @ 2007-08-28  5:05 UTC (permalink / raw)
  To: Al Boldi
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Andrew Morton, linux-kernel



On Tue, 28 Aug 2007, Al Boldi wrote:
> 
> No need for framebuffer.  All you need is X using the X.org vesa-driver.  
> Then start gears like this:
> 
>   # gears & gears & gears &
> 
> Then lay them out side by side to see the periodic stallings for ~10sec.

I don't think this is a good test.

Why?

If you're not using direct rendering, what you have is the X server doing 
all the rendering, which in turn means that what you are testing is quite 
possibly not so much about the *kernel* scheduling, but about *X-server* 
scheduling!

I'm sure the kernel scheduler has an impact, but what's more likely to be 
going on is that you're seeing effects that are indirect, and not 
necessarily at all even "good".

For example, if the X server is the scheduling point, it's entirely 
possible that it ends up showing effects that are more due to the queueing 
of the X command stream than due to the scheduler - and that those 
stalls are simply due to *that*.

One thing to try is to run the X connection in synchronous mode, which 
minimizes queueing issues. I don't know if gears has a flag to turn on 
synchronous X messaging, though. Many X programs take the "[+-]sync" flag 
to turn on synchronous mode, iirc.
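
(gears itself may not expose such a flag, but an X client can force the
protocol into synchronous mode programmatically.  A minimal Xlib sketch,
independent of the actual gears source:)

/* build with something like: cc xsync.c -lX11 */
#include <X11/Xlib.h>
#include <stdio.h>

int main(void)
{
	Display *dpy = XOpenDisplay(NULL);

	if (!dpy) {
		fprintf(stderr, "cannot open display\n");
		return 1;
	}

	/*
	 * Make every Xlib request wait for the server's reply (or error)
	 * before returning, so client-side request queueing no longer
	 * hides where the time is actually being spent.
	 */
	XSynchronize(dpy, True);

	/* ... issue the drawing requests being measured here ... */

	XCloseDisplay(dpy);
	return 0;
}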

		Linus


* Re: CFS review
  2007-08-28  5:05                                 ` Linus Torvalds
@ 2007-08-28  5:23                                   ` Al Boldi
  2007-08-28  7:28                                     ` Mike Galbraith
  2007-08-28 16:34                                     ` Linus Torvalds
  2007-08-28 20:46                                   ` Valdis.Kletnieks
  1 sibling, 2 replies; 123+ messages in thread
From: Al Boldi @ 2007-08-28  5:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Andrew Morton, linux-kernel

Linus Torvalds wrote:
> On Tue, 28 Aug 2007, Al Boldi wrote:
> > No need for framebuffer.  All you need is X using the X.org vesa-driver.
> > Then start gears like this:
> >
> >   # gears & gears & gears &
> >
> > Then lay them out side by side to see the periodic stallings for ~10sec.
>
> I don't think this is a good test.
>
> Why?
>
> If you're not using direct rendering, what you have is the X server doing
> all the rendering, which in turn means that what you are testing is quite
> possibly not so much about the *kernel* scheduling, but about *X-server*
> scheduling!
>
> I'm sure the kernel scheduler has an impact, but what's more likely to be
> going on is that you're seeing effects that are indirect, and not
> necessarily at all even "good".
>
> For example, if the X server is the scheduling point, it's entirely
> possible that it ends up showing effects that are more due to the queueing
> of the X command stream than due to the scheduler - and that those
> stalls are simply due to *that*.
>
> One thing to try is to run the X connection in synchronous mode, which
> minimizes queueing issues. I don't know if gears has a flag to turn on
> synchronous X messaging, though. Many X programs take the "[+-]sync" flag
> to turn on synchronous mode, iirc.

I like your analysis, but how do you explain that these stalls vanish when 
__update_curr is disabled?


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  5:23                                   ` Al Boldi
@ 2007-08-28  7:28                                     ` Mike Galbraith
  2007-08-28  7:36                                       ` Ingo Molnar
  2007-08-28 16:34                                     ` Linus Torvalds
  1 sibling, 1 reply; 123+ messages in thread
From: Mike Galbraith @ 2007-08-28  7:28 UTC (permalink / raw)
  To: Al Boldi
  Cc: Linus Torvalds, Ingo Molnar, Peter Zijlstra, Andrew Morton, linux-kernel

On Tue, 2007-08-28 at 08:23 +0300, Al Boldi wrote:
> Linus Torvalds wrote:
> > On Tue, 28 Aug 2007, Al Boldi wrote:
> > > No need for framebuffer.  All you need is X using the X.org vesa-driver.
> > > Then start gears like this:
> > >
> > >   # gears & gears & gears &
> > >
> > > Then lay them out side by side to see the periodic stallings for ~10sec.
> >
> > I don't think this is a good test.
> >
> > Why?
> >
> > If you're not using direct rendering, what you have is the X server doing
> > all the rendering, which in turn means that what you are testing is quite
> > possibly not so much about the *kernel* scheduling, but about *X-server*
> > scheduling!
> >
> > I'm sure the kernel scheduler has an impact, but what's more likely to be
> > going on is that you're seeing effects that are indirect, and not
> > necessarily at all even "good".
> >
> > For example, if the X server is the scheduling point, it's entirely
> > possible that it ends up showing effects that are more due to the queueing
> > of the X command stream than due to the scheduler - and that those
> > stalls are simply due to *that*.
> >
> > One thing to try is to run the X connection in synchronous mode, which
> > minimizes queueing issues. I don't know if gears has a flag to turn on
> > synchronous X messaging, though. Many X programs take the "[+-]sync" flag
> > to turn on synchronous mode, iirc.
> 
> I like your analysis, but how do you explain that these stalls vanish when 
> __update_curr is disabled?

When you disable __update_curr(), you're utterly destroying the
scheduler.  There may well be a scheduler connection, but disabling
__update_curr() doesn't tell you anything meaningful.  Basically, you're
letting all tasks run uninterrupted for just as long as they please
(which is why busy loops lock your box solid as a rock).  I'd suggest
gathering some sched_debug stats or something... shoot, _anything_ but
what you did :)
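
For reference, the busy loops in question are nothing fancier than this 
kind of sketch (an assumption - any CPU hog will do):

    /* trivial CPU hog; with __update_curr() disabled nothing ever charges
     * it for CPU time, which is why a few of these lock the box up */
    int main(void)
    {
            for (;;)
                    ;       /* burn CPU forever */
            return 0;
    }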

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  7:28                                     ` Mike Galbraith
@ 2007-08-28  7:36                                       ` Ingo Molnar
  0 siblings, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-28  7:36 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Al Boldi, Linus Torvalds, Peter Zijlstra, Andrew Morton, linux-kernel


* Mike Galbraith <efault@gmx.de> wrote:

> > I like your analysis, but how do you explain that these stalls 
> > vanish when __update_curr is disabled?
> 
> When you disable __update_curr(), you're utterly destroying the
> scheduler.  There may well be a scheduler connection, but disabling
> __update_curr() doesn't tell you anything meaningful.  Basically, you're
> letting all tasks run uninterrupted for just as long as they please
> (which is why busy loops lock your box solid as a rock).  I'd suggest
> gathering some sched_debug stats or something... [...]

the output of the following would be nice:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

captured while the gears are running.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  4:37                               ` Al Boldi
  2007-08-28  5:05                                 ` Linus Torvalds
@ 2007-08-28  7:43                                 ` Xavier Bestel
  2007-08-28  8:02                                   ` Ingo Molnar
  2007-08-29  4:18                                 ` Ingo Molnar
  2 siblings, 1 reply; 123+ messages in thread
From: Xavier Bestel @ 2007-08-28  7:43 UTC (permalink / raw)
  To: Al Boldi
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel

On Tue, 2007-08-28 at 07:37 +0300, Al Boldi wrote:
> start gears like this:
> 
>   # gears & gears & gears &
> 
> Then lay them out side by side to see the periodic stallings for
> ~10sec.

Are you sure they are stalled? What you may have is simply gears
running at a multiple of your screen refresh rate, so they only appear
stalled.

Plus, as Linus said, you're not really testing the kernel scheduler.
gears is a really bad benchmark; it should die.

	Xav



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  7:43                                 ` Xavier Bestel
@ 2007-08-28  8:02                                   ` Ingo Molnar
  2007-08-28 19:19                                     ` Willy Tarreau
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-28  8:02 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Al Boldi, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel


* Xavier Bestel <xavier.bestel@free.fr> wrote:

> Are you sure they are stalled ? What you may have is simple gears 
> running at a multiple of your screen refresh rate, so they only appear 
> stalled.
> 
> Plus, as said Linus, you're not really testing the kernel scheduler. 
> gears is really bad benchmark, it should die.

i like glxgears as long as it runs on _real_ 3D hardware, because there 
it has minimal interaction with X and so it's an excellent visual test 
of scheduling consistency. You can immediately see (literally) 
scheduling hiccups down to a millisecond range (!). In this sense, if 
done and interpreted carefully, glxgears gives more feedback than many 
audio tests. (audio latency problems are audible, but on most sound hw 
it takes quite a bit of latency to produce an xrun.) So basically 
glxgears is the "early warning system" that tells us about the potential 
for xruns earlier than an xrun would happen for real.

[ of course you can also run all the other tools to get numeric results,
  but glxgears is nice in that it gives immediate visual feedback. ]

but i agree that on a non-accelerated X setup glxgears is not really 
meaningful. It can have similar "spam the X server" effects as xperf.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  5:23                                   ` Al Boldi
  2007-08-28  7:28                                     ` Mike Galbraith
@ 2007-08-28 16:34                                     ` Linus Torvalds
  2007-08-28 16:44                                       ` Arjan van de Ven
  2007-08-28 16:45                                       ` Ingo Molnar
  1 sibling, 2 replies; 123+ messages in thread
From: Linus Torvalds @ 2007-08-28 16:34 UTC (permalink / raw)
  To: Al Boldi
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Andrew Morton, linux-kernel



On Tue, 28 Aug 2007, Al Boldi wrote:
> 
> I like your analysis, but how do you explain that these stalls vanish when 
> __update_curr is disabled?

It's entirely possible that what happens is that the X scheduling is just 
a slightly unstable system - which effectively would turn a small 
scheduling difference into a *huge* visible difference.

And the "small scheduling difference" might be as simple as "if the 
process slept for a while, we give it a bit more CPU time". And then you 
get into some unbalanced setup where the X scheduler makes it sleep even 
more, because it fills its buffers.

Or something. I can easily see two schedulers that are trying to 
*individually* be "fair", fighting it out in a way where the end result is 
not very good.

I do suspect it's probably a very interesting load, so I hope Ingo looks 
more at it, but I also suspect it's more than just the kernel scheduler.

		Linus

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28 16:34                                     ` Linus Torvalds
@ 2007-08-28 16:44                                       ` Arjan van de Ven
  2007-08-28 16:45                                       ` Ingo Molnar
  1 sibling, 0 replies; 123+ messages in thread
From: Arjan van de Ven @ 2007-08-28 16:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Boldi, Ingo Molnar, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, linux-kernel

On Tue, 28 Aug 2007 09:34:03 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Tue, 28 Aug 2007, Al Boldi wrote:
> > 
> > I like your analysis, but how do you explain that these stalls
> > vanish when __update_curr is disabled?
> 
> It's entirely possible that what happens is that the X scheduling is
> just a slightly unstable system - which effectively would turn a
> small scheduling difference into a *huge* visible difference.

one thing that happens if you remove __update_curr is the following
pattern (since no apps will get preempted involuntarily)

app 1 submits a full frame worth of 3D stuff to X 
app 1 then sleeps/waits for that to complete
X gets to run, has 1 full frame to render, does this
X now waits for more input
app 2 now gets to run and submits a full frame
app 2 then sleeps again
X gets to run again to process and complete
X goes to sleep
app 3 gets to run and submits a full frame
app 3 then sleeps
X runs
X sleeps
app 1 gets to submit a frame

etc etc

so without preemption happening, you can get "perfect" behavior, just
because everything is perfectly doing 1 thing at a time cooperatively.
once you start doing timeslices and enforcing limits on them, this
"perfect pattern" will break down (remember this is all software
rendering in the problem being described), and whatever you get won't
be as perfect as this.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28 16:34                                     ` Linus Torvalds
  2007-08-28 16:44                                       ` Arjan van de Ven
@ 2007-08-28 16:45                                       ` Ingo Molnar
  2007-08-29  4:19                                         ` Al Boldi
  1 sibling, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-28 16:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Boldi, Peter Zijlstra, Mike Galbraith, Andrew Morton, linux-kernel


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, 28 Aug 2007, Al Boldi wrote:
> > 
> > I like your analysis, but how do you explain that these stalls 
> > vanish when __update_curr is disabled?
> 
> It's entirely possible that what happens is that the X scheduling is 
> just a slightly unstable system - which effectively would turn a small 
> scheduling difference into a *huge* visible difference.

i think it's because disabling __update_curr() in essence removes the 
ability of the scheduler to preempt tasks - that hack results in 
a non-scheduler. Hence the gears + X pair of tasks becomes a synchronous 
pair of tasks in essence - and thus gears cannot "overload" X.

Normally gears + X is an asynchronous pair of tasks, with gears (or 
xperf, or devel versions of firefox, etc.) not being throttled at all 
and thus being able to overload/spam the X server with requests. (And we 
generally want to _reward_ asynchrony and want to allow tasks to 
overlap each other and we want each task to go as fast and as parallel 
as it can.)

Eventually X's built-in "bad, abusive client" throttling code kicks in, 
which, AFAIK, is pretty crude and might yield such artifacts. But ... 
it would be nice for an X person to confirm - and in any case i'll try 
Al's workload - i thought i had a reproducer but i barked up the wrong 
tree :-) My laptop doesn't run with the vesa driver, so i have no easy 
reproducer for now.

( also, it would be nice if Al could try rc4 plus my latest scheduler
  tree as well - just on the odd chance that something got fixed
  meanwhile. In particular Mike's sleeper-bonus-limit fix could be
  related. )

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  8:02                                   ` Ingo Molnar
@ 2007-08-28 19:19                                     ` Willy Tarreau
  2007-08-28 19:55                                       ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Willy Tarreau @ 2007-08-28 19:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Xavier Bestel, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel

On Tue, Aug 28, 2007 at 10:02:18AM +0200, Ingo Molnar wrote:
> 
> * Xavier Bestel <xavier.bestel@free.fr> wrote:
> 
> > Are you sure they are stalled ? What you may have is simple gears 
> > running at a multiple of your screen refresh rate, so they only appear 
> > stalled.
> > 
> > Plus, as said Linus, you're not really testing the kernel scheduler. 
> > gears is really bad benchmark, it should die.
> 
> i like glxgears as long as it runs on _real_ 3D hardware, because there 
> it has minimal interaction with X and so it's an excellent visual test 
> about consistency of scheduling. You can immediately see (literally) 
> scheduling hickups down to a millisecond range (!). In this sense, if 
> done and interpreted carefully, glxgears gives more feedback than many 
> audio tests. (audio latency problems are audible, but on most sound hw 
> it takes quite a bit of latency to produce an xrun.) So basically 
> glxgears is the "early warning system" that tells us about the potential 
> for xruns earlier than an xrun would happen for real.
> 
> [ of course you can also run all the other tools to get numeric results,
>   but glxgears is nice in that it gives immediate visual feedback. ]

Al could also test ocbench, which gives visual feedback without stressing
the X server: http://linux.1wt.eu/sched/

I packaged it exactly for this problem and it has already helped. It only
uses X after each loop, so if you run it with a large run time, X is hardly
solicited at all.

Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28 19:19                                     ` Willy Tarreau
@ 2007-08-28 19:55                                       ` Ingo Molnar
  0 siblings, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-28 19:55 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Xavier Bestel, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel


* Willy Tarreau <w@1wt.eu> wrote:

> On Tue, Aug 28, 2007 at 10:02:18AM +0200, Ingo Molnar wrote:
> > 
> > * Xavier Bestel <xavier.bestel@free.fr> wrote:
> > 
> > > Are you sure they are stalled ? What you may have is simple gears 
> > > running at a multiple of your screen refresh rate, so they only appear 
> > > stalled.
> > > 
> > > Plus, as said Linus, you're not really testing the kernel scheduler. 
> > > gears is really bad benchmark, it should die.
> > 
> > i like glxgears as long as it runs on _real_ 3D hardware, because there 
> > it has minimal interaction with X and so it's an excellent visual test 
> > about consistency of scheduling. You can immediately see (literally) 
> > scheduling hickups down to a millisecond range (!). In this sense, if 
> > done and interpreted carefully, glxgears gives more feedback than many 
> > audio tests. (audio latency problems are audible, but on most sound hw 
> > it takes quite a bit of latency to produce an xrun.) So basically 
> > glxgears is the "early warning system" that tells us about the potential 
> > for xruns earlier than an xrun would happen for real.
> > 
> > [ of course you can also run all the other tools to get numeric results,
> >   but glxgears is nice in that it gives immediate visual feedback. ]
> 
> Al could also test ocbench, which brings visual feedback without 
> harnessing the X server : http://linux.1wt.eu/sched/
> 
> I packaged it exactly for this problem and it has already helped. It 
> uses X after each loop, so if you run it with large run time, X is 
> nearly not sollicitated.

yeah, and ocbench is one of my favorite cross-task-fairness tests - i 
don't release a CFS patch without checking it with ocbench first :-)

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  5:05                                 ` Linus Torvalds
  2007-08-28  5:23                                   ` Al Boldi
@ 2007-08-28 20:46                                   ` Valdis.Kletnieks
  1 sibling, 0 replies; 123+ messages in thread
From: Valdis.Kletnieks @ 2007-08-28 20:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Boldi, Ingo Molnar, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

On Mon, 27 Aug 2007 22:05:37 PDT, Linus Torvalds said:
> 
> 
> On Tue, 28 Aug 2007, Al Boldi wrote:
> > 
> > No need for framebuffer.  All you need is X using the X.org vesa-driver.  
> > Then start gears like this:
> > 
> >   # gears & gears & gears &
> > 
> > Then lay them out side by side to see the periodic stallings for ~10sec.
> 
> I don't think this is a good test.
> 
> Why?
> 
> If you're not using direct rendering, what you have is the X server doing 
> all the rendering, which in turn means that what you are testing is quite 
> possibly not so much about the *kernel* scheduling, but about *X-server* 
> scheduling!

I wonder - can people who are doing this as a test please specify whether
they're using an older X that has the libX11 or the newer libxcb code? That
may have a similar impact as well.

(libxcb is pretty new - it landed in Fedora Rawhide just about a month ago,
after Fedora 7 shipped.  Not sure what other distros have it now...)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-25 22:27               ` Al Boldi
  2007-08-25 23:15                 ` Ingo Molnar
@ 2007-08-29  3:37                 ` Bill Davidsen
  2007-08-29  3:45                   ` Ingo Molnar
  1 sibling, 1 reply; 123+ messages in thread
From: Bill Davidsen @ 2007-08-29  3:37 UTC (permalink / raw)
  To: Al Boldi
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel

Al Boldi wrote:
> Ingo Molnar wrote:
>> * Al Boldi <a1426z@gawab.com> wrote:
>>>>> The problem is that consecutive runs don't give consistent results
>>>>> and sometimes stalls.  You may want to try that.
>>>> well, there's a natural saturation point after a few hundred tasks
>>>> (depending on your CPU's speed), at which point there's no idle time
>>>> left. From that point on things get slower progressively (and the
>>>> ability of the shell to start new ping tasks is impacted as well),
>>>> but that's expected on an overloaded system, isnt it?
>>> Of course, things should get slower with higher load, but it should be
>>> consistent without stalls.
>>>
>>> To see this problem, make sure you boot into /bin/sh with the normal
>>> VGA console (ie. not fb-console).  Then try each loop a few times to
>>> show different behaviour; loops like:
>>>
>>> # for ((i=0; i<3333; i++)); do ping 10.1 -A > /dev/null & done
>>>
>>> # for ((i=0; i<3333; i++)); do nice -99 ping 10.1 -A > /dev/null & done
>>>
>>> # { for ((i=0; i<3333; i++)); do
>>>         ping 10.1 -A > /dev/null &
>>>     done } > /dev/null 2>&1
>>>
>>> Especially the last one sometimes causes a complete console lock-up,
>>> while the other two sometimes stall then surge periodically.
>> ok. I think i might finally have found the bug causing this. Could you
>> try the fix below, does your webserver thread-startup test work any
>> better?
> 
> It seems to help somewhat, but the problem is still visible.  Even v20.3 on 
> 2.6.22.5 didn't help.
> 
> It does look related to ia-boosting, so I turned off __update_curr like Roman 
> mentioned, which had an enormous smoothing effect, but then nice levels 
> completely break down and lockup the system.
> 
> There is another way to show the problem visually under X (vesa-driver), by 
> starting 3 gears simultaneously, which after laying them out side-by-side 
> need some settling time before smoothing out.  Without __update_curr it's 
> absolutely smooth from the start.

I posted a LOT of stuff using the glitch1 script, and finally found a 
set of tuning values which make the test script run smoothly. See back 
posts, I don't have them here.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-25 23:15                 ` Ingo Molnar
  2007-08-26 16:27                   ` Al Boldi
@ 2007-08-29  3:42                   ` Bill Davidsen
  1 sibling, 0 replies; 123+ messages in thread
From: Bill Davidsen @ 2007-08-29  3:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Al Boldi, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> 
>>> ok. I think i might finally have found the bug causing this. Could 
>>> you try the fix below, does your webserver thread-startup test work 
>>> any better?
>> It seems to help somewhat, but the problem is still visible.  Even 
>> v20.3 on 2.6.22.5 didn't help.
>>
>> It does look related to ia-boosting, so I turned off __update_curr 
>> like Roman mentioned, which had an enormous smoothing effect, but then 
>> nice levels completely break down and lockup the system.
> 
> you can turn sleeper-fairness off via:
> 
>    echo 28 > /proc/sys/kernel/sched_features
> 
> another thing to try would be:
> 
>    echo 12 > /proc/sys/kernel/sched_features

14, and drop the granularity to 500000.
> 
> (that's the new-task penalty turned off.)
> 
> Another thing to try would be to edit this:
> 
>         if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
>                 p->se.wait_runtime = -(sched_granularity(cfs_rq) / 2);
> 
> to:
> 
>         if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
>                 p->se.wait_runtime = -sched_granularity(cfs_rq);
> 
> and could you also check 20.4 on 2.6.22.5 perhaps, or very latest -git? 
> (Peter has experienced smaller spikes with that.)
> 
> 	Ingo


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  3:37                 ` Bill Davidsen
@ 2007-08-29  3:45                   ` Ingo Molnar
  2007-08-29 13:11                     ` Bill Davidsen
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-29  3:45 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Al Boldi, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel


* Bill Davidsen <davidsen@tmr.com> wrote:

> > There is another way to show the problem visually under X 
> > (vesa-driver), by starting 3 gears simultaneously, which after 
> > laying them out side-by-side need some settling time before 
> > smoothing out.  Without __update_curr it's absolutely smooth from 
> > the start.
> 
> I posted a LOT of stuff using the glitch1 script, and finally found a 
> set of tuning values which make the test script run smooth. See back 
> posts, I don't have them here.

but you have real 3D hw and DRI enabled, correct? In that case X uses up 
almost no CPU time and glxgears does most of the processing. That is 
quite different from the above software-rendering case, where X spends 
most of the CPU time.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28  4:37                               ` Al Boldi
  2007-08-28  5:05                                 ` Linus Torvalds
  2007-08-28  7:43                                 ` Xavier Bestel
@ 2007-08-29  4:18                                 ` Ingo Molnar
  2007-08-29  4:29                                   ` Keith Packard
  2007-08-29  4:40                                   ` Mike Galbraith
  2 siblings, 2 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-29  4:18 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel, Keith Packard


* Al Boldi <a1426z@gawab.com> wrote:

> No need for framebuffer.  All you need is X using the X.org 
> vesa-driver.  Then start gears like this:
> 
>   # gears & gears & gears &
> 
> Then lay them out side by side to see the periodic stallings for 
> ~10sec.

i just tried something similar (by adding Option "NoDRI" to xorg.conf) 
and i'm wondering how it can be smooth on vesa-driver at all. I tested 
it on a Core2Duo box and software rendering manages to do about 3 frames 
per second. (although glxgears itself thinks it does ~600 fps) If i 
start 3 glxgears then they do ~1 frame per second each. This is on 
Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and 
xorg-x11-drv-i810-2.0.0-4.fc7.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-28 16:45                                       ` Ingo Molnar
@ 2007-08-29  4:19                                         ` Al Boldi
  2007-08-29  4:53                                           ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-29  4:19 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, linux-kernel

Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Tue, 28 Aug 2007, Al Boldi wrote:
> > > I like your analysis, but how do you explain that these stalls
> > > vanish when __update_curr is disabled?
> >
> > It's entirely possible that what happens is that the X scheduling is
> > just a slightly unstable system - which effectively would turn a small
> > scheduling difference into a *huge* visible difference.
>
> i think it's because disabling __update_curr() in essence removes the
> ability of scheduler to preempt tasks - that hack in essence results in
> a non-scheduler. Hence the gears + X pair of tasks becomes a synchronous
> pair of tasks in essence - and thus gears cannot "overload" X.

I have narrowed it down a bit to add_wait_runtime.

Patch 2.6.22.5-v20.4 like this:

346-     * the two values are equal)
347-     * [Note: delta_mine - delta_exec is negative]:
348-     */
349://  add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec);
350-}
351-
352-static void update_curr(struct cfs_rq *cfs_rq)

When disabling add_wait_runtime the stalls are gone.  With this change the 
scheduler is still usable, but it does not constitute a fix.

Now, even with this hack, uneven nice-levels between X and gears cause a 
return of the stalls, so make sure both X and gears run at the same 
nice-level when testing.

Again, the whole point of this workload is to expose scheduler glitches 
regardless of whether X is broken or not, and my hunch is that this problem 
looks suspiciously like an ia-boosting bug.  What's important to note is 
that by adjusting the scheduler we can effect a correction in behaviour, 
which suggests this problem should be fixable.

It's probably a good idea to look further into add_wait_runtime.
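
As a rough conceptual sketch - emphatically not the actual 2.6.22.5-v20.4 
code - of the role that call plays: wait_runtime is the per-task fairness 
balance, and charging the running task is what eventually makes another 
task eligible to preempt it:

    #include <stdio.h>

    /* toy illustration only -- not the kernel's implementation */
    struct toy_entity {
            long wait_runtime;      /* CPU time this task is still owed (ns) */
    };

    /* the running task is charged (delta < 0) for CPU time used beyond
     * its fair share; waiting tasks are credited (delta > 0) */
    static void toy_add_wait_runtime(struct toy_entity *se, long delta)
    {
            se->wait_runtime += delta;
    }

    /* a waiter becomes eligible to preempt once it is owed more time
     * than the running task; without the charge this never triggers */
    static int toy_should_preempt(const struct toy_entity *waiting,
                                  const struct toy_entity *curr)
    {
            return waiting->wait_runtime > curr->wait_runtime;
    }

    int main(void)
    {
            struct toy_entity x = { 0 }, gears = { 0 };

            toy_add_wait_runtime(&x, -2000000);     /* X overran by 2ms  */
            toy_add_wait_runtime(&gears, 2000000);  /* gears waited 2ms  */
            printf("preempt X? %d\n", toy_should_preempt(&gears, &x));
            return 0;
    }

With that charge removed, the cross-task bookkeeping that normally forces 
a reschedule is weakened - consistent with the smoothing effect described 
above.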


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  4:18                                 ` Ingo Molnar
@ 2007-08-29  4:29                                   ` Keith Packard
  2007-08-29  4:46                                     ` Ingo Molnar
  2007-08-29  4:40                                   ` Mike Galbraith
  1 sibling, 1 reply; 123+ messages in thread
From: Keith Packard @ 2007-08-29  4:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: keith.packard, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]

On Wed, 2007-08-29 at 06:18 +0200, Ingo Molnar wrote:

> > Then lay them out side by side to see the periodic stallings for 
> > ~10sec.

The X scheduling code isn't really designed to handle software GL well;
the requests can be very expensive to execute, and yet are specified as
atomic operations (sigh).

> i just tried something similar (by adding Option "NoDRI" to xorg.conf) 
> and i'm wondering how it can be smooth on vesa-driver at all. I tested 
> it on a Core2Duo box and software rendering manages to do about 3 frames 
> per second. (although glxgears itself thinks it does ~600 fps) If i 
> start 3 glxgears then they do ~1 frame per second each. This is on 
> Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and 
> xorg-x11-drv-i810-2.0.0-4.fc7.

Are you attempting to measure the visible updates by eye? Or are you
using some other metric?

In any case, attempting to measure anything using glxgears is a bad
idea; it's not representative of *any* real applications. And then using
software GL on top of that...

What was the question again?

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  4:18                                 ` Ingo Molnar
  2007-08-29  4:29                                   ` Keith Packard
@ 2007-08-29  4:40                                   ` Mike Galbraith
  1 sibling, 0 replies; 123+ messages in thread
From: Mike Galbraith @ 2007-08-29  4:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Al Boldi, Peter Zijlstra, Andrew Morton, Linus Torvalds,
	linux-kernel, Keith Packard

On Wed, 2007-08-29 at 06:18 +0200, Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> 
> > No need for framebuffer.  All you need is X using the X.org 
> > vesa-driver.  Then start gears like this:
> > 
> >   # gears & gears & gears &
> > 
> > Then lay them out side by side to see the periodic stallings for 
> > ~10sec.
> 
> i just tried something similar (by adding Option "NoDRI" to xorg.conf) 
> and i'm wondering how it can be smooth on vesa-driver at all. I tested 
> it on a Core2Duo box and software rendering manages to do about 3 frames 
> per second. (although glxgears itself thinks it does ~600 fps) If i 
> start 3 glxgears then they do ~1 frame per second each. This is on 
> Fedora 7 with xorg-x11-server-Xorg-1.3.0.0-9.fc7 and 
> xorg-x11-drv-i810-2.0.0-4.fc7.

At least you can run the darn test... the third instance of glxgears
here means saying bye-bye to the GUI instantly.

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  4:29                                   ` Keith Packard
@ 2007-08-29  4:46                                     ` Ingo Molnar
  2007-08-29  7:57                                       ` Keith Packard
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-29  4:46 UTC (permalink / raw)
  To: Keith Packard
  Cc: Al Boldi, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel


* Keith Packard <keith.packard@intel.com> wrote:

> > > Then lay them out side by side to see the periodic stallings for 
> > > ~10sec.
> 
> The X scheduling code isn't really designed to handle software GL 
> well; the requests can be very expensive to execute, and yet are 
> specified as atomic operations (sigh).
[...]

> Are you attempting to measure the visible updates by eye? Or are you 
> using some other metric?
> 
> In any case, attempting to measure anything using glxgears is a bad 
> idea; it's not representative of *any* real applications. And then 
> using software GL on top of that...
> 
> What was the question again?

ok, i finally managed to reproduce the "artifact" myself on an older 
box. It goes like this: start up X with the vesa driver (or with NoDRI) 
to force software rendering. Then start up a couple of glxgears 
instances. Those glxgears instances update in a very "chunky", 
"stuttering" way - each glxgears instance runs/stops/runs/stops at a 
rate of about once per second, and this was reported to me as a 
potential CPU scheduler regression.

at a quick glance this is not a CPU scheduler thing: X uses up 99% of 
CPU time, all the glxgears tasks (i needed 8 parallel instances to see 
the stallings) are using up the remaining 1% of CPU time. The ordering 
of the requests from the glxgears tasks is X's choice - and for a 
pathological overload situation like this we cannot blame X at all for 
not producing a completely smooth output. (although Xorg could perhaps 
try to schedule such requests more smoothly, in a more fine-grained way?)

i've attached below a timestamped strace of one of the glxgears 
instances that shows such a 'stall':

1188356998.440155 select(4, [3], [3], NULL, NULL) = 1 (out [3]) <1.173680>

the select (waiting for X) took 1.17 seconds.

	Ingo

-------------------->
Process 3644 attached - interrupt to quit
1188356997.810351 select(4, [3], [3], NULL, NULL) = 1 (out [3]) <0.594074>
1188356998.404580 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000115>
1188356998.404769 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.404880 gettimeofday({1188356998, 404893}, {4294967176, 0}) = 0 <0.000006>
1188356998.404923 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356998.405054 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.405116 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000073>
1188356998.405221 gettimeofday({1188356998, 405231}, {4294967176, 0}) = 0 <0.000006>
1188356998.405258 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.405394 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.405461 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.405582 gettimeofday({1188356998, 405593}, {4294967176, 0}) = 0 <0.000006>
1188356998.405620 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.405656 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000108>
1188356998.405818 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.405856 gettimeofday({1188356998, 405866}, {4294967176, 0}) = 0 <0.000006>
1188356998.405993 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356998.406032 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.406092 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000107>
1188356998.406232 gettimeofday({1188356998, 406242}, {4294967176, 0}) = 0 <0.000006>
1188356998.406269 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356998.406305 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000065>
1188356998.406423 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000118>
1188356998.406573 gettimeofday({1188356998, 406583}, {4294967176, 0}) = 0 <0.000006>
1188356998.406610 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356998.406646 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000125>
1188356998.406824 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.406863 gettimeofday({1188356998, 406873}, {4294967176, 0}) = 0 <0.000007>
1188356998.406900 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.407069 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.407131 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.407169 gettimeofday({1188356998, 407179}, {4294967176, 0}) = 0 <0.000007>
1188356998.407206 ioctl(3, FIONREAD, [0]) = 0 <0.000139>
1188356998.407376 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.407440 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.407478 gettimeofday({1188356998, 407487}, {4294967176, 0}) = 0 <0.000006>
1188356998.407649 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.407687 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.407747 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.407785 gettimeofday({1188356998, 407795}, {4294967176, 0}) = 0 <0.000140>
1188356998.407957 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356998.407993 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.408052 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.408224 gettimeofday({1188356998, 408236}, {4294967176, 0}) = 0 <0.000007>
1188356998.408263 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356998.408322 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000113>
1188356998.408490 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.408528 gettimeofday({1188356998, 408537}, {4294967176, 0}) = 0 <0.000006>
1188356998.408565 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356998.408735 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.408796 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.408834 gettimeofday({1188356998, 408843}, {4294967176, 0}) = 0 <0.000005>
1188356998.408870 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.408905 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000174>
1188356998.409132 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.409170 gettimeofday({1188356998, 409178}, {4294967176, 0}) = 0 <0.000005>
1188356998.409205 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.409240 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.409469 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.409508 gettimeofday({1188356998, 409517}, {4294967176, 0}) = 0 <0.000005>
1188356998.409544 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.409579 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.409638 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.409675 gettimeofday({1188356998, 409684}, {4294967176, 0}) = 0 <0.000005>
1188356998.409711 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.409747 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.409805 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000308>
1188356998.410145 gettimeofday({1188356998, 410154}, {4294967176, 0}) = 0 <0.000006>
1188356998.410181 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.410215 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.410274 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.410311 gettimeofday({1188356998, 410320}, {4294967176, 0}) = 0 <0.000006>
1188356998.410347 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.410381 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000260>
1188356998.410697 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.410735 gettimeofday({1188356998, 410743}, {4294967176, 0}) = 0 <0.000006>
1188356998.410771 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.410805 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.410864 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.410902 gettimeofday({1188356998, 410910}, {4294967176, 0}) = 0 <0.000005>
1188356998.410937 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.410972 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.411030 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.411068 gettimeofday({1188356998, 411077}, {4294967176, 0}) = 0 <0.000005>
1188356998.411104 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.411139 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.411198 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.411235 gettimeofday({1188356998, 411244}, {4294967176, 0}) = 0 <0.000005>
1188356998.411271 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.411306 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.411377 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000010>
1188356998.412017 gettimeofday({1188356998, 412027}, {4294967176, 0}) = 0 <0.000006>
1188356998.412055 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.412089 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.412148 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.412185 gettimeofday({1188356998, 412194}, {4294967176, 0}) = 0 <0.000006>
1188356998.412221 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.412255 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.412313 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.412350 gettimeofday({1188356998, 412359}, {4294967176, 0}) = 0 <0.000006>
1188356998.412385 ioctl(3, FIONREAD, [0]) = 0 <0.000009>
1188356998.412776 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.412837 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.412874 gettimeofday({1188356998, 412883}, {4294967176, 0}) = 0 <0.000006>
1188356998.412910 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.412944 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.413003 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.413040 gettimeofday({1188356998, 413049}, {4294967176, 0}) = 0 <0.000006>
1188356998.413076 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.413110 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.413169 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.413206 gettimeofday({1188356998, 413214}, {4294967176, 0}) = 0 <0.000006>
1188356998.413241 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.413276 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.413334 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.413371 gettimeofday({1188356998, 413380}, {4294967176, 0}) = 0 <0.000005>
1188356998.413924 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.413961 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.414020 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.414057 gettimeofday({1188356998, 414066}, {4294967176, 0}) = 0 <0.000006>
1188356998.414093 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.414127 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.414186 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.414223 gettimeofday({1188356998, 414232}, {4294967176, 0}) = 0 <0.000006>
1188356998.414259 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.414293 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.414351 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000006>
1188356998.414388 gettimeofday({1188356998, 414397}, {4294967176, 0}) = 0 <0.000082>
1188356998.414503 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.414538 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.414601 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.414638 gettimeofday({1188356998, 414647}, {4294967176, 0}) = 0 <0.000005>
1188356998.414674 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.414709 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.414768 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.414818 gettimeofday({1188356998, 414827}, {4294967176, 0}) = 0 <0.000006>
1188356998.414854 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.414889 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.414948 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.414986 gettimeofday({1188356998, 414995}, {4294967176, 0}) = 0 <0.000005>
1188356998.415022 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.415057 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000010>
1188356998.415118 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.415156 gettimeofday({1188356998, 415165}, {4294967176, 0}) = 0 <0.000005>
1188356998.415192 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.415227 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.415287 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.415325 gettimeofday({1188356998, 415334}, {4294967176, 0}) = 0 <0.000006>
1188356998.415362 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.415397 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.001116>
1188356998.416602 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356998.416707 gettimeofday({1188356998, 416717}, {4294967176, 0}) = 0 <0.000038>
1188356998.416778 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.416813 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.416872 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.416911 gettimeofday({1188356998, 416919}, {4294967176, 0}) = 0 <0.000006>
1188356998.416946 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.416981 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.417039 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.417076 gettimeofday({1188356998, 417085}, {4294967176, 0}) = 0 <0.000006>
1188356998.417111 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.417146 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.417204 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.417241 gettimeofday({1188356998, 417250}, {4294967176, 0}) = 0 <0.000006>
1188356998.417277 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.417311 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.417370 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.417953 gettimeofday({1188356998, 417963}, {4294967176, 0}) = 0 <0.000037>
1188356998.418056 ioctl(3, FIONREAD, [0]) = 0 <0.000037>
1188356998.418158 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.418218 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.418255 gettimeofday({1188356998, 418263}, {4294967176, 0}) = 0 <0.000006>
1188356998.418290 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.418325 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.418384 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000206>
1188356998.418656 gettimeofday({1188356998, 418666}, {4294967176, 0}) = 0 <0.000037>
1188356998.418759 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356998.418829 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.418887 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.418924 gettimeofday({1188356998, 418945}, {4294967176, 0}) = 0 <0.000006>
1188356998.418972 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.419007 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.419065 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.419103 gettimeofday({1188356998, 419111}, {4294967176, 0}) = 0 <0.000006>
1188356998.419138 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.419172 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.419231 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.419268 gettimeofday({1188356998, 419276}, {4294967176, 0}) = 0 <0.000005>
1188356998.419303 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.419338 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.419396 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000481>
1188356998.419944 gettimeofday({1188356998, 419954}, {4294967176, 0}) = 0 <0.000038>
1188356998.420048 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356998.420118 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.420176 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.420213 gettimeofday({1188356998, 420222}, {4294967176, 0}) = 0 <0.000005>
1188356998.420249 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.420283 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.420342 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.420379 gettimeofday({1188356998, 420387}, {4294967176, 0}) = 0 <0.000006>
1188356998.420682 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356998.420786 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356998.420880 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.420917 gettimeofday({1188356998, 420926}, {4294967176, 0}) = 0 <0.000006>
1188356998.420952 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.420987 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.421046 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.421083 gettimeofday({1188356998, 421091}, {4294967176, 0}) = 0 <0.000006>
1188356998.421118 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.421152 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.421212 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.421249 gettimeofday({1188356998, 421258}, {4294967176, 0}) = 0 <0.000005>
1188356998.421285 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.421319 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.421378 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000009>
1188356998.421881 gettimeofday({1188356998, 421891}, {4294967176, 0}) = 0 <0.000038>
1188356998.421984 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356998.422054 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.422112 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.422149 gettimeofday({1188356998, 422157}, {4294967176, 0}) = 0 <0.000005>
1188356998.422184 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.422218 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.422277 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.422314 gettimeofday({1188356998, 422323}, {4294967176, 0}) = 0 <0.000006>
1188356998.422350 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.422395 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000320>
1188356998.422804 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000039>
1188356998.422909 gettimeofday({1188356998, 422919}, {4294967176, 0}) = 0 <0.000005>
1188356998.422946 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.422981 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.423039 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.423077 gettimeofday({1188356998, 423085}, {4294967176, 0}) = 0 <0.000006>
1188356998.423112 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.423146 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.423204 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.423241 gettimeofday({1188356998, 423250}, {4294967176, 0}) = 0 <0.000006>
1188356998.423277 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.423312 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.423370 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.423831 gettimeofday({1188356998, 423841}, {4294967176, 0}) = 0 <0.000038>
1188356998.423936 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356998.424040 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.424100 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.424137 gettimeofday({1188356998, 424146}, {4294967176, 0}) = 0 <0.000005>
1188356998.424173 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.424207 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.424265 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.424302 gettimeofday({1188356998, 424310}, {4294967176, 0}) = 0 <0.000006>
1188356998.424339 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.424374 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000311>
1188356998.424775 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356998.424880 gettimeofday({1188356998, 424891}, {4294967176, 0}) = 0 <0.000006>
1188356998.424918 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.424952 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.425011 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.425048 gettimeofday({1188356998, 425056}, {4294967176, 0}) = 0 <0.000005>
1188356998.425083 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.425117 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.425175 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.425212 gettimeofday({1188356998, 425221}, {4294967176, 0}) = 0 <0.000005>
1188356998.425248 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.425282 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.425341 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.425379 gettimeofday({1188356998, 425388}, {4294967176, 0}) = 0 <0.000005>
1188356998.425930 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356998.426036 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.426096 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.426134 gettimeofday({1188356998, 426142}, {4294967176, 0}) = 0 <0.000006>
1188356998.426169 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.426203 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.426274 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.426312 gettimeofday({1188356998, 426320}, {4294967176, 0}) = 0 <0.000006>
1188356998.426347 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.426382 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000304>
1188356998.426775 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000041>
1188356998.426882 gettimeofday({1188356998, 426892}, {4294967176, 0}) = 0 <0.000006>
1188356998.426919 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.426954 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.427013 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.427050 gettimeofday({1188356998, 427059}, {4294967176, 0}) = 0 <0.000006>
1188356998.427086 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.427120 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.427179 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.427216 gettimeofday({1188356998, 427225}, {4294967176, 0}) = 0 <0.000006>
1188356998.427252 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.427286 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.427344 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.427381 gettimeofday({1188356998, 427390}, {4294967176, 0}) = 0 <0.000006>
1188356998.427867 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356998.427972 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356998.428067 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.428104 gettimeofday({1188356998, 428113}, {4294967176, 0}) = 0 <0.000005>
1188356998.428140 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.428175 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.428233 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.428270 gettimeofday({1188356998, 428279}, {4294967176, 0}) = 0 <0.000006>
1188356998.428305 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.428340 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.428401 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000322>
1188356998.428788 gettimeofday({1188356998, 428798}, {4294967176, 0}) = 0 <0.000039>
1188356998.428894 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.428930 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.428989 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.429027 gettimeofday({1188356998, 429035}, {4294967176, 0}) = 0 <0.000006>
1188356998.429062 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.429096 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.429155 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.429192 gettimeofday({1188356998, 429201}, {4294967176, 0}) = 0 <0.000005>
1188356998.429228 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.429262 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.429321 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000006>
1188356998.429357 gettimeofday({1188356998, 429366}, {4294967176, 0}) = 0 <0.000005>
1188356998.429393 ioctl(3, FIONREAD, [0]) = 0 <0.000436>
1188356998.429896 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000040>
1188356998.430023 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.430074 gettimeofday({1188356998, 430083}, {4294967176, 0}) = 0 <0.000005>
1188356998.430110 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.430145 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.430204 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.430242 gettimeofday({1188356998, 430250}, {4294967176, 0}) = 0 <0.000006>
1188356998.430277 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.430311 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.430370 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.430723 gettimeofday({1188356998, 430733}, {4294967176, 0}) = 0 <0.000039>
1188356998.430828 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356998.430899 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.430958 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.430995 gettimeofday({1188356998, 431004}, {4294967176, 0}) = 0 <0.000005>
1188356998.431031 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.431065 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.431124 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.431161 gettimeofday({1188356998, 431169}, {4294967176, 0}) = 0 <0.000006>
1188356998.431196 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.431230 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.431289 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.431326 gettimeofday({1188356998, 431334}, {4294967176, 0}) = 0 <0.000006>
1188356998.431361 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.431395 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000456>
1188356998.431940 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000039>
1188356998.432046 gettimeofday({1188356998, 432056}, {4294967176, 0}) = 0 <0.000005>
1188356998.432083 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.432117 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.432176 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.432213 gettimeofday({1188356998, 432222}, {4294967176, 0}) = 0 <0.000005>
1188356998.432249 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.432283 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.432342 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.432379 gettimeofday({1188356998, 432387}, {4294967176, 0}) = 0 <0.000006>
1188356998.432751 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356998.432855 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.432915 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.432952 gettimeofday({1188356998, 432961}, {4294967176, 0}) = 0 <0.000005>
1188356998.432988 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.433022 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.433081 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.433118 gettimeofday({1188356998, 433127}, {4294967176, 0}) = 0 <0.000006>
1188356998.433154 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.433188 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.433247 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.433283 gettimeofday({1188356998, 433292}, {4294967176, 0}) = 0 <0.000005>
1188356998.433330 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.433365 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000441>
1188356998.433896 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356998.434002 gettimeofday({1188356998, 434012}, {4294967176, 0}) = 0 <0.000038>
1188356998.434073 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.434108 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.434167 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.434204 gettimeofday({1188356998, 434213}, {4294967176, 0}) = 0 <0.000005>
1188356998.434240 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.434274 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.434332 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356998.434370 gettimeofday({1188356998, 434379}, {4294967176, 0}) = 0 <0.000006>
1188356998.434408 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.434443 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.434503 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.434540 gettimeofday({1188356998, 434549}, {4294967176, 0}) = 0 <0.000006>
1188356998.434575 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.434610 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.434668 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.434705 gettimeofday({1188356998, 434714}, {4294967176, 0}) = 0 <0.000005>
1188356998.434741 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.434775 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.434835 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.434871 gettimeofday({1188356998, 434880}, {4294967176, 0}) = 0 <0.000005>
1188356998.434907 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.434941 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.434999 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.435036 gettimeofday({1188356998, 435045}, {4294967176, 0}) = 0 <0.000006>
1188356998.435071 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.435106 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.435164 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.435201 gettimeofday({1188356998, 435209}, {4294967176, 0}) = 0 <0.000006>
1188356998.435237 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.435271 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.435329 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.435365 gettimeofday({1188356998, 435374}, {4294967176, 0}) = 0 <0.000006>
1188356998.435403 ioctl(3, FIONREAD, [0]) = 0 <0.001120>
1188356998.436587 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356998.436715 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000039>
1188356998.436821 gettimeofday({1188356998, 436831}, {4294967176, 0}) = 0 <0.000005>
1188356998.436857 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.436892 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.436951 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.436988 gettimeofday({1188356998, 436996}, {4294967176, 0}) = 0 <0.000006>
1188356998.437023 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.437069 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.437129 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.437166 gettimeofday({1188356998, 437175}, {4294967176, 0}) = 0 <0.000006>
1188356998.437202 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.437237 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.437295 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.437332 gettimeofday({1188356998, 437341}, {4294967176, 0}) = 0 <0.000005>
1188356998.437368 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.437404 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000498>
1188356998.437988 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356998.438094 gettimeofday({1188356998, 438104}, {4294967176, 0}) = 0 <0.000038>
1188356998.438198 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.438234 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.438293 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.438330 gettimeofday({1188356998, 438339}, {4294967176, 0}) = 0 <0.000006>
1188356998.438365 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.438403 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000206>
1188356998.438694 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356998.438800 gettimeofday({1188356998, 438810}, {4294967176, 0}) = 0 <0.000038>
1188356998.438871 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.438906 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356998.438966 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.439003 gettimeofday({1188356998, 439012}, {4294967176, 0}) = 0 <0.000005>
1188356998.439039 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.439073 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356998.439132 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.439169 gettimeofday({1188356998, 439178}, {4294967176, 0}) = 0 <0.000005>
1188356998.439205 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.439239 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.439297 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.439334 gettimeofday({1188356998, 439343}, {4294967176, 0}) = 0 <0.000006>
1188356998.439370 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.439406 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.439465 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.439503 gettimeofday({1188356998, 439511}, {4294967176, 0}) = 0 <0.000006>
1188356998.439538 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.439573 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.439631 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.439669 gettimeofday({1188356998, 439677}, {4294967176, 0}) = 0 <0.000005>
1188356998.439704 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.439738 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.439797 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.439834 gettimeofday({1188356998, 439842}, {4294967176, 0}) = 0 <0.000006>
1188356998.439869 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356998.439904 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356998.439974 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356998.440011 gettimeofday({1188356998, 440020}, {4294967176, 0}) = 0 <0.000006>
1188356998.440047 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356998.440081 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = -1 EAGAIN (Resource temporarily unavailable) <0.000007>
1188356998.440155 select(4, [3], [3], NULL, NULL) = 1 (out [3]) <1.173680>
1188356999.614013 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000047>
1188356999.614125 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000059>
1188356999.614218 gettimeofday({1188356999, 614228}, {4294967176, 0}) = 0 <0.000006>
1188356999.614326 ioctl(3, FIONREAD, [0]) = 0 <0.000008>
1188356999.614371 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000075>
1188356999.614505 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356999.614611 gettimeofday({1188356999, 614622}, {4294967176, 0}) = 0 <0.000006>
1188356999.614650 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.614770 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.614831 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000074>
1188356999.614938 gettimeofday({1188356999, 614948}, {4294967176, 0}) = 0 <0.000007>
1188356999.614975 ioctl(3, FIONREAD, [0]) = 0 <0.000088>
1188356999.615095 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.615155 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000090>
1188356999.615278 gettimeofday({1188356999, 615287}, {4294967176, 0}) = 0 <0.000006>
1188356999.615315 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356999.615407 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.615565 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356999.615606 gettimeofday({1188356999, 615615}, {4294967176, 0}) = 0 <0.000006>
1188356999.615643 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356999.615798 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.615859 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.615897 gettimeofday({1188356999, 615907}, {4294967176, 0}) = 0 <0.000006>
1188356999.616053 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356999.616090 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.616150 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000109>
1188356999.616292 gettimeofday({1188356999, 616301}, {4294967176, 0}) = 0 <0.000006>
1188356999.616329 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356999.616365 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000126>
1188356999.616548 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356999.616587 gettimeofday({1188356999, 616596}, {4294967176, 0}) = 0 <0.000006>
1188356999.616624 ioctl(3, FIONREAD, [0]) = 0 <0.000124>
1188356999.616779 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356999.616840 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.616878 gettimeofday({1188356999, 616888}, {4294967176, 0}) = 0 <0.000122>
1188356999.617034 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356999.617070 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.617130 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000125>
1188356999.617288 gettimeofday({1188356999, 617297}, {4294967176, 0}) = 0 <0.000006>
1188356999.617348 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356999.617385 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000129>
1188356999.617571 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.617610 gettimeofday({1188356999, 617619}, {4294967176, 0}) = 0 <0.000005>
1188356999.617748 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.617785 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.617844 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.617881 gettimeofday({1188356999, 617890}, {4294967176, 0}) = 0 <0.000005>
1188356999.617917 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.617952 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.618011 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.618305 gettimeofday({1188356999, 618315}, {4294967176, 0}) = 0 <0.000006>
1188356999.618343 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.618378 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000110>
1188356999.618543 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.618581 gettimeofday({1188356999, 618590}, {4294967176, 0}) = 0 <0.000005>
1188356999.618617 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.618652 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.618710 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.618748 gettimeofday({1188356999, 618756}, {4294967176, 0}) = 0 <0.000006>
1188356999.618783 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.618818 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000010>
1188356999.618878 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.618916 gettimeofday({1188356999, 618925}, {4294967176, 0}) = 0 <0.000005>
1188356999.618952 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.618987 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.619046 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.619083 gettimeofday({1188356999, 619092}, {4294967176, 0}) = 0 <0.000005>
1188356999.619119 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.619154 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.619792 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.619831 gettimeofday({1188356999, 619840}, {4294967176, 0}) = 0 <0.000006>
1188356999.619867 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.619902 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.619961 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.619998 gettimeofday({1188356999, 620007}, {4294967176, 0}) = 0 <0.000005>
1188356999.620034 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.620068 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.620127 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.620164 gettimeofday({1188356999, 620173}, {4294967176, 0}) = 0 <0.000005>
1188356999.620200 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.620235 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.620293 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.620330 gettimeofday({1188356999, 620339}, {4294967176, 0}) = 0 <0.000006>
1188356999.620366 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.620959 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.621020 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.621057 gettimeofday({1188356999, 621066}, {4294967176, 0}) = 0 <0.000005>
1188356999.621093 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.621128 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.621187 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.621224 gettimeofday({1188356999, 621233}, {4294967176, 0}) = 0 <0.000006>
1188356999.621259 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.621294 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.621352 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.621390 gettimeofday({1188356999, 621401}, {4294967176, 0}) = 0 <0.000008>
1188356999.621429 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.621464 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.621522 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.621559 gettimeofday({1188356999, 621568}, {4294967176, 0}) = 0 <0.000005>
1188356999.621595 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.621630 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.621688 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.621726 gettimeofday({1188356999, 621735}, {4294967176, 0}) = 0 <0.000005>
1188356999.621762 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.621797 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.621856 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.621894 gettimeofday({1188356999, 621903}, {4294967176, 0}) = 0 <0.000006>
1188356999.621930 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.621964 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.622023 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.622061 gettimeofday({1188356999, 622070}, {4294967176, 0}) = 0 <0.000006>
1188356999.622097 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.622132 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.622191 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356999.622230 gettimeofday({1188356999, 622239}, {4294967176, 0}) = 0 <0.000006>
1188356999.622267 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.622301 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.622360 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.622400 gettimeofday({1188356999, 622410}, {4294967176, 0}) = 0 <0.001199>
1188356999.623665 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356999.623768 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.623828 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.623866 gettimeofday({1188356999, 623875}, {4294967176, 0}) = 0 <0.000006>
1188356999.623902 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.623936 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.623995 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.624032 gettimeofday({1188356999, 624041}, {4294967176, 0}) = 0 <0.000005>
1188356999.624068 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.624103 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.624174 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.624212 gettimeofday({1188356999, 624221}, {4294967176, 0}) = 0 <0.000006>
1188356999.624248 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.624282 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.624344 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.624381 gettimeofday({1188356999, 624390}, {4294967176, 0}) = 0 <0.000005>
1188356999.624987 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356999.625091 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356999.625185 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.625222 gettimeofday({1188356999, 625231}, {4294967176, 0}) = 0 <0.000006>
1188356999.625258 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.625292 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.625351 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.625388 gettimeofday({1188356999, 625397}, {4294967176, 0}) = 0 <0.000261>
1188356999.625714 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356999.625818 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.625879 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.625916 gettimeofday({1188356999, 625925}, {4294967176, 0}) = 0 <0.000005>
1188356999.625952 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.625987 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.626045 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.626082 gettimeofday({1188356999, 626091}, {4294967176, 0}) = 0 <0.000005>
1188356999.626118 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.626153 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.626211 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.626249 gettimeofday({1188356999, 626257}, {4294967176, 0}) = 0 <0.000005>
1188356999.626284 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.626318 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.626377 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000010>
1188356999.626913 gettimeofday({1188356999, 626923}, {4294967176, 0}) = 0 <0.000038>
1188356999.627018 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356999.627088 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.627147 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.627185 gettimeofday({1188356999, 627193}, {4294967176, 0}) = 0 <0.000005>
1188356999.627220 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.627254 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.627313 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.627350 gettimeofday({1188356999, 627359}, {4294967176, 0}) = 0 <0.000005>
1188356999.627386 ioctl(3, FIONREAD, [0]) = 0 <0.000009>
1188356999.627733 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000043>
1188356999.627861 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000006>
1188356999.627901 gettimeofday({1188356999, 627910}, {4294967176, 0}) = 0 <0.000006>
1188356999.627936 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.627971 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000010>
1188356999.628031 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.628081 gettimeofday({1188356999, 628090}, {4294967176, 0}) = 0 <0.000006>
1188356999.628117 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.628152 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.628211 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.628248 gettimeofday({1188356999, 628256}, {4294967176, 0}) = 0 <0.000006>
1188356999.628283 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.628318 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.628376 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000009>
1188356999.628864 gettimeofday({1188356999, 628875}, {4294967176, 0}) = 0 <0.000039>
1188356999.628970 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356999.629040 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.629099 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.629136 gettimeofday({1188356999, 629145}, {4294967176, 0}) = 0 <0.000005>
1188356999.629172 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.629207 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.629265 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.629302 gettimeofday({1188356999, 629311}, {4294967176, 0}) = 0 <0.000005>
1188356999.629338 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.629373 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000323>
1188356999.629784 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.629891 gettimeofday({1188356999, 629901}, {4294967176, 0}) = 0 <0.000005>
1188356999.629928 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.629963 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.630022 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.630059 gettimeofday({1188356999, 630068}, {4294967176, 0}) = 0 <0.000005>
1188356999.630095 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.630130 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.630188 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.630225 gettimeofday({1188356999, 630234}, {4294967176, 0}) = 0 <0.000005>
1188356999.630261 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.630295 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.630354 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.630391 gettimeofday({1188356999, 630403}, {4294967176, 0}) = 0 <0.000441>
1188356999.630897 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356999.631000 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.631060 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.631097 gettimeofday({1188356999, 631106}, {4294967176, 0}) = 0 <0.000006>
1188356999.631133 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.631167 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.631226 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.631263 gettimeofday({1188356999, 631272}, {4294967176, 0}) = 0 <0.000006>
1188356999.631299 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.631333 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.631392 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000339>
1188356999.631799 gettimeofday({1188356999, 631809}, {4294967176, 0}) = 0 <0.000037>
1188356999.631916 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.631953 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.632012 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356999.632050 gettimeofday({1188356999, 632058}, {4294967176, 0}) = 0 <0.000006>
1188356999.632085 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.632120 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.632178 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.632215 gettimeofday({1188356999, 632224}, {4294967176, 0}) = 0 <0.000005>
1188356999.632251 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.632285 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.632344 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.632381 gettimeofday({1188356999, 632390}, {4294967176, 0}) = 0 <0.000006>
1188356999.632836 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356999.632939 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356999.633033 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356999.633071 gettimeofday({1188356999, 633079}, {4294967176, 0}) = 0 <0.000006>
1188356999.633106 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.633141 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.633199 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.633237 gettimeofday({1188356999, 633245}, {4294967176, 0}) = 0 <0.000006>
1188356999.633273 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.633307 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.633365 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.633405 gettimeofday({1188356999, 633414}, {4294967176, 0}) = 0 <0.000356>
1188356999.633826 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356999.633930 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.633989 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.634026 gettimeofday({1188356999, 634035}, {4294967176, 0}) = 0 <0.000006>
1188356999.634062 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.634096 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.634155 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.634192 gettimeofday({1188356999, 634200}, {4294967176, 0}) = 0 <0.000005>
1188356999.634227 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.634261 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.634319 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.634356 gettimeofday({1188356999, 634365}, {4294967176, 0}) = 0 <0.000005>
1188356999.634392 ioctl(3, FIONREAD, [0]) = 0 <0.000405>
1188356999.634863 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356999.634990 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.635029 gettimeofday({1188356999, 635038}, {4294967176, 0}) = 0 <0.000005>
1188356999.635065 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.635099 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.635158 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.635195 gettimeofday({1188356999, 635204}, {4294967176, 0}) = 0 <0.000006>
1188356999.635230 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.635265 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.635337 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.635374 gettimeofday({1188356999, 635383}, {4294967176, 0}) = 0 <0.000006>
1188356999.635781 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356999.635885 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356999.635980 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.636017 gettimeofday({1188356999, 636026}, {4294967176, 0}) = 0 <0.000006>
1188356999.636053 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.636087 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.636146 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.636183 gettimeofday({1188356999, 636192}, {4294967176, 0}) = 0 <0.000005>
1188356999.636219 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.636253 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.636312 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.636349 gettimeofday({1188356999, 636358}, {4294967176, 0}) = 0 <0.000005>
1188356999.636385 ioctl(3, FIONREAD, [0]) = 0 <0.000009>
1188356999.636828 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000041>
1188356999.636955 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000006>
1188356999.636994 gettimeofday({1188356999, 637003}, {4294967176, 0}) = 0 <0.000006>
1188356999.637030 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.637065 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.637123 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.637161 gettimeofday({1188356999, 637169}, {4294967176, 0}) = 0 <0.000006>
1188356999.637196 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.637232 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.637291 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.637328 gettimeofday({1188356999, 637337}, {4294967176, 0}) = 0 <0.000006>
1188356999.637364 ioctl(3, FIONREAD, [0]) = 0 <0.000007>
1188356999.637402 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000414>
1188356999.637902 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.637976 gettimeofday({1188356999, 637985}, {4294967176, 0}) = 0 <0.000006>
1188356999.638012 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.638047 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.638106 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.638143 gettimeofday({1188356999, 638152}, {4294967176, 0}) = 0 <0.000006>
1188356999.638179 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.638213 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.638272 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.638308 gettimeofday({1188356999, 638317}, {4294967176, 0}) = 0 <0.000005>
1188356999.638344 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.638378 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000396>
1188356999.638863 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.638969 gettimeofday({1188356999, 638979}, {4294967176, 0}) = 0 <0.000005>
1188356999.639006 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.639041 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356999.639112 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.639150 gettimeofday({1188356999, 639159}, {4294967176, 0}) = 0 <0.000006>
1188356999.639186 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.639220 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.639279 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.639316 gettimeofday({1188356999, 639325}, {4294967176, 0}) = 0 <0.000006>
1188356999.639352 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.639386 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000373>
1188356999.639848 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.639955 gettimeofday({1188356999, 639965}, {4294967176, 0}) = 0 <0.000005>
1188356999.639992 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.640027 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.640085 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.640123 gettimeofday({1188356999, 640131}, {4294967176, 0}) = 0 <0.000006>
1188356999.640159 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.640193 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.640251 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.640288 gettimeofday({1188356999, 640297}, {4294967176, 0}) = 0 <0.000006>
1188356999.640324 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.640358 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000011>
1188356999.640802 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000041>
1188356999.640910 gettimeofday({1188356999, 640920}, {4294967176, 0}) = 0 <0.000038>
1188356999.640982 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.641017 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.641076 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.641113 gettimeofday({1188356999, 641122}, {4294967176, 0}) = 0 <0.000005>
1188356999.641149 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.641183 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.641242 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.641279 gettimeofday({1188356999, 641287}, {4294967176, 0}) = 0 <0.000006>
1188356999.641315 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.641349 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.641775 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.641882 gettimeofday({1188356999, 641892}, {4294967176, 0}) = 0 <0.000038>
1188356999.641954 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.641988 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.642047 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.642085 gettimeofday({1188356999, 642093}, {4294967176, 0}) = 0 <0.000006>
1188356999.642121 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.642155 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.642213 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.642250 gettimeofday({1188356999, 642259}, {4294967176, 0}) = 0 <0.000005>
1188356999.642286 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.642320 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.642379 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000388>
1188356999.642849 gettimeofday({1188356999, 642859}, {4294967176, 0}) = 0 <0.000038>
1188356999.642953 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.642989 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout) <0.000008>
1188356999.643030 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.643090 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.643127 gettimeofday({1188356999, 643136}, {4294967176, 0}) = 0 <0.000006>
1188356999.643163 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.643198 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.643256 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.643294 gettimeofday({1188356999, 643303}, {4294967176, 0}) = 0 <0.000006>
1188356999.643330 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.643364 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000374>
1188356999.643827 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.643933 gettimeofday({1188356999, 643943}, {4294967176, 0}) = 0 <0.000038>
1188356999.644005 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.644040 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.644099 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.644137 gettimeofday({1188356999, 644145}, {4294967176, 0}) = 0 <0.000006>
1188356999.644172 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.644207 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.644266 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.644303 gettimeofday({1188356999, 644311}, {4294967176, 0}) = 0 <0.000006>
1188356999.644339 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.644373 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000357>
1188356999.644819 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.644926 gettimeofday({1188356999, 644936}, {4294967176, 0}) = 0 <0.000005>
1188356999.644963 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.644998 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.645057 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.645095 gettimeofday({1188356999, 645103}, {4294967176, 0}) = 0 <0.000005>
1188356999.645130 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.645165 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.645225 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.645262 gettimeofday({1188356999, 645271}, {4294967176, 0}) = 0 <0.000006>
1188356999.645298 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.645332 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.645391 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000407>
1188356999.645865 gettimeofday({1188356999, 645875}, {4294967176, 0}) = 0 <0.000038>
1188356999.645969 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.646005 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.646065 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.646102 gettimeofday({1188356999, 646111}, {4294967176, 0}) = 0 <0.000006>
1188356999.646138 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.646172 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.646231 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.646281 gettimeofday({1188356999, 646290}, {4294967176, 0}) = 0 <0.000005>
1188356999.646317 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.646352 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.646763 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.646870 gettimeofday({1188356999, 646880}, {4294967176, 0}) = 0 <0.000039>
1188356999.646941 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.646976 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.647036 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.647073 gettimeofday({1188356999, 647082}, {4294967176, 0}) = 0 <0.000005>
1188356999.647109 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.647143 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.647202 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.647240 gettimeofday({1188356999, 647248}, {4294967176, 0}) = 0 <0.000006>
1188356999.647276 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.647310 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000009>
1188356999.647372 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.647812 gettimeofday({1188356999, 647823}, {4294967176, 0}) = 0 <0.000039>
1188356999.647917 ioctl(3, FIONREAD, [0]) = 0 <0.000039>
1188356999.647988 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.648047 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.648084 gettimeofday({1188356999, 648093}, {4294967176, 0}) = 0 <0.000005>
1188356999.648120 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.648155 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.648213 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.648251 gettimeofday({1188356999, 648260}, {4294967176, 0}) = 0 <0.000006>
1188356999.648287 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.648321 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.648380 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000010>
1188356999.648420 gettimeofday({1188356999, 648429}, {4294967176, 0}) = 0 <0.000005>
1188356999.648456 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.648491 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.648549 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.648586 gettimeofday({1188356999, 648595}, {4294967176, 0}) = 0 <0.000005>
1188356999.648622 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.648656 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.648715 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.648752 gettimeofday({1188356999, 648761}, {4294967176, 0}) = 0 <0.000005>
1188356999.648788 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.648823 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000007>
1188356999.648881 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000008>
1188356999.648919 gettimeofday({1188356999, 648928}, {4294967176, 0}) = 0 <0.000006>
1188356999.648955 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.648990 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.649048 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.649086 gettimeofday({1188356999, 649094}, {4294967176, 0}) = 0 <0.000005>
1188356999.649134 ioctl(3, FIONREAD, [0]) = 0 <0.000006>
1188356999.649170 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.649229 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000007>
1188356999.649266 gettimeofday({1188356999, 649275}, {4294967176, 0}) = 0 <0.000006>
1188356999.649302 ioctl(3, FIONREAD, [0]) = 0 <0.000005>
1188356999.649336 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = 240 <0.000008>
1188356999.649394 write(3, "\217\v\3\0\1\0\0\0\2\0 \2", 12) = 12 <0.000040>
1188356999.650639 gettimeofday({1188356999, 650649}, {4294967176, 0}) = 0 <0.000038>
1188356999.650743 ioctl(3, FIONREAD, [0]) = 0 <0.000038>
1188356999.650846 writev(3, [{"\217\1<\0\1\0\0\0", 8}, {"\10\0\177\0\0A\0\0\4\0\270\0\24\0\272\0\0\0\240A\0\0\200"..., 232}], 2) = -1 EAGAIN (Resource temporarily unavailable) <0.000006>
1188356999.650912 select(4, [3], [3], NULL, NULL <unfinished ...>
Process 3644 detached

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  4:19                                         ` Al Boldi
@ 2007-08-29  4:53                                           ` Ingo Molnar
  2007-08-29  5:58                                             ` Al Boldi
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-29  4:53 UTC (permalink / raw)
  To: Al Boldi
  Cc: Linus Torvalds, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	linux-kernel, Keith Packard


* Al Boldi <a1426z@gawab.com> wrote:

> I have narrowed it down a bit to add_wait_runtime.

the scheduler is a red herring here. Could you "strace -ttt -TTT" one of 
the glxgears instances (and send us the cfs-debug-info.sh output, with 
CONFIG_SCHED_DEBUG=y and CONFIG_SCHEDSTATS=y as requested before) so 
that we can have a closer look?
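
For reference, a minimal sketch of that capture -- the output file name and
the choice to attach to an already-running instance are placeholders, not
part of the request:

   strace -ttt -TTT -o gears.trace -p $(pidof glxgears | cut -d' ' -f1)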

i reproduced something similar and there the stall is caused by 1+ 
second select() delays on the X client<->server socket. The scheduler 
stats agree with that:

 se.sleep_max             :          2194711437
 se.block_max             :                   0
 se.exec_max              :              977446
 se.wait_max              :             1912321

the scheduler itself had a worst-case scheduling delay of 1.9 
milliseconds for that glxgears instance (which is perfectly good - in 
fact - excellent interactivity) - but the task had a maximum sleep time 
of 2.19 seconds. So the 'glitch' was not caused by the scheduler.
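
(As a reference point, those per-task fields can be read straight from
/proc/<pid>/sched on a CONFIG_SCHED_DEBUG kernel; the ns-to-ms conversion
below is just a convenience, the field names are as shown above:)

   p=$(pidof glxgears | cut -d' ' -f1)
   grep -E 'se\.(sleep|block|exec|wait)_max' /proc/$p/sched |
       awk '{ printf "%-16s %10.3f ms\n", $1, $3 / 1000000 }'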

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  4:53                                           ` Ingo Molnar
@ 2007-08-29  5:58                                             ` Al Boldi
  2007-08-29  6:43                                               ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Al Boldi @ 2007-08-29  5:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	linux-kernel, Keith Packard

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > I have narrowed it down a bit to add_wait_runtime.
>
> the scheduler is a red herring here. Could you "strace -ttt -TTT" one of
> the glxgears instances (and send us the cfs-debug-info.sh output, with
> CONFIG_SCHED_DEBUG=y and CONFIG_SCHEDSTATS=y as requested before) so
> that we can have a closer look?
>
> i reproduced something similar and there the stall is caused by 1+
> second select() delays on the X client<->server socket. The scheduler
> stats agree with that:
>
>  se.sleep_max             :          2194711437
>  se.block_max             :                   0
>  se.exec_max              :              977446
>  se.wait_max              :             1912321
>
> the scheduler itself had a worst-case scheduling delay of 1.9
> milliseconds for that glxgears instance (which is perfectly good - in
> fact - excellent interactivity) - but the task had a maximum sleep time
> of 2.19 seconds. So the 'glitch' was not caused by the scheduler.

2.19sec is probably the time you need to lay them out side by side.  You see, 
gears sleeps when it is covered by another window, so once you lay them out 
it starts running, and that's when they start to stutter for about 10sec.  
After that they should run smoothly, because they used up all the sleep 
bonus.

If you like, I can send you my straces, but they are kind of big, and 
you need to strace each gear, as stracing itself changes the workload 
balance.

Let's first make sure what we are looking for:
1. start # gears & gears & gears &
2. lay them out side by side, don't worry about sleep times yet.
3. now they start stuttering for about 10sec
4. now they run out of sleep bonuses and smooth out

If this is the sequence you get on your machine, then try disabling 
add_wait_runtime to see the difference.


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  5:58                                             ` Al Boldi
@ 2007-08-29  6:43                                               ` Ingo Molnar
  0 siblings, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-29  6:43 UTC (permalink / raw)
  To: Al Boldi
  Cc: Linus Torvalds, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	linux-kernel, Keith Packard


* Al Boldi <a1426z@gawab.com> wrote:

> >  se.sleep_max             :          2194711437
> >  se.block_max             :                   0
> >  se.exec_max              :              977446
> >  se.wait_max              :             1912321
> >
> > the scheduler itself had a worst-case scheduling delay of 1.9
> > milliseconds for that glxgears instance (which is perfectly good - in
> > fact - excellent interactivity) - but the task had a maximum sleep time
> > of 2.19 seconds. So the 'glitch' was not caused by the scheduler.
> 
> 2.19sec is probably the time you need to lay them out side by side. 
> [...]

nope, i cleared the stats after i laid the glxgears out, via:

   for N in /proc/*/sched; do echo 0 > $N; done

and i did the strace (which showed a 1+ seconds latency) while the 
glxgears was not manipulated in any way.

> [...]  You see, gears sleeps when it is covered by another window, 
> [...]

none of the gear windows in my test were overlaid...

> [...] so once you lay them out it starts running, and that's when they 
> start to stutter for about 10sec.  After that they should run 
> smoothly, because they used up all the sleep bonus.

that's plain wrong - at least in the test i've reproduced. In any case, 
if that were the case then that would be visible in the stats. So please 
send me your cfs-debug-info.sh output captured while the test is running 
(with a CONFIG_SCHEDSTATS=y and CONFIG_SCHED_DEBUG=y kernel) - you can 
download it from:

   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

for best data, execute this before running it:

   for N in /proc/*/sched; do echo 0 > $N; done
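
(A one-shot way to fetch and run it -- redirecting stdout to a file here is
just one way to collect the output, the script may also save its own copy:)

   wget http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
   sh cfs-debug-info.sh > cfs-debug-info.out 2>&1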

> If you like, I can send you my straces, but they are kind of big 
> though, and you need to strace each gear, as stracing itself changes 
> the workload balance.

sure, send them along or upload them somewhere - but more importantly, 
please send the cfs-debug-info.sh output.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  4:46                                     ` Ingo Molnar
@ 2007-08-29  7:57                                       ` Keith Packard
  2007-08-29  8:04                                         ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Keith Packard @ 2007-08-29  7:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: keith.packard, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel

On Wed, 2007-08-29 at 06:46 +0200, Ingo Molnar wrote:

> ok, i finally managed to reproduce the "artifact" myself on an older 
> box. It goes like this: start up X with the vesa driver (or with NoDRI) 
> to force software rendering. Then start up a couple of glxgears 
> instances. Those glxgears instances update in a very "chunky", 
> "stuttering" way - each glxgears instance runs/stops/runs/stops at a 
> rate of a about once per second, and this was reported to me as a 
> potential CPU scheduler regression.

Hmm. I can't even run two copies of glxgears on software GL code today;
it's broken in every X server I have available. Someone broke it a while
ago, but no-one noticed. However, this shouldn't be GLX related as the
software rasterizer is no different from any other rendering code.

Testing with my smart-scheduler case (many copies of 'plaid') shows that
at least with git master, things are working as designed. When GLX is
working again, I'll try that as well.

> at a quick glance this is not a CPU scheduler thing: X uses up 99% of 
> CPU time, all the glxgears tasks (i needed 8 parallel instances to see 
> the stallings) are using up the remaining 1% of CPU time. The ordering 
> of the requests from the glxgears tasks is X's choice - and for a 
> pathological overload situation like this we cannot blame X at all for 
> not producing a completely smooth output. (although Xorg could perhaps 
> try to schedule such requests more smoothly, in a more finegrained way?)

It does. It should switch between clients every 20ms; that's why X spends
so much time asking the kernel for the current time.

Make sure the X server isn't running with the smart scheduler disabled;
that will cause precisely the symptoms you're seeing here. In the normal
upstream sources, you'd have to use '-dumbSched' as an X server command
line option.

The old 'scheduler' would run an entire X client's input buffer dry
before looking for requests from another client. Because glxgears
requests are small but time consuming, this can cause very long delays
between client switching.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  7:57                                       ` Keith Packard
@ 2007-08-29  8:04                                         ` Ingo Molnar
  2007-08-29  8:53                                           ` Al Boldi
  2007-08-29 15:57                                           ` Keith Packard
  0 siblings, 2 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-29  8:04 UTC (permalink / raw)
  To: Keith Packard
  Cc: Al Boldi, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel


* Keith Packard <keith.packard@intel.com> wrote:

> Make sure the X server isn't running with the smart scheduler 
> disabled; that will cause precisely the symptoms you're seeing here. 
> In the normal usptream sources, you'd have to use '-dumbSched' as an X 
> server command line option.
> 
> The old 'scheduler' would run an entire X client's input buffer dry 
> before looking for requests from another client. Because glxgears 
> requests are small but time consuming, this can cause very long delays 
> between client switching.

on the old box where i've reproduced this i've got an ancient X version:

  neptune:~> X -version

  X Window System Version 6.8.2
  Release Date: 9 February 2005
  X Protocol Version 11, Revision 0, Release 6.8.2
  Build Operating System: Linux 2.6.9-22.ELsmp i686 [ELF]

is that old enough to not have the smart X scheduler?

on newer systems i dont see correctly updated glxgears output (probably 
the GLX bug you mentioned) so i cannot reproduce the bug.

Al, could you send us your 'X -version' output?

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  8:04                                         ` Ingo Molnar
@ 2007-08-29  8:53                                           ` Al Boldi
  2007-08-29 15:57                                           ` Keith Packard
  1 sibling, 0 replies; 123+ messages in thread
From: Al Boldi @ 2007-08-29  8:53 UTC (permalink / raw)
  To: Ingo Molnar, Keith Packard
  Cc: Peter Zijlstra, Mike Galbraith, Andrew Morton, Linus Torvalds,
	linux-kernel

Ingo Molnar wrote:
> * Keith Packard <keith.packard@intel.com> wrote:
> > Make sure the X server isn't running with the smart scheduler
> > disabled; that will cause precisely the symptoms you're seeing here.
> > In the normal usptream sources, you'd have to use '-dumbSched' as an X
> > server command line option.
> >
> > The old 'scheduler' would run an entire X client's input buffer dry
> > before looking for requests from another client. Because glxgears
> > requests are small but time consuming, this can cause very long delays
> > between client switching.
>
> on the old box where i've reproduced this i've got an ancient X version:
>
>   neptune:~> X -version
>
>   X Window System Version 6.8.2
>   Release Date: 9 February 2005
>   X Protocol Version 11, Revision 0, Release 6.8.2
>   Build Operating System: Linux 2.6.9-22.ELsmp i686 [ELF]
>
> is that old enough to not have the smart X scheduler?
>
> on newer systems i dont see correctly updated glxgears output (probably
> the GLX bug you mentioned) so i cannot reproduce the bug.
>
> Al, could you send us your 'X -version' output?

This is the one I have been talking about:

XFree86 Version 4.3.0
Release Date: 27 February 2003
X Protocol Version 11, Revision 0, Release 6.6
Build Operating System: Linux 2.4.21-0.13mdksmp i686 [ELF] 


I also tried the gears test just now on this:

X Window System Version 6.8.1
Release Date: 17 September 2004
X Protocol Version 11, Revision 0, Release 6.8.1
Build Operating System: Linux 2.6.9-1.860_ELsmp i686 [ELF] 

but it completely locks up.  Disabling add_wait_runtime seems to fix it.


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  3:45                   ` Ingo Molnar
@ 2007-08-29 13:11                     ` Bill Davidsen
  0 siblings, 0 replies; 123+ messages in thread
From: Bill Davidsen @ 2007-08-29 13:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Al Boldi, Peter Zijlstra, Mike Galbraith, Andrew Morton,
	Linus Torvalds, linux-kernel

Ingo Molnar wrote:
> * Bill Davidsen <davidsen@tmr.com> wrote:
>
>   
>>> There is another way to show the problem visually under X 
>>> (vesa-driver), by starting 3 gears simultaneously, which after 
>>> laying them out side-by-side need some settling time before 
>>> smoothing out.  Without __update_curr it's absolutely smooth from 
>>> the start.
>>>       
>> I posted a LOT of stuff using the glitch1 script, and finally found a 
>> set of tuning values which make the test script run smooth. See back 
>> posts, I don't have them here.
>>     
>
> but you have real 3D hw and DRI enabled, correct? In that case X uses up 
> almost no CPU time and glxgears makes most of the processing. That is 
> quite different from the above software-rendering case, where X spends 
> most of the CPU time.
>   

No, my test machine for that is a compile server, and uses the built-in 
motherboard graphics, which are very limited. This is not in any sense a 
graphics powerhouse; it is used to build custom kernels and applications, 
and for testing of kvm and xen. I grabbed it because it had the only Core2 
CPU I could reboot to try new kernel versions and do "from cold boot" 
testing. I discovered the graphics smoothness issue by having several 
windows open on compiles, and developed the glitch1 script as a way to 
reproduce it.

The settings I used, features=14, granularity=500000, work to improve 
smoothness on other machines for other uses, but they do seem to impact 
performance for compiles, video processing, etc., so they are not optimal 
for general use. I regard the existence of these tuning knobs as one of 
the real strengths of CFS: when you change the tuning, it has a visible 
effect.
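
(For reference, applying those two settings at runtime would look roughly
like the below; the exact sysctl names varied between CFS versions, so the
granularity knob name is an assumption:)

   echo 14     > /proc/sys/kernel/sched_features
   echo 500000 > /proc/sys/kernel/sched_granularity_ns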

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29  8:04                                         ` Ingo Molnar
  2007-08-29  8:53                                           ` Al Boldi
@ 2007-08-29 15:57                                           ` Keith Packard
  2007-08-29 19:56                                             ` Rene Herman
  1 sibling, 1 reply; 123+ messages in thread
From: Keith Packard @ 2007-08-29 15:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: keith.packard, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel

On Wed, 2007-08-29 at 10:04 +0200, Ingo Molnar wrote:

> is that old enough to not have the smart X scheduler?

The smart scheduler went into the server in like 2000. I don't think
you've got any systems that old. XFree86 4.1 or 4.2, I can't remember
which.

> (probably 
> the GLX bug you mentioned) so i cannot reproduce the bug.

With X server 1.3, I'm getting consistent crashes with two glxgear
instances running. So, if you're getting any output, it's better than my
situation.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29 15:57                                           ` Keith Packard
@ 2007-08-29 19:56                                             ` Rene Herman
  2007-08-30  7:05                                               ` Rene Herman
  2007-08-30 16:06                                               ` CFS review Chuck Ebbert
  0 siblings, 2 replies; 123+ messages in thread
From: Rene Herman @ 2007-08-29 19:56 UTC (permalink / raw)
  To: keith.packard
  Cc: Ingo Molnar, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel

On 08/29/2007 05:57 PM, Keith Packard wrote:

> With X server 1.3, I'm getting consistent crashes with two glxgear
> instances running. So, if you're getting any output, it's better than my
> situation.

Before people focus on software rendering too much -- also with 1.3.0 (and
a Matrox Millennium G550 AGP, 32M) glxgears works decidedly crummy using
hardware rendering. While I can move the glxgears window itself, the actual
spinning wheels stay in the upper-left corner of the screen and the movement
leaves a non-repainting trace on the screen. Running a second instance of
glxgears in addition seems to make both instances unkillable -- and when
I just now forcefully killed X in this situation (the spinning wheels were
covering the upper left corner of all my desktops) I got the below.

Kernel is 2.6.22.5-cfs-v20.5, schedule() is in the traces (but that may be
expected anyway).

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
  printing eip:
c10ff416
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: nfsd exportfs lockd nfs_acl sunrpc nls_iso8859_1 nls_cp437 vfat fat 
nls_base
CPU:    0
EIP:    0060:[<c10ff416>]    Not tainted VLI
EFLAGS: 00210246   (2.6.22.5-cfs-v20.5-local #5)
EIP is at mga_dma_buffers+0x189/0x2e3
eax: 00000000   ebx: efd07200   ecx: 00000001   edx: efc32c00
esi: 00000000   edi: c12756cc   ebp: dfea44c0   esp: dddaaec0
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process glxgears (pid: 1775, ti=dddaa000 task=e9daca60 task.ti=dddaa000)
Stack: efc32c00 00000000 00000004 e4c3bd20 c10fa54b e4c3bd20 efc32c00 00000000
        00000004 00000000 00000000 00000000 00000000 00000001 00010000 bfbdb8bc
        bfbdb8b8 00000000 c10ff28d 00000029 c12756cc dfea44c0 c10f87fc bfbdb844
Call Trace:
  [<c10fa54b>] drm_lock+0x255/0x2de
  [<c10ff28d>] mga_dma_buffers+0x0/0x2e3
  [<c10f87fc>] drm_ioctl+0x142/0x18a
  [<c1005973>] do_IRQ+0x97/0xb0
  [<c10f86ba>] drm_ioctl+0x0/0x18a
  [<c10f86ba>] drm_ioctl+0x0/0x18a
  [<c105b0d7>] do_ioctl+0x87/0x9f
  [<c105b32c>] vfs_ioctl+0x23d/0x250
  [<c11b533e>] schedule+0x2d0/0x2e6
  [<c105b372>] sys_ioctl+0x33/0x4d
  [<c1003d1e>] syscall_call+0x7/0xb
  =======================
Code: 9a 08 03 00 00 8b 73 30 74 14 c7 44 24 04 28 76 1c c1 c7 04 24 49 51 23 c1 e8 b0 74 
f1 ff 8b 83 d8 00 00 00 83 3d 1c 47 30 c1 00 <8b> 40 10 8b a8 58 1e 00 00 8b 43 28 8b b8 
64 01 00 00 74 32 8b
EIP: [<c10ff416>] mga_dma_buffers+0x189/0x2e3 SS:ESP 0068:dddaaec0
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
  printing eip:
c10ff416
*pde = 00000000
Oops: 0000 [#2]
PREEMPT
Modules linked in: nfsd exportfs lockd nfs_acl sunrpc nls_iso8859_1 nls_cp437 vfat fat 
nls_base
CPU:    0
EIP:    0060:[<c10ff416>]    Not tainted VLI
EFLAGS: 00210246   (2.6.22.5-cfs-v20.5-local #5)
EIP is at mga_dma_buffers+0x189/0x2e3
eax: 00000000   ebx: efd07200   ecx: 00000001   edx: efc32c00
esi: 00000000   edi: c12756cc   ebp: dfea4780   esp: e0552ec0
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process glxgears (pid: 1776, ti=e0552000 task=c19ec000 task.ti=e0552000)
Stack: efc32c00 00000000 00000003 efc64b40 c10fa54b efc64b40 efc32c00 00000000
        00000003 00000000 00000000 00000000 00000000 00000001 00010000 bf8dbdcc
        bf8dbdc8 00000000 c10ff28d 00000029 c12756cc dfea4780 c10f87fc bf8dbd54
Call Trace:
  [<c10fa54b>] drm_lock+0x255/0x2de
  [<c10ff28d>] mga_dma_buffers+0x0/0x2e3
  [<c10f87fc>] drm_ioctl+0x142/0x18a
  [<c11b53f6>] preempt_schedule+0x4e/0x5a
  [<c10f86ba>] drm_ioctl+0x0/0x18a
  [<c10f86ba>] drm_ioctl+0x0/0x18a
  [<c105b0d7>] do_ioctl+0x87/0x9f
  [<c105b32c>] vfs_ioctl+0x23d/0x250
  [<c11b52a9>] schedule+0x23b/0x2e6
  [<c11b533e>] schedule+0x2d0/0x2e6
  [<c105b372>] sys_ioctl+0x33/0x4d
  [<c1003d1e>] syscall_call+0x7/0xb
  =======================
Code: 9a 08 03 00 00 8b 73 30 74 14 c7 44 24 04 28 76 1c c1 c7 04 24 49 51 23 c1 e8 b0 74 
f1 ff 8b 83 d8 00 00 00 83 3d 1c 47 30 c1 00 <8b> 40 10 8b a8 58 1e 00 00 8b 43 28 8b b8 
64 01 00 00 74 32 8b
EIP: [<c10ff416>] mga_dma_buffers+0x189/0x2e3 SS:ESP 0068:e0552ec0
[drm:drm_release] *ERROR* Device busy: 2 0

Rene.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29 19:56                                             ` Rene Herman
@ 2007-08-30  7:05                                               ` Rene Herman
  2007-08-30  7:20                                                 ` Ingo Molnar
  2007-08-31  6:46                                                 ` Tilman Sauerbeck
  2007-08-30 16:06                                               ` CFS review Chuck Ebbert
  1 sibling, 2 replies; 123+ messages in thread
From: Rene Herman @ 2007-08-30  7:05 UTC (permalink / raw)
  To: keith.packard
  Cc: Ingo Molnar, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel, airlied, dri-devel

On 08/29/2007 09:56 PM, Rene Herman wrote:

Realised the BUGs may mean the kernel DRM people could want to be in CC...

> On 08/29/2007 05:57 PM, Keith Packard wrote:
> 
>> With X server 1.3, I'm getting consistent crashes with two glxgear
>> instances running. So, if you're getting any output, it's better than my
>> situation.
> 
> Before people focuss on software rendering too much -- also with 1.3.0
> (and a Matrox Millenium G550 AGP, 32M) glxgears also works decidedly
> crummy using hardware rendering. While I can move the glxgears window
> itself, the actual spinning wheels stay in the upper-left corner of the
> screen and the movement leaves a non-repainting trace on the screen.
> Running a second instance of glxgears in addition seems to make both
> instances unkillable -- and when I just now forcefully killed X in this
> situation (the spinning wheels were covering the upper left corner of all
> my desktops) I got the below.
> 
> Kernel is 2.6.22.5-cfs-v20.5, schedule() is in the traces (but that may be
> expected anyway).
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 00000010
>  printing eip:
> c10ff416
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT
> Modules linked in: nfsd exportfs lockd nfs_acl sunrpc nls_iso8859_1 
> nls_cp437 vfat fat nls_base
> CPU:    0
> EIP:    0060:[<c10ff416>]    Not tainted VLI
> EFLAGS: 00210246   (2.6.22.5-cfs-v20.5-local #5)
> EIP is at mga_dma_buffers+0x189/0x2e3
> eax: 00000000   ebx: efd07200   ecx: 00000001   edx: efc32c00
> esi: 00000000   edi: c12756cc   ebp: dfea44c0   esp: dddaaec0
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process glxgears (pid: 1775, ti=dddaa000 task=e9daca60 task.ti=dddaa000)
> Stack: efc32c00 00000000 00000004 e4c3bd20 c10fa54b e4c3bd20 efc32c00 
> 00000000
>        00000004 00000000 00000000 00000000 00000000 00000001 00010000 
> bfbdb8bc
>        bfbdb8b8 00000000 c10ff28d 00000029 c12756cc dfea44c0 c10f87fc 
> bfbdb844
> Call Trace:
>  [<c10fa54b>] drm_lock+0x255/0x2de
>  [<c10ff28d>] mga_dma_buffers+0x0/0x2e3
>  [<c10f87fc>] drm_ioctl+0x142/0x18a
>  [<c1005973>] do_IRQ+0x97/0xb0
>  [<c10f86ba>] drm_ioctl+0x0/0x18a
>  [<c10f86ba>] drm_ioctl+0x0/0x18a
>  [<c105b0d7>] do_ioctl+0x87/0x9f
>  [<c105b32c>] vfs_ioctl+0x23d/0x250
>  [<c11b533e>] schedule+0x2d0/0x2e6
>  [<c105b372>] sys_ioctl+0x33/0x4d
>  [<c1003d1e>] syscall_call+0x7/0xb
>  =======================
> Code: 9a 08 03 00 00 8b 73 30 74 14 c7 44 24 04 28 76 1c c1 c7 04 24 49 
> 51 23 c1 e8 b0 74 f1 ff 8b 83 d8 00 00 00 83 3d 1c 47 30 c1 00 <8b> 40 
> 10 8b a8 58 1e 00 00 8b 43 28 8b b8 64 01 00 00 74 32 8b
> EIP: [<c10ff416>] mga_dma_buffers+0x189/0x2e3 SS:ESP 0068:dddaaec0
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 00000010
>  printing eip:
> c10ff416
> *pde = 00000000
> Oops: 0000 [#2]
> PREEMPT
> Modules linked in: nfsd exportfs lockd nfs_acl sunrpc nls_iso8859_1 
> nls_cp437 vfat fat nls_base
> CPU:    0
> EIP:    0060:[<c10ff416>]    Not tainted VLI
> EFLAGS: 00210246   (2.6.22.5-cfs-v20.5-local #5)
> EIP is at mga_dma_buffers+0x189/0x2e3
> eax: 00000000   ebx: efd07200   ecx: 00000001   edx: efc32c00
> esi: 00000000   edi: c12756cc   ebp: dfea4780   esp: e0552ec0
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process glxgears (pid: 1776, ti=e0552000 task=c19ec000 task.ti=e0552000)
> Stack: efc32c00 00000000 00000003 efc64b40 c10fa54b efc64b40 efc32c00 
> 00000000
>        00000003 00000000 00000000 00000000 00000000 00000001 00010000 
> bf8dbdcc
>        bf8dbdc8 00000000 c10ff28d 00000029 c12756cc dfea4780 c10f87fc 
> bf8dbd54
> Call Trace:
>  [<c10fa54b>] drm_lock+0x255/0x2de
>  [<c10ff28d>] mga_dma_buffers+0x0/0x2e3
>  [<c10f87fc>] drm_ioctl+0x142/0x18a
>  [<c11b53f6>] preempt_schedule+0x4e/0x5a
>  [<c10f86ba>] drm_ioctl+0x0/0x18a
>  [<c10f86ba>] drm_ioctl+0x0/0x18a
>  [<c105b0d7>] do_ioctl+0x87/0x9f
>  [<c105b32c>] vfs_ioctl+0x23d/0x250
>  [<c11b52a9>] schedule+0x23b/0x2e6
>  [<c11b533e>] schedule+0x2d0/0x2e6
>  [<c105b372>] sys_ioctl+0x33/0x4d
>  [<c1003d1e>] syscall_call+0x7/0xb
>  =======================
> Code: 9a 08 03 00 00 8b 73 30 74 14 c7 44 24 04 28 76 1c c1 c7 04 24 49 
> 51 23 c1 e8 b0 74 f1 ff 8b 83 d8 00 00 00 83 3d 1c 47 30 c1 00 <8b> 40 
> 10 8b a8 58 1e 00 00 8b 43 28 8b b8 64 01 00 00 74 32 8b
> EIP: [<c10ff416>] mga_dma_buffers+0x189/0x2e3 SS:ESP 0068:e0552ec0
> [drm:drm_release] *ERROR* Device busy: 2 0

Rene.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-30  7:05                                               ` Rene Herman
@ 2007-08-30  7:20                                                 ` Ingo Molnar
  2007-08-31  6:46                                                 ` Tilman Sauerbeck
  1 sibling, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-30  7:20 UTC (permalink / raw)
  To: Rene Herman
  Cc: keith.packard, Al Boldi, Peter Zijlstra, Mike Galbraith,
	Andrew Morton, Linus Torvalds, linux-kernel, airlied, dri-devel


* Rene Herman <rene.herman@gmail.com> wrote:

> Realised the BUGs may mean the kernel DRM people could want to be in CC...

and note that the schedule() call in there is not part of the crash 
backtrace:

> >Call Trace:
> > [<c10fa54b>] drm_lock+0x255/0x2de
> > [<c10ff28d>] mga_dma_buffers+0x0/0x2e3
> > [<c10f87fc>] drm_ioctl+0x142/0x18a
> > [<c1005973>] do_IRQ+0x97/0xb0
> > [<c10f86ba>] drm_ioctl+0x0/0x18a
> > [<c10f86ba>] drm_ioctl+0x0/0x18a
> > [<c105b0d7>] do_ioctl+0x87/0x9f
> > [<c105b32c>] vfs_ioctl+0x23d/0x250
> > [<c11b533e>] schedule+0x2d0/0x2e6
> > [<c105b372>] sys_ioctl+0x33/0x4d
> > [<c1003d1e>] syscall_call+0x7/0xb

it just happened to be on the kernel stack. Nor is the do_IRQ() entry 
real. Both are frequent functions (and were executed recently) that's 
why they were still in the stackframe.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-29 19:56                                             ` Rene Herman
  2007-08-30  7:05                                               ` Rene Herman
@ 2007-08-30 16:06                                               ` Chuck Ebbert
  2007-08-30 16:48                                                 ` Rene Herman
  1 sibling, 1 reply; 123+ messages in thread
From: Chuck Ebbert @ 2007-08-30 16:06 UTC (permalink / raw)
  To: Rene Herman
  Cc: keith.packard, Ingo Molnar, Al Boldi, Peter Zijlstra,
	Mike Galbraith, Andrew Morton, Linus Torvalds, linux-kernel,
	Dave Airlie

On 08/29/2007 03:56 PM, Rene Herman wrote:
> 
> Before people focuss on software rendering too much -- also with 1.3.0 (and
> a Matrox Millenium G550 AGP, 32M) glxgears also works decidedly crummy
> using
> hardware rendering. While I can move the glxgears window itself, the actual
> spinning wheels stay in the upper-left corner of the screen and the
> movement
> leaves a non-repainting trace on the screen. Running a second instance of
> glxgears in addition seems to make both instances unkillable  -- and when
> I just now forcefully killed X in this situation (the spinning wheels were
> covering the upper left corner of all my desktops) I got the below.
> 
> Kernel is 2.6.22.5-cfs-v20.5, schedule() is in the traces (but that may be
> expected anyway).
> 

And this doesn't happen at all with the stock scheduler? (Just confirming,
in case you didn't compare.)

> BUG: unable to handle kernel NULL pointer dereference at virtual address
> 00000010
>  printing eip:
> c10ff416
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT

Try it without preempt?

> Modules linked in: nfsd exportfs lockd nfs_acl sunrpc nls_iso8859_1
> nls_cp437 vfat fat nls_base
> CPU:    0
> EIP:    0060:[<c10ff416>]    Not tainted VLI
> EFLAGS: 00210246   (2.6.22.5-cfs-v20.5-local #5)
> EIP is at mga_dma_buffers+0x189/0x2e3
> eax: 00000000   ebx: efd07200   ecx: 00000001   edx: efc32c00
> esi: 00000000   edi: c12756cc   ebp: dfea44c0   esp: dddaaec0
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process glxgears (pid: 1775, ti=dddaa000 task=e9daca60 task.ti=dddaa000)
> Stack: efc32c00 00000000 00000004 e4c3bd20 c10fa54b e4c3bd20 efc32c00
> 00000000
>        00000004 00000000 00000000 00000000 00000000 00000001 00010000
> bfbdb8bc
>        bfbdb8b8 00000000 c10ff28d 00000029 c12756cc dfea44c0 c10f87fc
> bfbdb844
> Call Trace:
>  [<c10fa54b>] drm_lock+0x255/0x2de
>  [<c10ff28d>] mga_dma_buffers+0x0/0x2e3
>  [<c10f87fc>] drm_ioctl+0x142/0x18a
>  [<c1005973>] do_IRQ+0x97/0xb0
>  [<c10f86ba>] drm_ioctl+0x0/0x18a
>  [<c10f86ba>] drm_ioctl+0x0/0x18a
>  [<c105b0d7>] do_ioctl+0x87/0x9f
>  [<c105b32c>] vfs_ioctl+0x23d/0x250
>  [<c11b533e>] schedule+0x2d0/0x2e6
>  [<c105b372>] sys_ioctl+0x33/0x4d
>  [<c1003d1e>] syscall_call+0x7/0xb
>  =======================
> Code: 9a 08 03 00 00 8b 73 30 74 14 c7 44 24 04 28 76 1c c1 c7 04 24 49
> 51 23 c1 e8 b0 74 f1 ff 8b 83 d8 00 00 00 83 3d 1c 47 30 c1 00 <8b> 40
> 10 8b a8 58 1e 00 00 8b 43 28 8b b8 64 01 00 00 74 32 8b
> EIP: [<c10ff416>] mga_dma_buffers+0x189/0x2e3 SS:ESP 0068:dddaaec0

dev->dev_private->mmio is NULL when trying to access mmio.handle

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-30 16:06                                               ` CFS review Chuck Ebbert
@ 2007-08-30 16:48                                                 ` Rene Herman
  0 siblings, 0 replies; 123+ messages in thread
From: Rene Herman @ 2007-08-30 16:48 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: keith.packard, Ingo Molnar, Al Boldi, Peter Zijlstra,
	Mike Galbraith, Andrew Morton, Linus Torvalds, linux-kernel,
	Dave Airlie

On 08/30/2007 06:06 PM, Chuck Ebbert wrote:

> On 08/29/2007 03:56 PM, Rene Herman wrote:

>> Before people focuss on software rendering too much -- also with 1.3.0
>> (and a Matrox Millenium G550 AGP, 32M) glxgears also works decidedly
>> crummy using hardware rendering. While I can move the glxgears window
>> itself, the actual spinning wheels stay in the upper-left corner of the
>> screen and the movement leaves a non-repainting trace on the screen.
>> Running a second instance of glxgears in addition seems to make both
>> instances unkillable -- and when I just now forcefully killed X in this
>> situation (the spinning wheels were covering the upper left corner of
>> all my desktops) I got the below.
>> 
>> Kernel is 2.6.22.5-cfs-v20.5, schedule() is in the traces (but that may
>> be expected anyway).

> And this doesn't happen at all with the stock scheduler? (Just confirming,
> in case you didn't compare.)

I didn't compare -- it no doubt will. I know the title of this thread is 
"CFS review" but it turned into Keith Packard noticing glxgears being broken 
on recent-ish X.org. The start of the thread was about things being broken 
using _software_ rendering though, so I thought it might be useful to 
remark/report glxgears also being quite broken using hardware rendering on 
my setup at least.

>> BUG: unable to handle kernel NULL pointer dereference at virtual address
>> 00000010
>>  printing eip:
>> c10ff416
>> *pde = 00000000
>> Oops: 0000 [#1]
>> PREEMPT
> 
> Try it without preempt?

If you're asking in an "I'll go debug the DRM" way, I'll go dig a bit later 
(please say so), but if you are only interested in the thread due to CFS, 
note that I'm aware it's not likely to have anything to do with CFS.

It's not reproducible for you? (full description of the bug above).

Rene.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-30  7:05                                               ` Rene Herman
  2007-08-30  7:20                                                 ` Ingo Molnar
@ 2007-08-31  6:46                                                 ` Tilman Sauerbeck
  2007-08-31 10:44                                                   ` DRM and/or X trouble (was Re: CFS review) Rene Herman
  1 sibling, 1 reply; 123+ messages in thread
From: Tilman Sauerbeck @ 2007-08-31  6:46 UTC (permalink / raw)
  To: Rene Herman
  Cc: keith.packard, Al Boldi, Peter Zijlstra, Mike Galbraith,
	linux-kernel, airlied, Ingo Molnar, dri-devel, Linus Torvalds,
	Andrew Morton

Rene Herman [2007-08-30 09:05]:
> On 08/29/2007 09:56 PM, Rene Herman wrote:
> 
> Realised the BUGs may mean the kernel DRM people could want to be in CC...
> 
> > On 08/29/2007 05:57 PM, Keith Packard wrote:
> > 
> >> With X server 1.3, I'm getting consistent crashes with two glxgear
> >> instances running. So, if you're getting any output, it's better than my
> >> situation.
> > 
> > Before people focuss on software rendering too much -- also with 1.3.0
> > (and a Matrox Millenium G550 AGP, 32M) glxgears also works decidedly
> > crummy using hardware rendering. While I can move the glxgears window
> > itself, the actual spinning wheels stay in the upper-left corner of the
> > screen and the movement leaves a non-repainting trace on the screen.

This sounds like you're running an older version of Mesa.
The bugfix went into Mesa 6.3 and 7.0.

> > Running a second instance of glxgears in addition seems to make both
> > instances unkillable -- and when I just now forcefully killed X in this
> > situation (the spinning wheels were covering the upper left corner of all
> > my desktops) I got the below.

Running two instances of glxgears and killing them works for me, too.

I'm using xorg-server 1.3.0.0, Mesa 7.0.1 with the latest DRM bits from
http://gitweb.freedesktop.org/?p=mesa/drm.git;a=summary

I'm not running CFS though, but I guess the oops wasn't related to that.

Regards,
Tilman

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


^ permalink raw reply	[flat|nested] 123+ messages in thread

* DRM and/or X trouble (was Re: CFS review)
  2007-08-31  6:46                                                 ` Tilman Sauerbeck
@ 2007-08-31 10:44                                                   ` Rene Herman
  2007-08-31 14:55                                                     ` DRM and/or X trouble Satyam Sharma
  0 siblings, 1 reply; 123+ messages in thread
From: Rene Herman @ 2007-08-31 10:44 UTC (permalink / raw)
  To: Tilman Sauerbeck
  Cc: keith.packard, Al Boldi, Peter Zijlstra, Mike Galbraith,
	linux-kernel, airlied, Ingo Molnar, dri-devel, Linus Torvalds,
	Andrew Morton

On 08/31/2007 08:46 AM, Tilman Sauerbeck wrote:

>> On 08/29/2007 09:56 PM, Rene Herman wrote:

>>>> With X server 1.3, I'm getting consistent crashes with two glxgear
>>>> instances running. So, if you're getting any output, it's better than my
>>>> situation.
>>> Before people focuss on software rendering too much -- also with 1.3.0
>>> (and a Matrox Millenium G550 AGP, 32M) glxgears also works decidedly
>>> crummy using hardware rendering. While I can move the glxgears window
>>> itself, the actual spinning wheels stay in the upper-left corner of the
>>> screen and the movement leaves a non-repainting trace on the screen.
> 
> This sounds like you're running an older version of Mesa.
> The bugfix went into Mesa 6.3 and 7.0.

I have Mesa 6.5.2 it seems (slackware-12.0 standard):

OpenGL renderer string: Mesa DRI G400 20061030 AGP 2x x86/MMX+/3DNow!+/SSE
OpenGL version string: 1.2 Mesa 6.5.2

The bit of the problem sketched above -- the gears just sitting there in the 
upper left corner of the screen and not moving alongside their window -- is 
fully reproducible. The bit below ... :

>>> Running a second instance of glxgears in addition seems to make both
>>> instances unkillable -- and when I just now forcefully killed X in this
>>> situation (the spinning wheels were covering the upper left corner of all
>>> my desktops) I got the below.

[ two kernel BUGs ]

... isn't. This seems to (again) have been a race of sorts that I hit by 
accident, since I haven't reproduced it yet. I had the same type of 
"raciness" trouble with keyboard behaviour in this version of X earlier.

> Running two instances of glxgears and killing them works for me, too.
> 
> I'm using xorg-server 1.3.0.0, Mesa 7.0.1 with the latest DRM bits from
> http://gitweb.freedesktop.org/?p=mesa/drm.git;a=summary

For me, everything standard slackware-12.0 (X.org 1.3.0) and kernel 2.6.22 DRM.

> I'm not running CFS though, but I guess the oops wasn't related to that.

I've noticed before that the Matrox driver seems to get little 
attention/testing, so maybe that's just it. A G550 is of course, in 
graphics-time, a Model T by now. I'm rather decidedly not a graphics person 
so I don't care a lot, but every time I try to do something fashionable (run 
Google Earth for example) I notice things are horribly, horribly broken.

X bugs I do not find very interesting (there are just too many), and the 
kernel bugs require more time to reproduce than I have available. If the 
BUGs as posted aren't enough for a diagnosis, please consider the report 
withdrawn.

Rene.


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: DRM and/or X trouble
  2007-08-31 10:44                                                   ` DRM and/or X trouble (was Re: CFS review) Rene Herman
@ 2007-08-31 14:55                                                     ` Satyam Sharma
  0 siblings, 0 replies; 123+ messages in thread
From: Satyam Sharma @ 2007-08-31 14:55 UTC (permalink / raw)
  To: Rene Herman; +Cc: Linux Kernel Mailing List, airlied, dri-devel

[ Trimmed Cc: list, dropped sched folk, retained DRM. ]


On Fri, 31 Aug 2007, Rene Herman wrote:

> On 08/31/2007 08:46 AM, Tilman Sauerbeck wrote:
> 
> > > On 08/29/2007 09:56 PM, Rene Herman wrote:
> 
> > > > > With X server 1.3, I'm getting consistent crashes with two glxgear
> > > > > instances running. So, if you're getting any output, it's better than
> > > > > my
> > > > > situation.
> > > > Before people focuss on software rendering too much -- also with 1.3.0
> > > > (and a Matrox Millenium G550 AGP, 32M) glxgears also works decidedly
> > > > crummy using hardware rendering. While I can move the glxgears window
> > > > itself, the actual spinning wheels stay in the upper-left corner of the
> > > > screen and the movement leaves a non-repainting trace on the screen.
> > 
> > This sounds like you're running an older version of Mesa.
> > The bugfix went into Mesa 6.3 and 7.0.
> 
> I have Mesa 6.5.2 it seems (slackware-12.0 standard):
> 
> OpenGL renderer string: Mesa DRI G400 20061030 AGP 2x x86/MMX+/3DNow!+/SSE
> OpenGL version string: 1.2 Mesa 6.5.2
> 
> The bit of the problem sketched above -- the gears just sitting there in the
> upper left corner of the screen and not moving alongside their window is fully
> reproduceable. The bit below ... :
> 
> > > > Running a second instance of glxgears in addition seems to make both
> > > > instances unkillable -- and when I just now forcefully killed X in this
> > > > situation (the spinning wheels were covering the upper left corner of
> > > > all
> > > > my desktops) I got the below.
> 
> [ two kernel BUGs ]
> 
> ... isn't. This seems to (again) have been a race of sorts that I hit by
> accident since I haven't reproduced yet. Had the same type of "racyness"
> trouble with keyboard behaviour in this version of X earlier.

Dave (Airlie), this is an oops first reported at:
http://lkml.org/lkml/2007/8/30/9

mga_freelist_get() is inlined at its only callsite, mga_dma_get_buffers(),
which is in turn inlined at its only callsite in mga_dma_buffers(). This
oops was hit ...

static struct drm_buf *mga_freelist_get(struct drm_device * dev)
{
	...
	head = MGA_READ(MGA_PRIMADDRESS); <=== ... HERE.
	...
}

MGA_READ() is DRM_READ32(), and dev_priv->mmio was found to be NULL when
trying to access dev_priv->mmio->handle as shown above.


> > Running two instances of glxgears and killing them works for me, too.
> > 
> > I'm using xorg-server 1.3.0.0, Mesa 7.0.1 with the latest DRM bits from
> > http://gitweb.freedesktop.org/?p=mesa/drm.git;a=summary
> 
> For me, everything standard slackware-12.0 (X.org 1.3.0) and kernel 2.6.22
> DRM.
> 
> > I'm not running CFS though, but I guess the oops wasn't related to that.
> 
> I've noticed before the Matrox driver seems to get little attention/testing so
> maybe that's just it. A G550 is ofcourse in graphics-time a Model T by now.
> I'm rather decidedly not a graphics person so I don't care a lot but every
> time I try to do something fashionable (run Google Earth for example) I notice
> things are horribly, horribly broken.
> 
> X bugs I do not find very interesting (there's just too many) and the kernel
> bugs are requiring more time to reproduce than I have available. If the BUGs
> as posted aren't enough for a diagnosis, please consider the report withdrawn.

As you already know by now, this oops isn't a bug or anything in the
scheduler at all, but more likely a race in the DRM itself (which possibly
could have been exposed by some aspect of CFS). So it makes sense to
"withdraw" this as a CFS-related bug report, but definitely not as a DRM-
related bug report.


Satyam

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-21  7:33                                                         ` Mike Galbraith
  2007-08-21  8:35                                                           ` Ingo Molnar
@ 2007-08-21 11:54                                                           ` Roman Zippel
  1 sibling, 0 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-21 11:54 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Willy Tarreau, Michael Chang, Linus Torvalds,
	Andi Kleen, Andrew Morton, linux-kernel

Hi,

On Tue, 21 Aug 2007, Mike Galbraith wrote:

> I thought this was history.  With your config, I was finally able to
> reproduce the anomaly (only with your proggy though), and Ingo's patch
> does indeed fix it here.
> 
> Freshly reproduced anomaly and patch verification, running 2.6.23-rc3
> with your config, both with and without Ingo's patch reverted:

I did update to 2.6.23-rc3-git1 first, but I ended up reverting the patch, 
as I didn't notice it had been applied already. Sorry about that.
With this patch the underflows are gone, but there are still the 
overflows, so the questions from the last mail still remain.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-21  7:33                                                         ` Mike Galbraith
@ 2007-08-21  8:35                                                           ` Ingo Molnar
  2007-08-21 11:54                                                           ` Roman Zippel
  1 sibling, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-21  8:35 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Roman Zippel, Willy Tarreau, Michael Chang, Linus Torvalds,
	Andi Kleen, Andrew Morton, linux-kernel


* Mike Galbraith <efault@gmx.de> wrote:

> > It doesn't make much of a difference.
> 
> I thought this was history.  With your config, I was finally able to 
> reproduce the anomaly (only with your proggy though), and Ingo's patch 
> does indeed fix it here.
> 
> Freshly reproduced anomaly and patch verification, running 2.6.23-rc3 
> with your config, both with and without Ingo's patch reverted:
> 
> 6561 root      20   0  1696  492  404 S 32.0  0.0   0:30.83 0 lt
> 6562 root      20   0  1696  336  248 R 32.0  0.0   0:30.79 0 lt
> 6563 root      20   0  1696  336  248 R 32.0  0.0   0:30.80 0 lt
> 6564 root      20   0  2888 1236 1028 R  4.6  0.1   0:05.26 0 sh
> 
> 6507 root      20   0  2888 1236 1028 R 25.8  0.1   0:30.75 0 sh
> 6504 root      20   0  1696  492  404 R 24.4  0.0   0:29.26 0 lt
> 6505 root      20   0  1696  336  248 R 24.4  0.0   0:29.26 0 lt
> 6506 root      20   0  1696  336  248 R 24.4  0.0   0:29.25 0 lt

oh, great! I'm glad we didnt discard this as a pure sched_clock 
resolution artifact.

Roman, a quick & easy request: please send the usual cfs-debug-info.sh 
output captured while your testcase is running. (Preferably try .23-rc3 
or later as Mike did, which has the most recent scheduler code; it 
includes the patch i sent to you already.) I'll reply to your 
sleeper-fairness questions separately, but in any case we need to figure 
out what's happening on your box - if you can still reproduce it with 
.23-rc3. Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-20 22:19                                                       ` Roman Zippel
@ 2007-08-21  7:33                                                         ` Mike Galbraith
  2007-08-21  8:35                                                           ` Ingo Molnar
  2007-08-21 11:54                                                           ` Roman Zippel
  0 siblings, 2 replies; 123+ messages in thread
From: Mike Galbraith @ 2007-08-21  7:33 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Willy Tarreau, Michael Chang, Linus Torvalds,
	Andi Kleen, Andrew Morton, linux-kernel

On Tue, 2007-08-21 at 00:19 +0200, Roman Zippel wrote: 
> Hi,
> 
> On Sat, 11 Aug 2007, Ingo Molnar wrote:
> 
> > the only relevant thing that comes to mind at the moment is that last 
> > week Peter noticed a buggy aspect of sleeper bonuses (in that we do not 
> > rate-limit their output, hence we 'waste' them instead of redistributing 
> > them), and i've got the small patch below in my queue to fix that - 
> > could you give it a try?
> 
> It doesn't make much of a difference.

I thought this was history.  With your config, I was finally able to
reproduce the anomaly (only with your proggy though), and Ingo's patch
does indeed fix it here.

Freshly reproduced anomaly and patch verification, running 2.6.23-rc3
with your config, both with and without Ingo's patch reverted:

6561 root      20   0  1696  492  404 S 32.0  0.0   0:30.83 0 lt
6562 root      20   0  1696  336  248 R 32.0  0.0   0:30.79 0 lt
6563 root      20   0  1696  336  248 R 32.0  0.0   0:30.80 0 lt
6564 root      20   0  2888 1236 1028 R  4.6  0.1   0:05.26 0 sh

6507 root      20   0  2888 1236 1028 R 25.8  0.1   0:30.75 0 sh
6504 root      20   0  1696  492  404 R 24.4  0.0   0:29.26 0 lt
6505 root      20   0  1696  336  248 R 24.4  0.0   0:29.26 0 lt
6506 root      20   0  1696  336  248 R 24.4  0.0   0:29.25 0 lt

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-11  0:30                                                     ` Ingo Molnar
@ 2007-08-20 22:19                                                       ` Roman Zippel
  2007-08-21  7:33                                                         ` Mike Galbraith
  0 siblings, 1 reply; 123+ messages in thread
From: Roman Zippel @ 2007-08-20 22:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Willy Tarreau, Michael Chang, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel

Hi,

On Sat, 11 Aug 2007, Ingo Molnar wrote:

> the only relevant thing that comes to mind at the moment is that last 
> week Peter noticed a buggy aspect of sleeper bonuses (in that we do not 
> rate-limit their output, hence we 'waste' them instead of redistributing 
> them), and i've got the small patch below in my queue to fix that - 
> could you give it a try?

It doesn't make much of a difference. OTOH if I disabled the sleeper code 
completely in __update_curr(), I get this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3139 roman     20   0  1796  344  256 R 21.7  0.3   0:02.68 lt
 3138 roman     20   0  1796  344  256 R 21.7  0.3   0:02.68 lt
 3137 roman     20   0  1796  520  432 R 21.7  0.4   0:02.68 lt
 3136 roman     20   0  1532  268  216 R 34.5  0.2   0:06.82 l

Disabling this code completely via sched_features makes only a minor 
difference:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3139 roman     20   0  1796  344  256 R 20.4  0.3   0:09.94 lt
 3138 roman     20   0  1796  344  256 R 20.4  0.3   0:09.94 lt
 3137 roman     20   0  1796  520  432 R 20.4  0.4   0:09.94 lt
 3136 roman     20   0  1532  268  216 R 39.1  0.2   0:19.20 l

> this is just a blind stab into the dark - i couldnt see any real impact 
> from that patch in various workloads (and it's not upstream yet), so it 
> might not make a big difference.

Can we please skip to the point where you try to explain the intention a 
little more?
If I had to guess that this is supposed to keep the runtime balance, then 
it would be better to use wait_runtime to adjust fair_clock, from where it 
would be evenly distributed to all tasks (but this would have to be done 
during enqueue and dequeue). OTOH this would then also have a consequence 
for the wait queue, as fair_clock is used to calculate fair_key.
IMHO the current wait_runtime should have some influence in calculating the 
sleep bonus, so that wait_runtime doesn't constantly overflow for tasks 
which only run occasionally.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-11  5:28                                                       ` Willy Tarreau
@ 2007-08-12  5:17                                                         ` Ingo Molnar
  0 siblings, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-12  5:17 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Roman Zippel, Michael Chang, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel


* Willy Tarreau <w@1wt.eu> wrote:

> > 1. Two simple busy loops, one of them is reniced to 15, according to 
> > my calculations the reniced task should get about 3.4% 
> > (1/(1.25^15+1)), but I get this:
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  4433 roman     20   0  1532  300  244 R 99.2  0.2   5:05.51 l
> >  4434 roman     35  15  1532   72   16 R  0.7  0.1   0:10.62 l
> 
> Could this be caused by typos in some tables like you have found in 
> wmult ?

note that the typo was not in the weight table but in the inverse weight 
table which didnt really affect CPU utilization (that's why we didnt 
notice the typo sooner). Regarding the above problem with nice +15 being 
beefier than intended i'd suggest to re-test with a doubled 
/proc/sys/kernel/sched_runtime_limit value, or with:

  echo 30 > /proc/sys/kernel/sched_features

(which turns off sleeper fairness)
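
(Doubling the limit in place can be done through the sysctl path given
above, for example:)

   cur=$(cat /proc/sys/kernel/sched_runtime_limit)
   echo $((cur * 2)) > /proc/sys/kernel/sched_runtime_limit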

> >  4477 roman      5 -15  1532   68   16 R  8.6  0.1   0:07.63 l
> >  4476 roman      5 -15  1532   68   16 R  9.6  0.1   0:07.38 l
> >  4475 roman      5 -15  1532   68   16 R  1.3  0.1   0:07.09 l
> >  4474 roman      5 -15  1532   68   16 R  2.3  0.1   0:07.97 l
> >  4473 roman      5 -15  1532  296  244 R  1.0  0.2   0:07.73 l
> 
> Do you see this only at -15, or starting with -15 and below ?

i think this was scheduling jitter caused by the larger granularity of 
negatively reniced tasks. This got improved recently, with latest -git i 
get:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3108 root       5 -15  1576  248  196 R  5.0  0.0   0:07.26 loop_silent
 3109 root       5 -15  1576  248  196 R  5.0  0.0   0:07.26 loop_silent
 3110 root       5 -15  1576  248  196 R  5.0  0.0   0:07.26 loop_silent
 3111 root       5 -15  1576  244  196 R  5.0  0.0   0:07.26 loop_silent
 3112 root       5 -15  1576  248  196 R  5.0  0.0   0:07.26 loop_silent
 3113 root       5 -15  1576  248  196 R  5.0  0.0   0:07.26 loop_silent

that's picture-perfect CPU time distribution. But, and that's fair to 
say, i never ran such an artificial workload of 20x nice -15 infinite 
loops (!) before, and boy does interactivity suck (as expected) ;)

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 22:50                                                     ` Roman Zippel
@ 2007-08-11  5:28                                                       ` Willy Tarreau
  2007-08-12  5:17                                                         ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Willy Tarreau @ 2007-08-11  5:28 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Michael Chang, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel

On Sat, Aug 11, 2007 at 12:50:08AM +0200, Roman Zippel wrote:
> Hi,
> 
> On Fri, 10 Aug 2007, Ingo Molnar wrote:
> 
> > achieve that. It probably wont make a real difference, but it's really 
> > easy for you to send and it's still very useful when one tries to 
> > eliminate possibilities and when one wants to concentrate on the 
> > remaining possibilities alone.
> 
> The thing I'm afraid about CFS is its possible unpredictability, which 
> would make it hard to reproduce problems and we may end up with users with 
> unexplainable weird problems. That's the main reason I'm trying so hard to 
> push for a design discussion.

You may be interested in looking at the very early CFS versions. The design
was much more naive and understandable. After that, a lot of tricks were
added to take into account many uses and corner cases, which may not help
in understanding it globally.

> Just to give an idea here are two more examples of irregular behaviour, 
> which are hopefully easier to reproduce.
> 
> 1. Two simple busy loops, one of them is reniced to 15, according to my 
> calculations the reniced task should get about 3.4% (1/(1.25^15+1)), but I 
> get this:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4433 roman     20   0  1532  300  244 R 99.2  0.2   5:05.51 l
>  4434 roman     35  15  1532   72   16 R  0.7  0.1   0:10.62 l

Could this be caused by typos in some tables like you have found in wmult ?

> OTOH upto nice level 12 I get what I expect.
> 
> 2. If I start 20 busy loops, initially I see in top that every task gets 
> 5% and time increments equally (as it should):
(...)

> But if I renice all of them to -15, the time every task gets is rather 
> random:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4492 roman      5 -15  1532   68   16 R  1.0  0.1   0:07.95 l
>  4491 roman      5 -15  1532   68   16 R  4.3  0.1   0:07.62 l
>  4490 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.50 l
>  4489 roman      5 -15  1532   68   16 R  7.6  0.1   0:07.80 l
>  4488 roman      5 -15  1532   68   16 R  9.6  0.1   0:08.31 l
>  4487 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.59 l
>  4486 roman      5 -15  1532   68   16 R  6.6  0.1   0:07.08 l
>  4485 roman      5 -15  1532   68   16 R 10.0  0.1   0:07.31 l
>  4484 roman      5 -15  1532   68   16 R  8.0  0.1   0:07.30 l
>  4483 roman      5 -15  1532   68   16 R  7.0  0.1   0:07.34 l
>  4482 roman      5 -15  1532   68   16 R  1.0  0.1   0:05.84 l
>  4481 roman      5 -15  1532   68   16 R  1.0  0.1   0:07.16 l
>  4480 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.00 l
>  4479 roman      5 -15  1532   68   16 R  1.0  0.1   0:06.66 l
>  4478 roman      5 -15  1532   68   16 R  8.6  0.1   0:06.96 l
>  4477 roman      5 -15  1532   68   16 R  8.6  0.1   0:07.63 l
>  4476 roman      5 -15  1532   68   16 R  9.6  0.1   0:07.38 l
>  4475 roman      5 -15  1532   68   16 R  1.3  0.1   0:07.09 l
>  4474 roman      5 -15  1532   68   16 R  2.3  0.1   0:07.97 l
>  4473 roman      5 -15  1532  296  244 R  1.0  0.2   0:07.73 l

Do you see this only at -15, or starting with -15 and below?

Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 21:15                                                 ` Roman Zippel
  2007-08-10 21:36                                                   ` Ingo Molnar
@ 2007-08-11  5:15                                                   ` Willy Tarreau
  1 sibling, 0 replies; 123+ messages in thread
From: Willy Tarreau @ 2007-08-11  5:15 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Michael Chang, Ingo Molnar, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel

On Fri, Aug 10, 2007 at 11:15:55PM +0200, Roman Zippel wrote:
> Hi,
> 
> On Fri, 10 Aug 2007, Willy Tarreau wrote:
> 
> > fortunately all bug reporters are not like you. It's amazing how long
> > you can resist sending a simple bug report to a developer!
> 
> I'm more amazed how long Ingo can resist providing some explanations (not 
> just about this problem).

It's a matter of time balance. It takes a short time to send the output
of a script, and it takes a very long time to explain how things work.
I often encounter the same situation with haproxy. People ask me to
explain to them in detail how this or that would apply to their context, and
it's often easier for me to provide them with a 5-line patch to add the
feature they need than to spend half an hour explaining why and how it
would misbehave.

> It's not like I haven't given him anything, he already has the test 
> programs, he already knows the system configuration.
> Well, I've sent him the stuff now...

Fine, thanks.

> > Maybe you
> > consider that you need to fix the bug by yourself after you understand
> > the code,
> 
> Fixing the bug requires some knowledge what the code is intended to do.
> 
> > Please try to be a little bit more transparent if you really want the
> > bugs fixed, and don't behave as if you wanted this bug to survive
> > till -final.
> 
> Could you please ask Ingo the same? I'm simply trying to get some 
> transparancy into the CFS design. Without further information it's 
> difficult to tell, whether something is supposed to work this way or it's 
> a bug.

I know that Ingo tends to reply to a question with another question. But
as I said, imagine if he had to explain the same things to each person
who asks. I think that a more constructive approach would be to point out
what is missing/unclear/inexact in the doc, so that he can add some
paragraphs for you and everyone else. If you need this information to debug,
most likely other people will need it too.

> In this case it's quite possible that due to a recent change my testcase 
> doesn't work anymore. Should I consider the problem fixed or did it just 
> go into hiding? Without more information it's difficult to verify this 
> independently.

Generally, problems that appear only on one person's side and then suddenly
disappear are either caused by some random buggy patch left in the tree (not
your case, it seems), or by an obscure bug in the feature being tested, which
will resurface from time to time as long as it's not identified.

Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 21:36                                                   ` Ingo Molnar
  2007-08-10 22:50                                                     ` Roman Zippel
@ 2007-08-11  0:30                                                     ` Ingo Molnar
  2007-08-20 22:19                                                       ` Roman Zippel
  1 sibling, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-11  0:30 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Willy Tarreau, Michael Chang, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> * Roman Zippel <zippel@linux-m68k.org> wrote:
> 
> > Well, I've sent him the stuff now...
> 
> received it - thanks alot, looking at it!

Everything looks good in your debug output and the TSC dump data, except 
for the wait_runtime values: they are quite out of balance - and that 
imbalance cannot be explained by jiffies granularity or by any sort of 
sched_clock() artifact. So this clearly looks like a CFS regression that 
should be fixed.

The only relevant thing that comes to mind at the moment is that last 
week Peter noticed a buggy aspect of sleeper bonuses (in that we do not 
rate-limit their output, hence we 'waste' them instead of redistributing 
them), and I've got the small patch below in my queue to fix that - 
could you give it a try?

This is just a blind stab in the dark - I couldn't see any real impact 
from that patch in various workloads (and it's not upstream yet), so it 
might not make a big difference. The trace you did (could you send the 
source for that?) seems to implicate sleeper bonuses, though.

If this patch doesn't help, could you check the general theory of whether 
it's related to sleeper fairness, by turning it off:

   echo 30 > /proc/sys/kernel/sched_features

Does the bug go away if you do that? If sleeper bonuses are showing too 
many artifacts then we could turn them off for the final .23.

	Ingo

--------------------->
Subject: sched: fix sleeper bonus
From: Ingo Molnar <mingo@elte.hu>

Peter Zijlstra noticed that the sleeper bonus deduction code was not 
properly rate-limited: a task that scheduled more frequently would get a 
disproportionately large deduction. So limit the deduction to delta_exec 
and limit production to runtime_limit.

Not-Yet-Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -75,7 +75,7 @@ enum {
 
 unsigned int sysctl_sched_features __read_mostly =
 		SCHED_FEAT_FAIR_SLEEPERS	*1 |
-		SCHED_FEAT_SLEEPER_AVG		*1 |
+		SCHED_FEAT_SLEEPER_AVG		*0 |
 		SCHED_FEAT_SLEEPER_LOAD_AVG	*1 |
 		SCHED_FEAT_PRECISE_CPU_LOAD	*1 |
 		SCHED_FEAT_START_DEBIT		*1 |
@@ -304,11 +304,9 @@ __update_curr(struct cfs_rq *cfs_rq, str
 	delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);
 
 	if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) {
-		delta = calc_delta_mine(cfs_rq->sleeper_bonus,
-					curr->load.weight, lw);
-		if (unlikely(delta > cfs_rq->sleeper_bonus))
-			delta = cfs_rq->sleeper_bonus;
-
+		delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec);
+		delta = calc_delta_mine(delta, curr->load.weight, lw);
+		delta = min((u64)delta, cfs_rq->sleeper_bonus);
 		cfs_rq->sleeper_bonus -= delta;
 		delta_mine -= delta;
 	}
@@ -521,6 +519,8 @@ static void __enqueue_sleeper(struct cfs
 	 * Track the amount of bonus we've given to sleepers:
 	 */
 	cfs_rq->sleeper_bonus += delta_fair;
+	if (unlikely(cfs_rq->sleeper_bonus > sysctl_sched_runtime_limit))
+		cfs_rq->sleeper_bonus = sysctl_sched_runtime_limit;
 
 	schedstat_add(cfs_rq, wait_runtime, se->wait_runtime);
 }
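
For illustration only, a small self-contained userspace sketch (not kernel 
code; the numbers are made up and calc_delta_mine() is approximated as a 
plain weight ratio) of what the clamping above changes. The point is that the 
new code never deducts more than the time the task actually executed since 
the last update:

#include <stdio.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* crude stand-in for calc_delta_mine(): scale delta by weight/total_weight */
static uint64_t scale(uint64_t delta, uint64_t weight, uint64_t total)
{
        return delta * weight / total;
}

int main(void)
{
        uint64_t sleeper_bonus = 4000000;   /* 4 ms of accumulated bonus (made up) */
        uint64_t delta_exec    =  200000;   /* the task only ran 0.2 ms this time  */
        uint64_t weight = 1024, total_weight = 2048;

        /* old code: the deduction scales with the whole accumulated bonus */
        uint64_t old_deduction =
                min_u64(scale(sleeper_bonus, weight, total_weight), sleeper_bonus);

        /* new code: the deduction can never exceed what was actually executed */
        uint64_t new_deduction =
                min_u64(scale(min_u64(sleeper_bonus, delta_exec), weight, total_weight),
                        sleeper_bonus);

        printf("old: %llu ns, new: %llu ns\n",
               (unsigned long long)old_deduction,
               (unsigned long long)new_deduction);   /* old: 2000000, new: 100000 */
        return 0;
}

With the old logic a 2 ms deduction would be charged against a slice in which 
the task only ran 0.2 ms; the clamped version caps it at the weighted share of 
the time actually executed.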

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 21:36                                                   ` Ingo Molnar
@ 2007-08-10 22:50                                                     ` Roman Zippel
  2007-08-11  5:28                                                       ` Willy Tarreau
  2007-08-11  0:30                                                     ` Ingo Molnar
  1 sibling, 1 reply; 123+ messages in thread
From: Roman Zippel @ 2007-08-10 22:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Willy Tarreau, Michael Chang, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel

Hi,

On Fri, 10 Aug 2007, Ingo Molnar wrote:

> achieve that. It probably wont make a real difference, but it's really 
> easy for you to send and it's still very useful when one tries to 
> eliminate possibilities and when one wants to concentrate on the 
> remaining possibilities alone.

The thing I'm afraid of with CFS is its possible unpredictability, which 
would make it hard to reproduce problems, and we may end up with users having 
unexplainable, weird problems. That's the main reason I'm trying so hard to 
push for a design discussion.

Just to give an idea, here are two more examples of irregular behaviour, 
which are hopefully easier to reproduce.

1. Two simple busy loops, one of them reniced to 15. According to my 
calculations the reniced task should get about 3.4% (1/(1.25^15+1)), but I 
get this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4433 roman     20   0  1532  300  244 R 99.2  0.2   5:05.51 l
 4434 roman     35  15  1532   72   16 R  0.7  0.1   0:10.62 l

OTOH up to nice level 12 I get what I expect. (A short sketch of the 
expected-share arithmetic is appended after the second example below.)

2. If I start 20 busy loops, initially I see in top that every task gets 
5% and time increments equally (as it should):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4492 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4491 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4490 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4489 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4488 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4487 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4486 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4485 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4484 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4483 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4482 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4481 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4480 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4479 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4478 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4477 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4476 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4475 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4474 roman     20   0  1532   68   16 R  5.0  0.1   0:02.86 l
 4473 roman     20   0  1532  296  244 R  5.0  0.2   0:02.86 l

But if I renice all of them to -15, the time every task gets is rather 
random:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4492 roman      5 -15  1532   68   16 R  1.0  0.1   0:07.95 l
 4491 roman      5 -15  1532   68   16 R  4.3  0.1   0:07.62 l
 4490 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.50 l
 4489 roman      5 -15  1532   68   16 R  7.6  0.1   0:07.80 l
 4488 roman      5 -15  1532   68   16 R  9.6  0.1   0:08.31 l
 4487 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.59 l
 4486 roman      5 -15  1532   68   16 R  6.6  0.1   0:07.08 l
 4485 roman      5 -15  1532   68   16 R 10.0  0.1   0:07.31 l
 4484 roman      5 -15  1532   68   16 R  8.0  0.1   0:07.30 l
 4483 roman      5 -15  1532   68   16 R  7.0  0.1   0:07.34 l
 4482 roman      5 -15  1532   68   16 R  1.0  0.1   0:05.84 l
 4481 roman      5 -15  1532   68   16 R  1.0  0.1   0:07.16 l
 4480 roman      5 -15  1532   68   16 R  3.3  0.1   0:07.00 l
 4479 roman      5 -15  1532   68   16 R  1.0  0.1   0:06.66 l
 4478 roman      5 -15  1532   68   16 R  8.6  0.1   0:06.96 l
 4477 roman      5 -15  1532   68   16 R  8.6  0.1   0:07.63 l
 4476 roman      5 -15  1532   68   16 R  9.6  0.1   0:07.38 l
 4475 roman      5 -15  1532   68   16 R  1.3  0.1   0:07.09 l
 4474 roman      5 -15  1532   68   16 R  2.3  0.1   0:07.97 l
 4473 roman      5 -15  1532  296  244 R  1.0  0.2   0:07.73 l
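
For the first example above, a trivial standalone cross-check of the expected 
share, using only the roughly 1.25x-per-nice-level weight ratio quoted in 
this mail (the kernel's actual prio_to_weight table is not reproduced here):

#include <math.h>
#include <stdio.h>

int main(void)
{
        /* relative load weights, assuming each nice level is worth ~1.25x */
        double w_nice0  = 1.0;
        double w_nice15 = 1.0 / pow(1.25, 15);

        /* expected CPU share of the reniced loop when both loops are runnable */
        printf("nice-15 share: %.2f%%\n",
               100.0 * w_nice15 / (w_nice0 + w_nice15));    /* ~3.40% */
        return 0;
}

(Build with -lm.) This prints roughly 3.40%, i.e. the 3.4% figure computed 
above, which makes the observed 0.7% look clearly off.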

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 21:15                                                 ` Roman Zippel
@ 2007-08-10 21:36                                                   ` Ingo Molnar
  2007-08-10 22:50                                                     ` Roman Zippel
  2007-08-11  0:30                                                     ` Ingo Molnar
  2007-08-11  5:15                                                   ` Willy Tarreau
  1 sibling, 2 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-10 21:36 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Willy Tarreau, Michael Chang, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> Well, I've sent him the stuff now...

Received it - thanks a lot, looking at it!

> It's not like I haven't given him anything, he already has the test 
> programs, he already knows the system configuration.

One more small thing: could you please send your exact .config? (Mike 
asked for that too, and I did as well on two prior occasions.) Sometimes 
unexpected little details in the .config make a difference. We are not 
asking for that because we are second-guessing you in any way; the 
reason is simple: I frequently boot _the very .config that others use_, 
and see surprising reproducibility of bugs that I couldn't trigger 
before. It's standard procedure to just pick up the .config of others to 
eliminate a whole bunch of degrees of freedom for a bug to hide behind - 
and your "it's a pretty standard config" description doesn't really 
achieve that. It probably won't make a real difference, but it's really 
easy for you to send and it's still very useful when one tries to 
eliminate possibilities and wants to concentrate on the remaining 
possibilities alone. Thanks again,

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 19:47                                               ` Willy Tarreau
@ 2007-08-10 21:15                                                 ` Roman Zippel
  2007-08-10 21:36                                                   ` Ingo Molnar
  2007-08-11  5:15                                                   ` Willy Tarreau
  0 siblings, 2 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-10 21:15 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Michael Chang, Ingo Molnar, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel

Hi,

On Fri, 10 Aug 2007, Willy Tarreau wrote:

> fortunately all bug reporters are not like you. It's amazing how long
> you can resist sending a simple bug report to a developer!

I'm more amazed at how long Ingo can resist providing some explanations (not 
just about this problem).
It's not like I haven't given him anything: he already has the test 
programs, and he already knows the system configuration.
Well, I've sent him the stuff now...

> Maybe you
> consider that you need to fix the bug by yourself after you understand
> the code,

Fixing the bug requires some knowledge of what the code is intended to do.

> Please try to be a little bit more transparent if you really want the
> bugs fixed, and don't behave as if you wanted this bug to survive
> till -final.

Could you please ask Ingo the same? I'm simply trying to get some 
transparency into the CFS design. Without further information it's 
difficult to tell whether something is supposed to work this way or 
whether it's a bug.

In this case it's quite possible that due to a recent change my testcase 
doesn't work anymore. Should I consider the problem fixed or did it just 
go into hiding? Without more information it's difficult to verify this 
independently.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 17:25                                             ` Roman Zippel
  2007-08-10 19:44                                               ` Ingo Molnar
@ 2007-08-10 19:47                                               ` Willy Tarreau
  2007-08-10 21:15                                                 ` Roman Zippel
  1 sibling, 1 reply; 123+ messages in thread
From: Willy Tarreau @ 2007-08-10 19:47 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Michael Chang, Ingo Molnar, Linus Torvalds, Andi Kleen,
	Mike Galbraith, Andrew Morton, linux-kernel

On Fri, Aug 10, 2007 at 07:25:57PM +0200, Roman Zippel wrote:
> Hi,
> 
> On Fri, 10 Aug 2007, Michael Chang wrote:
> 
> > On 8/10/07, Roman Zippel <zippel@linux-m68k.org> wrote:
> > > Is there any reason to believe my analysis is wrong?
> > 
> > Not yet, but if you give Ingo what he wants (as opposed to what you're
> > giving him) it'll be easier for him to answer what's going wrong, and
> > perhaps "fix" the problem to boot.
> > 
> > (The scripts gives info about CPU characteristics, interrupts,
> > modules, etc. -- you know, all those "unknown" variables.)
> 
> He already has most of this information and the trace shows _exactly_ 
> what's going on. All this information should be more than enough to allow 
> an initial judgement whether my analysis is correct.
> Also none of this information is needed to explain the CFS logic a little 
> more, which I'm still waiting for...

Roman,

Fortunately, not all bug reporters are like you. It's amazing how long
you can resist sending a simple bug report to a developer! Maybe you
consider that you need to fix the bug by yourself after you understand
the code, but if you systematically refuse to return the small pieces of
information Ingo asks you for, we will have to wait for more cooperative
users to be hit by the same bug when 2.6.23 is released, which is stupid.

I thought you could at least understand that a developer who is used
to reading traces from the same tool every day will be far faster at
decoding a trace from that tool than at trying to figure out what your
self-made dump means.

It's for exactly the same reason that I ask for pcap files when people
send me tcpdump output without the information I *need*.

If you definitely do not want to cooperate, stop asking for a personal
explanation and go figure out by yourself how the code works. BTW, in the
trace you "kindly offered" in exchange for the cfs-debug-info dump,
you show several useful variables, but nothing says where they are
captured. And as you can see, they're changing. That's a fantastic
trace for a developer, really...

Please try to be a little bit more transparent if you really want the
bugs fixed, and don't behave as if you wanted this bug to survive
till -final.

Thanks,
Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 17:25                                             ` Roman Zippel
@ 2007-08-10 19:44                                               ` Ingo Molnar
  2007-08-10 19:47                                               ` Willy Tarreau
  1 sibling, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-10 19:44 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Michael Chang, Linus Torvalds, Andi Kleen, Mike Galbraith,
	Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > Not yet, but if you give Ingo what he wants (as opposed to what 
> > you're giving him) it'll be easier for him to answer what's going 
> > wrong, and perhaps "fix" the problem to boot.
> > 
> > (The scripts gives info about CPU characteristics, interrupts, 
> > modules, etc. -- you know, all those "unknown" variables.)
> 
> He already has most of this information and the trace shows _exactly_ 
> what's going on. [...]

I'll need the other bits of information too, to have a complete picture 
of what's going on while your test is running - to maximize the 
chances of me being able to fix it. I'm a bit perplexed (and a bit 
worried) about this - you've spent _far_ more effort on _not sending_ that 
script output (captured while the workload is running) than it would 
have taken to just do it :-/ If you'd like me to fix bugs then please just 
send it (in private mail if you want) - or give me an ssh login to that 
box - whichever variant you prefer. Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 16:54                                           ` Michael Chang
@ 2007-08-10 17:25                                             ` Roman Zippel
  2007-08-10 19:44                                               ` Ingo Molnar
  2007-08-10 19:47                                               ` Willy Tarreau
  0 siblings, 2 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-10 17:25 UTC (permalink / raw)
  To: Michael Chang
  Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Mike Galbraith,
	Andrew Morton, linux-kernel

Hi,

On Fri, 10 Aug 2007, Michael Chang wrote:

> On 8/10/07, Roman Zippel <zippel@linux-m68k.org> wrote:
> > Is there any reason to believe my analysis is wrong?
> 
> Not yet, but if you give Ingo what he wants (as opposed to what you're
> giving him) it'll be easier for him to answer what's going wrong, and
> perhaps "fix" the problem to boot.
> 
> (The scripts gives info about CPU characteristics, interrupts,
> modules, etc. -- you know, all those "unknown" variables.)

He already has most of this information, and the trace shows _exactly_ 
what's going on. All this information should be more than enough to allow 
an initial judgement of whether my analysis is correct.
Also, none of this information is needed to explain the CFS logic a little 
more, which I'm still waiting for...

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 16:47                                           ` Mike Galbraith
@ 2007-08-10 17:19                                             ` Roman Zippel
  0 siblings, 0 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-10 17:19 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel

Hi,

On Fri, 10 Aug 2007, Mike Galbraith wrote:

> I guess I'm going to have to give up on trying to reproduce this... my
> 3GHz P4 is just not getting there from here.  Last attempt, compiled UP,
> HZ=1000 dynticks, full preempt and highres timers fwiw.
> 
> 6392 root      20   0  1696  332  248 R 25.5  0.0   3:00.14 0 lt
> 6393 root      20   0  1696  332  248 R 24.9  0.0   3:00.15 0 lt
> 6391 root      20   0  1696  488  404 R 24.7  0.0   3:00.20 0 lt
> 6394 root      20   0  2888 1232 1028 R 24.5  0.1   2:58.58 0 sh

Except for UP and HZ=1000, everything else is pretty much turned off.
If you use a very recent kernel, the problem may not be visible like this 
anymore.
It may be a bit easier to reproduce if you change the end time t0 in lt.c 
a little. Also, try to start the busy loop first.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 13:52                                         ` Roman Zippel
  2007-08-10 14:18                                           ` Ingo Molnar
  2007-08-10 16:47                                           ` Mike Galbraith
@ 2007-08-10 16:54                                           ` Michael Chang
  2007-08-10 17:25                                             ` Roman Zippel
  2 siblings, 1 reply; 123+ messages in thread
From: Michael Chang @ 2007-08-10 16:54 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Mike Galbraith,
	Andrew Morton, linux-kernel

On 8/10/07, Roman Zippel <zippel@linux-m68k.org> wrote:
> Is there any reason to believe my analysis is wrong?

Not yet, but if you give Ingo what he wants (as opposed to what you're
giving him) it'll be easier for him to answer what's going wrong, and
perhaps "fix" the problem to boot.

(The script gives info about CPU characteristics, interrupts,
modules, etc. -- you know, all those "unknown" variables.)

And perhaps a patch to show what parts you commented out, too, so one
can tell if anything got broken (unintentionally).

-- 
Michael Chang

Please avoid sending me Word or PowerPoint attachments. Send me ODT,
RTF, or HTML instead.
See http://www.gnu.org/philosophy/no-word-attachments.html
Thank you.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 13:52                                         ` Roman Zippel
  2007-08-10 14:18                                           ` Ingo Molnar
@ 2007-08-10 16:47                                           ` Mike Galbraith
  2007-08-10 17:19                                             ` Roman Zippel
  2007-08-10 16:54                                           ` Michael Chang
  2 siblings, 1 reply; 123+ messages in thread
From: Mike Galbraith @ 2007-08-10 16:47 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel

I guess I'm going to have to give up on trying to reproduce this... my
3GHz P4 is just not getting there from here.  Last attempt, compiled UP,
HZ=1000 dynticks, full preempt and highres timers fwiw.

6392 root      20   0  1696  332  248 R 25.5  0.0   3:00.14 0 lt
6393 root      20   0  1696  332  248 R 24.9  0.0   3:00.15 0 lt
6391 root      20   0  1696  488  404 R 24.7  0.0   3:00.20 0 lt
6394 root      20   0  2888 1232 1028 R 24.5  0.1   2:58.58 0 sh

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10 13:52                                         ` Roman Zippel
@ 2007-08-10 14:18                                           ` Ingo Molnar
  2007-08-10 16:47                                           ` Mike Galbraith
  2007-08-10 16:54                                           ` Michael Chang
  2 siblings, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-10 14:18 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > Also, could you please send me 
> > the cfs-debug-info.sh:
> > 
> >   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> > 
> > captured _while_ the above workload is running. This is the third time 
> > i've asked for that :-)
> 
> Is there any reason to believe my analysis is wrong?

Please first give me the debug data captured with the script above 
(while the workload is running) - so that I can see the full picture 
of what's happening. Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-10  5:49                                       ` Ingo Molnar
@ 2007-08-10 13:52                                         ` Roman Zippel
  2007-08-10 14:18                                           ` Ingo Molnar
                                                             ` (2 more replies)
  0 siblings, 3 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-10 13:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel

Hi,

On Fri, 10 Aug 2007, Ingo Molnar wrote:

> > I disabled the jiffies logic and the result is still the same, so this 
> > problem isn't related to resolution at all.
> 
> how did you disable the jiffies logic?

I commented it out.

> Also, could you please send me 
> the cfs-debug-info.sh:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> 
> captured _while_ the above workload is running. This is the third time 
> i've asked for that :-)

Is there any reason to believe my analysis is wrong?
So far you haven't answered a single question about the CFS design...

Anyway, I'll give you something better - the raw trace data for 2 ms:

1186747669.274790012: update_curr 0xc7fb06f0,479587,319708,21288884188,159880,7360532
1186747669.274790375: dequeue_entity 0xc7fb06f0,21280402988,159880
1186747669.274792580: sched 2848,2846,0xc7432cb0,-7520413
1186747669.274820987: update_curr 0xc7432ce0,29302,-130577,21288913490,1,-7680293
1186747669.274821269: dequeue_entity 0xc7432ce0,21296077409,1
1186747669.274821930: enqueue_entity 0xc7432ce0,21296593783,1
1186747669.274826979: update_curr 0xc7432ce0,5707,5707,21288919197,1,-7680294
1186747669.274827724: enqueue_entity 0xc7432180,21280919197,639451
1186747669.274829948: update_curr 0xc7432ce0,1553,-318172,21288920750,319726,-8000000
1186747669.274831878: sched 2846,2847,0xc7432150,8000000
1186747669.275789883: update_curr 0xc7432180,479797,319935,21289400547,159864,7360339
1186747669.275790295: dequeue_entity 0xc7432180,21280919197,159864
1186747669.275792439: sched 2847,2846,0xc7432cb0,-7520203
1186747669.275820819: update_curr 0xc7432ce0,29238,-130625,21289429785,1,-7680067
1186747669.275821109: dequeue_entity 0xc7432ce0,21296593783,1
1186747669.275821763: enqueue_entity 0xc7432ce0,21297109852,1
1186747669.275826887: update_curr 0xc7432ce0,5772,5772,21289435557,1,-7680068
1186747669.275827652: enqueue_entity 0xc7fb0ca0,21281435557,639881
1186747669.275829826: update_curr 0xc7432ce0,1549,-318391,21289437106,319941,-8000000
1186747669.275831584: sched 2846,2849,0xc7fb0c70,8000000

About the values:

update_curr: sched_entity, delta_fair, delta_mine, fair_clock, sleeper_bonus, wait_runtime
	(final values at the end of __update_curr)
{en,de}queue_entity: sched_entity, fair_key, sleeper_bonus
	(at the start of __enqueue_entity/__dequeue_entity)
sched: prev_pid,pid,current,wait_runtime
	(at the end of scheduling, note that current has a small structure 
	offset to sched_entity)
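
Since the field layout above is fixed, the dump is easy to post-process; a 
minimal sketch of a filter that picks out the update_curr records (only the 
field meanings described above are assumed):

#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[256], ev[32];
        double ts;
        unsigned long se;
        long long v[5];

        while (fgets(line, sizeof(line), stdin)) {
                /* timestamp: event 0x<se>,v0,v1,v2,v3,v4 */
                int n = sscanf(line, "%lf: %31s 0x%lx,%lld,%lld,%lld,%lld,%lld",
                               &ts, ev, &se, &v[0], &v[1], &v[2], &v[3], &v[4]);
                if (n == 8 && strcmp(ev, "update_curr") == 0)
                        printf("%.6f se=0x%lx wait_runtime=%lld\n", ts, se, v[4]);
        }
        return 0;
}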

It starts with a timer task going to sleep; the busy loop runs for a few 
usec until the timer tick, when the next timer task is woken up (sleeper_bonus 
is increased). Before the tasks are switched, the current task is updated and 
is punished with the sleeper_bonus.

These tests were done without the recent updates, but they don't seem to
change the basic logic. AFAICT the change to __update_curr() only makes it 
more unpredictable which task is punished with the sleeper_bonus.

So again, what is this logic _supposed_ to do?

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-09 23:14                                     ` Roman Zippel
  2007-08-10  5:49                                       ` Ingo Molnar
@ 2007-08-10  7:23                                       ` Mike Galbraith
  1 sibling, 0 replies; 123+ messages in thread
From: Mike Galbraith @ 2007-08-10  7:23 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel

On Fri, 2007-08-10 at 01:14 +0200, Roman Zippel wrote:
> Hi,

Greetings,

> On Wed, 1 Aug 2007, Ingo Molnar wrote:
> 
> > just to make sure, how does 'top' output of the l + "lt 3" testcase look 
> > like now on your laptop? Yesterday it was this:
> > 
> >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > 
> > and i'm still wondering how that output was possible.
> 
> I disabled the jiffies logic and the result is still the same, so this 
> problem isn't related to resolution at all.
> I traced it a little and what's happing is that the busy loop really only 
> gets little time, it only runs inbetween the timer tasks. When the timer 
> task is woken up __enqueue_sleeper() updates sleeper_bonus and a little 
> later when the busy loop is preempted __update_curr() is called a last 
> time and it's fully hit by the sleeper_bonus. So the timer tasks use less 
> time than they actually get and thus produce overflows, the busy loop OTOH 
> is punished and underflows.

I still can't reproduce this here.  Can you please send your .config, so
I can try again with a config as close to yours as possible?

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-09 23:14                                     ` Roman Zippel
@ 2007-08-10  5:49                                       ` Ingo Molnar
  2007-08-10 13:52                                         ` Roman Zippel
  2007-08-10  7:23                                       ` Mike Galbraith
  1 sibling, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-10  5:49 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > 
> > and i'm still wondering how that output was possible.
> 
> I disabled the jiffies logic and the result is still the same, so this 
> problem isn't related to resolution at all.

How did you disable the jiffies logic? Also, could you please send me 
the cfs-debug-info.sh output:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

captured _while_ the above workload is running? This is the third time 
I've asked for that :-)

To establish that the basic sched_clock() behavior is sound on that box, 
could you please also run this tool:

   http://people.redhat.com/mingo/cfs-scheduler/tools/tsc-dump.c

Please run it both while the system is idle and while there's a CPU hog 
running:

  while :; do :; done &

and send me that output too? (It's only 2x 60 lines.) Thanks!

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 19:05                                   ` Ingo Molnar
@ 2007-08-09 23:14                                     ` Roman Zippel
  2007-08-10  5:49                                       ` Ingo Molnar
  2007-08-10  7:23                                       ` Mike Galbraith
  0 siblings, 2 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-09 23:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> just to make sure, how does 'top' output of the l + "lt 3" testcase look 
> like now on your laptop? Yesterday it was this:
> 
>  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
>  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
>  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
>  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> 
> and i'm still wondering how that output was possible.

I disabled the jiffies logic and the result is still the same, so this 
problem isn't related to resolution at all.
I traced it a little, and what's happening is that the busy loop really gets 
only a little time; it only runs in between the timer tasks. When the timer 
task is woken up, __enqueue_sleeper() updates sleeper_bonus, and a little 
later, when the busy loop is preempted, __update_curr() is called a last 
time and is fully hit by the sleeper_bonus. So the timer tasks use less 
time than they actually get and thus produce overflows, while the busy loop 
OTOH is punished and underflows.
So it seems my initial suspicion was right and this logic is dodgy; what 
is it actually supposed to do? Why is some random task charged with the 
sleeper_bonus?

bye, Roman

PS: Can I still expect an answer about all the other stuff?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-03  4:38                           ` Matt Mackall
  2007-08-03  8:44                             ` Ingo Molnar
@ 2007-08-03  9:29                             ` Andi Kleen
  1 sibling, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2007-08-03  9:29 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Arjan van de Ven, Ingo Molnar, Roman Zippel, Mike Galbraith,
	Linus Torvalds, Andrew Morton, linux-kernel

Matt Mackall <mpm@selenic.com> writes:
> 
> Indeed. I'm just pointing out that not having TSC, fast HZ, no-HZ
> mode, or high-res timers should not be treated as an unusual
> circumstance. That's a PC-centric view.

The question is whether it would be that hard to add TSC-equivalent
sched_clock() support to more systems. At least, most of the CPUs I have
ever looked at had some kind of fast clock available. Perhaps it's
more the laziness of developers, or cut'n'paste, that these are not as
widely used as they should be?

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-03  4:38                           ` Matt Mackall
@ 2007-08-03  8:44                             ` Ingo Molnar
  2007-08-03  9:29                             ` Andi Kleen
  1 sibling, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-03  8:44 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Arjan van de Ven, Roman Zippel, Mike Galbraith, Linus Torvalds,
	Andrew Morton, linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> > question is if it's significantly worse than before. With a 100 or 
> > 1000Hz timer, you can't expect perfect fairness just due to the 
> > extremely rough measurement of time spent...
> 
> Indeed. I'm just pointing out that not having TSC, fast HZ, no-HZ 
> mode, or high-res timers should not be treated as an unusual 
> circumstance. That's a PC-centric view.

Actually, you don't need high-res timers or a fast HZ or the TSC to reduce 
those timer artifacts: all you need is _two_ (low-res, slow) hw clocks.

Most platforms do have that (even the really, really cheap ones), but 
arches do not set up the scheduler tick on one of them and the timer tick 
on the other, nor do they skew the periodic-timer programming a bit (by 
the nature of physics the clocks are usually already skewed a bit) so that 
the scheduler tick and the timer tick are not coupled. This whole thing is 
not a big deal on embedded anyway. (You don't get students logging in to 
the toaster or to the fridge to run timer exploits, do you? :-)

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-03  4:31                             ` Arjan van de Ven
@ 2007-08-03  4:53                               ` Willy Tarreau
  0 siblings, 0 replies; 123+ messages in thread
From: Willy Tarreau @ 2007-08-03  4:53 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Matt Mackall, Ingo Molnar, Roman Zippel, Mike Galbraith,
	Linus Torvalds, Andrew Morton, linux-kernel

On Thu, Aug 02, 2007 at 09:31:19PM -0700, Arjan van de Ven wrote:
> On Fri, 2007-08-03 at 06:18 +0200, Willy Tarreau wrote:
> > On Thu, Aug 02, 2007 at 08:57:47PM -0700, Arjan van de Ven wrote:
> > > On Thu, 2007-08-02 at 22:04 -0500, Matt Mackall wrote:
> > > > On Wed, Aug 01, 2007 at 01:22:29PM +0200, Ingo Molnar wrote:
> > > > > 
> > > > > * Roman Zippel <zippel@linux-m68k.org> wrote:
> > > > > 
> > > > > > [...] e.g. in this example there are three tasks that run only for 
> > > > > > about 1ms every 3ms, but they get far more time than should have 
> > > > > > gotten fairly:
> > > > > > 
> > > > > >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> > > > > >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> > > > > >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> > > > > >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > > > > 
> > > > > Mike and me have managed to reproduce similarly looking 'top' output, 
> > > > > but it takes some effort: we had to deliberately run a non-TSC 
> > > > > sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.
> > > > 
> > > > ..which is pretty much the state of play for lots of non-x86 hardware.
> > > 
> > > question is if it's significantly worse than before. With a 100 or
> > > 1000Hz timer, you can't expect perfect fairness just due to the
> > > extremely rough measurement of time spent...
> > 
> > Well, at least we're able to *measure* that task 'l' used 3.3% and
> > that tasks 'lt' used 32%.
> 
> It's not measured if you use jiffies level stuff. It's at best sampled!

But if we rely on the same sampling method, at least we will report
something consistent with what happens. And sampling is often the
correct method to get finer resolution on a macroscopic scale.

I mean, we're telling users that we include the "completely fair scheduler"
in 2.6.23, a scheduler which will ensure that all tasks get a fair share of
CPU time. A user starts top and sees 33%+32%+32%+3% for 4 tasks when he
would have expected to see 25%+25%+25%+25%. You can try to explain to users
that it's the fairest distribution, but they will have a hard time believing
it, especially when they measure the time spent on CPU with the "time"
command. OK, this is all sampling, but we should try to avoid relying on
different sources of data for computation and reporting. time and top
should report something close to 4*25% for comparable tasks. And if not -
because of some sampling problem, maybe the scheduler cannot be that fair
in some situations - then either the scheduler should make use of the same
sampling that time and top use, or top and time should rely on the
scheduler's view.

I'll try to quickly hack up a program which makes use of rdtsc from
userspace to precisely measure user-space time, and disable TSC use
from the kernel to see how the values diverge.
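
For reference, a rough sketch of that kind of measurement (x86 only; it 
assumes a constant-rate TSC, and both the preemption-gap threshold and the 
reporting interval are made-up values):

#include <stdio.h>
#include <stdint.h>

/* read the CPU's time stamp counter */
static inline uint64_t rdtsc(void)
{
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
        const uint64_t gap = 100000;            /* cycles; larger deltas = we were scheduled out */
        uint64_t last = rdtsc(), on_cpu = 0, total = 0;

        for (;;) {
                uint64_t now = rdtsc(), d = now - last;

                last = now;
                total += d;
                if (d < gap)                    /* small delta: we kept running */
                        on_cpu += d;
                if (total > 3000000000ULL) {    /* report roughly once per second at 3 GHz */
                        printf("on-cpu: %llu of %llu cycles (%.1f%%)\n",
                               (unsigned long long)on_cpu, (unsigned long long)total,
                               100.0 * on_cpu / total);
                        on_cpu = total = 0;
                }
        }
        return 0;
}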

Regards,
Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-03  3:57                         ` Arjan van de Ven
  2007-08-03  4:18                           ` Willy Tarreau
@ 2007-08-03  4:38                           ` Matt Mackall
  2007-08-03  8:44                             ` Ingo Molnar
  2007-08-03  9:29                             ` Andi Kleen
  1 sibling, 2 replies; 123+ messages in thread
From: Matt Mackall @ 2007-08-03  4:38 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Roman Zippel, Mike Galbraith, Linus Torvalds,
	Andrew Morton, linux-kernel

On Thu, Aug 02, 2007 at 08:57:47PM -0700, Arjan van de Ven wrote:
> On Thu, 2007-08-02 at 22:04 -0500, Matt Mackall wrote:
> > On Wed, Aug 01, 2007 at 01:22:29PM +0200, Ingo Molnar wrote:
> > > 
> > > * Roman Zippel <zippel@linux-m68k.org> wrote:
> > > 
> > > > [...] e.g. in this example there are three tasks that run only for 
> > > > about 1ms every 3ms, but they get far more time than should have 
> > > > gotten fairly:
> > > > 
> > > >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> > > >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> > > >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> > > >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > > 
> > > Mike and me have managed to reproduce similarly looking 'top' output, 
> > > but it takes some effort: we had to deliberately run a non-TSC 
> > > sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.
> > 
> > ..which is pretty much the state of play for lots of non-x86 hardware.
> 
> question is if it's significantly worse than before. With a 100 or
> 1000Hz timer, you can't expect perfect fairness just due to the
> extremely rough measurement of time spent...

Indeed. I'm just pointing out that not having TSC, fast HZ, no-HZ
mode, or high-res timers should not be treated as an unusual
circumstance. That's a PC-centric view.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-03  4:18                           ` Willy Tarreau
@ 2007-08-03  4:31                             ` Arjan van de Ven
  2007-08-03  4:53                               ` Willy Tarreau
  0 siblings, 1 reply; 123+ messages in thread
From: Arjan van de Ven @ 2007-08-03  4:31 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Matt Mackall, Ingo Molnar, Roman Zippel, Mike Galbraith,
	Linus Torvalds, Andrew Morton, linux-kernel

On Fri, 2007-08-03 at 06:18 +0200, Willy Tarreau wrote:
> On Thu, Aug 02, 2007 at 08:57:47PM -0700, Arjan van de Ven wrote:
> > On Thu, 2007-08-02 at 22:04 -0500, Matt Mackall wrote:
> > > On Wed, Aug 01, 2007 at 01:22:29PM +0200, Ingo Molnar wrote:
> > > > 
> > > > * Roman Zippel <zippel@linux-m68k.org> wrote:
> > > > 
> > > > > [...] e.g. in this example there are three tasks that run only for 
> > > > > about 1ms every 3ms, but they get far more time than should have 
> > > > > gotten fairly:
> > > > > 
> > > > >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> > > > >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> > > > >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> > > > >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > > > 
> > > > Mike and me have managed to reproduce similarly looking 'top' output, 
> > > > but it takes some effort: we had to deliberately run a non-TSC 
> > > > sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.
> > > 
> > > ..which is pretty much the state of play for lots of non-x86 hardware.
> > 
> > question is if it's significantly worse than before. With a 100 or
> > 1000Hz timer, you can't expect perfect fairness just due to the
> > extremely rough measurement of time spent...
> 
> Well, at least we're able to *measure* that task 'l' used 3.3% and
> that tasks 'lt' used 32%.

It's not measured if you use jiffies-level stuff. It's at best sampled!

>  If we're able to measure it, then that's
> already fine enough to be able to adjust future timeslices credits.
> Granted it may be rough for small periods (a few jiffies), but it
> should be fair for larger periods. Or at least it should *report*

But the testcase here uses times a LOT shorter than a jiffy... not "a few
jiffies".

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-03  3:57                         ` Arjan van de Ven
@ 2007-08-03  4:18                           ` Willy Tarreau
  2007-08-03  4:31                             ` Arjan van de Ven
  2007-08-03  4:38                           ` Matt Mackall
  1 sibling, 1 reply; 123+ messages in thread
From: Willy Tarreau @ 2007-08-03  4:18 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Matt Mackall, Ingo Molnar, Roman Zippel, Mike Galbraith,
	Linus Torvalds, Andrew Morton, linux-kernel

On Thu, Aug 02, 2007 at 08:57:47PM -0700, Arjan van de Ven wrote:
> On Thu, 2007-08-02 at 22:04 -0500, Matt Mackall wrote:
> > On Wed, Aug 01, 2007 at 01:22:29PM +0200, Ingo Molnar wrote:
> > > 
> > > * Roman Zippel <zippel@linux-m68k.org> wrote:
> > > 
> > > > [...] e.g. in this example there are three tasks that run only for 
> > > > about 1ms every 3ms, but they get far more time than should have 
> > > > gotten fairly:
> > > > 
> > > >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> > > >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> > > >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> > > >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > > 
> > > Mike and me have managed to reproduce similarly looking 'top' output, 
> > > but it takes some effort: we had to deliberately run a non-TSC 
> > > sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.
> > 
> > ..which is pretty much the state of play for lots of non-x86 hardware.
> 
> question is if it's significantly worse than before. With a 100 or
> 1000Hz timer, you can't expect perfect fairness just due to the
> extremely rough measurement of time spent...

Well, at least we're able to *measure* that task 'l' used 3.3% and
that the 'lt' tasks used 32%. If we're able to measure it, then that's
already good enough to be able to adjust future timeslice credits.
Granted, it may be rough for small periods (a few jiffies), but it
should be fair over larger periods. Or at least it should *report*
some fair distribution.

Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-03  3:04                       ` Matt Mackall
@ 2007-08-03  3:57                         ` Arjan van de Ven
  2007-08-03  4:18                           ` Willy Tarreau
  2007-08-03  4:38                           ` Matt Mackall
  0 siblings, 2 replies; 123+ messages in thread
From: Arjan van de Ven @ 2007-08-03  3:57 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Ingo Molnar, Roman Zippel, Mike Galbraith, Linus Torvalds,
	Andrew Morton, linux-kernel

On Thu, 2007-08-02 at 22:04 -0500, Matt Mackall wrote:
> On Wed, Aug 01, 2007 at 01:22:29PM +0200, Ingo Molnar wrote:
> > 
> > * Roman Zippel <zippel@linux-m68k.org> wrote:
> > 
> > > [...] e.g. in this example there are three tasks that run only for 
> > > about 1ms every 3ms, but they get far more time than should have 
> > > gotten fairly:
> > > 
> > >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> > >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> > >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> > >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > 
> > Mike and me have managed to reproduce similarly looking 'top' output, 
> > but it takes some effort: we had to deliberately run a non-TSC 
> > sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.
> 
> ..which is pretty much the state of play for lots of non-x86 hardware.

The question is whether it's significantly worse than before. With a 100 or
1000 Hz timer, you can't expect perfect fairness, simply due to the
extremely rough measurement of time spent...



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 11:22                     ` Ingo Molnar
  2007-08-01 12:21                       ` Roman Zippel
@ 2007-08-03  3:04                       ` Matt Mackall
  2007-08-03  3:57                         ` Arjan van de Ven
  1 sibling, 1 reply; 123+ messages in thread
From: Matt Mackall @ 2007-08-03  3:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Roman Zippel, Mike Galbraith, Linus Torvalds, Andrew Morton,
	linux-kernel

On Wed, Aug 01, 2007 at 01:22:29PM +0200, Ingo Molnar wrote:
> 
> * Roman Zippel <zippel@linux-m68k.org> wrote:
> 
> > [...] e.g. in this example there are three tasks that run only for 
> > about 1ms every 3ms, but they get far more time than should have 
> > gotten fairly:
> > 
> >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> 
> Mike and me have managed to reproduce similarly looking 'top' output, 
> but it takes some effort: we had to deliberately run a non-TSC 
> sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.

..which is pretty much the state of play for lots of non-x86 hardware.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-02  2:17                         ` Linus Torvalds
                                             ` (2 preceding siblings ...)
  2007-08-02 19:16                           ` Daniel Phillips
@ 2007-08-02 23:23                           ` Roman Zippel
  3 siblings, 0 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-02 23:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Mike Galbraith, Ingo Molnar, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Linus Torvalds wrote:

> So I think it would be entirely appropriate to
> 
>  - do something that *approximates* microseconds.
> 
>    Using microseconds instead of nanoseconds would likely allow us to do 
>    32-bit arithmetic in more areas, without any real overflow.

The basic problem is that one needs a number of bits (at least 16) for 
normalization, which limits the time range one can work with. This means 
that 32 bits only leave room for 1 millisecond resolution; the remainder 
could maybe be saved and reused later.
So AFAICT using micro- or nanosecond resolution doesn't make much 
computational difference.
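
A quick back-of-the-envelope for the above, assuming 16 of the 32 bits are 
reserved for normalization (the 16 comes from the paragraph above; the rest 
is plain arithmetic):

#include <stdio.h>

int main(void)
{
        unsigned long long units = 1ULL << (32 - 16);   /* representable time units */

        printf("1 ns units: range ~%llu us\n", units / 1000);   /* ~65 us */
        printf("1 us units: range ~%llu ms\n", units / 1000);   /* ~65 ms */
        printf("1 ms units: range ~%llu s\n",  units / 1000);   /* ~65 s  */
        return 0;
}

So with nanosecond or microsecond units the usable range collapses to tens of 
microseconds or milliseconds, while millisecond units still cover about a 
minute - which is the point being made above.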

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-02 16:09                           ` Ingo Molnar
@ 2007-08-02 22:38                             ` Roman Zippel
  0 siblings, 0 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-02 22:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel

Hi,

On Thu, 2 Aug 2007, Ingo Molnar wrote:

> Most importantly, CFS _already_ includes a number of measures that act 
> against too frequent math. So even though you can see 64-bit math code 
> in it, it's only rarely called if your clock has a low resolution - and 
> that happens all automatically! (see below the details of this buffered 
> delta math)
> 
> I have not seen Roman notice and mention any of these important details 
> (perhaps because he was concentrating on finding faults in CFS - which a 
> reviewer should do), but those measures are still very important for a 
> complete, balanced picture, especially if one focuses on overhead on 
> small boxes where the clock is low-resolution.
> 
> As Peter has said it in his detailed review of Roman's suggested 
> algorithm, our main focus is on keeping total complexity down - and we 
> are (of course) fundamentally open to changing the math behind CFS, we 
> ourselves tweaked it numerous times, it's not cast into stone in any 
> way, shape or form.

You're comparing apples with oranges; I explicitly said:

"At this point I'm not that much interested in a few localized 
optimizations, what I'm interested in is how can this optimized at the 
design level"

IMO it's very important to keep computational and algorithmic complexity 
separate. I want to concentrate on the latter, so unless you can _prove_ 
that a similar set of optimizations is impossible within my example, I'm 
going to ignore them for now. CFS has already gone through several 
versions of optimization and tuning; expecting the same from my design 
prototype is a little confusing...

I want to analyze the foundation CFS is based on; in the review I 
mentioned a number of other issues and design-related questions. If you 
need more time, that's fine, but I'd appreciate more background 
information on those rather than you only jumping on the more trivial 
issues.

> In Roman's variant of CFS's algorithm the variables are 32-bit, but the 
> error is rolled forward in separate fract_* (fractional) 32-bit 
> variables, so we still have 32+32==64 bit of stuff to handle. So we 
> think that in the end such a 32+32 scheme would be more complex (and 
> anyone who took a look at fs2.c would i think agree - it took Peter a 
> day to decypher the math!)
> day to decipher the math!)

Come on, Ingo, you can do better than that; I did mention in my review 
some of the requirements for the data types.
I'm amazed how you arrived at that judgement so quickly - could you please 
substantiate it a little more?

I admit that the lack of source comments is an open invitation for further 
questions; Peter did exactly this and his comments were great - I'm 
hoping for more like that. You OTOH jump to conclusions based on a partial 
understanding of what I'm actually trying to do.
Ingo, how about you provide some of the mathematical proof CFS is based 
on? Can you prove that the rounding errors are irrelevant? Can you prove 
that all the limit checks have no adverse effect? I tried that and I'm 
not entirely convinced, but maybe it's just me, so I'd love to see 
someone else's attempt at this.
A major goal of my design is to be able to define the limits within which 
the scheduler works correctly, so I know which information is relevant 
and what can be approximated.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-02  2:17                         ` Linus Torvalds
  2007-08-02  4:57                           ` Willy Tarreau
  2007-08-02 16:09                           ` Ingo Molnar
@ 2007-08-02 19:16                           ` Daniel Phillips
  2007-08-02 23:23                           ` Roman Zippel
  3 siblings, 0 replies; 123+ messages in thread
From: Daniel Phillips @ 2007-08-02 19:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Zippel, Andi Kleen, Mike Galbraith, Ingo Molnar,
	Andrew Morton, linux-kernel

Hi Linus,

On Wednesday 01 August 2007 19:17, Linus Torvalds wrote:
>    And the "approximates" thing would be about the fact that we don't
>    actually care about "absolute" microseconds as much as something
> that is in the "roughly a microsecond" area. So if we say "it doesn't
> have to be microseconds, but it should be within a factor of two of a
> ms", we could avoid all the expensive divisions (even if they turn
> into multiplications with reciprocals), and just let people *shift*
> the CPU counter instead.

On that theme, expressing the subsecond part of high precision time in 
decimal instead of left-aligned binary always was an insane idea.  
Applications end up with silly numbers of multiplies and divides 
(likely as not incorrect) whereas they would often just need a simple 
shift as you say, if the tv struct had been defined sanely from the 
start.  As a bonus, whenever precision gets bumped up, the new bits 
appear on the right in formerly zero locations, meaning 
little if any code needs to change.  What we have in the incumbent libc 
timeofday scheme is the moral equivalent of BCD.

Of course libc is unlikely ever to repent, but we can at least put off 
converting into the awkward decimal format until the last possible 
instant.  In other words, I do not see why xtime is expressed as a tv 
instead of simple 32.32 fixed point.  Perhaps somebody can enlighten 
me?
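
Purely as an illustration of that point - this is not an existing kernel 
interface, just a sketch: with a left-aligned 32.32 format the decimal 
conversion needs a multiply, while changing precision is a plain shift, 
and a later precision bump only fills in formerly-zero low bits.

	#include <stdint.h>

	/* hypothetical 32.32 fixed-point time: seconds in the high 32 bits,
	   binary fraction in the low 32 bits */
	typedef uint64_t fix32_32_t;

	/* decimal microseconds need a multiply and a shift... */
	static inline uint32_t fix_to_usec(fix32_32_t t)
	{
		return (uint32_t)(((t & 0xffffffffULL) * 1000000ULL) >> 32);
	}

	/* ...whereas dropping to 16 fractional bits is a pure shift */
	static inline uint32_t fix_frac16(fix32_32_t t)
	{
		return (uint32_t)((t & 0xffffffffULL) >> 16);
	}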

Regards,

Daniel

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 14:49                     ` Peter Zijlstra
@ 2007-08-02 17:36                       ` Roman Zippel
  0 siblings, 0 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-02 17:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Peter Zijlstra wrote:

> Took me most of today trying to figure out WTH you did in fs2.c, more
> math and fundamental explanations would have been good. So please bear
> with me as I try to recap this thing. (No, your code was very much _not_
> obvious, a few comments and broken out functions would have made a world
> of a difference)

Thanks for the effort though. :)
I know I'm not the best explaining these things, so I really appreciate 
the questions, so I know what to concentrate on.

> So, for each task we keep normalised time
> 
>  normalised time := time/weight
> 
> using Bresenham's algorithm we can do this perfectly (up until a renice
> - where you'd get errors)
> 
>         avg_frac += weight_inv
>         
>         weight_inv = X / weight
>         
>         avg = avg_frac / weight0_inv
>         
>         weight0_inv = X / weight0
>         
>         avg = avg_frac / (X / weight0) 
>             = (X / weight) / (X / weight0) 
>             = X / weight * weight0 / X 
>             = weight0 / weight
>         
> 
> So avg ends up being in units of [weight0/weight].
> 
> Then, in order to allow sleeping, we need to have a global clock to sync
> with. Its this global clock that gave me headaches to reconstruct.
> 
> We're looking for a time like this:
> 
>   rq_time := sum(time)/sum(weight)
> 
> And you commented that the /sum(weight) part is where CFS obtained its
> accumulating rounding error? (I'm inclined to believe the error will
> statistically be 0, but I'll readily accept otherwise if you can show a
> practical 'exploit')
> 
> It's not obvious how to do this using modulo logic like Bresenham because
> that would involve using a gcm of all possible weights.

I think I've sent you off in the wrong direction somehow. Sorry. :)

Let's ignore the average for a second; normalized time is maintained as:

	normalized time := time * (2^16 / weight)

The important point is that I keep the value in full resolution of 2^-16 
vsec units (vsec for virtual second or sec/weight, where every task gets 
weight seconds for every virtual second; to keep things simpler I also 
omit the nano prefix from the units for a moment). Compared to that, CFS 
maintains a global normalized value in 1 vsec units.
Since I don't round the value down I avoid the accumulating error; this 
means that 

	time_norm += time_delta1 * (2^16 / weight)
	time_norm += time_delta2 * (2^16 / weight)

is the same as

	time_norm += (time_delta1 + time_delta2) * (2^16 / weight)

CFS for example does this

	delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);

in above terms this means

	time = time_delta * weight * (2^16 / weight_sum) / 2^16

The last shift now rounds the value down, and if one does that 1000 times 
per second, the resolution of the value that is finally accounted to 
wait_runtime is reduced accordingly.
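
A standalone toy example (not taken from fs2.c or from kernel/sched_fair.c) 
that shows the effect: accumulating with the 2^-16 fraction kept, versus 
shifting the fraction away on every update.

	#include <stdint.h>
	#include <stdio.h>

	#define SHIFT 16

	int main(void)
	{
		uint64_t inv = (1ULL << SHIFT) / 3;	/* 2^16 / weight, weight = 3 */
		uint64_t kept = 0;		/* accumulated in 2^-16 units */
		uint64_t dropped = 0;		/* fraction shifted away per update */
		int i;

		for (i = 0; i < 1000; i++) {	/* 1000 updates of 1000 time units */
			kept += 1000 * inv;
			dropped += (1000 * inv) >> SHIFT;
		}
		printf("full resolution: %llu, rounded per update: %llu\n",
		       (unsigned long long)(kept >> SHIFT),
		       (unsigned long long)dropped);
		return 0;
	}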

The other rounding problem is that this term

	x * prio_to_weight[i] * prio_to_wmult[i] / 2^32

doesn't produce x for most values in those tables (the same applies to the 
weight sum), so if we have chains where the values are converted from one 
scale to the other, a rounding error is produced. In CFS this happens 
because wait_runtime is maintained in nanoseconds and fair_clock is a 
normalized value.
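
Again only a sketch - the real prio_to_weight[]/prio_to_wmult[] entries may 
be rounded differently - but a truncated reciprocal is enough to show the 
effect:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t w = 820;			/* some non-power-of-two weight */
		uint64_t wmult = (1ULL << 32) / w;	/* truncated reciprocal */
		uint64_t x = 1000;

		/* converting x to the normalized scale and back: x*w*wmult/2^32 */
		printf("%llu -> %llu\n", (unsigned long long)x,
		       (unsigned long long)((x * w * wmult) >> 32));
		return 0;
	}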

The problem here isn't that these errors have any statistical 
relevance, as they are usually completely overshadowed by measurement 
errors anyway. The problem is that these errors exist at all; this means 
they have to be compensated somehow, so that they don't accumulate over 
time and then become significant. This also has to be seen in the context 
of the overflow checks. All this adds a number of variables to the system 
which considerably increases complexity and makes a thorough analysis 
quite challenging.

So to get back to the average, if you look for this

	rq_time := sum(time)/sum(weight)

you won't find it like this. It basically produces a weighted average, 
and I agree this can't really be maintained via the modulo logic (at least 
AFAICT), so I'm using a simple average instead. So if we have:

	time_norm = time/weight

we can write your rq_time like this:

	weighted_avg = sum_{i}^{N}(time_norm_{i}*weight_{i})/sum_{i}^{N}(weight_{i})

this is the formula for a weighted average, so we can approximate the value 
using a simple average instead:

	avg = sum_{i}^{N}(time_norm_{i})/N

This sum is now what I maintain at runtime incrementally:

	time_{i} = sum_{j}^{S}(time_{j})

	time_norm_{i} = time_{i}/weight_{i}
	              = sum_{j}^{S}(time_{j})/weight_{i}
	              = sum_{j}^{S}(time_{j}/weight_{i})

If I add this up and add weight0 I get:

	avg*N*weight0 = sum_{i}^{N}(time_norm_{i})*weight0

and now I also have the needed modulo factors.

The average probably could be further simplified by using a different 
approximation. The question is how perfect this average has to be. The 
average is only used to control when a task gets its share; currently 
higher priority tasks are already given a bit more preference, or a sleep 
bonus is added.

In CFS the average already isn't perfect due to the above rounding problems; 
otherwise the sum of all updated wait_runtime values should be 0, and if a task 
with a wait_runtime value different from 0 is added, fair_clock would have 
to change too to keep the balance.

So unless someone has a brilliant idea, I guess we have to settle for an 
approximation of a perfect scheduler. :) My approach differs insofar as I 
at least maintain an accurate fair share and approximate the scheduling 
decision based on it. This has IMO the advantage that the scheduler 
function can be easily exchanged: one can do it the quick and dirty way 
or one can put in the extra effort to get it closer to perfection. Either 
way every task will get its fair share.

The scheduling function I used is rather simple:

	if (j >= 0 &&
	    ((int)(task[l].time_norm - (task[j].time_norm + SLICE(j))) >= 0 ||
	     ((int)(task[l].time_norm - task[j].time_norm) >= 0 &&
	      (int)(task[l].time_norm - (s + SLICE(l))) >= 0))) {

So a new task is selected if there is a higher priority task or if the 
task has used up its share (unless the other task has lower priority and 
already got its share). It would be interesting to use a dynamic (per 
task) time slice here, which should make it possible to control the 
burstiness that has been mentioned.

> I'm not sure all this is less complex than CFS, I'd be inclined to say
> it is more so.

The complexity is different: IMO the basic complexity to maintain the fair 
share is less, and I can add arbitrary complexity to improve scheduling on 
top of it. Most of it now comes from maintaining the accurate average; what 
other optimizations are possible depends on how the scheduling is finally 
done.

> Also, I think you have an accumulating error on wakeup where you sync
> with the global clock but fully discard the fraction.

I hope not, otherwise the checks at the end should have triggered. :)
The per task requirement is

	time_norm = time_avg * weight0 + avg_fract

I don't simply discard it, it's accounted to time_norm and then synced to 
the global average.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-02  2:17                         ` Linus Torvalds
  2007-08-02  4:57                           ` Willy Tarreau
@ 2007-08-02 16:09                           ` Ingo Molnar
  2007-08-02 22:38                             ` Roman Zippel
  2007-08-02 19:16                           ` Daniel Phillips
  2007-08-02 23:23                           ` Roman Zippel
  3 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-02 16:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Zippel, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> It would be better, I suspect, to make the scheduler clock totally 
> distinct from the other clock sources (many architectures have per-cpu 
> cycle counters), and *not* try to even necessarily force it to be a 
> "time-based" one.

yeah. Note that i largely detached sched_clock() from the GTOD 
clocksources already in CFS, so part of this is already implemented and 
the intention is clear. For example, when the following happens:

   Marking TSC unstable due to: possible TSC halt in C2.
   Clocksource tsc unstable (delta = -71630388 ns)

sched_clock() does _not_ stop using the TSC. It is very careful with the 
TSC value, it checks against wraps, jumping, etc. (the whole rq_clock() 
wrapper around sched_clock()), but still tries to use the highest 
resolution time source possible, even if that time source is not good 
enough for GTOD's purposes anymore. So the scheduler clock is already 
largely detached from other clocksources in the system.

> It would still be true that something that is purely based on timer 
> ticks will always be liable to have rounding errors that will 
> inevitably mean that you don't get good fairness - tuning threads to 
> run less than a timer tick at a time would effectively "hide" them 
> from the scheduler accounting. However, in the end, I think that's 
> pretty much unavoidable.

Note that there is a relatively easy way of reducing the effects of such 
intentional coupling: turn on CONFIG_HIGH_RES_TIMERS. That decouples the 
scheduler tick from the jiffy tick and works against such 'exploits' - 
_even_ if the scheduler clock is otherwise low resolution. Also enable 
CONFIG_NO_HZ and the whole thing (of when the scheduler tick kicks in) 
becomes very hard to predict.

[ So while in a low-res clock situation scheduling will always be less 
  precise, with hres-timers and dynticks we have a natural 'random 
  sampler' mechanism so that no task can couple to the scheduler tick - 
  accidentally or even intentionally.

  The only 'unavoidable coupling' scenario is when the hardware has only 
  a single, low-resolution time sampling method. (that is pretty rare 
  though, even in the ultra-embedded space. If a box has two independent 
  hw clocks, even if they are low resolution, the timer tick can be
  decoupled from the scheduler tick.) ]

> I have to say, it would be interesting to try to use 32-bit 
> arithmetic.

yeah. We tried to do as much of that as possible, please read on below 
for (many) more details. There's no short summary i'm afraid :-/

Most importantly, CFS _already_ includes a number of measures that act 
against too frequent math. So even though you can see 64-bit math code 
in it, it's only rarely called if your clock has a low resolution - and 
that happens all automatically! (see below the details of this buffered 
delta math)

I have not seen Roman notice and mention any of these important details 
(perhaps because he was concentrating on finding faults in CFS - which a 
reviewer should do), but those measures are still very important for a 
complete, balanced picture, especially if one focuses on overhead on 
small boxes where the clock is low-resolution.

As Peter has said it in his detailed review of Roman's suggested 
algorithm, our main focus is on keeping total complexity down - and we 
are (of course) fundamentally open to changing the math behind CFS, we 
ourselves tweaked it numerous times, it's not cast into stone in any 
way, shape or form.

> I also think it's likely a mistake to do a nanosecond resolution. 
> That's part of what forces us to 64 bits, and it's just not even an 
> *interesting* resolution.

yes, very much not interesting, and we did not pick nanoseconds because 
we find anything "interesting" in that timescale. Firstly, before i go 
into the thinking behind nanoseconds, microseconds indeed have 
advantages too, so the choice was not easy, see the arguments in favor 
of microseconds below at: [*].

There are two fundamental reasons for nanoseconds:

 1) they _automatically_ act as a 'carry-over' for fractional math and
    thus reduce rounding errors. As Roman has noticed we dont care much
    about math rounding errors in the algorithm: _exactly_ because we
    *dont have to* care about rounding errors: we've got extra 10
    "uninteresting" bits in the time variables to offset the effects of 
    rounding errors and to carry over fractionals.

    ( Sidenote: in fact we had some simple extra anti-rounding-error 
      code in CFS and removed it because it made no measurable 
      difference. So even in the current structure there's additional 
      design reserves that we could tap before having to go to another 
      math model. All we need is a testcase that demonstrates rounding 
      errors. Roman's testcase was _not_ a demonstration of math 
      rounding errors, it was about clock granularity! )

 2) i tried microseconds resolution once (it's easy) but on fast hw it 
    already showed visible accounting/rounding artifacts in 
    high-frequency scheduling workloads, which, if hw gets just a little 
    bit faster, will become pain.

    ( Sidenote: if a workload is rescheduling once every few 
      microseconds, then it very much matters to the balance of things 
      whether there's a fundamental +- 1 microsecond skew on who gets 
      accounted what. In fact the placement of sched_clock() call within 
      schedule() is already visible in practice in some testcases, 
      whether the runqueue-spinlock acquire spinning time is accounted 
      to the 'previous' or the 'next' task - despite that time being 
      sub-microsecond on average. Going to microseconds makes this too 
      coarse. )

I.e. microseconds are on the limit today on fast hardware, and 
nanoseconds give us an automatic buffer against rounding errors.

On _slow_ hardware, with a low-resolution clock, i very much agree that 
we should not do too frequent math, and there are already four 
independent measures that we did in CFS to keep the math overhead down:

Firstly, CFS fundamentally "buffers the math" via deltas, _everywhere_ 
in the fastpath:

        if (unlikely(curr->delta_exec > sysctl_sched_stat_granularity)) {
                __update_curr(cfs_rq, curr, now);
                curr->delta_exec = 0;
        }

I.e. we only call the math routines if there was any delta. The beauty 
is that this "math buffering" works _automatically_ if there is a 
low-resolution sched_clock() present, because with a low-resolution 
clock a delta is only observed if a tick happens. I.e. in a 
high-frequency scheduling workload (the only place where scheduler 
overhead would be noticeable) all the CFS math is in a rare _slowpath_, 
that gets triggered only every 10 msecs or so! (if HZ=1000)

I.e. we didnt have to go down the (very painful) path of ktime_t 
split-model math and we didnt have to introduce a variable precision 
model for "scheduler time" either, because the delta math itself 
automatically buffers on slow boxes.

Secondly, note that all the key fastpath variables are already 32-bit 
(on 32-bit platforms):

        long                    wait_runtime;
        unsigned long           delta_fair_run;
        unsigned long           delta_fair_sleep;
        unsigned long           delta_exec;

The _timestamps_ are still 64-bit, but most of the actual math goes on 
in 32-bit delta variables. That's one of the advantages of doing deltas 
instead of absolute values.

Thirdly, if even this amount of buffering is not enough for an 
architecture, CFS also includes the sched_stat_granularity_ns tunable 
that allows the further reduction of the sampling frequency (and the 
frequency of us having to do the math) - so if the math overhead is a 
problem an architecture can set it.

Fourthly, in CFS there is a _single_ 64-bit division, and even for that 
division, the actual values passed to it are typically in a 32-bit 
range. Hence we've introduced the following optimization:

  static u64 div64_likely32(u64 divident, unsigned long divisor)
  {
  #if BITS_PER_LONG == 32
          if (likely(divident <= 0xffffffffULL))
                  return (u32)divident / divisor;
          do_div(divident, divisor);

          return divident;
  #else
          return divident / divisor;
  #endif
  }

Which, if the divident is in 32-bit range, does a 32-bit division.

About math related rounding errors mentioned by Roman (not to be 
confused with clock granularity rounding), in our analysis and in our 
experience rounding errors of the math were never an issue in CFS, due 
to the extra buffering that nanosecs gives - i tweaked it a bit around 
CFSv10 but it was unprovable to have any effect. That's one of the 
advantages of working with nanoseconds: the fundamental time unit 
includes about 10 "low bits" that can carry over much of the error and 
reduce rounding artifacts. And even those math rounding errors, we 
believe, center around 0.

In Roman's variant of CFS's algorithm the variables are 32-bit, but the 
error is rolled forward in separate fract_* (fractional) 32-bit 
variables, so we still have 32+32==64 bit of stuff to handle. So we 
think that in the end such a 32+32 scheme would be more complex (and 
anyone who took a look at fs2.c would i think agree - it took Peter a 
day to decipher the math!) - but we'll be happy to be surprised with 
patches of course :-)

	Ingo

[*] the main advantage of microseconds would be that we could use "u32"
    throughout to carry around the "now" variable (timestamp). That 
    property of microseconds _is_ tempting and would reduce the 
    task_struct footprint as well. But if we did that it would have
    ripple effects: we'd have to resort to (pretty complex and 
    non-obvious) math to act against rounding errors. We'd either have 
    to carry the rounding error with us in separate 32-bit variables (in 
    essence creating 32+32 bit 64-bit variable), or we'd have to shift 
    up the microseconds by say ... 10 binary positions ... in essence 
    bringing us back into the same nanoseconds range. And then the 
    wraparound problem of microseconds - 72 minutes is not _that_ 
    unrealistic to trigger intentionally, so we have to do _something_
    about it to inhibit infinite starvation. We experimented
    around with all this and the overwhelming conclusion (so far) was 
    that trying to reduce timestamps back to 32 bits is just not worth 
    it. _Deltas_ should, can and _are_ 32-bit values already, even in 
    the nanosecond model. So all the buffering and delta logic gives us
    most of the 32-bit advantages already, without the disadvantages of
    microseconds.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  3:41                   ` CFS review Roman Zippel
                                       ` (5 preceding siblings ...)
  2007-08-01 14:49                     ` Peter Zijlstra
@ 2007-08-02 15:46                     ` Ingo Molnar
  6 siblings, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-02 15:46 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> [...] With the increased text comes increased runtime memory usage, 
> e.g. task_struct increased so that only 5 of them instead 6 fit now 
> into 8KB.

yeah, thanks for the reminder, this is on my todo list. As i suspect you 
noticed it too, much of the task_struct size increase is not fundamental 
and not related to 64-bit math at all - it's simply debug and 
instrumentation overhead.

Look at the following table (i386, nodebug):

                         size 
                         ----
    pre-CFS              1328
        CFS              1472
        CFS+patch        1376 

the very small patch below gets rid of 96 bytes. And that's only the
beginning.

	Ingo

-------------------------------------------------->
---
 include/linux/sched.h |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -905,23 +905,28 @@ struct sched_entity {
 	struct rb_node		run_node;
 	unsigned int		on_rq;
 
+	u64			exec_start;
+	u64			sum_exec_runtime;
 	u64			wait_start_fair;
+	u64			sleep_start_fair;
+
+#ifdef CONFIG_SCHEDSTATS
 	u64			wait_start;
-	u64			exec_start;
+	u64			wait_max;
+	s64			sum_wait_runtime;
+
 	u64			sleep_start;
-	u64			sleep_start_fair;
-	u64			block_start;
 	u64			sleep_max;
+	s64			sum_sleep_runtime;
+
+	u64			block_start;
 	u64			block_max;
 	u64			exec_max;
-	u64			wait_max;
-	u64			last_ran;
 
-	u64			sum_exec_runtime;
-	s64			sum_wait_runtime;
-	s64			sum_sleep_runtime;
 	unsigned long		wait_runtime_overruns;
 	unsigned long		wait_runtime_underruns;
+#endif
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	struct sched_entity	*parent;
 	/* rq on which this entity is (to be) queued: */

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-02  4:57                           ` Willy Tarreau
@ 2007-08-02 10:43                             ` Andi Kleen
  2007-08-02 10:07                               ` Willy Tarreau
  0 siblings, 1 reply; 123+ messages in thread
From: Andi Kleen @ 2007-08-02 10:43 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Roman Zippel, Andi Kleen, Mike Galbraith,
	Ingo Molnar, Andrew Morton, linux-kernel

Willy Tarreau <w@1wt.eu> writes:

>(I remember it could not play with register renaming, etc...).

This has changed in recent gccs. It doesn't force register pairs anymore.
Granted, the code is still not that good, but some of the worst sins are gone.

> However, I understand why Ingo chose to use 64 bits. It has the advantage
> that the numbers never wrap within 584 years. I'm well aware that it's
> very difficult to keep tasks ordered according to a key which can wrap.

If you define an appropriate window and use some macros for the comparisons
it shouldn't be a big problem.

> But if we consider that we don't need to be more precise than the return
> value from gettimeofday() that all applications use,

gettimeofday() has too strict requirements, that make it unnecessarily slow
for this task.

> every hour. We may accept recomputing all keys every hour.

You don't need to recompute keys; just use careful comparisons using
subtractions and a window. TCPs have done that for decades. 
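
For illustration, in the spirit of the kernel's time_after()-style 
comparisons - not a proposed patch; it only assumes that two live keys 
never drift more than 2^31 apart:

	#include <stdint.h>

	/* true if key a is logically after key b, even across a 2^32 wrap */
	static inline int key_after(uint32_t a, uint32_t b)
	{
		return (int32_t)(b - a) < 0;
	}

	/* signed distance between two keys, under the same window assumption */
	static inline int32_t key_delta(uint32_t a, uint32_t b)
	{
		return (int32_t)(a - b);
	}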

> Have all keys use 32-bit resolution, and monitor the 32nd bit. All tasks

If you're worried about wrapping in one hour why is wrapping in two
hours not a problem? 

I have one request though. If anybody adds anything complicated
for this please make it optional so that 64bit platforms are not 
burdened by it.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-02 10:43                             ` Andi Kleen
@ 2007-08-02 10:07                               ` Willy Tarreau
  0 siblings, 0 replies; 123+ messages in thread
From: Willy Tarreau @ 2007-08-02 10:07 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Roman Zippel, Mike Galbraith, Ingo Molnar,
	Andrew Morton, linux-kernel

On Thu, Aug 02, 2007 at 12:43:29PM +0200, Andi Kleen wrote:
> > However, I understand why Ingo chose to use 64 bits. It has the advantage
> > that the numbers never wrap within 584 years. I'm well aware that it's
> > very difficult to keep tasks ordered according to a key which can wrap.
> 
> If you define an appropriate window and use some macros for the comparisons
> it shouldn't be a big problem.

It should not, but the hardest thing is often to keep values sorted within a
window. When you store your values in a tree according to a key, it's not
always easy to find the first one relative to a sliding offset.

> > But if we consider that we don't need to be more precise than the return
> > value from gettimeofday() that all applications use,
> 
> gettimeofday() has too strict requirements, that make it unnecessarily slow
> for this task.

Maybe we could use (TSC >> (x bits)) when the TSC is available, or jiffies 
in other cases. I believe that people not interested in high time accuracy 
are not much interested in perfect fairness either. However, we should do 
our best to avoid any form of starvation, otherwise we jump back to the 
problem with the current scheduler.
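
A minimal sketch of that idea, with made-up names: read_cycles() stands in 
for whatever cheap per-cpu counter is available, and the shift would be 
picked once at boot so that one unit lands roughly in the microsecond range.

	#include <stdint.h>

	extern uint64_t read_cycles(void);	/* e.g. the TSC, assumed available */

	static unsigned int cycles_shift;	/* chosen at boot from the CPU frequency */

	/* "roughly a microsecond" units: no divide, no multiply, just a shift */
	static inline uint64_t sched_units(void)
	{
		return read_cycles() >> cycles_shift;
	}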

> > every hour. We may accept recomputing all keys every hour.
> 
> You don't need to recompute keys; just use careful comparisons using
> subtractions and a window. TCPs have done that for decades. 
> 
> > Have all keys use 32-bit resolution, and monitor the 32nd bit. All tasks
> 
> If you're worried about wrapping in one hour why is wrapping in two
> hours not a problem? 

I've not said I was worried about that (or maybe I explained myself poorly).
Even if it was once a minute, it would not bother me. I wanted to explain
that if the solution requires doing it that often, it should be acceptable.

> I have one request though. If anybody adds anything complicated
> for this please make it optional so that 64bit platforms are not 
> burdened by it.

Fair enough. But the solution is not there yet. Maybe once it's there, it
will be better than the current design for all platforms and the only difference
will be the accuracy due to optional use of the TSC.

> -Andi

Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-02  2:17                         ` Linus Torvalds
@ 2007-08-02  4:57                           ` Willy Tarreau
  2007-08-02 10:43                             ` Andi Kleen
  2007-08-02 16:09                           ` Ingo Molnar
                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 123+ messages in thread
From: Willy Tarreau @ 2007-08-02  4:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Zippel, Andi Kleen, Mike Galbraith, Ingo Molnar,
	Andrew Morton, linux-kernel

On Wed, Aug 01, 2007 at 07:17:51PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 1 Aug 2007, Roman Zippel wrote:
> > 
> > I'm not so sure about that. sched_clock() has to be fast, so many archs 
> > may want to continue to use jiffies. As soon as one does that one can also 
> > save a lot of computational overhead by using 32bit instead of 64bit.
> > The question is then how easy that is possible.
> 
> I have to say, it would be interesting to try to use 32-bit arithmetic.
> 
> I also think it's likely a mistake to do a nanosecond resolution. That's 
> part of what forces us to 64 bits, and it's just not even an *interesting* 
> resolution.

I would add that I have been bothered by the 64-bit arithmetic when
trying to see what could be improved in the code. In fact, it's very
hard to optimize anything when you have arithmetic on integers wider
than the CPU's word size, and gcc is known not to emit very good code in
this situation (I remember it could not play with register renaming, etc...).

However, I understand why Ingo chose to use 64 bits. It has the advantage
that the numbers never wrap within 584 years. I'm well aware that it's
very difficult to keep tasks ordered according to a key which can wrap.

But if we consider that we don't need to be more precise than the return
value from gettimeofday() that all applications use, we see that a bunch
of microseconds is enough. 32 bits at the microsecond level wraps around
every hour. We may accept recomputing all keys every hour. It's not that
dramatic. The problem is how to detect that we will need to.

I remember a trick used by Tim Schmielau in his jiffies64 patch for 2.4.
He kept a copy of the highest bit of the lower word in the lowest bit of
the higher word, and considered that the lower one could not wrap before
we could check it. I liked this approach, which could be translated here
into something like the following:

Have all keys use 32-bit resolution, and monitor the 32nd bit. All tasks
must have the same value in this bit, otherwise we consider that their
keys have wrapped. The "current" value of this bit is copied somewhere.
When we walk the tree and find a task with a key which does not have its
32nd bit equal to the current value, it means that this key has wrapped,
so we have to use this information in our arithmetics.

When all keys have their 32nd bit different from the "current" value,
then we switch this value to reflect the new 32nd bit, and everything is
in sync again. The only requirement is that no key wraps around before
the "current" value is switched. This implies that no couple of tasks
could have their keys distant by more than 31 bits (35 minutes), which
seems reasonable. If we can recompute all tasks' keys when all of them
have wrapped, then we do not have to store the "current" bit value
anymore, and consider that it is always zero instead (I don't know if
the code permits this).

It is possible that using the 32nd bit to detect the wrapping may force
us to perform some computations on 33 bits. If this is the case, then it
would be fine if we reduced the range to 31 bits, with all tasks distant
from at most 30 bits (17 minutes).

Also, I remember that the key is signed. I've never experimented with the
tricks above on signed values, but we might be able to define something
like this for the higher bits :

  00 = positive, no wrap
  01 = positive, wrapped
  10 = negative, wrapped
  11 = negative, no wrap

I have no code to show, I just wanted to expose this idea. I know that if
Ingo likes it, he will beat everyone at implementing it ;-)

> 			Linus

Willy


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 13:33                       ` Roman Zippel
  2007-08-01 14:36                         ` Ingo Molnar
@ 2007-08-02  2:17                         ` Linus Torvalds
  2007-08-02  4:57                           ` Willy Tarreau
                                             ` (3 more replies)
  1 sibling, 4 replies; 123+ messages in thread
From: Linus Torvalds @ 2007-08-02  2:17 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Andi Kleen, Mike Galbraith, Ingo Molnar, Andrew Morton, linux-kernel



On Wed, 1 Aug 2007, Roman Zippel wrote:
> 
> I'm not so sure about that. sched_clock() has to be fast, so many archs 
> may want to continue to use jiffies. As soon as one does that one can also 
> save a lot of computational overhead by using 32bit instead of 64bit.
> The question is then how easy that is possible.

I have to say, it would be interesting to try to use 32-bit arithmetic.

I also think it's likely a mistake to do a nanosecond resolution. That's 
part of what forces us to 64 bits, and it's just not even an *interesting* 
resolution.

It would be better, I suspect, to make the scheduler clock totally 
distinct from the other clock sources (many architectures have per-cpu 
cycle counters), and *not* try to even necessarily force it to be a 
"time-based" one.

So I think it would be entirely appropriate to

 - do something that *approximates* microseconds.

   Using microseconds instead of nanoseconds would likely allow us to do 
   32-bit arithmetic in more areas, without any real overflow.

   And quite frankly, even on fast CPU's, the scheduler is almost 
   certainly not going to be able to take any advantage of the nanosecond 
   resolution. Just about anything takes a microsecond - including IO. I 
   don't think nanoseconds are worth the ten extra bits they need, if we 
   could do microseconds in 32 bits.

   And the "approximates" thing would be about the fact that we don't 
   actually care about "absolute" microseconds as much as something that 
   is in the "roughly a microsecond" area. So if we say "it doesn't have 
   to be microseconds, but it should be within a factor of two of a microsecond", 
   we could avoid all the expensive divisions (even if they turn into 
   multiplications with reciprocals), and just let people *shift* the CPU 
   counter instead.

   In fact, we could just say that we don't even care about CPU counters 
   that shift frequency - so what? It gets a bit further off the "ideal 
   microsecond", but the scheduler just cares about _relative_ times 
   between tasks (and that the total latency is within some reasonable 
   value), it doesn't really care about absolute time.

Hmm?

It would still be true that something that is purely based on timer ticks 
will always be liable to have rounding errors that will inevitably mean 
that you don't get good fairness - tuning threads to run less than a timer 
tick at a time would effectively "hide" them from the scheduler 
accounting. However, in the end, I think that's pretty much unavoidable. 
We should make sure that things mostly *work* for that situation, but I 
think it's ok to say that the *quality* of the fairness will obviously 
suffer (and possibly a lot in the extreme cases).

			Linus

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 18:01                                 ` Roman Zippel
@ 2007-08-01 19:05                                   ` Ingo Molnar
  2007-08-09 23:14                                     ` Roman Zippel
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 19:05 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> Hi,
> 
> On Wed, 1 Aug 2007, Ingo Molnar wrote:
> 
> > Andi's theory cannot be true either, Roman's debug info also shows this 
> > /proc/<PID>/sched data:
> > 
> >   clock-delta              :                  95
> > 
> > that means that sched_clock() is in high-res mode, the TSC is alive and 
> > kicking and a sched_clock() call took 95 nanoseconds.
> > 
> > Roman, could you please help us with this mystery?
> 
> Actually, Andi is right. What I sent you was generated directly after 
> boot, as I had to reboot for the right kernel, so a little later 
> appeared this:
> 
> Aug  1 14:54:30 spit kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
> Aug  1 15:09:56 spit kernel: Clocksource tsc unstable (delta = 656747233 ns)
> Aug  1 15:09:56 spit kernel: Time: pit clocksource has been installed.

just to make sure, how does the 'top' output of the l + "lt 3" testcase look 
now on your laptop? Yesterday it was this:

 4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
 4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
 4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
 4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l

and i'm still wondering how that output was possible.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 17:41                             ` Ingo Molnar
@ 2007-08-01 18:14                               ` Roman Zippel
  0 siblings, 0 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-01 18:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> > [...] I didn't say 'sleeper starvation' or 'rounding error', these are 
> > your words and it's your perception of what I said.
> 
> Oh dear :-) It was indeed my perception that yesterday you said:

*sigh* and here you go off again nitpicking on a minor issue just to prove 
your point...
When I wrote the earlier stuff I hadn't realized it was resolution 
related, so things have to be put into proper context, and you make it 
a little too easy for yourself by equating them.
Yippee, you found another small error I made - can we drop this now? Please?

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 17:50                               ` Ingo Molnar
@ 2007-08-01 18:01                                 ` Roman Zippel
  2007-08-01 19:05                                   ` Ingo Molnar
  0 siblings, 1 reply; 123+ messages in thread
From: Roman Zippel @ 2007-08-01 18:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Mike Galbraith, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> Andi's theory cannot be true either, Roman's debug info also shows this 
> /proc/<PID>/sched data:
> 
>   clock-delta              :                  95
> 
> that means that sched_clock() is in high-res mode, the TSC is alive and 
> kicking and a sched_clock() call took 95 nanoseconds.
> 
> Roman, could you please help us with this mystery?

Actually, Andi is right. What I sent you was generated directly after 
boot, as I had to reboot for the right kernel, so a little later appeared 
this:

Aug  1 14:54:30 spit kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Aug  1 15:09:56 spit kernel: Clocksource tsc unstable (delta = 656747233 ns)
Aug  1 15:09:56 spit kernel: Time: pit clocksource has been installed.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 16:27                             ` Linus Torvalds
  2007-08-01 17:48                               ` Andi Kleen
@ 2007-08-01 17:50                               ` Ingo Molnar
  2007-08-01 18:01                                 ` Roman Zippel
  1 sibling, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 17:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Roman Zippel, Mike Galbraith, Andrew Morton, linux-kernel


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 1 Aug 2007, Andi Kleen wrote:
> 
> > Ingo Molnar <mingo@elte.hu> writes:
> > 
> > > thanks. Just to make sure, while you said that your TSC was off on that 
> > > laptop, the bootup log of yours suggests a working TSC:
> > > 
> > >   Time: tsc clocksource has been installed.
> > 
> > Standard kernels often disable the TSC later after running a bit 
> > with it (e.g. on any cpufreq change without p state invariant TSC) 
> 
> I assume that what Roman hit was that he had explicitly disabled the 
> TSC because of TSC instability with the "notsc" kernel command line. 
> Which disables it *entirely*.

but that does not appear to be the case, the debug info i got from Roman 
includes the following boot options:

  Kernel command line: auto BOOT_IMAGE=2.6.23-rc1-git9 ro root=306

there's no "notsc" option there.

Andi's theory cannot be true either, Roman's debug info also shows this 
/proc/<PID>/sched data:

  clock-delta              :                  95

that means that sched_clock() is in high-res mode, the TSC is alive and 
kicking and a sched_clock() call took 95 nanoseconds.

Roman, could you please help us with this mystery?

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 16:27                             ` Linus Torvalds
@ 2007-08-01 17:48                               ` Andi Kleen
  2007-08-01 17:50                               ` Ingo Molnar
  1 sibling, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2007-08-01 17:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ingo Molnar, Roman Zippel, Mike Galbraith,
	Andrew Morton, linux-kernel

> I assume that what Roman hit was that he had explicitly disabled the TSC 
> because of TSC instability with the "notsc" kernel command line. Which 
> disables it *entirely*.

It might just have been cpufreq. That nearly hits everybody with cpufreq
unless you have a pstate invariant TSC; and that's pretty much
always the case on older laptops.

It used to not be that drastic, but since i386 switched to the
generic clock framework it is like that :/

> These days, of course, we should notice it on our own, and just switch 
> away from the TSC as a reliable clock-source, but still allow it to be 
> used for the cases where absolute accuracy is not a big issue.

The rewritten sched_clock() i still have queued does just that. I planned
to submit it for .23, but then during later in-depth testing 
on my machine park I found a showstopper that I couldn't fix in time.
Hopefully for .24

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 15:44                           ` Roman Zippel
@ 2007-08-01 17:41                             ` Ingo Molnar
  2007-08-01 18:14                               ` Roman Zippel
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 17:41 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > i tried your fl.c and if sched_clock() is high-resolution it's scheduled 
> > _perfectly_ by CFS:
> > 
> >    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >   5906 mingo     20   0  1576  244  196 R 71.2  0.0   0:30.11 l
> >   5909 mingo     20   0  1844  344  260 S  9.6  0.0   0:04.02 lt
> >   5907 mingo     20   0  1844  508  424 S  9.5  0.0   0:04.01 lt
> >   5908 mingo     20   0  1844  344  260 S  9.5  0.0   0:04.02 lt
> > 
> > if sched_clock() is low-resolution then indeed the 'lt' tasks will 
> > "hide":
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  2366 mingo     20   0  1576  248  196 R 99.9  0.0   0:07.95 loop_silent
> >     1 root      20   0  2132  636  548 S  0.0  0.0   0:04.64 init
> > 
> > but that's nothing new. CFS cannot conjure up time measurement 
> > methods that do not exist. If you have a low-res clock and if you 
> > create an app that syncs precisely to the tick of that clock via 
> > timers that run off that exact tick then there's nothing the 
> scheduler can do about it. It is false to characterise this as 
> > 'sleeper starvation' or 'rounding error' like you did. No amount of 
> > rounding logic can create a high-resolution clock out of thin air.
> 
> [...] I didn't say 'sleeper starvation' or 'rounding error', these are 
> your words and it's your perception of what I said.

Oh dear :-) It was indeed my perception that yesterday you said:

| A problem here is that this can be exploited, if a job is spread over 
|                        ^^^^^^^^^^^^^^^^^^^^^
| a few threads, they can get more time relativ to other tasks, e.g. in 
|                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| this example there are three tasks that run only for about 1ms every 
| 3ms, but they get far more time than should have gotten fairly:
|          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
| 4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
| 4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
| 4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l

 [ http://lkml.org/lkml/2007/7/31/668 ]

( the underlined portion, in other words, is called 'starvation'.)

And again today, i clearly perceived you to say:

| > in that case 'top' accounting symptoms similar to the above are not 
| > due to the scheduler starvation you suspected, but due the effect of 
| > a low-resolution scheduler clock and a tightly coupled 
| > timer/scheduler tick to it.
|
| Well, it magnifies the rounding problems in CFS.

 [ http://lkml.org/lkml/2007/8/1/153 ]

But you are right, that must be my perception alone, you couldnt 
possibly have said any of that =B-)

Or are you perhaps one of those who claim that saying something 
analogous to sleeper starvation does not equal talking about 'sleeper 
starvation', and that saying something about 'rounding problems in CFS' 
in no way means you were talking about rounding errors? :-)

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 15:07                         ` Ingo Molnar
@ 2007-08-01 17:10                           ` Andi Kleen
  2007-08-01 16:27                             ` Linus Torvalds
  0 siblings, 1 reply; 123+ messages in thread
From: Andi Kleen @ 2007-08-01 17:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Roman Zippel, Mike Galbraith, Linus Torvalds, Andrew Morton,
	linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> thanks. Just to make sure, while you said that your TSC was off on that 
> laptop, the bootup log of yours suggests a working TSC:
> 
>   Time: tsc clocksource has been installed.

Standard kernels often disable the TSC later after running a bit 
with it (e.g. on any cpufreq change without p state invariant TSC) 

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 17:10                           ` Andi Kleen
@ 2007-08-01 16:27                             ` Linus Torvalds
  2007-08-01 17:48                               ` Andi Kleen
  2007-08-01 17:50                               ` Ingo Molnar
  0 siblings, 2 replies; 123+ messages in thread
From: Linus Torvalds @ 2007-08-01 16:27 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Roman Zippel, Mike Galbraith, Andrew Morton, linux-kernel



On Wed, 1 Aug 2007, Andi Kleen wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> > thanks. Just to make sure, while you said that your TSC was off on that 
> > laptop, the bootup log of yours suggests a working TSC:
> > 
> >   Time: tsc clocksource has been installed.
> 
> Standard kernels often disable the TSC later after running a bit 
> with it (e.g. on any cpufreq change without p state invariant TSC) 

I assume that what Roman hit was that he had explicitly disabled the TSC 
because of TSC instability with the "notsc" kernel command line. Which 
disables it *entirely*.

That *used* to be the right thing to do, since the gettimeofday() logic 
originally didn't know about TSC instability, and it just resulted in 
somewhat flaky timekeeping.

These days, of course, we should notice it on our own, and just switch 
away from the TSC as a reliable clock-source, but still allow it to be 
used for the cases where absolute accuracy is not a big issue.

So I suspect that Roman - by virtue of being an old-timer - ends up having 
a workaround for an old problem that isn't needed, and that in turn ends 
up meaning that his scheduler clock also ends up using the really not very 
good timer tick..

			Linus

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 14:36                         ` Ingo Molnar
@ 2007-08-01 16:11                           ` Andi Kleen
  0 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2007-08-01 16:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Roman Zippel, Andi Kleen, Mike Galbraith, Linus Torvalds,
	Andrew Morton, linux-kernel

On Wed, Aug 01, 2007 at 04:36:24PM +0200, Ingo Molnar wrote:
> 
> * Roman Zippel <zippel@linux-m68k.org> wrote:
> 
> > > jiffies based sched_clock should be soon very rare. It's probably 
> > > not worth optimizing for it.
> > 
> > I'm not so sure about that. sched_clock() has to be fast, so many 
> > archs may want to continue to use jiffies. [...]
> 
> i think Andi was talking about the vast majority of the systems out 
> there. For example, check out the arch demography of current Fedora 
> installs (according to the Smolt opt-in UUID based user metrics):

I meant that in many cases where the TSC is considered unreliable
today it'll be possible to use it anyway, at least for sched_clock() 
(and possibly even gtod()) 

The exception would be systems which really have none, but there 
should be very few of those.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 13:59                         ` Ingo Molnar
  2007-08-01 14:04                           ` Arjan van de Ven
@ 2007-08-01 15:44                           ` Roman Zippel
  2007-08-01 17:41                             ` Ingo Molnar
  1 sibling, 1 reply; 123+ messages in thread
From: Roman Zippel @ 2007-08-01 15:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> > > in that case 'top' accounting symptoms similar to the above are not 
> > > due to the scheduler starvation you suspected, but due the effect of 
> > > a low-resolution scheduler clock and a tightly coupled 
> > > timer/scheduler tick to it.
> > 
> > Well, it magnifies the rounding problems in CFS.
> 
> why do you say that? 2.6.22 behaves similarly with a low-res 
> sched_clock(). This has nothing to do with 'rounding problems'!
> 
> i tried your fl.c and if sched_clock() is high-resolution it's scheduled 
> _perfectly_ by CFS:
> 
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   5906 mingo     20   0  1576  244  196 R 71.2  0.0   0:30.11 l
>   5909 mingo     20   0  1844  344  260 S  9.6  0.0   0:04.02 lt
>   5907 mingo     20   0  1844  508  424 S  9.5  0.0   0:04.01 lt
>   5908 mingo     20   0  1844  344  260 S  9.5  0.0   0:04.02 lt
> 
> if sched_clock() is low-resolution then indeed the 'lt' tasks will 
> "hide":
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2366 mingo     20   0  1576  248  196 R 99.9  0.0   0:07.95 loop_silent
>     1 root      20   0  2132  636  548 S  0.0  0.0   0:04.64 init
> 
> but that's nothing new. CFS cannot conjure up time measurement methods 
> that do not exist. If you have a low-res clock and if you create an app 
> that syncs precisely to the tick of that clock via timers that run off 
> that exact tick then there's nothing the scheduler can do about it. It 
is false to characterise this as 'sleeper starvation' or 'rounding 
> error' like you did. No amount of rounding logic can create a 
> high-resolution clock out of thin air.

Please calm down. You apparently already get worked up about one of the 
secondary problems. I didn't say 'sleeper starvation' or 'rounding 
error', these are your words and it's your perception of what I said.

sched_clock() can have a low resolution, which can be a problem for the 
scheduler. This is all this program demonstrates. If and how this problem 
should be solved is a completely different issue, about which I haven't 
said anything yet and since it's not that important right now I'll leave 
it at that for now.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 13:19                       ` Roman Zippel
@ 2007-08-01 15:07                         ` Ingo Molnar
  2007-08-01 17:10                           ` Andi Kleen
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 15:07 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > Please also send me the output of this script:
> > 
> >   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> 
> Send privately.

thanks. Just to make sure, while you said that your TSC was off on that 
laptop, the bootup log of yours suggests a working TSC:

  Time: tsc clocksource has been installed.

and still your fl.c testcase produces the top output that you've 
reported in your first mail? If so then this could be a regression. Or 
did you turn off the tsc manually via notsc? (or was it with a different 
.config or on a different machine)? Please help us figure this out 
exactly, we dont want a real regression to go unnoticed.

If you can reproduce that problem with a working TSC then please 
generate a second cfs-debug-info.sh snapshot _while_ your fl+l workload 
is running and send that to me (i'll reply back to it publicly). Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  3:41                   ` CFS review Roman Zippel
                                       ` (4 preceding siblings ...)
  2007-08-01 14:40                     ` Ingo Molnar
@ 2007-08-01 14:49                     ` Peter Zijlstra
  2007-08-02 17:36                       ` Roman Zippel
  2007-08-02 15:46                     ` Ingo Molnar
  6 siblings, 1 reply; 123+ messages in thread
From: Peter Zijlstra @ 2007-08-01 14:49 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Mike Galbraith, Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel

Hi Roman,

Took me most of today trying to figure out WTH you did in fs2.c; more
math and fundamental explanations would have been good. So please bear
with me as I try to recap this thing. (No, your code was very much _not_
obvious, a few comments and broken-out functions would have made a world
of difference.)


So, for each task we keep normalised time

 normalised time := time/weight

using Bresenham's algorithm we can do this perfectly (up until a renice
- where you'd get errors)

        avg_frac += weight_inv
        
        weight_inv = X / weight
        
        avg = avg_frac / weight0_inv
        
        weight0_inv = X / weight0
        
        avg = avg_frac / (X / weight0) 
            = (X / weight) / (X / weight0) 
            = X / weight * weight0 / X 
            = weight0 / weight
        

So avg ends up being in units of [weight0/weight].
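
(To make sure I read that right, here is the same bookkeeping as a
compilable toy - the names and constants are mine and only loosely
follow fs2.c:)

#include <stdio.h>

#define X	(1 << 16)
#define WEIGHT0	40			/* the "nice 0" reference weight */

struct ptask {
	unsigned int weight;		/* task weight */
	unsigned int weight_inv;	/* X / weight */
	unsigned int avg;		/* normalised time, integer part */
	unsigned int avg_frac;		/* remainder, carries at X/weight0 */
};

/* charge one unit of real time: avg_frac += X/weight, carry at X/weight0,
 * so avg ends up in units of weight0/weight as derived above */
static void charge(struct ptask *t, unsigned int weight0_inv)
{
	t->avg_frac += t->weight_inv;
	while (t->avg_frac >= weight0_inv) {
		t->avg_frac -= weight0_inv;
		t->avg++;
	}
}

int main(void)
{
	struct ptask t = { .weight = 80, .weight_inv = X / 80 };
	unsigned int weight0_inv = X / WEIGHT0;
	int i;

	for (i = 0; i < 1000; i++)
		charge(&t, weight0_inv);

	/* 1000 units at weight 80 ~= 500 in weight0 units (40/80 per unit) */
	printf("avg = %u (+%u/%u)\n", t.avg, t.avg_frac, weight0_inv);
	return 0;
}

A weight-80 task charged 1000 units ends up at avg = 500, i.e. it
advances at weight0/weight per unit, which is exactly the unit derived
above.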

Then, in order to allow sleeping, we need to have a global clock to sync
with. It's this global clock that gave me headaches to reconstruct.

We're looking for a time like this:

  rq_time := sum(time)/sum(weight)

And you commented that the /sum(weight) part is where CFS obtained its
accumulating rounding error? (I'm inclined to believe the error will
statistically be 0, but I'll readily accept otherwise if you can show a
practical 'exploit')

It's not obvious how to do this using modulo logic like Bresenham, because
that would involve using a common multiple of all possible weights.

What you ended up with is quite interesting if correct.

	sum_avg_frac += weight_inv_{i}

however by virtue of the scheduler minimising:

  avg_{i} - avg_{j} | i != j

this gets a factor of:

  weight_{i}/sum_{j}^{N}(weight_{j})

 ( seems correct, needs more analysis though, this is very much a
   statistical step based on the previous constraint. this might
   very well introduce some errors )

resulting in:

        sum_avg_frac += sum_{i}^{N}(weight_inv_{i} * weight_{i} / sum_{j}^{N}(weight_{j}))

        weight_inv = X / weight

        sum_avg = sum_avg_frac / sum(weight0_inv)
                = sum_avg_frac / (N * weight0_inv)

        weight0_inv = X / weight0

        sum_avg = sum_avg_frac / (N * weight0_inv)
                = sum_{i}^{N}(weight_inv_{i} * weight_{i} / sum_{j}^{N}(weight_{j})) / (N * weight0_inv)
                = sum_{i}^{N}(X/weight_{i} * weight_{i} / sum_{j}^{N}(weight_{j})) / (N * (X/weight0))
                = N*X / sum_{j}(weight_{j}) * weight0 / (N*X)
                = weight0 / sum_{j}(weight_{j})

Exactly the unit we were looking for [weight0/sum(weight)]

 ( the extra weight0 matching the one we had in the per task normalised
   time )

I'm not sure all this is less complex than CFS, I'd be inclined to say
it is more so.

Also, I think you have an accumulating error on wakeup where you sync
with the global clock but fully discard the fraction.

Anyway, as said a more detailed explanation and certainly a proof of
your math would be nice. Is this something along the lines of what you
intended to convey?

If so, in the future please use more understandable language, we were
taught math for a reason :-)

Regards,
Peter




^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  3:41                   ` CFS review Roman Zippel
                                       ` (3 preceding siblings ...)
  2007-08-01 13:20                     ` Andi Kleen
@ 2007-08-01 14:40                     ` Ingo Molnar
  2007-08-01 14:49                     ` Peter Zijlstra
  2007-08-02 15:46                     ` Ingo Molnar
  6 siblings, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 14:40 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> 	while (1)
> 		sched_yield();

sched_yield() is being reworked at the moment. But in general we want 
apps to move away from it to sane locking constructs ASAP. There's some 
movement in the 3D space at least.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 13:33                       ` Roman Zippel
@ 2007-08-01 14:36                         ` Ingo Molnar
  2007-08-01 16:11                           ` Andi Kleen
  2007-08-02  2:17                         ` Linus Torvalds
  1 sibling, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 14:36 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Andi Kleen, Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > jiffies based sched_clock should be soon very rare. It's probably 
> > not worth optimizing for it.
> 
> I'm not so sure about that. sched_clock() has to be fast, so many 
> archs may want to continue to use jiffies. [...]

i think Andi was talking about the vast majority of the systems out 
there. For example, check out the arch demography of current Fedora 
installs (according to the Smolt opt-in UUID based user metrics):

   http://smolt.fedoraproject.org/

   i686:     74743
   x86_64:   18599
   i386:      1208
   ppc:        527
   ppc64:      396
   sparc64:     14
   ---------------
   Total:    95488

even pure i386 (kernels, not systems) is only 1.2% of all installs. By 
the time the CFS kernel gets into a distro (a few months at minimum, 
typically a year) this percentage will go down further. And embedded 
doesnt really care about task-statistics corner cases [ it likely 
doesnt have 'top' installed - likely doesnt even have /proc mounted or 
even built in ;-) ].

of course CFS should not do _worse_ stats than what we had before, and 
should not break or massively misbehave. Also, anything sane we can do 
for low-resolution arches we should do (and we already do quite a bit - 
the whole wmult stuff is there to avoid expensive divisions) - and i regularly 
booted CFS with a low-resolution clock to make sure it works. So i'm not 
trying to duck anything, we've just got to keep our design priorities 
right :-)
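
( To illustrate what the wmult trick buys - this is a user-space sketch 
of the idea, not the kernel code; the real thing takes the inverse from 
the prio_to_wmult[] table and also has to worry about overflow: )

#include <stdio.h>
#include <stdint.h>

#define WMULT_SHIFT	32

/* ~ delta_exec / weight, done as multiply + shift; inv_weight can be
 * precomputed once per nice level instead of dividing on every update */
static uint64_t calc_delta(uint64_t delta_exec, uint32_t weight)
{
	uint64_t inv_weight = ((uint64_t)1 << WMULT_SHIFT) / weight;

	return (delta_exec * inv_weight) >> WMULT_SHIFT;
}

int main(void)
{
	/* 3ms of runtime at the nice-0 weight of 1024 */
	printf("%llu\n", (unsigned long long)calc_delta(3000000, 1024));
	return 0;
}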

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 13:59                         ` Ingo Molnar
@ 2007-08-01 14:04                           ` Arjan van de Ven
  2007-08-01 15:44                           ` Roman Zippel
  1 sibling, 0 replies; 123+ messages in thread
From: Arjan van de Ven @ 2007-08-01 14:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Roman Zippel, Mike Galbraith, Linus Torvalds, Andrew Morton,
	linux-kernel


> but that's nothing new. CFS cannot conjure up time measurement methods 
> that do not exist. If you have a low-res clock and if you create an app 
> that syncs precisely to the tick of that clock via timers that run off 
> that exact tick then there's nothing the scheduler can do about it. It 
> is false to characterise this as 'sleeper starvation' or 'rounding 
> error' like you did. No amount of rounding logic can create a 
> high-resolution clock out of thin air.


CFS is only as fair as your clock is good.



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 12:21                       ` Roman Zippel
  2007-08-01 12:23                         ` Ingo Molnar
@ 2007-08-01 13:59                         ` Ingo Molnar
  2007-08-01 14:04                           ` Arjan van de Ven
  2007-08-01 15:44                           ` Roman Zippel
  1 sibling, 2 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 13:59 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > in that case 'top' accounting symptoms similar to the above are not 
> > due to the scheduler starvation you suspected, but due to the effect of 
> > a low-resolution scheduler clock and a tightly coupled 
> > timer/scheduler tick to it.
> 
> Well, it magnifies the rounding problems in CFS.

why do you say that? 2.6.22 behaves similarly with a low-res 
sched_clock(). This has nothing to do with 'rounding problems'!

i tried your fl.c and if sched_clock() is high-resolution it's scheduled 
_perfectly_ by CFS:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  5906 mingo     20   0  1576  244  196 R 71.2  0.0   0:30.11 l
  5909 mingo     20   0  1844  344  260 S  9.6  0.0   0:04.02 lt
  5907 mingo     20   0  1844  508  424 S  9.5  0.0   0:04.01 lt
  5908 mingo     20   0  1844  344  260 S  9.5  0.0   0:04.02 lt

if sched_clock() is low-resolution then indeed the 'lt' tasks will 
"hide":

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2366 mingo     20   0  1576  248  196 R 99.9  0.0   0:07.95 loop_silent
    1 root      20   0  2132  636  548 S  0.0  0.0   0:04.64 init

but that's nothing new. CFS cannot conjure up time measurement methods 
that do not exist. If you have a low-res clock and if you create an app 
that syncs precisely to the tick of that clock via timers that run off 
that exact tick then there's nothing the scheduler can do about it. It 
is false to characterise this as 'sleeper starvation' or 'rounding 
error' like you did. No amount of rounding logic can create a 
high-resolution clock out of thin air.
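
( To make the effect concrete, here's a toy user-space model - my 
illustration, not kernel code - of a task that syncs to a 1000 Hz clock 
and always stops just before the tick: )

#include <stdio.h>

#define HZ	1000

static long long now_ns;			/* "real" time */

static long long clock_ns(void)			/* low-res sched_clock() */
{
	return now_ns / (1000000000 / HZ) * (1000000000 / HZ);
}

int main(void)
{
	long long charged = 0;
	int i;

	for (i = 0; i < 1000; i++) {
		long long start = clock_ns();

		now_ns += 950000;	/* the task burns 0.95ms ... */
		charged += clock_ns() - start;
		now_ns += 50000;	/* ... and someone else runs across the tick */
	}
	printf("ran 950 ms, got charged %lld ms\n", charged / 1000000);
	return 0;
}

no matter how you round, the 0.95ms of work per tick is simply never 
visible to a clock that only advances in 1ms steps.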

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 13:20                     ` Andi Kleen
@ 2007-08-01 13:33                       ` Roman Zippel
  2007-08-01 14:36                         ` Ingo Molnar
  2007-08-02  2:17                         ` Linus Torvalds
  0 siblings, 2 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-01 13:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mike Galbraith, Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Andi Kleen wrote:

> > especially if one already knows that
> > scheduler clock has only limited resolution (because it's based on
> > jiffies), it becomes possible to use mostly 32bit values.
> 
> jiffies based sched_clock should be soon very rare. It's probably
> not worth optimizing for it.

I'm not so sure about that. sched_clock() has to be fast, so many archs 
may want to continue to use jiffies. As soon as one does that, one can also 
save a lot of computational overhead by using 32bit instead of 64bit values.
The question then is how easily that is possible.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  3:41                   ` CFS review Roman Zippel
                                       ` (2 preceding siblings ...)
  2007-08-01 11:37                     ` Ingo Molnar
@ 2007-08-01 13:20                     ` Andi Kleen
  2007-08-01 13:33                       ` Roman Zippel
  2007-08-01 14:40                     ` Ingo Molnar
                                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 123+ messages in thread
From: Andi Kleen @ 2007-08-01 13:20 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Mike Galbraith, Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel

Roman Zippel <zippel@linux-m68k.org> writes:

> especially if one already knows that
> scheduler clock has only limited resolution (because it's based on
> jiffies), it becomes possible to use mostly 32bit values.

jiffies based sched_clock should be soon very rare. It's probably
not worth optimizing for it.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  7:12                     ` Ingo Molnar
  2007-08-01  7:26                       ` Mike Galbraith
@ 2007-08-01 13:19                       ` Roman Zippel
  2007-08-01 15:07                         ` Ingo Molnar
  1 sibling, 1 reply; 123+ messages in thread
From: Roman Zippel @ 2007-08-01 13:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> Please also send me the output of this script:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

Send privately.

> Could you also please send the source code for the "l.c" and "lt.c" apps
> you used for your testing so i can have a look. Thanks!

l.c is a simple busy loop (well, with the option to start many of them; a 
minimal sketch of it follows after lt.c below).
This is lt.c; what it does is run a bit less than a jiffy, so it 
needs a low-resolution clock to trigger the problem:

#include <stdio.h>
#include <stdlib.h>	/* atoi() */
#include <unistd.h>	/* fork(), pause() */
#include <signal.h>
#include <time.h>
#include <sys/time.h>

#define NSEC 1000000000
#define USEC 1000000

#define PERIOD	(NSEC/1000)

int i;

void worker(int sig)
{
	struct timeval tv;
	long long t0, t;

	gettimeofday(&tv, 0);
	//printf("%u,%lu\n", i, tv.tv_usec);
	t0 = (long long)tv.tv_sec * 1000000 + tv.tv_usec + PERIOD / 1000 - 50;
	do {
		gettimeofday(&tv, 0);
		t = (long long)tv.tv_sec * 1000000 + tv.tv_usec;
	} while (t < t0);
	
}

int main(int ac, char **av)
{
	int cnt;
	timer_t timer;
	struct itimerspec its;
	struct sigaction sa;

	cnt = i = atoi(av[1]);

	sa.sa_handler = worker;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);

	sigaction(SIGALRM, &sa, 0);

	clock_gettime(CLOCK_MONOTONIC, &its.it_value);
	its.it_interval.tv_sec = 0;
	its.it_interval.tv_nsec = PERIOD * cnt;

	while (--i > 0 && fork() > 0)
		;

	its.it_value.tv_nsec += i * PERIOD;
	if (its.it_value.tv_nsec > NSEC) {
		its.it_value.tv_sec++;
		its.it_value.tv_nsec -= NSEC;
	}

	timer_create(CLOCK_MONOTONIC, 0, &timer);
	timer_settime(timer, TIMER_ABSTIME, &its, 0);

	printf("%u,%lu\n", i, its.it_interval.tv_nsec);

	while (1) 
		pause();
	return 0;
}
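
And l.c, as said, is essentially just a busy loop - this is a minimal 
sketch of it, not necessarily the exact file:

#include <stdlib.h>
#include <unistd.h>

int main(int ac, char **av)
{
	int i = ac > 1 ? atoi(av[1]) : 1;

	/* optionally fork off additional copies, same as lt.c does */
	while (--i > 0 && fork() > 0)
		;
	for (;;)
		;	/* burn cpu */
	return 0;
}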


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 11:37                     ` Ingo Molnar
@ 2007-08-01 12:27                       ` Roman Zippel
  0 siblings, 0 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-01 12:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> * Roman Zippel <zippel@linux-m68k.org> wrote:
> 
> > [...] the increase in code size:
> > 
> > 2.6.22:
> >    text    data     bss     dec     hex filename
> >   10150      24    3344   13518    34ce kernel/sched.o
> > 
> > recent git:
> >    text    data     bss     dec     hex filename
> >   14724     228    2020   16972    424c kernel/sched.o
> > 
> > That's i386 without stats/debug. [...]
> 
> that's without CONFIG_SMP, right? :-) On SMP they are about net break 
> even:
> 
>      text    data     bss     dec     hex filename
>     26535    4173      24   30732    780c kernel/sched.o-2.6.22
>     28378    2574      16   30968    78f8 kernel/sched.o-2.6.23-git

That's still quite an increase in some rather important code paths and 
it's not just the code size, but also code complexity which is important 
- a major point I tried to address in my review.

> (plus a further ~1.5K per CPU data reduction which is not visible here) 

That's why I mentioned the increased runtime memory usage...

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 12:21                       ` Roman Zippel
@ 2007-08-01 12:23                         ` Ingo Molnar
  2007-08-01 13:59                         ` Ingo Molnar
  1 sibling, 0 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 12:23 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Wed, 1 Aug 2007, Ingo Molnar wrote:
> 
> > > [...] e.g. in this example there are three tasks that run only for 
> > > about 1ms every 3ms, but they get far more time than they should have 
> > > gotten fairly:
> > > 
> > >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> > >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> > >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> > >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> > 
> > Mike and I have managed to reproduce similarly looking 'top' output, 
> > but it takes some effort: we had to deliberately run a non-TSC 
> > sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.
> 
> I used my old laptop for these tests, where tsc is indeed disabled due 
> to instability. Otherwise the kernel was configured with 
> CONFIG_HZ=1000.

please send all the debug info and source code we asked for - thanks!

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01 11:22                     ` Ingo Molnar
@ 2007-08-01 12:21                       ` Roman Zippel
  2007-08-01 12:23                         ` Ingo Molnar
  2007-08-01 13:59                         ` Ingo Molnar
  2007-08-03  3:04                       ` Matt Mackall
  1 sibling, 2 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-01 12:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel

Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> > [...] e.g. in this example there are three tasks that run only for 
> > about 1ms every 3ms, but they get far more time than they should have 
> > gotten fairly:
> > 
> >  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> >  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> >  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> >  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> 
> Mike and I have managed to reproduce similarly looking 'top' output, 
> but it takes some effort: we had to deliberately run a non-TSC 
> sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.

I used my old laptop for these tests, where tsc is indeed disabled due to 
instability. Otherwise the kernel was configured with CONFIG_HZ=1000.

> in that case 'top' accounting symptoms similar to the above are not due 
> to the scheduler starvation you suspected, but due to the effect of a 
> low-resolution scheduler clock and a tightly coupled timer/scheduler 
> tick to it.

Well, it magnifies the rounding problems in CFS.
I mainly wanted to test the behaviour of CFS a little and I thought I saw a 
patch which enabled the use of TSC in these cases, so I didn't check 
sched_clock().

Anyway, I want to point out that this wasn't the main focus of what I 
wrote.

bye, Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  3:41                   ` CFS review Roman Zippel
  2007-08-01  7:12                     ` Ingo Molnar
  2007-08-01 11:22                     ` Ingo Molnar
@ 2007-08-01 11:37                     ` Ingo Molnar
  2007-08-01 12:27                       ` Roman Zippel
  2007-08-01 13:20                     ` Andi Kleen
                                       ` (3 subsequent siblings)
  6 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 11:37 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> [...] the increase in code size:
> 
> 2.6.22:
>    text    data     bss     dec     hex filename
>   10150      24    3344   13518    34ce kernel/sched.o
> 
> recent git:
>    text    data     bss     dec     hex filename
>   14724     228    2020   16972    424c kernel/sched.o
> 
> That's i386 without stats/debug. [...]

that's without CONFIG_SMP, right? :-) On SMP they are about net break 
even:

     text    data     bss     dec     hex filename
    26535    4173      24   30732    780c kernel/sched.o-2.6.22
    28378    2574      16   30968    78f8 kernel/sched.o-2.6.23-git

(plus a further ~1.5K per CPU data reduction which is not visible here) 

btw., here's the general change in size of a generic vmlinux from .22 to 
.23-git, using the same .config:

     text    data     bss     dec     hex filename
  5256628  520760 1331200 7108588  6c77ec vmlinux.22
  5306918  535844 1327104 7169866  6d674a vmlinux.23-git

+50K. (this was on UP)

In any case, there's still some debugging code in the scheduler (beyond 
SCHED_DEBUG), i'll work some more on reducing it.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  3:41                   ` CFS review Roman Zippel
  2007-08-01  7:12                     ` Ingo Molnar
@ 2007-08-01 11:22                     ` Ingo Molnar
  2007-08-01 12:21                       ` Roman Zippel
  2007-08-03  3:04                       ` Matt Mackall
  2007-08-01 11:37                     ` Ingo Molnar
                                       ` (4 subsequent siblings)
  6 siblings, 2 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01 11:22 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


* Roman Zippel <zippel@linux-m68k.org> wrote:

> [...] e.g. in this example there are three tasks that run only for 
> about 1ms every 3ms, but they get far more time than they should have 
> gotten fairly:
> 
>  4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
>  4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
>  4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
>  4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l

Mike and I have managed to reproduce similarly looking 'top' output, 
but it takes some effort: we had to deliberately run a non-TSC 
sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.

in that case 'top' accounting symptoms similar to the above are not due 
to the scheduler starvation you suspected, but due to the effect of a 
low-resolution scheduler clock and a tightly coupled timer/scheduler 
tick to it. I tried the very same workload on 2.6.22 (with the same 
.config) and i saw similarly anomalous 'top' output. (Not only can one 
create really anomalous CPU usage, one can completely hide tasks from 
'top' output.)

if your test-box has a high-resolution sched_clock() [easily possible] 
then please send us the lt.c and l.c code so that we can have a look.

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  7:36                           ` Mike Galbraith
@ 2007-08-01  8:49                             ` Mike Galbraith
  0 siblings, 0 replies; 123+ messages in thread
From: Mike Galbraith @ 2007-08-01  8:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Roman Zippel, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, 2007-08-01 at 09:36 +0200, Mike Galbraith wrote:
> On Wed, 2007-08-01 at 09:30 +0200, Ingo Molnar wrote:
>  
> > yeah, the posted numbers look most weird, but there's a complete lack of 
> > any identification of test environment - so we'll need some more word 
> > from Roman. Perhaps this was run on some really old box that does not 
> > have a high-accuracy sched_clock()? The patch below should simulate that 
> > scenario on 32-bit x86.
> > 
> > 	Ingo
> > 
> > Index: linux/arch/i386/kernel/tsc.c
> > ===================================================================
> > --- linux.orig/arch/i386/kernel/tsc.c
> > +++ linux/arch/i386/kernel/tsc.c
> > @@ -110,7 +110,7 @@ unsigned long long native_sched_clock(vo
> >  	 *   very important for it to be as fast as the platform
> >  	 *   can achive it. )
> >  	 */
> > -	if (unlikely(!tsc_enabled && !tsc_unstable))
> > +//	if (unlikely(!tsc_enabled && !tsc_unstable))
> >  		/* No locking but a rare wrong value is not a big deal: */
> >  		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
> >  
> 
> Ah, thanks.  I noticed that clocksource= went away.  I'll test with
> stats, with and without jiffies resolution.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
6465 root      20   0  1432  356  296 R   30  0.0   1:02.55 1 chew
6462 root      20   0  1576  216  140 R   23  0.0   0:50.29 1 massive_intr_x
6463 root      20   0  1576  216  140 R   23  0.0   0:50.23 1 massive_intr_x
6464 root      20   0  1576  216  140 R   23  0.0   0:50.28 1 massive_intr_x

Well, the jiffies-resolution clock did upset fairness a bit with a burn
time right at jiffies resolution, but not nearly as bad as on Roman's box,
and not in favor of the sleepers.  With the longer burn time of stock
massive_intr.c (8ms burn, 1ms sleep), the lower resolution clock didn't
upset it.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
6511 root      20   0  1572  220  140 R   25  0.0   1:00.11 1 massive_intr
6512 root      20   0  1572  220  140 R   25  0.0   1:00.14 1 massive_intr
6514 root      20   0  1432  356  296 R   25  0.0   1:00.31 1 chew
6513 root      20   0  1572  220  140 R   24  0.0   1:00.14 1 massive_intr
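
(For reference, the work/sleep cycle referred to above is essentially the
following - a sketch with the stock 8ms/1ms parameters, not the actual
massive_intr.c source:)

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define WORK_USEC	8000	/* ~8ms burn */
#define SLEEP_USEC	1000	/* ~1ms sleep */

static long long now_usec(void)
{
	struct timeval tv;

	gettimeofday(&tv, 0);
	return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(void)
{
	unsigned long loops = 0;

	while (1) {
		long long t0 = now_usec();

		while (now_usec() - t0 < WORK_USEC)
			;		/* burn cpu */
		usleep(SLEEP_USEC);
		if (!(++loops % 1000))
			printf("%lu iterations\n", loops);
	}
	return 0;
}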

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  7:30                         ` Ingo Molnar
@ 2007-08-01  7:36                           ` Mike Galbraith
  2007-08-01  8:49                             ` Mike Galbraith
  0 siblings, 1 reply; 123+ messages in thread
From: Mike Galbraith @ 2007-08-01  7:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Roman Zippel, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, 2007-08-01 at 09:30 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:

> > I haven't been able to reproduce this with any combination of 
> > features, and massive_intr tweaked to his work/sleep cycle.  I notice 
> > he's collecting stats though, and they look funky.  Recompiling.
> 
> yeah, the posted numbers look most weird, but there's a complete lack of 
> any identification of test environment - so we'll need some more word 
> from Roman. Perhaps this was run on some really old box that does not 
> have a high-accuracy sched_clock()? The patch below should simulate that 
> scenario on 32-bit x86.
> 
> 	Ingo
> 
> Index: linux/arch/i386/kernel/tsc.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/tsc.c
> +++ linux/arch/i386/kernel/tsc.c
> @@ -110,7 +110,7 @@ unsigned long long native_sched_clock(vo
>  	 *   very important for it to be as fast as the platform
>  	 *   can achive it. )
>  	 */
> -	if (unlikely(!tsc_enabled && !tsc_unstable))
> +//	if (unlikely(!tsc_enabled && !tsc_unstable))
>  		/* No locking but a rare wrong value is not a big deal: */
>  		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
>  

Ah, thanks.  I noticed that clocksource= went away.  I'll test with
stats, with and without jiffies resolution.

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  7:26                       ` Mike Galbraith
@ 2007-08-01  7:30                         ` Ingo Molnar
  2007-08-01  7:36                           ` Mike Galbraith
  0 siblings, 1 reply; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01  7:30 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Roman Zippel, Linus Torvalds, Andrew Morton, linux-kernel


* Mike Galbraith <efault@gmx.de> wrote:

> > Thanks for the testing and the feedback, it's much appreciated! :-) 
> > On what platform did you do your tests, and what .config did you use 
> > (and could you please send me your .config)?
> > 
> > Please also send me the output of this script:
> > 
> >   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> > 
> > (if the output is too large send it to me privately, or bzip2 -9 
> > it.)
> > 
> > Could you also please send the source code for the "l.c" and "lt.c" 
> > apps you used for your testing so i can have a look. Thanks!
> 
> I haven't been able to reproduce this with any combination of 
> features, and massive_intr tweaked to his work/sleep cycle.  I notice 
> he's collecting stats though, and they look funky.  Recompiling.

yeah, the posted numbers look most weird, but there's a complete lack of 
any identification of test environment - so we'll need some more word 
from Roman. Perhaps this was run on some really old box that does not 
have a high-accuracy sched_clock()? The patch below should simulate that 
scenario on 32-bit x86.

	Ingo

Index: linux/arch/i386/kernel/tsc.c
===================================================================
--- linux.orig/arch/i386/kernel/tsc.c
+++ linux/arch/i386/kernel/tsc.c
@@ -110,7 +110,7 @@ unsigned long long native_sched_clock(vo
 	 *   very important for it to be as fast as the platform
 	 *   can achive it. )
 	 */
-	if (unlikely(!tsc_enabled && !tsc_unstable))
+//	if (unlikely(!tsc_enabled && !tsc_unstable))
 		/* No locking but a rare wrong value is not a big deal: */
 		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  7:12                     ` Ingo Molnar
@ 2007-08-01  7:26                       ` Mike Galbraith
  2007-08-01  7:30                         ` Ingo Molnar
  2007-08-01 13:19                       ` Roman Zippel
  1 sibling, 1 reply; 123+ messages in thread
From: Mike Galbraith @ 2007-08-01  7:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Roman Zippel, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, 2007-08-01 at 09:12 +0200, Ingo Molnar wrote:
> Roman,
> 
> Thanks for the testing and the feedback, it's much appreciated! :-) On
> what platform did you do your tests, and what .config did you use (and
> could you please send me your .config)?
> 
> Please also send me the output of this script:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> 
> (if the output is too large send it to me privately, or bzip2 -9 it.)
> 
> Could you also please send the source code for the "l.c" and "lt.c" apps
> you used for your testing so i can have a look. Thanks!

I haven't been able to reproduce this with any combination of features,
and massive_intr tweaked to his work/sleep cycle.  I notice he's
collecting stats though, and they look funky.  Recompiling.

	-Mike


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: CFS review
  2007-08-01  3:41                   ` CFS review Roman Zippel
@ 2007-08-01  7:12                     ` Ingo Molnar
  2007-08-01  7:26                       ` Mike Galbraith
  2007-08-01 13:19                       ` Roman Zippel
  2007-08-01 11:22                     ` Ingo Molnar
                                       ` (5 subsequent siblings)
  6 siblings, 2 replies; 123+ messages in thread
From: Ingo Molnar @ 2007-08-01  7:12 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Mike Galbraith, Linus Torvalds, Andrew Morton, linux-kernel


Roman,

Thanks for the testing and the feedback, it's much appreciated! :-) On
what platform did you do your tests, and what .config did you use (and
could you please send me your .config)?

Please also send me the output of this script:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

(if the output is too large send it to me privately, or bzip2 -9 it.)

Could you also please send the source code for the "l.c" and "lt.c" apps
you used for your testing so i can have a look. Thanks!

	Ingo

^ permalink raw reply	[flat|nested] 123+ messages in thread

* CFS review
  2007-07-14  5:04                 ` Mike Galbraith
@ 2007-08-01  3:41                   ` Roman Zippel
  2007-08-01  7:12                     ` Ingo Molnar
                                       ` (6 more replies)
  0 siblings, 7 replies; 123+ messages in thread
From: Roman Zippel @ 2007-08-01  3:41 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11264 bytes --]

Hi,

On Sat, 14 Jul 2007, Mike Galbraith wrote:

> > On Fri, 13 Jul 2007, Mike Galbraith wrote:
> > 
> > > > The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
> > > > attempt to scale that down a little...
> > > 
> > > See prio_to_weight[], prio_to_wmult[] and sysctl_sched_stat_granularity.
> > > Perhaps more can be done, but "without any attempt..." isn't accurate.
> > 
> > Calculating these values at runtime would have been completely insane, the 
> > alternative would be a crummy approximation, so using a lookup table is 
> > actually a good thing. That's not the problem.
> 
> I meant see usage.

I more meant serious attempts. At this point I'm not that much interested 
in a few localized optimizations, what I'm interested in is how this can be
optimized at the design level (e.g. how arch information can be used to
simplify things). So I spent quite a bit of time looking through cfs and 
experimenting with some ideas. I want to put the main focus on the 
performance aspect, but there are a few other issues as well.


But first something else (especially for Ingo): I tried to be very careful 
with any claims made in this mail, but this of course doesn't exclude the 
possibility of errors, in which case I'd appreciate any corrections. Any 
explanations done in this mail don't imply that anyone needs any such 
explanations, they're done to keep things in context, so that interested 
readers have a chance to follow even if they don't have the complete 
background information. Any suggestions made don't imply that they have to 
be implemented like this, they are more an incentive for further 
discussion and I'm always interested in better solutions.


A first indication that something may not be quite right is the increase
in code size:

2.6.22:
   text    data     bss     dec     hex filename
  10150      24    3344   13518    34ce kernel/sched.o

recent git:
   text    data     bss     dec     hex filename
  14724     228    2020   16972    424c kernel/sched.o

That's i386 without stats/debug. A lot of the new code is in regularly
executed regions and it's often not exactly trivial code as cfs added
lots of heavy 64bit calculations. With the increased text comes
increased runtime memory usage, e.g. task_struct increased so that only
5 of them instead of 6 now fit into 8KB.

Since sched-design-CFS.txt doesn't really go into any serious detail,
the EEVDF paper was more helpful, and after playing with the ideas a
little I noticed that the whole idea of fair scheduling can be explained
somewhat more simply; I'm a little surprised not to find it mentioned
anywhere.
So a different view on this is that the runtime of a task is simply
normalized and the virtual time (or fair_clock) is the weighted average of
these normalized runtimes. The advantage of normalization is that it
makes things comparable: once the normalized time values are equal, each
task has gotten its fair share. It's more obvious in the EEVDF paper, cfs makes
it a bit more complicated, as it uses the virtual time to calculate the
eligible runtime, but it doesn't maintain a per process virtual time
(fair_key is not quite the same).
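
Spelled out in formulas (my notation, just restating the above):

	normalized_time_i = time_i / weight_i

	fair_clock = sum_i(weight_i * normalized_time_i) / sum_i(weight_i)
	           = sum_i(time_i) / sum_i(weight_i)

so every task has received exactly its fair share when all the
normalized_time_i are equal (and thus equal to fair_clock).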

Here we get to the first problem, cfs is not overly accurate at
maintaining a precise balance. First there are a lot of rounding errors due
to the constant conversion between normalized and non-normalized values, and
the higher the update frequency, the bigger the error. The effect of
this can be seen by running:

	while (1)
		sched_yield();

and watching the sched_debug output: the underrun counter goes crazy.
cfs thus needs the limiting to keep this misbehaviour under control.
The problem here is that it's not that difficult to hit one of the many
limits, which may change the behaviour and makes it hard to predict how
cfs will behave in different situations.

The next issue is scheduler granularity: here I don't quite understand
why the actual running time has no influence at all, which makes it
difficult to predict how much cpu time a process will get at a time
(even the comments only refer to the vmstat output). What is basically
used instead is the normalized time since the task was enqueued, though
practically it's a bit more complicated, as fair_key is not entirely a
normalized time value. If the wait_runtime value is positive, higher
prioritized tasks are given even more priority than they already get
from their larger wait_runtime value. The problem here is that this
triggers underruns and lower priority tasks get even less time.

Another issue is the sleep bonus given to sleeping tasks. A problem here
is that this can be exploited: if a job is spread over a few threads,
they can get more time relative to other tasks, e.g. in this example
there are three tasks that run only for about 1ms every 3ms, but they
get far more time than they should have gotten fairly:

 4544 roman     20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
 4545 roman     20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
 4546 roman     20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
 4547 roman     20   0  1532  272  216 R  3.3  0.2   0:01.94 l

The debug output for this is also interesting:

            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
              lt  4544  42958653977764      -2800118       2797765     11753   120     11449444657       201615584     23750292211            9600               0
              lt  4545  42958653767067      -3010815       2759284     11747   120     11442604925       202910051     23746575676            9613               0
               l  4547  42958653330276      -3447606       1332892     32333   120      1035284848       -47628857               0               0           14247

Practically this means a few interactive tasks can steal quite a lot of
time from other tasks, which might try to get some work done. This may
be fine in Desktop environments, but I'm not sure it can be that easily
generalized. This can make cfs quite unfair, if waiting outside the
runqueue has more advantages than waiting in the runqueue.

Finally there is still the issue with the nice levels, where I still
think the massive increase overcompensates for adequate scheduling
classes, e.g. to give audio apps a fixed amount of time instead of a
relative portion.

Overall, while cfs has a number of good ideas, I don't think it was
quite ready for a stable release. Maybe it's possible to fix this before
the release, but I certainly would have preferred less time pressure.
It's not just the small inaccuracies, it's also the significantly
increased complexity. In this regard I find the claim that cfs "has no
heuristics whatsoever" interesting, for that to be true I would expect a
little more accuracy, but that wouldn't be a big problem if cfs were a
really fast and compact scheduler, and here it's quite hard to argue that
cfs has been an improvement in this particular area. Anyway, before Ingo
starts accusing me now of "negativism", let's look at some
possibilities of how this could be improved.


To reduce the inaccuracies it's better to avoid conversion between 
normalized and real time values, so the first example program pretty much 
shows just that and demonstrates the very core of a scheduler. It maintains 
per task normalized times and uses only those to make the scheduling 
decision (it doesn't even need an explicit wait_runtime value). The nice 
thing about this is how consistently it gives out time shares - unless there 
is a higher priority task, that task will get the maximum share, which 
makes the scheduling behaviour quite a bit more predictable.

The second example program is more complete (e.g. it demonstrates
adding/removing tasks) but it's based on the same basic idea. First the
floating point values are converted to fixed point values. To maintain
accuracy one has to take overflows into account, cfs currently avoids
overflows by throwing away (possibly important) information, but that
adds checks all over the place instead of dealing with them within the
design, so I consider this overflow avoidance a failed strategy - it
doesn't make anything simpler and creates problems elsewhere. Of course
the example program has its own limits, but in this case I can define
them, so that within them the scheduler will work correctly. The most
important limits are:

max normalized task time delta = max inverse weight * max cpu share
max normalized average time delta = max active tasks * max inverse weight * cpu share base

The first limit is used for comparing individual tasks and the second one
is used for maintaining the average.  With these limits it's possible to
choose the appropriate data types that can hold these maximum values and
then I don't really have to worry about overflows and I know the
scheduler is accurate without the need for a lot of extra checks.
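
To give a concrete feel for these limits, plugging in the constants of the
second example program below (X = 1<<16, smallest weight 10, MAX_S = 1000),
reading "cpu share base" as MAX_S and assuming, say, at most 100 active
tasks:

	max inverse weight           = (1<<16) / 10      = 6553
	max normalized task delta    = 6553 * 1000       = 6553000    (fits a 32bit int easily)
	max normalized average delta = 100 * 6553 * 1000 = 655300000  (still below 2^31)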

The second example also adds a normalized average, but contrary to
current cfs it's not central to the design. An average is not needed to
give every task its fair share, but it can be used to make better
scheduling decisions to control _when_ a task gets its share. In this
example I provided two possibilities where it's used to schedule new
tasks: the first divides time into slices and sets a new task to the
start of that slice; the second gives the task a full time share
relative to the current average. Approximating the average (by
looking just at the min/max time) should work as well.
The average here is not a weighted average; a weighted average is a
little more complex to maintain accurately and has issues regarding
overflows, so I'm using a simple average, which is sufficient, especially
since it's not a primary part of the scheduler anymore.

BTW, the above unfairness of sleeping tasks can easily be avoided in this
model by simply making sure that normalized time never goes backward.
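As a minimal sketch of what I mean (the names are mine, this is not a
patch against cfs):

#include <stdio.h>

struct task { long long time_norm; };

/* clamp a waking sleeper so its normalized time never goes backward
 * relative to the queue */
static void wake_up(struct task *t, long long queue_norm)
{
	if (t->time_norm < queue_norm)
		t->time_norm = queue_norm;	/* no credit for time spent sleeping */
}

int main(void)
{
	struct task t = { .time_norm = 100 };

	wake_up(&t, 250);		/* the queue moved on to 250 while t slept */
	printf("%lld\n", t.time_norm);	/* 250: the sleeper isn't ahead of anyone */
	return 0;
}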

The accuracy of this model makes it possible to further optimize the
code (it's really a key element, that's why I'm stressing it so much;
OTOH it's rather difficult to further optimize current cfs without
risking making it worse).
For example the regular updates aren't strictly necessary, they can also
be done only when needed (i.e. when scheduling, where an update is
necessary anyway for precise accounting), and the next schedule time can be
easily precalculated instead. OTOH the regular updates allow for very
cheap incremental updates; especially if one already knows that the
scheduler clock has only limited resolution (because it's based on
jiffies), it becomes possible to use mostly 32bit values.

I hope the code examples help to further improve the scheduler; I'm quite
aware that they don't implement everything, but this just means some of
the cfs design decisions need more explanation. I'm not really that
much interested in the scheduler itself, I only want a small and fast
scheduler, and those are areas where cfs is no real improvement right now. cfs
practically obliterated the efforts I put into the ntp code to keep the
regular updates both cheap and highly precise...

bye, Roman

[-- Attachment #2: Type: TEXT/x-csrc, Size: 1247 bytes --]

/* Example 1: a minimal model of fair scheduling driven purely by per-task
 * normalized time (ntime = time/weight); see the description above. */
#include <stdio.h>

int weight[10] = {
	20, 20, 20, 50, 20, 20, 20
};
double time[10];
double ntime[10];

#define MIN_S		1
#define MAX_S		10

#define SLICE(x)	(MAX_S / (double)weight[x])
#define MINSLICE(x)	(MIN_S / (double)weight[x])

int main(void)
{
	int i, j, l, w;
	double s, t, min, max;

	for (i = 0; i < 10; i++)
		ntime[i] = time[i] = 0;

	j = 0;
	l = 0;
	s = 0;
	while (1) {
		j = l ? 0 : 1;
		for (i = 0; i < 10; i++) {
			if (!weight[i] || i == l)
				continue;
			if (ntime[i] + MINSLICE(i) < ntime[j] + MINSLICE(j))
				j = i;
		}
		if (ntime[l] >= ntime[j] + SLICE(j) ||
		    (ntime[l] >= ntime[j] &&
		     ntime[l] >= s + SLICE(l))) {
			l = j;
			s = ntime[l];
		}
		time[l] += MIN_S;
		ntime[l] += MIN_S / (double)weight[l];

		printf("%u", l);
		for (i = 0, w = 0, t = 0; i < 10; i++) {
			if (!weight[i])
				continue;
			w += weight[i];
			t += ntime[i] * weight[i];
			printf("\t%3u/%u:%5g/%-7g", i, weight[i], time[i], ntime[i]);
		}
		t /= w;
		min = max = t;
		for (i = 0; i < 10; i++) {
			if (!weight[i])
				continue;
			if (ntime[i] < min)
				min = ntime[i];
			if (ntime[i] > max)
				max = ntime[i];
		}
		printf("\t| %g (%g)\n", t, max - min);
	}
}

[-- Attachment #3: Type: TEXT/x-csrc, Size: 4966 bytes --]

/* Example 2: fixed-point version of the same model, with tasks randomly
 * leaving and re-entering the queue and a simple (non-weighted) average. */
#include <stdio.h>
#include <stdlib.h>

struct task {
	unsigned int weight, weight_inv;
	int active;

	unsigned int time, time_avg;
	int time_norm, avg_fract;
} task[10]  = {
	{ .weight = 10 },
	{ .weight = 40 },
	{ .weight = 80 },
};

#define MIN_S		100
#define MAX_S		1000

#define SLICE(x)	(MAX_S * task[x].weight_inv)
#define MINSLICE(x)	(MIN_S * task[x].weight_inv)

#define WEIGTH0		40
#define WEIGTH0_INV	((1 << 16) / WEIGTH0)

unsigned int time_avg, time_norm_sum;
int avg_fract, weight_sum_inv;

static void normalize_avg(int i)
{
	if (!weight_sum_inv)
		return;
	/* assume the common case of 0/1 first, then fallback */
	if (task[i].avg_fract < 0 || task[i].avg_fract >= WEIGTH0_INV * MAX_S) {
		task[i].time_avg++;
		task[i].avg_fract -= WEIGTH0_INV * MAX_S;
		if (task[i].avg_fract < 0 || task[i].avg_fract >= WEIGTH0_INV * MAX_S) {
			task[i].time_avg += task[i].avg_fract / (WEIGTH0_INV * MAX_S);
			task[i].avg_fract %= WEIGTH0_INV * MAX_S;
		}
	}
	if (avg_fract < 0 || avg_fract >= weight_sum_inv) {
		time_avg++;
		avg_fract -= weight_sum_inv;
		if (avg_fract < 0 || avg_fract >= weight_sum_inv) {
			time_avg += avg_fract / weight_sum_inv;
			avg_fract %= weight_sum_inv;
		}
	}
}

int main(void)
{
	int i, j, l, task_cnt;
	unsigned int s;
	unsigned int time_sum, time_sum2;

	task_cnt = time_avg = 0;
	for (i = 0; i < 10; i++) {
		if (!task[i].weight)
			continue;
		task[i].active = 1;
		task_cnt++;
		task[i].weight_inv = (1 << 16) / task[i].weight;
	}
	weight_sum_inv = task_cnt * WEIGTH0_INV * MAX_S;
	printf("w: %u,%u\n", WEIGTH0_INV * MAX_S, weight_sum_inv);

	time_norm_sum = avg_fract = 0;
	l = 0;
	s = 0;
	while (1) {
		j = -1;
		for (i = 0; i < 10; i++) {
			if (i == l)
				continue;
			if (!task[i].active && task[i].weight) {
				if (!(rand() % 30)) {
					normalize_avg(i);
					task[i].active = 1;
					if (!task_cnt)
						goto done;
#if 1
					if ((int)(task[i].time_avg - time_avg) < 0) {
						task[i].time_norm -= (int)(task[i].time_avg - time_avg) * WEIGTH0_INV * MAX_S + task[i].avg_fract;
						task[i].time_avg = time_avg;
						task[i].avg_fract = 0;
					}
#else
					unsigned int new_time_avg = time_avg;
					int new_avg_fract = avg_fract / task_cnt - task[i].weight_inv * MAX_S;
					while (new_avg_fract < 0) {
						new_time_avg--;
						new_avg_fract += WEIGTH0_INV * MAX_S;
					}
					if ((int)(task[i].time_avg - new_time_avg) < 0 ||
					    ((int)(task[i].time_avg - new_time_avg) == 0 &&
					     task[i].avg_fract < new_avg_fract)) {
						task[i].time_norm += (int)(new_time_avg - task[i].time_avg) * WEIGTH0_INV * MAX_S +
							new_avg_fract - task[i].avg_fract;
						task[i].time_avg = new_time_avg;
						task[i].avg_fract = new_avg_fract;
					}
#endif
				done:
					task_cnt++;
					weight_sum_inv += WEIGTH0_INV * MAX_S;
					avg_fract += (int)(task[i].time_avg - time_avg) * WEIGTH0_INV * MAX_S + task[i].avg_fract;
					time_norm_sum += task[i].time_norm;
				}
			}
			if (!task[i].active)
				continue;
			if (j < 0 ||
			    (int)(task[i].time_norm + MINSLICE(i) - (task[j].time_norm + MINSLICE(j))) < 0)
				j = i;
		}

		if (!task[l].active) {
			if (j < 0)
				continue;
			goto do_switch;
		}

		if (!(rand() % 100)) {
			task[l].active = 0;
			task_cnt--;
			weight_sum_inv -= WEIGTH0_INV * MAX_S;
			avg_fract -= (int)(task[l].time_avg - time_avg) * WEIGTH0_INV * MAX_S + task[l].avg_fract;
			time_norm_sum -= task[l].time_norm;
			if (j < 0)
				continue;
			goto do_switch;
		}

		if (j >= 0 &&
		    ((int)(task[l].time_norm - (task[j].time_norm + SLICE(j))) >= 0 ||
		     ((int)(task[l].time_norm - task[j].time_norm) >= 0 &&
		      (int)(task[l].time_norm - (s + SLICE(l))) >= 0))) {
			int prev_time_avg;
		do_switch:
			prev_time_avg = time_avg;
			normalize_avg(l);
			if (prev_time_avg < time_avg)
				printf("-\n");
			l = j;
			s = task[l].time_norm;
		}
		task[l].time += MIN_S;
		task[l].time_norm += MINSLICE(l);
		task[l].avg_fract += MINSLICE(l);
		time_norm_sum += MINSLICE(l);
		avg_fract += MINSLICE(l);

		printf("%u", l);
		time_sum = time_sum2 = 0;
		for (i = 0; i < 10; i++) {
			if (!task[i].active) {
				if (task[i].weight)
					printf("\t%3u/%u: -\t", i, task[i].weight);
				continue;
			}
			time_sum += task[i].time_norm;
			time_sum2 += task[i].time_avg * WEIGTH0_INV * MAX_S + task[i].avg_fract;
			printf("\t%3u/%u:%5u/%-7g/%-7g", i, task[i].weight, task[i].time,
				(double)task[i].time_norm / (1 << 16),
				task[i].time_avg + (double)task[i].avg_fract / (WEIGTH0_INV * MAX_S));
		}
		if (time_sum != time_norm_sum)
			abort();
		if (time_sum2 != time_avg * weight_sum_inv + avg_fract)
			abort();
		if (time_sum != time_sum2)
			abort();
		printf("\t| %g/%g\n", (double)time_norm_sum / task_cnt / (1 << 16),
			time_avg + (double)(int)avg_fract / weight_sum_inv);
	}
}

^ permalink raw reply	[flat|nested] 123+ messages in thread

end of thread, other threads:[~2007-08-31 14:42 UTC | newest]

Thread overview: 123+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-11 10:44 CFS review Al Boldi
2007-08-12  4:17 ` Ingo Molnar
2007-08-12 15:27   ` Al Boldi
2007-08-12 15:52     ` Ingo Molnar
2007-08-12 19:43       ` Al Boldi
2007-08-21 10:58         ` Ingo Molnar
2007-08-21 22:27           ` Al Boldi
2007-08-24 13:45             ` Ingo Molnar
2007-08-25 22:27               ` Al Boldi
2007-08-25 23:15                 ` Ingo Molnar
2007-08-26 16:27                   ` Al Boldi
2007-08-26 16:39                     ` Ingo Molnar
2007-08-27  4:06                       ` Al Boldi
2007-08-27 10:53                         ` Ingo Molnar
2007-08-27 14:46                           ` Al Boldi
2007-08-27 20:41                             ` Ingo Molnar
2007-08-28  4:37                               ` Al Boldi
2007-08-28  5:05                                 ` Linus Torvalds
2007-08-28  5:23                                   ` Al Boldi
2007-08-28  7:28                                     ` Mike Galbraith
2007-08-28  7:36                                       ` Ingo Molnar
2007-08-28 16:34                                     ` Linus Torvalds
2007-08-28 16:44                                       ` Arjan van de Ven
2007-08-28 16:45                                       ` Ingo Molnar
2007-08-29  4:19                                         ` Al Boldi
2007-08-29  4:53                                           ` Ingo Molnar
2007-08-29  5:58                                             ` Al Boldi
2007-08-29  6:43                                               ` Ingo Molnar
2007-08-28 20:46                                   ` Valdis.Kletnieks
2007-08-28  7:43                                 ` Xavier Bestel
2007-08-28  8:02                                   ` Ingo Molnar
2007-08-28 19:19                                     ` Willy Tarreau
2007-08-28 19:55                                       ` Ingo Molnar
2007-08-29  4:18                                 ` Ingo Molnar
2007-08-29  4:29                                   ` Keith Packard
2007-08-29  4:46                                     ` Ingo Molnar
2007-08-29  7:57                                       ` Keith Packard
2007-08-29  8:04                                         ` Ingo Molnar
2007-08-29  8:53                                           ` Al Boldi
2007-08-29 15:57                                           ` Keith Packard
2007-08-29 19:56                                             ` Rene Herman
2007-08-30  7:05                                               ` Rene Herman
2007-08-30  7:20                                                 ` Ingo Molnar
2007-08-31  6:46                                                 ` Tilman Sauerbeck
2007-08-31 10:44                                                   ` DRM and/or X trouble (was Re: CFS review) Rene Herman
2007-08-31 14:55                                                     ` DRM and/or X trouble Satyam Sharma
2007-08-30 16:06                                               ` CFS review Chuck Ebbert
2007-08-30 16:48                                                 ` Rene Herman
2007-08-29  4:40                                   ` Mike Galbraith
2007-08-29  3:42                   ` Bill Davidsen
2007-08-29  3:37                 ` Bill Davidsen
2007-08-29  3:45                   ` Ingo Molnar
2007-08-29 13:11                     ` Bill Davidsen
  -- strict thread matches above, loose matches on Subject: below --
2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
2007-07-11 12:43 ` x86 status was " Andi Kleen
2007-07-11 17:42   ` Ingo Molnar
2007-07-11 21:16     ` Andi Kleen
2007-07-11 21:46       ` Andrea Arcangeli
2007-07-11 22:09         ` Linus Torvalds
2007-07-13  2:23           ` Roman Zippel
2007-07-13  4:47             ` Mike Galbraith
2007-07-13 17:23               ` Roman Zippel
2007-07-14  5:04                 ` Mike Galbraith
2007-08-01  3:41                   ` CFS review Roman Zippel
2007-08-01  7:12                     ` Ingo Molnar
2007-08-01  7:26                       ` Mike Galbraith
2007-08-01  7:30                         ` Ingo Molnar
2007-08-01  7:36                           ` Mike Galbraith
2007-08-01  8:49                             ` Mike Galbraith
2007-08-01 13:19                       ` Roman Zippel
2007-08-01 15:07                         ` Ingo Molnar
2007-08-01 17:10                           ` Andi Kleen
2007-08-01 16:27                             ` Linus Torvalds
2007-08-01 17:48                               ` Andi Kleen
2007-08-01 17:50                               ` Ingo Molnar
2007-08-01 18:01                                 ` Roman Zippel
2007-08-01 19:05                                   ` Ingo Molnar
2007-08-09 23:14                                     ` Roman Zippel
2007-08-10  5:49                                       ` Ingo Molnar
2007-08-10 13:52                                         ` Roman Zippel
2007-08-10 14:18                                           ` Ingo Molnar
2007-08-10 16:47                                           ` Mike Galbraith
2007-08-10 17:19                                             ` Roman Zippel
2007-08-10 16:54                                           ` Michael Chang
2007-08-10 17:25                                             ` Roman Zippel
2007-08-10 19:44                                               ` Ingo Molnar
2007-08-10 19:47                                               ` Willy Tarreau
2007-08-10 21:15                                                 ` Roman Zippel
2007-08-10 21:36                                                   ` Ingo Molnar
2007-08-10 22:50                                                     ` Roman Zippel
2007-08-11  5:28                                                       ` Willy Tarreau
2007-08-12  5:17                                                         ` Ingo Molnar
2007-08-11  0:30                                                     ` Ingo Molnar
2007-08-20 22:19                                                       ` Roman Zippel
2007-08-21  7:33                                                         ` Mike Galbraith
2007-08-21  8:35                                                           ` Ingo Molnar
2007-08-21 11:54                                                           ` Roman Zippel
2007-08-11  5:15                                                   ` Willy Tarreau
2007-08-10  7:23                                       ` Mike Galbraith
2007-08-01 11:22                     ` Ingo Molnar
2007-08-01 12:21                       ` Roman Zippel
2007-08-01 12:23                         ` Ingo Molnar
2007-08-01 13:59                         ` Ingo Molnar
2007-08-01 14:04                           ` Arjan van de Ven
2007-08-01 15:44                           ` Roman Zippel
2007-08-01 17:41                             ` Ingo Molnar
2007-08-01 18:14                               ` Roman Zippel
2007-08-03  3:04                       ` Matt Mackall
2007-08-03  3:57                         ` Arjan van de Ven
2007-08-03  4:18                           ` Willy Tarreau
2007-08-03  4:31                             ` Arjan van de Ven
2007-08-03  4:53                               ` Willy Tarreau
2007-08-03  4:38                           ` Matt Mackall
2007-08-03  8:44                             ` Ingo Molnar
2007-08-03  9:29                             ` Andi Kleen
2007-08-01 11:37                     ` Ingo Molnar
2007-08-01 12:27                       ` Roman Zippel
2007-08-01 13:20                     ` Andi Kleen
2007-08-01 13:33                       ` Roman Zippel
2007-08-01 14:36                         ` Ingo Molnar
2007-08-01 16:11                           ` Andi Kleen
2007-08-02  2:17                         ` Linus Torvalds
2007-08-02  4:57                           ` Willy Tarreau
2007-08-02 10:43                             ` Andi Kleen
2007-08-02 10:07                               ` Willy Tarreau
2007-08-02 16:09                           ` Ingo Molnar
2007-08-02 22:38                             ` Roman Zippel
2007-08-02 19:16                           ` Daniel Phillips
2007-08-02 23:23                           ` Roman Zippel
2007-08-01 14:40                     ` Ingo Molnar
2007-08-01 14:49                     ` Peter Zijlstra
2007-08-02 17:36                       ` Roman Zippel
2007-08-02 15:46                     ` Ingo Molnar
