LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: How how latent should non-preemptive scheduling be?
       [not found]       ` <fa.Jx/Ygtm46CVRawlA6OnfYNn6cN0@ifi.uio.no>
@ 2008-09-18  7:26         ` Sitsofe Wheeler
       [not found]         ` <fa.iIHgL48F3T5VAqFw3mqaf9Pzrs4@ifi.uio.no>
  1 sibling, 0 replies; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-18  7:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Arjan van de Ven, linux-kernel, Ingo Molnar

Peter Zijlstra wrote:
> On Wed, 2008-09-17 at 14:54 -0700, Arjan van de Ven wrote:
>> On Wed, 17 Sep 2008 22:48:55 +0100
>>
>>>> Cause                                                Maximum
>>>>
>> Percentage 
>>
>> Scheduler: waiting for cpu                        208 msec         59.4 %
>>
>>
>> you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
>> that sounds like a scheduler fairness issue!
> 
> Really hard subject. Perfect fairness requires 0 latency - which with a
> CPU only being able to run one thing at a time is impossible. So what
> latency ends up being is a measure for the convergence towards fairness.
> 
> Anyway - 200ms isn't too weird depending on the circumstances. We start
> out with a 20ms latency for UP, we then multiply with 1+log2(nr_cpus)
> which in say a quad core machine ends up with 60ms. That ought to mean
> that under light load the max latency should not exceed twice that
> (basically a consequence of the Nyquist-Shannon sampling theorem IIRC).
> 
> Now, if you get get under some load (by default: nr_running > 5) the
> expected latency starts to linearly grow with nr_running.
> 
>>From what I gather from the reply to this email the machine was not
> doing much (and after having looked up the original email I see its a
> eeeeeeeee atom - which is dual cpu iirc, so that yields 40ms default) -
> so 200 is definately on the high side.

No, it's not an eeeeee atom. It's an eee celeron M (900Mhz) so it's 
definitely a single CPU with no hyperthreading (and SMP is not enabled 
in the kernel config either). It has less grunt that the atom and can't 
do cpu scaling either (although it seems to have C states).

The load average is less than 0.5 but obviously I don't know if it is 
periodically spiking over 5 and then smoothing out.

> What you can do to investigate this, is use the sched_wakeup tracer from
> ftrace, that should give a function trace of the highest wakeup latency
> showing what the kernel is doing.

Thanks for the hint - I was wondering where to look next.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
       [not found]           ` <fa.Td8xkKZKMSMghlJmEYefTRVF2kc@ifi.uio.no>
@ 2008-09-19 11:54             ` Sitsofe Wheeler
  2008-09-19 14:20               ` Peter Zijlstra
  2008-09-22 11:57               ` Ingo Molnar
  0 siblings, 2 replies; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-19 11:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt

Ingo Molnar wrote:
> here's two quick howtos:
> 
>   http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
>   http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt

These two files appear to be identical. Is this intentional?

Anyway after following your instructions a putting together a small 
script to dice the output, I collated the 10 switches which took the 
longest:

# Top ten longest switches
# Rel TS Process   Abs TS
0.122161 hald-3423 1867.821170 ***
0.039438 <idle>-0  1867.379054
0.036318 hald-3423 1867.669009
0.031362 <idle>-0  1868.002762
0.030000 hald-3423 1867.699009
0.028933 <idle>-0  1867.529238
0.028539 <idle>-0  1867.228861
0.028196 <idle>-0  1867.128731
0.027763 <idle>-0  1868.101449
0.027513 <idle>-0  1867.028606

# tracer: sched_switch from around longest switch
#
#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
#              | |      |          |         |
           <idle>-0     [00]  1867.608017:      0:140:R   +     3:115:S
           <idle>-0     [00]  1867.608038:      0:140:R   +  3423:120:D
           <idle>-0     [00]  1867.608045:      0:140:R ==>     3:115:R
      ksoftirqd/0-3     [00]  1867.608048:      3:115:S ==>  3423:120:R
             hald-3423  [00]  1867.629350:   3423:120:R   +  6096:120:S
             hald-3423  [00]  1867.632691:   3423:120:R   +  3827:120:S
             hald-3423  [00]  1867.669009:   3423:120:R   +  3998:120:S
             hald-3423  [00]  1867.699009:   3423:120:R   +  6097:120:S
          ***hald-3423  [00]  1867.821170:   3423:120:R ==>  6096:120:R
        rhythmbox-6096  [00]  1867.821219:   6096:120:S ==>  6097:120:R
        rhythmbox-6097  [00]  1867.821262:   6097:120:R   +  3826:120:S
        rhythmbox-6097  [00]  1867.821289:   6097:120:S ==>  3826:120:R
       pulseaudio-3826  [00]  1867.821332:   3826:120:R   +  6097:120:S
       pulseaudio-3826  [00]  1867.821374:   3826:120:S ==>  6097:120:R
        rhythmbox-6097  [00]  1867.821380:   6097:120:S ==>  3998:120:R
        rhythmbox-3998  [00]  1867.821709:   3998:120:S ==>  3827:120:R
       pulseaudio-3827  [00]  1867.824041:   3827:120:R   +  3826:120:S


>    LatencyTOP version 0.4       (C) 2008 Intel Corporation
> 
> Cause                                                Maximum     Percentage
> Scheduler: waiting for cpu                        152.4 msec         13.8 %
> Userspace lock contention                           5.0 msec         68.0 %
> Waiting for event (poll)                            5.0 msec         14.3 %
> Waiting for event (select)                          4.9 msec          3.5 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.1 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re  1.9 msec          0.2 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.1 %
> Executing raw SCSI command                          1.1 msec          0.0 %
> Waiting for TTY to finish sending                   0.4 msec          0.0 %
> 
> 
> Process rhythmbox (3998)                   Total: 328.3 msec
> Scheduler: waiting for cpu                        152.4 msec         80.0 %
> Userspace lock contention                           4.1 msec         15.7 %
> Waiting for event (poll)                            2.7 msec          4.1 %


> you need to enable:
> 
>   CONFIG_SCHED_TRACER=y
>   CONFIG_CONTEXT_SWITCH_TRACER=y

I actually have both of these enabled but there's still no wakeup tracer 
(as mentioned in the git kernel documentation http://tinyurl.com/4f9s4l 
). The good news is that your instructions don't need the wakeup tracer.

> it's not particularly well named though. Why doesnt it say 
> LATENCY_TRACER or something?

I agree it would be nicer if it had a better name.

>> Additionally I think I found a trigger - unplugging the power cable 
>> from the EeePC and having it run on battery seems to then set off this 
>> periodic stall every 30 seconds... There's no CPU frequency scaling 
>> enabled either (Celeron M's seemingly don't have P states and support 
>> for cpufreq is configured out).
> 
> sounds like potential SMM triggered latencies.

I have just gone away and read about the SMM ( 
http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If 
you're right there is pretty much nothing that can be done about the 
problem : (

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-19 11:54             ` Sitsofe Wheeler
@ 2008-09-19 14:20               ` Peter Zijlstra
  2008-09-22 11:57               ` Ingo Molnar
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2008-09-19 14:20 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Ingo Molnar, Arjan van de Ven, linux-kernel, Steven Rostedt

On Fri, 2008-09-19 at 12:54 +0100, Sitsofe Wheeler wrote:
> Ingo Molnar wrote:
> > here's two quick howtos:
> > 
> >   http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
> >   http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt
> 
> These two files appear to be identical. Is this intentional?
> 
> Anyway after following your instructions a putting together a small 
> script to dice the output, I collated the 10 switches which took the 
> longest:
> 
> # Top ten longest switches
> # Rel TS Process   Abs TS
> 0.122161 hald-3423 1867.821170 ***
> 0.039438 <idle>-0  1867.379054
> 0.036318 hald-3423 1867.669009
> 0.031362 <idle>-0  1868.002762
> 0.030000 hald-3423 1867.699009
> 0.028933 <idle>-0  1867.529238
> 0.028539 <idle>-0  1867.228861
> 0.028196 <idle>-0  1867.128731
> 0.027763 <idle>-0  1868.101449
> 0.027513 <idle>-0  1867.028606
> 
> # tracer: sched_switch from around longest switch
> #
> #           TASK-PID   CPU#    TIMESTAMP  FUNCTION
> #              | |      |          |         |
>            <idle>-0     [00]  1867.608017:      0:140:R   +     3:115:S
>            <idle>-0     [00]  1867.608038:      0:140:R   +  3423:120:D
>            <idle>-0     [00]  1867.608045:      0:140:R ==>     3:115:R
>       ksoftirqd/0-3     [00]  1867.608048:      3:115:S ==>  3423:120:R
>              hald-3423  [00]  1867.629350:   3423:120:R   +  6096:120:S
>              hald-3423  [00]  1867.632691:   3423:120:R   +  3827:120:S
>              hald-3423  [00]  1867.669009:   3423:120:R   +  3998:120:S
>              hald-3423  [00]  1867.699009:   3423:120:R   +  6097:120:S
>           ***hald-3423  [00]  1867.821170:   3423:120:R ==>  6096:120:R
>         rhythmbox-6096  [00]  1867.821219:   6096:120:S ==>  6097:120:R
>         rhythmbox-6097  [00]  1867.821262:   6097:120:R   +  3826:120:S
>         rhythmbox-6097  [00]  1867.821289:   6097:120:S ==>  3826:120:R
>        pulseaudio-3826  [00]  1867.821332:   3826:120:R   +  6097:120:S
>        pulseaudio-3826  [00]  1867.821374:   3826:120:S ==>  6097:120:R
>         rhythmbox-6097  [00]  1867.821380:   6097:120:S ==>  3998:120:R
>         rhythmbox-3998  [00]  1867.821709:   3998:120:S ==>  3827:120:R
>        pulseaudio-3827  [00]  1867.824041:   3827:120:R   +  3826:120:S
> 

Its actually function tracer output I'm interested in.. that shows what
all its doing to make it take 120+ms.

I thought we had a wakeup latency tracer exacty like we have preempt and
irq off latency tracers, Steve, where'd that go?

> > sounds like potential SMM triggered latencies.
> 
> I have just gone away and read about the SMM ( 
> http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If 
> you're right there is pretty much nothing that can be done about the 
> problem : (

Yeah, SMM/SMI is nasty stuff :-(


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-19 11:54             ` Sitsofe Wheeler
  2008-09-19 14:20               ` Peter Zijlstra
@ 2008-09-22 11:57               ` Ingo Molnar
  2008-09-22 12:07                 ` Steven Rostedt
  2008-09-23  6:33                 ` Sitsofe Wheeler
  1 sibling, 2 replies; 28+ messages in thread
From: Ingo Molnar @ 2008-09-22 11:57 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt


* Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Ingo Molnar wrote:
>> here's two quick howtos:
>>
>>   http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
>>   http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt
>
> These two files appear to be identical. Is this intentional?
>
> Anyway after following your instructions a putting together a small  
> script to dice the output, I collated the 10 switches which took the  
> longest:
>
> # Top ten longest switches
> # Rel TS Process   Abs TS
> 0.122161 hald-3423 1867.821170 ***
> 0.039438 <idle>-0  1867.379054
> 0.036318 hald-3423 1867.669009
> 0.031362 <idle>-0  1868.002762
> 0.030000 hald-3423 1867.699009
> 0.028933 <idle>-0  1867.529238
> 0.028539 <idle>-0  1867.228861
> 0.028196 <idle>-0  1867.128731
> 0.027763 <idle>-0  1868.101449
> 0.027513 <idle>-0  1867.028606
>
> # tracer: sched_switch from around longest switch
> #
> #           TASK-PID   CPU#    TIMESTAMP  FUNCTION
> #              | |      |          |         |
>           <idle>-0     [00]  1867.608017:      0:140:R   +     3:115:S
>           <idle>-0     [00]  1867.608038:      0:140:R   +  3423:120:D
>           <idle>-0     [00]  1867.608045:      0:140:R ==>     3:115:R
>      ksoftirqd/0-3     [00]  1867.608048:      3:115:S ==>  3423:120:R
>             hald-3423  [00]  1867.629350:   3423:120:R   +  6096:120:S
>             hald-3423  [00]  1867.632691:   3423:120:R   +  3827:120:S
>             hald-3423  [00]  1867.669009:   3423:120:R   +  3998:120:S
>             hald-3423  [00]  1867.699009:   3423:120:R   +  6097:120:S
>          ***hald-3423  [00]  1867.821170:   3423:120:R ==>  6096:120:R
>        rhythmbox-6096  [00]  1867.821219:   6096:120:S ==>  6097:120:R
>        rhythmbox-6097  [00]  1867.821262:   6097:120:R   +  3826:120:S
>        rhythmbox-6097  [00]  1867.821289:   6097:120:S ==>  3826:120:R
>       pulseaudio-3826  [00]  1867.821332:   3826:120:R   +  6097:120:S
>       pulseaudio-3826  [00]  1867.821374:   3826:120:S ==>  6097:120:R
>        rhythmbox-6097  [00]  1867.821380:   6097:120:S ==>  3998:120:R
>        rhythmbox-3998  [00]  1867.821709:   3998:120:S ==>  3827:120:R
>       pulseaudio-3827  [00]  1867.824041:   3827:120:R   +  3826:120:S
>
>
>>    LatencyTOP version 0.4       (C) 2008 Intel Corporation
>>
>> Cause                                                Maximum     Percentage
>> Scheduler: waiting for cpu                        152.4 msec         13.8 %
>> Userspace lock contention                           5.0 msec         68.0 %
>> Waiting for event (poll)                            5.0 msec         14.3 %
>> Waiting for event (select)                          4.9 msec          3.5 %
>> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.1 %
>> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re  1.9 msec          0.2 %
>> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.1 %
>> Executing raw SCSI command                          1.1 msec          0.0 %
>> Waiting for TTY to finish sending                   0.4 msec          0.0 %
>>
>>
>> Process rhythmbox (3998)                   Total: 328.3 msec
>> Scheduler: waiting for cpu                        152.4 msec         80.0 %
>> Userspace lock contention                           4.1 msec         15.7 %
>> Waiting for event (poll)                            2.7 msec          4.1 %
>
>
>> you need to enable:
>>
>>   CONFIG_SCHED_TRACER=y
>>   CONFIG_CONTEXT_SWITCH_TRACER=y
>
> I actually have both of these enabled but there's still no wakeup tracer  
> (as mentioned in the git kernel documentation http://tinyurl.com/4f9s4l  
> ). The good news is that your instructions don't need the wakeup tracer.
>
>> it's not particularly well named though. Why doesnt it say  
>> LATENCY_TRACER or something?
>
> I agree it would be nicer if it had a better name.
>
>>> Additionally I think I found a trigger - unplugging the power cable  
>>> from the EeePC and having it run on battery seems to then set off 
>>> this periodic stall every 30 seconds... There's no CPU frequency 
>>> scaling enabled either (Celeron M's seemingly don't have P states and 
>>> support for cpufreq is configured out).
>>
>> sounds like potential SMM triggered latencies.
>
> I have just gone away and read about the SMM ( 
> http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If 
> you're right there is pretty much nothing that can be done about the 
> problem : (

well, since they went away after you enabled CONFIG_PREEMPT=y, they are 
definitely in-kernel latencies, not any external SMM latencies.

I.e. they are inherently fixable. Could you enable:

  CONFIG_DYNAMIC_FTRACE=y
  CONFIG_FTRACE_MCOUNT_RECORD=y

that should make the traces a lot more verbose - every kernel function 
executed in the latency path will be logged. That way we'll be able to 
say which one takes that long.

note, you might have to increase /debug/tracing/trace_entries to get a 
long enough trace to capture the relevant portion of the latency.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-22 11:57               ` Ingo Molnar
@ 2008-09-22 12:07                 ` Steven Rostedt
  2008-09-23  6:33                 ` Sitsofe Wheeler
  1 sibling, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2008-09-22 12:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Sitsofe Wheeler, Peter Zijlstra, Arjan van de Ven, linux-kernel


On Mon, 22 Sep 2008, Ingo Molnar wrote:
> * Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> >>
> >> sounds like potential SMM triggered latencies.
> >
> > I have just gone away and read about the SMM ( 
> > http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If 
> > you're right there is pretty much nothing that can be done about the 
> > problem : (
> 
> well, since they went away after you enabled CONFIG_PREEMPT=y, they are 
> definitely in-kernel latencies, not any external SMM latencies.
> 
> I.e. they are inherently fixable. Could you enable:
> 
>   CONFIG_DYNAMIC_FTRACE=y
>   CONFIG_FTRACE_MCOUNT_RECORD=y
> 
> that should make the traces a lot more verbose - every kernel function 
> executed in the latency path will be logged. That way we'll be able to 
> say which one takes that long.
> 
> note, you might have to increase /debug/tracing/trace_entries to get a 
> long enough trace to capture the relevant portion of the latency.

Also note, to modify trace_entries, you must be in the none (nop?) tracer, 
otherwise the size will not be effected.

If you find the trace is also too big, you can echo a list of functions 
into:

  /debug/tracing/set_ftrace_notrace

to not trace those functions. using '>' will remove any existing function 
in that file, but using '>>' will append functions to the file.

For a list of functions that you can add, see:

  /debug/tracing/available_filter_functions

-- Steve


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-22 11:57               ` Ingo Molnar
  2008-09-22 12:07                 ` Steven Rostedt
@ 2008-09-23  6:33                 ` Sitsofe Wheeler
  2008-09-23 11:53                   ` Ingo Molnar
  1 sibling, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-23  6:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt

Ingo Molnar wrote:
> well, since they went away after you enabled CONFIG_PREEMPT=y, they are 
> definitely in-kernel latencies, not any external SMM latencies.
> 
> I.e. they are inherently fixable. Could you enable:
> 
>   CONFIG_DYNAMIC_FTRACE=y
>   CONFIG_FTRACE_MCOUNT_RECORD=y
> 
> that should make the traces a lot more verbose - every kernel function 
> executed in the latency path will be logged. That way we'll be able to 
> say which one takes that long.

I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in 
2.6.27rc7. Is it an option that is only in -tip ?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-23  6:33                 ` Sitsofe Wheeler
@ 2008-09-23 11:53                   ` Ingo Molnar
  2008-09-23 16:30                     ` Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2008-09-23 11:53 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt

* Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Ingo Molnar wrote:
>> well, since they went away after you enabled CONFIG_PREEMPT=y, they are 
>> definitely in-kernel latencies, not any external SMM latencies.
>>
>> I.e. they are inherently fixable. Could you enable:
>>
>>   CONFIG_DYNAMIC_FTRACE=y
>>   CONFIG_FTRACE_MCOUNT_RECORD=y
>>
>> that should make the traces a lot more verbose - every kernel function  
>> executed in the latency path will be logged. That way we'll be able to  
>> say which one takes that long.
>
> I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in 
> 2.6.27rc7. Is it an option that is only in -tip ?

yeah - it's a new ftrace feature queued up for v2.6.28.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-23 11:53                   ` Ingo Molnar
@ 2008-09-23 16:30                     ` Sitsofe Wheeler
  2008-09-23 19:39                       ` Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-23 16:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt

Ingo Molnar wrote:
> * Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> 
>> Ingo Molnar wrote:
>>> well, since they went away after you enabled CONFIG_PREEMPT=y, they are 
>>> definitely in-kernel latencies, not any external SMM latencies.
>>>
>>> I.e. they are inherently fixable. Could you enable:
>>>
>>>   CONFIG_DYNAMIC_FTRACE=y
>>>   CONFIG_FTRACE_MCOUNT_RECORD=y
>>>
>>> that should make the traces a lot more verbose - every kernel function  
>>> executed in the latency path will be logged. That way we'll be able to  
>>> say which one takes that long.
>> I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in 
>> 2.6.27rc7. Is it an option that is only in -tip ?
> 
> yeah - it's a new ftrace feature queued up for v2.6.28.

I've been struggling to boot -tip/master - currently it blows up just 
after printing SLUB information saying:

BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c0120078>] account_system_time+0x48/0x120
*pde = 00000000
Thread overran stack, or stack corrupted
Oops: 0002 [#1] PREEMPT

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-23 16:30                     ` Sitsofe Wheeler
@ 2008-09-23 19:39                       ` Sitsofe Wheeler
  2008-09-23 22:01                         ` Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-23 19:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Arjan van de Ven, linux-kernel,
	Steven Rostedt

Sitsofe Wheeler wrote:
> I've been struggling to boot -tip/master - currently it blows up just 
> after printing SLUB information saying:
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c0120078>] account_system_time+0x48/0x120
> *pde = 00000000
> Thread overran stack, or stack corrupted
> Oops: 0002 [#1] PREEMPT

OK this BUG seems to have gone away in the past few hours.

Ingo when you were asking for the ftrace report I presume that would be 
for a non preempt kernel (as a preemptive one only showed a very worst 
latency of 19657 us in one exceptional case)?

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-23 19:39                       ` Sitsofe Wheeler
@ 2008-09-23 22:01                         ` Sitsofe Wheeler
  2008-09-27 20:48                           ` Ingo Molnar
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-23 22:01 UTC (permalink / raw)
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Arjan van de Ven,
	Steven Rostedt

Sitsofe Wheeler wrote:
> Ingo when you were asking for the ftrace report I presume that would be 
> for a non preempt kernel (as a preemptive one only showed a very worst 
> latency of 19657 us in one exceptional case)?

With a non preempt kernel I found the logs would simply become too big 
without filtering. On Peter's suggestion I used the following to remove 
the most frequently called functions:

echo acpi_os_release_object > set_ftrace_notrace && echo kmem_cache_* >> 
set_ftrace_notrace && echo acpi_ut_* >> set_ftrace_notrace

By doing counts across multiple runs I would say that the most 
frequently called functions are the following (in most frequently called 
order). The counts are definitely approximate but are reasonable 
relative to each other.

  475325 acpi_os_release_object (acpi_ut_delete_generic_state)
  406895 kmem_cache_free (acpi_os_release_object)
  402838 kmem_cache_alloc (acpi_ut_create_generic_state)
  132968 acpi_ut_update_ref_count (acpi_ut_update_object_reference)
  131041 acpi_ut_pop_generic_state (acpi_ut_update_object_reference)
  131036 acpi_ut_delete_generic_state (acpi_ut_update_object_reference)
  131025 acpi_ut_create_generic_state (acpi_ut_create_update_state)
  131023 acpi_ut_create_update_state_and_push 
(acpi_ut_update_object_reference)
  131020 acpi_ut_create_update_state (acpi_ut_create_update_state_and_push)
  131018 acpi_ut_push_generic_state (acpi_ut_create_update_state_and_push)
   60147 acpi_ns_get_next_node (acpi_ns_delete_namespace_by_owner)
   28974 acpi_ns_get_next_valid_node (acpi_ns_get_next_node)

Logs with the filtering on can be seen here (15Mbytes decompressed each):
http://sucs.org/~sits/test/eeepc-debug/20080923/latency_trace.gz
http://sucs.org/~sits/test/eeepc-debug/20080923/trace.txt.gz

It looks like lots of acpi state is created and deleted...

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-23 22:01                         ` Sitsofe Wheeler
@ 2008-09-27 20:48                           ` Ingo Molnar
  2008-09-28 20:56                             ` Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2008-09-27 20:48 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt


* Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Logs with the filtering on can be seen here (15Mbytes decompressed each):
> http://sucs.org/~sits/test/eeepc-debug/20080923/latency_trace.gz
> http://sucs.org/~sits/test/eeepc-debug/20080923/trace.txt.gz
>
> It looks like lots of acpi state is created and deleted...

yeah. The latency starts here:

     cat-5901    0dNh.    1us : default_wake_function (__wake_up_common)
     cat-5901    0.Nh.    2us : kill_fasync (snd_pcm_period_elapsed)
     [...]

and ends here:

     [...]
     cat-5901    0.N.. 270501us+: mutex_lock (acpi_ec_transaction)
     cat-5901    0d... 270507us : __cond_resched (_cond_resched)

270 _milliseconds_ later. That's excessive.

The main overhead is starting here:

     cat-5901    0.N..  167us : acpi_ds_result_push (acpi_ds_exec_end_op)

lots of ACPI code executed ...

does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That 
_should_ break up this section neatly. If it doesnt then please add a 
might_sleep() check to kernel/kernel/semaphore.c's down_timeout() 
function - that is called a number of times in this trace.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-27 20:48                           ` Ingo Molnar
@ 2008-09-28 20:56                             ` Sitsofe Wheeler
  2008-09-29  8:37                               ` Ingo Molnar
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-28 20:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

Ingo Molnar wrote:
> does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That 

CONFIG_PREEMPT_VOLUNTARY=y has always been enabled in these results.

> _should_ break up this section neatly. If it doesnt then please add a 
> might_sleep() check to kernel/kernel/semaphore.c's down_timeout() 
> function - that is called a number of times in this trace.

I added might_sleep() to the start of down_timeout() but it neither 
printed anything to dmesg nor changed the latency issue...

Strangely stalling only seems to turn up in linux-tip kernels with very 
little debugging options set within them. I have a linux-tip that has 
lots of extra debugging options set and this problem doesn't show up...

I've also asked about this on the ACPI mailing list and had a huge 
amount of help from finding a good point for a cond_resched - 
http://marc.info/?l=linux-acpi&m=122236450105747&w=2 . Certain things 
(like games) still stutter when the battery is read but it's enough to 
play back sound in rhythmbox without stuttering.

Any ideas why the issue would go away with a debug kernel though?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-28 20:56                             ` Sitsofe Wheeler
@ 2008-09-29  8:37                               ` Ingo Molnar
  2008-09-29 23:11                                 ` Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2008-09-29  8:37 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt


* Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Ingo Molnar wrote:
>> does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That 
>> 
>
> CONFIG_PREEMPT_VOLUNTARY=y has always been enabled in these results.

okay - then this definitely needs fixed.

>> _should_ break up this section neatly. If it doesnt then please add a  
>> might_sleep() check to kernel/kernel/semaphore.c's down_timeout()  
>> function - that is called a number of times in this trace.
>
> I added might_sleep() to the start of down_timeout() but it neither 
> printed anything to dmesg nor changed the latency issue...
>
> Strangely stalling only seems to turn up in linux-tip kernels with 
> very little debugging options set within them. I have a linux-tip that 
> has lots of extra debugging options set and this problem doesn't show 
> up...
>
> I've also asked about this on the ACPI mailing list and had a huge 
> amount of help from finding a good point for a cond_resched - 
> http://marc.info/?l=linux-acpi&m=122236450105747&w=2 . Certain things 
> (like games) still stutter when the battery is read but it's enough to 
> play back sound in rhythmbox without stuttering.
>
> Any ideas why the issue would go away with a debug kernel though?

hm, that's weird indeed. You could try to trace the full battery readout 
itself, via a script like this:

  cat /debug/tracing/trace > /dev/null        # drain trace
  cat /proc/acpi/whatever  
  cat /debug/tracing/trace > trace.txt

this way you should be seeing a full trace of the whole thing. (with the 
'ftrace' tracer plugin set in current_tracer)

then, assuming the trace.txt is complete, you can check where it 
reschedules and why. Search for 'schedule' calls. Compare the debug 
versus non-debug traces.

You can even turn on stack backtrace tracing feature, via:

  echo stacktrace > /debug/tracing/iter_ctrl

this adds extra trace entries that show the call path of every 
reschedule. [this attribute is default-disabled.]

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-29  8:37                               ` Ingo Molnar
@ 2008-09-29 23:11                                 ` Sitsofe Wheeler
  2008-09-30 11:22                                   ` Ingo Molnar
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-29 23:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

Ingo Molnar wrote:
> * Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
>
>> Any ideas why the issue would go away with a debug kernel though?
> 
> hm, that's weird indeed. You could try to trace the full battery readout 
> itself, via a script like this:
> 
>   cat /debug/tracing/trace > /dev/null        # drain trace
>   cat /proc/acpi/whatever  
>   cat /debug/tracing/trace > trace.txt
> 
> You can even turn on stack backtrace tracing feature, via:
> 
>   echo stacktrace > /debug/tracing/iter_ctrl
> 
> this adds extra trace entries that show the call path of every 
> reschedule. [this attribute is default-disabled.]

(Something bad seems to be happening with my kernels as I have been 
finding strange problems like network-manager detecting the wifi as a 
wired interface unless I did a make clean on my kernel sources before 
compiling). The stalling issue seems to keep coming and going and is 
currently showing up in both debug and non debug kernels.

There are traces when looking at battery information on both AC (where 
the problem is never there) and battery (where the problem always is 
there) here:
http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-ac.txt.bz2
http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-battery.txt.bz2 
(15Mbytes uncompressed)

I wasn't quite sure how to tell when a reschedule was done. I simply did 
a grep schedule on the file and most of it seemed reasonable. One part 
that caught my eye was the following:

     speaker-test-3891  [000]  1392.751699: __switch_to <-schedule
           <idle>-0     [000]  1392.752070: account_scheduler_latency 
<-enqueue_task_fair
           <idle>-0     [000]  1392.752091: __switch_to <-schedule
              cat-7717  [000]  1392.752092: del_timer <-schedule_timeout
              cat-7717  [000]  1392.772263: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.773224: __tasklet_schedule <-kbd_event
              cat-7717  [000]  1392.773225: schedule_console_callback 
<-kbd_event
              cat-7717  [000]  1392.773225: schedule_work 
<-schedule_console_callback
              cat-7717  [000]  1392.773226: queue_work <-schedule_work
              cat-7717  [000]  1392.773232: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.773240: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.773251: schedule_delayed_work 
<-put_queue
              cat-7717  [000]  1392.773251: queue_delayed_work 
<-schedule_delayed_work
              cat-7717  [000]  1392.773258: schedule_console_callback 
<-kbd_event
              cat-7717  [000]  1392.773258: schedule_work 
<-schedule_console_callback
              cat-7717  [000]  1392.773259: queue_work <-schedule_work
              cat-7717  [000]  1392.773267: schedule_console_callback 
<-kbd_event
              cat-7717  [000]  1392.773268: schedule_work 
<-schedule_console_callback
              cat-7717  [000]  1392.773268: queue_work <-schedule_work
              cat-7717  [000]  1392.807931: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.839464: __tasklet_schedule <-ath5k_intr
              cat-7717  [000]  1392.839506: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.941867: __tasklet_schedule <-ath5k_intr
              cat-7717  [000]  1393.005963: __tasklet_schedule <-ath5k_intr
              cat-7717  [000]  1393.033222: __switch_to <-schedule

Here a schedule seemingly doesn't happen for a few hundredths of a second...

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-29 23:11                                 ` Sitsofe Wheeler
@ 2008-09-30 11:22                                   ` Ingo Molnar
  2008-09-30 13:18                                     ` Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2008-09-30 11:22 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt


* Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

>              cat-7717  [000]  1392.941867: __tasklet_schedule <-ath5k_intr
>              cat-7717  [000]  1393.005963: __tasklet_schedule <-ath5k_intr
>              cat-7717  [000]  1393.033222: __switch_to <-schedule
>
> Here a schedule seemingly doesn't happen for a few hundredths of a second...

that's about 60 msecs. ath5k_intr IRQ handling overhead? IRQ handlers 
are non-preemptible. (even under CONFIG_PREEMPT=y)

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-30 11:22                                   ` Ingo Molnar
@ 2008-09-30 13:18                                     ` Sitsofe Wheeler
  2008-10-04 10:50                                       ` Reading EeePC900 battery info causes stalls when using SLUB (was Re: How how latent should non-preemptive scheduling be?) Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-30 13:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

Ingo Molnar wrote:
> * Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> 
>>              cat-7717  [000]  1392.941867: __tasklet_schedule <-ath5k_intr
>>              cat-7717  [000]  1393.005963: __tasklet_schedule <-ath5k_intr
>>              cat-7717  [000]  1393.033222: __switch_to <-schedule
>>
>> Here a schedule seemingly doesn't happen for a few hundredths of a second...
> 
> that's about 60 msecs. ath5k_intr IRQ handling overhead? IRQ handlers 
> are non-preemptible. (even under CONFIG_PREEMPT=y)

What about the piece leading up to that:

              cat-7717  [000]  1392.752092: del_timer <-schedule_timeout
              cat-7717  [000]  1392.772263: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.773224: __tasklet_schedule <-kbd_event
              cat-7717  [000]  1392.773225: schedule_console_callback 
<-kbd_event
              cat-7717  [000]  1392.773225: schedule_work 
<-schedule_console_callback
              cat-7717  [000]  1392.773226: queue_work <-schedule_work
              cat-7717  [000]  1392.773232: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.773240: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.773251: schedule_delayed_work 
<-put_queue
              cat-7717  [000]  1392.773251: queue_delayed_work 
<-schedule_delayed_work
              cat-7717  [000]  1392.773258: schedule_console_callback 
<-kbd_event
              cat-7717  [000]  1392.773258: schedule_work 
<-schedule_console_callback
              cat-7717  [000]  1392.773259: queue_work <-schedule_work
              cat-7717  [000]  1392.773267: schedule_console_callback 
<-kbd_event
              cat-7717  [000]  1392.773268: schedule_work 
<-schedule_console_callback
              cat-7717  [000]  1392.773268: queue_work <-schedule_work
              cat-7717  [000]  1392.807931: account_scheduler_latency 
<-enqueue_task_fair
              cat-7717  [000]  1392.839464: __tasklet_schedule <-ath5k_intr

Here's a snippet from 
http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-battery.txt.bz2 :
              cat-7717  [000]  1392.752092: del_timer <-schedule_timeout
              cat-7717  [000]  1392.752092: acpi_ec_check_status 
<-acpi_ec_wait
              cat-7717  [000]  1392.752093: constant_test_bit 
<-acpi_ec_check_status
              cat-7717  [000]  1392.752095: mutex_unlock 
<-acpi_ec_transaction
              cat-7717  [000]  1392.752096: acpi_ex_enter_interpreter 
<-acpi_ev_address_space_dispatch
              cat-7717  [000]  1392.752097: acpi_os_wait_semaphore 
<-acpi_ut_acquire_mutex
              cat-7717  [000]  1392.752097: down_timeout 
<-acpi_os_wait_semaphore
              cat-7717  [000]  1392.752098: acpi_ex_release_global_lock 
<-acpi_ex_read_data_from_field
              cat-7717  [000]  1392.752099: acpi_ex_release_mutex_object 
<-acpi_ex_release_global_lock
              cat-7717  [000]  1392.752100: acpi_ev_release_global_lock 
<-acpi_ex_release_mutex_object
              cat-7717  [000]  1392.752100: __acpi_release_global_lock 
<-acpi_ev_release_global_lock
              cat-7717  [000]  1392.752101: acpi_os_signal_semaphore 
<-acpi_ev_release_global_lock
              cat-7717  [000]  1392.752102: up <-acpi_os_signal_semaphore
...
              cat-7717  [000]  1392.839447: 
acpi_ex_truncate_for32bit_table <-acpi_ds_exec_end_op
              cat-7717  [000]  1392.839447: acpi_ps_next_parse_state 
<-acpi_ps_parse_loop
              cat-7717  [000]  1392.839448: acpi_ps_complete_op 
<-acpi_ps_parse_loop
              cat-7717  [000]  1392.839448: acpi_ps_complete_this_op 
<-acpi_ps_complete_op
              cat-7717  [000]  1392.839449: acpi_ps_delete_parse_tree 
<-acpi_ps_complete_this_op
              cat-7717  [000]  1392.839449: acpi_ps_get_arg 
<-acpi_ps_delete_parse_tree
              cat-7717  [000]  1392.839450: acpi_ps_get_opcode_info 
<-acpi_ps_get_arg
              cat-7717  [000]  1392.839450: acpi_ps_get_arg 
<-acpi_ps_delete_parse_tree
              cat-7717  [000]  1392.839451: acpi_ps_get_opcode_info 
<-acpi_ps_get_arg
              cat-7717  [000]  1392.839452: do_IRQ <-common_interrupt
              cat-7717  [000]  1392.839453: irq_enter <-do_IRQ
              cat-7717  [000]  1392.839453: irq_to_desc <-do_IRQ
              cat-7717  [000]  1392.839454: handle_fasteoi_irq <-do_IRQ
              cat-7717  [000]  1392.839455: handle_IRQ_event 
<-handle_fasteoi_irq
              cat-7717  [000]  1392.839455: usb_hcd_irq <-handle_IRQ_event
              cat-7717  [000]  1392.839456: uhci_irq <-usb_hcd_irq
              cat-7717  [000]  1392.839458: ath5k_intr <-handle_IRQ_event
              cat-7717  [000]  1392.839459: ath5k_hw_is_intr_pending 
<-ath5k_intr
              cat-7717  [000]  1392.839461: ath5k_hw_get_isr <-ath5k_intr
              cat-7717  [000]  1392.839464: __tasklet_schedule <-ath5k_intr


Why didn't another process cut in from 1392.752092 to 1392.839464? In 
the schedule grep normally runs for cat are half as long e.g.:

           <idle>-0     [000]  1392.747102: __switch_to <-schedule
              cat-7717  [000]  1392.747103: del_timer <-schedule_timeout
              cat-7717  [000]  1392.747107: 
schedule_timeout_uninterruptible <-msleep
              cat-7717  [000]  1392.747107: schedule_timeout 
<-schedule_timeout_uninterruptible
              cat-7717  [000]  1392.747108: init_timer <-schedule_timeout
              cat-7717  [000]  1392.747108: __mod_timer <-schedule_timeout
              cat-7717  [000]  1392.747111: __switch_to <-schedule
           <idle>-0     [000]  1392.749067: account_scheduler_latency 
<-enqueue_task_fair


Further why (after taking a break to process the ath5k interrupt) did it 
not schedule another process other than cat until 1393.033226:

              cat-7717  [000]  1393.033215: set_normalized_timespec 
<-ktime_get_ts
              cat-7717  [000]  1393.033216: ktime_add_safe 
<-hrtimer_start_range_ns
              cat-7717  [000]  1393.033217: ktime_add_safe 
<-hrtimer_start_range_ns
              cat-7717  [000]  1393.033217: enqueue_hrtimer 
<-hrtimer_start_range_ns
              cat-7717  [000]  1393.033222: __switch_to <-schedule
  hald-addon-inpu-3237  [000]  1393.033226: fget_light <-do_sys_poll

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Reading EeePC900 battery info causes stalls when using SLUB (was Re: How how   latent should non-preemptive scheduling be?)
  2008-09-30 13:18                                     ` Sitsofe Wheeler
@ 2008-10-04 10:50                                       ` Sitsofe Wheeler
  0 siblings, 0 replies; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-10-04 10:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt,
	Christoph Lameter

Sitsofe Wheeler wrote:
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be
> skipping while playing sound through various desktop apps with a
> 2.6.27rc6 kernel. It is running off an SD card which really shows up

After weeks of head scratching as to why these latencies didn't occur
using the xandros 2.6.21.4 kernel (but keeping the same userspace) when
my own kernels would always show this problem I finally found the answer
after reading
http://tservice.net.ru/~s0mbre/blog/devel/other/2008_10_01 on kernel
planet - SLUB can cause regressions compared to SLAB. Switching from 
SLUB to SLAB made the problem more or less disappear (which I guess 
makes sense given the large number of kmem_* calls that are made when 
reading the battery).

My understanding with the memory allocators is that SLOB is the smallest 
and simplest. Its forté is that it uses very little memory which makes 
it ideal for embedded systems and is the easiest allocator to 
understand. The downside is that it might tend towards memory 
fragmentation the longer a system runs and is apparently a little bit 
slower than SLAB. The SLAB allocator is something that the linux kernel 
has had for many years and was a good performer until the pressures to 
perform with large SMP systems started to come in to play (at which 
point lots of memory would be used up on fixed structures - 
http://lwn.net/Articles/229984/ ). SLUB is the newest allocator, scales 
up well and has finer grained diagnostics that SLAB.

Prior to today my understanding was that SLUB would be able to replace 
SLAB in all scenarios and perform just as well or better. However now 
I'm not so sure (SLAB appears to be less latent than SLUB). Other than 
SLUB's newness are there cases where SLAB should be chosen instead of 
SLUB (e.g. uniprocessor desktop systems with hundreds of megabytes of 
RAM)? Will SLAB exist as an alternative to SLUB for certain setups?

-- 
Sitsofe | http://sucs.org/~sits

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-21 20:56 ` Matt Keenan
@ 2008-09-22  6:50   ` Sitsofe Wheeler
  0 siblings, 0 replies; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-22  6:50 UTC (permalink / raw)
  To: Matt Keenan; +Cc: linux-kernel, Ingo Molnar

Matt Keenan wrote:
> If you use ALSA you could write a custom .asoundrc file and increase
> ALSA's default buffer size, this gives the hardware a longer buffer to
> play which may work around the problem. This used to do the trick for me
> with the old scheduler with slow machines. Not a fix but if nothing else
> works...

In all fairness turning on preemption also helps (I don't hear any drops 
or see any stalls once it's on) but I'm trying to work out if this is an 
underlying problem that can be fixed (and then there is the case of the 
default desktop distros).

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-17 21:18 Sitsofe Wheeler
  2008-09-17 21:28 ` Sitsofe Wheeler
  2008-09-17 21:34 ` Arjan van de Ven
@ 2008-09-21 20:56 ` Matt Keenan
  2008-09-22  6:50   ` Sitsofe Wheeler
  2 siblings, 1 reply; 28+ messages in thread
From: Matt Keenan @ 2008-09-21 20:56 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: linux-kernel, Ingo Molnar

On Wed, 2008-09-17 at 22:18 +0100, Sitsofe Wheeler wrote:
> Hi,
> 
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be skipping 
> while playing sound through various desktop apps with a 2.6.27rc6 
> kernel. It is running off an SD card which really shows up slow writes 
> but the sound is seemingly skipping even when ext3 is not being used. 

[snip snip]

If you use ALSA you could write a custom .asoundrc file and increase
ALSA's default buffer size, this gives the hardware a longer buffer to
play which may work around the problem. This used to do the trick for me
with the old scheduler with slow machines. Not a fix but if nothing else
works...

Matt


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-18 18:25         ` Sitsofe Wheeler
@ 2008-09-19  8:44           ` Ingo Molnar
  0 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2008-09-19  8:44 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt


* Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Peter Zijlstra wrote:
>> What you can do to investigate this, is use the sched_wakeup tracer from
>> ftrace, that should give a function trace of the highest wakeup latency
>> showing what the kernel is doing.
>
> I struggled to find documentation of ftrace because it's quite new. I  
> have come across  
> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.0/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-Realtime_Specific_Tuning-Using_the_ftrace_Utility_for_Tracing_Latencies.html 
> and  
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/ftrace.txt;h=d330fe3103da9c9a3cb8f888ac7255ce48e666d4;hb=45e9c0de2e86485f8b6633fd64ab19cfbff167f6 
>  .
>
> Thanks to those I started up the debugfs filesystem and went to the  
> trace directories but the on tracers in available_tracers are
> ftrace sched_switch none
>
> I can't see anything in the code that would disable wakeup... Any ideas  
> on what might be wrong? I'm using a 2.6.27rc6 kernel.

here's two quick howtos:

  http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
  http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt

you need to enable:

  CONFIG_SCHED_TRACER=y
  CONFIG_CONTEXT_SWITCH_TRACER=y

it's not particularly well named though. Why doesnt it say 
LATENCY_TRACER or something?

> Additionally I think I found a trigger - unplugging the power cable 
> from the EeePC and having it run on battery seems to then set off this 
> periodic stall every 30 seconds... There's no CPU frequency scaling 
> enabled either (Celeron M's seemingly don't have P states and support 
> for cpufreq is configured out).

sounds like potential SMM triggered latencies.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-18  2:42       ` Peter Zijlstra
@ 2008-09-18 18:25         ` Sitsofe Wheeler
  2008-09-19  8:44           ` Ingo Molnar
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-18 18:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Arjan van de Ven, linux-kernel, Ingo Molnar

Peter Zijlstra wrote:
> What you can do to investigate this, is use the sched_wakeup tracer from
> ftrace, that should give a function trace of the highest wakeup latency
> showing what the kernel is doing.

I struggled to find documentation of ftrace because it's quite new. I 
have come across 
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.0/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-Realtime_Specific_Tuning-Using_the_ftrace_Utility_for_Tracing_Latencies.html 
and 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/ftrace.txt;h=d330fe3103da9c9a3cb8f888ac7255ce48e666d4;hb=45e9c0de2e86485f8b6633fd64ab19cfbff167f6 
  .

Thanks to those I started up the debugfs filesystem and went to the 
trace directories but the on tracers in available_tracers are
ftrace sched_switch none

I can't see anything in the code that would disable wakeup... Any ideas 
on what might be wrong? I'm using a 2.6.27rc6 kernel.

Additionally I think I found a trigger - unplugging the power cable from 
the EeePC and having it run on battery seems to then set off this 
periodic stall every 30 seconds... There's no CPU frequency scaling 
enabled either (Celeron M's seemingly don't have P states and support 
for cpufreq is configured out).


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-17 21:54     ` Arjan van de Ven
  2008-09-17 22:29       ` Sitsofe Wheeler
@ 2008-09-18  2:42       ` Peter Zijlstra
  2008-09-18 18:25         ` Sitsofe Wheeler
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2008-09-18  2:42 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Sitsofe Wheeler, linux-kernel, Ingo Molnar

On Wed, 2008-09-17 at 14:54 -0700, Arjan van de Ven wrote:
> On Wed, 17 Sep 2008 22:48:55 +0100
> Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> 
> > Arjan van de Ven wrote:
> > > this says you haven't done "make install" on the latencytop
> > > directory so it's not translating things for you.. can you do that
> > > please?
> > 
> > > Cause                                                Maximum
> > > 
> Percentage 
> 
> Scheduler: waiting for cpu                        208 msec         59.4 %
> 
> 
> you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
> that sounds like a scheduler fairness issue!

Really hard subject. Perfect fairness requires 0 latency - which with a
CPU only being able to run one thing at a time is impossible. So what
latency ends up being is a measure for the convergence towards fairness.

Anyway - 200ms isn't too weird depending on the circumstances. We start
out with a 20ms latency for UP, we then multiply with 1+log2(nr_cpus)
which in say a quad core machine ends up with 60ms. That ought to mean
that under light load the max latency should not exceed twice that
(basically a consequence of the Nyquist-Shannon sampling theorem IIRC).

Now, if you get get under some load (by default: nr_running > 5) the
expected latency starts to linearly grow with nr_running.

>From what I gather from the reply to this email the machine was not
doing much (and after having looked up the original email I see its a
eeeeeeeee atom - which is dual cpu iirc, so that yields 40ms default) -
so 200 is definately on the high side.

What you can do to investigate this, is use the sched_wakeup tracer from
ftrace, that should give a function trace of the highest wakeup latency
showing what the kernel is doing.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-17 21:54     ` Arjan van de Ven
@ 2008-09-17 22:29       ` Sitsofe Wheeler
  2008-09-18  2:42       ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-17 22:29 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, Ingo Molnar, peterz

Arjan van de Ven wrote:
> On Wed, 17 Sep 2008 22:48:55 +0100
> Sitsofe Wheeler <sitsofe@yahoo.com> wrote:
> 
>> Arjan van de Ven wrote:
>>> this says you haven't done "make install" on the latencytop
>>> directory so it's not translating things for you.. can you do that
>>> please?
>>> Cause                                                Maximum
>>>
> Percentage 
> 
> Scheduler: waiting for cpu                        208 msec         59.4 %
> 
> 
> you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
> that sounds like a scheduler fairness issue!

I'm at a bit of a loss to explain this. I'm not using the group 
scheduler and the load average is less than 0.4. CPU usage does seem to 
spike a fair bit (I'm not sure why rhythmbox needs 50%+ for decoding 
oggs from time to time) but there always seems to be 20% CPU free...

I don't know where to start debugging this one and I'm suspicious of the 
way the problem would happen every 30 seconds too... (this laptop is 
using ath5k for its wifi)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-17 21:48   ` How how latent should non-preemptive scheduling be? Sitsofe Wheeler
@ 2008-09-17 21:54     ` Arjan van de Ven
  2008-09-17 22:29       ` Sitsofe Wheeler
  2008-09-18  2:42       ` Peter Zijlstra
  0 siblings, 2 replies; 28+ messages in thread
From: Arjan van de Ven @ 2008-09-17 21:54 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: linux-kernel, Ingo Molnar, peterz

On Wed, 17 Sep 2008 22:48:55 +0100
Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Arjan van de Ven wrote:
> > this says you haven't done "make install" on the latencytop
> > directory so it's not translating things for you.. can you do that
> > please?
> 
> > Cause                                                Maximum
> > 
Percentage 

Scheduler: waiting for cpu                        208 msec         59.4 %


you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
that sounds like a scheduler fairness issue!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
       [not found] ` <fa.808p0ZtU9DCpeky4KfNS8Drdw9w@ifi.uio.no>
@ 2008-09-17 21:48   ` Sitsofe Wheeler
  2008-09-17 21:54     ` Arjan van de Ven
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-17 21:48 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, Ingo Molnar

Arjan van de Ven wrote:
> this says you haven't done "make install" on the latencytop directory
> so it's not translating things for you.. can you do that please?

> Cause                                                Maximum     Percentage
> Scheduler: waiting for cpu                        208.3 msec         59.4 %
> Userspace lock contention                           4.9 msec         29.7 %
> Waiting for event (poll)                            4.8 msec          7.1 %
> Waiting for event (select)                          4.5 msec          0.6 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re  1.9 msec          2.0 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.7 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.6 %
> Executing raw SCSI command                          0.7 msec          0.0 %
> ex_resolve_to_value
> 
> 
> 
> 
> 
> Process firefox-bin (5756)                 Total: 317.2 msec                                                                 
> Scheduler: waiting for cpu                        208.3 msec         96.4 %
> Waiting for event (poll)                            3.7 msec          3.0 %
> Userspace lock contention                           1.0 msec          0.7 %
> 
> 
> 
> 
> 
> 
> 
> 
>  pdflush  gnome-vfs-daemo  bash  top  bash  pulseaudio  gconf-helper  bash  rhythmbox  firefox  run-mozilla.sh  firefox-bin 

Better?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-17 21:18 Sitsofe Wheeler
  2008-09-17 21:28 ` Sitsofe Wheeler
@ 2008-09-17 21:34 ` Arjan van de Ven
  2008-09-21 20:56 ` Matt Keenan
  2 siblings, 0 replies; 28+ messages in thread
From: Arjan van de Ven @ 2008-09-17 21:34 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: linux-kernel, Ingo Molnar

On Wed, 17 Sep 2008 22:18:18 +0100
Sitsofe Wheeler <sitsofe@yahoo.com> wrote:

> Hi,
> 
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be
> skipping while playing sound through various desktop apps with a
> 2.6.27rc6 kernel. It is running off an SD card which really shows up
> slow writes but the sound is seemingly skipping even when ext3 is not
> being used. When look at latencytop I often see results similar to
> this:
> 
> > Cause                                                Maximum
> > Percentage Scheduler: waiting for cpu                        247.2
> > msec         64.6 % futex_wait do_futex sys_futex
> > sysenter_do_call      5.0 msec         24.8 % do_select
> > core_sys_select sys_select sysenter_do_c  4.7 msec          0.7 %
> > do_sys_poll sys_poll sysenter_do_call               4.5


this says you haven't done "make install" on the latencytop directory
so it's not translating things for you.. can you do that please?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: How how latent should non-preemptive scheduling be?
  2008-09-17 21:18 Sitsofe Wheeler
@ 2008-09-17 21:28 ` Sitsofe Wheeler
  2008-09-17 21:34 ` Arjan van de Ven
  2008-09-21 20:56 ` Matt Keenan
  2 siblings, 0 replies; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-17 21:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Arjan van de Ven

Sitsofe Wheeler wrote:
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be skipping 
> while playing sound through various desktop apps with a 2.6.27rc6 
> kernel. It is running off an SD card which really shows up slow writes 
> but the sound is seemingly skipping even when ext3 is not being used. 
> When look at latencytop I often see results similar to this:

One other small datapoint. When the skipping starts to occur regularly 
it seems to be exactly 30 seconds apart. ALSA doesn't report any xruns 
either...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* How how latent should non-preemptive scheduling be?
@ 2008-09-17 21:18 Sitsofe Wheeler
  2008-09-17 21:28 ` Sitsofe Wheeler
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Sitsofe Wheeler @ 2008-09-17 21:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

Hi,

I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be skipping 
while playing sound through various desktop apps with a 2.6.27rc6 
kernel. It is running off an SD card which really shows up slow writes 
but the sound is seemingly skipping even when ext3 is not being used. 
When look at latencytop I often see results similar to this:

> Cause                                                Maximum     Percentage
> Scheduler: waiting for cpu                        247.2 msec         64.6 %
> futex_wait do_futex sys_futex sysenter_do_call      5.0 msec         24.8 %
> do_select core_sys_select sys_select sysenter_do_c  4.7 msec          0.7 %
> do_sys_poll sys_poll sysenter_do_call               4.5 msec          5.6 %
> do_sys_poll sys_ppoll sysenter_do_call              4.2 msec          1.7 %
> rt_mutex_timed_lock futex_lock_pi do_futex sys_fut  3.5 msec          0.5 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re  1.9 msec          1.3 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.4 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.5 %
> on acpi_ex_field_datum_io acpi_ex_extract_from_field acpi_ex_read_data_from_field acpi_ex_resolve_node_to_value acpi_ex_resolve_to_va
> lue
> 
> 
> 
> Process rhythmbox (5694)                   Total: 358.4 msec                                                                         
> Scheduler: waiting for cpu                        221.8 msec         69.9 %
> futex_wait do_futex sys_futex sysenter_do_call      4.5 msec         26.3 %
> do_sys_poll sys_poll sysenter_do_call               3.8 msec          3.2 %
> do_select core_sys_select sys_select sysenter_do_c  1.0 msec          0.4 %
> rt_mutex_timed_lock futex_lock_pi do_futex sys_fut  0.2 msec          0.1 %


> Cause                                                Maximum     Percentage
> Scheduler: waiting for cpu                        234.8 msec         31.1 %
> futex_wait do_futex sys_futex sysenter_do_call      5.0 msec         34.4 %
> do_select core_sys_select sys_select sysenter_do_c  5.0 msec         10.3 %
> do_sys_poll sys_ppoll sysenter_do_call              5.0 msec          6.2 %
> do_sys_poll sys_poll sysenter_do_call               5.0 msec         16.4 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re  1.9 msec          0.9 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.3 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu  1.9 msec          0.3 %
> blk_execute_rq scsi_execute scsi_execute_req scsi_  1.0 msec          0.0 %
> 
> 
> Process firefox-bin (5756)                 Total: 482.2 msec                                                                         
> Scheduler: waiting for cpu                        234.8 msec         95.5 %
> do_sys_poll sys_poll sysenter_do_call               4.1 msec          3.5 %
> futex_wait do_futex sys_futex sysenter_do_call      3.9 msec          1.0 %
> do_select core_sys_select sys_select sysenter_do_c  0.2 msec          0.0 %

top seems to show that there is CPU time still available. The kernel is 
compiled with voluntary preemption, dynticks and a HZ of 1000.

Is the scheduler waiting for CPU anything to do with the skips? If so 
what sort of maximum should I be expecting? If you are listing to music 
is it expected that you have preemption on? Would preemption even help?


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2008-10-04 10:50 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <fa.ERZTl/6uH+mhNoef5fPJKTRjJag@ifi.uio.no>
     [not found] ` <fa.PtPFzP5kIJVCCov6YCewrh+o4z4@ifi.uio.no>
     [not found]   ` <fa.C6WSm5Rh2Nb+Qho7b0qDOZ9RPV8@ifi.uio.no>
     [not found]     ` <fa.ch6j4qXs/2sFpLkHz5fXrtjTR8c@ifi.uio.no>
     [not found]       ` <fa.Jx/Ygtm46CVRawlA6OnfYNn6cN0@ifi.uio.no>
2008-09-18  7:26         ` How how latent should non-preemptive scheduling be? Sitsofe Wheeler
     [not found]         ` <fa.iIHgL48F3T5VAqFw3mqaf9Pzrs4@ifi.uio.no>
     [not found]           ` <fa.Td8xkKZKMSMghlJmEYefTRVF2kc@ifi.uio.no>
2008-09-19 11:54             ` Sitsofe Wheeler
2008-09-19 14:20               ` Peter Zijlstra
2008-09-22 11:57               ` Ingo Molnar
2008-09-22 12:07                 ` Steven Rostedt
2008-09-23  6:33                 ` Sitsofe Wheeler
2008-09-23 11:53                   ` Ingo Molnar
2008-09-23 16:30                     ` Sitsofe Wheeler
2008-09-23 19:39                       ` Sitsofe Wheeler
2008-09-23 22:01                         ` Sitsofe Wheeler
2008-09-27 20:48                           ` Ingo Molnar
2008-09-28 20:56                             ` Sitsofe Wheeler
2008-09-29  8:37                               ` Ingo Molnar
2008-09-29 23:11                                 ` Sitsofe Wheeler
2008-09-30 11:22                                   ` Ingo Molnar
2008-09-30 13:18                                     ` Sitsofe Wheeler
2008-10-04 10:50                                       ` Reading EeePC900 battery info causes stalls when using SLUB (was Re: How how latent should non-preemptive scheduling be?) Sitsofe Wheeler
     [not found] <fa.vMKgvqjqmYnI2J40GHoTENeYm8U@ifi.uio.no>
     [not found] ` <fa.808p0ZtU9DCpeky4KfNS8Drdw9w@ifi.uio.no>
2008-09-17 21:48   ` How how latent should non-preemptive scheduling be? Sitsofe Wheeler
2008-09-17 21:54     ` Arjan van de Ven
2008-09-17 22:29       ` Sitsofe Wheeler
2008-09-18  2:42       ` Peter Zijlstra
2008-09-18 18:25         ` Sitsofe Wheeler
2008-09-19  8:44           ` Ingo Molnar
2008-09-17 21:18 Sitsofe Wheeler
2008-09-17 21:28 ` Sitsofe Wheeler
2008-09-17 21:34 ` Arjan van de Ven
2008-09-21 20:56 ` Matt Keenan
2008-09-22  6:50   ` Sitsofe Wheeler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).