LKML Archive on lore.kernel.org
* runqueue locks in schedule()
@ 2008-01-17  0:29 stephane eranian
  2008-01-17 13:24 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: stephane eranian @ 2008-01-17  0:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: ia64, Stephane Eranian, Corey J Ashford

Hello,

As suggested by people on this list, I have changed perfmon2 to use
the high resolution timers as the interface to allow timeout-based
event set multiplexing. This works around the problems I had with
tickless-enabled kernels.

Multiplexing is supported in per-thread mode as well. In that case, the
timeout measures virtual time. When the thread is context switched
out, we need to save the remainder of the timeout and cancel the
timer. When the thread is context switched in, we need to reinstall
the timer. These timer save/restore operations have to be done in the
switch_to() code near the end of schedule().

There are situations where hrtimer_start() may end up trying to
acquire the runqueue lock. This happens on a context switch where the
current thread is blocking (not preempted) and the new timeout happens
to be either in the past or just expiring. We've run into such
situations with simple tests.

On all architectures but IA-64, it seems that the runqueue lock is
held until the end of schedule(). On IA-64, the lock is released
BEFORE switch_to() for some reason I don't quite remember. That may
not even be needed anymore.

The early unlocking is controlled by a macro named
__ARCH_WANT_UNLOCKED_CTXSW. Defining this macro on X86 (or PPC) fixed
our problem.

It is not clear to me why the runqueue lock needs to be held until
the end of schedule() on some platforms and not on others. Note that
releasing the lock earlier does not necessarily introduce more
overhead, because the lock is never re-acquired later in the
schedule() function.

Question:
   - is it safe to release the lock before switch_to() on all architectures?

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: runqueue locks in schedule()
  2008-01-17  0:29 runqueue locks in schedule() stephane eranian
@ 2008-01-17 13:24 ` Peter Zijlstra
  2008-01-18  2:07   ` Nick Piggin
  2008-02-23 14:50   ` stephane eranian
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2008-01-17 13:24 UTC (permalink / raw)
  To: stephane eranian
  Cc: linux-kernel, ia64, Stephane Eranian, Corey J Ashford, Ingo Molnar


[ At the very least CC'ing the scheduler maintainer would be
helpful :-) ]

On Wed, 2008-01-16 at 16:29 -0800, stephane eranian wrote:
> Hello,
> 
> As suggested by people on this list, I have changed perfmon2 to use
> the high resolution timers as the interface to allow timeout-based
> event set multiplexing. This works around the problems I had with
> tickless-enabled kernels.
> 
> Multiplexing is supported in per-thread mode as well. In that case, the
> timeout measures virtual time. When the thread is context switched
> out, we need to save the remainder of the timeout and cancel the
> timer. When the thread is context switched in, we need to reinstall
> the timer. These timer save/restore operations have to be done in the
> switch_to() code near the end of schedule().
> 
> There are situations where hrtimer_start() may end up trying to
> acquire the runqueue lock. This happens on a context switch where the
> current thread is blocking (not preempted) and the new timeout happens
> to be either in the past or just expiring. We've run into such
> situations with simple tests.
> 
> On all architectures but IA-64, it seems that the runqueue lock is
> held until the end of schedule(). On IA-64, the lock is released
> BEFORE switch_to() for some reason I don't quite remember. That may
> not even be needed anymore.
> 
> The early unlocking is controlled by a macro named
> __ARCH_WANT_UNLOCKED_CTXSW. Defining this macro on X86 (or PPC) fixed
> our problem.
> 
> It is not clear to me why the runqueue lock needs to be held up until
> the end of schedule() on some platforms and not on others. Note that
> releasing the lock earlier does not necessarily introduce more
> overhead because the lock is never re-acquired later in the schedule()
> function.
> 
> Question:
>    - is it safe to release the lock before switch_to() on all architectures?

I had a similar problem when using hrtimers from the scheduler; I extended
the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timer type to run with cpu_base->lock
unlocked.

http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=7e7cbd617833dde5b442e03f69aac39d17d02ec7
http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=45d10aad580a5cdd376e80848aeeaaaf1f97cc18
http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=5ae5d6c5850d4735798bc0e4526d8c61199e9f93

As for your __ARCH_WANT_UNLOCKED_CTXSW question I have to defer to Ingo,
as I'm unaware of the arch ramifications there.


* Re: runqueue locks in schedule()
  2008-01-17 13:24 ` Peter Zijlstra
@ 2008-01-18  2:07   ` Nick Piggin
  2008-01-18  6:33     ` stephane eranian
  2008-02-23 14:50   ` stephane eranian
  1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2008-01-18  2:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stephane eranian, linux-kernel, ia64, Stephane Eranian,
	Corey J Ashford, Ingo Molnar

On Friday 18 January 2008 00:24, Peter Zijlstra wrote:
> [ At the very least CC'ing the scheduler maintainer would be
> helpful :-) ]
>
> On Wed, 2008-01-16 at 16:29 -0800, stephane eranian wrote:
> > Hello,
> >
> > As suggested by people on this list, I have changed perfmon2 to use
> > the high resolution timers as the interface to allow timeout-based
> > event set multiplexing. This works around the problems I had with
> > tickless-enabled kernels.
> >
> > Multiplexing is supported in per-thread mode as well. In that case, the
> > timeout measures virtual time. When the thread is context switched
> > out, we need to save the remainder of the timeout and cancel the
> > timer. When the thread is context switched in, we need to reinstall
> > the timer. These timer save/restore operations have to be done in the
> > switch_to() code near the end of schedule().
> >
> > There are situations where hrtimer_start() may end up trying to
> > acquire the runqueue lock. This happens on a context switch where the
> > current thread is blocking (not preempted) and the new timeout happens
> > to be either in the past or just expiring. We've run into such
> > situations with simple tests.
> >
> > On all architectures but IA-64, it seems that the runqueue lock is
> > held until the end of schedule(). On IA-64, the lock is released
> > BEFORE switch_to() for some reason I don't quite remember. That may
> > not even be needed anymore.
> >
> > The early unlocking is controlled by a macro named
> > __ARCH_WANT_UNLOCKED_CTXSW. Defining this macro on X86 (or PPC) fixed
> > our problem.
> >
> > It is not clear to me why the runqueue lock needs to be held up until
> > the end of schedule() on some platforms and not on others. Note that
> > releasing the lock earlier does not necessarily introduce more
> > overhead because the lock is never re-acquired later in the schedule()
> > function.
> >
> > Question:
> >    - is it safe to release the lock before switch_to() on all
> > architectures?
>
> I had a similar problem when using hrtimers from the scheduler; I extended
> the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timer type to run with cpu_base->lock
> unlocked.
>
> http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=7e7cbd617833dde5b442e03f69aac39d17d02ec7
> http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=45d10aad580a5cdd376e80848aeeaaaf1f97cc18
> http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=5ae5d6c5850d4735798bc0e4526d8c61199e9f93
>
> As for your __ARCH_WANT_UNLOCKED_CTXSW question I have to defer to Ingo,
> as I'm unaware of the arch ramifications there.

It is arch specific. If an architecture wants interrupts on during context
switch, or the runqueue unlocked, then it defines the macro (btw
INTERRUPTS_ON_CTXSW also implies UNLOCKED_CTXSW).

Although, eg on x86, you would hold off interrupts and the runqueue lock
for slightly less time if you defined those, it results in _slightly_ more
complicated context switching. Although I did once find a workload
where the reduced runqueue contention improved throughput a bit, it is
not much of a problem in general to hold the lock.


* Re: runqueue locks in schedule()
  2008-01-18  2:07   ` Nick Piggin
@ 2008-01-18  6:33     ` stephane eranian
  2008-01-18  8:28       ` Nick Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: stephane eranian @ 2008-01-18  6:33 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, linux-kernel, ia64, Stephane Eranian,
	Corey J Ashford, Ingo Molnar

Nick,

On Jan 18, 2008 3:07 AM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> On Friday 18 January 2008 00:24, Peter Zijlstra wrote:
> > [ At the very least CC'ing the scheduler maintainer would be
> > helpful :-) ]
> >
> > On Wed, 2008-01-16 at 16:29 -0800, stephane eranian wrote:
> > > Hello,
> > >
> > > As suggested by people on this list, I have changed perfmon2 to use
> > > the high resolution timers as the interface to allow timeout-based
> > > event set multiplexing. This works around the problems I had with
> > > tickless-enabled kernels.
> > >
> > > Multiplexing is supported in per-thread mode as well. In that case, the
> > > timeout measures virtual time. When the thread is context switched
> > > out, we need to save the remainder of the timeout and cancel the
> > > timer. When the thread is context switched in, we need to reinstall
> > > the timer. These timer save/restore operations have to be done in the
> > > switch_to() code near the end of schedule().
> > >
> > > There are situations where hrtimer_start() may end up trying to
> > > acquire the runqueue lock. This happens on a context switch where the
> > > current thread is blocking (not preempted) and the new timeout happens
> > > to be either in the past or just expiring. We've run into such
> > > situations with simple tests.
> > >
> > > On all architectures but IA-64, it seems that the runqueue lock is
> > > held until the end of schedule(). On IA-64, the lock is released
> > > BEFORE switch_to() for some reason I don't quite remember. That may
> > > not even be needed anymore.
> > >
> > > The early unlocking is controlled by a macro named
> > > __ARCH_WANT_UNLOCKED_CTXSW. Defining this macro on X86 (or PPC) fixed
> > > our problem.
> > >
> > > It is not clear to me why the runqueue lock needs to be held up until
> > > the end of schedule() on some platforms and not on others. Note that
> > > releasing the lock earlier does not necessarily introduce more
> > > overhead because the lock is never re-acquired later in the schedule()
> > > function.
> > >
> > > Question:
> > >    - is it safe to release the lock before switch_to() on all
> > > architectures?
> >
> > I had a similar problem when using hrtimers from the scheduler; I extended
> > the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timer type to run with cpu_base->lock
> > unlocked.
> >
> > http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=7e7cbd617833dde5b442e03f69aac39d17d02ec7
> > http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=45d10aad580a5cdd376e80848aeeaaaf1f97cc18
> > http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=5ae5d6c5850d4735798bc0e4526d8c61199e9f93
> >
> > As for your __ARCH_WANT_UNLOCKED_CTXSW question I have to defer to Ingo,
> > as I'm unaware of the arch ramifications there.
>
> It is arch specific. If an architecture wants interrupts on during context
> switch, or runqueue unlocked, then they set it (btw INTERRUPTS_ON_CTXSW
> also implies UNLOCKED_CTXSW).
>
Yes, I noticed that. I am only interested in UNLOCKED_CTXSW.
But it appears that the approach suggested by Peter does work. We are
running some tests.

> Although, eg on x86, you would hold off interrupts and runqueue lock for
> slightly less time if you defined those, it results in _slightly_ more
> complicated context switching... although I did once find a workload
> where the reduced runqueue contention improved throughput a bit, it is
> not much problem in general to hold the lock.
>
By "complicated" you mean that now you'd have to make sure you don't
need to access runqueue data?

Thanks.


* Re: runqueue locks in schedule()
  2008-01-18  6:33     ` stephane eranian
@ 2008-01-18  8:28       ` Nick Piggin
  0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2008-01-18  8:28 UTC (permalink / raw)
  To: stephane eranian
  Cc: Peter Zijlstra, linux-kernel, ia64, Stephane Eranian,
	Corey J Ashford, Ingo Molnar

On Friday 18 January 2008 17:33, stephane eranian wrote:
> Nick,

> > It is arch specific. If an architecture wants interrupts on during
> > context switch, or runqueue unlocked, then they set it (btw
> > INTERRUPTS_ON_CTXSW also implies UNLOCKED_CTXSW).
>
> Yes, I noticed that. I am only interested in UNLOCKED_CTXSW.
> But it appears that the approach suggested by Peter does work. We are
> running some tests.

OK, that might be OK.


> > Although, eg on x86, you would hold off interrupts and runqueue lock for
> > slightly less time if you defined those, it results in _slightly_ more
> > complicated context switching... although I did once find a workload
> > where the reduced runqueue contention improved throughput a bit, it is
> > not much problem in general to hold the lock.
>
> By complicated you mean that now you'd have to make sure you don't
> need to access runqueue data?

Well, not speaking about the arch-specific code (which may involve
more complexities), but the core scheduler needs the
task_struct->oncpu variable, whereas that isn't required if the
runqueue is locked while switching tasks.


* Re: runqueue locks in schedule()
  2008-01-17 13:24 ` Peter Zijlstra
  2008-01-18  2:07   ` Nick Piggin
@ 2008-02-23 14:50   ` stephane eranian
  2008-02-23 20:40     ` Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: stephane eranian @ 2008-02-23 14:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, ia64, Stephane Eranian, Corey J Ashford, Ingo Molnar

Peter,

>  On Wed, 2008-01-16 at 16:29 -0800, stephane eranian wrote:
>  > Hello,
>  >
>  > As suggested by people on this list, I have changed perfmon2 to use
>  > the high resolution timers as the interface to allow timeout-based
>  > event set multiplexing. This works around the problems I had with
>  > tickless-enabled kernels.
>  >
>  > Multiplexing is supported in per-thread mode as well. In that case, the
>  > timeout measures virtual time. When the thread is context switched
>  > out, we need to save the remainder of the timeout and cancel the
>  > timer. When the thread is context switched in, we need to reinstall
>  > the timer. These timer save/restore operations have to be done in the
>  > switch_to() code near the end of schedule().
>  >
>  > There are situations where hrtimer_start() may end up trying to
>  > acquire the runqueue lock. This happens on a context switch where the
>  > current thread is blocking (not preempted) and the new timeout happens
>  > to be either in the past or just expiring. We've run into such
>  > situations with simple tests.
>  >
>  > On all architectures but IA-64, it seems that the runqueue lock is
>  > held until the end of schedule(). On IA-64, the lock is released
>  > BEFORE switch_to() for some reason I don't quite remember. That may
>  > not even be needed anymore.
>  >
>  > The early unlocking is controlled by a macro named
>  > __ARCH_WANT_UNLOCKED_CTXSW. Defining this macro on X86 (or PPC) fixed
>  > our problem.
>  >
>  > It is not clear to me why the runqueue lock needs to be held up until
>  > the end of schedule() on some platforms and not on others. Note that
>  > releasing the lock earlier does not necessarily introduce more
>  > overhead because the lock is never re-acquired later in the schedule()
>  > function.
>  >
>  > Question:
>  >    - is it safe to release the lock before switch_to() on all architectures?
>
>  I had a similar problem when using hrtimers from the scheduler; I extended
>  the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timer type to run with cpu_base->lock
>  unlocked.
>
I am running into an issue when enabling this flag. Basically, the timer
never fires when it gets into the situation where, in hrtimer_start(), the
timer ends up being the next one to fire. In this mode,
hrtimer_enqueue_reprogram() becomes a NOP, but then nobody ever inserts
the timer into any queue. There is a comment that says "caller site takes
care of this". Could you elaborate on this?


Thanks.


* Re: runqueue locks in schedule()
  2008-02-23 14:50   ` stephane eranian
@ 2008-02-23 20:40     ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2008-02-23 20:40 UTC (permalink / raw)
  To: stephane eranian
  Cc: linux-kernel, ia64, Stephane Eranian, Corey J Ashford, Ingo Molnar


On Sat, 2008-02-23 at 15:50 +0100, stephane eranian wrote:
> Peter,
> 
> >  On Wed, 2008-01-16 at 16:29 -0800, stephane eranian wrote:
> >  > Hello,
> >  >
> >  > As suggested by people on this list, I have changed perfmon2 to use
> >  > the high resolution timers as the interface to allow timeout-based
> >  > event set multiplexing. This works around the problems I had with
> >  > tickless-enabled kernels.
> >  >
> >  > Multiplexing is supported in per-thread mode as well. In that case, the
> >  > timeout measures virtual time. When the thread is context switched
> >  > out, we need to save the remainder of the timeout and cancel the
> >  > timer. When the thread is context switched in, we need to reinstall
> >  > the timer. These timer save/restore operations have to be done in the
> >  > switch_to() code near the end of schedule().
> >  >
> >  > There are situations where hrtimer_start() may end up trying to
> >  > acquire the runqueue lock. This happens on a context switch where the
> >  > current thread is blocking (not preempted) and the new timeout happens
> >  > to be either in the past or just expiring. We've run into such
> >  > situations with simple tests.
> >  >
> >  > On all architectures but IA-64, it seems that the runqueue lock is
> >  > held until the end of schedule(). On IA-64, the lock is released
> >  > BEFORE switch_to() for some reason I don't quite remember. That may
> >  > not even be needed anymore.
> >  >
> >  > The early unlocking is controlled by a macro named
> >  > __ARCH_WANT_UNLOCKED_CTXSW. Defining this macro on X86 (or PPC) fixed
> >  > our problem.
> >  >
> >  > It is not clear to me why the runqueue lock needs to be held up until
> >  > the end of schedule() on some platforms and not on others. Note that
> >  > releasing the lock earlier does not necessarily introduce more
> >  > overhead because the lock is never re-acquired later in the schedule()
> >  > function.
> >  >
> >  > Question:
> >  >    - is it safe to release the lock before switch_to() on all architectures?
> >
> >  I had a similar problem when using hrtimers from the scheduler; I extended
> >  the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timer type to run with cpu_base->lock
> >  unlocked.
> >
> I am running into an issue when enabling this flag. Basically, the timer
> never fires when it gets into the situation where, in hrtimer_start(), the
> timer ends up being the next one to fire. In this mode,
> hrtimer_enqueue_reprogram() becomes a NOP, but then nobody ever inserts
> the timer into any queue. There is a comment that says "caller site takes
> care of this". Could you elaborate on this?

That would mean the timer already expired by the time you get to program
it.

The way to handle these is:

for (;;) {
	if (hrtimer_active(timer))
		break;

	now = hrtimer_cb_get_time(timer);
	hrtimer_forward(timer, now, period);
	hrtimer_start(timer, timer->expires, HRTIMER_MODE_ABS);
}

You could use the return value from hrtimer_forward() to determine how
many events you missed if that is needed. The timer function needs a
similar loop if it wants to use HRTIMER_RESTART.

Single shot timers can handle it like in kernel/hrtimer.c:do_nanosleep()

  hrtimer_start(timer, ...);
  if (!hrtimer_active(timer))
	/* handle the missed expiration */


end of thread, other threads:[~2008-02-23 20:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
2008-01-17  0:29 runqueue locks in schedule() stephane eranian
2008-01-17 13:24 ` Peter Zijlstra
2008-01-18  2:07   ` Nick Piggin
2008-01-18  6:33     ` stephane eranian
2008-01-18  8:28       ` Nick Piggin
2008-02-23 14:50   ` stephane eranian
2008-02-23 20:40     ` Peter Zijlstra
