LKML Archive on lore.kernel.org
* scheduler: IRQs disabled over context switches
From: Russell King @ 2004-05-23 16:43 UTC
To: Linux Kernel List

Hi,

The 2.6.6 scheduler disables IRQs across context switches, which is
bad news for IRQ latency on ARM - to the point where 16550A FIFO
UARTs overrun.

I'm considering defining prepare_arch_switch & co as follows on ARM,
so that we release IRQs over the call to context_switch().

#define prepare_arch_switch(rq,next) \
do { \
	spin_lock(&(next)->switch_lock); \
	spin_unlock_irq(&(rq)->lock); \
} while (0)
#define finish_arch_switch(rq,prev) \
	spin_unlock(&(prev)->switch_lock)
#define task_running(rq,p) \
	((rq)->curr == (p) || spin_is_locked(&(p)->switch_lock))

The question is... why are we keeping IRQs disabled over context_switch()
in the first place?  Looking at the code, the only thing which is touched
outside of the two tasks is rq->prev_mm.  Since runqueues are CPU-specific
and we're holding at least one spinlock, I think the above is preempt safe
and SMP safe.

However, I'd like to find out from someone who knows this code why IRQs
are disabled by default here.

--
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core
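[Editor's note] The handoff the macros above implement can be sketched as a
tiny single-threaded C model. This is an illustration only, not kernel code:
the lock fields are bare flags rather than real spinlocks, and an
`irqs_disabled` flag stands in for the CPU's interrupt state.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the proposed ARM prepare/finish_arch_switch handoff. */
struct task { int switch_lock; };   /* 1 = "spinlock" held */
struct rq { int lock; int irqs_disabled; struct task *curr; };

static void prepare_arch_switch(struct rq *rq, struct task *next)
{
	next->switch_lock = 1;      /* spin_lock(&(next)->switch_lock) */
	rq->lock = 0;               /* spin_unlock_irq(&(rq)->lock): */
	rq->irqs_disabled = 0;      /* IRQs are back on before context_switch() */
}

static void finish_arch_switch(struct rq *rq, struct task *prev)
{
	(void)rq;
	prev->switch_lock = 0;      /* spin_unlock(&(prev)->switch_lock) */
}

static bool task_running(struct rq *rq, struct task *p)
{
	/* A task still counts as running while its switch is in flight
	 * (switch_lock held), so another CPU will not pull it. */
	return rq->curr == p || p->switch_lock;
}
```

The point of the extended `task_running()` shows up directly in the model:
after `rq->curr` moves to `next`, `prev` still reports as running until
`finish_arch_switch()` drops its `switch_lock`.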
* Re: scheduler: IRQs disabled over context switches
From: Davide Libenzi @ 2004-05-23 18:59 UTC
To: Russell King; +Cc: Linux Kernel List

On Sun, 23 May 2004, Russell King wrote:

> The 2.6.6 scheduler disables IRQs across context switches, which is
> bad news for IRQ latency on ARM - to the point where 16550A FIFO
> UARTs overrun.
>
> I'm considering defining prepare_arch_switch & co as follows on ARM,
> so that we release IRQs over the call to context_switch().
>
> #define prepare_arch_switch(rq,next) \
> do { \
> 	spin_lock(&(next)->switch_lock); \
> 	spin_unlock_irq(&(rq)->lock); \
> } while (0)
> #define finish_arch_switch(rq,prev) \
> 	spin_unlock(&(prev)->switch_lock)
> #define task_running(rq,p) \
> 	((rq)->curr == (p) || spin_is_locked(&(p)->switch_lock))
>
> The question is... why are we keeping IRQs disabled over context_switch()
> in the first place?  Looking at the code, the only thing which is touched
> outside of the two tasks is rq->prev_mm.  Since runqueues are CPU-specific
> and we're holding at least one spinlock, I think the above is preempt safe
> and SMP safe.

Other archs already do the above.  The only reason I can think of is an
optimization issue: the above code does one spin_lock/unlock pair more
than the default code.  For archs where the ctx switch is fast, this might
matter, whereas archs with a slow ctx switch might want to use the above
code to reduce IRQ latency.  Also, if the ctx switch code involves
acquiring some "external" spinlock, care should be taken to verify that
the cross-lock happens (see the ia64 code for example).

IMO, if this is an optimization issue, and if the thing does not buy us
much in terms of performance, I'd rather use the above code as the
default.  Ingo?


- Davide
* Re: scheduler: IRQs disabled over context switches
From: Russell King @ 2004-05-23 19:38 UTC
To: Davide Libenzi; +Cc: Linux Kernel List

On Sun, May 23, 2004 at 11:59:01AM -0700, Davide Libenzi wrote:
> On Sun, 23 May 2004, Russell King wrote:
>
> > The 2.6.6 scheduler disables IRQs across context switches, which is
> > bad news for IRQ latency on ARM - to the point where 16550A FIFO
> > UARTs overrun.
> >
> > I'm considering defining prepare_arch_switch & co as follows on ARM,
> > so that we release IRQs over the call to context_switch().
> >
> > #define prepare_arch_switch(rq,next) \
> > do { \
> > 	spin_lock(&(next)->switch_lock); \
> > 	spin_unlock_irq(&(rq)->lock); \
> > } while (0)
> > #define finish_arch_switch(rq,prev) \
> > 	spin_unlock(&(prev)->switch_lock)
> > #define task_running(rq,p) \
> > 	((rq)->curr == (p) || spin_is_locked(&(p)->switch_lock))
> >
> > The question is... why are we keeping IRQs disabled over context_switch()
> > in the first place?  Looking at the code, the only thing which is touched
> > outside of the two tasks is rq->prev_mm.  Since runqueues are CPU-specific
> > and we're holding at least one spinlock, I think the above is preempt safe
> > and SMP safe.
>
> Other archs already do the above.

Not quite - look harder.  They use spin_unlock_irq in finish_arch_switch
rather than prepare_arch_switch.

Ralf Baechle mentioned that he thinks the above on MIPS causes some rare
but weird problems, though he wasn't able to give anything specific at
the time.

--
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core
* Re: scheduler: IRQs disabled over context switches
From: Davide Libenzi @ 2004-05-23 23:04 UTC
To: Russell King; +Cc: Linux Kernel List

On Sun, 23 May 2004, Russell King wrote:

> Not quite - look harder.  They use spin_unlock_irq in finish_arch_switch
> rather than prepare_arch_switch.

Hmm, they do indeed.  If we release the rq lock before the ctx switch,
"prev" (the real one) will appear to be not running, since we already set
"rq->curr" to "next" (and we do not hold "prev->switch_lock").  So another
CPU might see "prev" as free-to-pull while we're still playing with its
fields.  Even in UP, we will have a window of time where "rq->curr" is
different from "current", with IRQs enabled (timer tick).

IMO we have two problems in releasing "rq->lock" and enabling IRQs.  One
is that "prev" will appear free-to-steal from another CPU (after
"rq->curr" is set to "next").  The other one is the timer tick, which
might screw us up while switching.  OTOH we cannot enable IRQs while
holding "rq->lock", for obvious reasons.  Maybe something like below.
With preempt, we already have the preempt_disable() at the beginning of
schedule(), so the timer tick will not issue a reschedule on return.


- Davide


Index: kernel/sched.c
===================================================================
RCS file: /usr/src/bkcvs/linux-2.5/kernel/sched.c,v
retrieving revision 1.303
diff -u -r1.303 sched.c
--- a/kernel/sched.c	21 May 2004 20:17:47 -0000	1.303
+++ b/kernel/sched.c	23 May 2004 22:20:29 -0000
@@ -220,6 +220,8 @@
 	prio_array_t *active, *expired, arrays[2];
 	int best_expired_prio;
 	atomic_t nr_iowait;
+	task_t *prev;
+	atomic_t in_ctx_switch;
 
 #ifdef CONFIG_SMP
 	struct sched_domain *sd;
@@ -243,13 +245,20 @@
 #define task_rq(p)		cpu_rq(task_cpu(p))
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 
+#define rq_switching(rq)	atomic_read(&rq->in_ctx_switch)
+
 /*
  * Default context-switch locking:
  */
 #ifndef prepare_arch_switch
-# define prepare_arch_switch(rq, next)	do { } while (0)
-# define finish_arch_switch(rq, next)	spin_unlock_irq(&(rq)->lock)
-# define task_running(rq, p)		((rq)->curr == (p))
+# define prepare_arch_switch(rq, next) \
+do { \
+	rq->prev = current; \
+	atomic_inc(&rq->in_ctx_switch); \
+	spin_unlock_irq(&(rq)->lock); \
+} while (0)
+# define finish_arch_switch(rq, next)	atomic_dec(&rq->in_ctx_switch)
+# define task_running(rq, p)		((rq)->curr == (p) || (rq_switching(rq) && rq->prev == (p)))
 #endif
 
 /*
@@ -1966,6 +1975,9 @@
 	runqueue_t *rq = this_rq();
 	task_t *p = current;
 
+	if (rq_switching(rq))
+		return;
+
 	rq->timestamp_last_tick = sched_clock();
 
 	if (rcu_pending(cpu))
@@ -3915,6 +3927,8 @@
 		INIT_LIST_HEAD(&rq->migration_queue);
 #endif
 		atomic_set(&rq->nr_iowait, 0);
+		atomic_set(&rq->in_ctx_switch, 0);
+		rq->prev = NULL;
 
 		for (j = 0; j < 2; j++) {
 			array = rq->arrays + j;
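[Editor's note] The `in_ctx_switch` idea in the patch above can be modeled in
a few lines of plain C. This is only a sketch under obvious simplifications:
`atomic_t` becomes a plain int, the locks are elided, and `scheduler_tick()`
is reduced to the one early-return check this thread is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model of the rq->in_ctx_switch flag. */
struct task { int unused; };
struct rq {
	int in_ctx_switch;          /* stands in for atomic_t */
	struct task *curr, *prev;
};

#define rq_switching(rq)	((rq)->in_ctx_switch)

static void prepare_arch_switch(struct rq *rq, struct task *current_task)
{
	rq->prev = current_task;
	rq->in_ctx_switch++;        /* atomic_inc(&rq->in_ctx_switch) */
	/* spin_unlock_irq(&rq->lock) would go here: IRQs are on from now */
}

static void finish_arch_switch(struct rq *rq)
{
	rq->in_ctx_switch--;        /* atomic_dec(&rq->in_ctx_switch) */
}

static bool task_running(struct rq *rq, struct task *p)
{
	return rq->curr == p || (rq_switching(rq) && rq->prev == p);
}

/* Returns true if the tick did its work, false if it bailed out
 * because a context switch was in flight on this runqueue. */
static bool scheduler_tick(struct rq *rq)
{
	if (rq_switching(rq))
		return false;       /* timer IRQ landed mid-switch: do nothing */
	/* ... the rest of the real tick would run here ... */
	return true;
}
```

This makes the two properties of the patch concrete: a timer IRQ landing
between prepare and finish hits the early return, and `prev` still counts as
running until `finish_arch_switch()` drops the flag.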
* Re: scheduler: IRQs disabled over context switches
From: Russell King @ 2004-05-23 23:33 UTC
To: Davide Libenzi; +Cc: Linux Kernel List

On Sun, May 23, 2004 at 04:04:39PM -0700, Davide Libenzi wrote:
> On Sun, 23 May 2004, Russell King wrote:
>
> > Not quite - look harder.  They use spin_unlock_irq in finish_arch_switch
> > rather than prepare_arch_switch.
>
> Hmm, they do indeed.  If we release the rq lock before the ctx switch,
> "prev" (the real one) will appear to be not running, since we already set
> "rq->curr" to "next" (and we do not hold "prev->switch_lock").

We do hold prev->switch_lock - we hold it all the time that the thread
is running.  Consider what prepare_arch_switch() is doing - it's taking
the next thread's switch_lock.  It only gets released _after_ we've
switched away from that thread.

So I think your analysis is flawed.

--
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core
* Re: scheduler: IRQs disabled over context switches
From: Davide Libenzi @ 2004-05-24 0:27 UTC
To: Russell King; +Cc: Linux Kernel List

On Mon, 24 May 2004, Russell King wrote:

> We do hold prev->switch_lock - we hold it all the time that the thread
> is running.  Consider what prepare_arch_switch() is doing - it's taking
> the next thread's switch_lock.  It only gets released _after_ we've
> switched away from that thread.

It is flawed indeed :) (/me looking at it after ages).  Even the
scheduler tick code does not play with the mm fields, so it should be
fine after the rq lock (and IRQ) release.  Preempt is fine due to the
preempt_disable() (plus switch_lock being held), tasks appear to be
running (due to switch_lock being held) so remote CPUs won't touch them,
and scheduler_tick() looks innocuous with respect to the fields that
matter for the switch.  Is it blowing up on ARM?


- Davide
* Re: scheduler: IRQs disabled over context switches
From: Ingo Molnar @ 2004-05-24 8:37 UTC
To: Linux Kernel List; +Cc: rmk+lkml, davidel

* Russell King <rmk+lkml@arm.linux.org.uk> wrote:

> The 2.6.6 scheduler disables IRQs across context switches, which is
> bad news for IRQ latency on ARM - to the point where 16550A FIFO UARTs
> overrun.
>
> I'm considering defining prepare_arch_switch & co as follows on ARM,
> so that we release IRQs over the call to context_switch().
>
> The question is... why are we keeping IRQs disabled over
> context_switch() in the first place?  Looking at the code, the only
> thing which is touched outside of the two tasks is rq->prev_mm.  Since
> runqueues are CPU-specific and we're holding at least one spinlock, I
> think the above is preempt safe and SMP safe.

Historically, x86 context-switching has been pretty fragile when done
with irqs enabled.  (x86 has tons of legacy baggage, segments, etc.)
It's also slightly faster to do the context-switch in one atomic swoop.
On x86 we do this portion in like 1 usec, so it's not a latency issue.

If on ARM context-switching latency gives you UART problems, then you
can enable irqs via an ARM-specific version of prepare_arch_switch() &
finish_arch_switch().

	Ingo
* Re: scheduler: IRQs disabled over context switches
From: Davide Libenzi @ 2004-05-24 6:41 UTC
To: Ingo Molnar; +Cc: Linux Kernel List, rmk+lkml

On Mon, 24 May 2004, Ingo Molnar wrote:

> * Russell King <rmk+lkml@arm.linux.org.uk> wrote:
>
> > The 2.6.6 scheduler disables IRQs across context switches, which is
> > bad news for IRQ latency on ARM - to the point where 16550A FIFO UARTs
> > overrun.
> >
> > I'm considering defining prepare_arch_switch & co as follows on ARM,
> > so that we release IRQs over the call to context_switch().
> >
> > The question is... why are we keeping IRQs disabled over
> > context_switch() in the first place?  Looking at the code, the only
> > thing which is touched outside of the two tasks is rq->prev_mm.  Since
> > runqueues are CPU-specific and we're holding at least one spinlock, I
> > think the above is preempt safe and SMP safe.
>
> Historically, x86 context-switching has been pretty fragile when done
> with irqs enabled.  (x86 has tons of legacy baggage, segments, etc.)
> It's also slightly faster to do the context-switch in one atomic swoop.
> On x86 we do this portion in like 1 usec, so it's not a latency issue.

We used to do it in 2.4.  What changed to make it fragile?  The
threading (TLS) thing?


- Davide
* Re: scheduler: IRQs disabled over context switches
From: Ingo Molnar @ 2004-05-24 9:05 UTC
To: Davide Libenzi; +Cc: Linux Kernel List, rmk+lkml

* Davide Libenzi <davidel@xmailserver.org> wrote:

> We used to do it in 2.4.  What changed to make it fragile?  The
> threading (TLS) thing?

It _should_ work, but in the past we only had trouble from such changes
(at least in the O(1) tree of scheduling - the 2.4 scheduler is OK).  We
could try the patch below.  It certainly boots on SMP x86, but it causes
a 3.5% slowdown in lat_ctx, so I'd not do it unless there are some
really good reasons.

	Ingo

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -247,9 +247,15 @@ static DEFINE_PER_CPU(struct runqueue, r
  * Default context-switch locking:
  */
 #ifndef prepare_arch_switch
-# define prepare_arch_switch(rq, next)	do { } while (0)
-# define finish_arch_switch(rq, next)	spin_unlock_irq(&(rq)->lock)
-# define task_running(rq, p)		((rq)->curr == (p))
+# define prepare_arch_switch(rq, next) \
+	do { \
+		spin_lock(&(next)->switch_lock); \
+		spin_unlock(&(rq)->lock); \
+	} while (0)
+# define finish_arch_switch(rq, prev) \
+	spin_unlock_irq(&(prev)->switch_lock)
+# define task_running(rq, p) \
+	((rq)->curr == (p) || spin_is_locked(&(p)->switch_lock))
 #endif
 
 /*
* Re: scheduler: IRQs disabled over context switches
From: Nick Piggin @ 2004-05-24 7:10 UTC
To: Ingo Molnar; +Cc: Davide Libenzi, Linux Kernel List, rmk+lkml

Ingo Molnar wrote:
> * Davide Libenzi <davidel@xmailserver.org> wrote:
>
>> We used to do it in 2.4.  What changed to make it fragile?  The
>> threading (TLS) thing?
>
> It _should_ work, but in the past we only had trouble from such changes
> (at least in the O(1) tree of scheduling - the 2.4 scheduler is OK).  We
> could try the patch below.  It certainly boots on SMP x86, but it causes
> a 3.5% slowdown in lat_ctx, so I'd not do it unless there are some
> really good reasons.
>
> +# define prepare_arch_switch(rq, next) \
> +	do { \
> +		spin_lock(&(next)->switch_lock); \
> +		spin_unlock(&(rq)->lock); \

spin_unlock_irq?
* Re: scheduler: IRQs disabled over context switches
From: Ingo Molnar @ 2004-05-24 9:15 UTC
To: Nick Piggin; +Cc: Davide Libenzi, Linux Kernel List, rmk+lkml

* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > +# define prepare_arch_switch(rq, next) \
> > +	do { \
> > +		spin_lock(&(next)->switch_lock); \
> > +		spin_unlock(&(rq)->lock); \
>
> spin_unlock_irq?

... and spin_unlock in the finish_arch_switch() case, instead of
spin_unlock_irq.  New patch below.

	Ingo

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -247,9 +247,15 @@ static DEFINE_PER_CPU(struct runqueue, r
  * Default context-switch locking:
  */
 #ifndef prepare_arch_switch
-# define prepare_arch_switch(rq, next)	do { } while (0)
-# define finish_arch_switch(rq, next)	spin_unlock_irq(&(rq)->lock)
-# define task_running(rq, p)		((rq)->curr == (p))
+# define prepare_arch_switch(rq, next) \
+	do { \
+		spin_lock(&(next)->switch_lock); \
+		spin_unlock_irq(&(rq)->lock); \
+	} while (0)
+# define finish_arch_switch(rq, prev) \
+	spin_unlock(&(prev)->switch_lock)
+# define task_running(rq, p) \
+	((rq)->curr == (p) || spin_is_locked(&(p)->switch_lock))
 #endif
 
 /*
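[Editor's note] The only difference between the two patches is *where* IRQs
come back on. A throwaway C model of the interrupt flag (an illustration,
not kernel code) makes the fix visible: the first version re-enabled IRQs
only in finish_arch_switch(), so they stayed off across context_switch(),
defeating the purpose; the corrected version re-enables them in
prepare_arch_switch(), before the switch runs.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu { bool irqs_enabled; };

/* First patch: plain unlock in prepare, unlock_irq in finish.
 * Returns whether IRQs were on while context_switch() would run. */
static bool irqs_on_during_switch_v1(struct cpu *c)
{
	c->irqs_enabled = false;       /* schedule() took rq->lock with irqs off */
	/* prepare_arch_switch: spin_unlock(&rq->lock) - no irq change */
	bool during = c->irqs_enabled; /* state while context_switch() runs */
	c->irqs_enabled = true;        /* finish_arch_switch: spin_unlock_irq() */
	return during;
}

/* Corrected patch: unlock_irq in prepare, plain unlock in finish. */
static bool irqs_on_during_switch_v2(struct cpu *c)
{
	c->irqs_enabled = false;
	c->irqs_enabled = true;        /* prepare_arch_switch: spin_unlock_irq() */
	bool during = c->irqs_enabled;
	/* finish_arch_switch: spin_unlock() - no irq change */
	return during;
}
```

Both variants leave IRQs enabled once the switch is over; only the second
opens the IRQ window during the switch itself, which is what the ARM UART
latency problem needs.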
* Re: scheduler: IRQs disabled over context switches
From: Davide Libenzi @ 2004-05-24 17:16 UTC
To: Ingo Molnar; +Cc: Linux Kernel List, rmk+lkml

On Mon, 24 May 2004, Ingo Molnar wrote:

> * Davide Libenzi <davidel@xmailserver.org> wrote:
>
> > We used to do it in 2.4.  What changed to make it fragile?  The
> > threading (TLS) thing?
>
> It _should_ work, but in the past we only had trouble from such changes
> (at least in the O(1) tree of scheduling - the 2.4 scheduler is OK).  We
> could try the patch below.  It certainly boots on SMP x86, but it causes
> a 3.5% slowdown in lat_ctx, so I'd not do it unless there are some
> really good reasons.

IMO it is fine, as long as it works with IRQ disabled.  There are archs
where IRQ latencies matter more than lat_ctx times (that BTW are bogus),
and we already have the infrastructure in place to let each arch choose
the way that better fits it.  Russell reported that a guy trying it (IRQ
enabled ctx switch) on MIPS was having some problems with it, though.

BTW, the unlock_irq should go in prepare, not finish.


- Davide
* Re: scheduler: IRQs disabled over context switches
From: Davide Libenzi @ 2004-05-24 17:46 UTC
To: Ingo Molnar; +Cc: Linux Kernel List, rmk+lkml

On Mon, 24 May 2004, Davide Libenzi wrote:

> IMO it is fine, as long as it works with IRQ disabled.  There are archs

s/disabled/enabled/


- Davide