From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752643AbYJYCdT (ORCPT ); Fri, 24 Oct 2008 22:33:19 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751739AbYJYCdI (ORCPT ); Fri, 24 Oct 2008 22:33:08 -0400 Received: from e4.ny.us.ibm.com ([32.97.182.144]:49925 "EHLO e4.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751256AbYJYCdG (ORCPT ); Fri, 24 Oct 2008 22:33:06 -0400 Date: Fri, 24 Oct 2008 19:33:03 -0700 From: "Paul E. McKenney" To: Gautham R Shenoy Cc: Thomas Gleixner , Peter Zijlstra , Ingo Molnar , linux-kernel@vger.kernel.org Subject: Re: [PATCH] timers: Handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context Message-ID: <20081025023303.GD6248@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20081023215003.GC1480@in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20081023215003.GC1480@in.ibm.com> User-Agent: Mutt/1.5.15+20070412 (2007-04-11) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Oct 24, 2008 at 03:20:03AM +0530, Gautham R Shenoy wrote: > timers: Handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context > > From: Gautham R Shenoy > > While migrating the the CB_IRQSAFE_UNLOCKED timers during a cpu-offline, > we queue them on the cb_pending list, so that they won't go > stale. > > Thus, when the callbacks of the timers run from the softirq context, > they could run into potential deadlocks, since these callbacks > assume that they're running with irq's disabled, thereby annoying > lockdep (see below)! > > Fix this by emulating hardirq context while running these callbacks from > the hrtimer softirq. Good catch!!! One confusion (probably on my part) below. > ================================= > [ INFO: inconsistent lock state ] > 2.6.27 #2 > -------------------------------- > inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes: > (&rq->lock){++..}, at: [] sched_rt_period_timer+0x9e/0x1fc > {in-hardirq-W} state was registered at: > [] __lock_acquire+0x549/0x121e > [] native_sched_clock+0x88/0x99 > [] clocksource_get_next+0x39/0x3f > [] update_wall_time+0x616/0x7df > [] lock_acquire+0x5a/0x74 > [] scheduler_tick+0x3a/0x18d > [] _spin_lock+0x1c/0x45 > [] scheduler_tick+0x3a/0x18d > [] scheduler_tick+0x3a/0x18d > [] update_process_times+0x3a/0x44 > [] tick_periodic+0x63/0x6d > [] tick_handle_periodic+0x14/0x5e > [] timer_interrupt+0x44/0x4a > [] handle_IRQ_event+0x13/0x3d > [] handle_level_irq+0x79/0xbd > [] do_IRQ+0x69/0x7d > [] common_interrupt+0x28/0x30 > [] aac_probe_one+0x1a3/0x3f3 > [] _spin_unlock_irqrestore+0x36/0x39 > [] setup_irq+0x1be/0x1f9 > [] start_kernel+0x259/0x2c5 > [] 0xffffffff > irq event stamp: 50102 > hardirqs last enabled at (50102): [] _spin_unlock_irq+0x20/0x23 > hardirqs last disabled at (50101): [] _spin_lock_irq+0xa/0x4b > softirqs last enabled at (50088): [] do_softirq+0x37/0x4d > softirqs last disabled at (50099): [] do_softirq+0x37/0x4d > > other info that might help us debug this: > no locks held by ksoftirqd/0/4. > > stack backtrace: > Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2 > [] print_usage_bug+0x13e/0x147 > [] mark_lock+0x493/0x797 > [] __lock_acquire+0x5be/0x121e > [] lock_acquire+0x5a/0x74 > [] sched_rt_period_timer+0x9e/0x1fc > [] _spin_lock+0x1c/0x45 > [] sched_rt_period_timer+0x9e/0x1fc > [] sched_rt_period_timer+0x9e/0x1fc > [] finish_task_switch+0x41/0xbd > [] native_sched_clock+0x88/0x99 > [] sched_rt_period_timer+0x0/0x1fc > [] run_hrtimer_pending+0x54/0xe5 > [] sched_rt_period_timer+0x0/0x1fc > [] __do_softirq+0x7b/0xef > [] do_softirq+0x37/0x4d > [] ksoftirqd+0x56/0xc5 > [] ksoftirqd+0x0/0xc5 > [] kthread+0x38/0x5d > [] kthread+0x0/0x5d > [] kernel_thread_helper+0x7/0x10 > ======================= > > Signed-off-by: Gautham R Shenoy > Cc: Thomas Gleixner > Cc: Peter Zijlstra > --- > > kernel/hrtimer.c | 21 ++++++++++++++++++--- > 1 files changed, 18 insertions(+), 3 deletions(-) > > > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c > index cdec83e..60aaad6 100644 > --- a/kernel/hrtimer.c > +++ b/kernel/hrtimer.c > @@ -1188,6 +1188,7 @@ static void run_hrtimer_pending(struct hrtimer_cpu_base *cpu_base) > enum hrtimer_restart (*fn)(struct hrtimer *); > struct hrtimer *timer; > int restart; > + int emulate_hardirq_ctx = 0; > > timer = list_entry(cpu_base->cb_pending.next, > struct hrtimer, cb_entry); > @@ -1196,12 +1197,26 @@ static void run_hrtimer_pending(struct hrtimer_cpu_base *cpu_base) > timer_stats_account_hrtimer(timer); > > fn = timer->function; > + /* > + * A timer might have been added to the cb_pending list > + * when it was migrated during a cpu-offline operation. > + * Emulate hardirq context for such timers. > + */ > + if (timer->cb_mode == HRTIMER_CB_IRQSAFE_PERCPU || > + timer->cb_mode == HRTIMER_CB_IRQSAFE_UNLOCKED) > + emulate_hardirq_ctx = 1; Is this the case where we need to emulate a hardirq context? > + > __remove_hrtimer(timer, timer->base, HRTIMER_STATE_CALLBACK, 0); > - spin_unlock_irq(&cpu_base->lock); > + spin_unlock(&cpu_base->lock); > > - restart = fn(timer); > + if (likely(!emulate_hardirq_ctx)) { If so, why the "!" above? Or am I misinterpreting the name of the variable? Thanx, Paul > + local_irq_enable(); > + restart = fn(timer); > + local_irq_disable(); > + } else > + restart = fn(timer); > > - spin_lock_irq(&cpu_base->lock); > + spin_lock(&cpu_base->lock); > > timer->state &= ~HRTIMER_STATE_CALLBACK; > if (restart == HRTIMER_RESTART) { > -- > Thanks and Regards > gautham