LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: [patch 0/3] Clocksource / clockevent updates
[not found] <20070430102837.748238000@linutronix.de>
@ 2007-05-02 0:33 ` Andrew Morton
2007-05-02 6:09 ` Thomas Gleixner
[not found] ` <20070430102852.042964000@linutronix.de>
[not found] ` <20070430102851.987296000@linutronix.de>
2 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-02 0:33 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, John Stultz
On Mon, 30 Apr 2007 10:43:31 -0000
Thomas Gleixner <tglx@linutronix.de> wrote:
> Andrew,
>
> please pick up the following updates to clocksource / clockevents:
>
> - Fixups to the resume logic
> - Keep TSC stable, when lapic_timer_c2_ok is set
>
Should we be targetting these at 2.6.20.x?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 0/3] Clocksource / clockevent updates
2007-05-02 0:33 ` [patch 0/3] Clocksource / clockevent updates Andrew Morton
@ 2007-05-02 6:09 ` Thomas Gleixner
2007-05-02 6:14 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-02 6:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: LKML, Ingo Molnar, John Stultz
On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote:
> On Mon, 30 Apr 2007 10:43:31 -0000
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > Andrew,
> >
> > please pick up the following updates to clocksource / clockevents:
> >
> > - Fixups to the resume logic
> > - Keep TSC stable, when lapic_timer_c2_ok is set
> >
>
> Should we be targetting these at 2.6.20.x?
2.6.21.x ?
Hmm. They should get some testing first, but otherwise yes.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 0/3] Clocksource / clockevent updates
2007-05-02 6:09 ` Thomas Gleixner
@ 2007-05-02 6:14 ` Andrew Morton
0 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2007-05-02 6:14 UTC (permalink / raw)
To: tglx; +Cc: LKML, Ingo Molnar, John Stultz
On Wed, 02 May 2007 08:09:29 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote:
> > On Mon, 30 Apr 2007 10:43:31 -0000
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > > Andrew,
> > >
> > > please pick up the following updates to clocksource / clockevents:
> > >
> > > - Fixups to the resume logic
> > > - Keep TSC stable, when lapic_timer_c2_ok is set
> > >
> >
> > Should we be targetting these at 2.6.20.x?
>
> 2.6.21.x ?
>
> Hmm. They should get some testing first, but otherwise yes.
>
OK, I added the cc. The second patch won't apply to 2.6.21 when we get
around to it, but it'll be pretty simple to repair.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic
[not found] ` <20070430102852.042964000@linutronix.de>
@ 2007-05-05 7:34 ` Andrew Morton
2007-05-05 11:51 ` Ingo Molnar
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-05 7:34 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, John Stultz
On Mon, 30 Apr 2007 10:43:34 -0000 Thomas Gleixner <tglx@linutronix.de> wrote:
> We need to make sure, that the clockevent devices are resumed, before
> the tick is resumed. The current resume logic does not guarantee this.
>
> Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
> event devices before resuming the tick / oneshot functionality.
>
> Fixup the existing users.
This one makes the Vaio-of-fun hang during suspend to disk. It gets up to
"swsusp: critical section/: done (%d pages copied)" then it freezes.
I retested only 2.6.21 plus
origin
clocksource-fix-resume-logic
clockevents-fix-resume-logic
and that hung in the same fashion.
Config at http://userweb.kernel.org/~akpm/config-sony.txt
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic
2007-05-05 7:34 ` [patch 3/3] clockevents: Fix resume logic Andrew Morton
@ 2007-05-05 11:51 ` Ingo Molnar
2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Ingo Molnar @ 2007-05-05 11:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: Thomas Gleixner, LKML, John Stultz
* Andrew Morton <akpm@linux-foundation.org> wrote:
> > Fixup the existing users.
>
> This one makes the Vaio-of-fun hang during suspend to disk. It gets
> up to "swsusp: critical section/: done (%d pages copied)" then it
> freezes.
after trying to reproduce it on 2 boxes without success it did trigger
some sw-suspend weirdness on a third box :) We are debugging it now.
Ingo
^ permalink raw reply [flat|nested] 61+ messages in thread
* [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-05 11:51 ` Ingo Molnar
@ 2007-05-06 15:03 ` Thomas Gleixner
2007-05-08 10:37 ` Andrew Morton
2007-05-09 5:59 ` Andrew Morton
0 siblings, 2 replies; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-06 15:03 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, LKML, John Stultz
Andrew,
On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > > Fixup the existing users.
> >
> > This one makes the Vaio-of-fun hang during suspend to disk. It gets
> > up to "swsusp: critical section/: done (%d pages copied)" then it
> > freezes.
>
> after trying to reproduce it on 2 boxes without success it did trigger
> some sw-suspend weirdness on a third box :) We are debugging it now.
find an updated patch below. It fixes the problem on Ingo's
VAIO-of-fun-emulator and I got confirmation from several other affected
users, that the patch series is still solving their problems.
tglx
------------------------>
Subject: clockevents: Fix resume logic
We need to make sure, that the clockevent devices are resumed, before
the tick is resumed. The current resume logic does not guarantee this.
Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
event devices before resuming the tick / oneshot functionality.
Fixup the existing users.
Fix the PIT handling of CLOCK_EVT_SHUTDOWN by disabling the interrupt
instead of switching to oneshot mode, as this causes slowness on a couple
of systems.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/i386/kernel/apic.c | 3 +
arch/i386/kernel/hpet.c | 71 +++----------------------------------------
arch/i386/kernel/i8253.c | 43 +++++++++++++++++---------
include/linux/clockchips.h | 1
kernel/time/tick-broadcast.c | 4 +-
kernel/time/tick-common.c | 16 ++++++---
6 files changed, 51 insertions(+), 87 deletions(-)
Index: linux-2.6.21/arch/i386/kernel/apic.c
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/apic.c
+++ linux-2.6.21/arch/i386/kernel/apic.c
@@ -242,6 +242,9 @@ static void lapic_timer_setup(enum clock
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write_around(APIC_LVTT, v);
break;
+ case CLOCK_EVT_MODE_RESUME:
+ /* Nothing to do here */
+ break;
}
local_irq_restore(flags);
Index: linux-2.6.21/arch/i386/kernel/hpet.c
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/hpet.c
+++ linux-2.6.21/arch/i386/kernel/hpet.c
@@ -187,6 +187,10 @@ static void hpet_set_mode(enum clock_eve
cfg &= ~HPET_TN_ENABLE;
hpet_writel(cfg, HPET_T0_CFG);
break;
+
+ case CLOCK_EVT_MODE_RESUME:
+ hpet_enable_int();
+ break;
}
}
@@ -217,6 +221,7 @@ static struct clocksource clocksource_hp
.mask = HPET_MASK,
.shift = HPET_SHIFT,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+ .resume = hpet_start_counter,
};
/*
@@ -291,7 +296,6 @@ int __init hpet_enable(void)
clocksource_register(&clocksource_hpet);
-
if (id & HPET_ID_LEGSUP) {
hpet_enable_int();
hpet_reserve_platform_timers(id);
@@ -524,68 +528,3 @@ irqreturn_t hpet_rtc_interrupt(int irq,
return IRQ_HANDLED;
}
#endif
-
-
-/*
- * Suspend/resume part
- */
-
-#ifdef CONFIG_PM
-
-static int hpet_suspend(struct sys_device *sys_device, pm_message_t state)
-{
- unsigned long cfg = hpet_readl(HPET_CFG);
-
- cfg &= ~(HPET_CFG_ENABLE|HPET_CFG_LEGACY);
- hpet_writel(cfg, HPET_CFG);
-
- return 0;
-}
-
-static int hpet_resume(struct sys_device *sys_device)
-{
- unsigned int id;
-
- hpet_start_counter();
-
- id = hpet_readl(HPET_ID);
-
- if (id & HPET_ID_LEGSUP)
- hpet_enable_int();
-
- return 0;
-}
-
-static struct sysdev_class hpet_class = {
- set_kset_name("hpet"),
- .suspend = hpet_suspend,
- .resume = hpet_resume,
-};
-
-static struct sys_device hpet_device = {
- .id = 0,
- .cls = &hpet_class,
-};
-
-
-static __init int hpet_register_sysfs(void)
-{
- int err;
-
- if (!is_hpet_capable())
- return 0;
-
- err = sysdev_class_register(&hpet_class);
-
- if (!err) {
- err = sysdev_register(&hpet_device);
- if (err)
- sysdev_class_unregister(&hpet_class);
- }
-
- return err;
-}
-
-device_initcall(hpet_register_sysfs);
-
-#endif
Index: linux-2.6.21/arch/i386/kernel/i8253.c
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/i8253.c
+++ linux-2.6.21/arch/i386/kernel/i8253.c
@@ -3,11 +3,11 @@
*
*/
#include <linux/clockchips.h>
-#include <linux/spinlock.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
#include <linux/jiffies.h>
-#include <linux/sysdev.h>
#include <linux/module.h>
-#include <linux/init.h>
+#include <linux/spinlock.h>
#include <asm/smp.h>
#include <asm/delay.h>
@@ -25,6 +25,24 @@ EXPORT_SYMBOL(i8253_lock);
*/
struct clock_event_device *global_clock_event;
+/* Status of the PIT interrupt */
+static int pit_irq_disabled;
+
+/*
+ * Control pit interrupt enable / disable
+ */
+static void pit_control_irq(int disable)
+{
+ if (pit_irq_disabled == disable)
+ return;
+
+ pit_irq_disabled = disable;
+ if (disable)
+ disable_irq(0);
+ else
+ enable_irq(0);
+}
+
/*
* Initialize the PIT timer.
*
@@ -41,26 +59,23 @@ static void init_pit_timer(enum clock_ev
case CLOCK_EVT_MODE_PERIODIC:
/* binary, mode 2, LSB/MSB, ch 0 */
outb_p(0x34, PIT_MODE);
- udelay(10);
outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
- udelay(10);
outb(LATCH >> 8 , PIT_CH0); /* MSB */
+ pit_control_irq(0);
break;
- /*
- * Avoid unnecessary state transitions, as it confuses
- * Geode / Cyrix based boxen.
- */
case CLOCK_EVT_MODE_SHUTDOWN:
- if (evt->mode == CLOCK_EVT_MODE_UNUSED)
- break;
case CLOCK_EVT_MODE_UNUSED:
- if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
- break;
+ pit_control_irq(1);
+ break;
case CLOCK_EVT_MODE_ONESHOT:
/* One shot setup */
outb_p(0x38, PIT_MODE);
- udelay(10);
+ pit_control_irq(0);
+ break;
+
+ case CLOCK_EVT_MODE_RESUME:
+ /* Nothing to do here */
break;
}
spin_unlock_irqrestore(&i8253_lock, flags);
Index: linux-2.6.21/include/linux/clockchips.h
===================================================================
--- linux-2.6.21.orig/include/linux/clockchips.h
+++ linux-2.6.21/include/linux/clockchips.h
@@ -23,6 +23,7 @@ enum clock_event_mode {
CLOCK_EVT_MODE_SHUTDOWN,
CLOCK_EVT_MODE_PERIODIC,
CLOCK_EVT_MODE_ONESHOT,
+ CLOCK_EVT_MODE_RESUME,
};
/* Clock event notification values */
Index: linux-2.6.21/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.21.orig/kernel/time/tick-broadcast.c
+++ linux-2.6.21/kernel/time/tick-broadcast.c
@@ -292,7 +292,7 @@ void tick_suspend_broadcast(void)
spin_lock_irqsave(&tick_broadcast_lock, flags);
bc = tick_broadcast_device.evtdev;
- if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+ if (bc)
clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
spin_unlock_irqrestore(&tick_broadcast_lock, flags);
@@ -309,6 +309,8 @@ int tick_resume_broadcast(void)
bc = tick_broadcast_device.evtdev;
if (bc) {
+ clockevents_set_mode(bc, CLOCK_EVT_MODE_RESUME);
+
switch (tick_broadcast_device.mode) {
case TICKDEV_MODE_PERIODIC:
if(!cpus_empty(tick_broadcast_mask))
Index: linux-2.6.21/kernel/time/tick-common.c
===================================================================
--- linux-2.6.21.orig/kernel/time/tick-common.c
+++ linux-2.6.21/kernel/time/tick-common.c
@@ -318,12 +318,17 @@ static void tick_resume(void)
{
struct tick_device *td = &__get_cpu_var(tick_cpu_device);
unsigned long flags;
+ int broadcast = tick_resume_broadcast();
spin_lock_irqsave(&tick_device_lock, flags);
- if (td->mode == TICKDEV_MODE_PERIODIC)
- tick_setup_periodic(td->evtdev, 0);
- else
- tick_resume_oneshot();
+ clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_RESUME);
+
+ if (!broadcast) {
+ if (td->mode == TICKDEV_MODE_PERIODIC)
+ tick_setup_periodic(td->evtdev, 0);
+ else
+ tick_resume_oneshot();
+ }
spin_unlock_irqrestore(&tick_device_lock, flags);
}
@@ -360,8 +365,7 @@ static int tick_notify(struct notifier_b
break;
case CLOCK_EVT_NOTIFY_RESUME:
- if (!tick_resume_broadcast())
- tick_resume();
+ tick_resume();
break;
default:
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set
[not found] ` <20070430102851.987296000@linutronix.de>
@ 2007-05-07 6:43 ` Andrew Morton
2007-05-07 7:01 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-07 6:43 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, John Stultz, Len Brown, linux-acpi
On Mon, 30 Apr 2007 10:43:33 -0000 Thomas Gleixner <tglx@linutronix.de> wrote:
> The local apic timer stop in C2 resp. C3 states is coupled with the
> stop of the TSC. When the local apic timer is marked stable in C2
> on the kernel commandline, then keep the TSC marked stable in C2 as well.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> ---
> drivers/acpi/processor_idle.c | 19 ++++++++-----------
> 1 file changed, 8 insertions(+), 11 deletions(-)
>
> Index: linux-2.6/drivers/acpi/processor_idle.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/processor_idle.c
> +++ linux-2.6/drivers/acpi/processor_idle.c
> @@ -305,18 +305,23 @@ static void acpi_state_timer_broadcast(s
> struct acpi_processor_cx *cx,
> int broadcast)
> {
> -#ifdef CONFIG_GENERIC_CLOCKEVENTS
> -
> int state = cx - pr->power.states;
>
> if (state >= pr->power.timer_broadcast_on_state) {
> +
> +#ifdef CONFIG_GENERIC_CLOCKEVENTS
> unsigned long reason;
>
> reason = broadcast ? CLOCK_EVT_NOTIFY_BROADCAST_ENTER :
> CLOCK_EVT_NOTIFY_BROADCAST_EXIT;
> clockevents_notify(reason, &pr->id);
> - }
> #endif
> +
> +#ifdef CONFIG_GENERIC_TIME
> + /* TSC halts in C2/3, so notify users */
> + mark_tsc_unstable();
> +#endif
> + }
> }
>
> #else
> @@ -481,10 +486,6 @@ static void acpi_processor_idle(void)
> /* Get end time (ticks) */
> t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>
> -#ifdef CONFIG_GENERIC_TIME
> - /* TSC halts in C2, so notify users */
> - mark_tsc_unstable();
> -#endif
> /* Re-enable interrupts */
> local_irq_enable();
> current_thread_info()->status |= TS_POLLING;
> @@ -523,10 +524,6 @@ static void acpi_processor_idle(void)
> acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0);
> }
>
> -#ifdef CONFIG_GENERIC_TIME
> - /* TSC halts in C3, so notify users */
> - mark_tsc_unstable();
> -#endif
> /* Re-enable interrupts */
> local_irq_enable();
> current_thread_info()->status |= TS_POLLING;
>
> --
I'm now up to the third or fourth time where I need to try to repair this
patch against churn in Len's tree. I'm not sure what I'm doing and that
changelog is of little help to me.
I think I'll drop it. Please work it with Len.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set
2007-05-07 6:43 ` [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set Andrew Morton
@ 2007-05-07 7:01 ` Thomas Gleixner
0 siblings, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-07 7:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: LKML, Ingo Molnar, John Stultz, Len Brown, linux-acpi
On Sun, 2007-05-06 at 23:43 -0700, Andrew Morton wrote:
> I'm now up to the third or fourth time where I need to try to repair this
> patch against churn in Len's tree. I'm not sure what I'm doing and that
> changelog is of little help to me.
>
> I think I'll drop it. Please work it with Len.
Will do.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner
@ 2007-05-08 10:37 ` Andrew Morton
2007-05-09 5:59 ` Andrew Morton
1 sibling, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2007-05-08 10:37 UTC (permalink / raw)
To: tglx; +Cc: Ingo Molnar, LKML, John Stultz
On Sun, 06 May 2007 17:03:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote:
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > > Fixup the existing users.
> > >
> > > This one makes the Vaio-of-fun hang during suspend to disk. It gets
> > > up to "swsusp: critical section/: done (%d pages copied)" then it
> > > freezes.
> >
> > after trying to reproduce it on 2 boxes without success it did trigger
> > some sw-suspend weirdness on a third box :) We are debugging it now.
>
> find an updated patch below. It fixes the problem on Ingo's
> VAIO-of-fun-emulator and I got confirmation from several other affected
> users, that the patch series is still solving their problems.
yup, that now survives suspend-to-disk and resume, thanks.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner
2007-05-08 10:37 ` Andrew Morton
@ 2007-05-09 5:59 ` Andrew Morton
2007-05-09 7:10 ` Andrew Morton
1 sibling, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-09 5:59 UTC (permalink / raw)
To: tglx; +Cc: Ingo Molnar, LKML, John Stultz
On Sun, 06 May 2007 17:03:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> Andrew,
>
> On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote:
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > > Fixup the existing users.
> > >
> > > This one makes the Vaio-of-fun hang during suspend to disk. It gets
> > > up to "swsusp: critical section/: done (%d pages copied)" then it
> > > freezes.
> >
> > after trying to reproduce it on 2 boxes without success it did trigger
> > some sw-suspend weirdness on a third box :) We are debugging it now.
>
> find an updated patch below. It fixes the problem on Ingo's
> VAIO-of-fun-emulator and I got confirmation from several other affected
> users, that the patch series is still solving their problems.
>
The machine is still hanging with this patch applied.
suspend-to-disk gets up to "swsusp: critical section: done (NNN pages copied)"
No netconsole, no printk-timestamping.
ho hum, I guess I get to debug this.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 5:59 ` Andrew Morton
@ 2007-05-09 7:10 ` Andrew Morton
2007-05-09 8:18 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-09 7:10 UTC (permalink / raw)
To: tglx, Ingo Molnar, LKML, John Stultz; +Cc: linux-acpi
On Tue, 8 May 2007 22:59:20 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sun, 06 May 2007 17:03:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > Andrew,
> >
> > On Sat, 2007-05-05 at 13:51 +0200, Ingo Molnar wrote:
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > > > Fixup the existing users.
> > > >
> > > > This one makes the Vaio-of-fun hang during suspend to disk. It gets
> > > > up to "swsusp: critical section/: done (%d pages copied)" then it
> > > > freezes.
> > >
> > > after trying to reproduce it on 2 boxes without success it did trigger
> > > some sw-suspend weirdness on a third box :) We are debugging it now.
> >
> > find an updated patch below. It fixes the problem on Ingo's
> > VAIO-of-fun-emulator and I got confirmation from several other affected
> > users, that the patch series is still solving their problems.
> >
>
> The machine is still hanging with this patch applied.
>
> suspend-to-disk gets up to "swsusp: critical section: done (NNN pages copied)"
>
> No netconsole, no printk-timestamping.
>
> ho hum, I guess I get to debug this.
It got ugly.
We finish swsusp_save() and a few other functions then we go
hibernate
->platform_finish
->acpi_hibernation_finish
->acpi_leave_sleep_state
->acpi_evaluate_object
and there it dies, in this call:
status = acpi_evaluate_object(NULL, METHOD_NAME__WAK, &arg_list, NULL);
I wonder how your patch caused that?
<debugs further>
OK, it gets to the last statement in acpi_evaluate_object():
return_ACPI_STATUS(status);
but doesn't hit the printk on return to the caller,
acpi_leave_sleep_state().
A working theory would be that something we did trashed the stack in
acpi_evaluate_object().
<switches from 8k stacks to 4k. No change>
foo. I'm not sure what to do now.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 7:10 ` Andrew Morton
@ 2007-05-09 8:18 ` Thomas Gleixner
2007-05-09 8:22 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-09 8:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 2007-05-09 at 00:10 -0700, Andrew Morton wrote:
> > > find an updated patch below. It fixes the problem on Ingo's
> > > VAIO-of-fun-emulator and I got confirmation from several other affected
> > > users, that the patch series is still solving their problems.
> > >
> >
> > The machine is still hanging with this patch applied.
What did change since yesterday ?
> yup, that now survives suspend-to-disk and resume, thanks.
> > suspend-to-disk gets up to "swsusp: critical section: done (NNN pages copied)"
> >
> > No netconsole, no printk-timestamping.
> >
> > ho hum, I guess I get to debug this.
>
> It got ugly.
>
> We finish swsusp_save() and a few other functions then we go
>
> hibernate
> ->platform_finish
> ->acpi_hibernation_finish
> ->acpi_leave_sleep_state
> ->acpi_evaluate_object
>
> and there it dies, in this call:
>
> status = acpi_evaluate_object(NULL, METHOD_NAME__WAK, &arg_list, NULL);
>
> I wonder how your patch caused that?
hmm, that's more than strange
> <debugs further>
>
> OK, it gets to the last statement in acpi_evaluate_object():
>
> return_ACPI_STATUS(status);
>
> but doesn't hit the printk on return to the caller,
> acpi_leave_sleep_state().
>
> A working theory would be that something we did trashed the stack in
> acpi_evaluate_object().
>
> <switches from 8k stacks to 4k. No change>
>
> foo. I'm not sure what to do now.
Hmm. Do we enable interrupts somewhere where we should not ?
/me goes looking for such places.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 8:18 ` Thomas Gleixner
@ 2007-05-09 8:22 ` Andrew Morton
2007-05-09 8:31 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-09 8:22 UTC (permalink / raw)
To: tglx; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 09 May 2007 10:18:17 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 2007-05-09 at 00:10 -0700, Andrew Morton wrote:
> > > > find an updated patch below. It fixes the problem on Ingo's
> > > > VAIO-of-fun-emulator and I got confirmation from several other affected
> > > > users, that the patch series is still solving their problems.
> > > >
> > >
> > > The machine is still hanging with this patch applied.
>
> What did change since yesterday ?
I suspect I just tested the wrong thing yesterday. Let me recheck just
these patches against 2.6.21.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 8:22 ` Andrew Morton
@ 2007-05-09 8:31 ` Andrew Morton
2007-05-09 8:59 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-09 8:31 UTC (permalink / raw)
To: tglx, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 9 May 2007 01:22:57 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 09 May 2007 10:18:17 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > On Wed, 2007-05-09 at 00:10 -0700, Andrew Morton wrote:
> > > > > find an updated patch below. It fixes the problem on Ingo's
> > > > > VAIO-of-fun-emulator and I got confirmation from several other affected
> > > > > users, that the patch series is still solving their problems.
> > > > >
> > > >
> > > > The machine is still hanging with this patch applied.
> >
> > What did change since yesterday ?
>
> I suspect I just tested the wrong thing yesterday. Let me recheck just
> these patches against 2.6.21.
yup, same hang with just these three:
origin
clocksource-fix-resume-logic
clockevents-fix-resume-logic-updated-version
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 8:31 ` Andrew Morton
@ 2007-05-09 8:59 ` Thomas Gleixner
2007-05-09 11:45 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-09 8:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote:
> > I suspect I just tested the wrong thing yesterday. Let me recheck just
> > these patches against 2.6.21.
>
> yup, same hang with just these three:
>
> origin
> clocksource-fix-resume-logic
> clockevents-fix-resume-logic-updated-version
I have no idea, how this affects acpi_evaluate_object()
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 8:59 ` Thomas Gleixner
@ 2007-05-09 11:45 ` Rafael J. Wysocki
2007-05-09 12:24 ` Thomas Gleixner
2007-05-09 12:52 ` Rafael J. Wysocki
0 siblings, 2 replies; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-09 11:45 UTC (permalink / raw)
To: tglx, Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi
On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote:
> On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote:
> > > I suspect I just tested the wrong thing yesterday. Let me recheck just
> > > these patches against 2.6.21.
> >
> > yup, same hang with just these three:
> >
> > origin
> > clocksource-fix-resume-logic
> > clockevents-fix-resume-logic-updated-version
>
> I have no idea, how this affects acpi_evaluate_object()
I think the problem is that the ACPI code ordering here is broken in a
difficult to fix way.
Definitely, we shouldn't execute the _BFS method after creating the image
and most probably _WAK shouldn't be executed here either. Moreover,
acpi_leave_sleep_state() enables the runtime GPEs, which AFAICS
is equivalent to allowing ACPI to generate SCIs. I'm not sure if this is a
good idea to do such a thing in this particular place.
Andrew, could you please apply the appended patch and see if that
helps (should apply to -mm2)?
Rafael
---
NOTE: This is not a complete solution, because it removes the enabling of GPEs
from the resume-during-hibernation code path entirely, which probbably is not a
good idea in general.
---
kernel/power/disk.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6.21/kernel/power/disk.c
===================================================================
--- linux-2.6.21.orig/kernel/power/disk.c
+++ linux-2.6.21/kernel/power/disk.c
@@ -131,7 +131,9 @@ int hibernation_snapshot(int platform_mo
}
enable_nonboot_cpus();
Resume_devices:
- platform_finish(platform_mode);
+ if (!in_suspend || error)
+ platform_finish(platform_mode);
+
device_resume();
resume_console();
Finish:
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 11:45 ` Rafael J. Wysocki
@ 2007-05-09 12:24 ` Thomas Gleixner
2007-05-09 13:12 ` Rafael J. Wysocki
2007-05-09 12:52 ` Rafael J. Wysocki
1 sibling, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-09 12:24 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote:
> > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote:
> > > > I suspect I just tested the wrong thing yesterday. Let me recheck just
> > > > these patches against 2.6.21.
> > >
> > > yup, same hang with just these three:
> > >
> > > origin
> > > clocksource-fix-resume-logic
> > > clockevents-fix-resume-logic-updated-version
> >
> > I have no idea, how this affects acpi_evaluate_object()
>
> I think the problem is that the ACPI code ordering here is broken in a
> difficult to fix way.
Any explanation aside of witchcraft why this is affected by the clock
event resume changes ?
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 11:45 ` Rafael J. Wysocki
2007-05-09 12:24 ` Thomas Gleixner
@ 2007-05-09 12:52 ` Rafael J. Wysocki
2007-05-09 17:14 ` Andrew Morton
1 sibling, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-09 12:52 UTC (permalink / raw)
To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wednesday, 9 May 2007 13:45, Rafael J. Wysocki wrote:
> On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote:
> > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote:
> > > > I suspect I just tested the wrong thing yesterday. Let me recheck just
> > > > these patches against 2.6.21.
> > >
> > > yup, same hang with just these three:
> > >
> > > origin
> > > clocksource-fix-resume-logic
> > > clockevents-fix-resume-logic-updated-version
> >
> > I have no idea, how this affects acpi_evaluate_object()
>
> I think the problem is that the ACPI code ordering here is broken in a
> difficult to fix way.
>
> Definitely, we shouldn't execute the _BFS method after creating the image
> and most probably _WAK shouldn't be executed here either. Moreover,
> acpi_leave_sleep_state() enables the runtime GPEs, which AFAICS
> is equivalent to allowing ACPI to generate SCIs. I'm not sure if this is a
> good idea to do such a thing in this particular place.
>
> Andrew, could you please apply the appended patch and see if that
> helps (should apply to -mm2)?
Argh, sorry. This needs yet another patch (sent for review to linux-pm) to
be applied. The following one is against -mm2:
---
NOTE: This is not a complete solution, because it removes the enabling of GPEs
from the resume-during-hibernation code path entirely, which probbably is not a
good idea in general.
---
kernel/power/disk.c | 1 -
1 file changed, 1 deletion(-)
Index: linux-2.6.21-mm2/kernel/power/disk.c
===================================================================
--- linux-2.6.21-mm2.orig/kernel/power/disk.c
+++ linux-2.6.21-mm2/kernel/power/disk.c
@@ -205,7 +205,6 @@ int hibernate(void)
if (in_suspend) {
enable_nonboot_cpus();
- platform_finish();
device_resume();
resume_console();
pr_debug("PM: writing image.\n");
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 12:24 ` Thomas Gleixner
@ 2007-05-09 13:12 ` Rafael J. Wysocki
2007-05-09 13:19 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-09 13:12 UTC (permalink / raw)
To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wednesday, 9 May 2007 14:24, Thomas Gleixner wrote:
> On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote:
> > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote:
> > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote:
> > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just
> > > > > these patches against 2.6.21.
> > > >
> > > > yup, same hang with just these three:
> > > >
> > > > origin
> > > > clocksource-fix-resume-logic
> > > > clockevents-fix-resume-logic-updated-version
> > >
> > > I have no idea, how this affects acpi_evaluate_object()
> >
> > I think the problem is that the ACPI code ordering here is broken in a
> > difficult to fix way.
>
> Any explanation aside of witchcraft why this is affected by the clock
> event resume changes ?
Well, where is unregister_time_interpolator() called from?
Rafael
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 13:12 ` Rafael J. Wysocki
@ 2007-05-09 13:19 ` Thomas Gleixner
2007-05-09 17:09 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-09 13:19 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 2007-05-09 at 15:12 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 9 May 2007 14:24, Thomas Gleixner wrote:
> > On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote:
> > > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote:
> > > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just
> > > > > > these patches against 2.6.21.
> > > > >
> > > > > yup, same hang with just these three:
> > > > >
> > > > > origin
> > > > > clocksource-fix-resume-logic
> > > > > clockevents-fix-resume-logic-updated-version
> > > >
> > > > I have no idea, how this affects acpi_evaluate_object()
> > >
> > > I think the problem is that the ACPI code ordering here is broken in a
> > > difficult to fix way.
> >
> > Any explanation aside of witchcraft why this is affected by the clock
> > event resume changes ?
>
> Well, where is unregister_time_interpolator() called from?
# grep -rn unregister_time_interpolator .
./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti)
./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *);
I don't see a caller. i386 does not use time interpolator anyway.
# find -iname Kconfig | xargs grep TIME_INTERPOLATION
./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION
./arch/ia64/Kconfig:60:config TIME_INTERPOLATION
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 13:19 ` Thomas Gleixner
@ 2007-05-09 17:09 ` Rafael J. Wysocki
2007-05-09 17:15 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-09 17:09 UTC (permalink / raw)
To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wednesday, 9 May 2007 15:19, Thomas Gleixner wrote:
> On Wed, 2007-05-09 at 15:12 +0200, Rafael J. Wysocki wrote:
> > On Wednesday, 9 May 2007 14:24, Thomas Gleixner wrote:
> > > On Wed, 2007-05-09 at 13:45 +0200, Rafael J. Wysocki wrote:
> > > > On Wednesday, 9 May 2007 10:59, Thomas Gleixner wrote:
> > > > > On Wed, 2007-05-09 at 01:31 -0700, Andrew Morton wrote:
> > > > > > > I suspect I just tested the wrong thing yesterday. Let me recheck just
> > > > > > > these patches against 2.6.21.
> > > > > >
> > > > > > yup, same hang with just these three:
> > > > > >
> > > > > > origin
> > > > > > clocksource-fix-resume-logic
> > > > > > clockevents-fix-resume-logic-updated-version
> > > > >
> > > > > I have no idea, how this affects acpi_evaluate_object()
> > > >
> > > > I think the problem is that the ACPI code ordering here is broken in a
> > > > difficult to fix way.
> > >
> > > Any explanation aside of witchcraft why this is affected by the clock
> > > event resume changes ?
> >
> > Well, where is unregister_time_interpolator() called from?
>
> # grep -rn unregister_time_interpolator .
> ./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti)
> ./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *);
>
> I don't see a caller. i386 does not use time interpolator anyway.
>
> # find -iname Kconfig | xargs grep TIME_INTERPOLATION
> ./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION
> ./arch/ia64/Kconfig:60:config TIME_INTERPOLATION
But clocksource_resume() has no other caller, AFAICS ...
Well, I don't see any explanation that wouldn't involve witchcraft.
Rafael
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 12:52 ` Rafael J. Wysocki
@ 2007-05-09 17:14 ` Andrew Morton
2007-05-09 18:55 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-09 17:14 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: tglx, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 9 May 2007 14:52:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > Andrew, could you please apply the appended patch and see if that
> > helps (should apply to -mm2)?
>
> Argh, sorry. This needs yet another patch (sent for review to linux-pm) to
> be applied. The following one is against -mm2:
>
> ---
> NOTE: This is not a complete solution, because it removes the enabling of GPEs
> from the resume-during-hibernation code path entirely, which probbably is not a
> good idea in general.
> ---
> kernel/power/disk.c | 1 -
> 1 file changed, 1 deletion(-)
>
> Index: linux-2.6.21-mm2/kernel/power/disk.c
> ===================================================================
> --- linux-2.6.21-mm2.orig/kernel/power/disk.c
> +++ linux-2.6.21-mm2/kernel/power/disk.c
> @@ -205,7 +205,6 @@ int hibernate(void)
>
> if (in_suspend) {
> enable_nonboot_cpus();
> - platform_finish();
> device_resume();
> resume_console();
> pr_debug("PM: writing image.\n");
It now hangs in a similar fashion in the device_resume() call.
If I back off Thomas's clockevents-fix-resume-logic-updated-version.patch
and include just the above patch the machine does suspend and resume
correctly.
I can delve further into the device_resume() hang if we think it would be
useful. Maybe Linus's clock-stomping tracer can help here, if I can
remember how to use it.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 17:09 ` Rafael J. Wysocki
@ 2007-05-09 17:15 ` Thomas Gleixner
2007-05-09 18:36 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-09 17:15 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 2007-05-09 at 19:09 +0200, Rafael J. Wysocki wrote:
> > > Well, where is unregister_time_interpolator() called from?
> >
> > # grep -rn unregister_time_interpolator .
> > ./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti)
> > ./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *);
> >
> > I don't see a caller. i386 does not use time interpolator anyway.
> >
> > # find -iname Kconfig | xargs grep TIME_INTERPOLATION
> > ./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION
> > ./arch/ia64/Kconfig:60:config TIME_INTERPOLATION
>
> But clocksource_resume() has no other caller, AFAICS ...
Eeep ?
clocksource_resume is called from timekeeping_resume()
timestatic int timekeeping_resume(struct sys_device *dev)
{
unsigned long flags;
unsigned long now = read_persistent_clock();
clocksource_resume();
....
}
keeping_resume() called via the sysdev resume
static struct sysdev_class timekeeping_sysclass = {
.resume = timekeeping_resume,
.suspend = timekeeping_suspend,
set_kset_name("timekeeping"),
};
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 17:15 ` Thomas Gleixner
@ 2007-05-09 18:36 ` Rafael J. Wysocki
2007-05-09 20:45 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-09 18:36 UTC (permalink / raw)
To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wednesday, 9 May 2007 19:15, Thomas Gleixner wrote:
> On Wed, 2007-05-09 at 19:09 +0200, Rafael J. Wysocki wrote:
> > > > Well, where is unregister_time_interpolator() called from?
> > >
> > > # grep -rn unregister_time_interpolator .
> > > ./kernel/timer.c:1893:unregister_time_interpolator(struct time_interpolator *ti)
> > > ./include/linux/timex.h:270:extern void unregister_time_interpolator(struct time_interpolator *);
> > >
> > > I don't see a caller. i386 does not use time interpolator anyway.
> > >
> > > # find -iname Kconfig | xargs grep TIME_INTERPOLATION
> > > ./arch/sparc64/Kconfig:37:config TIME_INTERPOLATION
> > > ./arch/ia64/Kconfig:60:config TIME_INTERPOLATION
> >
> > But clocksource_resume() has no other caller, AFAICS ...
>
> Eeep ?
>
> clocksource_resume is called from timekeeping_resume()
>
> timestatic int timekeeping_resume(struct sys_device *dev)
> {
> unsigned long flags;
> unsigned long now = read_persistent_clock();
>
> clocksource_resume();
> ....
> }
>
> keeping_resume() called via the sysdev resume
>
> static struct sysdev_class timekeeping_sysclass = {
> .resume = timekeeping_resume,
> .suspend = timekeeping_suspend,
> set_kset_name("timekeeping"),
> };
Well, apparently, not in -mm2:
rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' *
kernel/time/timekeeping.c
rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c
rafael@albercik:~/src/mm/linux-2.6.21-mm2>
Hmm?
Rafael
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 17:14 ` Andrew Morton
@ 2007-05-09 18:55 ` Rafael J. Wysocki
0 siblings, 0 replies; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-09 18:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: tglx, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wednesday, 9 May 2007 19:14, Andrew Morton wrote:
> On Wed, 9 May 2007 14:52:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > > Andrew, could you please apply the appended patch and see if that
> > > helps (should apply to -mm2)?
> >
> > Argh, sorry. This needs yet another patch (sent for review to linux-pm) to
> > be applied. The following one is against -mm2:
> >
> > ---
> > NOTE: This is not a complete solution, because it removes the enabling of GPEs
> > from the resume-during-hibernation code path entirely, which probbably is not a
> > good idea in general.
> > ---
> > kernel/power/disk.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > Index: linux-2.6.21-mm2/kernel/power/disk.c
> > ===================================================================
> > --- linux-2.6.21-mm2.orig/kernel/power/disk.c
> > +++ linux-2.6.21-mm2/kernel/power/disk.c
> > @@ -205,7 +205,6 @@ int hibernate(void)
> >
> > if (in_suspend) {
> > enable_nonboot_cpus();
> > - platform_finish();
> > device_resume();
> > resume_console();
> > pr_debug("PM: writing image.\n");
>
> It now hangs in a similar fashion in the device_resume() call.
>
> If I back off Thomas's clockevents-fix-resume-logic-updated-version.patch
> and include just the above patch the machine does suspend and resume
> correctly.
>
> I can delve further into the device_resume() hang if we think it would be
> useful.
Frankly, I'm not sure. On the one hand, I'd like to know wth is going on in
there, but on the other hand, I don't know if that knowledge will help us in
general.
It looks like we'll need to revamp the ACPI stuff in the hibernation/suspend
code paths anyway.
> Maybe Linus's clock-stomping tracer can help here, if I can remember how to
> use it.
It's described a bit in Documentation/power/s2ram.txt .
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 18:36 ` Rafael J. Wysocki
@ 2007-05-09 20:45 ` Thomas Gleixner
2007-05-09 20:53 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-09 20:45 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 2007-05-09 at 20:36 +0200, Rafael J. Wysocki wrote:
> Well, apparently, not in -mm2:
>
> rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' *
> kernel/time/timekeeping.c
> rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c
> rafael@albercik:~/src/mm/linux-2.6.21-mm2>
Andrew dropped the patch because it did not work on his jinxed VAIO, but
he debugged with the patch applied.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 20:45 ` Thomas Gleixner
@ 2007-05-09 20:53 ` Rafael J. Wysocki
2007-05-09 21:26 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-09 20:53 UTC (permalink / raw)
To: tglx; +Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wednesday, 9 May 2007 22:45, Thomas Gleixner wrote:
> On Wed, 2007-05-09 at 20:36 +0200, Rafael J. Wysocki wrote:
> > Well, apparently, not in -mm2:
> >
> > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' *
> > kernel/time/timekeeping.c
> > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c
> > rafael@albercik:~/src/mm/linux-2.6.21-mm2>
>
> Andrew dropped the patch because it did not work on his jinxed VAIO, but
> he debugged with the patch applied.
I see.
In that case, since timekeeping_resume() is called via sysdev_resume, then it's
executed before acpi_leave_sleep_state() and may very well interfere with the
ACPI methods executed from there, depending on what's happening in the
cs->resume() callbacks in clocksource_resume().
Greetings,
Rafael
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 20:53 ` Rafael J. Wysocki
@ 2007-05-09 21:26 ` Thomas Gleixner
2007-05-10 8:46 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-09 21:26 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 2007-05-09 at 22:53 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 9 May 2007 22:45, Thomas Gleixner wrote:
> > On Wed, 2007-05-09 at 20:36 +0200, Rafael J. Wysocki wrote:
> > > Well, apparently, not in -mm2:
> > >
> > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep -r -I -l 'timekeeping_resume' *
> > > kernel/time/timekeeping.c
> > > rafael@albercik:~/src/mm/linux-2.6.21-mm2> grep clocksource_resume kernel/time/timekeeping.c
> > > rafael@albercik:~/src/mm/linux-2.6.21-mm2>
> >
> > Andrew dropped the patch because it did not work on his jinxed VAIO, but
> > he debugged with the patch applied.
>
> I see.
>
> In that case, since timekeeping_resume() is called via sysdev_resume, then it's
> executed before acpi_leave_sleep_state() and may very well interfere with the
> ACPI methods executed from there, depending on what's happening in the
> cs->resume() callbacks in clocksource_resume().
The only difference in the clocksource_resume() path with the patch
applied on Andrews laptop is, that the watchdog for the TSC is reset.
But that's only touching some variables.
The other change is the way how the clock events are resumed. It touches
the PIT, local APIC and the interrupt controller.
Andrew,
can you test the alternative replacement patch for
clockevents: Fix resume logic - updated version
It does not touch the interrupt controller, it does the PIT restart
different. That's a patch from Chris Wright and confirmed to work on the
affected machines - except the @%$#! VAIO of course.
If that patch makes the problem go away, then we should have a quite
good hint what we need to look at.
tglx
Index: linux-2.6.21-git-x86-64/arch/i386/kernel/apic.c
===================================================================
--- linux-2.6.21-git-x86-64.orig/arch/i386/kernel/apic.c
+++ linux-2.6.21-git-x86-64/arch/i386/kernel/apic.c
@@ -264,6 +264,9 @@ static void lapic_timer_setup(enum clock
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write_around(APIC_LVTT, v);
break;
+ case CLOCK_EVT_MODE_RESUME:
+ /* Nothing to do here */
+ break;
}
local_irq_restore(flags);
Index: linux-2.6.21-git-x86-64/arch/i386/kernel/hpet.c
===================================================================
--- linux-2.6.21-git-x86-64.orig/arch/i386/kernel/hpet.c
+++ linux-2.6.21-git-x86-64/arch/i386/kernel/hpet.c
@@ -187,6 +187,10 @@ static void hpet_set_mode(enum clock_eve
cfg &= ~HPET_TN_ENABLE;
hpet_writel(cfg, HPET_T0_CFG);
break;
+
+ case CLOCK_EVT_MODE_RESUME:
+ hpet_enable_int();
+ break;
}
}
@@ -217,6 +221,7 @@ static struct clocksource clocksource_hp
.mask = HPET_MASK,
.shift = HPET_SHIFT,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+ .resume = hpet_start_counter,
};
/*
@@ -291,7 +296,6 @@ int __init hpet_enable(void)
clocksource_register(&clocksource_hpet);
-
if (id & HPET_ID_LEGSUP) {
hpet_enable_int();
hpet_reserve_platform_timers(id);
@@ -524,68 +528,3 @@ irqreturn_t hpet_rtc_interrupt(int irq,
return IRQ_HANDLED;
}
#endif
-
-
-/*
- * Suspend/resume part
- */
-
-#ifdef CONFIG_PM
-
-static int hpet_suspend(struct sys_device *sys_device, pm_message_t state)
-{
- unsigned long cfg = hpet_readl(HPET_CFG);
-
- cfg &= ~(HPET_CFG_ENABLE|HPET_CFG_LEGACY);
- hpet_writel(cfg, HPET_CFG);
-
- return 0;
-}
-
-static int hpet_resume(struct sys_device *sys_device)
-{
- unsigned int id;
-
- hpet_start_counter();
-
- id = hpet_readl(HPET_ID);
-
- if (id & HPET_ID_LEGSUP)
- hpet_enable_int();
-
- return 0;
-}
-
-static struct sysdev_class hpet_class = {
- set_kset_name("hpet"),
- .suspend = hpet_suspend,
- .resume = hpet_resume,
-};
-
-static struct sys_device hpet_device = {
- .id = 0,
- .cls = &hpet_class,
-};
-
-
-static __init int hpet_register_sysfs(void)
-{
- int err;
-
- if (!is_hpet_capable())
- return 0;
-
- err = sysdev_class_register(&hpet_class);
-
- if (!err) {
- err = sysdev_register(&hpet_device);
- if (err)
- sysdev_class_unregister(&hpet_class);
- }
-
- return err;
-}
-
-device_initcall(hpet_register_sysfs);
-
-#endif
Index: linux-2.6.21-git-x86-64/arch/i386/kernel/i8253.c
===================================================================
--- linux-2.6.21-git-x86-64.orig/arch/i386/kernel/i8253.c
+++ linux-2.6.21-git-x86-64/arch/i386/kernel/i8253.c
@@ -3,11 +3,11 @@
*
*/
#include <linux/clockchips.h>
-#include <linux/spinlock.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
#include <linux/jiffies.h>
-#include <linux/sysdev.h>
#include <linux/module.h>
-#include <linux/init.h>
+#include <linux/spinlock.h>
#include <asm/smp.h>
#include <asm/delay.h>
@@ -41,26 +41,24 @@ static void init_pit_timer(enum clock_ev
case CLOCK_EVT_MODE_PERIODIC:
/* binary, mode 2, LSB/MSB, ch 0 */
outb_p(0x34, PIT_MODE);
- udelay(10);
outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
- udelay(10);
outb(LATCH >> 8 , PIT_CH0); /* MSB */
break;
- /*
- * Avoid unnecessary state transitions, as it confuses
- * Geode / Cyrix based boxen.
- */
case CLOCK_EVT_MODE_SHUTDOWN:
- if (evt->mode == CLOCK_EVT_MODE_UNUSED)
- break;
case CLOCK_EVT_MODE_UNUSED:
- if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
- break;
+ outb_p(0x30, PIT_MODE);
+ outb_p(0, PIT_CH0); /* LSB */
+ outb_p(0, PIT_CH0); /* MSB */
+ break;
+
case CLOCK_EVT_MODE_ONESHOT:
/* One shot setup */
outb_p(0x38, PIT_MODE);
- udelay(10);
+ break;
+
+ case CLOCK_EVT_MODE_RESUME:
+ /* Nothing to do here */
break;
}
spin_unlock_irqrestore(&i8253_lock, flags);
Index: linux-2.6.21-git-x86-64/include/linux/clockchips.h
===================================================================
--- linux-2.6.21-git-x86-64.orig/include/linux/clockchips.h
+++ linux-2.6.21-git-x86-64/include/linux/clockchips.h
@@ -23,6 +23,7 @@ enum clock_event_mode {
CLOCK_EVT_MODE_SHUTDOWN,
CLOCK_EVT_MODE_PERIODIC,
CLOCK_EVT_MODE_ONESHOT,
+ CLOCK_EVT_MODE_RESUME,
};
/* Clock event notification values */
Index: linux-2.6.21-git-x86-64/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.21-git-x86-64.orig/kernel/time/tick-broadcast.c
+++ linux-2.6.21-git-x86-64/kernel/time/tick-broadcast.c
@@ -292,7 +292,7 @@ void tick_suspend_broadcast(void)
spin_lock_irqsave(&tick_broadcast_lock, flags);
bc = tick_broadcast_device.evtdev;
- if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+ if (bc)
clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
spin_unlock_irqrestore(&tick_broadcast_lock, flags);
@@ -309,6 +309,8 @@ int tick_resume_broadcast(void)
bc = tick_broadcast_device.evtdev;
if (bc) {
+ clockevents_set_mode(bc, CLOCK_EVT_MODE_RESUME);
+
switch (tick_broadcast_device.mode) {
case TICKDEV_MODE_PERIODIC:
if(!cpus_empty(tick_broadcast_mask))
Index: linux-2.6.21-git-x86-64/kernel/time/tick-common.c
===================================================================
--- linux-2.6.21-git-x86-64.orig/kernel/time/tick-common.c
+++ linux-2.6.21-git-x86-64/kernel/time/tick-common.c
@@ -318,12 +318,17 @@ static void tick_resume(void)
{
struct tick_device *td = &__get_cpu_var(tick_cpu_device);
unsigned long flags;
+ int broadcast = tick_resume_broadcast();
spin_lock_irqsave(&tick_device_lock, flags);
- if (td->mode == TICKDEV_MODE_PERIODIC)
- tick_setup_periodic(td->evtdev, 0);
- else
- tick_resume_oneshot();
+ clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_RESUME);
+
+ if (!broadcast) {
+ if (td->mode == TICKDEV_MODE_PERIODIC)
+ tick_setup_periodic(td->evtdev, 0);
+ else
+ tick_resume_oneshot();
+ }
spin_unlock_irqrestore(&tick_device_lock, flags);
}
@@ -360,8 +365,7 @@ static int tick_notify(struct notifier_b
break;
case CLOCK_EVT_NOTIFY_RESUME:
- if (!tick_resume_broadcast())
- tick_resume();
+ tick_resume();
break;
default:
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-09 21:26 ` Thomas Gleixner
@ 2007-05-10 8:46 ` Andrew Morton
2007-05-10 8:55 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-10 8:46 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Wed, 09 May 2007 23:26:22 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> Andrew,
>
> can you test the alternative replacement patch for
>
> clockevents: Fix resume logic - updated version
>
> It does not touch the interrupt controller, it does the PIT restart
> different. That's a patch from Chris Wright and confirmed to work on the
> affected machines - except the @%$#! VAIO of course.
>
> If that patch makes the problem go away, then we should have a quite
> good hint what we need to look at.
No joy, sorry. It still hangs at the last statement in acpi_evaluate_object().
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-10 8:46 ` Andrew Morton
@ 2007-05-10 8:55 ` Thomas Gleixner
2007-05-10 9:18 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-10 8:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Thu, 2007-05-10 at 01:46 -0700, Andrew Morton wrote:
> On Wed, 09 May 2007 23:26:22 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > Andrew,
> >
> > can you test the alternative replacement patch for
> >
> > clockevents: Fix resume logic - updated version
> >
> > It does not touch the interrupt controller, it does the PIT restart
> > different. That's a patch from Chris Wright and confirmed to work on the
> > affected machines - except the @%$#! VAIO of course.
> >
> > If that patch makes the problem go away, then we should have a quite
> > good hint what we need to look at.
>
> No joy, sorry. It still hangs at the last statement in acpi_evaluate_object().
Can you add "nolapic_timer" to the command line please ?
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-10 8:55 ` Thomas Gleixner
@ 2007-05-10 9:18 ` Andrew Morton
2007-05-10 9:27 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-10 9:18 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Thu, 10 May 2007 10:55:45 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 2007-05-10 at 01:46 -0700, Andrew Morton wrote:
> > On Wed, 09 May 2007 23:26:22 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > > Andrew,
> > >
> > > can you test the alternative replacement patch for
> > >
> > > clockevents: Fix resume logic - updated version
> > >
> > > It does not touch the interrupt controller, it does the PIT restart
> > > different. That's a patch from Chris Wright and confirmed to work on the
> > > affected machines - except the @%$#! VAIO of course.
> > >
> > > If that patch makes the problem go away, then we should have a quite
> > > good hint what we need to look at.
> >
> > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object().
>
> Can you add "nolapic_timer" to the command line please ?
>
That works.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-10 9:18 ` Andrew Morton
@ 2007-05-10 9:27 ` Thomas Gleixner
2007-05-10 20:12 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-10 9:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote:
> > > > If that patch makes the problem go away, then we should have a quite
> > > > good hint what we need to look at.
> > >
> > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object().
> >
> > Can you add "nolapic_timer" to the command line please ?
> >
>
> That works.
Ok, that boils it down to the change, which affects the lapic timer
resume. It does nothing else, than fiddling in some APIC registers, but
I don't see how this affects the ACPI stuff. This smells extremly fishy.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-10 9:27 ` Thomas Gleixner
@ 2007-05-10 20:12 ` Rafael J. Wysocki
2007-05-11 16:47 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-10 20:12 UTC (permalink / raw)
To: Thomas Gleixner, Andrew Morton; +Cc: Ingo Molnar, LKML, John Stultz, linux-acpi
On Thursday, 10 May 2007 11:27, Thomas Gleixner wrote:
> On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote:
> > > > > If that patch makes the problem go away, then we should have a quite
> > > > > good hint what we need to look at.
> > > >
> > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object().
> > >
> > > Can you add "nolapic_timer" to the command line please ?
> > >
> >
> > That works.
>
> Ok, that boils it down to the change, which affects the lapic timer
> resume. It does nothing else, than fiddling in some APIC registers, but
> I don't see how this affects the ACPI stuff. This smells extremly fishy.
Hmm, I wonder if it might be related to the execution of the _GTS ACPI control
method in acpi_enter_sleep_state_prep() in violation of the spec.
Andrew, could you please try to apply the following change on top of the
previous ones and see if the machine still hangs in the same place without
"nolapic_timer"?
---
drivers/acpi/hardware/hwsleep.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c
===================================================================
--- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c
+++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c
@@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep(
return_ACPI_STATUS(status);
}
- status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
+ /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
return_ACPI_STATUS(status);
- }
+ }*/
/* Setup the argument to _SST */
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-10 20:12 ` Rafael J. Wysocki
@ 2007-05-11 16:47 ` Andrew Morton
2007-05-11 20:10 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-11 16:47 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi
On Thu, 10 May 2007 22:12:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> On Thursday, 10 May 2007 11:27, Thomas Gleixner wrote:
> > On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote:
> > > > > > If that patch makes the problem go away, then we should have a quite
> > > > > > good hint what we need to look at.
> > > > >
> > > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object().
> > > >
> > > > Can you add "nolapic_timer" to the command line please ?
> > > >
> > >
> > > That works.
> >
> > Ok, that boils it down to the change, which affects the lapic timer
> > resume. It does nothing else, than fiddling in some APIC registers, but
> > I don't see how this affects the ACPI stuff. This smells extremly fishy.
>
> Hmm, I wonder if it might be related to the execution of the _GTS ACPI control
> method in acpi_enter_sleep_state_prep() in violation of the spec.
>
> Andrew, could you please try to apply the following change on top of the
> previous ones and see if the machine still hangs in the same place without
> "nolapic_timer"?
>
> ---
> drivers/acpi/hardware/hwsleep.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> ===================================================================
> --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c
> +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep(
> return_ACPI_STATUS(status);
> }
>
> - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
> return_ACPI_STATUS(status);
> - }
> + }*/
>
> /* Setup the argument to _SST */
>
I tested this against Thomas's original patch (below). Still hangs, in the
same way.
From: Thomas Gleixner <tglx@linutronix.de>
We need to make sure that the clockevent devices are resumed before the
tick is resumed. The current resume logic does not guarantee this.
Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
event devices before resuming the tick / oneshot functionality.
Fixup the existing users.
Fix the PIT handling of CLOCK_EVT_SHUTDOWN by disabling the interrupt
instead of switching to oneshot mode, as this causes slowness on a couple
of systems.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/i386/kernel/apic.c | 3 +
arch/i386/kernel/hpet.c | 71 ++-------------------------------
arch/i386/kernel/i8253.c | 43 +++++++++++++------
include/linux/clockchips.h | 1
kernel/time/tick-broadcast.c | 4 +
kernel/time/tick-common.c | 16 ++++---
6 files changed, 51 insertions(+), 87 deletions(-)
diff -puN arch/i386/kernel/apic.c~clockevents-fix-resume-logic-updated-version arch/i386/kernel/apic.c
--- a/arch/i386/kernel/apic.c~clockevents-fix-resume-logic-updated-version
+++ a/arch/i386/kernel/apic.c
@@ -263,6 +263,9 @@ static void lapic_timer_setup(enum clock
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write_around(APIC_LVTT, v);
break;
+ case CLOCK_EVT_MODE_RESUME:
+ /* Nothing to do here */
+ break;
}
local_irq_restore(flags);
diff -puN arch/i386/kernel/hpet.c~clockevents-fix-resume-logic-updated-version arch/i386/kernel/hpet.c
--- a/arch/i386/kernel/hpet.c~clockevents-fix-resume-logic-updated-version
+++ a/arch/i386/kernel/hpet.c
@@ -187,6 +187,10 @@ static void hpet_set_mode(enum clock_eve
cfg &= ~HPET_TN_ENABLE;
hpet_writel(cfg, HPET_T0_CFG);
break;
+
+ case CLOCK_EVT_MODE_RESUME:
+ hpet_enable_int();
+ break;
}
}
@@ -217,6 +221,7 @@ static struct clocksource clocksource_hp
.mask = HPET_MASK,
.shift = HPET_SHIFT,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+ .resume = hpet_start_counter,
};
/*
@@ -291,7 +296,6 @@ int __init hpet_enable(void)
clocksource_register(&clocksource_hpet);
-
if (id & HPET_ID_LEGSUP) {
hpet_enable_int();
hpet_reserve_platform_timers(id);
@@ -524,68 +528,3 @@ irqreturn_t hpet_rtc_interrupt(int irq,
return IRQ_HANDLED;
}
#endif
-
-
-/*
- * Suspend/resume part
- */
-
-#ifdef CONFIG_PM
-
-static int hpet_suspend(struct sys_device *sys_device, pm_message_t state)
-{
- unsigned long cfg = hpet_readl(HPET_CFG);
-
- cfg &= ~(HPET_CFG_ENABLE|HPET_CFG_LEGACY);
- hpet_writel(cfg, HPET_CFG);
-
- return 0;
-}
-
-static int hpet_resume(struct sys_device *sys_device)
-{
- unsigned int id;
-
- hpet_start_counter();
-
- id = hpet_readl(HPET_ID);
-
- if (id & HPET_ID_LEGSUP)
- hpet_enable_int();
-
- return 0;
-}
-
-static struct sysdev_class hpet_class = {
- set_kset_name("hpet"),
- .suspend = hpet_suspend,
- .resume = hpet_resume,
-};
-
-static struct sys_device hpet_device = {
- .id = 0,
- .cls = &hpet_class,
-};
-
-
-static __init int hpet_register_sysfs(void)
-{
- int err;
-
- if (!is_hpet_capable())
- return 0;
-
- err = sysdev_class_register(&hpet_class);
-
- if (!err) {
- err = sysdev_register(&hpet_device);
- if (err)
- sysdev_class_unregister(&hpet_class);
- }
-
- return err;
-}
-
-device_initcall(hpet_register_sysfs);
-
-#endif
diff -puN arch/i386/kernel/i8253.c~clockevents-fix-resume-logic-updated-version arch/i386/kernel/i8253.c
--- a/arch/i386/kernel/i8253.c~clockevents-fix-resume-logic-updated-version
+++ a/arch/i386/kernel/i8253.c
@@ -3,11 +3,11 @@
*
*/
#include <linux/clockchips.h>
-#include <linux/spinlock.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
#include <linux/jiffies.h>
-#include <linux/sysdev.h>
#include <linux/module.h>
-#include <linux/init.h>
+#include <linux/spinlock.h>
#include <asm/smp.h>
#include <asm/delay.h>
@@ -26,6 +26,24 @@ EXPORT_SYMBOL(i8253_lock);
*/
struct clock_event_device *global_clock_event;
+/* Status of the PIT interrupt */
+static int pit_irq_disabled;
+
+/*
+ * Control pit interrupt enable / disable
+ */
+static void pit_control_irq(int disable)
+{
+ if (pit_irq_disabled == disable)
+ return;
+
+ pit_irq_disabled = disable;
+ if (disable)
+ disable_irq(0);
+ else
+ enable_irq(0);
+}
+
/*
* Initialize the PIT timer.
*
@@ -42,26 +60,23 @@ static void init_pit_timer(enum clock_ev
case CLOCK_EVT_MODE_PERIODIC:
/* binary, mode 2, LSB/MSB, ch 0 */
outb_p(0x34, PIT_MODE);
- udelay(10);
outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
- udelay(10);
outb(LATCH >> 8 , PIT_CH0); /* MSB */
+ pit_control_irq(0);
break;
- /*
- * Avoid unnecessary state transitions, as it confuses
- * Geode / Cyrix based boxen.
- */
case CLOCK_EVT_MODE_SHUTDOWN:
- if (evt->mode == CLOCK_EVT_MODE_UNUSED)
- break;
case CLOCK_EVT_MODE_UNUSED:
- if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
- break;
+ pit_control_irq(1);
+ break;
case CLOCK_EVT_MODE_ONESHOT:
/* One shot setup */
outb_p(0x38, PIT_MODE);
- udelay(10);
+ pit_control_irq(0);
+ break;
+
+ case CLOCK_EVT_MODE_RESUME:
+ /* Nothing to do here */
break;
}
spin_unlock_irqrestore(&i8253_lock, flags);
diff -puN include/linux/clockchips.h~clockevents-fix-resume-logic-updated-version include/linux/clockchips.h
--- a/include/linux/clockchips.h~clockevents-fix-resume-logic-updated-version
+++ a/include/linux/clockchips.h
@@ -23,6 +23,7 @@ enum clock_event_mode {
CLOCK_EVT_MODE_SHUTDOWN,
CLOCK_EVT_MODE_PERIODIC,
CLOCK_EVT_MODE_ONESHOT,
+ CLOCK_EVT_MODE_RESUME,
};
/* Clock event notification values */
diff -puN kernel/time/tick-broadcast.c~clockevents-fix-resume-logic-updated-version kernel/time/tick-broadcast.c
--- a/kernel/time/tick-broadcast.c~clockevents-fix-resume-logic-updated-version
+++ a/kernel/time/tick-broadcast.c
@@ -292,7 +292,7 @@ void tick_suspend_broadcast(void)
spin_lock_irqsave(&tick_broadcast_lock, flags);
bc = tick_broadcast_device.evtdev;
- if (bc && tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+ if (bc)
clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
spin_unlock_irqrestore(&tick_broadcast_lock, flags);
@@ -309,6 +309,8 @@ int tick_resume_broadcast(void)
bc = tick_broadcast_device.evtdev;
if (bc) {
+ clockevents_set_mode(bc, CLOCK_EVT_MODE_RESUME);
+
switch (tick_broadcast_device.mode) {
case TICKDEV_MODE_PERIODIC:
if(!cpus_empty(tick_broadcast_mask))
diff -puN kernel/time/tick-common.c~clockevents-fix-resume-logic-updated-version kernel/time/tick-common.c
--- a/kernel/time/tick-common.c~clockevents-fix-resume-logic-updated-version
+++ a/kernel/time/tick-common.c
@@ -318,12 +318,17 @@ static void tick_resume(void)
{
struct tick_device *td = &__get_cpu_var(tick_cpu_device);
unsigned long flags;
+ int broadcast = tick_resume_broadcast();
spin_lock_irqsave(&tick_device_lock, flags);
- if (td->mode == TICKDEV_MODE_PERIODIC)
- tick_setup_periodic(td->evtdev, 0);
- else
- tick_resume_oneshot();
+ clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_RESUME);
+
+ if (!broadcast) {
+ if (td->mode == TICKDEV_MODE_PERIODIC)
+ tick_setup_periodic(td->evtdev, 0);
+ else
+ tick_resume_oneshot();
+ }
spin_unlock_irqrestore(&tick_device_lock, flags);
}
@@ -360,8 +365,7 @@ static int tick_notify(struct notifier_b
break;
case CLOCK_EVT_NOTIFY_RESUME:
- if (!tick_resume_broadcast())
- tick_resume();
+ tick_resume();
break;
default:
_
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-11 16:47 ` Andrew Morton
@ 2007-05-11 20:10 ` Rafael J. Wysocki
2007-05-11 20:28 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-11 20:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi
On Friday, 11 May 2007 18:47, Andrew Morton wrote:
> On Thu, 10 May 2007 22:12:07 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > On Thursday, 10 May 2007 11:27, Thomas Gleixner wrote:
> > > On Thu, 2007-05-10 at 02:18 -0700, Andrew Morton wrote:
> > > > > > > If that patch makes the problem go away, then we should have a quite
> > > > > > > good hint what we need to look at.
> > > > > >
> > > > > > No joy, sorry. It still hangs at the last statement in acpi_evaluate_object().
> > > > >
> > > > > Can you add "nolapic_timer" to the command line please ?
> > > > >
> > > >
> > > > That works.
> > >
> > > Ok, that boils it down to the change, which affects the lapic timer
> > > resume. It does nothing else, than fiddling in some APIC registers, but
> > > I don't see how this affects the ACPI stuff. This smells extremly fishy.
> >
> > Hmm, I wonder if it might be related to the execution of the _GTS ACPI control
> > method in acpi_enter_sleep_state_prep() in violation of the spec.
> >
> > Andrew, could you please try to apply the following change on top of the
> > previous ones and see if the machine still hangs in the same place without
> > "nolapic_timer"?
> >
> > ---
> > drivers/acpi/hardware/hwsleep.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > ===================================================================
> > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c
> > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep(
> > return_ACPI_STATUS(status);
> > }
> >
> > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
> > return_ACPI_STATUS(status);
> > - }
> > + }*/
> >
> > /* Setup the argument to _SST */
> >
>
> I tested this against Thomas's original patch (below). Still hangs, in the
> same way.
Well, I'm out of ideas. :-(
Could you please send me the Vaio's DSDT?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-11 20:10 ` Rafael J. Wysocki
@ 2007-05-11 20:28 ` Andrew Morton
2007-05-11 21:02 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-11 20:28 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi
On Fri, 11 May 2007 22:10:24 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > > ===================================================================
> > > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c
> > > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep(
> > > return_ACPI_STATUS(status);
> > > }
> > >
> > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
> > > return_ACPI_STATUS(status);
> > > - }
> > > + }*/
> > >
> > > /* Setup the argument to _SST */
> > >
> >
> > I tested this against Thomas's original patch (below). Still hangs, in the
> > same way.
>
> Well, I'm out of ideas. :-(
>
> Could you please send me the Vaio's DSDT?
Maybe I should send you the Vaio ;)
I have this 1999-era pre-ACPI, pre-everything-else Dual-PIII-based
Supermicro machine and it just *always* works. Sigh.
hm, Fedora don't seem to want to give me an RPM which contains acpidump and
all the yum servers are featuring scrogged checksums. I could build it, I
guess, but there's a principle involved ;)
http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-11 20:28 ` Andrew Morton
@ 2007-05-11 21:02 ` Rafael J. Wysocki
2007-05-11 21:09 ` Rafael J. Wysocki
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-11 21:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi
On Friday, 11 May 2007 22:28, Andrew Morton wrote:
> On Fri, 11 May 2007 22:10:24 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > > > ===================================================================
> > > > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c
> > > > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > > > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep(
> > > > return_ACPI_STATUS(status);
> > > > }
> > > >
> > > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > > > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > > > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
> > > > return_ACPI_STATUS(status);
> > > > - }
> > > > + }*/
> > > >
> > > > /* Setup the argument to _SST */
> > > >
> > >
> > > I tested this against Thomas's original patch (below). Still hangs, in the
> > > same way.
> >
> > Well, I'm out of ideas. :-(
> >
> > Could you please send me the Vaio's DSDT?
>
> Maybe I should send you the Vaio ;)
Well, machines with so much potential of exposing problems are priceless. ;-)
> I have this 1999-era pre-ACPI, pre-everything-else Dual-PIII-based
> Supermicro machine and it just *always* works. Sigh.
>
>
> hm, Fedora don't seem to want to give me an RPM which contains acpidump and
> all the yum servers are featuring scrogged checksums. I could build it, I
> guess, but there's a principle involved ;)
>
> http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK?
Yes, thanks.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-11 21:02 ` Rafael J. Wysocki
@ 2007-05-11 21:09 ` Rafael J. Wysocki
2007-05-12 6:56 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Rafael J. Wysocki @ 2007-05-11 21:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi
On Friday, 11 May 2007 23:02, Rafael J. Wysocki wrote:
> On Friday, 11 May 2007 22:28, Andrew Morton wrote:
> > On Fri, 11 May 2007 22:10:24 +0200
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >
> > > > > Index: linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > > > > ===================================================================
> > > > > --- linux-2.6.21.orig/drivers/acpi/hardware/hwsleep.c
> > > > > +++ linux-2.6.21/drivers/acpi/hardware/hwsleep.c
> > > > > @@ -200,10 +200,10 @@ acpi_status acpi_enter_sleep_state_prep(
> > > > > return_ACPI_STATUS(status);
> > > > > }
> > > > >
> > > > > - status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > > > > + /*status = acpi_evaluate_object(NULL, METHOD_NAME__GTS, &arg_list, NULL);
> > > > > if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
> > > > > return_ACPI_STATUS(status);
> > > > > - }
> > > > > + }*/
> > > > >
> > > > > /* Setup the argument to _SST */
> > > > >
> > > >
> > > > I tested this against Thomas's original patch (below). Still hangs, in the
> > > > same way.
> > >
> > > Well, I'm out of ideas. :-(
> > >
> > > Could you please send me the Vaio's DSDT?
> >
> > Maybe I should send you the Vaio ;)
>
> Well, machines with so much potential of exposing problems are priceless. ;-)
>
> > I have this 1999-era pre-ACPI, pre-everything-else Dual-PIII-based
> > Supermicro machine and it just *always* works. Sigh.
> >
> >
> > hm, Fedora don't seem to want to give me an RPM which contains acpidump and
> > all the yum servers are featuring scrogged checksums. I could build it, I
> > guess, but there's a principle involved ;)
> >
> > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK?
>
> Yes, thanks.
Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the
hibernation?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-11 21:09 ` Rafael J. Wysocki
@ 2007-05-12 6:56 ` Andrew Morton
2007-05-12 8:46 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-12 6:56 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Thomas Gleixner, Ingo Molnar, LKML, John Stultz, linux-acpi
On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > >
> > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and
> > > all the yum servers are featuring scrogged checksums. I could build it, I
> > > guess, but there's a principle involved ;)
> > >
> > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK?
> >
> > Yes, thanks.
>
> Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the
> hibernation?
That didn't change the behaviour.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 6:56 ` Andrew Morton
@ 2007-05-12 8:46 ` Thomas Gleixner
2007-05-12 9:00 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 8:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Fri, 2007-05-11 at 23:56 -0700, Andrew Morton wrote:
> On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > > >
> > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and
> > > > all the yum servers are featuring scrogged checksums. I could build it, I
> > > > guess, but there's a principle involved ;)
> > > >
> > > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK?
> > >
> > > Yes, thanks.
> >
> > Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the
> > hibernation?
>
> That didn't change the behaviour.
Andrew,
can you try the desperate witchcraft patch below ?
tglx
Index: linux-2.6.21/arch/i386/kernel/apic.c
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/apic.c
+++ linux-2.6.21/arch/i386/kernel/apic.c
@@ -238,9 +238,13 @@ static void lapic_timer_setup(enum clock
break;
case CLOCK_EVT_MODE_UNUSED:
case CLOCK_EVT_MODE_SHUTDOWN:
- v = apic_read(APIC_LVTT);
- v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
- apic_write_around(APIC_LVTT, v);
+
+ if (evt->mode == CLOCK_EVT_MODE_PERIODIC ||
+ evt->mode == CLOCK_EVT_MODE_ONESHOT) {
+ v = apic_read(APIC_LVTT);
+ v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
+ apic_write_around(APIC_LVTT, v);
+ }
break;
case CLOCK_EVT_MODE_RESUME:
/* Nothing to do here */
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 8:46 ` Thomas Gleixner
@ 2007-05-12 9:00 ` Andrew Morton
2007-05-12 9:18 ` Thomas Gleixner
2007-05-13 16:12 ` Thomas Gleixner
0 siblings, 2 replies; 61+ messages in thread
From: Andrew Morton @ 2007-05-12 9:00 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 12 May 2007 10:46:03 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 2007-05-11 at 23:56 -0700, Andrew Morton wrote:
> > On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >
> > > > >
> > > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and
> > > > > all the yum servers are featuring scrogged checksums. I could build it, I
> > > > > guess, but there's a principle involved ;)
> > > > >
> > > > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK?
> > > >
> > > > Yes, thanks.
> > >
> > > Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the
> > > hibernation?
> >
> > That didn't change the behaviour.
>
> Andrew,
>
> can you try the desperate witchcraft patch below ?
>
> tglx
>
> Index: linux-2.6.21/arch/i386/kernel/apic.c
> ===================================================================
> --- linux-2.6.21.orig/arch/i386/kernel/apic.c
> +++ linux-2.6.21/arch/i386/kernel/apic.c
> @@ -238,9 +238,13 @@ static void lapic_timer_setup(enum clock
> break;
> case CLOCK_EVT_MODE_UNUSED:
> case CLOCK_EVT_MODE_SHUTDOWN:
> - v = apic_read(APIC_LVTT);
> - v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
> - apic_write_around(APIC_LVTT, v);
> +
> + if (evt->mode == CLOCK_EVT_MODE_PERIODIC ||
> + evt->mode == CLOCK_EVT_MODE_ONESHOT) {
> + v = apic_read(APIC_LVTT);
> + v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
> + apic_write_around(APIC_LVTT, v);
> + }
> break;
> case CLOCK_EVT_MODE_RESUME:
> /* Nothing to do here */
>
Still hangs in the same fashion, sorry.
It's peculiar that the hang happens when acpi_evaluate_object() hits its
return statement. Any theories there?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 9:00 ` Andrew Morton
@ 2007-05-12 9:18 ` Thomas Gleixner
2007-05-12 10:07 ` Andrew Morton
2007-05-13 16:12 ` Thomas Gleixner
1 sibling, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 9:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote:
>
> Still hangs in the same fashion, sorry.
I did not really expect that it fixes the problem. It just restored the
local APIC suspend/resume register fiddling which we had before the
resume logic patch.
> It's peculiar that the hang happens when acpi_evaluate_object() hits its
> return statement. Any theories there?
Only stack or memory corruption come into mind, but I have no clue how
this is related to the resume logic changes.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 9:18 ` Thomas Gleixner
@ 2007-05-12 10:07 ` Andrew Morton
2007-05-12 11:44 ` Thomas Gleixner
` (2 more replies)
0 siblings, 3 replies; 61+ messages in thread
From: Andrew Morton @ 2007-05-12 10:07 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 12 May 2007 11:18:09 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> > It's peculiar that the hang happens when acpi_evaluate_object() hits its
> > return statement. Any theories there?
>
> Only stack or memory corruption come into mind, but I have no clue how
> this is related to the resume logic changes.
So I had the brilliant idea of turning on some kernel debugging. It's
a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC.
[ 73.533454] swsusp: Basic memory bitmaps created
[ 73.550429] Stopping tasks ... BUG: at kernel/lockdep.c:2414 check_flags()
[ 73.550988] [<c0104c14>] show_trace_log_lvl+0x1a/0x30
[ 73.551143] [<c0105769>] show_trace+0x12/0x14
[ 73.551279] [<c01057c1>] dump_stack+0x15/0x17
[ 73.551412] [<c0132732>] check_flags+0x93/0x13d
[ 73.551554] [<c0135558>] lock_acquire+0x28/0x7f
[ 73.551691] [<c0310e35>] _spin_lock+0x2b/0x38
[ 73.551827] [<c013dc43>] refrigerator+0x16/0xc7
[ 73.551965] [<c0125d2e>] get_signal_to_deliver+0x32/0x387
[ 73.552124] [<c010336d>] do_notify_resume+0x91/0x6a9
[ 73.552271] [<c0103df1>] work_notifysig+0x13/0x1a
[ 73.552413] =======================
[ 73.552507] irq event stamp: 3075
[ 73.552595] hardirqs last enabled at (3075): [<c0103e51>] syscall_exit_work+0x11/0x26
[ 73.552821] hardirqs last disabled at (3074): [<c0103d35>] syscall_exit+0x9/0x1a
[ 73.553046] softirqs last enabled at (2778): [<c01209f2>] __do_softirq+0x92/0x9a
[ 73.553255] softirqs last disabled at (2693): [<c0120a27>] do_softirq+0x2d/0x46
[ 73.559504] done.
[ 73.559569] Shrinking memory... \b-\bdone (0 pages freed)
[ 73.646511] Freed 0 kbytes in 0.08 seconds (0.00 MB/s)
[ 73.649595] platform sonypi: freeze
[ 73.649707] platform bluetooth: freeze
[ 73.649817] usb_endpoint usbdev5.1_ep81: PM: suspend 0->1, parent 5-0:1.0 already 2
[ 73.650023] hub 5-0:1.0: PM: suspend 2-->1
<snippage>
[ 73.739499] ipw2200 0000:06:0b.0: freeze
[ 73.743860] eth1: Going into suspend...
[ 73.748444] e100 0000:06:08.0: freeze
at this point I lost netconsole (earlier testing was without netconsole
btw)
The lockdep spew is coming out of here:
static void check_flags(unsigned long flags)
{
#if defined(CONFIG_DEBUG_LOCKDEP) && defined(CONFIG_TRACE_IRQFLAGS)
if (!debug_locks)
return;
if (irqs_disabled_flags(flags))
--> DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled);
else
DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled);
and the callsite is:
void refrigerator(void)
{
/* Hmm, should we be allowed to suspend when there are realtime
processes around? */
long save;
--> task_lock(current);
if (freezing(current)) {
frozen_process();
task_unlock(current);
} else {
I don't really know what lockdep is complaining about there. I assume I'm
not supposed to, given that whoever wrote that couldn't be bothered
documenting any of it.
I _think_ it means that lockdep believes that local irqs are enabled
(according to its state tracking), only it turns out that they're not.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 10:07 ` Andrew Morton
@ 2007-05-12 11:44 ` Thomas Gleixner
2007-05-12 16:51 ` Andrew Morton
2007-05-12 11:52 ` Thomas Gleixner
2007-05-15 16:52 ` Pavel Machek
2 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 11:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 03:07 -0700, Andrew Morton wrote:
> On Sat, 12 May 2007 11:18:09 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > > It's peculiar that the hang happens when acpi_evaluate_object() hits its
> > > return statement. Any theories there?
> >
> > Only stack or memory corruption come into mind, but I have no clue how
> > this is related to the resume logic changes.
>
> So I had the brilliant idea of turning on some kernel debugging. It's
> a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC.
Really brilliant. I tried to reproduce your problem and stumbled across
something else.
tglx
------------------------------->
Subject: clocksource fix lock order in the resume path
lockdep complains about the lock nesting of clocksource and watchdog
lock in the resume path. Move watchdog resume out of the clocksource
lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Index: linux-2.6.21/kernel/time/clocksource.c
===================================================================
--- linux-2.6.21.orig/kernel/time/clocksource.c
+++ linux-2.6.21/kernel/time/clocksource.c
@@ -151,9 +151,11 @@ static void clocksource_watchdog(unsigne
}
static void clocksource_resume_watchdog(void)
{
- spin_lock(&watchdog_lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&watchdog_lock, flags);
watchdog_resumed = 1;
- spin_unlock(&watchdog_lock);
+ spin_unlock_irqrestore(&watchdog_lock, flags);
}
static void clocksource_check_watchdog(struct clocksource *cs)
@@ -224,9 +226,9 @@ void clocksource_resume(void)
cs->resume();
}
- clocksource_resume_watchdog();
-
spin_unlock_irqrestore(&clocksource_lock, flags);
+
+ clocksource_resume_watchdog();
}
/**
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 10:07 ` Andrew Morton
2007-05-12 11:44 ` Thomas Gleixner
@ 2007-05-12 11:52 ` Thomas Gleixner
2007-05-15 16:52 ` Pavel Machek
2 siblings, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 11:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 03:07 -0700, Andrew Morton wrote:
> On Sat, 12 May 2007 11:18:09 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > > It's peculiar that the hang happens when acpi_evaluate_object() hits its
> > > return statement. Any theories there?
> >
> > Only stack or memory corruption come into mind, but I have no clue how
> > this is related to the resume logic changes.
>
> So I had the brilliant idea of turning on some kernel debugging. It's
> a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC.
...
> I don't really know what lockdep is complaining about there. I assume I'm
> not supposed to, given that whoever wrote that couldn't be bothered
> documenting any of it.
>
> I _think_ it means that lockdep believes that local irqs are enabled
> (according to its state tracking), only it turns out that they're not.
Right. Lockdep checks, whether the tracked interrupt disabled/enabled
state is the same as the state in the hardware. When the check triggers,
then we disabled / enabled interrupts somewhere without going through
lockdeps state tracker.
/me suspects ASM code missing the TRACE_IFQS_OFF macro in some place,
which was recently modified.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 11:44 ` Thomas Gleixner
@ 2007-05-12 16:51 ` Andrew Morton
2007-05-12 17:01 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-12 16:51 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 12 May 2007 13:44:13 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> lockdep complains about the lock nesting of clocksource and watchdog
> lock in the resume path. Move watchdog resume out of the clocksource
> lock.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> Index: linux-2.6.21/kernel/time/clocksource.c
> ===================================================================
> --- linux-2.6.21.orig/kernel/time/clocksource.c
> +++ linux-2.6.21/kernel/time/clocksource.c
> @@ -151,9 +151,11 @@ static void clocksource_watchdog(unsigne
> }
> static void clocksource_resume_watchdog(void)
> {
> - spin_lock(&watchdog_lock);
> + unsigned long flags;
> +
> + spin_lock_irqsave(&watchdog_lock, flags);
> watchdog_resumed = 1;
> - spin_unlock(&watchdog_lock);
> + spin_unlock_irqrestore(&watchdog_lock, flags);
> }
>
> static void clocksource_check_watchdog(struct clocksource *cs)
> @@ -224,9 +226,9 @@ void clocksource_resume(void)
> cs->resume();
> }
>
> - clocksource_resume_watchdog();
> -
> spin_unlock_irqrestore(&clocksource_lock, flags);
> +
> + clocksource_resume_watchdog();
> }
>
The locking in clocksource_resume_watchdog looks pretty pointless anyway.
Can't we just delete it?
The only thing it can race against is, conceivably,
resumed = watchdog_resumed;
if (unlikely(resumed))
watchdog_resumed = 0;
which could be solved by using test_and_clear_bit().
Does anyone have any theories about my lockdep warning?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 16:51 ` Andrew Morton
@ 2007-05-12 17:01 ` Thomas Gleixner
2007-05-12 17:23 ` Andrew Morton
2007-05-12 18:49 ` Thomas Gleixner
0 siblings, 2 replies; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 17:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 09:51 -0700, Andrew Morton wrote:
> The locking in clocksource_resume_watchdog looks pretty pointless anyway.
> Can't we just delete it?
>
> The only thing it can race against is, conceivably,
>
> resumed = watchdog_resumed;
> if (unlikely(resumed))
> watchdog_resumed = 0;
>
> which could be solved by using test_and_clear_bit().
True. I'll redo it.
> Does anyone have any theories about my lockdep warning?
Can you upload a snapshot of your current queue ?
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 17:01 ` Thomas Gleixner
@ 2007-05-12 17:23 ` Andrew Morton
2007-05-12 19:36 ` Thomas Gleixner
2007-05-12 19:56 ` Thomas Gleixner
2007-05-12 18:49 ` Thomas Gleixner
1 sibling, 2 replies; 61+ messages in thread
From: Andrew Morton @ 2007-05-12 17:23 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 12 May 2007 19:01:41 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> Can you upload a snapshot of your current queue ?
Yeah, that's super-easy. It just happens that it all compiles
and runs at present ;)
The single-big-patch is at http://userweb.kernel.org/~akpm/tg.bz2
The broken-out queue should turn up at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/
in a few minutes.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 17:01 ` Thomas Gleixner
2007-05-12 17:23 ` Andrew Morton
@ 2007-05-12 18:49 ` Thomas Gleixner
1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 18:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 19:01 +0200, Thomas Gleixner wrote:
> On Sat, 2007-05-12 at 09:51 -0700, Andrew Morton wrote:
> > The locking in clocksource_resume_watchdog looks pretty pointless anyway.
> > Can't we just delete it?
> >
> > The only thing it can race against is, conceivably,
> >
> > resumed = watchdog_resumed;
> > if (unlikely(resumed))
> > watchdog_resumed = 0;
> >
> > which could be solved by using test_and_clear_bit().
>
> True. I'll redo it.
Here you go.
tglx
------------------------------->
Subject: clocksource fix lock order in the resume path
lockdep complains about the lock nesting of clocksource and watchdog
lock in the resume path.
Change the resume marker to a bit operation and remove the lock from
this path.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -74,7 +74,7 @@ static struct clocksource *watchdog;
static struct timer_list watchdog_timer;
static DEFINE_SPINLOCK(watchdog_lock);
static cycle_t watchdog_last;
-static int watchdog_resumed;
+static unsigned long watchdog_resumed;
/*
* Interval: 0.5sec Threshold: 0.0625s
@@ -104,9 +104,7 @@ static void clocksource_watchdog(unsigned long data)
spin_lock(&watchdog_lock);
- resumed = watchdog_resumed;
- if (unlikely(resumed))
- watchdog_resumed = 0;
+ resumed = test_and_clear_bit(0, &watchdog_resumed);
wdnow = watchdog->read();
wd_nsec = cyc2ns(watchdog, (wdnow - watchdog_last) & watchdog->mask);
@@ -151,9 +149,7 @@ static void clocksource_watchdog(unsigned long data)
}
static void clocksource_resume_watchdog(void)
{
- spin_lock(&watchdog_lock);
- watchdog_resumed = 1;
- spin_unlock(&watchdog_lock);
+ set_bit(0, &watchdog_resumed);
}
static void clocksource_check_watchdog(struct clocksource *cs)
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 17:23 ` Andrew Morton
@ 2007-05-12 19:36 ` Thomas Gleixner
2007-05-12 19:56 ` Thomas Gleixner
1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 19:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote:
> On Sat, 12 May 2007 19:01:41 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > Can you upload a snapshot of your current queue ?
>
> Yeah, that's super-easy. It just happens that it all compiles
> and runs at present ;)
Really ?
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c: In function ‘pcxhr_trigger’:
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:640: error: expected expression before ‘<<’ token
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:642: error: expected expression before ‘==’ token
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:644: warning: ISO C90 forbids mixed declarations and code
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:645: error: expected expression before ‘>>’ token
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: ‘s’ undeclared (first use in this function)
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: (Each undeclared identifier is reported only once
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: for each function it appears in.)
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: warning: type defaults to ‘int’ in declaration of ‘type name’
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: error: request for member ‘link_list’ in something not a structure or union
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: warning: type defaults to ‘int’ in declaration of ‘__mptr’
/home/tglx/work/kernel/hrtimers/linux-2.6.21/sound/pci/pcxhr/pcxhr.c:653: warning: initialization from incompatible pointer type
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 17:23 ` Andrew Morton
2007-05-12 19:36 ` Thomas Gleixner
@ 2007-05-12 19:56 ` Thomas Gleixner
2007-05-13 1:11 ` Andrew Morton
1 sibling, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-12 19:56 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote:
> The broken-out queue should turn up at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/
> in a few minutes.
Sigh. I can't reproduce your lockdep problem :(
Can you send me your .config please ?
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 19:56 ` Thomas Gleixner
@ 2007-05-13 1:11 ` Andrew Morton
2007-05-13 8:07 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-13 1:11 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 12 May 2007 21:56:10 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote:
> > The broken-out queue should turn up at
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/
> > in a few minutes.
>
> Sigh. I can't reproduce your lockdep problem :(
We should write a Vaio emulator.
> Can you send me your .config please ?
>
http://userweb.kernel.org/~akpm/config-sony.txt
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-13 1:11 ` Andrew Morton
@ 2007-05-13 8:07 ` Thomas Gleixner
2007-05-13 16:48 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-13 8:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 18:11 -0700, Andrew Morton wrote:
> On Sat, 12 May 2007 21:56:10 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > On Sat, 2007-05-12 at 10:23 -0700, Andrew Morton wrote:
> > > The broken-out queue should turn up at
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/
> > > in a few minutes.
> >
> > Sigh. I can't reproduce your lockdep problem :(
>
> We should write a Vaio emulator.
:)
> > Can you send me your .config please ?
> >
>
> http://userweb.kernel.org/~akpm/config-sony.txt
No luck.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 9:00 ` Andrew Morton
2007-05-12 9:18 ` Thomas Gleixner
@ 2007-05-13 16:12 ` Thomas Gleixner
2007-05-14 6:42 ` Andrew Morton
1 sibling, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-13 16:12 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote:
> Still hangs in the same fashion, sorry.
>
> It's peculiar that the hang happens when acpi_evaluate_object() hits its
> return statement. Any theories there?
I assume you have a printk right before the return and one after the
call to acpi_evaluate_object().
Can you dump the stack at the point before the return, so we can see if
the stack is corrupt there ? A WARN_ON(1) should do the trick.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-13 8:07 ` Thomas Gleixner
@ 2007-05-13 16:48 ` Thomas Gleixner
2007-05-13 19:09 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-13 16:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sun, 2007-05-13 at 10:07 +0200, Thomas Gleixner wrote:
> > > Can you send me your .config please ?
> > >
> >
> > http://userweb.kernel.org/~akpm/config-sony.txt
>
> No luck.
I'm making progress. After fiddling with the config options I get a
solid lockup of netconsole during boot :(
But also lockdep itself is broken on my box, so I would not see warnings
anyway:
soft-irqs-on + irq-safe-A/21: ok | ok | ok |
sirq-safe-A => hirqs-on/12: ok | ok |irq event stamp: 472
hardirqs last enabled at (472): [<c01e8bba>] irqsafe2A_rlock_12+0x94/0xa1
hardirqs last disabled at (471): [<c0109541>] native_sched_clock+0x55/0xe8
softirqs last enabled at (468): [<c01e8ba5>] irqsafe2A_rlock_12+0x7f/0xa1
softirqs last disabled at (464): [<c01e8b31>] irqsafe2A_rlock_12+0xb/0xa1
FAILED| [<c0104d21>] dump_trace+0x61/0x1e7
[<c0104ec1>] show_trace_log_lvl+0x1a/0x30
[<c0105a1b>] show_trace+0x12/0x14
[<c0105a73>] dump_stack+0x15/0x17
[<c01e6971>] dotest+0x6b/0x3ce
[<c01efc1f>] locking_selftest+0x915/0x1a5a
[<c04a0a0a>] start_kernel+0x1d5/0x2a7
=======================
.... more of those
-----------------------------------------------------------------
BUG: 6 unexpected failures (out of 218) - debugging disabled! |
-----------------------------------------------------------------
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-13 16:48 ` Thomas Gleixner
@ 2007-05-13 19:09 ` Andrew Morton
2007-05-13 20:07 ` Ingo Molnar
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-13 19:09 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sun, 13 May 2007 18:48:07 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sun, 2007-05-13 at 10:07 +0200, Thomas Gleixner wrote:
> > > > Can you send me your .config please ?
> > > >
> > >
> > > http://userweb.kernel.org/~akpm/config-sony.txt
> >
> > No luck.
>
> I'm making progress. After fiddling with the config options I get a
> solid lockup of netconsole during boot :(
neat. Sometimes this can be a device driver failure.
> But also lockdep itself is broken on my box, so I would not see warnings
> anyway:
>
> soft-irqs-on + irq-safe-A/21: ok | ok | ok |
> sirq-safe-A => hirqs-on/12: ok | ok |irq event stamp: 472
> hardirqs last enabled at (472): [<c01e8bba>] irqsafe2A_rlock_12+0x94/0xa1
> hardirqs last disabled at (471): [<c0109541>] native_sched_clock+0x55/0xe8
> softirqs last enabled at (468): [<c01e8ba5>] irqsafe2A_rlock_12+0x7f/0xa1
> softirqs last disabled at (464): [<c01e8b31>] irqsafe2A_rlock_12+0xb/0xa1
> FAILED| [<c0104d21>] dump_trace+0x61/0x1e7
> [<c0104ec1>] show_trace_log_lvl+0x1a/0x30
> [<c0105a1b>] show_trace+0x12/0x14
> [<c0105a73>] dump_stack+0x15/0x17
> [<c01e6971>] dotest+0x6b/0x3ce
> [<c01efc1f>] locking_selftest+0x915/0x1a5a
> [<c04a0a0a>] start_kernel+0x1d5/0x2a7
> =======================
>
> .... more of those
>
> -----------------------------------------------------------------
> BUG: 6 unexpected failures (out of 218) - debugging disabled! |
> -----------------------------------------------------------------
>
Yeah, I expect quite a few people will start seeing that. iirc it was
triggered by a couple of changes: a local_irq_save/local_irq_restore in
sched_clock() and a change Jeremy made to the softlockup detector.
There was no reasonable explanation why either of those changes should have
triggered the six failures so I just went ahead with them so we can flush
the real bug out.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-13 19:09 ` Andrew Morton
@ 2007-05-13 20:07 ` Ingo Molnar
2007-05-13 22:35 ` Andrew Morton
0 siblings, 1 reply; 61+ messages in thread
From: Ingo Molnar @ 2007-05-13 20:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: tglx, Rafael J. Wysocki, LKML, John Stultz, linux-acpi
* Andrew Morton <akpm@linux-foundation.org> wrote:
> Yeah, I expect quite a few people will start seeing that. iirc it was
> triggered by a couple of changes: a local_irq_save/local_irq_restore
> in sched_clock() and a change Jeremy made to the softlockup detector.
hm, i specifically asked to not do local_irq_save/restore in
sched_clock(). It's the _scheduler_ clock and it very absolutely
positively needs no flags save/restore.
Ingo
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-13 20:07 ` Ingo Molnar
@ 2007-05-13 22:35 ` Andrew Morton
0 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2007-05-13 22:35 UTC (permalink / raw)
To: Ingo Molnar; +Cc: tglx, Rafael J. Wysocki, LKML, John Stultz, linux-acpi
On Sun, 13 May 2007 22:07:39 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > Yeah, I expect quite a few people will start seeing that. iirc it was
> > triggered by a couple of changes: a local_irq_save/local_irq_restore
> > in sched_clock() and a change Jeremy made to the softlockup detector.
>
> hm, i specifically asked to not do local_irq_save/restore in
> sched_clock(). It's the _scheduler_ clock and it very absolutely
> positively needs no flags save/restore.
>
It got taken out again.
It doesn't matter though: a local_irq_save/restore in some callee shouldn't
cause the locking API tests to break. And they're apparently now breaking
without that local_irq_save/restore in there anyway.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-13 16:12 ` Thomas Gleixner
@ 2007-05-14 6:42 ` Andrew Morton
2007-05-14 6:58 ` Thomas Gleixner
0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2007-05-14 6:42 UTC (permalink / raw)
To: tglx; +Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sun, 13 May 2007 18:12:50 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote:
> > Still hangs in the same fashion, sorry.
> >
> > It's peculiar that the hang happens when acpi_evaluate_object() hits its
> > return statement. Any theories there?
>
> I assume you have a printk right before the return and one after the
> call to acpi_evaluate_object().
>
> Can you dump the stack at the point before the return, so we can see if
> the stack is corrupt there ? A WARN_ON(1) should do the trick.
>
It all looks clean. I spose I should do a hex dump and a byte-by-byte
walkthough, but time is short...
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-14 6:42 ` Andrew Morton
@ 2007-05-14 6:58 ` Thomas Gleixner
0 siblings, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2007-05-14 6:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
On Sun, 2007-05-13 at 23:42 -0700, Andrew Morton wrote:
> On Sun, 13 May 2007 18:12:50 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > On Sat, 2007-05-12 at 02:00 -0700, Andrew Morton wrote:
> > > Still hangs in the same fashion, sorry.
> > >
> > > It's peculiar that the hang happens when acpi_evaluate_object() hits its
> > > return statement. Any theories there?
> >
> > I assume you have a printk right before the return and one after the
> > call to acpi_evaluate_object().
> >
> > Can you dump the stack at the point before the return, so we can see if
> > the stack is corrupt there ? A WARN_ON(1) should do the trick.
> >
>
> It all looks clean. I spose I should do a hex dump and a byte-by-byte
> walkthough, but time is short...
I could backport the highres/dyntick stuff to 2.6.20, so we have the
other changes out of the picture.
tglx
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [patch 3/3] clockevents: Fix resume logic - updated version
2007-05-12 10:07 ` Andrew Morton
2007-05-12 11:44 ` Thomas Gleixner
2007-05-12 11:52 ` Thomas Gleixner
@ 2007-05-15 16:52 ` Pavel Machek
2 siblings, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2007-05-15 16:52 UTC (permalink / raw)
To: Andrew Morton, Rafael J. Wysocki
Cc: tglx, Rafael J. Wysocki, Ingo Molnar, LKML, John Stultz, linux-acpi
Hi!
> > > It's peculiar that the hang happens when acpi_evaluate_object() hits its
> > > return statement. Any theories there?
> >
> > Only stack or memory corruption come into mind, but I have no clue how
> > this is related to the resume logic changes.
>
> So I had the brilliant idea of turning on some kernel debugging. It's
> a shame that CONFIG_SOFTWARE_SUSPEND disables CONFIG_DEBUG_PAGEALLOC.
We now switch to separate page tables for hibernation, IIRC, so we may
be able to get DEBUG_PAGEALLOC to work. I'll take a look.
> [ 73.533454] swsusp: Basic memory bitmaps created
> [ 73.550429] Stopping tasks ... BUG: at kernel/lockdep.c:2414 check_flags()
> [ 73.550988] [<c0104c14>] show_trace_log_lvl+0x1a/0x30
> [ 73.551143] [<c0105769>] show_trace+0x12/0x14
> [ 73.551279] [<c01057c1>] dump_stack+0x15/0x17
> [ 73.551412] [<c0132732>] check_flags+0x93/0x13d
> [ 73.551554] [<c0135558>] lock_acquire+0x28/0x7f
> [ 73.551691] [<c0310e35>] _spin_lock+0x2b/0x38
> [ 73.551827] [<c013dc43>] refrigerator+0x16/0xc7
> [ 73.551965] [<c0125d2e>] get_signal_to_deliver+0x32/0x387
> [ 73.552124] [<c010336d>] do_notify_resume+0x91/0x6a9
> [ 73.552271] [<c0103df1>] work_notifysig+0x13/0x1a
> [ 73.552413] =======================
> [ 73.552507] irq event stamp: 3075
> [ 73.552595] hardirqs last enabled at (3075): [<c0103e51>] syscall_exit_work+0x11/0x26
> [ 73.552821] hardirqs last disabled at (3074): [<c0103d35>] syscall_exit+0x9/0x1a
> [ 73.553046] softirqs last enabled at (2778): [<c01209f2>] __do_softirq+0x92/0x9a
> [ 73.553255] softirqs last disabled at (2693): [<c0120a27>] do_softirq+0x2d/0x46
> [ 73.559504] done.
> [ 73.559569] Shrinking memory... \b-\bdone (0 pages freed)
> [ 73.646511] Freed 0 kbytes in 0.08 seconds (0.00 MB/s)
> [ 73.649595] platform sonypi: freeze
> [ 73.649707] platform bluetooth: freeze
> [ 73.649817] usb_endpoint usbdev5.1_ep81: PM: suspend 0->1, parent 5-0:1.0 already 2
> [ 73.650023] hub 5-0:1.0: PM: suspend 2-->1
>
> <snippage>
>
> [ 73.739499] ipw2200 0000:06:0b.0: freeze
> [ 73.743860] eth1: Going into suspend...
> [ 73.748444] e100 0000:06:08.0: freeze
>
> at this point I lost netconsole (earlier testing was without netconsole
> btw)
Commenting out e100's suspend routine might do the trick.
> void refrigerator(void)
> {
> /* Hmm, should we be allowed to suspend when there are realtime
> processes around? */
> long save;
>
> --> task_lock(current);
> if (freezing(current)) {
> frozen_process();
> task_unlock(current);
> } else {
>
>
> I don't really know what lockdep is complaining about there. I assume I'm
> not supposed to, given that whoever wrote that couldn't be bothered
> documenting any of it.
> I _think_ it means that lockdep believes that local irqs are enabled
> (according to its state tracking), only it turns out that they're not.
Huh, running refrigerator with interrupts disabled? I do not think we
are doing _that_.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2007-05-16 13:36 UTC | newest]
Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20070430102837.748238000@linutronix.de>
2007-05-02 0:33 ` [patch 0/3] Clocksource / clockevent updates Andrew Morton
2007-05-02 6:09 ` Thomas Gleixner
2007-05-02 6:14 ` Andrew Morton
[not found] ` <20070430102852.042964000@linutronix.de>
2007-05-05 7:34 ` [patch 3/3] clockevents: Fix resume logic Andrew Morton
2007-05-05 11:51 ` Ingo Molnar
2007-05-06 15:03 ` [patch 3/3] clockevents: Fix resume logic - updated version Thomas Gleixner
2007-05-08 10:37 ` Andrew Morton
2007-05-09 5:59 ` Andrew Morton
2007-05-09 7:10 ` Andrew Morton
2007-05-09 8:18 ` Thomas Gleixner
2007-05-09 8:22 ` Andrew Morton
2007-05-09 8:31 ` Andrew Morton
2007-05-09 8:59 ` Thomas Gleixner
2007-05-09 11:45 ` Rafael J. Wysocki
2007-05-09 12:24 ` Thomas Gleixner
2007-05-09 13:12 ` Rafael J. Wysocki
2007-05-09 13:19 ` Thomas Gleixner
2007-05-09 17:09 ` Rafael J. Wysocki
2007-05-09 17:15 ` Thomas Gleixner
2007-05-09 18:36 ` Rafael J. Wysocki
2007-05-09 20:45 ` Thomas Gleixner
2007-05-09 20:53 ` Rafael J. Wysocki
2007-05-09 21:26 ` Thomas Gleixner
2007-05-10 8:46 ` Andrew Morton
2007-05-10 8:55 ` Thomas Gleixner
2007-05-10 9:18 ` Andrew Morton
2007-05-10 9:27 ` Thomas Gleixner
2007-05-10 20:12 ` Rafael J. Wysocki
2007-05-11 16:47 ` Andrew Morton
2007-05-11 20:10 ` Rafael J. Wysocki
2007-05-11 20:28 ` Andrew Morton
2007-05-11 21:02 ` Rafael J. Wysocki
2007-05-11 21:09 ` Rafael J. Wysocki
2007-05-12 6:56 ` Andrew Morton
2007-05-12 8:46 ` Thomas Gleixner
2007-05-12 9:00 ` Andrew Morton
2007-05-12 9:18 ` Thomas Gleixner
2007-05-12 10:07 ` Andrew Morton
2007-05-12 11:44 ` Thomas Gleixner
2007-05-12 16:51 ` Andrew Morton
2007-05-12 17:01 ` Thomas Gleixner
2007-05-12 17:23 ` Andrew Morton
2007-05-12 19:36 ` Thomas Gleixner
2007-05-12 19:56 ` Thomas Gleixner
2007-05-13 1:11 ` Andrew Morton
2007-05-13 8:07 ` Thomas Gleixner
2007-05-13 16:48 ` Thomas Gleixner
2007-05-13 19:09 ` Andrew Morton
2007-05-13 20:07 ` Ingo Molnar
2007-05-13 22:35 ` Andrew Morton
2007-05-12 18:49 ` Thomas Gleixner
2007-05-12 11:52 ` Thomas Gleixner
2007-05-15 16:52 ` Pavel Machek
2007-05-13 16:12 ` Thomas Gleixner
2007-05-14 6:42 ` Andrew Morton
2007-05-14 6:58 ` Thomas Gleixner
2007-05-09 12:52 ` Rafael J. Wysocki
2007-05-09 17:14 ` Andrew Morton
2007-05-09 18:55 ` Rafael J. Wysocki
[not found] ` <20070430102851.987296000@linutronix.de>
2007-05-07 6:43 ` [patch 2/3] ACPI: Keep TSC stable, when lapic_timer_c2_ok is set Andrew Morton
2007-05-07 7:01 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).