LKML Archive on lore.kernel.org
* Suspend to RAM generates oops and general protection fault
@ 2007-01-22  2:34 Jean-Marc Valin
  2007-01-22  3:23 ` Nigel Cunningham
  2007-01-22 11:59 ` Rafael J. Wysocki
  0 siblings, 2 replies; 12+ messages in thread
From: Jean-Marc Valin @ 2007-01-22  2:34 UTC (permalink / raw)
  To: linux-kernel

Hi,

I just encountered the following oops and general protection fault
trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
relevant errors are below but the full dmesg log is at
http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
http://people.xiph.org/~jm/config-2.6.20-rc5.txt

This happens when I'm running 2.6.20-rc5. The previous kernel version I
was using is 2.6.19-rc6 and was much more broken (second attempt
*always* failed), so it's probably not a regression.

Cheers,

	Jean-Marc

P.S. This is the same laptop I had at LCA for which Linus told me to
disable preemption and try the newest rc version.

[10746.449071] Unable to handle kernel NULL pointer dereference at
0000000000000038 RIP:
[10746.449080]  [<ffffffff8022b9c8>] iput+0x18/0x80
[10746.449092] PGD 3a607067 PUD 27b20067 PMD 0
[10746.449099] Oops: 0000 [1] SMP
[10746.449104] CPU 0
[10746.449107] Modules linked in: psmouse battery ac thermal fan button
ipw3945 ieee80211 tg3 arc4 ecb blkcipher ieee80211_crypt_wep
ieee80211_crypt binfmt_misc rfcomm l2cap bluetooth i915 drm
speedstep_centrino cpufreq_userspace cpufreq_powersave cpufreq_ondemand
cpufreq_stats freq_table cpufreq_conservative video sbs i2c_ec dock
asus_acpi backlight container ipv6 fuse sbp2 af_packet parport_pc lp
parport sg sr_mod cdrom snd_hda_intel snd_hda_codec tsdev snd_pcm_oss
snd_mixer_oss pcmcia snd_pcm snd_timer ata_generic snd shpchp
pci_hotplug soundcore snd_page_alloc serio_raw yenta_socket
rsrc_nonstatic pcmcia_core pcspkr evdev ext3 jbd mbcache ohci1394
ehci_hcd ieee1394 ide_generic uhci_hcd usbcore generic sd_mod processor
[10746.449190] Pid: 218, comm: kswapd0 Not tainted 2.6.20-rc5-x86-64 #1
[10746.449196] RIP: 0010:[<ffffffff8022b9c8>]  [<ffffffff8022b9c8>]
iput+0x18/0x80
[10746.449206] RSP: 0000:ffff810037f2dd50  EFLAGS: 00010283
[10746.449212] RAX: 0000000000000000 RBX: ffff81000003fcf0 RCX:
ffff81000003fd20
[10746.449219] RDX: 0000000000000001 RSI: 0000000000000286 RDI:
ffff81000003fcf0
[10746.449225] RBP: 0000000000000042 R08: 0000000000000000 R09:
0000000000000000
[10746.449232] R10: 28f5c28f5c28f5c3 R11: ffffffff8023ae90 R12:
0000000000000000
[10746.449239] R13: ffff810075721c70 R14: ffffffff805fa940 R15:
0000000000000000
[10746.449246] FS:  0000000000000000(0000) GS:ffffffff8058e000(0000)
knlGS:0000000000000000
[10746.449253] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[10746.449259] CR2: 0000000000000038 CR3: 000000001207f000 CR4:
00000000000006e0
[10746.449265] Process kswapd0 (pid: 218, threadinfo ffff810037f2c000,
task ffff810037a1b760)
[10746.449269] Stack:  ffff8100001ce2f0 ffffffff802ddaf8
ffff8100001ce3c0 ffff8100001ce2f0
[10746.449280]  0000000000000042 ffffffff8022f645 ffff810037f2dd80
000000000001cb60
[10746.449288]  0000000000000090 ffff81007daa0e00 00000000000000d0
ffffffff802ddb49
[10746.449296] Call Trace:
[10746.449305]  [<ffffffff802ddaf8>] prune_one_dentry+0x68/0xa0
[10746.449314]  [<ffffffff8022f645>] prune_dcache+0x145/0x1e0
[10746.449323]  [<ffffffff802ddb49>] shrink_dcache_memory+0x19/0x50
[10746.449331]  [<ffffffff802418a7>] shrink_slab+0x117/0x190
[10746.449342]  [<ffffffff8025a392>] kswapd+0x382/0x4e0
[10746.449356]  [<ffffffff802a13b0>] autoremove_wake_function+0x0/0x30
[10746.449370]  [<ffffffff8025a010>] kswapd+0x0/0x4e0
[10746.449376]  [<ffffffff802a11d0>] keventd_create_kthread+0x0/0x90
[10746.449383]  [<ffffffff802335a9>] kthread+0xd9/0x120
[10746.449394]  [<ffffffff80260ec8>] child_rip+0xa/0x12
[10746.449401]  [<ffffffff802a11d0>] keventd_create_kthread+0x0/0x90
[10746.449414]  [<ffffffff802334d0>] kthread+0x0/0x120
[10746.449421]  [<ffffffff80260ebe>] child_rip+0x0/0x12
[10746.449426]
[10746.449429]
[10746.449430] Code: 48 8b 40 38 75 04 0f 0b eb fe 48 85 c0 74 0b 48 8b
40 28 48
[10746.449449] RIP  [<ffffffff8022b9c8>] iput+0x18/0x80
[10746.449456]  RSP <ffff810037f2dd50>
[10746.449460] CR2: 0000000000000038
[10746.449463]  ACPI Exception (pci_bind-0299): AE_NOT_FOUND, Unable to
get data from device DCKS [20060707]


and later:


[    3.668009] SMP alternatives: switching to SMP code
[    3.668168] Booting processor 1/2 APIC 0x1
[    4.149691] Initializing CPU#1
[    4.229595] Calibrating delay using timer specific routine.. 3990.32
BogoMIPS (lpj=7980654)
[    4.229602] CPU: L1 I cache: 32K, L1 D cache: 32K
[    4.229604] CPU: L2 cache: 4096K
[    4.229606] CPU 1/1 -> Node 0
[    4.229608] CPU: Physical Processor ID: 0
[    4.229609] CPU: Processor Core ID: 1
[    4.230107] Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz stepping 06
[    4.233607] CPU 1: Syncing TSC to CPU 0.
[    3.762970] CPU 1: synchronized TSC with CPU 0 (last diff 0 cycles,
maxerr 960 cycles)
[    3.764689] general protection fault: 0000 [2] SMP
[    3.764963] CPU 1
[    3.764983] Modules linked in: psmouse battery ac thermal fan button
arc4 ecb blkcipher ieee80211_crypt_wep ieee80211_crypt binfmt_misc
rfcomm l2cap bluetooth i915 drm speedstep_centrino cpufreq_userspace
cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table
cpufreq_conservative video sbs i2c_ec dock asus_acpi backlight container
ipv6 fuse sbp2 af_packet parport_pc lp parport sg sr_mod cdrom
snd_hda_intel snd_hda_codec tsdev snd_pcm_oss snd_mixer_oss pcmcia
snd_pcm snd_timer ata_generic snd shpchp pci_hotplug soundcore
snd_page_alloc serio_raw yenta_socket rsrc_nonstatic pcmcia_core pcspkr
evdev ext3 jbd mbcache ohci1394 ehci_hcd ieee1394 ide_generic uhci_hcd
usbcore generic sd_mod processor
[    3.765304] Pid: 7824, comm: sleep.sh Not tainted 2.6.20-rc5-x86-64 #1
[    3.765330] RIP: 0010:[<ffffffff80289c45>]  [<ffffffff80289c45>]
task_rq_lock+0x35/0x90
[    3.765379] RSP: 0018:ffff81001d291d28  EFLAGS: 00010086
[    3.765403] RAX: 6e696c7761726320 RBX: ffffffff805f88a0 RCX:
0000000000000000
[    3.765431] RDX: 0000000000000000 RSI: ffff81001d291db0 RDI:
ffff810037a1b760
[    3.765460] RBP: ffff81001d291d48 R08: 0000000000000000 R09:
0000000000000000
[    3.765488] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffffff805f88a0
[    3.765516] R13: ffff810037a1b760 R14: ffff81001d291db0 R15:
ffff81000a9e3e80
[    3.765546] FS:  00002b1bcd567ae0(0000) GS:ffff810037ad1cc0(0000)
knlGS:0000000000000000
[    3.765589] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    3.765615] CR2: 0000000000000000 CR3: 000000001d4d0000 CR4:
00000000000006a0
[    3.765645] Process sleep.sh (pid: 7824, threadinfo ffff81001d290000,
task ffff8100661cf080)
[    3.765689] Stack:  ffff810037a1b760 0000000000000002
0000000000000003 0000000000000003
[    3.765739]  ffff81001d291dd8 ffffffff8028c454 0000000000000001
0000000000000003
[    3.765786]  0000000000000000 0000000000000002 ffffffff8058e4c0
0000000000000003
[    3.765822] Call Trace:
[    3.765861]  [<ffffffff8028c454>] set_cpus_allowed+0x24/0xc0
[    3.765890]  [<ffffffff8033cf3b>] kobject_register+0x3b/0x50
[    3.765919]  [<ffffffff80292850>] cpu_callback+0x50/0x70
[    3.765949]  [<ffffffff80269406>] notifier_call_chain+0x26/0x40
[    3.765980]  [<ffffffff802a704c>] _cpu_up+0xbc/0xf0
[    3.766005]  [<ffffffff802a70b2>] cpu_up+0x32/0x60
[    3.766030]  [<ffffffff802a712b>] enable_nonboot_cpus+0x4b/0xa0
[    3.766057]  [<ffffffff802aba8b>] enter_state+0x15b/0x1b0
[    3.766083]  [<ffffffff802abb5b>] state_store+0x7b/0xb0
[    3.766112]  [<ffffffff80306c09>] sysfs_write_file+0xd9/0x120
[    3.766143]  [<ffffffff8021653e>] vfs_write+0xde/0x1a0
[    3.766169]  [<ffffffff80217073>] sys_write+0x53/0x90
[    3.766197]  [<ffffffff8026011e>] system_call+0x7e/0x83
[    3.766228]
[    3.766247]
[    3.766248] Code: 8b 40 18 48 8b 04 c5 80 f2 59 80 48 03 58 08 48 89
df e8 54
[    3.766331] RIP  [<ffffffff80289c45>] task_rq_lock+0x35/0x90
[    3.766359]  RSP <ffff81001d291d28>
[    3.766650]  <6>ata1.00: configured for UDMA/133


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22  2:34 Suspend to RAM generates oops and general protection fault Jean-Marc Valin
@ 2007-01-22  3:23 ` Nigel Cunningham
  2007-01-22  5:16   ` Jean-Marc Valin
  2007-01-22 11:59 ` Rafael J. Wysocki
  1 sibling, 1 reply; 12+ messages in thread
From: Nigel Cunningham @ 2007-01-22  3:23 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: linux-kernel

Hi.

On Mon, 2007-01-22 at 13:34 +1100, Jean-Marc Valin wrote:
> Hi,
> 
> I just encountered the following oops and general protection fault
> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
> relevant errors are below but the full dmesg log is at
> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
> 
> This happens when I'm running 2.6.20-rc5. The previous kernel version I
> was using is 2.6.19-rc6 and was much more broken (second attempt
> *always* failed), so it's probably not a regression.

A second attempt always failing usually indicates that a driver was
dazed and confused after the first cycle and properly killed by the
second attempt, usually because of a lack of [proper] power management
code.

Between any two versions, some things can be fixed, some things can be
broken and some things can become broken in different ways, so your
different experience with 2.6.20-rc5 doesn't necessarily mean that this
is a different issue.

It looks like something is stomping on memory it shouldn't be touching,
so I would suggest testing multiple cycles with a minimal (preferably
zero) number of modules loaded. If that looks good and reliable, add
modules & processes until you can say 'If I do X, it breaks.' If having
a minimal number of modules loaded doesn't help, I would then suggest
reviewing your kernel config to see if other things can be built as
modules and the same logic applied. You can be reasonably sure that it
will be a device driver. Common causes of suspend/resume problems from
the list you give below are acpi modules, bluetooth and usb. I'd also
consider pcmcia, drm and fuse possibilities. But again, go for unloading
everything possible in the first instance.
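
One hint in that direction, offered as a rough sketch rather than a diagnosis:
RAX in the general protection fault is 6e696c7761726320, which does not look
like a kernel pointer. Read as little-endian bytes it is printable ASCII, and
it is not a canonical x86-64 address, so a load through it would raise a
general protection fault rather than a page fault. The helper names below
(reg_to_ascii, is_canonical) are made up for illustration, and the canonical
check assumes 48-bit virtual addresses.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Render a 64-bit register value as the 8 bytes it would occupy in
 * little-endian memory. Non-printable bytes are shown as '.'.
 * out must have room for 9 chars (8 bytes plus the terminating NUL).
 */
static void reg_to_ascii(uint64_t reg, char out[9])
{
	for (size_t i = 0; i < 8; i++) {
		unsigned char byte = (unsigned char)(reg >> (8 * i));

		out[i] = isprint(byte) ? (char)byte : '.';
	}
	out[8] = '\0';
}

/*
 * Canonical-address check, assuming 48-bit virtual addresses:
 * bits 63..47 must be all zeros or all ones.
 */
static int is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}
```

Calling reg_to_ascii(0x6e696c7761726320ULL, buf) leaves " crawlin" in buf,
and is_canonical() returns 0 for that value, while it returns 1 for the task
pointer ffff810037a1b760 from the same trace. String data sitting where a
pointer is expected is what you would see if something scribbled over that
memory.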

Regards,

Nigel


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22  3:23 ` Nigel Cunningham
@ 2007-01-22  5:16   ` Jean-Marc Valin
  2007-01-22  5:19     ` Nigel Cunningham
  2007-01-22 13:25     ` Pavel Machek
  0 siblings, 2 replies; 12+ messages in thread
From: Jean-Marc Valin @ 2007-01-22  5:16 UTC (permalink / raw)
  To: nigel; +Cc: linux-kernel

>> I just encountered the following oops and general protection fault
>> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
>> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
>> relevant errors are below but the full dmesg log is at
>> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
>> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
...
> It looks like something is stomping on memory it shouldn't be touching,
> so I would suggest testing multiple cycles with a minimal (preferably
> zero) number of modules loaded. If that looks good and reliable, add
> modules & processes until you can say 'If I do X, it breaks.' If having
> a minimal number of modules loaded doesn't help, I would then suggest
> reviewing your kernel config to see if other things can be built as
> modules and the same logic applied. You can be reasonably sure that it
> will be a device driver. Common causes of suspend/resume problems from
> the list you give below are acpi modules, bluetooth and usb. I'd also
> consider pcmcia, drm and fuse possibilities. But again, go for unloading
> everything possible in the first instance.

Actually, the reason I sent this is that when I showed the oops/gpf to
Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
suspend to RAM now works ~95% of the time.

	Jean-Marc


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22  5:16   ` Jean-Marc Valin
@ 2007-01-22  5:19     ` Nigel Cunningham
  2007-01-22 13:25     ` Pavel Machek
  1 sibling, 0 replies; 12+ messages in thread
From: Nigel Cunningham @ 2007-01-22  5:19 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: linux-kernel

Hi.

On Mon, 2007-01-22 at 16:16 +1100, Jean-Marc Valin wrote:
> >> I just encountered the following oops and general protection fault
> >> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
> >> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
> >> relevant errors are below but the full dmesg log is at
> >> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
> >> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
> ...
> > It looks like something is stomping on memory it shouldn't be touching,
> > so I would suggest testing multiple cycles with a minimal (preferably
> > zero) number of modules loaded. If that looks good and reliable, add
> > modules & processes until you can say 'If I do X, it breaks.' If having
> > a minimal number of modules loaded doesn't help, I would then suggest
> > reviewing your kernel config to see if other things can be built as
> > modules and the same logic applied. You can be reasonably sure that it
> > will be a device driver. Common causes of suspend/resume problems from
> > the list you give below are acpi modules, bluetooth and usb. I'd also
> > consider pcmcia, drm and fuse possibilities. But again, go for unloading
> > everything possible in the first instance.
> 
> Actually, the reason I sent this is that when I showed the oops/gpf to
> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
> suspend to RAM now works ~95% of the time.

I agree that the second is cpu hotplug, but the first is something else,
hence my recommendations above.

Regards,

Nigel




* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22  2:34 Suspend to RAM generates oops and general protection fault Jean-Marc Valin
  2007-01-22  3:23 ` Nigel Cunningham
@ 2007-01-22 11:59 ` Rafael J. Wysocki
  2007-01-23  4:38   ` Jean-Marc Valin
  1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2007-01-22 11:59 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

Hi,

On Monday, 22 January 2007 03:34, Jean-Marc Valin wrote:
> Hi,
> 
> I just encountered the following oops and general protection fault
> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
> relevant errors are below but the full dmesg log is at
> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
> 
> This happens when I'm running 2.6.20-rc5. The previous kernel version I
> was using is 2.6.19-rc6 and was much more broken (second attempt
> *always* failed), so it's probably not a regression.

This is a shot against the odds, but could you please check if the attached
patch has any effect?

Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King

[-- Attachment #2: page_alloc-fix.patch --]
[-- Type: text/x-diff, Size: 1532 bytes --]

Both process_zones() and drain_node_pages() check for populated zones before
touching pagesets. However, __drain_pages does not do so.

This may result in a NULL pointer dereference for pagesets in unpopulated
zones if a NUMA setup is combined with cpu hotplug.

Initially the unpopulated zone has the pcp pointers pointing to the boot
pagesets.  Since the zone is not populated the boot pageset pointers will
not be changed during page allocator and slab bootstrap.

If a cpu is later brought down (first call to __drain_pages()) then the pcp
pointers for cpus in unpopulated zones are set to NULL since __drain_pages
does not first check for an unpopulated zone.

If the cpu is then brought up again then we call process_zones() which will ignore
the unpopulated zone. So the pageset pointers will still be NULL.

If the cpu is then again brought down then __drain_pages will attempt to drain
pages by following the NULL pageset pointer for unpopulated zones.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/page_alloc.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6.20-rc4/mm/page_alloc.c
===================================================================
--- linux-2.6.20-rc4.orig/mm/page_alloc.c
+++ linux-2.6.20-rc4/mm/page_alloc.c
@@ -714,6 +714,9 @@ static void __drain_pages(unsigned int c
 	for_each_zone(zone) {
 		struct per_cpu_pageset *pset;
 
+		if (!populated_zone(zone))
+			continue;
+
 		pset = zone_pcp(zone, cpu);
 		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
 			struct per_cpu_pages *pcp;
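
The sequence in the changelog above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not kernel
code: the toy_* names, simulate_hotplug() and the two-zone layout are all
hypothetical, and the model simply follows the changelog's description
(pageset pointers are cleared when a cpu goes down, and the cpu-up path
skips unpopulated zones).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names here are hypothetical. */
struct toy_pageset { int count; };

struct toy_zone {
    int present_pages;            /* 0 means the zone is unpopulated */
    struct toy_pageset *pageset;  /* pageset pointer for the one modelled cpu */
};

static struct toy_pageset boot_pageset;
static struct toy_pageset real_pageset;

static int toy_populated_zone(const struct toy_zone *z)
{
    return z->present_pages != 0;
}

/* cpu up: like process_zones() in the changelog, ignores unpopulated zones. */
static void toy_cpu_up(struct toy_zone *zones, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_populated_zone(&zones[i]))
            continue;
        zones[i].pageset = &real_pageset;
    }
}

/*
 * cpu down: drain each zone's pageset, then clear the pointer.
 * Returns -1 where the real code would follow a NULL pageset pointer.
 */
static int toy_cpu_down(struct toy_zone *zones, size_t n, int check_populated)
{
    for (size_t i = 0; i < n; i++) {
        if (check_populated && !toy_populated_zone(&zones[i]))
            continue;                 /* the check the patch adds */
        if (zones[i].pageset == NULL)
            return -1;                /* NULL dereference in the kernel */
        zones[i].pageset->count = 0;  /* "drain" the pages */
        zones[i].pageset = NULL;      /* pointer cleared on cpu down */
    }
    return 0;
}

/* Returns the number of the offline cycle that hits NULL, or 0 if none does. */
int simulate_hotplug(int check_populated)
{
    struct toy_zone zones[2] = {
        { 1024, &boot_pageset },  /* populated zone */
        { 0,    &boot_pageset },  /* unpopulated zone keeps the boot pageset */
    };

    assert(!toy_populated_zone(&zones[1]));
    toy_cpu_up(zones, 2);         /* bootstrap: populated zones get real pagesets */

    for (int cycle = 1; cycle <= 3; cycle++) {
        if (toy_cpu_down(zones, 2, check_populated) != 0)
            return cycle;
        toy_cpu_up(zones, 2);
    }
    return 0;
}
```

In this model simulate_hotplug(0) returns 2, i.e. the second offline is the
one that follows the NULL pointer, and simulate_hotplug(1) returns 0 because
the unpopulated zone is never touched.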


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22  5:16   ` Jean-Marc Valin
  2007-01-22  5:19     ` Nigel Cunningham
@ 2007-01-22 13:25     ` Pavel Machek
  2007-01-23  4:42       ` Jean-Marc Valin
  2007-03-23 12:34       ` Jean-Marc Valin
  1 sibling, 2 replies; 12+ messages in thread
From: Pavel Machek @ 2007-01-22 13:25 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: nigel, linux-kernel

Hi!

> > will be a device driver. Common causes of suspend/resume problems from
> > the list you give below are acpi modules, bluetooth and usb. I'd also
> > consider pcmcia, drm and fuse possibilities. But again, go for unloading
> > everything possible in the first instance.
> 
> Actually, the reason I sent this is that when I showed the oops/gpf to
> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
> suspend to RAM now works ~95% of the time.

Try a kernel without CONFIG_SMP... that will verify if it is SMP
related.
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22 11:59 ` Rafael J. Wysocki
@ 2007-01-23  4:38   ` Jean-Marc Valin
  0 siblings, 0 replies; 12+ messages in thread
From: Jean-Marc Valin @ 2007-01-23  4:38 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel

>> I just encountered the following oops and general protection fault
>> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
>> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
>> relevant errors are below but the full dmesg log is at
>> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
>> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
>>
>> This happens when I'm running 2.6.20-rc5. The previous kernel version I
>> was using is 2.6.19-rc6 and was much more broken (second attempt
>> *always* failed), so it's probably not a regression.
> 
> This is a shot against the odds, but could you please check if the attached
> patch has any effect?

Thanks, I'll try that. It may take a while because the problem only
happened once in dozens of suspend/resume cycles.

	Jean-Marc

> Rafael
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Both process_zones() and drain_node_pages() check for populated zones before
> touching pagesets. However, __drain_pages does not do so.
> 
> This may result in a NULL pointer dereference for pagesets in unpopulated
> zones if a NUMA setup is combined with cpu hotplug.
> 
> Initially the unpopulated zone has the pcp pointers pointing to the boot
> pagesets.  Since the zone is not populated the boot pageset pointers will
> not be changed during page allocator and slab bootstrap.
> 
> If a cpu is later brought down (first call to __drain_pages()) then the pcp
> pointers for cpus in unpopulated zones are set to NULL since __drain_pages
> does not first check for an unpopulated zone.
> 
> If the cpu is then brought up again then we call process_zones() which will ignore
> the unpopulated zone. So the pageset pointers will still be NULL.
> 
> If the cpu is then again brought down then __drain_pages will attempt to drain
> pages by following the NULL pageset pointer for unpopulated zones.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> ---
>  mm/page_alloc.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6.20-rc4/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.20-rc4.orig/mm/page_alloc.c
> +++ linux-2.6.20-rc4/mm/page_alloc.c
> @@ -714,6 +714,9 @@ static void __drain_pages(unsigned int c
>  	for_each_zone(zone) {
>  		struct per_cpu_pageset *pset;
>  
> +		if (!populated_zone(zone))
> +			continue;
> +
>  		pset = zone_pcp(zone, cpu);
>  		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
>  			struct per_cpu_pages *pcp;


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22 13:25     ` Pavel Machek
@ 2007-01-23  4:42       ` Jean-Marc Valin
  2007-01-23  7:11         ` Luming Yu
  2007-03-23 12:34       ` Jean-Marc Valin
  1 sibling, 1 reply; 12+ messages in thread
From: Jean-Marc Valin @ 2007-01-23  4:42 UTC (permalink / raw)
  To: Pavel Machek; +Cc: nigel, linux-kernel

>>> will be a device driver. Common causes of suspend/resume problems from
>>> the list you give below are acpi modules, bluetooth and usb. I'd also
>>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
>>> everything possible in the first instance.
>> Actually, the reason I sent this is that when I showed the oops/gpf to
>> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
>> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
>> suspend to RAM now works ~95% of the time.
> 
> Try a kernel without CONFIG_SMP... that will verify if it is SMP
> related.

Well, this happens to be my main work machine, which I'm not willing to
have running at half speed for several weeks. Anything else you can suggest?

	Jean-Marc


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-23  4:42       ` Jean-Marc Valin
@ 2007-01-23  7:11         ` Luming Yu
  2007-01-23 14:53           ` Jean-Marc Valin
  0 siblings, 1 reply; 12+ messages in thread
From: Luming Yu @ 2007-01-23  7:11 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: Pavel Machek, nigel, linux-kernel

what about removing psmouse module?

On 1/23/07, Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
> >>> will be a device driver. Common causes of suspend/resume problems from
> >>> the list you give below are acpi modules, bluetooth and usb. I'd also
> >>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
> >>> everything possible in the first instance.
> >> Actually, the reason I sent this is that when I showed the oops/gpf to
> >> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
> >> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
> >> suspend to RAM now works ~95% of the time.
> >
> > Try a kernel without CONFIG_SMP... that will verify if it is SMP
> > related.
>
> Well, this happens to be my main work machine, which I'm not willing to
> have running at half speed for several weeks. Anything else you can suggest?
>
>         Jean-Marc


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-23  7:11         ` Luming Yu
@ 2007-01-23 14:53           ` Jean-Marc Valin
  2007-01-23 15:05             ` Luming Yu
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Marc Valin @ 2007-01-23 14:53 UTC (permalink / raw)
  To: Luming Yu; +Cc: Pavel Machek, nigel, linux-kernel

Luming Yu wrote:
> what about removing psmouse module?

Trying that now. Any particular reason you suspect that one?

	Jean-Marc

> On 1/23/07, Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
>> >>> will be a device driver. Common causes of suspend/resume problems from
>> >>> the list you give below are acpi modules, bluetooth and usb. I'd also
>> >>> consider pcmcia, drm and fuse possibilities. But again, go for unloading
>> >>> everything possible in the first instance.
>> >> Actually, the reason I sent this is that when I showed the oops/gpf to
>> >> Matthew Garrett at linux.conf.au, he said it looked like a CPU hotplug
>> >> problem and suggested I send it to lkml. BTW, with 2.6.20-rc5, the
>> >> suspend to RAM now works ~95% of the time.
>> >
>> > Try a kernel without CONFIG_SMP... that will verify if it is SMP
>> > related.
>>
>> Well, this happens to be my main work machine, which I'm not willing to
>> have running at half speed for several weeks. Anything else you can suggest?
>>
>>         Jean-Marc


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-23 14:53           ` Jean-Marc Valin
@ 2007-01-23 15:05             ` Luming Yu
  0 siblings, 0 replies; 12+ messages in thread
From: Luming Yu @ 2007-01-23 15:05 UTC (permalink / raw)
  To: Jean-Marc Valin; +Cc: Pavel Machek, nigel, linux-kernel

On 1/23/07, Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
> Luming Yu wrote:
> > what about removing psmouse module?
>
> Trying that now. Any particular reason you suspect that one?
>

I suspect it is due to broken modules. If it is not psmouse, please try
booting with minimal modules loaded, and re-test.

Thanks,
Luming


* Re: Suspend to RAM generates oops and general protection fault
  2007-01-22 13:25     ` Pavel Machek
  2007-01-23  4:42       ` Jean-Marc Valin
@ 2007-03-23 12:34       ` Jean-Marc Valin
  1 sibling, 0 replies; 12+ messages in thread
From: Jean-Marc Valin @ 2007-03-23 12:34 UTC (permalink / raw)
  To: Pavel Machek; +Cc: nigel, linux-kernel

Hi,

Sorry I haven't replied recently about that bug, but I have to admit I
have no idea where to start. There actually seem to be much more
fundamental problems with the kernel on my machines. I initially
realised that even without using suspend to RAM, I was still getting
crashes when docking. So I stopped docking and realised my machine would
sometimes just crash when I plug/unplug the AC adaptor. Just to give an
idea, I've experienced about 10-15 crashes in the past two months -- I
don't think I've even done a single clean shutdown during that period.

To make things worse, the behaviour is always different. Sometimes I get
a panic with keyboard LEDs flashing. Sometimes I get nothing at all and
the machine is just frozen (doesn't respond to pings or to Alt-SysRq
commands). Sometimes, I just lose my keyboard and/or mouse but the
machine stays up. I'm running a vanilla 2.6.20 kernel (not tainted) with
the following configuration: http://jmspeex.livejournal.com/1090.html

	Jean-Marc




Thread overview: 12+ messages
2007-01-22  2:34 Suspend to RAM generates oops and general protection fault Jean-Marc Valin
2007-01-22  3:23 ` Nigel Cunningham
2007-01-22  5:16   ` Jean-Marc Valin
2007-01-22  5:19     ` Nigel Cunningham
2007-01-22 13:25     ` Pavel Machek
2007-01-23  4:42       ` Jean-Marc Valin
2007-01-23  7:11         ` Luming Yu
2007-01-23 14:53           ` Jean-Marc Valin
2007-01-23 15:05             ` Luming Yu
2007-03-23 12:34       ` Jean-Marc Valin
2007-01-22 11:59 ` Rafael J. Wysocki
2007-01-23  4:38   ` Jean-Marc Valin
