LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [BUG] Code reordering in swsusp breaks suspend on SMP systems
@ 2007-03-21 16:40 Maxim Levitsky
  2007-03-21 21:22 ` Nigel Cunningham
  2007-03-21 22:21 ` Pavel Machek
  0 siblings, 2 replies; 30+ messages in thread
From: Maxim Levitsky @ 2007-03-21 16:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pavel Machek, Rafael J. Wysocki

Hi,

Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.

I did a git-bisect and found that those commits break it:

e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c

I already reported about it, but now i know the reason why suspend breaks.

The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
and it did work because those functions could be called only in process context
(the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).

But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
that is taken by different process, it can't since the different proccess is frozen and can't release the lock.

I tested this and all results are positive:

I disabled 2nd cpu by hand, and then suspend to ram was successfull.
Suspend to disk went correctly, but it hang on resume, and I know why.
It hang in old kernel trying to disable 2nd cpu that was enabled by it.

I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that
idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback.

The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier,
to ensure that it doesn't sleep. 
Also documentation should be changed to note about it.

Or, it is also possible to revert this change.

	Regards,
		Maxim Levitsky


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 16:40 [BUG] Code reordering in swsusp breaks suspend on SMP systems Maxim Levitsky
@ 2007-03-21 21:22 ` Nigel Cunningham
  2007-03-21 21:38   ` Rafael J. Wysocki
       [not found]   ` <200703220114.05228.maximlevitsky@gmail.com>
  2007-03-21 22:21 ` Pavel Machek
  1 sibling, 2 replies; 30+ messages in thread
From: Nigel Cunningham @ 2007-03-21 21:22 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: linux-kernel, Pavel Machek, Rafael J. Wysocki

Hi.

On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> Hi,
> 
> Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> 
> I did a git-bisect and found that those commits break it:
> 
> e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> 
> I already reported about it, but now i know the reason why suspend breaks.
> 
> The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> and it did work because those functions could be called only in process context
> (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> 
> But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> 
> I tested this and all results are positive:
> 
> I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> Suspend to disk went correctly, but it hang on resume, and I know why.
> It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> 
> I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that
> idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback.
> 
> The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier,
> to ensure that it doesn't sleep. 
> Also documentation should be changed to note about it.
> 
> Or, it is also possible to revert this change.

Do you know exactly which mutex was being waited on and where it was
taken? If you can say that, it would be much more helpful.

Regards,

Nigel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 21:22 ` Nigel Cunningham
@ 2007-03-21 21:38   ` Rafael J. Wysocki
  2007-03-21 23:47     ` Nigel Cunningham
  2007-03-22  4:51     ` David Chinner
       [not found]   ` <200703220114.05228.maximlevitsky@gmail.com>
  1 sibling, 2 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-21 21:38 UTC (permalink / raw)
  To: nigel, Maxim Levitsky; +Cc: linux-kernel, Pavel Machek, David Chinner

On Wednesday, 21 March 2007 22:22, Nigel Cunningham wrote:
> Hi.
> 
> On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> > Hi,
> > 
> > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > 
> > I did a git-bisect and found that those commits break it:
> > 
> > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > 
> > I already reported about it, but now i know the reason why suspend breaks.
> > 
> > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > and it did work because those functions could be called only in process context
> > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > 
> > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > 
> > I tested this and all results are positive:
> > 
> > I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> > Suspend to disk went correctly, but it hang on resume, and I know why.
> > It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> > 
> > I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that
> > idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback.
> > 
> > The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier,
> > to ensure that it doesn't sleep. 
> > Also documentation should be changed to note about it.
> > 
> > Or, it is also possible to revert this change.
> 
> Do you know exactly which mutex was being waited on and where it was
> taken? If you can say that, it would be much more helpful.

I think this is the XFS problem with freezable workqueues.

Maxim, please try to apply the appended patch and see if it helps.

Greetings,
Rafael


---
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
it's better to remove them altogether for 2.6.21 and change the only user of
them (XFS) accordingly.

---
 fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
@@ -1829,11 +1829,11 @@ xfs_buf_init(void)
 	if (!xfs_buf_zone)
 		goto out_free_trace_buf;
 
-	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
+	xfslogd_workqueue = create_workqueue("xfslogd");
 	if (!xfslogd_workqueue)
 		goto out_free_buf_zone;
 
-	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
+	xfsdatad_workqueue = create_workqueue("xfsdatad");
 	if (!xfsdatad_workqueue)
 		goto out_destroy_xfslogd_workqueue;
 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 16:40 [BUG] Code reordering in swsusp breaks suspend on SMP systems Maxim Levitsky
  2007-03-21 21:22 ` Nigel Cunningham
@ 2007-03-21 22:21 ` Pavel Machek
  2007-03-21 22:39   ` Rafael J. Wysocki
  1 sibling, 1 reply; 30+ messages in thread
From: Pavel Machek @ 2007-03-21 22:21 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: linux-kernel, Rafael J. Wysocki

Hi!

> Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> 
> I did a git-bisect and found that those commits break it:
> 
> e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> 

(Yep, it was in my "to analyze" queue).

> I already reported about it, but now i know the reason why suspend breaks.
> 
> The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> and it did work because those functions could be called only in process context
> (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> 
> But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> 

Thanks for detailed explanation.

...but, on my machine suspend works ok in -rc4. I'm not seeing this.

...by design, "frozen" tasks must not hold any locks. If frozen task
holds a lock, that's a bug.

> Or, it is also possible to revert this change.

Are you using xfs?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 22:21 ` Pavel Machek
@ 2007-03-21 22:39   ` Rafael J. Wysocki
  2007-03-21 22:58     ` [RFC] : Is /proc/kcore still usefull and/or maintained ? Eric Dumazet
       [not found]     ` <200703220109.54719.maximlevitsky@gmail.com>
  0 siblings, 2 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-21 22:39 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Maxim Levitsky, linux-kernel

On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> Hi!
> 
> > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > 
> > I did a git-bisect and found that those commits break it:
> > 
> > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > 
> 
> (Yep, it was in my "to analyze" queue).
> 
> > I already reported about it, but now i know the reason why suspend breaks.
> > 
> > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > and it did work because those functions could be called only in process context
> > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > 
> > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > 
> 
> Thanks for detailed explanation.
> 
> ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> 
> ...by design, "frozen" tasks must not hold any locks. If frozen task
> holds a lock, that's a bug.
> 
> > Or, it is also possible to revert this change.
> 
> Are you using xfs?

Well, this is the only case that can trigger it.  There are no other freezable
workqueues.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC] : Is /proc/kcore still usefull and/or maintained ?
  2007-03-21 22:39   ` Rafael J. Wysocki
@ 2007-03-21 22:58     ` Eric Dumazet
  2007-03-21 23:11       ` Jan Engelhardt
       [not found]     ` <200703220109.54719.maximlevitsky@gmail.com>
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2007-03-21 22:58 UTC (permalink / raw)
  To: Andi Kleen, David Howells, Jeremy Fitzhardinge; +Cc: linux-kernel

Hi all

On i386 , 2.6.20 / 2.6.21-rc4 :

# gdb vmlinux /proc/kcore
error
# file /proc/kcore
error


Apparently we can not llseek() anymore on this file (returns -EINVAL)

On x86_64 2.6.20 it's working

# file /proc/kcore
/proc/kcore: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style


On i386 2.6.14 it's working too.

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] : Is /proc/kcore still usefull and/or maintained ?
  2007-03-21 22:58     ` [RFC] : Is /proc/kcore still usefull and/or maintained ? Eric Dumazet
@ 2007-03-21 23:11       ` Jan Engelhardt
  2007-03-21 23:28         ` Maxim
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Engelhardt @ 2007-03-21 23:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, David Howells, Jeremy Fitzhardinge, linux-kernel


On Mar 21 2007 23:58, Eric Dumazet wrote:
> Hi all
>
> On i386 , 2.6.20 / 2.6.21-rc4 :
>
> # gdb vmlinux /proc/kcore
> error
> # file /proc/kcore
> error

00:11 ichi:/hld # file /proc/kcore
/proc/kcore: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, from 'vmlinux'
00:11 ichi:/hld # hexdump -C /proc/kcore | head -n5
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  04 00 03 00 01 00 00 00  00 00 00 00 34 00 00 00  |............4...|
00000020  00 00 00 00 00 00 00 00  34 00 20 00 03 00 00 00  |........4. .....|
00000030  00 00 00 00 04 00 00 00  94 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 a8 06 00 00  00 00 00 00 00 00 00 00  |................|
00:11 ichi:/hld # uname -rm
2.6.20.2 i686

>
>
> Apparently we can not llseek() anymore on this file (returns -EINVAL)
>
> On x86_64 2.6.20 it's working
>
> # file /proc/kcore
> /proc/kcore: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style
>
>
> On i386 2.6.14 it's working too.
>
> Eric
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>

Jan
-- 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
       [not found]   ` <200703220114.05228.maximlevitsky@gmail.com>
@ 2007-03-21 23:16     ` Maxim
  2007-03-22  0:32     ` Maxim
  1 sibling, 0 replies; 30+ messages in thread
From: Maxim @ 2007-03-21 23:16 UTC (permalink / raw)
  To: nigel; +Cc: Rafael J. Wysocki, linux-kernel, Pavel Machek, Rafael J. Wysocki

On Thursday 22 March 2007 01:14:05 Maxim wrote:
> On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote:
> > Hi.
> > 
> > On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> > > Hi,
> > > 
> > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > 
> > > I did a git-bisect and found that those commits break it:
> > > 
> > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > 
> > > I already reported about it, but now i know the reason why suspend breaks.
> > > 
> > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > and it did work because those functions could be called only in process context
> > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > 
> > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > 
> > > I tested this and all results are positive:
> > > 
> > > I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> > > Suspend to disk went correctly, but it hang on resume, and I know why.
> > > It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> > > 
> > > I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that
> > > idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback.
> > > 
> > > The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier,
> > > to ensure that it doesn't sleep. 
> > > Also documentation should be changed to note about it.
> > > 
> > > Or, it is also possible to revert this change.
> > 
> > Do you know exactly which mutex was being waited on and where it was
> > taken? If you can say that, it would be much more helpful.
> > 
> > Regards,
> > 
> > Nigel
> > 
> > 
> 
> Hello,
> 
> 	It is workqueue_mutex
> 	and it is taken in kernel/workqueue.c:797
> 
> 	this is guilt of freezable work queues , and XFS uses it (and I use XFS)
> 
> 	Thanks to Rafael J. Wysocki for pointing it out to me.
> 
> 	Regards,
> 		Maxim Levitsky
> 

oops, Forgot to cc

Regards,
	Maxim Levitsky
	

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
       [not found]     ` <200703220109.54719.maximlevitsky@gmail.com>
@ 2007-03-21 23:18       ` Maxim
       [not found]       ` <200703220024.25436.rjw@sisk.pl>
  1 sibling, 0 replies; 30+ messages in thread
From: Maxim @ 2007-03-21 23:18 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Pavel Machek, nigel

On Thursday 22 March 2007 01:09:54 Maxim wrote:
> On Thursday 22 March 2007 00:39:02 you wrote:
> > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > 
> > > > I did a git-bisect and found that those commits break it:
> > > > 
> > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > 
> > > 
> > > (Yep, it was in my "to analyze" queue).
> > > 
> > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > 
> > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > and it did work because those functions could be called only in process context
> > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > 
> > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > 
> > > 
> > > Thanks for detailed explanation.
> > > 
> > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > 
> > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > holds a lock, that's a bug.
> > > 
> > > > Or, it is also possible to revert this change.
> > > 
> > > Are you using xfs?
> > 
> > Well, this is the only case that can trigger it.  There are no other freezable
> > workqueues.
> > 
> > Greetings,
> > Rafael
> > 
> 
> Hello,
> 
> 	Yes, you are right and it is XFS
> 
> 	System suspends and resumes with xfs and your patch correctly,
> 
> 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> 	because it calls firmware loader helper, and again sleeps on lock
> 
> 	And also I noticed now that system oopses on second attempt to suspend ether to ram or disk
> 	in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon.
> 
> 	Thanks a lot,
> 		Regards,
> 			Maxim Levitsky
> 
> 

And I forgot to cc here too, I didn't intend to send private email.

	Regards,
		Maxim Levitsky

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] : Is /proc/kcore still usefull and/or maintained ?
  2007-03-21 23:11       ` Jan Engelhardt
@ 2007-03-21 23:28         ` Maxim
  2007-03-21 23:53           ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim @ 2007-03-21 23:28 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Eric Dumazet, Andi Kleen, David Howells, Jeremy Fitzhardinge,
	linux-kernel

On Thursday 22 March 2007 01:11:57 Jan Engelhardt wrote:
> 
> On Mar 21 2007 23:58, Eric Dumazet wrote:
> > Hi all
> >
> > On i386 , 2.6.20 / 2.6.21-rc4 :
> >
> > # gdb vmlinux /proc/kcore
> > error
> > # file /proc/kcore
> > error
> 
> 00:11 ichi:/hld # file /proc/kcore
> /proc/kcore: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
> SVR4-style, from 'vmlinux'
> 00:11 ichi:/hld # hexdump -C /proc/kcore | head -n5
> 00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
> 00000010  04 00 03 00 01 00 00 00  00 00 00 00 34 00 00 00  |............4...|
> 00000020  00 00 00 00 00 00 00 00  34 00 20 00 03 00 00 00  |........4. .....|
> 00000030  00 00 00 00 04 00 00 00  94 00 00 00 00 00 00 00  |................|
> 00000040  00 00 00 00 a8 06 00 00  00 00 00 00 00 00 00 00  |................|
> 00:11 ichi:/hld # uname -rm
> 2.6.20.2 i686
> 
> >
> >
> > Apparently we can not llseek() anymore on this file (returns -EINVAL)
> >
> > On x86_64 2.6.20 it's working
> >
> > # file /proc/kcore
> > /proc/kcore: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style
> >
> >
> > On i386 2.6.14 it's working too.
> >
> > Eric
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> >
> 
> Jan

Hello, 

	I once had similar problem with /proc/kcore

	then gdb showed all zeros for all kernel memory,

	I had look at code , and I found that "Sparse memory model" was the problem, it doesn't say where kernel memory is 
	(I don't remember details now)
	I once choosed it just for experiment,  so I switched to Flat memory, and /proc/kcore works fine till then,

	I use 32 bit x86 kernel.

	Regards,
		Maxim Levitsky

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
       [not found]       ` <200703220024.25436.rjw@sisk.pl>
@ 2007-03-21 23:39         ` Maxim
  2007-03-21 23:44           ` Maxim
  2007-03-21 23:53           ` Rafael J. Wysocki
  0 siblings, 2 replies; 30+ messages in thread
From: Maxim @ 2007-03-21 23:39 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Pavel Machek

On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:09, Maxim wrote:
> > On Thursday 22 March 2007 00:39:02 you wrote:
> > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > Hi!
> > > > 
> > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > 
> > > > > I did a git-bisect and found that those commits break it:
> > > > > 
> > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > 
> > > > 
> > > > (Yep, it was in my "to analyze" queue).
> > > > 
> > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > 
> > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > > and it did work because those functions could be called only in process context
> > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > > 
> > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > 
> > > > 
> > > > Thanks for detailed explanation.
> > > > 
> > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > 
> > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > holds a lock, that's a bug.
> > > > 
> > > > > Or, it is also possible to revert this change.
> > > > 
> > > > Are you using xfs?
> > > 
> > > Well, this is the only case that can trigger it.  There are no other freezable
> > > workqueues.
> > > 
> > > Greetings,
> > > Rafael
> > > 
> > 
> > Hello,
> > 
> > 	Yes, you are right and it is XFS
> > 
> > 	System suspends and resumes with xfs and your patch correctly,
> 
> Could you please sent this information to the list?  I'd like it to reach all
> of the CCed parites. ;-)

I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
I didn't intend to send private email.
> 
> > 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > 	because it calls firmware loader helper, and again sleeps on lock
> 
> This is interesting.  Did it happen before or is it a regression?

It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
Of course it didn't happen before those reordering commits were introduced
> 
> > 	And also I noticed now that system oopses on second attempt to suspend ether to ram or disk
> > 	in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon.
> 
> Thanks.  We've had such reports earlier, but I think the problem is still unresolved.  Any
> additional information will be valuable.

I will do my best,
Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk
or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after suspend to ram system hang, 
so I assume, that this it is same problem , because it didn't hang of first try)

> 
> Greetings,
> Rafael
> 





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 23:39         ` Maxim
@ 2007-03-21 23:44           ` Maxim
  2007-03-21 23:53           ` Rafael J. Wysocki
  1 sibling, 0 replies; 30+ messages in thread
From: Maxim @ 2007-03-21 23:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Pavel Machek

On Thursday 22 March 2007 01:39:24 Maxim wrote:
> On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > Hi!
> > > > > 
> > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > > 
> > > > > > I did a git-bisect and found that those commits break it:
> > > > > > 
> > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > > 
> > > > > 
> > > > > (Yep, it was in my "to analyze" queue).
> > > > > 
> > > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > > 
> > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > > > and it did work because those functions could be called only in process context
> > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > > > 
> > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > > 
> > > > > 
> > > > > Thanks for detailed explanation.
> > > > > 
> > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > 
> > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > holds a lock, that's a bug.
> > > > > 
> > > > > > Or, it is also possible to revert this change.
> > > > > 
> > > > > Are you using xfs?
> > > > 
> > > > Well, this is the only case that can trigger it.  There are no other freezable
> > > > workqueues.
> > > > 
> > > > Greetings,
> > > > Rafael
> > > > 
> > > 
> > > Hello,
> > > 
> > > 	Yes, you are right and it is XFS
> > > 
> > > 	System suspends and resumes with xfs and your patch correctly,
> > 
> > Could you please sent this information to the list?  I'd like it to reach all
> > of the CCed parites. ;-)
> 
> I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
> I didn't intend to send private email.
> > 
> > > 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > > 	because it calls firmware loader helper, and again sleeps on lock
> > 
> > This is interesting.  Did it happen before or is it a regression?
> 
> It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
> Of course it didn't happen before those reordering commits were introduced
> > 
> > > 	And also I noticed now that system oopses on second attempt to suspend ether to ram or disk
> > > 	in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon.
> > 
> > Thanks.  We've had such reports earlier, but I think the problem is still unresolved.  Any
> > additional information will be valuable.
> 
> I will do my best,
> Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk
> or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after suspend to ram system hang, 
> so I assume, that this it is same problem , because it didn't hang of first try)
> 
> > 
> > Greetings,
> > Rafael
> > 
> 
> 
> 
> 
> 
 
And I forgot signature too (what happens with me today,...  please sorry)

	Regards
		Maxim Levitsky

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 21:38   ` Rafael J. Wysocki
@ 2007-03-21 23:47     ` Nigel Cunningham
  2007-03-22  0:25       ` Maxim
  2007-03-22  4:51     ` David Chinner
  1 sibling, 1 reply; 30+ messages in thread
From: Nigel Cunningham @ 2007-03-21 23:47 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Rafael J. Wysocki, linux-kernel, Pavel Machek, David Chinner

Hi.

On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote:
> > Do you know exactly which mutex was being waited on and where it was
> > taken? If you can say that, it would be much more helpful.

Yeah, me too, but assuming too much sometimes bites me :)

> I think this is the XFS problem with freezable workqueues.
> 
> Maxim, please try to apply the appended patch and see if it helps.

Thanks for your subsequent messages, Maxim. Could you confirm for us
that the patch Rafael attached fixes it?

Regards,

Nigel

> ---
> Since freezable workqueues are broken in 2.6.21-rc
> (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
> it's better to remove them altogether for 2.6.21 and change the only user of
> them (XFS) accordingly.
> 
> ---
>  fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> ===================================================================
> --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
>  	if (!xfs_buf_zone)
>  		goto out_free_trace_buf;
>  
> -	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> +	xfslogd_workqueue = create_workqueue("xfslogd");
>  	if (!xfslogd_workqueue)
>  		goto out_free_buf_zone;
>  
> -	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> +	xfsdatad_workqueue = create_workqueue("xfsdatad");
>  	if (!xfsdatad_workqueue)
>  		goto out_destroy_xfslogd_workqueue;
>  


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] : Is /proc/kcore still usefull and/or maintained ?
  2007-03-21 23:28         ` Maxim
@ 2007-03-21 23:53           ` Eric Dumazet
  2007-03-22  0:04             ` Maxim
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2007-03-21 23:53 UTC (permalink / raw)
  To: Maxim, Andrew Morton
  Cc: Jan Engelhardt, Andi Kleen, David Howells, Jeremy Fitzhardinge,
	linux-kernel

I stand corrected : This is a new bug

The /proc/kcore problem appears with linux-2.6.21-rc4-mm1

fd = open("/proc/kcore", 0);
llseek(fd, ...) returns an -EINVAL error


Quick code inspection (before going to sleep...) shows that

proc_reg_llseek() (file fs/proc/inode.c)

is doing something like :

rv = -EINVAL;
llseek = pde->proc_fops->llseek;
spin_unlock(&pde->pde_unload_lock);
if (llseek)
	rv = llseek(file, offset, whence);

As kcore dont have a .llseek handler, proc_reg_llseek() returns -EINVAL;

Previous kernel was probably calling a default llseek() handler.

if (!llseek)
	llseek = default_llseek;

Hum ???

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 23:39         ` Maxim
  2007-03-21 23:44           ` Maxim
@ 2007-03-21 23:53           ` Rafael J. Wysocki
  2007-03-22  0:01             ` Maxim
  2007-03-22 23:30             ` Rafael J. Wysocki
  1 sibling, 2 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-21 23:53 UTC (permalink / raw)
  To: Maxim; +Cc: linux-kernel, Pavel Machek

On Thursday, 22 March 2007 00:39, Maxim wrote:
> On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > Hi!
> > > > > 
> > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > > 
> > > > > > I did a git-bisect and found that those commits break it:
> > > > > > 
> > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > > 
> > > > > 
> > > > > (Yep, it was in my "to analyze" queue).
> > > > > 
> > > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > > 
> > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > > > and it did work because those functions could be called only in process context
> > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > > > 
> > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > > 
> > > > > 
> > > > > Thanks for detailed explanation.
> > > > > 
> > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > 
> > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > holds a lock, that's a bug.
> > > > > 
> > > > > > Or, it is also possible to revert this change.
> > > > > 
> > > > > Are you using xfs?
> > > > 
> > > > Well, this is the only case that can trigger it.  There are no other freezable
> > > > workqueues.
> > > > 
> > > > Greetings,
> > > > Rafael
> > > > 
> > > 
> > > Hello,
> > > 
> > > 	Yes, you are right and it is XFS
> > > 
> > > 	System suspends and resumes with xfs and your patch correctly,
> > 
> > Could you please sent this information to the list?  I'd like it to reach all
> > of the CCed parites. ;-)
> 
> I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
> I didn't intend to send private email.
> > 
> > > 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > > 	because it calls firmware loader helper, and again sleeps on lock
> > 
> > This is interesting.  Did it happen before or is it a regression?
> 
> It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
> Of course it didn't happen before those reordering commits were introduced

Well, we want cpu_up/down to be called after processes have been frozen, for
various reasons (one of them being that applications shouldn't see us playing
with the CPUs).

Thanks for reporting this, I'll have a look at the microcode update driver.

> > > 	And also I noticed now that system oopses on second attempt to suspend ether to ram or disk
> > > 	in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon.
> > 
> > Thanks.  We've had such reports earlier, but I think the problem is still unresolved.  Any
> > additional information will be valuable.
> 
> I will do my best,
> Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk
> or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after
> suspend to ram system hang, so I assume, that this it is same problem , because it didn't hang of first try)

Thanks for the information.

BTW, what's the last kernel you have tested?

Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 23:53           ` Rafael J. Wysocki
@ 2007-03-22  0:01             ` Maxim
  2007-03-22 23:30             ` Rafael J. Wysocki
  1 sibling, 0 replies; 30+ messages in thread
From: Maxim @ 2007-03-22  0:01 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Pavel Machek

On Thursday 22 March 2007 01:53:54 Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:39, Maxim wrote:
> > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > Hi!
> > > > > > 
> > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > > > 
> > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > 
> > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > > > 
> > > > > > 
> > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > 
> > > > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > > > 
> > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > > > > and it did work because those functions could be called only in process context
> > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > > > > 
> > > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > > > 
> > > > > > 
> > > > > > Thanks for detailed explanation.
> > > > > > 
> > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > > 
> > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > > holds a lock, that's a bug.
> > > > > > 
> > > > > > > Or, it is also possible to revert this change.
> > > > > > 
> > > > > > Are you using xfs?
> > > > > 
> > > > > Well, this is the only case that can trigger it.  There are no other freezable
> > > > > workqueues.
> > > > > 
> > > > > Greetings,
> > > > > Rafael
> > > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > 	Yes, you are right and it is XFS
> > > > 
> > > > 	System suspends and resumes with xfs and your patch correctly,
> > > 
> > > Could you please sent this information to the list?  I'd like it to reach all
> > > of the CCed parites. ;-)
> > 
> > I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
> > I didn't intend to send private email.
> > > 
> > > > 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > > > 	because it calls firmware loader helper, and again sleeps on lock
> > > 
> > > This is interesting.  Did it happen before or is it a regression?
> > 
> > It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
> > Of course it didn't happen before those reordering commits were introduced
> 
> Well, we want cpu_up/down to be called after processes have been frozen, for
> various reasons (one of them being that applications shouldn't see us playing
> with the CPUs).
> 
> Thanks for reporting this, I'll have a look at the microcode update driver.
> 
> > > > 	And also I noticed now that system oopses on second attempt to suspend ether to ram or disk
> > > > 	in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon.
> > > 
> > > Thanks.  We've had such reports earlier, but I think the problem is still unresolved.  Any
> > > additional information will be valuable.
> > 
> > I will do my best,
> > Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk
> > or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after
> > suspend to ram system hang, so I assume, that this it is same problem , because it didn't hang of first try)
> 
> Thanks for the information.
> 
> BTW, what's the last kernel you have tested?
> 
> Rafael
> 

Hello, 
	Thanks for quick response, 
	I will continue to test my system

I use literally latest Linus's git tree.

The kernel that works is 2.6.20 , and except very weird hang that happens sometimes (1 in 5~6 times) on resume from ram, everything works
I described it in http://lkml.org/lkml/2007/3/16/126

	Regards,
		Maxim Levitsky

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] : Is /proc/kcore still usefull and/or maintained ?
  2007-03-21 23:53           ` Eric Dumazet
@ 2007-03-22  0:04             ` Maxim
  2007-03-22  6:35               ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim @ 2007-03-22  0:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Jan Engelhardt, Andi Kleen, David Howells,
	Jeremy Fitzhardinge, linux-kernel

On Thursday 22 March 2007 01:53:10 Eric Dumazet wrote:
> I stand corrected : This is a new bug
> 
> The /proc/kcore problem appears with linux-2.6.21-rc4-mm1
> 
> fd = open("/proc/kcore", 0);
> llseek(fd, ...) returns an -EINVAL error
> 
> 
> Quick code inspection (before going to sleep...) shows that
> 
> proc_reg_llseek() (file fs/proc/inode.c)
> 
> is doing something like :
> 
> rv = -EINVAL;
> llseek = pde->proc_fops->llseek;
> spin_unlock(&pde->pde_unload_lock);
> if (llseek)
> 	rv = llseek(file, offset, whence);
> 
> As kcore dont have a .llseek handler, proc_reg_llseek() returns -EINVAL;
> 
> Previous kernel was probably calling a default llseek() handler.
> 
> if (!llseek)
> 	llseek = default_llseek;
> 
> Hum ???
> 

Hi,
	Yes, you are right, you have different problem that I had

	But why do you need llseek ?

	Why not to mmap it ?
	It is natural thing to do with files that represent memory.

	Regards,
		Maxim Levitsky

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 23:47     ` Nigel Cunningham
@ 2007-03-22  0:25       ` Maxim
  0 siblings, 0 replies; 30+ messages in thread
From: Maxim @ 2007-03-22  0:25 UTC (permalink / raw)
  To: nigel; +Cc: Rafael J. Wysocki, linux-kernel, Pavel Machek, David Chinner

On Thursday 22 March 2007 01:47:05 Nigel Cunningham wrote:
> Hi.
> 
> On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote:
> > > Do you know exactly which mutex was being waited on and where it was
> > > taken? If you can say that, it would be much more helpful.
> 
> Yeah, me too, but assuming too much sometimes bites me :)
> 
> > I think this is the XFS problem with freezable workqueues.
> > 
> > Maxim, please try to apply the appended patch and see if it helps.
> 
> Thanks for your subsequent messages, Maxim. Could you confirm for us
> that the patch Rafael attached fixes it?
> 
> Regards,
> 
> Nigel
> 
> > ---
> > Since freezable workqueues are broken in 2.6.21-rc
> > (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
> > it's better to remove them altogether for 2.6.21 and change the only user of
> > them (XFS) accordingly.
> > 
> > ---
> >  fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > ===================================================================
> > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> >  	if (!xfs_buf_zone)
> >  		goto out_free_trace_buf;
> >  
> > -	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > +	xfslogd_workqueue = create_workqueue("xfslogd");
> >  	if (!xfslogd_workqueue)
> >  		goto out_free_buf_zone;
> >  
> > -	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > +	xfsdatad_workqueue = create_workqueue("xfsdatad");
> >  	if (!xfsdatad_workqueue)
> >  		goto out_destroy_xfslogd_workqueue;
> >  
> 
> 

Hello,

I can confirm now that the above patch work,

First as I said I did try to suspend with this patch and without XFS, and it did work,
Now I reverted it and now system still suspends correctly without xfs module loaded ( I didn't tell you that i use now ext3,
and that I generally compile everything in kernel, so i put XFS too, because I used it once, and I still have a XFS disk image)

But system hangs with XFS loaded, so this patch works.

Regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
       [not found]   ` <200703220114.05228.maximlevitsky@gmail.com>
  2007-03-21 23:16     ` Maxim
@ 2007-03-22  0:32     ` Maxim
  1 sibling, 0 replies; 30+ messages in thread
From: Maxim @ 2007-03-22  0:32 UTC (permalink / raw)
  To: nigel; +Cc: Rafael J. Wysocki, linux-kernel

On Thursday 22 March 2007 01:14:05 Maxim wrote:
> On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote:
> > Hi.
> > 
> > On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> > > Hi,
> > > 
> > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > 
> > > I did a git-bisect and found that those commits break it:
> > > 
> > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > 
> > > I already reported about it, but now i know the reason why suspend breaks.
> > > 
> > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > and it did work because those functions could be called only in process context
> > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > 
> > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > 
> > > I tested this and all results are positive:
> > > 
> > > I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> > > Suspend to disk went correctly, but it hang on resume, and I know why.
> > > It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> > > 
> > > I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that
> > > idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback.
> > > 
> > > The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier,
> > > to ensure that it doesn't sleep. 
> > > Also documentation should be changed to note about it.
> > > 
> > > Or, it is also possible to revert this change.
> > 
> > Do you know exactly which mutex was being waited on and where it was
> > taken? If you can say that, it would be much more helpful.
> > 
> > Regards,
> > 
> > Nigel
> > 
> > 
> 
> Hello,
> 
> 	It is workqueue_mutex
> 	and it is taken in kernel/workqueue.c:797
> 
> 	this is guilt of freezable work queues , and XFS uses it (and I use XFS)
> 
> 	Thanks to Rafael J. Wysocki for pointing it out to me.
> 
> 	Regards,
> 		Maxim Levitsky
> 


I think that I made a mistake,

I now reverted the patch that fixes the above error, and I wrote down back-trace from this hang,
and it appears that system hangs in kthread_stop:

This I have written in paper, using kdb:

workqueue_cpu_callback ->
cleanup_workqueue_thread ->
kthread_stop ->

and then wake_up_process (I think , I didn't wrote this on paper, I will check again)


	Regards,
		Maxim Levitsky

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 21:38   ` Rafael J. Wysocki
  2007-03-21 23:47     ` Nigel Cunningham
@ 2007-03-22  4:51     ` David Chinner
  2007-03-22  7:23       ` Rafael J. Wysocki
  1 sibling, 1 reply; 30+ messages in thread
From: David Chinner @ 2007-03-22  4:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: nigel, Maxim Levitsky, linux-kernel, Pavel Machek, David Chinner

On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> 
> I think this is the XFS problem with freezable workqueues.
> 
> Maxim, please try to apply the appended patch and see if it helps.
> 
> Greetings,
> Rafael
> 
> 
> ---
> Since freezable workqueues are broken in 2.6.21-rc
> (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
> it's better to remove them altogether for 2.6.21 and change the only user of
> them (XFS) accordingly.
> 
> ---
>  fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> ===================================================================
> --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
>  	if (!xfs_buf_zone)
>  		goto out_free_trace_buf;
>  
> -	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> +	xfslogd_workqueue = create_workqueue("xfslogd");
>  	if (!xfslogd_workqueue)
>  		goto out_free_buf_zone;
>  
> -	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> +	xfsdatad_workqueue = create_workqueue("xfsdatad");
>  	if (!xfsdatad_workqueue)
>  		goto out_destroy_xfslogd_workqueue;
>  

Acked-by: Dave Chinner <dgc@sgi.com>

Rafael, it sounds like this really needs to go into the next -rc
kernel. Can you push it to Linus?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] : Is /proc/kcore still usefull and/or maintained ?
  2007-03-22  0:04             ` Maxim
@ 2007-03-22  6:35               ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2007-03-22  6:35 UTC (permalink / raw)
  To: Maxim
  Cc: Andrew Morton, Jan Engelhardt, Andi Kleen, David Howells,
	Jeremy Fitzhardinge, linux-kernel

On Thu, 22 Mar 2007 02:04:50 +0200
Maxim <maximlevitsky@gmail.com> wrote:


> Hi,
> 	Yes, you are right, you have different problem that I had
> 
> 	But why do you need llseek ?

I dont personnaly, but tools do need llseek.

> 
> 	Why not to mmap it ?
> 	It is natural thing to do with files that represent memory.

Because I dont want to rewrite gdb, file, and other various tools ?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-22  4:51     ` David Chinner
@ 2007-03-22  7:23       ` Rafael J. Wysocki
  2007-03-22  7:31         ` Andrew Morton
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-22  7:23 UTC (permalink / raw)
  To: David Chinner, Andrew Morton
  Cc: nigel, Maxim Levitsky, linux-kernel, Pavel Machek

On Thursday, 22 March 2007 05:51, David Chinner wrote:
> On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> > 
> > I think this is the XFS problem with freezable workqueues.
> > 
> > Maxim, please try to apply the appended patch and see if it helps.
> > 
> > Greetings,
> > Rafael
> > 
> > 
> > ---
> > Since freezable workqueues are broken in 2.6.21-rc
> > (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
> > it's better to remove them altogether for 2.6.21 and change the only user of
> > them (XFS) accordingly.
> > 
> > ---
> >  fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > ===================================================================
> > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> >  	if (!xfs_buf_zone)
> >  		goto out_free_trace_buf;
> >  
> > -	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > +	xfslogd_workqueue = create_workqueue("xfslogd");
> >  	if (!xfslogd_workqueue)
> >  		goto out_free_buf_zone;
> >  
> > -	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > +	xfsdatad_workqueue = create_workqueue("xfsdatad");
> >  	if (!xfsdatad_workqueue)
> >  		goto out_destroy_xfslogd_workqueue;
> >  
> 
> Acked-by: Dave Chinner <dgc@sgi.com>
> 
> Rafael, it sounds like this really needs to go into the next -rc
> kernel. Can you push it to Linus?

I tried, but Andrew resent it to you. :-)

Andrew, could we please merge it?

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-22  7:23       ` Rafael J. Wysocki
@ 2007-03-22  7:31         ` Andrew Morton
  2007-03-22  8:17           ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2007-03-22  7:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Chinner, nigel, Maxim Levitsky, linux-kernel, Pavel Machek

On Thu, 22 Mar 2007 08:23:39 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Thursday, 22 March 2007 05:51, David Chinner wrote:
> > On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> > > 
> > > I think this is the XFS problem with freezable workqueues.
> > > 
> > > Maxim, please try to apply the appended patch and see if it helps.
> > > 
> > > Greetings,
> > > Rafael
> > > 
> > > 
> > > ---
> > > Since freezable workqueues are broken in 2.6.21-rc
> > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
> > > it's better to remove them altogether for 2.6.21 and change the only user of
> > > them (XFS) accordingly.
> > > 
> > > ---
> > >  fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > ===================================================================
> > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > >  	if (!xfs_buf_zone)
> > >  		goto out_free_trace_buf;
> > >  
> > > -	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > > +	xfslogd_workqueue = create_workqueue("xfslogd");
> > >  	if (!xfslogd_workqueue)
> > >  		goto out_free_buf_zone;
> > >  
> > > -	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > > +	xfsdatad_workqueue = create_workqueue("xfsdatad");
> > >  	if (!xfsdatad_workqueue)
> > >  		goto out_destroy_xfslogd_workqueue;
> > >  
> > 
> > Acked-by: Dave Chinner <dgc@sgi.com>
> > 
> > Rafael, it sounds like this really needs to go into the next -rc
> > kernel. Can you push it to Linus?
> 
> I tried, but Andrew resent it to you. :-)
> 
> Andrew, could we please merge it?

Sure.

It shouldn't be this hard :(

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-22  7:31         ` Andrew Morton
@ 2007-03-22  8:17           ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-22  8:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Chinner, nigel, Maxim Levitsky, linux-kernel, Pavel Machek

On Thursday, 22 March 2007 08:31, Andrew Morton wrote:
> On Thu, 22 Mar 2007 08:23:39 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Thursday, 22 March 2007 05:51, David Chinner wrote:
> > > On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> > > > 
> > > > I think this is the XFS problem with freezable workqueues.
> > > > 
> > > > Maxim, please try to apply the appended patch and see if it helps.
> > > > 
> > > > Greetings,
> > > > Rafael
> > > > 
> > > > 
> > > > ---
> > > > Since freezable workqueues are broken in 2.6.21-rc
> > > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
> > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
> > > > it's better to remove them altogether for 2.6.21 and change the only user of
> > > > them (XFS) accordingly.
> > > > 
> > > > ---
> > > >  fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > > ===================================================================
> > > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > > >  	if (!xfs_buf_zone)
> > > >  		goto out_free_trace_buf;
> > > >  
> > > > -	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > > > +	xfslogd_workqueue = create_workqueue("xfslogd");
> > > >  	if (!xfslogd_workqueue)
> > > >  		goto out_free_buf_zone;
> > > >  
> > > > -	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > > > +	xfsdatad_workqueue = create_workqueue("xfsdatad");
> > > >  	if (!xfsdatad_workqueue)
> > > >  		goto out_destroy_xfslogd_workqueue;
> > > >  
> > > 
> > > Acked-by: Dave Chinner <dgc@sgi.com>
> > > 
> > > Rafael, it sounds like this really needs to go into the next -rc
> > > kernel. Can you push it to Linus?
> > 
> > I tried, but Andrew resent it to you. :-)
> > 
> > Andrew, could we please merge it?
> 
> Sure.
> 
> It shouldn't be this hard :(

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-21 23:53           ` Rafael J. Wysocki
  2007-03-22  0:01             ` Maxim
@ 2007-03-22 23:30             ` Rafael J. Wysocki
  2007-03-23 14:42               ` Rafael J. Wysocki
  1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-22 23:30 UTC (permalink / raw)
  To: Maxim; +Cc: linux-kernel, Pavel Machek, Tigran Aivazian

On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:39, Maxim wrote:
> > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > Hi!
> > > > > > 
> > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > > > 
> > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > 
> > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > > > 
> > > > > > 
> > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > 
> > > > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > > > 
> > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > > > > and it did work because those functions could be called only in process context
> > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > > > > 
> > > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > > > 
> > > > > > 
> > > > > > Thanks for detailed explanation.
> > > > > > 
> > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > > 
> > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > > holds a lock, that's a bug.
> > > > > > 
> > > > > > > Or, it is also possible to revert this change.
> > > > > > 
> > > > > > Are you using xfs?
> > > > > 
> > > > > Well, this is the only case that can trigger it.  There are no other freezable
> > > > > workqueues.
> > > > > 
> > > > > Greetings,
> > > > > Rafael
> > > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > 	Yes, you are right and it is XFS
> > > > 
> > > > 	System suspends and resumes with xfs and your patch correctly,
> > > 
> > > Could you please sent this information to the list?  I'd like it to reach all
> > > of the CCed parites. ;-)
> > 
> > I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
> > I didn't intend to send private email.
> > > 
> > > > 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > > > 	because it calls firmware loader helper, and again sleeps on lock
> > > 
> > > This is interesting.  Did it happen before or is it a regression?
> > 
> > It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
> > Of course it didn't happen before those reordering commits were introduced
> 
> Well, we want cpu_up/down to be called after processes have been frozen, for
> various reasons (one of them being that applications shouldn't see us playing
> with the CPUs).
> 
> Thanks for reporting this, I'll have a look at the microcode update driver.

Well, I have invented the appended workaround, but I'm not sure how much
sense it makes with respect to the microcode driver.  At least, it doesn't
break my AMD64 SMP setup. ;-)

Greetings,
Rafael

---
 arch/i386/kernel/microcode.c |   23 +++++++++++++++++++++--
 include/linux/cpu.h          |    2 ++
 kernel/cpu.c                 |   20 +++++++++++++++-----
 3 files changed, 38 insertions(+), 7 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c	2007-03-16 21:51:23.000000000 +0100
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c	2007-03-22 23:55:40.000000000 +0100
@@ -703,6 +703,16 @@ static struct sysdev_driver mc_sysdev_dr
 	.resume = mc_sysdev_resume,
 };
 
+static __cpuinit void apply_microcode_on_cpu(int cpu)
+{
+	cpumask_t old;
+
+	old = current->cpus_allowed;
+	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	apply_microcode(cpu);
+	set_cpus_allowed(current, old);
+}
+
 static __cpuinit int
 mc_cpu_callback(struct notifier_block *nb, unsigned long action, void *hcpu)
 {
@@ -713,10 +723,19 @@ mc_cpu_callback(struct notifier_block *n
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
-		mc_sysdev_add(sys_dev);
+		if (suspend_cpu_hotplug)
+			/* This means we have been called during the resume.
+			 * This is not the real CPU hotplug and we don't need
+			 * to register the sysdev
+			 */
+			apply_microcode_on_cpu(cpu);
+		else
+			mc_sysdev_add(sys_dev);
 		break;
 	case CPU_DOWN_PREPARE:
-		mc_sysdev_remove(sys_dev);
+		/* During the suspend there is no need to remove the sysdev */
+		if (!suspend_cpu_hotplug)
+			mc_sysdev_remove(sys_dev);
 		break;
 	}
 	return NOTIFY_OK;
Index: linux-2.6.21-rc4/include/linux/cpu.h
===================================================================
--- linux-2.6.21-rc4.orig/include/linux/cpu.h	2007-03-16 21:51:34.000000000 +0100
+++ linux-2.6.21-rc4/include/linux/cpu.h	2007-03-22 23:55:54.000000000 +0100
@@ -127,6 +127,8 @@ static inline int cpu_is_offline(int cpu
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_SUSPEND_SMP
+extern int suspend_cpu_hotplug;
+
 extern int disable_nonboot_cpus(void);
 extern void enable_nonboot_cpus(void);
 #else
Index: linux-2.6.21-rc4/kernel/cpu.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/cpu.c	2007-03-16 21:51:35.000000000 +0100
+++ linux-2.6.21-rc4/kernel/cpu.c	2007-03-23 00:05:41.000000000 +0100
@@ -254,6 +254,12 @@ int __cpuinit cpu_up(unsigned int cpu)
 }
 
 #ifdef CONFIG_SUSPEND_SMP
+/* Needed to prevent the microcode driver from playing with sysfs in its CPU
+ * hotplug notifier.
+ */
+int suspend_cpu_hotplug;
+EXPORT_SYMBOL(suspend_cpu_hotplug);
+
 static cpumask_t frozen_cpus;
 
 int disable_nonboot_cpus(void)
@@ -261,6 +267,7 @@ int disable_nonboot_cpus(void)
 	int cpu, first_cpu, error = 0;
 
 	mutex_lock(&cpu_add_remove_lock);
+	suspend_cpu_hotplug = 1;
 	first_cpu = first_cpu(cpu_present_map);
 	if (!cpu_online(first_cpu)) {
 		error = _cpu_up(first_cpu);
@@ -297,6 +304,7 @@ int disable_nonboot_cpus(void)
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
 out:
+	suspend_cpu_hotplug = 0;
 	mutex_unlock(&cpu_add_remove_lock);
 	return error;
 }
@@ -308,20 +316,22 @@ void enable_nonboot_cpus(void)
 	/* Allow everyone to use the CPU hotplug again */
 	mutex_lock(&cpu_add_remove_lock);
 	cpu_hotplug_disabled = 0;
-	mutex_unlock(&cpu_add_remove_lock);
 	if (cpus_empty(frozen_cpus))
-		return;
+		goto out;
 
+	suspend_cpu_hotplug = 1;
 	printk("Enabling non-boot CPUs ...\n");
 	for_each_cpu_mask(cpu, frozen_cpus) {
-		error = cpu_up(cpu);
+		error = _cpu_up(cpu);
 		if (!error) {
 			printk("CPU%d is up\n", cpu);
 			continue;
 		}
-		printk(KERN_WARNING "Error taking CPU%d up: %d\n",
-			cpu, error);
+		printk(KERN_WARNING "Error taking CPU%d up: %d\n", cpu, error);
 	}
 	cpus_clear(frozen_cpus);
+	suspend_cpu_hotplug = 0;
+out:
+	mutex_unlock(&cpu_add_remove_lock);
 }
 #endif

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-22 23:30             ` Rafael J. Wysocki
@ 2007-03-23 14:42               ` Rafael J. Wysocki
  2007-03-25  0:40                 ` Maxim
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-23 14:42 UTC (permalink / raw)
  To: Maxim; +Cc: linux-kernel, Pavel Machek, Tigran Aivazian

On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > > Hi!
> > > > > > > 
> > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > > > > 
> > > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > > 
> > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > > > > 
> > > > > > > 
> > > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > > 
> > > > > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > > > > 
> > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > > > > > and it did work because those functions could be called only in process context
> > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > > > > > 
> > > > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > > > > 
> > > > > > > 
> > > > > > > Thanks for detailed explanation.
> > > > > > > 
> > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > > > 
> > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > > > holds a lock, that's a bug.
> > > > > > > 
> > > > > > > > Or, it is also possible to revert this change.
> > > > > > > 
> > > > > > > Are you using xfs?
> > > > > > 
> > > > > > Well, this is the only case that can trigger it.  There are no other freezable
> > > > > > workqueues.
> > > > > > 
> > > > > > Greetings,
> > > > > > Rafael
> > > > > > 
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > 	Yes, you are right and it is XFS
> > > > > 
> > > > > 	System suspends and resumes with xfs and your patch correctly,
> > > > 
> > > > Could you please sent this information to the list?  I'd like it to reach all
> > > > of the CCed parites. ;-)
> > > 
> > > I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
> > > I didn't intend to send private email.
> > > > 
> > > > > 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > > > > 	because it calls firmware loader helper, and again sleeps on lock
> > > > 
> > > > This is interesting.  Did it happen before or is it a regression?
> > > 
> > > It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
> > > Of course it didn't happen before those reordering commits were introduced
> > 
> > Well, we want cpu_up/down to be called after processes have been frozen, for
> > various reasons (one of them being that applications shouldn't see us playing
> > with the CPUs).
> > 
> > Thanks for reporting this, I'll have a look at the microcode update driver.
> 
> Well, I have invented the appended workaround, but I'm not sure how much
> sense it makes with respect to the microcode driver.  At least, it doesn't
> break my AMD64 SMP setup. ;-)

Modified version of the patch is appended.  Unfortunately I have no hardware
supporting the microcode updates.

Greetings,
Rafael


---
 arch/i386/kernel/microcode.c |   28 +++++++++++++++++++++++++---
 include/linux/cpu.h          |    2 ++
 kernel/cpu.c                 |   32 ++++++++++++++++----------------
 3 files changed, 43 insertions(+), 19 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
@@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu
 	return error;
 }
 
+static void apply_microcode_on_cpu(int cpu)
+{
+	cpumask_t old;
+
+	old = current->cpus_allowed;
+	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+	apply_microcode(cpu);
+	set_cpus_allowed(current, old);
+}
+
 static void microcode_init_cpu(int cpu)
 {
 	cpumask_t old;
@@ -663,13 +673,21 @@ static int mc_sysdev_add(struct sys_devi
 		return 0;
 
 	pr_debug("Microcode:CPU %d added\n", cpu);
-	memset(uci, 0, sizeof(*uci));
+	/* If suspend_cpu_hotplug is set, the system is resuming and we should
+	 * use the data from before the suspend.
+	 */
+	if (!suspend_cpu_hotplug)
+		memset(uci, 0, sizeof(*uci));
 
 	err = sysfs_create_group(&sys_dev->kobj, &mc_attr_group);
 	if (err)
 		return err;
 
-	microcode_init_cpu(cpu);
+	if (suspend_cpu_hotplug && uci->valid)
+		apply_microcode_on_cpu(cpu);
+	else
+		microcode_init_cpu(cpu);
+
 	return 0;
 }
 
@@ -680,7 +698,11 @@ static int mc_sysdev_remove(struct sys_d
 	if (!cpu_online(cpu))
 		return 0;
 	pr_debug("Microcode:CPU %d removed\n", cpu);
-	microcode_fini_cpu(cpu);
+	/* If suspend_cpu_hotplug is set, the system is suspending and we should
+	 * keep the microcode in memory for the resume.
+	 */
+	if (!suspend_cpu_hotplug)
+		microcode_fini_cpu(cpu);
 	sysfs_remove_group(&sys_dev->kobj, &mc_attr_group);
 	return 0;
 }
Index: linux-2.6.21-rc4/include/linux/cpu.h
===================================================================
--- linux-2.6.21-rc4.orig/include/linux/cpu.h
+++ linux-2.6.21-rc4/include/linux/cpu.h
@@ -127,6 +127,8 @@ static inline int cpu_is_offline(int cpu
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_SUSPEND_SMP
+extern int suspend_cpu_hotplug;
+
 extern int disable_nonboot_cpus(void);
 extern void enable_nonboot_cpus(void);
 #else
Index: linux-2.6.21-rc4/kernel/cpu.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/cpu.c
+++ linux-2.6.21-rc4/kernel/cpu.c
@@ -254,6 +254,12 @@ int __cpuinit cpu_up(unsigned int cpu)
 }
 
 #ifdef CONFIG_SUSPEND_SMP
+/* Needed to prevent the microcode driver from requesting firmware in its CPU
+ * hotplug notifier during the suspend/resume.
+ */
+int suspend_cpu_hotplug;
+EXPORT_SYMBOL(suspend_cpu_hotplug);
+
 static cpumask_t frozen_cpus;
 
 int disable_nonboot_cpus(void)
@@ -261,16 +267,8 @@ int disable_nonboot_cpus(void)
 	int cpu, first_cpu, error = 0;
 
 	mutex_lock(&cpu_add_remove_lock);
-	first_cpu = first_cpu(cpu_present_map);
-	if (!cpu_online(first_cpu)) {
-		error = _cpu_up(first_cpu);
-		if (error) {
-			printk(KERN_ERR "Could not bring CPU%d up.\n",
-				first_cpu);
-			goto out;
-		}
-	}
-
+	suspend_cpu_hotplug = 1;
+	first_cpu = first_cpu(cpu_online_map);
 	/* We take down all of the non-boot CPUs in one shot to avoid races
 	 * with the userspace trying to use the CPU hotplug at the same time
 	 */
@@ -296,7 +294,7 @@ int disable_nonboot_cpus(void)
 	} else {
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
-out:
+	suspend_cpu_hotplug = 0;
 	mutex_unlock(&cpu_add_remove_lock);
 	return error;
 }
@@ -308,20 +306,22 @@ void enable_nonboot_cpus(void)
 	/* Allow everyone to use the CPU hotplug again */
 	mutex_lock(&cpu_add_remove_lock);
 	cpu_hotplug_disabled = 0;
-	mutex_unlock(&cpu_add_remove_lock);
 	if (cpus_empty(frozen_cpus))
-		return;
+		goto out;
 
+	suspend_cpu_hotplug = 1;
 	printk("Enabling non-boot CPUs ...\n");
 	for_each_cpu_mask(cpu, frozen_cpus) {
-		error = cpu_up(cpu);
+		error = _cpu_up(cpu);
 		if (!error) {
 			printk("CPU%d is up\n", cpu);
 			continue;
 		}
-		printk(KERN_WARNING "Error taking CPU%d up: %d\n",
-			cpu, error);
+		printk(KERN_WARNING "Error taking CPU%d up: %d\n", cpu, error);
 	}
 	cpus_clear(frozen_cpus);
+	suspend_cpu_hotplug = 0;
+out:
+	mutex_unlock(&cpu_add_remove_lock);
 }
 #endif

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-23 14:42               ` Rafael J. Wysocki
@ 2007-03-25  0:40                 ` Maxim
  2007-03-25 12:13                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim @ 2007-03-25  0:40 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Pavel Machek, Tigran Aivazian

On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > > > Hi!
> > > > > > > > 
> > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system.
> > > > > > > > > 
> > > > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > > > 
> > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c
> > > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c
> > > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > > > 
> > > > > > > > > I already reported about it, but now i know the reason why suspend breaks.
> > > > > > > > > 
> > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > > > > > > > and it did work because those functions could be called only in process context
> > > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  that does smp_init()).
> > > > > > > > > 
> > > > > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock
> > > > > > > > > that is taken by different process, it can't since the different proccess is frozen and can't release the lock.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Thanks for detailed explanation.
> > > > > > > > 
> > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > > > > 
> > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > > > > holds a lock, that's a bug.
> > > > > > > > 
> > > > > > > > > Or, it is also possible to revert this change.
> > > > > > > > 
> > > > > > > > Are you using xfs?
> > > > > > > 
> > > > > > > Well, this is the only case that can trigger it.  There are no other freezable
> > > > > > > workqueues.
> > > > > > > 
> > > > > > > Greetings,
> > > > > > > Rafael
> > > > > > > 
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > 	Yes, you are right and it is XFS
> > > > > > 
> > > > > > 	System suspends and resumes with xfs and your patch correctly,
> > > > > 
> > > > > Could you please sent this information to the list?  I'd like it to reach all
> > > > > of the CCed parites. ;-)
> > > > 
> > > > I did now ( sorry I just keep using this Answer command, instead of Answer to everybody)
> > > > I didn't intend to send private email.
> > > > > 
> > > > > > 	Of course I need to mention that I had to unload microcode update driver because it prevented resume,
> > > > > > 	because it calls firmware loader helper, and again sleeps on lock
> > > > > 
> > > > > This is interesting.  Did it happen before or is it a regression?
> > > > 
> > > > It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks
> > > > Of course it didn't happen before those reordering commits were introduced
> > > 
> > > Well, we want cpu_up/down to be called after processes have been frozen, for
> > > various reasons (one of them being that applications shouldn't see us playing
> > > with the CPUs).
> > > 
> > > Thanks for reporting this, I'll have a look at the microcode update driver.
> > 
> > Well, I have invented the appended workaround, but I'm not sure how much
> > sense it makes with respect to the microcode driver.  At least, it doesn't
> > break my AMD64 SMP setup. ;-)
> 
> Modified version of the patch is appended.  Unfortunately I have no hardware
> supporting the microcode updates.
> 
> Greetings,
> Rafael
> 
> 
> ---
>  arch/i386/kernel/microcode.c |   28 +++++++++++++++++++++++++---
>  include/linux/cpu.h          |    2 ++
>  kernel/cpu.c                 |   32 ++++++++++++++++----------------
>  3 files changed, 43 insertions(+), 19 deletions(-)
> 
> Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
> ===================================================================
> --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
> +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
> @@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu
>  	return error;
>  }
>  
> +static void apply_microcode_on_cpu(int cpu)
> +{
> +	cpumask_t old;
> +
> +	old = current->cpus_allowed;
> +	set_cpus_allowed(current, cpumask_of_cpu(cpu));
> +	apply_microcode(cpu);
> +	set_cpus_allowed(current, old);
> +}
> +
>  static void microcode_init_cpu(int cpu)
>  {
>  	cpumask_t old;
> @@ -663,13 +673,21 @@ static int mc_sysdev_add(struct sys_devi
>  		return 0;
>  
>  	pr_debug("Microcode:CPU %d added\n", cpu);
> -	memset(uci, 0, sizeof(*uci));
> +	/* If suspend_cpu_hotplug is set, the system is resuming and we should
> +	 * use the data from before the suspend.
> +	 */
> +	if (!suspend_cpu_hotplug)
> +		memset(uci, 0, sizeof(*uci));
>  
>  	err = sysfs_create_group(&sys_dev->kobj, &mc_attr_group);
>  	if (err)
>  		return err;
>  
> -	microcode_init_cpu(cpu);
> +	if (suspend_cpu_hotplug && uci->valid)
> +		apply_microcode_on_cpu(cpu);
> +	else
> +		microcode_init_cpu(cpu);
> +
>  	return 0;
>  }
>  
> @@ -680,7 +698,11 @@ static int mc_sysdev_remove(struct sys_d
>  	if (!cpu_online(cpu))
>  		return 0;
>  	pr_debug("Microcode:CPU %d removed\n", cpu);
> -	microcode_fini_cpu(cpu);
> +	/* If suspend_cpu_hotplug is set, the system is suspending and we should
> +	 * keep the microcode in memory for the resume.
> +	 */
> +	if (!suspend_cpu_hotplug)
> +		microcode_fini_cpu(cpu);
>  	sysfs_remove_group(&sys_dev->kobj, &mc_attr_group);
>  	return 0;
>  }
> Index: linux-2.6.21-rc4/include/linux/cpu.h
> ===================================================================
> --- linux-2.6.21-rc4.orig/include/linux/cpu.h
> +++ linux-2.6.21-rc4/include/linux/cpu.h
> @@ -127,6 +127,8 @@ static inline int cpu_is_offline(int cpu
>  #endif		/* CONFIG_HOTPLUG_CPU */
>  
>  #ifdef CONFIG_SUSPEND_SMP
> +extern int suspend_cpu_hotplug;
> +
>  extern int disable_nonboot_cpus(void);
>  extern void enable_nonboot_cpus(void);
>  #else
> Index: linux-2.6.21-rc4/kernel/cpu.c
> ===================================================================
> --- linux-2.6.21-rc4.orig/kernel/cpu.c
> +++ linux-2.6.21-rc4/kernel/cpu.c
> @@ -254,6 +254,12 @@ int __cpuinit cpu_up(unsigned int cpu)
>  }
>  
>  #ifdef CONFIG_SUSPEND_SMP
> +/* Needed to prevent the microcode driver from requesting firmware in its CPU
> + * hotplug notifier during the suspend/resume.
> + */
> +int suspend_cpu_hotplug;
> +EXPORT_SYMBOL(suspend_cpu_hotplug);
> +
>  static cpumask_t frozen_cpus;
>  
>  int disable_nonboot_cpus(void)
> @@ -261,16 +267,8 @@ int disable_nonboot_cpus(void)
>  	int cpu, first_cpu, error = 0;
>  
>  	mutex_lock(&cpu_add_remove_lock);
> -	first_cpu = first_cpu(cpu_present_map);
> -	if (!cpu_online(first_cpu)) {
> -		error = _cpu_up(first_cpu);
> -		if (error) {
> -			printk(KERN_ERR "Could not bring CPU%d up.\n",
> -				first_cpu);
> -			goto out;
> -		}
> -	}
> -
> +	suspend_cpu_hotplug = 1;
> +	first_cpu = first_cpu(cpu_online_map);
>  	/* We take down all of the non-boot CPUs in one shot to avoid races
>  	 * with the userspace trying to use the CPU hotplug at the same time
>  	 */
> @@ -296,7 +294,7 @@ int disable_nonboot_cpus(void)
>  	} else {
>  		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
>  	}
> -out:
> +	suspend_cpu_hotplug = 0;
>  	mutex_unlock(&cpu_add_remove_lock);
>  	return error;
>  }
> @@ -308,20 +306,22 @@ void enable_nonboot_cpus(void)
>  	/* Allow everyone to use the CPU hotplug again */
>  	mutex_lock(&cpu_add_remove_lock);
>  	cpu_hotplug_disabled = 0;
> -	mutex_unlock(&cpu_add_remove_lock);
>  	if (cpus_empty(frozen_cpus))
> -		return;
> +		goto out;
>  
> +	suspend_cpu_hotplug = 1;
>  	printk("Enabling non-boot CPUs ...\n");
>  	for_each_cpu_mask(cpu, frozen_cpus) {
> -		error = cpu_up(cpu);
> +		error = _cpu_up(cpu);
>  		if (!error) {
>  			printk("CPU%d is up\n", cpu);
>  			continue;
>  		}
> -		printk(KERN_WARNING "Error taking CPU%d up: %d\n",
> -			cpu, error);
> +		printk(KERN_WARNING "Error taking CPU%d up: %d\n", cpu, error);
>  	}
>  	cpus_clear(frozen_cpus);
> +	suspend_cpu_hotplug = 0;
> +out:
> +	mutex_unlock(&cpu_add_remove_lock);
>  }
>  #endif
> 

Hi,
	I confirm that the above patch works,

	At least system didn't hang on resume with microcode driver loaded,

	I can't really test whenever it did update microcode because I almost sure there is nothing to update
	(I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) )
	I selected this driver just in case when I compiled kernel.

	Regards,
		Maxim Levitsky

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-25  0:40                 ` Maxim
@ 2007-03-25 12:13                   ` Rafael J. Wysocki
  2007-03-25 15:10                     ` Maxim
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-25 12:13 UTC (permalink / raw)
  To: Maxim; +Cc: linux-kernel, Pavel Machek, Tigran Aivazian

On Sunday, 25 March 2007 01:40, Maxim wrote:
> On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
> Hi,
> 	I confirm that the above patch works,
> 
> 	At least system didn't hang on resume with microcode driver loaded,
> 
> 	I can't really test whenever it did update microcode because I almost sure there is nothing to update
> 	(I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) )
> 	I selected this driver just in case when I compiled kernel.

OK, thanks for testing.

In the meantime I've prepared the more sophisticated version of the patch that is
appended.  Could you please check if it still works for you?

Greetings,
Rafael


---
 arch/i386/kernel/microcode.c |   71 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/cpu.h          |    2 +
 kernel/cpu.c                 |   32 +++++++++----------
 3 files changed, 85 insertions(+), 20 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===================================================================
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c	2007-03-23 22:57:28.000000000 +0100
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c	2007-03-23 23:43:27.000000000 +0100
@@ -567,6 +567,53 @@ static int cpu_request_microcode(int cpu
 	return error;
 }
 
+static int apply_microcode_on_cpu(int cpu)
+{
+	struct cpuinfo_x86 *c = cpu_data + cpu;
+	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
+	cpumask_t old;
+	unsigned int val[2];
+	int err = 0;
+
+	if (!uci->mc)
+		return -EINVAL;
+
+	old = current->cpus_allowed;
+	set_cpus_allowed(current, cpumask_of_cpu(cpu));
+
+	/* Check if the microcode we have in memory matches the CPU */
+	if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 < 6 ||
+	    cpu_has(c, X86_FEATURE_IA64) || uci->sig != cpuid_eax(0x00000001))
+		err = -EINVAL;
+
+	if (!err && ((c->x86_model >= 5) || (c->x86 > 6))) {
+		/* get processor flags from MSR 0x17 */
+		rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+		if (uci->pf != (1 << ((val[1] >> 18) & 7)))
+			err = -EINVAL;
+	}
+
+	if (!err) {
+		wrmsr(MSR_IA32_UCODE_REV, 0, 0);
+		/* see notes above for revision 1.07.  Apparent chip bug */
+		sync_core();
+		/* get the current revision from MSR 0x8B */
+		rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+		if (uci->rev != val[1])
+			err = -EINVAL;
+	}
+
+	if (!err)
+		apply_microcode(cpu);
+	else
+		printk(KERN_ERR "microcode: Could not apply microcode to CPU%d:"
+			" sig=0x%x, pf=0x%x, rev=0x%x\n",
+			cpu, uci->sig, uci->pf, uci->rev);
+
+	set_cpus_allowed(current, old);
+	return err;
+}
+
 static void microcode_init_cpu(int cpu)
 {
 	cpumask_t old;
@@ -577,7 +624,8 @@ static void microcode_init_cpu(int cpu)
 	set_cpus_allowed(current, cpumask_of_cpu(cpu));
 	mutex_lock(&microcode_mutex);
 	collect_cpu_info(cpu);
-	if (uci->valid && system_state == SYSTEM_RUNNING)
+	if (uci->valid && system_state == SYSTEM_RUNNING &&
+	    !suspend_cpu_hotplug)
 		cpu_request_microcode(cpu);
 	mutex_unlock(&microcode_mutex);
 	set_cpus_allowed(current, old);
@@ -663,13 +711,24 @@ static int mc_sysdev_add(struct sys_devi
 		return 0;
 
 	pr_debug("Microcode:CPU %d added\n", cpu);
-	memset(uci, 0, sizeof(*uci));
+	/* If suspend_cpu_hotplug is set, the system is resuming and we should
+	 * use the data from before the suspend.
+	 */
+	if (suspend_cpu_hotplug) {
+		err = apply_microcode_on_cpu(cpu);
+		if (err)
+			microcode_fini_cpu(cpu);
+	}
+	if (!uci->valid)
+		memset(uci, 0, sizeof(*uci));
 
 	err = sysfs_create_group(&sys_dev->kobj, &mc_attr_group);
 	if (err)
 		return err;
 
-	microcode_init_cpu(cpu);
+	if (!uci->valid)
+		microcode_init_cpu(cpu);
+
 	return 0;
 }
 
@@ -680,7 +739,11 @@ static int mc_sysdev_remove(struct sys_d
 	if (!cpu_online(cpu))
 		return 0;
 	pr_debug("Microcode:CPU %d removed\n", cpu);
-	microcode_fini_cpu(cpu);
+	/* If suspend_cpu_hotplug is set, the system is suspending and we should
+	 * keep the microcode in memory for the resume.
+	 */
+	if (!suspend_cpu_hotplug)
+		microcode_fini_cpu(cpu);
 	sysfs_remove_group(&sys_dev->kobj, &mc_attr_group);
 	return 0;
 }
Index: linux-2.6.21-rc4/include/linux/cpu.h
===================================================================
--- linux-2.6.21-rc4.orig/include/linux/cpu.h	2007-03-23 22:57:28.000000000 +0100
+++ linux-2.6.21-rc4/include/linux/cpu.h	2007-03-23 22:57:39.000000000 +0100
@@ -127,6 +127,8 @@ static inline int cpu_is_offline(int cpu
 #endif		/* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_SUSPEND_SMP
+extern int suspend_cpu_hotplug;
+
 extern int disable_nonboot_cpus(void);
 extern void enable_nonboot_cpus(void);
 #else
Index: linux-2.6.21-rc4/kernel/cpu.c
===================================================================
--- linux-2.6.21-rc4.orig/kernel/cpu.c	2007-03-23 22:57:28.000000000 +0100
+++ linux-2.6.21-rc4/kernel/cpu.c	2007-03-23 22:57:39.000000000 +0100
@@ -254,6 +254,12 @@ int __cpuinit cpu_up(unsigned int cpu)
 }
 
 #ifdef CONFIG_SUSPEND_SMP
+/* Needed to prevent the microcode driver from requesting firmware in its CPU
+ * hotplug notifier during the suspend/resume.
+ */
+int suspend_cpu_hotplug;
+EXPORT_SYMBOL(suspend_cpu_hotplug);
+
 static cpumask_t frozen_cpus;
 
 int disable_nonboot_cpus(void)
@@ -261,16 +267,8 @@ int disable_nonboot_cpus(void)
 	int cpu, first_cpu, error = 0;
 
 	mutex_lock(&cpu_add_remove_lock);
-	first_cpu = first_cpu(cpu_present_map);
-	if (!cpu_online(first_cpu)) {
-		error = _cpu_up(first_cpu);
-		if (error) {
-			printk(KERN_ERR "Could not bring CPU%d up.\n",
-				first_cpu);
-			goto out;
-		}
-	}
-
+	suspend_cpu_hotplug = 1;
+	first_cpu = first_cpu(cpu_online_map);
 	/* We take down all of the non-boot CPUs in one shot to avoid races
 	 * with the userspace trying to use the CPU hotplug at the same time
 	 */
@@ -296,7 +294,7 @@ int disable_nonboot_cpus(void)
 	} else {
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
-out:
+	suspend_cpu_hotplug = 0;
 	mutex_unlock(&cpu_add_remove_lock);
 	return error;
 }
@@ -308,20 +306,22 @@ void enable_nonboot_cpus(void)
 	/* Allow everyone to use the CPU hotplug again */
 	mutex_lock(&cpu_add_remove_lock);
 	cpu_hotplug_disabled = 0;
-	mutex_unlock(&cpu_add_remove_lock);
 	if (cpus_empty(frozen_cpus))
-		return;
+		goto out;
 
+	suspend_cpu_hotplug = 1;
 	printk("Enabling non-boot CPUs ...\n");
 	for_each_cpu_mask(cpu, frozen_cpus) {
-		error = cpu_up(cpu);
+		error = _cpu_up(cpu);
 		if (!error) {
 			printk("CPU%d is up\n", cpu);
 			continue;
 		}
-		printk(KERN_WARNING "Error taking CPU%d up: %d\n",
-			cpu, error);
+		printk(KERN_WARNING "Error taking CPU%d up: %d\n", cpu, error);
 	}
 	cpus_clear(frozen_cpus);
+	suspend_cpu_hotplug = 0;
+out:
+	mutex_unlock(&cpu_add_remove_lock);
 }
 #endif

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-25 12:13                   ` Rafael J. Wysocki
@ 2007-03-25 15:10                     ` Maxim
  2007-03-25 19:27                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim @ 2007-03-25 15:10 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Pavel Machek, Tigran Aivazian

On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote:
> On Sunday, 25 March 2007 01:40, Maxim wrote:
> > On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> > > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
> > Hi,
> > 	I confirm that the above patch works,
> > 
> > 	At least system didn't hang on resume with microcode driver loaded,
> > 
> > 	I can't really test whenever it did update microcode because I almost sure there is nothing to update
> > 	(I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) )
> > 	I selected this driver just in case when I compiled kernel.
> 
> OK, thanks for testing.
> 


Hello,
	I tested this patch, and it does work too,
	System suspended /resumed correctly, no errors in kernel log
	


> -	first_cpu = first_cpu(cpu_present_map);
> -	if (!cpu_online(first_cpu)) {
> -		error = _cpu_up(first_cpu);
> -		if (error) {
> -			printk(KERN_ERR "Could not bring CPU%d up.\n",
> -				first_cpu);
> -			goto out;
> -		}
> -	}
> -

Nice, I once have seen those lines too, and they look ridiculous, but could that be that they are still necessary on some systems,


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
  2007-03-25 15:10                     ` Maxim
@ 2007-03-25 19:27                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2007-03-25 19:27 UTC (permalink / raw)
  To: Maxim; +Cc: linux-kernel, Pavel Machek, Tigran Aivazian

On Sunday, 25 March 2007 17:10, Maxim wrote:
> On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote:
> > On Sunday, 25 March 2007 01:40, Maxim wrote:
> > > On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> > > > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > > > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > > > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
> > > Hi,
> > > 	I confirm that the above patch works,
> > > 
> > > 	At least system didn't hang on resume with microcode driver loaded,
> > > 
> > > 	I can't really test whenever it did update microcode because I almost sure there is nothing to update
> > > 	(I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) )
> > > 	I selected this driver just in case when I compiled kernel.
> > 
> > OK, thanks for testing.
> > 
> 
> 
> Hello,
> 	I tested this patch, and it does work too,
> 	System suspended /resumed correctly, no errors in kernel log

Thanks for the confirmation.

I wonder if someone who actually needs the microcode to be updated can test
it ...

> > -	first_cpu = first_cpu(cpu_present_map);
> > -	if (!cpu_online(first_cpu)) {
> > -		error = _cpu_up(first_cpu);
> > -		if (error) {
> > -			printk(KERN_ERR "Could not bring CPU%d up.\n",
> > -				first_cpu);
> > -			goto out;
> > -		}
> > -	}
> > -
> 
> Nice, I once have seen those lines too, and they look ridiculous, but
> could that be that they are still necessary on some systems, 

I think this was a mistake from the beginning.

Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2007-03-25 19:24 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-21 16:40 [BUG] Code reordering in swsusp breaks suspend on SMP systems Maxim Levitsky
2007-03-21 21:22 ` Nigel Cunningham
2007-03-21 21:38   ` Rafael J. Wysocki
2007-03-21 23:47     ` Nigel Cunningham
2007-03-22  0:25       ` Maxim
2007-03-22  4:51     ` David Chinner
2007-03-22  7:23       ` Rafael J. Wysocki
2007-03-22  7:31         ` Andrew Morton
2007-03-22  8:17           ` Rafael J. Wysocki
     [not found]   ` <200703220114.05228.maximlevitsky@gmail.com>
2007-03-21 23:16     ` Maxim
2007-03-22  0:32     ` Maxim
2007-03-21 22:21 ` Pavel Machek
2007-03-21 22:39   ` Rafael J. Wysocki
2007-03-21 22:58     ` [RFC] : Is /proc/kcore still usefull and/or maintained ? Eric Dumazet
2007-03-21 23:11       ` Jan Engelhardt
2007-03-21 23:28         ` Maxim
2007-03-21 23:53           ` Eric Dumazet
2007-03-22  0:04             ` Maxim
2007-03-22  6:35               ` Eric Dumazet
     [not found]     ` <200703220109.54719.maximlevitsky@gmail.com>
2007-03-21 23:18       ` [BUG] Code reordering in swsusp breaks suspend on SMP systems Maxim
     [not found]       ` <200703220024.25436.rjw@sisk.pl>
2007-03-21 23:39         ` Maxim
2007-03-21 23:44           ` Maxim
2007-03-21 23:53           ` Rafael J. Wysocki
2007-03-22  0:01             ` Maxim
2007-03-22 23:30             ` Rafael J. Wysocki
2007-03-23 14:42               ` Rafael J. Wysocki
2007-03-25  0:40                 ` Maxim
2007-03-25 12:13                   ` Rafael J. Wysocki
2007-03-25 15:10                     ` Maxim
2007-03-25 19:27                       ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).