LKML Archive on lore.kernel.org
* cpu hotplug support broken in 2.6.23-rc3
@ 2007-08-27 10:43 Pavel Machek
  2007-08-27 10:58 ` Pavel Machek
  2007-08-29  8:08 ` cpu hotplug support broken in 2.6.23-rc3 Gautham R Shenoy
  0 siblings, 2 replies; 31+ messages in thread
From: Pavel Machek @ 2007-08-27 10:43 UTC (permalink / raw)
  To: rusty, vatsa, zwane, kernel list; +Cc: Rafael J. Wysocki

Hi!

Trying to do a few onlines/offlines reliably hangs my machine (thinkpad
x60, i386 architecture).

Plus I guess it would be nice to add CPU HOTPLUG into MAINTAINERS
file:

pavel@amd:/data/l/linux$ grep CPU MAINTAINERS
CPU FREQUENCY DRIVERS
CPUID/MSR DRIVER
CPUSETS
i386 SETUP CODE / CPU ERRATA WORKAROUNDS
SCx200 CPU SUPPORT
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 10:43 cpu hotplug support broken in 2.6.23-rc3 Pavel Machek
@ 2007-08-27 10:58 ` Pavel Machek
  2007-08-27 14:36   ` Jeff Chua
  2007-08-29  8:08 ` cpu hotplug support broken in 2.6.23-rc3 Gautham R Shenoy
  1 sibling, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2007-08-27 10:58 UTC (permalink / raw)
  To: rusty, vatsa, zwane, kernel list; +Cc: Rafael J. Wysocki

On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> Hi!
> 
> Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> x60, i386 architecture).
> 
> Plus I guess it would be nice to add CPU HOTPLUG into MAINTAINERS
> file:
> 
> pavel@amd:/data/l/linux$ grep CPU MAINTAINERS
> CPU FREQUENCY DRIVERS
> CPUID/MSR DRIVER
> CPUSETS
> i386 SETUP CODE / CPU ERRATA WORKAROUNDS
> SCx200 CPU SUPPORT

...plus it actually breaks suspend, and it is a regression from 2.6.22.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 10:58 ` Pavel Machek
@ 2007-08-27 14:36   ` Jeff Chua
  2007-08-27 15:22     ` Michal Piotrowski
  2007-08-27 21:32     ` Pavel Machek
  0 siblings, 2 replies; 31+ messages in thread
From: Jeff Chua @ 2007-08-27 14:36 UTC (permalink / raw)
  To: Pavel Machek; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

On 8/27/07, Pavel Machek <pavel@ucw.cz> wrote:
> On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> > Hi!
> >
> > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > x60, i386 architecture).

I just ran 3 cycles of on-line/off-line on 2.6.23-rc3 on a ThinkPad x60s,
and my system still survives.

Jeff.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 14:36   ` Jeff Chua
@ 2007-08-27 15:22     ` Michal Piotrowski
  2007-08-27 21:32     ` Pavel Machek
  1 sibling, 0 replies; 31+ messages in thread
From: Michal Piotrowski @ 2007-08-27 15:22 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Pavel Machek, rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

Hi,

On 27/08/07, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
> On 8/27/07, Pavel Machek <pavel@ucw.cz> wrote:
> > On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> > > Hi!
> > >
> > > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > > x60, i386 architecture).
>
> I just 3 cycles of on-line/off-line on 2.6.23-rc3 on ThinkPad x60s,
> and my system still survives.

So maybe a diff between your config file and Pavel's will give an answer.

Any details about the software environment?

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 14:36   ` Jeff Chua
  2007-08-27 15:22     ` Michal Piotrowski
@ 2007-08-27 21:32     ` Pavel Machek
  2007-08-27 21:59       ` Rafael J. Wysocki
  2007-08-28 14:21       ` Jeff Chua
  1 sibling, 2 replies; 31+ messages in thread
From: Pavel Machek @ 2007-08-27 21:32 UTC (permalink / raw)
  To: Jeff Chua; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

On Mon 2007-08-27 22:36:57, Jeff Chua wrote:
> On 8/27/07, Pavel Machek <pavel@ucw.cz> wrote:
> > On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> > > Hi!
> > >
> > > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > > x60, i386 architecture).
> 
> I just 3 cycles of on-line/off-line on 2.6.23-rc3 on ThinkPad x60s,
> and my system still survives.

Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
so cycles at one point.

...or maybe the difference is in the .config, or maybe I broke something
in my kernel sources....
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 21:59       ` Rafael J. Wysocki
@ 2007-08-27 21:58         ` Pavel Machek
  2007-08-28 10:30           ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2007-08-27 21:58 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Jeff Chua, rusty, vatsa, zwane, kernel list

On Mon 2007-08-27 23:59:31, Rafael J. Wysocki wrote:
> On Monday, 27 August 2007 23:32, Pavel Machek wrote:
> > On Mon 2007-08-27 22:36:57, Jeff Chua wrote:
> > > On 8/27/07, Pavel Machek <pavel@ucw.cz> wrote:
> > > > On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> > > > > Hi!
> > > > >
> > > > > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > > > > x60, i386 architecture).
> > > 
> > > I just 3 cycles of on-line/off-line on 2.6.23-rc3 on ThinkPad x60s,
> > > and my system still survives.
> > 
> > Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> > so cycles at one point.
> > 
> > ...or maybe difference is in the .config, or maybe I broken something
> > in my kernel sources....
> 
> Well, something seems to be wrong with the CPU hotplug, but it's insanely
> difficult to reproduce on my boxes.
> 
> I bet on one of the notifiers blocking while waiting on a frozen task.

It happens reliably for me, with this script... and randomly, when I
just echo 0/1 > online from commandline... so it should not be
anything with the frozen tasks.

echo test > /sys/power/disk
echo disk > /sys/power/state

reliably hangs on resume in the attached script. It works ok with
nosmp.

								Pavel

#!/bin/bash
killall klogd

echo -n "testing refrigerator (testproc)..."
echo testproc > /sys/power/disk
echo disk > /sys/power/state
echo "okay"

sleep 2
echo -n "testing drivers (test)..."
echo test > /sys/power/disk
echo disk > /sys/power/state
echo "okay"

sleep 2
echo -n "testing swsusp (reboot)..."
echo reboot > /sys/power/disk
echo disk > /sys/power/state
echo "okay"

sleep 2
echo -n "testing s2ram..."
s2ram
echo "okay"

sleep 2
echo -n "testing swsusp (shutdown)..."
echo shutdown > /sys/power/disk
echo disk > /sys/power/state
echo "okay"

sleep 2
echo -n "testing swsusp (platform)..."
echo platform > /sys/power/disk
echo disk > /sys/power/state
echo "okay"

sleep 2
echo -n "testing s2ram..."
s2ram
echo "okay"
 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 21:32     ` Pavel Machek
@ 2007-08-27 21:59       ` Rafael J. Wysocki
  2007-08-27 21:58         ` Pavel Machek
  2007-08-28 14:21       ` Jeff Chua
  1 sibling, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2007-08-27 21:59 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jeff Chua, rusty, vatsa, zwane, kernel list

On Monday, 27 August 2007 23:32, Pavel Machek wrote:
> On Mon 2007-08-27 22:36:57, Jeff Chua wrote:
> > On 8/27/07, Pavel Machek <pavel@ucw.cz> wrote:
> > > On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > > > x60, i386 architecture).
> > 
> > I just 3 cycles of on-line/off-line on 2.6.23-rc3 on ThinkPad x60s,
> > and my system still survives.
> 
> Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> so cycles at one point.
> 
> ...or maybe difference is in the .config, or maybe I broken something
> in my kernel sources....

Well, something seems to be wrong with the CPU hotplug, but it's insanely
difficult to reproduce on my boxes.

I bet on one of the notifiers blocking while waiting on a frozen task.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 21:58         ` Pavel Machek
@ 2007-08-28 10:30           ` Rafael J. Wysocki
  2007-08-28 13:00             ` Akinobu Mita
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2007-08-28 10:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jeff Chua, vatsa, zwane, kernel list, satyam, akinobu.mita,
	ashok.raj, ego, rusty

On Monday, 27 August 2007 23:58, Pavel Machek wrote:
> On Mon 2007-08-27 23:59:31, Rafael J. Wysocki wrote:
> > On Monday, 27 August 2007 23:32, Pavel Machek wrote:
> > > On Mon 2007-08-27 22:36:57, Jeff Chua wrote:
> > > > On 8/27/07, Pavel Machek <pavel@ucw.cz> wrote:
> > > > > On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> > > > > > Hi!
> > > > > >
> > > > > > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > > > > > x60, i386 architecture).
> > > > 
> > > > I just 3 cycles of on-line/off-line on 2.6.23-rc3 on ThinkPad x60s,
> > > > and my system still survives.
> > > 
> > > Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> > > so cycles at one point.
> > > 
> > > ...or maybe difference is in the .config, or maybe I broken something
> > > in my kernel sources....
> > 
> > Well, something seems to be wrong with the CPU hotplug, but it's insanely
> > difficult to reproduce on my boxes.
> > 
> > I bet on one of the notifiers blocking while waiting on a frozen task.
> 
> It happens reliably for me, with this script... and randomly, when I
> just echo 0/1 > online from commandline... so it should not be
> anything with the frozen tasks.

That suggests the CPU hotplug just deadlocks internally.

Can you put some printk's into _cpu_down() and see where exactly it hangs?

> echo test > /sys/power/disk
> echo disk > /sys/power/state
> 
> reliably hangs on resume in the attached script. It works ok with
> nosmp.

Which step hangs it?  Or is it at random?

Rafael


> #!/bin/bash
> killall klogd
> 
> echo -n "testing refrigerator (testproc)..."
> echo testproc > /sys/power/disk
> echo disk > /sys/power/state
> echo "okay"
> 
> sleep 2
> echo -n "testing drivers (test)..."
> echo test > /sys/power/disk
> echo disk > /sys/power/state
> echo "okay"
> 
> sleep 2
> echo -n "testing swsusp (reboot)..."
> echo reboot > /sys/power/disk
> echo disk > /sys/power/state
> echo "okay"
> 
> sleep 2
> echo -n "testing s2ram..."
> s2ram
> echo "okay"
> 
> sleep 2
> echo -n "testing swsusp (shutdown)..."
> echo shutdown > /sys/power/disk
> echo disk > /sys/power/state
> echo "okay"
> 
> sleep 2
> echo -n "testing swsusp (platform)..."
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> echo "okay"
> 
> sleep 2
> echo -n "testing s2ram..."
> s2ram
> echo "okay"
>  
> 

-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-28 10:30           ` Rafael J. Wysocki
@ 2007-08-28 13:00             ` Akinobu Mita
  0 siblings, 0 replies; 31+ messages in thread
From: Akinobu Mita @ 2007-08-28 13:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Jeff Chua, vatsa, zwane, kernel list, satyam,
	ashok.raj, ego, rusty

2007/8/28, Rafael J. Wysocki <rjw@sisk.pl>:
> On Monday, 27 August 2007 23:58, Pavel Machek wrote:
> > On Mon 2007-08-27 23:59:31, Rafael J. Wysocki wrote:
> > > On Monday, 27 August 2007 23:32, Pavel Machek wrote:
> > > > On Mon 2007-08-27 22:36:57, Jeff Chua wrote:
> > > > > On 8/27/07, Pavel Machek <pavel@ucw.cz> wrote:
> > > > > > On Mon 2007-08-27 12:43:50, Pavel Machek wrote:
> > > > > > > Hi!
> > > > > > >
> > > > > > > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > > > > > > x60, i386 architecture).
> > > > >
> > > > > I just 3 cycles of on-line/off-line on 2.6.23-rc3 on ThinkPad x60s,
> > > > > and my system still survives.
> > > >
> > > > Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> > > > so cycles at one point.
> > > >
> > > > ...or maybe difference is in the .config, or maybe I broken something
> > > > in my kernel sources....

I have been running plenty of CPU offline/online tests these days and they
work fine. But there is no cpufreq driver that supports my machine, so my
tests didn't cover the CPU hotplug code in cpufreq.

If you have a cpufreq driver and it is built as a module, it is worth trying
the same test after unloading the cpufreq driver in order to narrow down the
problem area.
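
A minimal sketch of how one might check which cpufreq driver is in use before repeating the test, assuming the kernel exposes it through the sysfs file cpufreq/scaling_driver. The function name cpufreq_driver_for and the SYSFS_CPU_ROOT override are hypothetical helpers introduced here for illustration; they do not come from this thread.

```shell
#!/bin/bash
# Report which cpufreq driver (if any) is bound to a CPU, so the hotplug
# test can be repeated with and without it.
# SYSFS_CPU_ROOT is an illustrative override so the function can be tried
# against a scratch directory; the default is the usual sysfs location.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

cpufreq_driver_for() {
    local cpu="$1"
    local f="$SYSFS_CPU_ROOT/cpu$cpu/cpufreq/scaling_driver"
    if [ -r "$f" ]; then
        # The file holds the name of the driver currently in use.
        cat "$f"
    else
        # No cpufreq directory: no driver is bound to this CPU.
        echo "none"
    fi
}

cpufreq_driver_for 0
```

If a driver is reported and it is modular, unloading that module before rerunning the offline/online loop would follow the suggestion above; which module name applies depends on the machine.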

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 21:32     ` Pavel Machek
  2007-08-27 21:59       ` Rafael J. Wysocki
@ 2007-08-28 14:21       ` Jeff Chua
  2007-09-03  3:47         ` Pavel Machek
  2007-09-03  3:56         ` highres timers break cpu hotplug in 2.6.23-rc5 [was Re: cpu hotplug support broken in 2.6.23-rc3] Pavel Machek
  1 sibling, 2 replies; 31+ messages in thread
From: Jeff Chua @ 2007-08-28 14:21 UTC (permalink / raw)
  To: Pavel Machek; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

On 8/28/07, Pavel Machek <pavel@ucw.cz> wrote:

> Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> so cycles at one point.

Mine still survives with this ... with sleep 1 ...

# for((i=0; i<100; i++)); do echo $i; echo $((i % 2)) >/sys/devices/system/cpu/cpu1/online; sleep 1; done

and this as well ... without sleep ...

# for((i=0; i<100; i++)); do echo $i; echo $((i % 2)) >/sys/devices/system/cpu/cpu1/online; done
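
The two loops above can also be written as a small script that stops at the first failed write. This is only a sketch under the same assumptions as the loops (cpu1 exists and can be taken offline); ONLINE_FILE and hotplug_cycles are names introduced here for illustration.

```shell
#!/bin/bash
# Toggle a CPU's "online" file a fixed number of times and print how many
# writes succeeded. ONLINE_FILE is an illustrative override; the default is
# the sysfs file used in the loops above.
ONLINE_FILE="${ONLINE_FILE:-/sys/devices/system/cpu/cpu1/online}"

hotplug_cycles() {
    local count="$1"
    local i=0
    local ok=0
    while [ "$i" -lt "$count" ]; do
        # Progress goes to stderr so a hang shows the last cycle reached.
        echo "cycle $i" >&2
        # Even cycles write 0 (offline), odd cycles write 1 (online).
        if echo $((i % 2)) > "$ONLINE_FILE"; then
            ok=$((ok + 1))
        else
            echo "write failed at cycle $i" >&2
            break
        fi
        i=$((i + 1))
    done
    echo "$ok"
}

# Needs root on a real machine:
#   hotplug_cycles 100
```

With an even cycle count the last write is 1, so the CPU is left online when the run completes.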

I'm on reiserfs. gcc 3.4.5. Config sent to you separately so as not to
cloud lkml. If anyone wants the config, please let me know. Is mime
"attachment" acceptable now on lkml?

Thanks,
Jeff.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-27 10:43 cpu hotplug support broken in 2.6.23-rc3 Pavel Machek
  2007-08-27 10:58 ` Pavel Machek
@ 2007-08-29  8:08 ` Gautham R Shenoy
  2007-09-03  3:58   ` Pavel Machek
  1 sibling, 1 reply; 31+ messages in thread
From: Gautham R Shenoy @ 2007-08-29  8:08 UTC (permalink / raw)
  To: Pavel Machek; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

Hi Pavel,
On Mon, Aug 27, 2007 at 12:43:50PM +0200, Pavel Machek wrote:
> Hi!
> 
> Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> x60, i386 architecture).
> 

That's strange. 

I've been running cpu offline/online tests with kernbench, 
cpufreq-ondemand and a few rt-tasks running in the background
and it has worked for me. 
Something like 100 iterations without a problem. But these were on
machines with 4-8 cpus.  So maybe this could be something specific to
the dual cpu machine.

Can you post the .config? I'll try to recreate it.

It's really strange since you mention that all it took was
an echo 1/0 into the sysfs file to break it.

> Plus I guess it would be nice to add CPU HOTPLUG into MAINTAINERS
> file:
> 

There is a list of maintainers in the Documentation/cpu-hotplug.txt, 
which includes maintainers for different platforms as well.

It's a good idea to add that info to the MAINTAINERS file as well.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-28 14:21       ` Jeff Chua
@ 2007-09-03  3:47         ` Pavel Machek
  2007-09-03 10:19           ` Rafael J. Wysocki
  2007-09-03  3:56         ` highres timers break cpu hotplug in 2.6.23-rc5 [was Re: cpu hotplug support broken in 2.6.23-rc3] Pavel Machek
  1 sibling, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2007-09-03  3:47 UTC (permalink / raw)
  To: Jeff Chua; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

Hi!

> > Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> > so cycles at one point.
> 
> Mine still survives with this ... with sleep 1 ...
> 
> # for((i=0; i<100; i++)); do echo $i; echo $((i % 2))
> >/sys/devices/system/cpu/cpu1/online; sleep 1; done
> 
> and this as well ... without sleep ...
> 
> # for((i=0; i<100; i++)); do echo $i; echo $((i % 2))
> >/sys/devices/system/cpu/cpu1/online; done
> 
> I'm on reiserfs. gcc 3.4.5. Config sent to you seperately so as not to
> cloud lkml. If anyone wants the config, please let me know. Is mime
> "attachment" acceptable now on lkml?

Ok, so it gets weirder. I now have the machine in a "hung" state; other
consoles still work, but there are no timers -  sleep 1 hangs forever.

sysrq-t shows kstopmachine hung in hrtimer_try_to_cancel.

So I indeed suspect difference-in-kconfig to trigger this, and will
try disabling noidlehz.
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* highres timers break cpu hotplug in 2.6.23-rc5 [was Re: cpu hotplug support broken in 2.6.23-rc3]
  2007-08-28 14:21       ` Jeff Chua
  2007-09-03  3:47         ` Pavel Machek
@ 2007-09-03  3:56         ` Pavel Machek
  2007-09-03 12:34           ` Jeff Chua
  1 sibling, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2007-09-03  3:56 UTC (permalink / raw)
  To: Jeff Chua; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki, tglx

Hi!

> > Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> > so cycles at one point.
> 
> Mine still survives with this ... with sleep 1 ...
> 
> # for((i=0; i<100; i++)); do echo $i; echo $((i % 2))
> >/sys/devices/system/cpu/cpu1/online; sleep 1; done
> 
> and this as well ... without sleep ...
> 
> # for((i=0; i<100; i++)); do echo $i; echo $((i % 2))
> >/sys/devices/system/cpu/cpu1/online; done
> 
> I'm on reiserfs. gcc 3.4.5. Config sent to you seperately so as not to
> cloud lkml. If anyone wants the config, please let me know. Is mime
> "attachment" acceptable now on lkml?

It gets weirder. With "nohz=off" on commandline, I have to press any
key (generate interrupt?) for echo 1  > online to finish. 2.6.23-rc5
kernel... but hotplug/unplug works reliably now.

With nohz=off highres=off I can unplug/replug cpus as much as I
want... running in tight loop now.
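
A minimal sketch for confirming which of these boot parameters the running kernel was actually started with, assuming /proc/cmdline is available. The helper name has_boot_param is hypothetical and only for illustration.

```shell
#!/bin/bash
# Check whether a boot parameter such as nohz=off appears as a whole word
# on a kernel command line string.
has_boot_param() {
    local param="$1"
    local cmdline="$2"
    # Pad both sides with spaces so only whole parameters match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
    esac
    return 1
}

cmdline="$(cat /proc/cmdline 2>/dev/null || true)"
for p in nohz=off highres=off; do
    if has_boot_param "$p" "$cmdline"; then
        echo "$p: set"
    else
        echo "$p: not set"
    fi
done
```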
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-08-29  8:08 ` cpu hotplug support broken in 2.6.23-rc3 Gautham R Shenoy
@ 2007-09-03  3:58   ` Pavel Machek
  2007-11-15 22:37     ` cpu hotplug strangeness in 2.6.24-rc2 (was Re: cpu hotplug support broken in 2.6.23-rc3) Pavel Machek
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2007-09-03  3:58 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

On Wed 2007-08-29 13:38:27, Gautham R Shenoy wrote:
> Hi Pavel,
> On Mon, Aug 27, 2007 at 12:43:50PM +0200, Pavel Machek wrote:
> > Hi!
> > 
> > Trying to do few onlines/offlines reliably hangs my machine (thinkpad
> > x60, i386 architecture).
> > 
> 
> That's strange. 
> 
> I've been running cpu offline/online tests with kern bench, 
> cpufreq-ondemand and a few rt-tasks running in the background
> and it has worked for me. 
> Something like 100 iterations without a problem. But these were on
> machines with 4-8 cpus.  So may be this could be something specific to
> the dual cpu machine.

Seems like it is specific to nohz/highrestimers. 

> Can you post the .config? I'll try to recreate it?

Will send privately.

> > Plus I guess it would be nice to add CPU HOTPLUG into MAINTAINERS
> > file:
> > 
> 
> There is a list of maintainers in the Documentation/cpu-hotplug.txt, 
> which includes maintainers for different platforms as well.
> 
> It's a good idea to add that info to the MAINTAINERS file as well.

Yes, please.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-03  3:47         ` Pavel Machek
@ 2007-09-03 10:19           ` Rafael J. Wysocki
  2007-09-03 12:35             ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2007-09-03 10:19 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jeff Chua, rusty, vatsa, zwane, kernel list, Thomas Gleixner

On Monday, 3 September 2007 05:47, Pavel Machek wrote:
> Hi!
> 
> > > Can you try 20-or-so tests? Mine hangs randomly, so it survived 4 or
> > > so cycles at one point.
> > 
> > Mine still survives with this ... with sleep 1 ...
> > 
> > # for((i=0; i<100; i++)); do echo $i; echo $((i % 2))
> > >/sys/devices/system/cpu/cpu1/online; sleep 1; done
> > 
> > and this as well ... without sleep ...
> > 
> > # for((i=0; i<100; i++)); do echo $i; echo $((i % 2))
> > >/sys/devices/system/cpu/cpu1/online; done
> > 
> > I'm on reiserfs. gcc 3.4.5. Config sent to you seperately so as not to
> > cloud lkml. If anyone wants the config, please let me know. Is mime
> > "attachment" acceptable now on lkml?
> 
> Ok, so it gets weirder. I have now machine in "hung" state; other
> consoles still work, but there are no timers -  sleep 1 hangs forever.
> 
> sysrq-t shows kstopmachine hung in hrtimer_try_to_cancel.
> 
> So I indeed suspect difference-in-kconfig to trigger this, and will
> try disabling noidlehz.

I would unset CONFIG_HIGH_RES_TIMERS for starters.

Well, I guess Thomas should know about that. ;-)

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: highres timers break cpu hotplug in 2.6.23-rc5 [was Re: cpu hotplug support broken in 2.6.23-rc3]
  2007-09-03  3:56         ` highres timers break cpu hotplug in 2.6.23-rc5 [was Re: cpu hotplug support broken in 2.6.23-rc3] Pavel Machek
@ 2007-09-03 12:34           ` Jeff Chua
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Chua @ 2007-09-03 12:34 UTC (permalink / raw)
  To: Pavel Machek; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki, tglx

On 9/3/07, Pavel Machek <pavel@ucw.cz> wrote:

> It gets weirder. With "nohz=off" on commandline, I have to press any
> key (generate interrupt?) for echo 1  > online to finish. 2.6.23-rc5
> kernel... but hotplug/unplug works reliably now.
>
> With nohz=off highres=off I can unplug/replug cpus as much as I
> want... running in tight loop now.

Yes. CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS have to be unset or
suspend-to-disk would just hang, unless you type something on the
keyboard, and then you can suspend to disk. It seems interrupts are
not missing.
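
A minimal sketch for checking whether these two options are enabled in a given kernel config file. The helper name config_enabled and the /boot/config-$(uname -r) location are assumptions made for illustration; not every system keeps its kernel configuration there.

```shell
#!/bin/bash
# Succeed if the named option is set to "y" in the given config file.
config_enabled() {
    local option="$1"
    local file="$2"
    grep -q "^${option}=y\$" "$file"
}

# Assumed location of the running kernel's config; adjust as needed.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    for opt in CONFIG_NO_HZ CONFIG_HIGH_RES_TIMERS; do
        if config_enabled "$opt" "$cfg"; then
            echo "$opt=y"
        else
            echo "$opt is not set"
        fi
    done
else
    echo "no readable config at $cfg"
fi
```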

Thanks,
Jeff.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-03 10:19           ` Rafael J. Wysocki
@ 2007-09-03 12:35             ` Thomas Gleixner
  2007-09-04  7:27               ` Pavel Machek
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-03 12:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Jeff Chua, rusty, vatsa, zwane, kernel list

On Mon, 2007-09-03 at 12:19 +0200, Rafael J. Wysocki wrote:
> > Ok, so it gets weirder. I have now machine in "hung" state; other
> > consoles still work, but there are no timers -  sleep 1 hangs forever.
> > 
> > sysrq-t shows kstopmachine hung in hrtimer_try_to_cancel.
> > 
> > So I indeed suspect difference-in-kconfig to trigger this, and will
> > try disabling noidlehz.
> 
> I would unset CONFIG_HIGH_RES_TIMERS for starters.
> 
> Well, I guess Thomas should know about that. ;-)

What was the last known-to-work version?

	tglx



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-03 12:35             ` Thomas Gleixner
@ 2007-09-04  7:27               ` Pavel Machek
  2007-09-13 20:01                 ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2007-09-04  7:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list

> On Mon, 2007-09-03 at 12:19 +0200, Rafael J. Wysocki wrote:
> > > Ok, so it gets weirder. I have now machine in "hung" state; other
> > > consoles still work, but there are no timers -  sleep 1 hangs forever.
> > > 
> > > sysrq-t shows kstopmachine hung in hrtimer_try_to_cancel.
> > > 
> > > So I indeed suspect difference-in-kconfig to trigger this, and will
> > > try disabling noidlehz.
> > 
> > I would unset CONFIG_HIGH_RES_TIMERS for starters.
> > 
> > Well, I guess Thomas should know about that. ;-)
> 
> What was the last known to work version ?

I'm afraid I only turned on HIGH_RES_TIMERS in the 2.6.23-rc1
timeframe... so I'm not sure if it ever worked for me.

I can confirm it is working in 2.6.23-rc5 with highres disabled, and
broken with highres enabled. NOHZ turns "waits for keypress during
unplug/replug" into "just plain hangs".
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-04  7:27               ` Pavel Machek
@ 2007-09-13 20:01                 ` Thomas Gleixner
  2007-09-14 12:38                   ` Pavel Machek
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-13 20:01 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list,
	Len Brown


On Tue, 2007-09-04 at 09:27 +0200, Pavel Machek wrote:
> > On Mon, 2007-09-03 at 12:19 +0200, Rafael J. Wysocki wrote:
> > > > Ok, so it gets weirder. I have now machine in "hung" state; other
> > > > consoles still work, but there are no timers -  sleep 1 hangs forever.
> > > > 
> > > > sysrq-t shows kstopmachine hung in hrtimer_try_to_cancel.
> > > > 
> > > > So I indeed suspect difference-in-kconfig to trigger this, and will
> > > > try disabling noidlehz.
> > > 
> > > I would unset CONFIG_HIGH_RES_TIMERS for starters.
> > > 
> > > Well, I guess Thomas should know about that. ;-)
> > 
> > What was the last known to work version ?
> 
> I'm afraid I only turned on HIGH_RES_TIMERS in 2.6.23-rc1
> timeframe... so I'm not sure if it ever worked for me.
> 
> I can confirm it is working in 2.6.23-rc5 with highres disabled, and
> broken with highres enabled. NOHZ turns "waits for keypress during
> unplug/replug" into "just plain hangs".

Ok, I can reproduce it and I tracked down what happens:

When the CPU goes offline, the clock event source for this CPU (lapic)
is removed from the clock events framework. This also clears the
information that the CPU is using C-States which stop the local APIC
timer.

Now you put the CPU online again and the local APIC timer is used, but
the C-State information is not evaluated again in ACPI. This means that
the clock events code does not know that the APIC might stop. In the
worst case this will happen and make the CPU wait for timer interrupts
forever.

The problem only appears when you are on battery (C3/C4 available) or on
those broken machines where C2 is in reality C3 (e.g. akpm's VAIO).

I have an as yet untested fix, which preserves the broadcast state across
the offline state, but Len is looking into it as well, to see whether we
can just reevaluate the power states (and the broadcast flags) when a cpu
comes online again. If Len can do that easily for 2.6.23, I'd prefer
that.

	tglx



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-13 20:01                 ` Thomas Gleixner
@ 2007-09-14 12:38                   ` Pavel Machek
  2007-09-14 12:50                     ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2007-09-14 12:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list,
	Len Brown

Hi!

> > > What was the last known to work version ?
> > 
> > I'm afraid I only turned on HIGH_RES_TIMERS in 2.6.23-rc1
> > timeframe... so I'm not sure if it ever worked for me.
> > 
> > I can confirm it is working in 2.6.23-rc5 with highres disabled, and
> > broken with highres enabled. NOHZ turns "waits for keypress during
> > unplug/replug" into "just plain hangs".
> 
> Ok, I can reproduce it and I tracked down what happens:
> 
> When the CPU goes offline, the clock event source for this CPU (lapic)
> is removed from the clock events framework. This also clears the
> information that the CPU is using C-States which stop the local APIC
> timer.
> 
> Now you put the CPU online again and the local APIC timer is used, but
> the C-State information is not evaluated again in ACPI. This means that
> the clock events code does not know that the APIC might stop. In the
> worst case this will happen and make the CPU wait for timer interrupts
> forever.
> 
> The problem only appears when you are on battery (c3/c4 available) or on
> those broken machines, where C2 is in reality C3 (e.g. akpm's VAIO)
> 
> I have an yet untested fix, which preserves the broadcast state across
> the offline state, but Len is looking into it as well, whether we can
> just reevaluate the power states (and the broadcast flags) when a cpu
> becomes online again. If Len can do that easily for 2.6.23, I'd prefer
> that.

Is there a patch you want me to test? Or does Len have anything to
play with?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-14 12:38                   ` Pavel Machek
@ 2007-09-14 12:50                     ` Thomas Gleixner
  2007-09-14 13:15                       ` Thomas Gleixner
  2007-09-14 18:49                       ` Pallipadi, Venkatesh
  0 siblings, 2 replies; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-14 12:50 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list,
	Len Brown

Pavel,

On Fri, 2007-09-14 at 14:38 +0200, Pavel Machek wrote:
> > I have an yet untested fix, which preserves the broadcast state across
> > the offline state, but Len is looking into it as well, whether we can
> > just reevaluate the power states (and the broadcast flags) when a cpu
> > becomes online again. If Len can do that easily for 2.6.23, I'd prefer
> > that.
> 
> Is there a patch you want me to test? Or does Len have anything to
> play with?

Venki sent me an initial patch, but it has issues with the notify
ordering. Find below my "cache the broadcast flags" version for testing.

Thanks,

	tglx

---
 kernel/time/tick-broadcast.c |   21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-broadcast.c	2007-09-14 13:22:29.000000000 +0200
+++ linux-2.6/kernel/time/tick-broadcast.c	2007-09-14 13:22:29.000000000 +0200
@@ -261,10 +261,25 @@ void tick_broadcast_on_off(unsigned long
 	int cpu = get_cpu();
 
 	if (!cpu_isset(*oncpu, cpu_online_map)) {
-		printk(KERN_ERR "tick-braodcast: ignoring broadcast for "
-		       "offline CPU #%d\n", *oncpu);
-	} else {
+		unsigned long flags;
+
+		spin_lock_irqsave(&tick_broadcast_lock, flags);
+		/*
+		 * We need to cache the broadcast flag for offline
+		 * CPUs. ACPI currently does not reevaluate the
+		 * broadcast flag when a CPU goes online again. Adding
+		 * a cpu notifier to ACPI is probably the correct
+		 * solution, but it is hard to get this correct due to
+		 * notify ordering problems. So caching the flag is
+		 * the safe solution for now.
+		 */
+		if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ON)
+			cpu_set(*oncpu, tick_broadcast_mask);
+		else
+			cpu_clear(*oncpu, tick_broadcast_mask);
 
+		spin_unlock_irqrestore(&tick_broadcast_lock, flags);
+	} else {
 		if (cpu == *oncpu)
 			tick_do_broadcast_on_off(&reason);
 		else


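For readers following the thread, here is a minimal userspace sketch of the
"cache the broadcast flags" idea in the hunk above. It is not kernel code:
struct bc_model, bc_model_on_off() and the reprogrammed counter are
hypothetical stand-ins for cpu_online_map, tick_broadcast_mask and the call
to tick_do_broadcast_on_off(). It only models the flag bookkeeping, so it
does not cover the case where the flag is cleared again when the cpu goes
offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for CLOCK_EVT_NOTIFY_BROADCAST_ON/OFF. */
enum bc_reason { BC_ON, BC_OFF };

/* Hypothetical model state, one bit per cpu (up to 32 cpus). */
struct bc_model {
	uint32_t online_mask;		/* models cpu_online_map */
	uint32_t broadcast_mask;	/* models tick_broadcast_mask */
	unsigned int reprogrammed;	/* counts the "online" slow path */
};

/*
 * Record a broadcast on/off request for @cpu. Returns true when the
 * cpu is offline and the request was only cached in the mask, false
 * when the cpu is online and the normal path was taken as well.
 */
static bool bc_model_on_off(struct bc_model *m, unsigned int cpu,
			    enum bc_reason reason)
{
	uint32_t bit;

	assert(cpu < 32);
	bit = UINT32_C(1) << cpu;

	/* The flag is recorded in both cases, so it is not lost. */
	if (reason == BC_ON)
		m->broadcast_mask |= bit;
	else
		m->broadcast_mask &= ~bit;

	if (!(m->online_mask & bit))
		return true;	/* offline: flag cached only */

	/* Online: the real code would also reprogram the tick device. */
	m->reprogrammed++;
	return false;
}
```

In this model a request that arrives while cpu 1 is offline leaves bit 1 set
in broadcast_mask, so the state is still there when the cpu is brought back.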

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-14 12:50                     ` Thomas Gleixner
@ 2007-09-14 13:15                       ` Thomas Gleixner
  2007-09-15  9:49                         ` Thomas Gleixner
  2007-09-14 18:49                       ` Pallipadi, Venkatesh
  1 sibling, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-14 13:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list,
	Len Brown

On Fri, 2007-09-14 at 14:50 +0200, Thomas Gleixner wrote:
> Pavel,
> 
> On Fri, 2007-09-14 at 14:38 +0200, Pavel Machek wrote:
> > > I have an as yet untested fix, which preserves the broadcast state across
> > > the offline state, but Len is looking into it as well, whether we can
> > > just reevaluate the power states (and the broadcast flags) when a cpu
> > > becomes online again. If Len can do that easily for 2.6.23, I'd prefer
> > > that.
> > 
> > Is there a patch you want me to test? Or does Len have anything to
> > play with?
> 
> Venki sent me an initial patch, but it has issues with the notify
> ordering. Find below my "cache the broadcast flags" version for testing.

Hmmpf, the flag is still cleared when the cpu goes offline. Need to take
a closer look.

	tglx



^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: cpu hotplug support broken in 2.6.23-rc3
  2007-09-14 12:50                     ` Thomas Gleixner
  2007-09-14 13:15                       ` Thomas Gleixner
@ 2007-09-14 18:49                       ` Pallipadi, Venkatesh
  2007-09-14 19:18                         ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Pallipadi, Venkatesh @ 2007-09-14 18:49 UTC (permalink / raw)
  To: Thomas Gleixner, Pavel Machek
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list,
	Len Brown

 

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of 
>Thomas Gleixner
>Sent: Friday, September 14, 2007 5:51 AM
>To: Pavel Machek
>Cc: Rafael J. Wysocki; Jeff Chua; rusty@rustycorp.com.au; 
>vatsa@in.ibm.com; zwane@arm.linux.org.uk; kernel list; Len Brown
>Subject: Re: cpu hotplug support broken in 2.6.23-rc3
>
>Pavel,
>
>On Fri, 2007-09-14 at 14:38 +0200, Pavel Machek wrote:
>> > I have an as yet untested fix, which preserves the broadcast state across
>> > the offline state, but Len is looking into it as well, whether we can
>> > just reevaluate the power states (and the broadcast flags) when a cpu
>> > becomes online again. If Len can do that easily for 2.6.23, I'd prefer
>> > that.
>> 
>> Is there a patch you want me to test? Or does Len have anything to
>> play with?
>
>Venki sent me an initial patch, but it has issues with the notify
>ordering. Find below my "cache the broadcast flags" version for testing.
>

While writing that patch, I knew the solution could not be that simple :(.
Does the patch work for the online/offline case at least?
I will look at the suspend/resume ordering part in that case.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: cpu hotplug support broken in 2.6.23-rc3
  2007-09-14 18:49                       ` Pallipadi, Venkatesh
@ 2007-09-14 19:18                         ` Thomas Gleixner
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-14 19:18 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Pavel Machek, Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane,
	kernel list, Len Brown

On Fri, 2007-09-14 at 11:49 -0700, Pallipadi, Venkatesh wrote:
> >> 
> >> Is there a patch you want me to test? Or does Len have anything to
> >> play with?
> >
> >Venki sent me an initial patch, but it has issues with the notify
> >ordering. Find below my "cache the broadcast flags" version for testing.
> >
> 
> While writing that patch, I knew the solution could not be that simple :(.
> Does the patch work for the online/offline case at least?
> I will look at the suspend/resume ordering part in that case.

Yup, the online/offline part works and it helped me to decode the other
reason (/me needs a dark brown paperbag) why Pavel noticed that his box
turned into a brick. I'll send out a full series of fixups (including
your online/offline one) tomorrow morning. I want to give that some more
testing.

Vs. the resume reevaluation: I don't think it's an urgent problem. It's
only my VAIO which does not tell the kernel after resume that the power
supply source has changed. All my other boxen do that and we never had a
complaint about that from other folks.

	tglx



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-14 13:15                       ` Thomas Gleixner
@ 2007-09-15  9:49                         ` Thomas Gleixner
  2007-09-15 10:18                           ` Andrew Morton
  2007-10-02  9:45                           ` Pavel Machek
  0 siblings, 2 replies; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-15  9:49 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list,
	Len Brown

Pavel,

On Fri, 2007-09-14 at 15:15 +0200, Thomas Gleixner wrote:
> > Venki sent me an initial patch, but it has issues with the notify
> > ordering. Find below my "cache the broadcast flags" version for testing.
> 
> Hmmpf, the flag is still cleared when the cpu goes offline. Need to take
> a closer look.

I finally tracked it down. There were several ways to turn the box into
a brick. Sigh !

Can you please test the combo patch below ?

The details are available from the for-2.6.23 branch of my hrt git repo:

http://git.kernel.org/?p=linux/kernel/git/tglx/linux-2.6-hrt.git;a=shortlog;h=for-2.6.23

Thanks,

	tglx

Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c	2007-09-15 11:42:09.000000000 +0200
+++ linux-2.6/kernel/time/timekeeping.c	2007-09-15 11:43:03.000000000 +0200
@@ -217,6 +217,7 @@ static void change_clocksource(void)
 }
 #else
 static inline void change_clocksource(void) { }
+static inline s64 __get_nsec_offset(void) { return 0; }
 #endif
 
 /**
@@ -280,6 +281,8 @@ void __init timekeeping_init(void)
 static int timekeeping_suspended;
 /* time in seconds when suspend began */
 static unsigned long timekeeping_suspend_time;
+/* xtime offset when we went into suspend */
+static s64 timekeeping_suspend_nsecs;
 
 /**
  * timekeeping_resume - Resumes the generic timekeeping subsystem.
@@ -305,6 +308,8 @@ static int timekeeping_resume(struct sys
 		wall_to_monotonic.tv_sec -= sleep_length;
 		total_sleep_time += sleep_length;
 	}
+	/* Make sure that we have the correct xtime reference */
+	timespec_add_ns(&xtime, timekeeping_suspend_nsecs);
 	/* re-base the last cycle value */
 	clock->cycle_last = clocksource_read(clock);
 	clock->error = 0;
@@ -325,9 +330,12 @@ static int timekeeping_suspend(struct sy
 {
 	unsigned long flags;
 
+	timekeeping_suspend_time = read_persistent_clock();
+
 	write_seqlock_irqsave(&xtime_lock, flags);
+	/* Get the current xtime offset */
+	timekeeping_suspend_nsecs = __get_nsec_offset();
 	timekeeping_suspended = 1;
-	timekeeping_suspend_time = read_persistent_clock();
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
 	clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
Index: linux-2.6/drivers/acpi/processor_core.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_core.c	2007-09-15 11:42:09.000000000 +0200
+++ linux-2.6/drivers/acpi/processor_core.c	2007-09-15 11:43:03.000000000 +0200
@@ -724,6 +724,25 @@ static void acpi_processor_notify(acpi_h
 	return;
 }
 
+static int acpi_cpu_soft_notify(struct notifier_block *nfb,
+		unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	struct acpi_processor *pr = processors[cpu];
+
+	if (action == CPU_ONLINE && pr) {
+		acpi_processor_ppc_has_changed(pr);
+		acpi_processor_cst_has_changed(pr);
+		acpi_processor_tstate_has_changed(pr);
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block acpi_cpu_notifier =
+{
+	    .notifier_call = acpi_cpu_soft_notify,
+};
+
 static int acpi_processor_add(struct acpi_device *device)
 {
 	struct acpi_processor *pr = NULL;
@@ -987,6 +1006,7 @@ void acpi_processor_install_hotplug_noti
 			    ACPI_UINT32_MAX,
 			    processor_walk_namespace_cb, &action, NULL);
 #endif
+	register_hotcpu_notifier(&acpi_cpu_notifier);
 }
 
 static
@@ -999,6 +1019,7 @@ void acpi_processor_uninstall_hotplug_no
 			    ACPI_UINT32_MAX,
 			    processor_walk_namespace_cb, &action, NULL);
 #endif
+	unregister_hotcpu_notifier(&acpi_cpu_notifier);
 }
 
 /*
Index: linux-2.6/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-broadcast.c	2007-09-15 11:42:09.000000000 +0200
+++ linux-2.6/kernel/time/tick-broadcast.c	2007-09-15 11:43:03.000000000 +0200
@@ -382,12 +382,23 @@ static int tick_broadcast_set_event(ktim
 
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {
+	int cpu = smp_processor_id();
+
+	/*
+	 * If the CPU is marked for broadcast, enforce oneshot
+	 * broadcast mode. The jinxed VAIO does not resume otherwise.
+	 * No idea why it ends up in a lower C State during resume
+	 * without notifying the clock events layer.
+	 */
+	if (cpu_isset(cpu, tick_broadcast_mask))
+		cpu_set(cpu, tick_broadcast_oneshot_mask);
+
 	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 
 	if(!cpus_empty(tick_broadcast_oneshot_mask))
 		tick_broadcast_set_event(ktime_get(), 1);
 
-	return cpu_isset(smp_processor_id(), tick_broadcast_oneshot_mask);
+	return cpu_isset(cpu, tick_broadcast_oneshot_mask);
 }
 
 /*
@@ -549,20 +560,17 @@ void tick_broadcast_switch_to_oneshot(vo
  */
 void tick_shutdown_broadcast_oneshot(unsigned int *cpup)
 {
-	struct clock_event_device *bc;
 	unsigned long flags;
 	unsigned int cpu = *cpup;
 
 	spin_lock_irqsave(&tick_broadcast_lock, flags);
 
-	bc = tick_broadcast_device.evtdev;
+	/*
+	 * Clear the broadcast mask flag for the dead cpu, but do not
+	 * stop the broadcast device!
+	 */
 	cpu_clear(cpu, tick_broadcast_oneshot_mask);
 
-	if (tick_broadcast_device.mode == TICKDEV_MODE_ONESHOT) {
-		if (bc && cpus_empty(tick_broadcast_oneshot_mask))
-			clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
-	}
-
 	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c	2007-09-15 11:42:09.000000000 +0200
+++ linux-2.6/kernel/time/tick-sched.c	2007-09-15 11:43:41.000000000 +0200
@@ -160,6 +160,18 @@ void tick_nohz_stop_sched_tick(void)
 	cpu = smp_processor_id();
 	ts = &per_cpu(tick_cpu_sched, cpu);
 
+	/*
+	 * If this cpu is offline and it is the one which updates
+	 * jiffies, then give up the assignment and let it be taken by
+	 * the cpu which runs the tick timer next. If we don't drop
+	 * this here the jiffies might be stale and do_timer() never
+	 * invoked.
+	 */
+	if (unlikely(!cpu_online(cpu))) {
+		if (cpu == tick_do_timer_cpu)
+			tick_do_timer_cpu = -1;
+	}
+
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
 		goto end;
 


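The tick-sched.c hunk above can be illustrated with a small userspace model
of the jiffies handoff. This is only a sketch under stated assumptions:
struct tick_model, model_idle_enter() and model_tick() are hypothetical
names, and the rule that the next cpu running its tick picks up an unassigned
duty is taken from the comment in the patch, not from the surrounding kernel
source.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS	4
#define NO_TIMER_CPU	(-1)

/* Hypothetical model of the state touched by the hunk. */
struct tick_model {
	int do_timer_cpu;		/* models tick_do_timer_cpu */
	bool online[MODEL_NR_CPUS];	/* models cpu_online() */
	unsigned long jiffies;		/* advanced by the owner only */
};

/*
 * Models the check added to tick_nohz_stop_sched_tick(): an offline
 * cpu which still owns the jiffies update gives it up.
 */
static void model_idle_enter(struct tick_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (!m->online[cpu] && cpu == m->do_timer_cpu)
		m->do_timer_cpu = NO_TIMER_CPU;
}

/*
 * Models one tick on @cpu: an online cpu takes over an unassigned
 * duty, and only the owner advances jiffies.
 */
static void model_tick(struct tick_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (!m->online[cpu])
		return;
	if (m->do_timer_cpu == NO_TIMER_CPU)
		m->do_timer_cpu = cpu;
	if (m->do_timer_cpu == cpu)
		m->jiffies++;
}
```

In this model, if cpu 1 owns the update and goes offline without passing
through model_idle_enter(), ticks on cpu 0 never advance jiffies. Once the
offline cpu drops the assignment, the next tick on cpu 0 takes it over.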

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-15  9:49                         ` Thomas Gleixner
@ 2007-09-15 10:18                           ` Andrew Morton
  2007-09-15 13:28                             ` Thomas Gleixner
  2007-09-15 13:44                             ` Thomas Gleixner
  2007-10-02  9:45                           ` Pavel Machek
  1 sibling, 2 replies; 31+ messages in thread
From: Andrew Morton @ 2007-09-15 10:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Pavel Machek, Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane,
	kernel list, Len Brown

On Sat, 15 Sep 2007 11:49:41 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:

> On Fri, 2007-09-14 at 15:15 +0200, Thomas Gleixner wrote:
> > > Venki sent me an initial patch, but it has issues with the notify
> > > ordering. Find below my "cache the broadcast flags" version for testing.
> > 
> > Hmmpf, the flag is still cleared when the cpu goes offline. Need to take
> > a closer look.
> 
> I finally tracked it down. There were several ways to turn the box into
> a brick. Sigh !
> 
> Can you please test the combo patch below ?
> 
> The details are available from the for-2.6.23 branch of my hrt git repo:
> 
> http://git.kernel.org/?p=linux/kernel/git/tglx/linux-2.6-hrt.git;a=shortlog;h=for-2.6.23
> 

That patch fixes the resume-from-ram and suspend-to-ram regressions on the
Vaio.

I dropped the timekeeping.c hunks because they are an older version of
timekeeping-prevent-time-going-backwards-on-resume.patch which I already
had.

Is this good to go?  Needs a bit of changelogging.


 drivers/acpi/processor_core.c |   21 +++++++++++++++++++++
 kernel/time/tick-broadcast.c  |   24 ++++++++++++++++--------
 kernel/time/tick-sched.c      |   12 ++++++++++++
 3 files changed, 49 insertions(+), 8 deletions(-)

diff -puN drivers/acpi/processor_core.c~cpu-hotplug-support-broken-in-2623-rc3 drivers/acpi/processor_core.c
--- a/drivers/acpi/processor_core.c~cpu-hotplug-support-broken-in-2623-rc3
+++ a/drivers/acpi/processor_core.c
@@ -724,6 +724,25 @@ static void acpi_processor_notify(acpi_h
 	return;
 }
 
+static int acpi_cpu_soft_notify(struct notifier_block *nfb,
+		unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	struct acpi_processor *pr = processors[cpu];
+
+	if (action == CPU_ONLINE && pr) {
+		acpi_processor_ppc_has_changed(pr);
+		acpi_processor_cst_has_changed(pr);
+		acpi_processor_tstate_has_changed(pr);
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block acpi_cpu_notifier =
+{
+	    .notifier_call = acpi_cpu_soft_notify,
+};
+
 static int acpi_processor_add(struct acpi_device *device)
 {
 	struct acpi_processor *pr = NULL;
@@ -987,6 +1006,7 @@ void acpi_processor_install_hotplug_noti
 			    ACPI_UINT32_MAX,
 			    processor_walk_namespace_cb, &action, NULL);
 #endif
+	register_hotcpu_notifier(&acpi_cpu_notifier);
 }
 
 static
@@ -999,6 +1019,7 @@ void acpi_processor_uninstall_hotplug_no
 			    ACPI_UINT32_MAX,
 			    processor_walk_namespace_cb, &action, NULL);
 #endif
+	unregister_hotcpu_notifier(&acpi_cpu_notifier);
 }
 
 /*
diff -puN kernel/time/tick-broadcast.c~cpu-hotplug-support-broken-in-2623-rc3 kernel/time/tick-broadcast.c
--- a/kernel/time/tick-broadcast.c~cpu-hotplug-support-broken-in-2623-rc3
+++ a/kernel/time/tick-broadcast.c
@@ -382,12 +382,23 @@ static int tick_broadcast_set_event(ktim
 
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {
+	int cpu = smp_processor_id();
+
+	/*
+	 * If the CPU is marked for broadcast, enforce oneshot
+	 * broadcast mode. The jinxed VAIO does not resume otherwise.
+	 * No idea why it ends up in a lower C State during resume
+	 * without notifying the clock events layer.
+	 */
+	if (cpu_isset(cpu, tick_broadcast_mask))
+		cpu_set(cpu, tick_broadcast_oneshot_mask);
+
 	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 
 	if(!cpus_empty(tick_broadcast_oneshot_mask))
 		tick_broadcast_set_event(ktime_get(), 1);
 
-	return cpu_isset(smp_processor_id(), tick_broadcast_oneshot_mask);
+	return cpu_isset(cpu, tick_broadcast_oneshot_mask);
 }
 
 /*
@@ -549,20 +560,17 @@ void tick_broadcast_switch_to_oneshot(vo
  */
 void tick_shutdown_broadcast_oneshot(unsigned int *cpup)
 {
-	struct clock_event_device *bc;
 	unsigned long flags;
 	unsigned int cpu = *cpup;
 
 	spin_lock_irqsave(&tick_broadcast_lock, flags);
 
-	bc = tick_broadcast_device.evtdev;
+	/*
+	 * Clear the broadcast mask flag for the dead cpu, but do not
+	 * stop the broadcast device!
+	 */
 	cpu_clear(cpu, tick_broadcast_oneshot_mask);
 
-	if (tick_broadcast_device.mode == TICKDEV_MODE_ONESHOT) {
-		if (bc && cpus_empty(tick_broadcast_oneshot_mask))
-			clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);
-	}
-
 	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
diff -puN kernel/time/tick-sched.c~cpu-hotplug-support-broken-in-2623-rc3 kernel/time/tick-sched.c
--- a/kernel/time/tick-sched.c~cpu-hotplug-support-broken-in-2623-rc3
+++ a/kernel/time/tick-sched.c
@@ -160,6 +160,18 @@ void tick_nohz_stop_sched_tick(void)
 	cpu = smp_processor_id();
 	ts = &per_cpu(tick_cpu_sched, cpu);
 
+	/*
+	 * If this cpu is offline and it is the one which updates
+	 * jiffies, then give up the assignment and let it be taken by
+	 * the cpu which runs the tick timer next. If we don't drop
+	 * this here the jiffies might be stale and do_timer() never
+	 * invoked.
+	 */
+	if (unlikely(!cpu_online(cpu))) {
+		if (cpu == tick_do_timer_cpu)
+			tick_do_timer_cpu = -1;
+	}
+
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
 		goto end;
 
_


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-15 10:18                           ` Andrew Morton
@ 2007-09-15 13:28                             ` Thomas Gleixner
  2007-09-15 22:01                               ` Andrew Morton
  2007-09-15 13:44                             ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-15 13:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pavel Machek, Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane,
	kernel list, Len Brown

On Sat, 2007-09-15 at 03:18 -0700, Andrew Morton wrote:
> > http://git.kernel.org/?p=linux/kernel/git/tglx/linux-2.6-hrt.git;a=shortlog;h=for-2.6.23
> > 
> 
> That patch fixes the resume-from-ram and suspend-to-ram regressions on the
> Vaio.
> 
> I dropped the timekeeping.c hunks because they are an older version of
> timekeeping-prevent-time-going-backwards-on-resume.patch which I already
> had.
> 
> Is this good to go?  Needs a bit of changelogging.

The changelog is in the git tree. Please pull from there:

The following changes since commit 53a3f3087be361dacfc02e7a85b6d6142a41ce8a:
  Linus Torvalds (1):
        Merge branch 'for-linus' of master.kernel.org:/.../cooloney/blackfin-2.6

are available in the git repository at:

  ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt.git for-2.6.23

Thomas Gleixner (6):
      timekeeping: access rtc outside of xtime lock
      timekeeping: Prevent time going backwards on resume
      ACPI: Reevaluate C/P/T states when a cpu becomes online
      clockevents: Enforce oneshot broadcast when broadcast mask is set on resume
      clockevents: do not shutdown the oneshot broadcast device
      clockevents: prevent stale tick update on offline cpu

 drivers/acpi/processor_core.c |   21 +++++++++++++++++++++
 kernel/time/tick-broadcast.c  |   24 ++++++++++++++++--------
 kernel/time/tick-sched.c      |   12 ++++++++++++
 kernel/time/timekeeping.c     |   10 +++++++++-
 4 files changed, 58 insertions(+), 9 deletions(-)



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-15 10:18                           ` Andrew Morton
  2007-09-15 13:28                             ` Thomas Gleixner
@ 2007-09-15 13:44                             ` Thomas Gleixner
  1 sibling, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2007-09-15 13:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pavel Machek, Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane,
	kernel list, Len Brown

On Sat, 2007-09-15 at 03:18 -0700, Andrew Morton wrote:
> On Sat, 15 Sep 2007 11:49:41 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I dropped the timekeeping.c hunks because they are an older version of
> timekeeping-prevent-time-going-backwards-on-resume.patch which I already
> had.

Err, no. The timekeeping hunk is redone due to the lockdep fix which I
made.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-15 13:28                             ` Thomas Gleixner
@ 2007-09-15 22:01                               ` Andrew Morton
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Morton @ 2007-09-15 22:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Pavel Machek, Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane,
	kernel list, Len Brown

On Sat, 15 Sep 2007 15:28:23 +0200 Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sat, 2007-09-15 at 03:18 -0700, Andrew Morton wrote:
> > > http://git.kernel.org/?p=linux/kernel/git/tglx/linux-2.6-hrt.git;a=shortlog;h=for-2.6.23
> > > 
> > 
> > That patch fixes the resume-from-ram and suspend-to-ram regressions on the
> > Vaio.
> > 
> > I dropped the timekeeping.c hunks because they are an older version of
> > timekeeping-prevent-time-going-backwards-on-resume.patch which I already
> > had.
> > 
> > Is this good to go?  Needs a bit of changelogging.
> 
> Changelog it in the git tree. Please pull from there:

who, me?

> The following changes since commit 53a3f3087be361dacfc02e7a85b6d6142a41ce8a:
>   Linus Torvalds (1):
>         Merge branch 'for-linus' of master.kernel.org:/.../cooloney/blackfin-2.6
> 
> are available in the git repository at:
> 
>   ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt.git for-2.6.23
> 
> Thomas Gleixner (6):
>       timekeeping: access rtc outside of xtime lock
>       timekeeping: Prevent time going backwards on resume
>       ACPI: Reevaluate C/P/T states when a cpu becomes online
>       clockevents: Enforce oneshot broadcast when broadcast mask is set on resume
>       clockevents: do not shutdown the oneshot broadcast device
>       clockevents: prevent stale tick update on offline cpu

please send it to Linus?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: cpu hotplug support broken in 2.6.23-rc3
  2007-09-15  9:49                         ` Thomas Gleixner
  2007-09-15 10:18                           ` Andrew Morton
@ 2007-10-02  9:45                           ` Pavel Machek
  1 sibling, 0 replies; 31+ messages in thread
From: Pavel Machek @ 2007-10-02  9:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Jeff Chua, rusty, vatsa, zwane, kernel list,
	Len Brown

Hi!

> > > Venki sent me an initial patch, but it has issues with the notify
> > > ordering. Find below my "cache the broadcast flags" version for testing.
> > 
> > Hmmpf, the flag is still cleared when the cpu goes offline. Need to take
> > a closer look.
> 
> I finally tracked it down. There were several ways to turn the box into
> a brick. Sigh !
> 
> Can you please test the combo patch below ?

Sorry, I was on holidays. I assume this is in -rc9 or so, already?
Yes, seems so.

Unfortunately, cpu hotplug seems to be still behaving strangely in
-rc9. I can echo 0 > online (and cpu will go down). I do echo 0 >
online, again, and I get -EBUSY. Good. But I try to do echo 1 >
online, and get -EBUSY, too... and that's bad :-(.

root@amd:/sys/devices/system/cpu/cpu1# echo 0 > online
root@amd:/sys/devices/system/cpu/cpu1# echo 0 > online
-bash: echo: write error: Device or resource busy
root@amd:/sys/devices/system/cpu/cpu1# echo 1 > online
-bash: echo: write error: Device or resource busy
root@amd:/sys/devices/system/cpu/cpu1# uname -a
Linux amd 2.6.23-rc9 #507 SMP Tue Oct 2 09:58:40 CEST 2007 i686
GNU/Linux

Kernel says:

Oct  2 11:42:12 amd log1n[1436]: ROOT LOGIN on `tty1'
Oct  2 11:42:56 amd kernel: CPU 1 is now offline
Oct  2 11:42:56 amd kernel: SMP alternatives: switching to UP code

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* cpu hotplug strangeness in 2.6.24-rc2 (was Re: cpu hotplug support broken in 2.6.23-rc3)
  2007-09-03  3:58   ` Pavel Machek
@ 2007-11-15 22:37     ` Pavel Machek
  0 siblings, 0 replies; 31+ messages in thread
From: Pavel Machek @ 2007-11-15 22:37 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: rusty, vatsa, zwane, kernel list, Rafael J. Wysocki

Hi!

> > > Plus I guess it would be nice to add CPU HOTPLUG into MAINTAINERS
> > > file:
> > > 
> > 
> > There is a list of maintainers in the Documentation/cpu-hotplug.txt, 
> > which includes maintainers for different platforms as well.
> > 
> > It's a good idea to add that info to the MAINTAINERS file as well.
> 
> Yes, please.

Just an update... In 2.6.24-rc2, cpu hotplug basically works, _but_:

if I do echo 0 > online; echo 0 > online; on the same cpu, I get an error,
and can't bring anything up any more. It is not serious, but it is not
pretty, either.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

Thread overview: 31+ messages
2007-08-27 10:43 cpu hotplug support broken in 2.6.23-rc3 Pavel Machek
2007-08-27 10:58 ` Pavel Machek
2007-08-27 14:36   ` Jeff Chua
2007-08-27 15:22     ` Michal Piotrowski
2007-08-27 21:32     ` Pavel Machek
2007-08-27 21:59       ` Rafael J. Wysocki
2007-08-27 21:58         ` Pavel Machek
2007-08-28 10:30           ` Rafael J. Wysocki
2007-08-28 13:00             ` Akinobu Mita
2007-08-28 14:21       ` Jeff Chua
2007-09-03  3:47         ` Pavel Machek
2007-09-03 10:19           ` Rafael J. Wysocki
2007-09-03 12:35             ` Thomas Gleixner
2007-09-04  7:27               ` Pavel Machek
2007-09-13 20:01                 ` Thomas Gleixner
2007-09-14 12:38                   ` Pavel Machek
2007-09-14 12:50                     ` Thomas Gleixner
2007-09-14 13:15                       ` Thomas Gleixner
2007-09-15  9:49                         ` Thomas Gleixner
2007-09-15 10:18                           ` Andrew Morton
2007-09-15 13:28                             ` Thomas Gleixner
2007-09-15 22:01                               ` Andrew Morton
2007-09-15 13:44                             ` Thomas Gleixner
2007-10-02  9:45                           ` Pavel Machek
2007-09-14 18:49                       ` Pallipadi, Venkatesh
2007-09-14 19:18                         ` Thomas Gleixner
2007-09-03  3:56         ` highres timers break cpu hotplug in 2.6.23-rc5 [was Re: cpu hotplug support broken in 2.6.23-rc3] Pavel Machek
2007-09-03 12:34           ` Jeff Chua
2007-08-29  8:08 ` cpu hotplug support broken in 2.6.23-rc3 Gautham R Shenoy
2007-09-03  3:58   ` Pavel Machek
2007-11-15 22:37     ` cpu hotplug strangeness in 2.6.24-rc2 (was Re: cpu hotplug support broken in 2.6.23-rc3) Pavel Machek
