LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
@ 2008-01-27 21:29 Rafael J. Wysocki
2008-01-27 21:59 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-27 21:29 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML
Hi,
2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation code paths
(probably related to the disabling of nonboot CPUs), which is !@#$%^&*()
annoying.
It's 100% reproducible on my HP nx6325 and bisection idendified the following
commit as the first bad one:
commit 764a9d6fe4b52995c8aba277e3634385699354f4
Author: Steven Rostedt <srostedt@redhat.com>
Date: Fri Jan 25 21:08:04 2008 +0100
sched: track highest prio task queued
Greetings,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-27 21:29 [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected) Rafael J. Wysocki
@ 2008-01-27 21:59 ` Ingo Molnar
2008-01-28 1:26 ` Rafael J. Wysocki
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-01-27 21:59 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko, Peter Zijlstra
* Rafael J. Wysocki <rjw@sisk.pl> wrote:
> Hi,
>
> 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> code paths (probably related to the disabling of nonboot CPUs), which
> is !@#$%^&*() annoying.
>
> It's 100% reproducible on my HP nx6325 and bisection idendified the
> following commit as the first bad one:
>
> commit 764a9d6fe4b52995c8aba277e3634385699354f4
> Author: Steven Rostedt <srostedt@redhat.com>
> Date: Fri Jan 25 21:08:04 2008 +0100
>
> sched: track highest prio task queued
hm, this patch is a NOP, so it's weird that it has an effect.
Do you have serial logging enabled perhaps? If the following WARN_ON()
triggers:
+ WARN_ON(p->prio < rq->rt.highest_prio);
then perhaps that can cause a 5-10 seconds delay. (that's how much time
it takes to printk a warning on the slowest serial settings)
but if you use suspend, then any such printks would be preserved in the
dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
get faster?
this does have the feel of being scheduling related, but are you
absolutely sure about the precise identity of the patch? It's not the
next patch or the preceding patch by any chance? (which would also be
scheduler patches)
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-27 21:59 ` Ingo Molnar
@ 2008-01-28 1:26 ` Rafael J. Wysocki
2008-01-28 1:40 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 1:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko, Peter Zijlstra
On Sunday, 27 of January 2008, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> > Hi,
> >
> > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > code paths (probably related to the disabling of nonboot CPUs), which
> > is !@#$%^&*() annoying.
> >
> > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > following commit as the first bad one:
> >
> > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > Author: Steven Rostedt <srostedt@redhat.com>
> > Date: Fri Jan 25 21:08:04 2008 +0100
> >
> > sched: track highest prio task queued
>
> hm, this patch is a NOP, so it's weird that it has an effect.
>
> Do you have serial logging enabled perhaps? If the following WARN_ON()
> triggers:
>
> + WARN_ON(p->prio < rq->rt.highest_prio);
>
> then perhaps that can cause a 5-10 seconds delay. (that's how much time
> it takes to printk a warning on the slowest serial settings)
>
> but if you use suspend, then any such printks would be preserved in the
> dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> get faster?
No, this isn't the WARN_ON().
> this does have the feel of being scheduling related, but are you
> absolutely sure about the precise identity of the patch?
Actually, not quite. That's why I have verified it and found that another
patch is really responsible for the issue, namely:
commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
Author: Ingo Molnar <mingo@elte.hu>
Date: Fri Jan 25 21:08:02 2008 +0100
softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
Reverting this commit (it reverts with some minor modifications) fixes the
problem for me.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 1:26 ` Rafael J. Wysocki
@ 2008-01-28 1:40 ` Steven Rostedt
2008-01-28 11:31 ` Rafael J. Wysocki
2008-01-28 8:56 ` Dmitry Adamushko
2008-01-31 15:58 ` Peter Zijlstra
2 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2008-01-28 1:40 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko, Peter Zijlstra
Rafael J. Wysocki wrote:
> No, this isn't the WARN_ON().
>
>> this does have the feel of being scheduling related, but are you
>> absolutely sure about the precise identity of the patch?
>
> Actually, not quite. That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Fri Jan 25 21:08:02 2008 +0100
>
> softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
Are you getting a bunch of prints from the softlockup detector in dmesg?
I wonder if the detector can detect a long timeout caused by suspend and
resume and if not is triggering false positives?
-- Steve
>
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 1:26 ` Rafael J. Wysocki
2008-01-28 1:40 ` Steven Rostedt
@ 2008-01-28 8:56 ` Dmitry Adamushko
2008-01-28 11:32 ` Rafael J. Wysocki
2008-01-28 16:08 ` Rafael J. Wysocki
2008-01-31 15:58 ` Peter Zijlstra
2 siblings, 2 replies; 27+ messages in thread
From: Dmitry Adamushko @ 2008-01-28 8:56 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Peter Zijlstra
On 28/01/2008, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Sunday, 27 of January 2008, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > > Hi,
> > >
> > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > code paths (probably related to the disabling of nonboot CPUs), which
> > > is !@#$%^&*() annoying.
> > >
> > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > following commit as the first bad one:
> > >
> > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > Author: Steven Rostedt <srostedt@redhat.com>
> > > Date: Fri Jan 25 21:08:04 2008 +0100
> > >
> > > sched: track highest prio task queued
> >
> > hm, this patch is a NOP, so it's weird that it has an effect.
> >
> > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > triggers:
> >
> > + WARN_ON(p->prio < rq->rt.highest_prio);
> >
> > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > it takes to printk a warning on the slowest serial settings)
> >
> > but if you use suspend, then any such printks would be preserved in the
> > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > get faster?
>
> No, this isn't the WARN_ON().
>
> > this does have the feel of being scheduling related, but are you
> > absolutely sure about the precise identity of the patch?
>
> Actually, not quite. That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Fri Jan 25 21:08:02 2008 +0100
>
> softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.
What if you use the same kernel that triggers a problem and just disable
this new 'softlockup' functionality:
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
does the problem disapear?
TIA,
>
> Thanks,
> Rafael
>
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 1:40 ` Steven Rostedt
@ 2008-01-28 11:31 ` Rafael J. Wysocki
2008-01-28 16:31 ` Rafael J. Wysocki
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 11:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko, Peter Zijlstra
On Monday, 28 of January 2008, Steven Rostedt wrote:
> Rafael J. Wysocki wrote:
>
> > No, this isn't the WARN_ON().
> >
> >> this does have the feel of being scheduling related, but are you
> >> absolutely sure about the precise identity of the patch?
> >
> > Actually, not quite. That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> >
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date: Fri Jan 25 21:08:02 2008 +0100
> >
> > softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> Are you getting a bunch of prints from the softlockup detector in dmesg?
No, I don't. In fact, I don't get _any_ messages from it whatsoever.
> I wonder if the detector can detect a long timeout caused by suspend and
> resume and if not is triggering false positives?
I'm not sure, but the code is supposed to be suspend-aware, IIRC. However,
I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
more directly related to the CPU hotplug. I'll try to verify that.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 8:56 ` Dmitry Adamushko
@ 2008-01-28 11:32 ` Rafael J. Wysocki
2008-01-28 16:08 ` Rafael J. Wysocki
1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 11:32 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Peter Zijlstra
On Monday, 28 of January 2008, Dmitry Adamushko wrote:
> On 28/01/2008, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Sunday, 27 of January 2008, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >
> > > > Hi,
> > > >
> > > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > > code paths (probably related to the disabling of nonboot CPUs), which
> > > > is !@#$%^&*() annoying.
> > > >
> > > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > > following commit as the first bad one:
> > > >
> > > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > > Author: Steven Rostedt <srostedt@redhat.com>
> > > > Date: Fri Jan 25 21:08:04 2008 +0100
> > > >
> > > > sched: track highest prio task queued
> > >
> > > hm, this patch is a NOP, so it's weird that it has an effect.
> > >
> > > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > > triggers:
> > >
> > > + WARN_ON(p->prio < rq->rt.highest_prio);
> > >
> > > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > > it takes to printk a warning on the slowest serial settings)
> > >
> > > but if you use suspend, then any such printks would be preserved in the
> > > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > > get faster?
> >
> > No, this isn't the WARN_ON().
> >
> > > this does have the feel of being scheduling related, but are you
> > > absolutely sure about the precise identity of the patch?
> >
> > Actually, not quite. That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> >
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date: Fri Jan 25 21:08:02 2008 +0100
> >
> > softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> >
> > Reverting this commit (it reverts with some minor modifications) fixes the
> > problem for me.
>
> What if you use the same kernel that triggers a problem and just disable
> this new 'softlockup' functionality:
>
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> does the problem disapear?
I haven't checked that yet, will do it later today.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 8:56 ` Dmitry Adamushko
2008-01-28 11:32 ` Rafael J. Wysocki
@ 2008-01-28 16:08 ` Rafael J. Wysocki
1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 16:08 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Peter Zijlstra
On Monday, 28 of January 2008, Dmitry Adamushko wrote:
> On 28/01/2008, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Sunday, 27 of January 2008, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >
> > > > Hi,
> > > >
> > > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > > code paths (probably related to the disabling of nonboot CPUs), which
> > > > is !@#$%^&*() annoying.
> > > >
> > > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > > following commit as the first bad one:
> > > >
> > > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > > Author: Steven Rostedt <srostedt@redhat.com>
> > > > Date: Fri Jan 25 21:08:04 2008 +0100
> > > >
> > > > sched: track highest prio task queued
> > >
> > > hm, this patch is a NOP, so it's weird that it has an effect.
> > >
> > > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > > triggers:
> > >
> > > + WARN_ON(p->prio < rq->rt.highest_prio);
> > >
> > > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > > it takes to printk a warning on the slowest serial settings)
> > >
> > > but if you use suspend, then any such printks would be preserved in the
> > > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > > get faster?
> >
> > No, this isn't the WARN_ON().
> >
> > > this does have the feel of being scheduling related, but are you
> > > absolutely sure about the precise identity of the patch?
> >
> > Actually, not quite. That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> >
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date: Fri Jan 25 21:08:02 2008 +0100
> >
> > softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> >
> > Reverting this commit (it reverts with some minor modifications) fixes the
> > problem for me.
>
> What if you use the same kernel that triggers a problem and just disable
> this new 'softlockup' functionality:
>
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> does the problem disapear?
No, it doesn't. The setting doesn't seem to have any effect on it.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 11:31 ` Rafael J. Wysocki
@ 2008-01-28 16:31 ` Rafael J. Wysocki
2008-01-28 16:46 ` Steven Rostedt
2008-01-29 0:08 ` Rafael J. Wysocki
0 siblings, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-28 16:31 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko, Peter Zijlstra
On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> On Monday, 28 of January 2008, Steven Rostedt wrote:
> > Rafael J. Wysocki wrote:
> >
> > > No, this isn't the WARN_ON().
> > >
> > >> this does have the feel of being scheduling related, but are you
> > >> absolutely sure about the precise identity of the patch?
> > >
> > > Actually, not quite. That's why I have verified it and found that another
> > > patch is really responsible for the issue, namely:
> > >
> > > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > > Author: Ingo Molnar <mingo@elte.hu>
> > > Date: Fri Jan 25 21:08:02 2008 +0100
> > >
> > > softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> >
> > Are you getting a bunch of prints from the softlockup detector in dmesg?
>
> No, I don't. In fact, I don't get _any_ messages from it whatsoever.
>
> > I wonder if the detector can detect a long timeout caused by suspend and
> > resume and if not is triggering false positives?
>
> I'm not sure, but the code is supposed to be suspend-aware, IIRC. However,
> I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
> more directly related to the CPU hotplug. I'll try to verify that.
As I expected, the delay is also observable when I do:
echo 0 > /sys/devices/system/cpu/cpu1/online
(it's variable, between 3 and 30 seconds). Again, no messages appear in dmesg
when this happens.
I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
to try that later today).
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 16:31 ` Rafael J. Wysocki
@ 2008-01-28 16:46 ` Steven Rostedt
2008-01-29 0:08 ` Rafael J. Wysocki
1 sibling, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2008-01-28 16:46 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko, Peter Zijlstra
Rafael J. Wysocki wrote:
> On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
>> On Monday, 28 of January 2008, Steven Rostedt wrote:
>>> Rafael J. Wysocki wrote:
>>>
>>>> No, this isn't the WARN_ON().
>>>>
>>>>> this does have the feel of being scheduling related, but are you
>>>>> absolutely sure about the precise identity of the patch?
>>>> Actually, not quite. That's why I have verified it and found that another
>>>> patch is really responsible for the issue, namely:
>>>>
>>>> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
>>>> Author: Ingo Molnar <mingo@elte.hu>
>>>> Date: Fri Jan 25 21:08:02 2008 +0100
>>>>
>>>> softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>>> Are you getting a bunch of prints from the softlockup detector in dmesg?
>> No, I don't. In fact, I don't get _any_ messages from it whatsoever.
>>
>>> I wonder if the detector can detect a long timeout caused by suspend and
>>> resume and if not is triggering false positives?
>> I'm not sure, but the code is supposed to be suspend-aware, IIRC. However,
>> I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
>> more directly related to the CPU hotplug. I'll try to verify that.
>
> As I expected, the delay is also observable when I do:
>
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> (it's variable, between 3 and 30 seconds). Again, no messages appear in dmesg
> when this happens.
>
> I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
> to try that later today).
Could you download my tracing patches from here:
http://people.redhat.com/srostedt/tracing/mcount-tracing-patches-v6.tar.bz2
Apply the above patches and select all but the histogram tracers. Then
run this program
http://people.redhat.com/srostedt/tracing/trace-cmd.c
./trace-cmd -f echo 0 > /sys/devices/system/cpu/cpu1/online
then do this (mount debugfs):
bzip2 -c /debugfs/tracing/latency_trace > /tmp/trace.bz2
and send the result to myself and Ingo.
Thanks,
-- Steve
>
> Thanks,
> Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 16:31 ` Rafael J. Wysocki
2008-01-28 16:46 ` Steven Rostedt
@ 2008-01-29 0:08 ` Rafael J. Wysocki
1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-29 0:08 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko, Peter Zijlstra
On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> On Monday, 28 of January 2008, Rafael J. Wysocki wrote:
> > On Monday, 28 of January 2008, Steven Rostedt wrote:
> > > Rafael J. Wysocki wrote:
> > >
> > > > No, this isn't the WARN_ON().
> > > >
> > > >> this does have the feel of being scheduling related, but are you
> > > >> absolutely sure about the precise identity of the patch?
> > > >
> > > > Actually, not quite. That's why I have verified it and found that another
> > > > patch is really responsible for the issue, namely:
> > > >
> > > > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > > > Author: Ingo Molnar <mingo@elte.hu>
> > > > Date: Fri Jan 25 21:08:02 2008 +0100
> > > >
> > > > softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> > >
> > > Are you getting a bunch of prints from the softlockup detector in dmesg?
> >
> > No, I don't. In fact, I don't get _any_ messages from it whatsoever.
> >
> > > I wonder if the detector can detect a long timeout caused by suspend and
> > > resume and if not is triggering false positives?
> >
> > I'm not sure, but the code is supposed to be suspend-aware, IIRC. However,
> > I'm seeing a similar symptom on poweroff on an SMP x86-64 box, so it may be
> > more directly related to the CPU hotplug. I'll try to verify that.
>
> As I expected, the delay is also observable when I do:
>
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> (it's variable, between 3 and 30 seconds). Again, no messages appear in dmesg
> when this happens.
>
> I suspect I'll be able to reproduce it on another x86-64 SMP machine (I'm going
> to try that later today).
FWIW, the problem is reproducible on the second machine, with the same
symptoms. Both machines are AMD-based.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-28 1:26 ` Rafael J. Wysocki
2008-01-28 1:40 ` Steven Rostedt
2008-01-28 8:56 ` Dmitry Adamushko
@ 2008-01-31 15:58 ` Peter Zijlstra
2008-01-31 20:54 ` Rafael J. Wysocki
2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2008-01-31 15:58 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko
On Mon, 2008-01-28 at 02:26 +0100, Rafael J. Wysocki wrote:
> On Sunday, 27 of January 2008, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > > Hi,
> > >
> > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > code paths (probably related to the disabling of nonboot CPUs), which
> > > is !@#$%^&*() annoying.
> > >
> > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > following commit as the first bad one:
> > >
> > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > Author: Steven Rostedt <srostedt@redhat.com>
> > > Date: Fri Jan 25 21:08:04 2008 +0100
> > >
> > > sched: track highest prio task queued
> >
> > hm, this patch is a NOP, so it's weird that it has an effect.
> >
> > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > triggers:
> >
> > + WARN_ON(p->prio < rq->rt.highest_prio);
> >
> > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > it takes to printk a warning on the slowest serial settings)
> >
> > but if you use suspend, then any such printks would be preserved in the
> > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > get faster?
>
> No, this isn't the WARN_ON().
>
> > this does have the feel of being scheduling related, but are you
> > absolutely sure about the precise identity of the patch?
>
> Actually, not quite. That's why I have verified it and found that another
> patch is really responsible for the issue, namely:
>
> commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Fri Jan 25 21:08:02 2008 +0100
>
> softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> Reverting this commit (it reverts with some minor modifications) fixes the
> problem for me.
I can seem to reproduce this:
[root@opteron cpu1]# time echo 0 > online
real 0m6.230s
user 0m0.000s
sys 0m0.010s
[root@opteron cpu1]# echo 1 > online
[root@opteron cpu1]# time echo 0 > online
real 0m7.966s
user 0m0.000s
sys 0m0.011s
I'll have a look at it.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-31 15:58 ` Peter Zijlstra
@ 2008-01-31 20:54 ` Rafael J. Wysocki
2008-02-01 12:04 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-01-31 20:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko
On Thursday, 31 of January 2008, Peter Zijlstra wrote:
>
> On Mon, 2008-01-28 at 02:26 +0100, Rafael J. Wysocki wrote:
> > On Sunday, 27 of January 2008, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >
> > > > Hi,
> > > >
> > > > 2.6.24-git3 adds a 5 - 10 sec delay to the suspend and hibernation
> > > > code paths (probably related to the disabling of nonboot CPUs), which
> > > > is !@#$%^&*() annoying.
> > > >
> > > > It's 100% reproducible on my HP nx6325 and bisection idendified the
> > > > following commit as the first bad one:
> > > >
> > > > commit 764a9d6fe4b52995c8aba277e3634385699354f4
> > > > Author: Steven Rostedt <srostedt@redhat.com>
> > > > Date: Fri Jan 25 21:08:04 2008 +0100
> > > >
> > > > sched: track highest prio task queued
> > >
> > > hm, this patch is a NOP, so it's weird that it has an effect.
> > >
> > > Do you have serial logging enabled perhaps? If the following WARN_ON()
> > > triggers:
> > >
> > > + WARN_ON(p->prio < rq->rt.highest_prio);
> > >
> > > then perhaps that can cause a 5-10 seconds delay. (that's how much time
> > > it takes to printk a warning on the slowest serial settings)
> > >
> > > but if you use suspend, then any such printks would be preserved in the
> > > dmesg, right? If the WARN_ON() triggers, and if you remove it, do things
> > > get faster?
> >
> > No, this isn't the WARN_ON().
> >
> > > this does have the feel of being scheduling related, but are you
> > > absolutely sure about the precise identity of the patch?
> >
> > Actually, not quite. That's why I have verified it and found that another
> > patch is really responsible for the issue, namely:
> >
> > commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date: Fri Jan 25 21:08:02 2008 +0100
> >
> > softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
> >
> > Reverting this commit (it reverts with some minor modifications) fixes the
> > problem for me.
>
> I can seem to reproduce this:
>
> [root@opteron cpu1]# time echo 0 > online
>
> real 0m6.230s
> user 0m0.000s
> sys 0m0.010s
> [root@opteron cpu1]# echo 1 > online
> [root@opteron cpu1]# time echo 0 > online
>
> real 0m7.966s
> user 0m0.000s
> sys 0m0.011s
>
>
> I'll have a look at it.
Much appreciated, thanks!
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-01-31 20:54 ` Rafael J. Wysocki
@ 2008-02-01 12:04 ` Peter Zijlstra
2008-02-01 12:47 ` Ingo Molnar
2008-02-01 15:11 ` Dmitry Adamushko
0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2008-02-01 12:04 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ingo Molnar, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML,
Dmitry Adamushko
On Thu, 2008-01-31 at 21:54 +0100, Rafael J. Wysocki wrote:
> On Thursday, 31 of January 2008, Peter Zijlstra wrote:
> > I can seem to reproduce this:
> >
> > [root@opteron cpu1]# time echo 0 > online
> >
> > real 0m6.230s
> > user 0m0.000s
> > sys 0m0.010s
> > [root@opteron cpu1]# echo 1 > online
> > [root@opteron cpu1]# time echo 0 > online
> >
> > real 0m7.966s
> > user 0m0.000s
> > sys 0m0.011s
> >
> >
> > I'll have a look at it.
>
> Much appreciated, thanks!
The below fixes it for me..
---
- restore the old wakeup mechanism
- fix break usage in do_each_thread() { } while_eac_thread().
- fix the hotplug switch stmt, a fall-through case was broken.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index c1d7655..7c2da88 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -101,6 +101,10 @@ void softlockup_tick(void)
now = get_timestamp(this_cpu);
+ /* Wake up the high-prio watchdog task every second: */
+ if (now > (touch_timestamp + 1))
+ wake_up_process(per_cpu(watchdog_task, this_cpu));
+
/* Warn about unreasonable delays: */
if (now <= (touch_timestamp + softlockup_thresh))
return;
@@ -191,11 +195,11 @@ static void check_hung_uninterruptible_tasks(int this_cpu)
read_lock(&tasklist_lock);
do_each_thread(g, t) {
if (!--max_count)
- break;
+ goto unlock;
if (t->state & TASK_UNINTERRUPTIBLE)
check_hung_task(t, now);
} while_each_thread(g, t);
-
+ unlock:
read_unlock(&tasklist_lock);
}
@@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
* debug-printout triggers in softlockup_tick().
*/
while (!kthread_should_stop()) {
+ set_current_state(TASK_INTERRUPTIBLE);
touch_softlockup_watchdog();
- msleep_interruptible(10000);
+ schedule();
+
+ if (kthread_should_stop())
+ break;
if (this_cpu != check_cpu)
continue;
if (sysctl_hung_task_timeout_secs)
check_hung_uninterruptible_tasks(this_cpu);
+
}
return 0;
@@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
wake_up_process(per_cpu(watchdog_task, hotcpu));
break;
#ifdef CONFIG_HOTPLUG_CPU
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
- if (!per_cpu(watchdog_task, hotcpu))
- break;
- /* Unbind so it can run. Fall thru. */
- kthread_bind(per_cpu(watchdog_task, hotcpu),
- any_online_cpu(cpu_online_map));
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
if (hotcpu == check_cpu) {
@@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
check_cpu = any_online_cpu(temp_cpu_online_map);
}
break;
+
+ case CPU_UP_CANCELED:
+ case CPU_UP_CANCELED_FROZEN:
+ if (!per_cpu(watchdog_task, hotcpu))
+ break;
+ /* Unbind so it can run. Fall thru. */
+ kthread_bind(per_cpu(watchdog_task, hotcpu),
+ any_online_cpu(cpu_online_map));
case CPU_DEAD:
case CPU_DEAD_FROZEN:
p = per_cpu(watchdog_task, hotcpu);
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 12:04 ` Peter Zijlstra
@ 2008-02-01 12:47 ` Ingo Molnar
2008-02-01 14:42 ` Rafael J. Wysocki
2008-02-01 15:11 ` Dmitry Adamushko
1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 12:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Steven Rostedt, Andrew Morton, Linus Torvalds,
LKML, Dmitry Adamushko
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> The below fixes it for me..
>
> ---
> - restore the old wakeup mechanism
> - fix break usage in do_each_thread() { } while_eac_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.
thanks Peter, i've picked your fix up. I hope this solves the problem
for Rafael too.
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 12:47 ` Ingo Molnar
@ 2008-02-01 14:42 ` Rafael J. Wysocki
2008-02-01 23:19 ` Rafael J. Wysocki
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-02-01 14:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds,
LKML, Dmitry Adamushko
On Friday, 1 of February 2008, Ingo Molnar wrote:
>
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > The below fixes it for me..
> >
> > ---
> > - restore the old wakeup mechanism
> > - fix break usage in do_each_thread() { } while_eac_thread().
> > - fix the hotplug switch stmt, a fall-through case was broken.
>
> thanks Peter, i've picked your fix up. I hope this solves the problem
> for Rafael too.
No, it doesn't, unfortunately. Actually, it breaks things even more, as
suspend now hangs (probably deadlocks) while disabling the nonboot CPU.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 12:04 ` Peter Zijlstra
2008-02-01 12:47 ` Ingo Molnar
@ 2008-02-01 15:11 ` Dmitry Adamushko
2008-02-01 17:10 ` Ingo Molnar
1 sibling, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 15:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Ingo Molnar, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
On 01/02/2008, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> ---
> - restore the old wakeup mechanism
and how does it change behavior, logically-wise?
do we somehow miss a 'wake-up' from kthread_stop() so that its caller
gets blocked until watchdog's msleep_interruptible(10000) timeouts? On
average, it would take +-5 sec. and might explain the first
observation of Ravael -- "...adds a 5 - 10 sec delay..." (although,
lately he reported up to +30 sec. delays).
(/me goint to also try reproducing it later today)
> [ ... ]
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 15:11 ` Dmitry Adamushko
@ 2008-02-01 17:10 ` Ingo Molnar
2008-02-01 21:54 ` Dmitry Adamushko
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 17:10 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> On 01/02/2008, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > ---
> > - restore the old wakeup mechanism
>
> and how does it change behavior, logically-wise?
>
> do we somehow miss a 'wake-up' from kthread_stop() so that its caller
> gets blocked until watchdog's msleep_interruptible(10000) timeouts? On
> average, it would take +-5 sec. and might explain the first
> observation of Ravael -- "...adds a 5 - 10 sec delay..." (although,
> lately he reported up to +30 sec. delays).
>
> (/me goint to also try reproducing it later today)
thanks - i cannot reproduce it on my usual suspend/resume testbox
because e1000 broke on it, and this is a pretty annoying regression.
We'll have to undo the hung-tasks detection feature if it's not fixed
quickly. (there's no point in debugging features that _add_ bugs)
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 17:10 ` Ingo Molnar
@ 2008-02-01 21:54 ` Dmitry Adamushko
2008-02-01 22:44 ` Dmitry Adamushko
0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 21:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> thanks - i cannot reproduce it on my usual suspend/resume testbox
> because e1000 broke on it, and this is a pretty annoying regression.
> We'll have to undo the hung-tasks detection feature if it's not fixed
> quickly. (there's no point in debugging features that _add_ bugs)
I can reproduce it and Peter's patch does fix it for me.
e.g.
root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
real 0m6.743s
root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
real 0m7.770s
I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so the
10s. delay of msleep_interruptible() might be related but
I'm still looking for the reason why this fix helps (and what goes
wrong with the current code).
>
> Ingo
>
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 21:54 ` Dmitry Adamushko
@ 2008-02-01 22:44 ` Dmitry Adamushko
2008-02-01 22:48 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 22:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
On 01/02/2008, Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > thanks - i cannot reproduce it on my usual suspend/resume testbox
> > because e1000 broke on it, and this is a pretty annoying regression.
> > We'll have to undo the hung-tasks detection feature if it's not fixed
> > quickly. (there's no point in debugging features that _add_ bugs)
>
> I can reproduce it and Peter's patch does fix it for me.
>
> e.g.
> root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
> real 0m6.743s
> root@earth:/sys/devices/system/cpu/cpu1# time echo 0 > online
> real 0m7.770s
>
> I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so the
> 10s. delay of msleep_interruptible() might be related but
> I'm still looking for the reason why this fix helps (and what goes
> wrong with the current code).
>
heh... it's pretty obvious indeed. What's msleep_interruptible() is
all about? :-)
"sleep waiting for signals"
so the 'watchdog' thread gets woken up
[ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
just to be immediately scheduled out again for as long as the
remaining timeout > 0.
So it should work if we substitute msleep_interruptible() with
schedule_timeout_interruptible().
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 22:44 ` Dmitry Adamushko
@ 2008-02-01 22:48 ` Ingo Molnar
2008-02-01 23:22 ` Dmitry Adamushko
0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 22:48 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > the 10s. delay of msleep_interruptible() might be related but I'm
> > still looking for the reason why this fix helps (and what goes wrong
> > with the current code).
>
> heh... it's pretty obvious indeed. What's msleep_interruptible() is
> all about? :-)
>
> "sleep waiting for signals"
>
> so the 'watchdog' thread gets woken up
>
> [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
>
> just to be immediately scheduled out again for as long as the
> remaining timeout > 0.
>
> So it should work if we substitute msleep_interruptible() with
> schedule_timeout_interruptible().
Doh. Could someone with ths problem please test the patch below, does it
do the trick?
Ingo
---
kernel/softlockup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -228,7 +228,7 @@ static int watchdog(void *__bind_cpu)
*/
while (!kthread_should_stop()) {
touch_softlockup_watchdog();
- msleep_interruptible(10000);
+ schedule_timeout_interruptible(10*HZ);
if (this_cpu != check_cpu)
continue;
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 14:42 ` Rafael J. Wysocki
@ 2008-02-01 23:19 ` Rafael J. Wysocki
2008-02-01 23:24 ` Ingo Molnar
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-02-01 23:19 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds,
LKML, Dmitry Adamushko
On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Ingo Molnar wrote:
> >
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > > The below fixes it for me..
> > >
> > > ---
> > > - restore the old wakeup mechanism
> > > - fix break usage in do_each_thread() { } while_eac_thread().
> > > - fix the hotplug switch stmt, a fall-through case was broken.
> >
> > thanks Peter, i've picked your fix up. I hope this solves the problem
> > for Rafael too.
>
> No, it doesn't, unfortunately. Actually, it breaks things even more, as
> suspend now hangs (probably deadlocks) while disabling the nonboot CPU.
For some obscure reason, I'm no longer able to reproduce the problem
with the Peter's patch and I think it's correct. You can add
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
to it.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 22:48 ` Ingo Molnar
@ 2008-02-01 23:22 ` Dmitry Adamushko
2008-02-01 23:25 ` Ingo Molnar
2008-02-01 23:26 ` Rafael J. Wysocki
0 siblings, 2 replies; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-01 23:22 UTC (permalink / raw)
To: Ingo Molnar, Rafael J. Wysocki
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds, LKML
On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
>
> > > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > > the 10s. delay of msleep_interruptible() might be related but I'm
> > > still looking for the reason why this fix helps (and what goes wrong
> > > with the current code).
> >
> > heh... it's pretty obvious indeed. What's msleep_interruptible() is
> > all about? :-)
> >
> > "sleep waiting for signals"
> >
> > so the 'watchdog' thread gets woken up
> >
> > [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
> >
> > just to be immediately scheduled out again for as long as the
> > remaining timeout > 0.
> >
> > So it should work if we substitute msleep_interruptible() with
> > schedule_timeout_interruptible().
>
> Doh. Could someone with ths problem please test the patch below, does it
> do the trick?
yeah, I was already on a half-way to check it out.
It does fix a problem for me.
Don't forget to take along these 2 fixes from Peter's patch:
- fix break usage in do_each_thread() { } while_each_thread().
- fix the hotplug switch stmt, a fall-through case was broken.
Rafael, does your system still hangs? I'd expect, yes -- as
effectively this fix is not that different from Peter's one when it
comes to suspend-case.
Can you then send your config? Anything special about this machine
(say, some /proc tunables are altered)?
>
> Ingo
>
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 23:19 ` Rafael J. Wysocki
@ 2008-02-01 23:24 ` Ingo Molnar
0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 23:24 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Peter Zijlstra, Steven Rostedt, Andrew Morton, Linus Torvalds,
LKML, Dmitry Adamushko
* Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, 1 of February 2008, Rafael J. Wysocki wrote:
> > On Friday, 1 of February 2008, Ingo Molnar wrote:
> > >
> > > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > >
> > > > The below fixes it for me..
> > > >
> > > > ---
> > > > - restore the old wakeup mechanism
> > > > - fix break usage in do_each_thread() { } while_eac_thread().
> > > > - fix the hotplug switch stmt, a fall-through case was broken.
> > >
> > > thanks Peter, i've picked your fix up. I hope this solves the problem
> > > for Rafael too.
> >
> > No, it doesn't, unfortunately. Actually, it breaks things even more, as
> > suspend now hangs (probably deadlocks) while disabling the nonboot CPU.
>
> For some obscure reason, I'm no longer able to reproduce the problem
> with the Peter's patch and I think it's correct. You can add
>
> Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
>
> to it.
Great! I sent the fix to Linus with your Tested-by line. Thanks Rafael
for all the testing!
Ingo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 23:22 ` Dmitry Adamushko
@ 2008-02-01 23:25 ` Ingo Molnar
2008-02-02 0:03 ` Dmitry Adamushko
2008-02-01 23:26 ` Rafael J. Wysocki
1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2008-02-01 23:25 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> yeah, I was already on a half-way to check it out.
>
> It does fix a problem for me.
>
> Don't forget to take along these 2 fixes from Peter's patch:
>
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.
Dmitry, i sent Peter's fix(es) below to Linus. Do you concur that it
fixes all the practical and theoretical problems you could see with the
code too?
Ingo
--------------->
Subject: debug: softlockup looping fix
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Rafael J. Wysocki reported weird, multi-seconds delays during
suspend/resume and bisected it back to:
commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
Author: Ingo Molnar <mingo@elte.hu>
Date: Fri Jan 25 21:08:02 2008 +0100
softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
fix it:
- restore the old wakeup mechanism
- fix break usage in do_each_thread() { } while_each_thread().
- fix the hotplug switch stmt, a fall-through case was broken.
Bisected-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/softlockup.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -101,6 +101,10 @@ void softlockup_tick(void)
now = get_timestamp(this_cpu);
+ /* Wake up the high-prio watchdog task every second: */
+ if (now > (touch_timestamp + 1))
+ wake_up_process(per_cpu(watchdog_task, this_cpu));
+
/* Warn about unreasonable delays: */
if (now <= (touch_timestamp + softlockup_thresh))
return;
@@ -191,11 +195,11 @@ static void check_hung_uninterruptible_t
read_lock(&tasklist_lock);
do_each_thread(g, t) {
if (!--max_count)
- break;
+ goto unlock;
if (t->state & TASK_UNINTERRUPTIBLE)
check_hung_task(t, now);
} while_each_thread(g, t);
-
+ unlock:
read_unlock(&tasklist_lock);
}
@@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
* debug-printout triggers in softlockup_tick().
*/
while (!kthread_should_stop()) {
+ set_current_state(TASK_INTERRUPTIBLE);
touch_softlockup_watchdog();
- msleep_interruptible(10000);
+ schedule();
+
+ if (kthread_should_stop())
+ break;
if (this_cpu != check_cpu)
continue;
if (sysctl_hung_task_timeout_secs)
check_hung_uninterruptible_tasks(this_cpu);
+
}
return 0;
@@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb,
wake_up_process(per_cpu(watchdog_task, hotcpu));
break;
#ifdef CONFIG_HOTPLUG_CPU
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
- if (!per_cpu(watchdog_task, hotcpu))
- break;
- /* Unbind so it can run. Fall thru. */
- kthread_bind(per_cpu(watchdog_task, hotcpu),
- any_online_cpu(cpu_online_map));
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
if (hotcpu == check_cpu) {
@@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb,
check_cpu = any_online_cpu(temp_cpu_online_map);
}
break;
+
+ case CPU_UP_CANCELED:
+ case CPU_UP_CANCELED_FROZEN:
+ if (!per_cpu(watchdog_task, hotcpu))
+ break;
+ /* Unbind so it can run. Fall thru. */
+ kthread_bind(per_cpu(watchdog_task, hotcpu),
+ any_online_cpu(cpu_online_map));
case CPU_DEAD:
case CPU_DEAD_FROZEN:
p = per_cpu(watchdog_task, hotcpu);
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 23:22 ` Dmitry Adamushko
2008-02-01 23:25 ` Ingo Molnar
@ 2008-02-01 23:26 ` Rafael J. Wysocki
1 sibling, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-02-01 23:26 UTC (permalink / raw)
To: Dmitry Adamushko
Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
On Saturday, 2 of February 2008, Dmitry Adamushko wrote:
> On 01/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> >
> > > > I've observed delays from ~3 s. up to ~8 s. (out of ~20 tests) so
> > > > the 10s. delay of msleep_interruptible() might be related but I'm
> > > > still looking for the reason why this fix helps (and what goes wrong
> > > > with the current code).
> > >
> > > heh... it's pretty obvious indeed. What's msleep_interruptible() is
> > > all about? :-)
> > >
> > > "sleep waiting for signals"
> > >
> > > so the 'watchdog' thread gets woken up
> > >
> > > [ as a result of cpu_callback(action = CPU_DEAD) --> kthread_stop() ]
> > >
> > > just to be immediately scheduled out again for as long as the
> > > remaining timeout > 0.
> > >
> > > So it should work if we substitute msleep_interruptible() with
> > > schedule_timeout_interruptible().
> >
> > Doh. Could someone with ths problem please test the patch below, does it
> > do the trick?
>
> yeah, I was already on a half-way to check it out.
>
> It does fix a problem for me.
>
> Don't forget to take along these 2 fixes from Peter's patch:
>
> - fix break usage in do_each_thread() { } while_each_thread().
> - fix the hotplug switch stmt, a fall-through case was broken.
>
>
> Rafael, does your system still hangs? I'd expect, yes -- as
> effectively this fix is not that different from Peter's one when it
> comes to suspend-case.
I haven't tested it yet, but I don't expect it to hang, as the Peter's patch
started to work for me. I'm going to test it in a while.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected)
2008-02-01 23:25 ` Ingo Molnar
@ 2008-02-02 0:03 ` Dmitry Adamushko
0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Adamushko @ 2008-02-02 0:03 UTC (permalink / raw)
To: Ingo Molnar
Cc: Rafael J. Wysocki, Peter Zijlstra, Steven Rostedt, Andrew Morton,
Linus Torvalds, LKML
On 02/02/2008, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
>
> > yeah, I was already on a half-way to check it out.
> >
> > It does fix a problem for me.
> >
> > Don't forget to take along these 2 fixes from Peter's patch:
> >
> > - fix break usage in do_each_thread() { } while_each_thread().
> > - fix the hotplug switch stmt, a fall-through case was broken.
>
> Dmitry, i sent Peter's fix(es) below to Linus. Do you concur that it
> fixes all the practical and theoretical problems you could see with the
> code too?
>
One comment: any argument for waking up the 'watchdog' thread from
softlockup_tick() wrt just having schedule_timeout_interruptible(HZ)
in watchdog() ?
(although, it's not that important).
IMHO, it looks good.
(I'll look again tomorrow when I'm less sleepy :-/)
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2008-02-02 0:03 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-27 21:29 [Regression] 2.6.24-git3: Major annoyance during suspend/hibernation on x86-64 (bisected) Rafael J. Wysocki
2008-01-27 21:59 ` Ingo Molnar
2008-01-28 1:26 ` Rafael J. Wysocki
2008-01-28 1:40 ` Steven Rostedt
2008-01-28 11:31 ` Rafael J. Wysocki
2008-01-28 16:31 ` Rafael J. Wysocki
2008-01-28 16:46 ` Steven Rostedt
2008-01-29 0:08 ` Rafael J. Wysocki
2008-01-28 8:56 ` Dmitry Adamushko
2008-01-28 11:32 ` Rafael J. Wysocki
2008-01-28 16:08 ` Rafael J. Wysocki
2008-01-31 15:58 ` Peter Zijlstra
2008-01-31 20:54 ` Rafael J. Wysocki
2008-02-01 12:04 ` Peter Zijlstra
2008-02-01 12:47 ` Ingo Molnar
2008-02-01 14:42 ` Rafael J. Wysocki
2008-02-01 23:19 ` Rafael J. Wysocki
2008-02-01 23:24 ` Ingo Molnar
2008-02-01 15:11 ` Dmitry Adamushko
2008-02-01 17:10 ` Ingo Molnar
2008-02-01 21:54 ` Dmitry Adamushko
2008-02-01 22:44 ` Dmitry Adamushko
2008-02-01 22:48 ` Ingo Molnar
2008-02-01 23:22 ` Dmitry Adamushko
2008-02-01 23:25 ` Ingo Molnar
2008-02-02 0:03 ` Dmitry Adamushko
2008-02-01 23:26 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).