LKML Archive on lore.kernel.org
From: brookxu <brookxu.cn@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>,
john.stultz@linaro.org, sboyd@kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] clocksource: skip check while watchdog hung up or unstable
Date: Fri, 13 Aug 2021 08:54:12 +0800
Message-ID: <5ef35e9e-3a91-282c-4254-0abcd10e0a7f@gmail.com>
In-Reply-To: <87fsvfhr2b.ffs@tglx>
Thomas Gleixner wrote on 2021/8/12 6:53 PM:
> On Wed, Aug 11 2021 at 23:26, brookxu wrote:
>> Thomas Gleixner wrote on 2021/8/11 22:01:
>>>> To be precise, we are processing interrupts in handle_edge_irq() for a long
>>>> time. The interrupts of multiple hardware queues are mapped to a single
>>>> CPU, so multiple cores continuously issue IO while a single core processes
>>>> all of it. Perhaps the test case can be optimized, but in principle, should
>>>> this really cause the clocksource to be switched?
>>>
>>> The clocksource watchdog failure is only _ONE_ consequence. Processing
>>> hard interrupts for 155 seconds straight will trigger lockup detectors
>>> of all sorts if you have them enabled.
>>>
>>> So just papering over the clocksource watchdog does not solve anything,
>>> really. Next week you have to add similar hacks to the lockup detectors,
>>> RCU and whatever.
>>
>> Yeah, we have observed soft lockups and RCU stalls, but these are expected
>> because scheduling on the current CPU is effectively disabled. However,
>> marking the TSC unstable does not reflect the actual situation. The worst
>> problem is that after the clocksource is switched to HPET, the abnormal
>> period is greatly prolonged by the performance degradation. In this test we
>> have not seen the soft lockups or RCU stalls affect the machine for long.
>> Apart from that, since the watchdog is scheduled periodically, a wd_nsec of
>> 0 means that something may be abnormal. Do we really still need to continue
>> verifying the TSC in that case, and how can we ensure the results are
>> correct?
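Why wd_nsec comes out as 0 after a long stall can be sketched by modelling the kernel's clocksource_delta() as built with CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE (assumed enabled, as on x86): a masked cycle delta whose top bit is set is treated as time going backwards and returned as 0. The 32-bit counter width, the 24 MHz rate and the helper wd_seconds() are assumptions for illustration, not kernel code.

```python
# Sketch: how a long stall makes the HPET watchdog delta read as 0.
# Assumptions: 32-bit HPET main counter at 24 MHz, and clocksource_delta()
# behaving as with CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE enabled.

HPET_HZ = 24_000_000
HPET_MASK = (1 << 32) - 1


def clocksource_delta(now: int, last: int, mask: int) -> int:
    """Masked cycle delta; 0 if the mask's top bit is set ("time went backwards")."""
    ret = (now - last) & mask
    # ret never exceeds mask, so this isolates the mask's top bit in ret.
    return 0 if ret & ~(mask >> 1) else ret


def wd_seconds(stall_s: float) -> float:
    """Elapsed time the HPET watchdog reports after `stall_s` seconds without a check."""
    last = 0
    now = (last + int(stall_s * HPET_HZ)) & HPET_MASK  # counter wraps at 32 bits
    return clocksource_delta(now, last, HPET_MASK) / HPET_HZ


for stall in (0.5, 80, 100, 155, 200):
    print(f"stall {stall:>5}s -> watchdog sees {wd_seconds(stall):.3f}s")
```

In this model a 155 s stall reports 0, while a 200 s stall wraps the counter once and reports a small positive value again.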
>
> Sorry no. While softlockups and RCU stalls might have no long term
> effect in the first place, this argumentation vs. the clocksource
> watchdog is just a strawman. You're abusing the system in a way which
> causes it to malfunction so you have to live with the consequences.
>
> Aside from that, this 'workaround' is just duct taping a particular part of
> the problem. What guarantees that after the interrupt storm has subsided the
> clocksource delta of the watchdog becomes 0 (negative)?
>
> Absolutely nothing. The delta can be positive, but then the watchdog and
> the TSC are not in sync anymore which will disable the TSC as well.
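The point that a positive delta still disables the TSC can be checked with a small model. This sketch assumes a 24 MHz, 32-bit HPET, a TSC that keeps exact time through the storm, and a hypothetical 50 ms skew limit (it is not the kernel's WATCHDOG_THRESHOLD, whose value differs between versions). The names watchdog_verdict() and skip_if_wd_zero are illustrative only; the flag stands in for the proposed "skip the check when wd_nsec is 0" rule.

```python
# Sketch: skipping the check when wd_nsec == 0 does not help once the
# HPET has wrapped completely. Assumptions: 24 MHz 32-bit HPET, a TSC that
# keeps exact time through the storm, and a hypothetical 50 ms skew limit.

NSEC_PER_SEC = 1_000_000_000
HPET_HZ = 24_000_000
HPET_MASK = (1 << 32) - 1
SKEW_LIMIT_NS = 50_000_000  # hypothetical; not the kernel's WATCHDOG_THRESHOLD


def clocksource_delta(now: int, last: int, mask: int) -> int:
    ret = (now - last) & mask
    return 0 if ret & ~(mask >> 1) else ret  # top bit set -> treated as negative


def hpet_elapsed_ns(stall_s: float) -> int:
    now = int(stall_s * HPET_HZ) & HPET_MASK  # counter started at 0, may have wrapped
    return clocksource_delta(now, 0, HPET_MASK) * NSEC_PER_SEC // HPET_HZ


def watchdog_verdict(stall_s: float, skip_if_wd_zero: bool) -> str:
    cs_ns = int(stall_s * NSEC_PER_SEC)  # TSC: correct elapsed time
    wd_ns = hpet_elapsed_ns(stall_s)
    if skip_if_wd_zero and wd_ns == 0:
        return "skipped"
    return "unstable" if abs(cs_ns - wd_ns) > SKEW_LIMIT_NS else "ok"


for stall in (0.5, 155, 200):
    print(stall, watchdog_verdict(stall, False), watchdog_verdict(stall, True))
```

With the skip rule, a 155 s storm is skipped, but a 200 s storm leaves the HPET reporting roughly 21 s against the TSC's 200 s, so the TSC is still marked unstable.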
>
> A 24MHz HPET has a wraparound time of ~178s which means during:
>
> 89s < tdelta < 178s
>
> your hack papers over the problem. Any interrupt storm duration outside of
> that window results in failure.
>
> Now run the same test on a machine with a 14MHz HPET and you get
>
> 153s < tdelta < 306s
>
> so your 155s interrupt storm barely fits. And what are you doing with
> your next test which runs only 80 seconds?
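The window figures follow from the wrap period of the watchdog counter: a counter of a given bit width running at frequency f wraps after 2^bits / f seconds, and the masked delta reads as "negative" (and is clamped to 0) only for stalls between half of that period and the full period. Assuming a 32-bit HPET main counter, which is what the ~178 s figure for 24 MHz implies, the bounds can be recomputed with a hypothetical helper hack_window():

```python
# Sketch: stall lengths for which the "skip when wd delta is 0" hack applies.
# Assumption: 32-bit HPET main counter (2**32 / 24 MHz is about 178.96 s).

COUNTER_BITS = 32


def hack_window(freq_hz: float, bits: int = COUNTER_BITS) -> tuple:
    """(lower, upper) stall length in seconds during which the delta reads as negative."""
    wrap_s = (1 << bits) / freq_hz  # time for the counter to wrap once
    return wrap_s / 2, wrap_s       # top bit set from half-wrap until full wrap


for mhz in (24, 14):
    lo, hi = hack_window(mhz * 1e6)
    print(f"{mhz} MHz HPET: {lo:.1f}s < tdelta < {hi:.1f}s")
```

On the 14 MHz counter the 155 s storm lands just above the lower bound, and an 80 s storm falls outside both windows.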
>
> Not to mention the fact that you wreck detection of a watchdog
> clocksource going stale.
>
> So no, we are not adding hacks to support abuse.
>
> What we really want to do is to add detection for interrupt storms of
> this sort and shut those interrupts down for good.
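The thread does not say how such detection would work. As a purely hypothetical illustration (the class name, the thresholds and the sliding-window policy are all assumptions, not an existing kernel interface), a per-line detector could count interrupts within a time window and request that the line be shut down once the rate exceeds a limit:

```python
from collections import deque


class IrqStormDetector:
    """Hypothetical per-IRQ storm detector (not a kernel API): flags a line
    that fires more than `max_irqs` times within any `window_ns` interval."""

    def __init__(self, max_irqs: int, window_ns: int):
        self.max_irqs = max_irqs
        self.window_ns = window_ns
        self.stamps = deque()  # timestamps of recent interrupts, oldest first
        self.disabled = False

    def on_irq(self, now_ns: int) -> bool:
        """Record one interrupt; return True if the line should be shut down."""
        if self.disabled:
            return True
        self.stamps.append(now_ns)
        # Drop timestamps that have fallen out of the sliding window.
        while self.stamps and now_ns - self.stamps[0] >= self.window_ns:
            self.stamps.popleft()
        if len(self.stamps) > self.max_irqs:
            self.disabled = True  # stays off once tripped
        return self.disabled


det = IrqStormDetector(max_irqs=1000, window_ns=1_000_000_000)
print(any(det.on_irq(i * 1_000_000) for i in range(500)))  # steady 1 kHz load
```

The timestamp queue keeps the sketch easy to follow; an in-kernel version would more likely use a per-interrupt counter reset every period to avoid per-event bookkeeping.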
OK, thanks for your suggestion.
> Thanks,
>
> tglx
> ---
> Patient: "Doctor, it hurts when I hammer on my toe."
> Doctor: "Don't do that then!"
>
>
Thread overview: 7+ messages
2021-08-11 9:55 brookxu
2021-08-11 12:44 ` Thomas Gleixner
2021-08-11 13:18 ` brookxu
2021-08-11 14:01 ` Thomas Gleixner
2021-08-11 15:26 ` brookxu
2021-08-12 10:53 ` Thomas Gleixner
2021-08-13 0:54 ` brookxu [this message]