LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: brookxu <brookxu.cn@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	john.stultz@linaro.org, sboyd@kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] clocksource: skip check while watchdog hung up or unstable
Date: Fri, 13 Aug 2021 08:54:12 +0800	[thread overview]
Message-ID: <5ef35e9e-3a91-282c-4254-0abcd10e0a7f@gmail.com> (raw)
In-Reply-To: <87fsvfhr2b.ffs@tglx>



Thomas Gleixner wrote on 2021/8/12 6:53 下午:
> On Wed, Aug 11 2021 at 23:26, brookxu wrote:
>> Thomas Gleixner wrote on 2021/8/11 22:01:
>>>> To be precise, we are processing interrupts in handle_edge_irq() for a long
>>>> time. Since the interrupts of multiple hardware queues are mapped to a single
>>>> CPU, multiple cores are continuously issuing IO, and then a single core is
>>>> processing IO. Perhaps the test case can be optimized, but shouldn't this lead
>>>> to switching clocks in principle?
>>>
>>> The clocksource watchdog failure is only _ONE_ consequence. Processing
>>> hard interrupts for 155 seconds straight will trigger lockup detectors
>>> of all sorts if you have them enabled.
>>>
>>> So just papering over the clocksource watchdog does not solve anything,
>>> really. Next week you have to add similar hacks to the lockup detectors,
>>> RCU and whatever.
>>
>> Yeah, we have observed soft lockup and RCU stall, but these behaviors are
>> expected because the current CPU scheduling is disabled. However, marking
>> TSC unstable is inconsistent with the actual situation. The worst problem
>> is that after the clocksource switched to hpet, the abnormal time will be
>> greatly prolonged due to the degradation of performance. We have not found
>> that soft lockup and RCU stall will affect the machine for a long time in
>> this test. Aside from these, as the watchdog is scheduled periodically, when
>> wd_nsec is 0, it means that something maybe abnormal, do we readlly still
>> need to continue to verify TSC? and how to ensure the correctness of the
>> results?
> 
> Sorry no. While softlockups and RCU stalls might have no long term
> effect in the first place, this argumentation vs. the clocksource
> watchdog is just a strawman. You're abusing the system in a way which
> causes it to malfunction so you have to live with the consequences.
> 
> Aside of that this 'workaround' is just duct taping a particular part of
> the problem. What guarantees that after the interrupt storm subsided the
> clocksource delta of the watchdog becomes 0 (negative)?
> 
> Absolutely nothing. The delta can be positive, but then the watchdog and
> the TSC are not in sync anymore which will disable the TSC as well.
> 
> A 24MHz HPET has a wraparound time of ~178s which means during:
> 
>   89s < tdelta < 178s
> 
> your hack papers over the problem. Any interrupt storm time outside of
> that window results in fail.
> 
> Now run the same test on a machine with a 14MHz HPET and you get
> 
>  153s < tdelta < 306s
> 
> so your 155s interrupt storm barely fits. And what are you doing with
> your next test which runs only 80 seconds?
> 
> Not to talk about the fact that you wreckage detection of a watchdog
> clocksource going stale.
> 
> So no, we are not adding hacks to support abuse.
> 
> What we really want to do is to add detection for interrupt storms of
> this sort and shut those interrupts down for good.

ok, thanks for your suggestion.

> Thanks,
> 
>         tglx
> ---
> Patient: "Doctor, it hurts when I hammer on my toe."
> Doctor:  "Don't do that then!"
> 
> 

      reply	other threads:[~2021-08-13  0:57 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-11  9:55 brookxu
2021-08-11 12:44 ` Thomas Gleixner
2021-08-11 13:18   ` brookxu
2021-08-11 14:01     ` Thomas Gleixner
2021-08-11 15:26       ` brookxu
2021-08-12 10:53         ` Thomas Gleixner
2021-08-13  0:54           ` brookxu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ef35e9e-3a91-282c-4254-0abcd10e0a7f@gmail.com \
    --to=brookxu.cn@gmail.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --subject='Re: [RFC PATCH] clocksource: skip check while watchdog hung up or unstable' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).