LKML Archive on
From: brookxu <>
To: Thomas Gleixner <>,,
Subject: Re: [RFC PATCH] clocksource: skip check while watchdog hung up or unstable
Date: Fri, 13 Aug 2021 08:54:12 +0800
Message-ID: <>
In-Reply-To: <87fsvfhr2b.ffs@tglx>

Thomas Gleixner wrote on 2021/8/12 6:53 PM:
> On Wed, Aug 11 2021 at 23:26, brookxu wrote:
>> Thomas Gleixner wrote on 2021/8/11 22:01:
>>>> To be precise, we are processing interrupts in handle_edge_irq() for a long
>>>> time. Since the interrupts of multiple hardware queues are mapped to a single
>>>> CPU, multiple cores are continuously issuing IO, and then a single core is
>>>> processing IO. Perhaps the test case can be optimized, but shouldn't this lead
>>>> to switching clocks in principle?
>>> The clocksource watchdog failure is only _ONE_ consequence. Processing
>>> hard interrupts for 155 seconds straight will trigger lockup detectors
>>> of all sorts if you have them enabled.
>>> So just papering over the clocksource watchdog does not solve anything,
>>> really. Next week you have to add similar hacks to the lockup detectors,
>>> RCU and whatever.
>> Yeah, we have observed soft lockup and RCU stall, but these behaviors are
>> expected because the current CPU scheduling is disabled. However, marking
>> TSC unstable is inconsistent with the actual situation. The worst problem
>> is that after the clocksource switched to hpet, the abnormal time will be
>> greatly prolonged due to the degradation of performance. We have not found
>> that soft lockup and RCU stall will affect the machine for a long time in
>> this test. Aside from these, as the watchdog is scheduled periodically, when
>> wd_nsec is 0, it means that something may be abnormal. Do we really still
>> need to continue to verify TSC? And how do we ensure the correctness of the
>> results?
> Sorry no. While softlockups and RCU stalls might have no long term
> effect in the first place, this argumentation vs. the clocksource
> watchdog is just a strawman. You're abusing the system in a way which
> causes it to malfunction so you have to live with the consequences.
> Aside of that this 'workaround' is just duct taping a particular part of
> the problem. What guarantees that after the interrupt storm subsided the
> clocksource delta of the watchdog becomes 0 (negative)?
> Absolutely nothing. The delta can be positive, but then the watchdog and
> the TSC are not in sync anymore which will disable the TSC as well.
> A 24MHz HPET has a wraparound time of ~178s which means during:
>   89s < tdelta < 178s
> your hack papers over the problem. Any interrupt storm time outside of
> that window results in fail.
> Now run the same test on a machine with a 14MHz HPET and you get
>  153s < tdelta < 306s
> so your 155s interrupt storm barely fits. And what are you doing with
> your next test which runs only 80 seconds?
> Not to talk about the fact that you wreck detection of a watchdog
> clocksource going stale.
> So no, we are not adding hacks to support abuse.
> What we really want to do is to add detection for interrupt storms of
> this sort and shut those interrupts down for good.

ok, thanks for your suggestion.

> Thanks,
>         tglx
> ---
> Patient: "Doctor, it hurts when I hammer on my toe."
> Doctor:  "Don't do that then!"
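
The wraparound windows quoted above can be checked with a few lines of
arithmetic. The sketch below is an illustration only. It assumes a 32-bit
HPET counter and a watchdog delta that is clamped to 0 once it lands in the
upper half of the counter range, which is the "0 (negative)" case mentioned
above. The names wrap_window_seconds, masked_delta and watchdog_delta_after
are hypothetical helpers, not kernel symbols.

```python
def wrap_window_seconds(freq_hz, counter_bits=32):
    """Return (half_wrap, full_wrap) in seconds for a free-running counter.

    Assumption: the counter is counter_bits wide and ticks at freq_hz.
    """
    full = (1 << counter_bits) / freq_hz
    return full / 2, full


def masked_delta(now, last, mask):
    """Counter delta, clamped to 0 when the top bit of the masked range is set.

    Assumption: a delta in the upper half of the range is treated as
    "negative" and reported as 0.
    """
    ret = (now - last) & mask
    return 0 if ret & ~(mask >> 1) else ret


def watchdog_delta_after(seconds, freq_hz, counter_bits=32):
    """Delta the watchdog would see if it did not run for `seconds`."""
    mask = (1 << counter_bits) - 1
    ticks = int(seconds * freq_hz)  # ticks elapsed, not yet wrapped
    return masked_delta(ticks, 0, mask)


if __name__ == "__main__":
    for freq in (24_000_000, 14_000_000):
        half, full = wrap_window_seconds(freq)
        print(f"{freq // 1_000_000}MHz: {half:.0f}s < tdelta < {full:.0f}s")
        for storm in (80, 155, 200):
            print(f"  {storm}s storm -> delta {watchdog_delta_after(storm, freq)}")
```

Under these assumptions a 155s storm produces a zero delta at both HPET
rates, while an 80s storm produces a plain positive delta, so a check that
only skips on wd_nsec == 0 would not cover it.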
