LKML Archive on
help / color / mirror / Atom feed
From: Thomas Gleixner <>
To: brookxu <>,,
Subject: Re: [RFC PATCH] clocksource: skip check while watchdog hung up or unstable
Date: Thu, 12 Aug 2021 12:53:48 +0200	[thread overview]
Message-ID: <87fsvfhr2b.ffs@tglx> (raw)
In-Reply-To: <>

On Wed, Aug 11 2021 at 23:26, brookxu wrote:
> Thomas Gleixner wrote on 2021/8/11 22:01:
>>> To be precise, we are processing interrupts in handle_edge_irq() for a long
>>> time. Since the interrupts of multiple hardware queues are mapped to a single
>>> CPU, multiple cores are continuously issuing IO, and then a single core is
>>> processing IO. Perhaps the test case can be optimized, but shouldn't this lead
>>> to switching clocks in principle?
>> The clocksource watchdog failure is only _ONE_ consequence. Processing
>> hard interrupts for 155 seconds straight will trigger lockup detectors
>> of all sorts if you have them enabled.
>> So just papering over the clocksource watchdog does not solve anything,
>> really. Next week you have to add similar hacks to the lockup detectors,
>> RCU and whatever.
> Yeah, we have observed soft lockup and RCU stall, but these behaviors are
> expected because the current CPU scheduling is disabled. However, marking
> TSC unstable is inconsistent with the actual situation. The worst problem
> is that after the clocksource switched to hpet, the abnormal time will be
> greatly prolonged due to the degradation of performance. We have not found
> that soft lockup and RCU stall will affect the machine for a long time in
> this test. Aside from these, as the watchdog is scheduled periodically, when
> wd_nsec is 0, it means that something maybe abnormal, do we readlly still
> need to continue to verify TSC? and how to ensure the correctness of the
> results?

Sorry no. While softlockups and RCU stalls might have no long term
effect in the first place, this argumentation vs. the clocksource
watchdog is just a strawman. You're abusing the system in a way which
causes it to malfunction so you have to live with the consequences.

Aside of that this 'workaround' is just duct taping a particular part of
the problem. What guarantees that after the interrupt storm subsided the
clocksource delta of the watchdog becomes 0 (negative)?

Absolutely nothing. The delta can be positive, but then the watchdog and
the TSC are not in sync anymore which will disable the TSC as well.

A 24MHz HPET has a wraparound time of ~178s which means during:

  89s < tdelta < 178s

your hack papers over the problem. Any interrupt storm time outside of
that window results in fail.

Now run the same test on a machine with a 14MHz HPET and you get

 153s < tdelta < 306s

so your 155s interrupt storm barely fits. And what are you doing with
your next test which runs only 80 seconds?

Not to talk about the fact that you wreckage detection of a watchdog
clocksource going stale.

So no, we are not adding hacks to support abuse.

What we really want to do is to add detection for interrupt storms of
this sort and shut those interrupts down for good.


Patient: "Doctor, it hurts when I hammer on my toe."
Doctor:  "Don't do that then!"

  reply	other threads:[~2021-08-12 10:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-11  9:55 brookxu
2021-08-11 12:44 ` Thomas Gleixner
2021-08-11 13:18   ` brookxu
2021-08-11 14:01     ` Thomas Gleixner
2021-08-11 15:26       ` brookxu
2021-08-12 10:53         ` Thomas Gleixner [this message]
2021-08-13  0:54           ` brookxu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fsvfhr2b.ffs@tglx \ \ \ \ \ \
    --subject='Re: [RFC PATCH] clocksource: skip check while watchdog hung up or unstable' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).