From: "Paul E. McKenney" <paulmck@kernel.org>
To: Chao Gao <chao.gao@intel.com>
Cc: Feng Tang <feng.tang@intel.com>,
	kernel test robot <oliver.sang@intel.com>,
	John Stultz <john.stultz@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	Mark Rutland <Mark.Rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>, Andi Kleen <ak@linux.intel.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	Chris Mason <clm@fb.com>, LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	lkp@lists.01.org, lkp@intel.com, ying.huang@intel.com,
	zhengjun.xing@intel.com
Subject: Re: [clocksource]  8901ecc231:  stress-ng.lockbus.ops_per_sec -9.5% regression
Date: Tue, 3 Aug 2021 06:48:16 -0700
Message-ID: <20210803134816.GO4397@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20210803085759.GA31621@gao-cwp>

On Tue, Aug 03, 2021 at 04:58:00PM +0800, Chao Gao wrote:
> On Mon, Aug 02, 2021 at 10:02:57AM -0700, Paul E. McKenney wrote:
> >On Mon, Aug 02, 2021 at 02:20:09PM +0800, Chao Gao wrote:
> >> [snip]
> >> >commit 48ebcfbfd877f5d9cddcc03c91352a8ca7b190af
> >> >Author: Paul E. McKenney <paulmck@kernel.org>
> >> >Date:   Thu May 27 11:03:28 2021 -0700
> >> >
> >> >    clocksource: Forgive repeated long-latency watchdog clocksource reads
> >> >    
> >> >    Currently, the clocksource watchdog reacts to repeated long-latency
> >> >    clocksource reads by marking that clocksource unstable on the theory that
> >> >    these long-latency reads are a sign of a serious problem.  And this theory
> >> >    does in fact have real-world support in the form of firmware issues [1].
> >> >    
> >> >    However, it is also possible to trigger this using stress-ng on what
> >> >    the stress-ng man page terms "poorly designed hardware" [2].  And it
> >> >    is not necessarily a bad thing for the kernel to diagnose cases where
> >> >    high-stress workloads are being run on hardware that is not designed
> >> >    for this sort of use.
> >> >    
> >> >    Nevertheless, it is quite possible that real-world use will result in
> >> >    some situation requiring that high-stress workloads run on hardware
> >> >    not designed to accommodate them, and also requiring that the kernel
> >> >    refrain from marking clocksources unstable.
> >> >    
> >> >    Therefore, provide an out-of-tree patch that reacts to this situation
> >> >    by leaving the clocksource alone, but using the old 62.5-millisecond
> >> >    skew-detection threshold in response to persistent long-latency reads.
> >> >    In addition, the offending clocksource is marked for re-initialization
> >> >    in this case, which both restarts that clocksource with a clean bill of
> >> >    health and avoids false-positive skew reports on later watchdog checks.
> >> 
> >> Hi Paul,
> >> 
> >> Sorry to dig out this old thread.
> >
> >Not a problem, especially given that this is still an experimental patch
> >(marked with "EXP" in -rcu).  So one remaining question is "what is this
> >patch really supposed to do, if anything?".
> 
> We are testing with TDX [1] and analyzing why the kernel in a TD (Trust
> Domain) sometimes spots a large TSC skew. We have inspected the TSC
> hardware, microcode, and TDX module to rule out a hardware issue, and also
> ported tsc_sync.c to a userspace tool that constantly checks whether the
> TSC stays synchronized while a workload is running. Based on this, we
> believe that the large TSC skew spotted by the TD kernel is a false positive.
> 
> Your patches (the ones already merged) have improved the clocksource
> watchdog a lot and reduced false positives. But due to the nature of TDX,
> switching between TD and host takes more time. As a result, the time window
> between the two reads of the watchdog clocksource in cs_watchdog_read()
> grows, and so does the probability that something on the host interrupts
> the two reads. Then, sometimes, especially when there are heavy workloads
> in both host and TD, the maximum number of retries in cs_watchdog_read() is
> exceeded and tsc is marked unstable.
> 
> We then applied this out-of-tree patch, and it further reduces
> false positives. But the TD kernel still observes TSC skew in some cases.
> After a close look at the kernel logs, we found a pattern in those cases:
> an expected re-initialization somehow doesn't happen. That's why we raise
> this issue and ask for your advice.

I am glad that the patch at least helps.  ;-)

> [1]: https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> 
> >And here the clocksource failed the coarse-grained check and marked
> >the clocksource as unstable.  Perhaps because the previous read
> >forced a coarse-grained check.  Except that this should have forced
> >a reinitialization.  Ah, it looks like I need to suppress setting
> >CLOCK_SOURCE_WATCHDOG if coarse-grained checks have been enabled.
> >That could cause false-positive failure for the next check, after all.
> >
> >And perhaps make cs_watchdog_read() modify its print if there is
> >a watchdog reset pending or if the current clocksource has the
> >CLOCK_SOURCE_WATCHDOG flag cleared.
> >
> >Perhaps as shown in the additional patch below, to be folded into the
> >original?
> 
> Thanks. Will test with below patch applied.

If this patch helps, but problems remain, another thing to try is to
increase the clocksource.max_cswd_read_retries kernel boot parameter
above its default value of 3.  Maybe to 5 or 10?
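
To illustrate why a larger retry budget can help, here is a rough userspace
sketch of a bounded retry loop. It is not the kernel's cs_watchdog_read()
code: the function name, the threshold constant, and the array of
pre-recorded delays are all assumptions made for the example, standing in
for real back-to-back watchdog reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed threshold for this sketch only; the kernel uses its own constant. */
#define SKETCH_MAX_READ_DELAY_NS 100000ULL

/*
 * delays_ns[i] is the time between the two watchdog reads on attempt i
 * (hypothetical pre-recorded samples standing in for real hardware reads).
 * Returns true and stores the attempt index in *attempt_out as soon as one
 * attempt completes within the threshold.  Returns false if the initial
 * attempt plus max_retries retries were all too slow.
 */
static bool sketch_watchdog_read(const uint64_t *delays_ns, size_t n_samples,
				 unsigned long max_retries,
				 unsigned long *attempt_out)
{
	assert(delays_ns != NULL && attempt_out != NULL);

	for (unsigned long attempt = 0; attempt <= max_retries; attempt++) {
		if (attempt >= n_samples)
			break;	/* ran out of recorded samples */
		if (delays_ns[attempt] <= SKETCH_MAX_READ_DELAY_NS) {
			*attempt_out = attempt;
			return true;
		}
	}
	return false;
}
```

With a run of slow samples followed by one fast one, a budget of 3 retries
gives up before reaching the fast sample, while a budget of 5 gets there.
On the kernel command line the corresponding setting would be written as
clocksource.max_cswd_read_retries=5.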

If this patch does not help, please let me know.  In that case, there
are probably more fixes required.

							Thanx, Paul

> Thanks
> Chao
> >
> >							Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> >index cfa992250c388..62da2485fd574 100644
> >--- a/kernel/time/clocksource.c
> >+++ b/kernel/time/clocksource.c
> >@@ -230,8 +230,13 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
> > 		}
> > 	}
> > 
> >-	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, coarse-grained skew check followed by re-initialization\n",
> >-		smp_processor_id(), watchdog->name, wd_delay, nretries);
> >+	if ((cs->flags & CLOCK_SOURCE_WATCHDOG) && !atomic_read(&watchdog_reset_pending)) {
> >+		pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, coarse-grained skew check followed by re-initialization\n",
> >+			smp_processor_id(), watchdog->name, wd_delay, nretries);
> >+	} else {
> >+		pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, awaiting re-initialization\n",
> >+			smp_processor_id(), watchdog->name, wd_delay, nretries);
> >+	}
> > 	return true;
> > }
> > 
> >@@ -379,7 +384,8 @@ static void clocksource_watchdog(struct timer_list *unused)
> > 		/* Clocksource initialized ? */
> > 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
> > 		    atomic_read(&watchdog_reset_pending)) {
> >-			cs->flags |= CLOCK_SOURCE_WATCHDOG;
> >+			if (!coarse)
> >+				cs->flags |= CLOCK_SOURCE_WATCHDOG;
> > 			cs->wd_last = wdnow;
> > 			cs->cs_last = csnow;
> > 			continue;
