From: Guenter Roeck <linux@roeck-us.net>
To: Waiman Long <longman@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Jeremy Linton <jeremy.linton@arm.com>,
	pbunyan@redhat.com
Subject: Re: [RFC PATCH v2] tick: Make tick_periodic() check for missing ticks
Date: Sun, 15 Mar 2020 19:20:14 -0700
Message-ID: <20200316022014.GA30856@roeck-us.net>
In-Reply-To: <20200207193929.27308-1-longman@redhat.com>

Hi,

On Fri, Feb 07, 2020 at 02:39:29PM -0500, Waiman Long wrote:
> The tick_periodic() function is used during the early part of the
> boot process for timekeeping while the other clock sources are
> being initialized.
> 
> The current code assumes that all the timer interrupts are handled in
> a timely manner with no missing ticks. That is not actually true. Some
> ticks are missed and there are some discrepancies between the tick time
> (jiffies) and the timestamp reported in the kernel log.  Some systems,
> however, are more prone to missing ticks than others.  In the extreme
> case, the discrepancy can actually cause a soft lockup message to be
> printed by the watchdog kthread. For example, on a Cavium ThunderX2
> Sabre arm64 system:
> 
>  [   25.496379] watchdog: BUG: soft lockup - CPU#14 stuck for 22s!
> 
> On that system, the missing ticks are especially prevalent during the
> smp_init() phase of the boot process. With an instrumented kernel,
> it was found that it took about 24s as reported by the timestamp for
> the tick to accumulate 4s of time.
> 
> Investigation and bisection done by others seemed to point to the
> commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or
> lack thereof") as the culprit. It could also be a firmware issue as
> new firmware was promised that would fix the issue.
> 
> To properly address this problem, we cannot assume that there will
> be no missing ticks in tick_periodic(). This function is now modified
> to follow the example of tick_do_update_jiffies64() by using another
> reference clock to check for missing ticks. Since the watchdog timer
> uses running_clock(), it is used here as the reference. With this patch
> applied, the soft lockup problem in the arm64 system is gone and tick
> time tracks much more closely to the timestamp time.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
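
For illustration, the approach described in the quoted changelog
(comparing against a reference clock to detect missed ticks) can be
modeled in user space roughly as follows. This is a minimal sketch, not
the patch itself: tick_state, tick_catch_up() and the 4 ms TICK_NSEC
(HZ=250) are assumptions made for the example, and the caller supplies
the reference-clock reading that running_clock() would provide in the
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tick period: assumes HZ=250, i.e. one tick every 4 ms. */
#define TICK_NSEC 4000000ULL

/* Hypothetical user-space model of the periodic tick bookkeeping. */
struct tick_state {
	uint64_t jiffies;	/* ticks accounted so far */
	uint64_t last_ns;	/* reference time covered by accounted ticks */
};

/*
 * Model of one timer interrupt. now_ns is the current reading of the
 * reference clock. The interrupt being handled always counts as one
 * tick; if the reference clock shows that two or more periods have
 * elapsed since the last accounted tick, the missed ticks are
 * accounted as well. Returns the number of ticks accounted.
 */
static uint64_t tick_catch_up(struct tick_state *ts, uint64_t now_ns)
{
	uint64_t ticks = 1;

	assert(ts != NULL);

	if (now_ns > ts->last_ns) {
		uint64_t elapsed = (now_ns - ts->last_ns) / TICK_NSEC;

		if (elapsed > 1)
			ticks = elapsed;
	}

	ts->jiffies += ticks;
	/* Advance by whole periods so fractional remainders carry over. */
	ts->last_ns += ticks * TICK_NSEC;
	return ticks;
}
```

In this model, if the handler runs at 4 ms and then not again until
24 ms, the second call accounts five ticks instead of one, so jiffies
keeps pace with the reference clock.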

Since this patch landed in linux-next, roughly 10% of my x86 and x86_64
qemu emulation boots stall. Typical log:

[    0.002016] smpboot: Total of 1 processors activated (7576.40 BogoMIPS)
[    0.002016] devtmpfs: initialized
[    0.002016] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.002016] futex hash table entries: 256 (order: 3, 32768 bytes, linear)
[    0.002016] xor: measuring software checksum speed

Another:

[    0.002653] Freeing SMP alternatives memory: 44K
[    0.002653] smpboot: CPU0: Intel Westmere E56xx/L56xx/X56xx (IBRS update) (family: 0x6, model: 0x2c, stepping: 0x1)
[    0.002653] Performance Events: unsupported p6 CPU model 44 no PMU driver, software events only.
[    0.002653] rcu: Hierarchical SRCU implementation.
[    0.002653] smp: Bringing up secondary CPUs ...
[    0.002653] x86: Booting SMP configuration:
[    0.002653] .... node  #0, CPUs:      #1
[    0.000000] smpboot: CPU 1 Converting physical 0 to logical die 1

... and then there is silence until the test aborts.

This is only (or at least predominantly) seen if the system running
the emulation is under load.

Reverting this patch fixes the problem.

Guenter
