From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Sumanth Kamatala <skamatala@juniper.net>
Subject: [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init()
Date: Mon, 23 Aug 2021 13:41:22 -0700
Message-ID: <20210823204122.GA1640015@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20210823184547.GA1638691@agluck-desk2.amr.corp.intel.com>

When a fatal machine check results in a system reset, Linux does
not clear the error(s) from machine check bank(s).

Hardware preserves the machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads,
logs, and clears all machine check banks.

But there is a problem. In:
commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
the call to mce_register_decode_chain() moved later in the boot sequence.
This means that /dev/mcelog doesn't see those early error logs.

This was partially fixed by:
commit cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error-prone. Users
of /dev/mcelog should expect to find any early errors logged to
standard places.

Delay processing logs until after all built-in code has had a chance
to register on the mce notifier chain (modules are still out of luck;
there's no way to know how long to wait for those to load).

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
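
Not part of the patch: a minimal user-space sketch of the deferral
pattern described above, for illustration only. The names pool_add(),
flush_pool(), log_record() and late_init() are hypothetical stand-ins
for the gen pool, the irq_work handler, mce_log() and the tail of
mcheck_late_init(). It assumes a flush follows once the flag is set,
as the comment in the last hunk suggests.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8

/* Hypothetical stand-ins for the kernel's gen pool and consumers. */
static int pool[POOL_SIZE];
static size_t pool_len;
static size_t delivered;	/* records handed to consumers so far */
static bool init_complete;	/* plays the role of mce_init_complete */

/* Hand every pooled record to consumers (models the deferred work). */
static void flush_pool(void)
{
	delivered += pool_len;
	pool_len = 0;
}

/* Returns 0 on success, -1 if the pool is full. */
static int pool_add(int record)
{
	if (pool_len == POOL_SIZE)
		return -1;
	pool[pool_len++] = record;
	return 0;
}

/* Always store the record, but only kick processing after init. */
static void log_record(int record)
{
	if (!pool_add(record) && init_complete)
		flush_pool();
}

/* Late init: set the flag, then flush the early-boot backlog. */
static void late_init(void)
{
	init_complete = true;
	flush_pool();
}
```

Records logged before late_init() stay in the pool and are delivered
in one batch when it runs; records logged afterwards are delivered
immediately.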

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..593af202f586 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -129,6 +129,8 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
  */
 BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
+static bool mce_init_complete;
+
 /* Do initial initialization of a struct mce */
 noinstr void mce_setup(struct mce *m)
 {
@@ -155,7 +157,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
 void mce_log(struct mce *m)
 {
-	if (!mce_gen_pool_add(m))
+	if (!mce_gen_pool_add(m) && mce_init_complete)
 		irq_work_queue(&mce_irq_work);
 }
 EXPORT_SYMBOL_GPL(mce_log);
@@ -2771,6 +2773,8 @@ static int __init mcheck_late_init(void)
 
 	mcheck_debugfs_init();
 
+	mce_init_complete = true;
+
 	/*
 	 * Flush out everything that has been logged during early boot, now that
 	 * everything has been initialized (workqueues, decoders, ...).
-- 
2.29.2

