LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "tip-bot2 for Borislav Petkov" <tip-bot2@linutronix.de>
To: linux-tip-commits@vger.kernel.org
Cc: Sumanth Kamatala <skamatala@juniper.net>,
	Borislav Petkov <bp@suse.de>, Tony Luck <tony.luck@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [tip: ras/core] x86/mce: Defer processing of early errors
Date: Tue, 24 Aug 2021 08:44:55 -0000	[thread overview]
Message-ID: <162979469521.25758.12563962635907560846.tip-bot2@tip-bot2> (raw)
In-Reply-To: <20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com>

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     3bff147b187d5dfccfca1ee231b0761a89f1eff5
Gitweb:        https://git.kernel.org/tip/3bff147b187d5dfccfca1ee231b0761a89f1eff5
Author:        Borislav Petkov <bp@alien8.de>
AuthorDate:    Mon, 23 Aug 2021 17:31:29 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 24 Aug 2021 10:40:58 +02:00

x86/mce: Defer processing of early errors

When a fatal machine check results in a system reset, Linux does not
clear the error(s) from machine check bank(s) - hardware preserves the
machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads, logs,
and clears all machine check banks.

But there is a problem. In:

  5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")

the call to mce_register_decode_chain() moved later in the boot
sequence. This means that /dev/mcelog doesn't see those early error
logs.

This was partially fixed by:

  cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users of
/dev/mcelog should expect to find any early errors logged to standard
places.

Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early
machine check initialization to indicate that any errors found should
just be queued to genpool. When mcheck_late_init() is called it will
call mce_schedule_work() to actually log and flush any errors queued in
the genpool.

 [ Based on an original patch, commit message by and completely
   productized by Tony Luck. ]

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
---
 arch/x86/include/asm/mce.h     |  1 +
 arch/x86/kernel/cpu/mce/core.c | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0607ec4..da93215 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -265,6 +265,7 @@ enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
 	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
+	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aa..8cb7816 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -817,7 +817,10 @@ log_it:
 		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
 			goto clear_it;
 
-		mce_log(&m);
+		if (flags & MCP_QUEUE_LOG)
+			mce_gen_pool_add(&m);
+		else
+			mce_log(&m);
 
 clear_it:
 		/*
@@ -1639,10 +1642,12 @@ static void __mcheck_cpu_init_generic(void)
 		m_fl = MCP_DONTLOG;
 
 	/*
-	 * Log the machine checks left over from the previous reset.
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
 	 */
 	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | m_fl, &all_banks);
+	machine_check_poll(MCP_UC | MCP_QUEUE_LOG | m_fl, &all_banks);
 
 	cr4_set_bits(X86_CR4_MCE);
 

      reply	other threads:[~2021-08-24  8:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-19 22:44 [PATCH] x86/mce/dev-mcelog: Call mce_register_decode_chain() much earlier Tony Luck
2021-08-20 12:28 ` Borislav Petkov
2021-08-20 14:43   ` Luck, Tony
2021-08-20 15:48     ` Borislav Petkov
2021-08-23 18:45       ` Luck, Tony
2021-08-23 20:41         ` [PATCH v2] x86/mce: Defer processing early errors until mcheck_late_init() Luck, Tony
2021-08-23 20:51           ` Borislav Petkov
2021-08-23 21:41             ` Luck, Tony
2021-08-24  0:31             ` [PATCH v3] x86/mce: Defer processing of early errors Luck, Tony
2021-08-24  8:44               ` tip-bot2 for Borislav Petkov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=162979469521.25758.12563962635907560846.tip-bot2@tip-bot2 \
    --to=tip-bot2@linutronix.de \
    --cc=bp@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=skamatala@juniper.net \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --subject='Re: [tip: ras/core] x86/mce: Defer processing of early errors' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).