LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "x86@kernel.org" <x86@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Darren Hart <dvhart@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user recovery
Date: Tue, 2 Feb 2021 22:06:00 +0100	[thread overview]
Message-ID: <20210202210600.GF18075@zn.tnic> (raw)
In-Reply-To: <d99c43608909400199e9384bb7425beb@intel.com>

On Tue, Feb 02, 2021 at 04:04:17PM +0000, Luck, Tony wrote:
> > And the much more important question is, what is the code supposed to
> > do when that overflow *actually* happens in real life? Because IINM,
> > an overflow condition on the same page would mean killing the task to
> > contain the error and not killing the machine...
> 
> Correct. The cases I've actually hit, the second machine check is on the
> same address as the first. But from a recovery perspective Linux is going
> to take away the whole page anyway ... so not complaining if the second
> (or subsequent) access is within the same page makes sense (and that's
> what the patch does).
> 
> The code can't handle it if a subsequent #MC is to a different page (because
> we only have a single spot in the task structure to store the physical page
> address).  But that looks adequate. If the code is wildly accessing different
> pages *and* getting machine checks from those different pages ... then
> something is very seriously wrong with the system.

Right, that's clear.

But I meant something else: this thread had somewhere upthread:

"But the real validation folks took my patch and found that it has
destabilized cases 1 & 2 (and case 3 also chokes if you repeat a few
more times). System either hangs or panics. Generally before 100
injection/conumption cycles."

To which I asked:

"Or what is the failure exactly?"

and you said:

"After a few cycles of the test injection to user mode, I saw an
overflow in the machine check bank. As if it hadn't been cleared from
the previous iteration ... but all the banks are cleared as soon as we
find that the machine check is recoverable. A while before getting to
the code I changed."

And you also said:

"I tried reporoducing (applied the original patch I posted back to -rc3)
and the same issue stubbornly refused to show up again."

So that "system hang or panic" which the validation folks triggered,
that cannot be reproduced anymore? Did they run the latest version of
the patch?

And the overflow issue is something else which the patch handles fine,
as you say.

So that original mysterious hang is still unsolved.

Or am I missing something and am misrepresenting the situation?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2021-02-02 21:07 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08 22:22 [PATCH 0/2] Fix infinite machine check loop in futex_wait_setup() Tony Luck
2021-01-08 22:22 ` [PATCH 1/2] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
2021-01-08 22:22 ` [PATCH 2/2] futex, x86/mce: Avoid double machine checks Tony Luck
2021-01-08 22:47   ` Peter Zijlstra
2021-01-08 23:08     ` Luck, Tony
2021-01-08 23:14       ` Peter Zijlstra
2021-01-08 23:20         ` Luck, Tony
2021-01-11 21:44 ` [PATCH v2 0/3] Fix infinite machine check loop in futex_wait_setup() Tony Luck
2021-01-11 21:44   ` [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
2021-01-11 22:11     ` Andy Lutomirski
2021-01-11 22:20       ` Luck, Tony
2021-01-12 17:00         ` Andy Lutomirski
2021-01-12 17:16           ` Luck, Tony
2021-01-12 17:21             ` Andy Lutomirski
2021-01-12 18:23               ` Luck, Tony
2021-01-12 18:57                 ` Andy Lutomirski
2021-01-12 20:52                   ` Luck, Tony
2021-01-12 22:04                     ` Andy Lutomirski
2021-01-13  1:50                       ` Luck, Tony
2021-01-13  4:15                         ` Andy Lutomirski
2021-01-13 10:00                           ` Borislav Petkov
2021-01-13 16:06                             ` Luck, Tony
2021-01-13 16:19                               ` Borislav Petkov
2021-01-13 16:32                                 ` Luck, Tony
2021-01-13 17:35                                   ` Borislav Petkov
2021-01-14 20:22     ` Borislav Petkov
2021-01-14 21:05       ` Luck, Tony
2021-01-11 21:44   ` [PATCH v2 2/3] x86/mce: Add new return value to get_user() for machine check Tony Luck
2021-01-11 21:44   ` [PATCH v2 3/3] futex, x86/mce: Avoid double machine checks Tony Luck
2021-01-14 17:22   ` [PATCH v2 0/3] Fix infinite machine check loop in futex_wait_setup() Andy Lutomirski
2021-01-15  0:38   ` [PATCH v3] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
2021-01-15 15:27     ` Borislav Petkov
2021-01-15 19:34       ` Luck, Tony
2021-01-15 20:51         ` [PATCH v4] " Luck, Tony
2021-01-15 23:23           ` Luck, Tony
2021-01-19 10:56             ` Borislav Petkov
2021-01-19 23:57               ` Luck, Tony
2021-01-20 12:18                 ` Borislav Petkov
2021-01-20 17:17                   ` Luck, Tony
2021-01-21 21:09                   ` Luck, Tony
2021-01-25 22:55                     ` [PATCH v5] " Luck, Tony
2021-01-26 11:03                       ` Borislav Petkov
2021-01-26 22:36                         ` Luck, Tony
2021-01-28 17:57                           ` Borislav Petkov
2021-02-01 18:58                             ` Luck, Tony
2021-02-02 11:01                               ` Borislav Petkov
2021-02-02 16:04                                 ` Luck, Tony
2021-02-02 21:06                                   ` Borislav Petkov [this message]
2021-02-02 22:12                                     ` Luck, Tony
2021-01-18 15:39         ` [PATCH v3] " Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210202210600.GF18075@zn.tnic \
    --to=bp@alien8.de \
    --cc=akpm@linux-foundation.org \
    --cc=dvhart@infradead.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --subject='Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user recovery' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).