LKML Archive on lore.kernel.org
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Jue Wang <juew@google.com>, Ding Hui <dinghui@sangfor.com.cn>,
	naoya.horiguchi@nec.com, osalvador@suse.de,
	Youquan Song <youquan.song@intel.com>,
	huangcun@sangfor.com.cn, x86@kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery
Date: Fri, 20 Aug 2021 13:23:46 -0700
Message-ID: <20210820202346.GA1623796@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <YSACMCEoU6FxjDNh@zn.tnic>

On Fri, Aug 20, 2021 at 09:27:44PM +0200, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 11:59:45AM -0700, Luck, Tony wrote:
> > It's the "when we return" part that is the problem here. Logical
> > trace looks like:
> > 
> > user-syscall:
> > 
> > 	kernel does get_user() or copyin(), hits user poison address
> > 
> > 		machine check
> > 		sees that this was kernel get_user()/copyin() and
> > 		uses extable to "return" to exception path
> > 
> > 	still in kernel, see that get_user() or copyin() failed
> > 
> > 	Kernel does another get_user() or copyin() (maybe the first
> 
> I forgot all the details we were talking at the time but there's no way
> to tell the kernel to back off here, is it?
> 
> As in: there was an MCE while trying to access this user memory, you
> should not do get_user anymore. You did add that
> 
>          * Return zero to pretend that this copy succeeded. This
>          * is counter-intuitive, but needed to prevent the code
>          * in lib/iov_iter.c from retrying and running back into
> 
> which you're removing with the last patch so I'm confused.
> 
> IOW, the problem is that with repeated MCEs while the kernel is
> accessing that memory, it should be the kernel which should back off.
> And then we should kill that process too but apparently we don't even
> come to that.

My first foray into this just tried to fix the futex() case ... which is one
where the first copy is done with pagefault_disable() set, and then retried
with it clear.  My attempt there was to make get_user() return -EHWPOISON, and
change the futex code to give up immediately when it saw that code.

AndyL (and maybe others) barfed all over that, and rightly so ... there
are thousands of get_user() and copyin() calls ... making sure all of them
did the right thing with a new error code would be a huge effort, and very
error prone (because testing all these paths is hard). The new direction was
to just deal with the fact that we might take more than one machine check
before the kernel is finished poking at the poison.

> 
> > Maybe the message could be clearer?
> > 
> > 	mce_panic("Too many consecutive machine checks in kernel while accessing user data", m, msg);
> 
> That's not my point - it is rather: this is a recoverable error because
> it is in user memory even if it is the kernel which tries to access it.
> And maybe we should not panic the whole box but try to cordon off the
> faulty memory only and poison it after having killed the process using
> it...

To recover we need to have some other place to jump to (besides the
normal extable error return ... which isn't working if we find ourselves
in this situation) when we hit a fault covered by an extable entry. We
also need to know how many machine checks are "normal" before taking the
other path.

For futex(2) things resolve in two machine checks (one with
pagefault_disable() and then one without).  For write(2) I see up to
four machine checks (I didn't do a detailed track ... but I think it is
because copyin() does a retry to see if a failure in a many-bytes-at-a-time
copy might be able to get a few more bytes by doing byte-at-a-time).
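
A minimal user-space sketch of the counting idea described above, to make
the "how many machine checks is normal" question concrete. It is not the
kernel code: struct task_sim, sim_machine_check(), sim_syscall() and the
MCE_RETRY_LIMIT value of 10 are all hypothetical names and numbers chosen
for illustration. Each simulated machine check bumps a per-task counter;
the task is recoverable as long as the count stays under the limit and
every hit lands on the same page as the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only, not taken from the patch. */
#define MCE_RETRY_LIMIT 10
#define SIM_PAGE_SHIFT  12

/* Hypothetical per-task state: consecutive machine checks in one syscall. */
struct task_sim {
	int      mce_count;	/* machine checks seen so far */
	uint64_t mce_addr;	/* address recorded on the first hit */
};

enum mce_verdict { MCE_RECOVER, MCE_PANIC };

/* Called once per simulated machine check on a kernel access to user data. */
static enum mce_verdict sim_machine_check(struct task_sim *t, uint64_t addr)
{
	int count;

	assert(t);
	count = ++t->mce_count;

	/* First hit: remember where the poison was found. */
	if (count == 1) {
		t->mce_addr = addr;
		return MCE_RECOVER;
	}

	/* Too many retries: the caller is presumably looping forever. */
	if (count > MCE_RETRY_LIMIT)
		return MCE_PANIC;

	/* A later hit on a different page is the "two addresses" case. */
	if ((t->mce_addr >> SIM_PAGE_SHIFT) != (addr >> SIM_PAGE_SHIFT))
		return MCE_PANIC;

	return MCE_RECOVER;
}

/*
 * Simulate one syscall that touches the same poisoned address `attempts`
 * times. Returns true if every machine check was judged recoverable.
 */
static bool sim_syscall(int attempts, uint64_t addr)
{
	struct task_sim t = { 0 };

	assert(attempts >= 0);
	for (int i = 0; i < attempts; i++)
		if (sim_machine_check(&t, addr) == MCE_PANIC)
			return false;
	return true;
}
```

With these assumed numbers, sim_syscall(2, addr) models the futex(2) case
and sim_syscall(4, addr) the write(2) case; both stay recoverable. Only a
caller that retries more than MCE_RETRY_LIMIT times trips the panic path.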

The logical spot for the alternate return would be to unwind the stack
back to the syscall entry point, and then force an error return from
there.  But that seems a perilous path ... what if some code between
syscall entry and the copyin() grabbed a mutex? We would also need to
know about that and release it as part of the recovery.

Another failed approach was to mark the page not present in the page
tables of the process accessing the poison. That would get us out of the
machine check loop.  But walking page tables and changing them while still
in machine check context can't be done in a safe way (the process may be
multi-threaded and other threads could still be running on other cores).

Bottom line is that I don't think this panic can actually happen unless
there is some buggy kernel code that retries get_user() or copyin()
indefinitely.

Probably the same for the two different addresses case ... though I'm
not 100% confident about that. There could be some ioctl() that peeks
at two parts of a passed in structure, and the user might pass in a
structure that spans across a page boundary with both pages poisoned.
But that would only hit if the driver code ignored the failure of the
first get_user() and blindly tried the second. So I'd count that as a
critically bad driver bug.
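
The driver pattern in question can be sketched in user space as follows.
Everything here is hypothetical: fake_get_user() stands in for get_user(),
struct fake_user_mem models two user pages that may be poisoned, and
poison_hits counts how many times poison was touched (each one would be a
machine check on real hardware).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user structure spanning two pages. */
struct fake_user_mem {
	bool poisoned[2];	/* is page 0 / page 1 poisoned? */
	int  poison_hits;	/* simulated machine checks taken */
};

/* Stand-in for get_user(): fails with -EFAULT on a poisoned page. */
static int fake_get_user(struct fake_user_mem *m, size_t page, int *val)
{
	assert(page < 2);
	if (m->poisoned[page]) {
		m->poison_hits++;
		return -EFAULT;
	}
	*val = 0;
	return 0;
}

/* Buggy handler: ignores the first failure and blindly reads again. */
static int ioctl_buggy(struct fake_user_mem *m)
{
	int a = 0, b = 0;

	fake_get_user(m, 0, &a);	/* BUG: return value dropped */
	return fake_get_user(m, 1, &b);
}

/* Well-behaved handler: gives up as soon as the first read fails. */
static int ioctl_correct(struct fake_user_mem *m)
{
	int a = 0, b = 0;
	int ret;

	ret = fake_get_user(m, 0, &a);
	if (ret)
		return ret;
	return fake_get_user(m, 1, &b);
}
```

With both pages poisoned, ioctl_correct() touches poison once and returns
-EFAULT, while ioctl_buggy() touches it on two different pages, which is
the only way the "different addresses" check would fire in this model.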

-Tony


