LKML Archive on lore.kernel.org
From: Jue Wang <firstname.lastname@example.org>
To: "Luck, Tony" <email@example.com>
Cc: "Borislav Petkov" <firstname.lastname@example.org>,
"HORIGUCHI NAOYA(堀口 直也)" <email@example.com>,
"Oscar Salvador" <firstname.lastname@example.org>, x86 <email@example.com>,
"Song, Youquan" <firstname.lastname@example.org>
Subject: Re: [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery
Date: Thu, 22 Jul 2021 21:16:40 -0700
Message-ID: <CAPcxDJ7=UsAkDwVuoQcTt2B2UA4RWjs_o_=Fnk4Hfuqj+V8hAA@mail.gmail.com>
On Thu, Jul 22, 2021 at 9:01 PM Luck, Tony <email@example.com> wrote:
> >> I'm not aware of, nor expecting to find, places where the kernel
> >> tries to access user address A and hits poison, and then tries to
> >> access user address B (without returning to user between access
> >> A and access B).
> > This seems a reasonably easy scenario.
> > A user space app allocates a buffer of xyz KB/MB/GB.
> > Unfortunately the dimms are bad and multiple cache lines have
> > uncorrectable errors in them on different pages.
> > Then the user space app tries to write the content of the buffer into some
> > file via write(2) from the entire buffer in one go.
> Before this patch Linux gets into an infinite loop taking machine
> checks on the first of the poison addresses in the buffer.
> With this patch (and also patch 3/3 in this series), there are
> a few machine checks on the first poison address (I think the number
> depends on the alignment of the poison within a page ... but I'm
> not sure). My test code shows 4 machine checks at the same
> address. Then Linux returns a short byte count to the user
> showing how many bytes were actually written to the file.
> The fact that there are many more poison lines in the buffer
> beyond the place where the write stopped on the first one is
In our test, the application memory was anon.
With 1 UC error injected, the test always passes with the error
recovered and a SIGBUS delivered to user space.
When there is more than one UC error in the buffer, we hit an indefinite MCE loop.
> [Well, if the second poisoned line is immediately after the first
> you may hit h/w prefetch issues and h/w may signal a fatal
> machine check ... but that's a different problem that s/w could
> only solve with painful LFENCE operations between each 64-bytes
> of the copy]
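
For reference, a minimal sketch of the user-space side of the scenario discussed above: an anonymous buffer handed to write(2) in a single call, with the caller checking for a short byte count. It does not inject any memory errors, so on healthy hardware the write is normally complete. The names `write_in_one_go` and `demo`, the 1 MiB size, and the temporary file are assumptions made for illustration; they are not taken from the test code mentioned in this thread.

```python
import mmap
import os
import tempfile


def write_in_one_go(fd, buf):
    """Issue a single write(2) for the whole buffer.

    Returns (bytes_written, short), where short is True when the kernel
    accepted fewer bytes than requested. No retry is attempted, so the
    short count stays visible to the caller.
    """
    written = os.write(fd, buf)
    return written, written < len(buf)


def demo(size=1 << 20):
    """Fill an anonymous mapping and write it to a temporary file."""
    # fileno=-1 asks for anonymous memory (hypothetical stand-in for the
    # application buffer described in the thread).
    buf = mmap.mmap(-1, size)
    buf.write(b"\xa5" * size)
    with tempfile.TemporaryFile() as f:
        written, short = write_in_one_go(f.fileno(), buf)
    buf.close()
    return size, written, short


if __name__ == "__main__":
    size, written, short = demo()
    print(f"requested={size} written={written} short={short}")
```

A caller that loops on short writes would pass the remaining bytes, starting at the first unwritten offset, back to the kernel. Whether that retry makes progress past a poisoned line is the behaviour under discussion here.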