LKML Archive on lore.kernel.org
* [PATCH 0/3] More machine check recovery fixes
@ 2021-07-06 19:06 Tony Luck
  2021-07-06 19:06 ` [PATCH 1/3] x86/mce: Change to not send SIGBUS error during copy from user Tony Luck
                   ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: Tony Luck @ 2021-07-06 19:06 UTC
  To: Borislav Petkov
  Cc: Tony Luck, Ding Hui, naoya.horiguchi, osalvador, Youquan Song,
	huangcun, x86, linux-edac, linux-kernel

Fix a couple of issues in machine check handling:

1) A repeated machine check inside the kernel, taken without the task
   work function running between the machine checks, goes into an
   infinite loop.
2) Machine checks in kernel functions copying data from user addresses
   send SIGBUS to the user as if the application had consumed the
   poison. But this is wrong. The user should see either an -EFAULT
   error return or a reduced byte count (in the case of write(2)).
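
From the application side, the behavior described in item 2 could be
handled roughly as in the sketch below. It is only an illustration:
classify_write() and the write_outcome enum are hypothetical names that
are not part of this series, and the code cannot tell a poison-induced
-EFAULT from one caused by an ordinary bad pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Outcome of one write(2) call, as seen by the application. */
enum write_outcome {
	WRITE_COMPLETE,		/* every requested byte was accepted */
	WRITE_SHORT,		/* fewer bytes than requested were accepted */
	WRITE_EFAULT,		/* kernel could not read the source buffer */
	WRITE_OTHER_ERROR	/* any other errno */
};

/*
 * Hypothetical helper: issue a single write(2) and classify the result.
 * The number of bytes accepted is stored in *written (0 on error).
 */
enum write_outcome classify_write(int fd, const void *buf, size_t len,
				  size_t *written)
{
	ssize_t n;

	assert(written != NULL);
	*written = 0;

	/* Restart only when interrupted by a signal before any data moved. */
	do {
		n = write(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return errno == EFAULT ? WRITE_EFAULT : WRITE_OTHER_ERROR;

	*written = (size_t)n;
	return (size_t)n == len ? WRITE_COMPLETE : WRITE_SHORT;
}
```

A caller that sees WRITE_SHORT or WRITE_EFAULT should probably not
blindly retry from the same buffer, since the unread part of it may be
the part that holds the poison.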

Tony Luck (3):
  x86/mce: Change to not send SIGBUS error during copy from user
  x86/mce: Avoid infinite loop for copy from user recovery
  x86/mce: Drop copyin special case for #MC

 arch/x86/kernel/cpu/mce/core.c | 62 ++++++++++++++++++++++++----------
 arch/x86/lib/copy_user_64.S    | 13 -------
 include/linux/sched.h          |  1 +
 3 files changed, 45 insertions(+), 31 deletions(-)

-- 
2.29.2


* [PATCH 0/2] Fix infinite machine check loop in futex_wait_setup()
@ 2021-01-08 22:22 Tony Luck
  2021-01-11 21:44 ` [PATCH v2 0/3] " Tony Luck
  0 siblings, 1 reply; 47+ messages in thread
From: Tony Luck @ 2021-01-08 22:22 UTC
  To: Borislav Petkov
  Cc: Tony Luck, x86, Andrew Morton, Peter Zijlstra, Darren Hart,
	Andy Lutomirski, linux-kernel, linux-edac, linux-mm

Linux can now recover from machine checks where kernel code is
doing get_user() to access application memory. But there isn't
a way to distinguish whether get_user() failed because of a page
fault or a machine check.

Thus there is a problem if any kernel code thinks it can retry
an access after doing something that would fix the page fault.

One such example (I'm sure there are more) is in futex_wait_setup(),
where an attempt is made to read the futex with page faults disabled,
followed by a retry (after dropping a lock so page faults are safe):


        ret = get_futex_value_locked(&uval, uaddr);

        if (ret) {
                queue_unlock(*hb);

                ret = get_user(uval, uaddr);

It would be good to avoid deliberately taking a second machine
check (especially as the recovery code does really bad things
and ends up in an infinite loop!).

My proposal is to add a new function arch_memory_failure()
that can be called after get_user() returns -EFAULT to allow
graceful recovery.
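
The intended call pattern can be modeled in plain user-space C, as in
the sketch below. This is not the kernel code from the patches: every
name here (futex_model_task, futex_model_word, model_get_user(),
model_arch_memory_failure(), model_futex_read()) is a stand-in invented
for illustration, and the real arch_memory_failure() signature is not
given in this cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for per-task state; the real fields are not shown here. */
struct futex_model_task {
	bool mce_pending;	/* a machine check was taken, not yet handled */
	int mce_taken;		/* how many machine checks this task has hit */
};

/* Stand-in for one word of user memory. */
struct futex_model_word {
	unsigned int value;
	bool mapped;		/* false: access needs a page fault first */
	bool poisoned;		/* true: any access raises a machine check */
};

/* Model of get_user(): both failure causes look like -EFAULT. */
int model_get_user(struct futex_model_task *t, unsigned int *dst,
		   struct futex_model_word *src, bool faults_allowed)
{
	if (src->poisoned) {
		t->mce_pending = true;
		t->mce_taken++;
		return -EFAULT;
	}
	if (!src->mapped) {
		if (!faults_allowed)
			return -EFAULT;
		src->mapped = true;	/* the page fault fixes the mapping */
	}
	*dst = src->value;
	return 0;
}

/* Model of the proposed query: was the -EFAULT due to a machine check? */
bool model_arch_memory_failure(const struct futex_model_task *t)
{
	return t->mce_pending;
}

/* Model of the read-then-retry sequence in futex_wait_setup(). */
int model_futex_read(struct futex_model_task *t, unsigned int *uval,
		     struct futex_model_word *uaddr)
{
	int ret = model_get_user(t, uval, uaddr, false);

	if (ret) {
		/* The lock would be dropped here, making faults safe. */
		if (model_arch_memory_failure(t))
			return -EFAULT;	/* do not touch the poison again */
		ret = model_get_user(t, uval, uaddr, true);
	}
	return ret;
}
```

In this model an unmapped word is read successfully on the retry, while
a poisoned word is touched exactly once and the caller gets -EFAULT.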

Futex reviewers: I just have one new call (that fixes my test
case). If you could point out other places this is needed,
that would be most helpful.

Patch roadmap:

Part 1: Add code to avoid the infinite loop in the machine check
code. Just panic if code runs into the same machine check a second
time. This should make it much easier to debug other places where
this happens.
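
As a rough user-space model of Part 1 (not the patch itself), the idea
is that the machine check handler queues task work once, and a second
machine check arriving before that work has run is treated as fatal
instead of being queued again. The names mc_model_task,
mc_model_queue_work() and mc_model_run_work() are hypothetical, and the
real code would panic where this sketch returns MC_PANIC.

```c
#include <assert.h>
#include <stdbool.h>

/* What the modeled handler decides to do with a machine check. */
enum mc_action {
	MC_QUEUED,	/* recovery work queued for return to user mode */
	MC_PANIC	/* second machine check before the work ran */
};

/* Stand-in for the per-task bookkeeping. */
struct mc_model_task {
	bool work_pending;	/* task work queued but not yet run */
	unsigned long addr;	/* address recorded for the queued work */
};

/* Called from the modeled machine check handler. */
enum mc_action mc_model_queue_work(struct mc_model_task *t,
				   unsigned long addr)
{
	/*
	 * Queuing the same work item a second time is what leads to
	 * the infinite loop, so refuse and report a fatal condition.
	 */
	if (t->work_pending)
		return MC_PANIC;

	t->work_pending = true;
	t->addr = addr;
	return MC_QUEUED;
}

/* Called when the task work function finally runs. */
void mc_model_run_work(struct mc_model_task *t)
{
	t->work_pending = false;
}
```

With this bookkeeping, two machine checks separated by a run of the
task work are both queued normally; only back-to-back machine checks
with no task work in between reach the fatal path.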

Part 2: Add arch_memory_failure() and use it in futex_wait_setup().
[Suggestions gladly accepted for the current best way to handle the
#defines etc. to define an arch specific function to be used in
generic code]

Tony Luck (2):
  x86/mce: Avoid infinite loop for copy from user recovery
  futex, x86/mce: Avoid double machine checks

 arch/x86/include/asm/mmu.h     |  7 +++++++
 arch/x86/kernel/cpu/mce/core.c | 17 ++++++++++++++++-
 include/linux/mm.h             |  4 ++++
 include/linux/sched.h          |  3 ++-
 kernel/futex.c                 |  3 +++
 5 files changed, 32 insertions(+), 2 deletions(-)

-- 
2.21.1



end of thread, other threads:[~2021-09-21  7:52 UTC | newest]

Thread overview: 47+ messages
2021-07-06 19:06 [PATCH 0/3] More machine check recovery fixes Tony Luck
2021-07-06 19:06 ` [PATCH 1/3] x86/mce: Change to not send SIGBUS error during copy from user Tony Luck
2021-07-06 19:06 ` [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
2021-07-06 19:06 ` [PATCH 3/3] x86/mce: Drop copyin special case for #MC Tony Luck
2021-08-18  0:29 ` [PATCH v2 0/3] More machine check recovery fixes Tony Luck
2021-08-18  0:29   ` [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
2021-08-20 17:31     ` Borislav Petkov
2021-08-20 18:59       ` Luck, Tony
2021-08-20 19:27         ` Borislav Petkov
2021-08-20 20:23           ` Luck, Tony
2021-08-21  4:51             ` Tony Luck
2021-08-21 21:51               ` Al Viro
2021-08-22 14:36             ` Borislav Petkov
2021-08-20 20:33           ` Luck, Tony
2021-08-22 14:46             ` Borislav Petkov
2021-08-23 15:24               ` Luck, Tony
2021-09-13  9:24     ` Borislav Petkov
2021-09-13 21:52       ` [PATCH v3] " Luck, Tony
2021-09-14  8:28         ` Borislav Petkov
2021-08-18  0:29   ` [PATCH v2 2/3] x86/mce: Change to not send SIGBUS error during copy from user Tony Luck
2021-09-21  7:52     ` [tip: ras/core] " tip-bot2 for Tony Luck
2021-08-18  0:29   ` [PATCH v2 3/3] x86/mce: Drop copyin special case for #MC Tony Luck
2021-09-20  9:13     ` Borislav Petkov
2021-09-20 16:18       ` Luck, Tony
2021-09-20 16:37         ` Borislav Petkov
2021-09-20 16:43           ` Luck, Tony
2021-09-21  7:52     ` [tip: ras/core] " tip-bot2 for Tony Luck
2021-08-18 16:14   ` [PATCH v2 0/3] More machine check recovery fixes Luck, Tony
  -- strict thread matches above, loose matches on Subject: below --
2021-01-08 22:22 [PATCH 0/2] Fix infinite machine check loop in futex_wait_setup() Tony Luck
2021-01-11 21:44 ` [PATCH v2 0/3] " Tony Luck
2021-01-11 21:44   ` [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
2021-01-11 22:11     ` Andy Lutomirski
2021-01-11 22:20       ` Luck, Tony
2021-01-12 17:00         ` Andy Lutomirski
2021-01-12 17:16           ` Luck, Tony
2021-01-12 17:21             ` Andy Lutomirski
2021-01-12 18:23               ` Luck, Tony
2021-01-12 18:57                 ` Andy Lutomirski
2021-01-12 20:52                   ` Luck, Tony
2021-01-12 22:04                     ` Andy Lutomirski
2021-01-13  1:50                       ` Luck, Tony
2021-01-13  4:15                         ` Andy Lutomirski
2021-01-13 10:00                           ` Borislav Petkov
2021-01-13 16:06                             ` Luck, Tony
2021-01-13 16:19                               ` Borislav Petkov
2021-01-13 16:32                                 ` Luck, Tony
2021-01-13 17:35                                   ` Borislav Petkov
2021-01-14 20:22     ` Borislav Petkov
2021-01-14 21:05       ` Luck, Tony
