LKML Archive on lore.kernel.org
* [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
@ 2018-04-17 14:36 Andy Lutomirski
2018-04-17 15:00 ` Denys Vlasenko
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Andy Lutomirski @ 2018-04-17 14:36 UTC (permalink / raw)
To: x86, LKML
Cc: Borislav Petkov, Dominik Brodowski, Denys Vlasenko, Andy Lutomirski
32-bit user code that uses int $0x80 doesn't care about r8-r11. There is,
however, some 64-bit user code that intentionally uses int $0x80 to
invoke 32-bit system calls. From what I've seen, basically all such
code assumes that r8-r15 are all preserved, but the kernel clobbers
r8-r11. Since I doubt that there's any code that depends on int $0x80
zeroing r8-r11, change the kernel to preserve them.
I suspect that very little user code is broken by the old clobber,
since r8-r11 are only rarely allocated by gcc, and they're clobbered
by function calls, so the only way we'd see a problem is if the
same function that invokes int $0x80 also spills something important
to one of these registers.
The current behavior seems to date back to the historical commit
"[PATCH] x86-64 merge for 2.6.4". Before that, all regs were
preserved. I can't find any explanation of why this change was made.
This patch also updates the test_syscall_vdso_32 testcase to verify
the new behavior, and it strengthens the test to make sure that
the kernel doesn't accidentally permute r8..r15.
Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
Ingo, Thomas: this could be a -stable candidate, but it's apparently not
severe enough for many people to have noticed.
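For reference, the strengthened selftest loads each of r8..r15 with a
distinct expected value (base, base+1, ...), which catches the kernel
accidentally swapping two registers — something a single repeated fill
value can never detect. A rough sketch of that idea in portable C
(hypothetical illustration, not the actual selftest code):

```c
#include <stdint.h>

/* Stand-in for r8..r15: eight 64-bit slots. */
#define NREGS 8

/*
 * Distinct expected values: slot i must hold base + i, so both a
 * clobbered register and a permutation of two registers fail.
 */
static int regs_ok(const uint64_t regs[NREGS], uint64_t base)
{
	for (int i = 0; i < NREGS; i++)
		if (regs[i] != base + (uint64_t)i)
			return 0;
	return 1;
}

/*
 * The old, weaker style of check: one repeated value. A swap of two
 * registers leaves every slot equal to v, so it passes undetected.
 */
static int regs_all_equal(const uint64_t regs[NREGS], uint64_t v)
{
	for (int i = 0; i < NREGS; i++)
		if (regs[i] != v)
			return 0;
	return 1;
}
```

This mirrors the movq/incq chain in the patch below: r9 = r8 + 1,
r10 = r9 + 1, and so on, with check_regs64() comparing against
expected++.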
arch/x86/entry/entry_64_compat.S | 8 +++---
tools/testing/selftests/x86/test_syscall_vdso.c | 35 +++++++++++++++----------
2 files changed, 25 insertions(+), 18 deletions(-)
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 08425c42f8b7..e4b94b7494c6 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -368,13 +368,13 @@ ENTRY(entry_INT80_compat)
pushq %rdx /* pt_regs->dx */
pushq %rcx /* pt_regs->cx */
pushq $-ENOSYS /* pt_regs->ax */
- pushq $0 /* pt_regs->r8 = 0 */
+ pushq %r8 /* pt_regs->r8 */
xorl %r8d, %r8d /* nospec r8 */
- pushq $0 /* pt_regs->r9 = 0 */
+ pushq %r9 /* pt_regs->r9 */
xorl %r9d, %r9d /* nospec r9 */
- pushq $0 /* pt_regs->r10 = 0 */
+ pushq %r10 /* pt_regs->r10 */
xorl %r10d, %r10d /* nospec r10 */
- pushq $0 /* pt_regs->r11 = 0 */
+ pushq %r11 /* pt_regs->r11 */
xorl %r11d, %r11d /* nospec r11 */
pushq %rbx /* pt_regs->rbx */
xorl %ebx, %ebx /* nospec rbx */
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
index 40370354d4c1..c9c3281077bc 100644
--- a/tools/testing/selftests/x86/test_syscall_vdso.c
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -100,12 +100,19 @@ asm (
" shl $32, %r8\n"
" orq $0x7f7f7f7f, %r8\n"
" movq %r8, %r9\n"
- " movq %r8, %r10\n"
- " movq %r8, %r11\n"
- " movq %r8, %r12\n"
- " movq %r8, %r13\n"
- " movq %r8, %r14\n"
- " movq %r8, %r15\n"
+ " incq %r9\n"
+ " movq %r9, %r10\n"
+ " incq %r10\n"
+ " movq %r10, %r11\n"
+ " incq %r11\n"
+ " movq %r11, %r12\n"
+ " incq %r12\n"
+ " movq %r12, %r13\n"
+ " incq %r13\n"
+ " movq %r13, %r14\n"
+ " incq %r14\n"
+ " movq %r14, %r15\n"
+ " incq %r15\n"
" ret\n"
" .code32\n"
" .popsection\n"
@@ -128,12 +135,13 @@ int check_regs64(void)
int err = 0;
int num = 8;
uint64_t *r64 = &regs64.r8;
+ uint64_t expected = 0x7f7f7f7f7f7f7f7fULL;
if (!kernel_is_64bit)
return 0;
do {
- if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+ if (*r64 == expected++)
continue; /* register did not change */
if (syscall_addr != (long)&int80) {
/*
@@ -147,18 +155,17 @@ int check_regs64(void)
continue;
}
} else {
- /* INT80 syscall entrypoint can be used by
+ /*
+ * INT80 syscall entrypoint can be used by
* 64-bit programs too, unlike SYSCALL/SYSENTER.
* Therefore it must preserve R12+
* (they are callee-saved registers in 64-bit C ABI).
*
- * This was probably historically not intended,
- * but R8..11 are clobbered (cleared to 0).
- * IOW: they are the only registers which aren't
- * preserved across INT80 syscall.
+ * Starting in Linux 4.17 (and any kernel that
+ * backports the change), R8..11 are preserved.
+ * Historically (and probably unintentionally), they
+ * were clobbered or zeroed.
*/
- if (*r64 == 0 && num <= 11)
- continue;
}
printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
err++;
--
2.14.3
* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
@ 2018-04-17 15:00 ` Denys Vlasenko
2018-04-18 16:53 ` Andy Lutomirski
2018-04-23 12:50 ` Borislav Petkov
2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski
2 siblings, 1 reply; 6+ messages in thread
From: Denys Vlasenko @ 2018-04-17 15:00 UTC (permalink / raw)
To: Andy Lutomirski, x86, LKML; +Cc: Borislav Petkov, Dominik Brodowski
On 04/17/2018 04:36 PM, Andy Lutomirski wrote:
> 32-bit user code that uses int $0x80 doesn't care about r8-r11. There is,
> however, some 64-bit user code that intentionally uses int $0x80 to
> invoke 32-bit system calls. From what I've seen, basically all such
> code assumes that r8-r15 are all preserved, but the kernel clobbers
> r8-r11. Since I doubt that there's any code that depends on int $0x80
> zeroing r8-r11, change the kernel to preserve them.
>
> I suspect that very little user code is broken by the old clobber,
> since r8-r11 are only rarely allocated by gcc, and they're clobbered
> by function calls, so the only way we'd see a problem is if the
> same function that invokes int $0x80 also spills something important
> to one of these registers.
>
> The current behavior seems to date back to the historical commit
> "[PATCH] x86-64 merge for 2.6.4". Before that, all regs were
> preserved. I can't find any explanation of why this change was made.
This means that the new behavior has been there for some 14 years already.
Whoever was impacted by it probably already switched to the new ABI.
Current ABI is "weaker", it allows kernel to save fewer registers.
Which is generally a good thing, since saving/restoring things cost
cycles, and sometimes painful on entry paths where you may desperately
need a scratch register or two. (Recall this one? -
...
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
...
wouldn't it be _great_ if one of the GPRs were available here
to hold the userspace %rsp?
)
If userspace needs some registers saved, it's trivial for it to have:
push reg1
push reg2
int 0x80
pop reg2
pop reg1
OTOH if userspace _does not_ need some registers saved,
but they are defined as saved by the entrypoint ABI, then save/restore work
is done every time, even when not needed.
Thus, I propose to retain the current behavior.
* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
2018-04-17 15:00 ` Denys Vlasenko
@ 2018-04-18 16:53 ` Andy Lutomirski
2018-04-18 17:13 ` Denys Vlasenko
0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2018-04-18 16:53 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Andy Lutomirski, X86 ML, LKML, Borislav Petkov, Dominik Brodowski
On Tue, Apr 17, 2018 at 8:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> This means that the new behavior has been there for some 14 years already.
> Whoever was impacted by it probably already switched to the new ABI.
>
> Current ABI is "weaker", it allows kernel to save fewer registers.
>
> Which is generally a good thing, since saving/restoring things cost
> cycles, and sometimes painful on entry paths where you may desperately
> need a scratch register or two. (Recall this one? -
> ...
> movq %rsp, PER_CPU_VAR(rsp_scratch)
> movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
> /* Construct struct pt_regs on stack */
> pushq $__USER_DS /* pt_regs->ss */
> pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
> ...
> wouldn't it be _great_ if one of the GPRs were available here
> to hold the userspace %rsp?
> )
But this is the int $0x80 entry, which uses the stack sanely and
doesn't have this problem at all.
>
> If userspace needs some registers saved, it's trivial for it to have:
>
> push reg1
> push reg2
> int 0x80
> pop reg2
> pop reg1
>
> OTOH if userspace _does not_ need some registers saved,
> but they are defined as saved by the entrypoint ABI, then save/restore work
> is done every time, even when not needed.
>
> Thus, I propose to retain the current behavior.
The problems are:
1. If you look up how to do int $0x80, every answer you find fails to
mention any clobbers. The code works on x86_32 and seems to work on
x86_64. I think we should make it actually work.
2. It's very easy to make this mistake and get away with it for a long
time, and the failure modes are hard to debug. gcc doesn't allocate
r8..r11 that often, and there are plenty of contexts (near end of a
leaf function) where r8..r11 are dead even if they were allocated. So
there is probably a decent body of code out there that makes this
mistake and is okay for now. But if anyone ever compiles it with LTO,
it's reasonably likely to go boom.
So I think we should fix it in the interest of avoiding weird bugs.
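One defensive pattern for code that must also run on kernels without
this fix is to declare r8-r11 as explicit clobbers, so the compiler
never keeps live values in them across the instruction. A rough sketch
(hypothetical code; assumes x86-64 Linux with IA32 emulation, and runs
the int $0x80 in a forked child so that a kernel without compat support
only kills the child, not the test):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Invoke the 32-bit getpid (nr 20 in the 32-bit syscall table) from
 * 64-bit code via int $0x80. The clobber list covers r8-r11 for
 * kernels that predate this patch. Returns 1 if the compat result
 * matched getpid(), 0 if the path was unavailable or skipped.
 */
static int int80_getpid_matches(void)
{
#if defined(__x86_64__) && defined(__linux__)
	pid_t child = fork();

	if (child == 0) {
		long ret;

		__asm__ __volatile__("int $0x80"
				     : "=a" (ret)
				     : "a" (20L)	/* __NR_getpid, 32-bit table */
				     : "r8", "r9", "r10", "r11", "memory");
		_exit(ret == (long)getpid() ? 0 : 1);
	}

	int status;

	if (child < 0 || waitpid(child, &status, 0) != child)
		return 0;
	/* A SIGSEGV in the child (no IA32 emulation) reads as "skip". */
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
#else
	return 0;	/* not x86-64 Linux: skip */
#endif
}
```

On a kernel with this patch applied, the clobber list costs nothing
extra; on older kernels it is what keeps the compiler from leaving
anything important in r8-r11 around the int $0x80.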
* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
2018-04-18 16:53 ` Andy Lutomirski
@ 2018-04-18 17:13 ` Denys Vlasenko
0 siblings, 0 replies; 6+ messages in thread
From: Denys Vlasenko @ 2018-04-18 17:13 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: X86 ML, LKML, Borislav Petkov, Dominik Brodowski
On 04/18/2018 06:53 PM, Andy Lutomirski wrote:
> On Tue, Apr 17, 2018 at 8:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> This means that the new behavior has been there for some 14 years already.
>> Whoever was impacted by it probably already switched to the new ABI.
>>
>> Current ABI is "weaker", it allows kernel to save fewer registers.
>>
>> Which is generally a good thing, since saving/restoring things cost
>> cycles, and sometimes painful on entry paths where you may desperately
>> need a scratch register or two. (Recall this one? -
>> ...
>> movq %rsp, PER_CPU_VAR(rsp_scratch)
>> movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>> /* Construct struct pt_regs on stack */
>> pushq $__USER_DS /* pt_regs->ss */
>> pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
>> ...
>> wouldn't it be _great_ if one of the GPRs were available here
>> to hold the userspace %rsp?
>> )
>
> But this is the int $0x80 entry, which uses the stack sanely and
> doesn't have this problem at all.
It was a general point about why not committing to save every register
can help on the callee (in this case, the kernel) side.
* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
2018-04-17 15:00 ` Denys Vlasenko
@ 2018-04-23 12:50 ` Borislav Petkov
2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski
2 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2018-04-23 12:50 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: x86, LKML, Dominik Brodowski, Denys Vlasenko
On Tue, Apr 17, 2018 at 07:36:36AM -0700, Andy Lutomirski wrote:
> 32-bit user code that uses int $0x80 doesn't care about r8-r11. There is,
> however, some 64-bit user code that intentionally uses int $0x80 to
> invoke 32-bit system calls. From what I've seen, basically all such
> code assumes that r8-r15 are all preserved, but the kernel clobbers
> r8-r11. Since I doubt that there's any code that depends on int $0x80
> zeroing r8-r11, change the kernel to preserve them.
>
> I suspect that very little user code is broken by the old clobber,
> since r8-r11 are only rarely allocated by gcc, and they're clobbered
> by function calls, so the only way we'd see a problem is if the
> same function that invokes int $0x80 also spills something important
> to one of these registers.
>
> The current behavior seems to date back to the historical commit
> "[PATCH] x86-64 merge for 2.6.4". Before that, all regs were
> preserved. I can't find any explanation of why this change was made.
Probably because r8-r11 are callee-clobbered according to the ABI, so
someone decided to whack them so that code which doesn't adhere to the
ABI would fall on its face...
Also, looking at PUSH_AND_CLEAR_REGS and how we call it on the 64-bit
entry path, we probably should keep clearing those regs to avoid
speculation crap.
Methinks.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* [tip:x86/pti] x86/entry/64/compat: Preserve r8-r11 in int $0x80
2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
2018-04-17 15:00 ` Denys Vlasenko
2018-04-23 12:50 ` Borislav Petkov
@ 2018-04-27 15:12 ` tip-bot for Andy Lutomirski
2 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Andy Lutomirski @ 2018-04-27 15:12 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, luto, hpa, bp, linux-kernel, dvlasenk, linux, tglx
Commit-ID: 8bb2610bc4967f19672444a7b0407367f1540028
Gitweb: https://git.kernel.org/tip/8bb2610bc4967f19672444a7b0407367f1540028
Author: Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 17 Apr 2018 07:36:36 -0700
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 27 Apr 2018 17:07:58 +0200
x86/entry/64/compat: Preserve r8-r11 in int $0x80
32-bit user code that uses int $0x80 doesn't care about r8-r11. There is,
however, some 64-bit user code that intentionally uses int $0x80 to invoke
32-bit system calls. From what I've seen, basically all such code assumes
that r8-r15 are all preserved, but the kernel clobbers r8-r11. Since I
doubt that there's any code that depends on int $0x80 zeroing r8-r11,
change the kernel to preserve them.
I suspect that very little user code is broken by the old clobber, since
r8-r11 are only rarely allocated by gcc, and they're clobbered by function
calls, so the only way we'd see a problem is if the same function that
invokes int $0x80 also spills something important to one of these
registers.
The current behavior seems to date back to the historical commit
"[PATCH] x86-64 merge for 2.6.4". Before that, all regs were
preserved. I can't find any explanation of why this change was made.
Update the test_syscall_vdso_32 testcase as well to verify the new
behavior, and strengthen the test to make sure that the kernel doesn't
accidentally permute r8..r15.
Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org
---
arch/x86/entry/entry_64_compat.S | 8 +++---
tools/testing/selftests/x86/test_syscall_vdso.c | 35 +++++++++++++++----------
2 files changed, 25 insertions(+), 18 deletions(-)
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9af927e59d49..9de7f1e1dede 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -84,13 +84,13 @@ ENTRY(entry_SYSENTER_compat)
pushq %rdx /* pt_regs->dx */
pushq %rcx /* pt_regs->cx */
pushq $-ENOSYS /* pt_regs->ax */
- pushq $0 /* pt_regs->r8 = 0 */
+ pushq %r8 /* pt_regs->r8 */
xorl %r8d, %r8d /* nospec r8 */
- pushq $0 /* pt_regs->r9 = 0 */
+ pushq %r9 /* pt_regs->r9 */
xorl %r9d, %r9d /* nospec r9 */
- pushq $0 /* pt_regs->r10 = 0 */
+ pushq %r10 /* pt_regs->r10 */
xorl %r10d, %r10d /* nospec r10 */
- pushq $0 /* pt_regs->r11 = 0 */
+ pushq %r11 /* pt_regs->r11 */
xorl %r11d, %r11d /* nospec r11 */
pushq %rbx /* pt_regs->rbx */
xorl %ebx, %ebx /* nospec rbx */
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
index 40370354d4c1..c9c3281077bc 100644
--- a/tools/testing/selftests/x86/test_syscall_vdso.c
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -100,12 +100,19 @@ asm (
" shl $32, %r8\n"
" orq $0x7f7f7f7f, %r8\n"
" movq %r8, %r9\n"
- " movq %r8, %r10\n"
- " movq %r8, %r11\n"
- " movq %r8, %r12\n"
- " movq %r8, %r13\n"
- " movq %r8, %r14\n"
- " movq %r8, %r15\n"
+ " incq %r9\n"
+ " movq %r9, %r10\n"
+ " incq %r10\n"
+ " movq %r10, %r11\n"
+ " incq %r11\n"
+ " movq %r11, %r12\n"
+ " incq %r12\n"
+ " movq %r12, %r13\n"
+ " incq %r13\n"
+ " movq %r13, %r14\n"
+ " incq %r14\n"
+ " movq %r14, %r15\n"
+ " incq %r15\n"
" ret\n"
" .code32\n"
" .popsection\n"
@@ -128,12 +135,13 @@ int check_regs64(void)
int err = 0;
int num = 8;
uint64_t *r64 = &regs64.r8;
+ uint64_t expected = 0x7f7f7f7f7f7f7f7fULL;
if (!kernel_is_64bit)
return 0;
do {
- if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+ if (*r64 == expected++)
continue; /* register did not change */
if (syscall_addr != (long)&int80) {
/*
@@ -147,18 +155,17 @@ int check_regs64(void)
continue;
}
} else {
- /* INT80 syscall entrypoint can be used by
+ /*
+ * INT80 syscall entrypoint can be used by
* 64-bit programs too, unlike SYSCALL/SYSENTER.
* Therefore it must preserve R12+
* (they are callee-saved registers in 64-bit C ABI).
*
- * This was probably historically not intended,
- * but R8..11 are clobbered (cleared to 0).
- * IOW: they are the only registers which aren't
- * preserved across INT80 syscall.
+ * Starting in Linux 4.17 (and any kernel that
+ * backports the change), R8..11 are preserved.
+ * Historically (and probably unintentionally), they
+ * were clobbered or zeroed.
*/
- if (*r64 == 0 && num <= 11)
- continue;
}
printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
err++;