LKML Archive on lore.kernel.org
* [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
@ 2018-04-17 14:36 Andy Lutomirski
  2018-04-17 15:00 ` Denys Vlasenko
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Andy Lutomirski @ 2018-04-17 14:36 UTC (permalink / raw)
  To: x86, LKML
  Cc: Borislav Petkov, Dominik Brodowski, Denys Vlasenko, Andy Lutomirski

32-bit user code that uses int $0x80 doesn't care about r8-r11.  There is,
however, some 64-bit user code that intentionally uses int $0x80 to
invoke 32-bit system calls.  From what I've seen, basically all such
code assumes that r8-r15 are all preserved, but the kernel clobbers
r8-r11.  Since I doubt that there's any code that depends on int $0x80
zeroing r8-r11, change the kernel to preserve them.

I suspect that very little user code is broken by the old clobber,
since r8-r11 are only rarely allocated by gcc, and they're clobbered
by function calls, so the only way we'd see a problem is if the
same function that invokes int $0x80 also spills something important
to one of these registers.

The current behavior seems to date back to the historical commit
"[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
preserved.  I can't find any explanation of why this change was made.

This patch also updates the test_syscall_vdso_32 testcase to verify
the new behavior, and it strengthens the test to make sure that
the kernel doesn't accidentally permute r8..r15.
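
The strengthened check works by loading each of r8..r15 with a distinct
sentinel (0x7f7f7f7f7f7f7f7f, incremented once per register), so that two
swapped registers no longer compare equal.  A standalone sketch of the
comparison logic (illustrative only, not the selftest itself):

```c
#include <stdint.h>

/* Sketch of the strengthened comparison: register n (counting from
 * r8) is expected to hold SENTINEL_BASE + (n - 8).  A single shared
 * sentinel would miss a permutation of two registers; consecutive
 * sentinels catch it. */
#define SENTINEL_BASE 0x7f7f7f7f7f7f7f7fULL

static int regs_unchanged(const uint64_t *r8_to_r15, int count)
{
	uint64_t expected = SENTINEL_BASE;
	int i;

	for (i = 0; i < count; i++)
		if (r8_to_r15[i] != expected++)
			return 0;
	return 1;
}
```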

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Ingo, Thomas: this could be a -stable candidate, but it's apparently not
severe enough for many people to have noticed.

 arch/x86/entry/entry_64_compat.S                |  8 +++---
 tools/testing/selftests/x86/test_syscall_vdso.c | 35 +++++++++++++++----------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 08425c42f8b7..e4b94b7494c6 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -368,13 +368,13 @@ ENTRY(entry_INT80_compat)
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq   $0			/* pt_regs->r8  = 0 */
+	pushq   %r8			/* pt_regs->r8 */
 	xorl	%r8d, %r8d		/* nospec   r8 */
-	pushq   $0			/* pt_regs->r9  = 0 */
+	pushq   %r9			/* pt_regs->r9 */
 	xorl	%r9d, %r9d		/* nospec   r9 */
-	pushq   $0			/* pt_regs->r10 = 0 */
+	pushq   %r10			/* pt_regs->r10 */
 	xorl	%r10d, %r10d		/* nospec   r10 */
-	pushq   $0			/* pt_regs->r11 = 0 */
+	pushq   %r11			/* pt_regs->r11 */
 	xorl	%r11d, %r11d		/* nospec   r11 */
 	pushq   %rbx                    /* pt_regs->rbx */
 	xorl	%ebx, %ebx		/* nospec   rbx */
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
index 40370354d4c1..c9c3281077bc 100644
--- a/tools/testing/selftests/x86/test_syscall_vdso.c
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -100,12 +100,19 @@ asm (
 	"	shl	$32, %r8\n"
 	"	orq	$0x7f7f7f7f, %r8\n"
 	"	movq	%r8, %r9\n"
-	"	movq	%r8, %r10\n"
-	"	movq	%r8, %r11\n"
-	"	movq	%r8, %r12\n"
-	"	movq	%r8, %r13\n"
-	"	movq	%r8, %r14\n"
-	"	movq	%r8, %r15\n"
+	"	incq	%r9\n"
+	"	movq	%r9, %r10\n"
+	"	incq	%r10\n"
+	"	movq	%r10, %r11\n"
+	"	incq	%r11\n"
+	"	movq	%r11, %r12\n"
+	"	incq	%r12\n"
+	"	movq	%r12, %r13\n"
+	"	incq	%r13\n"
+	"	movq	%r13, %r14\n"
+	"	incq	%r14\n"
+	"	movq	%r14, %r15\n"
+	"	incq	%r15\n"
 	"	ret\n"
 	"	.code32\n"
 	"	.popsection\n"
@@ -128,12 +135,13 @@ int check_regs64(void)
 	int err = 0;
 	int num = 8;
 	uint64_t *r64 = &regs64.r8;
+	uint64_t expected = 0x7f7f7f7f7f7f7f7fULL;
 
 	if (!kernel_is_64bit)
 		return 0;
 
 	do {
-		if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+		if (*r64 == expected++)
 			continue; /* register did not change */
 		if (syscall_addr != (long)&int80) {
 			/*
@@ -147,18 +155,17 @@ int check_regs64(void)
 				continue;
 			}
 		} else {
-			/* INT80 syscall entrypoint can be used by
+			/*
+			 * INT80 syscall entrypoint can be used by
 			 * 64-bit programs too, unlike SYSCALL/SYSENTER.
 			 * Therefore it must preserve R12+
 			 * (they are callee-saved registers in 64-bit C ABI).
 			 *
-			 * This was probably historically not intended,
-			 * but R8..11 are clobbered (cleared to 0).
-			 * IOW: they are the only registers which aren't
-			 * preserved across INT80 syscall.
+			 * Starting in Linux 4.17 (and any kernel that
+			 * backports the change), R8..11 are preserved.
+			 * Historically (and probably unintentionally), they
+			 * were clobbered or zeroed.
 			 */
-			if (*r64 == 0 && num <= 11)
-				continue;
 		}
 		printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
 		err++;
-- 
2.14.3


* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
@ 2018-04-17 15:00 ` Denys Vlasenko
  2018-04-18 16:53   ` Andy Lutomirski
  2018-04-23 12:50 ` Borislav Petkov
  2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski
  2 siblings, 1 reply; 6+ messages in thread
From: Denys Vlasenko @ 2018-04-17 15:00 UTC (permalink / raw)
  To: Andy Lutomirski, x86, LKML; +Cc: Borislav Petkov, Dominik Brodowski

On 04/17/2018 04:36 PM, Andy Lutomirski wrote:
> 32-bit user code that uses int $0x80 doesn't care about r8-r11.  There is,
> however, some 64-bit user code that intentionally uses int $0x80 to
> invoke 32-bit system calls.  From what I've seen, basically all such
> code assumes that r8-r15 are all preserved, but the kernel clobbers
> r8-r11.  Since I doubt that there's any code that depends on int $0x80
> zeroing r8-r11, change the kernel to preserve them.
> 
> I suspect that very little user code is broken by the old clobber,
> since r8-r11 are only rarely allocated by gcc, and they're clobbered
> by function calls, so the only way we'd see a problem is if the
> same function that invokes int $0x80 also spills something important
> to one of these registers.
> 
> The current behavior seems to date back to the historical commit
> "[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
> preserved.  I can't find any explanation of why this change was made.

This means that the new behavior has been there for some 14 years already.
Whoever was impacted by it has probably already switched to the new ABI.

The current ABI is "weaker": it allows the kernel to save fewer registers.

Which is generally a good thing, since saving/restoring things costs
cycles, and is sometimes painful on entry paths where you may desperately
need a scratch register or two. (Recall this one? -
...
         movq    %rsp, PER_CPU_VAR(rsp_scratch)
         movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
         /* Construct struct pt_regs on stack */
         pushq   $__USER_DS                      /* pt_regs->ss */
         pushq   PER_CPU_VAR(rsp_scratch)        /* pt_regs->sp */
...
wouldn't it be _great_ if one of the GPRs were available here
to hold the userspace %rsp?
)

If userspace needs some registers saved, it's trivial for it to have:

	push reg1
	push reg2
	int  0x80
	pop  reg2
	pop  reg1

OTOH if userspace _does not_ need some registers saved,
but they are defined as saved by the entrypoint ABI, then save/restore work
is done every time, even when not needed.

Thus, I propose to retain the current behavior.


* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 15:00 ` Denys Vlasenko
@ 2018-04-18 16:53   ` Andy Lutomirski
  2018-04-18 17:13     ` Denys Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2018-04-18 16:53 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, X86 ML, LKML, Borislav Petkov, Dominik Brodowski

On Tue, Apr 17, 2018 at 8:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> This means that the new behavior has been there for some 14 years already.
> Whoever was impacted by it has probably already switched to the new ABI.
>
> The current ABI is "weaker": it allows the kernel to save fewer registers.
>
> Which is generally a good thing, since saving/restoring things costs
> cycles, and is sometimes painful on entry paths where you may desperately
> need a scratch register or two. (Recall this one? -
> ...
>         movq    %rsp, PER_CPU_VAR(rsp_scratch)
>         movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>         /* Construct struct pt_regs on stack */
>         pushq   $__USER_DS                      /* pt_regs->ss */
>         pushq   PER_CPU_VAR(rsp_scratch)        /* pt_regs->sp */
> ...
> wouldn't it be _great_ if one of the GPRs were available here
> to hold the userspace %rsp?
> )

But this is the int $0x80 entry, which uses the stack sanely and
doesn't have this problem at all.

>
> If userspace needs some registers saved, it's trivial for it to have:
>
>         push reg1
>         push reg2
>         int  0x80
>         pop  reg2
>         pop  reg1
>
> OTOH if userspace _does not_ need some registers saved,
> but they are defined as saved by the entrypoint ABI, then save/restore work
> is done every time, even when not needed.
>
> Thus, I propose to retain the current behavior.

The problems are:

1. If you look up how to do int $0x80, every answer you find omits the
clobbers.  The code works on x86_32 and seems to work on x86_64.  I
think we should make it actually work.

2. It's very easy to make this mistake and get away with it for a long
time, and the failure modes are hard to debug.  gcc doesn't allocate
r8..r11 that often, and there are plenty of contexts (near the end of a
leaf function) where r8..r11 are dead even if they were allocated.  So
there is probably a decent body of code out there that makes this
mistake and is okay for now.  But if anyone ever compiles it with LTO,
it's reasonably likely to go boom.

So I think we should fix it in the interest of avoiding weird bugs.


* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-18 16:53   ` Andy Lutomirski
@ 2018-04-18 17:13     ` Denys Vlasenko
  0 siblings, 0 replies; 6+ messages in thread
From: Denys Vlasenko @ 2018-04-18 17:13 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: X86 ML, LKML, Borislav Petkov, Dominik Brodowski

On 04/18/2018 06:53 PM, Andy Lutomirski wrote:
> On Tue, Apr 17, 2018 at 8:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> This means that the new behavior has been there for some 14 years already.
>> Whoever was impacted by it has probably already switched to the new ABI.
>>
>> The current ABI is "weaker": it allows the kernel to save fewer registers.
>>
>> Which is generally a good thing, since saving/restoring things costs
>> cycles, and is sometimes painful on entry paths where you may desperately
>> need a scratch register or two. (Recall this one? -
>> ...
>>          movq    %rsp, PER_CPU_VAR(rsp_scratch)
>>          movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>>          /* Construct struct pt_regs on stack */
>>          pushq   $__USER_DS                      /* pt_regs->ss */
>>          pushq   PER_CPU_VAR(rsp_scratch)        /* pt_regs->sp */
>> ...
>> wouldn't it be _great_ if one of the GPRs were available here
>> to hold the userspace %rsp?
>> )
> 
> But this is the int $0x80 entry, which uses the stack sanely and
> doesn't have this problem at all.

It was a general point about why not committing to save every register
can help on the callee (in this case, the kernel) side.


* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
  2018-04-17 15:00 ` Denys Vlasenko
@ 2018-04-23 12:50 ` Borislav Petkov
  2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski
  2 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2018-04-23 12:50 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, LKML, Dominik Brodowski, Denys Vlasenko

On Tue, Apr 17, 2018 at 07:36:36AM -0700, Andy Lutomirski wrote:
> 32-bit user code that uses int $0x80 doesn't care about r8-r11.  There is,
> however, some 64-bit user code that intentionally uses int $0x80 to
> invoke 32-bit system calls.  From what I've seen, basically all such
> code assumes that r8-r15 are all preserved, but the kernel clobbers
> r8-r11.  Since I doubt that there's any code that depends on int $0x80
> zeroing r8-r11, change the kernel to preserve them.
> 
> I suspect that very little user code is broken by the old clobber,
> since r8-r11 are only rarely allocated by gcc, and they're clobbered
> by function calls, so the only way we'd see a problem is if the
> same function that invokes int $0x80 also spills something important
> to one of these registers.
> 
> The current behavior seems to date back to the historical commit
> "[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
> preserved.  I can't find any explanation of why this change was made.

Probably because r8-r11 are callee-clobbered according to the ABI, so
someone decided to whack them so that code which doesn't adhere to the
ABI would fall on its face...

Also, looking at PUSH_AND_CLEAR_REGS and how we call it on the 64-bit
entry path, we probably should keep clearing those regs to avoid
speculation crap.

Methinks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* [tip:x86/pti] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
  2018-04-17 15:00 ` Denys Vlasenko
  2018-04-23 12:50 ` Borislav Petkov
@ 2018-04-27 15:12 ` tip-bot for Andy Lutomirski
  2 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Andy Lutomirski @ 2018-04-27 15:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, luto, hpa, bp, linux-kernel, dvlasenk, linux, tglx

Commit-ID:  8bb2610bc4967f19672444a7b0407367f1540028
Gitweb:     https://git.kernel.org/tip/8bb2610bc4967f19672444a7b0407367f1540028
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 17 Apr 2018 07:36:36 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 27 Apr 2018 17:07:58 +0200

x86/entry/64/compat: Preserve r8-r11 in int $0x80

32-bit user code that uses int $0x80 doesn't care about r8-r11.  There is,
however, some 64-bit user code that intentionally uses int $0x80 to invoke
32-bit system calls.  From what I've seen, basically all such code assumes
that r8-r15 are all preserved, but the kernel clobbers r8-r11.  Since I
doubt that there's any code that depends on int $0x80 zeroing r8-r11,
change the kernel to preserve them.

I suspect that very little user code is broken by the old clobber, since
r8-r11 are only rarely allocated by gcc, and they're clobbered by function
calls, so the only way we'd see a problem is if the same function that
invokes int $0x80 also spills something important to one of these
registers.

The current behavior seems to date back to the historical commit
"[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
preserved.  I can't find any explanation of why this change was made.

Update the test_syscall_vdso_32 testcase as well to verify the new
behavior, and strengthen the test to make sure that the kernel doesn't
accidentally permute r8..r15.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org

---
 arch/x86/entry/entry_64_compat.S                |  8 +++---
 tools/testing/selftests/x86/test_syscall_vdso.c | 35 +++++++++++++++----------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9af927e59d49..9de7f1e1dede 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -84,13 +84,13 @@ ENTRY(entry_SYSENTER_compat)
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq   $0			/* pt_regs->r8  = 0 */
+	pushq   %r8			/* pt_regs->r8 */
 	xorl	%r8d, %r8d		/* nospec   r8 */
-	pushq   $0			/* pt_regs->r9  = 0 */
+	pushq   %r9			/* pt_regs->r9 */
 	xorl	%r9d, %r9d		/* nospec   r9 */
-	pushq   $0			/* pt_regs->r10 = 0 */
+	pushq   %r10			/* pt_regs->r10 */
 	xorl	%r10d, %r10d		/* nospec   r10 */
-	pushq   $0			/* pt_regs->r11 = 0 */
+	pushq   %r11			/* pt_regs->r11 */
 	xorl	%r11d, %r11d		/* nospec   r11 */
 	pushq   %rbx                    /* pt_regs->rbx */
 	xorl	%ebx, %ebx		/* nospec   rbx */
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
index 40370354d4c1..c9c3281077bc 100644
--- a/tools/testing/selftests/x86/test_syscall_vdso.c
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -100,12 +100,19 @@ asm (
 	"	shl	$32, %r8\n"
 	"	orq	$0x7f7f7f7f, %r8\n"
 	"	movq	%r8, %r9\n"
-	"	movq	%r8, %r10\n"
-	"	movq	%r8, %r11\n"
-	"	movq	%r8, %r12\n"
-	"	movq	%r8, %r13\n"
-	"	movq	%r8, %r14\n"
-	"	movq	%r8, %r15\n"
+	"	incq	%r9\n"
+	"	movq	%r9, %r10\n"
+	"	incq	%r10\n"
+	"	movq	%r10, %r11\n"
+	"	incq	%r11\n"
+	"	movq	%r11, %r12\n"
+	"	incq	%r12\n"
+	"	movq	%r12, %r13\n"
+	"	incq	%r13\n"
+	"	movq	%r13, %r14\n"
+	"	incq	%r14\n"
+	"	movq	%r14, %r15\n"
+	"	incq	%r15\n"
 	"	ret\n"
 	"	.code32\n"
 	"	.popsection\n"
@@ -128,12 +135,13 @@ int check_regs64(void)
 	int err = 0;
 	int num = 8;
 	uint64_t *r64 = &regs64.r8;
+	uint64_t expected = 0x7f7f7f7f7f7f7f7fULL;
 
 	if (!kernel_is_64bit)
 		return 0;
 
 	do {
-		if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+		if (*r64 == expected++)
 			continue; /* register did not change */
 		if (syscall_addr != (long)&int80) {
 			/*
@@ -147,18 +155,17 @@ int check_regs64(void)
 				continue;
 			}
 		} else {
-			/* INT80 syscall entrypoint can be used by
+			/*
+			 * INT80 syscall entrypoint can be used by
 			 * 64-bit programs too, unlike SYSCALL/SYSENTER.
 			 * Therefore it must preserve R12+
 			 * (they are callee-saved registers in 64-bit C ABI).
 			 *
-			 * This was probably historically not intended,
-			 * but R8..11 are clobbered (cleared to 0).
-			 * IOW: they are the only registers which aren't
-			 * preserved across INT80 syscall.
+			 * Starting in Linux 4.17 (and any kernel that
+			 * backports the change), R8..11 are preserved.
+			 * Historically (and probably unintentionally), they
+			 * were clobbered or zeroed.
 			 */
-			if (*r64 == 0 && num <= 11)
-				continue;
 		}
 		printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
 		err++;


end of thread, other threads:[~2018-04-27 15:13 UTC | newest]

Thread overview: 6+ messages
2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
2018-04-17 15:00 ` Denys Vlasenko
2018-04-18 16:53   ` Andy Lutomirski
2018-04-18 17:13     ` Denys Vlasenko
2018-04-23 12:50 ` Borislav Petkov
2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski
