LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) @ 2015-03-09 14:05 Denys Vlasenko 2015-03-09 14:18 ` Andy Lutomirski 2015-03-09 16:08 ` Linus Torvalds 0 siblings, 2 replies; 24+ messages in thread From: Denys Vlasenko @ 2015-03-09 14:05 UTC (permalink / raw) To: Andy Lutomirski Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel Old code was trying to avoid having three branch insns, but instead it has a chain of six insns where each insn depends on previos one. And it was touching PT_OLDSS(%esp) unconditionally, even when it may contain bogus data. Elsewhere we have to jump thru hoops just to make sure here PT_OLDSS(%esp) is at least in a valid page. All this just to have one branch instead of three? The new code simply checks each condition. All three checks can run in parallel on an out-of-order CPU. Most of the time, none of branches will be taken. Comparison of object code: Old: 1e6: 8b 44 24 38 mov 0x38(%esp),%eax 1ea: 8a 64 24 40 mov 0x40(%esp),%ah 1ee: 8a 44 24 34 mov 0x34(%esp),%al 1f2: 25 03 04 02 00 and $0x20403,%eax 1f7: 3d 03 04 00 00 cmp $0x403,%eax 1fc: 74 0f je 20d <ldt_ss> New: 1e6: 0f ba 64 24 38 11 btl $0x11,0x38(%esp) 1ec: 72 0e jb 1fc <restore_nocheck> 1ee: f6 44 24 34 03 testb $0x3,0x34(%esp) 1f3: 74 07 je 1fc <restore_nocheck> 1f5: f6 44 24 40 04 testb $0x4,0x40(%esp) 1fa: 75 0f jne 20b <ldt_ss> Patch is run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: Ingo Molnar <mingo@kernel.org> CC: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: x86@kernel.org CC: linux-kernel@vger.kernel.org --- arch/x86/kernel/entry_32.S | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index e33ba51..0a4996b 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -516,16 +516,15 @@ restore_all: TRACE_IRQS_IRET restore_all_notrace: #ifdef CONFIG_X86_ESPFIX32 - movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS - # Warning: PT_OLDSS(%esp) contains the wrong/random values if we - # are returning to the kernel. - # See comments in process.c:copy_thread() for details. - movb PT_OLDSS(%esp), %ah - movb PT_CS(%esp), %al - andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax - cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax CFI_REMEMBER_STATE - je ldt_ss # returning to user-space with LDT SS + btl $X86_EFLAGS_VM_BIT,PT_EFLAGS(%esp) + jc restore_nocheck # VM set, not it + testb $3,PT_CS(%esp) + jz restore_nocheck # CPL0, not it + # Note: we access PT_OLDSS only when we know it exists. + # If PT_CS is from CPL0, it does not. + testb $SEGMENT_TI_MASK,PT_OLDSS(%esp) + jnz ldt_ss # returning to user-space with LDT SS #endif restore_nocheck: RESTORE_REGS 4 # skip orig_eax/error_code -- 1.8.1.4 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 14:05 [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) Denys Vlasenko @ 2015-03-09 14:18 ` Andy Lutomirski 2015-03-09 15:00 ` Denys Vlasenko 2015-03-09 16:08 ` Linus Torvalds 1 sibling, 1 reply; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 14:18 UTC (permalink / raw) To: Denys Vlasenko Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel On Mon, Mar 9, 2015 at 7:05 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote: > Old code was trying to avoid having three branch insns, > but instead it has a chain of six insns where each insn > depends on previos one. > > And it was touching PT_OLDSS(%esp) unconditionally, even when it may > contain bogus data. Elsewhere we have to jump thru hoops > just to make sure here PT_OLDSS(%esp) is at least in a valid page. > > All this just to have one branch instead of three? > > The new code simply checks each condition. > All three checks can run in parallel on an out-of-order CPU. > Most of the time, none of branches will be taken. > > Comparison of object code: > Old: > 1e6: 8b 44 24 38 mov 0x38(%esp),%eax > 1ea: 8a 64 24 40 mov 0x40(%esp),%ah > 1ee: 8a 44 24 34 mov 0x34(%esp),%al > 1f2: 25 03 04 02 00 and $0x20403,%eax > 1f7: 3d 03 04 00 00 cmp $0x403,%eax > 1fc: 74 0f je 20d <ldt_ss> > New: > 1e6: 0f ba 64 24 38 11 btl $0x11,0x38(%esp) > 1ec: 72 0e jb 1fc <restore_nocheck> > 1ee: f6 44 24 34 03 testb $0x3,0x34(%esp) > 1f3: 74 07 je 1fc <restore_nocheck> > 1f5: f6 44 24 40 04 testb $0x4,0x40(%esp) > 1fa: 75 0f jne 20b <ldt_ss> > > Patch is run-tested. > > Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> > CC: Linus Torvalds <torvalds@linux-foundation.org> > CC: Steven Rostedt <rostedt@goodmis.org> > CC: Ingo Molnar <mingo@kernel.org> > CC: Borislav Petkov <bp@alien8.de> > CC: "H. Peter Anvin" <hpa@zytor.com> > CC: Andy Lutomirski <luto@amacapital.net> > CC: Oleg Nesterov <oleg@redhat.com> > CC: Frederic Weisbecker <fweisbec@gmail.com> > CC: Alexei Starovoitov <ast@plumgrid.com> > CC: Will Drewry <wad@chromium.org> > CC: Kees Cook <keescook@chromium.org> > CC: x86@kernel.org > CC: linux-kernel@vger.kernel.org > --- > arch/x86/kernel/entry_32.S | 17 ++++++++--------- > 1 file changed, 8 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S > index e33ba51..0a4996b 100644 > --- a/arch/x86/kernel/entry_32.S > +++ b/arch/x86/kernel/entry_32.S > @@ -516,16 +516,15 @@ restore_all: > TRACE_IRQS_IRET > restore_all_notrace: > #ifdef CONFIG_X86_ESPFIX32 > - movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS > - # Warning: PT_OLDSS(%esp) contains the wrong/random values if we > - # are returning to the kernel. > - # See comments in process.c:copy_thread() for details. > - movb PT_OLDSS(%esp), %ah > - movb PT_CS(%esp), %al > - andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax > - cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax > CFI_REMEMBER_STATE > - je ldt_ss # returning to user-space with LDT SS > + btl $X86_EFLAGS_VM_BIT,PT_EFLAGS(%esp) > + jc restore_nocheck # VM set, not it This seems useless. In vm86 mode, espfix should work fine (even if pointlessly), CS won't have the two low bits set, and SS won't reference the LDT because it's not a selector at all. That being said, what ends up in the high bits of esp when we iret to vm86 mode? Do we actually need espfix on all returns to vm86 mode? Your patch passes my sigreturn test, so it at least results in functional espvix32 in the non-vm86 case. --Andy > + testb $3,PT_CS(%esp) > + jz restore_nocheck # CPL0, not it > + # Note: we access PT_OLDSS only when we know it exists. > + # If PT_CS is from CPL0, it does not. > + testb $SEGMENT_TI_MASK,PT_OLDSS(%esp) > + jnz ldt_ss # returning to user-space with LDT SS > #endif > restore_nocheck: > RESTORE_REGS 4 # skip orig_eax/error_code > -- > 1.8.1.4 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 14:18 ` Andy Lutomirski @ 2015-03-09 15:00 ` Denys Vlasenko 2015-03-09 15:09 ` Andy Lutomirski 2015-03-09 15:13 ` Ingo Molnar 0 siblings, 2 replies; 24+ messages in thread From: Denys Vlasenko @ 2015-03-09 15:00 UTC (permalink / raw) To: Andy Lutomirski Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel On Mon, Mar 9, 2015 at 3:18 PM, Andy Lutomirski <luto@amacapital.net> wrote: > Do we actually need espfix on all returns to vm86 mode? No, the current code (and my new version) does *not* do espfix for vm86. It's not needed (apparently). >> + btl $X86_EFLAGS_VM_BIT,PT_EFLAGS(%esp) >> + jc restore_nocheck # VM set, not it > > This seems useless. In vm86 mode, espfix should work fine (even if > pointlessly), CS won't have the two low bits set, and SS won't > reference the LDT because it's not a selector at all. You seem to suggest we can drop VM flag test. If we do that, the tests for CS and SS will work on bogus data. I.e. they will semi-randomly rouse execution through espfix. Which will probably work correctly, but IIRC espfix does crazy stuff which is likely to be slow. What we definitely should do here is at least frame this check with "#ifdef CONFIG_VM86". > That being said, what ends up in the high bits of esp when we iret to > vm86 mode? I don't know. I guess it's time to write an actual vm86 testcase :) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 15:00 ` Denys Vlasenko @ 2015-03-09 15:09 ` Andy Lutomirski 2015-03-09 19:31 ` Denys Vlasenko 2015-03-09 15:13 ` Ingo Molnar 1 sibling, 1 reply; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 15:09 UTC (permalink / raw) To: Denys Vlasenko Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel On Mon, Mar 9, 2015 at 8:00 AM, Denys Vlasenko <vda.linux@googlemail.com> wrote: > On Mon, Mar 9, 2015 at 3:18 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> Do we actually need espfix on all returns to vm86 mode? > > No, the current code (and my new version) does *not* do > espfix for vm86. It's not needed (apparently). > >>> + btl $X86_EFLAGS_VM_BIT,PT_EFLAGS(%esp) >>> + jc restore_nocheck # VM set, not it >> >> This seems useless. In vm86 mode, espfix should work fine (even if >> pointlessly), CS won't have the two low bits set, and SS won't >> reference the LDT because it's not a selector at all. > > You seem to suggest we can drop VM flag test. > > If we do that, the tests for CS and SS will work on bogus data. > I.e. they will semi-randomly rouse execution through espfix. > Mmm, right. My bad, that test is needed. > Which will probably work correctly, but IIRC espfix does crazy stuff > which is likely to be slow. > > What we definitely should do here is at least frame this check with > "#ifdef CONFIG_VM86". > >> That being said, what ends up in the high bits of esp when we iret to >> vm86 mode? > > I don't know. I guess it's time to write an actual vm86 testcase :) Ick. I can try... Anyway, you've convinced me that your patch is good. I queued it up. -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 15:09 ` Andy Lutomirski @ 2015-03-09 19:31 ` Denys Vlasenko 0 siblings, 0 replies; 24+ messages in thread From: Denys Vlasenko @ 2015-03-09 19:31 UTC (permalink / raw) To: Andy Lutomirski Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel [-- Attachment #1: Type: text/plain, Size: 398 bytes --] >> What we definitely should do here is at least frame this check with >>> That being said, what ends up in the high bits of esp when we iret to >>> vm86 mode? >> >> I don't know. I guess it's time to write an actual vm86 testcase :) > > Ick. I can try... I found an example which runs small bit of 16-bit code using vm86 machinery. Tried in 32-bit kernel under qemu, it worked: printed "Hello". [-- Attachment #2: vm86.c --] [-- Type: text/x-csrc, Size: 4700 bytes --] /* * Adaped from: runcom version 0.1 (c) 2003 Fabrice Bellar * "Simple example of use of vm86: launch a basic .com DOS executable" * * gcc -m32 -Os -Wall -static vm86.c -ovm86 */ #include <stdlib.h> #include <stdio.h> #include <string.h> #include <inttypes.h> #include <unistd.h> #include <fcntl.h> #include <sys/mman.h> #include <signal.h> #include <linux/unistd.h> #include <asm/vm86.h> #include <sys/vm86.h> //#define SIGTEST #define COM_BASE_ADDR 0x10100 static inline void set_bit(uint8_t *a, unsigned int bit) { a[bit / 8] |= (1 << (bit % 8)); } static inline uint8_t *seg_to_linear(unsigned int seg, unsigned int reg) { return (uint8_t *)((seg << 4) + (reg & 0xffff)); } static inline void pushw(struct vm86_regs *r, int val) { r->esp = (r->esp & ~0xffff) | ((r->esp - 2) & 0xffff); *(uint16_t *)seg_to_linear(r->ss, r->esp) = val; } void dump_regs(struct vm86_regs *r) { fprintf(stderr, "AX=%08lx BX=%08lx CX=%08lx DX=%08lx\n" "SI=%08lx DI=%08lx BP=%08lx SP=%08lx\n" "IP=%08lx FL=%08lx\n" "CS=%04x DS=%04x ES=%04x SS=%04x FS=%04x GS=%04x\n", r->eax, r->ebx, r->ecx, r->edx, r->esi, r->edi, r->ebp, r->esp, r->eip, r->eflags, r->cs, r->ds, r->es, r->ss, r->fs, r->gs); } #ifdef SIGTEST void alarm_handler(int sig) { fprintf(stderr, "alarm signal=%d\n", sig); alarm(1); } #endif extern char code16; extern char code16_end; int main(int argc, char **argv) { uint8_t *vm86_mem; int ret, seg; struct vm86plus_struct ctx; struct vm86_regs *r; vm86_mem = mmap((void *)0x00000000, 0x110000, PROT_WRITE | PROT_READ | PROT_EXEC, MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0); if (vm86_mem == MAP_FAILED) { perror("mmap"); exit(1); } #ifdef SIGTEST { struct sigaction act; act.sa_handler = alarm_handler; sigemptyset(&act.sa_mask); act.sa_flags = 0; sigaction(SIGALRM, &act, NULL); alarm(1); } #endif /* load 16-bit code at COM_BASE_ADDR */ memcpy(vm86_mem + COM_BASE_ADDR, &code16, &code16_end - &code16); memset(&ctx, 0, sizeof(ctx)); /* init basic registers */ r = &ctx.regs; r->eip = 0x100; r->esp = 0xfffe; seg = (COM_BASE_ADDR - 0x100) >> 4; r->cs = seg; r->ss = seg; r->ds = seg; r->es = seg; r->fs = seg; r->gs = seg; r->eflags = 1 << 19; //EFLAGS.VIF set_bit((uint8_t *)&ctx.int_revectored, 0x21); /* put return code */ *seg_to_linear(r->cs, 0) = 0xb4; /* mov ah, $0 */ *seg_to_linear(r->cs, 1) = 0x00; *seg_to_linear(r->cs, 2) = 0xcd; /* int $0x21 */ *seg_to_linear(r->cs, 3) = 0x21; pushw(&ctx.regs, 0x0000); for(;;) { ret = vm86(VM86_ENTER, &ctx); switch(VM86_TYPE(ret)) { case VM86_INTx: { int int_num, ah; int_num = VM86_ARG(ret); if (int_num != 0x21) goto unknown_int; ah = (r->eax >> 8) & 0xff; switch(ah) { case 0x00: /* exit */ exit(0); case 0x02: /* write char */ { uint8_t c = r->edx; write(1, &c, 1); } break; case 0x09: /* write string */ { int ptr = r->edx; uint8_t c; for(;;) { c = *seg_to_linear(r->ds, ptr++); if (c == '$') break; write(1, &c, 1); } r->eax = (r->eax & ~0xff) | '$'; } break; default: unknown_int: fprintf(stderr, "unsupported int 0x%02x\n", int_num); dump_regs(&ctx.regs); // exit(1); } } break; case VM86_SIGNAL: /* a signal came, we just ignore that */ break; case VM86_STI: break; default: fprintf(stderr, "unhandled vm86 return code (0x%x)\n", ret); dump_regs(&ctx.regs); exit(1); } } } void code() { asm volatile("\n" " .code16""\n" "code16:""\n" " mov $(0x100+msg-code16),%dx""\n" " mov $0x09,%ah""\n" " int $0x21""\n" " ret""\n" "msg:""\n" " .string \"Hello\"""\n" " .byte 10""\n" " .string \"$\"""\n" "code16_end:""\n" " .code32""\n" ); } ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 15:00 ` Denys Vlasenko 2015-03-09 15:09 ` Andy Lutomirski @ 2015-03-09 15:13 ` Ingo Molnar 2015-03-09 15:18 ` Andy Lutomirski 2015-03-09 15:47 ` Steven Rostedt 1 sibling, 2 replies; 24+ messages in thread From: Ingo Molnar @ 2015-03-09 15:13 UTC (permalink / raw) To: Denys Vlasenko Cc: Andy Lutomirski, Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel * Denys Vlasenko <vda.linux@googlemail.com> wrote: > > That being said, what ends up in the high bits of esp when we iret > > to vm86 mode? > > I don't know. I guess it's time to write an actual vm86 testcase :) Btw., could you please stick it and your other testcases into tools/x86/tests/ or so? Having such a testsuite would ease maintenance considerably. Thanks, Ingo ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 15:13 ` Ingo Molnar @ 2015-03-09 15:18 ` Andy Lutomirski 2015-03-09 15:47 ` Steven Rostedt 1 sibling, 0 replies; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 15:18 UTC (permalink / raw) To: Ingo Molnar Cc: Denys Vlasenko, Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel On Mon, Mar 9, 2015 at 8:13 AM, Ingo Molnar <mingo@kernel.org> wrote: > > * Denys Vlasenko <vda.linux@googlemail.com> wrote: > >> > That being said, what ends up in the high bits of esp when we iret >> > to vm86 mode? >> >> I don't know. I guess it's time to write an actual vm86 testcase :) > > Btw., could you please stick it and your other testcases into > tools/x86/tests/ or so? Having such a testsuite would ease maintenance > considerably. > My excuse used to be that the test actively exploited really nasty security issues. Then it was that I was waiting for the infrastructure to stabilize a bit. I'll go see how it's looking now. --Andy > Thanks, > > Ingo -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 15:13 ` Ingo Molnar 2015-03-09 15:18 ` Andy Lutomirski @ 2015-03-09 15:47 ` Steven Rostedt 2015-03-09 15:54 ` Ingo Molnar 1 sibling, 1 reply; 24+ messages in thread From: Steven Rostedt @ 2015-03-09 15:47 UTC (permalink / raw) To: Ingo Molnar Cc: Denys Vlasenko, Andy Lutomirski, Denys Vlasenko, Linus Torvalds, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel, Shuah Khan On Mon, 9 Mar 2015 16:13:55 +0100 Ingo Molnar <mingo@kernel.org> wrote: > Btw., could you please stick it and your other testcases into > tools/x86/tests/ or so? Having such a testsuite would ease maintenance > considerably. Should this be part of tools/testing/selftests? -- Steve ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 15:47 ` Steven Rostedt @ 2015-03-09 15:54 ` Ingo Molnar 0 siblings, 0 replies; 24+ messages in thread From: Ingo Molnar @ 2015-03-09 15:54 UTC (permalink / raw) To: Steven Rostedt Cc: Denys Vlasenko, Andy Lutomirski, Denys Vlasenko, Linus Torvalds, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel, Shuah Khan * Steven Rostedt <rostedt@goodmis.org> wrote: > On Mon, 9 Mar 2015 16:13:55 +0100 > Ingo Molnar <mingo@kernel.org> wrote: > > > Btw., could you please stick it and your other testcases into > > tools/x86/tests/ or so? Having such a testsuite would ease > > maintenance considerably. > > Should this be part of tools/testing/selftests? Yeah. Thanks, Ingo ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 14:05 [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) Denys Vlasenko 2015-03-09 14:18 ` Andy Lutomirski @ 2015-03-09 16:08 ` Linus Torvalds 2015-03-09 16:28 ` Denys Vlasenko 2015-03-09 17:42 ` H. Peter Anvin 1 sibling, 2 replies; 24+ messages in thread From: Linus Torvalds @ 2015-03-09 16:08 UTC (permalink / raw) To: Denys Vlasenko Cc: Andy Lutomirski, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 7:05 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote: > New: > 1e6: 0f ba 64 24 38 11 btl $0x11,0x38(%esp) btl? Really? Why isn't that just testb $2,0x3a(%esp) which is both smaller and quite a bit faster on older machines. Sure, the btl is easier to explain in the source code, but instead of this: > + btl $X86_EFLAGS_VM_BIT,PT_EFLAGS(%esp) you'd have to add a comment, like testb $2, PT_EFLAGS+2(%esp) # X86_EFLAGS_VM_BIT or something. Or just at least *partially* do what we used to do, and make it all be movb PT_EFLAGS+2(%esp),%al andb $2,%al orb PT_CS(%esp),%al testb $3,%al je restore_nocheck testb $SEGMENT_TI_MASK,PT_OLDSS(%esp) jne ldt_ss which still avoids looking at SS unless needed, and is smaller and faster than the btl, afaik. Linus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 16:08 ` Linus Torvalds @ 2015-03-09 16:28 ` Denys Vlasenko 2015-03-09 16:44 ` Linus Torvalds 2015-03-09 17:42 ` H. Peter Anvin 1 sibling, 1 reply; 24+ messages in thread From: Denys Vlasenko @ 2015-03-09 16:28 UTC (permalink / raw) To: Linus Torvalds Cc: Denys Vlasenko, Andy Lutomirski, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 5:08 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Mar 9, 2015 at 7:05 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote: >> New: >> 1e6: 0f ba 64 24 38 11 btl $0x11,0x38(%esp) > > btl? Really? > > Why isn't that just > > testb $2,0x3a(%esp) > > which is both smaller and quite a bit faster on older machines. Ok. (I thought people won't be happy about this.) > Or just at least *partially* do what we used to do, and make it all be > > movb PT_EFLAGS+2(%esp),%al > andb $2,%al > orb PT_CS(%esp),%al > testb $3,%al > je restore_nocheck These insns must run serially. 4 cycles. test branch test branch can execute both test/branch'es out-of-order in parallel. 2 cycles. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 16:28 ` Denys Vlasenko @ 2015-03-09 16:44 ` Linus Torvalds 2015-03-09 17:44 ` H. Peter Anvin 0 siblings, 1 reply; 24+ messages in thread From: Linus Torvalds @ 2015-03-09 16:44 UTC (permalink / raw) To: Denys Vlasenko Cc: Denys Vlasenko, Andy Lutomirski, Steven Rostedt, Ingo Molnar, Borislav Petkov, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 9:28 AM, Denys Vlasenko <vda.linux@googlemail.com> wrote: > > can execute both test/branch'es out-of-order in parallel. Assuming it predicts perfectly, yes, and the fallthrough is the default case. Which is *probably* true her, at least often. And at least for the VM bit. So it's likely good to have three branches. The unpredictable one is likely the CS low bit test, which with interrupts in the idle routine will possibly get a lot of noise from kernel returns too. The *old* code likely predicted perfectly (because with the cmp we would care about the LDT SS bit only if the other bits were set, which is correct). And remember: those zero-cost out-of-order branches turn quite expensive if they *ever* mispredict. Even a 5% mispredict rate is likely to mean "it's better to have a data dependency chain". So it could easily go either way. I'm not convinced the old code is bad at all. Linus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 16:44 ` Linus Torvalds @ 2015-03-09 17:44 ` H. Peter Anvin 2015-03-09 19:13 ` Andy Lutomirski 0 siblings, 1 reply; 24+ messages in thread From: H. Peter Anvin @ 2015-03-09 17:44 UTC (permalink / raw) To: Linus Torvalds, Denys Vlasenko Cc: Denys Vlasenko, Andy Lutomirski, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On 03/09/2015 09:44 AM, Linus Torvalds wrote: > > And remember: those zero-cost out-of-order branches turn quite > expensive if they *ever* mispredict. Even a 5% mispredict rate is > likely to mean "it's better to have a data dependency chain". > > So it could easily go either way. I'm not convinced the old code is bad at all. > I'm inclined to side with Linus here. I'm hesitant to change this based on pure speculation. To answer Andy's question: I do believe we need espfix for V86 mode as well. -hpa ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 17:44 ` H. Peter Anvin @ 2015-03-09 19:13 ` Andy Lutomirski 2015-03-09 19:26 ` H. Peter Anvin 0 siblings, 1 reply; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 19:13 UTC (permalink / raw) To: H. Peter Anvin Cc: Linus Torvalds, Denys Vlasenko, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 717 bytes --] On Mon, Mar 9, 2015 at 10:44 AM, H. Peter Anvin <hpa@zytor.com> wrote: > On 03/09/2015 09:44 AM, Linus Torvalds wrote: >> >> And remember: those zero-cost out-of-order branches turn quite >> expensive if they *ever* mispredict. Even a 5% mispredict rate is >> likely to mean "it's better to have a data dependency chain". >> >> So it could easily go either way. I'm not convinced the old code is bad at all. >> > > I'm inclined to side with Linus here. I'm hesitant to change this based > on pure speculation. > > To answer Andy's question: I do believe we need espfix for V86 mode as well. > I think we don't. Did I screw up my test? --Andy > -hpa > > -- Andy Lutomirski AMA Capital Management, LLC [-- Attachment #2: vm86regs.c --] [-- Type: text/x-csrc, Size: 1740 bytes --] /* * vm86 regs test. * Copyright (c) 2014-2015 Andrew Lutomirski. * * This tests that vm86 regs work as expected. * * GPL v2. */ #define _GNU_SOURCE #include <time.h> #include <stdlib.h> #include <sys/syscall.h> #include <unistd.h> #include <stdio.h> #include <string.h> #include <inttypes.h> #include <sys/mman.h> #include <sys/signal.h> #include <sys/ucontext.h> #include <asm/ldt.h> #include <err.h> #include <setjmp.h> #include <stddef.h> #include <stdbool.h> #include <sys/user.h> #include <errno.h> #include <asm/vm86.h> static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), int flags) { struct sigaction sa; memset(&sa, 0, sizeof(sa)); sa.sa_sigaction = handler; sa.sa_flags = SA_SIGINFO | flags; sigemptyset(&sa.sa_mask); if (sigaction(sig, &sa, 0)) err(1, "sigaction"); } static void sigsegv_vm86(int sig, siginfo_t *info, void *ctx_void) { ucontext_t *ctx = (ucontext_t*)ctx_void; printf("Back from vm86. EIP = %lx\n", (unsigned long)ctx->uc_mcontext.gregs[REG_EIP]); } static void test_vm86(unsigned short cs, unsigned short ss) { struct vm86plus_struct v86, req_v86; long ret; memset(&v86, 0, sizeof(v86)); v86.regs.eip = 0; v86.regs.cs = cs; v86.regs.ss = ss; v86.regs.esp = 0xbaadf00d; req_v86 = v86; printf("[RUN]\tcs = 0x%hx, ss = 0x%hx\n", cs, ss); ret = syscall(SYS_vm86, VM86_ENTER, &v86); if (ret == -1 && errno == ENOSYS) { printf("[SKIP]\tvm86 not supported\n"); return; } printf("[OK]\tSurvived vm86 roundtrip. esp = %lx, should be %lx\n", v86.regs.esp, req_v86.regs.esp); } int main(void) { sethandler(SIGSEGV, sigsegv_vm86, SA_ONSTACK); test_vm86(0, 0); test_vm86(0, 3); test_vm86(3, 0); test_vm86(3, 3); return 0; } ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 19:13 ` Andy Lutomirski @ 2015-03-09 19:26 ` H. Peter Anvin 2015-03-09 19:51 ` Andy Lutomirski 0 siblings, 1 reply; 24+ messages in thread From: H. Peter Anvin @ 2015-03-09 19:26 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Denys Vlasenko, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On 03/09/2015 12:13 PM, Andy Lutomirski wrote: > On Mon, Mar 9, 2015 at 10:44 AM, H. Peter Anvin <hpa@zytor.com> wrote: >> On 03/09/2015 09:44 AM, Linus Torvalds wrote: >>> >>> And remember: those zero-cost out-of-order branches turn quite >>> expensive if they *ever* mispredict. Even a 5% mispredict rate is >>> likely to mean "it's better to have a data dependency chain". >>> >>> So it could easily go either way. I'm not convinced the old code is bad at all. >>> >> >> I'm inclined to side with Linus here. I'm hesitant to change this based >> on pure speculation. >> >> To answer Andy's question: I do believe we need espfix for V86 mode as well. >> > > I think we don't. Did I screw up my test? > I don't see how your test executes V86 mode code at all, since there seems to be nothing mapped at the bottom of memory? -hpa ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 19:26 ` H. Peter Anvin @ 2015-03-09 19:51 ` Andy Lutomirski 0 siblings, 0 replies; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 19:51 UTC (permalink / raw) To: H. Peter Anvin Cc: Linus Torvalds, Denys Vlasenko, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 12:26 PM, H. Peter Anvin <hpa@zytor.com> wrote: > On 03/09/2015 12:13 PM, Andy Lutomirski wrote: >> On Mon, Mar 9, 2015 at 10:44 AM, H. Peter Anvin <hpa@zytor.com> wrote: >>> On 03/09/2015 09:44 AM, Linus Torvalds wrote: >>>> >>>> And remember: those zero-cost out-of-order branches turn quite >>>> expensive if they *ever* mispredict. Even a 5% mispredict rate is >>>> likely to mean "it's better to have a data dependency chain". >>>> >>>> So it could easily go either way. I'm not convinced the old code is bad at all. >>>> >>> >>> I'm inclined to side with Linus here. I'm hesitant to change this based >>> on pure speculation. >>> >>> To answer Andy's question: I do believe we need espfix for V86 mode as well. >>> >> >> I think we don't. Did I screw up my test? >> > > I don't see how your test executes V86 mode code at all, since there > seems to be nothing mapped at the bottom of memory? It executes the implicit #PF at the bottom of memory :) Seriously, though, I think that IRET doesn't check whether the instruction we're returning to can be fetched, so IRET will complete successfully and then we'll get #PF. The resulting SIGSEGV kicks us out of vm86 mode, and, assuming that the kernel isn't buggy (hah!) the v86 state will get saved back to the vm86plus_struct and we can see if esp got corrupted. --Andy ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 16:08 ` Linus Torvalds 2015-03-09 16:28 ` Denys Vlasenko @ 2015-03-09 17:42 ` H. Peter Anvin 2015-03-09 17:45 ` Andy Lutomirski 1 sibling, 1 reply; 24+ messages in thread From: H. Peter Anvin @ 2015-03-09 17:42 UTC (permalink / raw) To: Linus Torvalds, Denys Vlasenko Cc: Andy Lutomirski, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On 03/09/2015 09:08 AM, Linus Torvalds wrote: > > Sure, the btl is easier to explain in the source code, but instead of this: > >> + btl $X86_EFLAGS_VM_BIT,PT_EFLAGS(%esp) > > you'd have to add a comment, like > > testb $2, PT_EFLAGS+2(%esp) # X86_EFLAGS_VM_BIT > > or something. > Maybe: testb $(X86_EFLAGS_VM-16), PT_EFLAGS+2(%esp) > Or just at least *partially* do what we used to do, and make it all be > > movb PT_EFLAGS+2(%esp),%al > andb $2,%al > orb PT_CS(%esp),%al > testb $3,%al > je restore_nocheck > testb $SEGMENT_TI_MASK,PT_OLDSS(%esp) > jne ldt_ss > > which still avoids looking at SS unless needed, and is smaller and > faster than the btl, afaik. The question is if avoiding looking at a field on the stack matters at all. -hpa ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 17:42 ` H. Peter Anvin @ 2015-03-09 17:45 ` Andy Lutomirski 2015-03-09 17:59 ` Linus Torvalds 0 siblings, 1 reply; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 17:45 UTC (permalink / raw) To: H. Peter Anvin Cc: Linus Torvalds, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 10:42 AM, H. Peter Anvin <hpa@zytor.com> wrote: > On 03/09/2015 09:08 AM, Linus Torvalds wrote: >> >> Sure, the btl is easier to explain in the source code, but instead of this: >> >>> + btl $X86_EFLAGS_VM_BIT,PT_EFLAGS(%esp) >> >> you'd have to add a comment, like >> >> testb $2, PT_EFLAGS+2(%esp) # X86_EFLAGS_VM_BIT >> >> or something. >> > > Maybe: > > testb $(X86_EFLAGS_VM-16), PT_EFLAGS+2(%esp) > >> Or just at least *partially* do what we used to do, and make it all be >> >> movb PT_EFLAGS+2(%esp),%al >> andb $2,%al >> orb PT_CS(%esp),%al >> testb $3,%al >> je restore_nocheck >> testb $SEGMENT_TI_MASK,PT_OLDSS(%esp) >> jne ldt_ss >> >> which still avoids looking at SS unless needed, and is smaller and >> faster than the btl, afaik. > > The question is if avoiding looking at a field on the stack matters at all. It does for silly reasons. If sp0 is set to the very top of the stack, then an NMI immediately after sysenter will have OLDSS off the top of the stack, and reading it can crash. This is why 32-bit kernels have a (buggy!) 8 byte offset in sp0. An alternative would be to fix the bug, but I still think it's ugly. --Andy ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 17:45 ` Andy Lutomirski @ 2015-03-09 17:59 ` Linus Torvalds 2015-03-09 18:04 ` Andy Lutomirski 0 siblings, 1 reply; 24+ messages in thread From: Linus Torvalds @ 2015-03-09 17:59 UTC (permalink / raw) To: Andy Lutomirski Cc: H. Peter Anvin, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 10:45 AM, Andy Lutomirski <luto@amacapital.net> wrote: > > If sp0 is set to the very top of the stack, then an NMI immediately > after sysenter will have OLDSS off the top of the stack, and reading > it can crash. This is why 32-bit kernels have a (buggy!) 8 byte > offset in sp0. So I think that for sysenter, we *should* have that 8-byte buffer. Not in general for sp0, but for MSR_IA32_SYSENTER_ESP (which is sp1, afaik). Just make the rule be that you can never ever have a kernel stack frame that doesn't contain room for ss/sp at the top. We have various code that looks at and touches "pt_regs" anyway, and accesses things out for debugging/oopsing/tracing etc. Let's not make the rule be that you cannot look at regs->ss without checking various random other fields first. Linus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 17:59 ` Linus Torvalds @ 2015-03-09 18:04 ` Andy Lutomirski 2015-03-09 18:16 ` Linus Torvalds 0 siblings, 1 reply; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 18:04 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 10:59 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Mar 9, 2015 at 10:45 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> If sp0 is set to the very top of the stack, then an NMI immediately >> after sysenter will have OLDSS off the top of the stack, and reading >> it can crash. This is why 32-bit kernels have a (buggy!) 8 byte >> offset in sp0. > > So I think that for sysenter, we *should* have that 8-byte buffer. > > Not in general for sp0, but for MSR_IA32_SYSENTER_ESP (which is sp1, afaik). Bah. I'm slightly wrong here. It's not NMI immediately after sysenter (that's handled by the emergency stack gunk, although we might need to fix that too). It's this: ENTRY(ia32_sysenter_target) CFI_STARTPROC simple CFI_SIGNAL_FRAME CFI_DEF_CFA esp, 0 CFI_REGISTER esp, ebp movl TSS_sysenter_sp0(%esp),%esp sysenter_past_esp: /* * Interrupts are disabled here, but we can't trace it until * enough kernel state to call TRACE_IRQS_OFF can be called - but * we immediately enable interrupts at that point anyway. */ --> NMI here pushl_cfi $__USER_DS --> or here /*CFI_REL_OFFSET ss, 0*/ pushl_cfi %ebp We could fix that locally by loading sp0 - 8 atomically (w/ percpu help) and populating the top two words with mov instead of push. One option would be to change the NMI entry code to move itself down 8 bytes if this happens (came from kernel mode or sp == sp0 - 12, perhaps). Or we could suck it up and keep that 8 byte offset (and fix the cpu_tss initializer bug that threw me off in the first place). --Andy > > Just make the rule be that you can never ever have a kernel stack > frame that doesn't contain room for ss/sp at the top. > > We have various code that looks at and touches "pt_regs" anyway, and > accesses things out for debugging/oopsing/tracing etc. Let's not make > the rule be that you cannot look at regs->ss without checking various > random other fields first. > > Linus -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 18:04 ` Andy Lutomirski @ 2015-03-09 18:16 ` Linus Torvalds 2015-03-09 18:32 ` Denys Vlasenko 2015-03-09 18:36 ` Andy Lutomirski 0 siblings, 2 replies; 24+ messages in thread From: Linus Torvalds @ 2015-03-09 18:16 UTC (permalink / raw) To: Andy Lutomirski Cc: H. Peter Anvin, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 11:04 AM, Andy Lutomirski <luto@amacapital.net> wrote: > > One option would be to change the NMI entry code to move itself down 8 > bytes if this happens (came from kernel mode or sp == sp0 - 12, > perhaps). Hmm. That whole code currently depends on the stack setup being just a single instruction (the move to esp). And that simplifies things, I'd like to keep it that way. I'd *much* rather just keep the 8-byte padding. What was so problematic with that? It worked. It's been around forever. Removing it is the bug. Linus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 18:16 ` Linus Torvalds @ 2015-03-09 18:32 ` Denys Vlasenko 2015-03-09 18:36 ` Andy Lutomirski 1 sibling, 0 replies; 24+ messages in thread From: Denys Vlasenko @ 2015-03-09 18:32 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, H. Peter Anvin, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 7:16 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Mar 9, 2015 at 11:04 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> One option would be to change the NMI entry code to move itself down 8 >> bytes if this happens (came from kernel mode or sp == sp0 - 12, >> perhaps). > > Hmm. That whole code currently depends on the stack setup being just a > single instruction (the move to esp). And that simplifies things, I'd > like to keep it that way. > > I'd *much* rather just keep the 8-byte padding. What was so > problematic with that? It worked. It's been around forever. Removing > it is the bug. Unfortunately, looks like keeping the gap is the safest thing to do. Hopes that tss.sp0 was true top of the stack, sans "a few little problems", died so quickly :/ (and we didn't even touch vm86 yet). We probably better off to give that 8 a name. Now it's just another magic constant in the entry.S maze. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 18:16 ` Linus Torvalds 2015-03-09 18:32 ` Denys Vlasenko @ 2015-03-09 18:36 ` Andy Lutomirski 2015-03-10 6:25 ` Ingo Molnar 1 sibling, 1 reply; 24+ messages in thread From: Andy Lutomirski @ 2015-03-09 18:36 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Denys Vlasenko, Steven Rostedt, Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List On Mon, Mar 9, 2015 at 11:16 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Mar 9, 2015 at 11:04 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> One option would be to change the NMI entry code to move itself down 8 >> bytes if this happens (came from kernel mode or sp == sp0 - 12, >> perhaps). > > Hmm. That whole code currently depends on the stack setup being just a > single instruction (the move to esp). And that simplifies things, I'd > like to keep it that way. > > I'd *much* rather just keep the 8-byte padding. What was so > problematic with that? It worked. It's been around forever. Removing > it is the bug. Let's at least fix it, then. processor.h has: #define INIT_TSS { \ .x86_tss = { \ .sp0 = sizeof(init_stack) + (long)&init_stack, \ (moved in -tip) That's bogus, and this bogosity is why I broke 32-bit -next in the first place: I assumed it was correct. I'll get it if no one beats me to it. --Andy > > Linus -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) 2015-03-09 18:36 ` Andy Lutomirski @ 2015-03-10 6:25 ` Ingo Molnar 0 siblings, 0 replies; 24+ messages in thread From: Ingo Molnar @ 2015-03-10 6:25 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, H. Peter Anvin, Denys Vlasenko, Steven Rostedt, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, the arch/x86 maintainers, Linux Kernel Mailing List * Andy Lutomirski <luto@amacapital.net> wrote: > On Mon, Mar 9, 2015 at 11:16 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > On Mon, Mar 9, 2015 at 11:04 AM, Andy Lutomirski <luto@amacapital.net> wrote: > >> > >> One option would be to change the NMI entry code to move itself down 8 > >> bytes if this happens (came from kernel mode or sp == sp0 - 12, > >> perhaps). > > > > Hmm. That whole code currently depends on the stack setup being just a > > single instruction (the move to esp). And that simplifies things, I'd > > like to keep it that way. > > > > I'd *much* rather just keep the 8-byte padding. What was so > > problematic with that? It worked. It's been around forever. Removing > > it is the bug. > > Let's at least fix it, then. processor.h has: > > #define INIT_TSS { \ > .x86_tss = { \ > .sp0 = sizeof(init_stack) + (long)&init_stack, \ > > (moved in -tip) > > That's bogus, and this bogosity is why I broke 32-bit -next in the > first place: I assumed it was correct. > > I'll get it if no one beats me to it. Please do and please post patches ASAP so that I can move tip:x86/asm back into a correct state again. Thanks, Ingo ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2015-03-10 6:25 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-03-09 14:05 [PATCH] x86: entry_32.S: change ESPFIX test to not touch PT_OLDSS(%esp) Denys Vlasenko 2015-03-09 14:18 ` Andy Lutomirski 2015-03-09 15:00 ` Denys Vlasenko 2015-03-09 15:09 ` Andy Lutomirski 2015-03-09 19:31 ` Denys Vlasenko 2015-03-09 15:13 ` Ingo Molnar 2015-03-09 15:18 ` Andy Lutomirski 2015-03-09 15:47 ` Steven Rostedt 2015-03-09 15:54 ` Ingo Molnar 2015-03-09 16:08 ` Linus Torvalds 2015-03-09 16:28 ` Denys Vlasenko 2015-03-09 16:44 ` Linus Torvalds 2015-03-09 17:44 ` H. Peter Anvin 2015-03-09 19:13 ` Andy Lutomirski 2015-03-09 19:26 ` H. Peter Anvin 2015-03-09 19:51 ` Andy Lutomirski 2015-03-09 17:42 ` H. Peter Anvin 2015-03-09 17:45 ` Andy Lutomirski 2015-03-09 17:59 ` Linus Torvalds 2015-03-09 18:04 ` Andy Lutomirski 2015-03-09 18:16 ` Linus Torvalds 2015-03-09 18:32 ` Denys Vlasenko 2015-03-09 18:36 ` Andy Lutomirski 2015-03-10 6:25 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).