LKML Archive on lore.kernel.org
* [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
@ 2015-02-24 18:51 Denys Vlasenko
  2015-02-24 18:51 ` [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
                   ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-24 18:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

In all three 32-bit entry points, %eax is zero-extended to %rax,
so it is safe to use a 32-bit compare when checking that the
syscall number is not too large.
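A minimal C sketch (not part of the patch) of the invariant the cmpq->cmpl change relies on: once a value is known to be zero-extended, a compare on its low 32 bits gives the same answer as a full 64-bit compare. The syscall-count value below is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * A 32-bit write such as "movl %eax,%eax" zero-extends into %rax,
 * so bits 63..32 are known to be zero afterwards.
 */
static uint64_t zero_extend(uint32_t eax)
{
	return (uint64_t)eax;
}

/* 64-bit range check, as the old "cmpq $(IA32_NR_syscalls-1),%rax" did */
static int badsys_cmpq(uint64_t rax, uint64_t nr_syscalls)
{
	return rax > nr_syscalls - 1;
}

/* 32-bit range check, as the new "cmpl $(IA32_NR_syscalls-1),%eax" does */
static int badsys_cmpl(uint64_t rax, uint32_t nr_syscalls)
{
	return (uint32_t)rax > nr_syscalls - 1;
}
```

For any zero-extended input the two checks agree, which is why the shorter 32-bit insn is safe here.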

The last instance of "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.

The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
is a 32-bit constant; loading it with a 64-bit MOV produces a 10-byte
insn instead of a 5-byte one.
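A C sketch (not part of the patch) of why the 64-bit MOV is forced to the long encoding here. movq only has a sign-extending 32-bit immediate form, and AUDIT_ARCH_X86_64 (0xC000003E) has bit 31 set, so the assembler must fall back to the 10-byte movabs encoding to avoid producing the sign-extended value; movl is 5 bytes and zero-extends into the full register, which is exactly what the callee expects.

```c
#include <assert.h>
#include <stdint.h>

#define AUDIT_ARCH_X86_64 0xC000003EU	/* value from the kernel's audit headers */

/* movl $imm,%esi (5 bytes): zero-extends the immediate into %rsi */
static uint64_t movl_result(uint32_t imm)
{
	return (uint64_t)imm;
}

/*
 * movq's short form sign-extends a 32-bit immediate: bit 31 is
 * replicated into the upper 32 bits of the destination register.
 */
static uint64_t movq_imm32_result(uint32_t imm)
{
	return (imm & 0x80000000U) ? ((uint64_t)imm | 0xFFFFFFFF00000000ULL)
				   : (uint64_t)imm;
}
```

Since the two results differ for this constant, the assembler cannot use the short movq form, and only the 10-byte movabs (or the 5-byte movl) yields the intended 0x00000000C000003E.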

After a TEST insn, JE actually means "jump if zero";
let's use the JZ mnemonic instead.

At irq_return_via_sysret:
* avoid a redundant load of %r11 (it was already loaded a few instructions earlier);
* do not needlessly increment %rsp - we are about to return to userspace
  via SYSRET, and this insn does not use the stack for its return.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S      | 14 +++++++-------
 arch/x86/include/asm/calling.h |  3 +++
 arch/x86/kernel/entry_64.S     | 13 ++++++-------
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 6dcd372..01eca80 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -163,8 +163,8 @@ sysenter_flags_fixed:
 	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
 	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
 	CFI_REMEMBER_STATE
-	jnz  sysenter_tracesys
-	cmpq	$(IA32_NR_syscalls-1),%rax
+	jnz	sysenter_tracesys
+	cmpl	$(IA32_NR_syscalls-1),%eax
 	ja	ia32_badsys
 sysenter_do_call:
 	/* 32bit syscall -> 64bit C ABI argument conversion */
@@ -350,9 +350,9 @@ ENTRY(ia32_cstar_target)
 	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
 	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
 	CFI_REMEMBER_STATE
-	jnz   cstar_tracesys
-	cmpq $IA32_NR_syscalls-1,%rax
-	ja  ia32_badsys
+	jnz	cstar_tracesys
+	cmpl	$(IA32_NR_syscalls-1),%eax
+	ja	ia32_badsys
 cstar_do_call:
 	/* 32bit syscall -> 64bit C ABI argument conversion */
 	movl	%edi,%r8d	/* arg5 */
@@ -473,7 +473,7 @@ ENTRY(ia32_syscall)
 	orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
 	jnz ia32_tracesys
-	cmpq $(IA32_NR_syscalls-1),%rax
+	cmpl $(IA32_NR_syscalls-1),%eax
 	ja ia32_badsys
 ia32_do_call:
 	/* 32bit syscall -> 64bit C ABI argument conversion */
@@ -536,7 +536,7 @@ ia32_ptregs_common:
 	CFI_ENDPROC
 	CFI_STARTPROC32	simple
 	CFI_SIGNAL_FRAME
-	CFI_DEF_CFA	rsp,SS+8
+	CFI_DEF_CFA	rsp,SIZEOF_PTREGS
 	CFI_REL_OFFSET	rax,RAX
 	CFI_REL_OFFSET	rcx,RCX
 	CFI_REL_OFFSET	rdx,RDX
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 3374235..f1a962f 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -176,6 +176,9 @@ For 32-bit we have the following conventions - kernel is built with
 	.macro RESTORE_C_REGS_EXCEPT_RCX
 	RESTORE_C_REGS_HELPER 1,0,1,1,1
 	.endm
+	.macro RESTORE_C_REGS_EXCEPT_R11
+	RESTORE_C_REGS_HELPER 1,1,0,1,1
+	.endm
 	.macro RESTORE_RSI_RDI
 	RESTORE_C_REGS_HELPER 0,0,0,0,0
 	.endm
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 64f2fd3..b93f3a2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -312,7 +312,7 @@ int_ret_from_sys_call_fixup:
 	/* Do syscall tracing */
 tracesys:
 	movq %rsp, %rdi
-	movq $AUDIT_ARCH_X86_64, %rsi
+	movl $AUDIT_ARCH_X86_64, %esi
 	call syscall_trace_enter_phase1
 	test %rax, %rax
 	jnz tracesys_phase2		/* if needed, run the slow path */
@@ -323,7 +323,7 @@ tracesys_phase2:
 	SAVE_EXTRA_REGS
 	FIXUP_TOP_OF_STACK %rdi
 	movq %rsp, %rdi
-	movq $AUDIT_ARCH_X86_64, %rsi
+	movl $AUDIT_ARCH_X86_64, %esi
 	movq %rax,%rdx
 	call syscall_trace_enter_phase2
 
@@ -686,7 +686,7 @@ ret_from_intr:
 exit_intr:
 	GET_THREAD_INFO(%rcx)
 	testl $3,CS(%rsp)
-	je retint_kernel
+	jz retint_kernel
 
 	/* Interrupt came from user space */
 	/*
@@ -742,7 +742,7 @@ retint_swapgs:		/* return to user-space */
 	cmpq %r11,EFLAGS(%rsp)		/* R11 == RFLAGS */
 	jne opportunistic_sysret_failed
 
-	testq $X86_EFLAGS_RF,%r11	/* sysret can't restore RF */
+	testl $X86_EFLAGS_RF,%r11d	/* sysret can't restore RF */
 	jnz opportunistic_sysret_failed
 
 	/* nothing to check for RSP */
@@ -756,9 +756,8 @@ retint_swapgs:		/* return to user-space */
 	 */
 irq_return_via_sysret:
 	CFI_REMEMBER_STATE
-	RESTORE_C_REGS
-	REMOVE_PT_GPREGS_FROM_STACK 8
-	movq (RSP-RIP)(%rsp),%rsp
+	RESTORE_C_REGS_EXCEPT_R11
+	movq RSP(%rsp),%rsp
 	USERGS_SYSRET64
 	CFI_RESTORE_STATE
 
-- 
1.8.1.4



* [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-24 18:51 [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Denys Vlasenko
@ 2015-02-24 18:51 ` Denys Vlasenko
  2015-02-24 19:30   ` Steven Rostedt
                     ` (2 more replies)
  2015-02-24 18:51 ` [PATCH 3/4] x86: save r11 into pt_regs->eflags on SYSCALL64 fastpath Denys Vlasenko
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-24 18:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

PER_CPU_VAR(kernel_stack) was set up so that it points
five stack slots below the top of the stack.
Presumably, this was done to save one "sub $5*8,%rsp"
in the syscall/sysenter code paths, where the iret frame needs to be
built by hand.

Ironically, none of those paths benefits from this optimization:
they all need to allocate additional data on the stack
(struct pt_regs), so they still have to perform the subtraction.
And ia32_sysenter_target even needs to *undo* it:
it constructs the iret stack with pushes instead of movs,
so it needs to start right at the top.
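A C sketch (not from the patch; THREAD_SIZE is illustrative) of the pointer arithmetic being removed. The old offset reserved the five iret-frame slots (ss, rsp, rflags, cs, rip), i.e. 5*8 bytes on 64-bit:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_LONG		64
#define THREAD_SIZE		(4 * 4096)	/* illustrative; actual value is config-dependent */
#define KERNEL_STACK_OFFSET	(5 * (BITS_PER_LONG / 8))	/* 5 iret slots: ss, rsp, rflags, cs, rip */

/* Old scheme: kernel_stack points 5 slots below the real top of stack */
static uint64_t old_kernel_stack(uint64_t stack_page)
{
	return stack_page + THREAD_SIZE - KERNEL_STACK_OFFSET;
}

/* New scheme: kernel_stack points directly at the top of stack */
static uint64_t new_kernel_stack(uint64_t stack_page)
{
	return stack_page + THREAD_SIZE;
}
```

The difference between the two schemes is exactly the 40 bytes of iret frame; after the patch the entry paths allocate those slots themselves, together with pt_regs.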

This patch eliminates KERNEL_STACK_OFFSET.
PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.

The semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??" -
are now replaced by the more logical THREAD_INFO(%rsp,SIZEOF_PTREGS):
"%rsp is SIZEOF_PTREGS bytes below the top; calculate
thread_info's address using that information".
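A C rendering (not from the patch; THREAD_SIZE is illustrative) of the new THREAD_INFO(%rsp,SIZEOF_PTREGS) arithmetic: on these entry paths %rsp sits SIZEOF_PTREGS bytes below the stack top, and thread_info lives THREAD_SIZE below that top.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE	(4 * 4096)	/* illustrative; actual value is config-dependent */
#define SIZEOF_PTREGS	(21 * 8)	/* struct pt_regs is 21 eight-byte slots on x86-64 */

/*
 * Equivalent of "(off)-THREAD_SIZE(reg)" with reg = %rsp:
 * rsp + off recovers the top of the stack, and subtracting
 * THREAD_SIZE lands on thread_info at the stack's base.
 */
static uint64_t thread_info_addr(uint64_t rsp, uint64_t off)
{
	return rsp + off - THREAD_SIZE;
}
```

With %rsp exactly SIZEOF_PTREGS below the top, this yields the base of the stack page, where thread_info resides.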

The net result is that one instruction is eliminated,
and the constants in several others have changed.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S          | 35 +++++++++++++++++------------------
 arch/x86/include/asm/thread_info.h |  8 +++-----
 arch/x86/kernel/cpu/common.c       |  2 +-
 arch/x86/kernel/entry_64.S         |  8 ++++----
 arch/x86/kernel/process_32.c       |  3 +--
 arch/x86/kernel/process_64.c       |  3 +--
 arch/x86/kernel/smpboot.c          |  3 +--
 arch/x86/xen/smp.c                 |  3 +--
 8 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 01eca80..b349ae8 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -114,7 +114,6 @@ ENTRY(ia32_sysenter_target)
 	CFI_REGISTER	rsp,rbp
 	SWAPGS_UNSAFE_STACK
 	movq	PER_CPU_VAR(kernel_stack), %rsp
-	addq	$(KERNEL_STACK_OFFSET),%rsp
 	/*
 	 * No need to follow this irqs on/off section: the syscall
 	 * disabled irqs, here we enable it straight after entry:
@@ -128,7 +127,7 @@ ENTRY(ia32_sysenter_target)
 	CFI_REL_OFFSET rsp,0
 	pushfq_cfi
 	/*CFI_REL_OFFSET rflags,0*/
-	movl	TI_sysenter_return+THREAD_INFO(%rsp,3*8-KERNEL_STACK_OFFSET),%r10d
+	movl	TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
 	CFI_REGISTER rip,r10
 	pushq_cfi $__USER32_CS
 	/*CFI_REL_OFFSET cs,0*/
@@ -160,8 +159,8 @@ ENTRY(ia32_sysenter_target)
 	jnz sysenter_fix_flags
 sysenter_flags_fixed:
 
-	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
-	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	CFI_REMEMBER_STATE
 	jnz	sysenter_tracesys
 	cmpl	$(IA32_NR_syscalls-1),%eax
@@ -178,10 +177,10 @@ sysenter_dispatch:
 	movq	%rax,RAX(%rsp)
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
-	testl	$_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+	testl	$_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz	sysexit_audit
 sysexit_from_sys_call:
-	andl    $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+	andl    $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	/* clear IF, that popfq doesn't enable interrupts early */
 	andl	$~0x200,EFLAGS(%rsp)
 	movl	RIP(%rsp),%edx		/* User %eip */
@@ -226,7 +225,7 @@ sysexit_from_sys_call:
 	.endm
 
 	.macro auditsys_exit exit
-	testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+	testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz ia32_ret_from_sys_call
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
@@ -241,7 +240,7 @@ sysexit_from_sys_call:
 	movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),%edi
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
-	testl %edi,TI_flags+THREAD_INFO(%rsp,RIP)
+	testl %edi,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jz \exit
 	CLEAR_RREGS
 	jmp int_with_check
@@ -263,7 +262,7 @@ sysenter_fix_flags:
 
 sysenter_tracesys:
 #ifdef CONFIG_AUDITSYSCALL
-	testl	$(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+	testl	$(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jz	sysenter_auditsys
 #endif
 	SAVE_EXTRA_REGS
@@ -312,7 +311,7 @@ ENDPROC(ia32_sysenter_target)
 ENTRY(ia32_cstar_target)
 	CFI_STARTPROC32	simple
 	CFI_SIGNAL_FRAME
-	CFI_DEF_CFA	rsp,KERNEL_STACK_OFFSET
+	CFI_DEF_CFA	rsp,0
 	CFI_REGISTER	rip,rcx
 	/*CFI_REGISTER	rflags,r11*/
 	SWAPGS_UNSAFE_STACK
@@ -324,7 +323,7 @@ ENTRY(ia32_cstar_target)
 	 * disabled irqs and here we enable it straight after entry:
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	ALLOC_PT_GPREGS_ON_STACK 8	/* +8: space for orig_ax */
+	ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
 	SAVE_C_REGS_EXCEPT_RCX_R891011
 	movl 	%eax,%eax	/* zero extension */
 	movq	%rax,ORIG_RAX(%rsp)
@@ -347,8 +346,8 @@ ENTRY(ia32_cstar_target)
 1:	movl	(%r8),%r9d
 	_ASM_EXTABLE(1b,ia32_badarg)
 	ASM_CLAC
-	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
-	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	CFI_REMEMBER_STATE
 	jnz	cstar_tracesys
 	cmpl	$(IA32_NR_syscalls-1),%eax
@@ -365,10 +364,10 @@ cstar_dispatch:
 	movq %rax,RAX(%rsp)
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
-	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz sysretl_audit
 sysretl_from_sys_call:
-	andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+	andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
@@ -403,7 +402,7 @@ sysretl_audit:
 
 cstar_tracesys:
 #ifdef CONFIG_AUDITSYSCALL
-	testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+	testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jz cstar_auditsys
 #endif
 	xchgl %r9d,%ebp
@@ -470,8 +469,8 @@ ENTRY(ia32_syscall)
 	   this could be a problem. */
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS_EXCEPT_R891011
-	orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
-	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+	orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz ia32_tracesys
 	cmpl $(IA32_NR_syscalls-1),%eax
 	ja ia32_badsys
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e82e95a..3eb9d05 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -149,7 +149,6 @@ struct thread_info {
 #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
 
 #define STACK_WARN		(THREAD_SIZE/8)
-#define KERNEL_STACK_OFFSET	(5*(BITS_PER_LONG/8))
 
 /*
  * macros/functions for gaining access to the thread information structure
@@ -163,8 +162,7 @@ DECLARE_PER_CPU(unsigned long, kernel_stack);
 static inline struct thread_info *current_thread_info(void)
 {
 	struct thread_info *ti;
-	ti = (void *)(this_cpu_read_stable(kernel_stack) +
-		      KERNEL_STACK_OFFSET - THREAD_SIZE);
+	ti = (void *)(this_cpu_read_stable(kernel_stack) - THREAD_SIZE);
 	return ti;
 }
 
@@ -184,13 +182,13 @@ static inline unsigned long current_stack_pointer(void)
 /* how to get the thread information struct from ASM */
 #define GET_THREAD_INFO(reg) \
 	_ASM_MOV PER_CPU_VAR(kernel_stack),reg ; \
-	_ASM_SUB $(THREAD_SIZE-KERNEL_STACK_OFFSET),reg ;
+	_ASM_SUB $(THREAD_SIZE),reg ;
 
 /*
  * Same if PER_CPU_VAR(kernel_stack) is, perhaps with some offset, already in
  * a certain register (to be used in assembler memory operands).
  */
-#define THREAD_INFO(reg, off) KERNEL_STACK_OFFSET+(off)-THREAD_SIZE(reg)
+#define THREAD_INFO(reg, off) (off)-THREAD_SIZE(reg)
 
 #endif
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb56925..a47ca751 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1116,7 +1116,7 @@ static __init int setup_disablecpuid(char *arg)
 __setup("clearcpuid=", setup_disablecpuid);
 
 DEFINE_PER_CPU(unsigned long, kernel_stack) =
-	(unsigned long)&init_thread_union - KERNEL_STACK_OFFSET + THREAD_SIZE;
+	(unsigned long)&init_thread_union + THREAD_SIZE;
 EXPORT_PER_CPU_SYMBOL(kernel_stack);
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b93f3a2..91af6be 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -237,7 +237,7 @@ ENDPROC(native_usergs_sysret64)
 ENTRY(system_call)
 	CFI_STARTPROC	simple
 	CFI_SIGNAL_FRAME
-	CFI_DEF_CFA	rsp,KERNEL_STACK_OFFSET
+	CFI_DEF_CFA	rsp,0
 	CFI_REGISTER	rip,rcx
 	/*CFI_REGISTER	rflags,r11*/
 	SWAPGS_UNSAFE_STACK
@@ -256,13 +256,13 @@ GLOBAL(system_call_after_swapgs)
 	 * and short:
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	ALLOC_PT_GPREGS_ON_STACK 8		/* +8: space for orig_ax */
+	ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
 	SAVE_C_REGS_EXCEPT_RAX_RCX
 	movq	$-ENOSYS,RAX(%rsp)
 	movq_cfi rax,ORIG_RAX
 	movq	%rcx,RIP(%rsp)
 	CFI_REL_OFFSET rip,RIP
-	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz tracesys
 system_call_fastpath:
 #if __SYSCALL_MASK == ~0
@@ -280,7 +280,7 @@ system_call_fastpath:
  * Has incomplete stack frame and undefined top of stack.
  */
 ret_from_sys_call:
-	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz int_ret_from_sys_call_fixup	/* Go the the slow path */
 
 	LOCKDEP_SYS_EXIT
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8f3ebfe..ecda2bd 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -311,8 +311,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	arch_end_context_switch(next_p);
 
 	this_cpu_write(kernel_stack,
-		  (unsigned long)task_stack_page(next_p) +
-		  THREAD_SIZE - KERNEL_STACK_OFFSET);
+		(unsigned long)task_stack_page(next_p) + THREAD_SIZE);
 
 	/*
 	 * Restore %gs if needed (which is common)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5a2c029..975d342 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -414,8 +414,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
 
 	this_cpu_write(kernel_stack,
-		  (unsigned long)task_stack_page(next_p) +
-		  THREAD_SIZE - KERNEL_STACK_OFFSET);
+		(unsigned long)task_stack_page(next_p) + THREAD_SIZE);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index febc6aa..73dfb1f 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -811,8 +811,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	initial_gs = per_cpu_offset(cpu);
 #endif
 	per_cpu(kernel_stack, cpu) =
-		(unsigned long)task_stack_page(idle) -
-		KERNEL_STACK_OFFSET + THREAD_SIZE;
+		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
 	stack_start  = idle->thread.sp;
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 4c071ae..4e71a3b 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -452,8 +452,7 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
 	clear_tsk_thread_flag(idle, TIF_FORK);
 #endif
 	per_cpu(kernel_stack, cpu) =
-		(unsigned long)task_stack_page(idle) -
-		KERNEL_STACK_OFFSET + THREAD_SIZE;
+		(unsigned long)task_stack_page(idle) + THREAD_SIZE;
 
 	xen_setup_runstate_info(cpu);
 	xen_setup_timer(cpu);
-- 
1.8.1.4



* [PATCH 3/4] x86: save r11 into pt_regs->eflags on SYSCALL64 fastpath
  2015-02-24 18:51 [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Denys Vlasenko
  2015-02-24 18:51 ` [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
@ 2015-02-24 18:51 ` Denys Vlasenko
  2015-02-24 22:44   ` Andy Lutomirski
  2015-02-24 18:51 ` [PATCH 4/4] x86: save user %rsp in pt_regs->sp Denys Vlasenko
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-24 18:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

Before this patch, rcx and r11 were saved in pt_regs->rcx
and pt_regs->r11. That looks natural, but it requires messy
shuffling to/from the iret stack whenever ptrace or e.g. iopl
wants to modify the return address or flags - because that is
how these registers are used by SYSCALL/SYSRET.

This patch saves rcx and r11 in pt_regs->rip and pt_regs->flags,
and uses these values for SYSRET64 insn. Shuffling is eliminated.
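A minimal C sketch (hypothetical helper names, trimmed pt_regs) of the new mapping: SYSCALL deposits the return address in rcx and rflags in r11, and saving them straight into pt_regs->ip and pt_regs->flags means ptrace reads and writes the real return state with no fixup shuffling.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct pt_regs, reduced to the two fields involved. */
struct pt_regs_sketch {
	uint64_t ip;
	uint64_t flags;
};

/* Entry: rcx/r11 (as set by SYSCALL) go into the canonical slots. */
static void save_syscall_state(struct pt_regs_sketch *regs,
			       uint64_t rcx, uint64_t r11)
{
	regs->ip = rcx;
	regs->flags = r11;
}

/* Exit: SYSRET64 consumes rcx/r11, reloaded from the same slots. */
static uint64_t sysret_ip_after_save(uint64_t rcx, uint64_t r11)
{
	struct pt_regs_sketch regs;

	save_syscall_state(&regs, rcx, r11);
	return regs.ip;	/* would be moved back into %rcx for SYSRET */
}
```

Anything a tracer pokes into pt_regs->ip or pt_regs->flags in between is what SYSRET sees, which is why the old copy-back shuffling disappears.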

stub_iopl is no longer needed: pt_regs->flags needs no fixing up.

Testing shows that syscall fast path is ~54.3 ns before
and after the patch (on 2.7 GHz Sandy Bridge CPU).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/calling.h   | 20 ++++++++++++++------
 arch/x86/kernel/entry_64.S       | 33 +++++++++------------------------
 arch/x86/syscalls/syscall_64.tbl |  2 +-
 arch/x86/um/sys_call_table_64.c  |  2 +-
 4 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index f1a962f..4b5f7bf 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -95,9 +95,11 @@ For 32-bit we have the following conventions - kernel is built with
 	CFI_ADJUST_CFA_OFFSET 15*8+\addskip
 	.endm
 
-	.macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8plus=1
-	.if \r8plus
+	.macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8910=1 r11=1
+	.if \r11
 	movq_cfi r11, 6*8+\offset
+	.endif
+	.if \r8910
 	movq_cfi r10, 7*8+\offset
 	movq_cfi r9,  8*8+\offset
 	movq_cfi r8,  9*8+\offset
@@ -113,16 +115,19 @@ For 32-bit we have the following conventions - kernel is built with
 	movq_cfi rdi, 14*8+\offset
 	.endm
 	.macro SAVE_C_REGS offset=0
-	SAVE_C_REGS_HELPER \offset, 1, 1, 1
+	SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1
 	.endm
 	.macro SAVE_C_REGS_EXCEPT_RAX_RCX offset=0
-	SAVE_C_REGS_HELPER \offset, 0, 0, 1
+	SAVE_C_REGS_HELPER \offset, 0, 0, 1, 1
 	.endm
 	.macro SAVE_C_REGS_EXCEPT_R891011
-	SAVE_C_REGS_HELPER 0, 1, 1, 0
+	SAVE_C_REGS_HELPER 0, 1, 1, 0, 0
 	.endm
 	.macro SAVE_C_REGS_EXCEPT_RCX_R891011
-	SAVE_C_REGS_HELPER 0, 1, 0, 0
+	SAVE_C_REGS_HELPER 0, 1, 0, 0, 0
+	.endm
+	.macro SAVE_C_REGS_EXCEPT_RAX_RCX_R11
+	SAVE_C_REGS_HELPER 0, 0, 0, 1, 0
 	.endm
 
 	.macro SAVE_EXTRA_REGS offset=0
@@ -179,6 +184,9 @@ For 32-bit we have the following conventions - kernel is built with
 	.macro RESTORE_C_REGS_EXCEPT_R11
 	RESTORE_C_REGS_HELPER 1,1,0,1,1
 	.endm
+	.macro RESTORE_C_REGS_EXCEPT_RCX_R11
+	RESTORE_C_REGS_HELPER 1,0,0,1,1
+	.endm
 	.macro RESTORE_RSI_RDI
 	RESTORE_C_REGS_HELPER 0,0,0,0,0
 	.endm
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 91af6be..2fd9349 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -121,14 +121,12 @@ ENDPROC(native_usergs_sysret64)
 #endif
 
 /*
- * C code is not supposed to know about undefined top of stack. Every time
+ * C code is not supposed to know that iret frame is not populated. Every time
  * a C function with an pt_regs argument is called from the SYSCALL based
  * fast path FIXUP_TOP_OF_STACK is needed.
  * RESTORE_TOP_OF_STACK syncs the syscall state after any possible ptregs
  * manipulation.
  */
-
-	/* %rsp:at FRAMEEND */
 	.macro FIXUP_TOP_OF_STACK tmp offset=0
 	movq PER_CPU_VAR(old_rsp),\tmp
 	movq \tmp,RSP+\offset(%rsp)
@@ -136,15 +134,13 @@ ENDPROC(native_usergs_sysret64)
 	movq $__USER_CS,CS+\offset(%rsp)
 	movq RIP+\offset(%rsp),\tmp  /* get rip */
 	movq \tmp,RCX+\offset(%rsp)  /* copy it to rcx as sysret would do */
-	movq R11+\offset(%rsp),\tmp  /* get eflags */
-	movq \tmp,EFLAGS+\offset(%rsp)
+	movq EFLAGS+\offset(%rsp),\tmp /* ditto for rflags->r11 */
+	movq \tmp,R11+\offset(%rsp)
 	.endm
 
 	.macro RESTORE_TOP_OF_STACK tmp offset=0
 	movq RSP+\offset(%rsp),\tmp
 	movq \tmp,PER_CPU_VAR(old_rsp)
-	movq EFLAGS+\offset(%rsp),\tmp
-	movq \tmp,R11+\offset(%rsp)
 	.endm
 
 /*
@@ -257,9 +253,10 @@ GLOBAL(system_call_after_swapgs)
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
 	ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
-	SAVE_C_REGS_EXCEPT_RAX_RCX
+	SAVE_C_REGS_EXCEPT_RAX_RCX_R11
 	movq	$-ENOSYS,RAX(%rsp)
 	movq_cfi rax,ORIG_RAX
+	movq	%r11,EFLAGS(%rsp)
 	movq	%rcx,RIP(%rsp)
 	CFI_REL_OFFSET rip,RIP
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
@@ -277,7 +274,7 @@ system_call_fastpath:
 	movq %rax,RAX(%rsp)
 /*
  * Syscall return path ending with SYSRET (fast path)
- * Has incomplete stack frame and undefined top of stack.
+ * Has incompletely filled pt_regs, iret frame is also incomplete.
  */
 ret_from_sys_call:
 	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
@@ -291,9 +288,10 @@ ret_from_sys_call:
 	 * sysretq will re-enable interrupts:
 	 */
 	TRACE_IRQS_ON
-	RESTORE_C_REGS_EXCEPT_RCX
-	movq RIP(%rsp),%rcx
+	RESTORE_C_REGS_EXCEPT_RCX_R11
+	movq	RIP(%rsp),%rcx
 	CFI_REGISTER	rip,rcx
+	movq	EFLAGS(%rsp),%r11
 	/*CFI_REGISTER	rflags,r11*/
 	movq	PER_CPU_VAR(old_rsp), %rsp
 	/*
@@ -422,22 +420,9 @@ ENTRY(stub_\func)
 END(stub_\func)
 	.endm
 
-	.macro FIXED_FRAME label,func
-ENTRY(\label)
-	CFI_STARTPROC
-	DEFAULT_FRAME 0, 8		/* offset 8: return address */
-	FIXUP_TOP_OF_STACK %r11, 8
-	call \func
-	RESTORE_TOP_OF_STACK %r11, 8
-	ret
-	CFI_ENDPROC
-END(\label)
-	.endm
-
 	FORK_LIKE  clone
 	FORK_LIKE  fork
 	FORK_LIKE  vfork
-	FIXED_FRAME stub_iopl, sys_iopl
 
 ENTRY(stub_execve)
 	CFI_STARTPROC
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 8d656fb..9ef32d5 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -178,7 +178,7 @@
 169	common	reboot			sys_reboot
 170	common	sethostname		sys_sethostname
 171	common	setdomainname		sys_setdomainname
-172	common	iopl			stub_iopl
+172	common	iopl			sys_iopl
 173	common	ioperm			sys_ioperm
 174	64	create_module
 175	common	init_module		sys_init_module
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index 5cdfa9d..a75d8700 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -16,7 +16,7 @@
  */
 
 /* Not going to be implemented by UML, since we have no hardware. */
-#define stub_iopl sys_ni_syscall
+#define sys_iopl sys_ni_syscall
 #define sys_ioperm sys_ni_syscall
 
 /*
-- 
1.8.1.4



* [PATCH 4/4] x86: save user %rsp in pt_regs->sp
  2015-02-24 18:51 [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Denys Vlasenko
  2015-02-24 18:51 ` [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
  2015-02-24 18:51 ` [PATCH 3/4] x86: save r11 into pt_regs->eflags on SYSCALL64 fastpath Denys Vlasenko
@ 2015-02-24 18:51 ` Denys Vlasenko
  2015-02-24 22:55   ` Andy Lutomirski
  2015-02-24 19:58 ` [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Borislav Petkov
  2015-02-24 22:25 ` Andy Lutomirski
  4 siblings, 1 reply; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-24 18:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

PER_CPU(old_rsp) usage is simplified: it is now used only
as temporary storage, and the userspace stack pointer is stored
in pt_regs->sp immediately on syscall entry, instead of lazily
on syscall exit. This allows us to get rid of thread_struct::usersp.
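A C sketch (hypothetical names, not the kernel's actual structures) of the simplification: with the user %rsp written into pt_regs->sp at entry, current_user_stack_pointer() becomes a plain field read, and no per-thread usersp copy is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct pt_regs, reduced to the field involved. */
struct pt_regs_sp_sketch {
	uint64_t sp;
};

/*
 * Entry path: the user %rsp (parked briefly in old_rsp) lands in
 * pt_regs->sp right away, instead of at syscall exit.
 */
static void syscall_entry_save_sp(struct pt_regs_sp_sketch *regs,
				  uint64_t user_rsp)
{
	regs->sp = user_rsp;
}

/* current_user_stack_pointer() reduces to reading that slot. */
static uint64_t entry_then_read_sp(uint64_t user_rsp)
{
	struct pt_regs_sp_sketch regs;

	syscall_entry_save_sp(&regs, user_rsp);
	return regs.sp;
}
```

This is also why perf and compat code can read pt_regs->sp directly rather than consulting the old_rsp per-cpu variable.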

The lazy store of pt_regs->cs and pt_regs->ss is retained -
tests have shown that these insns do take ~2 cycles on the fast path.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/compat.h    |  2 +-
 arch/x86/include/asm/processor.h |  6 ------
 arch/x86/include/asm/ptrace.h    |  8 ++------
 arch/x86/kernel/entry_64.S       | 22 +++++++++-------------
 arch/x86/kernel/perf_regs.c      |  2 +-
 arch/x86/kernel/process_64.c     |  8 +-------
 6 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 59c6c40..acdee09 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -301,7 +301,7 @@ static inline void __user *arch_compat_alloc_user_space(long len)
 		sp = task_pt_regs(current)->sp;
 	} else {
 		/* -128 for the x32 ABI redzone */
-		sp = this_cpu_read(old_rsp) - 128;
+		sp = task_pt_regs(current)->sp - 128;
 	}
 
 	return (void __user *)round_down(sp - len, 16);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a092a0c..ce4aadf 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -474,7 +474,6 @@ struct thread_struct {
 #ifdef CONFIG_X86_32
 	unsigned long		sysenter_cs;
 #else
-	unsigned long		usersp;	/* Copy from PDA */
 	unsigned short		es;
 	unsigned short		ds;
 	unsigned short		fsindex;
@@ -935,11 +934,6 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
 #define task_pt_regs(tsk)	((struct pt_regs *)(tsk)->thread.sp0 - 1)
 extern unsigned long KSTK_ESP(struct task_struct *task);
 
-/*
- * User space RSP while inside the SYSCALL fast path
- */
-DECLARE_PER_CPU(unsigned long, old_rsp);
-
 #endif /* CONFIG_X86_64 */
 
 extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 4077d96..74bb2e0 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -145,12 +145,8 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
 #endif
 }
 
-#define current_user_stack_pointer()	this_cpu_read(old_rsp)
-/* ia32 vs. x32 difference */
-#define compat_user_stack_pointer()	\
-	(test_thread_flag(TIF_IA32) 	\
-	 ? current_pt_regs()->sp 	\
-	 : this_cpu_read(old_rsp))
+#define current_user_stack_pointer()	current_pt_regs()->sp
+#define compat_user_stack_pointer()	current_pt_regs()->sp
 #endif
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 2fd9349..c63a582 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -128,8 +128,6 @@ ENDPROC(native_usergs_sysret64)
  * manipulation.
  */
 	.macro FIXUP_TOP_OF_STACK tmp offset=0
-	movq PER_CPU_VAR(old_rsp),\tmp
-	movq \tmp,RSP+\offset(%rsp)
 	movq $__USER_DS,SS+\offset(%rsp)
 	movq $__USER_CS,CS+\offset(%rsp)
 	movq RIP+\offset(%rsp),\tmp  /* get rip */
@@ -139,8 +137,7 @@ ENDPROC(native_usergs_sysret64)
 	.endm
 
 	.macro RESTORE_TOP_OF_STACK tmp offset=0
-	movq RSP+\offset(%rsp),\tmp
-	movq \tmp,PER_CPU_VAR(old_rsp)
+	/* nothing to do */
 	.endm
 
 /*
@@ -222,9 +219,6 @@ ENDPROC(native_usergs_sysret64)
  * Interrupts are off on entry.
  * Only called from user space.
  *
- * XXX	if we had a free scratch register we could save the RSP into the stack frame
- *      and report it properly in ps. Unfortunately we haven't.
- *
  * When user can change the frames always force IRET. That is because
  * it deals with uncanonical addresses better. SYSRET has trouble
  * with them due to bugs in both AMD and Intel CPUs.
@@ -253,11 +247,13 @@ GLOBAL(system_call_after_swapgs)
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
 	ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
+	movq	%rcx,RIP(%rsp)
+	movq	%r11,EFLAGS(%rsp)
+	movq    PER_CPU_VAR(old_rsp),%rcx
+	movq    %rcx,RSP(%rsp)
+	movq_cfi rax,ORIG_RAX
 	SAVE_C_REGS_EXCEPT_RAX_RCX_R11
 	movq	$-ENOSYS,RAX(%rsp)
-	movq_cfi rax,ORIG_RAX
-	movq	%r11,EFLAGS(%rsp)
-	movq	%rcx,RIP(%rsp)
 	CFI_REL_OFFSET rip,RIP
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz tracesys
@@ -293,7 +289,7 @@ ret_from_sys_call:
 	CFI_REGISTER	rip,rcx
 	movq	EFLAGS(%rsp),%r11
 	/*CFI_REGISTER	rflags,r11*/
-	movq	PER_CPU_VAR(old_rsp), %rsp
+	movq	RSP(%rsp),%rsp
 	/*
 	 * 64bit SYSRET restores rip from rcx,
 	 * rflags from r11 (but RF and VM bits are forced to 0),
@@ -307,7 +303,7 @@ int_ret_from_sys_call_fixup:
 	FIXUP_TOP_OF_STACK %r11
 	jmp int_ret_from_sys_call
 
-	/* Do syscall tracing */
+	/* Do syscall entry tracing */
 tracesys:
 	movq %rsp, %rdi
 	movl $AUDIT_ARCH_X86_64, %esi
@@ -346,7 +342,7 @@ tracesys_phase2:
 
 /*
  * Syscall return path ending with IRET.
- * Has correct top of stack, but partial stack frame.
+ * Has correct iret frame.
  */
 GLOBAL(int_ret_from_sys_call)
 	DISABLE_INTERRUPTS(CLBR_NONE)
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 781861c..02a8720 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -177,7 +177,7 @@ void perf_get_regs_user(struct perf_regs *regs_user,
 		 * than just blindly copying user_regs.
 		 */
 		regs_user->abi = PERF_SAMPLE_REGS_ABI_64;
-		regs_user_copy->sp = this_cpu_read(old_rsp);
+		regs_user_copy->sp = user_regs->sp;
 		regs_user_copy->cs = __USER_CS;
 		regs_user_copy->ss = __USER_DS;
 		regs_user_copy->cx = -1;  /* usually contains garbage */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 975d342..ab79139 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -161,7 +161,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 	p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
 	childregs = task_pt_regs(p);
 	p->thread.sp = (unsigned long) childregs;
-	p->thread.usersp = me->thread.usersp;
 	set_tsk_thread_flag(p, TIF_FORK);
 	p->thread.io_bitmap_ptr = NULL;
 
@@ -235,10 +234,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
 	loadsegment(es, _ds);
 	loadsegment(ds, _ds);
 	load_gs_index(0);
-	current->thread.usersp	= new_sp;
 	regs->ip		= new_ip;
 	regs->sp		= new_sp;
-	this_cpu_write(old_rsp, new_sp);
 	regs->cs		= _cs;
 	regs->ss		= _ss;
 	regs->flags		= X86_EFLAGS_IF;
@@ -401,8 +398,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
-	prev->usersp = this_cpu_read(old_rsp);
-	this_cpu_write(old_rsp, next->usersp);
 	this_cpu_write(current_task, next_p);
 
 	/*
@@ -601,6 +596,5 @@ long sys_arch_prctl(int code, unsigned long addr)
 
 unsigned long KSTK_ESP(struct task_struct *task)
 {
-	return (test_tsk_thread_flag(task, TIF_IA32)) ?
-			(task_pt_regs(task)->sp) : ((task)->thread.usersp);
+	return task_pt_regs(task)->sp;
 }
-- 
1.8.1.4


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-24 18:51 ` [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
@ 2015-02-24 19:30   ` Steven Rostedt
  2015-02-24 20:02     ` Denys Vlasenko
  2015-02-24 22:37   ` Andy Lutomirski
  2015-02-25  8:53   ` Ingo Molnar
  2 siblings, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2015-02-24 19:30 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, Linus Torvalds, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel

On Tue, 24 Feb 2015 19:51:33 +0100
Denys Vlasenko <dvlasenk@redhat.com> wrote:

> PER_CPU_VAR(kernel_stack) was set up in a way where it points
> five stack slots below the top of stack.
> 
> Presumably, it was done to avoid one "sub $5*8,%rsp"
> in syscall/sysenter code paths, where iret frame needs to be
> created by hand.
> 
> Ironically, none of them benefit from this optimization,
> since all of them need to allocate additional data on stack
> (struct pt_regs), so they still have to perform subtraction.
> And ia32_sysenter_target even needs to *undo* this optimization:
> it constructs iret stack with pushes instead of movs,
> so it needs to start right at the top.
> 
> This patch eliminates KERNEL_STACK_OFFSET.
> PER_CPU_VAR(kernel_stack) now points directly to top of stack.
> pt_regs allocations are adjusted to allocate iret frame as well.
> 

I always thought the KERNEL_STACK_OFFSET wasn't an optimization, but a
buffer from the real top of stack, so that any off-by-one bugs
wouldn't crash the system.

-- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 18:51 [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Denys Vlasenko
                   ` (2 preceding siblings ...)
  2015-02-24 18:51 ` [PATCH 4/4] x86: save user %rsp in pt_regs->sp Denys Vlasenko
@ 2015-02-24 19:58 ` Borislav Petkov
  2015-02-24 20:13   ` Denys Vlasenko
  2015-02-24 22:25 ` Andy Lutomirski
  4 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-02-24 19:58 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel

On Tue, Feb 24, 2015 at 07:51:32PM +0100, Denys Vlasenko wrote:
> In all three 32-bit entry points, %eax is zero-extended to %rax.
> It is safe to do 32-bit compare when checking that syscall#
> is not too large.
> 
> The last instance of "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.
> 
> The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
> is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
> instead of 5-byte one.
> 
> After TEST insn, JE actually means "jump if zero",
> let's use JZ mnemonic instead.

Actually, JE == JZ as that's the same opcode for testing ZF=1.

And I have to object:

        testl $3,CS(%rsp)
	je retint_kernel

is much more understandable than

        testl $3,CS(%rsp)
	jz retint_kernel

It basically asks: are $3 and CS(%rsp) equal?

JZ, on the other hand, is not so clear: the TEST ANDed the two operands and
set the flags accordingly, so JZ is testing ZF. This means you actually
know what TEST does and haven't forgotten.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-24 19:30   ` Steven Rostedt
@ 2015-02-24 20:02     ` Denys Vlasenko
  0 siblings, 0 replies; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-24 20:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andy Lutomirski, Linus Torvalds, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel

On 02/24/2015 08:30 PM, Steven Rostedt wrote:
> On Tue, 24 Feb 2015 19:51:33 +0100
> Denys Vlasenko <dvlasenk@redhat.com> wrote:
> 
>> PER_CPU_VAR(kernel_stack) was set up in a way where it points
>> five stack slots below the top of stack.
>>
>> Presumably, it was done to avoid one "sub $5*8,%rsp"
>> in syscall/sysenter code paths, where iret frame needs to be
>> created by hand.
>>
>> Ironically, none of them benefit from this optimization,
>> since all of them need to allocate additional data on stack
>> (struct pt_regs), so they still have to perform subtraction.
>> And ia32_sysenter_target even needs to *undo* this optimization:
>> it constructs iret stack with pushes instead of movs,
>> so it needs to start right at the top.
>>
>> This patch eliminates KERNEL_STACK_OFFSET.
>> PER_CPU_VAR(kernel_stack) now points directly to top of stack.
>> pt_regs allocations are adjusted to allocate iret frame as well.
>>
> 
> I always thought the KERNEL_STACK_OFFSET wasn't an optimization, but a
> buffer from the real top of stack, so that any off-by-one bugs
> wouldn't crash the system.

I was thinking about it, but it looks unlikely. Reasons:

(1) ia32_sysenter_target does "addq $(KERNEL_STACK_OFFSET),%rsp"
on entry, before saving registers with PUSHes; this returns %rsp
to the very top of the kernel stack.
If that were a problem (say, an NMI at this point would do bad things),
it would have been noticed by now.

(2) Even the ordinary 64-bit syscall path uses an IRET return at times.
For one, on every execve and signal return (because they need
to load a modified %rsp). With the current layout, the return frame
for IRET lies exactly there, in those 5 stack slots "reserved"
via the KERNEL_STACK_OFFSET thingy.

(3) There are no comments anywhere about KERNEL_STACK_OFFSET being
a safety measure.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 19:58 ` [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Borislav Petkov
@ 2015-02-24 20:13   ` Denys Vlasenko
  2015-02-24 20:36     ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-24 20:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Denys Vlasenko, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Ingo Molnar, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	Linux Kernel Mailing List

On Tue, Feb 24, 2015 at 8:58 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Feb 24, 2015 at 07:51:32PM +0100, Denys Vlasenko wrote:
>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>> It is safe to do 32-bit compare when checking that syscall#
>> is not too large.
>>
>> The last instance of "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.
>>
>> The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
>> is a 32-bit constant, loading it with 64-bit MOV produces 10-byte insn
>> instead of 5-byte one.
>>
>> After TEST insn, JE actually means "jump if zero",
>> let's use JZ mnemonic instead.
>
> Actually, JE == JZ as that's the same opcode for testing ZF=1.

Yes, I know that :)

> And I have to object:
>
>         testl $3,CS(%rsp)
>         je retint_kernel
>
> is much more understandable than
>
>         testl $3,CS(%rsp)
>         jz retint_kernel
>
> It basically asks: are $3 and CS(%rsp) equal?

They aren't equal.  $1 and $2 in the two lowest bits will also
be interpreted as "userspace" here. "Equal to $3" sends
the wrong message to a human reading the code:
the code doesn't test for CPL=3, it tests for any nonzero CPL.

> JZ, on the other hand, not so clear: the TEST ANDed the two operands and
> set flags accordingly, so JZ is testing the ZF. This means, you actually
> know what TEST does and you haven't forgotten.

JZ says "jump if zero", in this case, "jump if CPL is zero".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 20:13   ` Denys Vlasenko
@ 2015-02-24 20:36     ` Borislav Petkov
  2015-02-24 22:51       ` H. Peter Anvin
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-02-24 20:36 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Denys Vlasenko, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Ingo Molnar, H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	Linux Kernel Mailing List

On Tue, Feb 24, 2015 at 09:13:03PM +0100, Denys Vlasenko wrote:
> They aren't equal.  $1 and $2 in the two lowest bits will also
> be interpreted as "userspace" here. "Equal to $3" sends
> the wrong message to a human reading the code:
> the code doesn't test for CPL=3, it tests for any nonzero CPL.

Doh, of course.

I was going to propose to make it even more explicit:

	andb $3, CS(%rsp)...

but that's destructive.

So yours makes it less misleading, actually. To me at least and
obviously.

:-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 18:51 [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Denys Vlasenko
                   ` (3 preceding siblings ...)
  2015-02-24 19:58 ` [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Borislav Petkov
@ 2015-02-24 22:25 ` Andy Lutomirski
  2015-02-24 22:52   ` H. Peter Anvin
  4 siblings, 1 reply; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-24 22:25 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> In all three 32-bit entry points, %eax is zero-extended to %rax.
> It is safe to do 32-bit compare when checking that syscall#
> is not too large.

Applied.  Thanks!

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-24 18:51 ` [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
  2015-02-24 19:30   ` Steven Rostedt
@ 2015-02-24 22:37   ` Andy Lutomirski
  2015-02-25  9:45     ` Ingo Molnar
  2015-02-25  8:53   ` Ingo Molnar
  2 siblings, 1 reply; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-24 22:37 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> PER_CPU_VAR(kernel_stack) was set up in a way where it points
> five stack slots below the top of stack.
>
> Presumably, it was done to avoid one "sub $5*8,%rsp"
> in syscall/sysenter code paths, where iret frame needs to be
> created by hand.
>
> Ironically, none of them benefit from this optimization,
> since all of them need to allocate additional data on stack
> (struct pt_regs), so they still have to perform subtraction.
> And ia32_sysenter_target even needs to *undo* this optimization:
> it constructs iret stack with pushes instead of movs,
> so it needs to start right at the top.
>
> This patch eliminates KERNEL_STACK_OFFSET.
> PER_CPU_VAR(kernel_stack) now points directly to top of stack.
> pt_regs allocations are adjusted to allocate iret frame as well.
>
> Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
> are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS) -
> "rsp is SIZEOF_PTREGS bytes below the top, calculate
> thread_info's address using that information"
>
> Net result in code is that one instruction is eliminated,
> and constants in several others have changed.

This has a side effect: kernel_stack now points to the first byte of
the page above the stack instead of to the page containing the stack.
That means we need this fix:

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c74f2f5652da..5f20a0a612a6 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -174,7 +174,8 @@ void ist_begin_non_atomic(struct pt_regs *regs)
         * will catch asm bugs and any attempt to use ist_preempt_enable
         * from double_fault.
         */
-       BUG_ON(((current_stack_pointer() ^ this_cpu_read_stable(kernel_stack))
+       BUG_ON(((current_stack_pointer() ^
+                (this_cpu_read_stable(kernel_stack) - 1))
                & ~(THREAD_SIZE - 1)) != 0);

        preempt_count_sub(HARDIRQ_OFFSET);

I added that in and applied this patch.

Borislav and/or Tony, when all the dust settles, could you make sure
to re-run your memory failure test on the result?

Thanks,
Andy

>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  arch/x86/ia32/ia32entry.S          | 35 +++++++++++++++++------------------
>  arch/x86/include/asm/thread_info.h |  8 +++-----
>  arch/x86/kernel/cpu/common.c       |  2 +-
>  arch/x86/kernel/entry_64.S         |  8 ++++----
>  arch/x86/kernel/process_32.c       |  3 +--
>  arch/x86/kernel/process_64.c       |  3 +--
>  arch/x86/kernel/smpboot.c          |  3 +--
>  arch/x86/xen/smp.c                 |  3 +--
>  8 files changed, 29 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 01eca80..b349ae8 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -114,7 +114,6 @@ ENTRY(ia32_sysenter_target)
>         CFI_REGISTER    rsp,rbp
>         SWAPGS_UNSAFE_STACK
>         movq    PER_CPU_VAR(kernel_stack), %rsp
> -       addq    $(KERNEL_STACK_OFFSET),%rsp
>         /*
>          * No need to follow this irqs on/off section: the syscall
>          * disabled irqs, here we enable it straight after entry:
> @@ -128,7 +127,7 @@ ENTRY(ia32_sysenter_target)
>         CFI_REL_OFFSET rsp,0
>         pushfq_cfi
>         /*CFI_REL_OFFSET rflags,0*/
> -       movl    TI_sysenter_return+THREAD_INFO(%rsp,3*8-KERNEL_STACK_OFFSET),%r10d
> +       movl    TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
>         CFI_REGISTER rip,r10
>         pushq_cfi $__USER32_CS
>         /*CFI_REL_OFFSET cs,0*/
> @@ -160,8 +159,8 @@ ENTRY(ia32_sysenter_target)
>         jnz sysenter_fix_flags
>  sysenter_flags_fixed:
>
> -       orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
> -       testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
> +       orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
> +       testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         CFI_REMEMBER_STATE
>         jnz     sysenter_tracesys
>         cmpl    $(IA32_NR_syscalls-1),%eax
> @@ -178,10 +177,10 @@ sysenter_dispatch:
>         movq    %rax,RAX(%rsp)
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
> -       testl   $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl   $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jnz     sysexit_audit
>  sysexit_from_sys_call:
> -       andl    $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
> +       andl    $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         /* clear IF, that popfq doesn't enable interrupts early */
>         andl    $~0x200,EFLAGS(%rsp)
>         movl    RIP(%rsp),%edx          /* User %eip */
> @@ -226,7 +225,7 @@ sysexit_from_sys_call:
>         .endm
>
>         .macro auditsys_exit exit
> -       testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jnz ia32_ret_from_sys_call
>         TRACE_IRQS_ON
>         ENABLE_INTERRUPTS(CLBR_NONE)
> @@ -241,7 +240,7 @@ sysexit_from_sys_call:
>         movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),%edi
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
> -       testl %edi,TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl %edi,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jz \exit
>         CLEAR_RREGS
>         jmp int_with_check
> @@ -263,7 +262,7 @@ sysenter_fix_flags:
>
>  sysenter_tracesys:
>  #ifdef CONFIG_AUDITSYSCALL
> -       testl   $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl   $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jz      sysenter_auditsys
>  #endif
>         SAVE_EXTRA_REGS
> @@ -312,7 +311,7 @@ ENDPROC(ia32_sysenter_target)
>  ENTRY(ia32_cstar_target)
>         CFI_STARTPROC32 simple
>         CFI_SIGNAL_FRAME
> -       CFI_DEF_CFA     rsp,KERNEL_STACK_OFFSET
> +       CFI_DEF_CFA     rsp,0
>         CFI_REGISTER    rip,rcx
>         /*CFI_REGISTER  rflags,r11*/
>         SWAPGS_UNSAFE_STACK
> @@ -324,7 +323,7 @@ ENTRY(ia32_cstar_target)
>          * disabled irqs and here we enable it straight after entry:
>          */
>         ENABLE_INTERRUPTS(CLBR_NONE)
> -       ALLOC_PT_GPREGS_ON_STACK 8      /* +8: space for orig_ax */
> +       ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
>         SAVE_C_REGS_EXCEPT_RCX_R891011
>         movl    %eax,%eax       /* zero extension */
>         movq    %rax,ORIG_RAX(%rsp)
> @@ -347,8 +346,8 @@ ENTRY(ia32_cstar_target)
>  1:     movl    (%r8),%r9d
>         _ASM_EXTABLE(1b,ia32_badarg)
>         ASM_CLAC
> -       orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
> -       testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
> +       orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
> +       testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         CFI_REMEMBER_STATE
>         jnz     cstar_tracesys
>         cmpl    $(IA32_NR_syscalls-1),%eax
> @@ -365,10 +364,10 @@ cstar_dispatch:
>         movq %rax,RAX(%rsp)
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
> -       testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jnz sysretl_audit
>  sysretl_from_sys_call:
> -       andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
> +       andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         RESTORE_RSI_RDI_RDX
>         movl RIP(%rsp),%ecx
>         CFI_REGISTER rip,rcx
> @@ -403,7 +402,7 @@ sysretl_audit:
>
>  cstar_tracesys:
>  #ifdef CONFIG_AUDITSYSCALL
> -       testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jz cstar_auditsys
>  #endif
>         xchgl %r9d,%ebp
> @@ -470,8 +469,8 @@ ENTRY(ia32_syscall)
>            this could be a problem. */
>         ALLOC_PT_GPREGS_ON_STACK
>         SAVE_C_REGS_EXCEPT_R891011
> -       orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
> -       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
> +       orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
> +       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jnz ia32_tracesys
>         cmpl $(IA32_NR_syscalls-1),%eax
>         ja ia32_badsys
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index e82e95a..3eb9d05 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -149,7 +149,6 @@ struct thread_info {
>  #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
>
>  #define STACK_WARN             (THREAD_SIZE/8)
> -#define KERNEL_STACK_OFFSET    (5*(BITS_PER_LONG/8))
>
>  /*
>   * macros/functions for gaining access to the thread information structure
> @@ -163,8 +162,7 @@ DECLARE_PER_CPU(unsigned long, kernel_stack);
>  static inline struct thread_info *current_thread_info(void)
>  {
>         struct thread_info *ti;
> -       ti = (void *)(this_cpu_read_stable(kernel_stack) +
> -                     KERNEL_STACK_OFFSET - THREAD_SIZE);
> +       ti = (void *)(this_cpu_read_stable(kernel_stack) - THREAD_SIZE);
>         return ti;
>  }
>
> @@ -184,13 +182,13 @@ static inline unsigned long current_stack_pointer(void)
>  /* how to get the thread information struct from ASM */
>  #define GET_THREAD_INFO(reg) \
>         _ASM_MOV PER_CPU_VAR(kernel_stack),reg ; \
> -       _ASM_SUB $(THREAD_SIZE-KERNEL_STACK_OFFSET),reg ;
> +       _ASM_SUB $(THREAD_SIZE),reg ;
>
>  /*
>   * Same if PER_CPU_VAR(kernel_stack) is, perhaps with some offset, already in
>   * a certain register (to be used in assembler memory operands).
>   */
> -#define THREAD_INFO(reg, off) KERNEL_STACK_OFFSET+(off)-THREAD_SIZE(reg)
> +#define THREAD_INFO(reg, off) (off)-THREAD_SIZE(reg)
>
>  #endif
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index cb56925..a47ca751 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1116,7 +1116,7 @@ static __init int setup_disablecpuid(char *arg)
>  __setup("clearcpuid=", setup_disablecpuid);
>
>  DEFINE_PER_CPU(unsigned long, kernel_stack) =
> -       (unsigned long)&init_thread_union - KERNEL_STACK_OFFSET + THREAD_SIZE;
> +       (unsigned long)&init_thread_union + THREAD_SIZE;
>  EXPORT_PER_CPU_SYMBOL(kernel_stack);
>
>  #ifdef CONFIG_X86_64
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index b93f3a2..91af6be 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -237,7 +237,7 @@ ENDPROC(native_usergs_sysret64)
>  ENTRY(system_call)
>         CFI_STARTPROC   simple
>         CFI_SIGNAL_FRAME
> -       CFI_DEF_CFA     rsp,KERNEL_STACK_OFFSET
> +       CFI_DEF_CFA     rsp,0
>         CFI_REGISTER    rip,rcx
>         /*CFI_REGISTER  rflags,r11*/
>         SWAPGS_UNSAFE_STACK
> @@ -256,13 +256,13 @@ GLOBAL(system_call_after_swapgs)
>          * and short:
>          */
>         ENABLE_INTERRUPTS(CLBR_NONE)
> -       ALLOC_PT_GPREGS_ON_STACK 8              /* +8: space for orig_ax */
> +       ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
>         SAVE_C_REGS_EXCEPT_RAX_RCX
>         movq    $-ENOSYS,RAX(%rsp)
>         movq_cfi rax,ORIG_RAX
>         movq    %rcx,RIP(%rsp)
>         CFI_REL_OFFSET rip,RIP
> -       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
>         jnz tracesys
>  system_call_fastpath:
>  #if __SYSCALL_MASK == ~0
> @@ -280,7 +280,7 @@ system_call_fastpath:
>   * Has incomplete stack frame and undefined top of stack.
>   */
>  ret_from_sys_call:
> -       testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
> +       testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
> 	jnz int_ret_from_sys_call_fixup /* Go to the slow path */
>
>         LOCKDEP_SYS_EXIT
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 8f3ebfe..ecda2bd 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -311,8 +311,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>         arch_end_context_switch(next_p);
>
>         this_cpu_write(kernel_stack,
> -                 (unsigned long)task_stack_page(next_p) +
> -                 THREAD_SIZE - KERNEL_STACK_OFFSET);
> +               (unsigned long)task_stack_page(next_p) + THREAD_SIZE);
>
>         /*
>          * Restore %gs if needed (which is common)
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 5a2c029..975d342 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -414,8 +414,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>         this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
>
>         this_cpu_write(kernel_stack,
> -                 (unsigned long)task_stack_page(next_p) +
> -                 THREAD_SIZE - KERNEL_STACK_OFFSET);
> +               (unsigned long)task_stack_page(next_p) + THREAD_SIZE);
>
>         /*
>          * Now maybe reload the debug registers and handle I/O bitmaps
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index febc6aa..73dfb1f 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -811,8 +811,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
>         initial_gs = per_cpu_offset(cpu);
>  #endif
>         per_cpu(kernel_stack, cpu) =
> -               (unsigned long)task_stack_page(idle) -
> -               KERNEL_STACK_OFFSET + THREAD_SIZE;
> +               (unsigned long)task_stack_page(idle) + THREAD_SIZE;
>         early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
>         initial_code = (unsigned long)start_secondary;
>         stack_start  = idle->thread.sp;
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 4c071ae..4e71a3b 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -452,8 +452,7 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
>         clear_tsk_thread_flag(idle, TIF_FORK);
>  #endif
>         per_cpu(kernel_stack, cpu) =
> -               (unsigned long)task_stack_page(idle) -
> -               KERNEL_STACK_OFFSET + THREAD_SIZE;
> +               (unsigned long)task_stack_page(idle) + THREAD_SIZE;
>
>         xen_setup_runstate_info(cpu);
>         xen_setup_timer(cpu);
> --
> 1.8.1.4
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] x86: save r11 into pt_regs->eflags on SYSCALL64 fastpath
  2015-02-24 18:51 ` [PATCH 3/4] x86: save r11 into pt_regs->eflags on SYSCALL64 fastpath Denys Vlasenko
@ 2015-02-24 22:44   ` Andy Lutomirski
  0 siblings, 0 replies; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-24 22:44 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> Before this patch, rcx and r11 were saved in pt_regs->rcx
> and pt_regs->r11. Which looks natural, but requires messy
> shuffling to/from iret stack whenever ptrace or e.g. iopl
> wants to modify return address or flags - because that's
> how these registers are used by SYSCALL/SYSRET.
>
> This patch saves rcx and r11 in pt_regs->rip and pt_regs->flags,
> and uses these values for SYSRET64 insn. Shuffling is eliminated.
>
> stub_iopl is no longer needed: pt_regs->flags needs no fixing up.
>
> Testing shows that syscall fast path is ~54.3 ns before
> and after the patch (on 2.7 GHz Sandy Bridge CPU).
>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  arch/x86/include/asm/calling.h   | 20 ++++++++++++++------
>  arch/x86/kernel/entry_64.S       | 33 +++++++++------------------------
>  arch/x86/syscalls/syscall_64.tbl |  2 +-
>  arch/x86/um/sys_call_table_64.c  |  2 +-
>  4 files changed, 25 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
> index f1a962f..4b5f7bf 100644
> --- a/arch/x86/include/asm/calling.h
> +++ b/arch/x86/include/asm/calling.h
> @@ -95,9 +95,11 @@ For 32-bit we have the following conventions - kernel is built with
>         CFI_ADJUST_CFA_OFFSET 15*8+\addskip
>         .endm
>
> -       .macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8plus=1
> -       .if \r8plus
> +       .macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8910=1 r11=1
> +       .if \r11
>         movq_cfi r11, 6*8+\offset
> +       .endif
> +       .if \r8910
>         movq_cfi r10, 7*8+\offset
>         movq_cfi r9,  8*8+\offset
>         movq_cfi r8,  9*8+\offset
> @@ -113,16 +115,19 @@ For 32-bit we have the following conventions - kernel is built with
>         movq_cfi rdi, 14*8+\offset
>         .endm
>         .macro SAVE_C_REGS offset=0
> -       SAVE_C_REGS_HELPER \offset, 1, 1, 1
> +       SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1
>         .endm
>         .macro SAVE_C_REGS_EXCEPT_RAX_RCX offset=0
> -       SAVE_C_REGS_HELPER \offset, 0, 0, 1
> +       SAVE_C_REGS_HELPER \offset, 0, 0, 1, 1
>         .endm
>         .macro SAVE_C_REGS_EXCEPT_R891011
> -       SAVE_C_REGS_HELPER 0, 1, 1, 0
> +       SAVE_C_REGS_HELPER 0, 1, 1, 0, 0
>         .endm
>         .macro SAVE_C_REGS_EXCEPT_RCX_R891011
> -       SAVE_C_REGS_HELPER 0, 1, 0, 0
> +       SAVE_C_REGS_HELPER 0, 1, 0, 0, 0
> +       .endm
> +       .macro SAVE_C_REGS_EXCEPT_RAX_RCX_R11
> +       SAVE_C_REGS_HELPER 0, 0, 0, 1, 0
>         .endm
>

This is unnecessarily difficult to read.  Could you rework it to use
named macro parameters?

>         .macro SAVE_EXTRA_REGS offset=0
> @@ -179,6 +184,9 @@ For 32-bit we have the following conventions - kernel is built with
>         .macro RESTORE_C_REGS_EXCEPT_R11
>         RESTORE_C_REGS_HELPER 1,1,0,1,1
>         .endm
> +       .macro RESTORE_C_REGS_EXCEPT_RCX_R11
> +       RESTORE_C_REGS_HELPER 1,0,0,1,1
> +       .endm

Ditto.

>         .macro RESTORE_RSI_RDI
>         RESTORE_C_REGS_HELPER 0,0,0,0,0
>         .endm
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 91af6be..2fd9349 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -121,14 +121,12 @@ ENDPROC(native_usergs_sysret64)
>  #endif
>
>  /*
> - * C code is not supposed to know about undefined top of stack. Every time
> + * C code is not supposed to know that iret frame is not populated. Every time

"that the iret frame," please.  (Вы говорите по-русски? :) )

>   * a C function with an pt_regs argument is called from the SYSCALL based
>   * fast path FIXUP_TOP_OF_STACK is needed.
>   * RESTORE_TOP_OF_STACK syncs the syscall state after any possible ptregs
>   * manipulation.
>   */
> -
> -       /* %rsp:at FRAMEEND */
>         .macro FIXUP_TOP_OF_STACK tmp offset=0
>         movq PER_CPU_VAR(old_rsp),\tmp
>         movq \tmp,RSP+\offset(%rsp)
> @@ -136,15 +134,13 @@ ENDPROC(native_usergs_sysret64)
>         movq $__USER_CS,CS+\offset(%rsp)
>         movq RIP+\offset(%rsp),\tmp  /* get rip */
>         movq \tmp,RCX+\offset(%rsp)  /* copy it to rcx as sysret would do */
> -       movq R11+\offset(%rsp),\tmp  /* get eflags */
> -       movq \tmp,EFLAGS+\offset(%rsp)
> +       movq EFLAGS+\offset(%rsp),\tmp /* ditto for rflags->r11 */
> +       movq \tmp,R11+\offset(%rsp)
>         .endm

It occurs to me that both the name of this macro and comment are
wrong.  It's not fixing the *top* of the stack, since it fixes both
rcx and r11.  Oh, well, maybe we'll just delete it eventually.

The patch looks correct.  Can you submit a v2 once I finish reading
the rest of these?  (Also, can you put v2 in the subject?  My email is
going nuts here.)

--Andy

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 20:36     ` Borislav Petkov
@ 2015-02-24 22:51       ` H. Peter Anvin
  0 siblings, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2015-02-24 22:51 UTC (permalink / raw)
  To: Borislav Petkov, Denys Vlasenko
  Cc: Denys Vlasenko, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	Linux Kernel Mailing List

On 02/24/2015 12:36 PM, Borislav Petkov wrote:
> On Tue, Feb 24, 2015 at 09:13:03PM +0100, Denys Vlasenko wrote:
>> They aren't equal.  $1 and $2 in two lowest bits will also
>> be interpreted as "userspace" here. "Equal to $3" sends
>> a wrong message here to a human reading the code,
>> the code doesn't test for CPL=3, it tests for any nonzero CPL.
> 
> Doh, of course.
> 
> I was going to propose to make it even more explicit:
> 
> 	andb $3, CS(%rsp)...
> 
> but that's destructive.
> 
> So yours makes it less misleading, actually. To me at least and
> obviously.
> 

Yes, please use JZ/JNZ with TEST.

	-hpa



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 22:25 ` Andy Lutomirski
@ 2015-02-24 22:52   ` H. Peter Anvin
  2015-02-24 22:56     ` Andy Lutomirski
  2015-02-25  9:20     ` Ingo Molnar
  0 siblings, 2 replies; 38+ messages in thread
From: H. Peter Anvin @ 2015-02-24 22:52 UTC (permalink / raw)
  To: Andy Lutomirski, Denys Vlasenko
  Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
	Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov,
	Will Drewry, Kees Cook, X86 ML, linux-kernel

On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
> On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>> It is safe to do 32-bit compare when checking that syscall#
>> is not too large.
> 
> Applied.  Thanks!
> 

NAK NAK NAK NAK NAK!!!!

We have already had this turn into a security issue not just once but
TWICE, because someone decided to "optimize" the path by taking out the
zero extend.

The use of a 64-bit compare here is an intentional "belts and
suspenders" safety issue.

	-hpa


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] x86: save user %rsp in pt_regs->sp
  2015-02-24 18:51 ` [PATCH 4/4] x86: save user %rsp in pt_regs->sp Denys Vlasenko
@ 2015-02-24 22:55   ` Andy Lutomirski
  0 siblings, 0 replies; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-24 22:55 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> PER_CPU(old_rsp) usage is simplified - now it is used only
> as temp storage, and the userspace stack pointer is immediately stored
> in pt_regs->sp on syscall entry, instead of being used later,
> on syscall exit. This allows us to get rid of thread_struct::usersp.

I'm not seeing any performance loss here, which is good.  This is a
nice cleanup.

> @@ -253,11 +247,13 @@ GLOBAL(system_call_after_swapgs)
>          */
>         ENABLE_INTERRUPTS(CLBR_NONE)
>         ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
> +       movq    %rcx,RIP(%rsp)
> +       movq    %r11,EFLAGS(%rsp)

> +       movq    PER_CPU_VAR(old_rsp),%rcx
> +       movq    %rcx,RSP(%rsp)

It's a bit unfortunate that this adds two instructions instead of just
one.  I guess this gets fixed when we start using push instead of mov
here.  (And, if we use push, then we can fix up the RCX slot for free,
too.)

Acked-by me, but I won't apply it until v2 to avoid messy rebases.

--Andy

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 22:52   ` H. Peter Anvin
@ 2015-02-24 22:56     ` Andy Lutomirski
  2015-02-24 23:01       ` H. Peter Anvin
  2015-02-25  9:20     ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-24 22:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
>> On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>>> It is safe to do 32-bit compare when checking that syscall#
>>> is not too large.
>>
>> Applied.  Thanks!
>>
>
> NAK NAK NAK NAK NAK!!!!
>
> We have already had this turn into a security issue not just once but
> TWICE, because someone decided to "optimize" the path by taking out the
> zero extend.
>
> The use of a 64-bit compare here is an intentional "belts and
> suspenders" safety issue.

Fair enough.  OK if I just undo that part of this patch?

--Andy

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 22:56     ` Andy Lutomirski
@ 2015-02-24 23:01       ` H. Peter Anvin
  2015-02-24 23:03         ` Andy Lutomirski
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2015-02-24 23:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On 02/24/2015 02:56 PM, Andy Lutomirski wrote:
> On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
>>> On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>>>> It is safe to do 32-bit compare when checking that syscall#
>>>> is not too large.
>>>
>>> Applied.  Thanks!
>>>
>>
>> NAK NAK NAK NAK NAK!!!!
>>
>> We have already had this turn into a security issue not just once but
>> TWICE, because someone decided to "optimize" the path by taking out the
>> zero extend.
>>
>> The use of a 64-bit compare here is an intentional "belts and
>> suspenders" safety issue.
> 
> Fair enough.  OK if I just undo that part of this patch?
> 

Actually this part should have been broken up.  The word "several" in
the patch description is by itself a cause to NAK the patch.

	-hpa



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 23:01       ` H. Peter Anvin
@ 2015-02-24 23:03         ` Andy Lutomirski
  2015-02-24 23:26           ` Denys Vlasenko
  0 siblings, 1 reply; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-24 23:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
	Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Tue, Feb 24, 2015 at 3:01 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 02/24/2015 02:56 PM, Andy Lutomirski wrote:
>> On Tue, Feb 24, 2015 at 2:52 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
>>>> On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>>> In all three 32-bit entry points, %eax is zero-extended to %rax.
>>>>> It is safe to do 32-bit compare when checking that syscall#
>>>>> is not too large.
>>>>
>>>> Applied.  Thanks!
>>>>
>>>
>>> NAK NAK NAK NAK NAK!!!!
>>>
>>> We have already had this turn into a security issue not just once but
>>> TWICE, because someone decided to "optimize" the path by taking out the
>>> zero extend.
>>>
>>> The use of a 64-bit compare here is an intentional "belts and
>>> suspenders" safety issue.
>>
>> Fair enough.  OK if I just undo that part of this patch?
>>
>
> Actually this part should have been broken up.  The word "several" in
> the patch description is by itself a cause to NAK the patch.

Point taken.

Denys, can you fix this and send a v2 of the entire series with the
traps.c fix as well?

Thanks,
Andy

>
>         -hpa
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 23:03         ` Andy Lutomirski
@ 2015-02-24 23:26           ` Denys Vlasenko
  0 siblings, 0 replies; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-24 23:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Denys Vlasenko, Linus Torvalds, Steven Rostedt,
	Ingo Molnar, Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Wed, Feb 25, 2015 at 12:03 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> Actually this part should have been broken up.  The word "several" in
>> the patch description is by itself a cause to NAK the patch.
>
> Point taken.
>
> Denys, can you fix this and send a v2 of the entire series with the
> traps.c fix as well?

Ok, working on it.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-24 18:51 ` [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
  2015-02-24 19:30   ` Steven Rostedt
  2015-02-24 22:37   ` Andy Lutomirski
@ 2015-02-25  8:53   ` Ingo Molnar
  2015-02-25 12:48     ` Denys Vlasenko
  2 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2015-02-25  8:53 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> PER_CPU_VAR(kernel_stack) was set up in a way where it 
> points five stack slots below the top of stack.
> 
> Presumably, it was done to avoid one "sub $5*8,%rsp" in 
> syscall/sysenter code paths, where iret frame needs to be 
> created by hand.
> 
> Ironically, none of them benefit from this optimization, 
> since all of them need to allocate additional data on 
> stack (struct pt_regs), so they still have to perform 
> subtraction.

Well, the original idea of percpu::kernel_stack was that of 
an optimization of the 64-bit system_call() path: to set up 
RSP as it has to be before we call into system calls.

This optimization has bitrotted away: because these days 
the first SAVE_ARGS in the 64-bit entry path modifies RSP 
as well, undoing the optimization.

But the fix should be to not touch RSP in SAVE_ARGS, to 
keep percpu::kernel_stack as an optimized entry point - 
with KERNEL_STACK_OFFSET pointing to it.

So NAK - this should be fixed for real.

> And ia32_sysenter_target even needs to *undo* this 
> optimization: it constructs iret stack with pushes 
> instead of movs, so it needs to start right at the top.

Let's keep in mind that in any case the micro-costs of 
the 32-bit entry path are almost always irrelevant: we 
optimize the 64-bit entry path, if that helps the 32-bit 
side as well then that's a happy coincidence, nothing more. 

If the 32-bit entry path can be optimized without affecting 
the 64-bit path then that's good, but we don't ever hurt 
the 64-bit path to make things easier or simpler for the 
32-bit path.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-24 22:52   ` H. Peter Anvin
  2015-02-24 22:56     ` Andy Lutomirski
@ 2015-02-25  9:20     ` Ingo Molnar
  2015-02-25  9:27       ` H. Peter Anvin
  2015-02-25 14:43       ` Steven Rostedt
  1 sibling, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2015-02-25  9:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Denys Vlasenko, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
> > On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> >>
> >> In all three 32-bit entry points, %eax is 
> >> zero-extended to %rax. It is safe to do 32-bit compare 
> >> when checking that syscall# is not too large.
> > 
> > Applied.  Thanks!
> > 
> 
> NAK NAK NAK NAK NAK!!!!
> 
> We have already had this turn into a security issue not 
> just once but TWICE, because someone decided to 
> "optimize" the path by taking out the zero extend.
> 
> The use of a 64-bit compare here is an intentional "belts 
> and suspenders" safety issue.

I think the fundamental fragility is that we allow the high 
32 bits to be nonzero.

So could we just zap the high 32 bits of RAX early in the 
entry code, and then from that point on we could both use 
32-bit ops and won't have to remember the possibility 
either?

That's arguably one more (cheap) instruction in the 32-bit 
entry paths but then we could use the shorter 32-bit 
instructions for compares and other uses and could always 
be certain that we get what we want.


But, if we do that, we can do even better, and also do an 
optimization of the 64-bit entry path as well: we could 
simply mask RAX with 0x3ff and not do a compare. Pad the 
syscall table up to 0x400 (1024) entries and fill in the 
table with sys_ni syscall entries.

This is valid on 64-bit and 32-bit kernels as well, and it 
allows the removal of a compare from the syscall entry 
path, at the cost of a couple of kilobytes of unused 
syscall table.

The downside would be that if we ever grow past 1024 
syscall entries we'll be in trouble if new userspace calls 
syscall 513 on an old kernel and gets syscall 1.

I doubt we'll ever get so many syscalls, and user-space 
will be able to be smart in any case, so it's not a 
showstopper.

This is the safest and quickest implementation as well.

Thoughts?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-25  9:20     ` Ingo Molnar
@ 2015-02-25  9:27       ` H. Peter Anvin
  2015-02-25  9:39         ` Ingo Molnar
  2015-02-25 14:43       ` Steven Rostedt
  1 sibling, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2015-02-25  9:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Denys Vlasenko, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On 02/25/2015 01:20 AM, Ingo Molnar wrote:
>
> I think the fundamental fragility is that we allow the high
> 32 bits to be nonzero.
>
> So could we just zap the high 32 bits of RAX early in the
> entry code, and then from that point on we could both use
> 32-bit ops and won't have to remember the possibility
> either?
>

We do that, but people keep "optimizing" the zero extend away.  We have 
had this cause a wide-open security hole twice already.  So the extra 
REX prefix is a cheap cost to avoid this happening again.

	-hpa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-25  9:27       ` H. Peter Anvin
@ 2015-02-25  9:39         ` Ingo Molnar
  0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2015-02-25  9:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Denys Vlasenko, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel


* H. Peter Anvin <hpa@zytor.com> wrote:

> > So could we just zap the high 32 bits of RAX early in 
> > the entry code, and then from that point on we could 
> > both use 32-bit ops and won't have to remember the 
> > possibility either?
> 
> We do that, [...]

Ok, indeed, so in ia32_sysenter_target() we have:

        movl    %eax, %eax

> [...] but people keep "optimizing" the zero extend away. 
> [...]

Possibly because there's not a single comment near that 
code explaining the importance of that line. But nobody 
will get a change past me with such a warning next to the 
line.

> [...]  We have had this cause a wide-open security hole 
> twice already.  So the extra REX prefix is a cheap cost 
> to avoid this happening again.

But since we already zap the high bits, there's no point in 
doing 64-bit compares...

Just make sure the zeroing of the high bits is there and 
is never removed.

So in that sense the changes are correct, even in the 
security robustness sense.

Furthermore, with the masking suggestion I made in the 
previous mail it's moot as we can solve both problems: 
64-bit uses of RAX will become correct, and it
will be a bit faster as well.

Hm?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-24 22:37   ` Andy Lutomirski
@ 2015-02-25  9:45     ` Ingo Molnar
  2015-02-25 16:14       ` Andy Lutomirski
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2015-02-25  9:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel


* Andy Lutomirski <luto@amacapital.net> wrote:

> -       BUG_ON(((current_stack_pointer() ^ this_cpu_read_stable(kernel_stack))
> +       BUG_ON(((current_stack_pointer() ^
> +                (this_cpu_read_stable(kernel_stack) - 1))
>                 & ~(THREAD_SIZE - 1)) != 0);
> 
>         preempt_count_sub(HARDIRQ_OFFSET);
> 
> I added that in and applied this patch.

So this is not just slightly buggy, it's fundamentally 
wrong as well, as it removes the possibility of an RSP value 
optimization from the 64-bit path - see my previous mail.

The right solution would be to make SAVE_ARGS recognize 
when KERNEL_STACK_OFFSET == args-offset and omit the RSP 
fixup in that case, or to simply use a __SAVE_ARGS for the 
64-bit path, knowing that RSP has the right value already.

Please also add comments that explain the relationship 
between percpu::kernel_stack, KERNEL_STACK_OFFSET and the 
64-bit system call entry code.

Also, guys, please slow down a bit and be more careful. 
Andy, could you please send me all currently pending entry 
bits in your tree, because all the in-flight 
changes make it hard for me to review patches? Please 
(re-)send all patches as well as part of the submission.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-25  8:53   ` Ingo Molnar
@ 2015-02-25 12:48     ` Denys Vlasenko
  2015-02-25 13:25       ` Denys Vlasenko
  0 siblings, 1 reply; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-25 12:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, Linux Kernel Mailing List

On Wed, Feb 25, 2015 at 9:53 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>
>> PER_CPU_VAR(kernel_stack) was set up in a way where it
>> points five stack slots below the top of stack.
>>
>> Presumably, it was done to avoid one "sub $5*8,%rsp" in
>> syscall/sysenter code paths, where iret frame needs to be
>> created by hand.
>>
>> Ironically, none of them benefit from this optimization,
>> since all of them need to allocate additional data on
>> stack (struct pt_regs), so they still have to perform
>> subtraction.
>
> Well, the original idea of percpu::kernel_stack was that of
> an optimization of the 64-bit system_call() path: to set up
> RSP as it has to be before we call into system calls.
>
> This optimization has bitrotted away: because these days
> the first SAVE_ARGS in the 64-bit entry path modifies RSP
> as well, undoing the optimization.

Yes, I figured this is how it was supposed to work.

> But the fix should be to not touch RSP in SAVE_ARGS, to
> keep percpu::kernel_stack as an optimized entry point -
> with KERNEL_STACK_OFFSET pointing to it.
>
> So NAK - this should be fixed for real.

IOW, the proposal is to set KERNEL_STACK_OFFSET
to SIZEOF_PTREGS. I can do that.

However.

There is an orthogonal idea we were discussing: to save
registers and construct iret frame using PUSH insns, not MOVs.
IIRC Andy and Linus liked it. I am ambivalent: the code will be smaller,
but might get slower (at least on some CPUs).
If we go that way, we will require KERNEL_STACK_OFFSET = 0
(IOW: the current patch).

The decision on how exactly we should fix KERNEL_STACK_OFFSET
(set it to SIZEOF_PTREGS or to zero) depends on whether
we switch to using PUSHes, or not. What do you think?

-- 
vda

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-25 12:48     ` Denys Vlasenko
@ 2015-02-25 13:25       ` Denys Vlasenko
  2015-02-25 13:53         ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-25 13:25 UTC (permalink / raw)
  To: Denys Vlasenko, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	Linux Kernel Mailing List

On 02/25/2015 01:48 PM, Denys Vlasenko wrote:
> On Wed, Feb 25, 2015 at 9:53 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> But the fix should be to not touch RSP in SAVE_ARGS, to
>> keep percpu::kernel_stack as an optimized entry point -
>> with KERNEL_STACK_OFFSET pointing to it.
>>
>> So NAK - this should be fixed for real.
> 
> IOW, the proposal is to set KERNEL_STACK_OFFSET
> to SIZEOF_PTREGS. I can do that.
> 
> However.
> 
> There is an orthogonal idea we were discussing: to save
> registers and construct iret frame using PUSH insns, not MOVs.
> IIRC Andy and Linus liked it. I am ambivalent: the code will be smaller,
> but might get slower (at least on some CPUs).
> If we go that way, we will require KERNEL_STACK_OFFSET = 0
> (IOW: the current patch).
> 
> The decision on how exactly we should fix KERNEL_STACK_OFFSET
> (set it to SIZEOF_PTREGS or to zero) depends on whether
> we switch to using PUSHes, or not. What do you think?

A data point. I implemented push-based creation of pt_regs
and benchmarked it. The patch is on top of all my latest
patches sent to ML.

On a Sandy Bridge CPU, it does not get slower: it seems to be
1 cycle faster per syscall.

We lose a number of large insns there:

    text           data     bss     dec     hex filename
-   9863              0       0    9863    2687 entry_64.o
+   9671              0       0    9671    25c7 entry_64.o


diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index f505cb6..d97bd92 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -128,8 +128,6 @@ ENDPROC(native_usergs_sysret64)
  * manipulation.
  */
 	.macro FIXUP_TOP_OF_STACK tmp offset=0
-	movq $__USER_DS,SS+\offset(%rsp)
-	movq $__USER_CS,CS+\offset(%rsp)
 	movq RIP+\offset(%rsp),\tmp  /* get rip */
 	movq \tmp,RCX+\offset(%rsp)  /* copy it to rcx as sysret would do */
 	movq EFLAGS+\offset(%rsp),\tmp /* ditto for rflags->r11 */
@@ -245,14 +243,22 @@ GLOBAL(system_call_after_swapgs)
 	 * and short:
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
-	movq	%rcx,RIP(%rsp)
-	movq	%r11,EFLAGS(%rsp)
-	movq    PER_CPU_VAR(old_rsp),%rcx
-	movq    %rcx,RSP(%rsp)
-	movq_cfi rax,ORIG_RAX
-	SAVE_C_REGS_EXCEPT_RAX_RCX_R11
-	movq	$-ENOSYS,RAX(%rsp)
+	/* Construct struct pt_regs on stack */
+	pushq	$__USER_DS		/* pt_regs->ss */
+	pushq	PER_CPU_VAR(old_rsp)	/* pt_regs->sp */
+	pushq	%r11			/* pt_regs->flags */
+	pushq	$__USER_CS		/* pt_regs->cs */
+	pushq	%rcx			/* pt_regs->ip */
+	pushq	%rax			/* pt_regs->orig_ax */
+	pushq	%rdi			/* pt_regs->di */
+	pushq	%rsi			/* pt_regs->si */
+	pushq	%rdx			/* pt_regs->dx */
+	pushq	%rcx			/* pt_regs->cx */
+	pushq	$-ENOSYS		/* pt_regs->ax */
+	pushq	%r8			/* pt_regs->r8 */
+	pushq	%r9			/* pt_regs->r9 */
+	pushq	%r10			/* pt_regs->r10 */
+	sub	$(7*8),%rsp /* pt_regs->r11,bp,bx,r12-15 */
 	CFI_REL_OFFSET rip,RIP
 	testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
 	jnz tracesys



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-25 13:25       ` Denys Vlasenko
@ 2015-02-25 13:53         ` Ingo Molnar
  0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2015-02-25 13:53 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Denys Vlasenko, Andy Lutomirski, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, Linux Kernel Mailing List


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> > The decision on how exactly we should fix 
> > KERNEL_STACK_OFFSET (set it to SIZEOF_PTREGS or to 
> > zero) depends on whether we switch to using PUSHes, or 
> > not. What do you think?

Yes.

> A data point. I implemented push-based creation of 
> pt_regs and benchmarked it. The patch is on top of all my 
> latest patches sent to ML.
> 
> On a Sandy Bridge CPU, it does not get slower: it seems to 
> be 1 cycle faster per syscall.
> 
> We lose a number of large insns there:
> 
>     text           data     bss     dec     hex filename
> -   9863              0       0    9863    2687 entry_64.o
> +   9671              0       0    9671    25c7 entry_64.o

That's a nice reduction in I$ footprint ...

> +	/* Construct struct pt_regs on stack */
> +	pushq	$__USER_DS		/* pt_regs->ss */
> +	pushq	PER_CPU_VAR(old_rsp)	/* pt_regs->sp */
> +	pushq	%r11			/* pt_regs->flags */

Btw., this could also construct all the dwarf annotations 
in a natural, maintainable fashion - pushq_cfi and friends?

> +	pushq	$__USER_CS		/* pt_regs->cs */
> +	pushq	%rcx			/* pt_regs->ip */
> +	pushq	%rax			/* pt_regs->orig_ax */
> +	pushq	%rdi			/* pt_regs->di */
> +	pushq	%rsi			/* pt_regs->si */
> +	pushq	%rdx			/* pt_regs->dx */
> +	pushq	%rcx			/* pt_regs->cx */
> +	pushq	$-ENOSYS		/* pt_regs->ax */
> +	pushq	%r8			/* pt_regs->r8 */
> +	pushq	%r9			/* pt_regs->r9 */
> +	pushq	%r10			/* pt_regs->r10 */
> +	sub	$(7*8),%rsp /* pt_regs->r11,bp,bx,r12-15 */

So the 'SUB' there is a bit sad, but push sequences are 
generally easier to read, so I like it altogether.

Then we could indeed get rid of KERNEL_STACK_OFFSET.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-25  9:20     ` Ingo Molnar
  2015-02-25  9:27       ` H. Peter Anvin
@ 2015-02-25 14:43       ` Steven Rostedt
  2015-02-25 15:40         ` Denys Vlasenko
  1 sibling, 1 reply; 38+ messages in thread
From: Steven Rostedt @ 2015-02-25 14:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Andy Lutomirski, Denys Vlasenko, Linus Torvalds,
	Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel

On Wed, 25 Feb 2015 10:20:43 +0100
Ingo Molnar <mingo@kernel.org> wrote:


> But, if we do that, we can do even better, and also do an 
> optimization of the 64-bit entry path as well: we could 
> simply mask RAX with 0x3ff and not do a compare. Pad the 
> syscall table up to 0x400 (1024) entries and fill in the 
> table with sys_ni syscall entries.
> 
> This is valid on 64-bit and 32-bit kernels as well, and it 
> allows the removal of a compare from the syscall entry 
> path, at the cost of a couple of kilobytes of unused 
> syscall table.
> 
> The downside would be that if we ever grow past 1024 
> syscall entries we'll be in trouble if new userspace calls 
> syscall 513 on an old kernel and gets syscall 1.

What if we test against ~0x3ff and jump to sys_ni if anything is set?
This would still work with new userspace calls on older kernels.

-- Steve


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-25 14:43       ` Steven Rostedt
@ 2015-02-25 15:40         ` Denys Vlasenko
  2015-02-25 16:01           ` Steven Rostedt
  0 siblings, 1 reply; 38+ messages in thread
From: Denys Vlasenko @ 2015-02-25 15:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Denys Vlasenko,
	Linus Torvalds, Borislav Petkov, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Wed, Feb 25, 2015 at 3:43 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 25 Feb 2015 10:20:43 +0100 Ingo Molnar <mingo@kernel.org> wrote:
>> But, if we do that, we can do even better, and also do an
>> optimization of the 64-bit entry path as well: we could
>> simply mask RAX with 0x3ff and not do a compare. Pad the
>> syscall table up to 0x400 (1024) entries and fill in the
>> table with sys_ni syscall entries.
>>
>> This is valid on 64-bit and 32-bit kernels as well, and it
>> allows the removal of a compare from the syscall entry
>> path, at the cost of a couple of kilobytes of unused
>> syscall table.
>>
>> The downside would be that if we ever grow past 1024
>> syscall entries we'll be in trouble if new userspace calls
>> syscall 513 on an old kernel and gets syscall 1.
>
> What if we test against ~0x3ff and jump to sys_ni if anything is set.
> This would still work with new userspace calls on older kernels.

That would require a branch insn. The whole idea of masking
was merely to avoid that branch. If you need a branch anyway,
you might as well just retain the current code.
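
[Editor's note: a minimal C sketch of the three dispatch strategies under
discussion — compare-and-branch (current code), mask-only (Ingo's padded-table
proposal), and test-high-bits (Steve's variant). Table size, entry count, and
function names are illustrative, not the kernel's actual entry code.]

```c
#include <stdio.h>

#define NR_syscalls 3          /* illustrative, not the real count */
#define TABLE_SIZE  0x400      /* table padded to 1024 entries for masking */

static long sys_ni(void)  { return -38; }  /* -ENOSYS */
static long sys_one(void) { return 1; }

typedef long (*sys_fn)(void);

/* Padded table: real entries first, sys_ni everywhere else. */
static sys_fn table[TABLE_SIZE] = { sys_one, sys_one, sys_one };

static void init_table(void)
{
	for (int i = NR_syscalls; i < TABLE_SIZE; i++)
		table[i] = sys_ni;
}

/* Current scheme: compare the number against the limit and branch. */
static long dispatch_cmp(unsigned long nr)
{
	if (nr >= NR_syscalls)
		return sys_ni();
	return table[nr]();
}

/* Mask-only scheme: no branch, but nr 0x401 silently aliases to nr 1. */
static long dispatch_mask(unsigned long nr)
{
	return table[nr & 0x3ff]();
}

/* Test-high-bits scheme: branch to sys_ni if any bit above 0x3ff is set. */
static long dispatch_test(unsigned long nr)
{
	if (nr & ~0x3ffUL)
		return sys_ni();
	return table[nr & 0x3ff]();
}
```

The aliasing in dispatch_mask (0x401 behaving like 0x1) is exactly the
robustness concern raised later in the thread; dispatch_test avoids it but
reintroduces the branch the masking was meant to remove.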


* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-25 15:40         ` Denys Vlasenko
@ 2015-02-25 16:01           ` Steven Rostedt
  2015-02-25 16:06             ` Borislav Petkov
  2015-02-26 11:47             ` Ingo Molnar
  0 siblings, 2 replies; 38+ messages in thread
From: Steven Rostedt @ 2015-02-25 16:01 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Denys Vlasenko,
	Linus Torvalds, Borislav Petkov, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Wed, 25 Feb 2015 16:40:49 +0100
Denys Vlasenko <vda.linux@googlemail.com> wrote:

> >> The downside would be that if we ever grow past 1024
> >> syscall entries we'll be in trouble if new userspace calls
> >> syscall 513 on an old kernel and gets syscall 1.
> >
> > What if we test against ~0x3ff and jump to sys_ni if anything is set.
> > This would still work with new userspace calls on older kernels.
> 
> That would require a branch insn. The whole idea of masking
> was merely to avoid that branch. If you need a branch,
> then you can as well just retain current code.

I'm just curious, do all these micro optimizations have any real impact
on real use cases?

That is, if we are going to make the system less robust, shouldn't we
show that it has real benefit?

And yes, I consider user space passing in a system call of 0x401 and
having it work the same as 0x1 as being less robust.

-- Steve


* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-25 16:01           ` Steven Rostedt
@ 2015-02-25 16:06             ` Borislav Petkov
  2015-02-26 11:47             ` Ingo Molnar
  1 sibling, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-02-25 16:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Denys Vlasenko, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Denys Vlasenko, Linus Torvalds, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Wed, Feb 25, 2015 at 11:01:29AM -0500, Steven Rostedt wrote:
> I'm just curious, do all these micro optimizations have any real impact
> on real use cases?
> 
> That is, if we are going to make the system less robust, shouldn't we
> show that it has real benefit?

I've been wondering the same thing this whole time. Is it even
worth the bother if those "improvements" don't show up on any
reasonable benchmark...? And maybe we should benchmark the stuff
first and then apply it.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-25  9:45     ` Ingo Molnar
@ 2015-02-25 16:14       ` Andy Lutomirski
  2015-02-26 11:42         ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-25 16:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel,
	Tony Luck

On Wed, Feb 25, 2015 at 1:45 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
>> -       BUG_ON(((current_stack_pointer() ^ this_cpu_read_stable(kernel_stack))
>> +       BUG_ON(((current_stack_pointer() ^
>> +                (this_cpu_read_stable(kernel_stack) - 1))
>>                 & ~(THREAD_SIZE - 1)) != 0);
>>
>>         preempt_count_sub(HARDIRQ_OFFSET);
>>
>> I added that in and applied this patch.
>
> So this is not just slightly buggy, it's fundamentally
> wrong as well as it removes the possibility of an RSP value
> optimization from the 64-bit path, see my previous mail.

This is just trying to check that the function is executing on the
per-thread stack.  It was correct (and fairly heavily tested by Tony)
with KERNEL_STACK_OFFSET being nonzero, but we're checking the wrong
page if KERNEL_STACK_OFFSET becomes zero.

I don't think I understand your objection to this bit.

>
> The right solution would be to make SAVE_ARGS recognize
> when KERNEL_STACK_OFFSET == args-offset and omit the RSP
> fixup in that case, or to simply use a __SAVE_ARGS for the
> 64-bit path, knowing that RSP has the right value already.
>
> Please also add comments that explain the relationship
> between percpu::kernel_stack, KERNEL_STACK_OFFSET and the
> 64-bit system call entry code.
>
> Also, guys, please slow down a bit and be more careful.
> Andy, could you please send me all currently pending entry
> bits pending in your tree, because all the in-flight
> changes make it hard for me to review patches? Please
> (re-)send all patches as well as part of the submission.

Will do later today.

Note that someone just found a bug as a result of this being in -next.

--Andy


* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-25 16:14       ` Andy Lutomirski
@ 2015-02-26 11:42         ` Ingo Molnar
  2015-02-26 15:21           ` Andy Lutomirski
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2015-02-26 11:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel,
	Tony Luck


* Andy Lutomirski <luto@amacapital.net> wrote:

> >> I added that in and applied this patch.
> >
> > So this is not just slightly buggy, it's fundamentally 
> > wrong as well as it removes the possibility of an RSP 
> > value optimization from the 64-bit path, see my 
> > previous mail.
> 
> This is just trying to check that the function is 
> executing on the per-thread stack.  It was correct (and 
> fairly heavily tested by Tony) with KERNEL_STACK_OFFSET 
> being nonzero, but we're checking the wrong page if 
> KERNEL_STACK_OFFSET becomes zero.
> 
> I don't think I understand your objection to this bit.

I object to the KERNEL_STACK_OFFSET removal patch you fixed 
here, not to the add-on fix in particular (which is correct 
in the context of that patch).

Thanks,

	Ingo


* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-25 16:01           ` Steven Rostedt
  2015-02-25 16:06             ` Borislav Petkov
@ 2015-02-26 11:47             ` Ingo Molnar
  2015-02-26 12:47               ` Steven Rostedt
  2015-02-26 12:50               ` Steven Rostedt
  1 sibling, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2015-02-26 11:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Denys Vlasenko, H. Peter Anvin, Andy Lutomirski, Denys Vlasenko,
	Linus Torvalds, Borislav Petkov, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > That would require a branch insn. The whole idea of 
> > masking was merely to avoid that branch. If you need a 
> > branch, then you can as well just retain current code.
> 
> I'm just curious, do all these micro optimizations have 
> any real impact on real use cases?

The bona fide removal of a real instruction from a true hot 
path is generally worth doing; you don't even have to 
'prove' that it improves things, unless the claim is 
that for some really robust reason the instruction was zero 
cost to begin with.

So the main question here is not whether it's worth doing 
it, the question is the cost of the removal:

  - the change in syscall number overflow handling
    behavior. (We might not want the new behavior)

  - the increase in the syscall table size.

Thanks,

	Ingo


* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-26 11:47             ` Ingo Molnar
@ 2015-02-26 12:47               ` Steven Rostedt
  2015-02-26 12:50               ` Steven Rostedt
  1 sibling, 0 replies; 38+ messages in thread
From: Steven Rostedt @ 2015-02-26 12:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, H. Peter Anvin, Andy Lutomirski, Denys Vlasenko,
	Linus Torvalds, Borislav Petkov, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Thu, 26 Feb 2015 12:47:03 +0100
Ingo Molnar <mingo@kernel.org> wrote:

> So the main question here is not whether it's worth doing 
> it, the question is the cost of the removal:
> 
>   - the change in syscall number overflow handling
>     behavior. (We might not want the new behavior)
> 
>   - the increase in the syscall table size.

Yep, totally agree.

-- Steve


* Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
  2015-02-26 11:47             ` Ingo Molnar
  2015-02-26 12:47               ` Steven Rostedt
@ 2015-02-26 12:50               ` Steven Rostedt
  1 sibling, 0 replies; 38+ messages in thread
From: Steven Rostedt @ 2015-02-26 12:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, H. Peter Anvin, Andy Lutomirski, Denys Vlasenko,
	Linus Torvalds, Borislav Petkov, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Thu, 26 Feb 2015 12:47:03 +0100
Ingo Molnar <mingo@kernel.org> wrote:


> The bona fide removal of a real instruction from a true hot 
> path is generally always worth doing, you don't even have 
> to 'prove' that it improves things: unless the claim is 
> that for some really robust reason the instruction was zero 
> cost to begin with.

I agree that removing instructions from hot paths is for the most part
worth doing without any proof, as long as it doesn't change behavior or
increase the memory footprint. But that's not the case here. If the
change does affect behavior or increases memory footprint, then it
should prove itself as worth doing.

-- Steve


* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-26 11:42         ` Ingo Molnar
@ 2015-02-26 15:21           ` Andy Lutomirski
  2015-02-26 15:29             ` Andy Lutomirski
  0 siblings, 1 reply; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-26 15:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel,
	Tony Luck

On Thu, Feb 26, 2015 at 3:42 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
>> >> I added that in and applied this patch.
>> >
>> > So this is not just slightly buggy, it's fundamentally
>> > wrong as well as it removes the possibility of an RSP
>> > value optimization from the 64-bit path, see my
>> > previous mail.
>>
>> This is just trying to check that the function is
>> executing on the per-thread stack.  It was correct (and
>> fairly heavily tested by Tony) with KERNEL_STACK_OFFSET
>> being nonzero, but we're checking the wrong page if
>> KERNEL_STACK_OFFSET becomes zero.
>>
>> I don't think I understand your objection to this bit.
>
> I object to the KERNEL_STACK_OFFSET removal patch you fixed
> here, not to the add-on fix in particular (which is correct
> in the context of that patch).
>

Aha. I misunderstood.

I would object, too, except that I think that a better improvement
would be to build the entire frame using pushes, including the "top of
stack" part, even on fast-path syscalls.  That would have
KERNEL_STACK_OFFSET=0.

Actually, I want an even bigger change.  kernel_stack seems entirely
pointless to me, since we could just as easily be using tss.sp0 (I
think), as long as there's no useful offset.  So maybe the right way
to handle this patch would be to first clean up the ia32entry code and
all the non-asm users to use tss.sp0, and then to figure out what to
do about the syscall64 path.

I'd be happy to defer this patch until the FIXUP_TOP_OF_STACK cleanups
later on, assuming it can be easily extricated, which I haven't
checked yet.  Thoughts?

--Andy


* Re: [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET
  2015-02-26 15:21           ` Andy Lutomirski
@ 2015-02-26 15:29             ` Andy Lutomirski
  0 siblings, 0 replies; 38+ messages in thread
From: Andy Lutomirski @ 2015-02-26 15:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML, linux-kernel,
	Tony Luck

On Thu, Feb 26, 2015 at 7:21 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Feb 26, 2015 at 3:42 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Andy Lutomirski <luto@amacapital.net> wrote:
>>
>>> >> I added that in and applied this patch.
>>> >
>>> > So this is not just slightly buggy, it's fundamentally
>>> > wrong as well as it removes the possibility of an RSP
>>> > value optimization from the 64-bit path, see my
>>> > previous mail.
>>>
>>> This is just trying to check that the function is
>>> executing on the per-thread stack.  It was correct (and
>>> fairly heavily tested by Tony) with KERNEL_STACK_OFFSET
>>> being nonzero, but we're checking the wrong page if
>>> KERNEL_STACK_OFFSET becomes zero.
>>>
>>> I don't think I understand your objection to this bit.
>>
>> I object to the KERNEL_STACK_OFFSET removal patch you fixed
>> here, not to the add-on fix in particular (which is correct
>> in the context of that patch).
>>
>
> Aha. I misunderstood.
>
> I would object, too, except that I think that a better improvement
> would be to build the entire frame using pushes, including the "top of
> stack" part, even on fast-path syscalls.  That would have
> KERNEL_STACK_OFFSET=0.
>
> Actually, I want an even bigger change.  kernel_stack seems entirely
> pointless to me, since we could just as easily be using tss.sp0 (I
> think), as long as there's no useful offset.  So maybe the right way
> to handle this patch would be to first clean up the ia32entry code and
> all the non-asm users to use tss.sp0, and then to figure out what to
> do about the syscall64 path.
>
> I'd be happy to defer this patch until the FIXUP_TOP_OF_STACK cleanups
> later on, assuming it can be easily extricated, which I haven't
> checked yet.  Thoughts?
>

Damnit, there are too many sets of patches that I've confused myself.
I haven't applied this one yet.  My bad for letting the set of things
flying around get out of control.

I think I'll NAK this patch as is.  Let's get the existing stuff
stabilized first.

For a resubmission of this, or something like it, let's do it in nice
bite-sized pieces:

a) Make sure that asm can read sp0.  (It's at a fixed offset from a
percpu variable, so it should be fine, but asm-offsets might need
updating.)

b) Remove the C inline helpers and fix anything that needs fixing,
such as the BUG_ON in traps.c

c) In appropriately split-up pieces, change over the asm code to use
sp0 instead of kernel_stack

d) For the 64-bit syscall path, either switch to sp0 or keep using
kernel_stack but rename it to avoid confusion.

One thing that was problematic with the big layout change patch is
that it unnecessarily made too many changes at once.  IMO it really
ought to have introduced new macros (including
ALLOC_PARTIAL_WHATEVER_ON_STACK), then switch users of the macros to
new macros piece by piece without layout changes (to improve
bisectability and to fix the real underlying problem with the old
code, namely that it was totally incomprehensible), and *then* to
change the layout with a patch in which both the old and new code was
readable.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


end of thread, other threads:[~2015-02-26 15:30 UTC | newest]

Thread overview: 38+ messages
2015-02-24 18:51 [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Denys Vlasenko
2015-02-24 18:51 ` [PATCH 2/4] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
2015-02-24 19:30   ` Steven Rostedt
2015-02-24 20:02     ` Denys Vlasenko
2015-02-24 22:37   ` Andy Lutomirski
2015-02-25  9:45     ` Ingo Molnar
2015-02-25 16:14       ` Andy Lutomirski
2015-02-26 11:42         ` Ingo Molnar
2015-02-26 15:21           ` Andy Lutomirski
2015-02-26 15:29             ` Andy Lutomirski
2015-02-25  8:53   ` Ingo Molnar
2015-02-25 12:48     ` Denys Vlasenko
2015-02-25 13:25       ` Denys Vlasenko
2015-02-25 13:53         ` Ingo Molnar
2015-02-24 18:51 ` [PATCH 3/4] x86: save r11 into pt_regs->eflags on SYSCALL64 fastpath Denys Vlasenko
2015-02-24 22:44   ` Andy Lutomirski
2015-02-24 18:51 ` [PATCH 4/4] x86: save user %rsp in pt_regs->sp Denys Vlasenko
2015-02-24 22:55   ` Andy Lutomirski
2015-02-24 19:58 ` [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns Borislav Petkov
2015-02-24 20:13   ` Denys Vlasenko
2015-02-24 20:36     ` Borislav Petkov
2015-02-24 22:51       ` H. Peter Anvin
2015-02-24 22:25 ` Andy Lutomirski
2015-02-24 22:52   ` H. Peter Anvin
2015-02-24 22:56     ` Andy Lutomirski
2015-02-24 23:01       ` H. Peter Anvin
2015-02-24 23:03         ` Andy Lutomirski
2015-02-24 23:26           ` Denys Vlasenko
2015-02-25  9:20     ` Ingo Molnar
2015-02-25  9:27       ` H. Peter Anvin
2015-02-25  9:39         ` Ingo Molnar
2015-02-25 14:43       ` Steven Rostedt
2015-02-25 15:40         ` Denys Vlasenko
2015-02-25 16:01           ` Steven Rostedt
2015-02-25 16:06             ` Borislav Petkov
2015-02-26 11:47             ` Ingo Molnar
2015-02-26 12:47               ` Steven Rostedt
2015-02-26 12:50               ` Steven Rostedt
