LKML Archive on lore.kernel.org
* [PATCH 00/10] Control Flow Enforcement - Part (3)
@ 2018-06-07 14:37 Yu-cheng Yu
  2018-06-07 14:37 ` [PATCH 01/10] x86/cet: User-mode shadow stack support Yu-cheng Yu
                   ` (12 more replies)
  0 siblings, 13 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:37 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

This series introduces the CET shadow stack.

At a high level, a shadow stack is:

	Allocated from a task's address space with vm_flags VM_SHSTK;
	Mapped with PTEs that must be read-only and dirty;
	Fixed-size, but the default size can be changed by the system admin.
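
As a rough illustration of the "read-only and dirty" requirement, the
check below models it in terms of x86 PTE bits.  This is a user-space
sketch, not code from this series: pte_is_shstk() and the PTE_* macros
are hypothetical names; only the bit positions (Present = bit 0,
R/W = bit 1, Dirty = bit 6) come from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry bits (hypothetical macro names). */
#define PTE_PRESENT	(UINT64_C(1) << 0)
#define PTE_RW		(UINT64_C(1) << 1)
#define PTE_DIRTY	(UINT64_C(1) << 6)

/* A shadow stack PTE is present, not writable, and dirty. */
static inline bool pte_is_shstk(uint64_t pte)
{
	return (pte & PTE_PRESENT) && !(pte & PTE_RW) && (pte & PTE_DIRTY);
}

/* Usage example: an ordinary writable dirty page does not qualify. */
static inline void pte_is_shstk_selfcheck(void)
{
	assert(pte_is_shstk(PTE_PRESENT | PTE_DIRTY));
	assert(!pte_is_shstk(PTE_PRESENT | PTE_RW | PTE_DIRTY));
}
```

The combination is otherwise unusual for a normal mapping, which is
presumably why it can be used to mark shadow stack pages.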

For a forked child, the shadow stack is duplicated when the next
shadow stack access takes place.

For a pthread child, a new shadow stack is allocated.

The signal handler uses the same shadow stack as the main program.

Yu-cheng Yu (10):
  x86/cet: User-mode shadow stack support
  x86/cet: Introduce WRUSS instruction
  x86/cet: Signal handling for shadow stack
  x86/cet: Handle thread shadow stack
  x86/cet: ELF header parsing of Control Flow Enforcement
  x86/cet: Add arch_prctl functions for shadow stack
  mm: Prevent mprotect from changing shadow stack
  mm: Prevent mremap of shadow stack
  mm: Prevent madvise from changing shadow stack
  mm: Prevent munmap and remap_file_pages of shadow stack

 arch/x86/Kconfig                              |   4 +
 arch/x86/ia32/ia32_signal.c                   |   5 +
 arch/x86/include/asm/cet.h                    |  48 ++++++
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/elf.h                    |   5 +
 arch/x86/include/asm/mmu_context.h            |   3 +
 arch/x86/include/asm/msr-index.h              |  14 ++
 arch/x86/include/asm/processor.h              |   5 +
 arch/x86/include/asm/special_insns.h          |  44 +++++
 arch/x86/include/uapi/asm/elf_property.h      |  16 ++
 arch/x86/include/uapi/asm/prctl.h             |  15 ++
 arch/x86/include/uapi/asm/sigcontext.h        |   4 +
 arch/x86/kernel/Makefile                      |   4 +
 arch/x86/kernel/cet.c                         | 224 ++++++++++++++++++++++++
 arch/x86/kernel/cet_prctl.c                   | 203 ++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c                  |  24 +++
 arch/x86/kernel/elf.c                         | 236 ++++++++++++++++++++++++++
 arch/x86/kernel/process.c                     |  10 ++
 arch/x86/kernel/process_64.c                  |   7 +
 arch/x86/kernel/signal.c                      |  11 ++
 arch/x86/lib/x86-opcode-map.txt               |   2 +-
 arch/x86/mm/fault.c                           |  13 +-
 fs/binfmt_elf.c                               |  16 ++
 fs/proc/task_mmu.c                            |   3 +
 include/uapi/linux/elf.h                      |   1 +
 mm/madvise.c                                  |   9 +
 mm/mmap.c                                     |  13 ++
 mm/mprotect.c                                 |   9 +
 mm/mremap.c                                   |   5 +-
 tools/objtool/arch/x86/lib/x86-opcode-map.txt |   2 +-
 30 files changed, 958 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/include/uapi/asm/elf_property.h
 create mode 100644 arch/x86/kernel/cet.c
 create mode 100644 arch/x86/kernel/cet_prctl.c
 create mode 100644 arch/x86/kernel/elf.c

-- 
2.15.1


* [PATCH 01/10] x86/cet: User-mode shadow stack support
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
@ 2018-06-07 14:37 ` Yu-cheng Yu
  2018-06-07 16:37   ` Andy Lutomirski
  2018-06-12 11:56   ` Balbir Singh
  2018-06-07 14:37 ` [PATCH 02/10] x86/cet: Introduce WRUSS instruction Yu-cheng Yu
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:37 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

This patch adds basic shadow stack enabling/disabling routines.
A task's shadow stack is allocated from memory with the VM_SHSTK
flag set and read-only protection.  The shadow stack is
allocated with a fixed size; the system admin can change the
default size.
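
A minimal user-space model of the sizing used in this patch, assuming
the SHSTK_SIZE formula and the "addr + size - sizeof(void *)" initial
pointer from cet_setup_shstk() on a 64-bit kernel.  The function names
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors SHSTK_SIZE: 0x8000 * 4 for ia32 tasks, 0x8000 * 8 otherwise. */
static inline uint64_t shstk_default_size(bool ia32)
{
	return UINT64_C(0x8000) * (ia32 ? 4 : 8);
}

/*
 * Initial shadow stack pointer: one pointer-sized slot below the top
 * of the mapping (sizeof(void *) is 8 on a 64-bit kernel).
 */
static inline uint64_t shstk_initial_ssp(uint64_t base, uint64_t size)
{
	return base + size - sizeof(uint64_t);
}

/* Usage example: 128 KiB for ia32 tasks, 256 KiB for 64-bit tasks. */
static inline void shstk_size_selfcheck(void)
{
	assert(shstk_default_size(true) == 128 * 1024);
	assert(shstk_default_size(false) == 256 * 1024);
}
```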

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h               |  32 ++++++++
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/msr-index.h         |  14 ++++
 arch/x86/include/asm/processor.h         |   5 ++
 arch/x86/kernel/Makefile                 |   2 +
 arch/x86/kernel/cet.c                    | 123 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c             |  24 ++++++
 arch/x86/kernel/process.c                |   2 +
 fs/proc/task_mmu.c                       |   3 +
 9 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/kernel/cet.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
new file mode 100644
index 000000000000..9d5bc1efc9b7
--- /dev/null
+++ b/arch/x86/include/asm/cet.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CET_H
+#define _ASM_X86_CET_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+struct task_struct;
+/*
+ * Per-thread CET status
+ */
+struct cet_stat {
+	unsigned long	shstk_base;
+	unsigned long	shstk_size;
+	unsigned int	shstk_enabled:1;
+};
+
+#ifdef CONFIG_X86_INTEL_CET
+unsigned long cet_get_shstk_ptr(void);
+int cet_setup_shstk(void);
+void cet_disable_shstk(void);
+void cet_disable_free_shstk(struct task_struct *p);
+#else
+static inline unsigned long cet_get_shstk_ptr(void) { return 0; }
+static inline int cet_setup_shstk(void) { return 0; }
+static inline void cet_disable_shstk(void) {}
+static inline void cet_disable_free_shstk(struct task_struct *p) {}
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_X86_CET_H */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 33833d1909af..3624a11e5ba6 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,6 +56,12 @@
 # define DISABLE_PTI		(1 << (X86_FEATURE_PTI & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+#define DISABLE_SHSTK	0
+#else
+#define DISABLE_SHSTK	(1<<(X86_FEATURE_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -75,7 +81,7 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fda2114197b3..428d13828ba9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -770,4 +770,18 @@
 #define MSR_VM_IGNNE                    0xc0010115
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET		0x6a0
+#define MSR_IA32_S_CET		0x6a2
+#define MSR_IA32_PL0_SSP	0x6a4
+#define MSR_IA32_PL3_SSP	0x6a7
+#define MSR_IA32_INT_SSP_TAB	0x6a8
+
+/* MSR_IA32_U_CET and MSR_IA32_S_CET bits */
+#define MSR_IA32_CET_SHSTK_EN		0x0000000000000001
+#define MSR_IA32_CET_WRSS_EN		0x0000000000000002
+#define MSR_IA32_CET_ENDBR_EN		0x0000000000000004
+#define MSR_IA32_CET_LEG_IW_EN		0x0000000000000008
+#define MSR_IA32_CET_NO_TRACK_EN	0x0000000000000010
+
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 21a114914ba4..e632dd7adaac 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -24,6 +24,7 @@ struct vm86;
 #include <asm/special_insns.h>
 #include <asm/fpu/types.h>
 #include <asm/unwind_hints.h>
+#include <asm/cet.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -507,6 +508,10 @@ struct thread_struct {
 	unsigned int		sig_on_uaccess_err:1;
 	unsigned int		uaccess_err:1;	/* uaccess failed */
 
+#ifdef CONFIG_X86_INTEL_CET
+	struct cet_stat		cet;
+#endif
+
 	/* Floating point and extended processor state */
 	struct fpu		fpu;
 	/*
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5cf4e70..7ea5e099d558 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -138,6 +138,8 @@ obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
+obj-$(CONFIG_X86_INTEL_CET)		+= cet.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
new file mode 100644
index 000000000000..8abbfd44322a
--- /dev/null
+++ b/arch/x86/kernel/cet.c
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * cet.c - Control Flow Enforcement (CET)
+ *
+ * Copyright (c) 2018, Intel Corporation.
+ * Yu-cheng Yu <yu-cheng.yu@intel.com>
+ */
+
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/sched/signal.h>
+#include <asm/msr.h>
+#include <asm/user.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/cet.h>
+
+#define SHSTK_SIZE (0x8000 * (test_thread_flag(TIF_IA32) ? 4 : 8))
+
+static inline int cet_set_shstk_ptr(unsigned long addr)
+{
+	u64 r;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -1;
+
+	if ((addr >= TASK_SIZE) || (!IS_ALIGNED(addr, 4)))
+		return -1;
+
+	rdmsrl(MSR_IA32_U_CET, r);
+	wrmsrl(MSR_IA32_U_CET, r | MSR_IA32_CET_SHSTK_EN);
+	wrmsrl(MSR_IA32_PL3_SSP, addr);
+	return 0;
+}
+
+unsigned long cet_get_shstk_ptr(void)
+{
+	unsigned long ptr;
+
+	if (!current->thread.cet.shstk_enabled)
+		return 0;
+
+	rdmsrl(MSR_IA32_PL3_SSP, ptr);
+	return ptr;
+}
+
+static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long populate;
+
+	down_write(&mm->mmap_sem);
+	addr = do_mmap(NULL, addr, len, PROT_READ,
+		       MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
+		       0, &populate, NULL);
+	up_write(&mm->mmap_sem);
+
+	if (populate)
+		mm_populate(addr, populate);
+
+	return addr;
+}
+
+int cet_setup_shstk(void)
+{
+	unsigned long addr, size;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -EOPNOTSUPP;
+
+	size = SHSTK_SIZE;
+	addr = shstk_mmap(0, size);
+
+	if (addr >= TASK_SIZE)
+		return -ENOMEM;
+
+	cet_set_shstk_ptr(addr + size - sizeof(void *));
+	current->thread.cet.shstk_base = addr;
+	current->thread.cet.shstk_size = size;
+	current->thread.cet.shstk_enabled = 1;
+	return 0;
+}
+
+void cet_disable_shstk(void)
+{
+	u64 r;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return;
+
+	rdmsrl(MSR_IA32_U_CET, r);
+	r &= ~(MSR_IA32_CET_SHSTK_EN);
+	wrmsrl(MSR_IA32_U_CET, r);
+	wrmsrl(MSR_IA32_PL3_SSP, 0);
+	current->thread.cet.shstk_enabled = 0;
+}
+
+void cet_disable_free_shstk(struct task_struct *tsk)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    !tsk->thread.cet.shstk_enabled)
+		return;
+
+	if (tsk == current)
+		cet_disable_shstk();
+
+	/*
+	 * Free only when tsk is current or shares mm
+	 * with current but has its own shstk.
+	 */
+	if (tsk->mm && (tsk->mm == current->mm) &&
+	    (tsk->thread.cet.shstk_base)) {
+		vm_munmap(tsk->thread.cet.shstk_base,
+			  tsk->thread.cet.shstk_size);
+		tsk->thread.cet.shstk_base = 0;
+		tsk->thread.cet.shstk_size = 0;
+	}
+
+	tsk->thread.cet.shstk_enabled = 0;
+}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 38276f58d3bf..f54fabdaef60 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -401,6 +401,29 @@ static __init int setup_disable_pku(char *arg)
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
+		cr4_set_bits(X86_CR4_CET);
+}
+
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+static __init int setup_disable_shstk(char *s)
+{
+	/* require an exact match without trailing characters */
+	if (strlen(s))
+		return 0;
+
+	if (!boot_cpu_has(X86_FEATURE_SHSTK))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+	pr_info("x86: 'noshstk' specified, disabling Shadow Stack\n");
+	return 1;
+}
+__setup("noshstk", setup_disable_shstk);
+#endif
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
@@ -1313,6 +1336,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	x86_init_rdrand(c);
 	x86_init_cache_qos(c);
 	setup_pku(c);
+	setup_cet(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 30ca2d1a9231..b3b0b482983a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -39,6 +39,7 @@
 #include <asm/desc.h>
 #include <asm/prctl.h>
 #include <asm/spec-ctrl.h>
+#include <asm/cet.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -136,6 +137,7 @@ void flush_thread(void)
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
+	cet_disable_shstk();
 	fpu__clear(&tsk->thread.fpu);
 }
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c486ad4b43f0..6aca93ecec0e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -679,6 +679,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_PKEY_BIT1)]	= "",
 		[ilog2(VM_PKEY_BIT2)]	= "",
 		[ilog2(VM_PKEY_BIT3)]	= "",
+#endif
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+		[ilog2(VM_SHSTK)]	= "ss"
 #endif
 	};
 	size_t i;
-- 
2.15.1


* [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
  2018-06-07 14:37 ` [PATCH 01/10] x86/cet: User-mode shadow stack support Yu-cheng Yu
@ 2018-06-07 14:37 ` Yu-cheng Yu
  2018-06-07 16:40   ` Andy Lutomirski
  2018-06-14  1:30   ` Balbir Singh
  2018-06-07 14:38 ` [PATCH 03/10] x86/cet: Signal handling for shadow stack Yu-cheng Yu
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:37 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

WRUSS is a new kernel-mode instruction that writes directly
to user shadow stack memory.  It is used to construct
a return address on the shadow stack for the signal
handler.

This instruction can fault if the target address is not
valid shadow stack memory.  In that case, the kernel
fixes up the fault and the write helper returns an error.
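
The fault classification added to fault.c can be modeled in user
space as below.  This is only a sketch: fault_is_wruss() and the PF_*
macros are hypothetical names, and the bit values (user = bit 2,
shadow stack = bit 6 of the page-fault error code) are assumptions
taken from the architecture, since X86_PF_SHSTK is defined outside
this part of the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Page-fault error code bits (assumed values, hypothetical names). */
#define PF_USER		(1UL << 2)
#define PF_SHSTK	(1UL << 6)

/*
 * A WRUSS fault reports both the user and shadow stack bits even
 * though the faulting instruction ran in kernel mode.
 */
static inline bool fault_is_wruss(unsigned long error_code, bool from_user_mode)
{
	const unsigned long mask = PF_USER | PF_SHSTK;

	return (error_code & mask) == mask && !from_user_mode;
}

/* Usage example: same error code, different privilege at fault time. */
static inline void fault_is_wruss_selfcheck(void)
{
	assert(fault_is_wruss(PF_USER | PF_SHSTK, false));
	assert(!fault_is_wruss(PF_USER | PF_SHSTK, true));
}
```

A fault classified this way goes through the kernel exception fixup
path instead of raising SIGSEGV directly, so the write helper can
report the failure to its caller.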

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/special_insns.h          | 44 +++++++++++++++++++++++++++
 arch/x86/lib/x86-opcode-map.txt               |  2 +-
 arch/x86/mm/fault.c                           | 13 +++++++-
 tools/objtool/arch/x86/lib/x86-opcode-map.txt |  2 +-
 4 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 317fc59b512c..8ce532fcc171 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -237,6 +237,50 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+#ifdef CONFIG_X86_INTEL_CET
+
+#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	int err;
+
+	asm volatile("1:.byte 0x66, 0x0f, 0x38, 0xf5, 0x37\n"
+		     "xor %[err],%[err]\n"
+		     "2:\n"
+		     ".section .fixup,\"ax\"\n"
+		     "3: mov $-1,%[err]; jmp 2b\n"
+		     ".previous\n"
+		     _ASM_EXTABLE(1b, 3b)
+		: [err] "=a" (err)
+		: [val] "S" (val), [addr] "D" (addr)
+		: "memory");
+	return err;
+}
+#else
+static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
+{
+	return 0;
+}
+#endif
+
+static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
+{
+	int err;
+
+	asm volatile("1:.byte 0x66, 0x48, 0x0f, 0x38, 0xf5, 0x37\n"
+		     "xor %[err],%[err]\n"
+		     "2:\n"
+		     ".section .fixup,\"ax\"\n"
+		     "3: mov $-1,%[err]; jmp 2b\n"
+		     ".previous\n"
+		     _ASM_EXTABLE(1b, 3b)
+		: [err] "=a" (err)
+		: [val] "S" (val), [addr] "D" (addr)
+		: "memory");
+	return err;
+}
+#endif /* CONFIG_X86_INTEL_CET */
+
 #define nop() asm volatile ("nop")
 
 
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index e0b85930dd77..72bb7c48a7df 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
-f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
 f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 EndTable
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2b3b9170109c..f157338862f8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -640,6 +640,17 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
 	return 0;
 }
 
+/*
+ * WRUSS is a kernel instruction but writes to user
+ * shadow stack memory.  When a fault occurs, both
+ * X86_PF_USER and X86_PF_SHSTK are set.
+ */
+static int is_wruss(struct pt_regs *regs, unsigned long error_code)
+{
+	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
+		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
+}
+
 static const char nx_warning[] = KERN_CRIT
 "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n";
 static const char smep_warning[] = KERN_CRIT
@@ -851,7 +862,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	struct task_struct *tsk = current;
 
 	/* User mode accesses just cause a SIGSEGV */
-	if (error_code & X86_PF_USER) {
+	if ((error_code & X86_PF_USER) && !is_wruss(regs, error_code)) {
 		/*
 		 * It's possible to have interrupts off here:
 		 */
diff --git a/tools/objtool/arch/x86/lib/x86-opcode-map.txt b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
index e0b85930dd77..72bb7c48a7df 100644
--- a/tools/objtool/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
@@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
-f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
 f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 EndTable
-- 
2.15.1


* [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
  2018-06-07 14:37 ` [PATCH 01/10] x86/cet: User-mode shadow stack support Yu-cheng Yu
  2018-06-07 14:37 ` [PATCH 02/10] x86/cet: Introduce WRUSS instruction Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 18:30   ` Andy Lutomirski
  2018-06-07 14:38 ` [PATCH 04/10] x86/cet: Handle thread " Yu-cheng Yu
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

Set and restore the shadow stack pointer for signals.
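
A user-space sketch of the slot computation done before the restorer
address is pushed, modeled on the range check in cet_setup_signal()
and the alignment check in cet_push_shstk().  struct shstk_region and
shstk_signal_slot() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shstk_base/shstk_size pair. */
struct shstk_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Compute where the restorer address would be written: one entry
 * (4 bytes for ia32, 8 bytes otherwise) below the current pointer.
 * Returns 0 and stores the slot on success, -1 if the slot falls
 * outside the shadow stack or is misaligned.
 */
static inline int shstk_signal_slot(const struct shstk_region *r,
				    uint64_t ssp, bool ia32, uint64_t *slot)
{
	uint64_t width = ia32 ? sizeof(uint32_t) : sizeof(uint64_t);
	uint64_t next = ssp - width;

	if (next >= r->base + r->size || next < r->base)
		return -1;
	if (next % width != 0)
		return -1;

	*slot = next;
	return 0;
}

/* Usage example: pushing at the very bottom of the stack is rejected. */
static inline void shstk_signal_slot_selfcheck(void)
{
	struct shstk_region r = { 0x1000, 0x1000 };
	uint64_t slot = 0;

	assert(shstk_signal_slot(&r, 0x1ff8, false, &slot) == 0);
	assert(slot == 0x1ff0);
	assert(shstk_signal_slot(&r, 0x1000, false, &slot) == -1);
}
```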

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/ia32/ia32_signal.c            |  5 ++++
 arch/x86/include/asm/cet.h             |  7 +++++
 arch/x86/include/uapi/asm/sigcontext.h |  4 +++
 arch/x86/kernel/cet.c                  | 51 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/signal.c               | 11 ++++++++
 5 files changed, 78 insertions(+)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 86b1341cba9a..26a776baff7c 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -34,6 +34,7 @@
 #include <asm/sigframe.h>
 #include <asm/sighandling.h>
 #include <asm/smap.h>
+#include <asm/cet.h>
 
 /*
  * Do a signal return; undo the signal stack.
@@ -74,6 +75,7 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,
 	unsigned int tmpflags, err = 0;
 	void __user *buf;
 	u32 tmp;
+	u32 ssp;
 
 	/* Always make any pending restarted system calls return -EINTR */
 	current->restart_block.fn = do_no_restart_syscall;
@@ -104,9 +106,11 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,
 
 		get_user_ex(tmp, &sc->fpstate);
 		buf = compat_ptr(tmp);
+		get_user_ex(ssp, &sc->ssp);
 	} get_user_catch(err);
 
 	err |= fpu__restore_sig(buf, 1);
+	err |= cet_restore_signal((unsigned long)ssp);
 
 	force_iret();
 
@@ -194,6 +198,7 @@ static int ia32_setup_sigcontext(struct sigcontext_32 __user *sc,
 		put_user_ex(current->thread.trap_nr, &sc->trapno);
 		put_user_ex(current->thread.error_code, &sc->err);
 		put_user_ex(regs->ip, &sc->ip);
+		put_user_ex((u32)cet_get_shstk_ptr(), &sc->ssp);
 		put_user_ex(regs->cs, (unsigned int __user *)&sc->cs);
 		put_user_ex(regs->flags, &sc->flags);
 		put_user_ex(regs->sp, &sc->sp_at_signal);
diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 9d5bc1efc9b7..5507469cb803 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -17,14 +17,21 @@ struct cet_stat {
 
 #ifdef CONFIG_X86_INTEL_CET
 unsigned long cet_get_shstk_ptr(void);
+int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val);
 int cet_setup_shstk(void);
 void cet_disable_shstk(void);
 void cet_disable_free_shstk(struct task_struct *p);
+int cet_restore_signal(unsigned long ssp);
+int cet_setup_signal(int ia32, unsigned long addr);
 #else
 static inline unsigned long cet_get_shstk_ptr(void) { return 0; }
+static inline int cet_push_shstk(int ia32, unsigned long ssp,
+				 unsigned long val) { return 0; }
 static inline int cet_setup_shstk(void) { return 0; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
+static inline int cet_restore_signal(unsigned long ssp) { return 0; }
+static inline int cet_setup_signal(int ia32, unsigned long addr) { return 0; }
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
index 844d60eb1882..6c8997a0156a 100644
--- a/arch/x86/include/uapi/asm/sigcontext.h
+++ b/arch/x86/include/uapi/asm/sigcontext.h
@@ -230,6 +230,7 @@ struct sigcontext_32 {
 	__u32				fpstate; /* Zero when no FPU/extended context */
 	__u32				oldmask;
 	__u32				cr2;
+	__u32				ssp;
 };
 
 /*
@@ -262,6 +263,7 @@ struct sigcontext_64 {
 	__u64				trapno;
 	__u64				oldmask;
 	__u64				cr2;
+	__u64				ssp;
 
 	/*
 	 * fpstate is really (struct _fpstate *) or (struct _xstate *)
@@ -320,6 +322,7 @@ struct sigcontext {
 	struct _fpstate __user		*fpstate;
 	__u32				oldmask;
 	__u32				cr2;
+	__u32				ssp;
 };
 # else /* __x86_64__: */
 struct sigcontext {
@@ -377,6 +380,7 @@ struct sigcontext {
 	__u64				trapno;
 	__u64				oldmask;
 	__u64				cr2;
+	__u64				ssp;
 	struct _fpstate __user		*fpstate;	/* Zero when no FPU context */
 #  ifdef __ILP32__
 	__u32				__fpstate_pad;
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 8abbfd44322a..6f445ce94c83 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -17,6 +17,7 @@
 #include <asm/fpu/xstate.h>
 #include <asm/fpu/types.h>
 #include <asm/cet.h>
+#include <asm/special_insns.h>
 
 #define SHSTK_SIZE (0x8000 * (test_thread_flag(TIF_IA32) ? 4 : 8))
 
@@ -47,6 +48,24 @@ unsigned long cet_get_shstk_ptr(void)
 	return ptr;
 }
 
+int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val)
+{
+	if (val >= TASK_SIZE)
+		return -EINVAL;
+
+	if (IS_ENABLED(CONFIG_IA32_EMULATION) && ia32) {
+		if (!IS_ALIGNED(ssp, 4))
+			return -EINVAL;
+		cet_set_shstk_ptr(ssp);
+		return write_user_shstk_32(ssp, (unsigned int)val);
+	} else {
+		if (!IS_ALIGNED(ssp, 8))
+			return -EINVAL;
+		cet_set_shstk_ptr(ssp);
+		return write_user_shstk_64(ssp, val);
+	}
+}
+
 static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
 {
 	struct mm_struct *mm = current->mm;
@@ -121,3 +140,35 @@ void cet_disable_free_shstk(struct task_struct *tsk)
 
 	tsk->thread.cet.shstk_enabled = 0;
 }
+
+int cet_restore_signal(unsigned long ssp)
+{
+	if (!current->thread.cet.shstk_enabled)
+		return 0;
+	return cet_set_shstk_ptr(ssp);
+}
+
+int cet_setup_signal(int ia32, unsigned long rstor_addr)
+{
+	unsigned long ssp;
+	struct cet_stat *cet = &current->thread.cet;
+
+	if (!current->thread.cet.shstk_enabled)
+		return 0;
+
+	ssp = cet_get_shstk_ptr();
+
+	/*
+	 * Put the restorer address on the shstk
+	 */
+	if (ia32)
+		ssp -= sizeof(u32);
+	else
+		ssp -= sizeof(rstor_addr);
+
+	if (ssp >= (cet->shstk_base + cet->shstk_size) ||
+	    ssp < cet->shstk_base)
+		return -EINVAL;
+
+	return cet_push_shstk(ia32, ssp, rstor_addr);
+}
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index da270b95fe4d..86fb897cae19 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -46,6 +46,7 @@
 
 #include <asm/sigframe.h>
 #include <asm/signal.h>
+#include <asm/cet.h>
 
 #define COPY(x)			do {			\
 	get_user_ex(regs->x, &sc->x);			\
@@ -102,6 +103,7 @@ static int restore_sigcontext(struct pt_regs *regs,
 	void __user *buf;
 	unsigned int tmpflags;
 	unsigned int err = 0;
+	unsigned long ssp = 0;
 
 	/* Always make any pending restarted system calls return -EINTR */
 	current->restart_block.fn = do_no_restart_syscall;
@@ -148,9 +150,11 @@ static int restore_sigcontext(struct pt_regs *regs,
 
 		get_user_ex(buf_val, &sc->fpstate);
 		buf = (void __user *)buf_val;
+		get_user_ex(ssp, &sc->ssp);
 	} get_user_catch(err);
 
 	err |= fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32));
+	err |= cet_restore_signal(ssp);
 
 	force_iret();
 
@@ -193,6 +197,7 @@ int setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
 		put_user_ex(current->thread.trap_nr, &sc->trapno);
 		put_user_ex(current->thread.error_code, &sc->err);
 		put_user_ex(regs->ip, &sc->ip);
+		put_user_ex(cet_get_shstk_ptr(), &sc->ssp);
 #ifdef CONFIG_X86_32
 		put_user_ex(regs->cs, (unsigned int __user *)&sc->cs);
 		put_user_ex(regs->flags, &sc->flags);
@@ -742,6 +747,12 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 		user_disable_single_step(current);
 
 	failed = (setup_rt_frame(ksig, regs) < 0);
+	if (!failed) {
+		unsigned long rstor = (unsigned long)ksig->ka.sa.sa_restorer;
+		int ia32 = is_ia32_frame(ksig);
+
+		failed = cet_setup_signal(ia32, rstor);
+	}
 	if (!failed) {
 		/*
 		 * Clear the direction flag as per the ABI for function entry.
-- 
2.15.1


* [PATCH 04/10] x86/cet: Handle thread shadow stack
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (2 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 03/10] x86/cet: Signal handling for shadow stack Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 18:21   ` Andy Lutomirski
  2018-06-07 14:38 ` [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement Yu-cheng Yu
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

When clone() specifies CLONE_VM but not CLONE_VFORK, the child
needs a separate program stack and a separate shadow stack.
This patch handles allocation and freeing of the thread shadow
stack.
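
The flag test used in copy_thread_tls() can be tried in user space
with the sketch below.  The SK_* macros and needs_new_shstk() are
hypothetical names; the numeric values are the usual Linux clone flag
values, repeated here so the example does not depend on any header
beyond the C library.

```c
#include <assert.h>
#include <stdbool.h>

/* Linux clone flag values, under hypothetical local names. */
#define SK_CLONE_VM	0x00000100UL
#define SK_CLONE_VFORK	0x00004000UL
#define SK_CLONE_THREAD	0x00010000UL

/* A new shadow stack is needed when the mm is shared but not via vfork. */
static inline bool needs_new_shstk(unsigned long clone_flags)
{
	return (clone_flags & (SK_CLONE_VFORK | SK_CLONE_VM)) == SK_CLONE_VM;
}

/* Usage example: a pthread-style clone versus a vfork-style clone. */
static inline void needs_new_shstk_selfcheck(void)
{
	assert(needs_new_shstk(SK_CLONE_VM | SK_CLONE_THREAD));
	assert(!needs_new_shstk(SK_CLONE_VM | SK_CLONE_VFORK));
}
```

A plain fork() passes neither flag, so it also fails the test; per the
cover letter, that case is handled by duplicating the shadow stack on
the next access instead.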

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h         |  2 ++
 arch/x86/include/asm/mmu_context.h |  3 +++
 arch/x86/kernel/cet.c              | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c          |  1 +
 arch/x86/kernel/process_64.c       |  7 +++++++
 5 files changed, 47 insertions(+)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 5507469cb803..c8fd87e13859 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -19,6 +19,7 @@ struct cet_stat {
 unsigned long cet_get_shstk_ptr(void);
 int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val);
 int cet_setup_shstk(void);
+int cet_setup_thread_shstk(struct task_struct *p);
 void cet_disable_shstk(void);
 void cet_disable_free_shstk(struct task_struct *p);
 int cet_restore_signal(unsigned long ssp);
@@ -28,6 +29,7 @@ static inline unsigned long cet_get_shstk_ptr(void) { return 0; }
 static inline int cet_push_shstk(int ia32, unsigned long ssp,
 				 unsigned long val) { return 0; }
 static inline int cet_setup_shstk(void) { return 0; }
+static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
 static inline int cet_restore_signal(unsigned long ssp) { return 0; }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index cf9911b5a53c..42395257efc3 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
 #include <asm/mpx.h>
+#include <asm/cet.h>
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -228,6 +229,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		cet_disable_free_shstk(tsk);	\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 6f445ce94c83..156f5d88ffd5 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -103,6 +103,40 @@ int cet_setup_shstk(void)
 	return 0;
 }
 
+int cet_setup_thread_shstk(struct task_struct *tsk)
+{
+	unsigned long addr, size;
+	struct cet_user_state *state;
+
+	if (!current->thread.cet.shstk_enabled)
+		return 0;
+
+	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
+			       XFEATURE_MASK_SHSTK_USER);
+
+	if (!state)
+		return -EINVAL;
+
+	size = tsk->thread.cet.shstk_size;
+	if (size == 0)
+		size = SHSTK_SIZE;
+
+	addr = shstk_mmap(0, size);
+
+	if (addr >= TASK_SIZE) {
+		tsk->thread.cet.shstk_base = 0;
+		tsk->thread.cet.shstk_size = 0;
+		tsk->thread.cet.shstk_enabled = 0;
+		return -ENOMEM;
+	}
+
+	state->user_ssp = (u64)(addr + size - sizeof(u64));
+	tsk->thread.cet.shstk_base = addr;
+	tsk->thread.cet.shstk_size = size;
+	tsk->thread.cet.shstk_enabled = 1;
+	return 0;
+}
+
 void cet_disable_shstk(void)
 {
 	u64 r;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b3b0b482983a..ae56caee41f9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -127,6 +127,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	cet_disable_free_shstk(tsk);
 	fpu__drop(fpu);
 }
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445fb98d..6e493b0bcedd 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -317,6 +317,13 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (sp)
 		childregs->sp = sp;
 
+	/* Allocate a new shadow stack for pthread */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM) {
+		err = cet_setup_thread_shstk(p);
+		if (err)
+			goto out;
+	}
+
 	err = -ENOMEM;
 	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
-- 
2.15.1

* [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (3 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 04/10] x86/cet: Handle thread " Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 18:38   ` Andy Lutomirski
  2018-06-07 14:38 ` [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack Yu-cheng Yu
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

Look in .note.gnu.property of an ELF file and check if shadow stack needs
to be enabled for the task.
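
For illustration only, the walk over a PT_NOTE image can be sketched in
user space as below.  This is not the code in this patch: the helper
names (find_x86_feature_1, read_u32, align_up) are hypothetical, and the
sketch assumes the note and property layout used by elf.c in this patch
(a 12-byte note header, the 4-byte name "GNU", then an array of
{pr_type, pr_datasz, data} entries padded to the segment alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants as defined in this patch. */
#define NT_GNU_PROPERTY_TYPE_0			5
#define GNU_PROPERTY_X86_FEATURE_1_AND		0xc0000002u
#define GNU_PROPERTY_X86_FEATURE_1_IBT		0x00000001u
#define GNU_PROPERTY_X86_FEATURE_1_SHSTK	0x00000002u

#define NOTE_HDR_SIZE	12	/* n_namesz, n_descsz, n_type */
#define PROP_HDR_SIZE	8	/* pr_type, pr_datasz */

/* Round x up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Read a native-endian u32 without assuming the buffer is aligned. */
static uint32_t read_u32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Hypothetical helper: scan a PT_NOTE image for the
 * GNU_PROPERTY_X86_FEATURE_1_AND property.  Returns 1 and stores the
 * feature bits in *features if found, 0 otherwise.
 */
int find_x86_feature_1(const uint8_t *buf, size_t size, size_t align,
		       uint32_t *features)
{
	size_t off = 0;

	assert(align == 4 || align == 8);

	while (off <= size && size - off >= NOTE_HDR_SIZE) {
		uint32_t namesz = read_u32(buf + off);
		uint32_t descsz = read_u32(buf + off + 4);
		uint32_t type = read_u32(buf + off + 8);
		size_t desc, end;

		if (namesz > size - off - NOTE_HDR_SIZE)
			return 0;	/* name runs past the buffer */
		desc = align_up(off + NOTE_HDR_SIZE + namesz, align);
		if (desc > size || descsz > size - desc)
			return 0;	/* descriptor runs past the buffer */
		end = desc + descsz;

		if (type == NT_GNU_PROPERTY_TYPE_0 && namesz == 4 &&
		    memcmp(buf + off + NOTE_HDR_SIZE, "GNU", 4) == 0) {
			size_t p = desc;

			while (end - p >= PROP_HDR_SIZE) {
				uint32_t pr_type = read_u32(buf + p);
				uint32_t pr_datasz = read_u32(buf + p + 4);
				size_t step;

				p += PROP_HDR_SIZE;
				if (pr_datasz > end - p)
					break;	/* malformed property */
				if (pr_type == GNU_PROPERTY_X86_FEATURE_1_AND &&
				    pr_datasz == 4) {
					*features = read_u32(buf + p);
					return 1;
				}
				/* Skip the data, padded to the note alignment. */
				step = align_up(pr_datasz, align);
				if (step > end - p)
					break;
				p += step;
			}
		}
		off = align_up(end, align);
	}
	return 0;
}
```

The sketch checks every length against the buffer before reading and
steps over properties it does not recognize, so a note that carries
other properties ahead of the feature word is still handled.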

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/Kconfig                         |   4 +
 arch/x86/include/asm/elf.h               |   5 +
 arch/x86/include/uapi/asm/elf_property.h |  16 +++
 arch/x86/kernel/Makefile                 |   2 +
 arch/x86/kernel/elf.c                    | 220 +++++++++++++++++++++++++++++++
 fs/binfmt_elf.c                          |  16 +++
 include/uapi/linux/elf.h                 |   1 +
 7 files changed, 264 insertions(+)
 create mode 100644 arch/x86/include/uapi/asm/elf_property.h
 create mode 100644 arch/x86/kernel/elf.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index dd580d4910fc..24339a5299da 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1931,12 +1931,16 @@ config X86_INTEL_CET
 config ARCH_HAS_SHSTK
 	def_bool n
 
+config ARCH_HAS_PROGRAM_PROPERTIES
+	def_bool n
+
 config X86_INTEL_SHADOW_STACK_USER
 	prompt "Intel Shadow Stack for user-mode"
 	def_bool n
 	depends on CPU_SUP_INTEL && X86_64
 	select X86_INTEL_CET
 	select ARCH_HAS_SHSTK
+	select ARCH_HAS_PROGRAM_PROPERTIES
 	---help---
 	  Shadow stack provides hardware protection against program stack
 	  corruption.  Only when all the following are true will an application
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 0d157d2a1e2a..5b5f169c5c07 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -382,4 +382,9 @@ struct va_alignment {
 
 extern struct va_alignment va_align;
 extern unsigned long align_vdso_addr(unsigned long);
+
+#ifdef CONFIG_ARCH_HAS_PROGRAM_PROPERTIES
+extern int arch_setup_features(void *ehdr, void *phdr, struct file *file,
+			       bool interp);
+#endif
 #endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/include/uapi/asm/elf_property.h b/arch/x86/include/uapi/asm/elf_property.h
new file mode 100644
index 000000000000..343a871b8fc1
--- /dev/null
+++ b/arch/x86/include/uapi/asm/elf_property.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _UAPI_ASM_X86_ELF_PROPERTY_H
+#define _UAPI_ASM_X86_ELF_PROPERTY_H
+
+/*
+ * pr_type
+ */
+#define GNU_PROPERTY_X86_FEATURE_1_AND (0xc0000002)
+
+/*
+ * Bits for GNU_PROPERTY_X86_FEATURE_1_AND
+ */
+#define GNU_PROPERTY_X86_FEATURE_1_SHSTK	(0x00000002)
+#define GNU_PROPERTY_X86_FEATURE_1_IBT		(0x00000001)
+
+#endif /* _UAPI_ASM_X86_ELF_PROPERTY_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 7ea5e099d558..cbf983f44b61 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -140,6 +140,8 @@ obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
 obj-$(CONFIG_X86_INTEL_CET)		+= cet.o
 
+obj-$(CONFIG_ARCH_HAS_PROGRAM_PROPERTIES) += elf.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/elf.c b/arch/x86/kernel/elf.c
new file mode 100644
index 000000000000..8e2719d8dc86
--- /dev/null
+++ b/arch/x86/kernel/elf.c
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Look at an ELF file's .note.gnu.property and determine if the file
+ * supports shadow stack and/or indirect branch tracking.
+ * The path from the ELF header to the note section is the following:
+ * elfhdr->elf_phdr->elf_note->x86_note_gnu_property[].
+ */
+
+#include <asm/cet.h>
+#include <asm/elf_property.h>
+#include <uapi/linux/elf-em.h>
+#include <linux/binfmts.h>
+#include <linux/elf.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/string.h>
+
+#define ELF_NOTE_DESC_OFFSET(n, align) \
+	round_up(sizeof(*n) + n->n_namesz, (align))
+
+#define ELF_NOTE_NEXT_OFFSET(n, align) \
+	round_up(ELF_NOTE_DESC_OFFSET(n, align) + n->n_descsz, (align))
+
+static int find_cet(u8 *buf, u32 size, u32 align, int *shstk, int *ibt)
+{
+	unsigned long start = (unsigned long)buf;
+	struct elf_note *note = (struct elf_note *)buf;
+
+	*shstk = 0;
+	*ibt = 0;
+
+	/*
+	 * Go through the x86_note_gnu_property array pointed by
+	 * buf and look for shadow stack and indirect branch
+	 * tracking features.
+	 * The GNU_PROPERTY_X86_FEATURE_1_AND entry contains only
+	 * one u32 as data.  Do not go beyond buf_size.
+	 */
+
+	while ((unsigned long) (note + 1) - start < size) {
+		/* Find the NT_GNU_PROPERTY_TYPE_0 note. */
+		if (note->n_namesz == 4 &&
+		    note->n_type == NT_GNU_PROPERTY_TYPE_0 &&
+		    memcmp(note + 1, "GNU", 4) == 0) {
+			u8 *ptr, *ptr_end;
+
+			/* Check for invalid property. */
+			if (note->n_descsz < 8 ||
+			   (note->n_descsz % align) != 0)
+				return 0;
+
+			/* Start and end of property array. */
+			ptr = (u8 *)(note + 1) + 4;
+			ptr_end = ptr + note->n_descsz;
+
+			while (1) {
+				u32 type = *(u32 *)ptr;
+				u32 datasz = *(u32 *)(ptr + 4);
+
+				ptr += 8;
+				if ((ptr + datasz) > ptr_end)
+					break;
+
+				if (type == GNU_PROPERTY_X86_FEATURE_1_AND &&
+				    datasz == 4) {
+					u32 p = *(u32 *)ptr;
+
+					if (p & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+						*shstk = 1;
+					if (p & GNU_PROPERTY_X86_FEATURE_1_IBT)
+						*ibt = 1;
+					return 1;
+				}
+			}
+		}
+
+		/*
+		 * Note sections like .note.ABI-tag and .note.gnu.build-id
+		 * are aligned to 4 bytes in 64-bit ELF objects.
+		 */
+		note = (void *)note + ELF_NOTE_NEXT_OFFSET(note, align);
+	}
+
+	return 0;
+}
+
+static int check_pt_note_segment(struct file *file,
+				 unsigned long note_size, loff_t *pos,
+				 u32 align, int *shstk, int *ibt)
+{
+	int retval;
+	char *note_buf;
+
+	/*
+	 * Try to read in the whole PT_NOTE segment.
+	 */
+	note_buf = kmalloc(note_size, GFP_KERNEL);
+	if (!note_buf)
+		return -ENOMEM;
+	retval = kernel_read(file, note_buf, note_size, pos);
+	if (retval != note_size) {
+		kfree(note_buf);
+		return (retval < 0) ? retval : -EIO;
+	}
+
+	retval = find_cet(note_buf, note_size, align, shstk, ibt);
+	kfree(note_buf);
+	return retval;
+}
+
+#ifdef CONFIG_COMPAT
+static int check_pt_note_32(struct file *file, struct elf32_phdr *phdr,
+			    int phnum, int *shstk, int *ibt)
+{
+	int i;
+	int found = 0;
+
+	/*
+	 * Go through all PT_NOTE segments and find NT_GNU_PROPERTY_TYPE_0.
+	 */
+	for (i = 0; i < phnum; i++, phdr++) {
+		loff_t pos;
+
+		/*
+		 * NT_GNU_PROPERTY_TYPE_0 note is aligned to 4 bytes
+		 * in 32-bit binaries.
+		 */
+		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 4))
+			continue;
+
+		pos = phdr->p_offset;
+		found = check_pt_note_segment(file, phdr->p_filesz,
+					      &pos, phdr->p_align,
+					      shstk, ibt);
+		if (found)
+			break;
+	}
+	return found;
+}
+#endif
+
+#ifdef CONFIG_X86_64
+static int check_pt_note_64(struct file *file, struct elf64_phdr *phdr,
+			    int phnum, int *shstk, int *ibt)
+{
+	int found = 0;
+
+	/*
+	 * Go through all PT_NOTE segments and find NT_GNU_PROPERTY_TYPE_0.
+	 */
+	for (; phnum > 0; phnum--, phdr++) {
+		loff_t pos;
+
+		/*
+		 * NT_GNU_PROPERTY_TYPE_0 note is aligned to 8 bytes
+		 * in 64-bit binaries.
+		 */
+		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 8))
+			continue;
+
+		pos = phdr->p_offset;
+		found = check_pt_note_segment(file, phdr->p_filesz,
+					      &pos, phdr->p_align,
+					      shstk, ibt);
+
+		if (found)
+			break;
+	}
+	return found;
+}
+#endif
+
+int arch_setup_features(void *ehdr_p, void *phdr_p,
+			struct file *file, bool interp)
+{
+	int err = 0;
+	int shstk = 0;
+	int ibt = 0;
+
+	struct elf64_hdr *ehdr64 = ehdr_p;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return 0;
+
+	if (ehdr64->e_ident[EI_CLASS] == ELFCLASS64) {
+		struct elf64_phdr *phdr64 = phdr_p;
+
+		err = check_pt_note_64(file, phdr64, ehdr64->e_phnum,
+				       &shstk, &ibt);
+		if (err < 0)
+			goto out;
+	} else {
+#ifdef CONFIG_COMPAT
+		struct elf32_hdr *ehdr32 = ehdr_p;
+
+		if (ehdr32->e_ident[EI_CLASS] == ELFCLASS32) {
+			struct elf32_phdr *phdr32 = phdr_p;
+
+			err = check_pt_note_32(file, phdr32, ehdr32->e_phnum,
+					       &shstk, &ibt);
+			if (err < 0)
+				goto out;
+		}
+#endif
+	}
+
+	current->thread.cet.shstk_enabled = 0;
+	current->thread.cet.shstk_base = 0;
+	current->thread.cet.shstk_size = 0;
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (shstk) {
+			err = cet_setup_shstk();
+			if (err < 0)
+				goto out;
+		}
+	}
+out:
+	return err;
+}
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 4ad6f669fe34..9ddc6d01e779 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1081,6 +1081,22 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		goto out_free_dentry;
 	}
 
+#ifdef CONFIG_ARCH_HAS_PROGRAM_PROPERTIES
+
+	if (interpreter) {
+		retval = arch_setup_features(&loc->interp_elf_ex,
+					     interp_elf_phdata,
+					     interpreter, true);
+	} else {
+		retval = arch_setup_features(&loc->elf_ex,
+					     elf_phdata,
+					     bprm->file, false);
+	}
+
+	if (retval < 0)
+		goto out_free_dentry;
+#endif
+
 	if (elf_interpreter) {
 		unsigned long interp_map_addr = 0;
 
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index e2535d6dcec7..f69ed8702271 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -372,6 +372,7 @@ typedef struct elf64_shdr {
 #define NT_PRFPREG	2
 #define NT_PRPSINFO	3
 #define NT_TASKSTRUCT	4
+#define NT_GNU_PROPERTY_TYPE_0 5
 #define NT_AUXV		6
 /*
  * Note to userspace developers: size of NT_SIGINFO note may increase
-- 
2.15.1

* [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (4 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 18:48   ` Andy Lutomirski
  2018-06-07 14:38 ` [PATCH 07/10] mm: Prevent mprotect from changing " Yu-cheng Yu
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

This patch adds the following arch_prctl() operations:

ARCH_CET_STATUS:
	return the current CET status

ARCH_CET_DISABLE:
	disable CET features

ARCH_CET_LOCK:
	lock out CET features

ARCH_CET_EXEC:
	set CET features for exec()

ARCH_CET_ALLOC_SHSTK:
	allocate a new shadow stack

ARCH_CET_PUSH_SHSTK:
	put a return address on shadow stack

ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
the implementation of GLIBC ucontext related APIs.
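
As a rough sketch of how user space might query the status, assuming
the option value and the three-word buffer layout proposed in this
patch (features, exec features, exec shadow stack size) for a 64-bit
caller.  The struct and function names are hypothetical, and
ARCH_CET_STATUS is not in any released uapi header, so a kernel without
this patch is expected to fail the call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Values proposed by this patch; not part of a released uapi header. */
#define ARCH_CET_STATUS				0x3001
#define GNU_PROPERTY_X86_FEATURE_1_SHSTK	0x00000002

/* Hypothetical user-space view of the three words returned. */
struct cet_status {
	int shstk_enabled;		/* buf[0]: shadow stack is on now */
	int shstk_on_at_exec;		/* buf[1]: forced on for exec() */
	unsigned long exec_shstk_size;	/* buf[2]: size to use at exec() */
};

/* Decode the buffer filled in by ARCH_CET_STATUS (64-bit layout). */
struct cet_status decode_cet_status(const unsigned long buf[3])
{
	struct cet_status st;

	assert(buf != NULL);
	st.shstk_enabled = !!(buf[0] & GNU_PROPERTY_X86_FEATURE_1_SHSTK);
	st.shstk_on_at_exec = !!(buf[1] & GNU_PROPERTY_X86_FEATURE_1_SHSTK);
	st.exec_shstk_size = buf[2];
	return st;
}

/* Returns 0 on success, -1 with errno set on failure. */
int query_cet_status(struct cet_status *st)
{
#ifdef SYS_arch_prctl
	unsigned long buf[3] = { 0, 0, 0 };

	if (syscall(SYS_arch_prctl, ARCH_CET_STATUS, buf) != 0)
		return -1;
	*st = decode_cet_status(buf);
	return 0;
#else
	/* arch_prctl() only exists on x86. */
	(void)st;
	errno = ENOSYS;
	return -1;
#endif
}
```

A compat (32-bit) caller would pass three unsigned ints instead, as
handle_get_status() in this patch does for in_compat_syscall().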

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h        |   7 ++
 arch/x86/include/uapi/asm/prctl.h |  15 +++
 arch/x86/kernel/Makefile          |   2 +-
 arch/x86/kernel/cet.c             |  18 +++-
 arch/x86/kernel/cet_prctl.c       | 203 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/elf.c             |  24 ++++-
 arch/x86/kernel/process.c         |   7 ++
 7 files changed, 270 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/cet_prctl.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index c8fd87e13859..a2a53fe4d5e6 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -12,24 +12,31 @@ struct task_struct;
 struct cet_stat {
 	unsigned long	shstk_base;
 	unsigned long	shstk_size;
+	unsigned long	exec_shstk_size;
 	unsigned int	shstk_enabled:1;
+	unsigned int	locked:1;
+	unsigned int	exec_shstk:2;
 };
 
 #ifdef CONFIG_X86_INTEL_CET
+int prctl_cet(int option, unsigned long arg2);
 unsigned long cet_get_shstk_ptr(void);
 int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val);
 int cet_setup_shstk(void);
 int cet_setup_thread_shstk(struct task_struct *p);
+int cet_alloc_shstk(unsigned long *arg);
 void cet_disable_shstk(void);
 void cet_disable_free_shstk(struct task_struct *p);
 int cet_restore_signal(unsigned long ssp);
 int cet_setup_signal(int ia32, unsigned long addr);
 #else
+static inline int prctl_cet(int option, unsigned long arg2) { return 0; }
 static inline unsigned long cet_get_shstk_ptr(void) { return 0; }
 static inline int cet_push_shstk(int ia32, unsigned long ssp,
 				 unsigned long val) { return 0; }
 static inline int cet_setup_shstk(void) { return 0; }
 static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; }
+static inline int cet_alloc_shstk(unsigned long *arg) { return -EINVAL; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
 static inline int cet_restore_signal(unsigned long ssp) { return 0; }
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 5a6aac9fa41f..f9965403b655 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -14,4 +14,19 @@
 #define ARCH_MAP_VDSO_32	0x2002
 #define ARCH_MAP_VDSO_64	0x2003
 
+#define ARCH_CET_STATUS		0x3001
+#define ARCH_CET_DISABLE	0x3002
+#define ARCH_CET_LOCK		0x3003
+#define ARCH_CET_EXEC		0x3004
+#define ARCH_CET_ALLOC_SHSTK	0x3005
+#define ARCH_CET_PUSH_SHSTK	0x3006
+
+/*
+ * Settings for ARCH_CET_EXEC
+ */
+#define CET_EXEC_ELF_PROPERTY	0
+#define CET_EXEC_ALWAYS_OFF	1
+#define CET_EXEC_ALWAYS_ON	2
+#define CET_EXEC_MAX CET_EXEC_ALWAYS_ON
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cbf983f44b61..80464f925a6a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -138,7 +138,7 @@ obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
-obj-$(CONFIG_X86_INTEL_CET)		+= cet.o
+obj-$(CONFIG_X86_INTEL_CET)		+= cet.o cet_prctl.o
 
 obj-$(CONFIG_ARCH_HAS_PROGRAM_PROPERTIES) += elf.o
 
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 156f5d88ffd5..1b7089dcf1ea 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -83,6 +83,19 @@ static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
 	return addr;
 }
 
+int cet_alloc_shstk(unsigned long *arg)
+{
+	unsigned long size = *arg;
+	unsigned long addr;
+
+	addr = shstk_mmap(0, size);
+	if (addr >= TASK_SIZE)
+		return -ENOMEM;
+
+	*arg = addr;
+	return 0;
+}
+
 int cet_setup_shstk(void)
 {
 	unsigned long addr, size;
@@ -90,7 +103,10 @@ int cet_setup_shstk(void)
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
 		return -EOPNOTSUPP;
 
-	size = SHSTK_SIZE;
+	size = current->thread.cet.exec_shstk_size;
+	if ((size > TASK_SIZE) || (size == 0))
+		size = SHSTK_SIZE;
+
 	addr = shstk_mmap(0, size);
 
 	if (addr >= TASK_SIZE)
diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c
new file mode 100644
index 000000000000..326996e2ea80
--- /dev/null
+++ b/arch/x86/kernel/cet_prctl.c
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/errno.h>
+#include <linux/uaccess.h>
+#include <linux/prctl.h>
+#include <linux/compat.h>
+#include <asm/processor.h>
+#include <asm/prctl.h>
+#include <asm/elf.h>
+#include <asm/elf_property.h>
+#include <asm/cet.h>
+
+/*
+ * Handler of prctl for CET:
+ *
+ * ARCH_CET_STATUS: return the current status
+ * ARCH_CET_DISABLE: disable features
+ * ARCH_CET_LOCK: lock out cet features until exec()
+ * ARCH_CET_EXEC: set default features for exec()
+ * ARCH_CET_ALLOC_SHSTK: allocate shadow stack
+ * ARCH_CET_PUSH_SHSTK: put a return address on shadow stack
+ */
+
+static int handle_get_status(unsigned long arg2)
+{
+	unsigned int features = 0, cet_exec = 0;
+	unsigned long shstk_size = 0;
+
+	if (current->thread.cet.shstk_enabled)
+		features |= GNU_PROPERTY_X86_FEATURE_1_SHSTK;
+	if (current->thread.cet.exec_shstk == CET_EXEC_ALWAYS_ON)
+		cet_exec |= GNU_PROPERTY_X86_FEATURE_1_SHSTK;
+	shstk_size = current->thread.cet.exec_shstk_size;
+
+	if (in_compat_syscall()) {
+		unsigned int buf[3];
+
+		buf[0] = features;
+		buf[1] = cet_exec;
+		buf[2] = (unsigned int)shstk_size;
+		return copy_to_user((unsigned int __user *)arg2, buf,
+				    sizeof(buf));
+	} else {
+		unsigned long buf[3];
+
+		buf[0] = (unsigned long)features;
+		buf[1] = (unsigned long)cet_exec;
+		buf[2] = shstk_size;
+		return copy_to_user((unsigned long __user *)arg2, buf,
+				    sizeof(buf));
+	}
+}
+
+static int handle_set_exec(unsigned long arg2)
+{
+	unsigned int features = 0, cet_exec = 0;
+	unsigned long shstk_size = 0;
+	int err = 0;
+
+	if (in_compat_syscall()) {
+		unsigned int buf[3];
+
+		err = copy_from_user(buf, (unsigned int __user *)arg2,
+				     sizeof(buf));
+		if (!err) {
+			features = buf[0];
+			cet_exec = buf[1];
+			shstk_size = (unsigned long)buf[2];
+		}
+	} else {
+		unsigned long buf[3];
+
+		err = copy_from_user(buf, (unsigned long __user *)arg2,
+				     sizeof(buf));
+		if (!err) {
+			features = (unsigned int)buf[0];
+			cet_exec = (unsigned int)buf[1];
+			shstk_size = buf[2];
+		}
+	}
+
+	if (err)
+		return -EFAULT;
+	if (cet_exec > CET_EXEC_MAX)
+		return -EINVAL;
+	if (shstk_size >= TASK_SIZE)
+		return -EINVAL;
+
+	if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK) {
+		if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+			return -EINVAL;
+		if ((current->thread.cet.exec_shstk == CET_EXEC_ALWAYS_ON) &&
+		    (cet_exec != CET_EXEC_ALWAYS_ON))
+			return -EPERM;
+	}
+
+	if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+		current->thread.cet.exec_shstk = cet_exec;
+
+	current->thread.cet.exec_shstk_size = shstk_size;
+	return 0;
+}
+
+static int handle_push_shstk(unsigned long arg2)
+{
+	unsigned long ssp = 0, ret_addr = 0;
+	int ia32, err;
+
+	ia32 = in_ia32_syscall();
+
+	if (ia32) {
+		unsigned int buf[2];
+
+		err = copy_from_user(buf, (unsigned int __user *)arg2,
+				     sizeof(buf));
+		if (!err) {
+			ssp = (unsigned long)buf[0];
+			ret_addr = (unsigned long)buf[1];
+		}
+	} else {
+		unsigned long buf[2];
+
+		err = copy_from_user(buf, (unsigned long __user *)arg2,
+				     sizeof(buf));
+		if (!err) {
+			ssp = buf[0];
+			ret_addr = buf[1];
+		}
+	}
+	if (err)
+		return -EFAULT;
+	err = cet_push_shstk(ia32, ssp, ret_addr);
+	if (err)
+		return -err;
+	return 0;
+}
+
+static int handle_alloc_shstk(unsigned long arg2)
+{
+	int err = 0;
+	unsigned long shstk_size = 0;
+
+	if (in_ia32_syscall()) {
+		unsigned int size;
+
+		err = get_user(size, (unsigned int __user *)arg2);
+		if (!err)
+			shstk_size = size;
+	} else {
+		err = get_user(shstk_size, (unsigned long __user *)arg2);
+	}
+
+	if (err)
+		return -EFAULT;
+
+	err = cet_alloc_shstk(&shstk_size);
+	if (err)
+		return -err;
+
+	if (in_ia32_syscall()) {
+		if (put_user(shstk_size, (unsigned int __user *)arg2))
+			return -EFAULT;
+	} else {
+		if (put_user(shstk_size, (unsigned long __user *)arg2))
+			return -EFAULT;
+	}
+	return 0;
+}
+
+int prctl_cet(int option, unsigned long arg2)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -EINVAL;
+
+	switch (option) {
+	case ARCH_CET_STATUS:
+		return handle_get_status(arg2);
+
+	case ARCH_CET_DISABLE:
+		if (current->thread.cet.locked)
+			return -EPERM;
+		if (arg2 & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
+			cet_disable_free_shstk(current);
+
+		return 0;
+
+	case ARCH_CET_LOCK:
+		current->thread.cet.locked = 1;
+		return 0;
+
+	case ARCH_CET_EXEC:
+		return handle_set_exec(arg2);
+
+	case ARCH_CET_ALLOC_SHSTK:
+		return handle_alloc_shstk(arg2);
+
+	case ARCH_CET_PUSH_SHSTK:
+		return handle_push_shstk(arg2);
+
+	default:
+		return -EINVAL;
+	}
+}
diff --git a/arch/x86/kernel/elf.c b/arch/x86/kernel/elf.c
index 8e2719d8dc86..de08d41971f6 100644
--- a/arch/x86/kernel/elf.c
+++ b/arch/x86/kernel/elf.c
@@ -8,7 +8,10 @@
 
 #include <asm/cet.h>
 #include <asm/elf_property.h>
+#include <asm/prctl.h>
+#include <asm/processor.h>
 #include <uapi/linux/elf-em.h>
+#include <uapi/linux/prctl.h>
 #include <linux/binfmts.h>
 #include <linux/elf.h>
 #include <linux/slab.h>
@@ -208,13 +211,26 @@ int arch_setup_features(void *ehdr_p, void *phdr_p,
 	current->thread.cet.shstk_enabled = 0;
 	current->thread.cet.shstk_base = 0;
 	current->thread.cet.shstk_size = 0;
+	current->thread.cet.locked = 0;
 	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
-		if (shstk) {
-			err = cet_setup_shstk();
-			if (err < 0)
-				goto out;
+		int exec = current->thread.cet.exec_shstk;
+
+		if (exec != CET_EXEC_ALWAYS_OFF) {
+			if (shstk || (exec == CET_EXEC_ALWAYS_ON)) {
+				err = cet_setup_shstk();
+				if (err < 0)
+					goto out;
+			}
 		}
 	}
+
+	/*
+	 * Lockout CET features if no interpreter
+	 */
+	if (!interp)
+		current->thread.cet.locked = 1;
+
+	err = 0;
 out:
 	return err;
 }
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ae56caee41f9..54ad1863c6d2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -794,6 +794,13 @@ long do_arch_prctl_common(struct task_struct *task, int option,
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
 		return set_cpuid_mode(task, cpuid_enabled);
+	case ARCH_CET_STATUS:
+	case ARCH_CET_DISABLE:
+	case ARCH_CET_LOCK:
+	case ARCH_CET_EXEC:
+	case ARCH_CET_ALLOC_SHSTK:
+	case ARCH_CET_PUSH_SHSTK:
+		return prctl_cet(option, cpuid_enabled);
 	}
 
 	return -EINVAL;
-- 
2.15.1

* [PATCH 07/10] mm: Prevent mprotect from changing shadow stack
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (5 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 14:38 ` [PATCH 08/10] mm: Prevent mremap of " Yu-cheng Yu
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/mprotect.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 625608bc8962..128dcb880c12 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -446,6 +446,15 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	error = -ENOMEM;
 	if (!vma)
 		goto out;
+
+	/*
+	 * Do not allow changing shadow stack memory.
+	 */
+	if (vma->vm_flags & VM_SHSTK) {
+		error = -EINVAL;
+		goto out;
+	}
+
 	prev = vma->vm_prev;
 	if (unlikely(grows & PROT_GROWSDOWN)) {
 		if (vma->vm_start >= end)
-- 
2.15.1

* [PATCH 08/10] mm: Prevent mremap of shadow stack
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (6 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 07/10] mm: Prevent mprotect from changing " Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 18:48   ` Andy Lutomirski
  2018-06-07 14:38 ` [PATCH 09/10] mm: Prevent madvise from changing " Yu-cheng Yu
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/mremap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 049470aa1e3e..70f20edb248e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -525,7 +525,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		unsigned long, new_addr)
 {
 	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma;
+	struct vm_area_struct *vma = find_vma(mm, addr);
 	unsigned long ret = -EINVAL;
 	unsigned long charged = 0;
 	bool locked = false;
@@ -533,6 +533,9 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	LIST_HEAD(uf_unmap_early);
 	LIST_HEAD(uf_unmap);
 
+	if (vma->vm_flags & VM_SHSTK)
+		return ret;
+
 	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
 		return ret;
 
-- 
2.15.1

* [PATCH 09/10] mm: Prevent madvise from changing shadow stack
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (7 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 08/10] mm: Prevent mremap of " Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 20:54   ` Andy Lutomirski
  2018-06-07 21:09   ` Nadav Amit
  2018-06-07 14:38 ` [PATCH 10/10] mm: Prevent munmap and remap_file_pages of " Yu-cheng Yu
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/madvise.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 4d3c922ea1a1..2a6988badd6b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -839,6 +839,14 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 	if (vma && start > vma->vm_start)
 		prev = vma;
 
+	/*
+	 * Don't do anything on shadow stack.
+	 */
+	if (vma->vm_flags & VM_SHSTK) {
+		error = -EINVAL;
+		goto out_no_plug;
+	}
+
 	blk_start_plug(&plug);
 	for (;;) {
 		/* Still start < end. */
@@ -876,6 +884,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 	}
 out:
 	blk_finish_plug(&plug);
+out_no_plug:
 	if (write)
 		up_write(&current->mm->mmap_sem);
 	else
-- 
2.15.1

* [PATCH 10/10] mm: Prevent munmap and remap_file_pages of shadow stack
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (8 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 09/10] mm: Prevent madvise from changing " Yu-cheng Yu
@ 2018-06-07 14:38 ` Yu-cheng Yu
  2018-06-07 18:50   ` Andy Lutomirski
  2018-06-12 10:56 ` [PATCH 00/10] Control Flow Enforcement - Part (3) Balbir Singh
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 14:38 UTC (permalink / raw)
  To: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz
  Cc: Yu-cheng Yu

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/mmap.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index fc41c0543d7f..e7d1fcb7ec58 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2810,6 +2810,16 @@ EXPORT_SYMBOL(vm_munmap);
 
 SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
 {
+	struct vm_area_struct *vma;
+
+	/* Do not munmap shadow stack */
+	down_read(&current->mm->mmap_sem);
+	vma = find_vma(current->mm, addr);
+	if (vma && (vma->vm_flags & VM_SHSTK)) {
+		up_read(&current->mm->mmap_sem);
+		return -EINVAL;
+	}
+	up_read(&current->mm->mmap_sem);
 	profile_munmap(addr);
 	return vm_munmap(addr, len);
 }
@@ -2851,6 +2861,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (!vma || !(vma->vm_flags & VM_SHARED))
 		goto out;
 
+	if (vma->vm_flags & VM_SHSTK)
+		goto out;
+
 	if (start < vma->vm_start)
 		goto out;
 
-- 
2.15.1

* Re: [PATCH 01/10] x86/cet: User-mode shadow stack support
  2018-06-07 14:37 ` [PATCH 01/10] x86/cet: User-mode shadow stack support Yu-cheng Yu
@ 2018-06-07 16:37   ` Andy Lutomirski
  2018-06-07 17:46     ` Yu-cheng Yu
  2018-06-12 11:56   ` Balbir Singh
  1 sibling, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 16:37 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> This patch adds basic shadow stack enabling/disabling routines.
> A task's shadow stack is allocated from memory with VM_SHSTK
> flag set and read-only protection.  The shadow stack is
> allocated to a fixed size and that can be changed by the system
> admin.

How do threads work?  Can a user program mremap() its shadow stack to
make it bigger?

Also, did you add all the needed checks to make get_user_pages(),
access_process_vm(), etc fail when called on the shadow stack?  (Or at
least fail if they're requesting write access and the FORCE bit isn't
set.)

> +#define SHSTK_SIZE (0x8000 * (test_thread_flag(TIF_IA32) ? 4 : 8))

Please don't add more mode-dependent #defines.  Also, please try to
avoid adding any new code that looks at TIF_IA32 or similar.  Uses of
that bit are generally bugs, and the bit itself should get removed
some day.  If you need to make a guess, use in_compat_syscall() or
similar if appropriate.

> +
> +static inline int cet_set_shstk_ptr(unsigned long addr)
> +{
> +       u64 r;
> +
> +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +               return -1;
> +
> +       if ((addr >= TASK_SIZE) || (!IS_ALIGNED(addr, 4)))
> +               return -1;

TASK_SIZE_MAX, please.  TASK_SIZE is weird and is usually the wrong
thing to use.

> +static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
> +{
> +       struct mm_struct *mm = current->mm;
> +       unsigned long populate;
> +
> +       down_write(&mm->mmap_sem);
> +       addr = do_mmap(NULL, addr, len, PROT_READ,
> +                      MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
> +                      0, &populate, NULL);
> +       up_write(&mm->mmap_sem);
> +
> +       if (populate)
> +               mm_populate(addr, populate);

Please don't populate if do_mmap() failed.

> +int cet_setup_shstk(void)
> +{
> +       unsigned long addr, size;
> +
> +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +               return -EOPNOTSUPP;
> +
> +       size = SHSTK_SIZE;
> +       addr = shstk_mmap(0, size);
> +
> +       if (addr >= TASK_SIZE)
> +               return -ENOMEM;

Please check the actual value that do_mmap() would return on error.
(IS_ERR, 0, MAP_FAILED -- I don't remember.)
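
For reference, a minimal userspace sketch of both points above: skip the
populate step when the mapping failed, and test the result with the
error-value convention instead of comparing against TASK_SIZE.  It assumes
do_mmap() reports failure as a negative errno cast to unsigned long, which
is what IS_ERR_VALUE() tests for.  fake_do_mmap(), fake_mm_populate() and
the *_sketch names are stand-ins invented here, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Same idea as the kernel macro: the top 4095 values encode -errno. */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static int populate_calls;

/* Hypothetical stand-in for do_mmap(): an address or an encoded -errno. */
static unsigned long fake_do_mmap(bool fail, unsigned long *populate)
{
	assert(populate != NULL);
	*populate = 0;
	if (fail)
		return (unsigned long)-ENOMEM;
	*populate = 0x8000;
	return 0x70000000UL;
}

/* Hypothetical stand-in for mm_populate(); only counts its calls. */
static void fake_mm_populate(unsigned long addr, unsigned long len)
{
	(void)addr;
	(void)len;
	populate_calls++;
}

static unsigned long shstk_mmap_sketch(bool fail)
{
	unsigned long populate = 0;
	unsigned long addr = fake_do_mmap(fail, &populate);

	/* Populate only when the mapping actually exists. */
	if (!IS_ERR_VALUE(addr) && populate)
		fake_mm_populate(addr, populate);
	return addr;
}

static int setup_shstk_sketch(bool fail)
{
	unsigned long addr = shstk_mmap_sketch(fail);

	/* Pass the real error code up instead of guessing -ENOMEM. */
	if (IS_ERR_VALUE(addr))
		return (int)(long)addr;
	return 0;
}
```

With this shape the caller sees whatever errno the mapping code chose, and
the populate step is never handed an error value as an address.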

> +
> +       cet_set_shstk_ptr(addr + size - sizeof(void *));
> +       current->thread.cet.shstk_base = addr;
> +       current->thread.cet.shstk_size = size;
> +       current->thread.cet.shstk_enabled = 1;
> +       return 0;
> +}
> +
> +void cet_disable_shstk(void)
> +{
> +       u64 r;
> +
> +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +               return;
> +
> +       rdmsrl(MSR_IA32_U_CET, r);
> +       r &= ~(MSR_IA32_CET_SHSTK_EN);
> +       wrmsrl(MSR_IA32_U_CET, r);
> +       wrmsrl(MSR_IA32_PL3_SSP, 0);
> +       current->thread.cet.shstk_enabled = 0;
> +}
> +
> +void cet_disable_free_shstk(struct task_struct *tsk)
> +{
> +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
> +           !tsk->thread.cet.shstk_enabled)
> +               return;
> +
> +       if (tsk == current)
> +               cet_disable_shstk();

If tsk != current, then this will malfunction, right?  What is it
intended to do?

> +
> +       /*
> +        * Free only when tsk is current or shares mm
> +        * with current but has its own shstk.
> +        */
> +       if (tsk->mm && (tsk->mm == current->mm) &&
> +           (tsk->thread.cet.shstk_base)) {
> +               vm_munmap(tsk->thread.cet.shstk_base,
> +                         tsk->thread.cet.shstk_size);
> +               tsk->thread.cet.shstk_base = 0;
> +               tsk->thread.cet.shstk_size = 0;
> +       }

I'm having trouble imagining why the kernel would ever want to
automatically free the shadow stack vma.  What is this for?


* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-07 14:37 ` [PATCH 02/10] x86/cet: Introduce WRUSS instruction Yu-cheng Yu
@ 2018-06-07 16:40   ` Andy Lutomirski
  2018-06-07 16:51     ` Yu-cheng Yu
                       ` (2 more replies)
  2018-06-14  1:30   ` Balbir Singh
  1 sibling, 3 replies; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 16:40 UTC (permalink / raw)
  To: Yu-cheng Yu, Peter Zijlstra
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> WRUSS is a new kernel-mode instruction but writes directly
> to user shadow stack memory.  This is used to construct
> a return address on the shadow stack for the signal
> handler.
>
> This instruction can fault if the user shadow stack is
> invalid shadow stack memory.  In that case, the kernel does
> fixup.
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/include/asm/special_insns.h          | 44 +++++++++++++++++++++++++++
>  arch/x86/lib/x86-opcode-map.txt               |  2 +-
>  arch/x86/mm/fault.c                           | 13 +++++++-
>  tools/objtool/arch/x86/lib/x86-opcode-map.txt |  2 +-
>  4 files changed, 58 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 317fc59b512c..8ce532fcc171 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -237,6 +237,50 @@ static inline void clwb(volatile void *__p)
>                 : [pax] "a" (p));
>  }
>
> +#ifdef CONFIG_X86_INTEL_CET
> +
> +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
> +static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
> +{
> +       int err;
> +

Please add a comment indicating what exact opcode this is.
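
As an aid for writing that comment: the last byte of the .byte sequence
below, 0x37, is a ModRM byte.  A minimal sketch that decodes it, assuming
the standard x86 layout (mod in bits 7:6, reg in bits 5:3, rm in bits 2:0).
decode_modrm() and gpr_name() are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte. */
struct modrm {
	unsigned int mod;
	unsigned int reg;
	unsigned int rm;
};

static struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m;

	m.mod = byte >> 6;
	m.reg = (byte >> 3) & 7;
	m.rm = byte & 7;
	return m;
}

/* Base names of the eight legacy general-purpose registers, by encoding. */
static const char *gpr_name(unsigned int n)
{
	static const char *const names[8] = {
		"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
	};

	assert(n < 8);
	return names[n];
}
```

For 0x37 this gives mod = 0 (plain register-indirect memory operand),
reg = 6 (si) and rm = 7 (di), which matches the "S" and "D" constraints in
the asm.  If I am decoding it right, the comment could read
"wrussd %esi,(%rdi)" for the 32-bit form and "wrussq %rsi,(%rdi)" for the
form carrying the 0x48 REX.W prefix.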

Peterz, isn't there some fancy better way we're supposed to handle the
error return these days?

> +       asm volatile("1:.byte 0x66, 0x0f, 0x38, 0xf5, 0x37\n"
> +                    "xor %[err],%[err]\n"
> +                    "2:\n"
> +                    ".section .fixup,\"ax\"\n"
> +                    "3: mov $-1,%[err]; jmp 2b\n"
> +                    ".previous\n"
> +                    _ASM_EXTABLE(1b, 3b)
> +               : [err] "=a" (err)
> +               : [val] "S" (val), [addr] "D" (addr)
> +               : "memory");
> +       return err;
> +}
> +#else
> +static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
> +{
> +       return 0;

BUG()?  Or just omit the ifdef?  It seems unhelpful to have a stub
function that does nothing.

> +}
> +#endif
> +
> +static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
> +{
> +       int err;
> +

Comment here too, please.

> +       asm volatile("1:.byte 0x66, 0x48, 0x0f, 0x38, 0xf5, 0x37\n"
> +                    "xor %[err],%[err]\n"
> +                    "2:\n"
> +                    ".section .fixup,\"ax\"\n"
> +                    "3: mov $-1,%[err]; jmp 2b\n"
> +                    ".previous\n"
> +                    _ASM_EXTABLE(1b, 3b)
> +               : [err] "=a" (err)
> +               : [val] "S" (val), [addr] "D" (addr)
> +               : "memory");
> +       return err;
> +}
> +#endif /* CONFIG_X86_INTEL_CET */
> +
>  #define nop() asm volatile ("nop")
>
>


* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-07 16:40   ` Andy Lutomirski
@ 2018-06-07 16:51     ` Yu-cheng Yu
  2018-06-07 18:41     ` Peter Zijlstra
  2018-06-11  8:17     ` Peter Zijlstra
  2 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 16:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 09:40 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > WRUSS is a new kernel-mode instruction but writes directly
> > to user shadow stack memory.  This is used to construct
> > a return address on the shadow stack for the signal
> > handler.
> >
> > This instruction can fault if the user shadow stack is
> > invalid shadow stack memory.  In that case, the kernel does
> > fixup.
> >
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> >  arch/x86/include/asm/special_insns.h          | 44 +++++++++++++++++++++++++++
> >  arch/x86/lib/x86-opcode-map.txt               |  2 +-
> >  arch/x86/mm/fault.c                           | 13 +++++++-
> >  tools/objtool/arch/x86/lib/x86-opcode-map.txt |  2 +-
> >  4 files changed, 58 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index 317fc59b512c..8ce532fcc171 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -237,6 +237,50 @@ static inline void clwb(volatile void *__p)
> >                 : [pax] "a" (p));
> >  }
> >
> > +#ifdef CONFIG_X86_INTEL_CET
> > +
> > +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
> > +static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
> > +{
> > +       int err;
> > +
> 
> Please add a comment indicating what exact opcode this is.

I will fix it.

> 
> Peterz, isn't there some fancy better way we're supposed to handle the
> error return these days?
> 
> > +       asm volatile("1:.byte 0x66, 0x0f, 0x38, 0xf5, 0x37\n"
> > +                    "xor %[err],%[err]\n"
> > +                    "2:\n"
> > +                    ".section .fixup,\"ax\"\n"
> > +                    "3: mov $-1,%[err]; jmp 2b\n"
> > +                    ".previous\n"
> > +                    _ASM_EXTABLE(1b, 3b)
> > +               : [err] "=a" (err)
> > +               : [val] "S" (val), [addr] "D" (addr)
> > +               : "memory");
> > +       return err;
> > +}
> > +#else
> > +static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
> > +{
> > +       return 0;
> 
> BUG()?  Or just omit the ifdef?  It seems unhelpful to have a stub
> function that does nothing.

I will fix it.

> 
> > +}
> > +#endif
> > +
> > +static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
> > +{
> > +       int err;
> > +
> 
> Comment here too, please.

OK.

> 
> > +       asm volatile("1:.byte 0x66, 0x48, 0x0f, 0x38, 0xf5, 0x37\n"
> > +                    "xor %[err],%[err]\n"
> > +                    "2:\n"
> > +                    ".section .fixup,\"ax\"\n"
> > +                    "3: mov $-1,%[err]; jmp 2b\n"
> > +                    ".previous\n"
> > +                    _ASM_EXTABLE(1b, 3b)
> > +               : [err] "=a" (err)
> > +               : [val] "S" (val), [addr] "D" (addr)
> > +               : "memory");
> > +       return err;
> > +}
> > +#endif /* CONFIG_X86_INTEL_CET */
> > +
> >  #define nop() asm volatile ("nop")
> >
> >


* Re: [PATCH 01/10] x86/cet: User-mode shadow stack support
  2018-06-07 16:37   ` Andy Lutomirski
@ 2018-06-07 17:46     ` Yu-cheng Yu
  2018-06-07 17:55       ` Dave Hansen
  2018-06-07 18:23       ` Andy Lutomirski
  0 siblings, 2 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 17:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 09:37 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > This patch adds basic shadow stack enabling/disabling routines.
> > A task's shadow stack is allocated from memory with VM_SHSTK
> > flag set and read-only protection.  The shadow stack is
> > allocated to a fixed size and that can be changed by the system
> > admin.
> 
> How do threads work?  Can a user program mremap() its shadow stack to
> make it bigger?

A pthread's shadow stack is allocated/freed by the kernel.  This patch
has the supporting routines that handle both non-pthread and pthread.

In [PATCH 04/10] "Handle thread shadow stack", we allocate pthread
shadow stack in copy_thread_tls(), and free it in deactivate_mm().

If clone of a pthread fails, shadow stack is freed in
cet_disable_free_shstk() below (I will add more comments):

If (Current thread exiting)
	Disable and free shadow stack

If (Clone of a pthread fails)
	Free the pthread shadow stack

We block mremap, mprotect, madvise, and munmap on a vma that has
VM_SHSTK (in separate patches).

> Also, did you add all the needed checks to make get_user_pages(),
> access_process_vm(), etc fail when called on the shadow stack?  (Or at
> least fail if they're requesting write access and the FORCE bit isn't
> set.)

Currently, if the FORCE bit is set, these functions can write to the
shadow stack; otherwise write access fails.  I will test it.

> > +#define SHSTK_SIZE (0x8000 * (test_thread_flag(TIF_IA32) ? 4 : 8))
> 
> Please don't add more mode-dependent #defines.  Also, please try to
> avoid adding any new code that looks at TIF_IA32 or similar.  Uses of
> that bit are generally bugs, and the bit itself should get removed
> some day.  If you need to make a guess, use in_compat_syscall() or
> similar if appropriate.

OK.

> 
> > +
> > +static inline int cet_set_shstk_ptr(unsigned long addr)
> > +{
> > +       u64 r;
> > +
> > +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +               return -1;
> > +
> > +       if ((addr >= TASK_SIZE) || (!IS_ALIGNED(addr, 4)))
> > +               return -1;
> 
> TASK_SIZE_MAX, please.  TASK_SIZE is weird and is usually the wrong
> thing to use.

OK.

> 
> > +static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
> > +{
> > +       struct mm_struct *mm = current->mm;
> > +       unsigned long populate;
> > +
> > +       down_write(&mm->mmap_sem);
> > +       addr = do_mmap(NULL, addr, len, PROT_READ,
> > +                      MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
> > +                      0, &populate, NULL);
> > +       up_write(&mm->mmap_sem);
> > +
> > +       if (populate)
> > +               mm_populate(addr, populate);
> 
> Please don't populate if do_mmap() failed.

I will fix it.

> 
> > +int cet_setup_shstk(void)
> > +{
> > +       unsigned long addr, size;
> > +
> > +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +               return -EOPNOTSUPP;
> > +
> > +       size = SHSTK_SIZE;
> > +       addr = shstk_mmap(0, size);
> > +
> > +       if (addr >= TASK_SIZE)
> > +               return -ENOMEM;
> 
> Please check the actual value that do_mmap() would return on error.
> (IS_ERR, 0, MAP_FAILED -- I don't remember.)

OK.

> 
> > +
> > +       cet_set_shstk_ptr(addr + size - sizeof(void *));
> > +       current->thread.cet.shstk_base = addr;
> > +       current->thread.cet.shstk_size = size;
> > +       current->thread.cet.shstk_enabled = 1;
> > +       return 0;
> > +}
> > +
> > +void cet_disable_shstk(void)
> > +{
> > +       u64 r;
> > +
> > +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +               return;
> > +
> > +       rdmsrl(MSR_IA32_U_CET, r);
> > +       r &= ~(MSR_IA32_CET_SHSTK_EN);
> > +       wrmsrl(MSR_IA32_U_CET, r);
> > +       wrmsrl(MSR_IA32_PL3_SSP, 0);
> > +       current->thread.cet.shstk_enabled = 0;
> > +}
> > +
> > +void cet_disable_free_shstk(struct task_struct *tsk)
> > +{
> > +       if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
> > +           !tsk->thread.cet.shstk_enabled)
> > +               return;
> > +
> > +       if (tsk == current)
> > +               cet_disable_shstk();
> 
> If tsk != current, then this will malfunction, right?  What is it
> intended to do?

We get here when clone fails.  In this condition, we don't disable the
calling task's shadow stack.  I will add comments.

> 
> > +
> > +       /*
> > +        * Free only when tsk is current or shares mm
> > +        * with current but has its own shstk.
> > +        */
> > +       if (tsk->mm && (tsk->mm == current->mm) &&
> > +           (tsk->thread.cet.shstk_base)) {
> > +               vm_munmap(tsk->thread.cet.shstk_base,
> > +                         tsk->thread.cet.shstk_size);
> > +               tsk->thread.cet.shstk_base = 0;
> > +               tsk->thread.cet.shstk_size = 0;
> > +       }
> 
> I'm having trouble imagining why the kernel would ever want to
> automatically free the shadow stack vma.  What is this for?

This is for pthreads.  When a pthread exits, its shadow stack needs to
be freed.


* Re: [PATCH 01/10] x86/cet: User-mode shadow stack support
  2018-06-07 17:46     ` Yu-cheng Yu
@ 2018-06-07 17:55       ` Dave Hansen
  2018-06-07 18:23       ` Andy Lutomirski
  1 sibling, 0 replies; 98+ messages in thread
From: Dave Hansen @ 2018-06-07 17:55 UTC (permalink / raw)
  To: Yu-cheng Yu, Andy Lutomirski
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	mike.kravetz

On 06/07/2018 10:46 AM, Yu-cheng Yu wrote:
>> Also, did you add all the needed checks to make get_user_pages(),
>> access_process_vm(), etc fail when called on the shadow stack?  (Or at
>> least fail if they're requesting write access and the FORCE bit isn't
>> set.)
> Currently if FORCE bit is set, these functions can write to shadow
> stack, otherwise write access will fail.  I will test it.

Is this a part of your selftests/ for this feature?


* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
  2018-06-07 14:38 ` [PATCH 04/10] x86/cet: Handle thread " Yu-cheng Yu
@ 2018-06-07 18:21   ` Andy Lutomirski
  2018-06-07 19:47     ` Florian Weimer
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 18:21 UTC (permalink / raw)
  To: Yu-cheng Yu, Florian Weimer
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> needs a separate program stack and a separate shadow stack.
> This patch handles allocation and freeing of the thread shadow
> stack.

Aha -- you're trying to make this automatic.  I'm not convinced this
is a good idea.  The Linux kernel has a long and storied history of
enabling new hardware features in ways that are almost entirely
useless for userspace.

Florian, do you have any thoughts on how the user/kernel interaction
for the shadow stack should work?  My intuition would be that all
shadow stack management should be entirely controlled by userspace --
newly cloned threads (with CLONE_VM) should have no shadow stack
initially, and newly started processes should have no shadow stack
until they ask for one.  If it would be needed for optimization, there
could be some indication in an ELF binary that it is requesting an
initial shadow stack.

But maybe some kind of automation like this patch does is actually reasonable.

--Andy


* Re: [PATCH 01/10] x86/cet: User-mode shadow stack support
  2018-06-07 17:46     ` Yu-cheng Yu
  2018-06-07 17:55       ` Dave Hansen
@ 2018-06-07 18:23       ` Andy Lutomirski
  1 sibling, 0 replies; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 18:23 UTC (permalink / raw)
  To: Yu-cheng Yu, Florian Weimer
  Cc: Andrew Lutomirski, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 10:50 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> On Thu, 2018-06-07 at 09:37 -0700, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >
> > > This patch adds basic shadow stack enabling/disabling routines.
> > > A task's shadow stack is allocated from memory with VM_SHSTK
> > > flag set and read-only protection.  The shadow stack is
> > > allocated to a fixed size and that can be changed by the system
> > > admin.
> >
> > How do threads work?  Can a user program mremap() its shadow stack to
> > make it bigger?
>
> A pthread's shadow stack is allocated/freed by the kernel.  This patch
> has the supporting routines that handle both non-pthread and pthread.
>
> In [PATCH 04/10] "Handle thread shadow stack", we allocate pthread
> shadow stack in copy_thread_tls(), and free it in deactivate_mm().
>
> If clone of a pthread fails, shadow stack is freed in
> cet_disable_free_shstk() below (I will add more comments):
>
> If (Current thread exiting)
>         Disable and free shadow stack
>
> If (Clone of a pthread fails)
>         Free the pthread shadow stack
>
> We block mremap, mprotect, madvise, and munmap on a vma that has
> VM_SHSTK (in separate patches).

Why?  mremap() seems like a sensible way to enlarge a shadow stack.
munmap() seems like a good way to get rid of one, and mmap() seems
like a nice way to create a new shadow stack if one were needed (for
green threads or similar).


* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 14:38 ` [PATCH 03/10] x86/cet: Signal handling for shadow stack Yu-cheng Yu
@ 2018-06-07 18:30   ` Andy Lutomirski
  2018-06-07 18:58     ` Florian Weimer
                       ` (2 more replies)
  0 siblings, 3 replies; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 18:30 UTC (permalink / raw)
  To: Yu-cheng Yu, Florian Weimer, Dmitry Safonov, Cyrill Gorcunov
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> Set and restore shadow stack pointer for signals.

How does this interact with siglongjmp()?

This patch makes me extremely nervous due to the possibility of ABI
issues and CRIU breakage.

> diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> index 844d60eb1882..6c8997a0156a 100644
> --- a/arch/x86/include/uapi/asm/sigcontext.h
> +++ b/arch/x86/include/uapi/asm/sigcontext.h
> @@ -230,6 +230,7 @@ struct sigcontext_32 {
>         __u32                           fpstate; /* Zero when no FPU/extended context */
>         __u32                           oldmask;
>         __u32                           cr2;
> +       __u32                           ssp;
>  };
>
>  /*
> @@ -262,6 +263,7 @@ struct sigcontext_64 {
>         __u64                           trapno;
>         __u64                           oldmask;
>         __u64                           cr2;
> +       __u64                           ssp;
>
>         /*
>          * fpstate is really (struct _fpstate *) or (struct _xstate *)
> @@ -320,6 +322,7 @@ struct sigcontext {
>         struct _fpstate __user          *fpstate;
>         __u32                           oldmask;
>         __u32                           cr2;
> +       __u32                           ssp;

Is it actually okay to modify these structures like this?  They're
part of the user ABI, and I don't know whether any user code relies on
the size being constant.
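
To make the concern concrete, here is a small sketch with cut-down
stand-ins for the structure.  sc64_old and sc64_new are hypothetical and
keep only the tail fields visible in the hunk.  It assumes the new field
lands before fpstate, as the sigcontext_64 hunk appears to show; in that
case the struct grows and the offset of fpstate moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the layout before the patch. */
struct sc64_old {
	uint64_t trapno;
	uint64_t oldmask;
	uint64_t cr2;
	uint64_t fpstate;
};

/* Same subset with ssp inserted after cr2. */
struct sc64_new {
	uint64_t trapno;
	uint64_t oldmask;
	uint64_t cr2;
	uint64_t ssp;
	uint64_t fpstate;
};

/* The struct grows by exactly one 64-bit word. */
static_assert(sizeof(struct sc64_new) ==
	      sizeof(struct sc64_old) + sizeof(uint64_t),
	      "inserting ssp changes the structure size");

/* How far a field that follows ssp moves, in bytes. */
static size_t fpstate_shift(void)
{
	return offsetof(struct sc64_new, fpstate) -
	       offsetof(struct sc64_old, fpstate);
}
```

Any user code built against the old header would read fpstate from the
slot that now holds ssp, so the question is less about sizeof() and more
about whether anything outside the kernel depends on these offsets.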

> +int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val)
> +{
> +       if (val >= TASK_SIZE)
> +               return -EINVAL;

TASK_SIZE_MAX.  But I'm a bit unsure why you need this check at all.

> +int cet_restore_signal(unsigned long ssp)
> +{
> +       if (!current->thread.cet.shstk_enabled)
> +               return 0;
> +       return cet_set_shstk_ptr(ssp);
> +}

This will blow up if the shadow stack enabled state changes in a
signal handler.  Maybe we don't care.


* Re: [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement
  2018-06-07 14:38 ` [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement Yu-cheng Yu
@ 2018-06-07 18:38   ` Andy Lutomirski
  2018-06-07 20:40     ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 18:38 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> Look in .note.gnu.property of an ELF file and check if shadow stack needs
> to be enabled for the task.

Nice!  But please structure it so it's one function that parses out
all the ELF notes and some other code (a table or a switch statement)
that handles them.  We will probably want to add more kernel-parsed
ELF notes some day, so let's structure the code to make it easier.

> +static int find_cet(u8 *buf, u32 size, u32 align, int *shstk, int *ibt)
> +{
> +       unsigned long start = (unsigned long)buf;
> +       struct elf_note *note = (struct elf_note *)buf;
> +
> +       *shstk = 0;
> +       *ibt = 0;
> +
> +       /*
> +        * Go through the x86_note_gnu_property array pointed by
> +        * buf and look for shadow stack and indirect branch
> +        * tracking features.
> +        * The GNU_PROPERTY_X86_FEATURE_1_AND entry contains only
> +        * one u32 as data.  Do not go beyond buf_size.
> +        */
> +
> +       while ((unsigned long) (note + 1) - start < size) {
> +               /* Find the NT_GNU_PROPERTY_TYPE_0 note. */
> +               if (note->n_namesz == 4 &&
> +                   note->n_type == NT_GNU_PROPERTY_TYPE_0 &&
> +                   memcmp(note + 1, "GNU", 4) == 0) {
> +                       u8 *ptr, *ptr_end;
> +
> +                       /* Check for invalid property. */
> +                       if (note->n_descsz < 8 ||
> +                          (note->n_descsz % align) != 0)
> +                               return 0;
> +
> +                       /* Start and end of property array. */
> +                       ptr = (u8 *)(note + 1) + 4;
> +                       ptr_end = ptr + note->n_descsz;

Exploitable bug here?  You haven't checked that ptr is in bounds or
that ptr_end (ptr + n_descsz) is in bounds (or that ptr_end > ptr, for
that matter).
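
A sketch of the kind of bounds checking this asks for, written as a
userspace function rather than kernel code.  It assumes each property is a
u32 type, a u32 datasz, and datasz bytes of data padded up to the note
alignment (4 or 8).  find_u32_property() and read_u32() are names made up
for this example.  Every length is compared against the bytes remaining,
so a hostile datasz cannot push the cursor past the end of the descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alignment-safe load of a host-endian u32. */
static uint32_t read_u32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Walk the property array in desc[0..descsz).
 * Returns 1 and stores the value if a 4-byte property of type `want`
 * is found, 0 if it is absent, -1 if the array is malformed.
 */
static int find_u32_property(const uint8_t *desc, size_t descsz,
			     size_t align, uint32_t want, uint32_t *val)
{
	size_t off = 0;

	assert(desc != NULL && val != NULL);
	if (align != 4 && align != 8)
		return -1;

	/* Invariant: off <= descsz, so descsz - off never wraps. */
	while (descsz - off >= 8) {
		uint32_t type = read_u32(desc + off);
		uint32_t datasz = read_u32(desc + off + 4);
		size_t padded;

		off += 8;
		if (datasz > descsz - off)
			return -1;	/* data would run past the end */

		if (type == want && datasz == 4) {
			*val = read_u32(desc + off);
			return 1;
		}

		padded = ((size_t)datasz + align - 1) & ~(align - 1);
		if (padded > descsz - off)
			break;		/* last entry, no trailing padding */
		off += padded;
	}
	return 0;
}
```

The caller would still need to check that the note header and n_descsz fit
inside the buffer that was read before handing the descriptor to a walker
like this.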

> +
> +                       while (1) {
> +                               u32 type = *(u32 *)ptr;
> +                               u32 datasz = *(u32 *)(ptr + 4);
> +
> +                               ptr += 8;
> +                               if ((ptr + datasz) > ptr_end)
> +                                       break;
> +
> +                               if (type == GNU_PROPERTY_X86_FEATURE_1_AND &&
> +                                   datasz == 4) {
> +                                       u32 p = *(u32 *)ptr;
> +
> +                                       if (p & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
> +                                               *shstk = 1;
> +                                       if (p & GNU_PROPERTY_X86_FEATURE_1_IBT)
> +                                               *ibt = 1;
> +                                       return 1;
> +                               }
> +                       }
> +               }
> +
> +               /*
> +                * Note sections like .note.ABI-tag and .note.gnu.build-id
> +                * are aligned to 4 bytes in 64-bit ELF objects.
> +                */
> +               note = (void *)note + ELF_NOTE_NEXT_OFFSET(note, align);

A malicious value here will probably just break out of the while
statement, but it's still scary.

> +       }
> +
> +       return 0;
> +}
> +
> +static int check_pt_note_segment(struct file *file,
> +                                unsigned long note_size, loff_t *pos,
> +                                u32 align, int *shstk, int *ibt)
> +{
> +       int retval;
> +       char *note_buf;
> +
> +       /*
> +        * Try to read in the whole PT_NOTE segment.
> +        */
> +       note_buf = kmalloc(note_size, GFP_KERNEL);

kmalloc() with fully user-controlled, unchecked size is not a good idea.
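
One possible shape for that check, as a userspace sketch with malloc()
standing in for kmalloc().  The 64 KiB cap, the NOTE_SIZE_LIMIT name, the
alloc_note_buf() helper and the choice of -ENOEXEC for an oversized
segment are all assumptions made for illustration; the right limit and
error policy are for the patch author to decide.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper bound on a PT_NOTE segment we are willing to read. */
#define NOTE_SIZE_LIMIT	(64 * 1024)

/*
 * Allocate a buffer for the note segment only if the size taken from the
 * ELF file is sane.  Returns 0 on success or a negative errno.
 */
static int alloc_note_buf(size_t note_size, void **out)
{
	assert(out != NULL);
	*out = NULL;

	if (note_size == 0 || note_size > NOTE_SIZE_LIMIT)
		return -ENOEXEC;

	*out = malloc(note_size);
	return *out ? 0 : -ENOMEM;
}
```

Rejecting the size before allocating keeps a crafted binary from asking
the allocator for an arbitrarily large contiguous buffer.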


* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-07 16:40   ` Andy Lutomirski
  2018-06-07 16:51     ` Yu-cheng Yu
@ 2018-06-07 18:41     ` Peter Zijlstra
  2018-06-07 20:31       ` Yu-cheng Yu
  2018-06-11  8:17     ` Peter Zijlstra
  2 siblings, 1 reply; 98+ messages in thread
From: Peter Zijlstra @ 2018-06-07 18:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 07, 2018 at 09:40:02AM -0700, Andy Lutomirski wrote:
> Peterz, isn't there some fancy better way we're supposed to handle the
> error return these days?

Don't think so. I played with a few things but that never really went
anywhere.

Also, both asm things look suspiciously similar, it might make sense to
share. Also, maybe do the instruction .byte sequence in a #define INSN
or something.


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 14:38 ` [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack Yu-cheng Yu
@ 2018-06-07 18:48   ` Andy Lutomirski
  2018-06-07 20:30     ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 18:48 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> The following operations are provided.
>
> ARCH_CET_STATUS:
>         return the current CET status
>
> ARCH_CET_DISABLE:
>         disable CET features
>
> ARCH_CET_LOCK:
>         lock out CET features
>
> ARCH_CET_EXEC:
>         set CET features for exec()
>
> ARCH_CET_ALLOC_SHSTK:
>         allocate a new shadow stack
>
> ARCH_CET_PUSH_SHSTK:
>         put a return address on shadow stack
>
> ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> the implementation of GLIBC ucontext related APIs.

Please document exactly what these all do and why.  I don't understand
what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
each ELF program, so I think there should be no need for a magic
override.


* Re: [PATCH 08/10] mm: Prevent mremap of shadow stack
  2018-06-07 14:38 ` [PATCH 08/10] mm: Prevent mremap of " Yu-cheng Yu
@ 2018-06-07 18:48   ` Andy Lutomirski
  2018-06-07 20:18     ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 18:48 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Please justify.  This seems actively harmful to me.


* Re: [PATCH 10/10] mm: Prevent munmap and remap_file_pages of shadow stack
  2018-06-07 14:38 ` [PATCH 10/10] mm: Prevent munmap and remap_file_pages of " Yu-cheng Yu
@ 2018-06-07 18:50   ` Andy Lutomirski
  2018-06-07 20:15     ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 18:50 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:

blocking remap_file_pages() seems reasonable.  I'm not sure what's
wrong with munmap().

* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 18:30   ` Andy Lutomirski
@ 2018-06-07 18:58     ` Florian Weimer
  2018-06-07 19:51       ` Yu-cheng Yu
  2018-06-07 20:07     ` Cyrill Gorcunov
  2018-06-07 20:12     ` Yu-cheng Yu
  2 siblings, 1 reply; 98+ messages in thread
From: Florian Weimer @ 2018-06-07 18:58 UTC (permalink / raw)
  To: Andy Lutomirski, Yu-cheng Yu, Dmitry Safonov, Cyrill Gorcunov
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On 06/07/2018 08:30 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> Set and restore shadow stack pointer for signals.
> 
> How does this interact with siglongjmp()?

We plan to use some unused signal mask bits in the jump buffer (we have 
a lot of those in glibc for some reason) to store the shadow stack pointer.

> This patch makes me extremely nervous due to the possibility of ABI
> issues and CRIU breakage.
> 
>> diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
>> index 844d60eb1882..6c8997a0156a 100644
>> --- a/arch/x86/include/uapi/asm/sigcontext.h
>> +++ b/arch/x86/include/uapi/asm/sigcontext.h
>> @@ -230,6 +230,7 @@ struct sigcontext_32 {
>>          __u32                           fpstate; /* Zero when no FPU/extended context */
>>          __u32                           oldmask;
>>          __u32                           cr2;
>> +       __u32                           ssp;
>>   };
>>
>>   /*
>> @@ -262,6 +263,7 @@ struct sigcontext_64 {
>>          __u64                           trapno;
>>          __u64                           oldmask;
>>          __u64                           cr2;
>> +       __u64                           ssp;
>>
>>          /*
>>           * fpstate is really (struct _fpstate *) or (struct _xstate *)
>> @@ -320,6 +322,7 @@ struct sigcontext {
>>          struct _fpstate __user          *fpstate;
>>          __u32                           oldmask;
>>          __u32                           cr2;
>> +       __u32                           ssp;
> 
> Is it actually okay to modify these structures like this?  They're
> part of the user ABI, and I don't know whether any user code relies on
> the size being constant.

Probably not.  Historically, these things have been tacked at the end of 
the floating point state, see struct _xstate:

         /* New processor state extensions go here: */

However, I'm not sure if this is really ideal because I doubt that 
everyone who needs the shadow stack pointer also wants to sacrifice 
space for the AVX-512 save area (which is already a backwards 
compatibility hazard).  Other architectures have variable offsets and 
some TLV-style setup here.

Thanks,
Florian

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
  2018-06-07 18:21   ` Andy Lutomirski
@ 2018-06-07 19:47     ` Florian Weimer
  2018-06-07 20:53       ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Florian Weimer @ 2018-06-07 19:47 UTC (permalink / raw)
  To: Andy Lutomirski, Yu-cheng Yu
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
>> needs a separate program stack and a separate shadow stack.
>> This patch handles allocation and freeing of the thread shadow
>> stack.
> 
> Aha -- you're trying to make this automatic.  I'm not convinced this
> is a good idea.  The Linux kernel has a long and storied history of
> enabling new hardware features in ways that are almost entirely
> useless for userspace.
> 
> Florian, do you have any thoughts on how the user/kernel interaction
> for the shadow stack should work?

I have not looked at this in detail, have not played with the emulator, 
and have not been privy to any discussions before these patches have 
been posted, however …

I believe that we want as little code in userspace for shadow stack 
management as possible.  One concern I have is that even with the code 
we arguably need for various kinds of stack unwinding, we might have 
unwittingly built a generic trampoline that leads to full CET bypass.

I also expect that we'd only have donor mappings in userspace anyway, 
and that the memory is not actually accessible from userspace if it is 
used for a shadow stack.

> My intuition would be that all
> shadow stack management should be entirely controlled by userspace --
> newly cloned threads (with CLONE_VM) should have no shadow stack
> initially, and newly started processes should have no shadow stack
> until they ask for one.

If the new thread doesn't have a shadow stack, we need to disable 
signals around clone, and we are very likely forced to rewrite the early 
thread setup in assembler, to avoid spurious calls (including calls to 
thunks to get EIP on i386).  I wouldn't want to do this if we can avoid 
it.  Just using C and hoping to get away with it doesn't sound great, 
either.  And obviously there is the matter that the initial thread setup 
code ends up being that universal trampoline.

Thanks,
Florian

* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 18:58     ` Florian Weimer
@ 2018-06-07 19:51       ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 19:51 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Dmitry Safonov, Cyrill Gorcunov, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 20:58 +0200, Florian Weimer wrote:
> On 06/07/2018 08:30 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> Set and restore shadow stack pointer for signals.
> > 
> > How does this interact with siglongjmp()?
> 
> We plan to use some unused signal mask bits in the jump buffer (we have 
> a lot of those in glibc for some reason) to store the shadow stack pointer.
> 
> > This patch makes me extremely nervous due to the possibility of ABI
> > issues and CRIU breakage.
> > 
> >> diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> >> index 844d60eb1882..6c8997a0156a 100644
> >> --- a/arch/x86/include/uapi/asm/sigcontext.h
> >> +++ b/arch/x86/include/uapi/asm/sigcontext.h
> >> @@ -230,6 +230,7 @@ struct sigcontext_32 {
> >>          __u32                           fpstate; /* Zero when no FPU/extended context */
> >>          __u32                           oldmask;
> >>          __u32                           cr2;
> >> +       __u32                           ssp;
> >>   };
> >>
> >>   /*
> >> @@ -262,6 +263,7 @@ struct sigcontext_64 {
> >>          __u64                           trapno;
> >>          __u64                           oldmask;
> >>          __u64                           cr2;
> >> +       __u64                           ssp;
> >>
> >>          /*
> >>           * fpstate is really (struct _fpstate *) or (struct _xstate *)
> >> @@ -320,6 +322,7 @@ struct sigcontext {
> >>          struct _fpstate __user          *fpstate;
> >>          __u32                           oldmask;
> >>          __u32                           cr2;
> >> +       __u32                           ssp;
> > 
> > Is it actually okay to modify these structures like this?  They're
> > part of the user ABI, and I don't know whether any user code relies on
> > the size being constant.
> 
> Probably not.  Historically, these things have been tacked at the end of 
> the floating point state, see struct _xstate:
> 
>          /* New processor state extensions go here: */
> 
> However, I'm not sure if this is really ideal because I doubt that 
> everyone who needs the shadow stack pointer also wants to sacrifice 
> space for the AVX-512 save area (which is already a backwards 
> compatibility hazard).  Other architectures have variable offsets and 
> some TLV-style setup here.
> 
> Thanks,
> Florian

I will move 'ssp' to _xstate for now, and look for other ways.

* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 18:30   ` Andy Lutomirski
  2018-06-07 18:58     ` Florian Weimer
@ 2018-06-07 20:07     ` Cyrill Gorcunov
  2018-06-07 20:57       ` Andy Lutomirski
  2018-06-07 20:12     ` Yu-cheng Yu
  2 siblings, 1 reply; 98+ messages in thread
From: Cyrill Gorcunov @ 2018-06-07 20:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, Florian Weimer, Dmitry Safonov, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
	Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	mike.kravetz

On Thu, Jun 07, 2018 at 11:30:34AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Set and restore shadow stack pointer for signals.
> 
> How does this interact with siglongjmp()?
> 
> This patch makes me extremely nervous due to the possibility of ABI
> issues and CRIU breakage.
> 
> > diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> > index 844d60eb1882..6c8997a0156a 100644
> > --- a/arch/x86/include/uapi/asm/sigcontext.h
> > +++ b/arch/x86/include/uapi/asm/sigcontext.h
> > @@ -230,6 +230,7 @@ struct sigcontext_32 {
> >         __u32                           fpstate; /* Zero when no FPU/extended context */
> >         __u32                           oldmask;
> >         __u32                           cr2;
> > +       __u32                           ssp;
> >  };
> >
> >  /*
> > @@ -262,6 +263,7 @@ struct sigcontext_64 {
> >         __u64                           trapno;
> >         __u64                           oldmask;
> >         __u64                           cr2;
> > +       __u64                           ssp;
> >
> >         /*
> >          * fpstate is really (struct _fpstate *) or (struct _xstate *)
> > @@ -320,6 +322,7 @@ struct sigcontext {
> >         struct _fpstate __user          *fpstate;
> >         __u32                           oldmask;
> >         __u32                           cr2;
> > +       __u32                           ssp;
> 
> Is it actually okay to modify these structures like this?  They're
> part of the user ABI, and I don't know whether any user code relies on
> the size being constant.

For sure it might cause problems for CRIU since we have
similar definitions for this structure inside our code.
That said, if the kernel is going to modify the structures, it
should keep backward compatibility: at least, if a user
passes the previous version of a structure, @ssp should be
set to something safe by the kernel itself.

I haven't read the whole series of patches in detail
yet; hopefully I will be able to tomorrow. Thanks Andy for
CC'ing!

* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 18:30   ` Andy Lutomirski
  2018-06-07 18:58     ` Florian Weimer
  2018-06-07 20:07     ` Cyrill Gorcunov
@ 2018-06-07 20:12     ` Yu-cheng Yu
  2018-06-07 20:17       ` Dave Hansen
  2 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 20:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, Dmitry Safonov, Cyrill Gorcunov, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
	Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	mike.kravetz

On Thu, 2018-06-07 at 11:30 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Set and restore shadow stack pointer for signals.
> 
> How does this interact with siglongjmp()?
> 
> This patch makes me extremely nervous due to the possibility of ABI
> issues and CRIU breakage.

Longjmp/Siglongjmp is handled in GLIBC and basically the shadow stack
pointer is unwound.  There could be some unexpected conditions.
However, we run all GLIBC tests.

> 
> > diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> > index 844d60eb1882..6c8997a0156a 100644
> > --- a/arch/x86/include/uapi/asm/sigcontext.h
> > +++ b/arch/x86/include/uapi/asm/sigcontext.h
> > @@ -230,6 +230,7 @@ struct sigcontext_32 {
> >         __u32                           fpstate; /* Zero when no FPU/extended context */
> >         __u32                           oldmask;
> >         __u32                           cr2;
> > +       __u32                           ssp;
> >  };
> >
> >  /*
> > @@ -262,6 +263,7 @@ struct sigcontext_64 {
> >         __u64                           trapno;
> >         __u64                           oldmask;
> >         __u64                           cr2;
> > +       __u64                           ssp;
> >
> >         /*
> >          * fpstate is really (struct _fpstate *) or (struct _xstate *)
> > @@ -320,6 +322,7 @@ struct sigcontext {
> >         struct _fpstate __user          *fpstate;
> >         __u32                           oldmask;
> >         __u32                           cr2;
> > +       __u32                           ssp;
> 
> Is it actually okay to modify these structures like this?  They're
> part of the user ABI, and I don't know whether any user code relies on
> the size being constant.
> 
> > +int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val)
> > +{
> > +       if (val >= TASK_SIZE)
> > +               return -EINVAL;
> 
> TASK_SIZE_MAX.  But I'm a bit unsure why you need this check at all.

If an invalid address is put on the shadow stack, the task will get a
control protection fault.  I will change it to TASK_SIZE_MAX.

> 
> > +int cet_restore_signal(unsigned long ssp)
> > +{
> > +       if (!current->thread.cet.shstk_enabled)
> > +               return 0;
> > +       return cet_set_shstk_ptr(ssp);
> > +}
> 
> This will blow up if the shadow stack enabled state changes in a
> signal handler.  Maybe we don't care.

Yes, the task will get a control protection fault.

* Re: [PATCH 10/10] mm: Prevent munmap and remap_file_pages of shadow stack
  2018-06-07 18:50   ` Andy Lutomirski
@ 2018-06-07 20:15     ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 20:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 11:50 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> 
> blocking remap_file_pages() seems reasonable.  I'm not sure what's
> wrong with munmap().

Yes, maybe we don't need to block munmap().  If the shadow stack is
unmapped, the application gets a fault.  I will remove the patch.

* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 20:12     ` Yu-cheng Yu
@ 2018-06-07 20:17       ` Dave Hansen
  0 siblings, 0 replies; 98+ messages in thread
From: Dave Hansen @ 2018-06-07 20:17 UTC (permalink / raw)
  To: Yu-cheng Yu, Andy Lutomirski
  Cc: Florian Weimer, Dmitry Safonov, Cyrill Gorcunov, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On 06/07/2018 01:12 PM, Yu-cheng Yu wrote:
>>> +int cet_restore_signal(unsigned long ssp)
>>> +{
>>> +       if (!current->thread.cet.shstk_enabled)
>>> +               return 0;
>>> +       return cet_set_shstk_ptr(ssp);
>>> +}
>> This will blow up if the shadow stack enabled state changes in a
>> signal handler.  Maybe we don't care.
> Yes, the task will get a control protection fault.

Sounds like something to add to the very long list of things that are
unwise to do in a signal handler.  Great manpage fodder.

* Re: [PATCH 08/10] mm: Prevent mremap of shadow stack
  2018-06-07 18:48   ` Andy Lutomirski
@ 2018-06-07 20:18     ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 20:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> Please justify.  This seems actively harmful to me.

I will remove this patch.

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 18:48   ` Andy Lutomirski
@ 2018-06-07 20:30     ` Yu-cheng Yu
  2018-06-07 21:01       ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 20:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > The following operations are provided.
> >
> > ARCH_CET_STATUS:
> >         return the current CET status
> >
> > ARCH_CET_DISABLE:
> >         disable CET features
> >
> > ARCH_CET_LOCK:
> >         lock out CET features
> >
> > ARCH_CET_EXEC:
> >         set CET features for exec()
> >
> > ARCH_CET_ALLOC_SHSTK:
> >         allocate a new shadow stack
> >
> > ARCH_CET_PUSH_SHSTK:
> >         put a return address on shadow stack
> >
> > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> > the implementation of GLIBC ucontext related APIs.
> 
> Please document exactly what these all do and why.  I don't understand
> what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
> each ELF program, so I think there should be no need for a magic
> override.

CET is initially enabled if the loader has CET capability.  Then the
loader decides if the application can run with CET.  If the application
cannot run with CET (e.g. a dependent library does not have CET), then
the loader turns off CET before passing control to the application.  When
the loader is done, it locks out CET and the feature cannot be turned off
anymore until the next exec() call.  When the next exec() is called, the
CET feature is turned on/off based on the values set by ARCH_CET_EXEC.

I will put more details in Documentation/x86/intel_cet.txt.
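
To make the sequence above concrete, here is a minimal user-space model
of it.  This is only a sketch under assumptions: struct cet_model and
the cet_model_*() helpers are hypothetical names invented for
illustration, the -EPERM return on a locked state is a guess, and none
of this is the arch_prctl() code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-task CET state; not kernel code. */
struct cet_model {
	bool shstk_enabled;	/* shadow stack currently on */
	bool locked;		/* set once the loader is done */
	bool exec_enable;	/* policy recorded for the next exec() */
};

/* Models ARCH_CET_DISABLE: refused once the state is locked (assumed). */
static int cet_model_disable(struct cet_model *m)
{
	if (m->locked)
		return -EPERM;
	m->shstk_enabled = false;
	return 0;
}

/* Models ARCH_CET_LOCK: no further disabling until the next exec(). */
static void cet_model_lock(struct cet_model *m)
{
	m->locked = true;
}

/*
 * Models ARCH_CET_EXEC: record what the next exec() should start with.
 * Whether the real interface allows this after the lock is not stated
 * in the thread; the model simply permits it.
 */
static void cet_model_set_exec(struct cet_model *m, bool enable)
{
	m->exec_enable = enable;
}

/* Models exec(): the lock is dropped and the recorded policy applies. */
static void cet_model_exec(struct cet_model *m)
{
	m->locked = false;
	m->shstk_enabled = m->exec_enable;
}

/* Walks through the loader sequence described in the message. */
static void loader_flow_example(void)
{
	struct cet_model m = { .shstk_enabled = true, .exec_enable = true };

	/* A dependent library lacks CET: the loader turns it off, then locks. */
	assert(cet_model_disable(&m) == 0);
	cet_model_lock(&m);

	/* After the lock, the application cannot change the state. */
	assert(cet_model_disable(&m) == -EPERM);

	/* The next exec() starts over from the recorded policy. */
	cet_model_set_exec(&m, true);
	cet_model_exec(&m);
	assert(m.shstk_enabled && !m.locked);
}
```

The point of the model is only the ordering: disable is possible before
the lock, refused after it, and exec() is the one event that resets both.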
 

* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-07 18:41     ` Peter Zijlstra
@ 2018-06-07 20:31       ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 20:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 20:41 +0200, Peter Zijlstra wrote:
> On Thu, Jun 07, 2018 at 09:40:02AM -0700, Andy Lutomirski wrote:
> > Peterz, isn't there some fancy better way we're supposed to handle the
> > error return these days?
> 
> Don't think so. I played with a few things but that never really went
> anywhere.
> 
> Also, both asm things look suspiciously similar, it might make sense to
> share. Also, maybe do the instruction .byte sequence in a #define INSN
> or something.

I will fix that.

* Re: [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement
  2018-06-07 18:38   ` Andy Lutomirski
@ 2018-06-07 20:40     ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 20:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, 2018-06-07 at 11:38 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > Look in .note.gnu.property of an ELF file and check if shadow stack needs
> > to be enabled for the task.
> 
> Nice!  But please structure it so it's one function that parses out
> all the ELF notes and some other code (a table or a switch statement)
> that handles them.  We will probably want to add more kernel-parsed
> ELF notes some day, so let's structure the code to make it easier.
> 
> > +static int find_cet(u8 *buf, u32 size, u32 align, int *shstk, int *ibt)
> > +{
> > +       unsigned long start = (unsigned long)buf;
> > +       struct elf_note *note = (struct elf_note *)buf;
> > +
> > +       *shstk = 0;
> > +       *ibt = 0;
> > +
> > +       /*
> > +        * Go through the x86_note_gnu_property array pointed by
> > +        * buf and look for shadow stack and indirect branch
> > +        * tracking features.
> > +        * The GNU_PROPERTY_X86_FEATURE_1_AND entry contains only
> > +        * one u32 as data.  Do not go beyond buf_size.
> > +        */
> > +
> > +       while ((unsigned long) (note + 1) - start < size) {
> > +               /* Find the NT_GNU_PROPERTY_TYPE_0 note. */
> > +               if (note->n_namesz == 4 &&
> > +                   note->n_type == NT_GNU_PROPERTY_TYPE_0 &&
> > +                   memcmp(note + 1, "GNU", 4) == 0) {
> > +                       u8 *ptr, *ptr_end;
> > +
> > +                       /* Check for invalid property. */
> > +                       if (note->n_descsz < 8 ||
> > +                          (note->n_descsz % align) != 0)
> > +                               return 0;
> > +
> > +                       /* Start and end of property array. */
> > +                       ptr = (u8 *)(note + 1) + 4;
> > +                       ptr_end = ptr + note->n_descsz;
> 
> Exploitable bug here?  You haven't checked that ptr is in bounds or
> that ptr + ptr_end is in bounds (or that ptr_end > ptr, for that
> matter).
> 
> > +
> > +                       while (1) {
> > +                               u32 type = *(u32 *)ptr;
> > +                               u32 datasz = *(u32 *)(ptr + 4);
> > +
> > +                               ptr += 8;
> > +                               if ((ptr + datasz) > ptr_end)
> > +                                       break;
> > +
> > +                               if (type == GNU_PROPERTY_X86_FEATURE_1_AND &&
> > +                                   datasz == 4) {
> > +                                       u32 p = *(u32 *)ptr;
> > +
> > +                                       if (p & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
> > +                                               *shstk = 1;
> > +                                       if (p & GNU_PROPERTY_X86_FEATURE_1_IBT)
> > +                                               *ibt = 1;
> > +                                       return 1;
> > +                               }
> > +                       }
> > +               }
> > +
> > +               /*
> > +                * Note sections like .note.ABI-tag and .note.gnu.build-id
> > +                * are aligned to 4 bytes in 64-bit ELF objects.
> > +                */
> > +               note = (void *)note + ELF_NOTE_NEXT_OFFSET(note, align);
> 
> A malicious value here will probably just break out of the while
> statement, but it's still scary.
> 
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int check_pt_note_segment(struct file *file,
> > +                                unsigned long note_size, loff_t *pos,
> > +                                u32 align, int *shstk, int *ibt)
> > +{
> > +       int retval;
> > +       char *note_buf;
> > +
> > +       /*
> > +        * Try to read in the whole PT_NOTE segment.
> > +        */
> > +       note_buf = kmalloc(note_size, GFP_KERNEL);
> 
> kmalloc() with fully user-controlled, unchecked size is not a good idea.

I will fix these problems.  Thanks!
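
As a rough illustration of the bounds checking asked for above, the
property walk could look something like the following.  This is a
user-space sketch, not the revised patch: scan_x86_feature_1() and
scan_example() are hypothetical names, the caller is assumed to have
already validated that desc/descsz lie inside the note buffer, and the
GNU_PROPERTY_* values are redefined locally from the x86 psABI so the
block stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GNU_PROPERTY_X86_FEATURE_1_AND		0xc0000002U
#define GNU_PROPERTY_X86_FEATURE_1_IBT		(1U << 0)
#define GNU_PROPERTY_X86_FEATURE_1_SHSTK	(1U << 1)

/*
 * Walk a property array of descsz bytes.  Every length is compared
 * against the bytes that remain, so no pointer is formed past the end.
 * Returns 1 and fills *features if the entry is found, 0 if it is
 * absent, -1 if the array is malformed.
 */
static int scan_x86_feature_1(const uint8_t *desc, size_t descsz,
			      size_t align, uint32_t *features)
{
	size_t off = 0;

	if (align != 4 && align != 8)
		return -1;

	while (descsz - off >= 8) {
		uint32_t type, datasz;
		size_t padded;

		/* memcpy avoids unaligned loads from the byte buffer. */
		memcpy(&type, desc + off, 4);
		memcpy(&datasz, desc + off + 4, 4);
		off += 8;

		if (datasz > descsz - off)
			return -1;	/* data would run past the end */

		if (type == GNU_PROPERTY_X86_FEATURE_1_AND) {
			if (datasz != 4)
				return -1;
			memcpy(features, desc + off, 4);
			return 1;
		}

		/* Step past the data, rounded up to the property alignment. */
		padded = (datasz + align - 1) & ~(align - 1);
		if (padded > descsz - off)
			break;
		off += padded;
	}

	return 0;
}

static void scan_example(void)
{
	uint8_t desc[16] = { 0 };
	uint32_t type = GNU_PROPERTY_X86_FEATURE_1_AND, datasz = 4;
	uint32_t val = GNU_PROPERTY_X86_FEATURE_1_SHSTK, features = 0;

	memcpy(desc, &type, 4);
	memcpy(desc + 4, &datasz, 4);
	memcpy(desc + 8, &val, 4);

	assert(scan_x86_feature_1(desc, sizeof(desc), 8, &features) == 1);
	assert(features & GNU_PROPERTY_X86_FEATURE_1_SHSTK);

	/* A size field that runs past the buffer is rejected, not followed. */
	datasz = 0x1000;
	memcpy(desc + 4, &datasz, 4);
	assert(scan_x86_feature_1(desc, sizeof(desc), 8, &features) == -1);
}
```

Tracking an offset and comparing against the remaining length, rather
than comparing pointers, avoids having to reason about pointer overflow
at all.  The same idea would apply to the outer note loop and to capping
the PT_NOTE size before allocating.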

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
  2018-06-07 19:47     ` Florian Weimer
@ 2018-06-07 20:53       ` Andy Lutomirski
  2018-06-08 14:53         ` Florian Weimer
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 20:53 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >> needs a separate program stack and a separate shadow stack.
> >> This patch handles allocation and freeing of the thread shadow
> >> stack.
> >
> > Aha -- you're trying to make this automatic.  I'm not convinced this
> > is a good idea.  The Linux kernel has a long and storied history of
> > enabling new hardware features in ways that are almost entirely
> > useless for userspace.
> >
> > Florian, do you have any thoughts on how the user/kernel interaction
> > for the shadow stack should work?
>
> I have not looked at this in detail, have not played with the emulator,
> and have not been privy to any discussions before these patches have
> been posted, however …
>
> I believe that we want as little code in userspace for shadow stack
> management as possible.  One concern I have is that even with the code
> we arguably need for various kinds of stack unwinding, we might have
> unwittingly built a generic trampoline that leads to full CET bypass.

I was imagining an API like "allocate a shadow stack for the current
thread, fail if the current thread already has one, and turn on the
shadow stack".  glibc would call clone and then call this ABI pretty
much immediately (i.e. before making any calls from which it expects
to return).

We definitely want strong enough user control that tools like CRIU can
continue to work.  I haven't looked at the SDM recently enough to
remember for sure, but I'm reasonably confident that user code can
learn the address of its own shadow stack.  If nothing else, CRIU
needs to be able to restore from a context where there's a signal on
the stack and the signal frame contains a shadow stack pointer.


>
> I also expect that we'd only have donor mappings in userspace anyway,
> and that the memory is not actually accessible from userspace if it is
> used for a shadow stack.
>
> > My intuition would be that all
> > shadow stack management should be entirely controlled by userspace --
> > newly cloned threads (with CLONE_VM) should have no shadow stack
> > initially, and newly started processes should have no shadow stack
> > until they ask for one.
>
> If the new thread doesn't have a shadow stack, we need to disable
> signals around clone, and we are very likely forced to rewrite the early
> thread setup in assembler, to avoid spurious calls (including calls to
> thunks to get EIP on i386).  I wouldn't want to do this if we can avoid
> it.  Just using C and hoping to get away with it doesn't sound great,
> either.  And obviously there is the matter that the initial thread setup
> code ends up being that universal trampoline.
>

Only if the trampoline still works when the shadow stack is already enabled.

I could very easily be convinced that automatic shadow stack setup is
a good idea, but I still think we need manual control for CRIU and
such.

* Re: [PATCH 09/10] mm: Prevent madvise from changing shadow stack
  2018-06-07 14:38 ` [PATCH 09/10] mm: Prevent madvise from changing " Yu-cheng Yu
@ 2018-06-07 20:54   ` Andy Lutomirski
  2018-06-07 21:09   ` Nadav Amit
  1 sibling, 0 replies; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 20:54 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Seems reasonable to me.

* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 20:07     ` Cyrill Gorcunov
@ 2018-06-07 20:57       ` Andy Lutomirski
  2018-06-08 12:07         ` Cyrill Gorcunov
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 20:57 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Yu-cheng Yu, Florian Weimer, Dmitry Safonov, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
	Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	mike.kravetz

On Thu, Jun 7, 2018 at 1:07 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
> On Thu, Jun 07, 2018 at 11:30:34AM -0700, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >
> > > Set and restore shadow stack pointer for signals.
> >
> > How does this interact with siglongjmp()?
> >
> > This patch makes me extremely nervous due to the possibility of ABI
> > issues and CRIU breakage.
> >
> > > diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
> > > index 844d60eb1882..6c8997a0156a 100644
> > > --- a/arch/x86/include/uapi/asm/sigcontext.h
> > > +++ b/arch/x86/include/uapi/asm/sigcontext.h
> > > @@ -230,6 +230,7 @@ struct sigcontext_32 {
> > >         __u32                           fpstate; /* Zero when no FPU/extended context */
> > >         __u32                           oldmask;
> > >         __u32                           cr2;
> > > +       __u32                           ssp;
> > >  };
> > >
> > >  /*
> > > @@ -262,6 +263,7 @@ struct sigcontext_64 {
> > >         __u64                           trapno;
> > >         __u64                           oldmask;
> > >         __u64                           cr2;
> > > +       __u64                           ssp;
> > >
> > >         /*
> > >          * fpstate is really (struct _fpstate *) or (struct _xstate *)
> > > @@ -320,6 +322,7 @@ struct sigcontext {
> > >         struct _fpstate __user          *fpstate;
> > >         __u32                           oldmask;
> > >         __u32                           cr2;
> > > +       __u32                           ssp;
> >
> > Is it actually okay to modify these structures like this?  They're
> > part of the user ABI, and I don't know whether any user code relies on
> > the size being constant.
>
> For sure it might cause problems for CRIU, since we have
> similar definitions for this structure inside our code.
> That said, if the kernel is about to modify the structures, it
> should keep backward compatibility: at least, if a user
> passes a previous version of a structure, @ssp should be
> set to something safe by the kernel itself.
>
> I haven't read the whole series of patches in detail
> yet; hopefully I will be able to tomorrow. Thanks Andy for
> CC'ing!

We have uc_flags.  It might be useful to carve out some of the flag
space (24 bits?) to indicate something like the *size* of sigcontext
and teach the kernel that new sigcontext fields should only be parsed
on sigreturn() if the size is large enough.
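
A minimal sketch of the size-in-uc_flags idea, as standalone C rather
than kernel code. Everything named here is an assumption made for
illustration: the bit position of the size field, the helper names, and
the two cut-down structs that stand in for the old and new
sigcontext_64 layouts. Nothing in the posted series defines them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: reserve 24 bits of a 64-bit uc_flags for the size. */
#define UC_SIGCTX_SIZE_SHIFT	40
#define UC_SIGCTX_SIZE_MASK	(UINT64_C(0xffffff) << UC_SIGCTX_SIZE_SHIFT)

/* Cut-down stand-ins for the old and new layouts (not the real structs). */
struct sigcontext_v1 { uint64_t trapno, oldmask, cr2; };
struct sigcontext_v2 { uint64_t trapno, oldmask, cr2, ssp; };

/* Record the size of the sigcontext in use, keeping the other flag bits. */
static uint64_t uc_flags_set_size(uint64_t flags, size_t size)
{
	return (flags & ~UC_SIGCTX_SIZE_MASK) |
	       (((uint64_t)size << UC_SIGCTX_SIZE_SHIFT) & UC_SIGCTX_SIZE_MASK);
}

/* Read the recorded size back; frames from old user code report 0. */
static size_t uc_flags_get_size(uint64_t flags)
{
	return (size_t)((flags & UC_SIGCTX_SIZE_MASK) >> UC_SIGCTX_SIZE_SHIFT);
}

/* Nonzero only if the frame is large enough to contain the ssp field. */
static int sigctx_has_ssp(uint64_t flags)
{
	return uc_flags_get_size(flags) >=
	       offsetof(struct sigcontext_v2, ssp) + sizeof(uint64_t);
}
```

With a check like sigctx_has_ssp(), a frame built by old user code (size
field zero, or the old size) would never have ssp read from it, which is
the backward-compatibility property asked for earlier in the thread.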

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 20:30     ` Yu-cheng Yu
@ 2018-06-07 21:01       ` Andy Lutomirski
  2018-06-07 22:02         ` H.J. Lu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 21:01 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Andrew Lutomirski, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >
> > > The following operations are provided.
> > >
> > > ARCH_CET_STATUS:
> > >         return the current CET status
> > >
> > > ARCH_CET_DISABLE:
> > >         disable CET features
> > >
> > > ARCH_CET_LOCK:
> > >         lock out CET features
> > >
> > > ARCH_CET_EXEC:
> > >         set CET features for exec()
> > >
> > > ARCH_CET_ALLOC_SHSTK:
> > >         allocate a new shadow stack
> > >
> > > ARCH_CET_PUSH_SHSTK:
> > >         put a return address on shadow stack
> > >
> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> > > the implementation of GLIBC ucontext related APIs.
> >
> > Please document exactly what these all do and why.  I don't understand
> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
> > each ELF program, so I think there should be no need for a magic
> > override.
>
> CET is initially enabled if the loader has CET capability.  Then the
> loader decides if the application can run with CET.  If the application
> cannot run with CET (e.g. a dependent library does not have CET), then
> the loader turns off CET before passing to the application.  When the
> loader is done, it locks out CET and the feature cannot be turned off
> anymore until the next exec() call.

Why is the lockout necessary?  If user code enables CET and tries to
run code that doesn't support CET, it will crash.  I don't see why we
need special code in the kernel to prevent a user program from calling
arch_prctl() and crashing itself.  There are already plenty of ways to
do that :)

> When the next exec() is called, CET
> feature is turned on/off based on the values set by ARCH_CET_EXEC.

And why do we need ARCH_CET_EXEC?

For background, I really really dislike adding new state that persists
across exec().  It's nice to get as close to a clean slate as possible
after exec() so that programs can run in a predictable environment.
exec() is also a security boundary, and anything a task can do to
affect itself after exec() needs to have its security implications
considered very carefully.  (As a trivial example, you should not be
able to use cetcmd ... sudo [malicious options here] to cause sudo to
run with CET off and then try to exploit it via the malicious options.)

If a shutoff is needed for testing, how about teaching ld.so to parse
LD_CET=no or similar and protect it the same way as LD_PRELOAD is
protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
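
A minimal sketch of what "protect it the same way as LD_PRELOAD" could
look like, assuming the override is simply ignored in secure-execution
mode (setuid/setgid programs, where the kernel sets AT_SECURE). LD_CET
is only a proposal in this thread; the enum, the function name, and the
decision to treat unknown values as "no override" are assumptions.

```c
#include <stddef.h>
#include <string.h>

enum cet_override {
	CET_DEFAULT,	/* follow the ELF markings */
	CET_OFF		/* user asked for CET to be turned off */
};

/*
 * value:       contents of the hypothetical LD_CET variable, or NULL
 * secure_exec: nonzero when running in secure-execution mode
 */
static enum cet_override parse_ld_cet(const char *value, int secure_exec)
{
	/* Privileged programs never honor the override. */
	if (secure_exec || value == NULL)
		return CET_DEFAULT;
	if (strcmp(value, "no") == 0)
		return CET_OFF;
	/* Unknown values are ignored rather than treated as errors. */
	return CET_DEFAULT;
}
```

A caller in the loader would pass something like getenv("LD_CET") and
getauxval(AT_SECURE) != 0. Under this sketch a setuid sudo keeps CET on
no matter what the invoking user exported.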

--Andy

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 09/10] mm: Prevent madvise from changing shadow stack
  2018-06-07 14:38 ` [PATCH 09/10] mm: Prevent madvise from changing " Yu-cheng Yu
  2018-06-07 20:54   ` Andy Lutomirski
@ 2018-06-07 21:09   ` Nadav Amit
  2018-06-07 21:18     ` Yu-cheng Yu
  1 sibling, 1 reply; 98+ messages in thread
From: Nadav Amit @ 2018-06-07 21:09 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Linux Kernel Mailing List, linux-doc,
	open list:MEMORY MANAGEMENT, linux-arch,
	the arch/x86 maintainers, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H.J. Lu, Vedvyas Shanbhogue, Ravi V. Shankar,
	Dave Hansen, Andy Lutomirski, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, Mike Kravetz

Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:

> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
> mm/madvise.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d3c922ea1a1..2a6988badd6b 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -839,6 +839,14 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> 	if (vma && start > vma->vm_start)
> 		prev = vma;
> 
> +	/*
> +	 * Don't do anything on shadow stack.
> +	 */
> +	if (vma->vm_flags & VM_SHSTK) {
> +		error = -EINVAL;
> +		goto out_no_plug;
> +	}
> +
> 	blk_start_plug(&plug);
> 	for (;;) {
> 		/* Still start < end. */

What happens if the madvise() involves multiple VMAs, where the first one is
not VM_SHSTK but another one is? Shouldn’t the test be done inside the
loop, potentially in madvise_vma()?
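
A minimal userspace sketch of the per-VMA check being suggested, using
a simplified linked list in place of the kernel's VMA structures. The
struct layout, the VM_SHSTK value, and the helper name are placeholders
for illustration; the real fix would live in the madvise() loop or in
madvise_vma().

```c
#include <errno.h>
#include <stddef.h>

#define VM_SHSTK 0x1UL	/* placeholder value, not the kernel's */

/* Simplified stand-in for struct vm_area_struct. */
struct vma {
	unsigned long vm_start, vm_end, vm_flags;
	struct vma *vm_next;
};

/*
 * Walk every VMA overlapping [start, end) and reject the request if any
 * of them is a shadow stack, not only the first one.
 */
static int range_check_no_shstk(const struct vma *vma,
				unsigned long start, unsigned long end)
{
	for (; vma && vma->vm_start < end; vma = vma->vm_next) {
		if (vma->vm_end <= start)
			continue;	/* entirely below the range */
		if (vma->vm_flags & VM_SHSTK)
			return -EINVAL;
	}
	return 0;
}
```

A range that starts in an ordinary mapping and runs into a shadow stack
gets -EINVAL here, which is the case the single up-front test misses.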

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 09/10] mm: Prevent madvise from changing shadow stack
  2018-06-07 21:09   ` Nadav Amit
@ 2018-06-07 21:18     ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-07 21:18 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Linux Kernel Mailing List, linux-doc,
	open list:MEMORY MANAGEMENT, linux-arch,
	the arch/x86 maintainers, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H.J. Lu, Vedvyas Shanbhogue, Ravi V. Shankar,
	Dave Hansen, Andy Lutomirski, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, Mike Kravetz

On Thu, 2018-06-07 at 14:09 -0700, Nadav Amit wrote:
> Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> 
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> > mm/madvise.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 4d3c922ea1a1..2a6988badd6b 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -839,6 +839,14 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> > 	if (vma && start > vma->vm_start)
> > 		prev = vma;
> > 
> > +	/*
> > +	 * Don't do anything on shadow stack.
> > +	 */
> > +	if (vma->vm_flags & VM_SHSTK) {
> > +		error = -EINVAL;
> > +		goto out_no_plug;
> > +	}
> > +
> > 	blk_start_plug(&plug);
> > 	for (;;) {
> > 		/* Still start < end. */
> 
> What happens if the madvise() involves multiple VMAs, where the first one is
> not VM_SHSTK but another one is? Shouldn’t the test be done inside the
> loop, potentially in madvise_vma()?
> 

I will fix it.  Thanks!

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 21:01       ` Andy Lutomirski
@ 2018-06-07 22:02         ` H.J. Lu
  2018-06-07 23:01           ` Andy Lutomirski
                             ` (2 more replies)
  0 siblings, 3 replies; 98+ messages in thread
From: H.J. Lu @ 2018-06-07 22:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> > >
>> > > The following operations are provided.
>> > >
>> > > ARCH_CET_STATUS:
>> > >         return the current CET status
>> > >
>> > > ARCH_CET_DISABLE:
>> > >         disable CET features
>> > >
>> > > ARCH_CET_LOCK:
>> > >         lock out CET features
>> > >
>> > > ARCH_CET_EXEC:
>> > >         set CET features for exec()
>> > >
>> > > ARCH_CET_ALLOC_SHSTK:
>> > >         allocate a new shadow stack
>> > >
>> > > ARCH_CET_PUSH_SHSTK:
>> > >         put a return address on shadow stack
>> > >
>> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
>> > > the implementation of GLIBC ucontext related APIs.
>> >
>> > Please document exactly what these all do and why.  I don't understand
>> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
>> > each ELF program, so I think there should be no need for a magic
>> > override.
>>
>> CET is initially enabled if the loader has CET capability.  Then the
>> loader decides if the application can run with CET.  If the application
>> cannot run with CET (e.g. a dependent library does not have CET), then
>> the loader turns off CET before passing to the application.  When the
>> loader is done, it locks out CET and the feature cannot be turned off
>> anymore until the next exec() call.
>
> Why is the lockout necessary?  If user code enables CET and tries to
> run code that doesn't support CET, it will crash.  I don't see why we
> need special code in the kernel to prevent a user program from calling
> arch_prctl() and crashing itself.  There are already plenty of ways to
> do that :)

On a CET-enabled machine, not all programs and shared libraries are
CET enabled.  But since ld.so is CET enabled, all programs start
as CET enabled.  ld.so will disable CET if a program or any of its shared
libraries isn't CET enabled.  ld.so will lock CET once it is done with CET
checking so that CET can no longer be disabled afterwards.
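
A minimal sketch of the loader flow described above, modelled as a
small state machine in standalone C. It does not call arch_prctl(); the
struct, the helper names, and the -EPERM returned after locking are
assumptions, since the thread does not say which error a locked task
gets back.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-task CET state as the loader would see it (simplified). */
struct cet_state {
	bool enabled;
	bool locked;
};

/* Models ARCH_CET_DISABLE: refused once the state is locked. */
static int cet_disable(struct cet_state *s)
{
	if (s->locked)
		return -EPERM;	/* assumed error code */
	s->enabled = false;
	return 0;
}

/* Models ARCH_CET_LOCK: no further changes until the next exec(). */
static void cet_lock(struct cet_state *s)
{
	s->locked = true;
}

/*
 * The task starts with CET on. The loader turns it off if any loaded
 * object lacks the CET marking, then locks the result either way.
 */
static void loader_setup_cet(struct cet_state *s,
			     const bool *obj_has_cet, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!obj_has_cet[i]) {
			cet_disable(s);
			break;
		}
	}
	cet_lock(s);
}
```

In this model the lock only matters for programs that ended up with CET
on: after loader_setup_cet(), a later cet_disable() fails and the state
stays enabled.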

>> When the next exec() is called, CET
>> feature is turned on/off based on the values set by ARCH_CET_EXEC.
>
> And why do we need ARCH_CET_EXEC?
>
> For background, I really really dislike adding new state that persists
> across exec().  It's nice to get as close to a clean slate as possible
> after exec() so that programs can run in a predictable environment.
> exec() is also a security boundary, and anything a task can do to
> affect itself after exec() needs to have its security implications
> considered very carefully.  (As a trivial example, you should not be
> able to use cetcmd ... sudo [malicious options here] to cause sudo to
> run with CET off and then try to exploit it via the malicious options.)
>
> If a shutoff is needed for testing, how about teaching ld.so to parse
> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
>

I will take a look.


-- 
H.J.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 22:02         ` H.J. Lu
@ 2018-06-07 23:01           ` Andy Lutomirski
  2018-06-08  4:09             ` H.J. Lu
  2018-06-08  4:22           ` H.J. Lu
  2018-06-12 10:03           ` Thomas Gleixner
  2 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-07 23:01 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >> > >
> >> > > The following operations are provided.
> >> > >
> >> > > ARCH_CET_STATUS:
> >> > >         return the current CET status
> >> > >
> >> > > ARCH_CET_DISABLE:
> >> > >         disable CET features
> >> > >
> >> > > ARCH_CET_LOCK:
> >> > >         lock out CET features
> >> > >
> >> > > ARCH_CET_EXEC:
> >> > >         set CET features for exec()
> >> > >
> >> > > ARCH_CET_ALLOC_SHSTK:
> >> > >         allocate a new shadow stack
> >> > >
> >> > > ARCH_CET_PUSH_SHSTK:
> >> > >         put a return address on shadow stack
> >> > >
> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> >> > > the implementation of GLIBC ucontext related APIs.
> >> >
> >> > Please document exactly what these all do and why.  I don't understand
> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
> >> > each ELF program, so I think there should be no need for a magic
> >> > override.
> >>
> >> CET is initially enabled if the loader has CET capability.  Then the
> >> loader decides if the application can run with CET.  If the application
> >> cannot run with CET (e.g. a dependent library does not have CET), then
> >> the loader turns off CET before passing to the application.  When the
> >> loader is done, it locks out CET and the feature cannot be turned off
> >> anymore until the next exec() call.
> >
> > Why is the lockout necessary?  If user code enables CET and tries to
> > run code that doesn't support CET, it will crash.  I don't see why we
> > need special code in the kernel to prevent a user program from calling
> > arch_prctl() and crashing itself.  There are already plenty of ways to
> > do that :)
>
> On a CET-enabled machine, not all programs and shared libraries are
> CET enabled.  But since ld.so is CET enabled, all programs start
> as CET enabled.  ld.so will disable CET if a program or any of its shared
> libraries isn't CET enabled.  ld.so will lock CET once it is done with CET
> checking so that CET can no longer be disabled afterwards.

Yeah, I got that.  No one has explained *why*.

(Also, shouldn't the vDSO itself be marked as supporting CET?)

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 23:01           ` Andy Lutomirski
@ 2018-06-08  4:09             ` H.J. Lu
  2018-06-08  4:38               ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: H.J. Lu @ 2018-06-08  4:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >>
>> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >> > >
>> >> > > The following operations are provided.
>> >> > >
>> >> > > ARCH_CET_STATUS:
>> >> > >         return the current CET status
>> >> > >
>> >> > > ARCH_CET_DISABLE:
>> >> > >         disable CET features
>> >> > >
>> >> > > ARCH_CET_LOCK:
>> >> > >         lock out CET features
>> >> > >
>> >> > > ARCH_CET_EXEC:
>> >> > >         set CET features for exec()
>> >> > >
>> >> > > ARCH_CET_ALLOC_SHSTK:
>> >> > >         allocate a new shadow stack
>> >> > >
>> >> > > ARCH_CET_PUSH_SHSTK:
>> >> > >         put a return address on shadow stack
>> >> > >
>> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
>> >> > > the implementation of GLIBC ucontext related APIs.
>> >> >
>> >> > Please document exactly what these all do and why.  I don't understand
>> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
>> >> > each ELF program, so I think there should be no need for a magic
>> >> > override.
>> >>
>> >> CET is initially enabled if the loader has CET capability.  Then the
>> >> loader decides if the application can run with CET.  If the application
>> >> cannot run with CET (e.g. a dependent library does not have CET), then
>> >> the loader turns off CET before passing to the application.  When the
>> >> loader is done, it locks out CET and the feature cannot be turned off
>> >> anymore until the next exec() call.
>> >
>> > Why is the lockout necessary?  If user code enables CET and tries to
>> > run code that doesn't support CET, it will crash.  I don't see why we
>> > need special code in the kernel to prevent a user program from calling
>> > arch_prctl() and crashing itself.  There are already plenty of ways to
>> > do that :)
>>
>> On a CET-enabled machine, not all programs and shared libraries are
>> CET enabled.  But since ld.so is CET enabled, all programs start
>> as CET enabled.  ld.so will disable CET if a program or any of its shared
>> libraries isn't CET enabled.  ld.so will lock CET once it is done with CET
>> checking so that CET can no longer be disabled afterwards.
>
> Yeah, I got that.  No one has explained *why*.

It is to prevent malicious code from disabling CET.

> (Also, shouldn't the vDSO itself be marked as supporting CET?)

No.  The vDSO is loaded by the kernel.  The vDSO in a CET kernel is CET
compatible.

-- 
H.J.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 22:02         ` H.J. Lu
  2018-06-07 23:01           ` Andy Lutomirski
@ 2018-06-08  4:22           ` H.J. Lu
  2018-06-08  4:35             ` Andy Lutomirski
  2018-06-12 10:03           ` Thomas Gleixner
  2 siblings, 1 reply; 98+ messages in thread
From: H.J. Lu @ 2018-06-08  4:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 3:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>>
>>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>> > >
>>> > > The following operations are provided.
>>> > >
>>> > > ARCH_CET_STATUS:
>>> > >         return the current CET status
>>> > >
>>> > > ARCH_CET_DISABLE:
>>> > >         disable CET features
>>> > >
>>> > > ARCH_CET_LOCK:
>>> > >         lock out CET features
>>> > >
>>> > > ARCH_CET_EXEC:
>>> > >         set CET features for exec()
>>> > >
>>> > > ARCH_CET_ALLOC_SHSTK:
>>> > >         allocate a new shadow stack
>>> > >
>>> > > ARCH_CET_PUSH_SHSTK:
>>> > >         put a return address on shadow stack
>>> > >

>> And why do we need ARCH_CET_EXEC?
>>
>> For background, I really really dislike adding new state that persists
>> across exec().  It's nice to get as close to a clean slate as possible
>> after exec() so that programs can run in a predictable environment.
>> exec() is also a security boundary, and anything a task can do to
>> affect itself after exec() needs to have its security implications
>> considered very carefully.  (As a trivial example, you should not be
>> able to use cetcmd ... sudo [malicious options here] to cause sudo to
>> run with CET off and then try to exploit it via the malicious options.)
>>
>> If a shutoff is needed for testing, how about teaching ld.so to parse
>> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
>> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
>>
>
> I will take a look.

We can use LD_CET to turn off CET.  Since most legacy binaries
are compatible with shadow stack, ARCH_CET_EXEC can be used
to turn on shadow stack on legacy binaries:

[hjl@gnu-cet-1 glibc]$ readelf -n /bin/ls| head -10

Displaying notes found in: .note.ABI-tag
  Owner                 Data size Description
  GNU                  0x00000010 NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Displaying notes found in: .note.gnu.property
  Owner                 Data size Description
  GNU                  0x00000020 NT_GNU_PROPERTY_TYPE_0
      Properties: x86 ISA used:
[hjl@gnu-cet-1 glibc]$ cetcmd --on -- /bin/ls /
Segmentation fault
[hjl@gnu-cet-1 glibc]$ cetcmd --on -f shstk -- /bin/ls /
bin   dev  export  lib   libx32      media  mnt  opt root  sbin  sys  usr
boot  etc  home    lib64  lost+found  misc   net  proc run   srv   tmp  var
[hjl@gnu-cet-1 glibc]$ cetcmd --on -f ibt -- /bin/ls /
Segmentation fault
[hjl@gnu-cet-1 glibc]$

-- 
H.J.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-08  4:22           ` H.J. Lu
@ 2018-06-08  4:35             ` Andy Lutomirski
  2018-06-08 12:17               ` H.J. Lu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-08  4:35 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 9:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 3:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>>
> >>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> >>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>> > >
> >>> > > The following operations are provided.
> >>> > >
> >>> > > ARCH_CET_STATUS:
> >>> > >         return the current CET status
> >>> > >
> >>> > > ARCH_CET_DISABLE:
> >>> > >         disable CET features
> >>> > >
> >>> > > ARCH_CET_LOCK:
> >>> > >         lock out CET features
> >>> > >
> >>> > > ARCH_CET_EXEC:
> >>> > >         set CET features for exec()
> >>> > >
> >>> > > ARCH_CET_ALLOC_SHSTK:
> >>> > >         allocate a new shadow stack
> >>> > >
> >>> > > ARCH_CET_PUSH_SHSTK:
> >>> > >         put a return address on shadow stack
> >>> > >
>
> >> And why do we need ARCH_CET_EXEC?
> >>
> >> For background, I really really dislike adding new state that persists
> >> across exec().  It's nice to get as close to a clean slate as possible
> >> after exec() so that programs can run in a predictable environment.
> >> exec() is also a security boundary, and anything a task can do to
> >> affect itself after exec() needs to have its security implications
> >> considered very carefully.  (As a trivial example, you should not be
> >> able to use cetcmd ... sudo [malicious options here] to cause sudo to
> >> run with CET off and then try to exploit it via the malicious options.)
> >>
> >> If a shutoff is needed for testing, how about teaching ld.so to parse
> >> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
> >> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
> >>
> >
> > I will take a look.
>
> We can use LD_CET to turn off CET.  Since most legacy binaries
> are compatible with shadow stack, ARCH_CET_EXEC can be used
> to turn on shadow stack on legacy binaries:

Is there any reason you can't use LD_CET=force to do it for
dynamically linked binaries?

I find it quite hard to believe that forcibly CET-ifying a legacy
statically linked binary is a good idea.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-08  4:09             ` H.J. Lu
@ 2018-06-08  4:38               ` Andy Lutomirski
  2018-06-08 12:24                 ` H.J. Lu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-08  4:38 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >> >>
> >> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> >> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >> >> > >
> >> >> > > The following operations are provided.
> >> >> > >
> >> >> > > ARCH_CET_STATUS:
> >> >> > >         return the current CET status
> >> >> > >
> >> >> > > ARCH_CET_DISABLE:
> >> >> > >         disable CET features
> >> >> > >
> >> >> > > ARCH_CET_LOCK:
> >> >> > >         lock out CET features
> >> >> > >
> >> >> > > ARCH_CET_EXEC:
> >> >> > >         set CET features for exec()
> >> >> > >
> >> >> > > ARCH_CET_ALLOC_SHSTK:
> >> >> > >         allocate a new shadow stack
> >> >> > >
> >> >> > > ARCH_CET_PUSH_SHSTK:
> >> >> > >         put a return address on shadow stack
> >> >> > >
> >> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> >> >> > > the implementation of GLIBC ucontext related APIs.
> >> >> >
> >> >> > Please document exactly what these all do and why.  I don't understand
> >> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
> >> >> > each ELF program, so I think there should be no need for a magic
> >> >> > override.
> >> >>
> >> >> CET is initially enabled if the loader has CET capability.  Then the
> >> >> loader decides if the application can run with CET.  If the application
> >> >> cannot run with CET (e.g. a dependent library does not have CET), then
> >> >> the loader turns off CET before passing to the application.  When the
> >> >> loader is done, it locks out CET and the feature cannot be turned off
> >> >> anymore until the next exec() call.
> >> >
> >> > Why is the lockout necessary?  If user code enables CET and tries to
> >> > run code that doesn't support CET, it will crash.  I don't see why we
> >> > need special code in the kernel to prevent a user program from calling
> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
> >> > do that :)
> >>
> >> On a CET-enabled machine, not all programs and shared libraries are
> >> CET enabled.  But since ld.so is CET enabled, all programs start
> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
> >> libraries isn't CET enabled.  ld.so will lock CET once it is done with CET
> >> checking so that CET can no longer be disabled afterwards.
> >
> > Yeah, I got that.  No one has explained *why*.
>
> It is to prevent malicious code from disabling CET.
>

By the time malicious code issues its own syscalls, you've already lost
the battle.  I could probably be convinced that a lock-CET-on feature
that applies *only* to the calling thread and is not inherited by
clone() is a decent idea, but I'd want to see someone who understands
the state of the art in exploit design justify it.  You're also going
to need to figure out how to make CRIU work if you allow locking CET
on.

A priori, I think we should just not provide a lock mechanism.

> > (Also, shouldn't the vDSO itself be marked as supporting CET?)
>
> No.  The vDSO is loaded by the kernel.  The vDSO in a CET kernel is CET
> compatible.
>

I think the vDSO should do its best to act like a real DSO.  That
means that, if the vDSO supports CET, it should advertise support for
CET using the Linux ABI.  Since you're going to require GCC 8 anyway,
this should be a single line of code in the Makefile.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
  2018-06-07 20:57       ` Andy Lutomirski
@ 2018-06-08 12:07         ` Cyrill Gorcunov
  0 siblings, 0 replies; 98+ messages in thread
From: Cyrill Gorcunov @ 2018-06-08 12:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, Florian Weimer, Dmitry Safonov, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
	Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	mike.kravetz

On Thu, Jun 07, 2018 at 01:57:03PM -0700, Andy Lutomirski wrote:
...
> >
> > I haven't read the whole series of patches in detail
> > yet; hopefully I will be able to tomorrow. Thanks Andy for
> > CC'ing!
> 
> We have uc_flags.  It might be useful to carve out some of the flag
> space (24 bits?) to indicate something like the *size* of sigcontext
> and teach the kernel that new sigcontext fields should only be parsed
> on sigreturn() if the size is large enough.

Yes, this should do the trick.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-08  4:35             ` Andy Lutomirski
@ 2018-06-08 12:17               ` H.J. Lu
  0 siblings, 0 replies; 98+ messages in thread
From: H.J. Lu @ 2018-06-08 12:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 9:35 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 9:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Jun 7, 2018 at 3:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >>>
>> >>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> >>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >>> > >
>> >>> > > The following operations are provided.
>> >>> > >
>> >>> > > ARCH_CET_STATUS:
>> >>> > >         return the current CET status
>> >>> > >
>> >>> > > ARCH_CET_DISABLE:
>> >>> > >         disable CET features
>> >>> > >
>> >>> > > ARCH_CET_LOCK:
>> >>> > >         lock out CET features
>> >>> > >
>> >>> > > ARCH_CET_EXEC:
>> >>> > >         set CET features for exec()
>> >>> > >
>> >>> > > ARCH_CET_ALLOC_SHSTK:
>> >>> > >         allocate a new shadow stack
>> >>> > >
>> >>> > > ARCH_CET_PUSH_SHSTK:
>> >>> > >         put a return address on shadow stack
>> >>> > >
>>
>> >> And why do we need ARCH_CET_EXEC?
>> >>
>> >> For background, I really really dislike adding new state that persists
>> >> across exec().  It's nice to get as close to a clean slate as possible
>> >> after exec() so that programs can run in a predictable environment.
>> >> exec() is also a security boundary, and anything a task can do to
>> >> affect itself after exec() needs to have its security implications
>> >> considered very carefully.  (As a trivial example, you should not be
>> >> able to use cetcmd ... sudo [malicious options here] to cause sudo to
>> >> run with CET off and then try to exploit it via the malicious options.)
>> >>
>> >> If a shutoff is needed for testing, how about teaching ld.so to parse
>> >> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
>> >> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
>> >>
>> >
>> > I will take a look.
>>
>> We can use LD_CET to turn off CET.  Since most legacy binaries
>> are compatible with shadow stack, ARCH_CET_EXEC can be used
>> to turn on shadow stack on legacy binaries:
>
> Is there any reason you can't use LD_CET=force to do it for
> dynamically linked binaries?

We need to enable shadow stack from the start.  Otherwise function
return will fail when returning from callee with shadow stack to caller
without shadow stack.
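
To make the failure mode concrete, here is a minimal userspace model in C
of why shadow stack cannot be switched on in the middle of a call chain.
It is only a sketch under stated assumptions: the names (struct model,
model_call, model_ret) are made up for illustration and do not correspond
to any kernel or glibc interface, and real hardware raises a
control-protection fault instead of returning an error code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DEPTH 16

/* Hypothetical model: a normal stack of return addresses plus a shadow copy. */
struct model {
	uint64_t stack[MODEL_DEPTH];	/* return addresses on the normal stack */
	size_t sp;
	uint64_t shstk[MODEL_DEPTH];	/* return addresses on the shadow stack */
	size_t ssp;
	bool shstk_on;			/* is shadow stack enforcement enabled? */
};

/*
 * CALL: always pushes on the normal stack, and pushes on the shadow
 * stack only while enforcement is on.  Returns false if the model
 * runs out of room.
 */
static bool model_call(struct model *m, uint64_t ret_addr)
{
	if (m->sp == MODEL_DEPTH)
		return false;
	if (m->shstk_on && m->ssp == MODEL_DEPTH)
		return false;

	m->stack[m->sp++] = ret_addr;
	if (m->shstk_on)
		m->shstk[m->ssp++] = ret_addr;
	return true;
}

/*
 * RET: pops the normal stack.  With enforcement on, the shadow stack
 * must hold a matching entry; otherwise the model reports a fault (-1).
 */
static int model_ret(struct model *m)
{
	uint64_t ret_addr;

	if (m->sp == 0)
		return -1;
	ret_addr = m->stack[--m->sp];

	if (!m->shstk_on)
		return 0;
	if (m->ssp == 0 || m->shstk[--m->ssp] != ret_addr)
		return -1;
	return 0;
}
```

If a caller is entered with shstk_on clear and enforcement is set before
the next model_call(), the callee's model_ret() succeeds but the return
into the original caller fails, because that frame was never recorded on
the shadow stack.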

> I find it quite hard to believe that forcibly CET-ifying a legacy
> statically linked binary is a good idea.

We'd like to provide protection as much as we can.

-- 
H.J.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-08  4:38               ` Andy Lutomirski
@ 2018-06-08 12:24                 ` H.J. Lu
  2018-06-08 14:57                   ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: H.J. Lu @ 2018-06-08 12:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 9:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>
>> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >> >>
>> >> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> >> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >> >> > >
>> >> >> > > The following operations are provided.
>> >> >> > >
>> >> >> > > ARCH_CET_STATUS:
>> >> >> > >         return the current CET status
>> >> >> > >
>> >> >> > > ARCH_CET_DISABLE:
>> >> >> > >         disable CET features
>> >> >> > >
>> >> >> > > ARCH_CET_LOCK:
>> >> >> > >         lock out CET features
>> >> >> > >
>> >> >> > > ARCH_CET_EXEC:
>> >> >> > >         set CET features for exec()
>> >> >> > >
>> >> >> > > ARCH_CET_ALLOC_SHSTK:
>> >> >> > >         allocate a new shadow stack
>> >> >> > >
>> >> >> > > ARCH_CET_PUSH_SHSTK:
>> >> >> > >         put a return address on shadow stack
>> >> >> > >
>> >> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
>> >> >> > > the implementation of GLIBC ucontext related APIs.
>> >> >> >
>> >> >> > Please document exactly what these all do and why.  I don't understand
>> >> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
>> >> >> > each ELF program, so I think there should be no need for a magic
>> >> >> > override.
>> >> >>
>> >> >> CET is initially enabled if the loader has CET capability.  Then the
>> >> >> loader decides if the application can run with CET.  If the application
>> >> >> cannot run with CET (e.g. a dependent library does not have CET), then
>> >> >> the loader turns off CET before passing to the application.  When the
>> >> >> loader is done, it locks out CET and the feature cannot be turned off
>> >> >> anymore until the next exec() call.
>> >> >
>> >> > Why is the lockout necessary?  If user code enables CET and tries to
>> >> > run code that doesn't support CET, it will crash.  I don't see why we
>> >> > need special code in the kernel to prevent a user program from calling
>> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
>> >> > do that :)
>> >>
>> >> On a CET-enabled machine, not all programs and shared libraries are
>> >> CET enabled.  But since ld.so is CET enabled, all programs start
>> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
>> >> libraries isn't CET enabled.  ld.so will lock CET once it is done CET
>> >> checking so that CET can no longer be disabled afterwards.
>> >
>> > Yeah, I got that.  No one has explained *why*.
>>
>> It is to prevent malicious code from disabling CET.
>>
>
> By the time malicious code issues its own syscalls, you've already lost
> the battle.  I could probably be convinced that a lock-CET-on feature
> that applies *only* to the calling thread and is not inherited by
> clone() is a decent idea, but I'd want to see someone who understands
> the state of the art in exploit design justify it.  You're also going
> to need to figure out how to make CRIU work if you allow locking CET
> on.
>
> A priori, I think we should just not provide a lock mechanism.

We need a door for CET.  But it is a very bad idea to leave it open
all the time.  I don't know much about CRIU.  If it is Checkpoint/Restore
In Userspace, can you freeze an application that uses AVX512 on an AVX512
machine and restore it on a non-AVX512 machine?

>> > (Also, shouldn't the vDSO itself be marked as supporting CET?)
>>
>> No.  The vDSO is loaded by the kernel.  The vDSO in a CET kernel is CET
>> compatible.
>>
>
> I think the vDSO should do its best to act like a real DSO.  That
> means that, if the vDSO supports CET, it should advertise support for
> CET using the Linux ABI.  Since you're going to require GCC 8 anyway,
> this should be a single line of code in the Makefile.

Sure.  A couple lines.

-- 
H.J.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
  2018-06-07 20:53       ` Andy Lutomirski
@ 2018-06-08 14:53         ` Florian Weimer
  2018-06-08 15:01           ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Florian Weimer @ 2018-06-08 14:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
>>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>>>
>>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
>>>> needs a separate program stack and a separate shadow stack.
>>>> This patch handles allocation and freeing of the thread shadow
>>>> stack.
>>>
>>> Aha -- you're trying to make this automatic.  I'm not convinced this
>>> is a good idea.  The Linux kernel has a long and storied history of
>>> enabling new hardware features in ways that are almost entirely
>>> useless for userspace.
>>>
>>> Florian, do you have any thoughts on how the user/kernel interaction
>>> for the shadow stack should work?
>>
>> I have not looked at this in detail, have not played with the emulator,
>> and have not been privy to any discussions before these patches have
>> been posted, however …
>>
>> I believe that we want as little code in userspace for shadow stack
>> management as possible.  One concern I have is that even with the code
>> we arguably need for various kinds of stack unwinding, we might have
>> unwittingly built a generic trampoline that leads to full CET bypass.
> 
> I was imagining an API like "allocate a shadow stack for the current
> thread, fail if the current thread already has one, and turn on the
> shadow stack".  glibc would call clone and then call this ABI pretty
> much immediately (i.e. before making any calls from which it expects
> to return).

Ahh.  So you propose not to enable shadow stack enforcement on the new 
thread even if it is enabled for the current thread?  For the cases 
where CLONE_VM is involved?

It will still need a new assembler wrapper which sets up the shadow 
stack, and it's probably required to disable signals.

I think it should be reasonably safe and actually implementable.  But 
the benefits are not immediately obvious to me.

> We definitely want strong enough user control that tools like CRIU can
> continue to work.  I haven't looked at the SDM recently enough to
> remember for sure, but I'm reasonably confident that user code can
> learn the address of its own shadow stack.  If nothing else, CRIU
> needs to be able to restore from a context where there's a signal on
> the stack and the signal frame contains a shadow stack pointer.

CRIU also needs the shadow stack *contents*, which shouldn't be directly 
available to the process.  So it needs very special interfaces anyway.

Does CRIU implement MPX support?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-08 12:24                 ` H.J. Lu
@ 2018-06-08 14:57                   ` Andy Lutomirski
  2018-06-08 15:52                     ` Cyrill Gorcunov
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-08 14:57 UTC (permalink / raw)
  To: H. J. Lu, Cyrill Gorcunov, Dmitry Safonov
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Fri, Jun 8, 2018 at 5:24 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 9:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >>
> >
> > By the time malicious code issues its own syscalls, you've already lost
> > the battle.  I could probably be convinced that a lock-CET-on feature
> > that applies *only* to the calling thread and is not inherited by
> > clone() is a decent idea, but I'd want to see someone who understands
> > the state of the art in exploit design justify it.  You're also going
> > to need to figure out how to make CRIU work if you allow locking CET
> > on.
> >
> > A priori, I think we should just not provide a lock mechanism.
>
> We need a door for CET.  But it is a very bad idea to leave it open
> all the time.  I don't know much about CRIU.  If it is Checkpoint/Restore
> In Userspace, can you freeze an application that uses AVX512 on an AVX512
> machine and restore it on a non-AVX512 machine?

Presumably not -- if the program uses AVX512 and AVX512 goes away,
then the program won't be happy.

Anyway, having thought about this, here's a straw man proposal.  We
add a lock flag like in these patches.  The lock flag is set by
arch_prctl(), inherited on clone, and cleared on exec().  ptrace()
gains a new API to clear the lock flag and can modify the CET
configuration regardless of the lock flag.  (So ptrace() needs APIs to
read and write SSP, to read and write the shadow stack itself, and to
change the mode.)  By the time an attacker has gotten enough control
of a victim process to get it to use ptrace(), I don't think that
trying to protect CET serves any purpose.
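
As a rough illustration of the straw man above, the lock-flag rules can be
written down as a small userspace state model in C.  This is a sketch, not
kernel code: struct cet_model and every function name here are
hypothetical, and the exec() behaviour (CET state derived from whether the
loader supports CET) is an assumption taken from the description earlier
in this thread.

```c
#include <stdbool.h>

/* Hypothetical per-task CET state; not the kernel's actual layout. */
struct cet_model {
	bool shstk_enabled;
	bool locked;
};

/* The task asks to disable CET for itself (think arch_prctl()). */
static int cet_model_disable(struct cet_model *t)
{
	if (t->locked)
		return -1;	/* refused while the lock flag is set */
	t->shstk_enabled = false;
	return 0;
}

/* The task sets the lock flag (think arch_prctl()). */
static void cet_model_lock(struct cet_model *t)
{
	t->locked = true;
}

/* clone(): the child inherits the CET mode and the lock flag. */
static struct cet_model cet_model_clone(const struct cet_model *parent)
{
	return *parent;
}

/*
 * exec(): the lock flag is cleared.  Assumption: the new image starts
 * with CET on only if its loader supports CET.
 */
static void cet_model_exec(struct cet_model *t, bool loader_supports_cet)
{
	t->locked = false;
	t->shstk_enabled = loader_supports_cet;
}

/*
 * ptrace(): a tracer may clear the lock flag and change the mode of
 * the tracee regardless of the lock flag.
 */
static void cet_model_ptrace_set(struct cet_model *tracee, bool enable)
{
	tracee->locked = false;
	tracee->shstk_enabled = enable;
}
```

In this model a locked task cannot turn CET off by itself, a cloned child
starts locked as well, and only exec() or a tracer can reopen the door,
which is the property a tool like CRIU would rely on.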

As an aside, where are the latest CET docs?  I've found the "CET
technology preview 2.0", but it doesn't seem to be very clear or
entirely complete.

On Fri, Jun 8, 2018 at 5:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 9:35 PM, Andy Lutomirski <luto@kernel.org> wrote:

> > Is there any reason you can't use LD_CET=force to do it for
> > dynamically linked binaries?
>
> We need to enable shadow stack from the start.  Otherwise function
> return will fail when returning from callee with shadow stack to caller
> without shadow stack.

I don't see the problem.  A CET-supporting ld.so will be started with
CET on regardless of what the final binary says.  If ld.so sees
LD_CET=force, it can keep CET on regardless of the flags in the loaded
binary.

>
> > I find it quite hard to believe that forcibly CET-ifying a legacy
> > statically linked binary is a good idea.
>
> We'd like to provide protection as much as we can.
>

I agree that this is a nice sentiment, but I don't think that a simple
"force CET on next exec()" flag is a good way to accomplish this.
I've had the pleasure of using legacy binaries, and there are all
kinds of gotchas.  First, a bunch of them aren't binaries at all --
they're shell scripts.  There's big_expensive_program that starts with
#!/bin/bash and eventually execs
/opt/blahblahblah/big_expensive_program_bin, and that involves two
execs.  (Heck, even Firefox is set up more or less like this.)  Some
programs can re-exec themselves.  All of this is not to mention that
it would be really annoying when your program crashes after you've
been using it for hours because you finally triggered the code path
that did longjmp() and CET kills it.

And you don't really need kernel support for this anyway.  It should
be relatively straightforward to write a loader that opens and loads a
static binary.

I think that this entire CET-on-exec concept should be dropped from
this patch series.  If someone really wants it, make it a separate
patch on top after everything has been merged, and we can poke holes
in it then.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
  2018-06-08 14:53         ` Florian Weimer
@ 2018-06-08 15:01           ` Andy Lutomirski
  2018-06-08 15:50             ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-08 15:01 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>>>
> >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >>>> needs a separate program stack and a separate shadow stack.
> >>>> This patch handles allocation and freeing of the thread shadow
> >>>> stack.
> >>>
> >>> Aha -- you're trying to make this automatic.  I'm not convinced this
> >>> is a good idea.  The Linux kernel has a long and storied history of
> >>> enabling new hardware features in ways that are almost entirely
> >>> useless for userspace.
> >>>
> >>> Florian, do you have any thoughts on how the user/kernel interaction
> >>> for the shadow stack should work?
> >>
> >> I have not looked at this in detail, have not played with the emulator,
> >> and have not been privy to any discussions before these patches have
> >> been posted, however …
> >>
> >> I believe that we want as little code in userspace for shadow stack
> >> management as possible.  One concern I have is that even with the code
> >> we arguably need for various kinds of stack unwinding, we might have
> >> unwittingly built a generic trampoline that leads to full CET bypass.
> >
> > I was imagining an API like "allocate a shadow stack for the current
> > thread, fail if the current thread already has one, and turn on the
> > shadow stack".  glibc would call clone and then call this ABI pretty
> > much immediately (i.e. before making any calls from which it expects
> > to return).
>
> Ahh.  So you propose not to enable shadow stack enforcement on the new
> thread even if it is enabled for the current thread?  For the cases
> where CLONE_VM is involved?
>
> It will still need a new assembler wrapper which sets up the shadow
> stack, and it's probably required to disable signals.
>
> I think it should be reasonably safe and actually implementable.  But
> the benefits are not immediately obvious to me.

Doing it this way would have been my first inclination.  It would
avoid all the oddities of the kernel magically creating a VMA when
clone() is called, guessing the shadow stack size, etc.  But I'm okay
with having the kernel do it automatically, too.  I think it would be
very nice to have a way for user code to find out the size of the
shadow stack and change it, though.  (And relocate it, but maybe
that's impossible.  The CET documentation doesn't have a clear
description of the shadow stack layout.)

>
> > We definitely want strong enough user control that tools like CRIU can
> > continue to work.  I haven't looked at the SDM recently enough to
> > remember for sure, but I'm reasonably confident that user code can
> > learn the address of its own shadow stack.  If nothing else, CRIU
> > needs to be able to restore from a context where there's a signal on
> > the stack and the signal frame contains a shadow stack pointer.
>
> CRIU also needs the shadow stack *contents*, which shouldn't be directly
> available to the process.  So it needs very special interfaces anyway.

True.  I proposed in a different email that ptrace() have full control
of the shadow stack (read, write, lock, unlock, etc).

>
> Does CRIU implement MPX support?

Dunno.  But given that MPX seems to be dying, I'm not sure it matters.

--Andy

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
  2018-06-08 15:01           ` Andy Lutomirski
@ 2018-06-08 15:50             ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-08 15:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Fri, 2018-06-08 at 08:01 -0700, Andy Lutomirski wrote:
> On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> > > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> > >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >>>>
> > >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> > >>>> needs a separate program stack and a separate shadow stack.
> > >>>> This patch handles allocation and freeing of the thread shadow
> > >>>> stack.
> > >>>
> > >>> Aha -- you're trying to make this automatic.  I'm not convinced this
> > >>> is a good idea.  The Linux kernel has a long and storied history of
> > >>> enabling new hardware features in ways that are almost entirely
> > >>> useless for userspace.
> > >>>
> > >>> Florian, do you have any thoughts on how the user/kernel interaction
> > >>> for the shadow stack should work?
> > >>
> > >> I have not looked at this in detail, have not played with the emulator,
> > >> and have not been privy to any discussions before these patches have
> > >> been posted, however …
> > >>
> > >> I believe that we want as little code in userspace for shadow stack
> > >> management as possible.  One concern I have is that even with the code
> > >> we arguably need for various kinds of stack unwinding, we might have
> > >> unwittingly built a generic trampoline that leads to full CET bypass.
> > >
> > > I was imagining an API like "allocate a shadow stack for the current
> > > thread, fail if the current thread already has one, and turn on the
> > > shadow stack".  glibc would call clone and then call this ABI pretty
> > > much immediately (i.e. before making any calls from which it expects
> > > to return).
> >
> > Ahh.  So you propose not to enable shadow stack enforcement on the new
> > thread even if it is enabled for the current thread?  For the cases
> > where CLONE_VM is involved?
> >
> > It will still need a new assembler wrapper which sets up the shadow
> > stack, and it's probably required to disable signals.
> >
> > I think it should be reasonably safe and actually implementable.  But
> > the benefits are not immediately obvious to me.
> 
> Doing it this way would have been my first inclination.  It would
> avoid all the oddities of the kernel magically creating a VMA when
> clone() is called, guessing the shadow stack size, etc.  But I'm okay
> with having the kernel do it automatically, too.

HJ wanted to add an arch_prctl that allocates a new shadow stack and
switches to it.  That was mainly for swapcontext.  Perhaps we can also
use that for threads?  HJ, can you comment on this?

> I think it would be
> very nice to have a way for user code to find out the size of the
> shadow stack and change it, though.  (And relocate it, but maybe
> that's impossible.  The CET documentation doesn't have a clear
> description of the shadow stack layout.)

The shadow stack is vm_mmap'ed from memory and does not have any special
layout.  We can add an arch_prctl to find out the shadow stack's address
and size.

> >
> > > We definitely want strong enough user control that tools like CRIU can
> > > continue to work.  I haven't looked at the SDM recently enough to
> > > remember for sure, but I'm reasonably confident that user code can
> > > learn the address of its own shadow stack.  If nothing else, CRIU
> > > needs to be able to restore from a context where there's a signal on
> > > the stack and the signal frame contains a shadow stack pointer.
> >
> > CRIU also needs the shadow stack *contents*, which shouldn't be directly
> > available to the process.  So it needs very special interfaces anyway.
> 
> True.  I proposed in a different email that ptrace() have full control
> of the shadow stack (read, write, lock, unlock, etc).

ptrace() can do PTRACE_POKEDATA on the shadow stack.  We can add lock/unlock.

> >
> > Does CRIU implement MPX support?
> 
> Dunno.  But given that MPX seems to be dying, I'm not sure it matters.
> 
> --Andy

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-08 14:57                   ` Andy Lutomirski
@ 2018-06-08 15:52                     ` Cyrill Gorcunov
  0 siblings, 0 replies; 98+ messages in thread
From: Cyrill Gorcunov @ 2018-06-08 15:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. J. Lu, Dmitry Safonov, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Fri, Jun 08, 2018 at 07:57:22AM -0700, Andy Lutomirski wrote:
> On Fri, Jun 8, 2018 at 5:24 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Jun 7, 2018 at 9:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > > On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >>
> > >> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > >>
> > >
> > > By the time malicious code issues its own syscalls, you've already lost
> > > the battle.  I could probably be convinced that a lock-CET-on feature
> > > that applies *only* to the calling thread and is not inherited by
> > > clone() is a decent idea, but I'd want to see someone who understands
> > > the state of the art in exploit design justify it.  You're also going
> > > to need to figure out how to make CRIU work if you allow locking CET
> > > on.
> > >
> > > A priori, I think we should just not provide a lock mechanism.
> >
> > We need a door for CET.  But it is a very bad idea to leave it open
> > all the time.  I don't know much about CRIU.  If it is Checkpoint/Restore
> > In Userspace, can you freeze an application that uses AVX512 on an AVX512
> > machine and restore it on a non-AVX512 machine?
> 
> Presumably not -- if the program uses AVX512 and AVX512 goes away,
> then the program won't be happy.

Yes.  In most scenarios we require the FPU capability to be the same
on both machines (in case of migration) and/or not to change
between c/r cycles.
...
> As an aside, where are the latest CET docs?  I've found the "CET
> technology preview 2.0", but it doesn't seem to be very clear or
> entirely complete.

+1

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-07 16:40   ` Andy Lutomirski
  2018-06-07 16:51     ` Yu-cheng Yu
  2018-06-07 18:41     ` Peter Zijlstra
@ 2018-06-11  8:17     ` Peter Zijlstra
  2018-06-11 15:02       ` Yu-cheng Yu
  2 siblings, 1 reply; 98+ messages in thread
From: Peter Zijlstra @ 2018-06-11  8:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, Jun 07, 2018 at 09:40:02AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:

> Peterz, isn't there some fancy better way we're supposed to handle the
> error return these days?

> > +       asm volatile("1:.byte 0x66, 0x0f, 0x38, 0xf5, 0x37\n"
> > +                    "xor %[err],%[err]\n"
> > +                    "2:\n"
> > +                    ".section .fixup,\"ax\"\n"
> > +                    "3: mov $-1,%[err]; jmp 2b\n"
> > +                    ".previous\n"
> > +                    _ASM_EXTABLE(1b, 3b)
> > +               : [err] "=a" (err)
> > +               : [val] "S" (val), [addr] "D" (addr)
> > +               : "memory");

So the alternative is something like:

__visible bool ex_handler_wruss(const struct exception_table_entry *fixup,
				struct pt_regs *regs, int trapnr)
{
	/* Resume after the faulting instruction and report failure in %ax. */
	regs->ip = ex_fixup_addr(fixup);
	regs->ax = -1L;

	return true;
}


	int err = 0;

	/* "+a": err must still be 0 afterwards unless the handler ran. */
	asm volatile("1: INSN_WRUSS\n"
		     "2:\n"
		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wruss)
		     : "+a" (err)
		     : "S" (val), "D" (addr));

But I'm not at all sure that's actually better.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-11  8:17     ` Peter Zijlstra
@ 2018-06-11 15:02       ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-11 15:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Mon, 2018-06-11 at 10:17 +0200, Peter Zijlstra wrote:
> On Thu, Jun 07, 2018 at 09:40:02AM -0700, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> 
> > Peterz, isn't there some fancy better way we're supposed to handle the
> > error return these days?
> 
> > > +       asm volatile("1:.byte 0x66, 0x0f, 0x38, 0xf5, 0x37\n"
> > > +                    "xor %[err],%[err]\n"
> > > +                    "2:\n"
> > > +                    ".section .fixup,\"ax\"\n"
> > > +                    "3: mov $-1,%[err]; jmp 2b\n"
> > > +                    ".previous\n"
> > > +                    _ASM_EXTABLE(1b, 3b)
> > > +               : [err] "=a" (err)
> > > +               : [val] "S" (val), [addr] "D" (addr)
> > > +               : "memory");
> 
> So the alternative is something like:
> 
> __visible bool ex_handler_wuss(const struct exception_table_entry *fixup,
> 			       struct pt_regs *regs, int trapnr)
> {
> 	regs->ip = ex_fixup_addr(fixup);
> 	regs->ax = -1L;
> 
> 	return true;
> }
> 
> 
> 	int err = 0;
> 
> 	asm volatile("1: INSN_WUSS\n"
> 		     "2:\n"
> 		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wuss)
> 		     : "=a" (err)
> 		     : "S" (val), "D" (addr));
> 
> But I'm not at all sure that's actually better.

Thanks!  I will fix it.

Yu-cheng

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-07 22:02         ` H.J. Lu
  2018-06-07 23:01           ` Andy Lutomirski
  2018-06-08  4:22           ` H.J. Lu
@ 2018-06-12 10:03           ` Thomas Gleixner
  2018-06-12 11:43             ` H.J. Lu
  2 siblings, 1 reply; 98+ messages in thread
From: Thomas Gleixner @ 2018-06-12 10:03 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Andy Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Thu, 7 Jun 2018, H.J. Lu wrote:
> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > Why is the lockout necessary?  If user code enables CET and tries to
> > run code that doesn't support CET, it will crash.  I don't see why we
> > need special code in the kernel to prevent a user program from calling
> > arch_prctl() and crashing itself.  There are already plenty of ways to
> > do that :)
> 
> On a CET-enabled machine, not all programs and shared libraries are
> CET enabled.  But since ld.so is CET enabled, all programs start
> as CET enabled.  ld.so will disable CET if a program or any of its shared
> libraries isn't CET enabled.  ld.so will lock CET once it is done CET
> checking so that CET can no longer be disabled afterwards.

That works for stuff which loads all libraries at start time, but what
happens if the program uses dlopen() later on? If CET is force locked and
the library is not CET enabled, it will fail.
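
The dlopen() case can be sketched with a small userspace model in C of
the ld.so policy described in the quoted text.  All names here (struct
ldso_model, ldso_model_startup, ldso_model_dlopen) are hypothetical and
are not glibc interfaces; the model only assumes what the quote states:
CET starts on, is dropped if any initial object lacks CET support, and is
locked once the startup check is done.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process view of the loader's CET decision. */
struct ldso_model {
	bool cet_on;
	bool cet_locked;
};

/*
 * Startup: the process begins with CET on because ld.so is CET
 * enabled.  CET is turned off if any initially loaded object lacks
 * CET support, and the result is locked before the program runs.
 */
static void ldso_model_startup(struct ldso_model *p,
			       const bool *dso_has_cet, size_t n)
{
	size_t i;

	p->cet_on = true;
	for (i = 0; i < n; i++) {
		if (!dso_has_cet[i])
			p->cet_on = false;
	}
	p->cet_locked = true;
}

/*
 * dlopen() at run time: a legacy library can only be accepted if CET
 * is already off or can still be turned off.  Returns 0 on success
 * and -1 if the library has to be rejected.
 */
static int ldso_model_dlopen(struct ldso_model *p, bool lib_has_cet)
{
	if (lib_has_cet || !p->cet_on)
		return 0;
	if (p->cet_locked)
		return -1;
	p->cet_on = false;
	return 0;
}
```

With every startup object CET enabled, a later ldso_model_dlopen() of a
legacy library returns -1 in this model, which is the failure described
above.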

I don't see the point of trying to support CET by magic. It adds complexity
and you'll never be able to handle all corner cases correctly. dlopen() is
not even a corner case.

Occasionally stuff needs to be recompiled to utilize new mechanisms, see
retpoline ...

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (9 preceding siblings ...)
  2018-06-07 14:38 ` [PATCH 10/10] mm: Prevent munmap and remap_file_pages of " Yu-cheng Yu
@ 2018-06-12 10:56 ` Balbir Singh
  2018-06-12 15:03   ` Yu-cheng Yu
  2018-06-26  2:46 ` Jann Horn
  2018-06-26  5:26 ` Andy Lutomirski
  12 siblings, 1 reply; 98+ messages in thread
From: Balbir Singh @ 2018-06-12 10:56 UTC (permalink / raw)
  To: Yu-cheng Yu, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz



On 08/06/18 00:37, Yu-cheng Yu wrote:
> This series introduces CET - Shadow stack
> 
> At the high level, shadow stack is:
> 
> 	Allocated from a task's address space with vm_flags VM_SHSTK;
> 	Its PTEs must be read-only and dirty;
> 	Fixed sized, but the default size can be changed by sys admin.
> 
> For a forked child, the shadow stack is duplicated when the next
> shadow stack access takes place.
> 
> For a pthread child, a new shadow stack is allocated.
> 
> The signal handler uses the same shadow stack as the main program.
> 

Even with sigaltstack()?


Balbir Singh.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 10:03           ` Thomas Gleixner
@ 2018-06-12 11:43             ` H.J. Lu
  2018-06-12 16:01               ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: H.J. Lu @ 2018-06-12 11:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 7 Jun 2018, H.J. Lu wrote:
>> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > Why is the lockout necessary?  If user code enables CET and tries to
>> > run code that doesn't support CET, it will crash.  I don't see why we
>> > need special code in the kernel to prevent a user program from calling
>> > arch_prctl() and crashing itself.  There are already plenty of ways to
>> > do that :)
>>
>> On CET enabled machine, not all programs nor shared libraries are
>> CET enabled.  But since ld.so is CET enabled, all programs start
>> as CET enabled.  ld.so will disable CET if a program or any of its shared
>> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
>> checking so that CET can no longer be disabled afterwards.
>
> That works for stuff which loads all libraries at start time, but what
> happens if the program uses dlopen() later on? If CET is force locked and
> the library is not CET enabled, it will fail.

That is to prevent disabling CET by dlopening a legacy shared library.

> I don't see the point of trying to support CET by magic. It adds complexity
> and you'll never be able to handle all corner cases correctly. dlopen() is
> not even a corner case.

That is a price we pay for security.  To enable CET, especially shadow
stack, the program and all of the shared libraries it uses should be CET
enabled.  Most programs can be enabled with CET by compiling them
with -fcf-protection.

> Occasionally stuff needs to be recompiled to utilize new mechanisms, see
> retpoline ...
>
> Thanks,
>
>         tglx
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 01/10] x86/cet: User-mode shadow stack support
  2018-06-07 14:37 ` [PATCH 01/10] x86/cet: User-mode shadow stack support Yu-cheng Yu
  2018-06-07 16:37   ` Andy Lutomirski
@ 2018-06-12 11:56   ` Balbir Singh
  2018-06-12 15:03     ` Yu-cheng Yu
  1 sibling, 1 reply; 98+ messages in thread
From: Balbir Singh @ 2018-06-12 11:56 UTC (permalink / raw)
  To: Yu-cheng Yu, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz



On 08/06/18 00:37, Yu-cheng Yu wrote:
> This patch adds basic shadow stack enabling/disabling routines.
> A task's shadow stack is allocated from memory with VM_SHSTK
> flag set and read-only protection.  The shadow stack is
> allocated to a fixed size and that can be changed by the system
> admin.
> 

I presume a read-only permission on the kernel side, but it
can be written by hardware?

> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/include/asm/cet.h               |  32 ++++++++
>  arch/x86/include/asm/disabled-features.h |   8 +-
>  arch/x86/include/asm/msr-index.h         |  14 ++++
>  arch/x86/include/asm/processor.h         |   5 ++
>  arch/x86/kernel/Makefile                 |   2 +
>  arch/x86/kernel/cet.c                    | 123 +++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/common.c             |  24 ++++++
>  arch/x86/kernel/process.c                |   2 +
>  fs/proc/task_mmu.c                       |   3 +
>  9 files changed, 212 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/include/asm/cet.h
>  create mode 100644 arch/x86/kernel/cet.c
> 
> diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
> new file mode 100644
> index 000000000000..9d5bc1efc9b7
> --- /dev/null
> +++ b/arch/x86/include/asm/cet.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_CET_H
> +#define _ASM_X86_CET_H
> +
> +#ifndef __ASSEMBLY__
> +#include <linux/types.h>
> +
> +struct task_struct;
> +/*
> + * Per-thread CET status
> + */
> +struct cet_stat {

stat sounds like statistics, just expand out to status please

> +	unsigned long	shstk_base;
> +	unsigned long	shstk_size;
> +	unsigned int	shstk_enabled:1;
> +};
> +
> +#ifdef CONFIG_X86_INTEL_CET
> +unsigned long cet_get_shstk_ptr(void);

For the current task? Why does _ptr routine return an unsigned long?

> +int cet_setup_shstk(void);
> +void cet_disable_shstk(void);
> +void cet_disable_free_shstk(struct task_struct *p);
> +#else
> +static inline unsigned long cet_get_shstk_ptr(void) { return 0; }
> +static inline int cet_setup_shstk(void) { return 0; }
> +static inline void cet_disable_shstk(void) {}
> +static inline void cet_disable_free_shstk(struct task_struct *p) {}
> +#endif
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* _ASM_X86_CET_H */
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 33833d1909af..3624a11e5ba6 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -56,6 +56,12 @@
>  # define DISABLE_PTI		(1 << (X86_FEATURE_PTI & 31))
>  #endif
>  
> +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
> +#define DISABLE_SHSTK	0
> +#else
> +#define DISABLE_SHSTK	(1<<(X86_FEATURE_SHSTK & 31))
> +#endif
> +
>  /*
>   * Make sure to add features to the correct mask
>   */
> @@ -75,7 +81,7 @@
>  #define DISABLED_MASK13	0
>  #define DISABLED_MASK14	0
>  #define DISABLED_MASK15	0
> -#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
> +#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
>  #define DISABLED_MASK17	0
>  #define DISABLED_MASK18	0
>  #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index fda2114197b3..428d13828ba9 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -770,4 +770,18 @@
>  #define MSR_VM_IGNNE                    0xc0010115
>  #define MSR_VM_HSAVE_PA                 0xc0010117
>  
> +/* Control-flow Enforcement Technology MSRs */
> +#define MSR_IA32_U_CET		0x6a0
> +#define MSR_IA32_S_CET		0x6a2
> +#define MSR_IA32_PL0_SSP	0x6a4
> +#define MSR_IA32_PL3_SSP	0x6a7
> +#define MSR_IA32_INT_SSP_TAB	0x6a8

some comments on the purpose of the MSR would be nice

> +
> +/* MSR_IA32_U_CET and MSR_IA32_S_CET bits */
> +#define MSR_IA32_CET_SHSTK_EN		0x0000000000000001
> +#define MSR_IA32_CET_WRSS_EN		0x0000000000000002
> +#define MSR_IA32_CET_ENDBR_EN		0x0000000000000004
> +#define MSR_IA32_CET_LEG_IW_EN		0x0000000000000008
> +#define MSR_IA32_CET_NO_TRACK_EN	0x0000000000000010
> +

Same as above

>  #endif /* _ASM_X86_MSR_INDEX_H */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 21a114914ba4..e632dd7adaac 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -24,6 +24,7 @@ struct vm86;
>  #include <asm/special_insns.h>
>  #include <asm/fpu/types.h>
>  #include <asm/unwind_hints.h>
> +#include <asm/cet.h>
>  
>  #include <linux/personality.h>
>  #include <linux/cache.h>
> @@ -507,6 +508,10 @@ struct thread_struct {
>  	unsigned int		sig_on_uaccess_err:1;
>  	unsigned int		uaccess_err:1;	/* uaccess failed */
>  
> +#ifdef CONFIG_X86_INTEL_CET
> +	struct cet_stat		cet;
> +#endif
> +
>  	/* Floating point and extended processor state */
>  	struct fpu		fpu;
>  	/*
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 02d6f5cf4e70..7ea5e099d558 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -138,6 +138,8 @@ obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
>  obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
>  obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
>  
> +obj-$(CONFIG_X86_INTEL_CET)		+= cet.o
> +
>  ###
>  # 64 bit specific files
>  ifeq ($(CONFIG_X86_64),y)
> diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
> new file mode 100644
> index 000000000000..8abbfd44322a
> --- /dev/null
> +++ b/arch/x86/kernel/cet.c
> @@ -0,0 +1,123 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * cet.c - Control Flow Enforcement (CET)
> + *
> + * Copyright (c) 2018, Intel Corporation.
> + * Yu-cheng Yu <yu-cheng.yu@intel.com>
> + */
> +
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/sched/signal.h>
> +#include <asm/msr.h>
> +#include <asm/user.h>
> +#include <asm/fpu/xstate.h>
> +#include <asm/fpu/types.h>
> +#include <asm/cet.h>
> +
> +#define SHSTK_SIZE (0x8000 * (test_thread_flag(TIF_IA32) ? 4 : 8))
> +
> +static inline int cet_set_shstk_ptr(unsigned long addr)
> +{
> +	u64 r;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return -1;
> +
> +	if ((addr >= TASK_SIZE) || (!IS_ALIGNED(addr, 4)))
> +		return -1;

I think there was a comment about this being TASK_SIZE_MAX

> +
> +	rdmsrl(MSR_IA32_U_CET, r);
> +	wrmsrl(MSR_IA32_U_CET, r | MSR_IA32_CET_SHSTK_EN);
> +	wrmsrl(MSR_IA32_PL3_SSP, addr);

Should the enable happen before setting addr? I would expect to do this in the opposite order.

> +	return 0;
> +}
> +
> +unsigned long cet_get_shstk_ptr(void)
> +{
> +	unsigned long ptr;
> +
> +	if (!current->thread.cet.shstk_enabled)
> +		return 0;
> +
> +	rdmsrl(MSR_IA32_PL3_SSP, ptr);
> +	return ptr;
> +}
> +
> +static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
> +{
> +	struct mm_struct *mm = current->mm;
> +	unsigned long populate;
> +
> +	down_write(&mm->mmap_sem);
> +	addr = do_mmap(NULL, addr, len, PROT_READ,
> +		       MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
> +		       0, &populate, NULL);
> +	up_write(&mm->mmap_sem);

What happens if the mmap fails for any reason? I presume the caller disables shadow stack on this process?

> +
> +	if (populate)
> +		mm_populate(addr, populate);
> +
> +	return addr;
> +}
> +
> +int cet_setup_shstk(void)
> +{
> +	unsigned long addr, size;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return -EOPNOTSUPP;
> +
> +	size = SHSTK_SIZE;
> +	addr = shstk_mmap(0, size);
> +
> +	if (addr >= TASK_SIZE)
> +		return -ENOMEM;
> +

TASK_SIZE_MAX?

> +	cet_set_shstk_ptr(addr + size - sizeof(void *));
> +	current->thread.cet.shstk_base = addr;
> +	current->thread.cet.shstk_size = size;
> +	current->thread.cet.shstk_enabled = 1;
> +	return 0;
> +}
> +
> +void cet_disable_shstk(void)
> +{
> +	u64 r;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		return;
> +
> +	rdmsrl(MSR_IA32_U_CET, r);
> +	r &= ~(MSR_IA32_CET_SHSTK_EN);
> +	wrmsrl(MSR_IA32_U_CET, r);
> +	wrmsrl(MSR_IA32_PL3_SSP, 0);

Again, I'd expect the order to be the reverse

> +	current->thread.cet.shstk_enabled = 0;
> +}
> +
> +void cet_disable_free_shstk(struct task_struct *tsk)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
> +	    !tsk->thread.cet.shstk_enabled)
> +		return;
> +
> +	if (tsk == current)
> +		cet_disable_shstk();
> +
> +	/*
> +	 * Free only when tsk is current or shares mm
> +	 * with current but has its own shstk.
> +	 */
> +	if (tsk->mm && (tsk->mm == current->mm) &&
> +	    (tsk->thread.cet.shstk_base)) {

Does the caller hold a reference to tsk->mm?

> +		vm_munmap(tsk->thread.cet.shstk_base,
> +			  tsk->thread.cet.shstk_size);
> +		tsk->thread.cet.shstk_base = 0;
> +		tsk->thread.cet.shstk_size = 0;
> +	}
> +
> +	tsk->thread.cet.shstk_enabled = 0;
> +}
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 38276f58d3bf..f54fabdaef60 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -401,6 +401,29 @@ static __init int setup_disable_pku(char *arg)
>  __setup("nopku", setup_disable_pku);
>  #endif /* CONFIG_X86_64 */
>  
> +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> +{
> +	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		cr4_set_bits(X86_CR4_CET);
> +}
> +
> +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
> +static __init int setup_disable_shstk(char *s)
> +{
> +	/* require an exact match without trailing characters */
> +	if (strlen(s))
> +		return 0;
> +
> +	if (!boot_cpu_has(X86_FEATURE_SHSTK))
> +		return 1;
> +
> +	setup_clear_cpu_cap(X86_FEATURE_SHSTK);
> +	pr_info("x86: 'noshstk' specified, disabling Shadow Stack\n");
> +	return 1;
> +}
> +__setup("noshstk", setup_disable_shstk);
> +#endif
> +
>  /*
>   * Some CPU features depend on higher CPUID levels, which may not always
>   * be available due to CPUID level capping or broken virtualization
> @@ -1313,6 +1336,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>  	x86_init_rdrand(c);
>  	x86_init_cache_qos(c);
>  	setup_pku(c);
> +	setup_cet(c);
>  
>  	/*
>  	 * Clear/Set all flags overridden by options, need do it
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 30ca2d1a9231..b3b0b482983a 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -39,6 +39,7 @@
>  #include <asm/desc.h>
>  #include <asm/prctl.h>
>  #include <asm/spec-ctrl.h>
> +#include <asm/cet.h>
>  
>  /*
>   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
> @@ -136,6 +137,7 @@ void flush_thread(void)
>  	flush_ptrace_hw_breakpoint(tsk);
>  	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
>  
> +	cet_disable_shstk();
>  	fpu__clear(&tsk->thread.fpu);
>  }
>  
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index c486ad4b43f0..6aca93ecec0e 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -679,6 +679,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  		[ilog2(VM_PKEY_BIT1)]	= "",
>  		[ilog2(VM_PKEY_BIT2)]	= "",
>  		[ilog2(VM_PKEY_BIT3)]	= "",
> +#endif
> +#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
> +		[ilog2(VM_SHSTK)]	= "ss"
>  #endif
>  	};
>  	size_t i;
> 

Balbir Singh.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 01/10] x86/cet: User-mode shadow stack support
  2018-06-12 11:56   ` Balbir Singh
@ 2018-06-12 15:03     ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-12 15:03 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz

On Tue, 2018-06-12 at 21:56 +1000, Balbir Singh wrote:
> 
> On 08/06/18 00:37, Yu-cheng Yu wrote:
> > This patch adds basic shadow stack enabling/disabling routines.
> > A task's shadow stack is allocated from memory with VM_SHSTK
> > flag set and read-only protection.  The shadow stack is
> > allocated to a fixed size and that can be changed by the system
> > admin.
> > 
> 
> I presume a read-only permission on the kernel side, but it
> can be written by hardware?

Yes, the shadow stack is written by the processor when a call
instruction is executed.

...

> > 
> > diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
> > new file mode 100644
> > index 000000000000..9d5bc1efc9b7
> > --- /dev/null
> > +++ b/arch/x86/include/asm/cet.h
> > @@ -0,0 +1,32 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_X86_CET_H
> > +#define _ASM_X86_CET_H
> > +
> > +#ifndef __ASSEMBLY__
> > +#include <linux/types.h>
> > +
> > +struct task_struct;
> > +/*
> > + * Per-thread CET status
> > + */
> > +struct cet_stat {
> 
> stat sounds like statistics, just expand out to status please

I will make it 'cet_status'.

> > +	unsigned long	shstk_base;
> > +	unsigned long	shstk_size;
> > +	unsigned int	shstk_enabled:1;
> > +};
> > +
> > +#ifdef CONFIG_X86_INTEL_CET
> > +unsigned long cet_get_shstk_ptr(void);
> 
> For the current task? Why does _ptr routine return an unsigned long?

What about cet_get_shstk_addr()?

...

> > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> > index fda2114197b3..428d13828ba9 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -770,4 +770,18 @@
> >  #define MSR_VM_IGNNE                    0xc0010115
> >  #define MSR_VM_HSAVE_PA                 0xc0010117
> >  
> > +/* Control-flow Enforcement Technology MSRs */
> > +#define MSR_IA32_U_CET		0x6a0
> > +#define MSR_IA32_S_CET		0x6a2
> > +#define MSR_IA32_PL0_SSP	0x6a4
> > +#define MSR_IA32_PL3_SSP	0x6a7
> > +#define MSR_IA32_INT_SSP_TAB	0x6a8
> 
> some comments on the purpose of the MSR would be nice

Sure.

...

> 
> I think there was a comment about this being TASK_SIZE_MAX
> 
> > +
> > +	rdmsrl(MSR_IA32_U_CET, r);
> > +	wrmsrl(MSR_IA32_U_CET, r | MSR_IA32_CET_SHSTK_EN);
> > +	wrmsrl(MSR_IA32_PL3_SSP, addr);
> 
> Should the enable happen before setting addr? I would expect to do this in the opposite order.

I will check.

> > +	return 0;
> > +}
> > +
> > +unsigned long cet_get_shstk_ptr(void)
> > +{
> > +	unsigned long ptr;
> > +
> > +	if (!current->thread.cet.shstk_enabled)
> > +		return 0;
> > +
> > +	rdmsrl(MSR_IA32_PL3_SSP, ptr);
> > +	return ptr;
> > +}
> > +
> > +static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	unsigned long populate;
> > +
> > +	down_write(&mm->mmap_sem);
> > +	addr = do_mmap(NULL, addr, len, PROT_READ,
> > +		       MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
> > +		       0, &populate, NULL);
> > +	up_write(&mm->mmap_sem);
> 
> What happens if the mmap fails for any reason? I presume the caller disables shadow stack on this process?

This is called from exec(), and exec() fails in that case.

> > +
> > +	if (populate)
> > +		mm_populate(addr, populate);
> > +
> > +	return addr;
> > +}
> > +
> > +int cet_setup_shstk(void)
> > +{
> > +	unsigned long addr, size;
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +		return -EOPNOTSUPP;
> > +
> > +	size = SHSTK_SIZE;
> > +	addr = shstk_mmap(0, size);
> > +
> > +	if (addr >= TASK_SIZE)
> > +		return -ENOMEM;
> > +
> 
> TASK_SIZE_MAX?

Yes.

> 
> > +	cet_set_shstk_ptr(addr + size - sizeof(void *));
> > +	current->thread.cet.shstk_base = addr;
> > +	current->thread.cet.shstk_size = size;
> > +	current->thread.cet.shstk_enabled = 1;
> > +	return 0;
> > +}
> > +
> > +void cet_disable_shstk(void)
> > +{
> > +	u64 r;
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +		return;
> > +
> > +	rdmsrl(MSR_IA32_U_CET, r);
> > +	r &= ~(MSR_IA32_CET_SHSTK_EN);
> > +	wrmsrl(MSR_IA32_U_CET, r);
> > +	wrmsrl(MSR_IA32_PL3_SSP, 0);
> 
> Again, I'd expect the order to be the reverse
> 
> > +	current->thread.cet.shstk_enabled = 0;
> > +}
> > +
> > +void cet_disable_free_shstk(struct task_struct *tsk)
> > +{
> > +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
> > +	    !tsk->thread.cet.shstk_enabled)
> > +		return;
> > +
> > +	if (tsk == current)
> > +		cet_disable_shstk();
> > +
> > +	/*
> > +	 * Free only when tsk is current or shares mm
> > +	 * with current but has its own shstk.
> > +	 */
> > +	if (tsk->mm && (tsk->mm == current->mm) &&
> > +	    (tsk->thread.cet.shstk_base)) {
> 
> Does the caller hold a reference to tsk->mm?

If (tsk->mm == current->mm), i.e. it is current or it is a pthread of
current, then yes.

Yu-cheng


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-12 10:56 ` [PATCH 00/10] Control Flow Enforcement - Part (3) Balbir Singh
@ 2018-06-12 15:03   ` Yu-cheng Yu
  2018-06-12 16:00     ` Andy Lutomirski
  2018-06-14  1:07     ` Balbir Singh
  0 siblings, 2 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-12 15:03 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz

On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> 
> On 08/06/18 00:37, Yu-cheng Yu wrote:
> > This series introduces CET - Shadow stack
> > 
> > At the high level, shadow stack is:
> > 
> > 	Allocated from a task's address space with vm_flags VM_SHSTK;
> > 	Its PTEs must be read-only and dirty;
> > 	Fixed sized, but the default size can be changed by sys admin.
> > 
> > For a forked child, the shadow stack is duplicated when the next
> > shadow stack access takes place.
> > 
> > For a pthread child, a new shadow stack is allocated.
> > 
> > The signal handler uses the same shadow stack as the main program.
> > 
> 
> Even with sigaltstack()?
> 
> 
> Balbir Singh.

Yes.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-12 15:03   ` Yu-cheng Yu
@ 2018-06-12 16:00     ` Andy Lutomirski
  2018-06-12 16:21       ` Yu-cheng Yu
  2018-06-14  1:07     ` Balbir Singh
  1 sibling, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-12 16:00 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: bsingharora, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 8:06 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> >
> > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > This series introduces CET - Shadow stack
> > >
> > > At the high level, shadow stack is:
> > >
> > >     Allocated from a task's address space with vm_flags VM_SHSTK;
> > >     Its PTEs must be read-only and dirty;
> > >     Fixed sized, but the default size can be changed by sys admin.
> > >
> > > For a forked child, the shadow stack is duplicated when the next
> > > shadow stack access takes place.
> > >
> > > For a pthread child, a new shadow stack is allocated.
> > >
> > > The signal handler uses the same shadow stack as the main program.
> > >
> >
> > Even with sigaltstack()?
> >
> >
> > Balbir Singh.
>
> Yes.
>

I think we're going to need some provision to add an alternate signal
stack to handle the case where the shadow stack overflows.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 11:43             ` H.J. Lu
@ 2018-06-12 16:01               ` Andy Lutomirski
  2018-06-12 16:05                 ` H.J. Lu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-12 16:01 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Thomas Gleixner, Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Thu, 7 Jun 2018, H.J. Lu wrote:
> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> > Why is the lockout necessary?  If user code enables CET and tries to
> >> > run code that doesn't support CET, it will crash.  I don't see why we
> >> > need special code in the kernel to prevent a user program from calling
> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
> >> > do that :)
> >>
> >> On CET enabled machine, not all programs nor shared libraries are
> >> CET enabled.  But since ld.so is CET enabled, all programs start
> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
> >> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
> >> checking so that CET can no longer be disabled afterwards.
> >
> > That works for stuff which loads all libraries at start time, but what
> > happens if the program uses dlopen() later on? If CET is force locked and
> > the library is not CET enabled, it will fail.
>
> That is to prevent disabling CET by dlopening a legacy shared library.
>
> > I don't see the point of trying to support CET by magic. It adds complexity
> > and you'll never be able to handle all corner cases correctly. dlopen() is
> > not even a corner case.
>
> That is a price we pay for security.  To enable CET, especially shadow
> stack, the program and all of the shared libraries it uses should be CET
> enabled.  Most programs can be enabled with CET by compiling them
> with -fcf-protection.

If you charge too high a price for security, people may turn it off.
I think we're going to need a mode where a program says "I want to use
CET, but turn it off if I dlopen an unsupported library".  There
are programs that load binary-only plugins.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 16:01               ` Andy Lutomirski
@ 2018-06-12 16:05                 ` H.J. Lu
  2018-06-12 16:34                   ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: H.J. Lu @ 2018-06-12 16:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Thu, 7 Jun 2018, H.J. Lu wrote:
>> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> > Why is the lockout necessary?  If user code enables CET and tries to
>> >> > run code that doesn't support CET, it will crash.  I don't see why we
>> >> > need special code in the kernel to prevent a user program from calling
>> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
>> >> > do that :)
>> >>
>> >> On CET enabled machine, not all programs nor shared libraries are
>> >> CET enabled.  But since ld.so is CET enabled, all programs start
>> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
>> >> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
>> >> checking so that CET can no longer be disabled afterwards.
>> >
>> > That works for stuff which loads all libraries at start time, but what
>> > happens if the program uses dlopen() later on? If CET is force locked and
>> > the library is not CET enabled, it will fail.
>>
>> That is to prevent disabling CET by dlopening a legacy shared library.
>>
>> > I don't see the point of trying to support CET by magic. It adds complexity
>> > and you'll never be able to handle all corner cases correctly. dlopen() is
>> > not even a corner case.
>>
>> That is a price we pay for security.  To enable CET, especially shadow
>> stack, the program and all of the shared libraries it uses should be CET
>> enabled.  Most programs can be enabled with CET by compiling them
>> with -fcf-protection.
>
> If you charge too high a price for security, people may turn it off.
> I think we're going to need a mode where a program says "I want to use
> CET, but turn it off if I dlopen an unsupported library".  There
> are programs that load binary-only plugins.

You can do

# export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK

which turns off shadow stack.


-- 
H.J.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-12 16:00     ` Andy Lutomirski
@ 2018-06-12 16:21       ` Yu-cheng Yu
  2018-06-12 16:31         ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-12 16:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: bsingharora, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, 2018-06-12 at 09:00 -0700, Andy Lutomirski wrote:
> On Tue, Jun 12, 2018 at 8:06 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > >
> > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > This series introduces CET - Shadow stack
> > > >
> > > > At the high level, shadow stack is:
> > > >
> > > >     Allocated from a task's address space with vm_flags VM_SHSTK;
> > > >     Its PTEs must be read-only and dirty;
> > > >     Fixed sized, but the default size can be changed by sys admin.
> > > >
> > > > For a forked child, the shadow stack is duplicated when the next
> > > > shadow stack access takes place.
> > > >
> > > > For a pthread child, a new shadow stack is allocated.
> > > >
> > > > The signal handler uses the same shadow stack as the main program.
> > > >
> > >
> > > Even with sigaltstack()?
> > >
> > >
> > > Balbir Singh.
> >
> > Yes.
> >
> 
> I think we're going to need some provision to add an alternate signal
> stack to handle the case where the shadow stack overflows.

The shadow stack stores only return addresses; its consumption will not
exceed a percentage of (program stack size + sigaltstack size) before
those overflow.  When that happens, there is usually very little we can
do.  So we set a default shadow stack size that supports a certain depth
of nested calls and allow the sys admin to adjust it.
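
For a rough sense of scale, here is a back-of-the-envelope sketch.  It
assumes the default from patch 01 (SHSTK_SIZE is 0x8000 * 8 bytes for a
64-bit task and 0x8000 * 4 bytes for a TIF_IA32 task) and one
return-address slot per nested call.  The helper name is made up for
illustration and is not part of the series; anything else that may be
placed on the shadow stack is ignored.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: roughly how many nested
 * calls fit in the default shadow stack, assuming one slot per call.
 */
static size_t shstk_max_nested_calls(int is_ia32)
{
	size_t slot = is_ia32 ? 4 : 8;	/* bytes per return address */
	size_t size = 0x8000 * slot;	/* mirrors SHSTK_SIZE in patch 01 */

	return size / slot;		/* 0x8000 slots in both cases */
}
```

Under these assumptions the 64-bit default is 256 KiB and holds about
32768 return addresses; the 32-bit default is 128 KiB with the same
number of slots.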



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-12 16:21       ` Yu-cheng Yu
@ 2018-06-12 16:31         ` Andy Lutomirski
  2018-06-12 17:24           ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-12 16:31 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Andrew Lutomirski, bsingharora, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 9:24 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> On Tue, 2018-06-12 at 09:00 -0700, Andy Lutomirski wrote:
> > On Tue, Jun 12, 2018 at 8:06 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >
> > > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > > >
> > > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > > This series introduces CET - Shadow stack
> > > > >
> > > > > At the high level, shadow stack is:
> > > > >
> > > > >     Allocated from a task's address space with vm_flags VM_SHSTK;
> > > > >     Its PTEs must be read-only and dirty;
> > > > >     Fixed sized, but the default size can be changed by sys admin.
> > > > >
> > > > > For a forked child, the shadow stack is duplicated when the next
> > > > > shadow stack access takes place.
> > > > >
> > > > > For a pthread child, a new shadow stack is allocated.
> > > > >
> > > > > The signal handler uses the same shadow stack as the main program.
> > > > >
> > > >
> > > > Even with sigaltstack()?
> > > >
> > > >
> > > > Balbir Singh.
> > >
> > > Yes.
> > >
> >
> > I think we're going to need some provision to add an alternate signal
> > stack to handle the case where the shadow stack overflows.
>
> The shadow stack stores only return addresses; its consumption will not
> exceed a percentage of (program stack size + sigaltstack size) before
> those overflow.  When that happens, there is usually very little we can
> do.  So we set a default shadow stack size that supports certain nested
> calls and allow sys admin to adjust it.
>

Of course there's something you can do: add a sigaltstack-like stack
switching mechanism.  Have a reserve shadow stack and, when a signal
is delivered (possibly guarded by other conditions like "did the
shadow stack overflow"), switch to a new shadow stack and maybe write
a special token to the new shadow stack that says "signal delivery
jumped here and will restore to the previous shadow stack and
such-and-such address on return".

Also, I have a couple of other questions after reading the
documentation some more:

1. Why on Earth does INCSSP only take an 8-bit number of frames to
skip?  It seems to me that code that calls setjmp() and then calls
longjmp() while nested more than 256 function call levels will crash.

2. The mnemonic RSTORSSP makes no sense to me.  RSTORSSP is a stack
*switch* operation not a stack *restore* operation, unless I'm
seriously misunderstanding.

3. Is there anything resembling clear documentation of the format of
the shadow stack?  That is, what types of values might be found on the
shadow stack and what do they all mean?

4. Usually Intel doesn't submit upstream Linux patches for ISA
extensions until the ISA is documented for real.  CET does not appear
to be documented for real.  Could Intel kindly release something that
at least claims to be authoritative documentation?

--Andy


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 16:05                 ` H.J. Lu
@ 2018-06-12 16:34                   ` Andy Lutomirski
  2018-06-12 16:51                     ` H.J. Lu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-12 16:34 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Thomas Gleixner, Yu-cheng Yu, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 9:05 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> > On Thu, 7 Jun 2018, H.J. Lu wrote:
> >> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> >> > Why is the lockout necessary?  If user code enables CET and tries to
> >> >> > run code that doesn't support CET, it will crash.  I don't see why we
> >> >> > need special code in the kernel to prevent a user program from calling
> >> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
> >> >> > do that :)
> >> >>
> >> >> On CET enabled machine, not all programs nor shared libraries are
> >> >> CET enabled.  But since ld.so is CET enabled, all programs start
> >> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
> >> >> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
> >> >> checking so that CET can no longer be disabled afterwards.
> >> >
> >> > That works for stuff which loads all libraries at start time, but what
> >> > happens if the program uses dlopen() later on? If CET is force locked and
> >> > the library is not CET enabled, it will fail.
> >>
> >> That is to prevent disabling CET by dlopening a legacy shared library.
> >>
> >> > I don't see the point of trying to support CET by magic. It adds complexity
> >> > and you'll never be able to handle all corner cases correctly. dlopen() is
> >> > not even a corner case.
> >>
> >> That is a price we pay for security.  To enable CET, especially shadow
> >> stack, the program and all of the shared libraries it uses should be CET
> >> enabled.  Most programs can be enabled with CET by compiling them
> >> with -fcf-protection.
> >
> > If you charge too high a price for security, people may turn it off.
> > I think we're going to need a mode where a program says "I want to use
> > the CET, but turn it off if I dlopen an unsupported library".  There
> > are programs that load binary-only plugins.
>
> You can do
>
> # export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK
>
> which turns off shadow stack.
>

Which exactly illustrates my point.  By making your security story too
absolute, you'll force people to turn it off when they don't need to.
If I'm using a fully CET-ified distro and I'm using a CET-aware
program that loads binary plugins, and I may or may not have an old
(binary-only, perhaps) plugin that doesn't support CET, then the
behavior I want is for CET to be on until I dlopen() a program that
doesn't support it.  Unless there's some ABI reason why that can't be
done, but I don't think there is.

I'm concerned that the entire concept of locking CET is there to solve
a security problem that doesn't actually exist.


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 16:34                   ` Andy Lutomirski
@ 2018-06-12 16:51                     ` H.J. Lu
  2018-06-12 18:59                       ` Thomas Gleixner
  0 siblings, 1 reply; 98+ messages in thread
From: H.J. Lu @ 2018-06-12 16:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 9:34 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, Jun 12, 2018 at 9:05 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>
>> >> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> > On Thu, 7 Jun 2018, H.J. Lu wrote:
>> >> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> >> > Why is the lockout necessary?  If user code enables CET and tries to
>> >> >> > run code that doesn't support CET, it will crash.  I don't see why we
>> >> >> > need special code in the kernel to prevent a user program from calling
>> >> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
>> >> >> > do that :)
>> >> >>
>> >> >> On CET enabled machine, not all programs nor shared libraries are
>> >> >> CET enabled.  But since ld.so is CET enabled, all programs start
>> >> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
>> >> >> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
>> >> >> checking so that CET can no longer be disabled afterwards.
>> >> >
>> >> > That works for stuff which loads all libraries at start time, but what
>> >> > happens if the program uses dlopen() later on? If CET is force locked and
>> >> > the library is not CET enabled, it will fail.
>> >>
>> >> That is to prevent disabling CET by dlopening a legacy shared library.
>> >>
>> >> > I don't see the point of trying to support CET by magic. It adds complexity
>> >> > and you'll never be able to handle all corner cases correctly. dlopen() is
>> >> > not even a corner case.
>> >>
>> >> That is a price we pay for security.  To enable CET, especially shadow
>> >> stack, the program and all of the shared libraries it uses should be CET
>> >> enabled.  Most programs can be enabled with CET by compiling them
>> >> with -fcf-protection.
>> >
>> > If you charge too high a price for security, people may turn it off.
>> > I think we're going to need a mode where a program says "I want to use
>> > the CET, but turn it off if I dlopen an unsupported library".  There
>> > are programs that load binary-only plugins.
>>
>> You can do
>>
>> # export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK
>>
>> which turns off shadow stack.
>>
>
> Which exactly illustrates my point.  By making your security story too
> absolute, you'll force people to turn it off when they don't need to.
> If I'm using a fully CET-ified distro and I'm using a CET-aware
> program that loads binary plugins, and I may or may not have an old
> (binary-only, perhaps) plugin that doesn't support CET, then the
> behavior I want is for CET to be on until I dlopen() a program that
> doesn't support it.  Unless there's some ABI reason why that can't be
> done, but I don't think there is.

We can make it opt-in via GLIBC_TUNABLES.  But by default, the legacy
shared object is disallowed when CET is enabled.

> I'm concerned that the entire concept of locking CET is there to solve
> a security problem that doesn't actually exist.

We don't know that.


-- 
H.J.


* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-12 16:31         ` Andy Lutomirski
@ 2018-06-12 17:24           ` Yu-cheng Yu
  2018-06-12 20:15             ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-12 17:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: bsingharora, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, 2018-06-12 at 09:31 -0700, Andy Lutomirski wrote:
> On Tue, Jun 12, 2018 at 9:24 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >
> > On Tue, 2018-06-12 at 09:00 -0700, Andy Lutomirski wrote:
> > > On Tue, Jun 12, 2018 at 8:06 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > > >
> > > > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > > > >
> > > > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > > > This series introduces CET - Shadow stack
> > > > > >
> > > > > > At the high level, shadow stack is:
> > > > > >
> > > > > >     Allocated from a task's address space with vm_flags VM_SHSTK;
> > > > > >     Its PTEs must be read-only and dirty;
> > > > > >     Fixed sized, but the default size can be changed by sys admin.
> > > > > >
> > > > > > For a forked child, the shadow stack is duplicated when the next
> > > > > > shadow stack access takes place.
> > > > > >
> > > > > > For a pthread child, a new shadow stack is allocated.
> > > > > >
> > > > > > The signal handler uses the same shadow stack as the main program.
> > > > > >
> > > > >
> > > > > Even with sigaltstack()?
> > > > >
> > > > >
> > > > > Balbir Singh.
> > > >
> > > > Yes.
> > > >
> > >
> > > I think we're going to need some provision to add an alternate signal
> > > stack to handle the case where the shadow stack overflows.
> >
> > The shadow stack stores only return addresses; its consumption will not
> > exceed a percentage of (program stack size + sigaltstack size) before
> > those overflow.  When that happens, there is usually very little we can
> > do.  So we set a default shadow stack size that supports certain nested
> > calls and allow sys admin to adjust it.
> >
> 
> Of course there's something you can do: add a sigaltstack-like stack
> switching mechanism.  Have a reserve shadow stack and, when a signal
> is delivered (possibly guarded by other conditions like "did the
> shadow stack overflow"), switch to a new shadow stack and maybe write
> a special token to the new shadow stack that says "signal delivery
> jumped here and will restore to the previous shadow stack and
> such-and-such address on return".

If (shstk size == (stack size + sigaltstack size)), then shstk will not
overflow before program stack overflows and sigaltstack also overflows.

Let me think about this.

> Also, I have a couple of other questions after reading the
> documentation some more:
> 
> 1. Why on Earth does INCSSP only take an 8-bit number of frames to
> skip?  It seems to me that code that calls setjmp() and then calls
> longjmp() while nested more than 256 function call levels will crash.

GLIBC takes care of more than 256 function calls.
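
One way to picture how a library can cope with the 8-bit limit is to issue
the instruction repeatedly, popping at most 255 frames each time.  The
sketch below only simulates that in plain C.  sim_ssp, sim_incssp() and
unwind_shadow_stack() are hypothetical names used for illustration, the
8-byte frame size assumes a 64-bit shadow stack, and a real longjmp()
implementation would do this in assembly rather than as shown here.

```c
#include <stdint.h>

/* Simulated shadow stack pointer; the real SSP is a CPU register. */
static uint64_t sim_ssp;

/*
 * Model of a 64-bit INCSSP: only the low 8 bits of the count are
 * honoured, and every frame is assumed to be 8 bytes wide.
 */
static void sim_incssp(uint64_t count)
{
	sim_ssp += (count & 0xff) * 8;
}

/*
 * Pop `frames` entries, at most 255 per simulated instruction.
 * Returns how many simulated INCSSP instructions were issued.
 */
static unsigned int unwind_shadow_stack(uint64_t frames)
{
	unsigned int issued = 0;

	while (frames > 0) {
		uint64_t step = frames > 255 ? 255 : frames;

		sim_incssp(step);
		frames -= step;
		issued++;
	}
	return issued;
}
```

With this model, unwinding 1000 frames takes four steps (255 + 255 + 255 +
235) instead of failing at the first one.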

> 2. The mnemonic RSTORSSP makes no sense to me.  RSTORSSP is a stack
> *switch* operation not a stack *restore* operation, unless I'm
> seriously misunderstanding.

The intention is to switch shadow stacks with tokens.  RSTORSSP restores
to a previous shadow stack address from a restore token.

> 3. Is there anything resembling clear documentation of the format of
> the shadow stack?  That is, what types of values might be found on the
> shadow stack and what do they all mean?

Only return addresses and restore tokens can be on a user-mode shadow
stack.  The restore token has the incoming shadow stack address plus one
bit indicating 64/32-bit mode.
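
As a rough illustration of "address plus one mode bit", the sketch below
packs and unpacks such a token in C.  The helper names are hypothetical,
and both the position of the mode bit (bit 0) and the masking of the low
two address bits are assumptions made for this example; the authoritative
layout has to come from the CET specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define TOKEN_MODE_64BIT	0x1ULL		/* assumed mode bit */
#define TOKEN_ADDR_MASK		(~0x3ULL)	/* assumed: low two bits carry no address */

/* Build a token from a shadow stack address and the mode flag. */
static uint64_t make_restore_token(uint64_t ssp, bool is_64bit)
{
	return (ssp & TOKEN_ADDR_MASK) | (is_64bit ? TOKEN_MODE_64BIT : 0);
}

/* Recover the shadow stack address stored in a token. */
static uint64_t token_address(uint64_t token)
{
	return token & TOKEN_ADDR_MASK;
}

/* Report whether the token was created for 64-bit mode. */
static bool token_is_64bit(uint64_t token)
{
	return (token & TOKEN_MODE_64BIT) != 0;
}
```

Anything on the shadow stack that is neither a return address nor a value
of this shape would then be unexpected on a user-mode shadow stack.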

I will put this into Documentation/x86/intel_cet.txt.

> 
> 4. Usually Intel doesn't submit upstream Linux patches for ISA
> extensions until the ISA is documented for real.  CET does not appear
> to be documented for real.  Could Intel kindly release something that
> at least claims to be authoritative documentation?
> 
> --Andy




* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 16:51                     ` H.J. Lu
@ 2018-06-12 18:59                       ` Thomas Gleixner
  2018-06-12 19:34                         ` H.J. Lu
  0 siblings, 1 reply; 98+ messages in thread
From: Thomas Gleixner @ 2018-06-12 18:59 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Andy Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, 12 Jun 2018, H.J. Lu wrote:
> On Tue, Jun 12, 2018 at 9:34 AM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Tue, Jun 12, 2018 at 9:05 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
> >> > On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> >> > That works for stuff which loads all libraries at start time, but what
> >> >> > happens if the program uses dlopen() later on? If CET is force locked and
> >> >> > the library is not CET enabled, it will fail.
> >> >>
> >> >> That is to prevent disabling CET by dlopening a legacy shared library.
> >> >>
> >> >> > I don't see the point of trying to support CET by magic. It adds complexity
> >> >> > and you'll never be able to handle all corner cases correctly. dlopen() is
> >> >> > not even a corner case.
> >> >>
> >> >> That is a price we pay for security.  To enable CET, especially shadow
> >> >> stack, the program and all of the shared libraries it uses should be CET
> >> >> enabled.  Most programs can be enabled with CET by compiling them
> >> >> with -fcf-protection.
> >> >
> >> > If you charge too high a price for security, people may turn it off.
> >> > I think we're going to need a mode where a program says "I want to use
> >> > the CET, but turn it off if I dlopen an unsupported library".  There
> >> > are programs that load binary-only plugins.
> >>
> >> You can do
> >>
> >> # export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK
> >>
> >> which turns off shadow stack.
> >>
> >
> > Which exactly illustrates my point.  By making your security story too
> > absolute, you'll force people to turn it off when they don't need to.
> > If I'm using a fully CET-ified distro and I'm using a CET-aware
> > program that loads binary plugins, and I may or may not have an old
> > (binary-only, perhaps) plugin that doesn't support CET, then the
> > behavior I want is for CET to be on until I dlopen() a program that
> > doesn't support it.  Unless there's some ABI reason why that can't be
> > done, but I don't think there is.
> 
> We can make it opt-in via GLIBC_TUNABLES.  But by default, the legacy
> shared object is disallowed when CET is enabled.

That's a bad idea. Stuff has launchers which people might not be able to
change. So they will simply turn off CET completely, or it makes them hack
horrible crap into init, e.g. the above export.

Give them sane kernel options:

     cet = off, relaxed, forced

where relaxed allows running binary plugins. Then let dlopen() call into the
kernel with the filepath of the library to check for CET; that will tell
you whether it's OK or not, and the kernel does the necessary magic when
CET has to be disabled due to a !CET library/application.

That also makes the whole thing independent of magic glibc environment
options and allows it to be used all over the place in the same way.
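
The decision such a check would have to make can be sketched as a small
table in C.  Everything here is hypothetical: enum cet_mode, enum
cet_action and cet_check_library() are names invented for illustration,
not an existing kernel or glibc interface, and the sketch only covers the
policy, not how the kernel would inspect the file.

```c
#include <stdbool.h>

/* The three proposed settings. */
enum cet_mode { CET_OFF, CET_RELAXED, CET_FORCED };

/* What to do with a library that is about to be loaded. */
enum cet_action {
	CET_LOAD_KEEP,		/* load it, CET state unchanged */
	CET_LOAD_DISABLE,	/* load it and turn CET off for the task */
	CET_LOAD_REJECT,	/* refuse to load it */
};

/*
 * Decide what happens when a task with CET on or off loads a library
 * that does or does not carry the CET marker.
 */
static enum cet_action cet_check_library(enum cet_mode mode,
					 bool task_cet_on,
					 bool lib_supports_cet)
{
	/* Nothing to enforce, or the library is fine as it is. */
	if (mode == CET_OFF || !task_cet_on || lib_supports_cet)
		return CET_LOAD_KEEP;

	/* A legacy library meets a CET-enabled task. */
	return mode == CET_RELAXED ? CET_LOAD_DISABLE : CET_LOAD_REJECT;
}
```

In this reading, only "forced" ever makes dlopen() fail; "relaxed" trades
protection for compatibility at the moment the legacy plugin shows up.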

Thanks,

	tglx







* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 18:59                       ` Thomas Gleixner
@ 2018-06-12 19:34                         ` H.J. Lu
  2018-06-18 22:03                           ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: H.J. Lu @ 2018-06-12 19:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar, Shanbhogue,
	Vedvyas, Ravi V. Shankar, Dave Hansen, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 12 Jun 2018, H.J. Lu wrote:
>> On Tue, Jun 12, 2018 at 9:34 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Tue, Jun 12, 2018 at 9:05 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> > On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> >> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> >> > That works for stuff which loads all libraries at start time, but what
>> >> >> > happens if the program uses dlopen() later on? If CET is force locked and
>> >> >> > the library is not CET enabled, it will fail.
>> >> >>
>> >> >> That is to prevent disabling CET by dlopening a legacy shared library.
>> >> >>
>> >> >> > I don't see the point of trying to support CET by magic. It adds complexity
>> >> >> > and you'll never be able to handle all corner cases correctly. dlopen() is
>> >> >> > not even a corner case.
>> >> >>
>> >> >> That is a price we pay for security.  To enable CET, especially shadow
>> >> >> stack, the program and all of the shared libraries it uses should be CET
>> >> >> enabled.  Most programs can be enabled with CET by compiling them
>> >> >> with -fcf-protection.
>> >> >
>> >> > If you charge too high a price for security, people may turn it off.
>> >> > I think we're going to need a mode where a program says "I want to use
>> >> > the CET, but turn it off if I dlopen an unsupported library".  There
>> >> > are programs that load binary-only plugins.
>> >>
>> >> You can do
>> >>
>> >> # export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK
>> >>
>> >> which turns off shadow stack.
>> >>
>> >
>> > Which exactly illustrates my point.  By making your security story too
>> > absolute, you'll force people to turn it off when they don't need to.
>> > If I'm using a fully CET-ified distro and I'm using a CET-aware
>> > program that loads binary plugins, and I may or may not have an old
>> > (binary-only, perhaps) plugin that doesn't support CET, then the
>> > behavior I want is for CET to be on until I dlopen() a program that
>> > doesn't support it.  Unless there's some ABI reason why that can't be
>> > done, but I don't think there is.
>>
>> We can make it opt-in via GLIBC_TUNABLES.  But by default, the legacy
>> shared object is disallowed when CET is enabled.
>
> That's a bad idea. Stuff has launchers which people might not be able to
> change. So they will simply turn off CET completely or it makes them hack
> horrible crap into init, e.g. the above export.
>
> Give them sane kernel options:
>
>      cet = off, relaxed, forced
>
> where relaxed allows to run binary plugins. Then let dlopen() call into the
> kernel with the filepath of the library to check for CET and that will tell
> you whether it's ok or not and do the necessary magic in the kernel when
> CET has to be disabled due to a !CET library/application.
>
> That's also making the whole thing independent of magic glibc environment
> options and allows it to be used all over the place in the same way.

This is very similar to our ARCH_CET_EXEC proposal which controls how
CET should be enforced.   But Andy thinks it is a bad idea.


-- 
H.J.


* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-12 17:24           ` Yu-cheng Yu
@ 2018-06-12 20:15             ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-12 20:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: bsingharora, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, 2018-06-12 at 10:24 -0700, Yu-cheng Yu wrote:
> On Tue, 2018-06-12 at 09:31 -0700, Andy Lutomirski wrote:
> > On Tue, Jun 12, 2018 at 9:24 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >
> > > On Tue, 2018-06-12 at 09:00 -0700, Andy Lutomirski wrote:
> > > > On Tue, Jun 12, 2018 at 8:06 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > > > >
> > > > > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > > > > >
> > > > > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > > > > This series introduces CET - Shadow stack
> > > > > > >
> > > > > > > At the high level, shadow stack is:
> > > > > > >
> > > > > > >     Allocated from a task's address space with vm_flags VM_SHSTK;
> > > > > > >     Its PTEs must be read-only and dirty;
> > > > > > >     Fixed sized, but the default size can be changed by sys admin.
> > > > > > >
> > > > > > > For a forked child, the shadow stack is duplicated when the next
> > > > > > > shadow stack access takes place.
> > > > > > >
> > > > > > > For a pthread child, a new shadow stack is allocated.
> > > > > > >
> > > > > > > The signal handler uses the same shadow stack as the main program.
> > > > > > >
> > > > > >
> > > > > > Even with sigaltstack()?
> > > > > >
> > > > > >
> > > > > > Balbir Singh.
> > > > >
> > > > > Yes.
> > > > >
> > > >
> > > > I think we're going to need some provision to add an alternate signal
> > > > stack to handle the case where the shadow stack overflows.
> > >
> > > The shadow stack stores only return addresses; its consumption will not
> > > exceed a percentage of (program stack size + sigaltstack size) before
> > > those overflow.  When that happens, there is usually very little we can
> > > do.  So we set a default shadow stack size that supports certain nested
> > > calls and allow sys admin to adjust it.
> > >
> > 
> > Of course there's something you can do: add a sigaltstack-like stack
> > switching mechanism.  Have a reserve shadow stack and, when a signal
> > is delivered (possibly guarded by other conditions like "did the
> > shadow stack overflow"), switch to a new shadow stack and maybe write
> > a special token to the new shadow stack that says "signal delivery
> > jumped here and will restore to the previous shadow stack and
> > such-and-such address on return".
> 
> If (shstk size == (stack size + sigaltstack size)), then shstk will not
> overflow before program stack overflows and sigaltstack also overflows.
> 
> Let me think about this.

The reserve shadow stack will help only when the shstk overflows but
signal stack/sigaltstack still has room and we can deliver a signal.  If
the shstk is large enough to cover any nested calls that will overflow
both the program stack and sigaltstack then we don't need a reserve
shstk.

We can estimate how big the shstk needs to be; in the worst case it
should not be greater than (program stack size + sigaltstack size).  The
default shstk size we chose passes all signal tests in GLIBC.  In case
there is a need to increase it for a very large RLIMIT_STACK or very
large sigaltstack, the sys admin can increase the default shstk size.
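
The estimate can be written down as simple arithmetic.  The sketch below
assumes 64-bit code, where each call pushes an 8-byte return address on
both the program stack and the shadow stack, so the shadow stack can never
grow faster than the program stack.  shstk_worst_case() and
shstk_estimate() are hypothetical helpers, and avg_frame_bytes is an
illustrative parameter, not something the patches define.

```c
#include <stdint.h>

/* Upper bound: every byte of both stacks is a return address. */
static uint64_t shstk_worst_case(uint64_t stack_size,
				 uint64_t sigaltstack_size)
{
	return stack_size + sigaltstack_size;
}

/*
 * Estimate for a given average frame size: each frame costs
 * avg_frame_bytes on the program stack but only 8 bytes of
 * shadow stack.  A frame cannot be smaller than its return address.
 */
static uint64_t shstk_estimate(uint64_t stack_size,
			       uint64_t sigaltstack_size,
			       uint64_t avg_frame_bytes)
{
	if (avg_frame_bytes < 8)
		avg_frame_bytes = 8;

	return (stack_size + sigaltstack_size) / avg_frame_bytes * 8;
}
```

For an 8 MiB program stack and an 8 KiB sigaltstack the bound is 8396800
bytes, while an average frame of 64 bytes would need only one eighth of
that.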

Yu-cheng




* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-12 15:03   ` Yu-cheng Yu
  2018-06-12 16:00     ` Andy Lutomirski
@ 2018-06-14  1:07     ` Balbir Singh
  2018-06-14 14:56       ` Yu-cheng Yu
  1 sibling, 1 reply; 98+ messages in thread
From: Balbir Singh @ 2018-06-14  1:07 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz

On Tue, 2018-06-12 at 08:03 -0700, Yu-cheng Yu wrote:
> On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > 
> > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > This series introduces CET - Shadow stack
> > > 
> > > At the high level, shadow stack is:
> > > 
> > > 	Allocated from a task's address space with vm_flags VM_SHSTK;
> > > 	Its PTEs must be read-only and dirty;
> > > 	Fixed sized, but the default size can be changed by sys admin.
> > > 
> > > For a forked child, the shadow stack is duplicated when the next
> > > shadow stack access takes place.
> > > 
> > > For a pthread child, a new shadow stack is allocated.
> > > 
> > > The signal handler uses the same shadow stack as the main program.
> > > 
> > 
> > Even with sigaltstack()?
> > 
> Yes.

I am not convinced that it would work; as we switch stacks, overflow might
be an issue. I also forgot to bring up setcontext(2); I presume those
will get new shadow stacks.

Balbir Singh.



* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-07 14:37 ` [PATCH 02/10] x86/cet: Introduce WRUSS instruction Yu-cheng Yu
  2018-06-07 16:40   ` Andy Lutomirski
@ 2018-06-14  1:30   ` Balbir Singh
  2018-06-14 14:43     ` Yu-cheng Yu
  1 sibling, 1 reply; 98+ messages in thread
From: Balbir Singh @ 2018-06-14  1:30 UTC (permalink / raw)
  To: Yu-cheng Yu, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz

On Thu, 2018-06-07 at 07:37 -0700, Yu-cheng Yu wrote:
> WRUSS is a new kernel-mode instruction but writes directly
> to user shadow stack memory.  This is used to construct
> a return address on the shadow stack for the signal
> handler.
> 
> This instruction can fault if the user shadow stack is
> invalid shadow stack memory.  In that case, the kernel does
> fixup.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  arch/x86/include/asm/special_insns.h          | 44 +++++++++++++++++++++++++++
>  arch/x86/lib/x86-opcode-map.txt               |  2 +-
>  arch/x86/mm/fault.c                           | 13 +++++++-
>  tools/objtool/arch/x86/lib/x86-opcode-map.txt |  2 +-
>  4 files changed, 58 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 317fc59b512c..8ce532fcc171 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -237,6 +237,50 @@ static inline void clwb(volatile void *__p)
>  		: [pax] "a" (p));
>  }
>  
> +#ifdef CONFIG_X86_INTEL_CET
> +
> +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
> +static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
> +{
> +	int err;
> +
> +	asm volatile("1:.byte 0x66, 0x0f, 0x38, 0xf5, 0x37\n"

It would be nice to use something like ASM_WRUSS/Q like ASM_CLAC/ASM_STAC.
Is the 0x37 spurious? I don't see addr/val being used in the instructions
either.
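
One possible reading of the trailing 0x37, assuming it is the ModRM byte
of the hand-encoded instruction rather than a stray byte, can be checked
by decoding it.  The sketch below uses the standard ModRM field layout;
struct modrm, decode_modrm() and gpr64_names are names made up for this
example.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an x86 ModRM byte. */
struct modrm {
	unsigned int mod;	/* bits 7:6, addressing mode */
	unsigned int reg;	/* bits 5:3, register operand */
	unsigned int rm;	/* bits 2:0, register or memory operand */
};

/* Split a ModRM byte into its fields. */
static struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m = {
		.mod = byte >> 6,
		.reg = (byte >> 3) & 7,
		.rm  = byte & 7,
	};
	return m;
}

/* 64-bit general purpose registers in encoding order (no REX.R/B). */
static const char *const gpr64_names[8] = {
	"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
};
```

Decoding 0x37 gives mod = 0, reg = 6 (rsi) and rm = 7 (rdi), that is, a
memory operand at (%rdi) and a register operand in %rsi.  Under that
reading, addr and val reach the instruction through the "D" and "S"
constraints instead of being named in the template.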

> +		     "xor %[err],%[err]\n"
> +		     "2:\n"
> +		     ".section .fixup,\"ax\"\n"
> +		     "3: mov $-1,%[err]; jmp 2b\n"
> +		     ".previous\n"
> +		     _ASM_EXTABLE(1b, 3b)
> +		: [err] "=a" (err)
> +		: [val] "S" (val), [addr] "D" (addr)
> +		: "memory");
> +	return err;
> +}
> +#else
> +static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
> +{
> +	return 0;
> +}
> +#endif
> +
> +static inline int write_user_shstk_64(unsigned long addr, unsigned long val)
> +{
> +	int err;
> +
> +	asm volatile("1:.byte 0x66, 0x48, 0x0f, 0x38, 0xf5, 0x37\n"
> +		     "xor %[err],%[err]\n"
> +		     "2:\n"
> +		     ".section .fixup,\"ax\"\n"
> +		     "3: mov $-1,%[err]; jmp 2b\n"
> +		     ".previous\n"
> +		     _ASM_EXTABLE(1b, 3b)
> +		: [err] "=a" (err)
> +		: [val] "S" (val), [addr] "D" (addr)
> +		: "memory");
> +	return err;
> +}
> +#endif /* CONFIG_X86_INTEL_CET */
> +
>  #define nop() asm volatile ("nop")
>  
>  
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index e0b85930dd77..72bb7c48a7df 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
>  f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
>  f2: ANDN Gy,By,Ey (v)
>  f3: Grp17 (1A)
> -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
>  f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
>  f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
>  EndTable
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 2b3b9170109c..f157338862f8 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -640,6 +640,17 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
>  	return 0;
>  }
>  
> +/*
> + * WRUSS is a kernel instruction but writes to user
> + * shadow stack memory.  When a fault occurs, both
> + * X86_PF_USER and X86_PF_SHSTK are set.
> + */
> +static int is_wruss(struct pt_regs *regs, unsigned long error_code)
> +{
> +	return (((error_code & (X86_PF_USER | X86_PF_SHSTK)) ==
> +		(X86_PF_USER | X86_PF_SHSTK)) && !user_mode(regs));
> +}
> +
>  static const char nx_warning[] = KERN_CRIT
>  "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n";
>  static const char smep_warning[] = KERN_CRIT
> @@ -851,7 +862,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
>  	struct task_struct *tsk = current;
>  
>  	/* User mode accesses just cause a SIGSEGV */
> -	if (error_code & X86_PF_USER) {
> +	if ((error_code & X86_PF_USER) && !is_wruss(regs, error_code)) {
>  		/*
>  		 * It's possible to have interrupts off here:
>  		 */
> diff --git a/tools/objtool/arch/x86/lib/x86-opcode-map.txt b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
> index e0b85930dd77..72bb7c48a7df 100644
> --- a/tools/objtool/arch/x86/lib/x86-opcode-map.txt
> +++ b/tools/objtool/arch/x86/lib/x86-opcode-map.txt
> @@ -789,7 +789,7 @@ f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
>  f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
>  f2: ANDN Gy,By,Ey (v)
>  f3: Grp17 (1A)
> -f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
> +f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSS Pq,Qq (66),REX.W
>  f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
>  f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
>  EndTable

Balbir Singh.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 02/10] x86/cet: Introduce WRUSS instruction
  2018-06-14  1:30   ` Balbir Singh
@ 2018-06-14 14:43     ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-14 14:43 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz

On Thu, 2018-06-14 at 11:30 +1000, Balbir Singh wrote:
> On Thu, 2018-06-07 at 07:37 -0700, Yu-cheng Yu wrote:
> > WRUSS is a new kernel-mode instruction but writes directly
> > to user shadow stack memory.  This is used to construct
> > a return address on the shadow stack for the signal
> > handler.
> > 
> > This instruction can fault if the user shadow stack is
> > invalid shadow stack memory.  In that case, the kernel does
> > fixup.
> > 
> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> > ---
> >  arch/x86/include/asm/special_insns.h          | 44 +++++++++++++++++++++++++++
> >  arch/x86/lib/x86-opcode-map.txt               |  2 +-
> >  arch/x86/mm/fault.c                           | 13 +++++++-
> >  tools/objtool/arch/x86/lib/x86-opcode-map.txt |  2 +-
> >  4 files changed, 58 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index 317fc59b512c..8ce532fcc171 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -237,6 +237,50 @@ static inline void clwb(volatile void *__p)
> >  		: [pax] "a" (p));
> >  }
> >  
> > +#ifdef CONFIG_X86_INTEL_CET
> > +
> > +#if defined(CONFIG_IA32_EMULATION) || defined(CONFIG_X86_X32)
> > +static inline int write_user_shstk_32(unsigned long addr, unsigned int val)
> > +{
> > +	int err;
> > +
> > +	asm volatile("1:.byte 0x66, 0x0f, 0x38, 0xf5, 0x37\n"
> 
> It would nice to use something like ASM_WRUSS/Q like ASM_CLAC/ASM_STAC.
> Is the 0x37 spurious? I don't see addr/val being used in the instructions
> either.
> 

Yes, this is being revised.  We are going to require a GCC and binutils
that support CET.  I will put in the WRUSS instruction, no '.byte' any
more.

Yu-cheng


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-14  1:07     ` Balbir Singh
@ 2018-06-14 14:56       ` Yu-cheng Yu
  2018-06-17  3:16         ` Balbir Singh
  0 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-14 14:56 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz

On Thu, 2018-06-14 at 11:07 +1000, Balbir Singh wrote:
> On Tue, 2018-06-12 at 08:03 -0700, Yu-cheng Yu wrote:
> > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > > 
> > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > This series introduces CET - Shadow stack
> > > > 
> > > > At the high level, shadow stack is:
> > > > 
> > > > 	Allocated from a task's address space with vm_flags VM_SHSTK;
> > > > 	Its PTEs must be read-only and dirty;
> > > > 	Fixed sized, but the default size can be changed by sys admin.
> > > > 
> > > > For a forked child, the shadow stack is duplicated when the next
> > > > shadow stack access takes place.
> > > > 
> > > > For a pthread child, a new shadow stack is allocated.
> > > > 
> > > > The signal handler uses the same shadow stack as the main program.
> > > > 
> > > 
> > > Even with sigaltstack()?
> > > 
> > Yes.
> 
> I am not convinced that it would work, as we switch stacks, overflow might
> be an issue. I also forgot to bring up setcontext(2), I presume those
> will get new shadow stacks

Do you mean signal stack/sigaltstack overflow or swapcontext in a signal
handler?

Yu-cheng


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-14 14:56       ` Yu-cheng Yu
@ 2018-06-17  3:16         ` Balbir Singh
  2018-06-18 21:44           ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Balbir Singh @ 2018-06-17  3:16 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen,
	Andy Lutomirski, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	Mike Kravetz

On Thu, 2018-06-14 at 07:56 -0700, Yu-cheng Yu wrote:
> On Thu, 2018-06-14 at 11:07 +1000, Balbir Singh wrote:
> > On Tue, 2018-06-12 at 08:03 -0700, Yu-cheng Yu wrote:
> > > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > > > 
> > > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > > This series introduces CET - Shadow stack
> > > > > 
> > > > > At the high level, shadow stack is:
> > > > > 
> > > > > 	Allocated from a task's address space with vm_flags VM_SHSTK;
> > > > > 	Its PTEs must be read-only and dirty;
> > > > > 	Fixed sized, but the default size can be changed by sys admin.
> > > > > 
> > > > > For a forked child, the shadow stack is duplicated when the next
> > > > > shadow stack access takes place.
> > > > > 
> > > > > For a pthread child, a new shadow stack is allocated.
> > > > > 
> > > > > The signal handler uses the same shadow stack as the main program.
> > > > > 
> > > > 
> > > > Even with sigaltstack()?
> > > > 
> > > 
> > > Yes.
> > 
> > I am not convinced that it would work, as we switch stacks, overflow might
> > be an issue. I also forgot to bring up setcontext(2), I presume those
> > will get new shadow stacks
> 
> Do you mean signal stack/sigaltstack overflow or swapcontext in a signal
> handler?
>

I meant any combination of those, for example a user space threads
implementation that uses sigaltstack for switching threads.

Balbir Singh.
 
> Yu-cheng
> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-17  3:16         ` Balbir Singh
@ 2018-06-18 21:44           ` Andy Lutomirski
  2018-06-19  8:52             ` Balbir Singh
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-18 21:44 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Sat, Jun 16, 2018 at 8:16 PM Balbir Singh <bsingharora@gmail.com> wrote:
>
> On Thu, 2018-06-14 at 07:56 -0700, Yu-cheng Yu wrote:
> > On Thu, 2018-06-14 at 11:07 +1000, Balbir Singh wrote:
> > > On Tue, 2018-06-12 at 08:03 -0700, Yu-cheng Yu wrote:
> > > > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > > > >
> > > > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > > > This series introduces CET - Shadow stack
> > > > > >
> > > > > > At the high level, shadow stack is:
> > > > > >
> > > > > >       Allocated from a task's address space with vm_flags VM_SHSTK;
> > > > > >       Its PTEs must be read-only and dirty;
> > > > > >       Fixed sized, but the default size can be changed by sys admin.
> > > > > >
> > > > > > For a forked child, the shadow stack is duplicated when the next
> > > > > > shadow stack access takes place.
> > > > > >
> > > > > > For a pthread child, a new shadow stack is allocated.
> > > > > >
> > > > > > The signal handler uses the same shadow stack as the main program.
> > > > > >
> > > > >
> > > > > Even with sigaltstack()?
> > > > >
> > > >
> > > > Yes.
> > >
> > > I am not convinced that it would work, as we switch stacks, overflow might
> > > be an issue. I also forgot to bring up setcontext(2), I presume those
> > > will get new shadow stacks
> >
> > Do you mean signal stack/sigaltstack overflow or swapcontext in a signal
> > handler?
> >
>
> I meant any combination of those, for example a user space threads
> implementation that uses sigaltstack for switching threads.
>

Anyone who does that is nuts.  The whole point of user space threads
is speed, and signals are very slow.  For userspace threads to work,
we need an API to allocate new shadow stacks, and we need to use the
extremely awkwardly defined RSTORSSP stuff to switch.  (I assume this
is possible on an ISA level.  The docs are bad, and the mnemonics for
the relevant instructions are nonsensical.)

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-12 19:34                         ` H.J. Lu
@ 2018-06-18 22:03                           ` Andy Lutomirski
  2018-06-19  0:52                             ` Kees Cook
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-18 22:03 UTC (permalink / raw)
  To: H. J. Lu, Kees Cook
  Cc: Thomas Gleixner, Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Tue, Jun 12, 2018 at 12:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Jun 12, 2018 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 12 Jun 2018, H.J. Lu wrote:
> >> On Tue, Jun 12, 2018 at 9:34 AM, Andy Lutomirski <luto@kernel.org> wrote:
> >> > On Tue, Jun 12, 2018 at 9:05 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >> On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
> >> >> > On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >> >> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> >> >> > That works for stuff which loads all libraries at start time, but what
> >> >> >> > happens if the program uses dlopen() later on? If CET is force locked and
> >> >> >> > the library is not CET enabled, it will fail.
> >> >> >>
> >> >> >> That is to prevent disabling CET by dlopening a legacy shared library.
> >> >> >>
> >> >> >> > I don't see the point of trying to support CET by magic. It adds complexity
> >> >> >> > and you'll never be able to handle all corner cases correctly. dlopen() is
> >> >> >> > not even a corner case.
> >> >> >>
> >> >> >> That is a price we pay for security.  To enable CET, especially shadow
> >> >> >> stack, the program and all of the shared libraries it uses should be CET
> >> >> >> enabled.  Most programs can be enabled with CET by compiling them
> >> >> >> with -fcf-protection.
> >> >> >
> >> >> > If you charge too high a price for security, people may turn it off.
> >> >> > I think we're going to need a mode where a program says "I want to use
> >> >> > the CET, but turn it off if I dlopen an unsupported library".  There
> >> >> > are programs that load binary-only plugins.
> >> >>
> >> >> You can do
> >> >>
> >> >> # export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK
> >> >>
> >> >> which turns off shadow stack.
> >> >>
> >> >
> >> > Which exactly illustrates my point.  By making your security story too
> >> > absolute, you'll force people to turn it off when they don't need to.
> >> > If I'm using a fully CET-ified distro and I'm using a CET-aware
> >> > program that loads binary plugins, and I may or may not have an old
> >> > (binary-only, perhaps) plugin that doesn't support CET, then the
> >> > behavior I want is for CET to be on until I dlopen() a program that
> >> > doesn't support it.  Unless there's some ABI reason why that can't be
> >> > done, but I don't think there is.
> >>
> >> We can make it opt-in via GLIBC_TUNABLES.  But by default, the legacy
> >> shared object is disallowed when CET is enabled.
> >
> > That's a bad idea. Stuff has launchers which people might not be able to
> > change. So they will simply turn off CET completely or it makes them hack
> > horrible crap into init, e.g. the above export.
> >
> > Give them sane kernel options:
> >
> >      cet = off, relaxed, forced
> >
> > where relaxed allows to run binary plugins. Then let dlopen() call into the
> > kernel with the filepath of the library to check for CET and that will tell
> > you whether it's ok or not and do the necessary magic in the kernel when
> > CET has to be disabled due to a !CET library/application.
> >
> > That's also making the whole thing independent of magic glibc environment
> > options and allows it to be used all over the place in the same way.
>
> This is very similar to our ARCH_CET_EXEC proposal which controls how
> CET should be enforced.   But Andy thinks it is a bad idea.
>

I do think it's a bad idea to have a new piece of state that survives
across exec().  It's going to have nasty usability problems and nasty
security problems.

We may need a mode by which glibc can turn CET *back off* even after a
program had it on if it dlopens() an old binary.  Or maybe there won't
be demand.  I can certainly understand why the CET_LOCK feature is
there, although I think we need a way to override it using something
like ptrace().  I'm not convinced that CET_LOCK is really needed, but
someone who understands the threat model should chime in.

Kees, do you know anyone who has a good enough understanding of
usermode exploits and how they'll interact with CET?

--Andy

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-18 22:03                           ` Andy Lutomirski
@ 2018-06-19  0:52                             ` Kees Cook
  2018-06-19  6:40                               ` Florian Weimer
  2018-06-19 14:50                               ` Andy Lutomirski
  0 siblings, 2 replies; 98+ messages in thread
From: Kees Cook @ 2018-06-19  0:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. J. Lu, Thomas Gleixner, Yu-cheng Yu, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer

On Mon, Jun 18, 2018 at 3:03 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, Jun 12, 2018 at 12:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Tue, Jun 12, 2018 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Tue, 12 Jun 2018, H.J. Lu wrote:
>> >> On Tue, Jun 12, 2018 at 9:34 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> > On Tue, Jun 12, 2018 at 9:05 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> >> On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> >> > On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> >> >> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> >> >> > That works for stuff which loads all libraries at start time, but what
>> >> >> >> > happens if the program uses dlopen() later on? If CET is force locked and
>> >> >> >> > the library is not CET enabled, it will fail.
>> >> >> >>
>> >> >> >> That is to prevent disabling CET by dlopening a legacy shared library.
>> >> >> >>
>> >> >> >> > I don't see the point of trying to support CET by magic. It adds complexity
>> >> >> >> > and you'll never be able to handle all corner cases correctly. dlopen() is
>> >> >> >> > not even a corner case.
>> >> >> >>
>> >> >> >> That is a price we pay for security.  To enable CET, especially shadow
>> >> >> >> stack, the program and all of the shared libraries it uses should be CET
>> >> >> >> enabled.  Most programs can be enabled with CET by compiling them
>> >> >> >> with -fcf-protection.
>> >> >> >
>> >> >> > If you charge too high a price for security, people may turn it off.
>> >> >> > I think we're going to need a mode where a program says "I want to use
>> >> >> > the CET, but turn it off if I dlopen an unsupported library".  There
>> >> >> > are programs that load binary-only plugins.
>> >> >>
>> >> >> You can do
>> >> >>
>> >> >> # export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK
>> >> >>
>> >> >> which turns off shadow stack.
>> >> >>
>> >> >
>> >> > Which exactly illustrates my point.  By making your security story too
>> >> > absolute, you'll force people to turn it off when they don't need to.
>> >> > If I'm using a fully CET-ified distro and I'm using a CET-aware
>> >> > program that loads binary plugins, and I may or may not have an old
>> >> > (binary-only, perhaps) plugin that doesn't support CET, then the
>> >> > behavior I want is for CET to be on until I dlopen() a program that
>> >> > doesn't support it.  Unless there's some ABI reason why that can't be
>> >> > done, but I don't think there is.
>> >>
>> >> We can make it opt-in via GLIBC_TUNABLES.  But by default, the legacy
>> >> shared object is disallowed when CET is enabled.
>> >
>> > That's a bad idea. Stuff has launchers which people might not be able to
>> > change. So they will simply turn off CET completely or it makes them hack
>> > horrible crap into init, e.g. the above export.
>> >
>> > Give them sane kernel options:
>> >
>> >      cet = off, relaxed, forced
>> >
>> > where relaxed allows to run binary plugins. Then let dlopen() call into the
>> > kernel with the filepath of the library to check for CET and that will tell
>> > you whether it's ok or not and do the necessary magic in the kernel when
>> > CET has to be disabled due to a !CET library/application.
>> >
>> > That's also making the whole thing independent of magic glibc environment
>> > options and allows it to be used all over the place in the same way.
>>
>> This is very similar to our ARCH_CET_EXEC proposal which controls how
>> CET should be enforced.   But Andy thinks it is a bad idea.
>
> I do think it's a bad idea to have a new piece of state that survives
> across exec().  It's going to have nasty usability problems and nasty
> security problems.
>
> We may need a mode by which glibc can turn CET *back off* even after a
> program had it on if it dlopens() an old binary.  Or maybe there won't
> be demand.  I can certainly understand why the CET_LOCK feature is
> there, although I think we need a way to override it using something
> like ptrace().  I'm not convinced that CET_LOCK is really needed, but
> someone who understands the threat model should chime in.
>
> Kees, do you know anyone who has a good enough understanding of
> usermode exploits and how they'll interact with CET?

Adding Florian to CC, but if something gets CET enabled, it really
shouldn't have a way to turn it off. If there's a way to turn it off,
all the ROP research will suddenly turn to exactly one gadget before
doing the rest of the ROP: turning off CET. Right now ROP is: use
stack-pivot gadget, do everything else. Allowing CET to be turned off will
just add one step: use CET-off gadget, use stack-pivot gadget, do
everything else. :P

Following Linus's request for "slow introduction" of new security
features, likely the best approach is to default to "relaxed" (with a
warning about down-grades), and allow distros/end-users to pick
"forced" if they know their libraries are all CET-enabled.

-Kees
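
To make the off/relaxed/forced proposal concrete, here is a rough sketch
of the decision a loader could make when dlopen() meets a library.  It
only illustrates the policy discussed in this thread; every identifier
(cet_mode, cet_action, cet_dlopen_policy) is made up for the example and
nothing here is an existing kernel or glibc interface.

```c
#include <stdbool.h>

/* Hypothetical names for the three modes suggested in the thread. */
enum cet_mode { CET_MODE_OFF, CET_MODE_RELAXED, CET_MODE_FORCED };

enum cet_action {
	CET_LOAD,		/* load the library, CET state unchanged */
	CET_LOAD_AND_DISABLE,	/* load it, turn CET off, warn about the downgrade */
	CET_REFUSE,		/* fail the dlopen() */
};

/*
 * Decide what to do when a running process dlopen()s a library.
 * cet_active: whether the process currently runs with CET enabled.
 * lib_has_cet: whether the library is marked as CET-enabled.
 */
static enum cet_action cet_dlopen_policy(enum cet_mode mode, bool cet_active,
					 bool lib_has_cet)
{
	/* Nothing to protect, or nothing that conflicts with CET. */
	if (mode == CET_MODE_OFF || !cet_active || lib_has_cet)
		return CET_LOAD;

	/* A legacy library meets a CET-enabled process. */
	return mode == CET_MODE_RELAXED ? CET_LOAD_AND_DISABLE : CET_REFUSE;
}
```

Only the legacy-library case differs between the two modes: "relaxed"
trades protection for compatibility, "forced" does the opposite.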

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19  0:52                             ` Kees Cook
@ 2018-06-19  6:40                               ` Florian Weimer
  2018-06-19 14:50                               ` Andy Lutomirski
  1 sibling, 0 replies; 98+ messages in thread
From: Florian Weimer @ 2018-06-19  6:40 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski
  Cc: H. J. Lu, Thomas Gleixner, Yu-cheng Yu, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On 06/19/2018 02:52 AM, Kees Cook wrote:
> Adding Florian to CC, but if something gets CET enabled, it really
> shouldn't have a way to turn it off. If there's a way to turn it off,
> all the ROP research will suddenly turn to exactly one gadget before
> doing the rest of the ROP: turning off CET. Right now ROP is: use
> stack-pivot gadget, do everything else. Allowing CET to be turned off will
> just add one step: use CET-off gadget, use stack-pivot gadget, do
> everything else. :P
> 
> Following Linus's request for "slow introduction" of new security
> features, likely the best approach is to default to "relaxed" (with a
> warning about down-grades), and allow distros/end-users to pick
> "forced" if they know their libraries are all CET-enabled.

The dynamic linker can tell beforehand (before executing any user code) 
whether a process image supports CET.  So there doesn't have to be 
anything gradual about it per se to preserve backwards compatibility.
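
A sketch of how that up-front decision might look, assuming the loader
has already read one feature word from each object's GNU property note.
The bit values follow GNU_PROPERTY_X86_FEATURE_1_AND in the x86-64
psABI; the function name and the array-of-words input are assumptions
for illustration, not glibc code.

```c
#include <stddef.h>
#include <stdint.h>

/* Feature bits as defined for GNU_PROPERTY_X86_FEATURE_1_AND. */
#define FEATURE_1_IBT	(1u << 0)
#define FEATURE_1_SHSTK	(1u << 1)

/*
 * A CET feature is usable only if every object in the initial process
 * image advertises it, so the result is the bitwise AND of all feature
 * words.  With n == 0 the full mask is returned unchanged.
 */
static uint32_t process_cet_features(const uint32_t *obj_features, size_t n)
{
	uint32_t common = FEATURE_1_IBT | FEATURE_1_SHSTK;

	for (size_t i = 0; i < n; i++)
		common &= obj_features[i];

	return common;
}
```

A single legacy object (feature word 0) clears both bits, which is the
case where the loader would start the process without CET.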

The idea to turn off CET probably comes from the desire to support 
dlopen.  I'm not sure if this is really necessary because the complexity 
is rather nasty.  (We currently do something similar for executable 
stacks.)  I'd rather have a switch to turn off the feature upon process 
start.  Things like NSS and PAM modules need to be recompiled early.  (I 
hope that everything that goes directly to the network via custom 
protocols or hardware such as smartcards is proxied via daemons these days.)

Thanks,
Florian

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-18 21:44           ` Andy Lutomirski
@ 2018-06-19  8:52             ` Balbir Singh
  0 siblings, 0 replies; 98+ messages in thread
From: Balbir Singh @ 2018-06-19  8:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz

On Mon, 2018-06-18 at 14:44 -0700, Andy Lutomirski wrote:
> On Sat, Jun 16, 2018 at 8:16 PM Balbir Singh <bsingharora@gmail.com> wrote:
> > 
> > On Thu, 2018-06-14 at 07:56 -0700, Yu-cheng Yu wrote:
> > > On Thu, 2018-06-14 at 11:07 +1000, Balbir Singh wrote:
> > > > On Tue, 2018-06-12 at 08:03 -0700, Yu-cheng Yu wrote:
> > > > > On Tue, 2018-06-12 at 20:56 +1000, Balbir Singh wrote:
> > > > > > 
> > > > > > On 08/06/18 00:37, Yu-cheng Yu wrote:
> > > > > > > This series introduces CET - Shadow stack
> > > > > > > 
> > > > > > > At the high level, shadow stack is:
> > > > > > > 
> > > > > > >       Allocated from a task's address space with vm_flags VM_SHSTK;
> > > > > > >       Its PTEs must be read-only and dirty;
> > > > > > >       Fixed sized, but the default size can be changed by sys admin.
> > > > > > > 
> > > > > > > For a forked child, the shadow stack is duplicated when the next
> > > > > > > shadow stack access takes place.
> > > > > > > 
> > > > > > > For a pthread child, a new shadow stack is allocated.
> > > > > > > 
> > > > > > > The signal handler uses the same shadow stack as the main program.
> > > > > > > 
> > > > > > 
> > > > > > Even with sigaltstack()?
> > > > > > 
> > > > > 
> > > > > Yes.
> > > > 
> > > > I am not convinced that it would work, as we switch stacks, overflow might
> > > > be an issue. I also forgot to bring up setcontext(2), I presume those
> > > > will get new shadow stacks
> > > 
> > > Do you mean signal stack/sigaltstack overflow or swapcontext in a signal
> > > handler?
> > > 
> > 
> > I meant any combination of those, for example a user space threads
> > implementation that uses sigaltstack for switching threads.
> > 
> 
> Anyone who does that is nuts.  The whole point of user space threads
> is speed, and signals are very slow.  For userspace threads to work,
> we need an API to allocate new shadow stacks, and we need to use the
> extremely awkwardly defined RSTORSSP stuff to switch.  (I assume this
> is possible on an ISA level.  The docs are bad, and the mnemonics for
> the relevant instructions are nonsensical.)

The whole point was to ensure we don't break applications/code that work
today. I think as long as there is a shadow stack allocated corresponding
to the user space stack and we can restore SSP as we switch, things should be
fine.

Balbir Singh.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19  0:52                             ` Kees Cook
  2018-06-19  6:40                               ` Florian Weimer
@ 2018-06-19 14:50                               ` Andy Lutomirski
  2018-06-19 16:44                                 ` Kees Cook
  1 sibling, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-19 14:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, H. J. Lu, Thomas Gleixner, Yu-cheng Yu, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer



> On Jun 18, 2018, at 5:52 PM, Kees Cook <keescook@chromium.org> wrote:
> 
>> On Mon, Jun 18, 2018 at 3:03 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> On Tue, Jun 12, 2018 at 12:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>> 
>>>> On Tue, Jun 12, 2018 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>> On Tue, 12 Jun 2018, H.J. Lu wrote:
>>>>>> On Tue, Jun 12, 2018 at 9:34 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>>>> On Tue, Jun 12, 2018 at 9:05 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>> On Tue, Jun 12, 2018 at 9:01 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>>>>>> On Tue, Jun 12, 2018 at 4:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>>>> On Tue, Jun 12, 2018 at 3:03 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>>>>>>> That works for stuff which loads all libraries at start time, but what
>>>>>>>>>> happens if the program uses dlopen() later on? If CET is force locked and
>>>>>>>>>> the library is not CET enabled, it will fail.
>>>>>>>>> 
>>>>>>>>> That is to prevent disabling CET by dlopening a legacy shared library.
>>>>>>>>> 
>>>>>>>>>> I don't see the point of trying to support CET by magic. It adds complexity
>>>>>>>>>> and you'll never be able to handle all corner cases correctly. dlopen() is
>>>>>>>>>> not even a corner case.
>>>>>>>>> 
>>>>>>>>> That is a price we pay for security.  To enable CET, especially shadow
>>>>>>>>> stack, the program and all of the shared libraries it uses should be CET
>>>>>>>>> enabled.  Most programs can be enabled with CET by compiling them
>>>>>>>>> with -fcf-protection.
>>>>>>>> 
>>>>>>>> If you charge too high a price for security, people may turn it off.
>>>>>>>> I think we're going to need a mode where a program says "I want to use
>>>>>>>> the CET, but turn it off if I dlopen an unsupported library".  There
>>>>>>>> are programs that load binary-only plugins.
>>>>>>> 
>>>>>>> You can do
>>>>>>> 
>>>>>>> # export GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK
>>>>>>> 
>>>>>>> which turns off shadow stack.
>>>>>>> 
>>>>>> 
>>>>>> Which exactly illustrates my point.  By making your security story too
>>>>>> absolute, you'll force people to turn it off when they don't need to.
>>>>>> If I'm using a fully CET-ified distro and I'm using a CET-aware
>>>>>> program that loads binary plugins, and I may or may not have an old
>>>>>> (binary-only, perhaps) plugin that doesn't support CET, then the
>>>>>> behavior I want is for CET to be on until I dlopen() a program that
>>>>>> doesn't support it.  Unless there's some ABI reason why that can't be
>>>>>> done, but I don't think there is.
>>>>> 
>>>>> We can make it opt-in via GLIBC_TUNABLES.  But by default, the legacy
>>>>> shared object is disallowed when CET is enabled.
>>>> 
>>>> That's a bad idea. Stuff has launchers which people might not be able to
>>>> change. So they will simply turn off CET completely or it makes them hack
>>>> horrible crap into init, e.g. the above export.
>>>> 
>>>> Give them sane kernel options:
>>>> 
>>>>     cet = off, relaxed, forced
>>>> 
>>>> where relaxed allows to run binary plugins. Then let dlopen() call into the
>>>> kernel with the filepath of the library to check for CET and that will tell
>>>> you whether it's ok or not and do the necessary magic in the kernel when
>>>> CET has to be disabled due to a !CET library/application.
>>>> 
>>>> That's also making the whole thing independent of magic glibc environment
>>>> options and allows it to be used all over the place in the same way.
>>> 
>>> This is very similar to our ARCH_CET_EXEC proposal which controls how
>>> CET should be enforced.   But Andy thinks it is a bad idea.
>> 
>> I do think it's a bad idea to have a new piece of state that survives
>> across exec().  It's going to have nasty usability problems and nasty
>> security problems.
>> 
>> We may need a mode by which glibc can turn CET *back off* even after a
>> program had it on if it dlopens() an old binary.  Or maybe there won't
>> be demand.  I can certainly understand why the CET_LOCK feature is
>> there, although I think we need a way to override it using something
>> like ptrace().  I'm not convinced that CET_LOCK is really needed, but
>> someone who understands the threat model should chime in.
>> 
>> Kees, do you know anyone who has a good enough understanding of
>> usermode exploits and how they'll interact with CET?
> 
> Adding Florian to CC, but if something gets CET enabled, it really
> shouldn't have a way to turn it off. If there's a way to turn it off,
> all the ROP research will suddenly turn to exactly one gadget before
> doing the rest of the ROP: turning off CET. Right now ROP is: use
> stack-pivot gadget, do everything else. Allowing CET to be turned off will
> just add one step: use CET-off gadget, use stack-pivot gadget, do
> everything else. :P

Fair enough.

> 
> Following Linus's request for "slow introduction" of new security
> features, likely the best approach is to default to "relaxed" (with a
> warning about down-grades), and allow distros/end-users to pick
> "forced" if they know their libraries are all CET-enabled.

I still don’t get what “relaxed” is for.  I think the right design is:

Processes start with CET on or off depending on the ELF note, but they start with CET unlocked no matter what. They can freely switch CET on and off (subject to being clever enough not to crash if they turn it on and then return right off the end of the shadow stack) until they call ARCH_CET_LOCK.
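
The lifecycle described above can be modeled as a tiny state machine. This is only a sketch of the proposed semantics, not the ARCH_CET_* implementation; struct cet_state and the helper names are hypothetical, and the ptrace override is left out.

```c
#include <stdbool.h>

/* Hypothetical per-process CET state for the proposed design. */
struct cet_state {
	bool enabled;	/* shadow stack currently on */
	bool locked;	/* set once by the lock request, never cleared here */
};

/* At exec: on or off according to the ELF note, always unlocked. */
static struct cet_state cet_at_exec(bool elf_note_requests_cet)
{
	return (struct cet_state){
		.enabled = elf_note_requests_cet,
		.locked = false,
	};
}

/* Toggle CET; returns 0 on success, -1 once the state is locked. */
static int cet_set_enabled(struct cet_state *s, bool enable)
{
	if (s->locked)
		return -1;
	s->enabled = enable;
	return 0;
}

/* Models the ARCH_CET_LOCK request: freeze the current setting. */
static void cet_lock(struct cet_state *s)
{
	s->locked = true;
}
```

A loader that wants strict behavior would enable CET and lock right away; one that tolerates legacy plugins would postpone the lock.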

Ptrace gets new APIs to turn CET on and off and to lock and unlock it.  If an attacker finds a “ptrace me and turn off CET” gadget, then they might as well just do “ptrace me and write shell code” instead. It’s basically the same gadget. Keep in mind that the actual sequence of syscalls to do this is incredibly complicated.

It’s unclear to me that forcing CET on belongs in the kernel at all.  By the time an attacker can find a non-CET ELF binary and can exec it in a context where it does their bidding, the attacker is far beyond what CET can even try to help. At this point we’re talking about an attacker who can effectively invoke system(3) with arbitrary parameters, and attackers with *that* power don’t need ROP and the like.

There is a new feature I’d like to see, though: add an ELF note to bless a binary as being an ELF interpreter. And add an LSM callback to validate an ELF interpreter.  Let’s minimize the shenanigans that people who control containers can get up to. (Obviously the ELF note part would need to be opt-in.)

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 14:50                               ` Andy Lutomirski
@ 2018-06-19 16:44                                 ` Kees Cook
  2018-06-19 16:59                                   ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Kees Cook @ 2018-06-19 16:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, H. J. Lu, Thomas Gleixner, Yu-cheng Yu, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer

On Tue, Jun 19, 2018 at 7:50 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Jun 18, 2018, at 5:52 PM, Kees Cook <keescook@chromium.org> wrote:
>> Following Linus's request for "slow introduction" of new security
>> features, likely the best approach is to default to "relaxed" (with a
>> warning about down-grades), and allow distros/end-users to pick
>> "forced" if they know their libraries are all CET-enabled.
>
> I still don’t get what “relaxed” is for.  I think the right design is:
>
> Processes start with CET on or off depending on the ELF note, but they start with CET unlocked no matter what. They can freely switch CET on and off (subject to being clever enough not to crash if they turn it on and then return right off the end of the shadow stack) until they call ARCH_CET_LOCK.

I'm fine with this. I'd expect modern loaders to just turn on CET and
ARCH_CET_LOCK immediately and be done with it. :P

> Ptrace gets new APIs to turn CET on and off and to lock and unlock it.  If an attacker finds a “ptrace me and turn off CET” gadget, then they might as well just do “ptrace me and write shell code” instead. It’s basically the same gadget. Keep in mind that the actual sequence of syscalls to do this is incredibly complicated.

Right -- if an attacker can control ptrace of the target, we're way
past CET. The only concern I have, though, is taking advantage of
expected ptracing. For example: browsers tend to have crash handlers
that launch a ptracer. If ptracing disabled CET for all threads, this
won't be safe: an attacker just gains control in two threads, crashes
one to get the ptracer to attach, which disables CET in the other
thread and the attacker continues ROP as normal. As long as the ptrace
disabling is thread-specific, I think this will be okay.
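
A minimal sketch of the difference between the two ptrace behaviours above. This is a userspace toy model, not kernel code; struct task_model, tracer_disable_all() and tracer_disable_one() are made-up names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NTHREADS 2	/* arbitrary thread count for the model */

/* Hypothetical model of a process: one CET enable bit per thread. */
struct task_model {
	bool cet_on[NTHREADS];
};

/* Process-wide variant: attaching to one thread strips CET everywhere. */
static void tracer_disable_all(struct task_model *t)
{
	assert(t);
	for (size_t i = 0; i < NTHREADS; i++)
		t->cet_on[i] = false;
}

/* Thread-specific variant: only the traced thread loses protection. */
static int tracer_disable_one(struct task_model *t, size_t tid)
{
	assert(t);
	if (tid >= NTHREADS)
		return -1;
	t->cet_on[tid] = false;
	return 0;
}
```

In the crash-handler scenario, thread 0 is the one that crashes and gets traced, and thread 1 is the one the attacker wants to keep running ROP in. With tracer_disable_one(&t, 0), thread 1 keeps its shadow stack; with tracer_disable_all() it does not.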

-Kees

-- 
Kees Cook
Pixel Security


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 16:44                                 ` Kees Cook
@ 2018-06-19 16:59                                   ` Yu-cheng Yu
  2018-06-19 17:07                                     ` Kees Cook
  0 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-19 16:59 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski
  Cc: Andy Lutomirski, H. J. Lu, Thomas Gleixner, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer

On Tue, 2018-06-19 at 09:44 -0700, Kees Cook wrote:
> On Tue, Jun 19, 2018 at 7:50 AM, Andy Lutomirski <luto@amacapital.net
> > wrote:
> > 
> > > 
> > > On Jun 18, 2018, at 5:52 PM, Kees Cook <keescook@chromium.org>
> > > wrote:
> > > Following Linus's request for "slow introduction" of new security
> > > features, likely the best approach is to default to "relaxed"
> > > (with a
> > > warning about down-grades), and allow distros/end-users to pick
> > > "forced" if they know their libraries are all CET-enabled.
> > I still don’t get what “relaxed” is for.  I think the right design
> > is:
> > 
> > Processes start with CET on or off depending on the ELF note, but
> > they start with CET unlocked no matter what. They can freely switch
> > CET on and off (subject to being clever enough not to crash if they
> > turn it on and then return right off the end of the shadow stack)
> > until they call ARCH_CET_LOCK.
> I'm fine with this. I'd expect modern loaders to just turn on CET and
> ARCH_CET_LOCK immediately and be done with it. :P

This is the current implementation.  If the loader has CET in its ELF
header, it is executed with CET on.  The loader will turn off CET if
the application being loaded does not support it (in the ELF header).
 The loader calls ARCH_CET_LOCK before passing to the application.  But
how do we handle dlopen?
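
The startup flow described above (not the open dlopen question) can be modelled roughly as follows. This is only a sketch: struct cet_startup and loader_startup() are hypothetical names, and the two bool arguments stand in for whatever CET marking is found in each ELF file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the CET state handed to the application. */
struct cet_startup {
	bool enabled;
	bool locked;
};

/*
 * loader_has_cet / app_has_cet: assumed inputs, standing in for the
 * CET marking of the loader and of the application being loaded.
 */
static struct cet_startup loader_startup(bool loader_has_cet, bool app_has_cet)
{
	/* Kernel side: the loader runs with CET on only if it is marked. */
	struct cet_startup st = { .enabled = loader_has_cet, .locked = false };

	/* Loader side: turn CET off if the application does not support it. */
	if (st.enabled && !app_has_cet)
		st.enabled = false;

	/* Loader side: lock before passing control to the application. */
	st.locked = true;

	/* Invariant: CET is never left on for an unmarked application. */
	assert(!(st.enabled && !app_has_cet));
	return st;
}
```

Note that in this model the state is locked in every case, which is exactly why a later dlopen() of a legacy library has no way to turn CET off again.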

> > 
> > Ptrace gets new APIs to turn CET on and off and to lock and unlock
> > it.  If an attacker finds a “ptrace me and turn off CET” gadget,
> > then they might as well just do “ptrace me and write shell code”
> > instead. It’s basically the same gadget. Keep in mind that the
> > actual sequence of syscalls to do this is incredibly complicated.
> Right -- if an attacker can control ptrace of the target, we're way
> past CET. The only concern I have, though, is taking advantage of
> expected ptracing. For example: browsers tend to have crash handlers
> that launch a ptracer. If ptracing disabled CET for all threads, this
> won't be safe: an attacker just gains control in two threads, crashes
> one to get the ptracer to attach, which disables CET in the other
> thread and the attacker continues ROP as normal. As long as the
> ptrace
> disabling is thread-specific, I think this will be okay.

If ptrace can turn CET on/off and it is thread-specific, do we still
need ptrace lock/unlock?

Yu-cheng


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 16:59                                   ` Yu-cheng Yu
@ 2018-06-19 17:07                                     ` Kees Cook
  2018-06-19 17:20                                       ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Kees Cook @ 2018-06-19 17:07 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Andy Lutomirski, Andy Lutomirski, H. J. Lu, Thomas Gleixner,
	LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer

On Tue, Jun 19, 2018 at 9:59 AM, Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> On Tue, 2018-06-19 at 09:44 -0700, Kees Cook wrote:
>> On Tue, Jun 19, 2018 at 7:50 AM, Andy Lutomirski <luto@amacapital.net
>> > wrote:
>> >
>> > >
>> > > On Jun 18, 2018, at 5:52 PM, Kees Cook <keescook@chromium.org>
>> > > wrote:
>> > > Following Linus's request for "slow introduction" of new security
>> > > features, likely the best approach is to default to "relaxed"
>> > > (with a
>> > > warning about down-grades), and allow distros/end-users to pick
>> > > "forced" if they know their libraries are all CET-enabled.
>> > I still don’t get what “relaxed” is for.  I think the right design
>> > is:
>> >
>> > Processes start with CET on or off depending on the ELF note, but
>> > they start with CET unlocked no matter what. They can freely switch
>> > CET on and off (subject to being clever enough not to crash if they
>> > turn it on and then return right off the end of the shadow stack)
>> > until they call ARCH_CET_LOCK.
>> I'm fine with this. I'd expect modern loaders to just turn on CET and
>> ARCH_CET_LOCK immediately and be done with it. :P
>
> This is the current implementation.  If the loader has CET in its ELF
> header, it is executed with CET on.  The loader will turn off CET if
> the application being loaded does not support it (in the ELF header).
>  The loader calls ARCH_CET_LOCK before passing to the application.  But
> how do we handle dlopen?

I thought CET_LOCK would not get set in "relaxed" mode, due to dlopen
usage, and that would be the WARN case. People without dlopen concerns
can boot with "enforced" mode? If a system builder knows there are no
legacy dlopens they build with enforced enabled, etc.

>> > Ptrace gets new APIs to turn CET on and off and to lock and unlock
>> > it.  If an attacker finds a “ptrace me and turn off CET” gadget,
>> > then they might as well just do “ptrace me and write shell code”
>> > instead. It’s basically the same gadget. Keep in mind that the
>> > actual sequence of syscalls to do this is incredibly complicated.
>> Right -- if an attacker can control ptrace of the target, we're way
>> past CET. The only concern I have, though, is taking advantage of
>> expected ptracing. For example: browsers tend to have crash handlers
>> that launch a ptracer. If ptracing disabled CET for all threads, this
>> won't be safe: an attacker just gains control in two threads, crashes
>> one to get the ptracer to attach, which disables CET in the other
>> thread and the attacker continues ROP as normal. As long as the
>> ptrace
>> disabling is thread-specific, I think this will be okay.
>
> If ptrace can turn CET on/off and it is thread-specific, do we still
> need ptrace lock/unlock?

Does it provide anything beyond what PR_DUMPABLE does?

-Kees

-- 
Kees Cook
Pixel Security


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 17:07                                     ` Kees Cook
@ 2018-06-19 17:20                                       ` Andy Lutomirski
  2018-06-19 20:12                                         ` Kees Cook
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-19 17:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yu-cheng Yu, Andy Lutomirski, H. J. Lu, Thomas Gleixner, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer



> On Jun 19, 2018, at 10:07 AM, Kees Cook <keescook@chromium.org> wrote:
> 
>> On Tue, Jun 19, 2018 at 9:59 AM, Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>> On Tue, 2018-06-19 at 09:44 -0700, Kees Cook wrote:
>>> On Tue, Jun 19, 2018 at 7:50 AM, Andy Lutomirski <luto@amacapital.net
>>>> wrote:
>>>> 
>>>>> 
>>>>> On Jun 18, 2018, at 5:52 PM, Kees Cook <keescook@chromium.org>
>>>>> wrote:
>>>>> Following Linus's request for "slow introduction" of new security
>>>>> features, likely the best approach is to default to "relaxed"
>>>>> (with a
>>>>> warning about down-grades), and allow distros/end-users to pick
>>>>> "forced" if they know their libraries are all CET-enabled.
>>>> I still don’t get what “relaxed” is for.  I think the right design
>>>> is:
>>>> 
>>>> Processes start with CET on or off depending on the ELF note, but
>>>> they start with CET unlocked no matter what. They can freely switch
>>>> CET on and off (subject to being clever enough not to crash if they
>>>> turn it on and then return right off the end of the shadow stack)
>>>> until they call ARCH_CET_LOCK.
>>> I'm fine with this. I'd expect modern loaders to just turn on CET and
>>> ARCH_CET_LOCK immediately and be done with it. :P
>> 
>> This is the current implementation.  If the loader has CET in its ELF
>> header, it is executed with CET on.  The loader will turn off CET if
>> the application being loaded does not support it (in the ELF header).
>> The loader calls ARCH_CET_LOCK before passing to the application.  But
>> how do we handle dlopen?
> 
> I thought CET_LOCK would not get set in "relaxed" mode, due to dlopen
> usage, and that would be the WARN case. People without dlopen concerns
> can boot with "enforced" mode? If a system builder knows there are no
> legacy dlopens they build with enforced enabled, etc.

I think we’re getting ahead of ourselves. dlopen() of a non-CET-aware library in a CET process is distinctly non-trivial, especially in a multithreaded process. I think getting it right will require *userspace* support.  It certainly needs ld.so to issue an arch_prctl at a bare minimum. So I see no point to a kernel-supplied “relaxed” mode. I think there may be demand for a ld.so relaxed mode, but it will have nothing to do with boot options.

It’s potentially helpful to add an arch_prctl that turns CET off for all threads, but only if unlocked. It would obviously be one hell of a gadget.

> 
>>>> Ptrace gets new APIs to turn CET on and off and to lock and unlock
>>>> it.  If an attacker finds a “ptrace me and turn off CET” gadget,
>>>> then they might as well just do “ptrace me and write shell code”
>>>> instead. It’s basically the same gadget. Keep in mind that the
>>>> actual sequence of syscalls to do this is incredibly complicated.
>>> Right -- if an attacker can control ptrace of the target, we're way
>>> past CET. The only concern I have, though, is taking advantage of
>>> expected ptracing. For example: browsers tend to have crash handlers
>>> that launch a ptracer. If ptracing disabled CET for all threads, this
>>> won't be safe: an attacker just gains control in two threads, crashes
>>> one to get the ptracer to attach, which disables CET in the other
>>> thread and the attacker continues ROP as normal. As long as the
>>> ptrace
>>> disabling is thread-specific, I think this will be okay.
>> 
>> If ptrace can turn CET on/off and it is thread-specific, do we still
>> need ptrace lock/unlock?

Let me clarify. I don’t think ptrace() should have any automatic effect on CET. I think there should be an explicit way to ask ptrace to twiddle CET, and it should probably apply per thread.

> 
> Does it provide anything beyond what PR_DUMPABLE does?

What do you mean?


> 
> -Kees
> 
> -- 
> Kees Cook
> Pixel Security


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 17:20                                       ` Andy Lutomirski
@ 2018-06-19 20:12                                         ` Kees Cook
  2018-06-19 20:47                                           ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Kees Cook @ 2018-06-19 20:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, Andy Lutomirski, H. J. Lu, Thomas Gleixner, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer

On Tue, Jun 19, 2018 at 10:20 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Jun 19, 2018, at 10:07 AM, Kees Cook <keescook@chromium.org> wrote:
>>
>> Does it provide anything beyond what PR_DUMPABLE does?
>
> What do you mean?

I was just going by the name of it. I wasn't sure what "ptrace CET
lock" meant, so I was trying to understand if it was another "you
can't ptrace me" toggle, and if so, wouldn't it be redundant with
PR_SET_DUMPABLE = 0, etc.

-Kees

-- 
Kees Cook
Pixel Security


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 20:12                                         ` Kees Cook
@ 2018-06-19 20:47                                           ` Andy Lutomirski
  2018-06-19 22:38                                             ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-19 20:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Yu-cheng Yu, Andy Lutomirski, H. J. Lu, Thomas Gleixner, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer


> On Jun 19, 2018, at 1:12 PM, Kees Cook <keescook@chromium.org> wrote:
> 
>> On Tue, Jun 19, 2018 at 10:20 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>>> On Jun 19, 2018, at 10:07 AM, Kees Cook <keescook@chromium.org> wrote:
>>> 
>>> Does it provide anything beyond what PR_DUMPABLE does?
>> 
>> What do you mean?
> 
> I was just going by the name of it. I wasn't sure what "ptrace CET
> lock" meant, so I was trying to understand if it was another "you
> can't ptrace me" toggle, and if so, wouldn't it be redundant with
> PR_SET_DUMPABLE = 0, etc.
> 

No, other way around. The valid CET states are on/unlocked, off/unlocked, on/locked, off/locked. arch_prctl can freely change the state unless locked. ptrace can change it no matter what.  The lock is to prevent the existence of a gadget to disable CET (unless the gadget involves ptrace, but I don’t think that’s a real concern).
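
A minimal sketch of that state model, written as a userspace toy rather than kernel code. All names here (struct cet_state, cet_self_set(), cet_self_lock(), cet_tracer_set()) are made up for illustration; only the rules come from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread CET state: on/off crossed with locked/unlocked. */
struct cet_state {
	bool enabled;
	bool locked;
};

/* Models an arch_prctl-style request made by the task itself. */
static int cet_self_set(struct cet_state *s, bool enable)
{
	assert(s);
	if (s->locked)
		return -1;	/* refused: no "disable CET" gadget once locked */
	s->enabled = enable;
	return 0;
}

/* Models the lock request: one-way from the task's own point of view. */
static void cet_self_lock(struct cet_state *s)
{
	assert(s);
	s->locked = true;
}

/* Models the proposed ptrace interface: the tracer ignores the lock. */
static void cet_tracer_set(struct cet_state *s, bool enable, bool locked)
{
	assert(s);
	s->enabled = enable;
	s->locked = locked;
}
```

The asymmetry is the point: cet_self_set() fails after cet_self_lock(), while cet_tracer_set() always succeeds, so the only remaining way to drop CET is through a tracer.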


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 20:47                                           ` Andy Lutomirski
@ 2018-06-19 22:38                                             ` Yu-cheng Yu
  2018-06-20  0:50                                               ` Andy Lutomirski
  0 siblings, 1 reply; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-19 22:38 UTC (permalink / raw)
  To: Andy Lutomirski, Kees Cook
  Cc: Andy Lutomirski, H. J. Lu, Thomas Gleixner, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer

On Tue, 2018-06-19 at 13:47 -0700, Andy Lutomirski wrote:
> > 
> > On Jun 19, 2018, at 1:12 PM, Kees Cook <keescook@chromium.org>
> > wrote:
> > 
> > > 
> > > On Tue, Jun 19, 2018 at 10:20 AM, Andy Lutomirski
> > > <luto@amacapital.net> wrote:
> > > 
> > > > 
> > > > On Jun 19, 2018, at 10:07 AM, Kees Cook <keescook@chromium.org>
> > > > wrote:
> > > > 
> > > > Does it provide anything beyond what PR_DUMPABLE does?
> > > What do you mean?
> > I was just going by the name of it. I wasn't sure what "ptrace CET
> > lock" meant, so I was trying to understand if it was another "you
> > can't ptrace me" toggle, and if so, wouldn't it be redundant with
> > PR_SET_DUMPABLE = 0, etc.
> > 
> No, other way around. The valid CET states are on/unlocked,
> off/unlocked, on/locked, off/locked. arch_prctl can freely change the state
> unless locked. ptrace can change it no matter what.  The lock is to
> prevent the existence of a gadget to disable CET (unless the gadget
> involves ptrace, but I don’t think that’s a real concern).

We have the arch_prctl now and only need to add ptrace lock/unlock.

Back to the dlopen() "relaxed" mode. Would the following work?

If the lib being loaded does not use setjmp/getcontext families (the
loader knows?), then the loader leaves shstk on.  Otherwise, if the
system-wide setting is "relaxed", the loader turns off shstk and issues
a warning.  In addition, if (dlopen == relaxed), then cet is not locked
at any time.

The system-wide setting (somewhere in /etc?) can be:

	dlopen=force|relaxed /* controls dlopen of non-cet libs */
	exec=force|relaxed /* controls exec of non-cet apps */
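
Spelled out as a decision function, the proposal would look roughly like this. It is only a sketch: the enum and function names are made up, and the DLOPEN_REFUSE outcome for the "force" case is an assumption, since the text above does not say what happens there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the dlopen=force|relaxed setting. */
enum cet_policy { CET_FORCE, CET_RELAXED };

enum dlopen_action {
	DLOPEN_KEEP_SHSTK,	/* load the library, shadow stack stays on */
	DLOPEN_DISABLE_SHSTK,	/* load it, turn shadow stack off, warn */
	DLOPEN_REFUSE,		/* assumption: "force" rejects the library */
};

/*
 * lib_may_unwind: the library uses the setjmp/getcontext families
 * (assuming the loader can actually tell).
 */
static enum dlopen_action dlopen_decide(bool lib_may_unwind, enum cet_policy policy)
{
	assert(policy == CET_FORCE || policy == CET_RELAXED);
	if (!lib_may_unwind)
		return DLOPEN_KEEP_SHSTK;
	if (policy == CET_RELAXED)
		return DLOPEN_DISABLE_SHSTK;
	return DLOPEN_REFUSE;
}

/* With dlopen=relaxed, CET is never locked, so it can still be turned off. */
static bool cet_may_lock(enum cet_policy dlopen_policy)
{
	return dlopen_policy != CET_RELAXED;
}
```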

--
Yu-cheng


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-19 22:38                                             ` Yu-cheng Yu
@ 2018-06-20  0:50                                               ` Andy Lutomirski
  2018-06-21 23:07                                                 ` Yu-cheng Yu
  0 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-20  0:50 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: Kees Cook, Andy Lutomirski, H. J. Lu, Thomas Gleixner, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer



> On Jun 19, 2018, at 3:38 PM, Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> 
> On Tue, 2018-06-19 at 13:47 -0700, Andy Lutomirski wrote:
>>> 
>>> On Jun 19, 2018, at 1:12 PM, Kees Cook <keescook@chromium.org>
>>> wrote:
>>> 
>>>> 
>>>> On Tue, Jun 19, 2018 at 10:20 AM, Andy Lutomirski
>>>> <luto@amacapital.net> wrote:
>>>> 
>>>>> 
>>>>> On Jun 19, 2018, at 10:07 AM, Kees Cook <keescook@chromium.org>
>>>>> wrote:
>>>>> 
>>>>> Does it provide anything beyond what PR_DUMPABLE does?
>>>> What do you mean?
>>> I was just going by the name of it. I wasn't sure what "ptrace CET
>>> lock" meant, so I was trying to understand if it was another "you
>>> can't ptrace me" toggle, and if so, wouldn't it be redundant with
>>> PR_SET_DUMPABLE = 0, etc.
>>> 
>> No, other way around. The valid CET states are on/unlocked,
>> off/unlocked, on/locked, off/locked. arch_prctl can freely change the state
>> unless locked. ptrace can change it no matter what.  The lock is to
>> prevent the existence of a gadget to disable CET (unless the gadget
>> involves ptrace, but I don’t think that’s a real concern).
> 
> We have the arch_prctl now and only need to add ptrace lock/unlock.
> 
> Back to the dlopen() "relaxed" mode. Would the following work?
> 
> If the lib being loaded does not use setjmp/getcontext families (the
> loader knows?), then the loader leaves shstk on.  

Will that actually work?  Are there libs that do something like longjmp without actually using the glibc longjmp routine?  What about compilers that statically match a throw to a catch and try to return through several frames at once?


> Otherwise, if the
> system-wide setting is "relaxed", the loader turns off shstk and issues
> a warning.  In addition, if (dlopen == relaxed), then cet is not locked
> at any time.
> 
> The system-wide setting (somewhere in /etc?) can be:
> 
>    dlopen=force|relaxed /* controls dlopen of non-cet libs */
>    exec=force|relaxed /* controls exec of non-cet apps */
> 
> 

Why do we need a whole new mechanism here?  Can’t all this use regular glibc tunables?


* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
  2018-06-20  0:50                                               ` Andy Lutomirski
@ 2018-06-21 23:07                                                 ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-21 23:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Andy Lutomirski, H. J. Lu, Thomas Gleixner, LKML,
	linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Ingo Molnar, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz,
	Florian Weimer

On Tue, 2018-06-19 at 17:50 -0700, Andy Lutomirski wrote:
> 
> > 
> > On Jun 19, 2018, at 3:38 PM, Yu-cheng Yu <yu-cheng.yu@intel.com>
> > wrote:
> > 
> > On Tue, 2018-06-19 at 13:47 -0700, Andy Lutomirski wrote:
> > > 
> > > > 
> > > > 
> > > > On Jun 19, 2018, at 1:12 PM, Kees Cook <keescook@chromium.org>
> > > > wrote:
> > > > 
> > > > > 
> > > > > 
> > > > > > On Tue, Jun 19, 2018 at 10:20 AM, Andy Lutomirski
> > > > > > <luto@amacapital.net> wrote:
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > > On Jun 19, 2018, at 10:07 AM, Kees Cook
> > > > > > > <keescook@chromium.org> wrote:
> > > > > > 
> > > > > > Does it provide anything beyond what PR_DUMPABLE does?
> > > > > What do you mean?
> > > > I was just going by the name of it. I wasn't sure what "ptrace
> > > > CET
> > > > lock" meant, so I was trying to understand if it was another
> > > > "you
> > > > can't ptrace me" toggle, and if so, wouldn't it be redundant
> > > > with
> > > > PR_SET_DUMPABLE = 0, etc.
> > > > 
> > > No, other way around. The valid CET states are on/unlocked,
> > > off/unlocked, on/locked, off/locked. arch_prctl can freely change
> > > the state
> > > unless locked. ptrace can change it no matter what.  The lock is
> > > to
> > > prevent the existence of a gadget to disable CET (unless the
> > > gadget
> > > involves ptrace, but I don’t think that’s a real concern).
> > We have the arch_prctl now and only need to add ptrace lock/unlock.
> > 
> > Back to the dlopen() "relaxed" mode. Would the following work?
> > 
> > If the lib being loaded does not use setjmp/getcontext families
> > (the
> > loader knows?), then the loader leaves shstk on.  
> Will that actually work?  Are there libs that do something like
> longjmp without actually using the glibc longjmp routine?  What about
> compilers that statically match a throw to a catch and try to return
> through several frames at once?
> 

The compiler throw/catch is already handled similarly to how longjmp is
handled.

To summarize the dlopen() situation,

----
(1) We don't want to fall back like the following.  One reason is
that turning off SHSTK for threads is tricky.

if ((dlopen() a legacy library) && (cet_policy==relaxed)) {
	/*
	 * We don't care if the library will actually fault;
	 * just turn off CET protection now.
	 */
	Turn off CET;
}

(2) We cannot predict what version of a library will be dlopen'ed, and
cannot turn off CET reliably from the beginning of an application.
----

Can we mandate a signal handler (to turn off CET) when
((dlopen is used) && (cet_policy==relaxed))?

> > 
> > Otherwise, if the
> > system-wide setting is "relaxed", the loader turns off shstk and
> > issues
> > a warning.  In addition, if (dlopen == relaxed), then cet is not
> > locked
> > at any time.
> > 
> > The system-wide setting (somewhere in /etc?) can be:
> > 
> >    dlopen=force|relaxed /* controls dlopen of non-cet libs */
> >    exec=force|relaxed /* controls exec of non-cet apps */
> > 
> > 
> Why do we need a whole new mechanism here?  Can’t all this use
> regular glibc tunables?

Ok, got it.

Yu-cheng


* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (10 preceding siblings ...)
  2018-06-12 10:56 ` [PATCH 00/10] Control Flow Enforcement - Part (3) Balbir Singh
@ 2018-06-26  2:46 ` Jann Horn
  2018-06-26 14:56   ` Yu-cheng Yu
  2018-06-26  5:26 ` Andy Lutomirski
  12 siblings, 1 reply; 98+ messages in thread
From: Jann Horn @ 2018-06-26  2:46 UTC (permalink / raw)
  To: yu-cheng.yu
  Cc: kernel list, linux-doc, Linux-MM, linux-arch,
	the arch/x86 maintainers, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, hjl.tools, vedvyas.shanbhogue, ravi.v.shankar,
	Dave Hansen, Andy Lutomirski, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, Mike Kravetz

On Tue, Jun 26, 2018 at 4:45 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> This series introduces CET - Shadow stack
>
> At the high level, shadow stack is:
>
>         Allocated from a task's address space with vm_flags VM_SHSTK;
>         Its PTEs must be read-only and dirty;
>         Fixed sized, but the default size can be changed by sys admin.
>
> For a forked child, the shadow stack is duplicated when the next
> shadow stack access takes place.
>
> For a pthread child, a new shadow stack is allocated.
>
> The signal handler uses the same shadow stack as the main program.
>
> Yu-cheng Yu (10):
>   x86/cet: User-mode shadow stack support
>   x86/cet: Introduce WRUSS instruction
>   x86/cet: Signal handling for shadow stack
>   x86/cet: Handle thread shadow stack
>   x86/cet: ELF header parsing of Control Flow Enforcement
>   x86/cet: Add arch_prctl functions for shadow stack
>   mm: Prevent mprotect from changing shadow stack
>   mm: Prevent mremap of shadow stack
>   mm: Prevent madvise from changing shadow stack
>   mm: Prevent munmap and remap_file_pages of shadow stack

Shouldn't patches like these be CC'ed to linux-api@vger.kernel.org?


* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
                   ` (11 preceding siblings ...)
  2018-06-26  2:46 ` Jann Horn
@ 2018-06-26  5:26 ` Andy Lutomirski
  2018-06-26 14:56   ` Yu-cheng Yu
  12 siblings, 1 reply; 98+ messages in thread
From: Andy Lutomirski @ 2018-06-26  5:26 UTC (permalink / raw)
  To: Yu-cheng Yu, Linux API, Jann Horn, Florian Weimer
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> This series introduces CET - Shadow stack

I think you should add some mitigation against sigreturn-oriented
programming.  How about creating some special token on the shadow
stack that indicates the presence of a signal frame at a particular
address when delivering a signal and verifying and popping that token
in sigreturn?  The token could be literally the address of the signal
frame, and you could make this unambiguous by failing sigreturn if CET
is on and the signal frame is in executable memory.

IOW, it would be a shame if sigreturn() itself became a convenient
CET-bypassing gadget.
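
A rough sketch of the token idea, as a userspace toy model rather than a patch. struct shstk and the functions below are made-up names, and frame_is_exec stands in for a real "is this address in executable memory" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHSTK_DEPTH 64	/* arbitrary size for the model */

/* Toy shadow stack: holds return addresses and signal-frame tokens. */
struct shstk {
	uint64_t slot[SHSTK_DEPTH];
	size_t top;	/* number of used slots */
};

static int shstk_push(struct shstk *ss, uint64_t val)
{
	assert(ss);
	if (ss->top == SHSTK_DEPTH)
		return -1;
	ss->slot[ss->top++] = val;
	return 0;
}

/* Signal delivery: the token is literally the signal frame's address. */
static int deliver_signal(struct shstk *ss, uint64_t frame_addr)
{
	return shstk_push(ss, frame_addr);
}

/* sigreturn: accept the frame only if the top token names it, then pop. */
static int do_sigreturn(struct shstk *ss, uint64_t frame_addr, bool frame_is_exec)
{
	assert(ss);
	if (frame_is_exec)
		return -1;	/* token could be confused with a return address */
	if (ss->top == 0 || ss->slot[ss->top - 1] != frame_addr)
		return -1;	/* no matching token: forged or replayed frame */
	ss->top--;
	return 0;
}
```

Since every return address on the shadow stack points into executable memory, rejecting executable frame addresses means a token can never match an ordinary return address, and a sigreturn with no preceding signal delivery fails.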

--Andy


* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-26  2:46 ` Jann Horn
@ 2018-06-26 14:56   ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-26 14:56 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, linux-doc, Linux-MM, linux-arch,
	the arch/x86 maintainers, H . Peter Anvin, Thomas Gleixner,
	Ingo Molnar, hjl.tools, vedvyas.shanbhogue, ravi.v.shankar,
	Dave Hansen, Andy Lutomirski, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, Mike Kravetz

On Tue, 2018-06-26 at 04:46 +0200, Jann Horn wrote:
> On Tue, Jun 26, 2018 at 4:45 AM Yu-cheng Yu <yu-cheng.yu@intel.com>
> wrote:
> > 
> > 
> > This series introduces CET - Shadow stack
> > 
> > At the high level, shadow stack is:
> > 
> >         Allocated from a task's address space with vm_flags
> > VM_SHSTK;
> >         Its PTEs must be read-only and dirty;
> >         Fixed sized, but the default size can be changed by sys
> > admin.
> > 
> > For a forked child, the shadow stack is duplicated when the next
> > shadow stack access takes place.
> > 
> > For a pthread child, a new shadow stack is allocated.
> > 
> > The signal handler uses the same shadow stack as the main program.
> > 
> > Yu-cheng Yu (10):
> >   x86/cet: User-mode shadow stack support
> >   x86/cet: Introduce WRUSS instruction
> >   x86/cet: Signal handling for shadow stack
> >   x86/cet: Handle thread shadow stack
> >   x86/cet: ELF header parsing of Control Flow Enforcement
> >   x86/cet: Add arch_prctl functions for shadow stack
> >   mm: Prevent mprotect from changing shadow stack
> >   mm: Prevent mremap of shadow stack
> >   mm: Prevent madvise from changing shadow stack
> >   mm: Prevent munmap and remap_file_pages of shadow stack
> Shouldn't patches like these be CC'ed to linux-api@vger.kernel.org?

Yes, I will do that.

Thanks,
Yu-cheng


* Re: [PATCH 00/10] Control Flow Enforcement - Part (3)
  2018-06-26  5:26 ` Andy Lutomirski
@ 2018-06-26 14:56   ` Yu-cheng Yu
  0 siblings, 0 replies; 98+ messages in thread
From: Yu-cheng Yu @ 2018-06-26 14:56 UTC (permalink / raw)
  To: Andy Lutomirski, Linux API, Jann Horn, Florian Weimer
  Cc: LKML, linux-doc, Linux-MM, linux-arch, X86 ML, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz

On Mon, 2018-06-25 at 22:26 -0700, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com>
> wrote:
> > 
> > 
> > This series introduces CET - Shadow stack
> I think you should add some mitigation against sigreturn-oriented
> programming.  How about creating some special token on the shadow
> stack that indicates the presence of a signal frame at a particular
> address when delivering a signal and verifying and popping that token
> in sigreturn?  The token could be literally the address of the signal
> frame, and you could make this unambiguous by failing sigreturn if
> CET
> is on and the signal frame is in executable memory.
> 
> IOW, it would be a shame if sigreturn() itself became a convenient
> CET-bypassing gadget.
> 
> --Andy

I will look into that.

Thanks,
Yu-cheng
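
[Editor's note, not part of the original message] The token scheme Andy
suggests above can be modeled in user space roughly as follows. This is
only a toy sketch under stated assumptions, not code from the patch
series: `toy_shstk`, `toy_push`, `toy_deliver_signal`, and `toy_sigreturn`
are hypothetical names, and the shadow stack is modeled as a plain array.
A real implementation would have the kernel write the token to the user
shadow stack and would also need the executable-memory check Andy
mentions, so that a token cannot be confused with an ordinary return
address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHSTK_DEPTH 16

/* Toy model of a per-task shadow stack; hypothetical, not a kernel structure. */
struct toy_shstk {
	uint64_t slot[SHSTK_DEPTH];
	size_t top;		/* number of entries in use */
};

/* Push one entry; fails when the fixed-size stack is full. */
static inline bool toy_push(struct toy_shstk *s, uint64_t v)
{
	if (s->top == SHSTK_DEPTH)
		return false;
	s->slot[s->top++] = v;
	return true;
}

/* Signal delivery: record the signal frame's address as the token. */
static inline bool toy_deliver_signal(struct toy_shstk *s, const void *sigframe)
{
	return toy_push(s, (uint64_t)(uintptr_t)sigframe);
}

/*
 * sigreturn: succeed only if the top entry is the token for this exact
 * frame, and pop it so the same frame cannot be replayed.
 */
static inline bool toy_sigreturn(struct toy_shstk *s, const void *sigframe)
{
	assert(s->top <= SHSTK_DEPTH);
	if (s->top == 0)
		return false;
	if (s->slot[s->top - 1] != (uint64_t)(uintptr_t)sigframe)
		return false;
	s->top--;
	return true;
}
```

In this model, a sigreturn with a forged frame address fails because the
top of the shadow stack does not match, and a second sigreturn with the
genuine frame fails as well because the token was already popped.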



end of thread, other threads:[~2018-06-26 15:00 UTC | newest]

Thread overview: 98+ messages
2018-06-07 14:37 [PATCH 00/10] Control Flow Enforcement - Part (3) Yu-cheng Yu
2018-06-07 14:37 ` [PATCH 01/10] x86/cet: User-mode shadow stack support Yu-cheng Yu
2018-06-07 16:37   ` Andy Lutomirski
2018-06-07 17:46     ` Yu-cheng Yu
2018-06-07 17:55       ` Dave Hansen
2018-06-07 18:23       ` Andy Lutomirski
2018-06-12 11:56   ` Balbir Singh
2018-06-12 15:03     ` Yu-cheng Yu
2018-06-07 14:37 ` [PATCH 02/10] x86/cet: Introduce WRUSS instruction Yu-cheng Yu
2018-06-07 16:40   ` Andy Lutomirski
2018-06-07 16:51     ` Yu-cheng Yu
2018-06-07 18:41     ` Peter Zijlstra
2018-06-07 20:31       ` Yu-cheng Yu
2018-06-11  8:17     ` Peter Zijlstra
2018-06-11 15:02       ` Yu-cheng Yu
2018-06-14  1:30   ` Balbir Singh
2018-06-14 14:43     ` Yu-cheng Yu
2018-06-07 14:38 ` [PATCH 03/10] x86/cet: Signal handling for shadow stack Yu-cheng Yu
2018-06-07 18:30   ` Andy Lutomirski
2018-06-07 18:58     ` Florian Weimer
2018-06-07 19:51       ` Yu-cheng Yu
2018-06-07 20:07     ` Cyrill Gorcunov
2018-06-07 20:57       ` Andy Lutomirski
2018-06-08 12:07         ` Cyrill Gorcunov
2018-06-07 20:12     ` Yu-cheng Yu
2018-06-07 20:17       ` Dave Hansen
2018-06-07 14:38 ` [PATCH 04/10] x86/cet: Handle thread " Yu-cheng Yu
2018-06-07 18:21   ` Andy Lutomirski
2018-06-07 19:47     ` Florian Weimer
2018-06-07 20:53       ` Andy Lutomirski
2018-06-08 14:53         ` Florian Weimer
2018-06-08 15:01           ` Andy Lutomirski
2018-06-08 15:50             ` Yu-cheng Yu
2018-06-07 14:38 ` [PATCH 05/10] x86/cet: ELF header parsing of Control Flow Enforcement Yu-cheng Yu
2018-06-07 18:38   ` Andy Lutomirski
2018-06-07 20:40     ` Yu-cheng Yu
2018-06-07 14:38 ` [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack Yu-cheng Yu
2018-06-07 18:48   ` Andy Lutomirski
2018-06-07 20:30     ` Yu-cheng Yu
2018-06-07 21:01       ` Andy Lutomirski
2018-06-07 22:02         ` H.J. Lu
2018-06-07 23:01           ` Andy Lutomirski
2018-06-08  4:09             ` H.J. Lu
2018-06-08  4:38               ` Andy Lutomirski
2018-06-08 12:24                 ` H.J. Lu
2018-06-08 14:57                   ` Andy Lutomirski
2018-06-08 15:52                     ` Cyrill Gorcunov
2018-06-08  4:22           ` H.J. Lu
2018-06-08  4:35             ` Andy Lutomirski
2018-06-08 12:17               ` H.J. Lu
2018-06-12 10:03           ` Thomas Gleixner
2018-06-12 11:43             ` H.J. Lu
2018-06-12 16:01               ` Andy Lutomirski
2018-06-12 16:05                 ` H.J. Lu
2018-06-12 16:34                   ` Andy Lutomirski
2018-06-12 16:51                     ` H.J. Lu
2018-06-12 18:59                       ` Thomas Gleixner
2018-06-12 19:34                         ` H.J. Lu
2018-06-18 22:03                           ` Andy Lutomirski
2018-06-19  0:52                             ` Kees Cook
2018-06-19  6:40                               ` Florian Weimer
2018-06-19 14:50                               ` Andy Lutomirski
2018-06-19 16:44                                 ` Kees Cook
2018-06-19 16:59                                   ` Yu-cheng Yu
2018-06-19 17:07                                     ` Kees Cook
2018-06-19 17:20                                       ` Andy Lutomirski
2018-06-19 20:12                                         ` Kees Cook
2018-06-19 20:47                                           ` Andy Lutomirski
2018-06-19 22:38                                             ` Yu-cheng Yu
2018-06-20  0:50                                               ` Andy Lutomirski
2018-06-21 23:07                                                 ` Yu-cheng Yu
2018-06-07 14:38 ` [PATCH 07/10] mm: Prevent mprotect from changing " Yu-cheng Yu
2018-06-07 14:38 ` [PATCH 08/10] mm: Prevent mremap of " Yu-cheng Yu
2018-06-07 18:48   ` Andy Lutomirski
2018-06-07 20:18     ` Yu-cheng Yu
2018-06-07 14:38 ` [PATCH 09/10] mm: Prevent madvise from changing " Yu-cheng Yu
2018-06-07 20:54   ` Andy Lutomirski
2018-06-07 21:09   ` Nadav Amit
2018-06-07 21:18     ` Yu-cheng Yu
2018-06-07 14:38 ` [PATCH 10/10] mm: Prevent munmap and remap_file_pages of " Yu-cheng Yu
2018-06-07 18:50   ` Andy Lutomirski
2018-06-07 20:15     ` Yu-cheng Yu
2018-06-12 10:56 ` [PATCH 00/10] Control Flow Enforcement - Part (3) Balbir Singh
2018-06-12 15:03   ` Yu-cheng Yu
2018-06-12 16:00     ` Andy Lutomirski
2018-06-12 16:21       ` Yu-cheng Yu
2018-06-12 16:31         ` Andy Lutomirski
2018-06-12 17:24           ` Yu-cheng Yu
2018-06-12 20:15             ` Yu-cheng Yu
2018-06-14  1:07     ` Balbir Singh
2018-06-14 14:56       ` Yu-cheng Yu
2018-06-17  3:16         ` Balbir Singh
2018-06-18 21:44           ` Andy Lutomirski
2018-06-19  8:52             ` Balbir Singh
2018-06-26  2:46 ` Jann Horn
2018-06-26 14:56   ` Yu-cheng Yu
2018-06-26  5:26 ` Andy Lutomirski
2018-06-26 14:56   ` Yu-cheng Yu
