LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 00/17] x86, mpx updates for 4.1 (take 2)
@ 2015-03-26 18:33 Dave Hansen
  2015-03-26 18:33 ` [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer Dave Hansen
                   ` (16 more replies)
  0 siblings, 17 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen

Hi x86 maintainers,

It appeared that my attempt to send this earlier this week
failed to reach at least some of the intended recipients.
I updated a few of the patch descriptions and am resending.

There are 4 basic things going on here:
1. Make FPU/xsave code preempt safe and work properly
2. Add trace points to make kernel and app debugging easier
3. Add a boot-time disable for mpx
4. Support 32-bit binaries to run on 64-bit kernels

I've hesitated sending this in the past few weeks as
the FPU code had a lot of churn and affected our
ability to test on current kernels.  It seems to have
settled down and appears more stable now.

These are available against 4.0-rc4 in git as well:

  git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx.git mpx-v17

Dave Hansen (17):
  x86, fpu: wrap get_xsave_addr() to make it safer
  x86, mpx: use new tsk_get_xsave_addr()
  x86, mpx: trace #BR exceptions
  x86, mpx: trace entry to bounds exception paths
  x86, mpx: trace when MPX is zapping pages
  x86, mpx: trace attempts to find bounds tables
  x86, mpx: trace allocation of new bounds tables
  x86, mpx: boot-time disable
  x86: make is_64bit_mm() widely available
  x86: make __VIRTUAL_MASK safe to use on 32 bit
  x86, mpx: we do not allocate the bounds directory
  x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK
  x86, mpx: Add temporary variable to reduce masking
  x86, mpx: new directory entry to addr helper
  x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  x86, mpx: support 32-bit binaries on 64-bit kernel
  x86, mpx: allow mixed binaries again

 Documentation/kernel-parameters.txt |   4 +
 arch/x86/include/asm/mmu_context.h  |  13 ++
 arch/x86/include/asm/mpx.h          |  74 +++++-----
 arch/x86/include/asm/page_types.h   |   8 ++
 arch/x86/include/asm/trace/mpx.h    | 133 ++++++++++++++++++
 arch/x86/include/asm/xsave.h        |   1 +
 arch/x86/kernel/cpu/common.c        |  16 +++
 arch/x86/kernel/traps.c             |  12 +-
 arch/x86/kernel/uprobes.c           |  10 +-
 arch/x86/kernel/xsave.c             |  32 +++++
 arch/x86/mm/mpx.c                   | 270 +++++++++++++++++++++++++++++-------
 11 files changed, 469 insertions(+), 104 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/mpx.h

-- 
1.9.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 15:15   ` Borislav Petkov
  2015-03-27 18:57   ` Oleg Nesterov
  2015-03-26 18:33 ` [PATCH 02/17] x86, mpx: use new tsk_get_xsave_addr() Dave Hansen
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, Dave Hansen, dave.hansen, oleg, bp, riel, sbsiddha,
	luto, mingo, hpa, fenghua.yu


From: Dave Hansen <dave.hansen@linux.intel.com>

The MPX code appears to be saving off the FPU in an unsafe
way.   It does not disable preemption or ensure that the
FPU state has been allocated.

This patch introduces a new helper which will do both of
those things internally.

Note that this requires a patch from Oleg in order to work
properly.  It is currently in tip/x86/fpu.

> commit f893959b0898bd876673adbeb6798bdf25c034d7
> Author: Oleg Nesterov <oleg@redhat.com>
> Date:   Fri Mar 13 18:30:30 2015 +0100
>
>    x86/fpu: Don't abuse drop_init_fpu() in flush_thread()

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: bp@alien8.de
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: the arch/x86 maintainers <x86@kernel.org>
---

 b/arch/x86/include/asm/xsave.h |    1 +
 b/arch/x86/kernel/xsave.c      |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff -puN arch/x86/include/asm/xsave.h~tsk_get_xsave_addr arch/x86/include/asm/xsave.h
--- a/arch/x86/include/asm/xsave.h~tsk_get_xsave_addr	2015-03-26 11:27:04.738204327 -0700
+++ b/arch/x86/include/asm/xsave.h	2015-03-26 11:27:04.743204552 -0700
@@ -252,6 +252,7 @@ static inline int xrestore_user(struct x
 }
 
 void *get_xsave_addr(struct xsave_struct *xsave, int xstate);
+void *tsk_get_xsave_field(struct task_struct *tsk, int xstate_field);
 void setup_xstate_comp(void);
 
 #endif
diff -puN arch/x86/kernel/xsave.c~tsk_get_xsave_addr arch/x86/kernel/xsave.c
--- a/arch/x86/kernel/xsave.c~tsk_get_xsave_addr	2015-03-26 11:27:04.740204417 -0700
+++ b/arch/x86/kernel/xsave.c	2015-03-26 11:27:04.744204597 -0700
@@ -740,3 +740,35 @@ void *get_xsave_addr(struct xsave_struct
 	return (void *)xsave + xstate_comp_offsets[feature];
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
+
+/*
+ * This wraps up the common operations that need to occur when retrieving
+ * data from an xsave struct.  It first ensures that the task was actually
+ * using the FPU and retrieves the data in to a buffer.  It then calculates
+ * the offset of the requested field in the buffer.
+ *
+ * This function is safe to call whether the FPU is in use or not.
+ *
+ * Inputs:
+ *	tsk: the task from which we are fetching xsave state
+ *	xstate: state which is defined in xsave.h (e.g. XSTATE_FP, XSTATE_SSE,
+ *	etc.)
+ * Output:
+ *	address of the state in the xsave area.
+ */
+void *tsk_get_xsave_field(struct task_struct *tsk, int xsave_field)
+{
+	union thread_xstate *xstate;
+
+	if (!used_math())
+		return NULL;
+	/*
+	 * unlazy_fpu() is poorly named and will actually
+	 * save the xstate off in to the memory buffer.
+	 */
+	unlazy_fpu(tsk);
+	xstate = tsk->thread.fpu.state;
+
+	return get_xsave_addr(&xstate->xsave, xsave_field);
+}
+EXPORT_SYMBOL_GPL(tsk_get_xsave_field);
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 02/17] x86, mpx: use new tsk_get_xsave_addr()
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
  2015-03-26 18:33 ` [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 18:33 ` [PATCH 03/17] x86, mpx: trace #BR exceptions Dave Hansen
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, Dave Hansen, dave.hansen, oleg, bp, riel, sbsiddha,
	luto, mingo, hpa, fenghua.yu


From: Dave Hansen <dave.hansen@linux.intel.com>

The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly
accessible via normal instructions.  They essentially act as
if they were floating point registers and are saved/restored
along with those registers.

There are two main paths in the MPX code where we care about
the contents of these registers:
	1. #BR (bounds) faults
	2. the prctl() code where we are setting MPX up

Both of those paths _might_ be called without the FPU having
been used.  That means that 'tsk->thread.fpu.state' might
never be allocated.

Also, fpu_save_init() is not preempt-safe.  It was a bug to
call it without disabling preemption.  The new
tsk_get_xsave_addr() calls unlazy_fpu() instead and properly
disables preemption.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: bp@alien8.de
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
---

 b/arch/x86/include/asm/mpx.h |    8 ++++----
 b/arch/x86/kernel/traps.c    |   10 ++++------
 b/arch/x86/mm/mpx.c          |   21 ++++++++++-----------
 3 files changed, 18 insertions(+), 21 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~use-new-tsk_get_xsave_addr arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~use-new-tsk_get_xsave_addr	2015-03-26 11:27:05.108221015 -0700
+++ b/arch/x86/include/asm/mpx.h	2015-03-26 11:27:05.114221286 -0700
@@ -60,8 +60,8 @@
 
 #ifdef CONFIG_X86_INTEL_MPX
 siginfo_t *mpx_generate_siginfo(struct pt_regs *regs,
-				struct xsave_struct *xsave_buf);
-int mpx_handle_bd_fault(struct xsave_struct *xsave_buf);
+				struct task_struct *tsk);
+int mpx_handle_bd_fault(struct task_struct *tsk);
 static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
 {
 	return (mm->bd_addr != MPX_INVALID_BOUNDS_DIR);
@@ -78,11 +78,11 @@ void mpx_notify_unmap(struct mm_struct *
 		      unsigned long start, unsigned long end);
 #else
 static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs,
-					      struct xsave_struct *xsave_buf)
+					      struct task_struct *tsk)
 {
 	return NULL;
 }
-static inline int mpx_handle_bd_fault(struct xsave_struct *xsave_buf)
+static inline int mpx_handle_bd_fault(struct task_struct *tsk)
 {
 	return -EINVAL;
 }
diff -puN arch/x86/kernel/traps.c~use-new-tsk_get_xsave_addr arch/x86/kernel/traps.c
--- a/arch/x86/kernel/traps.c~use-new-tsk_get_xsave_addr	2015-03-26 11:27:05.109221060 -0700
+++ b/arch/x86/kernel/traps.c	2015-03-26 11:27:05.115221331 -0700
@@ -61,6 +61,7 @@
 #include <asm/mach_traps.h>
 #include <asm/alternative.h>
 #include <asm/mpx.h>
+#include <asm/xsave.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -373,7 +374,6 @@ dotraplinkage void do_double_fault(struc
 dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 {
 	struct task_struct *tsk = current;
-	struct xsave_struct *xsave_buf;
 	enum ctx_state prev_state;
 	struct bndcsr *bndcsr;
 	siginfo_t *info;
@@ -397,9 +397,7 @@ dotraplinkage void do_bounds(struct pt_r
 	 * It is not directly accessible, though, so we need to
 	 * do an xsave and then pull it out of the xsave buffer.
 	 */
-	fpu_save_init(&tsk->thread.fpu);
-	xsave_buf = &(tsk->thread.fpu.state->xsave);
-	bndcsr = get_xsave_addr(xsave_buf, XSTATE_BNDCSR);
+	bndcsr = tsk_get_xsave_field(tsk, XSTATE_BNDCSR);
 	if (!bndcsr)
 		goto exit_trap;
 
@@ -410,11 +408,11 @@ dotraplinkage void do_bounds(struct pt_r
 	 */
 	switch (bndcsr->bndstatus & MPX_BNDSTA_ERROR_CODE) {
 	case 2:	/* Bound directory has invalid entry. */
-		if (mpx_handle_bd_fault(xsave_buf))
+		if (mpx_handle_bd_fault(tsk))
 			goto exit_trap;
 		break; /* Success, it was handled */
 	case 1: /* Bound violation. */
-		info = mpx_generate_siginfo(regs, xsave_buf);
+		info = mpx_generate_siginfo(regs, tsk);
 		if (IS_ERR(info)) {
 			/*
 			 * We failed to decode the MPX instruction.  Act as if
diff -puN arch/x86/mm/mpx.c~use-new-tsk_get_xsave_addr arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~use-new-tsk_get_xsave_addr	2015-03-26 11:27:05.111221150 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:05.115221331 -0700
@@ -273,7 +273,7 @@ bad_opcode:
  * The caller is expected to kfree() the returned siginfo_t.
  */
 siginfo_t *mpx_generate_siginfo(struct pt_regs *regs,
-				struct xsave_struct *xsave_buf)
+				struct task_struct *tsk)
 {
 	struct bndreg *bndregs, *bndreg;
 	siginfo_t *info = NULL;
@@ -296,7 +296,7 @@ siginfo_t *mpx_generate_siginfo(struct p
 		goto err_out;
 	}
 	/* get the bndregs _area_ of the xsave structure */
-	bndregs = get_xsave_addr(xsave_buf, XSTATE_BNDREGS);
+	bndregs = tsk_get_xsave_field(tsk, XSTATE_BNDREGS);
 	if (!bndregs) {
 		err = -EINVAL;
 		goto err_out;
@@ -358,8 +358,7 @@ static __user void *task_get_bounds_dir(
 	 * The bounds directory pointer is stored in a register
 	 * only accessible if we first do an xsave.
 	 */
-	fpu_save_init(&tsk->thread.fpu);
-	bndcsr = get_xsave_addr(&tsk->thread.fpu.state->xsave, XSTATE_BNDCSR);
+	bndcsr = tsk_get_xsave_field(tsk, XSTATE_BNDCSR);
 	if (!bndcsr)
 		return MPX_INVALID_BOUNDS_DIR;
 
@@ -390,9 +389,9 @@ int mpx_enable_management(struct task_st
 	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
 	 * XRSTOR instruction.
 	 *
-	 * fpu_xsave() is expected to be very expensive. Storing the bounds
-	 * directory here means that we do not have to do xsave in the unmap
-	 * path; we can just use mm->bd_addr instead.
+	 * xsaves are expected to be very expensive.  Storing the bounds
+	 * directory here means that we do not have to do xsave in the
+	 * unmap path; we can just use mm->bd_addr instead.
 	 */
 	bd_base = task_get_bounds_dir(tsk);
 	down_write(&mm->mmap_sem);
@@ -498,12 +497,12 @@ out_unmap:
  * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
  * and the size of each bound table is 4MB.
  */
-static int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+static int do_mpx_bt_fault(struct task_struct *tsk)
 {
 	unsigned long bd_entry, bd_base;
 	struct bndcsr *bndcsr;
 
-	bndcsr = get_xsave_addr(xsave_buf, XSTATE_BNDCSR);
+	bndcsr = tsk_get_xsave_field(tsk, XSTATE_BNDCSR);
 	if (!bndcsr)
 		return -EINVAL;
 	/*
@@ -526,7 +525,7 @@ static int do_mpx_bt_fault(struct xsave_
 	return allocate_bt((long __user *)bd_entry);
 }
 
-int mpx_handle_bd_fault(struct xsave_struct *xsave_buf)
+int mpx_handle_bd_fault(struct task_struct *tsk)
 {
 	/*
 	 * Userspace never asked us to manage the bounds tables,
@@ -535,7 +534,7 @@ int mpx_handle_bd_fault(struct xsave_str
 	if (!kernel_managing_mpx_tables(current->mm))
 		return -EINVAL;
 
-	if (do_mpx_bt_fault(xsave_buf)) {
+	if (do_mpx_bt_fault(tsk)) {
 		force_sig(SIGSEGV, current);
 		/*
 		 * The force_sig() is essentially "handling" this
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 03/17] x86, mpx: trace #BR exceptions
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
  2015-03-26 18:33 ` [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer Dave Hansen
  2015-03-26 18:33 ` [PATCH 02/17] x86, mpx: use new tsk_get_xsave_addr() Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 10:21   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 04/17] x86, mpx: trace entry to bounds exception paths Dave Hansen
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

This is the first in a series of MPX tracing patches.
I've found these extremely useful in the process of
debugging applications and the kernel code itself.

This exception hooks in to the bounds (#BR) exception
very early and allows capturing the key registers which
would influence how the exception is handled.

Note that bndcfgu/bndstatus are technically still
64-bit registers even in 32-bit mode.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/trace/mpx.h |   48 +++++++++++++++++++++++++++++++++++++
 b/arch/x86/kernel/traps.c          |    2 +
 b/arch/x86/mm/mpx.c                |    3 ++
 3 files changed, 53 insertions(+)

diff -puN /dev/null arch/x86/include/asm/trace/mpx.h
--- /dev/null	2014-10-10 16:10:57.316716958 -0700
+++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:05.444236170 -0700
@@ -0,0 +1,48 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mpx
+
+#if !defined(_TRACE_MPX_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MPX_H
+
+#include <linux/tracepoint.h>
+
+#ifdef CONFIG_X86_INTEL_MPX
+
+TRACE_EVENT(bounds_exception_mpx,
+
+	TP_PROTO(struct bndcsr *bndcsr),
+	TP_ARGS(bndcsr),
+
+	TP_STRUCT__entry(
+		__field(u64, bndcfgu)
+		__field(u64, bndstatus)
+	),
+
+	TP_fast_assign(
+		__entry->bndcfgu   = bndcsr->bndcfgu;
+		__entry->bndstatus = bndcsr->bndstatus;
+	),
+
+	TP_printk("bndcfgu:0x%llx bndstatus:0x%llx",
+		__entry->bndcfgu,
+		__entry->bndstatus)
+);
+
+#else
+
+/*
+ * This gets used outside of MPX-specific code, so we need a stub.
+ */
+static inline void trace_bounds_exception_mpx(struct bndcsr *bndcsr)
+{
+}
+
+#endif /* CONFIG_X86_INTEL_MPX */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH asm/trace/
+#define TRACE_INCLUDE_FILE mpx
+#endif /* _TRACE_MPX_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff -puN arch/x86/kernel/traps.c~x86-br-exception-trace arch/x86/kernel/traps.c
--- a/arch/x86/kernel/traps.c~x86-br-exception-trace	2015-03-26 11:27:05.439235945 -0700
+++ b/arch/x86/kernel/traps.c	2015-03-26 11:27:05.445236215 -0700
@@ -61,6 +61,7 @@
 #include <asm/mach_traps.h>
 #include <asm/alternative.h>
 #include <asm/mpx.h>
+#include <asm/trace/mpx.h>
 #include <asm/xsave.h>
 
 #ifdef CONFIG_X86_64
@@ -401,6 +402,7 @@ dotraplinkage void do_bounds(struct pt_r
 	if (!bndcsr)
 		goto exit_trap;
 
+	trace_bounds_exception_mpx(bndcsr);
 	/*
 	 * The error code field of the BNDSTATUS register communicates status
 	 * information of a bound range exception #BR or operation involving
diff -puN arch/x86/mm/mpx.c~x86-br-exception-trace arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~x86-br-exception-trace	2015-03-26 11:27:05.441236035 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:05.445236215 -0700
@@ -18,6 +18,9 @@
 #include <asm/processor.h>
 #include <asm/fpu-internal.h>
 
+#define CREATE_TRACE_POINTS
+#include <asm/trace/mpx.h>
+
 static const char *mpx_mapping_name(struct vm_area_struct *vma)
 {
 	return "[mpx]";
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 04/17] x86, mpx: trace entry to bounds exception paths
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (2 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 03/17] x86, mpx: trace #BR exceptions Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 12:02   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 05/17] x86, mpx: trace when MPX is zapping pages Dave Hansen
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

There are two basic things that can happen as the result of
a bounds exception (#BR):

	1. We allocate a new bounds table
	2. We pass up a bounds exception to userspace.

This patch adds a trace points for the case where we are
passing the exception up to userspace with a signal.

We are also explicit that we're printing out the inverse of
the 'upper' that we encounter.  If you want to filter, for
instance, you need to ~ the value first.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/trace/mpx.h |   25 +++++++++++++++++++++++++
 b/arch/x86/mm/mpx.c                |    2 ++
 2 files changed, 27 insertions(+)

diff -puN arch/x86/include/asm/trace/mpx.h~x86-mpx-trace-1 arch/x86/include/asm/trace/mpx.h
--- a/arch/x86/include/asm/trace/mpx.h~x86-mpx-trace-1	2015-03-26 11:27:05.871255430 -0700
+++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:05.876255655 -0700
@@ -8,6 +8,31 @@
 
 #ifdef CONFIG_X86_INTEL_MPX
 
+TRACE_EVENT(mpx_bounds_register_exception,
+
+	TP_PROTO(void *addr_referenced,
+		 struct bndreg *bndreg),
+	TP_ARGS(addr_referenced, bndreg),
+
+	TP_STRUCT__entry(
+		__field(void *, addr_referenced)
+		__field(u64, lower_bound)
+		__field(u64, upper_bound)
+	),
+
+	TP_fast_assign(
+		__entry->addr_referenced = addr_referenced;
+		__entry->lower_bound = bndreg->lower_bound;
+		__entry->upper_bound = bndreg->upper_bound;
+	),
+
+	TP_printk("address referenced: 0x%p bounds: lower: 0x%llx ~upper: 0x%llx",
+		__entry->addr_referenced,
+		__entry->lower_bound,
+		~__entry->upper_bound
+	)
+);
+
 TRACE_EVENT(bounds_exception_mpx,
 
 	TP_PROTO(struct bndcsr *bndcsr),
diff -puN arch/x86/mm/mpx.c~x86-mpx-trace-1 arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~x86-mpx-trace-1	2015-03-26 11:27:05.873255520 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:05.876255655 -0700
@@ -16,6 +16,7 @@
 #include <asm/mmu_context.h>
 #include <asm/mpx.h>
 #include <asm/processor.h>
+#include <asm/trace/mpx.h>
 #include <asm/fpu-internal.h>
 
 #define CREATE_TRACE_POINTS
@@ -337,6 +338,7 @@ siginfo_t *mpx_generate_siginfo(struct p
 		err = -EINVAL;
 		goto err_out;
 	}
+	trace_mpx_bounds_register_exception(info->si_addr, bndreg);
 	return info;
 err_out:
 	/* info might be NULL, but kfree() handles that */
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 05/17] x86, mpx: trace when MPX is zapping pages
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (3 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 04/17] x86, mpx: trace entry to bounds exception paths Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 12:26   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 06/17] x86, mpx: trace attempts to find bounds tables Dave Hansen
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

When MPX can not free an entire bounds table, it will instead
try to zap unused parts of a bounds table to free the backing
memory.  This decreases Rss (resident set) without decreasing
the virtual space allocated for bounds tables.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/trace/mpx.h |   22 ++++++++++++++++++++++
 b/arch/x86/mm/mpx.c                |    1 +
 2 files changed, 23 insertions(+)

diff -puN arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_zap arch/x86/include/asm/trace/mpx.h
--- a/arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_zap	2015-03-26 11:27:06.279273832 -0700
+++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:06.283274012 -0700
@@ -53,6 +53,28 @@ TRACE_EVENT(bounds_exception_mpx,
 		__entry->bndstatus)
 );
 
+TRACE_EVENT(mpx_unmap_zap,
+
+	TP_PROTO(unsigned long start,
+		 unsigned long end),
+	TP_ARGS(start, end),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+		__entry->end   = end;
+	),
+
+	TP_printk("0x%p -> 0x%p",
+		(void *)__entry->start,
+		(void *)__entry->end
+	)
+);
+
 #else
 
 /*
diff -puN arch/x86/mm/mpx.c~mpx-trace_unmap_zap arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mpx-trace_unmap_zap	2015-03-26 11:27:06.280273877 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:06.284274058 -0700
@@ -670,6 +670,7 @@ static int zap_bt_entries(struct mm_stru
 
 		len = min(vma->vm_end, end) - addr;
 		zap_page_range(vma, addr, len, NULL);
+		trace_mpx_unmap_zap(addr, addr+len);
 
 		vma = vma->vm_next;
 		addr = vma->vm_start;
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 06/17] x86, mpx: trace attempts to find bounds tables
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (4 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 05/17] x86, mpx: trace when MPX is zapping pages Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 12:32   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 07/17] x86, mpx: trace allocation of new " Dave Hansen
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

This event traces any time we go looking to unmap a bounds table
for a given virtual address range.  This is useful to ensure
that the kernel actually "tried" to free a bounds table versus
times it succeeded.

It might try and fail if it realized that a table was shared
with an adjacent VMA which is not being unmapped.


Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/trace/mpx.h |   22 ++++++++++++++++++++++
 b/arch/x86/mm/mpx.c                |    1 +
 2 files changed, 23 insertions(+)

diff -puN arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_search arch/x86/include/asm/trace/mpx.h
--- a/arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_search	2015-03-26 11:27:06.684292099 -0700
+++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:06.689292324 -0700
@@ -75,6 +75,28 @@ TRACE_EVENT(mpx_unmap_zap,
 	)
 );
 
+TRACE_EVENT(mpx_unmap_search,
+
+	TP_PROTO(unsigned long start,
+		 unsigned long end),
+	TP_ARGS(start, end),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+		__entry->end   = end;
+	),
+
+	TP_printk("0x%p -> 0x%p",
+		(void *)__entry->start,
+		(void *)__entry->end
+	)
+);
+
 #else
 
 /*
diff -puN arch/x86/mm/mpx.c~mpx-trace_unmap_search arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mpx-trace_unmap_search	2015-03-26 11:27:06.686292189 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:06.689292324 -0700
@@ -843,6 +843,7 @@ static int mpx_unmap_tables(struct mm_st
 	long __user *bd_entry, *bde_start, *bde_end;
 	unsigned long bt_addr;
 
+	trace_mpx_unmap_search(start, end);
 	/*
 	 * "Edge" bounds tables are those which are being used by the region
 	 * (start -> end), but that may be shared with adjacent areas.  If they
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 07/17] x86, mpx: trace allocation of new bounds tables
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (5 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 06/17] x86, mpx: trace attempts to find bounds tables Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 18:33 ` [PATCH 08/17] x86, mpx: boot-time disable Dave Hansen
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Bounds tables are a significant consumer of memory.  It is important
to know when they are being allocated.  Add a trace point to trace
whenever an allocation occurs and also its virtual address.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/trace/mpx.h |   16 ++++++++++++++++
 b/arch/x86/mm/mpx.c                |    1 +
 2 files changed, 17 insertions(+)

diff -puN arch/x86/include/asm/trace/mpx.h~trace_mpx_new_bounds_table arch/x86/include/asm/trace/mpx.h
--- a/arch/x86/include/asm/trace/mpx.h~trace_mpx_new_bounds_table	2015-03-26 11:27:07.090310411 -0700
+++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:07.095310637 -0700
@@ -97,6 +97,22 @@ TRACE_EVENT(mpx_unmap_search,
 	)
 );
 
+TRACE_EVENT(mpx_new_bounds_table,
+
+	TP_PROTO(unsigned long table_vaddr),
+	TP_ARGS(table_vaddr),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, table_vaddr)
+	),
+
+	TP_fast_assign(
+		__entry->table_vaddr = table_vaddr;
+	),
+
+	TP_printk("table vaddr:%p", (void *)__entry->table_vaddr)
+);
+
 #else
 
 /*
diff -puN arch/x86/mm/mpx.c~trace_mpx_new_bounds_table arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~trace_mpx_new_bounds_table	2015-03-26 11:27:07.092310501 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:07.096310682 -0700
@@ -485,6 +485,7 @@ static int allocate_bt(long __user *bd_e
 		ret = -EINVAL;
 		goto out_unmap;
 	}
+	trace_mpx_new_bounds_table(bt_addr);
 	return 0;
 out_unmap:
 	vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 08/17] x86, mpx: boot-time disable
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (6 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 07/17] x86, mpx: trace allocation of new " Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 15:07   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 09/17] x86: make is_64bit_mm() widely available Dave Hansen
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

MPX has the _potential_ to cause some issues.  Say part of your init
system tried to protect one of its components from buffer overflows
with MPX.  If there were a false positive, it's possible that MPX
could keep a system from booting.

MPX could also potentially cause performance issues since it is
present in hot paths like the unmap path.

Allow it to be disabled at boot time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/Documentation/kernel-parameters.txt |    4 ++++
 b/arch/x86/kernel/cpu/common.c        |   16 ++++++++++++++++
 2 files changed, 20 insertions(+)

diff -puN arch/x86/kernel/cpu/common.c~x86-mpx-disable-boot-time arch/x86/kernel/cpu/common.c
--- a/arch/x86/kernel/cpu/common.c~x86-mpx-disable-boot-time	2015-03-26 11:27:31.269400935 -0700
+++ b/arch/x86/kernel/cpu/common.c	2015-03-26 11:27:31.275401206 -0700
@@ -172,6 +172,22 @@ static int __init x86_xsaves_setup(char
 }
 __setup("noxsaves", x86_xsaves_setup);
 
+static int __init x86_mpx_setup(char *s)
+{
+	/* require an exact match without trailing characters */
+	if (strlen(s))
+		return 0;
+
+	/* do not emit a message if the feature is not present */
+	if (!boot_cpu_has(X86_FEATURE_MPX))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_MPX);
+	printk("nompx: Intel Memory Protection Extensions (MPX) disabled\n");
+	return 1;
+}
+__setup("nompx", x86_mpx_setup);
+
 #ifdef CONFIG_X86_32
 static int cachesize_override = -1;
 static int disable_x86_serial_nr = 1;
diff -puN Documentation/kernel-parameters.txt~x86-mpx-disable-boot-time Documentation/kernel-parameters.txt
--- a/Documentation/kernel-parameters.txt~x86-mpx-disable-boot-time	2015-03-26 11:27:31.272401070 -0700
+++ b/Documentation/kernel-parameters.txt	2015-03-26 11:27:31.277401296 -0700
@@ -2340,6 +2340,10 @@ bytes respectively. Such letter suffixes
 			parameter, xsave area per process might occupy more
 			memory on xsaves enabled systems.
 
+	nompx		[X86] Disables Intel Memory Protection Extensions.
+			See Documentation/x86/intel_mpx.txt for more
+			information about the feature.
+
 	eagerfpu=	[X86]
 			on	enable eager fpu restore
 			off	disable eager fpu restore
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 09/17] x86: make is_64bit_mm() widely available
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (7 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 08/17] x86, mpx: boot-time disable Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 22:35   ` Andy Lutomirski
  2015-03-27 15:21   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 10/17] x86: make __VIRTUAL_MASK safe to use on 32 bit Dave Hansen
                   ` (7 subsequent siblings)
  16 siblings, 2 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen, oleg


From: Dave Hansen <dave.hansen@linux.intel.com>

The uprobes code has a nice helper, is_64bit_mm(), that consults both
the runtime and compile-time flags for 32-bit support.  Instead of
reinventing the wheel, pull it in to an x86 header so we can use it
for MPX.

I prefer passing the mm around to test_thread_flag(TIF_IA32) beacuse
it makes it explicit where the context is coming from.  It will also
make it easier if we ever need to access the MPX structures from
another process.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---

 b/arch/x86/include/asm/mmu_context.h |   13 +++++++++++++
 b/arch/x86/kernel/uprobes.c          |   10 +---------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff -puN arch/x86/include/asm/mmu_context.h~x86-make-is_64bit_mm-available arch/x86/include/asm/mmu_context.h
--- a/arch/x86/include/asm/mmu_context.h~x86-make-is_64bit_mm-available	2015-03-26 11:27:31.685419698 -0700
+++ b/arch/x86/include/asm/mmu_context.h	2015-03-26 11:27:31.690419923 -0700
@@ -142,6 +142,19 @@ static inline void arch_exit_mmap(struct
 	paravirt_arch_exit_mmap(mm);
 }
 
+#ifdef CONFIG_X86_64
+static inline bool is_64bit_mm(struct mm_struct *mm)
+{
+	return	!config_enabled(CONFIG_IA32_EMULATION) ||
+		!(mm->context.ia32_compat == TIF_IA32);
+}
+#else
+static inline bool is_64bit_mm(struct mm_struct *mm)
+{
+	return false;
+}
+#endif
+
 static inline void arch_bprm_mm_init(struct mm_struct *mm,
 		struct vm_area_struct *vma)
 {
diff -puN arch/x86/kernel/uprobes.c~x86-make-is_64bit_mm-available arch/x86/kernel/uprobes.c
--- a/arch/x86/kernel/uprobes.c~x86-make-is_64bit_mm-available	2015-03-26 11:27:31.687419788 -0700
+++ b/arch/x86/kernel/uprobes.c	2015-03-26 11:27:31.690419923 -0700
@@ -29,6 +29,7 @@
 #include <linux/kdebug.h>
 #include <asm/processor.h>
 #include <asm/insn.h>
+#include <asm/mmu_context.h>
 
 /* Post-execution fixups. */
 
@@ -312,11 +313,6 @@ static int uprobe_init_insn(struct arch_
 }
 
 #ifdef CONFIG_X86_64
-static inline bool is_64bit_mm(struct mm_struct *mm)
-{
-	return	!config_enabled(CONFIG_IA32_EMULATION) ||
-		!(mm->context.ia32_compat == TIF_IA32);
-}
 /*
  * If arch_uprobe->insn doesn't use rip-relative addressing, return
  * immediately.  Otherwise, rewrite the instruction so that it accesses
@@ -497,10 +493,6 @@ static void riprel_post_xol(struct arch_
 	}
 }
 #else /* 32-bit: */
-static inline bool is_64bit_mm(struct mm_struct *mm)
-{
-	return false;
-}
 /*
  * No RIP-relative addressing on 32-bit
  */
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 10/17] x86: make __VIRTUAL_MASK safe to use on 32 bit
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (8 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 09/17] x86: make is_64bit_mm() widely available Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 18:33 ` [PATCH 11/17] x86, mpx: we do not allocate the bounds directory Dave Hansen
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

We are going to do some calculations in a moment that are based on the
size of the virtual address space.  __VIRTUAL_MASK is currently unsafe
to use on 32-bit since it overflows an unsigned long with its shift.

The current version will emit a warning if used at all on 32-bit
kernels.

Add a version which is safe on 32-bit and consequently does not spit
out a warning.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/page_types.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff -puN arch/x86/include/asm/page_types.h~x86-make-__VIRTUAL_MASK-safe-to-use-on-32-bit arch/x86/include/asm/page_types.h
--- a/arch/x86/include/asm/page_types.h~x86-make-__VIRTUAL_MASK-safe-to-use-on-32-bit	2015-03-26 11:27:32.079437469 -0700
+++ b/arch/x86/include/asm/page_types.h	2015-03-26 11:27:32.082437604 -0700
@@ -10,7 +10,15 @@
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 
 #define __PHYSICAL_MASK		((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#ifdef CONFIG_X86_64
+/*
+ * This version doesn't work on 32-bit because __VIRTUAL_MASK_SHIFT=32
+ * overflows the unsigned long we're trying to shift.
+ */
 #define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
+#else
+#define __VIRTUAL_MASK		(~0UL)
+#endif
 
 /* Cast PAGE_MASK to a signed type so that it is sign-extended if
    virtual addresses are 32-bits but physical addresses are larger
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 11/17] x86, mpx: we do not allocate the bounds directory
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (9 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 10/17] x86: make __VIRTUAL_MASK safe to use on 32 bit Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 18:33 ` [PATCH 12/17] x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK Dave Hansen
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

The comment and code here are confusing.  We do not currently
allocate the bounds directory in the kernel.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/mm/mpx.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86/mm/mpx.c~x86-mpx-we-do-not-allocate-the-bounds-directory arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~x86-mpx-we-do-not-allocate-the-bounds-directory	2015-03-26 11:27:32.434453481 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:32.437453616 -0700
@@ -51,8 +51,8 @@ static unsigned long mpx_mmap(unsigned l
 	vm_flags_t vm_flags;
 	struct vm_area_struct *vma;
 
-	/* Only bounds table and bounds directory can be allocated here */
-	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+	/* Only bounds table can be allocated here */
+	if (len != MPX_BT_SIZE_BYTES)
 		return -EINVAL;
 
 	down_write(&mm->mmap_sem);
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 12/17] x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (10 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 11/17] x86, mpx: we do not allocate the bounds directory Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 17:01   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 13/17] x86, mpx: Add temporary variable to reduce masking Dave Hansen
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, qiaowei.ren, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

From: Qiaowei Ren <qiaowei.ren@intel.com>

MPX_BNDCFG_ADDR_MASK is defined two times, so this patch removes
redundant one.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/mpx.h |    1 -
 1 file changed, 1 deletion(-)

diff -puN arch/x86/include/asm/mpx.h~0001-x86-mpx-remove-redundant-MPX_BNDCFG_ADDR_MASK arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~0001-x86-mpx-remove-redundant-MPX_BNDCFG_ADDR_MASK	2015-03-26 11:27:32.798469899 -0700
+++ b/arch/x86/include/asm/mpx.h	2015-03-26 11:27:32.802470079 -0700
@@ -45,7 +45,6 @@
 #define MPX_BNDSTA_TAIL		2
 #define MPX_BNDCFG_TAIL		12
 #define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
-#define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
 #define MPX_BT_ADDR_MASK	(~((1UL<<MPX_BD_ENTRY_TAIL)-1))
 
 #define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 13/17] x86, mpx: Add temporary variable to reduce masking
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (11 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 12/17] x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 18:33 ` [PATCH 14/17] x86, mpx: new directory entry to addr helper Dave Hansen
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

When we allocate a bounds table, we call mmap(), then add a
"valid" bit to the value before storing it in to the bounds
directory.

If we fail along the way, we go and mask that valid bit
_back_ out.  That seems a little silly, and this makes it
much more clear when we have a plain address versus an
actual table _entry_.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/mm/mpx.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff -puN arch/x86/mm/mpx.c~mpx-remove-unnecessary-masking arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mpx-remove-unnecessary-masking	2015-03-26 11:27:33.173486813 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:33.176486948 -0700
@@ -431,6 +431,7 @@ static int allocate_bt(long __user *bd_e
 	unsigned long expected_old_val = 0;
 	unsigned long actual_old_val = 0;
 	unsigned long bt_addr;
+	unsigned long bd_new_entry;
 	int ret = 0;
 
 	/*
@@ -443,7 +444,7 @@ static int allocate_bt(long __user *bd_e
 	/*
 	 * Set the valid flag (kinda like _PAGE_PRESENT in a pte)
 	 */
-	bt_addr = bt_addr | MPX_BD_ENTRY_VALID_FLAG;
+	bd_new_entry = bt_addr | MPX_BD_ENTRY_VALID_FLAG;
 
 	/*
 	 * Go poke the address of the new bounds table in to the
@@ -457,7 +458,7 @@ static int allocate_bt(long __user *bd_e
 	 * of the MPX code that have to pagefault_disable().
 	 */
 	ret = user_atomic_cmpxchg_inatomic(&actual_old_val, bd_entry,
-					   expected_old_val, bt_addr);
+					   expected_old_val, bd_new_entry);
 	if (ret)
 		goto out_unmap;
 
@@ -488,7 +489,7 @@ static int allocate_bt(long __user *bd_e
 	trace_mpx_new_bounds_table(bt_addr);
 	return 0;
 out_unmap:
-	vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+	vm_munmap(bt_addr, MPX_BT_SIZE_BYTES);
 	return ret;
 }
 
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 14/17] x86, mpx: new directory entry to addr helper
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (12 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 13/17] x86, mpx: Add temporary variable to reduce masking Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 18:33 ` [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps Dave Hansen
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Currently, to get from a bounds directory entry to the virtual
address of a bounds table, we simply mask off a few low bits.
However, the set of bits we mask off is different for 32 and
64-bit binaries.

This breaks the operation out in to a helper function and also
adds a temporary variable to store the result until we are
sure we are returning one.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/mpx.h |    1 -
 b/arch/x86/mm/mpx.c          |   41 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 34 insertions(+), 8 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mpx-new-entry-to-addr-helper arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mpx-new-entry-to-addr-helper	2015-03-26 11:27:33.538503276 -0700
+++ b/arch/x86/include/asm/mpx.h	2015-03-26 11:27:33.543503501 -0700
@@ -45,7 +45,6 @@
 #define MPX_BNDSTA_TAIL		2
 #define MPX_BNDCFG_TAIL		12
 #define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
-#define MPX_BT_ADDR_MASK	(~((1UL<<MPX_BD_ENTRY_TAIL)-1))
 
 #define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
 #define MPX_BNDSTA_ERROR_CODE	0x3
diff -puN arch/x86/mm/mpx.c~mpx-new-entry-to-addr-helper arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mpx-new-entry-to-addr-helper	2015-03-26 11:27:33.540503366 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:33.544503546 -0700
@@ -578,29 +578,55 @@ static int mpx_resolve_fault(long __user
 	return 0;
 }
 
+static unsigned long mpx_bd_entry_to_bt_addr(struct mm_struct *mm,
+		unsigned long bd_entry)
+{
+	unsigned long bt_addr = bd_entry;
+	int align_to_bytes;
+	/*
+	 * Bit 0 in a bt_entry is always the valid bit.
+	 */
+	bt_addr &= ~MPX_BD_ENTRY_VALID_FLAG;
+	/*
+	 * Tables are naturally aligned at 8-byte boundaries
+	 * on 64-bit and 4-byte boundaries on 32-bit.  The
+	 * documentation makes it appear that the low bits
+	 * are ignored by the hardware, so we do the same.
+	 */
+	if (is_64bit_mm(mm))
+		align_to_bytes = 8;
+	else
+		align_to_bytes = 4;
+	bt_addr &= ~(align_to_bytes-1);
+	return bt_addr;
+}
+
 /*
  * Get the base of bounds tables pointed by specific bounds
  * directory entry.
  */
 static int get_bt_addr(struct mm_struct *mm,
-			long __user *bd_entry, unsigned long *bt_addr)
+			long __user *bd_entry_ptr,
+			unsigned long *bt_addr_result)
 {
 	int ret;
 	int valid_bit;
+	unsigned long bd_entry;
+	unsigned long bt_addr;
 
-	if (!access_ok(VERIFY_READ, (bd_entry), sizeof(*bd_entry)))
+	if (!access_ok(VERIFY_READ, (bd_entry_ptr), sizeof(*bd_entry_ptr)))
 		return -EFAULT;
 
 	while (1) {
 		int need_write = 0;
 
 		pagefault_disable();
-		ret = get_user(*bt_addr, bd_entry);
+		ret = get_user(bd_entry, bd_entry_ptr);
 		pagefault_enable();
 		if (!ret)
 			break;
 		if (ret == -EFAULT)
-			ret = mpx_resolve_fault(bd_entry, need_write);
+			ret = mpx_resolve_fault(bd_entry_ptr, need_write);
 		/*
 		 * If we could not resolve the fault, consider it
 		 * userspace's fault and error out.
@@ -609,8 +635,8 @@ static int get_bt_addr(struct mm_struct
 			return ret;
 	}
 
-	valid_bit = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
-	*bt_addr &= MPX_BT_ADDR_MASK;
+	valid_bit = bd_entry & MPX_BD_ENTRY_VALID_FLAG;
+	bt_addr = mpx_bd_entry_to_bt_addr(mm, bd_entry);
 
 	/*
 	 * When the kernel is managing bounds tables, a bounds directory
@@ -619,7 +645,7 @@ static int get_bt_addr(struct mm_struct
 	 * data in the address field, we know something is wrong. This
 	 * -EINVAL return will cause a SIGSEGV.
 	 */
-	if (!valid_bit && *bt_addr)
+	if (!valid_bit && bt_addr)
 		return -EINVAL;
 	/*
 	 * Do we have an completely zeroed bt entry?  That is OK.  It
@@ -630,6 +656,7 @@ static int get_bt_addr(struct mm_struct
 	if (!valid_bit)
 		return -ENOENT;
 
+	*bt_addr_result = bt_addr;
 	return 0;
 }
 
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (13 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 14/17] x86, mpx: new directory entry to addr helper Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-27 17:29   ` Borislav Petkov
  2015-03-26 18:33 ` [PATCH 16/17] x86, mpx: support 32-bit binaries on 64-bit kernel Dave Hansen
  2015-03-26 18:33 ` [PATCH 17/17] x86, mpx: allow mixed binaries again Dave Hansen
  16 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

user_atomic_cmpxchg_inatomic() actually looks at sizeof(*ptr) to
figure out how many bytes to copy.  If we run it on a 64-bit
kernel with a 64-bit pointer, it will copy a 64-bit bounds
directory entry.  That's fine, except when we have 32-bit
programs with 32-bit bounds directory entries and we only *want*
32-bits.

This patch breaks the cmpxchg operation out in to its own
function and performs the 32-bit type swizzling in there.

Note, the "64-bit" version of this code _would_ work on a
32-bit-only kernel.  The issue this patch addresses is only for
when the kernel's 'long' is mismatched from the size of the
bounds directory entry of the process we are working on.

The new helper modifies 'actual_old_val' or returns an error.
But gcc doesn't know this, so it warns about 'actual_old_val'
being unused.  Shut it up with an uninitialized_var().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/mm/mpx.c |   41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff -puN arch/x86/mm/mpx.c~mpx-variable-sized-userspace-pokes arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mpx-variable-sized-userspace-pokes	2015-03-26 11:27:33.927520821 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:33.931521001 -0700
@@ -421,6 +421,35 @@ int mpx_disable_management(struct task_s
 	return 0;
 }
 
+static int mpx_cmpxchg_bd_entry(struct mm_struct *mm,
+		unsigned long *actual_old_val_ptr, long __user *bd_entry_addr,
+		unsigned long expected_old_val, unsigned long new_bd_entry)
+{
+	int ret;
+	/*
+	 * user_atomic_cmpxchg_inatomic() actually uses sizeof()
+	 * the pointer thatt we pass to it to figure out how much
+	 * data to cmpxchg.  We have to be careful here not to
+	 * pass a pointer to a 64-bit data type when we only want
+	 * a 32-bit copy.
+	 */
+	if (is_64bit_mm(mm)) {
+		ret = user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
+				bd_entry_addr, expected_old_val, new_bd_entry);
+	} else {
+		u32 uninitialized_var(actual_old_val_32);
+		u32 expected_old_val_32 = expected_old_val;
+		u32 new_bd_entry_32 = new_bd_entry;
+		u32 __user *bd_entry_32 = (u32 __user *)bd_entry_addr;
+		ret = user_atomic_cmpxchg_inatomic(&actual_old_val_32,
+				bd_entry_32, expected_old_val_32,
+				new_bd_entry_32);
+		if (!ret)
+			*actual_old_val_ptr = actual_old_val_32;
+	}
+	return ret;
+}
+
 /*
  * With 32-bit mode, MPX_BT_SIZE_BYTES is 4MB, and the size of each
  * bounds table is 16KB. With 64-bit mode, MPX_BT_SIZE_BYTES is 2GB,
@@ -428,6 +457,7 @@ int mpx_disable_management(struct task_s
  */
 static int allocate_bt(long __user *bd_entry)
 {
+	struct mm_struct *mm = current->mm;
 	unsigned long expected_old_val = 0;
 	unsigned long actual_old_val = 0;
 	unsigned long bt_addr;
@@ -457,8 +487,8 @@ static int allocate_bt(long __user *bd_e
 	 * mmap_sem at this point, unlike some of the other part
 	 * of the MPX code that have to pagefault_disable().
 	 */
-	ret = user_atomic_cmpxchg_inatomic(&actual_old_val, bd_entry,
-					   expected_old_val, bd_new_entry);
+	ret = mpx_cmpxchg_bd_entry(mm, &actual_old_val,	bd_entry,
+				   expected_old_val, bd_new_entry);
 	if (ret)
 		goto out_unmap;
 
@@ -712,15 +742,16 @@ static int unmap_single_bt(struct mm_str
 		long __user *bd_entry, unsigned long bt_addr)
 {
 	unsigned long expected_old_val = bt_addr | MPX_BD_ENTRY_VALID_FLAG;
-	unsigned long actual_old_val = 0;
+	unsigned long uninitialized_var(actual_old_val);
 	int ret;
 
 	while (1) {
 		int need_write = 1;
+		unsigned long cleared_bd_entry = 0;
 
 		pagefault_disable();
-		ret = user_atomic_cmpxchg_inatomic(&actual_old_val, bd_entry,
-						   expected_old_val, 0);
+		ret = mpx_cmpxchg_bd_entry(mm, &actual_old_val,
+				bd_entry, bt_addr, cleared_bd_entry);
 		pagefault_enable();
 		if (!ret)
 			break;
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 16/17] x86, mpx: support 32-bit binaries on 64-bit kernel
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (14 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  2015-03-26 18:33 ` [PATCH 17/17] x86, mpx: allow mixed binaries again Dave Hansen
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

Right now, the kernel can only switch between 64-bit and 32-bit
binaries at compile time. This patch adds support for 32-bit
binaries on 64-bit kernels when we support ia32 emulation.

We essentially choose which set of table sizes to use when doing
arithmetic for the bounds table calculations.

This also uses a different approach for calculating the table
indexes than before.  I think the new one makes it much more
clear what is going on, and allows us to share more code between
the 32 and 64-bit cases.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/include/asm/mpx.h |   68 +++++++++----------
 b/arch/x86/mm/mpx.c          |  150 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 163 insertions(+), 55 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~0002-x86-mpx-support-32bit-binaries-on-64bit-kernel arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~0002-x86-mpx-support-32bit-binaries-on-64bit-kernel	2015-03-26 11:27:34.300537645 -0700
+++ b/arch/x86/include/asm/mpx.h	2015-03-26 11:27:34.305537870 -0700
@@ -13,49 +13,49 @@
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
-#ifdef CONFIG_X86_64
-
-/* upper 28 bits [47:20] of the virtual address in 64-bit used to
- * index into bounds directory (BD).
+/*
+ * The upper 28 bits [47:20] of the virtual address in 64-bit
+ * are used to index into bounds directory (BD).
+ *
+ * The directory is 2G (2^31) in size, and with 8-byte entries
+ * it has 2^28 entries.
  */
-#define MPX_BD_ENTRY_OFFSET	28
-#define MPX_BD_ENTRY_SHIFT	3
-/* bits [19:3] of the virtual address in 64-bit used to index into
- * bounds table (BT).
+#define MPX_BD_SIZE_BYTES_64	(1UL<<31)
+/* An entry is a long, so 8 bytes and a shift of 3 */
+#define MPX_BD_ENTRY_BYTES_64	8
+#define MPX_BD_NR_ENTRIES_64	(MPX_BD_SIZE_BYTES_64/MPX_BD_ENTRY_BYTES_64)
+
+/*
+ * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
+ * entries it has 2^20 entries.
  */
-#define MPX_BT_ENTRY_OFFSET	17
-#define MPX_BT_ENTRY_SHIFT	5
-#define MPX_IGN_BITS		3
-#define MPX_BD_ENTRY_TAIL	3
-
-#else
-
-#define MPX_BD_ENTRY_OFFSET	20
-#define MPX_BD_ENTRY_SHIFT	2
-#define MPX_BT_ENTRY_OFFSET	10
-#define MPX_BT_ENTRY_SHIFT	4
-#define MPX_IGN_BITS		2
-#define MPX_BD_ENTRY_TAIL	2
-
-#endif
-
-#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
-#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+#define MPX_BD_SIZE_BYTES_32	(1UL<<22)
+/* An entry is a long, so 4 bytes and a shift of 2 */
+#define MPX_BD_ENTRY_BYTES_32	4
+#define MPX_BD_NR_ENTRIES_32	(MPX_BD_SIZE_BYTES_32/MPX_BD_ENTRY_BYTES_32)
+
+/*
+ * A 64-bit table is 4MB total in size, and an entry is
+ * 4 64-bit pointers in size.
+ */
+#define MPX_BT_SIZE_BYTES_64	(1UL<<22)
+#define MPX_BT_ENTRY_BYTES_64	32
+#define MPX_BT_NR_ENTRIES_64	(MPX_BD_SIZE_BYTES_64/MPX_BD_ENTRY_BYTES_64)
+
+/*
+ * A 32-bit table is 16kB total in size, and an entry is
+ * 4 32-bit pointers in size.
+ */
+#define MPX_BT_SIZE_BYTES_32	(1UL<<14)
+#define MPX_BT_ENTRY_BYTES_32	16
+#define MPX_BT_NR_ENTRIES_32	(MPX_BD_SIZE_BYTES_32/MPX_BD_ENTRY_BYTES_32)
 
 #define MPX_BNDSTA_TAIL		2
 #define MPX_BNDCFG_TAIL		12
 #define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
-
 #define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
 #define MPX_BNDSTA_ERROR_CODE	0x3
 
-#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
-#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
-#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
-		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
-#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
-		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
-
 #ifdef CONFIG_X86_INTEL_MPX
 siginfo_t *mpx_generate_siginfo(struct pt_regs *regs,
 				struct task_struct *tsk);
diff -puN arch/x86/mm/mpx.c~0002-x86-mpx-support-32bit-binaries-on-64bit-kernel arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~0002-x86-mpx-support-32bit-binaries-on-64bit-kernel	2015-03-26 11:27:34.302537735 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:34.306537916 -0700
@@ -36,6 +36,22 @@ static int is_mpx_vma(struct vm_area_str
 	return (vma->vm_ops == &mpx_vma_ops);
 }
 
+static inline unsigned long mpx_bd_size_bytes(struct mm_struct *mm)
+{
+	if (is_64bit_mm(mm))
+		return MPX_BD_SIZE_BYTES_64;
+	else
+		return MPX_BD_SIZE_BYTES_32;
+}
+
+static inline unsigned long mpx_bt_size_bytes(struct mm_struct *mm)
+{
+	if (is_64bit_mm(mm))
+		return MPX_BT_SIZE_BYTES_64;
+	else
+		return MPX_BT_SIZE_BYTES_32;
+}
+
 /*
  * This is really a simplified "vm_mmap". it only handles MPX
  * bounds tables (the bounds directory is user-allocated).
@@ -52,7 +68,7 @@ static unsigned long mpx_mmap(unsigned l
 	struct vm_area_struct *vma;
 
 	/* Only bounds table can be allocated here */
-	if (len != MPX_BT_SIZE_BYTES)
+	if (len != mpx_bt_size_bytes(mm))
 		return -EINVAL;
 
 	down_write(&mm->mmap_sem);
@@ -451,13 +467,12 @@ static int mpx_cmpxchg_bd_entry(struct m
 }
 
 /*
- * With 32-bit mode, MPX_BT_SIZE_BYTES is 4MB, and the size of each
- * bounds table is 16KB. With 64-bit mode, MPX_BT_SIZE_BYTES is 2GB,
+ * With 32-bit mode, a bounds directory is 4MB, and the size of each
+ * bounds table is 16KB. With 64-bit mode, a bounds directory is 2GB,
  * and the size of each bounds table is 4MB.
  */
-static int allocate_bt(long __user *bd_entry)
+static int allocate_bt(struct mm_struct *mm, long __user *bd_entry)
 {
-	struct mm_struct *mm = current->mm;
 	unsigned long expected_old_val = 0;
 	unsigned long actual_old_val = 0;
 	unsigned long bt_addr;
@@ -468,7 +483,7 @@ static int allocate_bt(long __user *bd_e
 	 * Carve the virtual space out of userspace for the new
 	 * bounds table:
 	 */
-	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
+	bt_addr = mpx_mmap(mpx_bt_size_bytes(mm));
 	if (IS_ERR((void *)bt_addr))
 		return PTR_ERR((void *)bt_addr);
 	/*
@@ -519,7 +534,7 @@ static int allocate_bt(long __user *bd_e
 	trace_mpx_new_bounds_table(bt_addr);
 	return 0;
 out_unmap:
-	vm_munmap(bt_addr, MPX_BT_SIZE_BYTES);
+	vm_munmap(bt_addr, mpx_bt_size_bytes(mm));
 	return ret;
 }
 
@@ -538,6 +553,7 @@ static int do_mpx_bt_fault(struct task_s
 {
 	unsigned long bd_entry, bd_base;
 	struct bndcsr *bndcsr;
+	struct mm_struct *mm = current->mm;
 
 	bndcsr = tsk_get_xsave_field(tsk, XSTATE_BNDCSR);
 	if (!bndcsr)
@@ -556,10 +572,10 @@ static int do_mpx_bt_fault(struct task_s
 	 * the directory is.
 	 */
 	if ((bd_entry < bd_base) ||
-	    (bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
+	    (bd_entry >= bd_base + mpx_bd_size_bytes(mm)))
 		return -EINVAL;
 
-	return allocate_bt((long __user *)bd_entry);
+	return allocate_bt(mm, (long __user *)bd_entry);
 }
 
 int mpx_handle_bd_fault(struct task_struct *tsk)
@@ -791,7 +807,95 @@ static int unmap_single_bt(struct mm_str
 	 * avoid recursion, do_munmap() will check whether it comes
 	 * from one bounds table through VM_MPX flag.
 	 */
-	return do_munmap(mm, bt_addr, MPX_BT_SIZE_BYTES);
+	return do_munmap(mm, bt_addr, mpx_bt_size_bytes(mm));
+}
+
+/*
+ * Take a virtual address and turns it in to the offset in bytes
+ * inside of the bounds table where the bounds table entry
+ * controlling 'addr' can be found.
+ */
+static unsigned long mpx_get_bt_entry_offset_bytes(struct mm_struct *mm,
+		unsigned long addr)
+{
+	unsigned long bt_entry_size_bytes;
+	unsigned long bt_table_nr_entries;
+	unsigned long offset = addr;
+
+	if (is_64bit_mm(mm)) {
+		/* Bottom 3 bits are ignored on 64-bit */
+		offset >>= 3;
+		bt_entry_size_bytes = MPX_BT_ENTRY_BYTES_64;
+		bt_table_nr_entries = MPX_BT_NR_ENTRIES_64;
+	} else {
+		/* Bottom 2 bits are ignored on 32-bit */
+		offset >>= 2;
+		bt_entry_size_bytes = MPX_BT_ENTRY_BYTES_32;
+		bt_table_nr_entries = MPX_BT_NR_ENTRIES_32;
+	}
+	/*
+	 * We know the size of the table in to which we are
+	 * indexing, and we have eliminated all the low bits
+	 * which are ignored for indexing.
+	 *
+	 * Mask out all the high bits which we do not need
+	 * to index in to the table.
+	 */
+	offset &= (bt_table_nr_entries-1);
+	/*
+	 * We now have an entry offset in terms of *entries* in
+	 * the table.  We need to scale it back up to bytes.
+	 */
+	offset *= bt_entry_size_bytes;
+	return offset;
+}
+
+static noinline unsigned long mpx_get_bd_entry_offset(struct mm_struct *mm,
+		unsigned long addr)
+{
+	/*
+	 * Total size of the process's virtual address space
+	 * Use a u64 because 4GB (for 32-bit) won't fit in a long.
+	 */
+	u64 vaddr_space_size;
+	/*
+	 * How much virtual address space does a single bounds
+	 * directory entry cover?
+	 */
+	unsigned long bd_entry_virt_space;
+
+	/*
+	 * There are several ways to derive the bd offsets.  We
+	 * use the following approach here:
+	 * 1. We know the size of the virtual address space
+	 * 2. We know the number of entries in a bounds table
+	 * 3. We know that each entry covers a fixed amount of
+	 *    virtual address space.
+	 * So, we can just divide the virtual address by the
+	 * number of entries to figure out which entry "controls"
+	 * the given virtual address.
+	 */
+	if (is_64bit_mm(mm)) {
+	      	vaddr_space_size = 1ULL << __VIRTUAL_MASK_SHIFT;
+		bd_entry_virt_space = vaddr_space_size / MPX_BD_NR_ENTRIES_64;
+		/*
+		 * __VIRTUAL_MASK takes the 64-bit addressing hole
+		 * in to accout.  This is a noop on 32-bit.
+		 */
+		addr &= __VIRTUAL_MASK;
+		return addr / bd_entry_virt_space;
+	} else {
+	      	vaddr_space_size = (1ULL << 32);
+		bd_entry_virt_space = vaddr_space_size / MPX_BD_NR_ENTRIES_32;
+		return addr / bd_entry_virt_space;
+	}
+	/*
+	 * The two return calls above are exact copies.  If we
+	 * pull out a single copy and put it in here, gcc won't
+	 * realize that we're doing a power-of-2 divide and use
+	 * shifts.  It uses a real divide.  If we put them up
+	 * there, it manages to figure it out (gcc 4.8.3).
+	 */
 }
 
 /*
@@ -805,6 +909,7 @@ static int unmap_shared_bt(struct mm_str
 		unsigned long end, bool prev_shared, bool next_shared)
 {
 	unsigned long bt_addr;
+	unsigned long start_off, end_off;
 	int ret;
 
 	ret = get_bt_addr(mm, bd_entry, &bt_addr);
@@ -816,17 +921,20 @@ static int unmap_shared_bt(struct mm_str
 	if (ret)
 		return ret;
 
+	start_off = mpx_get_bt_entry_offset_bytes(mm, start);
+	end_off   = mpx_get_bt_entry_offset_bytes(mm, end);
+
 	if (prev_shared && next_shared)
 		ret = zap_bt_entries(mm, bt_addr,
-				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
-				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
+				bt_addr+start_off,
+				bt_addr+end_off);
 	else if (prev_shared)
 		ret = zap_bt_entries(mm, bt_addr,
-				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
-				bt_addr+MPX_BT_SIZE_BYTES);
+				bt_addr + start_off,
+				bt_addr + mpx_bt_size_bytes(mm));
 	else if (next_shared)
 		ret = zap_bt_entries(mm, bt_addr, bt_addr,
-				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
+				bt_addr+end_off);
 	else
 		ret = unmap_single_bt(mm, bd_entry, bt_addr);
 
@@ -847,8 +955,8 @@ static int unmap_edge_bts(struct mm_stru
 	struct vm_area_struct *prev, *next;
 	bool prev_shared = false, next_shared = false;
 
-	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
-	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+	bde_start = mm->bd_addr + mpx_get_bd_entry_offset(mm, start);
+	bde_end   = mm->bd_addr + mpx_get_bd_entry_offset(mm, end-1);
 
 	/*
 	 * Check whether bde_start and bde_end are shared with adjacent
@@ -860,10 +968,10 @@ static int unmap_edge_bts(struct mm_stru
 	 * in to 'next'.
 	 */
 	next = find_vma_prev(mm, start, &prev);
-	if (prev && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1))
+	if (prev && (mm->bd_addr + mpx_get_bd_entry_offset(mm, prev->vm_end-1))
 			== bde_start)
 		prev_shared = true;
-	if (next && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(next->vm_start))
+	if (next && (mm->bd_addr + mpx_get_bd_entry_offset(mm, next->vm_start))
 			== bde_end)
 		next_shared = true;
 
@@ -929,8 +1037,8 @@ static int mpx_unmap_tables(struct mm_st
 	 *   1. fully covered
 	 *   2. not at the edges of the mapping, even if full aligned
 	 */
-	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
-	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+	bde_start = mm->bd_addr + mpx_get_bd_entry_offset(mm, start);
+	bde_end   = mm->bd_addr + mpx_get_bd_entry_offset(mm, end-1);
 	for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {
 		ret = get_bt_addr(mm, bd_entry, &bt_addr);
 		switch (ret) {
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 17/17] x86, mpx: allow mixed binaries again
  2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
                   ` (15 preceding siblings ...)
  2015-03-26 18:33 ` [PATCH 16/17] x86, mpx: support 32-bit binaries on 64-bit kernel Dave Hansen
@ 2015-03-26 18:33 ` Dave Hansen
  16 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-26 18:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, Dave Hansen, dave.hansen


From: Dave Hansen <dave.hansen@linux.intel.com>

We explicitly disable allowing 32-bit binaries to enable
MPX on 64-bit kernels.  Re-allow that.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---

 b/arch/x86/mm/mpx.c |    6 ------
 1 file changed, 6 deletions(-)

diff -puN arch/x86/mm/mpx.c~x86-mpx-allow-mixed-binaries-again arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~x86-mpx-allow-mixed-binaries-again	2015-03-26 11:27:34.699555641 -0700
+++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:34.703555822 -0700
@@ -370,12 +370,6 @@ static __user void *task_get_bounds_dir(
 		return MPX_INVALID_BOUNDS_DIR;
 
 	/*
-	 * 32-bit binaries on 64-bit kernels are currently
-	 * unsupported.
-	 */
-	if (IS_ENABLED(CONFIG_X86_64) && test_thread_flag(TIF_IA32))
-		return MPX_INVALID_BOUNDS_DIR;
-	/*
 	 * The bounds directory pointer is stored in a register
 	 * only accessible if we first do an xsave.
 	 */
_

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/17] x86: make is_64bit_mm() widely available
  2015-03-26 18:33 ` [PATCH 09/17] x86: make is_64bit_mm() widely available Dave Hansen
@ 2015-03-26 22:35   ` Andy Lutomirski
  2015-03-27 15:21   ` Borislav Petkov
  1 sibling, 0 replies; 38+ messages in thread
From: Andy Lutomirski @ 2015-03-26 22:35 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel; +Cc: x86, tglx, dave.hansen, oleg

On 03/26/2015 11:33 AM, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> The uprobes code has a nice helper, is_64bit_mm(), that consults both
> the runtime and compile-time flags for 32-bit support.  Instead of
> reinventing the wheel, pull it in to an x86 header so we can use it
> for MPX.
>
> I prefer passing the mm around to test_thread_flag(TIF_IA32) beacuse
> it makes it explicit where the context is coming from.  It will also
> make it easier if we ever need to access the MPX structures from
> another process.
>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>

I like this.  Maybe we can remove TIF_IA32 entirely at some point based 
on this :)

> ---
>
>   b/arch/x86/include/asm/mmu_context.h |   13 +++++++++++++
>   b/arch/x86/kernel/uprobes.c          |   10 +---------
>   2 files changed, 14 insertions(+), 9 deletions(-)
>
> diff -puN arch/x86/include/asm/mmu_context.h~x86-make-is_64bit_mm-available arch/x86/include/asm/mmu_context.h
> --- a/arch/x86/include/asm/mmu_context.h~x86-make-is_64bit_mm-available	2015-03-26 11:27:31.685419698 -0700
> +++ b/arch/x86/include/asm/mmu_context.h	2015-03-26 11:27:31.690419923 -0700
> @@ -142,6 +142,19 @@ static inline void arch_exit_mmap(struct
>   	paravirt_arch_exit_mmap(mm);
>   }
>
> +#ifdef CONFIG_X86_64
> +static inline bool is_64bit_mm(struct mm_struct *mm)
> +{
> +	return	!config_enabled(CONFIG_IA32_EMULATION) ||
> +		!(mm->context.ia32_compat == TIF_IA32);
> +}
> +#else
> +static inline bool is_64bit_mm(struct mm_struct *mm)
> +{
> +	return false;
> +}
> +#endif
> +
>   static inline void arch_bprm_mm_init(struct mm_struct *mm,
>   		struct vm_area_struct *vma)
>   {
> diff -puN arch/x86/kernel/uprobes.c~x86-make-is_64bit_mm-available arch/x86/kernel/uprobes.c
> --- a/arch/x86/kernel/uprobes.c~x86-make-is_64bit_mm-available	2015-03-26 11:27:31.687419788 -0700
> +++ b/arch/x86/kernel/uprobes.c	2015-03-26 11:27:31.690419923 -0700
> @@ -29,6 +29,7 @@
>   #include <linux/kdebug.h>
>   #include <asm/processor.h>
>   #include <asm/insn.h>
> +#include <asm/mmu_context.h>
>
>   /* Post-execution fixups. */
>
> @@ -312,11 +313,6 @@ static int uprobe_init_insn(struct arch_
>   }
>
>   #ifdef CONFIG_X86_64
> -static inline bool is_64bit_mm(struct mm_struct *mm)
> -{
> -	return	!config_enabled(CONFIG_IA32_EMULATION) ||
> -		!(mm->context.ia32_compat == TIF_IA32);
> -}
>   /*
>    * If arch_uprobe->insn doesn't use rip-relative addressing, return
>    * immediately.  Otherwise, rewrite the instruction so that it accesses
> @@ -497,10 +493,6 @@ static void riprel_post_xol(struct arch_
>   	}
>   }
>   #else /* 32-bit: */
> -static inline bool is_64bit_mm(struct mm_struct *mm)
> -{
> -	return false;
> -}
>   /*
>    * No RIP-relative addressing on 32-bit
>    */
> _
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/17] x86, mpx: trace #BR exceptions
  2015-03-26 18:33 ` [PATCH 03/17] x86, mpx: trace #BR exceptions Dave Hansen
@ 2015-03-27 10:21   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 10:21 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Dave Hansen, linux-kernel, x86, tglx, dave.hansen

Adding rostedt for the TPs.

Steve, please take a look at the rest of the patchset too, there are
more tracepoints being added.

On Thu, Mar 26, 2015 at 11:33:36AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> This is the first in a series of MPX tracing patches.
> I've found these extremely useful in the process of
> debugging applications and the kernel code itself.
> 
> This exception hooks in to the bounds (#BR) exception
> very early and allows capturing the key registers which
> would influence how the exception is handled.
> 
> Note that bndcfgu/bndstatus are technically still
> 64-bit registers even in 32-bit mode.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  b/arch/x86/include/asm/trace/mpx.h |   48 +++++++++++++++++++++++++++++++++++++
>  b/arch/x86/kernel/traps.c          |    2 +
>  b/arch/x86/mm/mpx.c                |    3 ++
>  3 files changed, 53 insertions(+)
> 
> diff -puN /dev/null arch/x86/include/asm/trace/mpx.h
> --- /dev/null	2014-10-10 16:10:57.316716958 -0700
> +++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:05.444236170 -0700
> @@ -0,0 +1,48 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM mpx
> +
> +#if !defined(_TRACE_MPX_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_MPX_H
> +
> +#include <linux/tracepoint.h>
> +
> +#ifdef CONFIG_X86_INTEL_MPX
> +
> +TRACE_EVENT(bounds_exception_mpx,
> +
> +	TP_PROTO(struct bndcsr *bndcsr),
> +	TP_ARGS(bndcsr),
> +
> +	TP_STRUCT__entry(
> +		__field(u64, bndcfgu)
> +		__field(u64, bndstatus)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->bndcfgu   = bndcsr->bndcfgu;
> +		__entry->bndstatus = bndcsr->bndstatus;
> +	),
> +
> +	TP_printk("bndcfgu:0x%llx bndstatus:0x%llx",
> +		__entry->bndcfgu,
> +		__entry->bndstatus)
> +);
> +
> +#else
> +
> +/*
> + * This gets used outside of MPX-specific code, so we need a stub.
> + */
> +static inline void trace_bounds_exception_mpx(struct bndcsr *bndcsr)
> +{
> +}
> +
> +#endif /* CONFIG_X86_INTEL_MPX */
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH asm/trace/
> +#define TRACE_INCLUDE_FILE mpx
> +#endif /* _TRACE_MPX_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff -puN arch/x86/kernel/traps.c~x86-br-exception-trace arch/x86/kernel/traps.c
> --- a/arch/x86/kernel/traps.c~x86-br-exception-trace	2015-03-26 11:27:05.439235945 -0700
> +++ b/arch/x86/kernel/traps.c	2015-03-26 11:27:05.445236215 -0700
> @@ -61,6 +61,7 @@
>  #include <asm/mach_traps.h>
>  #include <asm/alternative.h>
>  #include <asm/mpx.h>
> +#include <asm/trace/mpx.h>
>  #include <asm/xsave.h>
>  
>  #ifdef CONFIG_X86_64
> @@ -401,6 +402,7 @@ dotraplinkage void do_bounds(struct pt_r
>  	if (!bndcsr)
>  		goto exit_trap;
>  
> +	trace_bounds_exception_mpx(bndcsr);
>  	/*
>  	 * The error code field of the BNDSTATUS register communicates status
>  	 * information of a bound range exception #BR or operation involving
> diff -puN arch/x86/mm/mpx.c~x86-br-exception-trace arch/x86/mm/mpx.c
> --- a/arch/x86/mm/mpx.c~x86-br-exception-trace	2015-03-26 11:27:05.441236035 -0700
> +++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:05.445236215 -0700
> @@ -18,6 +18,9 @@
>  #include <asm/processor.h>
>  #include <asm/fpu-internal.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <asm/trace/mpx.h>
> +
>  static const char *mpx_mapping_name(struct vm_area_struct *vma)
>  {
>  	return "[mpx]";
> _
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/17] x86, mpx: trace entry to bounds exception paths
  2015-03-26 18:33 ` [PATCH 04/17] x86, mpx: trace entry to bounds exception paths Dave Hansen
@ 2015-03-27 12:02   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 12:02 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen

On Thu, Mar 26, 2015 at 11:33:38AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> There are two basic things that can happen as the result of
> a bounds exception (#BR):
> 
> 	1. We allocate a new bounds table
> 	2. We pass up a bounds exception to userspace.
> 
> This patch adds a trace points for the case where we are

                  a tracepoint

> passing the exception up to userspace with a signal.
> 
> We are also explicit that we're printing out the inverse of
> the 'upper' that we encounter.  If you want to filter, for
> instance, you need to ~ the value first.

			negate
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  b/arch/x86/include/asm/trace/mpx.h |   25 +++++++++++++++++++++++++
>  b/arch/x86/mm/mpx.c                |    2 ++
>  2 files changed, 27 insertions(+)

...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/17] x86, mpx: trace when MPX is zapping pages
  2015-03-26 18:33 ` [PATCH 05/17] x86, mpx: trace when MPX is zapping pages Dave Hansen
@ 2015-03-27 12:26   ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 12:26 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen

On Thu, Mar 26, 2015 at 11:33:39AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> When MPX can not free an entire bounds table, it will instead
> try to zap unused parts of a bounds table to free the backing
> memory.  This decreases Rss (resident set) without decreasing

			  RSS (Resident Set Size)

> the virtual space allocated for bounds tables.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  b/arch/x86/include/asm/trace/mpx.h |   22 ++++++++++++++++++++++
>  b/arch/x86/mm/mpx.c                |    1 +
>  2 files changed, 23 insertions(+)
> 
> diff -puN arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_zap arch/x86/include/asm/trace/mpx.h
> --- a/arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_zap	2015-03-26 11:27:06.279273832 -0700
> +++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:06.283274012 -0700
> @@ -53,6 +53,28 @@ TRACE_EVENT(bounds_exception_mpx,
>  		__entry->bndstatus)
>  );
>  
> +TRACE_EVENT(mpx_unmap_zap,
> +
> +	TP_PROTO(unsigned long start,
> +		 unsigned long end),
> +	TP_ARGS(start, end),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, start)
> +		__field(unsigned long, end)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->start = start;
> +		__entry->end   = end;
> +	),
> +
> +	TP_printk("0x%p -> 0x%p",

Maybe as an interval:

	[0x%p:0x%p]

> +		(void *)__entry->start,
> +		(void *)__entry->end
> +	)
> +);
> +
>  #else

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 06/17] x86, mpx: trace attempts to find bounds tables
  2015-03-26 18:33 ` [PATCH 06/17] x86, mpx: trace attempts to find bounds tables Dave Hansen
@ 2015-03-27 12:32   ` Borislav Petkov
  2015-03-27 14:08     ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 12:32 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen

On Thu, Mar 26, 2015 at 11:33:41AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> This event traces any time we go looking to unmap a bounds table
> for a given virtual address range.  This is useful to ensure
> that the kernel actually "tried" to free a bounds table versus
> times it succeeded.
> 
> It might try and fail if it realized that a table was shared
> with an adjacent VMA which is not being unmapped.

Would it make sense to extend this tracepoint to also dump the error
values returned from unmap_edge_bts() and unmap_single_bt() inside
mpx_unmap_tables()?

> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  b/arch/x86/include/asm/trace/mpx.h |   22 ++++++++++++++++++++++
>  b/arch/x86/mm/mpx.c                |    1 +
>  2 files changed, 23 insertions(+)
> 
> diff -puN arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_search arch/x86/include/asm/trace/mpx.h
> --- a/arch/x86/include/asm/trace/mpx.h~mpx-trace_unmap_search	2015-03-26 11:27:06.684292099 -0700
> +++ b/arch/x86/include/asm/trace/mpx.h	2015-03-26 11:27:06.689292324 -0700
> @@ -75,6 +75,28 @@ TRACE_EVENT(mpx_unmap_zap,
>  	)
>  );
>  
> +TRACE_EVENT(mpx_unmap_search,
> +
> +	TP_PROTO(unsigned long start,
> +		 unsigned long end),
> +	TP_ARGS(start, end),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, start)
> +		__field(unsigned long, end)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->start = start;
> +		__entry->end   = end;
> +	),
> +
> +	TP_printk("0x%p -> 0x%p",

Interval?

> +		(void *)__entry->start,
> +		(void *)__entry->end
> +	)
> +);
> +
>  #else

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 06/17] x86, mpx: trace attempts to find bounds tables
  2015-03-27 12:32   ` Borislav Petkov
@ 2015-03-27 14:08     ` Dave Hansen
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-27 14:08 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, dave.hansen

On 03/27/2015 05:32 AM, Borislav Petkov wrote:
>> > This event traces any time we go looking to unmap a bounds table
>> > for a given virtual address range.  This is useful to ensure
>> > that the kernel actually "tried" to free a bounds table versus
>> > times it succeeded.
>> > 
>> > It might try and fail if it realized that a table was shared
>> > with an adjacent VMA which is not being unmapped.
> Would it make sense to extend this tracepoint to also dump the error
> values returned from unmap_edge_bts() and unmap_single_bt() inside
> mpx_unmap_tables()?

Both of the paths below this point (the "real" unmap and the zap) also
have tracepoints.  It might also get confusing to see -ENOENT in the
success case.  Also, in practice, you can see the siginfo get generated
and tell that something bad happened.

I think I'd rather leave it as-is.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/17] x86, mpx: boot-time disable
  2015-03-26 18:33 ` [PATCH 08/17] x86, mpx: boot-time disable Dave Hansen
@ 2015-03-27 15:07   ` Borislav Petkov
  2015-03-27 15:16     ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 15:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen

On Thu, Mar 26, 2015 at 11:33:43AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> MPX has the _potential_ to cause some issues.  Say part of your init
> system tried to protect one of its components from buffer overflows
> with MPX.  If there were a false positive, it's possible that MPX
> could keep a system from booting.
> 
> MPX could also potentially cause performance issues since it is
> present in hot paths like the unmap path.
> 
> Allow it to be disabled at boot time.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  b/Documentation/kernel-parameters.txt |    4 ++++
>  b/arch/x86/kernel/cpu/common.c        |   16 ++++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff -puN arch/x86/kernel/cpu/common.c~x86-mpx-disable-boot-time arch/x86/kernel/cpu/common.c
> --- a/arch/x86/kernel/cpu/common.c~x86-mpx-disable-boot-time	2015-03-26 11:27:31.269400935 -0700
> +++ b/arch/x86/kernel/cpu/common.c	2015-03-26 11:27:31.275401206 -0700
> @@ -172,6 +172,22 @@ static int __init x86_xsaves_setup(char
>  }
>  __setup("noxsaves", x86_xsaves_setup);
>  
> +static int __init x86_mpx_setup(char *s)
> +{
> +	/* require an exact match without trailing characters */
> +	if (strlen(s))
> +		return 0;
> +
> +	/* do not emit a message if the feature is not present */
> +	if (!boot_cpu_has(X86_FEATURE_MPX))
> +		return 1;
> +
> +	setup_clear_cpu_cap(X86_FEATURE_MPX);
> +	printk("nompx: Intel Memory Protection Extensions (MPX) disabled\n");

WARNING: printk() should include KERN_ facility level
#52: FILE: arch/x86/kernel/cpu/common.c:186:
+       printk("nompx: Intel Memory Protection Extensions (MPX) disabled\n");

checkpatch is sometimes right.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer
  2015-03-26 18:33 ` [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer Dave Hansen
@ 2015-03-27 15:15   ` Borislav Petkov
  2015-03-27 16:35     ` Dave Hansen
  2015-03-27 18:57   ` Oleg Nesterov
  1 sibling, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 15:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, tglx, dave.hansen, oleg, riel, sbsiddha, luto,
	mingo, hpa, fenghua.yu

On Thu, Mar 26, 2015 at 11:33:34AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The MPX code appears to be saving off the FPU in an unsafe
> way.   It does not disable preemption or ensure that the
> FPU state has been allocated.
> 
> This patch introduces a new helper which will do both of
> those things internally.
> 
> Note that this requires a patch from Oleg in order to work
> properly.  It is currently in tip/x86/fpu.
> 
> > commit f893959b0898bd876673adbeb6798bdf25c034d7
> > Author: Oleg Nesterov <oleg@redhat.com>
> > Date:   Fri Mar 13 18:30:30 2015 +0100
> >
> >    x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: bp@alien8.de
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Suresh Siddha <sbsiddha@gmail.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: the arch/x86 maintainers <x86@kernel.org>
> ---
> 
>  b/arch/x86/include/asm/xsave.h |    1 +
>  b/arch/x86/kernel/xsave.c      |   32 ++++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
> 
> diff -puN arch/x86/include/asm/xsave.h~tsk_get_xsave_addr arch/x86/include/asm/xsave.h
> --- a/arch/x86/include/asm/xsave.h~tsk_get_xsave_addr	2015-03-26 11:27:04.738204327 -0700
> +++ b/arch/x86/include/asm/xsave.h	2015-03-26 11:27:04.743204552 -0700
> @@ -252,6 +252,7 @@ static inline int xrestore_user(struct x
>  }
>  
>  void *get_xsave_addr(struct xsave_struct *xsave, int xstate);
> +void *tsk_get_xsave_field(struct task_struct *tsk, int xstate_field);
>  void setup_xstate_comp(void);
>  
>  #endif
> diff -puN arch/x86/kernel/xsave.c~tsk_get_xsave_addr arch/x86/kernel/xsave.c
> --- a/arch/x86/kernel/xsave.c~tsk_get_xsave_addr	2015-03-26 11:27:04.740204417 -0700
> +++ b/arch/x86/kernel/xsave.c	2015-03-26 11:27:04.744204597 -0700
> @@ -740,3 +740,35 @@ void *get_xsave_addr(struct xsave_struct
>  	return (void *)xsave + xstate_comp_offsets[feature];
>  }
>  EXPORT_SYMBOL_GPL(get_xsave_addr);
> +
> +/*
> + * This wraps up the common operations that need to occur when retrieving
> + * data from an xsave struct.  It first ensures that the task was actually
> + * using the FPU and retrieves the data in to a buffer.  It then calculates
> + * the offset of the requested field in the buffer.
> + *
> + * This function is safe to call whether the FPU is in use or not.
> + *
> + * Inputs:
> + *	tsk: the task from which we are fetching xsave state
> + *	xstate: state which is defined in xsave.h (e.g. XSTATE_FP, XSTATE_SSE,

I think you mean @xsave_field here?

> + *	etc.)
> + * Output:
> + *	address of the state in the xsave area.
> + */
> +void *tsk_get_xsave_field(struct task_struct *tsk, int xsave_field)
> +{
> +	union thread_xstate *xstate;
> +
> +	if (!used_math())
> +		return NULL;
> +	/*
> +	 * unlazy_fpu() is poorly named and will actually
> +	 * save the xstate off in to the memory buffer.
> +	 */
> +	unlazy_fpu(tsk);
> +	xstate = tsk->thread.fpu.state;
> +
> +	return get_xsave_addr(&xstate->xsave, xsave_field);
> +}
> +EXPORT_SYMBOL_GPL(tsk_get_xsave_field);

I'm not sure we want to export this to modules... and MPX cannot be
built as a module either. So why export it?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/17] x86, mpx: boot-time disable
  2015-03-27 15:07   ` Borislav Petkov
@ 2015-03-27 15:16     ` Dave Hansen
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-27 15:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, dave.hansen

On 03/27/2015 08:07 AM, Borislav Petkov wrote:
>> xtensions (MPX) disabled\n");
> WARNING: printk() should include KERN_ facility level
> #52: FILE: arch/x86/kernel/cpu/common.c:186:
> +       printk("nompx: Intel Memory Protection Extensions (MPX) disabled\n");
> 
> checkpatch is sometimes right.

Yup, I'll fix it up.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/17] x86: make is_64bit_mm() widely available
  2015-03-26 18:33 ` [PATCH 09/17] x86: make is_64bit_mm() widely available Dave Hansen
  2015-03-26 22:35   ` Andy Lutomirski
@ 2015-03-27 15:21   ` Borislav Petkov
  1 sibling, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 15:21 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen, oleg

On Thu, Mar 26, 2015 at 11:33:45AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The uprobes code has a nice helper, is_64bit_mm(), that consults both
> the runtime and compile-time flags for 32-bit support.  Instead of
> reinventing the wheel, pull it in to an x86 header so we can use it
> for MPX.
> 
> I prefer passing the mm around to test_thread_flag(TIF_IA32) beacuse

Just a nitpick:

s/beacuse/because/

> it makes it explicit where the context is coming from.  It will also
> make it easier if we ever need to access the MPX structures from
> another process.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
> 
>  b/arch/x86/include/asm/mmu_context.h |   13 +++++++++++++
>  b/arch/x86/kernel/uprobes.c          |   10 +---------
>  2 files changed, 14 insertions(+), 9 deletions(-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer
  2015-03-27 15:15   ` Borislav Petkov
@ 2015-03-27 16:35     ` Dave Hansen
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-27 16:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, tglx, dave.hansen, oleg, riel, sbsiddha, luto,
	mingo, hpa, fenghua.yu

On 03/27/2015 08:15 AM, Borislav Petkov wrote:
> I'm not sure we want to export this to modules... and MPX cannot be
> built as a module either. So why export it?

The thing I was wrapping (get_xsave_addr()) was EXPORT_SYMBOL_GPL().  I
assumed that it was exported for good reason and that my wrapper should
follow suit.

I do expect this to get used by more things than just MPX going forward.

But, I guess folks can just export it if and when they need it.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/17] x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK
  2015-03-26 18:33 ` [PATCH 12/17] x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK Dave Hansen
@ 2015-03-27 17:01   ` Borislav Petkov
  2015-03-27 20:45     ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 17:01 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, qiaowei.ren, dave.hansen

On Thu, Mar 26, 2015 at 11:33:49AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> From: Qiaowei Ren <qiaowei.ren@intel.com>

So Qiaowei is the author?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  2015-03-26 18:33 ` [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps Dave Hansen
@ 2015-03-27 17:29   ` Borislav Petkov
  2015-03-27 18:16     ` Dave Hansen
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-03-27 17:29 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen

On Thu, Mar 26, 2015 at 11:33:53AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> user_atomic_cmpxchg_inatomic() actually looks at sizeof(*ptr) to
> figure out how many bytes to copy.  If we run it on a 64-bit
> kernel with a 64-bit pointer, it will copy a 64-bit bounds
> directory entry.  That's fine, except when we have 32-bit
> programs with 32-bit bounds directory entries and we only *want*
> 32-bits.
> 
> This patch breaks the cmpxchg operation out in to its own
> function and performs the 32-bit type swizzling in there.
> 
> Note, the "64-bit" version of this code _would_ work on a
> 32-bit-only kernel.  The issue this patch addresses is only for
> when the kernel's 'long' is mismatched from the size of the
> bounds directory entry of the process we are working on.
> 
> The new helper modifies 'actual_old_val' or returns an error.
> But gcc doesn't know this, so it warns about 'actual_old_val'
> being unused.  Shut it up with an uninitialized_var().
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> 
>  b/arch/x86/mm/mpx.c |   41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff -puN arch/x86/mm/mpx.c~mpx-variable-sized-userspace-pokes arch/x86/mm/mpx.c
> --- a/arch/x86/mm/mpx.c~mpx-variable-sized-userspace-pokes	2015-03-26 11:27:33.927520821 -0700
> +++ b/arch/x86/mm/mpx.c	2015-03-26 11:27:33.931521001 -0700
> @@ -421,6 +421,35 @@ int mpx_disable_management(struct task_s
>  	return 0;
>  }
>  
> +static int mpx_cmpxchg_bd_entry(struct mm_struct *mm,
> +		unsigned long *actual_old_val_ptr, long __user *bd_entry_addr,
> +		unsigned long expected_old_val, unsigned long new_bd_entry)
> +{
> +	int ret;
> +	/*
> +	 * user_atomic_cmpxchg_inatomic() actually uses sizeof()
> +	 * the pointer thatt we pass to it to figure out how much
> +	 * data to cmpxchg.  We have to be careful here not to
> +	 * pass a pointer to a 64-bit data type when we only want
> +	 * a 32-bit copy.
> +	 */
> +	if (is_64bit_mm(mm)) {
> +		ret = user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
> +				bd_entry_addr, expected_old_val, new_bd_entry);
> +	} else {
> +		u32 uninitialized_var(actual_old_val_32);
> +		u32 expected_old_val_32 = expected_old_val;
> +		u32 new_bd_entry_32 = new_bd_entry;
> +		u32 __user *bd_entry_32 = (u32 __user *)bd_entry_addr;
> +		ret = user_atomic_cmpxchg_inatomic(&actual_old_val_32,
> +				bd_entry_32, expected_old_val_32,
> +				new_bd_entry_32);

Hmm, I would've added a user_atomic_cmpxchg_inatomic_size() macro which
calls __user_atomic_cmpxchg_inatomic().

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  2015-03-27 17:29   ` Borislav Petkov
@ 2015-03-27 18:16     ` Dave Hansen
  2015-03-28  8:39       ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-27 18:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, dave.hansen

On 03/27/2015 10:29 AM, Borislav Petkov wrote:
>> > +static int mpx_cmpxchg_bd_entry(struct mm_struct *mm,
>> > +		unsigned long *actual_old_val_ptr, long __user *bd_entry_addr,
>> > +		unsigned long expected_old_val, unsigned long new_bd_entry)
>> > +{
>> > +	int ret;
>> > +	/*
>> > +	 * user_atomic_cmpxchg_inatomic() actually uses sizeof()
>> > +	 * the pointer thatt we pass to it to figure out how much
>> > +	 * data to cmpxchg.  We have to be careful here not to
>> > +	 * pass a pointer to a 64-bit data type when we only want
>> > +	 * a 32-bit copy.
>> > +	 */
>> > +	if (is_64bit_mm(mm)) {
>> > +		ret = user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
>> > +				bd_entry_addr, expected_old_val, new_bd_entry);
>> > +	} else {
>> > +		u32 uninitialized_var(actual_old_val_32);
>> > +		u32 expected_old_val_32 = expected_old_val;
>> > +		u32 new_bd_entry_32 = new_bd_entry;
>> > +		u32 __user *bd_entry_32 = (u32 __user *)bd_entry_addr;
>> > +		ret = user_atomic_cmpxchg_inatomic(&actual_old_val_32,
>> > +				bd_entry_32, expected_old_val_32,
>> > +				new_bd_entry_32);
> Hmm, I would've added a user_atomic_cmpxchg_inatomic_size() macro which
> calls __user_atomic_cmpxchg_inatomic().

That would have saved creating 'u32 __user *bd_entry_32' so that we
could implicitly do sizeof(*bd_entry_32).  But, what else does it buy us?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer
  2015-03-26 18:33 ` [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer Dave Hansen
  2015-03-27 15:15   ` Borislav Petkov
@ 2015-03-27 18:57   ` Oleg Nesterov
  1 sibling, 0 replies; 38+ messages in thread
From: Oleg Nesterov @ 2015-03-27 18:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, tglx, dave.hansen, bp, riel, sbsiddha, luto,
	mingo, hpa, fenghua.yu

Just in case, 1, 2, and 9 looks good to me.

I didn't get the rest of this series, but I am sure it doesn't need
my review ;)

On 03/26, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The MPX code appears to be saving off the FPU in an unsafe
> way.   It does not disable preemption or ensure that the
> FPU state has been allocated.
> 
> This patch introduces a new helper which will do both of
> those things internally.
> 
> Note that this requires a patch from Oleg in order to work
> properly.  It is currently in tip/x86/fpu.
> 
> > commit f893959b0898bd876673adbeb6798bdf25c034d7
> > Author: Oleg Nesterov <oleg@redhat.com>
> > Date:   Fri Mar 13 18:30:30 2015 +0100
> >
> >    x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: bp@alien8.de
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Suresh Siddha <sbsiddha@gmail.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: the arch/x86 maintainers <x86@kernel.org>
> ---
> 
>  b/arch/x86/include/asm/xsave.h |    1 +
>  b/arch/x86/kernel/xsave.c      |   32 ++++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
> 
> diff -puN arch/x86/include/asm/xsave.h~tsk_get_xsave_addr arch/x86/include/asm/xsave.h
> --- a/arch/x86/include/asm/xsave.h~tsk_get_xsave_addr	2015-03-26 11:27:04.738204327 -0700
> +++ b/arch/x86/include/asm/xsave.h	2015-03-26 11:27:04.743204552 -0700
> @@ -252,6 +252,7 @@ static inline int xrestore_user(struct x
>  }
>  
>  void *get_xsave_addr(struct xsave_struct *xsave, int xstate);
> +void *tsk_get_xsave_field(struct task_struct *tsk, int xstate_field);
>  void setup_xstate_comp(void);
>  
>  #endif
> diff -puN arch/x86/kernel/xsave.c~tsk_get_xsave_addr arch/x86/kernel/xsave.c
> --- a/arch/x86/kernel/xsave.c~tsk_get_xsave_addr	2015-03-26 11:27:04.740204417 -0700
> +++ b/arch/x86/kernel/xsave.c	2015-03-26 11:27:04.744204597 -0700
> @@ -740,3 +740,35 @@ void *get_xsave_addr(struct xsave_struct
>  	return (void *)xsave + xstate_comp_offsets[feature];
>  }
>  EXPORT_SYMBOL_GPL(get_xsave_addr);
> +
> +/*
> + * This wraps up the common operations that need to occur when retrieving
> + * data from an xsave struct.  It first ensures that the task was actually
> + * using the FPU and retrieves the data in to a buffer.  It then calculates
> + * the offset of the requested field in the buffer.
> + *
> + * This function is safe to call whether the FPU is in use or not.
> + *
> + * Inputs:
> + *	tsk: the task from which we are fetching xsave state
> + *	xstate: state which is defined in xsave.h (e.g. XSTATE_FP, XSTATE_SSE,
> + *	etc.)
> + * Output:
> + *	address of the state in the xsave area.
> + */
> +void *tsk_get_xsave_field(struct task_struct *tsk, int xsave_field)
> +{
> +	union thread_xstate *xstate;
> +
> +	if (!used_math())
> +		return NULL;
> +	/*
> +	 * unlazy_fpu() is poorly named and will actually
> +	 * save the xstate off in to the memory buffer.
> +	 */
> +	unlazy_fpu(tsk);
> +	xstate = tsk->thread.fpu.state;
> +
> +	return get_xsave_addr(&xstate->xsave, xsave_field);
> +}
> +EXPORT_SYMBOL_GPL(tsk_get_xsave_field);
> _


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/17] x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK
  2015-03-27 17:01   ` Borislav Petkov
@ 2015-03-27 20:45     ` Dave Hansen
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-27 20:45 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, qiaowei.ren, dave.hansen

On 03/27/2015 10:01 AM, Borislav Petkov wrote:
> On Thu, Mar 26, 2015 at 11:33:49AM -0700, Dave Hansen wrote:
>>
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>>
>> From: Qiaowei Ren <qiaowei.ren@intel.com>
> 
> So Qiaowei is the author?

Yes, that was some borkage in my patch preparation scripts.  But,
seriously, it's 1 removed line.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  2015-03-27 18:16     ` Dave Hansen
@ 2015-03-28  8:39       ` Borislav Petkov
  2015-03-30 16:57         ` Dave Hansen
  2015-03-30 18:58         ` Dave Hansen
  0 siblings, 2 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-03-28  8:39 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen

On Fri, Mar 27, 2015 at 11:16:41AM -0700, Dave Hansen wrote:
> That would have saved creating 'u32 __user *bd_entry_32' so that we
> could implicitly do sizeof(*bd_entry_32).  But, what else does it buy us?

Well, you could misappropriate futex_atomic_cmpxchg_inatomic() which
takes u32s already - you probably might want to rename it to something
more generic first, though.

Diff ontop:

---
Index: b/arch/x86/mm/mpx.c
===================================================================
--- a/arch/x86/mm/mpx.c	2015-03-28 09:21:40.199966745 +0100
+++ b/arch/x86/mm/mpx.c	2015-03-28 09:19:40.491968402 +0100
@@ -18,6 +18,7 @@
 #include <asm/processor.h>
 #include <asm/trace/mpx.h>
 #include <asm/fpu-internal.h>
+#include <asm/futex.h>
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/mpx.h>
@@ -425,7 +426,6 @@ static int mpx_cmpxchg_bd_entry(struct m
 		unsigned long *actual_old_val_ptr, long __user *bd_entry_addr,
 		unsigned long expected_old_val, unsigned long new_bd_entry)
 {
-	int ret;
 	/*
 	 * user_atomic_cmpxchg_inatomic() actually uses sizeof()
 	 * the pointer thatt we pass to it to figure out how much
@@ -433,21 +433,16 @@ static int mpx_cmpxchg_bd_entry(struct m
 	 * pass a pointer to a 64-bit data type when we only want
 	 * a 32-bit copy.
 	 */
-	if (is_64bit_mm(mm)) {
-		ret = user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
-				bd_entry_addr, expected_old_val, new_bd_entry);
-	} else {
-		u32 uninitialized_var(actual_old_val_32);
-		u32 expected_old_val_32 = expected_old_val;
-		u32 new_bd_entry_32 = new_bd_entry;
-		u32 __user *bd_entry_32 = (u32 __user *)bd_entry_addr;
-		ret = user_atomic_cmpxchg_inatomic(&actual_old_val_32,
-				bd_entry_32, expected_old_val_32,
-				new_bd_entry_32);
-		if (!ret)
-			*actual_old_val_ptr = actual_old_val_32;
-	}
-	return ret;
+	if (is_64bit_mm(mm))
+		return user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
+						   bd_entry_addr,
+						   expected_old_val,
+						   new_bd_entry);
+	else
+		return futex_atomic_cmpxchg_inatomic((u32 *)actual_old_val_ptr,
+						    (u32 __user *)bd_entry_addr,
+						    expected_old_val,
+						    new_bd_entry);
 }
 
 /*
---

The asm looks the same except the retval. Yours does

	mov %rax, (%rsi)

for actual_old_val_ptr which, AFAICT, is not needed in the 32-bit
case because there we're returning a 32-bit value anyway:

	*actual_old_val_ptr = actual_old_val_32;

but gcc writes out the whole 64-bit register %rax to the pointer in %rsi
because it is an unsigned long it gets passed in.

Not that it matters, it is being sign-extended before that with

	movl	%eax, %eax	# actual_old_val_32, tmp137


yours:
------
	.loc 1 445 0
	cmpq	%rax, %rdx	# D.38827, bd_entry_addr
	ja	.L151	#,
.LBB993:
	.loc 1 445 0 is_stmt 0 discriminator 1
	movl	%ecx, %eax	# expected_old_val, actual_old_val_32
.LVL179:
	xorl	%edi, %edi	# ret
.LVL180:
#APP
# 445 "arch/x86/mm/mpx.c" 1

1:	.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
	lock; cmpxchgl %r8d, (%rdx)	# new_bd_entry, MEM[(u32 *)bd_entry_addr_12(D)]
2:
	.section .fixup, "ax"
3:	mov     $-14, %edi	#, ret
	jmp     2b
	.previous
 .pushsection "__ex_table","a"
 .balign 8
 .long (1b) - .
 .long (3b) - .
 .popsection

# 0 "" 2
#NO_APP
.LBE993:
	.loc 1 448 0 is_stmt 1 discriminator 1
	testl	%edi, %edi	# ret
	jne	.L151	#,
	.loc 1 449 0
	movl	%eax, %eax	# actual_old_val_32, tmp137
.LVL181:
	movq	%rax, (%rsi)	# tmp137, *actual_old_val_ptr_17(D)
---



futex_atomic_cmpxchg_inatomic:
------------------------------
	.file 9 "./arch/x86/include/asm/futex.h"
	.loc 9 113 0
	cmpq	%rax, %rdx	# D.38827, bd_entry_addr
	ja	.L153	#,
.LBB1003:
	movl	%ecx, %eax	# expected_old_val, __old
.LVL185:
	xorl	%edi, %edi	# ret
.LVL186:
#APP
# 113 "./arch/x86/include/asm/futex.h" 1

1:	.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
	lock; cmpxchgl %r8d, (%rdx)	# new_bd_entry, MEM[(u32 *)bd_entry_addr_12(D)]
2:
	.section .fixup, "ax"
3:	mov     $-14, %edi	#, ret
	jmp     2b
	.previous
 .pushsection "__ex_table","a"
 .balign 8
 .long (1b) - .
 .long (3b) - .
 .popsection

# 0 "" 2
#NO_APP
	movl	%eax, (%rsi)	# __old, MEM[(u32 *)actual_old_val_ptr_17(D)]
.LBE1003:
.LBE995:
.LBE994:
.LBE989:
	.loc 1 458 0
	movl	%edi, %eax	# ret,
---

Here the objdump output which shows the difference better:

yours:
------
     b02:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
     b08:       48 83 e8 04             sub    $0x4,%rax
     b0c:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
     b11:       48 39 c2                cmp    %rax,%rdx
     b14:       77 e8                   ja     afe <mpx_cmpxchg_bd_entry+0x3e>
     b16:       89 c8                   mov    %ecx,%eax
     b18:       31 ff                   xor    %edi,%edi
     b1a:       f0 44 0f b1 02          lock cmpxchg %r8d,(%rdx)
     b1f:       85 ff                   test   %edi,%edi
     b21:       75 db                   jne    afe <mpx_cmpxchg_bd_entry+0x3e>
     b23:       89 c0                   mov    %eax,%eax
     b25:       48 89 06                mov    %rax,(%rsi)
     b28:       89 f8                   mov    %edi,%eax
     b2a:       5d                      pop    %rbp
     b2b:       c3                      retq
     b2c:       0f 1f 40 00             nopl   0x0(%rax)



futex_atomic_cmpxchg_inatomic:
------------------------------
     b72:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
     b78:       48 83 ef 04             sub    $0x4,%rdi
     b7c:       b8 f2 ff ff ff          mov    $0xfffffff2,%eax
     b81:       48 39 fa                cmp    %rdi,%rdx
     b84:       77 ea                   ja     b70 <mpx_cmpxchg_bd_entry+0x40>
     b86:       89 c8                   mov    %ecx,%eax
     b88:       31 ff                   xor    %edi,%edi
     b8a:       f0 44 0f b1 02          lock cmpxchg %r8d,(%rdx)
     b8f:       89 06                   mov    %eax,(%rsi)
     b91:       89 f8                   mov    %edi,%eax
     b93:       5d                      pop    %rbp
     b94:       c3                      retq
     b95:       66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
     b9c:       00 00 00 00

AFAICT, in this case, we return only a 32-bit value and don't touch
the upper 32 bits of actual_old_val which might be a problem if the
assumptions of the callers is that the whole unsigned long is being
changed.

If that's not the case, then you get much nicer code :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  2015-03-28  8:39       ` Borislav Petkov
@ 2015-03-30 16:57         ` Dave Hansen
  2015-03-30 16:59           ` Borislav Petkov
  2015-03-30 18:58         ` Dave Hansen
  1 sibling, 1 reply; 38+ messages in thread
From: Dave Hansen @ 2015-03-30 16:57 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, dave.hansen

On 03/28/2015 01:39 AM, Borislav Petkov wrote:
> -	if (is_64bit_mm(mm)) {
> -		ret = user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
> -				bd_entry_addr, expected_old_val, new_bd_entry);
> -	} else {
> -		u32 uninitialized_var(actual_old_val_32);
> -		u32 expected_old_val_32 = expected_old_val;
> -		u32 new_bd_entry_32 = new_bd_entry;
> -		u32 __user *bd_entry_32 = (u32 __user *)bd_entry_addr;
> -		ret = user_atomic_cmpxchg_inatomic(&actual_old_val_32,
> -				bd_entry_32, expected_old_val_32,
> -				new_bd_entry_32);
> -		if (!ret)
> -			*actual_old_val_ptr = actual_old_val_32;
> -	}
> -	return ret;
> +	if (is_64bit_mm(mm))
> +		return user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
> +						   bd_entry_addr,
> +						   expected_old_val,
> +						   new_bd_entry);
> +	else
> +		return futex_atomic_cmpxchg_inatomic((u32 *)actual_old_val_ptr,
> +						    (u32 __user *)bd_entry_addr,
> +						    expected_old_val,
> +						    new_bd_entry);
>  }

That does look tempting, and I appreciate the analysis.

But, I'd really rather not hide this behind another layer of abstraction
in order to save a few variable declarations.  It's definitely _smaller_
code, but it's a little less obvious what is going on.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  2015-03-30 16:57         ` Dave Hansen
@ 2015-03-30 16:59           ` Borislav Petkov
  0 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-03-30 16:59 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, x86, tglx, dave.hansen

On Mon, Mar 30, 2015 at 09:57:14AM -0700, Dave Hansen wrote:
> > +	if (is_64bit_mm(mm))
> > +		return user_atomic_cmpxchg_inatomic(actual_old_val_ptr,
> > +						   bd_entry_addr,
> > +						   expected_old_val,
> > +						   new_bd_entry);
> > +	else
> > +		return futex_atomic_cmpxchg_inatomic((u32 *)actual_old_val_ptr,
> > +						    (u32 __user *)bd_entry_addr,
> > +						    expected_old_val,
> > +						    new_bd_entry);
> >  }
> 
> That does look tempting, and I appreciate the analysis.
> 
> But, I'd really rather not hide this behind another layer of abstraction
> in order to save a few variable declarations.  It's definitely _smaller_
> code, but it's a little less obvious what is going on.

If you rename futex_atomic_cmpxchg_inatomic to
atomic_cmpxchg_inatomic_32 or so, it is perfectly clear what's going on.

	if (is_64bit_mm())
		return user_atomic_cmpxchg_inatomic()
	else /* 32-bit */
		return user_atomic_cmpxchg_inatomic_32()

It can't get any more obvious than that.

:-D

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps
  2015-03-28  8:39       ` Borislav Petkov
  2015-03-30 16:57         ` Dave Hansen
@ 2015-03-30 18:58         ` Dave Hansen
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Hansen @ 2015-03-30 18:58 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, dave.hansen

On 03/28/2015 01:39 AM, Borislav Petkov wrote:
> AFAICT, in this case, we return only a 32-bit value and don't touch
> the upper 32 bits of actual_old_val which might be a problem if the
> assumptions of the callers is that the whole unsigned long is being
> changed.

The suggestion to just drop in the futex code does not work for just
that reason.

We do this:

static int unmap_single_bt(struct mm_struct *mm,
{
...
	unsigned long uninitialized_var(actual_old_val);

	ret = mpx_cmpxchg_bd_entry(mm, &actual_old_val,
                                bd_entry, bt_addr, cleared_bd_entry);

and then check:

        if (actual_old_val != expected_old_val) {

If we do not touch the upper 32-bits of 'actual_old_val', then we might
end up with stack gunk in there.  The other caller of
mpx_cmpxchg_bd_entry() is OK since it initializes its 'actual_old_val'.

So, I don't think it will work as you've written.  We need to somehow
ensure that the upper 32-bits match the upper 32-bits of
'expected_old_val' which will always be 0's for a 32-bit app.

So, yeah, it's ugly.  You got me.  But all the 64/32-bit conversions are
done out in the open and it's obvious what's going on.  It is also
_tested_ and works.

I'd really like to keep it the way it is.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-03-30 18:58 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-26 18:33 [PATCH 00/17] x86, mpx updates for 4.1 (take 2) Dave Hansen
2015-03-26 18:33 ` [PATCH 01/17] x86, fpu: wrap get_xsave_addr() to make it safer Dave Hansen
2015-03-27 15:15   ` Borislav Petkov
2015-03-27 16:35     ` Dave Hansen
2015-03-27 18:57   ` Oleg Nesterov
2015-03-26 18:33 ` [PATCH 02/17] x86, mpx: use new tsk_get_xsave_addr() Dave Hansen
2015-03-26 18:33 ` [PATCH 03/17] x86, mpx: trace #BR exceptions Dave Hansen
2015-03-27 10:21   ` Borislav Petkov
2015-03-26 18:33 ` [PATCH 04/17] x86, mpx: trace entry to bounds exception paths Dave Hansen
2015-03-27 12:02   ` Borislav Petkov
2015-03-26 18:33 ` [PATCH 05/17] x86, mpx: trace when MPX is zapping pages Dave Hansen
2015-03-27 12:26   ` Borislav Petkov
2015-03-26 18:33 ` [PATCH 06/17] x86, mpx: trace attempts to find bounds tables Dave Hansen
2015-03-27 12:32   ` Borislav Petkov
2015-03-27 14:08     ` Dave Hansen
2015-03-26 18:33 ` [PATCH 07/17] x86, mpx: trace allocation of new " Dave Hansen
2015-03-26 18:33 ` [PATCH 08/17] x86, mpx: boot-time disable Dave Hansen
2015-03-27 15:07   ` Borislav Petkov
2015-03-27 15:16     ` Dave Hansen
2015-03-26 18:33 ` [PATCH 09/17] x86: make is_64bit_mm() widely available Dave Hansen
2015-03-26 22:35   ` Andy Lutomirski
2015-03-27 15:21   ` Borislav Petkov
2015-03-26 18:33 ` [PATCH 10/17] x86: make __VIRTUAL_MASK safe to use on 32 bit Dave Hansen
2015-03-26 18:33 ` [PATCH 11/17] x86, mpx: we do not allocate the bounds directory Dave Hansen
2015-03-26 18:33 ` [PATCH 12/17] x86, mpx: remove redundant MPX_BNDCFG_ADDR_MASK Dave Hansen
2015-03-27 17:01   ` Borislav Petkov
2015-03-27 20:45     ` Dave Hansen
2015-03-26 18:33 ` [PATCH 13/17] x86, mpx: Add temporary variable to reduce masking Dave Hansen
2015-03-26 18:33 ` [PATCH 14/17] x86, mpx: new directory entry to addr helper Dave Hansen
2015-03-26 18:33 ` [PATCH 15/17] x86, mpx: do 32-bit-only cmpxchg for 32-bit apps Dave Hansen
2015-03-27 17:29   ` Borislav Petkov
2015-03-27 18:16     ` Dave Hansen
2015-03-28  8:39       ` Borislav Petkov
2015-03-30 16:57         ` Dave Hansen
2015-03-30 16:59           ` Borislav Petkov
2015-03-30 18:58         ` Dave Hansen
2015-03-26 18:33 ` [PATCH 16/17] x86, mpx: support 32-bit binaries on 64-bit kernel Dave Hansen
2015-03-26 18:33 ` [PATCH 17/17] x86, mpx: allow mixed binaries again Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).