LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v7 00/18] x86: Enable FSGSBASE instructions
@ 2019-05-08 10:02 Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 01/18] x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task Chang S. Bae
                   ` (17 more replies)
  0 siblings, 18 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

Updates from v6 [6]:
* Fix ptracer-induced FS/GSBASE write behavior (Andy)
* Fix GSBASE handling in the NMI path
* Remove local_irq_save/restore() from switch_to()
* Separate things into new patches (from 12th to 9th and 10th)
* Add GSBASE handling in the documentation
* Add more comments for entry changes
* Edit changelog (multiple patches)
(all points by Thomas if not mentioned)

Previous versions: [1-5]

[1] Version 1: https://lore.kernel.org/patchwork/cover/934843
[2] Version 2: https://lore.kernel.org/patchwork/cover/912063
[3] Version 3: https://lore.kernel.org/patchwork/cover/1002725
[4] Version 4: https://lore.kernel.org/patchwork/cover/1032799
[5] Version 5: https://lore.kernel.org/patchwork/cover/1038035
[6] Version 6: https://lore.kernel.org/patchwork/cover/1051240

Andi Kleen (3):
  x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  x86/fsgsbase/64: Add documentation for FSGSBASE

Andy Lutomirski (4):
  x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is
    on
  selftests/x86/fsgsbase: Test WRGSBASE
  x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit

Chang S. Bae (11):
  x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task
  selftests/x86/fsgsbase: Test ptracer-induced GSBASE write
  kbuild: Raise the minimum required binutils version to 2.21
  x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions
  x86/entry/64: Add the READ_MSR_GSBASE macro
  x86/entry/64: Switch CR3 before SWAPGS on the paranoid entry
  x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro
  x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path
  x86/fsgsbase/64: Document GSBASE handling in the paranoid path
  selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with
    FSGSBASE

 Documentation/admin-guide/kernel-parameters.txt |   2 +
 Documentation/process/changes.rst               |   6 +-
 Documentation/x86/entry_64.txt                  |  17 +++
 Documentation/x86/fsgs.txt                      | 103 ++++++++++++++
 arch/x86/entry/calling.h                        |  49 +++++++
 arch/x86/entry/entry_64.S                       | 142 ++++++++++++++++---
 arch/x86/include/asm/fsgsbase.h                 |  45 ++++--
 arch/x86/include/asm/inst.h                     |  15 ++
 arch/x86/include/uapi/asm/hwcap2.h              |   3 +
 arch/x86/kernel/cpu/common.c                    |  22 +++
 arch/x86/kernel/process_64.c                    | 132 +++++++++++++----
 arch/x86/kernel/ptrace.c                        |  14 +-
 tools/testing/selftests/x86/fsgsbase.c          | 179 +++++++++++++++++++++++-
 13 files changed, 660 insertions(+), 69 deletions(-)
 create mode 100644 Documentation/x86/fsgs.txt

--
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 01/18] x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
       [not found]   ` <74F4F506-2913-4013-9D81-A0C69FA8CDF1@intel.com>
  2019-05-08 10:02 ` [PATCH v7 02/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write Chang S. Bae
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

When a ptracer writes to a ptracee's FS/GSBASE with a different value,
the selector is also cleared. This behavior is not straightforward.

The change will make the behavior simple and sensible, as it will
(only) update the base when requested. Also, the condition check for
comparing the base is removed to make more simple. It might save a few
cycles, but this path is not performance critical.

The only recognizable downside of this change is when writing the base
if the selector is already nonzero. The base will be reloaded according
to the selector. But the case is highly unexpected in real usages.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/process_64.c | 21 +++++++++------------
 arch/x86/kernel/ptrace.c     | 14 ++------------
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index f8e1af3..32d12c6 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -719,13 +719,13 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 			return -EPERM;
 
 		preempt_disable();
-		/*
-		 * ARCH_SET_GS has always overwritten the index
-		 * and the base. Zero is the most sensible value
-		 * to put in the index, and is the only value that
-		 * makes any sense if FSGSBASE is unavailable.
-		 */
 		if (task == current) {
+			/*
+			 * For the request to set the current task's base,
+			 * first to load with zero selector, then write
+			 * the base value to be effective on a non-FSGSBASE
+			 * system.
+			 */
 			loadseg(GS, 0);
 			x86_gsbase_write_cpu_inactive(arg2);
 
@@ -736,7 +736,6 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 			task->thread.gsbase = arg2;
 
 		} else {
-			task->thread.gsindex = 0;
 			x86_gsbase_write_task(task, arg2);
 		}
 		preempt_enable();
@@ -751,11 +750,10 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 			return -EPERM;
 
 		preempt_disable();
-		/*
-		 * Set the selector to 0 for the same reason
-		 * as %gs above.
-		 */
 		if (task == current) {
+			/*
+			 * Same here as %gs handling above
+			 */
 			loadseg(FS, 0);
 			x86_fsbase_write_cpu(arg2);
 
@@ -765,7 +763,6 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 			 */
 			task->thread.fsbase = arg2;
 		} else {
-			task->thread.fsindex = 0;
 			x86_fsbase_write_task(task, arg2);
 		}
 		preempt_enable();
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 4b8ee05..3309cfe 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -396,22 +396,12 @@ static int putreg(struct task_struct *child,
 	case offsetof(struct user_regs_struct,fs_base):
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		/*
-		 * When changing the FS base, use do_arch_prctl_64()
-		 * to set the index to zero and to set the base
-		 * as requested.
-		 */
-		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+		x86_fsbase_write_cpu(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
-		/*
-		 * Exactly the same here as the %fs handling above.
-		 */
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+		x86_gsbase_write_cpu(child, value);
 		return 0;
 #endif
 	}
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 02/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 01/18] x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:04   ` [tip:x86/cpu] " tip-bot for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 03/18] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

The test validates to make sure the selector is not changed when a
ptracer writes a ptracee's GSBASE.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 tools/testing/selftests/x86/fsgsbase.c | 70 ++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index f249e04..1c2dda0 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -23,6 +23,9 @@
 #include <pthread.h>
 #include <asm/ldt.h>
 #include <sys/mman.h>
+#include <stddef.h>
+#include <sys/ptrace.h>
+#include <sys/wait.h>
 
 #ifndef __x86_64__
 # error This test is 64-bit only
@@ -367,6 +370,71 @@ static void test_unexpected_base(void)
 	}
 }
 
+#define USER_REGS_OFFSET(r) offsetof(struct user_regs_struct, r)
+
+static void test_ptrace_write_gsbase(void)
+{
+	int status;
+	pid_t child = fork();
+
+	if (child < 0)
+		err(1, "fork");
+
+	if (child == 0) {
+		printf("[RUN]\tPTRACE_POKE(), write GSBASE from ptracer\n");
+
+		/*
+		 * Use the LDT setup and fetch the GSBASE from the LDT
+		 * by switching to the (nonzero) selector (again)
+		 */
+		do_unexpected_base();
+		asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)0x7));
+
+		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
+			err(1, "PTRACE_TRACEME");
+
+		raise(SIGTRAP);
+		_exit(0);
+	}
+
+	wait(&status);
+
+	if (WSTOPSIG(status) == SIGTRAP) {
+		unsigned long gs;
+		unsigned long gs_offset = USER_REGS_OFFSET(gs);
+		unsigned long base_offset = USER_REGS_OFFSET(gs_base);
+
+		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+
+		if (gs != 0x7) {
+			nerrs++;
+			printf("[FAIL]\tGS is not prepared with nonzero\n");
+			goto END;
+		}
+
+		if (ptrace(PTRACE_POKEUSER, child, base_offset, 0xFF) != 0)
+			err(1, "PTRACE_POKEUSER");
+
+		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+
+		/*
+		 * In a non-FSGSBASE system, the nonzero selector will load
+		 * GSBASE (again). But what is tested here is whether the
+		 * selector value is changed or not by the GSBASE write in
+		 * a ptracer.
+		 */
+		if (gs != 0x7) {
+			nerrs++;
+			printf("[FAIL]\tGS changed to %lx\n", gs);
+		} else {
+			printf("[OK]\tGS remained 0x7\n");
+		}
+	}
+
+END:
+	ptrace(PTRACE_CONT, child, NULL, NULL);
+}
+
 int main()
 {
 	pthread_t thread;
@@ -423,5 +491,7 @@ int main()
 	if (pthread_join(thread, NULL) != 0)
 		err(1, "pthread_join");
 
+	test_ptrace_write_gsbase();
+
 	return nerrs == 0 ? 0 : 1;
 }
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 03/18] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 01/18] x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 02/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:05   ` [tip:x86/cpu] x86/cpu: " tip-bot for Andy Lutomirski
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  2019-05-08 10:02 ` [PATCH v7 04/18] kbuild: Raise the minimum required binutils version to 2.21 Chang S. Bae
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Andrew Morton, Randy Dunlap

From: Andy Lutomirski <luto@kernel.org>

This is temporary.  It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole.  Don't do it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/x86/kernel/cpu/common.c                    | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fd03e2b..7fe1da0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2827,6 +2827,9 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
+	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
+			replaced with a nofsgsbase flag.
+
 	no_console_suspend
 			[HW] Never suspend the console
 			Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8739bdf..5743cb9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -366,6 +366,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 }
 
 /*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+	unsafe_fsgsbase = true;
+	return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
+/*
  * Protection Keys are not available in 32-bit mode.
  */
 static bool pku_disabled;
@@ -1344,6 +1360,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_smap(c);
 	setup_umip(c);
 
+	/* Enable FSGSBASE instructions if available. */
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+		if (unsafe_fsgsbase)
+			cr4_set_bits(X86_CR4_FSGSBASE);
+		else
+			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+	}
+
 	/*
 	 * The vendor-specific functions might have changed features.
 	 * Now we do "generic changes."
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 04/18] kbuild: Raise the minimum required binutils version to 2.21
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (2 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 03/18] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:06   ` [tip:x86/cpu] " tip-bot for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 05/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Linux Torvalds

It helps to use some new instructions directly in assembly code.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linux Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 Documentation/process/changes.rst | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 18735dc..0a18075 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -31,7 +31,7 @@ you probably needn't concern yourself with isdn4k-utils.
 ====================== ===============  ========================================
 GNU C                  4.6              gcc --version
 GNU make               3.81             make --version
-binutils               2.20             ld -v
+binutils               2.21             ld -v
 flex                   2.5.35           flex --version
 bison                  2.0              bison --version
 util-linux             2.10o            fdformat --version
@@ -77,9 +77,7 @@ You will need GNU make 3.81 or later to build the kernel.
 Binutils
 --------
 
-The build system has, as of 4.13, switched to using thin archives (`ar T`)
-rather than incremental linking (`ld -r`) for built-in.a intermediate steps.
-This requires binutils 2.20 or newer.
+Binutils 2.21 or newer is needed to build the kernel.
 
 pkg-config
 ----------
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 05/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (3 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 04/18] kbuild: Raise the minimum required binutils version to 2.21 Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:06   ` [tip:x86/cpu] " tip-bot for Andi Kleen
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andi Kleen
  2019-05-08 10:02 ` [PATCH v7 06/18] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

From: Andi Kleen <ak@linux.intel.com>

Add C intrinsics and assembler macros for the new FSBASE and GSBASE
instructions.

Very straight forward. Used in followon patches.

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
  make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fsgsbase.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index bca4c74..fdd1177 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task);
 extern void x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
 extern void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
 
+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+	return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+	return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 06/18] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (4 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 05/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:07   ` [tip:x86/cpu] x86/fsgsbase/64: Enable FSGSBASE instructions in " tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Andrew Cooper

The helper functions will switch on faster accesses to FSBASE and
GSBASE when the FSGSBASE feature is enabled.

Accessing user GSBASE needs a couple of SWAPGS operations. It is
avoidable, if the user GSBASE is saved at kernel entry, being updated
as changes, and restored back at kernel exit.

However, this approach is more complicated than the SWAPGS-based.
SWAGPS is used to swap the content of GSBASE and MSR_KERNEL_GS_BASE,
contains the user space, on the transitions from and to user space.
Across context switches, MSR_KERNEL_GS_BASE has to be updated.

With FSGSBSE, the SWAPGS-based needs to be modified with using the
new instruction, on context switches and in the paranoid path for
performance and functionality. On the context switches, the MSR read
(or write) will be replaced by FSGSBASE instruction as shown in this
change. The following patch details the performance benefit with this.
In the paranoid path, the SWAPGS will not be used any more, as the
existing differentiation logic for kernel and user GS will be
broken. The paranoid path change will be also highlighed in the
followon patches.

Also, __{rd,wr}gsbase_inactive() are added as helpers to access user
GSBASE with SWAPGS. Note, for Xen PV, paravirt hooks can be added,
since it may allow a very efficient but different implementation.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Any Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 arch/x86/include/asm/fsgsbase.h | 27 ++++++++----------
 arch/x86/kernel/process_64.c    | 62 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177..aefd537 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
 	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
 }
 
+#include <asm/cpufeature.h>
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
 {
 	unsigned long fsbase;
 
-	rdmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		fsbase = rdfsbase();
+	else
+		rdmsrl(MSR_FS_BASE, fsbase);
 
 	return fsbase;
 }
 
-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
-	unsigned long gsbase;
-
-	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
-	return gsbase;
-}
-
 static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-	wrmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		wrfsbase(fsbase);
+	else
+		wrmsrl(MSR_FS_BASE, fsbase);
 }
 
-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
-	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 32d12c6..17421c3 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -161,6 +161,36 @@ enum which_selector {
 };
 
 /*
+ * Out of line to be protected from kprobes. It is not used on Xen
+ * paravirt. When paravirt support is needed, it needs to be renamed
+ * with native_ prefix.
+ */
+static noinline unsigned long __rdgsbase_inactive(void)
+{
+	unsigned long gsbase;
+
+	native_swapgs();
+	gsbase = rdgsbase();
+	native_swapgs();
+
+	return gsbase;
+}
+NOKPROBE_SYMBOL(__rdgsbase_inactive);
+
+/*
+ * Out of line to be protected from kprobes. It is not used on Xen
+ * paravirt. When paravirt support is needed, it needs to be renamed
+ * with native_ prefix.
+ */
+static noinline void __wrgsbase_inactive(unsigned long gsbase)
+{
+	native_swapgs();
+	wrgsbase(gsbase);
+	native_swapgs();
+}
+NOKPROBE_SYMBOL(__wrgsbase_inactive);
+
+/*
  * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
  * not available.  The goal is to be reasonably fast on non-FSGSBASE systems.
  * It's forcibly inlined because it'll generate better code and this function
@@ -338,6 +368,38 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 	return base;
 }
 
+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		unsigned long flags;
+
+		/* Interrupts are disabled here. */
+		local_irq_save(flags);
+		gsbase = __rdgsbase_inactive();
+		local_irq_restore(flags);
+	} else {
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+
+	return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		unsigned long flags;
+
+		/* Interrupts are disabled here. */
+		local_irq_save(flags);
+		__wrgsbase_inactive(gsbase);
+		local_irq_restore(flags);
+	} else {
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+}
+
 unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
 	unsigned long fsbase;
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (5 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 06/18] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-21 15:22   ` Thomas Gleixner
                     ` (2 more replies)
  2019-05-08 10:02 ` [PATCH v7 08/18] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions Chang S. Bae
                   ` (10 subsequent siblings)
  17 siblings, 3 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

With the new FSGSBASE instructions, we can efficiently read and write
the FSBASE and GSBASE in __switch_to().  Use that capability to preserve
the full state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas.  (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE
if WRGSBASE was used, but users are expected to read the CPU manual
before doing things like that.)

This is a considerable speedup.  It seems to save about 100 cycles
per context switch compared to the baseline 4.6-rc1 behavior on my
Skylake laptop.

[ chang: 5~10% performance improvements were seen by a context switch
  benchmark that ran threads with different FS/GSBASE values (to the
  baseline 4.16). Minor edit on the changelog. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 17421c3..e2089c9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -239,8 +239,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
 {
 	savesegment(fs, task->thread.fsindex);
 	savesegment(gs, task->thread.gsindex);
-	save_base_legacy(task, task->thread.fsindex, FS);
-	save_base_legacy(task, task->thread.gsindex, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/*
+		 * If FSGSBASE is enabled, we can't make any useful guesses
+		 * about the base, and user code expects us to save the current
+		 * value.  Fortunately, reading the base directly is efficient.
+		 */
+		task->thread.fsbase = rdfsbase();
+		task->thread.gsbase = __rdgsbase_inactive();
+	} else {
+		save_base_legacy(task, task->thread.fsindex, FS);
+		save_base_legacy(task, task->thread.gsindex, GS);
+	}
 }
 
 #if IS_ENABLED(CONFIG_KVM)
@@ -319,10 +329,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 					      struct thread_struct *next)
 {
-	load_seg_legacy(prev->fsindex, prev->fsbase,
-			next->fsindex, next->fsbase, FS);
-	load_seg_legacy(prev->gsindex, prev->gsbase,
-			next->gsindex, next->gsbase, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/* Update the FS and GS selectors if they could have changed. */
+		if (unlikely(prev->fsindex || next->fsindex))
+			loadseg(FS, next->fsindex);
+		if (unlikely(prev->gsindex || next->gsindex))
+			loadseg(GS, next->gsindex);
+
+		/* Update the bases. */
+		wrfsbase(next->fsbase);
+		__wrgsbase_inactive(next->gsbase);
+	} else {
+		load_seg_legacy(prev->fsindex, prev->fsbase,
+				next->fsindex, next->fsbase, FS);
+		load_seg_legacy(prev->gsindex, prev->gsbase,
+				next->gsindex, next->gsbase, GS);
+	}
 }
 
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 08/18] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (6 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:09   ` [tip:x86/cpu] x86/process/64: Use FSGSBASE instructions on thread copy and ptrace tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 09/18] x86/entry/64: Add the READ_MSR_GSBASE macro Chang S. Bae
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

Copy real FS/GSBASE values instead of approximation when FSGSBASE is
enabled.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/process_64.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e2089c9..8610df7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -428,7 +428,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		fsbase = x86_fsbase_read_cpu();
-	else if (task->thread.fsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.fsindex == 0))
 		fsbase = task->thread.fsbase;
 	else
 		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -442,7 +443,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		gsbase = x86_gsbase_read_cpu_inactive();
-	else if (task->thread.gsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.gsindex == 0))
 		gsbase = task->thread.gsbase;
 	else
 		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
@@ -482,10 +484,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
-	savesegment(gs, p->thread.gsindex);
-	p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
-	savesegment(fs, p->thread.fsindex);
-	p->thread.fsbase = p->thread.fsindex ? 0 : me->thread.fsbase;
+	save_fsgs(me);
+	p->thread.fsindex = me->thread.fsindex;
+	p->thread.fsbase = me->thread.fsbase;
+	p->thread.gsindex = me->thread.gsindex;
+	p->thread.gsbase = me->thread.gsbase;
 	savesegment(es, p->thread.es);
 	savesegment(ds, p->thread.ds);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 09/18] x86/entry/64: Add the READ_MSR_GSBASE macro
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (7 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 08/18] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 10/18] x86/entry/64: Switch CR3 before SWAPGS on the paranoid entry Chang S. Bae
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Dave Hansen

Factor out the RDMSR-based GSBASE read into a new macro.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/entry/calling.h  |  9 +++++++++
 arch/x86/entry/entry_64.S | 12 ++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index efb0d1b..d119729 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -337,6 +337,15 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+.macro READ_MSR_GSBASE save_reg:req
+	movl	$MSR_GS_BASE, %ecx
+	/* Read MSR specified by %ecx into %edx:%eax */
+	rdmsr
+	.ifnc \save_reg, %edx
+	movl	%edx, \save_reg
+	.endif
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 .macro STACKLEAK_ERASE
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 20e45d9..568a491 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1159,11 +1159,15 @@ ENTRY(paranoid_entry)
 	cld
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
+
 	movl	$1, %ebx
-	movl	$MSR_GS_BASE, %ecx
-	rdmsr
-	testl	%edx, %edx
-	js	1f				/* negative -> in kernel */
+	/*
+	 * The kernel-enforced convention is a negative GSBASE indicates
+	 * a kernel value.
+	 */
+	READ_MSR_GSBASE save_reg=%edx
+	testl	%edx, %edx	/* Negative -> in kernel */
+	js	1f
 	SWAPGS
 	xorl	%ebx, %ebx
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 10/18] x86/entry/64: Switch CR3 before SWAPGS on the paranoid entry
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (8 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 09/18] x86/entry/64: Add the READ_MSR_GSBASE macro Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:09   ` [tip:x86/cpu] x86/entry/64: Switch CR3 before SWAPGS in " tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 11/18] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Dave Hansen

When FSGSBASE is enabled, GSBASE handling on the paranoid entry will
need to retrieve the kernel GSBASE. Thus, the kernel page table should
be in. As a preparation, the CR3 switching is moved to happen at
first, before the SWAPGS.

Current GSBASE switching mechanism is possible without the kernel
page table in. No functional change is expected.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/entry_64.S | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 568a491..034d8f8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1160,18 +1160,6 @@ ENTRY(paranoid_entry)
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
 
-	movl	$1, %ebx
-	/*
-	 * The kernel-enforced convention is a negative GSBASE indicates
-	 * a kernel value.
-	 */
-	READ_MSR_GSBASE save_reg=%edx
-	testl	%edx, %edx	/* Negative -> in kernel */
-	js	1f
-	SWAPGS
-	xorl	%ebx, %ebx
-
-1:
 	/*
 	 * Always stash CR3 in %r14.  This value will be restored,
 	 * verbatim, at exit.  Needed if paranoid_entry interrupted
@@ -1181,9 +1169,26 @@ ENTRY(paranoid_entry)
 	 * This is also why CS (stashed in the "iret frame" by the
 	 * hardware at entry) can not be used: this may be a return
 	 * to kernel code, but with a user CR3 value.
+	 *
+	 * This PTI macro doesn't depend on kernel GSBASE and, with
+	 * FSGSBASE, the GSBASE handling requires the kernel page
+	 * tables switched in. So, do it early here.
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	movl	$1, %ebx
+	/*
+	 * The kernel-enforced convention is a negative GSBASE indicates
+	 * a kernel value.
+	 */
+	READ_MSR_GSBASE save_reg=%edx
+	testl	%edx, %edx	/* Negative -> in kernel */
+	jns	.Lparanoid_entry_swapgs
+	ret
+
+.Lparanoid_entry_swapgs:
+	SWAPGS
+	xorl	%ebx, %ebx
 	ret
 END(paranoid_entry)
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 11/18] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (9 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 10/18] x86/entry/64: Switch CR3 before SWAPGS on the paranoid entry Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:10   ` [tip:x86/cpu] x86/entry/64: " tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Chang S. Bae
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Dave Hansen

On the paranoid entry, by far, SWAPGS is needed when entering from user
space. A positive GSBASE means it is a user value and a negative GSBASE
means it is a kernel value. When FSGSBASE is enabled, however, user
space can write arbitrary values to GSBASE, so it will break the
existing logic to differentiate kernel and user GSBASE.

So, what the new GSBASE handling needs to do on the entry is:

	original_GSBASE = RDGSBASE()
	kernel_GSBASE = FIND_PERCPU_BASE()
	WRGSBASE(kernel_GSBASE)

The new macro will be useful to retrieve the kernel GSBASE, and will be
used on following patches.

The way used to retrieve per-CPU base is reading the per_cpu_offset
table with a CPU NR as an index. The CPU NR is obtained by the RDPID
instruction, or (if not available) it can be extracted from the limit
field of the CPUNODE entry in GDT.

Also, the GAS-compatible RDPID macro is added, as binutils 2.27 seems
to start to support the instruction. The minimum build requirement for
binutils is the 2.21 version, at this point.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/calling.h    | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/inst.h | 15 +++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index d119729..6cbf793 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,7 @@
 #include <asm/percpu.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
+#include <asm/inst.h>
 
 /*
 
@@ -337,6 +338,39 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+	movq	$__CPUNODE_SEG, \reg
+	lsl	\reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro GET_PERCPU_BASE reg:req
+	ALTERNATIVE \
+		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+		"RDPID	\reg", \
+		X86_FEATURE_RDPID
+	andq	$VDSO_CPUNODE_MASK, \reg
+	movq	__per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro GET_PERCPU_BASE reg:req
+	movq	pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
+
 .macro READ_MSR_GSBASE save_reg:req
 	movl	$MSR_GS_BASE, %ecx
 	/* Read MSR specified by %ecx into %edx:%eax */
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796d..d063841 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
 	.endif
 	MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
 	.endm
+
+.macro RDPID opd
+	REG_TYPE rdpid_opd_type \opd
+	.if rdpid_opd_type == REG_TYPE_R64
+	R64_NUM rdpid_opd \opd
+	.else
+	R32_NUM rdpid_opd \opd
+	.endif
+	.byte 0xf3
+	.if rdpid_opd > 7
+	PFX_REX rdpid_opd 0
+	.endif
+	.byte 0x0f, 0xc7
+	MODRM 0xc0 rdpid_opd 0x7
+.endm
 #endif
 
 #endif
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (10 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 11/18] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:11   ` [tip:x86/cpu] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit tip-bot for Chang S. Bae
                     ` (2 more replies)
  2019-05-08 10:02 ` [PATCH v7 13/18] x86/fsgsbase/64: Document GSBASE handling in the paranoid path Chang S. Bae
                   ` (5 subsequent siblings)
  17 siblings, 3 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Dave Hansen

Without FSGSBASE, user space cannot change GSBASE other than through
the PRCTL. On the transition from user space to kernel, SWAPGS is used
to swap the content of GSBASE and MSR_KERNEL_GS_BASE. However, SWAPGS
should only be used if entering from user mode to kernel mode.
Particularly in an super-atomic entry like the paranoid entry, the
differentiation of kernel and user GSBASE is based on the
kernel-enforced convention. A negative GSBASE indicates a kernel value
and a positive GSBASE means a user value.

In an FSGSBASE system, user space can set an arbitrary GSBASE value
without kernel interactions. Even, the value can be the same as the
kernel GSBASE. Therefore, the SWAPGS cannot be used in the paranoid
path, as the kernel-enforced convention is broken.

The new way to handle GSBASE is:

On entry:
	%rbx = RDGSBASE()
	kernel_GSBASE = FIND_PERCPU_BASE()
	WRGSBASE(kernel_GSBASE)

On exit:
	WRGSBASE(%rbx)

The original GSBASE will be always stashed on entry and restored at
exit. And the kernel GSBASE will be loaded in the paranoid path. The
new SAVE_AND_SET_GSBASE() macro wraps up the operations on the entry
side.

Obviously, for the non-paranoid path, it all keeps working exactly like
it does now.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/calling.h  |   6 +++
 arch/x86/entry/entry_64.S | 125 +++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 118 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 6cbf793..d6eb584 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -380,6 +380,12 @@ For 32-bit we have the following conventions - kernel is built with
 	.endif
 .endm
 
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+	rdgsbase \save_reg
+	GET_PERCPU_BASE \scratch_reg
+	wrgsbase \scratch_reg
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 .macro STACKLEAK_ERASE
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 034d8f8..c95999c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
 #include <asm/export.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include <asm/fsgsbase.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -933,8 +934,13 @@ ENTRY(\sym)
 	addq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
-	/* these procedures expect "no swapgs" flag in ebx */
 	.if \paranoid
+	/*
+	 * If the current system is FSGSBASE-enabled, the original
+	 * GSBASE is stashed in %rbx. On the other systems,
+	 * expect "no swapgs" flag in %ebx. (See details
+	 * in the paranoid_entry and paranoid_exit)
+	 */
 	jmp	paranoid_exit
 	.else
 	jmp	error_exit
@@ -1150,9 +1156,21 @@ idtentry machine_check		do_mce			has_error_code=0	paranoid=1
 #endif
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs.
+ *
+ * On non-FSGSBASE systems, traditionally, SWAPGS is used to swap
+ * the content of GSBASE and MSR_KERNEL_GS_BASE on the transition
+ * from user space to kernel. It is determined by the GSBASE value
+ * based on the kernel-enforced convention. If GSBASE is negative,
+ * it means a kernel value, and if the value is positive, it
+ * indicates a user value. On return, reset %ebx=0 if SWAPGS is
+ * needed at exit. And set %ebx=1 if not needed.
+ *
+ * On FSGSBASE systems, SWAPGS is not applicable to handle GSBASE,
+ * as the kernel-enforced convention is no longer valid. FSGSBASE
+ * instructions allow user space to set arbitrary GSBASE values.
+ * The new way of handling GSBASE, instead, is always stash GSBASE
+ * in %rbx and res tore it at exit.
  */
 ENTRY(paranoid_entry)
 	UNWIND_HINT_FUNC
@@ -1176,13 +1194,39 @@ ENTRY(paranoid_entry)
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
-	movl	$1, %ebx
 	/*
-	 * The kernel-enforced convention is a negative GSBASE indicates
-	 * a kernel value.
+	 * The way handling GSBASE is quite different depending on
+	 * whether FSGSBASE is enabled or not. With FSGSBASE, we
+	 * cannot make any assumption in the GSBASE value. Whereas,
+	 * the traditional SWAPGS-based GSBASE handling has an
+	 * assumption that a negative GSBASE value always indicates
+	 * a kernel value. So, the path is split into two ways,
+	 * based on the FSGSBASE flag.
+	 */
+	ALTERNATIVE "jmp .Lparanoid_entry_no_fsgsbase",	"",\
+		X86_FEATURE_FSGSBASE
+
+	/*
+	 * The macro always stashes the (current) GSBSE in %rbx,
+	 * and retrieves and loads the kernel GSBASE. The stashed
+	 * GSBASE should be always restored at the exit. The
+	 * kernel GSBASE is set to be the per-CPU base.
 	 */
+	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+	ret
+
+.Lparanoid_entry_no_fsgsbase:
+	/*
+	 * The macro reads the current GSBASE. And, based on the
+	 * value, it is determined whether current transition is
+	 * from user space to kernel or not. If the value is
+	 * nonnegative, it means a user space value, then do
+	 * SWAPGS and reset the flag (%ebx=0). Otherwise, set
+	 * the flag only (%ebx=1).
+	 */
+	movl	$1, %ebx
 	READ_MSR_GSBASE save_reg=%edx
-	testl	%edx, %edx	/* Negative -> in kernel */
+	testl	%edx, %edx
 	jns	.Lparanoid_entry_swapgs
 	ret
 
@@ -1202,25 +1246,55 @@ END(paranoid_entry)
  * be complicated.  Fortunately, we there's no good reason
  * to try to handle preemption here.
  *
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On the entry, it is mentioned that why SWAPGS is not working
+ * with FSGSBASE and how GSBASE handling is different between
+ * FSGSBASE and non-FSGSBASE systems. And, therefore, the way
+ * to deal with GSBASE is diverged from there. At this point,
+ * therefore, %rbx or %ebx has different implications depending
+ * on the FSGSBASE availability.
+ *
+ * On FSGSBASE systems, the original GSBASE is always stashed
+ * in %rbx from the entry. At the exit here, it should be
+ * always restored.
+ *
+ * Whereas, on non-FSGSBASE systems, %ebx indicates "no swapgs"
+ * flag. 0 means that SWAPGS is needed here at exit, and 1
+ * means no need.
  */
 ENTRY(paranoid_exit)
 	UNWIND_HINT_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
-	testl	%ebx, %ebx			/* swapgs needed? */
+
+	/*
+	 * As the way of handling GSBASE is already split from the
+	 * entry, GSBASE is dealt here accordingly in two different
+	 * ways.
+	 */
+	ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase",	"nop",\
+		X86_FEATURE_FSGSBASE
+
+	/* On FSGSBASE systems, always restore the stashed GSBASE */
+	wrgsbase	%rbx
+	jmp	.Lparanoid_exit_no_swapgs;
+
+.Lparanoid_exit_no_fsgsbase:
+	/* On non-FSGSBASE systems, conditionally do SWAPGS */
+	testl	%ebx, %ebx
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
 	SWAPGS_UNSAFE_STACK
 	jmp	.Lparanoid_exit_restore
+
 .Lparanoid_exit_no_swapgs:
 	TRACE_IRQS_IRETQ_DEBUG
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
+
 .Lparanoid_exit_restore:
-	jmp restore_regs_and_return_to_kernel
+	jmp	restore_regs_and_return_to_kernel
 END(paranoid_exit)
 
 /*
@@ -1614,7 +1688,7 @@ end_repeat_nmi:
 	pushq	$-1				/* ORIG_RAX: no syscall to restart */
 
 	/*
-	 * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
+	 * Use paranoid_entry to handle GSBASE, but no need to useparanoid_exit
 	 * as we should not be calling schedule in NMI context.
 	 * Even with normal interrupts enabled. An NMI should not be
 	 * setting NEED_RESCHED or anything that normal interrupts and
@@ -1631,10 +1705,35 @@ end_repeat_nmi:
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
-	testl	%ebx, %ebx			/* swapgs needed? */
+	/*
+	 * The paranoid_entry, called in this NMI path, handles
+	 * GSBASE differently depending on the FSGSBASE. And the
+	 * paranoid_exit is not called here, so the part of
+	 * GSBASE handling at the exit should be equally executed,
+	 * not to break the semantics.
+	 *
+	 * On the paranoid_exit, the %rbx/%ebx implications are
+	 * different between FSGSBASE and non-FSGSBASE systems.
+	 * And accordingly the handling path is split into two
+	 * ways.
+	 *
+	 * On non-FSGSBASE systems, SWAPGS is needed only if %ebx=0.
+	 * On FSGSBASE systems, always restore the stashed GSBASE
+	 * from %rbx.
+	 */
+	ALTERNATIVE "jmp nmi_no_fsgsbase", "nop",\
+		X86_FEATURE_FSGSBASE
+
+	wrgsbase	%rbx
+	jmp	nmi_restore
+
+nmi_no_fsgsbase:
+	testl	%ebx, %ebx
 	jnz	nmi_restore
+
 nmi_swapgs:
 	SWAPGS_UNSAFE_STACK
+
 nmi_restore:
 	POP_REGS
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 13/18] x86/fsgsbase/64: Document GSBASE handling in the paranoid path
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (11 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:11   ` [tip:x86/cpu] x86/entry/64: " tip-bot for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 14/18] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

On a FSGSBASE system, the way to handle GSBASE in the paranoid path
will be different from the existing SWAPGS-based. Document the reason
and what is done by the (new) GSBASE handling. In non-paranoid path,
it will keep working exactly like it does today.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 Documentation/x86/entry_64.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
index c1df8eb..2878c56 100644
--- a/Documentation/x86/entry_64.txt
+++ b/Documentation/x86/entry_64.txt
@@ -102,3 +102,20 @@ We try to only use IST entries and the paranoid entry code for vectors
 that absolutely need the more expensive check for the GS base - and we
 generate all 'normal' entry points with the regular (faster) paranoid=0
 variant.
+
+On a FSGSBASE system, however, user space can set GS without kernel
+interaction. It means the value of GS base itself does not imply
+anything, whether a kernel value or a user space value. So, there is
+no longer safe way to check if entering from user mode to kernel mode.
+Instead, this way handles GS base properly with FSGSBASE:
+
+On entry:
+	rdgsbase %rbx
+	GET_PERCPU_BASE %rax   /* see the details in calling.h */
+	wrgsbase %rax
+
+On exit:
+	wrgsbase %rbx
+
+Obviously, for the non-paranoid path, it all keeps working exactly
+likt it does without FSGSBASE.
\ No newline at end of file
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 14/18] selftests/x86/fsgsbase: Test WRGSBASE
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (12 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 13/18] x86/fsgsbase/64: Document GSBASE handling in the paranoid path Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:12   ` [tip:x86/cpu] selftests/x86/fsgsbase: Test RD/WRGSBASE tip-bot for Andy Lutomirski
  2019-05-08 10:02 ` [PATCH v7 15/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE Chang S. Bae
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

This validates that GS and GSBASE are independently preserved across
context switches.

[ chang: Use FSGSBASE instructions directly instead of .byte ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 tools/testing/selftests/x86/fsgsbase.c | 102 ++++++++++++++++++++++++++++++++-
 1 file changed, 99 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index 1c2dda0..daff504 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -26,6 +26,7 @@
 #include <stddef.h>
 #include <sys/ptrace.h>
 #include <sys/wait.h>
+#include <setjmp.h>
 
 #ifndef __x86_64__
 # error This test is 64-bit only
@@ -74,6 +75,43 @@ static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
 
 }
 
+static jmp_buf jmpbuf;
+
+static void sigill(int sig, siginfo_t *si, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static bool have_fsgsbase;
+
+static inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+	return gsbase;
+}
+
+static inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+	return fsbase;
+}
+
+static inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
+static inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
 enum which_base { FS, GS };
 
 static unsigned long read_base(enum which_base which)
@@ -202,14 +240,16 @@ static void do_remote_base()
 	       to_set, hard_zero ? " and clear gs" : "", sel);
 }
 
-void do_unexpected_base(void)
+static __thread int set_thread_area_entry_number = -1;
+
+static void do_unexpected_base(void)
 {
 	/*
 	 * The goal here is to try to arrange for GS == 0, GSBASE !=
 	 * 0, and for the the kernel the think that GSBASE == 0.
 	 *
 	 * To make the test as reliable as possible, this uses
-	 * explicit descriptorss.  (This is not the only way.  This
+	 * explicit descriptors.  (This is not the only way.  This
 	 * could use ARCH_SET_GS with a low, nonzero base, but the
 	 * relevant side effect of ARCH_SET_GS could change.)
 	 */
@@ -242,7 +282,7 @@ void do_unexpected_base(void)
 			MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
 		memcpy(low_desc, &desc, sizeof(desc));
 
-		low_desc->entry_number = -1;
+		low_desc->entry_number = set_thread_area_entry_number;
 
 		/* 32-bit set_thread_area */
 		long ret;
@@ -257,6 +297,8 @@ void do_unexpected_base(void)
 			return;
 		}
 		printf("\tother thread: using GDT slot %d\n", desc.entry_number);
+		set_thread_area_entry_number = desc.entry_number;
+
 		asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)((desc.entry_number << 3) | 0x3)));
 	}
 
@@ -268,6 +310,34 @@ void do_unexpected_base(void)
 	asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)0));
 }
 
+void test_wrbase(unsigned short index, unsigned long base)
+{
+	unsigned short newindex;
+	unsigned long newbase;
+
+	printf("[RUN]\tGS = 0x%hx, GSBASE = 0x%lx\n", index, base);
+
+	asm volatile ("mov %0, %%gs" : : "rm" (index));
+	wrgsbase(base);
+
+	remote_base = 0;
+	ftx = 1;
+	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+	while (ftx != 0)
+		syscall(SYS_futex, &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
+
+	asm volatile ("mov %%gs, %0" : "=rm" (newindex));
+	newbase = rdgsbase();
+
+	if (newindex == index && newbase == base) {
+		printf("[OK]\tIndex and base were preserved\n");
+	} else {
+		printf("[FAIL]\tAfter switch, GS = 0x%hx and GSBASE = 0x%lx\n",
+		       newindex, newbase);
+		nerrs++;
+	}
+}
+
 static void *threadproc(void *ctx)
 {
 	while (1) {
@@ -439,6 +509,17 @@ int main()
 {
 	pthread_t thread;
 
+	/* Probe FSGSBASE */
+	sethandler(SIGILL, sigill, 0);
+	if (sigsetjmp(jmpbuf, 1) == 0) {
+		rdfsbase();
+		have_fsgsbase = true;
+		printf("\tFSGSBASE instructions are enabled\n");
+	} else {
+		printf("\tFSGSBASE instructions are disabled\n");
+	}
+	clearhandler(SIGILL);
+
 	sethandler(SIGSEGV, sigsegv, 0);
 
 	check_gs_value(0);
@@ -485,6 +566,21 @@ int main()
 
 	test_unexpected_base();
 
+	if (have_fsgsbase) {
+		unsigned short ss;
+
+		asm volatile ("mov %%ss, %0" : "=rm" (ss));
+
+		test_wrbase(0, 0);
+		test_wrbase(0, 1);
+		test_wrbase(0, 0x200000000);
+		test_wrbase(0, 0xffffffffffffffff);
+		test_wrbase(ss, 0);
+		test_wrbase(ss, 1);
+		test_wrbase(ss, 0x200000000);
+		test_wrbase(ss, 0xffffffffffffffff);
+	}
+
 	ftx = 3;  /* Kill the thread. */
 	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 15/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (13 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 14/18] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:13   ` [tip:x86/cpu] " tip-bot for Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 16/18] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Andi Kleen

This validates that GS and GSBASE are independently preserved in
ptracer commands.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@linux.intel.com>
---
 tools/testing/selftests/x86/fsgsbase.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index daff504..7cd36be 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -470,7 +470,7 @@ static void test_ptrace_write_gsbase(void)
 	wait(&status);
 
 	if (WSTOPSIG(status) == SIGTRAP) {
-		unsigned long gs;
+		unsigned long gs, base;
 		unsigned long gs_offset = USER_REGS_OFFSET(gs);
 		unsigned long base_offset = USER_REGS_OFFSET(gs_base);
 
@@ -486,6 +486,7 @@ static void test_ptrace_write_gsbase(void)
 			err(1, "PTRACE_POKEUSER");
 
 		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+		base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);
 
 		/*
 		 * In a non-FSGSBASE system, the nonzero selector will load
@@ -496,8 +497,14 @@ static void test_ptrace_write_gsbase(void)
 		if (gs != 0x7) {
 			nerrs++;
 			printf("[FAIL]\tGS changed to %lx\n", gs);
+		} else if (have_fsgsbase && (base != 0xFF)) {
+			nerrs++;
+			printf("[FAIL]\tGSBASE changed to %lx\n", base);
 		} else {
-			printf("[OK]\tGS remained 0x7\n");
+			printf("[OK]\tGS remained 0x7 %s");
+			if (have_fsgsbase)
+				printf("and GSBASE changed to 0xFF");
+			printf("\n");
 		}
 	}
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 16/18] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (14 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 15/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:14   ` [tip:x86/cpu] x86/cpu: Enable FSGSBASE on 64bit " tip-bot for Andy Lutomirski
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  2019-05-08 10:02 ` [PATCH v7 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
  2019-05-08 10:02 ` [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

From: Andy Lutomirski <luto@kernel.org>

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +--
 arch/x86/kernel/cpu/common.c                    | 32 +++++++++++--------------
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7fe1da0..bea4d2b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2827,8 +2827,7 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
-	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
-			replaced with a nofsgsbase flag.
+	nofsgsbase	[X86] Disables FSGSBASE instructions.
 
 	no_console_suspend
 			[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5743cb9..fd7ef49 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -365,21 +365,21 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
 {
-	unsafe_fsgsbase = true;
+	/* Require an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	/* Do not emit a message if the feature is not present. */
+	if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+	pr_info("nofsgsbase: FSGSBASE disabled\n");
 	return 1;
 }
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);
 
 /*
  * Protection Keys are not available in 32-bit mode.
@@ -1361,12 +1361,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-		if (unsafe_fsgsbase)
-			cr4_set_bits(X86_CR4_FSGSBASE);
-		else
-			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
-	}
+	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+		cr4_set_bits(X86_CR4_FSGSBASE);
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (15 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 16/18] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-22 10:14   ` [tip:x86/cpu] " tip-bot for Andi Kleen
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andi Kleen
  2019-05-08 10:02 ` [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae
  17 siblings, 2 replies; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML

From: Andi Kleen <ak@linux.intel.com>

The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.

One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.

So we need to provide a way for the application to discover the kernel
capability.

I used AT_HWCAP2 in the ELF aux vector which is already used by PPC for
similar things. We define a new Linux defined bitmap returned in AT_HWCAP.
Next to MONITOR/MWAIT, bit 1 is reserved for FSGSBASE capability checks.

The application can then access it manually or using the getauxval()
function in newer glibc.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/uapi/asm/hwcap2.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h
index 6ebaae9..c5ce54e 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
 /* MONITOR/MWAIT enabled in Ring 3 */
 #define HWCAP2_RING3MWAIT		(1 << 0)
 
+/* Kernel allows FSGSBASE instructions available in Ring 3 */
+#define HWCAP2_FSGSBASE			BIT(1)
+
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fd7ef49..f8b1d7a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1361,8 +1361,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
 		cr4_set_bits(X86_CR4_FSGSBASE);
+		elf_hwcap2 |= HWCAP2_FSGSBASE;
+	}
 
 	/*
 	 * The vendor-specific functions might have changed features.
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
                   ` (16 preceding siblings ...)
  2019-05-08 10:02 ` [PATCH v7 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
@ 2019-05-08 10:02 ` Chang S. Bae
  2019-06-14  6:54   ` Thomas Gleixner
  17 siblings, 1 reply; 63+ messages in thread
From: Chang S. Bae @ 2019-05-08 10:02 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: Ravi Shankar, Chang S . Bae, LKML, Randy Dunlap

From: Andi Kleen <ak@linux.intel.com>

v2: Minor updates to documentation requested in review.
v3: Update for new gcc and various improvements.
v4: Address the typos pointed by Randy Dunlap

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 Documentation/x86/fsgs.txt | 103 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 Documentation/x86/fsgs.txt

diff --git a/Documentation/x86/fsgs.txt b/Documentation/x86/fsgs.txt
new file mode 100644
index 0000000..a6e2e38
--- /dev/null
+++ b/Documentation/x86/fsgs.txt
@@ -0,0 +1,103 @@
+
+Using FS and GS prefixes on 64-bit x86 linux
+
+The x86 architecture supports segment prefixes per instruction to add an
+offset to an address.  On 64-bit x86, these are mostly nops, except for FS
+and GS.
+
+This offers an efficient way to reference a global pointer.
+
+The compiler has to generate special code to use these base registers,
+or they can be accessed with inline assembler.
+
+	mov %gs:offset,%reg
+	mov %fs:offset,%reg
+
+On 64-bit code, FS is used to address the thread local segment (TLS), declared
+using thread. The compiler then automatically generates the correct prefixes
+and relocations to access these values.
+
+FS is normally managed by the runtime code or the threading library.
+Overwriting it can break a lot of things (including syscalls and gdb),
+but it can make sense to save/restore it for threading purposes.
+
+GS is freely available, but may need special (compiler or inline assembler)
+code to use.
+
+Traditionally 64-bit FS and GS could be set by the arch_prctl system call
+
+	arch_prctl(ARCH_SET_GS, value)
+	arch_prctl(ARCH_SET_FS, value)
+
+[There was also an older method using modify_ldt(), inherited from 32-bit,
+but this is not discussed here.]
+
+However, using a syscall is problematic for user space threading libraries
+that want to context switch in user space. The whole point of them
+is avoiding the overhead of a syscall. It's also cleaner for compilers
+wanting to use the extra register to use instructions to write
+it, or read it directly to compute addresses and offsets.
+
+Newer Intel CPUs (Ivy Bridge and later) added new instructions to directly
+access these registers quickly from user context:
+
+	RDFSBASE %reg	read the FS base	(or _readfsbase_u64)
+	RDGSBASE %reg	read the GS base	(or _readgsbase_u64)
+
+	WRFSBASE %reg	write the FS base	(or _writefsbase_u64)
+	WRGSBASE %reg	write the GS base	(or _writegsbase_u64)
+
+If you use the intrinsics, include <immintrin.h> and set the -mfsgsbase option.
+
+The instructions are supported by the CPU when the "fsgsbase" string is shown
+in /proc/cpuinfo (or directly retrieved through the CPUID instruction,
+7:0 (ebx), word 9, bit 0).
+
+The instructions are only available to 64-bit binaries.
+
+In addition the kernel needs to explicitly enable these instructions, as it
+may otherwise not correctly context switch the state. Newer Linux
+kernels enable this. When the kernel does not enable the instruction
+they will fault with a #UD exception.
+
+An FSGSBASE-enabled kernel can be detected by checking the AT_HWCAP2
+bitmask in the aux vector. When the HWCAP2_FSGSBASE bit is set the
+kernel supports FSGSBASE.
+
+	#include <sys/auxv.h>
+	#include <elf.h>
+
+	/* Will be eventually in asm/hwcap.h */
+	#define HWCAP2_FSGSBASE        (1 << 1)
+
+        unsigned val = getauxval(AT_HWCAP2);
+        if (val & HWCAP2_FSGSBASE) {
+                asm("wrgsbase %0" :: "r" (ptr));
+        }
+
+No extra CPUID check is needed as the kernel will not set this bit if the CPU
+does not support it.
+
+gcc 6 has special support to directly access data relative to fs/gs using the
+__seg_fs and __seg_gs address space pointer modifiers.
+
+#ifndef __SEG_GS
+#error "Need gcc 6 or later"
+#endif
+
+struct gsdata {
+	int a;
+	int b;
+} gsdata = { 1, 2 };
+
+int __seg_gs *valp = 0;		/* offset relative to GS */
+
+	/* Check if kernel supports FSGSBASE as above */
+
+	/* Set up new GS */
+	asm("wrgsbase %0" :: "r" (&gsdata));
+
+	/* Now the global pointer can be used normally */
+	printf("gsdata.a = %d\n", *valp);
+
+Andi Kleen
-- 
2.7.4


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-05-08 10:02 ` [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae
@ 2019-06-14  6:54   ` Thomas Gleixner
  2019-06-14 20:07     ` Bae, Chang Seok
                       ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-14  6:54 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Ravi Shankar, LKML, Randy Dunlap, x86, Jonathan Corbet

On Wed, 8 May 2019, Chang S. Bae wrote:

> Subject: x86/fsgsbase/64: Add documentation for FSGSBASE

The proper prefix is Documentation/x86: 

> From: Andi Kleen <ak@linux.intel.com>
> 
> v2: Minor updates to documentation requested in review.
> v3: Update for new gcc and various improvements.
> v4: Address the typos pointed by Randy Dunlap

Please move the vX annoations below the --- marker so they are stripped out
automatically and I don't have to do it manually. They are not part of the
final changelog.
 
>  Documentation/x86/fsgs.txt | 103 +++++++++++++++++++++++++++++++++++++++++++++

The x86 documentation got converted to RST recently. Also as this is a
64bit specific documentation it belongs into Documentation/x86/x86_64 and
not into the generic x86 directory.

> +++ b/Documentation/x86/fsgs.txt
> @@ -0,0 +1,103 @@
> +

Documentation files require a SPDX license identifier as any other file.

> +Using FS and GS prefixes on 64-bit x86 linux

Moving this into the 64 bit specific folder spares all the 'oh this is
64bit only' notices all over the place. 

> +
> +The x86 architecture supports segment prefixes per instruction to add an

per instruction? It's only for instructions which access memory, not for
instructions which are purely register based.

> +offset to an address.  On 64-bit x86, these are mostly nops, except for FS
> +and GS.
> +
> +This offers an efficient way to reference a global pointer.

That sentence does not make any sense. What has this to do with global
pointers?

> +The compiler has to generate special code to use these base registers,
> +or they can be accessed with inline assembler.
> +
> +	mov %gs:offset,%reg
> +	mov %fs:offset,%reg
> +
> +On 64-bit code, FS is used to address the thread local segment (TLS), declared

TLS is Thread Local Storage not Segment.

> +using thread. The compiler then automatically generates the correct prefixes

What means: declared using thread? I assume you meant declared with the
__thread storage class specifier. If so, why not using the proper technical
terms?

> +and relocations to access these values.
> +
> +FS is normally managed by the runtime code or the threading library.
> +Overwriting it can break a lot of things (including syscalls and gdb),
> +but it can make sense to save/restore it for threading purposes.
> +
> +GS is freely available, but may need special (compiler or inline assembler)
> +code to use.
> +
> +Traditionally 64-bit FS and GS could be set by the arch_prctl system call

I don't see a tradition here and 'could' is just wrong.

> +
> +	arch_prctl(ARCH_SET_GS, value)
> +	arch_prctl(ARCH_SET_FS, value)
> +
> +[There was also an older method using modify_ldt(), inherited from 32-bit,
> +but this is not discussed here.]

So why is it even mentioned when it's not longer existing?

> +However, using a syscall is problematic for user space threading libraries
> +that want to context switch in user space. The whole point of them
> +is avoiding the overhead of a syscall.

User space threading libraries are one particular use case and not really
interesting for documenting this functionality. Documentation is about the
concepts and not about what a particular usecase prefers.

> It's also cleaner for compilers
> +wanting to use the extra register to use instructions to write
> +it, or read it directly to compute addresses and offsets.

I don't see the value of this either.

> +Newer Intel CPUs (Ivy Bridge and later) added new instructions to directly
> +access these registers quickly from user context:

The CPUs added new instructions?

> +	RDFSBASE %reg	read the FS base	(or _readfsbase_u64)
> +	RDGSBASE %reg	read the GS base	(or _readgsbase_u64)
> +
> +	WRFSBASE %reg	write the FS base	(or _writefsbase_u64)
> +	WRGSBASE %reg	write the GS base	(or _writegsbase_u64)
> +
> +If you use the intrinsics, include <immintrin.h> and set the -mfsgsbase option.
> +
> +The instructions are supported by the CPU when the "fsgsbase" string is shown
> +in /proc/cpuinfo (or directly retrieved through the CPUID instruction,
> +7:0 (ebx), word 9, bit 0).
> +
> +The instructions are only available to 64-bit binaries.
> +
> +In addition the kernel needs to explicitly enable these instructions, as it
> +may otherwise not correctly context switch the state. Newer Linux
> +kernels enable this. When the kernel does not enable the instruction
> +they will fault with a #UD exception.

.....

This is completely unstructured information hastily cobbled together.

As time is pressing for the 5.3 merge window, I reworked the documentation
as below. Please review and comment ASAP so I can merge the whole lot.

Thanks,

	tglx

8<------------------
From: Thomas Gleixner <tglx@linutronix.de>
Subject: Documentation/x86/64: Add documentation for GS/FS addressing mode
Date: Thu,  13 Jun 2019 22:04:24 +0300


Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/x86/x86_64/fsgs.rst  |  200 +++++++++++++++++++++++++++++++++++++
 Documentation/x86/x86_64/index.rst |    1 
 2 files changed, 201 insertions(+)
 create mode 100644 Documentation/x86/fsgs.txt

--- /dev/null
+++ b/Documentation/x86/x86_64/fsgs.rst
@@ -0,0 +1,200 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Using FS and GS segments in user space applications
+===================================================
+
+The x86 architecture supports segmentation. Instructions which access
+memory can use segment register based addressing mode. The following
+notation is used to address a byte within a segment:
+
+  Segment-register:Byte-address
+
+The segment base address is added to the Byte-address to compute the
+resulting virtual address which is accessed. This allows to access multiple
+instances of data with the identical Byte-address, i.e. the same code. The
+selection of a particular instance is purely based on the base-address in
+the segment register.
+
+In 32-bit mode the CPU provides 6 segments, which also support segment
+limits. The limits can be used to enforce address space protections.
+
+In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
+always 0 to provide a full 64bit address space. The FS and GS segments are
+still functional in 64-bit mode.
+
+Common FS and GS usage
+------------------------------
+
+The FS segment is commonly used to address Thread Local Storage (TLS). FS
+is usually managed by runtime code or a threading library. Variables
+declared with the '__thread' storage class specifier are instantiated per
+thread and the compiler emits the FS: address prefix for accesses to these
+variables. Each thread has its own FS base address so common code can be
+used without complex address offset calculations to access the per thread
+instances. Applications should not use FS for other purposes when they use
+runtimes or threading libraries which manage the per thread FS.
+
+The GS segment has no common use and can be used freely by
+applications. There is no storage class specifier similar to __thread which
+would cause the compiler to use GS based addressing modes. Newer versions
+of GCC and Clang support GS based addressing via address space identifiers.
+
+
+Reading and writing the FS/GS base address
+------------------------------------------
+
+There exist two mechanisms to read and write the FS/FS base address:
+
+ - the arch_prctl() system call
+
+ - the FSGSBASE instruction family
+
+Accessing FS/GS base with arch_prctl()
+--------------------------------------
+
+ The arch_prctl(2) based mechanism is available on all 64bit CPUs and all
+ kernel versions.
+
+ Reading the base:
+
+   arch_prctl(ARCH_GET_FS, &fsbase);
+   arch_prctl(ARCH_GET_GS, &gsbase);
+
+ Writing the base:
+
+   arch_prctl(ARCH_SET_FS, fsbase);
+   arch_prctl(ARCH_SET_GS, gsbase);
+
+ The ARCH_SET_GS prctl may be disabled depending on kernel configuration
+ and security settings.
+
+Accessing FS/GS base with the FSGSBASE instructions
+---------------------------------------------------
+
+ With the Ivy Bridge CPU generation Intel introduced a new set of
+ instructions to access the FS and GS base registers directly from user
+ space. These instructions are also supported on AMD Family 17H CPUs. The
+ following instructions are available:
+
+  =============== ===========================
+  RDFSBASE %reg   Read the FS base register
+  RDGSBASE %reg   Read the GS base register
+  WRFSBASE %reg   Write the FS base register
+  WRGSBASE %reg   Write the GS base register
+  =============== ===========================
+
+ The instructions avoid the overhead of the arch_prctl() syscall and allow
+ more flexible usage of the FS/GS addressing modes in user space
+ applications. This does not prevent conflicts between threading libraries
+ and runtimes which utilize FS and applications which want to use it for
+ their own purpose.
+
+FSGSBASE instructions enablement
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
+ available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
+
+ The availability of the instructions is not enabling them
+ automatically. The kernel has to enable them explicitely in CR4. The
+ reason for this is that older kernels make assumptions about the values in
+ the GS register and enforce them when GS base is set via
+ arch_prctl(). Allowing user space to write arbitrary values to GS base
+ would violate these assumptions and cause malfunction.
+
+ On kernels which do not enable FSGSBASE the execution of the FSGSBASE
+ instructions will fault with a #UD exception.
+
+ The kernel provides reliable information about the enabled state in the
+ ELF AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the
+ kernel has FSGSBASE instructions enabled and applications can use them.
+ The following code example shows how this detection works::
+
+   #include <sys/auxv.h>
+   #include <elf.h>
+
+   /* Will be eventually in asm/hwcap.h */
+   #ifndef HWCAP2_FSGSBASE
+   #define HWCAP2_FSGSBASE        (1 << 1)
+   #endif
+
+   ....
+
+   unsigned val = getauxval(AT_HWCAP2);
+
+   if (val & HWCAP2_FSGSBASE)
+        printf("FSGSBASE enabled\n");
+
+FSGSBASE instructions compiler support
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+GCC version 6 and newer provide instrinsics for the FSGSBASE
+instructions. Clang supports them as well.
+
+  =================== ===========================
+  _readfsbase_u64()   Read the FS base register
+  _readfsbase_u64()   Read the GS base register
+  _writefsbase_u64()  Write the FS base register
+  _writegsbase_u64()  Write the GS base register
+  =================== ===========================
+
+To utilize these instrinsics <immintrin.h> must be included in the source
+code and the compiler option -mfsgsbase has to be added.
+
+Compiler support for FS/GS based addressing
+-------------------------------------------
+
+GCC version 6 and newer provide support for FS/GS based addressing via
+Named Address Spaces. GCC implements the following address space
+identifiers for x86:
+
+  ========= ====================================
+  __seg_fs  Variable is addressed relative to FS
+  __seg_gs  Variable is addressed relative to GS
+  ========= ====================================
+
+The preprocessor symbols __SEG_FS and __SEG_GS are defined when these
+address spaces are supported. Code which implements fallback modes should
+check whether these symbols are defined. Usage example::
+
+  #ifdef __SEG_GS
+
+  long data0 = 0;
+  long data1 = 1;
+
+  long __seg_gs *ptr;
+
+  /* Check whether FSGSBASE is enabled by the kernel (HWCAP2_FSGSBASE) */
+  ....
+
+  /* Set GS to point to data0 */
+  _writegsbase_u64(&data0);
+
+  /* Access offset 0 of GS */
+  ptr = 0;
+  print("data0 = %ld\n", *ptr);
+
+  /* Set GS to point to data1 */
+  _writegsbase_u64(&data1);
+  /* ptr still addresses offset 0! */
+  print("data1 = %ld\n", *ptr);
+
+
+Clang does not provide these address space identifiers, but it provides
+an attribute based mechanism:
+
+ ==================================== =====================================
+  __attribute__((address_space(256))  Variable is addressed relative to GS
+  __attribute__((address_space(257))  Variable is addressed relative to FS
+ ==================================== =====================================
+
+FS/GS based addressing with inline assembly
+-------------------------------------------
+
+In case the compiler does not support address spaces, inline assembly can
+be used for FS/GS based addressing mode::
+
+	mov %fs:offset, %reg
+	mov %gs:offset, %reg
+
+	mov %reg, %fs:offset
+	mov %reg, %gs:offset
--- a/Documentation/x86/x86_64/index.rst
+++ b/Documentation/x86/x86_64/index.rst
@@ -14,3 +14,4 @@ x86_64 Support
    fake-numa-for-cpusets
    cpu-hotplug-spec
    machinecheck
+   fsgs




^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-14  6:54   ` Thomas Gleixner
@ 2019-06-14 20:07     ` Bae, Chang Seok
       [not found]       ` <89BE934A-A392-4CED-83E5-CA4FADDAE6DF@intel.com>
  2019-06-16 15:54     ` Randy Dunlap
  2019-06-22 10:15     ` [tip:x86/cpu] Documentation/x86/64: Add documentation for GS/FS addressing mode tip-bot for Thomas Gleixner
  2 siblings, 1 reply; 63+ messages in thread
From: Bae, Chang Seok @ 2019-06-14 20:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet


> On Jun 13, 2019, at 23:54, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Wed, 8 May 2019, Chang S. Bae wrote:
> 
>> Subject: x86/fsgsbase/64: Add documentation for FSGSBASE
> 
> The proper prefix is Documentation/x86: 
> 
>> From: Andi Kleen <ak@linux.intel.com>
>> 
>> v2: Minor updates to documentation requested in review.
>> v3: Update for new gcc and various improvements.
>> v4: Address the typos pointed by Randy Dunlap
> 
> Please move the vX annoations below the --- marker so they are stripped out
> automatically and I don't have to do it manually. They are not part of the
> final changelog.
> 
>> Documentation/x86/fsgs.txt | 103 +++++++++++++++++++++++++++++++++++++++++++++
> 
> The x86 documentation got converted to RST recently. Also as this is a
> 64bit specific documentation it belongs into Documentation/x86/x86_64 and
> not into the generic x86 directory.
> 
>> +++ b/Documentation/x86/fsgs.txt
>> @@ -0,0 +1,103 @@
>> +
> 
> Documentation files require a SPDX license identifier as any other file.
> 
>> +Using FS and GS prefixes on 64-bit x86 linux
> 
> Moving this into the 64 bit specific folder spares all the 'oh this is
> 64bit only' notices all over the place. 
> 
>> +
>> +The x86 architecture supports segment prefixes per instruction to add an
> 
> per instruction? It's only for instructions which access memory, not for
> instructions which are purely register based.
> 
>> +offset to an address.  On 64-bit x86, these are mostly nops, except for FS
>> +and GS.
>> +
>> +This offers an efficient way to reference a global pointer.
> 
> That sentence does not make any sense. What has this to do with global
> pointers?
> 
>> +The compiler has to generate special code to use these base registers,
>> +or they can be accessed with inline assembler.
>> +
>> +	mov %gs:offset,%reg
>> +	mov %fs:offset,%reg
>> +
>> +On 64-bit code, FS is used to address the thread local segment (TLS), declared
> 
> TLS is Thread Local Storage not Segment.
> 
>> +using thread. The compiler then automatically generates the correct prefixes
> 
> What means: declared using thread? I assume you meant declared with the
> __thread storage class specifier. If so, why not using the proper technical
> terms?
> 
>> +and relocations to access these values.
>> +
>> +FS is normally managed by the runtime code or the threading library.
>> +Overwriting it can break a lot of things (including syscalls and gdb),
>> +but it can make sense to save/restore it for threading purposes.
>> +
>> +GS is freely available, but may need special (compiler or inline assembler)
>> +code to use.
>> +
>> +Traditionally 64-bit FS and GS could be set by the arch_prctl system call
> 
> I don't see a tradition here and 'could' is just wrong.
> 
>> +
>> +	arch_prctl(ARCH_SET_GS, value)
>> +	arch_prctl(ARCH_SET_FS, value)
>> +
>> +[There was also an older method using modify_ldt(), inherited from 32-bit,
>> +but this is not discussed here.]
> 
> So why is it even mentioned when it's not longer existing?
> 
>> +However, using a syscall is problematic for user space threading libraries
>> +that want to context switch in user space. The whole point of them
>> +is avoiding the overhead of a syscall.
> 
> User space threading libraries are one particular use case and not really
> interesting for documenting this functionality. Documentation is about the
> concepts and not about what a particular usecase prefers.
> 
>> It's also cleaner for compilers
>> +wanting to use the extra register to use instructions to write
>> +it, or read it directly to compute addresses and offsets.
> 
> I don't see the value of this either.
> 
>> +Newer Intel CPUs (Ivy Bridge and later) added new instructions to directly
>> +access these registers quickly from user context:
> 
> The CPUs added new instructions?
> 
>> +	RDFSBASE %reg	read the FS base	(or _readfsbase_u64)
>> +	RDGSBASE %reg	read the GS base	(or _readgsbase_u64)
>> +
>> +	WRFSBASE %reg	write the FS base	(or _writefsbase_u64)
>> +	WRGSBASE %reg	write the GS base	(or _writegsbase_u64)
>> +
>> +If you use the intrinsics, include <immintrin.h> and set the -mfsgsbase option.
>> +
>> +The instructions are supported by the CPU when the "fsgsbase" string is shown
>> +in /proc/cpuinfo (or directly retrieved through the CPUID instruction,
>> +7:0 (ebx), word 9, bit 0).
>> +
>> +The instructions are only available to 64-bit binaries.
>> +
>> +In addition the kernel needs to explicitly enable these instructions, as it
>> +may otherwise not correctly context switch the state. Newer Linux
>> +kernels enable this. When the kernel does not enable the instruction
>> +they will fault with a #UD exception.
> 
> .....
> 
> This is completely unstructured information hastily cobbled together.
> 
> As time is pressing for the 5.3 merge window, I reworked the documentation
> as below. Please review and comment ASAP so I can merge the whole lot.
> 
> Thanks,
> 
> 	tglx
> 
> 8<------------------
> From: Thomas Gleixner <tglx@linutronix.de>
> Subject: Documentation/x86/64: Add documentation for GS/FS addressing mode
> Date: Thu,  13 Jun 2019 22:04:24 +0300
> 
> 
> Originally-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> Documentation/x86/x86_64/fsgs.rst  |  200 +++++++++++++++++++++++++++++++++++++
> Documentation/x86/x86_64/index.rst |    1 
> 2 files changed, 201 insertions(+)
> create mode 100644 Documentation/x86/fsgs.txt
> 
> --- /dev/null
> +++ b/Documentation/x86/x86_64/fsgs.rst
> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Using FS and GS segments in user space applications
> +===================================================
> +
> +The x86 architecture supports segmentation. Instructions which access
> +memory can use segment register based addressing mode. The following
> +notation is used to address a byte within a segment:
> +
> +  Segment-register:Byte-address
> +
> +The segment base address is added to the Byte-address to compute the
> +resulting virtual address which is accessed. This allows to access multiple
> +instances of data with the identical Byte-address, i.e. the same code. The
> +selection of a particular instance is purely based on the base-address in
> +the segment register.
> +
> +In 32-bit mode the CPU provides 6 segments, which also support segment
> +limits. The limits can be used to enforce address space protections.
> +
> +In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
> +always 0 to provide a full 64bit address space. The FS and GS segments are
> +still functional in 64-bit mode.
> +
> +Common FS and GS usage
> +------------------------------
> +
> +The FS segment is commonly used to address Thread Local Storage (TLS). FS
> +is usually managed by runtime code or a threading library. Variables
> +declared with the '__thread' storage class specifier are instantiated per
> +thread and the compiler emits the FS: address prefix for accesses to these
> +variables. Each thread has its own FS base address so common code can be
> +used without complex address offset calculations to access the per thread
> +instances. Applications should not use FS for other purposes when they use
> +runtimes or threading libraries which manage the per thread FS.
> +
> +The GS segment has no common use and can be used freely by
> +applications. There is no storage class specifier similar to __thread which
> +would cause the compiler to use GS based addressing modes. Newer versions
> +of GCC and Clang support GS based addressing via address space identifiers.
> +
> +
> +Reading and writing the FS/GS base address
> +------------------------------------------
> +
> +There exist two mechanisms to read and write the FS/FS base address:
> +
> + - the arch_prctl() system call
> +
> + - the FSGSBASE instruction family
> +
> +Accessing FS/GS base with arch_prctl()
> +--------------------------------------
> +
> + The arch_prctl(2) based mechanism is available on all 64bit CPUs and all
> + kernel versions.
> +
> + Reading the base:
> +
> +   arch_prctl(ARCH_GET_FS, &fsbase);
> +   arch_prctl(ARCH_GET_GS, &gsbase);
> +
> + Writing the base:
> +
> +   arch_prctl(ARCH_SET_FS, fsbase);
> +   arch_prctl(ARCH_SET_GS, gsbase);
> +
> + The ARCH_SET_GS prctl may be disabled depending on kernel configuration
> + and security settings.
> +
> +Accessing FS/GS base with the FSGSBASE instructions
> +---------------------------------------------------
> +
> + With the Ivy Bridge CPU generation Intel introduced a new set of
> + instructions to access the FS and GS base registers directly from user
> + space. These instructions are also supported on AMD Family 17H CPUs. The
> + following instructions are available:
> +
> +  =============== ===========================
> +  RDFSBASE %reg   Read the FS base register
> +  RDGSBASE %reg   Read the GS base register
> +  WRFSBASE %reg   Write the FS base register
> +  WRGSBASE %reg   Write the GS base register
> +  =============== ===========================
> +
> + The instructions avoid the overhead of the arch_prctl() syscall and allow
> + more flexible usage of the FS/GS addressing modes in user space
> + applications. This does not prevent conflicts between threading libraries
> + and runtimes which utilize FS and applications which want to use it for
> + their own purpose.
> +
> +FSGSBASE instructions enablement
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> + The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
> + available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
> +
> + The availability of the instructions is not enabling them
> + automatically. The kernel has to enable them explicitely in CR4. The
> + reason for this is that older kernels make assumptions about the values in
> + the GS register and enforce them when GS base is set via
> + arch_prctl(). Allowing user space to write arbitrary values to GS base
> + would violate these assumptions and cause malfunction.
> +
> + On kernels which do not enable FSGSBASE the execution of the FSGSBASE
> + instructions will fault with a #UD exception.
> +
> + The kernel provides reliable information about the enabled state in the
> + ELF AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the
> + kernel has FSGSBASE instructions enabled and applications can use them.
> + The following code example shows how this detection works::
> +
> +   #include <sys/auxv.h>
> +   #include <elf.h>
> +
> +   /* Will be eventually in asm/hwcap.h */
> +   #ifndef HWCAP2_FSGSBASE
> +   #define HWCAP2_FSGSBASE        (1 << 1)
> +   #endif
> +
> +   ....
> +
> +   unsigned val = getauxval(AT_HWCAP2);
> +
> +   if (val & HWCAP2_FSGSBASE)
> +        printf("FSGSBASE enabled\n");
> +
> +FSGSBASE instructions compiler support
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +GCC version 6 and newer provide instrinsics for the FSGSBASE
> +instructions. Clang supports them as well.
> +

Looks like GCC-4.6.4 and Clang 5 begin to support the intrinsics with -mfsgsbase,
according to their documentations:
	GCC - https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/X86-Built_002din-Functions.html#X86-Built_002din-Functions
	Clang - https://releases.llvm.org/5.0.0/tools/clang/docs/ClangCommandLineReference.html

> +  =================== ===========================
> +  _readfsbase_u64()   Read the FS base register
> +  _readfsbase_u64()   Read the GS base register
> +  _writefsbase_u64()  Write the FS base register
> +  _writegsbase_u64()  Write the GS base register
> +  =================== ===========================
> +
> +To utilize these instrinsics <immintrin.h> must be included in the source
> +code and the compiler option -mfsgsbase has to be added.
> +
> +Compiler support for FS/GS based addressing
> +-------------------------------------------
> +
> +GCC version 6 and newer provide support for FS/GS based addressing via
> +Named Address Spaces. GCC implements the following address space
> +identifiers for x86:
> +
> +  ========= ====================================
> +  __seg_fs  Variable is addressed relative to FS
> +  __seg_gs  Variable is addressed relative to GS
> +  ========= ====================================
> +
> +The preprocessor symbols __SEG_FS and __SEG_GS are defined when these
> +address spaces are supported. Code which implements fallback modes should
> +check whether these symbols are defined. Usage example::
> +
> +  #ifdef __SEG_GS
> +
> +  long data0 = 0;
> +  long data1 = 1;
> +
> +  long __seg_gs *ptr;
> +
> +  /* Check whether FSGSBASE is enabled by the kernel (HWCAP2_FSGSBASE) */
> +  ....
> +
> +  /* Set GS to point to data0 */
> +  _writegsbase_u64(&data0);
> +
> +  /* Access offset 0 of GS */
> +  ptr = 0;
> +  print("data0 = %ld\n", *ptr);
> +
> +  /* Set GS to point to data1 */
> +  _writegsbase_u64(&data1);
> +  /* ptr still addresses offset 0! */
> +  print("data1 = %ld\n", *ptr);
> +

s/print/printf/

Tested with GCC and Clang.

> +
> +Clang does not provide these address space identifiers, but it provides
> +an attribute based mechanism:
> +
> + ==================================== =====================================
> +  __attribute__((address_space(256))  Variable is addressed relative to GS
> +  __attribute__((address_space(257))  Variable is addressed relative to FS
> + ==================================== =====================================
> +
> +FS/GS based addressing with inline assembly
> +-------------------------------------------
> +
> +In case the compiler does not support address spaces, inline assembly can
> +be used for FS/GS based addressing mode::
> +
> +	mov %fs:offset, %reg
> +	mov %gs:offset, %reg
> +
> +	mov %reg, %fs:offset
> +	mov %reg, %gs:offset
> --- a/Documentation/x86/x86_64/index.rst
> +++ b/Documentation/x86/x86_64/index.rst
> @@ -14,3 +14,4 @@ x86_64 Support
>    fake-numa-for-cpusets
>    cpu-hotplug-spec
>    machinecheck
> +   fsgs
> 
> 
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
       [not found]       ` <89BE934A-A392-4CED-83E5-CA4FADDAE6DF@intel.com>
@ 2019-06-16  8:39         ` Thomas Gleixner
  2019-06-16 12:34           ` Thomas Gleixner
  0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-16  8:39 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 1066 bytes --]

On Sun, 16 Jun 2019, Bae, Chang Seok wrote:
> On Jun 14, 2019, at 13:07, Bae, Chang Seok <chang.seok.bae@intel.com<mailto:chang.seok.bae@intel.com>> wrote:
> 
> 
> On Jun 13, 2019, at 23:54, Thomas Gleixner <tglx@linutronix.de<mailto:tglx@linutronix.de>> wrote:
> 
> +The GS segment has no common use and can be used freely by
> +applications. There is no storage class specifier similar to __thread which
> +would cause the compiler to use GS based addressing modes. Newer versions
> +of GCC and Clang support GS based addressing via address space identifiers.
> +
> 
> …
> 
> +
> +Clang does not provide these address space identifiers, but it provides
> +an attribute based mechanism:
> +
> 
> These two sentences seem to conflict with each other; Clang needs to be clarified
> above.
> 
> Thanks for the write-up. Just preparing v8 right now. Will send out shortly.

Please dont. Send me a delta patch against the documentation. I have queued
all the other patches already internally. I did not push it out because I
wanted to have proper docs.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-16  8:39         ` Thomas Gleixner
@ 2019-06-16 12:34           ` Thomas Gleixner
  2019-06-16 15:34             ` Bae, Chang Seok
  0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-16 12:34 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet

On Sun, 16 Jun 2019, Thomas Gleixner wrote:
> On Sun, 16 Jun 2019, Bae, Chang Seok wrote:
> > On Jun 14, 2019, at 13:07, Bae, Chang Seok <chang.seok.bae@intel.com<mailto:chang.seok.bae@intel.com>> wrote:
> > 
> > 
> > On Jun 13, 2019, at 23:54, Thomas Gleixner <tglx@linutronix.de<mailto:tglx@linutronix.de>> wrote:
> > 
> > +The GS segment has no common use and can be used freely by
> > +applications. There is no storage class specifier similar to __thread which
> > +would cause the compiler to use GS based addressing modes. Newer versions
> > +of GCC and Clang support GS based addressing via address space identifiers.
> > +
> > +Clang does not provide these address space identifiers, but it provides
> > +an attribute based mechanism:
> > +
> > 
> > These two sentences seem to conflict with each other; Clang needs to be clarified
> > above.
> > 
> > Thanks for the write-up. Just preparing v8 right now. Will send out shortly.
> 
> Please dont. Send me a delta patch against the documentation. I have queued
> all the other patches already internally. I did not push it out because I
> wanted to have proper docs.

Fixed it up already. About to push it out.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-16 12:34           ` Thomas Gleixner
@ 2019-06-16 15:34             ` Bae, Chang Seok
  2019-06-16 16:05               ` Thomas Gleixner
  0 siblings, 1 reply; 63+ messages in thread
From: Bae, Chang Seok @ 2019-06-16 15:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet


> On Jun 16, 2019, at 05:34, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sun, 16 Jun 2019, Thomas Gleixner wrote:
>> 
>> Please dont. Send me a delta patch against the documentation. I have queued
>> all the other patches already internally. I did not push it out because I
>> wanted to have proper docs.
> 
> Fixed it up already. About to push it out.
> 

Thanks. This is the diff though.

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 22992c8377952..f667087792747 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -118,7 +118,7 @@ static __always_inline bool should_resched(int preempt_offset)
 
 	/* preempt count == 0 ? */
 	tmp &= ~PREEMPT_NEED_RESCHED;
-	if (tmp)
+	if (tmp != preempt_offset)
 		return false;
 	if (current_thread_info()->preempt_lazy_count)
 		return false;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c15583162a559..25bcf2f2714ba 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -92,6 +92,34 @@ static inline void softirq_clr_runner(unsigned int sirq)
 	sr->runner[sirq] = NULL;
 }
 
+static bool softirq_check_runner_tsk(struct task_struct *tsk,
+				     unsigned int *pending)
+{
+	bool ret = false;
+
+	if (!tsk)
+		return ret;
+
+	/*
+	 * The wakeup code in rtmutex.c wakes up the task
+	 * _before_ it sets pi_blocked_on to NULL under
+	 * tsk->pi_lock. So we need to check for both: state
+	 * and pi_blocked_on.
+	 * The test against UNINTERRUPTIBLE + ->sleeping_lock is in case the
+	 * task does cpu_chill().
+	 */
+	raw_spin_lock(&tsk->pi_lock);
+	if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING ||
+	    (tsk->state == TASK_UNINTERRUPTIBLE && tsk->sleeping_lock)) {
+		/* Clear all bits pending in that task */
+		*pending &= ~(tsk->softirqs_raised);
+		ret = true;
+	}
+	raw_spin_unlock(&tsk->pi_lock);
+
+	return ret;
+}
+
 /*
  * On preempt-rt a softirq running context might be blocked on a
  * lock. There might be no other runnable task on this CPU because the
@@ -104,6 +132,7 @@ static inline void softirq_clr_runner(unsigned int sirq)
  */
 void softirq_check_pending_idle(void)
 {
+	struct task_struct *tsk;
 	static int rate_limit;
 	struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
 	u32 warnpending;
@@ -113,24 +142,23 @@ void softirq_check_pending_idle(void)
 		return;
 
 	warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
+	if (!warnpending)
+		return;
 	for (i = 0; i < NR_SOFTIRQS; i++) {
-		struct task_struct *tsk = sr->runner[i];
+		tsk = sr->runner[i];
 
-		/*
-		 * The wakeup code in rtmutex.c wakes up the task
-		 * _before_ it sets pi_blocked_on to NULL under
-		 * tsk->pi_lock. So we need to check for both: state
-		 * and pi_blocked_on.
-		 */
-		if (tsk) {
-			raw_spin_lock(&tsk->pi_lock);
-			if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING) {
-				/* Clear all bits pending in that task */
-				warnpending &= ~(tsk->softirqs_raised);
-				warnpending &= ~(1 << i);
-			}
-			raw_spin_unlock(&tsk->pi_lock);
-		}
+		if (softirq_check_runner_tsk(tsk, &warnpending))
+			warnpending &= ~(1 << i);
+	}
+
+	if (warnpending) {
+		tsk = __this_cpu_read(ksoftirqd);
+		softirq_check_runner_tsk(tsk, &warnpending);
+	}
+
+	if (warnpending) {
+		tsk = __this_cpu_read(ktimer_softirqd);
+		softirq_check_runner_tsk(tsk, &warnpending);
 	}
 
 	if (warnpending) {
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 851b2134e77f4..6f2736ec4b8ef 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1902,15 +1902,18 @@ void cpu_chill(void)
 {
 	ktime_t chill_time;
 	unsigned int freeze_flag = current->flags & PF_NOFREEZE;
+	long saved_state;
 
+	saved_state = current->state;
 	chill_time = ktime_set(0, NSEC_PER_MSEC);
-	set_current_state(TASK_UNINTERRUPTIBLE);
+	__set_current_state_no_track(TASK_UNINTERRUPTIBLE);
 	current->flags |= PF_NOFREEZE;
 	sleeping_lock_inc();
 	schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
 	sleeping_lock_dec();
 	if (!freeze_flag)
 		current->flags &= ~PF_NOFREEZE;
+	__set_current_state_no_track(saved_state);
 }
 EXPORT_SYMBOL(cpu_chill);
 #endif
diff --git a/localversion-rt b/localversion-rt
index 9f7d0bdbffb18..08b3e75841adc 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt13
+-rt14



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 01/18] x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task
       [not found]     ` <6420E1A5-B5AD-4028-AA91-AA4D5445AC83@intel.com>
@ 2019-06-16 15:44       ` Bae, Chang Seok
  2019-06-16 16:32         ` Thomas Gleixner
                           ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Bae, Chang Seok @ 2019-06-16 15:44 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen
  Cc: LKML


> On Jun 14, 2019, at 13:11, Bae, Chang Seok <chang.seok.bae@intel.com> wrote:
>> 
>> On May 8, 2019, at 10:25, Bae, Chang Seok <chang.seok.bae@intel.com> wrote:
>> 
>>> On May 8, 2019, at 03:02, Bae, Chang Seok <chang.seok.bae@intel.com> wrote:
>>> 
>>> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
>>> index 4b8ee05..3309cfe 100644
>>> --- a/arch/x86/kernel/ptrace.c
>>> +++ b/arch/x86/kernel/ptrace.c
>>> @@ -396,22 +396,12 @@ static int putreg(struct task_struct *child,
>>> 	case offsetof(struct user_regs_struct,fs_base):
>>> 		if (value >= TASK_SIZE_MAX)
>>> 			return -EIO;
>>> -		/*
>>> -		 * When changing the FS base, use do_arch_prctl_64()
>>> -		 * to set the index to zero and to set the base
>>> -		 * as requested.
>>> -		 */
>>> -		if (child->thread.fsbase != value)
>>> -			return do_arch_prctl_64(child, ARCH_SET_FS, value);
>>> +		x86_fsbase_write_cpu(child, value);
> 
> This should be x86_fsbase_write_task() instead of *cpu()
> 
>>> 		return 0;
>>> 	case offsetof(struct user_regs_struct,gs_base):
>>> -		/*
>>> -		 * Exactly the same here as the %fs handling above.
>>> -		 */
>>> 		if (value >= TASK_SIZE_MAX)
>>> 			return -EIO;
>>> -		if (child->thread.gsbase != value)
>>> -			return do_arch_prctl_64(child, ARCH_SET_GS, value);
>>> +		x86_gsbase_write_cpu(child, value);
> 
> This should be also x86_gsbase_write_task() instead of *cpu()
> 
>>> 
>> 
>> Hmm, sorry there is a glitch.  At least, intended this title to be
>> “Fix ptrace-induced FS/GSBASE write behavior” and no changes
>> in the PRCTL. Will fix in the next version.
>> 
> 
> Hi Thomas, 
> 
> Due to the issues on this patch - my apologies, I was originally
> considering the v8. Given your re-write and comments on the last 
> patch (the documentation), at least I want/need to give my heads-up.
> 
> Thanks,
> Chang

[ Include LKML back. Unintentionally, it was missed. ]

Looks build error was reported with this. Sorry again for the noise.
Below patch was prepared to fix and to send with v8:

From c9aa7f6c7306aa46b3ecbb266989718c1b1dc85e Mon Sep 17 00:00:00 2001
From: "Chang S. Bae" <chang.seok.bae@intel.com>
Date: Wed, 1 May 2019 08:06:45 -0700
Subject: x86/fsgsbase/64: Fix ptrace-induced FS/GSBASE writing behaviors

When a ptracer writes to a ptracee's FS/GSBASE with a different value,
the selector is also cleared. This behavior is not straightforward.

The change will make the behavior simple and sensible, as it will
(only) update the base when requested. Also, the condition check for
comparing the base is removed to make more simple. It might save a few
cycles, but this path is not performance critical.

The only recognizable downside of this change is when writing the base
if the selector is already nonzero. The base will be reloaded according
to the selector. But the case is highly unexpected in real usages.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/ptrace.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index a166c96..3108cdc 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -397,22 +397,12 @@ static int putreg(struct task_struct *child,
 	case offsetof(struct user_regs_struct,fs_base):
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		/*
-		 * When changing the FS base, use do_arch_prctl_64()
-		 * to set the index to zero and to set the base
-		 * as requested.
-		 */
-		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+		x86_fsbase_write_task(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
-		/*
-		 * Exactly the same here as the %fs handling above.
-		 */
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+		x86_gsbase_write_task(child, value);
 		return 0;
 #endif
 	}
-- 
2.7.4



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-14  6:54   ` Thomas Gleixner
  2019-06-14 20:07     ` Bae, Chang Seok
@ 2019-06-16 15:54     ` Randy Dunlap
  2019-06-16 16:07       ` Thomas Gleixner
  2019-06-22 10:15     ` [tip:x86/cpu] Documentation/x86/64: Add documentation for GS/FS addressing mode tip-bot for Thomas Gleixner
  2 siblings, 1 reply; 63+ messages in thread
From: Randy Dunlap @ 2019-06-16 15:54 UTC (permalink / raw)
  To: Thomas Gleixner, Chang S. Bae
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Ravi Shankar, LKML, x86, Jonathan Corbet

Hi,

On 6/13/19 11:54 PM, Thomas Gleixner wrote:
> 
> 8<------------------
> From: Thomas Gleixner <tglx@linutronix.de>
> Subject: Documentation/x86/64: Add documentation for GS/FS addressing mode
> Date: Thu,  13 Jun 2019 22:04:24 +0300
> 
> 
> Originally-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  Documentation/x86/x86_64/fsgs.rst  |  200 +++++++++++++++++++++++++++++++++++++
>  Documentation/x86/x86_64/index.rst |    1 
>  2 files changed, 201 insertions(+)
>  create mode 100644 Documentation/x86/fsgs.txt
> 
> --- /dev/null
> +++ b/Documentation/x86/x86_64/fsgs.rst
> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Using FS and GS segments in user space applications
> +===================================================
> +
> +The x86 architecture supports segmentation. Instructions which access
> +memory can use segment register based addressing mode. The following
> +notation is used to address a byte within a segment:
> +
> +  Segment-register:Byte-address
> +
> +The segment base address is added to the Byte-address to compute the
> +resulting virtual address which is accessed. This allows to access multiple
> +instances of data with the identical Byte-address, i.e. the same code. The
> +selection of a particular instance is purely based on the base-address in
> +the segment register.
> +
> +In 32-bit mode the CPU provides 6 segments, which also support segment
> +limits. The limits can be used to enforce address space protections.
> +
> +In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
> +always 0 to provide a full 64bit address space. The FS and GS segments are
> +still functional in 64-bit mode.
> +
> +Common FS and GS usage
> +------------------------------
> +
> +The FS segment is commonly used to address Thread Local Storage (TLS). FS
> +is usually managed by runtime code or a threading library. Variables
> +declared with the '__thread' storage class specifier are instantiated per
> +thread and the compiler emits the FS: address prefix for accesses to these
> +variables. Each thread has its own FS base address so common code can be
> +used without complex address offset calculations to access the per thread
> +instances. Applications should not use FS for other purposes when they use
> +runtimes or threading libraries which manage the per thread FS.
> +
> +The GS segment has no common use and can be used freely by
> +applications. There is no storage class specifier similar to __thread which
> +would cause the compiler to use GS based addressing modes. Newer versions
> +of GCC and Clang support GS based addressing via address space identifiers.
> +
> +
> +Reading and writing the FS/GS base address
> +------------------------------------------
> +
> +There exist two mechanisms to read and write the FS/FS base address:

should this be...                                   FS/GS


> +
> + - the arch_prctl() system call
> +
> + - the FSGSBASE instruction family
> +
> +Accessing FS/GS base with arch_prctl()
> +--------------------------------------
> +
> + The arch_prctl(2) based mechanism is available on all 64bit CPUs and all
> + kernel versions.
> +
> + Reading the base:
> +
> +   arch_prctl(ARCH_GET_FS, &fsbase);
> +   arch_prctl(ARCH_GET_GS, &gsbase);
> +
> + Writing the base:
> +
> +   arch_prctl(ARCH_SET_FS, fsbase);
> +   arch_prctl(ARCH_SET_GS, gsbase);
> +
> + The ARCH_SET_GS prctl may be disabled depending on kernel configuration
> + and security settings.
> +
> +Accessing FS/GS base with the FSGSBASE instructions
> +---------------------------------------------------
> +
> + With the Ivy Bridge CPU generation Intel introduced a new set of
> + instructions to access the FS and GS base registers directly from user
> + space. These instructions are also supported on AMD Family 17H CPUs. The
> + following instructions are available:
> +
> +  =============== ===========================
> +  RDFSBASE %reg   Read the FS base register
> +  RDGSBASE %reg   Read the GS base register
> +  WRFSBASE %reg   Write the FS base register
> +  WRGSBASE %reg   Write the GS base register
> +  =============== ===========================
> +
> + The instructions avoid the overhead of the arch_prctl() syscall and allow
> + more flexible usage of the FS/GS addressing modes in user space
> + applications. This does not prevent conflicts between threading libraries
> + and runtimes which utilize FS and applications which want to use it for
> + their own purpose.
> +
> +FSGSBASE instructions enablement
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> + The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
> + available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
> +
> + The availability of the instructions is not enabling them

prefer:                                  does not enable them

> + automatically. The kernel has to enable them explicitely in CR4. The

typo:                                            explicitly

> + reason for this is that older kernels make assumptions about the values in
> + the GS register and enforce them when GS base is set via
> + arch_prctl(). Allowing user space to write arbitrary values to GS base
> + would violate these assumptions and cause malfunction.
> +
> + On kernels which do not enable FSGSBASE the execution of the FSGSBASE
> + instructions will fault with a #UD exception.
> +
> + The kernel provides reliable information about the enabled state in the
> + ELF AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the
> + kernel has FSGSBASE instructions enabled and applications can use them.
> + The following code example shows how this detection works::
> +
> +   #include <sys/auxv.h>
> +   #include <elf.h>
> +
> +   /* Will be eventually in asm/hwcap.h */
> +   #ifndef HWCAP2_FSGSBASE
> +   #define HWCAP2_FSGSBASE        (1 << 1)
> +   #endif
> +
> +   ....
> +
> +   unsigned val = getauxval(AT_HWCAP2);
> +
> +   if (val & HWCAP2_FSGSBASE)
> +        printf("FSGSBASE enabled\n");
> +
> +FSGSBASE instructions compiler support
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +GCC version 6 and newer provide instrinsics for the FSGSBASE
> +instructions. Clang supports them as well.
> +
> +  =================== ===========================
> +  _readfsbase_u64()   Read the FS base register
> +  _readfsbase_u64()   Read the GS base register
> +  _writefsbase_u64()  Write the FS base register
> +  _writegsbase_u64()  Write the GS base register
> +  =================== ===========================
> +
> +To utilize these instrinsics <immintrin.h> must be included in the source
> +code and the compiler option -mfsgsbase has to be added.
> +
> +Compiler support for FS/GS based addressing
> +-------------------------------------------
> +
> +GCC version 6 and newer provide support for FS/GS based addressing via
> +Named Address Spaces. GCC implements the following address space
> +identifiers for x86:
> +
> +  ========= ====================================
> +  __seg_fs  Variable is addressed relative to FS
> +  __seg_gs  Variable is addressed relative to GS
> +  ========= ====================================
> +
> +The preprocessor symbols __SEG_FS and __SEG_GS are defined when these
> +address spaces are supported. Code which implements fallback modes should
> +check whether these symbols are defined. Usage example::
> +
> +  #ifdef __SEG_GS
> +
> +  long data0 = 0;
> +  long data1 = 1;
> +
> +  long __seg_gs *ptr;
> +
> +  /* Check whether FSGSBASE is enabled by the kernel (HWCAP2_FSGSBASE) */
> +  ....
> +
> +  /* Set GS to point to data0 */
> +  _writegsbase_u64(&data0);
> +
> +  /* Access offset 0 of GS */
> +  ptr = 0;
> +  print("data0 = %ld\n", *ptr);
> +
> +  /* Set GS to point to data1 */
> +  _writegsbase_u64(&data1);
> +  /* ptr still addresses offset 0! */
> +  print("data1 = %ld\n", *ptr);
> +
> +
> +Clang does not provide these address space identifiers, but it provides
> +an attribute based mechanism:
> +
> + ==================================== =====================================
> +  __attribute__((address_space(256))  Variable is addressed relative to GS
> +  __attribute__((address_space(257))  Variable is addressed relative to FS
> + ==================================== =====================================
> +
> +FS/GS based addressing with inline assembly
> +-------------------------------------------
> +
> +In case the compiler does not support address spaces, inline assembly can
> +be used for FS/GS based addressing mode::
> +
> +	mov %fs:offset, %reg
> +	mov %gs:offset, %reg
> +
> +	mov %reg, %fs:offset
> +	mov %reg, %gs:offset


-- 
~Randy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-16 15:34             ` Bae, Chang Seok
@ 2019-06-16 16:05               ` Thomas Gleixner
  2019-06-16 20:48                 ` Bae, Chang Seok
  0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-16 16:05 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet

On Sun, 16 Jun 2019, Bae, Chang Seok wrote:

> 
> > On Jun 16, 2019, at 05:34, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> > On Sun, 16 Jun 2019, Thomas Gleixner wrote:
> >> 
> >> Please dont. Send me a delta patch against the documentation. I have queued
> >> all the other patches already internally. I did not push it out because I
> >> wanted to have proper docs.
> > 
> > Fixed it up already. About to push it out.
> > 
> 
> Thanks. This is the diff though.
> 
> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
> index 22992c8377952..f667087792747 100644
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -118,7 +118,7 @@ static __always_inline bool should_resched(int preempt_offset)
>  
>  	/* preempt count == 0 ? */
>  	tmp &= ~PREEMPT_NEED_RESCHED;
> -	if (tmp)
> +	if (tmp != preempt_offset)
>  		return false;
>  	if (current_thread_info()->preempt_lazy_count)
>  		return false;
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c15583162a559..25bcf2f2714ba 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -92,6 +92,34 @@ static inline void softirq_clr_runner(unsigned int sirq)
>  	sr->runner[sirq] = NULL;
>  }
>  
> +static bool softirq_check_runner_tsk(struct task_struct *tsk,
> +				     unsigned int *pending)

WHAT? How is this related ?


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-16 15:54     ` Randy Dunlap
@ 2019-06-16 16:07       ` Thomas Gleixner
  0 siblings, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-16 16:07 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Chang S. Bae, Andy Lutomirski, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Ravi Shankar, LKML, x86, Jonathan Corbet

On Sun, 16 Jun 2019, Randy Dunlap wrote:
> On 6/13/19 11:54 PM, Thomas Gleixner wrote:
> > +
> > +There exist two mechanisms to read and write the FS/FS base address:
> 
> should this be...                                   FS/GS

Indeed.

> > +FSGSBASE instructions enablement
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > + The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
> > + available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
> > +
> > + The availability of the instructions is not enabling them
> 
> prefer:                                  does not enable them

ok.

> > + automatically. The kernel has to enable them explicitely in CR4. The
> 
> typo:                                            explicitly

ok.

Randy, thanks for reviewing that!


       tglx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 01/18] x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task
  2019-06-16 15:44       ` Bae, Chang Seok
@ 2019-06-16 16:32         ` Thomas Gleixner
  2019-06-22 10:04         ` [tip:x86/cpu] x86/ptrace: Prevent ptrace from clearing the FS/GS selector tip-bot for Chang S. Bae
  2020-06-18 13:50         ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  2 siblings, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-16 16:32 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen, LKML

On Sun, 16 Jun 2019, Bae, Chang Seok wrote:
> > On Jun 14, 2019, at 13:11, Bae, Chang Seok <chang.seok.bae@intel.com> wrote:
> 
> Looks build error was reported with this. Sorry again for the noise.

Well. This has not built when it was posted. Can't you run your stuff
through the Intel zero day infrastructure _BEFORE_ posting?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-16 16:05               ` Thomas Gleixner
@ 2019-06-16 20:48                 ` Bae, Chang Seok
  2019-06-16 22:00                   ` Thomas Gleixner
  0 siblings, 1 reply; 63+ messages in thread
From: Bae, Chang Seok @ 2019-06-16 20:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet


> On Jun 16, 2019, at 09:05, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sun, 16 Jun 2019, Bae, Chang Seok wrote:
>> 
>> Thanks. This is the diff though.
>> ...
> 
> WHAT? How is this related ?

No. Sigh on me.. Instead of the garbage, I hoped to point out these small version things.
(The delta against the WIP.x86/cpu branch)

diff --git a/Documentation/x86/x86_64/fsgs.rst b/Documentation/x86/x86_64/fsgs.rst
index 380c0b5..d5588e00 100644
--- a/Documentation/x86/x86_64/fsgs.rst
+++ b/Documentation/x86/x86_64/fsgs.rst
@@ -125,7 +125,7 @@ FSGSBASE instructions enablement
 FSGSBASE instructions compiler support
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
+GCC version 6 and newer provide instrinsics for the FSGSBASE
 instructions. Clang supports them as well.
 
   =================== ===========================
@@ -141,7 +141,7 @@ code and the compiler option -mfsgsbase has to be added.
 Compiler support for FS/GS based addressing
 -------------------------------------------
 
-GCC version 6 and newer provide support for FS/GS based addressing via
+GCC version 4.6.4 and newer provide support for FS/GS based addressing via
 Named Address Spaces. GCC implements the following address space
 identifiers for x86:

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
  2019-06-16 20:48                 ` Bae, Chang Seok
@ 2019-06-16 22:00                   ` Thomas Gleixner
       [not found]                     ` <8E2E84B6-BCCC-424D-A1A7-604828B389FB@intel.com>
  0 siblings, 1 reply; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-16 22:00 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet

On Sun, 16 Jun 2019, Bae, Chang Seok wrote:
> > On Jun 16, 2019, at 09:05, Thomas Gleixner <tglx@linutronix.de> wrote:
> diff --git a/Documentation/x86/x86_64/fsgs.rst b/Documentation/x86/x86_64/fsgs.rst
> index 380c0b5..d5588e00 100644
> --- a/Documentation/x86/x86_64/fsgs.rst
> +++ b/Documentation/x86/x86_64/fsgs.rst
> @@ -125,7 +125,7 @@ FSGSBASE instructions enablement
>  FSGSBASE instructions compiler support
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  
> -GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
> +GCC version 6 and newer provide instrinsics for the FSGSBASE
>  instructions. Clang supports them as well.
>  
>    =================== ===========================
> @@ -141,7 +141,7 @@ code and the compiler option -mfsgsbase has to be added.
>  Compiler support for FS/GS based addressing
>  -------------------------------------------
>  
> -GCC version 6 and newer provide support for FS/GS based addressing via
> +GCC version 4.6.4 and newer provide support for FS/GS based addressing via
>  Named Address Spaces. GCC implements the following address space
>  identifiers for x86:

That's close to what I pushed out earlier into tip WIP.x86/cpu

Please check against that version including the Clang part about address
spaces close to the end.

Thanks, 

	tglx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE
       [not found]                     ` <8E2E84B6-BCCC-424D-A1A7-604828B389FB@intel.com>
@ 2019-06-17  5:18                       ` Thomas Gleixner
  0 siblings, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-17  5:18 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Randy Dunlap, x86, Jonathan Corbet

Chang,

On Mon, 17 Jun 2019, Bae, Chang Seok wrote:

Can you please use proper quoting style?

> On Jun 16, 2019, at 15:00, Thomas Gleixner <tglx@linutronix.de<mailto:tglx@linutronix.de>> wrote:
> > 
> > > -GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
> > > +GCC version 6 and newer provide instrinsics for the FSGSBASE
> > > instructions. Clang supports them as well.
> > > 
> > >   =================== ===========================
> > > @@ -141,7 +141,7 @@ code and the compiler option -mfsgsbase has to be added.
> > > Compiler support for FS/GS based addressing
> > > -------------------------------------------
> > > 
> > > -GCC version 6 and newer provide support for FS/GS based addressing via
> > > +GCC version 4.6.4 and newer provide support for FS/GS based addressing via
> > > Named Address Spaces. GCC implements the following address space
> > > identifiers for x86:
> > > 
> > That's close to what I pushed out earlier into tip WIP.x86/cpu
> >  
> >  Please check against that version including the Clang part about address
> >  spaces close to the end.
> 
> 
> It is actually rebased on the tip branch (WIP.x86/cpu).

I have no idea what you mean with that. That patch you sent (see above) did
not apply against WIP.x86/cpu and claims exactly what I changed and pushed
out. Now you say it's the other way round:

> The point is the two GCC version indications are opposite right now:
>  - Intrinsics support begins from v4.6.4, not v6.
> - Address space identifiers support starts from v6, instead of v4.6.4

Is it really so hard to send proper patches like the below or if that's not
possible write up the facts so someone else can turn it into a proper patch
like the one below:

Thanks,

	tglx

8<--------------------
diff --git a/Documentation/x86/x86_64/fsgs.rst b/Documentation/x86/x86_64/fsgs.rst
index d5588e00b939..380c0b5ccca2 100644
--- a/Documentation/x86/x86_64/fsgs.rst
+++ b/Documentation/x86/x86_64/fsgs.rst
@@ -125,7 +125,7 @@ FSGSBASE instructions enablement
 FSGSBASE instructions compiler support
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-GCC version 6 and newer provide instrinsics for the FSGSBASE
+GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
 instructions. Clang supports them as well.
 
   =================== ===========================
@@ -141,7 +141,7 @@ code and the compiler option -mfsgsbase has to be added.
 Compiler support for FS/GS based addressing
 -------------------------------------------
 
-GCC version 4.6.4 and newer provide support for FS/GS based addressing via
+GCC version 6 and newer provide support for FS/GS based addressing via
 Named Address Spaces. GCC implements the following address space
 identifiers for x86:
 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on
  2019-05-08 10:02 ` [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
@ 2019-06-21 15:22   ` Thomas Gleixner
  2019-06-22 10:08   ` [tip:x86/cpu] x86/process/64: Use FSBSBASE in switch_to() if available tip-bot for Andy Lutomirski
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  2 siblings, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-21 15:22 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: Andy Lutomirski, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Ravi Shankar, LKML

On Wed, 8 May 2019, Chang S. Bae wrote:
> From: Andy Lutomirski <luto@kernel.org>
> 
> With the new FSGSBASE instructions, we can efficiently read and write
> the FSBASE and GSBASE in __switch_to().  Use that capability to preserve
> the full state.
> 
> This will enable user code to do whatever it wants with the new
> instructions without any kernel-induced gotchas.  (There can still be
> architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE
> if WRGSBASE was used, but users are expected to read the CPU manual
> before doing things like that.)
> 
> This is a considerable speedup.  It seems to save about 100 cycles
> per context switch compared to the baseline 4.6-rc1 behavior on my
> Skylake laptop.
> 
> [ chang: 5~10% performance improvements were seen by a context switch
>   benchmark that ran threads with different FS/GSBASE values (to the
>   baseline 4.16). Minor edit on the changelog. ]
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 17421c3..e2089c9 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -239,8 +239,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
>  {
>  	savesegment(fs, task->thread.fsindex);
>  	savesegment(gs, task->thread.gsindex);
> -	save_base_legacy(task, task->thread.fsindex, FS);
> -	save_base_legacy(task, task->thread.gsindex, GS);
> +	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> +		/*
> +		 * If FSGSBASE is enabled, we can't make any useful guesses
> +		 * about the base, and user code expects us to save the current
> +		 * value.  Fortunately, reading the base directly is efficient.
> +		 */
> +		task->thread.fsbase = rdfsbase();
> +		task->thread.gsbase = __rdgsbase_inactive();

This explodes when called from copy_thread_tls() when an interrupt hits
after switching GS in __rdgsbase_inactive(). Wrapped it with
local_irq_save/restore().

Oh well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/ptrace: Prevent ptrace from clearing the FS/GS selector
  2019-06-16 15:44       ` Bae, Chang Seok
  2019-06-16 16:32         ` Thomas Gleixner
@ 2019-06-22 10:04         ` tip-bot for Chang S. Bae
  2020-06-18 13:50         ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, luto, linux-kernel, hpa, ak, chang.seok.bae

Commit-ID:  48f5e52e916b55fb73754833efbacc7f8081a159
Gitweb:     https://git.kernel.org/tip/48f5e52e916b55fb73754833efbacc7f8081a159
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Sun, 16 Jun 2019 15:44:11 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:50 +0200

x86/ptrace: Prevent ptrace from clearing the FS/GS selector

When a ptracer writes a ptracee's FS/GSBASE with a different value, the
selector is also cleared. This behavior is not correct as the selector
should be preserved.

Update only the base value and leave the selector intact. To simplify the
code further remove the conditional checking for the same value as this
code is not performance critical.

The only recognizable downside of this change is when the selector is
already nonzero on write. The base will be reloaded according to the
selector. But the case is highly unexpected in real usages.

[ tglx: Massage changelog ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/9040CFCD-74BD-4C17-9A01-B9B713CF6B10@intel.com

---
 arch/x86/kernel/ptrace.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index a166c960bc9e..3108cdc00b29 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -397,22 +397,12 @@ static int putreg(struct task_struct *child,
 	case offsetof(struct user_regs_struct,fs_base):
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		/*
-		 * When changing the FS base, use do_arch_prctl_64()
-		 * to set the index to zero and to set the base
-		 * as requested.
-		 */
-		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+		x86_fsbase_write_task(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
-		/*
-		 * Exactly the same here as the %fs handling above.
-		 */
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+		x86_gsbase_write_task(child, value);
 		return 0;
 #endif
 	}

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write
  2019-05-08 10:02 ` [PATCH v7 02/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write Chang S. Bae
@ 2019-06-22 10:04   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, tglx, linux-kernel, luto, hpa, chang.seok.bae, ravi.v.shankar, mingo

Commit-ID:  1b6858d5a2eb2485761f06bd48055ed5bed08464
Gitweb:     https://git.kernel.org/tip/1b6858d5a2eb2485761f06bd48055ed5bed08464
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:17 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:51 +0200

selftests/x86/fsgsbase: Test ptracer-induced GSBASE write

The test validates that the selector is not changed when a ptracer writes
the ptracee's GSBASE.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-3-git-send-email-chang.seok.bae@intel.com

---
 tools/testing/selftests/x86/fsgsbase.c | 70 ++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index af85bd4752a5..b02ddce49bbb 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -23,6 +23,9 @@
 #include <pthread.h>
 #include <asm/ldt.h>
 #include <sys/mman.h>
+#include <stddef.h>
+#include <sys/ptrace.h>
+#include <sys/wait.h>
 
 #ifndef __x86_64__
 # error This test is 64-bit only
@@ -367,6 +370,71 @@ static void test_unexpected_base(void)
 	}
 }
 
+#define USER_REGS_OFFSET(r) offsetof(struct user_regs_struct, r)
+
+static void test_ptrace_write_gsbase(void)
+{
+	int status;
+	pid_t child = fork();
+
+	if (child < 0)
+		err(1, "fork");
+
+	if (child == 0) {
+		printf("[RUN]\tPTRACE_POKE(), write GSBASE from ptracer\n");
+
+		/*
+		 * Use the LDT setup and fetch the GSBASE from the LDT
+		 * by switching to the (nonzero) selector (again)
+		 */
+		do_unexpected_base();
+		asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)0x7));
+
+		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
+			err(1, "PTRACE_TRACEME");
+
+		raise(SIGTRAP);
+		_exit(0);
+	}
+
+	wait(&status);
+
+	if (WSTOPSIG(status) == SIGTRAP) {
+		unsigned long gs;
+		unsigned long gs_offset = USER_REGS_OFFSET(gs);
+		unsigned long base_offset = USER_REGS_OFFSET(gs_base);
+
+		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+
+		if (gs != 0x7) {
+			nerrs++;
+			printf("[FAIL]\tGS is not prepared with nonzero\n");
+			goto END;
+		}
+
+		if (ptrace(PTRACE_POKEUSER, child, base_offset, 0xFF) != 0)
+			err(1, "PTRACE_POKEUSER");
+
+		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+
+		/*
+		 * In a non-FSGSBASE system, the nonzero selector will load
+		 * GSBASE (again). But what is tested here is whether the
+		 * selector value is changed or not by the GSBASE write in
+		 * a ptracer.
+		 */
+		if (gs != 0x7) {
+			nerrs++;
+			printf("[FAIL]\tGS changed to %lx\n", gs);
+		} else {
+			printf("[OK]\tGS remained 0x7\n");
+		}
+	}
+
+END:
+	ptrace(PTRACE_CONT, child, NULL, NULL);
+}
+
 int main()
 {
 	pthread_t thread;
@@ -423,5 +491,7 @@ int main()
 	if (pthread_join(thread, NULL) != 0)
 		err(1, "pthread_join");
 
+	test_ptrace_write_gsbase();
+
 	return nerrs == 0 ? 0 : 1;
 }

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  2019-05-08 10:02 ` [PATCH v7 03/18] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
@ 2019-06-22 10:05   ` tip-bot for Andy Lutomirski
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-06-22 10:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, ak, akpm, linux-kernel, chang.seok.bae, luto, hpa,
	rdunlap, ravi.v.shankar, tglx

Commit-ID:  b64ed19b93c368be0fb6acf05377e8e3a694c92b
Gitweb:     https://git.kernel.org/tip/b64ed19b93c368be0fb6acf05377e8e3a694c92b
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 8 May 2019 03:02:18 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:51 +0200

x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

This is temporary.  It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole.  Don't do it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com

---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/x86/kernel/cpu/common.c                    | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..b0fa5273b0fc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2857,6 +2857,9 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
+	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
+			replaced with a nofsgsbase flag.
+
 	no_console_suspend
 			[HW] Never suspend the console
 			Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index dad20bc891d5..71defe2d1b7c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -366,6 +366,22 @@ out:
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
+/*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+	unsafe_fsgsbase = true;
+	return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1370,6 +1386,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_smap(c);
 	setup_umip(c);
 
+	/* Enable FSGSBASE instructions if available. */
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+		if (unsafe_fsgsbase)
+			cr4_set_bits(X86_CR4_FSGSBASE);
+		else
+			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+	}
+
 	/*
 	 * The vendor-specific functions might have changed features.
 	 * Now we do "generic changes."

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] kbuild: Raise the minimum required binutils version to 2.21
  2019-05-08 10:02 ` [PATCH v7 04/18] kbuild: Raise the minimum required binutils version to 2.21 Chang S. Bae
@ 2019-06-22 10:06   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, ravi.v.shankar, linux-kernel, chang.seok.bae, hpa, luto,
	torvalds, mingo, ak, akpm

Commit-ID:  1fb12b35e5ffe379d109b22cb3069830d0136d9a
Gitweb:     https://git.kernel.org/tip/1fb12b35e5ffe379d109b22cb3069830d0136d9a
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:19 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:51 +0200

kbuild: Raise the minimum required binutils version to 2.21

It helps to use some new instructions directly in assembly code.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Linux Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-5-git-send-email-chang.seok.bae@intel.com

---
 Documentation/process/changes.rst | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 18735dc460a0..0a18075c485e 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -31,7 +31,7 @@ you probably needn't concern yourself with isdn4k-utils.
 ====================== ===============  ========================================
 GNU C                  4.6              gcc --version
 GNU make               3.81             make --version
-binutils               2.20             ld -v
+binutils               2.21             ld -v
 flex                   2.5.35           flex --version
 bison                  2.0              bison --version
 util-linux             2.10o            fdformat --version
@@ -77,9 +77,7 @@ You will need GNU make 3.81 or later to build the kernel.
 Binutils
 --------
 
-The build system has, as of 4.13, switched to using thin archives (`ar T`)
-rather than incremental linking (`ld -r`) for built-in.a intermediate steps.
-This requires binutils 2.20 or newer.
+Binutils 2.21 or newer is needed to build the kernel.
 
 pkg-config
 ----------

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  2019-05-08 10:02 ` [PATCH v7 05/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
@ 2019-06-22 10:06   ` tip-bot for Andi Kleen
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andi Kleen
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Andi Kleen @ 2019-06-22 10:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, ak, linux-kernel, chang.seok.bae, tglx, hpa, luto, ravi.v.shankar

Commit-ID:  8b71340d702ec5d587443b38a852671c4fb6a723
Gitweb:     https://git.kernel.org/tip/8b71340d702ec5d587443b38a852671c4fb6a723
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 8 May 2019 03:02:20 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:52 +0200

x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
  make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-6-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/include/asm/fsgsbase.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index bca4c743de77..fdd1177499b4 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task);
 extern void x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
 extern void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
 
+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+	return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+	return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  2019-05-08 10:02 ` [PATCH v7 06/18] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
@ 2019-06-22 10:07   ` tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, andrew.cooper3, hpa, ravi.v.shankar, chang.seok.bae,
	linux-kernel, ak, luto

Commit-ID:  a86b4625138d39e97b4cc254fc9c4bb9e1dc4542
Gitweb:     https://git.kernel.org/tip/a86b4625138d39e97b4cc254fc9c4bb9e1dc4542
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:21 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:52 +0200

x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions

Add cpu feature conditional FSGSBASE access to the relevant helper
functions. That allows to accelerate certain FS/GS base operations in
subsequent changes.

Note, that while possible, the user space entry/exit GSBASE operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.

To preserve the SWAPGS entry mechanism introduce __[rd|wr]gsbase_inactive()
helpers. Note, for Xen PV, paravirt hooks can be added later as they might
allow a very efficient but different implementation.

[ tglx: Massaged changelog ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/include/asm/fsgsbase.h | 27 ++++++++---------
 arch/x86/kernel/process_64.c    | 66 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177499b4..aefd53767a5d 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
 	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
 }
 
+#include <asm/cpufeature.h>
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
 {
 	unsigned long fsbase;
 
-	rdmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		fsbase = rdfsbase();
+	else
+		rdmsrl(MSR_FS_BASE, fsbase);
 
 	return fsbase;
 }
 
-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
-	unsigned long gsbase;
-
-	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
-	return gsbase;
-}
-
 static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-	wrmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		wrfsbase(fsbase);
+	else
+		wrmsrl(MSR_FS_BASE, fsbase);
 }
 
-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
-	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 250e4c4ac6d9..c34ee0f72378 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -161,6 +161,40 @@ enum which_selector {
 	GS
 };
 
+/*
+ * Out of line to be protected from kprobes. It is not used on Xen
+ * paravirt. When paravirt support is needed, it needs to be renamed
+ * with native_ prefix.
+ */
+static noinline unsigned long __rdgsbase_inactive(void)
+{
+	unsigned long gsbase;
+
+	lockdep_assert_irqs_disabled();
+
+	native_swapgs();
+	gsbase = rdgsbase();
+	native_swapgs();
+
+	return gsbase;
+}
+NOKPROBE_SYMBOL(__rdgsbase_inactive);
+
+/*
+ * Out of line to be protected from kprobes. It is not used on Xen
+ * paravirt. When paravirt support is needed, it needs to be renamed
+ * with native_ prefix.
+ */
+static noinline void __wrgsbase_inactive(unsigned long gsbase)
+{
+	lockdep_assert_irqs_disabled();
+
+	native_swapgs();
+	wrgsbase(gsbase);
+	native_swapgs();
+}
+NOKPROBE_SYMBOL(__wrgsbase_inactive);
+
 /*
  * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
  * not available.  The goal is to be reasonably fast on non-FSGSBASE systems.
@@ -339,6 +373,38 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 	return base;
 }
 
+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		unsigned long flags;
+
+		/* Interrupts are disabled here. */
+		local_irq_save(flags);
+		gsbase = __rdgsbase_inactive();
+		local_irq_restore(flags);
+	} else {
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+
+	return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		unsigned long flags;
+
+		/* Interrupts are disabled here. */
+		local_irq_save(flags);
+		__wrgsbase_inactive(gsbase);
+		local_irq_restore(flags);
+	} else {
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+}
+
 unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
 	unsigned long fsbase;

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/process/64: Use FSBSBASE in switch_to() if available
  2019-05-08 10:02 ` [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
  2019-06-21 15:22   ` Thomas Gleixner
@ 2019-06-22 10:08   ` tip-bot for Andy Lutomirski
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-06-22 10:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tglx, hpa, chang.seok.bae, luto, ravi.v.shankar, mingo, ak

Commit-ID:  1ab5f3f7fe3d7548b4361b68c1fed140c6841af9
Gitweb:     https://git.kernel.org/tip/1ab5f3f7fe3d7548b4361b68c1fed140c6841af9
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 8 May 2019 03:02:22 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:52 +0200

x86/process/64: Use FSBSBASE in switch_to() if available

With the new FSGSBASE instructions, FS and GSABSE can be efficiently read
and writen in __switch_to().  Use that capability to preserve the full
state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas.  (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE if
WRGSBASE was used, but users are expected to read the CPU manual before
doing things like that.)

This is a considerable speedup.  It seems to save about 100 cycles
per context switch compared to the baseline 4.6-rc1 behavior on a
Skylake laptop.

[ chang: 5~10% performance improvements were seen with a context switch
  benchmark that ran threads with different FS/GSBASE values (to the
  baseline 4.16). Minor edit on the changelog. ]

[ tglx: Masaage changelog ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-8-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index c34ee0f72378..59013f480b86 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -244,8 +244,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
 {
 	savesegment(fs, task->thread.fsindex);
 	savesegment(gs, task->thread.gsindex);
-	save_base_legacy(task, task->thread.fsindex, FS);
-	save_base_legacy(task, task->thread.gsindex, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/*
+		 * If FSGSBASE is enabled, we can't make any useful guesses
+		 * about the base, and user code expects us to save the current
+		 * value.  Fortunately, reading the base directly is efficient.
+		 */
+		task->thread.fsbase = rdfsbase();
+		task->thread.gsbase = __rdgsbase_inactive();
+	} else {
+		save_base_legacy(task, task->thread.fsindex, FS);
+		save_base_legacy(task, task->thread.gsindex, GS);
+	}
 }
 
 #if IS_ENABLED(CONFIG_KVM)
@@ -324,10 +334,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 					      struct thread_struct *next)
 {
-	load_seg_legacy(prev->fsindex, prev->fsbase,
-			next->fsindex, next->fsbase, FS);
-	load_seg_legacy(prev->gsindex, prev->gsbase,
-			next->gsindex, next->gsbase, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/* Update the FS and GS selectors if they could have changed. */
+		if (unlikely(prev->fsindex || next->fsindex))
+			loadseg(FS, next->fsindex);
+		if (unlikely(prev->gsindex || next->gsindex))
+			loadseg(GS, next->gsindex);
+
+		/* Update the bases. */
+		wrfsbase(next->fsbase);
+		__wrgsbase_inactive(next->gsbase);
+	} else {
+		load_seg_legacy(prev->fsindex, prev->fsbase,
+				next->fsindex, next->fsbase, FS);
+		load_seg_legacy(prev->gsindex, prev->gsbase,
+				next->gsindex, next->gsbase, GS);
+	}
 }
 
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
  2019-05-08 10:02 ` [PATCH v7 08/18] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions Chang S. Bae
@ 2019-06-22 10:09   ` tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, luto, mingo, tglx, linux-kernel, ravi.v.shankar, ak, chang.seok.bae

Commit-ID:  f60a83df4593c5e03e746ded66d8b436c4ad6e41
Gitweb:     https://git.kernel.org/tip/f60a83df4593c5e03e746ded66d8b436c4ad6e41
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:23 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:53 +0200

x86/process/64: Use FSGSBASE instructions on thread copy and ptrace

When FSGSBASE is enabled, copying threads and reading fsbase and gsbase
using ptrace must read the actual values.

When copying a thread, use save_fsgs() and copy the saved values.  For
ptrace, the bases must be read from memory regardless of the selector if
FSGSBASE is enabled.

[ tglx: Invoke __rdgsbase_inactive() with interrupts disabled ]
[ luto: Massage changelog ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-9-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/kernel/process_64.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 59013f480b86..8f239091c15d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -245,13 +245,17 @@ static __always_inline void save_fsgs(struct task_struct *task)
 	savesegment(fs, task->thread.fsindex);
 	savesegment(gs, task->thread.gsindex);
 	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		unsigned long flags;
+
 		/*
 		 * If FSGSBASE is enabled, we can't make any useful guesses
 		 * about the base, and user code expects us to save the current
 		 * value.  Fortunately, reading the base directly is efficient.
 		 */
 		task->thread.fsbase = rdfsbase();
+		local_irq_save(flags);
 		task->thread.gsbase = __rdgsbase_inactive();
+		local_irq_restore(flags);
 	} else {
 		save_base_legacy(task, task->thread.fsindex, FS);
 		save_base_legacy(task, task->thread.gsindex, GS);
@@ -433,7 +437,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		fsbase = x86_fsbase_read_cpu();
-	else if (task->thread.fsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.fsindex == 0))
 		fsbase = task->thread.fsbase;
 	else
 		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -447,7 +452,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		gsbase = x86_gsbase_read_cpu_inactive();
-	else if (task->thread.gsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.gsindex == 0))
 		gsbase = task->thread.gsbase;
 	else
 		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
@@ -487,10 +493,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
-	savesegment(gs, p->thread.gsindex);
-	p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
-	savesegment(fs, p->thread.fsindex);
-	p->thread.fsbase = p->thread.fsindex ? 0 : me->thread.fsbase;
+	save_fsgs(me);
+	p->thread.fsindex = me->thread.fsindex;
+	p->thread.fsbase = me->thread.fsbase;
+	p->thread.gsindex = me->thread.gsindex;
+	p->thread.gsbase = me->thread.gsbase;
 	savesegment(es, p->thread.es);
 	savesegment(ds, p->thread.ds);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  2019-05-08 10:02 ` [PATCH v7 10/18] x86/entry/64: Switch CR3 before SWAPGS on the paranoid entry Chang S. Bae
@ 2019-06-22 10:09   ` tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, chang.seok.bae, ak, hpa, luto, tglx, linux-kernel,
	mingo, ravi.v.shankar

Commit-ID:  1d07316b1363a004ed548c3759584f8e8b1e24e3
Gitweb:     https://git.kernel.org/tip/1d07316b1363a004ed548c3759584f8e8b1e24e3
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:25 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:53 +0200

x86/entry/64: Switch CR3 before SWAPGS in paranoid entry

When FSGSBASE is enabled, the GSBASE handling in paranoid entry will need
to retrieve the kernel GSBASE which requires that the kernel page table is
active.

As the CR3 switch to the kernel page tables (PTI is active) does not depend
on kernel GSBASE, move the CR3 switch in front of the GSBASE handling.

Comment the EBX content while at it.

No functional change.

[ tglx: Rewrote changelog and comments ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-11-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/entry/entry_64.S | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 11aa3b2afa4d..aaa846f8850a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1173,13 +1173,6 @@ ENTRY(paranoid_entry)
 	cld
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
-	movl	$1, %ebx
-	movl	$MSR_GS_BASE, %ecx
-	rdmsr
-	testl	%edx, %edx
-	js	1f				/* negative -> in kernel */
-	SWAPGS
-	xorl	%ebx, %ebx
 
 1:
 	/*
@@ -1191,9 +1184,30 @@ ENTRY(paranoid_entry)
 	 * This is also why CS (stashed in the "iret frame" by the
 	 * hardware at entry) can not be used: this may be a return
 	 * to kernel code, but with a user CR3 value.
+	 *
+	 * Switching CR3 does not depend on kernel GSBASE so it can
+	 * be done before switching to the kernel GSBASE. This is
+	 * required for FSGSBASE because the kernel GSBASE has to
+	 * be retrieved from a kernel internal table.
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	/* EBX = 1 -> kernel GSBASE active, no restore required */
+	movl	$1, %ebx
+	/*
+	 * The kernel-enforced convention is a negative GSBASE indicates
+	 * a kernel value. No SWAPGS needed on entry and exit.
+	 */
+	movl	$MSR_GS_BASE, %ecx
+	rdmsr
+	testl	%edx, %edx
+	jns	.Lparanoid_entry_swapgs
+	ret
+
+.Lparanoid_entry_swapgs:
+	SWAPGS
+	/* EBX = 0 -> SWAPGS required on exit */
+	xorl	%ebx, %ebx
 	ret
 END(paranoid_entry)
 
@@ -1213,7 +1227,8 @@ ENTRY(paranoid_exit)
 	UNWIND_HINT_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
-	testl	%ebx, %ebx			/* swapgs needed? */
+	/* If EBX is 0, SWAPGS is required */
+	testl	%ebx, %ebx
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
 	/* Always restore stashed CR3 value (see paranoid_entry) */

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  2019-05-08 10:02 ` [PATCH v7 11/18] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
@ 2019-06-22 10:10   ` tip-bot for Chang S. Bae
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, ravi.v.shankar, linux-kernel, mingo, hpa, dave.hansen, ak,
	chang.seok.bae, tglx

Commit-ID:  79e1932fa3cedd731ddbd6af111fe4db8ca109ae
Gitweb:     https://git.kernel.org/tip/79e1932fa3cedd731ddbd6af111fe4db8ca109ae
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:26 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:54 +0200

x86/entry/64: Introduce the FIND_PERCPU_BASE macro

GSBASE is used to find per-CPU data in the kernel. But when GSBASE is
unknown, the per-CPU base can be found from the per_cpu_offset table with a
CPU NR.  The CPU NR is extracted from the limit field of the CPUNODE entry
in GDT, or by the RDPID instruction. This is a prerequisite for using
FSGSBASE in the low level entry code.

Also, add the GAS-compatible RDPID macro as binutils 2.21 do not support
it. Support is added in version 2.27.

[ tglx: Massaged changelog ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/1557309753-24073-12-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/entry/calling.h    | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/inst.h | 15 +++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index efb0d1b1f15f..9a524360ae2e 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,7 @@
 #include <asm/percpu.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
+#include <asm/inst.h>
 
 /*
 
@@ -345,6 +346,39 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+	movq	$__CPUNODE_SEG, \reg
+	lsl	\reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro GET_PERCPU_BASE reg:req
+	ALTERNATIVE \
+		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+		"RDPID	\reg", \
+		X86_FEATURE_RDPID
+	andq	$VDSO_CPUNODE_MASK, \reg
+	movq	__per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro GET_PERCPU_BASE reg:req
+	movq	pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
+
 /*
  * This does 'call enter_from_user_mode' unless we can avoid it based on
  * kernel config or using the static jump infrastructure.
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796da07f8..d063841a17e3 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
 	.endif
 	MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
 	.endm
+
+.macro RDPID opd
+	REG_TYPE rdpid_opd_type \opd
+	.if rdpid_opd_type == REG_TYPE_R64
+	R64_NUM rdpid_opd \opd
+	.else
+	R32_NUM rdpid_opd \opd
+	.endif
+	.byte 0xf3
+	.if rdpid_opd > 7
+	PFX_REX rdpid_opd 0
+	.endif
+	.byte 0x0f, 0xc7
+	MODRM 0xc0 rdpid_opd 0x7
+.endm
 #endif
 
 #endif

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  2019-05-08 10:02 ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Chang S. Bae
@ 2019-06-22 10:11   ` tip-bot for Chang S. Bae
  2019-06-29  7:21   ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Bae, Chang Seok
  2020-06-18 13:50   ` [tip: x86/fsgsbase] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit tip-bot2 for Chang S. Bae
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ravi.v.shankar, hpa, luto, dave.hansen, linux-kernel, mingo,
	tglx, chang.seok.bae, ak

Commit-ID:  708078f65721b46d82d9934a3f0b36a2b8ad0656
Gitweb:     https://git.kernel.org/tip/708078f65721b46d82d9934a3f0b36a2b8ad0656
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:27 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:54 +0200

x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit

Without FSGSBASE, user space cannot change GSBASE other than through a
PRCTL. The kernel enforces that the user space GSBASE value is postive as
negative values are used for detecting the kernel space GSBASE value in the
paranoid entry code.

If FSGSBASE is enabled, user space can set arbitrary GSBASE values without
kernel intervention, including negative ones, which breaks the paranoid
entry assumptions.

To avoid this, paranoid entry needs to unconditionally save the current
GSBASE value independent of the interrupted context, retrieve and write the
kernel GSBASE and unconditionally restore the saved value on exit. The
restore happens either in paranoid_exit or in the special exit path of the
NMI low level code.

All other entry code pathes which use unconditional SWAPGS are not affected
as they do not depend on the actual content.

[ tglx: Massaged changelogs and comments ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/1557309753-24073-13-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/entry/calling.h  |  6 ++++
 arch/x86/entry/entry_64.S | 80 ++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 9a524360ae2e..d3fbe2dc03ea 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -338,6 +338,12 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+	rdgsbase \save_reg
+	GET_PERCPU_BASE \scratch_reg
+	wrgsbase \scratch_reg
+.endm
+
 #endif /* CONFIG_X86_64 */
 
 .macro STACKLEAK_ERASE
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index aaa846f8850a..7f9f5119d6b1 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
 #include <asm/export.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include <asm/fsgsbase.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -947,7 +948,6 @@ ENTRY(\sym)
 	addq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
 
-	/* these procedures expect "no swapgs" flag in ebx */
 	.if \paranoid
 	jmp	paranoid_exit
 	.else
@@ -1164,9 +1164,14 @@ idtentry machine_check		do_mce			has_error_code=0	paranoid=1
 #endif
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs. Return GSBASE related information
+ * in EBX depending on the availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE	R/EBX
+ *     N        0 -> SWAPGS on exit
+ *              1 -> no SWAPGS on exit
+ *
+ *     Y        GSBASE value at entry, must be restored in paranoid_exit
  */
 ENTRY(paranoid_entry)
 	UNWIND_HINT_FUNC
@@ -1174,7 +1179,6 @@ ENTRY(paranoid_entry)
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
 
-1:
 	/*
 	 * Always stash CR3 in %r14.  This value will be restored,
 	 * verbatim, at exit.  Needed if paranoid_entry interrupted
@@ -1192,6 +1196,25 @@ ENTRY(paranoid_entry)
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+        /*
+	 * Handling GSBASE depends on the availability of FSGSBASE.
+	 *
+	 * Without FSGSBASE the kernel enforces that negative GSBASE
+	 * values indicate kernel GSBASE. With FSGSBASE no assumptions
+	 * can be made about the GSBASE value when entering from user
+	 * space.
+	*/
+	ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
+
+	/*
+	 * Read the current GSBASE and store it in in %rbx unconditionally,
+	 * retrieve and set the current CPUs kernel GSBASE. The stored value
+	 * has to be restored in paranoid_exit unconditionally.
+	 */
+	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+	ret
+
+.Lparanoid_entry_checkgs:
 	/* EBX = 1 -> kernel GSBASE active, no restore required */
 	movl	$1, %ebx
 	/*
@@ -1218,16 +1241,32 @@ END(paranoid_entry)
  *
  * We may be returning to very strange contexts (e.g. very early
  * in syscall entry), so checking for preemption here would
- * be complicated.  Fortunately, we there's no good reason
- * to try to handle preemption here.
+ * be complicated.  Fortunately, there's no good reason to try
+ * to handle preemption here.
  *
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * R/EBX contains the GSBASE related information depending on the
+ * availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE	R/EBX
+ *     N        0 -> SWAPGS on exit
+ *              1 -> no SWAPGS on exit
+ *
+ *     Y        User space GSBASE, must be restored unconditionally
  */
 ENTRY(paranoid_exit)
 	UNWIND_HINT_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
-	/* If EBX is 0, SWAPGS is required */
+
+	/* Handle GS depending on FSGSBASE availability */
+	ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "nop",X86_FEATURE_FSGSBASE
+
+	/* With FSGSBASE enabled, unconditionally restore GSBASE */
+	wrgsbase	%rbx
+	jmp	.Lparanoid_exit_no_swapgs;
+
+.Lparanoid_exit_checkgs:
+	/* On non-FSGSBASE systems, conditionally do SWAPGS */
 	testl	%ebx, %ebx
 	jnz	.Lparanoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
@@ -1235,12 +1274,14 @@ ENTRY(paranoid_exit)
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
 	SWAPGS_UNSAFE_STACK
 	jmp	.Lparanoid_exit_restore
+
 .Lparanoid_exit_no_swapgs:
 	TRACE_IRQS_IRETQ_DEBUG
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
+
 .Lparanoid_exit_restore:
-	jmp restore_regs_and_return_to_kernel
+	jmp	restore_regs_and_return_to_kernel
 END(paranoid_exit)
 
 /*
@@ -1651,10 +1692,27 @@ end_repeat_nmi:
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
-	testl	%ebx, %ebx			/* swapgs needed? */
+	/*
+	 * The above invocation of paranoid_entry stored the GSBASE
+	 * related information in R/EBX depending on the availability
+	 * of FSGSBASE.
+	 *
+	 * If FSGSBASE is enabled, restore the saved GSBASE value
+	 * unconditionally, otherwise take the conditional SWAPGS path.
+	 */
+	ALTERNATIVE "jmp nmi_no_fsgsbase", "", X86_FEATURE_FSGSBASE
+
+	wrgsbase	%rbx
+	jmp	nmi_restore
+
+nmi_no_fsgsbase:
+	/* EBX == 0 -> invoke SWAPGS */
+	testl	%ebx, %ebx
 	jnz	nmi_restore
+
 nmi_swapgs:
 	SWAPGS_UNSAFE_STACK
+
 nmi_restore:
 	POP_REGS
 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/entry/64: Document GSBASE handling in the paranoid path
  2019-05-08 10:02 ` [PATCH v7 13/18] x86/fsgsbase/64: Document GSBASE handling in the paranoid path Chang S. Bae
@ 2019-06-22 10:11   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, tglx, chang.seok.bae, luto, ravi.v.shankar, linux-kernel, ak, mingo

Commit-ID:  5bf0cab60ee2c730ec91ae0aabc3146bcfed138b
Gitweb:     https://git.kernel.org/tip/5bf0cab60ee2c730ec91ae0aabc3146bcfed138b
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:28 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:55 +0200

x86/entry/64: Document GSBASE handling in the paranoid path

On a FSGSBASE system, the way to handle GSBASE in the paranoid path is
different from the existing SWAPGS-based entry/exit path handling. Document
the reason and what has to be done for FSGSBASE enabled systems.

[ tglx: Massaged doc and changelog ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-14-git-send-email-chang.seok.bae@intel.com

---
 Documentation/x86/entry_64.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
index a48b3f6ebbe8..b87c1d816aea 100644
--- a/Documentation/x86/entry_64.rst
+++ b/Documentation/x86/entry_64.rst
@@ -108,3 +108,12 @@ We try to only use IST entries and the paranoid entry code for vectors
 that absolutely need the more expensive check for the GS base - and we
 generate all 'normal' entry points with the regular (faster) paranoid=0
 variant.
+
+On a FSGSBASE system, however, user space can set GS without kernel
+interaction. It means the value of GS base itself does not imply anything,
+whether a kernel value or a user space value. So, there is no longer a safe
+way to check whether the exception is entering from user mode or kernel
+mode in the paranoid entry code path. So the GSBASE value needs to be read
+out, saved and the kernel GSBASE value written. On exit the saved GSBASE
+value needs to be restored unconditionally. The non paranoid entry/exit
+code still uses SWAPGS unconditionally as the state is known.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] selftests/x86/fsgsbase: Test RD/WRGSBASE
  2019-05-08 10:02 ` [PATCH v7 14/18] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
@ 2019-06-22 10:12   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-06-22 10:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, hpa, luto, ravi.v.shankar, tglx, chang.seok.bae, linux-kernel, mingo

Commit-ID:  9ad75a0922e1533b08f3d1451bd908d19e5db41e
Gitweb:     https://git.kernel.org/tip/9ad75a0922e1533b08f3d1451bd908d19e5db41e
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 8 May 2019 03:02:29 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:55 +0200

selftests/x86/fsgsbase: Test RD/WRGSBASE

This validates that GS and GSBASE are independently preserved across
context switches.

[ chang: Use FSGSBASE instructions directly instead of .byte ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-15-git-send-email-chang.seok.bae@intel.com

---
 tools/testing/selftests/x86/fsgsbase.c | 102 ++++++++++++++++++++++++++++++++-
 1 file changed, 99 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index b02ddce49bbb..afd029897c79 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -26,6 +26,7 @@
 #include <stddef.h>
 #include <sys/ptrace.h>
 #include <sys/wait.h>
+#include <setjmp.h>
 
 #ifndef __x86_64__
 # error This test is 64-bit only
@@ -74,6 +75,43 @@ static void sigsegv(int sig, siginfo_t *si, void *ctx_void)
 
 }
 
+static jmp_buf jmpbuf;
+
+static void sigill(int sig, siginfo_t *si, void *ctx_void)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static bool have_fsgsbase;
+
+static inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+	return gsbase;
+}
+
+static inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+	return fsbase;
+}
+
+static inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
+static inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
 enum which_base { FS, GS };
 
 static unsigned long read_base(enum which_base which)
@@ -202,14 +240,16 @@ static void do_remote_base()
 	       to_set, hard_zero ? " and clear gs" : "", sel);
 }
 
-void do_unexpected_base(void)
+static __thread int set_thread_area_entry_number = -1;
+
+static void do_unexpected_base(void)
 {
 	/*
 	 * The goal here is to try to arrange for GS == 0, GSBASE !=
 	 * 0, and for the the kernel the think that GSBASE == 0.
 	 *
 	 * To make the test as reliable as possible, this uses
-	 * explicit descriptorss.  (This is not the only way.  This
+	 * explicit descriptors.  (This is not the only way.  This
 	 * could use ARCH_SET_GS with a low, nonzero base, but the
 	 * relevant side effect of ARCH_SET_GS could change.)
 	 */
@@ -242,7 +282,7 @@ void do_unexpected_base(void)
 			MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
 		memcpy(low_desc, &desc, sizeof(desc));
 
-		low_desc->entry_number = -1;
+		low_desc->entry_number = set_thread_area_entry_number;
 
 		/* 32-bit set_thread_area */
 		long ret;
@@ -257,6 +297,8 @@ void do_unexpected_base(void)
 			return;
 		}
 		printf("\tother thread: using GDT slot %d\n", desc.entry_number);
+		set_thread_area_entry_number = desc.entry_number;
+
 		asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)((desc.entry_number << 3) | 0x3)));
 	}
 
@@ -268,6 +310,34 @@ void do_unexpected_base(void)
 	asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)0));
 }
 
+void test_wrbase(unsigned short index, unsigned long base)
+{
+	unsigned short newindex;
+	unsigned long newbase;
+
+	printf("[RUN]\tGS = 0x%hx, GSBASE = 0x%lx\n", index, base);
+
+	asm volatile ("mov %0, %%gs" : : "rm" (index));
+	wrgsbase(base);
+
+	remote_base = 0;
+	ftx = 1;
+	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+	while (ftx != 0)
+		syscall(SYS_futex, &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
+
+	asm volatile ("mov %%gs, %0" : "=rm" (newindex));
+	newbase = rdgsbase();
+
+	if (newindex == index && newbase == base) {
+		printf("[OK]\tIndex and base were preserved\n");
+	} else {
+		printf("[FAIL]\tAfter switch, GS = 0x%hx and GSBASE = 0x%lx\n",
+		       newindex, newbase);
+		nerrs++;
+	}
+}
+
 static void *threadproc(void *ctx)
 {
 	while (1) {
@@ -439,6 +509,17 @@ int main()
 {
 	pthread_t thread;
 
+	/* Probe FSGSBASE */
+	sethandler(SIGILL, sigill, 0);
+	if (sigsetjmp(jmpbuf, 1) == 0) {
+		rdfsbase();
+		have_fsgsbase = true;
+		printf("\tFSGSBASE instructions are enabled\n");
+	} else {
+		printf("\tFSGSBASE instructions are disabled\n");
+	}
+	clearhandler(SIGILL);
+
 	sethandler(SIGSEGV, sigsegv, 0);
 
 	check_gs_value(0);
@@ -485,6 +566,21 @@ int main()
 
 	test_unexpected_base();
 
+	if (have_fsgsbase) {
+		unsigned short ss;
+
+		asm volatile ("mov %%ss, %0" : "=rm" (ss));
+
+		test_wrbase(0, 0);
+		test_wrbase(0, 1);
+		test_wrbase(0, 0x200000000);
+		test_wrbase(0, 0xffffffffffffffff);
+		test_wrbase(ss, 0);
+		test_wrbase(ss, 1);
+		test_wrbase(ss, 0x200000000);
+		test_wrbase(ss, 0xffffffffffffffff);
+	}
+
 	ftx = 3;  /* Kill the thread. */
 	syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE
  2019-05-08 10:02 ` [PATCH v7 15/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE Chang S. Bae
@ 2019-06-22 10:13   ` tip-bot for Chang S. Bae
  0 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Chang S. Bae @ 2019-06-22 10:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, ravi.v.shankar, linux-kernel, chang.seok.bae, luto, mingo, tglx, hpa

Commit-ID:  a87730cc3acc475eff12ddde3f7d5687371b5c76
Gitweb:     https://git.kernel.org/tip/a87730cc3acc475eff12ddde3f7d5687371b5c76
Author:     Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Wed, 8 May 2019 03:02:30 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:56 +0200

selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE

This validates that GS and GSBASE are independently preserved in
ptracer commands.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-16-git-send-email-chang.seok.bae@intel.com

---
 tools/testing/selftests/x86/fsgsbase.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index afd029897c79..21fd4f94b5b0 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -470,7 +470,7 @@ static void test_ptrace_write_gsbase(void)
 	wait(&status);
 
 	if (WSTOPSIG(status) == SIGTRAP) {
-		unsigned long gs;
+		unsigned long gs, base;
 		unsigned long gs_offset = USER_REGS_OFFSET(gs);
 		unsigned long base_offset = USER_REGS_OFFSET(gs_base);
 
@@ -486,6 +486,7 @@ static void test_ptrace_write_gsbase(void)
 			err(1, "PTRACE_POKEUSER");
 
 		gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+		base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);
 
 		/*
 		 * In a non-FSGSBASE system, the nonzero selector will load
@@ -496,8 +497,14 @@ static void test_ptrace_write_gsbase(void)
 		if (gs != 0x7) {
 			nerrs++;
 			printf("[FAIL]\tGS changed to %lx\n", gs);
+		} else if (have_fsgsbase && (base != 0xFF)) {
+			nerrs++;
+			printf("[FAIL]\tGSBASE changed to %lx\n", base);
 		} else {
-			printf("[OK]\tGS remained 0x7\n");
+			printf("[OK]\tGS remained 0x7 %s");
+			if (have_fsgsbase)
+				printf("and GSBASE changed to 0xFF");
+			printf("\n");
 		}
 	}
 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
  2019-05-08 10:02 ` [PATCH v7 16/18] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
@ 2019-06-22 10:14   ` tip-bot for Andy Lutomirski
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Andy Lutomirski @ 2019-06-22 10:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, chang.seok.bae, ravi.v.shankar, ak, tglx, hpa, luto

Commit-ID:  2032f1f96ee0da600633c6c627b9c0a2e0f8b8a6
Gitweb:     https://git.kernel.org/tip/2032f1f96ee0da600633c6c627b9c0a2e0f8b8a6
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Wed, 8 May 2019 03:02:31 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:56 +0200

x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com

---
 Documentation/admin-guide/kernel-parameters.txt |  3 +--
 arch/x86/kernel/cpu/common.c                    | 32 +++++++++++--------------
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b0fa5273b0fc..35bc3c3574c6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2857,8 +2857,7 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
-	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
-			replaced with a nofsgsbase flag.
+	nofsgsbase	[X86] Disables FSGSBASE instructions.
 
 	no_console_suspend
 			[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 71defe2d1b7c..1305f16b6105 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -366,21 +366,21 @@ out:
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
 {
-	unsafe_fsgsbase = true;
+	/* Require an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	/* Do not emit a message if the feature is not present. */
+	if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+	pr_info("FSGSBASE disabled via kernel command line\n");
 	return 1;
 }
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);
 
 /*
  * Protection Keys are not available in 32-bit mode.
@@ -1387,12 +1387,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-		if (unsafe_fsgsbase)
-			cr4_set_bits(X86_CR4_FSGSBASE);
-		else
-			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
-	}
+	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+		cr4_set_bits(X86_CR4_FSGSBASE);
 
 	/*
 	 * The vendor-specific functions might have changed features.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  2019-05-08 10:02 ` [PATCH v7 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
@ 2019-06-22 10:14   ` tip-bot for Andi Kleen
  2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andi Kleen
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot for Andi Kleen @ 2019-06-22 10:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, ak, mingo, ravi.v.shankar, linux-kernel, luto, hpa, chang.seok.bae

Commit-ID:  f987c955c74501c9295a81372c7d363cbe07c8a6
Gitweb:     https://git.kernel.org/tip/f987c955c74501c9295a81372c7d363cbe07c8a6
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 8 May 2019 03:02:32 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:56 +0200

x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.

One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.

Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
aux vector. AT_HWCAP2 is already used by PPC for similar purposes.

The application can access it open coded or by using the getauxval()
function in newer versions of glibc.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com

---
 arch/x86/include/uapi/asm/hwcap2.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h
index 6ebaae90e207..c5ce54e749f6 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
 /* MONITOR/MWAIT enabled in Ring 3 */
 #define HWCAP2_RING3MWAIT		(1 << 0)
 
+/* Kernel allows FSGSBASE instructions available in Ring 3 */
+#define HWCAP2_FSGSBASE			BIT(1)
+
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1305f16b6105..637c9117d5ae 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1387,8 +1387,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
 		cr4_set_bits(X86_CR4_FSGSBASE);
+		elf_hwcap2 |= HWCAP2_FSGSBASE;
+	}
 
 	/*
 	 * The vendor-specific functions might have changed features.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip:x86/cpu] Documentation/x86/64: Add documentation for GS/FS addressing mode
  2019-06-14  6:54   ` Thomas Gleixner
  2019-06-14 20:07     ` Bae, Chang Seok
  2019-06-16 15:54     ` Randy Dunlap
@ 2019-06-22 10:15     ` tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-06-22 10:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, corbet, rdunlap, chang.seok.bae, linux-kernel, ak,
	tglx, ravi.v.shankar

Commit-ID:  2c7b5ac5d5a93c4b0557293d06c6677f765081a6
Gitweb:     https://git.kernel.org/tip/2c7b5ac5d5a93c4b0557293d06c6677f765081a6
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 13 Jun 2019 22:04:24 +0300
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 22 Jun 2019 11:38:57 +0200

Documentation/x86/64: Add documentation for GS/FS addressing mode

Explain how the GS/FS based addressing can be utilized in user space
applications along with the differences between the generic prctl() based
GS/FS base control and the FSGSBASE version available on newer CPUs.

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Bae, Chang Seok" <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>,
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: "Shankar, Ravi V" <ravi.v.shankar@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906132246310.1791@nanos.tec.linutronix.de
---
 Documentation/x86/x86_64/fsgs.rst  | 199 +++++++++++++++++++++++++++++++++++++
 Documentation/x86/x86_64/index.rst |   1 +
 2 files changed, 200 insertions(+)

diff --git a/Documentation/x86/x86_64/fsgs.rst b/Documentation/x86/x86_64/fsgs.rst
new file mode 100644
index 000000000000..380c0b5ccca2
--- /dev/null
+++ b/Documentation/x86/x86_64/fsgs.rst
@@ -0,0 +1,199 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Using FS and GS segments in user space applications
+===================================================
+
+The x86 architecture supports segmentation. Instructions which access
+memory can use segment register based addressing mode. The following
+notation is used to address a byte within a segment:
+
+  Segment-register:Byte-address
+
+The segment base address is added to the Byte-address to compute the
+resulting virtual address which is accessed. This allows to access multiple
+instances of data with the identical Byte-address, i.e. the same code. The
+selection of a particular instance is purely based on the base-address in
+the segment register.
+
+In 32-bit mode the CPU provides 6 segments, which also support segment
+limits. The limits can be used to enforce address space protections.
+
+In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
+always 0 to provide a full 64bit address space. The FS and GS segments are
+still functional in 64-bit mode.
+
+Common FS and GS usage
+------------------------------
+
+The FS segment is commonly used to address Thread Local Storage (TLS). FS
+is usually managed by runtime code or a threading library. Variables
+declared with the '__thread' storage class specifier are instantiated per
+thread and the compiler emits the FS: address prefix for accesses to these
+variables. Each thread has its own FS base address so common code can be
+used without complex address offset calculations to access the per thread
+instances. Applications should not use FS for other purposes when they use
+runtimes or threading libraries which manage the per thread FS.
+
+The GS segment has no common use and can be used freely by
+applications. GCC and Clang support GS based addressing via address space
+identifiers.
+
+Reading and writing the FS/GS base address
+------------------------------------------
+
+There exist two mechanisms to read and write the FS/FS base address:
+
+ - the arch_prctl() system call
+
+ - the FSGSBASE instruction family
+
+Accessing FS/GS base with arch_prctl()
+--------------------------------------
+
+ The arch_prctl(2) based mechanism is available on all 64bit CPUs and all
+ kernel versions.
+
+ Reading the base:
+
+   arch_prctl(ARCH_GET_FS, &fsbase);
+   arch_prctl(ARCH_GET_GS, &gsbase);
+
+ Writing the base:
+
+   arch_prctl(ARCH_SET_FS, fsbase);
+   arch_prctl(ARCH_SET_GS, gsbase);
+
+ The ARCH_SET_GS prctl may be disabled depending on kernel configuration
+ and security settings.
+
+Accessing FS/GS base with the FSGSBASE instructions
+---------------------------------------------------
+
+ With the Ivy Bridge CPU generation Intel introduced a new set of
+ instructions to access the FS and GS base registers directly from user
+ space. These instructions are also supported on AMD Family 17H CPUs. The
+ following instructions are available:
+
+  =============== ===========================
+  RDFSBASE %reg   Read the FS base register
+  RDGSBASE %reg   Read the GS base register
+  WRFSBASE %reg   Write the FS base register
+  WRGSBASE %reg   Write the GS base register
+  =============== ===========================
+
+ The instructions avoid the overhead of the arch_prctl() syscall and allow
+ more flexible usage of the FS/GS addressing modes in user space
+ applications. This does not prevent conflicts between threading libraries
+ and runtimes which utilize FS and applications which want to use it for
+ their own purpose.
+
+FSGSBASE instructions enablement
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
+ available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
+
+ The availability of the instructions does not enable them
+ automatically. The kernel has to enable them explicitly in CR4. The
+ reason for this is that older kernels make assumptions about the values in
+ the GS register and enforce them when GS base is set via
+ arch_prctl(). Allowing user space to write arbitrary values to GS base
+ would violate these assumptions and cause malfunction.
+
+ On kernels which do not enable FSGSBASE the execution of the FSGSBASE
+ instructions will fault with a #UD exception.
+
+ The kernel provides reliable information about the enabled state in the
+ ELF AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the
+ kernel has FSGSBASE instructions enabled and applications can use them.
+ The following code example shows how this detection works::
+
+   #include <sys/auxv.h>
+   #include <elf.h>
+
+   /* Will be eventually in asm/hwcap.h */
+   #ifndef HWCAP2_FSGSBASE
+   #define HWCAP2_FSGSBASE        (1 << 1)
+   #endif
+
+   ....
+
+   unsigned val = getauxval(AT_HWCAP2);
+
+   if (val & HWCAP2_FSGSBASE)
+        printf("FSGSBASE enabled\n");
+
+FSGSBASE instructions compiler support
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
+instructions. Clang supports them as well.
+
+  =================== ===========================
+  _readfsbase_u64()   Read the FS base register
+  _readfsbase_u64()   Read the GS base register
+  _writefsbase_u64()  Write the FS base register
+  _writegsbase_u64()  Write the GS base register
+  =================== ===========================
+
+To utilize these instrinsics <immintrin.h> must be included in the source
+code and the compiler option -mfsgsbase has to be added.
+
+Compiler support for FS/GS based addressing
+-------------------------------------------
+
+GCC version 6 and newer provide support for FS/GS based addressing via
+Named Address Spaces. GCC implements the following address space
+identifiers for x86:
+
+  ========= ====================================
+  __seg_fs  Variable is addressed relative to FS
+  __seg_gs  Variable is addressed relative to GS
+  ========= ====================================
+
+The preprocessor symbols __SEG_FS and __SEG_GS are defined when these
+address spaces are supported. Code which implements fallback modes should
+check whether these symbols are defined. Usage example::
+
+  #ifdef __SEG_GS
+
+  long data0 = 0;
+  long data1 = 1;
+
+  long __seg_gs *ptr;
+
+  /* Check whether FSGSBASE is enabled by the kernel (HWCAP2_FSGSBASE) */
+  ....
+
+  /* Set GS to point to data0 */
+  _writegsbase_u64(&data0);
+
+  /* Access offset 0 of GS */
+  ptr = 0;
+  printf("data0 = %ld\n", *ptr);
+
+  /* Set GS to point to data1 */
+  _writegsbase_u64(&data1);
+  /* ptr still addresses offset 0! */
+  printf("data1 = %ld\n", *ptr);
+
+
+Clang does not provide the GCC address space identifiers, but it provides
+address spaces via an attribute based mechanism in Clang 5 and newer
+versions:
+
+ ==================================== =====================================
+  __attribute__((address_space(256))  Variable is addressed relative to GS
+  __attribute__((address_space(257))  Variable is addressed relative to FS
+ ==================================== =====================================
+
+FS/GS based addressing with inline assembly
+-------------------------------------------
+
+In case the compiler does not support address spaces, inline assembly can
+be used for FS/GS based addressing mode::
+
+	mov %fs:offset, %reg
+	mov %gs:offset, %reg
+
+	mov %reg, %fs:offset
+	mov %reg, %gs:offset
diff --git a/Documentation/x86/x86_64/index.rst b/Documentation/x86/x86_64/index.rst
index d6eaaa5a35fc..a56070fc8e77 100644
--- a/Documentation/x86/x86_64/index.rst
+++ b/Documentation/x86/x86_64/index.rst
@@ -14,3 +14,4 @@ x86_64 Support
    fake-numa-for-cpusets
    cpu-hotplug-spec
    machinecheck
+   fsgs

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path
  2019-05-08 10:02 ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Chang S. Bae
  2019-06-22 10:11   ` [tip:x86/cpu] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit tip-bot for Chang S. Bae
@ 2019-06-29  7:21   ` Bae, Chang Seok
  2019-06-29  7:37     ` Thomas Gleixner
  2020-06-18 13:50   ` [tip: x86/fsgsbase] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit tip-bot2 for Chang S. Bae
  2 siblings, 1 reply; 63+ messages in thread
From: Bae, Chang Seok @ 2019-06-29  7:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Vegard Nossum, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Shankar, Ravi V, LKML, Dave Hansen


> On May 8, 2019, at 03:02, Chang S. Bae <chang.seok.bae@intel.com> wrote:
> 
> ENTRY(paranoid_exit)
> …
> +
> +	/* On FSGSBASE systems, always restore the stashed GSBASE */
> +	wrgsbase	%rbx
> +	jmp	.Lparanoid_exit_no_swapgs;

It would crash any time getting a paranoid entry with user GS but kernel CR3.
The issue is thankfully uncovered by Vegard N. A relevant test case will be
published by Andy L. The patch fixes the issue. (Rebased on the tip master.)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b5e782a..dfdadc1 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1288,9 +1288,12 @@ ENTRY(paranoid_exit)
       /* Handle GS depending on FSGSBASE availability */
       ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "nop",X86_FEATURE_FSGSBASE

+       TRACE_IRQS_IRETQ
+       /* Always restore stashed CR3 value (see paranoid_entry) */
+       RESTORE_CR3     scratch_reg=%rax save_reg=%r14
       /* With FSGSBASE enabled, unconditionally restore GSBASE */
       wrgsbase        %rbx
-       jmp     .Lparanoid_exit_no_swapgs;
+       jmp     .Lparanoid_exit_restore;

.Lparanoid_exit_checkgs:
       /* On non-FSGSBASE systems, conditionally do SWAPGS */

>  ...
> .Lparanoid_exit_no_swapgs:
> 	TRACE_IRQS_IRETQ_DEBUG
> 	/* Always restore stashed CR3 value (see paranoid_entry) */
> 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
> +
> .Lparanoid_exit_restore:
> -	jmp restore_regs_and_return_to_kernel
> +	jmp	restore_regs_and_return_to_kernel
> END(paranoid_exit)
> 



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path
  2019-06-29  7:21   ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Bae, Chang Seok
@ 2019-06-29  7:37     ` Thomas Gleixner
  0 siblings, 0 replies; 63+ messages in thread
From: Thomas Gleixner @ 2019-06-29  7:37 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Andy Lutomirski, Vegard Nossum, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Shankar, Ravi V, LKML, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 610 bytes --]

On Sat, 29 Jun 2019, Bae, Chang Seok wrote:
> > On May 8, 2019, at 03:02, Chang S. Bae <chang.seok.bae@intel.com> wrote:
> > 
> > ENTRY(paranoid_exit)
> > …
> > +
> > +	/* On FSGSBASE systems, always restore the stashed GSBASE */
> > +	wrgsbase	%rbx
> > +	jmp	.Lparanoid_exit_no_swapgs;
> 
> It would crash any time getting a paranoid entry with user GS but kernel CR3.
> The issue is thankfully uncovered by Vegard N. A relevant test case will be
> published by Andy L. The patch fixes the issue. (Rebased on the tip master.)

Can you please provide a proper patch with a proper changelog?

Thanks,

	tglx
 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  2019-05-08 10:02 ` [PATCH v7 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
  2019-06-22 10:14   ` [tip:x86/cpu] " tip-bot for Andi Kleen
@ 2020-06-18 13:50   ` tip-bot2 for Andi Kleen
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Andi Kleen @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andi Kleen, Chang S. Bae, Thomas Gleixner, Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     742c45c3ecc9255e15eddbbcee44fd8de401cf1c
Gitweb:        https://git.kernel.org/tip/742c45c3ecc9255e15eddbbcee44fd8de401cf1c
Author:        Andi Kleen <ak@linux.intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:59 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:05 +02:00

x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.

One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.

Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
aux vector. AT_HWCAP2 is already used by PPC for similar purposes.

The application can access it open coded or by using the getauxval()
function in newer versions of glibc.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-14-sashal@kernel.org


---
 arch/x86/include/uapi/asm/hwcap2.h | 3 +++
 arch/x86/kernel/cpu/common.c       | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h
index 8b2effe..5fdfcb4 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
 /* MONITOR/MWAIT enabled in Ring 3 */
 #define HWCAP2_RING3MWAIT		(1 << 0)
 
+/* Kernel allows FSGSBASE instructions available in Ring 3 */
+#define HWCAP2_FSGSBASE			BIT(1)
+
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 18857ce..fca5612 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1512,8 +1512,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
 		cr4_set_bits(X86_CR4_FSGSBASE);
+		elf_hwcap2 |= HWCAP2_FSGSBASE;
+	}
 
 	/*
 	 * The vendor-specific functions might have changed features.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  2019-05-08 10:02 ` [PATCH v7 11/18] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
  2019-06-22 10:10   ` [tip:x86/cpu] x86/entry/64: " tip-bot for Chang S. Bae
@ 2020-06-18 13:50   ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: H. Peter Anvin, Chang S. Bae, Thomas Gleixner, Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     eaad981291ee36efee15a5e515d4598ae94ace07
Gitweb:        https://git.kernel.org/tip/eaad981291ee36efee15a5e515d4598ae94ace07
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:56 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:04 +02:00

x86/entry/64: Introduce the FIND_PERCPU_BASE macro

GSBASE is used to find per-CPU data in the kernel. But when GSBASE is
unknown, the per-CPU base can be found from the per_cpu_offset table with a
CPU NR.  The CPU NR is extracted from the limit field of the CPUNODE entry
in GDT, or by the RDPID instruction. This is a prerequisite for using
FSGSBASE in the low level entry code.

Also, add the GAS-compatible RDPID macro as binutils 2.23 do not support
it. Support is added in version 2.27.

[ tglx: Massaged changelog ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-12-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-11-sashal@kernel.org


---
 arch/x86/entry/calling.h    | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/inst.h | 15 +++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 4208c1e..5c0cbb4 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,7 @@
 #include <asm/percpu.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
+#include <asm/inst.h>
 
 /*
 
@@ -351,3 +352,36 @@ For 32-bit we have the following conventions - kernel is built with
 	call stackleak_erase
 #endif
 .endm
+
+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+	movq	$__CPUNODE_SEG, \reg
+	lsl	\reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro GET_PERCPU_BASE reg:req
+	ALTERNATIVE \
+		"LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+		"RDPID	\reg", \
+		X86_FEATURE_RDPID
+	andq	$VDSO_CPUNODE_MASK, \reg
+	movq	__per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro GET_PERCPU_BASE reg:req
+	movq	pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796d..d063841 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
 	.endif
 	MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
 	.endm
+
+.macro RDPID opd
+	REG_TYPE rdpid_opd_type \opd
+	.if rdpid_opd_type == REG_TYPE_R64
+	R64_NUM rdpid_opd \opd
+	.else
+	R32_NUM rdpid_opd \opd
+	.endif
+	.byte 0xf3
+	.if rdpid_opd > 7
+	PFX_REX rdpid_opd 0
+	.endif
+	.byte 0x0f, 0xc7
+	MODRM 0xc0 rdpid_opd 0x7
+.endm
 #endif
 
 #endif

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
  2019-05-08 10:02 ` [PATCH v7 16/18] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
  2019-06-22 10:14   ` [tip:x86/cpu] x86/cpu: Enable FSGSBASE on 64bit " tip-bot for Andy Lutomirski
@ 2020-06-18 13:50   ` tip-bot2 for Andy Lutomirski
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Chang S. Bae, Thomas Gleixner, Sasha Levin,
	Andi Kleen, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     b745cfba44c152c34363eea9e052367b6b1d652b
Gitweb:        https://git.kernel.org/tip/b745cfba44c152c34363eea9e052367b6b1d652b
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Thu, 28 May 2020 16:13:58 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:05 +02:00

x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-13-sashal@kernel.org


---
 Documentation/admin-guide/kernel-parameters.txt |  3 +--
 arch/x86/kernel/cpu/common.c                    | 32 +++++++---------
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7308db7..8c0d045 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3079,8 +3079,7 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
-	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
-			replaced with a nofsgsbase flag.
+	nofsgsbase	[X86] Disables FSGSBASE instructions.
 
 	no_console_suspend
 			[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7438a31..18857ce 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -441,21 +441,21 @@ static void __init setup_cr_pinning(void)
 	static_key_enable(&cr_pinning.key);
 }
 
-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
 {
-	unsafe_fsgsbase = true;
+	/* Require an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	/* Do not emit a message if the feature is not present. */
+	if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+	pr_info("FSGSBASE disabled via kernel command line\n");
 	return 1;
 }
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);
 
 /*
  * Protection Keys are not available in 32-bit mode.
@@ -1512,12 +1512,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_umip(c);
 
 	/* Enable FSGSBASE instructions if available. */
-	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-		if (unsafe_fsgsbase)
-			cr4_set_bits(X86_CR4_FSGSBASE);
-		else
-			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
-	}
+	if (cpu_has(c, X86_FEATURE_FSGSBASE))
+		cr4_set_bits(X86_CR4_FSGSBASE);
 
 	/*
 	 * The vendor-specific functions might have changed features.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  2019-05-08 10:02 ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Chang S. Bae
  2019-06-22 10:11   ` [tip:x86/cpu] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit tip-bot for Chang S. Bae
  2019-06-29  7:21   ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Bae, Chang Seok
@ 2020-06-18 13:50   ` tip-bot2 for Chang S. Bae
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: H. Peter Anvin, Andy Lutomirski, Thomas Gleixner, Chang S. Bae,
	Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     c82965f9e53005c1c62632c468968293262056cb
Gitweb:        https://git.kernel.org/tip/c82965f9e53005c1c62632c468968293262056cb
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:57 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:04 +02:00

x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit

Without FSGSBASE, user space cannot change GSBASE other than through a
PRCTL. The kernel enforces that the user space GSBASE value is postive as
negative values are used for detecting the kernel space GSBASE value in the
paranoid entry code.

If FSGSBASE is enabled, user space can set arbitrary GSBASE values without
kernel intervention, including negative ones, which breaks the paranoid
entry assumptions.

To avoid this, paranoid entry needs to unconditionally save the current
GSBASE value independent of the interrupted context, retrieve and write the
kernel GSBASE and unconditionally restore the saved value on exit. The
restore happens either in paranoid_exit or in the special exit path of the
NMI low level code.

All other entry code pathes which use unconditional SWAPGS are not affected
as they do not depend on the actual content.

[ tglx: Massaged changelogs and comments ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-13-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-12-sashal@kernel.org


---
 arch/x86/entry/calling.h  |   6 ++-
 arch/x86/entry/entry_64.S | 111 ++++++++++++++++++++++++++++---------
 2 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 5c0cbb4..98e4d88 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -342,6 +342,12 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 .endm
 
+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+	rdgsbase \save_reg
+	GET_PERCPU_BASE \scratch_reg
+	wrgsbase \scratch_reg
+.endm
+
 #else /* CONFIG_X86_64 */
 # undef		UNWIND_HINT_IRET_REGS
 # define	UNWIND_HINT_IRET_REGS
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 04d1eea..fb729f4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
 #include <asm/frame.h>
 #include <asm/trapnr.h>
 #include <asm/nospec-branch.h>
+#include <asm/fsgsbase.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -426,10 +427,7 @@ SYM_CODE_START(\asmsym)
 	testb	$3, CS-ORIG_RAX(%rsp)
 	jnz	.Lfrom_usermode_switch_stack_\@
 
-	/*
-	 * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
-	 * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
-	 */
+	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
 	call	paranoid_entry
 
 	UNWIND_HINT_REGS
@@ -458,10 +456,7 @@ SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS offset=8
 	ASM_CLAC
 
-	/*
-	 * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
-	 * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
-	 */
+	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
 	call	paranoid_entry
 	UNWIND_HINT_REGS
 
@@ -798,9 +793,14 @@ SYM_CODE_END(xen_failsafe_callback)
 #endif /* CONFIG_XEN_PV */
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs. Return GSBASE related information
+ * in EBX depending on the availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE	R/EBX
+ *     N        0 -> SWAPGS on exit
+ *              1 -> no SWAPGS on exit
+ *
+ *     Y        GSBASE value at entry, must be restored in paranoid_exit
  */
 SYM_CODE_START_LOCAL(paranoid_entry)
 	UNWIND_HINT_FUNC
@@ -808,7 +808,6 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
 
-1:
 	/*
 	 * Always stash CR3 in %r14.  This value will be restored,
 	 * verbatim, at exit.  Needed if paranoid_entry interrupted
@@ -826,6 +825,28 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	/*
+	 * Handling GSBASE depends on the availability of FSGSBASE.
+	 *
+	 * Without FSGSBASE the kernel enforces that negative GSBASE
+	 * values indicate kernel GSBASE. With FSGSBASE no assumptions
+	 * can be made about the GSBASE value when entering from user
+	 * space.
+	 */
+	ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
+
+	/*
+	 * Read the current GSBASE and store it in %rbx unconditionally,
+	 * retrieve and set the current CPUs kernel GSBASE. The stored value
+	 * has to be restored in paranoid_exit unconditionally.
+	 *
+	 * The MSR write ensures that no subsequent load is based on a
+	 * mispredicted GSBASE. No extra FENCE required.
+	 */
+	SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+	ret
+
+.Lparanoid_entry_checkgs:
 	/* EBX = 1 -> kernel GSBASE active, no restore required */
 	movl	$1, %ebx
 	/*
@@ -860,24 +881,45 @@ SYM_CODE_END(paranoid_entry)
  *
  * We may be returning to very strange contexts (e.g. very early
  * in syscall entry), so checking for preemption here would
- * be complicated.  Fortunately, we there's no good reason
- * to try to handle preemption here.
+ * be complicated.  Fortunately, there's no good reason to try
+ * to handle preemption here.
+ *
+ * R/EBX contains the GSBASE related information depending on the
+ * availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE	R/EBX
+ *     N        0 -> SWAPGS on exit
+ *              1 -> no SWAPGS on exit
  *
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ *     Y        User space GSBASE, must be restored unconditionally
  */
 SYM_CODE_START_LOCAL(paranoid_exit)
 	UNWIND_HINT_REGS
-	/* If EBX is 0, SWAPGS is required */
-	testl	%ebx, %ebx
-	jnz	.Lparanoid_exit_no_swapgs
-	/* Always restore stashed CR3 value (see paranoid_entry) */
-	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
+	/*
+	 * The order of operations is important. RESTORE_CR3 requires
+	 * kernel GSBASE.
+	 *
+	 * NB to anyone to try to optimize this code: this code does
+	 * not execute at all for exceptions from user mode. Those
+	 * exceptions go through error_exit instead.
+	 */
+	RESTORE_CR3	scratch_reg=%rax save_reg=%r14
+
+	/* Handle the three GSBASE cases */
+	ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
+
+	/* With FSGSBASE enabled, unconditionally restore GSBASE */
+	wrgsbase	%rbx
+	jmp		restore_regs_and_return_to_kernel
+
+.Lparanoid_exit_checkgs:
+	/* On non-FSGSBASE systems, conditionally do SWAPGS */
+	testl		%ebx, %ebx
+	jnz		restore_regs_and_return_to_kernel
+
+	/* We are returning to a context with user GSBASE */
 	SWAPGS_UNSAFE_STACK
-	jmp	restore_regs_and_return_to_kernel
-.Lparanoid_exit_no_swapgs:
-	/* Always restore stashed CR3 value (see paranoid_entry) */
-	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14
-	jmp restore_regs_and_return_to_kernel
+	jmp		restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
 /*
@@ -1282,10 +1324,27 @@ end_repeat_nmi:
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
-	testl	%ebx, %ebx			/* swapgs needed? */
+	/*
+	 * The above invocation of paranoid_entry stored the GSBASE
+	 * related information in R/EBX depending on the availability
+	 * of FSGSBASE.
+	 *
+	 * If FSGSBASE is enabled, restore the saved GSBASE value
+	 * unconditionally, otherwise take the conditional SWAPGS path.
+	 */
+	ALTERNATIVE "jmp nmi_no_fsgsbase", "", X86_FEATURE_FSGSBASE
+
+	wrgsbase	%rbx
+	jmp	nmi_restore
+
+nmi_no_fsgsbase:
+	/* EBX == 0 -> invoke SWAPGS */
+	testl	%ebx, %ebx
 	jnz	nmi_restore
+
 nmi_swapgs:
 	SWAPGS_UNSAFE_STACK
+
 nmi_restore:
 	POP_REGS
 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  2019-05-08 10:02 ` [PATCH v7 10/18] x86/entry/64: Switch CR3 before SWAPGS on the paranoid entry Chang S. Bae
  2019-06-22 10:09   ` [tip:x86/cpu] x86/entry/64: Switch CR3 before SWAPGS in " tip-bot for Chang S. Bae
@ 2020-06-18 13:50   ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Chang S. Bae, Thomas Gleixner, Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     96b2371413e8f636a5f25c42a933af21c35a2a41
Gitweb:        https://git.kernel.org/tip/96b2371413e8f636a5f25c42a933af21c35a2a41
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:55 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:03 +02:00

x86/entry/64: Switch CR3 before SWAPGS in paranoid entry

When FSGSBASE is enabled, the GSBASE handling in paranoid entry will need
to retrieve the kernel GSBASE which requires that the kernel page table is
active.

As the CR3 switch to the kernel page tables (PTI is active) does not depend
on kernel GSBASE, move the CR3 switch in front of the GSBASE handling.

Comment the EBX content while at it.

No functional change.

[ tglx: Rewrote changelog and comments ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-11-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-10-sashal@kernel.org


---
 arch/x86/entry/entry_64.S | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d2a00c9..04d1eea 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -807,13 +807,6 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	cld
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
-	movl	$1, %ebx
-	movl	$MSR_GS_BASE, %ecx
-	rdmsr
-	testl	%edx, %edx
-	js	1f				/* negative -> in kernel */
-	SWAPGS
-	xorl	%ebx, %ebx
 
 1:
 	/*
@@ -825,9 +818,29 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	 * This is also why CS (stashed in the "iret frame" by the
 	 * hardware at entry) can not be used: this may be a return
 	 * to kernel code, but with a user CR3 value.
+	 *
+	 * Switching CR3 does not depend on kernel GSBASE so it can
+	 * be done before switching to the kernel GSBASE. This is
+	 * required for FSGSBASE because the kernel GSBASE has to
+	 * be retrieved from a kernel internal table.
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	/* EBX = 1 -> kernel GSBASE active, no restore required */
+	movl	$1, %ebx
+	/*
+	 * The kernel-enforced convention is a negative GSBASE indicates
+	 * a kernel value. No SWAPGS needed on entry and exit.
+	 */
+	movl	$MSR_GS_BASE, %ecx
+	rdmsr
+	testl	%edx, %edx
+	jns	.Lparanoid_entry_swapgs
+	ret
+
+.Lparanoid_entry_swapgs:
+	SWAPGS
+
 	/*
 	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
 	 * unconditional CR3 write, even in the PTI case.  So do an lfence
@@ -835,6 +848,8 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	 */
 	FENCE_SWAPGS_KERNEL_ENTRY
 
+	/* EBX = 0 -> SWAPGS required on exit */
+	xorl	%ebx, %ebx
 	ret
 SYM_CODE_END(paranoid_entry)
 
@@ -852,7 +867,8 @@ SYM_CODE_END(paranoid_entry)
  */
 SYM_CODE_START_LOCAL(paranoid_exit)
 	UNWIND_HINT_REGS
-	testl	%ebx, %ebx			/* swapgs needed? */
+	/* If EBX is 0, SWAPGS is required */
+	testl	%ebx, %ebx
 	jnz	.Lparanoid_exit_no_swapgs
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3	scratch_reg=%rbx save_reg=%r14

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
  2019-05-08 10:02 ` [PATCH v7 08/18] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions Chang S. Bae
  2019-06-22 10:09   ` [tip:x86/cpu] x86/process/64: Use FSGSBASE instructions on thread copy and ptrace tip-bot for Chang S. Bae
@ 2020-06-18 13:50   ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Chang S. Bae, Thomas Gleixner, Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     005f141e5d5e05d3986539567d0bc5aa2f4dc640
Gitweb:        https://git.kernel.org/tip/005f141e5d5e05d3986539567d0bc5aa2f4dc640
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:53 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:02 +02:00

x86/process/64: Use FSGSBASE instructions on thread copy and ptrace

When FSGSBASE is enabled, copying threads and reading fsbase and gsbase
using ptrace must read the actual values.

When copying a thread, use save_fsgs() and copy the saved values.  For
ptrace, the bases must be read from memory regardless of the selector if
FSGSBASE is enabled.

[ tglx: Invoke __rdgsbase_inactive() with interrupts disabled ]
[ luto: Massage changelog ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-9-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-8-sashal@kernel.org


---
 arch/x86/kernel/process.c    | 10 ++++++----
 arch/x86/kernel/process_64.c |  6 ++++--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0..216c88d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -140,10 +140,12 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
 #ifdef CONFIG_X86_64
-	savesegment(gs, p->thread.gsindex);
-	p->thread.gsbase = p->thread.gsindex ? 0 : current->thread.gsbase;
-	savesegment(fs, p->thread.fsindex);
-	p->thread.fsbase = p->thread.fsindex ? 0 : current->thread.fsbase;
+	current_save_fsgs();
+	p->thread.fsindex = current->thread.fsindex;
+	p->thread.fsbase = current->thread.fsbase;
+	p->thread.gsindex = current->thread.gsindex;
+	p->thread.gsbase = current->thread.gsbase;
+
 	savesegment(es, p->thread.es);
 	savesegment(ds, p->thread.ds);
 #else
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 8ccc587..d618969 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -426,7 +426,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		fsbase = x86_fsbase_read_cpu();
-	else if (task->thread.fsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.fsindex == 0))
 		fsbase = task->thread.fsbase;
 	else
 		fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -440,7 +441,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 
 	if (task == current)
 		gsbase = x86_gsbase_read_cpu_inactive();
-	else if (task->thread.gsindex == 0)
+	else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+		 (task->thread.gsindex == 0))
 		gsbase = task->thread.gsbase;
 	else
 		gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/process/64: Use FSBSBASE in switch_to() if available
  2019-05-08 10:02 ` [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
  2019-06-21 15:22   ` Thomas Gleixner
  2019-06-22 10:08   ` [tip:x86/cpu] x86/process/64: Use FSBSBASE in switch_to() if available tip-bot for Andy Lutomirski
@ 2020-06-18 13:50   ` tip-bot2 for Andy Lutomirski
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Chang S. Bae, Thomas Gleixner, Sasha Levin,
	Andi Kleen, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     673903495c85137791d5820d690229efe09c8f7b
Gitweb:        https://git.kernel.org/tip/673903495c85137791d5820d690229efe09c8f7b
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Thu, 28 May 2020 16:13:51 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:02 +02:00

x86/process/64: Use FSBSBASE in switch_to() if available

With the new FSGSBASE instructions, FS and GSABSE can be efficiently read
and writen in __switch_to().  Use that capability to preserve the full
state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas.  (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE if
WRGSBASE was used, but users are expected to read the CPU manual before
doing things like that.)

This is a considerable speedup.  It seems to save about 100 cycles
per context switch compared to the baseline 4.6-rc1 behavior on a
Skylake laptop. This is mostly due to avoiding the WRMSR operation.

[ chang: 5~10% performance improvements were seen with a context switch
  benchmark that ran threads with different FS/GSBASE values (to the
  baseline 4.16). Minor edit on the changelog. ]

[ tglx: Masaage changelog ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1557309753-24073-8-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-6-sashal@kernel.org


---
 arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ef2f755..8ccc587 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -236,8 +236,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
 {
 	savesegment(fs, task->thread.fsindex);
 	savesegment(gs, task->thread.gsindex);
-	save_base_legacy(task, task->thread.fsindex, FS);
-	save_base_legacy(task, task->thread.gsindex, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/*
+		 * If FSGSBASE is enabled, we can't make any useful guesses
+		 * about the base, and user code expects us to save the current
+		 * value.  Fortunately, reading the base directly is efficient.
+		 */
+		task->thread.fsbase = rdfsbase();
+		task->thread.gsbase = __rdgsbase_inactive();
+	} else {
+		save_base_legacy(task, task->thread.fsindex, FS);
+		save_base_legacy(task, task->thread.gsindex, GS);
+	}
 }
 
 /*
@@ -319,10 +329,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 					      struct thread_struct *next)
 {
-	load_seg_legacy(prev->fsindex, prev->fsbase,
-			next->fsindex, next->fsbase, FS);
-	load_seg_legacy(prev->gsindex, prev->gsbase,
-			next->gsindex, next->gsbase, GS);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		/* Update the FS and GS selectors if they could have changed. */
+		if (unlikely(prev->fsindex || next->fsindex))
+			loadseg(FS, next->fsindex);
+		if (unlikely(prev->gsindex || next->gsindex))
+			loadseg(GS, next->gsindex);
+
+		/* Update the bases. */
+		wrfsbase(next->fsbase);
+		__wrgsbase_inactive(next->gsbase);
+	} else {
+		load_seg_legacy(prev->fsindex, prev->fsbase,
+				next->fsindex, next->fsbase, FS);
+		load_seg_legacy(prev->gsindex, prev->gsbase,
+				next->gsindex, next->gsbase, GS);
+	}
 }
 
 static unsigned long x86_fsgsbase_read_task(struct task_struct *task,

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  2019-05-08 10:02 ` [PATCH v7 06/18] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
  2019-06-22 10:07   ` [tip:x86/cpu] x86/fsgsbase/64: Enable FSGSBASE instructions in " tip-bot for Chang S. Bae
@ 2020-06-18 13:50   ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Chang S. Bae, Thomas Gleixner, Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     58edfd2e0a93c9adc2f29902a0335af0584041a0
Gitweb:        https://git.kernel.org/tip/58edfd2e0a93c9adc2f29902a0335af0584041a0
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:50 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:00 +02:00

x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions

Add cpu feature conditional FSGSBASE access to the relevant helper
functions. That allows to accelerate certain FS/GS base operations in
subsequent changes.

Note, that while possible, the user space entry/exit GSBASE operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.

To preserve the SWAPGS entry mechanism introduce __[rd|wr]gsbase_inactive()
helpers. Note, for Xen PV, paravirt hooks can be added later as they might
allow a very efficient but different implementation.

[ tglx: Massaged changelog, convert it to noinstr and force inline
  	native_swapgs() ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-5-sashal@kernel.org


---
 arch/x86/include/asm/fsgsbase.h  | 27 +++++-------
 arch/x86/include/asm/processor.h |  2 +-
 arch/x86/kernel/process_64.c     | 68 +++++++++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177..aefd537 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
 	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
 }
 
+#include <asm/cpufeature.h>
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
 {
 	unsigned long fsbase;
 
-	rdmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		fsbase = rdfsbase();
+	else
+		rdmsrl(MSR_FS_BASE, fsbase);
 
 	return fsbase;
 }
 
-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
-	unsigned long gsbase;
-
-	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
-	return gsbase;
-}
-
 static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-	wrmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		wrfsbase(fsbase);
+	else
+		wrmsrl(MSR_FS_BASE, fsbase);
 }
 
-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
-	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 42cd333..f66202d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -575,7 +575,7 @@ native_load_sp0(unsigned long sp0)
 	this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0);
 }
 
-static inline void native_swapgs(void)
+static __always_inline void native_swapgs(void)
 {
 #ifdef CONFIG_X86_64
 	asm volatile("swapgs" ::: "memory");
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9a97415..c41e0aa 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -150,6 +150,44 @@ enum which_selector {
 };
 
 /*
+ * Out of line to be protected from kprobes and tracing. If this would be
+ * traced or probed than any access to a per CPU variable happens with
+ * the wrong GS.
+ *
+ * It is not used on Xen paravirt. When paravirt support is needed, it
+ * needs to be renamed with native_ prefix.
+ */
+static noinstr unsigned long __rdgsbase_inactive(void)
+{
+	unsigned long gsbase;
+
+	lockdep_assert_irqs_disabled();
+
+	native_swapgs();
+	gsbase = rdgsbase();
+	native_swapgs();
+
+	return gsbase;
+}
+
+/*
+ * Out of line to be protected from kprobes and tracing. If this would be
+ * traced or probed than any access to a per CPU variable happens with
+ * the wrong GS.
+ *
+ * It is not used on Xen paravirt. When paravirt support is needed, it
+ * needs to be renamed with native_ prefix.
+ */
+static noinstr void __wrgsbase_inactive(unsigned long gsbase)
+{
+	lockdep_assert_irqs_disabled();
+
+	native_swapgs();
+	wrgsbase(gsbase);
+	native_swapgs();
+}
+
+/*
  * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
  * not available.  The goal is to be reasonably fast on non-FSGSBASE systems.
  * It's forcibly inlined because it'll generate better code and this function
@@ -327,6 +365,36 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 	return base;
 }
 
+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		gsbase = __rdgsbase_inactive();
+		local_irq_restore(flags);
+	} else {
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+
+	return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		__wrgsbase_inactive(gsbase);
+		local_irq_restore(flags);
+	} else {
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+}
+
 unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
 	unsigned long fsbase;

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  2019-05-08 10:02 ` [PATCH v7 05/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
  2019-06-22 10:06   ` [tip:x86/cpu] " tip-bot for Andi Kleen
@ 2020-06-18 13:50   ` tip-bot2 for Andi Kleen
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Andi Kleen @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andi Kleen, Andy Lutomirski, Chang S. Bae, Thomas Gleixner,
	Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     b15378ca50810c1350086cafad3fe19979262a83
Gitweb:        https://git.kernel.org/tip/b15378ca50810c1350086cafad3fe19979262a83
Author:        Andi Kleen <ak@linux.intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:49 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:47:00 +02:00

x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
  make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/1557309753-24073-6-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-4-sashal@kernel.org


---
 arch/x86/include/asm/fsgsbase.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index bca4c74..fdd1177 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task);
 extern void x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
 extern void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);
 
+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+	unsigned long fsbase;
+
+	asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+	return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+	unsigned long gsbase;
+
+	asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+	return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+	asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  2019-05-08 10:02 ` [PATCH v7 03/18] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
  2019-06-22 10:05   ` [tip:x86/cpu] x86/cpu: " tip-bot for Andy Lutomirski
@ 2020-06-18 13:50   ` tip-bot2 for Andy Lutomirski
  1 sibling, 0 replies; 63+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Chang S. Bae, Thomas Gleixner, Sasha Levin,
	Andi Kleen, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     dd649bd0b3aa012740059b1ba31ecad28a408f7f
Gitweb:        https://git.kernel.org/tip/dd649bd0b3aa012740059b1ba31ecad28a408f7f
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Thu, 28 May 2020 16:13:48 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:46:59 +02:00

x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

This is temporary.  It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole.  Don't do it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-3-sashal@kernel.org


---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++-
 arch/x86/kernel/cpu/common.c                    | 24 ++++++++++++++++-
 2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad..7308db7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3079,6 +3079,9 @@
 	no5lvl		[X86-64] Disable 5-level paging mode. Forces
 			kernel to use 4-level paging instead.
 
+	unsafe_fsgsbase	[X86] Allow FSGSBASE instructions.  This will be
+			replaced with a nofsgsbase flag.
+
 	no_console_suspend
 			[HW] Never suspend the console
 			Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 043d93c..7438a31 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -442,6 +442,22 @@ static void __init setup_cr_pinning(void)
 }
 
 /*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+	unsafe_fsgsbase = true;
+	return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
+/*
  * Protection Keys are not available in 32-bit mode.
  */
 static bool pku_disabled;
@@ -1495,6 +1511,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	setup_smap(c);
 	setup_umip(c);
 
+	/* Enable FSGSBASE instructions if available. */
+	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+		if (unsafe_fsgsbase)
+			cr4_set_bits(X86_CR4_FSGSBASE);
+		else
+			clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+	}
+
 	/*
 	 * The vendor-specific functions might have changed features.
 	 * Now we do "generic changes."

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [tip: x86/fsgsbase] x86/ptrace: Prevent ptrace from clearing the FS/GS selector
  2019-06-16 15:44       ` Bae, Chang Seok
  2019-06-16 16:32         ` Thomas Gleixner
  2019-06-22 10:04         ` [tip:x86/cpu] x86/ptrace: Prevent ptrace from clearing the FS/GS selector tip-bot for Chang S. Bae
@ 2020-06-18 13:50         ` tip-bot2 for Chang S. Bae
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2020-06-18 13:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Chang S. Bae, Thomas Gleixner, Sasha Levin, x86, LKML

The following commit has been merged into the x86/fsgsbase branch of tip:

Commit-ID:     fddf8ba1e48860211c9639d00883833b42fcc1e0
Gitweb:        https://git.kernel.org/tip/fddf8ba1e48860211c9639d00883833b42fcc1e0
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Thu, 28 May 2020 16:13:47 -04:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Jun 2020 15:46:59 +02:00

x86/ptrace: Prevent ptrace from clearing the FS/GS selector

When a ptracer writes a ptracee's FS/GSBASE with a different value, the
selector is also cleared. This behavior is not correct as the selector
should be preserved.

Update only the base value and leave the selector intact. To simplify the
code further remove the conditional checking for the same value as this
code is not performance critical.

The only recognizable downside of this change is when the selector is
already nonzero on write. The base will be reloaded according to the
selector. But the case is highly unexpected in real usages.

[ tglx: Massage changelog ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/9040CFCD-74BD-4C17-9A01-B9B713CF6B10@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-2-sashal@kernel.org


---
 arch/x86/kernel/ptrace.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 4413058..1c7646c 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -379,25 +379,12 @@ static int putreg(struct task_struct *child,
 	case offsetof(struct user_regs_struct,fs_base):
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		/*
-		 * When changing the FS base, use do_arch_prctl_64()
-		 * to set the index to zero and to set the base
-		 * as requested.
-		 *
-		 * NB: This behavior is nonsensical and likely needs to
-		 * change when FSGSBASE support is added.
-		 */
-		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+		x86_fsbase_write_task(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
-		/*
-		 * Exactly the same here as the %fs handling above.
-		 */
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
-		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+		x86_gsbase_write_task(child, value);
 		return 0;
 #endif
 	}

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2020-06-18 13:52 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-08 10:02 [PATCH v7 00/18] x86: Enable FSGSBASE instructions Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 01/18] x86/fsgsbase/64: Fix ARCH_SET_FS/GS behaviors for a remote task Chang S. Bae
     [not found]   ` <74F4F506-2913-4013-9D81-A0C69FA8CDF1@intel.com>
     [not found]     ` <6420E1A5-B5AD-4028-AA91-AA4D5445AC83@intel.com>
2019-06-16 15:44       ` Bae, Chang Seok
2019-06-16 16:32         ` Thomas Gleixner
2019-06-22 10:04         ` [tip:x86/cpu] x86/ptrace: Prevent ptrace from clearing the FS/GS selector tip-bot for Chang S. Bae
2020-06-18 13:50         ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 02/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write Chang S. Bae
2019-06-22 10:04   ` [tip:x86/cpu] " tip-bot for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 03/18] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
2019-06-22 10:05   ` [tip:x86/cpu] x86/cpu: " tip-bot for Andy Lutomirski
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
2019-05-08 10:02 ` [PATCH v7 04/18] kbuild: Raise the minimum required binutils version to 2.21 Chang S. Bae
2019-06-22 10:06   ` [tip:x86/cpu] " tip-bot for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 05/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
2019-06-22 10:06   ` [tip:x86/cpu] " tip-bot for Andi Kleen
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andi Kleen
2019-05-08 10:02 ` [PATCH v7 06/18] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
2019-06-22 10:07   ` [tip:x86/cpu] x86/fsgsbase/64: Enable FSGSBASE instructions in " tip-bot for Chang S. Bae
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 07/18] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
2019-06-21 15:22   ` Thomas Gleixner
2019-06-22 10:08   ` [tip:x86/cpu] x86/process/64: Use FSBSBASE in switch_to() if available tip-bot for Andy Lutomirski
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
2019-05-08 10:02 ` [PATCH v7 08/18] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions Chang S. Bae
2019-06-22 10:09   ` [tip:x86/cpu] x86/process/64: Use FSGSBASE instructions on thread copy and ptrace tip-bot for Chang S. Bae
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 09/18] x86/entry/64: Add the READ_MSR_GSBASE macro Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 10/18] x86/entry/64: Switch CR3 before SWAPGS on the paranoid entry Chang S. Bae
2019-06-22 10:09   ` [tip:x86/cpu] x86/entry/64: Switch CR3 before SWAPGS in " tip-bot for Chang S. Bae
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 11/18] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
2019-06-22 10:10   ` [tip:x86/cpu] x86/entry/64: " tip-bot for Chang S. Bae
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Chang S. Bae
2019-06-22 10:11   ` [tip:x86/cpu] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit tip-bot for Chang S. Bae
2019-06-29  7:21   ` [PATCH v7 12/18] x86/fsgsbase/64: GSBASE handling with FSGSBASE in the paranoid path Bae, Chang Seok
2019-06-29  7:37     ` Thomas Gleixner
2020-06-18 13:50   ` [tip: x86/fsgsbase] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit tip-bot2 for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 13/18] x86/fsgsbase/64: Document GSBASE handling in the paranoid path Chang S. Bae
2019-06-22 10:11   ` [tip:x86/cpu] x86/entry/64: " tip-bot for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 14/18] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
2019-06-22 10:12   ` [tip:x86/cpu] selftests/x86/fsgsbase: Test RD/WRGSBASE tip-bot for Andy Lutomirski
2019-05-08 10:02 ` [PATCH v7 15/18] selftests/x86/fsgsbase: Test ptracer-induced GSBASE write with FSGSBASE Chang S. Bae
2019-06-22 10:13   ` [tip:x86/cpu] " tip-bot for Chang S. Bae
2019-05-08 10:02 ` [PATCH v7 16/18] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
2019-06-22 10:14   ` [tip:x86/cpu] x86/cpu: Enable FSGSBASE on 64bit " tip-bot for Andy Lutomirski
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andy Lutomirski
2019-05-08 10:02 ` [PATCH v7 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
2019-06-22 10:14   ` [tip:x86/cpu] " tip-bot for Andi Kleen
2020-06-18 13:50   ` [tip: x86/fsgsbase] " tip-bot2 for Andi Kleen
2019-05-08 10:02 ` [PATCH v7 18/18] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae
2019-06-14  6:54   ` Thomas Gleixner
2019-06-14 20:07     ` Bae, Chang Seok
     [not found]       ` <89BE934A-A392-4CED-83E5-CA4FADDAE6DF@intel.com>
2019-06-16  8:39         ` Thomas Gleixner
2019-06-16 12:34           ` Thomas Gleixner
2019-06-16 15:34             ` Bae, Chang Seok
2019-06-16 16:05               ` Thomas Gleixner
2019-06-16 20:48                 ` Bae, Chang Seok
2019-06-16 22:00                   ` Thomas Gleixner
     [not found]                     ` <8E2E84B6-BCCC-424D-A1A7-604828B389FB@intel.com>
2019-06-17  5:18                       ` Thomas Gleixner
2019-06-16 15:54     ` Randy Dunlap
2019-06-16 16:07       ` Thomas Gleixner
2019-06-22 10:15     ` [tip:x86/cpu] Documentation/x86/64: Add documentation for GS/FS addressing mode tip-bot for Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).