LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v7 00/10] Add TDX Guest Support (#VE handler support)
@ 2021-10-05 20:41 Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation Kuppuswamy Sathyanarayanan
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Hi All,

Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
hosts and some physical attacks. This series adds #VE handler support,
for port I/O, MMIO and MWAIT/MONITOR features in TDX guest.

This series is the continuation of the patch series titled "Add TDX Guest
Support (Initial support)" which added initial support for TDX guests. You
can find the patchset in the following link.

[set 1, v8] - https://lore.kernel.org/lkml/20211005025205.1784480-1-sathyanarayanan.kuppuswamy@linux.intel.com/

Also please note that this series alone is not necessarily fully
functional.

You can find TDX related documents in the following link.

https://software.intel.com/content/www/br/pt/develop/articles/intel-trust-domain-extensions.html

Changes since v6:
 * Included section titles in spec references

Changes since v5:
 * Rebased on top of v5.15-rc1.
 * Rebased on top of Tom Landeckys latest CC support patches.

Changes since v4:
 * Renamed tdg_ prefix to tdx_.
 * Rest of changelogs are included in patches in-line.

Changes since v3:
 * Rebased on top of Tom Lendacky protected guest changes.
 * Rest of changelogs are included in patches in-line.

Changes since v2:
 * Rebased on top of v5.14-rc1.
 * Rest of changelogs are included in patches in-line.

Changes since v1:
 * Rebased on top of TDX guest set 1 patches (which had some core API changes).
 * Moved "x86/tdx: Add early_is_tdx_guest() interface" patch from set 1 patch
   series to this patchset (since it is only used in early I/O support case).
 * Rest of changelogs are included in patches in-line.

Andi Kleen (1):
  x86/tdx: Handle early IO operations

Kirill A. Shutemov (6):
  x86/io: Allow to override inX() and outX() implementation
  x86/tdx: Handle port I/O
  x86/insn-eval: Introduce insn_get_modrm_reg_ptr()
  x86/insn-eval: Introduce insn_decode_mmio()
  x86/sev-es: Use insn_decode_mmio() for MMIO implementation
  x86/tdx: Handle in-kernel MMIO

Kuppuswamy Sathyanarayanan (3):
  x86/tdx: Add early_is_tdx_guest() interface
  x86/tdx: Handle port I/O in decompression code
  x86/tdx: Handle MWAIT and MONITOR

 arch/x86/boot/compressed/Makefile |   2 +
 arch/x86/boot/compressed/tdcall.S |   3 +
 arch/x86/boot/compressed/tdx.c    |  31 +++++
 arch/x86/boot/cpuflags.c          |  12 +-
 arch/x86/boot/cpuflags.h          |   2 +
 arch/x86/include/asm/insn-eval.h  |  13 ++
 arch/x86/include/asm/io.h         |  24 +++-
 arch/x86/include/asm/tdx.h        |  64 +++++++++
 arch/x86/kernel/cpu/intel.c       |   1 +
 arch/x86/kernel/head64.c          |   3 +
 arch/x86/kernel/sev.c             | 171 ++++++------------------
 arch/x86/kernel/tdx.c             | 211 ++++++++++++++++++++++++++++++
 arch/x86/lib/insn-eval.c          | 102 +++++++++++++++
 include/linux/cc_platform.h       |  11 ++
 14 files changed, 511 insertions(+), 139 deletions(-)
 create mode 100644 arch/x86/boot/compressed/tdcall.S
 create mode 100644 arch/x86/boot/compressed/tdx.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-10-17 19:27   ` Thomas Gleixner
  2021-10-05 20:41 ` [PATCH v7 02/10] x86/tdx: Add early_is_tdx_guest() interface Kuppuswamy Sathyanarayanan
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

The patch allows to override the implementation of the port IO
helpers. TDX code will provide an implementation that redirect the
helpers to paravirt calls.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * None

Changes since v3:
 * None

Changes since v2:
 * None

 arch/x86/include/asm/io.h | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 5c6a4af0b911..3647a96238a9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -271,18 +271,26 @@ static inline bool sev_key_active(void) { return false; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
+#ifndef __out
+#define __out(bwl, bw)							\
+	asm volatile("out" #bwl " %" #bw "0, %w1" : : "a"(value), "Nd"(port))
+#endif
+
+#ifndef __in
+#define __in(bwl, bw)							\
+	asm volatile("in" #bwl " %w1, %" #bw "0" : "=a"(value) : "Nd"(port))
+#endif
+
 #define BUILDIO(bwl, bw, type)						\
 static inline void out##bwl(unsigned type value, int port)		\
 {									\
-	asm volatile("out" #bwl " %" #bw "0, %w1"			\
-		     : : "a"(value), "Nd"(port));			\
+	__out(bwl, bw);							\
 }									\
 									\
 static inline unsigned type in##bwl(int port)				\
 {									\
 	unsigned type value;						\
-	asm volatile("in" #bwl " %w1, %" #bw "0"			\
-		     : "=a"(value) : "Nd"(port));			\
+	__in(bwl, bw);							\
 	return value;							\
 }									\
 									\
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 02/10] x86/tdx: Add early_is_tdx_guest() interface
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-10-17 19:28   ` Thomas Gleixner
  2021-10-05 20:41 ` [PATCH v7 03/10] x86/tdx: Handle port I/O in decompression code Kuppuswamy Sathyanarayanan
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Add helper function to detect TDX feature support. It will be used
to protect TDX specific code in decompression code (for example to
add TDX specific I/O fixes in decompression code).

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * None

Changes since v3:
 * None

Changes since v2:
 * Fixed string order issue in cpuid_count() call.

Changes since v1:
 * Modified cpuid_has_tdx_guest() to use cpuid_count() instead of native_cpuid().
 * Reused cpuid_count() from cpuflags.c.
 * Added a new function cpuid_eax().
 * Renamed native_cpuid_has_tdx_guest() as early_cpuid_has_tdx_guest().

 arch/x86/boot/compressed/Makefile |  1 +
 arch/x86/boot/compressed/tdx.c    | 31 +++++++++++++++++++++++++++++++
 arch/x86/boot/cpuflags.c          | 12 ++++++++++--
 arch/x86/boot/cpuflags.h          |  2 ++
 4 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/boot/compressed/tdx.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 431bf7f846c3..22a2a6cc2ab4 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -98,6 +98,7 @@ ifdef CONFIG_X86_64
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o
 
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
new file mode 100644
index 000000000000..ecb3b42992e0
--- /dev/null
+++ b/arch/x86/boot/compressed/tdx.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tdx.c - Early boot code for TDX
+ */
+
+#include "../cpuflags.h"
+#include "../string.h"
+
+#define TDX_CPUID_LEAF_ID                       0x21
+
+static int tdx_guest = -1;
+
+static inline bool early_cpuid_has_tdx_guest(void)
+{
+	u32 eax = TDX_CPUID_LEAF_ID, sig[3] = {0};
+
+	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+		return false;
+
+	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
+
+	return !memcmp("IntelTDX    ", sig, 12);
+}
+
+bool early_is_tdx_guest(void)
+{
+	if (tdx_guest < 0)
+		tdx_guest = early_cpuid_has_tdx_guest();
+
+	return !!tdx_guest;
+}
diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c
index a0b75f73dc63..102613a092aa 100644
--- a/arch/x86/boot/cpuflags.c
+++ b/arch/x86/boot/cpuflags.c
@@ -71,8 +71,7 @@ int has_eflag(unsigned long mask)
 # define EBX_REG "=b"
 #endif
 
-static inline void cpuid_count(u32 id, u32 count,
-		u32 *a, u32 *b, u32 *c, u32 *d)
+void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d)
 {
 	asm volatile(".ifnc %%ebx,%3 ; movl  %%ebx,%3 ; .endif	\n\t"
 		     "cpuid					\n\t"
@@ -82,6 +81,15 @@ static inline void cpuid_count(u32 id, u32 count,
 	);
 }
 
+u32 cpuid_eax(u32 id)
+{
+	u32 eax, ebx, ecx, edx;
+
+	cpuid_count(id, 0, &eax, &ebx, &ecx, &edx);
+
+	return eax;
+}
+
 #define cpuid(id, a, b, c, d) cpuid_count(id, 0, a, b, c, d)
 
 void get_cpuflags(void)
diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h
index 2e20814d3ce3..5a72233eb8fd 100644
--- a/arch/x86/boot/cpuflags.h
+++ b/arch/x86/boot/cpuflags.h
@@ -17,5 +17,7 @@ extern u32 cpu_vendor[3];
 
 int has_eflag(unsigned long mask);
 void get_cpuflags(void);
+void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d);
+u32 cpuid_eax(u32 id);
 
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 03/10] x86/tdx: Handle port I/O in decompression code
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 02/10] x86/tdx: Add early_is_tdx_guest() interface Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 04/10] x86/tdx: Handle early IO operations Kuppuswamy Sathyanarayanan
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Add support to replace IN/OUT instructions in decompression code with
TDX IO hypercalls.

TDX cannot do port IO directly. The TDX module triggers a #VE exception
to let the guest kernel to emulate port I/O, by converting them into
TDX hypercalls to call the host.

But for the really early code in the decompressor, #VE cannot be used
because the IDT needed for handling the exception is not set-up, and
some other infrastructure needed by the handler is missing. So to
support port IO in decompressor code, directly replace IN/OUT
instructions with TDX IO hypercalls. This can be easily achieved by
modifying __in/__out macros.

Also, since TDX IO hypercall requires an IO size parameter, modify
__in/__out macros to accept size as input parameter.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * Replaced in/out with IN/OUT.
 * Fixed typo in commit log and comments.
 * Renamed tdg_* prefix with tdx_*

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Modified tdg_{in/out} to adapt to API changes of __tdx_hypercall().
 * Removed !CONFIG_INTEL_TDX_GUEST comment in #if else case.

 arch/x86/boot/compressed/Makefile |  1 +
 arch/x86/boot/compressed/tdcall.S |  3 ++
 arch/x86/include/asm/io.h         |  9 ++---
 arch/x86/include/asm/tdx.h        | 60 +++++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/boot/compressed/tdcall.S

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 22a2a6cc2ab4..1bfe30ebadbe 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -99,6 +99,7 @@ endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdcall.o
 
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/tdcall.S b/arch/x86/boot/compressed/tdcall.S
new file mode 100644
index 000000000000..aafadc136c88
--- /dev/null
+++ b/arch/x86/boot/compressed/tdcall.S
@@ -0,0 +1,3 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include "../../kernel/tdcall.S"
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 3647a96238a9..fa6aa43e5dc3 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -41,6 +41,7 @@
 #include <linux/string.h>
 #include <linux/compiler.h>
 #include <asm/page.h>
+#include <asm/tdx.h>
 #include <asm/early_ioremap.h>
 #include <asm/pgtable_types.h>
 
@@ -272,25 +273,25 @@ static inline bool sev_key_active(void) { return false; }
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #ifndef __out
-#define __out(bwl, bw)							\
+#define __out(bwl, bw, sz)						\
 	asm volatile("out" #bwl " %" #bw "0, %w1" : : "a"(value), "Nd"(port))
 #endif
 
 #ifndef __in
-#define __in(bwl, bw)							\
+#define __in(bwl, bw, sz)						\
 	asm volatile("in" #bwl " %w1, %" #bw "0" : "=a"(value) : "Nd"(port))
 #endif
 
 #define BUILDIO(bwl, bw, type)						\
 static inline void out##bwl(unsigned type value, int port)		\
 {									\
-	__out(bwl, bw);							\
+	__out(bwl, bw, sizeof(type));					\
 }									\
 									\
 static inline unsigned type in##bwl(int port)				\
 {									\
 	unsigned type value;						\
-	__in(bwl, bw);							\
+	__in(bwl, bw, sizeof(type));					\
 	return value;							\
 }									\
 									\
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index ebb97e082376..02f6036ef85f 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -5,6 +5,8 @@
 
 #include <linux/cpufeature.h>
 #include <linux/types.h>
+#include <vdso/limits.h>
+#include <asm/vmx.h>
 
 #define TDX_CPUID_LEAF_ID			0x21
 #define TDX_HYPERCALL_STANDARD			0
@@ -72,6 +74,64 @@ unsigned long tdx_get_ve_info(struct ve_info *ve);
 int tdx_handle_virtualization_exception(struct pt_regs *regs,
 					struct ve_info *ve);
 
+/*
+ * To support I/O port access in decompressor or early kernel init
+ * code, since #VE exception handler cannot be used, use paravirt
+ * model to implement __in/__out macros which will in turn be used
+ * by in{b,w,l}()/out{b,w,l} I/O helper macros used in kernel. Details
+ * about __in/__out macro usage can be found in arch/x86/include/asm/io.h
+ */
+#ifdef BOOT_COMPRESSED_MISC_H
+
+bool early_is_tdx_guest(void);
+
+/*
+ * Helper function used for making hypercall for "in"
+ * instruction. It will be called from __in IO macro
+ * If IO is failed, it will return all 1s.
+ */
+static inline unsigned int tdx_io_in(int size, int port)
+{
+	struct tdx_hypercall_output out = {0};
+
+	__tdx_hypercall(TDX_HYPERCALL_STANDARD, EXIT_REASON_IO_INSTRUCTION,
+			size, 0, port, 0, &out);
+
+	return out.r10 ? UINT_MAX : out.r11;
+}
+
+/*
+ * Helper function used for making hypercall for "out"
+ * instruction. It will be called from __out IO macro
+ */
+static inline void tdx_io_out(int size, int port, u64 value)
+{
+	struct tdx_hypercall_output out = {0};
+
+	__tdx_hypercall(TDX_HYPERCALL_STANDARD, EXIT_REASON_IO_INSTRUCTION,
+			size, 1, port, value, &out);
+}
+
+#define __out(bwl, bw, sz)						\
+do {									\
+	if (early_is_tdx_guest()) {					\
+		tdx_io_out(sz, port, value);				\
+	} else {							\
+		asm volatile("out" #bwl " %" #bw "0, %w1" : :		\
+				"a"(value), "Nd"(port));		\
+	}								\
+} while (0)
+#define __in(bwl, bw, sz)						\
+do {									\
+	if (early_is_tdx_guest()) {					\
+		value = tdx_io_in(sz, port);				\
+	} else {							\
+		asm volatile("in" #bwl " %w1, %" #bw "0" :		\
+				"=a"(value) : "Nd"(port));		\
+	}								\
+} while (0)
+#endif
+
 #else
 
 #define is_tdx_guest		0ULL
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 04/10] x86/tdx: Handle early IO operations
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
                   ` (2 preceding siblings ...)
  2021-10-05 20:41 ` [PATCH v7 03/10] x86/tdx: Handle port I/O in decompression code Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-11-05 21:12   ` Sean Christopherson
  2021-10-05 20:41 ` [PATCH v7 05/10] x86/tdx: Handle port I/O Kuppuswamy Sathyanarayanan
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

Add an early #VE handler to convert early port IOs into TDCALLs.

TDX cannot do port IO directly. The TDX module triggers a #VE exception
to let the guest kernel to emulate port I/O, by converting them into
TDCALLs to call the host.

To support port IO in early boot code, add a minimal support in early
exception handlers. This is similar to what AMD SEV does. This is
mainly to support early_printk's serial driver, as well as potentially
the VGA driver (although it is expected not to be used).

The early handler only does IO calls and nothing else, and anything
that goes wrong results in a normal early exception panic.

It cannot share the code paths with the normal #VE handler because it
needs to avoid using trace calls or printk.

This early handler allows us to use the normal in*/out* macros without
patching them for every driver. Since, there is no expectation that
early port IO is performance critical, the #VE emulation cost is worth
the simplicity benefit of not patching out port IO usage in early code.
There are also no concerns with nesting, since there should be no NMIs
this early.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * Changed order of variable declaration in tdx_early_io()

Changes since v3:
 * None

Changes since v2:
 * None

Changes since v1:
 * Renamed tdx_early_io() to tdg_early_io().

 arch/x86/include/asm/tdx.h |  4 +++
 arch/x86/kernel/head64.c   |  3 ++
 arch/x86/kernel/tdx.c      | 60 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 02f6036ef85f..e8ddc7c5bbab 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -74,6 +74,8 @@ unsigned long tdx_get_ve_info(struct ve_info *ve);
 int tdx_handle_virtualization_exception(struct pt_regs *regs,
 					struct ve_info *ve);
 
+bool tdx_early_handle_ve(struct pt_regs *regs);
+
 /*
  * To support I/O port access in decompressor or early kernel init
  * code, since #VE exception handler cannot be used, use paravirt
@@ -137,6 +139,8 @@ do {									\
 #define is_tdx_guest		0ULL
 static inline void tdx_early_init(void) { };
 
+static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
+
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
 #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 97ce0d037387..281d360e9ca4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -410,6 +410,9 @@ void __init do_early_exception(struct pt_regs *regs, int trapnr)
 	    trapnr == X86_TRAP_VC && handle_vc_boot_ghcb(regs))
 		return;
 
+	if (trapnr == X86_TRAP_VE && tdx_early_handle_ve(regs))
+		return;
+
 	early_fixup_exception(regs, trapnr);
 }
 
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 11e367228e96..4cbffcb737d9 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -10,6 +10,11 @@
 /* TDX Module call Leaf IDs */
 #define TDGETVEINFO			3
 
+#define VE_IS_IO_OUT(exit_qual)		(((exit_qual) & 8) ? 0 : 1)
+#define VE_GET_IO_SIZE(exit_qual)	(((exit_qual) & 7) + 1)
+#define VE_GET_PORT_NUM(exit_qual)	((exit_qual) >> 16)
+#define VE_IS_IO_STRING(exit_qual)	((exit_qual) & 16 ? 1 : 0)
+
 /*
  * Allocate it in the data region to avoid zeroing it during
  * BSS initialization. It is mainly used in cc_platform_has()
@@ -228,6 +233,61 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
 	return ret;
 }
 
+/*
+ * Handle early IO, mainly for early printks serial output.
+ * This avoids anything that doesn't work early on, like tracing
+ * or printks, by calling the low level functions directly. Any
+ * problems are handled by falling back to a standard early exception.
+ *
+ * Assumes the IO instruction was using ax, which is enforced
+ * by the standard io.h macros.
+ */
+static __init bool tdx_early_io(struct pt_regs *regs, u32 exit_qual)
+{
+	struct tdx_hypercall_output outh;
+	int out, size, port, ret;
+	bool string;
+	u64 mask;
+
+	string = VE_IS_IO_STRING(exit_qual);
+
+	/* I/O strings ops are unrolled at build time. */
+	if (string)
+		return 0;
+
+	out = VE_IS_IO_OUT(exit_qual);
+	size = VE_GET_IO_SIZE(exit_qual);
+	port = VE_GET_PORT_NUM(exit_qual);
+	mask = GENMASK(8 * size, 0);
+
+	ret = _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, out, port,
+			     regs->ax, &outh);
+	if (!out && !ret) {
+		regs->ax &= ~mask;
+		regs->ax |= outh.r11 & mask;
+	}
+
+	return !ret;
+}
+
+/*
+ * Early #VE exception handler. Just used to handle port IOs
+ * for early_printk. If anything goes wrong handle it like
+ * a normal early exception.
+ */
+__init bool tdx_early_handle_ve(struct pt_regs *regs)
+{
+	struct ve_info ve;
+
+	if (tdx_get_ve_info(&ve))
+		return false;
+
+	if (ve.exit_reason == EXIT_REASON_IO_INSTRUCTION)
+		return tdx_early_io(regs, ve.exit_qual);
+
+	return false;
+}
+
 void __init tdx_early_init(void)
 {
 	is_tdx_guest_init();
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 05/10] x86/tdx: Handle port I/O
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
                   ` (3 preceding siblings ...)
  2021-10-05 20:41 ` [PATCH v7 04/10] x86/tdx: Handle early IO operations Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-10-17 19:58   ` Thomas Gleixner
  2021-11-05 21:23   ` Sean Christopherson
  2021-10-05 20:41 ` [PATCH v7 06/10] x86/insn-eval: Introduce insn_get_modrm_reg_ptr() Kuppuswamy Sathyanarayanan
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

TDX hypervisors cannot emulate instructions directly. This includes
port IO which is normally emulated in the hypervisor. All port IO
instructions inside TDX trigger the #VE exception in the guest and
would be normally emulated there.

Also string I/O is not supported in TDX guest. So, unroll the string
I/O operation into a loop operating on one element at a time. This
method is similar to AMD SEV, so just extend the support for TDX guest
platform.

Add a new confidential guest flag CC_ATTR_GUEST_UNROLL_STRING_IO to
add string unroll support in asm/io.h

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * Changed prot_guest_has() to cc_platform_has().

Changes since v4:
 * Changed order of variable declaration in tdx_handle_io().
 * Changed tdg_* prefix with tdx_*.

Changes since v3:
 * Included PATTR_GUEST_UNROLL_STRING_IO protected guest flag
   addition change in this patch.
 * Rebased on top of Tom Lendacks protected guest change.

Changes since v2:
 * None

Changes since v1:
 * Fixed comments for tdg_handle_io().
 * Used _tdx_hypercall() instead of __tdx_hypercall() in tdg_handle_io().

 arch/x86/include/asm/io.h   |  7 +++++--
 arch/x86/kernel/cpu/intel.c |  1 +
 arch/x86/kernel/tdx.c       | 35 +++++++++++++++++++++++++++++++++++
 include/linux/cc_platform.h | 11 +++++++++++
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index fa6aa43e5dc3..67e0c4a0a0f4 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -40,6 +40,7 @@
 
 #include <linux/string.h>
 #include <linux/compiler.h>
+#include <linux/cc_platform.h>
 #include <asm/page.h>
 #include <asm/tdx.h>
 #include <asm/early_ioremap.h>
@@ -310,7 +311,8 @@ static inline unsigned type in##bwl##_p(int port)			\
 									\
 static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 {									\
-	if (sev_key_active()) {						\
+	if (sev_key_active() ||						\
+	    cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {		\
 		unsigned type *value = (unsigned type *)addr;		\
 		while (count) {						\
 			out##bwl(*value, port);				\
@@ -326,7 +328,8 @@ static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 									\
 static inline void ins##bwl(int port, void *addr, unsigned long count)	\
 {									\
-	if (sev_key_active()) {						\
+	if (sev_key_active() ||						\
+	    cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {		\
 		unsigned type *value = (unsigned type *)addr;		\
 		while (count) {						\
 			*value = in##bwl(port);				\
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index b99ead877549..01d7935feaed 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -67,6 +67,7 @@ bool intel_cc_platform_has(enum cc_attr attr)
 {
 	switch (attr) {
 	case CC_ATTR_GUEST_TDX:
+	case CC_ATTR_GUEST_UNROLL_STRING_IO:
 		return is_tdx_guest;
 	default:
 		return false;
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 4cbffcb737d9..cd0fb5d14ad7 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -175,6 +175,38 @@ static u64 tdx_handle_cpuid(struct pt_regs *regs)
 	return ret;
 }
 
+/*
+ * tdx_handle_early_io() cannot be re-used in #VE handler for handling
+ * I/O because the way of handling string I/O is different between
+ * normal and early I/O case. Also, once trace support is enabled,
+ * tdx_handle_io() will be extended to use trace calls which is also
+ * not valid for early I/O cases.
+ */
+static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
+{
+	struct tdx_hypercall_output outh;
+	int out, size, port, ret;
+	bool string;
+	u64 mask;
+
+	string = VE_IS_IO_STRING(exit_qual);
+
+	/* I/O strings ops are unrolled at build time. */
+	BUG_ON(string);
+
+	out = VE_IS_IO_OUT(exit_qual);
+	size = VE_GET_IO_SIZE(exit_qual);
+	port = VE_GET_PORT_NUM(exit_qual);
+	mask = GENMASK(8 * size, 0);
+
+	ret = _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, out, port,
+			     regs->ax, &outh);
+	if (!out) {
+		regs->ax &= ~mask;
+		regs->ax |= (ret ? UINT_MAX : outh.r11) & mask;
+	}
+}
+
 unsigned long tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out = {0};
@@ -221,6 +253,9 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
 	case EXIT_REASON_CPUID:
 		ret = tdx_handle_cpuid(regs);
 		break;
+	case EXIT_REASON_IO_INSTRUCTION:
+		tdx_handle_io(regs, ve->exit_qual);
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return -EFAULT;
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 26eb19f37d56..03dfddd736d2 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -70,6 +70,17 @@ enum cc_attr {
 	 * Examples include Intel TDX.
 	 */
 	CC_ATTR_GUEST_TDX,
+
+	/**
+	 * @CC_ATTR_GUEST_UNROLL_STRING_IO: String I/O is implemented with
+	 *                                  IN/OUT instructions
+	 *
+	 * The platform/OS is running as a guest/virtual machine and uses
+	 * IN/OUT instructions in place of string I/O.
+	 *
+	 * Examples include TDX Guest.
+	 */
+	CC_ATTR_GUEST_UNROLL_STRING_IO,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 06/10] x86/insn-eval: Introduce insn_get_modrm_reg_ptr()
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
                   ` (4 preceding siblings ...)
  2021-10-05 20:41 ` [PATCH v7 05/10] x86/tdx: Handle port I/O Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 07/10] x86/insn-eval: Introduce insn_decode_mmio() Kuppuswamy Sathyanarayanan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

The helper returns a pointer to the register indicated by
ModRM byte.

It's going to replace vc_insn_get_reg() in the SEV MMIO
implementation. TDX MMIO implementation will also use it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * None

Changes since v3:
 * None

Changes since v2:
 * None

 arch/x86/include/asm/insn-eval.h |  1 +
 arch/x86/lib/insn-eval.c         | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 91d7182ad2d6..041f399153b9 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -19,6 +19,7 @@ bool insn_has_rep_prefix(struct insn *insn);
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs);
+void *insn_get_modrm_reg_ptr(struct insn *insn, struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
 int insn_fetch_from_user(struct pt_regs *regs,
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a1d24fdc07cf..fbaa3fa24bde 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -850,6 +850,26 @@ int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs)
 	return get_reg_offset(insn, regs, REG_TYPE_REG);
 }
 
+/**
+ * insn_get_modrm_reg_ptr() - Obtain register pointer based on ModRM byte
+ * @insn:	Instruction containing the ModRM byte
+ * @regs:	Register values as seen when entering kernel mode
+ *
+ * Returns:
+ *
+ * The register indicated by the reg part of the ModRM byte.
+ * The register is obtained as a pointer within pt_regs.
+ */
+void *insn_get_modrm_reg_ptr(struct insn *insn, struct pt_regs *regs)
+{
+	int offset;
+
+	offset = insn_get_modrm_reg_off(insn, regs);
+	if (offset < 0)
+		return NULL;
+	return (void *)regs + offset;
+}
+
 /**
  * get_seg_base_limit() - obtain base address and limit of a segment
  * @insn:	Instruction. Must be valid.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 07/10] x86/insn-eval: Introduce insn_decode_mmio()
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
                   ` (5 preceding siblings ...)
  2021-10-05 20:41 ` [PATCH v7 06/10] x86/insn-eval: Introduce insn_get_modrm_reg_ptr() Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 08/10] x86/sev-es: Use insn_decode_mmio() for MMIO implementation Kuppuswamy Sathyanarayanan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

In preparation for sharing MMIO instruction decode between SEV-ES and
TDX, factor out the common decode into a new insn_decode_mmio() helper.

For regular virtual machine, MMIO is handled by the VMM and KVM
emulates instructions that caused MMIO. But, this model doesn't work
for a secure VMs (like SEV or TDX) as VMM doesn't have access to the
guest memory and register state. So, for TDX or SEV VMM needs
assistance in handling MMIO. It induces exception in the guest. Guest
has to decode the instruction and handle it on its own.

The code is based on the current SEV MMIO implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * None

Changes since v3:
 * None

Changes since v2:
 * None

 arch/x86/include/asm/insn-eval.h | 12 +++++
 arch/x86/lib/insn-eval.c         | 82 ++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 041f399153b9..4a4ca7e7be66 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -29,4 +29,16 @@ int insn_fetch_from_user_inatomic(struct pt_regs *regs,
 bool insn_decode_from_regs(struct insn *insn, struct pt_regs *regs,
 			   unsigned char buf[MAX_INSN_SIZE], int buf_size);
 
+enum mmio_type {
+	MMIO_DECODE_FAILED,
+	MMIO_WRITE,
+	MMIO_WRITE_IMM,
+	MMIO_READ,
+	MMIO_READ_ZERO_EXTEND,
+	MMIO_READ_SIGN_EXTEND,
+	MMIO_MOVS,
+};
+
+enum mmio_type insn_decode_mmio(struct insn *insn, int *bytes);
+
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index fbaa3fa24bde..2ab29d8d6731 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1559,3 +1559,85 @@ bool insn_decode_from_regs(struct insn *insn, struct pt_regs *regs,
 
 	return true;
 }
+
+/**
+ * insn_decode_mmio() - Decode a MMIO instruction
+ * @insn:	Structure to store decoded instruction
+ * @bytes:	Returns size of memory operand
+ *
+ * Decodes instruction that used for Memory-mapped I/O.
+ *
+ * Returns:
+ *
+ * Type of the instruction. Size of the memory operand is stored in
+ * @bytes. If decode failed, MMIO_DECODE_FAILED returned.
+ */
+enum mmio_type insn_decode_mmio(struct insn *insn, int *bytes)
+{
+	int type = MMIO_DECODE_FAILED;
+
+	*bytes = 0;
+
+	insn_get_opcode(insn);
+	switch (insn->opcode.bytes[0]) {
+	case 0x88: /* MOV m8,r8 */
+		*bytes = 1;
+		fallthrough;
+	case 0x89: /* MOV m16/m32/m64, r16/m32/m64 */
+		if (!*bytes)
+			*bytes = insn->opnd_bytes;
+		type = MMIO_WRITE;
+		break;
+
+	case 0xc6: /* MOV m8, imm8 */
+		*bytes = 1;
+		fallthrough;
+	case 0xc7: /* MOV m16/m32/m64, imm16/imm32/imm64 */
+		if (!*bytes)
+			*bytes = insn->opnd_bytes;
+		type = MMIO_WRITE_IMM;
+		break;
+
+	case 0x8a: /* MOV r8, m8 */
+		*bytes = 1;
+		fallthrough;
+	case 0x8b: /* MOV r16/r32/r64, m16/m32/m64 */
+		if (!*bytes)
+			*bytes = insn->opnd_bytes;
+		type = MMIO_READ;
+		break;
+
+	case 0xa4: /* MOVS m8, m8 */
+		*bytes = 1;
+		fallthrough;
+	case 0xa5: /* MOVS m16/m32/m64, m16/m32/m64 */
+		if (!*bytes)
+			*bytes = insn->opnd_bytes;
+		type = MMIO_MOVS;
+		break;
+
+	case 0x0f: /* Two-byte instruction */
+		switch (insn->opcode.bytes[1]) {
+		case 0xb6: /* MOVZX r16/r32/r64, m8 */
+			*bytes = 1;
+			fallthrough;
+		case 0xb7: /* MOVZX r32/r64, m16 */
+			if (!*bytes)
+				*bytes = 2;
+			type = MMIO_READ_ZERO_EXTEND;
+			break;
+
+		case 0xbe: /* MOVSX r16/r32/r64, m8 */
+			*bytes = 1;
+			fallthrough;
+		case 0xbf: /* MOVSX r32/r64, m16 */
+			if (!*bytes)
+				*bytes = 2;
+			type = MMIO_READ_SIGN_EXTEND;
+			break;
+		}
+		break;
+	}
+
+	return type;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 08/10] x86/sev-es: Use insn_decode_mmio() for MMIO implementation
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
                   ` (6 preceding siblings ...)
  2021-10-05 20:41 ` [PATCH v7 07/10] x86/insn-eval: Introduce insn_decode_mmio() Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 09/10] x86/tdx: Handle in-kernel MMIO Kuppuswamy Sathyanarayanan
  2021-10-05 20:41 ` [PATCH v7 10/10] x86/tdx: Handle MWAIT and MONITOR Kuppuswamy Sathyanarayanan
  9 siblings, 0 replies; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Switch SEV implementation to insn_decode_mmio(). The helper is going
to be used by TDX too.

No functional changes. It is only build-tested.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * None

Changes since v3:
 * None

Changes since v2:
 * None

 arch/x86/kernel/sev.c | 171 ++++++++++--------------------------------
 1 file changed, 40 insertions(+), 131 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..e2f1ae006114 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -807,22 +807,6 @@ static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt)
 	do_early_exception(ctxt->regs, trapnr);
 }
 
-static long *vc_insn_get_reg(struct es_em_ctxt *ctxt)
-{
-	long *reg_array;
-	int offset;
-
-	reg_array = (long *)ctxt->regs;
-	offset    = insn_get_modrm_reg_off(&ctxt->insn, ctxt->regs);
-
-	if (offset < 0)
-		return NULL;
-
-	offset /= sizeof(long);
-
-	return reg_array + offset;
-}
-
 static long *vc_insn_get_rm(struct es_em_ctxt *ctxt)
 {
 	long *reg_array;
@@ -870,76 +854,6 @@ static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
 	return sev_es_ghcb_hv_call(ghcb, ctxt, exit_code, exit_info_1, exit_info_2);
 }
 
-static enum es_result vc_handle_mmio_twobyte_ops(struct ghcb *ghcb,
-						 struct es_em_ctxt *ctxt)
-{
-	struct insn *insn = &ctxt->insn;
-	unsigned int bytes = 0;
-	enum es_result ret;
-	int sign_byte;
-	long *reg_data;
-
-	switch (insn->opcode.bytes[1]) {
-		/* MMIO Read w/ zero-extension */
-	case 0xb6:
-		bytes = 1;
-		fallthrough;
-	case 0xb7:
-		if (!bytes)
-			bytes = 2;
-
-		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
-		if (ret)
-			break;
-
-		/* Zero extend based on operand size */
-		reg_data = vc_insn_get_reg(ctxt);
-		if (!reg_data)
-			return ES_DECODE_FAILED;
-
-		memset(reg_data, 0, insn->opnd_bytes);
-
-		memcpy(reg_data, ghcb->shared_buffer, bytes);
-		break;
-
-		/* MMIO Read w/ sign-extension */
-	case 0xbe:
-		bytes = 1;
-		fallthrough;
-	case 0xbf:
-		if (!bytes)
-			bytes = 2;
-
-		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
-		if (ret)
-			break;
-
-		/* Sign extend based on operand size */
-		reg_data = vc_insn_get_reg(ctxt);
-		if (!reg_data)
-			return ES_DECODE_FAILED;
-
-		if (bytes == 1) {
-			u8 *val = (u8 *)ghcb->shared_buffer;
-
-			sign_byte = (*val & 0x80) ? 0xff : 0x00;
-		} else {
-			u16 *val = (u16 *)ghcb->shared_buffer;
-
-			sign_byte = (*val & 0x8000) ? 0xff : 0x00;
-		}
-		memset(reg_data, sign_byte, insn->opnd_bytes);
-
-		memcpy(reg_data, ghcb->shared_buffer, bytes);
-		break;
-
-	default:
-		ret = ES_UNSUPPORTED;
-	}
-
-	return ret;
-}
-
 /*
  * The MOVS instruction has two memory operands, which raises the
  * problem that it is not known whether the access to the source or the
@@ -1007,83 +921,78 @@ static enum es_result vc_handle_mmio_movs(struct es_em_ctxt *ctxt,
 		return ES_RETRY;
 }
 
-static enum es_result vc_handle_mmio(struct ghcb *ghcb,
-				     struct es_em_ctxt *ctxt)
+static enum es_result vc_handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct insn *insn = &ctxt->insn;
 	unsigned int bytes = 0;
+	enum mmio_type mmio;
 	enum es_result ret;
+	u8 sign_byte;
 	long *reg_data;
 
-	switch (insn->opcode.bytes[0]) {
-	/* MMIO Write */
-	case 0x88:
-		bytes = 1;
-		fallthrough;
-	case 0x89:
-		if (!bytes)
-			bytes = insn->opnd_bytes;
+	mmio = insn_decode_mmio(insn, &bytes);
+	if (mmio == MMIO_DECODE_FAILED)
+		return ES_DECODE_FAILED;
 
-		reg_data = vc_insn_get_reg(ctxt);
+	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
+		reg_data = insn_get_modrm_reg_ptr(insn, ctxt->regs);
 		if (!reg_data)
 			return ES_DECODE_FAILED;
+	}
 
+	switch (mmio) {
+	case MMIO_WRITE:
 		memcpy(ghcb->shared_buffer, reg_data, bytes);
-
 		ret = vc_do_mmio(ghcb, ctxt, bytes, false);
 		break;
-
-	case 0xc6:
-		bytes = 1;
-		fallthrough;
-	case 0xc7:
-		if (!bytes)
-			bytes = insn->opnd_bytes;
-
+	case MMIO_WRITE_IMM:
 		memcpy(ghcb->shared_buffer, insn->immediate1.bytes, bytes);
-
 		ret = vc_do_mmio(ghcb, ctxt, bytes, false);
 		break;
-
-		/* MMIO Read */
-	case 0x8a:
-		bytes = 1;
-		fallthrough;
-	case 0x8b:
-		if (!bytes)
-			bytes = insn->opnd_bytes;
-
+	case MMIO_READ:
 		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
 		if (ret)
 			break;
 
-		reg_data = vc_insn_get_reg(ctxt);
-		if (!reg_data)
-			return ES_DECODE_FAILED;
-
 		/* Zero-extend for 32-bit operation */
 		if (bytes == 4)
 			*reg_data = 0;
 
 		memcpy(reg_data, ghcb->shared_buffer, bytes);
 		break;
+	case MMIO_READ_ZERO_EXTEND:
+		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
+
+		memset(reg_data, 0, insn->opnd_bytes);
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
+		break;
+	case MMIO_READ_SIGN_EXTEND:
+		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
 
-		/* MOVS instruction */
-	case 0xa4:
-		bytes = 1;
-		fallthrough;
-	case 0xa5:
-		if (!bytes)
-			bytes = insn->opnd_bytes;
+		if (bytes == 1) {
+			u8 *val = (u8 *)ghcb->shared_buffer;
 
-		ret = vc_handle_mmio_movs(ctxt, bytes);
+			sign_byte = (*val & 0x80) ? 0xff : 0x00;
+		} else {
+			u16 *val = (u16 *)ghcb->shared_buffer;
+
+			sign_byte = (*val & 0x8000) ? 0xff : 0x00;
+		}
+
+		/* Sign extend based on operand size */
+		memset(reg_data, sign_byte, insn->opnd_bytes);
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
 		break;
-		/* Two-Byte Opcodes */
-	case 0x0f:
-		ret = vc_handle_mmio_twobyte_ops(ghcb, ctxt);
+	case MMIO_MOVS:
+		ret = vc_handle_mmio_movs(ctxt, bytes);
 		break;
 	default:
 		ret = ES_UNSUPPORTED;
+		break;
 	}
 
 	return ret;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 09/10] x86/tdx: Handle in-kernel MMIO
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
                   ` (7 preceding siblings ...)
  2021-10-05 20:41 ` [PATCH v7 08/10] x86/sev-es: Use insn_decode_mmio() for MMIO implementation Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  2021-11-05 22:41   ` Sean Christopherson
  2021-10-05 20:41 ` [PATCH v7 10/10] x86/tdx: Handle MWAIT and MONITOR Kuppuswamy Sathyanarayanan
  9 siblings, 1 reply; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

In traditional VMs, MMIO is usually implemented by giving a guest
access to a mapping which will cause a VMEXIT on access and then the
VMM emulating the access. That's not possible in TDX guest because
VMEXIT will expose the register state to the host. TDX guests don't
trust the host and can't have its state exposed to the host. In TDX
the MMIO regions are instead configured to trigger a #VE exception in
the guest. The guest #VE handler then emulates the MMIO instruction
inside the guest and converts them into a controlled TDCALL to the
host, rather than completely exposing the state to the host.

Currently, TDX only supports MMIO for instructions that are known to
come from io.h macros (build_mmio_read/write()). For drivers that don't
use the io.h macros or uses structure overlay to do MMIO are currently
not supported in TDX guest (for example the MMIO based XAPIC is disable
at runtime for TDX).

This way of handling is similar to AMD SEV.

Also, reasons for supporting #VE based MMIO in TDX guest are,

* MMIO is widely used and more drivers will be added in the future.
* To avoid annotating every TDX specific MMIO readl/writel etc.
* If annotation is not preferred, an alternative way is required
  for every MMIO access in the kernel (even though 99.9% will never
  be used on TDX) which would be a complete waste and incredible
  binary bloat for nothing.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * Changed order of variable declaration in tdx_handle_mmio().
 * Changed tdg_* prefix with tdx_*.

Changes since v3:
 * None

Changes since v2:
 * None

 arch/x86/kernel/tdx.c | 108 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index cd0fb5d14ad7..851ad143da03 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -6,6 +6,9 @@
 
 #include <asm/tdx.h>
 #include <asm/vmx.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+#include <linux/sched/signal.h> /* force_sig_fault() */
 
 /* TDX Module call Leaf IDs */
 #define TDGETVEINFO			3
@@ -207,6 +210,103 @@ static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
 	}
 }
 
+static unsigned long tdx_mmio(int size, bool write, unsigned long addr,
+			      unsigned long *val)
+{
+	struct tdx_hypercall_output out = {0};
+	u64 err;
+
+	err = _tdx_hypercall(EXIT_REASON_EPT_VIOLATION, size, write,
+			     addr, *val, &out);
+	*val = out.r11;
+	return err;
+}
+
+static int tdx_handle_mmio(struct pt_regs *regs, struct ve_info *ve)
+{
+	char buffer[MAX_INSN_SIZE];
+	unsigned long *reg, val;
+	struct insn insn = {};
+	enum mmio_type mmio;
+	int size, ret;
+	u8 sign_byte;
+
+	if (user_mode(regs)) {
+		ret = insn_fetch_from_user(regs, buffer);
+		if (!ret)
+			return -EFAULT;
+		if (!insn_decode_from_regs(&insn, regs, buffer, ret))
+			return -EFAULT;
+	} else {
+		ret = copy_from_kernel_nofault(buffer, (void *)regs->ip,
+					       MAX_INSN_SIZE);
+		if (ret)
+			return -EFAULT;
+		insn_init(&insn, buffer, MAX_INSN_SIZE, 1);
+		insn_get_length(&insn);
+	}
+
+	mmio = insn_decode_mmio(&insn, &size);
+	if (mmio == MMIO_DECODE_FAILED)
+		return -EFAULT;
+
+	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
+		reg = insn_get_modrm_reg_ptr(&insn, regs);
+		if (!reg)
+			return -EFAULT;
+	}
+
+	switch (mmio) {
+	case MMIO_WRITE:
+		memcpy(&val, reg, size);
+		ret = tdx_mmio(size, true, ve->gpa, &val);
+		break;
+	case MMIO_WRITE_IMM:
+		val = insn.immediate.value;
+		ret = tdx_mmio(size, true, ve->gpa, &val);
+		break;
+	case MMIO_READ:
+		ret = tdx_mmio(size, false, ve->gpa, &val);
+		if (ret)
+			break;
+		/* Zero-extend for 32-bit operation */
+		if (size == 4)
+			*reg = 0;
+		memcpy(reg, &val, size);
+		break;
+	case MMIO_READ_ZERO_EXTEND:
+		ret = tdx_mmio(size, false, ve->gpa, &val);
+		if (ret)
+			break;
+
+		/* Zero extend based on operand size */
+		memset(reg, 0, insn.opnd_bytes);
+		memcpy(reg, &val, size);
+		break;
+	case MMIO_READ_SIGN_EXTEND:
+		ret = tdx_mmio(size, false, ve->gpa, &val);
+		if (ret)
+			break;
+
+		if (size == 1)
+			sign_byte = (val & 0x80) ? 0xff : 0x00;
+		else
+			sign_byte = (val & 0x8000) ? 0xff : 0x00;
+
+		/* Sign extend based on operand size */
+		memset(reg, sign_byte, insn.opnd_bytes);
+		memcpy(reg, &val, size);
+		break;
+	case MMIO_MOVS:
+	case MMIO_DECODE_FAILED:
+		return -EFAULT;
+	}
+
+	if (ret)
+		return -EFAULT;
+	return insn.length;
+}
+
 unsigned long tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out = {0};
@@ -256,6 +356,14 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
 	case EXIT_REASON_IO_INSTRUCTION:
 		tdx_handle_io(regs, ve->exit_qual);
 		break;
+	case EXIT_REASON_EPT_VIOLATION:
+		/* Currently only MMIO triggers EPT violation */
+		ve->instr_len = tdx_handle_mmio(regs, ve);
+		if (ve->instr_len < 0) {
+			pr_warn_once("MMIO failed\n");
+			return -EFAULT;
+		}
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return -EFAULT;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 10/10] x86/tdx: Handle MWAIT and MONITOR
  2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
                   ` (8 preceding siblings ...)
  2021-10-05 20:41 ` [PATCH v7 09/10] x86/tdx: Handle in-kernel MMIO Kuppuswamy Sathyanarayanan
@ 2021-10-05 20:41 ` Kuppuswamy Sathyanarayanan
  9 siblings, 0 replies; 22+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-05 20:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

When running as a TDX guest, there are a number of existing, privileged
instructions that do not work. If the guest kernel uses these
instructions, the hardware generates a #VE.

List of unsupported instructions can be found in Intel Trust Domain
Extensions (Intel® TDX) Module specification, sec titled "Instructions
that Cause a #VE Unconditionally" and in Guest-Host Communication
Interface (GHCI) Specification for Intel TDX, sec titled "#VE Injected
due to disallowed instructions".

To prevent TD guests from using MWAIT/MONITOR instructions, the CPUID
flags for these instructions are already disabled by the TDX module. 
   
After the above mentioned preventive measures, if TD guests still
execute these instructions, add appropriate warning message
(WARN_ONCE()) in #VE handler. This handling behavior is same as KVM
(which also treats MWAIT/MONITOR as nops with warning once in
unsupported platforms).

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---

Changes since v6:
 * Added section title to spec reference in commit log.

Changes since v5:
 * None

Changes since v4:
 * Removed usage of We/You in commit log and comments.

Changes since v3:
 * None

Changes since v2:
 * None

 arch/x86/kernel/tdx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 851ad143da03..a66520405109 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -364,6 +364,14 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
 			return -EFAULT;
 		}
 		break;
+	case EXIT_REASON_MONITOR_INSTRUCTION:
+	case EXIT_REASON_MWAIT_INSTRUCTION:
+		/*
+		 * Something in the kernel used MONITOR or MWAIT despite
+		 * X86_FEATURE_MWAIT being cleared for TDX guests.
+		 */
+		WARN_ONCE(1, "TD Guest used unsupported MWAIT/MONITOR instruction\n");
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return -EFAULT;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation
  2021-10-05 20:41 ` [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation Kuppuswamy Sathyanarayanan
@ 2021-10-17 19:27   ` Thomas Gleixner
  2021-10-17 20:17     ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2021-10-17 19:27 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Tue, Oct 05 2021 at 13:41, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> The patch allows to override the implementation of the port IO

git grep 'This patch' Documentation/process

> helpers. TDX code will provide an implementation that redirect the
> helpers to paravirt calls.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 02/10] x86/tdx: Add early_is_tdx_guest() interface
  2021-10-05 20:41 ` [PATCH v7 02/10] x86/tdx: Add early_is_tdx_guest() interface Kuppuswamy Sathyanarayanan
@ 2021-10-17 19:28   ` Thomas Gleixner
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2021-10-17 19:28 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Tue, Oct 05 2021 at 13:41, Kuppuswamy Sathyanarayanan wrote:
> +
> +#define TDX_CPUID_LEAF_ID                       0x21
> +
> +static int tdx_guest = -1;
> +
> +static inline bool early_cpuid_has_tdx_guest(void)
> +{
> +	u32 eax = TDX_CPUID_LEAF_ID, sig[3] = {0};
> +
> +	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
> +		return false;
> +
> +	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
> +
> +	return !memcmp("IntelTDX    ", sig, 12);
> +}
> +
> +bool early_is_tdx_guest(void)
> +{
> +	if (tdx_guest < 0)
> +		tdx_guest = early_cpuid_has_tdx_guest();
> +
> +	return !!tdx_guest;
> +}

Sigh.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 05/10] x86/tdx: Handle port I/O
  2021-10-05 20:41 ` [PATCH v7 05/10] x86/tdx: Handle port I/O Kuppuswamy Sathyanarayanan
@ 2021-10-17 19:58   ` Thomas Gleixner
  2021-10-17 20:35     ` Sathyanarayanan Kuppuswamy
  2021-11-05 21:23   ` Sean Christopherson
  1 sibling, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2021-10-17 19:58 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Tue, Oct 05 2021 at 13:41, Kuppuswamy Sathyanarayanan wrote:
>  									\
>  static inline void outs##bwl(int port, const void *addr, unsigned long count) \
>  {									\
> -	if (sev_key_active()) {						\
> +	if (sev_key_active() ||						\
> +	    cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {		\

Instead of adding an extra check, can you please replace that
sev_key_active() with cc_platform_has() completely?

> +/*
> + * tdx_handle_early_io() cannot be re-used in #VE handler for handling
> + * I/O because the way of handling string I/O is different between
> + * normal and early I/O case. Also, once trace support is enabled,
> + * tdx_handle_io() will be extended to use trace calls which is also
> + * not valid for early I/O cases.
> + */
> +static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
> +{
> +	struct tdx_hypercall_output outh;
> +	int out, size, port, ret;
> +	bool string;
> +	u64 mask;
> +
> +	string = VE_IS_IO_STRING(exit_qual);
> +
> +	/* I/O strings ops are unrolled at build time. */

Fancy. The compiler can evaluate sev_key_active() and
cc_platform_has() at build time?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation
  2021-10-17 19:27   ` Thomas Gleixner
@ 2021-10-17 20:17     ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-17 20:17 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/17/21 12:27 PM, Thomas Gleixner wrote:
> On Tue, Oct 05 2021 at 13:41, Kuppuswamy Sathyanarayanan wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> The patch allows to override the implementation of the port IO
> git grep 'This patch' Documentation/process


Sorry, I will fix the commit log in next version.


>
>> helpers. TDX code will provide an implementation that redirect the
>> helpers to paravirt calls.
> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 05/10] x86/tdx: Handle port I/O
  2021-10-17 19:58   ` Thomas Gleixner
@ 2021-10-17 20:35     ` Sathyanarayanan Kuppuswamy
  2021-10-18 13:52       ` Tom Lendacky
  0 siblings, 1 reply; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-17 20:35 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin, thomas.lendacky
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/17/21 12:58 PM, Thomas Gleixner wrote:
> On Tue, Oct 05 2021 at 13:41, Kuppuswamy Sathyanarayanan wrote:
>>   									\
>>   static inline void outs##bwl(int port, const void *addr, unsigned long count) \
>>   {									\
>> -	if (sev_key_active()) {						\
>> +	if (sev_key_active() ||						\
>> +	    cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {		\
> Instead of adding an extra check, can you please replace that
> sev_key_active() with cc_platform_has() completely?

Yes. sev_key_active() can be removed and replaced with
cc_platform_has().

Thomas Lendacky also proposed to introduce as common
static key which can be set by both AMD SEV and TDX code.

@Thomas Lendacky, any comments?

>
>> +/*
>> + * tdx_handle_early_io() cannot be re-used in #VE handler for handling
>> + * I/O because the way of handling string I/O is different between
>> + * normal and early I/O case. Also, once trace support is enabled,
>> + * tdx_handle_io() will be extended to use trace calls which is also
>> + * not valid for early I/O cases.
>> + */
>> +static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
>> +{
>> +	struct tdx_hypercall_output outh;
>> +	int out, size, port, ret;
>> +	bool string;
>> +	u64 mask;
>> +
>> +	string = VE_IS_IO_STRING(exit_qual);
>> +
>> +	/* I/O strings ops are unrolled at build time. */
> Fancy. The compiler can evaluate sev_key_active() and
> cc_platform_has() at build time?

It is incorrect. I will fix this in next version. How about following
change?

I/O strings are replaced with in/out instructions. If string is reported,
report it with BUG.

>
> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 05/10] x86/tdx: Handle port I/O
  2021-10-17 20:35     ` Sathyanarayanan Kuppuswamy
@ 2021-10-18 13:52       ` Tom Lendacky
  2021-10-18 18:42         ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 22+ messages in thread
From: Tom Lendacky @ 2021-10-18 13:52 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, David Hildenbrand,
	Andrea Arcangeli, Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On 10/17/21 3:35 PM, Sathyanarayanan Kuppuswamy wrote:
> 
> On 10/17/21 12:58 PM, Thomas Gleixner wrote:
>> On Tue, Oct 05 2021 at 13:41, Kuppuswamy Sathyanarayanan wrote:
>>>                                       \
>>>   static inline void outs##bwl(int port, const void *addr, unsigned 
>>> long count) \
>>>   {                                    \
>>> -    if (sev_key_active()) {                        \
>>> +    if (sev_key_active() ||                        \
>>> +        cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {        \
>> Instead of adding an extra check, can you please replace that
>> sev_key_active() with cc_platform_has() completely?
> 
> Yes. sev_key_active() can be removed and replaced with
> cc_platform_has().
> 
> Thomas Lendacky also proposed to introduce as common
> static key which can be set by both AMD SEV and TDX code.
> 
> @Thomas Lendacky, any comments?

Either way works for me.

For the cc_platform_has() path, you will need to update the 
amd_cc_platform_has() to return true for CC_ATTR_GUEST_UNROLL_STRING_IO 
for SEV only, i.e.:

	case CC_ATTR_GUEST_UNROLL_STRING_IO:
		return (sev_status & MSR_AMD64_SEV_ENABLED) &&
		       !(sev_status & MSR_AMD64_SEV_ES_ENABLED);

Thanks,
Tom

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 05/10] x86/tdx: Handle port I/O
  2021-10-18 13:52       ` Tom Lendacky
@ 2021-10-18 18:42         ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-18 18:42 UTC (permalink / raw)
  To: Tom Lendacky, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin
  Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/18/21 6:52 AM, Tom Lendacky wrote:
> On 10/17/21 3:35 PM, Sathyanarayanan Kuppuswamy wrote:
>>
>> On 10/17/21 12:58 PM, Thomas Gleixner wrote:
>>> On Tue, Oct 05 2021 at 13:41, Kuppuswamy Sathyanarayanan wrote:
>>>> \
>>>>   static inline void outs##bwl(int port, const void *addr, unsigned 
>>>> long count) \
>>>>   {                                    \
>>>> -    if (sev_key_active()) {                        \
>>>> +    if (sev_key_active() ||                        \
>>>> +        cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {        \
>>> Instead of adding an extra check, can you please replace that
>>> sev_key_active() with cc_platform_has() completely?
>>
>> Yes. sev_key_active() can be removed and replaced with
>> cc_platform_has().
>>
>> Thomas Lendacky also proposed to introduce as common
>> static key which can be set by both AMD SEV and TDX code.
>>
>> @Thomas Lendacky, any comments?
>
> Either way works for me.
>
> For the cc_platform_has() path, you will need to update the 
> amd_cc_platform_has() to return true for 
> CC_ATTR_GUEST_UNROLL_STRING_IO for SEV only, i.e.:
>
>     case CC_ATTR_GUEST_UNROLL_STRING_IO:
>         return (sev_status & MSR_AMD64_SEV_ENABLED) &&
>                !(sev_status & MSR_AMD64_SEV_ES_ENABLED);

I will submit a separate cleanup patch to make SEV code use 
CC_ATTR_GUEST_UNROLL_STRING_IO
first. That way, in this patch I can just set 
CC_ATTR_GUEST_UNROLL_STRING_IO.

>
> Thanks,
> Tom

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 04/10] x86/tdx: Handle early IO operations
  2021-10-05 20:41 ` [PATCH v7 04/10] x86/tdx: Handle early IO operations Kuppuswamy Sathyanarayanan
@ 2021-11-05 21:12   ` Sean Christopherson
  2021-11-05 23:08     ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2021-11-05 21:12 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, linux-kernel

On Tue, Oct 05, 2021, Kuppuswamy Sathyanarayanan wrote:
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>

Heh, is Andi double-dipping to pad his stats?  :-D

> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---

...

> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 11e367228e96..4cbffcb737d9 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -10,6 +10,11 @@
>  /* TDX Module call Leaf IDs */
>  #define TDGETVEINFO			3
>  
> +#define VE_IS_IO_OUT(exit_qual)		(((exit_qual) & 8) ? 0 : 1)
> +#define VE_GET_IO_SIZE(exit_qual)	(((exit_qual) & 7) + 1)
> +#define VE_GET_PORT_NUM(exit_qual)	((exit_qual) >> 16)
> +#define VE_IS_IO_STRING(exit_qual)	((exit_qual) & 16 ? 1 : 0)
> +
>  /*
>   * Allocate it in the data region to avoid zeroing it during
>   * BSS initialization. It is mainly used in cc_platform_has()
> @@ -228,6 +233,61 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
>  	return ret;
>  }
>  
> +/*
> + * Handle early IO, mainly for early printks serial output.
> + * This avoids anything that doesn't work early on, like tracing
> + * or printks, by calling the low level functions directly. Any
> + * problems are handled by falling back to a standard early exception.
> + *
> + * Assumes the IO instruction was using ax, which is enforced
> + * by the standard io.h macros.
> + */
> +static __init bool tdx_early_io(struct pt_regs *regs, u32 exit_qual)
> +{
> +	struct tdx_hypercall_output outh;

"outh" looks like a typo.  Maybe "result" or something alongs those lines?

> +	int out, size, port, ret;
> +	bool string;
> +	u64 mask;
> +
> +	string = VE_IS_IO_STRING(exit_qual);
> +
> +	/* I/O strings ops are unrolled at build time. */
> +	if (string)

Why bother with "string"?

	if (VE_IS_IO_STRING(exit_qual))
		return false;

> +		return 0;

Ugh.  This needs to be "return false".  "return 0" in the kernel usually means
success, but this horror returns a bool where "false" is failure.

> +
> +	out = VE_IS_IO_OUT(exit_qual);
> +	size = VE_GET_IO_SIZE(exit_qual);
> +	port = VE_GET_PORT_NUM(exit_qual);
> +	mask = GENMASK(8 * size, 0);

size * BITS_PER_BYTE

> +
> +	ret = _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, out, port,
> +			     regs->ax, &outh);

This unnecessarily exposes RAX to the untrusted VMM for IN.

> +	if (!out && !ret) {
> +		regs->ax &= ~mask;
> +		regs->ax |= outh.r11 & mask;
> +	}
> +
> +	return !ret;
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 05/10] x86/tdx: Handle port I/O
  2021-10-05 20:41 ` [PATCH v7 05/10] x86/tdx: Handle port I/O Kuppuswamy Sathyanarayanan
  2021-10-17 19:58   ` Thomas Gleixner
@ 2021-11-05 21:23   ` Sean Christopherson
  1 sibling, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2021-11-05 21:23 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, linux-kernel

On Tue, Oct 05, 2021, Kuppuswamy Sathyanarayanan wrote:
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 4cbffcb737d9..cd0fb5d14ad7 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -175,6 +175,38 @@ static u64 tdx_handle_cpuid(struct pt_regs *regs)
>  	return ret;
>  }
>  
> +/*
> + * tdx_handle_early_io() cannot be re-used in #VE handler for handling
> + * I/O because the way of handling string I/O is different between
> + * normal and early I/O case. Also, once trace support is enabled,
> + * tdx_handle_io() will be extended to use trace calls which is also
> + * not valid for early I/O cases.
> + */
> +static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
> +{
> +	struct tdx_hypercall_output outh;

Same comments as patch 04.

> +	int out, size, port, ret;
> +	bool string;
> +	u64 mask;
> +
> +	string = VE_IS_IO_STRING(exit_qual);
> +
> +	/* I/O strings ops are unrolled at build time. */
> +	BUG_ON(string);

And here as well.

> +
> +	out = VE_IS_IO_OUT(exit_qual);
> +	size = VE_GET_IO_SIZE(exit_qual);
> +	port = VE_GET_PORT_NUM(exit_qual);
> +	mask = GENMASK(8 * size, 0);

And here.

> +
> +	ret = _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, out, port,
> +			     regs->ax, &outh);

This one too.

> +	if (!out) {

This needs to check "ret".  In general, why is this continuing on if I/O fails?
If I/O fails, the kernel done messed up and some downstream driver is going to
be real unhappy.  At a minimum, it should WARN.

On a related topic, the GHCB says:

	TDG.VP.VMCALL_INVALID_OPERAND 0x80000000 00000000 Invalid-IO-Port access

The "Invalid-IO-Port access" in particular is poorly worded as it implies the VMM
is allowed to deny access to ports, but AFAIK that's not the intention.  A better
phrasing would be something like "Reserved value in input GPR".

The GHCI should probably also state that bits 63:16 of R14 (port) are reserved.

> +		regs->ax &= ~mask;
> +		regs->ax |= (ret ? UINT_MAX : outh.r11) & mask;
> +	}
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 09/10] x86/tdx: Handle in-kernel MMIO
  2021-10-05 20:41 ` [PATCH v7 09/10] x86/tdx: Handle in-kernel MMIO Kuppuswamy Sathyanarayanan
@ 2021-11-05 22:41   ` Sean Christopherson
  0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2021-11-05 22:41 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, linux-kernel

On Tue, Oct 05, 2021, Kuppuswamy Sathyanarayanan wrote:
> @@ -207,6 +210,103 @@ static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
>  	}
>  }
>  
> +static unsigned long tdx_mmio(int size, bool write, unsigned long addr,
> +			      unsigned long *val)
> +{
> +	struct tdx_hypercall_output out = {0};
> +	u64 err;
> +
> +	err = _tdx_hypercall(EXIT_REASON_EPT_VIOLATION, size, write,
> +			     addr, *val, &out);
> +	*val = out.r11;

Val should not be written on error, and writing it for "write" is also weird.

> +	return err;
> +}
> +
> +static int tdx_handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> +{
> +	char buffer[MAX_INSN_SIZE];
> +	unsigned long *reg, val;
> +	struct insn insn = {};
> +	enum mmio_type mmio;
> +	int size, ret;
> +	u8 sign_byte;
> +
> +	if (user_mode(regs)) {
> +		ret = insn_fetch_from_user(regs, buffer);
> +		if (!ret)
> +			return -EFAULT;
> +		if (!insn_decode_from_regs(&insn, regs, buffer, ret))
> +			return -EFAULT;
> +	} else {
> +		ret = copy_from_kernel_nofault(buffer, (void *)regs->ip,
> +					       MAX_INSN_SIZE);
> +		if (ret)
> +			return -EFAULT;
> +		insn_init(&insn, buffer, MAX_INSN_SIZE, 1);
> +		insn_get_length(&insn);
> +	}
> +
> +	mmio = insn_decode_mmio(&insn, &size);
> +	if (mmio == MMIO_DECODE_FAILED)
> +		return -EFAULT;
> +
> +	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
> +		reg = insn_get_modrm_reg_ptr(&insn, regs);
> +		if (!reg)
> +			return -EFAULT;
> +	}
> +
> +	switch (mmio) {
> +	case MMIO_WRITE:
> +		memcpy(&val, reg, size);
> +		ret = tdx_mmio(size, true, ve->gpa, &val);
> +		break;
> +	case MMIO_WRITE_IMM:
> +		val = insn.immediate.value;
> +		ret = tdx_mmio(size, true, ve->gpa, &val);
> +		break;
> +	case MMIO_READ:
> +		ret = tdx_mmio(size, false, ve->gpa, &val);

val is never set, i.e. this is leaking stack data to the untrusted VMM.

> +		if (ret)
> +			break;
> +		/* Zero-extend for 32-bit operation */
> +		if (size == 4)
> +			*reg = 0;
> +		memcpy(reg, &val, size);
> +		break;
> +	case MMIO_READ_ZERO_EXTEND:
> +		ret = tdx_mmio(size, false, ve->gpa, &val);

And here.

> +		if (ret)
> +			break;
> +
> +		/* Zero extend based on operand size */
> +		memset(reg, 0, insn.opnd_bytes);
> +		memcpy(reg, &val, size);
> +		break;
> +	case MMIO_READ_SIGN_EXTEND:
> +		ret = tdx_mmio(size, false, ve->gpa, &val);

And here.

> +		if (ret)
> +			break;
> +
> +		if (size == 1)
> +			sign_byte = (val & 0x80) ? 0xff : 0x00;
> +		else
> +			sign_byte = (val & 0x8000) ? 0xff : 0x00;
> +
> +		/* Sign extend based on operand size */
> +		memset(reg, sign_byte, insn.opnd_bytes);
> +		memcpy(reg, &val, size);
> +		break;
> +	case MMIO_MOVS:
> +	case MMIO_DECODE_FAILED:
> +		return -EFAULT;
> +	}
> +
> +	if (ret)
> +		return -EFAULT;
> +	return insn.length;
> +}
> +
>  unsigned long tdx_get_ve_info(struct ve_info *ve)
>  {
>  	struct tdx_module_output out = {0};
> @@ -256,6 +356,14 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
>  	case EXIT_REASON_IO_INSTRUCTION:
>  		tdx_handle_io(regs, ve->exit_qual);
>  		break;
> +	case EXIT_REASON_EPT_VIOLATION:
> +		/* Currently only MMIO triggers EPT violation */
> +		ve->instr_len = tdx_handle_mmio(regs, ve);
> +		if (ve->instr_len < 0) {
> +			pr_warn_once("MMIO failed\n");

That's not remotely helpful.  Why not?

		if (WARN_ON_ONCE(ve->instr_len < 0))
			return -EFAULT;

> +			return -EFAULT;
> +		}
> +		break;
>  	default:
>  		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>  		return -EFAULT;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 04/10] x86/tdx: Handle early IO operations
  2021-11-05 21:12   ` Sean Christopherson
@ 2021-11-05 23:08     ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 22+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-11-05 23:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, H . Peter Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, linux-kernel



On 11/5/21 2:12 PM, Sean Christopherson wrote:
> On Tue, Oct 05, 2021, Kuppuswamy Sathyanarayanan wrote:
>> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> 
> Heh, is Andi double-dipping to pad his stats?  :-D

Sorry, it was my mistake. I will remove it.

> 
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
> 
> ...
> 
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> index 11e367228e96..4cbffcb737d9 100644
>> --- a/arch/x86/kernel/tdx.c
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -10,6 +10,11 @@
>>   /* TDX Module call Leaf IDs */
>>   #define TDGETVEINFO			3
>>   
>> +#define VE_IS_IO_OUT(exit_qual)		(((exit_qual) & 8) ? 0 : 1)
>> +#define VE_GET_IO_SIZE(exit_qual)	(((exit_qual) & 7) + 1)
>> +#define VE_GET_PORT_NUM(exit_qual)	((exit_qual) >> 16)
>> +#define VE_IS_IO_STRING(exit_qual)	((exit_qual) & 16 ? 1 : 0)
>> +
>>   /*
>>    * Allocate it in the data region to avoid zeroing it during
>>    * BSS initialization. It is mainly used in cc_platform_has()
>> @@ -228,6 +233,61 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
>>   	return ret;
>>   }
>>   
>> +/*
>> + * Handle early IO, mainly for early printks serial output.
>> + * This avoids anything that doesn't work early on, like tracing
>> + * or printks, by calling the low level functions directly. Any
>> + * problems are handled by falling back to a standard early exception.
>> + *
>> + * Assumes the IO instruction was using ax, which is enforced
>> + * by the standard io.h macros.
>> + */
>> +static __init bool tdx_early_io(struct pt_regs *regs, u32 exit_qual)
>> +{
>> +	struct tdx_hypercall_output outh;
> 
> "outh" looks like a typo.  Maybe "result" or something alongs those lines?

I have fixed it (will be part of next submission). We are going to use
out here and change

out = VE_IS_IO_OUT(exit_qual);

to

in = VE_IS_IO_IN(exit_qual);

> 
>> +	int out, size, port, ret;
>> +	bool string;
>> +	u64 mask;
>> +
>> +	string = VE_IS_IO_STRING(exit_qual);
>> +
>> +	/* I/O strings ops are unrolled at build time. */
>> +	if (string)
> 
> Why bother with "string"?
> 
> 	if (VE_IS_IO_STRING(exit_qual))
> 		return false;
> 
>> +		return 0;
> 
> Ugh.  This needs to be "return false".  "return 0" in the kernel usually means
> success, but this horror returns a bool where "false" is failure.

It will be fixed in next version.

> 
>> +
>> +	out = VE_IS_IO_OUT(exit_qual);
>> +	size = VE_GET_IO_SIZE(exit_qual);
>> +	port = VE_GET_PORT_NUM(exit_qual);
>> +	mask = GENMASK(8 * size, 0);
> 
> size * BITS_PER_BYTE

Ok. I will fix this in next version.

> 
>> +
>> +	ret = _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, out, port,
>> +			     regs->ax, &outh);
> 
> This unnecessarily exposes RAX to the untrusted VMM for IN.

Yes. I will remove it.

> 
>> +	if (!out && !ret) {
>> +		regs->ax &= ~mask;
>> +		regs->ax |= outh.r11 & mask;
>> +	}
>> +
>> +	return !ret;
>> +}

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-11-05 23:09 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-05 20:41 [PATCH v7 00/10] Add TDX Guest Support (#VE handler support) Kuppuswamy Sathyanarayanan
2021-10-05 20:41 ` [PATCH v7 01/10] x86/io: Allow to override inX() and outX() implementation Kuppuswamy Sathyanarayanan
2021-10-17 19:27   ` Thomas Gleixner
2021-10-17 20:17     ` Sathyanarayanan Kuppuswamy
2021-10-05 20:41 ` [PATCH v7 02/10] x86/tdx: Add early_is_tdx_guest() interface Kuppuswamy Sathyanarayanan
2021-10-17 19:28   ` Thomas Gleixner
2021-10-05 20:41 ` [PATCH v7 03/10] x86/tdx: Handle port I/O in decompression code Kuppuswamy Sathyanarayanan
2021-10-05 20:41 ` [PATCH v7 04/10] x86/tdx: Handle early IO operations Kuppuswamy Sathyanarayanan
2021-11-05 21:12   ` Sean Christopherson
2021-11-05 23:08     ` Sathyanarayanan Kuppuswamy
2021-10-05 20:41 ` [PATCH v7 05/10] x86/tdx: Handle port I/O Kuppuswamy Sathyanarayanan
2021-10-17 19:58   ` Thomas Gleixner
2021-10-17 20:35     ` Sathyanarayanan Kuppuswamy
2021-10-18 13:52       ` Tom Lendacky
2021-10-18 18:42         ` Sathyanarayanan Kuppuswamy
2021-11-05 21:23   ` Sean Christopherson
2021-10-05 20:41 ` [PATCH v7 06/10] x86/insn-eval: Introduce insn_get_modrm_reg_ptr() Kuppuswamy Sathyanarayanan
2021-10-05 20:41 ` [PATCH v7 07/10] x86/insn-eval: Introduce insn_decode_mmio() Kuppuswamy Sathyanarayanan
2021-10-05 20:41 ` [PATCH v7 08/10] x86/sev-es: Use insn_decode_mmio() for MMIO implementation Kuppuswamy Sathyanarayanan
2021-10-05 20:41 ` [PATCH v7 09/10] x86/tdx: Handle in-kernel MMIO Kuppuswamy Sathyanarayanan
2021-11-05 22:41   ` Sean Christopherson
2021-10-05 20:41 ` [PATCH v7 10/10] x86/tdx: Handle MWAIT and MONITOR Kuppuswamy Sathyanarayanan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).