LKML Archive on lore.kernel.org
* [PATCH RFC 0/5] Handle UEFI NX-restricted page tables
@ 2021-11-10 10:46 Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 1/5] efi/x86: Disable paging when booting via efistub Baskov Evgeniy
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Baskov Evgeniy @ 2021-11-10 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Baskov Evgeniy, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Jonathan Corbet, Thomas Gleixner, x86, linux-doc, linux-efi,
	linux-kernel

Note that this patch series is an RFC, since it is as yet untested
and possibly incompatible with AMD SEV and related extensions.

The UEFI specification states that certain memory regions may
not have every permission, i.e. may not be writable or executable.

Furthermore, some implementations (at least on i386/x86_64) restrict
execution of memory regions that the kernel expects to be executable,
e.g. the first megabyte of the address space, where the trampoline for
switching between 4- and 5-level paging is placed, and memory regions
allocated as loader data.

This patch series allows the Linux kernel to boot on such UEFI
implementations on i386 and x86_64.

The simplest way to achieve that on i386 is to disable paging
before jumping to potentially relocated code.

x86_64, on the other hand, does not allow disabling paging, so
temporary page tables must be built that map the memory regions the
Linux kernel needs to boot with appropriate access permissions.
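
For illustration only (this is not code from the series), the sketch below shows what "appropriate access permissions" means at the page-table-entry level on x86_64: an entry permits instruction fetches when bit 63 (NX) is clear. The helper names are hypothetical and the flag set is simplified; the series itself relies on __PAGE_KERNEL_LARGE_EXEC through kernel_ident_mapping_init().

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 page-table entry bits (architectural positions). */
#define PTE_PRESENT	(1ULL << 0)
#define PTE_WRITE	(1ULL << 1)
#define PTE_LARGE	(1ULL << 7)	/* 2 MiB page when set in a PMD entry */
#define PTE_NX		(1ULL << 63)	/* no-execute */

#define LARGE_PAGE_SIZE	(2ULL << 20)

/* Hypothetical helper: build an executable 2 MiB identity-map entry. */
static uint64_t make_exec_large_entry(uint64_t phys)
{
	/* The physical address of a 2 MiB page must be 2 MiB aligned. */
	assert((phys & (LARGE_PAGE_SIZE - 1)) == 0);

	/* NX is deliberately left clear so the region stays executable. */
	return phys | PTE_PRESENT | PTE_WRITE | PTE_LARGE;
}

/* Hypothetical helper: does this entry permit instruction fetches? */
static int entry_is_executable(uint64_t entry)
{
	return (entry & PTE_PRESENT) && !(entry & PTE_NX);
}
```

Firmware that enforces NX on a region effectively hands over entries for which entry_is_executable() would be false; the temporary tables replace them with entries built like make_exec_large_entry().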

Baskov Evgeniy (5):
       Docs: document notemppt option
       efi: Add option for handling efi memory protection
       libstub: build temporary page table without NX-bit
       efi/x86_64: set page table if provided by libstub
       efi/x86: Disable paging when booting via efistub

 Documentation/admin-guide/kernel-parameters.txt |    7 
 arch/x86/boot/compressed/head_32.S              |   12 +
 arch/x86/boot/compressed/head_64.S              |   12 +
 drivers/firmware/efi/Kconfig                    |   17 ++
 drivers/firmware/efi/libstub/Makefile           |    2 
 drivers/firmware/efi/libstub/efi-stub-helper.c  |    3 
 drivers/firmware/efi/libstub/efistub.h          |   10 +
 drivers/firmware/efi/libstub/temp-pgtable.c     |  190 ++++++++++++++++++++++++
 drivers/firmware/efi/libstub/x86-stub.c         |    8 -
 9 files changed, 258 insertions(+), 3 deletions(-)

* [PATCH RFC 1/5] efi/x86: Disable paging when booting via efistub
  2021-11-10 10:46 [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Baskov Evgeniy
@ 2021-11-10 10:46 ` Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 2/5] efi/x86_64: set page table if provided by libstub Baskov Evgeniy
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Baskov Evgeniy @ 2021-11-10 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Baskov Evgeniy, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Jonathan Corbet, Thomas Gleixner, x86, linux-doc, linux-efi,
	linux-kernel

Some UEFI implementations mark the lower 1 MiB of memory and memory
regions allocated by libstub as non-executable, which prevents the
Linux kernel from booting.

Disable paging after returning from efi_main(), before jumping to
potentially relocated code, to prevent a page fault.

Signed-off-by: Baskov Evgeniy <baskov@ispras.ru>
---
 arch/x86/boot/compressed/head_32.S | 12 ++++++++++++
 1 file changed, 12 insertions(+)
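
As a reading aid for the hunk below, here is a small C sketch of the bit manipulation that "btrl $31, %ecx" performs. It only computes the new CR0 value; actually writing CR0 is privileged and stays in the assembly stub. The helper name is hypothetical, and the bit positions are the architectural CR0.PE (bit 0) and CR0.PG (bit 31).

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE	(1u << 0)	/* protected mode enable */
#define CR0_PG	(1u << 31)	/* paging enable */

/*
 * Hypothetical helper: compute the CR0 value with paging disabled,
 * mirroring "btrl $31, %ecx". Only the value is computed here.
 */
static uint32_t cr0_without_paging(uint32_t cr0)
{
	/* The stub runs in protected mode, so PE is expected to be set. */
	assert(cr0 & CR0_PE);
	return cr0 & ~CR0_PG;
}
```

With paging off, addresses are used as physical addresses directly, so the firmware's page-table permissions no longer apply to the jump target.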

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 659fad53ca82..c66fccaa90a2 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -156,6 +156,18 @@ SYM_FUNC_START_ALIAS(efi_stub_entry)
 	add	$0x4, %esp
 	movl	8(%esp), %esi	/* save boot_params pointer */
 	call	efi_main
+
+#ifdef CONFIG_EFI_STRICT_PGTABLE
+	/*
+	 * Disable paging before jumping to relocated address to prevent
+	 * page faulting on EFI firmware versions that enforce restricted
+	 * permissions on identity page tables
+	 */
+	movl	%cr0, %ecx
+	btrl	$31, %ecx
+	movl	%ecx, %cr0
+#endif
+
 	/* efi_main returns the possibly relocated address of startup_32 */
 	jmp	*%eax
 SYM_FUNC_END(efi32_stub_entry)
-- 
2.33.1


* [PATCH RFC 2/5] efi/x86_64: set page table if provided by libstub
  2021-11-10 10:46 [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 1/5] efi/x86: Disable paging when booting via efistub Baskov Evgeniy
@ 2021-11-10 10:46 ` Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 3/5] libstub: build temporary page table without NX-bit Baskov Evgeniy
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Baskov Evgeniy @ 2021-11-10 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Baskov Evgeniy, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Jonathan Corbet, Thomas Gleixner, x86, linux-doc, linux-efi,
	linux-kernel

It is desirable to be able to switch page tables before jumping
to potentially relocated code while booting via UEFI.
The easiest way to achieve that is to do it in assembly
immediately after returning from efi_main().

Add a mechanism to switch to the page table provided by libstub.

Signed-off-by: Baskov Evgeniy <baskov@ispras.ru>
---
 arch/x86/boot/compressed/head_64.S | 12 ++++++++++++
 1 file changed, 12 insertions(+)
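
The testq/jz/movq sequence in the hunk below can be read as the following C sketch. It is only a model of the decision (the helper name is hypothetical, and loading CR3 itself must happen in the assembly stub); it assumes, as in patch 3, that efi_temp_pgtable stays zero when libstub built no table or failed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the value CR3 should hold after the
 * check. A zero temp_pgtable means nothing was built, so the tables
 * set up by the firmware remain active.
 */
static uint64_t select_cr3(uint64_t current_cr3, uint64_t temp_pgtable)
{
	/* A top-level page table occupies one 4 KiB-aligned page. */
	assert((temp_pgtable & 0xfffULL) == 0);

	return temp_pgtable ? temp_pgtable : current_cr3;
}
```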

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 572c535cf45b..1e467fdefd9d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -540,6 +540,17 @@ SYM_FUNC_START_ALIAS(efi_stub_entry)
 	movq	%rdx, %rbx			/* save boot_params pointer */
 	call	efi_main
 	movq	%rbx,%rsi
+
+	/*
+	 * Switch to the page table constructed by libstub, if provided.
+	 * This must be done here because the stack is not mapped
+	 * in the new page table.
+	 */
+	movq	efi_temp_pgtable(%rip), %rbx
+	testq	%rbx, %rbx
+	jz	1f
+	movq	%rbx, %cr3
+1:
 	leaq	rva(startup_64)(%rax), %rax
 	jmp	*%rax
 SYM_FUNC_END(efi64_stub_entry)
@@ -736,6 +747,7 @@ SYM_DATA_END_LABEL(boot32_idt, SYM_L_GLOBAL, boot32_idt_end)
 
 #ifdef CONFIG_EFI_STUB
 SYM_DATA(image_offset, .long 0)
+SYM_DATA(efi_temp_pgtable, .quad 0)
 #endif
 #ifdef CONFIG_EFI_MIXED
 SYM_DATA_LOCAL(efi32_boot_args, .long 0, 0, 0)
-- 
2.33.1


* [PATCH RFC 3/5] libstub: build temporary page table without NX-bit
  2021-11-10 10:46 [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 1/5] efi/x86: Disable paging when booting via efistub Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 2/5] efi/x86_64: set page table if provided by libstub Baskov Evgeniy
@ 2021-11-10 10:46 ` Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 4/5] efi: Add option for handling efi memory protection Baskov Evgeniy
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Baskov Evgeniy @ 2021-11-10 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Baskov Evgeniy, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Jonathan Corbet, Thomas Gleixner, x86, linux-doc, linux-efi,
	linux-kernel

Some UEFI implementations restrict execution of memory regions that
the kernel expects to be executable, e.g. the first MiB of physical
address space and memory regions allocated by libstub via EFI boot
services as loader data.

Build a temporary page table that maps the address ranges required for
the kernel to boot on x86_64 with code execution allowed.

This new page table is installed immediately after the return from
efi_main().

Signed-off-by: Baskov Evgeniy <baskov@ispras.ru>
---
 drivers/firmware/efi/libstub/Makefile         |   2 +
 .../firmware/efi/libstub/efi-stub-helper.c    |   3 +
 drivers/firmware/efi/libstub/efistub.h        |  10 +
 drivers/firmware/efi/libstub/temp-pgtable.c   | 190 ++++++++++++++++++
 drivers/firmware/efi/libstub/x86-stub.c       |   8 +-
 5 files changed, 211 insertions(+), 2 deletions(-)
 create mode 100644 drivers/firmware/efi/libstub/temp-pgtable.c
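
As a reading aid for add_identity_map() in the hunk below: every range is widened to 2 MiB boundaries before it is mapped, because the mapping uses 2 MiB large pages. The standalone sketch below reproduces just that rounding; the struct and helper names are hypothetical, and PMD_SIZE is assumed to be 2 MiB as on x86_64.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SIZE_BYTES	(2ULL << 20)	/* 2 MiB, the x86_64 PMD_SIZE */

struct range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Round [start, end) outward to 2 MiB boundaries. */
static struct range round_to_pmd(uint64_t start, uint64_t end)
{
	struct range r;

	assert(start <= end);
	r.start = start & ~(PMD_SIZE_BYTES - 1);
	r.end = (end + PMD_SIZE_BYTES - 1) & ~(PMD_SIZE_BYTES - 1);
	return r;
}
```

One consequence is that mapping the first MiB for the trampoline actually maps the whole [0, 2 MiB) range as executable.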

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e..3eb2cfc370f4 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -142,3 +142,5 @@ quiet_cmd_stubcopy = STUBCPY $@
 		/bin/false;						\
 	fi;								\
 	$(OBJCOPY) $(STUBCOPY_FLAGS-y) $< $@
+
+lib-$(CONFIG_EFI_STRICT_PGTABLE)	+= temp-pgtable.o
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index d489bdc645fe..ccd6d98753c7 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -23,6 +23,7 @@ bool efi_nokaslr = !IS_ENABLED(CONFIG_RANDOMIZE_BASE);
 bool efi_noinitrd;
 int efi_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
 bool efi_novamap;
+bool efi_temppt = IS_ENABLED(CONFIG_EFI_STRICT_PGTABLE);
 
 static bool efi_nosoftreserve;
 static bool efi_disable_pci_dma = IS_ENABLED(CONFIG_EFI_DISABLE_PCI_DMA);
@@ -227,6 +228,8 @@ efi_status_t efi_parse_options(char const *cmdline)
 				efi_disable_pci_dma = true;
 			if (parse_option_str(val, "no_disable_early_pci_dma"))
 				efi_disable_pci_dma = false;
+			if (parse_option_str(val, "notemppt"))
+				efi_temppt = false;
 			if (parse_option_str(val, "debug"))
 				efi_loglevel = CONSOLE_LOGLEVEL_DEBUG;
 		} else if (!strcmp(param, "video") &&
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index cde0a2ef507d..9e40c486c564 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -34,6 +34,7 @@ extern bool efi_nokaslr;
 extern bool efi_noinitrd;
 extern int efi_loglevel;
 extern bool efi_novamap;
+extern bool efi_temppt;
 
 extern const efi_system_table_t *efi_system_table;
 
@@ -858,4 +859,13 @@ efi_enable_reset_attack_mitigation(void) { }
 
 void efi_retrieve_tpm2_eventlog(void);
 
+#if defined(CONFIG_EFI_STRICT_PGTABLE) && defined(CONFIG_X86_64)
+void efi_build_temp_pgtable(struct boot_params *boot_params,
+			    unsigned long bzimage_addr);
+#else
+static inline void
+efi_build_temp_pgtable(struct boot_params *boot_params,
+		       unsigned long image_base) { }
+#endif
+
 #endif
diff --git a/drivers/firmware/efi/libstub/temp-pgtable.c b/drivers/firmware/efi/libstub/temp-pgtable.c
new file mode 100644
index 000000000000..9fb74f7b2bef
--- /dev/null
+++ b/drivers/firmware/efi/libstub/temp-pgtable.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/efi.h>
+#include <linux/stddef.h>
+
+#include <asm/boot.h>
+#include <asm/efi.h>
+#include <asm/init.h>
+#include <asm/setup.h>
+
+#include "efistub.h"
+
+#ifdef CONFIG_X86_64
+
+/*
+ * The top level page table entry pointer.
+ * Used in head_64.S to load temporary page tables
+ */
+
+extern unsigned long efi_temp_pgtable;
+
+#ifdef CONFIG_X86_5LEVEL
+extern unsigned int __pgtable_l5_enabled;
+#else
+unsigned int __pgtable_l5_enabled;
+#endif
+
+void startup_32(struct boot_params *boot_params);
+extern u32 image_offset;
+
+#define TRAMPOLINE_PLACEMENT_MAX 0x100000
+
+/*
+ * Same as in the later boot page table,
+ * but with 5th level and a few extra pages
+ * just to be future proof, since we only store
+ * addresses and not whole pages.
+ */
+#define MAX_TEMP_PAGE_TABLE_PAGES (BOOT_PGT_SIZE+7)
+
+static unsigned long temppt_pages[MAX_TEMP_PAGE_TABLE_PAGES];
+static size_t temppt_page_count;
+
+static void *allocate_pt_page(void *ctx)
+{
+	efi_status_t status;
+	unsigned long res;
+
+	(void)ctx;
+
+	/*
+	 * We need to allocate above 1MB boundary, so
+	 * page table won't be overwritten by trampoline code
+	 */
+	status = efi_low_alloc_above(PAGE_SIZE, PAGE_SIZE,
+				     &res, TRAMPOLINE_PLACEMENT_MAX);
+	if (status != EFI_SUCCESS)
+		return NULL;
+
+	/* Track allocations to free them in error path */
+	if (temppt_page_count < MAX_TEMP_PAGE_TABLE_PAGES) {
+		temppt_pages[temppt_page_count++] = res;
+	} else {
+		efi_warn("Exceeded number of allocated pages\n");
+		efi_free(PAGE_SIZE, res);
+		return NULL;
+	}
+
+	memset((void *)res, 0, PAGE_SIZE);
+
+	return (void *)res;
+}
+
+static int add_identity_map(struct x86_mapping_info *info,
+			    unsigned long start, unsigned long end)
+{
+	/* Align boundary to 2M. */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return 0;
+
+	/* This function is included into compressed kernel */
+	return kernel_ident_mapping_init(info, (pgd_t *)efi_temp_pgtable,
+					 start, end);
+}
+
+void efi_build_temp_pgtable(struct boot_params *boot_params,
+			    unsigned long bzimage_addr)
+{
+	struct setup_data *it;
+	unsigned long buffer_size, image_addr, buffer_addr;
+	unsigned long cmdline;
+
+	/*
+	 * We need to override this variable to build
+	 * the same amount of page table levels as
+	 * currently used by EFI page tables since
+	 * switching number of page table levels
+	 * is not possible without leaving long mode.
+	 * This variable is used inside kernel_ident_mapping_init().
+	 * (from arch/x86/boot/compressed/ident_map_64.c)
+	 */
+
+	__pgtable_l5_enabled = !!(native_read_cr4() & X86_CR4_LA57);
+
+	image_addr = (unsigned long)&startup_32;
+	buffer_addr = bzimage_addr - image_offset;
+	buffer_size = ALIGN(buffer_addr, boot_params->hdr.kernel_alignment) +
+		boot_params->hdr.init_size + bzimage_addr;
+
+	struct x86_mapping_info info = {
+		.alloc_pgt_page = allocate_pt_page,
+		.page_flag = __PAGE_KERNEL_LARGE_EXEC,
+		.kernpg_flag = _KERNPG_TABLE_NOENC,
+	};
+
+	efi_temp_pgtable = (unsigned long)allocate_pt_page(NULL);
+	if (!efi_temp_pgtable)
+		return;
+
+	/*
+	 * First MiB of memory is used for trampoline placement
+	 * while switching page tables in head_64.S
+	 */
+
+	if (add_identity_map(&info, 0, TRAMPOLINE_PLACEMENT_MAX))
+		goto error;
+
+	/*
+	 * Relocated compressed kernel image and
+	 * buffer for decompression
+	 */
+
+	if (add_identity_map(&info, buffer_addr,
+			     buffer_addr + buffer_size))
+		goto error;
+
+	/*
+	 * Original placement of compressed kernel image
+	 * is required to be mapped to prevent
+	 * page faults during page table switching
+	 */
+
+	if (image_addr != buffer_addr + image_offset) {
+		if (add_identity_map(&info, image_addr, (unsigned long)_end))
+			goto error;
+	}
+
+	/*
+	 * Boot parameters ("zero page") and kernel command line
+	 * are also used by early boot code
+	 */
+
+	if (add_identity_map(&info, (unsigned long)boot_params,
+			     (unsigned long)(boot_params + 1)))
+		goto error;
+
+	it = (struct setup_data *)boot_params->hdr.setup_data;
+	while (it) {
+		if (add_identity_map(&info, (unsigned long)it,
+				     (unsigned long)it + sizeof(*it) + it->len))
+			goto error;
+		it = (struct setup_data *)it->next;
+	}
+
+	cmdline = boot_params->hdr.cmd_line_ptr;
+	cmdline |= (u64)boot_params->ext_cmd_line_ptr << 32;
+	if (add_identity_map(&info, cmdline, cmdline + COMMAND_LINE_SIZE))
+		goto error;
+
+	__pgtable_l5_enabled = 0;
+	return;
+
+error:
+	/*
+	 * If an error occurred, we have nothing better to do than
+	 * pretend that nothing happened and leave current
+	 * page table as is.
+	 */
+	__pgtable_l5_enabled = 0;
+	efi_temp_pgtable = 0;
+
+	while (temppt_page_count > 0)
+		efi_free(PAGE_SIZE, temppt_pages[--temppt_page_count]);
+
+	efi_warn("Temporary page table allocation failed\n");
+}
+
+#endif
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index f14c4ff5839f..244b49c8d226 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -622,7 +622,8 @@ static efi_status_t exit_boot_func(struct efi_boot_memmap *map,
 	return EFI_SUCCESS;
 }
 
-static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
+static efi_status_t exit_boot(struct boot_params *boot_params,
+			      unsigned long bzimage_addr, void *handle)
 {
 	unsigned long map_sz, key, desc_size, buff_size;
 	efi_memory_desc_t *mem_map;
@@ -642,6 +643,9 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
 	priv.boot_params	= boot_params;
 	priv.efi		= &boot_params->efi_info;
 
+	if (efi_temppt)
+		efi_build_temp_pgtable(boot_params, bzimage_addr);
+
 	status = allocate_e820(boot_params, &e820ext, &e820ext_size);
 	if (status != EFI_SUCCESS)
 		return status;
@@ -799,7 +803,7 @@ unsigned long efi_main(efi_handle_t handle,
 
 	setup_quirks(boot_params);
 
-	status = exit_boot(boot_params, handle);
+	status = exit_boot(boot_params, bzimage_addr, handle);
 	if (status != EFI_SUCCESS) {
 		efi_err("exit_boot() failed!\n");
 		goto fail;
-- 
2.33.1


* [PATCH RFC 4/5] efi: Add option for handling efi memory protection
  2021-11-10 10:46 [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Baskov Evgeniy
                   ` (2 preceding siblings ...)
  2021-11-10 10:46 ` [PATCH RFC 3/5] libstub: build temporary page table without NX-bit Baskov Evgeniy
@ 2021-11-10 10:46 ` Baskov Evgeniy
  2021-11-10 10:46 ` [PATCH RFC 5/5] Docs: document notemppt option Baskov Evgeniy
  2021-11-10 11:11 ` [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Ard Biesheuvel
  5 siblings, 0 replies; 9+ messages in thread
From: Baskov Evgeniy @ 2021-11-10 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Baskov Evgeniy, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Jonathan Corbet, Thomas Gleixner, x86, linux-doc, linux-efi,
	linux-kernel

Add an option to enable the handling of strict page table permissions
added in the previous patches.

Signed-off-by: Baskov Evgeniy <baskov@ispras.ru>
---
 drivers/firmware/efi/Kconfig | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 2c3dac5ecb36..f57a9c865dce 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -243,6 +243,23 @@ config EFI_DISABLE_PCI_DMA
 	  options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma"
 	  may be used to override this option.
 
+config EFI_STRICT_PGTABLE
+	bool "Handle strict page table permissions in libstub"
+	depends on EFI_STUB && X86
+	default y
+	help
+	  Some firmware disables execution of memory allocated by the boot
+	  loader and/or of the lower megabyte of physical address space, as
+	  the specification allows. That prevents the Linux kernel from booting.
+
+	  This option makes libstub either create temporary identity-mapped
+	  page tables with the non-executable flag cleared on x86_64, or just
+	  disable paging altogether on i386, to overcome the issue.
+
+	  If firmware does not restrict permissions on identity page tables,
+	  temporary page table creation may be disabled with kernel command
+	  line option "efi=notemppt".
+
 endmenu
 
 config EFI_EMBEDDED_FIRMWARE
-- 
2.33.1


* [PATCH RFC 5/5] Docs: document notemppt option
  2021-11-10 10:46 [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Baskov Evgeniy
                   ` (3 preceding siblings ...)
  2021-11-10 10:46 ` [PATCH RFC 4/5] efi: Add option for handling efi memory protection Baskov Evgeniy
@ 2021-11-10 10:46 ` Baskov Evgeniy
  2021-11-10 11:11 ` [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Ard Biesheuvel
  5 siblings, 0 replies; 9+ messages in thread
From: Baskov Evgeniy @ 2021-11-10 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Baskov Evgeniy, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Jonathan Corbet, Thomas Gleixner, x86, linux-doc, linux-efi,
	linux-kernel

This option disables building temporary page tables when
booting via efistub with CONFIG_EFI_STRICT_PGTABLE enabled.

Signed-off-by: Baskov Evgeniy <baskov@ispras.ru>
---
 Documentation/admin-guide/kernel-parameters.txt | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
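
To illustrate how the option interacts with the build-time default from patch 3 (efi_temppt starts as IS_ENABLED(CONFIG_EFI_STRICT_PGTABLE) and "notemppt" can only turn it off), here is a standalone sketch. list_has_option() is a hypothetical stand-in written for this example, not the kernel's parse_option_str(), and temppt_enabled() is likewise illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for an option-list matcher: return true if the
 * comma-separated list "val" contains "opt" as a whole item.
 */
static bool list_has_option(const char *val, const char *opt)
{
	size_t optlen = strlen(opt);

	assert(optlen > 0);
	while (*val) {
		size_t itemlen = strcspn(val, ",");

		if (itemlen == optlen && strncmp(val, opt, optlen) == 0)
			return true;
		val += itemlen;
		if (*val == ',')
			val++;
	}
	return false;
}

/* Hypothetical: effective setting from build default plus "efi=" value. */
static bool temppt_enabled(bool config_default, const char *efi_arg)
{
	if (list_has_option(efi_arg, "notemppt"))
		return false;
	return config_default;
}
```

Note that there is no way to turn the feature on from the command line when it is compiled out; the option is purely an opt-out.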

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..0c4ed43cd13c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1294,7 +1294,8 @@
 	efi=		[EFI]
 			Format: { "debug", "disable_early_pci_dma",
 				  "nochunk", "noruntime", "nosoftreserve",
-				  "novamap", "no_disable_early_pci_dma" }
+				  "novamap", "no_disable_early_pci_dma",
+				  "notemppt" }
 			debug: enable misc debug output.
 			disable_early_pci_dma: disable the busmaster bit on all
 			PCI bridges while in the EFI boot stub.
@@ -1311,6 +1312,10 @@
 			novamap: do not call SetVirtualAddressMap().
 			no_disable_early_pci_dma: Leave the busmaster bit set
 			on all PCI bridges while in the EFI boot stub
+			notemppt: disable temporary page table creation in EFISTUB
+			for CONFIG_EFI_STRICT_PGTABLE on x86_64. Can very slightly
+			increase boot speed. Temporary page tables are not
+			required by most EFI implementations.
 
 	efi_no_storage_paranoia [EFI; X86]
 			Using this parameter you can use more than 50% of
-- 
2.33.1


* Re: [PATCH RFC 0/5] Handle UEFI NX-restricted page tables
  2021-11-10 10:46 [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Baskov Evgeniy
                   ` (4 preceding siblings ...)
  2021-11-10 10:46 ` [PATCH RFC 5/5] Docs: document notemppt option Baskov Evgeniy
@ 2021-11-10 11:11 ` Ard Biesheuvel
  2021-11-25  7:36   ` baskov
  5 siblings, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2021-11-10 11:11 UTC (permalink / raw)
  To: Baskov Evgeniy
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Jonathan Corbet,
	Thomas Gleixner, X86 ML, Linux Doc Mailing List, linux-efi,
	Linux Kernel Mailing List

On Wed, 10 Nov 2021 at 11:56, Baskov Evgeniy <baskov@ispras.ru> wrote:
>
> Note, that this patch series is RFC, since it is yet untested
> and possibly incompatible with AMD SEV and related extensions.
>
> The UEFI specification states that certain memory regions may
> not have every permission, i.e. may not be writable or executable.
>
> Furthermore there exist some implementations (at least on i386/x86_64)
> that restrict execution of memory regions expected by the kernel to
> be executable. E.g. first megabyte of address space, where trampoline
> for switching between 4/5 level paging is placed and memory regions,
> allocated as loader data.
>
> This patch series allows Linux kernel to boot on such UEFI
> implementations on i386 and x86_64.
>
> The simplest way to achieve that on i386 is to disable paging
> before jumping to potentially relocated code.
>
> x86_64, on the other hand, does not allow disabling paging so it
> is required to build temporary page tables containing memory regions
> required for Linux kernel to boot with appropriate access permissions.
>

Hello Baskov,

To be honest, I am truly not a fan of this approach.

Which systems is this issue occurring on? Did you try something like
the below to allocate executable memory explicitly?


diff --git a/drivers/firmware/efi/libstub/relocate.c b/drivers/firmware/efi/libstub/relocate.c
index 8ee9eb2b9039..b73012a7bcdc 100644
--- a/drivers/firmware/efi/libstub/relocate.c
+++ b/drivers/firmware/efi/libstub/relocate.c
@@ -80,7 +80,7 @@ efi_status_t efi_low_alloc_above(unsigned long size, unsigned long align,
                        continue;

                status = efi_bs_call(allocate_pages, EFI_ALLOCATE_ADDRESS,
-                                    EFI_LOADER_DATA, nr_pages, &start);
+                                    EFI_LOADER_CODE, nr_pages, &start);
                if (status == EFI_SUCCESS) {
                        *addr = start;
                        break;
@@ -146,7 +146,7 @@ efi_status_t efi_relocate_kernel(unsigned long *image_addr,
         */
        nr_pages = round_up(alloc_size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
        status = efi_bs_call(allocate_pages, EFI_ALLOCATE_ADDRESS,
-                            EFI_LOADER_DATA, nr_pages, &efi_addr);
+                            EFI_LOADER_CODE, nr_pages, &efi_addr);
        new_addr = efi_addr;
        /*
         * If preferred address allocation failed allocate as low as



-- 
Ard.

* Re: [PATCH RFC 0/5] Handle UEFI NX-restricted page tables
  2021-11-10 11:11 ` [PATCH RFC 0/5] Handle UEFI NX-restricted page tables Ard Biesheuvel
@ 2021-11-25  7:36   ` baskov
  2021-12-16 17:30     ` Ard Biesheuvel
  0 siblings, 1 reply; 9+ messages in thread
From: baskov @ 2021-11-25  7:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Jonathan Corbet,
	Thomas Gleixner, X86 ML, Linux Doc Mailing List, linux-efi,
	Linux Kernel Mailing List


Hello,

I apologize for the delayed reply.

The system in question runs firmware that tries to achieve
complete W^X protection. Both loader code and loader data
are non-executable, so the suggested approach does not work.
If you would like to test this, you can set
PcdDxeNxMemoryProtectionPolicy in any firmware available to you.
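
For illustration, and assuming the usual EDK2 convention that bit N of PcdDxeNxMemoryProtectionPolicy selects NX protection for EFI memory type N, the effect of a given policy on the suggested EFI_LOADER_CODE allocation can be sketched as below. The helper name and the example policy values are hypothetical; the memory type numbers are those of EFI_MEMORY_TYPE in the UEFI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory type numbers from the UEFI specification's EFI_MEMORY_TYPE. */
enum {
	MEM_TYPE_RESERVED	= 0,
	MEM_TYPE_LOADER_CODE	= 1,
	MEM_TYPE_LOADER_DATA	= 2,
};

/* Hypothetical helper: is NX applied to this memory type? */
static bool nx_policy_covers(uint64_t policy, unsigned int mem_type)
{
	assert(mem_type < 64);
	return (policy >> mem_type) & 1;
}
```

Under this reading, a policy of 0x4 protects only loader data, so allocating as loader code would help; a policy of 0x6 covers both types, which is the situation described here.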

As justification for the approach itself, I can point to the fact that
the UEFI specification says nothing about the ability to execute
self-allocated EfiLoaderCode or any other types besides the areas
allocated by the firmware for UEFI Images. In fact, Table 7-5
explicitly states that EfiLoaderCode is used for:

> The code portions of a loaded UEFI application.

While we do not think this should be interpreted to mean that one cannot
allocate such areas at all, it is clear that there are no guarantees
about the other use cases and permissions of allocations of this type
besides those stated in section 2.3.4:

> Paging mode is enabled and any memory space defined by the UEFI memory
> map is identity mapped (virtual address equals physical address),
> although the attributes of certain regions may not have all read,
> write, and execute attributes or be unmarked for purposes of platform
> protection.

Long story short, the kernel is not allowed to allocate such areas and
assume they are executable; it should do paging itself, and the changes
here address that. For reference, Windows adheres to this convention
and works fine on the target system.

Thanks,
Baskov Evgeniy


* Re: [PATCH RFC 0/5] Handle UEFI NX-restricted page tables
  2021-11-25  7:36   ` baskov
@ 2021-12-16 17:30     ` Ard Biesheuvel
  0 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2021-12-16 17:30 UTC (permalink / raw)
  To: Baskov Evgeniy
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Jonathan Corbet,
	Thomas Gleixner, X86 ML, Linux Doc Mailing List, linux-efi,
	Linux Kernel Mailing List

On Thu, 25 Nov 2021 at 08:36, <baskov@ispras.ru> wrote:
>
>
> Hello,
>
> I apologize for delayed reply.
>

No worries.


> The system in question runs in a firmware that tries to achieve
> complete W^X protection. Both loader code and loader data
> are not executable, so the suggested approach does not work.
> If you would like to test this, you can set
> the PcdDxeNxMemoryProtectionPolicy in any firmware available to you.
>

The PCD in question has the following note:

# NOTE: User must NOT set NX protection for EfiLoaderCode /
EfiBootServicesCode / EfiRuntimeServicesCode. <BR>

Any idea whether this is easily reproducible with OVMF? Restricting
the loader from creating executable regions seems rather daft, so we
should at least report this, and preferably fix it in EDK2.

> As a justification for the approach itself, I can use the fact that
> UEFI specification says nothing about the ability to execute
> self-allocated EfiLoaderCode or any other types besides the areas
> allocated by the firmware for UEFI Images. In fact, Table 7-5
> explicitly states that EfiLoaderCode is used for:
>
> > The code portions of a loaded UEFI application.
>

Fair enough. So EfiLoaderCode is not the right type.

> While we do not think it should be interpreted as one cannot allocate
> such areas at all, it is clear that there are no guarantees about the
> other use cases and permissions of the allocations of this type besides
> those stated by 2.3.4:
>
> > Paging mode is enabled and any memory space defined by the UEFI memory
> > map is identity mapped (virtual address equals physical address),
> > although the attributes of certain regions may not have all read,
> > write, and execute attributes or be unmarked for purposes of platform
> > protection.
>
> Long story short, the kernel is not allowed to allocate such areas and
> assume they are executable,

OK

> it should do paging itself,

Now you're going too fast. One does not necessarily imply the other.

> and the changes
> here address that. For the reference, Windows adheres to this convention
> and works fine on the target system.
>

Given that this issue is specific to EDK2 based firmwares, would it be
possible to fix this using DXE services instead? In particular, could
we use

gDS->SetMemorySpaceAttributes()

to ensure that the regions have executable permissions?
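
As a rough sketch of the idea, not a tested implementation: the attribute mask passed to such a call would be the region's current attributes with the execute-protect bit removed. The helper below only computes that mask and makes no firmware call; its name is hypothetical, and EFI_MEMORY_XP is the attribute bit defined in the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_MEMORY_WB	0x0000000000000008ULL	/* write-back cacheable */
#define EFI_MEMORY_XP	0x0000000000004000ULL	/* execute-protect */

/*
 * Hypothetical helper: attributes to request when asking the firmware
 * to make a region executable, i.e. the current attributes minus XP.
 */
static uint64_t attrs_make_executable(uint64_t attrs)
{
	uint64_t out = attrs & ~EFI_MEMORY_XP;

	/* Cacheability and other bits are passed through unchanged. */
	assert((out | EFI_MEMORY_XP) == (attrs | EFI_MEMORY_XP));
	return out;
}
```

Whether the firmware honours such a request, and for which regions, would still need to be verified on the affected systems.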




