* [PATCHv3 0/3] Fix couple of issues with LDT remap for PTI
@ 2018-10-26 12:28 Kirill A. Shutemov
  2018-10-26 12:28 ` [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2018-10-26 12:28 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov

This patchset fixes two issues with the LDT remap for PTI:

 - Layout collision due to KASLR with 5-level paging;

 - Information leak via a Meltdown-like attack.

Please review and consider applying.

v3:
 - Split out the cleanup in map_ldt_struct() into a separate patch
v2:
 - Rebase onto Linus' tree
   + fix conflict with the new documentation of the kernel memory layout
   + fix a few mistakes in the layout documentation
 - Fix typo in commit message

Kirill A. Shutemov (3):
  x86/mm: Move LDT remap out of KASLR region on 5-level paging
  x86/ldt: Unmap PTEs for the slot before freeing LDT pages
  x86/ldt: Remove unused variable in map_ldt_struct()

 Documentation/x86/x86_64/mm.txt         | 34 +++++++-------
 arch/x86/include/asm/page_64_types.h    | 12 ++---
 arch/x86/include/asm/pgtable_64_types.h |  4 +-
 arch/x86/kernel/ldt.c                   | 59 ++++++++++++++++---------
 arch/x86/xen/mmu_pv.c                   |  6 +--
 5 files changed, 67 insertions(+), 48 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-26 12:28 [PATCHv3 0/3] Fix couple of issues with LDT remap for PTI Kirill A. Shutemov
@ 2018-10-26 12:28 ` Kirill A. Shutemov
  2018-11-02 21:07   ` Andy Lutomirski
                     ` (2 more replies)
  2018-10-26 12:28 ` [PATCHv3 2/3] x86/ldt: Unmap PTEs for the slot before freeing LDT pages Kirill A. Shutemov
  2018-10-26 12:28 ` [PATCHv3 3/3] x86/ldt: Remove unused variable in map_ldt_struct() Kirill A. Shutemov
  2 siblings, 3 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2018-10-26 12:28 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov

With 5-level paging, the LDT remap area is placed in the middle of the
KASLR randomization region and it can overlap with the direct mapping,
vmalloc or vmap area.

Let's move the LDT remap area to just before the direct mapping, which
makes it safe for KASLR. This also allows us to unify the layout between
4- and 5-level paging.

We don't touch the 4 pgd slot gap just before the direct mapping, which is
reserved for a hypervisor, but move the direct mapping by one slot instead.

The LDT mapping is per-mm, so we cannot move it into the P4D page table
next to CPU_ENTRY_AREA without complicating PGD table allocation for
5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
---
 Documentation/x86/x86_64/mm.txt         | 34 +++++++++++++------------
 arch/x86/include/asm/page_64_types.h    | 12 +++++----
 arch/x86/include/asm/pgtable_64_types.h |  4 +--
 arch/x86/xen/mmu_pv.c                   |  6 ++---
 4 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 702898633b00..75bff98928a8 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -34,23 +34,24 @@ __________________|____________|__________________|_________|___________________
 ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
- ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
- ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
+ ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
+ ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
+ ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
  ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
  ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
  ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
  ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
  ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
- fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
-                  |            |                  |         | vaddr_end for KASLR
- fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
- fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
- ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 __________________|____________|__________________|_________|____________________________________________________________
                                                             |
-                                                            | Identical layout to the 47-bit one from here on:
+                                                            | Identical layout to the 56-bit one from here on:
 ____________________________________________________________|____________________________________________________________
                   |            |                  |         |
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
  ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
  ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
  ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
@@ -83,7 +84,7 @@ Notes:
 __________________|____________|__________________|_________|___________________________________________________________
                   |            |                  |         |
  0000800000000000 |  +64    PB | ffff7fffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
-                  |            |                  |         |     virtual memory addresses up to the -128 TB
+                  |            |                  |         |     virtual memory addresses up to the -64 PB
                   |            |                  |         |     starting offset of kernel mappings.
 __________________|____________|__________________|_________|___________________________________________________________
                                                             |
@@ -91,23 +92,24 @@ __________________|____________|__________________|_________|___________________
 ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
- ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
- ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
+ ff10000000000000 |  -60    PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
+ ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
+ ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
  ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
  ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
  ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
  ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
  ffdf000000000000 |   -8.25 PB | fffffdffffffffff |   ~8 PB | KASAN shadow memory
- fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
-                  |            |                  |         | vaddr_end for KASLR
- fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
- fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
- ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 __________________|____________|__________________|_________|____________________________________________________________
                                                             |
                                                             | Identical layout to the 47-bit one from here on:
 ____________________________________________________________|____________________________________________________________
                   |            |                  |         |
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
  ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
  ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
  ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index cd0cf1c568b4..8f657286d599 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -33,12 +33,14 @@
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
- * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
- * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
- * what Xen requires.
+ * PGDIR_SIZE*17 (pgd slot 273).
+ *
+ * The gap is to allow a space for LDT remap for PTI (1 pgd slot) and space for
+ * a hypervisor (16 slots). Choosing 16 slots for a hypervisor is arbitrary,
+ * but it's what Xen requires.
  */
-#define __PAGE_OFFSET_BASE_L5	_AC(0xff10000000000000, UL)
-#define __PAGE_OFFSET_BASE_L4	_AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE_L5	_AC(0xff11000000000000, UL)
+#define __PAGE_OFFSET_BASE_L4	_AC(0xffff888000000000, UL)
 
 #ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 #define __PAGE_OFFSET           page_offset_base
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 04edd2d58211..84bd9bdc1987 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -111,9 +111,7 @@ extern unsigned int ptrs_per_p4d;
  */
 #define MAXMEM			(1UL << MAX_PHYSMEM_BITS)
 
-#define LDT_PGD_ENTRY_L4	-3UL
-#define LDT_PGD_ENTRY_L5	-112UL
-#define LDT_PGD_ENTRY		(pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
+#define LDT_PGD_ENTRY		-240UL
 #define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
 #define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
 
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 70ea598a37d2..7a2a74c2dd30 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1905,7 +1905,7 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	init_top_pgt[0] = __pgd(0);
 
 	/* Pre-constructed entries are in pfn, so convert to mfn */
-	/* L4[272] -> level3_ident_pgt  */
+	/* L4[273] -> level3_ident_pgt  */
 	/* L4[511] -> level3_kernel_pgt */
 	convert_pfn_mfn(init_top_pgt);
 
@@ -1925,8 +1925,8 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	addr[0] = (unsigned long)pgd;
 	addr[1] = (unsigned long)l3;
 	addr[2] = (unsigned long)l2;
-	/* Graft it onto L4[272][0]. Note that we creating an aliasing problem:
-	 * Both L4[272][0] and L4[511][510] have entries that point to the same
+	/* Graft it onto L4[273][0]. Note that we creating an aliasing problem:
+	 * Both L4[273][0] and L4[511][510] have entries that point to the same
 	 * L2 (PMD) tables. Meaning that if you modify it in __va space
 	 * it will be also modified in the __ka space! (But if you just
 	 * modify the PMD table to point to other PTE's or none, then you
-- 
2.19.1
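
As a sanity check of the numbers in this patch, the standalone C sketch
below recomputes the new LDT base and direct-mapping base from the pgd
slot arithmetic. It is userspace code, not kernel code: the macro names
LDT_BASE, PAGE_OFFSET_BASE and PGD_INDEX are made up for this
illustration, and the shifts 39 and 48 are assumed from the 0.5 TB and
0.25 PB slot sizes shown in the mm.txt rows above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pgd shifts: 2^39 = 0.5 TB per slot (4-level), 2^48 = 0.25 PB (5-level). */
#define PGDIR_SHIFT_L4 39
#define PGDIR_SHIFT_L5 48
#define PTRS_PER_PGD   512

/* Mirrors LDT_BASE_ADDR = (-240UL << PGDIR_SHIFT), in 64-bit unsigned math. */
#define LDT_BASE(shift) ((uint64_t)-240 << (shift))

/*
 * Most negative kernel address (256 slots below the top) plus 17 slots:
 * 16 for the hypervisor gap and 1 for the LDT remap.
 */
#define PAGE_OFFSET_BASE(shift) \
	(((uint64_t)-256 << (shift)) + ((uint64_t)17 << (shift)))

/* Index of the top-level page table slot that covers an address. */
#define PGD_INDEX(addr, shift) (((addr) >> (shift)) & (PTRS_PER_PGD - 1))

/* LDT remap base matches the new mm.txt rows for both paging modes. */
static_assert(LDT_BASE(PGDIR_SHIFT_L4) == 0xffff880000000000ULL, "4-level LDT base");
static_assert(LDT_BASE(PGDIR_SHIFT_L5) == 0xff10000000000000ULL, "5-level LDT base");

/* Direct mapping base matches the new __PAGE_OFFSET_BASE_L4/L5 values. */
static_assert(PAGE_OFFSET_BASE(PGDIR_SHIFT_L4) == 0xffff888000000000ULL, "4-level base");
static_assert(PAGE_OFFSET_BASE(PGDIR_SHIFT_L5) == 0xff11000000000000ULL, "5-level base");

/* The LDT takes pgd slot 272; the direct mapping now starts at slot 273. */
static_assert(PGD_INDEX(LDT_BASE(PGDIR_SHIFT_L4), PGDIR_SHIFT_L4) == 272, "LDT slot");
static_assert(PGD_INDEX(PAGE_OFFSET_BASE(PGDIR_SHIFT_L5), PGDIR_SHIFT_L5) == 273, "map slot");
```

Because a single slot number (-240) is used for both paging modes, the
same expression yields the right address in either case once the shift
is known, which is what lets the patch drop the L4/L5 split.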



* [PATCHv3 2/3] x86/ldt: Unmap PTEs for the slot before freeing LDT pages
  2018-10-26 12:28 [PATCHv3 0/3] Fix couple of issues with LDT remap for PTI Kirill A. Shutemov
  2018-10-26 12:28 ` [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
@ 2018-10-26 12:28 ` Kirill A. Shutemov
  2018-10-31 12:17   ` Kirill A. Shutemov
  2018-11-06 20:40   ` [tip:x86/urgent] " tip-bot for Kirill A. Shutemov
  2018-10-26 12:28 ` [PATCHv3 3/3] x86/ldt: Remove unused variable in map_ldt_struct() Kirill A. Shutemov
  2 siblings, 2 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2018-10-26 12:28 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov

modify_ldt(2) leaves the old LDT mapped after we switch over to the new
one. The memory for the old LDT gets freed and the pages can be re-used.

Leaving the mapping in place can have security implications. The mapping
is present in the userspace copy of the page tables, so a Meltdown-like
attack can read these freed and possibly reused pages.

It's relatively simple to fix: just unmap the old LDT and flush the TLB
before freeing the LDT memory.

We can now avoid flushing the TLB in map_ldt_struct(), as the slot is
unmapped and flushed by unmap_ldt_struct() (or was never mapped in
the first place). The overhead of the change should be negligible.
It shouldn't be a particularly hot path anyway.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
---
 arch/x86/kernel/ldt.c | 51 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index ab18e0884dc6..e32f3427438a 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -199,14 +199,6 @@ static void sanity_check_ldt_mapping(struct mm_struct *mm)
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
- *
- * There is no corresponding unmap function.  Even if the LDT is freed, we
- * leave the PTEs around until the slot is reused or the mm is destroyed.
- * This is harmless: the LDT is always in ordinary memory, and no one will
- * access the freed slot.
- *
- * If we wanted to unmap freed LDTs, we'd also need to do a flush to make
- * it useful, and the flush would slow down modify_ldt().
  */
 static int
 map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
@@ -215,7 +207,7 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	bool is_vmalloc;
 	spinlock_t *ptl;
 	pgd_t *pgd;
-	int i;
+	int i, nr_pages;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return 0;
@@ -238,7 +230,8 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
-	for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) {
+	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
+	for (i = 0; i < nr_pages; i++) {
 		unsigned long offset = i << PAGE_SHIFT;
 		const void *src = (char *)ldt->entries + offset;
 		unsigned long pfn;
@@ -272,13 +265,39 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	/* Propagate LDT mapping to the user page-table */
 	map_ldt_struct_to_user(mm);
 
-	va = (unsigned long)ldt_slot_va(slot);
-	flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, PAGE_SHIFT, false);
-
 	ldt->slot = slot;
 	return 0;
 }
 
+static void
+unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
+{
+	unsigned long va;
+	int i, nr_pages;
+
+	if (!ldt)
+		return;
+
+	/* LDT map/unmap is only required for PTI */
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
+	for (i = 0; i < nr_pages; i++) {
+		unsigned long offset = i << PAGE_SHIFT;
+		pte_t *ptep;
+		spinlock_t *ptl;
+
+		va = (unsigned long)ldt_slot_va(ldt->slot) + offset;
+		ptep = get_locked_pte(mm, va, &ptl);
+		pte_clear(mm, va, ptep);
+		pte_unmap_unlock(ptep, ptl);
+	}
+
+	va = (unsigned long)ldt_slot_va(ldt->slot);
+	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, 0, false);
+}
+
 #else /* !CONFIG_PAGE_TABLE_ISOLATION */
 
 static int
@@ -286,6 +305,11 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 {
 	return 0;
 }
+
+static void
+unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
+{
+}
 #endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
 static void free_ldt_pgtables(struct mm_struct *mm)
@@ -524,6 +548,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	}
 
 	install_ldt(mm, new_ldt);
+	unmap_ldt_struct(mm, old_ldt);
 	free_ldt_struct(old_ldt);
 	error = 0;
 
-- 
2.19.1
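
The patch replaces the open-coded loop bound with a page count computed
once by DIV_ROUND_UP(). The userspace sketch below illustrates that the
two forms agree. The constants (4096-byte pages, 8-byte LDT entries) and
the helper names ldt_nr_pages() and ldt_nr_pages_old_loop() are
assumptions made for this example, not taken from the kernel headers.

```c
/* Assumed values for illustration: x86 base page size and LDT descriptor size. */
#define PAGE_SIZE      4096u
#define LDT_ENTRY_SIZE 8u

/* Integer division that rounds up instead of truncating. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Page count as computed by the patch. */
static unsigned int ldt_nr_pages(unsigned int nr_entries)
{
	return DIV_ROUND_UP(nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
}

/* Page count implied by the loop condition the patch removes. */
static unsigned int ldt_nr_pages_old_loop(unsigned int nr_entries)
{
	unsigned int i;

	for (i = 0; i * PAGE_SIZE < nr_entries * LDT_ENTRY_SIZE; i++)
		;
	return i;
}
```

Having the count in a variable also gives unmap_ldt_struct() the range
it needs for the TLB flush without recomputing it from the loop index.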



* [PATCHv3 3/3] x86/ldt: Remove unused variable in map_ldt_struct()
  2018-10-26 12:28 [PATCHv3 0/3] Fix couple of issues with LDT remap for PTI Kirill A. Shutemov
  2018-10-26 12:28 ` [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
  2018-10-26 12:28 ` [PATCHv3 2/3] x86/ldt: Unmap PTEs for the slot before freeing LDT pages Kirill A. Shutemov
@ 2018-10-26 12:28 ` Kirill A. Shutemov
  2018-11-02 21:08   ` Andy Lutomirski
  2018-11-06 20:40   ` [tip:x86/urgent] " tip-bot for Kirill A. Shutemov
  2 siblings, 2 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2018-10-26 12:28 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov

Commit

  9bae3197e15d ("x86/ldt: Split out sanity check in map_ldt_struct()")

moved page table syncing into a separate function. The pgd variable is
now unused in map_ldt_struct().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/ldt.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index e32f3427438a..5dc8ed202fa8 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -206,7 +206,6 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	unsigned long va;
 	bool is_vmalloc;
 	spinlock_t *ptl;
-	pgd_t *pgd;
 	int i, nr_pages;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
@@ -221,13 +220,6 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	/* Check if the current mappings are sane */
 	sanity_check_ldt_mapping(mm);
 
-	/*
-	 * Did we already have the top level entry allocated?  We can't
-	 * use pgd_none() for this because it doens't do anything on
-	 * 4-level page table kernels.
-	 */
-	pgd = pgd_offset(mm, LDT_BASE_ADDR);
-
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
 	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
-- 
2.19.1



* Re: [PATCHv3 2/3] x86/ldt: Unmap PTEs for the slot before freeing LDT pages
  2018-10-26 12:28 ` [PATCHv3 2/3] x86/ldt: Unmap PTEs for the slot before freeing LDT pages Kirill A. Shutemov
@ 2018-10-31 12:17   ` Kirill A. Shutemov
  2018-11-06 20:40   ` [tip:x86/urgent] " tip-bot for Kirill A. Shutemov
  1 sibling, 0 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2018-10-31 12:17 UTC
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel

On Fri, Oct 26, 2018 at 12:28:55PM +0000, Kirill A. Shutemov wrote:
> +	va = (unsigned long)ldt_slot_va(ldt->slot);
> +	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, 0, false);

I got this wrong on rebase. It has to be PAGE_SHIFT instead of 0.
Here's the fixup.

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 5dc8ed202fa8..60775dcd5bcc 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -287,7 +287,7 @@ unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
 	}
 
 	va = (unsigned long)ldt_slot_va(ldt->slot);
-	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, 0, false);
+	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, PAGE_SHIFT, false);
 }
 
 #else /* !CONFIG_PAGE_TABLE_ISOLATION */
-- 
 Kirill A. Shutemov
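
To illustrate why the value matters, the sketch below assumes that the
fourth argument of flush_tlb_mm_range() is the log2 of the flush
granularity, as its use with PAGE_SHIFT suggests. The helper
nr_strides() is hypothetical and only does the arithmetic; it says
nothing about what the kernel actually does with the result.

```c
/* Assumed for illustration: 4 KB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical helper: number of stride-sized units in the range [start, end). */
static unsigned long nr_strides(unsigned long start, unsigned long end,
				unsigned int stride_shift)
{
	return (end - start) >> stride_shift;
}
```

Under that assumption, a 16-page LDT range is described as 16 units with
PAGE_SHIFT, but as 65536 one-byte units with a shift of 0, which is
presumably not what was intended.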


* Re: [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-26 12:28 ` [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
@ 2018-11-02 21:07   ` Andy Lutomirski
  2018-11-06 20:39   ` [tip:x86/urgent] " tip-bot for Kirill A. Shutemov
  2018-11-10 12:29   ` [PATCHv3 1/3] " Baoquan He
  2 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2018-11-02 21:07 UTC
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andrew Lutomirski, Peter Zijlstra, Boris Ostrovsky,
	Juergen Gross, Baoquan He, Matthew Wilcox, X86 ML, Linux-MM,
	LKML

On Fri, Oct 26, 2018 at 5:29 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On 5-level paging LDT remap area is placed in the middle of
> KASLR randomization region and it can overlap with direct mapping,
> vmalloc or vmap area.
>
> Let's move LDT just before direct mapping which makes it safe for KASLR.
> This also allows us to unify layout between 4- and 5-level paging.
>
> We don't touch 4 pgd slot gap just before the direct mapping reserved
> for a hypervisor, but move direct mapping by one slot instead.
>
> The LDT mapping is per-mm, so we cannot move it into P4D page table next
> to CPU_ENTRY_AREA without complicating PGD table allocation for 5-level
> paging.

Reviewed-by: Andy Lutomirski <luto@kernel.org>

(assuming it passes tests with 4-level and 5-level.  My test setup is
currently busted, and I'm bisecting it.)


* Re: [PATCHv3 3/3] x86/ldt: Remove unused variable in map_ldt_struct()
  2018-10-26 12:28 ` [PATCHv3 3/3] x86/ldt: Remove unused variable in map_ldt_struct() Kirill A. Shutemov
@ 2018-11-02 21:08   ` Andy Lutomirski
  2018-11-06 20:40   ` [tip:x86/urgent] " tip-bot for Kirill A. Shutemov
  1 sibling, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2018-11-02 21:08 UTC
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andrew Lutomirski, Peter Zijlstra, Boris Ostrovsky,
	Juergen Gross, Baoquan He, Matthew Wilcox, X86 ML, Linux-MM,
	LKML

On Fri, Oct 26, 2018 at 5:29 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Commit
>
>   9bae3197e15d ("x86/ldt: Split out sanity check in map_ldt_struct()")
>
> moved page table syncing into a separate function. The pgd variable is
> now unused in map_ldt_struct().

Reviewed-by: Andy Lutomirski <luto@kernel.org>


* [tip:x86/urgent] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-26 12:28 ` [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
  2018-11-02 21:07   ` Andy Lutomirski
@ 2018-11-06 20:39   ` tip-bot for Kirill A. Shutemov
  2018-11-10 12:29   ` [PATCHv3 1/3] " Baoquan He
  2 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-11-06 20:39 UTC
  To: linux-tip-commits; +Cc: kirill.shutemov, hpa, linux-kernel, mingo, tglx, luto

Commit-ID:  d52888aa2753e3063a9d3a0c9f72f94aa9809c15
Gitweb:     https://git.kernel.org/tip/d52888aa2753e3063a9d3a0c9f72f94aa9809c15
Author:     Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 26 Oct 2018 15:28:54 +0300
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 6 Nov 2018 21:35:11 +0100

x86/mm: Move LDT remap out of KASLR region on 5-level paging

On 5-level paging the LDT remap area is placed in the middle of the KASLR
randomization region and it can overlap with the direct mapping, the
vmalloc or the vmap area.

The LDT mapping is per mm, so it cannot be moved into the P4D page table
next to the CPU_ENTRY_AREA without complicating PGD table allocation for
5-level paging.

The 4 PGD slot gap just before the direct mapping is reserved for
hypervisors, so it cannot be used.

Move the direct mapping one slot deeper and use the resulting gap for the
LDT remap area. The resulting layout is the same for 4 and 5 level paging.

[ tglx: Massaged changelog ]

Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com

---
 Documentation/x86/x86_64/mm.txt         | 34 +++++++++++++++++----------------
 arch/x86/include/asm/page_64_types.h    | 12 +++++++-----
 arch/x86/include/asm/pgtable_64_types.h |  4 +---
 arch/x86/xen/mmu_pv.c                   |  6 +++---
 4 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 73aaaa3da436..804f9426ed17 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -34,23 +34,24 @@ __________________|____________|__________________|_________|___________________
 ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
- ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
- ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
+ ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
+ ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
+ ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
  ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
  ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
  ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
  ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
  ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
- fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
-                  |            |                  |         | vaddr_end for KASLR
- fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
- fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
- ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 __________________|____________|__________________|_________|____________________________________________________________
                                                             |
-                                                            | Identical layout to the 47-bit one from here on:
+                                                            | Identical layout to the 56-bit one from here on:
 ____________________________________________________________|____________________________________________________________
                   |            |                  |         |
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
  ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
  ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
  ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
@@ -83,7 +84,7 @@ Notes:
 __________________|____________|__________________|_________|___________________________________________________________
                   |            |                  |         |
  0000800000000000 |  +64    PB | ffff7fffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
-                  |            |                  |         |     virtual memory addresses up to the -128 TB
+                  |            |                  |         |     virtual memory addresses up to the -64 PB
                   |            |                  |         |     starting offset of kernel mappings.
 __________________|____________|__________________|_________|___________________________________________________________
                                                             |
@@ -91,23 +92,24 @@ __________________|____________|__________________|_________|___________________
 ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
- ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
- ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
+ ff10000000000000 |  -60    PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
+ ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
+ ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
  ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
  ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
  ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
  ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
  ffdf000000000000 |   -8.25 PB | fffffdffffffffff |   ~8 PB | KASAN shadow memory
- fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
-                  |            |                  |         | vaddr_end for KASLR
- fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
- fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
- ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 __________________|____________|__________________|_________|____________________________________________________________
                                                             |
                                                             | Identical layout to the 47-bit one from here on:
 ____________________________________________________________|____________________________________________________________
                   |            |                  |         |
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
  ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
  ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
  ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index cd0cf1c568b4..8f657286d599 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -33,12 +33,14 @@
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
- * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
- * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
- * what Xen requires.
+ * PGDIR_SIZE*17 (pgd slot 273).
+ *
+ * The gap is to allow a space for LDT remap for PTI (1 pgd slot) and space for
+ * a hypervisor (16 slots). Choosing 16 slots for a hypervisor is arbitrary,
+ * but it's what Xen requires.
  */
-#define __PAGE_OFFSET_BASE_L5	_AC(0xff10000000000000, UL)
-#define __PAGE_OFFSET_BASE_L4	_AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE_L5	_AC(0xff11000000000000, UL)
+#define __PAGE_OFFSET_BASE_L4	_AC(0xffff888000000000, UL)
 
 #ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 #define __PAGE_OFFSET           page_offset_base
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 04edd2d58211..84bd9bdc1987 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -111,9 +111,7 @@ extern unsigned int ptrs_per_p4d;
  */
 #define MAXMEM			(1UL << MAX_PHYSMEM_BITS)
 
-#define LDT_PGD_ENTRY_L4	-3UL
-#define LDT_PGD_ENTRY_L5	-112UL
-#define LDT_PGD_ENTRY		(pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
+#define LDT_PGD_ENTRY		-240UL
 #define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
 #define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
 
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 0d7b3ae4960b..a5d7ed125337 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1905,7 +1905,7 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	init_top_pgt[0] = __pgd(0);
 
 	/* Pre-constructed entries are in pfn, so convert to mfn */
-	/* L4[272] -> level3_ident_pgt  */
+	/* L4[273] -> level3_ident_pgt  */
 	/* L4[511] -> level3_kernel_pgt */
 	convert_pfn_mfn(init_top_pgt);
 
@@ -1925,8 +1925,8 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	addr[0] = (unsigned long)pgd;
 	addr[1] = (unsigned long)l3;
 	addr[2] = (unsigned long)l2;
-	/* Graft it onto L4[272][0]. Note that we creating an aliasing problem:
-	 * Both L4[272][0] and L4[511][510] have entries that point to the same
+	/* Graft it onto L4[273][0]. Note that we creating an aliasing problem:
+	 * Both L4[273][0] and L4[511][510] have entries that point to the same
 	 * L2 (PMD) tables. Meaning that if you modify it in __va space
 	 * it will be also modified in the __ka space! (But if you just
 	 * modify the PMD table to point to other PTE's or none, then you
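
A small sketch of why the single LDT_PGD_ENTRY value of -240UL in the hunk
above can serve both paging modes, and of how the pgd slot numbers 272 and
273 come about. This is standalone user-space C, not kernel code. The helper
names ldt_base_addr() and pgd_index_of() are made up for illustration, and
the PGDIR_SHIFT values (39 and 48) and the 512 entries per table are assumed
to be the usual x86-64 ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values: PGDIR_SHIFT is 39 with 4-level, 48 with 5-level paging. */
#define PGDIR_SHIFT_L4 39
#define PGDIR_SHIFT_L5 48
#define PTRS_PER_PGD   512

/* Mirrors LDT_BASE_ADDR = (LDT_PGD_ENTRY << PGDIR_SHIFT) with LDT_PGD_ENTRY = -240UL. */
static uint64_t ldt_base_addr(unsigned int pgdir_shift)
{
	assert(pgdir_shift == PGDIR_SHIFT_L4 || pgdir_shift == PGDIR_SHIFT_L5);
	return (uint64_t)-240 << pgdir_shift;
}

/* Hypothetical helper: index of the top-level page table slot covering addr. */
static unsigned int pgd_index_of(uint64_t addr, unsigned int pgdir_shift)
{
	return (unsigned int)((addr >> pgdir_shift) & (PTRS_PER_PGD - 1));
}
```

Under these assumptions ldt_base_addr(39) is 0xffff880000000000 and
ldt_base_addr(48) is 0xff10000000000000. Both land in slot 272, and the new
__PAGE_OFFSET_BASE values land in slot 273.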


* [tip:x86/urgent] x86/ldt: Unmap PTEs for the slot before freeing LDT pages
  2018-10-26 12:28 ` [PATCHv3 2/3] x86/ldt: Unmap PTEs for the slot before freeing LDT pages Kirill A. Shutemov
  2018-10-31 12:17   ` Kirill A. Shutemov
@ 2018-11-06 20:40   ` tip-bot for Kirill A. Shutemov
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-11-06 20:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, hpa, linux-kernel, tglx, kirill.shutemov

Commit-ID:  a0e6e0831c516860fc7f9be1db6c081fe902ebcf
Gitweb:     https://git.kernel.org/tip/a0e6e0831c516860fc7f9be1db6c081fe902ebcf
Author:     Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 26 Oct 2018 15:28:55 +0300
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 6 Nov 2018 21:35:11 +0100

x86/ldt: Unmap PTEs for the slot before freeing LDT pages

modify_ldt(2) leaves the old LDT mapped after switching over to the new
one. The old LDT gets freed and the pages can be re-used.

Leaving the mapping in place can have security implications. The mapping is
present in the userspace page tables and Meltdown-like attacks can read
these freed and possibly reused pages.

It's relatively simple to fix: unmap the old LDT and flush TLB before
freeing the old LDT memory.

This further allows to avoid flushing the TLB in map_ldt_struct() as the
slot is unmapped and flushed by unmap_ldt_struct() or has never been mapped
at all.
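
For a sense of scale, the number of PTEs to clear and the length of the
flushed range are small. A rough sketch of the arithmetic, as standalone
user-space C rather than kernel code: the helper names ldt_nr_pages() and
ldt_flush_len() are hypothetical, and 4 KiB pages, 8-byte descriptors and a
maximum of 8192 entries are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE      4096u  /* assumed 4 KiB pages */
#define LDT_ENTRY_SIZE 8u     /* assumed size of one segment descriptor */
#define LDT_ENTRIES    8192u  /* assumed maximum number of LDT entries */

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: pages backing an LDT with nr_entries descriptors. */
static unsigned int ldt_nr_pages(unsigned int nr_entries)
{
	assert(nr_entries <= LDT_ENTRIES);
	return DIV_ROUND_UP(nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
}

/* Hypothetical helper: bytes covered by the TLB flush for such an LDT. */
static size_t ldt_flush_len(unsigned int nr_entries)
{
	return (size_t)ldt_nr_pages(nr_entries) * PAGE_SIZE;
}
```

With these assumptions a full-size LDT spans 16 pages, so at most 16 PTEs
are cleared and 64 KiB of virtual address range is flushed per unmap.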

[ tglx: Massaged changelog and removed the needless line breaks ]

Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-3-kirill.shutemov@linux.intel.com

---
 arch/x86/kernel/ldt.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index ab18e0884dc6..18e4525c5933 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -199,14 +199,6 @@ static void sanity_check_ldt_mapping(struct mm_struct *mm)
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
- *
- * There is no corresponding unmap function.  Even if the LDT is freed, we
- * leave the PTEs around until the slot is reused or the mm is destroyed.
- * This is harmless: the LDT is always in ordinary memory, and no one will
- * access the freed slot.
- *
- * If we wanted to unmap freed LDTs, we'd also need to do a flush to make
- * it useful, and the flush would slow down modify_ldt().
  */
 static int
 map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
@@ -214,8 +206,8 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	unsigned long va;
 	bool is_vmalloc;
 	spinlock_t *ptl;
+	int i, nr_pages;
 	pgd_t *pgd;
-	int i;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return 0;
@@ -238,7 +230,9 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
-	for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) {
+	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
+
+	for (i = 0; i < nr_pages; i++) {
 		unsigned long offset = i << PAGE_SHIFT;
 		const void *src = (char *)ldt->entries + offset;
 		unsigned long pfn;
@@ -272,13 +266,39 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	/* Propagate LDT mapping to the user page-table */
 	map_ldt_struct_to_user(mm);
 
-	va = (unsigned long)ldt_slot_va(slot);
-	flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, PAGE_SHIFT, false);
-
 	ldt->slot = slot;
 	return 0;
 }
 
+static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
+{
+	unsigned long va;
+	int i, nr_pages;
+
+	if (!ldt)
+		return;
+
+	/* LDT map/unmap is only required for PTI */
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
+
+	for (i = 0; i < nr_pages; i++) {
+		unsigned long offset = i << PAGE_SHIFT;
+		spinlock_t *ptl;
+		pte_t *ptep;
+
+		va = (unsigned long)ldt_slot_va(ldt->slot) + offset;
+		ptep = get_locked_pte(mm, va, &ptl);
+		pte_clear(mm, va, ptep);
+		pte_unmap_unlock(ptep, ptl);
+	}
+
+	va = (unsigned long)ldt_slot_va(ldt->slot);
+	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, PAGE_SHIFT, false);
+}
+
 #else /* !CONFIG_PAGE_TABLE_ISOLATION */
 
 static int
@@ -286,6 +306,10 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 {
 	return 0;
 }
+
+static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
+{
+}
 #endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
 static void free_ldt_pgtables(struct mm_struct *mm)
@@ -524,6 +548,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	}
 
 	install_ldt(mm, new_ldt);
+	unmap_ldt_struct(mm, old_ldt);
 	free_ldt_struct(old_ldt);
 	error = 0;
 


* [tip:x86/urgent] x86/ldt: Remove unused variable in map_ldt_struct()
  2018-10-26 12:28 ` [PATCHv3 3/3] x86/ldt: Remove unused variable in map_ldt_struct() Kirill A. Shutemov
  2018-11-02 21:08   ` Andy Lutomirski
@ 2018-11-06 20:40   ` tip-bot for Kirill A. Shutemov
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-11-06 20:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, hpa, mingo, kirill.shutemov, linux-kernel, luto

Commit-ID:  b082f2dd80612015cd6d9d84e52099734ec9a0e1
Gitweb:     https://git.kernel.org/tip/b082f2dd80612015cd6d9d84e52099734ec9a0e1
Author:     Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 26 Oct 2018 15:28:56 +0300
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 6 Nov 2018 21:35:11 +0100

x86/ldt: Remove unused variable in map_ldt_struct()

Splitting out the sanity check in map_ldt_struct() moved page table syncing
into a separate function, which made the pgd variable unused. Remove it.

[ tglx: Massaged changelog ]

Fixes: 9bae3197e15d ("x86/ldt: Split out sanity check in map_ldt_struct()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-4-kirill.shutemov@linux.intel.com

---
 arch/x86/kernel/ldt.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 18e4525c5933..6135ae8ce036 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -207,7 +207,6 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	bool is_vmalloc;
 	spinlock_t *ptl;
 	int i, nr_pages;
-	pgd_t *pgd;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return 0;
@@ -221,13 +220,6 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	/* Check if the current mappings are sane */
 	sanity_check_ldt_mapping(mm);
 
-	/*
-	 * Did we already have the top level entry allocated?  We can't
-	 * use pgd_none() for this because it doens't do anything on
-	 * 4-level page table kernels.
-	 */
-	pgd = pgd_offset(mm, LDT_BASE_ADDR);
-
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
 	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);


* Re: [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-26 12:28 ` [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
  2018-11-02 21:07   ` Andy Lutomirski
  2018-11-06 20:39   ` [tip:x86/urgent] " tip-bot for Kirill A. Shutemov
@ 2018-11-10 12:29   ` Baoquan He
  2018-11-23 15:58     ` Kirill A. Shutemov
  2 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2018-11-10 12:29 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, boris.ostrovsky,
	jgross, willy, x86, linux-mm, linux-kernel

On 10/26/18 at 03:28pm, Kirill A. Shutemov wrote:
> On 5-level paging LDT remap area is placed in the middle of
> KASLR randomization region and it can overlap with direct mapping,
> vmalloc or vmap area.
             ~~~
		We usually call it vmemmap.
> 
> Let's move LDT just before direct mapping which makes it safe for KASLR.
> This also allows us to unify layout between 4- and 5-level paging.
...

> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
> index 702898633b00..75bff98928a8 100644
> --- a/Documentation/x86/x86_64/mm.txt
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -34,23 +34,24 @@ __________________|____________|__________________|_________|___________________
>  ____________________________________________________________|___________________________________________________________
>                    |            |                  |         |
>   ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> - ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> - ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
> + ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
> + ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> + ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole

Hi Kirill,

Thanks for this fix. One small concern is whether we can put the LDT
remap somewhere else, e.g. shrink the KASAN area and save one pgd slot for
it. Just from Red Hat's enterprise release point of view, we don't
enable CONFIG_KASAN, and LDT is rarely used on servers, so cutting one
block from the direct mapping area and moving it up one pgd slot seems a
little too abrupt. Does KASAN really cost 16 TB in 4-level and 8 PB in
5-level? After all, the direct mapping is the core mapping and has always
been there, while the LDT remap is not such a core or important mapping.
This is only a subjective impression.

Other than this, this patch looks good to me.

Thanks
Baoquan


* Re: [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-11-10 12:29   ` [PATCHv3 1/3] " Baoquan He
@ 2018-11-23 15:58     ` Kirill A. Shutemov
  2018-12-03  3:01       ` Baoquan He
  0 siblings, 1 reply; 14+ messages in thread
From: Kirill A. Shutemov @ 2018-11-23 15:58 UTC (permalink / raw)
  To: Baoquan He
  Cc: Kirill A. Shutemov, tglx, mingo, bp, hpa, dave.hansen, luto,
	peterz, boris.ostrovsky, jgross, willy, x86, linux-mm,
	linux-kernel

On Sat, Nov 10, 2018 at 08:29:05PM +0800, Baoquan He wrote:
> > diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
> > index 702898633b00..75bff98928a8 100644
> > --- a/Documentation/x86/x86_64/mm.txt
> > +++ b/Documentation/x86/x86_64/mm.txt
> > @@ -34,23 +34,24 @@ __________________|____________|__________________|_________|___________________
> >  ____________________________________________________________|___________________________________________________________
> >                    |            |                  |         |
> >   ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> > - ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> > - ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
> > + ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
> > + ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> > + ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
> 
> Hi Kirill,
> 
> Thanks for this fix. One small concern is whether we can put the LDT
> remap somewhere else, e.g. shrink the KASAN area and save one pgd slot for
> it. Just from Red Hat's enterprise release point of view, we don't
> enable CONFIG_KASAN, and LDT is rarely used on servers, so cutting one
> block from the direct mapping area and moving it up one pgd slot seems a
> little too abrupt. Does KASAN really cost 16 TB in 4-level and 8 PB in
> 5-level? After all, the direct mapping is the core mapping and has always
> been there, while the LDT remap is not such a core or important mapping.
> This is only a subjective impression.

Sorry for the late reply.

KASAN requires one byte of shadow memory per 8 bytes of target memory, so,
yeah, we need 16 TiB of virtual address space with 4-level paging.
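
The arithmetic behind the 16 TiB figure, as a minimal sketch in standalone
user-space C. The helper name kasan_shadow_bytes() is hypothetical, and the
kernel half of the address space is assumed to be 47 bits wide with 4-level
paging and 56 bits wide with 5-level paging.

```c
#include <assert.h>
#include <stdint.h>

/* One shadow byte covers 8 bytes of memory, i.e. a shift of 3. */
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Hypothetical helper: shadow bytes needed to cover 2^kernel_va_bits bytes. */
static uint64_t kasan_shadow_bytes(unsigned int kernel_va_bits)
{
	assert(kernel_va_bits > KASAN_SHADOW_SCALE_SHIFT && kernel_va_bits < 64);
	return UINT64_C(1) << (kernel_va_bits - KASAN_SHADOW_SCALE_SHIFT);
}
```

Under those assumptions kasan_shadow_bytes(47) is 16 TiB and
kasan_shadow_bytes(56) is 8 PiB, which matches the sizes asked about above.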

With 5-level, we might save some address space as the limit for physical
address space is 52-bit, not 55. I dedicated 55-bit address space because
it was easier: just scale the 4-level layout by 9 bits (a factor of 512) and
you'll get everything nicely aligned without much thought (PGD translates to
PGD, etc).
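
A sketch of that scaling, applied to the pre-patch bases, in standalone
user-space C. The helper name scale_l4_to_l5() is hypothetical. It only
illustrates the shift by 9 bits (the difference between 48-bit and 57-bit
virtual addresses) and is not how the kernel derives its layout.

```c
#include <assert.h>
#include <stdint.h>

/* Extra virtual address bits that 5-level paging adds over 4-level. */
#define L5_EXTRA_BITS 9

/*
 * Hypothetical helper: scale a 4-level size or kernel address to its
 * 5-level counterpart. The bits shifted out must be all zeroes (a size)
 * or all ones (a sign-extended kernel address), otherwise the result
 * would not be meaningful.
 */
static uint64_t scale_l4_to_l5(uint64_t l4_value)
{
	uint64_t top = l4_value >> (64 - L5_EXTRA_BITS);

	assert(top == 0 || top == 0x1ff);
	return l4_value << L5_EXTRA_BITS;
}
```

For example, the 4-level guard hole start 0xffff800000000000 scales to
0xff00000000000000, the old 4-level direct mapping base 0xffff880000000000
scales to 0xff10000000000000, and the 64 TB direct mapping becomes 32 PB.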

There is also a complication with the KASAN layout. We have to have the same
KASAN_SHADOW_OFFSET between 4- and 5-level paging to make boot-time
switching between paging modes work. The offset cannot be changed at
runtime: it is used as a parameter to the compiler. That's the reason the
KASAN area alignment looks strange.
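
To illustrate, here is a sketch of the address-to-shadow translation with one
fixed offset, in standalone user-space C. The helper name
kasan_mem_to_shadow_addr() is hypothetical, and the offset value
0xdffffc0000000000 is assumed to be the x86-64 default of
CONFIG_KASAN_SHADOW_OFFSET.

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3
/* Assumed default; the same constant has to work for both paging modes. */
#define KASAN_SHADOW_OFFSET UINT64_C(0xdffffc0000000000)

/* Hypothetical helper: shadow address for a kernel-half virtual address. */
static uint64_t kasan_mem_to_shadow_addr(uint64_t addr)
{
	assert(addr >> 63);	/* only kernel-half addresses are expected */
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

With that offset, the start of the 4-level kernel half (0xffff800000000000)
maps to 0xffffec0000000000, while the start of the 5-level kernel half
(0xff00000000000000) maps to 0xffdffc0000000000, which is not a round
boundary. In both modes the last byte of the address space maps to
0xfffffbffffffffff.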

A possibly better solution would be to actually include LDT in KASLR:
randomize the area along with direct mapping, vmalloc and vmemmap.
But it's more complexity than I found reasonable for a fix.

Do you want to try this? :)

-- 
 Kirill A. Shutemov


* Re: [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-11-23 15:58     ` Kirill A. Shutemov
@ 2018-12-03  3:01       ` Baoquan He
  2018-12-03  9:26         ` Kirill A. Shutemov
  0 siblings, 1 reply; 14+ messages in thread
From: Baoquan He @ 2018-12-03  3:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, tglx, mingo, bp, hpa, dave.hansen, luto,
	peterz, boris.ostrovsky, jgross, willy, x86, linux-mm,
	linux-kernel

Hi Kirill,

On 11/23/18 at 06:58pm, Kirill A. Shutemov wrote:
> > Thanks for this fix. One small concern is whether we can put the LDT
> > remap somewhere else, e.g. shrink the KASAN area and save one pgd slot for
> > it. Just from Red Hat's enterprise release point of view, we don't
> > enable CONFIG_KASAN, and LDT is rarely used on servers, so cutting one
> > block from the direct mapping area and moving it up one pgd slot seems a
> > little too abrupt. Does KASAN really cost 16 TB in 4-level and 8 PB in
> > 5-level? After all, the direct mapping is the core mapping and has always
> > been there, while the LDT remap is not such a core or important mapping.
> > This is only a subjective impression.
> 
> KASAN requires one byte of shadow memory per 8 bytes of target memory, so,
> yeah, we need 16 TiB of virtual address space with 4-level paging.
> 
> With 5-level, we might save some address space as the limit for physical
> address space is 52-bit, not 55. I dedicated 55-bit address space because
> it was easier: just scale the 4-level layout by 9 bits (a factor of 512) and
> you'll get everything nicely aligned without much thought (PGD translates to
> PGD, etc).
> 
> There is also a complication with the KASAN layout. We have to have the same
> KASAN_SHADOW_OFFSET between 4- and 5-level paging to make boot-time
> switching between paging modes work. The offset cannot be changed at
> runtime: it is used as a parameter to the compiler. That's the reason the
> KASAN area alignment looks strange.

Thanks for the explanation. The KASAN area can't be touched, as you said.

> 
> A possibly better solution would be to actually include LDT in KASLR:
> randomize the area along with direct mapping, vmalloc and vmemmap.
> But it's more complexity than I found reasonable for a fix.
> 
> Do you want to try this? :)

It seems the unused hole between vmemmap and KASAN can be used, e.g. put the
LDT remap at -20.5 TB, like below:
____________________________________________________________|___________________________________________________________
                  |            |                  |         |
 ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
 ffff888000000000 | -120    TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
 ffffc88000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
 ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
 ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
 ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
 ffffeb0000000000 |  -21    TB | ffffeb7fffffffff |  0.5 TB | ... unused hole
 ffffeb8000000000 |  -20.5  TB | ffffebffffffffff |  0.5 TB | LDT remap for PTI
 ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
__________________|____________|__________________|_________|____________________________________________________________

In the non-KASLR case, only 0.5 TB is left as a hole between vmemmap and LDT.
Meanwhile, since the LDT remap only takes 128 KB at most at the beginning of
its region, the remaining area can be seen as a guard hole between it and
KASAN.
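
A quick check of the 128 KB figure, as a sketch in standalone user-space C.
The helper names are hypothetical, and it is assumed that the remap holds two
LDT slots, each sized for 8192 entries of 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define LDT_ENTRIES    8192u  /* assumed maximum number of LDT entries */
#define LDT_ENTRY_SIZE 8u     /* assumed size of one segment descriptor */
#define LDT_NR_SLOTS   2u     /* assumption: two alternating slots */

/* Hypothetical helper: upper bound on bytes mapped in the LDT remap area. */
static uint64_t ldt_remap_max_bytes(void)
{
	return (uint64_t)LDT_ENTRIES * LDT_ENTRY_SIZE * LDT_NR_SLOTS;
}

/* Hypothetical helper: bytes of a region left over as an implicit guard. */
static uint64_t guard_bytes_left(uint64_t region_bytes)
{
	assert(region_bytes >= ldt_remap_max_bytes());
	return region_bytes - ldt_remap_max_bytes();
}
```

With these assumptions ldt_remap_max_bytes() is 128 KiB, so nearly all of a
0.5 TB (2^39 byte) region would remain unused.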

And yes, in the KASLR case, we have to randomize it together with the old
three regions.

It looks doable. I'm not sure whether the test case is complicated or not;
if it's not hard, I can give it a try. I have some internal bugs to deal
with first, so I can focus on this later. I saw you posted another patchset
to fix the Xen issue; might it not be needed any more if we go this way?

I'm also not sure whether other people have different ideas.

Thanks
Baoquan


* Re: [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-12-03  3:01       ` Baoquan He
@ 2018-12-03  9:26         ` Kirill A. Shutemov
  0 siblings, 0 replies; 14+ messages in thread
From: Kirill A. Shutemov @ 2018-12-03  9:26 UTC (permalink / raw)
  To: Baoquan He
  Cc: Kirill A. Shutemov, tglx, mingo, bp, hpa, dave.hansen, luto,
	peterz, boris.ostrovsky, jgross, willy, x86, linux-mm,
	linux-kernel

On Mon, Dec 03, 2018 at 11:01:00AM +0800, Baoquan He wrote:
> It looks doable. I'm not sure whether the test case is complicated or not;
> if it's not hard, I can give it a try. I have some internal bugs to deal
> with first, so I can focus on this later. I saw you posted another patchset
> to fix the Xen issue; might it not be needed any more if we go this way?

Well, it depends on what comes first in the KASLR group. The fix will not
be needed if the direct mapping comes first.

But I would rather go with the patch anyway. The hypervisor hole is part
of the ABI and we should not calculate it based on some other movable entity
(direct mapping, LDT remap, whatever). It's too fragile.

-- 
 Kirill A. Shutemov
