LKML Archive on lore.kernel.org
* [PATCH] arm64: mm: enable per pmd page table lock
@ 2019-02-14 21:16 Yu Zhao
  2019-02-18 15:12 ` Will Deacon
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Yu Zhao @ 2019-02-14 21:16 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

Switch from per mm_struct to per pmd page table lock by enabling
ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
large systems.

I'm not sure whether there is contention on mm->page_table_lock.
Given that the option comes at no cost (apart from initializing
more spin locks), why not enable it now?

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
 arch/arm64/include/asm/tlb.h     |  5 ++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..104325a1ffc3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
 config ARCH_HAS_CACHE_LINE_SIZE
 	def_bool y
 
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+	def_bool y
+
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	---help---
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 52fa47c73bf0..dabba4b2c61f 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -33,12 +33,22 @@
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)__get_free_page(PGALLOC_GFP);
+	struct page *page;
+
+	page = alloc_page(PGALLOC_GFP);
+	if (!page)
+		return NULL;
+	if (!pgtable_pmd_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
+	return page_address(page);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
 {
 	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
+	pgtable_pmd_page_dtor(virt_to_page(pmdp));
 	free_page((unsigned long)pmdp);
 }
 
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 106fdc951b6e..4e3becfed387 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -62,7 +62,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pmdp));
+	struct page *page = virt_to_page(pmdp);
+
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
-- 
2.21.0.rc0.258.g878e2cd30e-goog
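
A minimal user-space model of what the commit message describes may help readers new to split page table locks. It is a sketch under stated assumptions, not kernel code: model_mm, model_pmd_page, model_pmd_lockptr() and model_touch_pmd() are hypothetical names, and a counter stands in for spinlock_t. It mirrors the choice that pmd_lockptr() makes in include/linux/mm.h: with split PMD locks each PMD page carries its own lock, otherwise every PMD in the mm shares mm->page_table_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for spinlock_t: only counts acquisitions. */
struct model_lock {
	int acquisitions;
};

/* Hypothetical stand-in for struct mm_struct. */
struct model_mm {
	struct model_lock page_table_lock;	/* the single mm-wide lock */
};

/* Hypothetical stand-in for a page used as a PMD table. */
struct model_pmd_page {
	struct model_lock ptl;	/* per-page lock, set up by the PMD ctor */
};

/*
 * Models the choice pmd_lockptr() makes: with split PMD locks the lock
 * lives in the PMD page itself, otherwise all PMDs share the mm lock.
 */
static struct model_lock *model_pmd_lockptr(struct model_mm *mm,
					    struct model_pmd_page *pmd,
					    bool split_pmd_ptlock)
{
	return split_pmd_ptlock ? &pmd->ptl : &mm->page_table_lock;
}

/* Take and release the lock protecting one PMD table. */
static void model_touch_pmd(struct model_mm *mm, struct model_pmd_page *pmd,
			    bool split_pmd_ptlock)
{
	struct model_lock *ptl = model_pmd_lockptr(mm, pmd, split_pmd_ptlock);

	ptl->acquisitions++;	/* spin_lock()/spin_unlock() in the kernel */
}
```

In the shared case two unrelated PMD tables serialize on one lock; in the split case they do not, which is the granularity gain the patch is after. Whether that gain is measurable depends on real contention, which the commit message itself leaves open.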


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] arm64: mm: enable per pmd page table lock
  2019-02-14 21:16 [PATCH] arm64: mm: enable per pmd page table lock Yu Zhao
@ 2019-02-18 15:12 ` Will Deacon
  2019-02-18 19:49   ` Yu Zhao
  2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
  2019-02-19  3:08 ` [PATCH] " Anshuman Khandual
  2 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2019-02-18 15:12 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Catalin Marinas, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	mark.rutland

[+Mark]

On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
> Switch from per mm_struct to per pmd page table lock by enabling
> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> large systems.
> 
> I'm not sure whether there is contention on mm->page_table_lock.
> Given that the option comes at no cost (apart from initializing
> more spin locks), why not enable it now?
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/arm64/Kconfig               |  3 +++
>  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>  arch/arm64/include/asm/tlb.h     |  5 ++++-
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4168d366127..104325a1ffc3 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>  config ARCH_HAS_CACHE_LINE_SIZE
>  	def_bool y
>  
> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> +	def_bool y
> +
>  config SECCOMP
>  	bool "Enable seccomp to safely compute untrusted bytecode"
>  	---help---
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 52fa47c73bf0..dabba4b2c61f 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -33,12 +33,22 @@
>  
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> +	struct page *page;
> +
> +	page = alloc_page(PGALLOC_GFP);
> +	if (!page)
> +		return NULL;
> +	if (!pgtable_pmd_page_ctor(page)) {
> +		__free_page(page);
> +		return NULL;
> +	}
> +	return page_address(page);

I'm a bit worried as to how this interacts with the page-table code in
arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
looks like that currently always calls pgtable_page_ctor(), regardless of
level. Do we now need a separate allocator function for the PMD level?

Will

* Re: [PATCH] arm64: mm: enable per pmd page table lock
  2019-02-18 15:12 ` Will Deacon
@ 2019-02-18 19:49   ` Yu Zhao
  2019-02-18 20:48     ` Yu Zhao
  2019-02-19  4:09     ` Anshuman Khandual
  0 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2019-02-18 19:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	mark.rutland

On Mon, Feb 18, 2019 at 03:12:23PM +0000, Will Deacon wrote:
> [+Mark]
> 
> On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
> > Switch from per mm_struct to per pmd page table lock by enabling
> > ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> > large systems.
> > 
> > I'm not sure whether there is contention on mm->page_table_lock.
> > Given that the option comes at no cost (apart from initializing
> > more spin locks), why not enable it now?
> > 
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >  arch/arm64/Kconfig               |  3 +++
> >  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
> >  arch/arm64/include/asm/tlb.h     |  5 ++++-
> >  3 files changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index a4168d366127..104325a1ffc3 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
> >  config ARCH_HAS_CACHE_LINE_SIZE
> >  	def_bool y
> >  
> > +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> > +	def_bool y
> > +
> >  config SECCOMP
> >  	bool "Enable seccomp to safely compute untrusted bytecode"
> >  	---help---
> > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> > index 52fa47c73bf0..dabba4b2c61f 100644
> > --- a/arch/arm64/include/asm/pgalloc.h
> > +++ b/arch/arm64/include/asm/pgalloc.h
> > @@ -33,12 +33,22 @@
> >  
> >  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> >  {
> > -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> > +	struct page *page;
> > +
> > +	page = alloc_page(PGALLOC_GFP);
> > +	if (!page)
> > +		return NULL;
> > +	if (!pgtable_pmd_page_ctor(page)) {
> > +		__free_page(page);
> > +		return NULL;
> > +	}
> > +	return page_address(page);
> 
> I'm a bit worried as to how this interacts with the page-table code in
> arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
> looks like that currently always calls pgtable_page_ctor(), regardless of
> level. Do we now need a separate allocator function for the PMD level?

Thanks for pointing this out; I had never noticed it. The short
answer is no.

I guess pgtable_page_ctor() is used for all pud/pmd/pte pages
there because it also works for pud pages, and for pmd pages too
without this patch. So your concern is valid. Thanks again.

Why is my answer no? Because I don't think the ctor matters for
pgd_pgtable_alloc(). The ctor is only required for userspace page
tables, and that's why we don't have it in pte_alloc_one_kernel().
AFAICT, none of the pgds pre-populated by pgd_pgtable_alloc()
(efi_mm.pgd, tramp_pg_dir and init_mm.pgd) is a userspace page
table. (I doubt any other arch pre-populates userspace page tables
either.)

So to avoid future confusion, we might just remove the ctor from
pgd_pgtable_alloc().

* Re: [PATCH] arm64: mm: enable per pmd page table lock
  2019-02-18 19:49   ` Yu Zhao
@ 2019-02-18 20:48     ` Yu Zhao
  2019-02-19  4:09     ` Anshuman Khandual
  1 sibling, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-02-18 20:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	mark.rutland

On Mon, Feb 18, 2019 at 12:49:38PM -0700, Yu Zhao wrote:
> On Mon, Feb 18, 2019 at 03:12:23PM +0000, Will Deacon wrote:
> > [+Mark]
> > 
> > On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
> > > Switch from per mm_struct to per pmd page table lock by enabling
> > > ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> > > large systems.
> > > 
> > > I'm not sure whether there is contention on mm->page_table_lock.
> > > Given that the option comes at no cost (apart from initializing
> > > more spin locks), why not enable it now?
> > > 
> > > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > > ---
> > >  arch/arm64/Kconfig               |  3 +++
> > >  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
> > >  arch/arm64/include/asm/tlb.h     |  5 ++++-
> > >  3 files changed, 18 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index a4168d366127..104325a1ffc3 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
> > >  config ARCH_HAS_CACHE_LINE_SIZE
> > >  	def_bool y
> > >  
> > > +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> > > +	def_bool y
> > > +
> > >  config SECCOMP
> > >  	bool "Enable seccomp to safely compute untrusted bytecode"
> > >  	---help---
> > > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> > > index 52fa47c73bf0..dabba4b2c61f 100644
> > > --- a/arch/arm64/include/asm/pgalloc.h
> > > +++ b/arch/arm64/include/asm/pgalloc.h
> > > @@ -33,12 +33,22 @@
> > >  
> > >  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> > >  {
> > > -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> > > +	struct page *page;
> > > +
> > > +	page = alloc_page(PGALLOC_GFP);
> > > +	if (!page)
> > > +		return NULL;
> > > +	if (!pgtable_pmd_page_ctor(page)) {
> > > +		__free_page(page);
> > > +		return NULL;
> > > +	}
> > > +	return page_address(page);
> > 
> > I'm a bit worried as to how this interacts with the page-table code in
> > arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
> > looks like that currently always calls pgtable_page_ctor(), regardless of
> > level. Do we now need a separate allocator function for the PMD level?
> 
> Thanks for pointing this out; I had never noticed it. The short
> answer is no.
> 
> I guess pgtable_page_ctor() is used for all pud/pmd/pte pages
> there because it also works for pud pages, and for pmd pages too
> without this patch. So your concern is valid. Thanks again.
> 
> Why is my answer no? Because I don't think the ctor matters for
> pgd_pgtable_alloc(). The ctor is only required for userspace page
> tables, and that's why we don't have it in pte_alloc_one_kernel().
> AFAICT, none of the pgds pre-populated by pgd_pgtable_alloc()
> (efi_mm.pgd, tramp_pg_dir and init_mm.pgd) is a userspace page
> table. (I doubt any other arch pre-populates userspace page tables
> either.)
> 
> So to avoid future confusion, we might just remove the ctor from
> pgd_pgtable_alloc().

Sorry, I missed that we call apply_to_page_range() on efi_mm, and
that function does require the ctor. So we actually can't remove
it. Although pgtable_page_ctor() also does the job adequately for
pmd pages, in terms of giving apply_to_page_range() what it needs,
it would be more appropriate to use pgtable_pmd_page_ctor() instead
(and not to call any ctor at all on pud pages).

I could add this change prior to this patch, if it makes sense to
you. Thanks.
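
Why efi_mm still needs the ctor can be sketched in a few lines. As far as I can tell, apply_to_pte_range() in mm/memory.c of this era allocates PTE tables with pte_alloc_kernel() when mm is &init_mm, and with pte_alloc_map_lock(), which takes the split PTE lock, for any other mm. The model below uses hypothetical names (model_pte_page, model_pte_ctor(), model_apply_pte_range()) and a flag in place of a real spinlock; it only shows that walking a non-init_mm table touches the per-page lock, so a ctor must have initialized it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PTE table page and its split lock state. */
struct model_pte_page {
	bool ptl_initialized;	/* set by the modelled pgtable_page_ctor() */
	int ptl_acquisitions;
};

/* Hypothetical model of an mm: only "is this init_mm?" matters here. */
struct model_mm {
	bool is_init_mm;
};

/* Models pgtable_page_ctor(): prepares the per-page lock. */
static void model_pte_ctor(struct model_pte_page *pte)
{
	pte->ptl_initialized = true;
}

/*
 * Models the locking choice in apply_to_pte_range(): init_mm is walked
 * without the split PTE lock, any other mm (efi_mm included) takes it.
 * Returns false where the real kernel would use an uninitialized lock.
 */
static bool model_apply_pte_range(const struct model_mm *mm,
				  struct model_pte_page *pte)
{
	if (mm->is_init_mm)
		return true;		/* no per-page lock involved */
	if (!pte->ptl_initialized)
		return false;		/* ctor was skipped: lock is garbage */
	pte->ptl_acquisitions++;	/* spin_lock()/spin_unlock() */
	return true;
}
```

This is why init_mm tables can skip the ctors while efi_mm tables, which go through the locking path, cannot.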

* [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-14 21:16 [PATCH] arm64: mm: enable per pmd page table lock Yu Zhao
  2019-02-18 15:12 ` Will Deacon
@ 2019-02-18 23:13 ` Yu Zhao
  2019-02-18 23:13   ` [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
                     ` (4 more replies)
  2019-02-19  3:08 ` [PATCH] " Anshuman Khandual
  2 siblings, 5 replies; 38+ messages in thread
From: Yu Zhao @ 2019-02-18 23:13 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, Yu Zhao

For pte pages, use pgtable_page_ctor(); for pmd pages, use
pgtable_pmd_page_ctor() if the pmd level is not folded; and for the
rest (pud, p4d and pgd), don't use any ctor.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b6f5aa52ac67..fa7351877af3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -98,7 +98,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(void)
+static phys_addr_t __init early_pgtable_alloc(int shift)
 {
 	phys_addr_t phys;
 	void *ptr;
@@ -173,7 +173,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void),
+				phys_addr_t (*pgtable_alloc)(int),
 				int flags)
 {
 	unsigned long next;
@@ -183,7 +183,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	if (pmd_none(pmd)) {
 		phys_addr_t pte_phys;
 		BUG_ON(!pgtable_alloc);
-		pte_phys = pgtable_alloc();
+		pte_phys = pgtable_alloc(PAGE_SHIFT);
 		__pmd_populate(pmdp, pte_phys, PMD_TYPE_TABLE);
 		pmd = READ_ONCE(*pmdp);
 	}
@@ -207,7 +207,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 
 static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 		     phys_addr_t phys, pgprot_t prot,
-		     phys_addr_t (*pgtable_alloc)(void), int flags)
+		     phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pmd_t *pmdp;
@@ -245,7 +245,7 @@ static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void), int flags)
+				phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
@@ -257,7 +257,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	if (pud_none(pud)) {
 		phys_addr_t pmd_phys;
 		BUG_ON(!pgtable_alloc);
-		pmd_phys = pgtable_alloc();
+		pmd_phys = pgtable_alloc(PMD_SHIFT);
 		__pud_populate(pudp, pmd_phys, PUD_TYPE_TABLE);
 		pud = READ_ONCE(*pudp);
 	}
@@ -293,7 +293,7 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 
 static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(void),
+			   phys_addr_t (*pgtable_alloc)(int),
 			   int flags)
 {
 	unsigned long next;
@@ -303,7 +303,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 	if (pgd_none(pgd)) {
 		phys_addr_t pud_phys;
 		BUG_ON(!pgtable_alloc);
-		pud_phys = pgtable_alloc();
+		pud_phys = pgtable_alloc(PUD_SHIFT);
 		__pgd_populate(pgdp, pud_phys, PUD_TYPE_TABLE);
 		pgd = READ_ONCE(*pgdp);
 	}
@@ -344,7 +344,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
-				 phys_addr_t (*pgtable_alloc)(void),
+				 phys_addr_t (*pgtable_alloc)(int),
 				 int flags)
 {
 	unsigned long addr, length, end, next;
@@ -370,11 +370,20 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	} while (pgdp++, addr = next, addr != end);
 }
 
-static phys_addr_t pgd_pgtable_alloc(void)
+static phys_addr_t pgd_pgtable_alloc(int shift)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
-	if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
-		BUG();
+	BUG_ON(!ptr);
+
+	/*
+	 * Initialize page table locks in case later we need to
+	 * call core mm functions like apply_to_page_range() on
+	 * this pre-allocated page table.
+	 */
+	if (shift == PAGE_SHIFT)
+		BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
+	else if (shift == PMD_SHIFT && PMD_SHIFT != PUD_SHIFT)
+		BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr)));
 
 	/* Ensure the zeroed page is visible to the page table walker */
 	dsb(ishst);
-- 
2.21.0.rc0.258.g878e2cd30e-goog
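
The level selection that the patched pgd_pgtable_alloc() performs can be modelled outside the kernel. In this sketch the shift values are passed in as assumptions (for example 12/21/30 for 4K pages with four levels, and 16/29/29 for a 64K-page, two-level layout where the PMD level is folded); enum model_ctor, struct model_shifts and model_pick_ctor() are hypothetical names, and the function only reports which ctor the patch would call.

```c
#include <assert.h>

/* Which constructor the allocator would run (hypothetical names). */
enum model_ctor {
	MODEL_CTOR_NONE,	/* pud, p4d and pgd pages */
	MODEL_CTOR_PTE,		/* pgtable_page_ctor() */
	MODEL_CTOR_PMD,		/* pgtable_pmd_page_ctor() */
};

/* Shift values of one configuration, passed in rather than hard-coded. */
struct model_shifts {
	int page_shift;
	int pmd_shift;
	int pud_shift;
};

/*
 * Mirrors the branch added to pgd_pgtable_alloc(): a PMD page only gets
 * the PMD ctor when the PMD level is not folded (PMD_SHIFT != PUD_SHIFT).
 */
static enum model_ctor model_pick_ctor(const struct model_shifts *s, int shift)
{
	if (shift == s->page_shift)
		return MODEL_CTOR_PTE;
	if (shift == s->pmd_shift && s->pmd_shift != s->pud_shift)
		return MODEL_CTOR_PMD;
	return MODEL_CTOR_NONE;
}
```

Passing the shifts explicitly keeps the folded case testable: when PMD_SHIFT equals PUD_SHIFT there is no separate PMD page, so no PMD ctor is chosen.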


* [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm
  2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
@ 2019-02-18 23:13   ` Yu Zhao
  2019-02-26 15:13     ` Mark Rutland
  2019-02-18 23:13   ` [PATCH v2 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-02-18 23:13 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, Yu Zhao

init_mm doesn't require the page table lock to be initialized at
any level. Add a separate page table allocator for it that skips
the page table ctors.

The ctors allocate memory when ALLOC_SPLIT_PTLOCKS is set. Not
calling them avoids a memory leak in case we call pte_free_kernel()
on init_mm.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/mm/mmu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index fa7351877af3..e8bf8a6300e8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -370,6 +370,16 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	} while (pgdp++, addr = next, addr != end);
 }
 
+static phys_addr_t pgd_kernel_pgtable_alloc(int shift)
+{
+	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
+	BUG_ON(!ptr);
+
+	/* Ensure the zeroed page is visible to the page table walker */
+	dsb(ishst);
+	return __pa(ptr);
+}
+
 static phys_addr_t pgd_pgtable_alloc(int shift)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
@@ -591,7 +601,7 @@ static int __init map_entry_trampoline(void)
 	/* Map only the text into the trampoline page table */
 	memset(tramp_pg_dir, 0, PGD_SIZE);
 	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, PAGE_SIZE,
-			     prot, pgd_pgtable_alloc, 0);
+			     prot, pgd_kernel_pgtable_alloc, 0);
 
 	/* Map both the text and data into the kernel page table */
 	__set_fixmap(FIX_ENTRY_TRAMP_TEXT, pa_start, prot);
@@ -1067,7 +1077,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
-			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
+			     size, PAGE_KERNEL, pgd_kernel_pgtable_alloc,
+			     flags);
 
 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
 			   altmap, want_memblock);
-- 
2.21.0.rc0.258.g878e2cd30e-goog
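
The leak described in this commit message can be modelled with a counter. Assumptions: model_ptlock_outstanding stands in for the separately allocated spinlocks that exist when ALLOC_SPLIT_PTLOCKS is set (ptlock_alloc()/ptlock_free() in the kernel), and the hypothetical model_* functions stand in for the ctor, the dtor, and a free path such as pte_free_kernel() that releases the page without running the dtor.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock allocations made by the modelled ctor and not yet released. */
static int model_ptlock_outstanding;

/* Hypothetical page: only tracks whether a lock allocation hangs off it. */
struct model_page {
	bool has_ptl;
};

/* Models a ctor that allocates the lock (ptlock_alloc()). */
static void model_ctor(struct model_page *page)
{
	page->has_ptl = true;
	model_ptlock_outstanding++;
}

/* Models the matching dtor that releases it (ptlock_free()). */
static void model_dtor(struct model_page *page)
{
	if (page->has_ptl) {
		page->has_ptl = false;
		model_ptlock_outstanding--;
	}
}

/* Models a kernel free path that frees the page but never runs the dtor. */
static void model_free_kernel_table(struct model_page *page)
{
	page->has_ptl = false;	/* any lock allocation is now unreachable */
}
```

Pairing the ctor with the dtor leaves nothing outstanding; running the ctor and then freeing through the kernel path leaves one lock allocation behind, while never running the ctor (the point of pgd_kernel_pgtable_alloc()) leaves nothing to leak.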


* [PATCH v2 3/3] arm64: mm: enable per pmd page table lock
  2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
  2019-02-18 23:13   ` [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
@ 2019-02-18 23:13   ` Yu Zhao
  2019-02-19  4:21   ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-02-18 23:13 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm, Yu Zhao

Switch from per mm_struct to per pmd page table lock by enabling
ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
large systems.

I'm not sure whether there is contention on mm->page_table_lock.
Given that the option comes at no cost (apart from initializing
more spin locks), why not enable it now?

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
 arch/arm64/include/asm/tlb.h     |  5 ++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..8dbfa49d926c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
 config ARCH_HAS_CACHE_LINE_SIZE
 	def_bool y
 
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+	def_bool y if PGTABLE_LEVELS > 2
+
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	---help---
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 52fa47c73bf0..dabba4b2c61f 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -33,12 +33,22 @@
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)__get_free_page(PGALLOC_GFP);
+	struct page *page;
+
+	page = alloc_page(PGALLOC_GFP);
+	if (!page)
+		return NULL;
+	if (!pgtable_pmd_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
+	return page_address(page);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
 {
 	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
+	pgtable_pmd_page_dtor(virt_to_page(pmdp));
 	free_page((unsigned long)pmdp);
 }
 
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 106fdc951b6e..4e3becfed387 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -62,7 +62,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pmdp));
+	struct page *page = virt_to_page(pmdp);
+
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
-- 
2.21.0.rc0.258.g878e2cd30e-goog


* Re: [PATCH] arm64: mm: enable per pmd page table lock
  2019-02-14 21:16 [PATCH] arm64: mm: enable per pmd page table lock Yu Zhao
  2019-02-18 15:12 ` Will Deacon
  2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
@ 2019-02-19  3:08 ` Anshuman Khandual
  2 siblings, 0 replies; 38+ messages in thread
From: Anshuman Khandual @ 2019-02-19  3:08 UTC (permalink / raw)
  To: Yu Zhao, Catalin Marinas, Will Deacon
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm



On 02/15/2019 02:46 AM, Yu Zhao wrote:
> Switch from per mm_struct to per pmd page table lock by enabling
> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> large systems.
> 
> I'm not sure whether there is contention on mm->page_table_lock.
> Given that the option comes at no cost (apart from initializing
> more spin locks), why not enable it now?
> 

This has changes similar to what I posted last month as part of the
general page table page accounting cleanup series for arm64.

https://www.spinics.net/lists/arm-kernel/msg701954.html
 

* Re: [PATCH] arm64: mm: enable per pmd page table lock
  2019-02-18 19:49   ` Yu Zhao
  2019-02-18 20:48     ` Yu Zhao
@ 2019-02-19  4:09     ` Anshuman Khandual
  1 sibling, 0 replies; 38+ messages in thread
From: Anshuman Khandual @ 2019-02-19  4:09 UTC (permalink / raw)
  To: Yu Zhao, Will Deacon
  Cc: Catalin Marinas, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm,
	mark.rutland



On 02/19/2019 01:19 AM, Yu Zhao wrote:
> On Mon, Feb 18, 2019 at 03:12:23PM +0000, Will Deacon wrote:
>> [+Mark]
>>
>> On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
>>> Switch from per mm_struct to per pmd page table lock by enabling
>>> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
>>> large systems.
>>>
>>> I'm not sure whether there is contention on mm->page_table_lock.
>>> Given that the option comes at no cost (apart from initializing
>>> more spin locks), why not enable it now?
>>>
>>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>>> ---
>>>  arch/arm64/Kconfig               |  3 +++
>>>  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>>>  arch/arm64/include/asm/tlb.h     |  5 ++++-
>>>  3 files changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index a4168d366127..104325a1ffc3 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>>>  config ARCH_HAS_CACHE_LINE_SIZE
>>>  	def_bool y
>>>  
>>> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>>> +	def_bool y
>>> +
>>>  config SECCOMP
>>>  	bool "Enable seccomp to safely compute untrusted bytecode"
>>>  	---help---
>>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>>> index 52fa47c73bf0..dabba4b2c61f 100644
>>> --- a/arch/arm64/include/asm/pgalloc.h
>>> +++ b/arch/arm64/include/asm/pgalloc.h
>>> @@ -33,12 +33,22 @@
>>>  
>>>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>>>  {
>>> -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
>>> +	struct page *page;
>>> +
>>> +	page = alloc_page(PGALLOC_GFP);
>>> +	if (!page)
>>> +		return NULL;
>>> +	if (!pgtable_pmd_page_ctor(page)) {
>>> +		__free_page(page);
>>> +		return NULL;
>>> +	}
>>> +	return page_address(page);
>>
>> I'm a bit worried as to how this interacts with the page-table code in
>> arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
>> looks like that currently always calls pgtable_page_ctor(), regardless of
>> level. Do we now need a separate allocator function for the PMD level?
> 
> Thanks for pointing this out; I had never noticed it. The short
> answer is no.
> 
> I guess pgtable_page_ctor() is used for all pud/pmd/pte pages
> there because it also works for pud pages, and for pmd pages too
> without this patch. So your concern is valid. Thanks again.

pgtable_page_ctor() acts on a page used as a page table at any
level: it sets the appropriate page type (page flag PG_table) and
increments the NR_PAGETABLE zone stat. pgtable_page_dtor() does
exactly the inverse.

These two complementary operations are required for page table
pages at every level, for their proper initialization, their
identification in the buddy allocator and the zone statistics.
Hence they need to be called for page table pages at all levels.

pgtable_pmd_page_ctor()/pgtable_pmd_page_dtor(), on the other hand,
just init/free the page table lock on the page for !THP cases, and
additionally init page->pmd_huge_pte (the deposited page table page)
for THP cases. Some archs seem to call pgtable_pmd_page_ctor() in
place of pgtable_page_ctor(). I wonder whether that approach skips
the page flag and accounting requirements.
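
The division of labour described above can be captured in a small model. It is a sketch based on this message plus the fact that pgtable_page_ctor() also initializes the split lock, with hypothetical model_* names and a global counter standing in for the NR_PAGETABLE zone statistic; it is not the kernel implementation. It shows that running only the PMD ctor leaves the page untyped and unaccounted, which is the gap being pointed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Global counter standing in for the NR_PAGETABLE zone statistic. */
static long model_nr_pagetable;

/* Hypothetical page state touched by the constructors. */
struct model_page {
	bool pg_table;		/* PG_table page type set */
	bool ptl_ready;		/* split page table lock initialized */
	void *pmd_huge_pte;	/* THP deposit slot (PMD pages only) */
};

/* pgtable_page_ctor() as described: lock, page type, accounting. */
static void model_pgtable_page_ctor(struct model_page *p)
{
	p->ptl_ready = true;
	p->pg_table = true;
	model_nr_pagetable++;
}

/* pgtable_page_dtor(): exactly the inverse of the ctor. */
static void model_pgtable_page_dtor(struct model_page *p)
{
	p->ptl_ready = false;
	p->pg_table = false;
	model_nr_pagetable--;
}

/* pgtable_pmd_page_ctor() as described: lock and THP slot only. */
static void model_pgtable_pmd_page_ctor(struct model_page *p)
{
	p->pmd_huge_pte = NULL;
	p->ptl_ready = true;
}

/* pgtable_pmd_page_dtor(): releases the lock only. */
static void model_pgtable_pmd_page_dtor(struct model_page *p)
{
	p->ptl_ready = false;
}
```

As long as each ctor is paired with its own dtor, the counter returns to zero; the disagreement in the thread is about which pages should go through the accounting pair in the first place.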

> 
> Why is my answer no? Because I don't think the ctor matters for
> pgd_pgtable_alloc(). The ctor is only required for userspace page
> tables, and that's why we don't have it in pte_alloc_one_kernel().

At present on arm64, certain kernel page table page allocations call
pgtable_pmd_page_ctor() and some don't. The series I posted makes
sure that all kernel and user page table page allocations go through
pgtable_page_ctor()/dtor(). These constructs are required for kernel
page table pages as well, for accurate initialization and accounting,
not just for user space. The series only exempts the vmemmap struct
page mapping, as covering it would require changes to the generic
sparse vmemmap allocation/free functions, which I believe should also
be changed going forward.

> AFAICT, none of the pgds pre-populated by pgd_pgtable_alloc()
> (efi_mm.pgd, tramp_pg_dir and init_mm.pgd) is a userspace page
> table. (I doubt any other arch pre-populates userspace page tables
> either.)
> 
> So to avoid future confusion, we might just remove the ctor from
> pgd_pgtable_alloc().

No. Instead, we should just make sure that those pages go through
the dtor() destructor path when they are freed, and the cleanup
series does that.

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
  2019-02-18 23:13   ` [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
  2019-02-18 23:13   ` [PATCH v2 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
@ 2019-02-19  4:21   ` Anshuman Khandual
  2019-02-19  5:32     ` Yu Zhao
  2019-02-26 15:12   ` Mark Rutland
  2019-03-10  1:19   ` [PATCH v3 " Yu Zhao
  4 siblings, 1 reply; 38+ messages in thread
From: Anshuman Khandual @ 2019-02-19  4:21 UTC (permalink / raw)
  To: Yu Zhao, Catalin Marinas, Will Deacon
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm



On 02/19/2019 04:43 AM, Yu Zhao wrote:
> For pte pages, use pgtable_page_ctor(); for pmd pages, use
> pgtable_pmd_page_ctor() if the pmd level is not folded; and for the
> rest (pud, p4d and pgd), don't use any ctor.

pgtable_page_ctor()/dtor() is not optional for page table pages at
any level, as it determines the struct page state and zone statistics.
We should not skip it for any page table page. As stated before,
pgtable_pmd_page_ctor() is not a replacement for pgtable_page_ctor().

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-19  4:21   ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
@ 2019-02-19  5:32     ` Yu Zhao
  2019-02-19  6:17       ` Anshuman Khandual
  2019-02-20 21:03       ` Matthew Wilcox
  0 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2019-02-19  5:32 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Mark Rutland, Ard Biesheuvel, Chintan Pandya, Jun Yao,
	Laura Abbott, linux-arm-kernel, linux-kernel, linux-arch,
	linux-mm

On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> 
> 
> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> > For pte page, use pgtable_page_ctor(); for pmd page, use
> > pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> > p4d and pgd), don't use any.
> pgtable_page_ctor()/dtor() is not optional for any level page table page
> as it determines the struct page state and zone statistics.

This is not true. pgtable_page_ctor() is only meant for user pte
page. The name isn't perfect (we named it this way before we had
split pmd page table lock, and never bothered to change it).

The commit cccd843f54be ("mm: mark pages in use for page tables")
clearly states so:
  Note that only pages currently accounted as NR_PAGETABLES are
  tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.

I'm sure if we go back further, we can find similar stories: we
don't set PageTable on page tables other than pte; and we don't
account page tables other than pte. I don't have any objection if
you want to change these two. But please make sure they are consistent
across all archs.

> We should not skip it for any page table page.

In fact, calling it on pmd/pud/p4d is peculiar, and may even be
considered wrong. AFAIK, no other arch does so.

> As stated before pgtable_pmd_page_ctor() is not a replacement for
> pgtable_page_ctor().

pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
it's okay to use pgtable_page_ctor() instead only because kernel
doesn't have thp.
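Here is a userspace model of what the two constructors do in kernels of this period, based on my reading of the v5.0-era code. pgtable_page_ctor() sets up the split lock, marks the page PageTable, and bumps NR_PAGETABLE. pgtable_pmd_page_ctor() clears pmd_huge_pte (with THP) and sets up the lock, but does no PageTable marking or accounting. The model_* names and the thp flag (standing in for CONFIG_TRANSPARENT_HUGEPAGE) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool page_table;     /* PageTable marking */
	bool ptl_ready;      /* split page table lock initialised */
	void *pmd_huge_pte;  /* deposited THP page table, pmd pages only */
};

static long model_nr_pagetable;  /* NR_PAGETABLE stand-in */

/* Model of pgtable_page_ctor(): lock + PageTable + accounting. */
static bool model_pte_ctor(struct model_page *page)
{
	page->ptl_ready = true;
	page->page_table = true;
	model_nr_pagetable++;
	return true;
}

/* Model of pgtable_pmd_page_ctor(): THP field reset + lock, nothing else. */
static bool model_pmd_ctor(struct model_page *page, bool thp)
{
	if (thp)
		page->pmd_huge_pte = NULL;
	page->ptl_ready = true;
	return true;
}
```

Neither ctor is a superset of the other. One does accounting and marking; the other does the THP-specific initialisation.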

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-19  5:32     ` Yu Zhao
@ 2019-02-19  6:17       ` Anshuman Khandual
  2019-02-19 22:28         ` Yu Zhao
  2019-02-20  1:34         ` Matthew Wilcox
  2019-02-20 21:03       ` Matthew Wilcox
  1 sibling, 2 replies; 38+ messages in thread
From: Anshuman Khandual @ 2019-02-19  6:17 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Mark Rutland, Ard Biesheuvel, Chintan Pandya, Jun Yao,
	Laura Abbott, linux-arm-kernel, linux-kernel, linux-arch,
	linux-mm, Matthew Wilcox

+ Matthew Wilcox

On 02/19/2019 11:02 AM, Yu Zhao wrote:
> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
>>> For pte page, use pgtable_page_ctor(); for pmd page, use
>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
>>> p4d and pgd), don't use any.
>> pgtable_page_ctor()/dtor() is not optional for any level page table page
>> as it determines the struct page state and zone statistics.
> 
> This is not true. pgtable_page_ctor() is only meant for user pte
> page. The name isn't perfect (we named it this way before we had
> split pmd page table lock, and never bothered to change it).
> 
> The commit cccd843f54be ("mm: mark pages in use for page tables")
> clearly states so:
>   Note that only pages currently accounted as NR_PAGETABLES are
>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.

I think the commit is the following one and it does say so. But what is
the rationale of tagging only PTE page as PageTable and updating the zone
stat but not doing so for higher level page table pages? Aren't they
used as page table pages? Shouldn't they count towards NR_PAGETABLE?

1d40a5ea01d53251c ("mm: mark pages in use for page tables")
> 
> I'm sure if we go back further, we can find similar stories: we
> don't set PageTable on page tables other than pte; and we don't
> account page tables other than pte. I don't have any objection if
> you want change these two. But please make sure they are consistent
> across all archs.

pgtable_page_ctor/dtor() use across archs is not consistent and there is a need
for generalization which has already been acknowledged earlier. But for now we
can at least fix this on arm64.

https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/

> 
>> We should not skip it for any page table page.
> 
> In fact, calling it on pmd/pud/p4d is peculiar, and may even be
> considered wrong. AFAIK, no other arch does so.

Why would it be considered wrong ? IIUC archs have their own understanding
of this and there are different implementations. But doing something for
PTE page and skipping for others is plain inconsistent.

> 
>> As stated before pgtable_pmd_page_ctor() is not a replacement for
>> pgtable_page_ctor().
> 
> pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
> it's okay to use pgtable_page_ctor() instead only because kernel
> doesn't have thp.

The only extra thing to be done for THP is initializing page->pmd_huge_pte
apart from calling pgtable_page_ctor(). Right now it just works on arm64,
maybe because page->pmd_huge_pte never gets accessed before it's initialized
and no path checks for it when not THP. It's better to init/reset pmd_huge_pte.
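Here is a sketch of the hazard being described. It rests on an assumption not stated in the thread: pmd_huge_pte shares storage in struct page with fields that earlier users of the page leave behind, so a freshly allocated page need not have it NULL. The free-side check mirrors the VM_BUG_ON_PAGE(page->pmd_huge_pte, page) check proposed in this thread. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	void *pmd_huge_pte;  /* may hold leftovers from the page's previous use */
};

/* Hypothetical pmd ctor that only relies on the pte ctor. */
static void model_pmd_ctor_no_reset(struct model_page *page)
{
	(void)page;                 /* field left as the allocator handed it over */
}

/* Hypothetical pmd ctor that also resets the THP field. */
static void model_pmd_ctor_with_reset(struct model_page *page)
{
	page->pmd_huge_pte = NULL;
}

/* A pmd page must not be freed while it appears to hold a deposit. */
static bool model_pmd_free_ok(const struct model_page *page)
{
	return page->pmd_huge_pte == NULL;
}
```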

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-19  6:17       ` Anshuman Khandual
@ 2019-02-19 22:28         ` Yu Zhao
  2019-02-20 10:27           ` Anshuman Khandual
  2019-02-20  1:34         ` Matthew Wilcox
  1 sibling, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-02-19 22:28 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Mark Rutland, Ard Biesheuvel, Chintan Pandya, Jun Yao,
	Laura Abbott, linux-arm-kernel, linux-kernel, linux-arch,
	linux-mm, Matthew Wilcox

On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> + Matthew Wilcox
> 
> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >>
> >>
> >> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>> p4d and pgd), don't use any.
> >> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >> as it determines the struct page state and zone statistics.
> > 
> > This is not true. pgtable_page_ctor() is only meant for user pte
> > page. The name isn't perfect (we named it this way before we had
> > split pmd page table lock, and never bothered to change it).
> > 
> > The commit cccd843f54be ("mm: mark pages in use for page tables")
> > clearly states so:
> >   Note that only pages currently accounted as NR_PAGETABLES are
> >   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
> 
> I think the commit is the following one and it does say so. But what is
> the rationale of tagging only PTE page as PageTable and updating the zone
> stat but not doing so for higher level page table pages? Aren't they
> used as page table pages? Shouldn't they count towards NR_PAGETABLE?
> 
> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")

Well, I was just trying to clarify how the ctor is meant to be used.
The rationale behind it is probably another topic.

For starters, the number of pmd/pud/p4d/pgd is at least two orders
of magnitude less than the number of pte, which makes them almost
negligible. And some archs use kmem for them, so it's infeasible to
SetPageTable on or account them in the way the ctor does on those
archs.

But, as I said, it's not something that can't be changed. It's just not
the concern of this patch.
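Here are rough numbers behind the "two orders of magnitude" remark. This sketch assumes arm64's 4 KiB granule, with 512 eight-byte entries per table, so a pte page maps 2 MiB, a pmd page 1 GiB, and a pud page 512 GiB. It also assumes one dense, aligned mapping. Sparse mappings narrow the ratio: a single isolated page still needs one table page at every level.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB granule, 512 eight-byte entries per table. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_TABLE_BITS 9
#define MODEL_PTE_SPAN (1ULL << (MODEL_PAGE_SHIFT + MODEL_TABLE_BITS)) /* 2 MiB */
#define MODEL_PMD_SPAN (MODEL_PTE_SPAN << MODEL_TABLE_BITS)              /* 1 GiB */
#define MODEL_PUD_SPAN (MODEL_PMD_SPAN << MODEL_TABLE_BITS)              /* 512 GiB */

static uint64_t model_div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Table pages needed per level to map one dense region that starts on a
 * MODEL_PUD_SPAN boundary (so no table is shared with another mapping). */
static uint64_t model_pte_pages(uint64_t bytes)
{
	return model_div_round_up(bytes, MODEL_PTE_SPAN);
}

static uint64_t model_pmd_pages(uint64_t bytes)
{
	return model_div_round_up(bytes, MODEL_PMD_SPAN);
}

static uint64_t model_pud_pages(uint64_t bytes)
{
	return model_div_round_up(bytes, MODEL_PUD_SPAN);
}
```

For an 8 GiB dense region this gives 4096 pte pages against 8 pmd pages and 1 pud page.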

> > 
> > I'm sure if we go back further, we can find similar stories: we
> > don't set PageTable on page tables other than pte; and we don't
> > account page tables other than pte. I don't have any objection if
> > you want change these two. But please make sure they are consistent
> > across all archs.
> 
> pgtable_page_ctor/dtor() use across archs is not consistent and there is a need
> for generalization which has already been acknowledged earlier. But for now we
> can at least fix this on arm64.
> 
> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/

This is again not true. Please stop making claims not backed up by
facts. And the link is completely irrelevant to the ctor.

I just checked *all* arches. Only four arches call the ctor outside
pte_alloc_one(). They are arm, arm64, ppc and s390. The last two do
so not because they want to SetPageTable on or account pmd/pud/p4d/
pgd, but because they have to work around something, as arm/arm64
do.

> 
> > 
> >> We should not skip it for any page table page.
> > 
> > In fact, calling it on pmd/pud/p4d is peculiar, and may even be
> > considered wrong. AFAIK, no other arch does so.
> 
> Why would it be considered wrong ? IIUC archs have their own understanding
> of this and there are different implementations. But doing something for
> PTE page and skipping for others is plain inconsistent.

Allocating memory that will never be used is wrong. Please look into
the ctor and find out what exactly it does under different configs.

And why did I say "may"? Because we know there is only a negligible number
of pmd/pud/p4d pages, so the memory allocated may be considered negligible
as well.
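Here is a model of the configuration dependence mentioned above, as I understand the code of this period. When spinlock_t fits in a long, the split lock lives inside struct page and ptlock_init() allocates nothing. With ALLOC_SPLIT_PTLOCKS (spinlock_t larger than a long, e.g. when lock debugging is enabled), ptlock_alloc() takes the lock from a separate slab cache. The model only tracks those out-of-line bytes. The lock size and all model_* names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed oversized lock, as with lock debugging enabled. */
struct model_spinlock { unsigned long words[4]; };

struct model_page {
	unsigned long ptl_word;          /* inline lock word (small spinlock_t) */
	struct model_spinlock *ptl_ptr;  /* out-of-line lock (ALLOC_SPLIT_PTLOCKS) */
};

static size_t model_extra_bytes;  /* memory allocated outside struct page */

static bool model_ptlock_init(struct model_page *page, bool alloc_split_ptlocks)
{
	if (!alloc_split_ptlocks) {
		page->ptl_word = 0;  /* lock lives in struct page: no allocation */
		return true;
	}
	page->ptl_ptr = calloc(1, sizeof(*page->ptl_ptr));
	if (!page->ptl_ptr)
		return false;
	model_extra_bytes += sizeof(*page->ptl_ptr);
	return true;
}

static void model_ptlock_free(struct model_page *page, bool alloc_split_ptlocks)
{
	if (alloc_split_ptlocks && page->ptl_ptr) {
		model_extra_bytes -= sizeof(*page->ptl_ptr);
		free(page->ptl_ptr);
		page->ptl_ptr = NULL;
	}
}
```

Only the ALLOC_SPLIT_PTLOCKS case costs extra memory per table page. That is the memory that goes unused if the table's lock is never taken.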

> 
> > 
> >> As stated before pgtable_pmd_page_ctor() is not a replacement for
> >> pgtable_page_ctor().
> > 
> > pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
> > it's okay to use pgtable_page_ctor() instead only because kernel
> > doesn't have thp.
> 
> The only extra thing to be done for THP is initializing page->pmd_huge_pte
> apart from calling pgtable_page_ctor(). Right now it just works on arm64,
> maybe because page->pmd_huge_pte never gets accessed before it's initialized
> and no path checks for it when not THP. It's better to init/reset pmd_huge_pte.

This is not the reason. Arm64 gets by with calling
pgtable_page_ctor() on pmd because it only does so on efi_mm. efi_mm
is not user mm, therefore doesn't involve thp.
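The lock granularity this series is about can be sketched as a model of the lock lookup for a pmd entry. With split pmd locks, each pmd page carries its own lock, which is what pmd_lockptr() resolves to in that mode. Otherwise every pmd in the mm shares mm->page_table_lock. The model also shows why a user pmd page has to go through a pmd ctor before that lock is used. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_lock { bool initialized; };
struct model_pmd_page { struct model_lock ptl; };   /* per-page lock */
struct model_mm { struct model_lock page_table_lock; };

/* Model of the lock lookup for a pmd entry living in pmd_page. */
static struct model_lock *model_pmd_lockptr(struct model_mm *mm,
					    struct model_pmd_page *pmd_page,
					    bool split_pmd_ptlocks)
{
	return split_pmd_ptlocks ? &pmd_page->ptl : &mm->page_table_lock;
}

/* Only a page that went through a pmd ctor has a usable split lock. */
static void model_pmd_ctor(struct model_pmd_page *pmd_page)
{
	pmd_page->ptl.initialized = true;
}
```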

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-19  6:17       ` Anshuman Khandual
  2019-02-19 22:28         ` Yu Zhao
@ 2019-02-20  1:34         ` Matthew Wilcox
  2019-02-20  3:20           ` Anshuman Khandual
  1 sibling, 1 reply; 38+ messages in thread
From: Matthew Wilcox @ 2019-02-20  1:34 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Yu Zhao, Catalin Marinas, Will Deacon, Aneesh Kumar K . V,
	Andrew Morton, Nick Piggin, Peter Zijlstra, Joel Fernandes,
	Kirill A . Shutemov, Mark Rutland, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm

On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> + Matthew Wilcox
> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >>
> >>
> >> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>> p4d and pgd), don't use any.
> >> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >> as it determines the struct page state and zone statistics.
> > 
> > This is not true. pgtable_page_ctor() is only meant for user pte
> > page. The name isn't perfect (we named it this way before we had
> > split pmd page table lock, and never bothered to change it).
> > 
> > The commit cccd843f54be ("mm: mark pages in use for page tables")

Where did you get that commit ID from?  In Linus' tree, it's
1d40a5ea01d53251c23c7be541d3f4a656cfc537

> > clearly states so:
> >   Note that only pages currently accounted as NR_PAGETABLES are
> >   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
> 
> I think the commit is the following one and it does say so. But what is
> the rationale of tagging only PTE page as PageTable and updating the zone
> stat but not doing so for higher level page table pages? Aren't they
> used as page table pages? Shouldn't they count towards NR_PAGETABLE?
> 
> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")

I think they should all be accounted towards NR_PAGETABLE and marked
as being PageTable.  Somebody needs to make the case for that and
send the patches.  That patch even says that there should be follow-up
patches to do that.  I've been a little busy and haven't got back to it.
I thought you said you were going to do it.

> pgtable_page_ctor/dtor() use across archs is not consistent and there is a need
> for generalization which has already been acknowledged earlier. But for now we
> can at least fix this on arm64.
> 
> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/

... were you not listening when you were told that was completely
inadequate?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-20  1:34         ` Matthew Wilcox
@ 2019-02-20  3:20           ` Anshuman Khandual
  0 siblings, 0 replies; 38+ messages in thread
From: Anshuman Khandual @ 2019-02-20  3:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yu Zhao, Catalin Marinas, Will Deacon, Aneesh Kumar K . V,
	Andrew Morton, Nick Piggin, Peter Zijlstra, Joel Fernandes,
	Kirill A . Shutemov, Mark Rutland, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm



On 02/20/2019 07:04 AM, Matthew Wilcox wrote:
> On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
>> + Matthew Wilcox
>> On 02/19/2019 11:02 AM, Yu Zhao wrote:
>>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
>>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
>>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
>>>>> p4d and pgd), don't use any.
>>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
>>>> as it determines the struct page state and zone statistics.
>>>
>>> This is not true. pgtable_page_ctor() is only meant for user pte
>>> page. The name isn't perfect (we named it this way before we had
>>> split pmd page table lock, and never bothered to change it).
>>>
>>> The commit cccd843f54be ("mm: mark pages in use for page tables")
> 
> Where did you get that commit ID from?  In Linus' tree, it's
> 1d40a5ea01d53251c23c7be541d3f4a656cfc537
> 
>>> clearly states so:
>>>   Note that only pages currently accounted as NR_PAGETABLES are
>>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
>>
>> I think the commit is the following one and it does say so. But what is
>> the rationale of tagging only PTE page as PageTable and updating the zone
>> stat but not doing so for higher level page table pages? Aren't they
>> used as page table pages? Shouldn't they count towards NR_PAGETABLE?
>>
>> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
> 
> I think they should all be accounted towards NR_PAGETABLE and marked
> as being PageTable.  Somebody needs to make the case for that and

Okay so we agree on the applicability part.

> send the patches.  That patch even says that there should be follow-up
> patches to do that.  I've been a little busy and haven't got back to it.
> I thought you said you were going to do it.

This is very much arch specific. pgtable_page_ctor()/dtor() are not uniformly
called for all page table level allocations (user or kernel) across different
archs. Yes, I am planning to make generic page table allocation functions for
all levels which archs can choose to use. But for now I have a series to fix
the situation on arm64.

> 
>> pgtable_page_ctor/dtor() use across archs is not consistent and there is a need
>> for generalization which has already been acknowledged earlier. But for now we
>> can at least fix this on arm64.
>>
>> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/
> 
> ... were you not listening when you were told that was completely
> inadequate?

Agreed. The discussion on the thread made it clear that the above patch was
inadequate. What I was trying to point out (probably not very clearly) is that
there is a need for larger generalization/consolidation on the page table page
allocation front, including but not necessarily limited to an allocation flag
for user/kernel page tables, standard allocation functions etc. The very idea of
quoting the above URL here was to bring attention to the fact that different
archs are doing these allocations differently already.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-19 22:28         ` Yu Zhao
@ 2019-02-20 10:27           ` Anshuman Khandual
  2019-02-20 12:24             ` Matthew Wilcox
  2019-02-20 20:22             ` Yu Zhao
  0 siblings, 2 replies; 38+ messages in thread
From: Anshuman Khandual @ 2019-02-20 10:27 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Mark Rutland, Ard Biesheuvel, Chintan Pandya, Jun Yao,
	Laura Abbott, linux-arm-kernel, linux-kernel, linux-arch,
	linux-mm, Matthew Wilcox



On 02/20/2019 03:58 AM, Yu Zhao wrote:
> On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
>> + Matthew Wilcox
>>
>> On 02/19/2019 11:02 AM, Yu Zhao wrote:
>>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
>>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
>>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
>>>>> p4d and pgd), don't use any.
>>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
>>>> as it determines the struct page state and zone statistics.
>>>
>>> This is not true. pgtable_page_ctor() is only meant for user pte
>>> page. The name isn't perfect (we named it this way before we had
>>> split pmd page table lock, and never bothered to change it).
>>>
>>> The commit cccd843f54be ("mm: mark pages in use for page tables")
>>> clearly states so:
>>>   Note that only pages currently accounted as NR_PAGETABLES are
>>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
>>
>> I think the commit is the following one and it does say so. But what is
>> the rationale of tagging only PTE page as PageTable and updating the zone
>> stat but not doing so for higher level page table pages? Aren't they
>> used as page table pages? Shouldn't they count towards NR_PAGETABLE?
>>
>> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
> 
> Well, I was just trying to clarify how the ctor is meant to be used.
> The rationale behind it is probably another topic.
> 
> For starters, the number of pmd/pud/p4d/pgd is at least two orders
> of magnitude less than the number of pte, which makes them almost
> negligible. And some archs use kmem for them, so it's infeasible to
> SetPageTable on or account them in the way the ctor does on those
> archs.
> 

I understand the kmem cases which are definitely problematic and should
be fixed. IIRC there is a mechanism to custom init pages allocated for
slab cache with a ctor function which in turn can call pgtable_page_ctor().
But destructor helper support for slab has been dropped I guess.


> But, as I said, it's not something that can't be changed. It's just not
> the concern of this patch.

Using pgtable_pmd_page_ctor() during PMD level pgtable page allocation
as suggested in the patch breaks pmd_alloc_one() changes as per the
previous proposal. Hence we all would need some agreement here.

https://www.spinics.net/lists/arm-kernel/msg701960.html

We can still accommodate the split PMD ptlock feature in pmd_alloc_one().
A possible solution, on top of the previous series, could look like this:

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..c02abb2a69f7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -9,6 +9,7 @@ config ARM64
        select ACPI_SPCR_TABLE if ACPI
        select ACPI_PPTT if ACPI
        select ARCH_CLOCKSOURCE_DATA
+       select ARCH_ENABLE_SPLIT_PMD_PTLOCK if HAVE_ARCH_TRANSPARENT_HUGEPAGE
        select ARCH_HAS_DEBUG_VIRTUAL
        select ARCH_HAS_DEVMEM_IS_ALLOWED
        select ARCH_HAS_DMA_COHERENT_TO_PFN
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index a02a4d1d967d..258e09fb3ce2 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -37,13 +37,29 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t pte);
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-       return (pmd_t *)pte_alloc_one_virt(mm);
+       pgtable_t ptr;
+
+       ptr = pte_alloc_one(mm);
+       if (!ptr)
+               return NULL;
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+       ptr->pmd_huge_pte = NULL;
+#endif
+       return (pmd_t *)page_to_virt(ptr);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
 {
+       struct page *page;
+
        BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
-       pte_free(mm, virt_to_page(pmdp));
+       page = virt_to_page(pmdp);
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+       VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
+#endif
+       pte_free(mm, page);
 }


> 
>>>
>>> I'm sure if we go back further, we can find similar stories: we
>>> don't set PageTable on page tables other than pte; and we don't
>>> account page tables other than pte. I don't have any objection if
>>> you want change these two. But please make sure they are consistent
>>> across all archs.
>>
>> pgtable_page_ctor/dtor() use across archs is not consistent and there is a need
>> for generalization which has already been acknowledged earlier. But for now we
>> can at least fix this on arm64.
>>
>> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/
> 
> This is again not true. Please stop making claims not backed up by
> facts. And the link is completely irrelevant to the ctor.
> 
> I just checked *all* arches. Only four arches call the ctor outside
> pte_alloc_one(). They are arm, arm64, ppc and s390. The last two do
> so not because they want to SetPageTable on or account pmd/pud/p4d/
> pgd, but because they have to work around something, as arm/arm64
> do.

That reaffirms the fact that pgtable_page_ctor()/dtor() are not being used
in a consistent manner.

> 
>>
>>>
>>>> We should not skip it for any page table page.
>>>
>>> In fact, calling it on pmd/pud/p4d is peculiar, and may even be
>>> considered wrong. AFAIK, no other arch does so.
>>
>> Why would it be considered wrong ? IIUC archs have their own understanding
>> of this and there are different implementations. But doing something for
>> PTE page and skipping for others is plain inconsistent.
> 
> Allocating memory that will never be used is wrong. Please look into
> the ctor and find out what exactly it does under different configs.

Are you referring to the spinlock_t allocations triggered via ptlock_init() -->
ptlock_alloc() with USE_SPLIT_PTE_PTLOCKS and ALLOC_SPLIT_PTLOCKS?

> 
> And why did I say "may"? Because we know there is only a negligible number
> of pmd/pud/p4d pages, so the memory allocated may be considered negligible
> as well.

Okay.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-20 10:27           ` Anshuman Khandual
@ 2019-02-20 12:24             ` Matthew Wilcox
  2019-02-20 20:22             ` Yu Zhao
  1 sibling, 0 replies; 38+ messages in thread
From: Matthew Wilcox @ 2019-02-20 12:24 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Yu Zhao, Catalin Marinas, Will Deacon, Aneesh Kumar K . V,
	Andrew Morton, Nick Piggin, Peter Zijlstra, Joel Fernandes,
	Kirill A . Shutemov, Mark Rutland, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm

On Wed, Feb 20, 2019 at 03:57:59PM +0530, Anshuman Khandual wrote:
> On 02/20/2019 03:58 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> >> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> >>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>>>> p4d and pgd), don't use any.
> >>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >>>> as it determines the struct page state and zone statistics.
> >>>
> >>> This is not true. pgtable_page_ctor() is only meant for user pte
> >>> page. The name isn't perfect (we named it this way before we had
> >>> split pmd page table lock, and never bothered to change it).
> >>>
> >>> The commit cccd843f54be ("mm: mark pages in use for page tables")
> >>> clearly states so:
> >>>   Note that only pages currently accounted as NR_PAGETABLES are
> >>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
> >>
> >> I think the commit is the following one and it does say so. But what is
> >> the rationale of tagging only PTE page as PageTable and updating the zone
> >> stat but not doing so for higher level page table pages? Aren't they
> >> used as page table pages? Shouldn't they count towards NR_PAGETABLE?
> >>
> >> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
> > 
> > Well, I was just trying to clarify how the ctor is meant to be used.
> > The rationale behind it is probably another topic.
> > 
> > For starters, the number of pmd/pud/p4d/pgd is at least two orders
> > of magnitude less than the number of pte, which makes them almost
> > negligible. And some archs use kmem for them, so it's infeasible to
> > SetPageTable on or account them in the way the ctor does on those
> > archs.
> 
> I understand the kmem cases which are definitely problematic and should
> be fixed. IIRC there is a mechanism to custom init pages allocated for
> slab cache with a ctor function which in turn can call pgtable_page_ctor().
> But destructor helper support for slab has been dropped I guess.

You can't put a spinlock in the struct page if the page is allocated
through slab.  Slab uses basically all of struct page for its own
purposes.  I tried to make that clear with the new layout of struct
page where everything's in a union discriminated by what the page is
allocated for.
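The layout point can be illustrated with a heavily simplified struct page. The field names and order below are illustrative, not the real struct page. The point is only that slab bookkeeping and the page-table lock word live in the same union. A page owned by slab therefore has no spare room for a split lock, and initialising one would overwrite slab state.

```c
#include <assert.h>
#include <stddef.h>

struct model_page {
	unsigned long flags;
	union {                          /* meaning depends on the page's owner */
		struct {                 /* owned by a slab cache */
			void *slab_cache;
			unsigned long counters;
		} slab;
		struct {                 /* used as a page table */
			void *pmd_huge_pte;
			unsigned long ptl;   /* inline split page table lock word */
		} pt;
	};
};

/* True when the page-table lock word would overwrite slab bookkeeping. */
static int model_ptl_clobbers_slab(void)
{
	return offsetof(struct model_page, pt.ptl) ==
	       offsetof(struct model_page, slab.counters);
}
```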


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-20 10:27           ` Anshuman Khandual
  2019-02-20 12:24             ` Matthew Wilcox
@ 2019-02-20 20:22             ` Yu Zhao
  2019-02-20 20:59               ` Matthew Wilcox
  1 sibling, 1 reply; 38+ messages in thread
From: Yu Zhao @ 2019-02-20 20:22 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Mark Rutland, Ard Biesheuvel, Chintan Pandya, Jun Yao,
	Laura Abbott, linux-arm-kernel, linux-kernel, linux-arch,
	linux-mm, Matthew Wilcox

On Wed, Feb 20, 2019 at 03:57:59PM +0530, Anshuman Khandual wrote:
> 
> 
> On 02/20/2019 03:58 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> >> + Matthew Wilcox
> >>
> >> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> >>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >>>>
> >>>>
> >>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>>>> p4d and pgd), don't use any.
> >>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >>>> as it determines the struct page state and zone statistics.
> >>>
> >>> This is not true. pgtable_page_ctor() is only meant for user pte
> >>> page. The name isn't perfect (we named it this way before we had
> >>> split pmd page table lock, and never bothered to change it).
> >>>
> >>> The commit cccd843f54be ("mm: mark pages in use for page tables")
> >>> clearly states so:
> >>>   Note that only pages currently accounted as NR_PAGETABLES are
> >>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
> >>
> >> I think the commit is the following one and it does say so. But what is
> >> the rationale of tagging only PTE page as PageTable and updating the zone
> >> stat but not doing so for higher level page table pages? Aren't they
> >> used as page table pages? Shouldn't they count towards NR_PAGETABLE?
> >>
> >> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
> > 
> > Well, I was just trying to clarify how the ctor is meant to be used.
> > The rationale behind it is probably another topic.
> > 
> > For starters, the number of pmd/pud/p4d/pgd is at least two orders
> > of magnitude less than the number of pte, which makes them almost
> > negligible. And some archs use kmem for them, so it's infeasible to
> > SetPageTable on or account them in the way the ctor does on those
> > archs.
> > 
> 
> I understand the kmem cases which are definitely problematic and should
> be fixed. IIRC there is a mechanism to custom init pages allocated for
> slab cache with a ctor function which in turn can call pgtable_page_ctor().
> But destructor helper support for slab has been dropped I guess.
> 
> 
> > But, as I said, it's not something that can't be changed. It's just not
> > the concern of this patch.
> 
> Using pgtable_pmd_page_ctor() during PMD level pgtable page allocation
> as suggested in the patch breaks pmd_alloc_one() changes as per the
> previous proposal. Hence we all would need some agreement here.
> 
> https://www.spinics.net/lists/arm-kernel/msg701960.html

A proposal that requires all page tables to go through the same set of
ctors on all archs is not only inefficient (for kernel page tables)
but also infeasible (for arches that use kmem for page tables). I've
explained this clearly.

The generalized page table functions must recognize the differences
between levels and between user and kernel page tables, and provide
a unified API that is capable of handling those differences.
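One hypothetical shape for such an API is a policy function keyed by level and by user/kernel. The rules below only encode positions stated in this thread: split locks for user pte pages and, with split pmd locks, for user pmd pages; and a pmd_huge_pte reset only for user pmd pages with THP and split pmd locks. Matthew's suggestion of accounting every level is included as an explicit opt-in switch. The sketch assumes split pte locks are enabled, and all names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum model_level { MODEL_PGD, MODEL_P4D, MODEL_PUD, MODEL_PMD, MODEL_PTE };

struct model_config {
	bool split_pmd_ptlocks;   /* split pmd locks in effect */
	bool thp;                 /* transparent hugepages enabled */
	bool account_all_levels;  /* opt-in: also account pgd..pmd pages */
};

struct model_pt_policy {
	bool init_ptlock;    /* set up a split page table lock */
	bool init_huge_pte;  /* reset pmd_huge_pte */
	bool account;        /* mark PageTable and count in NR_PAGETABLE */
};

static struct model_pt_policy model_policy(enum model_level level, bool user,
					   struct model_config cfg)
{
	struct model_pt_policy p = { false, false, false };

	if (user && level == MODEL_PTE)
		p.init_ptlock = true;
	if (user && level == MODEL_PMD && cfg.split_pmd_ptlocks) {
		p.init_ptlock = true;
		p.init_huge_pte = cfg.thp;
	}
	/* Default mirrors what the thread describes: only user pte pages. */
	p.account = cfg.account_all_levels || (user && level == MODEL_PTE);
	return p;
}
```

Kernel tables get no lock under any setting here, which reflects the "don't allocate what is never used" argument above.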

The change below is not helping at all.

> 
> We can still accommodate the split PMD ptlock feature in pmd_alloc_one().
> A possible solution, on top of the previous series, could look like this:
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4168d366127..c02abb2a69f7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -9,6 +9,7 @@ config ARM64
>         select ACPI_SPCR_TABLE if ACPI
>         select ACPI_PPTT if ACPI
>         select ARCH_CLOCKSOURCE_DATA
> +       select ARCH_ENABLE_SPLIT_PMD_PTLOCK if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>         select ARCH_HAS_DEBUG_VIRTUAL
>         select ARCH_HAS_DEVMEM_IS_ALLOWED
>         select ARCH_HAS_DMA_COHERENT_TO_PFN
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index a02a4d1d967d..258e09fb3ce2 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -37,13 +37,29 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t pte);
>  
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -       return (pmd_t *)pte_alloc_one_virt(mm);
> +       pgtable_t ptr;
> +
> +       ptr = pte_alloc_one(mm);
> +       if (!ptr)
> +               return 0;
> +
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> +       ptr->pmd_huge_pte = NULL;
> +#endif
> +       return (pmd_t *)page_to_virt(ptr);
>  }
>  
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
>  {
> +       struct page *page;
> +
>         BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> -       pte_free(mm, virt_to_page(pmdp));
> +       page = virt_to_page(pmdp);
> +
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> +       VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> +#endif
> +       pte_free(mm, page);
>  }
> 
> 
> > 
> >>>
> >>> I'm sure if we go back further, we can find similar stories: we
> >>> don't set PageTable on page tables other than pte; and we don't
> >>> account page tables other than pte. I don't have any objection if
> >>> you want to change these two. But please make sure they are consistent
> >>> across all archs.
> >>
> >> pgtable_page_ctor/dtor() usage across archs is not consistent and there is a need
> >> for generalization which has already been acknowledged earlier. But for now we
> >> can at least fix this on arm64.
> >>
> >> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/
> > 
> > This is again not true. Please stop making claims not backed up by
> > facts. And the link is completely irrelevant to the ctor.
> > 
> > I just checked *all* arches. Only four arches call the ctor outside
> > pte_alloc_one(). They are arm, arm64, ppc and s390. The last two do
> > so not because they want to SetPageTable on or account pmd/pud/p4d/
> > pgd, but because they have to work around something, as arm/arm64
> > do.
> 
> That reaffirms the fact that pgtable_page_ctor()/dtor() are not being used
> in a consistent manner.

Now it's getting absurd. I'll just stop before this turns into
complete nonsense.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-20 20:22             ` Yu Zhao
@ 2019-02-20 20:59               ` Matthew Wilcox
  0 siblings, 0 replies; 38+ messages in thread
From: Matthew Wilcox @ 2019-02-20 20:59 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon,
	Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

On Wed, Feb 20, 2019 at 01:22:44PM -0700, Yu Zhao wrote:
> On Wed, Feb 20, 2019 at 03:57:59PM +0530, Anshuman Khandual wrote:
> > Using pgtable_pmd_page_ctor() during PMD level pgtable page allocation
> > as suggested in the patch breaks pmd_alloc_one() changes as per the
> > previous proposal. Hence we all would need some agreement here.
> > 
> > https://www.spinics.net/lists/arm-kernel/msg701960.html
> 
> A proposal that requires all page tables to go through the same set of
> ctors on all archs is not only inefficient (for kernel page tables)
> but also infeasible (for arches that use kmem for page tables). I've
> explained this clearly.
> 
> The generalized page table functions must recognize the differences
> on different levels and between user and kernel page tables, and
> provide unified api that is capable of handling the differences.

The two architectures I'm aware of (s390 and power) which use sub-page
allocations for page tables do so by allocating entire pages and then
implementing their own allocators.  It shouldn't be a huge problem to
use a ctor for the pages.  We can probably even implement a dtor for them.

Oh, another corner-case I've just remembered is x86-32's PAE with four
8-byte entries in the PGD.  That should also go away and be replaced
with a shared implementation of sub-page allocations which can also be
marked as PageTable.

Ideally PTEs, PMDs, etc, etc would all be accounted to the individual
processes causing them to be allocated.  This isn't really feasible
with the x86 PGD; by definition there's only one per process.  I'm OK
with failing to account this 32-byte allocation to the task though.
So maybe the pgd_cache can remain separate from the hypothetical unified
ppc/s390 code.
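To make the sub-page case concrete: the approach described above backs small tables with whole pages and hands out fragments, so a page-level ctor could still run once per backing page. The sketch below is a hypothetical fragment allocator (frag_page, frag_alloc and frag_free are made-up names) with an arbitrary 1 KiB fragment size; the real s390 and powerpc allocators differ in detail. For scale, the x86-32 PAE PGD mentioned above is 4 entries x 8 bytes = 32 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGE_SIZE      4096
#define SIM_FRAG_SIZE      1024	/* arbitrary for this sketch */
#define SIM_FRAGS_PER_PAGE (SIM_PAGE_SIZE / SIM_FRAG_SIZE)

/* One backing page carved into fixed-size table fragments. */
struct frag_page {
	unsigned char mem[SIM_PAGE_SIZE];
	unsigned int used;	/* bit i set: fragment i is handed out */
	bool ctor_done;		/* page-level ctor ran once for the page */
};

/* Runs once per backing page, not once per fragment. */
static void frag_page_ctor(struct frag_page *p)
{
	p->used = 0;
	p->ctor_done = true;
}

static void *frag_alloc(struct frag_page *p)
{
	for (unsigned int i = 0; i < SIM_FRAGS_PER_PAGE; i++) {
		if (!(p->used & (1u << i))) {
			p->used |= 1u << i;
			return p->mem + (size_t)i * SIM_FRAG_SIZE;
		}
	}
	return NULL;	/* full: a real allocator would grab a new page */
}

static void frag_free(struct frag_page *p, void *frag)
{
	size_t i = (size_t)((unsigned char *)frag - p->mem) / SIM_FRAG_SIZE;

	p->used &= ~(1u << i);
}

/* All fragments back: the page (and its ctor state) can be released. */
static bool frag_page_empty(const struct frag_page *p)
{
	return p->used == 0;
}
```

Because the unit handed to the page allocator is still a full page, marking it PageTable or accounting it happens in frag_page_ctor(), independent of how many fragments are later carved out.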


* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-19  5:32     ` Yu Zhao
  2019-02-19  6:17       ` Anshuman Khandual
@ 2019-02-20 21:03       ` Matthew Wilcox
  1 sibling, 0 replies; 38+ messages in thread
From: Matthew Wilcox @ 2019-02-20 21:03 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon,
	Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Mark Rutland,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

On Mon, Feb 18, 2019 at 10:32:05PM -0700, Yu Zhao wrote:
> pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
> it's okay to use pgtable_page_ctor() instead only because kernel
> doesn't have thp.

I'm not sure that's true.  I think you can create THPs in vmalloc
these days.  See HAVE_ARCH_HUGE_VMAP which is supported by arm64.

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
                     ` (2 preceding siblings ...)
  2019-02-19  4:21   ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
@ 2019-02-26 15:12   ` Mark Rutland
  2019-03-09  4:01     ` Yu Zhao
  2019-03-10  1:19   ` [PATCH v3 " Yu Zhao
  4 siblings, 1 reply; 38+ messages in thread
From: Mark Rutland @ 2019-02-26 15:12 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

Hi,

On Mon, Feb 18, 2019 at 04:13:17PM -0700, Yu Zhao wrote:
> For pte page, use pgtable_page_ctor(); for pmd page, use
> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> p4d and pgd), don't use any.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++------------
>  1 file changed, 21 insertions(+), 12 deletions(-)

[...]

> -static phys_addr_t pgd_pgtable_alloc(void)
> +static phys_addr_t pgd_pgtable_alloc(int shift)
>  {
>  	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> -	if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
> -		BUG();
> +	BUG_ON(!ptr);
> +
> +	/*
> +	 * Initialize page table locks in case later we need to
> +	 * call core mm functions like apply_to_page_range() on
> +	 * this pre-allocated page table.
> +	 */
> +	if (shift == PAGE_SHIFT)
> +		BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
> +	else if (shift == PMD_SHIFT && PMD_SHIFT != PUD_SHIFT)
> +		BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr)));

IIUC, this is for nopmd kernels, where we only have real PGD and PTE
levels of table. From my PoV, that would be clearer if we did:

	else if (shift == PMD_SHIFT && !is_defined(__PAGETABLE_PMD_FOLDED))

... though IMO it would be a bit nicer if the generic
pgtable_pmd_page_ctor() were nop'd out for __PAGETABLE_PMD_FOLDED
builds, so that callers don't have to be aware of folding.

I couldn't think of a nicer way of distinguishing levels of table, and
having separate function pointers for each level seems over-the-top, so
other than that, this looks good to me.

Assuming you're happy with the above change:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.
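A minimal sketch of the suggestion to nop out pgtable_pmd_page_ctor() for folded builds, using made-up names: SIM_PMD_FOLDED stands in for __PAGETABLE_PMD_FOLDED, and 12/21/30 are assumed to be the arm64 4K-page PAGE/PMD/PUD shifts. When the pmd ctor itself does nothing in folded builds, the shift-based dispatch needs no folding check, even though PMD_SHIFT == PUD_SHIFT there and a pud page therefore reaches the pmd branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for __PAGETABLE_PMD_FOLDED; build with -DSIM_PMD_FOLDED=1 to fold. */
#ifndef SIM_PMD_FOLDED
#define SIM_PMD_FOLDED 0
#endif

/* 4K-page shifts; when pmd is folded, PMD_SHIFT collapses onto PUD_SHIFT. */
#define SIM_PAGE_SHIFT 12
#define SIM_PUD_SHIFT  30
#define SIM_PMD_SHIFT  (SIM_PMD_FOLDED ? SIM_PUD_SHIFT : 21)

struct sim_page {
	bool pte_ctor_ran;
	bool pmd_ctor_ran;
};

static bool sim_pte_ctor(struct sim_page *p)
{
	p->pte_ctor_ran = true;
	return true;
}

/* Compiles to "return true" in folded builds, so callers need not care. */
static bool sim_pmd_ctor(struct sim_page *p)
{
	if (SIM_PMD_FOLDED)
		return true;
	p->pmd_ctor_ran = true;
	return true;
}

/* Shift-based dispatch in the style of pgd_pgtable_alloc(int shift). */
static bool sim_pgtable_ctor(struct sim_page *p, int shift)
{
	if (shift == SIM_PAGE_SHIFT)
		return sim_pte_ctor(p);
	if (shift == SIM_PMD_SHIFT)
		return sim_pmd_ctor(p);	/* a pud page lands here when folded */
	return true;			/* upper levels: nothing to set up */
}
```

The test below passes with either value of SIM_PMD_FOLDED: in neither build does a pud page end up with pmd ctor state.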

* Re: [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm
  2019-02-18 23:13   ` [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
@ 2019-02-26 15:13     ` Mark Rutland
  2019-03-09  3:52       ` Yu Zhao
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Rutland @ 2019-02-26 15:13 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

Hi,

On Mon, Feb 18, 2019 at 04:13:18PM -0700, Yu Zhao wrote:
> init_mm doesn't require page table lock to be initialized at
> any level. Add a separate page table allocator for it, and the
> new one skips page table ctors.

Just to check, in a previous reply you mentioned we need to call the
ctors for our efi_mm, since we use apply_to_page_range() on that. Is
that only because apply_to_pte_range() tries to take the ptl for non
init_mm?

... or did I miss something else?

> The ctors allocate memory when ALLOC_SPLIT_PTLOCKS is set. Not
> calling them avoids a memory leak in case we call pte_free_kernel()
> on init_mm.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Assuming that was all, this patch makes sense to me. FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
>  arch/arm64/mm/mmu.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index fa7351877af3..e8bf8a6300e8 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -370,6 +370,16 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>  	} while (pgdp++, addr = next, addr != end);
>  }
>  
> +static phys_addr_t pgd_kernel_pgtable_alloc(int shift)
> +{
> +	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> +	BUG_ON(!ptr);
> +
> +	/* Ensure the zeroed page is visible to the page table walker */
> +	dsb(ishst);
> +	return __pa(ptr);
> +}
> +
>  static phys_addr_t pgd_pgtable_alloc(int shift)
>  {
>  	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> @@ -591,7 +601,7 @@ static int __init map_entry_trampoline(void)
>  	/* Map only the text into the trampoline page table */
>  	memset(tramp_pg_dir, 0, PGD_SIZE);
>  	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, PAGE_SIZE,
> -			     prot, pgd_pgtable_alloc, 0);
> +			     prot, pgd_kernel_pgtable_alloc, 0);
>  
>  	/* Map both the text and data into the kernel page table */
>  	__set_fixmap(FIX_ENTRY_TRAMP_TEXT, pa_start, prot);
> @@ -1067,7 +1077,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>  		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>  
>  	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> -			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
> +			     size, PAGE_KERNEL, pgd_kernel_pgtable_alloc,
> +			     flags);
>  
>  	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>  			   altmap, want_memblock);
> -- 
> 2.21.0.rc0.258.g878e2cd30e-goog
> 
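To see why skipping the ctors for init_mm avoids the leak described in this patch, here is a small model. It assumes, as the commit message says, that with ALLOC_SPLIT_PTLOCKS the ctor allocates the lock out of line, and that the kernel-table free path (modelled on pte_free_kernel()) releases the page without calling the dtor. All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Out-of-line lock, as when ALLOC_SPLIT_PTLOCKS is set. */
struct sim_lock { int locked; };

struct sim_page {
	struct sim_lock *ptl;	/* NULL unless the ctor ran */
};

static int sim_live_locks;	/* locks allocated and not yet freed */

static bool sim_ctor(struct sim_page *p)
{
	p->ptl = calloc(1, sizeof(*p->ptl));
	if (!p->ptl)
		return false;
	sim_live_locks++;
	return true;
}

static void sim_dtor(struct sim_page *p)
{
	if (!p->ptl)
		return;
	free(p->ptl);
	p->ptl = NULL;
	sim_live_locks--;
}

/* User path (pte_free()-like): run the dtor, then release the page. */
static void sim_free_user(struct sim_page *p)
{
	sim_dtor(p);
}

/* Kernel path (pte_free_kernel()-like): releases the page, no dtor.
 * In the kernel the page is recycled and the lock pointer is lost. */
static void sim_free_kernel(struct sim_page *p)
{
	(void)p;
}
```

A page that never had its ctor called (as with pgd_kernel_pgtable_alloc()) has nothing to leak when it goes through the kernel free path.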

* Re: [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm
  2019-02-26 15:13     ` Mark Rutland
@ 2019-03-09  3:52       ` Yu Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-09  3:52 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

On Tue, Feb 26, 2019 at 03:13:07PM +0000, Mark Rutland wrote:
> Hi,
> 
> On Mon, Feb 18, 2019 at 04:13:18PM -0700, Yu Zhao wrote:
> > init_mm doesn't require page table lock to be initialized at
> > any level. Add a separate page table allocator for it, and the
> > new one skips page table ctors.
> 
> Just to check, in a previous reply you mentioned we need to call the
> ctors for our efi_mm, since we use apply_to_page_range() on that. Is
> that only because apply_to_pte_range() tries to take the ptl for non
> init_mm?

Precisely.
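For readers following along: apply_to_pte_range() in mm/memory.c at the time used pte_alloc_kernel() for init_mm and pte_alloc_map_lock() for any other mm, so only non-init_mm tables (efi_mm included) have their per-page ptl taken. The sketch below models just that branch, with hypothetical sim_* names.

```c
#include <assert.h>
#include <stddef.h>

struct sim_lock { int held; };
struct sim_mm { const char *name; };
struct sim_pte_page { struct sim_lock *ptl; };	/* set up by the ctor */

static struct sim_mm sim_init_mm = { "init_mm" };

/*
 * Models only the lock decision of apply_to_pte_range(): init_mm takes
 * no ptl, so its tables never need the ctor; every other mm (efi_mm
 * included) takes the per-page ptl, which must have been initialised.
 * Returns the lock that was taken, or NULL if none was.
 */
static struct sim_lock *sim_apply_pte_page(struct sim_mm *mm,
					   struct sim_pte_page *pt)
{
	if (mm == &sim_init_mm)
		return NULL;

	assert(pt->ptl != NULL);	/* ctor must have run */
	pt->ptl->held = 1;
	/* ... apply the callback to each pte in the table ... */
	pt->ptl->held = 0;
	return pt->ptl;
}
```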

> ... or did I miss something else?
> 
> > The ctors allocate memory when ALLOC_SPLIT_PTLOCKS is set. Not
> > calling them avoids a memory leak in case we call pte_free_kernel()
> > on init_mm.
> > 
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> 
> Assuming that was all, this patch makes sense to me. FWIW:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks.

> Thanks,
> Mark.
> 
> > ---
> >  arch/arm64/mm/mmu.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index fa7351877af3..e8bf8a6300e8 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -370,6 +370,16 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> >  	} while (pgdp++, addr = next, addr != end);
> >  }
> >  
> > +static phys_addr_t pgd_kernel_pgtable_alloc(int shift)
> > +{
> > +	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> > +	BUG_ON(!ptr);
> > +
> > +	/* Ensure the zeroed page is visible to the page table walker */
> > +	dsb(ishst);
> > +	return __pa(ptr);
> > +}
> > +
> >  static phys_addr_t pgd_pgtable_alloc(int shift)
> >  {
> >  	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> > @@ -591,7 +601,7 @@ static int __init map_entry_trampoline(void)
> >  	/* Map only the text into the trampoline page table */
> >  	memset(tramp_pg_dir, 0, PGD_SIZE);
> >  	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, PAGE_SIZE,
> > -			     prot, pgd_pgtable_alloc, 0);
> > +			     prot, pgd_kernel_pgtable_alloc, 0);
> >  
> >  	/* Map both the text and data into the kernel page table */
> >  	__set_fixmap(FIX_ENTRY_TRAMP_TEXT, pa_start, prot);
> > @@ -1067,7 +1077,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
> >  		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >  
> >  	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> > -			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
> > +			     size, PAGE_KERNEL, pgd_kernel_pgtable_alloc,
> > +			     flags);
> >  
> >  	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
> >  			   altmap, want_memblock);
> > -- 
> > 2.21.0.rc0.258.g878e2cd30e-goog
> > 

* Re: [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-26 15:12   ` Mark Rutland
@ 2019-03-09  4:01     ` Yu Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-09  4:01 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

On Tue, Feb 26, 2019 at 03:12:31PM +0000, Mark Rutland wrote:
> Hi,
> 
> On Mon, Feb 18, 2019 at 04:13:17PM -0700, Yu Zhao wrote:
> > For pte page, use pgtable_page_ctor(); for pmd page, use
> > pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> > p4d and pgd), don't use any.
> > 
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >  arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++------------
> >  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> [...]
> 
> > -static phys_addr_t pgd_pgtable_alloc(void)
> > +static phys_addr_t pgd_pgtable_alloc(int shift)
> >  {
> >  	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> > -	if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
> > -		BUG();
> > +	BUG_ON(!ptr);
> > +
> > +	/*
> > +	 * Initialize page table locks in case later we need to
> > +	 * call core mm functions like apply_to_page_range() on
> > +	 * this pre-allocated page table.
> > +	 */
> > +	if (shift == PAGE_SHIFT)
> > +		BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
> > +	else if (shift == PMD_SHIFT && PMD_SHIFT != PUD_SHIFT)
> > +		BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr)));
> 
> IIUC, this is for nopmd kernels, where we only have real PGD and PTE
> levels of table. From my PoV, that would be clearer if we did:
> 
> 	else if (shift == PMD_SHIFT && !is_defined(__PAGETABLE_PMD_FOLDED))
> 
> ... though IMO it would be a bit nicer if the generic
> pgtable_pmd_page_ctor() were nop'd out for __PAGETABLE_PMD_FOLDED
> builds, so that callers don't have to be aware of folding.

Agreed. Will make pgtable_pmd_page_ctor() nop when pmd is folded.

> I couldn't think of a nicer way of distinguishing levels of table, and
> having separate function pointers for each level seems over-the-top, so
> other than that, this looks good to me.
> 
> Assuming you're happy with the above change:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> Thanks,
> Mark.

* [PATCH v3 1/3] arm64: mm: use appropriate ctors for page tables
  2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
                     ` (3 preceding siblings ...)
  2019-02-26 15:12   ` Mark Rutland
@ 2019-03-10  1:19   ` Yu Zhao
  2019-03-10  1:19     ` [PATCH v3 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
                       ` (3 more replies)
  4 siblings, 4 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-10  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

For pte page, use pgtable_page_ctor(); for pmd page, use
pgtable_pmd_page_ctor(); and for the rest (pud, p4d and pgd),
don't use any.

For now, we don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK and
pgtable_pmd_page_ctor() is a nop. When we do so in patch 3, we
make sure pmd is not folded so we won't mistakenly call
pgtable_pmd_page_ctor() on pud or p4d.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/mm/mmu.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b6f5aa52ac67..f704b291f2c5 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -98,7 +98,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(void)
+static phys_addr_t __init early_pgtable_alloc(int shift)
 {
 	phys_addr_t phys;
 	void *ptr;
@@ -173,7 +173,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void),
+				phys_addr_t (*pgtable_alloc)(int),
 				int flags)
 {
 	unsigned long next;
@@ -183,7 +183,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	if (pmd_none(pmd)) {
 		phys_addr_t pte_phys;
 		BUG_ON(!pgtable_alloc);
-		pte_phys = pgtable_alloc();
+		pte_phys = pgtable_alloc(PAGE_SHIFT);
 		__pmd_populate(pmdp, pte_phys, PMD_TYPE_TABLE);
 		pmd = READ_ONCE(*pmdp);
 	}
@@ -207,7 +207,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 
 static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 		     phys_addr_t phys, pgprot_t prot,
-		     phys_addr_t (*pgtable_alloc)(void), int flags)
+		     phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pmd_t *pmdp;
@@ -245,7 +245,7 @@ static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void), int flags)
+				phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
@@ -257,7 +257,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	if (pud_none(pud)) {
 		phys_addr_t pmd_phys;
 		BUG_ON(!pgtable_alloc);
-		pmd_phys = pgtable_alloc();
+		pmd_phys = pgtable_alloc(PMD_SHIFT);
 		__pud_populate(pudp, pmd_phys, PUD_TYPE_TABLE);
 		pud = READ_ONCE(*pudp);
 	}
@@ -293,7 +293,7 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 
 static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(void),
+			   phys_addr_t (*pgtable_alloc)(int),
 			   int flags)
 {
 	unsigned long next;
@@ -303,7 +303,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 	if (pgd_none(pgd)) {
 		phys_addr_t pud_phys;
 		BUG_ON(!pgtable_alloc);
-		pud_phys = pgtable_alloc();
+		pud_phys = pgtable_alloc(PUD_SHIFT);
 		__pgd_populate(pgdp, pud_phys, PUD_TYPE_TABLE);
 		pgd = READ_ONCE(*pgdp);
 	}
@@ -344,7 +344,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
-				 phys_addr_t (*pgtable_alloc)(void),
+				 phys_addr_t (*pgtable_alloc)(int),
 				 int flags)
 {
 	unsigned long addr, length, end, next;
@@ -370,11 +370,23 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	} while (pgdp++, addr = next, addr != end);
 }
 
-static phys_addr_t pgd_pgtable_alloc(void)
+static phys_addr_t pgd_pgtable_alloc(int shift)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
-	if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
-		BUG();
+	BUG_ON(!ptr);
+
+	/*
+	 * Call proper page table ctor in case later we need to
+	 * call core mm functions like apply_to_page_range() on
+	 * this pre-allocated page table.
+	 *
+	 * We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
+	 * folded, and if so pgtable_pmd_page_ctor() becomes a nop.
+	 */
+	if (shift == PAGE_SHIFT)
+		BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
+	else if (shift == PMD_SHIFT)
+		BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr)));
 
 	/* Ensure the zeroed page is visible to the page table walker */
 	dsb(ishst);
-- 
2.21.0.360.g471c308f928-goog


* [PATCH v3 2/3] arm64: mm: don't call page table ctors for init_mm
  2019-03-10  1:19   ` [PATCH v3 " Yu Zhao
@ 2019-03-10  1:19     ` Yu Zhao
  2019-03-10  1:19     ` [PATCH v3 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-10  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

init_mm doesn't require page table lock to be initialized at
any level. Add a separate page table allocator for it, and the
new one skips page table ctors.

The ctors allocate memory when ALLOC_SPLIT_PTLOCKS is set. Not
calling them avoids memory leak in case we call pte_free_kernel()
on init_mm.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/mm/mmu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f704b291f2c5..d1dc2a2777aa 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -370,6 +370,16 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	} while (pgdp++, addr = next, addr != end);
 }
 
+static phys_addr_t pgd_kernel_pgtable_alloc(int shift)
+{
+	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
+	BUG_ON(!ptr);
+
+	/* Ensure the zeroed page is visible to the page table walker */
+	dsb(ishst);
+	return __pa(ptr);
+}
+
 static phys_addr_t pgd_pgtable_alloc(int shift)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
@@ -594,7 +604,7 @@ static int __init map_entry_trampoline(void)
 	/* Map only the text into the trampoline page table */
 	memset(tramp_pg_dir, 0, PGD_SIZE);
 	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, PAGE_SIZE,
-			     prot, pgd_pgtable_alloc, 0);
+			     prot, pgd_kernel_pgtable_alloc, 0);
 
 	/* Map both the text and data into the kernel page table */
 	__set_fixmap(FIX_ENTRY_TRAMP_TEXT, pa_start, prot);
@@ -1070,7 +1080,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
-			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
+			     size, PAGE_KERNEL, pgd_kernel_pgtable_alloc,
+			     flags);
 
 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
 			   altmap, want_memblock);
-- 
2.21.0.360.g471c308f928-goog


* [PATCH v3 3/3] arm64: mm: enable per pmd page table lock
  2019-03-10  1:19   ` [PATCH v3 " Yu Zhao
  2019-03-10  1:19     ` [PATCH v3 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
@ 2019-03-10  1:19     ` Yu Zhao
  2019-03-11  8:28       ` Anshuman Khandual
  2019-03-11 12:12       ` Mark Rutland
  2019-03-11  7:45     ` [PATCH v3 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
  2019-03-12  0:57     ` [PATCH v4 1/4] " Yu Zhao
  3 siblings, 2 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-10  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

Switch from per mm_struct to per pmd page table lock by enabling
ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
large systems.

I'm not sure if there is contention on mm->page_table_lock. Given
the option comes at no cost (apart from initializing more spin
locks), why not enable it now?

We only do so when pmd is not folded, so we don't mistakenly call
pgtable_pmd_page_ctor() on pud or p4d in pgd_pgtable_alloc(). (We
check shift against PMD_SHIFT, which is the same as PUD_SHIFT when pmd
is folded).

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
 arch/arm64/include/asm/tlb.h     |  5 ++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cfbf307d6dc4..a3b1b789f766 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
 config ARCH_HAS_CACHE_LINE_SIZE
 	def_bool y
 
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+	def_bool y if PGTABLE_LEVELS > 2
+
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	---help---
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 52fa47c73bf0..dabba4b2c61f 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -33,12 +33,22 @@
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)__get_free_page(PGALLOC_GFP);
+	struct page *page;
+
+	page = alloc_page(PGALLOC_GFP);
+	if (!page)
+		return NULL;
+	if (!pgtable_pmd_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
+	return page_address(page);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
 {
 	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
+	pgtable_pmd_page_dtor(virt_to_page(pmdp));
 	free_page((unsigned long)pmdp);
 }
 
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 106fdc951b6e..4e3becfed387 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -62,7 +62,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pmdp));
+	struct page *page = virt_to_page(pmdp);
+
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
-- 
2.21.0.360.g471c308f928-goog
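The allocation/free pairing this patch adds can be modelled in isolation: construct after allocating and give the page back if the ctor fails (as pmd_alloc_one() now does), and run the dtor before every free (pmd_free() and __pmd_free_tlb()). The sketch uses hypothetical sim_* names and calloc()/free() in place of alloc_page()/__free_page(); sim_ctor_should_fail is a made-up knob to exercise the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sim_page { bool ctor_done; };

/* Hypothetical knob so the ctor failure path can be exercised. */
static bool sim_ctor_should_fail;

static bool sim_pmd_ctor(struct sim_page *p)
{
	if (sim_ctor_should_fail)
		return false;	/* e.g. out-of-line ptlock allocation failed */
	p->ctor_done = true;
	return true;
}

static void sim_pmd_dtor(struct sim_page *p)
{
	p->ctor_done = false;
}

/* Shaped like the patched pmd_alloc_one(): allocate, construct, and
 * release the page again if construction fails. */
static struct sim_page *sim_pmd_alloc_one(void)
{
	struct sim_page *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	if (!sim_pmd_ctor(p)) {
		free(p);
		return NULL;
	}
	return p;
}

/* Shaped like the patched pmd_free(): destruct, then free. */
static void sim_pmd_free(struct sim_page *p)
{
	sim_pmd_dtor(p);
	free(p);
}
```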


* Re: [PATCH v3 1/3] arm64: mm: use appropriate ctors for page tables
  2019-03-10  1:19   ` [PATCH v3 " Yu Zhao
  2019-03-10  1:19     ` [PATCH v3 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
  2019-03-10  1:19     ` [PATCH v3 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
@ 2019-03-11  7:45     ` Anshuman Khandual
  2019-03-11 23:23       ` Yu Zhao
  2019-03-12  0:57     ` [PATCH v4 1/4] " Yu Zhao
  3 siblings, 1 reply; 38+ messages in thread
From: Anshuman Khandual @ 2019-03-11  7:45 UTC (permalink / raw)
  To: Yu Zhao, Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm

Hello Yu,

We had some disagreements over this series last time around, after which I had
posted the following series [1] which tried to enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
after doing some pgtable accounting changes. After some thought and deliberation
I figure that it's better not to do pgtable alloc changes on arm64, creating
brand-new semantics which ideally should first be debated and agreed upon in generic MM.

Though I still see value in changing the generic pgtable page allocation semantics
for user and kernel space, that should not stop us from enabling more granular
PMD level locks through ARCH_ENABLE_SPLIT_PMD_PTLOCK right now.

[1] https://www.spinics.net/lists/arm-kernel/msg709917.html

Having said that, this series attempts to enable ARCH_ENABLE_SPLIT_PMD_PTLOCK with
some minimal changes to the existing kernel pgtable page allocation code. Hence I am
just trying to re-evaluate the series in isolation.
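For context on what "more granular PMD level locks" means in practice: the generic pmd_lockptr() in include/linux/mm.h returns the split lock of the PMD table page when USE_SPLIT_PMD_PTLOCKS is in effect, and &mm->page_table_lock otherwise. Below is a stripped-down model with hypothetical sim_* names, where SIM_SPLIT_PMD_PTLOCKS stands in for that config result.

```c
#include <assert.h>

/* Stand-in for USE_SPLIT_PMD_PTLOCKS; build with -DSIM_SPLIT_PMD_PTLOCKS=0
 * to get the single per-mm lock instead. */
#ifndef SIM_SPLIT_PMD_PTLOCKS
#define SIM_SPLIT_PMD_PTLOCKS 1
#endif

struct sim_lock { int held; };

struct sim_mm {
	struct sim_lock page_table_lock;	/* one lock for the whole mm */
};

struct sim_pmd_page {
	struct sim_lock ptl;			/* initialised by the pmd ctor */
};

static struct sim_lock *sim_pmd_lockptr(struct sim_mm *mm,
					struct sim_pmd_page *pmd_page)
{
	if (SIM_SPLIT_PMD_PTLOCKS)
		return &pmd_page->ptl;		/* one lock per PMD table page */
	return &mm->page_table_lock;
}
```

With split locks, operations on different PMD tables of the same mm serialise on different locks; without them, they all contend on mm->page_table_lock. This is also why every user PMD page must go through the pmd ctor once the option is selected.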

On 03/10/2019 06:49 AM, Yu Zhao wrote:

> For pte page, use pgtable_page_ctor(); for pmd page, use
> pgtable_pmd_page_ctor(); and for the rest (pud, p4d and pgd),
> don't use any.

This is a semantics change. Hence the question is: why? Should we not wait until a
generic MM agreement is in place in this regard? Can we avoid this? Is the change
really required to enable ARCH_ENABLE_SPLIT_PMD_PTLOCK for user space THP, which
this series originally intended to achieve?

> 
> For now, we don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK and
> pgtable_pmd_page_ctor() is a nop. When we do so in patch 3, we
> make sure pmd is not folded so we won't mistakenly call
> pgtable_pmd_page_ctor() on pud or p4d.

This makes sense from a code perspective, but I still don't understand the need to
change kernel pgtable page allocation semantics without any real benefit or fix at
the moment. Can't we keep kernel page table page allocation unchanged for now and
just enable ARCH_ENABLE_SPLIT_PMD_PTLOCK for user space THP benefits? Do you see
any concern with that?

* Re: [PATCH v3 3/3] arm64: mm: enable per pmd page table lock
  2019-03-10  1:19     ` [PATCH v3 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
@ 2019-03-11  8:28       ` Anshuman Khandual
  2019-03-11 23:10         ` Yu Zhao
  2019-03-11 12:12       ` Mark Rutland
  1 sibling, 1 reply; 38+ messages in thread
From: Anshuman Khandual @ 2019-03-11  8:28 UTC (permalink / raw)
  To: Yu Zhao, Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm

On 03/10/2019 06:49 AM, Yu Zhao wrote:
> Switch from per mm_struct to per pmd page table lock by enabling
> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> large system.
> 
> I'm not sure if there is contention on mm->page_table_lock. Given
> the option comes at no cost (apart from initializing more spin
> locks), why not enable it now.
> 
> We only do so when pmd is not folded, so we don't mistakenly call
> pgtable_pmd_page_ctor() on pud or p4d in pgd_pgtable_alloc(). (We
> check shift against PMD_SHIFT, which is same as PUD_SHIFT when pmd
> is folded).
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/arm64/Kconfig               |  3 +++
>  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>  arch/arm64/include/asm/tlb.h     |  5 ++++-
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cfbf307d6dc4..a3b1b789f766 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>  config ARCH_HAS_CACHE_LINE_SIZE
>  	def_bool y
>  
> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> +	def_bool y if PGTABLE_LEVELS > 2
> +
>  config SECCOMP
>  	bool "Enable seccomp to safely compute untrusted bytecode"
>  	---help---
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 52fa47c73bf0..dabba4b2c61f 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -33,12 +33,22 @@
>  
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> +	struct page *page;
> +
> +	page = alloc_page(PGALLOC_GFP);
> +	if (!page)
> +		return NULL;
> +	if (!pgtable_pmd_page_ctor(page)) {
> +		__free_page(page);
> +		return NULL;
> +	}
> +	return page_address(page);
>  }
>  
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
>  {
>  	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> +	pgtable_pmd_page_dtor(virt_to_page(pmdp));
>  	free_page((unsigned long)pmdp);
>  }

There is just one problem here. ARM KVM's stage2_pmd_free() calls into pmd_free() on a page
that was originally allocated with __get_free_page() and never went through
pgtable_pmd_page_ctor(). So when ARCH_ENABLE_SPLIT_PMD_PTLOCK is enabled:

stage2_pmd_free()
	pgtable_pmd_page_dtor()
		ptlock_free()
			kmem_cache_free(page_ptl_cachep, page->ptl)

Though the SLUB implementation of kmem_cache_free() seems to handle a NULL page->ptl
correctly (the page never got its lock allocated or initialized), I am not sure it is
the right thing to do.
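
To make the mismatch concrete, here is a small userspace model (not kernel code) of the ALLOC_SPLIT_PTLOCKS case: the ctor allocates the per-page lock, and the dtor frees page->ptl without checking whether the ctor ever ran. The names model_page, model_pmd_page_ctor() and model_pmd_page_dtor() are hypothetical, and the sketch assumes, as the message above does, that page->ptl is NULL for a page that never went through the ctor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model, not kernel code. model_page keeps only
 * the field that matters here: the out-of-line lock pointer used when
 * ALLOC_SPLIT_PTLOCKS is set. */
struct model_lock {
	int owner;
};

struct model_page {
	struct model_lock *ptl;	/* assumed NULL until a ctor runs */
};

static int live_locks;	/* locks currently allocated by the ctor */

/* Models pgtable_pmd_page_ctor(): allocate the page's split lock. */
static bool model_pmd_page_ctor(struct model_page *page)
{
	page->ptl = calloc(1, sizeof(*page->ptl));
	if (!page->ptl)
		return false;
	live_locks++;
	return true;
}

/* Models pgtable_pmd_page_dtor() -> ptlock_free(): release page->ptl
 * without knowing whether the ctor ran. On a page that skipped the ctor
 * this becomes free(NULL), a defined no-op in C; the kernel path instead
 * depends on kmem_cache_free() tolerating a NULL object. */
static void model_pmd_page_dtor(struct model_page *page)
{
	if (page->ptl)
		live_locks--;	/* model-only bookkeeping */
	free(page->ptl);
	page->ptl = NULL;
}
```

The stage-2 path only comes out clean because freeing NULL is a no-op; nothing in the dtor itself verifies that the ctor was called.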

* Re: [PATCH v3 3/3] arm64: mm: enable per pmd page table lock
  2019-03-10  1:19     ` [PATCH v3 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
  2019-03-11  8:28       ` Anshuman Khandual
@ 2019-03-11 12:12       ` Mark Rutland
  2019-03-11 12:57         ` Anshuman Khandual
  2019-03-11 23:11         ` Yu Zhao
  1 sibling, 2 replies; 38+ messages in thread
From: Mark Rutland @ 2019-03-11 12:12 UTC
  To: Yu Zhao, Anshuman Khandual
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

Hi,

On Sat, Mar 09, 2019 at 06:19:06PM -0700, Yu Zhao wrote:
> Switch from per mm_struct to per pmd page table lock by enabling
> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> large system.
> 
> I'm not sure if there is contention on mm->page_table_lock. Given
> the option comes at no cost (apart from initializing more spin
> locks), why not enable it now.
> 
> We only do so when pmd is not folded, so we don't mistakenly call
> pgtable_pmd_page_ctor() on pud or p4d in pgd_pgtable_alloc(). (We
> check shift against PMD_SHIFT, which is same as PUD_SHIFT when pmd
> is folded).

Just to check, I take it pgtable_pmd_page_ctor() is now a NOP when the
PMD is folded, and this last paragraph is stale?

> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/arm64/Kconfig               |  3 +++
>  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>  arch/arm64/include/asm/tlb.h     |  5 ++++-
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cfbf307d6dc4..a3b1b789f766 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>  config ARCH_HAS_CACHE_LINE_SIZE
>  	def_bool y
>  
> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> +	def_bool y if PGTABLE_LEVELS > 2
> +
>  config SECCOMP
>  	bool "Enable seccomp to safely compute untrusted bytecode"
>  	---help---
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 52fa47c73bf0..dabba4b2c61f 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -33,12 +33,22 @@
>  
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> +	struct page *page;
> +
> +	page = alloc_page(PGALLOC_GFP);
> +	if (!page)
> +		return NULL;
> +	if (!pgtable_pmd_page_ctor(page)) {
> +		__free_page(page);
> +		return NULL;
> +	}
> +	return page_address(page);
>  }
>  
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
>  {
>  	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> +	pgtable_pmd_page_dtor(virt_to_page(pmdp));
>  	free_page((unsigned long)pmdp);
>  }

It looks like arm64's existing stage-2 code is inconsistent across
alloc/free, and IIUC this change might turn that into a real problem.
Currently we allocate all levels of stage-2 table with
__get_free_page(), but free them with p?d_free(). We always miss the
ctor and always use the dtor.

Other than that, this patch looks fine to me, but I'd feel more
comfortable if we could first fix the stage-2 code to free those stage-2
tables without invoking the dtor.

Anshuman, IIRC you had a patch to fix the stage-2 code to not invoke the
dtors. If so, could you please post that so that we could take it as a
preparatory patch for this series?
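
A userspace sketch of the direction Mark describes, keeping allocation and free symmetric for stage-2 tables. The constructed flag, the unbalanced_dtors counter and all function names are hypothetical bookkeeping for illustration; calloc()/free() stand in for __get_free_page()/free_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: each table page records whether a ctor ran, so
 * an unbalanced free is counted instead of silently tolerated. */
struct tbl_page {
	bool constructed;
};

static int unbalanced_dtors;

/* User page tables: allocated and then passed through the ctor. */
static struct tbl_page *pmd_alloc_with_ctor(void)
{
	struct tbl_page *p = calloc(1, sizeof(*p));

	if (p)
		p->constructed = true;
	return p;
}

/* Stage-2 allocation today: a plain page, no ctor. */
static struct tbl_page *stage2_alloc(void)
{
	return calloc(1, sizeof(struct tbl_page));
}

/* Generic p?d_free() path: runs the dtor before freeing the page. */
static void free_with_dtor(struct tbl_page *p)
{
	if (!p)
		return;
	if (!p->constructed)
		unbalanced_dtors++;	/* dtor without a matching ctor */
	free(p);
}

/* Suggested stage-2 free: no dtor, matching how the page was allocated. */
static void stage2_free(struct tbl_page *p)
{
	if (!p)
		return;
	assert(!p->constructed);
	free(p);
}
```

Freeing a stage-2 table through free_with_dtor() bumps the counter, while stage2_free() keeps the pairing balanced without needing a ctor on the allocation side.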

Thanks,
Mark.

> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 106fdc951b6e..4e3becfed387 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -62,7 +62,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
>  static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
>  				  unsigned long addr)
>  {
> -	tlb_remove_table(tlb, virt_to_page(pmdp));
> +	struct page *page = virt_to_page(pmdp);
> +
> +	pgtable_pmd_page_dtor(page);
> +	tlb_remove_table(tlb, page);
>  }
>  #endif
>  
> -- 
> 2.21.0.360.g471c308f928-goog
> 

* Re: [PATCH v3 3/3] arm64: mm: enable per pmd page table lock
  2019-03-11 12:12       ` Mark Rutland
@ 2019-03-11 12:57         ` Anshuman Khandual
  2019-03-11 23:11         ` Yu Zhao
  1 sibling, 0 replies; 38+ messages in thread
From: Anshuman Khandual @ 2019-03-11 12:57 UTC
  To: Mark Rutland, Yu Zhao
  Cc: Catalin Marinas, Will Deacon, Aneesh Kumar K . V, Andrew Morton,
	Nick Piggin, Peter Zijlstra, Joel Fernandes, Kirill A . Shutemov,
	Ard Biesheuvel, Chintan Pandya, Jun Yao, Laura Abbott,
	linux-arm-kernel, linux-kernel, linux-arch, linux-mm

On 03/11/2019 05:42 PM, Mark Rutland wrote:
> Hi,
> 
> On Sat, Mar 09, 2019 at 06:19:06PM -0700, Yu Zhao wrote:
>> Switch from per mm_struct to per pmd page table lock by enabling
>> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
>> large system.
>>
>> I'm not sure if there is contention on mm->page_table_lock. Given
>> the option comes at no cost (apart from initializing more spin
>> locks), why not enable it now.
>>
>> We only do so when pmd is not folded, so we don't mistakenly call
>> pgtable_pmd_page_ctor() on pud or p4d in pgd_pgtable_alloc(). (We
>> check shift against PMD_SHIFT, which is same as PUD_SHIFT when pmd
>> is folded).
> 
> Just to check, I take it pgtable_pmd_page_ctor() is now a NOP when the
> PMD is folded, and this last paragraph is stale?
> 
>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>> ---
>>  arch/arm64/Kconfig               |  3 +++
>>  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>>  arch/arm64/include/asm/tlb.h     |  5 ++++-
>>  3 files changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index cfbf307d6dc4..a3b1b789f766 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>>  config ARCH_HAS_CACHE_LINE_SIZE
>>  	def_bool y
>>  
>> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>> +	def_bool y if PGTABLE_LEVELS > 2
>> +
>>  config SECCOMP
>>  	bool "Enable seccomp to safely compute untrusted bytecode"
>>  	---help---
>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>> index 52fa47c73bf0..dabba4b2c61f 100644
>> --- a/arch/arm64/include/asm/pgalloc.h
>> +++ b/arch/arm64/include/asm/pgalloc.h
>> @@ -33,12 +33,22 @@
>>  
>>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>>  {
>> -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
>> +	struct page *page;
>> +
>> +	page = alloc_page(PGALLOC_GFP);
>> +	if (!page)
>> +		return NULL;
>> +	if (!pgtable_pmd_page_ctor(page)) {
>> +		__free_page(page);
>> +		return NULL;
>> +	}
>> +	return page_address(page);
>>  }
>>  
>>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
>>  {
>>  	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
>> +	pgtable_pmd_page_dtor(virt_to_page(pmdp));
>>  	free_page((unsigned long)pmdp);
>>  }
> 
> It looks like arm64's existing stage-2 code is inconsistent across
> alloc/free, and IIUC this change might turn that into a real problem.
> Currently we allocate all levels of stage-2 table with
> __get_free_page(), but free them with p?d_free(). We always miss the
> ctor and always use the dtor.
> 
> Other than that, this patch looks fine to me, but I'd feel more
> comfortable if we could first fix the stage-2 code to free those stage-2
> tables without invoking the dtor.

That's right. I have already highlighted this problem.
 
> 
> Anshuman, IIRC you had a patch to fix the stage-2 code to not invoke the
> dtors. If so, could you please post that so that we could take it as a
> preparatory patch for this series?

Sure, I can, after fixing the PTE-level pte_free_kernel/__free_page case, which I had
missed in V2.

https://www.spinics.net/lists/arm-kernel/msg710118.html

* Re: [PATCH v3 3/3] arm64: mm: enable per pmd page table lock
  2019-03-11  8:28       ` Anshuman Khandual
@ 2019-03-11 23:10         ` Yu Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-11 23:10 UTC
  To: Anshuman Khandual
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Aneesh Kumar K . V,
	Andrew Morton, Nick Piggin, Peter Zijlstra, Joel Fernandes,
	Kirill A . Shutemov, Ard Biesheuvel, Chintan Pandya, Jun Yao,
	Laura Abbott, linux-arm-kernel, linux-kernel, linux-arch,
	linux-mm

On Mon, Mar 11, 2019 at 01:58:27PM +0530, Anshuman Khandual wrote:
> On 03/10/2019 06:49 AM, Yu Zhao wrote:
> > Switch from per mm_struct to per pmd page table lock by enabling
> > ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> > large system.
> > 
> > I'm not sure if there is contention on mm->page_table_lock. Given
> > the option comes at no cost (apart from initializing more spin
> > locks), why not enable it now.
> > 
> > We only do so when pmd is not folded, so we don't mistakenly call
> > pgtable_pmd_page_ctor() on pud or p4d in pgd_pgtable_alloc(). (We
> > check shift against PMD_SHIFT, which is same as PUD_SHIFT when pmd
> > is folded).
> > 
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >  arch/arm64/Kconfig               |  3 +++
> >  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
> >  arch/arm64/include/asm/tlb.h     |  5 ++++-
> >  3 files changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index cfbf307d6dc4..a3b1b789f766 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
> >  config ARCH_HAS_CACHE_LINE_SIZE
> >  	def_bool y
> >  
> > +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> > +	def_bool y if PGTABLE_LEVELS > 2
> > +
> >  config SECCOMP
> >  	bool "Enable seccomp to safely compute untrusted bytecode"
> >  	---help---
> > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> > index 52fa47c73bf0..dabba4b2c61f 100644
> > --- a/arch/arm64/include/asm/pgalloc.h
> > +++ b/arch/arm64/include/asm/pgalloc.h
> > @@ -33,12 +33,22 @@
> >  
> >  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> >  {
> > -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> > +	struct page *page;
> > +
> > +	page = alloc_page(PGALLOC_GFP);
> > +	if (!page)
> > +		return NULL;
> > +	if (!pgtable_pmd_page_ctor(page)) {
> > +		__free_page(page);
> > +		return NULL;
> > +	}
> > +	return page_address(page);
> >  }
> >  
> >  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
> >  {
> >  	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> > +	pgtable_pmd_page_dtor(virt_to_page(pmdp));
> >  	free_page((unsigned long)pmdp);
> >  }
> 
> There is just one problem here. ARM KVM's stage2_pmd_free() calls into pmd_free() on a page
> that was originally allocated with __get_free_page() and never went through
> pgtable_pmd_page_ctor(). So when ARCH_ENABLE_SPLIT_PMD_PTLOCK is enabled:
> 
> stage2_pmd_free()
> 	pgtable_pmd_page_dtor()
> 		ptlock_free()
> 			kmem_cache_free(page_ptl_cachep, page->ptl)
> 
> Though the SLUB implementation of kmem_cache_free() seems to handle a NULL page->ptl
> correctly (the page never got its lock allocated or initialized), I am not sure it is
> the right thing to do.

Thanks for reminding me. This should be fixed as well. Will do it
in a separate patch.

* Re: [PATCH v3 3/3] arm64: mm: enable per pmd page table lock
  2019-03-11 12:12       ` Mark Rutland
  2019-03-11 12:57         ` Anshuman Khandual
@ 2019-03-11 23:11         ` Yu Zhao
  1 sibling, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-11 23:11 UTC
  To: Mark Rutland
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon,
	Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm

On Mon, Mar 11, 2019 at 12:12:28PM +0000, Mark Rutland wrote:
> Hi,
> 
> On Sat, Mar 09, 2019 at 06:19:06PM -0700, Yu Zhao wrote:
> > Switch from per mm_struct to per pmd page table lock by enabling
> > ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> > large system.
> > 
> > I'm not sure if there is contention on mm->page_table_lock. Given
> > the option comes at no cost (apart from initializing more spin
> > locks), why not enable it now.
> > 
> > We only do so when pmd is not folded, so we don't mistakenly call
> > pgtable_pmd_page_ctor() on pud or p4d in pgd_pgtable_alloc(). (We
> > check shift against PMD_SHIFT, which is same as PUD_SHIFT when pmd
> > is folded).
> 
> Just to check, I take it pgtable_pmd_page_ctor() is now a NOP when the
> PMD is folded, and this last paragraph is stale?

Yes, and will remove it.

> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >  arch/arm64/Kconfig               |  3 +++
> >  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
> >  arch/arm64/include/asm/tlb.h     |  5 ++++-
> >  3 files changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index cfbf307d6dc4..a3b1b789f766 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
> >  config ARCH_HAS_CACHE_LINE_SIZE
> >  	def_bool y
> >  
> > +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> > +	def_bool y if PGTABLE_LEVELS > 2
> > +
> >  config SECCOMP
> >  	bool "Enable seccomp to safely compute untrusted bytecode"
> >  	---help---
> > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> > index 52fa47c73bf0..dabba4b2c61f 100644
> > --- a/arch/arm64/include/asm/pgalloc.h
> > +++ b/arch/arm64/include/asm/pgalloc.h
> > @@ -33,12 +33,22 @@
> >  
> >  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> >  {
> > -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> > +	struct page *page;
> > +
> > +	page = alloc_page(PGALLOC_GFP);
> > +	if (!page)
> > +		return NULL;
> > +	if (!pgtable_pmd_page_ctor(page)) {
> > +		__free_page(page);
> > +		return NULL;
> > +	}
> > +	return page_address(page);
> >  }
> >  
> >  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
> >  {
> >  	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> > +	pgtable_pmd_page_dtor(virt_to_page(pmdp));
> >  	free_page((unsigned long)pmdp);
> >  }
> 
> It looks like arm64's existing stage-2 code is inconsistent across
> alloc/free, and IIUC this change might turn that into a real problem.
> Currently we allocate all levels of stage-2 table with
> __get_free_page(), but free them with p?d_free(). We always miss the
> ctor and always use the dtor.
> 
> Other than that, this patch looks fine to me, but I'd feel more
> comfortable if we could first fix the stage-2 code to free those stage-2
> tables without invoking the dtor.
> 
> Anshuman, IIRC you had a patch to fix the stage-2 code to not invoke the
> dtors. If so, could you please post that so that we could take it as a
> preparatory patch for this series?

Will do.

> Thanks,
> Mark.
> 
> > diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> > index 106fdc951b6e..4e3becfed387 100644
> > --- a/arch/arm64/include/asm/tlb.h
> > +++ b/arch/arm64/include/asm/tlb.h
> > @@ -62,7 +62,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> >  static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
> >  				  unsigned long addr)
> >  {
> > -	tlb_remove_table(tlb, virt_to_page(pmdp));
> > +	struct page *page = virt_to_page(pmdp);
> > +
> > +	pgtable_pmd_page_dtor(page);
> > +	tlb_remove_table(tlb, page);
> >  }
> >  #endif
> >  
> > -- 
> > 2.21.0.360.g471c308f928-goog
> > 

* Re: [PATCH v3 1/3] arm64: mm: use appropriate ctors for page tables
  2019-03-11  7:45     ` [PATCH v3 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
@ 2019-03-11 23:23       ` Yu Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-11 23:23 UTC
  To: Anshuman Khandual
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Aneesh Kumar K . V,
	Andrew Morton, Nick Piggin, Peter Zijlstra, Joel Fernandes,
	Kirill A . Shutemov, Ard Biesheuvel, Chintan Pandya, Jun Yao,
	Laura Abbott, linux-arm-kernel, linux-kernel, linux-arch,
	linux-mm

On Mon, Mar 11, 2019 at 01:15:55PM +0530, Anshuman Khandual wrote:
> Hello Yu,
> 
> We had some disagreements over this series last time around after which I had
> posted the following series [1] which tried to enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
> after doing some pgtable accounting changes. After some thought and deliberation
> I figure that it's better not to do pgtable alloc changes on arm64 creating brand
> new semantics, which ideally should first be debated and agreed upon in generic MM.
> 
> Though I still see value in changing the generic pgtable page allocation semantics
> for user and kernel space, that should not stop us from enabling more granular
> PMD level locks through ARCH_ENABLE_SPLIT_PMD_PTLOCK right now.
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg709917.html
> 
> Having said that, this series attempts to enable ARCH_ENABLE_SPLIT_PMD_PTLOCK with
> some minimal changes to the existing kernel pgtable page allocation code, so I am just
> trying to re-evaluate the series in that isolation.
> 
> On 03/10/2019 06:49 AM, Yu Zhao wrote:
> 
> > For pte page, use pgtable_page_ctor(); for pmd page, use
> > pgtable_pmd_page_ctor(); and for the rest (pud, p4d and pgd),
> > don't use any.
> 
> This is a semantics change, hence the question: why? Shouldn't we wait until a
> generic MM agreement is in place in this regard? Can we avoid this? Is the change
> really required to enable ARCH_ENABLE_SPLIT_PMD_PTLOCK for user-space THP, which
> this series originally intended to achieve?
> 
> > 
> > For now, we don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK and
> > pgtable_pmd_page_ctor() is a nop. When we do in patch 3, we
> > make sure pmd is not folded so we won't mistakenly call
> > pgtable_pmd_page_ctor() on pud or p4d.
> 
> This makes sense from a code perspective, but I still don't understand the need to
> change the kernel pgtable page allocation semantics without any real benefit or fix at
> the moment. Can't we keep kernel page table page allocation unchanged for now and
> just enable ARCH_ENABLE_SPLIT_PMD_PTLOCK for the user-space THP benefits? Do you see
> any concern with that?

This is not for kernel page tables (i.e. init_mm). This is to
accommodate pre-allocated efi_mm page tables, because
apply_to_page_range() is used on them, and it in turn calls
pte_alloc_map_lock().
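
A rough userspace model of why efi_mm's pre-allocated tables need the ctor: the walker takes the table's split lock around the per-entry callbacks, roughly as pte_alloc_map_lock() does inside apply_to_page_range(). The structure layout, the callback signature and the -1 return for a missing lock are assumptions made for this sketch; the real kernel performs no such check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PTE 512	/* assumed: 4K table of 8-byte entries */

struct model_lock {
	bool held;
};

struct pte_table {
	unsigned long pte[MODEL_PTRS_PER_PTE];
	struct model_lock *ptl;	/* set only if the ctor ran */
};

/* Callback type loosely modeled on the one apply_to_page_range() takes;
 * this exact signature is an assumption. */
typedef int (*model_pte_fn)(unsigned long *pte, size_t idx, void *data);

/* The table's split lock is held while each entry is visited, so a
 * pre-allocated table that skipped the ctor has no usable lock. */
static int model_apply_range(struct pte_table *t, size_t first, size_t n,
			     model_pte_fn fn, void *data)
{
	int ret = 0;

	if (!t->ptl)
		return -1;	/* model-only signal: no lock to take */
	t->ptl->held = true;
	for (size_t i = first; i < first + n && i < MODEL_PTRS_PER_PTE; i++) {
		ret = fn(&t->pte[i], i, data);
		if (ret)
			break;
	}
	t->ptl->held = false;
	return ret;
}

/* Example callback: write the same value into every visited entry. */
static int set_entry(unsigned long *pte, size_t idx, void *data)
{
	(void)idx;
	*pte = *(const unsigned long *)data;
	return 0;
}
```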

* [PATCH v4 1/4] arm64: mm: use appropriate ctors for page tables
  2019-03-10  1:19   ` [PATCH v3 " Yu Zhao
                       ` (2 preceding siblings ...)
  2019-03-11  7:45     ` [PATCH v3 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
@ 2019-03-12  0:57     ` Yu Zhao
  2019-03-12  0:57       ` [PATCH v4 2/4] arm64: mm: don't call page table ctors for init_mm Yu Zhao
                         ` (2 more replies)
  3 siblings, 3 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-12  0:57 UTC
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

For pte pages, use pgtable_page_ctor(); for pmd pages, use
pgtable_pmd_page_ctor(); and for the rest (pud, p4d and pgd),
don't use any.

For now, we don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK, so
pgtable_pmd_page_ctor() is a nop. When we do so in patch 4, we
make sure pmd is not folded so we won't mistakenly call
pgtable_pmd_page_ctor() on pud or p4d.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/mm/mmu.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b6f5aa52ac67..f704b291f2c5 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -98,7 +98,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(void)
+static phys_addr_t __init early_pgtable_alloc(int shift)
 {
 	phys_addr_t phys;
 	void *ptr;
@@ -173,7 +173,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void),
+				phys_addr_t (*pgtable_alloc)(int),
 				int flags)
 {
 	unsigned long next;
@@ -183,7 +183,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	if (pmd_none(pmd)) {
 		phys_addr_t pte_phys;
 		BUG_ON(!pgtable_alloc);
-		pte_phys = pgtable_alloc();
+		pte_phys = pgtable_alloc(PAGE_SHIFT);
 		__pmd_populate(pmdp, pte_phys, PMD_TYPE_TABLE);
 		pmd = READ_ONCE(*pmdp);
 	}
@@ -207,7 +207,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 
 static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 		     phys_addr_t phys, pgprot_t prot,
-		     phys_addr_t (*pgtable_alloc)(void), int flags)
+		     phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pmd_t *pmdp;
@@ -245,7 +245,7 @@ static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void), int flags)
+				phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
@@ -257,7 +257,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	if (pud_none(pud)) {
 		phys_addr_t pmd_phys;
 		BUG_ON(!pgtable_alloc);
-		pmd_phys = pgtable_alloc();
+		pmd_phys = pgtable_alloc(PMD_SHIFT);
 		__pud_populate(pudp, pmd_phys, PUD_TYPE_TABLE);
 		pud = READ_ONCE(*pudp);
 	}
@@ -293,7 +293,7 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 
 static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(void),
+			   phys_addr_t (*pgtable_alloc)(int),
 			   int flags)
 {
 	unsigned long next;
@@ -303,7 +303,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 	if (pgd_none(pgd)) {
 		phys_addr_t pud_phys;
 		BUG_ON(!pgtable_alloc);
-		pud_phys = pgtable_alloc();
+		pud_phys = pgtable_alloc(PUD_SHIFT);
 		__pgd_populate(pgdp, pud_phys, PUD_TYPE_TABLE);
 		pgd = READ_ONCE(*pgdp);
 	}
@@ -344,7 +344,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
-				 phys_addr_t (*pgtable_alloc)(void),
+				 phys_addr_t (*pgtable_alloc)(int),
 				 int flags)
 {
 	unsigned long addr, length, end, next;
@@ -370,11 +370,23 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	} while (pgdp++, addr = next, addr != end);
 }
 
-static phys_addr_t pgd_pgtable_alloc(void)
+static phys_addr_t pgd_pgtable_alloc(int shift)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
-	if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
-		BUG();
+	BUG_ON(!ptr);
+
+	/*
+	 * Call proper page table ctor in case later we need to
+	 * call core mm functions like apply_to_page_range() on
+	 * this pre-allocated page table.
+	 *
+	 * We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
+	 * folded, and if so pgtable_pmd_page_ctor() becomes nop.
+	 */
+	if (shift == PAGE_SHIFT)
+		BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
+	else if (shift == PMD_SHIFT)
+		BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr)));
 
 	/* Ensure the zeroed page is visible to the page table walker */
 	dsb(ishst);
-- 
2.21.0.360.g471c308f928-goog
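
The shift-based dispatch in pgd_pgtable_alloc() above, and why it is only safe when the pmd level is not folded, can be sketched in plain C. The shift values in the usage are assumed examples of arm64 layouts (4K pages: PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30; 64K pages with two levels, where the folded PMD_SHIFT equals PUD_SHIFT at 29), and effective_ctor() is a hypothetical helper modelling the Kconfig condition PGTABLE_LEVELS > 2.

```c
#include <assert.h>

enum ctor_kind { CTOR_NONE, CTOR_PTE, CTOR_PMD };

/* Same branch structure as pgd_pgtable_alloc(int shift) in the patch. */
static enum ctor_kind pick_ctor(int shift, int page_shift, int pmd_shift)
{
	if (shift == page_shift)
		return CTOR_PTE;	/* pgtable_page_ctor() */
	if (shift == pmd_shift)
		return CTOR_PMD;	/* pgtable_pmd_page_ctor() */
	return CTOR_NONE;		/* pud, p4d and pgd: no ctor */
}

/* Hypothetical helper: ARCH_ENABLE_SPLIT_PMD_PTLOCK is only selected
 * when PGTABLE_LEVELS > 2; otherwise pgtable_pmd_page_ctor() is a nop,
 * so a CTOR_PMD decision has no effect. */
static enum ctor_kind effective_ctor(enum ctor_kind kind, int levels)
{
	if (kind == CTOR_PMD && levels <= 2)
		return CTOR_NONE;
	return kind;
}
```

With 4K pages, shift 21 selects the pmd ctor and shift 30 selects none. With the folded two-level layout, a pud-level request at shift 29 also matches PMD_SHIFT, which is harmless only because the pmd ctor is a nop when the option is not selected.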


* [PATCH v4 2/4] arm64: mm: don't call page table ctors for init_mm
  2019-03-12  0:57     ` [PATCH v4 1/4] " Yu Zhao
@ 2019-03-12  0:57       ` Yu Zhao
  2019-03-12  0:57       ` [PATCH v4 3/4] arm64: mm: call ctor for stage2 pmd page Yu Zhao
  2019-03-12  0:57       ` [PATCH v4 4/4] arm64: mm: enable per pmd page table lock Yu Zhao
  2 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-12  0:57 UTC
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

init_mm doesn't require the page table lock to be initialized at
any level. Add a separate page table allocator for it that skips
the page table ctors.

The ctors allocate memory when ALLOC_SPLIT_PTLOCKS is set. Not
calling them avoids a memory leak in case we call pte_free_kernel()
on init_mm.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/mm/mmu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f704b291f2c5..d1dc2a2777aa 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -370,6 +370,16 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	} while (pgdp++, addr = next, addr != end);
 }
 
+static phys_addr_t pgd_kernel_pgtable_alloc(int shift)
+{
+	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
+	BUG_ON(!ptr);
+
+	/* Ensure the zeroed page is visible to the page table walker */
+	dsb(ishst);
+	return __pa(ptr);
+}
+
 static phys_addr_t pgd_pgtable_alloc(int shift)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
@@ -594,7 +604,7 @@ static int __init map_entry_trampoline(void)
 	/* Map only the text into the trampoline page table */
 	memset(tramp_pg_dir, 0, PGD_SIZE);
 	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, PAGE_SIZE,
-			     prot, pgd_pgtable_alloc, 0);
+			     prot, pgd_kernel_pgtable_alloc, 0);
 
 	/* Map both the text and data into the kernel page table */
 	__set_fixmap(FIX_ENTRY_TRAMP_TEXT, pa_start, prot);
@@ -1070,7 +1080,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
-			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
+			     size, PAGE_KERNEL, pgd_kernel_pgtable_alloc,
+			     flags);
 
 	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
 			   altmap, want_memblock);
-- 
2.21.0.360.g471c308f928-goog
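
A userspace model of the leak the commit message describes: a ctor that allocates a lock (as with ALLOC_SPLIT_PTLOCKS) combined with a kernel-table free path that releases only the page. All names are hypothetical stand-ins for pgd_pgtable_alloc(), pgd_kernel_pgtable_alloc() and pte_free_kernel(), and the lock is represented by a counter rather than real memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, not kernel code: the ctor "allocates" a lock and
 * only a dtor would give it back. */
struct kpage {
	bool has_lock;
};

static int locks_outstanding;

static bool model_ctor(struct kpage *p)
{
	p->has_lock = true;
	locks_outstanding++;
	return true;
}

/* Stand-in for pgd_pgtable_alloc(): allocate, then run the ctor. */
static struct kpage *model_pgtable_alloc(void)
{
	struct kpage *p = calloc(1, sizeof(*p));

	if (p && !model_ctor(p)) {
		free(p);
		p = NULL;
	}
	return p;
}

/* Stand-in for pgd_kernel_pgtable_alloc(): allocate, skip the ctor. */
static struct kpage *model_kernel_pgtable_alloc(void)
{
	return calloc(1, sizeof(struct kpage));
}

/* Stand-in for pte_free_kernel(): frees the page, never the lock. */
static void model_pte_free_kernel(struct kpage *p)
{
	free(p);
}
```

Freeing a ctor-initialized table through the kernel free path leaves its lock outstanding, whereas a table from the ctor-less allocator leaves nothing behind.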


* [PATCH v4 3/4] arm64: mm: call ctor for stage2 pmd page
  2019-03-12  0:57     ` [PATCH v4 1/4] " Yu Zhao
  2019-03-12  0:57       ` [PATCH v4 2/4] arm64: mm: don't call page table ctors for init_mm Yu Zhao
@ 2019-03-12  0:57       ` Yu Zhao
  2019-03-12  0:57       ` [PATCH v4 4/4] arm64: mm: enable per pmd page table lock Yu Zhao
  2 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-12  0:57 UTC
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

Call pgtable_pmd_page_ctor() for pmd pages allocated by
mmu_memory_cache_alloc() so the kernel won't crash when they are
freed through stage2_pmd_free()->pmd_free()->pgtable_pmd_page_dtor().

This is needed if we are going to enable the split pmd page table lock.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/include/asm/stage2_pgtable.h | 15 ++++++++++++---
 virt/kvm/arm/mmu.c                      | 13 +++++++++++--
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 5412fa40825e..0d9207144257 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -174,10 +174,19 @@ static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud)
 		return 1;
 }
 
-static inline void stage2_pud_populate(struct kvm *kvm, pud_t *pud, pmd_t *pmd)
+static inline int stage2_pud_populate(struct kvm *kvm, pud_t *pud, pmd_t *pmd)
 {
-	if (kvm_stage2_has_pmd(kvm))
-		pud_populate(NULL, pud, pmd);
+	if (!kvm_stage2_has_pmd(kvm))
+		return 0;
+
+	/* paired with pgtable_pmd_page_dtor() in pmd_free() below */
+	if (!pgtable_pmd_page_ctor(virt_to_page(pmd))) {
+		free_page((unsigned long)pmd);
+		return -ENOMEM;
+	}
+
+	pud_populate(NULL, pud, pmd);
+	return 0;
 }
 
 static inline pmd_t *stage2_pmd_offset(struct kvm *kvm,
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e9d28a7ca673..11922d84be83 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1037,6 +1037,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 			     phys_addr_t addr)
 {
+	int ret;
 	pud_t *pud;
 	pmd_t *pmd;
 
@@ -1048,7 +1049,9 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 		if (!cache)
 			return NULL;
 		pmd = mmu_memory_cache_alloc(cache);
-		stage2_pud_populate(kvm, pud, pmd);
+		ret = stage2_pud_populate(kvm, pud, pmd);
+		if (ret)
+			return ERR_PTR(ret);
 		get_page(virt_to_page(pud));
 	}
 
@@ -1061,6 +1064,9 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	pmd_t *pmd, old_pmd;
 
 	pmd = stage2_get_pmd(kvm, cache, addr);
+	if (IS_ERR(pmd))
+		return PTR_ERR(pmd);
+
 	VM_BUG_ON(!pmd);
 
 	old_pmd = *pmd;
@@ -1198,6 +1204,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 			  phys_addr_t addr, const pte_t *new_pte,
 			  unsigned long flags)
 {
+	int ret;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte, old_pte;
@@ -1227,7 +1234,9 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 		if (!cache)
 			return 0; /* ignore calls from kvm_set_spte_hva */
 		pmd = mmu_memory_cache_alloc(cache);
-		stage2_pud_populate(kvm, pud, pmd);
+		ret = stage2_pud_populate(kvm, pud, pmd);
+		if (ret)
+			return ret;
 		get_page(virt_to_page(pud));
 	}
 
-- 
2.21.0.360.g471c308f928-goog



* [PATCH v4 4/4] arm64: mm: enable per pmd page table lock
  2019-03-12  0:57     ` [PATCH v4 1/4] " Yu Zhao
  2019-03-12  0:57       ` [PATCH v4 2/4] arm64: mm: don't call page table ctors for init_mm Yu Zhao
  2019-03-12  0:57       ` [PATCH v4 3/4] arm64: mm: call ctor for stage2 pmd page Yu Zhao
@ 2019-03-12  0:57       ` Yu Zhao
  2 siblings, 0 replies; 38+ messages in thread
From: Yu Zhao @ 2019-03-12  0:57 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland
  Cc: Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
	Joel Fernandes, Kirill A . Shutemov, Ard Biesheuvel,
	Chintan Pandya, Jun Yao, Laura Abbott, linux-arm-kernel,
	linux-kernel, linux-arch, linux-mm, Yu Zhao

Switch from per mm_struct to per pmd page table lock by enabling
ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
large systems.

I'm not sure if there is contention on mm->page_table_lock. Given
the option comes at no cost (apart from initializing more spin
locks), why not enable it now?

We only do so when the pmd is not folded, so we don't mistakenly call
pgtable_pmd_page_ctor() on a pud or p4d table in pgd_pgtable_alloc().

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
 arch/arm64/include/asm/tlb.h     |  5 ++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cfbf307d6dc4..a3b1b789f766 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
 config ARCH_HAS_CACHE_LINE_SIZE
 	def_bool y
 
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+	def_bool y if PGTABLE_LEVELS > 2
+
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	---help---
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 52fa47c73bf0..dabba4b2c61f 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -33,12 +33,22 @@
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)__get_free_page(PGALLOC_GFP);
+	struct page *page;
+
+	page = alloc_page(PGALLOC_GFP);
+	if (!page)
+		return NULL;
+	if (!pgtable_pmd_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
+	return page_address(page);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
 {
 	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
+	pgtable_pmd_page_dtor(virt_to_page(pmdp));
 	free_page((unsigned long)pmdp);
 }
 
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 106fdc951b6e..4e3becfed387 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -62,7 +62,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pmdp));
+	struct page *page = virt_to_page(pmdp);
+
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif
 
-- 
2.21.0.360.g471c308f928-goog



end of thread, other threads:[~2019-03-12  0:58 UTC | newest]

Thread overview: 38+ messages
2019-02-14 21:16 [PATCH] arm64: mm: enable per pmd page table lock Yu Zhao
2019-02-18 15:12 ` Will Deacon
2019-02-18 19:49   ` Yu Zhao
2019-02-18 20:48     ` Yu Zhao
2019-02-19  4:09     ` Anshuman Khandual
2019-02-18 23:13 ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Yu Zhao
2019-02-18 23:13   ` [PATCH v2 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
2019-02-26 15:13     ` Mark Rutland
2019-03-09  3:52       ` Yu Zhao
2019-02-18 23:13   ` [PATCH v2 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
2019-02-19  4:21   ` [PATCH v2 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
2019-02-19  5:32     ` Yu Zhao
2019-02-19  6:17       ` Anshuman Khandual
2019-02-19 22:28         ` Yu Zhao
2019-02-20 10:27           ` Anshuman Khandual
2019-02-20 12:24             ` Matthew Wilcox
2019-02-20 20:22             ` Yu Zhao
2019-02-20 20:59               ` Matthew Wilcox
2019-02-20  1:34         ` Matthew Wilcox
2019-02-20  3:20           ` Anshuman Khandual
2019-02-20 21:03       ` Matthew Wilcox
2019-02-26 15:12   ` Mark Rutland
2019-03-09  4:01     ` Yu Zhao
2019-03-10  1:19   ` [PATCH v3 " Yu Zhao
2019-03-10  1:19     ` [PATCH v3 2/3] arm64: mm: don't call page table ctors for init_mm Yu Zhao
2019-03-10  1:19     ` [PATCH v3 3/3] arm64: mm: enable per pmd page table lock Yu Zhao
2019-03-11  8:28       ` Anshuman Khandual
2019-03-11 23:10         ` Yu Zhao
2019-03-11 12:12       ` Mark Rutland
2019-03-11 12:57         ` Anshuman Khandual
2019-03-11 23:11         ` Yu Zhao
2019-03-11  7:45     ` [PATCH v3 1/3] arm64: mm: use appropriate ctors for page tables Anshuman Khandual
2019-03-11 23:23       ` Yu Zhao
2019-03-12  0:57     ` [PATCH v4 1/4] " Yu Zhao
2019-03-12  0:57       ` [PATCH v4 2/4] arm64: mm: don't call page table ctors for init_mm Yu Zhao
2019-03-12  0:57       ` [PATCH v4 3/4] arm64: mm: call ctor for stage2 pmd page Yu Zhao
2019-03-12  0:57       ` [PATCH v4 4/4] arm64: mm: enable per pmd page table lock Yu Zhao
2019-02-19  3:08 ` [PATCH] " Anshuman Khandual
