* [PATCH  00/17] Implement use of HW assistance on TLB table walk on 8xx
@ 2018-05-04 12:33 Christophe Leroy
  2018-05-04 12:33 ` [PATCH 01/17] powerpc/nohash: remove hash related code from nohash headers Christophe Leroy
                   ` (17 more replies)
  0 siblings, 18 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:33 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

The purpose of this series is to implement hardware assistance for TLB table
walk on the 8xx.

The first part makes L1 entries and L2 entries independent.
For that, we need to alter the ioremap functions in order to handle the GUARDED
attribute at the PGD/PMD level.

The last part tries to reuse the PTE fragments implemented on PPC64 in order
to avoid wasting 16k pages on page tables of which only 4k are used. For the
time being it doesn't work, but I include it in the series anyway in order to
get feedback.

Tested successfully on 8xx up to and including the second-to-last patch.

I didn't have time to do compilation tests on other configs. I am sending it
anyway, before leaving for a one-week vacation, in order to get feedback.

Christophe Leroy (17):
  powerpc/nohash: remove hash related code from nohash headers.
  powerpc/nohash: remove _PAGE_BUSY
  powerpc/nohash: use IS_ENABLED() to simplify __set_pte_at()
  Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for
    CONFIG_SWAP"
  powerpc: move io mapping functions into ioremap.c
  powerpc: common ioremap functions.
  powerpc: make ioremap_bot common to PPC32 and PPC64
  powerpc: make __iounmap() common to PPC32 and PPC64
  powerpc: make __ioremap_caller() common to PPC32 and PPC64
  powerpc: use _ALIGN macro
  powerpc/nohash32: set GUARDED attribute in the PMD directly
  powerpc/8xx: Remove PTE_ATOMIC_UPDATES
  powerpc/mm: Use hardware assistance in TLB handlers on the 8xx
  powerpc/8xx: reunify TLB handler routines
  powerpc/8xx: Free up SPRN_SPRG_SCRATCH2
  powerpc/mm: Make pte_fragment_alloc() common to PPC32 and PPC64
  powerpc/mm: Use pte_fragment_alloc() on 8xx (Not Working yet)

 arch/powerpc/include/asm/book3s/32/pgtable.h |  16 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h |   2 +
 arch/powerpc/include/asm/hugetlb.h           |   4 +-
 arch/powerpc/include/asm/machdep.h           |   2 +-
 arch/powerpc/include/asm/mmu-8xx.h           |  38 +--
 arch/powerpc/include/asm/mmu_context.h       |  28 +++
 arch/powerpc/include/asm/nohash/32/pgalloc.h |  39 ++-
 arch/powerpc/include/asm/nohash/32/pgtable.h |  88 +++----
 arch/powerpc/include/asm/nohash/32/pte-8xx.h |   6 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h |  26 +-
 arch/powerpc/include/asm/nohash/pgtable.h    |  61 ++---
 arch/powerpc/include/asm/nohash/pte-book3e.h |   6 -
 arch/powerpc/include/asm/pgtable-types.h     |   4 +
 arch/powerpc/kernel/head_8xx.S               | 350 ++++++++++-----------------
 arch/powerpc/mm/8xx_mmu.c                    |  12 +-
 arch/powerpc/mm/Makefile                     |   2 +-
 arch/powerpc/mm/dma-noncoherent.c            |   2 +-
 arch/powerpc/mm/dump_linuxpagetables.c       |  32 ++-
 arch/powerpc/mm/hugetlbpage.c                |  12 +
 arch/powerpc/mm/init_32.c                    |   6 +-
 arch/powerpc/mm/ioremap.c                    | 250 +++++++++++++++++++
 arch/powerpc/mm/mem.c                        |  16 +-
 arch/powerpc/mm/mmu_context_book3s64.c       |  28 ---
 arch/powerpc/mm/mmu_context_nohash.c         |   4 +
 arch/powerpc/mm/pgtable.c                    |  75 ++++++
 arch/powerpc/mm/pgtable_32.c                 | 167 +++----------
 arch/powerpc/mm/pgtable_64.c                 | 244 -------------------
 arch/powerpc/platforms/Kconfig.cputype       |   9 +
 28 files changed, 730 insertions(+), 799 deletions(-)
 create mode 100644 arch/powerpc/mm/ioremap.c

-- 
2.13.3


* [PATCH  01/17] powerpc/nohash: remove hash related code from nohash headers.
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
@ 2018-05-04 12:33 ` Christophe Leroy
  2018-05-08  8:25   ` Aneesh Kumar K.V
  2018-05-04 12:33 ` [PATCH 02/17] powerpc/nohash: remove _PAGE_BUSY Christophe Leroy
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:33 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

When the nohash and book3s headers were split, some hash-related code
remained in the nohash headers. This patch removes it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 Removed the call to pte_young() as it fails, back to using PAGE_ACCESSED directly.

 arch/powerpc/include/asm/nohash/32/pgtable.h | 29 +++------------------
 arch/powerpc/include/asm/nohash/64/pgtable.h | 16 ++----------
 arch/powerpc/include/asm/nohash/pgtable.h    | 38 +++-------------------------
 arch/powerpc/include/asm/nohash/pte-book3e.h |  1 -
 4 files changed, 10 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 03bbd1149530..140f8e74b478 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -133,7 +133,7 @@ extern int icache_44x_need_flush;
 #ifndef __ASSEMBLY__
 
 #define pte_clear(mm, addr, ptep) \
-	do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)
+	do { pte_update(ptep, ~0, 0); } while (0)
 
 #define pmd_none(pmd)		(!pmd_val(pmd))
 #define	pmd_bad(pmd)		(pmd_val(pmd) & _PMD_BAD)
@@ -146,21 +146,6 @@ static inline void pmd_clear(pmd_t *pmdp)
 
 
 /*
- * When flushing the tlb entry for a page, we also need to flush the hash
- * table entry.  flush_hash_pages is assembler (for speed) in hashtable.S.
- */
-extern int flush_hash_pages(unsigned context, unsigned long va,
-			    unsigned long pmdval, int count);
-
-/* Add an HPTE to the hash table */
-extern void add_hash_page(unsigned context, unsigned long va,
-			  unsigned long pmdval);
-
-/* Flush an entry from the TLB/hash table */
-extern void flush_hash_entry(struct mm_struct *mm, pte_t *ptep,
-			     unsigned long address);
-
-/*
  * PTE updates. This function is called whenever an existing
  * valid PTE is updated. This does -not- include set_pte_at()
  * which nowadays only sets a new PTE.
@@ -246,12 +231,6 @@ static inline int __ptep_test_and_clear_young(unsigned int context, unsigned lon
 {
 	unsigned long old;
 	old = pte_update(ptep, _PAGE_ACCESSED, 0);
-#if _PAGE_HASHPTE != 0
-	if (old & _PAGE_HASHPTE) {
-		unsigned long ptephys = __pa(ptep) & PAGE_MASK;
-		flush_hash_pages(context, addr, ptephys, 1);
-	}
-#endif
 	return (old & _PAGE_ACCESSED) != 0;
 }
 #define ptep_test_and_clear_young(__vma, __addr, __ptep) \
@@ -261,7 +240,7 @@ static inline int __ptep_test_and_clear_young(unsigned int context, unsigned lon
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pte_t *ptep)
 {
-	return __pte(pte_update(ptep, ~_PAGE_HASHPTE, 0));
+	return __pte(pte_update(ptep, ~0, 0));
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
@@ -289,7 +268,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_PTE_SAME
-#define pte_same(A,B)	(((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
+#define pte_same(A,B)	((pte_val(A) ^ pte_val(B)) == 0)
 
 /*
  * Note that on Book E processors, the pmd contains the kernel virtual
@@ -330,7 +309,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 /*
  * Encode and decode a swap entry.
  * Note that the bits we use in a PTE for representing a swap entry
- * must not include the _PAGE_PRESENT bit or the _PAGE_HASHPTE bit (if used).
+ * must not include the _PAGE_PRESENT bit.
  *   -- paulus
  */
 #define __swp_type(entry)		((entry).val & 0x1f)
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 5c5f75d005ad..4f6f5a27bfb5 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -173,8 +173,6 @@ static inline void pgd_set(pgd_t *pgdp, unsigned long val)
 /* to find an entry in a kernel page-table-directory */
 /* This now only contains the vmalloc pages */
 #define pgd_offset_k(address) pgd_offset(&init_mm, address)
-extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
-			    pte_t *ptep, unsigned long pte, int huge);
 
 /* Atomic PTE updates */
 static inline unsigned long pte_update(struct mm_struct *mm,
@@ -205,11 +203,6 @@ static inline unsigned long pte_update(struct mm_struct *mm,
 	if (!huge)
 		assert_pte_locked(mm, addr);
 
-#ifdef CONFIG_PPC_BOOK3S_64
-	if (old & _PAGE_HASHPTE)
-		hpte_need_flush(mm, addr, ptep, old, huge);
-#endif
-
 	return old;
 }
 
@@ -218,7 +211,7 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 {
 	unsigned long old;
 
-	if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
+	if ((pte_val(*ptep) & _PAGE_ACCESSED) == 0)
 		return 0;
 	old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
 	return (old & _PAGE_ACCESSED) != 0;
@@ -312,7 +305,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_PTE_SAME
-#define pte_same(A,B)	(((pte_val(A) ^ pte_val(B)) & ~_PAGE_HPTEFLAGS) == 0)
+#define pte_same(A,B)	((pte_val(A) ^ pte_val(B)) == 0)
 
 #define pte_ERROR(e) \
 	pr_err("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
@@ -324,11 +317,6 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 /* Encode and de-code a swap entry */
 #define MAX_SWAPFILES_CHECK() do { \
 	BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
-	/*							\
-	 * Don't have overlapping bits with _PAGE_HPTEFLAGS	\
-	 * We filter HPTEFLAGS on set_pte.			\
-	 */							\
-	BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
 	} while (0)
 /*
  * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index c56de1e8026f..f2fe3cbe90af 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -148,37 +148,16 @@ extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 				pte_t *ptep, pte_t pte, int percpu)
 {
-#if defined(CONFIG_PPC_STD_MMU_32) && defined(CONFIG_SMP) && !defined(CONFIG_PTE_64BIT)
-	/* First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use the
-	 * helper pte_update() which does an atomic update. We need to do that
-	 * because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
-	 * per-CPU PTE such as a kmap_atomic, we do a simple update preserving
-	 * the hash bits instead (ie, same as the non-SMP case)
-	 */
-	if (percpu)
-		*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
-			      | (pte_val(pte) & ~_PAGE_HASHPTE));
-	else
-		pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte));
-
-#elif defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
+#if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
 	/* Second case is 32-bit with 64-bit PTE.  In this case, we
 	 * can just store as long as we do the two halves in the right order
-	 * with a barrier in between. This is possible because we take care,
-	 * in the hash code, to pre-invalidate if the PTE was already hashed,
-	 * which synchronizes us with any concurrent invalidation.
-	 * In the percpu case, we also fallback to the simple update preserving
-	 * the hash bits
+	 * with a barrier in between.
+	 * In the percpu case, we also fallback to the simple update
 	 */
 	if (percpu) {
-		*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
-			      | (pte_val(pte) & ~_PAGE_HASHPTE));
+		*ptep = pte;
 		return;
 	}
-#if _PAGE_HASHPTE != 0
-	if (pte_val(*ptep) & _PAGE_HASHPTE)
-		flush_hash_entry(mm, ptep, addr);
-#endif
 	__asm__ __volatile__("\
 		stw%U0%X0 %2,%0\n\
 		eieio\n\
@@ -186,15 +165,6 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
 	: "r" (pte) : "memory");
 
-#elif defined(CONFIG_PPC_STD_MMU_32)
-	/* Third case is 32-bit hash table in UP mode, we need to preserve
-	 * the _PAGE_HASHPTE bit since we may not have invalidated the previous
-	 * translation in the hash yet (done in a subsequent flush_tlb_xxx())
-	 * and see we need to keep track that this PTE needs invalidating
-	 */
-	*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
-		      | (pte_val(pte) & ~_PAGE_HASHPTE));
-
 #else
 	/* Anything else just stores the PTE normally. That covers all 64-bit
 	 * cases, and 32-bit non-hash with 32-bit PTEs.
diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index ccee8eb509bb..9ff51b4c0cac 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -57,7 +57,6 @@
 #define _PAGE_USER		(_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */
 #define _PAGE_PRIVILEGED	(_PAGE_BAP_SR)
 
-#define _PAGE_HASHPTE	0
 #define _PAGE_BUSY	0
 
 #define _PAGE_SPECIAL	_PAGE_SW0
-- 
2.13.3


* [PATCH  02/17] powerpc/nohash: remove _PAGE_BUSY
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
  2018-05-04 12:33 ` [PATCH 01/17] powerpc/nohash: remove hash related code from nohash headers Christophe Leroy
@ 2018-05-04 12:33 ` Christophe Leroy
  2018-05-08  8:26   ` Aneesh Kumar K.V
  2018-05-04 12:33 ` [PATCH 03/17] powerpc/nohash: use IS_ENABLED() to simplify __set_pte_at() Christophe Leroy
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:33 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

_PAGE_BUSY is always 0, so remove it together with the code that tests it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/nohash/64/pgtable.h | 10 +++-------
 arch/powerpc/include/asm/nohash/pte-book3e.h |  5 -----
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 4f6f5a27bfb5..c3559d7a94fb 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -186,14 +186,12 @@ static inline unsigned long pte_update(struct mm_struct *mm,
 
 	__asm__ __volatile__(
 	"1:	ldarx	%0,0,%3		# pte_update\n\
-	andi.	%1,%0,%6\n\
-	bne-	1b \n\
 	andc	%1,%0,%4 \n\
-	or	%1,%1,%7\n\
+	or	%1,%1,%6\n\
 	stdcx.	%1,0,%3 \n\
 	bne-	1b"
 	: "=&r" (old), "=&r" (tmp), "=m" (*ptep)
-	: "r" (ptep), "r" (clr), "m" (*ptep), "i" (_PAGE_BUSY), "r" (set)
+	: "r" (ptep), "r" (clr), "m" (*ptep), "r" (set)
 	: "cc" );
 #else
 	unsigned long old = pte_val(*ptep);
@@ -290,13 +288,11 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
 
 	__asm__ __volatile__(
 	"1:	ldarx	%0,0,%4\n\
-		andi.	%1,%0,%6\n\
-		bne-	1b \n\
 		or	%0,%3,%0\n\
 		stdcx.	%0,0,%4\n\
 		bne-	1b"
 	:"=&r" (old), "=&r" (tmp), "=m" (*ptep)
-	:"r" (bits), "r" (ptep), "m" (*ptep), "i" (_PAGE_BUSY)
+	:"r" (bits), "r" (ptep), "m" (*ptep)
 	:"cc");
 #else
 	unsigned long old = pte_val(*ptep);
diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
index 9ff51b4c0cac..12730b81cd98 100644
--- a/arch/powerpc/include/asm/nohash/pte-book3e.h
+++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
@@ -57,13 +57,8 @@
 #define _PAGE_USER		(_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */
 #define _PAGE_PRIVILEGED	(_PAGE_BAP_SR)
 
-#define _PAGE_BUSY	0
-
 #define _PAGE_SPECIAL	_PAGE_SW0
 
-/* Flags to be preserved on PTE modifications */
-#define _PAGE_HPTEFLAGS	_PAGE_BUSY
-
 /* Base page size */
 #ifdef CONFIG_PPC_64K_PAGES
 #define _PAGE_PSIZE	_PAGE_PSIZE_64K
-- 
2.13.3


* [PATCH  03/17] powerpc/nohash: use IS_ENABLED() to simplify __set_pte_at()
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
  2018-05-04 12:33 ` [PATCH 01/17] powerpc/nohash: remove hash related code from nohash headers Christophe Leroy
  2018-05-04 12:33 ` [PATCH 02/17] powerpc/nohash: remove _PAGE_BUSY Christophe Leroy
@ 2018-05-04 12:33 ` Christophe Leroy
  2018-05-04 12:33 ` [PATCH 04/17] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" Christophe Leroy
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:33 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

By using IS_ENABLED() we can simplify __set_pte_at() by removing the
redundant *ptep = pte.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/nohash/pgtable.h | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index f2fe3cbe90af..077472640b35 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -148,40 +148,33 @@ extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 				pte_t *ptep, pte_t pte, int percpu)
 {
-#if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
 	/* Second case is 32-bit with 64-bit PTE.  In this case, we
 	 * can just store as long as we do the two halves in the right order
 	 * with a barrier in between.
 	 * In the percpu case, we also fallback to the simple update
 	 */
-	if (percpu) {
-		*ptep = pte;
+	if (IS_ENABLED(CONFIG_PPC32) && IS_ENABLED(CONFIG_PTE_64BIT) && !percpu) {
+		__asm__ __volatile__("\
+			stw%U0%X0 %2,%0\n\
+			eieio\n\
+			stw%U0%X0 %L2,%1"
+		: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
+		: "r" (pte) : "memory");
 		return;
 	}
-	__asm__ __volatile__("\
-		stw%U0%X0 %2,%0\n\
-		eieio\n\
-		stw%U0%X0 %L2,%1"
-	: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
-	: "r" (pte) : "memory");
-
-#else
 	/* Anything else just stores the PTE normally. That covers all 64-bit
 	 * cases, and 32-bit non-hash with 32-bit PTEs.
 	 */
 	*ptep = pte;
 
-#ifdef CONFIG_PPC_BOOK3E_64
 	/*
 	 * With hardware tablewalk, a sync is needed to ensure that
 	 * subsequent accesses see the PTE we just wrote.  Unlike userspace
 	 * mappings, we can't tolerate spurious faults, so make sure
 	 * the new PTE will be seen the first time.
 	 */
-	if (is_kernel_addr(addr))
+	if (IS_ENABLED(CONFIG_PPC_BOOK3E_64) && is_kernel_addr(addr))
 		mb();
-#endif
-#endif
 }
 
 
-- 
2.13.3


* [PATCH  04/17] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP"
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (2 preceding siblings ...)
  2018-05-04 12:33 ` [PATCH 03/17] powerpc/nohash: use IS_ENABLED() to simplify __set_pte_at() Christophe Leroy
@ 2018-05-04 12:33 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 05/17] powerpc: move io mapping functions into ioremap.c Christophe Leroy
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:33 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

This reverts commit 4f94b2c7462d9720b2afa7e8e8d4c19446bb31ce.

That commit was buggy, as it used rlwinm instead of rlwimi.
Instead of fixing that bug, we revert the commit in order to
reduce the dependency between L1 entries and L2 entries.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/mmu-8xx.h | 34 +++++-----------------------
 arch/powerpc/kernel/head_8xx.S     | 45 +++++++++++++++++++++++---------------
 arch/powerpc/mm/8xx_mmu.c          |  2 +-
 3 files changed, 34 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index 4f547752ae79..193f53116c7a 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -34,20 +34,12 @@
  * respectively NA for All or X for Supervisor and no access for User.
  * Then we use the APG to say whether accesses are according to Page rules or
  * "all Supervisor" rules (Access to all)
- * We also use the 2nd APG bit for _PAGE_ACCESSED when having SWAP:
- * When that bit is not set access is done iaw "all user"
- * which means no access iaw page rules.
- * Therefore, we define 4 APG groups. lsb is _PMD_USER, 2nd is _PAGE_ACCESSED
- * 0x => No access => 11 (all accesses performed as user iaw page definition)
- * 10 => No user => 01 (all accesses performed according to page definition)
- * 11 => User => 00 (all accesses performed as supervisor iaw page definition)
+ * Therefore, we define 2 APG groups. lsb is _PMD_USER
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
  * We define all 16 groups so that all other bits of APG can take any value
  */
-#ifdef CONFIG_SWAP
-#define MI_APG_INIT	0xf4f4f4f4
-#else
 #define MI_APG_INIT	0x44444444
-#endif
 
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MI_RPN is written, bits in
@@ -115,20 +107,12 @@
  * Supervisor and no access for user and NA for ALL.
  * Then we use the APG to say whether accesses are according to Page rules or
  * "all Supervisor" rules (Access to all)
- * We also use the 2nd APG bit for _PAGE_ACCESSED when having SWAP:
- * When that bit is not set access is done iaw "all user"
- * which means no access iaw page rules.
- * Therefore, we define 4 APG groups. lsb is _PMD_USER, 2nd is _PAGE_ACCESSED
- * 0x => No access => 11 (all accesses performed as user iaw page definition)
- * 10 => No user => 01 (all accesses performed according to page definition)
- * 11 => User => 00 (all accesses performed as supervisor iaw page definition)
+ * Therefore, we define 2 APG groups. lsb is _PMD_USER
+ * 0 => No user => 01 (all accesses performed according to page definition)
+ * 1 => User => 00 (all accesses performed as supervisor iaw page definition)
  * We define all 16 groups so that all other bits of APG can take any value
  */
-#ifdef CONFIG_SWAP
-#define MD_APG_INIT	0xf4f4f4f4
-#else
 #define MD_APG_INIT	0x44444444
-#endif
 
 /* The effective page number register.  When read, contains the information
  * about the last instruction TLB miss.  When MD_RPN is written, bits in
@@ -180,12 +164,6 @@
  */
 #define SPRN_M_TW	799
 
-/* APGs */
-#define M_APG0		0x00000000
-#define M_APG1		0x00000020
-#define M_APG2		0x00000040
-#define M_APG3		0x00000060
-
 #ifdef CONFIG_PPC_MM_SLICES
 #include <asm/nohash/32/slice.h>
 #define SLICE_ARRAY_SIZE	(1 << (32 - SLICE_LOW_SHIFT - 1))
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index d8670a37d70c..c3b831bb8bad 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -354,13 +354,14 @@ _ENTRY(ITLBMiss_cmp)
 #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
 	mtcr	r12
 #endif
-
-#ifdef CONFIG_SWAP
-	rlwinm	r11, r10, 31, _PAGE_ACCESSED >> 1
-#endif
 	/* Load the MI_TWC with the attributes for this "segment." */
 	mtspr	SPRN_MI_TWC, r11	/* Set segment attributes */
 
+#ifdef CONFIG_SWAP
+	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
+	and	r11, r11, r10
+	rlwimi	r10, r11, 0, _PAGE_PRESENT
+#endif
 	li	r11, RPN_PATTERN | 0x200
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 20 and 23 must be clear.
@@ -471,14 +472,22 @@ _ENTRY(DTLBMiss_jmp)
 	 * above.
 	 */
 	rlwimi	r11, r10, 0, _PAGE_GUARDED
-#ifdef CONFIG_SWAP
-	/* _PAGE_ACCESSED has to be set. We use second APG bit for that, 0
-	 * on that bit will represent a Non Access group
-	 */
-	rlwinm	r11, r10, 31, _PAGE_ACCESSED >> 1
-#endif
 	mtspr	SPRN_MD_TWC, r11
 
+	/* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
+	 * We also need to know if the insn is a load/store, so:
+	 * Clear _PAGE_PRESENT and load that which will
+	 * trap into DTLB Error with store bit set accordinly.
+	 */
+	/* PRESENT=0x1, ACCESSED=0x20
+	 * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
+	 * r10 = (r10 & ~PRESENT) | r11;
+	 */
+#ifdef CONFIG_SWAP
+	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
+	and	r11, r11, r10
+	rlwimi	r10, r11, 0, _PAGE_PRESENT
+#endif
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
@@ -638,8 +647,8 @@ InstructionBreakpoint:
  */
 DTLBMissIMMR:
 	mtcr	r12
-	/* Set 512k byte guarded page and mark it valid and accessed */
-	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID | M_APG2
+	/* Set 512k byte guarded page and mark it valid */
+	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID
 	mtspr	SPRN_MD_TWC, r10
 	mfspr	r10, SPRN_IMMR			/* Get current IMMR */
 	rlwinm	r10, r10, 0, 0xfff80000		/* Get 512 kbytes boundary */
@@ -657,8 +666,8 @@ _ENTRY(dtlb_miss_exit_2)
 
 DTLBMissLinear:
 	mtcr	r12
-	/* Set 8M byte page and mark it valid and accessed */
-	li	r11, MD_PS8MEG | MD_SVALID | M_APG2
+	/* Set 8M byte page and mark it valid */
+	li	r11, MD_PS8MEG | MD_SVALID
 	mtspr	SPRN_MD_TWC, r11
 	rlwinm	r10, r10, 0, 0x0f800000	/* 8xx supports max 256Mb RAM */
 	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
@@ -676,8 +685,8 @@ _ENTRY(dtlb_miss_exit_3)
 #ifndef CONFIG_PIN_TLB_TEXT
 ITLBMissLinear:
 	mtcr	r12
-	/* Set 8M byte page and mark it valid,accessed */
-	li	r11, MI_PS8MEG | MI_SVALID | M_APG2
+	/* Set 8M byte page and mark it valid */
+	li	r11, MI_PS8MEG | MI_SVALID
 	mtspr	SPRN_MI_TWC, r11
 	rlwinm	r10, r10, 0, 0x0f800000	/* 8xx supports max 256Mb RAM */
 	ori	r10, r10, 0xf0 | MI_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
@@ -960,7 +969,7 @@ initial_mmu:
 	ori	r8, r8, MI_EVALID	/* Mark it valid */
 	mtspr	SPRN_MI_EPN, r8
 	li	r8, MI_PS8MEG /* Set 8M byte page */
-	ori	r8, r8, MI_SVALID | M_APG2	/* Make it valid, APG 2 */
+	ori	r8, r8, MI_SVALID	/* Make it valid */
 	mtspr	SPRN_MI_TWC, r8
 	li	r8, MI_BOOTINIT		/* Create RPN for address 0 */
 	mtspr	SPRN_MI_RPN, r8		/* Store TLB entry */
@@ -987,7 +996,7 @@ initial_mmu:
 	ori	r8, r8, MD_EVALID	/* Mark it valid */
 	mtspr	SPRN_MD_EPN, r8
 	li	r8, MD_PS512K | MD_GUARDED	/* Set 512k byte page */
-	ori	r8, r8, MD_SVALID | M_APG2	/* Make it valid and accessed */
+	ori	r8, r8, MD_SVALID	/* Make it valid */
 	mtspr	SPRN_MD_TWC, r8
 	mr	r8, r9			/* Create paddr for TLB */
 	ori	r8, r8, MI_BOOTINIT|0x2 /* Inhibit cache -- Cort */
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index cf77d755246d..5d53684c2ebd 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -79,7 +79,7 @@ void __init MMU_init_hw(void)
 	for (; i < 32 && mem >= LARGE_PAGE_SIZE_8M; i++) {
 		mtspr(SPRN_MD_CTR, ctr | (i << 8));
 		mtspr(SPRN_MD_EPN, (unsigned long)__va(addr) | MD_EVALID);
-		mtspr(SPRN_MD_TWC, MD_PS8MEG | MD_SVALID | M_APG2);
+		mtspr(SPRN_MD_TWC, MD_PS8MEG | MD_SVALID);
 		mtspr(SPRN_MD_RPN, addr | flags | _PAGE_PRESENT);
 		addr += LARGE_PAGE_SIZE_8M;
 		mem -= LARGE_PAGE_SIZE_8M;
-- 
2.13.3


* [PATCH  05/17] powerpc: move io mapping functions into ioremap.c
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (3 preceding siblings ...)
  2018-05-04 12:33 ` [PATCH 04/17] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-11  6:01   ` Michael Ellerman
  2018-05-04 12:34 ` [PATCH 06/17] powerpc: common ioremap functions Christophe Leroy
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

This patch is the first of a series that intends to make
io mappings common to PPC32 and PPC64.

It moves the ioremap/iounmap functions into a new file called ioremap.c
with no other modification to the functions.
For the time being, the PPC32 and PPC64 parts are enclosed in #ifdef.
Following patches will aim at making those functions as common as
possible between PPC32 and PPC64.

This patch also moves each EXPORT_SYMBOL to the end of its function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/mm/Makefile     |   2 +-
 arch/powerpc/mm/ioremap.c    | 350 +++++++++++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/pgtable_32.c | 139 -----------------
 arch/powerpc/mm/pgtable_64.c | 177 ----------------------
 4 files changed, 351 insertions(+), 317 deletions(-)
 create mode 100644 arch/powerpc/mm/ioremap.c

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index f06f3577d8d1..22d54c1d90e1 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -9,7 +9,7 @@ ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)
 
 obj-y				:= fault.o mem.o pgtable.o mmap.o \
 				   init_$(BITS).o pgtable_$(BITS).o \
-				   init-common.o mmu_context.o drmem.o
+				   init-common.o mmu_context.o drmem.o ioremap.o
 obj-$(CONFIG_PPC_MMU_NOHASH)	+= mmu_context_nohash.o tlb_nohash.o \
 				   tlb_nohash_low.o
 obj-$(CONFIG_PPC_BOOK3E)	+= tlb_low_$(BITS)e.o
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
new file mode 100644
index 000000000000..5d2645193568
--- /dev/null
+++ b/arch/powerpc/mm/ioremap.c
@@ -0,0 +1,350 @@
+/*
+ * This file contains the routines for mapping IO areas
+ *
+ *  Derived from arch/powerpc/mm/pgtable_32.c and
+ *  arch/powerpc/mm/pgtable_64.c
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/init.h>
+#include <linux/highmem.h>
+#include <linux/memblock.h>
+#include <linux/slab.h>
+
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/fixmap.h>
+#include <asm/io.h>
+#include <asm/setup.h>
+#include <asm/sections.h>
+
+#include "mmu_decl.h"
+
+#ifdef CONFIG_PPC32
+
+unsigned long ioremap_bot;
+EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
+
+void __iomem *
+ioremap(phys_addr_t addr, unsigned long size)
+{
+	return __ioremap_caller(addr, size, _PAGE_NO_CACHE | _PAGE_GUARDED,
+				__builtin_return_address(0));
+}
+EXPORT_SYMBOL(ioremap);
+
+void __iomem *
+ioremap_wc(phys_addr_t addr, unsigned long size)
+{
+	return __ioremap_caller(addr, size, _PAGE_NO_CACHE,
+				__builtin_return_address(0));
+}
+EXPORT_SYMBOL(ioremap_wc);
+
+void __iomem *
+ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
+{
+	/* writeable implies dirty for kernel addresses */
+	if ((flags & (_PAGE_RW | _PAGE_RO)) != _PAGE_RO)
+		flags |= _PAGE_DIRTY | _PAGE_HWWRITE;
+
+	/* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
+	flags &= ~(_PAGE_USER | _PAGE_EXEC);
+	flags |= _PAGE_PRIVILEGED;
+
+	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(ioremap_prot);
+
+void __iomem *
+__ioremap(phys_addr_t addr, unsigned long size, unsigned long flags)
+{
+	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(__ioremap);
+
+void __iomem *
+__ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
+		 void *caller)
+{
+	unsigned long v, i;
+	phys_addr_t p;
+	int err;
+
+	/* Make sure we have the base flags */
+	if ((flags & _PAGE_PRESENT) == 0)
+		flags |= pgprot_val(PAGE_KERNEL);
+
+	/* Non-cacheable page cannot be coherent */
+	if (flags & _PAGE_NO_CACHE)
+		flags &= ~_PAGE_COHERENT;
+
+	/*
+	 * Choose an address to map it to.
+	 * Once the vmalloc system is running, we use it.
+	 * Before then, we use space going down from IOREMAP_TOP
+	 * (ioremap_bot records where we're up to).
+	 */
+	p = addr & PAGE_MASK;
+	size = PAGE_ALIGN(addr + size) - p;
+
+	/*
+	 * If the address lies within the first 16 MB, assume it's in ISA
+	 * memory space
+	 */
+	if (p < 16*1024*1024)
+		p += _ISA_MEM_BASE;
+
+#ifndef CONFIG_CRASH_DUMP
+	/*
+	 * Don't allow anybody to remap normal RAM that we're using.
+	 * mem_init() sets high_memory so only do the check after that.
+	 */
+	if (slab_is_available() && (p < virt_to_phys(high_memory)) &&
+	    page_is_ram(__phys_to_pfn(p))) {
+		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
+		       (unsigned long long)p, __builtin_return_address(0));
+		return NULL;
+	}
+#endif
+
+	if (size == 0)
+		return NULL;
+
+	/*
+	 * Is it already mapped?  Perhaps overlapped by a previous
+	 * mapping.
+	 */
+	v = p_block_mapped(p);
+	if (v)
+		goto out;
+
+	if (slab_is_available()) {
+		struct vm_struct *area;
+		area = get_vm_area_caller(size, VM_IOREMAP, caller);
+		if (area == 0)
+			return NULL;
+		area->phys_addr = p;
+		v = (unsigned long) area->addr;
+	} else {
+		v = (ioremap_bot -= size);
+	}
+
+	/*
+	 * Should check if it is a candidate for a BAT mapping
+	 */
+
+	err = 0;
+	for (i = 0; i < size && err == 0; i += PAGE_SIZE)
+		err = map_kernel_page(v+i, p+i, flags);
+	if (err) {
+		if (slab_is_available())
+			vunmap((void *)v);
+		return NULL;
+	}
+
+out:
+	return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
+}
+
+void iounmap(volatile void __iomem *addr)
+{
+	/*
+	 * If mapped by BATs then there is nothing to do.
+	 * Calling vfree() generates a benign warning.
+	 */
+	if (v_block_mapped((unsigned long)addr))
+		return;
+
+	if (addr > high_memory && (unsigned long) addr < ioremap_bot)
+		vunmap((void *) (PAGE_MASK & (unsigned long)addr));
+}
+EXPORT_SYMBOL(iounmap);
+
+#else
+
+#ifdef CONFIG_PPC_BOOK3S_64
+unsigned long ioremap_bot;
+#else /* !CONFIG_PPC_BOOK3S_64 */
+unsigned long ioremap_bot = IOREMAP_BASE;
+#endif
+
+/**
+ * __ioremap_at - Low level function to establish the page tables
+ *                for an IO mapping
+ */
+void __iomem * __ioremap_at(phys_addr_t pa, void *ea, unsigned long size,
+			    unsigned long flags)
+{
+	unsigned long i;
+
+	/* Make sure we have the base flags */
+	if ((flags & _PAGE_PRESENT) == 0)
+		flags |= pgprot_val(PAGE_KERNEL);
+
+	/* We don't support the 4K PFN hack with ioremap */
+	if (flags & H_PAGE_4K_PFN)
+		return NULL;
+
+	WARN_ON(pa & ~PAGE_MASK);
+	WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
+	WARN_ON(size & ~PAGE_MASK);
+
+	for (i = 0; i < size; i += PAGE_SIZE)
+		if (map_kernel_page((unsigned long)ea+i, pa+i, flags))
+			return NULL;
+
+	return (void __iomem *)ea;
+}
+EXPORT_SYMBOL(__ioremap_at);
+
+/**
+ * __iounmap_from - Low level function to tear down the page tables
+ *                  for an IO mapping. This is used for mappings that
+ *                  are manipulated manually, like partial unmapping of
+ *                  PCI IOs or ISA space.
+ */
+void __iounmap_at(void *ea, unsigned long size)
+{
+	WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
+	WARN_ON(size & ~PAGE_MASK);
+
+	unmap_kernel_range((unsigned long)ea, size);
+}
+EXPORT_SYMBOL(__iounmap_at);
+
+void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
+				unsigned long flags, void *caller)
+{
+	phys_addr_t paligned;
+	void __iomem *ret;
+
+	/*
+	 * Choose an address to map it to.
+	 * Once the imalloc system is running, we use it.
+	 * Before that, we map using addresses going
+	 * up from ioremap_bot.  imalloc will use
+	 * the addresses from ioremap_bot through
+	 * IMALLOC_END
+	 *
+	 */
+	paligned = addr & PAGE_MASK;
+	size = PAGE_ALIGN(addr + size) - paligned;
+
+	if ((size == 0) || (paligned == 0))
+		return NULL;
+
+	if (slab_is_available()) {
+		struct vm_struct *area;
+
+		area = __get_vm_area_caller(size, VM_IOREMAP,
+					    ioremap_bot, IOREMAP_END,
+					    caller);
+		if (area == NULL)
+			return NULL;
+
+		area->phys_addr = paligned;
+		ret = __ioremap_at(paligned, area->addr, size, flags);
+		if (!ret)
+			vunmap(area->addr);
+	} else {
+		ret = __ioremap_at(paligned, (void *)ioremap_bot, size, flags);
+		if (ret)
+			ioremap_bot += size;
+	}
+
+	if (ret)
+		ret += addr & ~PAGE_MASK;
+	return ret;
+}
+
+void __iomem * __ioremap(phys_addr_t addr, unsigned long size,
+			 unsigned long flags)
+{
+	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(__ioremap);
+
+void __iomem * ioremap(phys_addr_t addr, unsigned long size)
+{
+	unsigned long flags = pgprot_val(pgprot_noncached(__pgprot(0)));
+	void *caller = __builtin_return_address(0);
+
+	if (ppc_md.ioremap)
+		return ppc_md.ioremap(addr, size, flags, caller);
+	return __ioremap_caller(addr, size, flags, caller);
+}
+EXPORT_SYMBOL(ioremap);
+
+void __iomem * ioremap_wc(phys_addr_t addr, unsigned long size)
+{
+	unsigned long flags = pgprot_val(pgprot_noncached_wc(__pgprot(0)));
+	void *caller = __builtin_return_address(0);
+
+	if (ppc_md.ioremap)
+		return ppc_md.ioremap(addr, size, flags, caller);
+	return __ioremap_caller(addr, size, flags, caller);
+}
+EXPORT_SYMBOL(ioremap_wc);
+
+void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size,
+			     unsigned long flags)
+{
+	void *caller = __builtin_return_address(0);
+
+	/* writeable implies dirty for kernel addresses */
+	if (flags & _PAGE_WRITE)
+		flags |= _PAGE_DIRTY;
+
+	/* we don't want to let _PAGE_EXEC leak out */
+	flags &= ~_PAGE_EXEC;
+	/*
+	 * Force kernel mapping.
+	 */
+	flags &= ~_PAGE_USER;
+	flags |= _PAGE_PRIVILEGED;
+
+	if (ppc_md.ioremap)
+		return ppc_md.ioremap(addr, size, flags, caller);
+	return __ioremap_caller(addr, size, flags, caller);
+}
+EXPORT_SYMBOL(ioremap_prot);
+
+/*
+ * Unmap an IO region and remove it from imalloc'd list.
+ * Access to IO memory should be serialized by driver.
+ */
+void __iounmap(volatile void __iomem *token)
+{
+	void *addr;
+
+	if (!slab_is_available())
+		return;
+
+	addr = (void *) ((unsigned long __force)
+			 PCI_FIX_ADDR(token) & PAGE_MASK);
+	if ((unsigned long)addr < ioremap_bot) {
+		printk(KERN_WARNING "Attempt to iounmap early bolted mapping"
+		       " at 0x%p\n", addr);
+		return;
+	}
+	vunmap(addr);
+}
+EXPORT_SYMBOL(__iounmap);
+
+void iounmap(volatile void __iomem *token)
+{
+	if (ppc_md.iounmap)
+		ppc_md.iounmap(token);
+	else
+		__iounmap(token);
+}
+EXPORT_SYMBOL(iounmap);
+
+#endif
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 120a49bfb9c6..54a5bc0767a9 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -38,9 +38,6 @@
 
 #include "mmu_decl.h"
 
-unsigned long ioremap_bot;
-EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
-
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
@@ -73,142 +70,6 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
 	return ptepage;
 }
 
-void __iomem *
-ioremap(phys_addr_t addr, unsigned long size)
-{
-	return __ioremap_caller(addr, size, _PAGE_NO_CACHE | _PAGE_GUARDED,
-				__builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap);
-
-void __iomem *
-ioremap_wc(phys_addr_t addr, unsigned long size)
-{
-	return __ioremap_caller(addr, size, _PAGE_NO_CACHE,
-				__builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_wc);
-
-void __iomem *
-ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
-{
-	/* writeable implies dirty for kernel addresses */
-	if ((flags & (_PAGE_RW | _PAGE_RO)) != _PAGE_RO)
-		flags |= _PAGE_DIRTY | _PAGE_HWWRITE;
-
-	/* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
-	flags &= ~(_PAGE_USER | _PAGE_EXEC);
-	flags |= _PAGE_PRIVILEGED;
-
-	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_prot);
-
-void __iomem *
-__ioremap(phys_addr_t addr, unsigned long size, unsigned long flags)
-{
-	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
-}
-
-void __iomem *
-__ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
-		 void *caller)
-{
-	unsigned long v, i;
-	phys_addr_t p;
-	int err;
-
-	/* Make sure we have the base flags */
-	if ((flags & _PAGE_PRESENT) == 0)
-		flags |= pgprot_val(PAGE_KERNEL);
-
-	/* Non-cacheable page cannot be coherent */
-	if (flags & _PAGE_NO_CACHE)
-		flags &= ~_PAGE_COHERENT;
-
-	/*
-	 * Choose an address to map it to.
-	 * Once the vmalloc system is running, we use it.
-	 * Before then, we use space going down from IOREMAP_TOP
-	 * (ioremap_bot records where we're up to).
-	 */
-	p = addr & PAGE_MASK;
-	size = PAGE_ALIGN(addr + size) - p;
-
-	/*
-	 * If the address lies within the first 16 MB, assume it's in ISA
-	 * memory space
-	 */
-	if (p < 16*1024*1024)
-		p += _ISA_MEM_BASE;
-
-#ifndef CONFIG_CRASH_DUMP
-	/*
-	 * Don't allow anybody to remap normal RAM that we're using.
-	 * mem_init() sets high_memory so only do the check after that.
-	 */
-	if (slab_is_available() && (p < virt_to_phys(high_memory)) &&
-	    page_is_ram(__phys_to_pfn(p))) {
-		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
-		       (unsigned long long)p, __builtin_return_address(0));
-		return NULL;
-	}
-#endif
-
-	if (size == 0)
-		return NULL;
-
-	/*
-	 * Is it already mapped?  Perhaps overlapped by a previous
-	 * mapping.
-	 */
-	v = p_block_mapped(p);
-	if (v)
-		goto out;
-
-	if (slab_is_available()) {
-		struct vm_struct *area;
-		area = get_vm_area_caller(size, VM_IOREMAP, caller);
-		if (area == 0)
-			return NULL;
-		area->phys_addr = p;
-		v = (unsigned long) area->addr;
-	} else {
-		v = (ioremap_bot -= size);
-	}
-
-	/*
-	 * Should check if it is a candidate for a BAT mapping
-	 */
-
-	err = 0;
-	for (i = 0; i < size && err == 0; i += PAGE_SIZE)
-		err = map_kernel_page(v+i, p+i, flags);
-	if (err) {
-		if (slab_is_available())
-			vunmap((void *)v);
-		return NULL;
-	}
-
-out:
-	return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
-}
-EXPORT_SYMBOL(__ioremap);
-
-void iounmap(volatile void __iomem *addr)
-{
-	/*
-	 * If mapped by BATs then there is nothing to do.
-	 * Calling vfree() generates a benign warning.
-	 */
-	if (v_block_mapped((unsigned long)addr))
-		return;
-
-	if (addr > high_memory && (unsigned long) addr < ioremap_bot)
-		vunmap((void *) (PAGE_MASK & (unsigned long)addr));
-}
-EXPORT_SYMBOL(iounmap);
-
 int map_kernel_page(unsigned long va, phys_addr_t pa, int flags)
 {
 	pmd_t *pd;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 9bf659d5078c..dd1102a246e4 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -109,185 +109,8 @@ unsigned long __pte_frag_nr;
 EXPORT_SYMBOL(__pte_frag_nr);
 unsigned long __pte_frag_size_shift;
 EXPORT_SYMBOL(__pte_frag_size_shift);
-unsigned long ioremap_bot;
-#else /* !CONFIG_PPC_BOOK3S_64 */
-unsigned long ioremap_bot = IOREMAP_BASE;
 #endif
 
-/**
- * __ioremap_at - Low level function to establish the page tables
- *                for an IO mapping
- */
-void __iomem * __ioremap_at(phys_addr_t pa, void *ea, unsigned long size,
-			    unsigned long flags)
-{
-	unsigned long i;
-
-	/* Make sure we have the base flags */
-	if ((flags & _PAGE_PRESENT) == 0)
-		flags |= pgprot_val(PAGE_KERNEL);
-
-	/* We don't support the 4K PFN hack with ioremap */
-	if (flags & H_PAGE_4K_PFN)
-		return NULL;
-
-	WARN_ON(pa & ~PAGE_MASK);
-	WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
-	WARN_ON(size & ~PAGE_MASK);
-
-	for (i = 0; i < size; i += PAGE_SIZE)
-		if (map_kernel_page((unsigned long)ea+i, pa+i, flags))
-			return NULL;
-
-	return (void __iomem *)ea;
-}
-
-/**
- * __iounmap_from - Low level function to tear down the page tables
- *                  for an IO mapping. This is used for mappings that
- *                  are manipulated manually, like partial unmapping of
- *                  PCI IOs or ISA space.
- */
-void __iounmap_at(void *ea, unsigned long size)
-{
-	WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
-	WARN_ON(size & ~PAGE_MASK);
-
-	unmap_kernel_range((unsigned long)ea, size);
-}
-
-void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
-				unsigned long flags, void *caller)
-{
-	phys_addr_t paligned;
-	void __iomem *ret;
-
-	/*
-	 * Choose an address to map it to.
-	 * Once the imalloc system is running, we use it.
-	 * Before that, we map using addresses going
-	 * up from ioremap_bot.  imalloc will use
-	 * the addresses from ioremap_bot through
-	 * IMALLOC_END
-	 * 
-	 */
-	paligned = addr & PAGE_MASK;
-	size = PAGE_ALIGN(addr + size) - paligned;
-
-	if ((size == 0) || (paligned == 0))
-		return NULL;
-
-	if (slab_is_available()) {
-		struct vm_struct *area;
-
-		area = __get_vm_area_caller(size, VM_IOREMAP,
-					    ioremap_bot, IOREMAP_END,
-					    caller);
-		if (area == NULL)
-			return NULL;
-
-		area->phys_addr = paligned;
-		ret = __ioremap_at(paligned, area->addr, size, flags);
-		if (!ret)
-			vunmap(area->addr);
-	} else {
-		ret = __ioremap_at(paligned, (void *)ioremap_bot, size, flags);
-		if (ret)
-			ioremap_bot += size;
-	}
-
-	if (ret)
-		ret += addr & ~PAGE_MASK;
-	return ret;
-}
-
-void __iomem * __ioremap(phys_addr_t addr, unsigned long size,
-			 unsigned long flags)
-{
-	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
-}
-
-void __iomem * ioremap(phys_addr_t addr, unsigned long size)
-{
-	unsigned long flags = pgprot_val(pgprot_noncached(__pgprot(0)));
-	void *caller = __builtin_return_address(0);
-
-	if (ppc_md.ioremap)
-		return ppc_md.ioremap(addr, size, flags, caller);
-	return __ioremap_caller(addr, size, flags, caller);
-}
-
-void __iomem * ioremap_wc(phys_addr_t addr, unsigned long size)
-{
-	unsigned long flags = pgprot_val(pgprot_noncached_wc(__pgprot(0)));
-	void *caller = __builtin_return_address(0);
-
-	if (ppc_md.ioremap)
-		return ppc_md.ioremap(addr, size, flags, caller);
-	return __ioremap_caller(addr, size, flags, caller);
-}
-
-void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size,
-			     unsigned long flags)
-{
-	void *caller = __builtin_return_address(0);
-
-	/* writeable implies dirty for kernel addresses */
-	if (flags & _PAGE_WRITE)
-		flags |= _PAGE_DIRTY;
-
-	/* we don't want to let _PAGE_EXEC leak out */
-	flags &= ~_PAGE_EXEC;
-	/*
-	 * Force kernel mapping.
-	 */
-	flags &= ~_PAGE_USER;
-	flags |= _PAGE_PRIVILEGED;
-
-	if (ppc_md.ioremap)
-		return ppc_md.ioremap(addr, size, flags, caller);
-	return __ioremap_caller(addr, size, flags, caller);
-}
-
-
-/*  
- * Unmap an IO region and remove it from imalloc'd list.
- * Access to IO memory should be serialized by driver.
- */
-void __iounmap(volatile void __iomem *token)
-{
-	void *addr;
-
-	if (!slab_is_available())
-		return;
-	
-	addr = (void *) ((unsigned long __force)
-			 PCI_FIX_ADDR(token) & PAGE_MASK);
-	if ((unsigned long)addr < ioremap_bot) {
-		printk(KERN_WARNING "Attempt to iounmap early bolted mapping"
-		       " at 0x%p\n", addr);
-		return;
-	}
-	vunmap(addr);
-}
-
-void iounmap(volatile void __iomem *token)
-{
-	if (ppc_md.iounmap)
-		ppc_md.iounmap(token);
-	else
-		__iounmap(token);
-}
-
-EXPORT_SYMBOL(ioremap);
-EXPORT_SYMBOL(ioremap_wc);
-EXPORT_SYMBOL(ioremap_prot);
-EXPORT_SYMBOL(__ioremap);
-EXPORT_SYMBOL(__ioremap_at);
-EXPORT_SYMBOL(iounmap);
-EXPORT_SYMBOL(__iounmap);
-EXPORT_SYMBOL(__iounmap_at);
-
 #ifndef __PAGETABLE_PUD_FOLDED
 /* 4 level page table */
 struct page *pgd_page(pgd_t pgd)
-- 
2.13.3


* [PATCH  06/17] powerpc: common ioremap functions.
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (4 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 05/17] powerpc: move io mapping functions into ioremap.c Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 07/17] powerpc: make ioremap_bot common to PPC32 and PPC64 Christophe Leroy
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

__ioremap(), ioremap(), ioremap_wc() and ioremap_prot() are
very similar between PPC32 and PPC64, so they can easily be
made common.

_PAGE_WRITE is equal to _PAGE_RW on PPC32.
_PAGE_RO and _PAGE_HWWRITE are 0 on PPC64.
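
As a rough illustration of why these two facts allow a single
ioremap_prot() test, the sketch below compares the common condition
with the former PPC64-only one. The DEMO_* bit values and the demo_*
function names are hypothetical and exist only for this example; the
real definitions live in the pgtable headers.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_PAGE_WRITE   0x002UL  /* stands in for _PAGE_WRITE / _PAGE_RW */
#define DEMO_PAGE_RO      0x004UL  /* a non-zero read-only bit */
#define DEMO_PAGE_DIRTY   0x080UL
#define DEMO_PAGE_HWWRITE 0x100UL

/* Common form: page_ro and page_hwwrite are 0 where the platform lacks them. */
unsigned long demo_fixup_common(unsigned long flags, unsigned long page_ro,
				unsigned long page_hwwrite)
{
	/* writeable implies dirty for kernel addresses */
	if ((flags & (DEMO_PAGE_WRITE | page_ro)) != page_ro)
		flags |= DEMO_PAGE_DIRTY | page_hwwrite;
	return flags;
}

/* Former PPC64-only form. */
unsigned long demo_fixup_ppc64_old(unsigned long flags)
{
	if (flags & DEMO_PAGE_WRITE)
		flags |= DEMO_PAGE_DIRTY;
	return flags;
}

/* With both extra bits equal to 0, the two forms agree on every input. */
void demo_check_ppc64_equivalence(void)
{
	unsigned long flags;

	for (flags = 0; flags < 0x200UL; flags++)
		assert(demo_fixup_common(flags, 0, 0) ==
		       demo_fixup_ppc64_old(flags));
}
```

Passing 0 for both extra parameters models PPC64, where the test
collapses to "flags & _PAGE_WRITE". Passing non-zero values models a
PPC32 platform that has a read-only bit, where the test is textually
the former PPC32 one once _PAGE_RW is spelled _PAGE_WRITE.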

iounmap() can also be made common by renaming the PPC32
iounmap() to __iounmap().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
 arch/powerpc/include/asm/machdep.h           |  2 +-
 arch/powerpc/mm/ioremap.c                    | 95 +++++++++-------------------
 3 files changed, 31 insertions(+), 67 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 47b5ffc8715d..c5c6ead06bfb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -17,6 +17,7 @@
 #define _PAGE_NA		0
 #define _PAGE_RO		0
 #define _PAGE_USER		0
+#define _PAGE_HWWRITE		0
 
 #define _PAGE_EXEC		0x00001 /* execute permission */
 #define _PAGE_WRITE		0x00002 /* write access allowed */
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index ffe7c71e1132..84d99ed82d5d 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -33,11 +33,11 @@ struct pci_host_bridge;
 
 struct machdep_calls {
 	char		*name;
-#ifdef CONFIG_PPC64
 	void __iomem *	(*ioremap)(phys_addr_t addr, unsigned long size,
 				   unsigned long flags, void *caller);
 	void		(*iounmap)(volatile void __iomem *token);
 
+#ifdef CONFIG_PPC64
 #ifdef CONFIG_PM
 	void		(*iommu_save)(void);
 	void		(*iommu_restore)(void);
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 5d2645193568..f8dc9638c598 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -23,6 +23,7 @@
 #include <asm/io.h>
 #include <asm/setup.h>
 #include <asm/sections.h>
+#include <asm/machdep.h>
 
 #include "mmu_decl.h"
 
@@ -32,44 +33,6 @@ unsigned long ioremap_bot;
 EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
 
 void __iomem *
-ioremap(phys_addr_t addr, unsigned long size)
-{
-	return __ioremap_caller(addr, size, _PAGE_NO_CACHE | _PAGE_GUARDED,
-				__builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap);
-
-void __iomem *
-ioremap_wc(phys_addr_t addr, unsigned long size)
-{
-	return __ioremap_caller(addr, size, _PAGE_NO_CACHE,
-				__builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_wc);
-
-void __iomem *
-ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
-{
-	/* writeable implies dirty for kernel addresses */
-	if ((flags & (_PAGE_RW | _PAGE_RO)) != _PAGE_RO)
-		flags |= _PAGE_DIRTY | _PAGE_HWWRITE;
-
-	/* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
-	flags &= ~(_PAGE_USER | _PAGE_EXEC);
-	flags |= _PAGE_PRIVILEGED;
-
-	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_prot);
-
-void __iomem *
-__ioremap(phys_addr_t addr, unsigned long size, unsigned long flags)
-{
-	return __ioremap_caller(addr, size, flags, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(__ioremap);
-
-void __iomem *
 __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
 		 void *caller)
 {
@@ -153,7 +116,7 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
 	return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
 }
 
-void iounmap(volatile void __iomem *addr)
+void __iounmap(volatile void __iomem *addr)
 {
 	/*
 	 * If mapped by BATs then there is nothing to do.
@@ -165,7 +128,7 @@ void iounmap(volatile void __iomem *addr)
 	if (addr > high_memory && (unsigned long) addr < ioremap_bot)
 		vunmap((void *) (PAGE_MASK & (unsigned long)addr));
 }
-EXPORT_SYMBOL(iounmap);
+EXPORT_SYMBOL(__iounmap);
 
 #else
 
@@ -264,6 +227,30 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
 	return ret;
 }
 
+/*
+ * Unmap an IO region and remove it from imalloc'd list.
+ * Access to IO memory should be serialized by driver.
+ */
+void __iounmap(volatile void __iomem *token)
+{
+	void *addr;
+
+	if (!slab_is_available())
+		return;
+
+	addr = (void *) ((unsigned long __force)
+			 PCI_FIX_ADDR(token) & PAGE_MASK);
+	if ((unsigned long)addr < ioremap_bot) {
+		printk(KERN_WARNING "Attempt to iounmap early bolted mapping"
+		       " at 0x%p\n", addr);
+		return;
+	}
+	vunmap(addr);
+}
+EXPORT_SYMBOL(__iounmap);
+
+#endif
+
 void __iomem * __ioremap(phys_addr_t addr, unsigned long size,
 			 unsigned long flags)
 {
@@ -299,8 +286,8 @@ void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size,
 	void *caller = __builtin_return_address(0);
 
 	/* writeable implies dirty for kernel addresses */
-	if (flags & _PAGE_WRITE)
-		flags |= _PAGE_DIRTY;
+	if ((flags & (_PAGE_WRITE | _PAGE_RO)) != _PAGE_RO)
+		flags |= _PAGE_DIRTY | _PAGE_HWWRITE;
 
 	/* we don't want to let _PAGE_EXEC leak out */
 	flags &= ~_PAGE_EXEC;
@@ -316,28 +303,6 @@ void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size,
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-/*
- * Unmap an IO region and remove it from imalloc'd list.
- * Access to IO memory should be serialized by driver.
- */
-void __iounmap(volatile void __iomem *token)
-{
-	void *addr;
-
-	if (!slab_is_available())
-		return;
-
-	addr = (void *) ((unsigned long __force)
-			 PCI_FIX_ADDR(token) & PAGE_MASK);
-	if ((unsigned long)addr < ioremap_bot) {
-		printk(KERN_WARNING "Attempt to iounmap early bolted mapping"
-		       " at 0x%p\n", addr);
-		return;
-	}
-	vunmap(addr);
-}
-EXPORT_SYMBOL(__iounmap);
-
 void iounmap(volatile void __iomem *token)
 {
 	if (ppc_md.iounmap)
@@ -346,5 +311,3 @@ void iounmap(volatile void __iomem *token)
 		__iounmap(token);
 }
 EXPORT_SYMBOL(iounmap);
-
-#endif
-- 
2.13.3


* [PATCH  07/17] powerpc: make ioremap_bot common to PPC32 and PPC64
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (5 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 06/17] powerpc: common ioremap functions Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 08/17] powerpc: make __iounmap() " Christophe Leroy
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Today, early ioremap maps upwards from IOREMAP_BASE on PPC64
and downwards from IOREMAP_TOP on PPC32.

This patch modifies the PPC32 behaviour to match that of PPC64.
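
The two early allocation directions can be sketched as below. This is
a simplified model and not the kernel code: the DEMO_* window bounds
and the demo_* helpers are hypothetical names chosen for this example,
and "bot" plays the role of ioremap_bot.

```c
#include <assert.h>

/* Hypothetical window bounds, for illustration only. */
#define DEMO_IOREMAP_BASE 0xc1000000UL
#define DEMO_IOREMAP_TOP  0xfe000000UL

/* Former PPC32 scheme: the cursor starts at the top and moves down. */
unsigned long demo_early_map_down(unsigned long *bot, unsigned long size)
{
	*bot -= size;
	return *bot;
}

/* PPC64 scheme, adopted for PPC32 here: start at the base and move up. */
unsigned long demo_early_map_up(unsigned long *bot, unsigned long size)
{
	unsigned long v = *bot;

	*bot += size;
	return v;
}

/* Once mappings grow upwards, only addresses at or above the cursor
 * belong to the vmalloc area and are candidates for vunmap(). */
int demo_may_vunmap(unsigned long bot, unsigned long addr)
{
	return addr >= bot;
}

/* Two early mappings in the upward scheme land back to back. */
void demo_walk(void)
{
	unsigned long bot = DEMO_IOREMAP_BASE;
	unsigned long first = demo_early_map_up(&bot, 0x1000UL);
	unsigned long second = demo_early_map_up(&bot, 0x2000UL);

	assert(first == DEMO_IOREMAP_BASE);
	assert(second == DEMO_IOREMAP_BASE + 0x1000UL);
	assert(bot == DEMO_IOREMAP_BASE + 0x3000UL);
	assert(!demo_may_vunmap(bot, first));
}
```

In the upward scheme the early mappings occupy the range from
IOREMAP_BASE to ioremap_bot, and everything from ioremap_bot to
IOREMAP_END is left to vmalloc, which is why the patch turns
VMALLOC_START into ioremap_bot and VMALLOC_END into IOREMAP_END.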

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 16 +++++++++-------
 arch/powerpc/include/asm/nohash/32/pgtable.h | 20 ++++++++------------
 arch/powerpc/mm/dma-noncoherent.c            |  2 +-
 arch/powerpc/mm/dump_linuxpagetables.c       |  6 +++---
 arch/powerpc/mm/init_32.c                    |  6 +++++-
 arch/powerpc/mm/ioremap.c                    | 22 ++++++++++------------
 arch/powerpc/mm/mem.c                        |  7 ++++---
 7 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index c615abdce119..6cf962ec7a20 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -54,16 +54,17 @@
 #else
 #define KVIRT_TOP	(0xfe000000UL)	/* for now, could be FIXMAP_BASE ? */
 #endif
+#define IOREMAP_BASE	VMALLOC_BASE
 
 /*
- * ioremap_bot starts at that address. Early ioremaps move down from there,
- * until mem_init() at which point this becomes the top of the vmalloc
+ * ioremap_bot starts at IOREMAP_BASE. Early ioremaps move up from there,
+ * until mem_init() at which point this becomes the bottom of the vmalloc
  * and ioremap space
  */
 #ifdef CONFIG_NOT_COHERENT_CACHE
-#define IOREMAP_TOP	((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
+#define IOREMAP_END	((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
 #else
-#define IOREMAP_TOP	KVIRT_TOP
+#define IOREMAP_END	KVIRT_TOP
 #endif
 
 /*
@@ -85,11 +86,12 @@
  */
 #define VMALLOC_OFFSET (0x1000000) /* 16M */
 #ifdef PPC_PIN_SIZE
-#define VMALLOC_START (((_ALIGN((long)high_memory, PPC_PIN_SIZE) + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#define VMALLOC_BASE (((_ALIGN((long)high_memory, PPC_PIN_SIZE) + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
 #else
-#define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#define VMALLOC_BASE ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
 #endif
-#define VMALLOC_END	ioremap_bot
+#define VMALLOC_START	ioremap_bot
+#define VMALLOC_END	IOREMAP_END
 
 #ifndef __ASSEMBLY__
 #include <linux/sched.h>
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 140f8e74b478..b413abcd5a09 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -80,10 +80,11 @@ extern int icache_44x_need_flush;
  * and ioremap space
  */
 #ifdef CONFIG_NOT_COHERENT_CACHE
-#define IOREMAP_TOP	((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
+#define IOREMAP_END	((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
 #else
-#define IOREMAP_TOP	KVIRT_TOP
+#define IOREMAP_END	KVIRT_TOP
 #endif
+#define IOREMAP_BASE	VMALLOC_BASE
 
 /*
  * Just any arbitrary offset to the start of the vmalloc VM area: the
@@ -94,21 +95,16 @@ extern int icache_44x_need_flush;
  * area for the same reason. ;)
  *
  * We no longer map larger than phys RAM with the BATs so we don't have
- * to worry about the VMALLOC_OFFSET causing problems.  We do have to worry
- * about clashes between our early calls to ioremap() that start growing down
- * from IOREMAP_TOP being run into the VM area allocations (growing upwards
- * from VMALLOC_START).  For this reason we have ioremap_bot to check when
- * we actually run into our mappings setup in the early boot with the VM
- * system.  This really does become a problem for machines with good amounts
- * of RAM.  -- Cort
+ * to worry about the VMALLOC_OFFSET causing problems.
  */
 #define VMALLOC_OFFSET (0x1000000) /* 16M */
 #ifdef PPC_PIN_SIZE
-#define VMALLOC_START (((_ALIGN((long)high_memory, PPC_PIN_SIZE) + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#define VMALLOC_BASE (((_ALIGN((long)high_memory, PPC_PIN_SIZE) + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
 #else
-#define VMALLOC_START ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#define VMALLOC_BASE ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
 #endif
-#define VMALLOC_END	ioremap_bot
+#define VMALLOC_START	ioremap_bot
+#define VMALLOC_END	IOREMAP_END
 
 /*
  * Bits in a linux-style PTE.  These match the bits in the
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index 382528475433..d0a8fe74f5a0 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -43,7 +43,7 @@
  * can be further configured for specific applications under
  * the "Advanced Setup" menu. -Matt
  */
-#define CONSISTENT_BASE		(IOREMAP_TOP)
+#define CONSISTENT_BASE		(IOREMAP_END)
 #define CONSISTENT_END 		(CONSISTENT_BASE + CONFIG_CONSISTENT_SIZE)
 #define CONSISTENT_OFFSET(x)	(((unsigned long)(x) - CONSISTENT_BASE) >> PAGE_SHIFT)
 
diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c
index 876e2a3c79f2..6022adb899b7 100644
--- a/arch/powerpc/mm/dump_linuxpagetables.c
+++ b/arch/powerpc/mm/dump_linuxpagetables.c
@@ -452,11 +452,11 @@ static void populate_markers(void)
 	address_markers[i++].start_address =  VMEMMAP_BASE;
 #endif
 #else /* !CONFIG_PPC64 */
+	address_markers[i++].start_address = IOREMAP_BASE;
 	address_markers[i++].start_address = ioremap_bot;
-	address_markers[i++].start_address = IOREMAP_TOP;
 #ifdef CONFIG_NOT_COHERENT_CACHE
-	address_markers[i++].start_address = IOREMAP_TOP;
-	address_markers[i++].start_address = IOREMAP_TOP +
+	address_markers[i++].start_address = IOREMAP_END;
+	address_markers[i++].start_address = IOREMAP_END +
 					     CONFIG_CONSISTENT_SIZE;
 #endif
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 3e59e5d64b01..7fb9e5a9852a 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -172,7 +172,11 @@ void __init MMU_init(void)
 	mapin_ram();
 
 	/* Initialize early top-down ioremap allocator */
-	ioremap_bot = IOREMAP_TOP;
+	if (IS_ENABLED(CONFIG_HIGHMEM))
+		high_memory = (void *) __va(lowmem_end_addr);
+	else
+		high_memory = (void *) __va(memblock_end_of_DRAM());
+	ioremap_bot = IOREMAP_BASE;
 
 	if (ppc_md.progress)
 		ppc_md.progress("MMU:exit", 0x211);
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index f8dc9638c598..153657db084e 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -27,10 +27,13 @@
 
 #include "mmu_decl.h"
 
-#ifdef CONFIG_PPC32
-
+#if defined(CONFIG_PPC_BOOK3S_64) || defined(CONFIG_PPC32)
 unsigned long ioremap_bot;
-EXPORT_SYMBOL(ioremap_bot);	/* aka VMALLOC_END */
+#else
+unsigned long ioremap_bot = IOREMAP_BASE;
+#endif
+
+#ifdef CONFIG_PPC32
 
 void __iomem *
 __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
@@ -51,7 +54,7 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
 	/*
 	 * Choose an address to map it to.
 	 * Once the vmalloc system is running, we use it.
-	 * Before then, we use space going down from IOREMAP_TOP
+	 * Before then, we use space going up from IOREMAP_BASE
 	 * (ioremap_bot records where we're up to).
 	 */
 	p = addr & PAGE_MASK;
@@ -96,7 +99,8 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
 		area->phys_addr = p;
 		v = (unsigned long) area->addr;
 	} else {
-		v = (ioremap_bot -= size);
+		v = ioremap_bot;
+		ioremap_bot += size;
 	}
 
 	/*
@@ -125,19 +129,13 @@ void __iounmap(volatile void __iomem *addr)
 	if (v_block_mapped((unsigned long)addr))
 		return;
 
-	if (addr > high_memory && (unsigned long) addr < ioremap_bot)
+	if ((unsigned long) addr >= ioremap_bot)
 		vunmap((void *) (PAGE_MASK & (unsigned long)addr));
 }
 EXPORT_SYMBOL(__iounmap);
 
 #else
 
-#ifdef CONFIG_PPC_BOOK3S_64
-unsigned long ioremap_bot;
-#else /* !CONFIG_PPC_BOOK3S_64 */
-unsigned long ioremap_bot = IOREMAP_BASE;
-#endif
-
 /**
  * __ioremap_at - Low level function to establish the page tables
  *                for an IO mapping
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index c3c39b02b2ba..b680aa78a4ac 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -345,8 +345,9 @@ void __init mem_init(void)
 #ifdef CONFIG_SWIOTLB
 	swiotlb_init(0);
 #endif
-
+#ifdef CONFIG_PPC64
 	high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
+#endif
 	set_max_mapnr(max_pfn);
 	free_all_bootmem();
 
@@ -383,10 +384,10 @@ void __init mem_init(void)
 #endif /* CONFIG_HIGHMEM */
 #ifdef CONFIG_NOT_COHERENT_CACHE
 	pr_info("  * 0x%08lx..0x%08lx  : consistent mem\n",
-		IOREMAP_TOP, IOREMAP_TOP + CONFIG_CONSISTENT_SIZE);
+		IOREMAP_END, IOREMAP_END + CONFIG_CONSISTENT_SIZE);
 #endif /* CONFIG_NOT_COHERENT_CACHE */
 	pr_info("  * 0x%08lx..0x%08lx  : early ioremap\n",
-		ioremap_bot, IOREMAP_TOP);
+		IOREMAP_BASE, ioremap_bot);
 	pr_info("  * 0x%08lx..0x%08lx  : vmalloc & ioremap\n",
 		VMALLOC_START, VMALLOC_END);
 #endif /* CONFIG_PPC32 */
-- 
2.13.3

* [PATCH  08/17] powerpc: make __iounmap() common to PPC32 and PPC64
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (6 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 07/17] powerpc: make ioremap_bot common to PPC32 and PPC64 Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 09/17] powerpc: make __ioremap_caller() " Christophe Leroy
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

This patch makes __iounmap() common to PPC32 and PPC64.
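
As an illustration only, the decision made by the common __iounmap()
can be modelled in user space as below. The names model_iounmap() and
enum unmap_action are hypothetical and not part of the patch, and the
4k page mask is an assumption; the real function calls
v_block_mapped() and vunmap() instead of returning a code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4k pages. */
#define MODEL_PAGE_MASK (~(uintptr_t)0xfff)

/* Hypothetical result codes standing in for the kernel's side effects. */
enum unmap_action {
	UNMAP_NOTHING_BAT,	/* block (BAT) mapped: nothing to do */
	UNMAP_REFUSE_EARLY,	/* early bolted mapping: warn and return */
	UNMAP_VUNMAP		/* regular mapping: would call vunmap() */
};

/*
 * Model of the common path: mask the token down to a page boundary,
 * skip block mappings, refuse anything below ioremap_bot, and unmap
 * the rest.
 */
static inline enum unmap_action model_iounmap(uintptr_t token,
					      uintptr_t ioremap_bot,
					      bool block_mapped)
{
	uintptr_t addr = token & MODEL_PAGE_MASK;

	if (block_mapped)
		return UNMAP_NOTHING_BAT;
	if (addr < ioremap_bot)
		return UNMAP_REFUSE_EARLY;
	return UNMAP_VUNMAP;
}
```

The block-mapping check is placed before the ioremap_bot comparison,
as in the diff below, so that a BAT-mapped address never triggers the
early-mapping warning.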

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/mm/ioremap.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 153657db084e..65d611d44d38 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -120,20 +120,6 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
 	return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
 }
 
-void __iounmap(volatile void __iomem *addr)
-{
-	/*
-	 * If mapped by BATs then there is nothing to do.
-	 * Calling vfree() generates a benign warning.
-	 */
-	if (v_block_mapped((unsigned long)addr))
-		return;
-
-	if ((unsigned long) addr >= ioremap_bot)
-		vunmap((void *) (PAGE_MASK & (unsigned long)addr));
-}
-EXPORT_SYMBOL(__iounmap);
-
 #else
 
 /**
@@ -225,6 +211,8 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
 	return ret;
 }
 
+#endif
+
 /*
  * Unmap an IO region and remove it from imalloc'd list.
  * Access to IO memory should be serialized by driver.
@@ -238,6 +226,14 @@ void __iounmap(volatile void __iomem *token)
 
 	addr = (void *) ((unsigned long __force)
 			 PCI_FIX_ADDR(token) & PAGE_MASK);
+
+	/*
+	 * If mapped by BATs then there is nothing to do.
+	 * Calling vfree() generates a benign warning.
+	 */
+	if (v_block_mapped((unsigned long)addr))
+		return;
+
 	if ((unsigned long)addr < ioremap_bot) {
 		printk(KERN_WARNING "Attempt to iounmap early bolted mapping"
 		       " at 0x%p\n", addr);
@@ -247,8 +243,6 @@ void __iounmap(volatile void __iomem *token)
 }
 EXPORT_SYMBOL(__iounmap);
 
-#endif
-
 void __iomem * __ioremap(phys_addr_t addr, unsigned long size,
 			 unsigned long flags)
 {
-- 
2.13.3

* [PATCH  09/17] powerpc: make __ioremap_caller() common to PPC32 and PPC64
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (7 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 08/17] powerpc: make __iounmap() " Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-08  9:56   ` Aneesh Kumar K.V
  2018-05-04 12:34 ` [PATCH 10/17] powerpc: use _ALIGN macro Christophe Leroy
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |   1 +
 arch/powerpc/mm/ioremap.c                    | 126 +++++++--------------------
 2 files changed, 34 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index c5c6ead06bfb..2bebdd8302cb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -18,6 +18,7 @@
 #define _PAGE_RO		0
 #define _PAGE_USER		0
 #define _PAGE_HWWRITE		0
+#define _PAGE_COHERENT		0
 
 #define _PAGE_EXEC		0x00001 /* execute permission */
 #define _PAGE_WRITE		0x00002 /* write access allowed */
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 65d611d44d38..59be5dfcb3e9 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -33,95 +33,6 @@ unsigned long ioremap_bot;
 unsigned long ioremap_bot = IOREMAP_BASE;
 #endif
 
-#ifdef CONFIG_PPC32
-
-void __iomem *
-__ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
-		 void *caller)
-{
-	unsigned long v, i;
-	phys_addr_t p;
-	int err;
-
-	/* Make sure we have the base flags */
-	if ((flags & _PAGE_PRESENT) == 0)
-		flags |= pgprot_val(PAGE_KERNEL);
-
-	/* Non-cacheable page cannot be coherent */
-	if (flags & _PAGE_NO_CACHE)
-		flags &= ~_PAGE_COHERENT;
-
-	/*
-	 * Choose an address to map it to.
-	 * Once the vmalloc system is running, we use it.
-	 * Before then, we use space going up from IOREMAP_BASE
-	 * (ioremap_bot records where we're up to).
-	 */
-	p = addr & PAGE_MASK;
-	size = PAGE_ALIGN(addr + size) - p;
-
-	/*
-	 * If the address lies within the first 16 MB, assume it's in ISA
-	 * memory space
-	 */
-	if (p < 16*1024*1024)
-		p += _ISA_MEM_BASE;
-
-#ifndef CONFIG_CRASH_DUMP
-	/*
-	 * Don't allow anybody to remap normal RAM that we're using.
-	 * mem_init() sets high_memory so only do the check after that.
-	 */
-	if (slab_is_available() && (p < virt_to_phys(high_memory)) &&
-	    page_is_ram(__phys_to_pfn(p))) {
-		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
-		       (unsigned long long)p, __builtin_return_address(0));
-		return NULL;
-	}
-#endif
-
-	if (size == 0)
-		return NULL;
-
-	/*
-	 * Is it already mapped?  Perhaps overlapped by a previous
-	 * mapping.
-	 */
-	v = p_block_mapped(p);
-	if (v)
-		goto out;
-
-	if (slab_is_available()) {
-		struct vm_struct *area;
-		area = get_vm_area_caller(size, VM_IOREMAP, caller);
-		if (area == 0)
-			return NULL;
-		area->phys_addr = p;
-		v = (unsigned long) area->addr;
-	} else {
-		v = ioremap_bot;
-		ioremap_bot += size;
-	}
-
-	/*
-	 * Should check if it is a candidate for a BAT mapping
-	 */
-
-	err = 0;
-	for (i = 0; i < size && err == 0; i += PAGE_SIZE)
-		err = map_kernel_page(v+i, p+i, flags);
-	if (err) {
-		if (slab_is_available())
-			vunmap((void *)v);
-		return NULL;
-	}
-
-out:
-	return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
-}
-
-#else
-
 /**
  * __ioremap_at - Low level function to establish the page tables
  *                for an IO mapping
@@ -135,6 +46,10 @@ void __iomem * __ioremap_at(phys_addr_t pa, void *ea, unsigned long size,
 	if ((flags & _PAGE_PRESENT) == 0)
 		flags |= pgprot_val(PAGE_KERNEL);
 
+	/* Non-cacheable page cannot be coherent */
+	if (flags & _PAGE_NO_CACHE)
+		flags &= ~_PAGE_COHERENT;
+
 	/* We don't support the 4K PFN hack with ioremap */
 	if (flags & H_PAGE_4K_PFN)
 		return NULL;
@@ -187,6 +102,33 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
 	if ((size == 0) || (paligned == 0))
 		return NULL;
 
+	/*
+	 * If the address lies within the first 16 MB, assume it's in ISA
+	 * memory space
+	 */
+	if (IS_ENABLED(CONFIG_PPC32) && paligned < 16*1024*1024)
+		paligned += _ISA_MEM_BASE;
+
+	/*
+	 * Don't allow anybody to remap normal RAM that we're using.
+	 * mem_init() sets high_memory so only do the check after that.
+	 */
+	if (!IS_ENABLED(CONFIG_CRASH_DUMP) &&
+	    slab_is_available() && (paligned < virt_to_phys(high_memory)) &&
+	    page_is_ram(__phys_to_pfn(paligned))) {
+		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
+		       (u64)paligned, __builtin_return_address(0));
+		return NULL;
+	}
+
+	/*
+	 * Is it already mapped?  Perhaps overlapped by a previous
+	 * mapping.
+	 */
+	ret = (void __iomem *)p_block_mapped(paligned);
+	if (ret)
+		goto out;
+
 	if (slab_is_available()) {
 		struct vm_struct *area;
 
@@ -205,14 +147,12 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
 		if (ret)
 			ioremap_bot += size;
 	}
-
+out:
 	if (ret)
-		ret += addr & ~PAGE_MASK;
+		ret += (unsigned long)addr & ~PAGE_MASK;
 	return ret;
 }
 
-#endif
-
 /*
  * Unmap an IO region and remove it from imalloc'd list.
  * Access to IO memory should be serialized by driver.
-- 
2.13.3

* [PATCH  10/17] powerpc: use _ALIGN macro
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (8 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 09/17] powerpc: make __ioremap_caller() " Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 11/17] powerpc/nohash32: set GUARDED attribute in the PMD directly Christophe Leroy
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index b413abcd5a09..93dc22dbe964 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -99,9 +99,9 @@ extern int icache_44x_need_flush;
  */
 #define VMALLOC_OFFSET (0x1000000) /* 16M */
 #ifdef PPC_PIN_SIZE
-#define VMALLOC_BASE (((_ALIGN((long)high_memory, PPC_PIN_SIZE) + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#define VMALLOC_BASE _ALIGN_DOWN(_ALIGN((long)high_memory, PPC_PIN_SIZE) + VMALLOC_OFFSET, VMALLOC_OFFSET)
 #else
-#define VMALLOC_BASE ((((long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)))
+#define VMALLOC_BASE _ALIGN_DOWN((long)high_memory + VMALLOC_OFFSET, VMALLOC_OFFSET)
 #endif
 #define VMALLOC_START	ioremap_bot
 #define VMALLOC_END	IOREMAP_END
-- 
2.13.3

* [PATCH  11/17] powerpc/nohash32: set GUARDED attribute in the PMD directly
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (9 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 10/17] powerpc: use _ALIGN macro Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-11  6:45   ` Michael Ellerman
  2018-05-04 12:34 ` [PATCH 12/17] powerpc/8xx: Remove PTE_ATOMIC_UPDATES Christophe Leroy
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

On the 8xx, the GUARDED attribute of a page is managed in the
L1 entry. To avoid having to copy it into the L1 entry at each
TLB miss, we set it in the PMD.

For this, we split the VM alloc space in two parts: one for
VM alloc and non-guarded IO, and one for guarded IO.
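
A minimal user-space sketch of the resulting layout, following the
IOREMAP_BASE, VMALLOC_START and VMALLOC_END definitions in the diff
below. It assumes 4k pages (so a PGD entry covers 4 MB) and no
consistent-memory area, i.e. IOREMAP_END equal to the aligned
KVIRT_TOP. struct vm_layout and model_split() are hypothetical names
used only for this illustration.

```c
/* Assumption for this sketch: one PGD entry covers 4 MB (4k pages). */
#define MODEL_PGDIR_SIZE (1UL << 22)

static inline unsigned long model_align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

static inline unsigned long model_align_down(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

/* Hypothetical container for the four boundaries of the split. */
struct vm_layout {
	unsigned long vmalloc_start;	/* vmalloc and non-guarded IO */
	unsigned long vmalloc_end;
	unsigned long ioremap_base;	/* guarded IO */
	unsigned long ioremap_end;
};

/*
 * Split [vmalloc_base, kvirt_top) roughly in half, with the boundary
 * aligned on a PGD entry so that each PMD is either entirely guarded
 * or entirely non-guarded.
 */
static inline struct vm_layout model_split(unsigned long vmalloc_base,
					   unsigned long kvirt_top)
{
	struct vm_layout l;

	l.ioremap_end = model_align_down(kvirt_top, MODEL_PGDIR_SIZE);
	l.ioremap_base = model_align_up(vmalloc_base +
					(l.ioremap_end - vmalloc_base) / 2,
					MODEL_PGDIR_SIZE);
	l.vmalloc_start = vmalloc_base;
	l.vmalloc_end = l.ioremap_base;
	return l;
}
```

Aligning the boundary on PGDIR_SIZE is what allows the guarded bit to
live in the PMD: no PMD straddles the two areas.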

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 10 ++++++++++
 arch/powerpc/include/asm/nohash/32/pgtable.h | 18 ++++++++++++++++--
 arch/powerpc/include/asm/nohash/32/pte-8xx.h |  3 ++-
 arch/powerpc/kernel/head_8xx.S               | 18 +++++++-----------
 arch/powerpc/mm/dump_linuxpagetables.c       | 26 ++++++++++++++++++++++++--
 arch/powerpc/mm/ioremap.c                    | 11 ++++++++---
 arch/powerpc/mm/mem.c                        |  9 +++++++++
 arch/powerpc/mm/pgtable_32.c                 | 28 +++++++++++++++++++++++++++-
 arch/powerpc/platforms/Kconfig.cputype       |  3 +++
 9 files changed, 106 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 29d37bd1f3b3..1c6461e7c6aa 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -58,6 +58,12 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
 	*pmdp = __pmd(__pa(pte) | _PMD_PRESENT);
 }
 
+static inline void pmd_populate_kernel_g(struct mm_struct *mm, pmd_t *pmdp,
+					 pte_t *pte)
+{
+	*pmdp = __pmd(__pa(pte) | _PMD_PRESENT | _PMD_GUARDED);
+}
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
 				pgtable_t pte_page)
 {
@@ -83,6 +89,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
 #define pmd_pgtable(pmd) pmd_page(pmd)
 #endif
 
+#define pte_alloc_kernel_g(pmd, address)			\
+	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel_g(pmd, address))? \
+		NULL: pte_offset_kernel(pmd, address))
+
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
 
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 93dc22dbe964..009a5b3d3192 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -69,9 +69,14 @@ extern int icache_44x_need_flush;
  * virtual space that goes below PKMAP and FIXMAP
  */
 #ifdef CONFIG_HIGHMEM
-#define KVIRT_TOP	PKMAP_BASE
+#define _KVIRT_TOP	PKMAP_BASE
 #else
-#define KVIRT_TOP	(0xfe000000UL)	/* for now, could be FIXMAP_BASE ? */
+#define _KVIRT_TOP	(0xfe000000UL)	/* for now, could be FIXMAP_BASE ? */
+#endif
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+#define KVIRT_TOP	_ALIGN_DOWN(_KVIRT_TOP, PGDIR_SIZE)
+#else
+#define KVIRT_TOP	_KVIRT_TOP
 #endif
 
 /*
@@ -84,7 +89,11 @@ extern int icache_44x_need_flush;
 #else
 #define IOREMAP_END	KVIRT_TOP
 #endif
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+#define IOREMAP_BASE	_ALIGN_UP(VMALLOC_BASE + (IOREMAP_END - VMALLOC_BASE) / 2, PGDIR_SIZE)
+#else
 #define IOREMAP_BASE	VMALLOC_BASE
+#endif
 
 /*
  * Just any arbitrary offset to the start of the vmalloc VM area: the
@@ -103,8 +112,13 @@ extern int icache_44x_need_flush;
 #else
 #define VMALLOC_BASE _ALIGN_DOWN((long)high_memory + VMALLOC_OFFSET, VMALLOC_OFFSET)
 #endif
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+#define VMALLOC_START	VMALLOC_BASE
+#define VMALLOC_END	IOREMAP_BASE
+#else
 #define VMALLOC_START	ioremap_bot
 #define VMALLOC_END	IOREMAP_END
+#endif
 
 /*
  * Bits in a linux-style PTE.  These match the bits in the
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index f04cb46ae8a1..a9a2919251e0 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -47,10 +47,11 @@
 #define _PAGE_RO	0x0600	/* Supervisor RO, User no access */
 
 #define _PMD_PRESENT	0x0001
-#define _PMD_BAD	0x0fd0
+#define _PMD_BAD	0x0fc0
 #define _PMD_PAGE_MASK	0x000c
 #define _PMD_PAGE_8M	0x000c
 #define _PMD_PAGE_512K	0x0004
+#define _PMD_GUARDED	0x0010
 #define _PMD_USER	0x0020	/* APG 1 */
 
 /* Until my rework is finished, 8xx still needs atomic PTE updates */
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c3b831bb8bad..85b017c67e11 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -345,6 +345,10 @@ _ENTRY(ITLBMiss_cmp)
 	rlwinm	r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
 #ifdef CONFIG_HUGETLB_PAGE
 	mtcr	r11
+#endif
+	/* Load the MI_TWC with the attributes for this "segment." */
+	mtspr	SPRN_MI_TWC, r11	/* Set segment attributes */
+#ifdef CONFIG_HUGETLB_PAGE
 	bt-	28, 10f		/* bit 28 = Large page (8M) */
 	bt-	29, 20f		/* bit 29 = Large page (8M or 512k) */
 #endif
@@ -354,8 +358,6 @@ _ENTRY(ITLBMiss_cmp)
 #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
 	mtcr	r12
 #endif
-	/* Load the MI_TWC with the attributes for this "segment." */
-	mtspr	SPRN_MI_TWC, r11	/* Set segment attributes */
 
 #ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
@@ -457,6 +459,9 @@ _ENTRY(DTLBMiss_jmp)
 	rlwinm	r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
 #ifdef CONFIG_HUGETLB_PAGE
 	mtcr	r11
+#endif
+	mtspr	SPRN_MD_TWC, r11
+#ifdef CONFIG_HUGETLB_PAGE
 	bt-	28, 10f		/* bit 28 = Large page (8M) */
 	bt-	29, 20f		/* bit 29 = Large page (8M or 512k) */
 #endif
@@ -465,15 +470,6 @@ _ENTRY(DTLBMiss_jmp)
 4:
 	mtcr	r12
 
-	/* Insert the Guarded flag into the TWC from the Linux PTE.
-	 * It is bit 27 of both the Linux PTE and the TWC (at least
-	 * I got that right :-).  It will be better when we can put
-	 * this into the Linux pgd/pmd and load it in the operation
-	 * above.
-	 */
-	rlwimi	r11, r10, 0, _PAGE_GUARDED
-	mtspr	SPRN_MD_TWC, r11
-
 	/* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
 	 * We also need to know if the insn is a load/store, so:
 	 * Clear _PAGE_PRESENT and load that which will
diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c
index 6022adb899b7..cd3797be5e05 100644
--- a/arch/powerpc/mm/dump_linuxpagetables.c
+++ b/arch/powerpc/mm/dump_linuxpagetables.c
@@ -74,9 +74,9 @@ struct addr_marker {
 
 static struct addr_marker address_markers[] = {
 	{ 0,	"Start of kernel VM" },
+#ifdef CONFIG_PPC64
 	{ 0,	"vmalloc() Area" },
 	{ 0,	"vmalloc() End" },
-#ifdef CONFIG_PPC64
 	{ 0,	"isa I/O start" },
 	{ 0,	"isa I/O end" },
 	{ 0,	"phb I/O start" },
@@ -85,8 +85,19 @@ static struct addr_marker address_markers[] = {
 	{ 0,	"I/O remap end" },
 	{ 0,	"vmemmap start" },
 #else
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+	{ 0,	"vmalloc() Area" },
+	{ 0,	"vmalloc() End" },
 	{ 0,	"Early I/O remap start" },
 	{ 0,	"Early I/O remap end" },
+	{ 0,	"I/O remap start" },
+	{ 0,	"I/O remap end" },
+#else
+	{ 0,	"Early I/O remap start" },
+	{ 0,	"Early I/O remap end" },
+	{ 0,	"vmalloc() I/O remap start" },
+	{ 0,	"vmalloc() I/O remap end" },
+#endif
 #ifdef CONFIG_NOT_COHERENT_CACHE
 	{ 0,	"Consistent mem start" },
 	{ 0,	"Consistent mem end" },
@@ -437,9 +448,9 @@ static void populate_markers(void)
 	int i = 0;
 
 	address_markers[i++].start_address = PAGE_OFFSET;
+#ifdef CONFIG_PPC64
 	address_markers[i++].start_address = VMALLOC_START;
 	address_markers[i++].start_address = VMALLOC_END;
-#ifdef CONFIG_PPC64
 	address_markers[i++].start_address = ISA_IO_BASE;
 	address_markers[i++].start_address = ISA_IO_END;
 	address_markers[i++].start_address = PHB_IO_BASE;
@@ -452,8 +463,19 @@ static void populate_markers(void)
 	address_markers[i++].start_address =  VMEMMAP_BASE;
 #endif
 #else /* !CONFIG_PPC64 */
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+	address_markers[i++].start_address = VMALLOC_START;
+	address_markers[i++].start_address = VMALLOC_END;
 	address_markers[i++].start_address = IOREMAP_BASE;
 	address_markers[i++].start_address = ioremap_bot;
+	address_markers[i++].start_address = ioremap_bot;
+	address_markers[i++].start_address = IOREMAP_END;
+#else
+	address_markers[i++].start_address = IOREMAP_BASE;
+	address_markers[i++].start_address = ioremap_bot;
+	address_markers[i++].start_address = ioremap_bot;
+	address_markers[i++].start_address = IOREMAP_END;
+#endif
 #ifdef CONFIG_NOT_COHERENT_CACHE
 	address_markers[i++].start_address = IOREMAP_END;
 	address_markers[i++].start_address = IOREMAP_END +
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 59be5dfcb3e9..b8c347077e02 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -132,9 +132,14 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
 	if (slab_is_available()) {
 		struct vm_struct *area;
 
-		area = __get_vm_area_caller(size, VM_IOREMAP,
-					    ioremap_bot, IOREMAP_END,
-					    caller);
+		if (flags & _PAGE_GUARDED)
+			area = __get_vm_area_caller(size, VM_IOREMAP,
+						    ioremap_bot, IOREMAP_END,
+						    caller);
+		else
+			area = __get_vm_area_caller(size, VM_IOREMAP,
+						    VMALLOC_START, VMALLOC_END,
+						    caller);
 		if (area == NULL)
 			return NULL;
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index b680aa78a4ac..fd7af7af5b58 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -386,10 +386,19 @@ void __init mem_init(void)
 	pr_info("  * 0x%08lx..0x%08lx  : consistent mem\n",
 		IOREMAP_END, IOREMAP_END + CONFIG_CONSISTENT_SIZE);
 #endif /* CONFIG_NOT_COHERENT_CACHE */
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+	pr_info("  * 0x%08lx..0x%08lx  : ioremap\n",
+		ioremap_bot, IOREMAP_END);
 	pr_info("  * 0x%08lx..0x%08lx  : early ioremap\n",
 		IOREMAP_BASE, ioremap_bot);
+	pr_info("  * 0x%08lx..0x%08lx  : vmalloc\n",
+		VMALLOC_START, VMALLOC_END);
+#else
 	pr_info("  * 0x%08lx..0x%08lx  : vmalloc & ioremap\n",
 		VMALLOC_START, VMALLOC_END);
+	pr_info("  * 0x%08lx..0x%08lx  : early ioremap\n",
+		IOREMAP_BASE, ioremap_bot);
+#endif
 #endif /* CONFIG_PPC32 */
 }
 
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 54a5bc0767a9..3aa0c78db95d 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -70,6 +70,27 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
 	return ptepage;
 }
 
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+int __pte_alloc_kernel_g(pmd_t *pmd, unsigned long address)
+{
+	pte_t *new = pte_alloc_one_kernel(&init_mm, address);
+	if (!new)
+		return -ENOMEM;
+
+	smp_wmb(); /* See comment in __pte_alloc */
+
+	spin_lock(&init_mm.page_table_lock);
+	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
+		pmd_populate_kernel_g(&init_mm, pmd, new);
+		new = NULL;
+	}
+	spin_unlock(&init_mm.page_table_lock);
+	if (new)
+		pte_free_kernel(&init_mm, new);
+	return 0;
+}
+#endif
+
 int map_kernel_page(unsigned long va, phys_addr_t pa, int flags)
 {
 	pmd_t *pd;
@@ -79,7 +100,12 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, int flags)
 	/* Use upper 10 bits of VA to index the first level map */
 	pd = pmd_offset(pud_offset(pgd_offset_k(va), va), va);
 	/* Use middle 10 bits of VA to index the second-level map */
-	pg = pte_alloc_kernel(pd, va);
+#ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
+	if (flags & _PAGE_GUARDED)
+		pg = pte_alloc_kernel_g(pd, va);
+	else
+#endif
+		pg = pte_alloc_kernel(pd, va);
 	if (pg != 0) {
 		err = 0;
 		/* The PTE should never be already set nor present in the
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 67d3125d0610..f860f0326c78 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -319,6 +319,9 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	def_bool y
 	depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
 
+config PPC_GUARDED_PAGE_IN_PMD
+	def_bool y
+	depends on PPC_8xx
 
 config PPC_MMU_NOHASH
 	def_bool y
-- 
2.13.3

* [PATCH  12/17] powerpc/8xx: Remove PTE_ATOMIC_UPDATES
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (10 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 11/17] powerpc/nohash32: set GUARDED attribute in the PMD directly Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 13:16   ` Joakim Tjernlund
  2018-05-04 12:34 ` [PATCH 13/17] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx Christophe Leroy
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Commit 1bc54c03117b9 ("powerpc: rework 4xx PTE access and TLB miss")
introduced non-atomic PTE updates and started the work of removing
PTE updates in TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
8xx with the following comment:
/* Until my rework is finished, 8xx still needs atomic PTE updates */

Commit fe11dc3f9628e ("powerpc/8xx: Update TLB asm so it behaves as linux
mm expects") removed all PTE updates done in TLB miss handlers.

Therefore, atomic PTE updates are no longer needed for the 8xx.
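
For reference, once PTE_ATOMIC_UPDATES is gone the update reduces to a
plain read-modify-write. The sketch below is a user-space model of
that path, assuming a 32-bit PTE; model_pte_t and model_pte_update()
are hypothetical names, not the kernel's own.

```c
#include <stdint.h>

/* Assumption for this sketch: a PTE is a single 32-bit word. */
typedef uint32_t model_pte_t;

/*
 * Non-atomic update: clear the 'clr' bits, set the 'set' bits, and
 * hand back the previous value. This is only safe when nothing else
 * (such as a TLB miss handler) writes the PTE concurrently.
 */
static inline model_pte_t model_pte_update(model_pte_t *p,
					   model_pte_t clr,
					   model_pte_t set)
{
	model_pte_t old = *p;

	*p = (old & ~clr) | set;
	return old;
}
```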

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index a9a2919251e0..31401320c1a5 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -54,9 +54,6 @@
 #define _PMD_GUARDED	0x0010
 #define _PMD_USER	0x0020	/* APG 1 */
 
-/* Until my rework is finished, 8xx still needs atomic PTE updates */
-#define PTE_ATOMIC_UPDATES	1
-
 #ifdef CONFIG_PPC_16K_PAGES
 #define _PAGE_PSIZE	_PAGE_HUGE
 #endif
-- 
2.13.3

* [PATCH  13/17] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (11 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 12/17] powerpc/8xx: Remove PTE_ATOMIC_UPDATES Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 14/17] powerpc/8xx: reunify TLB handler routines Christophe Leroy
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Today, on the 8xx, the TLB handlers do a SW tablewalk by doing all
the calculation in ASM, in order to match the Linux page
table structure.

The 8xx offers hardware assistance which allows a significant size
reduction of the TLB handlers, and hence also reduces the time spent
in the handlers.

However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4 Mbytes area which is managed by a level 2 table (PTE) also
containing 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table.
- 512k page PTEs have to be spread every 128 bytes in the L2 table.
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.

In order to use hardware assistance, this patch makes the following
modifications:
- Make the PGD size independent of the main page size.
- In 16k pages mode, redefine pte_t as a struct with 4 elements,
and populate those 4 elements in __set_pte_at() and pte_update().
- Modify the TLB handlers to use HW assistance.
- Adapt the size of the hugepage tables.
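
The 16k case can be modelled in user space as follows, assuming a
32-bit pte_basic_t. struct model_pte16k and its helpers are
hypothetical names for this sketch; the field names follow the pte_t
definition in the diff below.

```c
#include <stdint.h>

/* Assumption for this sketch: pte_basic_t is 32 bits wide. */
typedef uint32_t model_pte_basic_t;

/* One Linux 16k PTE backed by four identical 4k hardware entries. */
struct model_pte16k {
	model_pte_basic_t pte, pte1, pte2, pte3;
};

/* Store the same value in all four slots, as the HW tablewalk expects. */
static inline void model_set_pte16k(struct model_pte16k *p,
				    model_pte_basic_t val)
{
	p->pte = p->pte1 = p->pte2 = p->pte3 = val;
}

/* Reading back only needs the first slot, since all four are identical. */
static inline model_pte_basic_t model_pte16k_val(const struct model_pte16k *p)
{
	return p->pte;
}
```

Keeping the four slots in one struct means generic code still sees a
single pte_t per 16k page, while the hardware finds a valid entry
whichever of the four 4k slots it indexes.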

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/hugetlb.h           |   4 +-
 arch/powerpc/include/asm/nohash/32/pgtable.h |  16 +-
 arch/powerpc/include/asm/nohash/pgtable.h    |   4 +
 arch/powerpc/include/asm/pgtable-types.h     |   4 +
 arch/powerpc/kernel/head_8xx.S               | 227 +++++++++------------------
 arch/powerpc/mm/8xx_mmu.c                    |  10 +-
 arch/powerpc/mm/hugetlbpage.c                |  12 ++
 7 files changed, 112 insertions(+), 165 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 78540c074d70..6d29be6bac74 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -77,7 +77,9 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
 	unsigned long idx = 0;
 
 	pte_t *dir = hugepd_page(hpd);
-#ifndef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_8xx
+	idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
+#elif !defined(CONFIG_PPC_FSL_BOOK3E)
 	idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
 #endif
 
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 009a5b3d3192..3efd616bbc80 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -18,7 +18,11 @@ extern int icache_44x_need_flush;
 
 #endif /* __ASSEMBLY__ */
 
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+#define PTE_INDEX_SIZE  (PTE_SHIFT - 2)
+#else
 #define PTE_INDEX_SIZE	PTE_SHIFT
+#endif
 #define PMD_INDEX_SIZE	0
 #define PUD_INDEX_SIZE	0
 #define PGD_INDEX_SIZE	(32 - PGDIR_SHIFT)
@@ -47,7 +51,11 @@ extern int icache_44x_need_flush;
  * -Matt
  */
 /* PGDIR_SHIFT determines what a top-level page table entry can map */
+#ifdef CONFIG_PPC_8xx
+#define PGDIR_SHIFT	22
+#else
 #define PGDIR_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
+#endif
 #define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
@@ -190,7 +198,13 @@ static inline unsigned long pte_update(pte_t *p,
 	: "cc" );
 #else /* PTE_ATOMIC_UPDATES */
 	unsigned long old = pte_val(*p);
-	*p = __pte((old & ~clr) | set);
+	unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+	p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+	*p = __pte(new);
+#endif
 #endif /* !PTE_ATOMIC_UPDATES */
 
 #ifdef CONFIG_44x
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 077472640b35..e4b6c084be5c 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -165,7 +165,11 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	/* Anything else just stores the PTE normally. That covers all 64-bit
 	 * cases, and 32-bit non-hash with 32-bit PTEs.
 	 */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+	ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
 	*ptep = pte;
+#endif
 
 	/*
 	 * With hardware tablewalk, a sync is needed to ensure that
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index eccb30b38b47..3b0edf041b2e 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -3,7 +3,11 @@
 #define _ASM_POWERPC_PGTABLE_TYPES_H
 
 /* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
 typedef struct { pte_basic_t pte; } pte_t;
+#endif
 #define __pte(x)	((pte_t) { (x) })
 static inline pte_basic_t pte_val(pte_t x)
 {
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 85b017c67e11..4855d5a36f70 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -275,7 +275,7 @@ SystemCall:
 	. = 0x1100
 /*
  * For the MPC8xx, this is a software tablewalk to load the instruction
- * TLB.  The task switch loads the M_TW register with the pointer to the first
+ * TLB.  The task switch loads the M_TWB register with the pointer to the first
  * level table.
  * If we discover there is no second level table (value is zero) or if there
  * is an invalid pte, we load that into the TLB, which causes another fault
@@ -285,106 +285,100 @@ SystemCall:
  */
 
 #ifdef CONFIG_8xx_CPU15
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)	\
-	addi	tmp, addr, PAGE_SIZE;	\
-	tlbie	tmp;			\
-	addi	tmp, addr, -PAGE_SIZE;	\
-	tlbie	tmp
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)	\
+	addi	addr, addr, PAGE_SIZE;	\
+	tlbie	addr;			\
+	addi	addr, addr, -(PAGE_SIZE << 1);	\
+	tlbie	addr;			\
+	addi	addr, addr, PAGE_SIZE
 #else
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)
 #endif
 
 InstructionTLBMiss:
 	mtspr	SPRN_SPRG_SCRATCH0, r10
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
 	mtspr	SPRN_SPRG_SCRATCH1, r11
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-	mtspr	SPRN_SPRG_SCRATCH2, r12
+#ifdef ITLB_MISS_KERNEL
+	mfcr	r11
+#endif
 #endif
 
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
 	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
-	INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+	INVALIDATE_ADJACENT_PAGES_CPU15(r10)
 	/* Only modules will cause ITLB Misses as we always
 	 * pin the first 8MB of kernel memory */
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-	mfcr	r12
-#endif
+	mtspr	SPRN_MD_EPN, r10
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-	andis.	r11, r10, 0x8000	/* Address >= 0x80000000 */
+	cmpi	cr0, r10, 0	/* Address >= 0x80000000 */
 #else
-	rlwinm	r11, r10, 16, 0xfff8
-	cmpli	cr0, r11, PAGE_OFFSET@h
+	rlwinm	r10, r10, 16, 0xfff8
+	cmpli	cr0, r10, PAGE_OFFSET@h
 #ifndef CONFIG_PIN_TLB_TEXT
 	/* It is assumed that kernel code fits into the first 8M page */
 _ENTRY(ITLBMiss_cmp)
-	cmpli	cr7, r11, (PAGE_OFFSET + 0x0800000)@h
+	cmpli	cr7, r10, (PAGE_OFFSET + 0x0800000)@h
 #endif
 #endif
 #endif
-	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
+	mfspr	r10, SPRN_M_TWB	/* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-	beq+	3f
+	bge+	3f
 #else
 	blt+	3f
 #endif
 #ifndef CONFIG_PIN_TLB_TEXT
 	blt	cr7, ITLBMissLinear
 #endif
-	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+	rlwinm	r10, r10, 0, 20, 31
+	oris	r10, r10, (swapper_pg_dir - PAGE_OFFSET)@h
+	ori	r10, r10, (swapper_pg_dir - PAGE_OFFSET)@l
 3:
 #endif
+#ifdef ITLB_MISS_KERNEL
+	mfcr	r11
+#endif
 	/* Insert level 1 index */
-	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
-	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
+	lwz	r10, 0(r10)	/* Get the level 1 entry */
+	mtspr	SPRN_MI_TWC, r10	/* Set segment attributes */
+	mtspr	SPRN_MD_TWC, r10
 
-	/* Extract level 2 index */
-	rlwinm	r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
-	mtcr	r11
-#endif
-	/* Load the MI_TWC with the attributes for this "segment." */
-	mtspr	SPRN_MI_TWC, r11	/* Set segment attributes */
-#ifdef CONFIG_HUGETLB_PAGE
-	bt-	28, 10f		/* bit 28 = Large page (8M) */
-	bt-	29, 20f		/* bit 29 = Large page (8M or 512k) */
-#endif
-	rlwimi	r10, r11, 0, 0, 32 - PAGE_SHIFT - 1	/* Add level 2 base */
+	mfspr	r10, SPRN_MD_TWC
 	lwz	r10, 0(r10)	/* Get the pte */
-4:
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-	mtcr	r12
-#endif
 
 #ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
 	and	r11, r11, r10
 	rlwimi	r10, r11, 0, _PAGE_PRESENT
 #endif
-	li	r11, RPN_PATTERN | 0x200
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 20 and 23 must be clear.
 	 * Software indicator bits 22, 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
-	rlwimi	r11, r10, 4, 0x0400	/* Copy _PAGE_EXEC into bit 21 */
-	rlwimi	r10, r11, 0, 0x0ff0	/* Set 22, 24-27, clear 20,23 */
+	rlwimi	r10, r10, 0, 0x0f00	/* Clear bits 20-23 */
+	rlwimi	r10, r10, 4, 0x0400	/* Copy _PAGE_EXEC into bit 21 */
+	ori	r10, r10, RPN_PATTERN | 0x200 /* Set 22 and 24-27 */
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
 
 	/* Restore registers */
 _ENTRY(itlb_miss_exit_1)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-	mfspr	r12, SPRN_SPRG_SCRATCH2
 #endif
 	rfi
 #ifdef CONFIG_PERF_EVENTS
 _ENTRY(itlb_miss_perf)
+#if !defined(ITLB_MISS_KERNEL) && !defined(CONFIG_SWAP)
+	mtspr	SPRN_SPRG_SCRATCH1, r11
+#endif
 	lis	r10, (itlb_miss_counter - PAGE_OFFSET)@ha
 	lwz	r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
 	addi	r11, r11, 1
@@ -392,83 +386,42 @@ _ENTRY(itlb_miss_perf)
 #endif
 	mfspr	r10, SPRN_SPRG_SCRATCH0
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-	mfspr	r12, SPRN_SPRG_SCRATCH2
-#endif
 	rfi
 
-#ifdef CONFIG_HUGETLB_PAGE
-10:	/* 8M pages */
-#ifdef CONFIG_PPC_16K_PAGES
-	/* Extract level 2 index */
-	rlwinm	r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
-	/* Add level 2 base */
-	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-#else
-	/* Level 2 base */
-	rlwinm	r10, r11, 0, ~HUGEPD_SHIFT_MASK
-#endif
-	lwz	r10, 0(r10)	/* Get the pte */
-	b	4b
-
-20:	/* 512k pages */
-	/* Extract level 2 index */
-	rlwinm	r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
-	/* Add level 2 base */
-	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
-	lwz	r10, 0(r10)	/* Get the pte */
-	b	4b
-#endif
-
 	. = 0x1200
 DataStoreTLBMiss:
 	mtspr	SPRN_SPRG_SCRATCH0, r10
 	mtspr	SPRN_SPRG_SCRATCH1, r11
-	mtspr	SPRN_SPRG_SCRATCH2, r12
-	mfcr	r12
+	mfcr	r11
 
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
 	mfspr	r10, SPRN_MD_EPN
-	rlwinm	r11, r10, 16, 0xfff8
-	cmpli	cr0, r11, PAGE_OFFSET@h
-	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
-	blt+	3f
-	rlwinm	r11, r10, 16, 0xfff8
+	rlwinm	r10, r10, 16, 0xfff8
+	cmpli	cr0, r10, PAGE_OFFSET@h
 #ifndef CONFIG_PIN_TLB_IMMR
-	cmpli	cr0, r11, VIRT_IMMR_BASE@h
+	cmpli	cr6, r10, VIRT_IMMR_BASE@h
 #endif
 _ENTRY(DTLBMiss_cmp)
-	cmpli	cr7, r11, (PAGE_OFFSET + 0x1800000)@h
+	cmpli	cr7, r10, (PAGE_OFFSET + 0x1800000)@h
+	mfspr	r10, SPRN_M_TWB	/* Get level 1 table */
+	blt+	3f
 #ifndef CONFIG_PIN_TLB_IMMR
 _ENTRY(DTLBMiss_jmp)
-	beq-	DTLBMissIMMR
+	beq-	cr6, DTLBMissIMMR
 #endif
 	blt	cr7, DTLBMissLinear
-	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+	rlwinm	r10, r10, 0, 20, 31
+	oris	r10, r10, (swapper_pg_dir - PAGE_OFFSET)@h
+	ori	r10, r10, (swapper_pg_dir - PAGE_OFFSET)@l
 3:
-
-	/* Insert level 1 index */
-	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
-	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
-
-	/* We have a pte table, so load fetch the pte from the table.
-	 */
-	/* Extract level 2 index */
-	rlwinm	r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
+	lwz	r10, 0(r10)	/* Get the level 1 entry */
 	mtcr	r11
-#endif
-	mtspr	SPRN_MD_TWC, r11
-#ifdef CONFIG_HUGETLB_PAGE
-	bt-	28, 10f		/* bit 28 = Large page (8M) */
-	bt-	29, 20f		/* bit 29 = Large page (8M or 512k) */
-#endif
-	rlwimi	r10, r11, 0, 0, 32 - PAGE_SHIFT - 1	/* Add level 2 base */
+
+	mtspr	SPRN_MD_TWC, r10
+	mfspr	r10, SPRN_MD_TWC
 	lwz	r10, 0(r10)	/* Get the pte */
-4:
-	mtcr	r12
 
 	/* Both _PAGE_ACCESSED and _PAGE_PRESENT has to be set.
 	 * We also need to know if the insn is a load/store, so:
@@ -498,7 +451,6 @@ _ENTRY(DTLBMiss_jmp)
 _ENTRY(dtlb_miss_exit_1)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-	mfspr	r12, SPRN_SPRG_SCRATCH2
 	rfi
 #ifdef CONFIG_PERF_EVENTS
 _ENTRY(dtlb_miss_perf)
@@ -509,32 +461,8 @@ _ENTRY(dtlb_miss_perf)
 #endif
 	mfspr	r10, SPRN_SPRG_SCRATCH0
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-	mfspr	r12, SPRN_SPRG_SCRATCH2
 	rfi
 
-#ifdef CONFIG_HUGETLB_PAGE
-10:	/* 8M pages */
-	/* Extract level 2 index */
-#ifdef CONFIG_PPC_16K_PAGES
-	rlwinm	r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
-	/* Add level 2 base */
-	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-#else
-	/* Level 2 base */
-	rlwinm	r10, r11, 0, ~HUGEPD_SHIFT_MASK
-#endif
-	lwz	r10, 0(r10)	/* Get the pte */
-	b	4b
-
-20:	/* 512k pages */
-	/* Extract level 2 index */
-	rlwinm	r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
-	/* Add level 2 base */
-	rlwimi	r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
-	lwz	r10, 0(r10)	/* Get the pte */
-	b	4b
-#endif
-
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -642,7 +570,7 @@ InstructionBreakpoint:
  * not enough space in the DataStoreTLBMiss area.
  */
 DTLBMissIMMR:
-	mtcr	r12
+	mtcr	r11
 	/* Set 512k byte guarded page and mark it valid */
 	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID
 	mtspr	SPRN_MD_TWC, r10
@@ -657,15 +585,14 @@ DTLBMissIMMR:
 _ENTRY(dtlb_miss_exit_2)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-	mfspr	r12, SPRN_SPRG_SCRATCH2
 	rfi
 
 DTLBMissLinear:
-	mtcr	r12
+	mtcr	r11
 	/* Set 8M byte page and mark it valid */
 	li	r11, MD_PS8MEG | MD_SVALID
 	mtspr	SPRN_MD_TWC, r11
-	rlwinm	r10, r10, 0, 0x0f800000	/* 8xx supports max 256Mb RAM */
+	rlwinm	r10, r10, 20, 0x0f800000	/* 8xx supports max 256Mb RAM */
 	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
 			  _PAGE_PRESENT
 	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
@@ -675,16 +602,15 @@ DTLBMissLinear:
 _ENTRY(dtlb_miss_exit_3)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-	mfspr	r12, SPRN_SPRG_SCRATCH2
 	rfi
 
 #ifndef CONFIG_PIN_TLB_TEXT
 ITLBMissLinear:
-	mtcr	r12
+	mtcr	r11
 	/* Set 8M byte page and mark it valid */
 	li	r11, MI_PS8MEG | MI_SVALID
 	mtspr	SPRN_MI_TWC, r11
-	rlwinm	r10, r10, 0, 0x0f800000	/* 8xx supports max 256Mb RAM */
+	rlwinm	r10, r10, 20, 0x0f800000	/* 8xx supports max 256Mb RAM */
 	ori	r10, r10, 0xf0 | MI_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
 			  _PAGE_PRESENT
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
@@ -692,7 +618,6 @@ ITLBMissLinear:
 _ENTRY(itlb_miss_exit_2)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
 	mfspr	r11, SPRN_SPRG_SCRATCH1
-	mfspr	r12, SPRN_SPRG_SCRATCH2
 	rfi
 #endif
 
@@ -706,9 +631,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
 	mtspr	SPRN_SPRG_SCRATCH2, r10
 	/* fetch instruction from memory. */
 	mfspr	r10, SPRN_SRR0
+	mtspr	SPRN_MD_EPN, r10
 	rlwinm	r11, r10, 16, 0xfff8
 	cmpli	cr0, r11, PAGE_OFFSET@h
-	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
+	mfspr	r11, SPRN_M_TWB	/* Get level 1 table */
 	blt+	3f
 	rlwinm	r11, r10, 16, 0xfff8
 _ENTRY(FixupDAR_cmp)
@@ -716,19 +642,20 @@ _ENTRY(FixupDAR_cmp)
 	/* create physical page address from effective address */
 	tophys(r11, r10)
 	blt-	cr7, 201f
-	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
-	/* Insert level 1 index */
-3:	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
-	lwz	r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)	/* Get the level 1 entry */
+	mfspr	r11, SPRN_M_TWB	/* Get level 1 table */
+	rlwinm	r11, r11, 0, 20, 31
+	oris	r11, r11, (swapper_pg_dir - PAGE_OFFSET)@h
+	ori	r11, r11, (swapper_pg_dir - PAGE_OFFSET)@l
+3:
+	lwz	r11, 0(r11)	/* Get the level 1 entry */
+	mtspr	SPRN_MD_TWC, r11
 	mtcr	r11
+	mfspr	r11, SPRN_MD_TWC
+	lwz	r11, 0(r11)	/* Get the pte */
 	bt	28,200f		/* bit 28 = Large page (8M) */
 	bt	29,202f		/* bit 29 = Large page (8M or 512K) */
-	rlwinm	r11, r11,0,0,19	/* Extract page descriptor page address */
-	/* Insert level 2 index */
-	rlwimi	r11, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-	lwz	r11, 0(r11)	/* Get the pte */
 	/* concat physical page address(r11) and page offset(r10) */
-	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT, 31
+	rlwimi	r11, r10, 0, 20, 31
 201:	lwz	r11,0(r11)
 /* Check if it really is a dcbx instruction. */
 /* dcbt and dcbtst does not generate DTLB Misses/Errors,
@@ -748,23 +675,12 @@ _ENTRY(FixupDAR_cmp)
 141:	mfspr	r10,SPRN_SPRG_SCRATCH2
 	b	DARFixed	/* Nope, go back to normal TLB processing */
 
-	/* concat physical page address(r11) and page offset(r10) */
 200:
-#ifdef CONFIG_PPC_16K_PAGES
-	rlwinm	r11, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-	rlwimi	r11, r10, 32 - (PAGE_SHIFT_8M - 2), 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1), 29
-#else
-	rlwinm	r11, r10, 0, ~HUGEPD_SHIFT_MASK
-#endif
-	lwz	r11, 0(r11)	/* Get the pte */
 	/* concat physical page address(r11) and page offset(r10) */
 	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT_8M, 31
 	b	201b
 
 202:
-	rlwinm	r11, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
-	rlwimi	r11, r10, 32 - (PAGE_SHIFT_512K - 2), 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
-	lwz	r11, 0(r11)	/* Get the pte */
 	/* concat physical page address(r11) and page offset(r10) */
 	rlwimi	r11, r10, 0, 32 - PAGE_SHIFT_512K, 31
 	b	201b
@@ -898,9 +814,10 @@ start_here:
 	 * init's THREAD like the context switch code does, but this is
 	 * easier......until someone changes init's static structures.
 	 */
-	lis	r6, swapper_pg_dir@ha
+	lis	r6, swapper_pg_dir@h
+	ori	r6, r6, swapper_pg_dir@l
 	tophys(r6,r6)
-	mtspr	SPRN_M_TW, r6
+	mtspr	SPRN_M_TWB, r6
 	lis	r4,2f@h
 	ori	r4,r4,2f@l
 	tophys(r4,r4)
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 5d53684c2ebd..54a02b8e21ec 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -173,8 +173,6 @@ void __init setup_initial_memory_limit(phys_addr_t first_memblock_base,
  */
 void set_context(unsigned long id, pgd_t *pgd)
 {
-	s16 offset = (s16)(__pa(swapper_pg_dir));
-
 #ifdef CONFIG_BDI_SWITCH
 	pgd_t	**ptr = *(pgd_t ***)(KERNELBASE + 0xf0);
 
@@ -184,12 +182,8 @@ void set_context(unsigned long id, pgd_t *pgd)
 	*(ptr + 1) = pgd;
 #endif
 
-	/* Register M_TW will contain base address of level 1 table minus the
-	 * lower part of the kernel PGDIR base address, so that all accesses to
-	 * level 1 table are done relative to lower part of kernel PGDIR base
-	 * address.
-	 */
-	mtspr(SPRN_M_TW, __pa(pgd) - offset);
+	/* Register M_TWB will contain base address of level 1 table */
+	mtspr(SPRN_M_TWB, __pa(pgd));
 
 	/* Update context */
 	mtspr(SPRN_M_CASID, id - 1);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index f1153f8254e3..6c07d40eed0f 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -61,7 +61,11 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 		cachep = hugepte_cache;
 		num_hugepd = 1 << (pshift - pdshift);
 	} else {
+#ifdef CONFIG_PPC_8xx
+		cachep = PGT_CACHE(PTE_SHIFT);
+#else
 		cachep = PGT_CACHE(pdshift - pshift);
+#endif
 		num_hugepd = 1;
 	}
 
@@ -326,7 +330,11 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshif
 	if (shift >= pdshift)
 		hugepd_free(tlb, hugepte);
 	else
+#ifdef CONFIG_PPC_8xx
+		pgtable_free_tlb(tlb, hugepte, PTE_SHIFT);
+#else
 		pgtable_free_tlb(tlb, hugepte, pdshift - shift);
+#endif
 }
 
 static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -689,7 +697,11 @@ static int __init hugetlbpage_init(void)
 		 * use pgt cache for hugepd.
 		 */
 		if (pdshift > shift)
+#ifdef CONFIG_PPC_8xx
+			pgtable_cache_add(PTE_SHIFT, NULL);
+#else
 			pgtable_cache_add(pdshift - shift, NULL);
+#endif
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 		else if (!hugepte_cache) {
 			/*
-- 
2.13.3


* [PATCH  14/17] powerpc/8xx: reunify TLB handler routines
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (12 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 13/17] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 15/17] powerpc/8xx: Free up SPRN_SPRG_SCRATCH2 Christophe Leroy
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Each handler must not exceed 64 instructions to fit into the main
exception area.
Following the significant size reduction of TLB handler routines,
the side handlers can be brought back close to the main part.

In the worst case:
Main part of ITLB handler is 45 insn, side part is 9 insn ==> total 54
Main part of DTLB handler is 37 insn, side part is 23 insn ==> total 60
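
For reference, the 64-instruction limit follows from the layout in
head_8xx.S: the ITLB miss handler starts at `. = 0x1100` and the DTLB miss
handler at `. = 0x1200`, so each slot is 0x100 bytes, and a PowerPC
instruction is 4 bytes. A minimal sketch of that budget check, using the
worst-case counts quoted above (the macro and function names are
illustrative, not kernel identifiers):

```c
#include <assert.h>

/* Illustrative names only; none of these exist in the kernel tree. */
#define EXC_SLOT_BYTES	0x100u	/* 0x1200 - 0x1100 */
#define INSN_BYTES	4u	/* fixed-width PowerPC instruction */
#define EXC_SLOT_INSNS	(EXC_SLOT_BYTES / INSN_BYTES)

static_assert(EXC_SLOT_INSNS == 64, "one exception slot holds 64 instructions");

/*
 * Instruction slots left once the main and side parts share one
 * exception slot, or -1 if they do not fit.
 */
static int slot_headroom(unsigned int main_insns, unsigned int side_insns)
{
	unsigned int total = main_insns + side_insns;

	if (total > EXC_SLOT_INSNS)
		return -1;
	return (int)(EXC_SLOT_INSNS - total);
}
```

With the figures above, slot_headroom(45, 9) leaves 10 instructions for the
ITLB handler and slot_headroom(37, 23) leaves 4 for the DTLB handler.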

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/head_8xx.S | 108 ++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 4855d5a36f70..c98a4ebb5a4d 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -388,6 +388,23 @@ _ENTRY(itlb_miss_perf)
 	mfspr	r11, SPRN_SPRG_SCRATCH1
 	rfi
 
+#ifndef CONFIG_PIN_TLB_TEXT
+ITLBMissLinear:
+	mtcr	r11
+	/* Set 8M byte page and mark it valid */
+	li	r11, MI_PS8MEG | MI_SVALID
+	mtspr	SPRN_MI_TWC, r11
+	rlwinm	r10, r10, 20, 0x0f800000	/* 8xx supports max 256Mb RAM */
+	ori	r10, r10, 0xf0 | MI_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
+			  _PAGE_PRESENT
+	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
+
+_ENTRY(itlb_miss_exit_2)
+	mfspr	r10, SPRN_SPRG_SCRATCH0
+	mfspr	r11, SPRN_SPRG_SCRATCH1
+	rfi
+#endif
+
 	. = 0x1200
 DataStoreTLBMiss:
 	mtspr	SPRN_SPRG_SCRATCH0, r10
@@ -463,6 +480,41 @@ _ENTRY(dtlb_miss_perf)
 	mfspr	r11, SPRN_SPRG_SCRATCH1
 	rfi
 
+DTLBMissIMMR:
+	mtcr	r11
+	/* Set 512k byte guarded page and mark it valid */
+	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID
+	mtspr	SPRN_MD_TWC, r10
+	mfspr	r10, SPRN_IMMR			/* Get current IMMR */
+	rlwinm	r10, r10, 0, 0xfff80000		/* Get 512 kbytes boundary */
+	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
+			  _PAGE_PRESENT | _PAGE_NO_CACHE
+	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
+
+	li	r11, RPN_PATTERN
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+_ENTRY(dtlb_miss_exit_2)
+	mfspr	r10, SPRN_SPRG_SCRATCH0
+	mfspr	r11, SPRN_SPRG_SCRATCH1
+	rfi
+
+DTLBMissLinear:
+	mtcr	r11
+	/* Set 8M byte page and mark it valid */
+	li	r11, MD_PS8MEG | MD_SVALID
+	mtspr	SPRN_MD_TWC, r11
+	rlwinm	r10, r10, 20, 0x0f800000	/* 8xx supports max 256Mb RAM */
+	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
+			  _PAGE_PRESENT
+	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
+
+	li	r11, RPN_PATTERN
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+_ENTRY(dtlb_miss_exit_3)
+	mfspr	r10, SPRN_SPRG_SCRATCH0
+	mfspr	r11, SPRN_SPRG_SCRATCH1
+	rfi
+
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -565,62 +617,6 @@ InstructionBreakpoint:
 
 	. = 0x2000
 
-/*
- * Bottom part of DataStoreTLBMiss handlers for IMMR area and linear RAM.
- * not enough space in the DataStoreTLBMiss area.
- */
-DTLBMissIMMR:
-	mtcr	r11
-	/* Set 512k byte guarded page and mark it valid */
-	li	r10, MD_PS512K | MD_GUARDED | MD_SVALID
-	mtspr	SPRN_MD_TWC, r10
-	mfspr	r10, SPRN_IMMR			/* Get current IMMR */
-	rlwinm	r10, r10, 0, 0xfff80000		/* Get 512 kbytes boundary */
-	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
-			  _PAGE_PRESENT | _PAGE_NO_CACHE
-	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
-
-	li	r11, RPN_PATTERN
-	mtspr	SPRN_DAR, r11	/* Tag DAR */
-_ENTRY(dtlb_miss_exit_2)
-	mfspr	r10, SPRN_SPRG_SCRATCH0
-	mfspr	r11, SPRN_SPRG_SCRATCH1
-	rfi
-
-DTLBMissLinear:
-	mtcr	r11
-	/* Set 8M byte page and mark it valid */
-	li	r11, MD_PS8MEG | MD_SVALID
-	mtspr	SPRN_MD_TWC, r11
-	rlwinm	r10, r10, 20, 0x0f800000	/* 8xx supports max 256Mb RAM */
-	ori	r10, r10, 0xf0 | MD_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
-			  _PAGE_PRESENT
-	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
-
-	li	r11, RPN_PATTERN
-	mtspr	SPRN_DAR, r11	/* Tag DAR */
-_ENTRY(dtlb_miss_exit_3)
-	mfspr	r10, SPRN_SPRG_SCRATCH0
-	mfspr	r11, SPRN_SPRG_SCRATCH1
-	rfi
-
-#ifndef CONFIG_PIN_TLB_TEXT
-ITLBMissLinear:
-	mtcr	r11
-	/* Set 8M byte page and mark it valid */
-	li	r11, MI_PS8MEG | MI_SVALID
-	mtspr	SPRN_MI_TWC, r11
-	rlwinm	r10, r10, 20, 0x0f800000	/* 8xx supports max 256Mb RAM */
-	ori	r10, r10, 0xf0 | MI_SPS16K | _PAGE_PRIVILEGED | _PAGE_DIRTY | \
-			  _PAGE_PRESENT
-	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
-
-_ENTRY(itlb_miss_exit_2)
-	mfspr	r10, SPRN_SPRG_SCRATCH0
-	mfspr	r11, SPRN_SPRG_SCRATCH1
-	rfi
-#endif
-
 /* This is the procedure to calculate the data EA for buggy dcbx,dcbi instructions
  * by decoding the registers used by the dcbx instruction and adding them.
  * DAR is set to the calculated address.
-- 
2.13.3


* [PATCH  15/17] powerpc/8xx: Free up SPRN_SPRG_SCRATCH2
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (13 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 14/17] powerpc/8xx: reunify TLB handler routines Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH 16/17] powerpc/mm: Make pte_fragment_alloc() common to PPC32 and PPC64 Christophe Leroy
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Now that the level 1 table pointer is kept in SPRN_M_TWB, SPRN_M_TW is
free, so the DAR fixup code can use it to save r10 instead of
SPRN_SPRG_SCRATCH2.

This frees SPRN_SPRG_SCRATCH2 for other uses in the future.
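
The save/restore pattern at the FixupDAR entry and exit points can be
modelled roughly as below. This is a toy userspace model, not kernel code:
struct cpu_model and fixup_dar_model() are hypothetical names, and the real
handler does much more between the save and the restore.

```c
#include <stdint.h>

/* Toy model, not kernel code: two SPRs and one GPR. */
struct cpu_model {
	uint32_t m_twb;	/* level 1 table base, used by the TLB miss handlers */
	uint32_t m_tw;	/* used here purely as scratch storage */
	uint32_t r10;
};

/* Park r10 in M_TW, clobber r10 with SRR0, then restore r10 on the way out. */
static uint32_t fixup_dar_model(struct cpu_model *cpu, uint32_t srr0)
{
	uint32_t fetch_addr;

	cpu->m_tw = cpu->r10;	/* mtspr SPRN_M_TW, r10 */
	cpu->r10 = srr0;	/* mfspr r10, SPRN_SRR0 */
	fetch_addr = cpu->r10;	/* r10 no longer holds the caller's value */
	cpu->r10 = cpu->m_tw;	/* mfspr r10, SPRN_M_TW */
	return fetch_addr;
}
```

The model only shows that r10 survives the round trip and that M_TWB is
left untouched.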

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/head_8xx.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c98a4ebb5a4d..8e96b526f109 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -624,7 +624,7 @@ InstructionBreakpoint:
  /* define if you don't want to use self modifying code */
 #define NO_SELF_MODIFYING_CODE
 FixupDAR:/* Entry point for dcbx workaround. */
-	mtspr	SPRN_SPRG_SCRATCH2, r10
+	mtspr	SPRN_M_TW, r10
 	/* fetch instruction from memory. */
 	mfspr	r10, SPRN_SRR0
 	mtspr	SPRN_MD_EPN, r10
@@ -668,7 +668,7 @@ _ENTRY(FixupDAR_cmp)
 	beq+	142f
 	cmpwi	cr0, r10, 1964	/* Is icbi? */
 	beq+	142f
-141:	mfspr	r10,SPRN_SPRG_SCRATCH2
+141:	mfspr	r10,SPRN_M_TW
 	b	DARFixed	/* Nope, go back to normal TLB processing */
 
 200:
@@ -703,7 +703,7 @@ modified_instr:
 	bne+	143f
 	subf	r10,r0,r10	/* r10=r10-r0, only if reg RA is r0 */
 143:	mtdar	r10		/* store faulting EA in DAR */
-	mfspr	r10,SPRN_SPRG_SCRATCH2
+	mfspr	r10,SPRN_M_TW
 	b	DARFixed	/* Go back to normal TLB handling */
 #else
 	mfctr	r10
@@ -757,7 +757,7 @@ modified_instr:
 	mfdar	r11
 	mtctr	r11			/* restore ctr reg from DAR */
 	mtdar	r10			/* save fault EA to DAR */
-	mfspr	r10,SPRN_SPRG_SCRATCH2
+	mfspr	r10,SPRN_M_TW
 	b	DARFixed		/* Go back to normal TLB handling */
 
 	/* special handling for r10,r11 since these are modified already */
-- 
2.13.3


* [PATCH  16/17] powerpc/mm: Make pte_fragment_alloc() common to PPC32 and PPC64
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (14 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 15/17] powerpc/8xx: Free up SPRN_SPRG_SCRATCH2 Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-04 12:34 ` [PATCH BAD 17/17] powerpc/mm: Use pte_fragment_alloc() on 8xx Christophe Leroy
  2018-05-11  6:48 ` [PATCH 00/17] Implement use of HW assistance on TLB table walk " Michael Ellerman
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

In order to allow the 8xx to handle pte_fragments, this patch
makes pte_fragment_alloc() common to PPC32 and PPC64.
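
As a reminder of what the shared code does, here is a simplified userspace
sketch of the fragment cursor logic in get_from_cache() and
__alloc_for_cache(). It is only a model under stated assumptions: the
model_* names and struct mm_model are hypothetical, the 16k/4k sizes are
assumed for illustration, and locking, page reference counting and
pgtable_page_ctor() are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed sizes for illustration: a 16k page carved into 4k fragments. */
#define MODEL_PAGE_SIZE	16384u
#define MODEL_FRAG_SIZE	4096u

struct mm_model {
	char *pte_frag;	/* next free fragment, NULL if no partly used page */
};

/* Like get_from_cache(): hand out the cached fragment, advance the cursor. */
static void *model_get_from_cache(struct mm_model *mm)
{
	char *ret = mm->pte_frag;

	if (ret) {
		char *next = ret + MODEL_FRAG_SIZE;

		/* Cursor reached a page boundary: all fragments are taken. */
		if (((uintptr_t)next & (MODEL_PAGE_SIZE - 1)) == 0)
			next = NULL;
		mm->pte_frag = next;
	}
	return ret;
}

/* Like __alloc_for_cache(): take a fresh page, return its first fragment. */
static void *model_alloc_for_cache(struct mm_model *mm)
{
	char *page = aligned_alloc(MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);

	if (!page)
		return NULL;
	if (!mm->pte_frag)
		mm->pte_frag = page + MODEL_FRAG_SIZE;
	return page;
}

static void *model_fragment_alloc(struct mm_model *mm)
{
	void *frag = model_get_from_cache(mm);

	return frag ? frag : model_alloc_for_cache(mm);
}
```

Four successive calls return four consecutive 4k fragments of the same
page; the fifth call allocates a new page.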

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/mmu_context.h | 28 ++++++++++++++
 arch/powerpc/mm/mmu_context_book3s64.c | 28 --------------
 arch/powerpc/mm/pgtable.c              | 67 ++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/pgtable_64.c           | 67 ----------------------------------
 arch/powerpc/platforms/Kconfig.cputype |  5 +++
 5 files changed, 100 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 1835ca1505d6..252988f7e219 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -262,5 +262,33 @@ static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
 
 #endif /* CONFIG_PPC_MEM_KEYS */
 
+#ifdef CONFIG_NEED_PTE_FRAG
+static inline void destroy_pagetable_page(struct mm_struct *mm)
+{
+	int count;
+	void *pte_frag;
+	struct page *page;
+
+	pte_frag = mm->context.pte_frag;
+	if (!pte_frag)
+		return;
+
+	page = virt_to_page(pte_frag);
+	/* drop all the pending references */
+	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
+	/* We allow PTE_FRAG_NR fragments from a PTE page */
+	if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
+		pgtable_page_dtor(page);
+		free_unref_page(page);
+	}
+}
+
+#else
+static inline void destroy_pagetable_page(struct mm_struct *mm)
+{
+	return;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index b75194dff64c..2f55a4e3c09a 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -192,34 +192,6 @@ static void destroy_contexts(mm_context_t *ctx)
 	spin_unlock(&mmu_context_lock);
 }
 
-#ifdef CONFIG_PPC_64K_PAGES
-static void destroy_pagetable_page(struct mm_struct *mm)
-{
-	int count;
-	void *pte_frag;
-	struct page *page;
-
-	pte_frag = mm->context.pte_frag;
-	if (!pte_frag)
-		return;
-
-	page = virt_to_page(pte_frag);
-	/* drop all the pending references */
-	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
-	/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
-		pgtable_page_dtor(page);
-		free_unref_page(page);
-	}
-}
-
-#else
-static inline void destroy_pagetable_page(struct mm_struct *mm)
-{
-	return;
-}
-#endif
-
 void destroy_context(struct mm_struct *mm)
 {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9f361ae571e9..2d34755ed727 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -264,3 +264,70 @@ unsigned long vmalloc_to_phys(void *va)
 	return __pa(pfn_to_kaddr(pfn)) + offset_in_page(va);
 }
 EXPORT_SYMBOL_GPL(vmalloc_to_phys);
+
+#ifdef CONFIG_NEED_PTE_FRAG
+static pte_t *get_from_cache(struct mm_struct *mm)
+{
+	void *pte_frag, *ret;
+
+	spin_lock(&mm->page_table_lock);
+	ret = mm->context.pte_frag;
+	if (ret) {
+		pte_frag = ret + PTE_FRAG_SIZE;
+		/*
+		 * If we have taken up all the fragments mark PTE page NULL
+		 */
+		if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
+			pte_frag = NULL;
+		mm->context.pte_frag = pte_frag;
+	}
+	spin_unlock(&mm->page_table_lock);
+	return (pte_t *)ret;
+}
+
+static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
+{
+	void *ret = NULL;
+	struct page *page;
+
+	if (!kernel) {
+		page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
+		if (!page)
+			return NULL;
+		if (!pgtable_page_ctor(page)) {
+			__free_page(page);
+			return NULL;
+		}
+	} else {
+		page = alloc_page(PGALLOC_GFP);
+		if (!page)
+			return NULL;
+	}
+
+	ret = page_address(page);
+	spin_lock(&mm->page_table_lock);
+	/*
+	 * If we find pgtable_page set, we return
+	 * the allocated page with single fragement
+	 * count.
+	 */
+	if (likely(!mm->context.pte_frag)) {
+		set_page_count(page, PTE_FRAG_NR);
+		mm->context.pte_frag = ret + PTE_FRAG_SIZE;
+	}
+	spin_unlock(&mm->page_table_lock);
+
+	return (pte_t *)ret;
+}
+
+pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
+{
+	pte_t *pte;
+
+	pte = get_from_cache(mm);
+	if (pte)
+		return pte;
+
+	return __alloc_for_cache(mm, kernel);
+}
+#endif
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index dd1102a246e4..1d8dc37d98a7 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -139,73 +139,6 @@ struct page *pmd_page(pmd_t pmd)
 	return virt_to_page(pmd_page_vaddr(pmd));
 }
 
-#ifdef CONFIG_PPC_64K_PAGES
-static pte_t *get_from_cache(struct mm_struct *mm)
-{
-	void *pte_frag, *ret;
-
-	spin_lock(&mm->page_table_lock);
-	ret = mm->context.pte_frag;
-	if (ret) {
-		pte_frag = ret + PTE_FRAG_SIZE;
-		/*
-		 * If we have taken up all the fragments mark PTE page NULL
-		 */
-		if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
-			pte_frag = NULL;
-		mm->context.pte_frag = pte_frag;
-	}
-	spin_unlock(&mm->page_table_lock);
-	return (pte_t *)ret;
-}
-
-static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
-{
-	void *ret = NULL;
-	struct page *page;
-
-	if (!kernel) {
-		page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
-		if (!page)
-			return NULL;
-		if (!pgtable_page_ctor(page)) {
-			__free_page(page);
-			return NULL;
-		}
-	} else {
-		page = alloc_page(PGALLOC_GFP);
-		if (!page)
-			return NULL;
-	}
-
-	ret = page_address(page);
-	spin_lock(&mm->page_table_lock);
-	/*
-	 * If we find pgtable_page set, we return
-	 * the allocated page with single fragement
-	 * count.
-	 */
-	if (likely(!mm->context.pte_frag)) {
-		set_page_count(page, PTE_FRAG_NR);
-		mm->context.pte_frag = ret + PTE_FRAG_SIZE;
-	}
-	spin_unlock(&mm->page_table_lock);
-
-	return (pte_t *)ret;
-}
-
-pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
-{
-	pte_t *pte;
-
-	pte = get_from_cache(mm);
-	if (pte)
-		return pte;
-
-	return __alloc_for_cache(mm, kernel);
-}
-#endif /* CONFIG_PPC_64K_PAGES */
-
 void pte_fragment_free(unsigned long *table, int kernel)
 {
 	struct page *page = virt_to_page(table);
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index f860f0326c78..7172b04c91b5 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -337,6 +337,11 @@ config PPC_MM_SLICES
 	default y if PPC_8xx && HUGETLB_PAGE
 	default n
 
+config NEED_PTE_FRAG
+	bool
+	default y if PPC_BOOK3S_64 && PPC_64K_PAGES
+	default n
+
 config PPC_HAVE_PMU_SUPPORT
        bool
 
-- 
2.13.3


* [PATCH BAD 17/17] powerpc/mm: Use pte_fragment_alloc() on 8xx
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (15 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH 16/17] powerpc/mm: Make pte_fragment_alloc() common to PPC32 and PPC64 Christophe Leroy
@ 2018-05-04 12:34 ` Christophe Leroy
  2018-05-11  6:48 ` [PATCH 00/17] Implement use of HW assistance on TLB table walk " Michael Ellerman
  17 siblings, 0 replies; 28+ messages in thread
From: Christophe Leroy @ 2018-05-04 12:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

DO NOT APPLY THIS ONE, IT IS BUGGY. Comments are welcome though.


In 16k pages mode, the 8xx still needs only 4k for the page table.

This patch makes use of the pte_fragment functions in order
to avoid wasting memory space.
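
The saving can be worked out as follows. This is a sketch with assumed
values: the MODEL_* constants stand in for the 8xx 16k-page configuration
and are not taken from this patch, and the helper names are hypothetical.
The second helper mirrors how destroy_pagetable_page() derives the number
of fragments already handed out from the cursor's offset in its page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for the 8xx with 16k pages; not taken from the patch. */
#define MODEL_PAGE_SHIFT	14u
#define MODEL_FRAG_SIZE_SHIFT	12u
#define MODEL_PAGE_SIZE		(1u << MODEL_PAGE_SHIFT)
#define MODEL_FRAG_SIZE		(1u << MODEL_FRAG_SIZE_SHIFT)
#define MODEL_FRAG_NR		(1u << (MODEL_PAGE_SHIFT - MODEL_FRAG_SIZE_SHIFT))

static_assert(MODEL_FRAG_NR == 4, "a 16k page holds four 4k page tables");

/* Bytes left unused when a whole page backs a single 4k page table. */
static uint32_t wasted_without_fragments(void)
{
	return MODEL_PAGE_SIZE - MODEL_FRAG_SIZE;
}

/* Fragments already handed out, from the cursor's offset within its page. */
static uint32_t fragments_used(uintptr_t pte_frag_cursor)
{
	return (uint32_t)((pte_frag_cursor & (MODEL_PAGE_SIZE - 1)) >>
			  MODEL_FRAG_SIZE_SHIFT);
}
```

Under these assumptions, each page table allocated without fragments wastes
12k of a 16k page.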

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/mmu-8xx.h           |  4 ++++
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 29 +++++++++++++++++++++++++++-
 arch/powerpc/include/asm/nohash/32/pgtable.h |  5 ++++-
 arch/powerpc/mm/mmu_context_nohash.c         |  4 ++++
 arch/powerpc/mm/pgtable.c                    | 10 +++++++++-
 arch/powerpc/mm/pgtable_32.c                 | 12 ++++++++++++
 arch/powerpc/platforms/Kconfig.cputype       |  1 +
 7 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index 193f53116c7a..4f4cb754afd8 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -190,6 +190,10 @@ typedef struct {
 	struct slice_mask mask_8m;
 # endif
 #endif
+#ifdef CONFIG_NEED_PTE_FRAG
+	/* for 4K PTE fragment support */
+	void *pte_frag;
+#endif
 } mm_context_t;
 
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff80000)
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 1c6461e7c6aa..1e3b8f580499 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -93,6 +93,32 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
 	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel_g(pmd, address))? \
 		NULL: pte_offset_kernel(pmd, address))
 
+#ifdef CONFIG_NEED_PTE_FRAG
+extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int);
+extern void pte_fragment_free(unsigned long *, int);
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long address)
+{
+	return (pte_t *)pte_fragment_alloc(mm, address, 1);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
+				      unsigned long address)
+{
+	return (pgtable_t)pte_fragment_alloc(mm, address, 0);
+}
+
+static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+	pte_fragment_free((unsigned long *)pte, 1);
+}
+
+static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
+{
+	pte_fragment_free((unsigned long *)ptepage, 0);
+}
+#else
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
 
@@ -106,11 +132,12 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 	pgtable_page_dtor(ptepage);
 	__free_page(ptepage);
 }
+#endif
 
 static inline void pgtable_free(void *table, unsigned index_size)
 {
 	if (!index_size) {
-		free_page((unsigned long)table);
+		pte_free_kernel(NULL, table);
 	} else {
 		BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
 		kmem_cache_free(PGT_CACHE(index_size), table);
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 3efd616bbc80..e2a22c8dc7f6 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -20,6 +20,9 @@ extern int icache_44x_need_flush;
 
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
 #define PTE_INDEX_SIZE  (PTE_SHIFT - 2)
+#define PTE_FRAG_NR		4
+#define PTE_FRAG_SIZE_SHIFT	12
+#define PTE_FRAG_SIZE		(1UL << PTE_FRAG_SIZE_SHIFT)
 #else
 #define PTE_INDEX_SIZE	PTE_SHIFT
 #endif
@@ -303,7 +306,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
  */
 #ifndef CONFIG_BOOKE
 #define pmd_page_vaddr(pmd)	\
-	((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
+	((unsigned long) __va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
 #define pmd_page(pmd)		\
 	pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
 #else
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index e09228a9ad00..8b0ab33673e5 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -390,6 +390,9 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm)
 #endif
 	mm->context.id = MMU_NO_CONTEXT;
 	mm->context.active = 0;
+#ifdef CONFIG_NEED_PTE_FRAG
+	mm->context.pte_frag = NULL;
+#endif
 	return 0;
 }
 
@@ -418,6 +421,7 @@ void destroy_context(struct mm_struct *mm)
 		nr_free_contexts++;
 	}
 	raw_spin_unlock_irqrestore(&context_lock, flags);
+	destroy_pagetable_page(mm);
 }
 
 #ifdef CONFIG_SMP
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 2d34755ed727..96cc5aa73331 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -23,6 +23,7 @@
 
 #include <linux/kernel.h>
 #include <linux/gfp.h>
+#include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
@@ -320,10 +321,17 @@ static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
 	return (pte_t *)ret;
 }
 
-pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
+__ref pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
 {
 	pte_t *pte;
 
+	if (kernel && !slab_is_available()) {
+		pte = __va(memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE));
+		if (pte)
+			memset(pte, 0, PTE_FRAG_SIZE);
+
+		return pte;
+	}
 	pte = get_from_cache(mm);
 	if (pte)
 		return pte;
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 3aa0c78db95d..5c8737cf2945 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -40,6 +40,17 @@
 
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
+#ifdef CONFIG_NEED_PTE_FRAG
+void pte_fragment_free(unsigned long *table, int kernel)
+{
+	struct page *page = virt_to_page(table);
+	if (put_page_testzero(page)) {
+		if (!kernel)
+			pgtable_page_dtor(page);
+		free_unref_page(page);
+	}
+}
+#else
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
 	pte_t *pte;
@@ -69,6 +80,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
 	}
 	return ptepage;
 }
+#endif
 
 #ifdef CONFIG_PPC_GUARDED_PAGE_IN_PMD
 int __pte_alloc_kernel_g(pmd_t *pmd, unsigned long address)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 7172b04c91b5..eff6210ad3c0 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -340,6 +340,7 @@ config PPC_MM_SLICES
 config NEED_PTE_FRAG
 	bool
 	default y if PPC_BOOK3S_64 && PPC_64K_PAGES
+	default y if PPC_8xx && PPC_16K_PAGES
 	default n
 
 config PPC_HAVE_PMU_SUPPORT
-- 
2.13.3

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH  12/17] powerpc/8xx: Remove PTE_ATOMIC_UPDATES
  2018-05-04 12:34 ` [PATCH 12/17] powerpc/8xx: Remove PTE_ATOMIC_UPDATES Christophe Leroy
@ 2018-05-04 13:16   ` Joakim Tjernlund
  0 siblings, 0 replies; 28+ messages in thread
From: Joakim Tjernlund @ 2018-05-04 13:16 UTC (permalink / raw)
  To: christophe.leroy, paulus, mpe, benh, aneesh.kumar
  Cc: linuxppc-dev, linux-kernel

On Fri, 2018-05-04 at 14:34 +0200, Christophe Leroy wrote:
> commit 1bc54c03117b9 ("powerpc: rework 4xx PTE access and TLB miss")
> introduced non atomic PTE updates and started the work of removing
> PTE updates in TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
> 8xx with the following comment:
> /* Until my rework is finished, 8xx still needs atomic PTE updates */
> 
> commit fe11dc3f9628e ("powerpc/8xx: Update TLB asm so it behaves as linux
> mm expects") removed all PTE updates done in TLB miss handlers

Is that my 7-year-old commit?

> 
> Therefore, atomic PTE updates are not needed anymore for the 8xx

About time removing atomic updates then :)
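
For readers following along, the difference between an atomic and a non-atomic PTE update can be sketched in portable C. This is only an illustration built on C11 atomics, not the kernel's pte_update() (the 64-bit version quoted elsewhere in this thread uses an ldarx/stdcx. loop); the sketch_ names are hypothetical.

```c
#include <stdatomic.h>

typedef _Atomic unsigned long sketch_pte_t;

/*
 * Atomic variant: retry until the read-modify-write goes through without
 * anybody else having changed the PTE in between.
 */
static unsigned long sketch_pte_update_atomic(sketch_pte_t *ptep,
					      unsigned long clr,
					      unsigned long set)
{
	unsigned long old = atomic_load(ptep);

	/* On failure, 'old' is refreshed with the current value. */
	while (!atomic_compare_exchange_weak(ptep, &old, (old & ~clr) | set))
		;
	return old;
}

/*
 * Non-atomic variant: a plain read-modify-write. It is only safe when
 * nothing else modifies the PTE concurrently.
 */
static unsigned long sketch_pte_update_plain(unsigned long *ptep,
					     unsigned long clr,
					     unsigned long set)
{
	unsigned long old = *ptep;

	*ptep = (old & ~clr) | set;
	return old;
}
```

Both variants return the previous value and leave (old & ~clr) | set in place. The plain variant is the one that becomes sufficient once the TLB miss handlers no longer write to PTEs, which is the argument made in the commit message.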

> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH  01/17] powerpc/nohash: remove hash related code from nohash headers.
  2018-05-04 12:33 ` [PATCH 01/17] powerpc/nohash: remove hash related code from nohash headers Christophe Leroy
@ 2018-05-08  8:25   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 28+ messages in thread
From: Aneesh Kumar K.V @ 2018-05-08  8:25 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> When the nohash and book3s headers were split, some hash-related code
> remained in the nohash headers. This patch removes it.
>

Thanks for doing this. This was on the TODO list for a long time. When
we did the split for book3s, I mostly copied the generic headers to
nohash.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  Removed the call to pte_young() as it fails, back to using PAGE_ACCESSED directly.
>
>  arch/powerpc/include/asm/nohash/32/pgtable.h | 29 +++------------------
>  arch/powerpc/include/asm/nohash/64/pgtable.h | 16 ++----------
>  arch/powerpc/include/asm/nohash/pgtable.h    | 38 +++-------------------------
>  arch/powerpc/include/asm/nohash/pte-book3e.h |  1 -
>  4 files changed, 10 insertions(+), 74 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
> index 03bbd1149530..140f8e74b478 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@ -133,7 +133,7 @@ extern int icache_44x_need_flush;
>  #ifndef __ASSEMBLY__
>  
>  #define pte_clear(mm, addr, ptep) \
> -	do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)
> +	do { pte_update(ptep, ~0, 0); } while (0)
>  
>  #define pmd_none(pmd)		(!pmd_val(pmd))
>  #define	pmd_bad(pmd)		(pmd_val(pmd) & _PMD_BAD)
> @@ -146,21 +146,6 @@ static inline void pmd_clear(pmd_t *pmdp)
>  
>  
>  /*
> - * When flushing the tlb entry for a page, we also need to flush the hash
> - * table entry.  flush_hash_pages is assembler (for speed) in hashtable.S.
> - */
> -extern int flush_hash_pages(unsigned context, unsigned long va,
> -			    unsigned long pmdval, int count);
> -
> -/* Add an HPTE to the hash table */
> -extern void add_hash_page(unsigned context, unsigned long va,
> -			  unsigned long pmdval);
> -
> -/* Flush an entry from the TLB/hash table */
> -extern void flush_hash_entry(struct mm_struct *mm, pte_t *ptep,
> -			     unsigned long address);
> -
> -/*
>   * PTE updates. This function is called whenever an existing
>   * valid PTE is updated. This does -not- include set_pte_at()
>   * which nowadays only sets a new PTE.
> @@ -246,12 +231,6 @@ static inline int __ptep_test_and_clear_young(unsigned int context, unsigned lon
>  {
>  	unsigned long old;
>  	old = pte_update(ptep, _PAGE_ACCESSED, 0);
> -#if _PAGE_HASHPTE != 0
> -	if (old & _PAGE_HASHPTE) {
> -		unsigned long ptephys = __pa(ptep) & PAGE_MASK;
> -		flush_hash_pages(context, addr, ptephys, 1);
> -	}
> -#endif
>  	return (old & _PAGE_ACCESSED) != 0;
>  }
>  #define ptep_test_and_clear_young(__vma, __addr, __ptep) \
> @@ -261,7 +240,7 @@ static inline int __ptep_test_and_clear_young(unsigned int context, unsigned lon
>  static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>  				       pte_t *ptep)
>  {
> -	return __pte(pte_update(ptep, ~_PAGE_HASHPTE, 0));
> +	return __pte(pte_update(ptep, ~0, 0));
>  }
>  
>  #define __HAVE_ARCH_PTEP_SET_WRPROTECT
> @@ -289,7 +268,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
>  }
>  
>  #define __HAVE_ARCH_PTE_SAME
> -#define pte_same(A,B)	(((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
> +#define pte_same(A,B)	((pte_val(A) ^ pte_val(B)) == 0)
>  
>  /*
>   * Note that on Book E processors, the pmd contains the kernel virtual
> @@ -330,7 +309,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
>  /*
>   * Encode and decode a swap entry.
>   * Note that the bits we use in a PTE for representing a swap entry
> - * must not include the _PAGE_PRESENT bit or the _PAGE_HASHPTE bit (if used).
> + * must not include the _PAGE_PRESENT bit.
>   *   -- paulus
>   */
>  #define __swp_type(entry)		((entry).val & 0x1f)
> diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
> index 5c5f75d005ad..4f6f5a27bfb5 100644
> --- a/arch/powerpc/include/asm/nohash/64/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
> @@ -173,8 +173,6 @@ static inline void pgd_set(pgd_t *pgdp, unsigned long val)
>  /* to find an entry in a kernel page-table-directory */
>  /* This now only contains the vmalloc pages */
>  #define pgd_offset_k(address) pgd_offset(&init_mm, address)
> -extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
> -			    pte_t *ptep, unsigned long pte, int huge);
>  
>  /* Atomic PTE updates */
>  static inline unsigned long pte_update(struct mm_struct *mm,
> @@ -205,11 +203,6 @@ static inline unsigned long pte_update(struct mm_struct *mm,
>  	if (!huge)
>  		assert_pte_locked(mm, addr);
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> -	if (old & _PAGE_HASHPTE)
> -		hpte_need_flush(mm, addr, ptep, old, huge);
> -#endif
> -
>  	return old;
>  }
>  
> @@ -218,7 +211,7 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
>  {
>  	unsigned long old;
>  
> -	if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
> +	if ((pte_val(*ptep) & _PAGE_ACCESSED) == 0)
>  		return 0;
>  	old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
>  	return (old & _PAGE_ACCESSED) != 0;
> @@ -312,7 +305,7 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
>  }
>  
>  #define __HAVE_ARCH_PTE_SAME
> -#define pte_same(A,B)	(((pte_val(A) ^ pte_val(B)) & ~_PAGE_HPTEFLAGS) == 0)
> +#define pte_same(A,B)	((pte_val(A) ^ pte_val(B)) == 0)
>  
>  #define pte_ERROR(e) \
>  	pr_err("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
> @@ -324,11 +317,6 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
>  /* Encode and de-code a swap entry */
>  #define MAX_SWAPFILES_CHECK() do { \
>  	BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
> -	/*							\
> -	 * Don't have overlapping bits with _PAGE_HPTEFLAGS	\
> -	 * We filter HPTEFLAGS on set_pte.			\
> -	 */							\
> -	BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
>  	} while (0)
>  /*
>   * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index c56de1e8026f..f2fe3cbe90af 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -148,37 +148,16 @@ extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>  static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  				pte_t *ptep, pte_t pte, int percpu)
>  {
> -#if defined(CONFIG_PPC_STD_MMU_32) && defined(CONFIG_SMP) && !defined(CONFIG_PTE_64BIT)
> -	/* First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use the
> -	 * helper pte_update() which does an atomic update. We need to do that
> -	 * because a concurrent invalidation can clear _PAGE_HASHPTE. If it's a
> -	 * per-CPU PTE such as a kmap_atomic, we do a simple update preserving
> -	 * the hash bits instead (ie, same as the non-SMP case)
> -	 */
> -	if (percpu)
> -		*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
> -			      | (pte_val(pte) & ~_PAGE_HASHPTE));
> -	else
> -		pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte));
> -
> -#elif defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
> +#if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
>  	/* Second case is 32-bit with 64-bit PTE.  In this case, we
>  	 * can just store as long as we do the two halves in the right order
> -	 * with a barrier in between. This is possible because we take care,
> -	 * in the hash code, to pre-invalidate if the PTE was already hashed,
> -	 * which synchronizes us with any concurrent invalidation.
> -	 * In the percpu case, we also fallback to the simple update preserving
> -	 * the hash bits
> +	 * with a barrier in between.
> +	 * In the percpu case, we also fallback to the simple update
>  	 */
>  	if (percpu) {
> -		*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
> -			      | (pte_val(pte) & ~_PAGE_HASHPTE));
> +		*ptep = pte;
>  		return;
>  	}
> -#if _PAGE_HASHPTE != 0
> -	if (pte_val(*ptep) & _PAGE_HASHPTE)
> -		flush_hash_entry(mm, ptep, addr);
> -#endif
>  	__asm__ __volatile__("\
>  		stw%U0%X0 %2,%0\n\
>  		eieio\n\
> @@ -186,15 +165,6 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
>  	: "r" (pte) : "memory");
>  
> -#elif defined(CONFIG_PPC_STD_MMU_32)
> -	/* Third case is 32-bit hash table in UP mode, we need to preserve
> -	 * the _PAGE_HASHPTE bit since we may not have invalidated the previous
> -	 * translation in the hash yet (done in a subsequent flush_tlb_xxx())
> -	 * and see we need to keep track that this PTE needs invalidating
> -	 */
> -	*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
> -		      | (pte_val(pte) & ~_PAGE_HASHPTE));
> -
>  #else
>  	/* Anything else just stores the PTE normally. That covers all 64-bit
>  	 * cases, and 32-bit non-hash with 32-bit PTEs.
> diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
> index ccee8eb509bb..9ff51b4c0cac 100644
> --- a/arch/powerpc/include/asm/nohash/pte-book3e.h
> +++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
> @@ -57,7 +57,6 @@
>  #define _PAGE_USER		(_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */
>  #define _PAGE_PRIVILEGED	(_PAGE_BAP_SR)
>  
> -#define _PAGE_HASHPTE	0
>  #define _PAGE_BUSY	0
>  
>  #define _PAGE_SPECIAL	_PAGE_SW0
> -- 
> 2.13.3

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH  02/17] powerpc/nohash: remove _PAGE_BUSY
  2018-05-04 12:33 ` [PATCH 02/17] powerpc/nohash: remove _PAGE_BUSY Christophe Leroy
@ 2018-05-08  8:26   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 28+ messages in thread
From: Aneesh Kumar K.V @ 2018-05-08  8:26 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> _PAGE_BUSY is always 0, remove it
>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/include/asm/nohash/64/pgtable.h | 10 +++-------
>  arch/powerpc/include/asm/nohash/pte-book3e.h |  5 -----
>  2 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h
> index 4f6f5a27bfb5..c3559d7a94fb 100644
> --- a/arch/powerpc/include/asm/nohash/64/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
> @@ -186,14 +186,12 @@ static inline unsigned long pte_update(struct mm_struct *mm,
>  
>  	__asm__ __volatile__(
>  	"1:	ldarx	%0,0,%3		# pte_update\n\
> -	andi.	%1,%0,%6\n\
> -	bne-	1b \n\
>  	andc	%1,%0,%4 \n\
> -	or	%1,%1,%7\n\
> +	or	%1,%1,%6\n\
>  	stdcx.	%1,0,%3 \n\
>  	bne-	1b"
>  	: "=&r" (old), "=&r" (tmp), "=m" (*ptep)
> -	: "r" (ptep), "r" (clr), "m" (*ptep), "i" (_PAGE_BUSY), "r" (set)
> +	: "r" (ptep), "r" (clr), "m" (*ptep), "r" (set)
>  	: "cc" );
>  #else
>  	unsigned long old = pte_val(*ptep);
> @@ -290,13 +288,11 @@ static inline void __ptep_set_access_flags(struct mm_struct *mm,
>  
>  	__asm__ __volatile__(
>  	"1:	ldarx	%0,0,%4\n\
> -		andi.	%1,%0,%6\n\
> -		bne-	1b \n\
>  		or	%0,%3,%0\n\
>  		stdcx.	%0,0,%4\n\
>  		bne-	1b"
>  	:"=&r" (old), "=&r" (tmp), "=m" (*ptep)
> -	:"r" (bits), "r" (ptep), "m" (*ptep), "i" (_PAGE_BUSY)
> +	:"r" (bits), "r" (ptep), "m" (*ptep)
>  	:"cc");
>  #else
>  	unsigned long old = pte_val(*ptep);
> diff --git a/arch/powerpc/include/asm/nohash/pte-book3e.h b/arch/powerpc/include/asm/nohash/pte-book3e.h
> index 9ff51b4c0cac..12730b81cd98 100644
> --- a/arch/powerpc/include/asm/nohash/pte-book3e.h
> +++ b/arch/powerpc/include/asm/nohash/pte-book3e.h
> @@ -57,13 +57,8 @@
>  #define _PAGE_USER		(_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */
>  #define _PAGE_PRIVILEGED	(_PAGE_BAP_SR)
>  
> -#define _PAGE_BUSY	0
> -
>  #define _PAGE_SPECIAL	_PAGE_SW0
>  
> -/* Flags to be preserved on PTE modifications */
> -#define _PAGE_HPTEFLAGS	_PAGE_BUSY
> -
>  /* Base page size */
>  #ifdef CONFIG_PPC_64K_PAGES
>  #define _PAGE_PSIZE	_PAGE_PSIZE_64K
> -- 
> 2.13.3

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH  09/17] powerpc: make __ioremap_caller() common to PPC32 and PPC64
  2018-05-04 12:34 ` [PATCH 09/17] powerpc: make __ioremap_caller() " Christophe Leroy
@ 2018-05-08  9:56   ` Aneesh Kumar K.V
  2018-05-16  9:58     ` Christophe LEROY
  0 siblings, 1 reply; 28+ messages in thread
From: Aneesh Kumar K.V @ 2018-05-08  9:56 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |   1 +
>  arch/powerpc/mm/ioremap.c                    | 126 +++++++--------------------
>  2 files changed, 34 insertions(+), 93 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index c5c6ead06bfb..2bebdd8302cb 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -18,6 +18,7 @@
>  #define _PAGE_RO		0
>  #define _PAGE_USER		0
>  #define _PAGE_HWWRITE		0
> +#define _PAGE_COHERENT		0

This is something I was trying to avoid when I split the headers. We do
support _PAGE_USER; it is !_PAGE_PRIVILEGED. It gets really confusing
when we have these conflicting names because we are trying to make code
common across platforms.


>  
>  #define _PAGE_EXEC		0x00001 /* execute permission */
>  #define _PAGE_WRITE		0x00002 /* write access allowed */
> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
> index 65d611d44d38..59be5dfcb3e9 100644
> --- a/arch/powerpc/mm/ioremap.c
> +++ b/arch/powerpc/mm/ioremap.c
> @@ -33,95 +33,6 @@ unsigned long ioremap_bot;
>  unsigned long ioremap_bot = IOREMAP_BASE;
>  #endif
>  
> -#ifdef CONFIG_PPC32
> -
> -void __iomem *
> -__ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
> -		 void *caller)
> -{
> -	unsigned long v, i;
> -	phys_addr_t p;
> -	int err;
> -
> -	/* Make sure we have the base flags */
> -	if ((flags & _PAGE_PRESENT) == 0)
> -		flags |= pgprot_val(PAGE_KERNEL);
> -
> -	/* Non-cacheable page cannot be coherent */
> -	if (flags & _PAGE_NO_CACHE)
> -		flags &= ~_PAGE_COHERENT;
> -
> -	/*
> -	 * Choose an address to map it to.
> -	 * Once the vmalloc system is running, we use it.
> -	 * Before then, we use space going up from IOREMAP_BASE
> -	 * (ioremap_bot records where we're up to).
> -	 */
> -	p = addr & PAGE_MASK;
> -	size = PAGE_ALIGN(addr + size) - p;
> -
> -	/*
> -	 * If the address lies within the first 16 MB, assume it's in ISA
> -	 * memory space
> -	 */
> -	if (p < 16*1024*1024)
> -		p += _ISA_MEM_BASE;
> -
> -#ifndef CONFIG_CRASH_DUMP
> -	/*
> -	 * Don't allow anybody to remap normal RAM that we're using.
> -	 * mem_init() sets high_memory so only do the check after that.
> -	 */
> -	if (slab_is_available() && (p < virt_to_phys(high_memory)) &&
> -	    page_is_ram(__phys_to_pfn(p))) {
> -		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
> -		       (unsigned long long)p, __builtin_return_address(0));
> -		return NULL;
> -	}
> -#endif
> -
> -	if (size == 0)
> -		return NULL;
> -
> -	/*
> -	 * Is it already mapped?  Perhaps overlapped by a previous
> -	 * mapping.
> -	 */
> -	v = p_block_mapped(p);
> -	if (v)
> -		goto out;
> -
> -	if (slab_is_available()) {
> -		struct vm_struct *area;
> -		area = get_vm_area_caller(size, VM_IOREMAP, caller);
> -		if (area == 0)
> -			return NULL;
> -		area->phys_addr = p;
> -		v = (unsigned long) area->addr;
> -	} else {
> -		v = ioremap_bot;
> -		ioremap_bot += size;
> -	}
> -
> -	/*
> -	 * Should check if it is a candidate for a BAT mapping
> -	 */
> -
> -	err = 0;
> -	for (i = 0; i < size && err == 0; i += PAGE_SIZE)
> -		err = map_kernel_page(v+i, p+i, flags);
> -	if (err) {
> -		if (slab_is_available())
> -			vunmap((void *)v);
> -		return NULL;
> -	}
> -
> -out:
> -	return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
> -}
> -
> -#else
> -
>  /**
>   * __ioremap_at - Low level function to establish the page tables
>   *                for an IO mapping
> @@ -135,6 +46,10 @@ void __iomem * __ioremap_at(phys_addr_t pa, void *ea, unsigned long size,
>  	if ((flags & _PAGE_PRESENT) == 0)
>  		flags |= pgprot_val(PAGE_KERNEL);
>  
> +	/* Non-cacheable page cannot be coherent */
> +	if (flags & _PAGE_NO_CACHE)
> +		flags &= ~_PAGE_COHERENT;
> +
>  	/* We don't support the 4K PFN hack with ioremap */
>  	if (flags & H_PAGE_4K_PFN)
>  		return NULL;
> @@ -187,6 +102,33 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
>  	if ((size == 0) || (paligned == 0))
>  		return NULL;
>  
> +	/*
> +	 * If the address lies within the first 16 MB, assume it's in ISA
> +	 * memory space
> +	 */
> +	if (IS_ENABLED(CONFIG_PPC32) && paligned < 16*1024*1024)
> +		paligned += _ISA_MEM_BASE;
> +
> +	/*
> +	 * Don't allow anybody to remap normal RAM that we're using.
> +	 * mem_init() sets high_memory so only do the check after that.
> +	 */
> +	if (!IS_ENABLED(CONFIG_CRASH_DUMP) &&
> +	    slab_is_available() && (paligned < virt_to_phys(high_memory)) &&
> +	    page_is_ram(__phys_to_pfn(paligned))) {
> +		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
> +		       (u64)paligned, __builtin_return_address(0));
> +		return NULL;
> +	}
> +
> +	/*
> +	 * Is it already mapped?  Perhaps overlapped by a previous
> +	 * mapping.
> +	 */
> +	ret = (void __iomem *)p_block_mapped(paligned);
> +	if (ret)
> +		goto out;
> +
>  	if (slab_is_available()) {
>  		struct vm_struct *area;
>  
> @@ -205,14 +147,12 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
>  		if (ret)
>  			ioremap_bot += size;
>  	}
> -
> +out:
>  	if (ret)
> -		ret += addr & ~PAGE_MASK;
> +		ret += (unsigned long)addr & ~PAGE_MASK;
>  	return ret;
>  }
>  
> -#endif
> -
>  /*
>   * Unmap an IO region and remove it from imalloc'd list.
>   * Access to IO memory should be serialized by driver.
> -- 
> 2.13.3

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH  05/17] powerpc: move io mapping functions into ioremap.c
  2018-05-04 12:34 ` [PATCH 05/17] powerpc: move io mapping functions into ioremap.c Christophe Leroy
@ 2018-05-11  6:01   ` Michael Ellerman
  2018-05-16 10:13     ` Christophe LEROY
  0 siblings, 1 reply; 28+ messages in thread
From: Michael Ellerman @ 2018-05-11  6:01 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
> new file mode 100644
> index 000000000000..5d2645193568
> --- /dev/null
> +++ b/arch/powerpc/mm/ioremap.c
> @@ -0,0 +1,350 @@
> +/*
> + * This file contains the routines for mapping IO areas
> + *
> + *  Derived from arch/powerpc/mm/pgtable_32.c and
> + *  arch/powerpc/mm/pgtable_64.c
> + *
> + * SPDX-License-Identifier: GPL-2.0

This goes at the top of the file as:

// SPDX-License-Identifier: GPL-2.0

See:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/license-rules.rst?#n56

> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/init.h>
> +#include <linux/highmem.h>
> +#include <linux/memblock.h>
> +#include <linux/slab.h>
> +
> +#include <asm/pgtable.h>
> +#include <asm/pgalloc.h>
> +#include <asm/fixmap.h>
> +#include <asm/io.h>
> +#include <asm/setup.h>
> +#include <asm/sections.h>

I needed:

+#include <asm/machdep.h>

To fix:

../arch/powerpc/mm/ioremap.c: In function ‘ioremap_wc’:
../arch/powerpc/mm/ioremap.c:290:6: error: ‘ppc_md’ undeclared (first use in this function)
  if (ppc_md.ioremap)
      ^~~~~~

cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH  11/17] powerpc/nohash32: set GUARDED attribute in the PMD directly
  2018-05-04 12:34 ` [PATCH 11/17] powerpc/nohash32: set GUARDED attribute in the PMD directly Christophe Leroy
@ 2018-05-11  6:45   ` Michael Ellerman
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2018-05-11  6:45 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
> index 59be5dfcb3e9..b8c347077e02 100644
> --- a/arch/powerpc/mm/ioremap.c
> +++ b/arch/powerpc/mm/ioremap.c
> @@ -132,9 +132,14 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
>  	if (slab_is_available()) {
>  		struct vm_struct *area;
>  
> -		area = __get_vm_area_caller(size, VM_IOREMAP,
> -					    ioremap_bot, IOREMAP_END,
> -					    caller);
> +		if (flags & _PAGE_GUARDED)

On 64-bit configs:

  arch/powerpc/mm/ioremap.c:135:15: error: '_PAGE_GUARDED' undeclared (first use in this function)

cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH  00/17] Implement use of HW assistance on TLB table walk on 8xx
  2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
                   ` (16 preceding siblings ...)
  2018-05-04 12:34 ` [PATCH BAD 17/17] powerpc/mm: Use pte_fragment_alloc() on 8xx Christophe Leroy
@ 2018-05-11  6:48 ` Michael Ellerman
  2018-05-16 10:17   ` Christophe LEROY
  17 siblings, 1 reply; 28+ messages in thread
From: Michael Ellerman @ 2018-05-11  6:48 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> The purpose of this series is to implement hardware assistance for TLB table walk
> on the 8xx.
>
> The first part is to make L1 entries and L2 entries independent.
> For that, we need to alter the ioremap functions in order to handle the GUARDED
> attribute at the PGD/PMD level.
>
> The last part tries to reuse the PTE fragment implementation from PPC64 in order
> not to waste 16k pages on page tables when only 4k are used. For the time being,
> it doesn't work, but I include it in the series anyway in order to get feedback.
>
> Tested successfully on 8xx up to the second-to-last patch.
>
> I didn't have time to do compilation tests on other configs; I am sending it anyway
> before leaving for a one-week vacation in order to get feedback.

I replied to a few patches; here are some other build errors:


arch/powerpc/mm/ioremap.c:135:15: error: '_PAGE_GUARDED' undeclared (first use in this function):
  pseries_defconfig/powerpc

arch/powerpc/include/asm/book3s/32/pgtable.h:53:19: error: 'PKMAP_BASE' undeclared (first use in this function):
  pmac32_defconfig/powerpc-5.3

include/linux/mm.h:533:41: error: 'PKMAP_BASE' undeclared (first use in this function):
  pmac32_defconfig/powerpc

ERROR: "ioremap_bot" [net/netfilter/nf_conntrack.ko] undefined!:
  linkstation_defconfig/powerpc

ERROR: "ioremap_bot" [fs/xfs/xfs.ko] undefined!:
  linkstation_defconfig/powerpc

arch/powerpc/include/asm/nohash/32/pgtable.h:80:20: error: 'PKMAP_BASE' undeclared (first use in this function):
  corenet32_smp_defconfig/powerpc-5.3

arch/powerpc/include/asm/nohash/32/pgalloc.h:64:43: error: '_PMD_GUARDED' undeclared (first use in this function):
  ppc40x_defconfig/powerpc-5.3

ERROR: "ioremap_bot" [net/packet/af_packet.ko] undefined!:
  storcenter_defconfig/powerpc

ERROR: "ioremap_bot" [drivers/usb/core/usbcore.ko] undefined!:
  ppc44x_defconfig/powerpc

cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/17] powerpc: make __ioremap_caller() common to PPC32 and PPC64
  2018-05-08  9:56   ` Aneesh Kumar K.V
@ 2018-05-16  9:58     ` Christophe LEROY
  0 siblings, 0 replies; 28+ messages in thread
From: Christophe LEROY @ 2018-05-16  9:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linux-kernel, linuxppc-dev



On 08/05/2018 at 11:56, Aneesh Kumar K.V wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
> 
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>   arch/powerpc/include/asm/book3s/64/pgtable.h |   1 +
>>   arch/powerpc/mm/ioremap.c                    | 126 +++++++--------------------
>>   2 files changed, 34 insertions(+), 93 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index c5c6ead06bfb..2bebdd8302cb 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -18,6 +18,7 @@
>>   #define _PAGE_RO		0
>>   #define _PAGE_USER		0
>>   #define _PAGE_HWWRITE		0
>> +#define _PAGE_COHERENT		0
> 
> This is something I was trying to avoid when I split the headers. We do
> support _PAGE_USER it is !_PAGE_PRIVILEGED. It gets really confusing
> when we have these conflicting names because we are trying to make code
> common across platforms.

Er... this patch adds _PAGE_COHERENT.

_PAGE_USER was added some time ago.

Well, we have three cases:

BOOK3S64 and NOHASH32/8xx have _PAGE_PRIVILEGED and no _PAGE_USER
BOOKE has both _PAGE_PRIVILEGED and _PAGE_USER
Others have _PAGE_USER and no _PAGE_PRIVILEGED

So when giving user rights to a page, some will set _PAGE_USER, some 
will unset _PAGE_PRIVILEGED and some will do both.

Since _PAGE_USER and _PAGE_PRIVILEGED are used outside of the subarch headers,
- either we have to add ugly ifdefs,
- or we can just make sure unused flags are defined as 0, so that (x | 0) and
(x & ~0) do nothing and are eliminated by the compiler.

Today, this is done in asm/pte-common.h. Unfortunately, every subarch header
except book3s64 includes pte-common.h.

Another solution would be to make sure _PAGE_xxx flags are not used 
outside of subarch specific headers. That would mean having specific 
helpers defined in each subarch header, but is it really worth it?




Let's take an even trickier example:
- Some subarchs have _PAGE_RW, others have _PAGE_RO instead. In
addition, the 8xx has _PAGE_NA.
- Book3s64 has _PAGE_READ and _PAGE_WRITE.
- In some places, _PAGE_RW has been redefined as _PAGE_READ | _PAGE_WRITE.

It has really become pretty complex. Why define new flags
instead of using _PAGE_RO for _PAGE_READ and _PAGE_RW for _PAGE_READ |
_PAGE_WRITE? Does it make any sense to have the possibility to set
_PAGE_WRITE without _PAGE_READ?
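
To show how the "unused flag is 0" idea could cover the _PAGE_RW versus
_PAGE_RO split, here is a minimal standalone sketch. It is an illustration
under assumptions, not the kernel's code: the bit value, the SUBARCH_HAS_RO
selector and the flags_*() helper names are all made up, and it assumes a
subarch defines exactly one of _PAGE_RW or _PAGE_RO. It does not try to
handle _PAGE_NA or the Book3s64 _PAGE_READ/_PAGE_WRITE pair.

```c
#include <assert.h>

/*
 * Illustration only: the bit value and SUBARCH_HAS_RO are invented.
 * Build with -DSUBARCH_HAS_RO to model a subarch with a read-only bit.
 */
#ifdef SUBARCH_HAS_RO
#define _PAGE_RO	0x400UL
#else
#define _PAGE_RW	0x400UL
#endif

/* A flag the subarch does not have is defined as 0 */
#ifndef _PAGE_RO
#define _PAGE_RO	0
#endif
#ifndef _PAGE_RW
#define _PAGE_RW	0
#endif

/* Writable means: RW set (if it exists) and RO clear (if it exists) */
static inline int flags_writable(unsigned long flags)
{
	return (flags & _PAGE_RW) == _PAGE_RW && !(flags & _PAGE_RO);
}

/* Same statement for both kinds of subarch, no ifdef needed */
static inline unsigned long flags_mkwrite(unsigned long flags)
{
	unsigned long ret = (flags & ~_PAGE_RO) | _PAGE_RW;

	assert(flags_writable(ret));
	return ret;
}

static inline unsigned long flags_wrprotect(unsigned long flags)
{
	unsigned long ret = (flags & ~_PAGE_RW) | _PAGE_RO;

	assert(!flags_writable(ret));
	return ret;
}
```

In both builds the helpers are a single expression; the half that refers to
the missing flag reduces to (x | 0) or (x & ~0).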


I feel like having simple generic code like:

	flags = (flags & ~_PAGE_PRIVILEGED) | _PAGE_USER;

is better than having almost the same code duplicated in several places or
ugly ifdefs like:

#if defined(CONFIG_BOOK3S64) || defined(CONFIG_PPC_8xx) || defined(CONFIG_BOOKE)
	flags &= ~_PAGE_PRIVILEGED;
#endif
#if !defined(CONFIG_BOOK3S64) && !defined(CONFIG_PPC_8xx)
	flags |= _PAGE_USER;
#endif
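
To make the comparison concrete, here is a minimal standalone sketch of the
"unused flags are 0" approach for the three cases listed earlier. The bit
values, the SUBARCH_* selectors and the grant_user() name are invented for
illustration and are not the kernel's definitions; only the pattern itself
comes from the discussion above.

```c
#include <assert.h>

/*
 * Illustration only: bit values and SUBARCH_* selectors are invented,
 * they are not the kernel's definitions.
 */
#if defined(SUBARCH_PRIV_ONLY)		/* like BOOK3S64 or 8xx */
#define _PAGE_PRIVILEGED	0x008UL
#elif defined(SUBARCH_BOTH)		/* like BOOKE */
#define _PAGE_PRIVILEGED	0x008UL
#define _PAGE_USER		0x004UL
#else					/* the others */
#define _PAGE_USER		0x004UL
#endif

/* A flag the subarch does not have is defined as 0 */
#ifndef _PAGE_USER
#define _PAGE_USER		0
#endif
#ifndef _PAGE_PRIVILEGED
#define _PAGE_PRIVILEGED	0
#endif

/* Same generic statement for every subarch, no ifdef needed */
static inline unsigned long grant_user(unsigned long flags)
{
	unsigned long ret = (flags & ~_PAGE_PRIVILEGED) | _PAGE_USER;

	/* Postcondition: no privileged bit left, user bit set if it exists */
	assert((ret & _PAGE_PRIVILEGED) == 0);
	assert((ret & _PAGE_USER) == _PAGE_USER);
	return ret;
}
```

Whichever selector is used, the body of grant_user() stays the same; the
part that refers to a missing flag becomes (x | 0) or (x & ~0).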


It looks to me like the patch
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=812fadcb941a81d1f3948b10a95a4dce663da3e4
allowed a nice code simplification, don't you feel the same?

Christophe

> 
> 
>>   
>>   #define _PAGE_EXEC		0x00001 /* execute permission */
>>   #define _PAGE_WRITE		0x00002 /* write access allowed */
>> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
>> index 65d611d44d38..59be5dfcb3e9 100644
>> --- a/arch/powerpc/mm/ioremap.c
>> +++ b/arch/powerpc/mm/ioremap.c
>> @@ -33,95 +33,6 @@ unsigned long ioremap_bot;
>>   unsigned long ioremap_bot = IOREMAP_BASE;
>>   #endif
>>   
>> -#ifdef CONFIG_PPC32
>> -
>> -void __iomem *
>> -__ioremap_caller(phys_addr_t addr, unsigned long size, unsigned long flags,
>> -		 void *caller)
>> -{
>> -	unsigned long v, i;
>> -	phys_addr_t p;
>> -	int err;
>> -
>> -	/* Make sure we have the base flags */
>> -	if ((flags & _PAGE_PRESENT) == 0)
>> -		flags |= pgprot_val(PAGE_KERNEL);
>> -
>> -	/* Non-cacheable page cannot be coherent */
>> -	if (flags & _PAGE_NO_CACHE)
>> -		flags &= ~_PAGE_COHERENT;
>> -
>> -	/*
>> -	 * Choose an address to map it to.
>> -	 * Once the vmalloc system is running, we use it.
>> -	 * Before then, we use space going up from IOREMAP_BASE
>> -	 * (ioremap_bot records where we're up to).
>> -	 */
>> -	p = addr & PAGE_MASK;
>> -	size = PAGE_ALIGN(addr + size) - p;
>> -
>> -	/*
>> -	 * If the address lies within the first 16 MB, assume it's in ISA
>> -	 * memory space
>> -	 */
>> -	if (p < 16*1024*1024)
>> -		p += _ISA_MEM_BASE;
>> -
>> -#ifndef CONFIG_CRASH_DUMP
>> -	/*
>> -	 * Don't allow anybody to remap normal RAM that we're using.
>> -	 * mem_init() sets high_memory so only do the check after that.
>> -	 */
>> -	if (slab_is_available() && (p < virt_to_phys(high_memory)) &&
>> -	    page_is_ram(__phys_to_pfn(p))) {
>> -		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
>> -		       (unsigned long long)p, __builtin_return_address(0));
>> -		return NULL;
>> -	}
>> -#endif
>> -
>> -	if (size == 0)
>> -		return NULL;
>> -
>> -	/*
>> -	 * Is it already mapped?  Perhaps overlapped by a previous
>> -	 * mapping.
>> -	 */
>> -	v = p_block_mapped(p);
>> -	if (v)
>> -		goto out;
>> -
>> -	if (slab_is_available()) {
>> -		struct vm_struct *area;
>> -		area = get_vm_area_caller(size, VM_IOREMAP, caller);
>> -		if (area == 0)
>> -			return NULL;
>> -		area->phys_addr = p;
>> -		v = (unsigned long) area->addr;
>> -	} else {
>> -		v = ioremap_bot;
>> -		ioremap_bot += size;
>> -	}
>> -
>> -	/*
>> -	 * Should check if it is a candidate for a BAT mapping
>> -	 */
>> -
>> -	err = 0;
>> -	for (i = 0; i < size && err == 0; i += PAGE_SIZE)
>> -		err = map_kernel_page(v+i, p+i, flags);
>> -	if (err) {
>> -		if (slab_is_available())
>> -			vunmap((void *)v);
>> -		return NULL;
>> -	}
>> -
>> -out:
>> -	return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
>> -}
>> -
>> -#else
>> -
>>   /**
>>    * __ioremap_at - Low level function to establish the page tables
>>    *                for an IO mapping
>> @@ -135,6 +46,10 @@ void __iomem * __ioremap_at(phys_addr_t pa, void *ea, unsigned long size,
>>   	if ((flags & _PAGE_PRESENT) == 0)
>>   		flags |= pgprot_val(PAGE_KERNEL);
>>   
>> +	/* Non-cacheable page cannot be coherent */
>> +	if (flags & _PAGE_NO_CACHE)
>> +		flags &= ~_PAGE_COHERENT;
>> +
>>   	/* We don't support the 4K PFN hack with ioremap */
>>   	if (flags & H_PAGE_4K_PFN)
>>   		return NULL;
>> @@ -187,6 +102,33 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
>>   	if ((size == 0) || (paligned == 0))
>>   		return NULL;
>>   
>> +	/*
>> +	 * If the address lies within the first 16 MB, assume it's in ISA
>> +	 * memory space
>> +	 */
>> +	if (IS_ENABLED(CONFIG_PPC32) && paligned < 16*1024*1024)
>> +		paligned += _ISA_MEM_BASE;
>> +
>> +	/*
>> +	 * Don't allow anybody to remap normal RAM that we're using.
>> +	 * mem_init() sets high_memory so only do the check after that.
>> +	 */
>> +	if (!IS_ENABLED(CONFIG_CRASH_DUMP) &&
>> +	    slab_is_available() && (paligned < virt_to_phys(high_memory)) &&
>> +	    page_is_ram(__phys_to_pfn(paligned))) {
>> +		printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
>> +		       (u64)paligned, __builtin_return_address(0));
>> +		return NULL;
>> +	}
>> +
>> +	/*
>> +	 * Is it already mapped?  Perhaps overlapped by a previous
>> +	 * mapping.
>> +	 */
>> +	ret = (void __iomem *)p_block_mapped(paligned);
>> +	if (ret)
>> +		goto out;
>> +
>>   	if (slab_is_available()) {
>>   		struct vm_struct *area;
>>   
>> @@ -205,14 +147,12 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
>>   		if (ret)
>>   			ioremap_bot += size;
>>   	}
>> -
>> +out:
>>   	if (ret)
>> -		ret += addr & ~PAGE_MASK;
>> +		ret += (unsigned long)addr & ~PAGE_MASK;
>>   	return ret;
>>   }
>>   
>> -#endif
>> -
>>   /*
>>    * Unmap an IO region and remove it from imalloc'd list.
>>    * Access to IO memory should be serialized by driver.
>> -- 
>> 2.13.3


* Re: [PATCH 05/17] powerpc: move io mapping functions into ioremap.c
  2018-05-11  6:01   ` Michael Ellerman
@ 2018-05-16 10:13     ` Christophe LEROY
  0 siblings, 0 replies; 28+ messages in thread
From: Christophe LEROY @ 2018-05-16 10:13 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev



On 11/05/2018 at 08:01, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
> 

[...]

>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/types.h>
>> +#include <linux/mm.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/init.h>
>> +#include <linux/highmem.h>
>> +#include <linux/memblock.h>
>> +#include <linux/slab.h>
>> +
>> +#include <asm/pgtable.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/fixmap.h>
>> +#include <asm/io.h>
>> +#include <asm/setup.h>
>> +#include <asm/sections.h>
> 
> I needed:
> 
> +#include <asm/machdep.h>

Oops, yes, it was added in the following patch. OK, I'll move it one patch back.

Thanks
Christophe

> 
> To fix:
> 
> ../arch/powerpc/mm/ioremap.c: In function ‘ioremap_wc’:
> ../arch/powerpc/mm/ioremap.c:290:6: error: ‘ppc_md’ undeclared (first use in this function)
>    if (ppc_md.ioremap)
>        ^~~~~~
> 
> cheers
> 


* Re: [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx
  2018-05-11  6:48 ` [PATCH 00/17] Implement use of HW assistance on TLB table walk " Michael Ellerman
@ 2018-05-16 10:17   ` Christophe LEROY
  0 siblings, 0 replies; 28+ messages in thread
From: Christophe LEROY @ 2018-05-16 10:17 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, aneesh.kumar
  Cc: linux-kernel, linuxppc-dev



On 11/05/2018 at 08:48, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
> 
>> The purpose of this serie is to implement hardware assistance for TLB table walk
>> on the 8xx.
>>
>> First part is to make L1 entries and L2 entries independant.
>> For that, we need to alter ioremap functions in order to handle GUARD attribute
>> at the PGD/PMD level.
>>
>> Last part is to try and reuse PTE fragment implemented on PPC64 in order to
>> not waste 16k Pages for page tables as only 4k are used. For the time being,
>> it doesn't work, but I include it in the serie anyway in order to get feedback.
>>
>> Tested successfully on 8xx up to the one before the last.
>>
>> Didn't have time to do compilation test on other configs, I send it anyway
>> before leaving for one week vacation in order to get feedback.
> 
> I replied to a few patches, here's some other build errors:
> 
> 
> arch/powerpc/mm/ioremap.c:135:15: error: '_PAGE_GUARDED' undeclared (first use in this function):
>    pseries_defconfig/powerpc
> 
> arch/powerpc/include/asm/book3s/32/pgtable.h:53:19: error: 'PKMAP_BASE' undeclared (first use in this function):
>    pmac32_defconfig/powerpc-5.3
> 
> include/linux/mm.h:533:41: error: 'PKMAP_BASE' undeclared (first use in this function):
>    pmac32_defconfig/powerpc
> 
> ERROR: "ioremap_bot" [net/netfilter/nf_conntrack.ko] undefined!:
>    linkstation_defconfig/powerpc
> 
> ERROR: "ioremap_bot" [fs/xfs/xfs.ko] undefined!:
>    linkstation_defconfig/powerpc
> 
> arch/powerpc/include/asm/nohash/32/pgtable.h:80:20: error: 'PKMAP_BASE' undeclared (first use in this function):
>    corenet32_smp_defconfig/powerpc-5.3
> 
> arch/powerpc/include/asm/nohash/32/pgalloc.h:64:43: error: '_PMD_GUARDED' undeclared (first use in this function):
>    ppc40x_defconfig/powerpc-5.3
> 
> ERROR: "ioremap_bot" [net/packet/af_packet.ko] undefined!:
>    storcenter_defconfig/powerpc
> 
> ERROR: "ioremap_bot" [drivers/usb/core/usbcore.ko] undefined!:
>    ppc44x_defconfig/powerpc
> 

Thanks for testing. I have now fixed all of them in v2.

For PKMAP_BASE, I had to move it from asm/highmem.h into
book3s/32/pgtable.h and nohash/32/pgtable.h, because including
asm/highmem.h in the pgtable.h files introduced a circular dependency.

Christophe


end of thread

Thread overview: 28+ messages
2018-05-04 12:33 [PATCH 00/17] Implement use of HW assistance on TLB table walk on 8xx Christophe Leroy
2018-05-04 12:33 ` [PATCH 01/17] powerpc/nohash: remove hash related code from nohash headers Christophe Leroy
2018-05-08  8:25   ` Aneesh Kumar K.V
2018-05-04 12:33 ` [PATCH 02/17] powerpc/nohash: remove _PAGE_BUSY Christophe Leroy
2018-05-08  8:26   ` Aneesh Kumar K.V
2018-05-04 12:33 ` [PATCH 03/17] powerpc/nohash: use IS_ENABLED() to simplify __set_pte_at() Christophe Leroy
2018-05-04 12:33 ` [PATCH 04/17] Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP" Christophe Leroy
2018-05-04 12:34 ` [PATCH 05/17] powerpc: move io mapping functions into ioremap.c Christophe Leroy
2018-05-11  6:01   ` Michael Ellerman
2018-05-16 10:13     ` Christophe LEROY
2018-05-04 12:34 ` [PATCH 06/17] powerpc: common ioremap functions Christophe Leroy
2018-05-04 12:34 ` [PATCH 07/17] powerpc: make ioremap_bot common to PPC32 and PPC64 Christophe Leroy
2018-05-04 12:34 ` [PATCH 08/17] powerpc: make __iounmap() " Christophe Leroy
2018-05-04 12:34 ` [PATCH 09/17] powerpc: make __ioremap_caller() " Christophe Leroy
2018-05-08  9:56   ` Aneesh Kumar K.V
2018-05-16  9:58     ` Christophe LEROY
2018-05-04 12:34 ` [PATCH 10/17] powerpc: use _ALIGN macro Christophe Leroy
2018-05-04 12:34 ` [PATCH 11/17] powerpc/nohash32: set GUARDED attribute in the PMD directly Christophe Leroy
2018-05-11  6:45   ` Michael Ellerman
2018-05-04 12:34 ` [PATCH 12/17] powerpc/8xx: Remove PTE_ATOMIC_UPDATES Christophe Leroy
2018-05-04 13:16   ` Joakim Tjernlund
2018-05-04 12:34 ` [PATCH 13/17] powerpc/mm: Use hardware assistance in TLB handlers on the 8xx Christophe Leroy
2018-05-04 12:34 ` [PATCH 14/17] powerpc/8xx: reunify TLB handler routines Christophe Leroy
2018-05-04 12:34 ` [PATCH 15/17] powerpc/8xx: Free up SPRN_SPRG_SCRATCH2 Christophe Leroy
2018-05-04 12:34 ` [PATCH 16/17] powerpc/mm: Make pte_fragment_alloc() common to PPC32 and PPC64 Christophe Leroy
2018-05-04 12:34 ` [PATCH BAD 17/17] powerpc/mm: Use pte_fragment_alloc() on 8xx Christophe Leroy
2018-05-11  6:48 ` [PATCH 00/17] Implement use of HW assistance on TLB table walk " Michael Ellerman
2018-05-16 10:17   ` Christophe LEROY
