LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator
@ 2021-07-20  2:51 Kefeng Wang
  2021-07-20  2:51 ` [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early() Kefeng Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Kefeng Wang @ 2021-07-20  2:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Andrey Ryabinin, Andrey Konovalov,
	Dmitry Vyukov
  Cc: linux-arm-kernel, linux-kernel, kasan-dev, linux-mm, Kefeng Wang

Percpu embedded first chunk allocator is the firstly option, but it
could fails on ARM64, eg,
  "percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
  "percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
  "percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"

then we could meet "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
even the system could not boot successfully.

Let's implement page mapping percpu first chunk allocator as a fallback
to the embedding allocator to increase the robustness of the system.

Also fix a crash when both NEED_PER_CPU_PAGE_FIRST_CHUNK and KASAN_VMALLOC enabled.

Tested on ARM64 qemu with cmdline "percpu_alloc=page" based on v5.14-rc2.

V2:
- fix build error when CONFIG_KASAN disabled, found by lkp@intel.com
- drop wrong __weak comment from kasan_populate_early_vm_area_shadow(),
  found by Marco Elver <elver@google.com>

Kefeng Wang (3):
  vmalloc: Choose a better start address in vm_area_register_early()
  arm64: Support page mapping percpu first chunk allocator
  kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC

 arch/arm64/Kconfig         |  4 ++
 arch/arm64/mm/kasan_init.c | 17 ++++++++
 drivers/base/arch_numa.c   | 82 +++++++++++++++++++++++++++++++++-----
 include/linux/kasan.h      |  6 +++
 mm/kasan/init.c            |  5 +++
 mm/vmalloc.c               |  9 +++--
 6 files changed, 110 insertions(+), 13 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early()
  2021-07-20  2:51 [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang
@ 2021-07-20  2:51 ` Kefeng Wang
  2021-08-01 15:23   ` Catalin Marinas
  2021-07-20  2:51 ` [PATCH v2 2/3] arm64: Support page mapping percpu first chunk allocator Kefeng Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2021-07-20  2:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Andrey Ryabinin, Andrey Konovalov,
	Dmitry Vyukov
  Cc: linux-arm-kernel, linux-kernel, kasan-dev, linux-mm, Kefeng Wang

There are some fixed locations in the vmalloc area be reserved
in ARM(see iotable_init()) and ARM64(see map_kernel()), but for
pcpu_page_first_chunk(), it calls vm_area_register_early() and
choose VMALLOC_START as the start address of vmap area which
could be conflicted with above address, then could trigger a
BUG_ON in vm_area_add_early().

Let's choose the end of existing address range in vmlist as the
start address instead of VMALLOC_START to avoid the BUG_ON.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/vmalloc.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d5cd52805149..a98cf97f032f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2238,12 +2238,14 @@ void __init vm_area_add_early(struct vm_struct *vm)
  */
 void __init vm_area_register_early(struct vm_struct *vm, size_t align)
 {
-	static size_t vm_init_off __initdata;
+	unsigned long vm_start = VMALLOC_START;
+	struct vm_struct *tmp;
 	unsigned long addr;
 
-	addr = ALIGN(VMALLOC_START + vm_init_off, align);
-	vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
+	for (tmp = vmlist; tmp; tmp = tmp->next)
+		vm_start = (unsigned long)tmp->addr + tmp->size;
 
+	addr = ALIGN(vm_start, align);
 	vm->addr = (void *)addr;
 
 	vm_area_add_early(vm);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] arm64: Support page mapping percpu first chunk allocator
  2021-07-20  2:51 [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang
  2021-07-20  2:51 ` [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early() Kefeng Wang
@ 2021-07-20  2:51 ` Kefeng Wang
  2021-08-01 15:53   ` Catalin Marinas
  2021-07-20  2:51 ` [PATCH v2 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC Kefeng Wang
  2021-07-26  1:19 ` [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang
  3 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2021-07-20  2:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Andrey Ryabinin, Andrey Konovalov,
	Dmitry Vyukov
  Cc: linux-arm-kernel, linux-kernel, kasan-dev, linux-mm, Kefeng Wang

Percpu embedded first chunk allocator is the firstly option, but it
could fails on ARM64, eg,
  "percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
  "percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
  "percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"

then we could meet "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
even the system could not boot successfully.

Let's implement page mapping percpu first chunk allocator as a fallback
to the embedding allocator to increase the robustness of the system.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/Kconfig       |  4 ++
 drivers/base/arch_numa.c | 82 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b5b13a932561..eacb5873ded1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1045,6 +1045,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
 	def_bool y
 	depends on NUMA
 
+config NEED_PER_CPU_PAGE_FIRST_CHUNK
+	def_bool y
+	depends on NUMA
+
 source "kernel/Kconfig.hz"
 
 config ARCH_SPARSEMEM_ENABLE
diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
index 4cc4e117727d..563b2013b75a 100644
--- a/drivers/base/arch_numa.c
+++ b/drivers/base/arch_numa.c
@@ -14,6 +14,7 @@
 #include <linux/of.h>
 
 #include <asm/sections.h>
+#include <asm/pgalloc.h>
 
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
@@ -168,22 +169,83 @@ static void __init pcpu_fc_free(void *ptr, size_t size)
 	memblock_free_early(__pa(ptr), size);
 }
 
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+static void __init pcpu_populate_pte(unsigned long addr)
+{
+	pgd_t *pgd = pgd_offset_k(addr);
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	p4d = p4d_offset(pgd, addr);
+	if (p4d_none(*p4d)) {
+		pud_t *new;
+
+		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+		if (!new)
+			goto err_alloc;
+		p4d_populate(&init_mm, p4d, new);
+	}
+
+	pud = pud_offset(p4d, addr);
+	if (pud_none(*pud)) {
+		pmd_t *new;
+
+		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+		if (!new)
+			goto err_alloc;
+		pud_populate(&init_mm, pud, new);
+	}
+
+	pmd = pmd_offset(pud, addr);
+	if (!pmd_present(*pmd)) {
+		pte_t *new;
+
+		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+		if (!new)
+			goto err_alloc;
+		pmd_populate_kernel(&init_mm, pmd, new);
+	}
+
+	return;
+
+err_alloc:
+	panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
+	      __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+}
+#endif
+
 void __init setup_per_cpu_areas(void)
 {
 	unsigned long delta;
 	unsigned int cpu;
-	int rc;
+	int rc = -EINVAL;
+
+	if (pcpu_chosen_fc != PCPU_FC_PAGE) {
+		/*
+		 * Always reserve area for module percpu variables.  That's
+		 * what the legacy allocator did.
+		 */
+		rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
+					    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
+					    pcpu_cpu_distance,
+					    pcpu_fc_alloc, pcpu_fc_free);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+		if (rc < 0)
+			pr_warn("PERCPU: %s allocator failed (%d), falling back to page size\n",
+				   pcpu_fc_names[pcpu_chosen_fc], rc);
+#endif
+	}
 
-	/*
-	 * Always reserve area for module percpu variables.  That's
-	 * what the legacy allocator did.
-	 */
-	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
-				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
-				    pcpu_cpu_distance,
-				    pcpu_fc_alloc, pcpu_fc_free);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+	if (rc < 0)
+		rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE,
+					   pcpu_fc_alloc,
+					   pcpu_fc_free,
+					   pcpu_populate_pte);
+#endif
 	if (rc < 0)
-		panic("Failed to initialize percpu areas.");
+		panic("Failed to initialize percpu areas (err=%d).", rc);
 
 	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
 	for_each_possible_cpu(cpu)
-- 
2.26.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC
  2021-07-20  2:51 [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang
  2021-07-20  2:51 ` [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early() Kefeng Wang
  2021-07-20  2:51 ` [PATCH v2 2/3] arm64: Support page mapping percpu first chunk allocator Kefeng Wang
@ 2021-07-20  2:51 ` Kefeng Wang
  2021-07-22 11:00   ` Marco Elver
  2021-07-26  1:19 ` [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang
  3 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2021-07-20  2:51 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Andrey Ryabinin, Andrey Konovalov,
	Dmitry Vyukov
  Cc: linux-arm-kernel, linux-kernel, kasan-dev, linux-mm, Kefeng Wang

With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK, it crashs,

Unable to handle kernel paging request at virtual address ffff7000028f2000
...
swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
[ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
Hardware name: linux,dummy-virt (DT)
pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
pc : kasan_check_range+0x90/0x1a0
lr : memcpy+0x88/0xf4
sp : ffff80001378fe20
...
Call trace:
 kasan_check_range+0x90/0x1a0
 pcpu_page_first_chunk+0x3f0/0x568
 setup_per_cpu_areas+0xb8/0x184
 start_kernel+0x8c/0x328

The vm area used in vm_area_register_early() has no kasan shadow memory,
Let's add a new kasan_populate_early_vm_area_shadow() function to populate
the vm area shadow memory to fix the issue.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/mm/kasan_init.c | 17 +++++++++++++++++
 include/linux/kasan.h      |  6 ++++++
 mm/kasan/init.c            |  5 +++++
 mm/vmalloc.c               |  1 +
 4 files changed, 29 insertions(+)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 61b52a92b8b6..46c1b3722901 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -287,6 +287,23 @@ static void __init kasan_init_depth(void)
 	init_task.kasan_depth = 0;
 }
 
+#ifdef CONFIG_KASAN_VMALLOC
+void __init kasan_populate_early_vm_area_shadow(void *start, unsigned long size)
+{
+	unsigned long shadow_start, shadow_end;
+
+	if (!is_vmalloc_or_module_addr(start))
+		return;
+
+	shadow_start = (unsigned long)kasan_mem_to_shadow(start);
+	shadow_start = ALIGN_DOWN(shadow_start, PAGE_SIZE);
+	shadow_end = (unsigned long)kasan_mem_to_shadow(start + size);
+	shadow_end = ALIGN(shadow_end, PAGE_SIZE);
+	kasan_map_populate(shadow_start, shadow_end,
+			   early_pfn_to_nid(virt_to_pfn(start)));
+}
+#endif
+
 void __init kasan_init(void)
 {
 	kasan_init_shadow();
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index dd874a1ee862..3f8c26d9ef82 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -133,6 +133,8 @@ struct kasan_cache {
 	bool is_kmalloc;
 };
 
+void kasan_populate_early_vm_area_shadow(void *start, unsigned long size);
+
 slab_flags_t __kasan_never_merge(void);
 static __always_inline slab_flags_t kasan_never_merge(void)
 {
@@ -303,6 +305,10 @@ void kasan_restore_multi_shot(bool enabled);
 
 #else /* CONFIG_KASAN */
 
+static inline void kasan_populate_early_vm_area_shadow(void *start,
+						       unsigned long size)
+{ }
+
 static inline slab_flags_t kasan_never_merge(void)
 {
 	return 0;
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index cc64ed6858c6..d39577d088a1 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -279,6 +279,11 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
 	return 0;
 }
 
+void __init __weak kasan_populate_early_vm_area_shadow(void *start,
+						       unsigned long size)
+{
+}
+
 static void kasan_free_pte(pte_t *pte_start, pmd_t *pmd)
 {
 	pte_t *pte;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a98cf97f032f..f19e07314ee5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2249,6 +2249,7 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
 	vm->addr = (void *)addr;
 
 	vm_area_add_early(vm);
+	kasan_populate_early_vm_area_shadow(vm->addr, vm->size);
 }
 
 static void vmap_init_free_space(void)
-- 
2.26.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC
  2021-07-20  2:51 ` [PATCH v2 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC Kefeng Wang
@ 2021-07-22 11:00   ` Marco Elver
  2021-07-22 12:14     ` Kefeng Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Marco Elver @ 2021-07-22 11:00 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Catalin Marinas, Will Deacon, Andrey Ryabinin, Andrey Konovalov,
	Dmitry Vyukov, linux-arm-kernel, linux-kernel, kasan-dev,
	linux-mm

On Tue, Jul 20, 2021 at 10:51AM +0800, Kefeng Wang wrote:
> With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK, it crashs,
> 
> Unable to handle kernel paging request at virtual address ffff7000028f2000
> ...
> swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
> [ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
> Internal error: Oops: 96000007 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
> Hardware name: linux,dummy-virt (DT)
> pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
> pc : kasan_check_range+0x90/0x1a0
> lr : memcpy+0x88/0xf4
> sp : ffff80001378fe20
> ...
> Call trace:
>  kasan_check_range+0x90/0x1a0
>  pcpu_page_first_chunk+0x3f0/0x568
>  setup_per_cpu_areas+0xb8/0x184
>  start_kernel+0x8c/0x328
> 
> The vm area used in vm_area_register_early() has no kasan shadow memory,
> Let's add a new kasan_populate_early_vm_area_shadow() function to populate
> the vm area shadow memory to fix the issue.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Acked-by: Marco Elver <elver@google.com>

for the kasan bits.

> ---
>  arch/arm64/mm/kasan_init.c | 17 +++++++++++++++++
>  include/linux/kasan.h      |  6 ++++++
>  mm/kasan/init.c            |  5 +++++
>  mm/vmalloc.c               |  1 +
>  4 files changed, 29 insertions(+)
> 
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index 61b52a92b8b6..46c1b3722901 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -287,6 +287,23 @@ static void __init kasan_init_depth(void)
>  	init_task.kasan_depth = 0;
>  }
>  
> +#ifdef CONFIG_KASAN_VMALLOC
> +void __init kasan_populate_early_vm_area_shadow(void *start, unsigned long size)
> +{
> +	unsigned long shadow_start, shadow_end;
> +
> +	if (!is_vmalloc_or_module_addr(start))
> +		return;
> +
> +	shadow_start = (unsigned long)kasan_mem_to_shadow(start);
> +	shadow_start = ALIGN_DOWN(shadow_start, PAGE_SIZE);
> +	shadow_end = (unsigned long)kasan_mem_to_shadow(start + size);
> +	shadow_end = ALIGN(shadow_end, PAGE_SIZE);
> +	kasan_map_populate(shadow_start, shadow_end,
> +			   early_pfn_to_nid(virt_to_pfn(start)));
> +}
> +#endif
> +
>  void __init kasan_init(void)
>  {
>  	kasan_init_shadow();
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index dd874a1ee862..3f8c26d9ef82 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -133,6 +133,8 @@ struct kasan_cache {
>  	bool is_kmalloc;
>  };
>  
> +void kasan_populate_early_vm_area_shadow(void *start, unsigned long size);
> +
>  slab_flags_t __kasan_never_merge(void);
>  static __always_inline slab_flags_t kasan_never_merge(void)
>  {
> @@ -303,6 +305,10 @@ void kasan_restore_multi_shot(bool enabled);
>  
>  #else /* CONFIG_KASAN */
>  
> +static inline void kasan_populate_early_vm_area_shadow(void *start,
> +						       unsigned long size)
> +{ }
> +
>  static inline slab_flags_t kasan_never_merge(void)
>  {
>  	return 0;
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index cc64ed6858c6..d39577d088a1 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -279,6 +279,11 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
>  	return 0;
>  }
>  
> +void __init __weak kasan_populate_early_vm_area_shadow(void *start,
> +						       unsigned long size)
> +{
> +}
> +
>  static void kasan_free_pte(pte_t *pte_start, pmd_t *pmd)
>  {
>  	pte_t *pte;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a98cf97f032f..f19e07314ee5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2249,6 +2249,7 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
>  	vm->addr = (void *)addr;
>  
>  	vm_area_add_early(vm);
> +	kasan_populate_early_vm_area_shadow(vm->addr, vm->size);
>  }
>  
>  static void vmap_init_free_space(void)
> -- 
> 2.26.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC
  2021-07-22 11:00   ` Marco Elver
@ 2021-07-22 12:14     ` Kefeng Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Kefeng Wang @ 2021-07-22 12:14 UTC (permalink / raw)
  To: Marco Elver
  Cc: Catalin Marinas, Will Deacon, Andrey Ryabinin, Andrey Konovalov,
	Dmitry Vyukov, linux-arm-kernel, linux-kernel, kasan-dev,
	linux-mm


On 2021/7/22 19:00, Marco Elver wrote:
> On Tue, Jul 20, 2021 at 10:51AM +0800, Kefeng Wang wrote:
>> With KASAN_VMALLOC and NEED_PER_CPU_PAGE_FIRST_CHUNK, it crashs,
>>
>> Unable to handle kernel paging request at virtual address ffff7000028f2000
>> ...
>> swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000042440000
>> [ffff7000028f2000] pgd=000000063e7c0003, p4d=000000063e7c0003, pud=000000063e7c0003, pmd=000000063e7b0003, pte=0000000000000000
>> Internal error: Oops: 96000007 [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.0-rc4-00003-gc6e6e28f3f30-dirty #62
>> Hardware name: linux,dummy-virt (DT)
>> pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--)
>> pc : kasan_check_range+0x90/0x1a0
>> lr : memcpy+0x88/0xf4
>> sp : ffff80001378fe20
>> ...
>> Call trace:
>>   kasan_check_range+0x90/0x1a0
>>   pcpu_page_first_chunk+0x3f0/0x568
>>   setup_per_cpu_areas+0xb8/0x184
>>   start_kernel+0x8c/0x328
>>
>> The vm area used in vm_area_register_early() has no kasan shadow memory,
>> Let's add a new kasan_populate_early_vm_area_shadow() function to populate
>> the vm area shadow memory to fix the issue.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Acked-by: Marco Elver <elver@google.com>
>
> for the kasan bits.
Thanks Marco.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator
  2021-07-20  2:51 [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang
                   ` (2 preceding siblings ...)
  2021-07-20  2:51 ` [PATCH v2 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC Kefeng Wang
@ 2021-07-26  1:19 ` Kefeng Wang
  3 siblings, 0 replies; 13+ messages in thread
From: Kefeng Wang @ 2021-07-26  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Andrey Ryabinin, Andrey Konovalov,
	Dmitry Vyukov
  Cc: linux-arm-kernel, linux-kernel, kasan-dev, linux-mm

Hi Will and Catalin,any comments,kindly ping, thanks.

On 2021/7/20 10:51, Kefeng Wang wrote:
> Percpu embedded first chunk allocator is the firstly option, but it
> could fails on ARM64, eg,
>    "percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
>    "percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
>    "percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"
>
> then we could meet "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
> even the system could not boot successfully.
>
> Let's implement page mapping percpu first chunk allocator as a fallback
> to the embedding allocator to increase the robustness of the system.
>
> Also fix a crash when both NEED_PER_CPU_PAGE_FIRST_CHUNK and KASAN_VMALLOC enabled.
>
> Tested on ARM64 qemu with cmdline "percpu_alloc=page" based on v5.14-rc2.
>
> V2:
> - fix build error when CONFIG_KASAN disabled, found by lkp@intel.com
> - drop wrong __weak comment from kasan_populate_early_vm_area_shadow(),
>    found by Marco Elver <elver@google.com>
>
> Kefeng Wang (3):
>    vmalloc: Choose a better start address in vm_area_register_early()
>    arm64: Support page mapping percpu first chunk allocator
>    kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC
>
>   arch/arm64/Kconfig         |  4 ++
>   arch/arm64/mm/kasan_init.c | 17 ++++++++
>   drivers/base/arch_numa.c   | 82 +++++++++++++++++++++++++++++++++-----
>   include/linux/kasan.h      |  6 +++
>   mm/kasan/init.c            |  5 +++
>   mm/vmalloc.c               |  9 +++--
>   6 files changed, 110 insertions(+), 13 deletions(-)
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early()
  2021-07-20  2:51 ` [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early() Kefeng Wang
@ 2021-08-01 15:23   ` Catalin Marinas
  2021-08-02  2:39     ` Kefeng Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2021-08-01 15:23 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Will Deacon, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov,
	linux-arm-kernel, linux-kernel, kasan-dev, linux-mm

On Tue, Jul 20, 2021 at 10:51:03AM +0800, Kefeng Wang wrote:
> There are some fixed locations in the vmalloc area be reserved
> in ARM(see iotable_init()) and ARM64(see map_kernel()), but for
> pcpu_page_first_chunk(), it calls vm_area_register_early() and
> choose VMALLOC_START as the start address of vmap area which
> could be conflicted with above address, then could trigger a
> BUG_ON in vm_area_add_early().
> 
> Let's choose the end of existing address range in vmlist as the
> start address instead of VMALLOC_START to avoid the BUG_ON.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/vmalloc.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d5cd52805149..a98cf97f032f 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2238,12 +2238,14 @@ void __init vm_area_add_early(struct vm_struct *vm)
>   */
>  void __init vm_area_register_early(struct vm_struct *vm, size_t align)
>  {
> -	static size_t vm_init_off __initdata;
> +	unsigned long vm_start = VMALLOC_START;
> +	struct vm_struct *tmp;
>  	unsigned long addr;
>  
> -	addr = ALIGN(VMALLOC_START + vm_init_off, align);
> -	vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
> +	for (tmp = vmlist; tmp; tmp = tmp->next)
> +		vm_start = (unsigned long)tmp->addr + tmp->size;
>  
> +	addr = ALIGN(vm_start, align);
>  	vm->addr = (void *)addr;
>  
>  	vm_area_add_early(vm);

Is there a risk of breaking other architectures? It doesn't look like to
me but I thought I'd ask.

Also, instead of always picking the end, could we search for a range
that fits?

-- 
Catalin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] arm64: Support page mapping percpu first chunk allocator
  2021-07-20  2:51 ` [PATCH v2 2/3] arm64: Support page mapping percpu first chunk allocator Kefeng Wang
@ 2021-08-01 15:53   ` Catalin Marinas
  2021-08-02  2:47     ` Kefeng Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2021-08-01 15:53 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Will Deacon, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov,
	linux-arm-kernel, linux-kernel, kasan-dev, linux-mm,
	Greg Kroah-Hartman

On Tue, Jul 20, 2021 at 10:51:04AM +0800, Kefeng Wang wrote:
> Percpu embedded first chunk allocator is the firstly option, but it
> could fails on ARM64, eg,
>   "percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
>   "percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
>   "percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"
> 
> then we could meet "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
> even the system could not boot successfully.
> 
> Let's implement page mapping percpu first chunk allocator as a fallback
> to the embedding allocator to increase the robustness of the system.

It looks like x86, powerpc and sparc implement their own
setup_per_cpu_areas(). I had a quick look on finding some commonalities
but I think it's a lot more hassle to make a generic version out of them
(powerpc looks the simplest though). I think we could add a generic
variant with the arm64 support and later migrate other architectures to
it if possible.

The patch looks ok to me otherwise but I'd need an ack from Greg as it
touches drivers/.

BTW, do we need something similar for the non-NUMA
setup_per_cpu_areas()? I can see this patch only enables
NEED_PER_CPU_PAGE_FIRST_CHUNK if NUMA.

Leaving the rest of the patch below for Greg.

> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  arch/arm64/Kconfig       |  4 ++
>  drivers/base/arch_numa.c | 82 +++++++++++++++++++++++++++++++++++-----
>  2 files changed, 76 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b5b13a932561..eacb5873ded1 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1045,6 +1045,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>  	def_bool y
>  	depends on NUMA
>  
> +config NEED_PER_CPU_PAGE_FIRST_CHUNK
> +	def_bool y
> +	depends on NUMA
> +
>  source "kernel/Kconfig.hz"
>  
>  config ARCH_SPARSEMEM_ENABLE
> diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
> index 4cc4e117727d..563b2013b75a 100644
> --- a/drivers/base/arch_numa.c
> +++ b/drivers/base/arch_numa.c
> @@ -14,6 +14,7 @@
>  #include <linux/of.h>
>  
>  #include <asm/sections.h>
> +#include <asm/pgalloc.h>
>  
>  struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>  EXPORT_SYMBOL(node_data);
> @@ -168,22 +169,83 @@ static void __init pcpu_fc_free(void *ptr, size_t size)
>  	memblock_free_early(__pa(ptr), size);
>  }
>  
> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
> +static void __init pcpu_populate_pte(unsigned long addr)
> +{
> +	pgd_t *pgd = pgd_offset_k(addr);
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	p4d = p4d_offset(pgd, addr);
> +	if (p4d_none(*p4d)) {
> +		pud_t *new;
> +
> +		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +		if (!new)
> +			goto err_alloc;
> +		p4d_populate(&init_mm, p4d, new);
> +	}
> +
> +	pud = pud_offset(p4d, addr);
> +	if (pud_none(*pud)) {
> +		pmd_t *new;
> +
> +		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +		if (!new)
> +			goto err_alloc;
> +		pud_populate(&init_mm, pud, new);
> +	}
> +
> +	pmd = pmd_offset(pud, addr);
> +	if (!pmd_present(*pmd)) {
> +		pte_t *new;
> +
> +		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +		if (!new)
> +			goto err_alloc;
> +		pmd_populate_kernel(&init_mm, pmd, new);
> +	}
> +
> +	return;
> +
> +err_alloc:
> +	panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
> +	      __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
> +}
> +#endif
> +
>  void __init setup_per_cpu_areas(void)
>  {
>  	unsigned long delta;
>  	unsigned int cpu;
> -	int rc;
> +	int rc = -EINVAL;
> +
> +	if (pcpu_chosen_fc != PCPU_FC_PAGE) {
> +		/*
> +		 * Always reserve area for module percpu variables.  That's
> +		 * what the legacy allocator did.
> +		 */
> +		rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
> +					    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
> +					    pcpu_cpu_distance,
> +					    pcpu_fc_alloc, pcpu_fc_free);
> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
> +		if (rc < 0)
> +			pr_warn("PERCPU: %s allocator failed (%d), falling back to page size\n",
> +				   pcpu_fc_names[pcpu_chosen_fc], rc);
> +#endif
> +	}
>  
> -	/*
> -	 * Always reserve area for module percpu variables.  That's
> -	 * what the legacy allocator did.
> -	 */
> -	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
> -				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
> -				    pcpu_cpu_distance,
> -				    pcpu_fc_alloc, pcpu_fc_free);
> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
> +	if (rc < 0)
> +		rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE,
> +					   pcpu_fc_alloc,
> +					   pcpu_fc_free,
> +					   pcpu_populate_pte);
> +#endif
>  	if (rc < 0)
> -		panic("Failed to initialize percpu areas.");
> +		panic("Failed to initialize percpu areas (err=%d).", rc);
>  
>  	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
>  	for_each_possible_cpu(cpu)
> -- 
> 2.26.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early()
  2021-08-01 15:23   ` Catalin Marinas
@ 2021-08-02  2:39     ` Kefeng Wang
  2021-08-04 11:14       ` Catalin Marinas
  0 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2021-08-02  2:39 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov,
	linux-arm-kernel, linux-kernel, kasan-dev, linux-mm


On 2021/8/1 23:23, Catalin Marinas wrote:
> On Tue, Jul 20, 2021 at 10:51:03AM +0800, Kefeng Wang wrote:
>> There are some fixed locations in the vmalloc area be reserved
>> in ARM(see iotable_init()) and ARM64(see map_kernel()), but for
>> pcpu_page_first_chunk(), it calls vm_area_register_early() and
>> choose VMALLOC_START as the start address of vmap area which
>> could be conflicted with above address, then could trigger a
>> BUG_ON in vm_area_add_early().
>>
>> Let's choose the end of existing address range in vmlist as the
>> start address instead of VMALLOC_START to avoid the BUG_ON.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   mm/vmalloc.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index d5cd52805149..a98cf97f032f 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -2238,12 +2238,14 @@ void __init vm_area_add_early(struct vm_struct *vm)
>>    */
>>   void __init vm_area_register_early(struct vm_struct *vm, size_t align)
>>   {
>> -	static size_t vm_init_off __initdata;
>> +	unsigned long vm_start = VMALLOC_START;
>> +	struct vm_struct *tmp;
>>   	unsigned long addr;
>>   
>> -	addr = ALIGN(VMALLOC_START + vm_init_off, align);
>> -	vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
>> +	for (tmp = vmlist; tmp; tmp = tmp->next)
>> +		vm_start = (unsigned long)tmp->addr + tmp->size;
>>   
>> +	addr = ALIGN(vm_start, align);
>>   	vm->addr = (void *)addr;
>>   
>>   	vm_area_add_early(vm);
> Is there a risk of breaking other architectures? It doesn't look like to
> me but I thought I'd ask.

Before this patch, vm_init_off is to record the offset from VMALLOC_START,

but it use VMALLOC_START as start address on the function 
vm_area_register_early()

called firstly,  this will cause the BUG_ON.

With this patch, the most important change is that we choose the start 
address via

dynamic calculate the 'start' address by traversing the list.

[wkf@localhost linux-next]$ git grep vm_area_register_early
arch/alpha/mm/init.c: vm_area_register_early(&console_remap_vm, PAGE_SIZE);
arch/x86/xen/p2m.c:     vm_area_register_early(&vm, PMD_SIZE * 
PMDS_PER_MID_PAGE);
mm/percpu.c:    vm_area_register_early(&vm, PAGE_SIZE);
[wkf@localhost linux-next]$ git grep vm_area_add_early
arch/arm/mm/ioremap.c:  vm_area_add_early(vm);
arch/arm64/mm/mmu.c:    vm_area_add_early(vma);

x86/alpha won't call vm_area_add_early(), only arm64 could call both vm_area_add_early()
and  vm_area_register_early() when this patchset is merged. so it won't break other architectures.

>
> Also, instead of always picking the end, could we search for a range
> that fits?
We only need a space in vmalloc range,  using end or a range in the 
middle is not different.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] arm64: Support page mapping percpu first chunk allocator
  2021-08-01 15:53   ` Catalin Marinas
@ 2021-08-02  2:47     ` Kefeng Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Kefeng Wang @ 2021-08-02  2:47 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov,
	linux-arm-kernel, linux-kernel, kasan-dev, linux-mm,
	Greg Kroah-Hartman


On 2021/8/1 23:53, Catalin Marinas wrote:
> On Tue, Jul 20, 2021 at 10:51:04AM +0800, Kefeng Wang wrote:
>> Percpu embedded first chunk allocator is the firstly option, but it
>> could fails on ARM64, eg,
>>    "percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000"
>>    "percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000"
>>    "percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000"
>>
>> then we could meet "WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838",
>> even the system could not boot successfully.
>>
>> Let's implement page mapping percpu first chunk allocator as a fallback
>> to the embedding allocator to increase the robustness of the system.
> It looks like x86, powerpc and sparc implement their own
> setup_per_cpu_areas(). I had a quick look on finding some commonalities
> but I think it's a lot more hassle to make a generic version out of them
> (powerpc looks the simplest though). I think we could add a generic
> variant with the arm64 support and later migrate other architectures to
> it if possible.
Ok, let's do it later, I could try to make some cleanup after the 
patchset is merged ;)
> The patch looks ok to me otherwise but I'd need an ack from Greg as it
> touches drivers/.

the arch_numa is only used ARM64 and riscv, the 
NEED_PER_CPU_PAGE_FIRST_CHUNK

is not enabled on RISCV, so it's no bad effect.

>
> BTW, do we need something similar for the non-NUMA
> setup_per_cpu_areas()? I can see this patch only enables
> NEED_PER_CPU_PAGE_FIRST_CHUNK if NUMA.
>
> Leaving the rest of the patch below for Greg.
>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   arch/arm64/Kconfig       |  4 ++
>>   drivers/base/arch_numa.c | 82 +++++++++++++++++++++++++++++++++++-----
>>   2 files changed, 76 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index b5b13a932561..eacb5873ded1 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1045,6 +1045,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>   	def_bool y
>>   	depends on NUMA
>>   
>> +config NEED_PER_CPU_PAGE_FIRST_CHUNK
>> +	def_bool y
>> +	depends on NUMA
>> +
>>   source "kernel/Kconfig.hz"
>>   
>>   config ARCH_SPARSEMEM_ENABLE
>> diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
>> index 4cc4e117727d..563b2013b75a 100644
>> --- a/drivers/base/arch_numa.c
>> +++ b/drivers/base/arch_numa.c
>> @@ -14,6 +14,7 @@
>>   #include <linux/of.h>
>>   
>>   #include <asm/sections.h>
>> +#include <asm/pgalloc.h>
>>   
>>   struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>>   EXPORT_SYMBOL(node_data);
>> @@ -168,22 +169,83 @@ static void __init pcpu_fc_free(void *ptr, size_t size)
>>   	memblock_free_early(__pa(ptr), size);
>>   }
>>   
>> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
>> +static void __init pcpu_populate_pte(unsigned long addr)
>> +{
>> +	pgd_t *pgd = pgd_offset_k(addr);
>> +	p4d_t *p4d;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +
>> +	p4d = p4d_offset(pgd, addr);
>> +	if (p4d_none(*p4d)) {
>> +		pud_t *new;
>> +
>> +		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>> +		if (!new)
>> +			goto err_alloc;
>> +		p4d_populate(&init_mm, p4d, new);
>> +	}
>> +
>> +	pud = pud_offset(p4d, addr);
>> +	if (pud_none(*pud)) {
>> +		pmd_t *new;
>> +
>> +		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>> +		if (!new)
>> +			goto err_alloc;
>> +		pud_populate(&init_mm, pud, new);
>> +	}
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +	if (!pmd_present(*pmd)) {
>> +		pte_t *new;
>> +
>> +		new = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>> +		if (!new)
>> +			goto err_alloc;
>> +		pmd_populate_kernel(&init_mm, pmd, new);
>> +	}
>> +
>> +	return;
>> +
>> +err_alloc:
>> +	panic("%s: Failed to allocate %lu bytes align=%lx from=%lx\n",
>> +	      __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
>> +}
>> +#endif
>> +
>>   void __init setup_per_cpu_areas(void)
>>   {
>>   	unsigned long delta;
>>   	unsigned int cpu;
>> -	int rc;
>> +	int rc = -EINVAL;
>> +
>> +	if (pcpu_chosen_fc != PCPU_FC_PAGE) {
>> +		/*
>> +		 * Always reserve area for module percpu variables.  That's
>> +		 * what the legacy allocator did.
>> +		 */
>> +		rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
>> +					    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
>> +					    pcpu_cpu_distance,
>> +					    pcpu_fc_alloc, pcpu_fc_free);
>> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
>> +		if (rc < 0)
>> +			pr_warn("PERCPU: %s allocator failed (%d), falling back to page size\n",
>> +				   pcpu_fc_names[pcpu_chosen_fc], rc);
>> +#endif
>> +	}
>>   
>> -	/*
>> -	 * Always reserve area for module percpu variables.  That's
>> -	 * what the legacy allocator did.
>> -	 */
>> -	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
>> -				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
>> -				    pcpu_cpu_distance,
>> -				    pcpu_fc_alloc, pcpu_fc_free);
>> +#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
>> +	if (rc < 0)
>> +		rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE,
>> +					   pcpu_fc_alloc,
>> +					   pcpu_fc_free,
>> +					   pcpu_populate_pte);
>> +#endif
>>   	if (rc < 0)
>> -		panic("Failed to initialize percpu areas.");
>> +		panic("Failed to initialize percpu areas (err=%d).", rc);
>>   
>>   	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
>>   	for_each_possible_cpu(cpu)
>> -- 
>> 2.26.2
> .
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early()
  2021-08-02  2:39     ` Kefeng Wang
@ 2021-08-04 11:14       ` Catalin Marinas
  2021-08-05 12:46         ` Kefeng Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2021-08-04 11:14 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Will Deacon, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov,
	linux-arm-kernel, linux-kernel, kasan-dev, linux-mm

On Mon, Aug 02, 2021 at 10:39:04AM +0800, Kefeng Wang wrote:
> On 2021/8/1 23:23, Catalin Marinas wrote:
> > On Tue, Jul 20, 2021 at 10:51:03AM +0800, Kefeng Wang wrote:
> > > There are some fixed locations in the vmalloc area be reserved
> > > in ARM(see iotable_init()) and ARM64(see map_kernel()), but for
> > > pcpu_page_first_chunk(), it calls vm_area_register_early() and
> > > choose VMALLOC_START as the start address of vmap area which
> > > could be conflicted with above address, then could trigger a
> > > BUG_ON in vm_area_add_early().
> > > 
> > > Let's choose the end of existing address range in vmlist as the
> > > start address instead of VMALLOC_START to avoid the BUG_ON.
> > > 
> > > Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> > > ---
> > >   mm/vmalloc.c | 8 +++++---
> > >   1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index d5cd52805149..a98cf97f032f 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -2238,12 +2238,14 @@ void __init vm_area_add_early(struct vm_struct *vm)
> > >    */
> > >   void __init vm_area_register_early(struct vm_struct *vm, size_t align)
> > >   {
> > > -	static size_t vm_init_off __initdata;
> > > +	unsigned long vm_start = VMALLOC_START;
> > > +	struct vm_struct *tmp;
> > >   	unsigned long addr;
> > > -	addr = ALIGN(VMALLOC_START + vm_init_off, align);
> > > -	vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
> > > +	for (tmp = vmlist; tmp; tmp = tmp->next)
> > > +		vm_start = (unsigned long)tmp->addr + tmp->size;
> > > +	addr = ALIGN(vm_start, align);
> > >   	vm->addr = (void *)addr;
> > >   	vm_area_add_early(vm);
> > Is there a risk of breaking other architectures? It doesn't look like to
> > me but I thought I'd ask.
> 
> Before this patch, vm_init_off is to record the offset from VMALLOC_START,
> 
> but it use VMALLOC_START as start address on the function
> vm_area_register_early()
> 
> called firstly,  this will cause the BUG_ON.
> 
> With this patch, the most important change is that we choose the start
> address via
> 
> dynamic calculate the 'start' address by traversing the list.
> 
> [wkf@localhost linux-next]$ git grep vm_area_register_early
> arch/alpha/mm/init.c: vm_area_register_early(&console_remap_vm, PAGE_SIZE);
> arch/x86/xen/p2m.c:     vm_area_register_early(&vm, PMD_SIZE *
> PMDS_PER_MID_PAGE);
> mm/percpu.c:    vm_area_register_early(&vm, PAGE_SIZE);
> [wkf@localhost linux-next]$ git grep vm_area_add_early
> arch/arm/mm/ioremap.c:  vm_area_add_early(vm);
> arch/arm64/mm/mmu.c:    vm_area_add_early(vma);
> 
> x86/alpha won't call vm_area_add_early(), only arm64 could call both vm_area_add_early()
> and  vm_area_register_early() when this patchset is merged. so it won't break other architectures.

Thanks for checking.

> > Also, instead of always picking the end, could we search for a range
> > that fits?
> 
> We only need a space in vmalloc range,  using end or a range in the middle
> is not different.

I was thinking of making it more future-proof in case one registers a
vm area towards the end of the range. It's fairly easy to pick a range
in the middle now that you are adding a list traversal.

-- 
Catalin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early()
  2021-08-04 11:14       ` Catalin Marinas
@ 2021-08-05 12:46         ` Kefeng Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Kefeng Wang @ 2021-08-05 12:46 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov,
	linux-arm-kernel, linux-kernel, kasan-dev, linux-mm


On 2021/8/4 19:14, Catalin Marinas wrote:
> On Mon, Aug 02, 2021 at 10:39:04AM +0800, Kefeng Wang wrote:
>> On 2021/8/1 23:23, Catalin Marinas wrote:
>>> On Tue, Jul 20, 2021 at 10:51:03AM +0800, Kefeng Wang wrote:
>>>> There are some fixed locations in the vmalloc area be reserved
>>>> in ARM(see iotable_init()) and ARM64(see map_kernel()), but for
>>>> pcpu_page_first_chunk(), it calls vm_area_register_early() and
>>>> choose VMALLOC_START as the start address of vmap area which
>>>> could be conflicted with above address, then could trigger a
>>>> BUG_ON in vm_area_add_early().
>>>>
>>>> Let's choose the end of existing address range in vmlist as the
>>>> start address instead of VMALLOC_START to avoid the BUG_ON.
>>>>
>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> ---
>>>>    mm/vmalloc.c | 8 +++++---
>>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>>> index d5cd52805149..a98cf97f032f 100644
>>>> --- a/mm/vmalloc.c
>>>> +++ b/mm/vmalloc.c
>>>> @@ -2238,12 +2238,14 @@ void __init vm_area_add_early(struct vm_struct *vm)
>>>>     */
>>>>    void __init vm_area_register_early(struct vm_struct *vm, size_t align)
>>>>    {
>>>> -	static size_t vm_init_off __initdata;
>>>> +	unsigned long vm_start = VMALLOC_START;
>>>> +	struct vm_struct *tmp;
>>>>    	unsigned long addr;
>>>> -	addr = ALIGN(VMALLOC_START + vm_init_off, align);
>>>> -	vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
>>>> +	for (tmp = vmlist; tmp; tmp = tmp->next)
>>>> +		vm_start = (unsigned long)tmp->addr + tmp->size;
>>>> +	addr = ALIGN(vm_start, align);
>>>>    	vm->addr = (void *)addr;
>>>>    	vm_area_add_early(vm);
>>> Is there a risk of breaking other architectures? It doesn't look like to
>>> me but I thought I'd ask.
>> Before this patch, vm_init_off is to record the offset from VMALLOC_START,
>>
>> but it use VMALLOC_START as start address on the function
>> vm_area_register_early()
>>
>> called firstly,  this will cause the BUG_ON.
>>
>> With this patch, the most important change is that we choose the start
>> address via
>>
>> dynamic calculate the 'start' address by traversing the list.
>>
>> [wkf@localhost linux-next]$ git grep vm_area_register_early
>> arch/alpha/mm/init.c: vm_area_register_early(&console_remap_vm, PAGE_SIZE);
>> arch/x86/xen/p2m.c:     vm_area_register_early(&vm, PMD_SIZE *
>> PMDS_PER_MID_PAGE);
>> mm/percpu.c:    vm_area_register_early(&vm, PAGE_SIZE);
>> [wkf@localhost linux-next]$ git grep vm_area_add_early
>> arch/arm/mm/ioremap.c:  vm_area_add_early(vm);
>> arch/arm64/mm/mmu.c:    vm_area_add_early(vma);
>>
>> x86/alpha won't call vm_area_add_early(), only arm64 could call both vm_area_add_early()
>> and  vm_area_register_early() when this patchset is merged. so it won't break other architectures.
> Thanks for checking.
>
>>> Also, instead of always picking the end, could we search for a range
>>> that fits?
>> We only need a space in vmalloc range,  using end or a range in the middle
>> is not different.
> I was thinking of making it more future-proof in case one registers a
> vm area towards the end of the range. It's fairly easy to pick a range
> in the middle now that you are adding a list traversal.
ok,  will chose a suitable hole in the vmalloc range.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-08-05 12:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-20  2:51 [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang
2021-07-20  2:51 ` [PATCH v2 1/3] vmalloc: Choose a better start address in vm_area_register_early() Kefeng Wang
2021-08-01 15:23   ` Catalin Marinas
2021-08-02  2:39     ` Kefeng Wang
2021-08-04 11:14       ` Catalin Marinas
2021-08-05 12:46         ` Kefeng Wang
2021-07-20  2:51 ` [PATCH v2 2/3] arm64: Support page mapping percpu first chunk allocator Kefeng Wang
2021-08-01 15:53   ` Catalin Marinas
2021-08-02  2:47     ` Kefeng Wang
2021-07-20  2:51 ` [PATCH v2 3/3] kasan: arm64: Fix pcpu_page_first_chunk crash with KASAN_VMALLOC Kefeng Wang
2021-07-22 11:00   ` Marco Elver
2021-07-22 12:14     ` Kefeng Wang
2021-07-26  1:19 ` [PATCH v2 0/3] arm64: support page mapping percpu first chunk allocator Kefeng Wang

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox