From: Arun KS <arunks.linux@gmail.com>
To: Maciej Bielski <m.bielski@virtualopensystems.com>
Cc: "linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, ar@linux.vnet.ibm.com,
	arunks@qti.qualcomm.com, mark.rutland@arm.com,
	scott.branden@broadcom.com, will.deacon@arm.com,
	qiuxishi@huawei.com, Catalin Marinas <catalin.marinas@arm.com>,
	mhocko@suse.com, realean2@ie.ibm.com
Subject: Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
Date: Fri, 24 Nov 2017 11:25:12 +0530
Message-ID: <CAKZGPAPN7migyvpNJDu1bA+ditb0TJV4WLqZuPdkxOU3kYQ9Ng@mail.gmail.com>
In-Reply-To: <ba9c72239dc5986edc6ca29fc58fefb306e4b52d.1511433386.git.ar@linux.vnet.ibm.com>

On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski
<m.bielski@virtualopensystems.com> wrote:
> Introduces memory hotplug functionality (hot-add) for arm64.
>
> Changes v1->v2:
> - swapper pgtable updated in place on hot add, avoiding unnecessary copy:
>   all changes are additive and non destructive.
>
> - stop_machine used to update swapper on hot add, avoiding races
>
> - checking if pagealloc is under debug to stay coherent with mem_map
>
> Signed-off-by: Maciej Bielski <m.bielski@virtualopensystems.com>
> Signed-off-by: Andrea Reale <ar@linux.vnet.ibm.com>
> ---
>  arch/arm64/Kconfig           | 12 ++++++
>  arch/arm64/configs/defconfig |  1 +
>  arch/arm64/include/asm/mmu.h |  3 ++
>  arch/arm64/mm/init.c         | 87 ++++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/mm/mmu.c          | 39 ++++++++++++++++++++
>  5 files changed, 142 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6..c736bba 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -641,6 +641,14 @@ config HOTPLUG_CPU
>           Say Y here to experiment with turning CPUs off and on.  CPUs
>           can be controlled through /sys/devices/system/cpu.
>
> +config ARCH_HAS_ADD_PAGES
> +       def_bool y
> +       depends on ARCH_ENABLE_MEMORY_HOTPLUG
> +
> +config ARCH_ENABLE_MEMORY_HOTPLUG
> +       def_bool y
> +       depends on !NUMA
> +
>  # Common NUMA Features
>  config NUMA
>         bool "Numa Memory Allocation and Scheduler Support"
> @@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
>
>  source "mm/Kconfig"
>
> +config ARCH_MEMORY_PROBE
> +       def_bool y
> +       depends on MEMORY_HOTPLUG
> +
>  config SECCOMP
>         bool "Enable seccomp to safely compute untrusted bytecode"
>         ---help---
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 34480e9..5fc5656 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
>  CONFIG_SCHED_MC=y
>  CONFIG_NUMA=y
>  CONFIG_PREEMPT=y
> +CONFIG_MEMORY_HOTPLUG=y
>  CONFIG_KSM=y
>  CONFIG_TRANSPARENT_HUGEPAGE=y
>  CONFIG_CMA=y
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 0d34bf0..2b3fa4d 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>                                pgprot_t prot, bool page_mappings_only);
>  extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
>  extern void mark_linear_text_alias_ro(void);
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
> +#endif
>
>  #endif
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 5960bef..e96e7d3 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
>         return 0;
>  }
>  __initcall(register_mem_limit_dumper);
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int add_pages(int nid, unsigned long start_pfn,
> +               unsigned long nr_pages, bool want_memblock)
> +{
> +       int ret;
> +       u64 start_addr = start_pfn << PAGE_SHIFT;
> +       /*
> +        * Mark the first page in the range as unusable. This is needed
> +        * because __add_section() (within __add_pages()) wants pfn_valid()
> +        * of it to be false, and on arm64 pfn_valid() is implemented by
> +        * just checking the nomap flag for existing blocks.
> +        *
> +        * A small trick here is that __add_section() requires only
> +        * phys_start_pfn (that is, the first pfn of a section) to be
> +        * invalid. Regardless of whether the function's author assumed
> +        * that all pfns within a section are either all valid or all
> +        * invalid, this lets us avoid looping twice (once here, a second
> +        * time when memblock_clear_nomap() is called) over all pfns of
> +        * the section: we modify only this one pfn. As a consequence,
> +        * in __add_zone() only this very first pfn is skipped and its
> +        * corresponding page is not flagged reserved, so it is enough
> +        * to correct this setup for that single pfn.
> +        *
> +        * When arch_add_memory() returns, walk_memory_range() is called
> +        * with the online_memory_block() callback, whose execution
> +        * finally reaches memory_block_action(), where again only the
> +        * first pfn of a memory block is checked for being reserved.
> +        * Above it was the first pfn of a section, here it is a block,
> +        * but since
> +        * (drivers/base/memory.c):
> +        *     sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +        * (include/linux/memory.h):
> +        *     #define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)
> +        * block and section can be treated as equivalent here.
> +        */
> +       memblock_mark_nomap(start_addr, 1<<PAGE_SHIFT);
> +       ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
> +
> +       /*
> +        * Make the pages usable after they have been added.
> +        * This will make pfn_valid return true
> +        */
> +       memblock_clear_nomap(start_addr, 1<<PAGE_SHIFT);
> +
> +       /*
> +        * This is a hack to avoid having to mix arch specific code
> +        * into arch independent code. SetPageReserved is supposed
> +        * to be called by __add_zone (within __add_section, within
> +        * __add_pages). However, when it is called there, it assumes that
> +        * pfn_valid returns true.  Given the way pfn_valid is implemented
> +        * on arm64 (a check on the nomap flag), the only way to make it
> +        * evaluate true inside __add_zone would be to clear the nomap
> +        * flags of blocks in architecture-independent code.
> +        *
> +        * To avoid this, we set the Reserved flag here, after clearing
> +        * the nomap flag above.
> +        */
> +       SetPageReserved(pfn_to_page(start_pfn));
> +
> +       return ret;
> +}
> +
> +int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
> +{
> +       int ret;
> +       unsigned long start_pfn = start >> PAGE_SHIFT;
> +       unsigned long nr_pages = size >> PAGE_SHIFT;
> +       unsigned long end_pfn = start_pfn + nr_pages;
> +       unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
> +
> +       if (end_pfn > max_sparsemem_pfn) {
> +               pr_err("end_pfn too big\n");
> +               return -1;
> +       }
> +       hotplug_paging(start, size);
> +
> +       ret = add_pages(nid, start_pfn, nr_pages, want_memblock);
> +
> +       if (ret)
> +               pr_warn("%s: Problem encountered in __add_pages() ret=%d\n",
> +                       __func__, ret);
> +
> +       return ret;
> +}
> +
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index f1eb15e..d93043d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -28,6 +28,7 @@
>  #include <linux/mman.h>
>  #include <linux/nodemask.h>
>  #include <linux/memblock.h>
> +#include <linux/stop_machine.h>
>  #include <linux/fs.h>
>  #include <linux/io.h>
>  #include <linux/mm.h>
> @@ -615,6 +616,44 @@ void __init paging_init(void)
>                       SWAPPER_DIR_SIZE - PAGE_SIZE);
>  }
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +
> +/*
> + * hotplug_paging() is used by memory hotplug to build new page tables
> + * for hot added memory.
> + */
> +
> +struct mem_range {
> +       phys_addr_t base;
> +       phys_addr_t size;
> +};
> +
> +static int __hotplug_paging(void *data)
> +{
> +       int flags = 0;
> +       struct mem_range *section = data;
> +
> +       if (debug_pagealloc_enabled())
> +               flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> +
> +       __create_pgd_mapping(swapper_pg_dir, section->base,
> +                       __phys_to_virt(section->base), section->size,
> +                       PAGE_KERNEL, pgd_pgtable_alloc, flags);

Hello Andrea,

__hotplug_paging() runs in stop_machine() context, and cpu stop
callbacks must not sleep:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479

__create_pgd_mapping() uses pgd_pgtable_alloc(), which does
__get_free_page(PGALLOC_GFP):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342
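
For reference, that allocator looks roughly like this (quoting v4.14
from memory, so the exact code may differ slightly):

static phys_addr_t pgd_pgtable_alloc(void)
{
        void *ptr = (void *)__get_free_page(PGALLOC_GFP);

        if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
                BUG();

        /* Ensure the zeroed page is visible to the page table walker */
        dsb(ishst);
        return __pa(ptr);
}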

PGALLOC_GFP has GFP_KERNEL, which in turn has __GFP_RECLAIM:

#define PGALLOC_GFP     (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
#define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
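
and __GFP_RECLAIM itself carries the direct reclaim bit; simplified
from include/linux/gfp.h (dropping the __force casts):

#define __GFP_RECLAIM   (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)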

Now, prepare_alloc_pages(), called by __alloc_pages_nodemask(), checks for:

might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150

and since we are in atomic context here, that check fires ("BUG:
sleeping function called from invalid context").

I was testing on a 4.4 kernel, but cross-checked with 4.14 as well.
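
Not sure what the right fix is, but just to illustrate one possible
direction (untested sketch only; hotplug_pgtable_alloc is a made-up
name, not something in this patch), the hotplug path could be given an
allocator that cannot sleep:

/*
 * Untested sketch: like pgd_pgtable_alloc(), but with __GFP_RECLAIM
 * masked out, so __alloc_pages_nodemask() will not try to sleep while
 * we run inside the stop_machine() callback. Caveats: the allocation
 * is more likely to fail without reclaim, and pgtable_page_ctor() can
 * itself allocate on some configs, so this is only an illustration.
 */
static phys_addr_t hotplug_pgtable_alloc(void)
{
        void *ptr = (void *)__get_free_page(PGALLOC_GFP & ~__GFP_RECLAIM);

        if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
                BUG();

        /* Ensure the zeroed page is visible to the page table walker */
        dsb(ishst);
        return __pa(ptr);
}

Alternatively, the page table pages could be allocated before stopping
the other cpus, so that nothing allocates in atomic context at all.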

Regards,
Arun


> +
> +       return 0;
> +}
> +
> +inline void hotplug_paging(phys_addr_t start, phys_addr_t size)
> +{
> +       struct mem_range section = {
> +               .base = start,
> +               .size = size,
> +       };
> +
> +       stop_machine(__hotplug_paging, &section, NULL);
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> +
>  /*
>   * Check whether a kernel address is valid (derived from arch/x86/).
>   */
> --
> 2.7.4
>
