* [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
@ 2020-03-17 10:49 David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 1/8] drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE David Hildenbrand
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Vitaly Kuznetsov, Yumei Huang, Igor Mammedov, Baoquan He,
	Eduardo Habkost, Milan Zamazal, Andrew Morton,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Haiyang Zhang,
	K. Y. Srinivasan, Michael Ellerman, Michal Hocko, Michal Hocko,
	Oscar Salvador, Paul Mackerras, Rafael J. Wysocki,
	Stephen Hemminger, Wei Liu, Wei Yang

Distributions nowadays use udev rules ([1] [2]) to specify if and
how to online hotplugged memory. The rules seem to get more complex with
many special cases. Due to the various special cases,
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
is handled via udev rules.

Every time we hotplug memory, the udev rule will come to the same
conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
memory in separate memory blocks and wait for memory to get onlined by user
space before continuing to add more memory blocks (to not add memory faster
than it is getting onlined). This of course slows down the whole memory
hotplug process.

To make the job of distributions easier and to avoid udev rules that get
more and more complicated, let's extend the mechanism provided by
- /sys/devices/system/memory/auto_online_blocks
- "memhp_default_state=" on the kernel cmdline
so that "online_movable" and "online_kernel" can be specified as well.
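
As a rough illustration of how a distribution helper might use the extended
interface from C instead of a shell script, here is a minimal user-space
sketch. The function name set_default_online_type() and the "path" parameter
are assumptions made for this example (the parameter exists so the helper can
be tried against a plain file); on a real system the path would be
/sys/devices/system/memory/auto_online_blocks.

```c
#include <stdio.h>
#include <string.h>

/* The four values this series accepts for the default online_type. */
static const char *const valid_online_types[] = {
	"offline", "online", "online_kernel", "online_movable",
};

/*
 * Write @type to @path. Returns 0 on success and -1 on an unknown type
 * or an I/O error. Hypothetical helper, not part of the series.
 */
int set_default_online_type(const char *path, const char *type)
{
	size_t i, n = sizeof(valid_online_types) / sizeof(valid_online_types[0]);
	FILE *f;
	int ok;

	/* Reject anything that is not one of the four known strings. */
	for (i = 0; i < n; i++)
		if (strcmp(type, valid_online_types[i]) == 0)
			break;
	if (i == n)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(type, f) >= 0;
	/* With buffered I/O, a rejected write may only surface on close. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

On a kernel without this series, writing "online_movable" or "online_kernel"
is expected to fail, so a caller would still want a fallback such as the
backup udev rule mentioned in the example script.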

v1 -> v2:
- Tweaked some patch descriptions
- Added
-- "powernv/memtrace: always online added memory blocks"
-- "hv_balloon: don't check for memhp_auto_online manually"
-- "mm/memory_hotplug: unexport memhp_auto_online"
- "mm/memory_hotplug: convert memhp_auto_online to store an online_type"
-- No longer touches hv/memtrace code


=== Example /usr/libexec/config-memhotplug ===

#!/bin/bash

VIRT=`systemd-detect-virt --vm`
ARCH=`uname -p`

sense_virtio_mem() {
  if [ -d "/sys/bus/virtio/drivers/virtio_mem/" ]; then
    DEVICES=`find /sys/bus/virtio/drivers/virtio_mem/ -maxdepth 1 -type l | wc -l`
    if [ $DEVICES != "0" ]; then
        return 0
    fi
  fi
  return 1
}

if [ ! -e "/sys/devices/system/memory/auto_online_blocks" ]; then
  echo "Memory hotplug configuration support missing in the kernel"
  exit 1
fi

if grep "memhp_default_state=" /proc/cmdline > /dev/null; then
  echo "Memory hotplug configuration overridden in kernel cmdline (memhp_default_state=)"
  exit 1
fi

if [ $VIRT == "microsoft" ]; then
  echo "Detected Hyper-V on $ARCH"
  # Hyper-V wants all memory in ZONE_NORMAL
  ONLINE_TYPE="online_kernel"
elif sense_virtio_mem; then
  echo "Detected virtio-mem on $ARCH"
  # virtio-mem wants all memory in ZONE_NORMAL
  ONLINE_TYPE="online_kernel"
elif [ $ARCH == "s390x" ] || [ $ARCH == "s390" ]; then
  echo "Detected $ARCH"
  # standby memory should not be onlined automatically
  ONLINE_TYPE="offline"
elif [ $ARCH == "ppc64" ] || [ $ARCH == "ppc64le" ]; then
  echo "Detected $ARCH"
  # PPC64 onlines all hotplugged memory right from the kernel
  ONLINE_TYPE="offline"
elif [ $VIRT == "none" ]; then
  echo "Detected bare-metal on $ARCH"
  # Bare metal users expect hotplugged memory to be unpluggable. We assume
  # that ZONE imbalances on such enterprise servers cannot happen and that
  # this is properly documented
  ONLINE_TYPE="online_movable"
else
  # TODO: Hypervisors that want to unplug DIMMs and can guarantee that ZONE
  # imbalances won't happen
  echo "Detected $VIRT on $ARCH"
  # Usually, ballooning is used in virtual environments, so memory should go to
  # ZONE_NORMAL. However, sometimes "movable_node" is relevant.
  ONLINE_TYPE="online"
fi

echo "Selected online_type:" $ONLINE_TYPE

# Configure what to do with memory that will be hotplugged in the future
echo $ONLINE_TYPE 2>/dev/null > /sys/devices/system/memory/auto_online_blocks
if [ $? != "0" ]; then
  echo "Memory hotplug cannot be configured (e.g., old kernel or missing permissions)"
  # A backup udev rule should handle old kernels if necessary
  exit 1
fi

# Process all already plugged blocks (e.g., DIMMs, but also Hyper-V or virtio-mem)
if [ $ONLINE_TYPE != "offline" ]; then
  for MEMORY in /sys/devices/system/memory/memory*; do
    STATE=`cat $MEMORY/state`
    if [ $STATE == "offline" ]; then
        echo $ONLINE_TYPE > $MEMORY/state
    fi
  done
fi


=== Example /usr/lib/systemd/system/config-memhotplug.service ===

[Unit]
Description=Configure memory hotplug behavior
DefaultDependencies=no
Conflicts=shutdown.target
Before=sysinit.target shutdown.target
After=systemd-modules-load.service
ConditionPathExists=|/sys/devices/system/memory/auto_online_blocks

[Service]
ExecStart=/usr/libexec/config-memhotplug
Type=oneshot
TimeoutSec=0
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target


=== Example modification to the 40-redhat.rules [2] ===

diff --git a/40-redhat.rules b/40-redhat.rules-new
index 2c690e5..168fd03 100644
--- a/40-redhat.rules
+++ b/40-redhat.rules-new
@@ -6,6 +6,9 @@ SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}
 # Memory hotadd request
 SUBSYSTEM!="memory", GOTO="memory_hotplug_end"
 ACTION!="add", GOTO="memory_hotplug_end"
+# memory hotplug behavior configured
+PROGRAM=="grep online /sys/devices/system/memory/auto_online_blocks", GOTO="memory_hotplug_end"
+
 PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"

 ENV{.state}="online"

===


[1] https://github.com/lnykryn/systemd-rhel/pull/281
[2] https://github.com/lnykryn/systemd-rhel/blob/staging/rules/40-redhat.rules

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Yumei Huang <yuhuang@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Milan Zamazal <mzamazal@redhat.com>

David Hildenbrand (8):
  drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE
  drivers/base/memory: map MMOP_OFFLINE to 0
  drivers/base/memory: store mapping between MMOP_* and string in an
    array
  powernv/memtrace: always online added memory blocks
  hv_balloon: don't check for memhp_auto_online manually
  mm/memory_hotplug: unexport memhp_auto_online
  mm/memory_hotplug: convert memhp_auto_online to store an online_type
  mm/memory_hotplug: allow to specify a default online_type

 arch/powerpc/platforms/powernv/memtrace.c | 14 ++---
 drivers/base/memory.c                     | 71 ++++++++++++-----------
 drivers/hv/hv_balloon.c                   | 25 ++++----
 include/linux/memory_hotplug.h            | 13 ++++-
 mm/memory_hotplug.c                       | 16 ++---
 5 files changed, 69 insertions(+), 70 deletions(-)

-- 
2.24.1



* [PATCH v2 1/8] drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 2/8] drivers/base/memory: map MMOP_OFFLINE to 0 David Hildenbrand
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Wei Yang, Greg Kroah-Hartman, Andrew Morton, Michal Hocko,
	Oscar Salvador, Rafael J. Wysocki, Baoquan He

The name is misleading and it's not really clear what is "kept". Let's just
name it like the online_type name we expose to user space ("online").

Add some documentation to the types.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c          | 9 +++++----
 include/linux/memory_hotplug.h | 6 +++++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 6448c9ece2cb..8c5ce42c0fc3 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -216,7 +216,7 @@ static int memory_subsys_online(struct device *dev)
 	 * attribute and need to set the online_type.
 	 */
 	if (mem->online_type < 0)
-		mem->online_type = MMOP_ONLINE_KEEP;
+		mem->online_type = MMOP_ONLINE;
 
 	ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 
@@ -251,7 +251,7 @@ static ssize_t state_store(struct device *dev, struct device_attribute *attr,
 	else if (sysfs_streq(buf, "online_movable"))
 		online_type = MMOP_ONLINE_MOVABLE;
 	else if (sysfs_streq(buf, "online"))
-		online_type = MMOP_ONLINE_KEEP;
+		online_type = MMOP_ONLINE;
 	else if (sysfs_streq(buf, "offline"))
 		online_type = MMOP_OFFLINE;
 	else {
@@ -262,7 +262,7 @@ static ssize_t state_store(struct device *dev, struct device_attribute *attr,
 	switch (online_type) {
 	case MMOP_ONLINE_KERNEL:
 	case MMOP_ONLINE_MOVABLE:
-	case MMOP_ONLINE_KEEP:
+	case MMOP_ONLINE:
 		/* mem->online_type is protected by device_hotplug_lock */
 		mem->online_type = online_type;
 		ret = device_online(&mem->dev);
@@ -342,7 +342,8 @@ static ssize_t valid_zones_show(struct device *dev,
 	}
 
 	nid = mem->nid;
-	default_zone = zone_for_pfn_range(MMOP_ONLINE_KEEP, nid, start_pfn, nr_pages);
+	default_zone = zone_for_pfn_range(MMOP_ONLINE, nid, start_pfn,
+					  nr_pages);
 	strcat(buf, default_zone->name);
 
 	print_allowed_zone(buf, nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL,
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f4d59155f3d4..261dbf010d5d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -47,9 +47,13 @@ enum {
 
 /* Types for control the zone type of onlined and offlined memory */
 enum {
+	/* Offline the memory. */
 	MMOP_OFFLINE = -1,
-	MMOP_ONLINE_KEEP,
+	/* Online the memory. Zone depends, see default_zone_for_pfn(). */
+	MMOP_ONLINE,
+	/* Online the memory to ZONE_NORMAL. */
 	MMOP_ONLINE_KERNEL,
+	/* Online the memory to ZONE_MOVABLE. */
 	MMOP_ONLINE_MOVABLE,
 };
 
-- 
2.24.1



* [PATCH v2 2/8] drivers/base/memory: map MMOP_OFFLINE to 0
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 1/8] drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 3/8] drivers/base/memory: store mapping between MMOP_* and string in an array David Hildenbrand
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Wei Yang, Michal Hocko, Greg Kroah-Hartman, Andrew Morton,
	Michal Hocko, Oscar Salvador, Rafael J. Wysocki, Baoquan He

Historically, we used the value -1. Just treat 0 as the special
case now. Clarify a comment (which was wrong: when we come via
device_online() the first time, the online_type would have been 0 /
MEM_ONLINE). The default is now always MMOP_OFFLINE. This removes the
last user of the manual "-1", which didn't use the enum value.

This is a preparation to use the online_type as an array index.
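
To make the new default handling concrete, the following user-space sketch
models what memory_subsys_online() does with the online_type after this
patch. "struct block_model" and model_subsys_online() are hypothetical
stand-ins invented for this example; only the MMOP_* values are taken from
the patch.

```c
/* Values as defined after this patch: MMOP_OFFLINE is now 0. */
enum {
	MMOP_OFFLINE = 0,
	MMOP_ONLINE,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
};

/* Hypothetical stand-in for the relevant part of struct memory_block. */
struct block_model {
	int online_type;
};

/*
 * Model of the online path: if no online_type was configured, default to
 * MMOP_ONLINE. Returns the type that would be used and resets the field
 * to MMOP_OFFLINE afterwards, as the patched function does.
 */
int model_subsys_online(struct block_model *mem)
{
	int used;

	if (mem->online_type == MMOP_OFFLINE)
		mem->online_type = MMOP_ONLINE;

	used = mem->online_type;
	mem->online_type = MMOP_OFFLINE;
	return used;
}
```

In this model a zero-initialized block starts out as MMOP_OFFLINE, and
every enum value is a valid, non-negative array index.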

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c          | 11 ++++-------
 include/linux/memory_hotplug.h |  2 +-
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8c5ce42c0fc3..e7e77cafef80 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -211,17 +211,14 @@ static int memory_subsys_online(struct device *dev)
 		return 0;
 
 	/*
-	 * If we are called from state_store(), online_type will be
-	 * set >= 0 Otherwise we were called from the device online
-	 * attribute and need to set the online_type.
+	 * When called via device_online() without configuring the online_type,
+	 * we want to default to MMOP_ONLINE.
 	 */
-	if (mem->online_type < 0)
+	if (mem->online_type == MMOP_OFFLINE)
 		mem->online_type = MMOP_ONLINE;
 
 	ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
-
-	/* clear online_type */
-	mem->online_type = -1;
+	mem->online_type = MMOP_OFFLINE;
 
 	return ret;
 }
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 261dbf010d5d..c2e06ed5e0e9 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -48,7 +48,7 @@ enum {
 /* Types for control the zone type of onlined and offlined memory */
 enum {
 	/* Offline the memory. */
-	MMOP_OFFLINE = -1,
+	MMOP_OFFLINE = 0,
 	/* Online the memory. Zone depends, see default_zone_for_pfn(). */
 	MMOP_ONLINE,
 	/* Online the memory to ZONE_NORMAL. */
-- 
2.24.1



* [PATCH v2 3/8] drivers/base/memory: store mapping between MMOP_* and string in an array
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 1/8] drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 2/8] drivers/base/memory: map MMOP_OFFLINE to 0 David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 4/8] powernv/memtrace: always online added memory blocks David Hildenbrand
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Wei Yang, Michal Hocko, Greg Kroah-Hartman, Andrew Morton,
	Michal Hocko, Oscar Salvador, Rafael J. Wysocki, Baoquan He

Let's use a simple array which we can reuse soon. While at it, move the
string->mmop conversion out of the device hotplug lock.
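
For readers who want to try the lookup outside the kernel, here is a
user-space sketch of the same array-based mapping. sysfs_streq() is only
available in the kernel, so the sketch uses a small stand-in, streq_nl(),
which is an assumption of this example and only mimics the "ignore one
trailing newline" behavior.

```c
#include <errno.h>
#include <stddef.h>

enum {
	MMOP_OFFLINE = 0,
	MMOP_ONLINE,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
};

static const char *const online_type_to_str[] = {
	[MMOP_OFFLINE] = "offline",
	[MMOP_ONLINE] = "online",
	[MMOP_ONLINE_KERNEL] = "online_kernel",
	[MMOP_ONLINE_MOVABLE] = "online_movable",
};

/* Stand-in for sysfs_streq(): equal if identical up to one trailing '\n'. */
static int streq_nl(const char *a, const char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	if (*a == *b)
		return 1;
	if (!*a && b[0] == '\n' && !b[1])
		return 1;
	if (a[0] == '\n' && !a[1] && !*b)
		return 1;
	return 0;
}

/* Returns the MMOP_* value for @str, or -EINVAL if it is not known. */
int online_type_from_str(const char *str)
{
	size_t i;

	for (i = 0; i < sizeof(online_type_to_str) / sizeof(online_type_to_str[0]); i++) {
		if (streq_nl(str, online_type_to_str[i]))
			return (int)i;
	}
	return -EINVAL;
}
```

Because the array index is the MMOP_* value itself, the reverse direction
(value to string) is a plain array access, which is what makes the array
reusable for other sysfs attributes.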

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index e7e77cafef80..8a7f29c0bf97 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -28,6 +28,24 @@
 
 #define MEMORY_CLASS_NAME	"memory"
 
+static const char *const online_type_to_str[] = {
+	[MMOP_OFFLINE] = "offline",
+	[MMOP_ONLINE] = "online",
+	[MMOP_ONLINE_KERNEL] = "online_kernel",
+	[MMOP_ONLINE_MOVABLE] = "online_movable",
+};
+
+static int memhp_online_type_from_str(const char *str)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(online_type_to_str); i++) {
+		if (sysfs_streq(str, online_type_to_str[i]))
+			return i;
+	}
+	return -EINVAL;
+}
+
 #define to_memory_block(dev) container_of(dev, struct memory_block, dev)
 
 static int sections_per_block;
@@ -236,26 +254,17 @@ static int memory_subsys_offline(struct device *dev)
 static ssize_t state_store(struct device *dev, struct device_attribute *attr,
 			   const char *buf, size_t count)
 {
+	const int online_type = memhp_online_type_from_str(buf);
 	struct memory_block *mem = to_memory_block(dev);
-	int ret, online_type;
+	int ret;
+
+	if (online_type < 0)
+		return -EINVAL;
 
 	ret = lock_device_hotplug_sysfs();
 	if (ret)
 		return ret;
 
-	if (sysfs_streq(buf, "online_kernel"))
-		online_type = MMOP_ONLINE_KERNEL;
-	else if (sysfs_streq(buf, "online_movable"))
-		online_type = MMOP_ONLINE_MOVABLE;
-	else if (sysfs_streq(buf, "online"))
-		online_type = MMOP_ONLINE;
-	else if (sysfs_streq(buf, "offline"))
-		online_type = MMOP_OFFLINE;
-	else {
-		ret = -EINVAL;
-		goto err;
-	}
-
 	switch (online_type) {
 	case MMOP_ONLINE_KERNEL:
 	case MMOP_ONLINE_MOVABLE:
@@ -271,7 +280,6 @@ static ssize_t state_store(struct device *dev, struct device_attribute *attr,
 		ret = -EINVAL; /* should never happen */
 	}
 
-err:
 	unlock_device_hotplug();
 
 	if (ret < 0)
-- 
2.24.1



* [PATCH v2 4/8] powernv/memtrace: always online added memory blocks
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-03-17 10:49 ` [PATCH v2 3/8] drivers/base/memory: store mapping between MMOP_* and string in an array David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 10:58   ` Michal Hocko
                     ` (2 more replies)
  2020-03-17 10:49 ` [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually David Hildenbrand
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Andrew Morton, Greg Kroah-Hartman, Michal Hocko, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He, Wei Yang

Let's always try to online the re-added memory blocks. In case add_memory()
already onlined the added memory blocks, the first device_online() call
will fail and stop processing the remaining memory blocks.

This avoids manually having to check memhp_auto_online.

Note: PPC always onlines all hotplugged memory directly from the kernel
as well - something that is handled by user space on other
architectures.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index d6d64f8718e6..13b369d2cc45 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -231,16 +231,10 @@ static int memtrace_online(void)
 			continue;
 		}
 
-		/*
-		 * If kernel isn't compiled with the auto online option
-		 * we need to online the memory ourselves.
-		 */
-		if (!memhp_auto_online) {
-			lock_device_hotplug();
-			walk_memory_blocks(ent->start, ent->size, NULL,
-					   online_mem_block);
-			unlock_device_hotplug();
-		}
+		lock_device_hotplug();
+		walk_memory_blocks(ent->start, ent->size, NULL,
+				   online_mem_block);
+		unlock_device_hotplug();
 
 		/*
 		 * Memory was added successfully so clean up references to it
-- 
2.24.1



* [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-03-17 10:49 ` [PATCH v2 4/8] powernv/memtrace: always online added memory blocks David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 16:29   ` Vitaly Kuznetsov
  2020-03-17 18:46   ` David Hildenbrand
  2020-03-17 10:49 ` [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online David Hildenbrand
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Andrew Morton, Michal Hocko, Oscar Salvador, Rafael J. Wysocki,
	Baoquan He, Wei Yang, Vitaly Kuznetsov

We get the MEM_ONLINE notifier call if memory is added right from the
kernel via add_memory() or later from user space.

Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
mechanism (->done) for that. Initialize the wait event only once and
reinitialize before adding memory. Unconditionally call complete() and
wait_for_completion_timeout().

If there are no waiters, complete() will only increment ->done - which
will be reset by reinit_completion(). If complete() has already been
called, wait_for_completion_timeout() will not wait.

There is still the chance for a small race between concurrent
reinit_completion() and complete(). If complete() wins, we would not
wait - which is tolerable (and the race exists in current code as well).

Note: We only wait for "some" memory to get onlined, which seems to be
      good enough for now.
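
The ->done reasoning above can be followed with a toy, single-threaded
model. This is not the kernel's struct completion: the type and the
model_*() functions are invented for this sketch, there is no locking or
sleeping, and "waiting" is reduced to asking whether a waiter would return
immediately.

```c
/* Toy model of the ->done counter only. */
struct completion_model {
	unsigned int done;
};

void model_init(struct completion_model *c)
{
	c->done = 0;
}

/* Like reinit_completion(): forget any completions signalled so far. */
void model_reinit(struct completion_model *c)
{
	c->done = 0;
}

/* Like complete(): with no waiter present, this only increments ->done. */
void model_complete(struct completion_model *c)
{
	c->done++;
}

/*
 * Returns 1 if a waiter would return immediately (consuming one
 * completion) and 0 if it would have to sleep until the timeout.
 */
int model_wait_would_return(struct completion_model *c)
{
	if (c->done == 0)
		return 0;
	c->done--;
	return 1;
}
```

A stale complete() from an earlier online event is wiped by
model_reinit(), while a complete() that arrives between the reinit and the
wait lets the waiter continue without sleeping.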

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: linux-hyperv@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/hv/hv_balloon.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index a02ce43d778d..af5e09f08130 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -533,7 +533,6 @@ struct hv_dynmem_device {
 	 * State to synchronize hot-add.
 	 */
 	struct completion  ol_waitevent;
-	bool ha_waiting;
 	/*
 	 * This thread handles hot-add
 	 * requests from the host as well as notifying
@@ -634,10 +633,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 	switch (val) {
 	case MEM_ONLINE:
 	case MEM_CANCEL_ONLINE:
-		if (dm_device.ha_waiting) {
-			dm_device.ha_waiting = false;
-			complete(&dm_device.ol_waitevent);
-		}
+		complete(&dm_device.ol_waitevent);
 		break;
 
 	case MEM_OFFLINE:
@@ -726,8 +722,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		has->covered_end_pfn +=  processed_pfn;
 		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
-		init_completion(&dm_device.ol_waitevent);
-		dm_device.ha_waiting = !memhp_auto_online;
+		reinit_completion(&dm_device.ol_waitevent);
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
@@ -753,15 +748,14 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		/*
-		 * Wait for the memory block to be onlined when memory onlining
-		 * is done outside of kernel (memhp_auto_online). Since the hot
-		 * add has succeeded, it is ok to proceed even if the pages in
-		 * the hot added region have not been "onlined" within the
-		 * allowed time.
+		 * Wait for memory to get onlined. If the kernel onlined the
+		 * memory when adding it, this will return directly. Otherwise,
+		 * it will wait for user space to online the memory. This helps
+		 * to avoid adding memory faster than it is getting onlined. As
+		 * adding succeeded, it is ok to proceed even if the memory was
+		 * not onlined in time.
 		 */
-		if (dm_device.ha_waiting)
-			wait_for_completion_timeout(&dm_device.ol_waitevent,
-						    5*HZ);
+		wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);
 		post_status(&dm_device);
 	}
 }
@@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
 #ifdef CONFIG_MEMORY_HOTPLUG
 	set_online_page_callback(&hv_online_page);
 	register_memory_notifier(&hv_memory_nb);
+	init_completion(&dm_device.ol_waitevent);
 #endif
 
 	hv_set_drvdata(dev, &dm_device);
-- 
2.24.1



* [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-03-17 10:49 ` [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 10:59   ` Michal Hocko
  2020-03-17 22:24   ` Wei Yang
  2020-03-17 10:49 ` [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Andrew Morton, Michal Hocko, Oscar Salvador, Rafael J. Wysocki,
	Baoquan He, Wei Yang

All in-tree users except the mm-core are gone. Let's drop the export.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1a00b5a37ef6..2d2aae830b92 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -71,7 +71,6 @@ bool memhp_auto_online;
 #else
 bool memhp_auto_online = true;
 #endif
-EXPORT_SYMBOL_GPL(memhp_auto_online);
 
 static int __init setup_memhp_default_state(char *str)
 {
-- 
2.24.1



* [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-03-17 10:49 ` [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 11:00   ` Michal Hocko
  2020-03-17 10:49 ` [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
  2020-03-18 13:05 ` [PATCH v2 0/8] " Baoquan He
  8 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Wei Yang, Greg Kroah-Hartman, Andrew Morton, Michal Hocko,
	Oscar Salvador, Rafael J. Wysocki, Baoquan He

... and rename it to memhp_default_online_type. This is a preparation
for more detailed default online behavior.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c          | 10 ++++------
 include/linux/memory_hotplug.h |  3 ++-
 mm/memory_hotplug.c            | 11 ++++++-----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8a7f29c0bf97..8d3e16dab69f 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -386,10 +386,8 @@ static DEVICE_ATTR_RO(block_size_bytes);
 static ssize_t auto_online_blocks_show(struct device *dev,
 				       struct device_attribute *attr, char *buf)
 {
-	if (memhp_auto_online)
-		return sprintf(buf, "online\n");
-	else
-		return sprintf(buf, "offline\n");
+	return sprintf(buf, "%s\n",
+		       online_type_to_str[memhp_default_online_type]);
 }
 
 static ssize_t auto_online_blocks_store(struct device *dev,
@@ -397,9 +395,9 @@ static ssize_t auto_online_blocks_store(struct device *dev,
 					const char *buf, size_t count)
 {
 	if (sysfs_streq(buf, "online"))
-		memhp_auto_online = true;
+		memhp_default_online_type = MMOP_ONLINE;
 	else if (sysfs_streq(buf, "offline"))
-		memhp_auto_online = false;
+		memhp_default_online_type = MMOP_OFFLINE;
 	else
 		return -EINVAL;
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c2e06ed5e0e9..c6e090b34c4b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -117,7 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
 			struct mhp_restrictions *restrictions);
 extern u64 max_mem_size;
 
-extern bool memhp_auto_online;
+/* Default online_type (MMOP_*) when new memory blocks are added. */
+extern int memhp_default_online_type;
 /* If movable_node boot option specified */
 extern bool movable_node_enabled;
 static inline bool movable_node_is_enabled(void)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2d2aae830b92..1975a2b99a2b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -67,17 +67,17 @@ void put_online_mems(void)
 bool movable_node_enabled = false;
 
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
-bool memhp_auto_online;
+int memhp_default_online_type = MMOP_OFFLINE;
 #else
-bool memhp_auto_online = true;
+int memhp_default_online_type = MMOP_ONLINE;
 #endif
 
 static int __init setup_memhp_default_state(char *str)
 {
 	if (!strcmp(str, "online"))
-		memhp_auto_online = true;
+		memhp_default_online_type = MMOP_ONLINE;
 	else if (!strcmp(str, "offline"))
-		memhp_auto_online = false;
+		memhp_default_online_type = MMOP_OFFLINE;
 
 	return 1;
 }
@@ -990,6 +990,7 @@ static int check_hotplug_memory_range(u64 start, u64 size)
 
 static int online_memory_block(struct memory_block *mem, void *arg)
 {
+	mem->online_type = memhp_default_online_type;
 	return device_online(&mem->dev);
 }
 
@@ -1062,7 +1063,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	mem_hotplug_done();
 
 	/* online pages if requested */
-	if (memhp_auto_online)
+	if (memhp_default_online_type != MMOP_OFFLINE)
 		walk_memory_blocks(start, size, NULL, online_memory_block);
 
 	return ret;
-- 
2.24.1



* [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-03-17 10:49 ` [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type David Hildenbrand
@ 2020-03-17 10:49 ` David Hildenbrand
  2020-03-17 11:01   ` Michal Hocko
  2020-03-17 11:08   ` David Hildenbrand
  2020-03-18 13:05 ` [PATCH v2 0/8] " Baoquan He
  8 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 10:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Wei Yang, Greg Kroah-Hartman, Andrew Morton, Michal Hocko,
	Oscar Salvador, Rafael J. Wysocki, Baoquan He

For now, distributions implement advanced udev rules to essentially
- Don't online any hotplugged memory (s390x)
- Online all memory to ZONE_NORMAL (e.g., most virt environments like
  hyperv)
- Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
  care of (e.g., bare metal, special virt environments)

In summary: All memory is usually onlined the same way, however, the
kernel always has to ask user space to come up with the same answer.
E.g., Hyper-V always waits for a memory block to get onlined before
continuing, otherwise it might end up adding memory faster than
hotplugging it, which can result in strange OOM situations. This waiting
slows down adding of a bigger amount of memory.

Let's allow to specify a default online_type, not just "online" and
"offline". This allows distributions to configure the default online_type
when booting up and be done with it.

We can now specify "offline", "online", "online_movable" and
"online_kernel" via
- "memhp_default_state=" on the kernel cmdline
- /sys/devices/system/memory/auto_online_blocks
just like we are able to specify for a single memory block via
/sys/devices/system/memory/memoryX/state

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c          | 11 +++++------
 include/linux/memory_hotplug.h |  2 ++
 mm/memory_hotplug.c            |  8 ++++----
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8d3e16dab69f..2b09b68b9f78 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -35,7 +35,7 @@ static const char *const online_type_to_str[] = {
 	[MMOP_ONLINE_MOVABLE] = "online_movable",
 };
 
-static int memhp_online_type_from_str(const char *str)
+int memhp_online_type_from_str(const char *str)
 {
 	int i;
 
@@ -394,13 +394,12 @@ static ssize_t auto_online_blocks_store(struct device *dev,
 					struct device_attribute *attr,
 					const char *buf, size_t count)
 {
-	if (sysfs_streq(buf, "online"))
-		memhp_default_online_type = MMOP_ONLINE;
-	else if (sysfs_streq(buf, "offline"))
-		memhp_default_online_type = MMOP_OFFLINE;
-	else
+	const int online_type = memhp_online_type_from_str(buf);
+
+	if (online_type < 0)
 		return -EINVAL;
 
+	memhp_default_online_type = online_type;
 	return count;
 }
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c6e090b34c4b..ef55115320fb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -117,6 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
 			struct mhp_restrictions *restrictions);
 extern u64 max_mem_size;
 
+extern int memhp_online_type_from_str(const char *str);
+
 /* Default online_type (MMOP_*) when new memory blocks are added. */
 extern int memhp_default_online_type;
 /* If movable_node boot option specified */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1975a2b99a2b..9916977b6ee1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -74,10 +74,10 @@ int memhp_default_online_type = MMOP_ONLINE;
 
 static int __init setup_memhp_default_state(char *str)
 {
-	if (!strcmp(str, "online"))
-		memhp_default_online_type = MMOP_ONLINE;
-	else if (!strcmp(str, "offline"))
-		memhp_default_online_type = MMOP_OFFLINE;
+	const int online_type = memhp_online_type_from_str(str);
+
+	if (online_type >= 0)
+		memhp_default_online_type = online_type;
 
 	return 1;
 }
-- 
2.24.1
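
Editorial sketch of the parsing that both entry points now share, as a
user-space model rather than the kernel implementation. The table
contents other than "online_movable", the numeric ordering of the
MMOP_* constants, and the helpers streq_nl(), online_type_from_str(),
set_default_from_cmdline() and set_default_from_sysfs() are assumptions
made for illustration. What the patch does confirm is the difference in
error handling: the boot parameter silently keeps the old default on an
unknown string, while the sysfs store returns -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed ordering; the patch only shows the names. */
enum { MMOP_OFFLINE = 0, MMOP_ONLINE, MMOP_ONLINE_KERNEL, MMOP_ONLINE_MOVABLE };

/* Assumed contents; the hunk context only shows the last entry. */
static const char *const type_names[] = {
	[MMOP_OFFLINE] = "offline",
	[MMOP_ONLINE] = "online",
	[MMOP_ONLINE_KERNEL] = "online_kernel",
	[MMOP_ONLINE_MOVABLE] = "online_movable",
};

static int default_online_type = MMOP_OFFLINE;

/* Equal strings, ignoring a single trailing newline on either side. */
static int streq_nl(const char *a, const char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	if (*a == *b)
		return 1;
	if (!*a && *b == '\n' && !b[1])
		return 1;
	if (*a == '\n' && !a[1] && !*b)
		return 1;
	return 0;
}

/* Returns the matching MMOP_* value, or -1 for an unknown string. */
static int online_type_from_str(const char *str)
{
	size_t i;

	assert(str != NULL);
	for (i = 0; i < sizeof(type_names) / sizeof(type_names[0]); i++)
		if (streq_nl(str, type_names[i]))
			return (int)i;
	return -1;
}

/* Boot parameter style: an unknown value leaves the default unchanged. */
static void set_default_from_cmdline(const char *str)
{
	const int online_type = online_type_from_str(str);

	if (online_type >= 0)
		default_online_type = online_type;
}

/* sysfs store style: an unknown value is rejected with -EINVAL. */
static int set_default_from_sysfs(const char *buf)
{
	const int online_type = online_type_from_str(buf);

	if (online_type < 0)
		return -EINVAL;
	default_online_type = online_type;
	return 0;
}
```

The newline handling matters for the sysfs path because a typical
"echo online_movable > auto_online_blocks" delivers the string with a
trailing newline.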


* Re: [PATCH v2 4/8] powernv/memtrace: always online added memory blocks
  2020-03-17 10:49 ` [PATCH v2 4/8] powernv/memtrace: always online added memory blocks David Hildenbrand
@ 2020-03-17 10:58   ` Michal Hocko
  2020-03-17 22:04   ` Wei Yang
  2020-03-19  9:49   ` Michael Ellerman
  2 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2020-03-17 10:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Andrew Morton, Greg Kroah-Hartman, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He, Wei Yang

On Tue 17-03-20 11:49:38, David Hildenbrand wrote:
> Let's always try to online the re-added memory blocks. In case add_memory()
> already onlined the added memory blocks, the first device_online() call
> will fail and stop processing the remaining memory blocks.
> 
> This avoids manually having to check memhp_auto_online.
> 
> Note: PPC always onlines all hotplugged memory directly from the kernel
> as well - something that is handled by user space on other
> architectures.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  arch/powerpc/platforms/powernv/memtrace.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index d6d64f8718e6..13b369d2cc45 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -231,16 +231,10 @@ static int memtrace_online(void)
>  			continue;
>  		}
>  
> -		/*
> -		 * If kernel isn't compiled with the auto online option
> -		 * we need to online the memory ourselves.
> -		 */
> -		if (!memhp_auto_online) {
> -			lock_device_hotplug();
> -			walk_memory_blocks(ent->start, ent->size, NULL,
> -					   online_mem_block);
> -			unlock_device_hotplug();
> -		}
> +		lock_device_hotplug();
> +		walk_memory_blocks(ent->start, ent->size, NULL,
> +				   online_mem_block);
> +		unlock_device_hotplug();
>  
>  		/*
>  		 * Memory was added successfully so clean up references to it
> -- 
> 2.24.1

-- 
Michal Hocko
SUSE Labs
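
Editorial sketch of why the unconditional walk in the patch above is
harmless when the blocks are already online. This is a user-space
model, not kernel code. It assumes that the walker stops at the first
non-zero callback return and that onlining an already-online block
reports a non-zero value, which is how the commit message describes the
behavior. struct block_model, online_block() and walk_blocks() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct memory_block. */
struct block_model {
	int online;
};

typedef int (*walk_fn)(struct block_model *blk, void *arg);

/*
 * Models the callback: returns 0 after onlining a block and 1 if the
 * block was already online. If arg is non-NULL it points to a call
 * counter.
 */
static int online_block(struct block_model *blk, void *arg)
{
	size_t *calls = arg;

	assert(blk != NULL);
	if (calls)
		(*calls)++;
	if (blk->online)
		return 1;
	blk->online = 1;
	return 0;
}

/* Models the walker: stops at the first non-zero return and passes it on. */
static int walk_blocks(struct block_model *blks, size_t n, void *arg,
		       walk_fn fn)
{
	size_t i;
	int ret = 0;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		ret = fn(&blks[i], arg);
		if (ret)
			break;
	}
	return ret;
}
```

If the blocks start offline, the walk visits and onlines all of them.
If they were already onlined while being added, the walk stops after
the first callback and nothing changes.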

* Re: [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online
  2020-03-17 10:49 ` [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online David Hildenbrand
@ 2020-03-17 10:59   ` Michal Hocko
  2020-03-17 22:24   ` Wei Yang
  1 sibling, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2020-03-17 10:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv,
	Andrew Morton, Oscar Salvador, Rafael J. Wysocki, Baoquan He,
	Wei Yang

On Tue 17-03-20 11:49:40, David Hildenbrand wrote:
> All in-tree users except the mm-core are gone. Let's drop the export.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memory_hotplug.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1a00b5a37ef6..2d2aae830b92 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -71,7 +71,6 @@ bool memhp_auto_online;
>  #else
>  bool memhp_auto_online = true;
>  #endif
> -EXPORT_SYMBOL_GPL(memhp_auto_online);
>  
>  static int __init setup_memhp_default_state(char *str)
>  {
> -- 
> 2.24.1

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type
  2020-03-17 10:49 ` [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type David Hildenbrand
@ 2020-03-17 11:00   ` Michal Hocko
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2020-03-17 11:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv, Wei Yang,
	Greg Kroah-Hartman, Andrew Morton, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He

On Tue 17-03-20 11:49:41, David Hildenbrand wrote:
> ... and rename it to memhp_default_online_type. This is a preparation
> for more detailed default online behavior.
> 
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  drivers/base/memory.c          | 10 ++++------
>  include/linux/memory_hotplug.h |  3 ++-
>  mm/memory_hotplug.c            | 11 ++++++-----
>  3 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8a7f29c0bf97..8d3e16dab69f 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -386,10 +386,8 @@ static DEVICE_ATTR_RO(block_size_bytes);
>  static ssize_t auto_online_blocks_show(struct device *dev,
>  				       struct device_attribute *attr, char *buf)
>  {
> -	if (memhp_auto_online)
> -		return sprintf(buf, "online\n");
> -	else
> -		return sprintf(buf, "offline\n");
> +	return sprintf(buf, "%s\n",
> +		       online_type_to_str[memhp_default_online_type]);
>  }
>  
>  static ssize_t auto_online_blocks_store(struct device *dev,
> @@ -397,9 +395,9 @@ static ssize_t auto_online_blocks_store(struct device *dev,
>  					const char *buf, size_t count)
>  {
>  	if (sysfs_streq(buf, "online"))
> -		memhp_auto_online = true;
> +		memhp_default_online_type = MMOP_ONLINE;
>  	else if (sysfs_streq(buf, "offline"))
> -		memhp_auto_online = false;
> +		memhp_default_online_type = MMOP_OFFLINE;
>  	else
>  		return -EINVAL;
>  
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index c2e06ed5e0e9..c6e090b34c4b 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -117,7 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
>  			struct mhp_restrictions *restrictions);
>  extern u64 max_mem_size;
>  
> -extern bool memhp_auto_online;
> +/* Default online_type (MMOP_*) when new memory blocks are added. */
> +extern int memhp_default_online_type;
>  /* If movable_node boot option specified */
>  extern bool movable_node_enabled;
>  static inline bool movable_node_is_enabled(void)
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 2d2aae830b92..1975a2b99a2b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -67,17 +67,17 @@ void put_online_mems(void)
>  bool movable_node_enabled = false;
>  
>  #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
> -bool memhp_auto_online;
> +int memhp_default_online_type = MMOP_OFFLINE;
>  #else
> -bool memhp_auto_online = true;
> +int memhp_default_online_type = MMOP_ONLINE;
>  #endif
>  
>  static int __init setup_memhp_default_state(char *str)
>  {
>  	if (!strcmp(str, "online"))
> -		memhp_auto_online = true;
> +		memhp_default_online_type = MMOP_ONLINE;
>  	else if (!strcmp(str, "offline"))
> -		memhp_auto_online = false;
> +		memhp_default_online_type = MMOP_OFFLINE;
>  
>  	return 1;
>  }
> @@ -990,6 +990,7 @@ static int check_hotplug_memory_range(u64 start, u64 size)
>  
>  static int online_memory_block(struct memory_block *mem, void *arg)
>  {
> +	mem->online_type = memhp_default_online_type;
>  	return device_online(&mem->dev);
>  }
>  
> @@ -1062,7 +1063,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  	mem_hotplug_done();
>  
>  	/* online pages if requested */
> -	if (memhp_auto_online)
> +	if (memhp_default_online_type != MMOP_OFFLINE)
>  		walk_memory_blocks(start, size, NULL, online_memory_block);
>  
>  	return ret;
> -- 
> 2.24.1

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-17 10:49 ` [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
@ 2020-03-17 11:01   ` Michal Hocko
  2020-03-17 11:05     ` David Hildenbrand
  2020-03-17 11:08   ` David Hildenbrand
  1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2020-03-17 11:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv, Wei Yang,
	Greg Kroah-Hartman, Andrew Morton, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He

On Tue 17-03-20 11:49:42, David Hildenbrand wrote:
> For now, distributions implement advanced udev rules to essentially
> - Don't online any hotplugged memory (s390x)
> - Online all memory to ZONE_NORMAL (e.g., most virt environments like
>   hyperv)
> - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
>   care of (e.g., bare metal, special virt environments)
> 
> In summary: All memory is usually onlined the same way, however, the
> kernel always has to ask user space to come up with the same answer.
> E.g., Hyper-V always waits for a memory block to get onlined before
> continuing, otherwise it might end up adding memory faster than
> hotplugging it, which can result in strange OOM situations. This waiting
> slows down adding of a bigger amount of memory.
> 
> Let's allow to specify a default online_type, not just "online" and
> "offline". This allows distributions to configure the default online_type
> when booting up and be done with it.
> 
> We can now specify "offline", "online", "online_movable" and
> "online_kernel" via
> - "memhp_default_state=" on the kernel cmdline
> - /sys/devices/system/memory/auto_online_blocks
> just like we are able to specify for a single memory block via
> /sys/devices/system/memory/memoryX/state
> 
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

As I've said earlier and several times already, I really dislike this
interface. But it is a fact that this patch doesn't make it any worse.
Quite the contrary, so feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  drivers/base/memory.c          | 11 +++++------
>  include/linux/memory_hotplug.h |  2 ++
>  mm/memory_hotplug.c            |  8 ++++----
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8d3e16dab69f..2b09b68b9f78 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -35,7 +35,7 @@ static const char *const online_type_to_str[] = {
>  	[MMOP_ONLINE_MOVABLE] = "online_movable",
>  };
>  
> -static int memhp_online_type_from_str(const char *str)
> +int memhp_online_type_from_str(const char *str)
>  {
>  	int i;
>  
> @@ -394,13 +394,12 @@ static ssize_t auto_online_blocks_store(struct device *dev,
>  					struct device_attribute *attr,
>  					const char *buf, size_t count)
>  {
> -	if (sysfs_streq(buf, "online"))
> -		memhp_default_online_type = MMOP_ONLINE;
> -	else if (sysfs_streq(buf, "offline"))
> -		memhp_default_online_type = MMOP_OFFLINE;
> -	else
> +	const int online_type = memhp_online_type_from_str(buf);
> +
> +	if (online_type < 0)
>  		return -EINVAL;
>  
> +	memhp_default_online_type = online_type;
>  	return count;
>  }
>  
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index c6e090b34c4b..ef55115320fb 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -117,6 +117,8 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
>  			struct mhp_restrictions *restrictions);
>  extern u64 max_mem_size;
>  
> +extern int memhp_online_type_from_str(const char *str);
> +
>  /* Default online_type (MMOP_*) when new memory blocks are added. */
>  extern int memhp_default_online_type;
>  /* If movable_node boot option specified */
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1975a2b99a2b..9916977b6ee1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -74,10 +74,10 @@ int memhp_default_online_type = MMOP_ONLINE;
>  
>  static int __init setup_memhp_default_state(char *str)
>  {
> -	if (!strcmp(str, "online"))
> -		memhp_default_online_type = MMOP_ONLINE;
> -	else if (!strcmp(str, "offline"))
> -		memhp_default_online_type = MMOP_OFFLINE;
> +	const int online_type = memhp_online_type_from_str(str);
> +
> +	if (online_type >= 0)
> +		memhp_default_online_type = online_type;
>  
>  	return 1;
>  }
> -- 
> 2.24.1

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-17 11:01   ` Michal Hocko
@ 2020-03-17 11:05     ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 11:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv, Wei Yang,
	Greg Kroah-Hartman, Andrew Morton, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He

On 17.03.20 12:01, Michal Hocko wrote:
> On Tue 17-03-20 11:49:42, David Hildenbrand wrote:
>> For now, distributions implement advanced udev rules to essentially
>> - Don't online any hotplugged memory (s390x)
>> - Online all memory to ZONE_NORMAL (e.g., most virt environments like
>>   hyperv)
>> - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
>>   care of (e.g., bare metal, special virt environments)
>>
>> In summary: All memory is usually onlined the same way, however, the
>> kernel always has to ask user space to come up with the same answer.
>> E.g., Hyper-V always waits for a memory block to get onlined before
>> continuing, otherwise it might end up adding memory faster than
>> hotplugging it, which can result in strange OOM situations. This waiting
>> slows down adding of a bigger amount of memory.
>>
>> Let's allow to specify a default online_type, not just "online" and
>> "offline". This allows distributions to configure the default online_type
>> when booting up and be done with it.
>>
>> We can now specify "offline", "online", "online_movable" and
>> "online_kernel" via
>> - "memhp_default_state=" on the kernel cmdline
>> - /sys/devices/system/memory/auto_online_blocks
>> just like we are able to specify for a single memory block via
>> /sys/devices/system/memory/memoryX/state
>>
>> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Baoquan He <bhe@redhat.com>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> As I've said earlier and several times already, I really dislike this
> interface. But it is a fact that this patch doesn't make it any worse.
> Quite the contrary, so feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks Michal!

-- 
Thanks,

David / dhildenb


* Re: [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-17 10:49 ` [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
  2020-03-17 11:01   ` Michal Hocko
@ 2020-03-17 11:08   ` David Hildenbrand
  1 sibling, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 11:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, Wei Yang,
	Greg Kroah-Hartman, Andrew Morton, Michal Hocko, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He

On 17.03.20 11:49, David Hildenbrand wrote:
> For now, distributions implement advanced udev rules to essentially
> - Don't online any hotplugged memory (s390x)
> - Online all memory to ZONE_NORMAL (e.g., most virt environments like
>   hyperv)
> - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
>   care of (e.g., bare metal, special virt environments)
> 
> In summary: All memory is usually onlined the same way, however, the
> kernel always has to ask user space to come up with the same answer.
> E.g., Hyper-V always waits for a memory block to get onlined before
> continuing, otherwise it might end up adding memory faster than
> hotplugging it, which can result in strange OOM situations. This waiting

s/hotplugging/onlining/

-- 
Thanks,

David / dhildenb


* Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually
  2020-03-17 10:49 ` [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually David Hildenbrand
@ 2020-03-17 16:29   ` Vitaly Kuznetsov
  2020-03-17 16:33     ` David Hildenbrand
  2020-03-17 18:46   ` David Hildenbrand
  1 sibling, 1 reply; 29+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-17 16:29 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Andrew Morton, Michal Hocko, Oscar Salvador, Rafael J. Wysocki,
	Baoquan He, Wei Yang

David Hildenbrand <david@redhat.com> writes:

> We get the MEM_ONLINE notifier call if memory is added right from the
> kernel via add_memory() or later from user space.
>
> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
> mechanism (->done) for that. Initialize the wait event only once and
> reinitialize before adding memory. Unconditionally call complete() and
> wait_for_completion_timeout().
>
> If there are no waiters, complete() will only increment ->done - which
> will be reset by reinit_completion(). If complete() has already been
> called, wait_for_completion_timeout() will not wait.
>
> There is still the chance for a small race between concurrent
> reinit_completion() and complete(). If complete() wins, we would not
> wait - which is tolerable (and the race exists in current code as
> well).

How can we see concurrent reinit_completion() and complete()? Obviously,
we are not onlining new memory in kernel and hv_mem_hot_add() calls are
serialized, we're waiting up to 5*HZ for the added block to come online
before proceeding to the next one. Or do you mean we actually hit this
5*HZ timeout, proceeded to the next block and immediately after
reinit_completion() we saw complete() for the previously added block?
This is tolerable indeed, we're making forward progress (and this all is
'best effort' anyway).

>
> Note: We only wait for "some" memory to get onlined, which seems to be
>       good enough for now.
>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: linux-hyperv@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/hv/hv_balloon.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index a02ce43d778d..af5e09f08130 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -533,7 +533,6 @@ struct hv_dynmem_device {
>  	 * State to synchronize hot-add.
>  	 */
>  	struct completion  ol_waitevent;
> -	bool ha_waiting;
>  	/*
>  	 * This thread handles hot-add
>  	 * requests from the host as well as notifying
> @@ -634,10 +633,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
>  	switch (val) {
>  	case MEM_ONLINE:
>  	case MEM_CANCEL_ONLINE:
> -		if (dm_device.ha_waiting) {
> -			dm_device.ha_waiting = false;
> -			complete(&dm_device.ol_waitevent);
> -		}
> +		complete(&dm_device.ol_waitevent);
>  		break;
>  
>  	case MEM_OFFLINE:
> @@ -726,8 +722,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  		has->covered_end_pfn +=  processed_pfn;
>  		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  
> -		init_completion(&dm_device.ol_waitevent);
> -		dm_device.ha_waiting = !memhp_auto_online;
> +		reinit_completion(&dm_device.ol_waitevent);
>  
>  		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>  		ret = add_memory(nid, PFN_PHYS((start_pfn)),
> @@ -753,15 +748,14 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  		}
>  
>  		/*
> -		 * Wait for the memory block to be onlined when memory onlining
> -		 * is done outside of kernel (memhp_auto_online). Since the hot
> -		 * add has succeeded, it is ok to proceed even if the pages in
> -		 * the hot added region have not been "onlined" within the
> -		 * allowed time.
> +		 * Wait for memory to get onlined. If the kernel onlined the
> +		 * memory when adding it, this will return directly. Otherwise,
> +		 * it will wait for user space to online the memory. This helps
> +		 * to avoid adding memory faster than it is getting onlined. As
> +		 * adding succeeded, it is ok to proceed even if the memory was
> +		 * not onlined in time.
>  		 */
> -		if (dm_device.ha_waiting)
> -			wait_for_completion_timeout(&dm_device.ol_waitevent,
> -						    5*HZ);
> +		wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);
>  		post_status(&dm_device);
>  	}
>  }
> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	set_online_page_callback(&hv_online_page);
>  	register_memory_notifier(&hv_memory_nb);
> +	init_completion(&dm_device.ol_waitevent);
>  #endif
>  
>  	hv_set_drvdata(dev, &dm_device);

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly
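
Editorial sketch of the ->done bookkeeping that the commit message
relies on, as a single-threaded toy model rather than the kernel's
completion implementation. It ignores locking, sleeping and the
timeout. struct completion_model and the model_*() helpers are
hypothetical names. The model only captures three properties stated
above: complete() increments the counter even without waiters,
reinit_completion() resets it, and a wait does not block when
complete() has already been called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the ->done counter inside a completion. */
struct completion_model {
	unsigned int done;
};

/* Models both init_completion() and reinit_completion(): reset the counter. */
static void model_reinit(struct completion_model *c)
{
	assert(c != NULL);
	c->done = 0;
}

/* Models complete(): with no waiter present this only bumps the counter. */
static void model_complete(struct completion_model *c)
{
	assert(c != NULL);
	c->done++;
}

/*
 * Models the decision taken by a wait: returns true if the caller
 * would have to block, false if a pending completion was consumed and
 * the wait returns right away.
 */
static bool model_wait_would_block(struct completion_model *c)
{
	assert(c != NULL);
	if (c->done == 0)
		return true;
	c->done--;
	return false;
}
```

In this model, a notification that arrives before the wait lets the
wait return directly, which corresponds to the case where the kernel
onlined the memory while adding it.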


* Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually
  2020-03-17 16:29   ` Vitaly Kuznetsov
@ 2020-03-17 16:33     ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 16:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov, linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Wei Liu, Andrew Morton,
	Michal Hocko, Oscar Salvador, Rafael J. Wysocki, Baoquan He,
	Wei Yang

On 17.03.20 17:29, Vitaly Kuznetsov wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> We get the MEM_ONLINE notifier call if memory is added right from the
>> kernel via add_memory() or later from user space.
>>
>> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
>> mechanism (->done) for that. Initialize the wait event only once and
>> reinitialize before adding memory. Unconditionally call complete() and
>> wait_for_completion_timeout().
>>
>> If there are no waiters, complete() will only increment ->done - which
>> will be reset by reinit_completion(). If complete() has already been
>> called, wait_for_completion_timeout() will not wait.
>>
>> There is still the chance for a small race between concurrent
>> reinit_completion() and complete(). If complete() wins, we would not
>> wait - which is tolerable (and the race exists in current code as
>> well).
> 
> How can we see concurrent reinit_completion() and complete()? Obviously,
> we are not onlining new memory in kernel and hv_mem_hot_add() calls are
> serialized, we're waiting up to 5*HZ for the added block to come online
> before proceeding to the next one. Or do you mean we actually hit this
> 5*HZ timeout, proceeded to the next block and immediately after
> reinit_completion() we saw complete() for the previously added block?

Yes exactly - or if an admin manually offlines+re-onlines a random
memory block.

> This is tolerable indeed, we're making forward progress (and this all is
> 'best effort' anyway).

Exactly my thoughts.

[...]

> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 

Thanks!

-- 
Thanks,

David / dhildenb
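
Editorial sketch of the race discussed in this exchange, replayed
sequentially in a toy model. It is not kernel code and has no real
concurrency. struct completion_model, the model_*() helpers and
wait_for_block_b() are hypothetical names. The scenario: the wait for
block A timed out, the driver reinitializes the completion for block B,
and a late notification for A (or for a block an admin re-onlined by
hand) arrives before the wait for B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the ->done counter inside a completion. */
struct completion_model {
	unsigned int done;
};

static void model_reinit(struct completion_model *c)
{
	assert(c != NULL);
	c->done = 0;
}

static void model_complete(struct completion_model *c)
{
	assert(c != NULL);
	c->done++;
}

/* True if a wait would block, false if it consumes a pending completion. */
static bool model_wait_would_block(struct completion_model *c)
{
	assert(c != NULL);
	if (c->done == 0)
		return true;
	c->done--;
	return false;
}

/*
 * Replays one hot-add step for block B. If stale_complete is set, a
 * notification that belongs to another block lands after the reinit.
 * Returns true if the wait for B actually blocks.
 */
static bool wait_for_block_b(struct completion_model *c, bool stale_complete)
{
	model_reinit(c);
	if (stale_complete)
		model_complete(c);
	return model_wait_would_block(c);
}
```

With the stale notification, the wait for B is skipped even though B
itself has not been onlined yet. The thread treats this as tolerable
because the wait is best effort and the hot-add still makes forward
progress.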


* Re: [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually
  2020-03-17 10:49 ` [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually David Hildenbrand
  2020-03-17 16:29   ` Vitaly Kuznetsov
@ 2020-03-17 18:46   ` David Hildenbrand
  1 sibling, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2020-03-17 18:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Wei Liu, Andrew Morton,
	Michal Hocko, Oscar Salvador, Rafael J. Wysocki, Baoquan He,
	Wei Yang, Vitaly Kuznetsov

> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	set_online_page_callback(&hv_online_page);
>  	register_memory_notifier(&hv_memory_nb);
> +	init_completion(&dm_device.ol_waitevent);

I'll move this one line up.

>  #endif
>  
>  	hv_set_drvdata(dev, &dm_device);
> 


-- 
Thanks,

David / dhildenb


* Re: [PATCH v2 4/8] powernv/memtrace: always online added memory blocks
  2020-03-17 10:49 ` [PATCH v2 4/8] powernv/memtrace: always online added memory blocks David Hildenbrand
  2020-03-17 10:58   ` Michal Hocko
@ 2020-03-17 22:04   ` Wei Yang
  2020-03-19  9:49   ` Michael Ellerman
  2 siblings, 0 replies; 29+ messages in thread
From: Wei Yang @ 2020-03-17 22:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Andrew Morton, Greg Kroah-Hartman, Michal Hocko, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He, Wei Yang

On Tue, Mar 17, 2020 at 11:49:38AM +0100, David Hildenbrand wrote:
>Let's always try to online the re-added memory blocks. In case add_memory()
>already onlined the added memory blocks, the first device_online() call
>will fail and stop processing the remaining memory blocks.
>
>This avoids manually having to check memhp_auto_online.
>
>Note: PPC always onlines all hotplugged memory directly from the kernel
>as well - something that is handled by user space on other
>architectures.
>
>Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>Cc: Paul Mackerras <paulus@samba.org>
>Cc: Michael Ellerman <mpe@ellerman.id.au>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Baoquan He <bhe@redhat.com>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: linuxppc-dev@lists.ozlabs.org
>Signed-off-by: David Hildenbrand <david@redhat.com>

Looks good.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>

>---
> arch/powerpc/platforms/powernv/memtrace.c | 14 ++++----------
> 1 file changed, 4 insertions(+), 10 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
>index d6d64f8718e6..13b369d2cc45 100644
>--- a/arch/powerpc/platforms/powernv/memtrace.c
>+++ b/arch/powerpc/platforms/powernv/memtrace.c
>@@ -231,16 +231,10 @@ static int memtrace_online(void)
> 			continue;
> 		}
> 
>-		/*
>-		 * If kernel isn't compiled with the auto online option
>-		 * we need to online the memory ourselves.
>-		 */
>-		if (!memhp_auto_online) {
>-			lock_device_hotplug();
>-			walk_memory_blocks(ent->start, ent->size, NULL,
>-					   online_mem_block);
>-			unlock_device_hotplug();
>-		}
>+		lock_device_hotplug();
>+		walk_memory_blocks(ent->start, ent->size, NULL,
>+				   online_mem_block);
>+		unlock_device_hotplug();
> 
> 		/*
> 		 * Memory was added successfully so clean up references to it
>-- 
>2.24.1
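
The commit message relies on a walk that stops at the first nonzero
callback result. A minimal userspace sketch of that behaviour follows.
It is only a toy model: toy_block, toy_online() and toy_walk() are
hypothetical names, not kernel APIs, and the rule "an already-online
block yields a nonzero result" is an assumption standing in for what
device_online() does in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory block; not the kernel's struct memory_block. */
struct toy_block {
	bool online;
};

/*
 * Hypothetical model of one online attempt. 'arg' points to an int that
 * counts how many attempts were made. Returns nonzero if the block was
 * already online, 0 if it was onlined by this call.
 */
static int toy_online(struct toy_block *blk, void *arg)
{
	int *attempts = arg;

	(*attempts)++;
	if (blk->online)
		return 1;	/* nothing to do; the walker treats this as "stop" */
	blk->online = true;
	return 0;
}

/* Visit blocks in order and stop at the first nonzero callback result. */
static int toy_walk(struct toy_block *blks, size_t n, void *arg,
		    int (*fn)(struct toy_block *, void *))
{
	assert(fn != NULL);

	for (size_t i = 0; i < n; i++) {
		int ret = fn(&blks[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}
```

In this model, if every block starts offline the walk visits all of them.
If the blocks were already onlined when they were added, the walk ends
after a single attempt, which is why running it unconditionally is cheap.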

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online
  2020-03-17 10:49 ` [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online David Hildenbrand
  2020-03-17 10:59   ` Michal Hocko
@ 2020-03-17 22:24   ` Wei Yang
  1 sibling, 0 replies; 29+ messages in thread
From: Wei Yang @ 2020-03-17 22:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv,
	Andrew Morton, Michal Hocko, Oscar Salvador, Rafael J. Wysocki,
	Baoquan He, Wei Yang

On Tue, Mar 17, 2020 at 11:49:40AM +0100, David Hildenbrand wrote:
>All in-tree users except the mm-core are gone. Let's drop the export.
>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Oscar Salvador <osalvador@suse.de>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Baoquan He <bhe@redhat.com>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>

>---
> mm/memory_hotplug.c | 1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 1a00b5a37ef6..2d2aae830b92 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -71,7 +71,6 @@ bool memhp_auto_online;
> #else
> bool memhp_auto_online = true;
> #endif
>-EXPORT_SYMBOL_GPL(memhp_auto_online);
> 
> static int __init setup_memhp_default_state(char *str)
> {
>-- 
>2.24.1

-- 
Wei Yang
Help you, Help me


* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-03-17 10:49 ` [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
@ 2020-03-18 13:05 ` Baoquan He
  2020-03-18 13:50   ` David Hildenbrand
                     ` (2 more replies)
  8 siblings, 3 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-18 13:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv,
	Vitaly Kuznetsov, Yumei Huang, Igor Mammedov, Eduardo Habkost,
	Milan Zamazal, Andrew Morton, Benjamin Herrenschmidt,
	Greg Kroah-Hartman, Haiyang Zhang, K. Y. Srinivasan,
	Michael Ellerman, Michal Hocko, Michal Hocko, Oscar Salvador,
	Paul Mackerras, Rafael J. Wysocki, Stephen Hemminger, Wei Liu,
	Wei Yang

On 03/17/20 at 11:49am, David Hildenbrand wrote:
> Distributions nowadays use udev rules ([1] [2]) to specify if and
> how to online hotplugged memory. The rules seem to get more complex with
> many special cases. Due to the various special cases,
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> is handled via udev rules.
> 
> Everytime we hotplug memory, the udev rule will come to the same
> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> memory in separate memory blocks and wait for memory to get onlined by user
> space before continuing to add more memory blocks (to not add memory faster
> than it is getting onlined). This of course slows down the whole memory
> hotplug process.
> 
> To make the job of distributions easier and to avoid udev rules that get
> more and more complicated, let's extend the mechanism provided by
> - /sys/devices/system/memory/auto_online_blocks
> - "memhp_default_state=" on the kernel cmdline
> to be able to specify also "online_movable" as well as "online_kernel"

This patch series looks good, thanks. Since Andrew has merged it to -mm again,
I won't add my Reviewed-by to bother. 

Hi David, Vitaly

There are several things unclear to me.

So, these improved interfaces are used to alleviate the burden of the 
existing udev rules, or try to replace it? As you know, we have been
using udev rules to interact between kernel and user space on bare metal,
and guests who want to hot add/remove.

And also the OOM issue in hyperV when onlining pages after adding memory
block. I am not a virt devel expert, could this happen on bare metal
system?

Thanks
Baoquan



* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-18 13:05 ` [PATCH v2 0/8] " Baoquan He
@ 2020-03-18 13:50   ` David Hildenbrand
  2020-03-18 14:50     ` Baoquan He
  2020-03-18 13:54   ` Michal Hocko
  2020-03-18 13:58   ` Vitaly Kuznetsov
  2 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2020-03-18 13:50 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv,
	Vitaly Kuznetsov, Yumei Huang, Igor Mammedov, Eduardo Habkost,
	Milan Zamazal, Andrew Morton, Benjamin Herrenschmidt,
	Greg Kroah-Hartman, Haiyang Zhang, K. Y. Srinivasan,
	Michael Ellerman, Michal Hocko, Michal Hocko, Oscar Salvador,
	Paul Mackerras, Rafael J. Wysocki, Stephen Hemminger, Wei Liu,
	Wei Yang

On 18.03.20 14:05, Baoquan He wrote:
> On 03/17/20 at 11:49am, David Hildenbrand wrote:
>> Distributions nowadays use udev rules ([1] [2]) to specify if and
>> how to online hotplugged memory. The rules seem to get more complex with
>> many special cases. Due to the various special cases,
>> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
>> is handled via udev rules.
>>
>> Everytime we hotplug memory, the udev rule will come to the same
>> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
>> memory in separate memory blocks and wait for memory to get onlined by user
>> space before continuing to add more memory blocks (to not add memory faster
>> than it is getting onlined). This of course slows down the whole memory
>> hotplug process.
>>
>> To make the job of distributions easier and to avoid udev rules that get
>> more and more complicated, let's extend the mechanism provided by
>> - /sys/devices/system/memory/auto_online_blocks
>> - "memhp_default_state=" on the kernel cmdline
>> to be able to specify also "online_movable" as well as "online_kernel"
> 
> This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> I won't add my Reviewed-by to bother. 
> 
> Hi David, Vitaly
> 
> There are several things unclear to me.
> 
> So, these improved interfaces are used to alleviate the burden of the 
> existing udev rules, or try to replace it? As you know, we have been

At least in RHEL, my plan is to replace it / use udev rules as a
fallback on older kernels (see the example scripts below). But other
distributions can handle it as they want.

> using udev rules to interact between kernel and user space on bare metal,
> and guests who want to hot add/remove.
>
> And also the OOM issue in hyperV when onlining pages after adding memory
> block. I am not a virt devel expert, could this happen on bare metal
> system?

Don't think it's relevant on bare metal. If you plug a big DIMM, all
memory blocks will be added first in one shot and then all memory blocks
will be onlined. So it doesn't matter "how fast" you online that memory.

In contrast, Hyper-V (and virtio-mem) add one (or a limited number of)
memory block at a time and wait for them to get onlined.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-18 13:05 ` [PATCH v2 0/8] " Baoquan He
  2020-03-18 13:50   ` David Hildenbrand
@ 2020-03-18 13:54   ` Michal Hocko
  2020-03-18 14:41     ` Baoquan He
  2020-03-18 13:58   ` Vitaly Kuznetsov
  2 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2020-03-18 13:54 UTC (permalink / raw)
  To: Baoquan He
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	linux-hyperv, Vitaly Kuznetsov, Yumei Huang, Igor Mammedov,
	Eduardo Habkost, Milan Zamazal, Andrew Morton,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Haiyang Zhang,
	K. Y. Srinivasan, Michael Ellerman, Oscar Salvador,
	Paul Mackerras, Rafael J. Wysocki, Stephen Hemminger, Wei Liu,
	Wei Yang

On Wed 18-03-20 21:05:17, Baoquan He wrote:
> On 03/17/20 at 11:49am, David Hildenbrand wrote:
> > Distributions nowadays use udev rules ([1] [2]) to specify if and
> > how to online hotplugged memory. The rules seem to get more complex with
> > many special cases. Due to the various special cases,
> > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> > is handled via udev rules.
> > 
> > Everytime we hotplug memory, the udev rule will come to the same
> > conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> > memory in separate memory blocks and wait for memory to get onlined by user
> > space before continuing to add more memory blocks (to not add memory faster
> > than it is getting onlined). This of course slows down the whole memory
> > hotplug process.
> > 
> > To make the job of distributions easier and to avoid udev rules that get
> > more and more complicated, let's extend the mechanism provided by
> > - /sys/devices/system/memory/auto_online_blocks
> > - "memhp_default_state=" on the kernel cmdline
> > to be able to specify also "online_movable" as well as "online_kernel"
> 
> This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> I won't add my Reviewed-by to bother. 

JFYI, Andrew usually adds R-b or A-b tags as they are posted.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-18 13:05 ` [PATCH v2 0/8] " Baoquan He
  2020-03-18 13:50   ` David Hildenbrand
  2020-03-18 13:54   ` Michal Hocko
@ 2020-03-18 13:58   ` Vitaly Kuznetsov
  2020-03-18 14:41     ` Baoquan He
  2 siblings, 1 reply; 29+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-18 13:58 UTC (permalink / raw)
  To: Baoquan He, David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv, Yumei Huang,
	Igor Mammedov, Eduardo Habkost, Milan Zamazal, Andrew Morton,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Haiyang Zhang,
	K. Y. Srinivasan, Michael Ellerman, Michal Hocko, Michal Hocko,
	Oscar Salvador, Paul Mackerras, Rafael J. Wysocki,
	Stephen Hemminger, Wei Liu, Wei Yang

Baoquan He <bhe@redhat.com> writes:

> On 03/17/20 at 11:49am, David Hildenbrand wrote:
>> Distributions nowadays use udev rules ([1] [2]) to specify if and
>> how to online hotplugged memory. The rules seem to get more complex with
>> many special cases. Due to the various special cases,
>> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
>> is handled via udev rules.
>> 
>> Everytime we hotplug memory, the udev rule will come to the same
>> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
>> memory in separate memory blocks and wait for memory to get onlined by user
>> space before continuing to add more memory blocks (to not add memory faster
>> than it is getting onlined). This of course slows down the whole memory
>> hotplug process.
>> 
>> To make the job of distributions easier and to avoid udev rules that get
>> more and more complicated, let's extend the mechanism provided by
>> - /sys/devices/system/memory/auto_online_blocks
>> - "memhp_default_state=" on the kernel cmdline
>> to be able to specify also "online_movable" as well as "online_kernel"
>
> This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> I won't add my Reviewed-by to bother. 
>
> Hi David, Vitaly
>
> There are several things unclear to me.
>
> So, these improved interfaces are used to alleviate the burden of the 
> existing udev rules, or try to replace it? As you know, we have been
> using udev rules to interact between kernel and user space on bare metal,
> and guests who want to hot add/remove.

With 'auto_online_blocks' interface you don't need the udev rule. David
is trying to make it more versatile.
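
As a rough illustration of using the interface instead of a udev rule,
here is a small C sketch that writes a default policy string to a file.
The helper names are hypothetical. The accepted strings are the two new
ones named in this thread plus "online" and "offline", and the caller is
assumed to pass /sys/devices/system/memory/auto_online_blocks as the path
on a kernel that carries this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Policy strings: the two named in this thread plus "offline"/"online". */
static const char *const policies[] = {
	"offline", "online", "online_kernel", "online_movable",
};

/* Return 1 if 'policy' is one of the strings above, 0 otherwise. */
static int policy_is_valid(const char *policy)
{
	for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
		if (strcmp(policy, policies[i]) == 0)
			return 1;
	return 0;
}

/* Write 'policy' to 'path'; returns 0 on success, -1 on any failure. */
static int set_default_online_policy(const char *path, const char *policy)
{
	FILE *f;
	int ok;

	assert(path != NULL && policy != NULL);

	if (!policy_is_valid(policy))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(policy, f) >= 0;
	if (fclose(f) != 0)	/* a sysfs write error may only show up here */
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller with sufficient privileges might invoke
set_default_online_policy("/sys/devices/system/memory/auto_online_blocks",
"online_movable") once at boot. On a kernel without this series the write
would be expected to fail, and a udev rule remains the fallback.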

>
> And also the OOM issue in hyperV when onlining pages after adding memory
> block. I am not a virt devel expert, could this happen on bare metal
> system?

Yes - in theory, very unlikely - in practice.

The root cause of the problem here is adding more memory to the system
requires memory (page tables, memmaps, ...) so if your system is low on
memory and you're trying to hotplug A LOT you may run into OOM before
you're able to online anything. With bare metal it's usually not the
case: servers, which are able to hotplug memory, are usually booted with
enough memory and memory hotplug is a manual action (you need to insert
DIMMs!). But, if you boot your server with e.g. 4G, almost exhaust it
and then try to hotplug e.g. 256G ... well, OOM is almost guaranteed.
With virtual machines it's very common (e.g. with Hyper-V VMs) to boot
them with low memory and hotplug it (automatically, by some management
software) when needed, thus the problem is way more common.
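
To put a number on the 4G/256G example, here is a back-of-the-envelope
sketch in C. The 4 KiB page size and the 64-byte per-page descriptor are
assumptions (typical for x86-64), not values stated in this thread, and
the helper name is hypothetical. Page tables come on top of this.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, typical for x86-64; not taken from this thread. */
#define ASSUMED_PAGE_SIZE		4096ULL
#define ASSUMED_STRUCT_PAGE_SIZE	64ULL

/* Bytes of memmap needed to describe 'hotplug_bytes' of added memory. */
static uint64_t memmap_bytes(uint64_t hotplug_bytes)
{
	/* The model only handles whole pages. */
	assert(hotplug_bytes % ASSUMED_PAGE_SIZE == 0);

	return hotplug_bytes / ASSUMED_PAGE_SIZE * ASSUMED_STRUCT_PAGE_SIZE;
}
```

Under these assumptions memmap_bytes(256ULL << 30) is 4 GiB, i.e. the
memmap for 256G of hotplugged memory is as large as the whole 4G the
machine booted with, and it has to be allocated before any of the new
memory is online.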

-- 
Vitaly



* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-18 13:58   ` Vitaly Kuznetsov
@ 2020-03-18 14:41     ` Baoquan He
  2020-03-18 15:00       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 29+ messages in thread
From: Baoquan He @ 2020-03-18 14:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	linux-hyperv, Yumei Huang, Igor Mammedov, Eduardo Habkost,
	Milan Zamazal, Andrew Morton, Benjamin Herrenschmidt,
	Greg Kroah-Hartman, Haiyang Zhang, K. Y. Srinivasan,
	Michael Ellerman, Michal Hocko, Michal Hocko, Oscar Salvador,
	Paul Mackerras, Rafael J. Wysocki, Stephen Hemminger, Wei Liu,
	Wei Yang

On 03/18/20 at 02:58pm, Vitaly Kuznetsov wrote:
> Baoquan He <bhe@redhat.com> writes:
> 
> > On 03/17/20 at 11:49am, David Hildenbrand wrote:
> >> Distributions nowadays use udev rules ([1] [2]) to specify if and
> >> how to online hotplugged memory. The rules seem to get more complex with
> >> many special cases. Due to the various special cases,
> >> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> >> is handled via udev rules.
> >> 
> >> Everytime we hotplug memory, the udev rule will come to the same
> >> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> >> memory in separate memory blocks and wait for memory to get onlined by user
> >> space before continuing to add more memory blocks (to not add memory faster
> >> than it is getting onlined). This of course slows down the whole memory
> >> hotplug process.
> >> 
> >> To make the job of distributions easier and to avoid udev rules that get
> >> more and more complicated, let's extend the mechanism provided by
> >> - /sys/devices/system/memory/auto_online_blocks
> >> - "memhp_default_state=" on the kernel cmdline
> >> to be able to specify also "online_movable" as well as "online_kernel"
> >
> > This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> > I won't add my Reviewed-by to bother. 
> >
> > Hi David, Vitaly
> >
> > There are several things unclear to me.
> >
> > So, these improved interfaces are used to alleviate the burden of the 
> > existing udev rules, or try to replace it? As you know, we have been
> > using udev rules to interact between kernel and user space on bare metal,
> > and guests who want to hot add/remove.
> 
> With 'auto_online_blocks' interface you don't need the udev rule. David
> is trying to make it more versatile.
> 
> >
> > And also the OOM issue in hyperV when onlining pages after adding memory
> > block. I am not a virt devel expert, could this happen on bare metal
> > system?
> 
> Yes - in theory, very unlikely - in practice.
> 
> The root cause of the problem here is adding more memory to the system
> requires memory (page tables, memmaps, ...) so if your system is low on
> memory and you're trying to hotplug A LOT you may run into OOM before
> you're able to online anything. With bare metal it's usually not the
> case: servers, which are able to hotplug memory, are usually booted with
> enough memory and memory hotplug is a manual action (you need to insert
> DIMMs!). But, if you boot your server with e.g. 4G, almost exhaust it
> and then try to hotplug e.g. 256G ... well, OOM is almost guaranteed.

Thanks for this detailed explanation.

I finally know why this is a problem in hyperV. But with the current
mechanism, it will happen on any system if things are done like this.

Is there a reason hyperV needs to boot with small memory, then enlarge it
with huge memory? Since it's a real case in hyperV, I guess there must
be a reason, I am just curious.

> With virtual machines it's very common (e.g. with Hyper-V VMs) to boot
> them with low memory and hotplug it (automatically, by some management
> software) when needed, thus the problem is way more common.



* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-18 13:54   ` Michal Hocko
@ 2020-03-18 14:41     ` Baoquan He
  0 siblings, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-18 14:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	linux-hyperv, Vitaly Kuznetsov, Yumei Huang, Igor Mammedov,
	Eduardo Habkost, Milan Zamazal, Andrew Morton,
	Benjamin Herrenschmidt, Greg Kroah-Hartman, Haiyang Zhang,
	K. Y. Srinivasan, Michael Ellerman, Oscar Salvador,
	Paul Mackerras, Rafael J. Wysocki, Stephen Hemminger, Wei Liu,
	Wei Yang

On 03/18/20 at 02:54pm, Michal Hocko wrote:
> On Wed 18-03-20 21:05:17, Baoquan He wrote:
> > On 03/17/20 at 11:49am, David Hildenbrand wrote:
> > > Distributions nowadays use udev rules ([1] [2]) to specify if and
> > > how to online hotplugged memory. The rules seem to get more complex with
> > > many special cases. Due to the various special cases,
> > > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> > > is handled via udev rules.
> > > 
> > > Everytime we hotplug memory, the udev rule will come to the same
> > > conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> > > memory in separate memory blocks and wait for memory to get onlined by user
> > > space before continuing to add more memory blocks (to not add memory faster
> > > than it is getting onlined). This of course slows down the whole memory
> > > hotplug process.
> > > 
> > > To make the job of distributions easier and to avoid udev rules that get
> > > more and more complicated, let's extend the mechanism provided by
> > > - /sys/devices/system/memory/auto_online_blocks
> > > - "memhp_default_state=" on the kernel cmdline
> > > to be able to specify also "online_movable" as well as "online_kernel"
> > 
> > This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> > I won't add my Reviewed-by to bother. 
> 
> JFYI, Andrew usually adds R-b or A-b tags as they are posted.

Got it, thanks for telling.



* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-18 13:50   ` David Hildenbrand
@ 2020-03-18 14:50     ` Baoquan He
  0 siblings, 0 replies; 29+ messages in thread
From: Baoquan He @ 2020-03-18 14:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, linux-hyperv,
	Vitaly Kuznetsov, Yumei Huang, Igor Mammedov, Eduardo Habkost,
	Milan Zamazal, Andrew Morton, Benjamin Herrenschmidt,
	Greg Kroah-Hartman, Haiyang Zhang, K. Y. Srinivasan,
	Michael Ellerman, Michal Hocko, Michal Hocko, Oscar Salvador,
	Paul Mackerras, Rafael J. Wysocki, Stephen Hemminger, Wei Liu,
	Wei Yang

On 03/18/20 at 02:50pm, David Hildenbrand wrote:
> On 18.03.20 14:05, Baoquan He wrote:
> > On 03/17/20 at 11:49am, David Hildenbrand wrote:
> >> Distributions nowadays use udev rules ([1] [2]) to specify if and
> >> how to online hotplugged memory. The rules seem to get more complex with
> >> many special cases. Due to the various special cases,
> >> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> >> is handled via udev rules.
> >>
> >> Everytime we hotplug memory, the udev rule will come to the same
> >> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> >> memory in separate memory blocks and wait for memory to get onlined by user
> >> space before continuing to add more memory blocks (to not add memory faster
> >> than it is getting onlined). This of course slows down the whole memory
> >> hotplug process.
> >>
> >> To make the job of distributions easier and to avoid udev rules that get
> >> more and more complicated, let's extend the mechanism provided by
> >> - /sys/devices/system/memory/auto_online_blocks
> >> - "memhp_default_state=" on the kernel cmdline
> >> to be able to specify also "online_movable" as well as "online_kernel"
> > 
> > This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> > I won't add my Reviewed-by to bother. 
> > 
> > Hi David, Vitaly
> > 
> > There are several things unclear to me.
> > 
> > So, these improved interfaces are used to alleviate the burden of the 
> > existing udev rules, or try to replace it? As you know, we have been
> 
> At least in RHEL, my plan is to replace it / use udev rules as a
> fallback on older kernels (see the example scripts below). But other

Ok, got it. I didn't notice that the script and the systemd service are
part of your plan; I thought you were demonstrating the current status. Thanks.

> distributions can handle it as they want.
> 
> > using udev rules to interact between kernel and user space on bare metal,
> > and guests who want to hot add/remove.
> >
> > And also the OOM issue in hyperV when onlining pages after adding memory
> > block. I am not a virt devel expert, could this happen on bare metal
> > system?
> 
> Don't think it's relevant on bare metal. If you plug a big DIMM, all
> memory blocks will be added first in one shot and then all memory blocks
> will be onlined. So it doesn't matter "how fast" you online that memory.
> 
> In contrast, Hyper-V (and virtio-mem) add one (or a limited number of)
> memory block at a time and wait for them to get onlined.
> 
> -- 
> Thanks,
> 
> David / dhildenb



* Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type
  2020-03-18 14:41     ` Baoquan He
@ 2020-03-18 15:00       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 29+ messages in thread
From: Vitaly Kuznetsov @ 2020-03-18 15:00 UTC (permalink / raw)
  To: Baoquan He
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	linux-hyperv, Yumei Huang, Igor Mammedov, Eduardo Habkost,
	Milan Zamazal, Andrew Morton, Benjamin Herrenschmidt,
	Greg Kroah-Hartman, Haiyang Zhang, K. Y. Srinivasan,
	Michael Ellerman, Michal Hocko, Michal Hocko, Oscar Salvador,
	Paul Mackerras, Rafael J. Wysocki, Stephen Hemminger, Wei Liu,
	Wei Yang

Baoquan He <bhe@redhat.com> writes:

> Is there a reason hyperV needs to boot with small memory, then enlarge it
> with huge memory? Since it's a real case in hyperV, I guess there must
> be a reason, I am just curious.
>

It doesn't really *need* to but this can be utilized in e.g. 'hot
standby' schemes I believe. Also, it may be enough if the administrator
is just trying to e.g. double the size of RAM but the VM is already
under memory pressure. I wouldn't say that these cases are common but
afair bugs like 'I tried adding more memory to my VM and it just OOMed'
were reported in the past.

-- 
Vitaly



* Re: [PATCH v2 4/8] powernv/memtrace: always online added memory blocks
  2020-03-17 10:49 ` [PATCH v2 4/8] powernv/memtrace: always online added memory blocks David Hildenbrand
  2020-03-17 10:58   ` Michal Hocko
  2020-03-17 22:04   ` Wei Yang
@ 2020-03-19  9:49   ` Michael Ellerman
  2 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2020-03-19  9:49 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, linux-hyperv, David Hildenbrand,
	Benjamin Herrenschmidt, Paul Mackerras, Andrew Morton,
	Greg Kroah-Hartman, Michal Hocko, Oscar Salvador,
	Rafael J. Wysocki, Baoquan He, Wei Yang

David Hildenbrand <david@redhat.com> writes:
> Let's always try to online the re-added memory blocks. In case add_memory()
> already onlined the added memory blocks, the first device_online() call
> will fail and stop processing the remaining memory blocks.
>
> This avoids manually having to check memhp_auto_online.
>
> Note: PPC always onlines all hotplugged memory directly from the kernel
> as well - something that is handled by user space on other
> architectures.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)

Fine by me.

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index d6d64f8718e6..13b369d2cc45 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -231,16 +231,10 @@ static int memtrace_online(void)
>  			continue;
>  		}
>  
> -		/*
> -		 * If kernel isn't compiled with the auto online option
> -		 * we need to online the memory ourselves.
> -		 */
> -		if (!memhp_auto_online) {
> -			lock_device_hotplug();
> -			walk_memory_blocks(ent->start, ent->size, NULL,
> -					   online_mem_block);
> -			unlock_device_hotplug();
> -		}
> +		lock_device_hotplug();
> +		walk_memory_blocks(ent->start, ent->size, NULL,
> +				   online_mem_block);
> +		unlock_device_hotplug();
>  
>  		/*
>  		 * Memory was added successfully so clean up references to it
> -- 
> 2.24.1


end of thread, other threads:[~2020-03-19  9:49 UTC | newest]

Thread overview: 29+ messages
2020-03-17 10:49 [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
2020-03-17 10:49 ` [PATCH v2 1/8] drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE David Hildenbrand
2020-03-17 10:49 ` [PATCH v2 2/8] drivers/base/memory: map MMOP_OFFLINE to 0 David Hildenbrand
2020-03-17 10:49 ` [PATCH v2 3/8] drivers/base/memory: store mapping between MMOP_* and string in an array David Hildenbrand
2020-03-17 10:49 ` [PATCH v2 4/8] powernv/memtrace: always online added memory blocks David Hildenbrand
2020-03-17 10:58   ` Michal Hocko
2020-03-17 22:04   ` Wei Yang
2020-03-19  9:49   ` Michael Ellerman
2020-03-17 10:49 ` [PATCH v2 5/8] hv_balloon: don't check for memhp_auto_online manually David Hildenbrand
2020-03-17 16:29   ` Vitaly Kuznetsov
2020-03-17 16:33     ` David Hildenbrand
2020-03-17 18:46   ` David Hildenbrand
2020-03-17 10:49 ` [PATCH v2 6/8] mm/memory_hotplug: unexport memhp_auto_online David Hildenbrand
2020-03-17 10:59   ` Michal Hocko
2020-03-17 22:24   ` Wei Yang
2020-03-17 10:49 ` [PATCH v2 7/8] mm/memory_hotplug: convert memhp_auto_online to store an online_type David Hildenbrand
2020-03-17 11:00   ` Michal Hocko
2020-03-17 10:49 ` [PATCH v2 8/8] mm/memory_hotplug: allow to specify a default online_type David Hildenbrand
2020-03-17 11:01   ` Michal Hocko
2020-03-17 11:05     ` David Hildenbrand
2020-03-17 11:08   ` David Hildenbrand
2020-03-18 13:05 ` [PATCH v2 0/8] " Baoquan He
2020-03-18 13:50   ` David Hildenbrand
2020-03-18 14:50     ` Baoquan He
2020-03-18 13:54   ` Michal Hocko
2020-03-18 14:41     ` Baoquan He
2020-03-18 13:58   ` Vitaly Kuznetsov
2020-03-18 14:41     ` Baoquan He
2020-03-18 15:00       ` Vitaly Kuznetsov
