Linux-Fsdevel Archive on lore.kernel.org
From: Anthony Yznaga <anthony.yznaga@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, rppt@linux.ibm.com,
	akpm@linux-foundation.org, hughd@google.com,
	ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
	ndesaulniers@google.com, dima@golovin.in,
	daniel.kiper@oracle.com, nivedita@alum.mit.edu,
	rafael.j.wysocki@intel.com, dan.j.williams@intel.com,
	zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com,
	guro@fb.com, Thomas.Lendacky@amd.com,
	andriy.shevchenko@linux.intel.com, keescook@chromium.org,
	hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
	ying.huang@intel.com, yang.shi@linux.alibaba.com,
	gustavo@embeddedor.com, ziqian.lzq@antfin.com,
	vdavydov.dev@gmail.com, jason.zeng@intel.com,
	kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com,
	paul.c.lai@intel.com, ashok.raj@intel.com,
	linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
	kexec@lists.infradead.org
Subject: [RFC 01/43] mm: add PKRAM API stubs and Kconfig
Date: Wed,  6 May 2020 17:41:27 -0700
Message-ID: <1588812129-8596-2-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Preserved-across-kexec memory, or PKRAM, is a method for saving memory
pages of the currently executing kernel and restoring them after a kexec
boot into a new kernel. This can be used to preserve guest VM state,
large in-memory databases, process memory, etc. across a reboot. While
DRAM-as-PMEM or actual persistent memory could be used to accomplish
these things, PKRAM provides the latency of DRAM with the flexibility
of dynamically determining the amount of memory to preserve.

The proposed API:

 * Preserved memory is divided into nodes which can be saved or loaded
   independently of each other. The nodes are identified by unique name
   strings. A PKRAM node is created when save is initiated by calling
   pkram_prepare_save(). A PKRAM node is removed when load is initiated by
   calling pkram_prepare_load(). See the call sequences below.

 * A node is further divided into objects. An object represents a
   grouping of associated pages and any relevant metadata preserved
   with them. For example, the pages and attributes of a file.

 * To save data to or load data from a PKRAM node/object, an instance of
   struct pkram_stream is used. The struct is initialized by calling
   pkram_prepare_save() for saving data or pkram_prepare_load() for
   loading data. After the save (load) is complete, pkram_finish_save()
   (pkram_finish_load()) must be called. If an error occurs during
   save, the saved data and the PKRAM node may be freed by calling
   pkram_discard_save() instead of pkram_finish_save().

 * Both page data and byte data can separately be streamed to a PKRAM
   object.  pkram_save_page() and pkram_load_page() are used to stream
   page data while pkram_write() and pkram_read() are used to stream byte
   data.
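
As a rough illustration of the save side described above, the following
userspace sketch mocks the save calls and shows a caller that commits on
success and discards on error. Only the pkram_* prototypes come from this
patch. The struct pkram_stream layout, the gfp_t/GFP_KERNEL stand-ins, the
single static mock node, the -ENOMEM returned when the mock buffer is full,
and the save_blob() helper are all assumptions made for the example. With
the stubs in this patch the prepare calls simply return -ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

typedef unsigned int gfp_t;	/* stand-in for the kernel type */
#define GFP_KERNEL 0u		/* placeholder value for the mock */
struct pkram_stream { int obj_open; };	/* hypothetical layout */

/* One mock node; a real node lives in memory preserved across kexec. */
static struct { const char *name; char data[64]; size_t len; int saved; } node;

int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp)
{
	(void)gfp;
	node.name = name;	/* the mock keeps the caller's pointer */
	node.len = 0;
	node.saved = 0;
	ps->obj_open = 0;
	return 0;
}
int pkram_prepare_save_obj(struct pkram_stream *ps) { ps->obj_open = 1; return 0; }
void pkram_finish_save_obj(struct pkram_stream *ps) { ps->obj_open = 0; }
void pkram_finish_save(struct pkram_stream *ps) { (void)ps; node.saved = 1; }

void pkram_discard_save(struct pkram_stream *ps)
{
	(void)ps;
	node.name = NULL;	/* node destroyed, saved data dropped */
	node.len = 0;
}

ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count)
{
	assert(ps->obj_open);	/* both prepare calls must come first */
	if (count > sizeof(node.data) - node.len)
		return -ENOMEM;	/* mock-only limit */
	memcpy(node.data + node.len, buf, count);
	node.len += count;
	return (ssize_t)count;
}

/* Hypothetical caller: commit on success, discard the node on any error. */
int save_blob(const char *name, const void *buf, size_t len)
{
	struct pkram_stream ps;
	ssize_t n;
	int err = pkram_prepare_save(&ps, name, GFP_KERNEL);

	if (err)
		return err;
	err = pkram_prepare_save_obj(&ps);
	if (err)
		goto discard;
	n = pkram_write(&ps, buf, len);
	if (n < 0) {
		err = (int)n;
		goto discard;
	}
	pkram_finish_save_obj(&ps);
	pkram_finish_save(&ps);
	return 0;
discard:
	pkram_discard_save(&ps);
	return err;
}
```

The single discard label keeps pkram_finish_save() and pkram_discard_save()
mutually exclusive, which is how the description above presents them.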

A sequence of operations for saving/loading data from PKRAM would
look like:

  * For saving data to PKRAM:

    /* create a PKRAM node and do initial stream setup */
    pkram_prepare_save()

    /* create a PKRAM object associated with the PKRAM node and complete stream initialization */
    pkram_prepare_save_obj()

    /* save data to the node/object */
    pkram_save_page()[,...]      /* for page stream, or
    pkram_write()[,...]           * ... for byte stream */

    /* commit the object */
    pkram_finish_save_obj()

    /* commit the save or discard and delete the node */
    pkram_finish_save()          /* on success, or
    pkram_discard_save()          * ... in case of error */

  * For loading data from PKRAM:

    /* remove a PKRAM node from the list and do initial stream setup */
    pkram_prepare_load()

    /* remove a PKRAM object from the node and complete stream initialization for loading data from it */
    pkram_prepare_load_obj()

    /* load data from the node/object */
    pkram_load_page()[,...]      /* for page stream, or
    pkram_read()[,...]            * ... for byte stream */

    /* free the object */
    pkram_finish_load_obj()

    /* free the node */
    pkram_finish_load()
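
The load sequence above, using the page stream, might be sketched in
userspace as follows. This is a mock for illustration, not the kernel
implementation: struct page and struct pkram_stream are reduced to
hypothetical stand-ins, the static node and its four-page limit are invented
for the example, -ENOENT for a missing node is the mock's choice, and
restore_pages() is a made-up caller. The loop relies only on the documented
behavior that pkram_load_page() returns NULL once nothing is left and that
each returned page has its refcount incremented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the patch only forward-declares pkram_stream. */
struct page { unsigned long index; int refcount; };
struct pkram_stream { size_t next; };

/* One mock node holding one object; a real node lives in preserved memory. */
static struct {
	const char *name;
	struct page *pages[4];
	short flags[4];
	size_t count;
} node;

int pkram_prepare_load(struct pkram_stream *ps, const char *name)
{
	if (!node.name || strcmp(node.name, name) != 0)
		return -ENOENT;	/* mock-only error choice */
	node.name = NULL;	/* the node is removed when load starts */
	ps->next = 0;
	return 0;
}
int pkram_prepare_load_obj(struct pkram_stream *ps) { (void)ps; return 0; }
void pkram_finish_load_obj(struct pkram_stream *ps) { (void)ps; }
void pkram_finish_load(struct pkram_stream *ps) { (void)ps; node.count = 0; }

struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index,
			     short *flags)
{
	struct page *page;

	if (ps->next == node.count)
		return NULL;
	page = node.pages[ps->next];
	assert(page != NULL);
	if (index)
		*index = page->index;
	if (flags)
		*flags = node.flags[ps->next];
	ps->next++;
	page->refcount++;	/* documented: refcount is incremented */
	return page;
}

/* Hypothetical caller: run the load sequence, return pages seen or -errno. */
int restore_pages(const char *name, unsigned long *last_index)
{
	struct pkram_stream ps;
	struct page *page;
	unsigned long index;
	int n = 0;
	int err = pkram_prepare_load(&ps, name);

	if (err)
		return err;
	err = pkram_prepare_load_obj(&ps);
	if (err)
		goto out;
	/* @flags may be NULL when the caller does not need it. */
	while ((page = pkram_load_page(&ps, &index, NULL)) != NULL) {
		/* A real caller would install the page before dropping its ref. */
		*last_index = index;
		page->refcount--;	/* mock put_page() */
		n++;
	}
	pkram_finish_load_obj(&ps);
out:
	pkram_finish_load(&ps);
	return err ? err : n;
}
```

Because the node is removed as soon as the load starts, a second
restore_pages() call with the same name fails in this mock.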

Originally-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
---
 include/linux/pkram.h |  32 ++++++++++
 mm/Kconfig            |   9 +++
 mm/Makefile           |   1 +
 mm/pkram.c            | 169 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 211 insertions(+)
 create mode 100644 include/linux/pkram.h
 create mode 100644 mm/pkram.c

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
new file mode 100644
index 000000000000..4c4e13311ec8
--- /dev/null
+++ b/include/linux/pkram.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PKRAM_H
+#define _LINUX_PKRAM_H
+
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/mm_types.h>
+
+struct pkram_stream;
+
+#define PKRAM_NAME_MAX		256	/* including nul */
+
+int pkram_prepare_save(struct pkram_stream *ps, const char *name,
+		       gfp_t gfp_mask);
+int pkram_prepare_save_obj(struct pkram_stream *ps);
+void pkram_finish_save(struct pkram_stream *ps);
+void pkram_finish_save_obj(struct pkram_stream *ps);
+void pkram_discard_save(struct pkram_stream *ps);
+
+int pkram_prepare_load(struct pkram_stream *ps, const char *name);
+int pkram_prepare_load_obj(struct pkram_stream *ps);
+void pkram_finish_load(struct pkram_stream *ps);
+void pkram_finish_load_obj(struct pkram_stream *ps);
+
+int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags);
+struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index,
+			     short *flags);
+
+ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count);
+size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count);
+
+#endif /* _LINUX_PKRAM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..bddf20ecf6e1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,13 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
         bool
 
+config PKRAM
+	bool "Preserved-over-kexec memory storage"
+	default n
+	help
+	  This option adds the kernel API that enables saving memory pages of
+	  the currently executing kernel and restoring them after a kexec in
+	  the newly booted one. This can be utilized for speeding up reboot by
+	  leaving process memory and/or FS caches in-place.
+
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index fccd3756b25f..59cd381194af 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -112,3 +112,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
 obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
 obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
+obj-$(CONFIG_PKRAM) += pkram.o
diff --git a/mm/pkram.c b/mm/pkram.c
new file mode 100644
index 000000000000..d6f2f79d4852
--- /dev/null
+++ b/mm/pkram.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/err.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/pkram.h>
+#include <linux/types.h>
+
+/**
+ * Create a preserved memory node with name @name and initialize stream @ps
+ * for saving data to it.
+ *
+ * @gfp_mask specifies the memory allocation mask to be used when saving data.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the save has finished, pkram_finish_save() (or pkram_discard_save() in
+ * case of failure) is to be called.
+ */
+int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Create a preserved memory object and initialize stream @ps for saving data
+ * to it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the save has finished, pkram_finish_save_obj() (or pkram_discard_save()
+ * in case of failure) is to be called.
+ */
+int pkram_prepare_save_obj(struct pkram_stream *ps)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Commit the object started with pkram_prepare_save_obj() to preserved memory.
+ */
+void pkram_finish_save_obj(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Commit the save to preserved memory started with pkram_prepare_save().
+ * After the call, the stream may not be used any more.
+ */
+void pkram_finish_save(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Cancel the save to preserved memory started with pkram_prepare_save() and
+ * destroy the corresponding preserved memory node freeing any data already
+ * saved to it.
+ */
+void pkram_discard_save(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Remove the preserved memory node with name @name and initialize stream @ps
+ * for loading data from it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the load has finished, pkram_finish_load() is to be called.
+ */
+int pkram_prepare_load(struct pkram_stream *ps, const char *name)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Remove the next preserved memory object from the stream @ps and
+ * initialize stream @ps for loading data from it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the load has finished, pkram_finish_load_obj() is to be called.
+ */
+int pkram_prepare_load_obj(struct pkram_stream *ps)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Finish the load of a preserved memory object started with
+ * pkram_prepare_load_obj() freeing the object and any data that has not
+ * been loaded from it.
+ */
+void pkram_finish_load_obj(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Finish the load from preserved memory started with pkram_prepare_load()
+ * freeing the corresponding preserved memory node and any data that has
+ * not been loaded from it.
+ */
+void pkram_finish_load(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Save page @page to the preserved memory node and object associated with
+ * stream @ps. The stream must have been initialized with pkram_prepare_save()
+ * and pkram_prepare_save_obj().
+ *
+ * @flags specifies supplemental page state to be preserved.
+ *
+ * Returns 0 on success, -errno on failure.
+ */
+int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Load the next page from the preserved memory node and object associated
+ * with stream @ps. The stream must have been initialized with
+ * pkram_prepare_load() and pkram_prepare_load_obj().
+ *
+ * If not NULL, @index is initialized with the preserved mapping offset of the
+ * page loaded.
+ * If not NULL, @flags is initialized with preserved supplemental state of the
+ * page loaded.
+ *
+ * Returns the page loaded or NULL if the node is empty.
+ *
+ * The page loaded has its refcount incremented.
+ */
+struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, short *flags)
+{
+	return NULL;
+}
+
+/**
+ * Copy @count bytes from @buf to the preserved memory node and object
+ * associated with stream @ps. The stream must have been initialized with
+ * pkram_prepare_save() and pkram_prepare_save_obj().
+ *
+ * On success, returns the number of bytes written, which is always equal to
+ * @count. On failure, -errno is returned.
+ */
+ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Copy up to @count bytes from the preserved memory node and object
+ * associated with stream @ps to @buf. The stream must have been initialized
+ * with pkram_prepare_load() and pkram_prepare_load_obj().
+ *
+ * Returns the number of bytes read, which may be less than @count if the node
+ * has fewer bytes available.
+ */
+size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count)
+{
+	return 0;
+}
-- 
2.13.3


Thread overview: 50+ messages
2020-05-07  0:41 [RFC 00/43] PKRAM: Preserved-over-Kexec RAM Anthony Yznaga
2020-05-07  0:41 ` Anthony Yznaga [this message]
2020-05-07  0:41 ` [RFC 02/43] mm: PKRAM: implement node load and save functions Anthony Yznaga
2020-05-07  0:41 ` [RFC 03/43] mm: PKRAM: implement object " Anthony Yznaga
2020-05-07  0:41 ` [RFC 04/43] mm: PKRAM: implement page stream operations Anthony Yznaga
2020-05-07  0:41 ` [RFC 05/43] mm: PKRAM: support preserving transparent hugepages Anthony Yznaga
2020-05-07  0:41 ` [RFC 06/43] mm: PKRAM: implement byte stream operations Anthony Yznaga
2020-05-07  0:41 ` [RFC 07/43] mm: PKRAM: link nodes by pfn before reboot Anthony Yznaga
2020-05-07  0:41 ` [RFC 08/43] mm: PKRAM: introduce super block Anthony Yznaga
2020-05-07  0:41 ` [RFC 09/43] PKRAM: build a physical mapping pagetable of pages to be preserved Anthony Yznaga
2020-05-07  0:41 ` [RFC 10/43] PKRAM: add code for walking the preserved pages pagetable Anthony Yznaga
2020-05-07  0:41 ` [RFC 11/43] PKRAM: pass the preserved pages pagetable to the next kernel Anthony Yznaga
2020-05-07  0:41 ` [RFC 12/43] mm: PKRAM: reserve preserved memory at boot Anthony Yznaga
2020-05-07  0:41 ` [RFC 13/43] mm: PKRAM: free preserved pages pagetable Anthony Yznaga
2020-05-07  0:41 ` [RFC 14/43] mm: memblock: PKRAM: prevent memblock resize from clobbering preserved pages Anthony Yznaga
2020-05-11 13:57   ` Mike Rapoport
2020-05-11 23:29     ` Anthony Yznaga
2020-05-07  0:41 ` [RFC 15/43] PKRAM: provide a way to ban pages from use by PKRAM Anthony Yznaga
2020-05-07  0:41 ` [RFC 16/43] kexec: PKRAM: prevent kexec clobbering preserved pages in some cases Anthony Yznaga
2020-05-07  0:41 ` [RFC 17/43] PKRAM: provide a way to check if a memory range has preserved pages Anthony Yznaga
2020-05-07  0:41 ` [RFC 18/43] kexec: PKRAM: avoid clobbering already " Anthony Yznaga
2020-05-07  0:41 ` [RFC 19/43] mm: PKRAM: allow preserved memory to be freed from userspace Anthony Yznaga
2020-05-07  0:41 ` [RFC 20/43] PKRAM: disable feature when running the kdump kernel Anthony Yznaga
2020-05-07  0:41 ` [RFC 21/43] x86/KASLR: PKRAM: support physical kaslr Anthony Yznaga
2020-05-07 17:51   ` Kees Cook
2020-05-07 18:41     ` Anthony Yznaga
2020-05-07  0:41 ` [RFC 22/43] mm: shmem: introduce shmem_insert_page Anthony Yznaga
2020-05-07  0:41 ` [RFC 23/43] mm: shmem: enable saving to PKRAM Anthony Yznaga
2020-05-07  0:41 ` [RFC 24/43] mm: shmem: prevent swapping of PKRAM-enabled tmpfs pages Anthony Yznaga
2020-05-07  0:41 ` [RFC 25/43] mm: shmem: specify the mm to use when inserting pages Anthony Yznaga
2020-05-07  0:41 ` [RFC 26/43] mm: shmem: when inserting, handle pages already charged to a memcg Anthony Yznaga
2020-05-07  0:41 ` [RFC 27/43] x86/mm/numa: add numa_isolate_memblocks() Anthony Yznaga
2020-05-07  0:41 ` [RFC 28/43] PKRAM: ensure memblocks with preserved pages init'd for numa Anthony Yznaga
2020-05-07  0:41 ` [RFC 29/43] memblock: PKRAM: mark memblocks that contain preserved pages Anthony Yznaga
2020-05-07  0:41 ` [RFC 30/43] memblock: add for_each_reserved_mem_range() Anthony Yznaga
2020-05-07  0:41 ` [RFC 31/43] memblock, mm: defer initialization of preserved pages Anthony Yznaga
2020-05-07  0:41 ` [RFC 32/43] shmem: PKRAM: preserve shmem files a chunk at a time Anthony Yznaga
2020-05-07  0:41 ` [RFC 33/43] PKRAM: atomically add and remove link pages Anthony Yznaga
2020-05-07  0:42 ` [RFC 34/43] shmem: PKRAM: multithread preserving and restoring shmem pages Anthony Yznaga
2020-05-07 16:30   ` Randy Dunlap
2020-05-07 17:59     ` Anthony Yznaga
2020-05-07  0:42 ` [RFC 35/43] shmem: introduce shmem_insert_pages() Anthony Yznaga
2020-05-07  0:42 ` [RFC 36/43] PKRAM: add support for loading pages in bulk Anthony Yznaga
2020-05-07  0:42 ` [RFC 37/43] shmem: PKRAM: enable bulk loading of preserved pages into shmem Anthony Yznaga
2020-05-07  0:42 ` [RFC 38/43] mm: implement splicing a list of pages to the LRU Anthony Yznaga
2020-05-07  0:42 ` [RFC 39/43] shmem: optimize adding pages to the LRU in shmem_insert_pages() Anthony Yznaga
2020-05-07  0:42 ` [RFC 40/43] shmem: initial support for adding multiple pages to pagecache Anthony Yznaga
2020-05-07  0:42 ` [RFC 41/43] XArray: add xas_export_node() and xas_import_node() Anthony Yznaga
2020-05-07  0:42 ` [RFC 42/43] shmem: reduce time holding xa_lock when inserting pages Anthony Yznaga
2020-05-07  0:42 ` [RFC 43/43] PKRAM: improve index alignment of pkram_link entries Anthony Yznaga
