LKML Archive on lore.kernel.org
 help / color / Atom feed
* Re: [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio
  2018-09-07 18:03 [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Zhang Yi
@ 2018-09-07 17:04 ` Ahmed S. Darwish
  2018-09-18 14:31   ` Yi Zhang
  2018-09-07 18:03 ` [PATCH V5 1/4] kvm: remove redundant reserved page check Zhang Yi
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Ahmed S. Darwish @ 2018-09-07 17:04 UTC (permalink / raw)
  To: Zhang Yi
  Cc: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch, linux-mm,
	rkrcmar, jglisse, yi.z.zhang

Hi!

On Sat, Sep 08, 2018 at 02:03:02AM +0800, Zhang Yi wrote:
[...]
>
> V1:
> https://lkml.org/lkml/2018/7/4/91
>
> V2:
> https://lkml.org/lkml/2018/7/10/135
>
> V3:
> https://lkml.org/lkml/2018/8/9/17
>
> V4:
> https://lkml.org/lkml/2018/8/22/17
>

Can we please avoid referencing "lkml.org"?

It's just an unreliable broken website. [1][2] Much more important
though is that its URLs _hide_ the Message-Id field; running the
threat of losing the e-mail reference forever at some point in the
future.

From Documentation/process/submitting-patches.rst:

    If the patch follows from a mailing list discussion, give a
    URL to the mailing list archive; use the https://lkml.kernel.org/
    redirector with a ``Message-Id``, to ensure that the links
    cannot become stale.

So the V1 link above should've been either:

    https://lore.kernel.org/lkml/cover.1530716899.git.yi.z.zhang@linux.intel.com

or:

    https://lkml.kernel.org/r/cover.1530716899.git.yi.z.zhang@linux.intel.com

and so on..

Thanks,

[1] https://www.theregister.co.uk/2018/01/14/linux_kernel_mailing_list_archives_will_return_soon
[2] The threading interface is also broken and in a lot of cases
    does not show all messages in a thread

--
Darwi
http://darwish.chasingpointers.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio
@ 2018-09-07 18:03 Zhang Yi
  2018-09-07 17:04 ` Ahmed S. Darwish
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Zhang Yi @ 2018-09-07 18:03 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch
  Cc: linux-mm, rkrcmar, jglisse, yi.z.zhang, Zhang Yi

For device specific memory space, when we move these area of pfn to
memory zone, we will set the page reserved flag at that time, some of
these reserved for device mmio, and some of these are not, such as
NVDIMM pmem.

Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
backend, since these pages are reserved. the check of
kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
to indentify these pages are from NVDIMM pmem. and let kvm treat these
as normal pages.

Without this patch, Many operations will be missed due to this
mistreatment to pmem pages. For example, a page may not have chance to
be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.

V1:
https://lkml.org/lkml/2018/7/4/91

V2:
https://lkml.org/lkml/2018/7/10/135

V3:
https://lkml.org/lkml/2018/8/9/17

V4:
https://lkml.org/lkml/2018/8/22/17

V5:
[PATCH V3 1/4] Reviewed-by: David / Acked-by: Pankaj
[PATCH V3 2/4] Reviewed-by: Jan
[PATCH V3 3/4] Acked-by: Jan
[PATCH V3 4/4] Added "Acked-by: Pankaj", Added in-line comments: Dave

Zhang Yi (4):
  kvm: remove redundant reserved page check
  mm: introduce memory type MEMORY_DEVICE_DEV_DAX
  mm: add a function to differentiate the pages is from DAX device
    memory
  kvm: add a check if pfn is from NVDIMM pmem.

 drivers/dax/pmem.c       |  1 +
 include/linux/memremap.h |  8 ++++++++
 include/linux/mm.h       | 12 ++++++++++++
 virt/kvm/kvm_main.c      | 24 ++++++++++++++++--------
 4 files changed, 37 insertions(+), 8 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V5 1/4] kvm: remove redundant reserved page check
  2018-09-07 18:03 [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Zhang Yi
  2018-09-07 17:04 ` Ahmed S. Darwish
@ 2018-09-07 18:03 ` Zhang Yi
  2018-10-24 14:32   ` Yi Zhang
  2018-09-07 18:03 ` [PATCH V5 2/4] mm: introduce memory type MEMORY_DEVICE_DEV_DAX Zhang Yi
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2018-09-07 18:03 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch
  Cc: linux-mm, rkrcmar, jglisse, yi.z.zhang, Zhang Yi

PageReserved() is already checked inside kvm_is_reserved_pfn(),
remove it from kvm_set_pfn_dirty().

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yu <yu.c.zhang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Pankaj Gupta <pagupta@redhat.com>
---
 virt/kvm/kvm_main.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b47507f..c44c406 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1690,12 +1690,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
 
 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn)) {
-		struct page *page = pfn_to_page(pfn);
-
-		if (!PageReserved(page))
-			SetPageDirty(page);
-	}
+	if (!kvm_is_reserved_pfn(pfn))
+		SetPageDirty(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V5 2/4] mm: introduce memory type MEMORY_DEVICE_DEV_DAX
  2018-09-07 18:03 [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Zhang Yi
  2018-09-07 17:04 ` Ahmed S. Darwish
  2018-09-07 18:03 ` [PATCH V5 1/4] kvm: remove redundant reserved page check Zhang Yi
@ 2018-09-07 18:03 ` Zhang Yi
  2018-09-07 18:03 ` [PATCH V5 3/4] mm: add a function to differentiate the pages is from DAX device memory Zhang Yi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2018-09-07 18:03 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch
  Cc: linux-mm, rkrcmar, jglisse, yi.z.zhang, Zhang Yi

Currently, NVDIMM pages will be marked 'PageReserved'. However, unlike
other reserved PFNs, pages on NVDIMM shall still behave like normal ones
in many cases, i.e. when used as backend memory of KVM guest. This patch
introduces a new memory type, MEMORY_DEVICE_DEV_DAX. And set this flag
while dax driver hotplug the device memory.

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yu <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/dax/pmem.c       | 1 +
 include/linux/memremap.h | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index fd49b24..fb3f363 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -111,6 +111,7 @@ static int dax_pmem_probe(struct device *dev)
 		return rc;
 
 	dax_pmem->pgmap.ref = &dax_pmem->ref;
+	dax_pmem->pgmap.type = MEMORY_DEVICE_DEV_DAX;
 	addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f91f9e7..cd07ca8 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -53,11 +53,19 @@ struct vmem_altmap {
  * wakeup event whenever a page is unpinned and becomes idle. This
  * wakeup is used to coordinate physical address space management (ex:
  * fs truncate/hole punch) vs pinned pages (ex: device dma).
+ *
+ * MEMORY_DEVICE_DEV_DAX:
+ * Device memory that support raw access to persistent memory. Without need
+ * of an intervening filesystem, it could be directed mapped via an mmap
+ * capable character device. Together with the type MEMORY_DEVICE_FS_DAX,
+ * we could distinguish the persistent memory pages from normal ZONE_DEVICE
+ * pages.
  */
 enum memory_type {
 	MEMORY_DEVICE_PRIVATE = 1,
 	MEMORY_DEVICE_PUBLIC,
 	MEMORY_DEVICE_FS_DAX,
+	MEMORY_DEVICE_DEV_DAX,
 };
 
 /*
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V5 3/4] mm: add a function to differentiate the pages is from DAX device memory
  2018-09-07 18:03 [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Zhang Yi
                   ` (2 preceding siblings ...)
  2018-09-07 18:03 ` [PATCH V5 2/4] mm: introduce memory type MEMORY_DEVICE_DEV_DAX Zhang Yi
@ 2018-09-07 18:03 ` Zhang Yi
  2018-09-07 18:04 ` [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem Zhang Yi
  2018-09-19 10:55 ` [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Yi Zhang
  5 siblings, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2018-09-07 18:03 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch
  Cc: linux-mm, rkrcmar, jglisse, yi.z.zhang, Zhang Yi

DAX driver hotplug the device memory and move it to memory zone, these
pages will be marked reserved flag, however, some other kernel componet
will misconceive these pages are reserved mmio (ex: we map these dev_dax
or fs_dax pages to kvm for DIMM/NVDIMM backend). Together with the type
MEMORY_DEVICE_FS_DAX, we can use is_dax_page() to differentiate the pages
is DAX device memory or not.

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yu <yu.c.zhang@linux.intel.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 68a5121..de5cbc3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -889,6 +889,13 @@ static inline bool is_device_public_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PUBLIC;
 }
 
+static inline bool is_dax_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		(page->pgmap->type == MEMORY_DEVICE_FS_DAX ||
+		page->pgmap->type == MEMORY_DEVICE_DEV_DAX);
+}
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
 static inline void dev_pagemap_get_ops(void)
 {
@@ -912,6 +919,11 @@ static inline bool is_device_public_page(const struct page *page)
 {
 	return false;
 }
+
+static inline bool is_dax_page(const struct page *page)
+{
+	return false;
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline void get_page(struct page *page)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-07 18:03 [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Zhang Yi
                   ` (3 preceding siblings ...)
  2018-09-07 18:03 ` [PATCH V5 3/4] mm: add a function to differentiate the pages is from DAX device memory Zhang Yi
@ 2018-09-07 18:04 ` Zhang Yi
  2018-09-19  2:53   ` Dan Williams
  2018-09-19 10:55 ` [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Yi Zhang
  5 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2018-09-07 18:04 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch
  Cc: linux-mm, rkrcmar, jglisse, yi.z.zhang, Zhang Yi

For device specific memory space, when we move these area of pfn to
memory zone, we will set the page reserved flag at that time, some of
these reserved for device mmio, and some of these are not, such as
NVDIMM pmem.

Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
backend, since these pages are reserved, the check of
kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
to identify these pages are from NVDIMM pmem and let kvm treat these
as normal pages.

Without this patch, many operations will be missed due to this
mistreatment to pmem pages, for example, a page may not have chance to
be unpinned for KVM guest(in kvm_release_pfn_clean), not able to be
marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Acked-by: Pankaj Gupta <pagupta@redhat.com>
---
 virt/kvm/kvm_main.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c44c406..9c49634 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -147,8 +147,20 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+	struct page *page;
+
+	if (pfn_valid(pfn)) {
+		page = pfn_to_page(pfn);
+
+		/*
+		 * For device specific memory space, there is a case
+		 * which we need pass MEMORY_DEVICE_FS[DEV]_DAX pages
+		 * to kvm, these pages marked reserved flag as it is a
+		 * zone device memory, we need to identify these pages
+		 * and let kvm treat these as normal pages
+		 */
+		return PageReserved(page) && !is_dax_page(page);
+	}
 
 	return true;
 }
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio
  2018-09-07 17:04 ` Ahmed S. Darwish
@ 2018-09-18 14:31   ` Yi Zhang
  0 siblings, 0 replies; 20+ messages in thread
From: Yi Zhang @ 2018-09-18 14:31 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch, linux-mm,
	rkrcmar, jglisse, yi.z.zhang

Thanks Darwi's remind, Will follow that next time.

Thanks.
Yi

On 2018-09-07 at 17:04:51 +0000, Ahmed S. Darwish wrote:
> Hi!
> 
> On Sat, Sep 08, 2018 at 02:03:02AM +0800, Zhang Yi wrote:
> [...]
> >
> > V1:
> > https://lkml.org/lkml/2018/7/4/91
> >
> > V2:
> > https://lkml.org/lkml/2018/7/10/135
> >
> > V3:
> > https://lkml.org/lkml/2018/8/9/17
> >
> > V4:
> > https://lkml.org/lkml/2018/8/22/17
> >
> 
> Can we please avoid referencing "lkml.org"?
> 
> It's just an unreliable broken website. [1][2] Much more important
> though is that its URLs _hide_ the Message-Id field; running the
> threat of losing the e-mail reference forever at some point in the
> future.
> 
> From Documentation/process/submitting-patches.rst:
> 
>     If the patch follows from a mailing list discussion, give a
>     URL to the mailing list archive; use the https://lkml.kernel.org/
>     redirector with a ``Message-Id``, to ensure that the links
>     cannot become stale.
> 
> So the V1 link above should've been either:
> 
>     https://lore.kernel.org/lkml/cover.1530716899.git.yi.z.zhang@linux.intel.com
> 
> or:
> 
>     https://lkml.kernel.org/r/cover.1530716899.git.yi.z.zhang@linux.intel.com
> 
> and so on..
> 
> Thanks,
> 
> [1] https://www.theregister.co.uk/2018/01/14/linux_kernel_mailing_list_archives_will_return_soon
> [2] The threading interface is also broken and in a lot of cases
>     does not show all messages in a thread
> 
> --
> Darwi
> http://darwish.chasingpointers.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio
  2018-09-19 10:55 ` [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Yi Zhang
@ 2018-09-19  2:43   ` Pankaj Gupta
  0 siblings, 0 replies; 20+ messages in thread
From: Pankaj Gupta @ 2018-09-19  2:43 UTC (permalink / raw)
  To: Yi Zhang
  Cc: kvm, linux-kernel, linux-nvdimm, pbonzini, dan j williams,
	dave jiang, yu c zhang, david, jack, hch, linux-mm, rkrcmar,
	jglisse, yi z zhang


Hello Yi,

> Any comments?
> 
> Hi Pankaj and Paolo,

I am just helping with the review. Paolo & Dan probably will decide.

Thanks,
Pankaj

> 
> Can we Queue this to merge list since there no other comments last 2
> weeks?
> 
> Regards
> Yi.
> 
> On 2018-09-08 at 02:03:02 +0800, Zhang Yi wrote:
> > For device specific memory space, when we move these area of pfn to
> > memory zone, we will set the page reserved flag at that time, some of
> > these reserved for device mmio, and some of these are not, such as
> > NVDIMM pmem.
> > 
> > Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> > backend, since these pages are reserved. the check of
> > kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> > introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> > to indentify these pages are from NVDIMM pmem. and let kvm treat these
> > as normal pages.
> > 
> > Without this patch, Many operations will be missed due to this
> > mistreatment to pmem pages. For example, a page may not have chance to
> > be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> > marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
> > 
> > V1:
> > https://lkml.org/lkml/2018/7/4/91
> > 
> > V2:
> > https://lkml.org/lkml/2018/7/10/135
> > 
> > V3:
> > https://lkml.org/lkml/2018/8/9/17
> > 
> > V4:
> > https://lkml.org/lkml/2018/8/22/17
> > 
> > V5:
> > [PATCH V3 1/4] Reviewed-by: David / Acked-by: Pankaj
> > [PATCH V3 2/4] Reviewed-by: Jan
> > [PATCH V3 3/4] Acked-by: Jan
> > [PATCH V3 4/4] Added "Acked-by: Pankaj", Added in-line comments: Dave
> > 
> > Zhang Yi (4):
> >   kvm: remove redundant reserved page check
> >   mm: introduce memory type MEMORY_DEVICE_DEV_DAX
> >   mm: add a function to differentiate the pages is from DAX device
> >     memory
> >   kvm: add a check if pfn is from NVDIMM pmem.
> > 
> >  drivers/dax/pmem.c       |  1 +
> >  include/linux/memremap.h |  8 ++++++++
> >  include/linux/mm.h       | 12 ++++++++++++
> >  virt/kvm/kvm_main.c      | 24 ++++++++++++++++--------
> >  4 files changed, 37 insertions(+), 8 deletions(-)
> > 
> > --
> > 2.7.4
> > 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-07 18:04 ` [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem Zhang Yi
@ 2018-09-19  2:53   ` Dan Williams
  2018-09-19  7:20     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Dan Williams @ 2018-09-19  2:53 UTC (permalink / raw)
  To: Zhang Yi
  Cc: KVM list, Linux Kernel Mailing List, linux-nvdimm, Paolo Bonzini,
	Dave Jiang, Zhang, Yu C, Pankaj Gupta, David Hildenbrand,
	Jan Kara, Christoph Hellwig, Linux MM, rkrcmar,
	Jérôme Glisse, Zhang, Yi Z

On Fri, Sep 7, 2018 at 2:25 AM Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
>
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
>
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved, the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to identify these pages are from NVDIMM pmem and let kvm treat these
> as normal pages.
>
> Without this patch, many operations will be missed due to this
> mistreatment to pmem pages, for example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean), not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Acked-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  virt/kvm/kvm_main.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c44c406..9c49634 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -147,8 +147,20 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>
>  bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
>  {
> -       if (pfn_valid(pfn))
> -               return PageReserved(pfn_to_page(pfn));
> +       struct page *page;
> +
> +       if (pfn_valid(pfn)) {
> +               page = pfn_to_page(pfn);
> +
> +               /*
> +                * For device specific memory space, there is a case
> +                * which we need pass MEMORY_DEVICE_FS[DEV]_DAX pages
> +                * to kvm, these pages marked reserved flag as it is a
> +                * zone device memory, we need to identify these pages
> +                * and let kvm treat these as normal pages
> +                */
> +               return PageReserved(page) && !is_dax_page(page);

Should we consider just not setting PageReserved for
devm_memremap_pages()? Perhaps kvm is not be the only component making
these assumptions about this flag?

Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?

This has less to do with "dax" pages and more to do with
devm_memremap_pages() established ranges. P2PDMA is another producer
of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
used in these kvm paths then I think this points to consider clearing
the Reserved flag.

That said I haven't audited all the locations that test PageReserved().

Sorry for not responding sooner I was on extended leave.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-19  2:53   ` Dan Williams
@ 2018-09-19  7:20     ` David Hildenbrand
  2018-09-20 22:49       ` Yi Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-09-19  7:20 UTC (permalink / raw)
  To: Dan Williams, Zhang Yi
  Cc: KVM list, Linux Kernel Mailing List, linux-nvdimm, Paolo Bonzini,
	Dave Jiang, Zhang, Yu C, Pankaj Gupta, Jan Kara,
	Christoph Hellwig, Linux MM, rkrcmar, Jérôme Glisse,
	Zhang, Yi Z

Am 19.09.18 um 04:53 schrieb Dan Williams:
> On Fri, Sep 7, 2018 at 2:25 AM Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
>>
>> For device specific memory space, when we move these area of pfn to
>> memory zone, we will set the page reserved flag at that time, some of
>> these reserved for device mmio, and some of these are not, such as
>> NVDIMM pmem.
>>
>> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
>> backend, since these pages are reserved, the check of
>> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
>> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
>> to identify these pages are from NVDIMM pmem and let kvm treat these
>> as normal pages.
>>
>> Without this patch, many operations will be missed due to this
>> mistreatment to pmem pages, for example, a page may not have chance to
>> be unpinned for KVM guest(in kvm_release_pfn_clean), not able to be
>> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>>
>> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
>> Acked-by: Pankaj Gupta <pagupta@redhat.com>
>> ---
>>  virt/kvm/kvm_main.c | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index c44c406..9c49634 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -147,8 +147,20 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>>
>>  bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
>>  {
>> -       if (pfn_valid(pfn))
>> -               return PageReserved(pfn_to_page(pfn));
>> +       struct page *page;
>> +
>> +       if (pfn_valid(pfn)) {
>> +               page = pfn_to_page(pfn);
>> +
>> +               /*
>> +                * For device specific memory space, there is a case
>> +                * which we need pass MEMORY_DEVICE_FS[DEV]_DAX pages
>> +                * to kvm, these pages marked reserved flag as it is a
>> +                * zone device memory, we need to identify these pages
>> +                * and let kvm treat these as normal pages
>> +                */
>> +               return PageReserved(page) && !is_dax_page(page);
> 
> Should we consider just not setting PageReserved for
> devm_memremap_pages()? Perhaps kvm is not be the only component making
> these assumptions about this flag?

I was asking the exact same question in v3 or so.

I was recently going through all PageReserved users, trying to clean up
and document how it is used.

PG_reserved used to be a marker "not available for the page allocator".
This is only partially true and not really helpful I think. My current
understanding:

"
PG_reserved is set for special pages, struct pages of such pages should
in general not be touched except by their owner. Pages marked as
reserved include:
- Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
- Pages allocated early during boot (bootmem, memblock)
- Zero pages
- Pages that have been associated with a zone but were not onlined
  (e.g. NVDIMM/pmem, online_page_callback used by XEN)
- Pages to exclude from the hibernation image (e.g. loaded kexec images)
- MCA (memory error) pages on ia64
- Offline pages
Some architectures don't allow to ioremap RAM pages that are not marked
as reserved. Allocated pages might have to be set reserved to allow for
that - if there is a good reason to enforce this. Consequently,
PG_reserved part of a user space table might be the indicator for the
zero page, pmem or MMIO pages.
"

Swapping code does not care about PageReserved at all as far as I
remember. This seems to be fine as it only looks at the way pages have
been mapped into user space.

I don't really see a good reason to set pmem pages as reserved. One
question would be, how/if to exclude them from the hibernation image.
But that could also be solved differently (we would have to double check
how they are handled in hibernation code).


A similar user of PageReserved to look at is:

drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn()

It will not mark pages dirty if they are reserved. Similar to KVM code.

> 
> Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?
> 
> This has less to do with "dax" pages and more to do with
> devm_memremap_pages() established ranges. P2PDMA is another producer
> of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
> used in these kvm paths then I think this points to consider clearing
> the Reserved flag.
> 
> That said I haven't audited all the locations that test PageReserved().
> 
> Sorry for not responding sooner I was on extended leave.
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio
  2018-09-07 18:03 [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Zhang Yi
                   ` (4 preceding siblings ...)
  2018-09-07 18:04 ` [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem Zhang Yi
@ 2018-09-19 10:55 ` Yi Zhang
  2018-09-19  2:43   ` Pankaj Gupta
  5 siblings, 1 reply; 20+ messages in thread
From: Yi Zhang @ 2018-09-19 10:55 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-nvdimm, pbonzini, dan.j.williams,
	dave.jiang, yu.c.zhang, pagupta, david, jack, hch
  Cc: linux-mm, rkrcmar, jglisse, yi.z.zhang

Any comments?

Hi Pankaj and Paolo,

Can we Queue this to merge list since there no other comments last 2
weeks?

Regards
Yi.

On 2018-09-08 at 02:03:02 +0800, Zhang Yi wrote:
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
> 
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved. the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to indentify these pages are from NVDIMM pmem. and let kvm treat these
> as normal pages.
> 
> Without this patch, Many operations will be missed due to this
> mistreatment to pmem pages. For example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean); not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
> 
> V1:
> https://lkml.org/lkml/2018/7/4/91
> 
> V2:
> https://lkml.org/lkml/2018/7/10/135
> 
> V3:
> https://lkml.org/lkml/2018/8/9/17
> 
> V4:
> https://lkml.org/lkml/2018/8/22/17
> 
> V5:
> [PATCH V3 1/4] Reviewed-by: David / Acked-by: Pankaj
> [PATCH V3 2/4] Reviewed-by: Jan
> [PATCH V3 3/4] Acked-by: Jan
> [PATCH V3 4/4] Added "Acked-by: Pankaj", Added in-line comments: Dave
> 
> Zhang Yi (4):
>   kvm: remove redundant reserved page check
>   mm: introduce memory type MEMORY_DEVICE_DEV_DAX
>   mm: add a function to differentiate the pages is from DAX device
>     memory
>   kvm: add a check if pfn is from NVDIMM pmem.
> 
>  drivers/dax/pmem.c       |  1 +
>  include/linux/memremap.h |  8 ++++++++
>  include/linux/mm.h       | 12 ++++++++++++
>  virt/kvm/kvm_main.c      | 24 ++++++++++++++++--------
>  4 files changed, 37 insertions(+), 8 deletions(-)
> 
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-20 22:49       ` Yi Zhang
@ 2018-09-20 21:19         ` Dan Williams
  2018-09-21 22:47           ` Yi Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Dan Williams @ 2018-09-20 21:19 UTC (permalink / raw)
  To: David Hildenbrand, KVM list, Linux Kernel Mailing List,
	linux-nvdimm, Paolo Bonzini, Dave Jiang, Zhang, Yu C,
	Pankaj Gupta, Jan Kara, Christoph Hellwig, Linux MM, rkrcmar,
	Jérôme Glisse, Zhang, Yi Z

On Thu, Sep 20, 2018 at 7:11 AM Yi Zhang <yi.z.zhang@linux.intel.com> wrote:
>
> On 2018-09-19 at 09:20:25 +0200, David Hildenbrand wrote:
> > Am 19.09.18 um 04:53 schrieb Dan Williams:
> > >
> > > Should we consider just not setting PageReserved for
> > > devm_memremap_pages()? Perhaps kvm is not be the only component making
> > > these assumptions about this flag?
> >
> > I was asking the exact same question in v3 or so.
> >
> > I was recently going through all PageReserved users, trying to clean up
> > and document how it is used.
> >
> > PG_reserved used to be a marker "not available for the page allocator".
> > This is only partially true and not really helpful I think. My current
> > understanding:
> >
> > "
> > PG_reserved is set for special pages, struct pages of such pages should
> > in general not be touched except by their owner. Pages marked as
> > reserved include:
> > - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
> > - Pages allocated early during boot (bootmem, memblock)
> > - Zero pages
> > - Pages that have been associated with a zone but were not onlined
> >   (e.g. NVDIMM/pmem, online_page_callback used by XEN)
> > - Pages to exclude from the hibernation image (e.g. loaded kexec images)
> > - MCA (memory error) pages on ia64
> > - Offline pages
> > Some architectures don't allow to ioremap RAM pages that are not marked
> > as reserved. Allocated pages might have to be set reserved to allow for
> > that - if there is a good reason to enforce this. Consequently,
> > PG_reserved part of a user space table might be the indicator for the
> > zero page, pmem or MMIO pages.
> > "
> >
> > Swapping code does not care about PageReserved at all as far as I
> > remember. This seems to be fine as it only looks at the way pages have
> > been mapped into user space.
> >
> > I don't really see a good reason to set pmem pages as reserved. One
> > question would be, how/if to exclude them from the hibernation image.
> > But that could also be solved differently (we would have to double check
> > how they are handled in hibernation code).
> >
> >
> > A similar user of PageReserved to look at is:
> >
> > drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn()
> >
> > It will not mark pages dirty if they are reserved. Similar to KVM code.
> Yes, kvm is not the only one user of the dax reserved page.
> >
> > >
> > > Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?
> > >
> > > This has less to do with "dax" pages and more to do with
> > > devm_memremap_pages() established ranges. P2PDMA is another producer
> > > of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
> > > used in these kvm paths then I think this points to consider clearing
> > > the Reserved flag.
>
> Thanks Dan/David's comments.
> for MEMORY_DEVICE_PUBLIC memory, since host driver could manager the
> memory resource to share to guest, Jerome says we could ignore it at
> this time.
>
> And p2pmem, it seems mapped in a PCI bar space which should most likely
> a mmio. I think kvm should treated as a reserved page.

Ok, but the question you left unanswered is whether it would be better
for devm_memremap_pages() to clear the PageReserved flag for
MEMORY_DEVICE_{FS,DEV}_DAX rather than introduce a local kvm-only hack
for what looks like a global problem.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-19  7:20     ` David Hildenbrand
@ 2018-09-20 22:49       ` Yi Zhang
  2018-09-20 21:19         ` Dan Williams
  0 siblings, 1 reply; 20+ messages in thread
From: Yi Zhang @ 2018-09-20 22:49 UTC (permalink / raw)
  To: David Hildenbrand, Dan Williams
  Cc: KVM list, Linux Kernel Mailing List, linux-nvdimm, Paolo Bonzini,
	Dave Jiang, Zhang, Yu C, Pankaj Gupta, Jan Kara,
	Christoph Hellwig, Linux MM, rkrcmar, Jérôme Glisse,
	Zhang, Yi Z

On 2018-09-19 at 09:20:25 +0200, David Hildenbrand wrote:
> Am 19.09.18 um 04:53 schrieb Dan Williams:
> > 
> > Should we consider just not setting PageReserved for
> > devm_memremap_pages()? Perhaps kvm is not be the only component making
> > these assumptions about this flag?
> 
> I was asking the exact same question in v3 or so.
> 
> I was recently going through all PageReserved users, trying to clean up
> and document how it is used.
> 
> PG_reserved used to be a marker "not available for the page allocator".
> This is only partially true and not really helpful I think. My current
> understanding:
> 
> "
> PG_reserved is set for special pages, struct pages of such pages should
> in general not be touched except by their owner. Pages marked as
> reserved include:
> - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
> - Pages allocated early during boot (bootmem, memblock)
> - Zero pages
> - Pages that have been associated with a zone but were not onlined
>   (e.g. NVDIMM/pmem, online_page_callback used by XEN)
> - Pages to exclude from the hibernation image (e.g. loaded kexec images)
> - MCA (memory error) pages on ia64
> - Offline pages
> Some architectures don't allow to ioremap RAM pages that are not marked
> as reserved. Allocated pages might have to be set reserved to allow for
> that - if there is a good reason to enforce this. Consequently,
> PG_reserved part of a user space table might be the indicator for the
> zero page, pmem or MMIO pages.
> "
> 
> Swapping code does not care about PageReserved at all as far as I
> remember. This seems to be fine as it only looks at the way pages have
> been mapped into user space.
> 
> I don't really see a good reason to set pmem pages as reserved. One
> question would be, how/if to exclude them from the hibernation image.
> But that could also be solved differently (we would have to double check
> how they are handled in hibernation code).
> 
> 
> A similar user of PageReserved to look at is:
> 
> drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn()
> 
> It will not mark pages dirty if they are reserved. Similar to KVM code.
Yes, kvm is not the only one user of the dax reserved page. 
> 
> > 
> > Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?
> > 
> > This has less to do with "dax" pages and more to do with
> > devm_memremap_pages() established ranges. P2PDMA is another producer
> > of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
> > used in these kvm paths then I think this points to consider clearing
> > the Reserved flag.

Thanks Dan/David's comments.
for MEMORY_DEVICE_PUBLIC memory, since host driver could manager the
memory resource to share to guest, Jerome says we could ignore it at
this time.

And p2pmem, it seems mapped in a PCI bar space which should most likely
a mmio. I think kvm should treated as a reserved page.
> > 
> > That said I haven't audited all the locations that test PageReserved().
> > 
> > Sorry for not responding sooner I was on extended leave.
> > 
> 
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-21 22:47           ` Yi Zhang
@ 2018-09-21 14:23             ` David Hildenbrand
  2018-09-21 18:17               ` Dan Williams
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-09-21 14:23 UTC (permalink / raw)
  To: Dan Williams, KVM list, Linux Kernel Mailing List, linux-nvdimm,
	Paolo Bonzini, Dave Jiang, Zhang, Yu C, Pankaj Gupta, Jan Kara,
	Christoph Hellwig, Linux MM, rkrcmar, Jérôme Glisse,
	Zhang, Yi Z

On 22/09/2018 00:47, Yi Zhang wrote:
> On 2018-09-20 at 14:19:17 -0700, Dan Williams wrote:
>> On Thu, Sep 20, 2018 at 7:11 AM Yi Zhang <yi.z.zhang@linux.intel.com> wrote:
>>>
>>> On 2018-09-19 at 09:20:25 +0200, David Hildenbrand wrote:
>>>> Am 19.09.18 um 04:53 schrieb Dan Williams:
>>>>>
>>>>> Should we consider just not setting PageReserved for
>>>>> devm_memremap_pages()? Perhaps kvm is not be the only component making
>>>>> these assumptions about this flag?
>>>>
>>>> I was asking the exact same question in v3 or so.
>>>>
>>>> I was recently going through all PageReserved users, trying to clean up
>>>> and document how it is used.
>>>>
>>>> PG_reserved used to be a marker "not available for the page allocator".
>>>> This is only partially true and not really helpful I think. My current
>>>> understanding:
>>>>
>>>> "
>>>> PG_reserved is set for special pages, struct pages of such pages should
>>>> in general not be touched except by their owner. Pages marked as
>>>> reserved include:
>>>> - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
>>>> - Pages allocated early during boot (bootmem, memblock)
>>>> - Zero pages
>>>> - Pages that have been associated with a zone but were not onlined
>>>>   (e.g. NVDIMM/pmem, online_page_callback used by XEN)
>>>> - Pages to exclude from the hibernation image (e.g. loaded kexec images)
>>>> - MCA (memory error) pages on ia64
>>>> - Offline pages
>>>> Some architectures don't allow to ioremap RAM pages that are not marked
>>>> as reserved. Allocated pages might have to be set reserved to allow for
>>>> that - if there is a good reason to enforce this. Consequently,
>>>> PG_reserved part of a user space table might be the indicator for the
>>>> zero page, pmem or MMIO pages.
>>>> "
>>>>
>>>> Swapping code does not care about PageReserved at all as far as I
>>>> remember. This seems to be fine as it only looks at the way pages have
>>>> been mapped into user space.
>>>>
>>>> I don't really see a good reason to set pmem pages as reserved. One
>>>> question would be, how/if to exclude them from the hibernation image.
>>>> But that could also be solved differently (we would have to double check
>>>> how they are handled in hibernation code).
>>>>
>>>>
>>>> A similar user of PageReserved to look at is:
>>>>
>>>> drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn()
>>>>
>>>> It will not mark pages dirty if they are reserved. Similar to KVM code.
>>> Yes, kvm is not the only one user of the dax reserved page.
>>>>
>>>>>
>>>>> Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?
>>>>>
>>>>> This has less to do with "dax" pages and more to do with
>>>>> devm_memremap_pages() established ranges. P2PDMA is another producer
>>>>> of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
>>>>> used in these kvm paths then I think this points to consider clearing
>>>>> the Reserved flag.
>>>
>>> Thanks Dan/David's comments.
>>> for MEMORY_DEVICE_PUBLIC memory, since host driver could manager the
>>> memory resource to share to guest, Jerome says we could ignore it at
>>> this time.
>>>
>>> And p2pmem, it seems mapped in a PCI bar space which should most likely
>>> a mmio. I think kvm should treated as a reserved page.
>>
>> Ok, but the question you left unanswered is whether it would be better
>> for devm_memremap_pages() to clear the PageReserved flag for
>> MEMORY_DEVICE_{FS,DEV}_DAX rather than introduce a local kvm-only hack
>> for what looks like a global problem.
> 
> Remove the PageReserved flag sounds more reasonable. 
> And Could we still have a flag to identify it is a device private memory, or
> where these pages coming from?

We could use a page type for that or what you proposed. (as I said, we
might have to change hibernation code to skip the pages once we drop the
reserved flag).

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-21 14:23             ` David Hildenbrand
@ 2018-09-21 18:17               ` Dan Williams
  2018-09-21 19:29                 ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Dan Williams @ 2018-09-21 18:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: KVM list, Linux Kernel Mailing List, linux-nvdimm, Paolo Bonzini,
	Dave Jiang, Zhang, Yu C, Pankaj Gupta, Jan Kara,
	Christoph Hellwig, Linux MM, rkrcmar, Jérôme Glisse,
	Zhang, Yi Z

On Fri, Sep 21, 2018 at 7:24 AM David Hildenbrand <david@redhat.com> wrote:
[..]
> > Remove the PageReserved flag sounds more reasonable.
> > And Could we still have a flag to identify it is a device private memory, or
> > where these pages coming from?
>
> We could use a page type for that or what you proposed. (as I said, we
> might have to change hibernation code to skip the pages once we drop the
> reserved flag).

I think it would be reasonable to reject all ZONE_DEVICE pages in
saveable_page().

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-21 18:17               ` Dan Williams
@ 2018-09-21 19:29                 ` David Hildenbrand
  2018-10-19 16:33                   ` Barret Rhoden
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2018-09-21 19:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: KVM list, Linux Kernel Mailing List, linux-nvdimm, Paolo Bonzini,
	Dave Jiang, Zhang, Yu C, Pankaj Gupta, Jan Kara,
	Christoph Hellwig, Linux MM, rkrcmar, Jérôme Glisse,
	Zhang, Yi Z

On 21/09/2018 20:17, Dan Williams wrote:
> On Fri, Sep 21, 2018 at 7:24 AM David Hildenbrand <david@redhat.com> wrote:
> [..]
>>> Remove the PageReserved flag sounds more reasonable.
>>> And Could we still have a flag to identify it is a device private memory, or
>>> where these pages coming from?
>>
>> We could use a page type for that or what you proposed. (as I said, we
>> might have to change hibernation code to skip the pages once we drop the
>> reserved flag).
> 
> I think it would be reasonable to reject all ZONE_DEVICE pages in
> saveable_page().
> 

Indeed, that sounds like the easiest solution - guess that answer was
too easy for me to figure out :) .

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-20 21:19         ` Dan Williams
@ 2018-09-21 22:47           ` Yi Zhang
  2018-09-21 14:23             ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Yi Zhang @ 2018-09-21 22:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: David Hildenbrand, KVM list, Linux Kernel Mailing List,
	linux-nvdimm, Paolo Bonzini, Dave Jiang, Zhang, Yu C,
	Pankaj Gupta, Jan Kara, Christoph Hellwig, Linux MM, rkrcmar,
	Jérôme Glisse, Zhang, Yi Z

On 2018-09-20 at 14:19:17 -0700, Dan Williams wrote:
> On Thu, Sep 20, 2018 at 7:11 AM Yi Zhang <yi.z.zhang@linux.intel.com> wrote:
> >
> > On 2018-09-19 at 09:20:25 +0200, David Hildenbrand wrote:
> > > Am 19.09.18 um 04:53 schrieb Dan Williams:
> > > >
> > > > Should we consider just not setting PageReserved for
> > > > devm_memremap_pages()? Perhaps kvm is not be the only component making
> > > > these assumptions about this flag?
> > >
> > > I was asking the exact same question in v3 or so.
> > >
> > > I was recently going through all PageReserved users, trying to clean up
> > > and document how it is used.
> > >
> > > PG_reserved used to be a marker "not available for the page allocator".
> > > This is only partially true and not really helpful I think. My current
> > > understanding:
> > >
> > > "
> > > PG_reserved is set for special pages, struct pages of such pages should
> > > in general not be touched except by their owner. Pages marked as
> > > reserved include:
> > > - Kernel image (including vDSO) and similar (e.g. BIOS, initrd)
> > > - Pages allocated early during boot (bootmem, memblock)
> > > - Zero pages
> > > - Pages that have been associated with a zone but were not onlined
> > >   (e.g. NVDIMM/pmem, online_page_callback used by XEN)
> > > - Pages to exclude from the hibernation image (e.g. loaded kexec images)
> > > - MCA (memory error) pages on ia64
> > > - Offline pages
> > > Some architectures don't allow to ioremap RAM pages that are not marked
> > > as reserved. Allocated pages might have to be set reserved to allow for
> > > that - if there is a good reason to enforce this. Consequently,
> > > PG_reserved part of a user space table might be the indicator for the
> > > zero page, pmem or MMIO pages.
> > > "
> > >
> > > Swapping code does not care about PageReserved at all as far as I
> > > remember. This seems to be fine as it only looks at the way pages have
> > > been mapped into user space.
> > >
> > > I don't really see a good reason to set pmem pages as reserved. One
> > > question would be, how/if to exclude them from the hibernation image.
> > > But that could also be solved differently (we would have to double check
> > > how they are handled in hibernation code).
> > >
> > >
> > > A similar user of PageReserved to look at is:
> > >
> > > drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn()
> > >
> > > It will not mark pages dirty if they are reserved. Similar to KVM code.
> > Yes, kvm is not the only one user of the dax reserved page.
> > >
> > > >
> > > > Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?
> > > >
> > > > This has less to do with "dax" pages and more to do with
> > > > devm_memremap_pages() established ranges. P2PDMA is another producer
> > > > of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
> > > > used in these kvm paths then I think this points to consider clearing
> > > > the Reserved flag.
> >
> > Thanks Dan/David's comments.
> > for MEMORY_DEVICE_PUBLIC memory, since host driver could manager the
> > memory resource to share to guest, Jerome says we could ignore it at
> > this time.
> >
> > And p2pmem, it seems mapped in a PCI bar space which should most likely
> > a mmio. I think kvm should treated as a reserved page.
> 
> Ok, but the question you left unanswered is whether it would be better
> for devm_memremap_pages() to clear the PageReserved flag for
> MEMORY_DEVICE_{FS,DEV}_DAX rather than introduce a local kvm-only hack
> for what looks like a global problem.

Remove the PageReserved flag sounds more reasonable. 
And Could we still have a flag to identify it is a device private memory, or
where these pages coming from?
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-09-21 19:29                 ` David Hildenbrand
@ 2018-10-19 16:33                   ` Barret Rhoden
  2018-10-22  8:47                     ` Yi Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Barret Rhoden @ 2018-10-19 16:33 UTC (permalink / raw)
  To: David Hildenbrand, Dan Williams
  Cc: KVM list, Linux Kernel Mailing List, linux-nvdimm, Paolo Bonzini,
	Dave Jiang, Zhang, Yu C, Pankaj Gupta, Jan Kara,
	Christoph Hellwig, Linux MM, rkrcmar, Jérôme Glisse,
	Zhang, Yi Z

On 2018-09-21 at 21:29 David Hildenbrand <david@redhat.com> wrote:
> On 21/09/2018 20:17, Dan Williams wrote:
> > On Fri, Sep 21, 2018 at 7:24 AM David Hildenbrand <david@redhat.com> wrote:
> > [..]  
> >>> Remove the PageReserved flag sounds more reasonable.
> >>> And Could we still have a flag to identify it is a device private memory, or
> >>> where these pages coming from?  
> >>
> >> We could use a page type for that or what you proposed. (as I said, we
> >> might have to change hibernation code to skip the pages once we drop the
> >> reserved flag).  
> > 
> > I think it would be reasonable to reject all ZONE_DEVICE pages in
> > saveable_page().
> >   
> 
> Indeed, that sounds like the easiest solution - guess that answer was
> too easy for me to figure out :) .
> 

Just to follow-up, is the plan to clear PageReserved for nvdimm pages
instead of the approach taken in this patch set?  Or should we special
case nvdimm/dax pages in kvm_is_reserved_pfn()?

Thanks,

Barret




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.
  2018-10-19 16:33                   ` Barret Rhoden
@ 2018-10-22  8:47                     ` Yi Zhang
  0 siblings, 0 replies; 20+ messages in thread
From: Yi Zhang @ 2018-10-22  8:47 UTC (permalink / raw)
  To: Barret Rhoden
  Cc: David Hildenbrand, Dan Williams, KVM list,
	Linux Kernel Mailing List, linux-nvdimm, Paolo Bonzini,
	Dave Jiang, Zhang, Yu C, Pankaj Gupta, Jan Kara,
	Christoph Hellwig, Linux MM, rkrcmar, Jérôme Glisse,
	Zhang, Yi Z, Alexander Duyck


[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

On 2018-10-19 at 12:33:48 -0400, Barret Rhoden wrote:
> On 2018-09-21 at 21:29 David Hildenbrand <david@redhat.com> wrote:
> > On 21/09/2018 20:17, Dan Williams wrote:
> > > On Fri, Sep 21, 2018 at 7:24 AM David Hildenbrand <david@redhat.com> wrote:
> > > [..]  
> > >>> Remove the PageReserved flag sounds more reasonable.
> > >>> And Could we still have a flag to identify it is a device private memory, or
> > >>> where these pages coming from?  
> > >>
> > >> We could use a page type for that or what you proposed. (as I said, we
> > >> might have to change hibernation code to skip the pages once we drop the
> > >> reserved flag).  
> > > 
> > > I think it would be reasonable to reject all ZONE_DEVICE pages in
> > > saveable_page().
> > >   
> > 
> > Indeed, that sounds like the easiest solution - guess that answer was
> > too easy for me to figure out :) .
> > 
> 
> Just to follow-up, is the plan to clear PageReserved for nvdimm pages
> instead of the approach taken in this patch set?  Or should we special
> case nvdimm/dax pages in kvm_is_reserved_pfn()?
Yes, we are going to remove the PageReserved flag for nvdimm pages.
Added Alex, attached the patch-set.
> 
> Thanks,
> 
> Barret
> 
> 
> 

[-- Attachment #2: Type: message/rfc822, Size: 6180 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V5 1/4] kvm: remove redundant reserved page check
  2018-09-07 18:03 ` [PATCH V5 1/4] kvm: remove redundant reserved page check Zhang Yi
@ 2018-10-24 14:32   ` Yi Zhang
  0 siblings, 0 replies; 20+ messages in thread
From: Yi Zhang @ 2018-10-24 14:32 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, yu.c.zhang, pagupta, david, jack, hch
  Cc: linux-mm, rkrcmar, jglisse, yi.z.zhang

On 2018-09-08 at 02:03:28 +0800, Zhang Yi wrote:
> PageReserved() is already checked inside kvm_is_reserved_pfn(),
> remove it from kvm_set_pfn_dirty().
> 
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yu <yu.c.zhang@linux.intel.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Acked-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  virt/kvm/kvm_main.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b47507f..c44c406 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1690,12 +1690,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
>  
>  void kvm_set_pfn_dirty(kvm_pfn_t pfn)
>  {
> -	if (!kvm_is_reserved_pfn(pfn)) {
> -		struct page *page = pfn_to_page(pfn);
> -
> -		if (!PageReserved(page))
> -			SetPageDirty(page);
> -	}
> +	if (!kvm_is_reserved_pfn(pfn))
> +		SetPageDirty(pfn_to_page(pfn));
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
>  
> -- 
> 2.7.4
>
Hi Paolo,
We will remove the reserved flag in dax pages, then patch 2[3,4]/4 is
unnecessary,  can we queue this 1/4 to next merge? 

Thank you very much.
Yi

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, back to index

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-07 18:03 [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Zhang Yi
2018-09-07 17:04 ` Ahmed S. Darwish
2018-09-18 14:31   ` Yi Zhang
2018-09-07 18:03 ` [PATCH V5 1/4] kvm: remove redundant reserved page check Zhang Yi
2018-10-24 14:32   ` Yi Zhang
2018-09-07 18:03 ` [PATCH V5 2/4] mm: introduce memory type MEMORY_DEVICE_DEV_DAX Zhang Yi
2018-09-07 18:03 ` [PATCH V5 3/4] mm: add a function to differentiate the pages is from DAX device memory Zhang Yi
2018-09-07 18:04 ` [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem Zhang Yi
2018-09-19  2:53   ` Dan Williams
2018-09-19  7:20     ` David Hildenbrand
2018-09-20 22:49       ` Yi Zhang
2018-09-20 21:19         ` Dan Williams
2018-09-21 22:47           ` Yi Zhang
2018-09-21 14:23             ` David Hildenbrand
2018-09-21 18:17               ` Dan Williams
2018-09-21 19:29                 ` David Hildenbrand
2018-10-19 16:33                   ` Barret Rhoden
2018-10-22  8:47                     ` Yi Zhang
2018-09-19 10:55 ` [PATCH V5 0/4] Fix kvm misconceives NVDIMM pages as reserved mmio Yi Zhang
2018-09-19  2:43   ` Pankaj Gupta

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lkml.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lkml.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lkml.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lkml.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lkml.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lkml.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lkml.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lkml.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lkml.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lkml.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lkml.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lkml.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git