LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/1] fix pmem RAM device when nid is NUMA_NO_NODE
@ 2021-07-28 8:22 Jia He
2021-07-28 8:22 ` [PATCH] device-dax: use fallback nid when numa_node is invalid Jia He
0 siblings, 1 reply; 7+ messages in thread
From: Jia He @ 2021-07-28 8:22 UTC (permalink / raw)
To: Dan Williams, Vishal Verma, Dave Jiang; +Cc: nvdimm, linux-kernel, nd, Jia He
Background
==========
I once sent out the preparatory patches [1] but I dropped the last patch
of using fallback nid with memory_add_physaddr_to_nid() due to some
dependency.
After phys_addr_to_target_node() and memory_add_physaddr_to_nid() are
stable now, it's time to fix the original bug on arm64 now.
Compared with the last version [2], this version rebases the patch to
latest v5.14-rc3 (s/kmem_start/range.start)
Test
====
Tested on ThunderX2 host/qemu "-M virt" guest with a nvdimm device. The
memblocks from the dax pmem device can be either hot-added or hot-removed
on arm64 guest. Also passed the compilation test on x86.
[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2228771.html
[2] https://lkml.org/lkml/2020/7/8/1546
Jia He (1):
device-dax: use fallback nid when numa_node is invalid
drivers/dax/kmem.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] device-dax: use fallback nid when numa_node is invalid
2021-07-28 8:22 [PATCH 0/1] fix pmem RAM device when nid is NUMA_NO_NODE Jia He
@ 2021-07-28 8:22 ` Jia He
2021-07-28 20:17 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Jia He @ 2021-07-28 8:22 UTC (permalink / raw)
To: Dan Williams, Vishal Verma, Dave Jiang; +Cc: nvdimm, linux-kernel, nd, Jia He
Previously, numa_off was set unconditionally in dummy_numa_init()
even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1)
after acpi_map_pxm_to_node() because it regards numa_off as turning
off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
arm64 with fake numa.
Without this patch, pmem can't be probed as a RAM device on arm64 if
SRAT table isn't present:
$ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
kmem: probe of dax0.0 failed with error -22
This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
drivers/dax/kmem.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index ac231cc36359..749674909e51 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
struct dax_kmem_data *data;
int rc = -ENOMEM;
int i, mapped = 0;
- int numa_node;
-
- /*
- * Ensure good NUMA information for the persistent memory.
- * Without this check, there is a risk that slow memory
- * could be mixed in a node with faster memory, causing
- * unavoidable performance issues.
- */
- numa_node = dev_dax->target_node;
- if (numa_node < 0) {
- dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
- numa_node);
- return -EINVAL;
- }
+ int numa_node = dev_dax->target_node, new_node;
data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL);
if (!data)
@@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
*/
res->flags = IORESOURCE_SYSTEM_RAM;
+ /*
+ * Ensure good NUMA information for the persistent memory.
+ * Without this check, there is a risk but not fatal that slow
+ * memory could be mixed in a node with faster memory, causing
+ * unavoidable performance issues. Furthermore, fallback node
+ * id can be used when numa_node is invalid.
+ */
+ if (numa_node < 0) {
+ new_node = memory_add_physaddr_to_nid(range.start);
+ dev_info(dev, "changing nid from %d to %d for DAX region %pR\n",
+ numa_node, new_node, res);
+ numa_node = new_node;
+ }
+
/*
* Ensure that future kexec'd kernels will not treat
* this as RAM automatically.
@@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
int i, success = 0;
struct device *dev = &dev_dax->dev;
struct dax_kmem_data *data = dev_get_drvdata(dev);
+ int numa_node = dev_dax->target_node;
/*
* We have one shot for removing memory, if some memory blocks were not
@@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
if (rc)
continue;
- rc = remove_memory(dev_dax->target_node, range.start,
- range_len(&range));
+ if (numa_node < 0)
+ numa_node = memory_add_physaddr_to_nid(range.start);
+
+ rc = remove_memory(numa_node, range.start, range_len(&range));
if (rc == 0) {
release_resource(data->res[i]);
kfree(data->res[i]);
--
2.17.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] device-dax: use fallback nid when numa_node is invalid
2021-07-28 8:22 ` [PATCH] device-dax: use fallback nid when numa_node is invalid Jia He
@ 2021-07-28 20:17 ` David Hildenbrand
2021-07-29 0:20 ` Justin He
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2021-07-28 20:17 UTC (permalink / raw)
To: Jia He, Dan Williams, Vishal Verma, Dave Jiang; +Cc: nvdimm, linux-kernel, nd
On 28.07.21 10:22, Jia He wrote:
> Previously, numa_off was set unconditionally in dummy_numa_init()
> even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1)
> after acpi_map_pxm_to_node() because it regards numa_off as turning
> off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> arm64 with fake numa.
>
> Without this patch, pmem can't be probed as a RAM device on arm64 if
> SRAT table isn't present:
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> kmem: probe of dax0.0 failed with error -22
>
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
> drivers/dax/kmem.c | 36 ++++++++++++++++++++----------------
> 1 file changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index ac231cc36359..749674909e51 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> struct dax_kmem_data *data;
> int rc = -ENOMEM;
> int i, mapped = 0;
> - int numa_node;
> -
> - /*
> - * Ensure good NUMA information for the persistent memory.
> - * Without this check, there is a risk that slow memory
> - * could be mixed in a node with faster memory, causing
> - * unavoidable performance issues.
> - */
> - numa_node = dev_dax->target_node;
> - if (numa_node < 0) {
> - dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> - numa_node);
> - return -EINVAL;
> - }
> + int numa_node = dev_dax->target_node, new_node;
>
> data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL);
> if (!data)
> @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> */
> res->flags = IORESOURCE_SYSTEM_RAM;
>
> + /*
> + * Ensure good NUMA information for the persistent memory.
> + * Without this check, there is a risk but not fatal that slow
> + * memory could be mixed in a node with faster memory, causing
> + * unavoidable performance issues. Furthermore, fallback node
> + * id can be used when numa_node is invalid.
> + */
> + if (numa_node < 0) {
> + new_node = memory_add_physaddr_to_nid(range.start);
> + dev_info(dev, "changing nid from %d to %d for DAX region %pR\n",
> + numa_node, new_node, res);
> + numa_node = new_node;
> + }
> +
> /*
> * Ensure that future kexec'd kernels will not treat
> * this as RAM automatically.
> @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
> int i, success = 0;
> struct device *dev = &dev_dax->dev;
> struct dax_kmem_data *data = dev_get_drvdata(dev);
> + int numa_node = dev_dax->target_node;
>
> /*
> * We have one shot for removing memory, if some memory blocks were not
> @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
> if (rc)
> continue;
>
> - rc = remove_memory(dev_dax->target_node, range.start,
> - range_len(&range));
> + if (numa_node < 0)
> + numa_node = memory_add_physaddr_to_nid(range.start);
> +
> + rc = remove_memory(numa_node, range.start, range_len(&range));
> if (rc == 0) {
> release_resource(data->res[i]);
> kfree(data->res[i]);
>
Note that this patch conflicts with:
https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com
But nothing fundamental. Determining a single NID is similar to how I'm
handling it for ACPI:
https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] device-dax: use fallback nid when numa_node is invalid
2021-07-28 20:17 ` David Hildenbrand
@ 2021-07-29 0:20 ` Justin He
2021-07-29 7:59 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Justin He @ 2021-07-29 0:20 UTC (permalink / raw)
To: David Hildenbrand, Dan Williams, Vishal Verma, Dave Jiang
Cc: nvdimm, linux-kernel, nd
Hi David
> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Thursday, July 29, 2021 4:17 AM
> To: Justin He <Justin.He@arm.com>; Dan Williams <dan.j.williams@intel.com>;
> Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>
> Cc: nvdimm@lists.linux.dev; linux-kernel@vger.kernel.org; nd <nd@arm.com>
> Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is invalid
>
> On 28.07.21 10:22, Jia He wrote:
> > Previously, numa_off was set unconditionally in dummy_numa_init()
> > even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1)
> > after acpi_map_pxm_to_node() because it regards numa_off as turning
> > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> > arm64 with fake numa.
> >
> > Without this patch, pmem can't be probed as a RAM device on arm64 if
> > SRAT table isn't present:
> > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g
> -a 64K
> > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> > kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> > drivers/dax/kmem.c | 36 ++++++++++++++++++++----------------
> > 1 file changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index ac231cc36359..749674909e51 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> > struct dax_kmem_data *data;
> > int rc = -ENOMEM;
> > int i, mapped = 0;
> > - int numa_node;
> > -
> > - /*
> > - * Ensure good NUMA information for the persistent memory.
> > - * Without this check, there is a risk that slow memory
> > - * could be mixed in a node with faster memory, causing
> > - * unavoidable performance issues.
> > - */
> > - numa_node = dev_dax->target_node;
> > - if (numa_node < 0) {
> > - dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> > - numa_node);
> > - return -EINVAL;
> > - }
> > + int numa_node = dev_dax->target_node, new_node;
> >
> > data = kzalloc(struct_size(data, res, dev_dax->nr_range),
> GFP_KERNEL);
> > if (!data)
> > @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> > */
> > res->flags = IORESOURCE_SYSTEM_RAM;
> >
> > + /*
> > + * Ensure good NUMA information for the persistent memory.
> > + * Without this check, there is a risk but not fatal that slow
> > + * memory could be mixed in a node with faster memory, causing
> > + * unavoidable performance issues. Furthermore, fallback node
> > + * id can be used when numa_node is invalid.
> > + */
> > + if (numa_node < 0) {
> > + new_node = memory_add_physaddr_to_nid(range.start);
> > + dev_info(dev, "changing nid from %d to %d for DAX
> region %pR\n",
> > + numa_node, new_node, res);
> > + numa_node = new_node;
> > + }
> > +
> > /*
> > * Ensure that future kexec'd kernels will not treat
> > * this as RAM automatically.
> > @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax
> *dev_dax)
> > int i, success = 0;
> > struct device *dev = &dev_dax->dev;
> > struct dax_kmem_data *data = dev_get_drvdata(dev);
> > + int numa_node = dev_dax->target_node;
> >
> > /*
> > * We have one shot for removing memory, if some memory blocks were
> not
> > @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax
> *dev_dax)
> > if (rc)
> > continue;
> >
> > - rc = remove_memory(dev_dax->target_node, range.start,
> > - range_len(&range));
> > + if (numa_node < 0)
> > + numa_node = memory_add_physaddr_to_nid(range.start);
> > +
> > + rc = remove_memory(numa_node, range.start, range_len(&range));
> > if (rc == 0) {
> > release_resource(data->res[i]);
> > kfree(data->res[i]);
> >
>
> Note that this patch conflicts with:
>
> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com
>
> But nothing fundamental. Determining a single NID is similar to how I'm
> handling it for ACPI:
>
> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
>
Okay, got it. Thanks for the reminder.
Seems my patch is not useful after your patch.
--
Cheers,
Justin (Jia He)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] device-dax: use fallback nid when numa_node is invalid
2021-07-29 0:20 ` Justin He
@ 2021-07-29 7:59 ` David Hildenbrand
2021-07-29 14:44 ` Justin He
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2021-07-29 7:59 UTC (permalink / raw)
To: Justin He, Dan Williams, Vishal Verma, Dave Jiang
Cc: nvdimm, linux-kernel, nd
Hi Justin,
>>>
>>
>> Note that this patch conflicts with:
>>
>> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com
>>
>> But nothing fundamental. Determining a single NID is similar to how I'm
>> handling it for ACPI:
>>
>> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
>>
>
> Okay, got it. Thanks for the reminder.
> Seems my patch is not useful after your patch.
>
I think your patch still makes sense. With
https://lore.kernel.org/linux-acpi/20210723125210.29987-7-david@redhat.com/
We'd have to detect the node id in the first loop instead.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] device-dax: use fallback nid when numa_node is invalid
2021-07-29 7:59 ` David Hildenbrand
@ 2021-07-29 14:44 ` Justin He
2021-08-02 15:47 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Justin He @ 2021-07-29 14:44 UTC (permalink / raw)
To: David Hildenbrand, Dan Williams, Vishal Verma, Dave Jiang
Cc: nvdimm, linux-kernel, nd
Hi David
> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Thursday, July 29, 2021 3:59 PM
> To: Justin He <Justin.He@arm.com>; Dan Williams <dan.j.williams@intel.com>;
> Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>
> Cc: nvdimm@lists.linux.dev; linux-kernel@vger.kernel.org; nd <nd@arm.com>
> Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is
> invalid
>
> Hi Justin,
>
> >>>
> >>
> >> Note that this patch conflicts with:
> >>
> >> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com
> >>
> >> But nothing fundamental. Determining a single NID is similar to how I'm
> >> handling it for ACPI:
> >>
> >> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
> >>
> >
> > Okay, got it. Thanks for the reminder.
> > Seems my patch is not useful after your patch.
> >
>
> I think your patch still makes sense. With
>
> https://lore.kernel.org/linux-acpi/20210723125210.29987-7-
> david@redhat.com/
>
> We'd have to detect the node id in the first loop instead.
Ok, I got your point. I will do that in v2.
Btw, sorry for commenting there about your patch 06 since I didn't
subscribe lkml via this mailbox.
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc) {
+ dev_info(dev, "mapping%d: %#llx-%#llx too small after alignment\n",
+ i, range.start, range.end);
+ continue;
+ }
+ total_len += range_len(&range);
+ }
You add an additional loop to get the total_len.
I wonder is it independent on 2nd loop?
If yes, why not merge the 2 loops into one?
Sorry if this question is too simple, I don't know too much
about the background of your patch.
--
Cheers,
Justin (Jia He)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] device-dax: use fallback nid when numa_node is invalid
2021-07-29 14:44 ` Justin He
@ 2021-08-02 15:47 ` David Hildenbrand
0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2021-08-02 15:47 UTC (permalink / raw)
To: Justin He, Dan Williams, Vishal Verma, Dave Jiang
Cc: nvdimm, linux-kernel, nd
On 29.07.21 16:44, Justin He wrote:
> Hi David
>
>> -----Original Message-----
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Thursday, July 29, 2021 3:59 PM
>> To: Justin He <Justin.He@arm.com>; Dan Williams <dan.j.williams@intel.com>;
>> Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>
>> Cc: nvdimm@lists.linux.dev; linux-kernel@vger.kernel.org; nd <nd@arm.com>
>> Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is
>> invalid
>>
>> Hi Justin,
>>
>>>>>
>>>>
>>>> Note that this patch conflicts with:
>>>>
>>>> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com
>>>>
>>>> But nothing fundamental. Determining a single NID is similar to how I'm
>>>> handling it for ACPI:
>>>>
>>>> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
>>>>
>>>
>>> Okay, got it. Thanks for the reminder.
>>> Seems my patch is not useful after your patch.
>>>
>>
>> I think your patch still makes sense. With
>>
>> https://lore.kernel.org/linux-acpi/20210723125210.29987-7-
>> david@redhat.com/
>>
>> We'd have to detect the node id in the first loop instead.
>
> Ok, I got your point. I will do that in v2.
>
> Btw, sorry for commenting there about your patch 06 since I didn't
> subscribe lkml via this mailbox.
Sure, you really should subscribe :)
> > + for (i = 0; i < dev_dax->nr_range; i++) {
> + struct range range;
> +
> + rc = dax_kmem_range(dev_dax, i, &range);
> + if (rc) {
> + dev_info(dev, "mapping%d: %#llx-%#llx too small after alignment\n",
> + i, range.start, range.end);
> + continue;
> + }
> + total_len += range_len(&range);
> + }
> You add an additional loop to get the total_len.
> I wonder is it independent on 2nd loop?
> If yes, why not merge the 2 loops into one?
> Sorry if this question is too simple, I don't know too much
> about the background of your patch.
We need total_len to register the memory group. We need the memory group
to add memory. Therefore, we need a second loop to calculate total_len
upfront.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-08-02 15:47 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-28 8:22 [PATCH 0/1] fix pmem RAM device when nid is NUMA_NO_NODE Jia He
2021-07-28 8:22 ` [PATCH] device-dax: use fallback nid when numa_node is invalid Jia He
2021-07-28 20:17 ` David Hildenbrand
2021-07-29 0:20 ` Justin He
2021-07-29 7:59 ` David Hildenbrand
2021-07-29 14:44 ` Justin He
2021-08-02 15:47 ` David Hildenbrand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).