LKML Archive on lore.kernel.org
From: Bharata B Rao <bharata@amd.com>
To: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Cc: <akpm@linux-foundation.org>, <kamezawa.hiroyu@jp.fujitsu.com>,
<lee.schermerhorn@hp.com>, <mgorman@suse.de>,
<Krupa.Ramakrishnan@amd.com>, <Sadagopan.Srinivasan@amd.com>,
Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>,
Bharata B Rao <bharata@amd.com>
Subject: [FIX PATCH 2/2] mm/page_alloc: Use accumulated load when building node fallback list
Date: Mon, 30 Aug 2021 17:46:03 +0530
Message-ID: <20210830121603.1081-3-bharata@amd.com>
In-Reply-To: <20210830121603.1081-1-bharata@amd.com>
From: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
In build_zonelists(), when the fallback list is built for the nodes,
the node load gets reinitialized during each iteration. This results
in nodes with the same distance occupying the same slot in different
node fallback lists rather than appearing in the intended round-robin
manner. As a result, one node gets picked for allocation more often
than other nodes at the same distance.
As an example, consider a 4-node system with the following distance
matrix:
Node   0   1   2   3
--------------------
  0   10  12  32  32
  1   12  10  32  32
  2   32  32  10  12
  3   32  32  12  10
For this case, the node fallback list gets built like this:
Node  Fallback list
-------------------
  0   0 1 2 3
  1   1 0 3 2
  2   2 3 0 1
  3   3 2 0 1   <-- Unexpected fallback order
In the fallback lists for nodes 2 and 3, nodes 0 and 1 appear in
the same order, so more allocations get satisfied from node 0 than
from node 1.
The effect of this on remote memory bandwidth, as measured by the
STREAM benchmark, is shown below:
Case 1: Bandwidth from cores on nodes 2 & 3 to memory on nodes 0 & 1
(numactl -m 0,1 ./stream_lowOverhead ... --cores <from 2, 3>)
Case 2: Bandwidth from cores on nodes 0 & 1 to memory on nodes 2 & 3
(numactl -m 2,3 ./stream_lowOverhead ... --cores <from 0, 1>)
----------------------------------------
            BANDWIDTH (MB/s)
TEST        Case 1        Case 2
----------------------------------------
COPY        57479.6       110791.8
SCALE       55372.9       105685.9
ADD         50460.6        96734.2
TRIAD       50397.6        97119.1
----------------------------------------
The bandwidth drop in Case 1 occurs because most allocations are
satisfied from node 0, which comes before node 1 in the fallback
order of both nodes 2 and 3.
This can be fixed by accumulating the node load in build_zonelists()
rather than reinitializing it during each iteration. With this, nodes
at the same distance are correctly assigned in a round-robin manner.
This was in fact the original behavior until commit f0c0b2b808f2
("change zonelist order: zonelist order selection logic") dropped the
load accumulation and switched to reinitializing the load during each
iteration. Although zonelist ordering was later removed by commit
c9bff3eebc09 ("mm, page_alloc: rip out ZONELIST_ORDER_ZONE"), the
change to the node load handling in build_zonelists() remained.
This patch therefore restores the accumulated node load logic.
After this fix, the fallback order gets built like this:
Node  Fallback list
-------------------
  0   0 1 2 3
  1   1 0 3 2
  2   2 3 0 1
  3   3 2 1 0   <-- Note the change here
The bandwidth in Case 1 improves and matches Case 2 as shown below.
----------------------------------------
            BANDWIDTH (MB/s)
TEST        Case 1        Case 2
----------------------------------------
COPY        110438.9      110107.2
SCALE       105930.5      105817.5
ADD          97005.1       96159.8
TRIAD        97441.5       96757.1
----------------------------------------
The correctness of the fallback list generation has also been verified
for the above node configuration with node 3 starting as a memoryless
node and coming online only via memory hotplug.
[bharata@amd.com: Added changelog, review, test validation]
Fixes: f0c0b2b808f2 ("change zonelist order: zonelist order selection logic")
Signed-off-by: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
Co-developed-by: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
Signed-off-by: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
Signed-off-by: Bharata B Rao <bharata@amd.com>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 22f7ad6ec11c..47f4d160971e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6268,7 +6268,7 @@ static void build_zonelists(pg_data_t *pgdat)
 		 */
 		if (node_distance(local_node, node) !=
 		    node_distance(local_node, prev_node))
-			node_load[node] = load;
+			node_load[node] += load;
 
 		node_order[nr_nodes++] = node;
 		prev_node = node;
--
2.25.1