From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>,
    Michael Ellerman <mpe@ellerman.id.au>
Cc: LKML <linux-kernel@vger.kernel.org>,
    Mel Gorman <mgorman@techsingularity.net>,
    Rik van Riel <riel@surriel.com>,
    Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
    Thomas Gleixner <tglx@linutronix.de>,
    Valentin Schneider <valentin.schneider@arm.com>,
    Vincent Guittot <vincent.guittot@linaro.org>,
    Dietmar Eggemann <dietmar.eggemann@arm.com>,
    linuxppc-dev@lists.ozlabs.org,
    Nathan Lynch <nathanl@linux.ibm.com>,
    Gautham R Shenoy <ego@linux.vnet.ibm.com>,
    Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>,
    Laurent Dufour <ldufour@linux.ibm.com>
Subject: [PATCH v2 2/2] powerpc/numa: Fill distance_lookup_table for offline nodes
Date: Thu, 1 Jul 2021 09:45:52 +0530
Message-ID: <20210701041552.112072-3-srikar@linux.vnet.ibm.com>
In-Reply-To: <20210701041552.112072-1-srikar@linux.vnet.ibm.com>

Currently the scheduler populates the distance map by looking at the
distance of each node from all other nodes. This should work for most
architectures and platforms.

The scheduler expects the number of unique node distances to be available
at boot. It uses the node distances to calculate this number. On Power
Servers, node distances for offline nodes are not available. However, Power
Servers already know the unique possible node distances. Fake the offline
nodes' distance_lookup_table entries so that all possible node distances
are updated.

For example, distance info from numactl on a fully populated 8 node system
at boot may look like this.

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when only two nodes of the same system are online at boot, the
distance info from numactl will look like

node distances:
node   0   1
  0:  10  20
  1:  20  10

What node_distance(0,3) returns when node 0 is online and node 3 is offline
may be implementation dependent. On Power Servers, it returns
LOCAL_DISTANCE (10). Here, at boot, the scheduler would assume that the max
distance between nodes is 20. However, that would not be true: when nodes
are onlined and CPUs from those nodes are hotplugged, the max node distance
would be 40.

However, the faking only needs to be done if the number of unique node
distances that can be computed for online nodes is less than the number of
possible unique node distances as represented by distance_ref_points_depth.
When a node is actually onlined, distance_lookup_table will be updated with
the actual entries.
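
To make the mismatch concrete, here is a minimal user-space sketch (not kernel code) that counts the distinct values in the two numactl matrices above. The helper count_unique_distances() and the array names are hypothetical and exist only for this illustration; the scheduler's own deduplication of node distances is implemented differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNIQUE 16	/* hypothetical cap on distinct distances */

/* Hypothetical helper: count distinct values in an n x n distance matrix. */
static int count_unique_distances(const int *dist, int n)
{
	int seen[MAX_UNIQUE];	/* distinct values found so far */
	int nr_seen = 0;

	assert(dist != NULL && n > 0);

	for (int i = 0; i < n * n; i++) {
		int known = 0;

		for (int j = 0; j < nr_seen; j++) {
			if (seen[j] == dist[i]) {
				known = 1;
				break;
			}
		}
		if (!known) {
			assert(nr_seen < MAX_UNIQUE);
			seen[nr_seen++] = dist[i];
		}
	}
	return nr_seen;
}

/* Boot-time view with only nodes 0 and 1 online. */
static const int two_nodes[2 * 2] = {
	10, 20,
	20, 10,
};

/* Fully populated 8 node system. */
static const int eight_nodes[8 * 8] = {
	10, 20, 40, 40, 40, 40, 40, 40,
	20, 10, 40, 40, 40, 40, 40, 40,
	40, 40, 10, 20, 40, 40, 40, 40,
	40, 40, 20, 10, 40, 40, 40, 40,
	40, 40, 40, 40, 10, 20, 40, 40,
	40, 40, 40, 40, 20, 10, 40, 40,
	40, 40, 40, 40, 40, 40, 10, 20,
	40, 40, 40, 40, 40, 40, 20, 10,
};
```

Calling count_unique_distances(two_nodes, 2) yields two distinct distances (10 and 20), while count_unique_distances(eight_nodes, 8) yields three (10, 20 and 40), which is the gap the faked entries are meant to close.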

Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1->v2:
Move to a Powerpc specific solution as suggested by Peter and Valentin

 arch/powerpc/mm/numa.c | 70 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..6d0d89127190 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -860,6 +860,75 @@ void __init dump_numa_cpu_topology(void)
 	}
 }
 
+/*
+ * Scheduler expects unique number of node distances to be available at
+ * boot. It uses node distance to calculate this unique node distances. On
+ * POWER, node distances for offline nodes is not available. However, POWER
+ * already knows unique possible node distances. Fake the offline node's
+ * distance_lookup_table entries so that all possible node distances are
+ * updated.
+ */
+void __init fake_update_distance_lookup_table(void)
+{
+	unsigned long distance_map;
+	int i, nr_levels, nr_depth, node;
+
+	if (!numa_enabled)
+		return;
+
+	if (!form1_affinity)
+		return;
+
+	/*
+	 * distance_ref_points_depth lists the unique numa domains
+	 * available. However it ignore LOCAL_DISTANCE. So add +1
+	 * to get the actual number of unique distances.
+	 */
+	nr_depth = distance_ref_points_depth + 1;
+
+	WARN_ON(nr_depth > sizeof(distance_map));
+
+	bitmap_zero(&distance_map, nr_depth);
+	bitmap_set(&distance_map, 0, 1);
+
+	for_each_online_node(node) {
+		int nd, distance = LOCAL_DISTANCE;
+
+		if (node == first_online_node)
+			continue;
+
+		nd = __node_distance(node, first_online_node);
+		for (i = 0; i < nr_depth; i++, distance *= 2) {
+			if (distance == nd) {
+				bitmap_set(&distance_map, i, 1);
+				break;
+			}
+		}
+		nr_levels = bitmap_weight(&distance_map, nr_depth);
+		if (nr_levels == nr_depth)
+			return;
+	}
+
+	for_each_node(node) {
+		if (node_online(node))
+			continue;
+
+		i = find_first_zero_bit(&distance_map, nr_depth);
+		if (i >= nr_depth || i == 0) {
+			pr_warn("Levels(%d) not matching levels(%d)", nr_levels, nr_depth);
+			return;
+		}
+
+		bitmap_set(&distance_map, i, 1);
+		while (i--)
+			distance_lookup_table[node][i] = node;
+
+		nr_levels = bitmap_weight(&distance_map, nr_depth);
+		if (nr_levels == nr_depth)
+			return;
+	}
+}
+
 /* Initialize NODE_DATA for a node on the local memory */
 static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 {
@@ -975,6 +1044,7 @@ void __init mem_topology_setup(void)
 		 */
 		numa_setup_cpu(cpu);
 	}
+	fake_update_distance_lookup_table();
 }
 
 void __init initmem_init(void)
-- 
2.27.0
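
For readers who want to see why writing the node's own ID into its distance_lookup_table entries produces a larger distance, the following user-space model may help. It is only a sketch under stated assumptions: it assumes the form1 distance starts at LOCAL_DISTANCE and doubles for every level at which two nodes' table entries differ, which is what the `distance *= 2` loop in the patch implies. The names lookup, model_node_distance(), model_fake_node() and model_setup(), and the sizes MAX_NODES and MAX_DEPTH, are hypothetical and not part of the kernel.

```c
#include <assert.h>

#define LOCAL_DISTANCE	10
#define MAX_NODES	4	/* hypothetical sizes for this model */
#define MAX_DEPTH	2

/* Model of distance_lookup_table: one domain ID per node per level. */
static int lookup[MAX_NODES][MAX_DEPTH];

/* Start at LOCAL_DISTANCE and double for each level that does not match. */
static int model_node_distance(int a, int b)
{
	int distance = LOCAL_DISTANCE;

	assert(a >= 0 && a < MAX_NODES);
	assert(b >= 0 && b < MAX_NODES);

	for (int i = 0; i < MAX_DEPTH; i++) {
		if (lookup[a][i] == lookup[b][i])
			break;
		distance *= 2;
	}
	return distance;
}

/*
 * Mirror of the patch's "while (i--)" loop: store the node's own ID in
 * its first `levels` entries so that they differ from other nodes.
 */
static void model_fake_node(int node, int levels)
{
	assert(node >= 0 && node < MAX_NODES);
	assert(levels >= 0 && levels <= MAX_DEPTH);

	while (levels--)
		lookup[node][levels] = node;
}

/* Nodes 0 and 1 are online and differ only at level 0; node 2 is offline. */
static void model_setup(void)
{
	lookup[0][0] = 0; lookup[0][1] = 0;
	lookup[1][0] = 1; lookup[1][1] = 0;
	lookup[2][0] = 0; lookup[2][1] = 0;	/* never populated */
}
```

After model_setup(), model_node_distance(0, 1) is 20 and model_node_distance(0, 2) is LOCAL_DISTANCE, matching the behaviour described for offline nodes. After model_fake_node(2, 2), the distance from node 2 to either online node becomes 40, so all three distances are visible in this model.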