LKML Archive on lore.kernel.org help / color / mirror / Atom feed
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com> To: Valentin Schneider <valentin.schneider@arm.com> Cc: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Michael Ellerman <mpe@ellerman.id.au>, LKML <linux-kernel@vger.kernel.org>, Mel Gorman <mgorman@techsingularity.net>, Rik van Riel <riel@surriel.com>, Thomas Gleixner <tglx@linutronix.de>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, linuxppc-dev@lists.ozlabs.org, Nathan Lynch <nathanl@linux.ibm.com>, Gautham R Shenoy <ego@linux.vnet.ibm.com>, Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>, Laurent Dufour <ldufour@linux.ibm.com> Subject: Re: [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes Date: Tue, 10 Aug 2021 17:17:27 +0530 [thread overview] Message-ID: <20210810114727.GB21942@linux.vnet.ibm.com> (raw) In-Reply-To: <875yweaig9.mognet@arm.com> * Valentin Schneider <valentin.schneider@arm.com> [2021-08-09 13:52:38]: > On 09/08/21 12:22, Srikar Dronamraju wrote: > > * Valentin Schneider <valentin.schneider@arm.com> [2021-08-08 16:56:47]: > >> Wait, doesn't the distance matrix (without any offline node) say > >> > >> distance(0, 3) == 40 > >> > >> ? We should have at the very least: > >> > >> node 0 1 2 3 > >> 0: 10 20 ?? 40 > >> 1: 20 20 ?? 40 > >> 2: ?? ?? ?? ?? > >> 3: 40 40 ?? 10 > >> > > > > Before onlining node 3 and CPU 3 (node/CPU 0 and 1 are already online) > > Note: Node 2-7 and CPU 2-7 are still offline. > > > > node 0 1 2 3 > > 0: 10 20 40 10 > > 1: 20 20 40 10 > > 2: 40 40 10 10 > > 3: 10 10 10 10 > > > > NODE->mask(0) == 0 > > NODE->mask(1) == 1 > > NODE->mask(2) == 0 > > NODE->mask(3) == 0 > > > > Note: This is with updating Node 2's distance as 40 for figuring out > > the number of numa levels. Since we have all possible distances, we > > dont update Node 3 distance, so it will be as if its local to node 0. > > > > Now when Node 3 and CPU 3 are onlined > > Note: Node 2, 3-7 and CPU 2, 3-7 are still offline. > > > > node 0 1 2 3 > > 0: 10 20 40 40 > > 1: 20 20 40 40 > > 2: 40 40 10 40 > > 3: 40 40 40 10 > > > > NODE->mask(0) == 0 > > NODE->mask(1) == 1 > > NODE->mask(2) == 0 > > NODE->mask(3) == 0,3 > > > > CPU 0 continues to be part of Node->mask(3) because when we online and > > we find the right distance, there is no API to reset the numa mask of > > 3 to remove CPU 0 from the numa masks. > > > > If we had an API to clear/set sched_domains_numa_masks[node][] when > > the node state changes, we could probably plug-in to clear/set the > > node masks whenever node state changes. > > > > Gotcha, this is now coming back to me... > > [...] > > >> Ok, so it looks like we really can't do without that part - even if we get > >> "sensible" distance values for the online nodes, we can't divine values for > >> the offline ones. > >> > > > > Yes > > > > Argh, while your approach does take care of the masks, it leaves > sched_numa_topology_type unchanged. You *can* force an update of it, but > yuck :( > > I got to the below... > Yes, I completely missed that we should update sched_numa_topology_type. > --- > From: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > Date: Thu, 1 Jul 2021 09:45:51 +0530 > Subject: [PATCH 1/1] sched/topology: Skip updating masks for non-online nodes > > The scheduler currently expects NUMA node distances to be stable from init > onwards, and as a consequence builds the related data structures > once-and-for-all at init (see sched_init_numa()). > > Unfortunately, on some architectures node distance is unreliable for > offline nodes and may very well change upon onlining. > > Skip over offline nodes during sched_init_numa(). Track nodes that have > been onlined at least once, and trigger a build of a node's NUMA masks when > it is first onlined post-init. > Your version is much much better than mine. And I have verified that it works as expected. > Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> > --- > kernel/sched/topology.c | 65 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 65 insertions(+) > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index b77ad49dc14f..cba95793a9b7 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -1482,6 +1482,8 @@ int sched_max_numa_distance; > static int *sched_domains_numa_distance; > static struct cpumask ***sched_domains_numa_masks; > int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; > + > +static unsigned long __read_mostly *sched_numa_onlined_nodes; > #endif > > /* > @@ -1833,6 +1835,16 @@ void sched_init_numa(void) > sched_domains_numa_masks[i][j] = mask; > > for_each_node(k) { > + /* > + * Distance information can be unreliable for > + * offline nodes, defer building the node > + * masks to its bringup. > + * This relies on all unique distance values > + * still being visible at init time. > + */ > + if (!node_online(j)) > + continue; > + > if (sched_debug() && (node_distance(j, k) != node_distance(k, j))) > sched_numa_warn("Node-distance not symmetric"); > > @@ -1886,6 +1898,53 @@ void sched_init_numa(void) > sched_max_numa_distance = sched_domains_numa_distance[nr_levels - 1]; > > init_numa_topology_type(); > + > + sched_numa_onlined_nodes = bitmap_alloc(nr_node_ids, GFP_KERNEL); > + if (!sched_numa_onlined_nodes) > + return; > + > + bitmap_zero(sched_numa_onlined_nodes, nr_node_ids); > + for_each_online_node(i) > + bitmap_set(sched_numa_onlined_nodes, i, 1); > +} > + > +void __sched_domains_numa_masks_set(unsigned int node) > +{ > + int i, j; > + > + /* > + * NUMA masks are not built for offline nodes in sched_init_numa(). > + * Thus, when a CPU of a never-onlined-before node gets plugged in, > + * adding that new CPU to the right NUMA masks is not sufficient: the > + * masks of that CPU's node must also be updated. > + */ > + if (test_bit(node, sched_numa_onlined_nodes)) > + return; > + > + bitmap_set(sched_numa_onlined_nodes, node, 1); > + > + for (i = 0; i < sched_domains_numa_levels; i++) { > + for (j = 0; j < nr_node_ids; j++) { > + if (!node_online(j) || node == j) > + continue; > + > + if (node_distance(j, node) > sched_domains_numa_distance[i]) > + continue; > + > + /* Add remote nodes in our masks */ > + cpumask_or(sched_domains_numa_masks[i][node], > + sched_domains_numa_masks[i][node], > + sched_domains_numa_masks[0][j]); > + } > + } > + > + /* > + * A new node has been brought up, potentially changing the topology > + * classification. > + * > + * Note that this is racy vs any use of sched_numa_topology_type :/ > + */ > + init_numa_topology_type(); > } > > void sched_domains_numa_masks_set(unsigned int cpu) > @@ -1893,8 +1952,14 @@ void sched_domains_numa_masks_set(unsigned int cpu) > int node = cpu_to_node(cpu); > int i, j; > > + __sched_domains_numa_masks_set(node); > + > for (i = 0; i < sched_domains_numa_levels; i++) { > for (j = 0; j < nr_node_ids; j++) { > + if (!node_online(j)) > + continue; > + > + /* Set ourselves in the remote node's masks */ > if (node_distance(j, node) <= sched_domains_numa_distance[i]) > cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]); > } > -- > 2.25.1 > -- Thanks and Regards Srikar Dronamraju
next prev parent reply other threads:[~2021-08-10 11:48 UTC|newest] Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-07-01 4:15 [PATCH v2 0/2] Skip numa distance for offline nodes Srikar Dronamraju 2021-07-01 4:15 ` [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes Srikar Dronamraju 2021-07-01 14:28 ` Valentin Schneider 2021-07-12 12:48 ` Srikar Dronamraju 2021-07-13 16:32 ` Valentin Schneider 2021-07-23 14:39 ` Srikar Dronamraju 2021-08-04 10:01 ` Srikar Dronamraju 2021-08-04 10:20 ` Valentin Schneider 2021-08-08 15:56 ` Valentin Schneider 2021-08-09 6:52 ` Srikar Dronamraju 2021-08-09 12:52 ` Valentin Schneider 2021-08-10 11:47 ` Srikar Dronamraju [this message] 2021-08-16 10:33 ` Srikar Dronamraju 2021-08-17 0:01 ` Valentin Schneider 2021-07-01 4:15 ` [PATCH v2 2/2] powerpc/numa: Fill distance_lookup_table for offline nodes Srikar Dronamraju 2021-07-01 9:36 ` kernel test robot 2021-07-01 10:20 ` kernel test robot
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20210810114727.GB21942@linux.vnet.ibm.com \ --to=srikar@linux.vnet.ibm.com \ --cc=Geetika.Moolchandani1@ibm.com \ --cc=dietmar.eggemann@arm.com \ --cc=ego@linux.vnet.ibm.com \ --cc=ldufour@linux.ibm.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linuxppc-dev@lists.ozlabs.org \ --cc=mgorman@techsingularity.net \ --cc=mingo@kernel.org \ --cc=mpe@ellerman.id.au \ --cc=nathanl@linux.ibm.com \ --cc=peterz@infradead.org \ --cc=riel@surriel.com \ --cc=tglx@linutronix.de \ --cc=valentin.schneider@arm.com \ --cc=vincent.guittot@linaro.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).