From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Michael Ellerman <mpe@ellerman.id.au>
Cc: LKML <linux-kernel@vger.kernel.org>, Mel Gorman <mgorman@techsingularity.net>, Rik van Riel <riel@surriel.com>, Srikar Dronamraju <srikar@linux.vnet.ibm.com>, Thomas Gleixner <tglx@linutronix.de>, Valentin Schneider <valentin.schneider@arm.com>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, linuxppc-dev@lists.ozlabs.org, Nathan Lynch <nathanl@linux.ibm.com>, Gautham R Shenoy <ego@linux.vnet.ibm.com>, Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>, Laurent Dufour <ldufour@linux.ibm.com>
Subject: [PATCH v2 0/2] Skip numa distance for offline nodes
Date: Thu, 1 Jul 2021 09:45:50 +0530
Message-ID: <20210701041552.112072-1-srikar@linux.vnet.ibm.com>

Changelog v1->v2:
v1: http://lore.kernel.org/lkml/20210520154427.1041031-1-srikar@linux.vnet.ibm.com/t/#u
- Update the numa masks whenever the 1st CPU is added to a cpuless node
- Populate all possible node distances at boot in a powerpc-specific function

Geetika reported yet another trace while doing a dlpar CPU add operation. This occurred even on top of the recent commit 6980d13f0dd1 ("powerpc/smp: Set numa node before updating mask"), which fixed a similar trace.
WARNING: CPU: 40 PID: 2954 at kernel/sched/topology.c:2088 build_sched_domains+0x6e8/0x1540
Modules linked in: nft_counter nft_compat rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink dm_multipath pseries_rng xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 40 PID: 2954 Comm: kworker/40:0 Not tainted 5.13.0-rc1+ #19
Workqueue: events cpuset_hotplug_workfn
NIP: c0000000001de588 LR: c0000000001de584 CTR: 00000000006cd36c
REGS: c00000002772b250 TRAP: 0700 Not tainted (5.12.0-rc5-master+)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28828422 XER: 0000000d
CFAR: c00000000020c2f8 IRQMASK: 0
GPR00: c0000000001de584 c00000002772b4f0 c000000001f55400 0000000000000036
GPR04: c0000063c6368010 c0000063c63f0a00 0000000000000027 c0000063c6368018
GPR08: 0000000000000023 c0000063c636ef48 00000063c4de0000 c0000063bfe9ffe8
GPR12: 0000000028828424 c0000063fe68fe80 0000000000000000 0000000000000417
GPR16: 0000000000000028 c00000000740dcd8 c00000000205db68 c000000001a3a4a0
GPR20: c000000091ed7d20 c000000091ed8520 0000000000000001 0000000000000000
GPR24: c0000000113a9600 0000000000000190 0000000000000028 c0000000010e3ac0
GPR28: 0000000000000000 c00000000740dd00 c0000000317b5900 0000000000000190
NIP [c0000000001de588] build_sched_domains+0x6e8/0x1540
LR [c0000000001de584] build_sched_domains+0x6e4/0x1540
Call Trace:
[c00000002772b4f0] [c0000000001de584] build_sched_domains+0x6e4/0x1540 (unreliable)
[c00000002772b640] [c0000000001e08dc] partition_sched_domains_locked+0x3ec/0x530
[c00000002772b6e0] [c0000000002a2144] rebuild_sched_domains_locked+0x524/0xbf0
[c00000002772b7e0] [c0000000002a5620] rebuild_sched_domains+0x40/0x70
[c00000002772b810] [c0000000002a58e4] cpuset_hotplug_workfn+0x294/0xe20
[c00000002772bc30] [c000000000187510] process_one_work+0x300/0x670
[c00000002772bd10] [c0000000001878f8] worker_thread+0x78/0x520
[c00000002772bda0] [c0000000001937f0] kthread+0x1a0/0x1b0
[c00000002772be10] [c00000000000d6ec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7ee5bb78 7f0ac378 7f29cb78 7f68db78 7f46d378 7f84e378 f8610068 3c62ff19
fbe10060 3863e558 4802dd31 60000000 <0fe00000> 3920fff4 f9210080 e86100b0

Detailed analysis of the failing scenario showed that the span in question belongs to the NODE domain, and further that the cpumasks of some CPUs in NODE overlapped. There are two possible reasons for how we ended up here:

(1) The numa node was offline, or blank with no CPUs or memory. Hence sched_max_numa_distance could not be set correctly, or sched_domains_numa_distance happened to be only partially populated.

(2) The cpumasks were populated based on the bogus node_distance of an offline node. On the POWER platform, node_distance is correctly available only for an online node that has some CPU or memory resource associated with it.

For example, the distance info from numactl on a fully populated 8-node system at boot may look like this:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when only two nodes of the same system are online at boot, the numa topology will look like:

node distances:
node   0   1
  0:  10  20
  1:  20  10

This series tries to fix both these problems.
Note: These problems are now visible thanks to commit ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap").

Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>

Srikar Dronamraju (2):
  sched/topology: Skip updating masks for non-online nodes
  powerpc/numa: Fill distance_lookup_table for offline nodes

 arch/powerpc/mm/numa.c  | 70 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/topology.c | 25 +++++++++++++--
 2 files changed, 93 insertions(+), 2 deletions(-)

base-commit: 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
--
2.27.0