LKML Archive on lore.kernel.org
* Linux 5.14
@ 2021-08-29 22:19 Linus Torvalds
  2021-08-30 9:11 ` Sudip Mukherjee
  ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Linus Torvalds @ 2021-08-29 22:19 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So I realize you must all still be busy with all the galas and fancy
balls and all the other 30th anniversary events, but at some point you
must be getting tired of the constant glitz, the fireworks, and the
champagne. That ball gown or tailcoat isn't the most comfortable
thing, either. The celebrations will go on for a few more weeks yet,
but you all may just need a breather from them.

And when that happens, I have just the thing for you - a new kernel
release to test and enjoy. Because 5.14 is out there, just waiting for
you to kick the tires and remind yourself what all the festivities are
about.

Of course, the poor tireless kernel maintainers won't have time for
the festivities, because for them, this just means that the merge
window will start tomorrow. We have another 30 years to look forward
to, after all. But for the rest of you, take a breather, build a
kernel, test it out, and then you can go back to the seemingly endless
party that I'm sure you just crawled out of.

               Linus

---

Aaron Ma (1):
      igc: fix page fault when thunderbolt is unplugged

Adam Ford (1):
      clk: renesas: rcar-usb2-clock-sel: Fix kernel NULL pointer dereference

Alexey Gladkov (1):
      ucounts: Increase ucounts reference counter before the security hook

Andrey Ignatov (1):
      rtnetlink: Return correct error on changing device netns

Andy Shevchenko (1):
      media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor()

Babu Moger (1):
      x86/resctrl: Fix a maybe-uninitialized build warning treated as error

Bart Van Assche (1):
      mq-deadline: Fix request accounting

Bin Meng (2):
      riscv: dts: microchip: Use 'local-mac-address' for emac1
      riscv: dts: microchip: Add ethernet0 to the aliases node

Bob Pearson (1):
      RDMA/rxe: Fix memory allocation while in a spin lock

Borislav Petkov (1):
      drm/amdgpu: Fix build with missing pm_suspend_target_state module export

Christian König (1):
      drm/amdgpu: use the preferred pin domain after the check

Christoph Hellwig (1):
      cryptoloop: add a deprecation warning

Christophe JAILLET (1):
      xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'

Colin Ian King (1):
      perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32

DENG Qingfang (1):
      net: phy: mediatek: add the missing suspend/resume callbacks

Dan Carpenter (1):
      pd: fix a NULL vs IS_ERR() check

Daniel Borkmann (1):
      bpf: Fix ringbuf helper function compatibility

David Hildenbrand (1):
      virtio-mem: fix sleeping in RCU read side section in virtio_mem_online_page_cb()

Davide Caratti (1):
      net/sched: ets: fix crash when flipping from 'strict' to 'quantum'

Dinghao Liu (1):
      RDMA/bnxt_re: Remove unpaired rtnl unlock in bnxt_re_dev_init()

Dmitry Osipenko (1):
      PM: domains: Improve runtime PM performance state handling

Eric Dumazet (2):
      ipv6: use siphash in rt6_exception_hash()
      ipv4: use siphash instead of Jenkins in fnhe_hashfun()

Eric W. Biederman (1):
      ucounts: Fix regression preventing increasing of rlimits in init_user_ns

Gal Pressman (2):
      RDMA/uverbs: Track dmabuf memory regions
      RDMA/efa: Free IRQ vectors on error flow

Geert Uytterhoeven (1):
      reset: RESET_MCHP_SPARX5 should depend on ARCH_SPARX5

Guangbin Huang (1):
      net: hns3: fix get wrong pfc_en when query PFC configuration

Guojia Liao (1):
      net: hns3: fix duplicate node in VLAN list

Harini Katakam (1):
      net: macb: Add a NULL check on desc_ptp

Helge Deller (1):
      Revert "parisc: Add assembly implementations for memset, strlen, strcpy, strncpy and strcat"

Jacob Keller (1):
      ice: do not abort devlink info if board identifier can't be found

Jens Axboe (1):
      Revert "block/mq-deadline: Prioritize high-priority requests"

Jerome Brunet (2):
      usb: gadget: f_uac2: fixup feedback endpoint stop
      usb: gadget: u_audio: fix race condition on endpoint stop

Joerg Roedel (1):
      x86/efi: Restore Firmware IDT before calling ExitBootServices()

Johan Hovold (1):
      Revert "USB: serial: ch341: fix character loss at high transfer rates"

Kalle Valo (1):
      Revert "net: really fix the build..."

Kim Phillips (3):
      perf/x86/amd/ibs: Work around erratum #1197
      perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
      perf/x86/amd/power: Assign pmu.module

Krzysztof Hałasa (1):
      gpu: ipu-v3: Fix i.MX IPU-v3 offset calculations for (semi)planar U/V formats

Kurt Kanzenbach (2):
      net: dsa: hellcreek: Fix incorrect setting of GCL
      net: dsa: hellcreek: Adjust schedule look ahead window

Kyle Tso (1):
      usb: typec: tcpm: Raise vdm_sm_running flag only when VDM SM is running

Li Jinlin (1):
      scsi: core: Fix hang of freezing queue between blocking and running device

Linus Torvalds (3):
      Revert "media: dvb header files: move some headers to staging"
      pipe: do FASYNC notifications for every pipe IO, not just state changes
      Linux 5.14

Linus Walleij (1):
      ARM: 9104/2: Fix Keystone 2 kernel mapping regression

Lukas Bulwahn (2):
      RDMA/irdma: Use correct kconfig symbol for AUXILIARY_BUS
      powerpc: Re-enable ARCH_ENABLE_SPLIT_PMD_PTLOCK

Maor Gottlieb (1):
      RDMA/mlx5: Fix crash when unbind multiport slave

Marc Zyngier (1):
      stmmac: Revert "stmmac: align RX buffers"

Marek Marczykowski-Górecki (1):
      PCI/MSI: Skip masking MSI-X on Xen PV

Marijn Suijten (1):
      opp: core: Check for pending links before reading required_opp pointers

Matthew Brost (1):
      drm/i915: Fix syncmap memory leak

Maxim Kiselev (1):
      net: marvell: fix MVNETA_TX_IN_PRGRS bit number

Miaohe Lin (1):
      mm/memory_hotplug: fix potential permanent lru cache disable

Michael Riesch (1):
      net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings

Michel Dänzer (1):
      drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Namjae Jeon (1):
      MAINTAINERS: exfat: update my email address

Naresh Kumar PBS (1):
      RDMA/bnxt_re: Add missing spin lock initialization

Nathan Rossi (1):
      net: dsa: mv88e6xxx: Update mv88e6393x serdes errata

Nicholas Piggin (1):
      powerpc/64s: Fix scv implicit soft-mask table for relocated kernels

Oleksij Rempel (2):
      net: usb: asix: ax88772: move embedded PHY detection as early as possible
      net: usb: asix: do not call phy_disconnect() for ax88178

Peter Zijlstra (1):
      sched: Fix Core-wide rq->lock for uninitialized CPUs

Petko Manolov (1):
      net: usb: pegasus: fixes of set_register(s) return value evaluation;

Philipp Zabel (1):
      drm/imx: ipuv3-plane: fix accidental partial revert of 8 pixel alignment fix

Qu Wenruo (1):
      Revert "btrfs: compression: don't try to compress if we don't have enough pages"

Rahul Lakkireddy (1):
      cxgb4: dont touch blocked freelist bitmap after free

Sai Krishna Potthuri (1):
      reset: reset-zynqmp: Fixed the argument data type

Sasha Neftin (2):
      e1000e: Fix the max snoop/no-snoop latency for 10M
      e1000e: Do not take care about recovery NVM checksum

Sebastian Andrzej Siewior (1):
      sched: Fix get_push_task() vs migrate_disable()

Shai Malin (2):
      qed: Fix the VF msix vectors flow
      qede: Fix memset corruption

Shreyansh Chouhan (2):
      ip_gre: add validation for csum_start
      ip6_gre: add validation for csum_start

Song Yoong Siang (2):
      net: stmmac: fix kernel panic due to NULL pointer dereference of xsk_pool
      net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp

Stefan Mätje (1):
      can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters

Swati Sharma (1):
      drm/i915/dp: Drop redundant debug print

Takashi Iwai (1):
      usb: renesas-xhci: Prefer firmware loading on unknown ROM state

Thinh Nguyen (1):
      usb: dwc3: gadget: Fix dwc3_calc_trbs_left()

Toshiki Nishioka (1):
      igc: Use num_tx_queues when iterating over tx_ring queue

Trond Myklebust (1):
      SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...

Tuo Li (2):
      IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
      ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()

Ulf Hansson (1):
      Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711"

Vincent Chen (1):
      riscv: Ensure the value of FP registers in the core dump file is up to date

Wesley Cheng (1):
      usb: dwc3: gadget: Stop EP0 transfers during pullup disable

Will Deacon (1):
      Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"

Wong Vee Khee (1):
      net: stmmac: fix kernel panic due to NULL pointer dereference of plat->est

Xiao Yang (1):
      RDMA/rxe: Zero out index member of struct rxe_queue

Xiaolong Huang (1):
      net: qrtr: fix another OOB Read in qrtr_endpoint_post

Xiaoyao Li (1):
      perf/x86/intel/pt: Fix mask of num_address_ranges

Xiubo Li (1):
      ceph: correctly handle releasing an embedded cap flush

Yonglong Liu (1):
      net: hns3: fix speed unknown issue in bond 4

Yufeng Mo (4):
      net: hns3: clear hardware resource when loading driver
      net: hns3: add waiting time before cmdq memory is released
      net: hns3: change the method of getting cmd index in debugfs
      net: hns3: fix GRO configuration error after reset

Zhengjun Zhang (1):
      USB: serial: option: add new VID/PID to support Fibocom FG150

kernel test robot (1):
      net: usb: asix: ax88772: fix boolconv.cocci warnings

zhang kai (1):
      ipv6: correct comments about fib6_node sernum

王贇 (1):
      net: fix NULL pointer reference in cipso_v4_doi_free

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-29 22:19 Linux 5.14 Linus Torvalds
@ 2021-08-30 9:11 ` Sudip Mukherjee
  2021-08-30 15:17 ` Linus Torvalds
  2021-08-30 9:39 ` Andy Shevchenko
  2021-08-30 20:12 ` Guenter Roeck
  2 siblings, 1 reply; 12+ messages in thread
From: Sudip Mukherjee @ 2021-08-30 9:11 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Linus Torvalds

Hi All,

On Sun, Aug 29, 2021 at 11:23 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
<snip>
>
> Of course, the poor tireless kernel maintainers won't have time for
> the festivities, because for them, this just means that the merge
> window will start tomorrow. We have another 30 years to look forward
> to, after all. But for the rest of you, take a breather, build a
> kernel, test it out, and then you can go back to the seemingly endless
> party that I'm sure you just crawled out of.

We were recently working on openqa based testing and is a very basic
testing for now.. Build the kernel for x86_64 and arm64, boot it on
qemu and rpi4 and test that the desktop environment is working. And,
it now tests mainline branch every night. So, last night it tested
"5.14.0-7d2a07b76933" and both tests were ok.

rpi4: https://openqa.qa.codethink.co.uk/tests/68
qemu: https://openqa.qa.codethink.co.uk/tests/67

--
Regards
Sudip

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-30 9:11 ` Sudip Mukherjee
@ 2021-08-30 15:17 ` Linus Torvalds
  2021-09-12 19:29 ` Sudip Mukherjee
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2021-08-30 15:17 UTC (permalink / raw)
  To: Sudip Mukherjee; +Cc: Linux Kernel Mailing List

On Mon, Aug 30, 2021 at 2:12 AM Sudip Mukherjee
<sudipm.mukherjee@gmail.com> wrote:
>
> We were recently working on openqa based testing and is a very basic
> testing for now.. Build the kernel for x86_64 and arm64, boot it on
> qemu and rpi4 and test that the desktop environment is working. And,
> it now tests mainline branch every night. So, last night it tested
> "5.14.0-7d2a07b76933" and both tests were ok.

Thanks. The more the merrier, and if you do this every night, having a
fairly low-latency "it stopped working" will be good.

Of course, if you can find some other slightly more oddball
configuration that you would also like to test, that it would be even
better.

Because while it's lovely to have more automated testing, if
_everybody_ only tests x86-64 and arm64, the less common cases get
little to no testing.

No big deal, but I thought I'd just mention it in case you go "Yeah, I
know XYZ is entirely irrelevant, but I happen to like it, so I could
easily add that to the testing too".

             Linus

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-30 15:17 ` Linus Torvalds
@ 2021-09-12 19:29 ` Sudip Mukherjee
  0 siblings, 0 replies; 12+ messages in thread
From: Sudip Mukherjee @ 2021-09-12 19:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

Hi Linus,

On Mon, Aug 30, 2021 at 4:17 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, Aug 30, 2021 at 2:12 AM Sudip Mukherjee
> <sudipm.mukherjee@gmail.com> wrote:
> >
> > We were recently working on openqa based testing and is a very basic
> > testing for now.. Build the kernel for x86_64 and arm64, boot it on
> > qemu and rpi4 and test that the desktop environment is working. And,
> > it now tests mainline branch every night. So, last night it tested
> > "5.14.0-7d2a07b76933" and both tests were ok.
>
> Thanks. The more the merrier, and if you do this every night, having a
> fairly low-latency "it stopped working" will be good.
>
> Of course, if you can find some other slightly more oddball
> configuration that you would also like to test, that it would be even
> better.
>
> Because while it's lovely to have more automated testing, if
> _everybody_ only tests x86-64 and arm64, the less common cases get
> little to no testing.
>
> No big deal, but I thought I'd just mention it in case you go "Yeah, I
> know XYZ is entirely irrelevant, but I happen to like it, so I could
> easily add that to the testing too".

A late reply, but better late than never. I have now added a ppc64
qemu test which will run every night along with the previous two arch.
We are also working on adding a risc-v board to the tests.

--
Regards
Sudip

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-29 22:19 Linux 5.14 Linus Torvalds
  2021-08-30 9:11 ` Sudip Mukherjee
@ 2021-08-30 9:39 ` Andy Shevchenko
  2021-08-30 11:28 ` Andy Shevchenko
  2021-08-30 20:12 ` Guenter Roeck
  2 siblings, 1 reply; 12+ messages in thread
From: Andy Shevchenko @ 2021-08-30 9:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Mon, Aug 30, 2021 at 1:20 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I realize you must all still be busy with all the galas and fancy
> balls and all the other 30th anniversary events, but at some point you
> must be getting tired of the constant glitz, the fireworks, and the
> champagne. That ball gown or tailcoat isn't the most comfortable
> thing, either. The celebrations will go on for a few more weeks yet,
> but you all may just need a breather from them.
>
> And when that happens, I have just the thing for you - a new kernel
> release to test and enjoy. Because 5.14 is out there, just waiting for
> you to kick the tires and remind yourself what all the festivities are
> about.
>
> Of course, the poor tireless kernel maintainers won't have time for
> the festivities, because for them, this just means that the merge
> window will start tomorrow. We have another 30 years to look forward
> to, after all. But for the rest of you, take a breather, build a
> kernel, test it out, and then you can go back to the seemingly endless
> party that I'm sure you just crawled out of.

Haven't investigated so far, but all 32-bit builds for x86 on Debian unstable
gcc (Debian 10.2.1-6) 10.2.1 20210110
fail for me with
FATAL: modpost: section header offset=11258999068426292 in file
'vmlinux.o' is bigger than filesize=509598908

(hex value is 28000000000034)

Replacing
#if KERNEL_ELFCLASS == ELFCLASS32
with
#if 1

in scripts/mod/modpost.h fixes it to me.

As said, I haven't done any work to find the root cause, so JFYI.

P.S. Yes, I did a completely clean build and tried different kernel
configurations including just default i386_defconfig in the release,
the same error. x86_64 builds are good.

--
With Best Regards,
Andy Shevchenko

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-30 9:39 ` Andy Shevchenko
@ 2021-08-30 11:28 ` Andy Shevchenko
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Shevchenko @ 2021-08-30 11:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Mon, Aug 30, 2021 at 12:39 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Mon, Aug 30, 2021 at 1:20 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:

> Haven't investigated so far, but all 32-bit builds for x86 on Debian unstable
> gcc (Debian 10.2.1-6) 10.2.1 20210110
> fail for me with
> FATAL: modpost: section header offset=11258999068426292 in file
> 'vmlinux.o' is bigger than filesize=509598908
>
> (hex value is 28000000000034)
>
> Replacing
> #if KERNEL_ELFCLASS == ELFCLASS32
> with
> #if 1
>
> in scripts/mod/modpost.h fixes it to me.
>
> As said, I haven't done any work to find the root cause, so JFYI.
>
> P.S. Yes, I did a completely clean build and tried different kernel
> configurations including just default i386_defconfig in the release,
> the same error. x86_64 builds are good.

Okay, I think I found it. I have had ccache with quite a big pile of
cache files in between. After cleaning it, it seems everything went
fine.

Sorry for the noise.

--
With Best Regards,
Andy Shevchenko

^ permalink raw reply [flat|nested] 12+ messages in thread
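A side note on the modpost number reported above, since it decodes neatly. 0x28000000000034 is exactly what comes out when bytes 40..47 of a 32-bit ELF header (e_ehsize = 0x34, e_phentsize = 0, e_phnum = 0, e_shentsize = 0x28) are read as the 64-bit e_shoff field, that is, when a 32-bit vmlinux.o is parsed with the ELF64 header layout. The sketch below is only an illustration of that arithmetic, assuming a little-endian relocatable object without program headers; the helper names are made up here and none of this is modpost code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ELF32_EHDR_SIZE 52	/* sizeof(Elf32_Ehdr) */
#define ELF32_SHDR_SIZE 40	/* sizeof(Elf32_Shdr) */

/* Fill only the Elf32_Ehdr fields that overlap Elf64_Ehdr.e_shoff. */
static void fill_elf32_tail(unsigned char hdr[ELF32_EHDR_SIZE])
{
	memset(hdr, 0, ELF32_EHDR_SIZE);
	hdr[40] = ELF32_EHDR_SIZE;	/* e_ehsize, 16-bit LE at offset 40 */
	/* e_phentsize (42) and e_phnum (44) stay 0: no program headers */
	hdr[46] = ELF32_SHDR_SIZE;	/* e_shentsize, 16-bit LE at offset 46 */
}

/* Read the 8 bytes at offset 40 the way an ELF64 parser reads e_shoff. */
static uint64_t misread_shoff(void)
{
	unsigned char hdr[ELF32_EHDR_SIZE];
	uint64_t v = 0;
	int i;

	fill_elf32_tail(hdr);
	for (i = 7; i >= 0; i--)
		v = (v << 8) | hdr[40 + i];
	return v;
}
```

misread_shoff() returns 0x28000000000034, i.e. 11258999068426292, the offset in the error message. That is consistent with a modpost binary built for the wrong ELF class being reused from the cache, although the thread itself only confirms that cleaning ccache made the error go away.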
* Re: Linux 5.14
  2021-08-29 22:19 Linux 5.14 Linus Torvalds
  2021-08-30 9:11 ` Sudip Mukherjee
  2021-08-30 9:39 ` Andy Shevchenko
@ 2021-08-30 20:12 ` Guenter Roeck
  2021-08-30 20:15 ` Linus Torvalds
  2021-08-30 20:32 ` Thomas Gleixner
  2 siblings, 2 replies; 12+ messages in thread
From: Guenter Roeck @ 2021-08-30 20:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Peter Zijlstra

On Sun, Aug 29, 2021 at 03:19:23PM -0700, Linus Torvalds wrote:
> So I realize you must all still be busy with all the galas and fancy
> balls and all the other 30th anniversary events, but at some point you
> must be getting tired of the constant glitz, the fireworks, and the
> champagne. That ball gown or tailcoat isn't the most comfortable
> thing, either. The celebrations will go on for a few more weeks yet,
> but you all may just need a breather from them.
>
> And when that happens, I have just the thing for you - a new kernel
> release to test and enjoy. Because 5.14 is out there, just waiting for
> you to kick the tires and remind yourself what all the festivities are
> about.
>
> Of course, the poor tireless kernel maintainers won't have time for
> the festivities, because for them, this just means that the merge
> window will start tomorrow. We have another 30 years to look forward
> to, after all. But for the rest of you, take a breather, build a
> kernel, test it out, and then you can go back to the seemingly endless
> party that I'm sure you just crawled out of.
>

Build results:
	total: 154 pass: 154 fail: 0
Qemu test results:
	total: 479 pass: 479 fail: 0

So far so good, but there is a brand new runtime warning, seen when booting
s390 images.

[ 3.218816] ------------[ cut here ]------------
[ 3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
[ 3.219548] Modules linked in:
[ 3.219948] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0 #1
[ 3.220139] Hardware name: QEMU 2964 QEMU (KVM/Linux)
[ 3.220312] Krnl PSW : 0400e00180000000 0000000000186e86 (sched_core_cpu_starting+0x176/0x180)
[ 3.220593]            R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 3.220746] Krnl GPRS: 0000000000000000 0000000000000200 0000000000000200 0000000000000200
[ 3.220821]            ffffffffffffffff 0000000000000000 000000000161f300 0000000001209c30
[ 3.220893]            0000000000000002 00000000019bf418 0000000000000001 000000001fbf2300
[ 3.220964]            0000000000000000 0000000000000001 0000000000186dc4 0000038000093c90
[ 3.222032] Krnl Code: 0000000000186e7a: af000000		mc	0,0
[ 3.222032]            0000000000186e7e: a7f4ff88		brc	15,0000000000186d8e
[ 3.222032]           #0000000000186e82: af000000		mc	0,0
[ 3.222032]           >0000000000186e86: a7f4ffe7		brc	15,0000000000186e54
[ 3.222032]            0000000000186e8a: 0707		bcr	0,%r7
[ 3.222032]            0000000000186e8c: 0707		bcr	0,%r7
[ 3.222032]            0000000000186e8e: 0707		bcr	0,%r7
[ 3.222032]            0000000000186e90: c00400000000	brcl	0,0000000000186e90
[ 3.222845] Call Trace:
[ 3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
[ 3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
[ 3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
[ 3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
[ 3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
[ 3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
[ 3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
[ 3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
[ 3.223614] no locks held by swapper/1/0.
[ 3.223713] Last Breaking-Event-Address:
[ 3.223762]  [<0000000000000000>] 0x0
[ 3.224578] random: get_random_bytes called from __warn+0x11e/0x158 with crng_init=0
[ 3.234056] ---[ end trace 5ffbc0f4ab37cea9 ]---

Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
this commit.

Guenter

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-30 20:12 ` Guenter Roeck
@ 2021-08-30 20:15 ` Linus Torvalds
  2021-08-30 21:28 ` Peter Zijlstra
  2021-08-30 20:32 ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2021-08-30 20:15 UTC (permalink / raw)
  To: Guenter Roeck, Heiko Carstens, Vasily Gorbik, Christian Borntraeger
  Cc: Linux Kernel Mailing List, Peter Zijlstra, linux-s390

On Mon, Aug 30, 2021 at 1:12 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> So far so good, but there is a brand new runtime warning, seen when booting
> s390 images.
>
> [ 3.218816] ------------[ cut here ]------------
> [ 3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> [ 3.222845] Call Trace:
> [ 3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> [ 3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> [ 3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> [ 3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> [ 3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> [ 3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> [ 3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> [ 3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
>
> Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> this commit.

Ouch, not great timing.

Adding the s390 people to the cc too, just to make sure everybody
involved is aware.

            Linus

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-30 20:15 ` Linus Torvalds
@ 2021-08-30 21:28 ` Peter Zijlstra
  2021-08-31 11:04 ` Heiko Carstens
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2021-08-30 21:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Linux Kernel Mailing List, linux-s390, Sven Schnelle

On Mon, Aug 30, 2021 at 01:15:37PM -0700, Linus Torvalds wrote:
> On Mon, Aug 30, 2021 at 1:12 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > So far so good, but there is a brand new runtime warning, seen when booting
> > s390 images.
> >
> > [ 3.218816] ------------[ cut here ]------------
> > [ 3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> > [ 3.222845] Call Trace:
> > [ 3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> > [ 3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> > [ 3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> > [ 3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> > [ 3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> > [ 3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> > [ 3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> > [ 3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
> >
> > Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> > CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> > this commit.
>
> Ouch, not great timing.
>
> Adding the s390 people to the cc too, just to make sure everybody
> involved is aware.

'Funny' thing, Sven actually tested that on s390. I had already committed
the patch which is why his tag isn't on the commit:

  https://lkml.kernel.org/r/yt9dy28o8q0o.fsf@linux.ibm.com

Anyway, looks like Thomas found something fishy in their topology code.
Lemme go catch up.

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-30 21:28 ` Peter Zijlstra
@ 2021-08-31 11:04 ` Heiko Carstens
  0 siblings, 0 replies; 12+ messages in thread
From: Heiko Carstens @ 2021-08-31 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Guenter Roeck, Vasily Gorbik, Christian Borntraeger,
	Linux Kernel Mailing List, linux-s390, Sven Schnelle, Thomas Gleixner

On Mon, Aug 30, 2021 at 11:28:54PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 30, 2021 at 01:15:37PM -0700, Linus Torvalds wrote:
> > On Mon, Aug 30, 2021 at 1:12 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > >
> > > So far so good, but there is a brand new runtime warning, seen when booting
> > > s390 images.
> > >
> > > [ 3.218816] ------------[ cut here ]------------
> > > [ 3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> > > [ 3.222845] Call Trace:
> > > [ 3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> > > [ 3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> > > [ 3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> > > [ 3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> > > [ 3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> > > [ 3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> > > [ 3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> > > [ 3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
> > >
> > > Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> > > CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> > > this commit.
> >
> > Ouch, not great timing.
> >
> > Adding the s390 people to the cc too, just to make sure everybody
> > involved is aware.
>
> 'Funny' thing, Sven actually tested that on s390. I had already committed
> the patch which is why his tag isn't on the commit:
>
>   https://lkml.kernel.org/r/yt9dy28o8q0o.fsf@linux.ibm.com
>
> Anyway, looks like Thomas found something fishy in their topology code.
> Lemme go catch up.

Sven provided the patch below which should fix the topology problem.
If it fixes everything it will go upstream with a stable tag, but it
first needs to see our CI to hopefully make sure it doesn't introduce
new regressions.

From: Sven Schnelle <svens@linux.ibm.com>
Subject: [PATCH] s390: fix topology information when calling cpu hotplug notifiers

The cpu hotplug notifiers are called without updating the core/thread
masks when a new CPU is added. This causes problems with code setting
up data structures in a cpu hotplug notifier, and relying on that later
in normal code.

This caused a crash in the new core scheduling code (SCHED_CORE),
where rq->core was set up in a notifier depending on cpu masks.

To fix this, add a cpu_setup_mask which is used in update_cpu_masks()
instead of the cpu_online_mask to determine whether the cpu masks should
be set for a certain cpu. Also move update_cpu_masks() to update the
masks before calling notify_cpu_starting() so that the notifiers are
seeing the updated masks.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
---
 arch/s390/include/asm/smp.h |  1 +
 arch/s390/kernel/smp.c      |  9 +++++++--
 arch/s390/kernel/topology.c | 10 +++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/smp.h b/arch/s390/include/asm/smp.h
index e317fd4866c1..f16f4d054ae2 100644
--- a/arch/s390/include/asm/smp.h
+++ b/arch/s390/include/asm/smp.h
@@ -18,6 +18,7 @@ extern struct mutex smp_cpu_state_mutex;
 extern unsigned int smp_cpu_mt_shift;
 extern unsigned int smp_cpu_mtid;
 extern __vector128 __initdata boot_cpu_vector_save_area[__NUM_VXRS];
+extern cpumask_t cpu_setup_mask;

 extern int __cpu_up(unsigned int cpu, struct task_struct *tidle);

diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 2a991e43ead3..1a04e5bdf655 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -95,6 +95,7 @@ __vector128 __initdata boot_cpu_vector_save_area[__NUM_VXRS];
 #endif

 static unsigned int smp_max_threads __initdata = -1U;
+cpumask_t cpu_setup_mask;

 static int __init early_nosmt(char *s)
 {
@@ -902,13 +903,14 @@ static void smp_start_secondary(void *cpuvoid)
 	vtime_init();
 	vdso_getcpu_init();
 	pfault_init();
+	cpumask_set_cpu(cpu, &cpu_setup_mask);
+	update_cpu_masks();
 	notify_cpu_starting(cpu);
 	if (topology_cpu_dedicated(cpu))
 		set_cpu_flag(CIF_DEDICATED_CPU);
 	else
 		clear_cpu_flag(CIF_DEDICATED_CPU);
 	set_cpu_online(cpu, true);
-	update_cpu_masks();
 	inc_irq_stat(CPU_RST);
 	local_irq_enable();
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
@@ -950,10 +952,13 @@ early_param("possible_cpus", _setup_possible_cpus);
 int __cpu_disable(void)
 {
 	unsigned long cregs[16];
+	int cpu;

 	/* Handle possible pending IPIs */
 	smp_handle_ext_call();
-	set_cpu_online(smp_processor_id(), false);
+	cpu = smp_processor_id();
+	set_cpu_online(cpu, false);
+	cpumask_clear_cpu(cpu, &cpu_setup_mask);
 	update_cpu_masks();
 	/* Disable pseudo page faults on this cpu. */
 	pfault_fini();
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index d2458a29618f..5cc7aeae4610 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -67,9 +67,8 @@ static void cpu_group_map(cpumask_t *dst, struct mask_info *info, unsigned int c
 	static cpumask_t mask;

 	cpumask_clear(&mask);
-	if (!cpu_online(cpu))
+	if (!cpumask_test_cpu(cpu, &cpu_setup_mask))
 		goto out;
-	cpumask_set_cpu(cpu, &mask);
 	switch (topology_mode) {
 	case TOPOLOGY_MODE_HW:
 		while (info) {
@@ -89,6 +88,7 @@ static void cpu_group_map(cpumask_t *dst, struct mask_info *info, unsigned int c
 		break;
 	}
 	cpumask_and(&mask, &mask, cpu_online_mask);
+	cpumask_set_cpu(cpu, &mask);
 out:
 	cpumask_copy(dst, &mask);
 }
@@ -99,16 +99,15 @@ static void cpu_thread_map(cpumask_t *dst, unsigned int cpu)
 	int i;

 	cpumask_clear(&mask);
-	if (!cpu_online(cpu))
+	if (!cpumask_test_cpu(cpu, &cpu_setup_mask))
 		goto out;
 	cpumask_set_cpu(cpu, &mask);
 	if (topology_mode != TOPOLOGY_MODE_HW)
 		goto out;
 	cpu -= cpu % (smp_cpu_mtid + 1);
 	for (i = 0; i <= smp_cpu_mtid; i++)
-		if (cpu_present(cpu + i))
+		if (cpu_online(cpu + i))
 			cpumask_set_cpu(cpu + i, &mask);
-	cpumask_and(&mask, &mask, cpu_online_mask);
 out:
 	cpumask_copy(dst, &mask);
 }
@@ -569,6 +568,7 @@ void __init topology_init_early(void)
 	alloc_masks(info, &book_info, 2);
 	alloc_masks(info, &drawer_info, 3);
 out:
+	cpumask_set_cpu(0, &cpu_setup_mask);
 	__arch_update_cpu_topology();
 	__arch_update_dedicated_flag(NULL);
 }
--
2.25.1

^ permalink raw reply [flat|nested] 12+ messages in thread
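For readers who want the ordering problem from the patch description in isolation, here is a small userspace toy model. It is a sketch under loud assumptions, not kernel code: every toy_* name is hypothetical, the masks are plain bitfields, there are two SMT threads per core, and sibling detection is simplified (the real patch checks cpu_online() for siblings and adds the starting CPU explicitly). It only shows why a starting-notifier that inspects the sibling mask needs update_cpu_masks() to have run first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS		4
#define TOY_THREADS_PER_CORE	2

struct toy_smp {
	uint32_t setup_mask;			/* stand-in for cpu_setup_mask */
	uint32_t thread_mask[TOY_NR_CPUS];	/* per-CPU SMT sibling mask */
};

/* Recompute every CPU's sibling mask from the CPUs that are set up. */
static void toy_update_cpu_masks(struct toy_smp *s)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		int first = cpu - cpu % TOY_THREADS_PER_CORE;
		uint32_t mask = 0;

		if (s->setup_mask & (1u << cpu))
			for (int i = 0; i < TOY_THREADS_PER_CORE; i++)
				mask |= s->setup_mask & (1u << (first + i));
		s->thread_mask[cpu] = mask;
	}
}

/* Hypothetical notifier: can the starting CPU see an already set-up sibling? */
static bool toy_notifier_sees_sibling(const struct toy_smp *s, int cpu)
{
	return (s->thread_mask[cpu] & ~(1u << cpu)) != 0;
}

/* Old ordering: the notifier runs while the masks are still stale. */
static bool toy_bringup_old(struct toy_smp *s, int cpu)
{
	bool ok = toy_notifier_sees_sibling(s, cpu);

	s->setup_mask |= 1u << cpu;
	toy_update_cpu_masks(s);
	return ok;
}

/* New ordering: mark the CPU as set up, update masks, then notify. */
static bool toy_bringup_new(struct toy_smp *s, int cpu)
{
	s->setup_mask |= 1u << cpu;
	toy_update_cpu_masks(s);
	return toy_notifier_sees_sibling(s, cpu);
}

/* Boot state: only CPU 0 is set up. */
static struct toy_smp toy_boot(void)
{
	struct toy_smp s = { .setup_mask = 1u };

	toy_update_cpu_masks(&s);
	return s;
}
```

Starting from toy_boot(), toy_bringup_old(&s, 1) reports that CPU 1 cannot see CPU 0 at notifier time, while toy_bringup_new(&s, 1) reports that it can. Both orderings end with the same masks; the only difference is what the notifier observes, which is the point Thomas and Sven make in the thread.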
* Re: Linux 5.14
  2021-08-30 20:12 ` Guenter Roeck
  2021-08-30 20:15 ` Linus Torvalds
@ 2021-08-30 20:32 ` Thomas Gleixner
  2021-08-30 23:57 ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-08-30 20:32 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds
  Cc: Linux Kernel Mailing List, Peter Zijlstra, linux-s390, Heiko Carstens

On Mon, Aug 30 2021 at 13:12, Guenter Roeck wrote:
> On Sun, Aug 29, 2021 at 03:19:23PM -0700, Linus Torvalds wrote:
> So far so good, but there is a brand new runtime warning, seen when booting
> s390 images.
>
> [ 3.218816] ------------[ cut here ]------------
> [ 3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> [ 3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> [ 3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> [ 3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> [ 3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> [ 3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> [ 3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> [ 3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> [ 3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
>
> Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> this commit.

The warning is gone, but the underlying S390 problem persists:

S390 invokes notify_cpu_starting() _before_ updating the topology masks.

Thanks,

        tglx

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Linux 5.14
  2021-08-30 20:32 ` Thomas Gleixner
@ 2021-08-30 23:57 ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2021-08-30 23:57 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds
  Cc: Linux Kernel Mailing List, Peter Zijlstra, linux-s390, Heiko Carstens,
	Sven Schnelle

On Mon, Aug 30 2021 at 22:32, Thomas Gleixner wrote:
> On Mon, Aug 30 2021 at 13:12, Guenter Roeck wrote:
>> On Sun, Aug 29, 2021 at 03:19:23PM -0700, Linus Torvalds wrote:
>> So far so good, but there is a brand new runtime warning, seen when booting
>> s390 images.
>>
>> [ 3.218816] ------------[ cut here ]------------
>> [ 3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
>> [ 3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
>> [ 3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
>> [ 3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
>> [ 3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
>> [ 3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
>> [ 3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
>> [ 3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
>> [ 3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
>>
>> Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
>> CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
>> this commit.
>
> The warning is gone, but the underlying S390 problem persists:
>
> S390 invokes notify_cpu_starting() _before_ updating the topology masks.

And interestingly enough that very commit was tested on S390:

  https://lore.kernel.org/r/yt9dy28o8q0o.fsf@linux.ibm.com

Thanks,

        tglx

^ permalink raw reply [flat|nested] 12+ messages in thread