LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Linux 3.19-rc5
@ 2015-01-18  9:17 Linus Torvalds
  2015-01-19 18:02 ` Bruno Prémont
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-01-18  9:17 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Another week, another -rc.

Fairly normal release, although I'd wish that by rc5 we'd have calmed
down even further.  But no, with some of the driver tree merges in
particular, this is actually larger than rc4 was.

That said, it's not like there is anything particularly scary in here.

The arm64 vm bug that I mentioned as pending in the rc4 notes got
fixed within a day of that previous rc release, and the rest looks
pretty standard. Mostly drivers (networking, usb, scsi target, block
layer, mmc,  tty etc), but also arch updates (arm, x86, s390 and some
tiny powerpc fixes), some filesystem updates (fuse and nfs), tracing
fixes, and some perf tooling fixes.

Shortlog with the details appended.

Go forth and test.

                Linus

---

Abhilash Kesavan (2):
      drivers: bus: check cci device tree node status
      ARM: dts: disable CCI on exynos5420 based arndale-octa

Adrian Hunter (6):
      mmc: sdhci: Tuning should not change max_blk_count
      mmc: sdhci: Add out_unlock to sdhci_execute_tuning
      mmc: sdhci: Simplify use of tuning timer
      mmc: sdhci: Disable re-tuning for HS400
      mmc: sdhci-acpi: Add ACPI HID INT344D
      mmc: sdhci-pci: Add support for Intel SPT

Alan Stern (2):
      USB: EHCI: fix initialization bug in iso_stream_schedule()
      USB: EHCI: adjust error return code

Alexander Stein (1):
      ARM: at91/dt: sam9263: Add missing clocks to lcdc node

Alexander Usyskin (1):
      mei: clean reset bit before reset

Alexandre Belloni (1):
      net/at91_ether: prepare and unprepare clock

Alexey Brodkin (1):
      perf tools: Fix statfs.f_type data type mismatch build error with uclibc

Alexey Khoroshilov (1):
      usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()

Amit Virdi (2):
      usb: dwc3: gadget: Fix TRB preparation during SG
      usb: dwc3: gadget: Stop TRB preparation after limit is reached

Andreas Faerber (1):
      ARM: exynos_defconfig: Enable LM90 driver

Andrey Skvortsov (1):
      selftests/vm: fix link error for transhuge-stress test

Andy Grover (1):
      iscsi-target: Fix typos in enum cmd_flags_table

Andy Shevchenko (1):
      dmaengine: dw: balance PM runtime calls

Ani Sinha (1):
      update ip-sysctl.txt documentation (v2)

Anjali Singhai (2):
      i40e: Fix Rx checksum error counter
      i40e: Fix bug with TCP over IPv6 over VXLAN

Anson Huang (1):
      Thermal: imx: add clk disable/enable for suspend/resume

Anton Blanchard (1):
      powernv: Fix OPAL tracepoint code

Arik Nemtsov (1):
      iwlwifi: pcie: correctly define 7265-D cfg

Arnaldo Carvalho de Melo (1):
      tools: Remove bitops/hweight usage of bits in tools/perf

Arnd Bergmann (2):
      usb: phy: mv-usb: fix usb_phy build errors
      bridge: only provide proxy ARP when CONFIG_INET is enabled

Arseny Solokha (1):
      OHCI: add a quirk for ULi M5237 blocking on reset

Axel Lin (2):
      gpio: dln2: Fix gpio output value in dln2_gpio_direction_output()
      gpio: grgpio: Avoid potential NULL pointer dereference

B Viswanath (1):
      net: Corrected the comment describing the ndo operations to
reflect the actual prototype for couple of operations

Benjamin Poirier (1):
      netdevice: Add missing parentheses in macro

Bo Shen (4):
      usb: gadget: udc: atmel: change setting for DMA
      usb: gadget: udc: atmel: fix possible IN hang issue
      ARM: at91/dt: sama5d4: fix the timer reg length
      ARM: at91: sama5d3: dt: correct the sound route

Boris Brezillon (1):
      clk: at91: keep slow clk enabled to prevent system hang

Boris Ostrovsky (2):
      x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer()
      x86/xen: Free bootmem in free_p2m_page() during early boot

Catalin Marinas (1):
      arm64: partially revert "ARM: 8167/1: extend the reserved memory
for initrd to be page aligned"

Chanwoo Choi (1):
      serial: samsung: Add the support for Exynos5433 SoC

Chen Gang (1):
      s390/timex: fix get_tod_clock_ext() inline assembly

Christian Borntraeger (3):
      s390/vtime: Get rid of redundant WARN_ON
      s390/kernel: use stnsm 255 instead of stosm 0
      kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)

Christoph Hellwig (1):
      scsi: ->queue_rq can't sleep

Christoph Jaeger (1):
      packet: bail out of packet_snd() if L2 header creation fails

Chuck Lever (1):
      NFS: Ignore transport protocol when detecting server trunking

Colin Ian King (1):
      fbdev/broadsheetfb: fix memory leak

Dan Carpenter (3):
      ipvs: uninitialized data with IP_VS_IPV6
      phy: miphy28lp: unlock on error in miphy28lp_init()
      usb: gadget: gadgetfs: fix an oops in ep_write()

Darrick J. Wong (1):
      uas: disable UAS on Apricorn SATA dongles

David Drysdale (1):
      selftests/exec: allow shell return code of 126

David Peterson (1):
      USB: cp210x: add IDs for CEL USB sticks and MeshWorks devices

David Spinadel (2):
      iwlwifi: mvm: add a flag to enable match found notification
      iwlwifi: mvm: scan dwell time corrections

David Vrabel (3):
      x86/xen: don't count how many PFNs are identity mapped
      x86/xen: add extra memory for remapped frames during setup
      xen-netfront: use different locks for Rx and Tx stats

Doug Anderson (1):
      ARM: dts: rockchip: bump sd card pin drive strength up on rk3288-evb

Eddie Kovsky (1):
      staging: vt6655: fix sparse warning: argument type

Eduardo Valentin (1):
      MAINTAINERS: update ti-soc-thermal status

Emmanuel Grumbach (2):
      iwlwifi: 7000: fix reported firmware name for 7265D
      iwlwifi: bump firmware API for mvm devices to 12

Eric Dumazet (2):
      net: dnet: fix dnet_poll()
      alx: fix alx_poll()

Eyal Shapira (2):
      iwlwifi: mvm: fix Rx with both chains
      iwlwifi: mvm: fix out of bounds access to tid_to_mac80211_ac

Fabien Proriol (1):
      iio: iio: Fix iio_channel_read return if channel havn't info

Fabio Estevam (3):
      ARM: dts: imx25: Fix the SPI1 clocks
      ARM: imx6sx: Set PLL2 as parent of QSPI clocks
      ARM: dts: imx51-babbage: Fix ULPI PHY reset modelling

Felipe Balbi (2):
      usb: musb: debugfs: cope with blackfin's oddities
      usb: musb: blackfin: fix build break

Gary Bisson (1):
      ARM: clk-imx6q: fix video divider for rev T0 1.0

Geert Uytterhoeven (5):
      ARM: shmobile: r8a7740: Instantiate GIC from C board code in legacy builds
      thermal: of: Remove bogus type qualifier for of_thermal_get_trip_points()
      ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances
      m68k: Wire up execveat
      thermal: rcar: Spelling/grammar: s/drier use .../driver uses ...s/

Giel van Schijndel (1):
      isdn: fix NUL (\0 or \x00) specification in string

Hans de Goede (7):
      phy-sun4i-usb: Change disconnect threshold value for sun6i
      xhci: Add broken-streams quirk for Fresco Logic FL1000G xhci controllers
      uas: Add US_FL_NO_ATA_1X for Seagate devices with usb-id 0bc2:a013
      uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS566 with usb-id 0bc2:a013
      uas: Do not blacklist ASM1153 disk enclosures
      uas: Add US_FL_NO_ATA_1X for 2 more Seagate disk enclosures
      simplefb: Fix build failure on Sparc

Harald Freudenberger (1):
      s390/zcrypt: kernel oops at insmod of the z90crypt device driver

Hariprasad Shenai (2):
      cxgb4vf: Initialize mdio_addr before using it
      cxgb4vf: Fix queue allocation for 40G adapter

Heikki Krogerus (1):
      usb: dwc3: pci: add support for Intel Sunrise Point PCH

Heiko Carstens (1):
      s390: wire up execveat syscall

Heiko Stuebner (2):
      clk: rockchip: fix rk3066 pll lock bit location
      clk: rockchip: fix rk3288 cpuclk core dividers

Heiko Stübner (2):
      ARM: rockchip: disable jtag/sdmmc autoswitching on rk3288
      clk: rockchip: fix deadlock possibility in cpuclk

Hubert Feurstein (1):
      net: fec: fix NULL pointer dereference in fec_enet_timeout_work

Ian Munsie (1):
      cxl: Fix issues when unmapping contexts

James Bottomley (1):
      serial: fix parisc boot hang

Jan Beulich (1):
      x86/xen: properly retrieve NMI reason

Jan Willeke (1):
      s390/uprobes: fix user space PER events

Javi Merino (1):
      Documentation: thermal: document of_cpufreq_cooling_register()

Javier Martinez Canillas (1):
      ARM: exynos_defconfig: Enable options for display panel support

Jean-Francois Remy (1):
      neighbour: fix base_reachable_time(_ms) not effective immediatly
when changed

Jens Axboe (3):
      block: wake up waiters when a queue is marked dying
      blk-mq: get rid of ->cmd_size in the hardware queue
      blk-mq: Add helper to abort requeued requests

Jeremiah Mahler (2):
      usb: serial: silence all non-critical read errors
      usb: serial: handle -ENODEV quietly in generic_submit_read_urb

Jesse Brandeburg (1):
      i40e: fix un-necessary Tx hangs

Jiri Pirko (1):
      team: avoid possible underflow of count_pending value for
notify_peers and mcast_rejoin

Jisheng Zhang (4):
      ARM: dts: berlin: fix io clk and add missing core clk for BG2Q sdhci2 host
      ARM: dts: berlin: add broken-cd and set bus width for eMMC in
Marvell DMP DT
      ARM: dts: berlin: correct BG2Q's SM GPIO location.
      clk: berlin: bg2q: remove non-exist "smemc" gate clock

Johan Hovold (3):
      USB: keyspan: fix null-deref at probe
      USB: console: fix uninitialised ldisc semaphore
      USB: console: fix potential use after free

Johannes Thumshirn (1):
      mcb: mcb-pci: Only remap the 1st 0x200 bytes of BAR 0

John W. Linville (1):
      usb: gadget: udc: avoid dereference before NULL check in ep_queue

Jon Paul Maloy (1):
      tipc: fix bug in broadcast retransmit code

Juergen Gross (4):
      xen: correct error for building p2m list on 32 bits
      xen: correct race in alloc_p2m_pmd()
      xen: use correct type for physical addresses
      xen: check for zero sized area when invalidating memory

Julia Lawall (1):
      usb: gadget: fix misspelling of current function in string

Julien CHAUVEAU (1):
      clk: rockchip: add CLK_IGNORE_UNUSED flag to fix rk3066/rk3188 USB Host

Kan Liang (1):
      perf/x86/intel: Fix bug for "cycles:p" and "cycles:pp" on SLM

Keith Busch (14):
      blk-mq: Exit queue on alloc failure
      blk-mq: Export freeze/unfreeze functions
      NVMe: Fix double free irq
      blk-mq: Wake tasks entering queue on dying
      blk-mq: Export if requests were started
      blk-mq: Let drivers cancel requeue_work
      blk-mq: Allow requests to never expire
      blk-mq: End unstarted requests on a dying queue
      NVMe: Start all requests
      NVMe: Reference count admin queue usage
      NVMe: Admin queue removal handling
      NVMe: Command abort handling fixes
      NVMe: Start and stop h/w queues on reset
      NVMe: Fix locking on abort handling

Kevin Hao (1):
      Revert "clk: ppc-corenet: Fix Section mismatch warning"

Krzysztof Kozlowski (1):
      mmc: sdhci: Fix sleep in atomic after inserting SD card

Larry Finger (1):
      rtlwifi: Fix error when accessing unmapped memory in skb

Lars-Peter Clausen (1):
      iio: ad799x: Fix ad7991/ad7995/ad7999 config setup

Lee Duncan (1):
      target: Allow Write Exclusive non-reservation holders to READ

Lennart Sorensen (3):
      ARM: omap5/dra7xx: Fix frequency typos
      ARM: dra7xx: Fix counter frequency drift for AM572x errata i856
      ARM: omap5/dra7xx: Enable booting secondary CPU in HYP mode

Linus Torvalds (1):
      Linux 3.19-rc5

Linus Walleij (1):
      ARM: nomadik: fix up leftover device tree pins

Louis Langholtz (1):
      kernel: avoid overflow in cmp_range

Malcolm Priestley (2):
      staging: vt6655: vnt_tx_packet Fix corrupted tx packets.
      staging: vt6655: Fix loss of distant/weak access points on channel change.

Marc Zyngier (2):
      arm64: KVM: Fix TLB invalidation by IPA/VMID
      arm64: KVM: Fix HCR setting for 32bit guests

Mario Schuknecht (1):
      usb: gadget: gadgetfs: Free memory allocated by memdup_user()

Martin Schwidefsky (1):
      s390/mm: avoid using pmd_to_page for !USE_SPLIT_PMD_PTLOCKS

Mathias Nyman (1):
      xhci: Check if slot is already in default state before moving it there

Maxime Ripard (1):
      usb: phy: Fix deferred probing

Michael Ellerman (1):
      powerpc: Work around gcc bug in current_thread_info()

Michael Holzheu (2):
      s390/bpf: Fix ALU_NEG (A = -A)
      s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X)

Mike Krinkin (1):
      staging: vt6655: fix sparse warnings: incorrect argument type

Miklos Szeredi (2):
      fuse: fix LOOKUP vs INIT compat handling
      fuse: add memory barrier to INIT

Ming Lei (1):
      block: fix checking return value of blk_mq_init_queue

Mugunthan V N (2):
      ARM: dts: dra7-evm: fix qspi device tree partition size
      drivers: net: cpsw: fix multicast flush in dual emac mode

Namhyung Kim (4):
      perf probe: Propagate error code when write(2) failed
      perf tools: Fix building error in x86_64 when dwarf unwind is on
      perf machine: Fix __machine__findnew_thread() error path
      perf tools: Fix segfault for symbol annotation on TUI

NeilBrown (1):
      locks: fix NULL-deref in generic_delete_lease

Nicholas Bellinger (4):
      vhost-scsi: Add missing virtio-scsi -> TCM attribute conversion
      Documentation/target: Update fabric_ops to latest code
      target: Drop arbitrary maximum I/O size limit
      target: Drop left-over fabric_max_sectors attribute

Nilesh Javali (1):
      MAINTAINERS: Update maintainer list for qla4xxx

Nishanth Menon (2):
      MAINTAINERS: Add linux-omap to list of reviewers for TI Thermal
      ARM: omap2plus_defconfig: use CONFIG_CPUFREQ_DT

Nobuhiro Iwamatsu (2):
      sh-eth: Set fdr_value of R-Car SoCs
      sh_eth: Fix access to TRSCER register

Octavian Purdila (2):
      gpio: dln2: fix issue when an IRQ is unmasked then enabled
      gpio: dln2: use bus_sync_unlock instead of scheduling work

Pablo Neira Ayuso (4):
      netfilter: conntrack: fix race between confirmation and flush
      netfilter: nfnetlink: validate nfnetlink header from batch
      netfilter: nfnetlink: relax strict multicast group check from netlink_bind
      netfilter: nf_tables: fix flush ruleset chain dependencies

Peter Chen (2):
      usb: gadget: f_uac1: access freed memory at f_audio_free_inst
      Revert "usb: chipidea: remove duplicate dev_set_drvdata for host_start"

Peter Hurley (2):
      tty: Prevent hw state corruption in exclusive mode reopen
      Revert "tty: Fix pty master poll() after slave closes v2"

Philipp Zabel (1):
      ARM: dts: imx6qdl: Fix CODA960 interrupt order

Prashant Sreedharan (3):
      tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
      tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
      tg3: Release tp->lock before invoking synchronize_irq()

Preston Fick (1):
      USB: cp210x: fix ID for production CEL MeshConnect USB Stick

Rasmus Villemoes (2):
      usb: musb: Fix a few off-by-one lengths
      kernfs: Fix kernfs_name_compare

Reinhard Speyerer (1):
      USB: qcserial/option: make AT URCs work for Sierra Wireless MC73xx

Robert Baldyga (1):
      usb: dwc2: gadget: kill requests with 'force' in s3c_hsotg_udc_stop()

Romain Perier (1):
      clk: rockchip: Fix clock gate for rk3188 hclk_emem_peri

Sagi Grimberg (1):
      MAINTAINERS: Add entry for iSER target driver

Sebastian Andrzej Siewior (1):
      usb: musb: stuff leak of struct usb_hcd

Sergej Pupykin (1):
      tty: Add support for the WCH384 4S multi-IO card

Simon Guinot (1):
      leds: netxbig: fix oops at probe time

Songjun Wu (1):
      usb: gadget: udc: atmel: fix possible oops when unloading module

Stanimir Varbanov (1):
      clk: fix possible null pointer dereference

Stefan Agner (1):
      net: fec: fix MDIO bus assignement for dual fec SoC's

Stephane Eranian (1):
      perf/rapl: Fix sysfs_show() initialization for RAPL PMU

Steven Rostedt (Red Hat) (5):
      ftrace: Fix updating of filters for shared global_ops filters
      ftrace: Check both notrace and filter for old hash
      ftrace/jprobes/x86: Fix conflict between jprobes and function
graph tracing
      tracing: Remove extra call to init_ftrace_syscalls()
      tracing: Fix enabling of syscall events on the command line

Sukadev Bhattiprolu (1):
      perf tools powerpc: Use dwfl_report_elf() instead of offline.

Thierry Reding (1):
      usb: phy: Restore deferred probing path

Thomas Falcon (1):
      MAINTAINERS: add me as ibmveth maintainer

Thomas Graf (1):
      openvswitch: packet messages need their own probe attribtue

Thomas Petazzoni (1):
      mmc: sdhci-pxav3: do the mbus window configuration after enabling clocks

Tim Kryger (1):
      mmc: sdhci: Set SDHCI_POWER_ON with external vmmc

Tomas Winkler (1):
      mei: add ABI documentation for fw_status exported through sysfs

Tony Lindgren (3):
      usb: musb: Fix randconfig build issues for Kconfig options
      ARM: OMAP2+: Fix n900 board name for legacy user space
      ARM: dts: Revert disabling of smc91x for n900

Trond Myklebust (5):
      LOCKD: Fix a race when initialising nlmsvc_timeout
      NFSv4.1: Fix client id trunking on Linux
      NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_client
      NFSv4/v4.1: Verify the client owner id during trunking detection
      NFSv4: Remove incorrect check in can_open_delegated()

Tyler Baker (1):
      reset: sunxi: fix spinlock initialization

Vasu Dev (1):
      i40e: adds FCoE configure option

Vignesh R (1):
      phy: phy-ti-pipe3: fix inconsistent enumeration of PCIe gen2 cards

Vince Hsu (1):
      usb: host: ehci-tegra: request deferred probe when failing to get phy

Vineet Gupta (2):
      perf tools: Elide strlcpy warning with uclibc
      perf tools: Avoid build splat for syscall numbers with uclibc

Vinod Koul (1):
      MAINTAINERS: dmaengine: fix the header file for dmaengine

Vitaly Kuznetsov (1):
      x86/xen: avoid freeing static 'name' when kasprintf() fails

Vivek Gautam (1):
      arm: dts: Use pmu_system_controller phandle for dp phy

Wang Nan (1):
      perf test: Fix dwarf unwind using libunwind.

Wenyou Yang (1):
      ARM: at91: board-dt-sama5: add phy_fixup to override NAND_Tree

Will Deacon (2):
      arm64: compat: wire up compat_sys_execveat
      mm: mmu_gather: use tlb->end != 0 only for TLB invalidation

Xiubo Li (1):
      ARM: ls1021a: dtsi: add 'big-endian' property for scfg node

Yoshihiro Shimoda (2):
      thermal: rcar: fix ENR register value
      thermal: rcar: change type of ctemp in rcar_thermal_update_temp()

Zhang Rui (3):
      ACPI/int340x_thermal: enumerate INT340X devices even if they're
not in _ART/_TRT
      ACPI/int340x_thermal: enumerate INT3401 for Intel SoC DTS thermal driver
      int340x_thermal/processor_thermal_device: return failure when

dann frazier (1):
      tools: testing: selftests: mq_perf_tests: Fix infinite loop on ARM

leroy christophe (1):
      netfilter: nf_tables: fix port natting in little endian archs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-18  9:17 Linux 3.19-rc5 Linus Torvalds
@ 2015-01-19 18:02 ` Bruno Prémont
  2015-01-20  6:24   ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Bruno Prémont @ 2015-01-19 18:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Sun, 18 January 2015 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Another week, another -rc.
> 
> Fairly normal release, although I'd wish that by rc5 we'd have calmed
> down even further.  But no, with some of the driver tree merges in
> particular, this is actually larger than rc4 was.
> 
> That said, it's not like there is anything particularly scary in here.
> 
> The arm64 vm bug that I mentioned as pending in the rc4 notes got
> fixed within a day of that previous rc release, and the rest looks
> pretty standard. Mostly drivers (networking, usb, scsi target, block
> layer, mmc,  tty etc), but also arch updates (arm, x86, s390 and some
> tiny powerpc fixes), some filesystem updates (fuse and nfs), tracing
> fixes, and some perf tooling fixes.
> 
> Shortlog with the details appended.
> 
> Go forth and test.

No idea yet which rc is the offender (nor exact patch), but on my not
so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
converting my laptop into a heater.

lspci for affected nodes:
02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)


Very basics I have, before I attempt any bisection:

cat /proc/$(pidof pccardd)/stack
[<c143ec95>] pccardd+0x1a5/0x3e0
[<c10543a0>] kthread+0xa0/0xc0
[<c16cd840>] ret_from_kernel_thread+0x20/0x30
[<ffffffff>] 0xffffffff


Doing objdump on drivers/pcmcia/cs.o in which pccardd() is defined
it seems pccardd is stuck in try_to_freeze_unsafe().

Does this kind of behavior ring a bell?

I can unbind the two yenta_cardbus devices from yenta_cardbus to stop
the CPU usage.


Possibly important config option:
# CONFIG_SMP is not set




Objdump below:

static int pccardd(void *__skt)
{
     ca0:       55                      push   %ebp
     ca1:       89 e5                   mov    %esp,%ebp
     ca3:       57                      push   %edi
     ca4:       56                      push   %esi
     ca5:       53                      push   %ebx
     ca6:       89 c3                   mov    %eax,%ebx
     ca8:       83 ec 24                sub    $0x24,%esp

DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
        return this_cpu_read_stable(current_task);
     cab:       a1 00 00 00 00          mov    0x0,%eax
        struct pcmcia_socket *skt = __skt;
        int ret;

        skt->thread = current;
     cb0:       89 83 e4 00 00 00       mov    %eax,0xe4(%ebx)
        skt->socket = dead_socket;
     cb6:       a1 00 00 00 00          mov    0x0,%eax
        skt->ops->init(skt);
     cbb:       8b 93 cc 00 00 00       mov    0xcc(%ebx),%edx
{
        struct pcmcia_socket *skt = __skt;
        int ret;

        skt->thread = current;
        skt->socket = dead_socket;
     cc1:       89 43 04                mov    %eax,0x4(%ebx)
     cc4:       a1 04 00 00 00          mov    0x4,%eax
     cc9:       89 43 08                mov    %eax,0x8(%ebx)
     ccc:       a1 08 00 00 00          mov    0x8,%eax
     cd1:       89 43 0c                mov    %eax,0xc(%ebx)
        skt->ops->init(skt);
     cd4:       89 d8                   mov    %ebx,%eax
     cd6:       ff 12                   call   *(%edx)
        skt->ops->set_socket(skt, &skt->socket);
     cd8:       8b 8b cc 00 00 00       mov    0xcc(%ebx),%ecx
     cde:       8d 53 04                lea    0x4(%ebx),%edx
     ce1:       89 d8                   mov    %ebx,%eax
     ce3:       ff 51 0c                call   *0xc(%ecx)

        /* register with the device core */
        ret = device_register(&skt->dev);
     ce6:       8d 83 2c 01 00 00       lea    0x12c(%ebx),%eax
     cec:       89 45 e0                mov    %eax,-0x20(%ebp)
     cef:       e8 fc ff ff ff          call   cf0 <pccardd+0x50>
        if (ret) {
     cf4:       85 c0                   test   %eax,%eax
     cf6:       0f 85 d4 02 00 00       jne    fd0 <pccardd+0x330>
                           "PCMCIA: unable to register socket\n");
                skt->thread = NULL;
                complete(&skt->thread_done);
                return 0;
        }
        ret = pccard_sysfs_add_socket(&skt->dev);
     cfc:       8b 45 e0                mov    -0x20(%ebp),%eax
     cff:       e8 fc ff ff ff          call   d00 <pccardd+0x60>
        if (ret)
     d04:       85 c0                   test   %eax,%eax
                           "PCMCIA: unable to register socket\n");
                skt->thread = NULL;
                complete(&skt->thread_done);
                return 0;
        }
        ret = pccard_sysfs_add_socket(&skt->dev);
     d06:       89 45 e4                mov    %eax,-0x1c(%ebp)
        if (ret)
     d09:       0f 85 01 03 00 00       jne    1010 <pccardd+0x370>
                dev_warn(&skt->dev, "err %d adding socket attributes\n", ret);

        complete(&skt->thread_done);
     d0f:       8d 83 e8 00 00 00       lea    0xe8(%ebx),%eax
     d15:       e8 fc ff ff ff          call   d16 <pccardd+0x76>

        /* wait for userspace to catch up */
        msleep(250);
     d1a:       b8 fa 00 00 00          mov    $0xfa,%eax
     d1f:       e8 fc ff ff ff          call   d20 <pccardd+0x80>

        set_freezable();
     d24:       e8 fc ff ff ff          call   d25 <pccardd+0x85>
     d29:       8d 83 fc 00 00 00       lea    0xfc(%ebx),%eax
     d2f:       8b 3d 00 00 00 00       mov    0x0,%edi
     d35:       89 45 e8                mov    %eax,-0x18(%ebp)
     d38:       89 7d dc                mov    %edi,-0x24(%ebp)
     d3b:       90                      nop
     d3c:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
        for (;;) {
                unsigned long flags;
                unsigned int events;
                unsigned int sysfs_events;

                set_current_state(TASK_INTERRUPTIBLE);
     d40:       be 40 0d 00 00          mov    $0xd40,%esi
     d45:       89 b7 8c 0c 00 00       mov    %esi,0xc8c(%edi)
     d4b:       c7 07 01 00 00 00       movl   $0x1,(%edi)
        /*
         * "=rm" is safe here, because "pop" adjusts the stack before
         * it evaluates its effective address -- this is part of the
         * documented behavior of the "pop" instruction.
         */
        asm volatile("# __raw_save_flags\n\t"
     d51:       9c                      pushf  
     d52:       58                      pop    %eax
                     :"memory", "cc");
}

static inline void native_irq_disable(void)
{
        asm volatile("cli": : :"memory");
     d53:       fa                      cli    
 * The various preempt_count add/sub methods
 */

static __always_inline void __preempt_count_add(int val)
{
        raw_cpu_add_4(__preempt_count, val);
     d54:       ff 05 00 00 00 00       incl   0x0

                spin_lock_irqsave(&skt->thread_lock, flags);
                events = skt->thread_events;
     d5a:       8b 8b f4 00 00 00       mov    0xf4(%ebx),%ecx
                skt->thread_events = 0;
     d60:       31 d2                   xor    %edx,%edx
                sysfs_events = skt->sysfs_events;
     d62:       8b b3 f8 00 00 00       mov    0xf8(%ebx),%esi

                set_current_state(TASK_INTERRUPTIBLE);

                spin_lock_irqsave(&skt->thread_lock, flags);
                events = skt->thread_events;
                skt->thread_events = 0;
     d68:       89 93 f4 00 00 00       mov    %edx,0xf4(%ebx)
                unsigned int sysfs_events;

                set_current_state(TASK_INTERRUPTIBLE);

                spin_lock_irqsave(&skt->thread_lock, flags);
                events = skt->thread_events;
     d6e:       89 4d ec                mov    %ecx,-0x14(%ebp)
                skt->thread_events = 0;
                sysfs_events = skt->sysfs_events;
                skt->sysfs_events = 0;
     d71:       31 c9                   xor    %ecx,%ecx
     d73:       89 8b f8 00 00 00       mov    %ecx,0xf8(%ebx)
        return flags;
}

static inline void native_restore_fl(unsigned long flags)
{
        asm volatile("push %0 ; popf"
     d79:       50                      push   %eax
     d7a:       9d                      popf   
                spin_unlock_irqrestore(&skt->thread_lock, flags);

                mutex_lock(&skt->skt_mutex);
     d7b:       8b 45 e8                mov    -0x18(%ebp),%eax
}

static __always_inline void __preempt_count_sub(int val)
{
        raw_cpu_add_4(__preempt_count, -val);
     d7e:       ff 0d 00 00 00 00       decl   0x0
     d84:       e8 fc ff ff ff          call   d85 <pccardd+0xe5>
                if (events & SS_DETECT)
     d89:       f6 45 ec 80             testb  $0x80,-0x14(%ebp)
     d8d:       0f 85 f5 00 00 00       jne    e88 <pccardd+0x1e8>
                        socket_detect_change(skt);

                if (sysfs_events) {
     d93:       85 f6                   test   %esi,%esi
     d95:       0f 84 85 00 00 00       je     e20 <pccardd+0x180>
                        if (sysfs_events & PCMCIA_UEVENT_EJECT)
     d9b:       f7 c6 01 00 00 00       test   $0x1,%esi
     da1:       0f 85 21 01 00 00       jne    ec8 <pccardd+0x228>
                                socket_remove(skt);
                        if (sysfs_events & PCMCIA_UEVENT_INSERT)
     da7:       f7 c6 02 00 00 00       test   $0x2,%esi
     dad:       0f 85 28 01 00 00       jne    edb <pccardd+0x23b>
                                socket_insert(skt);
                        if ((sysfs_events & PCMCIA_UEVENT_SUSPEND) &&
     db3:       f7 c6 04 00 00 00       test   $0x4,%esi
     db9:       74 25                   je     de0 <pccardd+0x140>
     dbb:       f6 43 11 80             testb  $0x80,0x11(%ebx)
     dbf:       75 1f                   jne    de0 <pccardd+0x140>
                                !(skt->state & SOCKET_CARDBUS)) {
                                if (skt->callback)
     dc1:       8b 93 14 01 00 00       mov    0x114(%ebx),%edx
     dc7:       85 d2                   test   %edx,%edx
     dc9:       0f 84 e1 01 00 00       je     fb0 <pccardd+0x310>
                                        ret = skt->callback->suspend(skt);
     dcf:       89 d8                   mov    %ebx,%eax
     dd1:       ff 52 14                call   *0x14(%edx)
                                else
                                        ret = 0;
                                if (!ret) {
     dd4:       85 c0                   test   %eax,%eax
                        if (sysfs_events & PCMCIA_UEVENT_INSERT)
                                socket_insert(skt);
                        if ((sysfs_events & PCMCIA_UEVENT_SUSPEND) &&
                                !(skt->state & SOCKET_CARDBUS)) {
                                if (skt->callback)
                                        ret = skt->callback->suspend(skt);
     dd6:       89 45 e4                mov    %eax,-0x1c(%ebp)
                                else
                                        ret = 0;
                                if (!ret) {
     dd9:       0f 84 d1 01 00 00       je     fb0 <pccardd+0x310>
     ddf:       90                      nop
                                        socket_suspend(skt);
                                        msleep(100);
                                }
                        }
                        if ((sysfs_events & PCMCIA_UEVENT_RESUME) &&
     de0:       f7 c6 08 00 00 00       test   $0x8,%esi
     de6:       74 0c                   je     df4 <pccardd+0x154>
                                !(skt->state & SOCKET_CARDBUS)) {
     de8:       8b 43 10                mov    0x10(%ebx),%eax
                                if (!ret) {
                                        socket_suspend(skt);
                                        msleep(100);
                                }
                        }
                        if ((sysfs_events & PCMCIA_UEVENT_RESUME) &&
     deb:       f6 c4 80                test   $0x80,%ah
     dee:       0f 84 24 01 00 00       je     f18 <pccardd+0x278>
                                !(skt->state & SOCKET_CARDBUS)) {
                                ret = socket_resume(skt);
                                if (!ret && skt->callback)
                                        skt->callback->resume(skt);
                        }
                        if ((sysfs_events & PCMCIA_UEVENT_REQUERY) &&
     df4:       f7 c6 10 00 00 00       test   $0x10,%esi
     dfa:       74 24                   je     e20 <pccardd+0x180>
     dfc:       f6 43 11 80             testb  $0x80,0x11(%ebx)
     e00:       75 1e                   jne    e20 <pccardd+0x180>
                                !(skt->state & SOCKET_CARDBUS)) {
                                if (!ret && skt->callback)
     e02:       8b 4d e4                mov    -0x1c(%ebp),%ecx
     e05:       85 c9                   test   %ecx,%ecx
     e07:       75 17                   jne    e20 <pccardd+0x180>
     e09:       8b 93 14 01 00 00       mov    0x114(%ebx),%edx
     e0f:       85 d2                   test   %edx,%edx
     e11:       74 0d                   je     e20 <pccardd+0x180>
                                        skt->callback->requery(skt);
     e13:       89 d8                   mov    %ebx,%eax
     e15:       ff 52 0c                call   *0xc(%edx)
     e18:       90                      nop
     e19:       8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
                        }
                }
                mutex_unlock(&skt->skt_mutex);
     e20:       8b 45 e8                mov    -0x18(%ebp),%eax
     e23:       e8 fc ff ff ff          call   e24 <pccardd+0x184>

                if (events || sysfs_events)
     e28:       8b 45 ec                mov    -0x14(%ebp),%eax
     e2b:       09 f0                   or     %esi,%eax
     e2d:       0f 85 0d ff ff ff       jne    d40 <pccardd+0xa0>
                        continue;

                if (kthread_should_stop())
     e33:       e8 fc ff ff ff          call   e34 <pccardd+0x194>
     e38:       84 c0                   test   %al,%al
     e3a:       0f 85 30 01 00 00       jne    f70 <pccardd+0x2d0>
                        break;

                schedule();
     e40:       e8 fc ff ff ff          call   e41 <pccardd+0x1a1>
 * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
 * If try_to_freeze causes a lockdep warning it means the caller may deadlock
 */
static inline bool try_to_freeze_unsafe(void)
{
        might_sleep();
     e45:       31 c9                   xor    %ecx,%ecx



Stack as obtained points to here.





     e47:       ba 38 00 00 00          mov    $0x38,%edx
     e4c:       b8 3c 01 00 00          mov    $0x13c,%eax
     e51:       e8 fc ff ff ff          call   e52 <pccardd+0x1b2>
     e56:       e8 fc ff ff ff          call   e57 <pccardd+0x1b7>
 *
 * Atomically reads the value of @v.
 */
static inline int atomic_read(const atomic_t *v)
{
        return ACCESS_ONCE((v)->counter);
     e5b:       a1 00 00 00 00          mov    0x0,%eax
/*
 * Check if there is a request to freeze a process
 */
static inline bool freezing(struct task_struct *p)
{
        if (likely(!atomic_read(&system_freezing_cnt)))
     e60:       85 c0                   test   %eax,%eax
     e62:       0f 84 d8 fe ff ff       je     d40 <pccardd+0xa0>
                return false;
        return freezing_slow_path(p);
     e68:       8b 45 dc                mov    -0x24(%ebp),%eax
     e6b:       e8 fc ff ff ff          call   e6c <pccardd+0x1cc>
 * If try_to_freeze causes a lockdep warning it means the caller may deadlock
 */
...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-19 18:02 ` Bruno Prémont
@ 2015-01-20  6:24   ` Linus Torvalds
  2015-01-21 20:37     ` Bruno Prémont
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-01-20  6:24 UTC (permalink / raw)
  To: Bruno Prémont; +Cc: Linux Kernel Mailing List

On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont
<bonbons@linux-vserver.org> wrote:
>
> No idea yet which rc is the offender (nor exact patch), but on my not
> so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> converting my laptop into a heater.
>
> lspci for affected nodes:
> 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
>
> Very basics I have, before I attempt any bisection:

Hmm. I'm not seeing anything recent changing anything in this area, so
I suspect that unless somebody else steps up and says "Ahh, that
sounds like xyz", your bisection is the best option.

                   Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-20  6:24   ` Linus Torvalds
@ 2015-01-21 20:37     ` Bruno Prémont
  2015-01-21 21:37       ` Bruno Prémont
  0 siblings, 1 reply; 31+ messages in thread
From: Bruno Prémont @ 2015-01-21 20:37 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra; +Cc: Linux Kernel Mailing List

On Tue, 20 January 2015 Linus Torvalds wrote:
> On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> >
> > No idea yet which rc is the offender (nor exact patch), but on my not
> > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > converting my laptop into a heater.
> >
> > lspci for affected nodes:
> > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> >
> > Very basics I have, before I attempt any bisection:
> 
> Hmm. I'm not seeing anything recent changing anything in this area, so
> I suspect that unless somebody else steps up and says "Ahh, that
> sounds like xyz", your bisection is the best option.

I've done most of the bisection and ended up in the scheduler changes.
CCing Peter.

A few iterations from the end kernel is warning about (might?)sleeping during
locking or so (scrolling too fast to read end making access to console
hard).

My current bisection log is:
git bisect start
# bad: [117af36e3bceb4fceb1c63489d6f3d94ed259a4c] Apple-GMUX: inhibit VGA arbitration changes
git bisect bad 117af36e3bceb4fceb1c63489d6f3d94ed259a4c
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d
# bad: [c0222ac086669a631814bbf857f8c8023452a4d7] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect bad c0222ac086669a631814bbf857f8c8023452a4d7
# bad: [2183a58803c2bbd87c2d0057eed6779ec4718d4d] Merge tag 'media/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad 2183a58803c2bbd87c2d0057eed6779ec4718d4d
# good: [6da314122ddc11936c6f054753bbb956a499d020] Merge tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 6da314122ddc11936c6f054753bbb956a499d020
# bad: [cbfe0de303a55ed96d8831c2d5f56f8131cd6612] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad cbfe0de303a55ed96d8831c2d5f56f8131cd6612
# bad: [9e66645d72d3c395da92b0f8855c787f4b5f0e89] Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 9e66645d72d3c395da92b0f8855c787f4b5f0e89
# good: [5706ffd045c3810912c4982357d7daa721af3464] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 5706ffd045c3810912c4982357d7daa721af3464
# bad: [a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa
# bad: [acb32132ec0433c03bed750f3e9508dc29db0328] sched/deadline: Add deadline rq status print
git bisect bad acb32132ec0433c03bed750f3e9508dc29db0328
# good: [1029a2b52c09e479fd7b07275812ad97868c0fb0] sched, exit: Deal with nested sleeps
git bisect good 1029a2b52c09e479fd7b07275812ad97868c0fb0


That means, the remaining commits would be:
 commit acb32132ec0433c03bed750f3e9508dc29db0328
 Author: Wanpeng Li <wanpeng.li@linux.intel.com>
 Date:   Fri Oct 31 06:39:33 2014 +0800

     sched/deadline: Add deadline rq status print

 ...

 commit e23738a7300a7591a57a22f47b813fd1b53ec404
 Author: Peter Zijlstra <peterz@infradead.org>
 Date:   Wed Sep 24 10:18:50 2014 +0200

     sched, inotify: Deal with nested sleeps


It looks like pccardd might be doing the wrong thing for
locking/sleeping/waiting.


Trying to complete bisection.

Bruno

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-21 20:37     ` Bruno Prémont
@ 2015-01-21 21:37       ` Bruno Prémont
  2015-01-21 22:12         ` Davidlohr Bueso
  0 siblings, 1 reply; 31+ messages in thread
From: Bruno Prémont @ 2015-01-21 21:37 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: Linux Kernel Mailing List, tglx, ilya.dryomov, umgwanakikbuti, oleg

On Wed, 21 January 2015 Bruno Prémont wrote:
> On Tue, 20 January 2015 Linus Torvalds wrote:
> > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> > >
> > > No idea yet which rc is the offender (nor exact patch), but on my not
> > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > > converting my laptop into a heater.
> > >
> > > lspci for affected nodes:
> > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > >
> > > Very basics I have, before I attempt any bisection:
> > 
> > Hmm. I'm not seeing anything recent changing anything in this area, so
> > I suspect that unless somebody else steps up and says "Ahh, that
> > sounds like xyz", your bisection is the best option.

Bisecting to the end did point me at (the warning traces produced in great
quantities might not be the very same issue as the abusive CPU usage, but
certainly look very related):
  [CCing people on CC for the patch]

commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Sep 24 10:18:55 2014 +0200

    sched: Debug nested sleeps
    
    Validate we call might_sleep() with TASK_RUNNING, which catches places
    where we nest blocking primitives, eg. mutex usage in a wait loop.
    
    Since all blocking is arranged through task_struct::state, nesting
    this will cause the inner primitive to set TASK_RUNNING and the outer
    will thus not block.
    
    Another observed problem is calling a blocking function from
    schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
    then destroy the task state for the actual __schedule() call that
    comes after it.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: tglx@linutronix.de
    Cc: ilya.dryomov@inktank.com
    Cc: umgwanakikbuti@gmail.com
    Cc: oleg@redhat.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20140924082242.591637616@infradead.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

Which does produce the following trace (hand-copied most important parts of it):
  Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
  do not call blocking ops when !TASK_RUNNING; state=1 set at [<c1436390>] pccardd+0xa0/0x3e0
  ...
  Call trace:
    ...
    __might_sleep+0x143/0x170
    ? pccardd+0xa0/0x3e0
    ? pccardd+0xa0/0x3e0
    mutex_lock+0x17/0x2a
    pccardd+0xe9/0x3e0
    ? pcmcia_socket_uevent+0x30/0x30

pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
Peter's patch wants to warn about.


Bruno

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-21 21:37       ` Bruno Prémont
@ 2015-01-21 22:12         ` Davidlohr Bueso
  2015-01-21 22:54           ` Peter Hurley
  0 siblings, 1 reply; 31+ messages in thread
From: Davidlohr Bueso @ 2015-01-21 22:12 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: Linus Torvalds, Peter Zijlstra, Linux Kernel Mailing List, tglx,
	ilya.dryomov, umgwanakikbuti, oleg

On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
> On Wed, 21 January 2015 Bruno Prémont wrote:
> > On Tue, 20 January 2015 Linus Torvalds wrote:
> > > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> > > >
> > > > No idea yet which rc is the offender (nor exact patch), but on my not
> > > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > > > converting my laptop into a heater.
> > > >
> > > > lspci for affected nodes:
> > > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20)
> > > >
> > > > Very basics I have, before I attempt any bisection:
> > > 
> > > Hmm. I'm not seeing anything recent changing anything in this area, so
> > > I suspect that unless somebody else steps up and says "Ahh, that
> > > sounds like xyz", your bisection is the best option.
> 
> Bisecting to the end did point me at (the warning traces produced in great
> quantities might not be the very same issue as the abusive CPU usage, but
> certainly look very related):
>   [CCing people on CC for the patch]
> 
> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Wed Sep 24 10:18:55 2014 +0200
> 
>     sched: Debug nested sleeps
>     
>     Validate we call might_sleep() with TASK_RUNNING, which catches places
>     where we nest blocking primitives, eg. mutex usage in a wait loop.
>     
>     Since all blocking is arranged through task_struct::state, nesting
>     this will cause the inner primitive to set TASK_RUNNING and the outer
>     will thus not block.
>     
>     Another observed problem is calling a blocking function from
>     schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
>     then destroy the task state for the actual __schedule() call that
>     comes after it.
>     
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>     Cc: tglx@linutronix.de
>     Cc: ilya.dryomov@inktank.com
>     Cc: umgwanakikbuti@gmail.com
>     Cc: oleg@redhat.com
>     Cc: Linus Torvalds <torvalds@linux-foundation.org>
>     Link: http://lkml.kernel.org/r/20140924082242.591637616@infradead.org
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> Which does produce the following trace (hand-copied most important parts of it):
>   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
>   do not call blocking ops when !TASK_RUNNING; state=1 set at [<c1436390>] pccardd+0xa0/0x3e0
>   ...
>   Call trace:
>     ...
>     __might_sleep+0x143/0x170
>     ? pccardd+0xa0/0x3e0
>     ? pccardd+0xa0/0x3e0
>     mutex_lock+0x17/0x2a
>     pccardd+0xe9/0x3e0
>     ? pcmcia_socket_uevent+0x30/0x30
> 
> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
> Peter's patch wants to warn about.

Yeah setting current to interruptable so early in the game is bogus. It
should be set after unlocking the skt_mutex.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-21 22:12         ` Davidlohr Bueso
@ 2015-01-21 22:54           ` Peter Hurley
  2015-01-30  1:22             ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Hurley @ 2015-01-21 22:54 UTC (permalink / raw)
  To: Davidlohr Bueso, Bruno Prémont, Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, tglx, ilya.dryomov,
	umgwanakikbuti, oleg

On 01/21/2015 05:12 PM, Davidlohr Bueso wrote:
> On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
>> On Wed, 21 January 2015 Bruno Prémont wrote:
>>> On Tue, 20 January 2015 Linus Torvalds wrote:
>>>> On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
>>>>>
>>>>> No idea yet which rc is the offender (nor exact patch), but on my not
>>>>> so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
>>>>> converting my laptop into a heater.

[...]

>> Bisecting to the end did point me at (the warning traces produced in great
>> quantities might not be the very same issue as the abusive CPU usage, but
>> certainly look very related):
>>   [CCing people on CC for the patch]
>>
>> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
>> Author: Peter Zijlstra <peterz@infradead.org>
>> Date:   Wed Sep 24 10:18:55 2014 +0200

[...]

>> Which does produce the following trace (hand-copied most important parts of it):
>>   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
>>   do not call blocking ops when !TASK_RUNNING; state=1 set at [<c1436390>] pccardd+0xa0/0x3e0
>>   ...
>>   Call trace:
>>     ...
>>     __might_sleep+0x143/0x170
>>     ? pccardd+0xa0/0x3e0
>>     ? pccardd+0xa0/0x3e0
>>     mutex_lock+0x17/0x2a
>>     pccardd+0xe9/0x3e0
>>     ? pcmcia_socket_uevent+0x30/0x30
>>
>> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
>> Peter's patch wants to warn about.
> 
> Yeah setting current to interruptable so early in the game is bogus. It
> should be set after unlocking the skt_mutex.

Yeah, but the debug check is triggering worse behavior, requiring
bisecting back to the debug commit.

How does the might_sleep() check here guarantee the task won't sleep?

Regards,
Peter Hurley


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:22             ` Rafael J. Wysocki
@ 2015-01-30  1:12               ` Linus Torvalds
  2015-01-30  1:25                 ` Linus Torvalds
  2015-01-30  1:49                 ` Rafael J. Wysocki
  0 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-01-30  1:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Hurley, Davidlohr Bueso, Peter Zijlstra,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
>>
>> Yeah, but the debug check is triggering worse behavior, requiring
>> bisecting back to the debug commit.
>
> Yes, it is.
>
> So I'm wondering is anyone is working on fixing this in any way?
>
> It kind of sucks when this is happening on an otherwise perfectly usable
> old(ish) machine ...

The WARN() was already changed to a WARN_ONCE().

So that debug check doesn't cause problems any more. If somebody is
bisecting something else, and the WARN() is a problem for those
intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
should get you past that point.

IOW, this really shouldn't be an issue.

Does the pccard thing still not work?

                           Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-21 22:54           ` Peter Hurley
@ 2015-01-30  1:22             ` Rafael J. Wysocki
  2015-01-30  1:12               ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-01-30  1:22 UTC (permalink / raw)
  To: Peter Hurley, Davidlohr Bueso, Peter Zijlstra
  Cc: Bruno Prémont, Linus Torvalds, Linux Kernel Mailing List,
	tglx, ilya.dryomov, umgwanakikbuti, oleg

On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
> On 01/21/2015 05:12 PM, Davidlohr Bueso wrote:
> > On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
> >> On Wed, 21 January 2015 Bruno Prémont wrote:
> >>> On Tue, 20 January 2015 Linus Torvalds wrote:
> >>>> On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> >>>>>
> >>>>> No idea yet which rc is the offender (nor exact patch), but on my not
> >>>>> so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> >>>>> converting my laptop into a heater.
> 
> [...]
> 
> >> Bisecting to the end did point me at (the warning traces produced in great
> >> quantities might not be the very same issue as the abusive CPU usage, but
> >> certainly look very related):
> >>   [CCing people on CC for the patch]
> >>
> >> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
> >> Author: Peter Zijlstra <peterz@infradead.org>
> >> Date:   Wed Sep 24 10:18:55 2014 +0200
> 
> [...]
> 
> >> Which does produce the following trace (hand-copied most important parts of it):
> >>   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
> >>   do not call blocking ops when !TASK_RUNNING; state=1 set at [<c1436390>] pccardd+0xa0/0x3e0
> >>   ...
> >>   Call trace:
> >>     ...
> >>     __might_sleep+0x143/0x170
> >>     ? pccardd+0xa0/0x3e0
> >>     ? pccardd+0xa0/0x3e0
> >>     mutex_lock+0x17/0x2a
> >>     pccardd+0xe9/0x3e0
> >>     ? pcmcia_socket_uevent+0x30/0x30
> >>
> >> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
> >> Peter's patch wants to warn about.
> > 
> > Yeah setting current to interruptable so early in the game is bogus. It
> > should be set after unlocking the skt_mutex.
> 
> Yeah, but the debug check is triggering worse behavior, requiring
> bisecting back to the debug commit.

Yes, it is.

So I'm wondering is anyone is working on fixing this in any way?

It kind of sucks when this is happening on an otherwise perfectly usable
old(ish) machine ...

Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:12               ` Linus Torvalds
@ 2015-01-30  1:25                 ` Linus Torvalds
  2015-01-30  1:35                   ` Linus Torvalds
                                     ` (5 more replies)
  2015-01-30  1:49                 ` Rafael J. Wysocki
  1 sibling, 6 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-01-30  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ingo Molnar
  Cc: Peter Hurley, Davidlohr Bueso, Peter Zijlstra,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

[-- Attachment #1: Type: text/plain, Size: 1844 bytes --]

On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The WARN() was already changed to a WARN_ONCE().

Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
always happening.

So I think the right fix is to:

 - warn once like we do

 - but *not* do that __set_current_state() which was always total crap anyway

Why do I say "total crap"? Because of two independent issues:

 (a) it actually changes behavior for a debug vs non-debug kernel,
which is a really bad idea to begin with

 (b) it's really wrong. The whole "nested sleep" case was never a
major bug to begin with, just a possible inefficiency where constant
nested sleeps would possibly make the outer sleep not sleep. But that
"could possibly make" case was the unlikely case, and the debug patch
made it happen *all* the time by explicitly setting things running.

So I think the proper patch is the attached.

The comment is also crap. The comment says

    "Blocking primitives will set (and therefore destroy) current->state [...]"

but the reality is that they *may* set it, and only in the unlikely
slow-path where they actually block.

So doing this in "__may_sleep()" is just bogus and horrible horrible
crap. It turns the "harmless ugliness" into a real *harmful* bug. The
key word of "__may_sleep()" is that "MAY" part. It's a debug thing to
make relatively rare cases show up.

PeterZ, please don't make "debugging" patches like this. Ever again.
Because this was just stupid, and it took me too long to realize that
despite the warning being shut up, the debug patch was still actively
doing bad bad things.

Ingo, maybe you'd want to apply this through the scheduler tree, the
way you already did the WARN_ONCE() thing.

Bruno, does this finally actually fix your pccard thing?

                              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 845 bytes --]

 kernel/sched/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc00566e..76aaf5c61114 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7292,13 +7292,12 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 	 * since we will exit with TASK_RUNNING make sure we enter with it,
 	 * otherwise we will destroy state.
 	 */
-	if (WARN_ONCE(current->state != TASK_RUNNING,
+	WARN_ONCE(current->state != TASK_RUNNING,
 			"do not call blocking ops when !TASK_RUNNING; "
 			"state=%lx set at [<%p>] %pS\n",
 			current->state,
 			(void *)current->task_state_change,
-			(void *)current->task_state_change))
-		__set_current_state(TASK_RUNNING);
+			(void *)current->task_state_change);
 
 	___might_sleep(file, line, preempt_offset);
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:25                 ` Linus Torvalds
@ 2015-01-30  1:35                   ` Linus Torvalds
  2015-01-30  2:06                   ` Rafael J. Wysocki
                                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-01-30  1:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ingo Molnar
  Cc: Peter Hurley, Davidlohr Bueso, Peter Zijlstra,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thu, Jan 29, 2015 at 5:25 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Ingo, maybe you'd want to apply this through the scheduler tree, the
> way you already did the WARN_ONCE() thing.

Side note: I can obviously just commit it myself, but for things that
have obvious maintainers I tend to try to try to push the changes
through channels.

                            Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:12               ` Linus Torvalds
  2015-01-30  1:25                 ` Linus Torvalds
@ 2015-01-30  1:49                 ` Rafael J. Wysocki
  2015-02-02  9:48                   ` Zdenek Kabelac
  1 sibling, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-01-30  1:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Hurley, Davidlohr Bueso, Peter Zijlstra,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
> >>
> >> Yeah, but the debug check is triggering worse behavior, requiring
> >> bisecting back to the debug commit.
> >
> > Yes, it is.
> >
> > So I'm wondering is anyone is working on fixing this in any way?
> >
> > It kind of sucks when this is happening on an otherwise perfectly usable
> > old(ish) machine ...
> 
> The WARN() was already changed to a WARN_ONCE().
> 
> So that debug check doesn't cause problems any more. If somebody is
> bisecting something else, and the WARN() is a problem for those
> intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
> should get you past that point.
> 
> IOW, this really shouldn't be an issue.
> 
> Does the pccard thing still not work?

Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP
unset, the problem with 99+% CPU load from pccardd goes away, so thanks for
the hint.

Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:25                 ` Linus Torvalds
  2015-01-30  1:35                   ` Linus Torvalds
@ 2015-01-30  2:06                   ` Rafael J. Wysocki
  2015-01-30 15:47                   ` Oleg Nesterov
                                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-01-30  2:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Peter Hurley, Davidlohr Bueso, Peter Zijlstra,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thursday, January 29, 2015 05:25:07 PM Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > The WARN() was already changed to a WARN_ONCE().
> 
> Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
> always happening.
> 
> So I think the right fix is to:
> 
>  - warn once like we do
> 
>  - but *not* do that __set_current_state() which was always total crap anyway
> 
> Why do I say "total crap"? Because of two independent issues:
> 
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with
> 
>  (b) it's really wrong. The whole "nested sleep" case was never a
> major bug to begin with, just a possible inefficiency where constant
> nested sleeps would possibly make the outer sleep not sleep. But that
> "could possibly make" case was the unlikely case, and the debug patch
> made it happen *all* the time by explicitly setting things running.
> 
> So I think the proper patch is the attached.

I applied the patch and compiled the kernel with CONFIG_DEBUG_ATOMIC_SLEEP
set again.  The warning is back, so the DEBUG_ATOMIC_SLEEP thing appears to
be working (and I don't really care about the freaking warning anyway), but the
pccardd load issue is gone, so the patch fixes the problem for me.  Thanks!

Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:25                 ` Linus Torvalds
  2015-01-30  1:35                   ` Linus Torvalds
  2015-01-30  2:06                   ` Rafael J. Wysocki
@ 2015-01-30 15:47                   ` Oleg Nesterov
  2015-01-31 18:32                     ` Linus Torvalds
  2015-01-31  9:16                   ` Bruno Prémont
                                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2015-01-30 15:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, Peter Hurley, Davidlohr Bueso,
	Peter Zijlstra, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On 01/29, Linus Torvalds wrote:
>
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with

Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
too...

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:25                 ` Linus Torvalds
                                     ` (2 preceding siblings ...)
  2015-01-30 15:47                   ` Oleg Nesterov
@ 2015-01-31  9:16                   ` Bruno Prémont
  2015-01-31  9:48                   ` Peter Zijlstra
  2015-02-05 21:14                   ` Bruno Prémont
  5 siblings, 0 replies; 31+ messages in thread
From: Bruno Prémont @ 2015-01-31  9:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, Peter Hurley, Davidlohr Bueso,
	Peter Zijlstra, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thu, 29 Jan 2015 17:25:07 Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote:
> >
> > The WARN() was already changed to a WARN_ONCE().
> 
> Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
> always happening.
> 
> So I think the right fix is to:
> 
>  - warn once like we do
> 
>  - but *not* do that __set_current_state() which was always total
> crap anyway
> 
> Why do I say "total crap"? Because of two independent issues:
> 
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with
> 
>  (b) it's really wrong. The whole "nested sleep" case was never a
> major bug to begin with, just a possible inefficiency where constant
> nested sleeps would possibly make the outer sleep not sleep. But that
> "could possibly make" case was the unlikely case, and the debug patch
> made it happen *all* the time by explicitly setting things running.
> 
> So I think the proper patch is the attached.
> 
> The comment is also crap. The comment says
> 
>     "Blocking primitives will set (and therefore destroy)
> current->state [...]"
> 
> but the reality is that they *may* set it, and only in the unlikely
> slow-path where they actually block.
> 
> So doing this in "__may_sleep()" is just bogus and horrible horrible
> crap. It turns the "harmless ugliness" into a real *harmful* bug. The
> key word of "__may_sleep()" is that "MAY" part. It's a debug thing to
> make relatively rare cases show up.
> 
> PeterZ, please don't make "debugging" patches like this. Ever again.
> Because this was just stupid, and it took me too long to realize that
> despite the warning being shut up, the debug patch was still actively
> doing bad bad things.
> 
> Ingo, maybe you'd want to apply this through the scheduler tree, the
> way you already did the WARN_ONCE() thing.
> 
> Bruno, does this finally actually fix your pccard thing?

I will report back on Wednesday when I'm back home from FOSDEM. I don't
have the affected machine at hand at the moment.

Thanks for looking into it!
Bruno

>                               Linus


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:25                 ` Linus Torvalds
                                     ` (3 preceding siblings ...)
  2015-01-31  9:16                   ` Bruno Prémont
@ 2015-01-31  9:48                   ` Peter Zijlstra
  2015-02-03 11:18                     ` Ingo Molnar
  2015-02-05 21:14                   ` Bruno Prémont
  5 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2015-01-31  9:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, Peter Hurley, Davidlohr Bueso,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote:
> PeterZ, please don't make "debugging" patches like this. Ever again.
> Because this was just stupid, and it took me too long to realize that
> despite the warning being shut up, the debug patch was still actively
> doing bad bad things.

Understood.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30 15:47                   ` Oleg Nesterov
@ 2015-01-31 18:32                     ` Linus Torvalds
  2015-01-31 20:16                       ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-01-31 18:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Rafael J. Wysocki, Ingo Molnar, Peter Hurley, Davidlohr Bueso,
	Peter Zijlstra, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
> too...

Ugh. That thing is horrible. The naming doesn't make it obvious at all
that it's actually making sure that we have state set to TASK_RUNNING,
and I could easily imagine that it would cause similar "busy-loops
while scheduling" issues if anybody ever uses it in the wrong context.

So I really think that whole thing is a sign of "the debug
infrastructure is buggy, and people are introducing fragile things to
just shut up the false positives".

I don't know how to fix it. I really get the feeling that the whole
new "nested sleep" detection code was a mistake to begin with, since
it wasn't even a real bug, and it has now created more bugs than it
ever detected afaik.

                              Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-31 18:32                     ` Linus Torvalds
@ 2015-01-31 20:16                       ` Peter Zijlstra
  2015-01-31 21:54                         ` Linus Torvalds
  2015-02-01 19:43                         ` Oleg Nesterov
  0 siblings, 2 replies; 31+ messages in thread
From: Peter Zijlstra @ 2015-01-31 20:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote:
> On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
> > too...
> 
> Ugh. That thing is horrible. The naming doesn't make it obvious at all
> that it's actually making sure that we have state set to TASK_RUNNING,
> and I could easily imagine that it would cause similar "busy-loops
> while scheduling" issues if anybody ever uses it in the wrong context.

The alternative was putting unconditional __set_task_state(TASK_RUNNING)
things in a few code paths. I didn't want to cause the extra code in
case we didn't need them. Particularly the include/net/sock.h one, as I
know the network people are cycle counters.

But this function should indeed be used very rarely. Currently we're at
5 instances.

> So I really think that whole thing is a sign of "the debug
> infrastructure is buggy, and people are introducing fragile things to
> just shut up the false positives".

Aside from this one annotation, which is used in cases where we broke
out of the wait loop but haven't reset task state yet, there have not
really been false positives.

All the stuff it flagged are genuinely wrong, albeit not disastrously
so, things mostly just work.

Should I make the default wait_event() safe against sleeps? It would
make it slightly more expensive, but on the whole that should not really
be a problem.

Alternatively we should provide an alternative to wait_event() that
allows sleeps, but I'm failing to come up with a good name.

> I don't know how to fix it. I really get the feeling that the whole
> new "nested sleep" detection code was a mistake to begin with, since
> it wasn't even a real bug, and it has now created more bugs than it
> ever detected afaik.

I appreciate I caused you some pain here, and I'm very sorry for that.
But I do think we should be little more careful with task::state, on
occasion things do go wrong there and funny stuff happens.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-31 20:16                       ` Peter Zijlstra
@ 2015-01-31 21:54                         ` Linus Torvalds
  2015-02-02 13:19                           ` Peter Zijlstra
  2015-02-01 19:43                         ` Oleg Nesterov
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-01-31 21:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> All the stuff it flagged are genuinely wrong, albeit not disastrously
> so, things mostly just work.

I really disagree.

They weren't wrong. They *could* occasionally result in extra
reschedules, but that was never wrong to begin with.

But the debugging code made that much much worse, and made that "could
result in extra reschedules" happen all the time if it triggered.

And the whole "set task to sleep early" thing was rather intentional,
and was a fundamental part of the original design for
"select()/poll()" behavior, for example. Yes, we ended up having the
waitqueue functions and using that "pollwake()" and use
"wq->triggered" etc instead, but the whole optimistic early sleep
thing worked reasonably well for a long time even when the "early
TASK_SLEEP" was done before calling *thousands* of random poll
routines.

So you say "genuinely wrong", and I say "but that's how things were
designed - it's an optimistic approach, not an exact one". Your
debugging code changed that behavior, and actually introduced a real
bug, exactly because you felt that the "no nested sleeps" was a harder
requirement than it has ever actually been.

In other words, I think the debugging code itself is wrong, and then
that sched_annotate_sleep() thing is just a symptom of how it is
wrong. If you have to sprinkle these kinds of random workarounds in a
few core scheduler places (ok, mainly "wait()" paths, it looks like),
why would you expect random *drivers* to have to care about things
that even core kernel code says "I'm not going to care about this,
I'll just shut the warning up, because the warning is wrong".

Yes, the fact that select/poll was changed to try to avoid excessive
polling scheduling because it actually *was* a problem under some
loads does say that we generally want to try to avoid nested sleeping.
Because while it is rare and the optimistic approach works fine in
most cases, it certainly *can* become a problem if the optimistic "I'm
normally not going to sleep" thing ends up not being sufficiently
accurate.

So don't get me wrong - I think the whole "add debug code to find
places where we might have issues" was well worth it, and resulted in
improvements.

But once the low-hanging fruit and the core code that everybody hits
has been fixed, and people cannot apparently even be bothered with the
other cases it finds (like the pccardd case), at that point the value
of the debug code becomes rather less obvious.

And the downsides become bigger.

The pccardd example is an example of legacy use of our old and
original semantics of how the whole nested sleep was supposed to work.
And it *does* work. It's not a bug. It's how things have worked time
immemorial, and clearly nobody is really willing to bother with
changing working - but legacy - cardbus code.  And at that point, I
think the debug code is actually *wrong*, and causes more problems
than it "fixes".

And debug code that causes more problems that it fixes should either
be removed, or improved to the point where the problems go away.

The "improved" part might be about saying "it's actually perfectly
_fine_ to have nested sleeps, as long as it is truly rare that the
nested sleep actually sleeps". And thus make the debug code really
test that it's *rare*, rather than test that it never happens. Warn if
it happens more than a couple of times a second for any particular
process, or something like that.

Hmm?

                                  Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-31 20:16                       ` Peter Zijlstra
  2015-01-31 21:54                         ` Linus Torvalds
@ 2015-02-01 19:43                         ` Oleg Nesterov
  2015-02-01 20:09                           ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2015-02-01 19:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On 01/31, Peter Zijlstra wrote:
>
> On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote:
> > On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
> > > too...
> >
> > Ugh. That thing is horrible. The naming doesn't make it obvious at all
> > that it's actually making sure that we have state set to TASK_RUNNING,
> > and I could easily imagine that it would cause similar "busy-loops
> > while scheduling" issues if anybody ever uses it in the wrong context.
>
> The alternative was putting unconditional __set_task_state(TASK_RUNNING)
> things in a few code paths. I didn't want to cause the extra code in
> case we didn't need them. Particularly the include/net/sock.h one, as I
> know the network people are cycle counters.

And personally I agree. sched_annotate_sleep() looks self-documented, it
is clear that it is used to suppress the warning.

Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP
adds?

Oleg.


--- x/include/linux/kernel.h
+++ x/include/linux/kernel.h
@@ -176,7 +176,7 @@ extern int _cond_resched(void);
  */
 # define might_sleep() \
 	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
-# define sched_annotate_sleep()	__set_current_state(TASK_RUNNING)
+# define sched_annotate_sleep()	(current->task_state_change = 0)
 #else
   static inline void ___might_sleep(const char *file, int line,
 				   int preempt_offset) { }
--- x/kernel/sched/core.c
+++ x/kernel/sched/core.c
@@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int
 	 * since we will exit with TASK_RUNNING make sure we enter with it,
 	 * otherwise we will destroy state.
 	 */
-	if (WARN_ONCE(current->state != TASK_RUNNING,
+	if (WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change,
 			"do not call blocking ops when !TASK_RUNNING; "
 			"state=%lx set at [<%p>] %pS\n",
 			current->state,


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-02-01 19:43                         ` Oleg Nesterov
@ 2015-02-01 20:09                           ` Linus Torvalds
  2015-02-01 20:19                             ` Oleg Nesterov
  2015-02-02 15:34                             ` Peter Zijlstra
  0 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-02-01 20:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> And personally I agree. sched_annotate_sleep() looks self-documented, it
> is clear that it is used to suppress the warning.

But *that's not the problem*.

If it was just silencing the warning, things would be fine.

But it is actively screwing task state up, and actually changing
behavior of the kernel (in a very subtle part of the code too), and
doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE
DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE.

That's the thing that upsets me. This is debug code. It's not even
debugging things that are "buggy" - its' just finding things that can
be potentially slightly inefficient. And it already introduced a
subtle bug once.

(Ok, it wasn't *that* subtle in the end, but it needed both a good
bisection result and some reading of unrelated source code to figure
out, so it certainly wasn't really obvious).

> Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP
> adds?

Now this patch I agree with 100% percent, apart from you calling it a
"subtle change". It's more than a "subtle change", this was adding a
real and present bug that took two weeks to get fixed!

(And by "took two weeks to get fixed" I mean "not actually fixed yet,
but now we at least know what was going on").

I like your patch, but I'm going to combine it with mine that actually
fixes a real bug, because what you don't see in that patch of yours:

> --- x/include/linux/kernel.h
> +++ x/include/linux/kernel.h
> @@ -176,7 +176,7 @@ extern int _cond_resched(void);
>   */
>  # define might_sleep() \
>         do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
> -# define sched_annotate_sleep()        __set_current_state(TASK_RUNNING)
> +# define sched_annotate_sleep()        (current->task_state_change = 0)
>  #else
>    static inline void ___might_sleep(const char *file, int line,
>                                    int preempt_offset) { }
> --- x/kernel/sched/core.c
> +++ x/kernel/sched/core.c
> @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int
>          * since we will exit with TASK_RUNNING make sure we enter with it,
>          * otherwise we will destroy state.
>          */
> -       if (WARN_ONCE(current->state != TASK_RUNNING,
> +       if (WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change,
>                         "do not call blocking ops when !TASK_RUNNING; "
>                         "state=%lx set at [<%p>] %pS\n",
>                         current->state,

is that the whole "if (WARN_ONCE()" remains horribly buggy, because of
the line that follows it:

                __set_current_state(TASK_RUNNING);

in the if-statement.

Now, I have the patch that removes that thing (but I was hoping to get
it from the scheduler tree before doing rc7, which seems to not have
happened), but yes, that together with your patch seems like it should
fix all the nasty bug-inducing crud where the "debugging helpers" end
up silently changing core process state.

I'll just combine it with yours to avoid extra noise in this area, and
mark you as the author, fixing *both* of the incorrect state changes.
Ok?

                          Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-02-01 20:09                           ` Linus Torvalds
@ 2015-02-01 20:19                             ` Oleg Nesterov
  2015-02-02 15:34                             ` Peter Zijlstra
  1 sibling, 0 replies; 31+ messages in thread
From: Oleg Nesterov @ 2015-02-01 20:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On 02/01, Linus Torvalds wrote:
>
> On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > And personally I agree. sched_annotate_sleep() looks self-documented, it
> > is clear that it is used to suppress the warning.
>
> But *that's not the problem*.
>
> If it was just silencing the warning, things would be fine.
>
> But it is actively screwing task state up, and actually changing
> behavior of the kernel (in a very subtle part of the code too), and
> doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE
> DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE.

I understand, that is why I suggested to change it.

> I like your patch, but I'm going to combine it with mine that actually
> fixes a real bug,

Sure, I agree.

> because what you don't see in that patch of yours:
> ...
>
>
> is that the whole "if (WARN_ONCE()" remains horribly buggy, because of
> the line that follows it:
>
>                 __set_current_state(TASK_RUNNING);
>
> in the if-statement.

Ah. I just forgot to mention that this change should be rediffed on top
of your patch, of course it is not enough.

> I'll just combine it with yours to avoid extra noise in this area, and
> mark you as the author, fixing *both* of the incorrect state changes.
> Ok?

Please combine them, but don't mark me as an author, I do not want to
take the false credits ;)

Oleg.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:49                 ` Rafael J. Wysocki
@ 2015-02-02  9:48                   ` Zdenek Kabelac
  0 siblings, 0 replies; 31+ messages in thread
From: Zdenek Kabelac @ 2015-02-02  9:48 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linus Torvalds
  Cc: Peter Hurley, Davidlohr Bueso, Peter Zijlstra,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

Dne 30.1.2015 v 02:49 Rafael J. Wysocki napsal(a):
> On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote:
>> On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
>>>>
>>>> Yeah, but the debug check is triggering worse behavior, requiring
>>>> bisecting back to the debug commit.
>>>
>>> Yes, it is.
>>>
>>> So I'm wondering is anyone is working on fixing this in any way?
>>>
>>> It kind of sucks when this is happening on an otherwise perfectly usable
>>> old(ish) machine ...
>>
>> The WARN() was already changed to a WARN_ONCE().
>>
>> So that debug check doesn't cause problems any more. If somebody is
>> bisecting something else, and the WARN() is a problem for those
>> intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
>> should get you past that point.
>>
>> IOW, this really shouldn't be an issue.
>>
>> Does the pccard thing still not work?
>
> Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP
> unset, the problem with 99+% CPU load from pccardd goes away, so thanks for
> the hint.


Ok - I  can now 'use'  3.19-rc7 on T61 (C2D) without having a single core 
occupied on 100% with pccardd, but I still get this one logged:


[    2.185320] pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c0000-0x0fffff:
[    2.185363]  excluding 0xc0000-0xcffff 0xe0000-0xfffff
[    2.185526] pcmcia_socket pcmcia_socket0: cs: memory probe 
0xa0000000-0xa0ffffff:
[    2.185559]  excluding 0xa0000000-0xa0ffffff
[    2.185720] pcmcia_socket pcmcia_socket0: cs: memory probe 
0x60000000-0x60ffffff:
[    2.185754]  excluding 0x60000000-0x60ffffff
[    2.297692] ------------[ cut here ]------------
[    2.297698] WARNING: CPU: 1 PID: 185 at kernel/sched/core.c:7300 
__might_sleep+0xae/0xc0()
[    2.297702] do not call blocking ops when !TASK_RUNNING; state=1 set at 
[<ffffffff81518e18>] pccardd+0xe8/0x490
[    2.297712] Modules linked in: uhci_hcd sr_mod cdrom ehci_pci ehci_hcd 
yenta_socket usbcore usb_common video backlight autofs4
[    2.297715] CPU: 1 PID: 185 Comm: pccardd Not tainted 
3.19.0-rc7-00002-g9afdf96 #5
[    2.297716] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 
03/18/2011
[    2.297719]  ffffffff819e4d8a ffff8800b8bdbc58 ffffffff8163926d 
ffffffff810a9e1e
[    2.297721]  ffff8800b8bdbca8 ffff8800b8bdbc98 ffffffff810587ba 
ffff880000000000
[    2.297724]  ffffffff819e6073 000000000000026d 0000000000000000 
0000000000000080
[    2.297725] Call Trace:
[    2.297729]  [<ffffffff8163926d>] dump_stack+0x4f/0x7b
[    2.297732]  [<ffffffff810a9e1e>] ? down_trylock+0x2e/0x40
[    2.297736]  [<ffffffff810587ba>] warn_slowpath_common+0x8a/0xc0
[    2.297738]  [<ffffffff81058836>] warn_slowpath_fmt+0x46/0x50
[    2.297741]  [<ffffffff81518e18>] ? pccardd+0xe8/0x490
[    2.297742]  [<ffffffff81518e18>] ? pccardd+0xe8/0x490
[    2.297744]  [<ffffffff8108412e>] __might_sleep+0xae/0xc0
[    2.297747]  [<ffffffff8163cd7e>] mutex_lock_nested+0x2e/0x440
[    2.297749]  [<ffffffff8164189d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[    2.297753]  [<ffffffff8108b42b>] ? preempt_count_sub+0xab/0x100
[    2.297755]  [<ffffffff81518e80>] pccardd+0x150/0x490
[    2.297757]  [<ffffffff81518d30>] ? pcmcia_socket_dev_complete+0x40/0x40
[    2.297760]  [<ffffffff8107db1f>] kthread+0x11f/0x140
[    2.297763]  [<ffffffff8107da00>] ? kthread_create_on_node+0x260/0x260
[    2.297766]  [<ffffffff8164242c>] ret_from_fork+0x7c/0xb0
[    2.297768]  [<ffffffff8107da00>] ? kthread_create_on_node+0x260/0x260
[    2.297770] ---[ end trace ed9e591061d223e6 ]---
[    2.464635] usb 3-1: new full-speed USB device number 2 using uhci_hcd




and of course number of these:

[   29.660708] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[   29.667061] i915 0000:00:02.0: registered panic notifier
[   30.314206] ------------[ cut here ]------------
[   30.318866] WARNING: CPU: 0 PID: 1010 at drivers/gpu/drm/drm_irq.c:1121 
drm_wait_one_vblank+0x180/0x190 [drm]()
[   30.329085] vblank not available on crtc 1, ret=-22
[   30.334003] Modules linked in: i915 i2c_algo_bit drm_kms_helper drm 
xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
  iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun brid
ge stp llc ipv6 ebtables ip6_tables iptable_filter ip_tables x_tables bnep 
iTCO_wdt hid_generic iTCO_vendor_support snd_hda_codec_analo
g snd_hda_codec_generic coretemp kvm_intel kvm microcode usbhid psmouse 
serio_raw hid arc4 r852 sm_common nand i2c_i801 nand_ecc i2c_co
re nand_ids mtd btusb bluetooth iwl3945 sdhci_pci r592 iwlegacy memstick 
pcmcia snd_hda_intel sdhci mac80211 snd_hda_controller mmc_cor
e snd_hda_codec snd_hwdep lpc_ich snd_seq mfd_core snd_seq_device snd_pcm 
e1000e cfg80211 ptp thinkpad_acpi snd_timer pps_core wmi nvra
m
[   30.406592]  snd soundcore evdev nfsd auth_rpcgss oid_registry nfs_acl 
lockd grace binfmt_misc loop sunrpc uhci_hcd sr_mod cdrom ehc
i_pci ehci_hcd yenta_socket usbcore usb_common video backlight autofs4
[   30.424031] CPU: 1 PID: 1010 Comm: Xorg.bin Tainted: G        W 
3.19.0-rc7-00002-g9afdf96 #5
[   30.432930] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 
03/18/2011
[   30.440627]  ffffffffa0e1017f ffff880135a37a28 ffffffff8163926d 
ffff88013abcf238
[   30.448250]  ffff880135a37a78 ffff880135a37a68 ffffffff810587ba 
0000000000000001
[   30.455780]  ffff8800af5b4000 ffff880135974000 0000000000000001 
ffff8800acca0000
[   30.463328] Call Trace:
[   30.465850]  [<ffffffff8163926d>] dump_stack+0x4f/0x7b
[   30.471066]  [<ffffffff810587ba>] warn_slowpath_common+0x8a/0xc0
[   30.477095]  [<ffffffff81058836>] warn_slowpath_fmt+0x46/0x50
[   30.482919]  [<ffffffffa0de5a10>] drm_wait_one_vblank+0x180/0x190 [drm]
[   30.489683]  [<ffffffffa0f11a70>] intel_enable_sdvo+0x70/0x120 [i915]
[   30.496206]  [<ffffffffa0ed6241>] ? intel_enable_pipe+0xf1/0x220 [i915]
[   30.502914]  [<ffffffffa0ede11b>] i9xx_crtc_enable+0x3fb/0x4b0 [i915]
[   30.509478]  [<ffffffffa0edc692>] __intel_set_mode+0x8a2/0xcb0 [i915]
[   30.515995]  [<ffffffffa0ee351b>] intel_crtc_set_config+0xc0b/0xf90 [i915]
[   30.522922]  [<ffffffffa0def259>] drm_mode_set_config_internal+0x69/0x120 [drm]
[   30.530310]  [<ffffffffa0e4e5d8>] restore_fbdev_mode+0xc8/0xf0 [drm_kms_helper]
[   30.537682]  [<ffffffff811d1607>] ? kfree+0xe7/0x3b0
[   30.542771]  [<ffffffffa0e50649>] 
drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[   30.552175]  [<ffffffffa0ef212e>] intel_fbdev_restore_mode+0x1e/0x50 [i915]
[   30.559259]  [<ffffffffa0f16fee>] i915_driver_lastclose+0xe/0x10 [i915]
[   30.565916]  [<ffffffffa0de1e1e>] drm_lastclose+0x2e/0x150 [drm]
[   30.571991]  [<ffffffffa0de2280>] drm_release+0x340/0x550 [drm]
[   30.577986]  [<ffffffff811ef5b3>] __fput+0xf3/0x210
[   30.582926]  [<ffffffff811ef71e>] ____fput+0xe/0x10
[   30.587931]  [<ffffffff8107bfdc>] task_work_run+0xcc/0x100
[   30.593479]  [<ffffffff81004181>] do_notify_resume+0x61/0x90
[   30.599241]  [<ffffffff816427c7>] int_signal+0x12/0x17
[   30.604456] ---[ end trace ed9e591061d223e7 ]---
[   30.609191] ------------[ cut here ]------------




Regards


Zdenek


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-31 21:54                         ` Linus Torvalds
@ 2015-02-02 13:19                           ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2015-02-02 13:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith, Julia Lawall,
	Gilles Muller, Nicolas Palix, Michal Marek

On Sat, Jan 31, 2015 at 01:54:36PM -0800, Linus Torvalds wrote:
> On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > All the stuff it flagged are genuinely wrong, albeit not disastrously
> > so, things mostly just work.
> 
> I really disagree.
> 
> They weren't wrong. They *could* occasionally result in extra
> reschedules, but that was never wrong to begin with.
> 
> But the debugging code made that much much worse, and made that "could
> result in extra reschedules" happen all the time if it triggered.

Yes, that was a bad choice. Again, sorry about that.

> So you say "genuinely wrong", and I say "but that's how things were
> designed - it's an optimistic approach, not an exact one". Your
> debugging code changed that behavior, and actually introduced a real
> bug, exactly because you felt that the "no nested sleeps" was a harder
> requirement than it has ever actually been.
> 
> In other words, I think the debugging code itself is wrong, and then
> that sched_annotate_sleep() thing is just a symptom of how it is
> wrong. If you have to sprinkle these kinds of random workarounds in a
> few core scheduler places (ok, mainly "wait()" paths, it looks like),
> why would you expect random *drivers* to have to care about things
> that even core kernel code says "I'm not going to care about this,
> I'll just shut the warning up, because the warning is wrong".

I was not aware this pattern was so wide spread / accepted. I have
totally underestimated the amount of fallout here.

> So don't get me wrong - I think the whole "add debug code to find
> places where we might have issues" was well worth it, and resulted in
> improvements.
> 
> But once the low-hanging fruit and the core code that everybody hits
> has been fixed, and people cannot apparently even be bothered with the
> other cases it finds (like the pccardd case), at that point the value
> of the debug code becomes rather less obvious.

My bad (again) for not paying proper attention there. No excuses for
that.

> The pccardd example is an example of legacy use of our old and
> original semantics of how the whole nested sleep was supposed to work.
> And it *does* work. It's not a bug. It's how things have worked time
> immemorial, and clearly nobody is really willing to bother with
> changing working - but legacy - cardbus code.  And at that point, I
> think the debug code is actually *wrong*, and causes more problems
> than it "fixes".
> 
> And debug code that causes more problems that it fixes should either
> be removed, or improved to the point where the problems go away.
> 
> The "improved" part might be about saying "it's actually perfectly
> _fine_ to have nested sleeps, as long as it is truly rare that the
> nested sleep actually sleeps". And thus make the debug code really
> test that it's *rare*, rather than test that it never happens. Warn if
> it happens more than a couple of times a second for any particular
> process, or something like that.

So the thing that makes it work is the fact that schedule is called in
a loop, without that loop things come apart very quickly.

Now most core primitives have this loop, and are safe from spurious
wakeups -- and I think we can class this under that. But there's heaps
of legacy code that has non looping calls of schedule and really breaks
with spurious wakeups (I ran into that a few years ago when I made
schedule() return 'randomly' for some other reason).

So I worry about separating nesting sleep in loops, which are mostly
harmless vs the non-loop case which is really bad.

FWIW the bug that started all this was someone calling blocking
functions from io_schedule():

	io_schedule() -> blk_flush_plug() -> block-layer-magic ->
	device-magic() -> mutex_lock().

And mutex_lock(() used:

	if (need_resched())
		schedule();

for preemption and would never return (because !TASK_RUNNING).

Now, we've since fixed the mutex code to not assume TASK_RUNNING, so we
won't actually trigger the reported 'deadlock' anymore.

In any case, I can certainly make the warning go away for now and try
again later with a (hopefully) more intelligent version.




On a related note, would it be possible to make sparse/coccinelle try
and warn on broken wait primitives? ie, no loop around schedule(), no
conditional after set_current_state()?

Examples:

drivers/atm/atmtcp.c-   while (test_bit(flag,&vcc->flags) == old_test) {
drivers/atm/atmtcp.c-           mb();
drivers/atm/atmtcp.c-           out_vcc = PRIV(vcc->dev) ? PRIV(vcc->dev)->vcc : NULL;
drivers/atm/atmtcp.c-           if (!out_vcc) {
drivers/atm/atmtcp.c-                   error = -EUNATCH;
drivers/atm/atmtcp.c-                   break;
drivers/atm/atmtcp.c-           }

		set_bit(flag, &vcc->flags);
		wake_up_process(foo);

drivers/atm/atmtcp.c-           set_current_state(TASK_UNINTERRUPTIBLE);
drivers/atm/atmtcp.c:           schedule();
drivers/atm/atmtcp.c-   }

Would not ever wake again it seems.



And I'm not entire sure what this is supposed to do, but it looks fishy
(seems to be popular though, I've found more instances of it):

drivers/iommu/amd_iommu_v2.c-static void put_device_state_wait(struct device_state *dev_state)
drivers/iommu/amd_iommu_v2.c-{
drivers/iommu/amd_iommu_v2.c-   DEFINE_WAIT(wait);
drivers/iommu/amd_iommu_v2.c-
drivers/iommu/amd_iommu_v2.c-   prepare_to_wait(&dev_state->wq, &wait, TASK_UNINTERRUPTIBLE);
drivers/iommu/amd_iommu_v2.c-   if (!atomic_dec_and_test(&dev_state->count))
drivers/iommu/amd_iommu_v2.c:           schedule();
drivers/iommu/amd_iommu_v2.c-   finish_wait(&dev_state->wq, &wait);
drivers/iommu/amd_iommu_v2.c-
drivers/iommu/amd_iommu_v2.c-   free_device_state(dev_state);
drivers/iommu/amd_iommu_v2.c-}

Did they want to write:

	if (atomic_dec_and_test(&dev_state->count)) {
		wake_up(&dev_state->wq);
		return;
	}

	wait_event(dev_state->wq, !atomic_read(&dev_state->count));



Also, people seem to have gotten the memo that sched_yield() is
undesired, but not actually understood why; there's a absolute ton of:

	while(!foo)
		schedule();

Some explicitly set TASK_RUNNING, most not so much.



I realize we're not ever going to fix all of that; but it would be good
to avoid growing more of it. And I think the amd_iommu_v2 thing shows
its not only legacy code.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-02-01 20:09                           ` Linus Torvalds
  2015-02-01 20:19                             ` Oleg Nesterov
@ 2015-02-02 15:34                             ` Peter Zijlstra
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2015-02-02 15:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Bruno Prémont, Linux Kernel Mailing List,
	Thomas Gleixner, Ilya Dryomov, Mike Galbraith

On Sun, Feb 01, 2015 at 12:09:32PM -0800, Linus Torvalds wrote:
> Now, I have the patch that removes that thing (but I was hoping to get
> it from the scheduler tree before doing rc7, which seems to not have
> happened), but yes, that together with your patch seems like it should
> fix all the nasty bug-inducing crud where the "debugging helpers" end
> up silently changing core process state.
> 
> I'll just combine it with yours to avoid extra noise in this area, and
> mark you as the author, fixing *both* of the incorrect state changes.
> Ok?

Ah I see it in your tree; I was about to suggest:

-# define sched_annotate_sleep()        __set_current_state(TASK_RUNNING)
+# define sched_annotate_sleep()        do { current->task_state_change = 0; } while (0)

Instead of the assignment, which has a rvalue.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-31  9:48                   ` Peter Zijlstra
@ 2015-02-03 11:18                     ` Ingo Molnar
  0 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2015-02-03 11:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Rafael J. Wysocki, Peter Hurley, Davidlohr Bueso,
	Bruno Prémont, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote:
>
> > PeterZ, please don't make "debugging" patches like this. Ever 
> > again. Because this was just stupid, and it took me too long 
> > to realize that despite the warning being shut up, the debug 
> > patch was still actively doing bad bad things.
> 
> Understood.

It was my fault too - I should have realized this side effect of 
the new debugging facility when I merged it, which side effect, 
as Linus pointed it out, is a really bad idea.

We are generally pretty good at doing transparent debugging in 
the scheduler and in the locking code, but screwed up this time 
around :-(

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-01-30  1:25                 ` Linus Torvalds
                                     ` (4 preceding siblings ...)
  2015-01-31  9:48                   ` Peter Zijlstra
@ 2015-02-05 21:14                   ` Bruno Prémont
  2015-02-06 11:50                     ` Peter Zijlstra
  5 siblings, 1 reply; 31+ messages in thread
From: Bruno Prémont @ 2015-02-05 21:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, Peter Hurley, Davidlohr Bueso,
	Peter Zijlstra, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thu, 29 January 2015 Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote:
> >
> > The WARN() was already changed to a WARN_ONCE().
> 
> Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
> always happening.
> 
> So I think the right fix is to:
> 
>  - warn once like we do
> 
>  - but *not* do that __set_current_state() which was always total crap anyway
> 
> Why do I say "total crap"? Because of two independent issues:
> 
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with
> 
>  (b) it's really wrong. The whole "nested sleep" case was never a
> major bug to begin with, just a possible inefficiency where constant
> nested sleeps would possibly make the outer sleep not sleep. But that
> "could possibly make" case was the unlikely case, and the debug patch
> made it happen *all* the time by explicitly setting things running.
> 
> So I think the proper patch is the attached.
> 
> The comment is also crap. The comment says
> 
>     "Blocking primitives will set (and therefore destroy) current->state [...]"
> 
> but the reality is that they *may* set it, and only in the unlikely
> slow-path where they actually block.
> 
> So doing this in "__may_sleep()" is just bogus and horrible horrible
> crap. It turns the "harmless ugliness" into a real *harmful* bug. The
> key word of "__may_sleep()" is that "MAY" part. It's a debug thing to
> make relatively rare cases show up.
> 
> PeterZ, please don't make "debugging" patches like this. Ever again.
> Because this was just stupid, and it took me too long to realize that
> despite the warning being shut up, the debug patch was still actively
> doing bad bad things.
> 
> Ingo, maybe you'd want to apply this through the scheduler tree, the
> way you already did the WARN_ONCE() thing.
> 
> Bruno, does this finally actually fix your pccard thing?

Tested the variant that was applied by running rc7 and it fixes the
endless loop.
The loop is now replaced by a single WARN() trace - I guess expected:

[    3.083647] ------------[ cut here ]------------
[    3.087477] WARNING: CPU: 0 PID: 67 at /usr/src/linux-git/kernel/sched/core.c:7300 __might_sleep+0x79/0x90()
[    3.091357] do not call blocking ops when !TASK_RUNNING; state=1 set at [<c1442040>] pccardd+0xa0/0x3e0
[    3.095232] Modules linked in:
[    3.099020] CPU: 0 PID: 67 Comm: pccardd Not tainted 3.19.0-rc7-00003-g67288c4 #17
[    3.102760] Hardware name: Acer TravelMate 660/TravelMate 660, BIOS 3A19 01/14/2004
[    3.106504]  c212def4 c212def4 c212deb4 c16caf23 c212dee4 c10416fd c1907334 c212df10
[    3.110315]  00000043 c1907380 00001c84 c105a099 c105a099 c1442040 00000001 f5f54bc0
[    3.114143]  c212defc c104176e 00000009 c212def4 c1907334 c212df10 c212df30 c105a099
[    3.117960] Call Trace:
[    3.121703]  [<c16caf23>] dump_stack+0x16/0x18
[    3.125447]  [<c10416fd>] warn_slowpath_common+0x7d/0xc0
[    3.129172]  [<c105a099>] ? __might_sleep+0x79/0x90
[    3.132868]  [<c105a099>] ? __might_sleep+0x79/0x90
[    3.136500]  [<c1442040>] ? pccardd+0xa0/0x3e0
[    3.140092]  [<c104176e>] warn_slowpath_fmt+0x2e/0x30
[    3.143657]  [<c105a099>] __might_sleep+0x79/0x90
[    3.147209]  [<c1442040>] ? pccardd+0xa0/0x3e0
[    3.150747]  [<c1442040>] ? pccardd+0xa0/0x3e0
[    3.154256]  [<c16cf447>] mutex_lock+0x17/0x2a
[    3.157734]  [<c1442089>] pccardd+0xe9/0x3e0
[    3.161207]  [<c1441fa0>] ? pcmcia_socket_uevent+0x30/0x30
[    3.164660]  [<c1056990>] kthread+0xa0/0xc0
[    3.168059]  [<c16d1040>] ret_from_kernel_thread+0x20/0x30
[    3.171436]  [<c10568f0>] ? kthread_worker_fn+0x140/0x140
[    3.174796] ---[ end trace c3f708b642e3d8f0 ]---

>From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check
is another issue left for the future.

Thanks,
Bruno

>                               Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-02-05 21:14                   ` Bruno Prémont
@ 2015-02-06 11:50                     ` Peter Zijlstra
  2015-02-06 16:02                       ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2015-02-06 11:50 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: Linus Torvalds, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Thu, Feb 05, 2015 at 10:14:36PM +0100, Bruno Prémont wrote:
> The loop is now replaced by a single WARN() trace - I guess expected:

> From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check
> is another issue left for the future.

Yeah, something like the below will make it go away -- under the
assumption that that comment is actually correct, I don't know, the
pcmcia people should probably write a better comment :/

Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
cares about that barrier, so make it go away.

---
 drivers/pcmcia/cs.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pcmcia/cs.c b/drivers/pcmcia/cs.c
index 5292db69c426..5678e161a17d 100644
--- a/drivers/pcmcia/cs.c
+++ b/drivers/pcmcia/cs.c
@@ -635,6 +635,12 @@ static int pccardd(void *__skt)
 		skt->sysfs_events = 0;
 		spin_unlock_irqrestore(&skt->thread_lock, flags);
 
+		/*
+		 * Supposedly this is a rarely contended mutex and
+		 * sleeping is therefore unlikely, the occasional
+		 * extra loop iteration is harmless.
+		 */
+		sched_annotate_sleep();
 		mutex_lock(&skt->skt_mutex);
 		if (events & SS_DETECT)
 			socket_detect_change(skt);
@@ -679,7 +685,7 @@ static int pccardd(void *__skt)
 		try_to_freeze();
 	}
 	/* make sure we are running before we exit */
-	set_current_state(TASK_RUNNING);
+	__set_current_state(TASK_RUNNING);
 
 	/* shut down socket, if a device is still present */
 	if (skt->state & SOCKET_PRESENT) {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-02-06 11:50                     ` Peter Zijlstra
@ 2015-02-06 16:02                       ` Linus Torvalds
  2015-02-06 16:39                         ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2015-02-06 16:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Bruno Prémont, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
> cares about that barrier, so make it go away.

I'd rather not mix this with the patch, and wonder if we should just
do that globally with some preprocessor magic. We do have a fair
number of "set_current_state(TASK_RUNNING)" and at least for the
*documented* reason for the memory barrier, all of them could/should
be barrier-less.

So something like

    if (__is_constant_p(state) && state == TASK_RUNNING)
        tsk->state = state;
    else
        set_mb(tsk->state, state);

might be more general solution than randomly doing one at a time when
changing code around it..

                         Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-02-06 16:02                       ` Linus Torvalds
@ 2015-02-06 16:39                         ` Peter Zijlstra
  2015-02-06 16:46                           ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2015-02-06 16:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bruno Prémont, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote:
> On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
> > cares about that barrier, so make it go away.
> 
> I'd rather not mix this with the patch, and wonder if we should just
> do that globally with some preprocessor magic. We do have a fair
> number of "set_current_state(TASK_RUNNING)" 

138

> and at least for the
> *documented* reason for the memory barrier, all of them could/should
> be barrier-less.
> 
> So something like
> 
>     if (__is_constant_p(state) && state == TASK_RUNNING)
>         tsk->state = state;
>     else
>         set_mb(tsk->state, state);
> 
> might be more general solution than randomly doing one at a time when
> changing code around it..

Yeah, or we could do the coccinelle thing and do a mass conversion.

I like the macro one though; I worry a wee bit about non-documented
cases through. If someone is doing something way subtle we'll break it
:/


---
Subject: sched: Avoid the full memory-barrier for set_current_state(TASK_RUNNING)

One should never need the full memory barrier implied by
set_current_state() to set TASK_RUNNING for the documented reason of
avoiding races against wakeup.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef98d2f..aea44c4eeed8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!(
 				((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
 				 (task->flags & PF_FROZEN) == 0)
 
+/*
+ * set_current_state() includes a barrier so that the write of current->state
+ * is correctly serialised wrt the caller's subsequent test of whether to
+ * actually sleep:
+ *
+ *	set_current_state(TASK_UNINTERRUPTIBLE);
+ *	if (do_i_need_to_sleep())
+ *		schedule();
+ *
+ * If the caller does not need such serialisation then use
+ * __set_current_state(). This is always true for TASK_RUNNING since
+ * there is no race against wakeup -- both write the same value.
+ */
+#define ___set_current_state(state)				\
+do {								\
+	if (__is_constant_p(state) && (state) == TASK_RUNNING)	\
+		current->state = (state);			\
+	else							\
+		set_mb(current->state, (state));		\
+} while (0)
+
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
 #define __set_task_state(tsk, state_value)			\
@@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!(
 		set_mb((tsk)->state, (state_value));		\
 	} while (0)
 
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- *	set_current_state(TASK_UNINTERRUPTIBLE);
- *	if (do_i_need_to_sleep())
- *		schedule();
- *
- * If the caller does not need such serialisation then use __set_current_state()
- */
 #define __set_current_state(state_value)			\
 	do {							\
 		current->task_state_change = _THIS_IP_;		\
@@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_current_state(state_value)				\
 	do {							\
 		current->task_state_change = _THIS_IP_;		\
-		set_mb(current->state, (state_value));		\
+		___set_current_state(state_value);		\
 	} while (0)
 
 #else
@@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_task_state(tsk, state_value)		\
 	set_mb((tsk)->state, (state_value))
 
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- *	set_current_state(TASK_UNINTERRUPTIBLE);
- *	if (do_i_need_to_sleep())
- *		schedule();
- *
- * If the caller does not need such serialisation then use __set_current_state()
- */
 #define __set_current_state(state_value)		\
 	do { current->state = (state_value); } while (0)
 #define set_current_state(state_value)			\
-	set_mb(current->state, (state_value))
+	___set_current_state(state_value);
 
 #endif
 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Linux 3.19-rc5
  2015-02-06 16:39                         ` Peter Zijlstra
@ 2015-02-06 16:46                           ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2015-02-06 16:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Bruno Prémont, Rafael J. Wysocki, Ingo Molnar, Peter Hurley,
	Davidlohr Bueso, Linux Kernel Mailing List, Thomas Gleixner,
	Ilya Dryomov, Mike Galbraith, Oleg Nesterov

On Fri, Feb 6, 2015 at 8:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> I like the macro one though; I worry a wee bit about non-documented
> cases through. If someone is doing something way subtle we'll break it
> :/

Yeah, I agree. And I wonder how much we should even care. People who
do this thing by hand - rather than using the wait-event model - are
*likely* legacy stuff or just random drivers, or simply stuff that
isn't all that performance-sensitive.

So I dunno how worthwhile this all is.

                     Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-02-06 16:46 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-18  9:17 Linux 3.19-rc5 Linus Torvalds
2015-01-19 18:02 ` Bruno Prémont
2015-01-20  6:24   ` Linus Torvalds
2015-01-21 20:37     ` Bruno Prémont
2015-01-21 21:37       ` Bruno Prémont
2015-01-21 22:12         ` Davidlohr Bueso
2015-01-21 22:54           ` Peter Hurley
2015-01-30  1:22             ` Rafael J. Wysocki
2015-01-30  1:12               ` Linus Torvalds
2015-01-30  1:25                 ` Linus Torvalds
2015-01-30  1:35                   ` Linus Torvalds
2015-01-30  2:06                   ` Rafael J. Wysocki
2015-01-30 15:47                   ` Oleg Nesterov
2015-01-31 18:32                     ` Linus Torvalds
2015-01-31 20:16                       ` Peter Zijlstra
2015-01-31 21:54                         ` Linus Torvalds
2015-02-02 13:19                           ` Peter Zijlstra
2015-02-01 19:43                         ` Oleg Nesterov
2015-02-01 20:09                           ` Linus Torvalds
2015-02-01 20:19                             ` Oleg Nesterov
2015-02-02 15:34                             ` Peter Zijlstra
2015-01-31  9:16                   ` Bruno Prémont
2015-01-31  9:48                   ` Peter Zijlstra
2015-02-03 11:18                     ` Ingo Molnar
2015-02-05 21:14                   ` Bruno Prémont
2015-02-06 11:50                     ` Peter Zijlstra
2015-02-06 16:02                       ` Linus Torvalds
2015-02-06 16:39                         ` Peter Zijlstra
2015-02-06 16:46                           ` Linus Torvalds
2015-01-30  1:49                 ` Rafael J. Wysocki
2015-02-02  9:48                   ` Zdenek Kabelac

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).