LKML Archive on lore.kernel.org
* Linux 3.18 released
@ 2014-12-08  0:10 Linus Torvalds
  2014-12-08 18:39 ` Vince Weaver
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2014-12-08  0:10 UTC
  To: Linux Kernel Mailing List

It's been a quiet week, and the patch from rc7 is tiny, so 3.18 is out.

I'd love to say that we've figured out the problem that plagues 3.17
for a couple of people, but we haven't. At the same time, there's
absolutely no point in having everybody else twiddling their thumbs
when a couple of people are actively trying to bisect an older issue,
so holding up the release just didn't make sense. Especially since
that would just have then held things up entirely over the holiday
break.

So the merge window for 3.19 is open, and DaveJ will hopefully get his
bisection done (or at least narrow things down sufficiently that we
have that "Ahaa" moment) over the next week. But in solidarity with
Dave (and to make my life easier too ;) let's try to avoid introducing
any _new_ nasty issues, ok?

                   Linus

---

Aaron Lu (1):
      ACPI / video: update condition to check if device is in _DOD list

Abhilash Kesavan (1):
      watchdog: s3c2410_wdt: Fix the mask bit offset for Exynos7

Al Viro (1):
      fat: fix oops on corrupted vfat fs

Alexander Kochetkov (2):
      i2c: omap: fix NACK and Arbitration Lost irq handling
      i2c: omap: fix i207 errata handling

Andrew Jackson (1):
      i2c: designware: prevent early stop on TX FIFO empty

Andrew Morton (2):
      mm/vmpressure.c: fix race in vmpressure_work_fn()
      drivers/input/evdev.c: don't kfree() a vmalloc address

Andrey Utkin (1):
      [media] Update MAINTAINERS for solo6x10

Andy Lutomirski (1):
      context_tracking: Restore previous state in schedule_user

Ben Skeggs (1):
      drm/nouveau/fifo/g84-: ack non-stall interrupt before handling it

Borislav Petkov (1):
      x86, microcode: Limit the microcode reloading to 64-bit for now

Chris Clayton (1):
      x86: Use $(OBJDUMP) instead of plain objdump

Christian König (1):
      drm/radeon: sync all BOs involved in a CS v2

Daniel Forrest (1):
      mm: fix anon_vma_clone() error treatment

Daniel Vetter (2):
      drm/i915: More cautious with pch fifo underruns
      drm/i915: Unlock panel even when LVDS is disabled

Darrick J. Wong (2):
      jbd2: fix regression where we fail to initialize checksum seed when loading
      block: fix regression where bio_integrity_process uses wrong bio_vec iterator

Dave Airlie (1):
      nouveau: move the hotplug ignore to correct place.

David Howells (3):
      KEYS: Fix the size of the key description passed to/from userspace
      KEYS: Simplify KEYRING_SEARCH_{NO,DO}_STATE_CHECK flags
      KEYS: request_key() should reget expired keys rather than give EKEYEXPIRED

David Härdeman (1):
      [media] rc-core: fix toggle handling in the rc6 decoder

Devin Ryles (1):
      AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller

Dmitry Torokhov (1):
      sata_fsl: fix error handling of irq_of_parse_and_map

Grygorii Strashko (1):
      i2c: davinci: generate STP always when NACK is received

Hans Verkuil (1):
      [media] cx23885: use sg = sg_next(sg) instead of sg++

Hariprasad Shenai (1):
      cxgb4: Fill in supported link mode for SFP modules

Huacai Chen (1):
      stmmac: platform: Move plat_dat checking earlier

Hugh Dickins (1):
      mm: fix swapoff hang after page migration and fork

Ian Campbell (1):
      of/fdt: memblock_reserve /memreserve/ regions in the case of partial overlap

Ilia Mirkin (1):
      drm/nouveau/gf116: remove copy1 engine

Kailang Yang (1):
      ALSA: hda/realtek - Add headset Mic support for new Dell machine

Krzysztof Hałasa (1):
      [media] solo6x10: fix a race in IRQ handler

Linus Torvalds (1):
      Linux 3.18

Maarten Lankhorst (1):
      drm/nouveau: prevent stale fence->channel pointers, and protect with rcu

Manfred Spraul (1):
      ipc/sem.c: fully initialize sem_array before making it visible

Masahiro Yamada (1):
      uapi: fix to export linux/vm_sockets.h

Mauro Carvalho Chehab (1):
      MAINTAINERS: Update mchehab's addresses

Michal Simek (1):
      lib/genalloc.c: export devm_gen_pool_create() for modules

Michel Dänzer (1):
      drm/radeon: Ignore RADEON_GEM_GTT_WC on 32-bit x86

Mitsuhiro Kimura (2):
      sh_eth: Fix skb alloc size and alignment adjust rule.
      sh_eth: Fix sleeping function called from invalid context

Nicolas Dichtel (1):
      rtnetlink: release net refcnt on error in do_setlink()

Paul Mackerras (1):
      slab: fix nodeid bounds check for non-contiguous node IDs

Petr Mladek (1):
      drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6

Rafael Aquini (1):
      mm: do not overwrite reserved pages counter at show_mem()

Sakari Ailus (1):
      [media] smiapp: Only some selection targets are settable

Sebastian Ott (1):
      s390: fix machine check handling

Seth Forshee (1):
      xen-netfront: Remove BUGs on paged skb data which crosses a page boundary

Tejun Heo (1):
      ahci: disable MSI on SAMSUNG 0xa800 SSD

Thierry Reding (1):
      PCI: tegra: Use physical range for I/O mapping

Thomas Graf (1):
      bond: Check length of IFLA_BOND_ARP_IP_TARGET attributes

Vishnu Motghare (1):
      i2c: cadence: Set the hardware time-out register to maximum value

Weijie Yang (1):
      mm: frontswap: invalidate expired data on a dup-store failure

sensoray-dev (1):
      [media] s2255drv: fix payload size for JPG, MJPEG


* Re: Linux 3.18 released
  2014-12-08  0:10 Linux 3.18 released Linus Torvalds
@ 2014-12-08 18:39 ` Vince Weaver
  2014-12-08 19:11   ` Vince Weaver
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Vince Weaver @ 2014-12-08 18:39 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar

On Sun, 7 Dec 2014, Linus Torvalds wrote:

> I'd love to say that we've figured out the problem that plagues 3.17
> for a couple of people, but we haven't. At the same time, there's
> absolutely no point in having everybody else twiddling their thumbs
> when a couple of people are actively trying to bisect an older issue,
> so holding up the release just didn't make sense. Especially since
> that would just have then held things up entirely over the holiday
> break.
> 
> So the merge window for 3.19 is open, and DaveJ will hopefully get his
> bisection done (or at least narrow things down sufficiently that we
> have that "Ahaa" moment) over the next week. But in solidarity with
> Dave (and to make my life easier too ;) let's try to avoid introducing
> any _new_ nasty issues, ok?

It's probably unrelated to DaveJ's issue, but my perf_event fuzzer still 
quickly locks the kernel pretty solid on 3.18.

Just 5 minutes of testing managed to trip over the following issue, which
dates back to at least 3.15-rc7.

My notes from the last time I tracked this down say:

  What happens is that in kernel/events/core.c, in find_get_context(),
  perf_lock_task_context() somehow returns NULL
  due to !atomic_inc_not_zero(&ctx->refcount),
  but task->perf_event_ctxp[ctxn] still has a valid value.
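
The state described in these notes can be reproduced in a small userspace
model. The sketch below is not the kernel code: struct ctx, struct task,
lock_task_ctx() and classify() are hypothetical stand-ins, and C11 atomics
replace the kernel's atomic_inc_not_zero(). It only shows how a lookup can
fail on a refcount that has already dropped to zero while the pointer to the
object is still published.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a perf context; only the refcount matters. */
struct ctx {
    atomic_int refcount;
};

/* Hypothetical stand-in for task->perf_event_ctxp[ctxn]. */
struct task {
    struct ctx *_Atomic ctxp;
};

/* Take a reference only if the count is still non-zero. */
static bool ctx_get_unless_zero(struct ctx *c)
{
    int old = atomic_load(&c->refcount);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool ctx_put(struct ctx *c)
{
    int old = atomic_fetch_sub(&c->refcount, 1);

    assert(old > 0);
    return old == 1;
}

/* Return the context with a reference held, or NULL if none can be taken. */
static struct ctx *lock_task_ctx(struct task *t)
{
    struct ctx *c = atomic_load(&t->ctxp);

    if (c != NULL && !ctx_get_unless_zero(c))
        return NULL; /* pointer still published, but the object is dying */
    return c;
}

enum lookup_state { CTX_ABSENT, CTX_LIVE, CTX_STALE };

/* Classify what a single lookup observes. */
static enum lookup_state classify(struct task *t)
{
    struct ctx *c = lock_task_ctx(t);

    if (c != NULL) {
        ctx_put(c); /* release the reference taken by the lookup */
        return CTX_LIVE;
    }
    return atomic_load(&t->ctxp) != NULL ? CTX_STALE : CTX_ABSENT;
}
```

With the refcount at zero and the pointer still set, classify() reports
CTX_STALE. A caller that reacts by simply retrying the lookup would keep
seeing the same state until something clears the pointer; whether that is
what happens in the kernel here is an assumption, not something this model
establishes.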

There are multiple perf-related issues like this that are hard to track
down.  They are borderline heisenbugs that are possibly race conditions,
so bisecting doesn't work and even things like enabling ftrace will make
the issue go away (or crash ftrace itself).

This particular manifestation of the bug (or bugs) wedges things but I can 
use alt-sysrq from the serial console to see where it is stuck (see 
below; the CPU is stuck in a loop).


[ 2225.916004]  [<ffffffff810e61e9>] ? get_page_from_freelist+0x55/0x781
[ 2225.916004]  [<ffffffff810e6a7c>] __alloc_pages_nodemask+0x167/0x6dc
[ 2225.916004]  [<ffffffff8101a4a3>] ? intel_pmu_enable_all+0x28/0xa4
[ 2225.916004]  [<ffffffff8111f0b3>] kmem_getpages+0x58/0xec
[ 2225.916004]  [<ffffffff81120278>] cache_grow+0xad/0x1d8
[ 2225.916004]  [<ffffffff81120021>] ____cache_alloc+0x237/0x2ce
[ 2225.916004]  [<ffffffff811216b9>] __kmalloc+0x8f/0xf2
[ 2225.916004]  [<ffffffff810dc35d>] ? T.1336+0xe/0x10
[ 2225.916004]  [<ffffffff810dc35d>] T.1336+0xe/0x10
[ 2225.916004]  [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
[ 2225.916004]  [<ffffffff810dca33>] find_get_context+0x138/0x1c7
[ 2225.916004]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
[ 2225.916004]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
[ 2225.916004]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b

[ 2256.708004]  [<ffffffff810d7e36>] ? put_ctx+0x40/0x61
[ 2256.708004]  [<ffffffff810dcaa4>] find_get_context+0x1a9/0x1c7
[ 2256.708004]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
[ 2256.708004]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
[ 2256.708004]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b

[ 2303.796003]  [<ffffffff810fa6cb>] ? kmalloc_slab+0x7f/0x8d
[ 2303.796003]  [<ffffffff81121653>] __kmalloc+0x29/0xf2
[ 2303.796003]  [<ffffffff810dc35d>] ? T.1336+0xe/0x10
[ 2303.796003]  [<ffffffff810dc35d>] T.1336+0xe/0x10
[ 2303.796003]  [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
[ 2303.796003]  [<ffffffff810dca33>] find_get_context+0x138/0x1c7
[ 2303.796003]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
[ 2303.796003]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
[ 2303.796003]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b

Vince


* Re: Linux 3.18 released
  2014-12-08 18:39 ` Vince Weaver
@ 2014-12-08 19:11   ` Vince Weaver
  2014-12-09 10:18   ` Ingo Molnar
  2014-12-11  0:38   ` Andy Lutomirski
  2 siblings, 0 replies; 6+ messages in thread
From: Vince Weaver @ 2014-12-08 19:11 UTC
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, vincent.weaver

On Mon, 8 Dec 2014, Vince Weaver wrote:

> down.  They are borderline heisenbugs that are possibly race conditions, 
> so bisecting doesn't work and even things like enabling ftrace will make
> the issue go away (or crash ftrace itself).

For example, just trying to enable some extra printks in the fuzzer to get 
a log of the syscalls causing the problem quickly causes this:

[  398.604507] ------------[ cut here ]------------
[  398.604507] WARNING: CPU: 0 PID: 2889 at kernel/watchdog.c:290 watchdog_overflow_callback+0x9b/0xa6()
[  398.604507] Watchdog detected hard LOCKUP on cpu 0
[  398.604507] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative f71882fg evdev mcs7830 usbnet ohci_pci psmouse pcspkr serio_raw coretemp ohci_hcd wmi video button i2c_nforce2 acpi_cpufreq processor thermal_sys sg sd_mod ehci_pci ehci_hcd usbcore usb_common
[  398.604507] CPU: 0 PID: 2889 Comm: perf_fuzzer Not tainted 3.18.0+ #166
[  398.604507] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[  398.604507]  0000000000000122 ffff88011fc06ba8 ffffffff8155cf3b 0000000000000122
[  398.604507]  ffff88011fc06bf8 ffff88011fc06be8 ffffffff8104361b 0000000000000000
[  398.604507]  ffffffff810b1711 ffff88011f055000 0000000000000000 ffff88011fc06d28
[  398.604507] Call Trace:
[  398.604507]  <NMI>  [<ffffffff8155cf3b>] dump_stack+0x49/0x5e
[  398.604507]  [<ffffffff8104361b>] warn_slowpath_common+0x81/0x9b
[  398.604507]  [<ffffffff810b1711>] ? watchdog_overflow_callback+0x9b/0xa6
[  398.604507]  [<ffffffff810436d8>] warn_slowpath_fmt+0x46/0x48
[  398.604507]  [<ffffffff810b1711>] watchdog_overflow_callback+0x9b/0xa6
[  398.604507]  [<ffffffff810dafbe>] __perf_event_overflow+0x13c/0x1c6
[  398.604507]  [<ffffffff810db684>] perf_event_overflow+0x19/0x1b
[  398.604507]  [<ffffffff8101ab1e>] intel_pmu_handle_irq+0x2e3/0x378
[  398.604507]  [<ffffffff81013dcf>] perf_event_nmi_handler+0x2d/0x4a
[  398.604507]  [<ffffffff8100636f>] nmi_handle+0x5c/0xf9
[  398.604507]  [<ffffffff810065db>] default_do_nmi+0x4e/0xe2
[  398.604507]  [<ffffffff810066d9>] do_nmi+0x6a/0xaf
[  398.604507]  [<ffffffff81561dda>] end_repeat_nmi+0x1e/0x2e
[  398.604507]  [<ffffffff8155fbec>] ? _raw_spin_lock+0x26/0x2a
[  398.604507]  [<ffffffff8155fbec>] ? _raw_spin_lock+0x26/0x2a
[  398.604507]  [<ffffffff8155fbec>] ? _raw_spin_lock+0x26/0x2a
[  398.604507]  <<EOE>>  [<ffffffff810d5fc7>] perf_ctx_lock+0x28/0x2d
[  398.604507]  [<ffffffff810d8854>] perf_event_context_sched_in+0x3d/0xa0
[  398.604507]  [<ffffffff810d8a4e>] __perf_event_task_sched_in+0x3f/0x111
[  398.604507]  [<ffffffff8106570f>] finish_task_switch+0x46/0xad
[  398.604507]  [<ffffffff8155d48d>] __schedule+0x454/0x4a8
[  398.604507]  [<ffffffff8155d64f>] schedule+0x6e/0x70
[  398.604507]  [<ffffffff811ec527>] jbd2_log_wait_commit+0xa5/0xf0
[  398.604507]  [<ffffffff8106e17e>] ? signal_pending_state+0x35/0x35
[  398.604507]  [<ffffffff8114c79b>] ? default_file_splice_read+0x2d1/0x2d1
[  398.604507]  [<ffffffff811c03a9>] ext4_sync_fs+0xd6/0x124
[  398.604507]  [<ffffffff8114c7bb>] sync_fs_one_sb+0x20/0x22
[  398.604507]  [<ffffffff811295cb>] iterate_supers+0x6e/0xbf
[  398.604507]  [<ffffffff8114cb2f>] sys_sync+0x55/0x83
[  398.604507]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
[  398.604507] ---[ end trace cb94cd46e328aa66 ]---
[  401.000000] ------------[ cut here ]------------
[  401.000000] WARNING: CPU: 1 PID: 3296 at kernel/watchdog.c:290 watchdog_overflow_callback+0x9b/0xa6()
[  401.000000] Watchdog detected hard LOCKUP on cpu 1
[  401.000000] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative f71882fg evdev mcs7830 usbnet ohci_pci psmouse pcspkr serio_raw coretemp ohci_hcd wmi video button i2c_nforce2 acpi_cpufreq processor thermal_sys sg sd_mod ehci_pci ehci_hcd usbcore usb_common
[  401.000000] CPU: 1 PID: 3296 Comm: perf_fuzzer Tainted: G        W      3.18.0+ #166
[  401.000000] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015  10/19/2012
[  401.000000]  0000000000000122 ffff88011fc86ba8 ffffffff8155cf3b 0000000000000122
[  401.000000]  ffff88011fc86bf8 ffff88011fc86be8 ffffffff8104361b 0000000000000000
[  401.000000]  ffffffff810b1711 ffff88011b3bec00 0000000000000000 ffff88011fc86d28
[  401.000000] Call Trace:
[  401.000000]  <NMI>  [<ffffffff8155cf3b>] dump_stack+0x49/0x5e
[  401.000000]  [<ffffffff8104361b>] warn_slowpath_common+0x81/0x9b
[  401.000000]  [<ffffffff810b1711>] ? watchdog_overflow_callback+0x9b/0xa6
[  401.000000]  [<ffffffff810436d8>] warn_slowpath_fmt+0x46/0x48
[  401.000000]  [<ffffffff810b1711>] watchdog_overflow_callback+0x9b/0xa6
[  401.000000]  [<ffffffff810dafbe>] __perf_event_overflow+0x13c/0x1c6
[  401.000000]  [<ffffffff810db684>] perf_event_overflow+0x19/0x1b
[  401.000000]  [<ffffffff8101ab1e>] intel_pmu_handle_irq+0x2e3/0x378
[  401.000000]  [<ffffffff81013dcf>] perf_event_nmi_handler+0x2d/0x4a
[  401.000000]  [<ffffffff8100636f>] nmi_handle+0x5c/0xf9
[  401.000000]  [<ffffffff810065db>] default_do_nmi+0x4e/0xe2
[  401.000000]  [<ffffffff810066d9>] do_nmi+0x6a/0xaf
[  401.000000]  [<ffffffff81561dda>] end_repeat_nmi+0x1e/0x2e
[  401.000000]  [<ffffffff8155fbe5>] ? _raw_spin_lock+0x1f/0x2a
[  401.000000]  [<ffffffff8155fbe5>] ? _raw_spin_lock+0x1f/0x2a
[  401.000000]  [<ffffffff8155fbe5>] ? _raw_spin_lock+0x1f/0x2a
[  401.000000]  <<EOE>>  <IRQ>  [<ffffffff8106c391>] sched_rt_period_timer+0x8b/0x1f2
[  401.000000]  [<ffffffff8108a6b7>] __run_hrtimer+0xba/0x145
[  401.000000]  [<ffffffff8106c306>] ? init_rt_bandwidth+0x46/0x46
[  401.000000]  [<ffffffff8108a91a>] hrtimer_interrupt+0xd5/0x1c3
[  401.000000]  [<ffffffff8102d209>] local_apic_timer_interrupt+0x58/0x5d
[  401.000000]  [<ffffffff8102d6b0>] smp_apic_timer_interrupt+0x2a/0x3c
[  401.000000]  [<ffffffff81560eaa>] apic_timer_interrupt+0x6a/0x70
[  401.000000]  <EOI> 
[  401.000000] ---[ end trace cb94cd46e328aa67 ]---



* Re: Linux 3.18 released
  2014-12-08 18:39 ` Vince Weaver
  2014-12-08 19:11   ` Vince Weaver
@ 2014-12-09 10:18   ` Ingo Molnar
  2014-12-09 11:06     ` Ingo Molnar
  2014-12-11  0:38   ` Andy Lutomirski
  2 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2014-12-09 10:18 UTC
  To: Vince Weaver
  Cc: Linus Torvalds, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar


* Vince Weaver <vince@deater.net> wrote:

> On Sun, 7 Dec 2014, Linus Torvalds wrote:
> 
> > I'd love to say that we've figured out the problem that plagues 3.17
> > for a couple of people, but we haven't. At the same time, there's
> > absolutely no point in having everybody else twiddling their thumbs
> > when a couple of people are actively trying to bisect an older issue,
> > so holding up the release just didn't make sense. Especially since
> > that would just have then held things up entirely over the holiday
> > break.
> > 
> > So the merge window for 3.19 is open, and DaveJ will hopefully get his
> > bisection done (or at least narrow things down sufficiently that we
> > have that "Ahaa" moment) over the next week. But in solidarity with
> > Dave (and to make my life easier too ;) let's try to avoid introducing
> > any _new_ nasty issues, ok?
> 
> It's probably unrelated to DaveJ's issue, but my perf_event 
> fuzzer still quickly locks the kernel pretty solid on 3.18.

I'm really tempted to restrict most of the weirder perf ABI 
details (such as event groups, raw events, etc.) to root only 
(with a perf_event_paranoid level to make it available for power 
users and distros so inclined) - most of the past lockups/races 
you have triggered were in weird, seldom-used corners of the
perf ABI that don't get much testing outside Trinity fuzzing.

The bits used by tools/perf and used by the majority of users are 
generally rock solid.

Doing that is obviously not a fix, and we'd like to fix all
pending bugs for real as well, but it would at least isolate the
impact to root-only until it's fixed.
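
As a rough illustration of the gating idea in this message, here is a
minimal sketch of such a policy check. Everything in it is hypothetical:
the feature classes, the RESTRICT_RARE_LEVEL threshold and the
perf_feature_allowed() helper are made up for illustration and do not
describe any existing perf_event_paranoid level or kernel function. It
also ignores whatever checks perf_event_paranoid already performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature classes; the split is an assumption. */
enum perf_feature {
    FEAT_BASIC_COUNTING,
    FEAT_EVENT_GROUPS,
    FEAT_RAW_EVENTS,
};

/* Hypothetical threshold: at or above it, rarer features need root. */
#define RESTRICT_RARE_LEVEL 1

/* True for the seldom-used corners that the policy wants to fence off. */
static bool feature_is_rare(enum perf_feature f)
{
    assert(f <= FEAT_RAW_EVENTS);
    return f == FEAT_EVENT_GROUPS || f == FEAT_RAW_EVENTS;
}

/* Decide whether a caller may use feature 'f' under the sketched policy. */
static bool perf_feature_allowed(enum perf_feature f, int paranoid,
                                 bool is_root)
{
    if (is_root)
        return true;  /* root is never restricted by this policy */
    if (!feature_is_rare(f))
        return true;  /* commonly used bits stay available to everyone */
    return paranoid < RESTRICT_RARE_LEVEL;
}
```

Under this sketch an unprivileged caller loses event groups and raw events
once the level reaches the threshold, while a lower level restores them for
power users and distros that want them.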

Thanks,

	Ingo


* Re: Linux 3.18 released
  2014-12-09 10:18   ` Ingo Molnar
@ 2014-12-09 11:06     ` Ingo Molnar
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2014-12-09 11:06 UTC
  To: Vince Weaver
  Cc: Linus Torvalds, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Vince Weaver <vince@deater.net> wrote:
> 
> > On Sun, 7 Dec 2014, Linus Torvalds wrote:
> > 
> > > I'd love to say that we've figured out the problem that 
> > > plagues 3.17 for a couple of people, but we haven't. At the 
> > > same time, there's absolutely no point in having everybody 
> > > else twiddling their thumbs when a couple of people are 
> > > actively trying to bisect an older issue, so holding up the 
> > > release just didn't make sense. Especially since that would 
> > > just have then held things up entirely over the holiday 
> > > break.
> > > 
> > > So the merge window for 3.19 is open, and DaveJ will 
> > > hopefully get his bisection done (or at least narrow things 
> > > down sufficiently that we have that "Ahaa" moment) over the 
> > > next week. But in solidarity with Dave (and to make my life 
> > > easier too ;) let's try to avoid introducing any _new_ 
> > > nasty issues, ok?
> > 
> > It's probably unrelated to DaveJ's issue, but my perf_event 
> > fuzzer still quickly locks the kernel pretty solid on 3.18.
> 
> I'm really tempted to restrict most of the weirder perf ABI 
> details (such as event groups, raw events, etc.) to root only 
> (with a perf_event_paranoid level to make it available for 
> power users and distros so inclined) - most of the past 
> lockups/races you have triggered were in weird,
> seldom-used corners of the perf ABI that don't get much testing
> outside Trinity fuzzing.
> 
> The bits used by tools/perf and used by the majority of users 
> are generally rock solid.

Note that it's entirely possible that I'm wrong about that
suspicion, and that the leftover bug(s?) are still in the core
portion of perf.

Maybe fuzzing could help there: initially fuzz only the core portions
of the perf ABI (the bits that things like tools/perf use on an everyday
basis), then the rarer bits? If we knew that it's, say, the
cgroups bits or the tracepoint integration of perf that is causing
the trouble, that would already narrow down the inquiry quite a
bit.
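
The staged approach suggested here can be sketched as a cumulative feature
mask. This is only an illustration: the AREA_* names, their ordering and the
helpers are hypothetical, and none of this reflects options the perf fuzzer
actually has.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ABI areas, ordered from everyday use to rarely used. */
enum {
    AREA_CORE       = 1u << 0,
    AREA_GROUPS     = 1u << 1,
    AREA_RAW        = 1u << 2,
    AREA_CGROUP     = 1u << 3,
    AREA_TRACEPOINT = 1u << 4,
};

/* Stage 0 fuzzes only the core; each later stage adds one more area. */
static const unsigned int stage_adds[] = {
    AREA_CORE, AREA_GROUPS, AREA_RAW, AREA_CGROUP, AREA_TRACEPOINT,
};
#define NUM_STAGES (sizeof(stage_adds) / sizeof(stage_adds[0]))

/* Mask of all areas the fuzzer may touch at the given stage. */
static unsigned int stage_mask(size_t stage)
{
    unsigned int mask = 0;

    assert(stage < NUM_STAGES);
    for (size_t i = 0; i <= stage; i++)
        mask |= stage_adds[i];
    return mask;
}

/* The area newly enabled at the first stage that shows the lockup. */
static unsigned int suspect_area(size_t first_failing_stage)
{
    assert(first_failing_stage < NUM_STAGES);
    return stage_adds[first_failing_stage];
}
```

If the lockup first shows up at stage 3, suspect_area(3) points at the
cgroup area. That only narrows the inquiry; a race could still need an
interaction with an area enabled at an earlier stage.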

> Doing that is obviously not a fix, and we'd like to fix all
> pending bugs for real as well, but it would at least isolate the
> impact to root-only until it's fixed.

Thanks,

	Ingo


* Re: Linux 3.18 released
  2014-12-08 18:39 ` Vince Weaver
  2014-12-08 19:11   ` Vince Weaver
  2014-12-09 10:18   ` Ingo Molnar
@ 2014-12-11  0:38   ` Andy Lutomirski
  2 siblings, 0 replies; 6+ messages in thread
From: Andy Lutomirski @ 2014-12-11  0:38 UTC
  To: Vince Weaver, Linus Torvalds
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar

On 12/08/2014 10:39 AM, Vince Weaver wrote:
> On Sun, 7 Dec 2014, Linus Torvalds wrote:
> 
>> I'd love to say that we've figured out the problem that plagues 3.17
>> for a couple of people, but we haven't. At the same time, there's
>> absolutely no point in having everybody else twiddling their thumbs
>> when a couple of people are actively trying to bisect an older issue,
>> so holding up the release just didn't make sense. Especially since
>> that would just have then held things up entirely over the holiday
>> break.
>>
>> So the merge window for 3.19 is open, and DaveJ will hopefully get his
>> bisection done (or at least narrow things down sufficiently that we
>> have that "Ahaa" moment) over the next week. But in solidarity with
>> Dave (and to make my life easier too ;) let's try to avoid introducing
>> any _new_ nasty issues, ok?
> 
> It's probably unrelated to DaveJ's issue, but my perf_event fuzzer still 
> quickly locks the kernel pretty solid on 3.18.
> 
> Just 5 minutes of testing managed to trip over the following issue that 
> dates back to at least 3.15-rc7

Out of curiosity, can you see if this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/paranoid-and-more&id=38e49874d0ab18276f753f5784420b091f4be6eb

makes the problem much worse?  (Don't take the whole series there --
just cherry-pick the one patch.)

--Andy

> 
> My notes say last time I tracked down the issue as so:
> 
>   What happens is in kernel/events/core.c  find_get_context()
>   somehow perf_lock_task_context() returns NULL 
>   due to !atomic_inc_not_zero(&ctx->refcount)
>   but task->perf_event_ctxp[ctxn] still has a valid value.
> 
> There are multiple perf related issues like this that are hard to track 
> down.  They are borderline heisenbugs that are possibly race conditions, 
> so bisecting doesn't work and even things like enabling ftrace will make
> the issue go away (or crash ftrace itself).
> 
> This particular manifestation of the bug (or bugs) wedges things but I can 
> use alt-sysrq from the serial console to see where it is stuck (see 
> below; the CPU is stuck in a loop).
> 
> 
> [ 2225.916004]  [<ffffffff810e61e9>] ? get_page_from_freelist+0x55/0x781
> [ 2225.916004]  [<ffffffff810e6a7c>] __alloc_pages_nodemask+0x167/0x6dc
> [ 2225.916004]  [<ffffffff8101a4a3>] ? intel_pmu_enable_all+0x28/0xa4
> [ 2225.916004]  [<ffffffff8111f0b3>] kmem_getpages+0x58/0xec
> [ 2225.916004]  [<ffffffff81120278>] cache_grow+0xad/0x1d8
> [ 2225.916004]  [<ffffffff81120021>] ____cache_alloc+0x237/0x2ce
> [ 2225.916004]  [<ffffffff811216b9>] __kmalloc+0x8f/0xf2
> [ 2225.916004]  [<ffffffff810dc35d>] ? T.1336+0xe/0x10
> [ 2225.916004]  [<ffffffff810dc35d>] T.1336+0xe/0x10
> [ 2225.916004]  [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
> [ 2225.916004]  [<ffffffff810dca33>] find_get_context+0x138/0x1c7
> [ 2225.916004]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2225.916004]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2225.916004]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
> 
> [ 2256.708004]  [<ffffffff810d7e36>] ? put_ctx+0x40/0x61
> [ 2256.708004]  [<ffffffff810dcaa4>] find_get_context+0x1a9/0x1c7
> [ 2256.708004]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2256.708004]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2256.708004]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
> 
> [ 2303.796003]  [<ffffffff810fa6cb>] ? kmalloc_slab+0x7f/0x8d
> [ 2303.796003]  [<ffffffff81121653>] __kmalloc+0x29/0xf2
> [ 2303.796003]  [<ffffffff810dc35d>] ? T.1336+0xe/0x10
> [ 2303.796003]  [<ffffffff810dc35d>] T.1336+0xe/0x10
> [ 2303.796003]  [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
> [ 2303.796003]  [<ffffffff810dca33>] find_get_context+0x138/0x1c7
> [ 2303.796003]  [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2303.796003]  [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2303.796003]  [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
> 
> Vince
> 

