From: Manfred Spraul <manfred@colorfullife.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: cboulte@gmail.com, Nadia.Derbey@bull.net,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization
Date: Thu, 13 Nov 2008 07:10:00 +0100
Message-ID: <491BC4B8.1050406@colorfullife.com>
In-Reply-To: <20081111141603.f0e7fa8d.akpm@linux-foundation.org>

Andrew Morton wrote:
> Time is starting to press on this one.  Is there something which we can
> revert which would fix this bug?
>   
My previous analysis was bogus, let's start from scratch:

1) the initial oops report:
http://bugzilla.kernel.org/show_bug.cgi?id=11796#c0

- lockdep is enabled, the oops is somewhere in __lock_acquire
- the instruction that oopses is

 >>>  lock incl 0x138(%r12)
R12 is 0x0038004000000000

That could be a debug_atomic_inc() in __lock_acquire. The class pointer
in the spinlock_t is not initialized, thus it crashes.
Ingo - is that possible?
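
A quick sanity check on the R12 value, as a minimal userspace sketch: the
instruction accesses R12 + 0x138, and on x86-64 an access to a non-canonical
address faults. The helper name is_canonical() is made up for illustration
(it is not a kernel function), and 48 bits of virtual address space is an
assumption about the reporter's CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: returns true if addr is
 * canonical for a CPU with va_bits bits of virtual address space.
 * Canonical means bits 63..(va_bits - 1) are all zeros or all ones.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits > 1 && va_bits < 64);

	uint64_t upper = addr >> (va_bits - 1);
	uint64_t all_ones = UINT64_MAX >> (va_bits - 1);

	return upper == 0 || upper == all_ones;
}
```

With va_bits = 48, is_canonical(0x0038004000000000ULL + 0x138, 48) is false,
while a kernel text address from the trace such as 0xffffffff802feb07 is
canonical. So R12 does not look like a valid kernel pointer, which would fit
a pointer read from uninitialized memory.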

2) the latest oops was actually a soft lockup:

It starts with:
> [  400.393024] INFO: trying to register non-static key.
> [  400.397005] the code is fine but needs lockdep annotation.
> [  400.397005] turning off the locking correctness validator.
> [  400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
> [  400.397005] Call Trace:
> [  400.397005]  [<ffffffff80257055>] static_obj+0x60/0x77
> [  400.397005]  [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
> [  400.397005]  [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
> [  400.397005]  [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [  400.397005]  [<ffffffff8045117d>] _spin_lock+0x2d/0x5a
> [  400.397005]  [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [  400.397005]  [<ffffffff802feb07>] ipc_lock+0x62/0x99
> [  400.397005]  [<ffffffff802feaa5>] ipc_lock+0x0/0x99
> [  400.397005]  [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
> [  400.397005]  [<ffffffff803002c3>] sys_msgctl+0x188/0x461
> [  400.397005]  [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [  400.397005]  [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> [  400.397005]  [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
> [  400.397005]  [<ffffffff80212e09>] sched_clock+0x5/0x7
> [  400.397005]  [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
> [  400.397005]  [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
> [  400.397005]  [<ffffffff80212e09>] sched_clock+0x5/0x7
> [  400.397005]  [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
> [  400.397005]
> [  464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207]
> [  464.933006] Call Trace:
> [  464.933006]  [<ffffffff8033dc6b>] _raw_spin_lock+0x98/0x100
> [  464.933006]  [<ffffffff8045119e>] _spin_lock+0x4e/0x5a
> [  464.933006]  [<ffffffff802feb07>] ipc_lock+0x62/0x99

To me, this reads like an uninitialized spinlock_t:
The static_obj test in kernel/lockdep.c notices that something is wrong, and lockdep disables itself.
But then _raw_spin_lock() tries to acquire the uninitialized spinlock and loops forever, because no one does spin_unlock().
After 60 seconds, the soft lockup detection notices the problem and oopses.
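
The failure mode can be illustrated with a toy userspace lock. This is only
a sketch under stated assumptions: struct toy_spinlock and the toy_*
functions are invented for illustration and are not the kernel's spinlock_t
or _raw_spin_lock(), and the spin bound merely stands in for the soft lockup
watchdog.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy test-and-set lock (hypothetical, NOT the kernel's spinlock_t). */
struct toy_spinlock {
	atomic_uint locked;	/* 0 = free, anything else = held */
};

/* The step that is suspected to be missing for the ipc structure. */
static void toy_spin_lock_init(struct toy_spinlock *l)
{
	atomic_init(&l->locked, 0);
}

/*
 * Try to take the lock, giving up after max_spins attempts.
 * Returns true if the lock was acquired, false if we "locked up".
 */
static bool toy_spin_lock_bounded(struct toy_spinlock *l,
				  unsigned long max_spins)
{
	assert(max_spins > 0);

	for (unsigned long i = 0; i < max_spins; i++) {
		unsigned int expected = 0;

		/* Succeeds only if the lock word currently reads as free. */
		if (atomic_compare_exchange_strong(&l->locked, &expected, 1))
			return true;
	}
	return false;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_store(&l->locked, 0);
}
```

If the lock word holds stale non-zero data instead of going through
toy_spin_lock_init(), toy_spin_lock_bounded() never succeeds: the lock looks
held, but there is no owner that will ever call toy_spin_unlock(). Without
the bound, the loop would spin forever, which matches the soft lockup above.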



