LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Jarek Poplawski <jarkao2@o2.pl>
To: Neil Brown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Folkert van Heusden <folkert@vanheusden.com>,
	linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@tv-sign.ru>,
	"J\. Bruce Fields" <bfields@fieldses.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
Date: Tue, 20 Mar 2007 10:37:53 +0100	[thread overview]
Message-ID: <20070320093753.GA1751@ff.dom.local> (raw)
In-Reply-To: <17918.11420.155569.991473@notabene.brown>

On 19-03-2007 07:24, Neil Brown wrote:
> On Friday March 16, akpm@linux-foundation.org wrote:
>> OK.  That's not necessarily a bug: one could envisage a (weird) piece of
>> code which takes a lock then releases it on a later workqueue invokation. 
>> But I'm not sure that nfs4_laundromat() is actually supposed to be doing
>> anything like that.
>>
>> Then again, maybe it is: it seems to be waddling through a directory under
>> the control of a little state machine, with timeouts.
>>
>> Neil: help?
> 
> I'm quite certain that laundromat_main does *not* leave client_mutex
> locked as the last thing it does is call nfs4_unlock_state which is
> 	mutex_unlock(&client_mutex);
> To me, that raises some doubt about whether the lock leak check is
> working properly...
> It is somewhat harder to track locking of i_mutex, but it seems to me
> that every time it is taken, it is released again shortly afterwards.
> 
> So I think this must be a problem with leak detection, not with NFSd.
> 
> NeilBrown
> 
> 
>> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>>
>>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
>>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
>>>>> ...
>>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
>>>>> [ 1756.728271]     last function: laundromat_main+0x0/0x69 [nfsd]
>>>>> [ 1756.728392] 2 locks held by nfsd4/3577:
>>>>> [ 1756.728435]  #0:  (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
>>>>> [ 1756.728679]  #1:  (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
>>>>> [ 1756.728923]  [<c1003d57>] show_trace_log_lvl+0x1a/0x30
>>>>> [ 1756.729015]  [<c1003d7f>] show_trace+0x12/0x14
>>>>> [ 1756.729103]  [<c1003e79>] dump_stack+0x16/0x18
>>>>> [ 1756.729187]  [<c102c2e8>] run_workqueue+0x167/0x170
>>>>> [ 1756.729276]  [<c102c437>] worker_thread+0x146/0x165
>>>>> [ 1756.729368]  [<c102f797>] kthread+0x97/0xc4
>>>>> [ 1756.729456]  [<c1003bdb>] kernel_thread_helper+0x7/0x10
>>>>> [ 1756.729547]  =======================
...
>>> I think I'm responsible for this message (commit
>>> d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the
>>> function executed by the workqueue (here laundromat_main) leaked an
>>> atomic context or is still holding locks (2 locks in this case).

This check is valid with keventd, but it looks like nfsd runs
kthread by itself. I'm not sure it's illegal to hold locks then,
so, if I'm not wrong with this, here is my patch proposal (for
testing) to take this into consideration.

Reported-by: Folkert van Heusden <folkert@vanheusden.com>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

---

diff -Nurp 2.6.21-rc4-git4-/kernel/workqueue.c 2.6.21-rc4-git4/kernel/workqueue.c
--- 2.6.21-rc4-git4-/kernel/workqueue.c	2007-02-04 19:44:54.000000000 +0100
+++ 2.6.21-rc4-git4/kernel/workqueue.c	2007-03-20 09:30:46.000000000 +0100
@@ -316,6 +316,7 @@ static void run_workqueue(struct cpu_wor
 		struct work_struct *work = list_entry(cwq->worklist.next,
 						struct work_struct, entry);
 		work_func_t f = work->func;
+		int ld;
 
 		list_del_init(cwq->worklist.next);
 		spin_unlock_irqrestore(&cwq->lock, flags);
@@ -323,13 +324,15 @@ static void run_workqueue(struct cpu_wor
 		BUG_ON(get_wq_data(work) != cwq);
 		if (!test_bit(WORK_STRUCT_NOAUTOREL, work_data_bits(work)))
 			work_release(work);
+		ld = lockdep_depth(current);
+
 		f(work);
 
-		if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
+		if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) {
 			printk(KERN_ERR "BUG: workqueue leaked lock or atomic: "
-					"%s/0x%08x/%d\n",
+					"%s/0x%08x/%d/%d\n",
 					current->comm, preempt_count(),
-				       	current->pid);
+				       	current->pid, ld);
 			printk(KERN_ERR "    last function: ");
 			print_symbol("%s\n", (unsigned long)f);
 			debug_show_held_locks(current);

  reply	other threads:[~2007-03-20  9:33 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-13 16:50 Folkert van Heusden
2007-03-15 19:06 ` Andrew Morton
2007-03-15 19:17   ` Folkert van Heusden
2007-03-16 14:49     ` Jarek Poplawski
2007-03-20 11:17     ` dquot.c: possible circular locking " Jarek Poplawski
2007-03-20 11:22       ` Jarek Poplawski
2007-03-20 11:31         ` Jarek Poplawski
2007-03-20 12:19           ` Jan Kara
2007-03-20 13:35             ` Arjan van de Ven
2007-03-20 14:21               ` Jan Kara
2007-03-20 14:18                 ` Arjan van de Ven
2007-03-20 13:44             ` Jarek Poplawski
2007-03-20 14:00               ` Jan Kara
2007-03-16  8:41   ` Peter Zijlstra
2007-03-16 11:39     ` Andrew Morton
2007-03-19  6:24       ` Neil Brown
2007-03-20  9:37         ` Jarek Poplawski [this message]
2007-03-20 16:07           ` [PATCH] " Oleg Nesterov
2007-03-21  8:05             ` Jarek Poplawski
2007-03-21 14:46               ` Oleg Nesterov
2007-03-21 15:16                 ` Jarek Poplawski
2007-03-21 15:17                   ` Folkert van Heusden
2007-03-21 15:29                   ` Oleg Nesterov
2007-03-21 18:16                     ` Oleg Nesterov
2007-03-22  6:11                       ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski
2007-03-22  6:28                         ` Andrew Morton
2007-03-22  7:06                           ` Jarek Poplawski
2007-03-22  7:23                             ` Jarek Poplawski
2007-03-22  7:13                           ` Jarek Poplawski
2007-03-22  8:26                           ` Jarek Poplawski
2007-03-22  6:57                         ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski
2007-03-22  7:23                           ` Peter Zijlstra
2007-03-22  9:06                             ` Ingo Molnar
2007-03-22  7:22                         ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra
2007-03-22  9:06                           ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070320093753.GA1751@ff.dom.local \
    --to=jarkao2@o2.pl \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=folkert@vanheusden.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=neilb@suse.de \
    --cc=oleg@tv-sign.ru \
    --subject='[PATCH] Re: [2.6.20] BUG: workqueue leaked lock' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).