LKML Archive on lore.kernel.org
* [2.6.20] BUG: workqueue leaked lock
@ 2007-03-13 16:50 Folkert van Heusden
2007-03-15 19:06 ` Andrew Morton
0 siblings, 1 reply; 35+ messages in thread
From: Folkert van Heusden @ 2007-03-13 16:50 UTC (permalink / raw)
To: linux-kernel
...
[ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
[ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
[ 1756.728392] 2 locks held by nfsd4/3577:
[ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
[ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
[ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
[ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
[ 1756.729368] [<c102f797>] kthread+0x97/0xc4
[ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
[ 1756.729547] =======================
[ 1792.436492] svc: unknown version (0 for prog 100003, nfsd)
[ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
[ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd]
[ 1846.683832] 2 locks held by nfsd4/3577:
[ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14
[ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18
[ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170
[ 1846.684006] [<c102c437>] worker_thread+0x146/0x165
[ 1846.684012] [<c102f797>] kthread+0x97/0xc4
[ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
[ 1846.684027] =======================
[ 1936.638930] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
[ 1936.638992] last function: laundromat_main+0x0/0x69 [nfsd]
[ 1936.639141] 2 locks held by nfsd4/3577:
[ 1936.639195] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 1936.639292] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 1936.639301] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[ 1936.639306] [<c1003d7f>] show_trace+0x12/0x14
[ 1936.639311] [<c1003e79>] dump_stack+0x16/0x18
[ 1936.639314] [<c102c2e8>] run_workqueue+0x167/0x170
[ 1936.639321] [<c102c437>] worker_thread+0x146/0x165
[ 1936.639325] [<c102f797>] kthread+0x97/0xc4
[ 1936.639329] [<c1003bdb>] kernel_thread_helper+0x7/0x10
[ 1936.639332] =======================
This repeats several times. Also:
[ 137.324255] =======================================================
[ 137.324359] [ INFO: possible circular locking dependency detected ]
[ 137.324412] 2.6.20 #2
[ 137.324457] -------------------------------------------------------
[ 137.324510] tor/4857 is trying to acquire lock:
[ 137.324559] (tty_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.324765]
[ 137.324766] but task is already holding lock:
[ 137.324859] (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189
[ 137.325067]
[ 137.325069] which lock already depends on the new lock.
[ 137.325071]
[ 137.325206]
[ 137.325208] the existing dependency chain (in reverse order) is:
[ 137.325300]
[ 137.325301] -> #4 (&s->s_dquot.dqptr_sem){----}:
[ 137.325501] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.325852] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.326197] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.326538] [<c1037624>] lock_acquire+0x62/0x81
[ 137.326887] [<c1032b5f>] down_read+0x2b/0x3d
[ 137.327241] [<c1099403>] dquot_alloc_space+0x50/0x189
[ 137.327588] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2
[ 137.327935] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf
[ 137.328280] [<c10abf67>] ext3_alloc_branch+0x50/0x21b
[ 137.328622] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367
[ 137.328980] [<c10ac76e>] ext3_getblk+0x97/0x228
[ 137.329330] [<c10ac919>] ext3_bread+0x1a/0x78
[ 137.329672] [<c10b1bc2>] ext3_mkdir+0xf4/0x270
[ 137.330022] [<c1072a6a>] vfs_mkdir+0xb3/0x161
[ 137.330368] [<c1072ba4>] sys_mkdirat+0x8c/0xc4
[ 137.330714] [<c1072bfc>] sys_mkdir+0x20/0x22
[ 137.331063] [<c1002f78>] syscall_call+0x7/0xb
[ 137.331406] [<ffffffff>] 0xffffffff
[ 137.331771]
[ 137.331772] -> #3 (&ei->truncate_mutex){--..}:
[ 137.331979] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.332332] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.332676] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.333025] [<c1037624>] lock_acquire+0x62/0x81
[ 137.333370] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
[ 137.333930] [<c1205b88>] mutex_lock+0x8/0xa
[ 137.334271] [<c10ae14f>] ext3_truncate+0x170/0x468
[ 137.334613] [<c1058e72>] vmtruncate+0xa6/0x116
[ 137.334949] [<c107d587>] inode_setattr+0x145/0x16c
[ 137.335286] [<c10af071>] ext3_setattr+0x150/0x22f
[ 137.335627] [<c107d909>] notify_change+0x35b/0x392
[ 137.335968] [<c10679d2>] do_truncate+0x52/0x75
[ 137.336305] [<c1071f5e>] may_open+0x1ec/0x231
[ 137.336642] [<c107211c>] open_namei+0xda/0x59b
[ 137.336975] [<c106863a>] do_filp_open+0x2c/0x53
[ 137.337310] [<c106896f>] do_sys_open+0x52/0xd8
[ 137.337645] [<c1068a11>] sys_open+0x1c/0x1e
[ 137.337980] [<c1002f78>] syscall_call+0x7/0xb
[ 137.338315] [<ffffffff>] 0xffffffff
[ 137.338665]
[ 137.338666] -> #2 (&inode->i_alloc_sem){--..}:
[ 137.338864] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.339874] [<c1037624>] lock_acquire+0x62/0x81
[ 137.340207] [<c1032bf5>] down_write+0x2b/0x45
[ 137.340545] [<c107d890>] notify_change+0x2e2/0x392
[ 137.340886] [<c10679d2>] do_truncate+0x52/0x75
[ 137.341222] [<c1071f5e>] may_open+0x1ec/0x231
[ 137.341557] [<c107211c>] open_namei+0xda/0x59b
[ 137.341898] [<c106863a>] do_filp_open+0x2c/0x53
[ 137.342236] [<c106896f>] do_sys_open+0x52/0xd8
[ 137.342572] [<c1068a11>] sys_open+0x1c/0x1e
[ 137.343109] [<c1002f78>] syscall_call+0x7/0xb
[ 137.343444] [<ffffffff>] 0xffffffff
[ 137.343792]
[ 137.343793] -> #1 (&sysfs_inode_imutex_key){--..}:
[ 137.343988] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.344320] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.344655] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.344986] [<c1037624>] lock_acquire+0x62/0x81
[ 137.345321] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
[ 137.345658] [<c1205b88>] mutex_lock+0x8/0xa
[ 137.345991] [<c10a5447>] sysfs_hash_and_remove+0x43/0x11c
[ 137.346328] [<c10a5ef7>] sysfs_remove_file+0xd/0x12
[ 137.346660] [<c114b00c>] device_remove_file+0x32/0x44
[ 137.346992] [<c114b6d9>] device_del+0x174/0x1d2
[ 137.347325] [<c114b742>] device_unregister+0xb/0x15
[ 137.347661] [<c114b93d>] device_destroy+0x8d/0x9a
[ 137.347994] [<c1137a38>] vcs_remove_sysfs+0x1c/0x38
[ 137.348328] [<c113e0d8>] con_close+0x5e/0x6b
[ 137.348661] [<c1130103>] release_dev+0x4c4/0x6ce
[ 137.348999] [<c113077b>] tty_release+0x12/0x1c
[ 137.349332] [<c106a471>] __fput+0x145/0x16e
[ 137.349671] [<c106a32a>] fput+0x19/0x1b
[ 137.350004] [<c1068a92>] filp_close+0x3c/0x75
[ 137.350338] [<c1068b34>] sys_close+0x69/0xb4
[ 137.350671] [<c1002f78>] syscall_call+0x7/0xb
[ 137.351003] [<ffffffff>] 0xffffffff
[ 137.351352]
[ 137.351354] -> #0 (tty_mutex){--..}:
[ 137.351544] [<c1035398>] check_prev_add+0x34/0x206
[ 137.352086] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.352421] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.352759] [<c1037624>] lock_acquire+0x62/0x81
[ 137.353090] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
[ 137.353426] [<c1205b88>] mutex_lock+0x8/0xa
[ 137.353761] [<c1098d79>] print_warning+0x8c/0x15d
[ 137.354096] [<c1099537>] dquot_alloc_space+0x184/0x189
[ 137.354434] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2
[ 137.354773] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf
[ 137.355109] [<c10abf67>] ext3_alloc_branch+0x50/0x21b
[ 137.355446] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367
[ 137.355783] [<c10ac66c>] ext3_get_block+0x78/0xe3
[ 137.356121] [<c108a239>] __block_prepare_write+0x168/0x3fd
[ 137.356459] [<c108ad10>] block_prepare_write+0x28/0x3b
[ 137.356800] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d
[ 137.357138] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6
[ 137.357474] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad
[ 137.357817] [<c104e6ea>] generic_file_aio_write+0x58/0xc4
[ 137.358156] [<c10aa723>] ext3_file_write+0x2d/0xba
[ 137.358495] [<c1069441>] do_sync_write+0xc7/0x116
[ 137.358838] [<c1069612>] vfs_write+0x182/0x187
[ 137.359176] [<c10696b8>] sys_write+0x3d/0x64
[ 137.359513] [<c1002f78>] syscall_call+0x7/0xb
[ 137.359848] [<ffffffff>] 0xffffffff
[ 137.360200]
[ 137.360202] other info that might help us debug this:
[ 137.360204]
[ 137.360335] 3 locks held by tor/4857:
[ 137.360382] #0: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.360625] #1: (&ei->truncate_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.361072] #2: (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189
[ 137.361313]
[ 137.361314] stack backtrace:
[ 137.361401] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[ 137.361488] [<c1003d7f>] show_trace+0x12/0x14
[ 137.361571] [<c1003e79>] dump_stack+0x16/0x18
[ 137.361663] [<c1034d6d>] print_circular_bug_tail+0x6f/0x71
[ 137.361752] [<c1035398>] check_prev_add+0x34/0x206
[ 137.361838] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.362786] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.362877] [<c1037624>] lock_acquire+0x62/0x81
[ 137.362960] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
[ 137.363047] [<c1205b88>] mutex_lock+0x8/0xa
[ 137.363132] [<c1098d79>] print_warning+0x8c/0x15d
[ 137.363217] [<c1099537>] dquot_alloc_space+0x184/0x189
[ 137.363302] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2
[ 137.363391] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf
[ 137.363476] [<c10abf67>] ext3_alloc_branch+0x50/0x21b
[ 137.363561] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367
[ 137.363650] [<c10ac66c>] ext3_get_block+0x78/0xe3
[ 137.363735] [<c108a239>] __block_prepare_write+0x168/0x3fd
[ 137.363822] [<c108ad10>] block_prepare_write+0x28/0x3b
[ 137.363908] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d
[ 137.363996] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6
[ 137.364084] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad
[ 137.364171] [<c104e6ea>] generic_file_aio_write+0x58/0xc4
[ 137.364254] [<c10aa723>] ext3_file_write+0x2d/0xba
[ 137.364340] [<c1069441>] do_sync_write+0xc7/0x116
[ 137.364425] [<c1069612>] vfs_write+0x182/0x187
[ 137.364511] [<c10696b8>] sys_write+0x3d/0x64
[ 137.364595] [<c1002f78>] syscall_call+0x7/0xb
[ 137.364682] =======================
[ 137.471738] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
[ 137.471812] last function: laundromat_main+0x0/0x69 [nfsd]
[ 137.471923] 2 locks held by nfsd4/3577:
[ 137.471971] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.472209] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.472445] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[ 137.472528] [<c1003d7f>] show_trace+0x12/0x14
[ 137.472613] [<c1003e79>] dump_stack+0x16/0x18
[ 137.472695] [<c102c2e8>] run_workqueue+0x167/0x170
[ 137.472779] [<c102c437>] worker_thread+0x146/0x165
[ 137.472861] [<c102f797>] kthread+0x97/0xc4
[ 137.472943] [<c1003bdb>] kernel_thread_helper+0x7/0x10
[ 137.473026] =======================
Folkert van Heusden
--
MultiTail is a versatile tool for watching logfiles and output of
commands. Filtering, coloring, merging, diff-view, etc.
http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2.6.20] BUG: workqueue leaked lock
2007-03-13 16:50 [2.6.20] BUG: workqueue leaked lock Folkert van Heusden
@ 2007-03-15 19:06 ` Andrew Morton
2007-03-15 19:17 ` Folkert van Heusden
2007-03-16 8:41 ` Peter Zijlstra
0 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2007-03-15 19:06 UTC (permalink / raw)
To: Folkert van Heusden; +Cc: linux-kernel, Oleg Nesterov, Neil Brown
> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> ...
> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> [ 1756.728392] 2 locks held by nfsd4/3577:
> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> [ 1756.729547] =======================
> [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd)
> [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd]
> [ 1846.683832] 2 locks held by nfsd4/3577:
> [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14
> [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18
> [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170
> [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165
> [ 1846.684012] [<c102f797>] kthread+0x97/0xc4
> [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
Oleg, that's a fairly incomprehensible message we have in there. Can you
please explain what it means?
* Re: [2.6.20] BUG: workqueue leaked lock
2007-03-15 19:06 ` Andrew Morton
@ 2007-03-15 19:17 ` Folkert van Heusden
2007-03-16 14:49 ` Jarek Poplawski
2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski
2007-03-16 8:41 ` Peter Zijlstra
1 sibling, 2 replies; 35+ messages in thread
From: Folkert van Heusden @ 2007-03-15 19:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Oleg Nesterov, Neil Brown
> > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> > ...
> > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
...
> > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> Oleg, that's a fairly incomprehensible message we have in there. Can you
> please explain what it means?
Haha, OK :-)
I run 2.6.20 with these debugging switches enabled, and I occasionally
get errors like these. I ALWAYS get the following error when the system
first boots, at the point the Tor executable is started:
[ 137.324255] =======================================================
[ 137.324359] [ INFO: possible circular locking dependency detected ]
[ 137.324412] 2.6.20 #2
[ 137.324457] -------------------------------------------------------
[ 137.324510] tor/4857 is trying to acquire lock:
[ 137.324559] (tty_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.324765]
[ 137.324766] but task is already holding lock:
[ 137.324859] (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189
[ 137.325067]
[ 137.325069] which lock already depends on the new lock.
[ 137.325071]
[ 137.325206]
[ 137.325208] the existing dependency chain (in reverse order) is:
[ 137.325300]
[ 137.325301] -> #4 (&s->s_dquot.dqptr_sem){----}:
[ 137.325501] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.325852] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.326197] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.326538] [<c1037624>] lock_acquire+0x62/0x81
[ 137.326887] [<c1032b5f>] down_read+0x2b/0x3d
[ 137.327241] [<c1099403>] dquot_alloc_space+0x50/0x189
[ 137.327588] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2
[ 137.327935] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf
[ 137.328280] [<c10abf67>] ext3_alloc_branch+0x50/0x21b
[ 137.328622] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367
[ 137.328980] [<c10ac76e>] ext3_getblk+0x97/0x228
[ 137.329330] [<c10ac919>] ext3_bread+0x1a/0x78
[ 137.329672] [<c10b1bc2>] ext3_mkdir+0xf4/0x270
[ 137.330022] [<c1072a6a>] vfs_mkdir+0xb3/0x161
[ 137.330368] [<c1072ba4>] sys_mkdirat+0x8c/0xc4
[ 137.330714] [<c1072bfc>] sys_mkdir+0x20/0x22
[ 137.331063] [<c1002f78>] syscall_call+0x7/0xb
[ 137.331406] [<ffffffff>] 0xffffffff
[ 137.331771]
[ 137.331772] -> #3 (&ei->truncate_mutex){--..}:
[ 137.331979] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.332332] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.332676] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.333025] [<c1037624>] lock_acquire+0x62/0x81
[ 137.333370] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
[ 137.333930] [<c1205b88>] mutex_lock+0x8/0xa
[ 137.334271] [<c10ae14f>] ext3_truncate+0x170/0x468
[ 137.334613] [<c1058e72>] vmtruncate+0xa6/0x116
[ 137.334949] [<c107d587>] inode_setattr+0x145/0x16c
[ 137.335286] [<c10af071>] ext3_setattr+0x150/0x22f
[ 137.335627] [<c107d909>] notify_change+0x35b/0x392
[ 137.335968] [<c10679d2>] do_truncate+0x52/0x75
[ 137.336305] [<c1071f5e>] may_open+0x1ec/0x231
[ 137.336642] [<c107211c>] open_namei+0xda/0x59b
[ 137.336975] [<c106863a>] do_filp_open+0x2c/0x53
[ 137.337310] [<c106896f>] do_sys_open+0x52/0xd8
[ 137.337645] [<c1068a11>] sys_open+0x1c/0x1e
[ 137.337980] [<c1002f78>] syscall_call+0x7/0xb
[ 137.338315] [<ffffffff>] 0xffffffff
[ 137.338665]
[ 137.338666] -> #2 (&inode->i_alloc_sem){--..}:
[ 137.338864] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.339874] [<c1037624>] lock_acquire+0x62/0x81
[ 137.340207] [<c1032bf5>] down_write+0x2b/0x45
[ 137.340545] [<c107d890>] notify_change+0x2e2/0x392
[ 137.340886] [<c10679d2>] do_truncate+0x52/0x75
[ 137.341222] [<c1071f5e>] may_open+0x1ec/0x231
[ 137.341557] [<c107211c>] open_namei+0xda/0x59b
[ 137.341898] [<c1068a11>] sys_open+0x1c/0x1e
[ 137.343109] [<c1002f78>] syscall_call+0x7/0xb
[ 137.343444] [<ffffffff>] 0xffffffff
[ 137.343792]
[ 137.343793] -> #1 (&sysfs_inode_imutex_key){--..}:
[ 137.343988] [<c10354b8>] check_prev_add+0x154/0x206
[ 137.344320] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.344655] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.344986] [<c1037624>] lock_acquire+0x62/0x81
[ 137.345321] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
[ 137.345658] [<c1205b88>] mutex_lock+0x8/0xa
[ 137.345991] [<c10a5447>] sysfs_hash_and_remove+0x43/0x11c
[ 137.346328] [<c10a5ef7>] sysfs_remove_file+0xd/0x12
[ 137.346660] [<c114b00c>] device_remove_file+0x32/0x44
[ 137.346992] [<c114b6d9>] device_del+0x174/0x1d2
[ 137.347325] [<c114b742>] device_unregister+0xb/0x15
[ 137.347661] [<c114b93d>] device_destroy+0x8d/0x9a
[ 137.347994] [<c1137a38>] vcs_remove_sysfs+0x1c/0x38
[ 137.348328] [<c113e0d8>] con_close+0x5e/0x6b
[ 137.348661] [<c1130103>] release_dev+0x4c4/0x6ce
[ 137.348999] [<c113077b>] tty_release+0x12/0x1c
[ 137.349332] [<c10aa723>] ext3_file_write+0x2d/0xba
[ 137.358495] [<c1069441>] do_sync_write+0xc7/0x116
[ 137.358838] [<c1069612>] vfs_write+0x182/0x187
[ 137.359176] [<c10696b8>] sys_write+0x3d/0x64
[ 137.359513] [<c1002f78>] syscall_call+0x7/0xb
[ 137.359848] [<ffffffff>] 0xffffffff
[ 137.360200]
[ 137.360202] other info that might help us debug this:
[ 137.360204]
[ 137.360335] 3 locks held by tor/4857:
[ 137.360382] #0: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.360625] #1: (&ei->truncate_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.361072] #2: (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189
[ 137.361313]
[ 137.361314] stack backtrace:
[ 137.361401] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[ 137.361488] [<c1003d7f>] show_trace+0x12/0x14
[ 137.361571] [<c1003e79>] dump_stack+0x16/0x18
[ 137.361663] [<c1034d6d>] print_circular_bug_tail+0x6f/0x71
[ 137.361752] [<c1035398>] check_prev_add+0x34/0x206
[ 137.361838] [<c10355d4>] check_prevs_add+0x6a/0xd5
[ 137.362786] [<c1036dd2>] __lock_acquire+0x61c/0xa05
[ 137.362877] [<c1037624>] lock_acquire+0x62/0x81
[ 137.362960] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
[ 137.363047] [<c1205b88>] mutex_lock+0x8/0xa
[ 137.363132] [<c1098d79>] print_warning+0x8c/0x15d
[ 137.363217] [<c1099537>] dquot_alloc_space+0x184/0x189
[ 137.363302] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2
[ 137.363391] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf
[ 137.363476] [<c10abf67>] ext3_alloc_branch+0x50/0x21b
[ 137.363561] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367
[ 137.363650] [<c10ac66c>] ext3_get_block+0x78/0xe3
[ 137.363735] [<c108a239>] __block_prepare_write+0x168/0x3fd
[ 137.363822] [<c108ad10>] block_prepare_write+0x28/0x3b
[ 137.363908] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d
[ 137.363996] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6
[ 137.364084] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad
[ 137.364171] [<c104e6ea>] generic_file_aio_write+0x58/0xc4
[ 137.364254] [<c10aa723>] ext3_file_write+0x2d/0xba
[ 137.364340] [<c1069441>] do_sync_write+0xc7/0x116
[ 137.364425] [<c1069612>] vfs_write+0x182/0x187
[ 137.364511] [<c10696b8>] sys_write+0x3d/0x64
[ 137.364595] [<c1002f78>] syscall_call+0x7/0xb
[ 137.364682] =======================
Now apart from this reproducible problem, I suddenly also get these errors in dmesg:
[ 137.471738] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
[ 137.471812] last function: laundromat_main+0x0/0x69 [nfsd]
[ 137.471923] 2 locks held by nfsd4/3577:
[ 137.471971] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.472209] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[ 137.472445] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[ 137.472528] [<c1003d7f>] show_trace+0x12/0x14
[ 137.472613] [<c1003e79>] dump_stack+0x16/0x18
[ 137.472695] [<c102c2e8>] run_workqueue+0x167/0x170
[ 137.472779] [<c102c437>] worker_thread+0x146/0x165
[ 137.472861] [<c102f797>] kthread+0x97/0xc4
[ 137.472943] [<c1003bdb>] kernel_thread_helper+0x7/0x10
These appear at a rate varying between twice per minute and once every
two minutes. The most recent one (while writing this message) is:
[182932.590394] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
[182932.590450] last function: laundromat_main+0x0/0x69 [nfsd]
[182932.590557] 2 locks held by nfsd4/3577:
[182932.590600] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[182932.590830] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
[182932.591061] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
[182932.591145] [<c1003d7f>] show_trace+0x12/0x14
[182932.591234] [<c1003e79>] dump_stack+0x16/0x18
[182932.591316] [<c102c2e8>] run_workqueue+0x167/0x170
[182932.591399] [<c102c437>] worker_thread+0x146/0x165
[182932.591482] [<c102f797>] kthread+0x97/0xc4
[182932.591564] [<c1003bdb>] kernel_thread_helper+0x7/0x10
Config can be found here:
http://keetweej.vanheusden.com/~folkert/config-2.6.20.txt
The system is a P4 @ 3.2 GHz with 2 GB of RAM and IDE hard disks, with 5
NFS mounts from a server connected via a 1 Gb LAN. The other end (the
server) runs 2.6.19.1.
On the problem system, this is the output of nfsstat:
Server rpc stats:
calls badcalls badauth badclnt xdrcall
1833 611 611 0 0
Server nfs v2:
null getattr setattr root lookup readlink
611 100% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
Server nfs v3:
null getattr setattr lookup access readlink
611 100% 0 0% 0 0% 0 0% 0 0% 0 0%
read write create mkdir symlink mknod
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
fsstat fsinfo pathconf commit
0 0% 0 0% 0 0% 0 0%
Server nfs v4:
null compound
611 100% 0 0%
Server nfs v4 operations:
op0-unused op1-unused op2-future access close commit
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
create delegpurge delegreturn getattr getfh link
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
lock lockt locku lookup lookup_root nverify
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
open openattr open_conf open_dgrd putfh putpubfh
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
putrootfh read readdir readlink remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
renew restorefh savefh secinfo setattr setcltid
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
setcltidconf verify write rellockowner
0 0% 0 0% 0 0% 0 0%
Client rpc stats:
calls retrans authrefrsh
1404874 5502 0
Client nfs v2:
null getattr setattr root lookup readlink
0 0% 157240 11% 11360 0% 0 0% 707396 50% 20434 1%
read wrcache write create remove rename
18 0% 0 0% 471048 33% 2900 0% 323 0% 2888 0%
link symlink mkdir rmdir readdir fsstat
0 0% 2 0% 60 0% 32 0% 27463 1% 3709 0%
Client nfs v4:
null read write commit open open_conf
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
open_noat open_dgrd close setattr fsinfo renew
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
setclntid confirm lock lockt locku access
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
getattr lookup lookup_root remove rename link
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
symlink create pathconf statfs readlink readdir
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
server_caps delegreturn
0 0% 0 0%
Folkert van Heusden
--
www.biglumber.com <- site where one can exchange PGP key signatures
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: [2.6.20] BUG: workqueue leaked lock
2007-03-15 19:06 ` Andrew Morton
2007-03-15 19:17 ` Folkert van Heusden
@ 2007-03-16 8:41 ` Peter Zijlstra
2007-03-16 11:39 ` Andrew Morton
1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2007-03-16 8:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Folkert van Heusden, linux-kernel, Oleg Nesterov, Neil Brown
On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> > ...
> > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > [ 1756.728392] 2 locks held by nfsd4/3577:
> > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > [ 1756.729547] =======================
> > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd)
> > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd]
> > [ 1846.683832] 2 locks held by nfsd4/3577:
> > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14
> > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18
> > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170
> > [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165
> > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4
> > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
>
> Oleg, that's a fairly incomprehensible message we have in there. Can you
> please explain what it means?
I think I'm responsible for this message (commit
d5abe669172f20a4129a711de0f250a4e07db298); what it says is that the
function executed by the workqueue (here laundromat_main) either leaked
an atomic context or returned while still holding locks (2 locks in this
case).
* Re: [2.6.20] BUG: workqueue leaked lock
2007-03-16 8:41 ` Peter Zijlstra
@ 2007-03-16 11:39 ` Andrew Morton
2007-03-19 6:24 ` Neil Brown
0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2007-03-16 11:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Folkert van Heusden, linux-kernel, Oleg Nesterov, Neil Brown
On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> > > ...
> > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > > [ 1756.728392] 2 locks held by nfsd4/3577:
> > > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> > > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> > > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> > > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> > > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> > > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > > [ 1756.729547] =======================
> > > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd)
> > > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd]
> > > [ 1846.683832] 2 locks held by nfsd4/3577:
> > > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14
> > > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18
> > > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170
> > > [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165
> > > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4
> > > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> >
> > Oleg, that's a fairly incomprehensible message we have in there. Can you
> > please explain what it means?
>
> I think I'm responsible for this message (commit
> d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the
> function executed by the workqueue (here laundromat_main) leaked an
> atomic context or is still holding locks (2 locks in this case).
>
OK. That's not necessarily a bug: one could envisage a (weird) piece of
code which takes a lock and then releases it on a later workqueue invocation.
But I'm not sure that nfs4_laundromat() is actually supposed to be doing
anything like that.
Then again, maybe it is: it seems to be waddling through a directory under
the control of a little state machine, with timeouts.
Neil: help?
* Re: [2.6.20] BUG: workqueue leaked lock
2007-03-15 19:17 ` Folkert van Heusden
@ 2007-03-16 14:49 ` Jarek Poplawski
2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski
1 sibling, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-16 14:49 UTC (permalink / raw)
To: Folkert van Heusden
Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown
On 15-03-2007 20:17, Folkert van Heusden wrote:
>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
>>> ...
>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> ...
>>> [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
>> Oleg, that's a fairly incomprehensible message we have in there. Can you
>> please explain what it means?
>
> Haha ok :-)
>
> Good: since I run 2.6.20 with these debugging switches enabled, I
> occasionally get errors like these. I ALWAYS get the following error
> when the system first boots, when the Tor executable is started:
>
> [ 137.324255] =======================================================
> [ 137.324359] [ INFO: possible circular locking dependency detected ]
> [ 137.324412] 2.6.20 #2
> [ 137.324457] -------------------------------------------------------
Hi,
IMHO you should now switch off lockdep (the "Lock debugging: ..." options)
until you update or get a patch for this circular-locking problem. Your
logs should be sufficient for the nfsd people, and lockdep turns off
checking after this anyway. I don't know whether this is a known and real
locking problem, but it would be interesting to see whether the workqueue
error reappears. It seems the nfsd code is bailing out somewhere with
locks held, maybe because of problems with lockdep's error handling.
Regards,
Jarek P.
* Re: [2.6.20] BUG: workqueue leaked lock
2007-03-16 11:39 ` Andrew Morton
@ 2007-03-19 6:24 ` Neil Brown
2007-03-20 9:37 ` [PATCH] " Jarek Poplawski
0 siblings, 1 reply; 35+ messages in thread
From: Neil Brown @ 2007-03-19 6:24 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Zijlstra, Folkert van Heusden, linux-kernel, Oleg Nesterov,
J. Bruce Fields
On Friday March 16, akpm@linux-foundation.org wrote:
>
> OK. That's not necessarily a bug: one could envisage a (weird) piece of
> code which takes a lock then releases it on a later workqueue invocation.
> But I'm not sure that nfs4_laundromat() is actually supposed to be doing
> anything like that.
>
> Then again, maybe it is: it seems to be waddling through a directory under
> the control of a little state machine, with timeouts.
>
> Neil: help?
I'm quite certain that laundromat_main does *not* leave client_mutex
locked, as the last thing it does is call nfs4_unlock_state(), which is just
mutex_unlock(&client_mutex);
To me, that raises some doubt about whether the lock leak check is
working properly...
It is somewhat harder to track locking of i_mutex, but it seems to me
that every time it is taken, it is released again shortly afterwards.
So I think this must be a problem with leak detection, not with NFSd.
NeilBrown
> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> > > > ...
> > > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > > > [ 1756.728392] 2 locks held by nfsd4/3577:
> > > > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> > > > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> > > > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> > > > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> > > > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> > > > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > > > [ 1756.729547] =======================
> > > > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd)
> > > > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd]
> > > > [ 1846.683832] 2 locks held by nfsd4/3577:
> > > > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14
> > > > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18
> > > > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170
> > > > [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165
> > > > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4
> > > > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > >
> > > Oleg, that's a fairly incomprehensible message we have in there. Can you
> > > please explain what it means?
> >
> > I think I'm responsible for this message (commit
> > d5abe669172f20a4129a711de0f250a4e07db298); what it says is that the
> > function executed by the workqueue (here laundromat_main) leaked an
> > atomic context or is still holding locks (2 locks in this case).
> >
* [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-19 6:24 ` Neil Brown
@ 2007-03-20 9:37 ` Jarek Poplawski
2007-03-20 16:07 ` Oleg Nesterov
0 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-20 9:37 UTC (permalink / raw)
To: Neil Brown
Cc: Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel,
Oleg Nesterov, J. Bruce Fields, Ingo Molnar
On 19-03-2007 07:24, Neil Brown wrote:
> On Friday March 16, akpm@linux-foundation.org wrote:
>> OK. That's not necessarily a bug: one could envisage a (weird) piece of
>> code which takes a lock then releases it on a later workqueue invocation.
>> But I'm not sure that nfs4_laundromat() is actually supposed to be doing
>> anything like that.
>>
>> Then again, maybe it is: it seems to be waddling through a directory under
>> the control of a little state machine, with timeouts.
>>
>> Neil: help?
>
> I'm quite certain that laundromat_main does *not* leave client_mutex
> locked, as the last thing it does is call nfs4_unlock_state(), which is just
> mutex_unlock(&client_mutex);
> To me, that raises some doubt about whether the lock leak check is
> working properly...
> It is somewhat harder to track locking of i_mutex, but it seems to me
> that every time it is taken, it is released again shortly afterwards.
>
> So I think this must be a problem with leak detection, not with NFSd.
>
> NeilBrown
>
>
>> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>>
>>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
>>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
>>>>> ...
>>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
>>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
>>>>> [ 1756.728392] 2 locks held by nfsd4/3577:
>>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
>>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
>>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
>>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
>>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
>>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
>>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
>>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
>>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
>>>>> [ 1756.729547] =======================
...
>>> I think I'm responsible for this message (commit
>>> d5abe669172f20a4129a711de0f250a4e07db298); what it says is that the
>>> function executed by the workqueue (here laundromat_main) leaked an
>>> atomic context or is still holding locks (2 locks in this case).
This check is valid for keventd, but it looks like nfsd runs its
own kthread. I'm not sure it's illegal to hold locks in that case,
so, if I'm not wrong about this, here is my patch proposal (for
testing) to take that into account.
Reported-by: Folkert van Heusden <folkert@vanheusden.com>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---
diff -Nurp 2.6.21-rc4-git4-/kernel/workqueue.c 2.6.21-rc4-git4/kernel/workqueue.c
--- 2.6.21-rc4-git4-/kernel/workqueue.c 2007-02-04 19:44:54.000000000 +0100
+++ 2.6.21-rc4-git4/kernel/workqueue.c 2007-03-20 09:30:46.000000000 +0100
@@ -316,6 +316,7 @@ static void run_workqueue(struct cpu_wor
struct work_struct *work = list_entry(cwq->worklist.next,
struct work_struct, entry);
work_func_t f = work->func;
+ int ld;
list_del_init(cwq->worklist.next);
spin_unlock_irqrestore(&cwq->lock, flags);
@@ -323,13 +324,15 @@ static void run_workqueue(struct cpu_wor
BUG_ON(get_wq_data(work) != cwq);
if (!test_bit(WORK_STRUCT_NOAUTOREL, work_data_bits(work)))
work_release(work);
+ ld = lockdep_depth(current);
+
f(work);
- if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
+ if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) {
printk(KERN_ERR "BUG: workqueue leaked lock or atomic: "
- "%s/0x%08x/%d\n",
+ "%s/0x%08x/%d/%d\n",
current->comm, preempt_count(),
- current->pid);
+ current->pid, ld);
printk(KERN_ERR " last function: ");
print_symbol("%s\n", (unsigned long)f);
debug_show_held_locks(current);
* dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-15 19:17 ` Folkert van Heusden
2007-03-16 14:49 ` Jarek Poplawski
@ 2007-03-20 11:17 ` Jarek Poplawski
2007-03-20 11:22 ` Jarek Poplawski
1 sibling, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-20 11:17 UTC (permalink / raw)
To: Folkert van Heusden
Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown,
Christoph Hellwig, Jan Kara
On 15-03-2007 20:17, Folkert van Heusden wrote:
>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
...
> Haha ok :-)
>
> Good, since I run 2.6.20 with these debugging switches switched on, I
> get occasionally errors like these. I get ALWAYS the following error
> when the system first boots when the TOR executable is started:
>
> [ 137.324255] =======================================================
> [ 137.324359] [ INFO: possible circular locking dependency detected ]
> [ 137.324412] 2.6.20 #2
> [ 137.324457] -------------------------------------------------------
> [ 137.324510] tor/4857 is trying to acquire lock:
> [ 137.324559] (tty_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> [ 137.324765]
> [ 137.324766] but task is already holding lock:
> [ 137.324859] (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189
> [ 137.325067]
> [ 137.325069] which lock already depends on the new lock.
IMHO lockdep found that two locks are taken in different order:
-> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later)
-> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning()
Probably print_warning() and flush_warnings() should be reworked to do
their printing without dqptr_sem held, or the locking order should be changed.
I hope somebody from this CC can work it out better.
Regards,
Jarek P.
> [ 137.325071]
> [ 137.325206]
> [ 137.325208] the existing dependency chain (in reverse order) is:
> [ 137.325300]
> [ 137.325301] -> #4 (&s->s_dquot.dqptr_sem){----}:
> [ 137.325501] [<c10354b8>] check_prev_add+0x154/0x206
> [ 137.325852] [<c10355d4>] check_prevs_add+0x6a/0xd5
> [ 137.326197] [<c1036dd2>] __lock_acquire+0x61c/0xa05
> [ 137.326538] [<c1037624>] lock_acquire+0x62/0x81
> [ 137.326887] [<c1032b5f>] down_read+0x2b/0x3d
> [ 137.327241] [<c1099403>] dquot_alloc_space+0x50/0x189
> [ 137.327588] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2
> [ 137.327935] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf
> [ 137.328280] [<c10abf67>] ext3_alloc_branch+0x50/0x21b
> [ 137.328622] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367
> [ 137.328980] [<c10ac76e>] ext3_getblk+0x97/0x228
> [ 137.329330] [<c10ac919>] ext3_bread+0x1a/0x78
> [ 137.329672] [<c10b1bc2>] ext3_mkdir+0xf4/0x270
> [ 137.330022] [<c1072a6a>] vfs_mkdir+0xb3/0x161
> [ 137.330368] [<c1072ba4>] sys_mkdirat+0x8c/0xc4
> [ 137.330714] [<c1072bfc>] sys_mkdir+0x20/0x22
> [ 137.331063] [<c1002f78>] syscall_call+0x7/0xb
> [ 137.331406] [<ffffffff>] 0xffffffff
> [ 137.331771]
> [ 137.331772] -> #3 (&ei->truncate_mutex){--..}:
> [ 137.331979] [<c10354b8>] check_prev_add+0x154/0x206
> [ 137.332332] [<c10355d4>] check_prevs_add+0x6a/0xd5
> [ 137.332676] [<c1036dd2>] __lock_acquire+0x61c/0xa05
> [ 137.333025] [<c1037624>] lock_acquire+0x62/0x81
> [ 137.333370] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
> [ 137.333930] [<c1205b88>] mutex_lock+0x8/0xa
> [ 137.334271] [<c10ae14f>] ext3_truncate+0x170/0x468
> [ 137.334613] [<c1058e72>] vmtruncate+0xa6/0x116
> [ 137.334949] [<c107d587>] inode_setattr+0x145/0x16c
> [ 137.335286] [<c10af071>] ext3_setattr+0x150/0x22f
> [ 137.335627] [<c107d909>] notify_change+0x35b/0x392
> [ 137.335968] [<c10679d2>] do_truncate+0x52/0x75
> [ 137.336305] [<c1071f5e>] may_open+0x1ec/0x231
> [ 137.336642] [<c107211c>] open_namei+0xda/0x59b
> [ 137.336975] [<c106863a>] do_filp_open+0x2c/0x53
> [ 137.337310] [<c106896f>] do_sys_open+0x52/0xd8
> [ 137.337645] [<c1068a11>] sys_open+0x1c/0x1e
> [ 137.337980] [<c1002f78>] syscall_call+0x7/0xb
> [ 137.338315] [<ffffffff>] 0xffffffff
> [ 137.338665]
> [ 137.338666] -> #2 (&inode->i_alloc_sem){--..}:
> [ 137.338864] [<c10354b8>] check_prev_add+0x154/0x206
> [ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5
> [ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05
> [ 137.339874] [<c1037624>] lock_acquire+0x62/0x81
> [ 137.340207] [<c1032bf5>] down_write+0x2b/0x45
> [ 137.340545] [<c107d890>] notify_change+0x2e2/0x392
> [ 137.340886] [<c10679d2>] do_truncate+0x52/0x75
> [ 137.341222] [<c1071f5e>] may_open+0x1ec/0x231
> [ 137.341557] [<c107211c>] open_namei+0xda/0x59b
> [ 137.341898] [<c1068a11>] sys_open+0x1c/0x1e
> [ 137.343109] [<c1002f78>] syscall_call+0x7/0xb
> [ 137.343444] [<ffffffff>] 0xffffffff
> [ 137.343792]
> [ 137.343793] -> #1 (&sysfs_inode_imutex_key){--..}:
> [ 137.343988] [<c10354b8>] check_prev_add+0x154/0x206
> [ 137.344320] [<c10355d4>] check_prevs_add+0x6a/0xd5
> [ 137.344655] [<c1036dd2>] __lock_acquire+0x61c/0xa05
> [ 137.344986] [<c1037624>] lock_acquire+0x62/0x81
> [ 137.345321] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
> [ 137.345658] [<c1205b88>] mutex_lock+0x8/0xa
> [ 137.345991] [<c10a5447>] sysfs_hash_and_remove+0x43/0x11c
> [ 137.346328] [<c10a5ef7>] sysfs_remove_file+0xd/0x12
> [ 137.346660] [<c114b00c>] device_remove_file+0x32/0x44
> [ 137.346992] [<c114b6d9>] device_del+0x174/0x1d2
> [ 137.347325] [<c114b742>] device_unregister+0xb/0x15
> [ 137.347661] [<c114b93d>] device_destroy+0x8d/0x9a
> [ 137.347994] [<c1137a38>] vcs_remove_sysfs+0x1c/0x38
> [ 137.348328] [<c113e0d8>] con_close+0x5e/0x6b
> [ 137.348661] [<c1130103>] release_dev+0x4c4/0x6ce
> [ 137.348999] [<c113077b>] tty_release+0x12/0x1c
> [ 137.349332] [<c10aa723>] ext3_file_write+0x2d/0xba
> [ 137.358495] [<c1069441>] do_sync_write+0xc7/0x116
> [ 137.358838] [<c1069612>] vfs_write+0x182/0x187
> [ 137.359176] [<c10696b8>] sys_write+0x3d/0x64
> [ 137.359513] [<c1002f78>] syscall_call+0x7/0xb
> [ 137.359848] [<ffffffff>] 0xffffffff
> [ 137.360200]
> [ 137.360202] other info that might help us debug this:
> [ 137.360204]
> [ 137.360335] 3 locks held by tor/4857:
> [ 137.360382] #0: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> [ 137.360625] #1: (&ei->truncate_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> [ 137.361072] #2: (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189
> [ 137.361313]
> [ 137.361314] stack backtrace:
> [ 137.361401] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> [ 137.361488] [<c1003d7f>] show_trace+0x12/0x14
> [ 137.361571] [<c1003e79>] dump_stack+0x16/0x18
> [ 137.361663] [<c1034d6d>] print_circular_bug_tail+0x6f/0x71
> [ 137.361752] [<c1035398>] check_prev_add+0x34/0x206
> [ 137.361838] [<c10355d4>] check_prevs_add+0x6a/0xd5
> [ 137.362786] [<c1036dd2>] __lock_acquire+0x61c/0xa05
> [ 137.362877] [<c1037624>] lock_acquire+0x62/0x81
> [ 137.362960] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c
> [ 137.363047] [<c1205b88>] mutex_lock+0x8/0xa
> [ 137.363132] [<c1098d79>] print_warning+0x8c/0x15d
> [ 137.363217] [<c1099537>] dquot_alloc_space+0x184/0x189
> [ 137.363302] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2
> [ 137.363391] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf
> [ 137.363476] [<c10abf67>] ext3_alloc_branch+0x50/0x21b
> [ 137.363561] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367
> [ 137.363650] [<c10ac66c>] ext3_get_block+0x78/0xe3
> [ 137.363735] [<c108a239>] __block_prepare_write+0x168/0x3fd
> [ 137.363822] [<c108ad10>] block_prepare_write+0x28/0x3b
> [ 137.363908] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d
> [ 137.363996] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6
> [ 137.364084] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad
> [ 137.364171] [<c104e6ea>] generic_file_aio_write+0x58/0xc4
> [ 137.364254] [<c10aa723>] ext3_file_write+0x2d/0xba
> [ 137.364340] [<c1069441>] do_sync_write+0xc7/0x116
> [ 137.364425] [<c1069612>] vfs_write+0x182/0x187
> [ 137.364511] [<c10696b8>] sys_write+0x3d/0x64
> [ 137.364595] [<c1002f78>] syscall_call+0x7/0xb
> [ 137.364682] =======================
>
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski
@ 2007-03-20 11:22 ` Jarek Poplawski
2007-03-20 11:31 ` Jarek Poplawski
0 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-20 11:22 UTC (permalink / raw)
To: Folkert van Heusden
Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown,
Christoph Hellwig, Jan Kara
On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote:
...
> IMHO lockdep found that two locks are taken in different order:
>
> -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later)
> -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning()
Should be:
-> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning()
Jarek P.
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 11:22 ` Jarek Poplawski
@ 2007-03-20 11:31 ` Jarek Poplawski
2007-03-20 12:19 ` Jan Kara
0 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-20 11:31 UTC (permalink / raw)
To: Folkert van Heusden
Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown,
Christoph Hellwig, Jan Kara
On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote:
> On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote:
> ...
> > IMHO lockdep found that two locks are taken in different order:
> >
> > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later)
> > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning()
Once more - should be:
-> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later)
-> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning()
Very sorry!!!
Jarek P.
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 11:31 ` Jarek Poplawski
@ 2007-03-20 12:19 ` Jan Kara
2007-03-20 13:35 ` Arjan van de Ven
2007-03-20 13:44 ` Jarek Poplawski
0 siblings, 2 replies; 35+ messages in thread
From: Jan Kara @ 2007-03-20 12:19 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov,
Neil Brown, Christoph Hellwig
On Tue 20-03-07 12:31:51, Jarek Poplawski wrote:
> On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote:
> > On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote:
> > ...
> > > IMHO lockdep found that two locks are taken in different order:
> > >
> > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later)
> > > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning()
>
> Once more - should be:
> -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later)
> -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning()
Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being
acquired under dqptr_sem in quota code. But looking at the path from
con_close() there's another inversion with i_mutex which is also acquired
along the path for sysfs. And we can hardly get rid of it in the quota code.
Now none of these is a real deadlock as quota should never call
print_warning() for sysfs (it doesn't use quota) but still it's nasty. I
suppose tty_mutex is above i_mutex because of those sysfs calls and it
seems sysfs must be called under tty_mutex because of races with
init_dev(). So it's not easy to get rid of that dependency either.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 12:19 ` Jan Kara
@ 2007-03-20 13:35 ` Arjan van de Ven
2007-03-20 14:21 ` Jan Kara
2007-03-20 13:44 ` Jarek Poplawski
1 sibling, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2007-03-20 13:35 UTC (permalink / raw)
To: Jan Kara
Cc: Jarek Poplawski, Folkert van Heusden, Andrew Morton,
linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig
> Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being
> acquired under dqptr_sem in quota code. But looking at the path from
> con_close() there's another inversion with i_mutex which is also acquired
> along the path for sysfs. And we can hardly get rid of it in the quota code.
> Now none of these is a real deadlock as quota should never call
> print_warning() for sysfs (it doesn't use quota) but still it's nasty. I
> suppose tty_mutex is above i_mutex because of those sysfs calls and it
> seems sysfs must be called under tty_mutex because of races with
> init_dev(). So it's not easy to get rid of that dependency either.
maybe a far more serious option: Why on EARTH is the quota code going to
TTYs directly? That's just WRONG. Maybe it wasn't 10 years ago, but
nowadays most people use graphical user interfaces and the like...
can we please just get rid of this instead?
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 12:19 ` Jan Kara
2007-03-20 13:35 ` Arjan van de Ven
@ 2007-03-20 13:44 ` Jarek Poplawski
2007-03-20 14:00 ` Jan Kara
1 sibling, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-20 13:44 UTC (permalink / raw)
To: Jan Kara
Cc: Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov,
Neil Brown, Christoph Hellwig
On Tue, Mar 20, 2007 at 01:19:09PM +0100, Jan Kara wrote:
> On Tue 20-03-07 12:31:51, Jarek Poplawski wrote:
> > On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote:
> > > On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote:
> > > ...
> > > > IMHO lockdep found that two locks are taken in different order:
> > > >
> > > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later)
> > > > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning()
> >
> > Once more - should be:
> > -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later)
> > -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning()
> Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being
> acquired under dqptr_sem in quota code. But looking at the path from
> con_close() there's another inversion with i_mutex which is also acquired
> along the path for sysfs. And we can hardly get rid of it in the quota code.
> Now none of these is a real deadlock as quota should never call
> print_warning() for sysfs (it doesn't use quota) but still it's nasty. I
> suppose tty_mutex is above i_mutex because of those sysfs calls and it
> seems sysfs must be called under tty_mutex because of races with
> init_dev(). So it's not easy to get rid of that dependency either.
I wonder whether this couldn't be done with a workqueue (write the
message to a buffer, maybe after a trylock on tty_mutex). And BTW,
isn't there any "modern" way to queue console messages like these?
Jarek P.
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 13:44 ` Jarek Poplawski
@ 2007-03-20 14:00 ` Jan Kara
0 siblings, 0 replies; 35+ messages in thread
From: Jan Kara @ 2007-03-20 14:00 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov,
Neil Brown, Christoph Hellwig
On Tue 20-03-07 14:44:46, Jarek Poplawski wrote:
> On Tue, Mar 20, 2007 at 01:19:09PM +0100, Jan Kara wrote:
> > On Tue 20-03-07 12:31:51, Jarek Poplawski wrote:
> > > On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote:
> > > > On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote:
> > > > ...
> > > > > IMHO lockdep found that two locks are taken in different order:
> > > > >
> > > > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later)
> > > > > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning()
> > >
> > > Once more - should be:
> > > -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later)
> > > -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning()
> > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being
> > acquired under dqptr_sem in quota code. But looking at the path from
> > con_close() there's another inversion with i_mutex which is also acquired
> > along the path for sysfs. And we can hardly get rid of it in the quota code.
> > Now none of these is a real deadlock as quota should never call
> > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I
> > suppose tty_mutex is above i_mutex because of those sysfs calls and it
> > seems sysfs must be called under tty_mutex because of races with
> > init_dev(). So it's not easy to get rid of that dependency either.
>
> I wonder whether this couldn't be done with a workqueue (write the
> message to a buffer, maybe after a trylock on tty_mutex). And BTW,
> isn't there any "modern" way to queue console messages like these?
I don't know about any such way...
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 14:21 ` Jan Kara
@ 2007-03-20 14:18 ` Arjan van de Ven
0 siblings, 0 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-03-20 14:18 UTC (permalink / raw)
To: Jan Kara
Cc: Jarek Poplawski, Folkert van Heusden, Andrew Morton,
linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig
On Tue, 2007-03-20 at 15:21 +0100, Jan Kara wrote:
> On Tue 20-03-07 14:35:10, Arjan van de Ven wrote:
> >
> >
> > > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being
> > > acquired under dqptr_sem in quota code. But looking at the path from
> > > con_close() there's another inversion with i_mutex which is also acquired
> > > along the path for sysfs. And we can hardly get rid of it in the quota code.
> > > Now none of these is a real deadlock as quota should never call
> > > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I
> > > suppose tty_mutex is above i_mutex because of those sysfs calls and it
> > > seems sysfs must be called under tty_mutex because of races with
> > > init_dev(). So it's not easy to get rid of that dependency either.
> >
> > maybe a far more serious option: Why on EARTH is the quota code going to
> > TTYs directly? That's just WRONG. Maybe it wasn't 10 years ago, but
> > nowadays most people use graphical user interfaces and the like...
> We discussed this some time back in August ;)
> (http://lkml.org/lkml/2006/8/8/237) and we've decided to leave the code in.
> The only reason why I think it should stay in is the existence of quota
> softlimits. There it's nice to warn the user and there's no other way to
> propagate this information into userspace (as the write succeeds).
> One solution would be to leave the warning to some userspace process
> (like warnquota) run from cron but still I'm not sure we should change the
> behavior.
or send a uevent or something
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 13:35 ` Arjan van de Ven
@ 2007-03-20 14:21 ` Jan Kara
2007-03-20 14:18 ` Arjan van de Ven
0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2007-03-20 14:21 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Jarek Poplawski, Folkert van Heusden, Andrew Morton,
linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig
On Tue 20-03-07 14:35:10, Arjan van de Ven wrote:
>
>
> > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being
> > acquired under dqptr_sem in quota code. But looking at the path from
> > con_close() there's another inversion with i_mutex which is also acquired
> > along the path for sysfs. And we can hardly get rid of it in the quota code.
> > Now none of these is a real deadlock as quota should never call
> > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I
> > suppose tty_mutex is above i_mutex because of those sysfs calls and it
> > seems sysfs must be called under tty_mutex because of races with
> > init_dev(). So it's not easy to get rid of that dependency either.
>
> maybe a far more serious option: Why on EARTH is the quota code going to
> TTYs directly? That's just WRONG. Maybe it wasn't 10 years ago, but
> nowadays most people use graphical user interfaces and the like...
We discussed this some time back in August ;)
(http://lkml.org/lkml/2006/8/8/237) and we've decided to leave the code in.
The only reason why I think it should stay in is the existence of quota
softlimits. There it's nice to warn the user and there's no other way to
propagate this information into userspace (as the write succeeds).
One solution would be to leave the warning to some userspace process
(like warnquota) run from cron but still I'm not sure we should change the
behavior.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 9:37 ` [PATCH] " Jarek Poplawski
@ 2007-03-20 16:07 ` Oleg Nesterov
2007-03-21 8:05 ` Jarek Poplawski
0 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2007-03-20 16:07 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On 03/20, Jarek Poplawski wrote:
>
> >> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >>
> >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> >>>>> ...
> >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> >>>>> [ 1756.728392] 2 locks held by nfsd4/3577:
> >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> >>>>> [ 1756.729547] =======================
> ...
> >>> I think I'm responsible for this message (commit
> >>> d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the
> >>> function executed by the workqueue (here laundromat_main) leaked an
> >>> atomic context or is still holding locks (2 locks in this case).
>
> This check is valid with keventd, but it looks like nfsd runs
> kthread by itself. I'm not sure it's illegal to hold locks then,
nfsd creates laundry_wq by itself, yes, but cwq->thread runs with
lockdep_depth() == 0. Unless we have a bug with lockdep_depth(),
lockdep_depth() != 0 means that work->func() returns with a lock
held (or it can flush its own workqueue under lock, but in that case
we should have a different trace).
Personally I agree with Andrew:
>
> > OK. That's not necessarily a bug: one could envisage a (weird) piece of
> > code which takes a lock then releases it on a later workqueue invokation.
> @@ -323,13 +324,15 @@ static void run_workqueue(struct cpu_wor
> BUG_ON(get_wq_data(work) != cwq);
> if (!test_bit(WORK_STRUCT_NOAUTOREL, work_data_bits(work)))
> work_release(work);
> + ld = lockdep_depth(current);
> +
> f(work);
>
> - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
> + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) {
and with this change we will also have a BUG report on "then releases it on a
later workqueue invokation".
Btw., __nfs4_state_shutdown() does cancel_delayed_work(&laundromat_work).
This is unneeded and imho confusing. laundry_wq was already destroyed;
it would be a BUG to have pending delayed work at that point.
Oleg.
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-20 16:07 ` Oleg Nesterov
@ 2007-03-21 8:05 ` Jarek Poplawski
2007-03-21 14:46 ` Oleg Nesterov
0 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-21 8:05 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Tue, Mar 20, 2007 at 07:07:59PM +0300, Oleg Nesterov wrote:
> On 03/20, Jarek Poplawski wrote:
...
> > >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> > >>>>> ...
> > >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > >>>>> [ 1756.728392] 2 locks held by nfsd4/3577:
> > >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> > >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> > >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> > >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> > >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> > >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > >>>>> [ 1756.729547] =======================
...
> > This check is valid with keventd, but it looks like nfsd runs
> > kthread by itself. I'm not sure it's illegal to hold locks then,
>
> nfsd creates laundry_wq by itself, yes, but cwq->thread runs with
> lockdep_depth() == 0. Unless we have a bug with lockdep_depth(),
> lockdep_depth() != 0 means that work->func() returns with a lock
> held (or it can flush its own workqueue under lock, but in that case
> we should have a different trace).
IMHO, all you can say is that this thread is supposed to run with
lockdep_depth() == 0. lockdep_depth is counted within a process,
which starts before f(), so the only way to show that f() leaked
locks is to compare the count before and after f().
>
> Personally I agree with Andrew:
> >
> > > OK. That's not necessarily a bug: one could envisage a (weird) piece of
> > > code which takes a lock then releases it on a later workqueue invokation.
But the code in question here is laundromat_main, and it doesn't
seem to work like that.
>
> > @@ -323,13 +324,15 @@ static void run_workqueue(struct cpu_wor
> > BUG_ON(get_wq_data(work) != cwq);
> > if (!test_bit(WORK_STRUCT_NOAUTOREL, work_data_bits(work)))
> > work_release(work);
> > + ld = lockdep_depth(current);
> > +
> > f(work);
> >
> > - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
> > + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) {
>
> and with this change we will also have a BUG report on "then releases it on a
> later workqueue invokation".
Then we could say at least that this code is weird (buggy, in my opinion).
This patch doesn't change the way the "standard" code is treated,
so I cannot see how it could make things worse than they are now.
Regards,
Jarek P.
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-21 8:05 ` Jarek Poplawski
@ 2007-03-21 14:46 ` Oleg Nesterov
2007-03-21 15:16 ` Jarek Poplawski
0 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2007-03-21 14:46 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On 03/21, Jarek Poplawski wrote:
>
> On Tue, Mar 20, 2007 at 07:07:59PM +0300, Oleg Nesterov wrote:
> > On 03/20, Jarek Poplawski wrote:
> ...
> > > >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > > >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> > > >>>>> ...
> > > >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > > >>>>> [ 1756.728392] 2 locks held by nfsd4/3577:
> > > >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> > > >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> > > >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> > > >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> > > >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> > > >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > > >>>>> [ 1756.729547] =======================
> ...
> > > This check is valid with keventd, but it looks like nfsd runs
> > > kthread by itself. I'm not sure it's illegal to hold locks then,
> >
> > nfsd creates laundry_wq by itself, yes, but cwq->thread runs with
> > lockdep_depth() == 0. Unless we have a bug with lockdep_depth(),
> > lockdep_depth() != 0 means that work->func() returns with a lock
> > held (or it can flush its own workqueue under lock, but in that case
> > we should have a different trace).
>
> IMHO you can only say this thread is supposed to run with
> lockdep_depth() == 0. lockdep_depth is counted within a process,
> which starts before f(), so the only way to say f() leaked locks
> is to check these locks before and after f().
Sorry, I can't understand you. lockdep_depth is counted within a process,
which starts before f(), yes. This process is cwq->thread; it was forked
during create_workqueue(). It does not take any locks directly, only by
calling work->func(). laundry_wq doesn't differ from keventd_wq or any other
wq in this sense. nfsd does not "run a kthread by itself"; it inserts the
work and wakes up cwq->thread.
> > Personally I agree with Andrew:
> > >
> > > > OK. That's not necessarily a bug: one could envisage a (weird) piece of
> > > > code which takes a lock then releases it on a later workqueue invokation.
>
> But this code is named here as laundromat_main and it doesn't
> seem to work like this.
This means we have a problem with leak detection.
> > > + ld = lockdep_depth(current);
> > > +
> > > f(work);
> > >
> > > - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
> > > + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) {
> >
> > and with this change we will also have a BUG report on "then releases it on a
> > later workqueue invokation".
>
> Then we could say at least this code is weird (buggy - in my opinion).
> This patch doesn't change the way the "standard" code is treated,
> so I cannot see any possibility to get it worse then now.
I didn't mean that this patch makes things worse (except that it conflicts
with other patches in the -mm tree). In fact, it may improve the diagnostics.
My point is that this patch afaics has nothing to do with the problem under
discussion.
Oleg.
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-21 14:46 ` Oleg Nesterov
@ 2007-03-21 15:16 ` Jarek Poplawski
2007-03-21 15:17 ` Folkert van Heusden
2007-03-21 15:29 ` Oleg Nesterov
0 siblings, 2 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-21 15:16 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Wed, Mar 21, 2007 at 05:46:20PM +0300, Oleg Nesterov wrote:
> On 03/21, Jarek Poplawski wrote:
> >
> > On Tue, Mar 20, 2007 at 07:07:59PM +0300, Oleg Nesterov wrote:
> > > On 03/20, Jarek Poplawski wrote:
> > ...
> > > > >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > > > >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote:
> > > > >>>>> ...
> > > > >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > > >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > > > >>>>> [ 1756.728392] 2 locks held by nfsd4/3577:
> > > > >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > > >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> > > > >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> > > > >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> > > > >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> > > > >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> > > > >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > > > >>>>> [ 1756.729547] =======================
> > ...
> > > > This check is valid with keventd, but it looks like nfsd runs
> > > > kthread by itself. I'm not sure it's illegal to hold locks then,
> > >
> > > nfsd creates laundry_wq by itself, yes, but cwq->thread runs with
> > > lockdep_depth() == 0. Unless we have a bug with lockdep_depth(),
> > > lockdep_depth() != 0 means that work->func() returns with a lock
> > > held (or it can flush its own workqueue under lock, but in that case
> > > we should have a different trace).
> >
> > IMHO you can only say this thread is supposed to run with
> > lockdep_depth() == 0. lockdep_depth is counted within a process,
> > which starts before f(), so the only way to say f() leaked locks
> > is to check these locks before and after f().
>
> Sorry, I can't understand you. lockdep_depth is counted within a process,
> which starts before f(), yes. This process is cwq->thread, it was forked
> during create_workqueue(). It does not take any locks directly, only by
> calling work->func(). laundry_wq doesn't differ from keventd_wq or any other
> wq in this sense. nfsd does not "runs kthread by itself", it inserts the
> work and wakes up cwq->thread.
I think we know how it all should start and go. If analyzing
the code alone were enough, the current check of lockdep_depth()
would be unnecessary too, because the laundromat_main code looks
as good as the run_workqueue code. I sent this for testing and
I mean it - something strange is going on here, so some checks
should be added. I didn't say mine is the right one or that it
will certainly help. So, if you have another idea for tracking
this down, let us know.
>
> > > Personally I agree with Andrew:
> > > >
> > > > > OK. That's not necessarily a bug: one could envisage a (weird) piece of
> > > > > code which takes a lock then releases it on a later workqueue invokation.
> >
> > But this code is named here as laundromat_main and it doesn't
> > seem to work like this.
>
> This means we have a problem with leak detection.
I totally agree!
>
> > > > + ld = lockdep_depth(current);
> > > > +
> > > > f(work);
> > > >
> > > > - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
> > > > + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) {
> > >
> > > and with this change we will also have a BUG report on "then releases it on a
> > > later workqueue invokation".
> >
> > Then we could say at least this code is weird (buggy - in my opinion).
> > This patch doesn't change the way the "standard" code is treated,
> > so I cannot see any possibility to get it worse then now.
>
> I didn't mean this patch makes things worse (except it conflicts with other
> patches in -mm tree). In fact, it may improve the diagnostics. My point is
> that this patch afaics has nothing to do with the discussed problem.
I thought you were trying to diagnose a bug, and that maybe Folkert could
test whether this patch makes any difference. Sorry if I missed the real
problem.
Cheers,
Jarek P.
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-21 15:16 ` Jarek Poplawski
@ 2007-03-21 15:17 ` Folkert van Heusden
2007-03-21 15:29 ` Oleg Nesterov
1 sibling, 0 replies; 35+ messages in thread
From: Folkert van Heusden @ 2007-03-21 15:17 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Oleg Nesterov, Neil Brown, Andrew Morton, Peter Zijlstra,
linux-kernel, J. Bruce Fields, Ingo Molnar
> I thought you are trying to diagnose a bug and maybe Folkert could test,
> if this patch makes any difference. Sorry, if I missed the real problem.
Yes please let me know when I can test a patch.
Folkert van Heusden
--
www.biglumber.com <- site where one can exchange PGP key signatures
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-21 15:16 ` Jarek Poplawski
2007-03-21 15:17 ` Folkert van Heusden
@ 2007-03-21 15:29 ` Oleg Nesterov
2007-03-21 18:16 ` Oleg Nesterov
1 sibling, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2007-03-21 15:29 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On 03/21, Jarek Poplawski wrote:
>
> > Sorry, I can't understand you. lockdep_depth is counted within a process,
> > which starts before f(), yes. This process is cwq->thread, it was forked
> > during create_workqueue(). It does not take any locks directly, only by
> > calling work->func(). laundry_wq doesn't differ from keventd_wq or any other
> > wq in this sense. nfsd does not "runs kthread by itself", it inserts the
> > work and wakes up cwq->thread.
>
> I think we know how it all should start and go. If only analyzing
> the code could be enough, this current check of lockdep_depth()
> is unnecessary too, because laundromat_code code looks as good
> as run_workqueue code. I send it for testing and I mean it -
> something strange is going here, so some checks should be added
> - I didn't say mine is the right and will certainly help.
Ah, sorry, I misunderstood you. I had (my fault) a false impression
that you were proposing this patch as a "fix", while we seem to agree that
the real source of this problem remains unknown.
> So, if
> you have another idea for looking after it, let us know.
No, I don't.
Oleg.
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock
2007-03-21 15:29 ` Oleg Nesterov
@ 2007-03-21 18:16 ` Oleg Nesterov
2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski
0 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2007-03-21 18:16 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On 03/21, Oleg Nesterov wrote:
>
> On 03/21, Jarek Poplawski wrote:
> >
> > So, if
> > you have another idea for looking after it, let us know.
>
> No, I don't.
Perhaps we can add lockdep_depth() checks to laundromat_main/nfs4_laundromat
to rule out workqueue.c. According to the trace, we return with client_mutex
held, which seems impossible. So we could also print the value of
mutex_is_locked() if lockdep_depth() != 0.
Oleg.
* [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-21 18:16 ` Oleg Nesterov
@ 2007-03-22 6:11 ` Jarek Poplawski
2007-03-22 6:28 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-22 6:11 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
Here is some joke:
[PATCH] lockdep: lockdep_depth vs. debug_locks
lockdep really shouldn't be used when debug_locks == 0!
Reported-by: Folkert van Heusden <folkert@vanheusden.com>
Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---
diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h
--- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100
+++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100
@@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_
# define INIT_LOCKDEP .lockdep_recursion = 0,
-#define lockdep_depth(tsk) ((tsk)->lockdep_depth)
+#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
#else /* !LOCKDEP */
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski
@ 2007-03-22 6:28 ` Andrew Morton
2007-03-22 7:06 ` Jarek Poplawski
` (2 more replies)
2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski
2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra
2 siblings, 3 replies; 35+ messages in thread
From: Andrew Morton @ 2007-03-22 6:28 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Thu, 22 Mar 2007 07:11:19 +0100 Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> Here is some joke:
>
> [PATCH] lockdep: lockdep_depth vs. debug_locks
>
> lockdep really shouldn't be used when debug_locks == 0!
>
This isn't a very good changelog.
>
> Reported-by: Folkert van Heusden <folkert@vanheusden.com>
> Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
>
> ---
>
> diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h
> --- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100
> +++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100
> @@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_
>
> # define INIT_LOCKDEP .lockdep_recursion = 0,
>
> -#define lockdep_depth(tsk) ((tsk)->lockdep_depth)
> +#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
>
> #else /* !LOCKDEP */
>
What problem does this solve, and how does it solve it?
I assume that some codepath is incrementing ->lockdep_depth even when
debug_locks==0. Isn't that wrong of it?
* [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks
2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski
2007-03-22 6:28 ` Andrew Morton
@ 2007-03-22 6:57 ` Jarek Poplawski
2007-03-22 7:23 ` Peter Zijlstra
2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra
2 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-22 6:57 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
And here is some addition.
[PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks
lockdep's data shouldn't be used when debug_locks == 0,
because it's no longer updated after that point, so it's more
misleading than helpful.
PS: lockdep's current-> fields should probably be reset after
it turns debug_locks off: that is, after printing a bug report but
before returning from the exported functions. However, there are
really a lot of these places (e.g. after DEBUG_LOCKS_WARN_ON),
so something could be missed. (Of course, direct use of these
fields isn't recommended either.)
Reported-by: Folkert van Heusden <folkert@vanheusden.com>
Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---
diff -Nurp 2.6.21-rc4-git4-/kernel/lockdep.c 2.6.21-rc4-git4/kernel/lockdep.c
--- 2.6.21-rc4-git4-/kernel/lockdep.c 2007-03-21 22:46:26.000000000 +0100
+++ 2.6.21-rc4-git4/kernel/lockdep.c 2007-03-21 23:05:17.000000000 +0100
@@ -2742,6 +2742,10 @@ void debug_show_all_locks(void)
int count = 10;
int unlock = 1;
+ if (unlikely(!debug_locks)) {
+ printk("INFO: lockdep is turned off.\n");
+ return;
+ }
printk("\nShowing all locks held in the system:\n");
/*
@@ -2785,6 +2789,10 @@ EXPORT_SYMBOL_GPL(debug_show_all_locks);
void debug_show_held_locks(struct task_struct *task)
{
+ if (unlikely(!debug_locks)) {
+ printk("INFO: lockdep is turned off.\n");
+ return;
+ }
lockdep_print_held_locks(task);
}
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-22 6:28 ` Andrew Morton
@ 2007-03-22 7:06 ` Jarek Poplawski
2007-03-22 7:23 ` Jarek Poplawski
2007-03-22 7:13 ` Jarek Poplawski
2007-03-22 8:26 ` Jarek Poplawski
2 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-22 7:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Wed, Mar 21, 2007 at 10:28:02PM -0800, Andrew Morton wrote:
> On Thu, 22 Mar 2007 07:11:19 +0100 Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> >
> > Here is some joke:
> >
> > [PATCH] lockdep: lockdep_depth vs. debug_locks
> >
> > lockdep really shouldn't be used when debug_locks == 0!
> >
>
> This isn't a very good changelog.
>
> >
> > Reported-by: Folkert van Heusden <folkert@vanheusden.com>
> > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
> >
> > ---
> >
> > diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h
> > --- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100
> > +++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100
> > @@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_
> >
> > # define INIT_LOCKDEP .lockdep_recursion = 0,
> >
> > -#define lockdep_depth(tsk) ((tsk)->lockdep_depth)
> > +#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
> >
> > #else /* !LOCKDEP */
> >
>
> What problem does this solve, and how does it solve it?
>
> I assume that some codepath is incrementing ->lockdep_depth even when
> debug_locks==0. Isn't that wrong of it?
>
This should definitely solve this problem. As was said
a few times before, lockdep stops registering locks after
a bug, so even the lock which caused the warning isn't
reported. Here lockdep found a bug in a workqueue function,
so after that point no lock or unlock is counted or reported.
I think Ingo can tell us more.
Regards,
Jarek P.
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-22 6:28 ` Andrew Morton
2007-03-22 7:06 ` Jarek Poplawski
@ 2007-03-22 7:13 ` Jarek Poplawski
2007-03-22 8:26 ` Jarek Poplawski
2 siblings, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-22 7:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Wed, Mar 21, 2007 at 10:28:02PM -0800, Andrew Morton wrote:
...
> I assume that some codepath is incrementing ->lockdep_depth even when
> debug_locks==0. Isn't that wrong of it?
>
lockdep simply stops updating lockdep_depth just after (or during)
a bug or a WARN.
Jarek P.
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski
2007-03-22 6:28 ` Andrew Morton
2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski
@ 2007-03-22 7:22 ` Peter Zijlstra
2007-03-22 9:06 ` Ingo Molnar
2 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2007-03-22 7:22 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Oleg Nesterov, Neil Brown, Andrew Morton, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Thu, 2007-03-22 at 07:11 +0100, Jarek Poplawski wrote:
> Here is some joke:
>
> [PATCH] lockdep: lockdep_depth vs. debug_locks
>
> lockdep really shouldn't be used when debug_locks == 0!
This happens when lockdep reports a fatal error, at which point
it will stop tracking locks and leave whatever state was there.
> Reported-by: Folkert van Heusden <folkert@vanheusden.com>
> Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
This looks sane, thanks for figuring this out.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>
> diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h
> --- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100
> +++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100
> @@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_
>
> # define INIT_LOCKDEP .lockdep_recursion = 0,
>
> -#define lockdep_depth(tsk) ((tsk)->lockdep_depth)
> +#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
>
> #else /* !LOCKDEP */
>
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-22 7:06 ` Jarek Poplawski
@ 2007-03-22 7:23 ` Jarek Poplawski
0 siblings, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-22 7:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Thu, Mar 22, 2007 at 08:06:44AM +0100, Jarek Poplawski wrote:
...
> This should definitely solve this problem - as it was said
> a few times before lockdep stops registering locks after
> a bug, so even the lock which caused the warning isn't
> reported. Here lockdep found a bug in a workqueue function,
More precisely:
lockdep found a bug during a run of a workqueue function - this
could also be caused by a bug in other code running
simultaneously.
Jarek P.
* Re: [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks
2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski
@ 2007-03-22 7:23 ` Peter Zijlstra
2007-03-22 9:06 ` Ingo Molnar
0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2007-03-22 7:23 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Oleg Nesterov, Neil Brown, Andrew Morton, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Thu, 2007-03-22 at 07:57 +0100, Jarek Poplawski wrote:
> And here is some addition.
>
> [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks
>
> lockdep's data shouldn't be used when debug_locks == 0
> because it's not updated after this, so it's more misleading
> than helpful.
>
> PS: probably lockdep's current-> fields should be reset after
> it turns debug_locks off: so, after printing a bug report, but
> before return from exported functions, but there are really
> a lot of these possibilities (e.g. after DEBUG_LOCKS_WARN_ON),
> so, something could be missed. (Of course direct use of this
> fields isn't recommended either.)
>
> Reported-by: Folkert van Heusden <folkert@vanheusden.com>
> Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>
> diff -Nurp 2.6.21-rc4-git4-/kernel/lockdep.c 2.6.21-rc4-git4/kernel/lockdep.c
> --- 2.6.21-rc4-git4-/kernel/lockdep.c 2007-03-21 22:46:26.000000000 +0100
> +++ 2.6.21-rc4-git4/kernel/lockdep.c 2007-03-21 23:05:17.000000000 +0100
> @@ -2742,6 +2742,10 @@ void debug_show_all_locks(void)
> int count = 10;
> int unlock = 1;
>
> + if (unlikely(!debug_locks)) {
> + printk("INFO: lockdep is turned off.\n");
> + return;
> + }
> printk("\nShowing all locks held in the system:\n");
>
> /*
> @@ -2785,6 +2789,10 @@ EXPORT_SYMBOL_GPL(debug_show_all_locks);
>
> void debug_show_held_locks(struct task_struct *task)
> {
> + if (unlikely(!debug_locks)) {
> + printk("INFO: lockdep is turned off.\n");
> + return;
> + }
> lockdep_print_held_locks(task);
> }
>
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-22 6:28 ` Andrew Morton
2007-03-22 7:06 ` Jarek Poplawski
2007-03-22 7:13 ` Jarek Poplawski
@ 2007-03-22 8:26 ` Jarek Poplawski
2 siblings, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2007-03-22 8:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden,
linux-kernel, J. Bruce Fields, Ingo Molnar
On Wed, Mar 21, 2007 at 10:28:02PM -0800, Andrew Morton wrote:
> On Thu, 22 Mar 2007 07:11:19 +0100 Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> >
> > Here is some joke:
> >
> > [PATCH] lockdep: lockdep_depth vs. debug_locks
> >
> > lockdep really shouldn't be used when debug_locks == 0!
> >
>
> This isn't a very good changelog.
Actually it's not a changelog (it's a joke).
Jarek P.
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock
2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra
@ 2007-03-22 9:06 ` Ingo Molnar
0 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2007-03-22 9:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jarek Poplawski, Oleg Nesterov, Neil Brown, Andrew Morton,
Folkert van Heusden, linux-kernel, J. Bruce Fields
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Thu, 2007-03-22 at 07:11 +0100, Jarek Poplawski wrote:
> > Here is some joke:
> >
> > [PATCH] lockdep: lockdep_depth vs. debug_locks
> >
> > lockdep really shouldn't be used when debug_locks == 0!
>
> This happens when lockdep reports a fatal error, at which point
> it will stop tracking locks and leave whatever state was there.
>
> > Reported-by: Folkert van Heusden <folkert@vanheusden.com>
> > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
>
> This looks sane, thanks for figuring this out.
>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Ingo
* Re: [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks
2007-03-22 7:23 ` Peter Zijlstra
@ 2007-03-22 9:06 ` Ingo Molnar
0 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2007-03-22 9:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jarek Poplawski, Oleg Nesterov, Neil Brown, Andrew Morton,
Folkert van Heusden, linux-kernel, J. Bruce Fields
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Thu, 2007-03-22 at 07:57 +0100, Jarek Poplawski wrote:
> > And here is some addition.
> >
> > [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks
> >
> > lockdep's data shouldn't be used when debug_locks == 0
> > because it's not updated after this, so it's more misleading
> > than helpful.
> >
> > PS: lockdep's current-> fields should probably be reset after
> > it turns debug_locks off, i.e. after printing a bug report but
> > before returning from the exported functions; but there are
> > really a lot of these places (e.g. after DEBUG_LOCKS_WARN_ON),
> > so something could be missed. (Of course, direct use of these
> > fields isn't recommended either.)
> >
> > Reported-by: Folkert van Heusden <folkert@vanheusden.com>
> > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru>
> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Ingo
Thread overview: 35+ messages
-- links below jump to the message on this page --
2007-03-13 16:50 [2.6.20] BUG: workqueue leaked lock Folkert van Heusden
2007-03-15 19:06 ` Andrew Morton
2007-03-15 19:17 ` Folkert van Heusden
2007-03-16 14:49 ` Jarek Poplawski
2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski
2007-03-20 11:22 ` Jarek Poplawski
2007-03-20 11:31 ` Jarek Poplawski
2007-03-20 12:19 ` Jan Kara
2007-03-20 13:35 ` Arjan van de Ven
2007-03-20 14:21 ` Jan Kara
2007-03-20 14:18 ` Arjan van de Ven
2007-03-20 13:44 ` Jarek Poplawski
2007-03-20 14:00 ` Jan Kara
2007-03-16 8:41 ` Peter Zijlstra
2007-03-16 11:39 ` Andrew Morton
2007-03-19 6:24 ` Neil Brown
2007-03-20 9:37 ` [PATCH] " Jarek Poplawski
2007-03-20 16:07 ` Oleg Nesterov
2007-03-21 8:05 ` Jarek Poplawski
2007-03-21 14:46 ` Oleg Nesterov
2007-03-21 15:16 ` Jarek Poplawski
2007-03-21 15:17 ` Folkert van Heusden
2007-03-21 15:29 ` Oleg Nesterov
2007-03-21 18:16 ` Oleg Nesterov
2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski
2007-03-22 6:28 ` Andrew Morton
2007-03-22 7:06 ` Jarek Poplawski
2007-03-22 7:23 ` Jarek Poplawski
2007-03-22 7:13 ` Jarek Poplawski
2007-03-22 8:26 ` Jarek Poplawski
2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski
2007-03-22 7:23 ` Peter Zijlstra
2007-03-22 9:06 ` Ingo Molnar
2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra
2007-03-22 9:06 ` Ingo Molnar