LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* ext3 lockdep warning in 2.6.25-rc6
@ 2008-03-22 14:37 Erez Zadok
2008-03-22 17:38 ` Peter Zijlstra
2008-03-25 18:29 ` Jan Kara
0 siblings, 2 replies; 11+ messages in thread
From: Erez Zadok @ 2008-03-22 14:37 UTC (permalink / raw)
To: sct, akpm, adilger, linux-ext4; +Cc: linux-kernel
I was building a kernel using "make -j 4" inside a unionfs, mounted on top
of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
got the lockdep warning below. Note that unionfs doesn't appear to be
involved in this lockdep warning at all, so I suspect this is probably an
issue between jbd and ext3 directly.
Let me know if I can be of more help.
Cheers,
Erez.
[31270.749254] =======================================================
[31270.749473] [ INFO: possible circular locking dependency detected ]
[31270.749720] 2.6.25-rc6-unionfs2 #69
[31270.749831] -------------------------------------------------------
[31270.749958] flush-caches.sh/9702 is trying to acquire lock:
[31270.750171] (&journal->j_list_lock){--..}, at: [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
[31270.751607]
[31270.751607] but task is already holding lock:
[31270.751607] (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
[31270.751607]
[31270.751607] which lock already depends on the new lock.
[31270.751607]
[31270.751607]
[31270.751607] the existing dependency chain (in reverse order) is:
[31270.751607]
[31270.751607] -> #1 (inode_lock){--..}:
[31270.751607] [<c0231e01>] __lock_acquire+0x934/0xab8
[31270.751607] [<c0232307>] lock_acquire+0x4e/0x6a
[31270.751607] [<c0395e33>] _spin_lock+0x23/0x32
[31270.751607] [<c02713d2>] __mark_inode_dirty+0xbe/0x134
[31270.751607] [<c0274b3a>] __set_page_dirty+0xdd/0xec
[31270.751607] [<c0274ba9>] mark_buffer_dirty+0x60/0x63
[31270.751607] [<c029d2f6>] __journal_temp_unlink_buffer+0xc9/0xce
[31270.751607] [<c029d43b>] __journal_unfile_buffer+0xb/0x15
[31270.751607] [<c029d487>] __journal_refile_buffer+0x42/0x8c
[31270.751607] [<c029fdd1>] journal_commit_transaction+0xc59/0xe64
[31270.751607] [<c02a29f1>] kjournald+0xae/0x1b0
[31270.751607] [<c0228833>] kthread+0x3b/0x64
[31270.751607] [<c02039d3>] kernel_thread_helper+0x7/0x10
[31270.751607] [<ffffffff>] 0xffffffff
[31270.751607]
[31270.751607] -> #0 (&journal->j_list_lock){--..}:
[31270.751607] [<c0231d28>] __lock_acquire+0x85b/0xab8
[31270.751607] [<c0232307>] lock_acquire+0x4e/0x6a
[31270.751607] [<c0395e33>] _spin_lock+0x23/0x32
[31270.751607] [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
[31270.751607] [<c02925fe>] ext3_releasepage+0x53/0x5c
[31270.751607] [<c023e4bd>] try_to_release_page+0x32/0x3f
[31270.751607] [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
[31270.751607] [<c02718de>] drop_pagecache+0x68/0xd3
[31270.751607] [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
[31270.751607] [<c0289383>] proc_sys_write+0x61/0x7e
[31270.751607] [<c02594a1>] vfs_write+0x8c/0x108
[31270.751607] [<c0259a2b>] sys_write+0x3b/0x60
[31270.751607] [<c0202cee>] sysenter_past_esp+0x5f/0xa5
[31270.751607] [<ffffffff>] 0xffffffff
[31270.751607]
[31270.751607] other info that might help us debug this:
[31270.751607]
[31270.751607] 2 locks held by flush-caches.sh/9702:
[31270.751607] #0: (&type->s_umount_key#12){----}, at: [<c02718ab>] drop_pagecache+0x35/0xd3
[31270.751607] #1: (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
[31270.751608]
[31270.751608] stack backtrace:
[31270.751608] Pid: 9702, comm: flush-caches.sh Not tainted 2.6.25-rc6-unionfs2 #69
[31270.751608] [<c0230687>] print_circular_bug_tail+0x5b/0x66
[31270.751608] [<c0231d28>] __lock_acquire+0x85b/0xab8
[31270.751608] [<c0232307>] lock_acquire+0x4e/0x6a
[31270.751608] [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
[31270.751608] [<c0395e33>] _spin_lock+0x23/0x32
[31270.751608] [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
[31270.751608] [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
[31270.751608] [<c02925fe>] ext3_releasepage+0x53/0x5c
[31270.751608] [<c023e4bd>] try_to_release_page+0x32/0x3f
[31270.751608] [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
[31270.751608] [<c02718de>] drop_pagecache+0x68/0xd3
[31270.751608] [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
[31270.751608] [<c0289383>] proc_sys_write+0x61/0x7e
[31270.751608] [<c0289322>] ? proc_sys_write+0x0/0x7e
[31270.751608] [<c02594a1>] vfs_write+0x8c/0x108
[31270.751608] [<c0259a2b>] sys_write+0x3b/0x60
[31270.751608] [<c0202cee>] sysenter_past_esp+0x5f/0xa5
[31270.751608] =======================
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-03-22 14:37 ext3 lockdep warning in 2.6.25-rc6 Erez Zadok
@ 2008-03-22 17:38 ` Peter Zijlstra
2008-03-22 22:34 ` Erez Zadok
2008-03-25 18:29 ` Jan Kara
1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2008-03-22 17:38 UTC (permalink / raw)
To: Erez Zadok; +Cc: sct, akpm, adilger, linux-ext4, linux-kernel
On Sat, 2008-03-22 at 10:37 -0400, Erez Zadok wrote:
> I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
> point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> got the lockdep warning below. Note that unionfs doesn't appear to be
> involved in this lockdep warning at all, so I suspect this is probably an
> issue between jbd and ext3 directly.
Known issue, drop_caches isn't production quality.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-03-22 17:38 ` Peter Zijlstra
@ 2008-03-22 22:34 ` Erez Zadok
2008-03-23 11:06 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Erez Zadok @ 2008-03-22 22:34 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Erez Zadok, sct, akpm, adilger, linux-ext4, linux-kernel
In message <1206207527.6437.83.camel@lappy>, Peter Zijlstra writes:
> On Sat, 2008-03-22 at 10:37 -0400, Erez Zadok wrote:
> > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
> > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> > got the lockdep warning below. Note that unionfs doesn't appear to be
> > involved in this lockdep warning at all, so I suspect this is probably an
> > issue between jbd and ext3 directly.
>
> Known issue, drop_caches isn't production quality.
Is this a general problem with drop_caches, or its interaction with
ext3/jbd? If I get a similar lockdep warning with other file systems,
should I bother to report this to the respective f/s maintainers?
Thanks,
Erez.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-03-22 22:34 ` Erez Zadok
@ 2008-03-23 11:06 ` Peter Zijlstra
0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2008-03-23 11:06 UTC (permalink / raw)
To: Erez Zadok; +Cc: sct, akpm, adilger, linux-ext4, linux-kernel
On Sat, 2008-03-22 at 18:34 -0400, Erez Zadok wrote:
> In message <1206207527.6437.83.camel@lappy>, Peter Zijlstra writes:
> > On Sat, 2008-03-22 at 10:37 -0400, Erez Zadok wrote:
> > > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> > > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
> > > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> > > got the lockdep warning below. Note that unionfs doesn't appear to be
> > > involved in this lockdep warning at all, so I suspect this is probably an
> > > issue between jbd and ext3 directly.
> >
> > Known issue, drop_caches isn't production quality.
>
> Is this a general problem with drop_caches, or its interaction with
> ext3/jbd? If I get a similar lockdep warning with other file systems,
> should I bother to report this to the respective f/s maintainers?
Its a generic problem with drop_caches. If you look in the lkml archives
you'll find similar reports against XFS and others.
Some time ago Andrew outlined a fix for it, but so far that hasn't
happened: http://lkml.org/lkml/2007/2/23/46
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-03-22 14:37 ext3 lockdep warning in 2.6.25-rc6 Erez Zadok
2008-03-22 17:38 ` Peter Zijlstra
@ 2008-03-25 18:29 ` Jan Kara
2008-04-01 21:23 ` Erez Zadok
2008-04-04 15:37 ` Erez Zadok
1 sibling, 2 replies; 11+ messages in thread
From: Jan Kara @ 2008-03-25 18:29 UTC (permalink / raw)
To: Erez Zadok; +Cc: sct, akpm, adilger, linux-ext4, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5071 bytes --]
> I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
> point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> got the lockdep warning below. Note that unionfs doesn't appear to be
> involved in this lockdep warning at all, so I suspect this is probably an
> issue between jbd and ext3 directly.
>
> Let me know if I can be of more help.
Actually, I have a fix for that - attached. Can you try it? Thanks.
Honza
> [31270.749254] =======================================================
> [31270.749473] [ INFO: possible circular locking dependency detected ]
> [31270.749720] 2.6.25-rc6-unionfs2 #69
> [31270.749831] -------------------------------------------------------
> [31270.749958] flush-caches.sh/9702 is trying to acquire lock:
> [31270.750171] (&journal->j_list_lock){--..}, at: [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> [31270.751607]
> [31270.751607] but task is already holding lock:
> [31270.751607] (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
> [31270.751607]
> [31270.751607] which lock already depends on the new lock.
> [31270.751607]
> [31270.751607]
> [31270.751607] the existing dependency chain (in reverse order) is:
> [31270.751607]
> [31270.751607] -> #1 (inode_lock){--..}:
> [31270.751607] [<c0231e01>] __lock_acquire+0x934/0xab8
> [31270.751607] [<c0232307>] lock_acquire+0x4e/0x6a
> [31270.751607] [<c0395e33>] _spin_lock+0x23/0x32
> [31270.751607] [<c02713d2>] __mark_inode_dirty+0xbe/0x134
> [31270.751607] [<c0274b3a>] __set_page_dirty+0xdd/0xec
> [31270.751607] [<c0274ba9>] mark_buffer_dirty+0x60/0x63
> [31270.751607] [<c029d2f6>] __journal_temp_unlink_buffer+0xc9/0xce
> [31270.751607] [<c029d43b>] __journal_unfile_buffer+0xb/0x15
> [31270.751607] [<c029d487>] __journal_refile_buffer+0x42/0x8c
> [31270.751607] [<c029fdd1>] journal_commit_transaction+0xc59/0xe64
> [31270.751607] [<c02a29f1>] kjournald+0xae/0x1b0
> [31270.751607] [<c0228833>] kthread+0x3b/0x64
> [31270.751607] [<c02039d3>] kernel_thread_helper+0x7/0x10
> [31270.751607] [<ffffffff>] 0xffffffff
> [31270.751607]
> [31270.751607] -> #0 (&journal->j_list_lock){--..}:
> [31270.751607] [<c0231d28>] __lock_acquire+0x85b/0xab8
> [31270.751607] [<c0232307>] lock_acquire+0x4e/0x6a
> [31270.751607] [<c0395e33>] _spin_lock+0x23/0x32
> [31270.751607] [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> [31270.751607] [<c02925fe>] ext3_releasepage+0x53/0x5c
> [31270.751607] [<c023e4bd>] try_to_release_page+0x32/0x3f
> [31270.751607] [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
> [31270.751607] [<c02718de>] drop_pagecache+0x68/0xd3
> [31270.751607] [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
> [31270.751607] [<c0289383>] proc_sys_write+0x61/0x7e
> [31270.751607] [<c02594a1>] vfs_write+0x8c/0x108
> [31270.751607] [<c0259a2b>] sys_write+0x3b/0x60
> [31270.751607] [<c0202cee>] sysenter_past_esp+0x5f/0xa5
> [31270.751607] [<ffffffff>] 0xffffffff
> [31270.751607]
> [31270.751607] other info that might help us debug this:
> [31270.751607]
> [31270.751607] 2 locks held by flush-caches.sh/9702:
> [31270.751607] #0: (&type->s_umount_key#12){----}, at: [<c02718ab>] drop_pagecache+0x35/0xd3
> [31270.751607] #1: (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
> [31270.751608]
> [31270.751608] stack backtrace:
> [31270.751608] Pid: 9702, comm: flush-caches.sh Not tainted 2.6.25-rc6-unionfs2 #69
> [31270.751608] [<c0230687>] print_circular_bug_tail+0x5b/0x66
> [31270.751608] [<c0231d28>] __lock_acquire+0x85b/0xab8
> [31270.751608] [<c0232307>] lock_acquire+0x4e/0x6a
> [31270.751608] [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
> [31270.751608] [<c0395e33>] _spin_lock+0x23/0x32
> [31270.751608] [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
> [31270.751608] [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> [31270.751608] [<c02925fe>] ext3_releasepage+0x53/0x5c
> [31270.751608] [<c023e4bd>] try_to_release_page+0x32/0x3f
> [31270.751608] [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
> [31270.751608] [<c02718de>] drop_pagecache+0x68/0xd3
> [31270.751608] [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
> [31270.751608] [<c0289383>] proc_sys_write+0x61/0x7e
> [31270.751608] [<c0289322>] ? proc_sys_write+0x0/0x7e
> [31270.751608] [<c02594a1>] vfs_write+0x8c/0x108
> [31270.751608] [<c0259a2b>] sys_write+0x3b/0x60
> [31270.751608] [<c0202cee>] sysenter_past_esp+0x5f/0xa5
> [31270.751608] =======================
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
[-- Attachment #2: vfs-2.6.25-drop_pagecache_sb_fix.diff --]
[-- Type: text/x-diff, Size: 1093 bytes --]
>From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 18 Mar 2008 14:38:06 +0100
Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
inode_lock.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/drop_caches.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 59375ef..f5aae26 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -14,15 +14,21 @@ int sysctl_drop_caches;
static void drop_pagecache_sb(struct super_block *sb)
{
- struct inode *inode;
+ struct inode *inode, *toput_inode = NULL;
spin_lock(&inode_lock);
list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
if (inode->i_state & (I_FREEING|I_WILL_FREE))
continue;
+ __iget(inode);
+ spin_unlock(&inode_lock);
__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
+ iput(toput_inode);
+ toput_inode = inode;
+ spin_lock(&inode_lock);
}
spin_unlock(&inode_lock);
+ iput(toput_inode);
}
void drop_pagecache(void)
--
1.5.2.4
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-03-25 18:29 ` Jan Kara
@ 2008-04-01 21:23 ` Erez Zadok
2008-04-02 8:12 ` Jan Kara
2008-04-04 15:37 ` Erez Zadok
1 sibling, 1 reply; 11+ messages in thread
From: Erez Zadok @ 2008-04-01 21:23 UTC (permalink / raw)
To: Jan Kara; +Cc: Erez Zadok, sct, akpm, adilger, linux-ext4, linux-kernel
In message <20080325182909.GD21732@atrey.karlin.mff.cuni.cz>, Jan Kara writes:
>
> --sdtB3X0nJg68CQEu
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
>
> > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
> > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> > got the lockdep warning below. Note that unionfs doesn't appear to be
> > involved in this lockdep warning at all, so I suspect this is probably an
> > issue between jbd and ext3 directly.
> >
> > Let me know if I can be of more help.
> Actually, I have a fix for that - attached. Can you try it? Thanks.
> Honza
>
> > [31270.749254] =======================================================
> > [31270.749473] [ INFO: possible circular locking dependency detected ]
> > [31270.749720] 2.6.25-rc6-unionfs2 #69
> > [31270.749831] -------------------------------------------------------
> > [31270.749958] flush-caches.sh/9702 is trying to acquire lock:
> > [31270.750171] (&journal->j_list_lock){--..}, at: [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> > [31270.751607]
> > [31270.751607] but task is already holding lock:
> > [31270.751607] (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
> > [31270.751607]
> > [31270.751607] which lock already depends on the new lock.
> > [31270.751607]
> > [31270.751607]
> > [31270.751607] the existing dependency chain (in reverse order) is:
> > [31270.751607]
> > [31270.751607] -> #1 (inode_lock){--..}:
> > [31270.751607] [<c0231e01>] __lock_acquire+0x934/0xab8
> > [31270.751607] [<c0232307>] lock_acquire+0x4e/0x6a
> > [31270.751607] [<c0395e33>] _spin_lock+0x23/0x32
> > [31270.751607] [<c02713d2>] __mark_inode_dirty+0xbe/0x134
> > [31270.751607] [<c0274b3a>] __set_page_dirty+0xdd/0xec
> > [31270.751607] [<c0274ba9>] mark_buffer_dirty+0x60/0x63
> > [31270.751607] [<c029d2f6>] __journal_temp_unlink_buffer+0xc9/0xce
> > [31270.751607] [<c029d43b>] __journal_unfile_buffer+0xb/0x15
> > [31270.751607] [<c029d487>] __journal_refile_buffer+0x42/0x8c
> > [31270.751607] [<c029fdd1>] journal_commit_transaction+0xc59/0xe64
> > [31270.751607] [<c02a29f1>] kjournald+0xae/0x1b0
> > [31270.751607] [<c0228833>] kthread+0x3b/0x64
> > [31270.751607] [<c02039d3>] kernel_thread_helper+0x7/0x10
> > [31270.751607] [<ffffffff>] 0xffffffff
> > [31270.751607]
> > [31270.751607] -> #0 (&journal->j_list_lock){--..}:
> > [31270.751607] [<c0231d28>] __lock_acquire+0x85b/0xab8
> > [31270.751607] [<c0232307>] lock_acquire+0x4e/0x6a
> > [31270.751607] [<c0395e33>] _spin_lock+0x23/0x32
> > [31270.751607] [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> > [31270.751607] [<c02925fe>] ext3_releasepage+0x53/0x5c
> > [31270.751607] [<c023e4bd>] try_to_release_page+0x32/0x3f
> > [31270.751607] [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
> > [31270.751607] [<c02718de>] drop_pagecache+0x68/0xd3
> > [31270.751607] [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
> > [31270.751607] [<c0289383>] proc_sys_write+0x61/0x7e
> > [31270.751607] [<c02594a1>] vfs_write+0x8c/0x108
> > [31270.751607] [<c0259a2b>] sys_write+0x3b/0x60
> > [31270.751607] [<c0202cee>] sysenter_past_esp+0x5f/0xa5
> > [31270.751607] [<ffffffff>] 0xffffffff
> > [31270.751607]
> > [31270.751607] other info that might help us debug this:
> > [31270.751607]
> > [31270.751607] 2 locks held by flush-caches.sh/9702:
> > [31270.751607] #0: (&type->s_umount_key#12){----}, at: [<c02718ab>] drop_pagecache+0x35/0xd3
> > [31270.751607] #1: (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
> > [31270.751608]
> > [31270.751608] stack backtrace:
> > [31270.751608] Pid: 9702, comm: flush-caches.sh Not tainted 2.6.25-rc6-unionfs2 #69
> > [31270.751608] [<c0230687>] print_circular_bug_tail+0x5b/0x66
> > [31270.751608] [<c0231d28>] __lock_acquire+0x85b/0xab8
> > [31270.751608] [<c0232307>] lock_acquire+0x4e/0x6a
> > [31270.751608] [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
> > [31270.751608] [<c0395e33>] _spin_lock+0x23/0x32
> > [31270.751608] [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
> > [31270.751608] [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> > [31270.751608] [<c02925fe>] ext3_releasepage+0x53/0x5c
> > [31270.751608] [<c023e4bd>] try_to_release_page+0x32/0x3f
> > [31270.751608] [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
> > [31270.751608] [<c02718de>] drop_pagecache+0x68/0xd3
> > [31270.751608] [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
> > [31270.751608] [<c0289383>] proc_sys_write+0x61/0x7e
> > [31270.751608] [<c0289322>] ? proc_sys_write+0x0/0x7e
> > [31270.751608] [<c02594a1>] vfs_write+0x8c/0x108
> > [31270.751608] [<c0259a2b>] sys_write+0x3b/0x60
> > [31270.751608] [<c0202cee>] sysenter_past_esp+0x5f/0xa5
> > [31270.751608] =======================
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> --
> Jan Kara <jack@suse.cz>
> SuSE CR Labs
>
> --sdtB3X0nJg68CQEu
> Content-Type: text/x-diff; charset=us-ascii
> Content-Disposition: attachment; filename="vfs-2.6.25-drop_pagecache_sb_fix.diff"
>
> From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Tue, 18 Mar 2008 14:38:06 +0100
> Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
> inode_lock.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/drop_caches.c | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index 59375ef..f5aae26 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -14,15 +14,21 @@ int sysctl_drop_caches;
>
> static void drop_pagecache_sb(struct super_block *sb)
> {
> - struct inode *inode;
> + struct inode *inode, *toput_inode = NULL;
>
> spin_lock(&inode_lock);
> list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> if (inode->i_state & (I_FREEING|I_WILL_FREE))
> continue;
> + __iget(inode);
> + spin_unlock(&inode_lock);
> __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> + iput(toput_inode);
> + toput_inode = inode;
> + spin_lock(&inode_lock);
> }
> spin_unlock(&inode_lock);
> + iput(toput_inode);
> }
Jan, I'll be happy to test this, but I don't understand two things about
this patch:
1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?
2. What's the motivation behind having the second toput_inode pointer? It
appears that the first iteration through the loop, toput_inode will be
NULL, so we'll be iput'ing a NULL pointer (which is ok). So you're
trying to iput the previous inode pointer that the list iterated over,
right? Is that intended?
Peter's post:
http://lkml.org/lkml/2008/3/23/202
includes a reference to a mail by Andrew which implies that the fix may be
much more involved than what you outlined above, no?
Thanks,
Erez.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-04-01 21:23 ` Erez Zadok
@ 2008-04-02 8:12 ` Jan Kara
2008-04-02 16:34 ` Erez Zadok
2008-04-04 3:14 ` David Chinner
0 siblings, 2 replies; 11+ messages in thread
From: Jan Kara @ 2008-04-02 8:12 UTC (permalink / raw)
To: Erez Zadok; +Cc: sct, akpm, adilger, linux-ext4, linux-kernel
On Tue 01-04-08 17:23:34, Erez Zadok wrote:
<snip>
> > From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Tue, 18 Mar 2008 14:38:06 +0100
> > Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
> > inode_lock.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/drop_caches.c | 8 +++++++-
> > 1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >
> > static void drop_pagecache_sb(struct super_block *sb)
> > {
> > - struct inode *inode;
> > + struct inode *inode, *toput_inode = NULL;
> >
> > spin_lock(&inode_lock);
> > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > continue;
> > + __iget(inode);
> > + spin_unlock(&inode_lock);
> > __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> > + iput(toput_inode);
> > + toput_inode = inode;
> > + spin_lock(&inode_lock);
> > }
> > spin_unlock(&inode_lock);
> > + iput(toput_inode);
> > }
>
> Jan, I'll be happy to test this, but I don't understand two things about
> this patch:
>
> 1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?
>
> 2. What's the motivation behind having the second toput_inode pointer? It
> appears that the first iteration through the loop, toput_inode will be
> NULL, so we'll be iput'ing a NULL pointer (which is ok). So you're
> trying to iput the previous inode pointer that the list iterated over,
> right? Is that intended?
I'll try to explain the locking here:
1) We are not allowed to call into __invalidate_mapping_pages() with
inode_lock held - that it the bug lockdep is complaining about. Moreover it
leads to rather long waiting times for inode_lock (quite possibly several
seconds).
2) When we release inode_lock, we need to protect from inode going away,
thus we hold reference to it - that guarantees us inode stays in the list.
3) When we continue scanning of the list we must get inode_lock before we
put the inode reference to avoid races. But we cannot do iput() when we
hold inode_lock. Thus we save pointer to inode and do iput() next time we
have released the inode_lock...
> Peter's post:
>
> http://lkml.org/lkml/2008/3/23/202
>
> includes a reference to a mail by Andrew which implies that the fix may be
> much more involved than what you outlined above, no?
Definitely we can do a more involved fix ;) What Andrew proposes would have
some other benefits as well. But until somebody gets to that, this slight
hack should work fine (Andrew has already merged my patch in -mm so I
guess he agrees ;).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-04-02 8:12 ` Jan Kara
@ 2008-04-02 16:34 ` Erez Zadok
2008-04-04 3:14 ` David Chinner
1 sibling, 0 replies; 11+ messages in thread
From: Erez Zadok @ 2008-04-02 16:34 UTC (permalink / raw)
To: Jan Kara; +Cc: Erez Zadok, sct, akpm, adilger, linux-ext4, linux-kernel
In message <20080402081215.GA2716@duck.suse.cz>, Jan Kara writes:
> On Tue 01-04-08 17:23:34, Erez Zadok wrote:
> <snip>
[...]
> I'll try to explain the locking here:
> 1) We are not allowed to call into __invalidate_mapping_pages() with
> inode_lock held - that it the bug lockdep is complaining about. Moreover it
> leads to rather long waiting times for inode_lock (quite possibly several
> seconds).
> 2) When we release inode_lock, we need to protect from inode going away,
> thus we hold reference to it - that guarantees us inode stays in the list.
> 3) When we continue scanning of the list we must get inode_lock before we
> put the inode reference to avoid races. But we cannot do iput() when we
> hold inode_lock. Thus we save pointer to inode and do iput() next time we
> have released the inode_lock...
>
> > Peter's post:
> >
> > http://lkml.org/lkml/2008/3/23/202
> >
> > includes a reference to a mail by Andrew which implies that the fix may be
> > much more involved than what you outlined above, no?
> Definitely we can do a more involved fix ;) What Andrew proposes would have
> some other benefits as well. But until somebody gets to that, this slight
> hack should work fine (Andrew has already merged my patch in -mm so I
> guess he agrees ;).
Thanks for the explanation. And good to know it's in -mm. I'll give it a
try.
Cheers,
Erez.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-04-02 8:12 ` Jan Kara
2008-04-02 16:34 ` Erez Zadok
@ 2008-04-04 3:14 ` David Chinner
1 sibling, 0 replies; 11+ messages in thread
From: David Chinner @ 2008-04-04 3:14 UTC (permalink / raw)
To: Jan Kara; +Cc: Erez Zadok, sct, akpm, adilger, linux-ext4, linux-kernel
On Wed, Apr 02, 2008 at 10:12:15AM +0200, Jan Kara wrote:
> On Tue 01-04-08 17:23:34, Erez Zadok wrote:
> <snip>
<snip>
> > Jan, I'll be happy to test this, but I don't understand two things about
> > this patch:
> >
> > 1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?
> >
> > 2. What's the motivation behind having the second toput_inode pointer? It
> > appears that the first iteration through the loop, toput_inode will be
> > NULL, so we'll be iput'ing a NULL pointer (which is ok). So you're
> > trying to iput the previous inode pointer that the list iterated over,
> > right? Is that intended?
> I'll try to explain the locking here:
> 1) We are not allowed to call into __invalidate_mapping_pages() with
> inode_lock held - that it the bug lockdep is complaining about. Moreover it
> leads to rather long waiting times for inode_lock (quite possibly several
> seconds).
When you have tens of millions of cached inodes, it can take somewhat
longer than a few seconds.... :/
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-03-25 18:29 ` Jan Kara
2008-04-01 21:23 ` Erez Zadok
@ 2008-04-04 15:37 ` Erez Zadok
2008-04-07 11:47 ` Jan Kara
1 sibling, 1 reply; 11+ messages in thread
From: Erez Zadok @ 2008-04-04 15:37 UTC (permalink / raw)
To: Jan Kara; +Cc: Erez Zadok, sct, akpm, adilger, linux-ext4, linux-kernel
In message <20080325182909.GD21732@atrey.karlin.mff.cuni.cz>, Jan Kara writes:
> > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
> > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> > got the lockdep warning below. Note that unionfs doesn't appear to be
> > involved in this lockdep warning at all, so I suspect this is probably an
> > issue between jbd and ext3 directly.
> >
> > Let me know if I can be of more help.
> Actually, I have a fix for that - attached. Can you try it? Thanks.
> Honza
[...]
> From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Tue, 18 Mar 2008 14:38:06 +0100
> Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
> inode_lock.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/drop_caches.c | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index 59375ef..f5aae26 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -14,15 +14,21 @@ int sysctl_drop_caches;
>
> static void drop_pagecache_sb(struct super_block *sb)
> {
> - struct inode *inode;
> + struct inode *inode, *toput_inode = NULL;
>
> spin_lock(&inode_lock);
> list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> if (inode->i_state & (I_FREEING|I_WILL_FREE))
> continue;
> + __iget(inode);
> + spin_unlock(&inode_lock);
> __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> + iput(toput_inode);
> + toput_inode = inode;
> + spin_lock(&inode_lock);
> }
> spin_unlock(&inode_lock);
> + iput(toput_inode);
> }
>
> void drop_pagecache(void)
Jan, I tried the above patch on top of v2.6.25-rc8-82-g49115b7, and using
the same setup and workloads that produced the warning before: running "make
-j 20" of the linux kernel 100 times, dual-CPU, SMP, PREEMPT, while running
flush_cache every few seconds. The entire run took over 12 hours. I'm
happy to report that so far I've not gotten the same lockdep warning as
before.
Maybe this can now go to -mm for more testing?
Cheers,
Erez.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: ext3 lockdep warning in 2.6.25-rc6
2008-04-04 15:37 ` Erez Zadok
@ 2008-04-07 11:47 ` Jan Kara
0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2008-04-07 11:47 UTC (permalink / raw)
To: Erez Zadok; +Cc: sct, akpm, adilger, linux-ext4, linux-kernel
On Fri 04-04-08 11:37:58, Erez Zadok wrote:
> In message <20080325182909.GD21732@atrey.karlin.mff.cuni.cz>, Jan Kara writes:
>
> > > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> > > of ext3. The kernel is vanilla 2.6.25-rc6 plus unionfs patches. At some
> > > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> > > got the lockdep warning below. Note that unionfs doesn't appear to be
> > > involved in this lockdep warning at all, so I suspect this is probably an
> > > issue between jbd and ext3 directly.
> > >
> > > Let me know if I can be of more help.
> > Actually, I have a fix for that - attached. Can you try it? Thanks.
> > Honza
> [...]
>
> > From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Tue, 18 Mar 2008 14:38:06 +0100
> > Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
> > inode_lock.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/drop_caches.c | 8 +++++++-
> > 1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >
> > static void drop_pagecache_sb(struct super_block *sb)
> > {
> > - struct inode *inode;
> > + struct inode *inode, *toput_inode = NULL;
> >
> > spin_lock(&inode_lock);
> > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > continue;
> > + __iget(inode);
> > + spin_unlock(&inode_lock);
> > __invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> > + iput(toput_inode);
> > + toput_inode = inode;
> > + spin_lock(&inode_lock);
> > }
> > spin_unlock(&inode_lock);
> > + iput(toput_inode);
> > }
> >
> > void drop_pagecache(void)
>
> Jan, I tried the above patch on top of v2.6.25-rc8-82-g49115b7, and using
> the same setup and workloads that produced the warning before: running "make
> -j 20" of the linux kernel 100 times, dual-CPU, SMP, PREEMPT, while running
> flush_cache every few seconds. The entire run took over 12 hours. I'm
> happy to report that so far I've not gotten the same lockdep warning as
> before.
Great news, thanks.
> Maybe this can now go to -mm for more testing?
Andrew has already merged it :).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2008-04-07 11:48 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-22 14:37 ext3 lockdep warning in 2.6.25-rc6 Erez Zadok
2008-03-22 17:38 ` Peter Zijlstra
2008-03-22 22:34 ` Erez Zadok
2008-03-23 11:06 ` Peter Zijlstra
2008-03-25 18:29 ` Jan Kara
2008-04-01 21:23 ` Erez Zadok
2008-04-02 8:12 ` Jan Kara
2008-04-02 16:34 ` Erez Zadok
2008-04-04 3:14 ` David Chinner
2008-04-04 15:37 ` Erez Zadok
2008-04-07 11:47 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).