LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* mm: soft lockup in 2.6.23-6636. caused by drop_caches ?
@ 2007-10-23 13:55 richard kennedy
  2007-10-23 14:03 ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: richard kennedy @ 2007-10-23 13:55 UTC (permalink / raw)
  To: lkml, linux-mm

on git v2.6.23-6636-g557ebb7 I'm getting a soft lockup when running a
simple disk write test case on AMD64X2, sata hd &  ext3.

the test does this
sync
echo 3 > /proc/sys/vm/drop_caches
for (( i=0; $i < $count; i=$i+1 )) ; do
dd if=large_file of=copy_file_$i bs=4k &
done

It never recovers once the lockup occurs and this message repeats on the
console :- 
BUG: soft lockup - CPU#0 stuck for 11s! [kjournald:456]
CPU 0:
Pid: 456, comm: kjournald Not tainted 2.6.23 #1
RIP: 0010:[<ffffffff812442df>]  [<ffffffff812442df>] _spin_lock+0x5/0xf
RSP: 0018:ffff8100096e3db8  EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000015
RDX: 0000000000200000 RSI: 0000000000000004 RDI: ffffffff81378f80
RBP: ffff8100096e3da0 R08: 0000000000000021 R09: 0000000000000000
R10: ffffe2000142b7c0 R11: 00000000fffffffa R12: ffff8100096e3d90
R13: ffffffff812430b2 R14: ffff810006402bd0 R15: ffff81007fba7000
FS:  00002b3cf6460090(0000) GS:ffffffff813aa000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002afa6e20b000 CR3: 000000000d6ba000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [<ffffffff810495cf>] wake_bit_function+0x0/0x23
 [<ffffffff810abdd4>] __mark_inode_dirty+0xe0/0x168
 [<ffffffff810afb43>] __set_page_dirty+0x112/0x11e
 [<ffffffff88021238>] :jbd:__journal_unfile_buffer+0x9/0x13
 [<ffffffff88023583>] :jbd:journal_commit_transaction+0xb3d/0xd1a
 [<ffffffff8103f385>] lock_timer_base+0x26/0x4b
 [<ffffffff88025f56>] :jbd:kjournald+0xc3/0x1e6
 [<ffffffff810495a1>] autoremove_wake_function+0x0/0x2e
 [<ffffffff88025e93>] :jbd:kjournald+0x0/0x1e6
 [<ffffffff8104946e>] kthread+0x47/0x75
 [<ffffffff8100cca8>] child_rip+0xa/0x12
 [<ffffffff81049427>] kthread+0x0/0x75
 [<ffffffff8100cc9e>] child_rip+0x0/0x12

BUG: soft lockup - CPU#1 stuck for 11s! [bash:3428]
CPU 1:
Pid: 3428, comm: bash Not tainted 2.6.23 #1
RIP: 0010:[<ffffffff812442e1>]  [<ffffffff812442e1>] _spin_lock+0x7/0xf
RSP: 0018:ffff810009163d80  EFLAGS: 00000286
RAX: 0000000000208029 RBX: ffff81004a895b40 RCX: 0000000000000001
RDX: ffff81004a895b40 RSI: ffffe20000a844b0 RDI: ffff8100095bb954
RBP: 0000000000000000 R08: 0000000000001000 R09: 0000000000000002
R10: ffff810009459230 R11: ffffffff88035a41 R12: ffffffff810700f4
R13: 0000000000000246 R14: ffff81000000d780 R15: ffffe20001bd6d90
FS:  00002b58a941ef40(0000) GS:ffff810006801700(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000362b8676e0 CR3: 000000000eddf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [<ffffffff880228c9>] :jbd:journal_try_to_free_buffers+0x7e/0x124
 [<ffffffff81070b24>] __invalidate_mapping_pages+0x81/0x103
 [<ffffffff810ac53b>] drop_pagecache+0x74/0xe6
 [<ffffffff810ac5c7>] drop_caches_sysctl_handler+0x1a/0x2f
 [<ffffffff810cf5b7>] proc_sys_write+0x7c/0xa4
 [<ffffffff810912b9>] vfs_write+0xc6/0x16f
 [<ffffffff81091874>] sys_write+0x45/0x6e
 [<ffffffff8100c00c>] tracesys+0xdc/0xe1

Richard


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: mm: soft lockup in 2.6.23-6636. caused by drop_caches ?
  2007-10-23 13:55 mm: soft lockup in 2.6.23-6636. caused by drop_caches ? richard kennedy
@ 2007-10-23 14:03 ` Peter Zijlstra
  2007-10-23 21:53   ` richard kennedy
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2007-10-23 14:03 UTC (permalink / raw)
  To: richard kennedy; +Cc: lkml, linux-mm, Andrew Morton

On Tue, 2007-10-23 at 14:55 +0100, richard kennedy wrote:
> on git v2.6.23-6636-g557ebb7 I'm getting a soft lockup when running a
> simple disk write test case on AMD64X2, sata hd &  ext3.
> 
> the test does this
> sync
> echo 3 > /proc/sys/vm/drop_caches
> for (( i=0; $i < $count; i=$i+1 )) ; do
> dd if=large_file of=copy_file_$i bs=4k &
> done

have you tried with lockdep enabled?

Also, doesn't really surprise me since its known that drop_caches has a
deadlock in it.

> It never recovers once the lockup occurs and this message repeats on the
> console :- 
> BUG: soft lockup - CPU#0 stuck for 11s! [kjournald:456]
> CPU 0:
> Pid: 456, comm: kjournald Not tainted 2.6.23 #1
> RIP: 0010:[<ffffffff812442df>]  [<ffffffff812442df>] _spin_lock+0x5/0xf
> RSP: 0018:ffff8100096e3db8  EFLAGS: 00000286
> RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000015
> RDX: 0000000000200000 RSI: 0000000000000004 RDI: ffffffff81378f80
> RBP: ffff8100096e3da0 R08: 0000000000000021 R09: 0000000000000000
> R10: ffffe2000142b7c0 R11: 00000000fffffffa R12: ffff8100096e3d90
> R13: ffffffff812430b2 R14: ffff810006402bd0 R15: ffff81007fba7000
> FS:  00002b3cf6460090(0000) GS:ffffffff813aa000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00002afa6e20b000 CR3: 000000000d6ba000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 
> Call Trace:
>  [<ffffffff810495cf>] wake_bit_function+0x0/0x23
>  [<ffffffff810abdd4>] __mark_inode_dirty+0xe0/0x168
>  [<ffffffff810afb43>] __set_page_dirty+0x112/0x11e
>  [<ffffffff88021238>] :jbd:__journal_unfile_buffer+0x9/0x13
>  [<ffffffff88023583>] :jbd:journal_commit_transaction+0xb3d/0xd1a
>  [<ffffffff8103f385>] lock_timer_base+0x26/0x4b
>  [<ffffffff88025f56>] :jbd:kjournald+0xc3/0x1e6
>  [<ffffffff810495a1>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff88025e93>] :jbd:kjournald+0x0/0x1e6
>  [<ffffffff8104946e>] kthread+0x47/0x75
>  [<ffffffff8100cca8>] child_rip+0xa/0x12
>  [<ffffffff81049427>] kthread+0x0/0x75
>  [<ffffffff8100cc9e>] child_rip+0x0/0x12
> 
> BUG: soft lockup - CPU#1 stuck for 11s! [bash:3428]
> CPU 1:
> Pid: 3428, comm: bash Not tainted 2.6.23 #1
> RIP: 0010:[<ffffffff812442e1>]  [<ffffffff812442e1>] _spin_lock+0x7/0xf
> RSP: 0018:ffff810009163d80  EFLAGS: 00000286
> RAX: 0000000000208029 RBX: ffff81004a895b40 RCX: 0000000000000001
> RDX: ffff81004a895b40 RSI: ffffe20000a844b0 RDI: ffff8100095bb954
> RBP: 0000000000000000 R08: 0000000000001000 R09: 0000000000000002
> R10: ffff810009459230 R11: ffffffff88035a41 R12: ffffffff810700f4
> R13: 0000000000000246 R14: ffff81000000d780 R15: ffffe20001bd6d90
> FS:  00002b58a941ef40(0000) GS:ffff810006801700(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000362b8676e0 CR3: 000000000eddf000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 
> Call Trace:
>  [<ffffffff880228c9>] :jbd:journal_try_to_free_buffers+0x7e/0x124
>  [<ffffffff81070b24>] __invalidate_mapping_pages+0x81/0x103
>  [<ffffffff810ac53b>] drop_pagecache+0x74/0xe6
>  [<ffffffff810ac5c7>] drop_caches_sysctl_handler+0x1a/0x2f
>  [<ffffffff810cf5b7>] proc_sys_write+0x7c/0xa4
>  [<ffffffff810912b9>] vfs_write+0xc6/0x16f
>  [<ffffffff81091874>] sys_write+0x45/0x6e
>  [<ffffffff8100c00c>] tracesys+0xdc/0xe1
> 
> Richard
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: mm: soft lockup in 2.6.23-6636. caused by drop_caches ?
  2007-10-23 14:03 ` Peter Zijlstra
@ 2007-10-23 21:53   ` richard kennedy
  0 siblings, 0 replies; 3+ messages in thread
From: richard kennedy @ 2007-10-23 21:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, linux-mm, Andrew Morton

On Tue, 2007-10-23 at 16:03 +0200, Peter Zijlstra wrote:
> On Tue, 2007-10-23 at 14:55 +0100, richard kennedy wrote:
> > on git v2.6.23-6636-g557ebb7 I'm getting a soft lockup when running a
> > simple disk write test case on AMD64X2, sata hd &  ext3.
> > 
> > the test does this
> > sync
> > echo 3 > /proc/sys/vm/drop_caches
> > for (( i=0; $i < $count; i=$i+1 )) ; do
> > dd if=large_file of=copy_file_$i bs=4k &
> > done
> 
> have you tried with lockdep enabled?
> 
> Also, doesn't really surprise me since its known that drop_caches has a
> deadlock in it.
> 
Thanks for suggestion, of course it took a lot longer to fail with all
the debug turned on.

But, lockdep gives a possible circular lock dependency between
journal->j_list_lock and inode_lock

drop_pagecache_sb takes the inode_lock and calls down into
journal_try_to_free_buffers which takes the journal->j_list_lock

while in kjournald, journal_commit_transaction take the j_list_lock and
calls __journal_unfile_buffer that takes the inode_lock.

I'm not sure how to fix this, but hope this helps someone else ;)
Here's the full info.

Cheers
Richard


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.23 #2
-------------------------------------------------------
bash/3535 is trying to acquire lock:
 (&journal->j_list_lock){--..}, at: [<ffffffff88023b58>] journal_try_to_free_buffers+0x7e/0x131 [jbd]

but task is already holding lock:
 (inode_lock){--..}, at: [<ffffffff810b5574>] drop_pagecache+0x4d/0xeb

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (inode_lock){--..}:
       [<ffffffff81056026>] __lock_acquire+0xac8/0xcf0
       [<ffffffff810b4f2e>] __mark_inode_dirty+0xe2/0x174
       [<ffffffff810562d2>] lock_acquire+0x84/0xa8
       [<ffffffff810b4f2e>] __mark_inode_dirty+0xe2/0x174
       [<ffffffff8125564a>] _spin_lock+0x1e/0x27
       [<ffffffff810b4f2e>] __mark_inode_dirty+0xe2/0x174
       [<ffffffff810b8c2a>] __set_page_dirty+0x118/0x124
       [<ffffffff88022238>] __journal_unfile_buffer+0x9/0x13 [jbd]
       [<ffffffff880248c5>] journal_commit_transaction+0xbd1/0xde8 [jbd]
       [<ffffffff880278f1>] kjournald+0xc6/0x1f1 [jbd]
       [<ffffffff8104b87a>] autoremove_wake_function+0x0/0x2e
       [<ffffffff810550bd>] trace_hardirqs_on+0x11e/0x149
       [<ffffffff8802782b>] kjournald+0x0/0x1f1 [jbd]
       [<ffffffff8104b760>] kthread+0x47/0x73
       [<ffffffff8125512e>] trace_hardirqs_on_thunk+0x35/0x3a
       [<ffffffff8100cdf8>] child_rip+0xa/0x12
       [<ffffffff8100c50f>] restore_args+0x0/0x30
       [<ffffffff8104b719>] kthread+0x0/0x73
       [<ffffffff8100cdee>] child_rip+0x0/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&journal->j_list_lock){--..}:
       [<ffffffff8105479f>] print_circular_bug_header+0xcc/0xd3
       [<ffffffff81055f2b>] __lock_acquire+0x9cd/0xcf0
       [<ffffffff88023b58>] journal_try_to_free_buffers+0x7e/0x131 [jbd]
       [<ffffffff810562d2>] lock_acquire+0x84/0xa8
       [<ffffffff88023b58>] journal_try_to_free_buffers+0x7e/0x131 [jbd]
       [<ffffffff8125564a>] _spin_lock+0x1e/0x27
       [<ffffffff88023b58>] journal_try_to_free_buffers+0x7e/0x131 [jbd]
       [<ffffffff810785db>] __invalidate_mapping_pages+0x81/0x103
       [<ffffffff810b559d>] drop_pagecache+0x76/0xeb
       [<ffffffff810b562c>] drop_caches_sysctl_handler+0x1a/0x2e
       [<ffffffff810d9407>] proc_sys_write+0x7c/0xa4
       [<ffffffff81099619>] vfs_write+0xc6/0x16f
       [<ffffffff81099bd6>] sys_write+0x45/0x6e
       [<ffffffff8100c05a>] tracesys+0xdc/0xe1
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by bash/3535:
 #0:  (&type->s_umount_key#17){----}, at: [<ffffffff810b5561>] drop_pagecache+0x3a/0xeb
 #1:  (inode_lock){--..}, at: [<ffffffff810b5574>] drop_pagecache+0x4d/0xeb

stack backtrace:

Call Trace:
 [<ffffffff81054369>] print_circular_bug_tail+0x69/0x72
 [<ffffffff8105479f>] print_circular_bug_header+0xcc/0xd3
 [<ffffffff81055f2b>] __lock_acquire+0x9cd/0xcf0
 [<ffffffff88023b58>] :jbd:journal_try_to_free_buffers+0x7e/0x131
 [<ffffffff810562d2>] lock_acquire+0x84/0xa8
 [<ffffffff88023b58>] :jbd:journal_try_to_free_buffers+0x7e/0x131
 [<ffffffff8125564a>] _spin_lock+0x1e/0x27
 [<ffffffff88023b58>] :jbd:journal_try_to_free_buffers+0x7e/0x131
 [<ffffffff810785db>] __invalidate_mapping_pages+0x81/0x103
 [<ffffffff810b559d>] drop_pagecache+0x76/0xeb
 [<ffffffff810b562c>] drop_caches_sysctl_handler+0x1a/0x2e
 [<ffffffff810d9407>] proc_sys_write+0x7c/0xa4
 [<ffffffff81099619>] vfs_write+0xc6/0x16f
 [<ffffffff81099bd6>] sys_write+0x45/0x6e
 [<ffffffff8100c05a>] tracesys+0xdc/0xe1



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-10-23 21:53 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-23 13:55 mm: soft lockup in 2.6.23-6636. caused by drop_caches ? richard kennedy
2007-10-23 14:03 ` Peter Zijlstra
2007-10-23 21:53   ` richard kennedy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).