LKML Archive on lore.kernel.org
* Re: deadlock on 2.6.24.3-rt3
       [not found] ` <47DF097B.4090200@ct.jp.nec.com>
@ 2008-03-18  1:53   ` Steven Rostedt
  2008-03-18  9:40     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2008-03-18  1:53 UTC (permalink / raw)
  To: Hiroshi Shimamoto
  Cc: linux-rt-users, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, LKML

[Added Peter Zijlstra and LKML]

Hiroshi, thanks for looking into this!

Peter, this looks like a legit fix; could you Ack it?


On Mon, 17 Mar 2008, Hiroshi Shimamoto wrote:

> Hiroshi Shimamoto wrote:
> > Hi,
> >
> > I got a soft lockup message on 2.6.24.3-rt3.
> > I attached the .config.
> >
> > I think there is a deadlock scenario, which I explain below.
> >
> > Here is the console log;
> > BUG: soft lockup - CPU#2 stuck for 11s! [bash:2175]
> > CPU 2:
> > Modules linked in:
> > Pid: 2175, comm: bash Not tainted 2.6.24.3-rt3 #1
> > RIP: 0010:[<ffffffff8063f052>]  [<ffffffff8063f052>] __spin_lock+0x57/0x67
> > RSP: 0000:ffff8100c52a1d48  EFLAGS: 00000202
> > RAX: 0000000000000000 RBX: 0000000000004bc5 RCX: 0000000000004bc5
> > RDX: 0000000000000002 RSI: 00000000006c3208 RDI: 0000000000000001
> > RBP: 000000000000000d R08: ffff8100cbc28018 R09: ffff810007c95458
> > R10: 00000000006c3208 R11: 0000000000000246 R12: ffffffff808246e8
> > R13: 000284d000000002 R14: ffffffff80387277 R15: 00000000ffffffff
> > FS:  00002b28926a2ef0(0000) GS:ffff8100cf8a3940(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00000000006c3208 CR3: 00000000c3cac000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Call Trace:
> >  [<ffffffff8063f024>] __spin_lock+0x29/0x67
> >  [<ffffffff80296597>] swap_info_get+0x65/0xdd
> >  [<ffffffff80296c0c>] can_share_swap_page+0x39/0x84
> >  [<ffffffff8028ae6e>] do_wp_page+0x2f9/0x519
> >  [<ffffffff8028c8d2>] handle_mm_fault+0x615/0x7cf
> >  [<ffffffff802db823>] proc_flush_task+0x171/0x29c
> >  [<ffffffff8024a64d>] recalc_sigpending+0xe/0x3c
> >  [<ffffffff8022645e>] do_page_fault+0x162/0x754
> >  [<ffffffff8026fe81>] audit_syscall_exit+0x31c/0x37a
> >  [<ffffffff8063f449>] error_exit+0x0/0x51
> > ---------------------------
> > | preempt count: 00010002 ]
> > | 2-level deep critical section nesting:
> > ----------------------------------------
> > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > .....[<00000000>] ..   ( <= 0x0)
> > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > .....[<00000000>] ..   ( <= 0x0)
> > BUG: soft lockup - CPU#3 stuck for 11s! [stress:9460]
> > CPU 3:
> > Modules linked in:
> > Pid: 9460, comm: stress Not tainted 2.6.24.3-rt3 #1
> > RIP: 0010:[<ffffffff8027c974>]  [<ffffffff8027c974>] find_get_page+0xad/0xbe
> > RSP: 0018:ffff8100cbf25b88  EFLAGS: 00000202
> >  0000000000002009 RBX: ffffffff80824bc8 RCX: 0000000000000002
> > RDX: 0000000000000002 RSI: ffff8100cbfcf298 RDI: ffff810005ad8910
> > RBP: ffffffff80383a57 R08: ffff810005ad8918 R09: 0000000000000003
> > R10: ffff810005ad88d8 R11: 0000000000000001 R12: ffffffff80822880
> > R13: ffff81000799ce48 R14: ffffffff8028921c R15: ffffffff80822880
> > FS:  00002acaa373bb00(0000) GS:ffff8100cf8a32c0(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00002b9b2827c530 CR3: 000000006cc90000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Call Trace:
> >  [<ffffffff8027c8ea>] find_get_page+0x23/0xbe
> >  [<ffffffff80296f83>] free_swap_and_cache+0x46/0xdd
> >  [<ffffffff8028b9b7>] unmap_vmas+0x626/0x8ce
> >  [<ffffffff8028fa4c>] exit_mmap+0xac/0x147
> >  [<ffffffff8023ced7>] mmput+0x32/0xae
> >  [<ffffffff80242f00>] do_exit+0x199/0x914
> >  [<ffffffff8024ab3b>] __dequeue_signal+0x19/0x1b7
> >  [<ffffffff802436a7>] do_group_exit+0x2c/0x7e
> >  [<ffffffff8024c47b>] get_signal_to_deliver+0x2ef/0x4aa
> >  [<ffffffff8020b5dc>] do_notify_resume+0xa8/0x7cd
> >  [<ffffffff80239320>] add_preempt_count+0x14/0x111
> >  [<ffffffff8038b292>] __up_read+0x13/0x8d
> >  [<ffffffff80226483>] do_page_fault+0x187/0x754
> >  [<ffffffff802335ae>] __dequeue_entity+0x2d/0x34
> >  [<ffffffff8020a6d5>] __switch_to+0x27/0x2c9
> >  [<ffffffff802264f0>] do_page_fault+0x1f4/0x754
> >  [<ffffffff8020c7be>] retint_signal+0x3d/0x7f
> > ---------------------------
> > | preempt count: 00010005 ]
> > | 5-level deep critical section nesting:
> > ----------------------------------------
> > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > .....[<00000000>] ..   ( <= 0x0)
> > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > .....[<00000000>] ..   ( <= 0x0)
> > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > .....[<00000000>] ..   ( <= 0x0)
> > .. [<ffffffff8027c8db>] .... find_get_page+0x14/0xbe
> > .....[<00000000>] ..   ( <= 0x0)
> > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > .....[<00000000>] ..   ( <= 0x0)
> >
> >
> > I also got a kernel core.
> > (gdb) info thr
> >   4 process 9460  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
> >     offset=18446744071570598016) at include/asm/processor_64.h:385
> >   3 process 2175  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> >   2 process 9132  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
> > * 1 process 9478  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> >
> > CPU 3 (thread 4) is in find_get_page(), and the others are in __spin_lock().
> > Thread 4 is waiting in wait_on_page_ref(), called from
> > page_cache_get_speculative(), for the PG_nonewrefs bit to be turned off,
> > and thread 4 holds the swap_lock.
> > The other threads are waiting for the swap_lock.
> > On the other hand, thread 1 turned the PG_nonewrefs bit on by calling
> > lock_page_ref_irq() in remove_mapping(), and is now waiting for the swap_lock.
> > So if the target page of remove_mapping() is in the exiting process's memory,
> > the kernel deadlocks.
> >
> > (gdb) bt
> > #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> > #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
> >     at mm/swapfile.c:253
> > #2  0xffffffff80296618 in swap_free (entry={val = 1}) at mm/swapfile.c:300
> > #3  0xffffffff80286acd in remove_mapping (mapping=<value optimized out>,
> >     page=0xffff810005ad8910) at mm/vmscan.c:423
> > ...
> >
> > (gdb) thr 2
> > (gdb) bt
> > #0  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
> > #1  0xffffffff80296374 in valid_swaphandles (entry=<value optimized out>,
> >     offset=0xffff81001e22bc78) at mm/swapfile.c:1783
> > #2  0xffffffff8028b0af in swapin_readahead (entry={val = 1}, addr=0, vma=0x1)
> >     at mm/memory.c:2054
> > #3  0xffffffff8029a6af in shmem_getpage (inode=0xffff8100cdf4fd48, idx=0,
> >     pagep=0xffff81001e22bd80, sgp=SGP_FAULT, type=0xffff81001e22bd34) at mm/shmem.c:1089
> > ...
> >
> > (gdb) thr 3
> > (gdb) bt
> > #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> > #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
> >     at mm/swapfile.c:253
> > #2  0xffffffff80296c0c in can_share_swap_page (page=<value optimized out>)
> >     at mm/swapfile.c:317
> > #3  0xffffffff8028ae6e in do_wp_page (mm=0xffff8100ce772f40, vma=0xffff8100cd212f00,
> >     address=7090696, page_table=0xffff8100cbcef618, pmd=0xffff8100cbc28018,
> >     ptl=0xffff810007c95458, orig_pte=<value optimized out>) at mm/memory.c:1606
> > ...
> >
> > (gdb) thr 4
> > (gdb) bt
> > #0  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
> >     offset=18446744071570598016) at include/asm/processor_64.h:385
> > #1  0xffffffff80296f83 in free_swap_and_cache (entry={val = 4032}) at mm/swapfile.c:403
> > #2  0xffffffff8028b9b7 in unmap_vmas (tlbp=0xffff8100cbf25cd8, vma=0xffff8100cde5c678,
> >     start_addr=0, end_addr=18446744073709551615, nr_accounted=0xffff8100cbf25cd0,
> >     details=0x0) at mm/memory.c:728
> > #3  0xffffffff8028fa4c in exit_mmap (mm=0xffff8100cd093600) at mm/mmap.c:2048
> > #4  0xffffffff8023ced7 in mmput (mm=0xffff8100cd093600) at kernel/fork.c:443
> > #5  0xffffffff80242f00 in do_exit (code=14) at kernel/exit.c:997
> > ...
> >
> >
> > I think it comes from the lockless speculative get-page patch.
> > I found a newer version of this patch on linux-mm:
> > http://marc.info/?l=linux-mm&m=119477111927364&w=2
> >
> > I haven't tested it because it looks like a big change and is hard to apply,
> > but it seems to fix this deadlock issue.
> > Any other patch to fix this issue is welcome.
> >
>
> Is this patch good?
>
> ---
> From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> Subject: [PATCH] avoid deadlock related with PG_nonewrefs and swap_lock
>
> There is a deadlock scenario: remove_mapping() vs free_swap_and_cache().
> remove_mapping() turns the PG_nonewrefs bit on, then takes swap_lock.
> free_swap_and_cache() takes swap_lock, then waits in find_get_page() for
> the PG_nonewrefs bit to be turned off.
>
> swap_lock can be unlocked before calling find_get_page().
>
> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> ---
>  mm/swapfile.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 5036b70..581afee 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -400,13 +400,14 @@ void free_swap_and_cache(swp_entry_t entry)
>  	p = swap_info_get(entry);
>  	if (p) {
>  		if (swap_entry_free(p, swp_offset(entry)) == 1) {
> +			spin_unlock(&swap_lock);
>  			page = find_get_page(&swapper_space, entry.val);
>  			if (page && unlikely(TestSetPageLocked(page))) {
>  				page_cache_release(page);
>  				page = NULL;
>  			}
> -		}
> -		spin_unlock(&swap_lock);
> +		} else
> +			spin_unlock(&swap_lock);
>  	}
>  	if (page) {
>  		int one_user;
> --
> 1.5.4.1
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: deadlock on 2.6.24.3-rt3
  2008-03-18  1:53   ` deadlock on 2.6.24.3-rt3 Steven Rostedt
@ 2008-03-18  9:40     ` Peter Zijlstra
  2008-03-18 17:15       ` Hiroshi Shimamoto
  2008-03-24 18:24       ` [PATCH -rt] avoid deadlock related with PG_nonewrefs and swap_lock Hiroshi Shimamoto
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2008-03-18  9:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Hiroshi Shimamoto, linux-rt-users, Ingo Molnar, Thomas Gleixner, LKML

On Mon, 2008-03-17 at 21:53 -0400, Steven Rostedt wrote:
> [Added Peter Zijlstra and LKML]

Yeah, patches and such should really go to LKML as LKML is the -rt
development list.

> Hiroshi, thanks for looking into this!
> 
> Peter, this looks like a legit fix; could you Ack it?
> 
> 
> On Mon, 17 Mar 2008, Hiroshi Shimamoto wrote:
> 
> > Hiroshi Shimamoto wrote:
> > > Hi,
> > >
> > > I got a soft lockup message on 2.6.24.3-rt3.
> > > I attached the .config.
> > >
> > > I think there is a deadlock scenario, which I explain below.
> > >
> > > Here is the console log;
> > > BUG: soft lockup - CPU#2 stuck for 11s! [bash:2175]
> > > CPU 2:
> > > Modules linked in:
> > > Pid: 2175, comm: bash Not tainted 2.6.24.3-rt3 #1
> > > RIP: 0010:[<ffffffff8063f052>]  [<ffffffff8063f052>] __spin_lock+0x57/0x67
> > > RSP: 0000:ffff8100c52a1d48  EFLAGS: 00000202
> > > RAX: 0000000000000000 RBX: 0000000000004bc5 RCX: 0000000000004bc5
> > > RDX: 0000000000000002 RSI: 00000000006c3208 RDI: 0000000000000001
> > > RBP: 000000000000000d R08: ffff8100cbc28018 R09: ffff810007c95458
> > > R10: 00000000006c3208 R11: 0000000000000246 R12: ffffffff808246e8
> > > R13: 000284d000000002 R14: ffffffff80387277 R15: 00000000ffffffff
> > > FS:  00002b28926a2ef0(0000) GS:ffff8100cf8a3940(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 00000000006c3208 CR3: 00000000c3cac000 CR4: 00000000000006e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Call Trace:
> > >  [<ffffffff8063f024>] __spin_lock+0x29/0x67
> > >  [<ffffffff80296597>] swap_info_get+0x65/0xdd
> > >  [<ffffffff80296c0c>] can_share_swap_page+0x39/0x84
> > >  [<ffffffff8028ae6e>] do_wp_page+0x2f9/0x519
> > >  [<ffffffff8028c8d2>] handle_mm_fault+0x615/0x7cf
> > >  [<ffffffff802db823>] proc_flush_task+0x171/0x29c
> > >  [<ffffffff8024a64d>] recalc_sigpending+0xe/0x3c
> > >  [<ffffffff8022645e>] do_page_fault+0x162/0x754
> > >  [<ffffffff8026fe81>] audit_syscall_exit+0x31c/0x37a
> > >  [<ffffffff8063f449>] error_exit+0x0/0x51
> > > ---------------------------
> > > | preempt count: 00010002 ]
> > > | 2-level deep critical section nesting:
> > > ----------------------------------------
> > > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > > .....[<00000000>] ..   ( <= 0x0)
> > > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > > .....[<00000000>] ..   ( <= 0x0)
> > > BUG: soft lockup - CPU#3 stuck for 11s! [stress:9460]
> > > CPU 3:
> > > Modules linked in:
> > > Pid: 9460, comm: stress Not tainted 2.6.24.3-rt3 #1
> > > RIP: 0010:[<ffffffff8027c974>]  [<ffffffff8027c974>] find_get_page+0xad/0xbe
> > > RSP: 0018:ffff8100cbf25b88  EFLAGS: 00000202
> > >  0000000000002009 RBX: ffffffff80824bc8 RCX: 0000000000000002
> > > RDX: 0000000000000002 RSI: ffff8100cbfcf298 RDI: ffff810005ad8910
> > > RBP: ffffffff80383a57 R08: ffff810005ad8918 R09: 0000000000000003
> > > R10: ffff810005ad88d8 R11: 0000000000000001 R12: ffffffff80822880
> > > R13: ffff81000799ce48 R14: ffffffff8028921c R15: ffffffff80822880
> > > FS:  00002acaa373bb00(0000) GS:ffff8100cf8a32c0(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 00002b9b2827c530 CR3: 000000006cc90000 CR4: 00000000000006e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Call Trace:
> > >  [<ffffffff8027c8ea>] find_get_page+0x23/0xbe
> > >  [<ffffffff80296f83>] free_swap_and_cache+0x46/0xdd
> > >  [<ffffffff8028b9b7>] unmap_vmas+0x626/0x8ce
> > >  [<ffffffff8028fa4c>] exit_mmap+0xac/0x147
> > >  [<ffffffff8023ced7>] mmput+0x32/0xae
> > >  [<ffffffff80242f00>] do_exit+0x199/0x914
> > >  [<ffffffff8024ab3b>] __dequeue_signal+0x19/0x1b7
> > >  [<ffffffff802436a7>] do_group_exit+0x2c/0x7e
> > >  [<ffffffff8024c47b>] get_signal_to_deliver+0x2ef/0x4aa
> > >  [<ffffffff8020b5dc>] do_notify_resume+0xa8/0x7cd
> > >  [<ffffffff80239320>] add_preempt_count+0x14/0x111
> > >  [<ffffffff8038b292>] __up_read+0x13/0x8d
> > >  [<ffffffff80226483>] do_page_fault+0x187/0x754
> > >  [<ffffffff802335ae>] __dequeue_entity+0x2d/0x34
> > >  [<ffffffff8020a6d5>] __switch_to+0x27/0x2c9
> > >  [<ffffffff802264f0>] do_page_fault+0x1f4/0x754
> > >  [<ffffffff8020c7be>] retint_signal+0x3d/0x7f
> > > ---------------------------
> > > | preempt count: 00010005 ]
> > > | 5-level deep critical section nesting:
> > > ----------------------------------------
> > > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > > .....[<00000000>] ..   ( <= 0x0)
> > > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > > .....[<00000000>] ..   ( <= 0x0)
> > > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > > .....[<00000000>] ..   ( <= 0x0)
> > > .. [<ffffffff8027c8db>] .... find_get_page+0x14/0xbe
> > > .....[<00000000>] ..   ( <= 0x0)
> > > .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
> > > .....[<00000000>] ..   ( <= 0x0)
> > >
> > >
> > > I also got a kernel core.
> > > (gdb) info thr
> > >   4 process 9460  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
> > >     offset=18446744071570598016) at include/asm/processor_64.h:385
> > >   3 process 2175  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> > >   2 process 9132  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
> > > * 1 process 9478  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> > >
> > > CPU 3 (thread 4) is in find_get_page(), and the others are in __spin_lock().
> > > Thread 4 is waiting in wait_on_page_ref(), called from
> > > page_cache_get_speculative(), for the PG_nonewrefs bit to be turned off,
> > > and thread 4 holds the swap_lock.
> > > The other threads are waiting for the swap_lock.
> > > On the other hand, thread 1 turned the PG_nonewrefs bit on by calling
> > > lock_page_ref_irq() in remove_mapping(), and is now waiting for the swap_lock.
> > > So if the target page of remove_mapping() is in the exiting process's memory,
> > > the kernel deadlocks.
> > >
> > > (gdb) bt
> > > #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> > > #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
> > >     at mm/swapfile.c:253
> > > #2  0xffffffff80296618 in swap_free (entry={val = 1}) at mm/swapfile.c:300
> > > #3  0xffffffff80286acd in remove_mapping (mapping=<value optimized out>,
> > >     page=0xffff810005ad8910) at mm/vmscan.c:423
> > > ...
> > >
> > > (gdb) thr 2
> > > (gdb) bt
> > > #0  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
> > > #1  0xffffffff80296374 in valid_swaphandles (entry=<value optimized out>,
> > >     offset=0xffff81001e22bc78) at mm/swapfile.c:1783
> > > #2  0xffffffff8028b0af in swapin_readahead (entry={val = 1}, addr=0, vma=0x1)
> > >     at mm/memory.c:2054
> > > #3  0xffffffff8029a6af in shmem_getpage (inode=0xffff8100cdf4fd48, idx=0,
> > >     pagep=0xffff81001e22bd80, sgp=SGP_FAULT, type=0xffff81001e22bd34) at mm/shmem.c:1089
> > > ...
> > >
> > > (gdb) thr 3
> > > (gdb) bt
> > > #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
> > > #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
> > >     at mm/swapfile.c:253
> > > #2  0xffffffff80296c0c in can_share_swap_page (page=<value optimized out>)
> > >     at mm/swapfile.c:317
> > > #3  0xffffffff8028ae6e in do_wp_page (mm=0xffff8100ce772f40, vma=0xffff8100cd212f00,
> > >     address=7090696, page_table=0xffff8100cbcef618, pmd=0xffff8100cbc28018,
> > >     ptl=0xffff810007c95458, orig_pte=<value optimized out>) at mm/memory.c:1606
> > > ...
> > >
> > > (gdb) thr 4
> > > (gdb) bt
> > > #0  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
> > >     offset=18446744071570598016) at include/asm/processor_64.h:385
> > > #1  0xffffffff80296f83 in free_swap_and_cache (entry={val = 4032}) at mm/swapfile.c:403
> > > #2  0xffffffff8028b9b7 in unmap_vmas (tlbp=0xffff8100cbf25cd8, vma=0xffff8100cde5c678,
> > >     start_addr=0, end_addr=18446744073709551615, nr_accounted=0xffff8100cbf25cd0,
> > >     details=0x0) at mm/memory.c:728
> > > #3  0xffffffff8028fa4c in exit_mmap (mm=0xffff8100cd093600) at mm/mmap.c:2048
> > > #4  0xffffffff8023ced7 in mmput (mm=0xffff8100cd093600) at kernel/fork.c:443
> > > #5  0xffffffff80242f00 in do_exit (code=14) at kernel/exit.c:997
> > > ...
> > >
> > >
> > > I think it comes from the lockless speculative get-page patch.
> > > I found a newer version of this patch on linux-mm:
> > > http://marc.info/?l=linux-mm&m=119477111927364&w=2
> > >
> > > I haven't tested it because it looks like a big change and is hard to apply,
> > > but it seems to fix this deadlock issue.
> > > Any other patch to fix this issue is welcome.

Yeah, in the latest lockless pagecache patches Nick got rid of
PG_nonewrefs, as suggested by Hugh; however, -rt also has my concurrent
pagecache patches, and those need PG_nonewrefs on their own, so I hadn't
bothered to update to Nick's latest.

Perhaps I ought to; as you point out, page_cache_get_speculative() is
cleaner in his latest.. /me puts it on his overlong TODO list.

> > Is this patch good?

Yes, it does look good. Doesn't remove_exclusive_swap_page() also nest
PG_nonewrefs inside of swap_lock?

Looks to me like that also needs fixing..
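
For anyone less used to spotting this kind of inversion, here is the same
ABBA pattern in miniature as a stand-alone userspace program. It is purely
an illustration of the ordering in the traces above: the two pthread
mutexes stand in for swap_lock and the PG_nonewrefs set/wait, and none of
the names below are real kernel interfaces.

	/* Build with: gcc -pthread abba.c -o abba
	 * Illustrative only: models the lock-order inversion, not kernel code. */
	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_mutex_t swap_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t page_ref  = PTHREAD_MUTEX_INITIALIZER; /* ~ PG_nonewrefs */

	/* ~ remove_mapping(): sets PG_nonewrefs, then needs swap_lock via swap_free() */
	static void *reclaim_path(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&page_ref);		/* lock_page_ref_irq() */
		sleep(1);				/* widen the race window */
		pthread_mutex_lock(&swap_lock);		/* swap_info_get(): never acquired */
		pthread_mutex_unlock(&swap_lock);
		pthread_mutex_unlock(&page_ref);
		return NULL;
	}

	/* ~ free_swap_and_cache(): takes swap_lock, then waits on PG_nonewrefs
	 * inside find_get_page()/page_cache_get_speculative() */
	static void *exit_path(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&swap_lock);		/* swap_info_get() */
		sleep(1);
		pthread_mutex_lock(&page_ref);		/* wait_on_page_ref(): never satisfied */
		pthread_mutex_unlock(&page_ref);
		pthread_mutex_unlock(&swap_lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;
		pthread_create(&a, NULL, reclaim_path, NULL);
		pthread_create(&b, NULL, exit_path, NULL);
		pthread_join(a, NULL);			/* never returns: ABBA deadlock */
		pthread_join(b, NULL);
		puts("not reached");
		return 0;
	}

Hiroshi's patch breaks the cycle from the free_swap_and_cache() side by
dropping swap_lock before the code that may wait on PG_nonewrefs, which is
the usual way out when the lock order itself cannot easily be flipped.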

> > ---
> > From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> > Subject: [PATCH] avoid deadlock related with PG_nonewrefs and swap_lock
> >
> > There is a deadlock scenario: remove_mapping() vs free_swap_and_cache().
> > remove_mapping() turns the PG_nonewrefs bit on, then takes swap_lock.
> > free_swap_and_cache() takes swap_lock, then waits in find_get_page() for
> > the PG_nonewrefs bit to be turned off.
> >
> > swap_lock can be unlocked before calling find_get_page().
> >
> > Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> > ---
> >  mm/swapfile.c |    5 +++--
> >  1 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 5036b70..581afee 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -400,13 +400,14 @@ void free_swap_and_cache(swp_entry_t entry)
> >  	p = swap_info_get(entry);
> >  	if (p) {
> >  		if (swap_entry_free(p, swp_offset(entry)) == 1) {
> > +			spin_unlock(&swap_lock);
> >  			page = find_get_page(&swapper_space, entry.val);
> >  			if (page && unlikely(TestSetPageLocked(page))) {
> >  				page_cache_release(page);
> >  				page = NULL;
> >  			}
> > -		}
> > -		spin_unlock(&swap_lock);
> > +		} else
> > +			spin_unlock(&swap_lock);
> >  	}
> >  	if (page) {
> >  		int one_user;
> > --
> > 1.5.4.1
> >


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: deadlock on 2.6.24.3-rt3
  2008-03-18  9:40     ` Peter Zijlstra
@ 2008-03-18 17:15       ` Hiroshi Shimamoto
  2008-03-20 20:31         ` Hiroshi Shimamoto
  2008-03-24 18:24       ` [PATCH -rt] avoid deadlock related with PG_nonewrefs and swap_lock Hiroshi Shimamoto
  1 sibling, 1 reply; 7+ messages in thread
From: Hiroshi Shimamoto @ 2008-03-18 17:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-rt-users, Ingo Molnar, Thomas Gleixner, LKML

Peter Zijlstra wrote:
> On Mon, 2008-03-17 at 21:53 -0400, Steven Rostedt wrote:
>> [Added Peter Zijlstra and LKML]
> 
> Yeah, patches and such should really go to LKML as LKML is the -rt
> development list.

Sorry, I missed adding LKML; I followed what is written in the RT wiki.
I'm also not sure about BUG reports. Should I send those to LKML as well?

> 
>> Hiroshi, thanks for looking into this!
>>
>> Peter, this looks like a legit fix; could you Ack it?
>>
>>
>> On Mon, 17 Mar 2008, Hiroshi Shimamoto wrote:
>>
>>> Hiroshi Shimamoto wrote:
>>>> Hi,
>>>>
>>>> I got a soft lockup message on 2.6.24.3-rt3.
>>>> I attached the .config.
>>>>
>>>> I think there is a deadlock scenario, which I explain below.
>>>>
>>>> Here is the console log;
>>>> BUG: soft lockup - CPU#2 stuck for 11s! [bash:2175]
>>>> CPU 2:
>>>> Modules linked in:
>>>> Pid: 2175, comm: bash Not tainted 2.6.24.3-rt3 #1
>>>> RIP: 0010:[<ffffffff8063f052>]  [<ffffffff8063f052>] __spin_lock+0x57/0x67
>>>> RSP: 0000:ffff8100c52a1d48  EFLAGS: 00000202
>>>> RAX: 0000000000000000 RBX: 0000000000004bc5 RCX: 0000000000004bc5
>>>> RDX: 0000000000000002 RSI: 00000000006c3208 RDI: 0000000000000001
>>>> RBP: 000000000000000d R08: ffff8100cbc28018 R09: ffff810007c95458
>>>> R10: 00000000006c3208 R11: 0000000000000246 R12: ffffffff808246e8
>>>> R13: 000284d000000002 R14: ffffffff80387277 R15: 00000000ffffffff
>>>> FS:  00002b28926a2ef0(0000) GS:ffff8100cf8a3940(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> CR2: 00000000006c3208 CR3: 00000000c3cac000 CR4: 00000000000006e0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> Call Trace:
>>>>  [<ffffffff8063f024>] __spin_lock+0x29/0x67
>>>>  [<ffffffff80296597>] swap_info_get+0x65/0xdd
>>>>  [<ffffffff80296c0c>] can_share_swap_page+0x39/0x84
>>>>  [<ffffffff8028ae6e>] do_wp_page+0x2f9/0x519
>>>>  [<ffffffff8028c8d2>] handle_mm_fault+0x615/0x7cf
>>>>  [<ffffffff802db823>] proc_flush_task+0x171/0x29c
>>>>  [<ffffffff8024a64d>] recalc_sigpending+0xe/0x3c
>>>>  [<ffffffff8022645e>] do_page_fault+0x162/0x754
>>>>  [<ffffffff8026fe81>] audit_syscall_exit+0x31c/0x37a
>>>>  [<ffffffff8063f449>] error_exit+0x0/0x51
>>>> ---------------------------
>>>> | preempt count: 00010002 ]
>>>> | 2-level deep critical section nesting:
>>>> ----------------------------------------
>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>> .....[<00000000>] ..   ( <= 0x0)
>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>> .....[<00000000>] ..   ( <= 0x0)
>>>> BUG: soft lockup - CPU#3 stuck for 11s! [stress:9460]
>>>> CPU 3:
>>>> Modules linked in:
>>>> Pid: 9460, comm: stress Not tainted 2.6.24.3-rt3 #1
>>>> RIP: 0010:[<ffffffff8027c974>]  [<ffffffff8027c974>] find_get_page+0xad/0xbe
>>>> RSP: 0018:ffff8100cbf25b88  EFLAGS: 00000202
>>>>  0000000000002009 RBX: ffffffff80824bc8 RCX: 0000000000000002
>>>> RDX: 0000000000000002 RSI: ffff8100cbfcf298 RDI: ffff810005ad8910
>>>> RBP: ffffffff80383a57 R08: ffff810005ad8918 R09: 0000000000000003
>>>> R10: ffff810005ad88d8 R11: 0000000000000001 R12: ffffffff80822880
>>>> R13: ffff81000799ce48 R14: ffffffff8028921c R15: ffffffff80822880
>>>> FS:  00002acaa373bb00(0000) GS:ffff8100cf8a32c0(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> CR2: 00002b9b2827c530 CR3: 000000006cc90000 CR4: 00000000000006e0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> Call Trace:
>>>>  [<ffffffff8027c8ea>] find_get_page+0x23/0xbe
>>>>  [<ffffffff80296f83>] free_swap_and_cache+0x46/0xdd
>>>>  [<ffffffff8028b9b7>] unmap_vmas+0x626/0x8ce
>>>>  [<ffffffff8028fa4c>] exit_mmap+0xac/0x147
>>>>  [<ffffffff8023ced7>] mmput+0x32/0xae
>>>>  [<ffffffff80242f00>] do_exit+0x199/0x914
>>>>  [<ffffffff8024ab3b>] __dequeue_signal+0x19/0x1b7
>>>>  [<ffffffff802436a7>] do_group_exit+0x2c/0x7e
>>>>  [<ffffffff8024c47b>] get_signal_to_deliver+0x2ef/0x4aa
>>>>  [<ffffffff8020b5dc>] do_notify_resume+0xa8/0x7cd
>>>>  [<ffffffff80239320>] add_preempt_count+0x14/0x111
>>>>  [<ffffffff8038b292>] __up_read+0x13/0x8d
>>>>  [<ffffffff80226483>] do_page_fault+0x187/0x754
>>>>  [<ffffffff802335ae>] __dequeue_entity+0x2d/0x34
>>>>  [<ffffffff8020a6d5>] __switch_to+0x27/0x2c9
>>>>  [<ffffffff802264f0>] do_page_fault+0x1f4/0x754
>>>>  [<ffffffff8020c7be>] retint_signal+0x3d/0x7f
>>>> ---------------------------
>>>> | preempt count: 00010005 ]
>>>> | 5-level deep critical section nesting:
>>>> ----------------------------------------
>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>> .....[<00000000>] ..   ( <= 0x0)
>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>> .....[<00000000>] ..   ( <= 0x0)
>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>> .....[<00000000>] ..   ( <= 0x0)
>>>> .. [<ffffffff8027c8db>] .... find_get_page+0x14/0xbe
>>>> .....[<00000000>] ..   ( <= 0x0)
>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>
>>>>
>>>> I also got a kernel core.
>>>> (gdb) info thr
>>>>   4 process 9460  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
>>>>     offset=18446744071570598016) at include/asm/processor_64.h:385
>>>>   3 process 2175  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>>   2 process 9132  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
>>>> * 1 process 9478  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>>
>>>> CPU 3 (thread 4) is in find_get_page(), and the others are in __spin_lock().
>>>> Thread 4 is waiting in wait_on_page_ref(), called from
>>>> page_cache_get_speculative(), for the PG_nonewrefs bit to be turned off,
>>>> and thread 4 holds the swap_lock.
>>>> The other threads are waiting for the swap_lock.
>>>> On the other hand, thread 1 turned the PG_nonewrefs bit on by calling
>>>> lock_page_ref_irq() in remove_mapping(), and is now waiting for the swap_lock.
>>>> So if the target page of remove_mapping() is in the exiting process's memory,
>>>> the kernel deadlocks.
>>>>
>>>> (gdb) bt
>>>> #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>> #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
>>>>     at mm/swapfile.c:253
>>>> #2  0xffffffff80296618 in swap_free (entry={val = 1}) at mm/swapfile.c:300
>>>> #3  0xffffffff80286acd in remove_mapping (mapping=<value optimized out>,
>>>>     page=0xffff810005ad8910) at mm/vmscan.c:423
>>>> ...
>>>>
>>>> (gdb) thr 2
>>>> (gdb) bt
>>>> #0  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
>>>> #1  0xffffffff80296374 in valid_swaphandles (entry=<value optimized out>,
>>>>     offset=0xffff81001e22bc78) at mm/swapfile.c:1783
>>>> #2  0xffffffff8028b0af in swapin_readahead (entry={val = 1}, addr=0, vma=0x1)
>>>>     at mm/memory.c:2054
>>>> #3  0xffffffff8029a6af in shmem_getpage (inode=0xffff8100cdf4fd48, idx=0,
>>>>     pagep=0xffff81001e22bd80, sgp=SGP_FAULT, type=0xffff81001e22bd34) at mm/shmem.c:1089
>>>> ...
>>>>
>>>> (gdb) thr 3
>>>> (gdb) bt
>>>> #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>> #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
>>>>     at mm/swapfile.c:253
>>>> #2  0xffffffff80296c0c in can_share_swap_page (page=<value optimized out>)
>>>>     at mm/swapfile.c:317
>>>> #3  0xffffffff8028ae6e in do_wp_page (mm=0xffff8100ce772f40, vma=0xffff8100cd212f00,
>>>>     address=7090696, page_table=0xffff8100cbcef618, pmd=0xffff8100cbc28018,
>>>>     ptl=0xffff810007c95458, orig_pte=<value optimized out>) at mm/memory.c:1606
>>>> ...
>>>>
>>>> (gdb) thr 4
>>>> (gdb) bt
>>>> #0  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
>>>>     offset=18446744071570598016) at include/asm/processor_64.h:385
>>>> #1  0xffffffff80296f83 in free_swap_and_cache (entry={val = 4032}) at mm/swapfile.c:403
>>>> #2  0xffffffff8028b9b7 in unmap_vmas (tlbp=0xffff8100cbf25cd8, vma=0xffff8100cde5c678,
>>>>     start_addr=0, end_addr=18446744073709551615, nr_accounted=0xffff8100cbf25cd0,
>>>>     details=0x0) at mm/memory.c:728
>>>> #3  0xffffffff8028fa4c in exit_mmap (mm=0xffff8100cd093600) at mm/mmap.c:2048
>>>> #4  0xffffffff8023ced7 in mmput (mm=0xffff8100cd093600) at kernel/fork.c:443
>>>> #5  0xffffffff80242f00 in do_exit (code=14) at kernel/exit.c:997
>>>> ...
>>>>
>>>>
>>>> I think it comes from the lockless speculative get-page patch.
>>>> I found a newer version of this patch on linux-mm:
>>>> http://marc.info/?l=linux-mm&m=119477111927364&w=2
>>>>
>>>> I haven't tested it because it looks like a big change and is hard to apply,
>>>> but it seems to fix this deadlock issue.
>>>> Any other patch to fix this issue is welcome.
> 
> Yeah, in the latest lockless pagecache patches Nick got rid of
> PG_nonewrefs, as suggested by Hugh; however, -rt also has my concurrent
> pagecache patches, and those need PG_nonewrefs on their own, so I hadn't
> bothered to update to Nick's latest.
> 
> Perhaps I ought to; as you point out, page_cache_get_speculative() is
> cleaner in his latest.. /me puts it on his overlong TODO list.
> 
>>> Is this patch good?
> 
> Yes, it does look good. Doesn't remove_exclusive_swap_page() also nest
> PG_nonewrefs inside of swap_lock?

I'm not sure about it. I hadn't noticed that, and I haven't checked all the PG_nonewrefs users.
I will look into it.

By the way, unfortunately, I got another BUG message with my patch applied... I'll report it.
The new issue is related to SLAB:
kernel BUG at mm/slab.c:3125!

thanks,
Hiroshi Shimamoto

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: deadlock on 2.6.24.3-rt3
  2008-03-18 17:15       ` Hiroshi Shimamoto
@ 2008-03-20 20:31         ` Hiroshi Shimamoto
  0 siblings, 0 replies; 7+ messages in thread
From: Hiroshi Shimamoto @ 2008-03-20 20:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-rt-users, Ingo Molnar, Thomas Gleixner, LKML

Hiroshi Shimamoto wrote:
> Peter Zijlstra wrote:
>> On Mon, 2008-03-17 at 21:53 -0400, Steven Rostedt wrote:
>>> [Added Peter Zijlstra and LKML]
>> Yeah, patches and such should really go to LKML as LKML is the -rt
>> development list.
> 
> Sorry, I missed adding LKML; I followed what is written in the RT wiki.
> I'm also not sure about BUG reports. Should I send those to LKML as well?
> 
>>> Hiroshi, thanks for looking into this!
>>>
>>> Peter, this looks like a legit fix; could you Ack it?
>>>
>>>
>>> On Mon, 17 Mar 2008, Hiroshi Shimamoto wrote:
>>>
>>>> Hiroshi Shimamoto wrote:
>>>>> Hi,
>>>>>
>>>>> I got a soft lockup message on 2.6.24.3-rt3.
>>>>> I attached the .config.
>>>>>
>>>>> I think there is a deadlock scenario, which I explain below.
>>>>>
>>>>> Here is the console log;
>>>>> BUG: soft lockup - CPU#2 stuck for 11s! [bash:2175]
>>>>> CPU 2:
>>>>> Modules linked in:
>>>>> Pid: 2175, comm: bash Not tainted 2.6.24.3-rt3 #1
>>>>> RIP: 0010:[<ffffffff8063f052>]  [<ffffffff8063f052>] __spin_lock+0x57/0x67
>>>>> RSP: 0000:ffff8100c52a1d48  EFLAGS: 00000202
>>>>> RAX: 0000000000000000 RBX: 0000000000004bc5 RCX: 0000000000004bc5
>>>>> RDX: 0000000000000002 RSI: 00000000006c3208 RDI: 0000000000000001
>>>>> RBP: 000000000000000d R08: ffff8100cbc28018 R09: ffff810007c95458
>>>>> R10: 00000000006c3208 R11: 0000000000000246 R12: ffffffff808246e8
>>>>> R13: 000284d000000002 R14: ffffffff80387277 R15: 00000000ffffffff
>>>>> FS:  00002b28926a2ef0(0000) GS:ffff8100cf8a3940(0000) knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>> CR2: 00000000006c3208 CR3: 00000000c3cac000 CR4: 00000000000006e0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> Call Trace:
>>>>>  [<ffffffff8063f024>] __spin_lock+0x29/0x67
>>>>>  [<ffffffff80296597>] swap_info_get+0x65/0xdd
>>>>>  [<ffffffff80296c0c>] can_share_swap_page+0x39/0x84
>>>>>  [<ffffffff8028ae6e>] do_wp_page+0x2f9/0x519
>>>>>  [<ffffffff8028c8d2>] handle_mm_fault+0x615/0x7cf
>>>>>  [<ffffffff802db823>] proc_flush_task+0x171/0x29c
>>>>>  [<ffffffff8024a64d>] recalc_sigpending+0xe/0x3c
>>>>>  [<ffffffff8022645e>] do_page_fault+0x162/0x754
>>>>>  [<ffffffff8026fe81>] audit_syscall_exit+0x31c/0x37a
>>>>>  [<ffffffff8063f449>] error_exit+0x0/0x51
>>>>> ---------------------------
>>>>> | preempt count: 00010002 ]
>>>>> | 2-level deep critical section nesting:
>>>>> ----------------------------------------
>>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>> BUG: soft lockup - CPU#3 stuck for 11s! [stress:9460]
>>>>> CPU 3:
>>>>> Modules linked in:
>>>>> Pid: 9460, comm: stress Not tainted 2.6.24.3-rt3 #1
>>>>> RIP: 0010:[<ffffffff8027c974>]  [<ffffffff8027c974>] find_get_page+0xad/0xbe
>>>>> RSP: 0018:ffff8100cbf25b88  EFLAGS: 00000202
>>>>>  0000000000002009 RBX: ffffffff80824bc8 RCX: 0000000000000002
>>>>> RDX: 0000000000000002 RSI: ffff8100cbfcf298 RDI: ffff810005ad8910
>>>>> RBP: ffffffff80383a57 R08: ffff810005ad8918 R09: 0000000000000003
>>>>> R10: ffff810005ad88d8 R11: 0000000000000001 R12: ffffffff80822880
>>>>> R13: ffff81000799ce48 R14: ffffffff8028921c R15: ffffffff80822880
>>>>> FS:  00002acaa373bb00(0000) GS:ffff8100cf8a32c0(0000) knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>> CR2: 00002b9b2827c530 CR3: 000000006cc90000 CR4: 00000000000006e0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> Call Trace:
>>>>>  [<ffffffff8027c8ea>] find_get_page+0x23/0xbe
>>>>>  [<ffffffff80296f83>] free_swap_and_cache+0x46/0xdd
>>>>>  [<ffffffff8028b9b7>] unmap_vmas+0x626/0x8ce
>>>>>  [<ffffffff8028fa4c>] exit_mmap+0xac/0x147
>>>>>  [<ffffffff8023ced7>] mmput+0x32/0xae
>>>>>  [<ffffffff80242f00>] do_exit+0x199/0x914
>>>>>  [<ffffffff8024ab3b>] __dequeue_signal+0x19/0x1b7
>>>>>  [<ffffffff802436a7>] do_group_exit+0x2c/0x7e
>>>>>  [<ffffffff8024c47b>] get_signal_to_deliver+0x2ef/0x4aa
>>>>>  [<ffffffff8020b5dc>] do_notify_resume+0xa8/0x7cd
>>>>>  [<ffffffff80239320>] add_preempt_count+0x14/0x111
>>>>>  [<ffffffff8038b292>] __up_read+0x13/0x8d
>>>>>  [<ffffffff80226483>] do_page_fault+0x187/0x754
>>>>>  [<ffffffff802335ae>] __dequeue_entity+0x2d/0x34
>>>>>  [<ffffffff8020a6d5>] __switch_to+0x27/0x2c9
>>>>>  [<ffffffff802264f0>] do_page_fault+0x1f4/0x754
>>>>>  [<ffffffff8020c7be>] retint_signal+0x3d/0x7f
>>>>> ---------------------------
>>>>> | preempt count: 00010005 ]
>>>>> | 5-level deep critical section nesting:
>>>>> ----------------------------------------
>>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>> .. [<ffffffff8027c8db>] .... find_get_page+0x14/0xbe
>>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>> .. [<ffffffff8063f009>] .... __spin_lock+0xe/0x67
>>>>> .....[<00000000>] ..   ( <= 0x0)
>>>>>
>>>>>
>>>>> I also got a kernel core.
>>>>> (gdb) info thr
>>>>>   4 process 9460  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
>>>>>     offset=18446744071570598016) at include/asm/processor_64.h:385
>>>>>   3 process 2175  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>>>   2 process 9132  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
>>>>> * 1 process 9478  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>>>
>>>>> CPU 3 (thread 4) is in find_get_page(), and the others are in __spin_lock().
>>>>> Thread 4 is waiting in wait_on_page_ref(), called from
>>>>> page_cache_get_speculative(), for the PG_nonewrefs bit to be turned off,
>>>>> and thread 4 holds the swap_lock.
>>>>> The other threads are waiting for the swap_lock.
>>>>> On the other hand, thread 1 turned the PG_nonewrefs bit on by calling
>>>>> lock_page_ref_irq() in remove_mapping(), and is now waiting for the swap_lock.
>>>>> So if the target page of remove_mapping() is in the exiting process's memory,
>>>>> the kernel deadlocks.
>>>>>
>>>>> (gdb) bt
>>>>> #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>>> #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
>>>>>     at mm/swapfile.c:253
>>>>> #2  0xffffffff80296618 in swap_free (entry={val = 1}) at mm/swapfile.c:300
>>>>> #3  0xffffffff80286acd in remove_mapping (mapping=<value optimized out>,
>>>>>     page=0xffff810005ad8910) at mm/vmscan.c:423
>>>>> ...
>>>>>
>>>>> (gdb) thr 2
>>>>> (gdb) bt
>>>>> #0  __spin_lock (lock=0xffffffff80893f80) at include/asm/spinlock_64.h:22
>>>>> #1  0xffffffff80296374 in valid_swaphandles (entry=<value optimized out>,
>>>>>     offset=0xffff81001e22bc78) at mm/swapfile.c:1783
>>>>> #2  0xffffffff8028b0af in swapin_readahead (entry={val = 1}, addr=0, vma=0x1)
>>>>>     at mm/memory.c:2054
>>>>> #3  0xffffffff8029a6af in shmem_getpage (inode=0xffff8100cdf4fd48, idx=0,
>>>>>     pagep=0xffff81001e22bd80, sgp=SGP_FAULT, type=0xffff81001e22bd34) at mm/shmem.c:1089
>>>>> ...
>>>>>
>>>>> (gdb) thr 3
>>>>> (gdb) bt
>>>>> #0  __spin_lock (lock=0xffffffff80893f80) at kernel/spinlock.c:333
>>>>> #1  0xffffffff80296597 in swap_info_get (entry=<value optimized out>)
>>>>>     at mm/swapfile.c:253
>>>>> #2  0xffffffff80296c0c in can_share_swap_page (page=<value optimized out>)
>>>>>     at mm/swapfile.c:317
>>>>> #3  0xffffffff8028ae6e in do_wp_page (mm=0xffff8100ce772f40, vma=0xffff8100cd212f00,
>>>>>     address=7090696, page_table=0xffff8100cbcef618, pmd=0xffff8100cbc28018,
>>>>>     ptl=0xffff810007c95458, orig_pte=<value optimized out>) at mm/memory.c:1606
>>>>> ...
>>>>>
>>>>> (gdb) thr 4
>>>>> (gdb) bt
>>>>> #0  0xffffffff8027c974 in find_get_page (mapping=<value optimized out>,
>>>>>     offset=18446744071570598016) at include/asm/processor_64.h:385
>>>>> #1  0xffffffff80296f83 in free_swap_and_cache (entry={val = 4032}) at mm/swapfile.c:403
>>>>> #2  0xffffffff8028b9b7 in unmap_vmas (tlbp=0xffff8100cbf25cd8, vma=0xffff8100cde5c678,
>>>>>     start_addr=0, end_addr=18446744073709551615, nr_accounted=0xffff8100cbf25cd0,
>>>>>     details=0x0) at mm/memory.c:728
>>>>> #3  0xffffffff8028fa4c in exit_mmap (mm=0xffff8100cd093600) at mm/mmap.c:2048
>>>>> #4  0xffffffff8023ced7 in mmput (mm=0xffff8100cd093600) at kernel/fork.c:443
>>>>> #5  0xffffffff80242f00 in do_exit (code=14) at kernel/exit.c:997
>>>>> ...
>>>>>
>>>>>
>>>>> I think it comes from the lockless speculative get-page patch.
>>>>> I found a newer version of this patch on linux-mm:
>>>>> http://marc.info/?l=linux-mm&m=119477111927364&w=2
>>>>>
>>>>> I haven't tested it because it looks like a big change and is hard to apply,
>>>>> but it seems to fix this deadlock issue.
>>>>> Any other patch to fix this issue is welcome.
>> Yeah, in the latest lockless pagecache patches Nick got rid of
>> PG_nonewrefs, as suggested by Hugh; however, -rt also has my concurrent
>> pagecache patches, and those need PG_nonewrefs on their own, so I hadn't
>> bothered to update to Nick's latest.
>>
>> Perhaps I ought to; as you point out, page_cache_get_speculative() is
>> cleaner in his latest.. /me puts it on his overlong TODO list.
>>
>>>> Is this patch good?
>> Yes, it does look good. Doesn't remove_exclusive_swap_page() also nest
>> PG_nonewrefs inside of swap_lock?
> 
> I'm not sure about it. I hadn't noticed that, and I haven't checked all the PG_nonewrefs users.
> I will look into it.
> 

Is this a valid fix?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 581afee..6fbc77e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -366,6 +366,7 @@ int remove_exclusive_swap_page(struct page *page)
 	/* Is the only swap cache user the cache itself? */
 	retval = 0;
 	if (p->swap_map[swp_offset(entry)] == 1) {
+		spin_unlock(&swap_lock);
 		/* Recheck the page count with the swapcache lock held.. */
 		lock_page_ref_irq(page);
 		if ((page_count(page) == 2) && !PageWriteback(page)) {
@@ -374,8 +375,8 @@ int remove_exclusive_swap_page(struct page *page)
 			retval = 1;
 		}
 		unlock_page_ref_irq(page);
-	}
-	spin_unlock(&swap_lock);
+	} else
+		spin_unlock(&swap_lock);
 
 	if (retval) {
 		swap_free(entry);



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH -rt] avoid deadlock related with PG_nonewrefs and swap_lock
  2008-03-18  9:40     ` Peter Zijlstra
  2008-03-18 17:15       ` Hiroshi Shimamoto
@ 2008-03-24 18:24       ` Hiroshi Shimamoto
  2008-03-26  9:50         ` Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Hiroshi Shimamoto @ 2008-03-24 18:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-rt-users, Ingo Molnar, Thomas Gleixner, LKML

Hi Peter,

I've updated the patch. Could you please review it?

I'm also thinking that it could go into mainline, since it shortens the
time swap_lock is held; is that correct?

---
From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

There is a deadlock scenario: remove_mapping() vs free_swap_and_cache().
remove_mapping() turns the PG_nonewrefs bit on, then takes swap_lock.
free_swap_and_cache() takes swap_lock, then waits in find_get_page() for
the PG_nonewrefs bit to be turned off.

swap_lock can be unlocked before calling find_get_page().

In remove_exclusive_swap_page() there is a similar lock sequence:
swap_lock, then the PG_nonewrefs bit. swap_lock can be unlocked before
turning the PG_nonewrefs bit on.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
---
 mm/swapfile.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 5036b70..6fbc77e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -366,6 +366,7 @@ int remove_exclusive_swap_page(struct page *page)
 	/* Is the only swap cache user the cache itself? */
 	retval = 0;
 	if (p->swap_map[swp_offset(entry)] == 1) {
+		spin_unlock(&swap_lock);
 		/* Recheck the page count with the swapcache lock held.. */
 		lock_page_ref_irq(page);
 		if ((page_count(page) == 2) && !PageWriteback(page)) {
@@ -374,8 +375,8 @@ int remove_exclusive_swap_page(struct page *page)
 			retval = 1;
 		}
 		unlock_page_ref_irq(page);
-	}
-	spin_unlock(&swap_lock);
+	} else
+		spin_unlock(&swap_lock);
 
 	if (retval) {
 		swap_free(entry);
@@ -400,13 +401,14 @@ void free_swap_and_cache(swp_entry_t entry)
 	p = swap_info_get(entry);
 	if (p) {
 		if (swap_entry_free(p, swp_offset(entry)) == 1) {
+			spin_unlock(&swap_lock);
 			page = find_get_page(&swapper_space, entry.val);
 			if (page && unlikely(TestSetPageLocked(page))) {
 				page_cache_release(page);
 				page = NULL;
 			}
-		}
-		spin_unlock(&swap_lock);
+		} else
+			spin_unlock(&swap_lock);
 	}
 	if (page) {
 		int one_user;
-- 
1.5.4.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH -rt] avoid deadlock related with PG_nonewrefs and swap_lock
  2008-03-24 18:24       ` [PATCH -rt] avoid deadlock related with PG_nonewrefs and swap_lock Hiroshi Shimamoto
@ 2008-03-26  9:50         ` Peter Zijlstra
  2008-03-26 10:09           ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2008-03-26  9:50 UTC (permalink / raw)
  To: Hiroshi Shimamoto
  Cc: Steven Rostedt, linux-rt-users, Ingo Molnar, Thomas Gleixner,
	LKML, Clark Williams, Nick Piggin, hugh

On Mon, 2008-03-24 at 11:24 -0700, Hiroshi Shimamoto wrote:
> Hi Peter,
> 
> I've updated the patch. Could you please review it?
> 
> I'm also thinking that it could go into mainline, since it shortens the
> time swap_lock is held; is that correct?

Possibly, yeah. Nick, Hugh?

> ---
> From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> 
> There is a deadlock scenario: remove_mapping() vs free_swap_and_cache().
> remove_mapping() turns the PG_nonewrefs bit on, then takes swap_lock.
> free_swap_and_cache() takes swap_lock, then waits in find_get_page() for
> the PG_nonewrefs bit to be turned off.
> 
> swap_lock can be unlocked before calling find_get_page().
> 
> In remove_exclusive_swap_page() there is a similar lock sequence:
> swap_lock, then the PG_nonewrefs bit. swap_lock can be unlocked before
> turning the PG_nonewrefs bit on.

I worry about this: once we free the swap entry with swap_entry_free()
and drop the swap_lock, another task is basically free to re-use that
swap location and try to insert another page into that same spot in
add_to_swap() - read_swap_cache_async() can't race, because that would mean
it still has a swap entry pinned.

However, add_to_swap() can already handle the race, because it used to
race against read_swap_cache_async(). It also swap_free()s the entry so
as not to leak entries. So I think this is indeed correct.
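
For context, the retry leg in add_to_swap() has roughly the shape below.
This is a from-memory paraphrase of the 2.6.24-era mainline code, not a
verbatim quote, so details (flags, statistics, exact helpers) may differ;
the point is only that a collision on a just-freed slot comes back as
-EEXIST and the entry is swap_free()d before retrying.

	/* paraphrased sketch, not verbatim kernel source */
	for (;;) {
		entry = get_swap_page();
		if (!entry.val)
			return 0;			/* no swap space left */

		err = __add_to_swap_cache(page, entry, gfp_mask);
		switch (err) {
		case 0:					/* won the slot */
			SetPageUptodate(page);
			SetPageDirty(page);
			return 1;
		case -EEXIST:				/* raced for this slot */
			swap_free(entry);		/* give it back and retry */
			continue;
		default:				/* e.g. radix-tree -ENOMEM */
			swap_free(entry);
			return 0;
		}
	}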

[ I ought to find some time to port the concurrent page-cache patches on
  top of Nick's latest lockless series; Hugh's suggestion makes the
  speculative get much nicer. ]

> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

> ---
>  mm/swapfile.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 5036b70..6fbc77e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -366,6 +366,7 @@ int remove_exclusive_swap_page(struct page *page)
>  	/* Is the only swap cache user the cache itself? */
>  	retval = 0;
>  	if (p->swap_map[swp_offset(entry)] == 1) {
> +		spin_unlock(&swap_lock);
>  		/* Recheck the page count with the swapcache lock held.. */
>  		lock_page_ref_irq(page);
>  		if ((page_count(page) == 2) && !PageWriteback(page)) {
> @@ -374,8 +375,8 @@ int remove_exclusive_swap_page(struct page *page)
>  			retval = 1;
>  		}
>  		unlock_page_ref_irq(page);
> -	}
> -	spin_unlock(&swap_lock);
> +	} else
> +		spin_unlock(&swap_lock);
>  
>  	if (retval) {
>  		swap_free(entry);
> @@ -400,13 +401,14 @@ void free_swap_and_cache(swp_entry_t entry)
>  	p = swap_info_get(entry);
>  	if (p) {
>  		if (swap_entry_free(p, swp_offset(entry)) == 1) {
> +			spin_unlock(&swap_lock);
>  			page = find_get_page(&swapper_space, entry.val);
>  			if (page && unlikely(TestSetPageLocked(page))) {
>  				page_cache_release(page);
>  				page = NULL;
>  			}
> -		}
> -		spin_unlock(&swap_lock);
> +		} else
> +			spin_unlock(&swap_lock);
>  	}
>  	if (page) {
>  		int one_user;


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH -rt] avoid deadlock related with PG_nonewrefs and swap_lock
  2008-03-26  9:50         ` Peter Zijlstra
@ 2008-03-26 10:09           ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2008-03-26 10:09 UTC (permalink / raw)
  To: Hiroshi Shimamoto
  Cc: Steven Rostedt, linux-rt-users, Ingo Molnar, Thomas Gleixner,
	LKML, Clark Williams, Nick Piggin, hugh

On Wed, 2008-03-26 at 10:50 +0100, Peter Zijlstra wrote:
> On Mon, 2008-03-24 at 11:24 -0700, Hiroshi Shimamoto wrote:
> > Hi Peter,
> > 
> > I've updated the patch. Could you please review it?
> > 
> > I'm also thinking that it could go into mainline, since it shortens the
> > time swap_lock is held; is that correct?
> 
> Possibly, yeah. Nick, Hugh?
> 
> > ---
> > From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> > 
> > There is a deadlock scenario: remove_mapping() vs free_swap_and_cache().
> > remove_mapping() turns the PG_nonewrefs bit on, then takes swap_lock.
> > free_swap_and_cache() takes swap_lock, then waits in find_get_page() for
> > the PG_nonewrefs bit to be turned off.
> > 
> > swap_lock can be unlocked before calling find_get_page().
> > 
> > In remove_exclusive_swap_page() there is a similar lock sequence:
> > swap_lock, then the PG_nonewrefs bit. swap_lock can be unlocked before
> > turning the PG_nonewrefs bit on.
> 
> I worry about this: once we free the swap entry with swap_entry_free()
> and drop the swap_lock, another task is basically free to re-use that
> swap location and try to insert another page into that same spot in
> add_to_swap() - read_swap_cache_async() can't race, because that would mean
> it still has a swap entry pinned.

D'oh, of course it can race; otherwise the add_to_swap() vs
read_swap_cache_async() race wouldn't exist.

Still, given that add_to_swap() handles the race, I suspect the other end
does the right thing as well.

> However, add_to_swap() can already handle the race, because it used to
> race against read_swap_cache_async(). It also swap_free()s the entry so
> as not to leak entries. So I think this is indeed correct.
> 
> [ I ought to find some time to port the concurrent page-cache patches on
>   top of Nick's latest lockless series; Hugh's suggestion makes the
>   speculative get much nicer. ]
> 
> > Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> > ---
> >  mm/swapfile.c |   10 ++++++----
> >  1 files changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 5036b70..6fbc77e 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -366,6 +366,7 @@ int remove_exclusive_swap_page(struct page *page)
> >  	/* Is the only swap cache user the cache itself? */
> >  	retval = 0;
> >  	if (p->swap_map[swp_offset(entry)] == 1) {
> > +		spin_unlock(&swap_lock);
> >  		/* Recheck the page count with the swapcache lock held.. */
> >  		lock_page_ref_irq(page);
> >  		if ((page_count(page) == 2) && !PageWriteback(page)) {
> > @@ -374,8 +375,8 @@ int remove_exclusive_swap_page(struct page *page)
> >  			retval = 1;
> >  		}
> >  		unlock_page_ref_irq(page);
> > -	}
> > -	spin_unlock(&swap_lock);
> > +	} else
> > +		spin_unlock(&swap_lock);
> >  
> >  	if (retval) {
> >  		swap_free(entry);
> > @@ -400,13 +401,14 @@ void free_swap_and_cache(swp_entry_t entry)
> >  	p = swap_info_get(entry);
> >  	if (p) {
> >  		if (swap_entry_free(p, swp_offset(entry)) == 1) {
> > +			spin_unlock(&swap_lock);
> >  			page = find_get_page(&swapper_space, entry.val);
> >  			if (page && unlikely(TestSetPageLocked(page))) {
> >  				page_cache_release(page);
> >  				page = NULL;
> >  			}
> > -		}
> > -		spin_unlock(&swap_lock);
> > +		} else
> > +			spin_unlock(&swap_lock);
> >  	}
> >  	if (page) {
> >  		int one_user;


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-03-26 10:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <47DEB7F0.8040207@ct.jp.nec.com>
     [not found] ` <47DF097B.4090200@ct.jp.nec.com>
2008-03-18  1:53   ` deadlock on 2.6.24.3-rt3 Steven Rostedt
2008-03-18  9:40     ` Peter Zijlstra
2008-03-18 17:15       ` Hiroshi Shimamoto
2008-03-20 20:31         ` Hiroshi Shimamoto
2008-03-24 18:24       ` [PATCH -rt] avoid deadlock related with PG_nonewrefs and swap_lock Hiroshi Shimamoto
2008-03-26  9:50         ` Peter Zijlstra
2008-03-26 10:09           ` Peter Zijlstra
