From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752922Ab1AYRNb (ORCPT ); Tue, 25 Jan 2011 12:13:31 -0500 Received: from saturn.retrosnub.co.uk ([178.18.118.26]:55745 "EHLO saturn.retrosnub.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751769Ab1AYRN3 (ORCPT ); Tue, 25 Jan 2011 12:13:29 -0500 Date: Tue, 25 Jan 2011 17:13:25 +0000 (GMT) From: Malcolm Scott X-X-Sender: mas90@callisto.malc.org.uk To: Dave Chinner cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com Subject: Re: [PATCH] Re: XFS deadlock in 2.6.37 In-Reply-To: <20110121052802.GA16267@dastard> Message-ID: References: <20110121052802.GA16267@dastard> User-Agent: Alpine 2.00 (DEB 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 21 Jan 2011, Dave Chinner wrote: > Looks like everything is hung up on the freelist lock. Can you > test the patch below? So far I haven't seen that crash after applying your patch; thanks. (It previously took several days to occur on a busy box, though, so not 100% sure it's gone.) However, there's now a new crash -- not sure how related this is but at some of the stack traces involve XFS. Any ideas? Unlike the last one, I could apparently recover the kernel with sysrq-i. INFO: task nfsd:3923 blocked for more than 120 seconds. [221162.675428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221162.691015] ffff88030a5d79f0 0000000000000046 ffff88030a5d7980 ffff880002b97ff8 [221162.706250] 0000000000013a00 ffff880339cfad80 ffff880339cfb118 ffff88030a5d7fd8 [221162.721488] ffff880339cfb120 0000000000013a00 ffff88030a5d6010 0000000000013a00 [221162.736739] Call Trace: [221162.746904] [] __mutex_lock_slowpath+0xf7/0x180 [221162.760862] [] ? exportfs_get_name+0x13e/0x160 [exportfs] [221162.775580] [] mutex_lock+0x23/0x50 [221162.788271] [] reconnect_path+0x1f2/0x300 [exportfs] [221162.802483] [] exportfs_decode_fh+0xea/0x940 [exportfs] [221162.816907] [] ? nfsd_acceptable+0x0/0x120 [nfsd] [221162.830732] [] ? cache_check+0x6f/0x210 [sunrpc] [221162.844380] [] ? exp_find_key+0x62/0xb0 [nfsd] [221162.857770] [] ? kmem_cache_alloc_notrace+0x9c/0xb0 [221162.871504] [] ? aa_dup_task_context+0x3d/0x70 [221162.871509] [] ? apparmor_cred_prepare+0x40/0x60 [221162.871519] [] nfsd_set_fh_dentry+0x214/0x3d0 [nfsd] [221162.871523] [] ? __alloc_pages_slowpath+0x1c5/0x5c0 [221162.871530] [] fh_verify+0x1c5/0x280 [nfsd] [221162.871537] [] nfsd_access+0x38/0x160 [nfsd] [221162.871546] [] nfsd3_proc_access+0x77/0xd0 [nfsd] [221162.871556] [] nfsd_dispatch+0xfe/0x250 [nfsd] [221162.871571] [] svc_process_common+0x344/0x630 [sunrpc] [221162.871576] [] ? default_wake_function+0x0/0x20 [221162.871582] [] ? nfsd+0x0/0x170 [nfsd] [221162.871593] [] svc_process+0x110/0x150 [sunrpc] [221162.871600] [] nfsd+0xc5/0x170 [nfsd] [221162.871604] [] kthread+0x96/0xa0 [221162.871608] [] kernel_thread_helper+0x4/0x10 [221162.871611] [] ? kthread+0x0/0xa0 [221162.871614] [] ? kernel_thread_helper+0x0/0x10 [221162.871672] INFO: task php5-cgi:32524 blocked for more than 120 seconds. [221162.871674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221162.871680] ffff880057f4fc88 0000000000000086 ffff880057f4fbe8 ffffffff815c9160 [221162.871684] 0000000000013a00 ffff88002bc22d80 ffff88002bc23118 ffff880057f4ffd8 [221162.871687] ffff88002bc23120 0000000000013a00 ffff880057f4e010 0000000000013a00 [221162.871691] Call Trace: [221162.871695] [] ? io_schedule+0x90/0xc0 [221162.871700] [] ? wait_on_page_bit+0x73/0x80 [221162.871703] [] __mutex_lock_slowpath+0xf7/0x180 [221162.871707] [] ? in_group_p+0x31/0x40 [221162.871710] [] mutex_lock+0x23/0x50 [221162.871713] [] do_lookup+0x102/0x180 [221162.871716] [] do_last+0x29d/0x420 [221162.871719] [] do_filp_open+0x203/0x5f0 [221162.871723] [] ? handle_mm_fault+0x1b9/0x420 [221162.871728] [] ? alloc_fd+0x10a/0x150 [221162.871732] [] do_sys_open+0x69/0x150 [221162.871735] [] sys_open+0x20/0x30 [221162.871739] [] system_call_fastpath+0x16/0x1b [221162.871743] INFO: task sshd:32537 blocked for more than 120 seconds. [221162.871745] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221162.871750] ffff88007c09bc88 0000000000000082 ffff88007c09bbe8 ffffffff815c9160 [221162.871754] 0000000000013a00 ffff88027d098000 ffff88027d098398 ffff88007c09bfd8 [221162.871757] ffff88027d0983a0 0000000000013a00 ffff88007c09a010 0000000000013a00 [221162.871761] Call Trace: [221162.871763] [] ? io_schedule+0x90/0xc0 [221162.871767] [] ? wait_on_page_bit+0x73/0x80 [221162.871770] [] __mutex_lock_slowpath+0xf7/0x180 [221162.871773] [] mutex_lock+0x23/0x50 [221162.871775] [] do_lookup+0x102/0x180 [221162.871778] [] do_last+0x29d/0x420 [221162.871781] [] do_filp_open+0x203/0x5f0 [221162.871784] [] ? handle_mm_fault+0x1b9/0x420 [221162.871787] [] ? alloc_fd+0x10a/0x150 [221162.871790] [] do_sys_open+0x69/0x150 [221162.871793] [] sys_open+0x20/0x30 [221162.871796] [] system_call_fastpath+0x16/0x1b [221162.871799] INFO: task sshd:32541 blocked for more than 120 seconds. [221162.871801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221162.871806] ffff880056c7fc88 0000000000000082 ffff880056c7fbe8 ffffffff00000000 [221162.871809] 0000000000013a00 ffff88027d09ad80 ffff88027d09b118 ffff880056c7ffd8 [221162.871813] ffff88027d09b120 0000000000013a00 ffff880056c7e010 0000000000013a00 [221162.871816] Call Trace: [221162.871819] [] __mutex_lock_slowpath+0xf7/0x180 [221162.871822] [] mutex_lock+0x23/0x50 [221162.871825] [] do_lookup+0x102/0x180 [221162.871828] [] do_last+0x29d/0x420 [221162.871830] [] do_filp_open+0x203/0x5f0 [221162.871833] [] ? handle_mm_fault+0x1b9/0x420 [221162.871836] [] ? alloc_fd+0x10a/0x150 [221162.871839] [] do_sys_open+0x69/0x150 [221162.871842] [] sys_open+0x20/0x30 [221162.871845] [] system_call_fastpath+0x16/0x1b [221162.871849] INFO: task sshd:32547 blocked for more than 120 seconds. [221162.871850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221162.871855] ffff88009935bc88 0000000000000086 ffff88009935bbe8 ffffffff00000000 [221162.871859] 0000000000013a00 ffff8802279596c0 ffff880227959a58 ffff88009935bfd8 [221162.871862] ffff880227959a60 0000000000013a00 ffff88009935a010 0000000000013a00 [221162.871865] Call Trace: [221162.871868] [] __mutex_lock_slowpath+0xf7/0x180 [221162.871871] [] mutex_lock+0x23/0x50 [221162.871874] [] do_lookup+0x102/0x180 [221162.871876] [] do_last+0x29d/0x420 [221162.871879] [] do_filp_open+0x203/0x5f0 [221162.871882] [] ? handle_mm_fault+0x1b9/0x420 [221162.871885] [] ? alloc_fd+0x10a/0x150 [221162.871888] [] do_sys_open+0x69/0x150 [221162.871891] [] sys_open+0x20/0x30 [221162.871894] [] system_call_fastpath+0x16/0x1b [221162.871896] INFO: task sshd:32548 blocked for more than 120 seconds. [221162.871898] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221162.871903] ffff88006cabda98 0000000000000082 ffff88006cabda38 ffffffff811787d6 [221162.871906] 0000000000013a00 ffff880066494440 ffff8800664947d8 ffff88006cabdfd8 [221162.871910] ffff8800664947e0 0000000000013a00 ffff88006cabc010 0000000000013a00 [221162.871913] Call Trace: [221162.871916] [] ? __d_lookup+0x136/0x150 [221162.871919] [] __mutex_lock_slowpath+0xf7/0x180 [221162.871922] [] mutex_lock+0x23/0x50 [221162.871925] [] do_lookup+0x102/0x180 [221162.871927] [] do_last+0x29d/0x420 [221162.871930] [] do_filp_open+0x203/0x5f0 [221162.871933] [] ? do_sync_read+0xda/0x120 [221162.871936] [] ? lru_cache_add_lru+0x2d/0x50 [221162.871941] [] open_exec+0x3d/0x110 [221162.871945] [] load_elf_binary+0xdcd/0xeb0 [221162.871948] [] ? get_user_pages+0x49/0x50 [221162.871951] [] ? get_arg_page+0x5c/0x100 [221162.871954] [] search_binary_handler+0xe2/0x300 [221162.871957] [] do_execve+0x237/0x340 [221162.871961] [] sys_execve+0x4a/0x80 [221162.871964] [] stub_execve+0x6c/0xc0 [221282.870468] INFO: task nfsd:3923 blocked for more than 120 seconds. [221282.882094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221282.895378] ffff88030a5d79f0 0000000000000046 ffff88030a5d7980 ffff880002b97ff8 [221282.908408] 0000000000013a00 ffff880339cfad80 ffff880339cfb118 ffff88030a5d7fd8 [221282.921529] ffff880339cfb120 0000000000013a00 ffff88030a5d6010 0000000000013a00 [221282.921533] Call Trace: [221282.921544] [] __mutex_lock_slowpath+0xf7/0x180 [221282.921558] [] ? exportfs_get_name+0x13e/0x160 [exportfs] [221282.921561] [] mutex_lock+0x23/0x50 [221282.921565] [] reconnect_path+0x1f2/0x300 [exportfs] [221282.921569] [] exportfs_decode_fh+0xea/0x940 [exportfs] [221282.921580] [] ? nfsd_acceptable+0x0/0x120 [nfsd] [221282.921600] [] ? cache_check+0x6f/0x210 [sunrpc] [221282.921608] [] ? exp_find_key+0x62/0xb0 [nfsd] [221282.921613] [] ? kmem_cache_alloc_notrace+0x9c/0xb0 [221282.921618] [] ? aa_dup_task_context+0x3d/0x70 [221282.921622] [] ? apparmor_cred_prepare+0x40/0x60 [221282.921629] [] nfsd_set_fh_dentry+0x214/0x3d0 [nfsd] [221282.921633] [] ? __alloc_pages_slowpath+0x1c5/0x5c0 [221282.921640] [] fh_verify+0x1c5/0x280 [nfsd] [221282.921647] [] nfsd_access+0x38/0x160 [nfsd] [221282.921655] [] nfsd3_proc_access+0x77/0xd0 [nfsd] [221282.921662] [] nfsd_dispatch+0xfe/0x250 [nfsd] [221282.921674] [] svc_process_common+0x344/0x630 [sunrpc] [221282.921679] [] ? default_wake_function+0x0/0x20 [221282.921685] [] ? nfsd+0x0/0x170 [nfsd] [221282.921696] [] svc_process+0x110/0x150 [sunrpc] [221282.921702] [] nfsd+0xc5/0x170 [nfsd] [221282.921707] [] kthread+0x96/0xa0 [221282.921711] [] kernel_thread_helper+0x4/0x10 [221282.921714] [] ? kthread+0x0/0xa0 [221282.921717] [] ? kernel_thread_helper+0x0/0x10 [221282.921755] INFO: task kworker/0:2:21740 blocked for more than 120 seconds. [221282.921756] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221282.921762] ffff880345031ac0 0000000000000046 ffff880345031a10 ffffffff812db486 [221282.921765] 0000000000013a00 ffff88004c5bc440 ffff88004c5bc7d8 ffff880345031fd8 [221282.921768] ffff88004c5bc7e0 0000000000013a00 ffff880345030010 0000000000013a00 [221282.921772] Call Trace: [221282.921776] [] ? rb_erase+0xd6/0x160 [221282.921782] [] __mutex_lock_slowpath+0xf7/0x180 [221282.921787] [] ? enqueue_entity+0x132/0x1b0 [221282.921789] [] mutex_lock+0x23/0x50 [221282.921825] [] xfs_file_aio_write+0x4fa/0xac0 [xfs] [221282.921829] [] ? tg_nop+0x0/0x10 [221282.921832] [] ? enqueue_sleeper+0x158/0x230 [221282.921836] [] do_sync_write+0xda/0x120 [221282.921839] [] ? try_to_wake_up+0xc3/0x410 [221282.921844] [] ? ktime_get_ts+0xa9/0xe0 [221282.921848] [] do_acct_process+0x3fb/0x500 [221282.921851] [] acct_process+0x7e/0xb0 [221282.921854] [] do_exit+0x2a5/0x400 [221282.921857] [] ? worker_thread+0x0/0x410 [221282.921860] [] kthread+0x8e/0xa0 [221282.921863] [] kernel_thread_helper+0x4/0x10 [221282.921866] [] ? kthread+0x0/0xa0 [221282.921868] [] ? kernel_thread_helper+0x0/0x10 [221282.921872] INFO: task kworker/3:0:16949 blocked for more than 120 seconds. [221282.921873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221282.921879] ffff88021acd9ac0 0000000000000046 0000000000000001 ffff88009e0d3a00 [221282.921882] 0000000000013a00 ffff8800020696c0 ffff880002069a58 ffff88021acd9fd8 [221282.921885] ffff880002069a60 0000000000013a00 ffff88021acd8010 0000000000013a00 [221282.921889] Call Trace: [221282.921892] [] __mutex_lock_slowpath+0xf7/0x180 [221282.921895] [] mutex_lock+0x23/0x50 [221282.921915] [] xfs_file_aio_write+0x4fa/0xac0 [xfs] [221282.921919] [] ? default_spin_lock_flags+0x9/0x10 [221282.921923] [] ? sched_clock+0x9/0x10 [221282.921926] [] ? rb_insert_color+0x9d/0x160 [221282.921929] [] do_sync_write+0xda/0x120 [221282.921932] [] ? try_to_wake_up+0xc3/0x410 [221282.921935] [] ? ktime_get_ts+0xa9/0xe0 [221282.921938] [] do_acct_process+0x3fb/0x500 [221282.921941] [] acct_process+0x7e/0xb0 [221282.921944] [] do_exit+0x2a5/0x400 [221282.921947] [] ? worker_thread+0x0/0x410 [221282.921950] [] kthread+0x8e/0xa0 [221282.921952] [] kernel_thread_helper+0x4/0x10 [221282.921955] [] ? kthread+0x0/0xa0 [221282.921958] [] ? kernel_thread_helper+0x0/0x10 [221282.921961] INFO: task kworker/1:2:29640 blocked for more than 120 seconds. [221282.921962] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [221282.921968] ffff880144665ac0 0000000000000046 ffff880144665a10 ffffffff812db486 [221282.921971] 0000000000013a00 ffff8800025b96c0 ffff8800025b9a58 ffff880144665fd8 [221282.921974] ffff8800025b9a60 0000000000013a00 ffff880144664010 0000000000013a00 [221282.921978] Call Trace: [221282.921980] [] ? rb_erase+0xd6/0x160 [221282.921984] [] __mutex_lock_slowpath+0xf7/0x180 [221282.921987] [] mutex_lock+0x23/0x50 [221282.922007] [] xfs_file_aio_write+0x4fa/0xac0 [xfs] [221282.922010] [] ? put_dec+0x59/0x60 [221282.922030] [] ? xfs_icsb_count+0x8b/0xc0 [xfs] [221282.922034] [] ? default_spin_lock_flags+0x9/0x10 [221282.922037] [] ? _raw_spin_lock_irqsave+0x2f/0x40 [221282.922040] [] do_sync_write+0xda/0x120 [221282.922044] [] ? ktime_get_ts+0xa9/0xe0 [221282.922047] [] do_acct_process+0x3fb/0x500 [221282.922050] [] acct_process+0x7e/0xb0 [221282.922053] [] do_exit+0x2a5/0x400 [221282.922056] [] ? worker_thread+0x0/0x410 [221282.922059] [] kthread+0x8e/0xa0 [221282.922062] [] kernel_thread_helper+0x4/0x10 [221282.922065] [] ? kthread+0x0/0xa0 [221282.922067] [] ? kernel_thread_helper+0x0/0x10 -- Malcolm Scott