LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* 2.6.24 regression: pan hanging unkilleable and un-straceable @ 2008-01-21 20:58 Frederik Himpe 2008-01-22 0:05 ` Nick Piggin 0 siblings, 1 reply; 19+ messages in thread From: Frederik Himpe @ 2008-01-21 20:58 UTC (permalink / raw) To: linux-kernel With Linux 2.6.24-rc8 I often have the problem that the pan usenet reader starts using 100% of CPU time after some time. When this happens, kill -9 does not work, and strace just hangs when trying to attach to the process. The same with gdb. ps shows the process as being in the R state. I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: Jan 21 21:45:01 Anastacia kernel: pan R running task 0 8063 1 Jan 21 21:45:01 Anastacia kernel: ssh S 0000000000000000 0 8323 6809 Jan 21 21:45:01 Anastacia kernel: ffff81000a51f9c8 0000000000000082 ffff81000ed6dc00 ffffffff8045ad6f Jan 21 21:45:01 Anastacia kernel: ffffffff805875b8 ffffffff80623980 ffffffff80623980 ffffffff80623980 Jan 21 21:45:01 Anastacia kernel: ffffffff8061fe80 ffffffff80623980 ffff81003941b8a8 ffffffff8043142b Jan 21 21:45:01 Anastacia kernel: Call Trace: Jan 21 21:45:01 Anastacia kernel: [arp_bind_neighbour+143/208] arp_bind_neighbour+0x8f/0xd0 Jan 21 21:45:01 Anastacia kernel: [rt_intern_hash+955/1056] rt_intern_hash+0x3bb/0x420 Jan 21 21:45:01 Anastacia kernel: [nommu_map_single+56/96] nommu_map_single+0x38/0x60 Jan 21 21:45:01 Anastacia kernel: [schedule_timeout+149/208] schedule_timeout+0x95/0xd0 Jan 21 21:45:01 Anastacia kernel: [tty_ldisc_deref+82/128] tty_ldisc_deref+0x52/0x80 Jan 21 21:45:01 Anastacia kernel: [tty_poll+145/160] tty_poll+0x91/0xa0 Jan 21 21:45:01 Anastacia kernel: [do_select+1128/1376] do_select+0x468/0x560 Jan 21 21:45:01 Anastacia kernel: [__pollwait+0/304] __pollwait+0x0/0x130 Jan 21 21:45:01 Anastacia kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Jan 21 21:45:01 Anastacia kernel:last message repeated 2 times Jan 21 21:45:01 Anastacia kernel: [enqueue_task+19/48] enqueue_task+0x13/0x30 Jan 21 21:45:01 Anastacia kernel: [try_to_wake_up+98/720] try_to_wake_up+0x62/0x2d0 Jan 21 21:45:01 Anastacia kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Jan 21 21:45:01 Anastacia kernel: [tcp_recvmsg+1463/3360] tcp_recvmsg+0x5b7/0xd20 Jan 21 21:45:01 Anastacia kernel: [__wake_up_common+90/144] __wake_up_common+0x5a/0x90 Jan 21 21:45:01 Anastacia kernel: [__wake_up+67/112] __wake_up+0x43/0x70 Jan 21 21:45:01 Anastacia kernel: [n_tty_receive_buf+821/3888] n_tty_receive_buf+0x335/0xf30 Jan 21 21:45:01 Anastacia kernel: [sock_aio_read+349/368] sock_aio_read+0x15d/0x170 Jan 21 21:45:01 Anastacia kernel: [core_sys_select+521/768] core_sys_select+0x209/0x300 Jan 21 21:45:01 Anastacia kernel: [remove_wait_queue+25/96] remove_wait_queue+0x19/0x60 Jan 21 21:45:01 Anastacia kernel: [__wake_up+67/112] __wake_up+0x43/0x70 Jan 21 21:45:01 Anastacia kernel: [tty_ldisc_deref+82/128] tty_ldisc_deref+0x52/0x80 Jan 21 21:45:01 Anastacia kernel: [tty_write+569/592] tty_write+0x239/0x250 Jan 21 21:45:01 Anastacia kernel: [sys_select+68/448] sys_select+0x44/0x1c0 Jan 21 21:45:01 Anastacia kernel: [sys_write+83/144] sys_write+0x53/0x90 Jan 21 21:45:01 Anastacia kernel: [system_call+126/131] system_call+0x7e/0x83 What could be causing this problem? -- Frederik Himpe <fhimpe@telenet.be> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-21 20:58 2.6.24 regression: pan hanging unkilleable and un-straceable Frederik Himpe @ 2008-01-22 0:05 ` Nick Piggin 2008-01-22 5:03 ` Mike Galbraith 2008-01-22 10:37 ` Ingo Molnar 0 siblings, 2 replies; 19+ messages in thread From: Nick Piggin @ 2008-01-22 0:05 UTC (permalink / raw) To: Frederik Himpe; +Cc: linux-kernel On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > reader starts using 100% of CPU time after some time. When this happens, > kill -9 does not work, and strace just hangs when trying to attach to > the process. The same with gdb. ps shows the process as being in the R > state. > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > Jan 21 21:45:01 Anastacia kernel: pan R running task 0 Well I've twice tried to submit a patch to print stacks for running tasks as well, but nobody seems interested. It would at least give a chance to see something. Can you post a few Sysrq+P traces? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 0:05 ` Nick Piggin @ 2008-01-22 5:03 ` Mike Galbraith 2008-01-22 5:25 ` Nick Piggin 2008-01-22 10:37 ` Ingo Molnar 1 sibling, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2008-01-22 5:03 UTC (permalink / raw) To: Nick Piggin; +Cc: Frederik Himpe, linux-kernel On Tue, 2008-01-22 at 11:05 +1100, Nick Piggin wrote: > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > reader starts using 100% of CPU time after some time. When this happens, > > kill -9 does not work, and strace just hangs when trying to attach to > > the process. The same with gdb. ps shows the process as being in the R > > state. > > > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > > Jan 21 21:45:01 Anastacia kernel: pan R running task 0 > > Well I've twice tried to submit a patch to print stacks for running > tasks as well, but nobody seems interested. It would at least give a > chance to see something. I've hit same twice recently (not pan, and not repeatable). ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 5:03 ` Mike Galbraith @ 2008-01-22 5:25 ` Nick Piggin 2008-01-22 5:47 ` Mike Galbraith ` (3 more replies) 0 siblings, 4 replies; 19+ messages in thread From: Nick Piggin @ 2008-01-22 5:25 UTC (permalink / raw) To: Mike Galbraith; +Cc: Frederik Himpe, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1071 bytes --] On Tuesday 22 January 2008 16:03, Mike Galbraith wrote: > On Tue, 2008-01-22 at 11:05 +1100, Nick Piggin wrote: > > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > > reader starts using 100% of CPU time after some time. When this > > > happens, kill -9 does not work, and strace just hangs when trying to > > > attach to the process. The same with gdb. ps shows the process as > > > being in the R state. > > > > > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > > > Jan 21 21:45:01 Anastacia kernel: pan R running task > > > 0 > > > > Well I've twice tried to submit a patch to print stacks for running > > tasks as well, but nobody seems interested. It would at least give a > > chance to see something. > > I've hit same twice recently (not pan, and not repeatable). Nasty. The attached patch is something really simple that can sometimes help. sysrq+p is also an option, if you're on a UP system. Any luck getting traces? [-- Attachment #2: show-task-running-stack.patch --] [-- Type: text/x-diff, Size: 453 bytes --] Index: linux-2.6/kernel/sched.c =================================================================== --- linux-2.6.orig/kernel/sched.c +++ linux-2.6/kernel/sched.c @@ -4920,8 +4920,7 @@ static void show_task(struct task_struct printk(KERN_CONT "%5lu %5d %6d\n", free, task_pid_nr(p), task_pid_nr(p->real_parent)); - if (state != TASK_RUNNING) - show_stack(p, NULL); + show_stack(p, NULL); } void show_state_filter(unsigned long state_filter) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 5:25 ` Nick Piggin @ 2008-01-22 5:47 ` Mike Galbraith 2008-02-04 14:49 ` Mike Galbraith 2008-01-22 10:38 ` Ingo Molnar ` (2 subsequent siblings) 3 siblings, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2008-01-22 5:47 UTC (permalink / raw) To: Nick Piggin; +Cc: Frederik Himpe, linux-kernel On Tue, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > On Tuesday 22 January 2008 16:03, Mike Galbraith wrote: > > I've hit same twice recently (not pan, and not repeatable). > > Nasty. The attached patch is something really simple that can sometimes help. > sysrq+p is also an option, if you're on a UP system. SMP (P4/HT imitating real cores) > Any luck getting traces? We'll see. Armed. -Mike ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 5:47 ` Mike Galbraith @ 2008-02-04 14:49 ` Mike Galbraith 2008-02-04 23:02 ` Nick Piggin 0 siblings, 1 reply; 19+ messages in thread From: Mike Galbraith @ 2008-02-04 14:49 UTC (permalink / raw) To: Nick Piggin; +Cc: Frederik Himpe, linux-kernel On Tue, 2008-01-22 at 06:47 +0100, Mike Galbraith wrote: > On Tue, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > On Tuesday 22 January 2008 16:03, Mike Galbraith wrote: > > > > I've hit same twice recently (not pan, and not repeatable). > > > > Nasty. The attached patch is something really simple that can sometimes help. > > sysrq+p is also an option, if you're on a UP system. > > SMP (P4/HT imitating real cores) > > > Any luck getting traces? > > We'll see. Armed. Hm. ld just went loopy (but killable) in v2.6.24-6928-g9135f19. During kbuild, modpost segfaulted, restart build, ld goes gaga. Third attempt, build finished. Not what I hit before, but mentionable. [ 674.589134] modpost[18588]: segfault at 3e8dc42c ip 0804a96d sp af982920 error 5 in modpost[8048000+9000] [ 674.589211] mm/memory.c:115: bad pgd 3e081163. [ 674.589214] mm/memory.c:115: bad pgd 3e0d2163. [ 674.589217] mm/memory.c:115: bad pgd 3eb01163. [ 1407.322144] ======================= [ 1407.322144] ld R running 0 21963 21962 [ 1407.322144] db9d7f1c 00200086 c75f9020 b1814300 b0428300 b0428300 b0428300 c75f9280 [ 1407.322144] b1814300 00000001 db9d7000 00000000 d08c2f90 dba4f300 00000002 00000000 [ 1407.322144] b1810120 dba4f334 00200046 ffffffff db9d7000 c75f9020 db9d7f30 b02f333f [ 1407.322144] Call Trace: [ 1407.322144] [<b02f333f>] preempt_schedule_irq+0x45/0x5b [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470 [ 1407.322144] [<b0104d6e>] need_resched+0x1f/0x21 [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470 [ 1407.322144] [<b0117a5c>] ? do_page_fault+0x4c/0x470 [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470 [ 1407.322144] [<b02f4a3a>] ? error_code+0x72/0x78 [ 1407.322144] [<b02f0000>] ? init_transmeta+0xcf/0x22f <== zzt P4 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-02-04 14:49 ` Mike Galbraith @ 2008-02-04 23:02 ` Nick Piggin 0 siblings, 0 replies; 19+ messages in thread From: Nick Piggin @ 2008-02-04 23:02 UTC (permalink / raw) To: Mike Galbraith; +Cc: Frederik Himpe, linux-kernel On Tuesday 05 February 2008 01:49, Mike Galbraith wrote: > On Tue, 2008-01-22 at 06:47 +0100, Mike Galbraith wrote: > > On Tue, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > > On Tuesday 22 January 2008 16:03, Mike Galbraith wrote: > > > > I've hit same twice recently (not pan, and not repeatable). > > > > > > Nasty. The attached patch is something really simple that can sometimes > > > help. sysrq+p is also an option, if you're on a UP system. > > > > SMP (P4/HT imitating real cores) > > > > > Any luck getting traces? > > > > We'll see. Armed. > > Hm. ld just went loopy (but killable) in v2.6.24-6928-g9135f19. During > kbuild, modpost segfaulted, restart build, ld goes gaga. Third attempt, > build finished. Not what I hit before, but mentionable. > > > [ 674.589134] modpost[18588]: segfault at 3e8dc42c ip 0804a96d sp af982920 > error 5 in modpost[8048000+9000] [ 674.589211] mm/memory.c:115: bad pgd > 3e081163. > [ 674.589214] mm/memory.c:115: bad pgd 3e0d2163. > [ 674.589217] mm/memory.c:115: bad pgd 3eb01163. Hmm, this _could_ be bad memory. Or if it is very easy to reproduce with a particular kernel version, then it is probably a memory scribble from another part of the kernel :( First thing I guess would be easy and helpful to run memtest86 for a while if you have time. If that's clean, then I don't have another good option except to bisect the problem. Turning on DEBUG_VM, DEBUG_SLAB, DEBUG_LIST, DEBUG_PAGEALLOC, DEBUG_STACKOVERFLOW, DEBUG_RODATA might help catch it sooner... SLAB and PAGEALLOC could slow you down quite a bit though. And if the problem is quite reproduceable, then obviously don't touch your config ;) Thanks, Nick > > [ 1407.322144] ======================= > [ 1407.322144] ld R running 0 21963 21962 > [ 1407.322144] db9d7f1c 00200086 c75f9020 b1814300 b0428300 b0428300 > b0428300 c75f9280 [ 1407.322144] b1814300 00000001 db9d7000 00000000 > d08c2f90 dba4f300 00000002 00000000 [ 1407.322144] b1810120 dba4f334 > 00200046 ffffffff db9d7000 c75f9020 db9d7f30 b02f333f [ 1407.322144] Call > Trace: > [ 1407.322144] [<b02f333f>] preempt_schedule_irq+0x45/0x5b > [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470 > [ 1407.322144] [<b0104d6e>] need_resched+0x1f/0x21 > [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470 > [ 1407.322144] [<b0117a5c>] ? do_page_fault+0x4c/0x470 > [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470 > [ 1407.322144] [<b02f4a3a>] ? error_code+0x72/0x78 > [ 1407.322144] [<b02f0000>] ? init_transmeta+0xcf/0x22f <== zzt P4 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 5:25 ` Nick Piggin 2008-01-22 5:47 ` Mike Galbraith @ 2008-01-22 10:38 ` Ingo Molnar 2008-01-24 5:30 ` Valdis.Kletnieks 2008-01-26 13:29 ` Frederik Himpe 3 siblings, 0 replies; 19+ messages in thread From: Ingo Molnar @ 2008-01-22 10:38 UTC (permalink / raw) To: Nick Piggin; +Cc: Mike Galbraith, Frederik Himpe, linux-kernel * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > Index: linux-2.6/kernel/sched.c > =================================================================== > --- linux-2.6.orig/kernel/sched.c > +++ linux-2.6/kernel/sched.c > @@ -4920,8 +4920,7 @@ static void show_task(struct task_struct > printk(KERN_CONT "%5lu %5d %6d\n", free, > task_pid_nr(p), task_pid_nr(p->real_parent)); > > - if (state != TASK_RUNNING) > - show_stack(p, NULL); > + show_stack(p, NULL); thanks - applied to sched-devel.git. We'll see whether it causes any problems. Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 5:25 ` Nick Piggin 2008-01-22 5:47 ` Mike Galbraith 2008-01-22 10:38 ` Ingo Molnar @ 2008-01-24 5:30 ` Valdis.Kletnieks 2008-01-26 13:29 ` Frederik Himpe 3 siblings, 0 replies; 19+ messages in thread From: Valdis.Kletnieks @ 2008-01-24 5:30 UTC (permalink / raw) To: Nick Piggin; +Cc: Mike Galbraith, Frederik Himpe, linux-kernel [-- Attachment #1: Type: text/plain, Size: 653 bytes --] On Tue, 22 Jan 2008 16:25:58 +1100, Nick Piggin said: > > Index: linux-2.6/kernel/sched.c > =================================================================== > --- linux-2.6.orig/kernel/sched.c > +++ linux-2.6/kernel/sched.c > @@ -4920,8 +4920,7 @@ static void show_task(struct task_struct > printk(KERN_CONT "%5lu %5d %6d\n", free, > task_pid_nr(p), task_pid_nr(p->real_parent)); > > - if (state != TASK_RUNNING) > - show_stack(p, NULL); > + show_stack(p, NULL); > } Maybe something like this would be better? if (state == TASK_RUNNING) printk("running task, stack trace may be inaccurate\n"); show_stack(p, NULL); Just a thought.... [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 5:25 ` Nick Piggin ` (2 preceding siblings ...) 2008-01-24 5:30 ` Valdis.Kletnieks @ 2008-01-26 13:29 ` Frederik Himpe 2008-01-26 13:46 ` Nick Piggin 2008-01-28 1:46 ` Nick Piggin 3 siblings, 2 replies; 19+ messages in thread From: Frederik Himpe @ 2008-01-26 13:29 UTC (permalink / raw) To: Nick Piggin; +Cc: Mike Galbraith, linux-kernel On di, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > > > reader starts using 100% of CPU time after some time. When this > > > > happens, kill -9 does not work, and strace just hangs when trying to > > > > attach to the process. The same with gdb. ps shows the process as > > > > being in the R state. > > > > > > > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > > > > Jan 21 21:45:01 Anastacia kernel: pan R running task > > > > 0 > Nasty. The attached patch is something really simple that can sometimes help. > sysrq+p is also an option, if you're on a UP system. > > Any luck getting traces? I just succeeded to reproduce the problem with this patch. Does this smell like an XFS problem? Jan 26 14:17:43 Anastacia kernel: pan R running task 0 7564 1 Jan 26 14:17:43 Anastacia kernel: 000000003f5b3248 0000000000001000 ffffffff880c28b0 0000000000000000 Jan 26 14:17:43 Anastacia kernel: ffff81003f5b3248 ffff81002d1ed900 000000002d1ed900 0000000000000000 Jan 26 14:17:43 Anastacia kernel: ffff810016050dd0 fffff000fffff000 0000000000000000 ffff81002d1eda10 Jan 26 14:17:43 Anastacia kernel: Call Trace: Jan 26 14:17:43 Anastacia kernel: [_end+127964408/2129947720] :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: [unix_poll+0/176] unix_poll+0x0/0xb0 Jan 26 14:17:43 Anastacia kernel: [_end+127964408/2129947720] :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: [iov_iter_copy_from_user_atomic+65/160] iov_iter_copy_from_user_atomic+0x41/0xa0 Jan 26 14:17:43 Anastacia kernel: [iov_iter_copy_from_user_atomic+46/160] iov_iter_copy_from_user_atomic+0x2e/0xa0 Jan 26 14:17:43 Anastacia kernel: [generic_file_buffered_write+383/1728] generic_file_buffered_write+0x17f/0x6c0 Jan 26 14:17:43 Anastacia kernel: [current_fs_time+30/48] current_fs_time+0x1e/0x30 Jan 26 14:17:43 Anastacia kernel: [_end+127997742/2129947720] :xfs:xfs_write+0x676/0x910 Jan 26 14:17:43 Anastacia kernel: [find_lock_page+61/192] find_lock_page+0x3d/0xc0 Jan 26 14:17:43 Anastacia kernel: [_end+127981080/2129947720] :xfs:xfs_file_aio_write+0x0/0x50 Jan 26 14:17:43 Anastacia kernel: [do_sync_readv_writev+203/272] do_sync_readv_writev+0xcb/0x110 Jan 26 14:17:43 Anastacia kernel: [__do_fault+501/1056] __do_fault+0x1f5/0x420 Jan 26 14:17:43 Anastacia kernel: [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30 Jan 26 14:17:43 Anastacia kernel: [handle_mm_fault+1344/2048] handle_mm_fault+0x540/0x800 Jan 26 14:17:43 Anastacia kernel: [rw_copy_check_uvector+157/336] rw_copy_check_uvector+0x9d/0x150 Jan 26 14:17:43 Anastacia kernel: [do_readv_writev+253/560] do_readv_writev+0xfd/0x230 Jan 26 14:17:43 Anastacia kernel: [sys_writev+83/144] sys_writev+0x53/0x90 Jan 26 14:17:43 Anastacia kernel: [system_call+126/131] system_call+0x7e/0x83 Jan 26 14:17:43 Anastacia kernel: Jan 26 14:17:43 Anastacia kernel: pan S 0000000000000000 0 7565 1 Jan 26 14:17:43 Anastacia kernel: ffff810001401c58 0000000000000086 ffff810001401bb8 ffff81003cd3a280 Jan 26 14:17:43 Anastacia kernel: ffff81003cd3a300 ffffffff80623980 ffffffff80623980 ffffffff80623980 Jan 26 14:17:43 Anastacia kernel: ffffffff8061fe80 ffffffff80623980 ffff810001bcc9a8 ffff8100299b34e8 Jan 26 14:17:43 Anastacia kernel: Call Trace: Jan 26 14:17:43 Anastacia kernel: [__qdisc_run+173/528] __qdisc_run+0xad/0x210 Jan 26 14:17:43 Anastacia kernel: [dev_queue_xmit+216/768] dev_queue_xmit+0xd8/0x300 Jan 26 14:17:43 Anastacia kernel: [futex_wait+838/912] futex_wait+0x346/0x390 Jan 26 14:17:43 Anastacia kernel: [tcp_connect+851/896] tcp_connect+0x353/0x380 Jan 26 14:17:43 Anastacia kernel: [tcp_v4_connect+914/1696] tcp_v4_connect+0x392/0x6a0 Jan 26 14:17:43 Anastacia kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: [do_futex+287/3008] do_futex+0x11f/0xbc0 Jan 26 14:17:43 Anastacia kernel: [_spin_lock_bh+9/32] _spin_lock_bh+0x9/0x20 Jan 26 14:17:43 Anastacia kernel: [cp_new_stat+229/256] cp_new_stat+0xe5/0x100 Jan 26 14:17:43 Anastacia kernel: [sys_futex+171/304] sys_futex+0xab/0x130 Jan 26 14:17:43 Anastacia kernel: [system_call+126/131] system_call+0x7e/0x83 Jan 26 14:17:43 Anastacia kernel: Jan 26 14:17:43 Anastacia kernel: pan S 0000000000000000 0 7566 1 Jan 26 14:17:43 Anastacia kernel: ffff8100013fdc58 0000000000000086 0000000000a492c0 0000000000a493c8 Jan 26 14:17:43 Anastacia kernel: 0000000000a494d0 ffffffff80623980 ffffffff80623980 ffffffff80623980 Jan 26 14:17:43 Anastacia kernel: ffffffff8061fe80 ffffffff80623980 ffff810001bcd8a8 ffff8100339756c8 Jan 26 14:17:43 Anastacia kernel: Call Trace: Jan 26 14:17:43 Anastacia kernel: [enqueue_entity+55/112] enqueue_entity+0x37/0x70 Jan 26 14:17:43 Anastacia kernel: [enqueue_task_fair+56/80] enqueue_task_fair+0x38/0x50 Jan 26 14:17:43 Anastacia kernel: [futex_wait+838/912] futex_wait+0x346/0x390 Jan 26 14:17:43 Anastacia kernel: [__wake_up+67/112] __wake_up+0x43/0x70 Jan 26 14:17:43 Anastacia kernel: [wake_futex+57/80] wake_futex+0x39/0x50 Jan 26 14:17:43 Anastacia kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: [do_futex+287/3008] do_futex+0x11f/0xbc0 Jan 26 14:17:43 Anastacia kernel: [__up_read+33/176] __up_read+0x21/0xb0 Jan 26 14:17:43 Anastacia kernel: [do_page_fault+411/2000] do_page_fault+0x19b/0x7d0 Jan 26 14:17:43 Anastacia kernel: [sys_futex+171/304] sys_futex+0xab/0x130 Jan 26 14:17:43 Anastacia kernel: [system_call+126/131] system_call+0x7e/0x83 Jan 26 14:17:43 Anastacia kernel: Jan 26 14:17:43 Anastacia kernel: pan S 0000000000000000 0 7567 1 Jan 26 14:17:43 Anastacia kernel: ffff810001409c58 0000000000000086 0000000000a51618 0000000000a51720 Jan 26 14:17:43 Anastacia kernel: 0000000000a51828 ffffffff80623980 ffffffff80623980 ffffffff80623980 Jan 26 14:17:43 Anastacia kernel: ffffffff8061fe80 ffffffff80623980 ffff8100339758a8 ffff810033974f48 Jan 26 14:17:43 Anastacia kernel: Call Trace: Jan 26 14:17:43 Anastacia kernel: [enqueue_entity+55/112] enqueue_entity+0x37/0x70 Jan 26 14:17:43 Anastacia kernel: [enqueue_task_fair+56/80] enqueue_task_fair+0x38/0x50 Jan 26 14:17:43 Anastacia kernel: [futex_wait+838/912] futex_wait+0x346/0x390 Jan 26 14:17:43 Anastacia kernel: [__wake_up+67/112] __wake_up+0x43/0x70 Jan 26 14:17:43 Anastacia kernel: [wake_futex+57/80] wake_futex+0x39/0x50 Jan 26 14:17:43 Anastacia kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: [do_futex+287/3008] do_futex+0x11f/0xbc0 Jan 26 14:17:43 Anastacia kernel: [__up_read+33/176] __up_read+0x21/0xb0 Jan 26 14:17:43 Anastacia kernel: [do_page_fault+411/2000] do_page_fault+0x19b/0x7d0 Jan 26 14:17:43 Anastacia kernel: [sys_futex+171/304] sys_futex+0xab/0x130 Jan 26 14:17:43 Anastacia kernel: [system_call+126/131] system_call+0x7e/0x83 Jan 26 14:17:43 Anastacia kernel: Jan 26 14:17:43 Anastacia kernel: pan S 0000000000000000 0 7568 1 Jan 26 14:17:43 Anastacia kernel: ffff8100013fbc58 0000000000000086 00002aaaabbe4320 ffffffffffffadbc Jan 26 14:17:43 Anastacia kernel: 0000000000000000 ffffffff80623980 ffffffff80623980 ffffffff80623980 Jan 26 14:17:43 Anastacia kernel: ffffffff8061fe80 ffffffff80623980 ffff810033975128 0000000001137ad0 Jan 26 14:17:43 Anastacia kernel: Call Trace: Jan 26 14:17:43 Anastacia kernel: [futex_wait+838/912] futex_wait+0x346/0x390 Jan 26 14:17:43 Anastacia kernel: [zone_statistics+125/128] zone_statistics+0x7d/0x80 Jan 26 14:17:43 Anastacia kernel: [__alloc_pages+170/976] __alloc_pages+0xaa/0x3d0 Jan 26 14:17:43 Anastacia kernel: [default_wake_function+0/16] default_wake_function+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: [do_futex+287/3008] do_futex+0x11f/0xbc0 Jan 26 14:17:43 Anastacia kernel: [__up_read+33/176] __up_read+0x21/0xb0 Jan 26 14:17:43 Anastacia kernel: [do_page_fault+411/2000] do_page_fault+0x19b/0x7d0 Jan 26 14:17:43 Anastacia kernel: [sys_futex+171/304] sys_futex+0xab/0x130 SysRq : Show Regs CPU 0: Modules linked in: usb_storage af_packet nvidia(P) vboxdrv ipv6 fuse snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq cpufreq_ondemand video output tc1100_wmi sbs sbshc container dock battery ac binfmt_misc loop ext3 jbd dm_mirror sr_mod dm_mod pata_amd ata_generic sata_sil usbmouse usbhid ff_memless floppy usblp powernow_k8 freq_table snd_pcm_oss snd_mixer_oss snd_mpu401 snd_mpu401_uart snd_rawmidi ns558 gameport parport_pc snd_seq_device parport rtc_cmos pcspkr snd_intel8x0 k8temp snd_ac97_codec ohci1394 ac97_bus ieee1394 snd_pcm snd_timer skge ohci_hcd ehci_hcd snd soundcore usbcore forcedeth snd_page_alloc ssb fan pcmcia pcmcia_core i2c_nforce2 i2c_core button thermal processor sg evdev genrtc xfs scsi_wait_scan sd_mod sata_nv libata scsi_mod Pid: 7564, comm: pan Tainted: P 2.6.24-desktop-0.rc8.2.1mdv #1 RIP: 0010:[<ffffffff802d5a57>] [<ffffffff802d5a57>] block_write_begin +0x87/0xe0 RSP: 0018:ffff81002e9b5ac8 EFLAGS: 00000286 RAX: ffff81003f5b3248 RBX: 00000000fffffff4 RCX: 0000000000000000 RDX: ffff81003f5b3248 RSI: 0000000000000000 RDI: ffff81002d1eda18 RBP: ffff81003f5b3248 R08: 0000000000000000 R09: ffff81002e9b5be0 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffffffff880c28b0 R14: 0000000000001000 R15: 000000003f5b3248 FS: 00002b6bb3bf7960(0000) GS:ffffffff80589000(0000) knlGS:00000000f78568d0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b0537cf6000 CR3: 00000000391d2000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff802d5a4f>] block_write_begin+0x7f/0xe0 [<ffffffff880c2102>] :xfs:xfs_vm_write_begin+0x22/0x30 [<ffffffff880c28b0>] :xfs:xfs_get_blocks+0x0/0x10 [<ffffffff80280ba9>] generic_file_buffered_write+0x149/0x6c0 [<ffffffff80240a2e>] current_fs_time+0x1e/0x30 [<ffffffff880caae6>] :xfs:xfs_write+0x676/0x910 [<ffffffff8027f98d>] find_lock_page+0x3d/0xc0 [<ffffffff880c69d0>] :xfs:xfs_file_aio_write+0x0/0x50 [<ffffffff802aee9b>] do_sync_readv_writev+0xcb/0x110 [<ffffffff8028eb95>] __do_fault+0x1f5/0x420 [<ffffffff802522b0>] autoremove_wake_function+0x0/0x30 [<ffffffff80290b90>] handle_mm_fault+0x540/0x800 [<ffffffff802aecdd>] rw_copy_check_uvector+0x9d/0x150 [<ffffffff802af5fd>] do_readv_writev+0xfd/0x230 [<ffffffff802afc33>] sys_writev+0x53/0x90 [<ffffffff8020c36e>] system_call+0x7e/0x83 -- Frederik Himpe <fhimpe@telenet.be> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-26 13:29 ` Frederik Himpe @ 2008-01-26 13:46 ` Nick Piggin 2008-01-26 14:27 ` Pascal Terjan 2008-01-28 1:46 ` Nick Piggin 1 sibling, 1 reply; 19+ messages in thread From: Nick Piggin @ 2008-01-26 13:46 UTC (permalink / raw) To: Frederik Himpe; +Cc: Mike Galbraith, linux-kernel On Sunday 27 January 2008 00:29, Frederik Himpe wrote: > On di, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > > > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > > > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > > > > reader starts using 100% of CPU time after some time. When this > > > > > happens, kill -9 does not work, and strace just hangs when trying > > > > > to attach to the process. The same with gdb. ps shows the process > > > > > as being in the R state. > > > > > > > > > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > > > > > Jan 21 21:45:01 Anastacia kernel: pan R running task > > > > > 0 > > > > Nasty. The attached patch is something really simple that can sometimes > > help. sysrq+p is also an option, if you're on a UP system. > > > > Any luck getting traces? > > I just succeeded to reproduce the problem with this patch. Does this > smell like an XFS problem? Possible. Though I think it is more likely to be a bug in the new deadlock avoidance code in the generic buffered write path. Dang... I wonder why this hasn't come up earlier. It looks like pan's use of writev might be tickling it. How quickly can you reproduce this? Can you use strace to see what the hanging syscall looks like? Thanks, Nick > Jan 26 14:17:43 Anastacia kernel: pan R running task 0 > 7564 1 Jan 26 14:17:43 Anastacia kernel: 000000003f5b3248 > 0000000000001000 ffffffff880c28b0 0000000000000000 Jan 26 14:17:43 > Anastacia kernel: ffff81003f5b3248 ffff81002d1ed900 000000002d1ed900 > 0000000000000000 Jan 26 14:17:43 Anastacia kernel: ffff810016050dd0 > fffff000fffff000 0000000000000000 ffff81002d1eda10 Jan 26 14:17:43 > Anastacia kernel: Call Trace: > Jan 26 14:17:43 Anastacia kernel: [_end+127964408/2129947720] > :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: > [unix_poll+0/176] unix_poll+0x0/0xb0 Jan 26 14:17:43 Anastacia kernel: > [_end+127964408/2129947720] :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 > Anastacia kernel: [iov_iter_copy_from_user_atomic+65/160] > iov_iter_copy_from_user_atomic+0x41/0xa0 Jan 26 14:17:43 Anastacia kernel: > [iov_iter_copy_from_user_atomic+46/160] > iov_iter_copy_from_user_atomic+0x2e/0xa0 Jan 26 14:17:43 Anastacia kernel: > [generic_file_buffered_write+383/1728] > generic_file_buffered_write+0x17f/0x6c0 Jan 26 14:17:43 Anastacia kernel: > [current_fs_time+30/48] current_fs_time+0x1e/0x30 Jan 26 14:17:43 Anastacia > kernel: [_end+127997742/2129947720] :xfs:xfs_write+0x676/0x910 Jan 26 > 14:17:43 Anastacia kernel: [find_lock_page+61/192] > find_lock_page+0x3d/0xc0 Jan 26 14:17:43 Anastacia kernel: > [_end+127981080/2129947720] :xfs:xfs_file_aio_write+0x0/0x50 Jan 26 > 14:17:43 Anastacia kernel: [do_sync_readv_writev+203/272] > do_sync_readv_writev+0xcb/0x110 Jan 26 14:17:43 Anastacia kernel: > [__do_fault+501/1056] __do_fault+0x1f5/0x420 Jan 26 14:17:43 Anastacia > kernel: [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30 > Jan 26 14:17:43 Anastacia kernel: [handle_mm_fault+1344/2048] > handle_mm_fault+0x540/0x800 Jan 26 14:17:43 Anastacia kernel: > [rw_copy_check_uvector+157/336] rw_copy_check_uvector+0x9d/0x150 Jan 26 > 14:17:43 Anastacia kernel: [do_readv_writev+253/560] > do_readv_writev+0xfd/0x230 Jan 26 14:17:43 Anastacia kernel: > [sys_writev+83/144] sys_writev+0x53/0x90 Jan 26 14:17:43 Anastacia kernel: > [system_call+126/131] system_call+0x7e/0x83 Jan 26 14:17:43 Anastacia > kernel: > SysRq : Show Regs > CPU 0: > Modules linked in: usb_storage af_packet nvidia(P) vboxdrv ipv6 fuse > snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq cpufreq_ondemand > video output tc1100_wmi sbs sbshc container dock battery ac binfmt_misc > loop ext3 jbd dm_mirror sr_mod dm_mod pata_amd ata_generic sata_sil > usbmouse usbhid ff_memless floppy usblp powernow_k8 freq_table > snd_pcm_oss snd_mixer_oss snd_mpu401 snd_mpu401_uart snd_rawmidi ns558 > gameport parport_pc snd_seq_device parport rtc_cmos pcspkr snd_intel8x0 > k8temp snd_ac97_codec ohci1394 ac97_bus ieee1394 snd_pcm snd_timer skge > ohci_hcd ehci_hcd snd soundcore usbcore forcedeth snd_page_alloc ssb fan > pcmcia pcmcia_core i2c_nforce2 i2c_core button thermal processor sg > evdev genrtc xfs scsi_wait_scan sd_mod sata_nv libata scsi_mod > Pid: 7564, comm: pan Tainted: P 2.6.24-desktop-0.rc8.2.1mdv #1 > RIP: 0010:[<ffffffff802d5a57>] [<ffffffff802d5a57>] block_write_begin > +0x87/0xe0 > RSP: 0018:ffff81002e9b5ac8 EFLAGS: 00000286 > RAX: ffff81003f5b3248 RBX: 00000000fffffff4 RCX: 0000000000000000 > RDX: ffff81003f5b3248 RSI: 0000000000000000 RDI: ffff81002d1eda18 > RBP: ffff81003f5b3248 R08: 0000000000000000 R09: ffff81002e9b5be0 > R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 > R13: ffffffff880c28b0 R14: 0000000000001000 R15: 000000003f5b3248 > FS: 00002b6bb3bf7960(0000) GS:ffffffff80589000(0000) > knlGS:00000000f78568d0 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00002b0537cf6000 CR3: 00000000391d2000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Call Trace: > [<ffffffff802d5a4f>] block_write_begin+0x7f/0xe0 > [<ffffffff880c2102>] :xfs:xfs_vm_write_begin+0x22/0x30 > [<ffffffff880c28b0>] :xfs:xfs_get_blocks+0x0/0x10 > [<ffffffff80280ba9>] generic_file_buffered_write+0x149/0x6c0 > [<ffffffff80240a2e>] current_fs_time+0x1e/0x30 > [<ffffffff880caae6>] :xfs:xfs_write+0x676/0x910 > [<ffffffff8027f98d>] find_lock_page+0x3d/0xc0 > [<ffffffff880c69d0>] :xfs:xfs_file_aio_write+0x0/0x50 > [<ffffffff802aee9b>] do_sync_readv_writev+0xcb/0x110 > [<ffffffff8028eb95>] __do_fault+0x1f5/0x420 > [<ffffffff802522b0>] autoremove_wake_function+0x0/0x30 > [<ffffffff80290b90>] handle_mm_fault+0x540/0x800 > [<ffffffff802aecdd>] rw_copy_check_uvector+0x9d/0x150 > [<ffffffff802af5fd>] do_readv_writev+0xfd/0x230 > [<ffffffff802afc33>] sys_writev+0x53/0x90 > [<ffffffff8020c36e>] system_call+0x7e/0x83 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-26 13:46 ` Nick Piggin @ 2008-01-26 14:27 ` Pascal Terjan 2008-01-28 1:49 ` Nick Piggin 0 siblings, 1 reply; 19+ messages in thread From: Pascal Terjan @ 2008-01-26 14:27 UTC (permalink / raw) To: linux-kernel Nick Piggin <nickpiggin <at> yahoo.com.au> writes: > On Sunday 27 January 2008 00:29, Frederik Himpe wrote: > > I just succeeded to reproduce the problem with this patch. Does this > > smell like an XFS problem? I got the same issue using ext3 > Possible. Though I think it is more likely to be a bug in the > new deadlock avoidance code in the generic buffered write path. > Dang... I wonder why this hasn't come up earlier. It looks like > pan's use of writev might be tickling it. > > How quickly can you reproduce this? When I was using pan daily one month ago, I got it twice over a week > Can you use strace to see what the hanging syscall looks like? I tried last week during 5 hours without luck, I can try again ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-26 14:27 ` Pascal Terjan @ 2008-01-28 1:49 ` Nick Piggin 0 siblings, 0 replies; 19+ messages in thread From: Nick Piggin @ 2008-01-28 1:49 UTC (permalink / raw) To: Pascal Terjan; +Cc: linux-kernel On Sunday 27 January 2008 01:27, Pascal Terjan wrote: > Nick Piggin <nickpiggin <at> yahoo.com.au> writes: > > On Sunday 27 January 2008 00:29, Frederik Himpe wrote: > > > I just succeeded to reproduce the problem with this patch. Does this > > > smell like an XFS problem? > > I got the same issue using ext3 > > > Possible. Though I think it is more likely to be a bug in the > > new deadlock avoidance code in the generic buffered write path. > > Dang... I wonder why this hasn't come up earlier. It looks like > > pan's use of writev might be tickling it. > > > > How quickly can you reproduce this? > > When I was using pan daily one month ago, I got it twice over a week > > > Can you use strace to see what the hanging syscall looks like? > > I tried last week during 5 hours without luck, I can try again Dang, I didn't see any reports of this earlier :( Anyway, I sent a patch to fix it in the original thread (can you reply-to-all please? just it a bit easier to keep threads together) Thanks, Nick ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-26 13:29 ` Frederik Himpe 2008-01-26 13:46 ` Nick Piggin @ 2008-01-28 1:46 ` Nick Piggin 2008-01-28 18:05 ` Frederik Himpe 2008-01-31 22:45 ` Frederik Himpe 1 sibling, 2 replies; 19+ messages in thread From: Nick Piggin @ 2008-01-28 1:46 UTC (permalink / raw) To: Frederik Himpe, Andrew Morton, stable; +Cc: Mike Galbraith, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2137 bytes --] On Sunday 27 January 2008 00:29, Frederik Himpe wrote: > On di, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > > > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > > > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > > > > reader starts using 100% of CPU time after some time. When this > > > > > happens, kill -9 does not work, and strace just hangs when trying > > > > > to attach to the process. The same with gdb. ps shows the process > > > > > as being in the R state. > > > > > > > > > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > > > > > Jan 21 21:45:01 Anastacia kernel: pan R running task > > > > > 0 > > > > Nasty. The attached patch is something really simple that can sometimes > > help. sysrq+p is also an option, if you're on a UP system. > > > > Any luck getting traces? > > I just succeeded to reproduce the problem with this patch. Does this > smell like an XFS problem? > > Jan 26 14:17:43 Anastacia kernel: pan R running task 0 > 7564 1 Jan 26 14:17:43 Anastacia kernel: 000000003f5b3248 > 0000000000001000 ffffffff880c28b0 0000000000000000 Jan 26 14:17:43 > Anastacia kernel: ffff81003f5b3248 ffff81002d1ed900 000000002d1ed900 > 0000000000000000 Jan 26 14:17:43 Anastacia kernel: ffff810016050dd0 > fffff000fffff000 0000000000000000 ffff81002d1eda10 Jan 26 14:17:43 > Anastacia kernel: Call Trace: > Jan 26 14:17:43 Anastacia kernel: [_end+127964408/2129947720] > :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: > [unix_poll+0/176] unix_poll+0x0/0xb0 Jan 26 14:17:43 Anastacia kernel: > [_end+127964408/2129947720] :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 > Anastacia kernel: [iov_iter_copy_from_user_atomic+65/160] > iov_iter_copy_from_user_atomic+0x41/0xa0 Jan 26 14:17:43 Anastacia kernel: > [iov_iter_copy_from_user_atomic+46/160] > iov_iter_copy_from_user_atomic+0x2e/0xa0 Jan 26 14:17:43 Anastacia kernel: > [generic_file_buffered_write+383/1728] Well after trying a lot of writev combinations, I've reproduced a hang *hangs head*. Does this help? [-- Attachment #2: mm-zerolen-iov-fix.patch --] [-- Type: text/x-diff, Size: 1681 bytes --] Zero length iovecs can go into an infinite loop in writev, because the iovec iterator does not always advance over them. The sequence required to trigger this is not trivial. I think it requires that a zero-length iovec be followed by a non-zero-length iovec which causes a pagefault in the atomic usercopy. This causes the writev code to drop back into single-segment copy mode, which then tries to copy the 0 bytes of the zero-length iovec; a zero length copy looks like a failure though, so it loops. Put a test into iov_iter_advance to catch zero-length iovecs. We could just put the test in the fallback path, but I feel it is more robust to skip over zero-length iovecs throughout the code (iovec iterator may be used in filesystems too, so it should be robust). Signed-off-by: Nick Piggin <npiggin@suse.de> --- Index: linux-2.6/mm/filemap.c =================================================================== --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -1733,7 +1733,11 @@ static void __iov_iter_advance_iov(struc const struct iovec *iov = i->iov; size_t base = i->iov_offset; - while (bytes) { + /* + * The !iov->iov_len check ensures we skip over unlikely + * zero-length segments. + */ + while (bytes || !iov->iov_len) { int copy = min(bytes, iov->iov_len - base); bytes -= copy; @@ -2251,6 +2255,7 @@ again: cond_resched(); + iov_iter_advance(i, copied); if (unlikely(copied == 0)) { /* * If we were unable to copy any data at all, we must @@ -2264,7 +2269,6 @@ again: iov_iter_single_seg_count(i)); goto again; } - iov_iter_advance(i, copied); pos += copied; written += copied; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-28 1:46 ` Nick Piggin @ 2008-01-28 18:05 ` Frederik Himpe 2008-01-31 22:45 ` Frederik Himpe 1 sibling, 0 replies; 19+ messages in thread From: Frederik Himpe @ 2008-01-28 18:05 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, stable, Mike Galbraith, linux-kernel On ma, 2008-01-28 at 12:46 +1100, Nick Piggin wrote: > On Sunday 27 January 2008 00:29, Frederik Himpe wrote: > > On di, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > > > > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > > > > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > > > > > reader starts using 100% of CPU time after some time. When this > > > > > > happens, kill -9 does not work, and strace just hangs when trying > > > > > > to attach to the process. The same with gdb. ps shows the process > > > > > > as being in the R state. > > > > > > > > > > > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > > > > > > Jan 21 21:45:01 Anastacia kernel: pan R running task > > > > > > 0 > > > > > > Nasty. The attached patch is something really simple that can sometimes > > > help. sysrq+p is also an option, if you're on a UP system. > > > > > > Any luck getting traces? > > > > I just succeeded to reproduce the problem with this patch. Does this > > smell like an XFS problem? > > > > Jan 26 14:17:43 Anastacia kernel: pan R running task 0 > > 7564 1 Jan 26 14:17:43 Anastacia kernel: 000000003f5b3248 > > 0000000000001000 ffffffff880c28b0 0000000000000000 Jan 26 14:17:43 > > Anastacia kernel: ffff81003f5b3248 ffff81002d1ed900 000000002d1ed900 > > 0000000000000000 Jan 26 14:17:43 Anastacia kernel: ffff810016050dd0 > > fffff000fffff000 0000000000000000 ffff81002d1eda10 Jan 26 14:17:43 > > Anastacia kernel: Call Trace: > > Jan 26 14:17:43 Anastacia kernel: [_end+127964408/2129947720] > > :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 Anastacia kernel: > > [unix_poll+0/176] unix_poll+0x0/0xb0 Jan 26 14:17:43 Anastacia kernel: > > [_end+127964408/2129947720] :xfs:xfs_get_blocks+0x0/0x10 Jan 26 14:17:43 > > Anastacia kernel: [iov_iter_copy_from_user_atomic+65/160] > > iov_iter_copy_from_user_atomic+0x41/0xa0 Jan 26 14:17:43 Anastacia kernel: > > [iov_iter_copy_from_user_atomic+46/160] > > iov_iter_copy_from_user_atomic+0x2e/0xa0 Jan 26 14:17:43 Anastacia kernel: > > [generic_file_buffered_write+383/1728] > > Well after trying a lot of writev combinations, I've reproduced a hang > *hangs head*. > > Does this help? I'm currently running with this patch. The problem happens about two times a week, so it will take a few days to come to a conclusion whether it is fixed. I'll let you all know. Thanks for the patch! -- Frederik Himpe <fhimpe@telenet.be> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-28 1:46 ` Nick Piggin 2008-01-28 18:05 ` Frederik Himpe @ 2008-01-31 22:45 ` Frederik Himpe 2008-02-02 0:53 ` Nick Piggin 1 sibling, 1 reply; 19+ messages in thread From: Frederik Himpe @ 2008-01-31 22:45 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, stable, Mike Galbraith, linux-kernel On ma, 2008-01-28 at 12:46 +1100, Nick Piggin wrote: > On Sunday 27 January 2008 00:29, Frederik Himpe wrote: > > On di, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > > > > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > > > > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > > > > > reader starts using 100% of CPU time after some time. When this > > > > > > happens, kill -9 does not work, and strace just hangs when trying > > > > > > to attach to the process. The same with gdb. ps shows the process > > > > > > as being in the R state. > Well after trying a lot of writev combinations, I've reproduced a hang > *hangs head*. > > Does this help? Just to confirm: in four days of testing, I haven't seen the problem anymore, so it looks like this was indeed the right fix. Thanks! -- Frederik Himpe <fhimpe@telenet.be> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-31 22:45 ` Frederik Himpe @ 2008-02-02 0:53 ` Nick Piggin 0 siblings, 0 replies; 19+ messages in thread From: Nick Piggin @ 2008-02-02 0:53 UTC (permalink / raw) To: Frederik Himpe; +Cc: Andrew Morton, stable, Mike Galbraith, linux-kernel On Friday 01 February 2008 09:45, Frederik Himpe wrote: > On ma, 2008-01-28 at 12:46 +1100, Nick Piggin wrote: > > On Sunday 27 January 2008 00:29, Frederik Himpe wrote: > > > On di, 2008-01-22 at 16:25 +1100, Nick Piggin wrote: > > > > > > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > > > > > > With Linux 2.6.24-rc8 I often have the problem that the pan > > > > > > > usenet reader starts using 100% of CPU time after some time. > > > > > > > When this happens, kill -9 does not work, and strace just hangs > > > > > > > when trying to attach to the process. The same with gdb. ps > > > > > > > shows the process as being in the R state. > > > > Well after trying a lot of writev combinations, I've reproduced a hang > > *hangs head*. > > > > Does this help? > > Just to confirm: in four days of testing, I haven't seen the problem > anymore, so it looks like this was indeed the right fix. Thanks very much for reporting and testing. This patch needs to go into 2.6.24.stable and upstream. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 0:05 ` Nick Piggin 2008-01-22 5:03 ` Mike Galbraith @ 2008-01-22 10:37 ` Ingo Molnar 2008-01-22 23:00 ` Nick Piggin 1 sibling, 1 reply; 19+ messages in thread From: Ingo Molnar @ 2008-01-22 10:37 UTC (permalink / raw) To: Nick Piggin; +Cc: Frederik Himpe, linux-kernel * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote: > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet > > reader starts using 100% of CPU time after some time. When this happens, > > kill -9 does not work, and strace just hangs when trying to attach to > > the process. The same with gdb. ps shows the process as being in the R > > state. > > > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan: > > Jan 21 21:45:01 Anastacia kernel: pan R running task 0 > > Well I've twice tried to submit a patch to print stacks for running > tasks as well, but nobody seems interested. It would at least give a > chance to see something. i definitely remembering having done this myself a couple of times (it makes tons of sense to get _some_ info out of the system) but some problem in -mm kept reverting it. I dont remember the specifics ... it was some race. Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: 2.6.24 regression: pan hanging unkilleable and un-straceable 2008-01-22 10:37 ` Ingo Molnar @ 2008-01-22 23:00 ` Nick Piggin 0 siblings, 0 replies; 19+ messages in thread From: Nick Piggin @ 2008-01-22 23:00 UTC (permalink / raw) To: Ingo Molnar; +Cc: Frederik Himpe, linux-kernel On Tuesday 22 January 2008 21:37, Ingo Molnar wrote: > * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > Well I've twice tried to submit a patch to print stacks for running > > tasks as well, but nobody seems interested. It would at least give a > > chance to see something. > > i definitely remembering having done this myself a couple of times (it > makes tons of sense to get _some_ info out of the system) but some > problem in -mm kept reverting it. I dont remember the specifics ... it > was some race. Hmm, that's not unlikely. But there is nothing in the backtrace code which prevents a task from being woken up anyway, is there? I guess it will be more common now, but if we find a race we can try to fix the root cause. ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-02-04 23:03 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-01-21 20:58 2.6.24 regression: pan hanging unkilleable and un-straceable Frederik Himpe 2008-01-22 0:05 ` Nick Piggin 2008-01-22 5:03 ` Mike Galbraith 2008-01-22 5:25 ` Nick Piggin 2008-01-22 5:47 ` Mike Galbraith 2008-02-04 14:49 ` Mike Galbraith 2008-02-04 23:02 ` Nick Piggin 2008-01-22 10:38 ` Ingo Molnar 2008-01-24 5:30 ` Valdis.Kletnieks 2008-01-26 13:29 ` Frederik Himpe 2008-01-26 13:46 ` Nick Piggin 2008-01-26 14:27 ` Pascal Terjan 2008-01-28 1:49 ` Nick Piggin 2008-01-28 1:46 ` Nick Piggin 2008-01-28 18:05 ` Frederik Himpe 2008-01-31 22:45 ` Frederik Himpe 2008-02-02 0:53 ` Nick Piggin 2008-01-22 10:37 ` Ingo Molnar 2008-01-22 23:00 ` Nick Piggin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).