LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: 2.6.21-mm1: many processes end up in D state
@ 2007-05-04 19:19 Mikael Pettersson
  2007-05-05 15:30 ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Mikael Pettersson @ 2007-05-04 19:19 UTC (permalink / raw)
  To: jirislaby; +Cc: akpm, jgarzik, linux-ide, linux-kernel, mikpe, stable

On Fri, 04 May 2007 17:02:10 +0200, Jiri Slaby wrote:
>>>> I have a problem with higher disk loads (e.g. running git-log or yum update).
>>>> Many processes end up in D state and system is unusable -- I'm not able to run
>>>> anything but smooth mouse moving when this happens.
...(boring stack dumps deleted)
>> causes this? When I change this:
>> diff --git a/drivers/ata/sata_promise.c b/drivers/ata/sata_promise.c
>> index f56549b..a0024ae 100644
>> --- a/drivers/ata/sata_promise.c
>> +++ b/drivers/ata/sata_promise.c
>> @@ -668,10 +668,8 @@ static inline unsigned int pdc_host_intr( struct ata_port *ap,
>>         else
>>                 err_mask &= ~PDC2_ERR_MASK;
>>         port_status = readl(port_mmio + PDC_GLOBAL_CTL);
>> -       if (unlikely(port_status & err_mask)) {
>> +       if (unlikely(port_status & err_mask))
>>                 pdc_error_intr(ap, qc, port_status, err_mask);
>> -               return 1;
>> -       }
>> 
>>         switch (qc->tf.protocol) {
>>         case ATA_PROT_DMA:
>> the problem disappears.
>Actually changes:
>ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>ata6.00: (port_status 0x20080000)
>ata6.00: cmd c8/00:08:03:d3:61/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
>         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x6 (timeout)
>ata6: soft resetting port
>ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>ata6.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
>ata6.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
>ata6.00: configured for UDMA/133
>ata6: EH complete
>sd 5:0:0:0: [sdc] 156301488 512-byte hardware sectors (80026 MB)
>sd 5:0:0:0: [sdc] Write Protect is off
>sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO
>or FUA
>
>
>to
>ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
>ata5.00: (port_status 0x20080000)
>ata5.00: cmd c8/00:08:bf:cd:4b/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
>         res 50/00:00:c6:cd:4b/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
>ata5: soft resetting port
>ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>ata5.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
>ata5.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
>ata5.00: configured for UDMA/133
>ata5: EH complete
>sd 4:0:0:0: [sdb] 156301488 512-byte hardware sectors (80026 MB)
>sd 4:0:0:0: [sdb] Write Protect is off
>sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
>sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO
>or FUA
>
>
>I.e. no freezing of ports...

Your patch to delete the 'return 1;' on error is correct,
and makes the code match exactly the behaviour of previous
versions of sata_promise, except for the additional error
decoding.

ahci and sata_sil24 do the return in this situation. I don't
yet understand why they can get away with it while sata_promise
cannot, but for now the return should be removed.

/Mikael

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-05-04 19:19 2.6.21-mm1: many processes end up in D state Mikael Pettersson
@ 2007-05-05 15:30 ` Tejun Heo
  0 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2007-05-05 15:30 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: jirislaby, akpm, jgarzik, linux-ide, linux-kernel, stable

Mikael Pettersson wrote:
>> I.e. no freezing of ports...
> 
> Your patch to delete the 'return 1;' on error is correct,
> and makes the code match exactly the behaviour of previous
> versions of sata_promise, except for the additional error
> decoding.
> 
> ahci and sata_sil24 do the return in this situation. I don't
> yet understand why they can get away with it while sata_promise
> cannot, but for now the return should be removed.

That's because sata_sil24 and ahci call either ata_port_abort() or
ata_port_freeze() prior to finishing error_intr routine.  Both functions
abort all in-flight command and schedule EH.

--
tejun


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-05-05 23:46 Mikael Pettersson
@ 2007-05-07  9:52 ` Jiri Slaby
  0 siblings, 0 replies; 12+ messages in thread
From: Jiri Slaby @ 2007-05-07  9:52 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: htejun, akpm, jgarzik, linux-ide, linux-kernel, stable

Mikael Pettersson napsal(a):
> Jiri: please test the patch below instead. That is, revert to
> the original code _with_ the 'return 1;', and then add this
> patch to it. It should have pretty much the same effect as
> removing the 'return 1;', however calling ata_port_abort()
> is more in line with libata's new-style error handling.
> 
> /Mikael
> 
> --- linux-2.6.21-mm1/drivers/ata/sata_promise.c.~1~	2007-05-05 22:24:41.000000000 +0200
> +++ linux-2.6.21-mm1/drivers/ata/sata_promise.c	2007-05-05 22:25:21.000000000 +0200
> @@ -653,6 +653,8 @@ static void pdc_error_intr(struct ata_po
>  	qc->err_mask |= ac_err_mask;
>  
>  	pdc_reset_port(ap);
> +
> +	ata_port_abort(ap);
>  }
>  
>  static inline unsigned int pdc_host_intr( struct ata_port *ap,
> 

Yes, no freezing anymore.

thanks,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
@ 2007-05-05 23:46 Mikael Pettersson
  2007-05-07  9:52 ` Jiri Slaby
  0 siblings, 1 reply; 12+ messages in thread
From: Mikael Pettersson @ 2007-05-05 23:46 UTC (permalink / raw)
  To: htejun, mikpe; +Cc: akpm, jgarzik, jirislaby, linux-ide, linux-kernel, stable

On Sat, 05 May 2007 17:30:51 +0200, Tejun Heo wrote:
> Mikael Pettersson wrote:
> >> I.e. no freezing of ports...
> > 
> > Your patch to delete the 'return 1;' on error is correct,
> > and makes the code match exactly the behaviour of previous
> > versions of sata_promise, except for the additional error
> > decoding.
> > 
> > ahci and sata_sil24 do the return in this situation. I don't
> > yet understand why they can get away with it while sata_promise
> > cannot, but for now the return should be removed.
> 
> That's because sata_sil24 and ahci call either ata_port_abort() or
> ata_port_freeze() prior to finishing error_intr routine.  Both functions
> abort all in-flight command and schedule EH.

Ah. Thanks, that clarifies things.

Jiri: please test the patch below instead. That is, revert to
the original code _with_ the 'return 1;', and then add this
patch to it. It should have pretty much the same effect as
removing the 'return 1;', however calling ata_port_abort()
is more in line with libata's new-style error handling.

/Mikael

--- linux-2.6.21-mm1/drivers/ata/sata_promise.c.~1~	2007-05-05 22:24:41.000000000 +0200
+++ linux-2.6.21-mm1/drivers/ata/sata_promise.c	2007-05-05 22:25:21.000000000 +0200
@@ -653,6 +653,8 @@ static void pdc_error_intr(struct ata_po
 	qc->err_mask |= ac_err_mask;
 
 	pdc_reset_port(ap);
+
+	ata_port_abort(ap);
 }
 
 static inline unsigned int pdc_host_intr( struct ata_port *ap,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-05-04 14:58   ` Jiri Slaby
@ 2007-05-04 15:02     ` Jiri Slaby
  0 siblings, 0 replies; 12+ messages in thread
From: Jiri Slaby @ 2007-05-04 15:02 UTC (permalink / raw)
  Cc: stable, Andrew Morton, Linux Kernel Mailing List,
	Mikael Pettersson, linux-ide, Jeff Garzik



Jiri Slaby napsal(a):
> Cc: Mikael Pettersson <mikpe@it.uu.se>
> Cc: linux-ide@vger.kernel.org
> Cc: Jeff Garzik <jgarzik@pobox.com>
> 
> Jiri Slaby napsal(a):
>> Cc: stable@kernel.org
>>
>> Jiri Slaby napsal(a):
>>> Hi,
>>>
>>> I have a problem with higher disk loads (e.g. running git-log or yum update).
>>> Many processes end up in D state and system is unusable -- I'm not able to run
>>> anything but smooth mouse moving when this happens.
>>>
>>> If I wait for a 20-30sec it becomes usable. This happens in 2.6.21-rc7-mm2 and
>>> also in 2007-04-28-05-06 broken-out snapshot. I think 2.6.21-rc6-mm1 worked
>>> fine, but I'm uncertain. If it is important, let me know to re-test.
>>>
>>> sysrq-t:
>>>
>>> yum           D 0000004C     0  2976   2955 (NOTLB)
>>>        c455bd00 00200082 8c68ea9e 0000004c 00000000 00000000 c455bcdc c455bc9c
>>>        c455bc9c c2d6a580 8c68ea9e 0000004c c455bcc0 c2d6a694 c1809980 00000000
>>>        c455bcd0 c4640c80 000036a0 00000001 c455bcf0 c02a7132 c3682164 00001cde
>>> Call Trace:
>>>  [<c037834c>] io_schedule+0x21/0x2e
>>>  [<c014d505>] sync_page+0x38/0x43
>>>  [<c0378581>] __wait_on_bit_lock+0x40/0x63
>>>  [<c014d4b6>] __lock_page+0x54/0x5c
>>>  [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
>>>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>>>  [<c016c31c>] do_sync_read+0xd0/0x106
>>>  [<c016cada>] vfs_read+0x89/0x11d
>>>  [<c016d02c>] sys_pread64+0x64/0x68
>>>  [<c0103ffc>] syscall_call+0x7/0xb
>>>  [<47176012>] 0x47176012
>>>  =======================
>>> xterm         S 00000051     0  2980      1 (NOTLB)
>>>        c4678ae0 00000082 b2fbf0ce 00000051 00000000 00000000 00000000 00000000
>>>        c4678b30 c469d540 b2fbf0ce 00000051 c4678ab0 c469d654 c1813980 00000001
>>>        c4678ad0 c47f0e40 c4678b00 00000286 c4678ae0 c012c077 00000001 00003286
>>> Call Trace:
>>>  [<c03784ad>] schedule_timeout+0x44/0xa4
>>>  [<c0177ee3>] do_select+0x46d/0x55b
>>>  [<c017816f>] core_sys_select+0x19e/0x2c0
>>>  [<c01786f3>] sys_select+0xd3/0x18d
>>>  [<c0103ffc>] syscall_call+0x7/0xb
>>>  [<47180f98>] 0x47180f98
>>>  =======================
>>> bash          S 0000004A     0  2982   2980 (NOTLB)
>>>        c50b3e80 00000086 ccf4cc7e 0000004a 00000000 c021bed9 00000000 c45c0030
>>>        c50b3e30 c45c0030 ccf4cc7e 0000004a c50b3e50 c45c0144 c1809980 00000000
>>>        c1809e44 c4517900 00000000 000000b3 c50b3e90 c014f841 c477cb40 c2d79000
>>> Call Trace:
>>>  [<c03784d3>] schedule_timeout+0x6a/0xa4
>>>  [<c0221960>] read_chan+0x1b9/0x5b9
>>>  [<c021e48f>] tty_read+0x75/0xaa
>>>  [<c016cada>] vfs_read+0x89/0x11d
>>>  [<c016cf3d>] sys_read+0x3d/0x64
>>>  [<c0103ffc>] syscall_call+0x7/0xb
>>>  [<4717883e>] 0x4717883e
>>>  =======================
>>> metacity      S 00000050     0  3005   2429 (NOTLB)
>>>        c2fd5bb0 00000086 5205656d 00000050 00000000 00000000 00000000 c441c51c
>>>        00000246 c45c0540 5205656d 00000050 c2fd5b80 c45c0654 c1809980 00000000
>>>        c1813e44 c27f4900 00002cb8 00000246 c2fd5ba0 c0135a80 c2fd5ba8 00002cb2
>>> Call Trace:
>>>  [<c03784d3>] schedule_timeout+0x6a/0xa4
>>>  [<c0177905>] do_sys_poll+0x33f/0x46d
>>>  [<c0177a74>] sys_poll+0x41/0x43
>>>  [<c0103ffc>] syscall_call+0x7/0xb
>>>  [<4717e3a6>] 0x4717e3a6
>>>  =======================
>>> ccpd          D 0000004C     0  3009   2010 (NOTLB)
>>>        c50abbb0 00000082 f08ef585 0000004c 00000000 00000000 c180abc0 c22fc788
>>>        c50abba0 c505d070 f08ef585 0000004c 00000000 c505d184 c1809980 00000000
>>>        c21167f0 c2fc6740 c5794e3c c31eff74 c5794310 c22fc788 00000002 00001e64
>>> Call Trace:
>>>  [<c01b64d9>] do_get_write_access+0x310/0x4f5
>>>  [<c01b66d9>] journal_get_write_access+0x1b/0x2a
>>>  [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
>>>  [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
>>>  [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
>>>  [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
>>>  [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
>>>  [<c017eacc>] touch_atime+0x84/0xd8
>>>  [<c014dd08>] do_generic_mapping_read+0x463/0x57e
>>>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>>>  [<c016c31c>] do_sync_read+0xd0/0x106
>>>  [<c016cada>] vfs_read+0x89/0x11d
>>>  [<c016fc6e>] kernel_read+0x36/0x48
>>>  [<c016fd32>] prepare_binprm+0xb2/0xdf
>>>  [<c0171322>] do_execve+0xe6/0x1e1
>>>  [<c0102715>] sys_execve+0x2e/0x7f
>>>  [<c0103ffc>] syscall_call+0x7/0xb
>>>  [<471474da>] 0x471474da
>>>  =======================
>>> bash          D 0000004D     0  3011   2859 (NOTLB)
>>>        c471dc90 00000082 08f8373c 0000004d 00000000 00000000 c471dc30 c471dc2c
>>>        c471dc2c c4670540 08f8373c 0000004d c471dc50 c4670654 c1809980 00000000
>>>        c471dc60 c2f3f040 00000005 00000001 c471dc80 c02a7132 c591af64 00001eea
>>> Call Trace:
>>>  [<c037834c>] io_schedule+0x21/0x2e
>>>  [<c014d505>] sync_page+0x38/0x43
>>>  [<c0378581>] __wait_on_bit_lock+0x40/0x63
>>>  [<c014d4b6>] __lock_page+0x54/0x5c
>>>  [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
>>>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>>>  [<c016c31c>] do_sync_read+0xd0/0x106
>>>  [<c016cada>] vfs_read+0x89/0x11d
>>>  [<c016fc6e>] kernel_read+0x36/0x48
>>>  [<c016fd32>] prepare_binprm+0xb2/0xdf
>>>  [<c0171322>] do_execve+0xe6/0x1e1
>>>  [<c0102715>] sys_execve+0x2e/0x7f
>>>  [<c0103ffc>] syscall_call+0x7/0xb
>>>  [<471474da>] 0x471474da
>>>  =======================
>>> bash          D 0000004D     0  3012   2884 (NOTLB)
>>>        c50d5a80 00000082 a21bfc35 0000004d 00000000 00000000 c180abc0 c2033dd4
>>>        c50d5a70 c2f4f070 a21bfc35 0000004d 00000000 c2f4f184 c1809980 00000000
>>>        c21167bc c47f0c80 c2033310 c22fc788 c5794e3c c2033dd4 00000002 0000216d
>>> Call Trace:
>>>  [<c01b64d9>] do_get_write_access+0x310/0x4f5
>>>  [<c01b66d9>] journal_get_write_access+0x1b/0x2a
>>>  [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
>>>  [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
>>>  [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
>>>  [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
>>>  [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
>>>  [<c017eacc>] touch_atime+0x84/0xd8
>>>  [<c017499a>] __link_path_walk+0x893/0xca4
>>>  [<c0174df1>] link_path_walk+0x46/0xc3
>>>  [<c0175038>] do_path_lookup+0x86/0x1b0
>>>  [<c0175ad0>] __path_lookup_intent_open+0x44/0x7f
>>>  [<c0175b83>] path_lookup_open+0x21/0x27
>>>  [<c016fd86>] open_exec+0x27/0xa2
>>>  [<c01993a3>] load_elf_binary+0x1482/0x1acf
>>>  [<c016fa1d>] search_binary_handler+0x6e/0x19f
>>>  [<c0171386>] do_execve+0x14a/0x1e1
>>>  [<c0102715>] sys_execve+0x2e/0x7f
>>>  [<c0103ffc>] syscall_call+0x7/0xb
>>>  [<471474da>] 0x471474da
>>>  =======================
>>>
>>> Note that yum works on lvm on raid0 and git too, but on the another md volume.
>>> Both ext3s. Drivers are sata_promise and ata_piix (sata disk); CFQ scheduler.
>>> Using noop is no change (but seems to be harder to reproduce with it). I figured
>>> out that it probably happens when 2+ processes are on both "processors" (HT on
>>> P4) and are IO wait (multiload-applet shows red above the half).
>>>
>>> Swap usage is 0 all the time.
>> I'm able to reproduce this in the latest git (dc87c3985e9b442c). Going to bisect.
> 
> Is it possible, that
> Commit: 176efb054422bffe6b5a34194ffad134366c7f7e
> Author: Mikael Pettersson <mikpe@it.uu.se> Wed, 14 Mar 2007 09:51:35 +0100
> 
>     sata_promise: decode and report error reasons
> 
>     This patch adds much needed error reason decoding and
>     reporting to sata_promise. It's simplistic but should
>     log all relevant error info the controller provides.
> 
>     Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
>     Signed-off-by: Jeff Garzik <jeff@garzik.org>
> 
> causes this? When I change this:
> diff --git a/drivers/ata/sata_promise.c b/drivers/ata/sata_promise.c
> index f56549b..a0024ae 100644
> --- a/drivers/ata/sata_promise.c
> +++ b/drivers/ata/sata_promise.c
> @@ -668,10 +668,8 @@ static inline unsigned int pdc_host_intr( struct ata_port *ap,
>         else
>                 err_mask &= ~PDC2_ERR_MASK;
>         port_status = readl(port_mmio + PDC_GLOBAL_CTL);
> -       if (unlikely(port_status & err_mask)) {
> +       if (unlikely(port_status & err_mask))
>                 pdc_error_intr(ap, qc, port_status, err_mask);
> -               return 1;
> -       }
> 
>         switch (qc->tf.protocol) {
>         case ATA_PROT_DMA:
> the problem disappears.
Actually changes:
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata6.00: (port_status 0x20080000)
ata6.00: cmd c8/00:08:03:d3:61/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x6 (timeout)
ata6: soft resetting port
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata6.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata6.00: configured for UDMA/133
ata6: EH complete
sd 5:0:0:0: [sdc] 156301488 512-byte hardware sectors (80026 MB)
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO
or FUA


to
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata5.00: (port_status 0x20080000)
ata5.00: cmd c8/00:08:bf:cd:4b/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
         res 50/00:00:c6:cd:4b/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
ata5: soft resetting port
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata5.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata5.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata5.00: configured for UDMA/133
ata5: EH complete
sd 4:0:0:0: [sdb] 156301488 512-byte hardware sectors (80026 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO
or FUA


I.e. no freezing of ports...


regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-05-01 10:06 ` Jiri Slaby
@ 2007-05-04 14:58   ` Jiri Slaby
  2007-05-04 15:02     ` Jiri Slaby
  0 siblings, 1 reply; 12+ messages in thread
From: Jiri Slaby @ 2007-05-04 14:58 UTC (permalink / raw)
  Cc: stable, Andrew Morton, Linux Kernel Mailing List,
	Mikael Pettersson, linux-ide, Jeff Garzik

Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: linux-ide@vger.kernel.org
Cc: Jeff Garzik <jgarzik@pobox.com>

Jiri Slaby napsal(a):
> Cc: stable@kernel.org
> 
> Jiri Slaby napsal(a):
>> Hi,
>>
>> I have a problem with higher disk loads (e.g. running git-log or yum update).
>> Many processes end up in D state and system is unusable -- I'm not able to run
>> anything but smooth mouse moving when this happens.
>>
>> If I wait for a 20-30sec it becomes usable. This happens in 2.6.21-rc7-mm2 and
>> also in 2007-04-28-05-06 broken-out snapshot. I think 2.6.21-rc6-mm1 worked
>> fine, but I'm uncertain. If it is important, let me know to re-test.
>>
>> sysrq-t:
>>
>> yum           D 0000004C     0  2976   2955 (NOTLB)
>>        c455bd00 00200082 8c68ea9e 0000004c 00000000 00000000 c455bcdc c455bc9c
>>        c455bc9c c2d6a580 8c68ea9e 0000004c c455bcc0 c2d6a694 c1809980 00000000
>>        c455bcd0 c4640c80 000036a0 00000001 c455bcf0 c02a7132 c3682164 00001cde
>> Call Trace:
>>  [<c037834c>] io_schedule+0x21/0x2e
>>  [<c014d505>] sync_page+0x38/0x43
>>  [<c0378581>] __wait_on_bit_lock+0x40/0x63
>>  [<c014d4b6>] __lock_page+0x54/0x5c
>>  [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
>>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>>  [<c016c31c>] do_sync_read+0xd0/0x106
>>  [<c016cada>] vfs_read+0x89/0x11d
>>  [<c016d02c>] sys_pread64+0x64/0x68
>>  [<c0103ffc>] syscall_call+0x7/0xb
>>  [<47176012>] 0x47176012
>>  =======================
>> xterm         S 00000051     0  2980      1 (NOTLB)
>>        c4678ae0 00000082 b2fbf0ce 00000051 00000000 00000000 00000000 00000000
>>        c4678b30 c469d540 b2fbf0ce 00000051 c4678ab0 c469d654 c1813980 00000001
>>        c4678ad0 c47f0e40 c4678b00 00000286 c4678ae0 c012c077 00000001 00003286
>> Call Trace:
>>  [<c03784ad>] schedule_timeout+0x44/0xa4
>>  [<c0177ee3>] do_select+0x46d/0x55b
>>  [<c017816f>] core_sys_select+0x19e/0x2c0
>>  [<c01786f3>] sys_select+0xd3/0x18d
>>  [<c0103ffc>] syscall_call+0x7/0xb
>>  [<47180f98>] 0x47180f98
>>  =======================
>> bash          S 0000004A     0  2982   2980 (NOTLB)
>>        c50b3e80 00000086 ccf4cc7e 0000004a 00000000 c021bed9 00000000 c45c0030
>>        c50b3e30 c45c0030 ccf4cc7e 0000004a c50b3e50 c45c0144 c1809980 00000000
>>        c1809e44 c4517900 00000000 000000b3 c50b3e90 c014f841 c477cb40 c2d79000
>> Call Trace:
>>  [<c03784d3>] schedule_timeout+0x6a/0xa4
>>  [<c0221960>] read_chan+0x1b9/0x5b9
>>  [<c021e48f>] tty_read+0x75/0xaa
>>  [<c016cada>] vfs_read+0x89/0x11d
>>  [<c016cf3d>] sys_read+0x3d/0x64
>>  [<c0103ffc>] syscall_call+0x7/0xb
>>  [<4717883e>] 0x4717883e
>>  =======================
>> metacity      S 00000050     0  3005   2429 (NOTLB)
>>        c2fd5bb0 00000086 5205656d 00000050 00000000 00000000 00000000 c441c51c
>>        00000246 c45c0540 5205656d 00000050 c2fd5b80 c45c0654 c1809980 00000000
>>        c1813e44 c27f4900 00002cb8 00000246 c2fd5ba0 c0135a80 c2fd5ba8 00002cb2
>> Call Trace:
>>  [<c03784d3>] schedule_timeout+0x6a/0xa4
>>  [<c0177905>] do_sys_poll+0x33f/0x46d
>>  [<c0177a74>] sys_poll+0x41/0x43
>>  [<c0103ffc>] syscall_call+0x7/0xb
>>  [<4717e3a6>] 0x4717e3a6
>>  =======================
>> ccpd          D 0000004C     0  3009   2010 (NOTLB)
>>        c50abbb0 00000082 f08ef585 0000004c 00000000 00000000 c180abc0 c22fc788
>>        c50abba0 c505d070 f08ef585 0000004c 00000000 c505d184 c1809980 00000000
>>        c21167f0 c2fc6740 c5794e3c c31eff74 c5794310 c22fc788 00000002 00001e64
>> Call Trace:
>>  [<c01b64d9>] do_get_write_access+0x310/0x4f5
>>  [<c01b66d9>] journal_get_write_access+0x1b/0x2a
>>  [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
>>  [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
>>  [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
>>  [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
>>  [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
>>  [<c017eacc>] touch_atime+0x84/0xd8
>>  [<c014dd08>] do_generic_mapping_read+0x463/0x57e
>>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>>  [<c016c31c>] do_sync_read+0xd0/0x106
>>  [<c016cada>] vfs_read+0x89/0x11d
>>  [<c016fc6e>] kernel_read+0x36/0x48
>>  [<c016fd32>] prepare_binprm+0xb2/0xdf
>>  [<c0171322>] do_execve+0xe6/0x1e1
>>  [<c0102715>] sys_execve+0x2e/0x7f
>>  [<c0103ffc>] syscall_call+0x7/0xb
>>  [<471474da>] 0x471474da
>>  =======================
>> bash          D 0000004D     0  3011   2859 (NOTLB)
>>        c471dc90 00000082 08f8373c 0000004d 00000000 00000000 c471dc30 c471dc2c
>>        c471dc2c c4670540 08f8373c 0000004d c471dc50 c4670654 c1809980 00000000
>>        c471dc60 c2f3f040 00000005 00000001 c471dc80 c02a7132 c591af64 00001eea
>> Call Trace:
>>  [<c037834c>] io_schedule+0x21/0x2e
>>  [<c014d505>] sync_page+0x38/0x43
>>  [<c0378581>] __wait_on_bit_lock+0x40/0x63
>>  [<c014d4b6>] __lock_page+0x54/0x5c
>>  [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
>>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>>  [<c016c31c>] do_sync_read+0xd0/0x106
>>  [<c016cada>] vfs_read+0x89/0x11d
>>  [<c016fc6e>] kernel_read+0x36/0x48
>>  [<c016fd32>] prepare_binprm+0xb2/0xdf
>>  [<c0171322>] do_execve+0xe6/0x1e1
>>  [<c0102715>] sys_execve+0x2e/0x7f
>>  [<c0103ffc>] syscall_call+0x7/0xb
>>  [<471474da>] 0x471474da
>>  =======================
>> bash          D 0000004D     0  3012   2884 (NOTLB)
>>        c50d5a80 00000082 a21bfc35 0000004d 00000000 00000000 c180abc0 c2033dd4
>>        c50d5a70 c2f4f070 a21bfc35 0000004d 00000000 c2f4f184 c1809980 00000000
>>        c21167bc c47f0c80 c2033310 c22fc788 c5794e3c c2033dd4 00000002 0000216d
>> Call Trace:
>>  [<c01b64d9>] do_get_write_access+0x310/0x4f5
>>  [<c01b66d9>] journal_get_write_access+0x1b/0x2a
>>  [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
>>  [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
>>  [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
>>  [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
>>  [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
>>  [<c017eacc>] touch_atime+0x84/0xd8
>>  [<c017499a>] __link_path_walk+0x893/0xca4
>>  [<c0174df1>] link_path_walk+0x46/0xc3
>>  [<c0175038>] do_path_lookup+0x86/0x1b0
>>  [<c0175ad0>] __path_lookup_intent_open+0x44/0x7f
>>  [<c0175b83>] path_lookup_open+0x21/0x27
>>  [<c016fd86>] open_exec+0x27/0xa2
>>  [<c01993a3>] load_elf_binary+0x1482/0x1acf
>>  [<c016fa1d>] search_binary_handler+0x6e/0x19f
>>  [<c0171386>] do_execve+0x14a/0x1e1
>>  [<c0102715>] sys_execve+0x2e/0x7f
>>  [<c0103ffc>] syscall_call+0x7/0xb
>>  [<471474da>] 0x471474da
>>  =======================
>>
>> Note that yum works on lvm on raid0 and git too, but on the another md volume.
>> Both ext3s. Drivers are sata_promise and ata_piix (sata disk); CFQ scheduler.
>> Using noop is no change (but seems to be harder to reproduce with it). I figured
>> out that it probably happens when 2+ processes are on both "processors" (HT on
>> P4) and are IO wait (multiload-applet shows red above the half).
>>
>> Swap usage is 0 all the time.
> 
> I'm able to reproduce this in the latest git (dc87c3985e9b442c). Going to bisect.

Is it possible, that
Commit: 176efb054422bffe6b5a34194ffad134366c7f7e
Author: Mikael Pettersson <mikpe@it.uu.se> Wed, 14 Mar 2007 09:51:35 +0100

    sata_promise: decode and report error reasons

    This patch adds much needed error reason decoding and
    reporting to sata_promise. It's simplistic but should
    log all relevant error info the controller provides.

    Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

causes this? When I change this:
diff --git a/drivers/ata/sata_promise.c b/drivers/ata/sata_promise.c
index f56549b..a0024ae 100644
--- a/drivers/ata/sata_promise.c
+++ b/drivers/ata/sata_promise.c
@@ -668,10 +668,8 @@ static inline unsigned int pdc_host_intr( struct ata_port *ap,
        else
                err_mask &= ~PDC2_ERR_MASK;
        port_status = readl(port_mmio + PDC_GLOBAL_CTL);
-       if (unlikely(port_status & err_mask)) {
+       if (unlikely(port_status & err_mask))
                pdc_error_intr(ap, qc, port_status, err_mask);
-               return 1;
-       }

        switch (qc->tf.protocol) {
        case ATA_PROT_DMA:
the problem disappears.

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-04-30 15:39 Jiri Slaby
  2007-04-30 18:05 ` Andrew Morton
@ 2007-05-01 10:06 ` Jiri Slaby
  2007-05-04 14:58   ` Jiri Slaby
  1 sibling, 1 reply; 12+ messages in thread
From: Jiri Slaby @ 2007-05-01 10:06 UTC (permalink / raw)
  To: stable; +Cc: Andrew Morton, Linux Kernel Mailing List

Cc: stable@kernel.org

Jiri Slaby napsal(a):
> Hi,
> 
> I have a problem with higher disk loads (e.g. running git-log or yum update).
> Many processes end up in D state and system is unusable -- I'm not able to run
> anything but smooth mouse moving when this happens.
> 
> If I wait for a 20-30sec it becomes usable. This happens in 2.6.21-rc7-mm2 and
> also in 2007-04-28-05-06 broken-out snapshot. I think 2.6.21-rc6-mm1 worked
> fine, but I'm uncertain. If it is important, let me know to re-test.
> 
> sysrq-t:
> 
> yum           D 0000004C     0  2976   2955 (NOTLB)
>        c455bd00 00200082 8c68ea9e 0000004c 00000000 00000000 c455bcdc c455bc9c
>        c455bc9c c2d6a580 8c68ea9e 0000004c c455bcc0 c2d6a694 c1809980 00000000
>        c455bcd0 c4640c80 000036a0 00000001 c455bcf0 c02a7132 c3682164 00001cde
> Call Trace:
>  [<c037834c>] io_schedule+0x21/0x2e
>  [<c014d505>] sync_page+0x38/0x43
>  [<c0378581>] __wait_on_bit_lock+0x40/0x63
>  [<c014d4b6>] __lock_page+0x54/0x5c
>  [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>  [<c016c31c>] do_sync_read+0xd0/0x106
>  [<c016cada>] vfs_read+0x89/0x11d
>  [<c016d02c>] sys_pread64+0x64/0x68
>  [<c0103ffc>] syscall_call+0x7/0xb
>  [<47176012>] 0x47176012
>  =======================
> xterm         S 00000051     0  2980      1 (NOTLB)
>        c4678ae0 00000082 b2fbf0ce 00000051 00000000 00000000 00000000 00000000
>        c4678b30 c469d540 b2fbf0ce 00000051 c4678ab0 c469d654 c1813980 00000001
>        c4678ad0 c47f0e40 c4678b00 00000286 c4678ae0 c012c077 00000001 00003286
> Call Trace:
>  [<c03784ad>] schedule_timeout+0x44/0xa4
>  [<c0177ee3>] do_select+0x46d/0x55b
>  [<c017816f>] core_sys_select+0x19e/0x2c0
>  [<c01786f3>] sys_select+0xd3/0x18d
>  [<c0103ffc>] syscall_call+0x7/0xb
>  [<47180f98>] 0x47180f98
>  =======================
> bash          S 0000004A     0  2982   2980 (NOTLB)
>        c50b3e80 00000086 ccf4cc7e 0000004a 00000000 c021bed9 00000000 c45c0030
>        c50b3e30 c45c0030 ccf4cc7e 0000004a c50b3e50 c45c0144 c1809980 00000000
>        c1809e44 c4517900 00000000 000000b3 c50b3e90 c014f841 c477cb40 c2d79000
> Call Trace:
>  [<c03784d3>] schedule_timeout+0x6a/0xa4
>  [<c0221960>] read_chan+0x1b9/0x5b9
>  [<c021e48f>] tty_read+0x75/0xaa
>  [<c016cada>] vfs_read+0x89/0x11d
>  [<c016cf3d>] sys_read+0x3d/0x64
>  [<c0103ffc>] syscall_call+0x7/0xb
>  [<4717883e>] 0x4717883e
>  =======================
> metacity      S 00000050     0  3005   2429 (NOTLB)
>        c2fd5bb0 00000086 5205656d 00000050 00000000 00000000 00000000 c441c51c
>        00000246 c45c0540 5205656d 00000050 c2fd5b80 c45c0654 c1809980 00000000
>        c1813e44 c27f4900 00002cb8 00000246 c2fd5ba0 c0135a80 c2fd5ba8 00002cb2
> Call Trace:
>  [<c03784d3>] schedule_timeout+0x6a/0xa4
>  [<c0177905>] do_sys_poll+0x33f/0x46d
>  [<c0177a74>] sys_poll+0x41/0x43
>  [<c0103ffc>] syscall_call+0x7/0xb
>  [<4717e3a6>] 0x4717e3a6
>  =======================
> ccpd          D 0000004C     0  3009   2010 (NOTLB)
>        c50abbb0 00000082 f08ef585 0000004c 00000000 00000000 c180abc0 c22fc788
>        c50abba0 c505d070 f08ef585 0000004c 00000000 c505d184 c1809980 00000000
>        c21167f0 c2fc6740 c5794e3c c31eff74 c5794310 c22fc788 00000002 00001e64
> Call Trace:
>  [<c01b64d9>] do_get_write_access+0x310/0x4f5
>  [<c01b66d9>] journal_get_write_access+0x1b/0x2a
>  [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
>  [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
>  [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
>  [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
>  [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
>  [<c017eacc>] touch_atime+0x84/0xd8
>  [<c014dd08>] do_generic_mapping_read+0x463/0x57e
>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>  [<c016c31c>] do_sync_read+0xd0/0x106
>  [<c016cada>] vfs_read+0x89/0x11d
>  [<c016fc6e>] kernel_read+0x36/0x48
>  [<c016fd32>] prepare_binprm+0xb2/0xdf
>  [<c0171322>] do_execve+0xe6/0x1e1
>  [<c0102715>] sys_execve+0x2e/0x7f
>  [<c0103ffc>] syscall_call+0x7/0xb
>  [<471474da>] 0x471474da
>  =======================
> bash          D 0000004D     0  3011   2859 (NOTLB)
>        c471dc90 00000082 08f8373c 0000004d 00000000 00000000 c471dc30 c471dc2c
>        c471dc2c c4670540 08f8373c 0000004d c471dc50 c4670654 c1809980 00000000
>        c471dc60 c2f3f040 00000005 00000001 c471dc80 c02a7132 c591af64 00001eea
> Call Trace:
>  [<c037834c>] io_schedule+0x21/0x2e
>  [<c014d505>] sync_page+0x38/0x43
>  [<c0378581>] __wait_on_bit_lock+0x40/0x63
>  [<c014d4b6>] __lock_page+0x54/0x5c
>  [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
>  [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
>  [<c016c31c>] do_sync_read+0xd0/0x106
>  [<c016cada>] vfs_read+0x89/0x11d
>  [<c016fc6e>] kernel_read+0x36/0x48
>  [<c016fd32>] prepare_binprm+0xb2/0xdf
>  [<c0171322>] do_execve+0xe6/0x1e1
>  [<c0102715>] sys_execve+0x2e/0x7f
>  [<c0103ffc>] syscall_call+0x7/0xb
>  [<471474da>] 0x471474da
>  =======================
> bash          D 0000004D     0  3012   2884 (NOTLB)
>        c50d5a80 00000082 a21bfc35 0000004d 00000000 00000000 c180abc0 c2033dd4
>        c50d5a70 c2f4f070 a21bfc35 0000004d 00000000 c2f4f184 c1809980 00000000
>        c21167bc c47f0c80 c2033310 c22fc788 c5794e3c c2033dd4 00000002 0000216d
> Call Trace:
>  [<c01b64d9>] do_get_write_access+0x310/0x4f5
>  [<c01b66d9>] journal_get_write_access+0x1b/0x2a
>  [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
>  [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
>  [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
>  [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
>  [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
>  [<c017eacc>] touch_atime+0x84/0xd8
>  [<c017499a>] __link_path_walk+0x893/0xca4
>  [<c0174df1>] link_path_walk+0x46/0xc3
>  [<c0175038>] do_path_lookup+0x86/0x1b0
>  [<c0175ad0>] __path_lookup_intent_open+0x44/0x7f
>  [<c0175b83>] path_lookup_open+0x21/0x27
>  [<c016fd86>] open_exec+0x27/0xa2
>  [<c01993a3>] load_elf_binary+0x1482/0x1acf
>  [<c016fa1d>] search_binary_handler+0x6e/0x19f
>  [<c0171386>] do_execve+0x14a/0x1e1
>  [<c0102715>] sys_execve+0x2e/0x7f
>  [<c0103ffc>] syscall_call+0x7/0xb
>  [<471474da>] 0x471474da
>  =======================
> 
> Note that yum works on lvm on raid0 and git too, but on the another md volume.
> Both ext3s. Drivers are sata_promise and ata_piix (sata disk); CFQ scheduler.
> Using noop is no change (but seems to be harder to reproduce with it). I figured
> out that it probably happens when 2+ processes are on both "processors" (HT on
> P4) and are IO wait (multiload-applet shows red above the half).
> 
> Swap usage is 0 all the time.

I'm able to reproduce this in the latest git (dc87c3985e9b442c). Going to bisect.

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-04-30 18:44     ` Andrew Morton
@ 2007-05-01  8:40       ` Jiri Slaby
  0 siblings, 0 replies; 12+ messages in thread
From: Jiri Slaby @ 2007-05-01  8:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List

Andrew Morton napsal(a):
> I would check the anticipatory scheduler as well, please.  I don't know
> what no-op would do with a workload like that, but it probably isn't very
> good.
> 
> You appear to believe that it's related to the CPU scheduler?  

Oh, no. I have really no idea, what may cause this :).

> That's a bit
> unexpected - it sounds more like a VFS/IO thing?  But stranger things have
> happened.

I think so.

> I guess it's time to end the staircase experiment in -mm. 
> http://userweb.kernel.org/~akpm/js.bz2 is my current rollup (against
> 2.6.21) minus staircase and related things.  Pretty please.

Anticipatory + -js is no change. Going to try vanilla and in both cases whether
it works or not, I'll bisect either -mm or vanilla.

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-04-30 18:14   ` Jiri Slaby
@ 2007-04-30 18:44     ` Andrew Morton
  2007-05-01  8:40       ` Jiri Slaby
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-04-30 18:44 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Linux Kernel Mailing List

On Mon, 30 Apr 2007 20:14:05 +0200
"Jiri Slaby" <jirislaby@gmail.com> wrote:

> > > I have a problem with higher disk loads (e.g. running git-log or yum update).
> > > Many processes end up in D state and system is unusable -- I'm not able to run
> > > anything but smooth mouse moving when this happens.
> > >
> > > If I wait for a 20-30sec it becomes usable. This happens in 2.6.21-rc7-mm2 and
> > > also in 2007-04-28-05-06 broken-out snapshot. I think 2.6.21-rc6-mm1 worked
> > > fine, but I'm uncertain. If it is important, let me know to re-test.
> > >
> >
> > It is important, but I doubt if retesting 2.6.21-rc6-mm1 will clarify
> > things a lot.
> >
> > Could you try switching to a different IO scheduler please?  Anticipatory
> > would suit.
> 
> As I wrote below the sysrq-t, switch to noop didn't help, but it seems
> that it's harder to reproduce with that:
> 
> <cite it's_bad_to_write_anything_below_logs="true">
> Note that yum works on lvm on raid0 and git too, but on the another md volume.
> Both ext3s. Drivers are sata_promise and ata_piix (sata disk); CFQ scheduler.
> Using noop is no change (but seems to be harder to reproduce with it). I figured
> out that it probably happens when 2+ processes are on both "processors" (HT on
> P4) and are IO wait (multiload-applet shows red above the half).
> 
> Swap usage is 0 all the time.
> </cite>

My comprehension skills on Monday morning are even less than usual ;)

I would check the anticipatory scheduler as well, please.  I don't know
what no-op would do with a workload like that, but it probably isn't very
good.

You appear to believe that it's related to the CPU scheduler?  That's a bit
unexpected - it sounds more like a VFS/IO thing?  But stranger things have
happened.

I guess it's time to end the staircase experiment in -mm. 
http://userweb.kernel.org/~akpm/js.bz2 is my current rollup (against
2.6.21) minus staircase and related things.  Pretty please.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-04-30 18:05 ` Andrew Morton
@ 2007-04-30 18:14   ` Jiri Slaby
  2007-04-30 18:44     ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Jiri Slaby @ 2007-04-30 18:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List

On 4/30/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 30 Apr 2007 17:39:19 +0200
> Jiri Slaby <jirislaby@gmail.com> wrote:
>
> > Hi,
> >
> > I have a problem with higher disk loads (e.g. running git-log or yum update).
> > Many processes end up in D state and system is unusable -- I'm not able to run
> > anything but smooth mouse moving when this happens.
> >
> > If I wait for a 20-30sec it becomes usable. This happens in 2.6.21-rc7-mm2 and
> > also in 2007-04-28-05-06 broken-out snapshot. I think 2.6.21-rc6-mm1 worked
> > fine, but I'm uncertain. If it is important, let me know to re-test.
> >
>
> It is important, but I doubt if retesting 2.6.21-rc6-mm1 will clarify
> things a lot.
>
> Could you try switching to a different IO scheduler please?  Anticipatory
> would suit.

As I wrote below the sysrq-t, switch to noop didn't help, but it seems
that it's harder to reproduce with that:

<cite it's_bad_to_write_anything_below_logs="true">
Note that yum works on lvm on raid0 and git too, but on the another md volume.
Both ext3s. Drivers are sata_promise and ata_piix (sata disk); CFQ scheduler.
Using noop is no change (but seems to be harder to reproduce with it). I figured
out that it probably happens when 2+ processes are on both "processors" (HT on
P4) and are IO wait (multiload-applet shows red above the half).

Swap usage is 0 all the time.
</cite>

> Please keep a close eye on mainline, too.  Wait for it to appear there :(

Ok, I'll try to play with that...

thanks,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-mm1: many processes end up in D state
  2007-04-30 15:39 Jiri Slaby
@ 2007-04-30 18:05 ` Andrew Morton
  2007-04-30 18:14   ` Jiri Slaby
  2007-05-01 10:06 ` Jiri Slaby
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-04-30 18:05 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Linux Kernel Mailing List

On Mon, 30 Apr 2007 17:39:19 +0200
Jiri Slaby <jirislaby@gmail.com> wrote:

> Hi,
> 
> I have a problem with higher disk loads (e.g. running git-log or yum update).
> Many processes end up in D state and system is unusable -- I'm not able to run
> anything but smooth mouse moving when this happens.
> 
> If I wait for a 20-30sec it becomes usable. This happens in 2.6.21-rc7-mm2 and
> also in 2007-04-28-05-06 broken-out snapshot. I think 2.6.21-rc6-mm1 worked
> fine, but I'm uncertain. If it is important, let me know to re-test.
> 

It is important, but I doubt if retesting 2.6.21-rc6-mm1 will clarify
things a lot.

Could you try switching to a different IO scheduler please?  Anticipatory
would suit.

Please keep a close eye on mainline, too.  Wait for it to appear there :(


^ permalink raw reply	[flat|nested] 12+ messages in thread

* 2.6.21-mm1: many processes end up in D state
@ 2007-04-30 15:39 Jiri Slaby
  2007-04-30 18:05 ` Andrew Morton
  2007-05-01 10:06 ` Jiri Slaby
  0 siblings, 2 replies; 12+ messages in thread
From: Jiri Slaby @ 2007-04-30 15:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List

Hi,

I have a problem with higher disk loads (e.g. running git-log or yum update).
Many processes end up in D state and system is unusable -- I'm not able to run
anything but smooth mouse moving when this happens.

If I wait for a 20-30sec it becomes usable. This happens in 2.6.21-rc7-mm2 and
also in 2007-04-28-05-06 broken-out snapshot. I think 2.6.21-rc6-mm1 worked
fine, but I'm uncertain. If it is important, let me know to re-test.

sysrq-t:

yum           D 0000004C     0  2976   2955 (NOTLB)
       c455bd00 00200082 8c68ea9e 0000004c 00000000 00000000 c455bcdc c455bc9c
       c455bc9c c2d6a580 8c68ea9e 0000004c c455bcc0 c2d6a694 c1809980 00000000
       c455bcd0 c4640c80 000036a0 00000001 c455bcf0 c02a7132 c3682164 00001cde
Call Trace:
 [<c037834c>] io_schedule+0x21/0x2e
 [<c014d505>] sync_page+0x38/0x43
 [<c0378581>] __wait_on_bit_lock+0x40/0x63
 [<c014d4b6>] __lock_page+0x54/0x5c
 [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
 [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
 [<c016c31c>] do_sync_read+0xd0/0x106
 [<c016cada>] vfs_read+0x89/0x11d
 [<c016d02c>] sys_pread64+0x64/0x68
 [<c0103ffc>] syscall_call+0x7/0xb
 [<47176012>] 0x47176012
 =======================
xterm         S 00000051     0  2980      1 (NOTLB)
       c4678ae0 00000082 b2fbf0ce 00000051 00000000 00000000 00000000 00000000
       c4678b30 c469d540 b2fbf0ce 00000051 c4678ab0 c469d654 c1813980 00000001
       c4678ad0 c47f0e40 c4678b00 00000286 c4678ae0 c012c077 00000001 00003286
Call Trace:
 [<c03784ad>] schedule_timeout+0x44/0xa4
 [<c0177ee3>] do_select+0x46d/0x55b
 [<c017816f>] core_sys_select+0x19e/0x2c0
 [<c01786f3>] sys_select+0xd3/0x18d
 [<c0103ffc>] syscall_call+0x7/0xb
 [<47180f98>] 0x47180f98
 =======================
bash          S 0000004A     0  2982   2980 (NOTLB)
       c50b3e80 00000086 ccf4cc7e 0000004a 00000000 c021bed9 00000000 c45c0030
       c50b3e30 c45c0030 ccf4cc7e 0000004a c50b3e50 c45c0144 c1809980 00000000
       c1809e44 c4517900 00000000 000000b3 c50b3e90 c014f841 c477cb40 c2d79000
Call Trace:
 [<c03784d3>] schedule_timeout+0x6a/0xa4
 [<c0221960>] read_chan+0x1b9/0x5b9
 [<c021e48f>] tty_read+0x75/0xaa
 [<c016cada>] vfs_read+0x89/0x11d
 [<c016cf3d>] sys_read+0x3d/0x64
 [<c0103ffc>] syscall_call+0x7/0xb
 [<4717883e>] 0x4717883e
 =======================
metacity      S 00000050     0  3005   2429 (NOTLB)
       c2fd5bb0 00000086 5205656d 00000050 00000000 00000000 00000000 c441c51c
       00000246 c45c0540 5205656d 00000050 c2fd5b80 c45c0654 c1809980 00000000
       c1813e44 c27f4900 00002cb8 00000246 c2fd5ba0 c0135a80 c2fd5ba8 00002cb2
Call Trace:
 [<c03784d3>] schedule_timeout+0x6a/0xa4
 [<c0177905>] do_sys_poll+0x33f/0x46d
 [<c0177a74>] sys_poll+0x41/0x43
 [<c0103ffc>] syscall_call+0x7/0xb
 [<4717e3a6>] 0x4717e3a6
 =======================
ccpd          D 0000004C     0  3009   2010 (NOTLB)
       c50abbb0 00000082 f08ef585 0000004c 00000000 00000000 c180abc0 c22fc788
       c50abba0 c505d070 f08ef585 0000004c 00000000 c505d184 c1809980 00000000
       c21167f0 c2fc6740 c5794e3c c31eff74 c5794310 c22fc788 00000002 00001e64
Call Trace:
 [<c01b64d9>] do_get_write_access+0x310/0x4f5
 [<c01b66d9>] journal_get_write_access+0x1b/0x2a
 [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
 [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
 [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
 [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
 [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
 [<c017eacc>] touch_atime+0x84/0xd8
 [<c014dd08>] do_generic_mapping_read+0x463/0x57e
 [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
 [<c016c31c>] do_sync_read+0xd0/0x106
 [<c016cada>] vfs_read+0x89/0x11d
 [<c016fc6e>] kernel_read+0x36/0x48
 [<c016fd32>] prepare_binprm+0xb2/0xdf
 [<c0171322>] do_execve+0xe6/0x1e1
 [<c0102715>] sys_execve+0x2e/0x7f
 [<c0103ffc>] syscall_call+0x7/0xb
 [<471474da>] 0x471474da
 =======================
bash          D 0000004D     0  3011   2859 (NOTLB)
       c471dc90 00000082 08f8373c 0000004d 00000000 00000000 c471dc30 c471dc2c
       c471dc2c c4670540 08f8373c 0000004d c471dc50 c4670654 c1809980 00000000
       c471dc60 c2f3f040 00000005 00000001 c471dc80 c02a7132 c591af64 00001eea
Call Trace:
 [<c037834c>] io_schedule+0x21/0x2e
 [<c014d505>] sync_page+0x38/0x43
 [<c0378581>] __wait_on_bit_lock+0x40/0x63
 [<c014d4b6>] __lock_page+0x54/0x5c
 [<c014da81>] do_generic_mapping_read+0x1dc/0x57e
 [<c014fc84>] generic_file_aio_read+0xc9/0x1b5
 [<c016c31c>] do_sync_read+0xd0/0x106
 [<c016cada>] vfs_read+0x89/0x11d
 [<c016fc6e>] kernel_read+0x36/0x48
 [<c016fd32>] prepare_binprm+0xb2/0xdf
 [<c0171322>] do_execve+0xe6/0x1e1
 [<c0102715>] sys_execve+0x2e/0x7f
 [<c0103ffc>] syscall_call+0x7/0xb
 [<471474da>] 0x471474da
 =======================
bash          D 0000004D     0  3012   2884 (NOTLB)
       c50d5a80 00000082 a21bfc35 0000004d 00000000 00000000 c180abc0 c2033dd4
       c50d5a70 c2f4f070 a21bfc35 0000004d 00000000 c2f4f184 c1809980 00000000
       c21167bc c47f0c80 c2033310 c22fc788 c5794e3c c2033dd4 00000002 0000216d
Call Trace:
 [<c01b64d9>] do_get_write_access+0x310/0x4f5
 [<c01b66d9>] journal_get_write_access+0x1b/0x2a
 [<c01b59ce>] __ext3_journal_get_write_access+0x19/0x3f
 [<c01aa5f6>] ext3_reserve_inode_write+0x53/0x6c
 [<c01aa62f>] ext3_mark_inode_dirty+0x20/0x37
 [<c01ad350>] ext3_dirty_inode+0x6b/0x6d
 [<c01868fb>] __mark_inode_dirty+0x2a/0x16d
 [<c017eacc>] touch_atime+0x84/0xd8
 [<c017499a>] __link_path_walk+0x893/0xca4
 [<c0174df1>] link_path_walk+0x46/0xc3
 [<c0175038>] do_path_lookup+0x86/0x1b0
 [<c0175ad0>] __path_lookup_intent_open+0x44/0x7f
 [<c0175b83>] path_lookup_open+0x21/0x27
 [<c016fd86>] open_exec+0x27/0xa2
 [<c01993a3>] load_elf_binary+0x1482/0x1acf
 [<c016fa1d>] search_binary_handler+0x6e/0x19f
 [<c0171386>] do_execve+0x14a/0x1e1
 [<c0102715>] sys_execve+0x2e/0x7f
 [<c0103ffc>] syscall_call+0x7/0xb
 [<471474da>] 0x471474da
 =======================

Note that yum works on lvm on raid0 and git too, but on the another md volume.
Both ext3s. Drivers are sata_promise and ata_piix (sata disk); CFQ scheduler.
Using noop is no change (but seems to be harder to reproduce with it). I figured
out that it probably happens when 2+ processes are on both "processors" (HT on
P4) and are IO wait (multiload-applet shows red above the half).

Swap usage is 0 all the time.

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-05-07  9:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-04 19:19 2.6.21-mm1: many processes end up in D state Mikael Pettersson
2007-05-05 15:30 ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2007-05-05 23:46 Mikael Pettersson
2007-05-07  9:52 ` Jiri Slaby
2007-04-30 15:39 Jiri Slaby
2007-04-30 18:05 ` Andrew Morton
2007-04-30 18:14   ` Jiri Slaby
2007-04-30 18:44     ` Andrew Morton
2007-05-01  8:40       ` Jiri Slaby
2007-05-01 10:06 ` Jiri Slaby
2007-05-04 14:58   ` Jiri Slaby
2007-05-04 15:02     ` Jiri Slaby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).