LKML Archive on lore.kernel.org
* 2.6.20 kernel hang with USB drive and vfat doing ftruncate
@ 2007-02-16 19:54 Kumar Gala
  2007-02-18 16:10 ` OGAWA Hirofumi
  0 siblings, 1 reply; 16+ messages in thread
From: Kumar Gala @ 2007-02-16 19:54 UTC (permalink / raw)
  To: Linux Kernel list

I'm seeing an issue with a stock 2.6.20 kernel running on an embedded
PPC.  I've got a USB flash drive plugged in and the filesystem on the
drive is vfat.  The system has 64M of RAM and no swap.

If I execute a series of large (100M+) ftruncate() calls on the disk,
the kernel hangs and never returns.  It seems to be stuck in the idle
loop.

The following is the test program I'm running:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

void usage (void)
{
         printf ("trunc_test <filename> <size>\n\n");
}

int main(int argc, char *argv[])
{
         int fd;
         int ret = 0;
         unsigned long len;

         if (argc != 3) {
                 printf("Invalid number of arguments\n\n");
                 usage();
                 exit(1);
         }

         /* create the file, or empty it if it already exists */
         fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, S_IRWXU);
         if (fd < 0) {
                 printf ("open failed, errno = %d\n", errno);
                 exit(1);
         }
         len = strtoul(argv[2], NULL, 0);

         /* grow the empty file to len bytes */
         ret = ftruncate(fd, len);

         if (ret)
                 printf ("ftruncate ret = %d %d\n", ret, errno);

         close(fd);

         return ret;
}

I usually run the following twice to get the hang state:

time ./trunc_test bar 100000000 &
time ./trunc_test baz 100000000 &

I was wondering if anyone had any suggestions on what to poke at next  
to try and figure out what is going on.

- k


* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-16 19:54 2.6.20 kernel hang with USB drive and vfat doing ftruncate Kumar Gala
@ 2007-02-18 16:10 ` OGAWA Hirofumi
  2007-02-19 21:58   ` Kumar Gala
  2007-02-19 22:06   ` Kumar Gala
  0 siblings, 2 replies; 16+ messages in thread
From: OGAWA Hirofumi @ 2007-02-18 16:10 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Linux Kernel list

Kumar Gala <galak@kernel.crashing.org> writes:

> I'm seeing an issue with a stock 2.6.20 kernel running on an embedded  
> PPC.  I've got a usb flash drive plugged in and the filesystem on the  
> drive is vfat.  Running with 64M and no swap.
>
> If I execute a series of large (100M+) ftruncate() on the disk the  
> kernel will hang and never return.  It seems to be stuck in the idle  
> loop().
>
> The following is the test program I'm running:
>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <errno.h>
>
> void usage (void)
> {
>          printf ("truncate_test <filename> <size>\n\n");
> }
>
> int main(int argc, char *argv[])
> {
>          int fd, i;
>          int ret = 0;
>          unsigned int len;
>
>          if (argc != 3) {
>                  printf("Invalid number of arguments\n\n");
>                  usage();
>                  exit(1);
>          }
>
>          fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, S_IRWXU);
>          len = strtoul(argv[2], NULL, 0);
>
>          ret = ftruncate(fd, len);
>
>          if (ret)
>                  printf ("ftruncate ret = %d %d\n", ret, errno);
>
>          close(fd);
>
>          return ret;
> }
>
> I usually run the following twice to get the hang state:
>
> time ./trunc_test bar 100000000 &
> time ./trunc_test baz 100000000 &
>
> I was wondering if anyone had any suggestions on what to poke at next  
> to try and figure out what is going on.

Can you check /sys/block/xxx/stat or something similar to make sure
there are no outstanding IO requests?

It looks as if there is no response from the lower layer...
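
For reference, the check suggested above can be scripted.  The following
is a minimal sketch in C, assuming the documented /sys/block/<dev>/stat
layout, in which the ninth field is the number of requests currently in
flight.  The function names stat_in_flight() and read_in_flight() are
illustrative and do not come from this thread.

```c
#include <stdio.h>

/*
 * Parse one line in the /sys/block/<dev>/stat layout and return the
 * ninth field (requests currently in flight), or -1 if the line does
 * not contain at least nine numeric fields.
 */
long stat_in_flight(const char *line)
{
    unsigned long long f[9];
    int n = sscanf(line, "%llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &f[0], &f[1], &f[2], &f[3], &f[4],
                   &f[5], &f[6], &f[7], &f[8]);

    if (n != 9)
        return -1;
    return (long)f[8];
}

/*
 * Read the first line of the stat file at `path` and return its
 * in-flight count, or -1 if the file cannot be opened or parsed.
 */
long read_in_flight(const char *path)
{
    char line[512];
    long ret = -1;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    if (fgets(line, sizeof(line), fp))
        ret = stat_in_flight(line);
    fclose(fp);
    return ret;
}
```

Calling read_in_flight() on the stat file of the USB disk while the test
runs would show whether requests are queued without ever completing.  A
count that stays above zero would hint at the lower layer, but that is
an interpretation, not something the number proves on its own.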
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-18 16:10 ` OGAWA Hirofumi
@ 2007-02-19 21:58   ` Kumar Gala
  2007-02-19 22:19     ` OGAWA Hirofumi
  2007-02-19 22:06   ` Kumar Gala
  1 sibling, 1 reply; 16+ messages in thread
From: Kumar Gala @ 2007-02-19 21:58 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Linux Kernel list


On Feb 18, 2007, at 10:10 AM, OGAWA Hirofumi wrote:

> Kumar Gala <galak@kernel.crashing.org> writes:
>
>> I'm seeing an issue with a stock 2.6.20 kernel running on an embedded
>> PPC.  I've got a usb flash drive plugged in and the filesystem on the
>> drive is vfat.  Running with 64M and no swap.
>>
>> If I execute a series of large (100M+) ftruncate() on the disk the
>> kernel will hang and never return.  It seems to be stuck in the idle
>> loop().
>>
>> The following is the test program I'm running:
>>
>> #include <sys/mman.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <errno.h>
>>
>> void usage (void)
>> {
>>          printf ("truncate_test <filename> <size>\n\n");
>> }
>>
>> int main(int argc, char *argv[])
>> {
>>          int fd, i;
>>          int ret = 0;
>>          unsigned int len;
>>
>>          if (argc != 3) {
>>                  printf("Invalid number of arguments\n\n");
>>                  usage();
>>                  exit(1);
>>          }
>>
>>          fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, S_IRWXU);
>>          len = strtoul(argv[2], NULL, 0);
>>
>>          ret = ftruncate(fd, len);
>>
>>          if (ret)
>>                  printf ("ftruncate ret = %d %d\n", ret, errno);
>>
>>          close(fd);
>>
>>          return ret;
>> }
>>
>> I usually run the following twice to get the hang state:
>>
>> time ./trunc_test bar 100000000 &
>> time ./trunc_test baz 100000000 &
>>
>> I was wondering if anyone had any suggestions on what to poke at next
>> to try and figure out what is going on.
>
> Can you check /sys/block/xxx/stat or something to make sure there is
> no outstanding IO request?
>
> It looks as if there is no response from the lower layer...

Once the system locks up I don't have any ability to do anything.

- k


* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-18 16:10 ` OGAWA Hirofumi
  2007-02-19 21:58   ` Kumar Gala
@ 2007-02-19 22:06   ` Kumar Gala
  2007-02-21 20:18     ` OGAWA Hirofumi
  1 sibling, 1 reply; 16+ messages in thread
From: Kumar Gala @ 2007-02-19 22:06 UTC (permalink / raw)
  To: Linux Kernel list; +Cc: Andrew Morton, Greg KH


On Feb 18, 2007, at 10:10 AM, OGAWA Hirofumi wrote:

> Kumar Gala <galak@kernel.crashing.org> writes:
>
>> I'm seeing an issue with a stock 2.6.20 kernel running on an embedded
>> PPC.  I've got a usb flash drive plugged in and the filesystem on the
>> drive is vfat.  Running with 64M and no swap.
>>
>> If I execute a series of large (100M+) ftruncate() on the disk the
>> kernel will hang and never return.  It seems to be stuck in the idle
>> loop().
>>
>> The following is the test program I'm running:
>>
>> #include <sys/mman.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <errno.h>
>>
>> void usage (void)
>> {
>>          printf ("truncate_test <filename> <size>\n\n");
>> }
>>
>> int main(int argc, char *argv[])
>> {
>>          int fd, i;
>>          int ret = 0;
>>          unsigned int len;
>>
>>          if (argc != 3) {
>>                  printf("Invalid number of arguments\n\n");
>>                  usage();
>>                  exit(1);
>>          }
>>
>>          fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, S_IRWXU);
>>          len = strtoul(argv[2], NULL, 0);
>>
>>          ret = ftruncate(fd, len);
>>
>>          if (ret)
>>                  printf ("ftruncate ret = %d %d\n", ret, errno);
>>
>>          close(fd);
>>
>>          return ret;
>> }
>>
>> I usually run the following twice to get the hang state:
>>
>> time ./trunc_test bar 100000000 &
>> time ./trunc_test baz 100000000 &
>>
>> I was wondering if anyone had any suggestions on what to poke at next
>> to try and figure out what is going on.

So I realized I could use sysrq to provide some more debug
information.  When the system locks up I get the following output
from 't':

[  496.901002] Show State
[  496.903356]
[  496.903360]                          free                        sibling
[  496.911532]   task             PC    stack   pid father child younger older
[  496.918486] init          S 3009C7EC     0     1      0     2               (NOTLB)
[  496.926169] Call Trace:
[  496.928611] [C3FC7DA0] [C006F03C] __link_path_walk+0xd24/0x112c (unreliable)
[  496.935687] [C3FC7E60] [C00083AC] __switch_to+0x28/0x40
[  496.940931] [C3FC7E80] [C01F4B78] schedule+0x324/0x6bc
[  496.946086] [C3FC7EC0] [C001E164] do_wait+0x700/0x100c
[  496.951242] [C3FC7F40] [C000FAD4] ret_from_syscall+0x0/0x38
[  496.956828] --- Exception: c01 at 0x3009c7ec
[  496.961099]     LR = 0x3009c3e0
[  496.964234] ksoftirqd/0   S 00000000     0     2      1             3       (L-TLB)
[  496.971913] Call Trace:
[  496.974355] [C033DE80] [C0133F64] scsi_io_completion+0x74/0x318 (unreliable)
[  496.981428] [C033DF40] [C00083AC] __switch_to+0x28/0x40
[  496.986664] [C033DF60] [C01F4B78] schedule+0x324/0x6bc
[  496.991811] [C033DFA0] [C00210CC] ksoftirqd+0xfc/0x114
[  496.996960] [C033DFC0] [C0033E48] kthread+0xf4/0x130
[  497.001941] [C033DFF0] [C001093C] kernel_thread+0x44/0x60
[  497.007350] events/0      S 00000000     0     3      1             4     2 (L-TLB)
[  497.015030] Call Trace:
[  497.017472] [C033FEE0] [C00083AC] __switch_to+0x28/0x40
[  497.022707] [C033FF00] [C01F4B78] schedule+0x324/0x6bc
[  497.027855] [C033FF40] [C002F67C] worker_thread+0x144/0x148
[  497.033435] [C033FFC0] [C0033E48] kthread+0xf4/0x130
[  497.038409] [C033FFF0] [C001093C] kernel_thread+0x44/0x60
[  497.043817] khelper       S 00000000     0     4      1             5     3 (L-TLB)
[  497.051497] Call Trace:
[  497.053940] [C3FE1E20] [C3FE0000] 0xc3fe0000 (unreliable)
[  497.059351] [C3FE1EE0] [C00083AC] __switch_to+0x28/0x40
[  497.064586] [C3FE1F00] [C01F4B78] schedule+0x324/0x6bc
[  497.069734] [C3FE1F40] [C002F67C] worker_thread+0x144/0x148
[  497.075316] [C3FE1FC0] [C0033E48] kthread+0xf4/0x130
[  497.080291] [C3FE1FF0] [C001093C] kernel_thread+0x44/0x60
[  497.085697] kthread       S 00000000     0     5      1    37     617     4 (L-TLB)
[  497.093378] Call Trace:
[  497.095820] [C3FCBE20] [00001032] 0x1032 (unreliable)
[  497.100881] --- Exception: c3fcbef0 at __switch_to+0x28/0x40
[  497.106545]     LR = 0xc3fcbef0
[  497.109681] [C3FCBEE0] [C00083AC] __switch_to+0x28/0x40 (unreliable)
[  497.116051] [C3FCBF00] [C01F4B78] schedule+0x324/0x6bc
[  497.121201] [C3FCBF40] [C002F67C] worker_thread+0x144/0x148
[  497.126783] [C3FCBFC0] [C0033E48] kthread+0xf4/0x130
[  497.131758] [C3FCBFF0] [C001093C] kernel_thread+0x44/0x60
[  497.137165] kblockd/0     S 00000000     0    37      5            41       (L-TLB)
[  497.144845] Call Trace:
[  497.147286] [C3D9FE20] [C3EBF490] 0xc3ebf490 (unreliable)
[  497.152697] [C3D9FEE0] [C00083AC] __switch_to+0x28/0x40
[  497.157933] [C3D9FF00] [C01F4B78] schedule+0x324/0x6bc
[  497.163082] [C3D9FF40] [C002F67C] worker_thread+0x144/0x148
[  497.168663] [C3D9FFC0] [C0033E48] kthread+0xf4/0x130
[  497.173637] [C3D9FFF0] [C001093C] kernel_thread+0x44/0x60
[  497.179045] khubd         S 00000000     0    41      5            53    37 (L-TLB)
[  497.186726] Call Trace:
[  497.189167] [C0341E00] [C3F03900] 0xc3f03900 (unreliable)
[  497.194578] [C0341EC0] [C00083AC] __switch_to+0x28/0x40
[  497.199813] [C0341EE0] [C01F4B78] schedule+0x324/0x6bc
[  497.204961] [C0341F20] [C0152288] hub_thread+0xb40/0xcc0
[  497.210283] [C0341FC0] [C0033E48] kthread+0xf4/0x130
[  497.215257] [C0341FF0] [C001093C] kernel_thread+0x44/0x60
[  497.220664] pdflush       D 00000000     0    53      5            55    41 (L-TLB)
[  497.228344] Call Trace:
[  497.230786] [C3CABD10] [C008E098] __find_get_block+0x10c/0x288 (unreliable)
[  497.237769] [C3CABDD0] [C00083AC] __switch_to+0x28/0x40
[  497.243004] [C3CABDF0] [C01F4B78] schedule+0x324/0x6bc
[  497.248152] [C3CABE30] [C01F5D6C] schedule_timeout+0x6c/0xd0
[  497.253822] [C3CABE70] [C01F5C9C] io_schedule_timeout+0x30/0x54
[  497.259752] [C3CABE90] [C0050DE4] congestion_wait+0x64/0x8c
[  497.265343] [C3CABEE0] [C004AA9C] background_writeout+0x44/0xe8
[  497.271274] [C3CABF50] [C004BE04] pdflush+0x16c/0x27c
[  497.276335] [C3CABFC0] [C0033E48] kthread+0xf4/0x130
[  497.281311] [C3CABFF0] [C001093C] kernel_thread+0x44/0x60
[  497.286718] kswapd0       D 00000000     0    55      5            56    53 (L-TLB)
[  497.294399] Call Trace:
[  497.296841] [C3DEFB70] [C0025FE8] run_timer_softirq+0x20/0x230 (unreliable)
[  497.303823] [C3DEFC30] [C00083AC] __switch_to+0x28/0x40
[  497.309058] [C3DEFC50] [C01F4B78] schedule+0x324/0x6bc
[  497.314207] [C3DEFC90] [C01F5D6C] schedule_timeout+0x6c/0xd0
[  497.319876] --- Exception: c3defd60 at 0xc3deff40
[  497.324582]     LR = 0xc3dee000
[  497.327718] [C3DEFCD0] [C01F5C9C] io_schedule_timeout+0x30/0x54 (unreliable)
[  497.334783] [C3DEFCF0] [C0050DE4] congestion_wait+0x64/0x8c
[  497.340367] [C3DEFD40] [C004A9F0] throttle_vm_writeout+0x1c/0x84
[  497.346384] [C3DEFD60] [C004F33C] shrink_zone+0xbb0/0xfe4
[  497.351792] [C3DEFF10] [C004FD10] kswapd+0x2d4/0x424
[  497.356767] [C3DEFFC0] [C0033E48] kthread+0xf4/0x130
[  497.361741] [C3DEFFF0] [C001093C] kernel_thread+0x44/0x60
[  497.367151] aio/0         S 00000000     0    56      5           670    55 (L-TLB)
[  497.374832] Call Trace:
[  497.377272] [C3CADE20] [00000020] 0x20 (unreliable)
[  497.382162] [C3CADEE0] [C00083AC] __switch_to+0x28/0x40
[  497.387398] [C3CADF00] [C01F4B78] schedule+0x324/0x6bc
[  497.392547] [C3CADF40] [C002F67C] worker_thread+0x144/0x148
[  497.398128] [C3CADFC0] [C0033E48] kthread+0xf4/0x130
[  497.403102] [C3CADFF0] [C001093C] kernel_thread+0x44/0x60
[  497.408508] mtdblockd     S 00000000     0   617      1           718     5 (L-TLB)
[  497.416191] Call Trace:
[  497.418632] [C3F27E70] [C02A0000] 0xc02a0000 (unreliable)
[  497.424043] [C3F27F30] [C00083AC] __switch_to+0x28/0x40
[  497.429278] [C3F27F50] [C01F4B78] schedule+0x324/0x6bc
[  497.434425] [C3F27F90] [C013FAAC] mtd_blktrans_thread+0x250/0x340
[  497.440534] [C3F27FF0] [C001093C] kernel_thread+0x44/0x60
[  497.445941] scsi_eh_0     D 00000000     0   670      5           671    56 (L-TLB)
[  497.453622] Call Trace:
[  497.456062] [C3F1FDF0] [00000011] 0x11 (unreliable)
[  497.460951] [C3F1FEB0] [C00083AC] __switch_to+0x28/0x40
[  497.466187] [C3F1FED0] [C01F4B78] schedule+0x324/0x6bc
[  497.471335] [C3F1FF10] [C01F50D4] wait_for_completion+0xa0/0x150
[  497.477351] [C3F1FF50] [C016641C] command_abort+0xdc/0x118
[  497.482846] [C3F1FF60] [C0132BC0] scsi_error_handler+0x5f0/0x810
[  497.488868] [C3F1FFC0] [C0033E48] kthread+0xf4/0x130
[  497.493842] [C3F1FFF0] [C001093C] kernel_thread+0x44/0x60
[  497.499249] usb-storage   D 00000000     0   671      5           773   670 (L-TLB)
[  497.506930] Call Trace:
[  497.509372] [C3F35A60] [C00083AC] __switch_to+0x28/0x40
[  497.514608] [C3F35A80] [C01F4B78] schedule+0x324/0x6bc
[  497.519756] [C3F35AC0] [C01F5D6C] schedule_timeout+0x6c/0xd0
[  497.525426] [C3F35B00] [C01F5C9C] io_schedule_timeout+0x30/0x54
[  497.531356] [C3F35B20] [C0050DE4] congestion_wait+0x64/0x8c
[  497.536941] [C3F35B70] [C004A9F0] throttle_vm_writeout+0x1c/0x84
[  497.542958] [C3F35B90] [C004F33C] shrink_zone+0xbb0/0xfe4
[  497.548367] [C3F35D40] [C004F8F4] try_to_free_pages+0x184/0x2cc
[  497.554298] [C3F35DB0] [C0049AA8] __alloc_pages+0x110/0x2c0
[  497.559878] [C3F35E00] [C0060F84] cache_alloc_refill+0x394/0x694
[  497.565900] [C3F35E30] [C00614A0] __kmalloc+0xc4/0xcc
[  497.570961] [C3F35E40] [C01544D0] usb_alloc_urb+0x1c/0x5c
[  497.576371] [C3F35E50] [C015520C] usb_sg_init+0x1a0/0x2f8
[  497.581779] [C3F35EA0] [C0167318] usb_stor_bulk_transfer_sg+0x8c/0x138
[  497.588317] [C3F35ED0] [C0167960] usb_stor_Bulk_transport+0x140/0x310
[  497.594767] [C3F35F00] [C0167DCC] usb_stor_invoke_transport+0x2c/0x344
[  497.601303] [C3F35F50] [C0166B2C] usb_stor_transparent_scsi_command+0x10/0x20
[  497.608449] [C3F35F60] [C0168498] usb_stor_control_thread+0x1f8/0x290
[  497.614900] [C3F35FC0] [C0033E48] kthread+0xf4/0x130
[  497.619876] [C3F35FF0] [C001093C] kernel_thread+0x44/0x60
[  497.625285] sh            D 3009C7EC     0   718      1           787   617 (NOTLB)
[  497.632968] Call Trace:
[  497.635410] [C3C37A90] [C01339AC] scsi_run_queue+0x220/0x2e0 (unreliable)
[  497.642216] [C3C37B50] [C00083AC] __switch_to+0x28/0x40
[  497.647452] [C3C37B70] [C01F4B78] schedule+0x324/0x6bc
[  497.652601] [C3C37BB0] [C01F5D6C] schedule_timeout+0x6c/0xd0
[  497.658272] [C3C37BF0] [C01F5C9C] io_schedule_timeout+0x30/0x54
[  497.664202] [C3C37C10] [C0050DE4] congestion_wait+0x64/0x8c
[  497.669786] [C3C37C60] [C004A9F0] throttle_vm_writeout+0x1c/0x84
[  497.675802] [C3C37C80] [C004F33C] shrink_zone+0xbb0/0xfe4
[  497.681211] [C3C37E30] [C004F8F4] try_to_free_pages+0x184/0x2cc
[  497.687141] [C3C37EA0] [C0049AA8] __alloc_pages+0x110/0x2c0
[  497.692723] [C3C37EF0] [C0049C8C] __get_free_pages+0x34/0x74
[  497.698390] [C3C37F00] [C007C3C4] sys_getcwd+0x30/0x2b0
[  497.703629] [C3C37F40] [C000FAD4] ret_from_syscall+0x0/0x38
[  497.709210] --- Exception: c01 at 0x3009c7ec
[  497.713480]     LR = 0x3009a7b0
[  497.716614] pdflush       D 00000000     0   773      5                 671 (L-TLB)
[  497.724294] Call Trace:
[  497.726737] [C32ADD10] [C008E098] __find_get_block+0x10c/0x288 (unreliable)
[  497.733718] [C32ADDD0] [C00083AC] __switch_to+0x28/0x40
[  497.738954] [C32ADDF0] [C01F4B78] schedule+0x324/0x6bc
[  497.744102] [C32ADE30] [C01F5D6C] schedule_timeout+0x6c/0xd0
[  497.749772] [C32ADE70] [C01F5C9C] io_schedule_timeout+0x30/0x54
[  497.755702] [C32ADE90] [C0050DE4] congestion_wait+0x64/0x8c
[  497.761285] [C32ADEE0] [C004AC74] wb_kupdate+0xf0/0x160
[  497.766520] [C32ADF50] [C004BE04] pdflush+0x16c/0x27c
[  497.771581] [C32ADFC0] [C0033E48] kthread+0xf4/0x130
[  497.776556] [C32ADFF0] [C001093C] kernel_thread+0x44/0x60
[  497.781965] time          S 3009C7EC     0   787      1   789     788   718 (NOTLB)
[  497.789648] Call Trace:
[  497.792089] [C03A3E60] [C00083AC] __switch_to+0x28/0x40
[  497.797326] [C03A3E80] [C01F4B78] schedule+0x324/0x6bc
[  497.802474] [C03A3EC0] [C001E164] do_wait+0x700/0x100c
[  497.807630] [C03A3F40] [C000FAD4] ret_from_syscall+0x0/0x38
[  497.813210] --- Exception: c01 at 0x3009c7ec
[  497.817481]     LR = 0x3009c414
[  497.820616] trunc_test    D 300787EC     0   789    787                     (NOTLB)
[  497.828295] Call Trace:
[  497.830737] [C19B3960] [C0160000] handshake+0x6c/0x9c (unreliable)
[  497.836939] [C19B3A20] [C00083AC] __switch_to+0x28/0x40
[  497.842175] [C19B3A40] [C01F4B78] schedule+0x324/0x6bc
[  497.847323] [C19B3A80] [C01F5D6C] schedule_timeout+0x6c/0xd0
[  497.852994] [C19B3AC0] [C01F5C9C] io_schedule_timeout+0x30/0x54
[  497.858924] [C19B3AE0] [C0050DE4] congestion_wait+0x64/0x8c
[  497.864510] [C19B3B30] [C004A9F0] throttle_vm_writeout+0x1c/0x84
[  497.870525] [C19B3B50] [C004F33C] shrink_zone+0xbb0/0xfe4
[  497.875934] [C19B3D00] [C004F8F4] try_to_free_pages+0x184/0x2cc
[  497.881864] [C19B3D70] [C0049AA8] __alloc_pages+0x110/0x2c0
[  497.887445] [C19B3DC0] [C00447F4] find_or_create_page+0x8c/0xe4
[  497.893386] [C19B3DE0] [C0090DAC] cont_prepare_write+0xac/0x32c
[  497.899321] [C19B3E20] [C00D7A50] fat_prepare_write+0x30/0x40
[  497.905077] [C19B3E30] [C008E68C] __generic_cont_expand+0xa4/0x158
[  497.911268] [C19B3E50] [C00D7254] fat_notify_change+0xf4/0x208
[  497.917109] [C19B3E80] [C007EB24] notify_change+0x1ec/0x1fc
[  497.922695] [C19B3EB0] [C0062DC0] do_truncate+0x58/0x88
[  497.927935] [C19B3F10] [C006316C] do_sys_ftruncate+0x180/0x1a8
[  497.933780] [C19B3F40] [C000FAD4] ret_from_syscall+0x0/0x38
[  497.939361] --- Exception: c01 at 0x300787ec
[  497.943634]     LR = 0x1000073c
[  497.946768] time          S 3009C7EC     0   788      1   790           787 (NOTLB)
[  497.954450] Call Trace:
[  497.956892] [C1919E60] [C00083AC] __switch_to+0x28/0x40
[  497.962129] [C1919E80] [C01F4B78] schedule+0x324/0x6bc
[  497.967278] [C1919EC0] [C001E164] do_wait+0x700/0x100c
[  497.972431] [C1919F40] [C000FAD4] ret_from_syscall+0x0/0x38
[  497.978011] --- Exception: c01 at 0x3009c7ec
[  497.982282]     LR = 0x3009c414
[  497.985417] trunc_test    D 300787EC     0   790    788                     (NOTLB)
[  497.993101] Call Trace:
[  497.995542] [C2BFDA00] [C0047E68] mempool_alloc+0x38/0x144 (unreliable)
[  498.002171] [C2BFDAC0] [C00083AC] __switch_to+0x28/0x40
[  498.007406] [C2BFDAE0] [C01F4B78] schedule+0x324/0x6bc
[  498.012554] [C2BFDB20] [C01F5C48] io_schedule+0x30/0x54
[  498.017790] [C2BFDB40] [C008D01C] sync_buffer+0x68/0x7c
[  498.023026] [C2BFDB50] [C01F5E80] __wait_on_bit+0x98/0xec
[  498.028435] [C2BFDB70] [C01F5F34] out_of_line_wait_on_bit+0x60/0x74
[  498.034713] [C2BFDBC0] [C008CF3C] __wait_on_buffer+0x3c/0x4c
[  498.040382] [C2BFDBD0] [C00916F4] __bread+0xe8/0xf4
[  498.045270] [C2BFDBE0] [C00D5C24] fat_ent_bread+0x48/0xa8
[  498.050678] [C2BFDC00] [C00D6358] fat_ent_read+0x168/0x1f0
[  498.056171] [C2BFDC30] [C00D6690] fat_free_clusters+0x64/0x260
[  498.062011] [C2BFDCC0] [C00D75C4] fat_truncate+0x25c/0x334
[  498.067507] [C2BFDD30] [C0053EE4] vmtruncate+0x184/0x1a4
[  498.072833] [C2BFDD50] [C007E810] inode_setattr+0x7c/0x1a4
[  498.078329] [C2BFDD90] [C00D7314] fat_notify_change+0x1b4/0x208
[  498.084257] [C2BFDDC0] [C007EB24] notify_change+0x1ec/0x1fc
[  498.089840] [C2BFDDF0] [C0062DC0] do_truncate+0x58/0x88
[  498.095077] [C2BFDE50] [C007028C] may_open+0x1fc/0x200
[  498.100230] [C2BFDE70] [C0070380] open_namei+0xf0/0x714
[  498.105465] [C2BFDEB0] [C0063BB8] do_filp_open+0x30/0x78
[  498.110788] [C2BFDF20] [C0064018] do_sys_open+0x70/0xc0
[  498.116023] [C2BFDF40] [C000FAD4] ret_from_syscall+0x0/0x38
[  498.121605] --- Exception: c01 at 0x300787ec
[  498.125878]     LR = 0x30077580

and from 'm'

[  731.834529] Show Memory
[  731.836968] Mem-info:
[  731.839234] DMA per-cpu:
[  731.841768] CPU    0: Hot: hi:   18, btch:   3 usd:   3   Cold: hi:    6, btch:   1 usd:   2
[  731.850206] Active:1510 inactive:11309 dirty:7188 writeback:3330 unstable:0 free:1009 slab:1671 mapped:110 pagetables:19
[  731.861075] DMA free:4036kB min:4096kB low:5120kB high:6144kB active:6040kB inactive:45236kB present:65024kB pages_scanned:292 all_unreclaimable? no
[  731.874363] lowmem_reserve[]: 0 0
[  731.877685] DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4036kB
[  731.887669] Free swap:            0kB
[  731.893913] 16384 pages of RAM
[  731.896963] 798 reserved pages
[  731.900011] 10946 pages shared
[  731.903058] 0 pages swap cached
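
The figures in the 'm' output can be cross-checked with a little
arithmetic.  This is a small sketch in C, assuming 4 kB pages (16384
pages for 64M of RAM).  The helper names are made up for illustration.

```c
#include <stdio.h>

#define PAGE_KB 4UL   /* assumption: 64M of RAM / 16384 pages = 4 kB per page */

/* Convert a page count from the sysrq-m output to kilobytes. */
unsigned long pages_to_kb(unsigned long pages)
{
    return pages * PAGE_KB;
}

/* Return 1 when the zone's free memory is under its min watermark. */
int below_min_watermark(unsigned long free_kb, unsigned long min_kb)
{
    return free_kb < min_kb;
}

/* Print the derived values for the counters quoted in the report. */
void print_mem_summary(void)
{
    unsigned long free_kb  = pages_to_kb(1009);         /* free:1009 */
    unsigned long dirty_kb = pages_to_kb(7188 + 3330);  /* dirty + writeback */
    unsigned long total_kb = pages_to_kb(16384);        /* 16384 pages of RAM */

    printf("free: %lu kB, below min (4096 kB): %s\n",
           free_kb, below_min_watermark(free_kb, 4096) ? "yes" : "no");
    printf("dirty+writeback: %lu kB of %lu kB\n", dirty_kb, total_kb);
}
```

With the values above, free is 1009 pages = 4036 kB, just under the
4096 kB min watermark, while dirty plus writeback is 10518 pages, or
about 41 MB of the 64M total.  That is consistent with the traces in
which allocations go through try_to_free_pages() and wait in
congestion_wait(), although it does not by itself prove the cause.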

It seems like usb-storage and aio are completely off in the weeds.   
Ideas?

If you need any additional debug output let me know.

- k



* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-19 21:58   ` Kumar Gala
@ 2007-02-19 22:19     ` OGAWA Hirofumi
  2007-02-19 22:27       ` Kumar Gala
  0 siblings, 1 reply; 16+ messages in thread
From: OGAWA Hirofumi @ 2007-02-19 22:19 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Linux Kernel list

Kumar Gala <galak@kernel.crashing.org> writes:

>>> I usually run the following twice to get the hang state:
>>>
>>> time ./trunc_test bar 100000000 &
>>> time ./trunc_test baz 100000000 &
>>>
>>> I was wondering if anyone had any suggestions on what to poke at next
>>> to try and figure out what is going on.
>>
>> Can you check /sys/block/xxx/stat or something to make sure there is
>> no outstanding IO request?
>>
>> It looks as if there is no response from the lower layer...
>
> Once the system locks up I don't have any ability to do anything.

Ah, doesn't sysrq work either? If sysrq works, it can be used to see
the IO request state with a patch.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-19 22:19     ` OGAWA Hirofumi
@ 2007-02-19 22:27       ` Kumar Gala
  2007-02-20 17:20         ` OGAWA Hirofumi
  0 siblings, 1 reply; 16+ messages in thread
From: Kumar Gala @ 2007-02-19 22:27 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Linux Kernel list


On Feb 19, 2007, at 4:19 PM, OGAWA Hirofumi wrote:

> Kumar Gala <galak@kernel.crashing.org> writes:
>
>>>> I usually run the following twice to get the hang state:
>>>>
>>>> time ./trunc_test bar 100000000 &
>>>> time ./trunc_test baz 100000000 &
>>>>
>>>> I was wondering if anyone had any suggestions on what to poke at  
>>>> next
>>>> to try and figure out what is going on.
>>>
>>> Can you check /sys/block/xxx/stat or something to make sure there is
>>> no outstanding IO request?
>>>
>>> It looks as if there is no response from the lower layer...
>>
>> Once the system locks up I don't have any ability to do anything.
>
> Ah, doesn't sysrq work either? If sysrq works, it can be used to see
> the IO request state with a patch.

Yeah, got sysrq working today.  If you can point me at the patch I'm
happy to apply it and get data.

- k



* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-19 22:27       ` Kumar Gala
@ 2007-02-20 17:20         ` OGAWA Hirofumi
  0 siblings, 0 replies; 16+ messages in thread
From: OGAWA Hirofumi @ 2007-02-20 17:20 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Linux Kernel list

[-- Attachment #1: Type: text/plain, Size: 593 bytes --]

Kumar Gala <galak@kernel.crashing.org> writes:

> On Feb 19, 2007, at 4:19 PM, OGAWA Hirofumi wrote:
>
>> Kumar Gala <galak@kernel.crashing.org> writes:
>>
>>> Once the system locks up I don't have any ability to do anything.
>>
>> Ah, doesn't sysrq work either? If sysrq works, it can be used to see
>> the IO request state with a patch.
>
> Yeah, got sysrq working today.  If you can point me at the patch I'm
> happy to apply it and get data.

OK, please try the attached patch. I hope it helps you.
BTW, the new sysrq key is sysrq-j; it shows the disk stats.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


[-- Attachment #2: debug-block.patch --]
[-- Type: text/x-diff, Size: 2821 bytes --]



Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---

 block/genhd.c        |   27 +++++++++++++++++++++++++++
 drivers/char/sysrq.c |   15 ++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff -puN drivers/char/sysrq.c~debug-block drivers/char/sysrq.c
--- linux-2.6/drivers/char/sysrq.c~debug-block	2007-02-21 00:58:35.000000000 +0900
+++ linux-2.6-hirofumi/drivers/char/sysrq.c	2007-02-21 02:02:52.000000000 +0900
@@ -311,6 +311,19 @@ static struct sysrq_key_op sysrq_kill_op
 	.enable_mask	= SYSRQ_ENABLE_SIGNAL,
 };
 
+extern void block_req_callback(struct work_struct *ignored);
+static DECLARE_WORK(block_req_work, block_req_callback);
+static void sysrq_handle_block_req(int key, struct tty_struct *tty)
+{
+	schedule_work(&block_req_work);
+}
+static struct sysrq_key_op sysrq_block_req_op = {
+	.handler	= sysrq_handle_block_req,
+	.help_msg	= "block req (j)",
+	.action_msg	= "Block Req",
+	.enable_mask	= SYSRQ_ENABLE_DUMP,
+};
+
 static void sysrq_handle_unrt(int key, struct tty_struct *tty)
 {
 	normalize_rt_tasks();
@@ -351,7 +364,7 @@ static struct sysrq_key_op *sysrq_key_ta
 	NULL,				/* g */
 	NULL,				/* h */
 	&sysrq_kill_op,			/* i */
-	NULL,				/* j */
+	&sysrq_block_req_op,		/* j */
 	&sysrq_SAK_op,			/* k */
 	NULL,				/* l */
 	&sysrq_showmem_op,		/* m */
diff -puN block/genhd.c~debug-block block/genhd.c
--- linux-2.6/block/genhd.c~debug-block	2007-02-21 01:02:13.000000000 +0900
+++ linux-2.6-hirofumi/block/genhd.c	2007-02-21 02:15:56.000000000 +0900
@@ -555,6 +555,33 @@ static struct kset_uevent_ops block_ueve
 
 decl_subsys(block, &ktype_block, &block_uevent_ops);
 
+void block_req_callback(struct work_struct *ignored)
+{
+	struct gendisk *gp;
+	char buf[BDEVNAME_SIZE];
+
+	mutex_lock(&block_subsys_lock);
+	list_for_each_entry(gp, &block_subsys.kset.list, kobj.entry) {
+		printk("%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u:"
+		       " %u %u %u\n",
+		       gp->major, gp->first_minor, disk_name(gp, 0, buf),
+		       disk_stat_read(gp, ios[0]),
+		       disk_stat_read(gp, merges[0]),
+		       (unsigned long long)disk_stat_read(gp, sectors[0]),
+		       jiffies_to_msecs(disk_stat_read(gp, ticks[0])),
+		       disk_stat_read(gp, ios[1]),
+		       disk_stat_read(gp, merges[1]),
+		       (unsigned long long)disk_stat_read(gp, sectors[1]),
+		       jiffies_to_msecs(disk_stat_read(gp, ticks[1])),
+		       gp->in_flight,
+		       jiffies_to_msecs(disk_stat_read(gp, io_ticks)),
+		       jiffies_to_msecs(disk_stat_read(gp, time_in_queue)),
+		       gp->queue->rq.count[0], gp->queue->rq.count[1],
+		       gp->queue->in_flight);
+	}
+	mutex_unlock(&block_subsys_lock);
+}
+
 /*
  * aggregate disk stat collector.  Uses the same stats that the sysfs
  * entries do, above, but makes them available through one seq_file.
_


* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-19 22:06   ` Kumar Gala
@ 2007-02-21 20:18     ` OGAWA Hirofumi
  2007-02-21 20:57       ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: OGAWA Hirofumi @ 2007-02-21 20:18 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Linux Kernel list, Andrew Morton, Greg KH

Kumar Gala <galak@kernel.crashing.org> writes:

>>> I usually run the following twice to get the hang state:
>>>
>>> time ./trunc_test bar 100000000 &
>>> time ./trunc_test baz 100000000 &
>>>
>>> I was wondering if anyone had any suggestions on what to poke at next
>>> to try and figure out what is going on.
>
> So I realized I could use sysrq to provide some more debug  
> information.  When the system locks up I get the following output  
> from 't'
>
> [  497.499249] usb-storage   D 00000000     0   671      5           773   670 (L-TLB)
> [  497.506930] Call Trace:
> [  497.509372] [C3F35A60] [C00083AC] __switch_to+0x28/0x40
> [  497.514608] [C3F35A80] [C01F4B78] schedule+0x324/0x6bc
> [  497.519756] [C3F35AC0] [C01F5D6C] schedule_timeout+0x6c/0xd0
> [  497.525426] [C3F35B00] [C01F5C9C] io_schedule_timeout+0x30/0x54
> [  497.531356] [C3F35B20] [C0050DE4] congestion_wait+0x64/0x8c
> [  497.536941] [C3F35B70] [C004A9F0] throttle_vm_writeout+0x1c/0x84
> [  497.542958] [C3F35B90] [C004F33C] shrink_zone+0xbb0/0xfe4
> [  497.548367] [C3F35D40] [C004F8F4] try_to_free_pages+0x184/0x2cc
> [  497.554298] [C3F35DB0] [C0049AA8] __alloc_pages+0x110/0x2c0
> [  497.559878] [C3F35E00] [C0060F84] cache_alloc_refill+0x394/0x694
> [  497.565900] [C3F35E30] [C00614A0] __kmalloc+0xc4/0xcc
> [  497.570961] [C3F35E40] [C01544D0] usb_alloc_urb+0x1c/0x5c
> [  497.576371] [C3F35E50] [C015520C] usb_sg_init+0x1a0/0x2f8
> [  497.581779] [C3F35EA0] [C0167318] usb_stor_bulk_transfer_sg+0x8c/0x138
> [  497.588317] [C3F35ED0] [C0167960] usb_stor_Bulk_transport+0x140/0x310
> [  497.594767] [C3F35F00] [C0167DCC] usb_stor_invoke_transport+0x2c/0x344
> [  497.601303] [C3F35F50] [C0166B2C] usb_stor_transparent_scsi_command+0x10/0x20
> [  497.608449] [C3F35F60] [C0168498] usb_stor_control_thread+0x1f8/0x290
> [  497.614900] [C3F35FC0] [C0033E48] kthread+0xf4/0x130
> [  497.619876] [C3F35FF0] [C001093C] kernel_thread+0x44/0x60
> [  497.625285] sh            D 3009C7EC     0   718      1            

[...]

> and from 'm'
>
> [  731.834529] Show Memory
> [  731.836968] Mem-info:
> [  731.839234] DMA per-cpu:
> [  731.841768] CPU    0: Hot: hi:   18, btch:   3 usd:   3   Cold: hi:    6, btch:   1 usd:   2
> [  731.850206] Active:1510 inactive:11309 dirty:7188 writeback:3330 unstable:0 free:1009 slab:1671 mapped:110 pagetables:19
> [  731.861075] DMA free:4036kB min:4096kB low:5120kB high:6144kB active:6040kB inactive:45236kB present:65024kB pages_scanned:292 all_unreclaimable? no
> [  731.874363] lowmem_reserve[]: 0 0
> [  731.877685] DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4036kB
> [  731.887669] Free swap:            0kB
> [  731.893913] 16384 pages of RAM
> [  731.896963] 798 reserved pages
> [  731.900011] 10946 pages shared
> [  731.903058] 0 pages swap cached
>
> It seems like usb-storage and aio are completely off in the weeds.   
> Ideas?

It seems usb-storage should remove some kmalloc and use mempool() for
urb...  Is someone working on this? And idea?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-21 20:18     ` OGAWA Hirofumi
@ 2007-02-21 20:57       ` Andrew Morton
  2007-02-21 21:22         ` [linux-usb-devel] " Alan Stern
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2007-02-21 20:57 UTC (permalink / raw)
  To: OGAWA Hirofumi, linux-usb-devel, Pete Zaitcev
  Cc: Kumar Gala, Linux Kernel list, Greg KH

On Thu, 22 Feb 2007 05:18:45 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:

> Kumar Gala <galak@kernel.crashing.org> writes:
> 
> >>> I usually run the following twice to get the hang state:
> >>>
> >>> time ./trunc_test bar 100000000 &
> >>> time ./trunc_test baz 100000000 &
> >>>
> >>> I was wondering if anyone had any suggestions on what to poke at next
> >>> to try and figure out what is going on.
> >
> > So I realized I could use sysrq to provide some more debug  
> > information.  When the system locks up I get the following output  
> > from 't'
> >
> > [  497.499249] usb-storage   D 00000000     0   671      5           773   670 (L-TLB)
> > [  497.506930] Call Trace:
> > [  497.509372] [C3F35A60] [C00083AC] __switch_to+0x28/0x40
> > [  497.514608] [C3F35A80] [C01F4B78] schedule+0x324/0x6bc
> > [  497.519756] [C3F35AC0] [C01F5D6C] schedule_timeout+0x6c/0xd0
> > [  497.525426] [C3F35B00] [C01F5C9C] io_schedule_timeout+0x30/0x54
> > [  497.531356] [C3F35B20] [C0050DE4] congestion_wait+0x64/0x8c
> > [  497.536941] [C3F35B70] [C004A9F0] throttle_vm_writeout+0x1c/0x84
> > [  497.542958] [C3F35B90] [C004F33C] shrink_zone+0xbb0/0xfe4
> > [  497.548367] [C3F35D40] [C004F8F4] try_to_free_pages+0x184/0x2cc
> > [  497.554298] [C3F35DB0] [C0049AA8] __alloc_pages+0x110/0x2c0
> > [  497.559878] [C3F35E00] [C0060F84] cache_alloc_refill+0x394/0x694
> > [  497.565900] [C3F35E30] [C00614A0] __kmalloc+0xc4/0xcc
> > [  497.570961] [C3F35E40] [C01544D0] usb_alloc_urb+0x1c/0x5c
> > [  497.576371] [C3F35E50] [C015520C] usb_sg_init+0x1a0/0x2f8
> > [  497.581779] [C3F35EA0] [C0167318] usb_stor_bulk_transfer_sg+0x8c/0x138
> > [  497.588317] [C3F35ED0] [C0167960] usb_stor_Bulk_transport+0x140/0x310
> > [  497.594767] [C3F35F00] [C0167DCC] usb_stor_invoke_transport+0x2c/0x344
> > [  497.601303] [C3F35F50] [C0166B2C] usb_stor_transparent_scsi_command+0x10/0x20
> > [  497.608449] [C3F35F60] [C0168498] usb_stor_control_thread+0x1f8/0x290
> > [  497.614900] [C3F35FC0] [C0033E48] kthread+0xf4/0x130
> > [  497.619876] [C3F35FF0] [C001093C] kernel_thread+0x44/0x60
> > [  497.625285] sh            D 3009C7EC     0   718      1            
> 
> [...]
> 
> > and from 'm'
> >
> > [  731.834529] Show Memory
> > [  731.836968] Mem-info:
> > [  731.839234] DMA per-cpu:
> > [  731.841768] CPU    0: Hot: hi:   18, btch:   3 usd:   3   Cold: hi:    6, btch:   1 usd:   2
> > [  731.850206] Active:1510 inactive:11309 dirty:7188 writeback:3330 unstable:0 free:1009 slab:1671 mapped:110 pagetables:19
> > [  731.861075] DMA free:4036kB min:4096kB low:5120kB high:6144kB active:6040kB inactive:45236kB present:65024kB pages_scanned:292 all_unreclaimable? no
> > [  731.874363] lowmem_reserve[]: 0 0
> > [  731.877685] DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4036kB
> > [  731.887669] Free swap:            0kB
> > [  731.893913] 16384 pages of RAM
> > [  731.896963] 798 reserved pages
> > [  731.900011] 10946 pages shared
> > [  731.903058] 0 pages swap cached
> >
> > It seems like usb-storage and aio are completely off in the weeds.   
> > Ideas?
> 
> It seems usb-storage should remove some kmalloc and use mempool() for
> urb...  Is someone working on this? And idea?

I think Pete said that we're supposed to be using GFP_NOIO in there.

Not that it'll help much: the VM calls throttle_vm_writeout() for GFP_NOIO
and GFP_NOFS allocations, which is a bug.  Because if the caller holds
locks which prevent filesystem or IO progress, we deadlock.

I'll fix the VM if someone else fixes USB ;)


* Re: [linux-usb-devel] 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-21 20:57       ` Andrew Morton
@ 2007-02-21 21:22         ` Alan Stern
  2007-02-21 21:31           ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Alan Stern @ 2007-02-21 21:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: OGAWA Hirofumi, linux-usb-devel, Pete Zaitcev, Greg KH,
	Kumar Gala, Linux Kernel list

On Wed, 21 Feb 2007, Andrew Morton wrote:

> > > It seems like usb-storage and aio are completely off in the weeds.   
> > > Ideas?
> > 
> > It seems usb-storage should remove some kmalloc and use mempool() for
> > urb...  Is someone working on this? And idea?
> 
> I think Pete said that we're supposed to be using GFP_NOIO in there.

We _are_ using it.

> Not that it'll help much: the VM calls throttle_vm_writeout() for GFP_NOIO
> and GFP_NOFS allocations, which is a bug.  Because if the caller holds
> locks which prevent filesystem or IO progress, we deadlock.
> 
> I'll fix the VM if someone else fixes USB ;)

What else needs to be fixed?

Alan Stern



* Re: [linux-usb-devel] 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-21 21:22         ` [linux-usb-devel] " Alan Stern
@ 2007-02-21 21:31           ` Andrew Morton
  2007-02-21 21:50             ` Alan Stern
  2007-02-22  7:40             ` Kumar Gala
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2007-02-21 21:31 UTC (permalink / raw)
  To: Alan Stern
  Cc: OGAWA Hirofumi, linux-usb-devel, Pete Zaitcev, Greg KH,
	Kumar Gala, Linux Kernel list

On Wed, 21 Feb 2007 16:22:17 -0500 (EST)
Alan Stern <stern@rowland.harvard.edu> wrote:

> On Wed, 21 Feb 2007, Andrew Morton wrote:
> 
> > > > It seems like usb-storage and aio are completely off in the weeds.   
> > > > Ideas?
> > > 
> > > It seems usb-storage should remove some kmalloc and use mempool() for
> > > urb...  Is someone working on this? And idea?
> > 
> > I think Pete said that we're supposed to be using GFP_NOIO in there.
> 
> We _are_ using it.

How admirably prompt.

> > Not that it'll help much: the VM calls throttle_vm_writeout() for GFP_NOIO
> > and GFP_NOFS allocations, which is a bug.  Because if the caller holds
> > locks which prevent filesystem or IO progress, we deadlock.
> > 
> > I'll fix the VM if someone else fixes USB ;)
> 
> What else needs to be fixed?

Would be nice if someone can confirm that this fixes it:



From: Andrew Morton <akpm@linux-foundation.org>

throttle_vm_writeout() is designed to wait for the dirty levels to subside. 
But if the caller holds IO or FS locks, we might be holding up that writeout.

So change it to take a single nap to give other devices a chance to clean some
memory, then return.

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/writeback.h |    2 +-
 mm/page-writeback.c       |   13 +++++++++++--
 mm/vmscan.c               |    2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff -puN mm/vmscan.c~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations mm/vmscan.c
--- a/mm/vmscan.c~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations
+++ a/mm/vmscan.c
@@ -952,7 +952,7 @@ static unsigned long shrink_zone(int pri
 		}
 	}
 
-	throttle_vm_writeout();
+	throttle_vm_writeout(sc->gfp_mask);
 
 	atomic_dec(&zone->reclaim_in_progress);
 	return nr_reclaimed;
diff -puN mm/page-writeback.c~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations mm/page-writeback.c
--- a/mm/page-writeback.c~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations
+++ a/mm/page-writeback.c
@@ -296,11 +296,21 @@ void balance_dirty_pages_ratelimited_nr(
 }
 EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
 
-void throttle_vm_writeout(void)
+void throttle_vm_writeout(gfp_t gfp_mask)
 {
 	long background_thresh;
 	long dirty_thresh;
 
+	if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
+		/*
+		 * The caller might hold locks which can prevert IO completion
+		 * or progress in the filesystem.  So we cannot just sit here
+		 * waiting for IO to complete.
+		 */
+		congestion_wait(WRITE, HZ/10);
+		return;
+	}
+
         for ( ; ; ) {
 		get_dirty_limits(&background_thresh, &dirty_thresh, NULL);
 
@@ -317,7 +327,6 @@ void throttle_vm_writeout(void)
         }
 }
 
-
 /*
  * writeback at least _min_pages, and keep writing until the amount of dirty
  * memory is less than the background threshold, or until we're all clean.
diff -puN include/linux/writeback.h~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations include/linux/writeback.h
--- a/include/linux/writeback.h~throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations
+++ a/include/linux/writeback.h
@@ -84,7 +84,7 @@ static inline void wait_on_inode(struct 
 int wakeup_pdflush(long nr_pages);
 void laptop_io_completion(void);
 void laptop_sync_completion(void);
-void throttle_vm_writeout(void);
+void throttle_vm_writeout(gfp_t gfp_mask);
 
 /* These are exported to sysctl. */
 extern int dirty_background_ratio;
_



* Re: [linux-usb-devel] 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-21 21:31           ` Andrew Morton
@ 2007-02-21 21:50             ` Alan Stern
  2007-02-21 22:54               ` Andrew Morton
  2007-02-22  7:40             ` Kumar Gala
  1 sibling, 1 reply; 16+ messages in thread
From: Alan Stern @ 2007-02-21 21:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: OGAWA Hirofumi, linux-usb-devel, Pete Zaitcev, Greg KH,
	Kumar Gala, Linux Kernel list

On Wed, 21 Feb 2007, Andrew Morton wrote:

> On Wed, 21 Feb 2007 16:22:17 -0500 (EST)
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > On Wed, 21 Feb 2007, Andrew Morton wrote:
> > 
> > > > > It seems like usb-storage and aio are completely off in the weeds.   
> > > > > Ideas?
> > > > 
> > > > It seems usb-storage should remove some kmalloc and use mempool() for
> > > > urb...  Is someone working on this? And idea?
> > > 
> > > I think Pete said that we're supposed to be using GFP_NOIO in there.
> > 
> > We _are_ using it.
> 
> How admirably prompt.

Shucks, we've been using it for years...

> > > Not that it'll help much: the VM calls throttle_vm_writeout() for GFP_NOIO
> > > and GFP_NOFS allocations, which is a bug.  Because if the caller holds
> > > locks which prevent filesystem or IO progress, we deadlock.
> > > 
> > > I'll fix the VM if someone else fixes USB ;)
> > 
> > What else needs to be fixed?
> 
> Would be nice if someone can confirm that this fixes it:

Not having experienced the problem, I can't confirm the fix.  However...

> +	if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {

Is that really the correct test?  I don't know enough about the memory 
management subsystem to say one way or the other.  What's special about 
having both flags set?

> +		/*
> +		 * The caller might hold locks which can prevert IO completion
--------------------------------------------------------------^  Typo

Although perhaps "prevert" is an acceptable neologism in this context.

> +		 * or progress in the filesystem.  So we cannot just sit here
> +		 * waiting for IO to complete.
> +		 */

Alan Stern



* Re: [linux-usb-devel] 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-21 21:50             ` Alan Stern
@ 2007-02-21 22:54               ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2007-02-21 22:54 UTC (permalink / raw)
  To: Alan Stern
  Cc: OGAWA Hirofumi, linux-usb-devel, Pete Zaitcev, Greg KH,
	Kumar Gala, Linux Kernel list

On Wed, 21 Feb 2007 16:50:23 -0500 (EST)
Alan Stern <stern@rowland.harvard.edu> wrote:

> > +	if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> 
> Is that really the correct test?  I don't know enough about the memory 
> management subsystem to say one way or the other.  What's special about 
> having both flags set?

yup.  We're saying "if the caller is unable to take either IO locks or FS
locks, don't wait on FS or IO completion".

ie: don't wait on writeout progress unless we know that both the IO system
and the FS are able to make progress.
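
A minimal user-space sketch of the test described above, not the kernel
code itself.  The DEMO_GFP_* bit values and the helper names are
placeholders invented for illustration; the real flags are defined in
include/linux/gfp.h.  The helper returns true only when both bits are
set, which is the only case in which the patched throttle_vm_writeout()
goes on to its waiting loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, for illustration only (assumption). */
#define DEMO_GFP_IO 0x40u
#define DEMO_GFP_FS 0x80u

/*
 * Hypothetical helper: true when the caller allowed both IO and FS
 * activity.  The patch takes its early "single nap" return exactly
 * when this would be false.
 */
static bool demo_may_wait_for_writeout(unsigned int gfp_mask)
{
	const unsigned int both = DEMO_GFP_FS | DEMO_GFP_IO;

	return (gfp_mask & both) == both;
}

/* Walk the four combinations of the two bits. */
static void demo_check_masks(void)
{
	assert(demo_may_wait_for_writeout(DEMO_GFP_FS | DEMO_GFP_IO));
	assert(!demo_may_wait_for_writeout(DEMO_GFP_FS));  /* IO not allowed */
	assert(!demo_may_wait_for_writeout(DEMO_GFP_IO));  /* FS not allowed */
	assert(!demo_may_wait_for_writeout(0u));           /* neither allowed */
}
```

Calling demo_check_masks() from a small main() exercises all four
cases; a mask with only one of the two bits set never reaches the loop.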


* Re: [linux-usb-devel] 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-21 21:31           ` Andrew Morton
  2007-02-21 21:50             ` Alan Stern
@ 2007-02-22  7:40             ` Kumar Gala
  2007-02-22 18:20               ` Kumar Gala
  1 sibling, 1 reply; 16+ messages in thread
From: Kumar Gala @ 2007-02-22  7:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Stern, OGAWA Hirofumi, linux-usb-devel, Pete Zaitcev,
	Greg KH, Linux Kernel list


On Feb 21, 2007, at 3:31 PM, Andrew Morton wrote:

> On Wed, 21 Feb 2007 16:22:17 -0500 (EST)
> Alan Stern <stern@rowland.harvard.edu> wrote:
>
>> On Wed, 21 Feb 2007, Andrew Morton wrote:
>>
>>>>> It seems like usb-storage and aio are completely off in the weeds.
>>>>> Ideas?
>>>>
>>>> It seems usb-storage should remove some kmalloc and use mempool() for
>>>> urb...  Is someone working on this? And idea?
>>>
>>> I think Pete said that we're supposed to be using GFP_NOIO in there.
>>
>> We _are_ using it.
>
> How admirably prompt.
>
>>> Not that it'll help much: the VM calls throttle_vm_writeout() for GFP_NOIO
>>> and GFP_NOFS allocations, which is a bug.  Because if the caller holds
>>> locks which prevent filesystem or IO progress, we deadlock.
>>>
>>> I'll fix the VM if someone else fixes USB ;)
>>
>> What else needs to be fixed?
>
> Would be nice if someone can confirm that this fixes it:

Doesn't seem to help my problem in a quick test, will get more data  
in the morning.

- k


* Re: [linux-usb-devel] 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-22  7:40             ` Kumar Gala
@ 2007-02-22 18:20               ` Kumar Gala
  2007-02-22 21:57                 ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Kumar Gala @ 2007-02-22 18:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Stern, OGAWA Hirofumi, USB development list, Pete Zaitcev,
	Greg KH, Linux Kernel list

>>>> Not that it'll help much: the VM calls throttle_vm_writeout() for GFP_NOIO
>>>> and GFP_NOFS allocations, which is a bug.  Because if the caller holds
>>>> locks which prevent filesystem or IO progress, we deadlock.
>>>>
>>>> I'll fix the VM if someone else fixes USB ;)
>>>
>>> What else needs to be fixed?
>>
>> Would be nice if someone can confirm that this fixes it:
>
> Doesn't seem to help my problem in a quick test, will get more data  
> in the morning.
Well, I didn't realize the patch you sent via mm-commits and the one  
here are actually different.  I noticed that mm-commits one has:

+       if ((gfp_mask & (__GFP_FS|__GFP_IO)) != __GFP_FS|__GFP_IO) {

vs

+       if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {

The second seems to make more sense.  I tested with the first last  
night which didn't help.
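
A small user-space sketch of why the two lines differ, assuming
placeholder bit values (the DEMO_GFP_* constants and function names are
made up for illustration and are not the kernel's definitions).  In C,
!= binds more tightly than |, so the first form is grouped as
((mask & both) != FS) | IO rather than as a comparison against both
bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, for illustration only (assumption). */
#define DEMO_GFP_IO 0x40u
#define DEMO_GFP_FS 0x80u

/*
 * The first variant, written with the grouping C applies to
 *   (mask & (FS|IO)) != FS|IO
 * The trailing "| IO" makes the result nonzero for every mask.
 */
static bool demo_test_as_parsed(unsigned int gfp_mask)
{
	const unsigned int both = DEMO_GFP_FS | DEMO_GFP_IO;

	return (((gfp_mask & both) != DEMO_GFP_FS) | DEMO_GFP_IO) != 0;
}

/* The second variant: compare against both bits together. */
static bool demo_test_intended(unsigned int gfp_mask)
{
	const unsigned int both = DEMO_GFP_FS | DEMO_GFP_IO;

	return (gfp_mask & both) != both;
}
```

With both bits set, demo_test_intended() is false while
demo_test_as_parsed() is still true, so the two variants disagree
exactly for callers that allow both IO and FS activity.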

With the proper patch in place things look good.  Is this a candidate  
for 2.6.20-stable?

- k


* Re: [linux-usb-devel] 2.6.20 kernel hang with USB drive and vfat doing ftruncate
  2007-02-22 18:20               ` Kumar Gala
@ 2007-02-22 21:57                 ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2007-02-22 21:57 UTC (permalink / raw)
  To: Kumar Gala
  Cc: stern, hirofumi, linux-usb-devel, zaitcev, gregkh, linux-kernel

> On Thu, 22 Feb 2007 12:20:06 -0600 Kumar Gala <galak@kernel.crashing.org> wrote:
> +       if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> 
> The second seems to make more sense.  I tested with the first last  
> night which didn't help.
> 
> With the proper patch in place things look good.  Is this a candidate  
> for 2.6.20-stable?

I suppose so, yes.


end of thread, other threads:[~2007-02-22 21:57 UTC | newest]

Thread overview: 16+ messages
2007-02-16 19:54 2.6.20 kernel hang with USB drive and vfat doing ftruncate Kumar Gala
2007-02-18 16:10 ` OGAWA Hirofumi
2007-02-19 21:58   ` Kumar Gala
2007-02-19 22:19     ` OGAWA Hirofumi
2007-02-19 22:27       ` Kumar Gala
2007-02-20 17:20         ` OGAWA Hirofumi
2007-02-19 22:06   ` Kumar Gala
2007-02-21 20:18     ` OGAWA Hirofumi
2007-02-21 20:57       ` Andrew Morton
2007-02-21 21:22         ` [linux-usb-devel] " Alan Stern
2007-02-21 21:31           ` Andrew Morton
2007-02-21 21:50             ` Alan Stern
2007-02-21 22:54               ` Andrew Morton
2007-02-22  7:40             ` Kumar Gala
2007-02-22 18:20               ` Kumar Gala
2007-02-22 21:57                 ` Andrew Morton
