LKML Archive on lore.kernel.org
* Re: xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3
@ 2004-05-27  8:09 dag
  2004-05-27  8:18 ` Christoph Hellwig
  2004-05-27 10:05 ` Nathan Scott
  0 siblings, 2 replies; 7+ messages in thread
From: dag @ 2004-05-27  8:09 UTC (permalink / raw)
  To: nathans; +Cc: linux-kernel, linux-xfs

One failure, one success, one question  :-)

On Thu, 27 May 2004 15:58:29 +1000, Nathan Scott wrote:

> 
> On Wed, May 26, 2004 at 09:13:14AM -0700, dag@bakke.com wrote:
> > 
> > I experience hangs with xfsdump, when dumping my rootfs to a USB 2.0
> > ...
> > http://thaifood.homeip.net/xfsdumphang/xfsdump.dmesg.txt 
> > ...
> 
> The xfsdump stack trace in there is the important one.
> Can you try this patch and let me know how it goes?
> 
> --- fs/xfs/linux/xfs_buf.c.orig	2004-05-27 14:06:59.992936144 +1000
> +++ fs/xfs/linux/xfs_buf.c	2004-05-27 14:08:21.548537808 +1000
> @@ -370,8 +370,12 @@
>  	      retry:
>  		page = find_or_create_page(mapping, first + i, gfp_mask);
>  		if (unlikely(page == NULL)) {
> -			if (flags & PBF_READ_AHEAD)
> +			if (flags & PBF_READ_AHEAD) {
> +				for (--i; i >= 0; i--)
> +					page_cache_release(bp->pb_pages[i]);
> +				_pagebuf_free_pages(bp);
>  				return -ENOMEM;
> +			}
>  
>  			/*
>  			 * This could deadlock.

xfsdump completes "successfully", but prematurely. And I get an oops.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c02381b3
*pde = 00000000
Oops: 0000 [#1]
PREEMPT 
Modules linked in: 3c589_cs
CPU:    0
EIP:    0060:[<c02381b3>]    Not tainted
EFLAGS: 00010206   (2.6.7-rc1-bk3) 
EIP is at _pagebuf_lookup_pages+0x2b3/0x2e0
eax: ffffffff   ebx: 0000ffff   ecx: cba8db84   edx: 00000000
esi: cba8dac0   edi: 000389b4   ebp: 00002000   esp: cefa5cc8
ds: 007b   es: 007b   ss: 0068
Process xfsdump (pid: 5769, threadinfo=cefa4000 task=cf54b6f0)
Stack: cffa6afc 000389b4 00001200 00000000 cefa4000 00000000 00000000 000389b4 
       00000002 00001200 00000000 00001000 00000009 cffa6afc cba8dac0 cba8dac0 
       00019011 00000002 c023867b cba8dac0 00019011 00000000 00002000 00019011 
Call Trace:
 [<c023867b>] pagebuf_get+0x19b/0x1b0
 [<c01f05e1>] xfs_btree_reada_bufs+0x71/0x90
 [<c0217668>] xfs_bulkstat+0xd18/0x1000
 [<c023c43b>] xfs_ioc_bulkstat+0x10b/0x1f0
 [<c0216360>] xfs_bulkstat_one+0x0/0x5f0
 [<c023c0bd>] xfs_ioctl+0x68d/0x840
 [<c0105865>] setup_rt_frame+0x1b5/0x2d0
 [<c022f9f7>] xfs_inactive_free_eofblocks+0x107/0x2d0
 [<c01657e1>] dput+0x31/0x220
 [<c023acad>] linvfs_ioctl+0x3d/0x70
 [<c0105865>] setup_rt_frame+0x1b5/0x2d0
 [<c0105865>] setup_rt_frame+0x1b5/0x2d0
 [<c0160d60>] sys_ioctl+0x100/0x270
 [<c0105865>] setup_rt_frame+0x1b5/0x2d0
 [<c014e1d1>] sys_close+0x61/0xa0
 [<c0105d59>] sysenter_past_esp+0x52/0x71
 [<c0105865>] setup_rt_frame+0x1b5/0x2d0

Code: 8b 02 f6 c4 08 75 f0 8b 42 04 40 74 14 83 42 04 ff 0f 98 c0 
 
xfsrestore: restore complete: 403 seconds elapsed
xfsrestore: Restore Status: SUCCESS

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda3             3.3G  2.0G  1.3G  62% /
/dev/scsi/host0/bus0/target0/lun0/part3
                      9.4G  562M  8.8G   6% /mnt/target



But: this patch from hch Works For Me:

--- 1.111/fs/xfs/linux/xfs_buf.c	2004-04-28 06:45:14 +02:00
+++ edited/fs/xfs/linux/xfs_buf.c	2004-05-26 18:58:14 +02:00
@@ -370,8 +370,12 @@
 	      retry:
 		page = find_or_create_page(mapping, first + i, gfp_mask);
 		if (unlikely(page == NULL)) {
-			if (flags & PBF_READ_AHEAD)
+			if (flags & PBF_READ_AHEAD) {
+				bp->pb_page_count = i;
+				for (i = 0; i < bp->pb_page_count; i++)
+					unlock_page(bp->pb_pages[i]);
 				return -ENOMEM;
+			}
 
 			/*
 			 * This could deadlock.


Tested two dumps now, and both complete successfully. And for real.
I have yet to boot on the new root fs, though. :-)

The one remaining question is: why does xfsrestore print
xfsrestore: WARNING: open_by_handle of mnt failed:Bad file descriptor
xfsrestore: WARNING: open_by_handle of bin failed:Bad file descriptor
xfsrestore: WARNING: open_by_handle of dev/rd failed:Bad file descriptor
xfsrestore: WARNING: open_by_handle of dev/ida failed:Bad file descriptor
xfsrestore: WARNING: open_by_handle of dev failed:Bad file descriptor
xfsrestore: WARNING: open_by_handle of sys failed:Bad file descriptor
xfsrestore: WARNING: open_by_handle of tftpboot failed:Bad file descriptor

etc. etc. for what appears to be every directory in the source fs? This
is at the end of the dump, just prior to the 

xfsrestore: restore complete: 403 seconds elapsed
xfsrestore: Restore Status: SUCCESS

message?


* Re: xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3
  2004-05-27  8:09 xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3 dag
@ 2004-05-27  8:18 ` Christoph Hellwig
  2004-05-27 10:05 ` Nathan Scott
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2004-05-27  8:18 UTC (permalink / raw)
  To: dag; +Cc: nathans, linux-kernel, linux-xfs

My patch still wasn't complete: you're still leaking pages, just not
locked ones. This patch should be better, and I'll check it in in a few
minutes:


--- 1.111/fs/xfs/linux/xfs_buf.c	2004-04-28 06:45:14 +02:00
+++ edited/fs/xfs/linux/xfs_buf.c	2004-05-27 08:38:46 +02:00
@@ -359,6 +359,7 @@
 	error = _pagebuf_get_pages(bp, page_count, flags);
 	if (unlikely(error))
 		return error;
+	bp->pb_flags |= _PBF_PAGE_CACHE;
 
 	offset = bp->pb_offset;
 	first = bp->pb_file_offset >> PAGE_CACHE_SHIFT;
@@ -370,8 +371,12 @@
 	      retry:
 		page = find_or_create_page(mapping, first + i, gfp_mask);
 		if (unlikely(page == NULL)) {
-			if (flags & PBF_READ_AHEAD)
+			if (flags & PBF_READ_AHEAD) {
+				bp->pb_page_count = i;
+				for (i = 0; i < bp->pb_page_count; i++)
+					unlock_page(bp->pb_pages[i]);
 				return -ENOMEM;
+			}
 
 			/*
 			 * This could deadlock.
@@ -426,8 +431,6 @@
 		for (i = 0; i < bp->pb_page_count; i++)
 			unlock_page(bp->pb_pages[i]);
 	}
-
-	bp->pb_flags |= _PBF_PAGE_CACHE;
 
 	if (page_count) {
 		/* if we have any uptodate pages, mark that in the buffer */
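
The patch above also moves the _PBF_PAGE_CACHE assignment ahead of the
page lookup loop. A plausible reading, which the thread does not spell
out, is that the teardown path consults that flag to decide whether the
buffer's pages need releasing, so the flag has to be set before any
early return. The user-space sketch below illustrates only that ordering
issue. Every name in it (flag_buf, FLAG_PAGE_CACHE, flag_buf_fill,
flag_buf_free, flag_live_pages) is hypothetical, and malloc() stands in
for the page cache.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PAGES 8
#define FLAG_PAGE_CACHE 0x1u    /* hypothetical: pages[] holds references to drop */

/* Hypothetical stand-in for a buffer; not the kernel's page buffer type. */
struct flag_buf {
    void *pages[MAX_PAGES];
    int page_count;             /* number of valid entries in pages[] */
    unsigned int flags;
};

static int flag_live_pages;     /* allocations not yet freed */

/*
 * Fill the buffer with `want` pages. `fail_at` simulates an allocation
 * failure at that index (-1 means never fail). `flag_first` selects
 * whether the flag is set before the loop (fixed ordering) or only
 * after it (old ordering).
 */
static int flag_buf_fill(struct flag_buf *bp, int want, int fail_at,
                         int flag_first)
{
    assert(want >= 0 && want <= MAX_PAGES);
    bp->page_count = 0;
    bp->flags = 0;

    if (flag_first)
        bp->flags |= FLAG_PAGE_CACHE;

    for (int i = 0; i < want; i++) {
        void *p = (i == fail_at) ? NULL : malloc(1);

        if (p == NULL)
            return -ENOMEM;     /* early exit with page_count == i */
        flag_live_pages++;
        bp->pages[i] = p;
        bp->page_count = i + 1;
    }

    bp->flags |= FLAG_PAGE_CACHE;
    return 0;
}

/* Teardown releases pages only when the flag says they are ours. */
static void flag_buf_free(struct flag_buf *bp)
{
    if (bp->flags & FLAG_PAGE_CACHE) {
        for (int i = 0; i < bp->page_count; i++) {
            free(bp->pages[i]);
            flag_live_pages--;
        }
    }
    bp->page_count = 0;
}
```

With flag_first == 0 and a failure at index 2, flag_buf_free() skips the
two pages already obtained, so they leak. With flag_first == 1 the same
failure is cleaned up completely.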


* Re: xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3
  2004-05-27  8:09 xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3 dag
  2004-05-27  8:18 ` Christoph Hellwig
@ 2004-05-27 10:05 ` Nathan Scott
  1 sibling, 0 replies; 7+ messages in thread
From: Nathan Scott @ 2004-05-27 10:05 UTC (permalink / raw)
  To: dag; +Cc: linux-kernel, linux-xfs

On Thu, May 27, 2004 at 01:09:46AM -0700, dag@bakke.com wrote:
> One failure, one success, one question  :-)
> ...
> But: this patch from hch Works For Me:
> 

Yep, use that final patch from Christoph; that's got all of
the bases covered.

> The one remaining question is: why does xfsrestore print
> xfsrestore: WARNING: open_by_handle of mnt failed:Bad file descriptor

That's familiar - I can't remember the exact cause anymore,
but I think a more recent xfsdump will solve that for you.

cheers.

-- 
Nathan


* Re: xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3
  2004-05-26 16:13 dag
  2004-05-26 20:37 ` Nathan Scott
@ 2004-05-27  5:58 ` Nathan Scott
  1 sibling, 0 replies; 7+ messages in thread
From: Nathan Scott @ 2004-05-27  5:58 UTC (permalink / raw)
  To: dag; +Cc: linux-kernel, linux-xfs

On Wed, May 26, 2004 at 09:13:14AM -0700, dag@bakke.com wrote:
> 
> I experience hangs with xfsdump, when dumping my rootfs to a USB 2.0
> ...
> http://thaifood.homeip.net/xfsdumphang/xfsdump.dmesg.txt 
> ...

The xfsdump stack trace in there is the important one.
Can you try this patch and let me know how it goes?

thanks.

-- 
Nathan


--- fs/xfs/linux/xfs_buf.c.orig	2004-05-27 14:06:59.992936144 +1000
+++ fs/xfs/linux/xfs_buf.c	2004-05-27 14:08:21.548537808 +1000
@@ -370,8 +370,12 @@
 	      retry:
 		page = find_or_create_page(mapping, first + i, gfp_mask);
 		if (unlikely(page == NULL)) {
-			if (flags & PBF_READ_AHEAD)
+			if (flags & PBF_READ_AHEAD) {
+				for (--i; i >= 0; i--)
+					page_cache_release(bp->pb_pages[i]);
+				_pagebuf_free_pages(bp);
 				return -ENOMEM;
+			}
 
 			/*
 			 * This could deadlock.


* Re: xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3
  2004-05-26 16:13 dag
@ 2004-05-26 20:37 ` Nathan Scott
  2004-05-27  5:58 ` Nathan Scott
  1 sibling, 0 replies; 7+ messages in thread
From: Nathan Scott @ 2004-05-26 20:37 UTC (permalink / raw)
  To: dag; +Cc: linux-kernel, linux-xfs

hi Dag,

On Wed, May 26, 2004 at 09:13:14AM -0700, dag@bakke.com wrote:
> 
> I experience hangs with xfsdump, when dumping my rootfs to a USB 2.0
> connected drive. The hangs are reproducible within 0.2-2 GB of dump, and
> always come together with one or two instances of :
> 
> pagebuf_get: failed to lookup pages
> 
>  xfssyncd      S C04F25E0     0   331      1           342   317 (L-TLB)
>  cfccbf9c 00000046 c1370610 c04f25e0 cfc31d60 c0238bec cfc31d98 c04fecd8 
>  00000031 00000000 cfccbfb0 00002773 37e96cbf 00000210 c13707b8 000a43c5 
>  cfccbfb0 00000000 00000000 c03d2ec3 cfccbfb0 000a43c5 00000000 c048e508 
>  Call Trace:
>  [<c0238bec>] pagebuf_rele+0x2c/0x120
>  [<c03d2ec3>] schedule_timeout+0x63/0xc0
>  [<c0121110>] process_timeout+0x0/0x10
>  [<c023f5e7>] xfssyncd+0x57/0xc0
>  [<c023f590>] xfssyncd+0x0/0xc0
>  [<c0103f4d>] kernel_thread_helper+0x5/0x18
>  
> Anyone?
> 

This looks like the result of an earlier error on the code
path at that initial warning there (known problem) - in the
current code there is a situation where we attempt metadata
readahead, cannot initialise an XFS buffer completely due to
low memory, but fail to correctly tear down that partially
created buffer when passing back the (recoverable) error.
We're working on a fix.

cheers.

-- 
Nathan
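
The failure mode described in this message (a buffer that is only
partly populated when an allocation fails, and is then not torn down
correctly) can be illustrated with a small user-space sketch. This is
not the XFS code. The names demo_page, demo_buf, demo_get_page,
demo_lookup_pages, demo_buf_free and demo_live_pages are all
hypothetical, malloc() stands in for the page cache, and the fail_at
parameter simulates low memory. The point it shows is that, on failure,
the buffer records exactly how many pages it holds and unlocks them, so
a later teardown releases those pages and nothing else.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PAGES 8

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct demo_page {
    int locked;
};

/* Hypothetical stand-in for a buffer that owns an array of pages. */
struct demo_buf {
    struct demo_page *pages[MAX_PAGES];
    int page_count;             /* number of valid entries in pages[] */
};

static int demo_live_pages;     /* pages allocated and not yet freed */

/* Return a locked page, or NULL at index fail_at to simulate low memory. */
static struct demo_page *demo_get_page(int index, int fail_at)
{
    struct demo_page *p;

    if (index == fail_at)
        return NULL;
    p = malloc(sizeof(*p));
    if (p != NULL) {
        p->locked = 1;
        demo_live_pages++;
    }
    return p;
}

/* Unlock every page the buffer currently records as valid. */
static void demo_unlock_all(struct demo_buf *bp)
{
    for (int i = 0; i < bp->page_count; i++)
        bp->pages[i]->locked = 0;
}

/*
 * Populate the buffer. On failure, page_count already equals the number
 * of pages obtained so far, so the caller's teardown sees a consistent,
 * partially filled buffer instead of stale or missing entries.
 */
static int demo_lookup_pages(struct demo_buf *bp, int want, int fail_at)
{
    assert(want >= 0 && want <= MAX_PAGES);
    bp->page_count = 0;

    for (int i = 0; i < want; i++) {
        struct demo_page *p = demo_get_page(i, fail_at);

        if (p == NULL) {
            demo_unlock_all(bp);    /* do not leave pages locked */
            return -ENOMEM;         /* recoverable error for the caller */
        }
        bp->pages[i] = p;
        bp->page_count = i + 1;
    }

    demo_unlock_all(bp);
    return 0;
}

/* Teardown: release exactly the pages the buffer says it holds. */
static void demo_buf_free(struct demo_buf *bp)
{
    for (int i = 0; i < bp->page_count; i++) {
        free(bp->pages[i]);
        bp->pages[i] = NULL;
        demo_live_pages--;
    }
    bp->page_count = 0;
}
```

A caller that gets -ENOMEM from demo_lookup_pages(&b, 4, 2) can pass the
same buffer to demo_buf_free(), which frees the two pages obtained
before the failure and leaves demo_live_pages at zero.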


* Re: xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3
@ 2004-05-26 17:01 dag
  0 siblings, 0 replies; 7+ messages in thread
From: dag @ 2004-05-26 17:01 UTC (permalink / raw)
  To: linux-kernel

On Wed, 26 May 2004 09:13:14 -0700 (PDT), dag@bakke.com wrote:

> 
> 
> I experience hangs with xfsdump, when dumping my rootfs to a USB 2.0
> connected drive. The hangs are reproducible within 0.2-2 GB of dump, 

Bah... ambiguity...
xfsdump hangs. Not the kernel. So it could quite possibly be a bug in
xfsdump. But the 

pagebuf_get: failed to lookup pages

message in syslog makes me think otherwise.


Dag B


* xfsdump hangs - 2.6.6 && 2.6.7-rc1-bk3
@ 2004-05-26 16:13 dag
  2004-05-26 20:37 ` Nathan Scott
  2004-05-27  5:58 ` Nathan Scott
  0 siblings, 2 replies; 7+ messages in thread
From: dag @ 2004-05-26 16:13 UTC (permalink / raw)
  To: linux-kernel


I experience hangs with xfsdump, when dumping my rootfs to a USB 2.0
connected drive. The hangs are reproducible within 0.2-2 GB of dump, and
always come together with one or two instances of :

pagebuf_get: failed to lookup pages

I do not know if this is a problem with xfs, ide, scsi, usb, VM or some
other area of the kernel. But it is reproducible with 2.6.6 + a few
select patches, and with plain 2.6.7-rc1-bk3.

I have collected sysrq-t, sysrq-p info. A snippet below.
If none of this explains the hang, maybe the gurus would like to point a
browser at:

http://thaifood.homeip.net/xfsdumphang/xfsdump.dmesg.txt 
http://thaifood.homeip.net/xfsdumphang/config-2.6.7-rc1-bk3
http://thaifood.homeip.net/xfsdumphang/lspci.txt
http://thaifood.homeip.net/xfsdumphang/lsusb.txt
 
 xfssyncd      S C04F25E0     0   331      1           342   317 (L-TLB)
 cfccbf9c 00000046 c1370610 c04f25e0 cfc31d60 c0238bec cfc31d98 c04fecd8 
 00000031 00000000 cfccbfb0 00002773 37e96cbf 00000210 c13707b8 000a43c5 
 cfccbfb0 00000000 00000000 c03d2ec3 cfccbfb0 000a43c5 00000000 c048e508 
 Call Trace:
 [<c0238bec>] pagebuf_rele+0x2c/0x120
 [<c03d2ec3>] schedule_timeout+0x63/0xc0
 [<c0121110>] process_timeout+0x0/0x10
 [<c023f5e7>] xfssyncd+0x57/0xc0
 [<c023f590>] xfssyncd+0x0/0xc0
 [<c0103f4d>] kernel_thread_helper+0x5/0x18
 
 usb-storage   S C04F2A88     0   342      1           343   331 (L-TLB)
 cfc09f4c 00000046 c13ff0d0 c04f2a88 0000020f 3ccbf196 00000000 c58bfcea 
 c58c004c 0000020f c13ff0d0 0000012d c58c004c 0000020f c1370238 c13b0f04 
 00000246 cfc08000 c1370090 c03d24c7 cfc08000 c13b0f0c 00000000 00000001 
 Call Trace:
 [<c03d24c7>] __down_interruptible+0xa7/0x140
 [<c0115e60>] default_wake_function+0x0/0x20
 [<c011555d>] wake_up_process+0x1d/0x30
 [<c03d2573>] __down_failed_interruptible+0x7/0xc
 [<c032eead>] .text.lock.usb+0x5/0x58
 [<c01158f7>] schedule_tail+0x17/0x50
 [<c0105c82>] ret_from_fork+0x6/0x14
 [<c032e150>] usb_stor_control_thread+0x0/0x280
 [<c032e150>] usb_stor_control_thread+0x0/0x280
 [<c0103f4d>] kernel_thread_helper+0x5/0x18
 
 scsi_eh_0     S C04F25E0     0   343      1           485   342 (L-TLB)
 cfab7f78 00000046 cfab96b0 c04f25e0 00000000 00000000 00000000 00000000 
 00000086 cfab7f7c c13ff650 000015c8 2850a2f5 00000184 cfab9858 cfab7fd4 
 00000246 cfab6000 cfab96b0 c03d24c7 cfab6000 cfab7fdc 00000000 00000001 
 Call Trace:
 [<c03d24c7>] __down_interruptible+0xa7/0x140
 [<c0115e60>] default_wake_function+0x0/0x20
 [<c03d2573>] __down_failed_interruptible+0x7/0xc
 [<c02ddca8>] .text.lock.scsi_error+0x41/0x49
 [<c02dd960>] scsi_error_handler+0x0/0x110
 [<c0103f4d>] kernel_thread_helper+0x5/0x18



A few more bits of info, as I have no idea where to start *): 
- the target filesystem is writeable after xfsdump hangs
- the USB2IDE chip is an ISD-300.
- the USB 2.0 controller is a NEC chip on a CardBus card.
- gcc 3.3, xfsdump 2.2.16

*) Yeah, I can start doing a binary search for a working kernel....



Anyone?


Dag B


