LKML Archive on lore.kernel.org
* 2.6.24-rc2 XFS nfsd hang
@ 2007-11-14  7:04 Chris Wedgwood
  2007-11-14  7:43 ` Benny Halevy
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Chris Wedgwood @ 2007-11-14  7:04 UTC (permalink / raw)
  To: linux-xfs; +Cc: LKML

With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
see a hang when accessing some NFS exported XFS filesystems.  Local
access to these filesystems ahead of time works without problems.

This does not occur with 2.6.23.1.  The filesystem does not appear to
be corrupt.


The call chain for the wedged process is:

    [ 1462.911256] nfsd          D ffffffff80547840  4760  2966      2
    [ 1462.911283]  ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610
    [ 1462.911322]  ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0
    [ 1462.911360]  ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
    [ 1462.911391] Call Trace:
    [ 1462.911417]  [<ffffffff8052e638>] __down+0xe9/0x101
    [ 1462.911437]  [<ffffffff8022cc80>] default_wake_function+0x0/0xe
    [ 1462.911458]  [<ffffffff8052e275>] __down_failed+0x35/0x3a
    [ 1462.911480]  [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
    [ 1462.911501]  [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
    [ 1462.911522]  [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
    [ 1462.911543]  [<ffffffff8035ad5b>] _xfs_buf_find+0x1ba/0x24d
    [ 1462.911564]  [<ffffffff8035ae48>] xfs_buf_get_flags+0x5a/0x14b
    [ 1462.911586]  [<ffffffff8035b490>] xfs_buf_read_flags+0x12/0x86
    [ 1462.911607]  [<ffffffff8034ecf6>] xfs_trans_read_buf+0x4c/0x2cf
    [ 1462.911629]  [<ffffffff803292be>] xfs_da_do_buf+0x41b/0x65b
    [ 1462.911652]  [<ffffffff80329568>] xfs_da_read_buf+0x24/0x29
    [ 1462.911673]  [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
    [ 1462.911694]  [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
    [ 1462.911717]  [<ffffffff8032c718>] xfs_dir2_block_lookup+0x15/0x8e
    [ 1462.911738]  [<ffffffff8032b8e1>] xfs_dir_lookup+0xd2/0x12c
    [ 1462.911761]  [<ffffffff8036d658>] submit_bio+0x10d/0x114
    [ 1462.911781]  [<ffffffff8034fb56>] xfs_dir_lookup_int+0x2c/0xc5
    [ 1462.911802]  [<ffffffff802507a2>] lockdep_init_map+0x90/0x495
    [ 1462.911823]  [<ffffffff80353436>] xfs_lookup+0x44/0x6f
    [ 1462.911843]  [<ffffffff8035e364>] xfs_vn_lookup+0x29/0x60
    [ 1462.915246]  [<ffffffff8028856c>] __lookup_hash+0xe5/0x109
    [ 1462.915267]  [<ffffffff802893dd>] lookup_one_len+0x41/0x4e
    [ 1462.915289]  [<ffffffff80303d05>] compose_entry_fh+0xc1/0x117
    [ 1462.915311]  [<ffffffff80303f4c>] encode_entry+0x17c/0x38b
    [ 1462.915333]  [<ffffffff80261e4e>] find_or_create_page+0x3f/0xc9
    [ 1462.915355]  [<ffffffff8035a2c0>] _xfs_buf_lookup_pages+0x2c1/0x2f6
    [ 1462.915377]  [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
    [ 1462.915399]  [<ffffffff8027e632>] cache_alloc_refill+0x1ba/0x4b9
    [ 1462.915424]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
    [ 1462.915448]  [<ffffffff8030416b>] nfs3svc_encode_entry_plus+0x10/0x13
    [ 1462.915469]  [<ffffffff8032c67c>] xfs_dir2_block_getdents+0x15b/0x1e2
    [ 1462.915491]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
    [ 1462.915514]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
    [ 1462.915534]  [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
    [ 1462.915557]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
    [ 1462.915579]  [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
    [ 1462.915599]  [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
    [ 1462.915619]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
    [ 1462.915642]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
    [ 1462.915663]  [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
    [ 1462.915686]  [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
    [ 1462.915706]  [<ffffffff805215cd>] svc_process+0x3f8/0x717
    [ 1462.915729]  [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
    [ 1462.915749]  [<ffffffff8020c648>] child_rip+0xa/0x12
    [ 1462.915769]  [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
    [ 1462.915792]  [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
    [ 1462.915812]  [<ffffffff8020c63e>] child_rip+0x0/0x12

Over time other processes pile up behind this.

    [ 1462.910728] nfsd          D ffffffffffffffff  5440  2965      2
    [ 1462.910769]  ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900
    [ 1462.910812]  ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0
    [ 1462.910852]  ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040
    [ 1462.910882] Call Trace:
    [ 1462.910909]  [<ffffffff802fbadf>] nfsd_permission+0x95/0xeb
    [ 1462.910931]  [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
    [ 1462.910950]  [<ffffffff8052d729>] mutex_lock_nested+0x165/0x27c
    [ 1462.910971]  [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
    [ 1462.910994]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
    [ 1462.911015]  [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
    [ 1462.911037]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
    [ 1462.911057]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
    [ 1462.911079]  [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
    [ 1462.911102]  [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
    [ 1462.911122]  [<ffffffff805215cd>] svc_process+0x3f8/0x717
    [ 1462.911143]  [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
    [ 1462.911165]  [<ffffffff8020c648>] child_rip+0xa/0x12
    [ 1462.911184]  [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
    [ 1462.911206]  [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
    [ 1462.911225]  [<ffffffff8020c63e>] child_rip+0x0/0x12


Any suggestions other than to bisect this?  (Bisection might be
painful as it crosses the x86-merge.)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14  7:04 2.6.24-rc2 XFS nfsd hang Chris Wedgwood
@ 2007-11-14  7:43 ` Benny Halevy
  2007-11-14 12:59   ` J. Bruce Fields
  2007-11-14 11:49 ` 2.6.24-rc2 XFS nfsd hang --- filldir change responsible? Chris Wedgwood
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Benny Halevy @ 2007-11-14  7:43 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-xfs, LKML, Christian Kujau, J. Bruce Fields

I wonder if this is a similar hang to what Christian was seeing here:
http://lkml.org/lkml/2007/11/13/319

Benny

On Nov. 14, 2007, 9:04 +0200, Chris Wedgwood <cw@f00f.org> wrote:
> With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> see a hang when accessing some NFS exported XFS filesystems.  Local
> access to these filesystems ahead of time works without problems.
> 
> This does not occur with 2.6.23.1.  The filesystem does not appear to
> be corrupt.
> 
> 
> The call chain for the wedged process is:
> 
>     [ 1462.911256] nfsd          D ffffffff80547840  4760  2966      2
>     [ 1462.911283]  ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610
>     [ 1462.911322]  ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0
>     [ 1462.911360]  ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
>     [ 1462.911391] Call Trace:
>     [ 1462.911417]  [<ffffffff8052e638>] __down+0xe9/0x101
>     [ 1462.911437]  [<ffffffff8022cc80>] default_wake_function+0x0/0xe
>     [ 1462.911458]  [<ffffffff8052e275>] __down_failed+0x35/0x3a
>     [ 1462.911480]  [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
>     [ 1462.911501]  [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
>     [ 1462.911522]  [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
>     [ 1462.911543]  [<ffffffff8035ad5b>] _xfs_buf_find+0x1ba/0x24d
>     [ 1462.911564]  [<ffffffff8035ae48>] xfs_buf_get_flags+0x5a/0x14b
>     [ 1462.911586]  [<ffffffff8035b490>] xfs_buf_read_flags+0x12/0x86
>     [ 1462.911607]  [<ffffffff8034ecf6>] xfs_trans_read_buf+0x4c/0x2cf
>     [ 1462.911629]  [<ffffffff803292be>] xfs_da_do_buf+0x41b/0x65b
>     [ 1462.911652]  [<ffffffff80329568>] xfs_da_read_buf+0x24/0x29
>     [ 1462.911673]  [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
>     [ 1462.911694]  [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
>     [ 1462.911717]  [<ffffffff8032c718>] xfs_dir2_block_lookup+0x15/0x8e
>     [ 1462.911738]  [<ffffffff8032b8e1>] xfs_dir_lookup+0xd2/0x12c
>     [ 1462.911761]  [<ffffffff8036d658>] submit_bio+0x10d/0x114
>     [ 1462.911781]  [<ffffffff8034fb56>] xfs_dir_lookup_int+0x2c/0xc5
>     [ 1462.911802]  [<ffffffff802507a2>] lockdep_init_map+0x90/0x495
>     [ 1462.911823]  [<ffffffff80353436>] xfs_lookup+0x44/0x6f
>     [ 1462.911843]  [<ffffffff8035e364>] xfs_vn_lookup+0x29/0x60
>     [ 1462.915246]  [<ffffffff8028856c>] __lookup_hash+0xe5/0x109
>     [ 1462.915267]  [<ffffffff802893dd>] lookup_one_len+0x41/0x4e
>     [ 1462.915289]  [<ffffffff80303d05>] compose_entry_fh+0xc1/0x117
>     [ 1462.915311]  [<ffffffff80303f4c>] encode_entry+0x17c/0x38b
>     [ 1462.915333]  [<ffffffff80261e4e>] find_or_create_page+0x3f/0xc9
>     [ 1462.915355]  [<ffffffff8035a2c0>] _xfs_buf_lookup_pages+0x2c1/0x2f6
>     [ 1462.915377]  [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
>     [ 1462.915399]  [<ffffffff8027e632>] cache_alloc_refill+0x1ba/0x4b9
>     [ 1462.915424]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.915448]  [<ffffffff8030416b>] nfs3svc_encode_entry_plus+0x10/0x13
>     [ 1462.915469]  [<ffffffff8032c67c>] xfs_dir2_block_getdents+0x15b/0x1e2
>     [ 1462.915491]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.915514]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.915534]  [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
>     [ 1462.915557]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.915579]  [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
>     [ 1462.915599]  [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
>     [ 1462.915619]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.915642]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
>     [ 1462.915663]  [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
>     [ 1462.915686]  [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
>     [ 1462.915706]  [<ffffffff805215cd>] svc_process+0x3f8/0x717
>     [ 1462.915729]  [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
>     [ 1462.915749]  [<ffffffff8020c648>] child_rip+0xa/0x12
>     [ 1462.915769]  [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
>     [ 1462.915792]  [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
>     [ 1462.915812]  [<ffffffff8020c63e>] child_rip+0x0/0x12
> 
> Over time other processes pile up behind this.
> 
>     [ 1462.910728] nfsd          D ffffffffffffffff  5440  2965      2
>     [ 1462.910769]  ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900
>     [ 1462.910812]  ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0
>     [ 1462.910852]  ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040
>     [ 1462.910882] Call Trace:
>     [ 1462.910909]  [<ffffffff802fbadf>] nfsd_permission+0x95/0xeb
>     [ 1462.910931]  [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
>     [ 1462.910950]  [<ffffffff8052d729>] mutex_lock_nested+0x165/0x27c
>     [ 1462.910971]  [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
>     [ 1462.910994]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.911015]  [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
>     [ 1462.911037]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.911057]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
>     [ 1462.911079]  [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
>     [ 1462.911102]  [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
>     [ 1462.911122]  [<ffffffff805215cd>] svc_process+0x3f8/0x717
>     [ 1462.911143]  [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
>     [ 1462.911165]  [<ffffffff8020c648>] child_rip+0xa/0x12
>     [ 1462.911184]  [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
>     [ 1462.911206]  [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
>     [ 1462.911225]  [<ffffffff8020c63e>] child_rip+0x0/0x12
> 
> 
> Any suggestions other than to bisect this?  (Bisection might be
> painful as it crosses the x86-merge.)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* 2.6.24-rc2 XFS nfsd hang --- filldir change responsible?
  2007-11-14  7:04 2.6.24-rc2 XFS nfsd hang Chris Wedgwood
  2007-11-14  7:43 ` Benny Halevy
@ 2007-11-14 11:49 ` Chris Wedgwood
  2007-11-14 22:48   ` Christian Kujau
  2007-11-14 15:29 ` 2.6.24-rc2 XFS nfsd hang Christoph Hellwig
  2007-11-25 16:30 ` [PATCH] xfs: revert to double-buffering readdir Christoph Hellwig
  3 siblings, 1 reply; 35+ messages in thread
From: Chris Wedgwood @ 2007-11-14 11:49 UTC (permalink / raw)
  To: linux-xfs, Christoph Hellwig, David Chinner
  Cc: LKML, Benny Halevy, Christian Kujau

On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote:

> With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> see a hang when accessing some NFS exported XFS filesystems.  Local
> access to these filesystems ahead of time works without problems.
>
> This does not occur with 2.6.23.1.  The filesystem does not appear
> to be corrupt.

After some bisection pain (sg broken in the middle and XFS not
compiling in other places) the regression seems to be:

    commit 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b
    Author: Christoph Hellwig <hch@infradead.org>
    Date:   Tue Aug 28 13:58:24 2007 +1000

	[XFS] use filldir internally

There have been a lot of changes since this so reverting it and
retesting as-is won't work. I'll have to see what I can come up with
after some sleep.

I'm not building/testing with dmapi --- perhaps that makes a
difference here?  I would think it would have broken with xfsqa but
the number of bug reports seems small so far.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14  7:43 ` Benny Halevy
@ 2007-11-14 12:59   ` J. Bruce Fields
  2007-11-14 22:31     ` Christian Kujau
  0 siblings, 1 reply; 35+ messages in thread
From: J. Bruce Fields @ 2007-11-14 12:59 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Chris Wedgwood, linux-xfs, LKML, Christian Kujau

On Wed, Nov 14, 2007 at 09:43:40AM +0200, Benny Halevy wrote:
> I wonder if this is a similar hang to what Christian was seeing here:
> http://lkml.org/lkml/2007/11/13/319

Ah, thanks for noticing that.  Christian Kujau, is /data an xfs
partition?  There are a bunch of xfs commits in

^92d15c2ccbb3e31a3fc71ad28fdb55e1319383c0
^291702f017efdfe556cb87b8530eb7d1ff08cbae
^1d677a6dfaac1d1cf51a7f58847077240985faf2
^fba956c46a72f9e7503fd464ffee43c632307e31
^bbf25010f1a6b761914430f5fca081ec8c7accd1
6e800af233e0bdf108efb7bd23c11ea6fa34cdeb
7b1915a989ea4d426d0fd98974ab80f30ef1d779
c223701cf6c706f42840631c1ca919a18e6e2800
f77bf01425b11947eeb3b5b54685212c302741b8 

which was the range remaining for him to bisect.

--b.

> 
> Benny
> 
> On Nov. 14, 2007, 9:04 +0200, Chris Wedgwood <cw@f00f.org> wrote:
> > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> > see a hang when accessing some NFS exported XFS filesystems.  Local
> > access to these filesystems ahead of time works without problems.
> > 
> > This does not occur with 2.6.23.1.  The filesystem does not appear to
> > be corrupt.
> > 
> > 
> > The call chain for the wedged process is:
> > 
> >     [ 1462.911256] nfsd          D ffffffff80547840  4760  2966      2
> >     [ 1462.911283]  ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610
> >     [ 1462.911322]  ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0
> >     [ 1462.911360]  ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
> >     [ 1462.911391] Call Trace:
> >     [ 1462.911417]  [<ffffffff8052e638>] __down+0xe9/0x101
> >     [ 1462.911437]  [<ffffffff8022cc80>] default_wake_function+0x0/0xe
> >     [ 1462.911458]  [<ffffffff8052e275>] __down_failed+0x35/0x3a
> >     [ 1462.911480]  [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
> >     [ 1462.911501]  [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
> >     [ 1462.911522]  [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
> >     [ 1462.911543]  [<ffffffff8035ad5b>] _xfs_buf_find+0x1ba/0x24d
> >     [ 1462.911564]  [<ffffffff8035ae48>] xfs_buf_get_flags+0x5a/0x14b
> >     [ 1462.911586]  [<ffffffff8035b490>] xfs_buf_read_flags+0x12/0x86
> >     [ 1462.911607]  [<ffffffff8034ecf6>] xfs_trans_read_buf+0x4c/0x2cf
> >     [ 1462.911629]  [<ffffffff803292be>] xfs_da_do_buf+0x41b/0x65b
> >     [ 1462.911652]  [<ffffffff80329568>] xfs_da_read_buf+0x24/0x29
> >     [ 1462.911673]  [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
> >     [ 1462.911694]  [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
> >     [ 1462.911717]  [<ffffffff8032c718>] xfs_dir2_block_lookup+0x15/0x8e
> >     [ 1462.911738]  [<ffffffff8032b8e1>] xfs_dir_lookup+0xd2/0x12c
> >     [ 1462.911761]  [<ffffffff8036d658>] submit_bio+0x10d/0x114
> >     [ 1462.911781]  [<ffffffff8034fb56>] xfs_dir_lookup_int+0x2c/0xc5
> >     [ 1462.911802]  [<ffffffff802507a2>] lockdep_init_map+0x90/0x495
> >     [ 1462.911823]  [<ffffffff80353436>] xfs_lookup+0x44/0x6f
> >     [ 1462.911843]  [<ffffffff8035e364>] xfs_vn_lookup+0x29/0x60
> >     [ 1462.915246]  [<ffffffff8028856c>] __lookup_hash+0xe5/0x109
> >     [ 1462.915267]  [<ffffffff802893dd>] lookup_one_len+0x41/0x4e
> >     [ 1462.915289]  [<ffffffff80303d05>] compose_entry_fh+0xc1/0x117
> >     [ 1462.915311]  [<ffffffff80303f4c>] encode_entry+0x17c/0x38b
> >     [ 1462.915333]  [<ffffffff80261e4e>] find_or_create_page+0x3f/0xc9
> >     [ 1462.915355]  [<ffffffff8035a2c0>] _xfs_buf_lookup_pages+0x2c1/0x2f6
> >     [ 1462.915377]  [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
> >     [ 1462.915399]  [<ffffffff8027e632>] cache_alloc_refill+0x1ba/0x4b9
> >     [ 1462.915424]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.915448]  [<ffffffff8030416b>] nfs3svc_encode_entry_plus+0x10/0x13
> >     [ 1462.915469]  [<ffffffff8032c67c>] xfs_dir2_block_getdents+0x15b/0x1e2
> >     [ 1462.915491]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.915514]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.915534]  [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
> >     [ 1462.915557]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.915579]  [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
> >     [ 1462.915599]  [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
> >     [ 1462.915619]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.915642]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
> >     [ 1462.915663]  [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
> >     [ 1462.915686]  [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
> >     [ 1462.915706]  [<ffffffff805215cd>] svc_process+0x3f8/0x717
> >     [ 1462.915729]  [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
> >     [ 1462.915749]  [<ffffffff8020c648>] child_rip+0xa/0x12
> >     [ 1462.915769]  [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
> >     [ 1462.915792]  [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
> >     [ 1462.915812]  [<ffffffff8020c63e>] child_rip+0x0/0x12
> > 
> > Over time other processes pile up behind this.
> > 
> >     [ 1462.910728] nfsd          D ffffffffffffffff  5440  2965      2
> >     [ 1462.910769]  ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900
> >     [ 1462.910812]  ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0
> >     [ 1462.910852]  ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040
> >     [ 1462.910882] Call Trace:
> >     [ 1462.910909]  [<ffffffff802fbadf>] nfsd_permission+0x95/0xeb
> >     [ 1462.910931]  [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
> >     [ 1462.910950]  [<ffffffff8052d729>] mutex_lock_nested+0x165/0x27c
> >     [ 1462.910971]  [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
> >     [ 1462.910994]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.911015]  [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
> >     [ 1462.911037]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.911057]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
> >     [ 1462.911079]  [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
> >     [ 1462.911102]  [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
> >     [ 1462.911122]  [<ffffffff805215cd>] svc_process+0x3f8/0x717
> >     [ 1462.911143]  [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
> >     [ 1462.911165]  [<ffffffff8020c648>] child_rip+0xa/0x12
> >     [ 1462.911184]  [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
> >     [ 1462.911206]  [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
> >     [ 1462.911225]  [<ffffffff8020c63e>] child_rip+0x0/0x12
> > 
> > 
> > Any suggestions other than to bisect this?  (Bisection might be
> > painful as it crosses the x86-merge.)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14  7:04 2.6.24-rc2 XFS nfsd hang Chris Wedgwood
  2007-11-14  7:43 ` Benny Halevy
  2007-11-14 11:49 ` 2.6.24-rc2 XFS nfsd hang --- filldir change responsible? Chris Wedgwood
@ 2007-11-14 15:29 ` Christoph Hellwig
  2007-11-14 17:39   ` J. Bruce Fields
  2007-11-25 16:30 ` [PATCH] xfs: revert to double-buffering readdir Christoph Hellwig
  3 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2007-11-14 15:29 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-xfs, LKML

On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote:
> With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> see a hang when accessing some NFS exported XFS filesystems.  Local
> access to these filesystems ahead of time works without problems.
> 
> This does not occur with 2.6.23.1.  The filesystem does not appear to
> be corrupt.
> 

>     [ 1462.911360]  ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
>     [ 1462.911391] Call Trace:
>     [ 1462.911417]  [<ffffffff8052e638>] __down+0xe9/0x101
>     [ 1462.911437]  [<ffffffff8022cc80>] default_wake_function+0x0/0xe
>     [ 1462.911458]  [<ffffffff8052e275>] __down_failed+0x35/0x3a
>     [ 1462.911480]  [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
>     [ 1462.911501]  [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
>     [ 1462.911522]  [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45

this is bp->b_sema which lookup wants.

>     [ 1462.915534]  [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
>     [ 1462.915557]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.915579]  [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
>     [ 1462.915599]  [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
>     [ 1462.915619]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
>     [ 1462.915642]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5

and this is the nasty nfsd case where a filldir callback calls back
into lookup.  I suspect we're somehow holding b_sema already.  Previously
this was okay because we weren't inside the actual readdir code when
calling filldir but operated on a copy of the data.

This gem has bitten other filesystems before; I'll see if I can find a
way around it.
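
A toy user-space model of the hazard described above may help. It assumes
(as suspected, not confirmed at this point in the thread) that readdir holds
a non-recursive buffer lock while it invokes the filldir callback. Every
toy_* name is hypothetical and is not XFS, VFS or nfsd API; the lock is a
plain flag so that the "would sleep forever" case can be counted instead of
actually hanging. The second variant copies the entries first and drops the
lock before calling back, which is the idea behind operating on a copy of
the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 3

/* Toy stand-in for a directory buffer guarded by a non-recursive lock. */
struct toy_buf {
    int locked;                  /* 1 while some caller holds the lock */
    const char *names[TOY_MAX];  /* directory entries kept in the buffer */
    size_t count;
};

/* Returns 1 on success, 0 if already held (a real down() would sleep). */
static int toy_trylock(struct toy_buf *bp)
{
    if (bp->locked)
        return 0;
    bp->locked = 1;
    return 1;
}

static void toy_unlock(struct toy_buf *bp)
{
    bp->locked = 0;
}

/* Lookup needs the buffer lock; returns -1 where it would have blocked. */
static int toy_lookup(struct toy_buf *bp, const char *name)
{
    int found = 0;
    size_t i;

    if (!toy_trylock(bp))
        return -1;
    for (i = 0; i < bp->count; i++)
        if (strcmp(bp->names[i], name) == 0)
            found = 1;
    toy_unlock(bp);
    return found;
}

typedef int (*toy_filldir_t)(struct toy_buf *bp, const char *name);

/* nfsd-like callback: looks every entry up again. */
static int toy_filldir_lookup(struct toy_buf *bp, const char *name)
{
    return toy_lookup(bp, name);
}

/* Calls back with the lock held; returns how many callbacks would block. */
static int toy_readdir_direct(struct toy_buf *bp, toy_filldir_t fill)
{
    int blocked = 0;
    size_t i;

    if (!toy_trylock(bp))
        return -1;
    for (i = 0; i < bp->count; i++)
        if (fill(bp, bp->names[i]) < 0)
            blocked++;
    toy_unlock(bp);
    return blocked;
}

/* Copies the entries under the lock, drops it, then calls back. */
static int toy_readdir_buffered(struct toy_buf *bp, toy_filldir_t fill)
{
    const char *copy[TOY_MAX];   /* pointer copy is enough for this toy */
    size_t i, n;
    int blocked = 0;

    if (!toy_trylock(bp))
        return -1;
    n = bp->count > TOY_MAX ? TOY_MAX : bp->count;
    memcpy(copy, bp->names, n * sizeof(copy[0]));
    toy_unlock(bp);
    for (i = 0; i < n; i++)
        if (fill(bp, copy[i]) < 0)
            blocked++;
    return blocked;
}
```

With a three-entry buffer, the direct variant reports every callback as
blocked, while the buffered variant reports none, at the cost of an extra
copy per readdir call.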


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14 15:29 ` 2.6.24-rc2 XFS nfsd hang Christoph Hellwig
@ 2007-11-14 17:39   ` J. Bruce Fields
  2007-11-14 17:44     ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: J. Bruce Fields @ 2007-11-14 17:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Wedgwood, linux-xfs, LKML

On Wed, Nov 14, 2007 at 03:29:52PM +0000, Christoph Hellwig wrote:
> On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote:
> > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> > see a hang when accessing some NFS exported XFS filesystems.  Local
> > access to these filesystems ahead of time works without problems.
> > 
> > This does not occur with 2.6.23.1.  The filesystem does not appear to
> > be corrupt.
> > 
> 
> >     [ 1462.911360]  ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
> >     [ 1462.911391] Call Trace:
> >     [ 1462.911417]  [<ffffffff8052e638>] __down+0xe9/0x101
> >     [ 1462.911437]  [<ffffffff8022cc80>] default_wake_function+0x0/0xe
> >     [ 1462.911458]  [<ffffffff8052e275>] __down_failed+0x35/0x3a
> >     [ 1462.911480]  [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
> >     [ 1462.911501]  [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
> >     [ 1462.911522]  [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
> 
> this is bp->b_sema which lookup wants.
> 
> >     [ 1462.915534]  [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
> >     [ 1462.915557]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.915579]  [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
> >     [ 1462.915599]  [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
> >     [ 1462.915619]  [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> >     [ 1462.915642]  [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
> 
> and this is the nasty nfsd case where a filldir callback calls back
> into lookup.  I suspect we're somehow holding b_sema already.  Previously
> this was okay because we weren't inside the actual readdir code when
> calling filldir but operated on a copy of the data.
> 
> This gem has bitten other filesystems before; I'll see if I can find a
> way around it.

This must have come up before; feel free to remind me: is there any way
to make the interface easier to use?  (E.g. would it help if the filldir
callback could be passed a dentry?)

--b.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14 17:39   ` J. Bruce Fields
@ 2007-11-14 17:44     ` Christoph Hellwig
  2007-11-14 17:53       ` J. Bruce Fields
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2007-11-14 17:44 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, Chris Wedgwood, linux-xfs, LKML

On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> This must have come up before; feel free to remind me: is there any way
> to make the interface easier to use?  (E.g. would it help if the filldir
> callback could be passed a dentry?)

The best thing for the filesystem would be to have a readdirplus
(or have it folded into readdir) instead of calling into lookup
from ->filldir.
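
One way to picture the suggestion is a callback that receives the entry's
attributes together with its name, so the consumer never has to call back
into lookup. The following is only a user-space sketch under that
assumption: struct toy_attr, toy_fillplus_t, toy_readdirplus() and
toy_encode_entry() are made-up names, not an existing or proposed kernel
interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of per-entry attributes handed to the callback. */
struct toy_attr {
    unsigned long ino;
    unsigned int mode;
    long long size;
};

struct toy_dirent {
    const char *name;
    struct toy_attr attr;
};

/* Hypothetical "filldir plus attributes" callback; non-zero stops the walk. */
typedef int (*toy_fillplus_t)(void *ctx, const char *name,
                              const struct toy_attr *attr);

/* Walks the entries and passes name and attributes in a single call. */
static int toy_readdirplus(const struct toy_dirent *ents, size_t n,
                           toy_fillplus_t fill, void *ctx)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (fill(ctx, ents[i].name, &ents[i].attr) != 0)
            break;
    return (int)i;               /* number of entries consumed */
}

/* Example consumer: accumulates a reply without any second lookup. */
struct toy_reply {
    size_t entries;
    long long total_size;
};

static int toy_encode_entry(void *ctx, const char *name,
                            const struct toy_attr *attr)
{
    struct toy_reply *reply = ctx;

    (void)name;                  /* a real encoder would emit the name too */
    reply->entries++;
    reply->total_size += attr->size;
    return 0;
}
```

The sketch deliberately leaves out anything that would need more than the
attributes of the entry itself.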

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14 17:44     ` Christoph Hellwig
@ 2007-11-14 17:53       ` J. Bruce Fields
  2007-11-14 18:02         ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: J. Bruce Fields @ 2007-11-14 17:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Wedgwood, linux-xfs, LKML

On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote:
> On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> > This must have come up before; feel free to remind me: is there any way
> > to make the interface easier to use?  (E.g. would it help if the filldir
> > callback could be passed a dentry?)
> 
> The best thing for the filesystem would be to have a readdirplus
> (or have it folded into readdir) instead of calling into lookup
> from ->filldir.

And the readdirplus would pass a dentry to its equivalent of ->filldir?
Or something else?

--b.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14 17:53       ` J. Bruce Fields
@ 2007-11-14 18:02         ` Christoph Hellwig
  2007-11-14 18:08           ` J. Bruce Fields
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2007-11-14 18:02 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, Chris Wedgwood, linux-xfs, LKML

On Wed, Nov 14, 2007 at 12:53:22PM -0500, J. Bruce Fields wrote:
> On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote:
> > On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> > > This must have come up before; feel free to remind me: is there any way
> > > to make the interface easier to use?  (E.g. would it help if the filldir
> > > callback could be passed a dentry?)
> > 
> > The best thing for the filesystem would be to have a readdirplus
> > (or have it folded into readdir) instead of calling into lookup
> > from ->filldir.
> 
> And the readdirplus would pass a dentry to its equivalent of ->filldir?
> Or something else?

Personally I'd prefer it to only grow a struct stat or rather its members.
But the nfsd code currently expects a dentry so this might require some
major refactoring.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14 18:02         ` Christoph Hellwig
@ 2007-11-14 18:08           ` J. Bruce Fields
  2007-11-21 15:07             ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: J. Bruce Fields @ 2007-11-14 18:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Wedgwood, linux-xfs, LKML

On Wed, Nov 14, 2007 at 06:02:41PM +0000, Christoph Hellwig wrote:
> On Wed, Nov 14, 2007 at 12:53:22PM -0500, J. Bruce Fields wrote:
> > On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote:
> > > On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> > > > This must have come up before; feel free to remind me: is there any way
> > > > to make the interface easier to use?  (E.g. would it help if the filldir
> > > > callback could be passed a dentry?)
> > > 
> > > The best thing for the filesystem would be to have a readdirplus
> > > (or have it folded into readdir) instead of calling into lookup
> > > from ->filldir.
> > 
> > And the readdirplus would pass a dentry to its equivalent of ->filldir?
> > Or something else?
> 
> Personally I'd prefer it to only grow a struct stat or rather its members.
> But the nfsd code currently expects a dentry so this might require some
> major refactoring.

Well, we need to check for mountpoints, for example, so I don't see any
way out of needing a dentry.  What's the drawback?

--b.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14 12:59   ` J. Bruce Fields
@ 2007-11-14 22:31     ` Christian Kujau
  2007-11-15  7:51       ` 2.6.24-rc2 XFS nfsd hang / smbd too Christian Kujau
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Kujau @ 2007-11-14 22:31 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Benny Halevy, Chris Wedgwood, linux-xfs, LKML

On Wed, 14 Nov 2007, J. Bruce Fields wrote:
> On Wed, Nov 14, 2007 at 09:43:40AM +0200, Benny Halevy wrote:
>> I wonder if this is a similar hang to what Christian was seeing here:
>> http://lkml.org/lkml/2007/11/13/319
>
> Ah, thanks for noticing that.  Christian Kujau, is /data an xfs
> partition?

Sorry for the late reply :\

Yes, the nfsd process only got stuck when I did ls(1) (with or without -l)
on an NFS share which contained an XFS partition. I did not pay attention to
the underlying fs at first, so I just ls'ed my shares and noticed that it got
stuck. Now that you mention it, I tried again with a (git-wise) current
2.6 kernel and the same .config: http://nerdbynature.de/bits/2.6.24-rc2/nfsd/

Running ls on an ext3 or jfs backed nfs share did succeed; running ls on an
xfs backed nfs share did not. The sysrq-t (see dmesg.2.gz please) looks
like yours (to my untrained eye):

nfsd          D c04131c0     0  8535      2
       e7ea97b8 00000046 e7ea9000 c04131c0 e7ea97b8 e697e7e0 00000282 e697e7e8
       e7ea97e4 c0409ebc f71f3500 00000001 f71f3500 c0115540 e697e804 e697e804
       e697e7e0 8f082000 00000001 e7ea97f4 c0409cc2 00000004 00000062 e7ea9800
Nov 14 23:07:14 sheep kernel: [ 1870.124185] Call Trace:
[<c0409ebc>] __down+0x7c/0xd0
[<c0409cc2>] __down_failed+0xa/0x10
[<c0296d46>] xfs_buf_lock+0x46/0x50
[<c02985a2>] _xfs_buf_find+0xf2/0x190
[<c0298694>] xfs_buf_get_flags+0x54/0x120
[<c029877d>] xfs_buf_read_flags+0x1d/0x80
[<c0289afa>] xfs_trans_read_buf+0x4a/0x350
[<c025e049>] xfs_da_do_buf+0x409/0x760
[<c025e42f>] xfs_da_read_buf+0x2f/0x40
[<c02634f2>] xfs_dir2_leaf_lookup_int+0x172/0x270
[<c02637ce>] xfs_dir2_leaf_lookup+0x1e/0x90
[<c02608e4>] xfs_dir_lookup+0xe4/0x100
[<c028abde>] xfs_dir_lookup_int+0x2e/0x100
[<c028eee2>] xfs_lookup+0x62/0x90
[<c029b644>] xfs_vn_lookup+0x34/0x70
[<c016de06>] __lookup_hash+0xb6/0x100
[<c016ee6e>] lookup_one_len+0x4e/0x50
[<f9037769>] compose_entry_fh+0x59/0x120 [nfsd]
[<f9037c29>] encode_entry+0x329/0x3c0 [nfsd]
[<f9037cfb>] nfs3svc_encode_entry_plus+0x3b/0x50 [nfsd]
[<c02639b4>] xfs_dir2_leaf_getdents+0x174/0x900
[<c026070a>] xfs_readdir+0xba/0xd0
[<c0298d74>] xfs_file_readdir+0x44/0x70
[<c01726ae>] vfs_readdir+0x7e/0xa0
[<f902e6b3>] nfsd_readdir+0x73/0xe0 [nfsd]
[<f9036eea>] nfsd3_proc_readdirplus+0xda/0x200 [nfsd]
[<f902a2db>] nfsd_dispatch+0x11b/0x210 [nfsd]
[<f920f2ac>] svc_process+0x41c/0x760 [sunrpc]
[<f902a8c4>] nfsd+0x164/0x2a0 [nfsd]
[<c0103507>] kernel_thread_helper+0x7/0x10
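
Read bottom-up, the trace shows the readdir path (xfs_dir2_leaf_getdents)
invoking nfsd's per-entry encoder, which calls lookup_one_len and re-enters
XFS, where it sleeps in xfs_buf_lock. A minimal user-space sketch of that
re-entrancy pattern follows. It assumes the contended lock is a plain
non-recursive one and that both paths want the same lock; all toy_* names are
hypothetical and none of this is kernel code. It uses a trylock so the
would-be deadlock is counted instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock standing in for a directory buffer lock. */
struct toy_lock {
	bool held;
};

/* Returns true if acquired; false is where a blocking lock would sleep. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_dir {
	struct toy_lock buf_lock;
	int would_deadlock;	/* number of re-entrant lock attempts seen */
};

typedef int (*toy_filldir_t)(struct toy_dir *dir, const char *name);

/* Models ->lookup: it needs the same buffer lock that readdir holds. */
static int toy_lookup(struct toy_dir *dir, const char *name)
{
	(void)name;
	if (!toy_trylock(&dir->buf_lock)) {
		dir->would_deadlock++;
		return -1;
	}
	toy_unlock(&dir->buf_lock);
	return 0;
}

/* Models readdir invoking the callback while still holding the lock. */
static int toy_readdir(struct toy_dir *dir, toy_filldir_t fill)
{
	static const char *names[] = { "a", "b", "c" };
	int ret = 0;
	size_t i;

	if (!toy_trylock(&dir->buf_lock))
		return -1;
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		ret = fill(dir, names[i]);
		if (ret)
			break;
	}
	toy_unlock(&dir->buf_lock);
	return ret;
}

/* nfsd-like callback: does a lookup for every entry it is handed. */
static int toy_fill_with_lookup(struct toy_dir *dir, const char *name)
{
	return toy_lookup(dir, name);
}

/* Local-readdir-like callback: only consumes the name, never re-enters. */
static int toy_fill_plain(struct toy_dir *dir, const char *name)
{
	(void)dir;
	(void)name;
	return 0;
}
```

With toy_fill_plain the walk completes; with toy_fill_with_lookup the first
entry already hits the held lock, which is the point where a real blocking
lock would never return.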


>> Any suggestions other than to bisect this?  (Bisection might be
>> painful as it crosses the x86-merge.)

Make that "impossible" for me, as I could not boot the bisected kernel and 
marking versions as "bad" for unrelated things seems to invalidate the 
results. However, from ~2500 revisions (2.6.24-rc2 to 2.6.23.1) down to 
~20 or so in just 10 builds, that's pretty awesome.

Christian.
-- 
BOFH excuse #321:

Scheduled global CPU outage

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang --- filldir change responsible?
  2007-11-14 11:49 ` 2.6.24-rc2 XFS nfsd hang --- filldir change responsible? Chris Wedgwood
@ 2007-11-14 22:48   ` Christian Kujau
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Kujau @ 2007-11-14 22:48 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: linux-xfs, Christoph Hellwig, David Chinner, LKML, Benny Halevy

On Wed, 14 Nov 2007, Chris Wedgwood wrote:
> After some bisection pain (sg broken in the middle and XFS not
> compiling in other places) the regression seems to be:
>
>    commit 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b
>    Author: Christoph Hellwig <hch@infradead.org>
>    Date:   Tue Aug 28 13:58:24 2007 +1000

Following a git-bisect howto[0], I tried to revert this commit:

# git checkout master
# git revert 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b
Auto-merged fs/xfs/linux-2.6/xfs_file.c
CONFLICT (content): Merge conflict in fs/xfs/linux-2.6/xfs_file.c
Auto-merged fs/xfs/linux-2.6/xfs_vnode.h
CONFLICT (content): Merge conflict in fs/xfs/linux-2.6/xfs_vnode.h
Auto-merged fs/xfs/xfs_dir2.c
CONFLICT (content): Merge conflict in fs/xfs/xfs_dir2.c
Auto-merged fs/xfs/xfs_dir2.h
Auto-merged fs/xfs/xfs_dir2_block.c
Auto-merged fs/xfs/xfs_dir2_sf.c
Auto-merged fs/xfs/xfs_vnodeops.c
CONFLICT (content): Merge conflict in fs/xfs/xfs_vnodeops.c
Automatic revert failed.  After resolving the conflicts,
mark the corrected paths with 'git add <paths>' and commit the result.

Any ideas?

Christian

[0] is this still up-to-date? 
http://kernel.org/pub/software/scm/git/docs/v1.4.4.4/howto/isolate-bugs-with-bisect.txt
-- 
BOFH excuse #423:

It's not RFC-822 compliant.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang / smbd too
  2007-11-14 22:31     ` Christian Kujau
@ 2007-11-15  7:51       ` Christian Kujau
  2007-11-15 14:44         ` Christian Kujau
                           ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Christian Kujau @ 2007-11-15  7:51 UTC (permalink / raw)
  To: LKML; +Cc: J. Bruce Fields, Benny Halevy, Chris Wedgwood, linux-xfs

On Wed, 14 Nov 2007, Christian Kujau wrote:
> Yes, the nfsd process only got stuck when I did ls(1) (with or without -l) on 
> an NFS share which contained an XFS partition.

Since NFS was not working (the nfsd processes were already in D state), to 
mount a CIFS share from the very same server (and the same client). I'm 
exporting the same /data share (JFS), but, since it's smbd I don't have to 
export every single submount (as it is with NFS):

* with NFS:
server:/data      (jfs)
server:/data/sub  (xfs)

* with CIFS:
server:/data      (containing both the jfs and the xfs partition as one
                    single share to mount)

Upon accessing the /data/sub part of the CIFS share, the client hung, 
waiting for the server to respond (the [cifs] kernel thread on the client 
was spinning, waiting for i/o). On the server, similar things as with the 
nfsd processes happened (although I know that the smbd (Samba) processes 
are running completely in userspace):

http://nerdbynature.de/bits/2.6.24-rc2/nfsd/debug.3.txt.gz

Sysrq-t again on the server:
http://nerdbynature.de/bits/2.6.24-rc2/nfsd/dmesg.3.gz

smbd          D c04131c0     0 22782   3039
       e242ad60 00000046 e242a000 c04131c0 00000001 e7875264 00000246 e7f88a80
       e242ada8 c040914c 00000000 00000002 c016dc64 e7a3b7b8 e242a000 e7875284
       00000000 c016dc64 f7343d88 f6337e90 e7f88a80 e7875264 e242ad88 e7a3b7b8
Call Trace:
[<c040914c>] mutex_lock_nested+0xcc/0x2c0
[<c016dc64>] do_lookup+0xa4/0x190
[<c016f6f9>] __link_path_walk+0x749/0xd10
[<c016fd04>] link_path_walk+0x44/0xc0
[<c016fd98>] path_walk+0x18/0x20
[<c016ff98>] do_path_lookup+0x78/0x1c0
[<c0170998>] __user_walk_fd+0x38/0x60
[<c0169bd1>] vfs_stat_fd+0x21/0x50
[<c0169ca1>] vfs_stat+0x11/0x20
[<c0169cc4>] sys_stat64+0x14/0x30
[<c01028d6>] sysenter_past_esp+0x5f/0xa5
=======================

So, it's really not NFS but ?FS related?

Christian.
-- 
BOFH excuse #199:

the curls in your keyboard cord are losing electricity.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang / smbd too
  2007-11-15  7:51       ` 2.6.24-rc2 XFS nfsd hang / smbd too Christian Kujau
@ 2007-11-15 14:44         ` Christian Kujau
  2007-11-15 22:01         ` Christian Kujau
  2007-11-16  0:34         ` Chris Wedgwood
  2 siblings, 0 replies; 35+ messages in thread
From: Christian Kujau @ 2007-11-15 14:44 UTC (permalink / raw)
  To: LKML; +Cc: J. Bruce Fields, Benny Halevy, Chris Wedgwood, linux-xfs

On Thu, November 15, 2007 08:51, Christian Kujau wrote:
> Since NFS was not working (the nfsd processes were already in D state),
> to mount a CIFS share from the very same server (and the same client).

That should read:

Since NFS was not working (the nfsd processes were already in D state), I
decided to mount a CIFS share from the very same server (and the same
client). [...]

C.
-- 
BOFH excuse #442:

Trojan horse ran out of hay


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang / smbd too
  2007-11-15  7:51       ` 2.6.24-rc2 XFS nfsd hang / smbd too Christian Kujau
  2007-11-15 14:44         ` Christian Kujau
@ 2007-11-15 22:01         ` Christian Kujau
  2007-11-16  0:34         ` Chris Wedgwood
  2 siblings, 0 replies; 35+ messages in thread
From: Christian Kujau @ 2007-11-15 22:01 UTC (permalink / raw)
  To: LKML; +Cc: J. Bruce Fields, Benny Halevy, Chris Wedgwood, linux-xfs

On Thu, 15 Nov 2007, Christian Kujau wrote:
> Upon accessing the /data/sub part of the CIFS share, the client hung, waiting 
> for the server to respond (the [cifs] kernel thread on the client was 
> spinning, waiting for i/o). On the server, similar things as with the nfsd 
> processes happened

Turns out that the CIFS share only hung because the server was already stuck
due to the nfsd/XFS issue. After rebooting the server, I was
able to access the CIFS shares (the xfs partition too) just fine. Yes, the
xfs partition itself has been checked too and no errors were found.

C.
-- 
BOFH excuse #348:

We're on Token Ring, and it looks like the token got loose.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang / smbd too
  2007-11-15  7:51       ` 2.6.24-rc2 XFS nfsd hang / smbd too Christian Kujau
  2007-11-15 14:44         ` Christian Kujau
  2007-11-15 22:01         ` Christian Kujau
@ 2007-11-16  0:34         ` Chris Wedgwood
  2007-11-16  9:17           ` 2.6.24-rc2 XFS nfsd hang Christian Kujau
  2 siblings, 1 reply; 35+ messages in thread
From: Chris Wedgwood @ 2007-11-16  0:34 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML, J. Bruce Fields, Benny Halevy, linux-xfs

On Thu, Nov 15, 2007 at 08:51:36AM +0100, Christian Kujau wrote:

> [<c040914c>] mutex_lock_nested+0xcc/0x2c0
> [<c016dc64>] do_lookup+0xa4/0x190
> [<c016f6f9>] __link_path_walk+0x749/0xd10
> [<c016fd04>] link_path_walk+0x44/0xc0
> [<c016fd98>] path_walk+0x18/0x20
> [<c016ff98>] do_path_lookup+0x78/0x1c0
> [<c0170998>] __user_walk_fd+0x38/0x60
> [<c0169bd1>] vfs_stat_fd+0x21/0x50
> [<c0169ca1>] vfs_stat+0x11/0x20
> [<c0169cc4>] sys_stat64+0x14/0x30
> [<c01028d6>] sysenter_past_esp+0x5f/0xa5

nfsd is already wedged and holds a lock, so this is expected.


I'm not sure what you're doing here, but a viable work-around for now
might be to use nfsv2 mounts, something like

  mount -o vers=2 ...

or to keep v3 and disable readdirplus doing something like:

  mount -o vers=3,nordirplus ...

The latter I didn't test, but it was suggested on #linuxfs.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-16  0:34         ` Chris Wedgwood
@ 2007-11-16  9:17           ` Christian Kujau
  2007-11-16 11:03             ` Chris Wedgwood
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Kujau @ 2007-11-16  9:17 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: LKML, J. Bruce Fields, Benny Halevy, linux-xfs

On Fri, November 16, 2007 01:34, Chris Wedgwood wrote:
> I'm not sure what you're doing here, but a viable work-around for now
> might be to use nfsv2 mounts, something like
>
> mount -o vers=2 ...
> or to keep v3 and disable readdirplus doing something like:
> mount -o vers=3,nordirplus ...

OK, I'll try this. I hope this can be fixed somehow before 2.6.24...

Thank you for your time,
Christian.
-- 
BOFH excuse #442:

Trojan horse ran out of hay


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-16  9:17           ` 2.6.24-rc2 XFS nfsd hang Christian Kujau
@ 2007-11-16 11:03             ` Chris Wedgwood
  2007-11-16 14:19               ` Trond Myklebust
  0 siblings, 1 reply; 35+ messages in thread
From: Chris Wedgwood @ 2007-11-16 11:03 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML, J. Bruce Fields, Benny Halevy, linux-xfs

On Fri, Nov 16, 2007 at 10:17:17AM +0100, Christian Kujau wrote:

> OK, I'll try this. I hope this can be fixed somehow before 2.6.24...

Well, one simple nasty idea would be something like:

diff --git a/fs/Kconfig b/fs/Kconfig
index 429a002..da231fd 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1604,7 +1604,7 @@ config NFS_FS
 
 config NFS_V3
 	bool "Provide NFSv3 client support"
-	depends on NFS_FS
+	depends on NFS_FS && !XFS
 	help
 	  Say Y here if you want your NFS client to be able to speak version
 	  3 of the NFS protocol.

So people who are likely to be affected just side-step the issue until
it's resolved.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-16 11:03             ` Chris Wedgwood
@ 2007-11-16 14:19               ` Trond Myklebust
  2007-11-16 21:43                 ` Chris Wedgwood
  0 siblings, 1 reply; 35+ messages in thread
From: Trond Myklebust @ 2007-11-16 14:19 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Christian Kujau, LKML, J. Bruce Fields, Benny Halevy, linux-xfs


On Fri, 2007-11-16 at 03:03 -0800, Chris Wedgwood wrote:
> On Fri, Nov 16, 2007 at 10:17:17AM +0100, Christian Kujau wrote:
> 
> > OK, I'll try this. I hope this can be fixed somehow before 2.6.24...
> 
> Well, one simple nasty idea would be something like:
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 429a002..da231fd 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -1604,7 +1604,7 @@ config NFS_FS
>  
>  config NFS_V3
>  	bool "Provide NFSv3 client support"
> -	depends on NFS_FS
> +	depends on NFS_FS && !XFS
>  	help
>  	  Say Y here if you want your NFS client to be able to speak version
>  	  3 of the NFS protocol.
> 
> So people who are likely to be affected just side-step the issue until
> it's resolved.

Very funny, but disabling XFS on the client won't help.

Trond


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-16 14:19               ` Trond Myklebust
@ 2007-11-16 21:43                 ` Chris Wedgwood
  2007-11-18 14:44                   ` Christian Kujau
  0 siblings, 1 reply; 35+ messages in thread
From: Chris Wedgwood @ 2007-11-16 21:43 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christian Kujau, LKML, J. Bruce Fields, Benny Halevy, linux-xfs

On Fri, Nov 16, 2007 at 09:19:32AM -0500, Trond Myklebust wrote:

> Very funny, but disabling XFS on the client won't help.

Oops, I meant it for NFSD...  and I'm somewhat serious.  I'm not
saying it's a good long term solution, but a potentially safer
short-term workaround.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-16 21:43                 ` Chris Wedgwood
@ 2007-11-18 14:44                   ` Christian Kujau
  2007-11-18 15:31                     ` Justin Piszcz
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Kujau @ 2007-11-18 14:44 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Trond Myklebust, LKML, J. Bruce Fields, Benny Halevy, linux-xfs

On Fri, 16 Nov 2007, Chris Wedgwood wrote:
> Oops, I meant it for NFSD...  and I'm somewhat serious.  I'm not
> saying it's a good long term solution, but a potentially safer
> short-term workaround.

I've opened http://bugzilla.kernel.org/show_bug.cgi?id=9400 to track this 
one (and to not forget about it :)).

I wonder why so few people are seeing this, I'd have assumed that
NFSv3 && XFS is not sooo exotic...

Christian.
-- 
BOFH excuse #273:

The cord jumped over and hit the power switch.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-18 14:44                   ` Christian Kujau
@ 2007-11-18 15:31                     ` Justin Piszcz
  2007-11-18 22:07                       ` Christian Kujau
  0 siblings, 1 reply; 35+ messages in thread
From: Justin Piszcz @ 2007-11-18 15:31 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Chris Wedgwood, Trond Myklebust, LKML, J. Bruce Fields,
	Benny Halevy, linux-xfs



On Sun, 18 Nov 2007, Christian Kujau wrote:

> On Fri, 16 Nov 2007, Chris Wedgwood wrote:
>> Oops, I meant it for NFSD...  and I'm somewhat serious.  I'm not
>> saying it's a good long term solution, but a potentially safer
>> short-term workaround.
>
> I've opened http://bugzilla.kernel.org/show_bug.cgi?id=9400 to track this one 
> (and to not forget about it :)).
>
> I wonder why so few people are seeing this, I'd have assumed that
> NFSv3 && XFS is not sooo exotic...
Still on 2.6.23.x here (also use nfsv3 + xfs).

>
> Christian.
> -- 
> BOFH excuse #273:
>
> The cord jumped over and hit the power switch.
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-18 15:31                     ` Justin Piszcz
@ 2007-11-18 22:07                       ` Christian Kujau
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Kujau @ 2007-11-18 22:07 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Chris Wedgwood, Trond Myklebust, LKML, J. Bruce Fields,
	Benny Halevy, linux-xfs

On Sun, 18 Nov 2007, Justin Piszcz wrote:
>> I wonder why so few people are seeing this, I'd have assumed that
>> NFSv3 && XFS is not sooo exotic...
> Still on 2.6.23.x here (also use nfsv3 + xfs).

So, it's the "too few people are testing -rc kernels" issue again :(

Christian.
-- 
BOFH excuse #118:

the router thinks its a printer.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-14 18:08           ` J. Bruce Fields
@ 2007-11-21 15:07             ` Christoph Hellwig
  2007-11-21 19:03               ` J. Bruce Fields
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2007-11-21 15:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, Chris Wedgwood, linux-xfs, LKML

On Wed, Nov 14, 2007 at 01:08:38PM -0500, J. Bruce Fields wrote:
> > Personally I'd prefer it to only grow a struct stat, or rather its members.
> > But the nfsd code currently expects a dentry so this might require some
> > major refactoring.
> 
> Well, we need to check for mountpoints, for example, so I don't see any
> way out of needing a dentry.  What's the drawback?

You're right - we'd probably need the dentry.  The drawback is that
we need to always get it in the dcache.  Which might be a good thing
depending on the workload.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.24-rc2 XFS nfsd hang
  2007-11-21 15:07             ` Christoph Hellwig
@ 2007-11-21 19:03               ` J. Bruce Fields
  0 siblings, 0 replies; 35+ messages in thread
From: J. Bruce Fields @ 2007-11-21 19:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Wedgwood, linux-xfs, LKML

On Wed, Nov 21, 2007 at 03:07:46PM +0000, Christoph Hellwig wrote:
> On Wed, Nov 14, 2007 at 01:08:38PM -0500, J. Bruce Fields wrote:
> > > > Personally I'd prefer it to only grow a struct stat, or rather its members.
> > > But the nfsd code currently expects a dentry so this might require some
> > > major refactoring.
> > 
> > Well, we need to check for mountpoints, for example, so I don't see any
> > way out of needing a dentry.  What's the drawback?
> 
> You're right - we'd probably need the dentry.  The drawback is that
> we need to always get it in the dcache.  Which might be a good thing
> depending on the workload.

In any case, if the new api were only used by nfsd for now, then there'd
be no change here.

Seems like it might be worth a try.

--b.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] xfs: revert to double-buffering readdir
  2007-11-14  7:04 2.6.24-rc2 XFS nfsd hang Chris Wedgwood
                   ` (2 preceding siblings ...)
  2007-11-14 15:29 ` 2.6.24-rc2 XFS nfsd hang Christoph Hellwig
@ 2007-11-25 16:30 ` Christoph Hellwig
  2007-11-27 19:43   ` Chris Wedgwood
                     ` (2 more replies)
  3 siblings, 3 replies; 35+ messages in thread
From: Christoph Hellwig @ 2007-11-25 16:30 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-xfs, LKML

The current readdir implementation deadlocks on btree buffer locks
because nfsd calls back into ->lookup from the filldir callback.  The
only short-term fix for this is to revert to the old inefficient
double-buffering scheme.

This patch does exactly that and reverts xfs_file_readdir to what's
basically the 2.6.23 version minus the uio and vnops junk.

I'll try to find something more optimal for 2.6.25 or at least find a
way to use the proper version for local access.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c	2007-11-25 11:41:20.000000000 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c	2007-11-25 17:14:27.000000000 +0100
@@ -218,6 +218,15 @@
 }
 #endif /* CONFIG_XFS_DMAPI */
 
+/*
+ * Unfortunately we can't just use the clean and simple readdir implementation
+ * below, because nfs might call back into ->lookup from the filldir callback
+ * and that will deadlock the low-level btree code.
+ *
+ * Hopefully we'll find a better workaround that allows to use the optimal
+ * version at least for local readdirs for 2.6.25.
+ */
+#if 0
 STATIC int
 xfs_file_readdir(
 	struct file	*filp,
@@ -249,6 +258,121 @@
 		return -error;
 	return 0;
 }
+#else
+
+struct hack_dirent {
+	int		namlen;
+	loff_t		offset;
+	u64		ino;
+	unsigned int	d_type;
+	char		name[];
+};
+
+struct hack_callback {
+	char		*dirent;
+	size_t		len;
+	size_t		used;
+};
+
+STATIC int
+xfs_hack_filldir(
+	void		*__buf,
+	const char	*name,
+	int		namlen,
+	loff_t		offset,
+	u64		ino,
+	unsigned int	d_type)
+{
+	struct hack_callback *buf = __buf;
+	struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
+
+	if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len)
+		return -EINVAL;
+
+	de->namlen = namlen;
+	de->offset = offset;
+	de->ino = ino;
+	de->d_type = d_type;
+	memcpy(de->name, name, namlen);
+	buf->used += sizeof(struct hack_dirent) + namlen;
+	return 0;
+}
+
+STATIC int
+xfs_file_readdir(
+	struct file	*filp,
+	void		*dirent,
+	filldir_t	filldir)
+{
+	struct inode	*inode = filp->f_path.dentry->d_inode;
+	xfs_inode_t	*ip = XFS_I(inode);
+	struct hack_callback buf;
+	struct hack_dirent *de;
+	int		error;
+	loff_t		size;
+	int		eof = 0;
+	xfs_off_t       start_offset, curr_offset, offset;
+
+	/*
+	 * Try fairly hard to get memory
+	 */
+	buf.len = PAGE_CACHE_SIZE;
+	do {
+		buf.dirent = kmalloc(buf.len, GFP_KERNEL);
+		if (buf.dirent)
+			break;
+		buf.len >>= 1;
+	} while (buf.len >= 1024);
+
+	if (!buf.dirent)
+		return -ENOMEM;
+
+	curr_offset = filp->f_pos;
+	if (curr_offset == 0x7fffffff)
+		offset = 0xffffffff;
+	else
+		offset = filp->f_pos;
+
+	while (!eof) {
+		int reclen;
+		start_offset = offset;
+
+		buf.used = 0;
+		error = -xfs_readdir(ip, &buf, buf.len, &offset,
+				     xfs_hack_filldir);
+		if (error || offset == start_offset) {
+			size = 0;
+			break;
+		}
+
+		size = buf.used;
+		de = (struct hack_dirent *)buf.dirent;
+		while (size > 0) {
+			if (filldir(dirent, de->name, de->namlen,
+					curr_offset & 0x7fffffff,
+					de->ino, de->d_type)) {
+				goto done;
+			}
+
+			reclen = sizeof(struct hack_dirent) + de->namlen;
+			size -= reclen;
+			curr_offset = de->offset /* & 0x7fffffff */;
+			de = (struct hack_dirent *)((char *)de + reclen);
+		}
+	}
+
+ done:
+ 	if (!error) {
+		if (size == 0)
+			filp->f_pos = offset & 0x7fffffff;
+		else if (de)
+			filp->f_pos = curr_offset;
+	}
+
+	kfree(buf.dirent);
+	return error;
+}
+#endif
 
 STATIC int
 xfs_file_mmap(

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-25 16:30 ` [PATCH] xfs: revert to double-buffering readdir Christoph Hellwig
@ 2007-11-27 19:43   ` Chris Wedgwood
  2007-11-29 23:45   ` Christian Kujau
  2007-11-30  7:22   ` Timothy Shimmin
  2 siblings, 0 replies; 35+ messages in thread
From: Chris Wedgwood @ 2007-11-27 19:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, LKML

On Sun, Nov 25, 2007 at 04:30:14PM +0000, Christoph Hellwig wrote:

> The current readdir implementation deadlocks on btree buffer
> locks because nfsd calls back into ->lookup from the filldir
> callback.  The only short-term fix for this is to revert to the old
> inefficient double-buffering scheme.

This seems to work really well here.

> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.

This should probably be submitted for inclusion in stable-2.6.24.
Perhaps a version with the #if 0 [...]  stuff dropped?  (I'm happy to
send a patch for that if you prefer).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-25 16:30 ` [PATCH] xfs: revert to double-buffering readdir Christoph Hellwig
  2007-11-27 19:43   ` Chris Wedgwood
@ 2007-11-29 23:45   ` Christian Kujau
  2007-11-30  7:47     ` David Chinner
  2007-11-30  7:22   ` Timothy Shimmin
  2 siblings, 1 reply; 35+ messages in thread
From: Christian Kujau @ 2007-11-29 23:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, LKML

On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.

Thanks, works here too (without nordirplus as a mount option).
Am I supposed to close the bug[0] or do you guys want to leave this
open to track the Real Fix (TM) for 2.6.25?

Again, thank you for the fix!
Christian.

[0] http://bugzilla.kernel.org/show_bug.cgi?id=9400
-- 
BOFH excuse #112:

The monitor is plugged into the serial port

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-25 16:30 ` [PATCH] xfs: revert to double-buffering readdir Christoph Hellwig
  2007-11-27 19:43   ` Chris Wedgwood
  2007-11-29 23:45   ` Christian Kujau
@ 2007-11-30  7:22   ` Timothy Shimmin
  2007-11-30 22:36     ` Stephen Lord
  2007-12-03 15:09     ` Christoph Hellwig
  2 siblings, 2 replies; 35+ messages in thread
From: Timothy Shimmin @ 2007-11-30  7:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Wedgwood, linux-xfs, LKML

Christoph Hellwig wrote:
> The current readdir implementation deadlocks on btree buffer locks
> because nfsd calls back into ->lookup from the filldir callback.  The
> only short-term fix for this is to revert to the old inefficient
> double-buffering scheme.
> 

Probably why Steve did this: :)

xfs_file.c
----------------------------
revision 1.40
date: 2001/03/15 23:33:20;  author: lord;  state: Exp;  lines: +54 -17
modid: 2.4.x-xfs:slinx:90125a
Change linvfs_readdir to allocate a buffer, call xfs to fill it, and
then call the filldir function on each entry. This is instead of doing the
filldir deep in the bowels of xfs which causes locking problems.
----------------------------


Yes, it looks like it is done equivalently to before (minus the uio stuff etc.).
I don't know what the 7fff* masking is about, but we did that previously.
I hadn't come across the name[] struct field before; I was used to name[0]
(or name[1] in times gone by), but found that it is a kosher way of doing
things too for the variable-length string at the end.
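
For readers who have not met the name[] form either: it is a C99 flexible
array member, which lets variable-length records be packed back to back in
one buffer. A small user-space sketch, with hypothetical toy_* names, is
below. Unlike the patch as posted, this sketch rounds each record length up
to a multiple of sizeof(uint64_t) so that the next record header stays
aligned; that rounding is this example's choice and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct toy_dirent {
	uint64_t ino;
	int namlen;
	char name[];	/* flexible array member: storage follows the header */
};

struct toy_buf {
	char *data;
	size_t len;	/* capacity in bytes */
	size_t used;	/* bytes occupied by packed records */
};

/* Record length, rounded up so the following record stays 8-byte aligned. */
static size_t toy_reclen(int namlen)
{
	size_t align = sizeof(uint64_t);
	size_t len = offsetof(struct toy_dirent, name) + (size_t)namlen;

	return (len + align - 1) & ~(align - 1);
}

static int toy_buf_init(struct toy_buf *b, size_t len)
{
	b->data = malloc(len);	/* malloc memory is suitably aligned */
	b->len = len;
	b->used = 0;
	return b->data ? 0 : -1;
}

/* Append one record; returns -1 when the buffer is full. */
static int toy_append(struct toy_buf *b, const char *name, uint64_t ino)
{
	int namlen = (int)strlen(name);
	size_t reclen = toy_reclen(namlen);
	struct toy_dirent *de;

	if (b->used + reclen > b->len)
		return -1;
	de = (struct toy_dirent *)(b->data + b->used);
	de->ino = ino;
	de->namlen = namlen;
	memcpy(de->name, name, (size_t)namlen);	/* no trailing NUL stored */
	b->used += reclen;
	return 0;
}

/* Walk the packed records; returns their count and sums the inode numbers. */
static int toy_walk(const struct toy_buf *b, uint64_t *ino_sum)
{
	size_t off = 0;
	int n = 0;

	*ino_sum = 0;
	while (off < b->used) {
		const struct toy_dirent *de =
			(const struct toy_dirent *)(b->data + off);

		*ino_sum += de->ino;
		off += toy_reclen(de->namlen);
		n++;
	}
	return n;
}
```

The walker recomputes each record length from namlen, the same way the
inner loop of the patch steps from one hack_dirent to the next.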

Hmmm, I don't see the point of the "eof" local var now.
Previously bhv_vop_readdir() returned eof.
I presume if we don't move the offset (offset == start_offset) then
we're done and break out?
So we lost eof when going to the filldir in the getdents code etc...

--Tim

> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.
> 
> I'll try to find something more optimal for 2.6.25 or at least find a
> way to use the proper version for local access.
> 
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c	2007-11-25 11:41:20.000000000 +0100
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c	2007-11-25 17:14:27.000000000 +0100
> @@ -218,6 +218,15 @@
>  }
>  #endif /* CONFIG_XFS_DMAPI */
>  
> +/*
> + * Unfortunately we can't just use the clean and simple readdir implementation
> + * below, because nfs might call back into ->lookup from the filldir callback
> + * and that will deadlock the low-level btree code.
> + *
> + * Hopefully we'll find a better workaround that allows to use the optimal
> + * version at least for local readdirs for 2.6.25.
> + */
> +#if 0
>  STATIC int
>  xfs_file_readdir(
>  	struct file	*filp,
> @@ -249,6 +258,121 @@
>  		return -error;
>  	return 0;
>  }
> +#else
> +
> +struct hack_dirent {
> +	int		namlen;
> +	loff_t		offset;
> +	u64		ino;
> +	unsigned int	d_type;
> +	char		name[];
> +};
> +
> +struct hack_callback {
> +	char		*dirent;
> +	size_t		len;
> +	size_t		used;
> +};
> +
> +STATIC int
> +xfs_hack_filldir(
> +	void		*__buf,
> +	const char	*name,
> +	int		namlen,
> +	loff_t		offset,
> +	u64		ino,
> +	unsigned int	d_type)
> +{
> +	struct hack_callback *buf = __buf;
> +	struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
> +
> +	if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len)
> +		return -EINVAL;
> +
> +	de->namlen = namlen;
> +	de->offset = offset;
> +	de->ino = ino;
> +	de->d_type = d_type;
> +	memcpy(de->name, name, namlen);
> +	buf->used += sizeof(struct hack_dirent) + namlen;
> +	return 0;
> +}
> +
> +STATIC int
> +xfs_file_readdir(
> +	struct file	*filp,
> +	void		*dirent,
> +	filldir_t	filldir)
> +{
> +	struct inode	*inode = filp->f_path.dentry->d_inode;
> +	xfs_inode_t	*ip = XFS_I(inode);
> +	struct hack_callback buf;
> +	struct hack_dirent *de;
> +	int		error;
> +	loff_t		size;
> +	int		eof = 0;
> +	xfs_off_t       start_offset, curr_offset, offset;
> +
> +	/*
> +	 * Try fairly hard to get memory
> +	 */
> +	buf.len = PAGE_CACHE_SIZE;
> +	do {
> +		buf.dirent = kmalloc(buf.len, GFP_KERNEL);
> +		if (buf.dirent)
> +			break;
> +		buf.len >>= 1;
> +	} while (buf.len >= 1024);
> +
> +	if (!buf.dirent)
> +		return -ENOMEM;
> +
> +	curr_offset = filp->f_pos;
> +	if (curr_offset == 0x7fffffff)
> +		offset = 0xffffffff;
> +	else
> +		offset = filp->f_pos;
> +
> +	while (!eof) {
> +		int reclen;
> +		start_offset = offset;
> +
> +		buf.used = 0;
> +		error = -xfs_readdir(ip, &buf, buf.len, &offset,
> +				     xfs_hack_filldir);
> +		if (error || offset == start_offset) {
> +			size = 0;
> +			break;
> +		}
> +
> +		size = buf.used;
> +		de = (struct hack_dirent *)buf.dirent;
> +		while (size > 0) {
> +			if (filldir(dirent, de->name, de->namlen,
> +					curr_offset & 0x7fffffff,
> +					de->ino, de->d_type)) {
> +				goto done;
> +			}
> +
> +			reclen = sizeof(struct hack_dirent) + de->namlen;
> +			size -= reclen;
> +			curr_offset = de->offset /* & 0x7fffffff */;
> +			de = (struct hack_dirent *)((char *)de + reclen);
> +		}
> +	}
> +
> + done:
> + 	if (!error) {
> +		if (size == 0)
> +			filp->f_pos = offset & 0x7fffffff;
> +		else if (de)
> +			filp->f_pos = curr_offset;
> +	}
> +
> +	kfree(buf.dirent);
> +	return error;
> +}
> +#endif
>  
>  STATIC int
>  xfs_file_mmap(


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-29 23:45   ` Christian Kujau
@ 2007-11-30  7:47     ` David Chinner
  0 siblings, 0 replies; 35+ messages in thread
From: David Chinner @ 2007-11-30  7:47 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Christoph Hellwig, linux-xfs, LKML

On Fri, Nov 30, 2007 at 12:45:05AM +0100, Christian Kujau wrote:
> On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> >This patch does exactly that and reverts xfs_file_readdir to what's
> >basically the 2.6.23 version minus the uio and vnops junk.
> 
> Thanks, works here too (without nordirplus as a mount option).
> Am I supposed to close the bug[0] or do you guys want to leave this
> open to track the Real Fix (TM) for 2.6.25?

I've been giving the fix some QA - that change appears to have caused
a different regression as well so I'm holding off for a little bit
until we know what the cause of the other regression is before deciding
whether to take this fix or back the entire change out.

Either way we'll include the fix in 2.6.24....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-30  7:22   ` Timothy Shimmin
@ 2007-11-30 22:36     ` Stephen Lord
  2007-11-30 23:04       ` Chris Wedgwood
  2007-12-03 15:11       ` Christoph Hellwig
  2007-12-03 15:09     ` Christoph Hellwig
  1 sibling, 2 replies; 35+ messages in thread
From: Stephen Lord @ 2007-11-30 22:36 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Christoph Hellwig, Chris Wedgwood, linux-xfs, LKML



Wow, was it really that long ago!

Looks like the readdir is in the bowels of the btree code when filldir
gets called here, there are probably locks on several buffers in the
btree at this point. This will only show up for large directories I bet.

The xfs readdir code has the complete xfs inode number in its hands at
this point (filldir is not necessarily getting all the bits of it). All
we are doing the lookup for really is to get the inode number back again
so we can get the inode and get the attributes. Rather dumb really.
There has got to be a way of doing a callout structure here so that the
inode number can be pushed through filldir and back into an fs specific
call. The fs then can do a lookup by id - which is what it does most of
the time for resolving nfs handles anyway. Should be more efficient than
the current scheme.
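
A rough user-space sketch of the callout idea above, in case it helps
the discussion. Everything named demo_* is hypothetical and is not an
existing kernel or nfsd interface; it only illustrates handing the full
64 bit inode number to the callback, which then asks the fs for the
inode by id instead of doing a second lookup by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-core inode; not a kernel structure. */
struct demo_inode {
	uint64_t ino;
	uint32_t mode;
};

/* Hypothetical fs-specific operation: fetch an inode by its number. */
typedef const struct demo_inode *(*demo_get_by_ino_t)(uint64_t ino);

/* Callout state handed through the filldir-style callback. */
struct demo_callout {
	demo_get_by_ino_t get_by_ino;	/* fs-specific lookup by id */
	uint32_t last_mode;		/* attribute fetched for the entry */
};

/* Toy inode table using numbers that do not fit in 32 bits. */
static const struct demo_inode demo_table[] = {
	{ 0x100000001ULL, 0100644 },
	{ 0x100000002ULL, 0040755 },
};

static const struct demo_inode *demo_fs_get_by_ino(uint64_t ino)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
		if (demo_table[i].ino == ino)
			return &demo_table[i];
	return NULL;
}

/*
 * Callback invoked once per directory entry. The name is not used to
 * find the inode; the inode number alone is enough.
 */
static int demo_filldir_by_ino(struct demo_callout *co, const char *name,
			       uint64_t ino)
{
	const struct demo_inode *ip;

	assert(co != NULL);
	(void)name;

	ip = co->get_by_ino(ino);
	if (!ip)
		return -1;
	co->last_mode = ip->mode;
	return 0;
}
```

The point of the sketch is only that the callback never re-enters the
directory code by name, so it has no reason to touch the directory
buffers that readdir may still hold.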

Just rambling, not a single line of code was consulted in writing this
message.

You want to make a big fat btree directory for testing this stuff. Make
sure it gets at least a couple of layers of node blocks.

Steve

On Nov 30, 2007, at 1:22 AM, Timothy Shimmin wrote:

> Christoph Hellwig wrote:
>> The current readdir implementation deadlocks on btree buffer locks
>> because nfsd calls back into ->lookup from the filldir callback.  The
>> only short-term fix for this is to revert to the old inefficient
>> double-buffering scheme.
>
> Probably why Steve did this: :)
>
> xfs_file.c
> ----------------------------
> revision 1.40
> date: 2001/03/15 23:33:20;  author: lord;  state: Exp;  lines: +54 -17
> modid: 2.4.x-xfs:slinx:90125a
> Change linvfs_readdir to allocate a buffer, call xfs to fill it, and
> then call the filldir function on each entry. This is instead of doing
> the filldir deep in the bowels of xfs which causes locking problems.
> ----------------------------
>
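
For reference, the two-stage scheme described in that changelog can be
sketched in user space roughly as follows. This is an illustration
only, not the patch: all demo_* names are made up, and the "lock" is a
plain flag standing in for the directory buffer lock. Entries are
copied into a private buffer while the lock is held, and the callback
only runs after the lock has been dropped, so a callback that re-enters
the filesystem (as nfsd does via ->lookup) never finds the lock held.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical buffered directory entry; not the patch's hack_dirent. */
struct demo_dirent {
	uint64_t ino;
	uint64_t offset;
	char name[32];
};

/* Counters the callback updates so the caller can inspect the run. */
struct demo_stats {
	int calls;		/* entries handed to the callback */
	int would_deadlock;	/* callbacks that ran with the lock held */
};

typedef int (*demo_filldir_t)(void *ctx, const struct demo_dirent *de);

static int dir_lock_held;	/* models the directory buffer lock */

/* Stage 1: copy entries into a private buffer while the lock is held. */
static size_t demo_fill_buffer(struct demo_dirent *buf, size_t max)
{
	static const char *const names[] = { "a", "b", "c" };
	size_t n = sizeof(names) / sizeof(names[0]);
	size_t i;

	dir_lock_held = 1;
	for (i = 0; i < n && i < max; i++) {
		buf[i].ino = 100 + i;
		buf[i].offset = i + 1;
		snprintf(buf[i].name, sizeof(buf[i].name), "%s", names[i]);
	}
	dir_lock_held = 0;	/* dropped before any callback runs */
	return i;
}

/* Stage 2: run the callback on each buffered entry, lock not held. */
static int demo_readdir(demo_filldir_t filldir, void *ctx)
{
	struct demo_dirent *buf = calloc(16, sizeof(*buf));
	size_t i, n;

	if (!buf)
		return -1;
	n = demo_fill_buffer(buf, 16);
	for (i = 0; i < n; i++)
		if (filldir(ctx, &buf[i]))
			break;	/* callback asked us to stop */
	free(buf);
	return 0;
}

/* A callback that, like nfsd, would call back into the filesystem. */
static int demo_lookup_in_callback(void *ctx, const struct demo_dirent *de)
{
	struct demo_stats *st = ctx;

	assert(de != NULL);
	st->calls++;
	if (dir_lock_held)	/* the real lookup would block here */
		st->would_deadlock++;
	return 0;
}
```

The cost is the extra allocation and copy per readdir call, which is
why the scheme is called inefficient above.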


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-30 22:36     ` Stephen Lord
@ 2007-11-30 23:04       ` Chris Wedgwood
  2007-12-01 13:04         ` Stephen Lord
  2007-12-03 15:11       ` Christoph Hellwig
  1 sibling, 1 reply; 35+ messages in thread
From: Chris Wedgwood @ 2007-11-30 23:04 UTC (permalink / raw)
  To: Stephen Lord; +Cc: Timothy Shimmin, Christoph Hellwig, linux-xfs, LKML

On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote:

> Looks like the readdir is in the bowels of the btree code when
> filldir gets called here, there are probably locks on several
> buffers in the btree at this point. This will only show up for large
> directories I bet.

I see it for fairly small directories.  Larger than what you can stuff
into an inode but less than a block (I'm not checking but fairly sure
that's the case).

> Just rambling, not a single line of code was consulted in writing
> this message.

Can you explain why the offset is capped and treated in an 'odd way'
at all?

+       curr_offset = filp->f_pos;
+       if (curr_offset == 0x7fffffff)
+               offset = 0xffffffff;
+       else
+               offset = filp->f_pos;

and later the offset to filldir is masked.  Is that some restriction
in filldir?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-30 23:04       ` Chris Wedgwood
@ 2007-12-01 13:04         ` Stephen Lord
  0 siblings, 0 replies; 35+ messages in thread
From: Stephen Lord @ 2007-12-01 13:04 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Timothy Shimmin, Christoph Hellwig, linux-xfs, LKML


On Nov 30, 2007, at 5:04 PM, Chris Wedgwood wrote:

> On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote:
>
>> Looks like the readdir is in the bowels of the btree code when
>> filldir gets called here, there are probably locks on several
>> buffers in the btree at this point. This will only show up for large
>> directories I bet.
>
> I see it for fairly small directories.  Larger than what you can stuff
> into an inode but less than a block (I'm not checking but fairly sure
> that's the case).

I told you I did not read any code..... once a directory is out of the
inode and into disk blocks, there will be a lock on the buffer while the
contents are copied out.

>
>> Just rambling, not a single line of code was consulted in writing
>> this message.
>
> Can you explain why the offset is capped and treated in an 'odd way'
> at all?
>
> +       curr_offset = filp->f_pos;
> +       if (curr_offset == 0x7fffffff)
> +               offset = 0xffffffff;
> +       else
> +               offset = filp->f_pos;
>
> and later the offset to filldir is masked.  Is that some restriction
> in filldir?

Too long ago to remember exact reasons. The only thing I do recall is
issues with glibc readdir code which wanted to remember positions in a
dir and seek backwards. It was translating structures and could end up
with more data from the kernel than would fit in the user buffer. This
may have something to do with that and special values used as eof
markers in the getdents output and signed 32 bit arguments to lseek. In
the original xfs directory code, the offset of an entry was a 64 bit
hash+offset value, that really confused things when glibc attempted to
do math on it.
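
As a hedged illustration of the signed 32 bit concern (one possible
reading, not a statement of why the code was written that way): masking
a directory cookie with 0x7fffffff keeps it non-negative when it is
later treated as a signed 32 bit offset, while plain truncation can
turn it negative. The demo_* helpers are hypothetical, and the
unsigned to signed conversion in the second one is
implementation-defined in C, though common compilers wrap it.

```c
#include <assert.h>
#include <stdint.h>

/* Mask the cookie to 31 bits so a signed 32 bit reader sees >= 0. */
static int32_t demo_mask_offset(uint64_t cookie)
{
	return (int32_t)(cookie & 0x7fffffff);
}

/*
 * Truncate to 32 bits without masking. Values with bit 31 set become
 * negative on the usual two's complement implementations.
 */
static int32_t demo_truncate_offset(uint64_t cookie)
{
	return (int32_t)(uint32_t)cookie;
}

/* Small self-check of the two helpers against the values in the patch. */
static void demo_check_offsets(void)
{
	assert(demo_mask_offset(0xffffffffULL) == 0x7fffffff);
	assert(demo_truncate_offset(0xffffffffULL) < 0);
}
```

With the mask, the 0xffffffff value from the snippet quoted above comes
out as 0x7fffffff, which matches the special case tested there on the
way back in.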

I also recall that the offsets in the directory fields had different
meanings on different OS's. Sometimes it was the offset of the entry
itself, sometimes it was the offset of the next entry, that was one of
the reasons for the translation layer I think.

Steve


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-30  7:22   ` Timothy Shimmin
  2007-11-30 22:36     ` Stephen Lord
@ 2007-12-03 15:09     ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2007-12-03 15:09 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Christoph Hellwig, Chris Wedgwood, linux-xfs, LKML

On Fri, Nov 30, 2007 at 06:22:09PM +1100, Timothy Shimmin wrote:
> Hmmm, don't see the point of "eof" local var now.
> Previously bhv_vop_readdir() returned eof.
> I presume if we don't move the offset (offset == startoffset) then
> we're done and break out?
> So we lost eof when going to the filldir in the getdents code etc...

Yes, it's just copy & paste.  We can trivially kill the variable.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] xfs: revert to double-buffering readdir
  2007-11-30 22:36     ` Stephen Lord
  2007-11-30 23:04       ` Chris Wedgwood
@ 2007-12-03 15:11       ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2007-12-03 15:11 UTC (permalink / raw)
  To: Stephen Lord
  Cc: Timothy Shimmin, Christoph Hellwig, Chris Wedgwood, linux-xfs, LKML

On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote:
> Wow, was it really that long ago!
>
> Looks like the readdir is in the bowels of the btree code when filldir
> gets called here, there are probably locks on several buffers in the
> btree at this point. This will only show up for large directories I bet.

Chris saw it with block-form directories.  I've verified it works fine
with short-form directories, and from the leaf code it looks like it
could happen there as well.  I also remember gfs2 running into a similar
problem.

> The xfs readdir code has the complete xfs inode number in its hands at
> this point (filldir is not necessarily getting all the bits of it). All
> we are doing the lookup for really is to get the inode number back again
> so we can get the inode and get the attributes. Rather dumb really.
> There has got to be a way of doing a callout structure here so that the
> inode number can be pushed through filldir and back into an fs specific
> call. The fs then can do a lookup by id - which is what it does most of
> the time for resolving nfs handles anyway. Should be more efficient than
> the current scheme.

Yes, a lot more efficient.  But it means adding a new operation for use
by the nfs server.


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2007-12-03 15:11 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-14  7:04 2.6.24-rc2 XFS nfsd hang Chris Wedgwood
2007-11-14  7:43 ` Benny Halevy
2007-11-14 12:59   ` J. Bruce Fields
2007-11-14 22:31     ` Christian Kujau
2007-11-15  7:51       ` 2.6.24-rc2 XFS nfsd hang / smbd too Christian Kujau
2007-11-15 14:44         ` Christian Kujau
2007-11-15 22:01         ` Christian Kujau
2007-11-16  0:34         ` Chris Wedgwood
2007-11-16  9:17           ` 2.6.24-rc2 XFS nfsd hang Christian Kujau
2007-11-16 11:03             ` Chris Wedgwood
2007-11-16 14:19               ` Trond Myklebust
2007-11-16 21:43                 ` Chris Wedgwood
2007-11-18 14:44                   ` Christian Kujau
2007-11-18 15:31                     ` Justin Piszcz
2007-11-18 22:07                       ` Christian Kujau
2007-11-14 11:49 ` 2.6.24-rc2 XFS nfsd hang --- filldir change responsible? Chris Wedgwood
2007-11-14 22:48   ` Christian Kujau
2007-11-14 15:29 ` 2.6.24-rc2 XFS nfsd hang Christoph Hellwig
2007-11-14 17:39   ` J. Bruce Fields
2007-11-14 17:44     ` Christoph Hellwig
2007-11-14 17:53       ` J. Bruce Fields
2007-11-14 18:02         ` Christoph Hellwig
2007-11-14 18:08           ` J. Bruce Fields
2007-11-21 15:07             ` Christoph Hellwig
2007-11-21 19:03               ` J. Bruce Fields
2007-11-25 16:30 ` [PATCH] xfs: revert to double-buffering readdir Christoph Hellwig
2007-11-27 19:43   ` Chris Wedgwood
2007-11-29 23:45   ` Christian Kujau
2007-11-30  7:47     ` David Chinner
2007-11-30  7:22   ` Timothy Shimmin
2007-11-30 22:36     ` Stephen Lord
2007-11-30 23:04       ` Chris Wedgwood
2007-12-01 13:04         ` Stephen Lord
2007-12-03 15:11       ` Christoph Hellwig
2007-12-03 15:09     ` Christoph Hellwig
