LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* NFS root lockups with -next 20110113
@ 2011-01-13 12:06 Mark Brown
  2011-01-13 13:22 ` J. R. Okajima
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Brown @ 2011-01-13 12:06 UTC (permalink / raw)
  To: Trond Myklebust, Nick Piggin; +Cc: linux-nfs, linux-kernel, linux-fsdevel

With -next 20110113 I'm experiencing lockups shortly after userspace
starts when booting with my root filesystem on NFS, log below.  I can
boot into /bin/sh but running real userspace triggers this very easily.
This was introduced sometime this week.

I've not bisected or otherwise investigated much yet, but I do notice
code added recently by Nick in "fs: rcu-walk for path lookup" showing up
in the backtrace so including him in the CCs.

Tail of the console log:

INIT: version 2.86 booting
[   15.250000] BUG: spinlock recursion on CPU#0, rc/1151
[   15.250000]  lock: c74ad678, .magic: dead4ead, .owner: rc/1151, .owner_cpu: 0
[   15.260000] [<c0035118>] (unwind_backtrace+0x0/0xe4) from [<c0190454>] (do_raw_spin_lock+0x48/0x148)
[   15.260000] [<c0190454>] (do_raw_spin_lock+0x48/0x148) from [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180)
[   15.270000] [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) from [<c00c8b1c>] (d_revalidate+0x30/0x58)
[   15.280000] [<c00c8b1c>] (d_revalidate+0x30/0x58) from [<c00cb82c>] (link_path_walk+0xad8/0xb00)
[   15.290000] [<c00cb82c>] (link_path_walk+0xad8/0xb00) from [<c00cba4c>] (do_path_lookup+0x44/0xcc)
[   15.300000] [<c00cba4c>] (do_path_lookup+0x44/0xcc) from [<c00cc3f4>] (user_path_at+0x58/0x90)
[   15.310000] [<c00cc3f4>] (user_path_at+0x58/0x90) from [<c00c4210>] (vfs_fstatat+0x28/0x54)
[   15.320000] [<c00c4210>] (vfs_fstatat+0x28/0x54) from [<c00c431c>] (sys_stat64+0x14/0x34)
[   15.330000] [<c00c431c>] (sys_stat64+0x14/0x34) from [<c002f4e0>] (ret_fast_syscall+0x0/0x30)
[   25.830000] BUG: spinlock lockup on CPU#0, rc/1151, c74ad678
[   25.830000] [<c0035118>] (unwind_backtrace+0x0/0xe4) from [<c0190518>] (do_raw_spin_lock+0x10c/0x148)
[   25.840000] [<c0190518>] (do_raw_spin_lock+0x10c/0x148) from [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180)
[   25.850000] [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) from [<c00c8b1c>] (d_revalidate+0x30/0x58)
[   25.860000] [<c00c8b1c>] (d_revalidate+0x30/0x58) from [<c00cb82c>] (link_path_walk+0xad8/0xb00)
[   25.870000] [<c00cb82c>] (link_path_walk+0xad8/0xb00) from [<c00cba4c>] (do_path_lookup+0x44/0xcc)
[   25.870000] [<c00cba4c>] (do_path_lookup+0x44/0xcc) from [<c00cc3f4>] (user_path_at+0x58/0x90)
[   25.880000] [<c00cc3f4>] (user_path_at+0x58/0x90) from [<c00c4210>] (vfs_fstatat+0x28/0x54)
[   25.890000] [<c00c4210>] (vfs_fstatat+0x28/0x54) from [<c00c431c>] (sys_stat64+0x14/0x34)
[   25.900000] [<c00c431c>] (sys_stat64+0x14/0x34) from [<c002f4e0>] (ret_fast_syscall+0x0/0x30)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: NFS root lockups with -next 20110113
  2011-01-13 12:06 NFS root lockups with -next 20110113 Mark Brown
@ 2011-01-13 13:22 ` J. R. Okajima
  2011-01-13 13:28   ` Santosh Shilimkar
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-13 13:22 UTC (permalink / raw)
  To: Mark Brown
  Cc: Trond Myklebust, Santosh Shilimkar, Nick Piggin, linux-nfs,
	linux-kernel, linux-fsdevel


Mark Brown:
> With -next 20110113 I'm experiencing lockups shortly after userspace
> starts when booting with my root filesystem on NFS, log below.  I can
> boot into /bin/sh but running real userspace triggers this very easily.
> This was introduced sometime this week.
>
> I've not bisected or otherwise investigated much yet, but I do notice
> code added recently by Nick in "fs: rcu-walk for path lookup" showing up
> in the backtrace so including him in the CCs.

This and a report from Santosh Shilimkar look like the same problem
which I reported, and I am testing this patch. If you can, please try it
too.
Note: Of course this is not offcial fix.


J. R. Okajima

diff --git a/fs/namei.c b/fs/namei.c
index 5bb7588..51d052f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry
 {
 	struct fs_struct *fs = current->fs;
 	struct dentry *parent = nd->path.dentry;
+	int isroot;
 
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 	if (nd->root.mnt) {
@@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry
 			goto err_root;
 	}
 	spin_lock(&parent->d_lock);
-	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-	if (!__d_rcu_to_refcount(dentry, nd->seq))
-		goto err;
+	isroot = IS_ROOT(dentry);
+	if (!isroot) {
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+		if (!__d_rcu_to_refcount(dentry, nd->seq))
+			goto err;
+	}
 	/*
 	 * If the sequence check on the child dentry passed, then the child has
 	 * not been removed from its parent. This means the parent dentry must
 	 * be valid and able to take a reference at this point.
 	 */
-	BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
+	BUG_ON(!isroot && dentry->d_parent != parent);
 	BUG_ON(!parent->d_count);
 	parent->d_count++;
-	spin_unlock(&dentry->d_lock);
+	if (!isroot)
+		spin_unlock(&dentry->d_lock);
 	spin_unlock(&parent->d_lock);
 	if (nd->root.mnt) {
 		path_get(&nd->root);
@@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry
 	nd->flags &= ~LOOKUP_RCU;
 	return 0;
 err:
-	spin_unlock(&dentry->d_lock);
+	if (!isroot)
+		spin_unlock(&dentry->d_lock);
 	spin_unlock(&parent->d_lock);
 err_root:
 	if (nd->root.mnt)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: NFS root lockups with -next 20110113
  2011-01-13 13:22 ` J. R. Okajima
@ 2011-01-13 13:28   ` Santosh Shilimkar
  2011-01-13 13:45     ` J. R. Okajima
  2011-01-13 13:35   ` NFS root lockups with -next 20110113 Mark Brown
  2011-01-13 13:41   ` Santosh Shilimkar
  2 siblings, 1 reply; 15+ messages in thread
From: Santosh Shilimkar @ 2011-01-13 13:28 UTC (permalink / raw)
  To: J. R. Okajima, Mark Brown
  Cc: Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel

> -----Original Message-----
> From: J. R. Okajima [mailto:hooanon05@yahoo.co.jp]
> Sent: Thursday, January 13, 2011 6:52 PM
> To: Mark Brown
> Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux-
> nfs@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> fsdevel@vger.kernel.org
> Subject: Re: NFS root lockups with -next 20110113
>
>
> Mark Brown:
> > With -next 20110113 I'm experiencing lockups shortly after
> userspace
> > starts when booting with my root filesystem on NFS, log below.  I
> can
> > boot into /bin/sh but running real userspace triggers this very
> easily.
> > This was introduced sometime this week.
> >
> > I've not bisected or otherwise investigated much yet, but I do
> notice
> > code added recently by Nick in "fs: rcu-walk for path lookup"
> showing up
> > in the backtrace so including him in the CCs.
>
> This and a report from Santosh Shilimkar look like the same problem
> which I reported, and I am testing this patch. If you can, please
> try it
> too.
> Note: Of course this is not offcial fix.
>

It works. My board booted with NFS with below patch


>
> diff --git a/fs/namei.c b/fs/namei.c
> index 5bb7588..51d052f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
>  {
>  	struct fs_struct *fs = current->fs;
>  	struct dentry *parent = nd->path.dentry;
> +	int isroot;
>
>  	BUG_ON(!(nd->flags & LOOKUP_RCU));
>  	if (nd->root.mnt) {
> @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
>  			goto err_root;
>  	}
>  	spin_lock(&parent->d_lock);
> -	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> -	if (!__d_rcu_to_refcount(dentry, nd->seq))
> -		goto err;
> +	isroot = IS_ROOT(dentry);
> +	if (!isroot) {
> +		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> +		if (!__d_rcu_to_refcount(dentry, nd->seq))
> +			goto err;
> +	}
>  	/*
>  	 * If the sequence check on the child dentry passed, then the
> child has
>  	 * not been removed from its parent. This means the parent
> dentry must
>  	 * be valid and able to take a reference at this point.
>  	 */
> -	BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
> +	BUG_ON(!isroot && dentry->d_parent != parent);
>  	BUG_ON(!parent->d_count);
>  	parent->d_count++;
> -	spin_unlock(&dentry->d_lock);
> +	if (!isroot)
> +		spin_unlock(&dentry->d_lock);
>  	spin_unlock(&parent->d_lock);
>  	if (nd->root.mnt) {
>  		path_get(&nd->root);
> @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
>  	nd->flags &= ~LOOKUP_RCU;
>  	return 0;
>  err:
> -	spin_unlock(&dentry->d_lock);
> +	if (!isroot)
> +		spin_unlock(&dentry->d_lock);
>  	spin_unlock(&parent->d_lock);
>  err_root:
>  	if (nd->root.mnt)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: NFS root lockups with -next 20110113
  2011-01-13 13:22 ` J. R. Okajima
  2011-01-13 13:28   ` Santosh Shilimkar
@ 2011-01-13 13:35   ` Mark Brown
  2011-01-13 13:41   ` Santosh Shilimkar
  2 siblings, 0 replies; 15+ messages in thread
From: Mark Brown @ 2011-01-13 13:35 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Trond Myklebust, Santosh Shilimkar, Nick Piggin, linux-nfs,
	linux-kernel, linux-fsdevel

On Thu, Jan 13, 2011 at 10:22:07PM +0900, J. R. Okajima wrote:

> This and a report from Santosh Shilimkar look like the same problem
> which I reported, and I am testing this patch. If you can, please try it
> too.
> Note: Of course this is not offcial fix.

Seems to do the trick:

Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: NFS root lockups with -next 20110113
  2011-01-13 13:22 ` J. R. Okajima
  2011-01-13 13:28   ` Santosh Shilimkar
  2011-01-13 13:35   ` NFS root lockups with -next 20110113 Mark Brown
@ 2011-01-13 13:41   ` Santosh Shilimkar
  2 siblings, 0 replies; 15+ messages in thread
From: Santosh Shilimkar @ 2011-01-13 13:41 UTC (permalink / raw)
  To: J. R. Okajima, Mark Brown
  Cc: Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel,
	linux-fsdevel, linux-arm-kernel

(+ 'linux-arm' since same problem was getting discussed
 in another thread)

> -----Original Message-----
> From: Santosh Shilimkar [mailto:santosh.shilimkar@ti.com]
> Sent: Thursday, January 13, 2011 6:59 PM
> To: 'J. R. Okajima'; 'Mark Brown'
> Cc: 'Trond Myklebust'; 'Nick Piggin'; 'linux-nfs@vger.kernel.org';
> 'linux-kernel@vger.kernel.org'; 'linux-fsdevel@vger.kernel.org'
> Subject: RE: NFS root lockups with -next 20110113
>
> > -----Original Message-----
> > From: J. R. Okajima [mailto:hooanon05@yahoo.co.jp]
> > Sent: Thursday, January 13, 2011 6:52 PM
> > To: Mark Brown
> > Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux-
> > nfs@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> > fsdevel@vger.kernel.org
> > Subject: Re: NFS root lockups with -next 20110113
> >
> >
> > Mark Brown:
> > > With -next 20110113 I'm experiencing lockups shortly after
> > userspace
> > > starts when booting with my root filesystem on NFS, log below.
> I
> > can
> > > boot into /bin/sh but running real userspace triggers this very
> > easily.
> > > This was introduced sometime this week.
> > >
> > > I've not bisected or otherwise investigated much yet, but I do
> > notice
> > > code added recently by Nick in "fs: rcu-walk for path lookup"
> > showing up
> > > in the backtrace so including him in the CCs.
> >
> > This and a report from Santosh Shilimkar look like the same
> problem
> > which I reported, and I am testing this patch. If you can, please
> > try it
> > too.
> > Note: Of course this is not offcial fix.
> >
>
> It works. My board booted with NFS with below patch
>
>
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 5bb7588..51d052f 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> >  {
> >  	struct fs_struct *fs = current->fs;
> >  	struct dentry *parent = nd->path.dentry;
> > +	int isroot;
> >
> >  	BUG_ON(!(nd->flags & LOOKUP_RCU));
> >  	if (nd->root.mnt) {
> > @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> >  			goto err_root;
> >  	}
> >  	spin_lock(&parent->d_lock);
> > -	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> > -	if (!__d_rcu_to_refcount(dentry, nd->seq))
> > -		goto err;
> > +	isroot = IS_ROOT(dentry);
> > +	if (!isroot) {
> > +		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> > +		if (!__d_rcu_to_refcount(dentry, nd->seq))
> > +			goto err;
> > +	}
> >  	/*
> >  	 * If the sequence check on the child dentry passed, then the
> > child has
> >  	 * not been removed from its parent. This means the parent
> > dentry must
> >  	 * be valid and able to take a reference at this point.
> >  	 */
> > -	BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
> > +	BUG_ON(!isroot && dentry->d_parent != parent);
> >  	BUG_ON(!parent->d_count);
> >  	parent->d_count++;
> > -	spin_unlock(&dentry->d_lock);
> > +	if (!isroot)
> > +		spin_unlock(&dentry->d_lock);
> >  	spin_unlock(&parent->d_lock);
> >  	if (nd->root.mnt) {
> >  		path_get(&nd->root);
> > @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> >  	nd->flags &= ~LOOKUP_RCU;
> >  	return 0;
> >  err:
> > -	spin_unlock(&dentry->d_lock);
> > +	if (!isroot)
> > +		spin_unlock(&dentry->d_lock);
> >  	spin_unlock(&parent->d_lock);
> >  err_root:
> >  	if (nd->root.mnt)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: NFS root lockups with -next 20110113
  2011-01-13 13:28   ` Santosh Shilimkar
@ 2011-01-13 13:45     ` J. R. Okajima
  2011-01-14  3:59       ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: J. R. Okajima @ 2011-01-13 13:45 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs,
	linux-kernel, linux-fsdevel


Santosh Shilimkar:
> It works. My board booted with NFS with below patch

----------------------------------------

Mark Brown;
> Seems to do the trick:
>
> Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>

Thank you so quick tests.
When I post the patch to be mainlined, I will add Tested-by for both of
you. But Nick Piggin may take another approach.


J. R. Okajima


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: NFS root lockups with -next 20110113
  2011-01-13 13:45     ` J. R. Okajima
@ 2011-01-14  3:59       ` Nick Piggin
  2011-01-14  4:41         ` J. R. Okajima
  2011-01-19  6:43         ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
  0 siblings, 2 replies; 15+ messages in thread
From: Nick Piggin @ 2011-01-14  3:59 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
	linux-nfs, linux-kernel, linux-fsdevel

On Fri, Jan 14, 2011 at 12:45 AM, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Santosh Shilimkar:
>> It works. My board booted with NFS with below patch
>
> ----------------------------------------
>
> Mark Brown;
>> Seems to do the trick:
>>
>> Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>
>
> Thank you so quick tests.
> When I post the patch to be mainlined, I will add Tested-by for both of
> you. But Nick Piggin may take another approach.

Thanks for your help, can you see how I've fixed it in my vfs-scale
tree? What do you think?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: NFS root lockups with -next 20110113
  2011-01-14  3:59       ` Nick Piggin
@ 2011-01-14  4:41         ` J. R. Okajima
  2011-01-19  6:43         ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
  1 sibling, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-14  4:41 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
	linux-nfs, linux-kernel, linux-fsdevel


Nick Piggin:
> Thanks for your help, can you see how I've fixed it in my vfs-scale
> tree? What do you think?

I will.
It may take some time.


J. R. Okajima

^ permalink raw reply	[flat|nested] 15+ messages in thread

* vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
  2011-01-14  3:59       ` Nick Piggin
  2011-01-14  4:41         ` J. R. Okajima
@ 2011-01-19  6:43         ` J. R. Okajima
  2011-01-19  7:21           ` Nick Piggin
  2011-02-11  3:49           ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
  1 sibling, 2 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-19  6:43 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
	linux-nfs, linux-kernel, linux-fsdevel


Hi,

Nick Piggin:
> Thanks for your help, can you see how I've fixed it in my vfs-scale
> tree? What do you think?

Your fix is great. I have no objection at all.
Other than the fix, here are more generic questions about vfs-scale work.
I am happy if you reply when you have time.

- getcwd(2) needs d_lock?
  It acquires rename_lock and then tests whether the pwd is removed by
  d_unhashed(). If a race condition between vfs_rename_dir() which may
  unhash/rehash the dentry happens, then getcwd() may return the wrong
  result due to unprotected d_unhashed() call, I am afraid. rename_lock
  doesn't help this case.

- what is the right order of dget() and mntget()?
  If I remember correctly, someone said "mntget() first and then
  dget(). when putting, do in reverse" in the discussion when
  path_{get,put}() were born. So it is called "the right order" in the
  commit log.
  It was many years ago. Is it still true? And should rcu-walk follow it
  too? The current implementation doesn't seem to care about this order.

- d_move() and rename_lock
  This may be out of rcu-walk work, but rename_lock in d_move() looks
  outstanding since it surely kills concurrency. It is a pity that two
  unrelated but concurrent d_move-s are serialized when we run rename(2)
  on two different filesystems. Even if all of dentries, parents and
  hash buckets are different from each other, d_move() never run
  concurrently.


J. R. Okajima

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
  2011-01-19  6:43         ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
@ 2011-01-19  7:21           ` Nick Piggin
  2011-01-20  9:05             ` vfs-scale, general questions J. R. Okajima
  2011-02-11  3:49           ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
  1 sibling, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2011-01-19  7:21 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
	linux-nfs, linux-kernel, linux-fsdevel

On Wed, Jan 19, 2011 at 5:43 PM, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Hi,
>
> Nick Piggin:
>> Thanks for your help, can you see how I've fixed it in my vfs-scale
>> tree? What do you think?
>
> Your fix is great. I have no objection at all.
> Other than the fix, here are more generic questions about vfs-scale work.
> I am happy if you reply when you have time.

Thanks for reviewing.


> - getcwd(2) needs d_lock?
>  It acquires rename_lock and then tests whether the pwd is removed by
>  d_unhashed(). If a race condition between vfs_rename_dir() which may
>  unhash/rehash the dentry happens, then getcwd() may return the wrong
>  result due to unprotected d_unhashed() call, I am afraid. rename_lock
>  doesn't help this case.

We have the lock in write mode there, so it should exclude that
particular race. But I need to take another look at this code I
think, I'm not sure it's completely right, so I would appreciate reviews.

A while back I had some extra checks in there and would restart
the entire reverse walk in case of races... but need to think about
it.


> - what is the right order of dget() and mntget()?
>  If I remember correctly, someone said "mntget() first and then
>  dget(). when putting, do in reverse" in the discussion when
>  path_{get,put}() were born. So it is called "the right order" in the
>  commit log.
>  It was many years ago. Is it still true? And should rcu-walk follow it
>  too? The current implementation doesn't seem to care about this order.

Well dget and mntget is not a problem, because we can only do
mntget while already guaranteeing a reference on the mount, and
only dget when already guaranteeing a ref on the dentry (and mount).

But dput must happen before mntput so you don't have dentry ref
without mnt ref. Can you point out where rcu-walk does this wrongly?


> - d_move() and rename_lock
>  This may be out of rcu-walk work, but rename_lock in d_move() looks
>  outstanding since it surely kills concurrency. It is a pity that two
>  unrelated but concurrent d_move-s are serialized when we run rename(2)
>  on two different filesystems. Even if all of dentries, parents and
>  hash buckets are different from each other, d_move() never run
>  concurrently.

Yes I have a patch for that. I made a small hash table of rename locks.
This makes independent same-dir renames scalable. However that was
not the main motivation of the patch. On a really big POWER7 system,
the lookup path goes into a strange bimodal behaviour in the presence
of a relatively small amount of rename activity and sometimes starves
and throughput crashes. Breaking up rename_lock solves that too.

I'll wait until things settle down a bit more and perhaps have a chance
to get more numbers before submitting it (although I can show you when
I get back).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vfs-scale, general questions
  2011-01-19  7:21           ` Nick Piggin
@ 2011-01-20  9:05             ` J. R. Okajima
  2011-01-20 11:15               ` Miklos Szeredi
  0 siblings, 1 reply; 15+ messages in thread
From: J. R. Okajima @ 2011-01-20  9:05 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Nick Piggin, linux-kernel, linux-fsdevel


(Some of mail destinations are removed since this is not nfs specific
anymore.)

Nick Piggin:
> > - getcwd(2) needs d_lock?
> > =A0It acquires rename_lock and then tests whether the pwd is removed by
> > =A0d_unhashed(). If a race condition between vfs_rename_dir() which may
> > =A0unhash/rehash the dentry happens, then getcwd() may return the wrong
> > =A0result due to unprotected d_unhashed() call, I am afraid. rename_lock
> > =A0doesn't help this case.
>
> We have the lock in write mode there, so it should exclude that
> particular race. But I need to take another look at this code I
> think, I'm not sure it's completely right, so I would appreciate reviews.

You might think about the race around d_move, but what I meant is the
race between d_unlinked and unhash/rehash.

- getcwd return -ENOENT when pwd is unhashed.
- vfs_rename_dir()
  + makes the existing target unhashed.
  + FS ->rename() is called, here let's assume an error happened. so the
    target dir is surely alive and reachable, nothing have been changed.
  + vfs_rename_dir() rehashes it again.

During this unhashed period, getcwd(2) may be issued.
And I am afraid it may return an error incorrectly.


About other issues, I will reply when I have time.


J. R. Okajima

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vfs-scale, general questions
  2011-01-20  9:05             ` vfs-scale, general questions J. R. Okajima
@ 2011-01-20 11:15               ` Miklos Szeredi
  2011-01-21  6:38                 ` J. R. Okajima
  0 siblings, 1 reply; 15+ messages in thread
From: Miklos Szeredi @ 2011-01-20 11:15 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: npiggin, npiggin, linux-kernel, linux-fsdevel

On Thu, 20 Jan 2011, J. R. Okajima wrote:
> You might think about the race around d_move, but what I meant is the
> race between d_unlinked and unhash/rehash.
> 
> - getcwd return -ENOENT when pwd is unhashed.
> - vfs_rename_dir()
>   + makes the existing target unhashed.

No it doesn't.  dentry_unhash() doesn't drop the dentry from the hash
unless it has no other references, cwd being one such ref.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vfs-scale, general questions
  2011-01-20 11:15               ` Miklos Szeredi
@ 2011-01-21  6:38                 ` J. R. Okajima
  0 siblings, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-21  6:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: npiggin, npiggin, linux-kernel, linux-fsdevel


Miklos Szeredi:
> No it doesn't.  dentry_unhash() doesn't drop the dentry from the hash
> unless it has no other references, cwd being one such ref.

I see.
When chdir() completets before vfs_rename_dir(), the dir will not be
unhashed. It is OK.

Even if the dir is unhashed temporary before chdir(), then the lookup
fallback to ref-walk and i_mutex protects. It is OK too.

Thank you.
J. R. Okajima

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
  2011-01-19  6:43         ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
  2011-01-19  7:21           ` Nick Piggin
@ 2011-02-11  3:49           ` Ian Kent
  2011-02-13  2:19             ` J. R. Okajima
  1 sibling, 1 reply; 15+ messages in thread
From: Ian Kent @ 2011-02-11  3:49 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust,
	Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel

On Wed, 2011-01-19 at 15:43 +0900, J. R. Okajima wrote:
> Hi,
> 
> Nick Piggin:
> > Thanks for your help, can you see how I've fixed it in my vfs-scale
> > tree? What do you think?
> 
> Your fix is great. I have no objection at all.
> Other than the fix, here are more generic questions about vfs-scale work.
> I am happy if you reply when you have time.
> 
> - getcwd(2) needs d_lock?
>   It acquires rename_lock and then tests whether the pwd is removed by
>   d_unhashed(). If a race condition between vfs_rename_dir() which may
>   unhash/rehash the dentry happens, then getcwd() may return the wrong
>   result due to unprotected d_unhashed() call, I am afraid. rename_lock
>   doesn't help this case.
> 
> - what is the right order of dget() and mntget()?
>   If I remember correctly, someone said "mntget() first and then
>   dget(). when putting, do in reverse" in the discussion when
>   path_{get,put}() were born. So it is called "the right order" in the
>   commit log.
>   It was many years ago. Is it still true? And should rcu-walk follow it
>   too? The current implementation doesn't seem to care about this order.

I didn't spot that, where did you see this?

I'm not sure about the get but I fairly sure the dput() has to be before
the mntput() because the shrink_dcache_*() cleanup routines object to
dentrys that have a reference count of more than one.

Ian



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
  2011-02-11  3:49           ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
@ 2011-02-13  2:19             ` J. R. Okajima
  0 siblings, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-02-13  2:19 UTC (permalink / raw)
  To: Ian Kent
  Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust,
	Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel


Ian Kent:
> > - what is the right order of dget() and mntget()?
> >   If I remember correctly, someone said "mntget() first and then
> >   dget(). when putting, do in reverse" in the discussion when
> >   path_{get,put}() were born. So it is called "the right order" in the
> >   commit log.
> >   It was many years ago. Is it still true? And should rcu-walk follow it
> >   too? The current implementation doesn't seem to care about this order.
>
> I didn't spot that, where did you see this?
>
> I'm not sure about the get but I fairly sure the dput() has to be before
> the mntput() because the shrink_dcache_*() cleanup routines object to
> dentrys that have a reference count of more than one.

For dget - mntget, there are several such code. For example,
nameidata_dentry_drop_rcu()
{
	struct dentry *parent = nd->path.dentry;
		:::
	parent->d_count++;
	spin_unlock(&dentry->d_lock);
	spin_unlock(&parent->d_lock);
		:::
	mntget(nd->path.mnt);
		:::

But I am not sure the "get" order is a problem.
Nick Piggin also replied and said dget and mntget is not a problem, and
I replied if I found such "put" order, I would write again.


J. R. Okajima

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-02-13  2:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-13 12:06 NFS root lockups with -next 20110113 Mark Brown
2011-01-13 13:22 ` J. R. Okajima
2011-01-13 13:28   ` Santosh Shilimkar
2011-01-13 13:45     ` J. R. Okajima
2011-01-14  3:59       ` Nick Piggin
2011-01-14  4:41         ` J. R. Okajima
2011-01-19  6:43         ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
2011-01-19  7:21           ` Nick Piggin
2011-01-20  9:05             ` vfs-scale, general questions J. R. Okajima
2011-01-20 11:15               ` Miklos Szeredi
2011-01-21  6:38                 ` J. R. Okajima
2011-02-11  3:49           ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
2011-02-13  2:19             ` J. R. Okajima
2011-01-13 13:35   ` NFS root lockups with -next 20110113 Mark Brown
2011-01-13 13:41   ` Santosh Shilimkar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).