LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/3] Fixes for vfs-scale and vfs-automount
@ 2011-01-18  4:05 Ian Kent
  2011-01-18  4:06 ` [PATCH 1/3] autofs4 - fix get_next_positive_dentry() Ian Kent
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Ian Kent @ 2011-01-18  4:05 UTC (permalink / raw)
  To: Nick Piggin, David Howells
  Cc: Al Viro, Kernel Mailing List, linux-fsdevel, Linus Torvalds,
	Andrew Morton

I noticed a couple of problems which needed to be fixed before I can
start testing.

Nick, David, can you check my changes please, particularly the cases
for the do_lookup() change.

---

Ian Kent (3):
      autofs4 - fix debug print in autofs4_lookup()
      vfs - fix dentry ref count in do_lookup()
      autofs4 - fix get_next_positive_dentry()


 fs/autofs4/expire.c |    4 ++--
 fs/autofs4/root.c   |    3 ++-
 fs/namei.c          |    5 ++++-
 3 files changed, 8 insertions(+), 4 deletions(-)

-- 
Ian

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] autofs4 - fix get_next_positive_dentry()
  2011-01-18  4:05 [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
@ 2011-01-18  4:06 ` Ian Kent
  2011-01-18  4:06 ` [PATCH 2/3] vfs - fix dentry ref count in do_lookup() Ian Kent
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Ian Kent @ 2011-01-18  4:06 UTC (permalink / raw)
  To: Nick Piggin, David Howells
  Cc: Al Viro, Kernel Mailing List, linux-fsdevel, Linus Torvalds,
	Andrew Morton

The initialization condition in fs/autofs4/expire.c:get_next_positive_dentry()
appears to be incorrect. If prev == NULL I believe that root should be
returned.

Further down, at the current dentry check for it being simple_positive()
it looks like the d_lock for dentry p should be dropped instead of dentry
ret, otherwise when p is assinged to ret we end up with no lock on p and
a lost lock on ret, which leads to a deadlock.
---

 fs/autofs4/expire.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 3ed79d7..f43100b 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -96,7 +96,7 @@ static struct dentry *get_next_positive_dentry(struct dentry *prev,
 	struct dentry *p, *ret;
 
 	if (prev == NULL)
-		return dget(prev);
+		return dget(root);
 
 	spin_lock(&autofs4_lock);
 relock:
@@ -133,7 +133,7 @@ again:
 	spin_lock_nested(&ret->d_lock, DENTRY_D_LOCK_NESTED);
 	/* Negative dentry - try next */
 	if (!simple_positive(ret)) {
-		spin_unlock(&ret->d_lock);
+		spin_unlock(&p->d_lock);
 		p = ret;
 		goto again;
 	}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/3] vfs - fix dentry ref count in do_lookup()
  2011-01-18  4:05 [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
  2011-01-18  4:06 ` [PATCH 1/3] autofs4 - fix get_next_positive_dentry() Ian Kent
@ 2011-01-18  4:06 ` Ian Kent
  2011-01-18  4:44   ` Al Viro
  2011-01-18  4:06 ` [PATCH 3/3] autofs4 - fix debug print in autofs4_lookup() Ian Kent
  2011-01-19  7:06 ` [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
  3 siblings, 1 reply; 24+ messages in thread
From: Ian Kent @ 2011-01-18  4:06 UTC (permalink / raw)
  To: Nick Piggin, David Howells
  Cc: Al Viro, Kernel Mailing List, linux-fsdevel, Linus Torvalds,
	Andrew Morton

There is a ref count problem in fs/namei.c:do_lookup().

When walking in ref-walk mode, if follow_managed() returns a fail the
reference held by path.dentry isn't dropped.
---

 fs/namei.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b753192..dbc36ff 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1272,8 +1272,11 @@ done:
 	path->mnt = mnt;
 	path->dentry = dentry;
 	err = follow_managed(path, nd->flags);
-	if (unlikely(err < 0))
+	if (unlikely(err < 0)) {
+		if (!(nd->flags & LOOKUP_RCU))
+			dput(dentry);
 		return err;
+	}
 	*inode = path->dentry->d_inode;
 	return 0;
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/3] autofs4 - fix debug print in autofs4_lookup()
  2011-01-18  4:05 [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
  2011-01-18  4:06 ` [PATCH 1/3] autofs4 - fix get_next_positive_dentry() Ian Kent
  2011-01-18  4:06 ` [PATCH 2/3] vfs - fix dentry ref count in do_lookup() Ian Kent
@ 2011-01-18  4:06 ` Ian Kent
  2011-01-19  7:06 ` [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
  3 siblings, 0 replies; 24+ messages in thread
From: Ian Kent @ 2011-01-18  4:06 UTC (permalink / raw)
  To: Nick Piggin, David Howells
  Cc: Al Viro, Kernel Mailing List, linux-fsdevel, Linus Torvalds,
	Andrew Morton

oz_mode isn't defined any more, use autofs4_oz_mode(sbi) instead.
---

 fs/autofs4/root.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 1dba035..427129a 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -488,7 +488,8 @@ static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, s
 	sbi = autofs4_sbi(dir->i_sb);
 
 	DPRINTK("pid = %u, pgrp = %u, catatonic = %d, oz_mode = %d",
-		current->pid, task_pgrp_nr(current), sbi->catatonic, oz_mode);
+		current->pid, task_pgrp_nr(current), sbi->catatonic,
+		autofs4_oz_mode(sbi));
 
 	active = autofs4_lookup_active(dentry);
 	if (active) {


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] vfs - fix dentry ref count in do_lookup()
  2011-01-18  4:06 ` [PATCH 2/3] vfs - fix dentry ref count in do_lookup() Ian Kent
@ 2011-01-18  4:44   ` Al Viro
  0 siblings, 0 replies; 24+ messages in thread
From: Al Viro @ 2011-01-18  4:44 UTC (permalink / raw)
  To: Ian Kent
  Cc: Nick Piggin, David Howells, Kernel Mailing List, linux-fsdevel,
	Linus Torvalds, Andrew Morton

On Tue, Jan 18, 2011 at 12:06:10PM +0800, Ian Kent wrote:
> There is a ref count problem in fs/namei.c:do_lookup().
> 
> When walking in ref-walk mode, if follow_managed() returns a fail the
> reference held by path.dentry isn't dropped.

If we get to follow_managed(), we *are* in ref-walk mode.  Unconditionally.
Besided, that's path_put_conditional(), not dput() - we might have both
grabbed vfsmount on mountpoint crossing *AND* changed dentry.

Applied with modifications...

The rest applied as-is.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-01-18  4:05 [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
                   ` (2 preceding siblings ...)
  2011-01-18  4:06 ` [PATCH 3/3] autofs4 - fix debug print in autofs4_lookup() Ian Kent
@ 2011-01-19  7:06 ` Ian Kent
  2011-02-15 14:25   ` Ian Kent
  3 siblings, 1 reply; 24+ messages in thread
From: Ian Kent @ 2011-01-19  7:06 UTC (permalink / raw)
  To: Nick Piggin, David Howells
  Cc: Al Viro, Kernel Mailing List, linux-fsdevel, Linus Torvalds,
	Andrew Morton

On Tue, 2011-01-18 at 12:05 +0800, Ian Kent wrote:
> I noticed a couple of problems which needed to be fixed before I can
> start testing.

Before I return to testing I thought I should let everyone know the
results so far, following the addition of the merged patches from this
thread.

I have two tests, the first is the autofs connectathon test suite.
It consists of a wide range of automount map constructs, and contains
both success and failure cases and results in about 260 mounts. I
usually run the test 3 times, and wait for the mounts to expire between
runs.

The connectathon test worked fine and I didn't observe any unexpected
log messages over the three runs.

The second test I run has been adapted from one of the connectathon
tests. This test uses autofs submounts which are the most problematic of
the autofs map constructs. The modified test is meant to introduce a
fairly high level of contention by using a number of processes accessing
the mounts at the same time (currently configured as 10 processes). I
also attempt to introduce expire to mount contention by adjusting the
expire timeout. In addition the test runs twice, one the the default
nobrowse option and one for the browse option (no need to worry about
what that means). All in all the test seems to be fairly good at
exposing problems and 150 iterations of each configuration seems to get
most problems within three consecutive runs.

The first run of this test returned a pass although there were some
unexpected log messages. I'm not sure yet what is causing these. Of
interest was a bunch of reported lstat() failures against directories
that corresponded to failure cases and should not have existed at all.

The second run went through to completion but triggered the BUG() at
line 941 in shrink_dcache_for_umount_subtree() when shutting down autofs
after the first (nobrowse) part of the test run.

So, there is still a ref count problem somewhere, at least.

> 
> Nick, David, can you check my changes please, particularly the cases
> for the do_lookup() change.
> 
> ---
> 
> Ian Kent (3):
>       autofs4 - fix debug print in autofs4_lookup()
>       vfs - fix dentry ref count in do_lookup()
>       autofs4 - fix get_next_positive_dentry()
> 
> 
>  fs/autofs4/expire.c |    4 ++--
>  fs/autofs4/root.c   |    3 ++-
>  fs/namei.c          |    5 ++++-
>  3 files changed, 8 insertions(+), 4 deletions(-)
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-01-19  7:06 ` [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
@ 2011-02-15 14:25   ` Ian Kent
  2011-02-23  7:22     ` Ian Kent
  2011-02-24  1:58     ` Al Viro
  0 siblings, 2 replies; 24+ messages in thread
From: Ian Kent @ 2011-02-15 14:25 UTC (permalink / raw)
  To: Nick Piggin, Trond Myklebust
  Cc: David Howells, Al Viro, Kernel Mailing List, linux-fsdevel,
	Linus Torvalds, Andrew Morton, Linux NFS Mailing List

On Wed, 2011-01-19 at 15:06 +0800, Ian Kent wrote:
> On Tue, 2011-01-18 at 12:05 +0800, Ian Kent wrote:
> > I noticed a couple of problems which needed to be fixed before I can
> > start testing.
> 
> Before I return to testing I thought I should let everyone know the
> results so far, following the addition of the merged patches from this
> thread.

Sorry for the large recipient list but I think there's a need.
I've made very little progress on this but my most recent development
deserves an update.

Most recent kernel used is 2.6.38-rc4.

> 
> I have two tests, the first is the autofs connectathon test suite.
> It consists of a wide range of automount map constructs, and contains
> both success and failure cases and results in about 260 mounts. I
> usually run the test 3 times, and wait for the mounts to expire between
> runs.
> 
> The connectathon test worked fine and I didn't observe any unexpected
> log messages over the three runs.
> 
> The second test I run has been adapted from one of the connectathon
> tests. This test uses autofs submounts which are the most problematic of
> the autofs map constructs. The modified test is meant to introduce a
> fairly high level of contention by using a number of processes accessing
> the mounts at the same time (currently configured as 10 processes). I
> also attempt to introduce expire to mount contention by adjusting the
> expire timeout. In addition the test runs twice, one the the default
> nobrowse option and one for the browse option (no need to worry about
> what that means). All in all the test seems to be fairly good at
> exposing problems and 150 iterations of each configuration seems to get
> most problems within three consecutive runs.
> 
> The first run of this test returned a pass although there were some
> unexpected log messages. I'm not sure yet what is causing these. Of
> interest was a bunch of reported lstat() failures against directories
> that corresponded to failure cases and should not have existed at all.
> 
> The second run went through to completion but triggered the BUG() at
> line 941 in shrink_dcache_for_umount_subtree() when shutting down autofs
> after the first (nobrowse) part of the test run.
> 
> So, there is still a ref count problem somewhere, at least.

I'm seeing a reference being gained, or perhaps not being released
quickly enough after a close(2) on a file handle open on a mount point
being umounted seen in the backtrace [1] below. I can't say if there is
more than one like this because the BUG() stops the umounting. I'm
thinking of modifying the code to try and continue to see what else I
can find out.

I get this quite reliably running the test described above.

I've looked long and hard at the vfs-scale path walking code (and the
vfs-automount code) and I can't yet see how this could be possible.

Here are some observations I've made so far.

In this segement of code in autofs4:

int autofs4_d_manage(struct dentry *dentry, bool mounting_here, bool rcu_walk)
{
        struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);

        DPRINTK("dentry=%p %.*s",
                dentry, dentry->d_name.len, dentry->d_name.name);

        /* The daemon never waits. */
        if (autofs4_oz_mode(sbi) || mounting_here) {
                if (!d_mountpoint(dentry))
                        return -EISDIR;
                return 0;
        }

        /* We need to sleep, so we need pathwalk to be in ref-mode */
        if (rcu_walk)
                return -ECHILD;

        /* Wait for pending expires */
        do_expire_wait(dentry);

        /*
         * This dentry may be under construction so wait on mount
         * completion.
         */
        return autofs4_mount_wait(dentry);
}

If I move the

        /* We need to sleep, so we need pathwalk to be in ref-mode */
        if (rcu_walk)
                return -ECHILD;

above the

        /* The daemon never waits. */
        if (autofs4_oz_mode(sbi) || mounting_here) {
                if (!d_mountpoint(dentry))
                        return -EISDIR;
                return 0;
        }

I almost never see the problem in the first stage of the test, that
being the nobrowse configuration, but almost always see it in the second
stage, the so called browse configuration. Which amounts to saying that
the problem appears to happen more often when mount point directories in
the automount headachy exist before being mounted on and are not removed
when they are expired. Unfortunately it isn't as simple as that either
since the automount map itself is fairly complex. Still, I thought it
worth mentioning.

The most recent curious thing is that if I change the test a bit to use
bind mounts on local paths instead of NFS mounts I don't get this BUG()
at all. Don't get me wrong, I'm not saying this is necessarily an NFS
problem, it may be that the reduced latency of bind mounts changes the
test behavior and hides the bug. So I'd appreciate some help from NFS
folks in case it is something that needs to be changed in NFS, since I
can't see anything that might cause it in the NFS code myself.

If anyone would like to try and run the test themselves I'll work on
decoupling it from the RHTS infrastructure within which it is currently
implemented. 

[1] the BUG trace I see

Feb 14 20:20:20 perseus kernel: pid 15761: autofs4_kill_sb: s_root ffff88020d388f00 d_count 1 d_inode ffff88020ddf7840 d_flags 60010 sbi type 4 open_count 0 ino->flags 0
Feb 14 20:20:20 perseus kernel: pid 15761: autofs4_kill_sb: s_root ffff88020d3889c0 d_count 1 d_inode ffff88020ddf5440 d_flags 60010 sbi type 4 open_count 0 ino->flags 0
Feb 14 20:20:20 perseus kernel: pid 15743: autofs4_expire_indirect: returning ffff8801f96a0480 g3c
Feb 14 20:20:20 perseus kernel: pid 15762: autofs4_kill_sb: s_root ffff8800ccbe6000 d_count 2 d_inode ffff88020ddcc480 d_flags 60010 sbi type 4 open_count 0 ino->flags 0
Feb 14 20:20:20 perseus kernel: pid 15762: autofs4_kill_sb: s_root->d_subdirs empty
Feb 14 20:20:20 perseus kernel: BUG: Dentry ffff8800ccbe6000{i=26a028,n=/} still in use (1) [unmount of autofs autofs]
Feb 14 20:20:20 perseus kernel: ------------[ cut here ]------------
Feb 14 20:20:20 perseus kernel: kernel BUG at fs/dcache.c:943!
Feb 14 20:20:20 perseus kernel: invalid opcode: 0000 [#1] SMP
Feb 14 20:20:20 perseus kernel: last sysfs file: /sys/devices/system/cpu/cpu5/cache/index2/shared_cpu_map
Feb 14 20:20:20 perseus kernel: CPU 0
Feb 14 20:20:20 perseus kernel: Modules linked in: nls_utf8 udf crc_itu_t tcp_lp tun fuse ip6table_filter ip6_tables
	ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfs lockd fscache nfs_acl auth_rpcgss
	autofs4 sunrpc ipv6 cpufreq_ondemand powernow_k8 freq_table mperf kvm_amd kvm uinput snd_hda_codec_hdmi
	snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd
	edac_core edac_mce_amd asus_atk0110 r8169 mii i2c_piix4 ppdev parport_pc parport soundcore joydev wmi
	xhci_hcd serio_raw snd_page_alloc shpchp pcspkr microcode ata_generic pata_acpi usb_storage pata_atiixp
	radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Feb 14 20:20:20 perseus kernel:
Feb 14 20:20:20 perseus kernel: Pid: 15762, comm: automount Not tainted 2.6.38-rc4+ #2 M4A88T-M/USB3/System Product Name
Feb 14 20:20:20 perseus kernel: RIP: 0010:[<ffffffff811206e3>]  [<ffffffff811206e3>] shrink_dcache_for_umount_subtree+0x102/0x1d0
Feb 14 20:20:20 perseus kernel: RSP: 0018:ffff880209b23d58  EFLAGS: 00010296
Feb 14 20:20:20 perseus kernel: RAX: 000000000000005d RBX: ffff8800ccbe6000 RCX: 00000000000051e2
Feb 14 20:20:20 perseus kernel: RDX: 0000000000000000 RSI: 0000000000000092 RDI: 0000000000000246
Feb 14 20:20:20 perseus kernel: RBP: ffff880209b23d88 R08: 0000000000000002 R09: ffff8802ffffffff
Feb 14 20:20:20 perseus kernel: R10: ffff880209b23c48 R11: 0000000000000000 R12: ffff8800ccbe6000
Feb 14 20:20:20 perseus kernel: R13: ffff8800ccbe60a0 R14: ffff88020a68fc00 R15: ffff8800ccbe60a0
Feb 14 20:20:20 perseus kernel: FS:  00007fa9a2101700(0000) GS:ffff8800cfc00000(0000) knlGS:00000000f75e4710
Feb 14 20:20:20 perseus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 14 20:20:20 perseus kernel: CR2: 00007fa9940089c0 CR3: 000000017022a000 CR4: 00000000000006f0
Feb 14 20:20:20 perseus kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 14 20:20:20 perseus kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 14 20:20:20 perseus kernel: Process automount (pid: 15762, threadinfo ffff880209b22000, task ffff880059ee1710)
Feb 14 20:20:20 perseus kernel: Stack:
Feb 14 20:20:20 perseus kernel: ffff88020a68fe58 ffffffff8104771e 0000000000000004 ffff88020a68fc00
Feb 14 20:20:20 perseus kernel: ffff8800ccbe6000 ffff880209e680c0 ffff880209b23da8 ffffffff811210b8
Feb 14 20:20:20 perseus kernel: ffff88020a68fc00 ffffffffa027c990 ffff880209b23dd8 ffffffff81110721
Feb 14 20:20:20 perseus kernel: Call Trace:
Feb 14 20:20:20 perseus kernel: [<ffffffff8104771e>] ? try_to_wake_up+0x214/0x226
Feb 14 20:20:20 perseus kernel: [<ffffffff811210b8>] shrink_dcache_for_umount+0x4d/0x5f
Feb 14 20:20:20 perseus kernel: [<ffffffff81110721>] generic_shutdown_super+0x29/0xf6
Feb 14 20:20:20 perseus kernel: [<ffffffff8111086e>] kill_anon_super+0x16/0x50
Feb 14 20:20:20 perseus kernel: [<ffffffff811108cf>] kill_litter_super+0x27/0x2c
Feb 14 20:20:20 perseus kernel: [<ffffffffa0279390>] autofs4_kill_sb+0x1a7/0x1b6 [autofs4]
Feb 14 20:20:20 perseus kernel: [<ffffffff81110a9e>] deactivate_locked_super+0x26/0x46
Feb 14 20:20:20 perseus kernel: [<ffffffff811115ba>] deactivate_super+0x3a/0x3e
Feb 14 20:20:20 perseus kernel: [<ffffffff81126599>] mntput_no_expire+0xd0/0xd5
Feb 14 20:20:20 perseus kernel: [<ffffffff811270f1>] sys_umount+0x2e9/0x317
Feb 14 20:20:20 perseus kernel: [<ffffffff8100ac02>] system_call_fastpath+0x16/0x1b
Feb 14 20:20:20 perseus kernel: Code: 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 07 48 8b 90 a8 00 00 00 48 89 34 24 48 c7 c7 af 68 7a 81 48 89 de 31 c0 e8 a1 60 33 00 <0f> 0b 4c 8b 63 18 4c 8d ab 90 00 00 00 4c 39 e3 75 0d 4c 89 ef
Feb 14 20:20:20 perseus kernel: RIP  [<ffffffff811206e3>] shrink_dcache_for_umount_subtree+0x102/0x1d0
Feb 14 20:20:20 perseus kernel: RSP <ffff880209b23d58>
Feb 14 20:20:20 perseus kernel: ---[ end trace 58b3a9ecb744fd7d ]---


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-15 14:25   ` Ian Kent
@ 2011-02-23  7:22     ` Ian Kent
  2011-02-23 16:37       ` Linus Torvalds
  2011-02-24  1:58     ` Al Viro
  1 sibling, 1 reply; 24+ messages in thread
From: Ian Kent @ 2011-02-23  7:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Trond Myklebust, David Howells, Al Viro, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Tue, 2011-02-15 at 22:25 +0800, Ian Kent wrote:
> On Wed, 2011-01-19 at 15:06 +0800, Ian Kent wrote:
> > On Tue, 2011-01-18 at 12:05 +0800, Ian Kent wrote:
> > > I noticed a couple of problems which needed to be fixed before I can
> > > start testing.
> > 
> > Before I return to testing I thought I should let everyone know the
> > results so far, following the addition of the merged patches from this
> > thread.
> 
> Sorry for the large recipient list but I think there's a need.
> I've made very little progress on this but my most recent development
> deserves an update.
> 
> Most recent kernel used is 2.6.38-rc4.

Now at rc5, and still struggling with this but I have made some
progress.

Nick, what stops a dentry from going away during the rcu-walk?

Ian



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-23  7:22     ` Ian Kent
@ 2011-02-23 16:37       ` Linus Torvalds
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2011-02-23 16:37 UTC (permalink / raw)
  To: Ian Kent
  Cc: Nick Piggin, Trond Myklebust, David Howells, Al Viro,
	Kernel Mailing List, linux-fsdevel, Andrew Morton,
	Linux NFS Mailing List

On Tue, Feb 22, 2011 at 11:22 PM, Ian Kent <raven@themaw.net> wrote:
>
> Nick, what stops a dentry from going away during the rcu-walk?

Dentries (and now inodes) are all RCU-free'd. So the dentry might "go
away", but it wouldn't actually be physically free'd, and you can
still access it.

                            Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-15 14:25   ` Ian Kent
  2011-02-23  7:22     ` Ian Kent
@ 2011-02-24  1:58     ` Al Viro
  2011-02-24  3:03       ` Ian Kent
  1 sibling, 1 reply; 24+ messages in thread
From: Al Viro @ 2011-02-24  1:58 UTC (permalink / raw)
  To: Ian Kent
  Cc: Nick Piggin, Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Tue, Feb 15, 2011 at 10:25:02PM +0800, Ian Kent wrote:

> I'm seeing a reference being gained, or perhaps not being released
> quickly enough after a close(2) on a file handle open on a mount point
> being umounted seen in the backtrace [1] below. I can't say if there is
> more than one like this because the BUG() stops the umounting. I'm
> thinking of modifying the code to try and continue to see what else I
> can find out.
> 
> I get this quite reliably running the test described above.
> 
> I've looked long and hard at the vfs-scale path walking code (and the
> vfs-automount code) and I can't yet see how this could be possible.
> 
> Here are some observations I've made so far.
> 
> In this segement of code in autofs4:
 
> int autofs4_d_manage(struct dentry *dentry, bool mounting_here, bool rcu_walk)
> {
>         struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
> 
>         DPRINTK("dentry=%p %.*s",
>                 dentry, dentry->d_name.len, dentry->d_name.name);
> 
>         /* The daemon never waits. */
>         if (autofs4_oz_mode(sbi) || mounting_here) {
>                 if (!d_mountpoint(dentry))
>                         return -EISDIR;
>                 return 0;
>         }
> 
>         /* We need to sleep, so we need pathwalk to be in ref-mode */
>         if (rcu_walk)
>                 return -ECHILD;

[snip]

> If I move the
> 
>         /* We need to sleep, so we need pathwalk to be in ref-mode */
>         if (rcu_walk)
>                 return -ECHILD;
> 
> above the
> 
>         /* The daemon never waits. */
>         if (autofs4_oz_mode(sbi) || mounting_here) {
>                 if (!d_mountpoint(dentry))
>                         return -EISDIR;
>                 return 0;
>         }
> 
> I almost never see the problem in the first stage of the test, that
> being the nobrowse configuration, but almost always see it in the second
> stage, the so called browse configuration. Which amounts to saying that
> the problem appears to happen more often when mount point directories in
> the automount headachy exist before being mounted on and are not removed
> when they are expired. Unfortunately it isn't as simple as that either
> since the automount map itself is fairly complex. Still, I thought it
> worth mentioning.

Curious...  The only caller affected by that transposition is
__follow_mount_rcu() and it would have to
	* be called from do_lookup()
	* have path->dentry pointing to a mountpoint
	* being called by the daemon.
So basically you are forcing the daemon to try and drop out of RCU mode
when it reaches a mountpoint on autofs.

> The most recent curious thing is that if I change the test a bit to use
> bind mounts on local paths instead of NFS mounts I don't get this BUG()
> at all. Don't get me wrong, I'm not saying this is necessarily an NFS
> problem, it may be that the reduced latency of bind mounts changes the
> test behavior and hides the bug. So I'd appreciate some help from NFS
> folks in case it is something that needs to be changed in NFS, since I
> can't see anything that might cause it in the NFS code myself.

It might be "kicks the thing out of RCU mode as soon as we reach into a
directory on NFS"...  Try binds down into sysfs; ->d_revalidate() will
act there as well.

> If anyone would like to try and run the test themselves I'll work on
> decoupling it from the RHTS infrastructure within which it is currently
> implemented. 

Ho-hum...  I can reach RHTS, but I'd rather do that at home boxen, if
possible...  Has it been reproduced on UP boxen with SMP kernels, BTW?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  1:58     ` Al Viro
@ 2011-02-24  3:03       ` Ian Kent
  2011-02-24  3:14         ` Al Viro
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Kent @ 2011-02-24  3:03 UTC (permalink / raw)
  To: Al Viro
  Cc: Nick Piggin, Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Thu, 2011-02-24 at 01:58 +0000, Al Viro wrote: 
> On Tue, Feb 15, 2011 at 10:25:02PM +0800, Ian Kent wrote:
> 
> > I'm seeing a reference being gained, or perhaps not being released
> > quickly enough after a close(2) on a file handle open on a mount point
> > being umounted seen in the backtrace [1] below. I can't say if there is
> > more than one like this because the BUG() stops the umounting. I'm
> > thinking of modifying the code to try and continue to see what else I
> > can find out.
> > 
> > I get this quite reliably running the test described above.
> > 
> > I've looked long and hard at the vfs-scale path walking code (and the
> > vfs-automount code) and I can't yet see how this could be possible.
> > 
> > Here are some observations I've made so far.
> > 
> > In this segement of code in autofs4:
>  
> > int autofs4_d_manage(struct dentry *dentry, bool mounting_here, bool rcu_walk)
> > {
> >         struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
> > 
> >         DPRINTK("dentry=%p %.*s",
> >                 dentry, dentry->d_name.len, dentry->d_name.name);
> > 
> >         /* The daemon never waits. */
> >         if (autofs4_oz_mode(sbi) || mounting_here) {
> >                 if (!d_mountpoint(dentry))
> >                         return -EISDIR;
> >                 return 0;
> >         }
> > 
> >         /* We need to sleep, so we need pathwalk to be in ref-mode */
> >         if (rcu_walk)
> >                 return -ECHILD;
> 
> [snip]
> 
> > If I move the
> > 
> >         /* We need to sleep, so we need pathwalk to be in ref-mode */
> >         if (rcu_walk)
> >                 return -ECHILD;
> > 
> > above the
> > 
> >         /* The daemon never waits. */
> >         if (autofs4_oz_mode(sbi) || mounting_here) {
> >                 if (!d_mountpoint(dentry))
> >                         return -EISDIR;
> >                 return 0;
> >         }
> > 
> > I almost never see the problem in the first stage of the test, that
> > being the nobrowse configuration, but almost always see it in the second
> > stage, the so called browse configuration. Which amounts to saying that
> > the problem appears to happen more often when mount point directories in
> > the automount headachy exist before being mounted on and are not removed
> > when they are expired. Unfortunately it isn't as simple as that either
> > since the automount map itself is fairly complex. Still, I thought it
> > worth mentioning.
> 
> Curious...  The only caller affected by that transposition is
> __follow_mount_rcu() and it would have to
> 	* be called from do_lookup()
> 	* have path->dentry pointing to a mountpoint
> 	* being called by the daemon.
> So basically you are forcing the daemon to try and drop out of RCU mode
> when it reaches a mountpoint on autofs.

At this point I'm partly clueless as to what's happening.
Even the statements above turn out to be inaccurate although that is
what I observed for quite a while initially.

One thing I've found is that it's possible for __follow_mount_rcu() to
be called on a non-mountpoint autofs dentry that needs to block transit.

Another is an out of order call bug with the active list usage. It was
removing the dentry from the list before hashing it which can cause
multiple dentrys to be created for the same lookup. At this point I
don't think just changing the function call order is sufficient to fix
it. I tried to remove the active list code during the vfs-automount
development but found I couldn't always drop dentrys when I needed to
for error cases. I didn't pursue that because I had something else that
was working fine and seemed to play well with the VFS.

There is also the expire. It uses reference counts to establish
busyness, which is mostly fine, except that's also used to avoid
attempting to expire a tree that has a process walking somewhere within
it. Which more or less says that I need to drop out of rcu-walk for any
dentry that is not already being expired (or checked).

I also have a sick feeling that dentrys may become negative at any point
after __d_lookup_rcu() .....

> 
> > The most recent curious thing is that if I change the test a bit to use
> > bind mounts on local paths instead of NFS mounts I don't get this BUG()
> > at all. Don't get me wrong, I'm not saying this is necessarily an NFS
> > problem, it may be that the reduced latency of bind mounts changes the
> > test behavior and hides the bug. So I'd appreciate some help from NFS
> > folks in case it is something that needs to be changed in NFS, since I
> > can't see anything that might cause it in the NFS code myself.
> 
> It might be "kicks the thing out of RCU mode as soon as we reach into a
> directory on NFS"...  Try binds down into sysfs; ->d_revalidate() will
> act there as well.
> 
> > If anyone would like to try and run the test themselves I'll work on
> > decoupling it from the RHTS infrastructure within which it is currently
> > implemented. 
> 
> Ho-hum...  I can reach RHTS, but I'd rather do that at home boxen, if
> possible...  Has it been reproduced on UP boxen with SMP kernels, BTW?

Nope, I'd need to build a kernel specifically for that. I'm not sure how
useful that would be though since the test is specifically meant to
expose problems with multiple concurrent processes accessing an
automount tree. I don't see any problem running the Connectathon tests
which is essentially one automount and one client process.

Ian


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  3:03       ` Ian Kent
@ 2011-02-24  3:14         ` Al Viro
  2011-02-24  3:28           ` Ian Kent
  0 siblings, 1 reply; 24+ messages in thread
From: Al Viro @ 2011-02-24  3:14 UTC (permalink / raw)
  To: Ian Kent
  Cc: Nick Piggin, Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Thu, Feb 24, 2011 at 11:03:38AM +0800, Ian Kent wrote:

> I also have a sick feeling that dentrys may become negative at any point
> after __d_lookup_rcu() .....

Yes.  To get stability of ->d_inode (assuming the sucker isn't pinned down
in normal way by ->d_count) you need ->d_lock.

> > Ho-hum...  I can reach RHTS, but I'd rather do that at home boxen, if
> > possible...  Has it been reproduced on UP boxen with SMP kernels, BTW?
> 
> Nope, I'd need to build a kernel specifically for that. I'm not sure how
> useful that would be though since the test is specifically meant to
> expose problems with multiple concurrent processes accessing an
> automount tree. I don't see any problem running the Connectathon tests
> which is essentially one automount and one client process.

Heh...  No, it's just that the only SMP box I have locally right now
is dual ultrasparc.  Anyway, I can live with RHTS.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  3:14         ` Al Viro
@ 2011-02-24  3:28           ` Ian Kent
  2011-02-24  3:58             ` Al Viro
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Kent @ 2011-02-24  3:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Nick Piggin, Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Thu, 2011-02-24 at 03:14 +0000, Al Viro wrote:
> On Thu, Feb 24, 2011 at 11:03:38AM +0800, Ian Kent wrote:
> 
> > I also have a sick feeling that dentrys may become negative at any point
> > after __d_lookup_rcu() .....
> 
> Yes.  To get stability of ->d_inode (assuming the sucker isn't pinned down
> in normal way by ->d_count) you need ->d_lock.
> 
> > > Ho-hum...  I can reach RHTS, but I'd rather do that at home boxen, if
> > > possible...  Has it been reproduced on UP boxen with SMP kernels, BTW?
> > 
> > Nope, I'd need to build a kernel specifically for that. I'm not sure how
> > useful that would be though since the test is specifically meant to
> > expose problems with multiple concurrent processes accessing an
> > automount tree. I don't see any problem running the Connectathon tests
> > which is essentially one automount and one client process.
> 
> Heh...  No, it's just that the only SMP box I have locally right now
> is dual ultrasparc.  Anyway, I can live with RHTS.

If you want to get hold of the test I'm using checkout autofs-RHEL-5,
"cd autofs-tests/submount-test", "make rpm" and use the resulting
rh-tests-autofs-submount-test-1.0-15.noarch.rpm. There are some
beaker/rhts rpm dependencies. Let me know if it gets painful and I'll
try and work out what you need. The test isn't very flash but it does
stress autofs.

Ha, I haven't even turned on my Ultrsparc 2 in months, it's only got an
old version of Solaris on it now anyway, ;)



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  3:28           ` Ian Kent
@ 2011-02-24  3:58             ` Al Viro
  2011-02-24  5:47               ` Al Viro
  2011-02-24  6:34               ` Ian Kent
  0 siblings, 2 replies; 24+ messages in thread
From: Al Viro @ 2011-02-24  3:58 UTC (permalink / raw)
  To: Ian Kent
  Cc: Nick Piggin, Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Thu, Feb 24, 2011 at 11:28:57AM +0800, Ian Kent wrote:

> Ha, I haven't even turned on my Ultrsparc 2 in months, it's only got an
> old version of Solaris on it now anyway, ;)

U60, with lenny (and mainline kernel) on it.  Probably ought to upgrade
to squeeze one of those days...  It works, all right, but it's only 2-way,
so reproducing would probably be harder.  Plus the fun of building the tests
themselves on somewhat different userland...

Anyway, I wonder why you care about __d_lookup_rcu() and ->d_inode stability;
d_mountpoint() _is_ stable at that point (we hold vfsmount_lock) and you
don't seem to look at ->d_inode at all in RCU case.  Note that ->d_automount()
is never called in RCU case at all; nor is ->lookup() and friends, so you
really only have ->d_manage() to cope with, what with autofs4 having no
->d_revalidate() anymore.

BTW, Nick has moved to kernel.dk; whether he's still reachable there is
a question, though - he seems to have disappeared since mid-January.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  3:58             ` Al Viro
@ 2011-02-24  5:47               ` Al Viro
  2011-02-24  7:23                 ` Ian Kent
  2011-02-24  6:34               ` Ian Kent
  1 sibling, 1 reply; 24+ messages in thread
From: Al Viro @ 2011-02-24  5:47 UTC (permalink / raw)
  To: Ian Kent
  Cc: Nick Piggin, Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Thu, Feb 24, 2011 at 03:58:37AM +0000, Al Viro wrote:
> On Thu, Feb 24, 2011 at 11:28:57AM +0800, Ian Kent wrote:
> 
> > Ha, I haven't even turned on my Ultrsparc 2 in months, it's only got an
> > old version of Solaris on it now anyway, ;)
> 
> U60, with lenny (and mainline kernel) on it.  Probably ought to upgrade
> to squeeze one of those days...  It works, all right, but it's only 2-way,
> so reproducing would probably be harder.  Plus the fun of building the tests
> themselves on somewhat different userland...
> 
> Anyway, I wonder why you care about __d_lookup_rcu() and ->d_inode stability;
> d_mountpoint() _is_ stable at that point (we hold vfsmount_lock) and you
> don't seem to look at ->d_inode at all in RCU case.  Note that ->d_automount()
> is never called in RCU case at all; nor is ->lookup() and friends, so you
> really only have ->d_manage() to cope with, what with autofs4 having no
> ->d_revalidate() anymore.

FWIW, can we _ever_ get to __do_follow_link() with link->mnt != nd->path.mnt?
It's probably not what's happening here, or we would've stepped on another
BUG_ON(), but still it might be worth checking...

AFAICS, if we ever get there that way, we are fscked, so the check before
mntget() ought to replaced with BUG_ON(link->mnt != nd->path.mnt)...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  3:58             ` Al Viro
  2011-02-24  5:47               ` Al Viro
@ 2011-02-24  6:34               ` Ian Kent
  2011-02-24  7:07                 ` Al Viro
  1 sibling, 1 reply; 24+ messages in thread
From: Ian Kent @ 2011-02-24  6:34 UTC (permalink / raw)
  To: Al Viro
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, 2011-02-24 at 03:58 +0000, Al Viro wrote:
> 
> Anyway, I wonder why you care about __d_lookup_rcu() and ->d_inode stability;
> d_mountpoint() _is_ stable at that point (we hold vfsmount_lock) and you
> don't seem to look at ->d_inode at all in RCU case.  Note that ->d_automount()
> is never called in RCU case at all; nor is ->lookup() and friends, so you
> really only have ->d_manage() to cope with, what with autofs4 having no
> ->d_revalidate() anymore.

It does now but it doesn't do a whole lot, just checks for a negative
dentry, returning false, and drops out of RCU mode when the dentry isn't
selected or being checked for expiry yet.

Indeed, I shouldn't need to care about it and checking for it doesn't
seem to do a whole lot.

I've seen walks fail with a dentry that has gone away (ie. I'm walking
in a tree that is being expired, they seem to be trees with no actual
mount at their base), which should be because I'm not catching the block
request at a higher level dentry. Unfortunately, including a check for
the base dentry in __follow_mount_rcu(), regardless of it being a
mountpoint, doesn't always catch the need to block either. That the tree
is being selected for expire is probably because the expire can't detect
a walk within the mount tree when the walker is between the
__d_lookup_rcu() and __follow_mount_rcu() calls. Jumping out of rcu-walk
mode in an added revalidate reduces the possibility of this happening
(assuming walks are sneaking in between dropping and acquiring the
vfsmount_lock) but doesn't eliminate it. 

At this point I loop back to the trying to work out why a reference gets
picked up and the cycle repeats with various small changes to the
overall approach.

Actually, it's not just the root of a mount that is getting an extra
reference either. When I notice a root dentry has picked up a reference
(from a printk) and SIGKILL everything and start umounting manually I
get to a point in the tree where I need to umount autofs mounts twice,
which usually means nothing more than a get/put mismatch somewhere, but
in that case I normally don't see a root dentry pick up a reference.
That's confusing me too.

> 
> BTW, Nick has moved to kernel.dk; whether he's still reachable there is
> a question, though - he seems to have disappeared since mid-January.

Yeah, I've forwarded a couple of messages where I forgot to change the
email. A few emails back Nick said he was following the thread anyway.

Ian



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  6:34               ` Ian Kent
@ 2011-02-24  7:07                 ` Al Viro
  2011-02-24 10:07                   ` Ian Kent
  2011-02-24 10:21                   ` Ian Kent
  0 siblings, 2 replies; 24+ messages in thread
From: Al Viro @ 2011-02-24  7:07 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, Feb 24, 2011 at 02:34:20PM +0800, Ian Kent wrote:

> It does now but it doesn't do a whole lot, just checks for a negative
> dentry, returning false, and drops out of RCU mode when the dentry isn't
> selected or being checked for expiry yet.

Er?  Where does ->d_revalidate() exist in fs/autofs4?  Mainline doesn't
have anything of that kind...
 
> I've seen walks fail with a dentry that has gone away (ie. I'm walking
> in a tree that is being expired, they seem to be trees with no actual
> mount at their base), which should be because I'm not catching the block
> request at a higher level dentry. Unfortunately, including a check for
> the base dentry in __follow_mount_rcu(), regardless of it being a
> mountpoint, doesn't always catch the need to block either. That the tree
> is being selected for expire is probably because the expire can't detect
> a walk within the mount tree when the walker is between the
> __d_lookup_rcu() and __follow_mount_rcu() calls. Jumping out of rcu-walk
> mode in an added revalidate reduces the possibility of this happening
> (assuming walks are sneaking in between dropping and acquiring the
> vfsmount_lock) but doesn't eliminate it. 
> 
> At this point I loop back to the trying to work out why a reference gets
> picked up and the cycle repeats with various small changes to the
> overall approach.
> 
> Actually, it's not just the root of a mount that is getting an extra
> reference either. When I notice a root dentry has picked up a reference
> (from a printk) and SIGKILL everything and start umounting manually I
> get to a point in the tree where I need to umount autofs mounts twice,
> which usually means nothing more than a get/put mismatch somewhere, but
> in that case I normally don't see a root dentry pick up a reference.
> That's confusing me too.

OK, now I'm really confused.  Where does your tree live?

I really don't like the look of autofs4_tree_busy(), BTW - you don't need
RCU to get races here; just a plain chdir("../../.."); done by a process
that sits deep enough in subtree you are checking can very well race with
your check, so that your iterator gets through the new cwd before chdir()
and through the old one after it...

Maybe I'm missing something subtle here.  A braindump on how you expect
all "we choose dentry for expiry" stuff to work and what is really required
from it would be nice...
 
> > BTW, Nick has moved to kernel.dk; whether he's still reachable there is
> > a question, though - he seems to have disappeared since mid-January.
> 
> Yeah, I've forwarded a couple of messages where I forgot to change the
> email. A few emails back Nick said he was following the thread anyway.

Could you _please_ forward to him the question I tried to ask, without
any success so far?  Namely, what was the reason for reverting do_filp_open()
to use of do_path_lookup() in non-create case?  I have a couple of guesses,
but...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  5:47               ` Al Viro
@ 2011-02-24  7:23                 ` Ian Kent
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Kent @ 2011-02-24  7:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Nick Piggin, Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List

On Thu, 2011-02-24 at 05:47 +0000, Al Viro wrote:
> On Thu, Feb 24, 2011 at 03:58:37AM +0000, Al Viro wrote:
> > On Thu, Feb 24, 2011 at 11:28:57AM +0800, Ian Kent wrote:
> > 
> > > Ha, I haven't even turned on my Ultrsparc 2 in months, it's only got an
> > > old version of Solaris on it now anyway, ;)
> > 
> > U60, with lenny (and mainline kernel) on it.  Probably ought to upgrade
> > to squeeze one of those days...  It works, all right, but it's only 2-way,
> > so reproducing would probably be harder.  Plus the fun of building the tests
> > themselves on somewhat different userland...
> > 
> > Anyway, I wonder why you care about __d_lookup_rcu() and ->d_inode stability;
> > d_mountpoint() _is_ stable at that point (we hold vfsmount_lock) and you
> > don't seem to look at ->d_inode at all in RCU case.  Note that ->d_automount()
> > is never called in RCU case at all; nor is ->lookup() and friends, so you
> > really only have ->d_manage() to cope with, what with autofs4 having no
> > ->d_revalidate() anymore.
> 
> FWIW, can we _ever_ get to __do_follow_link() with link->mnt != nd->path.mnt?
> It's probably not what's happening here, or we would've stepped on another
> BUG_ON(), but still it might be worth checking...
> 
> AFAICS, if we ever get there that way, we are fscked, so the check before
> mntget() ought to replaced with BUG_ON(link->mnt != nd->path.mnt)...

Yeah, don't think autofs can go their with no ->follow_link() defined.
I've largely ignored that code to date.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  7:07                 ` Al Viro
@ 2011-02-24 10:07                   ` Ian Kent
  2011-02-24 14:59                     ` Al Viro
  2011-02-24 19:10                     ` Al Viro
  2011-02-24 10:21                   ` Ian Kent
  1 sibling, 2 replies; 24+ messages in thread
From: Ian Kent @ 2011-02-24 10:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, 2011-02-24 at 07:07 +0000, Al Viro wrote: 
> On Thu, Feb 24, 2011 at 02:34:20PM +0800, Ian Kent wrote:
> 
> > It does now but it doesn't do a whole lot, just checks for a negative
> > dentry, returning false, and drops out of RCU mode when the dentry isn't
> > selected or being checked for expiry yet.
> 
> Er?  Where does ->d_revalidate() exist in fs/autofs4?  Mainline doesn't
> have anything of that kind...

Sorry, that's just in my testing source.

> 
> > I've seen walks fail with a dentry that has gone away (ie. I'm walking
> > in a tree that is being expired, they seem to be trees with no actual
> > mount at their base), which should be because I'm not catching the block
> > request at a higher level dentry. Unfortunately, including a check for
> > the base dentry in __follow_mount_rcu(), regardless of it being a
> > mountpoint, doesn't always catch the need to block either. That the tree
> > is being selected for expire is probably because the expire can't detect
> > a walk within the mount tree when the walker is between the
> > __d_lookup_rcu() and __follow_mount_rcu() calls. Jumping out of rcu-walk
> > mode in an added revalidate reduces the possibility of this happening
> > (assuming walks are sneaking in between dropping and acquiring the
> > vfsmount_lock) but doesn't eliminate it. 
> > 
> > At this point I loop back to the trying to work out why a reference gets
> > picked up and the cycle repeats with various small changes to the
> > overall approach.
> > 
> > Actually, it's not just the root of a mount that is getting an extra
> > reference either. When I notice a root dentry has picked up a reference
> > (from a printk) and SIGKILL everything and start umounting manually I
> > get to a point in the tree where I need to umount autofs mounts twice,
> > which usually means nothing more than a get/put mismatch somewhere, but
> > in that case I normally don't see a root dentry pick up a reference.
> > That's confusing me too.
> 
> OK, now I'm really confused.  Where does your tree live?

Sorry.

> 
> I really don't like the look of autofs4_tree_busy(), BTW - you don't need
> RCU to get races here; just a plain chdir("../../.."); done by a process
> that sits deep enough in subtree you are checking can very well race with
> your check, so that your iterator gets through the new cwd before chdir()
> and through the old one after it...

Yes, that's a problem.

Previously the dcache_lock would have blocked on the dput() ... mmm, I'd
missed that so far, although Nick didn't talk with me about his changes
very much at all and I didn't pay enough attention to his patch series
along the way, oops!

> 
> Maybe I'm missing something subtle here.  A braindump on how you expect
> all "we choose dentry for expiry" stuff to work and what is really required
> from it would be nice...

Anything that has a reference count that reflects an open file or pwd is
considered busy. In some places only the dentry count is used but that's
mostly used (or was) to check for a concurrent path walk on the dentry
and was an optimization to bail out early.

A direct mount may have mount directly upon it and is checked for expiry
using may_umount_tree() on the root dentry and then the idle time is
checked.

An indirect mount may have mounts on dentrys within the root and it uses
may_umount_tree() and then checks idle time. It is supposed to ignore
mounts that are themselves autofs mounts (within the root directory) and
they are expected to be expired separately. If it sees any of those then
the mount is immediately considered busy.

Both indirect mount dentrys and direct mounts may have a tree of mounts
mounted on them, with or without a mount at the base. In these cases
anything mounted underneath, except for autofs trigger mounts, should
cause the tree to be seen as busy. In particular, submounts (an
independent indirect mount in itself) will have an ioctl file handle
open on it for its lifetime as will trigger mounts with a mount upon
them. 

When checking a direct mount or indirect mount dentry the super block
fs_lock is held which is meant to be used to prevent walks into that
dentry tree during the expire check. As you mentioned before fs_lock
should be a mutex not a spin lock, I agree.

When selected the AUTOFS_INF_EXPIRING flag is set as a result of the
expire check that, in conjunction with the fs_lock, should allow
processes to be blocked by autofs while a mount or tree of mounts is
expired. The DCACHE_MANAGE_TRANSIT flag has similar scope and is meant
to be used to get us into ->d_manage so we can check the autofs specific
flags although locking is not present for that check which might be a
problem. It wasn't prior to the rcu-walk changes.

> 
> > > BTW, Nick has moved to kernel.dk; whether he's still reachable there is
> > > a question, though - he seems to have disappeared since mid-January.
> > 
> > Yeah, I've forwarded a couple of messages where I forgot to change the
> > email. A few emails back Nick said he was following the thread anyway.
> 
> Could you _please_ forward to him the question I tried to ask, without
> any success so far?  Namely, what was the reason for reverting do_filp_open()
> to use of do_path_lookup() in non-create case?  I have a couple of guesses,
> but...

Nicks email, at least the one I think he is using is included in the cc
list above.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24  7:07                 ` Al Viro
  2011-02-24 10:07                   ` Ian Kent
@ 2011-02-24 10:21                   ` Ian Kent
  1 sibling, 0 replies; 24+ messages in thread
From: Ian Kent @ 2011-02-24 10:21 UTC (permalink / raw)
  To: Al Viro
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, 2011-02-24 at 07:07 +0000, Al Viro wrote:
> On Thu, Feb 24, 2011 at 02:34:20PM +0800, Ian Kent wrote:
> 
> > It does now but it doesn't do a whole lot, just checks for a negative
> > dentry, returning false, and drops out of RCU mode when the dentry isn't
> > selected or being checked for expiry yet.
> 
> Er?  Where does ->d_revalidate() exist in fs/autofs4?  Mainline doesn't
> have anything of that kind...

snip ...

> 
> OK, now I'm really confused.  Where does your tree live?

It's a working tree.

I can update my autofs4 tree on kernel.org with the changes that I think
are actually needed so far. Even then I'm still not sure that revalidate
will be needed.

Ian



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24 10:07                   ` Ian Kent
@ 2011-02-24 14:59                     ` Al Viro
  2011-02-24 15:18                       ` Al Viro
  2011-02-25  3:07                       ` Ian Kent
  2011-02-24 19:10                     ` Al Viro
  1 sibling, 2 replies; 24+ messages in thread
From: Al Viro @ 2011-02-24 14:59 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, Feb 24, 2011 at 06:07:33PM +0800, Ian Kent wrote:
> > I really don't like the look of autofs4_tree_busy(), BTW - you don't need
> > RCU to get races here; just a plain chdir("../../.."); done by a process
> > that sits deep enough in subtree you are checking can very well race with
> > your check, so that your iterator gets through the new cwd before chdir()
> > and through the old one after it...
> 
> Yes, that's a problem.
> 
> Previously the dcache_lock would have blocked on the dput() ... mmm, I'd
> missed that so far, although Nick didn't talk with me about his changes
> very much at all and I didn't pay enough attention to his patch series
> along the way, oops!

Well...  In principle, you could mark all mountpoints in a tree as managed,
have an "expiry search in progress" flag to stop everything in there while
the expiry happens, then wait out all RCU walks in progress, then go ahead
with your expiry checks.  Would that work for you?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24 14:59                     ` Al Viro
@ 2011-02-24 15:18                       ` Al Viro
  2011-02-25  3:07                       ` Ian Kent
  1 sibling, 0 replies; 24+ messages in thread
From: Al Viro @ 2011-02-24 15:18 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, Feb 24, 2011 at 02:59:47PM +0000, Al Viro wrote:

> Well...  In principle, you could mark all mountpoints in a tree as managed,
> have an "expiry search in progress" flag to stop everything in there while
> the expiry happens, then wait out all RCU walks in progress, then go ahead
> with your expiry checks.  Would that work for you?

There's at least one leak I can see in expiry.c, BTW - autofs4_expire_direct()
leaks root dentry if you hit
        /* No point expiring a pending mount */
        if (ino->flags & AUTOFS_INF_PENDING) {
                spin_unlock(&sbi->fs_lock);
                return NULL;
        }
in there.  Try to stick dput(root) before that return and see if that's what
it is...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24 10:07                   ` Ian Kent
  2011-02-24 14:59                     ` Al Viro
@ 2011-02-24 19:10                     ` Al Viro
  1 sibling, 0 replies; 24+ messages in thread
From: Al Viro @ 2011-02-24 19:10 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, Feb 24, 2011 at 06:07:33PM +0800, Ian Kent wrote:

> Yes, that's a problem.
> 
> Previously the dcache_lock would have blocked on the dput() ... mmm, I'd
> missed that so far, although Nick didn't talk with me about his changes
> very much at all and I didn't pay enough attention to his patch series
> along the way, oops!

Actually, dcache_lock blocks only the _final_ dput().  I.e. one that has
refcount going to 0.  Not the case here, so that's a problem without
RCU stuff as well.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] Fixes for vfs-scale and vfs-automount
  2011-02-24 14:59                     ` Al Viro
  2011-02-24 15:18                       ` Al Viro
@ 2011-02-25  3:07                       ` Ian Kent
  1 sibling, 0 replies; 24+ messages in thread
From: Ian Kent @ 2011-02-25  3:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Trond Myklebust, David Howells, Kernel Mailing List,
	linux-fsdevel, Linus Torvalds, Andrew Morton,
	Linux NFS Mailing List, npiggin

On Thu, 2011-02-24 at 14:59 +0000, Al Viro wrote:
> On Thu, Feb 24, 2011 at 06:07:33PM +0800, Ian Kent wrote:
> > > I really don't like the look of autofs4_tree_busy(), BTW - you don't need
> > > RCU to get races here; just a plain chdir("../../.."); done by a process
> > > that sits deep enough in subtree you are checking can very well race with
> > > your check, so that your iterator gets through the new cwd before chdir()
> > > and through the old one after it...
> > 
> > Yes, that's a problem.
> > 
> > Previously the dcache_lock would have blocked on the dput() ... mmm, I'd
> > missed that so far, although Nick didn't talk with me about his changes
> > very much at all and I didn't pay enough attention to his patch series
> > along the way, oops!
> 
> Well...  In principle, you could mark all mountpoints in a tree as managed,
> have an "expiry search in progress" flag to stop everything in there while
> the expiry happens, then wait out all RCU walks in progress, then go ahead
> with your expiry checks.  Would that work for you?

That approach would solve the problem.

It was deliberate to not make all dentrys as managed to avoid function
calls to ->manage() where possible but that might not be possible.

The priority right now though is the BUG() I'm getting due to the
elevated reference.

It turns out that the missing dput() you spotted won't trigger unless
user space does something stupid and that's why it didn't show up in
testing before.

Anyway, recent printk output is pointing me at do_filp_open() so I'm
probably going to need to look closely at it. I'm still gathering printk
evidence but I think that pain is coming.

Ian


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2011-02-25  3:07 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-18  4:05 [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
2011-01-18  4:06 ` [PATCH 1/3] autofs4 - fix get_next_positive_dentry() Ian Kent
2011-01-18  4:06 ` [PATCH 2/3] vfs - fix dentry ref count in do_lookup() Ian Kent
2011-01-18  4:44   ` Al Viro
2011-01-18  4:06 ` [PATCH 3/3] autofs4 - fix debug print in autofs4_lookup() Ian Kent
2011-01-19  7:06 ` [PATCH 0/3] Fixes for vfs-scale and vfs-automount Ian Kent
2011-02-15 14:25   ` Ian Kent
2011-02-23  7:22     ` Ian Kent
2011-02-23 16:37       ` Linus Torvalds
2011-02-24  1:58     ` Al Viro
2011-02-24  3:03       ` Ian Kent
2011-02-24  3:14         ` Al Viro
2011-02-24  3:28           ` Ian Kent
2011-02-24  3:58             ` Al Viro
2011-02-24  5:47               ` Al Viro
2011-02-24  7:23                 ` Ian Kent
2011-02-24  6:34               ` Ian Kent
2011-02-24  7:07                 ` Al Viro
2011-02-24 10:07                   ` Ian Kent
2011-02-24 14:59                     ` Al Viro
2011-02-24 15:18                       ` Al Viro
2011-02-25  3:07                       ` Ian Kent
2011-02-24 19:10                     ` Al Viro
2011-02-24 10:21                   ` Ian Kent

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).